
[2202.10054] Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution
Feb 21, 2022 · We prove that the OOD error of fine-tuning is high when we initialize with a fixed or random head -- this is because while fine-tuning learns the head, the lower layers of the neural network change simultaneously and distort the pretrained features.
•Pretrained models give large improvements in accuracy, but how we fine-tune them is key
•LP-FT is just a starting point; better methods?
•What to do when linear probing is not so good?
Fine-Tuning Distorts Pretrained Features - Emergent Mind
Feb 21, 2022 · Fine-tuning alters pretrained features due to simultaneous optimization of the head and lower layers, causing distortions that compromise OOD performance.
Key takeaway: a larger change in parameters can distort pretrained features. How to retain information beyond the limited data used for adaptation?
Fine-Tuning without Distortion: Improving Robustness to ... - NeurIPS
Our analysis suggests that the easy two-step strategy of linear probing then full fine-tuning (LP-FT) improves pretrained features without distortion and leads to even higher accuracies.
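A minimal sketch of the two-step LP-FT recipe in PyTorch, assuming a torchvision ResNet-50 backbone and an existing `train_loader`; the backbone choice, learning rates, and epoch counts are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torchvision

def lp_ft(train_loader, num_classes, lp_epochs=10, ft_epochs=10,
          lp_lr=1e-2, ft_lr=1e-4, device="cuda"):
    # Pretrained backbone with a fresh linear head (illustrative choice).
    model = torchvision.models.resnet50(weights="IMAGENET1K_V2")
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    model.to(device)
    criterion = nn.CrossEntropyLoss()

    def run_epochs(optimizer, epochs):
        for _ in range(epochs):
            for x, y in train_loader:
                x, y = x.to(device), y.to(device)
                optimizer.zero_grad()
                criterion(model(x), y).backward()
                optimizer.step()

    # Step 1: linear probing -- freeze the backbone, train only the head.
    for p in model.parameters():
        p.requires_grad = False
    for p in model.fc.parameters():
        p.requires_grad = True
    run_epochs(torch.optim.SGD(model.fc.parameters(), lr=lp_lr, momentum=0.9),
               lp_epochs)

    # Step 2: full fine-tuning, initialized from the linear-probed head and
    # typically run with a much smaller learning rate.
    for p in model.parameters():
        p.requires_grad = True
    run_epochs(torch.optim.SGD(model.parameters(), lr=ft_lr, momentum=0.9),
               ft_epochs)
    return model
```

The point of the ordering is that step 2 starts from a head already aligned with the pretrained features, so the feature extractor is not dragged around to compensate for a random head.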
Fine-tuning only changes the features in the span of the training data, so when using "good" pretrained features, directions outside that span keep their pretrained values while the head adapts to the changed in-span features, creating a mismatch on OOD inputs. Even with an infinitesimally small learning rate, fine-tuning distorts pretrained features.
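A short sketch of why the feature updates stay in the span of the training data, assuming the simplified two-layer linear setting $f(x) = v^\top B x$ with squared loss (the symbols $B$, $v$, $x_i$, $y_i$ are notation chosen here for illustration):

```latex
L(v, B) = \tfrac{1}{2}\sum_{i=1}^{n}\bigl(v^\top B x_i - y_i\bigr)^2,
\qquad
\frac{\partial L}{\partial B} = \sum_{i=1}^{n}\bigl(v^\top B x_i - y_i\bigr)\, v\, x_i^\top .
```

Each term $v x_i^\top$ has its row space inside $\mathrm{span}\{x_1,\dots,x_n\}$, so gradient steps leave $Bu$ at its pretrained value for any direction $u$ orthogonal to the training data, while the head $v$ keeps changing; it is this head-feature mismatch that shows up on OOD inputs.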
Can we refine features without distorting them too much? LP-FT gives +10% OOD accuracy over fine-tuning! What to do when linear probing is not so good?
We show theoretically that this tradeoff between ID and OOD accuracy arises even in a simplified setting: fine-tuning overparameterized two-layer linear networks.
Fine-tune ViT-G/14 (pretrained on JFT-3B) many times with LP-FT using different hyperparameters, then average their weights with a greedy strategy (add a new model to the "soup" only if ID validation accuracy improves).
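A rough sketch of the greedy weight-averaging step in PyTorch, assuming a list of fine-tuned models that share an architecture and an `evaluate(model, loader)` accuracy helper (both are assumptions for illustration, not the original recipe's code):

```python
import copy

def greedy_soup(models, val_loader, evaluate):
    """models: fine-tuned copies of the same architecture, sorted by ID
    validation accuracy (best first). evaluate(model, loader) -> accuracy."""
    soup_state = copy.deepcopy(models[0].state_dict())
    n_ingredients = 1
    best_acc = evaluate(models[0], val_loader)
    probe = copy.deepcopy(models[0])  # scratch model for trying candidate soups

    for model in models[1:]:
        cand_state = {}
        for k, v in soup_state.items():
            w = model.state_dict()[k]
            if v.is_floating_point():
                # Running uniform average of all ingredients plus the candidate.
                cand_state[k] = (v * n_ingredients + w) / (n_ingredients + 1)
            else:
                cand_state[k] = v  # keep integer buffers (e.g. counters) as-is
        probe.load_state_dict(cand_state)
        acc = evaluate(probe, val_loader)
        # Keep the candidate if ID validation accuracy does not drop (whether
        # the rule is "no decrease" or strict improvement is a detail this
        # sketch does not pin down).
        if acc >= best_acc:
            soup_state, n_ingredients, best_acc = cand_state, n_ingredients + 1, acc

    probe.load_state_dict(soup_state)
    return probe
```

Averaging is plausible here because the individually fine-tuned models all start from the same pretrained weights, so their solutions tend to lie close enough together that interpolating them does not hurt accuracy.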