Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent

William Merrill, Vivek Ramanujan, Yoav Goldberg, Roy Schwartz, Noah A. Smith

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

9 Scopus citations
