Gaurav Tadkapally

large language models

gpt-2 + pre-training: multi-head attn · layer-norm
llama-2: rope embeddings · rms-norm · silu
llama-3 and llama-3.1: rope embeddings · group query attn

machine learning