Tokenization
Byte-pair Encoding: https://arxiv.org/pdf/1508.07909
Byte Latent Transformer: https://arxiv.org/pdf/2412.09871
Vectorization
BERT: https://arxiv.org/pdf/1810.04805
ImageBind: https://arxiv.org/pdf/2305.05665
SONAR: https://arxiv.org/pdf/2308.11466
FAISS Library: https://arxiv.org/pdf/2401.08281
Facebook Large Concept Models: https://arxiv.org/pdf/2412.08821v2
Infrastructure
TensorFlow: https://arxiv.org/pdf/1605.08695
DeepSeek Filesystem (3FS): https://github.com/deepseek-ai/3FS/blob/main/docs/design_notes.md
Milvus DB: https://www.cs.purdue.edu/homes/csjgwang/pubs/SIGMOD21_Milvus.pdf
Core Architecture
Attention is All You Need: https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
FlashAttention: https://arxiv.org/pdf/2205.14135
Multi Query Attention: https://arxiv.org/pdf/1911.02150
Grouped Query Attention: https://arxiv.org/pdf/2305.13245
Mixture of Experts
Sparsely-Gated MoE Layer: https://arxiv.org/pdf/1701.06538
GShard: https://arxiv.org/pdf/2006.16668
Switch Transformers: https://arxiv.org/pdf/2101.03961
RLHF
Deep RL with Human Feedback: https://arxiv.org/pdf/1706.03741
Fine-Tuning LMs with RLHF: https://arxiv.org/pdf/1909.08593
Training LMs with RLHF: https://arxiv.org/pdf/2203.02155
Chain of Thought
CoT Prompting: https://arxiv.org/pdf/2201.11903
Chain of Thought (Alibaba): https://arxiv.org/pdf/2411.14405v1
Demystifying Long CoT: https://arxiv.org/pdf/2502.03373
Reasoning
Transformer Reasoning: https://arxiv.org/pdf/2405.18512
Scaling Inference with Repeated Sampling: https://arxiv.org/pdf/2407.21787
Scale Test Time > Parameters: https://arxiv.org/pdf/2408.03314
DeepSeek R1: https://arxiv.org/pdf/2501.12948v1
Optimizations
1-bit LLMs (1.58 Bits): https://arxiv.org/pdf/2402.17764
Inference-Time Scaling for Diffusion Models: https://arxiv.org/pdf/2501.09732
1B > 405B: https://arxiv.org/pdf/2502.06703
Speculative Decoding: https://arxiv.org/pdf/2211.17192
Case Studies
Unit Test Improvement @Meta: https://arxiv.org/pdf/2402.09171
RAG + Knowledge Graphs: https://arxiv.org/pdf/2404.17723v1
OpenAI o1 System Card: https://arxiv.org/pdf/2412.16720
Bug Catchers via LLMs: https://arxiv.org/pdf/2501.12862
Chain-of-Retrieval RAG: https://arxiv.org/pdf/2501.14342
Swiggy Search: https://bytes.swiggy.com/improving-search-relevance-in-hyperlocal-food-delivery-using-small-language-models-ecda2acc24e6
Netflix Foundation Models: https://netflixtechblog.com/foundation-model-for-personalized-recommendation-1a0bd8e02d39
Model Context Protocol: https://www.anthropic.com/news/model-context-protocol
Uber QueryGPT: https://www.uber.com/en-IN/blog/query-gpt/