  1. Tokenization

    Byte-pair Encoding: https://arxiv.org/pdf/1508.07909

    Byte Latent Transformer: https://arxiv.org/pdf/2412.09871

  2. Vectorization

    BERT: https://arxiv.org/pdf/1810.04805

    ImageBind: https://arxiv.org/pdf/2305.05665

    SONAR: https://arxiv.org/pdf/2308.11466

    FAISS Library: https://arxiv.org/pdf/2401.08281

    Facebook Large Concept Models: https://arxiv.org/pdf/2412.08821v2

  3. Infrastructure

    TensorFlow: https://arxiv.org/pdf/1605.08695

    DeepSeek Fire-Flyer File System (3FS): https://github.com/deepseek-ai/3FS/blob/main/docs/design_notes.md

    Milvus DB: https://www.cs.purdue.edu/homes/csjgwang/pubs/SIGMOD21_Milvus.pdf

    FAISS (Billion-Scale Similarity Search with GPUs): https://arxiv.org/pdf/1702.08734

    Ray: https://arxiv.org/pdf/1712.05889

  4. Core Architecture

    Attention Is All You Need: https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

    FlashAttention: https://arxiv.org/pdf/2205.14135

    Multi-Query Attention: https://arxiv.org/pdf/1911.02150

    Grouped-Query Attention: https://arxiv.org/pdf/2305.13245

  5. Mixture of Experts

    Sparsely-Gated MoE Layer: https://arxiv.org/pdf/1701.06538

    GShard: https://arxiv.org/pdf/2006.16668

    Switch Transformers: https://arxiv.org/pdf/2101.03961

  6. RLHF

    Deep RL from Human Preferences: https://arxiv.org/pdf/1706.03741

    Fine-Tuning LMs from Human Preferences: https://arxiv.org/pdf/1909.08593

    Training LMs to Follow Instructions with RLHF (InstructGPT): https://arxiv.org/pdf/2203.02155

  7. Chain of Thought

    CoT Prompting: https://arxiv.org/pdf/2201.11903

    Marco-o1, Open Reasoning Models with CoT (Alibaba): https://arxiv.org/pdf/2411.14405v1

    Demystifying Long CoT: https://arxiv.org/pdf/2502.03373

  8. Reasoning

    Transformer Reasoning: https://arxiv.org/pdf/2405.18512

    Scaling Inference with Repeated Sampling: https://arxiv.org/pdf/2407.21787

    Scaling Test-Time Compute > Scaling Parameters: https://arxiv.org/pdf/2408.03314

    DeepSeek-R1: https://arxiv.org/pdf/2501.12948v1

  9. Optimizations

    1-bit LLMs (1.58 Bits): https://arxiv.org/pdf/2402.17764

    Inference-Time Scaling for Diffusion Models: https://arxiv.org/pdf/2501.09732

    1B LLM > 405B LLM (Compute-Optimal Test-Time Scaling): https://arxiv.org/pdf/2502.06703

    Speculative Decoding: https://arxiv.org/pdf/2211.17192

  10. Case Studies

    Automated Unit Test Improvement at Meta: https://arxiv.org/pdf/2402.09171

    RAG + Knowledge Graphs: https://arxiv.org/pdf/2404.17723v1

    OpenAI o1 System Card: https://arxiv.org/pdf/2412.16720

    Bug Catchers via LLMs: https://arxiv.org/pdf/2501.12862

    Chain-of-Retrieval RAG: https://arxiv.org/pdf/2501.14342

    Swiggy Search: https://bytes.swiggy.com/improving-search-relevance-in-hyperlocal-food-delivery-using-small-language-models-ecda2acc24e6

    Netflix Foundation Models: https://netflixtechblog.com/foundation-model-for-personalized-recommendation-1a0bd8e02d39

    Model Context Protocol: https://www.anthropic.com/news/model-context-protocol

    Uber QueryGPT: https://www.uber.com/en-IN/blog/query-gpt/