Research papers for AI Engineering.

Tokenization

Byte-pair Encoding: https://arxiv.org/pdf/1508.07909

Byte Latent Transformer: https://arxiv.org/pdf/2412.09871

Vectorization

BERT: https://arxiv.org/pdf/1810.04805

IMAGEBIND: https://arxiv.org/pdf/2305.05665

SONAR: https://arxiv.org/pdf/2308.11466

FAISS Library: https://arxiv.org/pdf/2401.08281

Facebook Large Concept Models: https://arxiv.org/pdf/2412.08821v2

Infrastructure

TensorFlow: https://arxiv.org/pdf/1605.08695

DeepSeek Filesystem (3FS): https://github.com/deepseek-ai/3FS/blob/main/docs/design_notes.md

Milvus DB: https://www.cs.purdue.edu/homes/csjgwang/pubs/SIGMOD21_Milvus.pdf

FAISS (Billion-Scale Similarity Search with GPUs): https://arxiv.org/pdf/1702.08734

Ray: https://arxiv.org/pdf/1712.05889

Core Architecture

Attention is All You Need: https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

FlashAttention: https://arxiv.org/pdf/2205.14135

Multi Query Attention: https://arxiv.org/pdf/1911.02150

Grouped Query Attention: https://arxiv.org/pdf/2305.13245

Mixture of Experts

Sparsely-Gated MoE Layer: https://arxiv.org/pdf/1701.06538

GShard: https://arxiv.org/pdf/2006.16668

Switch Transformers: https://arxiv.org/pdf/2101.03961

RLHF

Deep RL with Human Feedback: https://arxiv.org/pdf/1706.03741

Fine-Tuning LMs with RLHF: https://arxiv.org/pdf/1909.08593

Training LMs with RLHF: https://arxiv.org/pdf/2203.02155

Chain of Thought

CoT Prompting: https://arxiv.org/pdf/2201.11903

Chain of Thought (Alibaba): https://arxiv.org/pdf/2411.14405v1

Demystifying Long CoT: https://arxiv.org/pdf/2502.03373

Reasoning

Transformer Reasoning: https://arxiv.org/pdf/2405.18512

Scaling Inference with Repeated Sampling: https://arxiv.org/pdf/2407.21787

Scale Test Time > Parameters: https://arxiv.org/pdf/2408.03314

DeepSeek R1: https://arxiv.org/pdf/2501.12948v1

Optimizations

1-bit LLMs (1.58 Bits): https://arxiv.org/pdf/2402.17764

Inference-Time Scaling for Diffusion Models: https://arxiv.org/pdf/2501.09732

1B > 405B: https://arxiv.org/pdf/2502.06703

Speculative Decoding: https://arxiv.org/pdf/2211.17192

Case Studies

Unit Test Improvement @Meta: https://arxiv.org/pdf/2402.09171

RAG + Knowledge Graphs: https://arxiv.org/pdf/2404.17723v1

OpenAI o1 System Card: https://arxiv.org/pdf/2412.16720

Bug Catchers via LLMs: https://arxiv.org/pdf/2501.12862

Chain-of-Retrieval RAG: https://arxiv.org/pdf/2501.14342

Swiggy Search: https://bytes.swiggy.com/improving-search-relevance-in-hyperlocal-food-delivery-using-small-language-models-ecda2acc24e6

Netflix Foundation Models: https://netflixtechblog.com/foundation-model-for-personalized-recommendation-1a0bd8e02d39

Model Context Protocol: https://www.anthropic.com/news/model-context-protocol

Uber QueryGPT: https://www.uber.com/en-IN/blog/query-gpt/