
DeepSeek-Prover-V2-671B
DeepSeek R1
DeepSeek V3
Janus Pro
Coder V2
OpenThinker
FlashMLA
DeepSeek AI is a Beijing-and-Hangzhou-based R&D firm founded by Liang Wenfeng in July 2023. Working with a small, multidisciplinary team and a stockpile of Nvidia A100 and H800 GPUs acquired before U.S. export restrictions tightened, DeepSeek released open-weight LLMs under an MIT license, widening access to cost-effective generative AI worldwide.
2. How Does DeepSeek Save You Millions?

2.1 Inference-Time Computing
Activates only the most relevant neuron clusters per query, reducing compute cycles by up to 90%.
Drives inference costs below $0.001 per request via dynamic weight activation.

2.2 Domain-Specific Fine-Tuning
Pre-train on large multilingual corpora, then fine-tune on industry datasets, minimizing over-parameterization and maximizing accuracy on specialized tasks.
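To make the selective activation described in 2.1 concrete, here is a minimal sketch of top-k routing, the general technique behind sparse activation. It is an illustration under stated assumptions, not DeepSeek's actual implementation: the function names (`route_top_k`, `sparse_forward`), the eight toy experts, and the gate scores are all hypothetical.

```python
import math

def softmax(scores):
    """Convert raw gate scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def sparse_forward(x, experts, gate_scores, k=2):
    """Evaluate only the selected experts; the others are never called."""
    routes = route_top_k(gate_scores, k)
    output = sum(weight * experts[i](x) for i, weight in routes)
    return output, routes

# Hypothetical toy setup: 8 "experts", each just scales its input by 1..8.
experts = [lambda x, scale=scale: scale * x for scale in range(1, 9)]
# Hypothetical gate scores for one query (a real router would compute these).
gate_scores = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

output, routes = sparse_forward(2.0, experts, gate_scores, k=2)
active_fraction = len(routes) / len(experts)
print(f"experts used: {[i for i, _ in routes]}, active fraction: {active_fraction:.0%}")
```

In this toy case only 2 of 8 experts run per query, so 75% of the expert compute is skipped. The saving in a real model depends on how many experts exist and how many are routed per token.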
3. Training Innovations: RL & Reward Engineering
Reinforcement Learning: Rule-based reward models for logical reasoning and math benchmarks (AIME, Putnam).
Reward Engineering: Hybrid rule-based and model-based rewards to align chain-of-thought with final answers.
Distillation: Compressing 671B-parameter capabilities into 1.5B–7B-parameter distilled models for edge deployment.
Emergent Behavior Networks: Synthetic expert-model data to spur natural reasoning patterns without manual prompt engineering.

4. Architecture Highlights: MoE & MLA
Mixture-of-Experts Layers (MoE): Shared and routed experts balance capacity and minimize wasted compute.
Multi-Head Latent Attention (MLA): Extends context windows to 128K tokens with low overhead.
K-V Caching: Caches the key/value pairs of earlier tokens to avoid recomputation and boost throughput.
Mixed-Precision: 8-bit and custom 12-bit floats reduce memory without sacrificing accuracy.

5. What Market Impact Has DeepSeek Caused?
Nvidia & ASML: Shares fell on fears of reduced demand for high-end GPUs and EUV tools.
Energy Sector: Stocks dipped amid speculation that energy-efficient inference will lower data-center power bills.
App Store: DeepSeek’s mobile chatbot surged to #1 in “Productivity” within 48 hours, dethroning ChatGPT.

6. DeepSeek vs. GPT-4 & Google Gemini
Provider | Training Cost | Cost/Query | Active Params
DeepSeek R1/V3 | $6 M