Chinese AI lab founded under High-Flyer Capital Management in Hangzhou. Releases open-weight frontier models alongside a free chat interface and a pay-per-token API that undercuts OpenAI frontier-model pricing.
. Both expose a 1M token context window with 384K maximum output, support thinking mode by default, and offer JSON output, tool calls, and chat prefix completion. V4-Pro is now listed at the adjusted quarter-price after the May 31 promo ended. The legacy deepseek-chat and deepseek-reasoner endpoint names still resolve to V4-Flash non-thinking and thinking modes respectively, and DeepSeek’s docs flag them for deprecation. Hugging Face now lists DeepSeek V4 releases alongside earlier V3, V3.2, and R1 model families, so self-hosting is a real evaluation path even though hardware, license, and deployment controls still decide fit.
Related coverage: AI Industry Roundup, April 24 tracked the DeepSeek V4 preview alongside frontier-coding model news, Anthropic capital pressure, and Cohere-Aleph Alpha. On April 27, DeepSeek cut V4-Pro pricing by 75% in a developer adoption push that ended May 31 before the same quarter-price became the adjusted public rate on June 1. The June 20 DeepSeek vs GitHub Copilot refresh now separates DeepSeek’s low-cost model/backend role from Copilot’s GitHub-native developer workflow, while DeepSeek vs Replit Agent separates DeepSeek as model infrastructure from Replit’s browser-native app-builder workflow.
System Verdict
Pick DeepSeek if API cost is the hard constraint and frontier-class reasoning is the requirement. V4-Flash bills $0.14 per million cache-miss input tokens, with cache-hit input at $0.0028 per million after the April 26, 2026 cache-hit cut. V4-Pro lists at $0.435 input cache-miss, $0.003625 input cache-hit, and $0.87 output after the June 1 price adjustment. That keeps DeepSeek dramatically below most Western frontier-model APIs for similar long-context reasoning envelopes.
Skip it if compliance, polish, or uptime SLAs matter. The Berlin Data Protection Authority flagged DeepSeek as non-compliant with GDPR in mid-2025, triggering DSA Article 16 notifications to Apple and Google. U.S. House Select Committee scrutiny over chip-export violations continues. EU enterprises under GDPR Article 44 and any U.S.-regulated industry should assume this is not deployable.
Who pays: Free for chat at chat.deepseek.com, API pay-per-token for developers. No consumer subscription. No enterprise SKU with SOC 2 or SLAs.
Key Facts
| Current flagship API models | DeepSeek-V4-Flash (default) and DeepSeek-V4-Pro (premium quarter-price after June 1 adjustment) |
| Reasoning | Thinking mode supported on both V4 models; DeepSeek-R1 (open-weight, January 2025) remains the standalone reasoning weights release |
| Context window | 1M tokens on V4-Flash and V4-Pro |
| Max output | 384K tokens |
| API endpoints | deepseek-v4-flash and deepseek-v4-pro. Legacy deepseek-chat (non-thinking) and deepseek-reasoner (thinking) names map to V4-Flash modes and are flagged for deprecation. |
| V4-Flash pricing (per MTok) | $0.14 cache-miss input · $0.0028 cache-hit input · $0.28 output |
| V4-Pro pricing (per MTok, quarter-price after June 1 adjustment) | $0.435 cache-miss input · $0.003625 cache-hit input · $0.87 output |
| Cache-hit policy | Input cache hit was reduced to one-tenth of launch pricing on April 26, 2026 |
| Features | JSON output, tool calls, chat prefix completion, thinking mode |
| New account credit | 5M free tokens on registration |
| Chat interface | Free · no declared usage cap · DeepThink + web search |
| Open weights | DeepSeek V4 releases plus V3, V3.2, and R1 model families on Hugging Face; self-hosting depends on model, license, hardware, and governance |
| Compliance posture | Chinese company · GDPR concerns flagged by Berlin DPA · U.S. export-control scrutiny ongoing |
Every data point above was verified on 2026-06-20 against DeepSeek API docs, the official V4 release note, and DeepSeek’s Hugging Face organization.
What it actually is
Two product layers on the same underlying models. A free chat interface at chat.deepseek.com runs DeepSeek’s current model with a DeepThink reasoning mode and web search. A pay-per-token API exposes deepseek-v4-flash and deepseek-v4-pro models, with the legacy deepseek-chat (non-thinking) and deepseek-reasoner (thinking) names still resolving to V4-Flash modes during a deprecation window.
Both V4 models support a 1M token context window with 384K maximum output, JSON output, tool calls, chat prefix completion, and thinking mode by default. V4-Pro’s May promotion expired on May 31, but DeepSeek says the official price was adjusted to one-quarter of the original price from June 1, keeping the public listed rate at $0.435/M cache-miss input and $0.87/M output. Cache-hit input pricing was cut to one-tenth of launch pricing on April 26, 2026.
DeepSeek publishes V4 and earlier model families on Hugging Face. Distilled R1 variants still run on consumer GPUs, while full V3/V4-class routes need serious memory, deployment review, and license checks. Do not treat “open weights” as a free production green light: data handling, export-control posture, hosting location, and model-serving cost still matter.
mean any well-funded lab can reproduce earlier-generation architecture. Defensibility sits in training-data curation and inference-cost engineering, not in the model itself.
When to pick DeepSeek
- You need frontier-class reasoning on a tight API budget. V4-Flash at $0.14/M cache-miss input is the cheapest credible rate from a model that can hold a 1M context window and run agentic workflows.
- You self-host. V3 family weights are public. Quantized distills run on single consumer GPUs via Ollama or LM Studio.
- You want repeatable-prompt workloads cheap. After the April 26, 2026 cut and June 1 V4-Pro adjustment, V4-Flash cache-hit input is $0.0028/M and V4-Pro cache-hit input is $0.003625/M.
- You’re benchmarking against an open-weight baseline. R1’s paper and weights remain the reference point for cost-efficient reasoning.
- You build in cost-sensitive markets. The cost gap versus most Western frontier-model APIs is the product.
When to pick something else
- Enterprise compliance, SOC 2, GDPR: ChatGPT or Claude. DeepSeek has open regulatory questions in EU and U.S. jurisdictions.
- Polished consumer chat: ChatGPT, Claude, or Gemini. DeepSeek’s web UI is functional but minimal.
- Long-context with stronger enterprise posture: Claude and Gemini remain safer procurement paths for teams that need Western enterprise controls more than DeepSeek’s lowest price.
- Open-weight with larger Western community: Llama or Qwen for alternate licensing and tooling.
- Uptime SLAs: Mistral or Anthropic offer contractual SLAs. DeepSeek does not.
Pricing
API pricing via api-docs.deepseek.com.
| Plan | Cache-miss input | Cache-hit input | Output | Who’s it for |
|---|---|---|---|---|
| Chat (Free) | $0 | $0 | $0 | Any user · DeepThink + web search · no declared cap |
deepseek-v4-flash | $0.14/M | $0.0028/M | $0.28/M | Default API workload (replaces legacy deepseek-chat and deepseek-reasoner) |
deepseek-v4-pro (quarter-price after June 1 adjustment) | $0.435/M | $0.003625/M | $0.87/M | Premium reasoning runs and large agentic workloads |
| New account credit | 5M free tokens | n/a | n/a | One-time grant on registration |
Prices verified 2026-06-20 via DeepSeek API pricing. V4-Pro pricing now reflects DeepSeek’s June 1 adjustment to one-quarter of the original price after the May promotion ended. Cache-hit input was cut to one-tenth of launch pricing on April 26, 2026. Thinking mode is supported by default on both V4 models and generates additional reasoning tokens, so effective cost per task is higher than the raw input figure suggests.
Against the alternatives
| DeepSeek V4-Flash | OpenAI frontier models | Claude flagship APIs | |
|---|---|---|---|
| Input price (per M tokens) | $0.14 | ~$2.50 | $5 |
| Output price (per M tokens) | $0.28 | ~$10 | $25 |
| Context window | 1M | Undisclosed | 1M |
| Open weights | Yes, including current V4 releases and earlier V3/R1 families on Hugging Face | No | No |
| Self-hostable | Yes, subject to model, hardware, license, and governance review | No | No |
| SOC 2 / GDPR posture | Open questions | Yes | Yes |
| Consumer polish | Functional | Strongest ecosystem | Strongest reasoning |
| Best viewed as | Cost-optimized API baseline | Generalist default | Reasoning specialist |
Failure modes
- Do not treat the quarter-price as immutable. DeepSeek converted the May V4-Pro discount into the June 1 adjusted price, but model naming and rates still move quickly. Re-check the official pricing docs before locking annual budgets or hardcoding endpoint economics.
- Legacy endpoint deprecation.
deepseek-chatanddeepseek-reasonerstill resolve to V4-Flash modes but are flagged for deprecation. Production code should migrate todeepseek-v4-flashanddeepseek-v4-proto avoid surprise breakage. - Open weights do not remove deployment risk. V4 and earlier model releases are visible on Hugging Face, but teams still need to verify license terms, memory cost, serving stack, data residency, model updates, and procurement posture before self-hosting.
- No R2 announced. R1 (January 2025) remains the standalone open-weight reasoning product. Capability-wise it still holds, but frontier coding agents have moved past R1 for high-autonomy software work.
- Regulatory posture is hostile in EU and U.S. Berlin DPA has flagged the service as non-compliant with GDPR. House Select Committee reports cite export-control violations. Banking, healthcare, government, and most EU enterprise workloads cannot deploy this.
- No SLA or uptime guarantee. The service hit heavy rate-limiting during the January 2025 R1 launch spike. Stability has improved but is not contractually backed.
- Chat UI is minimal. No Projects, no Canvas, no GPT Store equivalent. DeepThink reasoning is visible but the surrounding product is utilitarian.
- Thin moat. Open-weight releases let any lab reproduce or fine-tune earlier-generation architecture. Qwen, Llama, and Mistral Small 4 compete directly on cost-per-capability.
- Thinking-mode output tokens multiply cost. Both V4 models default to thinking mode, which generates additional reasoning tokens. Effective cost per completed task is meaningfully higher than the raw input rate suggests, especially on V4-Pro.
- Data residency is China. Chat conversations and API calls route through Chinese infrastructure. Even outside regulated industries, this is a disclosure burden.
Methodology
This page was produced by the aipedia.wiki editorial pipeline, an automated system that ingests vendor documentation, verifies pricing and model details against primary sources, and generates the editorial analysis you are reading. No individual human wrote this review. Scoring follows the four-dimension rubric at /about/scoring/ (Utility x Value x Moat x Longevity, unweighted average). Last verified 2026-06-20 against DeepSeek API docs, the DeepSeek V4 release note, DeepSeek Hugging Face, the DeepSeek-R1 paper, chat.deepseek.com, V4 preview coverage, and DeepSeek V4-Pro price-cut coverage.
FAQ
Is DeepSeek free? Yes. The chat interface at chat.deepseek.com is free with no declared usage cap and includes DeepThink reasoning and web search. The API is pay-per-token; new accounts get 5M free tokens on registration.
Is DeepSeek V4 out?
Yes. As of June 20, 2026, V4-Flash and V4-Pro are the documented production models on the API pricing page, both with a 1M context window and 384K maximum output. V4-Pro is listed at the adjusted quarter-price after the May promotion ended. The legacy deepseek-chat and deepseek-reasoner endpoint names still resolve to V4-Flash modes but are flagged for deprecation.
How does DeepSeek R1 compare to OpenAI o1? At launch (January 2025), R1 matched o1 on AIME 2024 (79.8% vs 79.2%) and MATH-500 (97.3% vs 96.4%). R1 is open-weight and free via chat. Its role today is as the open-weight reasoning baseline, with V4-Flash and V4-Pro handling production thinking-mode workloads on the API.
Can I run DeepSeek locally? Yes, but not casually. DeepSeek’s Hugging Face organization now lists V4 releases as well as V3, V3.2, R1, and other earlier families. Distilled R1 variants can run on consumer GPUs via Ollama or LM Studio, but full V3/V4-class deployments need serious hardware, license review, security review, and model-serving operations.
Is DeepSeek safe for enterprise use? For regulated industries, no. The Berlin DPA flagged the app as non-compliant with GDPR. U.S. House Select Committee reports cite export-control violations. Banking, healthcare, government, and EU data workloads should not use the hosted API. Self-hosting the open weights avoids the data-transfer issue but does not change origin or export-control questions.
Sources
- DeepSeek API Pricing Docs: current per-token rates, V4-Flash and V4-Pro models, and cache discounts verified 2026-06-20
- DeepSeek V4 release note: V4-Flash, V4-Pro, 1M context, API availability, open weights, and legacy endpoint deprecation verified 2026-06-20
- DeepSeek API changelog: release sequence and current V4-era model history verified 2026-06-20
- DeepSeek-R1 ArXiv paper: benchmark results and architecture
- chat.deepseek.com: consumer chat interface
- DeepSeek V4 preview coverage: April 24, 2026 preview release note
- DeepSeek V4-Pro price-cut coverage: April 27, 2026 temporary discount and cache-hit pricing signal
- DeepSeek Hugging Face organization: open-weight model releases verified 2026-06-20
- DeepSeek Wikipedia: company and regulatory background
Related
- Category: AI Chatbots · AI Coding
- Comparisons: DeepSeek vs Mistral AI · DeepSeek vs Qwen