Modal is a serverless cloud platform for Python applications, AI jobs, GPU workloads, web endpoints, scheduled tasks, and sandboxes. It removes much of the container, queue, and Kubernetes work that normally sits between a notebook and a production AI service.
The useful mental model: write Python, decorate functions, attach CPU/GPU/memory requirements, and deploy. Modal handles image builds, scale-out, secrets, queues, logs, and web endpoints.
Recent developments
- April 30, 2026: RunPod Flash went GA with a Python-to-GPU-endpoint workflow that skips container work. Modal still has the more mature serverless Python platform in this catalog, but RunPod is now making a direct developer-experience push.
System Verdict
Pick Modal if you want AI infrastructure without becoming an infra team. It is particularly good for spiky inference, batch processing, embeddings, media jobs, and internal tools.
Skip it if your workload is steady 24/7. Always-on GPU fleets can be cheaper through reserved cloud instances or dedicated providers.
Modal’s moat is developer experience. It makes production-grade compute feel like an extension of Python code.
Key Facts
| Item | Details |
| --- | --- |
| Core product | Serverless Python and GPU cloud |
| Workloads | Functions, batch jobs, queues, web endpoints, sandboxes |
| GPU pricing | Per-second billing by class (B200 $0.001736/sec, H200 $0.001261/sec, H100 $0.001097/sec, RTX PRO 6000 $0.000842/sec, A100 80GB $0.000694/sec, A100 40GB $0.000583/sec, L40S $0.000542/sec, A10 $0.000306/sec, L4 $0.000222/sec, T4 $0.000164/sec) |
| CPU / memory | $0.0000131 per core/sec, $0.00000222 per GiB/sec |
| Starter | $0/mo with $30/mo compute credits, 100 containers, 10 GPU concurrency |
| Team | $250/mo workspace plan with $100/mo compute credits, 1,000 containers, 50 GPU concurrency |
| Enterprise | Custom plan with higher concurrency, support, audit logs, SSO, and HIPAA compatibility |
| Surcharges | Region selection 1.5x to 1.75x base; non-preemptible 3x base |
| GPU routing note | Modal docs say gpu="B200+" can run on B200 or B300 and is billed as B200, but only use it if the workload is compatible with both GPU types |
| Best fit | AI apps, pipelines, inference, internal tools |
| Alternatives | RunPod, Lambda Labs, AWS Batch, Kubernetes, Together AI |
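
Per-second rates are awkward to compare against the hourly quotes most GPU providers publish. As a minimal sketch, the snippet below converts the GPU rates from the table above into base hourly figures. The dictionary and helper names are illustrative only and are not part of any Modal API.

```python
# Per-second GPU rates copied from the Key Facts table (USD per second).
GPU_RATE_PER_SEC = {
    "B200": 0.001736,
    "H200": 0.001261,
    "H100": 0.001097,
    "RTX PRO 6000": 0.000842,
    "A100 80GB": 0.000694,
    "A100 40GB": 0.000583,
    "L40S": 0.000542,
    "A10": 0.000306,
    "L4": 0.000222,
    "T4": 0.000164,
}

SECONDS_PER_HOUR = 3600


def hourly_rate(gpu: str) -> float:
    """Return the base hourly cost in USD for one GPU of the given class."""
    return GPU_RATE_PER_SEC[gpu] * SECONDS_PER_HOUR


if __name__ == "__main__":
    # Print every class as an hourly figure for side-by-side comparison.
    for gpu in GPU_RATE_PER_SEC:
        print(f"{gpu:<13} ${hourly_rate(gpu):.4f}/hr")
```

At these rates an H100 works out to roughly $3.95 per hour and a T4 to about $0.59 per hour, before any region or non-preemptible surcharge.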
When to pick Modal
- You have spiky GPU demand. Pay for active compute rather than idle GPU hours.
- You build in Python. Modal is optimized for Python-first teams.
- You need jobs and endpoints together. Batch processing and web APIs can share code and secrets.
- You want clean deployment ergonomics. Less YAML, fewer container chores, faster iteration.
- You are prototyping AI infrastructure. It is easier to start than assembling cloud primitives.
When to pick something else
- Steady GPU occupancy: Reserved cloud GPUs, Lambda Labs, or RunPod may be cheaper.
- Open-model inference APIs: Together AI or Fireworks AI.
- Media model APIs: Fal.ai or Replicate.
- Full platform control: Kubernetes on AWS, GCP, Azure, or your own cluster.
Pricing
Modal bills compute by actual resource usage. GPU prices are listed per second by class, with B200 at $0.001736/sec, H200 at $0.001261/sec, H100 at $0.001097/sec, RTX PRO 6000 at $0.000842/sec, A100 80GB at $0.000694/sec, A100 40GB at $0.000583/sec, L40S at $0.000542/sec, A10 at $0.000306/sec, L4 at $0.000222/sec, and T4 at $0.000164/sec. CPU is $0.0000131 per core per second and memory $0.00000222 per GiB per second. Volumes, sandboxes, and notebooks have separate meters.
The Starter plan is $0/mo with $30/mo in compute credits, 100 containers, and 10 GPU concurrency. Team is $250/mo plus compute, includes $100/mo in compute credits, and lifts caps to 1,000 containers and 50 GPU concurrency.
This is attractive for bursty jobs. For constant GPU load, compare against reserved instances before committing.
The region and non-preemptible surcharges matter most for customer-facing latency and workloads that cannot tolerate interruption.
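To make the meters concrete, here is a rough estimator built from the rates quoted in this section. It rests on two assumptions that should be checked against Modal's pricing page: that a surcharge multiplier applies to the whole per-second rate, and that plan credits simply offset usage before anything is billed. All function and parameter names are hypothetical.

```python
# Rates from the pricing section (USD). Only two GPU classes are listed for brevity.
GPU_RATE_PER_SEC = {"H100": 0.001097, "L4": 0.000222}
CPU_RATE_PER_CORE_SEC = 0.0000131
MEM_RATE_PER_GIB_SEC = 0.00000222


def job_cost(seconds, gpu=None, gpus=1, cores=0.0, mem_gib=0.0, multiplier=1.0):
    """Estimate the USD cost of one container running for `seconds`.

    `multiplier` is a single surcharge factor, for example 3.0 for
    non-preemptible execution. Applying it to the combined GPU, CPU, and
    memory rate is an ASSUMPTION, not documented behavior.
    """
    per_sec = cores * CPU_RATE_PER_CORE_SEC + mem_gib * MEM_RATE_PER_GIB_SEC
    if gpu is not None:
        per_sec += gpus * GPU_RATE_PER_SEC[gpu]
    return seconds * per_sec * multiplier


def monthly_bill(usage_usd, plan_fee, credits):
    """Plan fee plus any usage beyond the included credits (ASSUMED offset model)."""
    return plan_fee + max(0.0, usage_usd - credits)


if __name__ == "__main__":
    # Ten minutes on one H100 with 4 cores and 16 GiB of memory.
    print(f"base job:        ${job_cost(600, gpu='H100', cores=4, mem_gib=16):.4f}")
    print(f"non-preemptible: ${job_cost(600, gpu='H100', cores=4, mem_gib=16, multiplier=3.0):.4f}")
    # $120 of monthly usage on Starter versus Team.
    print(f"Starter: ${monthly_bill(120, plan_fee=0, credits=30):.2f}")
    print(f"Team:    ${monthly_bill(120, plan_fee=250, credits=100):.2f}")
```

Under these assumptions the ten-minute H100 job costs about $0.71 at base rates, and $120 of monthly usage comes to $90 on Starter versus $270 on Team, so the Team plan is justified by its higher caps rather than by its credits.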
Evaluation checklist
Before committing a production workload to Modal, test:
- Cold start time, image build time, and model load time.
- Whether the workload is bursty enough to benefit from serverless billing.
- GPU memory requirements by model and batch size.
- Whether B200+ routing is acceptable for the code path, since Modal can route compatible workloads to B200 or B300 while billing as B200.
- Queue behavior under peak traffic.
- Region requirements and whether region multipliers change the economics.
- Whether non-preemptible execution is required.
- Logging, alerting, secrets, rollbacks, and cost tags.
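The first checklist item can be measured with a small timing harness. The sketch below is platform-agnostic: `call` is a hypothetical zero-argument function that invokes your deployed endpoint or job and blocks until it returns. The first call only reflects a cold start if the service had scaled to zero beforehand.

```python
import statistics
import time
from typing import Callable, Dict


def measure_cold_vs_warm(call: Callable[[], object], warm_runs: int = 5) -> Dict[str, float]:
    """Time the first invocation separately from the warm ones that follow.

    `call` is a hypothetical stand-in for whatever triggers the workload.
    """
    if warm_runs < 1:
        raise ValueError("warm_runs must be at least 1")

    def timed() -> float:
        start = time.perf_counter()
        call()
        return time.perf_counter() - start

    first = timed()  # includes any cold start, image pull, and model load
    warm_median = statistics.median(timed() for _ in range(warm_runs))
    return {
        "first_call_s": first,
        "warm_median_s": warm_median,
        "cold_overhead_s": first - warm_median,
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a real request to the deployed service.
    print(measure_cold_vs_warm(lambda: time.sleep(0.01), warm_runs=3))
```

Run it once after the service has been idle and again under load; the gap between the two `cold_overhead_s` values indicates how much of user-facing latency comes from scale-from-zero.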
Buyer fit
Modal is strongest for Python-heavy teams that want to ship infrastructure as code without building a platform team. It fits evaluation jobs, embeddings, video and image processing, internal tools, scheduled tasks, custom inference endpoints, and workloads that scale from zero to many containers.
It is weaker for organizations that already have a mature Kubernetes platform, need deep network control, or run steady GPUs around the clock. In those cases, the developer experience may still be excellent, but the cost comparison needs to include reserved capacity and existing infrastructure staff.
Failure Modes
- Serverless is not magic for every workload. Cold starts, image builds, and large model loads still matter.
- Always-on can get expensive. Modal shines when utilization is uneven.
- Python-first bias. Great for Python teams, less natural for polyglot app stacks.
- Cloud abstraction limits. If you need low-level network or cluster control, you may hit boundaries.
- Cost needs tags and alerts. Per-second pricing is transparent, but runaway jobs are still runaway jobs.
- Pricing multipliers matter. Region selection and non-preemptible execution can materially change production cost.
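One simple guard against runaway jobs is to turn a dollar budget into a maximum runtime and enforce it as a timeout. The following sketch does only the arithmetic, using the H100 rate from the Key Facts table. The function name is hypothetical, and how the resulting number is enforced depends on your deployment.

```python
import math

H100_RATE_PER_SEC = 0.001097  # USD per second, from the Key Facts table


def max_runtime_seconds(budget_usd: float, rate_per_sec: float,
                        gpus: int = 1, multiplier: float = 1.0) -> int:
    """Longest whole-second runtime that stays within `budget_usd`.

    `multiplier` is an optional surcharge factor such as 3.0 for
    non-preemptible execution. CPU and memory meters are ignored here,
    so treat the result as an upper bound.
    """
    if budget_usd < 0 or rate_per_sec <= 0 or gpus < 1 or multiplier <= 0:
        raise ValueError("budget must be >= 0; rate, gpus, and multiplier must be positive")
    return math.floor(budget_usd / (rate_per_sec * gpus * multiplier))


if __name__ == "__main__":
    print(max_runtime_seconds(10.0, H100_RATE_PER_SEC))                  # single H100, base rate
    print(max_runtime_seconds(10.0, H100_RATE_PER_SEC, multiplier=3.0))  # non-preemptible
```

A $10 budget buys about two and a half hours on a single H100 at the base rate, and roughly a third of that with the 3x non-preemptible surcharge.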
Methodology
Last verified 2026-06-12 against Modal’s pricing page and product documentation, with GPU per-second rates, container caps, surcharge multipliers, and B200+/B300 routing guidance confirmed for Starter and Team tiers. Scoring emphasizes developer experience, fit for AI workloads, GPU flexibility, and cost risk.
FAQ
Is Modal only for AI? No. It runs general Python serverless workloads, but AI and GPU use cases are a major fit.
Does Modal support GPUs? Yes. GPU tasks are priced per second by GPU type.
Is Modal cheaper than cloud GPUs? For spiky workloads, often. For steady 24/7 load, reserved cloud or dedicated GPU providers may be cheaper.
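The spiky-versus-steady answer can be reduced to a break-even utilization. The sketch below compares Modal's per-second H100 rate with an always-on hourly price. The reserved price is a hypothetical placeholder, not a quote from any provider, so substitute a real figure before drawing conclusions.

```python
SECONDS_PER_HOUR = 3600


def break_even_utilization(serverless_rate_per_sec: float, reserved_hourly_usd: float) -> float:
    """Fraction of each hour a GPU must be busy before reserved capacity is cheaper.

    Below this utilization, per-second billing costs less; above it, the
    always-on GPU does. The result is capped at 1.0.
    """
    if serverless_rate_per_sec <= 0:
        raise ValueError("serverless_rate_per_sec must be positive")
    serverless_hourly = serverless_rate_per_sec * SECONDS_PER_HOUR
    return min(1.0, reserved_hourly_usd / serverless_hourly)


# H100 per-second rate from the pricing table.
H100_RATE_PER_SEC = 0.001097
# HYPOTHETICAL reserved price used only to illustrate the arithmetic.
HYPOTHETICAL_RESERVED_H100_HOURLY = 2.00

if __name__ == "__main__":
    share = break_even_utilization(H100_RATE_PER_SEC, HYPOTHETICAL_RESERVED_H100_HOURLY)
    print(f"break-even utilization: {share:.0%}")  # 51% with the placeholder price
```

With the placeholder price, a GPU that is busy less than about half the time is cheaper on per-second billing. The comparison ignores surcharges, CPU and memory meters, and the staff cost of running reserved capacity.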
Related
- Category: AI Infrastructure · AI Coding
- See also: Together AI · Fal.ai · Replicate · Fireworks AI · Groq