modal-serverless-gpu
davila7/claude-code-templates · updated Apr 8, 2026
Comprehensive guide to running ML workloads on Modal's serverless GPU cloud platform.
Modal Serverless GPU
When to use Modal
Use Modal when:
- Running GPU-intensive ML workloads without managing infrastructure
- Deploying ML models as auto-scaling APIs
- Running batch processing jobs (training, inference, data processing)
- Needing pay-per-second GPU pricing without idle costs
- Prototyping ML applications quickly
- Running scheduled jobs (cron-like workloads)
Key features:
- Serverless GPUs: T4, L4, A10G, L40S, A100, H100, H200, B200 on-demand
- Python-native: Define infrastructure in Python code, no YAML
- Auto-scaling: Scale to zero, scale to 100+ GPUs instantly
- Sub-second cold starts: Rust-based infrastructure for fast container launches
- Container caching: Image layers cached for rapid iteration
- Web endpoints: Deploy functions as REST APIs with zero-downtime updates
Use alternatives instead:
- RunPod: For longer-running pods with persistent state
- Lambda Labs: For reserved GPU instances
- SkyPilot: For multi-cloud orchestration and cost optimization
- Kubernetes: For complex multi-service architectures
Quick start
Installation
pip install modal
modal setup # Opens browser for authentication
Hello World with GPU
import modal

app = modal.App("hello-gpu")

@app.function(gpu="T4")
def gpu_info():
    import subprocess
    return subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout

@app.local_entrypoint()
def main():
    print(gpu_info.remote())
Run: modal run hello_gpu.py
Basic inference endpoint
import modal

app = modal.App("text-generation")
image = modal.Image.debian_slim().pip_install("transformers", "torch", "accelerate")

@app.cls(gpu="A10G", image=image)
class TextGenerator:
    @modal.enter()
    def load_model(self):
        # Runs once when the container starts, so the model is loaded only once
        from transformers import pipeline
        self.pipe = pipeline("text-generation", model="gpt2", device=0)

    @modal.method()
    def generate(self, prompt: str) -> str:
        return self.pipe(prompt, max_length=100)[0]["generated_text"]

@app.local_entrypoint()
def main():
    print(TextGenerator().generate.remote("Hello, world"))
Core concepts
Key components
| Component | Purpose |
|---|---|
| `App` | Container for functions and resources |
| `Function` | Serverless function with compute specs |
| `Cls` | Class-based functions with lifecycle hooks |
| `Image` | Container image definition |
| `Volume` | Persistent storage for models/data |
| `Secret` | Secure credential storage |
Execution modes
| Command | Description |
|---|---|
| `modal run script.py` | Execute and exit |
| `modal serve script.py` | Development with live reload |
| `modal deploy script.py` | Persistent cloud deployment |
GPU configuration
Available GPUs
| GPU | VRAM | Best For |
|---|---|---|
| `T4` | 16GB | Budget inference, small models |
| `L4` | 24GB | Inference, Ada Lovelace arch |
| `A10G` | 24GB | Training/inference, 3.3x faster than T4 |
| `L40S` | 48GB | Recommended for inference (best cost/perf) |
| `A100-40GB` | 40GB | Large model training |
| `A100-80GB` | 80GB | Very large models |
| `H100` | 80GB | Fastest, FP8 + Transformer Engine |
| `H200` | 141GB | Auto-upgrade from H100, 4.8TB/s bandwidth |
| `B200` | Latest | Blackwell architecture |
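The VRAM column can drive a first-pass GPU choice. The sketch below is illustrative only: `weights_gb`, `pick_gpu`, and the 1.2x headroom factor are assumptions rather than anything Modal provides, and the estimate covers model weights only (no activations or KV cache). B200 is left out because the table gives no VRAM figure for it.

```python
# VRAM per GPU type in GB, copied from the table above (B200 omitted: no figure given).
GPU_VRAM_GB = {
    "T4": 16,
    "L4": 24,
    "A10G": 24,
    "L40S": 48,
    "A100-40GB": 40,
    "A100-80GB": 80,
    "H100": 80,
    "H200": 141,
}


def weights_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GB (2 bytes per parameter means FP16/BF16)."""
    return params_billions * bytes_per_param


def pick_gpu(params_billions: float, bytes_per_param: int = 2, headroom: float = 1.2) -> str:
    """Return the smallest-VRAM GPU that fits the weights plus headroom.

    Hypothetical helper; ties in VRAM are broken alphabetically by name.
    """
    needed = weights_gb(params_billions, bytes_per_param) * headroom
    fitting = [(vram, name) for name, vram in GPU_VRAM_GB.items() if vram >= needed]
    if not fitting:
        raise ValueError(f"no single GPU in the table offers {needed:.0f} GB")
    return min(fitting)[1]


print(pick_gpu(1))   # small model in FP16
print(pick_gpu(13))  # mid-sized model in FP16
```

The returned string can be passed straight to `gpu=`. Treat the result as a starting point and confirm it against real memory usage.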
GPU specification patterns
# Single GPU
@app.function(gpu="A100")
# Specific memory variant
@app.function(gpu="A100-80GB")
# Multiple GPUs (up to 8)
@app.function(gpu="H100:4")
# GPU with fallbacks
@app.function(gpu=["H100", "A100", "L40S"])
# Any available GPU
@app.function(gpu="any")
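The single-GPU and multi-GPU patterns above share one string format: a GPU type, optionally followed by `:COUNT`. A minimal sketch of building and parsing such strings, assuming the 8-GPU limit stated above; `gpu_spec` and `parse_gpu_spec` are hypothetical helpers, not Modal functions.

```python
def gpu_spec(gpu_type: str, count: int = 1) -> str:
    """Build a "TYPE" or "TYPE:COUNT" string for the gpu= argument."""
    if not 1 <= count <= 8:  # the guide states up to 8 GPUs per function
        raise ValueError("count must be between 1 and 8")
    return gpu_type if count == 1 else f"{gpu_type}:{count}"


def parse_gpu_spec(spec: str) -> tuple[str, int]:
    """Split "TYPE:COUNT" back into (type, count); a bare "TYPE" means one GPU."""
    gpu_type, _, count = spec.partition(":")
    return gpu_type, int(count) if count else 1


print(gpu_spec("H100", 4))
print(parse_gpu_spec("A100-80GB"))
```

Building the string in one place keeps the GPU choice configurable, for example `@app.function(gpu=gpu_spec("H100", 4))`.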
Container images
# Basic image with pip
image = modal.Image.debian_slim(python_version="3.11").pip_install(
    "torch==2.1.0", "transformers==4.36.0", "accelerate"
)
# From CUDA base
image = modal.Image.from_registry(
    "nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04",
    add_python="3.11"
).pip_install("torch", "transformers")
# With system packages (OpenAI Whisper is published on PyPI as "openai-whisper")
image = modal.Image.debian_slim().apt_install("git", "ffmpeg").pip_install("openai-whisper")
Persistent storage
volume = modal.Volume.from_name("model-cache", create_if_missing=True)

@app.function(gpu="A10G", volumes={"/models": volume})
def load_model():
    import os
    model_path = "/models/llama-7b"
    if not os.path.exists(model_path):
        # download_model() and load_from_path() are placeholders for your own code
        model = download_model()
        model.save_pretrained(model_path)
        volume.commit()  # Persist changes
    return load_from_path(model_path)
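The check-then-populate logic in `load_model` can be tried without a GPU or a Modal account. The sketch below substitutes a temporary local directory for the `/models` mount and a stub for `download_model()`; `ensure_cached` and `fake_download` are hypothetical names used only for illustration.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_cached(path: Path, populate: Callable[[Path], None]) -> bool:
    """Populate `path` only if it is missing; return True when a download happened."""
    if path.exists():
        return False
    path.mkdir(parents=True)
    populate(path)
    # On Modal, volume.commit() would go here so other containers see the files.
    return True


def fake_download(target: Path) -> None:
    """Stand-in for a real model download: writes a tiny placeholder file."""
    (target / "weights.bin").write_bytes(b"\x00" * 16)


with tempfile.TemporaryDirectory() as root:
    model_path = Path(root) / "models" / "llama-7b"
    first = ensure_cached(model_path, fake_download)   # cache miss: populates
    second = ensure_cached(model_path, fake_download)  # cache hit: skips the download
    print(first, second)
```

One caveat of this simple check: an interrupted download leaves the directory in place, so a production version might test for a marker file written last instead of the directory itself.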
Web endpoints
FastAPI endpoint decorator
@app.function()
@modal.fastapi_endpoint(method="POST")
def predict(text: str) -> dict:
    # `model` stands in for a model loaded elsewhere (for example in a modal.Cls)
    return {"result": model.predict(text)}
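On the client side, note that `text` is declared as a plain `str` parameter, which by FastAPI's usual rules is read from the query string rather than from a JSON body. The sketch below builds such a request with the standard library without sending it. The URL is a placeholder (use the one printed by `modal serve` or `modal deploy`), and `build_predict_request` is a hypothetical helper.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder URL: replace with the endpoint URL Modal prints for your app.
ENDPOINT_URL = "https://your-workspace--your-app-predict.modal.run"


def build_predict_request(text: str, base_url: str = ENDPOINT_URL) -> Request:
    """Build a POST request that passes `text` as a query parameter."""
    query = urlencode({"text": text})
    return Request(f"{base_url}?{query}", method="POST")


req = build_predict_request("Hello, world")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req).read()
```

If the endpoint should accept a JSON body instead, the function signature would need a Pydantic model or `dict` parameter rather than a bare `str`.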
Full ASGI app
from fastapi import FastAPI

web_app = FastAPI()

@web_app.post("/predict")
async def predict(text: str):
    # `model` stands in for a Modal class or function handle defined elsewhere
    return {"result": await model.predict.remote.aio(text)}

@app.function()
@modal.asgi_app()
def fastapi_app():
    return web_app
Web endpoint types
| Decorator | Use Case |
|---|---|
| `@modal.fastapi_endpoint()` | Simple function → API |
| `@modal.asgi_app()` | Full FastAPI/Starlette apps |
| `@modal.wsgi_app()` | Django/Flask apps |
| `@modal.web_server(port)` | Arbitrary HTTP servers |
Dynamic batching
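Dynamic batching groups requests that arrive close together so the GPU handles them in one call. Modal's own mechanism for this is the `@modal.batched` decorator, which takes `max_batch_size` and `wait_ms`. The simulation below borrows those two parameter names but is plain Python, and its closing rule (a batch closes when full, or when the next request arrives more than `wait_ms` after the batch's first request) is an assumption for illustration, not a description of Modal's scheduler.

```python
def form_batches(arrival_ms: list[float], max_batch_size: int, wait_ms: float) -> list[list[int]]:
    """Group request indices into batches (hypothetical helper).

    arrival_ms must be sorted ascending. A batch closes when it holds
    max_batch_size requests, or when the next request arrives more than
    wait_ms after the batch's first request.
    """
    batches: list[list[int]] = []
    current: list[int] = []
    opened_at = 0.0
    for i, t in enumerate(arrival_ms):
        if current and t - opened_at > wait_ms:
            batches.append(current)  # wait budget exceeded: flush a partial batch
            current = []
        if not current:
            opened_at = t  # this request opens a new batch
        current.append(i)
        if len(current) == max_batch_size:
            batches.append(current)  # batch is full: flush immediately
            current = []
    if current:
        batches.append(current)
    return batches


arrivals = [0, 2, 3, 50, 51, 52, 53, 54]
print(form_batches(arrivals, max_batch_size=4, wait_ms=10))
# → [[0, 1, 2], [3, 4, 5, 6], [7]]
```

Larger `max_batch_size` values improve GPU utilization, while smaller `wait_ms` values cap the extra latency a lone request pays while waiting for company.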
How to use modal-serverless-gpu on Cursor
AI-first code editor with Composer
1. Prerequisites
Before installing skills in Cursor, ensure your development environment meets these requirements:
- Cursor installed and configured on your development machine
- Node.js version 16.0+ with npm package manager (verify with node --version)
- Active project directory or workspace where you want to add modal-serverless-gpu
2. Execute installation command
Execute the skills CLI command in your project's root directory to begin installation:
npx skills add https://github.com/davila7/claude-code-templates --skill modal-serverless-gpu
The skills CLI fetches modal-serverless-gpu from GitHub repository davila7/claude-code-templates and configures it for Cursor.
3. Select Cursor when prompted
The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:
◆ Which agents do you want to install to?
│
│ ── Universal (.agents/skills) ── always included ────
│ • Amp
│ • Antigravity
│ • Cline
│ • Codex
│ ● Cursor (selected)
│ • Cursor
│ • Windsurf
4. Verify installation
Confirm successful installation by checking the skill directory location:
.cursor/skills/modal-serverless-gpu
Reload or restart Cursor to activate modal-serverless-gpu. Access the skill through slash commands (e.g., /modal-serverless-gpu) or your agent's skill management interface.
⚠ Security & Verification Notice
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.
Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.