DeepSeek-TUI (Hmbown) is a Rust terminal coding agent built around DeepSeek V4 (deepseek-v4-pro / deepseek-v4-flash): streaming thinking, tools (files, shell, git, web, MCP, sub-agents), session checkpoints, and cost telemetry including prefix-cache hints. The README disclaims affiliation with DeepSeek Inc.
It appeared on GitHub Trending in early May 2026; facts here come from the upstream README (v0.8.x era).
TL;DR
| Topic | Takeaway |
|---|---|
| Binaries | deepseek dispatcher → deepseek-tui (ratatui UI) |
| Models | V4 default; auto routes Flash vs Pro + thinking per turn |
| Modes | Plan (read-only explore), Agent (approval gates), YOLO (auto-approve) |
| Extras | MCP, skills from GitHub, HTTP/SSE deepseek serve, RLM batch helper, LSP diagnostics hooks |
| Install | npm i -g deepseek-tui or cargo install / Homebrew / Releases |
| License | MIT |
Why it matters next to "just use the API"
When DeepSeek released V4 Pro and Flash in early 2026, the raw API story was compelling: state-of-the-art coding and reasoning capabilities at a fraction of the cost of GPT-4 or Claude. But raw APIs leave critical questions unanswered:
How do you gate risky operations? An LLM with file-write and shell-exec tools can wreck a repository in seconds if it misunderstands context or hallucinates a destructive command. Production teams need approval workflows, not just API keys.
How do you survive context overflow? Even with million-token windows, long agent sessions accumulate transcripts that exceed limits. Naive implementations fail silently or lose critical state when compaction kicks in.
How do you track costs? DeepSeek's aggressive pricing (orders of magnitude cheaper than alternatives) only matters if you can attribute spend to specific tasks, understand cache-hit economics, and forecast bills before they arrive.
How do you maintain session continuity? Terminal windows close, SSH connections drop, laptops suspend. Stateless API wrappers start from scratch every time. Real workflows need durable task queues and resumable sessions.
DeepSeek-TUI addresses these operational gaps with harness patterns that experienced teams eventually build anyway:
- Side-git snapshots for rollback without touching your repo
.git—the agent can experiment freely while you maintain a clean undo path - Durable task queue across restarts—close your terminal, resume tomorrow, the context and pending tasks persist
- Reasoning-effort cycling (Shift+Tab)—dynamically adjust how hard the model thinks based on task complexity and budget
- 1M-token awareness with compaction controls and cache-hit accounting—see what is getting dropped, control summarization, understand prompt-cache economics
That matches the scaffold story in our agent harness article—here aimed at DeepSeek APIs and compatible hosts like vLLM, SGLang, and NVIDIA NIM.
Installation: npm for convenience, cargo for source
The repository offers multiple install paths to meet users where they are:
npm (recommended for quick starts):
npm install -g deepseek-tui
This downloads prebuilt binaries for macOS (x64/ARM), Linux (x64/ARM), and Windows (x64). The npm wrapper automatically selects the correct binary for your platform and places it in your PATH as deepseek and deepseek-tui.
Cargo (for Rust developers):
cargo install deepseek deepseek-tui
Compiles from source, useful if you are on a platform without prebuilt binaries or want to modify the code. Requires Rust toolchain 1.70+.
Homebrew (macOS):
brew tap Hmbown/deepseek-tui
brew install deepseek-tui
Scoop (Windows):
scoop bucket add deepseek-tui https://github.com/Hmbown/scoop-bucket
scoop install deepseek-tui
GitHub Releases: Download platform-specific binaries directly from the Releases page if package managers are not an option.
After installation, run deepseek doctor to verify setup. First launch prompts for a DeepSeek API key, stored in ~/.deepseek/config.toml. The config file supports multiple profiles for different API endpoints, keys, and default models.
Architecture: dispatcher pattern and terminal UI
DeepSeek-TUI uses a two-binary architecture:
deepseek (dispatcher): The main CLI entry point. Handles command parsing, configuration management, API credential loading, and dispatches to subcommands or the TUI. Think of it as the control plane.
deepseek-tui (terminal UI): The ratatui-based interactive interface. Renders streaming responses with syntax highlighting, shows thinking blocks in real-time, manages approval dialogs, and handles keyboard shortcuts. This is the data plane where you spend your time.
The separation means you can use deepseek in headless scripts and CI pipelines (deepseek run --file task.md --mode yolo) while still having the rich TUI available for interactive work (deepseek-tui or just deepseek with no subcommand).
The TUI is built on ratatui, a modern terminal UI library for Rust that provides:
- Declarative layouts with flexbox-style composition
- Incremental rendering so streaming tokens appear instantly
- Widget composition for panels, lists, syntax-highlighted code blocks
- Event-driven architecture for responsive keyboard/mouse input
This is not an Electron app or web view in a terminal emulator—it is native terminal control sequences for maximum performance on remote servers and low-bandwidth connections.
Model routing: auto mode and thinking levels
Auto mode is DeepSeek-TUI's answer to "which model should I use for this task?"
When you set --model auto or /model auto in the TUI, the tool does not pass "auto" to the DeepSeek API (which does not support it). Instead:
- The harness runs a small deepseek-v4-flash request (thinking disabled) with a routing prompt that analyzes the user task
- The routing model decides whether the task needs Flash (fast, cheap, good for simple edits) or Pro (slow, expensive, better for complex reasoning)
- The routing model also selects a thinking level (0-3 in DeepSeek's API schema)
- The harness issues the real request with the concrete model and thinking setting
- On failure, fallback heuristics kick in: use Pro for tasks mentioning "refactor" or "architecture", Flash for "fix typo" or "add comment"
This two-phase approach costs one extra Flash call per turn but can save significant money. Example: a simple "add logging to this function" task might cost $0.001 with Flash auto-routed, vs $0.015 with Pro always-on. Over hundreds of tasks, the routing overhead pays for itself many times over.
Thinking levels control how much internal reasoning the model exposes:
- 0: No thinking, just output (fastest, cheapest, works for straightforward tasks)
- 1: Brief thinking (model shows a few sentences of reasoning)
- 2: Moderate thinking (paragraph-scale internal monologue)
- 3: Deep thinking (multi-paragraph reasoning, useful for debugging complex logic)
You can cycle through levels with Shift+Tab during a session. Watch the token counters: thinking tokens count toward your bill and context window, so level 3 on a simple task wastes budget.
Operating modes: Plan, Agent, YOLO
Plan mode is read-only exploration:
- The agent can read files, run
git diff, search codebases, query documentation - No write operations allowed: no file edits, no
git commit, norm -rf - Useful for understanding a new codebase, debugging without risk of changes, or drafting a proposal
Use Plan mode when you want an AI assistant to explain what a project does, identify where a bug might be, or suggest an implementation approach—without touching anything.
Agent mode (default) adds approval gates:
- The agent proposes edits, shell commands, git operations
- You see a preview with syntax-highlighted diffs
- You approve (y), reject (n), or edit the proposal before execution
- Each action is logged to session history for audit and rollback
This is the sweet spot for most development work: the agent does the heavy lifting (write boilerplate, refactor functions, generate tests) while you maintain control over what actually runs.
YOLO mode removes the gates:
- The agent executes all proposed actions automatically
- Useful for batch tasks, CI/CD pipelines, trusted automation
- Dangerous if the agent misunderstands requirements or hallucinates destructive commands
YOLO mode is "just use the API" with session management bolted on. Only use it when you trust the agent, the task is well-scoped, and rollback is easy (e.g. you are working in a disposable Docker container or a git branch you can delete).
MCP integration: connecting to external tools
Model Context Protocol support is documented in docs/MCP.md in the repository. DeepSeek-TUI acts as an MCP client, connecting to MCP servers that provide tools and resources:
Standard MCP servers work out of the box:
@modelcontextprotocol/server-filesystem— read/write files with access controls@modelcontextprotocol/server-postgres— query databases@modelcontextprotocol/server-github— search issues, read PRs, post comments@modelcontextprotocol/server-brave-search— web search@modelcontextprotocol/server-puppeteer— browser automation
Configuration example (from docs):
# ~/.deepseek/config.toml
[[mcp_servers]]
name = "filesystem"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/projects"]
[[mcp_servers]]
name = "postgres"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-postgres"]
env = { DATABASE_URL = "postgresql://localhost/mydb" }
When the TUI starts, it launches each MCP server as a subprocess and communicates via stdio. The agent sees MCP tools in its tool list alongside built-in capabilities like file-edit and shell-exec. Natural language requests route to the appropriate tool: "search GitHub issues for label:bug" triggers the GitHub MCP server, "query the users table" hits Postgres.
This architecture means you can extend DeepSeek-TUI with domain-specific tools (internal APIs, proprietary data sources, compliance checks) without forking the core codebase—just write an MCP server and configure it.
Skills: portable instructions from GitHub
The repository documents a skills system inspired by Cursor and Claude Code:
What are skills? Markdown files (SKILL.md) that contain structured instructions for common tasks. Example structure:
# Skill: Add API Endpoint
## Goal
Create a new REST API endpoint in our FastAPI application
## Steps
1. Define request/response models in `app/schemas.py`
2. Implement handler in `app/routers/`
3. Add tests in `tests/test_api.py`
4. Update OpenAPI docs
## Constraints
- Follow existing error handling patterns
- Include request validation
- Add rate limiting decorators
Loading skills: /skill install can pull from GitHub without a backend service:
/skill install username/repo/path/to/skill.md
DeepSeek-TUI downloads the skill, caches it locally in ~/.deepseek/skills/, and makes it available as a /skill run <name> command. When you invoke a skill, the TUI injects the skill instructions into the system prompt so the agent follows your team's patterns automatically.
Standard skill directories:
.cursor/skills/(Cursor compatibility).claude/skills/(Claude Code compatibility)~/.deepseek/skills/(user global)./.deepseek/skills/(project local)
The cross-compatible path means skills you write for DeepSeek-TUI work in Cursor and vice versa (subject to tool availability differences). This is the "portable agent instructions" layer our agent skills guide discusses.
Economics (verify live)
The README embeds DeepSeek per-1M cache hit/miss tables and notes time-limited discounts through 31 May 2026 UTC for Pro—reconcile with official pricing.
As of the README snapshot, approximate DeepSeek V4 pricing (per million tokens):
deepseek-v4-flash:
- Input: $0.14/1M tokens
- Output: $0.28/1M tokens
- Cache hit: $0.014/1M tokens (90% discount on cached input)
deepseek-v4-pro (with temporary discount):
- Input: $0.55/1M tokens (normally $2.19)
- Output: $2.19/1M tokens (normally $8.75)
- Cache hit: $0.055/1M tokens
Why cache hits matter: DeepSeek-TUI sends your codebase context, tool definitions, and system prompts in every request. With prompt caching, the API recognizes unchanged content and bills at cache-hit rates. On long sessions, 95%+ of your input tokens can be cache hits.
Example cost breakdown for a 1000-turn agent session (typical for implementing a medium feature):
- Without caching: 500M input tokens × $0.55 = $275
- With caching: 25M cache-miss tokens × $0.55 + 475M cache-hit tokens × $0.055 = $13.75 + $26.13 = $39.88
The TUI shows cache-hit percentages in the status bar so you can see economics in real time. If cache hits drop unexpectedly, it usually means you are editing files the agent is reading, causing cache invalidation.
Session management and persistence
Session snapshots: Every N turns (configurable), DeepSeek-TUI writes session state to ~/.deepseek/sessions/<id>.json:
- Full message history
- Tool call results
- Pending tasks
- File modification log
- Cost tracking
You can /save manually to checkpoint before risky operations. If the TUI crashes or you kill the terminal, /resume <id> picks up where you left off. This is critical for long-running tasks: implementing a feature might span hours across multiple SSH disconnects.
Side-git snapshots: When YOLO mode or approved Agent actions modify files, DeepSeek-TUI can maintain a parallel git history in .deepseek-snapshots/:
- Each file write triggers a git commit with the tool call as the message
- Your main
.gitstays clean - Roll back with
deepseek restore <snapshot-id>without polluting real git history
This separation means you can experiment aggressively (let the agent try three refactoring approaches) and only promote successful changes to your actual repository.
Alternative backends: vLLM, SGLang, NVIDIA NIM
DeepSeek-TUI is not hardcoded to DeepSeek's API. The README documents compatibility with:
vLLM: Self-hosted inference with DeepSeek weights downloaded from HuggingFace:
export VLLM_BASE_URL=http://localhost:8000/v1
deepseek --model deepseek-v4-flash
SGLang: Faster inference engine for long context:
export SGLANG_BASE_URL=http://localhost:7501/v1
NVIDIA NIM: Enterprise-grade deployment on NVIDIA hardware:
deepseek --provider nvidia --model deepseek-v4-pro
Fireworks AI: Managed hosting with auto-scaling:
deepseek --provider fireworks --api-key <key>
This flexibility matters for teams with:
- Data residency requirements (run models on-prem)
- Latency constraints (colocate inference with your VPC)
- Cost optimization (prepaid GPU hours vs pay-per-token APIs)
- Custom fine-tunes (host your domain-adapted DeepSeek variant)
The TUI abstracts provider differences: model selection, thinking modes, and tool calling work the same regardless of backend (subject to the provider actually supporting the features).
Advanced features: RLM batch, LSP diagnostics, HTTP serve
RLM batch helper: Run many independent tasks in parallel:
deepseek rlm --input tasks.jsonl --output results.jsonl --concurrency 10
Useful for dataset generation, bulk code migrations, or evaluations. Each line in tasks.jsonl is an independent conversation; results stream to output as they complete.
LSP diagnostics hooks: When editing code, DeepSeek-TUI can run language servers (typescript-language-server, rust-analyzer, pylsp) and inject compiler errors and warnings into the agent context:
File: app.ts
Error: Type 'string' is not assignable to type 'number'
Line 42: const count: number = getUserInput();
The agent sees real type errors, not just your natural-language description. This dramatically improves fix accuracy for compiler-enforced constraints.
HTTP/SSE serve mode: deepseek serve exposes an HTTP API with Server-Sent Events for streaming:
deepseek serve --port 8080 --acp
The --acp flag enables Zed ACP (Agent Communication Protocol) compatibility for integration with Zed editor. This mode is experimental per the README but opens the door to custom UIs, web dashboards, and editor plugins that consume DeepSeek-TUI as a backend service.
Comparison with alternatives
vs raw API calls: DeepSeek-TUI adds session management, approval workflows, cost tracking, and compaction—essential for production use. Raw API calls are fine for one-off scripts; agent sessions need harness logic.
vs Cursor: Cursor is IDE-native with premium UX and tight editor integration. DeepSeek-TUI is terminal-native, works over SSH, and supports custom MCP servers. Cursor costs $20/month; DeepSeek-TUI is free with bring-your-own-API-key.
vs Aider: Aider (Python) focuses on git-aware code edits with simple approval prompts. DeepSeek-TUI (Rust) adds streaming thinking, MCP, skills, and multi-provider routing. Aider is lighter; DeepSeek-TUI is more feature-complete.
vs Claude Code: Claude Code is Anthropic's official CLI with deep integration into their ecosystem. DeepSeek-TUI targets DeepSeek models and compatible backends. If you are already on Claude, use Claude Code; if you want DeepSeek economics, use DeepSeek-TUI.
Choose based on your workflow: IDE vs terminal, Anthropic vs DeepSeek economics, feature priorities.
Related on ExplainX
- DeepSeek V4 preview: API and migration
- DeepSeek V4-Pro: benchmarks and pricing
- What is MCP? Model Context Protocol guide
- context-mode: MCP sandboxing
Sources
- Repository: github.com/Hmbown/DeepSeek-TUI
- npm: npmjs.com/package/deepseek-tui
- DeepSeek pricing: api-docs.deepseek.com/quick_start/pricing
Releases, provider matrices, and ACP support evolve. Treat this as May 6, 2026 README context.