
Gemma Chat: offline vibe coding with Gemma 4 and MLX on Mac

An Electron app that runs Gemma 4 on Apple Silicon with MLX-LM: build and chat modes, model sizing, setup, and when offline helps versus when you still need the network. MIT: github.com/ammaarreshi/gemma-chat

5 min read · ExplainX Team
Tags: Gemma, Local LLM, Apple Silicon, MLX, Open source, Vibe coding



Per its README, Gemma Chat is a local-first desktop app: Electron + Vite + React 19 + TypeScript + Tailwind on the surface, MLX-LM underneath for Gemma 4 on Apple Silicon, with optional Ollama compatibility called out in the repo description. The project bills itself as “vibe code without the internet” after the initial model pull: no API keys on the local path, and an MIT license.

This article is an ExplainX field guide: stack, model sizing, how the agent loop is described upstream, and what to validate if you fork it for your team.

TL;DR

| Question | Short answer |
| --- | --- |
| What is it? | Desktop chat + coding agent for Gemma 4, running via MLX on Mac (Apple Silicon). |
| Why care? | A concrete open-source reference for offline-capable assistant UX tied to Google’s open Gemma line and Apple’s MLX runtime. |
| Primary source | github.com/ammaarreshi/gemma-chat |
| Creator signal | Ammaar Reshi: public launch thread and Google Gemma account amplification (April 2026); star/fork counts change, so check the repo badge row. |
| License | MIT (per repository LICENSE). |

What shipped

The README frames two modes:

  1. Build mode — A coding agent with a live preview: the model writes multi-file HTML/CSS/JS-style trees into a sandboxed workspace while the UI streams updates.
  2. Chat mode — Conversational use with tools (the upstream feature list mentions web search, URL fetch, calculator, and bash).

Supporting pieces called out there include model switching across several Gemma 4 variants, voice input via Whisper (a transformers.js / WASM in-browser path, per the stack table), and first-run automation that provisions a Python venv plus MLX.

How the agent loop is described

The README’s architecture section is worth reading directly. In Build mode the story is:

  • Stream tokens from a local MLX server.
  • Parse XML <action> blocks from the stream (upstream notes small models behaving more reliably with XML than JSON tool calls).
  • Execute actions (file writes, bash, etc.) and feed results back—up to ~40 rounds per user message in the documented design.
  • Flush partial file writes on a timer so the preview iframe can reload while generation is in flight. (The sketch below mirrors this loop.)
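
For concreteness, here is a minimal TypeScript sketch of that shape. None of it is the repo’s actual code: streamTokens, executeAction, and the <action> attribute schema are all assumptions standing in for whatever the real implementation does.

// Hypothetical shape of the documented Build-mode loop; not the repo's code.
type Action = { type: string; path?: string; body: string };

declare function streamTokens(prompt: string): Promise<string>; // local MLX server call (assumed)
declare function executeAction(a: Action): Promise<string>;     // file-write / bash dispatcher (assumed)

const MAX_ROUNDS = 40; // README describes up to ~40 action rounds per user message

// Pull <action type="..." path="...">...</action> blocks out of raw model text.
// Upstream notes small models handle XML tool calls more reliably than JSON.
function parseActions(text: string): Action[] {
  const re = /<action\s+type="([^"]+)"(?:\s+path="([^"]+)")?\s*>([\s\S]*?)<\/action>/g;
  return [...text.matchAll(re)].map((m) => ({ type: m[1], path: m[2], body: m[3] }));
}

async function buildLoop(userMessage: string): Promise<void> {
  let transcript = userMessage;
  for (let round = 0; round < MAX_ROUNDS; round++) {
    const reply = await streamTokens(transcript);   // 1. stream tokens
    const actions = parseActions(reply);            // 2. parse imperative actions
    if (actions.length === 0) break;                // no actions left: the model is done
    transcript += `\n${reply}`;
    for (const action of actions) {                 // 3. mutate workspace, feed results back
      const result = await executeAction(action);
      transcript += `\n<result>${result}</result>`;
    }
    // 4. Separately, a timer flushes partial file writes so the preview
    //    iframe can reload while generation is still in flight.
  }
}

The XML-over-JSON choice matters for small models: a regex over <action> tags still finds every complete block in partially streamed output, where a truncated JSON object would fail to parse at all.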

That pattern (stream → parse imperative actions → mutate workspace → loop) belongs to the same family of “local Codex-style” loops teams are standardizing on in 2026; here it is bound to Gemma + MLX instead of a hosted API.

Models and memory (from upstream table)

The project’s README publishes a simple matrix. Paraphrased here—re-verify on the repo before you buy hardware:

| Variant (as labeled upstream) | Approximate size | Notes |
| --- | --- | --- |
| Gemma 4 E2B | ~1.5 GB | Faster, lighter tasks |
| Gemma 4 E4B | ~3 GB | Recommended balance in README |
| Gemma 4 27B MoE | ~8 GB | Stronger reasoning; 16 GB+ RAM class machine |
| Gemma 4 31B | ~18 GB | Heaviest; 32 GB+ RAM class machine |

Community replies on X have asked the same question your laptop will ask: which row is “enough” for acceptable latency on your thermal budget? There is no substitute for local profiling on the exact chip and cooling you ship with.
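
To put numbers on that, time the stream yourself. Below is a minimal sketch, assuming the local MLX server exposes an OpenAI-compatible streaming endpoint (mlx_lm.server does); the port, model name, and prompt are placeholders to adjust for your setup.

// Rough throughput probe against a local OpenAI-compatible endpoint.
// Endpoint, port, and model name are assumptions — adjust to your setup.
const ENDPOINT = "http://127.0.0.1:8080/v1/chat/completions";

async function profile(model: string, prompt: string): Promise<void> {
  const started = Date.now();
  let firstToken = 0;
  let chunks = 0;

  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      stream: true,
      messages: [{ role: "user", content: prompt }],
    }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Each SSE frame is "data: {json}\n\n"; counting content frames is a
    // crude proxy for tokens, but close enough for comparing model rows.
    for (const line of decoder.decode(value).split("\n")) {
      if (!line.startsWith("data:") || line.includes("[DONE]")) continue;
      if (firstToken === 0) firstToken = Date.now() - started;
      chunks++;
    }
  }

  const totalSec = (Date.now() - started) / 1000;
  console.log(`${model}: first token ${firstToken} ms, ~${(chunks / totalSec).toFixed(1)} chunks/s`);
}

await profile("gemma-4-e4b", "Write a 200-word product description.");

Run it once per table row on the machine you actually ship with; time-to-first-token under sustained thermal load is usually the number that decides the row.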

Getting started (upstream commands)

From the README’s Getting Started block:

git clone https://github.com/ammaarreshi/gemma-chat.git
cd gemma-chat
npm install
npm run dev

Note: Some README snapshots on the web have referenced alternate clone URLs; use the repository you intend to fork, and verify the default branch and package scripts in package.json before documenting runbooks internally.

Packaging:

npm run dist

Upstream states this yields a .dmg for drag-to-Applications installs.

Tradeoffs practitioners are already naming

  • Offline inference ≠ offline everything. Installing npm dependencies, reading live API docs, and shipping CI/CD still want a network—even when the model weights never leave the machine. That distinction matters for security reviews (“data never hits OpenAI”) vs program reality (“the loop still phones home for packages”).
  • First-run downloads are the fragile step: public replies mention crashes during model download—triage via Issues and pinned guidance rather than assumptions.
  • Ecosystem routing: Comments ask for tighter integration with existing local weight stores (for example, pointing at Ollama or LM Studio). The repo description already mentions Ollama; whether that satisfies “use my existing cache” is an integration detail to confirm in code and docs (see the sketch after this list).
  • Speech-to-text: A reply thread references MLX-VLM-style server paths for STT—interesting for forks, not something to assert without matching commit and IPC in this repo.
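
On the Ollama point specifically, the “use my existing cache” question is easy to probe before forking: Ollama’s documented local API lists already-pulled models at GET /api/tags on its default port. A minimal sketch; field handling is kept to name and size, and the response shape is worth re-verifying against Ollama’s API docs.

// List models an existing local Ollama daemon has already pulled.
// Default port 11434 and the /api/tags route are Ollama's documented API.
interface OllamaTag {
  name: string;
  size: number; // bytes on disk
}

async function listLocalModels(): Promise<OllamaTag[]> {
  const res = await fetch("http://127.0.0.1:11434/api/tags");
  if (!res.ok) throw new Error(`Ollama not reachable: ${res.status}`);
  const body = (await res.json()) as { models: OllamaTag[] };
  return body.models;
}

for (const m of await listLocalModels()) {
  console.log(`${m.name} (${(m.size / 1e9).toFixed(1)} GB)`);
}

If the names that come back overlap with the Gemma variants in the table above, a fork that routes through Ollama could skip its own download step entirely; that is the integration detail to confirm in the actual code.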

Why ExplainX readers should care

ExplainX indexes skills, tools, agents, and MCP servers for teams that ship with assistants. Gemma Chat is a reference for one slice of that map: desktop shell + local weights + tool protocol + workspace sandbox. Whether you adopt it directly or borrow patterns, the artifact is inspectable in MIT-licensed source.


Sources

  • Repository: github.com/ammaarreshi/gemma-chat
  • Gemma (Google DeepMind open models): positioning and ecosystem context via Google Gemma on X and official Gemma documentation—use those for model policy and license nuance beyond this app.
  • MLX: Apple’s machine learning research materials on MLX / MLX-LM for runtime semantics.

Star counts, default models, and README clone URLs drift quickly after a viral launch. Reconcile any numbers in this post with the live GitHub page and Issues before budgeting hardware or support.
