TL;DR: At Google I/O 2026, two palm-sized bipedal robot ducks showed what on-device AI looks like in practice. Gemma 4 E2B, running on a Raspberry Pi 5 and an NVIDIA Jetson Orin Nano, processed speech and vision and generated responses in real time, with no cloud, no API calls, and no network round-trip latency. The ducks are based on the open-source Open Duck Mini v2 project by Antoine Pirrone, and the demo was led by Xavier Plantaz, Partner Solutions Engineer at Google. Asked to introduce itself, one of the ducks replied: "I am Autumn, a small duck robot's brain."
What Is the Open Duck Mini?
The Open Duck Mini (ODM) is an open-source, 3D-printable bipedal duck robot created by Antoine Pirrone and hosted on GitHub. It is directly inspired by Disney's BDX Droid, the small walking robot revealed at Star Wars Celebration, and costs roughly $400 in components to build from scratch.
The project's goal is simple: make legged robotics accessible to anyone with a 3D printer and basic electronics skills. Pirrone has released all hardware designs, firmware, and software under an open license, and the Google I/O 2026 demo is the project's highest-profile moment yet.
The Google I/O 2026 Demo
Xavier Plantaz brought two ducks to the stage. Both were running Gemma 4 E2B entirely on-device, with no cloud connection:
| Duck | Hardware | Model |
|---|---|---|
| Duck A | Raspberry Pi 5 | Gemma 4 E2B on LiteRT |
| Duck B | NVIDIA Jetson Orin Nano | Gemma 4 E2B on LiteRT |
Both ducks are equipped with:
- Microphone — for speech input
- Camera — for visual environment understanding
- Speaker — for spoken responses
- LED antennas — for expressive status signaling
The ducks boot into an attention mode, an animated loop with LED expressions that signals the system is live and listening.
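A minimal sketch of how such a mode loop could be organized is shown below. It assumes four hypothetical modes (`ATTENTION`, `LISTENING`, `THINKING`, `SPEAKING`), made-up event names, and made-up LED pattern names. The demo only describes the attention mode, and the Open Duck Mini code may be structured quite differently.

```python
from enum import Enum, auto


class Mode(Enum):
    """Hypothetical interaction modes; only ATTENTION is described in the demo."""
    ATTENTION = auto()  # idle animation loop: system is live and waiting for a wake phrase
    LISTENING = auto()  # capturing a user utterance from the microphone
    THINKING = auto()   # running speech recognition and LLM inference
    SPEAKING = auto()   # playing synthesized audio through the speaker


# Hypothetical LED antenna pattern for each mode (names are illustrative only).
LED_PATTERN = {
    Mode.ATTENTION: "slow_pulse",
    Mode.LISTENING: "solid_on",
    Mode.THINKING: "spinner",
    Mode.SPEAKING: "blink_with_audio",
}

# Allowed transitions: (event, current_mode) -> next_mode.
TRANSITIONS = {
    ("wake_word", Mode.ATTENTION): Mode.LISTENING,
    ("utterance_end", Mode.LISTENING): Mode.THINKING,
    ("response_ready", Mode.THINKING): Mode.SPEAKING,
    ("playback_done", Mode.SPEAKING): Mode.ATTENTION,
}


class DuckModeMachine:
    def __init__(self) -> None:
        self.mode = Mode.ATTENTION  # the ducks boot into attention mode

    @property
    def led(self) -> str:
        """LED pattern that should be shown for the current mode."""
        return LED_PATTERN[self.mode]

    def handle(self, event: str) -> Mode:
        """Apply an event; unknown or out-of-order events leave the mode unchanged."""
        self.mode = TRANSITIONS.get((event, self.mode), self.mode)
        return self.mode
```

Ignoring unknown or out-of-order events, rather than raising, keeps a noisy microphone or a late callback from pushing the duck into an invalid state.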
The Live Interaction
During the demo, Plantaz asked one of the ducks:
"Hey Gemma, explain me what a large language model is."
The duck responded:
"A large language model is a complex AI designed to understand and generate human-like text. I am here to assist you with tasks and interactions."
When asked to introduce itself:
"I am Autumn, a small duck robot's brain."
The name "Autumn" comes from ODM, the Open Duck Mini acronym, pronounced as a word. Plantaz described the response latency as "very snappy," with the whole exchange running on local hardware.
The Technical Stack
The full on-device pipeline across both duck configurations:
Speech-to-Text
Parakeet — NVIDIA's open ASR model handles transcription of the duck's microphone input into text, running locally without network access.
Inference
Gemma 4 E2B on LiteRT — Google's 2B-parameter edge model processes the transcribed text along with visual input from the camera. LiteRT (formerly TensorFlow Lite) handles model loading, memory management, and hardware acceleration, targeting GPU or NPU backends where available.
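"Where available" amounts to a preference order with a CPU fallback. The plain-Python sketch below illustrates only that selection logic. The backend names and the `is_available` probe are hypothetical placeholders, not LiteRT's actual API.

```python
from typing import Callable, Iterable

# Hypothetical preference order: dedicated accelerator first, CPU as the universal fallback.
DEFAULT_PREFERENCE = ("npu", "gpu", "cpu")


def pick_backend(
    is_available: Callable[[str], bool],
    preference: Iterable[str] = DEFAULT_PREFERENCE,
) -> str:
    """Return the first backend in `preference` that `is_available` accepts.

    `is_available` stands in for whatever hardware probe the real runtime uses.
    Raises RuntimeError if no listed backend is usable.
    """
    for backend in preference:
        if is_available(backend):
            return backend
    raise RuntimeError("no usable inference backend found")
```

For example, on a board that exposes a GPU but no NPU, `pick_backend(lambda b: b in {"gpu", "cpu"})` returns `"gpu"`.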
Text-to-Speech
Kokoro — A lightweight open-source TTS model generates the duck's spoken audio output from Gemma's text response.
The entire stack runs in real time on a single-board computer, with no dependence on external services.
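The three stages compose into a simple loop: audio in, transcript plus camera frame into the model, reply text out to speech. The sketch below wires them together with injected callables so it runs without any models installed. `transcribe`, `generate`, and `synthesize` are hypothetical stand-ins for Parakeet, Gemma 4 E2B on LiteRT, and Kokoro, not their real APIs. Per-stage timings are recorded because latency is the main reason to run on-device.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class DuckPipeline:
    """Hypothetical wiring of the on-device speech -> LLM -> speech chain."""
    transcribe: Callable[[bytes], str]     # stand-in for Parakeet (audio -> text)
    generate: Callable[[str, bytes], str]  # stand-in for Gemma 4 E2B (text + image -> text)
    synthesize: Callable[[str], bytes]     # stand-in for Kokoro (text -> audio)
    timings: Dict[str, float] = field(default_factory=dict)

    def _timed(self, stage: str, fn: Callable[..., Any], *args: Any) -> Any:
        """Run one stage and record its wall-clock duration in seconds."""
        start = time.perf_counter()
        result = fn(*args)
        self.timings[stage] = time.perf_counter() - start
        return result

    def respond(self, audio: bytes, frame: bytes) -> bytes:
        """Turn one utterance plus the current camera frame into reply audio."""
        text = self._timed("asr", self.transcribe, audio)
        reply = self._timed("llm", self.generate, text, frame)
        return self._timed("tts", self.synthesize, reply)


# Usage with trivial stubs in place of the real models.
pipe = DuckPipeline(
    transcribe=lambda audio: "introduce yourself",
    generate=lambda text, frame: "I am Autumn, a small duck robot's brain.",
    synthesize=lambda reply: reply.encode("utf-8"),
)
reply_audio = pipe.respond(b"<pcm samples>", b"<jpeg frame>")
```

Swapping a stub for a real model changes only the callable passed in, and `timings` shows which stage dominates the response latency.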
What Is Gemma 4 E2B?
Gemma 4 was released on April 2, 2026, as Google's latest family of open-weight models under the Apache 2.0 license. The family includes:
- E2B — 2B parameters, optimized for mobile and edge
- E4B — 4B parameters, higher capability at slightly more cost
- 26B MoE — mixture-of-experts, server-class
- 31B dense — full-size dense model
The E2B variant is the one powering the ducks. Key specs:
- Parameters: ~2 billion
- Model size: ~2.58 GB
- Runtime memory: ~607 MB with the XNNPACK CPU backend on Apple CPUs, comparable on ARM Linux
- Context window: 256K tokens
- Modalities: text + image + audio (E2B and E4B)
- Deployment: via LiteRT-LM on Android, iOS, and Linux edge boards
- License: Apache 2.0
The E2B's native multimodality is what makes the duck demo possible: the same model takes the text transcript produced by Parakeet together with image input from the camera and generates one response grounded in both. Although E2B can accept audio directly, this pipeline hands speech recognition to Parakeet and feeds the model text.
Why On-Device Matters for Robotics
Running AI inference locally on the robot rather than in the cloud changes the calculus for robotics in several meaningful ways:
Latency — No round-trip to a remote server. The duck responds in real time because the model is co-located with the sensors.
Privacy — Everything the camera sees and the microphone hears stays on-device. No audio or video is transmitted to any external service.
Connectivity independence — The duck works in a basement, a field, or anywhere without reliable internet. This matters enormously for autonomous robots operating in unstructured environments.
Cost at scale — Once the hardware is purchased, there is no per-query API cost. A fleet of 1,000 ducks running Gemma 4 E2B pays nothing per inference beyond electricity.
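The "nothing beyond electricity" point is easy to quantify. The sketch below converts an assumed board power draw and response time into energy and cost per response. All inputs are placeholder values, not measurements from the demo, and the calculation attributes the board's full draw during a response to that response while ignoring idle consumption.

```python
JOULES_PER_KWH = 3_600_000  # 1 kWh = 3.6 MJ


def cost_per_response(power_watts: float, seconds: float, price_per_kwh: float) -> float:
    """Electricity cost of one response: power x time, converted to kWh, x tariff."""
    energy_kwh = power_watts * seconds / JOULES_PER_KWH
    return energy_kwh * price_per_kwh


def fleet_cost(
    num_robots: int,
    responses_per_robot: int,
    power_watts: float,
    seconds: float,
    price_per_kwh: float,
) -> float:
    """Total electricity cost for a fleet, assuming identical robots and workloads."""
    return num_robots * responses_per_robot * cost_per_response(power_watts, seconds, price_per_kwh)


# Placeholder inputs (hypothetical): 10 W board, 3 s per response, $0.15 per kWh.
per_response = cost_per_response(power_watts=10, seconds=3, price_per_kwh=0.15)
fleet_total = fleet_cost(1_000, 1_000, power_watts=10, seconds=3, price_per_kwh=0.15)
```

With these placeholders, one response costs about $0.00000125, and a million responses across a 1,000-duck fleet cost about $1.25.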
What Comes Next
Plantaz described the current demo as the foundation, not the destination. The stated next steps:
- Walking — The ducks currently demo speech and vision. The next milestone is getting them to walk using the bipedal locomotion system Antoine Pirrone has developed.
- Seeing each other — With cameras on both ducks, they could detect and recognize each other as agents in a shared environment.
- Talking to each other — Peer-to-peer communication between duck agents, enabling collaborative or emergent behavior.
- Autonomous exploration — The longer-horizon goal: ducks that can navigate their environment independently, making decisions from visual and contextual inputs.
How to Build Your Own
The Open Duck Mini project is fully open source:
- GitHub: github.com/apirrone/Open_Duck_Mini
- Hardware: All STL files for 3D printing are provided
- Cost: ~$400 in motors, electronics, and fasteners
- Compute: Raspberry Pi 5 (8GB) recommended for the Gemma 4 E2B stack
- Model: litert-community/gemma-4-E2B-it-litert-lm on Hugging Face
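As a rough check on why the 8GB board is recommended, the sketch below adds up a memory budget. The Gemma 4 E2B figures are those quoted in the spec list earlier (model file ~2.58 GB, runtime ~607 MB), conservatively assuming the whole model file stays resident alongside the runtime footprint. The Parakeet, Kokoro, and OS reserve figures are placeholder assumptions, not measurements.

```python
from typing import Dict, Tuple

# Gemma 4 E2B figures from the spec list (GB).
GEMMA_FILE_GB = 2.58
GEMMA_RUNTIME_GB = 0.607

# Placeholder assumptions; replace with measurements from your own board.
ASSUMED_ASR_GB = 1.0         # Parakeet (hypothetical figure)
ASSUMED_TTS_GB = 0.5         # Kokoro (hypothetical figure)
ASSUMED_OS_RESERVE_GB = 1.0  # OS, camera and audio services (hypothetical figure)

STACK = {
    "gemma_weights": GEMMA_FILE_GB,
    "gemma_runtime": GEMMA_RUNTIME_GB,
    "asr": ASSUMED_ASR_GB,
    "tts": ASSUMED_TTS_GB,
}


def memory_headroom(
    board_ram_gb: float, components: Dict[str, float], reserve_gb: float
) -> Tuple[bool, float]:
    """Return (fits, headroom_gb) after subtracting all components and an OS reserve."""
    used = sum(components.values())
    headroom = round(board_ram_gb - reserve_gb - used, 3)
    return headroom >= 0, headroom


fits_8gb, headroom_8gb = memory_headroom(8.0, STACK, ASSUMED_OS_RESERVE_GB)
fits_4gb, headroom_4gb = memory_headroom(4.0, STACK, ASSUMED_OS_RESERVE_GB)
```

With these placeholders, the stack leaves roughly 2.3 GB of headroom on an 8 GB board but does not fit on a 4 GB one. The check ignores the GB/GiB distinction, which is acceptable at this level of precision.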
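Gemma 4 E2B runs on LiteRT, the same runtime stack used for on-device inference on Android phones through the MediaPipe LLM Inference API. On Linux boards such as the Pi 5, the LiteRT-LM build listed above is loaded with LiteRT-LM. A model that runs well on a phone is therefore a reasonable candidate for a Pi 5, although speed will depend on the board's CPU and any available accelerator.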
The Broader Signal
The Open Duck Mini demo at Google I/O 2026 is a proof of concept in the most literal sense: proof that a $400 hobby robot can carry a capable, multimodal AI model entirely on-device, respond conversationally in real time, and understand its visual environment, all on a single-board computer as inexpensive as an $80 Raspberry Pi 5.
Three years ago, this required a GPU server and a cloud API. Today it runs on a duck.
The models will keep getting smaller and faster. The hardware will keep getting cheaper. What the Open Duck Mini project shows is that the combination is already interesting enough to build with—and the next version might actually walk.