LFM2.5-230M is Liquid AI's smallest open-weight foundation model, released June 25, 2026. At 230 million parameters it is built on the LFM2 architecture for fast inference on CPUs, NPUs, and GPUs — targeting lightweight agentic workloads on phones, robots, home automation, and network devices. Base (LFM2.5-230M-Base) and post-trained (LFM2.5-230M) checkpoints are on Hugging Face.

How fast is LFM2.5-230M on edge hardware?

Liquid AI reports 213 tokens per second decode speed on a Samsung Galaxy S25 Ultra (Qualcomm Snapdragon Gen4 CPU) and 42 tok/s on a Raspberry Pi 5 CPU. The model delivers the highest prefill and decode throughput in its class on both platforms while keeping the smallest memory footprint among comparable small models.

What is LFM2.5-230M good at — and what should you avoid?

Liquid AI recommends it for large-scale data extraction pipelines and lightweight on-device agentic workloads — instruction following, structured extraction, and tool use (BFCL benchmarks). It explicitly does not recommend the model for reasoning-heavy tasks: advanced math, code generation, or creative writing. For frontier reasoning at small scale, see specialist models like VibeThinker-3B instead.

How was LFM2.5-230M trained?

Pre-training on 19 trillion tokens, including a 32K context extension phase. Post-training uses three stages: (1) supervised fine-tuning with distillation from LFM2.5-350M, (2) direct preference optimization (DPO), and (3) multi-domain reinforcement learning. The recipe preserves flexibility for downstream fine-tuning while delivering strong out-of-the-box tool-use and extraction capability.

What inference runtimes support LFM2.5-230M?

Day-one support across llama.cpp (GGUF for edge), MLX (Apple Silicon), vLLM and SGLang (GPU serving), and ONNX (cross-platform accelerators). Liquid AI also ships an internal GPU inference stack for low-latency enterprise deployments benchmarked against SGLang-served competitors.

How is Liquid AI using LFM2.5-230M in robotics?

As an early demo, Liquid AI deployed LFM2.5-230M on a Unitree G1 humanoid running entirely on-device on its onboard NVIDIA Jetson Orin. After a quick fine-tune, the model acts as a skill-selection layer: it takes natural-language instructions and decomposes them into tool calls that invoke pre-trained low-level skills from NVIDIA's SONIC framework — e.g. timed walking at target velocity and one-legged kneel sequences.

LFM2.5-230M is Liquid AI's smallest open-weight foundation model, released June 25, 2026. At 230 million parameters it is built on the LFM2 architecture for fast inference on CPUs, NPUs, and GPUs — targeting lightweight agentic workloads on phones, robots, home automation, and network devices. Base (LFM2.5-230M-Base) and post-trained (LFM2.5-230M) checkpoints are on Hugging Face.

How fast is LFM2.5-230M on edge hardware?

Liquid AI reports 213 tokens per second decode speed on a Samsung Galaxy S25 Ultra (Qualcomm Snapdragon Gen4 CPU) and 42 tok/s on a Raspberry Pi 5 CPU. The model delivers the highest prefill and decode throughput in its class on both platforms while keeping the smallest memory footprint among comparable small models.

What is LFM2.5-230M good at — and what should you avoid?

Liquid AI recommends it for large-scale data extraction pipelines and lightweight on-device agentic workloads — instruction following, structured extraction, and tool use (BFCL benchmarks). It explicitly does not recommend the model for reasoning-heavy tasks: advanced math, code generation, or creative writing. For frontier reasoning at small scale, see specialist models like VibeThinker-3B instead.

How was LFM2.5-230M trained?

Pre-training on 19 trillion tokens, including a 32K context extension phase. Post-training uses three stages: (1) supervised fine-tuning with distillation from LFM2.5-350M, (2) direct preference optimization (DPO), and (3) multi-domain reinforcement learning. The recipe preserves flexibility for downstream fine-tuning while delivering strong out-of-the-box tool-use and extraction capability.

What inference runtimes support LFM2.5-230M?

Day-one support across llama.cpp (GGUF for edge), MLX (Apple Silicon), vLLM and SGLang (GPU serving), and ONNX (cross-platform accelerators). Liquid AI also ships an internal GPU inference stack for low-latency enterprise deployments benchmarked against SGLang-served competitors.

How is Liquid AI using LFM2.5-230M in robotics?

As an early demo, Liquid AI deployed LFM2.5-230M on a Unitree G1 humanoid running entirely on-device on its onboard NVIDIA Jetson Orin. After a quick fine-tune, the model acts as a skill-selection layer: it takes natural-language instructions and decomposes them into tool calls that invoke pre-trained low-level skills from NVIDIA's SONIC framework — e.g. timed walking at target velocity and one-legged kneel sequences.

LFM2.5-230M is Liquid AI's smallest open-weight foundation model, released June 25, 2026. At 230 million parameters it is built on the LFM2 architecture for fast inference on CPUs, NPUs, and GPUs — targeting lightweight agentic workloads on phones, robots, home automation, and network devices. Base (LFM2.5-230M-Base) and post-trained (LFM2.5-230M) checkpoints are on Hugging Face.

How fast is LFM2.5-230M on edge hardware?

Liquid AI reports 213 tokens per second decode speed on a Samsung Galaxy S25 Ultra (Qualcomm Snapdragon Gen4 CPU) and 42 tok/s on a Raspberry Pi 5 CPU. The model delivers the highest prefill and decode throughput in its class on both platforms while keeping the smallest memory footprint among comparable small models.

What is LFM2.5-230M good at — and what should you avoid?

Liquid AI recommends it for large-scale data extraction pipelines and lightweight on-device agentic workloads — instruction following, structured extraction, and tool use (BFCL benchmarks). It explicitly does not recommend the model for reasoning-heavy tasks: advanced math, code generation, or creative writing. For frontier reasoning at small scale, see specialist models like VibeThinker-3B instead.

How was LFM2.5-230M trained?

Pre-training on 19 trillion tokens, including a 32K context extension phase. Post-training uses three stages: (1) supervised fine-tuning with distillation from LFM2.5-350M, (2) direct preference optimization (DPO), and (3) multi-domain reinforcement learning. The recipe preserves flexibility for downstream fine-tuning while delivering strong out-of-the-box tool-use and extraction capability.

What inference runtimes support LFM2.5-230M?

Day-one support across llama.cpp (GGUF for edge), MLX (Apple Silicon), vLLM and SGLang (GPU serving), and ONNX (cross-platform accelerators). Liquid AI also ships an internal GPU inference stack for low-latency enterprise deployments benchmarked against SGLang-served competitors.

How is Liquid AI using LFM2.5-230M in robotics?

As an early demo, Liquid AI deployed LFM2.5-230M on a Unitree G1 humanoid running entirely on-device on its onboard NVIDIA Jetson Orin. After a quick fine-tune, the model acts as a skill-selection layer: it takes natural-language instructions and decomposes them into tool calls that invoke pre-trained low-level skills from NVIDIA's SONIC framework — e.g. timed walking at target velocity and one-legged kneel sequences.

LFM2.5-230M is Liquid AI's smallest open-weight foundation model, released June 25, 2026. At 230 million parameters it is built on the LFM2 architecture for fast inference on CPUs, NPUs, and GPUs — targeting lightweight agentic workloads on phones, robots, home automation, and network devices. Base (LFM2.5-230M-Base) and post-trained (LFM2.5-230M) checkpoints are on Hugging Face.

How fast is LFM2.5-230M on edge hardware?

Liquid AI reports 213 tokens per second decode speed on a Samsung Galaxy S25 Ultra (Qualcomm Snapdragon Gen4 CPU) and 42 tok/s on a Raspberry Pi 5 CPU. The model delivers the highest prefill and decode throughput in its class on both platforms while keeping the smallest memory footprint among comparable small models.

What is LFM2.5-230M good at — and what should you avoid?

Liquid AI recommends it for large-scale data extraction pipelines and lightweight on-device agentic workloads — instruction following, structured extraction, and tool use (BFCL benchmarks). It explicitly does not recommend the model for reasoning-heavy tasks: advanced math, code generation, or creative writing. For frontier reasoning at small scale, see specialist models like VibeThinker-3B instead.

How was LFM2.5-230M trained?

Pre-training on 19 trillion tokens, including a 32K context extension phase. Post-training uses three stages: (1) supervised fine-tuning with distillation from LFM2.5-350M, (2) direct preference optimization (DPO), and (3) multi-domain reinforcement learning. The recipe preserves flexibility for downstream fine-tuning while delivering strong out-of-the-box tool-use and extraction capability.

What inference runtimes support LFM2.5-230M?

Day-one support across llama.cpp (GGUF for edge), MLX (Apple Silicon), vLLM and SGLang (GPU serving), and ONNX (cross-platform accelerators). Liquid AI also ships an internal GPU inference stack for low-latency enterprise deployments benchmarked against SGLang-served competitors.

How is Liquid AI using LFM2.5-230M in robotics?

As an early demo, Liquid AI deployed LFM2.5-230M on a Unitree G1 humanoid running entirely on-device on its onboard NVIDIA Jetson Orin. After a quick fine-tune, the model acts as a skill-selection layer: it takes natural-language instructions and decomposes them into tool calls that invoke pre-trained low-level skills from NVIDIA's SONIC framework — e.g. timed walking at target velocity and one-legged kneel sequences.

LFM2.5-230M is Liquid AI's smallest open-weight foundation model, released June 25, 2026. At 230 million parameters it is built on the LFM2 architecture for fast inference on CPUs, NPUs, and GPUs — targeting lightweight agentic workloads on phones, robots, home automation, and network devices. Base (LFM2.5-230M-Base) and post-trained (LFM2.5-230M) checkpoints are on Hugging Face.

How fast is LFM2.5-230M on edge hardware?

Liquid AI reports 213 tokens per second decode speed on a Samsung Galaxy S25 Ultra (Qualcomm Snapdragon Gen4 CPU) and 42 tok/s on a Raspberry Pi 5 CPU. The model delivers the highest prefill and decode throughput in its class on both platforms while keeping the smallest memory footprint among comparable small models.

What is LFM2.5-230M good at — and what should you avoid?

Liquid AI recommends it for large-scale data extraction pipelines and lightweight on-device agentic workloads — instruction following, structured extraction, and tool use (BFCL benchmarks). It explicitly does not recommend the model for reasoning-heavy tasks: advanced math, code generation, or creative writing. For frontier reasoning at small scale, see specialist models like VibeThinker-3B instead.

How was LFM2.5-230M trained?

Pre-training on 19 trillion tokens, including a 32K context extension phase. Post-training uses three stages: (1) supervised fine-tuning with distillation from LFM2.5-350M, (2) direct preference optimization (DPO), and (3) multi-domain reinforcement learning. The recipe preserves flexibility for downstream fine-tuning while delivering strong out-of-the-box tool-use and extraction capability.

What inference runtimes support LFM2.5-230M?

Day-one support across llama.cpp (GGUF for edge), MLX (Apple Silicon), vLLM and SGLang (GPU serving), and ONNX (cross-platform accelerators). Liquid AI also ships an internal GPU inference stack for low-latency enterprise deployments benchmarked against SGLang-served competitors.

How is Liquid AI using LFM2.5-230M in robotics?

As an early demo, Liquid AI deployed LFM2.5-230M on a Unitree G1 humanoid running entirely on-device on its onboard NVIDIA Jetson Orin. After a quick fine-tune, the model acts as a skill-selection layer: it takes natural-language instructions and decomposes them into tool calls that invoke pre-trained low-level skills from NVIDIA's SONIC framework — e.g. timed walking at target velocity and one-legged kneel sequences.

LFM2.5-230M is Liquid AI's smallest open-weight foundation model, released June 25, 2026. At 230 million parameters it is built on the LFM2 architecture for fast inference on CPUs, NPUs, and GPUs — targeting lightweight agentic workloads on phones, robots, home automation, and network devices. Base (LFM2.5-230M-Base) and post-trained (LFM2.5-230M) checkpoints are on Hugging Face.

How fast is LFM2.5-230M on edge hardware?

Liquid AI reports 213 tokens per second decode speed on a Samsung Galaxy S25 Ultra (Qualcomm Snapdragon Gen4 CPU) and 42 tok/s on a Raspberry Pi 5 CPU. The model delivers the highest prefill and decode throughput in its class on both platforms while keeping the smallest memory footprint among comparable small models.

What is LFM2.5-230M good at — and what should you avoid?

Liquid AI recommends it for large-scale data extraction pipelines and lightweight on-device agentic workloads — instruction following, structured extraction, and tool use (BFCL benchmarks). It explicitly does not recommend the model for reasoning-heavy tasks: advanced math, code generation, or creative writing. For frontier reasoning at small scale, see specialist models like VibeThinker-3B instead.

How was LFM2.5-230M trained?

Pre-training on 19 trillion tokens, including a 32K context extension phase. Post-training uses three stages: (1) supervised fine-tuning with distillation from LFM2.5-350M, (2) direct preference optimization (DPO), and (3) multi-domain reinforcement learning. The recipe preserves flexibility for downstream fine-tuning while delivering strong out-of-the-box tool-use and extraction capability.

What inference runtimes support LFM2.5-230M?

Day-one support across llama.cpp (GGUF for edge), MLX (Apple Silicon), vLLM and SGLang (GPU serving), and ONNX (cross-platform accelerators). Liquid AI also ships an internal GPU inference stack for low-latency enterprise deployments benchmarked against SGLang-served competitors.

How is Liquid AI using LFM2.5-230M in robotics?

As an early demo, Liquid AI deployed LFM2.5-230M on a Unitree G1 humanoid running entirely on-device on its onboard NVIDIA Jetson Orin. After a quick fine-tune, the model acts as a skill-selection layer: it takes natural-language instructions and decomposes them into tool calls that invoke pre-trained low-level skills from NVIDIA's SONIC framework — e.g. timed walking at target velocity and one-legged kneel sequences.

LFM2.5-230M is Liquid AI's smallest open-weight foundation model, released June 25, 2026. At 230 million parameters it is built on the LFM2 architecture for fast inference on CPUs, NPUs, and GPUs — targeting lightweight agentic workloads on phones, robots, home automation, and network devices. Base (LFM2.5-230M-Base) and post-trained (LFM2.5-230M) checkpoints are on Hugging Face.

How fast is LFM2.5-230M on edge hardware?

Liquid AI reports 213 tokens per second decode speed on a Samsung Galaxy S25 Ultra (Qualcomm Snapdragon Gen4 CPU) and 42 tok/s on a Raspberry Pi 5 CPU. The model delivers the highest prefill and decode throughput in its class on both platforms while keeping the smallest memory footprint among comparable small models.

What is LFM2.5-230M good at — and what should you avoid?

Liquid AI recommends it for large-scale data extraction pipelines and lightweight on-device agentic workloads — instruction following, structured extraction, and tool use (BFCL benchmarks). It explicitly does not recommend the model for reasoning-heavy tasks: advanced math, code generation, or creative writing. For frontier reasoning at small scale, see specialist models like VibeThinker-3B instead.

How was LFM2.5-230M trained?

Pre-training on 19 trillion tokens, including a 32K context extension phase. Post-training uses three stages: (1) supervised fine-tuning with distillation from LFM2.5-350M, (2) direct preference optimization (DPO), and (3) multi-domain reinforcement learning. The recipe preserves flexibility for downstream fine-tuning while delivering strong out-of-the-box tool-use and extraction capability.

What inference runtimes support LFM2.5-230M?

Day-one support across llama.cpp (GGUF for edge), MLX (Apple Silicon), vLLM and SGLang (GPU serving), and ONNX (cross-platform accelerators). Liquid AI also ships an internal GPU inference stack for low-latency enterprise deployments benchmarked against SGLang-served competitors.

How is Liquid AI using LFM2.5-230M in robotics?

As an early demo, Liquid AI deployed LFM2.5-230M on a Unitree G1 humanoid running entirely on-device on its onboard NVIDIA Jetson Orin. After a quick fine-tune, the model acts as a skill-selection layer: it takes natural-language instructions and decomposes them into tool calls that invoke pre-trained low-level skills from NVIDIA's SONIC framework — e.g. timed walking at target velocity and one-legged kneel sequences.

LFM2.5-230M is Liquid AI's smallest open-weight foundation model, released June 25, 2026. At 230 million parameters it is built on the LFM2 architecture for fast inference on CPUs, NPUs, and GPUs — targeting lightweight agentic workloads on phones, robots, home automation, and network devices. Base (LFM2.5-230M-Base) and post-trained (LFM2.5-230M) checkpoints are on Hugging Face.

How fast is LFM2.5-230M on edge hardware?

Liquid AI reports 213 tokens per second decode speed on a Samsung Galaxy S25 Ultra (Qualcomm Snapdragon Gen4 CPU) and 42 tok/s on a Raspberry Pi 5 CPU. The model delivers the highest prefill and decode throughput in its class on both platforms while keeping the smallest memory footprint among comparable small models.

What is LFM2.5-230M good at — and what should you avoid?

Liquid AI recommends it for large-scale data extraction pipelines and lightweight on-device agentic workloads — instruction following, structured extraction, and tool use (BFCL benchmarks). It explicitly does not recommend the model for reasoning-heavy tasks: advanced math, code generation, or creative writing. For frontier reasoning at small scale, see specialist models like VibeThinker-3B instead.

How was LFM2.5-230M trained?

Pre-training on 19 trillion tokens, including a 32K context extension phase. Post-training uses three stages: (1) supervised fine-tuning with distillation from LFM2.5-350M, (2) direct preference optimization (DPO), and (3) multi-domain reinforcement learning. The recipe preserves flexibility for downstream fine-tuning while delivering strong out-of-the-box tool-use and extraction capability.

What inference runtimes support LFM2.5-230M?

Day-one support across llama.cpp (GGUF for edge), MLX (Apple Silicon), vLLM and SGLang (GPU serving), and ONNX (cross-platform accelerators). Liquid AI also ships an internal GPU inference stack for low-latency enterprise deployments benchmarked against SGLang-served competitors.

How is Liquid AI using LFM2.5-230M in robotics?

As an early demo, Liquid AI deployed LFM2.5-230M on a Unitree G1 humanoid running entirely on-device on its onboard NVIDIA Jetson Orin. After a quick fine-tune, the model acts as a skill-selection layer: it takes natural-language instructions and decomposes them into tool calls that invoke pre-trained low-level skills from NVIDIA's SONIC framework — e.g. timed walking at target velocity and one-legged kneel sequences.

LFM2.5-230M is Liquid AI's smallest open-weight foundation model, released June 25, 2026. At 230 million parameters it is built on the LFM2 architecture for fast inference on CPUs, NPUs, and GPUs — targeting lightweight agentic workloads on phones, robots, home automation, and network devices. Base (LFM2.5-230M-Base) and post-trained (LFM2.5-230M) checkpoints are on Hugging Face.

How fast is LFM2.5-230M on edge hardware?

Liquid AI reports 213 tokens per second decode speed on a Samsung Galaxy S25 Ultra (Qualcomm Snapdragon Gen4 CPU) and 42 tok/s on a Raspberry Pi 5 CPU. The model delivers the highest prefill and decode throughput in its class on both platforms while keeping the smallest memory footprint among comparable small models.

What is LFM2.5-230M good at — and what should you avoid?

Liquid AI recommends it for large-scale data extraction pipelines and lightweight on-device agentic workloads — instruction following, structured extraction, and tool use (BFCL benchmarks). It explicitly does not recommend the model for reasoning-heavy tasks: advanced math, code generation, or creative writing. For frontier reasoning at small scale, see specialist models like VibeThinker-3B instead.

LFM2.5-230M: Liquid AI Edge Agent Model — 213 tok/s on Phone CPU | explainx.ai Blog

On June 25, 2026, Liquid AI released LFM2.5-230M — its smallest foundation model yet, and one of the clearest 2026 statements about where the edge-AI market is heading: not bigger models in the cloud, but fast, open-weight models that run agentic tool loops on the device you already have.

Liquid AI's framing on X (@liquidai) and in the official blog post is explicit: LFM2.5-230M is built to run anywhere — cloud GPUs, phone CPUs, Raspberry Pi boards, and robot onboard computers — and to power data extraction pipelines and lightweight on-device agentic workloads, not frontier math or long-form creative writing.

TL;DR

Spec	LFM2.5-230M
Parameters	230M (smallest in LFM2.5 family)
Architecture	LFM2 (Liquid Foundation Model v2)
Pre-training	19T tokens + 32K context extension
Post-training	SFT (distilled from LFM2.5-350M) → DPO → multi-domain RL
Variants	LFM2.5-230M-Base, LFM2.5-230M (post-trained)
Availability	Hugging Face (open-weight)
Phone CPU decode	213 tok/s (Galaxy S25 Ultra, Snapdragon Gen4)
Pi 5 CPU decode	42 tok/s (Raspberry Pi 5)
Best for	Tool use, data extraction, instruction following
Avoid for	Advanced math, code generation, creative writing
Inference	llama.cpp, MLX, vLLM, SGLang, ONNX

Why Liquid AI Built a 230M Model

The small-model landscape in mid-2026 splits into two camps:

Reasoning specialists — models like VibeThinker-3B that compress verifiable math and coding into compact parameter counts.
Edge agents — models optimized for speed, tool calling, and structured extraction on constrained hardware.

LFM2.5-230M is firmly in the second camp. Liquid AI is not trying to beat Claude Fable 5 on SWE-Bench. It is trying to make "hold still for 2 seconds, then walk forward at 1 meter per second" parse into a valid multi-step robot skill plan — on a Jetson Orin, with no cloud round-trip.

That use case — natural language → structured tool calls → physical action — is the same pattern emerging across home automation, industrial IoT, and phone-based agents. The bottleneck is not raw IQ. It is latency, memory footprint, and inference cost per tool loop.

Training Recipe

Liquid AI's post-training pipeline is designed to preserve downstream fine-tuning flexibility while shipping strong default capability:

Stage 1: Supervised fine-tuning with distillation

The 230M model learns from LFM2.5-350M — a larger sibling in the same architecture family. Distillation from a bigger in-family model is a proven pattern for small models: the teacher provides richer supervision signals than raw pre-training alone, without requiring the student to match the teacher on every task class.

Stage 2: Direct preference optimization (DPO)

DPO aligns the model with human-preferred outputs without a separate reward model training loop — lighter-weight than classic RLHF for a model this size.

Stage 3: Multi-domain reinforcement learning

RL across multiple domains pushes tool-use and extraction behavior beyond what SFT alone achieves — similar in spirit to the multi-domain RL stage in other 2026 small-model pipelines, but tuned for applied tasks rather than competition math.

The base checkpoint (LFM2.5-230M-Base) skips post-training for developers who want a clean starting point for domain-specific fine-tunes.

Benchmarks: Beats Models Twice Its Size — on the Right Tasks

Liquid AI evaluated LFM2.5-230M across ten benchmarks. The headline from the blog post: despite 230M parameters, it competes with and often beats models more than twice as large on instruction following, data extraction, and tool use.

Knowledge and instruction following

Model	GPQA Diamond	MMLU-Pro	IFEval	IFBench	Multi-IF
LFM2.5-230M	25.41	20.25	71.71	38.40	37.70
LFM2.5-350M	30.64	20.01	76.96	40.69	44.92
LFM2-350M	27.58	19.29	64.96	18.20	32.92
Granite 4.0-H-350M	22.32	13.14	61.27	17.22	28.70
Qwen3.5-0.8B (Instruct)	27.41	37.42	59.94	22.87	41.68
Gemma 3 1B IT	23.89	14.04	63.49	20.33	44.25

On IFEval and IFBench, LFM2.5-230M leads Gemma 3 1B and Qwen3.5-0.8B despite being 3–4× smaller. On broad knowledge (MMLU-Pro), Qwen3.5-0.8B still wins — consistent with the Parametric Compression-Coverage pattern: knowledge coverage scales with parameters differently than instruction-following discipline.

Tool use and data extraction

Model	CaseReportBench	BFCLv3	BFCLv4	τ²-Bench Telecom	τ²-Bench Retail
LFM2.5-230M	22.51	43.26	21.03	5.26	13.68
LFM2.5-350M	32.45	44.11	21.86	18.86	17.84
LFM2-350M	11.67	22.95	12.29	10.82	5.56
Granite 4.0-H-350M	12.44	43.07	13.28	13.74	6.14
Qwen3.5-0.8B (Instruct)	13.83	35.08	18.70	12.57	6.14

BFCLv3 (Berkeley Function Calling Leaderboard) scores above 43 put LFM2.5-230M in the same tier as Granite 4.0-H-350M — a model with ~50% more parameters. CaseReportBench (structured medical/clinical data extraction) at 22.51 beats Qwen3.5-0.8B (13.83) and LFM2-350M (11.67) by wide margins.

The τ²-Bench telecom scores are low across the board for 230M — multi-turn customer-service simulation is hard at this scale. Retail is relatively stronger (13.68), suggesting the model handles simpler structured agent scenarios better than long conversational tool chains.

CPU Speed: 213 tok/s on a Phone, 42 tok/s on a Pi

Raw benchmark scores matter less if inference is too slow for real-time agents. Liquid AI's CPU numbers are the release's most practical signal:

Platform	Hardware	Decode throughput
Samsung Galaxy S25 Ultra	Qualcomm Snapdragon Gen4 (CPU)	213 tok/s
Raspberry Pi 5	ARM CPU	42 tok/s

Liquid AI compares LFM2.5-230M against similar-sized attention-based and hybrid models (SSM hybrids, Gated Delta Networks) and reports the highest prefill and decode throughput in its class with the smallest memory footprint.

Flash-attention tuning is device-specific: enabled (-fa 1) on Raspberry Pi 5, disabled (-fa 0) on Snapdragon Gen4 — a reminder that edge deployment is as much about per-platform tuning as model selection. See our quantization guide for the broader stack of techniques that make sub-billion models viable on consumer hardware.

Inference Ecosystem: Day-One Support

LFM2.5-230M ships with checkpoints across the edge-to-cloud inference stack:

Runtime	Use case
llama.cpp	GGUF checkpoints for Raspberry Pi, phones, embedded
MLX	Apple Silicon (Mac, iPhone via future MLX ports)
vLLM / SGLang	GPU-accelerated production serving
ONNX	Cross-platform deployment across diverse accelerators

For production GPU serving, Liquid AI also benchmarks an internal inference stack against SGLang-served competitors — reporting lower end-to-end latency across concurrency levels for LFM2.5 models.

Unitree G1 Demo: Natural Language → Robot Skills

The most visually compelling demo in the release is not a benchmark table — it is a Unitree G1 humanoid robot running LFM2.5-230M entirely on-device on its onboard NVIDIA Jetson Orin.

The architecture:

User speaks a free-form natural-language command.
LFM2.5-230M (after a quick fine-tune) acts as a skill-selection layer.
The model decomposes the instruction into a sequence of tool calls.
Each tool call invokes a pre-trained low-level skill from NVIDIA's SONIC framework — timed walking, velocity targets, one-legged kneel holds, etc.

Example command from Liquid AI's blog:

"Hold still for 2 seconds, then walk forward at 1 meter per second for 3 meters, hold a forward one-leg kneel for 5 seconds, and walk backward at 0.5 meters per second for 3 meters"

The model outputs a structured multi-step plan chaining skills like timed walking and kneel holds — without cloud inference.

This parallels the Gemma 4 + Open Duck Mini demo at Google I/O 2026 — but with a different model class: 230M parameters focused on tool decomposition rather than 2B multimodal conversation. Both demos point to the same product direction — robots and edge devices need a language-to-action compiler, not a chatbot.

Liquid AI's demo video: YouTube Shorts — Unitree G1 + LFM2.5-230M

Where LFM2.5-230M Fits in the Small-Model Landscape

Model	Params	Strength	Weakness
LFM2.5-230M	230M	Speed, tool use, extraction, edge agents	Math, code, creative writing
MiniCPM5-1B	1B	Broad open-model intelligence at 0.5GB	Heavier than 230M for pure tool loops
VibeThinker-3B	3B	AIME 94.3, frontier verifiable reasoning	Too large for Pi-class real-time agents
Gemma 4 E2B	2B	Multimodal on-device (vision + speech)	Different deployment path (LiteRT)

Liquid AI's honest limitation statement is refreshing: do not use LFM2.5-230M for advanced math, code generation, or creative writing. That clarity helps developers route tasks correctly — use a 230M model for the tool-selection layer in a pipeline, and call a larger model (or cloud API) only when the subtask requires it.

For agentic coding on developer machines, models like Claude Opus 4.8 or OpenRouter Fusion remain the practical choice while Fable 5 stays suspended. LFM2.5-230M targets a different surface: phones, robots, home automation, and high-volume extraction pipelines where cost and latency dominate.

Get Started

Both checkpoints are available now:

LFM2.5-230M — post-trained, ready for tool-use and extraction workloads
LFM2.5-230M-Base — pre-trained base for custom fine-tuning

Download from Hugging Face and follow Liquid AI's documentation for local run and fine-tune instructions.

Liquid AI's broader LFM2.5 family spans base models, audio variants, and vision variants under one architecture — positioning the company as an efficiency-first alternative to scaling-parameter frontier labs.

Related ExplainX coverage

Post	Connection
Gemma 4 + Open Duck Mini	On-device robot demo on Pi 5 and Jetson Orin
MiniCPM5-1B	Another open small-model breakthrough at sub-1B scale
VibeThinker-3B	Opposite end: frontier reasoning in a compact model
AI Model Quantization Guide	How sub-billion models run on phones and edge boards
NVIDIA N1X at Computex 2026	On-device AI compute trend on consumer hardware

Summary

LFM2.5-230M is Liquid AI's bet that the next wave of useful AI is not another 100B-parameter cloud model — it is a 230M-parameter open-weight agent that runs at 213 tok/s on a phone CPU, parses natural language into tool calls on a humanoid robot, and beats models twice its size on instruction following and data extraction.

It is explicitly not a reasoning or coding model. It is an edge agent compiler — fast, small, and deployable everywhere from a Raspberry Pi to a Jetson Orin to a Snapdragon phone.

Last updated: June 26, 2026. Specs and benchmarks sourced from Liquid AI's blog post and @liquidai on X, published June 25, 2026.

TL;DR

Spec	LFM2.5-230M
Parameters	230M (smallest in LFM2.5 family)
Architecture	LFM2 (Liquid Foundation Model v2)
Pre-training	19T tokens + 32K context extension
Post-training	SFT (distilled from LFM2.5-350M) → DPO → multi-domain RL
Variants	LFM2.5-230M-Base, LFM2.5-230M (post-trained)
Availability	Hugging Face (open-weight)
Phone CPU decode	213 tok/s (Galaxy S25 Ultra, Snapdragon Gen4)
Pi 5 CPU decode	42 tok/s (Raspberry Pi 5)
Best for	Tool use, data extraction, instruction following
Avoid for	Advanced math, code generation, creative writing
Inference	llama.cpp, MLX, vLLM, SGLang, ONNX

Why Liquid AI Built a 230M Model

The small-model landscape in mid-2026 splits into two camps:

Reasoning specialists — models like VibeThinker-3B that compress verifiable math and coding into compact parameter counts.
Edge agents — models optimized for speed, tool calling, and structured extraction on constrained hardware.

Training Recipe

Liquid AI's post-training pipeline is designed to preserve downstream fine-tuning flexibility while shipping strong default capability:

Stage 1: Supervised fine-tuning with distillation

Stage 2: Direct preference optimization (DPO)

DPO aligns the model with human-preferred outputs without a separate reward model training loop — lighter-weight than classic RLHF for a model this size.

Stage 3: Multi-domain reinforcement learning

The base checkpoint (LFM2.5-230M-Base) skips post-training for developers who want a clean starting point for domain-specific fine-tunes.

Benchmarks: Beats Models Twice Its Size — on the Right Tasks

Knowledge and instruction following

Model	GPQA Diamond	MMLU-Pro	IFEval	IFBench	Multi-IF
LFM2.5-230M	25.41	20.25	71.71	38.40	37.70
LFM2.5-350M	30.64	20.01	76.96	40.69	44.92
LFM2-350M	27.58	19.29	64.96	18.20	32.92
Granite 4.0-H-350M	22.32	13.14	61.27	17.22	28.70
Qwen3.5-0.8B (Instruct)	27.41	37.42	59.94	22.87	41.68
Gemma 3 1B IT	23.89	14.04	63.49	20.33	44.25

Tool use and data extraction

Model	CaseReportBench	BFCLv3	BFCLv4	τ²-Bench Telecom	τ²-Bench Retail
LFM2.5-230M	22.51	43.26	21.03	5.26	13.68
LFM2.5-350M	32.45	44.11	21.86	18.86	17.84
LFM2-350M	11.67	22.95	12.29	10.82	5.56
Granite 4.0-H-350M	12.44	43.07	13.28	13.74	6.14
Qwen3.5-0.8B (Instruct)	13.83	35.08	18.70	12.57	6.14

CPU Speed: 213 tok/s on a Phone, 42 tok/s on a Pi

Raw benchmark scores matter less if inference is too slow for real-time agents. Liquid AI's CPU numbers are the release's most practical signal:

Platform	Hardware	Decode throughput
Samsung Galaxy S25 Ultra	Qualcomm Snapdragon Gen4 (CPU)	213 tok/s
Raspberry Pi 5	ARM CPU	42 tok/s

Inference Ecosystem: Day-One Support

LFM2.5-230M ships with checkpoints across the edge-to-cloud inference stack:

Runtime	Use case
llama.cpp	GGUF checkpoints for Raspberry Pi, phones, embedded
MLX	Apple Silicon (Mac, iPhone via future MLX ports)
vLLM / SGLang	GPU-accelerated production serving
ONNX	Cross-platform deployment across diverse accelerators

Unitree G1 Demo: Natural Language → Robot Skills

The most visually compelling demo in the release is not a benchmark table — it is a Unitree G1 humanoid robot running LFM2.5-230M entirely on-device on its onboard NVIDIA Jetson Orin.

The architecture:

User speaks a free-form natural-language command.
LFM2.5-230M (after a quick fine-tune) acts as a skill-selection layer.
The model decomposes the instruction into a sequence of tool calls.
Each tool call invokes a pre-trained low-level skill from NVIDIA's SONIC framework — timed walking, velocity targets, one-legged kneel holds, etc.

Example command from Liquid AI's blog:

"Hold still for 2 seconds, then walk forward at 1 meter per second for 3 meters, hold a forward one-leg kneel for 5 seconds, and walk backward at 0.5 meters per second for 3 meters"

The model outputs a structured multi-step plan chaining skills like timed walking and kneel holds — without cloud inference.

Liquid AI's demo video: YouTube Shorts — Unitree G1 + LFM2.5-230M

Where LFM2.5-230M Fits in the Small-Model Landscape

Model	Params	Strength	Weakness
LFM2.5-230M	230M	Speed, tool use, extraction, edge agents	Math, code, creative writing
MiniCPM5-1B	1B	Broad open-model intelligence at 0.5GB	Heavier than 230M for pure tool loops
VibeThinker-3B	3B	AIME 94.3, frontier verifiable reasoning	Too large for Pi-class real-time agents
Gemma 4 E2B	2B	Multimodal on-device (vision + speech)	Different deployment path (LiteRT)

Get Started

Both checkpoints are available now:

LFM2.5-230M — post-trained, ready for tool-use and extraction workloads
LFM2.5-230M-Base — pre-trained base for custom fine-tuning

Download from Hugging Face and follow Liquid AI's documentation for local run and fine-tune instructions.

Related ExplainX coverage

Post	Connection
Gemma 4 + Open Duck Mini	On-device robot demo on Pi 5 and Jetson Orin
MiniCPM5-1B	Another open small-model breakthrough at sub-1B scale
VibeThinker-3B	Opposite end: frontier reasoning in a compact model
AI Model Quantization Guide	How sub-billion models run on phones and edge boards
NVIDIA N1X at Computex 2026	On-device AI compute trend on consumer hardware

Summary

It is explicitly not a reasoning or coding model. It is an edge agent compiler — fast, small, and deployable everywhere from a Raspberry Pi to a Jetson Orin to a Snapdragon phone.

Last updated: June 26, 2026. Specs and benchmarks sourced from Liquid AI's blog post and @liquidai on X, published June 25, 2026.

TL;DR

Why Liquid AI Built a 230M Model

Training Recipe

Stage 1: Supervised fine-tuning with distillation

Stage 2: Direct preference optimization (DPO)

Stage 3: Multi-domain reinforcement learning

Benchmarks: Beats Models Twice Its Size — on the Right Tasks

Knowledge and instruction following

Tool use and data extraction

CPU Speed: 213 tok/s on a Phone, 42 tok/s on a Pi

Inference Ecosystem: Day-One Support

Unitree G1 Demo: Natural Language → Robot Skills

Where LFM2.5-230M Fits in the Small-Model Landscape

Get Started

Related ExplainX coverage

Summary

Related posts

Gemma 4 Powers Open Duck Mini: Meet Autumn, the On-Device AI Robot Duck

MiniCPM5-1B: The Tiny 1B Model That's Crushing 2B+ AI Models

MinerU 3.4: PDF and Office Parsing for LLM, RAG, and Agent Workflows

TL;DR

Why Liquid AI Built a 230M Model

Training Recipe

Stage 1: Supervised fine-tuning with distillation

Stage 2: Direct preference optimization (DPO)

Stage 3: Multi-domain reinforcement learning

Benchmarks: Beats Models Twice Its Size — on the Right Tasks

Knowledge and instruction following

Tool use and data extraction

CPU Speed: 213 tok/s on a Phone, 42 tok/s on a Pi

Inference Ecosystem: Day-One Support

Unitree G1 Demo: Natural Language → Robot Skills

Where LFM2.5-230M Fits in the Small-Model Landscape

Get Started

Related ExplainX coverage

Summary

Related posts

Gemma 4 Powers Open Duck Mini: Meet Autumn, the On-Device AI Robot Duck

MiniCPM5-1B: The Tiny 1B Model That's Crushing 2B+ AI Models

MinerU 3.4: PDF and Office Parsing for LLM, RAG, and Agent Workflows