
DINO-X

by idea-research


Empower LLMs with fine-grained visual understanding — detect, localize, and describe anything in images with natural language prompts.

github stars

112

  • / Fine-grained object detection and localization
  • / Structured JSON outputs with coordinates
  • / Multiple transport modes (local/cloud)

best for

  • / Building visual AI applications and chatbots
  • / Automating visual inspection workflows
  • / Creating multimodal reasoning systems

capabilities

  • / Detect objects in images using natural language queries
  • / Generate region-level descriptions of image areas
  • / Count and locate specific objects with coordinates
  • / Analyze full images for detailed understanding
  • / Create annotated visualizations with bounding boxes
  • / Process images from local files or web URLs

what it does

Provides AI-powered object detection and visual analysis in images using natural language prompts. Works with local files or web URLs to find, locate, and describe specific objects or regions.

about

DINO-X is a community-built MCP server published by idea-research that provides AI assistants with tools and capabilities via the Model Context Protocol. It lets you detect, localize, and describe anything in images using natural language prompts. It is categorized under ai-ml.

how to install

You can install DINO-X in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. By default, this server runs locally on your machine via the stdio transport; a hosted Streamable HTTP endpoint is also available (see the readme below).

license

Apache-2.0

DINO-X is released under the Apache-2.0 license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.

readme

DINO-X MCP Server


DINO-X Official MCP Server — powered by the DINO-X and Grounding DINO models — brings fine-grained object detection and image understanding to your multimodal applications.

Demo video: https://dds-frontend.oss-cn-shenzhen.aliyuncs.com/dinox-mcp/dinox-mcp-en-overveiw.mp4

Why DINO-X MCP?

DINO-X MCP gives you:

  • Fine-Grained Understanding: Full image detection, object detection, and region-level descriptions.

  • Structured Outputs: Get object categories, counts, locations, and attributes for VQA and multi-step reasoning tasks.

  • Composable: Works seamlessly with other MCP servers to build end-to-end visual agents or automation pipelines.

Transport Modes

DINO-X MCP supports two transport modes:

| Feature | STDIO (default) | Streamable HTTP |
|---|---|---|
| Runtime | Local | Local or Cloud |
| Transport | Standard I/O | HTTP (streaming responses) |
| Input source | file:// and https:// | https:// only |
| Visualization | Supported (saves annotated images locally) | Not supported (for now) |

Quick Start

1. Prepare an MCP client

Any MCP-compatible client works.

2. Get your API key

Request an API key on the DINO-X platform (new users get a free quota).

3. Configure MCP

Option A: Official Hosted Streamable HTTP (Recommended)

Add the following to your MCP client config, replacing your-api-key with your own API key:

{
  "mcpServers": {
    "dinox-mcp": {
      "url": "https://mcp.deepdataspace.com/mcp?key=your-api-key"
    }
  }
}
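
Because the key travels in the URL query string, it is worth URL-encoding it in case it contains reserved characters. The following is a minimal Python sketch for generating the entry above; the helper name `hosted_config` is illustrative and not part of DINO-X MCP.

```python
import json
from urllib.parse import urlencode

# Endpoint copied from the hosted config above.
HOSTED_ENDPOINT = "https://mcp.deepdataspace.com/mcp"


def hosted_config(api_key: str, name: str = "dinox-mcp") -> dict:
    """Build the mcpServers entry for the hosted Streamable HTTP server."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    # urlencode escapes characters that are not safe inside a query string.
    url = f"{HOSTED_ENDPOINT}?{urlencode({'key': api_key})}"
    return {"mcpServers": {name: {"url": url}}}


if __name__ == "__main__":
    print(json.dumps(hosted_config("your-api-key"), indent=2))
```

Running the script prints a JSON document with the same shape as the config shown above, ready to paste into a client config file.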

Option B: Use the NPM package locally (STDIO)

Install Node.js first

  • Download the installer from nodejs.org

  • Or install it from the command line:

# macOS / Linux
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
# or
wget -qO- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash

# load nvm into current shell (choose the one you use)
source ~/.bashrc || true
source ~/.zshrc  || true

# install and use LTS Node.js
nvm install --lts
nvm use --lts

# Windows (one of the following)
winget install OpenJS.NodeJS.LTS
# or with Chocolatey (in admin PowerShell)
iwr -useb https://raw.githubusercontent.com/chocolatey/chocolatey/master/chocolateyInstall/InstallChocolatey.ps1 | iex
choco install nodejs-lts -y

Configure your MCP client:

{
  "mcpServers": {
    "dinox-mcp": {
      "command": "npx",
      "args": ["-y", "@deepdataspace/dinox-mcp"],
      "env": {
        "DINOX_API_KEY": "your-api-key-here",
        "IMAGE_STORAGE_DIRECTORY": "/path/to/your/image/directory"
      }
    }
  }
}

Note: Replace your-api-key-here with your real key.
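
Client config files often already list other MCP servers, so the entry is better merged in than pasted over the file. A minimal Python sketch of building the STDIO entry above and merging it into an existing config; the function names are illustrative, and the location of the config file depends on your client.

```python
import copy
import json


def stdio_entry(api_key: str, image_dir: str) -> dict:
    """Build the STDIO server entry shown above."""
    return {
        "command": "npx",
        "args": ["-y", "@deepdataspace/dinox-mcp"],
        "env": {
            "DINOX_API_KEY": api_key,
            "IMAGE_STORAGE_DIRECTORY": image_dir,
        },
    }


def merge_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of config with entry added under mcpServers."""
    merged = copy.deepcopy(config)  # leave the caller's dict untouched
    merged.setdefault("mcpServers", {})[name] = entry
    return merged


if __name__ == "__main__":
    # Hypothetical existing config with one unrelated server.
    existing = {"mcpServers": {"other-server": {"command": "other"}}}
    entry = stdio_entry("your-api-key-here", "/path/to/your/image/directory")
    print(json.dumps(merge_server(existing, "dinox-mcp", entry), indent=2))
```

Working on a deep copy means a failed write later on cannot leave the in-memory config half-modified.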

Option C: Run from source locally

Make sure Node.js is installed (see Option B), then:

# clone
git clone https://github.com/IDEA-Research/DINO-X-MCP.git
cd DINO-X-MCP

# install deps
npm install

# build
npm run build

Configure your MCP client:

{
  "mcpServers": {
    "dinox-mcp": {
      "command": "node",
      "args": ["/path/to/DINO-X-MCP/build/index.js"],
      "env": {
        "DINOX_API_KEY": "your-api-key-here",
        "IMAGE_STORAGE_DIRECTORY": "/path/to/your/image/directory"
      }
    }
  }
}

CLI Flags & Environment Variables

  • Common flags

    • --http: start in Streamable HTTP mode (otherwise STDIO by default)
    • --stdio: force STDIO mode
    • --dinox-api-key=...: set API key
    • --enable-client-key: allow API key via URL ?key= (Streamable HTTP only)
    • --port=<number>: HTTP port (default 3020), e.g. --port=8080
  • Environment variables

    • DINOX_API_KEY (required, unless clients supply their own key via --enable-client-key in Streamable HTTP mode): DINO-X platform API key
    • IMAGE_STORAGE_DIRECTORY (optional, STDIO): directory to save annotated images
    • AUTH_TOKEN (optional, HTTP): if set, client must send Authorization: Bearer <token>

    Examples:

# STDIO (local)
node build/index.js --dinox-api-key=your-api-key

# Streamable HTTP (server provides a shared API key)
node build/index.js --http --dinox-api-key=your-api-key

# Streamable HTTP (custom port)
node build/index.js --http --dinox-api-key=your-api-key --port=8080

# Streamable HTTP (require client-provided API key via URL)
node build/index.js --http --enable-client-key
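
The flag combinations above can also be assembled programmatically, for example when launching the server from a script. The sketch below uses only the flags listed in this section; the `server_argv` helper and its validation rule are illustrative assumptions, not part of the package.

```python
from typing import List, Optional


def server_argv(
    api_key: Optional[str] = None,
    http: bool = False,
    port: Optional[int] = None,
    enable_client_key: bool = False,
    entry: str = "build/index.js",
) -> List[str]:
    """Assemble a node command line from the documented flags."""
    if not http and (port is not None or enable_client_key):
        # --port and --enable-client-key apply to Streamable HTTP mode only.
        raise ValueError("port and enable_client_key require http=True")
    argv = ["node", entry]
    if http:
        argv.append("--http")
    if api_key is not None:
        # The key may instead come from the DINOX_API_KEY environment
        # variable; this helper does not check for that.
        argv.append(f"--dinox-api-key={api_key}")
    if enable_client_key:
        argv.append("--enable-client-key")
    if port is not None:
        argv.append(f"--port={port}")
    return argv


if __name__ == "__main__":
    print(" ".join(server_argv(api_key="your-api-key", http=True, port=8080)))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues with the key.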

Client config when using ?key=:

{
  "mcpServers": {
    "dinox-mcp": {
      "url": "http://localhost:3020/mcp?key=your-api-key"
    }
  }
}

Using AUTH_TOKEN with a gateway that injects Authorization: Bearer <token>:

AUTH_TOKEN=my-token node build/index.js --http --enable-client-key

Client example with supergateway:

{
  "mcpServers": {
    "dinox-mcp": {
      "command": "npx",
      "args": [
        "-y",
        "supergateway",
        "--streamableHttp",
        "http://localhost:3020/mcp?key=your-api-key",
        "--oauth2Bearer",
        "my-token"
      ]
    }
  }
}
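
A client that connects directly, rather than through supergateway, has to attach the same header itself. The sketch below builds (but does not send) such a request with Python's standard library. The JSON-RPC `initialize` body, the protocol version string, and the client name are assumptions based on the MCP specification and are not taken from this README.

```python
import json
import urllib.request


def mcp_initialize_request(url: str, token: str) -> urllib.request.Request:
    """Build an MCP initialize POST that carries a bearer token."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            # Assumed protocol version; use the one your client supports.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            # Hypothetical client identity, for illustration only.
            "clientInfo": {"name": "example-client", "version": "0.0.1"},
        },
    }
    headers = {
        # Must match the AUTH_TOKEN value the server was started with.
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        # Streamable HTTP servers may reply with JSON or an event stream.
        "Accept": "application/json, text/event-stream",
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


if __name__ == "__main__":
    req = mcp_initialize_request(
        "http://localhost:3020/mcp?key=your-api-key", "my-token"
    )
    print(req.get_method(), req.full_url)
    print(req.get_header("Authorization"))
```

In practice an MCP client library handles this handshake; the sketch only shows where the token and the `?key=` parameter end up on the wire.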

Tools

| Capability | Tool ID | Transport | Input | Output |
|---|---|---|---|---|
| Full-scene object detection | detect-all-objects | STDIO / HTTP | Image URL | Category + bbox + (optional) captions |
| Text-prompted object detection | detect-objects-by-text | STDIO / HTTP | Image URL + English nouns (dot-separated for multiple, e.g., person.car) | Target object bbox + (optional) captions |
| Human pose estimation | detect-human-pose-keypoints | STDIO / HTTP | Image URL | 17 keypoints + bbox + (optional) captions |
| Visualization | visualize-detection-result | STDIO only | Image URL + detection results array | Local path to annotated image |
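
Two details of the tool list are easy to get wrong on the client side: which tools exist on which transport, and the dot-separated noun format for text-prompted detection. A small Python sketch of both; the helper names are illustrative, and the tool IDs are copied from the list above.

```python
from typing import Iterable, List

# Tool IDs and their supported transports, as listed above.
TOOLS = {
    "detect-all-objects": {"stdio", "http"},
    "detect-objects-by-text": {"stdio", "http"},
    "detect-human-pose-keypoints": {"stdio", "http"},
    "visualize-detection-result": {"stdio"},
}


def tools_for(transport: str) -> List[str]:
    """List the tool IDs available on a transport ('stdio' or 'http')."""
    return sorted(tool for tool, modes in TOOLS.items() if transport in modes)


def text_prompt(nouns: Iterable[str]) -> str:
    """Join English nouns into the dot-separated form, e.g. person.car."""
    cleaned = [n.strip() for n in nouns if n.strip()]
    if not cleaned:
        raise ValueError("at least one noun is required")
    if any("." in n for n in cleaned):
        # A dot inside a noun would be read as a separator.
        raise ValueError("nouns must not contain dots")
    return ".".join(cleaned)


if __name__ == "__main__":
    print(tools_for("http"))
    print(text_prompt(["person", "car"]))
```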

🎬 Use Cases

| 🎯 Scenario | 💬 Prompt | 🖼️ Input image | ✨ Output |
|---|---|---|---|
| Detection & Localization | Detect and visualize the fire areas in the forest | [image] | [image] |
| Object Counting | Please analyze this warehouse image, detect all the cardboard boxes, count the total number | [image] | [image: https://github.com/user-attachments/assets/3f18ef8c-5e89-45fc-bd0f-f23381304272] |
| Feature Detection | Find all red cars in the image | [image] | [image] |
| Attribute Reasoning | Find the tallest person in the image, describe their clothing | [image] | [image] |
| Full Scene Detection | Find the fruit with the highest vitamin C content in the image | [image] | [image] Answer: Kiwi fruit (93mg/100g) |
| Pose Analysis | Please analyze what yoga pose this is | [image] | [image] |
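
Scenarios such as object counting or finding the tallest person come down to simple post-processing of the detection results. The sketch below assumes each detection is a dict with a `category` string and a `bbox` of `[x_min, y_min, x_max, y_max]` in pixels. That shape is an assumption made for illustration, so check it against the actual tool output.

```python
from collections import Counter
from typing import Any, Dict, List, Optional

# Assumed shape: {"category": str, "bbox": [x_min, y_min, x_max, y_max]}
Detection = Dict[str, Any]


def count_by_category(detections: List[Detection]) -> Counter:
    """Count detections per category, e.g. cardboard boxes in a warehouse."""
    return Counter(d["category"] for d in detections)


def tallest(detections: List[Detection], category: str) -> Optional[Detection]:
    """Return the detection of a category with the largest bbox height.

    Pixel height is only a proxy for real-world height, since it ignores
    perspective and distance from the camera.
    """
    candidates = [d for d in detections if d["category"] == category]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d["bbox"][3] - d["bbox"][1])


if __name__ == "__main__":
    # Made-up detections for illustration only.
    sample = [
        {"category": "box", "bbox": [0, 0, 10, 10]},
        {"category": "box", "bbox": [20, 0, 30, 10]},
        {"category": "person", "bbox": [0, 0, 5, 50]},
        {"category": "person", "bbox": [10, 0, 15, 80]},
    ]
    print(count_by_category(sample))
    print(tallest(sample, "person"))
```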

FAQ

  • Supported image sources?
    • STDIO: file:// and https://
    • Streamable HTTP: https:// only
  • Supported image formats?
    • jpg, jpeg, webp, png
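
These constraints can be checked on the client before a tool call is made. A minimal Python sketch follows; the function name is illustrative, and the extension check is only a heuristic, since an https:// URL may serve a supported image without a file extension in its path.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Schemes per transport and image formats, as listed in the FAQ above.
ALLOWED_SCHEMES = {"stdio": {"file", "https"}, "http": {"https"}}
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".webp", ".png"}


def check_image_source(uri: str, transport: str = "stdio") -> None:
    """Raise ValueError if the URI does not fit the transport or formats."""
    if transport not in ALLOWED_SCHEMES:
        raise ValueError(f"unknown transport: {transport!r}")
    parsed = urlparse(uri)
    if parsed.scheme not in ALLOWED_SCHEMES[transport]:
        raise ValueError(f"{parsed.scheme or 'no scheme'} is not allowed for {transport}")
    # Only the path is inspected, so query strings do not affect the check.
    suffix = PurePosixPath(parsed.path).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported image format: {suffix or 'none'}")


if __name__ == "__main__":
    check_image_source("file:///tmp/photo.png", "stdio")
    check_image_source("https://example.com/img/cat.JPG?x=1", "http")
    print("both sources accepted")
```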

Development & Debugging

Use watch mode to auto-rebuild during development:

npm run watch

Use MCP Inspector for debugging:

npm run inspector

License

Apache License 2.0