ensembl-database

google-deepmind/science-skills · updated Jun 4, 2026

$ npx skills add https://github.com/google-deepmind/science-skills --skill ensembl-database

### Ensembl Database

skill.md

  • name: "ensembl-database"
  • description: "Query the Ensembl database to resolve gene, transcript, and protein IDs, fetch genomic or protein sequences, retrieve gene structures (exons), and get variant consequence and effect predictions (VEP). Use this skill as a primary ID translator, genomic sequence database and variant effect prediction tool."

Ensembl Database: ID Mapping and Genomic Features

Prerequisites

  1. uv: Read the uv skill and follow its Setup instructions to ensure uv is installed and on PATH.
  2. User Notification: If LICENSE_NOTIFICATION.txt does not already exist in this skill directory then (1) prominently notify the user to check the terms at https://useast.ensembl.org/index.html and https://github.com/Ensembl/ensembl-rest/wiki, then (2) create the file recording the notification text and timestamp.
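
The notification step above can be sketched as follows. This is a minimal illustration, not the skill's own code: the function name `ensure_license_notification` and the exact wording of `NOTICE` are assumptions; only the file name and the two URLs come from the text above.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Assumed wording; the skill only requires that the user is pointed at these URLs.
NOTICE = (
    "Please check the Ensembl terms at https://useast.ensembl.org/index.html "
    "and https://github.com/Ensembl/ensembl-rest/wiki before using this skill."
)


def ensure_license_notification(skill_dir: Path) -> Optional[str]:
    """Return the notice if it still has to be shown to the user, else None."""
    marker = skill_dir / "LICENSE_NOTIFICATION.txt"
    if marker.exists():
        return None  # the user was notified on an earlier run
    timestamp = datetime.now(timezone.utc).isoformat()
    # Record the notification text and when it was issued.
    marker.write_text(f"{NOTICE}\nNotified at: {timestamp}\n", encoding="utf-8")
    return NOTICE
```

The caller would display the returned text prominently whenever it is not `None`.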

Overview

The Ensembl database is a resource for genome annotation. This skill allows you to interact with the Ensembl REST API to resolve ambiguous symbols, cross-reference IDs (RefSeq, HGNC, UniProt, ENSG), fetch raw sequences, and retrieve detailed transcript structures.

Key Concepts:

  • ENSG (Gene): Stable identifier for a human gene. Other species insert a three-letter species code into the prefix (for example, ENSMUSG for mouse).
  • ENST (Transcript): Stable identifier for a transcript (splicing isoform).
  • ENSP (Protein): Stable identifier for a translated protein.
  • MANE Select: The consensus primary transcript agreed upon by Ensembl and NCBI.
  • Canonical: Ensembl's representative transcript (used if MANE is not available or non-human).
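
As an illustration of the identifier scheme above, the sketch below classifies a stable ID by its prefix. It assumes the layout "ENS, optional three-letter species code, one feature letter, 11 digits, optional version suffix"; the helper name `classify_stable_id` and the `StableId` tuple are hypothetical and not part of the skill.

```python
import re
from typing import NamedTuple, Optional

FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "protein", "E": "exon"}

# ENS + optional species code + feature letter + 11 digits + optional ".version"
_STABLE_ID = re.compile(r"^ENS([A-Z]{3})?([GTPE])(\d{11})(?:\.(\d+))?$")


class StableId(NamedTuple):
    feature: str                   # "gene", "transcript", "protein" or "exon"
    species_code: Optional[str]    # None means no species code (human)
    version: Optional[int]         # None when no ".N" suffix is present


def classify_stable_id(identifier: str) -> Optional[StableId]:
    """Return the parsed parts of an Ensembl-style stable ID, or None."""
    match = _STABLE_ID.match(identifier.strip())
    if match is None:
        return None
    species_code, letter, _digits, version = match.groups()
    return StableId(
        FEATURE_TYPES[letter],
        species_code,
        int(version) if version else None,
    )
```

A symbol such as TP53 returns `None`, which is a cue to run resolve-gene first rather than passing it to a command that expects an Ensembl ID.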

Core Rules

  • Use the Wrapper: ALWAYS execute the provided helper scripts to query the database rather than accessing the database directly. The scripts enforce the required rate limit automatically.
  • Default Species: If the species is absent or ambiguous in the prompt, default to "human". You MUST explicitly flag this default to the user.
  • Primary Transcripts: When listing transcripts for a gene, only return the MANE Select transcript (for human) or the Canonical transcript (for others) unless the user explicitly asks for all alternative isoforms. You MUST flag to the user when multiple transcripts are available and you are defaulting to the primary one.
  • Assembly Handling: The default assembly is GRCh38. For GRCh37 requests, you MUST use the --assembly GRCh37 flag. You MUST explicitly flag to the user when a non-default assembly is being used.
  • Output Location: The script writes full JSON/FASTA output to temporary files in /tmp by default, or to a user-specified file using the --output flag. It also prints a concise summary to stdout.
  • Notification: Whenever this skill is used, say so in the output.
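
The rules above can be combined into a small command builder. This is a sketch under stated assumptions: the function name and notice wording are hypothetical, and passing `--species` and `--assembly` in the same call is assumed to be accepted by the wrapper (the examples in this document only show them separately).

```python
from typing import List, Optional, Tuple

SCRIPT = ["uv", "run", "scripts/ensembl_api.py"]
DEFAULT_ASSEMBLY = "GRCh38"


def build_get_sequence_command(
    region: str,
    species: Optional[str] = None,
    assembly: Optional[str] = None,
    output: Optional[str] = None,
) -> Tuple[List[str], List[str]]:
    """Return (argv, notices); the notices must be surfaced to the user."""
    notices: List[str] = []
    if not species:
        species = "human"
        notices.append('Species was not specified; defaulting to "human".')
    argv = SCRIPT + ["get-sequence", region, "--species", species]
    if assembly and assembly != DEFAULT_ASSEMBLY:
        argv += ["--assembly", assembly]
        notices.append(f"Using non-default assembly {assembly}.")
    if output:
        argv += ["--output", output]
    notices.append("This result was produced with the ensembl-database skill.")
    return argv, notices
```

Returning the notices alongside the argument list keeps the "MUST flag" obligations next to the decision that triggered them, so they are harder to forget when the output is presented.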

Available Commands

1. Resolve Gene ID — Resolve a symbol, alias, or RefSeq ID to ENSG ID(s). Automatically falls back to resolving synonyms if primary symbol is not found.

uv run scripts/ensembl_api.py resolve-gene TP53 --species human --output tp53.json
uv run scripts/ensembl_api.py resolve-gene PCL2 --output pcl2.json # Falls back to synonym resolution

2. Map ID to External Database — Cross-reference an Ensembl ID to UniProt, HGNC, RefSeq, etc.

uv run scripts/ensembl_api.py map-id ENSG00000141510 --external-db UniProt --output uniprot_map.json
uv run scripts/ensembl_api.py map-id ENST00000269305 --external-db RefSeq_mRNA --output refseq_map.json

3. Get Genomic Sequence — Fetch raw DNA for a coordinate window. Supports GRCh37 via --assembly GRCh37.

uv run scripts/ensembl_api.py get-sequence 17:7661779-7687550 --species human --output seq.txt
uv run scripts/ensembl_api.py get-sequence chr9:21971100-21971200 --assembly GRCh37 --output seq_grch37.txt
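
Before requesting a window it can help to validate the region string and check its size. The sketch below assumes Ensembl's 1-based, inclusive coordinates and treats a leading "chr" as optional; whether the wrapper itself strips "chr" is not stated here, so that normalization is an assumption, as are the names `Region` and `parse_region`.

```python
import re
from typing import NamedTuple

_REGION = re.compile(r"^(?:chr)?([\w.]+):(\d+)-(\d+)$")


class Region(NamedTuple):
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Coordinates are 1-based and inclusive, hence the +1.
        return self.end - self.start + 1


def parse_region(text: str) -> Region:
    """Parse 'chrom:start-end' (optionally prefixed with 'chr')."""
    match = _REGION.match(text.strip().replace(",", ""))
    if match is None:
        raise ValueError(f"not a region string: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in {text!r}")
    return Region(chrom, start, end)
```

For the two windows used above, `parse_region("17:7661779-7687550").length` is 25772 and `parse_region("chr9:21971100-21971200").length` is 101.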

4. Gene Summary — High-level metadata: symbol, biotype, description, chromosomal location.

uv run scripts/ensembl_api.py gene-summary ENSG00000141510 --output gene_summary.json

5. List Transcripts — All transcripts for a gene, with optional --only-mane or --only-canonical filters. Output includes Transcript Support Level (TSL).

uv run scripts/ensembl_api.py transcripts ENSG00000141510 --only-mane --output transcripts_mane.json
uv run scripts/ensembl_api.py transcripts ENSG00000141510 --only-canonical --output transcripts_canonical.json
uv run scripts/ensembl_api.py transcripts ENSG00000141510 --output transcripts_all.json
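
When the full transcript list has already been fetched, the primary-transcript rule can be applied client-side. The sketch below is illustrative only: the field names `id`, `is_mane_select` and `is_canonical` are hypothetical and may not match the keys in the script's JSON output.

```python
from typing import Dict, List, Optional, Tuple


def pick_primary(
    transcripts: List[Dict], species: str = "human"
) -> Tuple[Optional[Dict], Optional[str]]:
    """Return (primary transcript, notice for the user or None)."""
    primary = None
    if species == "human":
        # MANE Select is preferred for human.
        primary = next((t for t in transcripts if t.get("is_mane_select")), None)
    if primary is None:
        # Non-human species, or no MANE Select available: use the canonical one.
        primary = next((t for t in transcripts if t.get("is_canonical")), None)
    notice = None
    if primary is not None and len(transcripts) > 1:
        notice = (
            f"{len(transcripts)} transcripts are available; "
            f"showing only the primary transcript {primary['id']}."
        )
    return primary, notice
```

In most cases the `--only-mane` and `--only-canonical` flags make this unnecessary; the sketch mainly shows where the user-facing notice comes from.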

5b. Canonical TSS — Get the single coordinate of the Transcription Start Site (TSS) for the canonical transcript of a gene.

Note: Unlike the standard transcripts command, canonical-tss accepts both symbols (e.g., TP53) and Ensembl IDs and resolves them automatically. It also accounts for strand orientation (the TSS is the start coordinate on the + strand and the end coordinate on the - strand) and outputs the single integer coordinate directly.

uv run scripts/ensembl_api.py canonical-tss TP53 --output tp53_tss.json
uv run scripts/ensembl_api.py canonical-tss ENSG00000141510 --output tss.json
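
The strand rule that canonical-tss applies can be written in a few lines. This is a sketch of the rule as described above, not the script's implementation; it assumes strand is encoded as 1 or -1, and the function name is hypothetical.

```python
def tss_coordinate(start: int, end: int, strand: int) -> int:
    """Return the transcription start site for a transcript span.

    start/end are genomic coordinates with start <= end; strand is 1 or -1.
    """
    if strand == 1:
        return start   # + strand: transcription begins at the lower coordinate
    if strand == -1:
        return end     # - strand: transcription begins at the higher coordinate
    raise ValueError(f"strand must be 1 or -1, got {strand!r}")
```

With illustrative coordinates, a transcript spanning 100 to 500 has its TSS at 100 on the + strand and at 500 on the - strand.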

6. Transcript Structure — Exon coordinates, CDS boundaries, and computed 5'/3' UTR regions for a transcript.

uv run scripts/ensembl_api.py transcript-structure ENST00000269305 --output structure.json
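
One way UTR regions can be derived from exon coordinates and CDS boundaries is sketched below. It is an assumption about the general approach, not a description of how the script computes them; the function name and the input shapes (exons as `(start, end)` tuples, strand as 1 or -1) are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]


def utr_regions(
    exons: List[Interval], cds_start: int, cds_end: int, strand: int
) -> Dict[str, List[Interval]]:
    """Split exonic sequence outside the CDS into 5' and 3' UTR intervals.

    All coordinates are genomic, 1-based and inclusive, with cds_start <= cds_end.
    """
    if strand not in (1, -1):
        raise ValueError(f"strand must be 1 or -1, got {strand!r}")
    left: List[Interval] = []    # exonic bases below the CDS
    right: List[Interval] = []   # exonic bases above the CDS
    for start, end in sorted(exons):
        if start < cds_start:
            left.append((start, min(end, cds_start - 1)))
        if end > cds_end:
            right.append((max(start, cds_end + 1), end))
    # On the - strand the transcript reads from high to low coordinates,
    # so the genomic "right" side is the 5' end.
    if strand == 1:
        return {"five_prime": left, "three_prime": right}
    return {"five_prime": right, "three_prime": left}
```

With exons (100, 200), (300, 400), (500, 600) and a CDS from 150 to 550, the + strand result is a 5' UTR of (100, 149) and a 3' UTR of (551, 600); on the - strand the two are swapped.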

7. Protein Info — ENSP ID and sequence length for a transcript.

uv run scripts/ensembl_api.py protein-info ENST00000269305 --output protein_info.json

8. Protein Sequence — Amino acid FASTA for a transcript (ENST) or protein (ENSP) ID.

uv run scripts/ensembl_api.py protein-sequence ENST00000269305 --output protein.fasta
uv run scripts/ensembl_api.py protein-sequence ENSP00000269305 --output protein_ensp.fasta
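
To check a downloaded sequence without printing it in full, the FASTA file can be read with a few lines of Python. This is a generic FASTA reader, assuming the standard ">" header convention; the exact header text the script writes is not specified here.

```python
from typing import Dict, List


def read_fasta(text: str) -> Dict[str, str]:
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records: Dict[str, List[str]] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


if __name__ == "__main__":
    from pathlib import Path

    path = Path("protein.fasta")  # file name taken from the command above
    if path.exists():
        for name, seq in read_fasta(path.read_text(encoding="utf-8")).items():
            print(name, len(seq))
```

Printing only the header and length keeps long sequences out of the conversation while still confirming that the download succeeded.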

9. Variant Consequence (VEP) — Predict molecular consequences for a genomic variant. Includes open-licensed plugins: AlphaMissense, Conservation, DosageSensitivity, IntAct, MaveDB, OpenTargets, LoF (Loftee), NMD, UTRAnnotator, mutfunc, LOEUF.

uv run scripts/ensembl_api.py vep 9:21971147:T:C --species human --output vep.json
uv run scripts/ensembl_api.py vep rs699 --species human --output vep_rs699.json
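
The two input forms shown above (a chrom:pos:ref:alt string and an rsID) can be told apart before calling the script. The sketch below handles only those two forms with single- or multi-base A/C/G/T alleles; whether the wrapper accepts other notations is not covered here, and the function name is hypothetical.

```python
import re
from typing import Dict, Union

_RSID = re.compile(r"^rs[0-9]+$")
_BASES = set("ACGT")


def parse_variant(text: str) -> Dict[str, Union[str, int]]:
    """Classify a variant string as an rsID or a chrom:pos:ref:alt coordinate."""
    text = text.strip()
    if _RSID.match(text):
        return {"kind": "rsid", "id": text}
    parts = text.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected chrom:pos:ref:alt or an rsID, got {text!r}")
    chrom, pos, ref, alt = parts
    ref, alt = ref.upper(), alt.upper()
    if not re.fullmatch(r"[0-9]+", pos):
        raise ValueError(f"position is not an integer: {pos!r}")
    if not ref or not alt or not set(ref) <= _BASES or not set(alt) <= _BASES:
        raise ValueError(f"alleles must be A/C/G/T strings: {ref!r}, {alt!r}")
    return {"kind": "coordinate", "chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt}
```

Rejecting malformed input locally avoids spending a rate-limited request on a query that cannot succeed.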

Example VEP stdout output:

[*] Variant: 9:21971147:T>C
[*] Most severe consequence: missense_variant
[*] Found 15 transcript consequences.

[*] VEP Predictions:

  - ENST00000304494 (CDKN2A): Consequence = missense_variant
  - ENST00000304494 (CDKN2A): Amino Acids = N/S
  - ENST00000304494 (CDKN2A): SIFT = deleterious (0.01)
  - ENST00000304494 (CDKN2A): AlphaMissense Class = likely_benign
  - ENST00000304494 (CDKN2A): AlphaMissense Pathogenicity = 0.2129
  - ENST00000304494 (CDKN2A): Conservation = 2.05
  - ENST00000304494 (CDKN2A): Dosage Sensitivity (Haplo) = 0.889228328567991
  - ENST00000304494 (CDKN2A): Dosage Sensitivity (Triplo) = 0.135514349094646
  - ENST00000304494 (CDKN2A): Loss of Function (LOEUF) = 0.791
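
If the prediction rows need to be grouped per transcript, the stdout text can be parsed line by line. The pattern below is inferred from the example output above ("- TRANSCRIPT (GENE): Field = value"); the real output may differ in detail, and the function name is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Tuple

# "  - ENST... (GENE): Field name = value"
_ROW = re.compile(r"^\s*-\s+(\S+)\s+\(([^)]*)\):\s+(.+?)\s+=\s+(.*)$")


def parse_vep_predictions(stdout: str) -> Dict[Tuple[str, str], Dict[str, str]]:
    """Group prediction rows by (transcript ID, gene symbol)."""
    table: Dict[Tuple[str, str], Dict[str, str]] = defaultdict(dict)
    for line in stdout.splitlines():
        match = _ROW.match(line)
        if match:
            transcript, gene, field, value = match.groups()
            table[(transcript, gene)][field] = value.strip()
    return dict(table)
```

Lines that do not match the row pattern, such as the "[*]" summary lines, are skipped. Values are kept as strings so that nothing is lost or rounded before the list is shown to the user.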

Presenting VEP Results: After running the VEP command, you MUST present the full VEP Predictions list from stdout to the user. This list contains both standard VEP predictions (Consequence, Amino Acids, SIFT, PolyPhen) and open-license plugin results (AlphaMissense, Conservation, Dosage Sensitivity, LOEUF, Loftee LoF, NMD, UTRAnnotator, Mutfunc). Do NOT just summarize — show the complete list so the user can see all predictions. If the list is very long (many transcripts), show the MANE Select / canonical transcript rows in full and note that the complete data is in the JSON output.

Parsing Outputs

If the user needs detailed, nested structural data (like the precise integer coordinates of Exon 2 of a transcript) that isn't summarized in stdout:

  1. Locate the JSON file (either specified via --output or the temporary file path printed by the script).
  2. Use terminal tools like jq, or write a quick, disposable Python snippet, to extract the specific data point requested. Do not attempt to read the entire JSON file into your context if it is very large.
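
A disposable snippet of the kind described in step 2 might look like the sketch below. The JSON layout is an assumption made for illustration: a top-level `exons` list whose items carry `rank`, `start` and `end` keys. Inspect the real file's keys first (for example with jq) and adjust the names accordingly.

```python
import json
from pathlib import Path
from typing import Optional, Tuple


def exon_coordinates(json_path: str, rank: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the exon with the given rank, or None."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    # "exons", "rank", "start" and "end" are hypothetical key names.
    for exon in data.get("exons", []):
        if exon.get("rank") == rank:
            return exon["start"], exon["end"]
    return None


if __name__ == "__main__":
    path = "structure.json"  # file name taken from the transcript-structure example
    if Path(path).exists():
        print(exon_coordinates(path, 2))
```

The file is parsed inside the Python process and only the two requested integers are printed, so the full JSON never has to be read into context.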

Custom Queries

If you need to make an API call that the script does not support (e.g., fetching protein domain annotations, coordinate mapping between assemblies, homology searches, linkage disequilibrium, or phenotype lookups), read references/ensembl_rest_api_reference.md for a complete reference of available endpoints, parameters, and response fields.

CRITICAL: When writing custom scripts or using alternatives to the provided scripts, you MUST respect the Ensembl REST API rate limits (maximum 15 requests per second) and handle 429 Too Many Requests errors gracefully (e.g., with exponential backoff).
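
For custom scripts, the rate limit and 429 handling can be sketched as below using only the standard library. This is one possible approach, not the wrapper's implementation: the names `RateLimiter`, `fetch_text` and `get_with_backoff` are hypothetical, and honoring a `Retry-After` header when the server sends one is an assumption about the response.

```python
import time
import urllib.error
import urllib.request

MAX_PER_SECOND = 15  # limit stated above


class RateLimiter:
    """Space calls at least 1/per_second seconds apart."""

    def __init__(self, per_second=MAX_PER_SECOND, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / per_second
        self.clock = clock
        self.sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        now = self.clock()
        if now < self._next_allowed:
            self.sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


def fetch_text(url):
    """Plain GET that asks for JSON and returns the response body as text."""
    request = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


def get_with_backoff(url, fetch=fetch_text, limiter=None, max_retries=5, sleep=time.sleep):
    """Call fetch(url), retrying on HTTP 429 with exponential backoff."""
    limiter = limiter or RateLimiter()
    delay = 1.0
    for attempt in range(max_retries + 1):
        limiter.wait()
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_retries:
                raise  # other errors, or retries exhausted
            retry_after = err.headers.get("Retry-After") if err.headers else None
            # Prefer the server's hint; otherwise wait 1s, 2s, 4s, ...
            sleep(float(retry_after) if retry_after else delay)
            delay *= 2
```

Reuse one `RateLimiter` instance across all requests in a script; creating a new one per call would defeat the spacing. The `fetch`, `clock` and `sleep` parameters exist so the retry logic can be exercised without network access.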

How to use ensembl-database on Cursor

1. Prerequisites

Before installing skills in Cursor, ensure your development environment meets these requirements:

  • Cursor installed and configured on your development machine
  • Node.js version 16.0+ with npm package manager (verify with node --version)
  • Active project directory or workspace where you want to add ensembl-database

2. Execute installation command

Run the skills CLI command in your project's root directory to begin installation:

$ npx skills add https://github.com/google-deepmind/science-skills --skill ensembl-database

The skills CLI fetches ensembl-database from the GitHub repository google-deepmind/science-skills and configures it for Cursor.

3. Select Cursor when prompted

The CLI shows a list of available agents. Use the arrow keys to navigate and space to select Cursor:

◆ Which agents do you want to install to?
│ ── Universal (.agents/skills) ── always included ────
│ • Amp
│ • Antigravity
│ • Cline
│ • Codex
│ ● Cursor (selected)
│ • Windsurf

4. Verify installation

Confirm successful installation by checking the skill directory location:

.cursor/skills/ensembl-database

Reload or restart Cursor to activate ensembl-database. Access the skill through slash commands (e.g., /ensembl-database) or your agent's skill management interface.

Security & Verification Notice

We run automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security.

Skills execute code in your development environment. Before production use, verify the publisher's identity, review the skill source code and recent commits, and test in an isolated environment.
