clinvar-database
google-deepmind/science-skills · updated Jun 4, 2026
### Clinvar Database
| name | clinvar-database |
| description | > Use when needing clinical significance, pathogenicity classifications (e.g., Pathogenic, Benign, VUS), clinical evidence rationales, or finding "hard positive" benchmark controls for human genomic variants. |
ClinVar Database
Prerequisites
- uv: Read the uv skill and follow its Setup instructions to ensure uv is installed and on PATH.
- User Notification: If LICENSE_NOTIFICATION.txt does not already exist in this skill directory then (1) prominently notify the user to check the terms at https://www.ncbi.nlm.nih.gov/clinvar/, then (2) create the file recording the notification text and timestamp.
- .env file: Make sure the .env file exists in your home directory. Create one if it does not exist.
- NCBI_API_KEY (optional): Raises the NCBI rate limit from 3 to 10 requests/second. The skill works without it, but a key is recommended if the user plans many queries or encounters a 429 error. The user can obtain one for free by registering at https://www.ncbi.nlm.nih.gov/account/settings/. If the variable is missing from .env, do NOT ask the user to paste it into the chat (this would leak the key into the agent's context). Instead, give the user this command, substituting ENV_FILE with the resolved literal path to the .env file:
  printf "Enter NCBI API key (typing hidden): " && read -s key && echo && echo "NCBI_API_KEY=$key" >> "ENV_FILE" && echo "Saved."
  The scripts load credentials automatically via dotenv. NEVER read, print, or inspect the .env file or its variables (e.g. no cat, grep, echo, printenv, or os.environ.get on keys). Credentials must stay out of the agent's context. See the API Key section for more details.
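The User Notification step above can be sketched in Python. This is a minimal sketch, not part of the skill's scripts: the function name `ensure_license_notification` and the wording of the notice are assumptions. Only the file name LICENSE_NOTIFICATION.txt and the ClinVar URL come from the prerequisites. It deliberately never touches the .env file.

```python
from datetime import datetime, timezone
from pathlib import Path

CLINVAR_TERMS_URL = "https://www.ncbi.nlm.nih.gov/clinvar/"


def ensure_license_notification(skill_dir: str) -> bool:
    """Create LICENSE_NOTIFICATION.txt once; return True if the user was notified now."""
    marker = Path(skill_dir) / "LICENSE_NOTIFICATION.txt"
    if marker.exists():
        return False  # the notification was already recorded earlier
    # Hypothetical wording: the skill only asks for a prominent pointer to the terms.
    notice = f"Please check the ClinVar terms of use at {CLINVAR_TERMS_URL}"
    print(notice)  # step (1): notify the user prominently
    timestamp = datetime.now(timezone.utc).isoformat()
    # step (2): record the notification text and timestamp
    marker.write_text(f"{notice}\nNotified at: {timestamp}\n", encoding="utf-8")
    return True
```

Calling the function a second time for the same directory returns False and leaves the recorded file untouched.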
Overview
ClinVar is the primary consensus record for clinical classifications of human genomic variations. It provides the "clinical ground truth" for pathogenicity labels (Pathogenic, Likely Pathogenic, Benign, VUS) based on assertions from global laboratories.
When to Use
Use when you need to:
- Find the current clinical significance and star rating (review status) for a specific variant.
- Fetch clinician notes, assertion criteria, or rationales for previous clinical laboratory classifications.
- Retrieve the preferred condition name and associated HPO terms for a specific variant.
- Find a list of variant controls (e.g., "Find all Pathogenic variants in the HBB gene within 50bp of a signal").
- Check for conflicting interpretations for a given variant and identify the organizations submitting each classification.
Do NOT use when you need to:
- Find specific allele frequencies in global populations (use gnomAD).
- Describe the normal biological role of a protein and typical inheritance patterns (use OMIM).
- Predict mechanistic effects of novel mutations, like frameshifts or exon skipping (use AlphaGenome).
- Find recommended surveillance schedules for patients with a pathogenic variant (use GeneReviews).
- Generate or view 3D structural models of affected proteins (use PDB / AlphaFold).
Quick Start
ClinVar queries are executed via a robust Python wrapper script to handle strict rate limiting and XML/JSON parsing.
Example: Search for BRCA1 variants
uv run scripts/clinvar_api.py search --query "BRCA1[gene]" --output results.json
Core Rules
- Retmax Constraint: The search command defaults to --retmax 200. For any "List all" or gene-wide request, you MUST explicitly set --retmax higher (e.g., 1000) to ensure data completeness.
- Use the Wrapper: Prefer the wrapper script for standard queries. It handles rate limiting, retries, and the complex XML parsing for you. If the script's parsed output does not contain the specific fields you need, you may modify the script or query the NCBI E-utilities API directly, but be aware that the raw XML schemas are complex and vary between record types.
- If the rate limit is hit, the script will throw a clear error. Follow the prerequisite instructions above to help the user add NCBI_API_KEY to the .env file.
- Notification: If this skill is used, ensure this is mentioned in the output.
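When the wrapper is driven from Python rather than typed in a shell, the invocation can be assembled as an argument list. The sketch below is illustrative only: `build_search_command` and `run_wrapper` are hypothetical helper names, and it assumes the wrapper exits with a non-zero status when it raises an error. The flags themselves (`--query`, `--output`, `--retmax`) are the ones documented in this skill.

```python
import subprocess
from typing import List, Optional


def build_search_command(query: str, output: str, retmax: Optional[int] = None) -> List[str]:
    """Assemble the wrapper's `search` call as a list, so no shell quoting is needed."""
    cmd = ["uv", "run", "scripts/clinvar_api.py", "search",
           "--query", query, "--output", output]
    if retmax is not None:
        cmd += ["--retmax", str(retmax)]  # set explicitly for "List all" requests
    return cmd


def run_wrapper(cmd: List[str]) -> None:
    """Run the wrapper. Assumption: failures surface as a non-zero exit status."""
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Passing the query as a single list element avoids escaping the quotes and brackets that Entrez syntax uses.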
Utility Scripts
1. count — Count Matching Variants
Purpose: Check how many variants match a query without fetching IDs. Use to decide whether a full search is warranted.
Arguments:
- --query: (Required) NCBI Entrez search query string.
- --output: (Required) Output JSON file path.
Example:
uv run scripts/clinvar_api.py count \
--query "TP53[gene] AND \"uncertain significance\"[clinsig]" \
--output count.json
Output:
{"total_count": <int>}
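The decision the count command supports can be sketched as a small check on its output file. The helper names and the `max_variants` threshold of 5000 are assumptions made for illustration. The only thing taken from the skill is that the file holds a JSON object with a `total_count` key.

```python
import json
from pathlib import Path


def read_total_count(count_path: str) -> int:
    """Return total_count from the JSON file written by the `count` command."""
    data = json.loads(Path(count_path).read_text(encoding="utf-8"))
    return int(data["total_count"])


def full_search_warranted(total_count: int, max_variants: int = 5000) -> bool:
    """Hypothetical policy: search only if there are hits and the set is manageable."""
    return 0 < total_count <= max_variants
```

A zero count suggests the query needs rewording, while a very large count suggests narrowing it before fetching IDs.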
2. search — Search Variants
Purpose: Identify variants based on genomic location, gene symbols, or clinical attributes using NCBI Entrez search syntax. The search command automatically paginates through all matching results to ensure complete, deterministic retrieval.
# Fetch ALL matching variants (default behavior)
uv run scripts/clinvar_api.py search \
--query "BRCA1[gene]" --output results.json
# Search by Chromosome and Position Range
uv run scripts/clinvar_api.py search \
--query "11[chr] AND 5225000:5226000[chrpos]" --output results.json
# Combine terms using Entrez syntax
uv run scripts/clinvar_api.py search \
--query "HBB[gene] AND pathogenic[clinsig]" --output results.json
# Cap results at 50
uv run scripts/clinvar_api.py search \
--query "TP53[gene]" --retmax 50 --output results.json
Arguments:
- --query: (Required) NCBI Entrez search query string.
- --retmax: Maximum total number of variant IDs to return. Default is 0, which means "fetch all matching results." Set to a positive integer to cap the result set.
- --page_size: Number of IDs to fetch per API request (default: 500, max: 10000 per NCBI limits).
- --output: (Required) Output JSON file path.
Output: A JSON object containing:
- total_count: Total number of matching variants in ClinVar.
- fetched_count: Number of IDs actually retrieved.
- variant_ids: List of ClinVar Variation ID strings.
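A short loader can confirm that retrieval was complete before the IDs are used. This is a sketch under the output shape described above. The function name `load_variant_ids` and the `allow_partial` switch (for runs where `--retmax` was capped on purpose) are assumptions.

```python
import json
from pathlib import Path
from typing import List


def load_variant_ids(results_path: str, allow_partial: bool = False) -> List[str]:
    """Read `search` output and check that fetched_count matches total_count."""
    data = json.loads(Path(results_path).read_text(encoding="utf-8"))
    total = data["total_count"]
    fetched = data["fetched_count"]
    ids = [str(v) for v in data["variant_ids"]]  # the output is an object, not a bare list
    if fetched != total and not allow_partial:
        raise ValueError(f"Incomplete retrieval: fetched {fetched} of {total} variants")
    return ids
```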
3. summary — Get Interpretation Summary
Purpose: Retrieve top-line clinical significance labels, star ratings (review status), and basic phenotype data for rapid variant screening.
# Get summary for one or more Variation IDs
uv run scripts/clinvar_api.py summary \
--variant_ids 12345 67890 --output summary.json
Arguments:
- --variant_ids: (Required) One or more ClinVar Variation IDs.
- --output: (Required) Output JSON file path.
Output: A JSON list of summary objects, each containing:
- variant_id, title, clinical_significance, review_status, last_evaluated, phenotypes
- genes: list of {gene_id, symbol, strand}
- variation_type: e.g., single nucleotide variant, Deletion, Insertion
- molecular_consequences: list of strings (e.g., ["missense variant", "nonsense"])
4. evidence — Get Clinical Evidence
Purpose: Fetch the full clinical record for a single variant, including free-text clinician rationales, assertion methods, and specific submitter notes.
# Get full evidence for a single Variation ID
uv run scripts/clinvar_api.py evidence \
--variant_id 12345 --output evidence.json
Arguments:
- --variant_id: (Required) A single ClinVar Variation ID.
- --output: (Required) Output JSON file path.
Output: A JSON object containing:
- variant_id
- allele_info: {chromosome, position_start, position_stop, reference_allele, alternate_allele, cytogenetic_band, dbsnp_rsid} (GRCh38 preferred)
- conditions: list of {name, medgen_cui, omim_id, orphanet_id, hpo_terms}
- functional_consequences: list of {value, sequence_ontology_id}
- structural_variant_details: {outer_start, inner_start, inner_stop, outer_stop, copy_number} (present only for CNVs, otherwise null)
- citation_references: list of PubMed IDs cited in the global "Citations" section
- submissions: list of per-submitter records, each containing:
  - submitter_name, classification, curator_notes, assertion_criteria
  - date_last_evaluated: when the submitter last reviewed the classification
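To see which organizations submitted each classification, the `submissions` list can be grouped by label. This is a sketch that assumes `classification` and `submitter_name` are plain strings. The helper names are hypothetical. Having more than one distinct label is only a prompt for closer review and is not the same as ClinVar's own definition of a conflict.

```python
from collections import defaultdict
from typing import Dict, List


def submitters_by_classification(evidence: dict) -> Dict[str, List[str]]:
    """Group submitter names by their (case-normalized) classification label."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for sub in evidence.get("submissions") or []:
        label = (sub.get("classification") or "unknown").strip().lower()
        groups[label].append(sub.get("submitter_name") or "unknown submitter")
    return dict(groups)


def has_multiple_classifications(evidence: dict) -> bool:
    """True when submitters disagree on the label; a cue to read their curator_notes."""
    return len(submitters_by_classification(evidence)) > 1
```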
Typical Workflows
Count-First Workflow (Recommended)
For large or unknown result sets, use count first to decide whether to
proceed, then search (which auto-paginates and returns total_count /
fetched_count), then summary to screen.
# Step 1: Gauge size (optional — search also returns total_count)
uv run scripts/clinvar_api.py count \
--query "HBB[gene] AND pathogenic[clinsig]" --output count.json
# Step 2: Fetch all variant IDs (auto-paginates)
uv run scripts/clinvar_api.py search \
--query "HBB[gene] AND pathogenic[clinsig]" --output ids.json
# Step 3: Get summaries (extract variant_ids from search output)
uv run scripts/clinvar_api.py summary \
--variant_ids 12345 67890 --output summary.json
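Step 3 takes IDs on the command line, so a long ID list from step 2 may be easier to handle in batches. The sketch below only builds the command lists. The batch size of 100 and the `summary_NNN.json` naming are assumptions for illustration, not limits documented by the skill.

```python
from typing import Iterator, List


def summary_commands(variant_ids: List[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield one `summary` invocation per batch of variant IDs."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(variant_ids), batch_size):
        batch = variant_ids[start:start + batch_size]
        output = f"summary_{start // batch_size:03d}.json"  # hypothetical file naming
        yield ["uv", "run", "scripts/clinvar_api.py", "summary",
               "--variant_ids", *batch, "--output", output]
```

Each yielded list can be passed to `subprocess.run(cmd, check=True)`, and the per-batch JSON lists concatenated afterwards.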
Deep Dive: search → evidence
When you need the full clinical picture for a specific variant — including
submitter rationales, PubMed citations, ontology-linked conditions, and allele
coordinates — use evidence.
uv run scripts/clinvar_api.py evidence \
--variant_id 12345 --output evidence.json
Workflow: Robust Variant Discovery (Triangulation)
ClinVar metadata is inconsistent. To fulfill "List all" requests, do not rely on a single filter. Perform the following in a single turn and merge results:
- Search by exact label (e.g., "3 prime UTR variant"[molecular_consequence]).
- Search by HGVS nomenclature pattern (e.g., c.*).
- Search by genomic coordinate range (using [chrpos]).
This "triangulation" ensures structural variants with missing labels are not overlooked.
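Merging the results of the separate searches amounts to an order-preserving union of their `variant_ids` lists. A minimal sketch, with `merge_variant_ids` as a hypothetical helper name:

```python
from typing import Iterable, List


def merge_variant_ids(*id_lists: Iterable[str]) -> List[str]:
    """Union several ID lists, dropping duplicates and keeping first-seen order."""
    seen = {}  # dicts preserve insertion order, so this doubles as an ordered set
    for ids in id_lists:
        for vid in ids:
            seen.setdefault(str(vid), None)
    return list(seen)
```

A variant found by only one of the three filters still ends up in the merged list.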
Verifying Coding vs. Non-Coding Status via HGVS
molecular_consequences alone can be ambiguous (e.g., splice donor variant
appears in both coding and non-coding contexts). Always cross-check the title
field for HGVS patterns:
- c.-… : 5' UTR (non-coding)
- c.*… : 3' UTR (non-coding)
- c.123+N / c.123-N : intronic (non-coding)
- p.Trp146Arg etc. : protein effect (coding)
A variant with UTR/intronic HGVS and no p. annotation is non-coding, even with
splicing labels. Conversely, any p. annotation indicates a coding effect.
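The HGVS cross-check can be approximated with a few regular expressions applied to the `title` field. This is a rough sketch, not a full HGVS parser: the patterns and the `classify_title` name are assumptions, and titles that fit none of the patterns are reported as "undetermined" rather than guessed.

```python
import re

_PROTEIN = re.compile(r"\bp\.\S")                 # any p. annotation
_INTRONIC = re.compile(r"c\.[*-]?\d+[+-]\d+")     # c.123+N or c.123-N
_UTR5 = re.compile(r"c\.-\d")                     # c.-N
_UTR3 = re.compile(r"c\.\*\d")                    # c.*N


def classify_title(title: str) -> str:
    """Classify a variant title as coding or non-coding from its HGVS pattern."""
    if _PROTEIN.search(title):
        return "coding"  # a p. annotation indicates a coding effect
    if _INTRONIC.search(title):
        return "intronic (non-coding)"
    if _UTR5.search(title):
        return "5' UTR (non-coding)"
    if _UTR3.search(title):
        return "3' UTR (non-coding)"
    return "undetermined"
```

The protein check runs first so that a title carrying a p. annotation is treated as coding regardless of its c. notation.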
ClinVar Metadata Reference
- 3' UTR
  - Search String: "3 prime UTR variant"[mol_consequence]
  - HGVS: c.*
- 5' UTR
  - Search String: "5 prime UTR variant"[mol_consequence]
  - HGVS: c.-
- To find "high-confidence" variants or expert-reviewed consensus, use the review_status filter. This is the most efficient way to distinguish between single-laboratory assertions and panel-reviewed ground truth.
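The search strings above can be combined with a gene term in the same way as the other Entrez examples in this skill. A small sketch, where `utr_query` and the `UTR_SEARCH_STRINGS` table are hypothetical names and the strings are copied from the reference list:

```python
UTR_SEARCH_STRINGS = {
    "3' UTR": '"3 prime UTR variant"[mol_consequence]',
    "5' UTR": '"5 prime UTR variant"[mol_consequence]',
}


def utr_query(gene: str, region: str) -> str:
    """Build an Entrez query for UTR variants in one gene."""
    if region not in UTR_SEARCH_STRINGS:
        raise ValueError(f"Unknown region: {region!r}")
    return f"{gene}[gene] AND {UTR_SEARCH_STRINGS[region]}"
```

The resulting string is what would be passed to `--query`.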
When to Use Which Fields
- Quick pathogenicity label: use summary → clinical_significance
- Gene symbol and strand: use summary → genes
- Variant type (SNV, del, etc.): use summary → variation_type
- Protein-level effect: use summary → molecular_consequences
- Genomic coordinates (GRCh38): use evidence → allele_info
- Linked conditions (ontology): use evidence → conditions
- SO functional consequence: use evidence → functional_consequences
- CNV breakpoints/copy number: use evidence → structural_variant_details
- PubMed references: use evidence → citation_references
- Date of last lab review: use both → last_evaluated
- Clinician rationales: use evidence → submissions[].curator_notes
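As one illustration of the mapping above, the `clinical_significance` and `review_status` fields from a batch of summary objects can be tallied for quick screening. This sketch assumes both fields are plain strings or null. The helper names and the "unknown" placeholder are not part of the skill.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def _norm(value: Optional[str]) -> str:
    """Lower-case and trim a label; empty or missing values become 'unknown'."""
    text = str(value).strip().lower() if value is not None else ""
    return text or "unknown"


def tally_classifications(summaries: Iterable[dict]) -> Counter:
    """Count summaries per (clinical_significance, review_status) pair."""
    pairs: Iterable[Tuple[str, str]] = (
        (_norm(s.get("clinical_significance")), _norm(s.get("review_status")))
        for s in summaries
    )
    return Counter(pairs)
```

`tally_classifications(summaries).most_common()` then lists the most frequent label and review-status combinations first.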
Retrieving Genomic Coordinates (Default HG38/GRCh38)
To get precise genomic coordinates in the format <chrom>:<pos>:<ref>><alt>
(e.g., chr5:70951945:G>A), you must use the evidence command, as these
details are not available in the summary output.
You MUST always include genomic coordinates in the format
<chrom>:<pos>:<ref>><alt> when listing or presenting variants, even if not
explicitly requested by the user. If coordinates are missing from the summary,
use the evidence command or dbSNP fallback to retrieve them.
- Fetch Evidence: Use uv run scripts/clinvar_api.py evidence --variant_id <ID> --output evidence.json.
- Extract VCF Attributes: The evidence command parses the XML. Extract the following from the SequenceLocation element with Assembly="GRCh38":
  - Chromosome: Chr
  - Position: positionVCF (or start)
  - Ref: referenceAlleleVCF (or referenceAllele)
  - Alt: alternateAlleleVCF (or alternateAllele)
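Once the evidence JSON is available, the required coordinate string can be assembled from `allele_info`. This is a sketch using the field names listed in the evidence output. `format_coordinate` is a hypothetical helper, and it is an assumption that the chromosome may arrive with or without a "chr" prefix.

```python
from typing import Optional


def format_coordinate(allele_info: dict) -> Optional[str]:
    """Return '<chrom>:<pos>:<ref>><alt>' or None when any part is missing."""
    chrom = allele_info.get("chromosome")
    pos = allele_info.get("position_start")
    ref = allele_info.get("reference_allele")
    alt = allele_info.get("alternate_allele")
    if chrom is None or pos is None or not ref or not alt:
        return None  # caller should fall back to another source of coordinates
    chrom = str(chrom)
    if not chrom.lower().startswith("chr"):
        chrom = f"chr{chrom}"  # assumption: bare names such as "5" need the prefix
    return f"{chrom}:{pos}:{ref}>{alt}"
```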
Fallback for Imprecise Coordinates (Gene Range): ClinVar often returns the full gene range for non-coding variants. If the extracted coordinates correspond to the gene range instead of a specific position, use the dbsnp-database skill to resolve the precise coordinates using the dbsnp_rsid or HGVS title:
- 1. Check for dbsnp_rsid in the evidence output.
- 2. Run uv run scripts/dbsnp_cli.py resolve-rsid {rsid} to get precise GRCh38 coordinates.
- 3. Format as <chrom>:<pos>:<ref>><alt> using the SPDI or HGVS data from dbSNP.
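Deciding when the fallback applies needs some test for "these coordinates look like a gene range". The skill does not define one, so the sketch below uses an assumed heuristic: coordinates are treated as imprecise when the alleles are missing or when the start-to-stop span is wider than the longer allele. `needs_dbsnp_fallback` is a hypothetical name.

```python
def needs_dbsnp_fallback(allele_info: dict) -> bool:
    """Heuristic (an assumption): flag coordinates that look like a gene range."""
    start = allele_info.get("position_start")
    stop = allele_info.get("position_stop")
    ref = allele_info.get("reference_allele")
    alt = allele_info.get("alternate_allele")
    if start is None or not ref or not alt:
        return True  # nothing precise enough to format
    if stop is None:
        return False  # a single position with both alleles is usable
    span = int(stop) - int(start) + 1
    # A span much wider than either allele suggests a range, not a position.
    return span > max(len(ref), len(alt))
```

When the function returns True and `dbsnp_rsid` is present in `allele_info`, that rsID is the value to hand to the dbsnp-database skill.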
Structural Variant Note
The structural_variant_details field is only populated for copy number
variants (CNVs). For standard SNVs and small indels this field will be null.
Use the allele_info fields (position_start, position_stop,
reference_allele, alternate_allele) instead.
CNV / Large Deletion Note
Large copy-number variants (CNVs) frequently have empty
molecular_consequences. If a variant title mentions "del" and coordinates
overlap your target region, it is relevant regardless of missing labels.
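The rule in this note can be written as an interval-overlap test combined with a title check. This is a sketch with hypothetical helper names. It assumes 1-based inclusive coordinates on the same chromosome and assembly, and the "del" substring match also catches "delins" titles.

```python
def overlaps(start: int, stop: int, region_start: int, region_end: int) -> bool:
    """True when [start, stop] and [region_start, region_end] share a position."""
    return start <= region_end and stop >= region_start


def is_relevant_deletion(title: str, start: int, stop: int,
                         region_start: int, region_end: int) -> bool:
    """Keep deletions that overlap the target region, even with no consequence labels."""
    return "del" in title.lower() and overlaps(start, stop, region_start, region_end)
```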
Obtaining and Using an API Key
To increase the rate limit to 10 requests per second, you need to obtain an NCBI
API key and add it to the .env file. You can obtain a key by following the
instructions in the NCBI ClinVar API docs.
Once you have a key, follow the prerequisite instructions to add it to the
.env file.
uv run scripts/clinvar_api.py search --query "BRCA1[gene]" --output results.json
If a RateLimitError is encountered, follow the prerequisite instructions to
help the user add NCBI_API_KEY to the .env file, providing the
NCBI ClinVar API docs URL for instructions on how to obtain one.
Best Practices
- Always use uv run to execute python.
- If jq is unavailable, pivot immediately to using Python one-liners for processing JSON (e.g., uv run python3 -c "import json; ...").
- Use count before search to understand the result set size.
- The search command fetches all results by default and includes total_count and fetched_count in the output; always verify these match to confirm complete retrieval.
- Entrez results are unsorted. To order by date, fetch all results and sort locally by last_evaluated.
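Local sorting by `last_evaluated` might look like the sketch below. The date formats tried are assumptions, since the wrapper's exact format is not documented here. Records whose date is missing or unparseable are kept and placed at the end.

```python
from datetime import datetime
from typing import List, Optional

# Assumed candidate formats; adjust to what the summary output actually contains.
_DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y/%m/%d %H:%M")


def parse_date(value: object) -> Optional[datetime]:
    """Try each assumed format; return None when nothing matches."""
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(str(value).strip(), fmt)
        except ValueError:
            continue
    return None


def sort_by_last_evaluated(summaries: List[dict], newest_first: bool = True) -> List[dict]:
    """Sort summary objects by last_evaluated, keeping undated records last."""
    dated, undated = [], []
    for s in summaries:
        parsed = parse_date(s.get("last_evaluated") or "")
        if parsed is None:
            undated.append(s)
        else:
            dated.append((parsed, s))
    dated.sort(key=lambda pair: pair[0], reverse=newest_first)
    return [s for _, s in dated] + undated
```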
Common Mistakes
- Attempting to parse the E-utilities XML yourself: Always use the provided clinvar_api.py client, which handles the unpredictable XML schemas robustly.
- Getting HTTP 429 Too Many Requests: The client throws an exception telling you to pause. Follow the prerequisite instructions to help the user add NCBI_API_KEY to the .env file, then retry.
- Sending raw DNA sequences to the API: The API expects HGVS nomenclature, RS IDs, or proper Entrez coordinate syntax (11[chr] AND 1234[chrpos]), not raw ATCG strings.
- For synonymous or non-coding variants: HGVS nomenclature (e.g., CAPN3 AND "c.551C>T") is more reliable than coordinate searches ([chrpos]), as many ClinVar records for these types lack precise genomic mappings.
- Case sensitivity in molecular consequences: ClinVar returns mixed-case strings. Always use case-insensitive matching (.lower()) when filtering.
- Parsing search output as a bare list: search returns a JSON object with total_count, fetched_count, and variant_ids, not a bare list.
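The case-sensitivity point can be handled with a small filter over summary objects. This is a sketch in which `has_consequence` and `filter_by_consequence` are hypothetical names. It assumes `molecular_consequences` is a list of strings, as described in the summary output.

```python
from typing import Iterable, List


def has_consequence(summary: dict, term: str) -> bool:
    """Case-insensitive exact match against a summary's molecular_consequences."""
    wanted = term.strip().lower()
    labels = summary.get("molecular_consequences") or []
    return any(str(label).strip().lower() == wanted for label in labels)


def filter_by_consequence(summaries: Iterable[dict], term: str) -> List[dict]:
    """Keep only the summaries that carry the requested consequence label."""
    return [s for s in summaries if has_consequence(s, term)]
```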