bulk-rnaseq
K-Dense-AI/scientific-agent-skills · updated Jun 4, 2026
### Bulk Rnaseq
| name | bulk-rnaseq |
| description | End-to-end bulk RNA-seq orchestrator — takes raw FASTQ reads through QC and trimming (FastQC, fastp/Trim Galore), alignment and quantification (STAR, Salmon, featureCounts), assembles a gene-level counts matrix, then hands off to differential expression (pydeseq2), pathway/GSEA enrichment (pathway-enrichment), and publication figures (scientific-visualization). Use whenever the user has bulk RNA-seq reads or quant output and wants a complete, reproducible differential-expression workflow — e.g. "analyze my RNA-seq", "FASTQ to DESeq2", "run nf-core/rnaseq", "STAR/Salmon quantification", "build a counts matrix for DESeq2", or "go from reads to differentially expressed genes and enriched pathways". Routes between an nf-core/rnaseq (Nextflow) path and a standalone STAR/Salmon path, and covers experimental design, strandedness, and QC gates. For single-cell RNA-seq use the scanpy skill instead. |
| license | MIT |
| metadata | version: "1.0" skill-author: K-Dense Inc. |
Bulk RNA-seq
Overview
This skill orchestrates a complete, defensible bulk RNA-seq differential-expression study, from raw sequencing reads to enriched pathways and figures. It is a router, not a reimplementation: most stages already have dedicated skills in this repo, and this skill connects them in the right order, fills the one real gap (raw reads → a gene-level counts matrix), and enforces the design and QC decisions that determine whether the final result is trustworthy.
"Defensible" means three things, applied throughout:
- Reproducible — pinned pipeline/tool versions, containers where possible, recorded parameters, fixed random seeds.
- Quality-gated — QC is inspected and acted on before, during, and after quantification, not skipped.
- Statistically sound — adequate replication, a design that matches the biology, counts handled correctly, and FDR-controlled testing.
The pipeline is: FastQC/trim → align/quant (STAR/Salmon) → counts → DE (pydeseq2) → enrichment (pathway-enrichment) → figures.
When to Use This Skill
Use this skill when the user wants to:
- Go from FASTQ files (or a sequencing run) to differentially expressed genes and pathways.
- Run or configure `nf-core/rnaseq`, or align/quantify with STAR, Salmon, or featureCounts.
- Turn Salmon/STAR/featureCounts output into a counts matrix ready for DESeq2/PyDESeq2.
- Design or sanity-check a bulk RNA-seq experiment (replicates, batch, strandedness) before committing compute.
- Scope an end-to-end RNA-seq analysis and decide which tools and skills to chain.
This is bulk RNA-seq (samples = biological specimens). For single-cell/nuclei data use scanpy; for the DE statistics alone use pydeseq2; for enrichment alone use pathway-enrichment.
The Pipeline at a Glance
flowchart TD
fastq["Raw FASTQ + samplesheet"] --> qc["FastQC + MultiQC"]
qc --> trim["Trim: fastp / Trim Galore"]
trim --> align["Align + quant: STAR and/or Salmon"]
align --> counts["Gene-level counts matrix"]
counts --> de["Differential expression"]
de --> enrich["Pathway / GSEA enrichment"]
de --> fig["Figures"]
enrich --> fig
nfcore["nf-core/rnaseq via nextflow skill"] -.->|"path A"| align
manual["Standalone recipes (this skill)"] -.->|"path B"| align
bridge["build_counts_matrix.py (this skill)"] -.-> counts
pydeseq2skill["pydeseq2 skill"] -.-> de
pwskill["pathway-enrichment skill"] -.-> enrich
vizskill["scientific-visualization skill"] -.-> fig
Two Upstream Paths — Pick One
The reads → counts stage can be run two ways. They produce equivalent gene counts; choose by context, then stay on that path.
| Use Path A (nf-core/rnaseq) when… | Use Path B (standalone tools) when… |
|---|---|
| You want the field-standard, audited, citable pipeline with one command | You have a few samples and want to learn/inspect each step |
| Many samples, or you'll scale to HPC/cloud | No Nextflow/containers available, or a constrained environment |
| Reproducibility and a full MultiQC report matter most | You need a non-standard step the pipeline doesn't expose |
| → Drive it through the `nextflow` skill | → Follow `references/upstream-manual.md` |
When unsure, prefer Path A: nf-core/rnaseq already wires together FastQC → trimming → STAR/Salmon → quantification → tximport → MultiQC with sensible, reviewed defaults, which is the most defensible option. Path B exists for transparency and constrained setups.
Both paths converge on a gene-level counts matrix, after which the workflow is identical.
Setup
# This skill's glue (bridge + handoffs) — Python
uv pip install pytximport pandas
# Downstream skills install their own deps:
# pydeseq2 skill -> uv pip install pydeseq2
# pathway-enrichment skill -> uv pip install gseapy gprofiler-official
# Path A (nf-core): only Nextflow + a container engine are needed — see the `nextflow` skill.
# Path B (standalone tools): install via bioconda. Pin versions for reproducibility.
conda create -n rnaseq -c bioconda -c conda-forge \
fastqc fastp trim-galore "star=2.7.11b" "salmon=1.10.3" subread multiqc
Record the exact versions you use (pipeline revision, tool versions, reference genome + annotation release) — they belong in the methods section and make the analysis reproducible.
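One way to capture this is a small manifest written next to the results. The following is a minimal sketch, not part of the skill: `write_run_manifest` and its field names are hypothetical, the version strings simply repeat the pins shown in this document, and the annotation release is left as a placeholder to fill in.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_run_manifest(path, pipeline, tools, reference, seed):
    """Write a JSON record of what was run (hypothetical helper, not shipped with the skill)."""
    manifest = {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "pipeline": pipeline,
        "tools": dict(sorted(tools.items())),  # sorted so diffs between runs stay readable
        "reference": reference,
        "random_seed": seed,
    }
    Path(path).write_text(json.dumps(manifest, indent=2) + "\n")
    return manifest


# Illustrative values only: replace them with what was actually executed.
manifest_path = Path(tempfile.mkdtemp()) / "run_manifest.json"
manifest = write_run_manifest(
    manifest_path,
    pipeline={"name": "nf-core/rnaseq", "revision": "3.26.0"},
    tools={"star": "2.7.11b", "salmon": "1.10.3"},
    reference={"genome": "GRCh38", "annotation_release": "FILL_ME_IN"},
    seed=42,
)
print(manifest_path.read_text())
```

Committing this file alongside the samplesheet gives the methods section a single source to quote from.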
Quick Start
Path A — nf-core/rnaseq (recommended)
# 0. Validate the samplesheet first (catches the most common failures early)
python scripts/validate_samplesheet.py --samplesheet samplesheet.csv
# 1. Smoke-test the environment with tiny bundled data
nextflow run nf-core/rnaseq -r 3.26.0 -profile test,docker --outdir test_results
# 2. Real run: pin the revision, pick an aligner, pass a samplesheet + reference
nextflow run nf-core/rnaseq -r 3.26.0 \
-profile docker \
--input samplesheet.csv \
--genome GRCh38 \
--aligner star_salmon \
--outdir results \
-resume
nf-core/rnaseq runs tximport internally, so gene counts come out already merged — no bridge script needed. Use results/star_salmon/salmon.merged.gene_counts_length_scaled.tsv for DE. Samplesheet format, aligner choice, and outputs: references/upstream-nfcore.md. For engine/HPC/cloud/container detail, use the nextflow skill.
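To illustrate the kind of check a samplesheet validator performs, here is a minimal sketch. It is not the skill's `scripts/validate_samplesheet.py`, whose behaviour is not shown here. It assumes the four-column nf-core/rnaseq layout (`sample`, `fastq_1`, `fastq_2`, `strandedness`); the function name and the error messages are illustrative.

```python
import csv
import io

REQUIRED_COLUMNS = ("sample", "fastq_1", "fastq_2", "strandedness")
VALID_STRANDEDNESS = {"auto", "forward", "reverse", "unstranded"}
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz")


def check_samplesheet(handle):
    """Return a list of problems found; an empty list means none were detected."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]

    problems = []
    seen_pairs = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        cell = {c: (row.get(c) or "").strip() for c in REQUIRED_COLUMNS}
        if not cell["sample"] or " " in cell["sample"]:
            problems.append(f"line {line_no}: sample name is empty or contains spaces")
        if not cell["fastq_1"].endswith(FASTQ_SUFFIXES):
            problems.append(f"line {line_no}: fastq_1 must end in .fastq.gz or .fq.gz")
        # fastq_2 may be empty for single-end libraries.
        if cell["fastq_2"] and not cell["fastq_2"].endswith(FASTQ_SUFFIXES):
            problems.append(f"line {line_no}: fastq_2 must end in .fastq.gz or .fq.gz")
        if cell["strandedness"] not in VALID_STRANDEDNESS:
            problems.append(f"line {line_no}: unknown strandedness {cell['strandedness']!r}")
        pair = (cell["fastq_1"], cell["fastq_2"])
        if pair in seen_pairs:
            problems.append(f"line {line_no}: this FASTQ pair is already listed")
        seen_pairs.add(pair)
    return problems


good = "sample,fastq_1,fastq_2,strandedness\nctrl_1,c1_R1.fastq.gz,c1_R2.fastq.gz,auto\n"
bad = "sample,fastq_1,fastq_2,strandedness\ntreated 1,t1_R1.fastq,,sideways\n"
good_problems = check_samplesheet(io.StringIO(good))
bad_problems = check_samplesheet(io.StringIO(bad))
for problem in bad_problems:
    print(problem)
```

Repeated sample names are deliberately not flagged, because the pipeline treats rows that share a sample name as re-sequenced runs of the same library and merges them.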
Path B — standalone STAR/Salmon (abbreviated)
fastqc -o qc/ reads/*.fastq.gz # 1. QC raw reads
fastp -i s1_R1.fq.gz -I s1_R2.fq.gz \
-o s1_R1.trim.fq.gz -O s1_R2.trim.fq.gz \
--thread 4 -j s1.fastp.json # 2. Trim adapters/low-quality
salmon quant -i salmon_index -l A \
-1 s1_R1.trim.fq.gz -2 s1_R2.trim.fq.gz \
--gcBias --seqBias -p 8 -o quant/s1 # 3. Quantify (per sample)
Full recipes (FastQC, fastp/Trim Galore, STAR index+align+--quantMode GeneCounts, Salmon decoy-aware index, featureCounts, strandedness): references/upstream-manual.md.
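When there are more than a handful of samples, generating the per-sample commands from one template helps keep parameters identical across samples. A minimal sketch, assuming a hypothetical `<sample>_R1.trim.fq.gz` / `<sample>_R2.trim.fq.gz` naming scheme and a `trimmed/` directory; the Salmon flags are the ones used in the recipe above.

```python
import shlex
from pathlib import Path


def salmon_quant_command(sample, reads_dir, index="salmon_index", out_root="quant", threads=8):
    """Build the argument list for one paired-end sample (file naming is an assumption)."""
    reads_dir = Path(reads_dir)
    return [
        "salmon", "quant",
        "-i", index,
        "-l", "A",  # let Salmon infer the library type
        "-1", str(reads_dir / f"{sample}_R1.trim.fq.gz"),
        "-2", str(reads_dir / f"{sample}_R2.trim.fq.gz"),
        "--gcBias", "--seqBias",
        "-p", str(threads),
        "-o", str(Path(out_root) / sample),
    ]


samples = ["s1", "s2", "s3"]  # illustrative sample IDs
commands = [salmon_quant_command(sample, "trimmed") for sample in samples]
for command in commands:
    # Printed for review; pass `command` to subprocess.run(command, check=True) to execute.
    print(shlex.join(command))
```

Building argument lists rather than shell strings avoids quoting problems if a path ever contains spaces.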
Counts → DE → enrichment (both paths)
# Path B only: assemble a gene x sample counts matrix + metadata template for PyDESeq2
python scripts/build_counts_matrix.py --from salmon \
--quant-dir quant/ --tx2gene tx2gene.tsv --output-dir counts/
# Then hand off (see the dedicated skills):
# pydeseq2: counts.csv + metadata.csv -> DE table (log2FC, padj, stat)
# pathway-enrichment: rank by `stat` (GSEA) or padj+|LFC| hit list (ORA)
# scientific-visualization / matplotlib: volcano, MA, heatmap, PCA, enrichment dotplot
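The two enrichment inputs named above differ in shape: GSEA takes every tested gene ranked by `stat`, while ORA takes only the thresholded hits. A minimal sketch on invented toy rows, using the PyDESeq2 result column names (`log2FoldChange`, `stat`, `padj`); the helper names are hypothetical and the cutoffs are the ones quoted in this document.

```python
import math

# Toy DE rows: gene names and numbers are invented for illustration only.
de_rows = [
    {"gene": "GENE_A", "log2FoldChange": 2.4, "stat": 6.1, "padj": 0.0004},
    {"gene": "GENE_B", "log2FoldChange": -1.7, "stat": -4.3, "padj": 0.003},
    {"gene": "GENE_C", "log2FoldChange": 0.4, "stat": 2.9, "padj": 0.02},
    {"gene": "GENE_D", "log2FoldChange": 0.1, "stat": 0.2, "padj": 0.91},
    {"gene": "GENE_E", "log2FoldChange": 3.0, "stat": float("nan"), "padj": float("nan")},
]


def gsea_ranking(rows):
    """Rank every gene with a usable statistic, most up-regulated first."""
    usable = [r for r in rows if not math.isnan(r["stat"])]
    ordered = sorted(usable, key=lambda r: r["stat"], reverse=True)
    return [(r["gene"], r["stat"]) for r in ordered]


def ora_hits(rows, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Keep genes passing both the FDR and the effect-size threshold."""
    return [
        r["gene"]
        for r in rows
        if not math.isnan(r["padj"])
        and r["padj"] < padj_cutoff
        and abs(r["log2FoldChange"]) > lfc_cutoff
    ]


ranking = gsea_ranking(de_rows)
hits = ora_hits(de_rows)
print(ranking)
print(hits)
```

Rows with a missing statistic or adjusted p-value are dropped rather than treated as zero, since a missing value carries no evidence in either direction.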
Stage-by-Stage Workflow
Work top to bottom. Each stage names the skill or file that owns the detail. Don't skip the design/QC stages — they are where bulk RNA-seq studies most often go wrong.
- Design & sample sheet. Confirm ≥3 biological replicates per group, identify batch/confounders, and choose the comparison(s). Build the samplesheet and validate it with `scripts/validate_samplesheet.py`. Rationale and rules: `references/design-and-qc.md`.
- Raw-read QC. FastQC per file; aggregate with MultiQC. Check per-base quality, adapter content, duplication, and over-representation. Thresholds: `references/design-and-qc.md`.
- Trimming. Remove adapters and low-quality tails (via `fastp` or `Trim Galore`). Re-run FastQC to confirm. Recipes: `references/upstream-manual.md` (Path A does this for you).
- Align / quantify. STAR (genome alignment + `--quantMode GeneCounts`) and/or Salmon (transcript quasi-mapping, decoy-aware). Determine strandedness: it is easy to get wrong, and a wrong setting silently discards half or more of your counts. Detail: `references/upstream-manual.md`; pipeline params: `references/upstream-nfcore.md`.
- Build the counts matrix. Turn quant output into a gene × sample integer matrix and a metadata template (`scripts/build_counts_matrix.py`). The estimated-count and gene-ID-mapping nuances live in `references/counts-and-handoff.md`.
- Differential expression → `pydeseq2` skill. Load `counts.csv` + `metadata.csv`, set the design (e.g. `~batch + condition`), fit, and test with FDR control. Inspect the PCA and p-value histogram as QC.
- Enrichment → `pathway-enrichment` skill. For GSEA, rank the full gene list by the DESeq2 `stat`; for ORA, pass the thresholded hit list (padj < 0.05, optionally |log2FC| > 1). Map gene IDs to symbols first.
- Figures → `scientific-visualization` skill. Volcano, MA, sample-distance heatmap, PCA, and enrichment dotplots, plus the MultiQC report for the QC narrative.
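The design checks in the first stage can be run on the metadata table before any compute is spent. A minimal sketch on toy metadata, assuming `condition` and `batch` column names; both helper functions are hypothetical, and the confounding check only detects the fully confounded case in which no batch contains more than one condition.

```python
from collections import Counter, defaultdict


def under_replicated(rows, factor="condition", minimum=3):
    """Return the factor levels that have fewer than `minimum` samples."""
    counts = Counter(row[factor] for row in rows)
    return sorted(level for level, n in counts.items() if n < minimum)


def fully_confounded(rows, factor="condition", batch="batch"):
    """True when every batch holds a single condition, so ~batch + condition cannot separate them."""
    levels_per_batch = defaultdict(set)
    for row in rows:
        levels_per_batch[row[batch]].add(row[factor])
    n_levels = len({row[factor] for row in rows})
    return n_levels > 1 and all(len(levels) == 1 for levels in levels_per_batch.values())


# Toy metadata: sample names and batch labels are invented for illustration.
balanced = [
    {"sample": "c1", "condition": "control", "batch": "b1"},
    {"sample": "c2", "condition": "control", "batch": "b2"},
    {"sample": "c3", "condition": "control", "batch": "b1"},
    {"sample": "t1", "condition": "treated", "batch": "b2"},
    {"sample": "t2", "condition": "treated", "batch": "b1"},
    {"sample": "t3", "condition": "treated", "batch": "b2"},
]
confounded = [
    {"sample": "c1", "condition": "control", "batch": "b1"},
    {"sample": "c2", "condition": "control", "batch": "b1"},
    {"sample": "c3", "condition": "control", "batch": "b1"},
    {"sample": "t1", "condition": "treated", "batch": "b2"},
    {"sample": "t2", "condition": "treated", "batch": "b2"},
]

print(under_replicated(balanced), fully_confounded(balanced))
print(under_replicated(confounded), fully_confounded(confounded))
```

Partial confounding (for example, one mixed batch and several single-condition batches) passes this check but still costs power, so it is worth reading the condition-by-batch table by eye as well.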
The counts → DE bridge (the key glue)
This is the one stage with no upstream/downstream skill, so this skill owns it. scripts/build_counts_matrix.py converts quant output into exactly what pydeseq2 expects:
- Salmon (`--from salmon`): aggregates per-sample `quant.sf` to gene level with `pytximport` using `counts_from_abundance="length_scaled_tpm"` (the right choice for gene-level DE); needs a `tx2gene` map.
- STAR (`--from star`): reads each `ReadsPerGene.out.tab`, selecting the column for your `--strandedness` (unstranded/forward/reverse).
- featureCounts (`--from featurecounts`): parses the combined `featureCounts` matrix.
It writes counts.csv (genes × samples, integers) and metadata_template.csv (one row per sample) for you to fill in. Salmon/RSEM counts are estimates (non-integer); they are rounded to integers because PyDESeq2 requires integer counts — see references/counts-and-handoff.md for why this is acceptable with length_scaled_tpm and how it differs from the offset-based DESeq2+tximport route. That reference also covers Ensembl→symbol mapping (needed before enrichment) and the exact orientation PyDESeq2 wants.
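The STAR column selection can be sketched as follows. This is an illustration, not the code in `scripts/build_counts_matrix.py`. It relies on the documented `ReadsPerGene.out.tab` layout: four summary rows prefixed `N_`, then one row per gene with the gene ID followed by unstranded, forward-stranded, and reverse-stranded counts. The gene IDs, sample names, and numbers are toy values, and both helpers are hypothetical.

```python
import io

# Column index in ReadsPerGene.out.tab for each strandedness setting (column 0 is the gene ID).
STAR_COLUMN = {"unstranded": 1, "forward": 2, "reverse": 3}


def read_star_gene_counts(handle, strandedness):
    """Return {gene_id: count} for one sample, skipping the N_* summary rows."""
    column = STAR_COLUMN[strandedness]
    counts = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if fields[0].startswith("N_"):
            continue
        counts[fields[0]] = int(fields[column])
    return counts


def build_matrix(per_sample):
    """Combine per-sample dicts into {gene: {sample: count}}, filling absent genes with 0."""
    genes = sorted(set().union(*(counts.keys() for counts in per_sample.values())))
    return {g: {s: per_sample[s].get(g, 0) for s in per_sample} for g in genes}


# Toy file contents standing in for two samples' ReadsPerGene.out.tab.
s1_text = (
    "N_unmapped\t100\t100\t100\nN_multimapping\t50\t50\t50\n"
    "N_noFeature\t20\t400\t30\nN_ambiguous\t5\t1\t2\n"
    "ENSG_TOY_1\t200\t4\t196\nENSG_TOY_2\t80\t2\t78\n"
)
s2_text = s1_text.replace("196", "150").replace("78", "90")

per_sample = {
    "s1": read_star_gene_counts(io.StringIO(s1_text), "reverse"),
    "s2": read_star_gene_counts(io.StringIO(s2_text), "reverse"),
}
matrix = build_matrix(per_sample)
print(matrix)
```

STAR counts are already integers, so no rounding step is needed on this route.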
Common Pitfalls
These cause most wrong or irreproducible bulk RNA-seq results:
- Too few replicates. Fewer than 3 biological replicates per group gives almost no power and unstable dispersion estimates. More replicates beat deeper sequencing.
- Confounded batch and condition. If every treated sample was processed on a different day/lane than controls, the effect is unrecoverable. Randomize, and model known batches (`~batch + condition`). See `references/design-and-qc.md`.
- Wrong strandedness. Choosing the wrong STAR column, featureCounts `-s` value, or Salmon library type silently discards half or more of the reads (nearly all of them when a stranded library is counted in the opposite orientation). Use Salmon `-l A` or infer strandedness, and verify the assigned-reads fraction.
- Feeding TPM/FPKM to DESeq2. DESeq2 needs raw (or length-scaled) counts, never TPM/FPKM/normalized values. The bridge handles this.
- Non-integer counts. PyDESeq2 requires integers; round Salmon estimates (the bridge does this).
- Gene-ID mismatch into enrichment. DESeq2 output is often keyed by Ensembl IDs; Enrichr/MSigDB gene sets use symbols. Map IDs before `pathway-enrichment`, or the mismatch will surface as "nothing is significant".
- Skipping post-quant QC. Always look at the PCA and sample-distance heatmap before trusting DE: they expose swapped labels, outliers, and hidden batches.
- Mixing aligners across samples. Quantify every sample with the same tool, version, reference, and parameters.
- Unpinned versions. "latest" pipelines/genomes make results unreproducible; pin `-r`, tool versions, and the genome/annotation release.
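For the strandedness pitfall, one common sanity check is to compare the total reads assigned under the forward and reverse orientations (for example, the summed third and fourth columns of STAR's `ReadsPerGene.out.tab`). A minimal sketch: the function is hypothetical and the 0.8 threshold is an assumption chosen for illustration, not a value taken from any of the tools named here.

```python
def infer_strandedness(forward_total, reverse_total, threshold=0.8):
    """Guess the library type from gene-assigned read totals in each orientation."""
    total = forward_total + reverse_total
    if total == 0:
        raise ValueError("no reads were assigned to genes in either orientation")
    forward_fraction = forward_total / total
    if forward_fraction >= threshold:
        return "forward"
    if forward_fraction <= 1 - threshold:
        return "reverse"
    # Roughly equal totals are what an unstranded library looks like.
    return "unstranded"


# Toy totals, invented for illustration.
print(infer_strandedness(6, 274))       # almost all reads match the reverse orientation
print(infer_strandedness(5200, 4900))   # near 50/50 split
print(infer_strandedness(9100, 300))    # almost all reads match the forward orientation
```

A result near the threshold is itself a warning sign (possible genomic DNA contamination or mixed library preps) and is worth investigating rather than forcing into one category.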
Integration with Other Skills
- Upstream execution: `nextflow` (runs `nf-core/rnaseq`, Path A; HPC/cloud/containers).
- Reference data / gene IDs: `gget` (`gget ref` for genome+GTF, `gget info`/`gget search` for ID mapping), `database-lookup` (Ensembl/NCBI), `biopython`/`pysam` (FASTA/BAM handling).
- Differential expression: `pydeseq2` (the DE engine this skill hands counts to).
- Enrichment: `pathway-enrichment` (ORA + GSEA; its `scripts/run_enrichment.py` reads a DESeq2 results CSV directly).
- Figures & reporting: `scientific-visualization`, `matplotlib`, `seaborn`; `scientific-writing` for the methods/results narrative.
- Related but distinct: `scanpy` (single-cell), `statistical-analysis` (multiple-testing depth).
Reference Files
Read the relevant file when you need depth — each is self-contained:
- `references/upstream-nfcore.md` (Path A): samplesheet format, `--aligner`/`--pseudo_aligner` choice, key params, the `salmon.merged.gene_counts*.tsv` outputs, MultiQC, and what to hand to `pydeseq2`.
- `references/upstream-manual.md` (Path B): FastQC, fastp/Trim Galore, STAR genome index + alignment + `--quantMode GeneCounts`, Salmon decoy-aware index + `quant`, featureCounts, and how to determine strandedness.
- `references/counts-and-handoff.md`: turning quant output into PyDESeq2-ready `counts.csv`/`metadata.csv` (pytximport, STAR column selection, featureCounts), the integer/estimated-count nuance, Ensembl→symbol mapping, and the DE→enrichment rank/hit-list recipe.
- `references/design-and-qc.md`: experimental design (replication, batch, confounding, design formulas) and QC-metric interpretation (mapping rate, duplication, rRNA, complexity, PCA/outliers); the defensible-pipeline backbone.
Resources
- nf-core/rnaseq: https://nf-co.re/rnaseq · STAR: https://github.com/alexdobin/STAR · Salmon: https://salmon.readthedocs.io
- fastp: https://github.com/OpenGene/fastp · Trim Galore: https://github.com/FelixKrueger/TrimGalore · MultiQC: https://multiqc.info
- pytximport: https://pytximport.complextissue.com · featureCounts (Subread): https://subread.sourceforge.net
- Method background: Love et al. 2014 (DESeq2) DOI 10.1186/s13059-014-0550-8 · Soneson et al. 2015 (tximport) DOI 10.12688/f1000research.7563.2
How to use bulk-rnaseq on Cursor
AI-first code editor with Composer
Prerequisites
Before installing skills in Cursor, ensure your development environment meets these requirements:
- Cursor installed and configured on your development machine
- Node.js version 16.0+ with npm package manager (verify with `node --version`)
- Active project directory or workspace where you want to add bulk-rnaseq
Execute installation command
Execute the skills CLI command in your project's root directory to begin installation:
The skills CLI fetches bulk-rnaseq from GitHub repository K-Dense-AI/scientific-agent-skills and configures it for Cursor.
Select Cursor when prompted
The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:
Verify installation
Confirm successful installation by checking the skill directory location:
Reload or restart Cursor to activate bulk-rnaseq. Access the skill through slash commands (e.g., /bulk-rnaseq) or your agent's skill management interface.
Security & Verification Notice
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.
Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.