pdf-ocr

yejinlei/pdf-ocr-skill · updated May 29, 2026


$ npx skills add https://github.com/yejinlei/pdf-ocr-skill --skill pdf-ocr
summary

Dual-engine OCR for extracting text from scanned PDFs and images with local or cloud processing.

  • Supports RapidOCR (local, free, no API key) and SiliconFlow API (cloud-based, high precision) with automatic fallback when local engine fails
  • Handles scanned PDFs and multiple image formats (JPG, PNG, BMP, GIF, TIFF, WEBP) with Chinese and English text recognition
  • Preserves text order and structure; automatically converts PDF pages to images for processing
  • Batch processing capability f
skill.md

PDF OCR Skill

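Chinese Version

The PDF OCR skill extracts text from scanned PDF files and image files. It supports two OCR engines:

  • RapidOCR (local engine): no API key required, free to use, fast recognition
  • SiliconFlow large model (cloud engine): uses an AI large model for high-precision OCR

Features

  • Extracts text from scanned PDF files
  • Recognizes text in multiple image formats (JPG, PNG, BMP, GIF, TIFF, WEBP)
  • Dual-engine support: RapidOCR (local) and the SiliconFlow API (cloud)
  • Recognizes Chinese and English text
  • Preserves the order and structure of the text
  • Automatically converts PDF pages to images for recognition
  • Smart engine switching: automatically falls back to the SiliconFlow API when RapidOCR fails to initialize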
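
The page-to-image conversion in the feature list can be sketched with PyMuPDF, which is one of the listed dependencies. This is a minimal illustration and not the skill's own code: the helper names `dpi_to_zoom` and `render_pages_as_png` and the default of 200 DPI are assumptions made for the example.

```python
from typing import Iterator


def dpi_to_zoom(dpi: int) -> float:
    """PDF user space is 72 units per inch, so the render zoom is dpi / 72."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return dpi / 72.0


def render_pages_as_png(pdf_path: str, dpi: int = 200) -> Iterator[bytes]:
    """Yield every page of the PDF as PNG bytes, in page order."""
    import fitz  # PyMuPDF; imported lazily so dpi_to_zoom works without it

    zoom = dpi_to_zoom(dpi)
    with fitz.open(pdf_path) as doc:
        for page in doc:
            # A scaling matrix renders the page at the requested resolution.
            pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
            yield pix.tobytes("png")
```

Each yielded PNG could then be handed to whichever OCR engine is active. Higher DPI values generally help recognition at the cost of memory and time.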

Installation

Requirements

pip install pymupdf pillow requests python-dotenv

Optional dependency (recommended)

Install RapidOCR to enable local recognition:

pip install rapidocr_onnxruntime

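Environment variable configuration

  1. Copy the .env.example file and rename the copy to .env
  2. Configure the following options as needed:
# OCR engine selection
# - "rapid": use the local RapidOCR engine (default, no API key required)
# - "siliconflow": use the SiliconFlow API engine (API key required)
OCR_ENGINE=rapid

# If you use the SiliconFlow API engine, also configure the following:
SILICON_FLOW_API_KEY=your_api_key_here
SILICON_FLOW_OCR_MODEL=deepseek-ai/DeepSeek-OCR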
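
As a rough sketch of how these variables might be read and validated, the helper below works on any mapping and defaults to `os.environ`. The function name `load_ocr_config` and the returned dictionary layout are assumptions for illustration. The skill's actual loading code is not shown here. In a real project, calling `load_dotenv()` from python-dotenv first would copy the values from `.env` into `os.environ`.

```python
import os
from typing import Mapping, Optional

VALID_ENGINES = ("rapid", "siliconflow")


def load_ocr_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the OCR settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    engine = env.get("OCR_ENGINE", "rapid").strip().lower()
    if engine not in VALID_ENGINES:
        raise ValueError(f"OCR_ENGINE must be one of {VALID_ENGINES}, got {engine!r}")

    config = {"engine": engine}
    if engine == "siliconflow":
        api_key = env.get("SILICON_FLOW_API_KEY", "").strip()
        # Reject the placeholder value copied from the example file.
        if not api_key or api_key == "your_api_key_here":
            raise ValueError("SILICON_FLOW_API_KEY must be set for the siliconflow engine")
        config["api_key"] = api_key
        config["model"] = env.get("SILICON_FLOW_OCR_MODEL", "deepseek-ai/DeepSeek-OCR")
    return config
```

Failing early on a missing or placeholder key surfaces configuration mistakes before any pages are sent to the cloud engine.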

Quick start

Using the default engine (local RapidOCR recognition)

# Import the OCR processor
from scripts.pdf_ocr_processor import PDFOCRProcessor

# Create a processor instance (uses RapidOCR by default)
processor = PDFOCRProcessor()

# Run OCR on the PDF
result = processor.ocr_pdf('path/to/your/scanned.pdf')

# Read the recognition result
print(f"Recognition finished, {result['page_count']} page(s)")
print(f"Engine used: {result['engine']}")
print(result['text'])

Using the SiliconFlow API engine

# Import the OCR processor
from scripts.pdf_ocr_processor import PDFOCRProcessor

# Create a processor instance that uses the SiliconFlow API
processor = PDFOCRProcessor(engine="siliconflow")

# Run OCR on the PDF
result = processor.ocr_pdf('path/to/your/scanned.pdf')

# Read the recognition result
print(f"Recognition finished, {result['page_count']} page(s)")
print(result['text'])

Recognizing image files

# Import the OCR processor
from scripts.pdf_ocr_processor import PDFOCRProcessor

# Create a processor instance
processor = PDFOCRProcessor()  # or PDFOCRProcessor(engine="siliconflow")

# Run OCR on the image
result = processor.ocr_image_file('path/to/your/image.jpg')

# Read the recognition result
print(f"Recognition result: {result['text']}")

Command-line usage

# Use the default RapidOCR engine
python pdf_ocr_processor.py your_document.pdf

# Use the SiliconFlow API engine
python pdf_ocr_processor.py your_document.pdf siliconflow
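
The two invocations above suggest one required file argument followed by an optional engine name. A hypothetical parser for that shape could look like the sketch below. The script's real argument handling is not shown in this document, so `parse_cli_args` and its help strings are assumptions.

```python
import argparse
from typing import List, Optional


def parse_cli_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse `<input_path> [engine]`, defaulting the engine to "rapid"."""
    parser = argparse.ArgumentParser(description="Run OCR on a scanned PDF or image")
    parser.add_argument("input_path", help="path to the PDF or image file")
    parser.add_argument(
        "engine",
        nargs="?",  # the engine is optional, as in the first invocation above
        default="rapid",
        choices=["rapid", "siliconflow"],
        help="OCR engine to use (default: rapid)",
    )
    return parser.parse_args(argv)
```

For example, `parse_cli_args(["your_document.pdf", "siliconflow"])` yields a namespace whose `engine` attribute is `"siliconflow"`.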

Advanced usage examples

Batch processing multiple PDF files

import os
from scripts.pdf_ocr_processor import PDFOCRProcessor

# Create a processor instance
processor = PDFOCRProcessor()

# Process every PDF file in a directory
pdf_dir = "path/to/pdf/files"
output_dir = "path/to/output"
os.makedirs(output_dir, exist_ok=True)

for pdf_file in os.listdir(pdf_dir):
    # Match the extension case-insensitively so that .PDF files are included
    if pdf_file.lower().endswith('.pdf'):
        pdf_path = os.path.join(pdf_dir, pdf_file)
        output_path = os.path.join(output_dir, f"{os.path.splitext(pdf_file)[0]}.txt")
        
        print(f"Processing file: {pdf_file}")
        try:
            result = processor.ocr_pdf(pdf_path)
            
            # Save the recognition result to a text file
            with open(output_path, 'w', encoding='utf-8') as f:
                f.write("=== PDF OCR result ===\n")
                f.write(f"File name: {pdf_file}\n")
                f.write(f"Pages: {result['page_count']}\n")
                f.write(f"Engine used: {result['engine']}\n\n")
                f.write(result['text'])
            
            print(f"Done, result saved to: {output_path}")
        except Exception as e:
            print(f"Processing failed: {e}")
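
When a batch is large, it can help to collect failures and report them at the end so they do not scroll past in the log. The variant below is a sketch under that assumption: `run_batch` is a hypothetical helper, and `ocr_fn` stands in for a callable such as `processor.ocr_pdf`.

```python
from pathlib import Path
from typing import Callable, Dict, Tuple


def run_batch(
    pdf_dir: str, ocr_fn: Callable[[str], dict]
) -> Tuple[Dict[str, dict], Dict[str, str]]:
    """Run ocr_fn on every PDF in pdf_dir; return (results, failures) keyed by file name."""
    results: Dict[str, dict] = {}
    failures: Dict[str, str] = {}
    for pdf_path in sorted(Path(pdf_dir).iterdir()):
        if not pdf_path.is_file() or pdf_path.suffix.lower() != ".pdf":
            continue
        try:
            results[pdf_path.name] = ocr_fn(str(pdf_path))
        except Exception as exc:  # keep going so one bad file does not stop the batch
            failures[pdf_path.name] = str(exc)
    return results, failures
```

Passing the OCR call in as a parameter also makes the loop easy to exercise with a stub function before running it on real documents.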

Combining both engines

from scripts.pdf_ocr_processor import PDFOCRProcessor

def process_with_best_engine(pdf_path):
    """Try RapidOCR first; fall back to the SiliconFlow API if the result looks poor."""
    # Start with the local RapidOCR engine
    rapid_processor = PDFOCRProcessor(engine="rapid")
    rapid_result = rapid_processor.ocr_pdf(pdf_path)
    
    # Crude quality check (for example, the length of the recognized text)
    text_length = len(rapid_result['text'])
    
    if text_length < 100:  # Very short output suggests poor recognition
        print("RapidOCR output looks poor, retrying with the SiliconFlow API...")
        silicon_processor = PDFOCRProcessor(engine="siliconflow")
        silicon_result = silicon_processor.ocr_pdf(pdf_path)
        return silicon_result
    else:
        return rapid_result

# Usage example
result = process_with_best_engine('path/to/your/document.pdf')
print(f"Recognition finished, engine used: {result['engine']}")
print(result['text'])
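
A fixed threshold on total length treats a 1-page and a 50-page document the same way. One possible refinement, offered as a sketch only, normalizes by page count and ignores whitespace. The function name `needs_cloud_retry` and the threshold of 50 characters per page are assumptions for illustration and are not values taken from the skill.

```python
def needs_cloud_retry(result: dict, min_chars_per_page: int = 50) -> bool:
    """Return True when the average non-whitespace text per page is suspiciously low."""
    pages = max(int(result.get("page_count", 1)), 1)  # guard against zero pages
    visible_chars = sum(1 for ch in result.get("text", "") if not ch.isspace())
    return visible_chars / pages < min_chars_per_page
```

The threshold depends on the kind of documents being processed, so it is worth tuning on a few known-good and known-bad scans.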

Supported file formats

  • PDF files: .pdf
  • Image files: .jpg, .jpeg, .png, .bmp, .gif, .tiff, .webp
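
Because PDFs and images go through different methods (`ocr_pdf` and `ocr_image_file`), a caller may want to pick the method from the file extension. The dispatcher below is a hypothetical sketch built from the extension list above. `classify_input` is not part of the skill.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tiff", ".webp"}


def classify_input(path: str) -> str:
    """Return "pdf" or "image" based on the file extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    raise ValueError(f"unsupported file type: {suffix or '(no extension)'}")
```

A caller could then run `processor.ocr_pdf(path)` when the result is `"pdf"` and `processor.ocr_image_file(path)` otherwise.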

Output format

{
    "text": "the full recognized text",
    "page_count": <number of pages>,  # always 1 for image files
    "engine": "rapid" | "siliconflow"  # the OCR engine that was used
}
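
For callers that want a typed view of this structure, the sketch below describes it with a `TypedDict` and serializes it to JSON. The names `OCRResult`, `validate_result`, and `result_to_json` are illustrative assumptions. The skill itself returns a plain dictionary.

```python
import json
from typing import TypedDict


class OCRResult(TypedDict):
    text: str
    page_count: int
    engine: str


def validate_result(result: dict) -> OCRResult:
    """Check that the three documented keys are present and well-formed."""
    missing = {"text", "page_count", "engine"} - set(result)
    if missing:
        raise KeyError(f"missing keys: {sorted(missing)}")
    if result["engine"] not in ("rapid", "siliconflow"):
        raise ValueError(f"unexpected engine: {result['engine']!r}")
    return OCRResult(
        text=str(result["text"]),
        page_count=int(result["page_count"]),
        engine=result["engine"],
    )


def result_to_json(result: dict) -> str:
    """Serialize a result; ensure_ascii=False keeps Chinese text readable."""
    return json.dumps(validate_result(result), ensure_ascii=False, indent=2)
```

Storing results as JSON instead of plain text keeps the page count and engine alongside the text for later comparison.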

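Use cases

  • Processing scanned contracts, agreements, and similar documents
  • Extracting text from photocopied books and reports
  • Handling PDF files whose text cannot be copied directly
  • Batch processing scanned PDF documents
  • Recognizing text in screenshots, scans, and other images
  • Recognizing handwritten or printed text in images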

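Notes

  1. RapidOCR engine

    • Completely free, no network connection required
    • Model files are downloaded automatically on first use
    • Recognition speed depends on CPU performance
  2. SiliconFlow API engine

    • Requires a valid API key
    • May incur costs
    • Recognition speed depends on the page count, image size, and network conditions
  3. Recognition accuracy may vary for complex scanned PDFs or images

  4. Use high-resolution scans or images for better recognition results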

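Prompts that trigger each engine

When talking to the assistant in an AI IDE, you can use the following prompts to choose an OCR engine:

📍 Prompts that trigger RapidOCR (local engine)

  • "Process this PDF with the local OCR engine"
  • "Use RapidOCR to recognize this file"
  • "Process it locally, no API needed"
  • "Quickly recognize this document"
  • "Process this PDF offline"
  • "Don't use the SiliconFlow API, use the local engine"

📍 Prompts that trigger the SiliconFlow API (cloud engine)

  • "Process this PDF with the SiliconFlow API"
  • "Use large-model OCR to recognize this file"
  • "Recognize this document with high precision"
  • "Process a complex scan"
  • "Use the cloud OCR engine"
  • "Use an AI large model for recognition"

📍 Example conversations

Example 1: using the local engine

User: Help me process this scanned PDF; use the local OCR engine for quick recognition.
Assistant: OK, I will process it with the local RapidOCR engine. Please provide the PDF file path.

Example 2: using the cloud engine

User: This PDF contains handwriting and needs high-precision recognition; use the SiliconFlow API.
Assistant: Understood, I will process it with the SiliconFlow API large model. Please provide the PDF file path and your API key (if it is not configured yet).

Example 3: automatic selection

User: Help me recognize this PDF and choose the most suitable engine.
Assistant: I will use the local RapidOCR engine by default. If the result is not satisfactory, we can try the SiliconFlow API.

🔧 Technical implementation

When the AI assistant receives one of these prompts, it:

  1. Parses the user's intent and determines which engine to use
  2. Calls PDFOCRProcessor(engine="rapid") or PDFOCRProcessor(engine="siliconflow")
  3. Runs OCR and returns the result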
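
The first step above is performed by the assistant's own language understanding. Purely to illustrate the mapping from prompt to engine, here is a keyword-based sketch. The hint lists and the function `pick_engine` are assumptions made for this example and do not describe how any assistant actually decides.

```python
# Keyword hints in English and Chinese, loosely based on the prompts listed above.
LOCAL_HINTS = ("local", "rapidocr", "offline", "no api", "quick", "本地", "离线", "快速")
CLOUD_HINTS = (
    "siliconflow", "cloud", "large model", "large-model", "high precision",
    "high-precision", "complex", "handwrit", "硅基流动", "云端", "大模型", "高精度", "复杂",
)


def pick_engine(prompt: str, default: str = "rapid") -> str:
    """Map a prompt to "rapid" or "siliconflow"; ambiguous prompts get the default."""
    text = prompt.lower()
    wants_local = any(hint in text for hint in LOCAL_HINTS)
    wants_cloud = any(hint in text for hint in CLOUD_HINTS)
    if wants_cloud and not wants_local:
        return "siliconflow"
    if wants_local and not wants_cloud:
        return "rapid"
    return default  # no hint, or conflicting hints
```

The returned string can be passed straight to `PDFOCRProcessor(engine=...)`. Prompts that mention both engines fall back to the default, which matches the local-first behavior shown in Example 3.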

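🎯 Best practices

  • Name the engine explicitly: if you need a particular engine, say so in the prompt
  • Provide context: describing the document type (handwriting, complex layout, and so on) helps the assistant choose a suitable engine
  • Test both engines: for important documents, run both engines and compare the results

With these prompts you can control which OCR engine is used when working with an AI IDE and get the best recognition results.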

Troubleshooting

Common problems and solutions

  1. RapidOCR initialization failure

    • Problem: ModuleNotFoundError: No module named 'rapidocr_onnxruntime'
    • Solution: install the RapidOCR dependency: pip install rapid
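
Before creating a processor, a script can check whether the module named in that error is importable and decide what to do. The sketch below uses only the standard library. `module_available` and `resolve_engine` are hypothetical helpers, and the fallback rule mirrors the "smart engine switching" feature as described and not the skill's internal code.

```python
import importlib.util
from typing import Optional


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def resolve_engine(
    requested: str = "rapid",
    has_api_key: bool = False,
    rapid_available: Optional[bool] = None,
) -> str:
    """Fall back to "siliconflow" when RapidOCR is requested but not installed."""
    if rapid_available is None:
        rapid_available = module_available("rapidocr_onnxruntime")
    if requested == "rapid" and not rapid_available:
        if has_api_key:
            return "siliconflow"
        raise RuntimeError(
            "rapidocr_onnxruntime is not installed and no SiliconFlow API key is configured"
        )
    return requested
```

Raising a clear error when neither engine is usable is friendlier than letting the `ModuleNotFoundError` surface from deep inside the processor.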
how to use pdf-ocr

How to use pdf-ocr on Cursor

AI-first code editor with Composer

1. Prerequisites

Before installing skills in Cursor, ensure your development environment meets these requirements:

  • Cursor installed and configured on your development machine
  • Node.js version 16.0+ with npm package manager (verify with node --version)
  • Active project directory or workspace where you want to add pdf-ocr

2. Execute installation command

Execute the skills CLI command in your project's root directory to begin installation:

$ npx skills add https://github.com/yejinlei/pdf-ocr-skill --skill pdf-ocr

The skills CLI fetches pdf-ocr from GitHub repository yejinlei/pdf-ocr-skill and configures it for Cursor.

3. Select Cursor when prompted

The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:

◆ Which agents do you want to install to?
│ ── Universal (.agents/skills) ── always included ────
│ • Amp
│ • Antigravity
│ • Cline
│ • Codex
│ ● Cursor (selected)
│ • Windsurf

4. Verify installation

Confirm successful installation by checking the skill directory location:

.cursor/skills/pdf-ocr

Reload or restart Cursor to activate pdf-ocr. Access the skill through slash commands (e.g., /pdf-ocr) or your agent's skill management interface.
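
The directory check can also be scripted. The sketch below assumes the skill lands in `.cursor/skills/pdf-ocr` under the project root, as stated above. `skill_installed` is a hypothetical helper, and the contents of the directory are not inspected.

```python
from pathlib import Path


def skill_installed(project_root: str, skill: str = "pdf-ocr") -> bool:
    """Return True if <project_root>/.cursor/skills/<skill> exists as a directory."""
    return (Path(project_root) / ".cursor" / "skills" / skill).is_dir()
```

Calling `skill_installed(".")` from the project root gives a quick yes or no before restarting Cursor.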

Security & Verification Notice

We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.

Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.

Use Cases

Task Automation & Efficiency

Automate repetitive workflows and reduce manual effort

Example

Generate reports, summarize documents, draft communications

Save 3-5 hours per week on routine tasks

Knowledge Enhancement

Learn new skills, understand complex topics, get expert guidance

Example

Explain concepts, provide examples, suggest learning resources

Accelerate learning and skill development by 2x

Quality Improvement

Enhance output quality through reviews, suggestions, and refinements

Example

Review drafts, suggest improvements, catch errors

Improve work quality by 30-40% with less effort

Implementation Guide

Prerequisites

  • Claude Desktop or compatible AI client with skill support
  • Clear understanding of task or problem to solve
  • Willingness to iterate and refine outputs

Time Estimate

15-45 minutes depending on use case complexity

Installation Steps

  1. Install skill using provided installation command
  2. Test with simple use case relevant to your work
  3. Evaluate output quality and relevance
  4. Iterate on prompts to improve results
  5. Integrate into regular workflow if valuable

Common Pitfalls

  • Expecting perfect results without iteration
  • Not providing enough context in prompts
  • Using skill for tasks outside its intended scope
  • Accepting outputs without review and validation

Best Practices

✓ Do

  • Start with clear, specific prompts
  • Provide relevant context and constraints
  • Review and refine all outputs before using
  • Iterate to improve output quality
  • Document successful prompt patterns

✗ Don't

  • Don't use without understanding skill limitations
  • Don't skip validation of outputs
  • Don't share sensitive information in prompts
  • Don't expect skill to replace human judgment

💡 Pro Tips

  • Be specific about desired format and style
  • Ask for multiple options to choose from
  • Request explanations to understand reasoning
  • Combine AI efficiency with human expertise

When to Use This

✓ Use When

Use when skill capabilities match your task, clear ROI on time saved, and you can validate outputs. Best for repetitive tasks, learning, and quality improvement.

✗ Avoid When

Avoid when task requires deep expertise you can't validate, involves sensitive decisions, or when learning process is more valuable than speed of completion.

Learning Path

  1. Familiarize yourself with skill capabilities and limitations
  2. Start with low-risk, non-critical tasks
  3. Progress to more complex and valuable use cases
  4. Build expertise through regular use and experimentation

general reviews

Ratings

4.6 · 61 reviews
  • Meera White· Dec 28, 2024

    pdf-ocr is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.

  • Ava Chawla· Dec 24, 2024

    Keeps context tight: pdf-ocr is the kind of skill you can hand to a new teammate without a long onboarding doc.

  • Ama Nasser· Dec 20, 2024

    pdf-ocr fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.

  • Naina Smith· Dec 12, 2024

    I recommend pdf-ocr for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.

  • Meera Anderson· Dec 8, 2024

    We added pdf-ocr from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.

  • Min Agarwal· Nov 19, 2024

    Solid pick for teams standardizing on skills: pdf-ocr is focused, and the summary matches what you get after install.

  • Soo Anderson· Nov 15, 2024

    pdf-ocr has been reliable in day-to-day use. Documentation quality is above average for community skills.

  • Meera Martin· Nov 7, 2024

    pdf-ocr fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.

  • Meera Harris· Nov 3, 2024

    pdf-ocr reduced setup friction for our internal harness; good balance of opinion and flexibility.

  • Soo Mehta· Oct 26, 2024

    We added pdf-ocr from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.
