How do I extract text from a scanned PDF or image-based PDF?

For image-based PDFs, the PDF skill can integrate with the Tesseract OCR engine through pytesseract, its Python wrapper. Install the wrapper (`pip install pytesseract`), install the Tesseract engine itself on your system (pytesseract only calls the `tesseract` executable), and set `use_ocr=True` when calling `extract_text()`. Each page is then processed as an image and converted to searchable text. Accuracy depends on image quality and on having Tesseract language data for the document's language.
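The page does not show how `use_ocr=True` works internally. The following is a minimal standalone sketch of the same render-then-recognize idea. It uses pdf2image, which needs the Poppler utilities, to rasterize pages and pytesseract to read them. `ocr_pdf` and its parameters are hypothetical names chosen for illustration and are not part of the skill.

```python
from typing import List


def ocr_pdf(path: str, lang: str = "eng", dpi: int = 300) -> List[str]:
    """Render each page of a scanned PDF to an image and OCR it with Tesseract.

    Hypothetical helper. Requires `pip install pdf2image pytesseract` plus the
    Poppler and Tesseract binaries on the system PATH.
    """
    # Imported inside the function so this module loads even where the OCR stack is absent.
    from pdf2image import convert_from_path
    import pytesseract

    pages_text = []
    # Higher DPI usually improves recognition at the cost of memory and time.
    for image in convert_from_path(path, dpi=dpi):
        pages_text.append(pytesseract.image_to_string(image, lang=lang))
    return pages_text
```

For example, `ocr_pdf("scanned.pdf", lang="deu")` would return one string per page, provided the German language data for Tesseract is installed.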

What's the difference between text extraction and table extraction?

Text extraction returns all content as a continuous string, preserving line breaks but losing table structure. Table extraction specifically identifies grid-based data and returns it as structured JSON objects with rows and columns, maintaining cell relationships. Use table extraction when you need to preserve data relationships; use text extraction for content analysis and language processing.
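The following sketch makes the difference concrete by calling pdfplumber directly rather than the skill. `page.extract_text()` yields one string per page, while `page.extract_tables()` yields each table as a list of rows. The helpers `table_to_records` and `extract_both` are hypothetical. Treating the first row as the header is an assumption that fits many tables but not all of them.

```python
from typing import Dict, List, Optional


def table_to_records(table: List[List[Optional[str]]]) -> List[Dict[str, Optional[str]]]:
    """Turn a pdfplumber table (first row treated as the header) into a list of dicts."""
    if not table:
        return []
    # Empty header cells get a positional placeholder name.
    header = [cell or f"column_{i}" for i, cell in enumerate(table[0])]
    return [dict(zip(header, row)) for row in table[1:]]


def extract_both(path: str) -> Dict[str, object]:
    """Return a PDF's text and its tables side by side (hypothetical helper)."""
    import pdfplumber  # pip install pdfplumber

    text_parts, tables = [], []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            # Plain text: line breaks kept, cell boundaries lost.
            text_parts.append(page.extract_text() or "")
            # Tables: each is a list of rows; each row is a list of cell strings or None.
            for table in page.extract_tables():
                tables.append(table_to_records(table))
    return {"text": "\n".join(text_parts), "tables": tables}
```

Serializing the result with `json.dumps(extract_both("invoice.pdf")["tables"], indent=2)` gives the row/column JSON described above. Duplicate header names would overwrite each other in the dicts, so rename them first if your tables contain any.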

Can I use this skill to extract data from password-protected PDFs?

Yes, if you have the password. Pass it to the extraction function: `pdf_tool.extract_text('file.pdf', password='your_password')`. The skill decrypts the PDF before processing. Some PDFs are protected only by an owner (permissions) password. This type of password restricts actions such as printing but still allows the file to be opened, so these PDFs don't require a password for text extraction.
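For comparison, the following sketch shows roughly how decryption looks with pypdf, one of the skill's dependencies. It is not the skill's code. `open_pdf` and `read_protected_text` are hypothetical helpers. Files encrypted with AES also need `pip install pypdf[crypto]`.

```python
from typing import Optional


def open_pdf(path: str, password: Optional[str] = None):
    """Return a pypdf PdfReader for `path`, decrypting it if needed (hypothetical helper)."""
    from pypdf import PdfReader  # pip install pypdf

    reader = PdfReader(path)
    if reader.is_encrypted:
        # An empty string is enough for files protected only by an owner (permissions) password.
        if not reader.decrypt(password or ""):  # 0 means PasswordType.NOT_DECRYPTED
            raise ValueError(f"Could not decrypt {path}: wrong or missing password")
    return reader


def read_protected_text(path: str, password: Optional[str] = None) -> str:
    """Concatenate the text of every page; pages without a text layer contribute ''."""
    reader = open_pdf(path, password)
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```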

How do I merge multiple PDFs while preserving bookmarks and annotations?

Use the `merge_pdfs()` function with the `preserve_metadata=True` flag: `pdf_tool.merge_pdfs(file_list, output_path='merged.pdf', preserve_metadata=True)`. This keeps the bookmarks and annotations from the source documents. However, multi-level bookmark hierarchies may be flattened, depending on how the sources are structured.
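To reproduce a bookmark-preserving merge outside the skill, you can use pypdf's `PdfWriter.append()`, which imports each source file's outline when `import_outline=True`. The sketch below assumes a recent pypdf release (3.x or later). `merge_keep_outlines` is a hypothetical name. It does not correspond to `preserve_metadata=True` in any verified way.

```python
from typing import Iterable


def merge_keep_outlines(paths: Iterable[str], output_path: str) -> int:
    """Merge PDFs with pypdf, importing each file's bookmarks; returns the page count.

    Hypothetical helper, not the skill's merge_pdfs().
    """
    from pypdf import PdfWriter  # pip install pypdf

    writer = PdfWriter()
    for path in paths:
        # import_outline=True copies the source bookmarks; an empty excluded_fields
        # keeps page annotations ("/Annots" would have to be listed to drop them).
        writer.append(path, import_outline=True, excluded_fields=())
    with open(output_path, "wb") as handle:
        writer.write(handle)
    return len(writer.pages)
```

Calling `merge_keep_outlines(["q1.pdf", "q2.pdf"], "merged.pdf")` writes the combined file and returns its page count.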

What file size limits exist for PDF processing?

The skill can handle PDFs up to several hundred MB, though processing time increases significantly with size. For files over 500 MB, consider splitting them first using `split_pdf()`. Memory usage scales with document complexity—scanned PDFs with high-resolution images consume more memory than text-based PDFs of the same page count.
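The following is a sketch of a split step using pypdf. It splits by page count rather than by file size, so choose `pages_per_chunk` based on the document's typical page size. `chunk_ranges` and `split_by_pages` are hypothetical helpers, not the skill's `split_pdf()`.

```python
from typing import List, Tuple


def chunk_ranges(page_count: int, pages_per_chunk: int) -> List[Tuple[int, int]]:
    """Split [0, page_count) into half-open (start, stop) ranges of at most pages_per_chunk."""
    if pages_per_chunk <= 0:
        raise ValueError("pages_per_chunk must be positive")
    return [
        (start, min(start + pages_per_chunk, page_count))
        for start in range(0, page_count, pages_per_chunk)
    ]


def split_by_pages(path: str, pages_per_chunk: int, prefix: str = "part") -> List[str]:
    """Write consecutive page chunks of `path` to prefix_001.pdf, prefix_002.pdf, ... (hypothetical)."""
    from pypdf import PdfReader, PdfWriter  # pip install pypdf

    reader = PdfReader(path)
    written = []
    ranges = chunk_ranges(len(reader.pages), pages_per_chunk)
    for index, (start, stop) in enumerate(ranges, start=1):
        writer = PdfWriter()
        for page_number in range(start, stop):
            writer.add_page(reader.pages[page_number])
        out_path = f"{prefix}_{index:03d}.pdf"
        with open(out_path, "wb") as handle:
            writer.write(handle)
        written.append(out_path)
    return written
```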

How do I add annotations programmatically to a PDF?

Use the `annotate_pdf()` function to add comments, highlights, or markup: `pdf_tool.annotate_pdf('input.pdf', annotations=[{'page': 0, 'type': 'highlight', 'coordinates': [x1, y1, x2, y2]}])`. Annotations are stored as standard PDF annotation objects, so they display in any PDF viewer that supports annotations. You can add text comments, highlights, strikethrough, and underline markups.
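The following sketch shows what a highlight annotation looks like at the PDF level. It adds one highlight using pypdf's `annotations` module and assumes a recent pypdf release. Coordinates are PDF points measured from the page's bottom-left corner. `highlight_region` and `quad_points_for` are hypothetical helpers. This page does not document how `annotate_pdf()` maps its `coordinates` list onto these values.

```python
from typing import List, Sequence


def quad_points_for(rect: Sequence[float]) -> List[float]:
    """QuadPoints for one rectangular highlight given (x1, y1, x2, y2) in PDF points.

    Order: upper-left, upper-right, lower-left, lower-right, as most viewers expect.
    """
    x1, y1, x2, y2 = rect
    return [x1, y2, x2, y2, x1, y1, x2, y1]


def highlight_region(input_path: str, output_path: str, page_index: int, rect: Sequence[float]) -> None:
    """Add one highlight annotation with pypdf (hypothetical helper, not annotate_pdf())."""
    from pypdf import PdfWriter
    from pypdf.annotations import Highlight  # available in recent pypdf releases
    from pypdf.generic import ArrayObject, FloatObject

    writer = PdfWriter()
    writer.append(input_path)
    annotation = Highlight(
        rect=tuple(rect),
        quad_points=ArrayObject([FloatObject(v) for v in quad_points_for(rect)]),
    )
    writer.add_annotation(page_number=page_index, annotation=annotation)
    with open(output_path, "wb") as handle:
        writer.write(handle)
```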

Can the skill detect and extract metadata like author and creation date?

Yes. Call `get_metadata(pdf_file)` to retrieve all document properties including author, creation date, modification date, subject, keywords, and custom fields. This returns a dictionary of all available metadata. Note that metadata must be explicitly embedded in the PDF—if not present, those fields will be empty.
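The same Info-dictionary fields can be read directly with pypdf, as the following sketch shows. The helper names are hypothetical. The sketch covers only the classic document-information dictionary, not XMP metadata. Dates come back as raw PDF date strings such as `D:20240115093000Z`. If you need a `datetime` instead, pypdf's `metadata.creation_date` property parses them.

```python
from typing import Dict, Mapping, Optional


def normalize_info(info: Optional[Mapping]) -> Dict[str, str]:
    """Turn a PDF document-information dictionary into plain {'Author': ..., ...} strings."""
    if not info:
        return {}
    # Keys are PDF names such as "/Author"; drop the leading slash for readability.
    return {str(key).lstrip("/"): str(info[key]) for key in info}


def read_metadata(path: str) -> Dict[str, str]:
    """Read the Info dictionary with pypdf; returns {} when the PDF has none (hypothetical)."""
    from pypdf import PdfReader  # pip install pypdf

    return normalize_info(PdfReader(path).metadata)
```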

How does the skill handle PDFs with different encodings or special characters?

The skill automatically detects text encoding and handles Unicode characters, supporting PDFs with international text, mathematical symbols, and special characters. However, some legacy PDFs with non-standard encodings may require the `encoding='latin-1'` parameter. Test extraction on a sample page if you encounter character issues.
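This page doesn't show how the skill handles the `encoding` parameter internally. Independently of that parameter, two checks can help diagnose character problems in extracted text. First, Unicode NFKC normalization folds compatibility characters, such as the "ﬁ" ligature, into plain letters. Second, pdfplumber (through pdfminer.six) emits `(cid:N)` placeholders for glyphs that have no Unicode mapping, so counting them reveals unmapped text. The helper names below are hypothetical.

```python
import re
import unicodedata
from typing import Dict

# pdfminer.six / pdfplumber emit "(cid:123)" when a glyph has no Unicode mapping.
CID_PATTERN = re.compile(r"\(cid:\d+\)")


def clean_extracted_text(text: str) -> str:
    """Fold compatibility characters (e.g. the 'ﬁ' ligature) into their plain equivalents."""
    return unicodedata.normalize("NFKC", text)


def encoding_report(text: str) -> Dict[str, int]:
    """Count signs of a broken text layer in extracted text."""
    return {
        "unmapped_cids": len(CID_PATTERN.findall(text)),
        "replacement_chars": text.count("\ufffd"),
    }
```

A large number of `(cid:N)` hits usually means the font lacks a usable ToUnicode map. In that case, OCR is often the practical fallback. Note that NFKC also rewrites some characters you may want to keep (for example, superscript ² becomes 2), so apply it deliberately.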

pdf | Claude Skill | cload.cloud

What pdf Does

The PDF skill is a comprehensive tool for working with PDF documents programmatically. It enables you to extract text and structured data from PDFs, retrieve metadata, merge multiple documents, and add annotations—all without manual file handling. This skill is essential for teams that process large volumes of documents, automate data extraction workflows, or need to programmatically manipulate PDF files as part of their AI agent pipelines.

Designed for product designers and power users leveraging Claude AI agents, this skill transforms PDFs from static documents into actionable data. Whether you’re building workflows that parse invoices, consolidate reports, or annotate contracts, the PDF skill handles the heavy lifting of document processing. It integrates seamlessly with Claude’s agent framework, making it ideal for automation workflows that touch documentation.

How to Install

Prerequisites

Python 3.8 or higher
pip package manager
Access to Claude API credentials

Installation Steps

Clone or download the skills repository

git clone https://github.com/anthropics/skills.git
cd skills/skills/pdf

Install required dependencies

pip install pypdf pdfplumber python-dotenv

Configure your Claude API key
- Create a .env file in your project directory
- Add your API key: ANTHROPIC_API_KEY=your_api_key_here
- Never commit this file to version control

Import the skill into your Claude agent

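import os
from skills.pdf import PDFSkill
# Assumes ANTHROPIC_API_KEY is already in the environment (e.g. loaded from .env with python-dotenv)
pdf_tool = PDFSkill(api_key=os.getenv('ANTHROPIC_API_KEY'))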

Verify installation

# Test basic functionality
text = pdf_tool.extract_text('sample.pdf')
print(text[:100])  # Print first 100 characters

Add to your agent configuration
- Register the PDF skill in your Claude agent’s tool manifest
- Test extraction with a sample PDF file

Use Cases

Invoice and Receipt Processing: Automatically extract line items, amounts, dates, and vendor information from hundreds of invoices to feed into accounting systems or expense management platforms.
Legal Document Review: Parse contracts and agreements to identify key clauses, dates, and obligations, enabling faster contract analysis and compliance checking across large document sets.
Report Consolidation: Merge quarterly reports, research documents, or project summaries into unified PDFs while maintaining formatting, then extract key metrics for executive dashboards.
Form Data Extraction: Pull structured data from filled PDF forms (tax returns, applications, surveys) and transform it into CSV or JSON for database import without manual data entry.
Document Annotation Workflows: Add comments, highlighting, and metadata tags to PDFs as part of review processes, enabling collaborative document workflows with audit trails.

How It Works

The PDF skill relies on two primary libraries, pypdf and pdfplumber, which handle different aspects of PDF processing. pypdf handles document-level operations such as merging, splitting, and metadata manipulation. pdfplumber specializes in precise text and table extraction, using the position of every character, line, and rectangle on the page. When you invoke text extraction, the skill analyzes the PDF’s internal structure to determine whether content exists as selectable text or only as embedded images. For text-based PDFs, it preserves layout information, including spacing and column structure. For image-heavy or scanned PDFs, it can integrate OCR through optional dependencies.
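The following pdfplumber sketch approximates the text-versus-image check. It is a heuristic, not the skill's detection logic. A page is flagged when it yields almost no text but contains at least one image. The 20-character threshold is an arbitrary assumption that you should tune for your documents.

```python
from typing import List


def needs_ocr(page_text: str, image_count: int, min_chars: int = 20) -> bool:
    """Heuristic: a page with almost no extractable text but at least one image is probably scanned."""
    return len(page_text.strip()) < min_chars and image_count > 0


def pages_needing_ocr(path: str) -> List[int]:
    """Return 0-based indexes of pages that look image-only (hypothetical helper)."""
    import pdfplumber  # pip install pdfplumber

    flagged = []
    with pdfplumber.open(path) as pdf:
        for index, page in enumerate(pdf.pages):
            # page.images lists the image objects pdfplumber found on the page.
            if needs_ocr(page.extract_text() or "", len(page.images)):
                flagged.append(index)
    return flagged
```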

Table extraction uses pdfplumber’s table detection algorithms to identify grid structures, parse cells, and reconstruct tabular data as JSON or CSV. This approach maintains the relationships between headers and values that plain text extraction would lose. Metadata extraction retrieves document properties stored in the PDF’s information dictionary, such as author, creation date, title, and custom fields. These properties are important for document management and compliance workflows.
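pdfplumber returns each table as a list of rows, with `None` for empty cells, so converting tables to CSV requires only the standard library. `tables_to_csv` is a hypothetical helper sketched for illustration.

```python
import csv
import io
from typing import List, Optional

Table = List[List[Optional[str]]]


def tables_to_csv(tables: List[Table]) -> str:
    """Serialize pdfplumber-style tables to one CSV string, with a blank row between tables."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for number, table in enumerate(tables):
        if number:
            writer.writerow([])  # separator row between consecutive tables
        for row in table:
            # pdfplumber uses None for empty or merged cells; write them as empty strings.
            writer.writerow(["" if cell is None else cell for cell in row])
    return buffer.getvalue()
```

The input could come from `[t for page in pdf.pages for t in page.extract_tables()]`. For tables drawn without ruling lines, try pdfplumber's `table_settings={"vertical_strategy": "text", "horizontal_strategy": "text"}` option.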

For merging and annotation operations, the skill constructs new PDF objects that reference the original pages while applying transformations. Annotations are stored as PDF markup objects, preserving them for downstream applications. All operations can be chained—extract metadata to determine file importance, extract tables for processing, then merge results back into an annotated output document. This modular approach integrates seamlessly with Claude’s agent framework, allowing multi-step workflows where each extraction feeds into AI analysis or data transformation steps.
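The following small example chains two of these steps. It is written against pypdf rather than the skill's own functions. First it reads each file's metadata, then it merges the files with one top-level bookmark per source. Each bookmark is titled from the file's `/Title` entry or, if that is missing, its file name. The helper names are hypothetical, and the sketch assumes a recent pypdf release.

```python
import os
from typing import List, Optional


def bookmark_title(path: str, title: Optional[str]) -> str:
    """Prefer the document's /Title metadata; fall back to the file name."""
    if title and title.strip():
        return title.strip()
    return os.path.basename(path)


def merge_with_title_bookmarks(paths: List[str], output_path: str) -> None:
    """Read each file's metadata, then merge with one bookmark per file (hypothetical)."""
    from pypdf import PdfReader, PdfWriter  # pip install pypdf

    writer = PdfWriter()
    for path in paths:
        reader = PdfReader(path)
        info = reader.metadata
        title = bookmark_title(path, info.title if info else None)
        # outline_item adds a top-level bookmark pointing at the first appended page.
        writer.append(reader, outline_item=title)
    with open(output_path, "wb") as handle:
        writer.write(handle)
```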

Pros and Cons

Pros:

Seamless integration with Claude agent framework for end-to-end automation
Handles both text-based and scanned (image) PDFs with optional OCR
Accurate table detection preserves data structure for complex layouts
Lightweight Python implementation with minimal dependencies
Open-source with community support and active maintenance
No cloud PDF service required: document processing runs locally, although any content you send to Claude for analysis leaves your machine
Supports batch operations for processing large document volumes efficiently

Cons:

OCR accuracy depends on image quality and requires additional dependencies
Performance degrades significantly with very large PDFs (500+ MB)
Bookmark hierarchies may flatten when merging complex multi-level structures
Limited form field extraction compared to commercial PDF APIs
Metadata preservation during transformations may lose some custom properties
No built-in support for extracting data from dynamic form widgets or XFA forms
Requires manual setup compared to drag-and-drop commercial tools

Related Skills

Document Parsing — General-purpose document processing for various formats beyond PDFs
CSV/Excel Handler — Export extracted PDF tables to spreadsheets or import tabular data
Image Recognition — Complement OCR capabilities for complex document layouts
File Management — Organize, version, and move processed PDF files
Data Transformation — Convert extracted PDF data into different formats for downstream systems

Alternatives

Adobe PDF Services API — Cloud-based PDF processing with advanced features like PDF generation and form data extraction. More expensive and cloud-dependent, but handles complex commercial workflows and offers guaranteed uptime.
Apache PDFBox — Open-source Java library offering similar extraction and manipulation capabilities. Better for Java-based systems but requires JVM overhead compared to Python solutions.
IronPDF / SelectPdf — Commercial solutions with robust table detection and image-to-PDF conversion. Offer superior support and specialized features but at higher cost and vendor lock-in risk.
