doclinglisted
Install: claude install-skill existential-birds/beagle
# Docling Document Parser
Docling is a document parsing library that converts PDFs, Word documents, PowerPoint, images, and other formats into structured data with advanced layout understanding.
## Quick Start
Basic document conversion:
```python
from docling.document_converter import DocumentConverter
source = "https://arxiv.org/pdf/2408.09869" # URL, Path, or BytesIO
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())
```
## Core Concepts
### DocumentConverter
The main entry point for document conversion. Supports various input formats and conversion options.
```python
from docling.document_converter import DocumentConverter
from docling.datamodel.base_models import InputFormat
from docling.document_converter import PdfFormatOption
from docling.datamodel.pipeline_options import PdfPipelineOptions
# Basic converter (all formats enabled)
converter = DocumentConverter()
# Restricted formats
converter = DocumentConverter(
allowed_formats=[InputFormat.PDF, InputFormat.DOCX]
)
# Custom pipeline options
pipeline_options = PdfPipelineOptions()
pipeline_options.do_ocr = True
pipeline_options.do_table_structure = True
converter = DocumentConverter(
format_options={
InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
}
)
```
### ConversionResult
All conversion operations return a `ConversionResult` containing:
- `document`: The parsed `DoclingDocument`
- `status`: `Conversio