June 18, 2026

107 file formats, one API: universal file intelligence

PDFs, spreadsheets, images, video, audio, code files, and live websites — all handled by a single endpoint with the same schema interface.

By Bigyan Karki · 1,200 words · 4 min read

Most document APIs support PDFs. Some add DOCX and images. But your AI agent encounters everything: spreadsheets from finance, PPTX decks from sales, HEIC photos from a phone, MP4 recordings from meetings, CSV exports from tools, and live websites. Each format traditionally needs its own parser, its own error handling, its own integration.

We built one API that handles all of them with the same interface.

The format coverage problem

A typical document processing pipeline handles maybe 5-10 formats. When your agent encounters format #11, you have two choices:

  1. Tell the user "unsupported format" (bad UX)
  2. Add another parser library, handle its edge cases, test it, deploy it (bad DX)

We chose a third option: support everything from day one. The API auto-detects the format, routes to the optimal processing pipeline internally, and returns the same structured output regardless of what went in.
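The post doesn't say how detection works internally. One common approach is to sniff a file's leading "magic bytes" instead of trusting its extension, then hand the file to a per-category pipeline. The sketch below assumes that approach. It is not the service's actual implementation, and the signature tables, `sniff`, and `route` are hypothetical names.

```python
# Hypothetical signature table: leading bytes -> (format, pipeline category).
# Illustrative subset only; a production detector would cover far more formats.
MAGIC_SIGNATURES = [
    (b"%PDF-", ("pdf", "documents")),
    (b"\x89PNG\r\n\x1a\n", ("png", "images")),
    (b"\xff\xd8\xff", ("jpg", "images")),
    (b"GIF87a", ("gif", "images")),
    (b"GIF89a", ("gif", "images")),
    (b"ID3", ("mp3", "audio")),
    (b"fLaC", ("flac", "audio")),
    (b"OggS", ("ogg", "audio")),
    (b"\x1a\x45\xdf\xa3", ("mkv/webm", "video")),
    # DOCX, XLSX, PPTX, ODT and EPUB are all ZIP containers; telling them apart
    # requires inspecting the archive's contents, which this sketch skips.
    (b"PK\x03\x04", ("zip-container", "documents")),
]

# ISO base media files (MP4, MOV, M4A, HEIC) store "ftyp" at byte offset 4,
# followed by a 4-byte major brand that names the concrete format.
FTYP_BRANDS = {
    b"heic": ("heic", "images"),
    b"heix": ("heic", "images"),
    b"mif1": ("heif", "images"),
    b"M4A ": ("m4a", "audio"),
    b"qt  ": ("mov", "video"),
    b"isom": ("mp4", "video"),
    b"mp42": ("mp4", "video"),
}

def sniff(header: bytes) -> tuple[str, str]:
    """Guess (format, category) from the first bytes of a file."""
    if header[4:8] == b"ftyp":
        return FTYP_BRANDS.get(header[8:12], ("iso-bmff", "video"))
    for prefix, result in MAGIC_SIGNATURES:
        if header.startswith(prefix):
            return result
    return ("unknown", "unknown")

def route(path: str) -> tuple[str, str]:
    """Read a file's first 16 bytes and pick a pipeline, ignoring its extension."""
    with open(path, "rb") as f:
        return sniff(f.read(16))
```

Text formats such as CSV, JSON, Markdown, and source code carry no signature, so a router built this way would still need extension or content heuristics for them.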

Supported format categories

Documents (25+ formats)

PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, ODT, ODS, ODP, RTF, CSV, TSV, PAGES, NUMBERS, KEYNOTE, EPUB, MOBI, LaTeX, Markdown, HTML, XML, JSON, YAML, TOML

Every document format returns the same typed JSON when you define a schema. Extract fields from a PPTX the same way you extract from a PDF — no format-specific code needed.
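One payoff of getting the same typed JSON back is that a single validation function can check results no matter which format produced them. This sketch assumes the field-spec schema shape this post uses in its example (`{"field": {"type": ..., "description": ...}}`). The type mapping, `conforms`, and the sample results are hypothetical and are not API output.

```python
from typing import Any

# Schema type names mapped to Python types. This mapping is an assumption for the
# sketch and covers only a few type names, including the two used in this post.
TYPE_MAP = {"string": str, "boolean": bool, "array": list, "object": dict}

def conforms(result: dict[str, Any], schema: dict[str, dict[str, str]]) -> list[str]:
    """Return a list of problems; an empty list means `result` matches `schema`."""
    problems = []
    for field, spec in schema.items():
        if field not in result:
            problems.append(f"missing field: {field}")
        elif not isinstance(result[field], TYPE_MAP[spec["type"]]):
            problems.append(f"{field}: expected {spec['type']}")
    return problems

schema = {
    "title": {"type": "string", "description": "Document or content title"},
    "key_points": {"type": "array", "description": "Main points or takeaways"},
    "people_mentioned": {"type": "array", "description": "Names of people referenced"},
}

# Made-up results standing in for extractions from two different formats.
from_pdf = {"title": "Q3 report", "key_points": ["Revenue up"], "people_mentioned": ["Ana"]}
from_mp4 = {"title": "Weekly sync", "key_points": [], "people_mentioned": ["Raj", "Li"]}

# The same check applies regardless of the source format.
print(conforms(from_pdf, schema), conforms(from_mp4, schema))
```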

Images (20+ formats)

JPG, PNG, GIF, WebP, SVG, AVIF, HEIC, HEIF, TIFF, BMP, ICO, PSD, AI, EPS, RAW (CR2, NEF, ARW, DNG)

Images are processed with vision models — extract text from photos, read whiteboard captures, pull data from screenshots, analyze diagrams. HEIC from iPhones works as well as PNG.

Video (15+ formats)

MP4, MOV, WebM, AVI, MKV, WMV, FLV, 3GP, M4V, OGV, MPEG, TS, VOB, ASF, SWF

Videos are transcribed and analyzed. Extract meeting notes, identify topics discussed, pull action items from recorded calls. The same schema-based interface works — define what you need, get it as JSON.

Audio (15+ formats)

MP3, WAV, M4A, FLAC, OGG, AAC, WMA, AIFF, OPUS, AMR, AC3, APE, WV, MKA, SPX

Audio files are transcribed, and data is extracted from the transcript. Meeting recordings, podcast episodes, voice notes: pull structured data out of spoken content with the same API call.

Code (40+ languages)

Python, JavaScript, TypeScript, Go, Rust, Java, C, C++, C#, Ruby, PHP, Swift, Kotlin, Scala, R, SQL, Shell, and more

Code files can be analyzed for structure, documentation extraction, dependency identification, or converted to annotated markdown with syntax highlighting.
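For Python source, the structure, documentation, and dependency parts of that analysis can be illustrated with the standard library's `ast` module. This is a single-language sketch of the general technique, not a description of how the API analyzes code; `analyze_python` and the sample file are hypothetical.

```python
import ast

def analyze_python(source: str) -> dict:
    """Collect the module docstring, imported modules, and top-level definitions."""
    tree = ast.parse(source)
    imports = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imports.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # node.module is None for relative imports like "from . import x".
            imports.add(node.module)
    definitions = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            definitions[node.name] = ast.get_docstring(node)
    return {
        "module_doc": ast.get_docstring(tree),
        "dependencies": sorted(imports),
        "definitions": definitions,
    }

sample = '''"""Billing helpers."""
import json
from decimal import Decimal

def total(items):
    """Sum line-item amounts."""
    return sum(Decimal(i["amount"]) for i in items)
'''

report = analyze_python(sample)
print(report)
```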

Web (any URL)

Any publicly accessible URL, rendered in headless Chromium with full JavaScript execution

Websites are first-class inputs. Same schema, same output, same confidence scores. JavaScript-rendered pages, SPAs, dynamic content — all handled.
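Because URLs and files share one entry point, the first routing decision is simply "is this a URL?". A minimal sketch, assuming extension-based routing for local files; `classify` and the extension table are hypothetical and cover only a few formats from the lists above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical extension -> category table covering a few formats from the lists above.
CATEGORY_BY_EXT = {
    "pdf": "documents", "docx": "documents", "pptx": "documents", "csv": "documents",
    "jpg": "images", "png": "images", "heic": "images",
    "mp4": "video", "mov": "video", "webm": "video",
    "mp3": "audio", "m4a": "audio", "wav": "audio",
    "py": "code", "go": "code", "rs": "code",
}

def classify(source: str) -> str:
    """Return a pipeline category for either an http(s) URL or a file path."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "web"
    # Case-insensitive extension lookup, e.g. "IMG_0042.HEIC" -> "heic".
    ext = PurePosixPath(source).suffix.lower().lstrip(".")
    return CATEGORY_BY_EXT.get(ext, "unknown")

for source in ["report.pdf", "IMG_0042.HEIC", "meeting.mp4", "https://example.com/blog/post"]:
    print(source, "->", classify(source))
```

Checking for a URL first means a link such as `https://example.com/file.pdf` goes down the web path in this sketch; a real router might instead download it and sniff its bytes.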

Same interface, every format

The key insight: your code doesn't change based on the input format. The schema is the same. The output is the same. The confidence scores work the same way.

```python
# `client` is assumed to be an already-initialized API client (setup not shown).
schema = {
    "title": {"type": "string", "description": "Document or content title"},
    "key_points": {"type": "array", "description": "Main points or takeaways"},
    "people_mentioned": {"type": "array", "description": "Names of people referenced"}
}

# All of these work with the same schema:
client.extract(file="report.pdf", schema=schema)
client.extract(file="presentation.pptx", schema=schema)
client.extract(file="meeting.mp4", schema=schema)
client.extract(file="voice_note.m4a", schema=schema)
client.extract(file="screenshot.heic", schema=schema)
client.extract(file="https://example.com/blog/post", schema=schema)
```

Your agent defines one tool with one schema. It handles whatever file it encounters — because the API handles format detection and routing internally.
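In agent terms, "one tool with one schema" can look like the sketch below. It assumes, as in the example above, a client exposing `extract(file=..., schema=...)`, and further assumes the call returns a dict. The tool name `read_any_file`, the tool-description format, and `FakeClient` (an offline stand-in for the real client) are all hypothetical.

```python
from typing import Any, Protocol

class Extractor(Protocol):
    """Assumed client shape: extract(file=..., schema=...) returning a dict."""
    def extract(self, *, file: str, schema: dict) -> dict[str, Any]: ...

SCHEMA = {
    "title": {"type": "string", "description": "Document or content title"},
    "key_points": {"type": "array", "description": "Main points or takeaways"},
    "people_mentioned": {"type": "array", "description": "Names of people referenced"},
}

# Generic tool description an agent framework could expose to the model (format assumed).
READ_ANY_FILE_TOOL = {
    "name": "read_any_file",
    "description": "Extract title, key points, and people from any file or URL.",
    "parameters": {
        "type": "object",
        "properties": {"source": {"type": "string", "description": "File path or URL"}},
        "required": ["source"],
    },
}

def read_any_file(client: Extractor, source: str) -> dict[str, Any]:
    # No per-format branching: format detection is left to the API.
    return client.extract(file=source, schema=SCHEMA)

class FakeClient:
    """Offline stand-in that records each call and returns empty values per field."""
    def __init__(self) -> None:
        self.calls: list[str] = []

    def extract(self, *, file: str, schema: dict) -> dict[str, Any]:
        self.calls.append(file)
        return {name: ([] if spec["type"] == "array" else "") for name, spec in schema.items()}

fake = FakeClient()
for src in ["report.pdf", "meeting.mp4", "https://example.com/blog/post"]:
    read_any_file(fake, src)
```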

Format coverage vs. competitors

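| Category | The Drive AI | Reducto | Extend | Textract |
|---|---|---|---|---|
| Documents | 25+ | 20+ | 20+ | PDF, images |
| Images | 20+ | Basic | Basic | JPG, PNG, TIFF |
| Video | 15+ | No | No | No |
| Audio | 15+ | No | No | No |
| Code | 40+ | No | No | No |
| Websites | Any URL | No | No | No |
| Total | 107+, plus URLs | 25+ | 25+ | ~5 |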

Why breadth matters for AI agents

AI agents don't control what files they receive. A customer might upload:

  • A HEIC photo of a whiteboard from their iPhone
  • A PAGES document from their Mac
  • An M4A voice memo
  • A CSV export from their accounting software
  • A link to a Google Doc

If your agent's file tool only handles PDFs and DOCX, it fails on at least 3 of these 5 inputs. With universal format support, it handles all of them with the same code path.

Browse all supported formats or try the playground with any file type.

Try it yourself

Free tier included. No credit card required.