Chapter 1 — The Spreadsheet That Wouldn’t Stay Still
Somewhere in your backlog there is a task called “import marketing data”. You asked for CSV; instead you got an Excel file with four sheets, merged cells, two header rows and the occasional #N/A sprinkled in for good measure. You could fire up a GUI editor or write a throw‑away script, but neither option scales beyond today’s task.
JSONSheet exists for that exact moment: when data should be simple JSON, but reality shows up as a quirky spreadsheet or a quote embedded in a PNG.
Chapter 2 — What JSONSheet Actually Does
• It streams files directly from publicly reachable HTTPS links—Google Drive, Dropbox, S3 presigned URLs, you name it.
• It parses .xlsx and .csv on the fly, detecting titles, headers, and the last row of real data.
• It can tidy up headers, normalise 60+ date formats into ISO‑8601, strip currency symbols, and drop duplicate or empty rows.
• It exposes one more endpoint that reads text from solid‑coloured images via a lightweight Tesseract 5 CLI wrapper.
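To make the cleanup bullet above concrete, here is a minimal sketch of what normalising dates to ISO‑8601 and stripping currency symbols can look like. It is not JSONSheet's implementation: the function names, the three sample date formats, and the assumption that a dot is the decimal separator are all illustrative.

```python
from datetime import datetime
import re

# Illustrative subset only; the service is described as handling 60+ formats.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def normalise_date(value):
    """Return an ISO-8601 date string, or None if no sample format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def strip_currency(value):
    """Drop currency symbols and thousands commas, then parse as a float.

    Assumes a dot is the decimal separator; returns None for non-numeric cells.
    """
    cleaned = re.sub(r"[^\d.\-]", "", value)
    try:
        return float(cleaned)
    except ValueError:
        return None
```

Cells such as #N/A fall through both helpers and come back as None, which is one reasonable way to represent "no usable value" in the resulting JSON.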
Chapter 3 — Why Simplicity Matters
JSONSheet was designed under a single constraint: no state, ever. Every request begins with an external link and ends with JSON. Nothing is cached beyond an in‑memory LRU. The service can be started, scaled, or recycled in seconds. For serverless platforms, that means predictable cold‑start times; for CI jobs, it means no leftover containers chewing up CPU credits.
Chapter 4 — When to Reach for It
- Quick data prototyping: preview a partner’s Excel sheet in Postman.
- ETL glue: transform daily exports before loading into Snowflake.
- Edge functions: convert a Google Sheet to JSON inside a Cloudflare Worker without extra libraries.
- Automation: scrape the headline from a banner image dropped into your team’s shared drive.
Chapter 5 — The Two Endpoints
/converter accepts a JSON body with one field:
{
"url": "https://…/export?format=xlsx"
}
Optional query parameters let you turn data cleaning on or off; full details live in the Excel → JSON chapter of this documentation.
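A call to /converter could be sketched in Python as follows, using only the standard library. The base URL is a placeholder, and the POST method and Content-Type header are assumptions (the text above only says the endpoint accepts a JSON body). The cleanup query parameters are left out because they are documented elsewhere.

```python
import json
import urllib.request

# Hypothetical host: the real base URL is not given in this section.
BASE_URL = "https://jsonsheet.example.com"


def build_converter_request(sheet_url, base_url=BASE_URL):
    """Build (but do not send) a request whose body is {"url": sheet_url}."""
    body = json.dumps({"url": sheet_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/converter",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: a JSON body implies POST
    )


def convert(sheet_url, base_url=BASE_URL, timeout=30):
    """Send the request and decode the JSON response."""
    request = build_converter_request(sheet_url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without touching the network.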
/ocr expects an array of image URLs:
{
"urls": [
"https://example.com/banner.jpg",
"https://example.com/quote.png"
]
}
The response is a list of objects, each containing the original URL and the text found in that image. That is the whole OCR interface: one endpoint, nothing else.
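As a rough sketch of working with /ocr from Python: the helper below builds the request body shown above and turns a response into a URL-to-text lookup. The response field names "url" and "text" are assumptions, since this section describes the objects but does not name their keys; adjust them to match the actual response.

```python
import json


def build_ocr_body(image_urls):
    """Serialise the request body: {"urls": [...]}."""
    return json.dumps({"urls": list(image_urls)})


def index_ocr_results(results, url_key="url", text_key="text"):
    """Map each original image URL to the text extracted from it.

    `results` is the decoded response list; the key names are assumed.
    """
    return {item[url_key]: item[text_key] for item in results}
```

Indexing by URL is convenient when the caller needs to match extracted text back to the images it submitted.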
Current public limits
50 requests / minute • 2 000 requests / day
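A client can stay under the per-minute figure by throttling itself. The sliding-window limiter below is one possible sketch, not part of JSONSheet; the class name and its injectable clock are illustrative choices.

```python
from collections import deque
import time


class SlidingWindowLimiter:
    """Allow at most `max_calls` recorded calls in any `period_seconds` window."""

    def __init__(self, max_calls, period_seconds, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period_seconds
        self.clock = clock          # injectable so the limiter can be tested
        self.calls = deque()        # timestamps of recent calls, oldest first

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if none)."""
        now = self.clock()
        # Forget calls that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def record(self):
        """Note that a call was just made."""
        self.calls.append(self.clock())
```

With SlidingWindowLimiter(50, 60), a caller would sleep for wait_time() seconds before each request and call record() afterwards; a second instance with a 24-hour period could cover the daily cap in the same way.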
This keeps the lights on for everyone. If your use‑case needs more, drop us a note.
Chapter 6 — Epilogue
The rest of the documentation dives into request syntax, cleanup flags, error codes and example snippets in half a dozen languages. “Simple” does not mean “under‑documented.” Feel free to skim, copy‑paste, and above all, build something that spares your future self another round of manual CSV surgery.