What it does
DataPolish solves four small but recurring problems that show up whenever you work with text and AI tools:
- PDF Layout Fixer — repairs broken line breaks and soft hyphens left behind when you copy text out of a PDF.
- HTML Stripper: extracts clean plain text from any HTML snippet, using the browser's own DOM parser rather than fragile regular expressions.
- Excel to Markdown — converts tab-separated rows (the format you get when you copy from Excel, Google Sheets, or Numbers) into a clean Markdown table.
- AI Prompt Optimizer — strips redundant whitespace and empty lines that waste LLM tokens without improving clarity.
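
To make the four transformations concrete, here is a minimal sketch of how each one could be written in plain JavaScript. The function names and the exact regular expressions are illustrative assumptions, not DataPolish's actual source; the published repository is the authority on how the real tools behave.

```javascript
// Illustrative sketch only: names and rules are assumptions, not DataPolish's code.

// PDF Layout Fixer: drop soft hyphens, rejoin words split across lines,
// and turn single line breaks into spaces while keeping blank-line paragraphs.
// Known limitation: a genuine hyphen at a line end ("well-\nknown") is also joined.
function fixPdfLayout(text) {
  return text
    .replace(/\r\n?/g, "\n")             // normalise line endings
    .replace(/\u00AD\n?/g, "")           // soft hyphen, plus a break right after it
    .replace(/(\w)-\n(\w)/g, "$1$2")     // "implemen-\ntation" -> "implementation"
    .replace(/([^\n])\n(?!\n)/g, "$1 "); // lone line break -> space
}

// HTML Stripper: let the browser parse the markup, then read the text.
// Browser only (DOMParser is not defined in Node). Parsed scripts never execute.
function stripHtml(html) {
  const doc = new DOMParser().parseFromString(html, "text/html");
  doc.querySelectorAll("script, style").forEach((el) => el.remove());
  return doc.body.textContent ?? "";
}

// Excel to Markdown: first row becomes the header, short rows are padded,
// and literal pipes are escaped so they do not split a cell.
function tsvToMarkdown(tsv) {
  const rows = tsv
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) =>
      line.split("\t").map((cell) => cell.trim().replace(/\|/g, "\\|"))
    );
  if (rows.length === 0) return "";
  const width = Math.max(...rows.map((row) => row.length));
  const toLine = (cells) => {
    const padded = cells.concat(Array(width - cells.length).fill(""));
    return "| " + padded.join(" | ") + " |";
  };
  const [header, ...body] = rows;
  const separator = "| " + Array(width).fill("---").join(" | ") + " |";
  return [toLine(header), separator, ...body.map(toLine)].join("\n");
}

// AI Prompt Optimizer: collapse runs of spaces and tabs, trim each line,
// and drop empty lines. This also removes indentation, which may not be
// wanted for prompts that contain code.
function squeezePrompt(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.replace(/[ \t]+/g, " ").trim())
    .filter((line) => line !== "")
    .join("\n");
}
```

For example, under these assumptions `tsvToMarkdown("Name\tQty\nApple\t3")` yields a three-line table: a header row, a `| --- | --- |` separator, and one body row.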
How it is built
The entire application is a single HTML page, a single CSS file, and a single JavaScript file. No frameworks, no build step, no package manager. The result is small enough to load in well under a second on a typical connection, and it works offline once cached.
Everything runs in your browser. There is no backend that receives your data. The server's only job is to serve five static files over HTTPS. You can verify this for yourself by opening your browser's DevTools and watching the Network tab while you use the tools — you will see zero outgoing requests during processing.
How it is hosted
DataPolish runs in an isolated, read-only Docker container behind a reverse proxy with a Let's Encrypt TLS certificate. The container runs with all unnecessary Linux capabilities dropped and cannot escalate privileges. Standard security headers (CSP, HSTS, X-Frame-Options, Referrer-Policy, Permissions-Policy) are applied to every response.
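
If you want to check the header claim for a deployment yourself, a small script along the following lines could do it. This is a hedged sketch: `missingSecurityHeaders` and `checkSite` are hypothetical helpers written for this example, the header list simply mirrors the names above, and `checkSite` assumes an environment with a global `fetch` that exposes all response headers (such as Node 18 or later).

```javascript
// Hypothetical helper for illustration; not part of DataPolish.
// Lowercase names, matching how fetch's Headers object normalises them.
const EXPECTED_HEADERS = [
  "content-security-policy",   // CSP
  "strict-transport-security", // HSTS
  "x-frame-options",
  "referrer-policy",
  "permissions-policy",
];

// Accepts anything with a has(name) method (a Headers object or a Map
// keyed by lowercase names) and returns the expected headers it lacks.
function missingSecurityHeaders(headers) {
  return EXPECTED_HEADERS.filter((name) => !headers.has(name));
}

// Fetches only the response headers for a URL and reports what is missing.
// Assumes the server answers HEAD requests with the same headers as GET.
async function checkSite(url) {
  const response = await fetch(url, { method: "HEAD" });
  return missingSecurityHeaders(response.headers);
}
```

Calling `checkSite` with the address of your own deployment should resolve to an empty array when all five headers are present.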
Open source
The entire codebase, including the Nginx configuration and the Docker setup, is published under the MIT license at github.com/tosko21-ux/datapolish. Fork it, run it locally, deploy your own copy — whatever is useful.
Other small tools
If you found DataPolish handy, you may also like BikeGearDecoder — another minimalist single-page utility from the same author.
Contact
The fastest way to reach the maintainer is to open an issue or discussion on the GitHub repository.