Andrew Perrault

LaTeX-LLM-cleaner

LLM chat interfaces extract text from documents, especially .tex and .pdf files, in ways that silently corrupt math, drop figures, and mangle tables. With more effort, these documents can be made properly LLM-readable by flattening .tex projects, inlining bibliographies and macros, OCRing mangled math, converting OMML to LaTeX, and summarizing figures and tables with a vision model. This Python CLI tool does all of this locally. The only external call is an optional request to the Gemini API for figure/table summarization.
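
To make "flattening a .tex project" concrete, here is a minimal sketch of the idea. It is not this tool's actual code. The function name flatten_tex and the regex are hypothetical, and the sketch makes simplifying assumptions: include paths are resolved relative to the main file's directory, commented-out \input lines are not skipped, and targets that cannot be found are left untouched.

```python
import re
from pathlib import Path

# Simplified pattern for \input{name} and \include{name} (an assumption of
# this sketch; it does not skip commands that appear inside % comments).
INPUT_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")


def flatten_tex(main_file, root=None, _active=()):
    """Return the text of main_file with input/include targets inlined."""
    main_file = Path(main_file).resolve()
    # Resolve include paths against the top-level file's directory.
    root = Path(root).resolve() if root else main_file.parent
    if main_file in _active:
        raise ValueError(f"circular include: {main_file}")
    text = main_file.read_text(encoding="utf-8")

    def inline(match):
        target = root / match.group(1).strip()
        if target.suffix != ".tex":
            # LaTeX lets you omit the extension, so append it here.
            target = target.with_name(target.name + ".tex")
        if not target.is_file():
            return match.group(0)  # leave unresolved commands as they are
        return flatten_tex(target, root, _active + (main_file,))

    # A function replacement is inserted literally, so backslashes in the
    # inlined LaTeX are not treated as regex escapes.
    return INPUT_RE.sub(inline, text)
```

Calling flatten_tex("main.tex") on a project's top-level file would return a single string with every reachable included file spliced in at the point where it was referenced, which is the form a chat interface can read without losing sections.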