The enigma of numbers PDF

I recently asked a question about converting a PDF file to an XML file and then back to a PDF file, preferably identical to the original, or at least nearly so.

I’ve been trying different methods, and so far I’ve come up with this one. The conversion is done by running: xsltproc -o intermediate-fo-file. The result is an intermediate XSL-FO file, which becomes a PDF by running: fop intermediate-fo-file. This PDF file looks almost the same as the original ODT file.
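For anyone who wants to script the two steps described above, here is a minimal sketch in Python. The post does not say which stylesheet or input file was used, so `doc.xml`, `to-fo.xsl`, `intermediate.fo` and `out.pdf` are placeholder names, and the helper functions are made up for illustration. It assumes `xsltproc` and Apache FOP's `fop` launcher are on the PATH.

```python
import subprocess


def build_pipeline(source_xml, stylesheet, fo_file, pdf_file):
    """Return the two commands of an XML -> XSL-FO -> PDF pipeline.

    All four file names are supplied by the caller; none of them come
    from the original post.
    """
    # xsltproc: -o names the output file, then stylesheet, then input.
    xslt_cmd = ["xsltproc", "-o", str(fo_file), str(stylesheet), str(source_xml)]
    # fop: -fo names the XSL-FO input, -pdf names the PDF output.
    fop_cmd = ["fop", "-fo", str(fo_file), "-pdf", str(pdf_file)]
    return [xslt_cmd, fop_cmd]


def run_pipeline(source_xml, stylesheet, fo_file, pdf_file):
    """Run both steps in order, stopping at the first failure."""
    for cmd in build_pipeline(source_xml, stylesheet, fo_file, pdf_file):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    for cmd in build_pipeline("doc.xml", "to-fo.xsl", "intermediate.fo", "out.pdf"):
        print(" ".join(cmd))
```

Calling `run_pipeline(...)` with real paths would execute both tools; the `__main__` block only prints the commands so they can be checked first.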

But still, say I have a PDF file to begin with: how could the same thing be done?

The only chance of a lossless conversion from PDF to XML is to use a target XML vocabulary which has the same view of documents that PDF has. If this is a kind of thought experiment and you are thinking about the PDF-XML-PDF round trip to see when and how it’s possible, then you now know the reasons some people will give for believing it’s not possible in any general form. If you want this PDF-to-PDF data flow for some practical reason, you might want to reflect on whether your practical goals can be met in some other way.

Gradually I’ve come to understand the difficulty of this task, but it would still bring huge benefits, given the fraction of space XML requires in comparison with the same information presented as PDF. That’s why the PDF-XML-PDF round trip would be “swell” to solve.
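To make the point about a target XML vocabulary that shares PDF's view of a document more concrete, here is a toy sketch in Python. It treats a page as a list of positioned text runs (coordinates, font, size, text), which is roughly how PDF sees text, and shows that such a model survives a trip through XML unchanged. The `TextRun` class and the `<page>`/`<run>` element names are invented for this illustration and are not an existing format; real PDFs also carry graphics, embedded fonts, images and more, which this ignores.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRun:
    """One positioned piece of text (hypothetical, simplified model)."""
    x: float
    y: float
    font: str
    size: float
    text: str


def page_to_xml(runs):
    """Serialize positioned text runs to a made-up <page>/<run> vocabulary."""
    page = ET.Element("page")
    for run in runs:
        # repr() of a float parses back to exactly the same float.
        el = ET.SubElement(page, "run", x=repr(run.x), y=repr(run.y),
                           font=run.font, size=repr(run.size))
        el.text = run.text
    return ET.tostring(page, encoding="unicode")


def xml_to_page(xml_text):
    """Parse the made-up vocabulary back into text runs."""
    page = ET.fromstring(xml_text)
    return [
        TextRun(float(el.get("x")), float(el.get("y")), el.get("font"),
                float(el.get("size")), el.text or "")
        for el in page.iter("run")
    ]


if __name__ == "__main__":
    original = [
        TextRun(72.0, 700.5, "Helvetica", 12.0, "Hello"),
        TextRun(72.0, 686.1, "Helvetica", 12.0, "a < b & c"),
    ]
    # The round trip preserves every field because the XML records
    # layout positions rather than logical structure.
    print(xml_to_page(page_to_xml(original)) == original)
```

The round trip only works because the XML stores layout rather than structure. A vocabulary like DocBook records chapters and paragraphs instead, so the exact positions are gone once the PDF has been converted.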

Paul, if your PDF documents are simple, I’m fairly sure you can do what you originally asked. A-2 are well-suited formats for long-term archiving but rather bulky in comparison with XML. PDFs produced by any companies or authorities. PDFX might be able to help. It converts PDF articles to XML similar in structure to DocBook documents.

PDFX XML to the DocBook XML you already make PDFs out of. You might also consider the TeX alternative to XSL-FO, TeXML. I had an old XSL to turn PDFX-like XML into . Why don’t you tell us about the API you used?