Monday, December 20, 2004

Adobe provides PDF to text?

Xeni@BoingBoing points us to an Adobe site that promises to convert PDF to text or HTML. Apart from the fact that Adobe "may occasionally access the content you submit", it seems like a useful tool. But for anyone with Linux, pdftotext seems equally adept (or not) at converting documents. I tried converting some of my papers with the Adobe tool. First of all, if you choose the HTML 3.2 option, the website chugs away for many minutes, and then reports the ever-so-informative "error occurred" error message.

So I tried being even kinder: I directed the script to generate text, and for Windows. The source file is here, and the results are here. This is the conversion that pdftotext did. I can't say that one is significantly better than the other, but you'd think Adobe could do a better job.
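For anyone who wants to put a number on "not significantly better", here is a minimal sketch of how the two conversions could be compared. It assumes pdftotext is installed and on the PATH, and that the other tool's output has been saved as text. The function names (pdf_to_text, normalize, similarity) are made up for illustration, and a line-by-line ratio is only a rough proxy for conversion quality.

```python
import difflib
import subprocess


def pdf_to_text(pdf_path):
    """Run pdftotext and return the extracted text (needs pdftotext on PATH)."""
    # Passing "-" as the output file asks pdftotext to write to stdout.
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def normalize(text):
    """Unify line endings, strip trailing whitespace, and drop blank lines."""
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    return [line.rstrip() for line in lines if line.strip()]


def similarity(text_a, text_b):
    """Return a 0..1 ratio of how closely two conversions agree, line by line."""
    matcher = difflib.SequenceMatcher(None, normalize(text_a), normalize(text_b))
    return matcher.ratio()
```

Usage would be something like similarity(pdf_to_text("paper.pdf"), open("adobe_output.txt").read()), where both file names are placeholders. A ratio near 1.0 means the two tools produced nearly the same lines; it says nothing about which one is closer to the original document.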

P.S. I use Type 1 fonts: no bitmap nonsense. So there is no excuse, really...
