I convert large documents (books) into various e-book reader formats. A typical scenario is to receive a manuscript written in Microsoft Word or, worse yet, delivered in Adobe Acrobat's print format (PDF).
In the conversion process, the best intermediary format I’ve found is standard XHTML, the same format used in web pages. The problem is that MS Word, while able to export filtered HTML (MS ‘Cleaned’), produces HTML that is far from truly clean.
Now, with smaller documents, I can use Dreamweaver’s ‘Paste Special’ command, and it will clean the dirty Word HTML. On large manuscripts, however, it chokes, forcing you to copy and paste a little at a time. That is not exactly a great use of time, and certainly not what I’d expect to do given that I own a ‘modern’ computer.
I began searching for a solution and came across plenty of posts by those with the same problem. I tried several programs and routines but still found the suggestions cumbersome or unable to really produce ‘clean’ html.
Then I tried a program that is both a stand-alone batch converter* and an add-in for MS Word (v2000 and above). I use Word 2013 on a 64-bit Windows 7 machine and it functions brilliantly. You open the document, run the Save-As-Clean DocToHtml command, choose what to include in your output and, presto chango, you get absolutely clean, perfect HTML, no matter the size of the document.
What is this little gem? Doc To Html.
*This solution does require Word (v2000 and up), the standard used by most writers today.
After the conversion is complete, the next step is to tag all the chapters with heading tags and then run the HTML through another program called Calibre. This is probably the most popular program of its type and is, incredibly, FREE.
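For anyone who would rather script the chapter-tagging step than do it by hand, here is a minimal Python sketch. It is not part of Doc To Html or Calibre. It assumes each chapter title sits in its own paragraph and starts with the word "Chapter" followed by a number, which will not be true of every manuscript. The names tag_chapters and calibre_command are hypothetical helpers made up for this illustration; the only Calibre piece used is its ebook-convert command-line tool, which takes an input file and an output file.

```python
import re

# Assumption: a chapter title is a whole paragraph such as <p>Chapter 1: The Start</p>.
# Paragraphs containing inline tags (e.g. <em>) are deliberately left untouched.
CHAPTER_RE = re.compile(
    r"<p\b[^>]*>\s*(Chapter\s+\d+[^<]*?)\s*</p>",
    re.IGNORECASE,
)


def tag_chapters(html):
    """Replace chapter-title paragraphs with <h1> headings (hypothetical helper)."""
    return CHAPTER_RE.sub(lambda m: "<h1>" + m.group(1) + "</h1>", html)


def calibre_command(html_path, epub_path):
    """Build the argument list for Calibre's ebook-convert tool (hypothetical helper).

    The list could be handed to subprocess.run() on a machine with Calibre installed.
    """
    return ["ebook-convert", html_path, epub_path]


sample = "<p>Chapter 1: The Start</p>\n<p>It was a dark night.</p>"
print(tag_chapters(sample))
```

Only the title paragraph is rewritten; ordinary body paragraphs pass through unchanged. If your manuscript uses a different chapter pattern ("Part One", plain numerals and so on), the regular expression would need adjusting to match.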
All produce fairly clean e-book formats, although nothing works better than a clean start from truly basic, clean HTML.
If you have a manuscript and would like a quote, please give me a call. My rates are very competitive.