How to convert a Word document and prepare it for HTML import
This article explains how to convert a Word document to HTML and prepare it for import with our HTML Import module.
In Microsoft Word
The first step is to make sure the source Word document is clean and does not contain any comments or other markup. To do this, go to:
Review > Delete > Delete All Comments in Document
Then save the Word document as HTML by going to:
File > Save As > Web page, filtered
This also saves all the images and supporting files in a directory named after the HTML file. Rename this directory to "images" and keep in it only the image files referenced by the HTML (e.g. delete non-image files such as CSS stylesheets and JavaScript files).
Compress this directory into images.zip.
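If you prepare documents often, the file handling above (keeping only the referenced images and zipping them) can be scripted. The Python sketch below is one way to do it, under these assumptions: Word put its supporting files in a directory next to the HTML file (the names report.html and report_files are hypothetical), images are referenced through quoted src attributes, and the module accepts an archive whose entries sit under an images/ folder. It copies the files instead of renaming the directory, so Word's original output is left untouched.

```python
from __future__ import annotations

import re
import shutil
import zipfile
from pathlib import Path
from urllib.parse import unquote

# Assumption: the image types worth keeping; extend the set if needed.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".svg"}
# Matches quoted src values: src="..." or src='...'.
SRC_RE = re.compile(r"""src\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def referenced_names(html: str) -> set[str]:
    """Base file names of every quoted src value in the HTML."""
    return {Path(unquote(src)).name for src in SRC_RE.findall(html)}


def package_images(html_path: Path, word_dir: Path) -> Path:
    """Copy referenced images from word_dir into images/ next to the HTML
    file, then zip that directory as images.zip and return the zip path."""
    wanted = referenced_names(html_path.read_text(encoding="utf-8", errors="replace"))

    images_dir = html_path.parent / "images"
    images_dir.mkdir(exist_ok=True)
    for f in word_dir.iterdir():
        # Skips CSS, scripts and any image the HTML never references.
        if f.is_file() and f.suffix.lower() in IMAGE_EXTS and f.name in wanted:
            shutil.copy2(f, images_dir / f.name)

    zip_path = html_path.parent / "images.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in sorted(images_dir.iterdir()):
            # Assumption: entries live under images/, as when zipping the folder itself.
            zf.write(f, arcname=f"images/{f.name}")
    return zip_path


# Example (hypothetical names): package_images(Path("report.html"), Path("report_files"))
```

If the module turns out to expect the image files at the root of the archive, change the arcname to just f.name.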
Close Word.
In Dreamweaver
This step is optional, but it helps produce standards-compliant HTML.
Dreamweaver has a reasonably good built-in function for cleaning up HTML documents exported from Word.
First, make sure the correct HTML standard is used. In Dreamweaver, go to:
File > New > HTML, set the DocType to the standard used on your website, then click Create
Now open the HTML file exported from Word in Dreamweaver. In Code view, copy everything between <body> and </body>, then paste it between the <body> tags of the new HTML file created above.
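The copy-and-paste of the body content can also be done with a short script. This Python sketch uses a regular expression and assumes each file has a single, well-formed <body> element; transplant_body is a hypothetical helper, it is not what Dreamweaver does internally, and it does not replace the clean-up commands below.

```python
import re

# Group 1: opening <body ...> tag, group 2: its content, group 3: </body>.
BODY_RE = re.compile(r"(<body\b[^>]*>)(.*?)(</body>)", re.IGNORECASE | re.DOTALL)


def transplant_body(word_html: str, template_html: str) -> str:
    """Return template_html with its <body> content replaced by word_html's."""
    source = BODY_RE.search(word_html)
    if source is None or BODY_RE.search(template_html) is None:
        raise ValueError("both documents need a <body>...</body> element")
    inner = source.group(2)
    # Keep the template's own DocType, <head> and <body> tag; swap only the content.
    # A function replacement avoids backslashes in inner being read as escapes.
    return BODY_RE.sub(lambda m: m.group(1) + inner + m.group(3), template_html, count=1)
```

Keeping the template's <body> tag means Word's attributes (such as lang and link colours) are dropped along with its <head>.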
Next, go to Commands > Clean Up Word HTML, leave all options at their default settings and click OK. This takes a few seconds, after which a message box shows the clean-up status.
Then go to Commands > Clean Up XHTML, leave all options at their default settings and click OK.
Then switch to Code view and change all references to the directory created by Word to "images/". A simple Find and Replace with Replace All in Dreamweaver works nicely.
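For reference, here is the same rename done outside Dreamweaver. This Python sketch is a slightly safer variant of a plain Replace All: it only rewrites the directory name where it begins a src or href value, so the same text in the body copy is left alone. retarget_image_dir is a hypothetical helper, and word_dir_name must be the directory name exactly as it appears in the HTML (Word may, for example, percent-encode spaces).

```python
from __future__ import annotations

import re


def retarget_image_dir(html: str, word_dir_name: str, new_dir: str = "images/") -> str:
    """Point src/href values that start with word_dir_name/ at new_dir instead."""
    pattern = re.compile(
        # Group 1 keeps the attribute name, "=" and any opening quote unchanged.
        r"""((?:src|href)\s*=\s*["']?)""" + re.escape(word_dir_name) + "/",
        re.IGNORECASE,
    )
    # A function replacement avoids surprises from backslashes in new_dir.
    return pattern.sub(lambda m: m.group(1) + new_dir, html)
```

Usage: retarget_image_dir(html, "report_files") for a hypothetical export named report.html.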
As a result, all images referenced by the HTML file should be visible in Dreamweaver.
Save the HTML file in the directory that contains images/, and be sure to use .html as the extension.
In Drupal
Log in as an administrator. A black admin toolbar will appear at the top of the screen. In the admin toolbar, go to:
Content > Add content > Advanced publication
Type the title of the document to be imported in the Title field and enter some introductory text in the Body field.
Scroll to the FEED section, attach the HTML file to the "File" field, and attach the zipped images directory (images.zip) to the "Images" field.
Be sure to choose the correct Heading level depth. For example, if "H3" is selected, every heading at or above that level (H1, H2 and H3) will be imported as a new page.
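To make the Heading level depth setting concrete, here is a rough Python sketch of splitting HTML into pages at headings, assuming "at or above" means H1 down to the selected level. It is an illustration only, not the HTML Import module's code; the function name split_by_heading and its handling of text before the first heading are hypothetical.

```python
from __future__ import annotations

import re


def split_by_heading(html: str, depth: int) -> list[tuple[str, str]]:
    """Split html into (title, content) pages at every <h1>..<h{depth}> heading.

    Headings deeper than `depth` stay inside the current page. Any content
    before the first splitting heading becomes a page with an empty title
    (a hypothetical convention for this sketch).
    """
    if not 1 <= depth <= 6:
        raise ValueError("depth must be between 1 and 6")
    heading = re.compile(rf"<h([1-{depth}])\b[^>]*>(.*?)</h\1\s*>", re.IGNORECASE | re.DOTALL)

    pages: list[tuple[str, str]] = []
    title, start = "", 0
    for m in heading.finditer(html):
        chunk = html[start:m.start()]
        if title or chunk.strip():
            pages.append((title, chunk))
        # Strip inline tags (Word adds anchors and spans) and collapse whitespace.
        title = " ".join(re.sub(r"<[^>]+>", "", m.group(2)).split())
        start = m.end()
    pages.append((title, html[start:]))
    return pages
```

With depth 3, an H4 subsection stays inside its parent page instead of becoming a page of its own.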
Click Save.
Go to the Import tab and click Import.
The website will now automatically import the HTML file and split it into pages.
Re-organise the structure of an imported document
Site administrators can re-organise the structure of an imported document at any time. Go to:
Content > BOOKS tab > edit order and titles (next to the document you wish to re-organise)
This opens a drag-and-drop interface where administrators can re-arrange imported pages. Drag the pages to the desired positions and save. Note that this tool sometimes scrambles the page order, so double-check the result and re-organise if necessary.

Comments
Hello, I'm surprised because I can't find the Advanced publication menu item. Where is it?
Hi, "Advanced publication" is a new content type you have to create when you first install and configure the HTML Import module. Please see https://www.drupal.org/project/html_import under "Installation and configuration".