Chapter 3. Command-line version

The command-line version is a program that you run from the Windows command prompt. It converts an input docx file to an HTML file.

3.1 Command-line startup message

When you start the command-line version from the Windows command prompt, the following message is displayed:

[Screenshot: startup message showing the serial number, the deadline, and the usage text]

(1) Serial number and maintenance deadline

The alphanumeric string beginning with “XHW20” is the serial number.

The message following the serial number has one of the following meanings:

Maintenance Deadline: Displayed for the official version.

Trial Deadline: Displayed for the trial/evaluation version.

(2) How to use

The text displayed after "usage: Word2HTML" lists the command-line conversion options.

3.2 Conversion options

When running the command line, specify the input file name (required), the output file name, and the conversion options after the Word2HTML command.
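The argument order described above can be sketched as a small Python helper that assembles the argument list. This is only an illustration: the function name build_command and the file names in the usage line are hypothetical, and the sketch assumes that Word2HTML is on the PATH.

```python
from typing import List, Optional, Sequence


def build_command(input_file: str,
                  output_file: Optional[str] = None,
                  options: Sequence[str] = ()) -> List[str]:
    """Assemble the arguments in the documented order:
    program name, input file (required), output file (optional), options."""
    if not input_file:
        raise ValueError("the input file name is required")
    command = ["Word2HTML", input_file]
    if output_file:
        command.append(output_file)
    command.extend(options)
    return command


# Hypothetical file names, used only for illustration.
print(build_command(r"C:\docs\report.docx", r"C:\out\report.html", ["-endl", "-italic", "t"]))
```

Keeping the arguments as a list, rather than one string, avoids quoting problems when a path contains spaces.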

The following table shows the parameters for conversion options. The input file is mandatory; specify the other parameters only when necessary. When a parameter is not specified, the default operation is used.

Parameter

Description

<input-file>

(Required) Specify the input file name, including the file path.

Do not type the text "<input-file>" itself; write the name of the input file directly in its place.

<output-file>

Specify the output file name, including the file path. If not specified, the output is an HTML file (extension: .html) with the same file name and in the same folder as the input file.

Do not type the text "<output-file>" itself; write the name of the output file directly in its place.
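The default naming rule for the output file can be sketched in Python as follows. The function name default_output_path and the example path are hypothetical; the converter's own handling of unusual file names may differ.

```python
from pathlib import PureWindowsPath


def default_output_path(input_file: str) -> str:
    """Same folder and same file name as the input, with the extension .html."""
    return str(PureWindowsPath(input_file).with_suffix(".html"))


# Hypothetical path, used only for illustration.
print(default_output_path(r"C:\docs\NewsRelease.docx"))
```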

-clrsettings

When this option is specified, option settings already made in the default setting file, etc. are cleared.

Notice:

If this parameter is not specified, options given on the command line are combined with those in the default setting file, so some options may be set twice or added to the existing values. Specify this parameter if you do not want options to be duplicated or added.

-settings <settings-file>

Reads the conversion option setting file specified in <settings-file>.

-xhtml

By default, tags are output in HTML syntax. If -xhtml is specified, tags are output in XML syntax.

In addition, <section> and <nav> tags are output as <div class="section-area"> and <div class="nav-area"> tags, respectively.

-viewport <content>

Outputs a meta tag in the following format in <head>.

<meta name="viewport" content="(value specified for content)">

-endl

Outputs a line break at the end of the block tag.

-emptyP

By default, blank lines (lines containing only a line break) in Word are ignored when outputting HTML. When this option is specified, one empty <p></p> tag is output for each blank line.

-nonrefid

While editing in Word, a lot of IDs that are not internally referenced may be created. By default, this converter scans IDs that are not internally referenced and deletes them when outputting HTML. Unreferenced IDs will not be deleted when this option is specified.
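The idea behind the default behavior (removing IDs that nothing refers to) can be illustrated with the following Python sketch. It is not the converter's implementation: it is a simplified, regex-based approximation that assumes double-quoted attributes and treats an ID as referenced only when some href="#..." points at it.

```python
import re

# Assumption: attributes are double-quoted, as in typical generated HTML.
ID_ATTR = re.compile(r'\s+id="([^"]*)"')
HREF_FRAGMENT = re.compile(r'href="#([^"]*)"')


def strip_unreferenced_ids(html: str) -> str:
    """Remove id attributes that no in-document link refers to."""
    referenced = set(HREF_FRAGMENT.findall(html))

    def keep_or_drop(match: re.Match) -> str:
        # Keep the attribute text unchanged when the ID is referenced.
        return match.group(0) if match.group(1) in referenced else ""

    return ID_ATTR.sub(keep_or_drop, html)


sample = '<h1 id="intro">Title</h1><p id="_Toc123">Text</p><a href="#intro">Top</a>'
print(strip_unreferenced_ids(sample))
```

With -nonrefid specified, the equivalent of this clean-up step is skipped and all IDs are kept.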

-imgwidth

Outputs the width of each image, at the size at which it was pasted into the Word document, in the style attribute of the <img> tag.

-hstrong

Ignores the emphasis specified in the heading style.

-embedimg

When this option is not specified (default), images are output to the image folder (see 5.5.1).

When this option is specified, the images are embedded in the body HTML with a data URL.
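For readers unfamiliar with data URLs, the following Python sketch shows the general form of an image embedded with -embedimg. The helper names and the default MIME type are assumptions for illustration; the exact attributes the converter writes on the <img> tag are not specified here.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode binary image data as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def img_tag(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Build an <img> tag whose src carries the image itself."""
    return f'<img src="{to_data_url(image_bytes, mime_type)}">'


# Placeholder bytes stand in for real image data.
print(img_tag(b"abc"))
```

Because base64 encoding enlarges the data by roughly one third, embedding suits documents with a few small images better than image-heavy ones.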

-(x|o)math

Specifies the output format for formulas edited in the Word formula editor. The following four output formats can be specified:

Unspecified: Output formulas as files in SVG format, referenced from <img> tags.

-math: Output formulas as files in MathML format, referenced from <img> tags.

-xmath: Output formulas in MathML format markup.

-omath: Output formulas in Word's own Office Math format.

-throughimg

Outputs each image in the original format in which it was inserted into Word.

-pstyle

Outputs the style name of the paragraph specified in Word by setting it as the value of the class attribute. Style names other than single-byte alphanumeric characters and some single-byte symbols are not output in the value of the class attribute.

-citation

Outputs the value of tag in the Citation field by setting it as the value of the href attribute of the <a> tag.

-tablestyle

Outputs the background color, border thickness, color, style (only some styles are supported), and table width specified for tables and table cells in a Word document using the style attribute of each HTML tag.

-textcolor

Outputs the color specified for the text as <span style="color:color value">.

-italic n|t|s

Specifies the output method when italics are specified for text:

-italic n: Do not output. (default)

-italic t: Output as <i> tag.

-italic s: Output as <span style="font-style:italic;">.

Note that if the font displayed by the web browser does not have italics, it will not be displayed in italics.

-underline n|t|s

Specifies the output method when underline is specified for text:

-underline n: Do not output. (default)

-underline t: Output as <u> tag.

-underline s: Output as <span style="text-decoration-line:underline;">.

-linethrough n|t|s

Specifies the output method when strikethrough is specified for text:

-linethrough n: Do not output. (default)

-linethrough t: Output as <del> tag.

-linethrough s: Output as <span style="text-decoration-line: line-through;">.

-encoding <encoding>

To output the HTML file in a character encoding other than UTF-8, specify the encoding with this parameter.

-encoding Shift_JIS: Output in Shift-JIS (see Note 1)

-encoding UTF-16: Unicode's UTF-16 encoding

Note 1: Because Shift-JIS defines fewer characters than Unicode, Unicode characters that cannot be represented in Shift-JIS are output as numeric character references of the form &#xcharacter_number; (character_number is a hexadecimal number). Note that the legacy vendor-specific characters that Microsoft added to JIS X0208 (e.g., ①, ②) are treated as Shift-JIS characters.
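The fallback described in Note 1 can be imitated in Python with a custom encoding error handler. This is a sketch of the idea, not the converter's code. It assumes that Python's cp932 codec (Microsoft's Shift-JIS variant, which includes characters such as ①) is a reasonable stand-in for the converter's Shift-JIS output, and the handler name "hexcharref" and the uppercase hexadecimal digits are arbitrary choices.

```python
import codecs


def hex_charref(error: UnicodeError):
    """Replace each unencodable character with a hexadecimal character reference."""
    if not isinstance(error, UnicodeEncodeError):
        raise error
    unencodable = error.object[error.start:error.end]
    replacement = "".join(f"&#x{ord(ch):X};" for ch in unencodable)
    # Return the replacement text and the position at which encoding resumes.
    return replacement, error.end


codecs.register_error("hexcharref", hex_charref)


def encode_for_shift_jis(text: str) -> bytes:
    """Encode text as cp932, falling back to &#x...; references."""
    return text.encode("cp932", errors="hexcharref")


print(encode_for_shift_jis("あ①\U0001F600"))
```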

-split 1|2|3

When this parameter is specified, the HTML file is split and output according to the outline level of the Word document. Outline level can be specified from 1 to 3.

-tocout

When this parameter is specified together with the -split parameter, the table of contents inserted by the Word table of contents function is output as a separate HTML file (inc-toc.html).

The inc-toc.html file can be included in the split HTML files using JavaScript. inc-toc.html contains only the tags for the table of contents; tags such as <head> and <body> are not output.

Please refer to the following web page for a sample of how to include a table of contents using JavaScript.

https://www.antennahouse.com/html-on-word-samples

If this parameter is not specified, the table of contents will be output at the top of all the split HTML files.

-pagenavi language

When this parameter is specified together with the -split parameter, links to the previous and next pages are output at the top (immediately after the table of contents, if any) and bottom of each split HTML file.

If "ja" is specified in the "language" field following -pagenavi, "前へ" and "次へ" links are output in Japanese.

If you specify anything other than "ja" in the "language" field or omit it, "Prev" and "Next" links will be output in English.

If the previous or next page does not exist, each link is omitted.
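The -pagenavi rules above (label language, and omission of a link when there is no neighboring page) can be sketched in Python as follows. The function names, the page file names, and the exact <a> markup are assumptions for illustration; the converter's actual markup may differ.

```python
from typing import List, Optional, Tuple


def nav_labels(language: Optional[str]) -> Tuple[str, str]:
    """Japanese labels only for "ja"; English for anything else or when omitted."""
    return ("前へ", "次へ") if language == "ja" else ("Prev", "Next")


def page_links(pages: List[str], index: int, language: Optional[str] = None) -> List[str]:
    """Links for the page at `index`; a missing neighbor produces no link."""
    prev_label, next_label = nav_labels(language)
    links = []
    if index > 0:
        links.append(f'<a href="{pages[index - 1]}">{prev_label}</a>')
    if index < len(pages) - 1:
        links.append(f'<a href="{pages[index + 1]}">{next_label}</a>')
    return links


# Hypothetical file names of split pages.
pages = ["page1.html", "page2.html", "page3.html"]
print(page_links(pages, 0, "ja"))
print(page_links(pages, 1))
```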

-lang language

With this option, you can specify the language (lang attribute) to be output in the <html> tag of the output HTML file. Specify the language code in the "language" field following -lang. (e.g. "ja" for Japanese, "en" for English.)

If "none" is specified for "language", the lang attribute is not output to the <html> tag.

If this option is not specified, or if values other than single-byte alphanumeric characters or single-byte hyphens are specified, "ja" (Japanese) or "en" (English) is output, inferred from the Word document.

If the "-xhtml" parameter is specified, the language code is output in both the xml:lang attribute and the lang attribute of the <html> tag.

e.g. <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ja" lang="ja">
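The -lang rules can be summarized in the following Python sketch. The function names are hypothetical, the `inferred` argument stands for whatever language the converter infers from the Word document, and the output for the combination of "none" with -xhtml is an assumption because it is not described above.

```python
import re
from typing import Optional

# Single-byte alphanumeric characters and hyphens only.
VALID_CODE = re.compile(r"[A-Za-z0-9-]+")


def resolve_lang(specified: Optional[str], inferred: str) -> Optional[str]:
    """Return the lang value to output, or None when the attribute is omitted."""
    if specified == "none":
        return None
    if specified and VALID_CODE.fullmatch(specified):
        return specified
    # Option omitted or value not allowed: fall back to the inferred language.
    return inferred


def html_open_tag(lang: Optional[str], xhtml: bool = False) -> str:
    """Build the opening <html> tag for the resolved language."""
    if xhtml:
        prefix = '<html xmlns="http://www.w3.org/1999/xhtml"'
        # Assumption: with no language, only the namespace is written.
        return f'{prefix}>' if lang is None else f'{prefix} xml:lang="{lang}" lang="{lang}">'
    return "<html>" if lang is None else f'<html lang="{lang}">'


print(html_open_tag(resolve_lang("ja", "en"), xhtml=True))
```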

-defstyle

When this option is specified, the <style> element (element specifying the default CSS style) in <head> is not output.

-spaceindent

When this option is specified, any indentation at the beginning of a paragraph, whether one level or more, is converted to a single full-width space.

-outputbr

Instead of enclosing a paragraph in a <p> tag, a <br> tag is output at the end of the paragraph. This option is not valid when the “-xhtml” parameter is specified.

-fileimages

Name the folder that stores image files as "destination_file_name.images". See section 5.5.1 for details.

-css cssfile

Links a CSS file. Place the CSS file in a folder on Windows and specify its path. An error occurs if the specified CSS file does not exist. You can optionally specify “media”.

A link tag in the following format is output in <head>. The media attribute is output only if media is specified.

<link rel="stylesheet" href="xxx.css" media="print">

The specified CSS file is copied to the HTML output destination folder.

You can specify multiple pairs of -css and CSS files.

If the "-xhtml" parameter is specified, <meta> and <link> tags will be output.

<meta http-equiv="Content-Style-Type" content="text/css" />

<link rel="stylesheet" href="xxx.css" type="text/css" media="print" />

-js javascript-path

Outputs a script tag in <head> with the path (URL) of the JavaScript file in its src attribute. The JavaScript file is not copied to the specified location, so save the JavaScript file at the specified path yourself.

No error will occur even if the specified JavaScript path does not exist.

You can specify multiple pairs of -js and JavaScript files.

If the "-xhtml" parameter is specified, <meta> and <script> tags will be output.

<meta http-equiv="Content-Script-Type" content="text/javascript" />

<script type="text/javascript" src="xxx.js"></script>

-savesettings <settings-file>

Saves the specified values of the conversion option parameters at command line execution with the file name specified in <settings-file>. See 3.4.2 for details on setting files.

-savedefault

Outputs the specified values of conversion option parameters at command line execution to the default settings file (def-settings.xml). See 3.4.1 for details on default setting file.

3.3 Command line operation examples

The following is an example of using the command line with NewsRelease.docx as the original file name, NewsRelease.html as the destination file name, and sample.news.css as the CSS file.

[Screenshot: command line converting NewsRelease.docx to NewsRelease.html with sample.news.css]

If the conversion is successful, the following message is displayed and an HTML file is created.

[Screenshot: message displayed on successful conversion]
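For scripted use, the example in this section could be driven from Python roughly as follows. This is a sketch under stated assumptions: the argument order follows 3.2, Word2HTML is assumed to be on the PATH, and because the program's exit codes are not documented here, success is judged by whether the output file exists afterwards. The function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def example_command() -> List[str]:
    """Arguments for the NewsRelease example: input, output, then -css."""
    return ["Word2HTML", "NewsRelease.docx", "NewsRelease.html",
            "-css", "sample.news.css"]


def convert(command: List[str], output_file: str) -> bool:
    """Run the converter and report whether the output file now exists."""
    # check=False: exit-code semantics are not documented, so do not rely on them.
    subprocess.run(command, check=False)
    return Path(output_file).is_file()


# Usage (requires Word2HTML to be installed):
#     ok = convert(example_command(), "NewsRelease.html")
```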

3.4 Setting file

By saving conversion options in a setting file in advance, you can use the setting file instead of specifying the options on the command line. There are two types of setting files:

  1. Default setting file
  2. Conversion option setting file

3.4.1 Default setting file

The default settings file allows you to switch the default operation. The file name of the default settings file is "def-settings.xml" and the file can be saved in either of the following two locations:

  1. Same folder as the EXE file (Word2HTML.exe)
  2. Roaming folder

If the default settings file is placed in the same folder as the EXE file, the default values will be the same for all users.

If the default settings file is placed in the Roaming folder, each user has a separate default settings file. The default setting file path for the Roaming folder is usually:

C:\Users\USER\AppData\Roaming\AntennaHouse\xhw\2.0\def-settings.xml

When default setting files exist in both folders, the contents of the default setting file in the Roaming folder take precedence.
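The lookup order can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that "takes precedence" means the Roaming file is used in place of the file in the EXE folder; whether the program instead merges the two files is not stated here.

```python
from pathlib import Path
from typing import Optional

FILE_NAME = "def-settings.xml"


def find_default_settings(exe_dir: Path, roaming_dir: Path) -> Optional[Path]:
    """Return the default settings file to use, checking the Roaming folder first."""
    for folder in (roaming_dir, exe_dir):
        candidate = folder / FILE_NAME
        if candidate.is_file():
            return candidate
    # Neither file exists: the program's built-in defaults apply.
    return None
```

On a typical installation, roaming_dir would be the AntennaHouse\xhw\2.0 folder under the path in the APPDATA environment variable, as shown above.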

Note that if the default setting file contains ON/OFF type conversion option settings, specifying the same conversion option as an add-in or command line parameter will invert ON/OFF.

When you specify the "-savedefault" conversion option parameter, a default settings file (def-settings.xml) is created in the Roaming folder. The default settings file is in XML format and can be edited with a text editor.

For example, specifying the following on the command line creates a default setting file that (1) outputs line breaks at the end of block tags (-endl) and (2) outputs underlines as <u> tags (-underline t).

[Screenshot: command line specifying -endl, -underline t, and -savedefault]

In addition, the following specification will create a default settings file that clears all the settings in the default settings file and restores them to the default settings of the program itself.

[Screenshot: command line specifying -clrsettings and -savedefault]

Tip

When rewriting default settings, some parameters may be turned on or off each time they are specified, or set values may be added. To avoid unexpected setting values, rewrite the default settings with the "-clrsettings" parameter specified to clear the setting values.

e.g. Word2HTML.exe -clrsettings -endl -css sample.css -savedefault
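The reason for the tip can be illustrated with a small Python model of ON/OFF options. This is an interpretation of the behavior described in this section, not the program's code: it assumes that specifying an ON/OFF option inverts its current value and that -clrsettings resets stored values to the program defaults first. The two options shown are only a subset.

```python
from typing import Dict, Iterable

# Program defaults for two ON/OFF settings (values from the table in 3.4.3).
PROGRAM_DEFAULTS = {"enable-endl": False, "enable-heading-strong": True}


def effective_settings(stored: Dict[str, bool],
                       specified: Iterable[str],
                       clear: bool = False) -> Dict[str, bool]:
    """Model how stored defaults and specified ON/OFF options combine."""
    settings = dict(PROGRAM_DEFAULTS)
    if not clear:
        # Without -clrsettings, the stored defaults are the starting point.
        settings.update(stored)
    for name in specified:
        # Specifying an ON/OFF option inverts the current value.
        settings[name] = not settings[name]
    return settings


stored = {"enable-endl": True}  # a previous run saved -endl as ON
print(effective_settings(stored, ["enable-endl"]))              # toggled relative to the stored value
print(effective_settings(stored, ["enable-endl"], clear=True))  # toggled relative to the program default
```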

The following is the default settings file created by specifying "-clrsettings -savedefault" as a conversion option parameter. Parameters with no default values will not be set.

<?xml version="1.0"?>
<word-to-html-settings>
<enable-XHTML enable="false"/>
<viewport content=""/>
<enable-endl enable="false"/>
<enable-empty-paragraph enable="false"/>
<enable-non-reference-id enable="false"/>
<enable-image-width enable="false"/>
<enable-heading-strong enable="true"/>
<enable-embed-image enable="false"/>
<enable-mathml enable="false"/>
<xml-mathml enable="false"/>
<xml-omath enable="false"/>
<through-image enable="false"/>
<enable-pstyle enable="false"/>
<enable-citation enable="false"/>
<text-color enable="false"/>
<output-br enable="false"/>
<style-tag enable="true"/>
<space-indent enable="false"/>
<fil-images enable="false"/>
<italic out="n"/>
<underline out="n"/>
<linethrough out="n"/>
<split val="0"/>
<tocout enable="false"/>
<lang val=""/>
</word-to-html-settings>
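Because the file is plain XML, it can be inspected with any XML parser. The following Python sketch reads a settings file into a dictionary using the standard library; the function name read_settings is hypothetical, and the sample text is a shortened excerpt of the file above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union


def read_settings(xml_text: str) -> Dict[str, Union[bool, str]]:
    """Map each child element name to the value of its single attribute."""
    root = ET.fromstring(xml_text)
    if root.tag != "word-to-html-settings":
        raise ValueError("not a Word2HTML settings file")
    settings: Dict[str, Union[bool, str]] = {}
    for child in root:
        for name, value in child.attrib.items():
            # "enable" attributes are booleans; out/val/content stay as text.
            settings[child.tag] = (value == "true") if name == "enable" else value
    return settings


sample = """<?xml version="1.0"?>
<word-to-html-settings>
<enable-endl enable="false"/>
<enable-heading-strong enable="true"/>
<italic out="n"/>
<split val="0"/>
</word-to-html-settings>"""
print(read_settings(sample))
```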

3.4.2 Conversion option setting file

The conversion option setting file is a file that saves parameter values for conversion options.

It is read at command line execution by specifying the name of the conversion option setting file.

If you repeatedly convert using the same settings, you can save the conversion options in a settings file so that the next time you convert, you only need to specify the settings file instead of specifying the same options.

You can give the conversion option setting file any file name.

The conversion option setting file is created with the file name specified in the "-savesettings" conversion option when the command line is executed.

The conversion option setting file is in XML format, so parameter values can be modified using a text editor.
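As an alternative to editing by hand, a setting file can also be generated programmatically. The following Python sketch builds a small file with two settings, using element names from 3.4.3. It assumes that a file containing only some of the elements is acceptable to the converter, which is not confirmed here; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_settings_xml(endl: bool, underline: str) -> str:
    """Build a settings document with the -endl and -underline settings."""
    if underline not in ("n", "t", "s"):
        raise ValueError("underline must be n, t, or s")
    root = ET.Element("word-to-html-settings")
    ET.SubElement(root, "enable-endl", enable="true" if endl else "false")
    ET.SubElement(root, "underline", out=underline)
    return ET.tostring(root, encoding="unicode")


print(build_settings_xml(endl=True, underline="t"))
```

The resulting text can be saved under any file name and passed to the -settings parameter.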

3.4.3 Setting file format

The settings file is an XML file whose root element is "word-to-html-settings" and whose child elements each correspond to one conversion option setting. The default setting file and the conversion option setting file have the same format. The correspondence between each element type name and the conversion option parameters is shown in the table below.

Element type name        Attribute  Program default value  Corresponding conversion option parameter

word-to-html-settings
enable-XHTML             enable     false                  -xhtml
viewport                 content    (empty)                -viewport
enable-endl              enable     false                  -endl
enable-empty-paragraph   enable     false                  -emptyP
enable-non-reference-id  enable     false                  -nonrefid
enable-image-width       enable     false                  -imgwidth
enable-heading-strong    enable     true                   -hstrong
enable-embed-image       enable     false                  -embedimg
enable-mathml            enable     false                  -math
xml-mathml               enable     false                  -xmath
xml-omath                enable     false                  -omath
through-image            enable     false                  -throughimg
enable-pstyle            enable     false                  -pstyle
enable-citation          enable     false                  -citation
text-color               enable     false                  -textcolor
output-br                enable     false                  -outputbr
style-tag                enable     true                   -defstyle
space-indent             enable     false                  -spaceindent
fil-images               enable     false                  -fileimages
table-style              enable     N/A                    -tablestyle
italic                   out        n                      -italic n|t|s
underline                out        n                      -underline n|t|s
linethrough              out        n                      -linethrough n|t|s
encoding                 encoding   N/A                    -encoding
link-css                 src        N/A                    -css css-file
link-js                  src        N/A                    -js javascript-path
split                    val        0                      -split 1|2|3
tocout                   enable     false                  -tocout
pagenavi                 pagenavi   N/A                    -pagenavi language
lang                     val        (empty)                -lang language
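The correspondence in the table can be used to translate settings into command-line parameters. The following Python sketch covers a subset of the rows and lists the parameters needed to reach given settings, assuming the run starts from the program defaults (for example, with -clrsettings specified). The dictionary and function names are hypothetical.

```python
from typing import Dict, List

# element name -> (program default, parameter); subset of the table rows
SWITCHES = {
    "enable-XHTML": ("false", "-xhtml"),
    "enable-endl": ("false", "-endl"),
    "enable-heading-strong": ("true", "-hstrong"),
    "style-tag": ("true", "-defstyle"),
}
VALUED = {
    "italic": ("n", "-italic"),
    "underline": ("n", "-underline"),
    "linethrough": ("n", "-linethrough"),
    "split": ("0", "-split"),
}


def options_for(settings: Dict[str, str]) -> List[str]:
    """List the parameters for every setting that differs from its default."""
    options: List[str] = []
    for element, value in settings.items():
        if element in SWITCHES:
            default, parameter = SWITCHES[element]
            if value != default:
                options.append(parameter)
        elif element in VALUED:
            default, parameter = VALUED[element]
            if value != default:
                options.extend([parameter, value])
    return options


print(options_for({"enable-endl": "true", "underline": "t", "split": "0"}))
```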

3.5 Parameters in the add-in menu

Only the following two conversion option parameters can be specified in the add-in menu:

Checking the "Use specified CSS" checkbox corresponds to the “-css” parameter specification of the conversion option in the command line version. The command line version allows multiple pairs of “-css” and filename to be specified, but only one can be specified in the add-in.

Checking the "Line break with block tag" checkbox is equivalent to the “-endl” parameter specification of the conversion option.

The add-in does not allow you to specify other conversion options, so use the default settings file (see 3.4.1) to specify them if necessary.

3.6 Error messages

The error messages in the command-line version are:

Error message

Possible cause

'Word2HTML' is not recognized as an internal or external command, operable program or batch file.

The command-line version is not installed correctly.

(Countermeasure) Reinstall it.

The path to the folder where the command-line version is installed is not set.

(Countermeasure) In the Windows settings, add the path to that folder to the PATH environment variable.

“Cannot Open File”

The conversion destination file cannot be opened.

(Countermeasure) The conversion destination file may be open in an editor or similar program that has locked it. In that case, close the file in that program.

(Countermeasure) The CSS file specified for linking may not exist. Check that the file exists at the specified path.

“Input file not found”

The specified input file does not exist.

(Countermeasure) Check the input file name and path.
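Several of these errors can be caught before the converter is run. The following Python sketch performs such checks; the function name preflight and the wording of the returned messages are hypothetical and are not messages of the program itself. The `which` parameter exists only so that the PATH lookup can be replaced in tests.

```python
import shutil
from pathlib import Path
from typing import Callable, List, Optional, Sequence


def preflight(input_file: str,
              css_files: Sequence[str] = (),
              which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return a list of problems likely to cause the errors listed above."""
    problems: List[str] = []
    if which("Word2HTML") is None:
        problems.append("Word2HTML was not found on the PATH")
    if not Path(input_file).is_file():
        problems.append(f"input file not found: {input_file}")
    for css in css_files:
        if not Path(css).is_file():
            problems.append(f"CSS file not found: {css}")
    return problems


# Hypothetical file names, used only for illustration.
for problem in preflight("NewsRelease.docx", ["sample.news.css"]):
    print(problem)
```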