Convert HTML to DokuWiki
One way to convert existing documents to text files with DokuWiki syntax is to use an HTML to text converter. I've set up a config file for the tool available at http://www.mbayer.de/html2text/ which produces some usable results: text2html.rc.
Thomas J. Messenger made a Perl module that converts HTML to DokuWiki's syntax, available at http://www.citlink.net/~messengertj/ (see 81, too)
Alternatively, there is the one at CPAN: http://search.cpan.org/~diberri/HTML-WikiConverter-0.68/lib/HTML/WikiConverter.pm
Online HTML to wiki converters (based on WikiConverter?) (paste HTML or fetch URL):
- HTML::WikiConverter - Convert HTML text to wiki markup
- A Converter Tool (HTML>DokuWiki, UTF8, Tablespacing): WikiTool
- Pandoc web interface: http://johnmacfarlane.net/pandoc/try/ (Pandoc includes a Haskell library and a standalone command-line program.)
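For those who prefer the standalone Pandoc command-line program over the web interface, here is a minimal sketch of calling it from a script. It assumes a Pandoc version that ships the `dokuwiki` output format and that `pandoc` is on the PATH; the function names and file names are made up for illustration.

```python
import subprocess


def pandoc_command(html_path, wiki_path):
    """Build the pandoc argument list for an HTML to DokuWiki conversion."""
    return [
        "pandoc",
        "--from", "html",
        "--to", "dokuwiki",
        "--output", str(wiki_path),
        str(html_path),
    ]


def convert_with_pandoc(html_path, wiki_path):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(html_path, wiki_path), check=True)
```

A call such as `convert_with_pandoc("page.html", "page.txt")` would then write the DokuWiki text next to the HTML file.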
Html2DokuWiki Converter GUI for Win32
Html2DokuWiki is a free HTML to DokuWiki converter for Win32 platforms. It is very simple to install and extremely easy to use. Just extract the executable from the archive, double-click to start, and Html2DokuWiki is ready to go.
To start converting, just type HTML into the upper edit box. The converted DokuWiki syntax will immediately appear in the lower edit box. Then select (CTRL+A) the converted document and copy / paste it into any DokuWiki site. Larger HTML documents can also be pasted into the HTML input.
Click here to download Html2DokuWiki.
Supported HTML Elements
Html2DokuWiki converts all HTML elements currently supported by DokuWiki:
- <A> → links, outputs multiple links for formatting
- <B> → **
- <BLOCKQUOTE> → >, including nested quotes
- <BR> → new line \\
- <CODE> → ''
- <DEL> → <del>
- <DL>, <DT>, and <DD> → Simulate output as simple unnumbered lists with <DT> as bold.
- <H1> … <H5> → ====== to ==
- <I> → //
- <IMG> → images
- <LI> → list items, including nested lists
- <OL> → numbered lists
- <P> → new paragraph
- <PRE> → <code>
- <S> → **
- <STRIKE> → //
- <STRONG> → **
- <SUB> → <sub>
- <SUP> → <sup>
- <TABLE> → tables
- <TBODY> → recognized, but not output
- <TD> → table cell, with align and colspan support
- <TFOOT> → recognized, but not output
- <TH> → table header cell, with align and colspan support
- <THEAD> → recognized, but not output
- <TR> → table row, with align support
- <TT> → ''
- <U> → __
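To give an idea of what such a tag mapping looks like in practice, here is a minimal sketch in Python. It is not how Html2DokuWiki itself is implemented (that is a Delphi program); the class and function names are invented, and it only handles a few inline tags, headings, line breaks, and paragraphs.

```python
from html.parser import HTMLParser

# Hypothetical subset of the mapping in the list above.
INLINE_MARKUP = {
    "b": "**", "strong": "**",
    "i": "//",
    "u": "__",
    "code": "''", "tt": "''",
}
HEADINGS = {"h1": "======", "h2": "=====", "h3": "====", "h4": "===", "h5": "=="}


class MiniHtmlToDokuWiki(HTMLParser):
    """Convert a small subset of HTML to DokuWiki markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in INLINE_MARKUP:
            self.parts.append(INLINE_MARKUP[tag])
        elif tag in HEADINGS:
            self.parts.append(HEADINGS[tag] + " ")
        elif tag == "br":
            self.parts.append("\\\\ ")  # DokuWiki forced line break

    def handle_endtag(self, tag):
        if tag in INLINE_MARKUP:
            self.parts.append(INLINE_MARKUP[tag])
        elif tag in HEADINGS:
            self.parts.append(" " + HEADINGS[tag] + "\n")
        elif tag == "p":
            self.parts.append("\n\n")  # blank line ends a paragraph

    def handle_data(self, data):
        self.parts.append(data)


def html_to_dokuwiki(html):
    parser = MiniHtmlToDokuWiki()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

For example, `html_to_dokuwiki("<p>Some <b>bold</b> text.</p>")` yields `Some **bold** text.`. Lists, tables, links, and images need state tracking that this sketch leaves out.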
Special Features
- Internal links are converted to DokuWiki's : style, external ones are left unchanged. mailto: is removed from e-mail links.
- Support for alignment in table cells and rows.
- Image properties are converted, including alignment, width, and height.
- Formatting is only applied where accepted by DokuWiki, but not to === … === and <code> … </code>, for example.
- Full Unicode support, with optional UTF-8 input or output encoding.
- Option to hide links from DokuWiki syntax.
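The link handling described above could look roughly like the following sketch. The rules for what counts as an internal link (a relative path, with a trailing .html or .htm dropped) are my assumptions, not the documented behaviour of Html2DokuWiki, and `convert_link` is a made-up name.

```python
from urllib.parse import urlparse


def convert_link(href, text=""):
    """Return DokuWiki link markup for an HTML href and its link text."""
    if href.lower().startswith("mailto:"):
        # E-mail link: drop the mailto: prefix.
        target = href[len("mailto:"):]
    elif urlparse(href).scheme:
        # External link (http, https, ftp, ...): leave unchanged.
        target = href
    else:
        # Assumption: an internal link is a relative path. Drop a trailing
        # .html/.htm and use ':' as the namespace separator.
        path = href.strip("/")
        for ext in (".html", ".htm"):
            if path.lower().endswith(ext):
                path = path[: -len(ext)]
                break
        target = path.replace("/", ":")
    if text and text != target:
        return "[[%s|%s]]" % (target, text)
    return "[[%s]]" % target
```

So `convert_link("docs/setup/install.html", "Install")` gives `[[docs:setup:install|Install]]`, while an `http://` address passes through untouched.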
Author, Contact, and Development
Html2DokuWiki is developed by Ralf Junker. You can contact the author via the Yunqa mailing list. Feel free to report praise, bugs, or suggestions about Html2DokuWiki.
The converter might also be available as a software library (*.DLL, *.DCU, or Delphi source code). Please get in touch if interested.
Version History
2007-08-27
- Add inline formatting to table cells.
- New UTF-8 input encoding.
- Fix paragraph problems with alternating inline-tags and block-tags.
- Do not escape // to %%//%% if part of an external URI.
- Escape %% to <nowiki>%%</nowiki>.
- Empty heading elements separate paragraphs.
2007-10-22
- New: Support for <DL>, <DT>, and <DD>. Simulate output as simple unnumbered lists with <DT> as bold.
- Improve: Empty paragraph inserts line break.
- Improve: Recognize DokuWiki internal escapes %%, <nowiki>, and </nowiki> and escape them properly.
- Improve: Escape table markup (| and ^) when inside a table.
- Improve: Escape double parenthesis ((, which starts a DokuWiki footnote.
- Improve: Suppress DokuWiki escapes and typography in <PRE> blocks.
- Fix: Newline output for HTML like <P><PRE>one</PRE>two</P><BR>three.
- Fix: <TBODY> table problem where a new row did not output a linebreak.
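The escaping rules mentioned in the version history can be sketched as below. This is only an illustration of the idea under my own assumptions (the regular expressions and the `escape_text` name are invented, and a `//` is treated as part of a URI simply when a colon precedes it); the real converter may well decide differently.

```python
import re

# "%%", "((" and "//" are escaped everywhere, except "//" right after a
# colon, which is assumed to belong to an external URI such as http://...
_SPECIAL = re.compile(r"%%|\(\(|(?<!:)//")
# Inside a table, the cell separators "|" and "^" are escaped as well.
_TABLE_SPECIAL = re.compile(r"%%|\(\(|(?<!:)//|[|^]")


def escape_text(text, in_table=False):
    """Escape plain text so DokuWiki does not read it as markup."""
    pattern = _TABLE_SPECIAL if in_table else _SPECIAL

    def wrap(match):
        token = match.group(0)
        if token == "%%":
            # "%%" cannot escape itself, so wrap it in nowiki tags.
            return "<nowiki>%%</nowiki>"
        return "%%" + token + "%%"

    return pattern.sub(wrap, text)
```

With this, `escape_text("a|b^c", in_table=True)` returns `a%%|%%b%%^%%c`, whereas the same text outside a table is left alone.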
Workflow Microsoft Word 2 DokuWiki using html2wiki-GUI
I was looking for a way to convert about 150 DOC files (Microsoft Word 2000 or 2003) into our new wiki without too much hassle. The macros available didn't work for me.
Specifications
- usable for multiple files
- converting tables
- converting images
Suggested workflow (works best IMO)
- open the Word document in OpenOffice
- save as HTML (the pictures will be stored in the same folder as the HTML)
- open Html2DokuWiki (see the download link above)
- copy the HTML code and paste it into the GUI
- save the converted output as a txt-file (or create a new page for integrity)
- copy the txt-file into the DATA/PAGES/NAMESPACE-folder and the pictures into the DATA/MEDIA/NAMESPACE-folder
Is there a faster way? (2009/07/18 by bobeck)
→ This is still the fastest and most reliable way (2014/09/24 by josh)
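With about 150 files, the last step of the workflow (copying the txt-files and pictures into the wiki's data folders) is the easiest part to script. Below is a rough sketch; the function name, the list of image extensions, and the lower-casing of file names are my assumptions, so check them against your own installation before use.

```python
import shutil
from pathlib import Path

# Assumed set of picture types produced by the HTML export.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif"}


def install_pages(source_dir, dokuwiki_data_dir, namespace):
    """Copy *.txt files to data/pages/<namespace> and pictures to
    data/media/<namespace>. Returns the list of files written."""
    source = Path(source_dir)
    pages = Path(dokuwiki_data_dir) / "pages" / namespace
    media = Path(dokuwiki_data_dir) / "media" / namespace
    pages.mkdir(parents=True, exist_ok=True)
    media.mkdir(parents=True, exist_ok=True)

    copied = []
    for item in sorted(source.iterdir()):
        if not item.is_file():
            continue
        suffix = item.suffix.lower()
        if suffix == ".txt":
            # Assumption: file names are stored in lower case.
            target = pages / item.name.lower()
        elif suffix in IMAGE_SUFFIXES:
            target = media / item.name.lower()
        else:
            continue  # skip the intermediate HTML and anything else
        shutil.copy2(item, target)
        copied.append(target)
    return copied
```

Something like `install_pages("converted", "/var/www/dokuwiki/data", "docs")` would then populate the namespace in one go (both paths are placeholders). The conversion itself still has to be done in the GUI.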