

Convert HTML to DokuWiki

One approach is to convert existing documents to text files with DokuWiki syntax using an HTML-to-text converter. I've set up a config file for the html2text tool available at http://www.mbayer.de/html2text/ that produces usable results: text2html.rc.

Thomas J. Messenger wrote a Perl module that converts HTML to DokuWiki syntax, available at http://www.citlink.net/~messengertj/ FIXME (see 81, too)

Alternatively, use the HTML::WikiConverter module from CPAN: http://search.cpan.org/~diberri/HTML-WikiConverter-0.68/lib/HTML/WikiConverter.pm

Online HTML to wiki converters (based on WikiConverter?) (paste HTML or fetch URL):

WikiTool: a converter tool (HTML > DokuWiki, UTF-8, table spacing)

Pandoc: web interface at http://johnmacfarlane.net/pandoc/try/ (Pandoc also includes a Haskell library and a standalone command-line program.)
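
For batch conversions, Pandoc's command-line program can be driven from a script instead of the web interface. The following is a minimal sketch, assuming a Pandoc release that includes the `dokuwiki` output format is installed and on the PATH; the function names `pandoc_command` and `convert` are made up for this example.

```python
import shutil
import subprocess


def pandoc_command(html_path, wiki_path):
    """Build the pandoc argument list for an HTML -> DokuWiki conversion."""
    return ["pandoc", "--from=html", "--to=dokuwiki",
            "--output", wiki_path, html_path]


def convert(html_path, wiki_path):
    """Run pandoc on one file; raises if pandoc is missing or fails."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on the PATH")
    subprocess.run(pandoc_command(html_path, wiki_path), check=True)
```

Calling `convert("page.html", "page.txt")` would write the DokuWiki text to `page.txt`; the output quality depends on the Pandoc version and on how clean the input HTML is.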

Html2DokuWiki Converter GUI for Win32

Html2DokuWiki is a free HTML to DokuWiki converter for Win32 platforms. It is very simple to install and extremely easy to use. Just extract the executable from the archive, double-click to start, and Html2DokuWiki is ready to go.

To start converting, type or paste HTML into the upper edit box. The converted DokuWiki syntax immediately appears in the lower edit box. Then select the whole converted document (CTRL+A) and copy and paste it into any DokuWiki site. Larger HTML documents can also be pasted into the HTML input.

Click here to download Html2DokuWiki.

Supported HTML Elements

Html2DokuWiki converts all HTML elements currently supported by DokuWiki:

  • <A> → links, outputs multiple links for formatting
  • <B> → **
  • <BLOCKQUOTE> → >, including nested quotes
  • <BR> → new line \\
  • <CODE> → ''
  • <DEL> → <del>
  • <DL>, <DT>, and <DD> → simulated as simple unnumbered lists with <DT> as bold
  • <H1> to <H5> → ====== to ==
  • <I> → //
  • <IMG> → images
  • <LI> → list items, including nested lists
  • <OL> → numbered lists
  • <P> → new paragraph
  • <PRE> → <code>
  • <S> → <del>
  • <STRIKE> → <del>
  • <STRONG> → **
  • <SUB> → <sub>
  • <SUP> → <sup>
  • <TABLE> → tables
  • <TBODY> → recognized, but not output
  • <TD> → table cell, with align and colspan support
  • <TFOOT> → recognized, but not output
  • <TH> → table header cell, with align and colspan support
  • <THEAD> → recognized, but not output
  • <TR> → table row, with align support
  • <TT> → ''
  • <U> → __
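
As a rough illustration of the mapping in the list above, the following Python sketch converts a handful of the inline and heading elements. It is not Html2DokuWiki's implementation (which is a Win32 program); the names `MiniConverter` and `html_to_dokuwiki` are invented for this example, and links, images, lists, tables, and escaping are deliberately left out.

```python
from html.parser import HTMLParser

# Inline tags whose DokuWiki markup is identical on both sides of the text.
INLINE = {"b": "**", "strong": "**", "i": "//", "u": "__",
          "code": "''", "tt": "''"}
# Tags that DokuWiki writes as paired tags of its own.
PAIRED = {"del": "del", "sub": "sub", "sup": "sup"}
# <h1> becomes six equals signs, down to two for <h5>.
HEADINGS = {f"h{n}": "=" * (7 - n) for n in range(1, 6)}


class MiniConverter(HTMLParser):
    """Collects DokuWiki text for a small subset of HTML elements."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag in INLINE:
            self.out.append(INLINE[tag])
        elif tag in PAIRED:
            self.out.append(f"<{PAIRED[tag]}>")
        elif tag in HEADINGS:
            self.out.append(HEADINGS[tag] + " ")
        elif tag == "br":
            self.out.append("\\\\ ")  # forced line break: two backslashes

    def handle_endtag(self, tag):
        if tag in INLINE:
            self.out.append(INLINE[tag])
        elif tag in PAIRED:
            self.out.append(f"</{PAIRED[tag]}>")
        elif tag in HEADINGS:
            self.out.append(" " + HEADINGS[tag] + "\n")
        elif tag == "p":
            self.out.append("\n\n")  # blank line ends a paragraph

    def handle_data(self, data):
        self.out.append(data)


def html_to_dokuwiki(html):
    parser = MiniConverter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()
```

For instance, `html_to_dokuwiki("<P>Some <B>bold</B> text</P>")` yields `Some **bold** text`. A table-driven design like this keeps the element list easy to extend, at the cost of ignoring context such as nesting inside headings or code blocks.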

Special Features

  • Internal links are converted to DokuWiki's colon-separated (:) namespace style; external links are left unchanged. The mailto: prefix is removed from e-mail links.
  • Support for alignment in table cells and rows.
  • Image properties are converted, including alignment, width, and height.
  • Formatting is applied only where DokuWiki accepts it; for example, it is not applied inside headings (=== … ===) or <code> … </code> blocks.
  • Full Unicode support, with optional UTF-8 input or output encoding.
  • Option to hide links from DokuWiki syntax.
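
The link handling described in the first item above can be sketched in Python as follows. This is an assumption-laden illustration, not the tool's actual logic: the function name `convert_link` is hypothetical, and stripping the file extension from internal links is a guess at what a ":" style conversion needs.

```python
from urllib.parse import urlparse


def convert_link(href, text):
    """Return DokuWiki link markup for an HTML href and its link text."""
    parsed = urlparse(href)
    if parsed.scheme == "mailto":
        # E-mail link: drop the mailto: prefix, keep the address.
        target = parsed.path
    elif parsed.scheme or parsed.netloc:
        # External link: leave the URL unchanged.
        target = href
    else:
        # Internal link: drop a file extension (assumption) and
        # join the path segments with ':' as namespace separators.
        path = parsed.path.strip("/")
        if "." in path.rsplit("/", 1)[-1]:
            path = path.rsplit(".", 1)[0]
        target = path.replace("/", ":")
    return f"[[{target}|{text}]]"
```

With these rules, `convert_link("docs/setup/install.html", "Install")` gives `[[docs:setup:install|Install]]`, while an `http://` URL passes through untouched.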

Author, Contact, and Development

Html2DokuWiki is developed by Ralf Junker. You can contact the author via the Yunqa mailing list. Feel free to report praise, bugs, or suggestions about Html2DokuWiki.

The converter might also be available as a software library (*.DLL, *.DCU, or Delphi source code). Please get in touch if interested.

Version History

2007-08-27

  • Add inline formatting to table cells.
  • New UTF-8 input encoding.
  • Fix paragraph problems with alternating inline-tags and block-tags.
  • Do not escape // to %%//%% if part of an external URI.
  • Escape %% to <nowiki>%%</nowiki>.
  • Empty heading elements separate paragraphs.

2007-10-22

  • New: Support for <DL>, <DT>, and <DD>. Simulate output as simple unnumbered lists with <DT> as bold.
  • Improve: Empty paragraph inserts line break.
  • Improve: Recognize DokuWiki internal escapes %%, <nowiki>, and </nowiki> and escape them properly.
  • Improve: Escape table markup (| and ^) when inside a table.
  • Improve: Escape double parentheses ((, which start a DokuWiki footnote.
  • Improve: Suppress DokuWiki escapes and typography in <PRE> blocks.
  • Fix: Newline output for HTML like <P><PRE>one</PRE>two</P><BR>three.
  • Fix: <TBODY> table problem where a new row did not output a linebreak.
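
Several of the escaping rules in the history above can be expressed as a single regular-expression pass. The Python sketch below covers only `%%`, `//` outside URLs, `((`, and the table characters `|` and `^`; the function name `escape` and the `in_table` flag are invented for this example, and it does not handle `<nowiki>` tags already present in the input.

```python
import re

# '//' is skipped when it directly follows ':' (as in 'http://'),
# so external URLs stay intact. '((' would start a footnote.
_TOKENS = r"(?<!:)//|\(\("
_TEXT = re.compile(rf"%%|(?:{_TOKENS})+")
# Inside a table, '|' and '^' are cell separators and need escaping too.
_CELL = re.compile(rf"%%|(?:{_TOKENS}|[|^])+")


def escape(text, in_table=False):
    """Wrap DokuWiki markup characters so they render literally."""
    def wrap(match):
        token = match.group(0)
        if token == "%%":
            # '%%' cannot escape itself, so use <nowiki> instead.
            return "<nowiki>%%</nowiki>"
        return f"%%{token}%%"

    pattern = _CELL if in_table else _TEXT
    return pattern.sub(wrap, text)
```

Consecutive markup tokens are wrapped as one run so that two `%%` pairs never end up directly next to each other.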

Workflow: Microsoft Word to DokuWiki using the Html2DokuWiki GUI

I was looking for a way to convert about 150 DOC files (Microsoft Word 2000 or 2003) into our new wiki without too much hassle. The macros available didn't work for me.

Specifications

  • usable for multiple files
  • converting tables
  • converting images

Suggested workflow (works best IMO)

  1. Open the Word document in OpenOffice.
  2. Save it as HTML (the pictures will be stored in the same folder as the HTML).
  3. Open Html2DokuWiki (see the download link above).
  4. Paste the HTML code into the GUI and copy the converted output.
  5. Save the output as a .txt file (or create a new page for integrity).
  6. Copy the .txt file into the DATA/PAGES/NAMESPACE folder and the pictures into the DATA/MEDIA/NAMESPACE folder.

Is there a faster way? (2009/07/18 by bobeck)

→ This is still the fastest and most reliable way (2014/09/24 by josh)
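
With around 150 documents, the final copying step of the workflow above is a natural candidate for a small script. The following Python sketch assumes each export folder holds the converted .txt files next to the pictures; the function name `install_export`, the list of image suffixes, and the lowercase/underscore renaming (to match how DokuWiki cleans page and media IDs) are choices made for this example.

```python
import shutil
from pathlib import Path

# Picture types to copy into the media folder (an assumed selection).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif"}


def install_export(export_dir, wiki_data_dir, namespace):
    """Copy .txt pages and pictures into data/pages and data/media."""
    pages = Path(wiki_data_dir) / "pages" / namespace
    media = Path(wiki_data_dir) / "media" / namespace
    pages.mkdir(parents=True, exist_ok=True)
    media.mkdir(parents=True, exist_ok=True)

    copied = []
    for src in sorted(Path(export_dir).iterdir()):
        if not src.is_file():
            continue
        suffix = src.suffix.lower()
        if suffix == ".txt":
            dest_dir = pages
        elif suffix in IMAGE_SUFFIXES:
            dest_dir = media
        else:
            continue  # skip the intermediate HTML and anything else
        # Use lowercase names with underscores instead of spaces.
        dest = dest_dir / src.name.lower().replace(" ", "_")
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied
```

A call such as `install_export("export/manual", "/var/www/dokuwiki/data", "manuals")` (paths are placeholders) returns the list of files it wrote, which makes it easy to log what was imported. Image references inside the converted pages may still need adjusting to the new names.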

tips/htmltowiki.1419335442.txt.gz · Last modified: 2014-12-23 12:50 by 213.185.164.117

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International