Migration from MoinMoin to DokuWiki

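Below you will find PHP and Python scripts to facilitate the conversion. The code-tag strings in the listings are deliberately mangled so that this wiki page can display them. Before running a script, restore them: in the Python convert_page functions, delete the leftmost “>” so that <>code> and <>/code> become <code> and </code>; in the PHP script, delete the “>>” from <>>code> and <>>/code> in the $replace array.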
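If you save a listing to a file, the placeholders can also be restored programmatically. The following is a minimal sketch for Python 3, assuming the mangled strings appear exactly as on this page; restore_code_tags is a hypothetical helper name, not part of any of the scripts.

```python
def restore_code_tags(source):
    """Return source with the mangled code-tag placeholders repaired."""
    replacements = (
        ('<>>/code>', '</code>'),  # PHP listing, closing tag
        ('<>>code>', '<code>'),    # PHP listing, opening tag
        ('<>/code>', '</code>'),   # Python listings, closing tag
        ('<>code>', '<code>'),     # Python listings, opening tag
    )
    for mangled, fixed in replacements:
        source = source.replace(mangled, fixed)
    return source


if __name__ == '__main__':
    # Example: repair one line copied from a convert_page function.
    print(restore_code_tags("('\\{{3}', '<>code>'),"))
```

To patch a whole script, read the file into a string, pass it through restore_code_tags, and write the result back.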

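Question: Which parameters does the PHP script need, and how are they passed? According to the code there should be three. Are they passed through the URL, and with what syntax? Can anyone help?
Answer, judging from the code: the script is meant to be run from the command line (note the #!/usr/bin/php line), not through a URL. The check $argc != 3 counts the script name itself plus two arguments, the input directory and the output directory, as the usage message shows.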

Another document on switching appears at http://www.emilsit.net/blog/archives/migrating-from-moinmoin-to-dokuwiki/

PHP

I have written a small PHP script to convert wiki pages from MoinMoin http://moinmoin.wikiwikiweb.de/ to DokuWiki syntax. It does not take care of all differences, but it worked for me.

#!/usr/bin/php
<?php
 
//check command line parameters
if ($argc != 3 || in_array($argv[1], array('--help', '-help', '-h', '-?'))) {
  echo "\n  Converts all files from given directory\n";
  echo "  from MoinMoin to DokuWiki syntax. NOT RECURSIVE\n\n";
  echo "  Usage:\n";
  echo "  ".$argv[0]." <input dir> <output dir>\n\n";
} 
else {
  //get input and output directories
  $inDir = realpath($argv[1]) or die("input dir error");
  $outDir = realpath($argv[2]) or die("output dir error");
  //just print information
  echo "\nInput Directory: ".$inDir."\n";
  echo "Output Directory: ".$outDir."\n\n";
 
  //get all files from directory
  $files = Array();  //stays empty if the input path is not a directory
  if (is_dir($inDir)) {
    $files = filesFromDir($inDir);
  }
 
  //migrate each file
  foreach ($files As $file) {
    //convert filename
    $ofile = convFileNames($file);
    //just print information
    echo "Migrating from ".$inDir."/".$file." to ".$outDir."/".$ofile."\n";
 
    //read input file
    $text = readFl($inDir."/".$file);
 
    //convert content
    $text = moin2doku($text);
 
    //encode in utf8
    $text = utf8_encode($text);
 
    //write output file
    writeFl($outDir."/".$ofile, $text);
  }
}
 
 
function moin2doku($text) {
  /* like convFileNames and more
  *   ToDo: [[Datestamp]] delete?
  *         bold and italic, what goes wrong?
  *         images
  *         Problems with newline and [[BR]]
  *         CamelCase in Heading: it will be converted
  *         Moin handles code sections without closing }}} right, DokuWiki does not     
  */
 
  //line by line
  $ret = '';  //collects the converted lines
  $lines = explode("\n", $text);
  foreach($lines As $line) {
    //start converting
    $find = Array(  
                  '/\[\[TableOfContents\]\]/',      //remove
                  '/\[\[BR\]\]$/',                  //newline at end of line - remove
                  '/\[\[BR\]\]/',                   //newline
                  '/#pragma section-numbers off/',  //remove
                  '/\["(.*)"\]/',                   //internal link
                  '/(\[http.*\])/',                 //web link
                  '/\{{3}/',                        //code open
                  '/\}{3}/',                        //code close
                  '/^\s\*/',                        //list item: DokuWiki needs two spaces before *, not one
                  '/={5}(\s.*\s)={5}$/',            //heading 5
                  '/={4}(\s.*\s)={4}$/',            //heading 4
                  '/={3}(\s.*\s)={3}$/',            //heading 3
                  '/={2}(\s.*\s)={2}$/',            //heading 2
                  '/={1}(\s.*\s)={1}$/',            //heading 1
                  '/=-/',                           //strip heading placeholders (also turns any literal =- in the text into =)
                  '/\|{2}/',                        //table separator
                  '/\'{5}(.*)\'{5}/',               //bold and italic
                  '/\'{3}(.*)\'{3}/',               //bold
                  '/\'{2}(.*)\'{2}/',               //italic
                  '/(?<!\[)(\b[A-Z]+[a-z]+[A-Z][A-Za-z]*\b)/',  //CamelCase, don't change if CamelCase is in InternalLink
                  '/\[\[Date\(([\d]{4}-[\d]{2}-[\d]{2}T[\d]{2}:[\d]{2}:[\d]{2}Z)\)\]\]/'  //Date value
                  );
    $replace = Array(
                     '',                            //remove
                     '',                            //newline remove
                     '\\\\\ ',                      //newline
                     '',                            //remove
                     '[[${1}]]',                    //internal link
                     '[${1}]',                      //web link
                     '<>>code>',                      //code open - remove the >>, it is only there so this page displays in DokuWiki
                     '<>>/code>',                     //code close - remove the >>, it is only there so this page displays in DokuWiki
                     '  *',                         //lists must have 2 whitespaces before *
                     //the =- placeholders keep a later heading rule from matching the output of an earlier one
                     '=-=-${1}=-=-',                  //heading 5
                     '=-=-=-${1}=-=-=-',              //heading 4
                     '=-=-=-=-${1}=-=-=-=-',          //heading 3
                     '=-=-=-=-=-${1}=-=-=-=-=-',      //heading 2
                     '=-=-=-=-=-=-${1}=-=-=-=-=-=-',  //heading 1
                     '=',                           //strip heading placeholders
                     '|',                           //table separator                       
                     '**//${1}//**',                //bold and italic
                     '**${1}**',                    //bold                                  
                     '//${1}//',                    //italic
                     '[[${1}]]',                    //CamelCase
                     '${1}'                         //Date value
                     );
    $line = preg_replace($find,$replace,$line);
 
    $ret = $ret.$line."\r\n";
  }
  return $ret;
}
 
 
function convFileNames($name) {
  /* ö,ä,ü, ,. and more
  */
  $find = Array('/_20/',
                '/_5f/',
                '/_2e/',
                '/_c4/',
                '/_f6/',
                '/_fc/',
                '/_26/',
                '/_2d/'
                );
  $replace = Array('_',
                   '_',
                   '_',
                   'Ae',
                   'oe',
                   'ue',
                   '_',
                   '-'
                   );
  $name = preg_replace($find,$replace,$name);
  $name = strtolower($name);
  return $name.".txt";
}
 
 
function filesFromDir($dir) {
  $files = Array();
  $handle=opendir($dir);
  while (($file = readdir($handle)) !== false) {  //strict check, so a file named "0" does not end the loop
     if ($file != "." && $file != ".." && !is_dir($dir."/".$file)) {
         array_push($files, $file);
     }
  }
  closedir($handle); 
  return $files;
}
 
function readFl($file) {
  $text = '';  //start with an empty string so the first append is defined
  $fr = fopen($file,"r");
  if ($fr) {
    while(!feof($fr)) {
      $text = $text.fgets($fr);
    }
    fclose($fr);
  }
  return $text;
}
 
function writeFl($file, $text) {
  $fw = fopen($file, "w");
  if ($fw) {
    fwrite($fw, $text);
    fclose($fw);  //only close the handle if it was opened
  }
}
 
?>

Python

Based on the above two, I've written a Python script that automates the file renaming, copying, and conversion. It worked for me on Windows.

import sys, os, os.path
import re
from os import listdir
from os.path import isdir, basename
 
def check_dirs(moin_pages_dir, output_dir):
    if not isdir(moin_pages_dir):
        print >> sys.stderr, "MoinMoin pages directory doesn't exist!"
        sys.exit(1)
    if not isdir(output_dir):
        print >> sys.stderr, "Output directory doesn't exist!"
        sys.exit(1)
 
 
def get_page_names(moin_pages_dir):
    items = listdir(moin_pages_dir)
    pages = []
    for item in items:
        item = os.path.join(moin_pages_dir, item)
        if isdir(item):
            pages.append(item)
    return pages
 
 
def get_current_revision(page_dir):
    rev_dir = os.path.join(page_dir, 'revisions')
    revisions = listdir(rev_dir)
    revisions.sort()
    return os.path.join(rev_dir, revisions[-1])
 
 
def convert_page(page):
    regexp = (
        ('\[\[TableOfContents\]\]', ''),            # remove
        ('\[\[BR\]\]$', ''),                        # newline at end of line - remove
        ('\[\[BR\]\]', '\n'),                       # newline
        ('#pragma section-numbers off', ''),        # remove
        ('^##.*?\\n', ''),                          # remove
        ('\["(.*)"\]',  '[[\\1]]'),                 # internal link
        ('(\[http.*\])', '[\\1]'),                  # web link
        ('\{{3}', '<>code>'),                       # code open
        ('\}{3}', '<>/code>'),                      # code close
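        ('^\s\*', '  *'),                           # list item: DokuWiki needs two spaces before *, not one
        # the =- placeholders keep a later heading rule from matching the output of an earlier one
        ('={5}(\s.*\s)={5}$', '=-=-\\1=-=-'),                   # heading 5
        ('={4}(\s.*\s)={4}$', '=-=-=-\\1=-=-=-'),               # heading 4
        ('={3}(\s.*\s)={3}$', '=-=-=-=-\\1=-=-=-=-'),           # heading 3
        ('={2}(\s.*\s)={2}$', '=-=-=-=-=-\\1=-=-=-=-=-'),       # heading 2
        ('={1}(\s.*\s)={1}$', '=-=-=-=-=-=-\\1=-=-=-=-=-=-'),   # heading 1
        ('=-', '='),                                # strip heading placeholders (also turns any literal =- in the text into =)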
        ('\|{2}', '|'),                             # table separator
        ('\'{5}(.*)\'{5}', '**//\\1//**'),          # bold and italic
        ('\'{3}(.*)\'{3}', '**\\1**'),              # bold
        ('\'{2}(.*)\'{2}', '//\\1//'),              # italic
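        # NOTE: '\b' in a non-raw Python string is a backspace character, not a
        # word boundary, so the CamelCase rule below never matches as written.
        # Prefixing the pattern with r enables it, but it then also wraps
        # CamelCase words inside URLs, so review the output if you do.
        ('(?<!\[)(\b[A-Z]+[a-z]+[A-Z][A-Za-z]*\b)','[[\\1]]'),  # CamelCase, don't change if CamelCase is in InternalLink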
        ('\[\[Date\(([\d]{4}-[\d]{2}-[\d]{2}T[\d]{2}:[\d]{2}:[\d]{2}Z)\)\]\]', '\\1')  # Date value
    )
    for i in range(len(page)):
        line = page[i]
        for item in regexp:
            line = re.sub(item[0], item[1], line)
        page[i] = line
    return page
 
 
def print_help():
    print "Usage: moinconv.py <moinmoin pages directory> <output directory>"
    print "Convert MoinMoin pages to DokuWiki."
    sys.exit(0)
 
 
def print_parameter_error():
    print >> sys.stderr, 'Incorrect parameters! Use --help switch to learn more.'
    sys.exit(1)
 
 
if __name__ == '__main__':
    if len(sys.argv) > 1:
        if sys.argv[1] in ('-h', '--help'):
            print_help()
        elif len(sys.argv) > 2:
            moin_pages_dir = sys.argv[1]
            output_dir = sys.argv[2]
        else:
            print_parameter_error()
    else:
        print_parameter_error()
 
    check_dirs(moin_pages_dir, output_dir)
    print 'Input dir is: %s.' % moin_pages_dir
    print 'Output dir is: %s.' % output_dir
    print
 
    pages = get_page_names(moin_pages_dir)
    for page in pages:
        curr_rev = get_current_revision(page)
        curr_rev_desc = file(curr_rev, 'r')
        curr_rev_content = curr_rev_desc.readlines()
        curr_rev_desc.close()
 
        curr_rev_content = convert_page(curr_rev_content)
 
        page_name = basename(page).lower()
        out_file = os.path.join(output_dir, page_name + '.txt')
        out_desc = file(out_file, 'w')
        out_desc.writelines([it.rstrip() + '\n' for it in curr_rev_content if it])
        out_desc.close()
 
        print 'Migrated %s to %s.' % (basename(page), basename(out_file))
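
A note on headings: MoinMoin marks its top-level heading with one equals sign, DokuWiki with six. When heading rules are applied one after another to the same line, the output of one rule can be matched again by a later one, so a heading can end up at the wrong level. The extended script below avoids this with temporary “=-” markers. Another option is to compute the target level in a single pass. The following is a minimal sketch for Python 3, not taken from any of the scripts; convert_heading and HEADING are hypothetical names, and it assumes MoinMoin headings of level 1 to 5 with whitespace between the equals signs and the title.

```python
import re

# One to five '=' signs, the title, then the same number of '=' signs.
HEADING = re.compile(r'^\s*(={1,5})\s+(.*?)\s+\1\s*$')


def convert_heading(line):
    """Convert one MoinMoin heading line to DokuWiki; other lines pass through."""
    match = HEADING.match(line)
    if match is None:
        return line
    level = len(match.group(1))   # MoinMoin level, 1 (top) to 5
    marks = '=' * (7 - level)     # DokuWiki: level 1 -> 6 signs, level 5 -> 2
    # The trailing newline, if any, is consumed by the pattern and not re-added.
    return '%s %s %s' % (marks, match.group(2), marks)


if __name__ == '__main__':
    for sample in ('= Top =', '==== Four ====', 'plain text'):
        print(convert_heading(sample))
```

Because every line is matched at most once, no rule can see the output of another.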

Extended Python

I've extended the script above as follows.

It fixes some bugs in the script above, copies attachments, converts attachment links, and creates namespaces based on the page structure of the MoinMoin wiki.
It also converts the encoded German umlauts in page names.
This version runs only on Linux, but porting it to Windows should not be difficult.
Remember to change '<>code>' and '<>/code>' to <code> and </code>.

Use:

Download as convert.py
chmod a+x convert.py
./convert.py <pages folder of MoinMoin-Wiki> <pages folder of DokuWiki>

#!/usr/bin/python
import sys, os, os.path
import re
from os import listdir
from os.path import isdir, basename


def check_dirs(moin_pages_dir, output_dir):
    if not isdir(moin_pages_dir):
        print >> sys.stderr, "MoinMoin pages directory doesn't exist!"
        sys.exit(1)
    if not isdir(output_dir):
        print >> sys.stderr, "Output directory doesn't exist!"
        sys.exit(1)


def get_page_names(moin_pages_dir):
    items = listdir(moin_pages_dir)
    pages = []
    for item in items:
        item = os.path.join(moin_pages_dir, item)
        if isdir(item):
            pages.append(item)
    return pages


def get_current_revision(page_dir):
    rev_dir = os.path.join(page_dir, 'revisions')
    if isdir(rev_dir):
        revisions = listdir(rev_dir)
        revisions.sort()
        return os.path.join(rev_dir, revisions[-1])
    return ''


def copy_attachments(page_dir, attachment_dir):
    dir = os.path.join(page_dir,'attachments')
    if isdir(dir):
        attachments = listdir(dir)
        for attachment in attachments:
            os.system ('cp "' + dir +'/' + attachment + '" "' + attachment_dir +'"')
 
 
def convert_page(page, file):
    # file is the page name split into namespace parts; all but the last
    # part form the namespace used for attachment links
    namespace = ':'
    for i in range(0, len(file) - 1):
        namespace += file[i] + ':'

    regexp = (
        ('\[\[TableOfContents.*\]\]', ''),          # remove
        ('\[\[BR\]\]$', ''),                        # newline at end of line - remove
        ('\[\[BR\]\]', '\n'),                       # newline
        ('#pragma section-numbers off', ''),        # remove
        ('^##.*?\\n', ''),                          # remove
        ('\[:(.*):',  '[[\\1]] '),                  # internal link
        ('\[\[(.*)/(.*)\]\]',  '[[\\1:\\2]]'),
        ('(\[\[.*\]\]).*\]', '\\1'),
        ('\[(http.*) .*\]', '[[\\1]]'),             # web link
        ('\["/(.*)"\]', '[['+file[-1]+':\\1]]'),
        ('\{{3}', '<>code>'),                       # code open
        ('\}{3}', '<>/code>'),                      # code close
        ('^\s\s\s\s\*', '        *'),
        ('^\s\s\s\*', '      *'),
        ('^\s\s\*', '    *'),
        ('^\s\*', '  *'),                           # list item: DokuWiki needs two spaces before *, not one
        ('^\s\s\s\s1\.', '      -'),
        ('^\s\s1\.', '    -'),
        ('^\s1\.', '  -'),
        # the =- placeholders keep a later heading rule from matching the output of an earlier one
        ('^\s*=====\s*(.*)\s*=====\s*$', '=-=- \\1 =-=-'),           # heading 5
        ('^\s*====\s*(.*)\s*====\s*$', '=-=-=- \\1 =-=-=-'),         # heading 4
        ('^\s*===\s*(.*)\s*===\s*$', '=-=-=-=- \\1 =-=-=-=-'),       # heading 3
        ('^\s*==\s*(.*)\s*==\s*$', '=-=-=-=-=- \\1 =-=-=-=-=-'),     # heading 2
        ('^\s*=\s*(.*)\s=\s*$', '=-=-=-=-=-=- \\1 =-=-=-=-=-=-'),    # heading 1
        ('=-', '='),                                # strip heading placeholders
        ('\|{2}', '|'),                             # table separator
        ('\'{5}(.*)\'{5}', '**//\\1//**'),          # bold and italic
        ('\'{3}(.*)\'{3}', '**\\1**'),              # bold
        ('\'{2}(.*)\'{2}', '//\\1//'),              # italic
        # NOTE: '\b' in a non-raw Python string is a backspace character, not a
        # word boundary, so the CamelCase rule below never matches as written.
        # Prefixing the pattern with r enables it, but it then also wraps
        # CamelCase words inside URLs, so review the output if you do.
        ('(?<!\[)(\b[A-Z]+[a-z]+[A-Z][A-Za-z]*\b)','[[\\1]]'),  # CamelCase, don't change if CamelCase is in InternalLink
        ('\[\[Date\(([\d]{4}-[\d]{2}-[\d]{2}T[\d]{2}:[\d]{2}:[\d]{2}Z)\)\]\]', '\\1'),  # Date value
        ('attachment:(.*)','{{'+namespace+'\\1|}}')
    )

    for i in range(len(page)):
        line = page[i]
        for item in regexp:
            line = re.sub(item[0], item[1], line)
        page[i] = line
    return page
 
 
 
 
 
def print_help():
    print "Usage: convert.py <moinmoin pages directory> <output directory>"
    print "Convert MoinMoin pages to DokuWiki."
    sys.exit(0)


def print_parameter_error():
    print >> sys.stderr, 'Incorrect parameters! Use --help switch to learn more.'
    sys.exit(1)
 
 
 
 
 
if __name__ == '__main__':
    if len(sys.argv) > 1:
        if sys.argv[1] in ('-h', '--help'):
            print_help()
        elif len(sys.argv) > 2:
            moin_pages_dir = sys.argv[1]
            # the paths below are built by string concatenation, so make
            # sure the output directory ends with a path separator
            output_dir = os.path.join(sys.argv[2], '')
        else:
            print_parameter_error()
    else:
        print_parameter_error()

    check_dirs(moin_pages_dir, output_dir)
    print 'Input dir is: %s.' % moin_pages_dir
    print 'Output dir is: %s.' % output_dir
    print

    pages = get_page_names(moin_pages_dir)
    for page in pages:
        curr_rev = get_current_revision(page)
        if os.path.exists(curr_rev):
            page_name = basename(page).lower()
            curr_rev_desc = file(curr_rev, 'r')
            curr_rev_content = curr_rev_desc.readlines()
            curr_rev_desc.close()

            if 'moineditorbackup' not in page_name:  # don't convert backups
                page_name = page_name.replace('(2d)', '-')
                page_name = page_name.replace('(c3bc)', 'ue')
                page_name = page_name.replace('(c384)', 'ae')  # DokuWiki page files are lowercase
                page_name = page_name.replace('(c3a4)', 'ae')
                page_name = page_name.replace('(c3b6)', 'oe')

                split = page_name.split('(2f)')  # namespaces
                count = len(split)
                dateiname = split[-1]  # file name of the page

                dir = output_dir
                attachment_dir = output_dir + '../media/'
                if count == 1:
                    # pages without a namespace go into 'unsorted'
                    dir += 'unsorted'
                    if not isdir(dir):
                        os.mkdir(dir)
                    attachment_dir += 'unsorted/'
                    if not isdir(attachment_dir):
                        os.mkdir(attachment_dir)
                for i in range(0, count - 1):
                    dir += split[i] + '/'
                    if not isdir(dir):
                        os.mkdir(dir)
                    attachment_dir += split[i] + '/'
                    if not isdir(attachment_dir):
                        os.mkdir(attachment_dir)
                if count == 1:
                    str = 'unsorted/' + page_name
                    split = str.split('/')
                    curr_rev_content = convert_page(curr_rev_content, split)
                else:
                    curr_rev_content = convert_page(curr_rev_content, split)

                out_file = os.path.join(dir, dateiname + '.txt')
                out_desc = file(out_file, 'w')
                out_desc.writelines([it.rstrip() + '\n' for it in curr_rev_content if it])
                out_desc.close()
                copy_attachments(page, attachment_dir)
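
The extended script copies attachments by calling cp through os.system, which is what ties it to Linux. A portable alternative might look like the sketch below, written for Python 3 with shutil from the standard library. It is not the author's code: the function reuses the name copy_attachments for illustration, and lowercasing the target file name is an assumption (DokuWiki media names are normally lowercase).

```python
import os
import shutil


def copy_attachments(page_dir, attachment_dir):
    """Copy every file in <page_dir>/attachments into attachment_dir.

    Returns the list of files written, so the caller can log them.
    """
    source_dir = os.path.join(page_dir, 'attachments')
    if not os.path.isdir(source_dir):
        return []  # this page has no attachments
    copied = []
    for name in sorted(os.listdir(source_dir)):
        source = os.path.join(source_dir, name)
        if not os.path.isfile(source):
            continue  # skip anything that is not a regular file
        # Assumption: store media under a lowercase name.
        target = os.path.join(attachment_dir, name.lower())
        shutil.copy2(source, target)  # copies content and timestamps
        copied.append(target)
    return copied
```

Since no shell is involved, file names containing quotes or spaces need no escaping.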

Another Python Script

Here is another “improved” Python script, based on the one above (convert.py). In the convert_page function, delete the leftmost “>” from “<>code>” and “<>/code>” before running it. See the file header for more information.

moin2doku.py

#!/usr/bin/python
#
# moin2doku.py
#
# A script for converting MoinMoin version 1.3+ wiki data to DokuWiki format.
# Call with the name of the directory containing the MoinMoin pages and that
# of the directory to receive the DokuWiki pages on the command line:
#
# python moin2doku.py ./moin/data/pages/ ./doku/
# 
# then move the doku pages to e.g. /var/www/MyWikiName/data/pages/,
# move the media files to e.g. /var/www/MyWikiName/data/media/,
# set ownership: chown -R www-data:www-data /var/www/MyWikiName/data/pages/*
# chown -R www-data:www-data /var/www/MyWikiName/data/media/*
#
# This script doesn't do all the work, and some of the work it does is
# wrong. For instance attachment links end up with the trailing "|}}"
# on the line following the link. This works, but doesn't look good.
# The script interprets a "/" in a pagename as a namespace delimiter and
# creates and fills namespace subdirectories accordingly.
#
# version 0.1  02.2010  Slim Gaillard, based on the "extended python"
#                       convert.py script here:
#                       http://www.dokuwiki.org/tips:moinmoin2doku
#
import sys, os, os.path, re, pdb
from os import listdir
from os.path import isdir, basename
 
def check_dirs(moin_pages_dir, output_dir):
 
    if not isdir(moin_pages_dir):
        print >> sys.stderr, "MoinMoin pages directory doesn't exist!"
        sys.exit(1)
 
    if not isdir(output_dir):
        print >> sys.stderr, "Output directory doesn't exist!"
        sys.exit(1)
 
 
def get_path_names(moin_pages_dir):
 
    items = listdir(moin_pages_dir)
    pathnames = []
 
    for item in items:
        item = os.path.join(moin_pages_dir, item)
        if isdir(item):
            pathnames.append(item)
 
    return pathnames
 
 
def get_current_revision(page_dir):
 
    rev_dir = os.path.join(page_dir, 'revisions')
 
    if isdir(rev_dir):
        revisions = listdir(rev_dir)
        revisions.sort()
        return os.path.join(rev_dir, revisions[-1])
 
    return ''
 
 
def copy_attachments(page_dir, attachment_dir):
 
  dir = os.path.join(page_dir,'attachments')
 
  if isdir(dir):
    attachments = listdir(dir) 
    #pdb.set_trace()
    for attachment in attachments:
      cmd_string = 'cp "' + dir +'/' + attachment + '" "' + attachment_dir + attachment.lower() + '"'
      os.system ( cmd_string )
 
 
def convert_page(page, file):
 
    namespace = ':'
    for i in range(0, len(file) - 1):
      namespace += file[i] + ':' 
 
    regexp = (
        ('\[\[TableOfContents.*\]\]', ''),          # remove
        ('\[\[BR\]\]$', ''),                        # newline at end of line - remove
        ('\[\[BR\]\]', '\n'),                       # newline
        ('#pragma section-numbers off', ''),        # remove
        ('^##.*?\\n', ''),                          # remove
        ('\["/(.*)"\]', '[['+file[-1]+':\\1]]'),    # subpage link; must run before the two rules below, which rewrite the '["' it looks for
        ('\["', '[['),                              # internal link open
        ('"\]', ']]'),                              # internal link close
        #('\[:(.*):',  '[[\\1]] '),                 # original internal link expressions
        #('\[\[(.*)/(.*)\]\]',  '[[\\1:\\2]]'),
        #('(\[\[.*\]\]).*\]', '\\1'),
        ('\[(http.*) .*\]', '[[\\1]]'),             # web link
        ('\{{3}', '<>code>'),                        # code open
        ('\}{3}', '<>/code>'),                       # code close
        ('^\s\s\s\s\*', '        *'),
        ('^\s\s\s\*', '      *'),
        ('^\s\s\*', '    *'),        
        ('^\s\*', '  *'),                           # lists must have 2 whitespaces before the asterisk
        ('^\s\s\s\s1\.', '      -'),
        ('^\s\s1\.', '    -'),
        ('^\s1\.', '  -'),
        ('^\s*=====\s*(.*)\s*=====\s*$', '=-=- \\1 =-=-'),           # heading 5
        ('^\s*====\s*(.*)\s*====\s*$', '=-=-=- \\1 =-=-=-'),         # heading 4
        ('^\s*===\s*(.*)\s*===\s*$', '=-=-=-=- \\1 =-=-=-=-'),       # heading 3
        ('^\s*==\s*(.*)\s*==\s*$', '=-=-=-=-=- \\1 =-=-=-=-=-'),     # heading 2
        ('^\s*=\s*(.*)\s=\s*$', '=-=-=-=-=-=- \\1 =-=-=-=-=-=-'),    # heading 1
        ('=-', '='),
        ('\|{2}', '|'),                             # table separator
        ('\'{5}(.*)\'{5}', '**//\\1//**'),          # bold and italic
        ('\'{3}(.*)\'{3}', '**\\1**'),              # bold
        ('\'{2}(.*)\'{2}', '//\\1//'),              # italic
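        # NOTE: '\b' in a non-raw Python string is a backspace character, not a
        # word boundary, so the CamelCase rule below never matches as written.
        # Prefixing the pattern with r enables it, but it then also wraps
        # CamelCase words inside URLs, so review the output if you do.
        ('(?<!\[)(\b[A-Z]+[a-z]+[A-Z][A-Za-z]*\b)','[[\\1]]'),  # CamelCase, don't change if CamelCase is in InternalLink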
        ('\[\[Date\(([\d]{4}-[\d]{2}-[\d]{2}T[\d]{2}:[\d]{2}:[\d]{2}Z)\)\]\]', '\\1'),  # Date value
        ('attachment:(.*)','{{'+namespace+'\\1|}}')
    )
 
    for i in range(len(page)):
        line = page[i]
        for item in regexp:
            line = re.sub(item[0], item[1], line)
        page[i] = line
    return page
 
def print_help():
    print "Usage: moin2doku.py <moinmoin pages directory> <output directory>"
    print "Convert MoinMoin pages to DokuWiki."
    sys.exit(0)
 
def print_parameter_error():
    print >> sys.stderr, 'Incorrect parameters! Use --help switch to learn more.'
    sys.exit(1)
 
def fix_name( filename ):
    filename = filename.lower()
    filename = filename.replace('(2d)', '-')          # hyphen
    filename = filename.replace('(20)', '_')          # space->underscore
    filename = filename.replace('(2e)', '_')          # decimal point->underscore
    filename = filename.replace('(29)', '_')          # )->underscore
    filename = filename.replace('(28)', '_')          # (->underscore    
    filename = filename.replace('.', '_')             # decimal point->underscore
    filename = filename.replace('(2c20)', '_')        # comma + space->underscore
    filename = filename.replace('(2028)', '_')        # space + (->underscore
    filename = filename.replace('(2920)', '_')        # ) + space->underscore
    filename = filename.replace('(2220)', 'inch_')    # " + space->inch + underscore
    filename = filename.replace('(3a20)', '_')        # : + space->underscore
    filename = filename.replace('(202827)', '_')      # space+(+'->underscore
    filename = filename.replace('(2720)', '_')        # '+ space->underscore
    filename = filename.replace('(c3bc)', 'ue')       # umlaut
    filename = filename.replace('(c384)', 'ae')       # umlaut (lowercase, because DokuWiki page files are lowercase)
    filename = filename.replace('(c3a4)', 'ae')       # umlaut
    filename = filename.replace('(c3b6)', 'oe')       # umlaut
    return filename
 
#
# "main" starts here
#
if len(sys.argv) > 1:
    if sys.argv[1] in ('-h', '--help'):
        print_help()
    elif len(sys.argv) > 2:
        moin_pages_dir = sys.argv[1]
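        # the paths below are built by string concatenation, so make
        # sure the output directory ends with a path separator
        output_dir = os.path.join(sys.argv[2], '')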
    else:
        print_parameter_error()
else:
    print_parameter_error()
 
check_dirs(moin_pages_dir, output_dir)
 
print 'Input dir is: %s.' % moin_pages_dir
print 'Output dir is: %s.' % output_dir
 
pathnames = get_path_names(moin_pages_dir)
 
for pathname in pathnames:
    #pdb.set_trace() # start debugging here
 
    curr_rev = get_current_revision( pathname )
    if not os.path.exists( curr_rev ) : continue
 
    page_name = basename(pathname)
    if page_name.count('MoinEditorBackup') > 0 : continue # don't convert backups
 
    curr_rev_desc = file(curr_rev, 'r')
    curr_rev_content = curr_rev_desc.readlines()
    curr_rev_desc.close()
 
    page_name = fix_name( page_name )
 
    split = page_name.split('(2f)') # namespaces
 
    count = len(split)
 
    dateiname = split[-1]
 
    dir = output_dir
    # changed from attachment_dir = output_dir + '../media/':
    attachment_dir = output_dir + 'media/'
    if not isdir (attachment_dir):
      os.mkdir(attachment_dir)
 
    if count == 1:
      dir += 'unsorted'
      if not isdir (dir):
        os.mkdir(dir)
 
      attachment_dir += 'unsorted/'
      if not isdir (attachment_dir):
        os.mkdir(attachment_dir)
 
    for i in range(0, count - 1):
 
      dir += split[i] + '/'
      if not isdir (dir):
        os.mkdir(dir)
 
      attachment_dir += split[i] + '/'
      if not isdir (attachment_dir):
        os.mkdir(attachment_dir)
 
    if count == 1:
      str = 'unsorted/' + page_name
      split = str.split('/')
      curr_rev_content = convert_page(curr_rev_content, split)
    else:
      curr_rev_content = convert_page(curr_rev_content, split)
 
    out_file = os.path.join(dir, dateiname + '.txt')
    out_desc = file(out_file, 'w')
    out_desc.writelines([it.rstrip() + '\n' for it in curr_rev_content if it])
    out_desc.close()
 
    # pdb.set_trace() # start debugging here
    copy_attachments(pathname, attachment_dir)
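
The fix_name table above lists encoded sequences one by one. A more general approach is sketched below for Python 3. It assumes that every parenthesised group in a MoinMoin page directory name is a run of hex-encoded UTF-8 bytes, which is what the entries in fix_name suggest ((2c20) for comma plus space, (c3bc) for ü); verify this against your own wiki data before relying on it. unquote_page_name is a hypothetical helper name, and it only decodes: replacing spaces or umlauts for DokuWiki file names is still a separate step.

```python
import re

# A parenthesised run of hex digit pairs, e.g. (2f) or (c3bc) or (2c20).
QUOTED = re.compile(r'\(((?:[0-9a-fA-F]{2})+)\)')


def unquote_page_name(name):
    """Decode the hex-encoded groups in a MoinMoin page directory name."""
    def decode(match):
        # Raises UnicodeDecodeError if the bytes are not valid UTF-8.
        return bytes.fromhex(match.group(1)).decode('utf-8')
    return QUOTED.sub(decode, name)


if __name__ == '__main__':
    decoded = unquote_page_name('Projekt(2f)(c3bc)bersicht')
    # Splitting on '/' yields the namespace parts plus the page name.
    print(decoded.split('/'))
```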

Perl

I've written a more powerful conversion script. As far as I can tell, it correctly converts all syntax from http://moinmo.in/HelpOnEditing except tables (alignment and spans are not converted yet). You can get the latest version here: copy all the code from the code block and replace <!/code> with </code>.

Discussion

Why did you switch from MoinMoin to DokuWiki? Just curious, I'm debating between the two and MoinMoin's WYSIWYG editor is very nice, and big sites like fedoraproject.org and ubuntu.com are using MoinMoin. - posted on 1/16/2006
Because MoinMoin is not as stable as it looks? Do you know the Ubuntuusers wiki case? - posted on 04/26/2007
I've added a Perl script which converts all syntax from http://moinmo.in/HelpOnEditing. Please report any errors you find.
