Migration from MoinMoin to DokuWiki
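Below you will find scripts in PHP and Python to facilitate the conversion process. Before running them, remove the leftmost ">" from <>/code> and <>code> in the Python convert_page functions, and remove the ">>" from the $replace Array(… in the PHP script. The extra characters are only there so that the code displays correctly in DokuWiki.
Are there any parameters that need to be passed to the PHP script, and how is that to be done? According to the code there should be three parameters. Are they passed through the URL? What is the syntax? Can anyone help? Judging by the code, the script is meant for the command line rather than a URL: it reads $argv and expects two arguments after the script name (the check $argc != 3 counts the script name itself), so it would be run as php <script> <input dir> <output dir>.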
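The placeholder clean-up described above can also be done mechanically once a script has been copied from this page. The following is a minimal sketch in Python 3 (the scripts on this page are Python 2); the function name restore_code_tags is hypothetical, and the placeholder list covers only the variants mentioned on this page.

```python
def restore_code_tags(source):
    """Turn the display-safe placeholders back into real code tags."""
    placeholders = (
        ('<>>/code>', '</code>'),  # PHP script, closing tag
        ('<>>code>', '<code>'),    # PHP script, opening tag
        ('<>/code>', '</code>'),   # Python scripts, closing tag
        ('<>code>', '<code>'),     # Python scripts, opening tag
        ('<!/code>', '</code>'),   # Perl script, closing tag
    )
    for broken, fixed in placeholders:
        source = source.replace(broken, fixed)
    return source


if __name__ == '__main__':
    import sys
    # Hypothetical usage: python3 fixtags.py < copied_script > fixed_script
    sys.stdout.write(restore_code_tags(sys.stdin.read()))
```

Run it over the saved script once, then check the result by eye before using it.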
Another document on switching appears at http://www.emilsit.net/blog/archives/migrating-from-moinmoin-to-dokuwiki/
PHP
I have written a small PHP script to convert wiki pages from MoinMoin http://moinmoin.wikiwikiweb.de/ to DokuWiki syntax. It does not take care of all differences, but it worked for me.
#!/usr/bin/php
<?php
// check command line parameters
if ($argc != 3 || in_array($argv[1], array('--help', '-help', '-h', '-?'))) {
    echo "\n Converts all files from given directory\n";
    echo " from MoinMoin to DokuWiki syntax. NOT RECURSIVE\n\n";
    echo " Usage:\n";
    echo " ".$argv[0]." <input dir> <output dir>\n\n";
} else {
    // get input and output directories
    $inDir = realpath($argv[1]) or die("input dir error");
    $outDir = realpath($argv[2]) or die("output dir error");

    // just print information
    echo "\nInput Directory: ".$inDir."\n";
    echo "Output Directory: ".$outDir."\n\n";

    // get all files from directory
    $files = Array();
    if (is_dir($inDir)) {
        $files = filesFromDir($inDir);
    }

    // migrate each file
    foreach ($files As $file) {
        // convert filename
        $ofile = convFileNames($file);
        // just print information
        echo "Migrating from ".$inDir."/".$file." to ".$outDir."/".$ofile."\n";
        // read input file
        $text = readFl($inDir."/".$file);
        // convert content
        $text = moin2doku($text);
        // encode in utf8
        $text = utf8_encode($text);
        // write output file
        writeFl($outDir."/".$ofile, $text);
    }
}

function moin2doku($text) {
    /* like convFileNames and more
     * ToDo: [[Datestamp]] delete?
     * bold and italic, what goes wrong?
     * images
     * Problems with newline and [[BR]]
     * CamelCase in Heading: it will be converted
     * Moin handles code sections without closing }}} right, DokuWiki does not
     */
    $ret = '';
    // line by line
    $lines = explode("\n", $text);
    foreach ($lines As $line) {
        // start converting
        $find = Array(
            '/\[\[TableOfContents\]\]/',   //remove
            '/\[\[BR\]\]$/',               //newline at end of line - remove
            '/\[\[BR\]\]/',                //newline
            '/#pragma section-numbers off/', //remove
            '/\["(.*)"\]/',                //internal link
            '/(\[http.*\])/',              //web link
            '/\{{3}/',                     //code open
            '/\}{3}/',                     //code close
            '/^\s\*/',                     //lists must have not only one but 2 whitespaces before *
            '/={5}(\s.*\s)={5}$/',         //heading 5
            '/={4}(\s.*\s)={4}$/',         //heading 4
            '/={3}(\s.*\s)={3}$/',         //heading 3
            '/={2}(\s.*\s)={2}$/',         //heading 2
            '/={1}(\s.*\s)={1}$/',         //heading 1
            '/\|{2}/',                     //table separator
            '/\'{5}(.*)\'{5}/',            //bold and italic
            '/\'{3}(.*)\'{3}/',            //bold
            '/\'{2}(.*)\'{2}/',            //italic
            '/(?<!\[)(\b[A-Z]+[a-z]+[A-Z][A-Za-z]*\b)/', //CamelCase, don't change if CamelCase is in InternalLink
            '/\[\[Date\(([\d]{4}-[\d]{2}-[\d]{2}T[\d]{2}:[\d]{2}:[\d]{2}Z)\)\]\]/' //Date value
        );
        $replace = Array(
            '',                   //remove
            '',                   //newline remove
            '\\\\\ ',             //newline
            '',                   //remove
            '[[${1}]]',           //internal link
            '[${1}]',             //web link
            '<>>code>',           //code open - remove >>, it's included for viewing in DokuWiki
            '<>>/code>',          //code close - remove >>, it's included for viewing in DokuWiki
            '  *',                //lists must have 2 whitespaces before *
            '==${1}==',           //heading 5
            '===${1}===',         //heading 4
            '====${1}====',       //heading 3
            '=====${1}=====',     //heading 2
            '======${1}======',   //heading 1
            '|',                  //table separator
            '**//${1}//**',       //bold and italic
            '**${1}**',           //bold
            '//${1}//',           //italic
            '[[${1}]]',           //CamelCase
            '${1}'                //Date value
        );
        $line = preg_replace($find, $replace, $line);
        $ret = $ret.$line."\r\n";
    }
    return $ret;
}

function convFileNames($name) {
    /* ö,ä,ü, ,. and more */
    $find = Array('/_20/', '/_5f/', '/_2e/', '/_c4/', '/_f6/', '/_fc/', '/_26/', '/_2d/');
    $replace = Array('_', '_', '_', 'Ae', 'oe', 'ue', '_', '-');
    $name = preg_replace($find, $replace, $name);
    $name = strtolower($name);
    return $name.".txt";
}

function filesFromDir($dir) {
    $files = Array();
    $handle = opendir($dir);
    while ($file = readdir($handle)) {
        if ($file != "." && $file != ".." && !is_dir($dir."/".$file)) {
            array_push($files, $file);
        }
    }
    closedir($handle);
    return $files;
}

function readFl($file) {
    $text = '';
    $fr = fopen($file, "r");
    if ($fr) {
        while (!feof($fr)) {
            $text = $text.fgets($fr);
        }
        fclose($fr);
    }
    return $text;
}

function writeFl($file, $text) {
    $fw = fopen($file, "w");
    if ($fw) {
        fwrite($fw, $text);
        fclose($fw);
    }
}
?>
Python
Based on the two sources above, I've written a Python script that automates the file renaming, copying, and conversion. It worked for me on Windows.
import sys, os, os.path
import re
from os import listdir
from os.path import isdir, basename

def check_dirs(moin_pages_dir, output_dir):
    if not isdir(moin_pages_dir):
        print >> sys.stderr, "MoinMoin pages directory doesn't exist!"
        sys.exit(1)
    if not isdir(output_dir):
        print >> sys.stderr, "Output directory doesn't exist!"
        sys.exit(1)

def get_page_names(moin_pages_dir):
    items = listdir(moin_pages_dir)
    pages = []
    for item in items:
        item = os.path.join(moin_pages_dir, item)
        if isdir(item):
            pages.append(item)
    return pages

def get_current_revision(page_dir):
    rev_dir = os.path.join(page_dir, 'revisions')
    revisions = listdir(rev_dir)
    revisions.sort()
    return os.path.join(rev_dir, revisions[-1])

def convert_page(page):
    regexp = (
        ('\[\[TableOfContents\]\]', ''),       # remove
        ('\[\[BR\]\]$', ''),                   # newline at end of line - remove
        ('\[\[BR\]\]', '\n'),                  # newline
        ('#pragma section-numbers off', ''),   # remove
        ('^##.*?\\n', ''),                     # remove
        ('\["(.*)"\]', '[[\\1]]'),             # internal link
        ('(\[http.*\])', '[\\1]'),             # web link
        ('\{{3}', '<>code>'),                  # code open
        ('\}{3}', '<>/code>'),                 # code close
        ('^\s\*', '  *'),                      # lists must have not only one but 2 whitespaces before *
        ('={5}(\s.*\s)={5}$', '==\\1=='),      # heading 5
        ('={4}(\s.*\s)={4}$', '===\\1==='),    # heading 4
        ('={3}(\s.*\s)={3}$', '====\\1===='),  # heading 3
        ('={2}(\s.*\s)={2}$', '=====\\1====='),     # heading 2
        ('={1}(\s.*\s)={1}$', '======\\1======'),   # heading 1
        ('\|{2}', '|'),                        # table separator
        ('\'{5}(.*)\'{5}', '**//\\1//**'),     # bold and italic
        ('\'{3}(.*)\'{3}', '**\\1**'),         # bold
        ('\'{2}(.*)\'{2}', '//\\1//'),         # italic
        # CamelCase, don't change if CamelCase is in InternalLink
        # (raw string, so that \b is a word boundary and not a backspace)
        (r'(?<!\[)(\b[A-Z]+[a-z]+[A-Z][A-Za-z]*\b)', '[[\\1]]'),
        ('\[\[Date\(([\d]{4}-[\d]{2}-[\d]{2}T[\d]{2}:[\d]{2}:[\d]{2}Z)\)\]\]', '\\1')  # Date value
    )
    for i in range(len(page)):
        line = page[i]
        for item in regexp:
            line = re.sub(item[0], item[1], line)
        page[i] = line
    return page

def print_help():
    print "Usage: moinconv.py <moinmoin pages directory> <output directory>"
    print "Convert MoinMoin pages to DokuWiki."
    sys.exit(0)

def print_parameter_error():
    print >> sys.stderr, 'Incorrect parameters! Use --help switch to learn more.'
    sys.exit(1)

if __name__ == '__main__':
    if len(sys.argv) > 1:
        if sys.argv[1] in ('-h', '--help'):
            print_help()
        elif len(sys.argv) > 2:
            moin_pages_dir = sys.argv[1]
            output_dir = sys.argv[2]
        else:
            print_parameter_error()
    else:
        print_parameter_error()

    check_dirs(moin_pages_dir, output_dir)

    print 'Input dir is: %s.' % moin_pages_dir
    print 'Output dir is: %s.' % output_dir
    print

    pages = get_page_names(moin_pages_dir)
    for page in pages:
        curr_rev = get_current_revision(page)
        curr_rev_desc = file(curr_rev, 'r')
        curr_rev_content = curr_rev_desc.readlines()
        curr_rev_desc.close()

        curr_rev_content = convert_page(curr_rev_content)

        page_name = basename(page).lower()
        out_file = os.path.join(output_dir, page_name + '.txt')
        out_desc = file(out_file, 'w')
        out_desc.writelines([it.rstrip() + '\n' for it in curr_rev_content if it])
        out_desc.close()

        print 'Migrated %s to %s.' % (basename(page), basename(out_file))
Extended Python
I've extended the script above as follows.
- It fixes some bugs in the script above, copies attachments, converts attachment links, and creates namespaces based on the page structure of the MoinMoin wiki.
- It converts some of the codes for German umlauts.
- In this version it works only on Linux, but it should not be difficult to port to Windows.
- Remember to change '<>code>' and '<>/code>' to <code> and </code>.
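The namespace and umlaut handling listed above relies on how MoinMoin encodes page names as directory names: special characters appear as hex byte sequences in parentheses, for example (2f) for "/" and (c3bc) for "ü". As a rough sketch in Python 3, assuming every parenthesised hex run is valid UTF-8, the names can be decoded generically instead of one replace() call per code. The function names decode_moin_name and to_doku_id are hypothetical, and unlike the script this sketch does not transliterate umlauts to "ue", "ae", and so on.

```python
import re

# One or more hex byte pairs in parentheses, e.g. "(2f)" or "(c3bc)".
QUOTED = re.compile(r'\(((?:[0-9a-fA-F]{2})+)\)')


def decode_moin_name(dirname):
    """Decode a MoinMoin page directory name into the page name."""
    def unhex(match):
        # Assumption: the bytes are UTF-8; invalid sequences raise an error.
        return bytes.fromhex(match.group(1)).decode('utf-8')
    return QUOTED.sub(unhex, dirname)


def to_doku_id(dirname):
    """Map a MoinMoin directory name to a DokuWiki-style page id.

    "/" in the page name becomes the namespace separator ":",
    spaces become underscores, and everything is lower-cased.
    """
    name = decode_moin_name(dirname)
    parts = [part.strip().lower().replace(' ', '_') for part in name.split('/')]
    return ':'.join(parts)
```

For example, a directory called Projekt(2f)Mein(20)Plan would map to the id projekt:mein_plan, that is, the file mein_plan.txt inside the namespace directory projekt.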
Use:
- Download as convert.py
- chmod a+rwx convert.py
- ./convert.py <pages folder of MoinMoin-Wiki> <pages folder of DokuWiki>
#!/usr/bin/python
import sys, os, os.path
import re
from os import listdir
from os.path import isdir, basename

def check_dirs(moin_pages_dir, output_dir):
    if not isdir(moin_pages_dir):
        print >> sys.stderr, "MoinMoin pages directory doesn't exist!"
        sys.exit(1)
    if not isdir(output_dir):
        print >> sys.stderr, "Output directory doesn't exist!"
        sys.exit(1)

def get_page_names(moin_pages_dir):
    items = listdir(moin_pages_dir)
    pages = []
    for item in items:
        item = os.path.join(moin_pages_dir, item)
        if isdir(item):
            pages.append(item)
    return pages

def get_current_revision(page_dir):
    rev_dir = os.path.join(page_dir, 'revisions')
    if isdir(rev_dir):
        revisions = listdir(rev_dir)
        revisions.sort()
        return os.path.join(rev_dir, revisions[-1])
    return ''

def copy_attachments(page_dir, attachment_dir):
    dir = os.path.join(page_dir, 'attachments')
    if isdir(dir):
        attachments = listdir(dir)
        for attachment in attachments:
            os.system('cp "' + dir + '/' + attachment + '" "' + attachment_dir + '"')

def convert_page(page, file):
    namespace = ':'
    for i in range(0, len(file) - 1):
        namespace += file[i] + ':'

    regexp = (
        ('\[\[TableOfContents.*\]\]', ''),     # remove
        ('\[\[BR\]\]$', ''),                   # newline at end of line - remove
        ('\[\[BR\]\]', '\n'),                  # newline
        ('#pragma section-numbers off', ''),   # remove
        ('^##.*?\\n', ''),                     # remove
        ('\[:(.*):', '[[\\1]] '),              # internal link
        ('\[\[(.*)/(.*)\]\]', '[[\\1:\\2]]'),
        ('(\[\[.*\]\]).*\]', '\\1'),
        ('\[(http.*) .*\]', '[[\\1]]'),        # web link
        ('\["/(.*)"\]', '[[' + file[-1] + ':\\1]]'),
        ('\{{3}', '<>code>'),                  # code open
        ('\}{3}', '<>/code>'),                 # code close
        ('^\s\s\s\s\*', '        *'),
        ('^\s\s\s\*', '      *'),
        ('^\s\s\*', '    *'),
        ('^\s\*', '  *'),                      # lists must have not only one but 2 whitespaces before *
        ('^\s\s\s\s1\.', '        -'),
        ('^\s\s1\.', '    -'),
        ('^\s1\.', '  -'),
        ('^\s*=====\s*(.*)\s*=====\s*$', '=-=- \\1 =-=-'),            # heading 5
        ('^\s*====\s*(.*)\s*====\s*$', '=-=-=- \\1 =-=-=-'),          # heading 4
        ('^\s*===\s*(.*)\s*===\s*$', '=-=-=-=- \\1 =-=-=-=-'),        # heading 3
        ('^\s*==\s*(.*)\s*==\s*$', '=-=-=-=-=- \\1 =-=-=-=-=-'),      # heading 2
        ('^\s*=\s*(.*)\s=\s*$', '=-=-=-=-=-=- \\1 =-=-=-=-=-=-'),     # heading 1
        ('=-', '='),
        ('\|{2}', '|'),                        # table separator
        ('\'{5}(.*)\'{5}', '**//\\1//**'),     # bold and italic
        ('\'{3}(.*)\'{3}', '**\\1**'),         # bold
        ('\'{2}(.*)\'{2}', '//\\1//'),         # italic
        # CamelCase, don't change if CamelCase is in InternalLink
        # (raw string, so that \b is a word boundary and not a backspace)
        (r'(?<!\[)(\b[A-Z]+[a-z]+[A-Z][A-Za-z]*\b)', '[[\\1]]'),
        ('\[\[Date\(([\d]{4}-[\d]{2}-[\d]{2}T[\d]{2}:[\d]{2}:[\d]{2}Z)\)\]\]', '\\1'),  # Date value
        ('attachment:(.*)', '{{' + namespace + '\\1|}}')
    )

    for i in range(len(page)):
        line = page[i]
        for item in regexp:
            line = re.sub(item[0], item[1], line)
        page[i] = line
    return page

def print_help():
    print "Usage: moinconv.py <moinmoin pages directory> <output directory>"
    print "Convert MoinMoin pages to DokuWiki."
    sys.exit(0)

def print_parameter_error():
    print >> sys.stderr, 'Incorrect parameters! Use --help switch to learn more.'
    sys.exit(1)

if __name__ == '__main__':
    if len(sys.argv) > 1:
        if sys.argv[1] in ('-h', '--help'):
            print_help()
        elif len(sys.argv) > 2:
            moin_pages_dir = sys.argv[1]
            output_dir = sys.argv[2]
        else:
            print_parameter_error()
    else:
        print_parameter_error()

    check_dirs(moin_pages_dir, output_dir)

    print 'Input dir is: %s.' % moin_pages_dir
    print 'Output dir is: %s.' % output_dir
    print

    pages = get_page_names(moin_pages_dir)
    for page in pages:
        curr_rev = get_current_revision(page)
        if os.path.exists(curr_rev):
            page_name = basename(page).lower()
            curr_rev_desc = file(curr_rev, 'r')
            curr_rev_content = curr_rev_desc.readlines()
            curr_rev_desc.close()

            if 'moineditorbackup' not in page_name:  # don't convert backups
                page_name = page_name.replace('(2d)', '-')
                page_name = page_name.replace('(c3bc)', 'ue')
                page_name = page_name.replace('(c384)', 'Ae')
                page_name = page_name.replace('(c3a4)', 'ae')
                page_name = page_name.replace('(c3b6)', 'oe')

                split = page_name.split('(2f)')  # namespaces
                count = len(split)
                dateiname = split[-1]

                dir = output_dir
                attachment_dir = output_dir + '../media/'
                if count == 1:
                    dir += 'unsorted'
                    if not isdir(dir):
                        os.mkdir(dir)
                    attachment_dir += 'unsorted/'
                    if not isdir(attachment_dir):
                        os.mkdir(attachment_dir)
                for i in range(0, count - 1):
                    dir += split[i] + '/'
                    if not isdir(dir):
                        os.mkdir(dir)
                    attachment_dir += split[i] + '/'
                    if not isdir(attachment_dir):
                        os.mkdir(attachment_dir)

                if count == 1:
                    str = 'unsorted/' + page_name
                    split = str.split('/')
                    curr_rev_content = convert_page(curr_rev_content, split)
                else:
                    curr_rev_content = convert_page(curr_rev_content, split)

                out_file = os.path.join(dir, dateiname + '.txt')
                out_desc = file(out_file, 'w')
                out_desc.writelines([it.rstrip() + '\n' for it in curr_rev_content if it])
                out_desc.close()

                copy_attachments(page, attachment_dir)
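A note on the heading rules: the substitutions are applied one after another to the same line, so in the earlier scripts a heading that has just been converted can be matched again by a later rule (a MoinMoin level 5 heading becomes "== … ==", which the heading 2 rule then turns back into "===== … ====="). The "=-" placeholders in the script above appear to be there to avoid exactly that. An alternative, sketched here in Python 3 under the assumption that a MoinMoin heading is a line of the form "= Title =" with one to five equals signs on each side, is to convert headings in a single pass. The name convert_heading is hypothetical.

```python
import re

# One to five "=" on both sides, the same number on each side (\1).
HEADING = re.compile(r'^\s*(={1,5})\s+(.*?)\s+\1\s*$')


def convert_heading(line):
    """Convert one MoinMoin heading line to DokuWiki syntax.

    MoinMoin level 1 (one "=") is the top level, while DokuWiki uses
    six "=" for the top level, so level n maps to 7 - n equals signs.
    Lines that are not headings are returned unchanged; a converted
    heading is returned without its trailing newline.
    """
    match = HEADING.match(line)
    if match is None:
        return line
    marks = '=' * (7 - len(match.group(1)))
    return '%s %s %s' % (marks, match.group(2), marks)
```

Because each line is matched only once, the order of the heading levels no longer matters and no placeholder has to be cleaned up afterwards.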
Another Python Script
Here is another “improved” Python script, based on the one above (convert.py). Remove the extra “>” in “<>code>” and “<>/code>” in the convert_page function. See the file header for more information.
- moin2doku.py
#!/usr/bin/python
#
# moin2doku.py
#
# A script for converting MoinMoin version 1.3+ wiki data to DokuWiki format.
# Call with the name of the directory containing the MoinMoin pages and that
# of the directory to receive the DokuWiki pages on the command line:
#
# python moin2doku.py ./moin/data/pages/ ./doku/
#
# then move the doku pages to e.g. /var/www/MyWikiName/data/pages/,
# move the media files to e.g. /var/www/MyWikiName/data/media/,
# set ownership: chown -R www-data:www-data /var/www/MyWikiName/data/pages/*
#                chown -R www-data:www-data /var/www/MyWikiName/data/media/*
#
# This script doesn't do all the work, and some of the work it does is
# wrong. For instance attachment links end up with the trailing "|}}"
# on the line following the link. This works, but doesn't look good.
# The script interprets a "/" in a pagename as a namespace delimiter and
# creates and fills namespace subdirectories accordingly.
#
# version 0.1 02.2010 Slim Gaillard, based on the "extended python"
# convert.py script here:
# http://www.dokuwiki.org/tips:moinmoin2doku
#
import sys, os, os.path, re, pdb
from os import listdir
from os.path import isdir, basename

def check_dirs(moin_pages_dir, output_dir):
    if not isdir(moin_pages_dir):
        print >> sys.stderr, "MoinMoin pages directory doesn't exist!"
        sys.exit(1)
    if not isdir(output_dir):
        print >> sys.stderr, "Output directory doesn't exist!"
        sys.exit(1)

def get_path_names(moin_pages_dir):
    items = listdir(moin_pages_dir)
    pathnames = []
    for item in items:
        item = os.path.join(moin_pages_dir, item)
        if isdir(item):
            pathnames.append(item)
    return pathnames

def get_current_revision(page_dir):
    rev_dir = os.path.join(page_dir, 'revisions')
    if isdir(rev_dir):
        revisions = listdir(rev_dir)
        revisions.sort()
        return os.path.join(rev_dir, revisions[-1])
    return ''

def copy_attachments(page_dir, attachment_dir):
    dir = os.path.join(page_dir, 'attachments')
    if isdir(dir):
        attachments = listdir(dir)
        #pdb.set_trace()
        for attachment in attachments:
            cmd_string = 'cp "' + dir + '/' + attachment + '" "' + attachment_dir + attachment.lower() + '"'
            os.system(cmd_string)

def convert_page(page, file):
    namespace = ':'
    for i in range(0, len(file) - 1):
        namespace += file[i] + ':'

    regexp = (
        ('\[\[TableOfContents.*\]\]', ''),     # remove
        ('\[\[BR\]\]$', ''),                   # newline at end of line - remove
        ('\[\[BR\]\]', '\n'),                  # newline
        ('#pragma section-numbers off', ''),   # remove
        ('^##.*?\\n', ''),                     # remove
        ('\["', '[['),                         # internal link open
        ('"\]', ']]'),                         # internal link close
        #('\[:(.*):', '[[\\1]] '),             # original internal link expressions
        #('\[\[(.*)/(.*)\]\]', '[[\\1:\\2]]'),
        #('(\[\[.*\]\]).*\]', '\\1'),
        ('\[(http.*) .*\]', '[[\\1]]'),        # web link
        ('\["/(.*)"\]', '[[' + file[-1] + ':\\1]]'),
        ('\{{3}', '<>code>'),                  # code open
        ('\}{3}', '<>/code>'),                 # code close
        ('^\s\s\s\s\*', '        *'),
        ('^\s\s\s\*', '      *'),
        ('^\s\s\*', '    *'),
        ('^\s\*', '  *'),                      # lists must have 2 whitespaces before the asterisk
        ('^\s\s\s\s1\.', '        -'),
        ('^\s\s1\.', '    -'),
        ('^\s1\.', '  -'),
        ('^\s*=====\s*(.*)\s*=====\s*$', '=-=- \\1 =-=-'),            # heading 5
        ('^\s*====\s*(.*)\s*====\s*$', '=-=-=- \\1 =-=-=-'),          # heading 4
        ('^\s*===\s*(.*)\s*===\s*$', '=-=-=-=- \\1 =-=-=-=-'),        # heading 3
        ('^\s*==\s*(.*)\s*==\s*$', '=-=-=-=-=- \\1 =-=-=-=-=-'),      # heading 2
        ('^\s*=\s*(.*)\s=\s*$', '=-=-=-=-=-=- \\1 =-=-=-=-=-=-'),     # heading 1
        ('=-', '='),
        ('\|{2}', '|'),                        # table separator
        ('\'{5}(.*)\'{5}', '**//\\1//**'),     # bold and italic
        ('\'{3}(.*)\'{3}', '**\\1**'),         # bold
        ('\'{2}(.*)\'{2}', '//\\1//'),         # italic
        # CamelCase, don't change if CamelCase is in InternalLink
        # (raw string, so that \b is a word boundary and not a backspace)
        (r'(?<!\[)(\b[A-Z]+[a-z]+[A-Z][A-Za-z]*\b)', '[[\\1]]'),
        ('\[\[Date\(([\d]{4}-[\d]{2}-[\d]{2}T[\d]{2}:[\d]{2}:[\d]{2}Z)\)\]\]', '\\1'),  # Date value
        ('attachment:(.*)', '{{' + namespace + '\\1|}}')
    )

    for i in range(len(page)):
        line = page[i]
        for item in regexp:
            line = re.sub(item[0], item[1], line)
        page[i] = line
    return page

def print_help():
    print "Usage: moinconv.py <moinmoin pages directory> <output directory>"
    print "Convert MoinMoin pages to DokuWiki."
    sys.exit(0)

def print_parameter_error():
    print >> sys.stderr, 'Incorrect parameters! Use --help switch to learn more.'
    sys.exit(1)

def fix_name(filename):
    filename = filename.lower()
    filename = filename.replace('(2d)', '-')          # hyphen
    filename = filename.replace('(20)', '_')          # space->underscore
    filename = filename.replace('(2e)', '_')          # decimal point->underscore
    filename = filename.replace('(29)', '_')          # )->underscore
    filename = filename.replace('(28)', '_')          # (->underscore
    filename = filename.replace('.', '_')             # decimal point->underscore
    filename = filename.replace('(2c20)', '_')        # comma + space->underscore
    filename = filename.replace('(2028)', '_')        # space + (->underscore
    filename = filename.replace('(2920)', '_')        # ) + space->underscore
    filename = filename.replace('(2220)', 'inch_')    # " + space->inch + underscore
    filename = filename.replace('(3a20)', '_')        # : + space->underscore
    filename = filename.replace('(202827)', '_')      # space+(+'->underscore
    filename = filename.replace('(2720)', '_')        # '+ space->underscore
    filename = filename.replace('(c3bc)', 'ue')       # umlaut
    filename = filename.replace('(c384)', 'Ae')       # umlaut
    filename = filename.replace('(c3a4)', 'ae')       # umlaut
    filename = filename.replace('(c3b6)', 'oe')       # umlaut
    return filename

#
# "main" starts here
#
if len(sys.argv) > 1:
    if sys.argv[1] in ('-h', '--help'):
        print_help()
    elif len(sys.argv) > 2:
        moin_pages_dir = sys.argv[1]
        output_dir = sys.argv[2]
    else:
        print_parameter_error()
else:
    print_parameter_error()

check_dirs(moin_pages_dir, output_dir)

print 'Input dir is: %s.' % moin_pages_dir
print 'Output dir is: %s.' % output_dir

pathnames = get_path_names(moin_pages_dir)
for pathname in pathnames:
    #pdb.set_trace() # start debugging here
    curr_rev = get_current_revision(pathname)
    if not os.path.exists(curr_rev): continue

    page_name = basename(pathname)
    if page_name.count('MoinEditorBackup') > 0: continue  # don't convert backups

    curr_rev_desc = file(curr_rev, 'r')
    curr_rev_content = curr_rev_desc.readlines()
    curr_rev_desc.close()

    page_name = fix_name(page_name)

    split = page_name.split('(2f)')  # namespaces
    count = len(split)
    dateiname = split[-1]

    dir = output_dir
    # changed from attachment_dir = output_dir + '../media/':
    attachment_dir = output_dir + 'media/'
    if not isdir(attachment_dir):
        os.mkdir(attachment_dir)

    if count == 1:
        dir += 'unsorted'
        if not isdir(dir):
            os.mkdir(dir)
        attachment_dir += 'unsorted/'
        if not isdir(attachment_dir):
            os.mkdir(attachment_dir)

    for i in range(0, count - 1):
        dir += split[i] + '/'
        if not isdir(dir):
            os.mkdir(dir)
        attachment_dir += split[i] + '/'
        if not isdir(attachment_dir):
            os.mkdir(attachment_dir)

    if count == 1:
        str = 'unsorted/' + page_name
        split = str.split('/')
        curr_rev_content = convert_page(curr_rev_content, split)
    else:
        curr_rev_content = convert_page(curr_rev_content, split)

    out_file = os.path.join(dir, dateiname + '.txt')
    out_desc = file(out_file, 'w')
    out_desc.writelines([it.rstrip() + '\n' for it in curr_rev_content if it])
    out_desc.close()

    # pdb.set_trace() # start debugging here
    copy_attachments(pathname, attachment_dir)
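Two details of the attachment handling above can be sketched differently. First, the pattern attachment:(.*) captures everything up to the line feed, so if a page was saved with Windows line endings the carriage return ends up inside the link; this is one plausible cause (an assumption, not verified against real data) of the stray "|}}" mentioned in the file header. Second, calling cp through os.system is what ties the script to Linux. The following Python 3 sketch stops the link at the first whitespace character and copies files with shutil instead. It assumes attachment names contain no spaces, and both function names are only illustrative.

```python
import os
import re
import shutil

# Stop at the first whitespace so "\r" or trailing text is not captured.
ATTACHMENT = re.compile(r'attachment:(\S+)')


def convert_attachment_links(line, namespace):
    """Rewrite 'attachment:name' as a DokuWiki media link in `namespace`."""
    def media_link(match):
        # Lower-case the name to match the lower-cased copy made below.
        return '{{' + namespace + match.group(1).lower() + '|}}'
    return ATTACHMENT.sub(media_link, line)


def copy_attachments(page_dir, attachment_dir):
    """Copy a page's attachments into attachment_dir without calling cp.

    Returns the list of files written, so the caller can log them.
    """
    src_dir = os.path.join(page_dir, 'attachments')
    if not os.path.isdir(src_dir):
        return []
    os.makedirs(attachment_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if os.path.isfile(src):
            dst = os.path.join(attachment_dir, name.lower())
            shutil.copy2(src, dst)
            copied.append(dst)
    return copied
```

Since shutil and os.path take care of path separators, the same code should run on Windows as well as Linux.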
Perl
I've written a more powerful conversion script. It now correctly converts (I think) all syntax from http://moinmo.in/HelpOnEditing except tables (it does not yet convert alignment and spans). You can get the latest version here: just copy all the code from the code block and replace <!/code> with </code>.
Discussion
Why did you switch from MoinMoin to DokuWiki? Just curious, I'm debating between the two and MoinMoin's WYSIWYG editor is very nice, and big sites like fedoraproject.org and ubuntu.com are using MoinMoin. - posted on 1/16/2006
Because MoinMoin is not as stable as it looks? Do you know the Ubuntuusers wiki case? - posted on 04/26/2007
I've added a Perl script which converts all syntax from http://moinmo.in/HelpOnEditing. Please report any errors you find.