Export multiple pages to HTML

Export your page(s) with:

  • the Siteexport Plugin (only for an open wiki)
  • or one of the scripts below

For exporting multiple pages or whole namespaces, have a look at the offline-doku script by Pavel Shevaev.

Unfortunately, offline-DokuWiki does not handle plugin content correctly. Does anyone know a solution for that?
Pavel's script requires PHP > 4.3. For those who don't want to upgrade, change line ~46
from
$tokens = $parser->parse(file_get_contents($file));
to
$fp = fopen($file, "rb");
$buffer = fread($fp, filesize($file));
fclose($fp);
$tokens = $parser->parse($buffer);

Here's an example command line using Pavuk for exporting all pages:

pavuk -dont_leave_site -noRobots -index_name "index.html" -httpad "+X_DOKUWIKI_DO: export_xhtml" -cookie_file cookies.txt -cookie_send -skip_rpattern "(.*\?do=(diff|revisions|backlink|index|export_.*))|feed\.php.*" -tr_chr_chr "?&*:" _ -post_update -fnrules F "*" "%h/%d/%b%E" http://www.dokuwiki.org

Simply change the URL at the end of the command. This command also handles ACL restrictions using a cookie file: copy the “cookies.txt” file from your web browser's profile to allow the script to log in with your credentials.
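If you don't want to copy cookies from your browser, here is a minimal sketch of logging in once with curl to create a reusable Netscape-format cookies.txt. It assumes the standard DokuWiki login form fields u and p (the user/password values are placeholders); wikis with additional auth plugins or CSRF protection may still need a browser-exported cookie file.

curl -c cookies.txt \
     -d "u=albert_einstein" -d "p=emc2" -d "do=login" \
     "http://www.wesavetheworld.com/dokuwiki/doku.php?id=start"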

Here's a quick-and-dirty bash script to export all pages using the export_xhtmlbody option (see above):

#!/bin/bash
 
#DokuWiki Export 0.1 - by Venator85 (venator85[at]gmail[dot]com)
#Warning: DokuWiki's URL rewrite must be turned OFF for this to work, otherwise change line 27 accordingly
 
#USAGE:
# Save this script in an empty dir and run it from a shell:
# sh whatever.sh
 
FTP_DOKU_PATH="ftp://ftp.wesavetheworld.com/dokuwiki" # No trailing slashes!
FTPUSER="albert_einstein"
FTPPASS="emc2"
 
HTTP_DOKU_PATH="http://www.wesavetheworld.com/dokuwiki" # No trailing slashes!
 
wget --ftp-user=$FTPUSER --ftp-password=$FTPPASS --recursive --no-host-directories --cut-dirs=2 "$FTP_DOKU_PATH/data/pages/"
 
SLASH='/'
COLON=':'
mkdir "./exported"
for i in `find pages/ -type f`
do
	PAGE=${i#"pages/"}
	PAGE=${PAGE%".txt"}
	PAGE=${PAGE//$SLASH/$COLON}
 
	wget -O - "$HTTP_DOKU_PATH/doku.php?do=export_xhtmlbody&id=$PAGE" > "./exported/$PAGE.htm"
done
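If URL rewriting is enabled on your wiki, the final wget line (the one the header comment of the script refers to) needs a different URL. A hedged sketch of the two common variants, untested here:

	# userewrite = 2 (internal rewriting):
	wget -O - "$HTTP_DOKU_PATH/doku.php/$PAGE?do=export_xhtmlbody" > "./exported/$PAGE.htm"
	# userewrite = 1 (.htaccess rewriting):
	wget -O - "$HTTP_DOKU_PATH/$PAGE?do=export_xhtmlbody" > "./exported/$PAGE.htm"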

And here's a not-so-dirty PHP-CLI script that does the same without those problems. You need command-line access to the server (e.g. SSH) and PHP-CLI installed. Copy the script to the DokuWiki root, cd to that directory and run /usr/bin/php export.php (or whatever you called the file) → it will show you its help.

export.php
#!/usr/bin/php
<?php //DokuWiki exporter - copylefted by Harvie.cz 2o1o (copy this file to dokuwiki root)
if(!defined('DOKU_INC')) define('DOKU_INC',dirname(__FILE__).'/');
require_once(DOKU_INC.'inc/init.php');
require_once(DOKU_INC.'inc/common.php');
require_once(DOKU_INC.'inc/events.php');
require_once(DOKU_INC.'inc/parserutils.php');
require_once(DOKU_INC.'inc/auth.php');
 
function p_file_xhtml($id, $excuse=false){
    if(@file_exists($id)) return p_cached_output($id,'xhtml',$id);
    return p_wiki_xhtml($id, '', $excuse);
}
 
if($argc > 1) {
  array_shift($argv);
  foreach($argv as $file) echo p_file_xhtml($file, false);
} else { 
  if(!isset($argv[0])) $argv[0] = __FILE__;
  echo "<h1>This is NOT web application, this is PHP-CLI application (for commandline)</h1><pre>\n";
  echo "Note that you will probably need to install php-cgi package. Check if you have 'php' command on your system\n";
  echo "php-cgi binary is commonly placed in /usr/bin/php\n\n";
  echo "Usage examples:\n";
  echo "\tphp ".$argv[0]." start\n\t\t- export single page 'start'\n";
  echo "\tphp ".$argv[0]." start > start.html\n\t\t- export single page 'start' to file start.html\n";
  echo "\tphp ".$argv[0]." start wiki:syntax\n\t\t- export multiple pages\n";
  echo "\tphp ".$argv[0]." data/pages/start.txt\n\t\t- export single page using filename\n";
  echo "\tphp ".$argv[0]." data/pages/wiki/*\n\t\t- export whole namespace 'wiki'\n";
  echo "\tphp ".$argv[0]." $(find ./data/pages/wiki/)\n\t\t- export whole namespace 'wiki' and it's sub-namespaces\n";
  echo "\tphp ".$argv[0]." $(find ./data/pages/) > dump.html\n\t\t- dump whole wiki to file dump.html\n";
  echo "\nOnce you have HTML dump you need, you can add optional CSS styles or charset-encoding header to it,\n";
  echo "then you are ready to distribute it, or (eg.) convert it to PDF using htmldoc, OpenOffice.org or html2pdf webservice.\n\n";
}

BTW, I've just realized that it's not secure to keep this script in the DokuWiki root with register_globals = on, since somebody could set $argv through the web and bypass the ACL to read any page. So if you don't want all of your pages to be publicly readable, disable register_globals in php.ini, or at least deny access to this file in .htaccess.
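For example, a sketch of the .htaccess approach, run from the DokuWiki root (assumes Apache 2.4 syntax; on Apache 2.2 use "Order allow,deny" / "Deny from all" instead):

# block direct web access to export.php
cat >> .htaccess <<'EOF'
<Files "export.php">
    Require all denied
</Files>
EOF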

And if you want to export each page to a single file instead of one big dump, you can use a command like this:

mkdir dump; for i in $(ls -1 data/pages/); do php export.php data/pages/$i > dump/$i.html; done
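Note that the ls in that one-liner only picks up top-level pages. A hedged variant that also walks namespaces (page IDs normally contain no spaces, so the unquoted find output is acceptable here):

mkdir -p dump
for f in $(find data/pages/ -type f -name '*.txt'); do
	page=${f#data/pages/}   # strip the data/pages/ prefix
	page=${page%.txt}       # strip the .txt extension
	page=${page//\//:}      # namespace slashes become colons
	php export.php "$f" > "dump/$page.html"
done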

but maybe you will find the siteexport plugin more convenient for this…

→ see also related page offline-dokuwiki.sh
