====== How to convert docs to DokuWiki ======

I was googling for conversion tools and, luckily, came across this article: http://www.linux.com/articles/61713

====== Main goal : Magic conversion in bureaucratic environment ======

  *.doc ---> *.html ---> *.txt (wiki syntax)

Here is the overall scheme used to do this:

==== Step 0 | Preparing the environment ====

=== Dependencies : ===

  * Linux!
  * OpenOffice.org: http://www.openoffice.org/
  * Java and JODConverter from http://www.artofsolving.com
  * Perl and the WikiConverter module: http://search.cpan.org/src/DIBERRI/HTML-WikiConverter-0.61/
  * Apache, PHP, and an out-of-the-box DokuWiki ... or MediaWiki.
  * Optional: the FCKW extension for DokuWiki: [[plugin:fckw]]

=== Code needed ===

Three files:

  - The main bash script: oocwiki.sh [[doc_to_wiki_syntax#oocwiki.sh|The code]].
  - The cleaning bash script: cleanfolder.sh [[doc_to_wiki_syntax#cleanfolder.sh|The code]].
  - The renaming and conversion-loop Perl script: oocwiki.pl [[doc_to_wiki_syntax#oocwiki.pl|The code]].

Copy the code and create these three files in one folder on your computer.

=== Folders : ===

Put your bunch of MS Word files in a folder.

MS Word source folder:

  ENWOLRD=/home/massou/Documents/oldies/

Then set the other folders and files we need in the variables of the bash script.

Temp folder:

  TMPOOCWIKI=/tmp/oocwiki/

JODConverter jar:

  JODCON=/home/massou/Documents/perl/jodconverter-2.2.1/lib/jodconverter-cli-2.2.1.jar

DokuWiki transfer folders:

  OUTWIKI=/srv/www/htdocs/dokuwiki/data/pages/outdoc/
  OUTMEDIA=/srv/www/htdocs/dokuwiki/data/media/outdoc/

Then use this bash script:

==oocwiki.sh==

  #!/bin/bash
  # script oocwiki.sh
  #
  # Usage (as root): bash oocwiki.sh
  # This script converts a folder of MS Word files into DokuWiki pages.
  # Change the values of the variables to make the script work for you:
  ENWOLRD=/home/massou/Documents/oldies/
  TMPOOCWIKI=/tmp/oocwiki/
  JODCON=/home/massou/Documents/perl/jodconverter-2.2.1/lib/jodconverter-cli-2.2.1.jar
  OUTWIKI=/srv/www/htdocs/dokuwiki/data/pages/outdoc/
  OUTMEDIA=/srv/www/htdocs/dokuwiki/data/media/outdoc/
  #
  if [ "$(whoami)" != 'root' ]; then
      echo "Must be root to run $0"
      exit 1
  fi
  #
  # Check the folders and create the missing ones.
  parameters=("$ENWOLRD" "$TMPOOCWIKI" "$OUTWIKI" "$OUTMEDIA")
  for i in "${parameters[@]}"; do
      if [ ! -e "${i}" ]; then
          echo "${i} doesn't exist"
          mkdir -p "${i}"
          echo "${i} created"
      elif [ -f "${i}" ]; then
          echo "${i} is a file, not a folder"
      elif [ -d "${i}" ]; then
          echo "${i} seems ready"
      fi
  done
  #
  # The JODConverter jar must exist.
  if [ ! -e "$JODCON" ]; then
      echo "$JODCON doesn't exist"
      exit 1
  elif [ -f "$JODCON" ]; then
      echo "$JODCON is ready"
  fi
  #
  # Start OpenOffice.org as a service if it is not already running.
  if ! pgrep soffice > /dev/null; then
      echo "soffice doesn't seem to be running, starting it..."
      soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard &
  fi
  #
  ### Cleaning and copy: empty the temp and output folders, then copy the sources.
  parameters=("$TMPOOCWIKI" "$OUTWIKI" "$OUTMEDIA")
  for i in "${parameters[@]}"; do
      if [ -e "${i}" ]; then
          echo "${i} exists, emptying it"
          rm -R "${i}"
          mkdir -p "${i}"
          echo "${i} reset"
      fi
  done
  cp -R "$ENWOLRD"/* "$TMPOOCWIKI"
  #
  ################### Step 1 Some cleaning ##################
  bash ./cleanfolder.sh "$TMPOOCWIKI"
  #
  ################### Step 2-3 Time of perl #################
  perl oocwiki.pl "$TMPOOCWIKI" "$JODCON"
  #
  ################### Step 4 Copy of the files ##############
  cp -R "$TMPOOCWIKI"/* "$OUTWIKI"
  cp -R "$TMPOOCWIKI"/* "$OUTMEDIA"
  #
  ################### Step 5 Time for ACL ###################
  parameters=("$OUTWIKI" "$OUTMEDIA")
  for i in "${parameters[@]}"; do
      chown -R wwwrun:www "${i}"
      chmod -R 775 "${i}"
  done

==== Step 1 | Cleaning the MS Word environment ====

  /////*.doc

A Bash (or Perl) script renames folders, subfolders and files from their Windows-style names to simpler Unix-like ones.

==cleanfolder.sh==

  #!/bin/bash
  # file cleanfolder.sh
  # Convert filenames to lowercase
  # and replace characters recursively
  #####################################
  if [ -z "$1" ]; then echo "Give target directory"; exit 1; fi
  # -depth renames the content of a folder before the folder itself;
  # -mindepth 1 leaves the target directory itself untouched.
  find "$1" -mindepth 1 -depth | while IFS= read -r file; do
      directory=$(dirname "$file")
      oldfilename=$(basename "$file")
      newfilename=$(echo "$oldfilename" | tr 'A-Z' 'a-z' | tr ' ' '_' | sed 's/_-_/-/g')
      if [ "$oldfilename" != "$newfilename" ]; then
          # -n never overwrites; -i would read its answer from the find pipe.
          mv -n "$directory/$oldfilename" "$directory/$newfilename"
          echo "$directory/$oldfilename ---> $directory/$newfilename"
      fi
  done
  exit 0

==== Step 2 : ====

  lower_case/without_blank_space.doc ---> soffice as a service + JODConverter ---> *.html

==oocwiki.pl==

  #!/usr/bin/perl
  # oocwiki.pl: rename every .doc file under a folder, then convert it
  # to HTML (JODConverter) and to DokuWiki syntax (html2wiki).
  use strict;
  use warnings;
  use utf8;                       # the tr/// table below contains accented characters
  use Encode qw(decode encode);
  use File::Basename;
  use File::Find;
  #
  my $time = localtime;
  print "The time is now $time\n";
  my ($chemin, $jod) = @ARGV;
  die "Usage: $0 folder jodconverter-cli.jar\n" unless defined $jod;
  print "$chemin\n$jod\n";
  #
  find({ wanted => \&Wanted, no_chdir => 1 }, $chemin);
  #
  sub Wanted {
      my $fullname = $File::Find::name;
      return unless -f $fullname;
      my ($name, $path, $suffix) = fileparse($fullname, qr{\.[^.]*});
      return unless $suffix eq '.doc';
      # Step 1 renaming, again: lowercase, no blanks, no accents.
      # Assumes that file names are stored as UTF-8.
      my $base2 = lc(decode('UTF-8', $name . $suffix));
      $base2 =~ tr/ /_/;
      $base2 =~ tr/ÀÁÂÃÄÅàáâãäåÒÓÔÕÖØòóôõöøÈÉÊËèéêëÇçÌÍÎÏìíîïÙÚÛÜùúûüÿÑñ/aaaaaaaaaaaaooooooooooooeeeeeeeecciiiiiiiiuuuuuuuuynn/;
      my $newname = $path . encode('UTF-8', $base2);
      print "$fullname ---> $newname\n";
      if ($fullname ne $newname) {
          rename($fullname, $newname)
              or do { warn "Couldn't rename $fullname to $newname: $!\n"; return };
      }
      # Prepare the output names for the conversion.
      (my $newname2 = $newname) =~ s/\.doc$/.html/;
      (my $newname3 = $newname) =~ s/\.doc$/.txt/;
      # Step 2: .doc ---> .html (soffice must be listening on port 8100).
      system('java', '-jar', $jod, $newname, $newname2) == 0
          or warn "jodconverter failed on $newname\n";
      # Step 3: .html ---> .txt in DokuWiki syntax.
      open(my $wiki, '-|', 'html2wiki', '--dialect', 'DokuWiki', $newname2)
          or do { warn "Couldn't run html2wiki: $!\n"; return };
      open(my $out, '>', $newname3)
          or do { warn "Couldn't write $newname3: $!\n"; return };
      while (my $line = <$wiki>) {
          print {$out} $line;
      }
      close($out);
      close($wiki);
  }

==== Step3 : ====

  *.html ---> HtmlWikiConverter ---> *.txt

==== Step4 : ====

Finally, we simply copy the files into the media and pages folders; that is enough for now. A Perl script could later rewrite the media URLs so that they point to the right media location, and dispatch the media and .txt files to the right places on the server.

==== Step5 : ====

Fix permissions.

====== Command lines in use ======

First you need OpenOffice.org on a Linux box. Open a terminal and run:

  soffice -headless -accept="socket,port=8100;urp;"

See http://www.artofsolving.com/node/10 (don't forget the CLI version), then:

  java -jar jodconverter-cli-2.2.1.jar A.doc A.pdf
  java -jar jodconverter-cli-2.2.1.jar A.doc A.html

See http://search.cpan.org/src/DIBERRI/HTML-WikiConverter-0.61/README, then:

  html2wiki --dialect DokuWiki A.html > output.mw