Maintenance

Here are some tips to automate some of the day-to-day maintenance needed or recommended for DokuWiki.

See also the plugins: cleanup and clearhistory

Keep Blacklist up to date

See blacklist on how to set up a cronjob to keep the Anti-Spam Blacklist current.

Automatic cleanup script

It is recommended to set up some cleanup process for busy DokuWikis. The following Bash script serves as an example. It deletes old revisions from the attic, removes stale lock files and empty directories, and cleans up the cache 1).

cleanup.sh
#!/bin/bash
 
cleanup()
{
    local data_path="$1"        # full path to data directory of wiki
    local retention_days="$2"   # number of days after which old files are to be removed
 
    # purge files older than ${retention_days} days from attic and media_attic (old revisions)
    find "${data_path}"/{media_,}attic/ -type f -not -name _dummy -mtime +"${retention_days}" -delete
 
    # remove stale lock files (-mtime +1 matches files last modified at least two days ago)
    find "${data_path}"/locks/ -name '*.lock' -type f -mtime +1 -delete
 
    # remove empty directories
    find "${data_path}"/{attic,cache,index,locks,media,media_attic,media_meta,meta,pages,tmp}/ \
        -mindepth 1 -type d -empty -delete
 
    # remove files older than ${retention_days} days from the cache
    # only stderr is discarded here; discarding stdout too would leave the test always empty
    if test -n "$(find "${data_path}"/cache/?/ -maxdepth 1 -print -quit 2> /dev/null)"
    then
        find "${data_path}"/cache/?/ -type f -not -name _dummy -mtime +"${retention_days}" -delete
    fi
}
 
 
# cleanup DokuWiki installations (path to datadir, number of days)
# some examples:
 
cleanup /home/user1/htdocs/doku/data    256
cleanup /home/user2/htdocs/mywiki/data  180
cleanup /var/www/superwiki/data         180

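To run it automatically, set up a cronjob. The following example calls the script every day at 7 minutes past midnight. The user field (root) belongs to the system crontab format (/etc/crontab or files in /etc/cron.d); in a per-user crontab (crontab -e), leave it out.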

7 0 * * *   root  /full/path/to/cleanup.sh

Be sure to set everything up correctly - you don't want to delete the wrong things, do you?
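
One way to check the setup before the first real run is a dry run that only prints what would be removed. The following is a minimal sketch and not part of the script above: the function name cleanup_dry_run is made up for this example, and it assumes GNU find and the same directory layout as cleanup.sh, with -delete swapped for -print.

```shell
#!/bin/bash

# Hypothetical helper (not part of cleanup.sh): print the files that
# cleanup() would delete, without deleting anything.
cleanup_dry_run()
{
    local data_path="$1"        # full path to data directory of wiki
    local retention_days="$2"   # retention period you plan to pass to cleanup()
    local dir

    # old revisions in attic and media_attic
    for dir in attic media_attic
    do
        [ -d "${data_path}/${dir}" ] || continue
        find "${data_path}/${dir}/" -type f -not -name _dummy \
            -mtime +"${retention_days}" -print
    done

    # stale lock files
    if [ -d "${data_path}/locks" ]
    then
        find "${data_path}/locks/" -name '*.lock' -type f -mtime +1 -print
    fi

    # old cache files (cache/ is split into single-character subdirectories)
    for dir in "${data_path}"/cache/?/
    do
        [ -d "${dir}" ] || continue
        find "${dir}" -type f -not -name _dummy -mtime +"${retention_days}" -print
    done
}

# example call (the path is a placeholder):
# cleanup_dry_run /var/www/superwiki/data 180
```

If the printed list looks right, the real cleanup can be run with the same two arguments.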

Windows -- waRmZip

waRmZip, available on SourceForge, is a script for cleaning out old files on Windows systems. Write a batch file to call it, and schedule it to run every day. And as the man says: 'Be sure to set everything up correctly' ;-)

I took the above suggestion to use waRmZip and wrote this batch file - maybe it will help out.

My favorite way to run cron jobs on Windows is PyCron.

dw-cleanup.bat
@echo off
set waRmZip="c:\Program Files\waRmZip\waRmZip.wsf"
set wikiHome="c:\path\to\htdocs\wiki\data"

rem Move attic files older than 30 days to an archive location
%waRmZip% %wikiHome%\attic /ma:30 /md:%wikiHome%_archive\attic /r /q

rem Option: delete attic files older than 30 days
rem %waRmZip% %wikiHome%\attic /da:30 /dc /r /q

rem Delete empty attic directories; waRmZip requires the /da flag when using
rem /df, so add filter for *.zzz so /da doesn't remove any files
%waRmZip% %wikiHome%\attic /r /da:31 /df /fo:*.zzz /q

rem Remove stale lock files
%waRmZip% %wikiHome%\locks /da:1 /fo:*.lock /r /q

rem Remove empty directories
%waRmZip% %wikiHome%\pages /da:365 /df /fo:*.zzz /r /q

Windows -- batch script

This is another Windows command shell script for maintaining a DokuWiki installation in a Windows environment. The script uses the free and open source utility find, which can be obtained via http://gnuwin32.sourceforge.net/

All paths are read from the DokuWiki config file. Files to be deleted can be shown before deletion, to prevent accidental deletion of files.

maintain_dokuwiki.cmd
@echo off
setlocal

REM This script performs some basic DokuWiki maintenance

REM Copyright (C) 2012 Peter Mosmans

REM This program is free software: you can redistribute it and/or modify
REM it under the terms of the GNU General Public License as published by
REM the Free Software Foundation, either version 3 of the License, or
REM (at your option) any later version.

REM This program is distributed in the hope that it will be useful,
REM but WITHOUT ANY WARRANTY; without even the implied warranty of
REM MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
REM GNU General Public License for more details.

REM You should have received a copy of the GNU General Public License
REM along with this program. If not, see <http://www.gnu.org/licenses/>.

REM Please contact support AT go-forward.net for questions and/or feedback

 
REM Last modification: 02-05-2012 (Peter Mosmans)
set NAME=maintain_dokuwiki
set VERSION=0.13

REM path to the dokuwiki configuration file enclosed in double quotes
set DOKUWIKICONFIG="\full\filename\of\your\dokuwiki\conf\local.php"
REM preserve all files that are younger than DAYSTOKEEP days
set DAYSTOKEEP=31
REM set to true if you want to show results and pause before deleting any files
set SHOWRESULTSFIRST=true
set FIND=c:\tools\find.exe
set TEMPFILE=%TMP%\%NAME%.tmp

REM see if all tools are present
for %%i in (%FIND%) do (
    if not exist %%i (
        echo sorry, could not find %%i - exiting
        echo you can obtain the free GNU tools from gnuwin32.sourceforge.net
        exit /b
    )
)

REM see if the dokuwiki configuration file can be read
if not exist %DOKUWIKICONFIG% (
    echo sorry, could not find DokuWiki config at %DOKUWIKICONFIG% - exiting
    exit /b
)

REM grab the correct paths from the configuration file
for /f "usebackq delims=' tokens=2,4" %%i in (%DOKUWIKICONFIG%) do (
    if /i "%%i"=="datadir" set DOCUMENTROOT=%%j
    if /i "%%i"=="olddir" set ATTICDIR=%%j
    if /i "%%i"=="cachedir" set CACHEDIR=%%j
    if /i "%%i"=="lockdir" set LOCKDIR=%%j
)
if "%DOCUMENTROOT%" == "" (
    echo sorry, could not find datadir variable in %DOKUWIKICONFIG%, exiting...
    exit /b
)

REM use defaults if the paths are not specified
if /i "%ATTICDIR%" == "" set ATTICDIR=%DOCUMENTROOT%/attic
if /i "%LOCKDIR%" == "" set LOCKDIR=%DOCUMENTROOT%/locks
if /i "%CACHEDIR%" == "" set CACHEDIR=%DOCUMENTROOT%/cache

REM purge files older than DAYSTOKEEP days from the attic
%FIND% "%ATTICDIR%" -type f -mtime +%DAYSTOKEEP% -print > %TEMPFILE%
REM remove locks older than one day
%FIND% "%LOCKDIR%" -name "*.lock" -type f -mtime +1 -print >> %TEMPFILE%
REM remove cache files older than DAYSTOKEEP
%FIND% "%CACHEDIR%" -type f -mtime +%DAYSTOKEEP% -print >> %TEMPFILE%

REM show results, if any
for /f "usebackq" %%i in (`%FIND% "%TMP%" -size +1 -name %NAME%.tmp`) do (
    if /i "%SHOWRESULTSFIRST%"=="TRUE" (
        echo files to be deleted:
        type %TEMPFILE%
        pause
    )
    REM the inner loop uses its own variable so it does not shadow the outer %%i
    for /f "delims=#" %%f in (%TEMPFILE%) do del "%%f"
)

REM clean up
del /f /q %TEMPFILE%
 
endlocal

Keeping Playground Clean

To keep the wiki's Playground and other pages clean, use a cron job that restores them to their original content at regular intervals, e.g. every 30 minutes.

Example: Restore Playground every 30 min:

0,30 * * * * cp -f /path/to/savedwiki/data/pages/playground/playground.txt /path/to/dokuwiki/data/pages/playground/

Example: Restore all pages in namespace “wiki” every 30 min:

0,30 * * * * cp -rf /path/to/savedwiki/data/pages/wiki/. /path/to/dokuwiki/data/pages/wiki/

Problems with CAPTCHA plugin

Combining the CAPTCHA plugin with the playground restore described above can leave the playground uneditable.

When this occurs, the problem can be resolved by removing the playground's files in the meta folder with the following cronjob.

Example: Delete Playground metafiles every 30 min:

0,30 * * * * rm -f /path/to/dokuwiki/data/meta/playground/playground.*
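
The restore and the metadata removal can also be combined in one script, so that both always happen together. The following is a rough sketch under assumptions: the function name reset_playground and the script name in the comment are invented for this example, and the paths are the same placeholders as in the cron lines above.

```shell
#!/bin/bash

# Hypothetical helper: restore the playground page from a saved copy and
# remove its metadata files in one step.
reset_playground()
{
    local saved_data="$1"   # data directory holding the pristine copy
    local live_data="$2"    # data directory of the live wiki

    # restore the page text; stop if the saved copy cannot be copied
    cp -f "${saved_data}/pages/playground/playground.txt" \
          "${live_data}/pages/playground/" || return 1

    # remove the metadata files of the playground page only
    rm -f "${live_data}"/meta/playground/playground.*
}

# example call from a script that cron runs every 30 minutes, e.g.
#   0,30 * * * * /full/path/to/reset_playground.sh
# reset_playground /path/to/savedwiki/data /path/to/dokuwiki/data
```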

When cronjob is not available

If your hosting provider doesn't allow cronjobs, consider using the cronojob plugin instead.

Discussion

Could you please provide PHP versions of these scripts to use with the cronojob plugin?


Regarding the above cleanup script which uses file modification time (mtime), wouldn't it be safer to use the timestamp in the filename to determine if a file in the attic should be deleted or not?

On the one hand, I'd say it could be done but it's of course trickier to set up. For many installations it will be fine to use mtime. On the other hand, some might want to make sure they clean up old files no matter what (e.g. files left after a crash or critical PHP error).
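
To illustrate the filename-based approach, here is a minimal sketch. It assumes that page revisions in the attic are named <page>.<unix timestamp>.txt, optionally followed by .gz or .bz2; check this against your own attic before relying on it. The function names attic_timestamp and list_expired_attic are invented for this example, and the sketch only lists candidates instead of deleting them. Files in media_attic are not covered.

```shell
#!/bin/bash

# Print the revision timestamp embedded in an attic file name, or nothing
# if the name does not match the assumed pattern <page>.<timestamp>.txt[.gz|.bz2].
attic_timestamp()
{
    local name="${1##*/}"   # strip the directory part
    if [[ "${name}" =~ \.([0-9]+)\.txt(\.gz|\.bz2)?$ ]]
    then
        printf '%s\n' "${BASH_REMATCH[1]}"
    fi
}

# List attic files whose embedded timestamp is older than the retention period.
list_expired_attic()
{
    local data_path="$1"        # full path to data directory of wiki
    local retention_days="$2"   # number of days to keep old revisions
    local cutoff=$(( $(date +%s) - retention_days * 86400 ))
    local file ts

    find "${data_path}/attic/" -type f -not -name _dummy -print0 |
    while IFS= read -r -d '' file
    do
        ts="$(attic_timestamp "${file}")"
        # files without a recognizable timestamp are skipped, not reported
        if [ -n "${ts}" ] && [ "${ts}" -lt "${cutoff}" ]
        then
            printf '%s\n' "${file}"
        fi
    done
}

# example call (the path is a placeholder):
# list_expired_attic /var/www/superwiki/data 180
```

Skipping files without a timestamp is a deliberate choice in this sketch: leftovers from a crash would then still need the mtime-based cleanup.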


Could someone add the appropriate line for cache maintenance to the Windows waRmZip script?


Does the cleanup Plugin handle all the above tasks? Would it be recommended over running these scripts?


This is an example of a PHP script to clean old cache files. It is useful when shell scripts cannot be run.

cleanup.php
<?php 
/* 
 * mrlemonade ~ 
 */ 
function getFilesFromDir($dir) { 
  $files = array(); 
  if ($handle = opendir($dir)) { 
    while (false !== ($file = readdir($handle))) { 
        if ($file != "." && $file != "..") { 
            if(is_dir($dir.'/'.$file)) { 
                $dir2 = $dir.'/'.$file; 
                $files[] = getFilesFromDir($dir2); 
            } 
            else { 
              $files[] = $dir.'/'.$file; 
            } 
        } 
    } 
    closedir($handle); 
  } 
  return array_flat($files); 
} 
function array_flat($array) { 
  // start with an empty array so array_merge() and the caller always get an array
  $tmp = array(); 
  foreach($array as $a) { 
    if(is_array($a)) { 
      $tmp = array_merge($tmp, array_flat($a)); 
    } 
    else { 
      $tmp[] = $a; 
    } 
  } 
  return $tmp; 
} 
 
// Define the folder to clean
$cacheFolder = 'data/cache';
// Here you can define after how many
// days the files should get deleted
$expire_time = 5; 
// Walk through all files below the folder
foreach (getFilesFromDir($cacheFolder) as $Filename) {
        // Read the inode change time (on Unix, filectime is not a creation time)
        $FileChangeTime = filectime($Filename);
        // Calculate file age in seconds
        $FileAge = time() - $FileChangeTime; 
        // Is the file older than the given time span?
        if ($FileAge > ($expire_time*60*60*24 )) {
            // Now do something with the older files...
            print "The file $Filename is older than $expire_time days \n";
            // For example deleting files:
            // unlink($Filename);
        }
}
echo 'ran';
?>

use this at your own risk. — S.C. Yoo 2012/02/10 12:49


Cheers, I'd like to add that it is a good idea to clean up orphaned meta data, don't you think? I do the following (in an R script):

  1. list all files in the pages directory recursively
  2. add a column 'pagename' to this list that contains the file name again but without the base directory
  3. in pagename exchange '/' (or '\') with ':' and remove the file extension
  4. do the same for the meta directory + exclude some additional files
  5. remove all entries from the meta-list for which the page name is in the pages-list
  6. delete all files left in the meta list

Of course one could add a time constraint to it so that you don't lose metadata immediately.

Clemo 2016/09/23 sometime
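
The steps above could look roughly like this as a shell function instead of an R script. This is only a sketch under assumptions: the function name list_orphaned_meta is invented, pages are assumed to live in pages/ with a .txt extension, and meta files whose names start with an underscore are assumed to be the "additional files" to exclude. Comparing relative paths takes the place of converting '/' to ':'. The function prints candidates and deletes nothing.

```shell
#!/bin/bash

# List meta files that have no corresponding page (print only, no deletion).
list_orphaned_meta()
{
    local data_path="$1"      # full path to data directory of wiki
    local min_age_days="$2"   # time constraint: only report meta files older than this
    local file rel page

    find "${data_path}/meta/" -type f -not -name '_*' \
        -mtime +"${min_age_days}" -print0 |
    while IFS= read -r -d '' file
    do
        rel="${file#"${data_path}/meta/"}"   # path relative to meta/
        page="${rel%.*}"                      # drop the file extension
        if [ ! -e "${data_path}/pages/${page}.txt" ]
        then
            printf '%s\n' "${file}"
        fi
    done
}

# example call (the path is a placeholder):
# list_orphaned_meta /var/www/superwiki/data 7
```

Review the printed list before piping it into any delete command.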

1) For a discussion of cache maintenance see also the forum discussion.
tips/maintenance.txt · Last modified: by staze

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International