This is an old revision of the document!


Blacklisting

The Internet isn't the place it used to be. Everything good gets corrupted, and so it is with wikis: wiki spam, like spam in blogs and email, is on the rise. If you use DokuWiki on your intranet, this is no problem for you. But if you intend to use it on the open Internet, you may want to blacklist some known spam words.

To use a blacklist in DokuWiki, enable the usewordblock option in the config manager (it is on by default) and edit the conf/wordblock.local.conf file. Have a look at conf/wordblock.conf for the list of existing word blocks. These files contain Perl-compatible regular expressions, one per line; if any of them matches the text being saved, saving is disallowed.
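Before adding a pattern to conf/wordblock.local.conf, it can help to try it against some sample text. The following is a minimal sketch, not part of DokuWiki: the function name check_wordblock is made up for illustration, it assumes that "#" starts a comment in the file, and it uses grep -E (POSIX extended regular expressions) as a rough stand-in for the Perl-compatible matching DokuWiki performs, so patterns that rely on PCRE-only syntax will behave differently.

```shell
#!/bin/sh
# Hypothetical helper, not part of DokuWiki: succeeds (exit status 0) if TEXT
# matches any pattern in a wordblock-style FILE.
# Assumption: grep -E approximates the PCRE matching done by DokuWiki.
check_wordblock() {
    file=$1
    text=$2
    # Drop "#" comments, surrounding whitespace and empty lines,
    # then join the remaining patterns with "|" into one alternation.
    pattern=$(sed -e 's/#.*$//' \
                  -e 's/^[[:space:]]*//' \
                  -e 's/[[:space:]]*$//' \
                  -e '/^$/d' "$file" | paste -s -d '|' -)
    # An empty pattern list blocks nothing.
    [ -n "$pattern" ] || return 1
    # -i: case-insensitive, -q: report only through the exit status.
    printf '%s\n' "$text" | grep -E -i -q -- "($pattern)"
}

# Example use:
#   check_wordblock conf/wordblock.local.conf 'text to test' && echo blocked
```

Joining all patterns into a single alternation mirrors the way the patterns are combined before matching, so one overly broad entry is enough to block otherwise harmless text.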

IP-based blocking can be done with Apache's Deny from directives or the ipban plugin.
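As a rough illustration of the Apache route, the sketch below appends a Deny from line to an .htaccess file. The function name ban_ip and the address 192.0.2.10 are made up for the example. It assumes Apache 2.2-style access control (on Apache 2.4 this needs mod_access_compat) and a server configuration that allows these directives in .htaccess files.

```shell
#!/bin/sh
# Hypothetical helper: append a "Deny from" line for IP to an .htaccess FILE,
# unless exactly that line is already present.
ban_ip() {
    file=$1
    ip=$2
    line="Deny from $ip"
    # -x: match whole lines only, -F: fixed string, -q: quiet
    if [ -f "$file" ] && grep -qxF -- "$line" "$file"; then
        return 0
    fi
    printf '%s\n' "$line" >> "$file"
}

# Example use (path and address are assumptions):
#   ban_ip /var/www/dokuwiki/.htaccess 192.0.2.10
```

Checking for an existing line keeps the file from growing when the same address is banned repeatedly.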

To understand why a certain text was banned for spam, you can use the whyspam plugin to analyze the text.

Blacklist Sources

Updating the blacklist from a public source through a daily cron job is recommended. Here is a list of sources you can use to do so.

Chongqed

The blacklist maintained by the folks at chongqed.org no longer seems to be available. It used to be fetched with:

$> wget http://blacklist.chongqed.org/ -O conf/wordblock.conf

Wikipedia

The nice people at Wikipedia maintain a similar blacklist. You can use the following command for updating your blacklist from this source:

$> curl 'https://meta.wikimedia.org/wiki/Spam_blacklist?action=raw' | grep -v '<pre>' > conf/wordblock.conf
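When an update like this runs unattended from a daily cron job, a failed download would leave an empty conf/wordblock.conf behind, because the redirection truncates the file before any data arrives. The following sketch shows one way to guard against that: download to a temporary file first, then replace the list only if something usable came back. The function name install_blacklist and all paths are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: install a freshly downloaded blacklist SRC as DEST,
# but only if the download produced usable content.
install_blacklist() {
    src=$1
    dest=$2
    tmp="$dest.new"
    # Drop the <pre> wrapper lines. grep exits with 1 when nothing is left;
    # that case is handled by the size check below.
    grep -v '<pre>' "$src" > "$tmp" || true
    # Refuse to replace a working list with an empty one.
    if [ ! -s "$tmp" ]; then
        rm -f "$tmp"
        echo "blacklist update skipped: empty download" >&2
        return 1
    fi
    # Renaming within one directory replaces the old file in a single step.
    mv "$tmp" "$dest"
}

# Example use in a script started daily by cron (paths are assumptions):
#   curl -fsS 'https://meta.wikimedia.org/wiki/Spam_blacklist?action=raw' \
#        -o /tmp/blacklist.raw \
#     && install_blacklist /tmp/blacklist.raw conf/wordblock.conf
```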

Logging of Blocked Attacks

This small change logs blocked attacks to data/meta/wordblock.log, which is also useful for debugging block lists.
Note that it requires modifying an original DokuWiki file, so an upgrade may overwrite it.

File: inc/common.php

Search for these lines:

function checkwordblock($text=''){
[...]
   if(count($re) && preg_match('#('.join('|',$re).')#si',$text,$matches)) {
      // prepare event data
      $data['matches'] = $matches;
      $data['userinfo']['ip'] = $_SERVER['REMOTE_ADDR'];
[...]

Change it to:

function checkwordblock($text=''){
[...]
    if(count($re) && preg_match('#('.join('|',$re).')#si',$text,$matches)) {
        // log the blocked attempt, tab-separated:
        // date, matched text, page ID, user, IP:port, hostname, user agent
        global $ID; // $ID is a DokuWiki global and must be imported into the function scope
        io_saveFile($conf['metadir'].'/wordblock.log', strftime($conf['dformat'])."\t".$matches[0]."\t".$ID."\t".$_SERVER['REMOTE_USER']."\t".$_SERVER['REMOTE_ADDR'].":".$_SERVER['SERVER_PORT']."\t".gethostbyaddr($_SERVER['REMOTE_ADDR'])."\t".$_SERVER['HTTP_USER_AGENT']."\n", true);

        // prepare event data
        $data['matches'] = $matches;
        $data['userinfo']['ip'] = $_SERVER['REMOTE_ADDR'];
[...]
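
Once the log has collected some entries, a quick summary shows which patterns fire most often, which helps when debugging block lists. This is a small sketch under the assumption that the log is tab-separated with the matched text in the second field, as written by the change above; the function name top_blocked_patterns is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count log entries per matched text (second
# tab-separated field) and print them, most frequent first.
top_blocked_patterns() {
    awk -F '\t' '{ count[$2]++ }
        END { for (m in count) printf "%d\t%s\n", count[m], m }' "$1" |
        sort -rn
}

# Example use (path assumes the default data directory):
#   top_blocked_patterns data/meta/wordblock.log | head
```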
blacklist.1358183127.txt.gz · Last modified: 2013-01-14 18:05 by Klap-in

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International