====== DocSearch Plugin ======

---- plugin ----
description: Search through your uploaded documents
author     : Dominik Eckelmann
email      : dokuwiki@cosmocode.de 
type       : action
lastupdate : 2016-07-18
compatible : Hogfather, 2009-08-01, 2013-05-10
depends    : 
conflicts  : 
similar    : elasticsearch
tags       : search

downloadurl: https://github.com/cosmocode/docsearch/zipball/master
bugtracker : https://github.com/cosmocode/docsearch/issues
sourcerepo : https://github.com/cosmocode/docsearch
donationurl: 
----

This plugin allows you to search through your uploaded documents. It is integrated into the default DokuWiki search. Just fill in a search string and start to search.

:!: A probably better alternative to this plugin, is the [[plugin:elasticsearch|elasticsearch Plugin]] with its ability to index documents.

[[https://www.cosmocode.de/en/open-source/dokuwiki-plugins/|{{ https://www.cosmocode.de/static/img/dokuwiki/dwplugins.png?recache|A CosmoCode Plugin}}]]

===== Download and Installation =====

Search and install the plugin using the [[plugin:extension|Extension Manager]]. Refer to [[:Plugins]] on how to install plugins manually.

==== Changes ====

{{rss>https://github.com/cosmocode/docsearch/commits/master.atom author date}}

===== Index Updating and Cronjob =====

This plugin creates it's own index of documents, similar but separate from DokuWiki's own index. Only documents that have been indexed will be found by the plugin, so the index has to be updated periodically. 

The index is built by the ''dokuwiki/lib/plugins/docsearch/cron.php'' script. You need to set up a scheduled job to call this command periodically. Eg. once a day. This can be done using cron on Linux, scheduled tasks on Windows or online cron job services like [[https://www.easycron.com/cron-job-tutorials/how-to-set-up-cron-job-for-dokuwiki-docsearch|easycron]].

When using your operating system's scheduler, be aware that PHP settings may differ between command line and web server execution. This is important if you want to increase the ''memory_limit'' of your PHP configuration (see [[phpfn>ini.core]]).

Note: if you run a DokuWiki [[:farm]], you need to run the cronjob for each animal separately, passing the animal's name as first parameter to the script.

===== Converters =====

The plugin works by converting the documents to index into text files first. To do this, it relies on external converters for each file type. These have to be set up in the ''dokuwiki/lib/plugins/docsearch/conf/converter.php'' file.

Either edit it in your text editor of choice or use the [[plugin:confmanager|ConfManager Plugin]].

Each line in that file configures one file extension and the converter call to use. The abstract syntax is
<code>
fileextension /path/to/converter -with_calls_to_convert --from %in% --to %out%
</code>

''%in%'' refers to the input document, ''%out%'' is the resulting text file. For converters writing their results to STDOUT, be sure to redirect it to the ''%out%'' file.

Below is a typical configuration. The comments show how to install the tools on a Debian linux system.

<code bash>
#<?php die(); ?>
pdf   /usr/bin/pdftotext -enc UTF-8   %in% %out%      # apt install poppler-utils
doc   /usr/bin/catdoc -d UTF-8        %in% > %out%    # apt install catdoc
ppt   /usr/bin/catppt -d UTF-8        %in% > %out%    # apt install catdoc
xls   /usr/bin/xls2csv -d UTF-8       %in% > %out%    # apt install catdoc
docx  /usr/bin/docx2txt               %in% %out%      # apt install docx2txt
xlsx  /usr/bin/incsv                  %in% > %out%    # apt install csvkit
pptx  /pathto/pptx2txt.sh             %in% > %out%    # curl https://raw.githubusercontent.com/welcheb/pptx2txt.sh/refs/heads/master/pptx2txt.sh -o /pathto/pptx2txt.sh
odt   /usr/bin/odt2txt         %in% --output=%out%    # apt install odt2txt
</code>

Below is a list of user contributed setups:

  * [[~jodconverter|jodconverter & OpenOffice]]
  * [[~freemind|FreeMind Mind Maps]]
  * [[~zipfiles|Zip Files]]
  * [[~windows|Setup on a Windows Server]]