====== Fulltext Index ======

To search the wiki quickly (see [[:search|searching]]), a fulltext index is used.

FIXME documentation in progress - might be wrong.

The index system is designed with the following three premises in mind:

  - PHP execution time is limited (usually 30 seconds, on some hosts less)
  - Memory is limited (we recommend 32MB, but many hosts provide less)
  - Disk space is cheap

===== Structure =====

All parts of the fulltext index are stored in ''data/index'':

  * ''page.idx'' -- a list of all known pages
  * ''w//N//.idx'' -- a list of all known words with a byte length of //N//
  * ''i//N//.idx'' -- word -> page assignments
  * ''pageword.idx'' -- page -> word assignments

==== page.idx ====

This file contains every page that was ever indexed, one page per line. The line number (starting from 0) is considered the **PID**.

Note that pages are not removed from this index when they are deleted. Whether a page still exists is checked at search time.

==== w.idx ====

These files contain a list of all words ever indexed with a byte length of //N//. The line number (starting from 0) is considered the **WID**.

==== i.idx ====

These files contain the actual index, mapping words to the pages they occur on. They also store how often the word occurs on each page, which is used for search ranking. The line numbers correspond to the ones in ''w//N//.idx'', so the ''w'' and ''i'' files for the same length should always have the same number of lines.

Each line contains ''//PID//*//frequency//'' pairs separated by colons. Imagine line 0 of ''i5.idx'' containing this entry:

  0*2:55*5:23*1

This would mean that //word 0// (-> line 0 in ''w5.idx'') occurs 2 times on page 0 (-> line 0 in ''page.idx''), 5 times on page 55 (-> line 55 in ''page.idx'') and once on page 23 (-> line 23 in ''page.idx'').

==== pageword.idx ====

This index is used to remove words from the index that are no longer on a changed or deleted page. The line number (starting from 0) is the **PID** from ''page.idx''.
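
Removing a page from a word's entry means rewriting that word's line in the matching ''i//N//.idx'' file. The following is a minimal Python sketch of that rewrite, assuming the ''PID*frequency'' line format described for ''i.idx''. It is not DokuWiki's PHP implementation: ''parse_index_line'' and ''update_index_line'' are hypothetical names, and appending the updated entry at the end of the line is an assumption of this sketch.

<code python>
def parse_index_line(line):
    """Decode one i<N>.idx line into {PID: frequency}."""
    entries = {}
    for pair in line.strip().split(":"):
        if pair:  # an empty line means the word occurs on no page
            pid, freq = pair.split("*")
            entries[int(pid)] = int(freq)
    return entries


def update_index_line(line, pid, freq):
    """Return the line with page `pid` set to `freq`; freq == 0 removes the page.

    Hypothetical helper: entries for other pages keep their order, and the
    updated entry (if any) is appended at the end of the line.
    """
    entries = parse_index_line(line)
    entries.pop(pid, None)  # drop this page's old entry, if present
    if freq > 0:
        entries[pid] = freq  # re-add with the new frequency
    return ":".join(f"{p}*{f}" for p, f in entries.items())
</code>

For example, ''update_index_line("0*2:55*5:23*1", 55, 0)'' yields ''0*2:23*1'', i.e. word 0 no longer points at page 55. Only the single affected line needs to be rewritten, which fits the premise that execution time and memory are limited.
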
Each line contains ''//length//*//WID//'' pairs separated by colons, where //length// is the word's byte length (selecting the ''w'' and ''i'' file) and //WID// is the word's line in that file. Because only one line of this index is read and written during an index update, it is quite efficient.

===== Indexing =====

[[xref>inc/indexer.php]] contains all functions related to building the fulltext index.

FIXME describe the process

===== Searching =====

[[xref>inc/fulltext.php]] contains all functions related to searching the fulltext index.

FIXME describe the process
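
Until the process is described here, the structures from the Structure section suggest how a single-word lookup could work. The following is a hypothetical Python sketch over in-memory copies of the index files; it is not the code in ''inc/fulltext.php''. It assumes words are stored UTF-8 encoded (so the byte length picks the file pair) and skips any word normalization a real search may apply. ''lookup'' and its parameters are illustrative names.

<code python>
def lookup(word, pages, words_by_len, index_by_len, exists=lambda page: True):
    """Return {page: frequency} for a single word (hypothetical sketch).

    pages:        lines of page.idx, where the line number is the PID
    words_by_len: {N: lines of wN.idx}, where the line number is the WID
    index_by_len: {N: lines of iN.idx}, aligned line by line with wN.idx
    exists:       callback standing in for the search-time existence check
    """
    n = len(word.encode("utf-8"))  # assumed UTF-8: byte length selects the files
    try:
        wid = words_by_len.get(n, []).index(word)  # WID = line in wN.idx
    except ValueError:
        return {}  # the word was never indexed
    hits = {}
    for pair in index_by_len[n][wid].strip().split(":"):
        if not pair:
            continue
        pid, freq = pair.split("*")
        page = pages[int(pid)]  # PID = line in page.idx
        if exists(page):  # deleted pages are still listed in page.idx
            hits[page] = int(freq)
    return hits
</code>

Because deleted pages are never removed from ''page.idx'', the ''exists'' check has to happen on every lookup. Sorting the result by frequency, e.g. ''sorted(hits, key=hits.get, reverse=True)'', would give a simple ranking.
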