====== Non-breaking Space Syntax PlugIn ======

---- plugin ----
description: Use non-breaking spaces
author     : Matthias Watermann
email      : support@mwat.de
type       : syntax
lastupdate : 2020-10-07
compatible : Frusterick Manners, Greebo, Hogfather
depends    :
conflicts  :
similar    :
tags       : typography

downloadurl: https://github.com/turnermm/convenience/archive/nbsp.zip
----

===== About =====

==== Upgrade ====

The above download version has been updated to work with Hogfather by --- [[user>turnermm|turnermm]] //2020-10-07 18:36//. Any issues should go to the forum, not to turnermm.

Note on the upgrade to version 2020-10-07: on some machines it appeared necessary to disable and re-enable the plugin after the update. --- [[user>JoeT]] 2022-07-14

==== Background ====

A so-called //non-breaking space//((NBSP: ISO-8859-1 char #160 and UTF-8 byte sequence #194 #160)) is a character which is rendered visually just as the usual spaces((''space'': ASCII char #32 --- not to be confused with ''blanks'', char #255 in some 8-bit character sets)) are. The whole point of using this unusual character (instead of an ordinary ''space'') is that it is //not// considered a word delimiter. In other words: it's supposed to be handled like a normal character which just happens to have no visible marks.((In consequence it will //not// be used to break lines (word wrap) and it won't get expanded if the renderer is going to justify text by moving words in a line of text.))

While the NBSP character is quite often abused for nothing more than design purposes((which are usually dealt with better by use of CSS)), there are a few occasions for its legitimate use. Considering, however, today's keyboards and/or writing habits, it's not that easy to actually type in the NBSP character where it's needed.
This is especially true if one is writing text that is to be stored in UTF-8 format((which is the default text format used for DokuWiki's pages)), because there //two// bytes are needed to represent the NBSP: byte #194 immediately followed by byte #160. While modern text-processing software((e.g. [[http://www.openoffice.org/|OpenOffice.org]] and most GNU/Linux editors)) often supports UTF-8 characters, DokuWiki pages are usually written and edited in a simple web browser with (X)HTML input forms, where it can be quite difficult to insert such a character((see the [[bomfix|BOMfix]] plugin for another way to edit)). --- This [[#Plugin Source|plugin]] tries to solve the problem.

Note from the comments: You can get most of the effect of this by simply creating a ''conf/entities.local.conf'' file and adding a line like this to it:

  (nbsp)   

In the same way you can add %%(tab)%% for 3-4 spaces to get an indented paragraph.

===== Usage =====

The markup syntax implemented by this [[#Plugin Source|plugin]] is quite simple and looks like either ''%%\ %%'' (i.e. a ''backslash''((''backslash'': ASCII char #92)) character followed by a ''space'')((One could call this an "''escaped space''")) or ''%%~~SP~~%%'' (for those of you who'd prefer a more expressive markup).

Whenever one of these character sequences((you may use whichever variant you like and may even mix them in the same page)) is found, it will be replaced by the appropriate UTF-8 characters. That's all.

Personally I'd recommend using the first variant (i.e. ''__%%\ %%__''), as it seems more intuitive and needs fewer characters to type and store. Both forms of markup, however, are replaced by exactly the same UTF-8 character sequence.

===== Installation =====

Search and install the plugin using the [[plugin:extension|Extension Manager]]. Refer to [[:Plugins]] on how to install plugins manually.
  * http://dev.mwat.de/dw/syntax_plugin_nbsp.zip

===== Plugin Source =====

Here comes the [[http://www.gnu.org/licenses/gpl.html|GPLed]] PHP source((The comments within the [[#Plugin Source|source]] file are suitable for the OSS [[http://www.stack.nl/~dimitri/doxygen/index.html|doxygen]] tool, a documentation system for C++, C, Java, Objective-C, Python, IDL and to some extent PHP, C#, and D. --- Since I'm working with different programming languages it's a great ease to have one tool that handles the docs for all of them.)) for those who'd like to scan it before actually installing it:

<code php>
<?php
// [...]
/**
 * syntax_plugin_nbsp.php - A PHP4 class that provides the
 * ability to insert non-breaking spaces in DokuWiki page.
 *
 * To actually use this plugin just add \\ (i.e. backslash
 * space) or ~~SP~~ in a DokuWiki page. This will be expanded
 * to the UTF-8 character sequence.
 *
 *  Copyright (C) 2005, 2007 DFG/M.Watermann, D-10247 Berlin, FRG
 *      All rights reserved
 *    EMail : <support@mwat.de>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either
 * version 3 of the
 * License, or (at your option) any later version.
 * This software is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 * See the GNU General Public License for more details.
 *
 * @author Matthias Watermann
 * @version $Id: syntax_plugin_nbsp.php,v 1.7 2007/08/15 12:36:19 matthias Exp $
 * @since created 27-Aug-2005
 */
class syntax_plugin_nbsp extends DokuWiki_Syntax_Plugin {

  /**
   * Tell the parser whether the plugin accepts syntax mode
   * $aMode within its own markup.
   *
   * This method always returns FALSE since no other data
   * can be nested inside a non-breaking space.
   *
   * @param $aMode String The requested syntaxmode.
   * @return Boolean FALSE always.
   * @public
   * @see getAllowedTypes()
   */
  function accepts($aMode) {
    return FALSE;
  } // accepts()

  /**
   * Connect lookup patterns to lexer.
   *
   * @param $aMode String The desired rendermode.
   * @public
   * @see render()
   */
  function connectTo($aMode) {
    // 'verbose' pattern:
    $this->Lexer->addSpecialPattern('~~SP~~', $aMode, 'plugin_nbsp');
    // Don't match DokuWiki's linebreak markup:
    $this->Lexer->addSpecialPattern('(? [...]
    $this->Lexer->addSpecialPattern('(? [...]
  [...]

  /**
   * [...]
   * The returned array holds the following fields:
   *
   *   author  Author of the plugin
   *   email   Email address to contact the author
   *   date    Last modified date of the plugin in
   *           YYYY-MM-DD format
   *   name    Name of the plugin
   *   desc    Short description of the plugin (Text only)
   *   url     Website with more information on the plugin
   *           (eg. syntax description)
   *
   * @return Array Information about this plugin class.
   * @public
   * @static
   */
  function getInfo() {
    return array (
      'author' => 'Matthias Watermann',
      'email' => 'support@mwat.de',
      'date' => '2007-08-15',
      'name' => 'NBSP Plugin',
      'desc' => 'Include non-breaking spaces in wiki pages.',
      'url' => 'https://www.dokuwiki.org/plugin:nbsp');
  } // getInfo()

  /**
   * Where to sort in?
   *
   * @return Integer 176.
   * @public
   * @static
   */
  function getSort() {
    return 176;
  } // getSort()

  /**
   * Get the type of syntax this plugin defines.
   *
   * @return String 'substition' (a misspelled 'substitution').
   * @public
   * @static
   */
  function getType() {
    return 'substition';
  } // getType()

  /**
   * Handler to prepare matched data for the rendering process.
   *
   * @param $aMatch String The text matched by the patterns.
   * @param $aState Integer The lexer state for the match.
   * @param $aPos Integer The character position of the matched text.
   * @param $aHandler Object Reference to the Doku_Handler object.
   * @return Integer The given $aState value.
   * @public
   * @see render()
   * @static
   */
  function handle($aMatch, $aState, $aPos, &$aHandler) {
    return $aState; // nothing more to do here ...
  } // handle()

  /**
   * Handle the actual output creation.
   *
   * The method checks for the given $aMode and returns
   * FALSE when a mode isn't supported. $aRenderer
   * contains a reference to the renderer object which is currently
   * handling the rendering. The contents of $aData is the
   * return value of the handle() method.
   *
   * @param $aFormat String The output format to generate.
   * @param $aRenderer Object A reference to the renderer object.
   * @param $aData Integer The state value returned by handle().
   * @return Boolean TRUE always.
   * @public
   * @see handle()
   */
  function render($aFormat, &$aRenderer, $aData) {
    if (DOKU_LEXER_SPECIAL == $aData) {
      // No test of '$aFormat' needed here:
      // The raw UTF-8 character sequence is the same anyway.
      $aRenderer->doc .= chr(194) . chr(160);
    } // if
    return TRUE;
  } // render()

} // class syntax_plugin_nbsp
} // if
//Setup VIM: ex: et ts=2 enc=utf-8 :
?>
</code>
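
As a rough illustration of what the plugin's output amounts to, the following standalone sketch replaces both markup variants in a plain string with the two bytes that ''render()'' appends. It does not use DokuWiki's lexer at all. The function names ''nbsp_bytes()'' and ''expand_nbsp_markup()'' and the lookbehind guard are assumptions made for this example only, not the plugin's actual patterns. The last two lines also show why byte #160 on its own is not a reliable marker for an NBSP: the same byte occurs inside other UTF-8 sequences such as "à" (#195 #160).

```php
<?php
// Standalone sketch, NOT the plugin's code: all function names are hypothetical.

/** The UTF-8 encoding of U+00A0, i.e. the two bytes render() appends. */
function nbsp_bytes(): string
{
    return chr(194) . chr(160); // 0xC2 0xA0
}

/**
 * Replace '~~SP~~' and backslash-space with a non-breaking space.
 * Assumption: a backslash-space that is preceded by another backslash is
 * left alone, so that DokuWiki's forced line break ('\\ ') keeps working.
 */
function expand_nbsp_markup(string $text): string
{
    // The verbose variant can be replaced unconditionally.
    $text = str_replace('~~SP~~', nbsp_bytes(), $text);
    // Regex after PHP string unescaping: (?<!\\)\\ followed by a space.
    return preg_replace('/(?<!\\\\)\\\\ /', nbsp_bytes(), $text);
}

// Single-quoted PHP string: '\ ' is backslash + space, '\\\\' is two backslashes.
$input  = '10\ km, 5~~SP~~kg, break\\\\ here';
$output = expand_nbsp_markup($input);

echo bin2hex(nbsp_bytes()), "\n";                          // c2a0
echo str_replace(nbsp_bytes(), '[NBSP]', $output), "\n";   // visible marker

// Byte 160 alone also appears inside other characters, e.g. "à" (C3 A0):
$aGrave = "\xC3\xA0";
var_dump(strpos($aGrave, chr(160)));      // int(1): the bare byte is found
var_dump(strpos($aGrave, nbsp_bytes()));  // bool(false): the full sequence is not
```

Matching the complete two-byte sequence, rather than byte #160 alone, is what keeps characters like "à" intact.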
==== Changes ====

__2007-08-15__:\\
* added GPL link and fixed some doc problems;

__2007-01-16__:\\
* replaced UTF8_ENTITY_NBSP const by raw UTF-8 characters in 'render()';

__2007-01-06__:\\
* minor internal changes to write out raw UTF-8 character sequence

__2005-09-26__:\\
# fixed problem with [[#UTF-8 sequences with chr(160)]]

__2005-08-29__:\\
- removed unneeded method 'getAllowedTypes()';

__2005-08-27__:\\
+ initial release;

//[[support@mwat.de|Matthias Watermann]] 2007-08-15//

===== See also =====

==== Plugins by the same author ====

  * [[bomfix|BOMfix Plugin]] -- ignore Byte-Order-Mark characters in your pages
  * [[code2|Code Syntax Plugin]] -- use syntax highlighting of code fragments in your pages
  * [[deflist|Definition List Syntax Plugin]] -- use the only complete definition lists in your pages
  * [[diff|Diff Syntax Plugin]] -- use highlighting of diff files (aka "patches") in your pages((obsoleted by incorporating its ability into the [[code2|Code]] plugin))
  * [[hr|HR Syntax Plugin]] -- use horizontal rules in nested block elements of your pages
  * [[lang|LANGuage Syntax Plugin]] -- markup different languages in your pages
  * [[lists|Lists Syntax Plugin]] -- use the only complete un-/ordered lists in your pages
  * [[nbsp|NBSP Syntax Plugin]] -- use Non-Breakable-Spaces in your pages
  * [[nstoc|NsToC Syntax Plugin]] -- use automatically generated namespace indices
  * [[shy|Shy Syntax Plugin]] -- use soft hyphens in your pages
  * [[tip|Tip Syntax Plugin]] -- add hint areas to your pages

==== UTF-8 sequences with chr(160) ====

The following table is an incomplete extract of UTF-8 sequences containing byte #160, gathered from ISO/IEC 10646-1:2000 (aka Unicode v3.0.1) by the Unicode Consortium.

Please note that your browser might not be able to correctly show/render all characters((All non-ASCII characters to view on screen (or print out) depend on the fonts actually installed with your OS and GUI.
If there are no UTF-8 enabled fonts (or only incomplete ones), the browser will fall back to a browser-dependent default character.)) in this table.

^ Hex ^ Dec ^ Chr ^ ISO/IEC 10646-1:2000(E) Character Name ^
| 00A0 | 160 | | NO-BREAK SPACE |
| 00E0 | 224 | à | LATIN SMALL LETTER A WITH GRAVE |
| 0120 | 288 | Ġ | LATIN CAPITAL LETTER G WITH DOT ABOVE |
| 0160 | 352 | Š | LATIN CAPITAL LETTER S WITH CARON |
| 01A0 | 416 | Ơ | LATIN CAPITAL LETTER O WITH HORN |
| 01E0 | 480 | Ǡ | LATIN CAPITAL LETTER A WITH DOT ABOVE AND MACRON |
| 0260 | 608 | ɠ | LATIN SMALL LETTER G WITH HOOK |
| 02A0 | 672 | ʠ | LATIN SMALL LETTER Q WITH HOOK |
| 02E0 | 736 | ˠ | MODIFIER LETTER SMALL GAMMA |
| 0320 | 800 | ̠ | COMBINING MINUS SIGN BELOW |
| 0360 | 864 | ͠ | COMBINING DOUBLE TILDE |
| 03A0 | 928 | Π | GREEK CAPITAL LETTER PI |
| 03E0 | 992 | Ϡ | GREEK LETTER SAMPI |
| 0420 | 1056 | Р | CYRILLIC CAPITAL LETTER ER |
| 0460 | 1120 | Ѡ | CYRILLIC CAPITAL LETTER OMEGA |
| 04A0 | 1184 | Ҡ | CYRILLIC CAPITAL LETTER BASHKIR KA |
| 04E0 | 1248 | Ӡ | CYRILLIC CAPITAL LETTER ABKHASIAN DZE |
| 05A0 | 1440 | ֠ | HEBREW ACCENT TELISHA GEDOLA |
| 05E0 | 1504 | נ | HEBREW LETTER NUN |
| 0660 | 1632 | ٠ | ARABIC-INDIC DIGIT ZERO |
| 06A0 | 1696 | ڠ | ARABIC LETTER AIN WITH THREE DOTS ABOVE |
| 06E0 | 1760 | ۠ | ARABIC SMALL HIGH UPRIGHT RECTANGULAR ZERO |
| 07A0 | 1952 | ޠ | THAANA LETTER TO |
| 0920 | 2336 | ठ | DEVANAGARI LETTER TTHA |
| 0960 | 2400 | ॠ | DEVANAGARI LETTER VOCALIC RR |
| 0E20 | 3616 | ภ | THAI CHARACTER PHO SAMPHAO |
| 10A0 | 4256 | Ⴀ | GEORGIAN CAPITAL LETTER AN |
| 10E0 | 4320 | რ | GEORGIAN LETTER RAE |
| 1120 | 4384 | ᄠ | HANGUL CHOSEONG PIEUP-TIKEUT |
| 1160 | 4448 | ᅠ | HANGUL JUNGSEONG FILLER |
| 11A0 | 4512 | ᆠ | HANGUL JUNGSEONG ARAEA-U |
| 11E0 | 4576 | ᇠ | HANGUL JONGSEONG MIEUM-CHIEUCH |
| 1220 | 4640 | ሠ | ETHIOPIC SYLLABLE SZA |
| 1260 | 4704 | በ | ETHIOPIC SYLLABLE BA |
| 12A0 | 4768 | አ | ETHIOPIC SYLLABLE GLOTTAL A |
| 12E0 | 4832 | ዠ | ETHIOPIC SYLLABLE ZHA |
| 1320 | 4896 | ጠ | ETHIOPIC SYLLABLE THA |
| 13A0 | 5024 | Ꭰ | CHEROKEE LETTER A |
| 13E0 | 5088 | Ꮰ | CHEROKEE LETTER TLO |
| 1420 | 5152 | ᐠ | CANADIAN SYLLABICS FINAL GRAVE |
| 1460 | 5216 | ᑠ | CANADIAN SYLLABICS WEST-CREE TWOO |
| 14A0 | 5280 | ᒠ | CANADIAN SYLLABICS NASKAPI CWAA |
| 14E0 | 5344 | ᓠ | CANADIAN SYLLABICS LWII |
| 1520 | 5408 | ᔠ | CANADIAN SYLLABICS WEST-CREE SHWOO |
| 1560 | 5472 | ᕠ | CANADIAN SYLLABICS THI |
| 15A0 | 5536 | ᖠ | CANADIAN SYLLABICS LHI |
| 15E0 | 5600 | ᗠ | CANADIAN SYLLABICS CARRIER THI |
| 1620 | 5664 | ᘠ | CANADIAN SYLLABICS CARRIER JJI |
| 1660 | 5728 | ᙠ | CANADIAN SYLLABICS CARRIER TSA |
| 16A0 | 5792 | ᚠ | RUNIC LETTER FEHU FEOH FE F |
| 16E0 | 5856 | ᛠ | RUNIC LETTER EAR |
| 1E20 | 7712 | Ḡ | LATIN CAPITAL LETTER G WITH MACRON |
| 1E60 | 7776 | Ṡ | LATIN CAPITAL LETTER S WITH DOT ABOVE |
| 1EA0 | 7840 | Ạ | LATIN CAPITAL LETTER A WITH DOT BELOW |
| 1EE0 | 7904 | Ỡ | LATIN CAPITAL LETTER O WITH HORN AND TILDE |
| 1F20 | 7968 | ἠ | GREEK SMALL LETTER ETA WITH PSILI |
| 1F60 | 8032 | ὠ | GREEK SMALL LETTER OMEGA WITH PSILI |
| 1FA0 | 8096 | ᾠ | GREEK SMALL LETTER OMEGA WITH PSILI AND YPOGEGRAMMENI |
| 1FE0 | 8160 | ῠ | GREEK SMALL LETTER UPSILON WITH VRACHY |
| 2020 | 8224 | † | DAGGER |
| 20A0 | 8352 | ₠ | EURO-CURRENCY SIGN |
| 20E0 | 8416 | ⃠ | COMBINING ENCLOSING CIRCLE BACKSLASH |
| 2120 | 8480 | ℠ | SERVICE MARK |
| 2160 | 8544 | Ⅰ | ROMAN NUMERAL ONE |
| 21A0 | 8608 | ↠ | RIGHTWARDS TWO HEADED ARROW |
| 21E0 | 8672 | ⇠ | LEFTWARDS DASHED ARROW |
| 2220 | 8736 | ∠ | ANGLE |
| 2260 | 8800 | ≠ | NOT EQUAL TO |
| 22A0 | 8864 | ⊠ | SQUARED TIMES |
| 22E0 | 8928 | ⋠ | DOES NOT PRECEDE OR EQUAL |
| 2320 | 8992 | ⌠ | TOP HALF INTEGRAL |
| 2360 | 9056 | ⍠ | APL FUNCTIONAL SYMBOL QUAD COLON |
| 2420 | 9248 | ␠ | SYMBOL FOR SPACE |
| 2460 | 9312 | ① | CIRCLED DIGIT ONE |
| 24A0 | 9376 | ⒠ | PARENTHESIZED LATIN SMALL LETTER E |
| 24E0 | 9440 | ⓠ | CIRCLED LATIN SMALL LETTER Q |
| 2520 | 9504 | ┠ | BOX DRAWINGS VERTICAL HEAVY AND RIGHT LIGHT |
| 2560 | 9568 | ╠ | BOX DRAWINGS DOUBLE VERTICAL AND RIGHT |
| 25A0 | 9632 | ■ | BLACK SQUARE |
| 25E0 | 9696 | ◠ | UPPER HALF CIRCLE |
| 2620 | 9760 | ☠ | SKULL AND CROSSBONES |
| 2660 | 9824 | ♠ | BLACK SPADE SUIT |
| 2720 | 10016 | ✠ | MALTESE CROSS |
| 27A0 | 10144 | ➠ | HEAVY DASHED TRIANGLE-HEADED RIGHTWARDS ARROW |
| 2800 | 10240 | ⠀ | BRAILLE PATTERN BLANK |
| 2801 | 10241 | ⠁ | BRAILLE PATTERN DOTS-1 |
| 2802 | 10242 | ⠂ | BRAILLE PATTERN DOTS-2 |
| 2803 | 10243 | ⠃ | BRAILLE PATTERN DOTS-12 |
| 2804 | 10244 | ⠄ | BRAILLE PATTERN DOTS-3 |
| 2805 | 10245 | ⠅ | BRAILLE PATTERN DOTS-13 |
| 2806 | 10246 | ⠆ | BRAILLE PATTERN DOTS-23 |
| 2807 | 10247 | ⠇ | BRAILLE PATTERN DOTS-123 |
| 2808 | 10248 | ⠈ | BRAILLE PATTERN DOTS-4 |
| 2809 | 10249 | ⠉ | BRAILLE PATTERN DOTS-14 |
| 280A | 10250 | ⠊ | BRAILLE PATTERN DOTS-24 |
| 280B | 10251 | ⠋ | BRAILLE PATTERN DOTS-124 |
| 280C | 10252 | ⠌ | BRAILLE PATTERN DOTS-34 |
| 280D | 10253 | ⠍ | BRAILLE PATTERN DOTS-134 |
| 280E | 10254 | ⠎ | BRAILLE PATTERN DOTS-234 |
| 280F | 10255 | ⠏ | BRAILLE PATTERN DOTS-1234 |
| 2810 | 10256 | ⠐ | BRAILLE PATTERN DOTS-5 |
| 2811 | 10257 | ⠑ | BRAILLE PATTERN DOTS-15 |
| 2812 | 10258 | ⠒ | BRAILLE PATTERN DOTS-25 |
| 2813 | 10259 | ⠓ | BRAILLE PATTERN DOTS-125 |
| 2814 | 10260 | ⠔ | BRAILLE PATTERN DOTS-35 |
| 2815 | 10261 | ⠕ | BRAILLE PATTERN DOTS-135 |
| 2816 | 10262 | ⠖ | BRAILLE PATTERN DOTS-235 |
| 2817 | 10263 | ⠗ | BRAILLE PATTERN DOTS-1235 |
| 2818 | 10264 | ⠘ | BRAILLE PATTERN DOTS-45 |
| 2819 | 10265 | ⠙ | BRAILLE PATTERN DOTS-145 |
| 281A | 10266 | ⠚ | BRAILLE PATTERN DOTS-245 |
| 281B | 10267 | ⠛ | BRAILLE PATTERN DOTS-1245 |
| 281C | 10268 | ⠜ | BRAILLE PATTERN DOTS-345 |
| 281D | 10269 | ⠝ | BRAILLE PATTERN DOTS-1345 |
| 281E | 10270 | ⠞ | BRAILLE PATTERN DOTS-2345 |
| 281F | 10271 | ⠟ | BRAILLE PATTERN DOTS-12345 |
| 2820 | 10272 | ⠠ | BRAILLE PATTERN DOTS-6 |
| 2821 | 10273 | ⠡ | BRAILLE PATTERN DOTS-16 |
| 2822 | 10274 | ⠢ | BRAILLE PATTERN DOTS-26 |
| 2823 | 10275 | ⠣ | BRAILLE PATTERN DOTS-126 |
| 2824 | 10276 | ⠤ | BRAILLE PATTERN DOTS-36 |
| 2825 | 10277 | ⠥ | BRAILLE PATTERN DOTS-136 |
| 2826 | 10278 | ⠦ | BRAILLE PATTERN DOTS-236 |
| 2827 | 10279 | ⠧ | BRAILLE PATTERN DOTS-1236 |
| 2828 | 10280 | ⠨ | BRAILLE PATTERN DOTS-46 |
| 2829 | 10281 | ⠩ | BRAILLE PATTERN DOTS-146 |
| 282A | 10282 | ⠪ | BRAILLE PATTERN DOTS-246 |
| 282B | 10283 | ⠫ | BRAILLE PATTERN DOTS-1246 |
| 282C | 10284 | ⠬ | BRAILLE PATTERN DOTS-346 |
| 282D | 10285 | ⠭ | BRAILLE PATTERN DOTS-1346 |
| 282E | 10286 | ⠮ | BRAILLE PATTERN DOTS-2346 |
| 282F | 10287 | ⠯ | BRAILLE PATTERN DOTS-12346 |
| 2830 | 10288 | ⠰ | BRAILLE PATTERN DOTS-56 |
| 2831 | 10289 | ⠱ | BRAILLE PATTERN DOTS-156 |
| 2832 | 10290 | ⠲ | BRAILLE PATTERN DOTS-256 |
| 2833 | 10291 | ⠳ | BRAILLE PATTERN DOTS-1256 |
| 2834 | 10292 | ⠴ | BRAILLE PATTERN DOTS-356 |
| 2835 | 10293 | ⠵ | BRAILLE PATTERN DOTS-1356 |
| 2836 | 10294 | ⠶ | BRAILLE PATTERN DOTS-2356 |
| 2837 | 10295 | ⠷ | BRAILLE PATTERN DOTS-12356 |
| 2838 | 10296 | ⠸ | BRAILLE PATTERN DOTS-456 |
| 2839 | 10297 | ⠹ | BRAILLE PATTERN DOTS-1456 |
| 283A | 10298 | ⠺ | BRAILLE PATTERN DOTS-2456 |
| 283B | 10299 | ⠻ | BRAILLE PATTERN DOTS-12456 |
| 283C | 10300 | ⠼ | BRAILLE PATTERN DOTS-3456 |
| 283D | 10301 | ⠽ | BRAILLE PATTERN DOTS-13456 |
| 283E | 10302 | ⠾ | BRAILLE PATTERN DOTS-23456 |
| 283F | 10303 | ⠿ | BRAILLE PATTERN DOTS-123456 |
| 2860 | 10336 | ⡠ | BRAILLE PATTERN DOTS-67 |
| 28A0 | 10400 | ⢠ | BRAILLE PATTERN DOTS-68 |
| 28E0 | 10464 | ⣠ | BRAILLE PATTERN DOTS-678 |

For more information please refer to the [[http://www.unicode.org/unicode/standard/standard.html|Unicode Consortium]].

===== Discussion =====

Hints, comments, suggestions ...

  * Before writing this little plugin I tried to use the file ''conf/custom.conf'' for this task. But this didn't work because for one reason or the other that file isn't used at all by DokuWiki (2005-07-13).
''Grep''ing the sources showed that the file isn't referenced anywhere. I dunno whether this is a bug or intended behaviour.

----

You can get most of the effect of this by simply creating a ''conf/entities.local.conf'' file and adding a line like this to it. -- //[[http://adam.shand.net/|Adam Shand]] 2007-01-29//

> (nbsp)   

I am new to DokuWiki (7 days). I installed your plugin in a locally installed DokuWiki and in a test one available on the net. Since then, whenever there is the letter "à" (à), this letter is replaced by a space, or by a question mark followed by a space, depending on the browser used. All my files are in UTF-8. The funny thing is that you are detecting the sequence "C2A0" and this letter is "C3A0" in UTF-8. You can have a look at the page http://carnetweb.info/wiki/doku.php?id=developement where, at the end of the line starting with "nbsp", there is the sequence "aàa".

> Thanks for spotting this! Please update your installation by using the fixed code above.\\ //--- [[support@mwat.de|Matthias Watermann]] 2005-09-27 14:17//

PS: sorry to report here, but I failed to subscribe to the mailing list. Each time, the confirmation message creates a "failure notice" as follows:

  : 206.53.239.180 does not like recipient. Remote host said: 554 <26.mail-out.ovh.net[213.186.42.179]>: Client host rejected: Access denied Giving up on 206.53.239.180.

-- francois DOT granger AT gmail DOT com

> replace the third pattern in function connectTo with '\xC2\xA0'. There does seem to be a problem with PHP's preg and these characters, however to my mind the pattern should be as I described, as it doesn't make sense to leave the "\xC2" hanging on its own when there is a pattern match. --- //[[chris@jalakai.co.uk|Christopher Smith]] 2005-09-26 02:02//

>> In this case, I fear, it's not PHP's fault. My intention (as stated in the source comment) was to replace chr(160) occurrences which were not properly UTF-8 encoded. But, alas, it didn't come to my mind at that time that chr(160) can be part of another UTF-8 sequence. So while your suggestion would (X)HTML encode the correct ''nbsp'' UTF-8 sequence (which is a good thing in itself), it does //not// do what I intended in the first place; it's in fact the opposite: while I wanted to say "find all #160 that are //not// UTF-8 encoded", your proposal says "find #160 that //is// UTF-8 encoded". --- As for the chr(194) (0xC2): it was not supposed to be "hanging on its own", as the RegEx means that chr(194) must not be in front of chr(160) and, as Francois reported, the RegEx as such does exactly that, with the unintended side effects he mentioned :-(\\ //--- [[support@mwat.de|Matthias Watermann]] 2005-09-26 07:44//

>>> It was late :? (and I just realised there is no oops smiley in DokuWiki). I went by the comments without paying enough attention to the regex. I guess you need a more complex pattern that attempts to match invalid UTF-8 byte sequences, or do away with that pattern altogether. I guess from a conceptual point of view a plugin shouldn't really have to validate the byte sequence used in a raw wiki page, especially when it is outside the syntax proposed by the plugin. --- //[[chris@jalakai.co.uk|Christopher Smith]] 2005-09-26 09:40//

Had UTF-8 problems with this one. Deleting "$this->Lexer->addSpecialPattern('(?