Non-breaking Space Syntax PlugIn
About
Upgrade
The above download version has been updated to work with hogfather by — turnermm 2020-10-07 18:36. Any issues go to the forum, not to turnermm.
Note on upgrade to version 2020-10-07: On some machines it appeared necessary to disable and re-enable the plugin after the update. —JoeT 2022-07-14
Background
A so-called non-breaking space1) is a character which is rendered visually just like the usual space2).
The whole point of using this unusual character (instead of an ordinary space) is that it is not considered a word delimiter.
In other words: it's supposed to be handled like a normal character which just happens to have no visible points.3)
While the NBSP character is quite often abused for nothing more than design purposes4), there are a few occasions for its legitimate use. Considering, however, today's keyboards and/or writing habits, it's not that easy to actually type in the NBSP character where it's needed. This is especially true if one's going to write text which is to be stored in UTF-8 format5), because there the NBSP takes two bytes: byte #194 immediately followed by byte #160.
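The two bytes mentioned above can be checked with a few lines of PHP. This is only a small sketch: it assumes PHP 7.0 or later for the "\u{...}" escape, and the variable names are illustrative.

```php
<?php
// Assumes PHP 7.0+ for the "\u{...}" escape; variable names are illustrative.
$nbsp = chr(194) . chr(160);          // the two bytes named in the text (0xC2 0xA0)

// The byte pair is exactly the UTF-8 encoding of U+00A0 NO-BREAK SPACE.
var_dump($nbsp === "\u{00A0}");       // bool(true)
var_dump(strlen($nbsp));              // int(2), strlen() counts bytes

// Splitting on the ordinary space (#32) leaves the NBSP-joined pair intact.
$words = explode(' ', "about 100{$nbsp}km away");
var_dump(count($words));              // int(3)
```

The last check illustrates the "not a word delimiter" point at the string level: code that splits on char #32 sees the number and its unit as one word.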
While modern text-processing software6) often supports UTF-8 characters, DokuWiki pages are usually written and edited using a simple web browser with (X)HTML input forms, where it can get quite difficult to insert such a character7). This plugin tries to solve the problem.
Note from the comments: You can get most of the effect of this by simply creating a conf/entities.local.conf file and adding a line like this to it:
(nbsp)
In the same way you can add (tab) for 3-4 spaces to get an indented paragraph.
Usage
The markup syntax implemented by this plugin is quite simple and looks like either
\ 
(i.e. a backslash8) character followed by a space)9) or
~~SP~~
(for those of you who'd rather use a more expressive markup). Whenever one of these character sequences10) is found it will be replaced by the appropriate UTF-8 characters. That's all.
Personally I'd recommend using the first variant (i.e. a backslash followed by a space) as it seems more intuitive and needs fewer characters to type and store.
Both variants, however, are replaced by exactly the same UTF-8 character sequence.
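The effect of both markup variants can be tried outside DokuWiki with a plain preg_replace() call. The sketch below is not the plugin's code path (the plugin registers its patterns with the DokuWiki lexer); the helper name nbsp_expand() is hypothetical, and only the two patterns themselves are taken from the plugin source.

```php
<?php
/**
 * Hypothetical helper, not part of the plugin: expands both markup
 * variants to the raw UTF-8 NBSP bytes with a single preg_replace().
 */
function nbsp_expand(string $wikiText): string
{
    // ~~SP~~ or a backslash+space that is not preceded by another
    // backslash (so DokuWiki's "\\ " line break markup is left alone).
    $pattern = '/~~SP~~|(?<!\x5C)\x5C\x20/';
    return preg_replace($pattern, chr(194) . chr(160), $wikiText);
}

echo bin2hex(nbsp_expand('5\ kg')), "\n";      // 35c2a06b67
echo bin2hex(nbsp_expand('5~~SP~~kg')), "\n";  // 35c2a06b67
```

Both calls print the same bytes, which matches the statement that the two variants end up as the same UTF-8 sequence.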
Installation
Search for and install the plugin using the Extension Manager. Refer to Plugins on how to install plugins manually.
Plugin Source
Here comes the GPLed PHP source11) for those who'd like to scan it before actually installing it:
<?php
if (! class_exists('syntax_plugin_nbsp')) {
  if (! defined('DOKU_PLUGIN')) {
    if (! defined('DOKU_INC')) {
      define('DOKU_INC', realpath(dirname(__FILE__) . '/../../') . '/');
    } // if
    define('DOKU_PLUGIN', DOKU_INC . 'lib/plugins/');
  } // if
  // Include parent class:
  require_once(DOKU_PLUGIN . 'syntax.php');

/**
 * <tt>syntax_plugin_nbsp.php </tt>- A PHP4 class that provides the
 * ability to insert non-breaking spaces in <tt>DokuWiki</tt> pages.
 *
 * <p>
 * To actually use this plugin just add <tt>\\ </tt> (i.e. backslash
 * space) or <tt>~~SP~~</tt> in a DokuWiki page. This will be expanded
 * to the UTF-8 character sequence.
 * </p><pre>
 *  Copyright (C) 2005, 2007 DFG/M.Watermann, D-10247 Berlin, FRG
 *  All rights reserved
 *  EMail : <support@mwat.de>
 * </pre>
 * <div class="disclaimer">
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either
 * <a href="http://www.gnu.org/licenses/gpl.html">version 3</a> of the
 * License, or (at your option) any later version.<br>
 * This software is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 * See the GNU General Public License for more details.
 * </div>
 * @author <a href="mailto:support@mwat.de">Matthias Watermann</a>
 * @version <tt>$Id: syntax_plugin_nbsp.php,v 1.7 2007/08/15 12:36:19 matthias Exp $</tt>
 * @since created 27-Aug-2005
 */
class syntax_plugin_nbsp extends DokuWiki_Syntax_Plugin {

  /**
   * Tell the parser whether the plugin accepts syntax mode
   * <tt>$aMode</tt> within its own markup.
   *
   * <p>
   * This method always returns <tt>FALSE</tt> since no other data
   * can be nested inside a non-breaking space.
   * </p>
   * @param $aMode String The requested syntaxmode.
   * @return Boolean <tt>FALSE</tt> always.
   * @public
   * @see getAllowedTypes()
   */
  function accepts($aMode) {
    return FALSE;
  } // accepts()

  /**
   * Connect lookup patterns to lexer.
   *
   * @param $aMode String The desired rendermode.
   * @public
   * @see render()
   */
  function connectTo($aMode) {
    // 'verbose' pattern:
    $this->Lexer->addSpecialPattern('~~SP~~', $aMode, 'plugin_nbsp');
    // Don't match DokuWiki's linebreak markup:
    $this->Lexer->addSpecialPattern('(?<!\x5C)\x5C\x20', $aMode, 'plugin_nbsp');
    // in case a raw #160 was inserted (e.g. by copy&paste):
    $this->Lexer->addSpecialPattern('(?<![\x80-\xE2])\xA0', $aMode, 'plugin_nbsp');
  } // connectTo()

  /**
   * Get an associative array with plugin info.
   *
   * <p>
   * The returned array holds the following fields:
   * <dl>
   * <dt>author</dt><dd>Author of the plugin</dd>
   * <dt>email</dt><dd>Email address to contact the author</dd>
   * <dt>date</dt><dd>Last modified date of the plugin in
   * <tt>YYYY-MM-DD</tt> format</dd>
   * <dt>name</dt><dd>Name of the plugin</dd>
   * <dt>desc</dt><dd>Short description of the plugin (Text only)</dd>
   * <dt>url</dt><dd>Website with more information on the plugin
   * (eg. syntax description)</dd>
   * </dl>
   * @return Array Information about this plugin class.
   * @public
   * @static
   */
  function getInfo() {
    return array (
      'author' => 'Matthias Watermann',
      'email' => 'support@mwat.de',
      'date' => '2007-08-15',
      'name' => 'NBSP Plugin',
      'desc' => 'Include non-breaking spaces in wiki pages.',
      'url' => 'https://www.dokuwiki.org/plugin:nbsp');
  } // getInfo()

  /**
   * Where to sort in?
   *
   * @return Integer <tt>176</tt>.
   * @public
   * @static
   */
  function getSort() {
    return 176;
  } // getSort()

  /**
   * Get the type of syntax this plugin defines.
   *
   * @return String <tt>'substition'</tt> (a misspelled 'substitution').
   * @public
   * @static
   */
  function getType() {
    return 'substition';
  } // getType()

  /**
   * Handler to prepare matched data for the rendering process.
   *
   * @param $aMatch String The text matched by the patterns.
   * @param $aState Integer The lexer state for the match.
   * @param $aPos Integer The character position of the matched text.
   * @param $aHandler Object Reference to the Doku_Handler object.
   * @return Integer The given <tt>$aState</tt> value.
   * @public
   * @see render()
   * @static
   */
  function handle($aMatch, $aState, $aPos, &$aHandler) {
    return $aState; // nothing more to do here ...
  } // handle()

  /**
   * Handle the actual output creation.
   *
   * <p>
   * The method checks for the given <tt>$aMode</tt> and returns
   * <tt>FALSE</tt> when a mode isn't supported. <tt>$aRenderer</tt>
   * contains a reference to the renderer object which is currently
   * handling the rendering. The contents of <tt>$aData</tt> is the
   * return value of the <tt>handle()</tt> method.
   * </p>
   * @param $aFormat String The output format to generate.
   * @param $aRenderer Object A reference to the renderer object.
   * @param $aData Integer The state value returned by <tt>handle()</tt>.
   * @return Boolean <tt>TRUE</tt> always.
   * @public
   * @see handle()
   */
  function render($aFormat, &$aRenderer, $aData) {
    if (DOKU_LEXER_SPECIAL == $aData) {
      // No test of '$aFormat' needed here:
      // The raw UTF-8 character sequence is the same anyway.
      $aRenderer->doc .= chr(194) . chr(160);
    } // if
    return TRUE;
  } // render()
} // class syntax_plugin_nbsp
} // if
//Setup VIM: ex: et ts=2 enc=utf-8 :
?>
Changes
2007-08-15:
* added GPL link and fixed some doc problems;
2007-01-16:
* replaced UTF8_ENTITY_NBSP const by raw UTF-8 characters in 'render()';
2007-01-06:
* minor internal changes to write out raw UTF-8 character sequence
2005-09-26:
# fixed problem with UTF-8 sequences with chr(160)
2005-08-29:
- removed unneeded method 'getAllowedTypes()';
2005-08-27:
+ initial release;
Matthias Watermann 2007-08-15
See also
Plugins by the same author
- BOMfix Plugin – ignore Byte-Order-Mark characters in your pages
- Code Syntax Plugin – use syntax highlighting of code fragments in your pages
- Definition List Syntax Plugin – use the only complete definition lists in your pages
- Diff Syntax Plugin – use highlighting of diff files (aka “patches”) in your pages12)
- HR Syntax Plugin – use horizontal rules in nested block elements of your pages
- LANGuage Syntax Plugin – markup different languages in your pages
- Lists Syntax Plugin – use the only complete un-/ordered lists in your pages
- NBSP Syntax Plugin – use Non-Breakable-Spaces in your pages
- NsToC Syntax Plugin – use automatically generated namespace indices
- Shy Syntax Plugin – use soft hyphens in your pages
- Tip Syntax Plugin – add hint areas to your pages
UTF-8 sequences with chr(160)
The following table is an incomplete extract of characters whose UTF-8 sequences contain byte #160, gathered from:
ISO/IEC 10646-1:2000 aka Unicode v3.0.1 by Unicode Consortium
Please note that your browser might not be able to correctly show/render all characters13) in this table.
Hex | Dec | Chr | ISO/IEC 10646-1:2000(E) Character Name |
---|---|---|---|
00A0 | 160 |   | NO-BREAK SPACE |
00E0 | 224 | à | LATIN SMALL LETTER A WITH GRAVE |
0120 | 288 | Ġ | LATIN CAPITAL LETTER G WITH DOT ABOVE |
0160 | 352 | Š | LATIN CAPITAL LETTER S WITH CARON |
01A0 | 416 | Ơ | LATIN CAPITAL LETTER O WITH HORN |
01E0 | 480 | Ǡ | LATIN CAPITAL LETTER A WITH DOT ABOVE AND MACRON |
0260 | 608 | ɠ | LATIN SMALL LETTER G WITH HOOK |
02A0 | 672 | ʠ | LATIN SMALL LETTER Q WITH HOOK |
02E0 | 736 | ˠ | MODIFIER LETTER SMALL GAMMA |
0320 | 800 | ̠ | COMBINING MINUS SIGN BELOW |
0360 | 864 | ͠ | COMBINING DOUBLE TILDE |
03A0 | 928 | Π | GREEK CAPITAL LETTER PI |
03E0 | 992 | Ϡ | GREEK LETTER SAMPI |
0420 | 1056 | Р | CYRILLIC CAPITAL LETTER ER |
0460 | 1120 | Ѡ | CYRILLIC CAPITAL LETTER OMEGA |
04A0 | 1184 | Ҡ | CYRILLIC CAPITAL LETTER BASHKIR KA |
04E0 | 1248 | Ӡ | CYRILLIC CAPITAL LETTER ABKHASIAN DZE |
05A0 | 1440 | ֠ | HEBREW ACCENT TELISHA GEDOLA |
05E0 | 1504 | נ | HEBREW LETTER NUN |
0660 | 1632 | ٠ | ARABIC-INDIC DIGIT ZERO |
06A0 | 1696 | ڠ | ARABIC LETTER AIN WITH THREE DOTS ABOVE |
06E0 | 1760 | ۠ | ARABIC SMALL HIGH UPRIGHT RECTANGULAR ZERO |
07A0 | 1952 | ޠ | THAANA LETTER TO |
0920 | 2336 | ठ | DEVANAGARI LETTER TTHA |
0960 | 2400 | ॠ | DEVANAGARI LETTER VOCALIC RR |
0E20 | 3616 | ภ | THAI CHARACTER PHO SAMPHAO |
10A0 | 4256 | Ⴀ | GEORGIAN CAPITAL LETTER AN |
10E0 | 4320 | რ | GEORGIAN LETTER RAE |
1120 | 4384 | ᄠ | HANGUL CHOSEONG PIEUP-TIKEUT |
1160 | 4448 | ᅠ | HANGUL JUNGSEONG FILLER |
11A0 | 4512 | ᆠ | HANGUL JUNGSEONG ARAEA-U |
11E0 | 4576 | ᇠ | HANGUL JONGSEONG MIEUM-CHIEUCH |
1220 | 4640 | ሠ | ETHIOPIC SYLLABLE SZA |
1260 | 4704 | በ | ETHIOPIC SYLLABLE BA |
12A0 | 4768 | አ | ETHIOPIC SYLLABLE GLOTTAL A |
12E0 | 4832 | ዠ | ETHIOPIC SYLLABLE ZHA |
1320 | 4896 | ጠ | ETHIOPIC SYLLABLE THA |
13A0 | 5024 | Ꭰ | CHEROKEE LETTER A |
13E0 | 5088 | Ꮰ | CHEROKEE LETTER TLO |
1420 | 5152 | ᐠ | CANADIAN SYLLABICS FINAL GRAVE |
1460 | 5216 | ᑠ | CANADIAN SYLLABICS WEST-CREE TWOO |
14A0 | 5280 | ᒠ | CANADIAN SYLLABICS NASKAPI CWAA |
14E0 | 5344 | ᓠ | CANADIAN SYLLABICS LWII |
1520 | 5408 | ᔠ | CANADIAN SYLLABICS WEST-CREE SHWOO |
1560 | 5472 | ᕠ | CANADIAN SYLLABICS THI |
15A0 | 5536 | ᖠ | CANADIAN SYLLABICS LHI |
15E0 | 5600 | ᗠ | CANADIAN SYLLABICS CARRIER THI |
1620 | 5664 | ᘠ | CANADIAN SYLLABICS CARRIER JJI |
1660 | 5728 | ᙠ | CANADIAN SYLLABICS CARRIER TSA |
16A0 | 5792 | ᚠ | RUNIC LETTER FEHU FEOH FE F |
16E0 | 5856 | ᛠ | RUNIC LETTER EAR |
1E20 | 7712 | Ḡ | LATIN CAPITAL LETTER G WITH MACRON |
1E60 | 7776 | Ṡ | LATIN CAPITAL LETTER S WITH DOT ABOVE |
1EA0 | 7840 | Ạ | LATIN CAPITAL LETTER A WITH DOT BELOW |
1EE0 | 7904 | Ỡ | LATIN CAPITAL LETTER O WITH HORN AND TILDE |
1F20 | 7968 | ἠ | GREEK SMALL LETTER ETA WITH PSILI |
1F60 | 8032 | ὠ | GREEK SMALL LETTER OMEGA WITH PSILI |
1FA0 | 8096 | ᾠ | GREEK SMALL LETTER OMEGA WITH PSILI AND YPOGEGRAMMENI |
1FE0 | 8160 | ῠ | GREEK SMALL LETTER UPSILON WITH VRACHY |
2020 | 8224 | † | DAGGER |
20A0 | 8352 | ₠ | EURO-CURRENCY SIGN |
20E0 | 8416 | ⃠ | COMBINING ENCLOSING CIRCLE BACKSLASH |
2120 | 8480 | ℠ | SERVICE MARK |
2160 | 8544 | Ⅰ | ROMAN NUMERAL ONE |
21A0 | 8608 | ↠ | RIGHTWARDS TWO HEADED ARROW |
21E0 | 8672 | ⇠ | LEFTWARDS DASHED ARROW |
2220 | 8736 | ∠ | ANGLE |
2260 | 8800 | ≠ | NOT EQUAL TO |
22A0 | 8864 | ⊠ | SQUARED TIMES |
22E0 | 8928 | ⋠ | DOES NOT PRECEDE OR EQUAL |
2320 | 8992 | ⌠ | TOP HALF INTEGRAL |
2360 | 9056 | ⍠ | APL FUNCTIONAL SYMBOL QUAD COLON |
2420 | 9248 | ␠ | SYMBOL FOR SPACE |
2460 | 9312 | ① | CIRCLED DIGIT ONE |
24A0 | 9376 | ⒠ | PARENTHESIZED LATIN SMALL LETTER E |
24E0 | 9440 | ⓠ | CIRCLED LATIN SMALL LETTER Q |
2520 | 9504 | ┠ | BOX DRAWINGS VERTICAL HEAVY AND RIGHT LIGHT |
2560 | 9568 | ╠ | BOX DRAWINGS DOUBLE VERTICAL AND RIGHT |
25A0 | 9632 | ■ | BLACK SQUARE |
25E0 | 9696 | ◠ | UPPER HALF CIRCLE |
2620 | 9760 | ☠ | SKULL AND CROSSBONES |
2660 | 9824 | ♠ | BLACK SPADE SUIT |
2720 | 10016 | ✠ | MALTESE CROSS |
27A0 | 10144 | ➠ | HEAVY DASHED TRIANGLE-HEADED RIGHTWARDS ARROW |
2800 | 10240 | ⠀ | BRAILLE PATTERN BLANK |
2801 | 10241 | ⠁ | BRAILLE PATTERN DOTS-1 |
2802 | 10242 | ⠂ | BRAILLE PATTERN DOTS-2 |
2803 | 10243 | ⠃ | BRAILLE PATTERN DOTS-12 |
2804 | 10244 | ⠄ | BRAILLE PATTERN DOTS-3 |
2805 | 10245 | ⠅ | BRAILLE PATTERN DOTS-13 |
2806 | 10246 | ⠆ | BRAILLE PATTERN DOTS-23 |
2807 | 10247 | ⠇ | BRAILLE PATTERN DOTS-123 |
2808 | 10248 | ⠈ | BRAILLE PATTERN DOTS-4 |
2809 | 10249 | ⠉ | BRAILLE PATTERN DOTS-14 |
280A | 10250 | ⠊ | BRAILLE PATTERN DOTS-24 |
280B | 10251 | ⠋ | BRAILLE PATTERN DOTS-124 |
280C | 10252 | ⠌ | BRAILLE PATTERN DOTS-34 |
280D | 10253 | ⠍ | BRAILLE PATTERN DOTS-134 |
280E | 10254 | ⠎ | BRAILLE PATTERN DOTS-234 |
280F | 10255 | ⠏ | BRAILLE PATTERN DOTS-1234 |
2810 | 10256 | ⠐ | BRAILLE PATTERN DOTS-5 |
2811 | 10257 | ⠑ | BRAILLE PATTERN DOTS-15 |
2812 | 10258 | ⠒ | BRAILLE PATTERN DOTS-25 |
2813 | 10259 | ⠓ | BRAILLE PATTERN DOTS-125 |
2814 | 10260 | ⠔ | BRAILLE PATTERN DOTS-35 |
2815 | 10261 | ⠕ | BRAILLE PATTERN DOTS-135 |
2816 | 10262 | ⠖ | BRAILLE PATTERN DOTS-235 |
2817 | 10263 | ⠗ | BRAILLE PATTERN DOTS-1235 |
2818 | 10264 | ⠘ | BRAILLE PATTERN DOTS-45 |
2819 | 10265 | ⠙ | BRAILLE PATTERN DOTS-145 |
281A | 10266 | ⠚ | BRAILLE PATTERN DOTS-245 |
281B | 10267 | ⠛ | BRAILLE PATTERN DOTS-1245 |
281C | 10268 | ⠜ | BRAILLE PATTERN DOTS-345 |
281D | 10269 | ⠝ | BRAILLE PATTERN DOTS-1345 |
281E | 10270 | ⠞ | BRAILLE PATTERN DOTS-2345 |
281F | 10271 | ⠟ | BRAILLE PATTERN DOTS-12345 |
2820 | 10272 | ⠠ | BRAILLE PATTERN DOTS-6 |
2821 | 10273 | ⠡ | BRAILLE PATTERN DOTS-16 |
2822 | 10274 | ⠢ | BRAILLE PATTERN DOTS-26 |
2823 | 10275 | ⠣ | BRAILLE PATTERN DOTS-126 |
2824 | 10276 | ⠤ | BRAILLE PATTERN DOTS-36 |
2825 | 10277 | ⠥ | BRAILLE PATTERN DOTS-136 |
2826 | 10278 | ⠦ | BRAILLE PATTERN DOTS-236 |
2827 | 10279 | ⠧ | BRAILLE PATTERN DOTS-1236 |
2828 | 10280 | ⠨ | BRAILLE PATTERN DOTS-46 |
2829 | 10281 | ⠩ | BRAILLE PATTERN DOTS-146 |
282A | 10282 | ⠪ | BRAILLE PATTERN DOTS-246 |
282B | 10283 | ⠫ | BRAILLE PATTERN DOTS-1246 |
282C | 10284 | ⠬ | BRAILLE PATTERN DOTS-346 |
282D | 10285 | ⠭ | BRAILLE PATTERN DOTS-1346 |
282E | 10286 | ⠮ | BRAILLE PATTERN DOTS-2346 |
282F | 10287 | ⠯ | BRAILLE PATTERN DOTS-12346 |
2830 | 10288 | ⠰ | BRAILLE PATTERN DOTS-56 |
2831 | 10289 | ⠱ | BRAILLE PATTERN DOTS-156 |
2832 | 10290 | ⠲ | BRAILLE PATTERN DOTS-256 |
2833 | 10291 | ⠳ | BRAILLE PATTERN DOTS-1256 |
2834 | 10292 | ⠴ | BRAILLE PATTERN DOTS-356 |
2835 | 10293 | ⠵ | BRAILLE PATTERN DOTS-1356 |
2836 | 10294 | ⠶ | BRAILLE PATTERN DOTS-2356 |
2837 | 10295 | ⠷ | BRAILLE PATTERN DOTS-12356 |
2838 | 10296 | ⠸ | BRAILLE PATTERN DOTS-456 |
2839 | 10297 | ⠹ | BRAILLE PATTERN DOTS-1456 |
283A | 10298 | ⠺ | BRAILLE PATTERN DOTS-2456 |
283B | 10299 | ⠻ | BRAILLE PATTERN DOTS-12456 |
283C | 10300 | ⠼ | BRAILLE PATTERN DOTS-3456 |
283D | 10301 | ⠽ | BRAILLE PATTERN DOTS-13456 |
283E | 10302 | ⠾ | BRAILLE PATTERN DOTS-23456 |
283F | 10303 | ⠿ | BRAILLE PATTERN DOTS-123456 |
2860 | 10336 | ⡠ | BRAILLE PATTERN DOTS-67 |
28A0 | 10400 | ⢠ | BRAILLE PATTERN DOTS-68 |
28E0 | 10464 | ⣠ | BRAILLE PATTERN DOTS-678 |
For more information please refer to the Unicode Consortium.
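Rows of the table above can be cross-checked by encoding a code point by hand and looking for byte #160 in the result. The following is a minimal sketch: the helper names utf8_bytes() and contains_byte_160() are made up for illustration, and the encoder only covers the Basic Multilingual Plane (U+0000 to U+FFFF), which is all the table uses.

```php
<?php
/**
 * Illustrative helper (name is an assumption): encode a BMP code point
 * (U+0000..U+FFFF) as UTF-8 bytes by hand, without extensions.
 */
function utf8_bytes(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);                       // 1 byte: 0xxxxxxx
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))          // 2 bytes: 110xxxxx 10xxxxxx
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xE0 | ($cp >> 12))             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

/** True if the UTF-8 encoding of $cp contains byte #160 (0xA0). */
function contains_byte_160(int $cp): bool
{
    return strpos(utf8_bytes($cp), "\xA0") !== false;
}

foreach ([0x00A0, 0x00E0, 0x2020, 0x2800] as $cp) {
    printf("U+%04X -> %s\n", $cp, bin2hex(utf8_bytes($cp)));
}
// U+00A0 -> c2a0
// U+00E0 -> c3a0
// U+2020 -> e280a0
// U+2800 -> e2a080
```

Note that for the Braille block (U+2800 to U+283F) byte #160 is the middle byte of the sequence, not the last one, which is why that whole range appears in the table.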
Discussion
Hints, comments, suggestions …
- Before writing this little plugin I tried to use the file conf/custom.conf for this task. But this didn't work because for one reason or the other that file isn't used at all by DokuWiki (2005-07-13). Grepping the sources showed that the file isn't referenced anywhere. I dunno whether this is a bug or intended behaviour.
You can get most of the effect of this by simply creating a conf/entities.local.conf file and adding a line like this to it:
(nbsp)
– Adam Shand 2007-01-29
I am new to DokuWiki (7 days). I installed your plugin in a locally installed DokuWiki and in a test one available on the net. Since then, wherever there is the letter “à” (à), it is replaced by a space or by a question mark followed by a space, depending on the browser used. All my files are in UTF-8. The funny thing is that you are detecting the sequence “C2A0” and this letter is “C3A0” in UTF-8. You can have a look at the page http://carnetweb.info/wiki/doku.php?id=developement where, at the end of the line starting with “nbsp”, there is the sequence “aàa”.
Thanks for spotting this! Please update your installation by using the fixed code above.
— Matthias Watermann 2005-09-27 14:17
PS: sorry to report here, but I failed to subscribe to the mailing list. Each time, the confirmation message creates a “failure notice” as follows:
<ecartis@freelists.org>: 206.53.239.180 does not like recipient. Remote host said: 554 <26.mail-out.ovh.net[213.186.42.179]>: Client host rejected: Access denied Giving up on 206.53.239.180.
– francois DOT granger AT gmail DOT com
Replace the third pattern in function connectTo with '\xC2\xA0'. There does seem to be a problem with PHP's preg and these characters; however, to my mind the pattern should be as I described, as it doesn't make sense to leave the “\xC2” hanging on its own when there is a pattern match.
– Christopher Smith 2005-09-26 02:02
In this case, I fear, it's not PHP's fault. My intention (as stated in the source comment) was to replace chr(160) occurrences which were not properly UTF-8 encoded. But, alas, it didn't come to my mind at that time that chr(160) can be part of another UTF-8 sequence. So while your suggestion would (X)HTML encode the correct nbsp UTF-8 sequence (which is a good thing in itself), it does not do what I intended in the first place; it's in fact the opposite: while I wanted to say “find all #160 that are not UTF-8 encoded”, your proposal says “find #160 that is UTF-8 encoded”. As for the chr(194) (0xC2): it was not supposed to be “hanging on its own”, as the RegEx means that chr(194) must not be in front of chr(160) and, as Francois reported, the RegEx as such does exactly that, with the unintended side effects he mentioned.
– Matthias Watermann 2005-09-26 07:44
It was late :? (and I just realised there is no oops smiley in DokuWiki). I went by the comments without paying enough attention to the regex. I guess you need a more complex pattern that attempts to match invalid UTF-8 byte sequences, or do away with that pattern altogether. I guess from a conceptual point of view a plugin shouldn't really have to validate the byte sequence used in a raw wiki page, especially when it is outside the syntax proposed by the plugin.
– Christopher Smith 2005-09-26 09:40
Had UTF-8 problems with this one. Deleting
“$this->Lexer->addSpecialPattern('(?<![\x80-\xE2])\xA0', $aMode, 'plugin_nbsp');”
solved it. Hope it helps others.
I'm a Korean user and I'm using the 2007-08-15 version. When I enable this plugin, rows that contain certain Unicode letters (from U+B800 to U+B83F and from U+C800 to U+C83F) are NOT displayed. Please refer to DokuWiki issue #988. Thank you in advance :) 2016-04-22
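The reports above are consistent with how the third pattern in connectTo() behaves on raw bytes: its lookbehind only excludes preceding bytes from 0x80 to 0xE2, while U+B800 to U+B83F and U+C800 to U+C83F are encoded with the lead bytes 0xEB and 0xEC followed by 0xA0. The sketch below first reproduces that with preg_match() and then shows one possible alternative. The function fix_stray_nbsp() is hypothetical and not part of the plugin, and its well-formedness check is deliberately simplified.

```php
<?php
// The third pattern from connectTo(), wrapped in delimiters for preg_match().
$original = '/(?<![\x80-\xE2])\xA0/';

var_dump(preg_match($original, "\xC3\xA0"));      // int(0): U+00E0 is left alone
var_dump(preg_match($original, "\xEB\xA0\x80"));  // int(1): U+B800 is hit by mistake

/**
 * Hypothetical alternative, not the plugin's code: skip over well-formed
 * UTF-8 sequences and rewrite only a lone 0xA0 byte. The well-formedness
 * check is simplified (it does not reject overlong forms or surrogates).
 */
function fix_stray_nbsp(string $bytes): string
{
    $pattern = '/([\x00-\x7F]|[\xC2-\xDF][\x80-\xBF]'
             . '|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF4][\x80-\xBF]{3})|\xA0/';
    return preg_replace_callback($pattern, function (array $m): string {
        // Group 1 is set when a complete character matched; keep it unchanged.
        return (isset($m[1]) && $m[1] !== '') ? $m[1] : "\xC2\xA0";
    }, $bytes);
}
```

The idea is the one suggested in the thread: consume complete characters first, so that a 0xA0 byte is only treated as a raw NBSP when it does not belong to any multi-byte sequence. Simply removing the third pattern, as described above, remains the simpler option.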
space: ASCII char #32 (not to be confused with blanks, ASCII char #255)
backslash: ASCII char #92
“escaped space”