UTF-8 String Handling
PHP's native string functions treat strings as sequences of single bytes and are not aware of UTF-8 by default. The recommended way of working with UTF-8 strings is to use the mbstring extension. Unfortunately this extension is not always available. DokuWiki comes with a library that handles UTF-8 strings in pure PHP when mbstring is not available.
Note: Only use UTF-8 aware functions when needed. If an operation can be done on byte level without special care for character boundaries, do it that way, as it is usually much faster.
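To illustrate the note above, here is a minimal sketch in plain PHP (no DokuWiki classes involved) showing which operations are safe on byte level and which are not. The sample string and variable names are made up for illustration. The `\u{...}` escapes assume PHP 7 or later and are used only so the example does not depend on the file's encoding.

```php
<?php
// "Grüße, Welt": 11 characters, but 13 bytes (ü and ß take 2 bytes each).
$text = "Gr\u{00FC}\u{00DF}e, Welt";

// Native strlen() counts bytes, not characters.
$byteLength = strlen($text); // 13

// Counting characters needs a UTF-8 aware step; the /u modifier makes
// PCRE match whole code points instead of single bytes.
$charLength = preg_match_all('/./us', $text); // 11

// Safe on byte level: splitting on an ASCII delimiter. ASCII bytes never
// occur inside a UTF-8 multibyte sequence.
$parts = explode(', ', $text);

// Safe on byte level: replacing one complete substring with another.
$replaced = str_replace('Welt', 'Wiki', $text);

// NOT safe on byte level: cutting at an arbitrary byte offset. Three bytes
// are "G", "r" and only the first half of "ü".
$cut = substr($text, 0, 3);

// preg_match() with /u returns false on malformed UTF-8, so this is false.
$cutIsValid = preg_match('//u', $cut) === 1;
```

As a rule of thumb, searching, splitting, comparing for equality and concatenating work fine on bytes. Anything that counts characters, cuts at a character position or changes case needs the UTF-8 aware functions.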
The available UTF-8 aware methods can be found in the \dokuwiki\Utf8 namespace.
- dokuwiki\Utf8\Asian provides methods to treat Asian scripts as “words”. This is mostly used to tokenize Asian texts for full text search.
- dokuwiki\Utf8\Clean provides methods to check and clean strings. It also provides romanization for some language scripts.
- dokuwiki\Utf8\Conversion provides methods to convert between Unicode dialects and HTML entities.
- dokuwiki\Utf8\PhpString provides UTF-8 aware replacements for typical PHP string functions like strlen, substr, strtolower, etc. This is probably the class plugin authors will use the most.
- dokuwiki\Utf8\Sort provides language-aware sorting without the need for the intl extension (but will use it if available).
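The following sketch shows how plugin code might call the PhpString class for character-based length and substring operations. The helper functions `utf8Length()` and `utf8Prefix()` are hypothetical and not part of DokuWiki. They exist only so that the snippet also runs outside a DokuWiki installation, where it falls back to PCRE with the /u modifier. Inside DokuWiki you would normally call `PhpString::strlen()` and `PhpString::substr()` directly.

```php
<?php
/**
 * Hypothetical helper: number of characters (not bytes) in a UTF-8 string.
 */
function utf8Length(string $text): int
{
    if (class_exists('\dokuwiki\Utf8\PhpString')) {
        // Running inside DokuWiki: use the UTF-8 aware replacement.
        return \dokuwiki\Utf8\PhpString::strlen($text);
    }
    // Standalone fallback (assumption: PCRE is built with UTF-8 support).
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    return $chars === false ? 0 : count($chars);
}

/**
 * Hypothetical helper: the first $length characters of a UTF-8 string.
 */
function utf8Prefix(string $text, int $length): string
{
    if (class_exists('\dokuwiki\Utf8\PhpString')) {
        return \dokuwiki\Utf8\PhpString::substr($text, 0, $length);
    }
    // Standalone fallback: split into code points, then slice the array.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return ''; // malformed UTF-8 input
    }
    return implode('', array_slice($chars, 0, $length));
}

// "Überschrift" is 11 characters long but occupies 12 bytes.
$title = "\u{00DC}berschrift";
$length = utf8Length($title);     // 11
$prefix = utf8Prefix($title, 4);  // "Über"
```

Compare this with the native functions: `strlen($title)` reports 12, and `substr($title, 0, 4)` returns only "Übe" because the leading "Ü" already takes two bytes.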
devel/utf-8.txt · Last modified: 2022-08-22 14:21 by andi