Language Syntax PlugIn
Compatible with DokuWiki
2005-07-13+
This extension has not been updated in over 2 years. It may no longer be maintained or supported and may have compatibility issues.
Similar to wrap
Sometimes the need arises to use words, phrases or even whole sentences or paragraphs in a language different from the document's main language. To support the readers of such a multilingual document it is advisable to explicitly mark up all language changes in it.
This plugin allows for adding markup to indicate such language changes.
It is implemented, technically speaking, by adding appropriate span tags around the text in question.
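As a rough illustration of that idea outside of DokuWiki's parser, the following sketch replaces each lang pair in a plain string with a span carrying lang and xml:lang attributes. The function name langToSpan is hypothetical and not part of the plugin. The plugin itself hooks into DokuWiki's lexer instead, and this sketch ignores any wiki markup nested inside the lang tags.

```php
<?php
/**
 * Hypothetical helper (NOT part of the plugin): convert
 * <lang xx>...</lang> markup in a plain string into
 * <span lang="xx" xml:lang="xx">...</span>.
 */
function langToSpan($text)
{
    return preg_replace_callback(
        // language code, optional whitespace, then the shortest body up to </lang>
        '|<lang\s+([a-zA-Z0-9\-]{2,})\s*>\s*(.*?)</lang>|s',
        function ($m) {
            // escape the code for use inside an attribute value
            $code = htmlspecialchars($m[1], ENT_QUOTES);
            // escape the enclosed text, leaving quotes alone
            $body = htmlspecialchars($m[2], ENT_NOQUOTES);
            return '<span lang="' . $code . '" xml:lang="' . $code . '">'
                . $body . '</span>';
        },
        $text
    );
}

echo langToSpan('An English sentence. <lang de>Ein deutscher Satz.</lang>'), "\n";
```

Text outside the lang tags is passed through unchanged, which is enough to show the wrapping step but is not how a real renderer would treat it.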
Usage
To use this plugin, enclose any text that is in a language other than the rest of the document in lang tags:
<lang code> ... </lang>
The code part (the language code) is usually the two-letter language code as defined by ISO standard 639, Code for the representation of names of languages. The details of its use are explained in RFC 3066, Tags for the Identification of Languages.
See the HTML specs as well for further details.
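To make the accepted forms concrete, here is a minimal sketch that checks a language code in the same spirit as the plugin's handle() method shown in the source further down. The function name checkLangCode is hypothetical. The patterns are adapted from that method and are looser than a full RFC 3066 validation.

```php
<?php
/**
 * Hypothetical helper (NOT part of the plugin): return the language
 * code to use, or FALSE if the code is not acceptable.
 */
function checkLangCode($code)
{
    $hits = array();
    // primary tag only, e.g. "de" (most likely to be used)
    if (preg_match('|^[a-z]{2,3}$|iD', $code)) {
        return $code;
    }
    // primary tag and subtag, e.g. "de-DE"
    if (preg_match('|^[a-z]{2,3}-[a-z0-9]{2,}$|iD', $code)) {
        return $code;
    }
    // one-letter primary tag ("i" or "x") and subtag, e.g. "x-klingon"
    if (preg_match('|^[ix]-[a-z0-9]{2,}$|iD', $code)) {
        return $code;
    }
    // convenience: primary tag with an empty or unusable subtag, e.g. "de-"
    if (preg_match('|^([a-z]{2,3})-|i', $code, $hits)) {
        return $hits[1];
    }
    // invalid language specification
    return false;
}

foreach (array('de', 'de-DE', 'x-klingon', 'de-', '42') as $code) {
    echo $code, ' => ', var_export(checkLangCode($code), true), "\n";
}
```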
Please note that this is so-called inline markup, meaning it is to be used inside block elements.
The lang tag (like its HTML equivalent, span) does not constitute a text block but is part of one.
Consequently, you have to open a new block (by inserting an empty line) if you want to mark up a whole paragraph, as can be seen in the following examples.
Examples
Suppose a document is written in plain English, but some sentences are to be given in another language. Those “foreign” parts are then marked up as in the following example:
**1**

This is an __English__ sentence. <lang de>Dies ist ein //deutscher// Satz.</lang> This is a second __English__ sentence.

**2**

This is an __English__ sentence. <lang de-DE>Dies ist ein //deutscher// Satz.</lang> This is a second __English__ sentence.

**3**

This is an __English__ sentence. <lang de> Dies ist ein //deutscher// Satz. </lang> This is a second __English__ sentence.

**4**

This is an __English__ paragraph.

<lang de-> Dies ist ein //deutscher// Absatz. </lang>

This is a second __English__ paragraph.

**5**

This is an __English__ paragraph.

<lang x-klingon>Well, I, er ... dunno how to, hmmm... write Klingon.</lang>

This is a second __English__ paragraph.
As can be seen, the formatting follows the usual rules for inline markup.
In sections one to three the text portion in a different language is just a part (here: a sentence) between other parts.
In sections four and five, however, there are empty lines before and after the lang markup, which turns that part into a paragraph between other paragraphs.
The resulting HTML, btw, looks as follows:
<p><strong>1</strong></p>
<p>This is an <u>English</u> sentence. <span lang="de" xml:lang="de">Dies ist ein <em>deutscher</em> Satz.</span> This is a second <u>English</u> sentence.</p>
<p><strong>2</strong></p>
<p>This is an <u>English</u> sentence. <span lang="de-DE" xml:lang="de-DE">Dies ist ein <em>deutscher</em> Satz.</span> This is a second <u>English</u> sentence.</p>
<p><strong>3</strong></p>
<p>This is an <u>English</u> sentence. <span lang="de" xml:lang="de">Dies ist ein <em>deutscher</em> Satz. </span> This is a second <u>English</u> sentence.</p>
<p><strong>4</strong></p>
<p>This is an <u>English</u> paragraph.</p>
<p><span lang="de" xml:lang="de">Dies ist ein <em>deutscher</em> Absatz. </span></p>
<p>This is a second <u>English</u> paragraph.</p>
<p><strong>5</strong></p>
<p>This is an <u>English</u> paragraph.</p>
<p><span lang="x-klingon" xml:lang="x-klingon">Well, I, er ... dunno how to, hmmm... write Klingon.</span></p>
<p>This is a second <u>English</u> paragraph.</p>
Installation
Search and install the plugin using the Extension Manager. Alternatively, refer to Plugins on how to install plugins manually.
Plugin Source
Here is the GPLed PHP source for those who'd like to review it before actually installing it:
<?php
if (! class_exists('syntax_plugin_lang')) {
  if (! defined('DOKU_PLUGIN')) {
    if (! defined('DOKU_INC')) {
      define('DOKU_INC', realpath(dirname(__FILE__) . '/../../') . '/');
    } // if
    define('DOKU_PLUGIN', DOKU_INC . 'lib/plugins/');
  } // if
  // include parent class
  require_once(DOKU_PLUGIN . 'syntax.php');

/**
 * <tt>syntax_plugin_lang.php </tt>- A PHP4 class that implements
 * a <tt>DokuWiki</tt> plugin to specify an area using a different
 * language than the remaining document.
 *
 * <p>
 * Markup a section of text to be using a different language,
 * <tt>lang 2-letter-lang-code</tt>
 * </p><pre>
 *  Copyright (C) 2005, 2007  DFG/M.Watermann, D-10247 Berlin, FRG
 *      All rights reserved
 *    EMail : <support@mwat.de>
 * </pre>
 * <div class="disclaimer">
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either
 * <a href="http://www.gnu.org/licenses/gpl.html">version 3</a> of the
 * License, or (at your option) any later version.<br>
 * This software is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * General Public License for more details.
 * </div>
 * @author <a href="mailto:support@mwat.de">Matthias Watermann</a>
 * @version <tt>$Id: syntax_plugin_lang.php,v 1.4 2007/08/15 12:36:19 matthias Exp $</tt>
 * @since created 1-Sep-2005
 */
class syntax_plugin_lang extends DokuWiki_Syntax_Plugin {

  /**
   * @publicsection
   */
  //@{

  /**
   * Tell the parser whether the plugin accepts syntax mode
   * <tt>$aMode</tt> within its own markup.
   *
   * @param $aMode String The requested syntaxmode.
   * @return Boolean <tt>TRUE</tt> unless <tt>$aMode</tt> is
   * <tt>plugin_lang</tt> (which would result in a
   * <tt>FALSE</tt> method result).
   * @public
   * @see getAllowedTypes()
   * @static
   */
  function accepts($aMode) {
    return ('plugin_lang' != $aMode);
  } // accepts()

  /**
   * Connect lookup pattern to lexer.
   *
   * @param $aMode String The desired rendermode.
   * @public
   * @see render()
   */
  function connectTo($aMode) {
    // See http://www.w3.org/TR/html401/struct/dirlang.html#h-8.1.1;
    // better (specialized) REs are used in 'handle()' method.
    $this->Lexer->addEntryPattern(
      '\x3Clang\s+[a-z\-A-Z0-9]{2,}\s*\x3E\s*(?=(?s).*?\x3C\x2Flang\x3E)',
      $aMode, 'plugin_lang');
  } // connectTo()

  /**
   * Get an associative array with plugin info.
   *
   * <p>
   * The returned array holds the following fields:
   * <dl>
   * <dt>author</dt><dd>Author of the plugin</dd>
   * <dt>email</dt><dd>Email address to contact the author</dd>
   * <dt>date</dt><dd>Last modified date of the plugin in
   * <tt>YYYY-MM-DD</tt> format</dd>
   * <dt>name</dt><dd>Name of the plugin</dd>
   * <dt>desc</dt><dd>Short description of the plugin (Text only)</dd>
   * <dt>url</dt><dd>Website with more information on the plugin
   * (eg. syntax description)</dd>
   * </dl>
   * @return Array Information about this plugin class.
   * @public
   * @static
   */
  function getInfo() {
    return array(
      'author' => 'Matthias Watermann',
      'email' => 'support@mwat.de',
      'date' => '2007-08-15',
      'name' => 'LANGuage Syntax Plugin',
      'desc' => 'Markup a text area using another language',
      'url' => 'http://www.dokuwiki.org/plugin:lang');
  } // getInfo()

  /**
   * Where to sort in?
   *
   * @return Integer <tt>498</tt> (doesn't really matter).
   * @public
   * @static
   */
  function getSort() {
    return 498;
  } // getSort()

  /**
   * Get the type of syntax this plugin defines.
   *
   * @return String <tt>'formatting'</tt>.
   * @public
   * @static
   */
  function getType() {
    return 'formatting';
  } // getType()

  /**
   * Handler to prepare matched data for the rendering process.
   *
   * <p>
   * The <tt>$aState</tt> parameter gives the type of pattern
   * which triggered the call to this method:
   * </p>
   * <dl>
   * <dt>DOKU_LEXER_ENTER</dt>
   * <dd>a pattern set by <tt>addEntryPattern()</tt></dd>
   * <dt>DOKU_LEXER_MATCHED</dt>
   * <dd>a pattern set by <tt>addPattern()</tt></dd>
   * <dt>DOKU_LEXER_EXIT</dt>
   * <dd> a pattern set by <tt>addExitPattern()</tt></dd>
   * <dt>DOKU_LEXER_SPECIAL</dt>
   * <dd>a pattern set by <tt>addSpecialPattern()</tt></dd>
   * <dt>DOKU_LEXER_UNMATCHED</dt>
   * <dd>ordinary text encountered within the plugin's syntax mode
   * which doesn't match any pattern.</dd>
   * </dl>
   * @param $aMatch String The text matched by the patterns.
   * @param $aState Integer The lexer state for the match.
   * @param $aPos Integer The character position of the matched text.
   * @param $aHandler Object Reference to the Doku_Handler object.
   * @return Array Index <tt>[0]</tt> holds the current
   * <tt>$aState</tt>, index <tt>[1]</tt> the match prepared for
   * the <tt>render()</tt> method.
   * @public
   * @see render()
   * @static
   */
  function handle($aMatch, $aState, $aPos, &$aHandler) {
    if (DOKU_LEXER_ENTER == $aState) {
      $hits = array();
      // RFC 3066, "2. The Language tag", p. 2f.
      // Language-Tag = Primary-subtag *( "-" Subtag )
      if (preg_match('|\s+([a-z]{2,3})\s*>|i', $aMatch, $hits)) {
        // primary _only_ (most likely to be used)
        return array($aState, $hits[1]);
      } // if
      if (preg_match('|\s+([a-z]{2,3}\-[a-z0-9]{2,})\s*>|i',
      $aMatch, $hits)) {
        // primary _and_ subtag
        return array($aState, $hits[1]);
      } // if
      if (preg_match('|\s+([ix]\-[a-z0-9]{2,})\s*>|i',
      $aMatch, $hits)) {
        // 1-letter primary _and_ subtag
        return array($aState, $hits[1]);
      } // if
      if (preg_match('|\s+([a-z]{2,3})\-.*\s*>|i', $aMatch, $hits)) {
        // convenience: accept primary with empty subtag
        return array($aState, $hits[1]);
      } // if
      // invalid language specification
      return array($aState, FALSE);
    } // if
    return array($aState, $aMatch);
  } // handle()

  /**
   * Add exit pattern to lexer.
   *
   * @public
   */
  function postConnect() {
    $this->Lexer->addExitPattern('\x3C\x2Flang\x3E', 'plugin_lang');
  } // postConnect()

  /**
   * Handle the actual output creation.
   *
   * <p>
   * The method checks for the given <tt>$aFormat</tt> and returns
   * <tt>FALSE</tt> when a format isn't supported. <tt>$aRenderer</tt>
   * contains a reference to the renderer object which is currently
   * handling the rendering. The contents of <tt>$aData</tt> is the
   * return value of the <tt>handle()</tt> method.
   * </p>
   * @param $aFormat String The output format to generate.
   * @param $aRenderer Object A reference to the renderer object.
   * @param $aData Array The data created by the <tt>handle()</tt>
   * method.
   * @return Boolean <tt>TRUE</tt> if rendered successfully, or
   * <tt>FALSE</tt> otherwise.
   * @public
   * @see handle()
   *
   */
  function render($aFormat, &$aRenderer, &$aData) {
    if ('xhtml' != $aFormat) {
      return FALSE;
    } // if
    static $VALID = TRUE; // flag to notice invalid markup
    switch ($aData[0]) {
      case DOKU_LEXER_ENTER:
        if ($aData[1]) {
          $aRenderer->doc .= '<span lang="' . $aData[1]
            . '" xml:lang="' . $aData[1] . '">';
        } else {
          $VALID = FALSE;
        } // if
        return TRUE;
      case DOKU_LEXER_UNMATCHED:
        // escape HTML special characters in the enclosed text
        $aRenderer->doc .= str_replace(array('&','<', '>'),
          array('&amp;', '&lt;', '&gt;'), $aData[1]);
        return TRUE;
      case DOKU_LEXER_EXIT:
        if ($VALID) {
          $aRenderer->doc .= '</span>';
        } else {
          $VALID = TRUE;
        } // if
        // fall through
      default:
        return TRUE;
    } // switch
  } // render()

  //@}
} // class syntax_plugin_lang
} // if
//Setup VIM: ex: et ts=2 enc=utf-8 :
?>
Changes
2007-08-15:
* added GPL link and fixed some doc problems;
2007-01-05:
* minor internal changes (added comments, date updated);
2005-09-04:
+ initial release;
Matthias Watermann 2007-08-15
See also
Plugins by the same author
- BOMfix Plugin – ignore Byte-Order-Mark characters in your pages
- Code Syntax Plugin – use syntax highlighting of code fragments in your pages
- Definition List Syntax Plugin – use the only complete definition lists in your pages
- Diff Syntax Plugin – use highlighting of diff files (aka “patches”) in your pages
- HR Syntax Plugin – use horizontal rules in nested block elements of your pages
- LANGuage Syntax Plugin – markup different languages in your pages
- Lists Syntax Plugin – use the only complete un-/ordered lists in your pages
- NBSP Syntax Plugin – use Non-Breakable-Spaces in your pages
- NsToC Syntax Plugin – use automatically generated namespace indices
- Shy Syntax Plugin – use soft hyphens in your pages
- Tip Syntax Plugin – add hint areas to your pages
Discussion
Hints, comments, suggestions …
Doesn't seem to work too well in Internet Explorer.
Word 2003 has an option to manually insert phonetics above specified words…
I was wondering if it was possible to create a module or plugin for DokuWiki that does the following for Koine Greek: a) allows the user to upload a two-column word list (first column source text, second column phonetic text); b) lets the user specify the fonts for the source and phonetic text; c) has DokuWiki automatically recognize the words from the source text in any text [as one types] and auto-insert and center the phonetic text ABOVE each (tagged) occurrence…
An optional button to insert tags on selected text would be great also, not to mention Unicode capability for the source text column, and the option to configure both language and fonts as per source text and phonetic output, if necessary
Thanx a million…
Please contact keith (at) pm-intl (.) org
Keith
See bounties for such requests.
Suggestion: Add dir=“rtl” to span tag for RTL languages. It can possibly be determined by $lang['direction'] in lang.php of that language.
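A rough sketch of what that suggestion could look like when building the opening tag. Everything here is an assumption for illustration: the function name openLangSpan is hypothetical, and the short hard-coded list of right-to-left languages stands in for the lookup of $lang['direction'] mentioned above, which this sketch does not attempt.

```php
<?php
/**
 * Hypothetical helper (NOT part of the plugin): build the opening span
 * tag and add dir="rtl" for a few well-known right-to-left languages.
 */
function openLangSpan($code)
{
    // assumption: a small fixed list (Arabic, Persian, Hebrew, Urdu)
    // instead of reading the direction from DokuWiki's language files
    $rtl = array('ar', 'fa', 'he', 'ur');
    // the primary tag is everything before the first hyphen
    $parts = explode('-', $code);
    $primary = strtolower($parts[0]);
    $dir = in_array($primary, $rtl, true) ? ' dir="rtl"' : '';
    $safe = htmlspecialchars($code, ENT_QUOTES);
    return '<span lang="' . $safe . '" xml:lang="' . $safe . '"' . $dir . '>';
}

echo openLangSpan('he'), "\n";
echo openLangSpan('de'), "\n";
```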
Unfortunately, headline code, e.g. “== headline ==”, is not interpreted as headline code but printed as raw text, i.e. the “==” are printed and no headline is generated. The same is true for the language tag of the wrap tool.
Rolf Hemmerling 2009-12-23 10:00
<lang ...>
markup in regard to the surrounding text and newlines