====== Romanize filenames ======

**Keywords: UTF-8, romanize, cyrillic, latin, convert, filename**

When you upgrade from a previous version that did not yet have the "romanize" function, you will encounter a completely unreadable directory structure. For example:

  %D0%BA%D1%8B%D1%80%D0%B3%D1%8B%D0%B7%D1%81%D1%82%D0%B0%D0%BD.txt

is the same as

  кыргызстан.txt

This is because the UTF-8 filenames have been urlencoded. In later versions, the "romanization" option has been added to circumvent this problem. ((See [[config:deaccent]] and [[:romanization]] for more info.))

The script below copies this unreadable directory structure to a new directory with "romanized" filenames. You will have to include the [[tips:romanize:UTF8.php]] file, which is part of the DokuWiki installation.

Note: this script is not error free. For example, UTF8.php transliterates some Cyrillic characters, such as 'ъ', as "'", so a romanized filename can end with "'". The cleanID() function in the script replaces each run of "'" with "_", but please check your page structure after conversion for invalid filenames.

I hope this will help someone. Any improvements are welcome.

Update: UTF8.php has been rewritten; the code below has only been tested with this version of [[tips:romanize:UTF8.php]].

<code php>
<?php
// UTF8.php provides utf8_strtolower(), utf8_romanize() and utf8_deaccent().
// Adjust the path to wherever your copy of the file lives.
require_once 'UTF8.php';

/**
 * Copy a file, or recursively copy a folder, romanizing every
 * destination path with cleanID() along the way.
 *
 * @link   http://aidanlister.com/repos/v/function.copyr.php
 * @param  string $source Source path
 * @param  string $dest   Destination path
 * @return bool   Returns TRUE on success, FALSE on failure
 */
function copyr($source, $dest) {
    $dest2 = cleanID($dest);
    echo $source . "->" . $dest . " ->$dest2\n";

    // Simple copy for a file
    if (is_file($source)) {
        return copy($source, $dest2);
    }

    // Make the romanized destination directory
    if (!is_dir($dest2)) {
        mkdir($dest2);
    }

    // Loop through the folder
    $dir = dir($source);
    while (false !== ($entry = $dir->read())) {
        // Skip pointers
        if ($entry == '.' || $entry == '..') {
            continue;
        }

        // Deep copy directories
        if ($dest !== "$source/$entry") {
            copyr("$source/$entry", "$dest/$entry");
        }
    }

    // Clean up
    $dir->close();
    return true;
}

/**
 * Decode a urlencoded path and romanize it.
 */
function cleanID($id, $ascii = false) {
    $id = trim(urldecode($id));
    $id = utf8_strtolower($id);
    $id = utf8_romanize($id);
    // utf8_deaccent() returns the converted string, so keep the result
    $id = utf8_deaccent($id, -1);
    // Replace each run of "'" left by the romanization with "_"
    $id = preg_replace('#\'+#', '_', $id);
    return $id;
}

copyr("/dokuwiki/data/pages/", "/dokuwiki/data/pagesnew/");
?>
</code>
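
Before running the conversion, it can help to see what the urlencoded names decode to. The following is only a minimal sketch that uses PHP's built-in urldecode() and does not need UTF8.php. The helper name previewNames() is hypothetical and not part of DokuWiki, and it shows the decoded UTF-8 name, not the final romanized name.

```php
<?php
/**
 * Hypothetical helper (not part of DokuWiki): map each urlencoded
 * filename to its decoded UTF-8 form so the names can be reviewed
 * before the conversion script is run.
 *
 * @param  array $encodedNames List of urlencoded filenames
 * @return array Encoded name => decoded name
 */
function previewNames(array $encodedNames) {
    $preview = array();
    foreach ($encodedNames as $name) {
        $preview[$name] = urldecode($name);
    }
    return $preview;
}

// The example filename from the top of this page
$encoded = '%D0%BA%D1%8B%D1%80%D0%B3%D1%8B%D0%B7%D1%81%D1%82%D0%B0%D0%BD.txt';

foreach (previewNames(array($encoded)) as $old => $new) {
    echo $old . ' -> ' . $new . "\n";
}
```

In practice the list of names could come from scandir() on the pages directory. Nothing is written to disk here, so the sketch is safe to run against a live wiki.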