Re: [PHP-I18N] trimming spaces from multibyte strings
From: Moriyoshi Koizumi (moriyoshi <email protected>)
Date: 05/03/03

Hi,

Jean-Christian Imbeault <jc <email protected>> wrote:

> #1 is trim() multibyte safe?

In principle, no. But as long as you only need to strip half-width (ASCII)
white space, trim() works without problems.
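
To illustrate, here is a minimal sketch, assuming the string is UTF-8. The
full-width space (U+3000) is written as raw bytes so the example does not
depend on the encoding of the script file; the variable names are only for
illustration.

```php
<?php
// U+3000 IDEOGRAPHIC SPACE (full-width space), encoded in UTF-8.
$fw = "\xE3\x80\x80";

// ASCII space + full-width space on both sides of "abc".
$s = " " . $fw . "abc" . $fw . " ";

// trim() strips the ASCII spaces but leaves the full-width ones alone.
$t = trim($s);
var_dump($t === $fw . "abc" . $fw);  // bool(true)
```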

> #2 if not what is a simple way of trimming whitespace from multibyte
> strings, in general if possible, and in particular for japanese strings?

1. Enable the mbregex extension at configure time and use
   mb_ereg_replace(). Note that mb_ereg_replace() can only handle strings
   encoded in SJIS, EUC-JP, UTF-8, or ASCII.

   ex.

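   <?php
       // Strip any run of the characters in $spc from both ends of $str.
       // $spc is interpolated into a character class as-is, so escape any
       // regex metacharacters (e.g. "]", "^", "-") before passing it in.
       function my_mb_trim($str, $spc)
       {
           // Inner call trims the start of the string, outer call the end.
           return mb_ereg_replace("[$spc]*$", "",
                      mb_ereg_replace("^[$spc]*", "", $str));
       }
   ?>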

2. Use preg_replace() with the u modifier (lowercase; uppercase U only
   inverts greediness), assuming UTF-8 as the internal character encoding.
   This method is considered safer and more stable than mb_ereg_replace(),
   because mb_ereg_replace() has been reported to cause segmentation
   faults with some complicated patterns.

3. Ask the developers to add a multibyte-safe version of trim().
   This will definitely take some time :)
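
For option 2, something along these lines might do. This is only a sketch:
my_utf8_trim() is a made-up name, and it assumes that both arguments are
valid UTF-8 and that $chars is not empty. It relies on the lowercase u
modifier of the PCRE functions, which makes the pattern and the subject be
treated as UTF-8.

```php
<?php
// Hypothetical helper (not an existing PHP function): removes any run of the
// characters listed in $chars from both ends of $str.
// Assumes $str and $chars are valid UTF-8 and $chars is non-empty;
// preg_replace() returns NULL if the subject is not valid UTF-8.
function my_utf8_trim($str, $chars)
{
    // preg_quote() escapes regex metacharacters so that $chars is taken
    // literally inside the character class ('/' is the pattern delimiter).
    $class = '[' . preg_quote($chars, '/') . ']';

    // \A and \z anchor to the very start and very end of the subject.
    return preg_replace('/\A' . $class . '+|' . $class . '+\z/u', '', $str);
}

$fw = "\xE3\x80\x80";  // U+3000 IDEOGRAPHIC SPACE in UTF-8
var_dump(my_utf8_trim($fw . " abc " . $fw, " " . $fw));  // string(3) "abc"
```

Passing the set of characters as an argument keeps the same calling
convention as my_mb_trim() in option 1, so the two can be swapped.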

Moriyoshi

-- 
PHP Internationalization Mailing List (http://www.php.net/)