Date: 05/06/03
- Next message: Jean-Christian Imbeault: "Re: [PHP-I18N] trimming spaces from multibyte strings"
- Previous message: Jean-Christian Imbeault: "Re: [PHP-I18N] trimming spaces from multibyte strings"
- In reply to: Jean-Christian Imbeault: "Re: [PHP-I18N] trimming spaces from multibyte strings"
- Next in thread: Jean-Christian Imbeault: "Re: [PHP-I18N] trimming spaces from multibyte strings"
- Reply: Jean-Christian Imbeault: "Re: [PHP-I18N] trimming spaces from multibyte strings"
Hi,
On Tue, 06 May 2003 14:58:25 +0900
Jean-Christian Imbeault <jc <email protected>> wrote:
> Moriyoshi Koizumi wrote:
> >
> >>#1 is trim() multibyte safe?
> >
> > Principally no, but when it comes to half-width white spaces, trim() works
> > without problem.
>
> Hum you say 'no' but then 'yes' :) By "white spaces" did you mean
> "whitespace" as in tabs, line feeds, etc... or specifically only ASCII 0x20?
>
> If you meant all of trim()'s whitespaces (0x[00,09,0A,0B,0D,20]) then
> trim() *is* multibyte safe no?
trim() also takes an optional second parameter that specifies which
characters to trim. If that parameter is set to "\\" and the internal
encoding is Shift_JIS, trim() might strip too many bytes and corrupt the
resulting string, because the byte 0x5C can also be the second byte of a
Shift_JIS double-byte character. In addition, trim() cannot remove
full-width (zenkaku) spaces properly, as it treats the multibyte character
as a sequence of separate single-byte characters. With EUC-JP or UTF-8,
trim() is safer to use than with Shift_JIS because of how those encodings
are structured: the bytes of a multibyte character never fall in the ASCII
range.
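
To make the Shift_JIS case concrete, here is a minimal sketch. The bytes are
written as hex escapes so that it does not depend on the encoding of the
source file or on the mbstring extension. The sample text is made up for
illustration.

```php
<?php
// Katakana "SO" in Shift_JIS is 0x83 0x5C: its second byte equals "\".
$so   = "\x83\x5C";
$text = "abc" . $so;

// Asking trim() to strip backslashes also strips that second byte.
$trimmed = trim($text, "\\");
echo bin2hex($trimmed), "\n";   // prints 61626383 (a lone lead byte 0x83 remains)

// The full-width space in Shift_JIS is 0x81 0x40; neither byte is in
// trim()'s default character list, so nothing is removed.
$zenkaku = "\x81\x40";
$padded  = $zenkaku . "abc" . $zenkaku;
var_dump(trim($padded) === $padded);   // bool(true)
```

The first result ends in half of a double-byte character, which is the
corrupted output described above.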
> But then again I guess that there is no generic definition of whitespace
> in that different character sets have different codes for white space.
>
> Can ASCII whitespace be considered universal in all character sets?
Most character sets in common use map whitespace to the same codes as ASCII.
However, trim() still fails with encodings such as UTF-32 (a Unicode encoding
form), where every character is encoded as four octets (bytes).
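
A small sketch of the UTF-32 problem, assuming big-endian UTF-32 without a
byte order mark. Since trim() strips NUL bytes by default, it eats into the
padding bytes of the neighbouring character as well:

```php
<?php
// " A " in UTF-32BE: each character occupies four bytes.
$space = "\x00\x00\x00\x20";
$a     = "\x00\x00\x00\x41";
$text  = $space . $a . $space;

// trim() works byte by byte and its default list contains both " " and "\0",
// so it also removes the three leading zero bytes that belong to "A".
$trimmed = trim($text);
echo bin2hex($trimmed), "\n";   // prints 41
```

What is left is a single byte, which is no longer a valid UTF-32 string.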
> But isn't there a regular expression for whitespace? Something like
> 'sS'? Is using the regex pattern for whitespace not an option?
In many cases, the full-width space isn't treated as generic whitespace
in regular expressions.
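
One way around this, sketched here for UTF-8 input only, is to name the
full-width space (U+3000) explicitly in the pattern instead of relying on \s
alone. The helper name trim_utf8() is hypothetical, not an existing PHP
function, and the sketch assumes the PCRE extension with UTF-8 support.

```php
<?php
// Hypothetical helper: trims ASCII whitespace and U+3000 from both ends
// of a UTF-8 string.
function trim_utf8(string $s): string
{
    // The "u" modifier makes PCRE treat pattern and subject as UTF-8, so
    // \x{3000} matches the three-byte sequence E3 80 80 as one character.
    $result = preg_replace('/^[\s\x{3000}]+|[\s\x{3000}]+$/u', '', $s);

    // preg_replace() returns null on failure (for example, invalid UTF-8);
    // in that case return the input unchanged.
    return $result === null ? $s : $result;
}

$zenkaku = "\xE3\x80\x80";                    // U+3000 in UTF-8
$text    = $zenkaku . " abc" . $zenkaku . "\t";
echo trim_utf8($text), "\n";                  // prints abc
```

For other encodings the string would have to be converted to UTF-8 first,
so this is not a general replacement for a multibyte-aware trim().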
> > 3. Request that the developers add a multibyte-safe version of
> > trim(). This should definitely take some time :)
>
> I did ask something similar six months back but was told (by you :) that
> this was not a common "problem" and that asking this list would be better.
>
> But since you suggest it I will ask for this as a new feature as I
> really do find it difficult to remove whitespace from multibyte strings :)
Well, do you really find it difficult? :)
Moriyoshi
-- PHP Internationalization Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php

