Date: 05/13/03
- Next message: Moriyoshi Koizumi: "Re: [PHP-I18N] UTF-8 string validity detection"
- Previous message: Moriyoshi Koizumi: "Re: [PHP-I18N] UTF-8 string validity detection"
- In reply to: Moriyoshi Koizumi: "Re: [PHP-I18N] UTF-8 string validity detection"
- Next in thread: Moriyoshi Koizumi: "Re: [PHP-I18N] UTF-8 string validity detection"
- Reply: Moriyoshi Koizumi: "Re: [PHP-I18N] UTF-8 string validity detection"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Thanks a lot!
Meanwhile, I was playing with a PCRE solution like this:
function utf8IsValidString($AStr)
{
    // Byte patterns for 1- to 6-octet sequences: lead byte, then continuation bytes.
    $ptrASCII  = '[\x00-\x7F]';
    $ptr2Octet = '[\xC2-\xDF][\x80-\xBF]';
    $ptr3Octet = '[\xE0-\xEF][\x80-\xBF]{2}';
    $ptr4Octet = '[\xF0-\xF4][\x80-\xBF]{3}';
    $ptr5Octet = '[\xF8-\xFB][\x80-\xBF]{4}';
    $ptr6Octet = '[\xFC-\xFD][\x80-\xBF]{5}';
    // preg_match() returns 1 if the whole string matches, 0 otherwise.
    return preg_match("/^($ptrASCII|$ptr2Octet|$ptr3Octet|$ptr4Octet|$ptr5Octet|$ptr6Octet)*$/s", $AStr);
}
but it tends to segfault on longer input (~10kB of text).
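One way to sidestep the problem on long input might be to drop the regex and check the same byte classes in a plain loop. The following is only a sketch under that assumption, not a tested replacement: the function name utf8IsValidStringLoop is made up for illustration, and it deliberately accepts exactly what the pattern above accepts (so it is no stricter about overlong forms than the pattern is).

```php
<?php
// Hypothetical helper: applies the same lead-byte and continuation-byte
// ranges as the PCRE pattern above, one byte at a time.
function utf8IsValidStringLoop($str)
{
    $len = strlen($str);
    $i = 0;
    while ($i < $len) {
        $c = ord($str[$i]);
        // Decide how many continuation bytes must follow this lead byte.
        if ($c <= 0x7F) {
            $trail = 0;
        } elseif ($c >= 0xC2 && $c <= 0xDF) {
            $trail = 1;
        } elseif ($c >= 0xE0 && $c <= 0xEF) {
            $trail = 2;
        } elseif ($c >= 0xF0 && $c <= 0xF4) {
            $trail = 3;
        } elseif ($c >= 0xF8 && $c <= 0xFB) {
            $trail = 4;
        } elseif ($c >= 0xFC && $c <= 0xFD) {
            $trail = 5;
        } else {
            return false; // byte cannot start a sequence
        }
        // The sequence must not run past the end of the string.
        if ($i + $trail >= $len) {
            return false;
        }
        // Every continuation byte must be in 0x80..0xBF.
        for ($k = 1; $k <= $trail; $k++) {
            $t = ord($str[$i + $k]);
            if ($t < 0x80 || $t > 0xBF) {
                return false;
            }
        }
        $i += $trail + 1;
    }
    return true;
}
```

Since there is no backtracking or recursion here, the input length should only affect running time, e.g. utf8IsValidStringLoop(str_repeat("\xE2\x82\xAC", 10000)) is just a 30000-byte scan.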
I've performed a couple of tests and your solution seems to work fine, though
there's no specification of how exactly mb_convert_encoding() behaves on
incorrect input or how this may change in the future. Stability of the UTF-8 <->
UCS-4 round trip seems to be guaranteed in RFC 2279.
CH
> As of the current mbstring implementation, there's no particular function
> to verify if a given string is encoded in valid utf-8. Instead, it'd be
> worth trying the following workaround:
>
> <?php
> function verify_utf8($str) {
> if ($str === mb_convert_encoding(mb_convert_encoding($str, "UTF-32",
> "UTF-8"), "UTF-8", "UTF-32")) {
> return true;
> }
> return false;
> }
-- PHP Internationalization Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php

