Re: [PHP-I18N] UTF-8 string validity detection
From: Moriyoshi Koizumi (moriyoshi <email protected>)
Date: 05/13/03

Hi,

The current mbstring implementation has no dedicated function to verify
that a given string is valid UTF-8. As a workaround, you can convert the
string to UTF-32 and back, then check that the result is identical to the
input:

<?php
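function verify_utf8($str) {
        // Round-trip the string through UTF-32. An invalid sequence does not
        // survive the conversion unchanged, so the comparison fails.
        $roundtrip = mb_convert_encoding(
                mb_convert_encoding($str, "UTF-32", "UTF-8"), "UTF-8", "UTF-32");
        return $str === $roundtrip;
}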

$str = "some UTF-8 encoded string";

var_dump(verify_utf8($str));
?>
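
If you also need to know where a record goes bad, a byte-by-byte check is
another option. The following is only a sketch, assuming the Unicode
definition of well-formed UTF-8 (sequences of at most 4 bytes, no overlong
forms, no surrogates); your database may accept a slightly different set.
utf8_first_invalid_offset() is a hypothetical helper name, not an mbstring
function.

```php
<?php
// Hypothetical helper (not part of mbstring): returns the byte offset of the
// first ill-formed UTF-8 sequence in $str, or -1 if the whole string is valid.
function utf8_first_invalid_offset($str) {
        $len = strlen($str);
        $i = 0;
        while ($i < $len) {
                $b = ord($str[$i]);
                if ($b <= 0x7F) {               // plain ASCII byte
                        $i++;
                        continue;
                }
                // $need = number of continuation bytes; $lo/$hi = range allowed
                // for the FIRST continuation byte (excludes overlong forms,
                // surrogates, and code points above U+10FFFF).
                if ($b >= 0xC2 && $b <= 0xDF) {
                        $need = 1; $lo = 0x80; $hi = 0xBF;
                } elseif ($b == 0xE0) {
                        $need = 2; $lo = 0xA0; $hi = 0xBF;
                } elseif ($b >= 0xE1 && $b <= 0xEF) {
                        $need = 2; $lo = 0x80; $hi = ($b == 0xED) ? 0x9F : 0xBF;
                } elseif ($b == 0xF0) {
                        $need = 3; $lo = 0x90; $hi = 0xBF;
                } elseif ($b >= 0xF1 && $b <= 0xF3) {
                        $need = 3; $lo = 0x80; $hi = 0xBF;
                } elseif ($b == 0xF4) {
                        $need = 3; $lo = 0x80; $hi = 0x8F;
                } else {
                        return $i;              // 0x80..0xC1 or 0xF5..0xFF
                }
                if ($i + $need >= $len) {
                        return $i;              // sequence truncated at end of string
                }
                for ($k = 1; $k <= $need; $k++) {
                        $c = ord($str[$i + $k]);
                        if ($c < $lo || $c > $hi) {
                                return $i;
                        }
                        $lo = 0x80; $hi = 0xBF; // later continuation bytes: full range
                }
                $i += $need + 1;
        }
        return -1;
}

var_dump(utf8_first_invalid_offset("caf\xC3\xA9"));  // int(-1)
var_dump(utf8_first_invalid_offset("abc\xFFdef"));   // int(3)
```

Unlike the round-trip comparison, this does not depend on mbstring at all,
and the returned offset can be logged to help locate the offending record.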

Moriyoshi

"Cestmir Hybl" <cestmir <email protected>> wrote:

> Hi.
>
> Is there a way to detect validity of UTF-8 string?
>
> We have a problem processing external data, where almost all of the records
> use valid UTF-8 strings but roughly 1 in 10000 hangs the process.
> Records are transformed and then stored in an RDBMS (PostgreSQL), which
> returns an error ("invalid UTF sequence"). This RDBMS error cannot be used
> as a source of information about UTF validity, because that information has
> to be known prior to inserting the DB record.
>
> I've tried this:
>
> $str = 'some INVALID UTF-8 seq. here';
> var_dump(mb_detect_encoding($str, 'UTF-8')); // dumps false
>
> $str = 'some valid UTF-8 seq. here';
> var_dump(mb_detect_encoding($str, 'UTF-8')); // dumps 'UTF-8'
>
> $str = 'some valid UTF-8 seq. here' . 'some INVALID UTF-8 seq. here';
> var_dump(mb_detect_encoding($str, 'UTF-8')); // dumps 'UTF-8'
>
> It seems that mb_detect_encoding() doesn't scan the whole string to detect
> the encoding (which is quite reasonable for the purpose of just estimating
> the encoding from a given set).
> But I can't find any other method, or even a workaround, to test the
> validity of a UTF-8 sequence with MBSTRING support in PHP.
>
> (well, I could have used some cheap DB query like 'select
> upper(some-utf-sequence)' which will raise an error if called with invalid
> sequence, but some client-side solution would be much better)
>
> Cestmir Hybl
>
>
>
> --
> PHP Internationalization Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
