Docs for page bad.php

/ext/utf8/utils/bad.php

Description

Tools for locating and replacing bad bytes in UTF-8 strings.

The Original Code is Mozilla Communicator client code. The Initial Developer of the Original Code is Netscape Communications Corporation. Portions created by the Initial Developer are Copyright (C) 1998 the Initial Developer. All Rights Reserved.

Ported to PHP by Henri Sivonen (http://hsivonen.iki.fi). Slight modifications to fit with the phputf8 library by Harry Fuecks (hfuecks gmail com).

version: $Id: bad.php,v 1.1 2007/09/09 20:39:51 pitlinz Exp $
see: utf8_is_valid()
see: http://hsivonen.iki.fi/php-utf8/
see: http://lxr.mozilla.org/seamonkey/source/intl/uconv/src/nsUnicodeToUTF8.cpp
see: http://lxr.mozilla.org/seamonkey/source/intl/uconv/src/nsUTF8ToUnicode.cpp

Constants

UTF8_BAD_5OCTET = 1 (line 180)

Return code from utf8_bad_identify() when a five octet sequence is detected.

Note: five-octet sequences were permitted by the original UTF-8 definition but are not supported by Unicode (RFC 3629 limits UTF-8 to at most four octets), so they do not represent a useful character.

see: utf8_bad_identify()

UTF8_BAD_6OCTET = 2 (line 190)

Return code from utf8_bad_identify() when a six octet sequence is detected.

Note: six-octet sequences were permitted by the original UTF-8 definition but are not supported by Unicode (RFC 3629 limits UTF-8 to at most four octets), so they do not represent a useful character.

see: utf8_bad_identify()
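The five- and six-octet cases above, like UTF8_BAD_SEQID below, can be recognised from the first octet alone, because the high bits of a lead octet announce how many octets the sequence claims to have. A minimal sketch, assuming a hypothetical helper lead_byte_length() that is not part of phputf8 and is not necessarily how utf8_bad_identify() is written:

```php
<?php
// Hypothetical helper (not part of phputf8): how many octets does a
// sequence starting with $byte claim to have? Returns null for octets
// that can never start a sequence.
function lead_byte_length(int $byte): ?int
{
    if ($byte <= 0x7F) return 1;    // 0xxxxxxx: ASCII
    if ($byte <= 0xBF) return null; // 10xxxxxx: continuation octet, not a start
    if ($byte <= 0xDF) return 2;    // 110xxxxx (0xC0/0xC1 can only give overlong forms)
    if ($byte <= 0xEF) return 3;    // 1110xxxx
    if ($byte <= 0xF7) return 4;    // 11110xxx
    if ($byte <= 0xFB) return 5;    // 111110xx: five-octet sequence
    if ($byte <= 0xFD) return 6;    // 1111110x: six-octet sequence
    return null;                    // 0xFE and 0xFF never occur in UTF-8
}

$str = "\xF8\x88\x80\x80\x80"; // a five-octet sequence
echo lead_byte_length(ord($str[0])), "\n"; // prints 5
```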

UTF8_BAD_NONSHORT = 4 (line 208)

Return code from utf8_bad_identify().

Since Unicode 3.1, non-shortest (overlong) forms are illegal.

see: utf8_bad_identify()
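A non-shortest (overlong) form encodes a code point in more octets than it needs. The sketch below uses hypothetical helpers decode_sequence() and is_non_shortest(), not library code: it decodes a 2- to 4-octet sequence without validating it, then compares the result with the smallest code point that genuinely needs that many octets (0x80, 0x800 and 0x10000 respectively).

```php
<?php
// Hypothetical helpers (not part of phputf8). decode_sequence() decodes
// one 2- to 4-octet sequence WITHOUT validating it, so an overlong form
// still yields a code point.
function decode_sequence(string $seq): int
{
    $n = strlen($seq);
    $leadMask = [2 => 0x1F, 3 => 0x0F, 4 => 0x07][$n]; // payload bits of the lead octet
    $cp = ord($seq[0]) & $leadMask;
    for ($k = 1; $k < $n; $k++) {
        $cp = ($cp << 6) | (ord($seq[$k]) & 0x3F); // 6 payload bits per continuation octet
    }
    return $cp;
}

// A sequence is non-shortest when its value would fit in fewer octets.
function is_non_shortest(string $seq): bool
{
    $min = [2 => 0x80, 3 => 0x800, 4 => 0x10000][strlen($seq)];
    return decode_sequence($seq) < $min;
}

// "\xC0\xAF" decodes to 0x2F ('/'), which needs only one octet.
var_dump(is_non_shortest("\xC0\xAF")); // bool(true)
```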

UTF8_BAD_SEQID = 3 (line 199)

Return code from utf8_bad_identify().

The octet is not valid as the start of a multi-octet UTF-8 sequence.

see: utf8_bad_identify()

UTF8_BAD_SEQINCOMPLETE = 7 (line 236)

Return code from utf8_bad_identify().

Incomplete multi-octet sequence.

Note: this is something of a "catch-all" code.

see: utf8_bad_identify()
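An incomplete sequence is one whose lead octet promises more continuation octets (each of the form 10xxxxxx) than actually follow. A minimal sketch, assuming a hypothetical has_complete_sequence() helper that is told the expected length $n rather than deriving it from the lead octet:

```php
<?php
// Hypothetical helper (not part of phputf8): does the $n-octet sequence
// starting at byte offset $i have all of its continuation octets?
function has_complete_sequence(string $str, int $i, int $n): bool
{
    if ($i + $n > strlen($str)) {
        return false; // the string ends in the middle of the sequence
    }
    for ($k = 1; $k < $n; $k++) {
        $b = ord($str[$i + $k]);
        if ($b < 0x80 || $b > 0xBF) {
            return false; // expected a continuation octet (10xxxxxx)
        }
    }
    return true;
}

// "\xE2\x82\xAC" is U+20AC (euro sign); dropping the last octet truncates it.
var_dump(has_complete_sequence("\xE2\x82", 0, 3)); // bool(false)
```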

UTF8_BAD_SURROGATE = 5 (line 217)

Return code from utf8_bad_identify().

Since Unicode 3.2, encoded surrogate code points (U+D800 to U+DFFF) are illegal.

see: utf8_bad_identify()
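Surrogates are the code points U+D800 to U+DFFF reserved for UTF-16; a decoder that does not check for them will turn the three octets ED A0 80 into U+D800. A sketch of that arithmetic, using hypothetical helpers decode_three_octets() and is_surrogate() that are not library functions:

```php
<?php
// Hypothetical helpers (not part of phputf8).
// Decode a three-octet sequence 1110xxxx 10yyyyyy 10zzzzzz.
function decode_three_octets(string $seq): int
{
    return ((ord($seq[0]) & 0x0F) << 12)
         | ((ord($seq[1]) & 0x3F) << 6)
         |  (ord($seq[2]) & 0x3F);
}

// UTF-16 surrogates occupy U+D800..U+DFFF and must not be encoded in UTF-8.
function is_surrogate(int $cp): bool
{
    return $cp >= 0xD800 && $cp <= 0xDFFF;
}

printf("U+%04X\n", decode_three_octets("\xED\xA0\x80")); // prints U+D800
```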

UTF8_BAD_UNIOUTRANGE = 6 (line 226)

Return code from utf8_bad_identify().

Code points above U+10FFFF, outside the Unicode range, are illegal.

see: utf8_bad_identify()
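The largest Unicode code point is U+10FFFF, but the four-octet bit pattern can carry values up to 0x1FFFFF. A sketch showing how F4 90 80 80 lands just past the limit, assuming a hypothetical decode_four_octets() helper and a UNICODE_MAX constant introduced here for illustration:

```php
<?php
// Hypothetical helper (not part of phputf8).
// Decode a four-octet sequence 11110www 10xxxxxx 10yyyyyy 10zzzzzz.
function decode_four_octets(string $seq): int
{
    return ((ord($seq[0]) & 0x07) << 18)
         | ((ord($seq[1]) & 0x3F) << 12)
         | ((ord($seq[2]) & 0x3F) << 6)
         |  (ord($seq[3]) & 0x3F);
}

const UNICODE_MAX = 0x10FFFF; // hypothetical name: highest Unicode code point

$cp = decode_four_octets("\xF4\x90\x80\x80");
printf("U+%X out of range: %s\n", $cp, $cp > UNICODE_MAX ? 'yes' : 'no'); // prints U+110000 out of range: yes
```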

Functions

utf8_bad_explain (line 384)

Takes a return code from utf8_bad_identify() and returns a message (in English) explaining what the problem is.

return: string message, or FALSE if the return code is unknown
see: utf8_bad_identify()

mixed utf8_bad_explain (int $code)

int $code: return code from utf8_bad_identify()
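The mapping from codes to messages can be sketched as a lookup table. The constant values below are the ones listed in the Constants section of this page; explain_bad_code() is a hypothetical stand-in for utf8_bad_explain(), and its message strings are illustrative rather than the library's wording.

```php
<?php
// Values copied from the constants documented on this page. Defining them
// here keeps the sketch self-contained; do not load it alongside phputf8,
// which defines the same constants.
const UTF8_BAD_5OCTET = 1;
const UTF8_BAD_6OCTET = 2;
const UTF8_BAD_SEQID = 3;
const UTF8_BAD_NONSHORT = 4;
const UTF8_BAD_SURROGATE = 5;
const UTF8_BAD_UNIOUTRANGE = 6;
const UTF8_BAD_SEQINCOMPLETE = 7;

// Hypothetical stand-in for utf8_bad_explain(); message wording is illustrative.
function explain_bad_code(int $code)
{
    $messages = [
        UTF8_BAD_5OCTET        => 'Five octet sequences are not supported by Unicode',
        UTF8_BAD_6OCTET        => 'Six octet sequences are not supported by Unicode',
        UTF8_BAD_SEQID         => 'Invalid octet at the start of a multi-octet sequence',
        UTF8_BAD_NONSHORT      => 'Sequence is not in shortest form',
        UTF8_BAD_SURROGATE     => 'Encoded surrogate code point',
        UTF8_BAD_UNIOUTRANGE   => 'Code point outside the Unicode range',
        UTF8_BAD_SEQINCOMPLETE => 'Incomplete multi-octet sequence',
    ];
    return $messages[$code] ?? false; // FALSE for unknown codes, as documented
}

echo explain_bad_code(UTF8_BAD_NONSHORT), "\n"; // prints Sequence is not in shortest form
```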

utf8_bad_find (line 33)

Locates the first bad byte in a UTF-8 string, returning its byte index in the string.

The PCRE pattern used to locate bad bytes comes from the W3C FAQ "Multilingual Forms". Note: it has been modified to include the full ASCII range, including control characters.

return: integer byte index, or FALSE if no bad byte is found
see: http://www.w3.org/International/questions/qa-forms-utf-8

mixed utf8_bad_find (string $str)

string $str
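A minimal sketch of the technique described here, assuming a tokenizer reconstructed from the W3C FAQ pattern with the ASCII alternative widened to \x00-\x7F as noted above. find_first_bad_byte() and UTF8_TOKEN are hypothetical names; this is not the library's source.

```php
<?php
// Each match is either one well-formed UTF-8 character or, in group 1,
// a single bad byte. Without /u, PCRE works on bytes; /s lets "." match any byte.
const UTF8_TOKEN = '/(?:[\x00-\x7F]'      // ASCII, including control chars
    . '|[\xC2-\xDF][\x80-\xBF]'           // non-overlong 2-octet
    . '|\xE0[\xA0-\xBF][\x80-\xBF]'       // 3-octet, excluding overlongs
    . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}' // straight 3-octet
    . '|\xED[\x80-\x9F][\x80-\xBF]'       // 3-octet, excluding surrogates
    . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'    // planes 1-3
    . '|[\xF1-\xF3][\x80-\xBF]{3}'        // planes 4-15
    . '|\xF4[\x80-\x8F][\x80-\xBF]{2}'    // plane 16
    . '|(.))/s';                          // anything else: one bad byte

// Hypothetical stand-in for utf8_bad_find(): byte index of the first bad
// byte, or FALSE if the string is well-formed UTF-8.
function find_first_bad_byte(string $str)
{
    $pos = 0;
    $len = strlen($str);
    while ($pos < $len) {
        // Some alternative always matches at $pos, so the match starts there.
        preg_match(UTF8_TOKEN, $str, $m, 0, $pos);
        if (isset($m[1])) {
            return $pos; // group 1 captured: this byte is bad
        }
        $pos += strlen($m[0]); // skip one well-formed character
    }
    return false;
}

var_dump(find_first_bad_byte("ab\xFFc")); // int(2)
```

Because every alternative consumes at least one byte and the final (.) accepts anything, the scan never stalls.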

utf8_bad_findall (line 70)

Locates all bad bytes in a UTF-8 string and returns a list of their byte indexes in the string.

The PCRE pattern used to locate bad bytes comes from the W3C FAQ "Multilingual Forms". Note: it has been modified to include the full ASCII range, including control characters.

return: array of integers, or FALSE if no bad bytes are found
see: http://www.w3.org/International/questions/qa-forms-utf-8

mixed utf8_bad_findall (string $str)

string $str
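Collecting every bad byte only changes the loop: record the offset and keep scanning. A sketch assuming a W3C-derived tokenizer (the hypothetical constant UTF8_TOKEN) and a hypothetical find_all_bad_bytes() helper, neither of which is library code:

```php
<?php
// W3C-derived tokenizer: one well-formed character, or one bad byte in group 1.
const UTF8_TOKEN = '/(?:[\x00-\x7F]'      // ASCII, including control chars
    . '|[\xC2-\xDF][\x80-\xBF]'           // non-overlong 2-octet
    . '|\xE0[\xA0-\xBF][\x80-\xBF]'       // 3-octet, excluding overlongs
    . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}' // straight 3-octet
    . '|\xED[\x80-\x9F][\x80-\xBF]'       // 3-octet, excluding surrogates
    . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'    // planes 1-3
    . '|[\xF1-\xF3][\x80-\xBF]{3}'        // planes 4-15
    . '|\xF4[\x80-\x8F][\x80-\xBF]{2}'    // plane 16
    . '|(.))/s';                          // anything else: one bad byte

// Hypothetical stand-in for utf8_bad_findall(): byte indexes of every bad
// byte, or FALSE if there are none.
function find_all_bad_bytes(string $str)
{
    $bad = [];
    $pos = 0;
    $len = strlen($str);
    while ($pos < $len) {
        preg_match(UTF8_TOKEN, $str, $m, 0, $pos);
        if (isset($m[1])) {
            $bad[] = $pos; // record it and continue after the bad byte
        }
        $pos += strlen($m[0]);
    }
    return $bad ?: false; // an empty list becomes FALSE
}

print_r(find_all_bad_bytes("a\xFFb\xFE"));
```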

utf8_bad_identify (line 250)

Reports on the type of bad byte found in a UTF-8 string, returning a status code for the first bad byte found.

return: integer constant describing the problem, or FALSE if the string is valid UTF-8
author: <hsivonen@iki.fi>
see: http://hsivonen.iki.fi/php-utf8/
see: utf8_bad_explain()

mixed utf8_bad_identify (string $str, &$i)

string $str: UTF-8 encoded string
&$i

utf8_bad_replace (line 146)

Replaces bad bytes with an alternative character. An ASCII character is recommended as the replacement.

The PCRE pattern used to locate bad bytes comes from the W3C FAQ "Multilingual Forms". Note: it has been modified to include the full ASCII range, including control characters.

see: http://www.w3.org/International/questions/qa-forms-utf-8

string utf8_bad_replace (string $str, [string $replace = '?'])

string $str: string to search
string $replace: character to replace bad bytes with (defaults to '?'); use an ASCII character
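Replacement can be sketched with preg_replace_callback() over a W3C-derived tokenizer: well-formed characters pass through unchanged and each bad byte becomes $replace. UTF8_TOKEN and replace_bad_bytes() are hypothetical names, and replacing byte by byte (rather than collapsing runs of bad bytes) is an assumption of this sketch.

```php
<?php
// W3C-derived tokenizer: one well-formed character, or one bad byte in group 1.
const UTF8_TOKEN = '/(?:[\x00-\x7F]'      // ASCII, including control chars
    . '|[\xC2-\xDF][\x80-\xBF]'           // non-overlong 2-octet
    . '|\xE0[\xA0-\xBF][\x80-\xBF]'       // 3-octet, excluding overlongs
    . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}' // straight 3-octet
    . '|\xED[\x80-\x9F][\x80-\xBF]'       // 3-octet, excluding surrogates
    . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'    // planes 1-3
    . '|[\xF1-\xF3][\x80-\xBF]{3}'        // planes 4-15
    . '|\xF4[\x80-\x8F][\x80-\xBF]{2}'    // plane 16
    . '|(.))/s';                          // anything else: one bad byte

// Hypothetical stand-in for utf8_bad_replace().
function replace_bad_bytes(string $str, string $replace = '?'): string
{
    return preg_replace_callback(
        UTF8_TOKEN,
        function (array $m) use ($replace) {
            return isset($m[1]) ? $replace : $m[0]; // swap only bad bytes
        },
        $str
    );
}

echo replace_bad_bytes("a\xFFb"), "\n"; // prints a?b
```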

utf8_bad_strip (line 109)

Strips out any bad bytes from a UTF-8 string and returns the rest.

The PCRE pattern used to locate bad bytes comes from the W3C FAQ "Multilingual Forms". Note: it has been modified to include the full ASCII range, including control characters.

see: http://www.w3.org/International/questions/qa-forms-utf-8

string utf8_bad_strip (string $str)

string $str
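Stripping is the same idea with an empty replacement. A sketch, assuming a W3C-derived tokenizer (the hypothetical constant UTF8_TOKEN) and a hypothetical strip_bad_bytes() helper rather than the library's implementation:

```php
<?php
// W3C-derived tokenizer: one well-formed character, or one bad byte in group 1.
const UTF8_TOKEN = '/(?:[\x00-\x7F]'      // ASCII, including control chars
    . '|[\xC2-\xDF][\x80-\xBF]'           // non-overlong 2-octet
    . '|\xE0[\xA0-\xBF][\x80-\xBF]'       // 3-octet, excluding overlongs
    . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}' // straight 3-octet
    . '|\xED[\x80-\x9F][\x80-\xBF]'       // 3-octet, excluding surrogates
    . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'    // planes 1-3
    . '|[\xF1-\xF3][\x80-\xBF]{3}'        // planes 4-15
    . '|\xF4[\x80-\x8F][\x80-\xBF]{2}'    // plane 16
    . '|(.))/s';                          // anything else: one bad byte

// Hypothetical stand-in for utf8_bad_strip(): drop every bad byte and keep
// the well-formed characters in their original order.
function strip_bad_bytes(string $str): string
{
    return preg_replace_callback(
        UTF8_TOKEN,
        function (array $m) {
            return isset($m[1]) ? '' : $m[0];
        },
        $str
    );
}

echo strip_bad_bytes("a\xFFb\x80c"), "\n"; // prints abc
```

Note that an encoded surrogate such as ED A0 80 is removed entirely, since each of its three octets is rejected on its own.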

Documentation generated on Thu, 08 Jan 2009 17:37:33 +0100 by phpDocumentor 1.4.0a2