File /ext/utf8/utils/bad.php

Description
Constants
UTF8_BAD_5OCTET = 1 (line 180)

Return code from utf8_bad_identify() when a five octet sequence is detected.

Note: five-octet sequences were permitted by the original UTF-8 definition (RFC 2279), but they encode values beyond the Unicode range (above U+10FFFF), so they do not represent a useful character.

UTF8_BAD_6OCTET = 2 (line 190)

Return code from utf8_bad_identify() when a six octet sequence is detected.

Note: six-octet sequences were permitted by the original UTF-8 definition (RFC 2279), but they encode values beyond the Unicode range (above U+10FFFF), so they do not represent a useful character.

UTF8_BAD_NONSHORT = 4 (line 208)

Return code from utf8_bad_identify() when a non-shortest-form (overlong) sequence is detected.

Note: from Unicode 3.1 onward, non-shortest form is illegal.

UTF8_BAD_SEQID = 3 (line 199)

Return code from utf8_bad_identify() when an octet is found that is invalid as the start of a multi-byte UTF-8 sequence.

UTF8_BAD_SEQINCOMPLETE = 7 (line 236)

Return code from utf8_bad_identify() when an incomplete multi-octet sequence is detected.

Note: this code also serves as something of a "catch-all".

UTF8_BAD_SURROGATE = 5 (line 217)

Return code from utf8_bad_identify() when a surrogate character is detected.

Note: from Unicode 3.2 onward, surrogate characters are illegal in UTF-8.

UTF8_BAD_UNIOUTRANGE = 6 (line 226)

Return code from utf8_bad_identify() when a codepoint outside the Unicode range is detected. Such codepoints are illegal.

Functions
utf8_bad_explain (line 384)

Takes a return code from utf8_bad_identify() and returns a message (in English) explaining what the problem is.

mixed utf8_bad_explain (int $code)
  • int $code: return code from utf8_bad_identify
utf8_bad_find (line 33)

Locates the first bad byte in a UTF-8 string and returns its byte index in the string.

Uses a PCRE pattern to locate bad bytes in a UTF-8 string. The pattern comes from the W3 FAQ: Multilingual Forms. Note: it has been modified to include the full ASCII range, including control characters.

mixed utf8_bad_find (string $str)
  • string $str
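The pattern-based search described above can be sketched as follows. This is an illustration under assumptions, not the code in bad.php: the name sketch_bad_find and the constant SKETCH_VALID_SEQ are hypothetical, the pattern is reconstructed from the description (the W3 FAQ pattern widened to the full ASCII range) and may differ from what the file actually uses, and returning false for a clean string is an assumption about how "mixed" is meant.

```php
<?php
// Hypothetical constant: one well-formed UTF-8 sequence, anchored with \G
// at the offset passed to preg_match(). No /u modifier is used, so the
// subject is treated as raw bytes.
const SKETCH_VALID_SEQ = '/\G(?:[\x00-\x7F]'   // ASCII, control chars included
    . '|[\xC2-\xDF][\x80-\xBF]'                // non-overlong 2-byte
    . '|\xE0[\xA0-\xBF][\x80-\xBF]'            // 3-byte, excluding overlongs
    . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'     // straight 3-byte
    . '|\xED[\x80-\x9F][\x80-\xBF]'            // 3-byte, excluding surrogates
    . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'         // planes 1-3
    . '|[\xF1-\xF3][\x80-\xBF]{3}'             // planes 4-15
    . '|\xF4[\x80-\x8F][\x80-\xBF]{2})/';      // plane 16

// Hypothetical helper: byte index of the first bad byte, or false if the
// whole string is well formed (the false return is an assumption).
function sketch_bad_find(string $str)
{
    $len = strlen($str);
    $pos = 0;
    while ($pos < $len) {
        // Try to consume exactly one valid sequence starting at $pos.
        if (preg_match(SKETCH_VALID_SEQ, $str, $m, 0, $pos) !== 1) {
            return $pos;
        }
        $pos += strlen($m[0]);
    }
    return false;
}
```

Because the result may be the integer 0 (a bad byte at the very start), a caller of a function shaped like this would need a strict comparison (=== false) rather than a loose truthiness check.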
utf8_bad_findall (line 70)

Locates all bad bytes in a UTF-8 string and returns a list of their byte indexes in the string.

Uses a PCRE pattern to locate bad bytes in a UTF-8 string. The pattern comes from the W3 FAQ: Multilingual Forms. Note: it has been modified to include the full ASCII range, including control characters.

mixed utf8_bad_findall (string $str)
  • string $str
utf8_bad_identify (line 250)

Reports on the type of bad byte found in a UTF-8 string. Returns a status code on the first bad byte found.

mixed utf8_bad_identify (string $str, &$i)
  • string $str: UTF-8 encoded string
  • &$i
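To show how the status codes listed under Constants relate to byte values, here is a rough sketch that classifies the sequence starting at a given byte offset. It is not the implementation in bad.php: the function name sketch_bad_classify is hypothetical, the order in which the checks are applied is an assumption, and the integers returned simply mirror the constant values documented above.

```php
<?php
// Hypothetical helper, not the documented utf8_bad_identify().
// $i is assumed to be a valid byte offset into $str.
// Returns false for a well-formed sequence, otherwise one of the
// integer codes from the constant list in this file.
function sketch_bad_classify(string $str, int $i)
{
    $lead = ord($str[$i]);

    if ($lead < 0x80) {
        return false;                 // ASCII is always well formed
    } elseif ($lead < 0xC0) {
        return 3;                     // UTF8_BAD_SEQID: stray continuation octet
    } elseif ($lead < 0xE0) {
        $extra = 1; $cp = $lead & 0x1F; $min = 0x80;
    } elseif ($lead < 0xF0) {
        $extra = 2; $cp = $lead & 0x0F; $min = 0x800;
    } elseif ($lead < 0xF8) {
        $extra = 3; $cp = $lead & 0x07; $min = 0x10000;
    } elseif ($lead < 0xFC) {
        return 1;                     // UTF8_BAD_5OCTET
    } elseif ($lead < 0xFE) {
        return 2;                     // UTF8_BAD_6OCTET
    } else {
        return 3;                     // UTF8_BAD_SEQID: 0xFE and 0xFF never start a sequence
    }

    // Fold in the continuation octets, each of the form 10xxxxxx.
    for ($k = 1; $k <= $extra; $k++) {
        $byte = isset($str[$i + $k]) ? ord($str[$i + $k]) : -1;
        if ($byte < 0x80 || $byte > 0xBF) {
            return 7;                 // UTF8_BAD_SEQINCOMPLETE
        }
        $cp = ($cp << 6) | ($byte & 0x3F);
    }

    if ($cp < $min) {
        return 4;                     // UTF8_BAD_NONSHORT: overlong encoding
    }
    if ($cp >= 0xD800 && $cp <= 0xDFFF) {
        return 5;                     // UTF8_BAD_SURROGATE
    }
    if ($cp > 0x10FFFF) {
        return 6;                     // UTF8_BAD_UNIOUTRANGE
    }
    return false;
}
```

The sketch returns literal integers so that it stays self-contained; code that loads the real file would presumably compare against the named UTF8_BAD_* constants instead.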
utf8_bad_replace (line 146)

Replaces bad bytes with an alternative character. An ASCII character is recommended as the replacement.

Uses a PCRE pattern to locate bad bytes in a UTF-8 string. The pattern comes from the W3 FAQ: Multilingual Forms. Note: it has been modified to include the full ASCII range, including control characters.

string utf8_bad_replace (string $str, [string $replace = '?'])
  • string $str: to search
  • string $replace: to replace bad bytes with (defaults to '?') - use ASCII
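One way to get this kind of replacement from a single regular expression is sketched below. It is an illustration only, not the code in bad.php: the name sketch_bad_replace is hypothetical, and the pattern is reconstructed from the description above (the W3 FAQ pattern widened to the full ASCII range), so it may differ from the one the file uses.

```php
<?php
// Hypothetical helper, not the documented utf8_bad_replace() itself.
function sketch_bad_replace(string $str, string $replace = '?'): string
{
    // Group 1 captures one well-formed sequence. The trailing "." acts as
    // a catch-all: without /u it matches exactly one raw byte, and /s lets
    // it match any byte value.
    $pattern = '/([\x00-\x7F]'                 // ASCII, control chars included
        . '|[\xC2-\xDF][\x80-\xBF]'            // non-overlong 2-byte
        . '|\xE0[\xA0-\xBF][\x80-\xBF]'        // 3-byte, excluding overlongs
        . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}' // straight 3-byte
        . '|\xED[\x80-\x9F][\x80-\xBF]'        // 3-byte, excluding surrogates
        . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'     // planes 1-3
        . '|[\xF1-\xF3][\x80-\xBF]{3}'         // planes 4-15
        . '|\xF4[\x80-\x8F][\x80-\xBF]{2})|./s';

    $out = preg_replace_callback(
        $pattern,
        function (array $m) use ($replace) {
            // Group 1 is absent when only the catch-all byte matched.
            return isset($m[1]) ? $m[1] : $replace;
        },
        $str
    );

    if ($out === null) {
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }
    return $out;
}
```

Each bad byte is replaced individually, so a truncated three-byte sequence yields two replacement characters. Passing an empty string as $replace to this sketch removes the bad bytes instead of substituting them.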
utf8_bad_strip (line 109)

Strips out any bad bytes from a UTF-8 string and returns the rest.

Uses a PCRE pattern to locate bad bytes in a UTF-8 string. The pattern comes from the W3 FAQ: Multilingual Forms. Note: it has been modified to include the full ASCII range, including control characters.

string utf8_bad_strip (string $str)
  • string $str

Documentation generated on Thu, 08 Jan 2009 17:37:33 +0100 by phpDocumentor 1.4.0a2