File: /ext/utf8/native/core.php

Description
Constants
UTF8_CORE = TRUE (line 12)

Define UTF8_CORE as required

Functions
utf8_strlen (line 33)

Unicode-aware replacement for strlen(). Returns the number of characters in the string (not the number of bytes) by replacing each multibyte character with a single-byte equivalent: utf8_decode() converts characters that are not in ISO-8859-1 to '?', which is fine for the purpose of counting. It is much faster than iconv_strlen().

Note: this function does not count bad UTF-8 bytes in the string; these are simply ignored.

int utf8_strlen (string $str)
  • string $str: UTF-8 string
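
To illustrate the difference between a byte count and a character count, here is a minimal sketch. It does not reproduce the utf8_decode() approach described above (utf8_decode() is deprecated as of PHP 8.2); instead it counts every byte that is not a UTF-8 continuation byte. The name sketch_utf8_strlen is hypothetical and not part of this file, and the sketch assumes well-formed UTF-8 input.

```php
<?php
// Hypothetical helper for illustration only; not the library's utf8_strlen().
function sketch_utf8_strlen(string $str): int
{
    // Continuation bytes have the form 10xxxxxx (0x80-0xBF). Every
    // well-formed UTF-8 character contains exactly one byte outside
    // that range, so counting those bytes counts characters.
    return (int) preg_match_all('/[^\x80-\xBF]/', $str);
}

$s = "na\u{00EF}ve"; // "naïve": the ï takes two bytes in UTF-8

echo strlen($s), "\n";             // byte count
echo sketch_utf8_strlen($s), "\n"; // character count
```

For the sample string, strlen() reports one more than the character count because the single non-ASCII character occupies two bytes.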
utf8_strpos (line 54)

UTF-8 aware alternative to strpos

Find the position of the first occurrence of a string.

Note: this gets a lot slower if offset is used.

Note: requires utf8_strlen and utf8_substr to be loaded.

mixed utf8_strpos (string $str, string $needle, [integer $offset = NULL])
  • string $str: haystack
  • string $needle: needle (you should validate this with utf8_is_valid)
  • integer $offset: offset in characters (from left)
utf8_strrpos (line 98)

UTF-8 aware alternative to strrpos

Find the position of the last occurrence of a character in a string.

Note: this gets a lot slower if offset is used.

Note: requires utf8_substr and utf8_strlen to be loaded.

mixed utf8_strrpos (string $str, string $needle, [integer $offset = NULL])
  • string $str: haystack
  • string $needle: needle (you should validate this with utf8_is_valid)
  • integer $offset: (optional) offset (from left)
utf8_strtolower (line 284)

UTF-8 aware alternative to strtolower

Make a string lowercase.

Note: the concept of a character's "case" only exists in some alphabets such as Latin, Greek, Cyrillic, Armenian and archaic Georgian. It does not exist in the Chinese writing system, for example. See Unicode Standard Annex #21: Case Mappings.

Note: requires utf8_to_unicode and utf8_from_unicode.

mixed utf8_strtolower (string $string)
  • string $string
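
The note on case can be illustrated with a small lookup-table sketch: characters that have a lowercase form are mapped, and everything else passes through unchanged. The table here is deliberately tiny and the function name is hypothetical. It uses strtr() on UTF-8 byte sequences, which is an assumption made for brevity and not the code-point conversion via utf8_to_unicode and utf8_from_unicode that this file relies on.

```php
<?php
// Hypothetical sketch; a real table would cover every cased code point.
function sketch_utf8_strtolower(string $str): string
{
    static $map = null;
    if ($map === null) {
        // ASCII A-Z => a-z
        $map = array_combine(range('A', 'Z'), range('a', 'z'));
        // A few non-ASCII samples from scripts that have case
        $map += [
            "\u{00C9}" => "\u{00E9}", // Latin É => é
            "\u{03A9}" => "\u{03C9}", // Greek Ω => ω
            "\u{0416}" => "\u{0436}", // Cyrillic Ж => ж
        ];
    }
    // strtr() replaces whole keys; UTF-8 is self-synchronizing, so a
    // key never matches across a character boundary in valid input.
    return strtr($str, $map);
}

// The CJK character has no case and is returned unchanged.
echo sketch_utf8_strtolower("\u{00C9}COLE \u{03A9} \u{4E2D}"), "\n";
```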
utf8_strtoupper (line 372)

UTF-8 aware alternative to strtoupper

Make a string uppercase.

Note: the concept of a character's "case" only exists in some alphabets such as Latin, Greek, Cyrillic, Armenian and archaic Georgian. It does not exist in the Chinese writing system, for example. See Unicode Standard Annex #21: Case Mappings.

Note: requires utf8_to_unicode and utf8_from_unicode.

mixed utf8_strtoupper (string $string)
  • string $string
utf8_substr (line 160)

UTF-8 aware alternative to substr

Return part of a string given a character offset (and optionally a length).

Note on arguments: compared to substr, if offset or length are not integers, this version will not complain but rather massages them into integers.

Note on returned values: the substr documentation states that false can be returned in some cases (e.g. offset > string length). mb_substr never returns false; it returns an empty string instead. This function adopts the mb_substr approach.

Note on implementation: PCRE only supports repetitions of fewer than 65536. To accept values up to MAXINT for offset and length, a group of 65535 characters is repeated when needed.

Note on implementation: calculating the number of characters in the string is a relatively expensive operation, so it is only carried out when necessary. It isn't necessary for positive offsets with no specified length.

mixed utf8_substr (string $str, integer $offset, [integer $length = NULL])
  • string $str
  • integer $offset: number of UTF-8 characters offset (from left)
  • integer $length: (optional) length in UTF-8 characters from offset
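
The implementation notes above can be sketched as follows: a regular expression skips $offset characters and captures up to $length more, with large counts split into chunks of at most 65535 to stay under the PCRE repetition limit. This is a simplified illustration under stated assumptions, not the code in this file: both function names are hypothetical, only non-negative offsets and lengths are handled, and the exact pattern the library builds may differ.

```php
<?php
// Hypothetical helper: pattern fragment matching $n characters, either
// exactly or "up to" $n, built from repetitions no larger than 65535.
function sketch_repeat(int $n, bool $upTo = false): string
{
    $min = $upTo ? '0,' : '';
    $q = intdiv($n, 65535); // number of full 65535-character chunks
    $r = $n % 65535;        // remaining characters
    $pattern = '';
    if ($q > 0) {
        $pattern .= '(?:.{' . $min . '65535}){' . $q . '}';
    }
    if ($r > 0) {
        $pattern .= '.{' . $min . $r . '}';
    }
    return $pattern;
}

// Hypothetical sketch of a character-based substr().
function sketch_utf8_substr(string $str, int $offset, ?int $length = null): string
{
    if ($offset < 0 || ($length !== null && $length < 0)) {
        throw new InvalidArgumentException('This sketch handles non-negative values only');
    }
    // No length given: capture the rest of the string, so the total
    // character count never has to be computed.
    $take = $length === null ? '.*' : sketch_repeat($length, true);
    // /u treats the subject as UTF-8; /s lets "." match newlines too.
    $pattern = '/^' . sketch_repeat($offset) . '(' . $take . ')/us';
    if (preg_match($pattern, $str, $m)) {
        return $m[1];
    }
    // Like mb_substr, return an empty string instead of false.
    return '';
}

$s = "h\u{00E9}llo w\u{00F6}rld"; // "héllo wörld"
echo sketch_utf8_substr($s, 1, 4), "\n"; // four characters starting at the second
echo sketch_utf8_substr($s, 6), "\n";    // everything from the seventh character on
echo sketch_repeat(70000), "\n";         // pattern fragment for a count above 65535
```

An offset beyond the end of the string makes the pattern fail to match, so the sketch returns an empty string in that case.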

Documentation generated on Thu, 08 Jan 2009 17:40:14 +0100 by phpDocumentor 1.4.0a2