Docs for page core.php

Description

version: $Id: core.php,v 1.1 2007/09/09 20:39:50 pitlinz Exp $
filesource: Source Code for this file

Constants

UTF8_CORE = TRUE (line 12)

Define UTF8_CORE as required

Functions

utf8_strlen (line 33)

Unicode-aware replacement for strlen(). Returns the number of characters in the string (not the number of bytes) by replacing multibyte characters with a single-byte equivalent. utf8_decode() converts characters that are not in ISO-8859-1 to '?', which is fine for the purpose of counting, and it is much faster than iconv_strlen.

Note: this function does not count bad UTF-8 bytes in the string; these are simply ignored.

return: number of UTF-8 characters in string
author: chernyshevsky at hotmail dot com
link: http://www.php.net/manual/en/function.utf8-decode.php
link: http://www.php.net/manual/en/function.strlen.php

int utf8_strlen (string $str)

string $str: UTF-8 string
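
The counting trick described above can be sketched as follows. This is a minimal illustration and not the library's source: the name sketch_utf8_strlen is hypothetical. Note that utf8_decode() is deprecated as of PHP 8.2, so the sketch suppresses the notice with the @ operator.

```php
<?php
// Hypothetical helper illustrating the technique; not the library's code.
function sketch_utf8_strlen(string $str): int
{
    // utf8_decode() maps each UTF-8 character to exactly one byte
    // ('?' for anything outside ISO-8859-1), so strlen() of the result
    // is the character count. '@' hides the PHP 8.2+ deprecation notice.
    return strlen(@utf8_decode($str));
}

$latin = "na\u{00EF}ve";      // 5 characters, 6 bytes
$cjk   = "\u{4E2D}\u{6587}";  // 2 characters, 3 bytes each

echo strlen($latin), ' bytes, ', sketch_utf8_strlen($latin), " characters\n";
echo strlen($cjk), ' bytes, ', sketch_utf8_strlen($cjk), " characters\n";
```

The second string shows why the '?' substitution does not matter for counting: neither character exists in ISO-8859-1, yet each still contributes exactly one byte to the decoded string.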

utf8_strpos (line 54)

UTF-8 aware alternative to strpos

Find position of first occurrence of a string.

Note: This will get a lot slower if offset is used.
Note: requires utf8_strlen and utf8_substr to be loaded.

return: integer position or FALSE on failure
see: utf8_substr()
see: utf8_strlen()
see: http://www.php.net/strpos

mixed utf8_strpos (string $str, string $needle, [integer $offset = NULL])

string $str: haystack
string $needle: needle (you should validate this with utf8_is_valid)
integer $offset: offset in characters (from left)
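
To illustrate why the returned position differs from what strpos() reports, here is a minimal sketch that finds the byte position first and then converts it to a character position. The helper names sketch_char_count and sketch_utf8_strpos are hypothetical, the offset parameter is left out for brevity, and the input is assumed to be valid UTF-8. It is not the library's implementation.

```php
<?php
// Hypothetical helpers; a sketch under the assumptions stated above.
function sketch_char_count(string $str): int
{
    // Each UTF-8 character contains exactly one byte outside 0x80-0xBF
    // (the continuation-byte range), so counting those bytes counts
    // characters. No /u flag: the pattern works on raw bytes.
    return (int) preg_match_all('/[^\x80-\xBF]/', $str);
}

function sketch_utf8_strpos(string $haystack, string $needle)
{
    $bytePos = strpos($haystack, $needle); // byte offset, or false
    if ($bytePos === false) {
        return false;
    }
    // Convert the byte offset into a character offset.
    return sketch_char_count(substr($haystack, 0, $bytePos));
}

$text = "na\u{00EF}ve caf\u{00E9}";
var_dump(strpos($text, 'caf'));             // byte position
var_dump(sketch_utf8_strpos($text, 'caf')); // character position
```

As with strpos(), compare the result against FALSE with ===, because a match at position 0 is a valid result.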

utf8_strrpos (line 98)

UTF-8 aware alternative to strrpos

Find position of last occurrence of a char in a string.

Note: This will get a lot slower if offset is used.
Note: requires utf8_substr and utf8_strlen to be loaded.

return: integer position or FALSE on failure
see: utf8_strlen()
see: utf8_substr()
see: http://www.php.net/strrpos

mixed utf8_strrpos (string $str, string $needle, [integer $offset = NULL])

string $str: haystack
string $needle: needle (you should validate this with utf8_is_valid)
integer $offset: (optional) offset (from left)

utf8_strtolower (line 284)

UTF-8 aware alternative to strtolower

Make a string lowercase.

Note: The concept of a character's "case" only exists in some alphabets such as Latin, Greek, Cyrillic, Armenian and archaic Georgian; it does not exist in Chinese script, for example. See Unicode Standard Annex #21: Case Mappings.
Note: requires utf8_to_unicode and utf8_from_unicode.

return: either string in lowercase or FALSE if UTF-8 is invalid
author: Andreas Gohr <andi@splitbrain.org>
see: http://dev.splitbrain.org/view/darcs/dokuwiki/inc/utf8.php
see: http://www.unicode.org/reports/tr21/tr21-5.html
see: utf8_to_unicode()
see: http://www.php.net/strtolower
see: utf8_from_unicode()

mixed utf8_strtolower (string $string)

string $string
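
The idea of case mapping through a lookup table can be sketched as below. This is only an illustration under stated assumptions: the name sketch_utf8_strtolower and its tiny hand-picked table are hypothetical and cover just ASCII plus three sample letters. The documented function depends on utf8_to_unicode and utf8_from_unicode instead of the string-level table used here.

```php
<?php
// Hypothetical helper with a deliberately tiny mapping table.
function sketch_utf8_strtolower(string $str)
{
    // preg_match() with the /u flag returns false for invalid UTF-8.
    if (preg_match('//u', $str) !== 1) {
        return false;
    }

    static $map = null;
    if ($map === null) {
        // ASCII letters, built by hand to avoid locale-dependent behaviour.
        $map = array_combine(range('A', 'Z'), range('a', 'z'));
        // A few sample letters from scripts that have case.
        $map += [
            "\u{00C9}" => "\u{00E9}", // Latin E with acute
            "\u{03A9}" => "\u{03C9}", // Greek omega
            "\u{0416}" => "\u{0436}", // Cyrillic zhe
        ];
    }

    // Characters without an entry (e.g. Chinese) pass through unchanged.
    return strtr($str, $map);
}

var_dump(sketch_utf8_strtolower("\u{00C9}COLE"));
var_dump(sketch_utf8_strtolower("\xFF")); // invalid UTF-8
```

Because the function can return FALSE, callers should test the result with === before using it as a string.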

utf8_strtoupper (line 372)

UTF-8 aware alternative to strtoupper

Make a string uppercase.

Note: The concept of a character's "case" only exists in some alphabets such as Latin, Greek, Cyrillic, Armenian and archaic Georgian; it does not exist in Chinese script, for example. See Unicode Standard Annex #21: Case Mappings.
Note: requires utf8_to_unicode and utf8_from_unicode.

return: either string in uppercase or FALSE if UTF-8 is invalid
author: Andreas Gohr <andi@splitbrain.org>
see: http://dev.splitbrain.org/view/darcs/dokuwiki/inc/utf8.php
see: http://www.unicode.org/reports/tr21/tr21-5.html
see: utf8_to_unicode()
see: http://www.php.net/strtoupper
see: utf8_from_unicode()

mixed utf8_strtoupper (string $string)

string $string

utf8_substr (line 160)

UTF-8 aware alternative to substr. Returns part of a string given a character offset (and optionally a length).

Note on arguments: compared to substr, if offset or length are not integers, this version will not complain but will instead massage them into integers.

Note on returned values: the substr documentation states that false can be returned in some cases (e.g. offset > string length). mb_substr never returns false; it returns an empty string instead. This function adopts the mb_substr approach.

Note on implementation: PCRE only supports repetitions of less than 65536. In order to accept up to MAXINT values for offset and length, a group of 65535 characters is repeated when needed.

Note on implementation: calculating the number of characters in the string is a relatively expensive operation, so it is only carried out when necessary. It isn't necessary for positive offsets with no specified length.

return: string or FALSE if failure
author: Chris Smith<chris@jalakai.co.uk>

mixed utf8_substr (string $str, integer $offset, [integer $length = NULL])

string $str
integer $offset: number of UTF-8 characters offset (from left)
integer $length: (optional) length in UTF-8 characters from offset
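
The PCRE-based approach mentioned in the implementation notes can be sketched for the simple case. This is a minimal illustration, not the library's code: the name sketch_utf8_substr is hypothetical, and the sketch assumes non-negative offset and length values no larger than 65535, so it needs neither the repeated-group workaround nor a character count.

```php
<?php
// Hypothetical helper; handles only 0 <= offset, length <= 65535.
function sketch_utf8_substr(string $str, int $offset, ?int $length = null): string
{
    if ($offset < 0 || $offset > 65535
        || ($length !== null && ($length < 0 || $length > 65535))) {
        throw new InvalidArgumentException('value outside the range of this sketch');
    }

    // Skip $offset characters, then capture up to $length characters
    // (or everything that remains when no length is given).
    $take    = $length === null ? '*' : '{0,' . $length . '}';
    $pattern = '/^(?:.{' . $offset . '})(.' . $take . ')/us';

    // No match means the offset lies beyond the end of the string (or the
    // input is not valid UTF-8); return '' in the style of mb_substr.
    if (!preg_match($pattern, $str, $m)) {
        return '';
    }
    return $m[1];
}

$text = "na\u{00EF}ve caf\u{00E9}";
var_dump(sketch_utf8_substr($text, 2, 3));
var_dump(sketch_utf8_substr($text, 6));
var_dump(sketch_utf8_substr($text, 50));
```

The /u flag makes the dot match whole UTF-8 characters rather than bytes, and /s lets it match newlines as well, so offsets are counted in characters throughout.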

/ext/utf8/native/core.php