WIKINDX API 6.4.9

UTF8

smartUtf8_encode() : string: Encode a string in UTF-8 if it is not already UTF-8
smartUtf8_decode() : string: Decode UTF-8 ONLY if the input has been UTF-8-encoded.
decodeUtf8() : string: Decode UTF-8 properly, because PHP's utf8_decode() is not up to the job.
html_uentity_decode() : string: Encode UTF-8 from Unicode characters
mb_ucfirst() : string: A Unicode-aware replacement for ucfirst()
mb_str_word_count() : int|array<string|int, string>: Count UTF-8 words in a string
mb_chr() : string: Simulate chr() for multibyte strings
mb_explode() : string: Simulate explode() for multibyte strings (as documented for PHP 7.0)
mb_str_pad() : string: Simulate str_pad() for multibyte strings
mb_strcasecmp() : string: Simulate strcasecmp() for multibyte strings
mb_strrev() : string: Simulate strrev() for multibyte strings
mb_substr_replace() : string: Simulate substr_replace() for multibyte strings
mb_ord() : string: Simulate ord() for UTF-8 strings (not arbitrary multibyte strings)
mb_trim() : string: Trim characters from either (or both) ends of a string in a multibyte-friendly way (code by Ben XO at https://www.php.net/manual/en/ref.mbstring.php)
code2utf8() : string: Convert an integer to its chr() representation

smartUtf8_encode()

Encode a string in UTF-8 if it is not already UTF-8


    
                smartUtf8_encode(string $str) : string

Tools for validating that a UTF-8 string is well formed.

The Original Code is Mozilla Communicator client code. The Initial Developer of the Original Code is Netscape Communications Corporation. Portions created by the Initial Developer are Copyright (C) 1998 the Initial Developer. All Rights Reserved. Ported to PHP by Henri Sivonen (http://hsivonen.iki.fi). Slight modifications to fit with the phputf8 library by Harry Fuecks (hfuecks gmail com).

Tests whether a string is valid UTF-8 and supported by the Unicode standard.

Parameters

$str : string: UTF-8 encoded string
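
The "encode only if not already UTF-8" idea can be sketched as follows. This is a minimal illustration, not the WIKINDX implementation: the helper name `sketch_smart_utf8_encode()` is hypothetical, and the sketch assumes that any input which is not well-formed UTF-8 is ISO-8859-1 (Latin-1). It requires the mbstring extension.

```php
<?php
// Hypothetical helper, not the WIKINDX implementation.
// Assumption: input that is not valid UTF-8 is ISO-8859-1 (Latin-1).
function sketch_smart_utf8_encode(string $str): string
{
    // mb_check_encoding() returns true when $str is already well-formed UTF-8.
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str;
    }
    // Otherwise convert from the assumed source encoding.
    return mb_convert_encoding($str, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "caf\xE9";      // "café" in ISO-8859-1 (not valid UTF-8)
$utf8   = "caf\xC3\xA9";  // "café" in UTF-8

var_dump(sketch_smart_utf8_encode($latin1) === $utf8); // bool(true)
var_dump(sketch_smart_utf8_encode($utf8) === $utf8);   // bool(true), left untouched
```

A validity check of this kind is a heuristic: some Latin-1 byte sequences also happen to be well-formed UTF-8, and such input would be passed through unchanged.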

smartUtf8_decode()

Decode UTF-8 ONLY if the input has been UTF-8-encoded.


    
                smartUtf8_decode(string $inStr) : string

Adapted from 'nospam' in the user contributions at: http://www.php.net/manual/en/function.utf8-decode.php

Parameters

$inStr : string
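
A guarded decode might look like the sketch below. The helper name `sketch_smart_utf8_decode()` is hypothetical, and the target encoding (ISO-8859-1) is an assumption made for illustration; the source does not state what the real function decodes to. It requires the mbstring extension.

```php
<?php
// Hypothetical helper, not the WIKINDX implementation.
// Assumption: the decoding target is ISO-8859-1 (Latin-1).
function sketch_smart_utf8_decode(string $inStr): string
{
    // Bytes that are not well-formed UTF-8 are returned untouched.
    if (!mb_check_encoding($inStr, 'UTF-8')) {
        return $inStr;
    }
    // Characters outside Latin-1 cannot be represented and are substituted.
    return mb_convert_encoding($inStr, 'ISO-8859-1', 'UTF-8');
}

var_dump(sketch_smart_utf8_decode("caf\xC3\xA9") === "caf\xE9"); // bool(true)
var_dump(sketch_smart_utf8_decode("caf\xE9") === "caf\xE9");     // bool(true), already Latin-1
```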

decodeUtf8()

Decode UTF-8 properly, because PHP's utf8_decode() is not up to the job.


    
                decodeUtf8(string $utf8_string) : string

Freely borrowed from morris_hirsch at http://www.php.net/manual/en/function.utf8-decode.php

bytes  bits  representation
1      7     0bbbbbbb
2      11    110bbbbb 10bbbbbb
3      16    1110bbbb 10bbbbbb 10bbbbbb
4      21    11110bbb 10bbbbbb 10bbbbbb 10bbbbbb

Each b represents a bit that can be used to store character data.

The input CANNOT contain single-byte upper-half extended ASCII codes.

Parameters

$utf8_string : string
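
To make the byte layout described above concrete, here is a small sketch that decodes the first UTF-8 sequence of a string into its code point by masking off the marker bits. The function name `sketch_utf8_first_code_point()` is hypothetical and the code is not taken from WIKINDX; it also does not reject overlong encodings or surrogates.

```php
<?php
// Hypothetical helper: decode the first UTF-8 sequence in $s to a code point.
function sketch_utf8_first_code_point(string $s): int
{
    if ($s === '') {
        throw new InvalidArgumentException('Empty input');
    }
    $b0 = ord($s[0]);
    if ($b0 < 0x80) {
        return $b0;                       // 0bbbbbbb: 7 data bits
    }
    if (($b0 & 0xE0) === 0xC0) {          // 110bbbbb: 1 continuation byte
        $extra = 1;
        $cp = $b0 & 0x1F;
    } elseif (($b0 & 0xF0) === 0xE0) {    // 1110bbbb: 2 continuation bytes
        $extra = 2;
        $cp = $b0 & 0x0F;
    } elseif (($b0 & 0xF8) === 0xF0) {    // 11110bbb: 3 continuation bytes
        $extra = 3;
        $cp = $b0 & 0x07;
    } else {
        throw new InvalidArgumentException('Invalid lead byte');
    }
    for ($i = 1; $i <= $extra; $i++) {
        // Every continuation byte must look like 10bbbbbb.
        if (!isset($s[$i]) || (ord($s[$i]) & 0xC0) !== 0x80) {
            throw new InvalidArgumentException('Invalid continuation byte');
        }
        // Shift in the 6 data bits of this continuation byte.
        $cp = ($cp << 6) | (ord($s[$i]) & 0x3F);
    }
    return $cp;
}

var_dump(sketch_utf8_first_code_point("\xC3\xA9") === 0xE9);       // bool(true), U+00E9
var_dump(sketch_utf8_first_code_point("\xE2\x82\xAC") === 0x20AC); // bool(true), U+20AC
```

A lone byte such as "\xE9" (an upper-half extended ASCII code) is rejected here because it looks like a lead byte without its continuation bytes, which matches the restriction on input stated above.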

html_uentity_decode()

Encode UTF-8 from Unicode characters


    
                html_uentity_decode(string $str) : string

Parameters

$str : string

mb_ucfirst()

A Unicode-aware replacement for ucfirst()


    
                mb_ucfirst(string $str) : string

Parameters

$str : string
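
One common way to build a Unicode-aware ucfirst() is to upper-case the first character with the mbstring functions and append the rest unchanged. The sketch below assumes UTF-8 input and uses the hypothetical name `sketch_mb_ucfirst()`; it is not the WIKINDX code.

```php
<?php
// Hypothetical helper, not the WIKINDX implementation. Requires mbstring.
function sketch_mb_ucfirst(string $str, string $encoding = 'UTF-8'): string
{
    // Split on character boundaries, not byte boundaries.
    $first = mb_substr($str, 0, 1, $encoding);
    $rest  = mb_substr($str, 1, null, $encoding);
    return mb_strtoupper($first, $encoding) . $rest;
}

var_dump(sketch_mb_ucfirst("élan") === "Élan"); // bool(true)
```

By contrast, the byte-oriented ucfirst() leaves "élan" unchanged, because it only inspects the first byte of the two-byte "é".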

mb_str_word_count()

Count UTF-8 words in a string


    
                mb_str_word_count(string $str, string $format[, string $charlist = "" ]) : int|array<string|int, string>

Parameters

$str : string
$format : string
$charlist : string = ""

mb_chr()

Simulate chr() for multibyte strings


    
                mb_chr(string $dec) : string

Parameters

$dec : string

mb_explode()

Simulate explode() for multibyte strings (as documented for PHP 7.0)


    
                mb_explode(string $delimiter, string $string[, int $limit = PHP_INT_MAX ]) : string

Parameters

$delimiter : string
$string : string
$limit : int = PHP_INT_MAX: Default is PHP_INT_MAX.

mb_str_pad()

Simulate str_pad() for multibyte strings


    
                mb_str_pad(string $str, int $pad_len[, string $pad_str = ' ' ][, string $dir = STR_PAD_RIGHT ][, string $encoding = NULL ]) : string

Parameters

$str : string
$pad_len : int
$pad_str : string = ' ': Default is ' '.
$dir : string = STR_PAD_RIGHT: Default is STR_PAD_RIGHT.
$encoding : string = NULL: Default is NULL.
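
The problem with the native str_pad() is that it counts bytes, so a string containing multibyte characters is padded too little. A minimal character-based sketch is shown below. The name `sketch_mb_str_pad()` is hypothetical, the direction is typed as int because the STR_PAD_* constants are integers (the signature above documents string), and the default encoding of UTF-8 is an assumption.

```php
<?php
// Hypothetical helper, not the WIKINDX implementation. Requires mbstring.
function sketch_mb_str_pad(
    string $str,
    int $padLen,
    string $padStr = ' ',
    int $dir = STR_PAD_RIGHT,
    string $encoding = 'UTF-8'
): string {
    $missing = $padLen - mb_strlen($str, $encoding); // characters still needed
    $unit    = mb_strlen($padStr, $encoding);        // characters per pad unit
    if ($missing <= 0 || $unit === 0) {
        return $str;
    }
    // Decide how many pad characters go on each side.
    $left  = 0;
    $right = $missing;
    if ($dir === STR_PAD_LEFT) {
        $left  = $missing;
        $right = 0;
    } elseif ($dir === STR_PAD_BOTH) {
        $left  = intdiv($missing, 2);
        $right = $missing - $left;
    }
    // Repeat the pad string and cut it to exactly $n characters.
    $fill = static function (int $n) use ($padStr, $unit, $encoding): string {
        return mb_substr(str_repeat($padStr, intdiv($n, $unit) + 1), 0, $n, $encoding);
    };
    return $fill($left) . $str . $fill($right);
}

var_dump(str_pad("é", 3, "*") === "é*");            // bool(true): "é" counted as 2 bytes
var_dump(sketch_mb_str_pad("é", 3, "*") === "é**"); // bool(true): "é" counted as 1 character
```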

mb_strcasecmp()

Simulate strcasecmp() for multibyte strings


    
                mb_strcasecmp(string $str1, string $str2[, string $encoding = NULL ]) : string

A simple multibyte-safe case-insensitive string comparison

Parameters

$str1 : string
$str2 : string
$encoding : string = NULL: Default is NULL.
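
A simple comparison of this kind can be sketched by normalizing the case of both operands and then comparing them byte-wise. The name `sketch_mb_strcasecmp()` is hypothetical, and the sketch returns an int in the manner of strcasecmp(), which is an assumption (the signature above documents string).

```php
<?php
// Hypothetical helper, not the WIKINDX implementation. Requires mbstring.
function sketch_mb_strcasecmp(string $str1, string $str2, string $encoding = 'UTF-8'): int
{
    // Upper-case both sides with a multibyte-aware function, then compare.
    return strcmp(
        mb_strtoupper($str1, $encoding),
        mb_strtoupper($str2, $encoding)
    );
}

var_dump(sketch_mb_strcasecmp("ÉCOLE", "école") === 0); // bool(true)
var_dump(sketch_mb_strcasecmp("a", "B") < 0);           // bool(true)
```

The result orders strings by the bytes of their upper-cased forms; it is not a locale-aware collation.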

mb_strrev()

Simulate strrev() for multibyte strings


    
                mb_strrev(string $str) : string

Parameters

$str : string
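
Reversing a UTF-8 string safely means reversing characters rather than bytes. One way to sketch this is to split the string with a Unicode-mode regular expression and reverse the resulting array. The name `sketch_mb_strrev()` is hypothetical and the code is not taken from WIKINDX.

```php
<?php
// Hypothetical helper, not the WIKINDX implementation.
function sketch_mb_strrev(string $str): string
{
    // With the /u modifier an empty pattern splits between code points.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return implode('', array_reverse($chars));
}

var_dump(sketch_mb_strrev("añb") === "bña"); // bool(true)
```

This reverses code points, so a base letter followed by a combining mark would end up with the mark in front of it.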

mb_substr_replace()

Simulate substr_replace() for multibyte strings


    
                mb_substr_replace(string $string, string $replacement, int $start[, int $length = NULL ][, string $encoding = NULL ]) : string

Parameters

$string : string
$replacement : string
$start : int
$length : int = NULL: Default is NULL.
$encoding : string = NULL: Default is NULL.
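
A character-based substr_replace() can be sketched by cutting the string into a head and a tail with mb_substr() and joining them around the replacement. The name `sketch_mb_substr_replace()` is hypothetical, the default encoding of UTF-8 is an assumption, and only scalar arguments are handled (the native function also accepts arrays).

```php
<?php
// Hypothetical helper, not the WIKINDX implementation. Requires mbstring.
function sketch_mb_substr_replace(
    string $string,
    string $replacement,
    int $start,
    ?int $length = null,
    string $encoding = 'UTF-8'
): string {
    $total = mb_strlen($string, $encoding);
    // Normalize a negative start to an offset from the beginning.
    if ($start < 0) {
        $start = max(0, $total + $start);
    }
    $start = min($start, $total);
    // Work out where the replaced span ends (in characters).
    if ($length === null) {
        $end = $total;
    } elseif ($length < 0) {
        $end = max($start, $total + $length);
    } else {
        $end = min($total, $start + $length);
    }
    return mb_substr($string, 0, $start, $encoding)
        . $replacement
        . mb_substr($string, $end, null, $encoding);
}

var_dump(sketch_mb_substr_replace("héllo wörld", "W", 6, 1) === "héllo Wörld"); // bool(true)
```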

mb_ord()

Simulate ord() for UTF-8 strings (not arbitrary multibyte strings)


    
                mb_ord(string $string) : string

Parameters

$string : string

mb_trim()

Code by Ben XO at https://www.php.net/manual/en/ref.mbstring.php


    
                mb_trim(string $string[, string $charlist = '\\s' ][, bool $ltrim = TRUE ][, bool $rtrim = TRUE ]) : string

Trim characters from either (or both) ends of a string in a way that is multibyte-friendly.

Mostly, this behaves exactly like trim() would: for example supplying 'abc' as the charlist will trim all 'a', 'b' and 'c' chars from the string, with, of course, the added bonus that you can put unicode characters in the charlist.

We are using a PCRE character-class to do the trimming in a unicode-aware way, so we must escape ^, \, - and ] which have special meanings here. As you would expect, a single \ in the charlist is interpreted as "trim backslashes" (and duly escaped into a double-\ ). Under most circumstances you can ignore this detail.

As a bonus, however, we also allow PCRE special character-classes (such as '\s') because they can be extremely useful when dealing with UCS. '\pZ', for example, matches every 'separator' character defined in Unicode, including non-breaking and zero-width spaces.

It doesn't make sense to have two or more of the same character in a character class, therefore we interpret a double \ in the character list to mean a single \ in the regex, allowing you to safely mix normal characters with PCRE special classes.

Be careful when using this bonus feature, as PHP also interprets backslashes as escape characters before they are even seen by the regex. Therefore, to specify '\\s' in the regex (which will be converted to the special character class '\s' for trimming), you will usually have to put 4 backslashes in the PHP code, as you can see from the default value of $charlist.

Parameters

$string : string: The string to trim
$charlist : string = '\\s': list of characters to remove from the ends of the string
$ltrim : bool = TRUE: trim the left? (Default is TRUE)
$rtrim : bool = TRUE: trim the right? (Default is TRUE)
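
The character-class approach described above can be sketched in a simplified form. This version treats every character of the charlist literally, so it deliberately omits the "bonus" support for PCRE special classes such as '\s'. The name `sketch_mb_trim()` and the default charlist (the characters that trim() strips by default) are assumptions made for illustration; this is not Ben XO's code.

```php
<?php
// Hypothetical, simplified helper: literal characters only, no PCRE classes.
function sketch_mb_trim(
    string $string,
    string $charlist = " \t\n\r\0\x0B",
    bool $ltrim = true,
    bool $rtrim = true
): string {
    if ($charlist === '' || (!$ltrim && !$rtrim)) {
        return $string;
    }
    // preg_quote() escapes ^, \, - and ] (among others), so every charlist
    // character is taken literally inside the [...] character class.
    $class = '[' . preg_quote($charlist, '/') . ']+';
    $parts = [];
    if ($ltrim) {
        $parts[] = '\A' . $class; // run of charlist characters at the start
    }
    if ($rtrim) {
        $parts[] = $class . '\z'; // run of charlist characters at the end
    }
    // The /u modifier makes the class match whole UTF-8 characters.
    // preg_replace() returns null on invalid UTF-8; fall back to the input.
    return preg_replace('/' . implode('|', $parts) . '/u', '', $string) ?? $string;
}

// U+00A0 (no-break space) is a two-byte character that trim() cannot handle as a unit.
var_dump(sketch_mb_trim("\u{00A0}¡Hola!\u{00A0}", "\u{00A0}") === "¡Hola!"); // bool(true)

// PHP's own escaping: four backslashes in single quotes yield two in the string.
var_dump(strlen('\\\\s')); // int(3)
```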

code2utf8()

Convert an integer to its chr() representation


    
                code2utf8(int $num) : string

Parameters

$num : int
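
Converting a code point to UTF-8 bytes is the reverse of decoding: the data bits are distributed over one to four bytes according to the size of the number. The sketch below shows one way to do this; the name `sketch_code_to_utf8()` is hypothetical, the code is not taken from WIKINDX, and surrogate code points are not rejected.

```php
<?php
// Hypothetical helper, not the WIKINDX implementation.
function sketch_code_to_utf8(int $num): string
{
    if ($num < 0 || $num > 0x10FFFF) {
        throw new InvalidArgumentException('Code point out of range');
    }
    if ($num < 0x80) {          // 1 byte:  0bbbbbbb
        return chr($num);
    }
    if ($num < 0x800) {         // 2 bytes: 110bbbbb 10bbbbbb
        return chr(0xC0 | ($num >> 6))
            . chr(0x80 | ($num & 0x3F));
    }
    if ($num < 0x10000) {       // 3 bytes: 1110bbbb 10bbbbbb 10bbbbbb
        return chr(0xE0 | ($num >> 12))
            . chr(0x80 | (($num >> 6) & 0x3F))
            . chr(0x80 | ($num & 0x3F));
    }
    // 4 bytes: 11110bbb 10bbbbbb 10bbbbbb 10bbbbbb
    return chr(0xF0 | ($num >> 18))
        . chr(0x80 | (($num >> 12) & 0x3F))
        . chr(0x80 | (($num >> 6) & 0x3F))
        . chr(0x80 | ($num & 0x3F));
}

var_dump(sketch_code_to_utf8(0xE9) === "\xC3\xA9");       // bool(true), "é"
var_dump(sketch_code_to_utf8(0x20AC) === "\xE2\x82\xAC"); // bool(true), "€"
```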
