Overview  
Updated Oct 12, 2011 by rin.nas.77

Standard PHP functions, reimplemented for UTF-8 encoded strings

In alphabetical order:

  1. array_change_key_case()
  2. chr(): converts a Unicode code point to a UTF-8 character
  3. chunk_split()
  4. ltrim()
  5. ord(): converts a UTF-8 character to a Unicode code point
  6. preg_match_all(): calls the native preg_match_all() and, when the PREG_OFFSET_CAPTURE flag is set, converts the returned byte offsets into character offsets. This happens whether or not the /u modifier is used.
  7. range()
  8. rtrim()
  9. str_pad()
  10. str_split()
  11. strcasecmp()
  12. strcmp()
  13. stripos()
  14. strlen()
  15. strncmp()
  16. strpos()
  17. strrev()
  18. strspn()
  19. strtolower() (alias: lowercase())
  20. strtoupper() (alias: uppercase())
  21. strtr()
  22. substr()
  23. substr_replace()
  24. trim()
  25. ucfirst()
  26. ucwords()
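
The offset conversion described for preg_match_all() above can be sketched roughly as follows. This is a minimal illustration, not the library's own code: match_all_char_offsets() is a hypothetical name, and the sketch assumes that PHP 7.1 or later and the mbstring extension are available. It relies on the fact that PCRE reports byte offsets under PREG_OFFSET_CAPTURE, so the character offset is the number of characters in the prefix that precedes the match.

```php
<?php
// Hypothetical helper (not part of the library): run preg_match_all() and
// report both the byte offset and the character offset of every full match.
function match_all_char_offsets(string $pattern, string $subject): array
{
    $result = [];
    preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
    foreach ($matches[0] as [$text, $byteOffset]) {
        // Character offset = number of UTF-8 characters before the match.
        $charOffset = mb_strlen(substr($subject, 0, $byteOffset), 'UTF-8');
        $result[] = [$text, $byteOffset, $charOffset];
    }
    return $result;
}

// Each Cyrillic letter occupies 2 bytes, so the two kinds of offset diverge.
$found = match_all_char_offsets('/\d+/u', 'тест 42 и 7');
// $found[0] is ['42', 9, 5]
// $found[1] is ['7', 15, 10]
```

Counting the prefix on every match is simple but rescans the string each time; for subjects with many matches, an incremental count from the previous match would be cheaper.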

Additional useful functions for UTF-8 encoded strings

In alphabetical order:

  1. blocks_check()
    Checks that UTF-8 data falls within the given ranges (blocks) of the Unicode standard.
    A suitable alternative to regular expressions.
  2. convert_case()
    Converts the letter case of UTF-8 encoded data.
    Arrays are traversed recursively; only the values of array elements are converted, and the keys are left unchanged.
  3. convert_files_from()
    Recodes the text files in a specified folder to UTF-8.
    Binary files, files already encoded in UTF-8, and files that could not be converted are skipped.
  4. convert_from()
    Encodes data from another character encoding to UTF-8.
  5. convert_to()
    Encodes data from UTF-8 to another character encoding.
  6. diactrical_remove()
    Removes combining diacritical marks from text, with the option of restoring them later.
  7. diactrical_restore()
    Restores combining diacritical marks removed by diactrical_remove(), provided that their character positions and the number of characters in the text have not changed.
  8. from_unicode()
    Converts Unicode code points to a UTF-8 string.
  9. has_binary()
    Checks whether the data contains characters from the ASCII control character class.
  10. html_entity_decode()
    Converts all HTML entities to native UTF-8 characters.
  11. html_entity_encode()
    Converts special UTF-8 characters to HTML entities.
  12. is_ascii()
    Checks whether the data consists only of ASCII characters.
  13. is_utf8()
    Returns TRUE if the data is valid UTF-8 and FALSE otherwise. Returns TRUE for null, integer, float, and boolean values.
  14. preg_quote_case_insensitive()
    Builds a regular expression for case-insensitive matching.
  15. str_limit(), truncate()
    Truncates UTF-8 text to the given length; the last word is shown whole rather than cut off in the middle. HTML entities are handled correctly.
  16. strict()
    Strips out device control codes in the ASCII range.
  17. textarea_rows()
    Calculates the height of the edited text in a <textarea> HTML tag from its value and width.
  18. to_unicode()
    Converts a UTF-8 string to Unicode code points.
  19. unescape()
    Decodes a string from several escape formats (which may be mixed) into a UTF-8 string.
    Examples:
      '%D1%82%D0%B5%D1%81%D1%82'        => "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82"  #binary (regular)
      '0xD182D0B5D181D182'              => "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82"  #binary (compact)
      '%u0442%u0435%u0441%u0442'        => "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82"  #UCS-2  (U+0 to U+FFFF)
      '%u{442}%u{435}%u{0441}%u{00442}' => "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82"  #UTF-8  (U+0 to U+FFFFFF)
  20. unescape_request()
    1. Corrects the values in the global arrays $_GET, $_POST, $_COOKIE, $_REQUEST and $_FILES by decoding the %XX format and the extended %uXXXX / %u{XXXXXX} formats, as produced, for example, by the outdated JavaScript function escape(). Standard PHP5 cannot do this.
    2. Recodes $_GET, $_POST, $_COOKIE, $_REQUEST and $_FILES from the $charset encoding to UTF-8, if necessary. As a side effect, this protects against XSS attacks that use non-printable characters against vulnerable PHP functions. Web forms can therefore be sent to the server in either of two encodings: $charset or UTF-8. For example: ?тест[тест]=тест
    3. If HTTP_COOKIE contains several parameters with the same name, takes the last value (as with QUERY_STRING), not the first.
    4. Creates the $_POST array for non-standard Content-Type values, for example "Content-Type: application/octet-stream". Standard PHP5 creates it only for "Content-Type: application/x-www-form-urlencoded" and "Content-Type: multipart/form-data".
    Examples:
      '%F2%E5%F1%F2'                    => 'тест'  #CP1251 (regular)
      '0xF2E5F1F2'                      => 'тест'  #CP1251 (compact)
      '%D1%82%D0%B5%D1%81%D1%82'        => 'тест'  #UTF-8 (regular)
      '0xD182D0B5D181D182'              => 'тест'  #UTF-8 (compact)
      '%u0442%u0435%u0441%u0442'        => 'тест'  #UCS-2 (U+0 to U+FFFF)
      '%u{442}%u{435}%u{0441}%u{00442}' => 'тест'  #UTF-8 (U+0 to U+FFFFFF)
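
The percent-escape formats listed in the examples above could be decoded along the following lines. This is only a sketch under stated assumptions, not the library's implementation: unescape_sketch() is a hypothetical name, mb_chr() requires PHP 7.2 or later with the mbstring extension, and the compact 0x... form and the recoding from $charset are deliberately left out.

```php
<?php
// Hypothetical sketch (not the library's unescape()): decode %XX, %uXXXX
// and %u{X...} escapes in a single pass and return a UTF-8 string.
function unescape_sketch(string $s): string
{
    return preg_replace_callback(
        '/%u\{([0-9a-fA-F]{1,6})\}|%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})/',
        function (array $m): string {
            if (isset($m[3])) {
                // %XX is one raw byte.
                return chr(hexdec($m[3]));
            }
            // %uXXXX or %u{X...} is a Unicode code point in hexadecimal.
            $hex = isset($m[2]) ? $m[2] : $m[1];
            $char = mb_chr(hexdec($hex), 'UTF-8');
            // Leave the escape untouched if it is not a valid code point.
            return $char === false ? $m[0] : $char;
        },
        $s
    );
}

$a = unescape_sketch('%D1%82%D0%B5%D1%81%D1%82');
$b = unescape_sketch('%u0442%u0435%u0441%u0442');
$c = unescape_sketch('%u{442}%u{435}%u{0441}%u{00442}');
// All three are "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82"
```

Decoding all three formats in one pass avoids double decoding: a '%' produced by an escape such as %u0025 is not interpreted again as the start of another escape.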
