Friday, November 21, 2008

Strings

  • C#
    string is an alias for System.String
    String: UTF-16
  • Java
    String: UTF-16 (char is a 16-bit code unit)
  • C++
    wchar_t: 16 bits (UTF-16) on Windows, 32 bits (UTF-32) on most Unix systems
    Because its size and encoding vary by platform, wchar_t is not portable; ICU is a better choice.
  • std::string
    typedef basic_string < char > string;
    typedef basic_string < wchar_t > wstring;
    wstring string1 = L"obecně";
  • CString (MFC/ATL)
  • ICU
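    UnicodeString: UTF-16 internally. Note that the byte order (endianness) of those UTF-16 code units in memory is the platform's native one.
    ICU uses UTF-16, not UCS-2. UCS-2 is the subset of UTF-16 without surrogates: it can only represent code points up to U+FFFF, whereas UTF-16 encodes code points above U+FFFF as surrogate pairs.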
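  • Note: "Unicode" is commonly used to mean UCS-2. From the Java and C# standpoint the difference between UCS-2 and UTF-16 is mostly invisible, because char is a 16-bit code unit in both. It only shows up for characters above U+FFFF, which take two chars (a surrogate pair) and count as 2 in the string length.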
  • Windows TCHAR
    #ifdef UNICODE
    typedef wchar_t TCHAR;
    #else
    typedef char TCHAR;
    #endif
    const TCHAR * toto = _T("obecně");
  • Note: in Java and C#, strings are immutable: a string's content cannot be changed after the object is created. For mutable strings, see StringBuilder or StringBuffer (Java) and StringBuilder (C#).
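The wchar_t portability point can be made concrete with a short C++ sketch. It counts how many code units a single code point needs in UTF-8, UTF-16 and UTF-32. The helper names utf8_units, utf16_units, utf32_units and wchar_units are hypothetical. The code assumes its input is a valid Unicode scalar value, and it assumes wchar_t holds either UTF-16 or UTF-32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers: how many code units one Unicode code point needs
// in each encoding. Assumes cp is a valid scalar value
// (<= U+10FFFF and not in the surrogate range U+D800..U+DFFF).
std::size_t utf8_units(std::uint32_t cp) {
    if (cp < 0x80) return 1;     // ASCII
    if (cp < 0x800) return 2;    // e.g. U+011B (Czech e with caron)
    if (cp < 0x10000) return 3;  // rest of the BMP
    return 4;                    // supplementary planes
}

std::size_t utf16_units(std::uint32_t cp) {
    return cp < 0x10000 ? 1 : 2;  // above U+FFFF: surrogate pair
}

std::size_t utf32_units(std::uint32_t /*cp*/) {
    return 1;  // every code point fits in one 32-bit unit
}

// sizeof(wchar_t) is implementation-defined: 2 bytes with MSVC on Windows,
// typically 4 bytes with GCC/Clang on Unix. Returns the number of wchar_t
// elements one code point needs, assuming wchar_t holds UTF-16 or UTF-32.
std::size_t wchar_units(std::uint32_t cp) {
    return sizeof(wchar_t) == 2 ? utf16_units(cp) : utf32_units(cp);
}
```

For example, ě (U+011B) takes 2 UTF-8 units but 1 UTF-16 or UTF-32 unit. U+1F600 takes 4, 2 and 1 units respectively. As a result, the same text outside the BMP occupies a different number of wchar_t elements on Windows than on most Unix systems.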
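The difference between UCS-2 and UTF-16 comes down to surrogate pairs. The following minimal C++ sketch encodes a code point as UTF-16 and counts the code points in a UTF-16 sequence. The functions encode_utf16 and count_code_points are hypothetical names chosen for this example. count_code_points is only similar in spirit to Java's String.codePointCount and does not reproduce it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode one code point (U+0000..U+10FFFF, excluding U+D800..U+DFFF)
// as UTF-16 code units. BMP code points take one unit; code points above
// U+FFFF take a surrogate pair, which UCS-2 cannot represent.
std::vector<std::uint16_t> encode_utf16(std::uint32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp <= 0xFFFF) {
        return {static_cast<std::uint16_t>(cp)};
    }
    const std::uint32_t v = cp - 0x10000;  // 20 significant bits remain
    const auto high = static_cast<std::uint16_t>(0xD800 + (v >> 10));    // top 10 bits
    const auto low  = static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF));  // bottom 10 bits
    return {high, low};
}

// Count code points in a UTF-16 sequence. A high surrogate followed by a
// low surrogate counts once; an unpaired surrogate counts as one on its own.
std::size_t count_code_points(const std::vector<std::uint16_t>& units) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < units.size(); ++i) {
        const bool is_high = units[i] >= 0xD800 && units[i] <= 0xDBFF;
        if (is_high && i + 1 < units.size() &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            ++i;  // skip the low half of this surrogate pair
        }
        ++count;
    }
    return count;
}
```

encode_utf16(0x1F600) yields the pair 0xD83D, 0xDE00. The vector therefore has size 2 (this is what a Java or C# string length reports), while count_code_points on that vector returns 1.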
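The TCHAR typedef above works together with the _T macro, which prefixes a literal with L only in UNICODE builds. The following portable sketch imitates that pattern so that it compiles outside Windows. MY_TCHAR, MY_T, MY_T_IMPL and my_tcslen are made-up names, not the real <tchar.h> definitions. The two-level macro mirrors what <tchar.h> is generally understood to do with _T and __T.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical re-creation of the TCHAR / _T pattern with made-up names.
// MY_T(x) forwards to MY_T_IMPL(x) so that a macro argument is expanded
// before the L prefix is pasted on.
#ifdef UNICODE
typedef wchar_t MY_TCHAR;
#define MY_T_IMPL(x) L##x
#else
typedef char MY_TCHAR;
#define MY_T_IMPL(x) x
#endif
#define MY_T(x) MY_T_IMPL(x)

// Length in MY_TCHAR units, dispatching on the configured character type.
inline std::size_t my_tcslen(const MY_TCHAR* s) {
#ifdef UNICODE
    return std::wcslen(s);
#else
    return std::strlen(s);
#endif
}
```

The same source line, const MY_TCHAR* s = MY_T("obecn");, produces a narrow string by default and a wide string when UNICODE is defined. That is the portability TCHAR was designed for on Windows.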