user_id=21627
Obtaining the locale's codeset by parsing environment
variables is bogus. For example, in most installations, the
codeset for de_DE@euro is iso-8859-15. However, this is
impossible to find out by just parsing the environment
variables.
Instead, the proper way is to use
locale.nl_langinfo(CODESET) where available. If that is not
available, the following heuristics could be applied:
- On Windows, it is "mbcs"
- On Unix, parse the environment variables
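A rough sketch of that lookup order, in case it helps. The helper names guess_codeset and codeset_from_environ are made up for illustration and are not part of the patch or of the locale module. The environment parsing assumes the usual language[_territory][.codeset][@modifier] form, and the nl_langinfo result reflects whatever locale is currently set.

```python
import locale
import os
import sys


def codeset_from_environ(environ):
    """Last-resort heuristic: pull the codeset out of LC_ALL/LC_CTYPE/LANG.

    Hypothetical helper. Returns None when the value names no codeset,
    which is exactly the de_DE@euro problem described above.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(name)
        if value:
            # Drop the @modifier, then look for an explicit .codeset part.
            base = value.split("@", 1)[0]
            if "." in base:
                return base.split(".", 1)[1]
            return None
    return None


def guess_codeset(environ=None):
    """Best-effort codeset lookup (hypothetical helper)."""
    if environ is None:
        environ = os.environ
    # Preferred: ask the C library, where the platform provides it.
    nl_langinfo = getattr(locale, "nl_langinfo", None)
    codeset_item = getattr(locale, "CODESET", None)
    if nl_langinfo is not None and codeset_item is not None:
        codeset = nl_langinfo(codeset_item)
        if codeset:
            return codeset
    # Heuristics for platforms without nl_langinfo.
    if sys.platform == "win32":
        return "mbcs"
    return codeset_from_environ(environ)
```

Note that codeset_from_environ({"LANG": "de_DE@euro"}) yields None: the environment alone does not say that the codeset is iso-8859-15.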
As for the actual usage of the charset, I think you
misinterpret the gettext recommendation: the result of
gettext ought to be in the locale's encoding (this is not a
default encoding). This means that, if the codeset of the
locale and the charset of the catalog differ, character set
conversion needs to be invoked; I can see no traces of that
happening in your patch.
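To illustrate the kind of conversion I mean, here is a minimal sketch. The function to_locale_encoding is a hypothetical name, not something in the patch or in the gettext module, and it assumes both the catalog charset and the locale codeset are names Python's codec machinery knows.

```python
def to_locale_encoding(raw, catalog_charset, locale_codeset):
    """Re-encode a catalog message (bytes) into the locale's codeset.

    Hypothetical helper: decode with the charset declared by the
    catalog, then encode with the codeset of the user's locale.
    """
    if catalog_charset.lower() == locale_codeset.lower():
        # Nothing to convert when both sides name the same encoding.
        return raw
    text = raw.decode(catalog_charset)
    return text.encode(locale_codeset)


# A UTF-8 catalog entry delivered to a Latin-9 locale.
utf8_message = "Gr\u00fc\u00dfe".encode("utf-8")
latin9_message = to_locale_encoding(utf8_message, "utf-8", "iso-8859-15")
```

Here latin9_message holds one byte per character, whereas the UTF-8 input needs two bytes each for the umlaut and the sharp s.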
The common case is a catalog in UTF-8 and a user codeset that is
language-specific (such as Latin-9). In that case,
conversion works well. There is also the case of
unsupported conversions (e.g. usage of EURO SIGN in the
catalog, but Latin-1 in the locale); in this case, glibc
iconv uses transliteration (to "EUR", in the example). Since
we have no transliteration, we would probably fall back to
returning the string in the catalog's encoding :-(
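The fallback could look roughly like the following. This is only a sketch under the assumption that a failed encode should hand back the catalog's bytes untouched; convert_or_fallback is a hypothetical name, and no transliteration is attempted.

```python
def convert_or_fallback(raw, catalog_charset, locale_codeset):
    """Convert catalog bytes to the locale codeset if possible.

    Hypothetical helper: if some character has no representation in
    the locale's codeset, return the original bytes in the catalog's
    encoding instead of transliterating.
    """
    text = raw.decode(catalog_charset)
    try:
        return text.encode(locale_codeset)
    except UnicodeEncodeError:
        # e.g. EURO SIGN requested in a Latin-1 locale
        return raw


price = "5 \u20ac".encode("utf-8")          # catalog entry containing EURO SIGN
in_latin9 = convert_or_fallback(price, "utf-8", "iso-8859-15")
in_latin1 = convert_or_fallback(price, "utf-8", "iso-8859-1")
```

With Latin-9 the euro sign converts to a single byte; with Latin-1 the encode fails and the caller gets the UTF-8 bytes back unchanged.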