classification
Title: locale 1251 does not convert to upper case properly
Type: behavior Stage: needs patch
Components: Interpreter Core, Windows Versions: Python 2.7
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: BreamoreBoy, ajaksu2, amaury.forgeotdarc, dobrokot, georg.brandl, loewis, r.david.murray
Priority: normal Keywords:

Created on 2007-01-13 17:30 by dobrokot, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
yo.py dobrokot, 2007-01-13 17:30 source code
toupper.zip dobrokot, 2007-01-18 21:18 _toupper.c and toupper.c files from VC++7.1 CRT
Messages (14)
msg31021 - (view) Author: Ivan Dobrokotov (dobrokot) Date: 2007-01-13 17:30
<pre>
 # -*- coding: 1251 -*-

import locale

locale.setlocale(locale.LC_ALL, ".1251") #locale name may be Windows specific?

#-----------------------------------------------
print chr(184), chr(168)
assert  chr(255).upper() == chr(223) #OK
assert  chr(184).upper() == chr(168) #fail
#-----------------------------------------------
assert  'q'.upper() == 'Q' #OK 
assert  'ж'.upper() == 'Ж' #OK
assert  'Ж'.upper() == 'Ж' #OK
assert  'я'.upper() == 'Я' #OK
assert  u'ё'.upper() == u'Ё' #OK (locale independent)
assert  'ё'.upper() == 'Ё' #fail
</pre>

I suspect an incorrect implementation of uppercasing, something like:

<pre>
if ('a' <= c && c <= 'я')
  return c+'Я'-'я'
</pre>

The character 'ё' (184 in cp1251) is not in the range 'a'-'я'.
msg31022 - (view) Author: Ivan Dobrokotov (dobrokot) Date: 2007-01-13 17:49
The C CRT library function toupper('ё') works properly if I call setlocale(LC_ALL, ".1251") first.
msg31023 - (view) Author: Ivan Dobrokotov (dobrokot) Date: 2007-01-13 17:51
sorry, I mean
toupper((int)(unsigned char)'ё')
not just toupper('ё')
msg31024 - (view) Author: Ivan Dobrokotov (dobrokot) Date: 2007-01-13 21:08
forgot to mention used python version - http://www.python.org/ftp/python/2.5/python-2.5.msi
msg31025 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-01-18 20:08
You can see the implementation of .upper in

http://svn.python.org/projects/python/tags/r25/Objects/stringobject.c
(function string_upper)

Off-hand, I cannot see anything wrong in that code. It definitely does *not* use c+'Я'-'я'.
msg31026 - (view) Author: Ivan Dobrokotov (dobrokot) Date: 2007-01-18 21:18
well, C:
----------------------------

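#include <assert.h>
#include <ctype.h>   /* toupper, _toupper */
#include <locale.h>
#include <stdio.h>

int main(void)
{
  int i = 184;  /* 'ё' in cp1251 */
  /* setlocale returns the name of the new locale, or NULL on failure */
  char *loc = setlocale(LC_CTYPE, ".1251");
  assert(loc);
  printf("%d -> %d\n", i, _toupper(i));  /* _toupper as provided by the VC++ CRT */
  printf("%d -> %d\n", i, toupper(i));   /* standard, locale-aware toupper */
  return 0;
}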

----------------------------
C output:
184 -> 152
184 -> 168

So _toupper and toupper are different functions. MSDN does not mention any difference, except that 'toupper' is "ANSI compatible" :(



File Added: toupper.zip
msg31027 - (view) Author: Ivan Dobrokotov (dobrokot) Date: 2007-01-18 21:59

----------------------------------------------
standard header ctype.h:

#define _toupper(_c)    ( (_c)-'a'+'A' )


----------------------------------------------
CRT file toupper.c:



/* define function-like macro equivalent to _toupper()
 */
#define mkupper(c)  ( (c)-'a'+'A' )



int __cdecl _toupper (
        int c
        )
{
        return(mkupper(c));
}

( http://www.everfall.com/paste/id.php?j13ernl40i9e )

Suggestion: replace _toupper with toupper. Performance may degrade (a lot of thread locks, MultiByteToWideChar calls, and other code for every non-ASCII lowercase symbol). Suggestion for optimization: set up "int toupper_table[256]" (and other tables) on every call to setlocale.
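
The lookup-table idea suggested above can be sketched in Python 3. This is only an illustration, not CPython code: `build_upper_table` and `upper_cp1251` are hypothetical names, and the table is derived from Python's own cp1251 codec rather than from the C runtime's locale data.

```python
def build_upper_table(encoding="cp1251"):
    """Return a 256-byte translation table that uppercases single bytes.

    Hypothetical helper for illustration; it is not part of CPython.
    """
    table = bytearray(range(256))  # start from the identity mapping
    for b in range(256):
        try:
            upper = bytes([b]).decode(encoding).upper().encode(encoding)
        except UnicodeError:
            # The byte is undefined in this code page, or its uppercase
            # form cannot be encoded: leave the byte unchanged.
            continue
        if len(upper) == 1:
            table[b] = upper[0]
    return bytes(table)


# Built once, in the spirit of "rebuild the table on every setlocale call".
CP1251_UPPER = build_upper_table()


def upper_cp1251(data):
    """Uppercase a cp1251-encoded byte string with a single table lookup per byte."""
    return data.translate(CP1251_UPPER)
```

With this table, `upper_cp1251(bytes([184]))` gives `bytes([168])`, the mapping that the failing assertion in yo.py expects.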


msg84614 - (view) Author: Daniel Diniz (ajaksu2) * (Python triager) Date: 2009-03-30 19:04
May be related to issue 1633600.
msg116585 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-09-16 18:01
I've tried to see if this is still an issue but frankly can't make head nor tail out of it :(  Any locale gurus up for this?
msg117452 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-09-27 17:36
The OP is right: str.upper is supposed to be locale-dependent
http://docs.python.org/library/stdtypes.html#str.upper

But the implementation uses _toupper() which is a macro with Visual Studio, and obviously not locale-dependent:

#define _toupper(_Char)    ( (_Char)-'a'+'A' )
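
The arithmetic behind that macro can be checked with a few lines of Python 3. This is a minimal sketch, and `msvc_toupper_macro` is a hypothetical name used only here. Because `'A' - 'a'` is -32, the macro happens to give the right answer for the cp1251 range а-я (224-255 maps to 192-223), but not for ё (184), whose uppercase form Ё is at 168.

```python
def msvc_toupper_macro(c):
    """Mimic `#define _toupper(_c) ((_c)-'a'+'A')` on an integer character code."""
    return c - ord("a") + ord("A")


print(msvc_toupper_macro(ord("q")))  # 81, the code of 'Q'
print(msvc_toupper_macro(255))       # 223: correct for 'я' -> 'Я' in cp1251
print(msvc_toupper_macro(184))       # 152: wrong, 'Ё' is 168 in cp1251
```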
msg185925 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2013-04-03 14:12
Am I correct in saying that fixing this wouldn't help much as there are known issues with locales on Windows, e.g. #10466 ?
msg185926 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-04-03 14:16
No, the issues with locale on Windows have to do with the locale names.  Locale otherwise works fine on windows.
msg185932 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2013-04-03 16:00
With Python 3, .upper() is locale-independent for both unicode and bytes strings.
For serious work with non-ASCII text, Python 3 is strongly recommended anyway, so I suggest closing this issue.
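
A short Python 3 sketch of the behaviour described above. The variable names are illustrative only. `str.upper()` follows Unicode case mappings, while `bytes.upper()` only touches ASCII letters, so cp1251-encoded data has to be decoded before it is uppercased.

```python
text = "\u0451"                 # 'ё' (CYRILLIC SMALL LETTER IO)
raw = text.encode("cp1251")     # single byte 184

# str.upper() uses Unicode case mappings, independent of the locale.
upper_text = text.upper()       # 'Ё' (U+0401)

# bytes.upper() only converts ASCII a-z, so byte 184 is left alone.
upper_raw_naive = raw.upper()

# To uppercase encoded data, decode first and re-encode afterwards.
upper_raw = raw.decode("cp1251").upper().encode("cp1251")
```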
msg199745 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2013-10-13 18:21
I agree that it's better not to touch this in 2.x.
History
Date User Action Args
2022-04-11 14:56:22 admin set github: 44460
2013-10-13 18:21:07 georg.brandl set status: pending -> closed; nosy: + georg.brandl; messages: + msg199745
2013-04-03 16:00:29 amaury.forgeotdarc set status: open -> pending; resolution: wont fix; messages: + msg185932
2013-04-03 14:16:30 r.david.murray set nosy: + r.david.murray; messages: + msg185926
2013-04-03 14:12:37 BreamoreBoy set messages: + msg185925
2012-10-11 13:15:22 serhiy.storchaka set components: + Windows
2012-10-11 13:14:59 serhiy.storchaka set components: + Interpreter Core, - Library (Lib); versions: + Python 2.7, - Python 2.6, Python 3.0
2010-09-27 17:36:29 amaury.forgeotdarc set nosy: + amaury.forgeotdarc; messages: + msg117452; stage: test needed -> needs patch
2010-09-16 18:01:57 BreamoreBoy set nosy: + BreamoreBoy; messages: + msg116585
2009-03-30 19:04:16 ajaksu2 set versions: + Python 2.6, Python 3.0; nosy: + ajaksu2; messages: + msg84614; type: behavior; stage: test needed
2007-01-13 17:30:16 dobrokot create