
classification
Title: iso-latin-1 strings and functions lower & upper
Type: Stage:
Components: None Versions: Python 2.3
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: kowaltowski, scott_daniels
Priority: normal Keywords:

Created on 2004-09-11 21:28 by kowaltowski, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (3)
msg22432 - (view) Author: Tomasz Kowaltowski (kowaltowski) Date: 2004-09-11 21:28
I have no problems in Python in using strings which
contain accented letters (my Emacs has no problems in
producing them using one-byte iso-8859-1 encoding).
However, the functions 'lower' and 'upper' do not work
properly on these letters, as shown below (I hope all
accents appear properly in your browser):

-------------------------------------------------------------
as = "aáàâãä"      # except for the first 'a', all others have accents
AS = "AÁÀÂÃÄ"      # except for the first 'A', all others have accents
print "direct: %s -- %s" % (as, AS)
print "lower:  %s -- %s" % (as.lower(), AS.lower())
print "upper:  %s -- %s" % (as.upper(), AS.upper())
-------------------------------------------------------------

The output is:
--------------------------------------------------------------
direct: aáàâãä -- AÁÀÂÃÄ
lower:  aáàâãä -- aÁÀÂÃÄ
upper:  Aáàâãä -- AÁÀÂÃÄ
--------------------------------------------------------------

i.e., accented letters (byte values above 127) are not converted.
Adding the line

# -*- coding: iso-latin-1 -*-

to declare the encoding, as recommended by PEP 0263, made no difference.

I am not sure whether this is a bug or whether it is
intentional, i.e. these functions work only for pure
ASCII letters. However, it is a major inconvenience for
those who use any language which is not English but
uses the Latin alphabet :-(.

There should be some mechanism to tell these
functions which Latin variant (iso-8859-1, iso-8859-2,
...) is being used, so that they behave properly; e.g.,
an optional second argument?
msg22433 - (view) Author: Scott David Daniels (scott_daniels) * Date: 2004-09-13 20:00

Note: lower and upper are defined only for ASCII on strs,
but they work correctly for unicode.
 uas = u"aáàâãä" # except first 'a', all have accents
 UAS = u"AÁÀÂÃÄ" # except first 'A', all have accents
 print u"direct: %s -- %s" % (uas, UAS)
 print u"lower: %s -- %s" % (uas.lower(), UAS.lower())
 print u"upper: %s -- %s" % (uas.upper(), UAS.upper())

What you are asking is pretty hopeless.  With two 
modules loaded with differing encodings, whose idea of 
"how to uppercase an 8-bit character" should be used?

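To make the ambiguity concrete, here is a minimal sketch. It is written in Python 3 syntax (an assumption; the thread itself targets Python 2.3) and the byte value 0xB1 is chosen purely for illustration. The same byte is a letter under iso-8859-2 but a symbol under iso-8859-1, so its "uppercase" depends entirely on which encoding is assumed.

```python
# One raw byte, interpreted under two single-byte encodings.
raw = b"\xb1"

for coding in ("iso-8859-1", "iso-8859-2"):
    text = raw.decode(coding)               # bytes -> text under this encoding
    upper = text.upper().encode(coding)     # uppercase as text, then back to bytes
    print(coding, repr(text), "->", upper)
```

Under iso-8859-2 the byte decodes to a lowercase letter and maps to a different byte; under iso-8859-1 it decodes to a symbol with no case and is left unchanged.
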
What you might want to use is:
  def codedupper(coding, string):
     return string.decode(coding).upper().encode(coding)
  def codedlower(coding, string):
     return string.decode(coding).lower().encode(coding)
or:
  def latinupper(string):
     return string.decode('latin-1').upper().encode('latin-1')
  def latinlower(string):
     return string.decode('latin-1').lower().encode('latin-1')

Any of these functions is well-defined even with several 
modules of differing encodings loaded.
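
For readers on Python 3, the same decode / change case / encode round trip might look like the sketch below. This is an adaptation, not the code posted above: it assumes the 8-bit data is held in `bytes` objects, and the names `coded_upper` and `coded_lower` are hypothetical.

```python
def coded_upper(coding, data):
    """Uppercase 8-bit data by going through text in the given encoding."""
    return data.decode(coding).upper().encode(coding)


def coded_lower(coding, data):
    """Lowercase 8-bit data by going through text in the given encoding."""
    return data.decode(coding).lower().encode(coding)


sample = "aáàâãä".encode("latin-1")

# bytes.upper() converts ASCII letters only; the accented bytes are untouched.
ascii_only = sample.upper()

# The round trip converts the accented letters as well.
converted = coded_upper("latin-1", sample)
print(converted.decode("latin-1"))
```

One caveat with this approach: a character whose uppercase form does not exist in the target encoding (for example 'ÿ' in latin-1) makes the final `encode` step raise `UnicodeEncodeError`.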
msg22434 - (view) Author: Tomasz Kowaltowski (kowaltowski) Date: 2004-09-14 00:12

I guess you are right from a conceptual point of view. It is
just somewhat frustrating because almost every language
which uses the Latin alphabet needs characters above 127 (is
English the only exception?). On the other hand, 'lower' and
'upper' work for the Unicode (really utf-8) representation, in
which many alphabets do not even have the concept of lower
and upper case!

Your suggestion about 'latinlower' and 'latinupper' is
basically what I asked for, but it is about 10 times slower
than calling 'lower' and 'upper' directly :-(.
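
One possible way to reduce that overhead, offered as a sketch and not as a measured fix, is to pay for the decode/encode work once by precomputing a 256-entry translation table and then applying it with `bytes.translate`. The code below assumes Python 3 and a single-byte encoding; the names `build_case_table`, `latin_upper` and `latin_lower` are hypothetical.

```python
def build_case_table(coding, to_upper):
    """Build a 256-byte table mapping each byte to its case-converted byte.

    A byte is left unchanged when it cannot be decoded, or when its
    converted form is not a single byte in the same encoding.
    """
    table = bytearray(range(256))
    for byte in range(256):
        try:
            char = bytes([byte]).decode(coding)
        except UnicodeDecodeError:
            continue  # byte is undefined in this encoding
        mapped = char.upper() if to_upper else char.lower()
        try:
            encoded = mapped.encode(coding)
        except UnicodeEncodeError:
            continue  # converted form does not exist in this encoding
        if len(encoded) == 1:
            table[byte] = encoded[0]
    return bytes(table)


# Tables are built once, at import time.
LATIN1_UPPER = build_case_table("latin-1", True)
LATIN1_LOWER = build_case_table("latin-1", False)


def latin_upper(data):
    return data.translate(LATIN1_UPPER)


def latin_lower(data):
    return data.translate(LATIN1_LOWER)


print(latin_upper("aáàâãä".encode("latin-1")).decode("latin-1"))
```

The design choice here is to leave a byte alone whenever its converted form cannot be represented as one byte, so 'ß' and 'ÿ' pass through unchanged instead of raising an error or changing the length of the data.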

Thanks anyway -- I guess the matter may be closed.
History
Date                 User         Action  Args
2022-04-11 14:56:07  admin        set     github: 40901
2004-09-11 21:28:34  kowaltowski  create