classification
Title: Default encoding harmful
Type:
Stage:
Components: Unicode
Versions: Python 2.3
process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To: lemburg
Nosy List: hyeshik.chang, lemburg, tgoeller
Priority: normal
Keywords:

Created on 2004-04-03 16:00 by tgoeller, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (3)
msg20422 - Author: Toni Goeller (tgoeller) Date: 2004-04-03 16:00
Although I am a newbie to Python and seem to be the
only one in the world who is unhappy about it (at least
I didn't find open bugs against it):

Setting the default encoding to 'ascii' and the error
handling to 'strict' is the worst choice to make.

The following rules to ease the creation of stable
programs are violated:

Rule 1:
str() should *NEVER* raise exceptions. It is meant to
give you a readable representation of your data. Period.
It is ok to return "You fool! Why do you have non-ascii
characters in your Unicode string when ascii is your
default encoding?", but it is *NOT* ok to return an
exception!

Rule 2:
Avoid data-driven exceptions wherever possible.
In most cases, the programmer absolutely does not want
an exception for half of the possible byte values (and
this is the less-tested half in English-speaking
countries). The default should enable less experienced
programmers to write reasonably stable programs, while
non-default settings should let experts fine-tune
program behaviour.


This gets worse for Python's choice to make it
extremely hard to set the default encoding after the
sitecustomize.py file was run (if it is run at all).

No other programming language I know is as hard to use
in this respect as Python.
I know this is a defined feature, not a bug, but it is
a bad and dangerous design decision nonetheless.
msg20423 - Author: Hyeshik Chang (hyeshik.chang) (Python committer) Date: 2004-04-03 16:53

> str() should *NEVER* raise exceptions. It is meant to
> give you a readable representation of your data.

If you just want a string representation, use repr(). str()
has never guaranteed that it won't raise exceptions.
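To illustrate the suggestion: in Python 2, repr() of a unicode object escapes non-ASCII characters as \x, \u or \U sequences, so it never needs the default encoding. In Python 3, repr() keeps printable non-ASCII characters as they are, and the built-in ascii() produces the escaped form instead. A short Python 3 sketch, using an arbitrary sample string:

```python
text = "caf\u00e9"  # arbitrary sample containing one non-ASCII character

# Python 3's repr() keeps printable non-ASCII characters unescaped.
shown = repr(text)

# ascii() escapes them, so the result is pure ASCII and encoding it
# with the strict 'ascii' codec cannot fail.
safe = ascii(text)
print(safe)                  # prints: 'caf\xe9'
print(safe.encode("ascii"))  # prints: b"'caf\\xe9'"
```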

> It is ok to return "You fool! Why do you have non-ascii
> characters in your Unicode string when ascii is your
> default encoding?", but it is *NOT* ok to return an
> exception!

What alternative remains, other than an exception?

> This gets worse for Python's choice to make it
> extremely hard to set the default encoding after the
> sitecustomize.py file was run (if it is run at all).

I agree that sitecustomize.py and the "if 0:" blocks that
control the encoding settings are somewhat newbie-unfriendly.
Do you have any good ideas for improving this situation?
msg20424 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2004-04-03 17:13

There is a simple solution to both of your problems: always
use Unicode for text data, and use proper encodings
when doing text I/O.
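A minimal sketch of that advice in Python 3, where str is the Unicode text type: decode when reading and encode when writing, each time with an explicitly named codec, and keep the text as str in between. UTF-8 is an assumed choice of encoding, and the helpers `save_text` and `load_text` are hypothetical; in Python 2.3 the same pattern would use unicode objects with codecs.open().

```python
import os
import tempfile


def save_text(path, text, encoding="utf-8"):
    # Encode at the output boundary with an explicitly chosen codec.
    with open(path, "w", encoding=encoding) as f:
        f.write(text)


def load_text(path, encoding="utf-8"):
    # Decode at the input boundary; inside the program, work only with str.
    with open(path, "r", encoding=encoding) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.txt")
    save_text(path, "caf\u00e9")
    assert load_text(path) == "caf\u00e9"
    with open(path, "rb") as f:
        raw = f.read()
    print(raw)  # prints: b'caf\xc3\xa9' (the UTF-8 bytes actually stored)
```

Reading with the wrong codec does not raise here; it silently produces different text, which is why the encoding is named at every boundary rather than left to a default.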

The intention of the strict Python defaults is to teach
programmers to write correct programs rather than
get away with bad design.

Closing this as won't fix.
History
Date                 User      Action  Args
2022-04-11 14:56:03  admin     set     github: 40118
2004-04-03 16:00:56  tgoeller  create