
classification
Title: Setting defaultencoding through an environment variable
Type: enhancement Stage:
Components: Library (Lib) Versions:
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: diedrich, doerwalter, loewis
Priority: low Keywords:

Created on 2004-10-22 10:11 by diedrich, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (14)
msg54288 - (view) Author: Diedrich Vorberg (diedrich) Date: 2004-10-22 10:11
Hello, I'd love to be able to set Python's default
encoding on a per-instance basis through an environment
variable like

  PYTHON_DEFAULT_ENCODING

or something. That would require a trivial modification
to site.py. 

Diedrich
msg54289 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2004-10-23 08:32
Why do you want this change? In the long run, Python will
drop the notion of a default encoding; code should be
changed to avoid relying on the default encoding.
msg54290 - (view) Author: Diedrich Vorberg (diedrich) Date: 2004-10-23 09:08
Even if the notion of default encoding were dropped from
Python, my application would need a global variable to hold
the one encoding that is used most of the time. That's what I'm
using the default encoding for right now. Also, at least one
of the 3rd-party modules I'm using does the same and it makes
tons of sense. Changing the default encoding from within my
application doesn't work, because sys.setdefaultencoding()
gets removed (which makes sense, too, of course). So I need
a custom sitecustomize.py for every project, which may or may
not be picked up, depending on the PYTHONPATH variable.
That's why a PYTHON_DEFAULT_ENCODING variable would be very
useful to me.
msg54291 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2004-10-23 09:52
If your application just needs a global setting, it can come
up with its own environment variable, right? Just set
MY_DEFAULT_ENCODING, and do

import os
encoding = os.environ["MY_DEFAULT_ENCODING"]

Alternatively, try using locale.getpreferredencoding(); this
may let you avoid an additional environment variable
altogether.
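
The two suggestions above can be combined. What follows is only a minimal sketch in present-day Python 3 (not the Python 2 of this thread). The variable name MY_DEFAULT_ENCODING comes from the message; the helper name get_app_encoding and its parameters are invented here for illustration.

```python
import codecs
import locale
import os


def get_app_encoding(environ=None, var="MY_DEFAULT_ENCODING"):
    """Return the application-wide encoding (hypothetical helper).

    Prefers the environment variable; falls back to the locale's
    preferred encoding when the variable is unset or empty.
    """
    environ = os.environ if environ is None else environ
    name = environ.get(var) or locale.getpreferredencoding(False)
    # codecs.lookup() raises LookupError for unknown names, so a typo in
    # the variable fails at startup rather than at the first decode.
    return codecs.lookup(name).name


if __name__ == "__main__":
    print(get_app_encoding())
```

Validating the name once at startup is a design choice of this sketch; an application could equally well store the raw string and let the first decode fail.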
msg54292 - (view) Author: Diedrich Vorberg (diedrich) Date: 2004-10-23 10:05
Yes, you are right. I could also replace the global
unicode() function with my own version that adheres to my
global encoding settings. But that doesn't seem the right
thing to do. Anyway, this is not so much a technical
decision but one that touches design. 

Anyway, it's just a small thing I thought useful. If the
default encoding goes away altogether, I'll have to make
something up on my own anyway.

Thanks for your comments!
msg54293 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2004-10-23 10:09
I take it that you are withdrawing your feature request,
then? Closing it as rejected; if you still think this should
be done, please indicate so. It would be good if you then
could give some reasoning beyond "it would be useful".
msg54294 - (view) Author: Diedrich Vorberg (diedrich) Date: 2004-10-23 10:40
Well... what more than useful is it supposed to be?
Earth-shaking? Spiritually rewarding? ;-) As long as there is
a default encoding I'd like to be able to set it in a
simpler way than I can now. And I'd like to be able to set
it once, at interpreter startup. An environment variable
seems like a reasonable way to do that.

If the notion of default encoding is removed from Python's
library the suggested modification is going to go away as well. 

If you want to reject the feature request because the notion
of default encoding is deprecated, then so be it. From my
point of view the suggested modification is something I'd
like to see.
msg54295 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2004-10-23 10:55
No, I'm asking *why* you consider it useful. For example

a) here is this and that problem, and there is absolutely no
way to express this in Python today. If that feature is
added, I can write it in the following way.

b) here is this and that problem. Today, I have to write it
in this way. With the feature added, I can write it in that
way. That way is better than this way because of such and
such reason.

IOW, I would like to see what you are using the default
encoding *for* that makes it desirable to set it through an
environment variable.
msg54296 - (view) Author: Diedrich Vorberg (diedrich) Date: 2004-10-23 11:30
Ah.. ok.. some examples:

I'm writing web applications using CGI or FastCGI. The
webserver will pass user input as plain strings, but within
my program I want to use plain strings as little as possible
to avoid all kinds of problems with diacritical characters
(foremost umlauts and the EUR symbol). The websites the CGI
scripts are used with usually have one charset for all
pages. But some sites use iso-8859-1 and some use utf-8, and
I don't want to modify my scripts to fit the charset of the
site. Rather, I'd like to use Apache's SetEnv directive to
tell my scripts what charset their input is going to be in.
Yes, I know, there are other ways to handle this :-)
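
One of those other ways could look roughly like the following. This is only a sketch in present-day Python 3, assuming the web server exports the site charset in a variable (for example through Apache's SetEnv). The variable name SITE_CHARSET, the helper name decode_form_value and the utf-8 fallback are all invented for illustration.

```python
import os


def decode_form_value(raw, environ=None, var="SITE_CHARSET", default="utf-8"):
    """Decode one raw form value (bytes) using the per-site charset.

    `var` is a hypothetical variable name; the server configuration
    would have to set it for each site.
    """
    environ = os.environ if environ is None else environ
    charset = environ.get(var, default)
    return raw.decode(charset)


# The same script serves sites with different charsets:
latin1_site = {"SITE_CHARSET": "iso-8859-1"}
utf8_site = {"SITE_CHARSET": "utf-8"}
name_a = decode_form_value(b"M\xfcller", latin1_site)
name_b = decode_form_value(b"M\xc3\xbcller", utf8_site)
```

Both calls yield the same text, so the rest of the program never needs to know which charset a given site uses.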

I keep using XIST for html output
(http://www.livinglogic.de/Python). XIST tries to avoid
using the default encoding, probably in the light of it
being deprecated. But it does not avoid it completely. There
are a number of subtle cases in which it still depends on it,
and in all these cases I run into run-time errors which can
be avoided in a snap by setting the default encoding to what
my actual default encoding is.
I use XIST with Zope and FastCGI, running several instances
of the same interpreter on the same machine, with different
default encodings each (mostly utf-8, but some iso-8859-1,
which I can't just convert).

My own OSS project orm (http://t4w.de/orm) carefully
separates the database's and the application's charset but
uses Python's default encoding as the default for each and
every setting. Those settings need to be set explicitly if
the default encoding does not match the encoding actually
used, cluttering the source code and making maintenance more
difficult. I use orm in connection with all the above cases.
msg54297 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2004-10-23 11:50
I see. For the XIST case, it probably would be best to
correct XIST, to always use explicit encodings.
Alternatively, would it be possible to pass byte strings
instead of Unicode objects to XIST in the cases which XIST
doesn't handle correctly? If not, why not?

If you cannot fix XIST, here is a work-around:

import sys
reload(sys)
sys.setdefaultencoding("desired encoding")

This is somewhat of a hack, but using a hack is ok if you
are looking for a work-around for a problem that really
should be fixed in the long run, anyway.

I don't understand the orm example. What is "each and every
setting"? Grepping the source of orm 1.0.1, the word
"setting" occurs only in two places, both apparently
irrelevant. If you are talking about some sort of
configuration file - wouldn't it be good to create a library
for the configuration file, and make only that library be
aware of the encoding of the configuration file?
msg54298 - (view) Author: Diedrich Vorberg (diedrich) Date: 2004-10-23 12:06
The reload(sys) bit is a good idea, thanks.

Anyway, I can't fix XIST, because it is complicated as heck
and I don't really have time to do that. Also, I'm not sure
if this is really due to a fault in XIST or some side-effect
I can't fathom.

ORM is not touched by the encoding problem itself, it's in
the data model modules. Each and every Unicode column needs
an appEncoding= parameter on definition. That's not a
technical problem, though, it's just plain ugly. (This might
go away in the mid-run, because I'll re-implement parts of
orm and I can put unicode support on the todo list. But
until then...;-)

msg54299 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2004-10-23 12:25
If all you want is to specify the appEncoding through an
environment variable: this should be really easy. Instead of

class Unicode(varchar):
    def __init__(self, columnName=None,
                 appEncoding="iso-8859-1"):
        datatype.__init__(self, columnName)
        self._appEncoding = appEncoding

do this:

import os

class Unicode(varchar):
    def __init__(self, columnName=None,
                 appEncoding=os.environ.get("ORM_ENCODING", "iso-8859-1")):
        datatype.__init__(self, columnName)
        self._appEncoding = appEncoding

Then set ORM_ENCODING as you'd like. No other changes to
your source are required.
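
One property of this pattern is worth spelling out: Python evaluates a default argument once, when the def statement runs, so ORM_ENCODING has to be set before the module defining the class is imported. A self-contained sketch in Python 3 follows. The Column class is a hypothetical stand-in for orm's real base classes, and the os.environ assignments only simulate what the shell or the web server would do.

```python
import os

# Simulates the variable being set before the module is imported.
os.environ["ORM_ENCODING"] = "utf-8"


class Column:
    """Hypothetical stand-in for orm's column base class."""

    def __init__(self, columnName=None):
        self.columnName = columnName


class Unicode(Column):
    # The default below is computed once, when this def statement runs.
    def __init__(self, columnName=None,
                 appEncoding=os.environ.get("ORM_ENCODING", "iso-8859-1")):
        Column.__init__(self, columnName)
        self._appEncoding = appEncoding


first = Unicode("name")

# Changing the variable after the class exists has no effect on the default.
os.environ["ORM_ENCODING"] = "iso-8859-1"
second = Unicode("name")

# An explicit argument still overrides the environment-derived default.
explicit = Unicode("name", appEncoding="iso-8859-15")
```

For a setting that is fixed at interpreter startup, which is the use case described in this thread, the one-time evaluation is exactly the desired behaviour.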
msg54300 - (view) Author: Diedrich Vorberg (diedrich) Date: 2004-10-23 12:41
Yes, that's a way to go.
msg54301 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2004-10-25 10:33
There's only one spot in XIST where the default encoding is
relevant: when you pass a str object to one of the node
constructors. This str object is converted to unicode using
the default encoding. So html.p("ö") won't work unless you
change the default encoding, but html.p(unicode("ö", "iso-8859-1"))
will, and so will html.p(u"ö") with a proper PEP 263
encoding declaration at the beginning of the file. If this
doesn't fix your problem, you can mail me directly.
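
The failure described above can be restated in Python 3 terms, where the implicit conversion no longer exists and decoding is always explicit. This small sketch assumes the input bytes are iso-8859-1 and uses ASCII to stand in for Python 2's stock default encoding. It does not involve XIST itself.

```python
raw = b"\xf6"  # the letter "ö" encoded as iso-8859-1

# Explicit decoding with the correct charset works.
text = raw.decode("iso-8859-1")

# Decoding as ASCII, which is what the implicit conversion amounted to
# with an unchanged default encoding, fails on the non-ASCII byte.
try:
    raw.decode("ascii")
    implicit_ok = True
except UnicodeDecodeError:
    implicit_ok = False
```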
History
Date                 User              Action  Args
2022-04-11 14:56:07  admin             set     github: 41061
2008-01-05 18:25:02  christian.heimes  set     status: open -> closed
2004-10-22 10:11:14  diedrich          create