Issue 1052098: Setting defaultencoding through an environment variable

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/41061

classification

Title:	Setting defaultencoding through an environment variable
Type:	enhancement	Stage:
Components:	Library (Lib)	Versions:

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:		Nosy List:	diedrich, doerwalter, loewis
Priority:	low	Keywords:

Created on 2004-10-22 10:11 by diedrich, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (14)
msg54288	Author: Diedrich Vorberg (diedrich)	Date: 2004-10-22 10:11
Hello, I'd love to be able to set Python's default encoding on a per-instance basis through an environment variable such as PYTHON_DEFAULT_ENCODING. That would require only a trivial modification to site.py.
Diedrich
msg54289	Author: Martin v. Löwis (loewis) *	Date: 2004-10-23 08:32
Why do you want this change? In the long run, Python will drop the notion of a default encoding; code should be changed to avoid relying on the default encoding.
msg54290	Author: Diedrich Vorberg (diedrich)	Date: 2004-10-23 09:08
Even if the notion of a default encoding were dropped from Python, my application would still need a global variable holding the one encoding that is used most of the time. That's what I'm using the default encoding for right now. Also, at least one of the third-party modules I'm using does the same, and it makes tons of sense. Changing the default encoding from within my application doesn't work, because sys.setdefaultencoding() gets removed (which makes sense, too, of course). So I need a custom sitecustomize.py for every project, which may or may not be picked up, depending on the PYTHONPATH variable. That's why a PYTHON_DEFAULT_ENCODING variable would be very useful to me.
msg54291	Author: Martin v. Löwis (loewis) *	Date: 2004-10-23 09:52
If your application just needs a global setting, it can come up with its own environment variable, right? Just set MY_DEFAULT_ENCODING, and do

    encoding = os.environ["MY_DEFAULT_ENCODING"]

Alternatively, try using locale.getpreferredencoding(); this may let you avoid an additional environment variable altogether.
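The two suggestions above can be combined. The following is a minimal sketch, assuming Python 3 (the thread predates it). The variable name MY_DEFAULT_ENCODING comes from the message; the helper name `app_encoding` is hypothetical. It falls back to locale.getpreferredencoding() when the variable is unset, and checks the name with codecs.lookup() so that a misspelled encoding fails at startup rather than at the first conversion.

```python
import codecs
import locale
import os


def app_encoding(environ=os.environ):
    """Return the application's encoding name (hypothetical helper)."""
    name = environ.get("MY_DEFAULT_ENCODING")
    if not name:
        # Variable unset or empty: use the locale's preferred encoding.
        name = locale.getpreferredencoding(False)
    # codecs.lookup() raises LookupError for unknown encodings and
    # returns a CodecInfo whose .name is the normalized codec name.
    return codecs.lookup(name).name
```

The application would call `app_encoding()` once at startup and pass the result explicitly to every encode/decode call, instead of relying on the interpreter-wide default.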
msg54292	Author: Diedrich Vorberg (diedrich)	Date: 2004-10-23 10:05
Yes, you are right. I could also replace the global unicode() function with my own version that adheres to my global encoding settings. But that doesn't seem the right thing to do. This is not so much a technical decision as one that touches design. Anyway, it's just a small thing I thought useful. If the default encoding goes away altogether, I'll have to make something up on my own anyway. Thanks for your comments!
msg54293	Author: Martin v. Löwis (loewis) *	Date: 2004-10-23 10:09
I take it that you are withdrawing your feature request, then? Closing it as rejected; if you still think this should be done, please indicate so. It would be good if you could then give some reasoning beyond "it would be useful".
msg54294	Author: Diedrich Vorberg (diedrich)	Date: 2004-10-23 10:40
Well... what more than useful is it supposed to be? Earth-shaking? Spiritually rewarding? ;-) As long as there is a default encoding, I'd like to be able to set it in a simpler way than I can now. And I'd like to be able to set it once, at interpreter startup. An environment variable seems like a reasonable way to do that. If the notion of a default encoding is removed from Python's library, the suggested modification is going to go away as well. If you want to reject the feature request because the notion of a default encoding is deprecated, then so be it. From my point of view the suggested modification is something I'd like to see.
msg54295	Author: Martin v. Löwis (loewis) *	Date: 2004-10-23 10:55
No, I'm asking why you consider it useful. For example:

a) Here is this and that problem, and there is absolutely no way to express this in Python today. If that feature is added, I can write it in the following way.
b) Here is this and that problem. Today, I have to write it in this way. With the feature added, I can write it in that way. That way is better than this way because of such and such reason.

In other words, I would like to see what you are using the default encoding for that makes it desirable to set it through an environment variable.
msg54296	Author: Diedrich Vorberg (diedrich)	Date: 2004-10-23 11:30
Ah.. ok.. some examples:

I'm writing web applications using CGI or FastCGI. The webserver will pass user input as plain strings, but within my program I want to use plain strings as little as possible to avoid all kinds of problems with diacritical characters (foremost umlauts and the EUR symbol). The websites the CGI scripts are used with usually have one charset for all pages. But some sites use iso-8859-1 and some use utf-8, and I don't want to modify my scripts to fit the charset of the site. Rather, I'd like to use Apache's SetEnv directive to tell my scripts what charset their input is going to be in. Yes, I know, there are other ways to handle this :-)

I keep using XIST for HTML output (http://www.livinglogic.de/Python). XIST tries to avoid using the default encoding, probably in the light of it being deprecated. But it does not avoid it completely. There are a number of subtle cases in which it still depends on it, and in all these cases I run into run-time errors which can be avoided in a snap by setting the default encoding to what my actual default encoding is. I use XIST with Zope and FastCGI, running several instances of the same interpreter on the same machine, each with a different default encoding (mostly utf-8, but some iso-8859-1, which I can't just convert).

My own OSS project orm (http://t4w.de/orm) carefully separates the database's and the application's charset but uses Python's default encoding as the default for each and every setting. Those settings need to be set explicitly if the default encoding does not match the encoding actually used, cluttering the source code and making maintenance more difficult. I use orm in connection with all the above cases.
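The CGI case can be handled without touching the interpreter's default encoding by decoding input once, at the boundary. This is a minimal sketch in Python 3, assuming the site's charset is exported with something like `SetEnv SITE_CHARSET iso-8859-1` in the Apache configuration; the variable name SITE_CHARSET and the helper `decode_input` are hypothetical and not part of the thread.

```python
import os


def decode_input(raw, environ=os.environ):
    """Decode raw request bytes using the site's configured charset."""
    # SITE_CHARSET is a hypothetical, application-specific variable;
    # "utf-8" is an assumed fallback for sites that do not set it.
    charset = environ.get("SITE_CHARSET", "utf-8")
    return raw.decode(charset)


# The same script serving two differently configured sites:
latin1_site = {"SITE_CHARSET": "iso-8859-1"}
utf8_site = {"SITE_CHARSET": "utf-8"}

a = decode_input(b"M\xfcnchen", latin1_site)
b = decode_input(b"M\xc3\xbcnchen", utf8_site)
# a and b are now the same text, although the input bytes differed.
```

Everything after this point in the script works with text only, so the per-site difference stays confined to one function.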
msg54297	Author: Martin v. Löwis (loewis) *	Date: 2004-10-23 11:50
I see. For the XIST case, it probably would be best to correct XIST to always use explicit encodings. Alternatively, would it be possible to pass byte strings instead of Unicode objects to XIST in the cases which XIST doesn't handle correctly? If not, why not?

If you cannot fix XIST, here is a work-around:

    import sys
    reload(sys)  # brings back setdefaultencoding, which site.py removes
    sys.setdefaultencoding("desired encoding")

This is somewhat of a hack, but using a hack is ok if you are looking for a work-around for a problem that really should be fixed in the long run anyway.

I don't understand the orm example. What is "each and every setting"? Grepping the source of orm 1.0.1, the word "setting" occurs in only two places, both apparently irrelevant. If you are talking about some sort of configuration file, wouldn't it be good to create a library for the configuration file, and make only that library aware of the encoding of the configuration file?
msg54298	Author: Diedrich Vorberg (diedrich)	Date: 2004-10-23 12:06
The reload(sys) bit is a good idea, thanks. Anyway, I can't fix XIST, because it is complicated as heck and I don't really have time to do that. Also, I'm not sure if this is really due to a fault in XIST or some side effect I can't fathom.

ORM is not touched by the encoding problem itself; it's in the data model modules. Each and every Unicode column needs an appEncoding= parameter on definition. That's not a technical problem, though, it's just plain ugly. (This might go away in the mid-run, because I'll re-implement parts of orm and I can put unicode support on the todo list. But until then... ;-)
msg54299	Author: Martin v. Löwis (loewis) *	Date: 2004-10-23 12:25
If all you want is to specify the appEncoding through an environment variable, this should be really easy. Instead of

    class Unicode(varchar):
        def __init__(self, columnName=None, appEncoding="iso-8859-1"):
            datatype.__init__(self, columnName)
            self._appEncoding = appEncoding

do this instead:

    import os

    class Unicode(varchar):
        def __init__(self, columnName=None,
                     appEncoding=os.environ.get("ORM_ENCODING", "iso-8859-1")):
            datatype.__init__(self, columnName)
            self._appEncoding = appEncoding

Then set ORM_ENCODING as you'd like. No other changes to your source are required.
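One detail worth knowing about the suggestion above: a default argument is evaluated once, when the `def` statement runs, so ORM_ENCODING has to be set before the module is imported. With a variable set by the webserver before the interpreter starts, that is the case. Where the variable may change later, a variant that reads it on each call avoids the surprise. The sketch below uses a simplified stand-in class, not orm's actual code, and assumes Python 3.

```python
import os


class Unicode:
    """Simplified stand-in for orm's Unicode column type."""

    def __init__(self, columnName=None, appEncoding=None):
        # Look up ORM_ENCODING when the column is created rather than
        # when the class is defined, so later changes are picked up.
        if appEncoding is None:
            appEncoding = os.environ.get("ORM_ENCODING", "iso-8859-1")
        self.columnName = columnName
        self._appEncoding = appEncoding
```

An explicit `appEncoding=` argument still overrides the environment, so existing column definitions keep working unchanged.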
msg54300	Author: Diedrich Vorberg (diedrich)	Date: 2004-10-23 12:41
Yes, that's a way to go.
msg54301	Author: Walter Dörwald (doerwalter) *	Date: 2004-10-25 10:33
There's only one spot in XIST where the default encoding is relevant: when you pass a str object to one of the node constructors. This str object is converted to unicode using the default encoding. So html.p("ö") won't work unless you change the default encoding, but html.p(unicode("ö", "iso-8859-1")) will, and so will html.p(u"ö") with a proper PEP 263 encoding declaration at the beginning of the file. If this doesn't fix your problem, you can mail me directly.
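The failure mode described here can be reproduced outside XIST. The following Python 3 sketch uses a hypothetical `to_text` function as a stand-in for the conversion a node constructor performs; it is not XIST's code. "ascii" is used as the default because that was the usual default encoding of the Python 2 interpreters discussed in this thread.

```python
def to_text(value, default_encoding="ascii"):
    """Convert byte strings to text the way an implicit default would."""
    if isinstance(value, bytes):
        return value.decode(default_encoding)
    return value


raw = "\u00f6".encode("iso-8859-1")  # the single byte b"\xf6" ("ö")

# Relying on the default fails for non-ASCII input...
try:
    to_text(raw)
    implicit_ok = True
except UnicodeDecodeError:
    implicit_ok = False

# ...while decoding explicitly before the call always works.
explicit = to_text(raw.decode("iso-8859-1"))
```

Decoding at the call site, as in the last line, corresponds to the `unicode("ö", "iso-8859-1")` form recommended in the message.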

History
Date	User	Action	Args
2022-04-11 14:56:07	admin	set	github: 41061
2008-01-05 18:25:02	christian.heimes	set	status: open -> closed
2004-10-22 10:11:14	diedrich	create