classification
Title: Add unicode for sys.argv, os.environ, os.system
Type:
Stage:
Components: Extension Modules
Versions: Python 2.5

process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: lemburg, loewis, nyamatongwe
Priority: normal
Keywords: patch

Created on 2005-07-02 01:55 by nyamatongwe, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name | Uploaded | Description
diffs.txt | nyamatongwe, 2005-07-02 01:55 | Differences from current CVS containing code and doc
diffu.txt | nyamatongwe, 2005-07-05 04:25 |
diffu.txt | nyamatongwe, 2005-07-05 04:27 | Diff that also handles -c, -O, and -E.
Messages (13)
msg48550 - Author: Neil Hodgson (nyamatongwe) Date: 2005-07-02 01:55
Most installations of Windows (2000, XP) are unicode
native with narrow character APIs only providing a
distorted view of the system. Python does not currently
provide access to some basic features through wide
character calls and so may see distorted values. This
patch adds unicode compatibility for sys.argv,
os.environ, and os.system. os.system accepts a unicode
argument in the same way as described in PEP 277 for
file APIs. For sys.argv and os.environ, new parallel
unicode attributes sys.argvu and os.environu are added
as it would cause too many problems to use unicode
values for the existing attributes or to use unicode
only for non-ASCII values. The features are only
enabled on unicode native versions of Windows.
The three features are demonstrated at
http://www.scintilla.org/pyunicode.png
The patch contains some documentation additions for
sys.argvu and os.environu. 
There are no test cases as test cases involving running
extra processes can be messy and fail for uninteresting
reasons.
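A rough illustration of the "distorted view" mentioned above, not taken from the patch: a narrow (code page) API can only represent characters that exist in the active code page. The sketch uses Python 3 codecs to mimic that loss. The choice of cp1252 and the file name are assumptions for the example, and real Windows narrow APIs may substitute best-fit characters rather than "?".

```python
# Illustration only: mimic what a narrow (code page) API does to a name
# containing characters outside the active code page.
# Assumption: cp1252 stands in for the system's ANSI code page.

def narrow_view(text, codepage="cp1252"):
    """Round-trip text through a code page, replacing unrepresentable characters."""
    return text.encode(codepage, errors="replace").decode(codepage)


original = "\u03a9mega.txt"      # starts with GREEK CAPITAL LETTER OMEGA, absent from cp1252
distorted = narrow_view(original)

print(original == distorted)     # the name did not survive the round trip
print(distorted)                 # the omega has been replaced by "?"
```

A wide-character call would hand the original name back unchanged, which is the motivation for the proposed Unicode variants.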
msg48551 - Author: Neil Hodgson (nyamatongwe) Date: 2005-07-05 04:25
There are problems in sys.argvu because the current argument
processing code removes the option arguments that Python
itself processes. This can almost be fixed by storing the
last argc elements in sys.argvu. However, when using
[-c command], the command is removed from sys.argv, so that
Python code can tell from sys.argv[0] whether it is running
a command-line command ("-c") or a file (the name of the
file).
Attached patch fixes these problems.
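The argument handling described above can be observed from a child interpreter. This is a small sketch against a current CPython, not the patched 2.5 code; the helper name argv_seen_by_child is made up for the example.

```python
# Show what sys.argv looks like under "-c": interpreter options and the
# command text are consumed, and sys.argv[0] is the literal string "-c".
import json
import subprocess
import sys


def argv_seen_by_child(script_args, interpreter_options=()):
    """Run a child interpreter with -c and return the sys.argv it reports."""
    child_code = "import sys, json; print(json.dumps(sys.argv))"
    command = [sys.executable, *interpreter_options, "-c", child_code, *script_args]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


print(argv_seen_by_child(["first", "second"]))
print(argv_seen_by_child(["first"], interpreter_options=["-O", "-E"]))
```

In both runs the list starts with "-c" followed only by the trailing arguments, which is why copying the raw process arguments into a parallel list needs special handling for -c, -O, and -E.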
msg48552 - Author: Neil Hodgson (nyamatongwe) Date: 2005-07-05 04:27
Added a description to diff file.
msg48553 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2005-07-11 16:42
For os.environ, I think I would prefer a solution where
Unicode keys result in Unicode values and string keys result
in string values, with the canonical conversion through
"mbcs" in place.

For argv, I agree something should be done, but I'm not
certain that the introduction of argvu is the best thing to
do; this should be discussed on python-dev, and with all
people originally involved in PEP 277.

The change to system() is not mentioned at all in your
message. It doesn't seem to belong in this patch, either,
so please submit it as a separate patch. If system() is
changed to support Unicode commands, I think spawn*() should
be changed as well. These seem less debatable, as they come
as natural extensions to PEP 277 (i.e. pass Unicode through
to the system).
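A minimal sketch of the os.environ suggestion above, where the type of the key decides the type of the value. It is written for Python 3, so bytes and str stand in for the Python 2.5 str and unicode types. The class name TypePreservingEnviron is hypothetical, and because the "mbcs" codec exists only on Windows, the encoding is a parameter that defaults to latin-1 here purely as an assumption.

```python
# Sketch: a mapping whose lookups return the same string type as the key.
from collections import UserDict


class TypePreservingEnviron(UserDict):
    """Hypothetical: str keys give str values, bytes keys give bytes values."""

    def __init__(self, initial=None, encoding="latin-1"):
        self.encoding = encoding          # must be set before update() runs
        super().__init__(initial or {})

    def _to_text(self, value):
        return value.decode(self.encoding) if isinstance(value, bytes) else value

    def __setitem__(self, key, value):
        # Everything is stored as text internally.
        self.data[self._to_text(key)] = self._to_text(value)

    def __contains__(self, key):
        return self._to_text(key) in self.data

    def __getitem__(self, key):
        value = self.data[self._to_text(key)]
        # Convert back to the caller's key type on the way out; this raises
        # UnicodeEncodeError if the value does not fit the narrow encoding.
        return value.encode(self.encoding) if isinstance(key, bytes) else value


env = TypePreservingEnviron({"NAME": "caf\u00e9"})
print(type(env["NAME"]).__name__, type(env[b"NAME"]).__name__)
```

The same entry is visible under both key types, so callers that only ever use narrow strings never see a wide value.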
msg48554 - Author: Neil Hodgson (nyamatongwe) Date: 2005-07-14 10:35
os.environ is a dictionary and unicode keys can not be
discerned from string keys.
For sys.argv it appears that there is no support for the
"parallel universe" approach with sys.argvu and I expect one
of the "promotion" models will be chosen.
The patch should be rejected (or parked?) until consensus
emerges.
os.system was only included to allow testing but I saw
difficulties in writing robust unit tests for these features
so didn't include any.
msg48555 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2005-07-14 18:10
os.environ is not a dictionary, it is a
UserDict.IterableUserDict. Discerning strings from Unicode
objects would well be possible. As you are not willing to
discuss the issues on python-dev, I'm rejecting the patch.
msg48556 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2005-07-14 18:33
Just as a data point: the idea of using the type of a
dictionary key to determine the resulting return type is a
really bad design idea - just like the idea to let functions
determine their return type based on the types of their
input parameters.

These things should always be made explicit, e.g.
os.environ.get_unicode(), sys.argv.get_unicode() etc.

However, as the discussion on python-dev shows, we may not
need this kind of approach at all.

Cheers, Marc-Andre.
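A minimal sketch of the explicit alternative named above, in which the caller states which type it wants. Only the method name get_unicode comes from the message; the class ExplicitEnviron, the method get_bytes, and the latin-1 default encoding are assumptions made for this example (Python 3, with bytes standing in for the narrow string type).

```python
# Sketch: explicit accessors instead of inferring the result type from the key.
class ExplicitEnviron:
    """Hypothetical wrapper around a narrow (bytes to bytes) environment."""

    def __init__(self, raw, encoding="latin-1"):
        self._raw = dict(raw)
        self._encoding = encoding

    def get_bytes(self, name, default=None):
        """Return the value exactly as the narrow API delivered it."""
        return self._raw.get(name.encode(self._encoding), default)

    def get_unicode(self, name, default=None):
        """Return the value decoded to text, or default if it is missing."""
        value = self._raw.get(name.encode(self._encoding))
        return default if value is None else value.decode(self._encoding)


environ = ExplicitEnviron({b"NAME": b"caf\xe9"})
print(type(environ.get_bytes("NAME")).__name__)
print(type(environ.get_unicode("NAME")).__name__)
```

Here the return type depends only on which method is called, never on the type of the argument.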
msg48557 - Author: Neil Hodgson (nyamatongwe) Date: 2005-07-15 00:17
I thought that posixmodule.c was creating os.environ but now
see the code in os.py.
"As you are not willing to discuss the issues on
python-dev". Eh? I thought that was what I was doing in the
"Adding the 'path' module" thread.
You can reject the patch due to the discussion on python-dev
but I don't think the given reason is valid.
msg48558 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2005-07-15 05:17
Sorry, I missed the discussion; reopening.
msg48559 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2005-08-09 15:08
I think the discussion came to the following conclusion:
environu should not be added; instead, os.environ should
have Unicode where necessary (i.e. non-ASCII). I guess this
applies both to keys and to values.

Are you interested in revising the patch in this direction?
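A minimal sketch of the "Unicode only where necessary" model summarised above, using a plain non-ASCII test. It is written for Python 3, with bytes standing in for the Python 2.5 narrow str type; the function names narrow_if_ascii and mixed_environ are hypothetical.

```python
# Sketch: keep ASCII-only entries narrow, leave everything else as Unicode text.
def narrow_if_ascii(text):
    """Return bytes for ASCII-only text, otherwise return the text unchanged."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        return text


def mixed_environ(wide_env):
    """Apply the rule to both keys and values of a text-only mapping."""
    return {narrow_if_ascii(k): narrow_if_ascii(v) for k, v in wide_env.items()}


result = mixed_environ({"PATH": "/usr/bin", "NAME": "caf\u00e9"})
for key, value in result.items():
    print(type(key).__name__, type(value).__name__)
```

The resulting mapping mixes both string types, so any code that consumes it has to be prepared for either one.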
msg48560 - Author: Neil Hodgson (nyamatongwe) Date: 2005-08-09 23:23
Marc-Andre Lemburg's point of view that os.environ use
unicode when the string is outside Python's default encoding
attracted most support. For the reasons given in the
discussion, I feel this will cause problems for users. It is
more difficult to code than a CP_ACP or non-ASCII test and
there would be flow-on work for other calls such as open
that would need to convert from the default encoding to
Unicode. Due to the size of these changes and my doubts
about this being the correct design, I don't want to work on
its implementation.
msg48561 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2005-08-10 07:17
Ok, I think we have to reject this patch, then, and wait for
somebody to write a PEP.
msg48562 - Author: Neil Hodgson (nyamatongwe) Date: 2005-08-10 07:24
Yes, the scope of the changes needed requires a PEP and
transition plan and needs to make sense in moving towards
the all-unicode string future. 
History
Date | User | Action | Args
2022-04-11 14:56:12 | admin | set | github: 42154
2005-07-02 01:55:44 | nyamatongwe | create |