Issue 780730: source files using encoding ./. universal newlines

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/38984

classification

Title:	source files using encoding ./. universal newlines
Type:		Stage:
Components:	Interpreter Core	Versions:	Python 2.5

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	lemburg	Nosy List:	doerwalter, georg.brandl, ishimoto, jackjansen, johnjsmith, lemburg, loewis, nnorwitz
Priority:	normal	Keywords:

Created on 2003-07-31 09:16 by ishimoto, last changed 2022-04-10 16:10 by admin. This issue is now closed.

Files
File name	Uploaded	Description
test.py	ishimoto, 2003-08-01 07:35

Messages (19)
msg17521 - (view)	Author: Atsuo Ishimoto (ishimoto) *	Date: 2003-07-31 09:16
Universal Newline Support doesn't work for source files that contain encoding definition.
msg17522 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2003-08-01 07:04
Logged In: YES user_id=21627 It's not clear to me what the problem is: Source code files and universal newline support have nothing to do with each other. Can you attach a small example that demonstrates the problem?
msg17523 - (view)	Author: Atsuo Ishimoto (ishimoto) *	Date: 2003-08-01 07:35
Logged In: YES user_id=463672 The attached file has an encoding definition and its newline is '\r'. Executing this script on Windows 2000, I got the error below. It runs fine if I remove the encoding definition.

    C:\Python23>python .\test.py
      File ".\test.py", line 2
        a() print 'hi'
        ^
    SyntaxError: invalid syntax
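For readers who want to try the shape of this report on a current interpreter, here is a minimal sketch. The contents of test.py are not shown in the thread, so the source bytes below are a hypothetical stand-in: an encoding declaration followed by statements separated by bare carriage returns. The sketch assumes Python 3.2 or later, where compile() accepts Mac-style newlines, so it shows the behaviour the reporter expected rather than the 2.3 failure.

```python
# Hypothetical stand-in for test.py (the real attachment is not quoted in the
# thread): an encoding declaration, then statements separated by bare "\r".
SOURCE = b"# -*- coding: iso-8859-1 -*-\rx = 17\ry = 23\r"


def run_source(source: bytes) -> dict:
    """Compile and execute raw source bytes, returning the resulting globals."""
    code = compile(source, "test.py", "exec")
    namespace = {}
    exec(code, namespace)
    return namespace


if __name__ == "__main__":
    ns = run_source(SOURCE)
    # If the "\r" characters were not treated as line ends, the whole file
    # would be one long comment and neither name would be defined.
    print(ns["x"], ns["y"])
```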
msg17524 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2003-08-01 08:22
Logged In: YES user_id=21627 I see. This is not about PEP 278, though, as you are not calling any open() function, and passing no 'U' argument to it - cross-platform newlines should work independent of that PEP.
msg17525 - (view)	Author: Jack Jansen (jackjansen) *	Date: 2003-08-01 09:45
Logged In: YES user_id=45365 The submitter isn't calling open explicitly, but PEP 278 is also about the implicit opening of source files by the interpreter. My guess (but I don't know the PEP 263 implementation at all) is that when you specify an encoding the normal code that the interpreter uses to read source files is bypassed, and replaced by something that does the encoding magic. Apparently this code does not do universal newline conversion, which it should.
msg17526 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2003-08-03 19:19
Logged In: YES user_id=21627 The only way to implement that would be to add 'U' support for all codecs, and open the codec with the 'U' flag. As this will also affect codecs not part of Python, we cannot fix this bug in Python 2.3, but have to defer a solution to Python 2.4.
msg17527 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2003-08-03 19:31
Logged In: YES user_id=38388 I'm not sure this is correct: unless the codecs implement their own .readline() implementation, the one in codecs.py is used and that simply delegates the readline request to the underlying stream object. Now, since the stream object in the source code reader is a plain Python file object, currently opened in "rb" mode, changing the mode to "rbU" should be enough to get universal readline support for all such codecs. The relevant code is in Parser/tokenizer.c:fp_setreadl():

    static int
    fp_setreadl(struct tok_state *tok, const char* enc)
    {
        PyObject *reader, *stream, *readline;

        /* XXX: constify filename argument. */
        stream = PyFile_FromFile(tok->fp, (char*)tok->filename, "rb", NULL);
        if (stream == NULL)
            return 0;

        reader = PyCodec_StreamReader(enc, stream, NULL);
        Py_DECREF(stream);
        if (reader == NULL)
            return 0;

        readline = PyObject_GetAttrString(reader, "readline");
        Py_DECREF(reader);
        if (readline == NULL)
            return 0;

        tok->decoding_readline = readline;
        return 1;
    }
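The delegation argument can be illustrated with a small Python 3 sketch. It is not the codecs.py implementation: DelegatingReader, read_lines and universal_newline_stream are hypothetical names, and the "universal newline" stream is simulated by translating CR and CRLF to LF up front. The point is only that a reader which hands readline() to its stream inherits whatever line splitting the stream does.

```python
import io


class DelegatingReader:
    """Hypothetical reader: readline() defers line splitting to the stream."""

    def __init__(self, stream, encoding):
        self.stream = stream
        self.encoding = encoding

    def readline(self):
        # The stream decides where a line ends; this class only decodes.
        return self.stream.readline().decode(self.encoding)


def read_lines(stream, encoding="iso-8859-1"):
    """Collect lines from a DelegatingReader until the stream is exhausted."""
    reader = DelegatingReader(stream, encoding)
    lines = []
    while True:
        line = reader.readline()
        if not line:
            break
        lines.append(line)
    return lines


def universal_newline_stream(data: bytes):
    """Stand-in for a stream with universal newlines: CR and CRLF become LF."""
    return io.BytesIO(data.replace(b"\r\n", b"\n").replace(b"\r", b"\n"))


DATA = b"# -*- coding: iso-8859-1 -*-\rprint 17\rprint 23\r"

# A plain binary stream splits on LF only, so the CR-separated file is one line.
plain = read_lines(io.BytesIO(DATA))
# With newline translation in the stream, the same reader sees three lines.
translated = read_lines(universal_newline_stream(DATA))

if __name__ == "__main__":
    print(len(plain), len(translated))
```

With the plain stream the tokenizer would receive the whole file as a single line, which is consistent with the SyntaxError in the original report.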
msg17528 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2003-08-03 20:22
Logged In: YES user_id=21627 That would work indeed. The question is whether we can impose support for universal newlines on all codecs "out there", for Python 2.3.1, when 2.3 makes no such requirement.
msg17529 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2003-08-04 08:21
Logged In: YES user_id=38388 Uhm, we don't impose Universal readline support on the codecs. They would just get a stream object that happens to know about universal newlines and work with it. That's completely in line with the codec spec. I'm +1 on adding this to 2.3.1.
msg17530 - (view)	Author: Jack Jansen (jackjansen) *	Date: 2003-08-04 08:24
Logged In: YES user_id=45365 There's no such thing as "rbU", I think, but simply "rU" should work. As far as I know the only difference between "r" and "rb" is newline conversion, right? If there are C libraries that do more, then we should implement "rbU". About 2.3.1 compatibility: technically we could break workarounds people have done themselves, but I think the chances are slim.
msg17531 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2003-08-04 15:57
Logged In: YES user_id=38388 Jack, I was just looking at the code I posted and the one in fileobject.c. The latter enables universal newline support whenever it sees a 'U' in the mode string, so I thought that adding a 'U' to the mode would be enough. The only system where 'b' does make a difference that I'm aware of is Windows, so you may want to check whether it makes a difference there.
msg17532 - (view)	Author: John J Smith (johnjsmith)	Date: 2003-08-04 19:34
Logged In: YES user_id=830565 In MS Windows, a '\x1a' (Ctrl-Z) in a file will be treated as EOF, unless the file is opened with 'rb'.
msg17533 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2003-08-05 07:31
Logged In: YES user_id=38388 Thanks John. Not sure whether any of the codecs would actually use 0x1A, but using "rbU" sounds like the safer approach then.
msg17534 - (view)	Author: Jack Jansen (jackjansen) *	Date: 2003-08-05 10:48
Logged In: YES user_id=45365 You misunderstand what I tried to say (or I mis-said it:-): there is no such thing as mode "rbU", check the code in fileobject.c. There is "r" == "rt" for text mode, "rb" for binary mode, "U"=="rU" for universal newline textmode. With "rU" the underlying file is opened in binary mode, so I don't think we'll have the control-Z problem.
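The "r" / "rb" / "rU" modes discussed here are those of the Python 2 file object. As a rough present-day analogue only (the 'U' flag was removed in Python 3.11), the newline parameter of open() controls the same translation. The sketch below uses made-up sample data and a hypothetical helper name, read_with.

```python
import os
import tempfile

# Made-up sample with all three line-ending conventions.
DATA = b"first\rsecond\r\nthird\n"


def read_with(newline):
    """Write DATA to a temporary file and read it back in text mode."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(DATA)
        with open(path, "r", encoding="ascii", newline=newline) as f:
            return f.readlines()
    finally:
        os.remove(path)


# newline=None: universal newlines, every line ending is translated to "\n".
translated = read_with(None)
# newline="": all three endings are recognised but returned unchanged.
untranslated = read_with("")

if __name__ == "__main__":
    print(translated)
    print(untranslated)
```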
msg17535 - (view)	Author: Neal Norwitz (nnorwitz) *	Date: 2005-10-02 05:50
Logged In: YES user_id=33168 I don't see a clear resolution here. Is there something we can/should do to fix this problem in 2.5?
msg17536 - (view)	Author: Walter Dörwald (doerwalter) *	Date: 2006-02-20 21:42
Logged In: YES user_id=89016 The changes to the codecs done in Python 2.4 added support for universal newlines:

    Python 2.4.1 (#2, Mar 31 2005, 00:05:10)
    [GCC 3.3 20030304 (Apple Computer, Inc. build 1666)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> open("foo.py", "wb").write("# -*- coding: iso-8859-1 -*-\rprint 17\rprint 23\r")
    >>> import foo
    17
    23
    >>>
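The same effect can be observed at the codec level. The sketch below uses the Python 3 codecs module, on the assumption that its StreamReader.readline() is representative of the 2.4 change mentioned above; the thread itself does not describe that change in detail. The helper name codec_lines is hypothetical.

```python
import codecs
import io

# Same bytes as in the session above: three lines separated by bare "\r".
DATA = b"# -*- coding: iso-8859-1 -*-\rprint 17\rprint 23\r"


def codec_lines(data: bytes, encoding: str = "iso-8859-1"):
    """Read lines through a codec StreamReader wrapped around a binary stream."""
    reader = codecs.getreader(encoding)(io.BytesIO(data))
    # Iterating a StreamReader calls readline() until it returns "".
    return [line.rstrip("\r\n") for line in reader]


if __name__ == "__main__":
    for line in codec_lines(DATA):
        print(repr(line))
```

Here the underlying stream is a plain binary one with no newline translation, so the line splitting is done by the StreamReader itself.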
msg17537 - (view)	Author: Georg Brandl (georg.brandl) *	Date: 2006-02-20 21:48
Logged In: YES user_id=849994 So this is resolved now?
msg17538 - (view)	Author: Walter Dörwald (doerwalter) *	Date: 2006-02-20 22:06
Logged In: YES user_id=89016 It looks to me that way. Any comments from the OP?
msg17539 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2006-02-21 09:45
Logged In: YES user_id=38388 Closing the bug. Thanks Walter.

History
Date	User	Action	Args
2022-04-10 16:10:24	admin	set	github: 38984
2003-07-31 09:16:56	ishimoto	create