Issue 1235646: codecs.StreamRecoder.next doesn't encode

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/42180

classification

Title:	codecs.StreamRecoder.next doesn't encode
Type:		Stage:
Components:	Unicode	Versions:	Python 2.4

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	doerwalter	Nosy List:	doerwalter, lemburg, wangnick
Priority:	normal	Keywords:

Created on 2005-07-10 16:55 by wangnick, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
diff.txt	doerwalter, 2005-08-31 16:11

Messages (7)
msg25788 - (view)	Author: Sebastian Wangnick (wangnick)	Date: 2005-07-10 16:55
codecs.StreamRecoder.next doesn't encode the data it gets from self.reader.next. This breaks the "for line in codecs.EncodedFile(...)" idiom.
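A minimal sketch of the idiom in question, assuming a modern Python 3 interpreter (where the method is spelled `__next__` rather than `next`) and an in-memory backend; the sample text and variable names are illustrative only. With the fix in place, iteration should yield lines re-encoded in the data encoding rather than the decoded text:

```python
import codecs
import io

# Illustrative sample data; the backend "file" holds Latin-1 bytes.
text = "grüße\nÄpfel\n"
backend = io.BytesIO(text.encode("latin-1"))

# Bytes read from the backend are decoded as Latin-1 and handed to the
# caller re-encoded as UTF-8.
recoder = codecs.EncodedFile(backend, data_encoding="utf-8", file_encoding="latin-1")

# The idiom from the report: iterate over the recoder line by line.
lines = list(recoder)

for line in lines:
    print(repr(line))
```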
msg25789 - (view)	Author: Walter Dörwald (doerwalter) *	Date: 2005-08-31 16:11
Here's a simple patch.
msg25790 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2005-08-31 20:58
Looks good, Walter. Please check it in. Thanks.
msg25791 - (view)	Author: Walter Dörwald (doerwalter) *	Date: 2005-09-01 12:22
Checked in as Lib/codecs.py 1.48/1.35.2.10. I'll try to add tests for StreamRecoder tomorrow. StreamRecoder is broken in its current form, as it uses the stateless codec for the frontend encoding. Recoding from e.g. latin-1 to utf-16 will return a BOM for every call to read(), which is clearly wrong. What gets read from the backend stream should be pushed through a stateful encoder. BTW, a feed-style API would help here ;)
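The BOM behaviour described here can be reproduced without StreamRecoder at all. The following is a rough sketch, assuming Python 3 and the incremental codec API (`codecs.getincrementalencoder`), which was added to the standard library after this report; the chunk contents are made up for illustration:

```python
import codecs

# Two chunks of text, standing in for two successive read() calls.
chunks = ["ab", "cd"]

# Stateless encode function, the kind StreamRecoder is handed for the frontend.
stateless_encode = codecs.lookup("utf-16").encode
stateless = [stateless_encode(chunk)[0] for chunk in chunks]

# Stateful (incremental) encoder: remembers that the BOM was already written.
encoder = codecs.getincrementalencoder("utf-16")()
stateful = [encoder.encode(chunk) for chunk in chunks]

for label, pieces in (("stateless", stateless), ("stateful", stateful)):
    bom_count = sum(piece.startswith(codecs.BOM_UTF16) for piece in pieces)
    print(label, "chunks starting with a BOM:", bom_count)
```

Every stateless chunk starts with a BOM, while the incremental encoder emits it only once, so only the stateful output concatenates into a valid UTF-16 stream.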
msg25792 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2005-09-01 18:28
Thanks, Walter. StreamRecoder is not broken: it works as advertised (see the .__init__() docstring and interface), and yes, this means that only stateless encodings such as UTF-16-LE can be used, simply because the encode and decode functions are defined as being stateless.
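As a small illustration of why an encoding such as UTF-16-LE is safe with stateless functions (a sketch assuming Python 3, with made-up chunk contents): it writes no BOM, so encoding chunk by chunk gives the same bytes as encoding everything at once.

```python
import codecs

# Stateless encode function for an encoding that carries no BOM.
encode = codecs.lookup("utf-16-le").encode

chunks = ["ab", "cd"]

# Encode each chunk independently, as a stateless recoder would.
pieces = [encode(chunk)[0] for chunk in chunks]

# Encode the whole text in one call for comparison.
whole = encode("".join(chunks))[0]

print(b"".join(pieces) == whole)
```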
msg25793 - (view)	Author: Walter Dörwald (doerwalter) *	Date: 2005-09-02 18:33
OK, now I'm beginning to understand the docstring. Nevertheless, I think having a class that uses stateful codecs at both ends would be useful. If you want, I can give this a try (after I'm back from vacation in four weeks). Closing the report as fixed.
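One possible shape for such a class, purely as a sketch: `StatefulRecoder` is a hypothetical name that does not exist in the codecs module, it only supports reading, and it assumes Python 3 with the incremental codec API that was added after this report.

```python
import codecs
import io


class StatefulRecoder:
    """Hypothetical read-only recoder with incremental codecs at both ends."""

    def __init__(self, stream, data_encoding, file_encoding, errors="strict"):
        self.stream = stream
        # Decoder for bytes coming from the backend stream.
        self.decoder = codecs.getincrementaldecoder(file_encoding)(errors)
        # Encoder for bytes handed to the caller.
        self.encoder = codecs.getincrementalencoder(data_encoding)(errors)

    def read(self, size=-1):
        # Keep reading until there is output or the backend is exhausted, so
        # a chunk ending in the middle of a multi-byte sequence does not look
        # like end of file to the caller.
        while True:
            raw = self.stream.read(size)
            final = not raw  # an empty read means EOF: flush codec state
            text = self.decoder.decode(raw, final)
            out = self.encoder.encode(text, final)
            if out or final:
                return out


# Usage: recode UTF-8 to UTF-16 in deliberately awkward 3-byte reads.
sample = "grüße"
recoder = StatefulRecoder(io.BytesIO(sample.encode("utf-8")), "utf-16", "utf-8")
result = b"".join(iter(lambda: recoder.read(3), b""))
print(result == sample.encode("utf-16"))
```

Because both ends keep state, the 3-byte reads may split a UTF-8 sequence without raising an error, and the BOM appears only at the start of the output.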
msg25794 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2005-09-03 10:31
If you think there's a use case, yes. Enjoy your vacation!

History
Date	User	Action	Args
2022-04-11 14:56:12	admin	set	github: 42180
2005-07-10 16:55:32	wangnick	create