classification
Title: codecs.StreamRecoder.next doesn't encode
Type: Stage:
Components: Unicode Versions: Python 2.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: doerwalter Nosy List: doerwalter, lemburg, wangnick
Priority: normal Keywords:

Created on 2005-07-10 16:55 by wangnick, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
diff.txt doerwalter, 2005-08-31 16:11
Messages (7)
msg25788 - (view) Author: Sebastian Wangnick (wangnick) Date: 2005-07-10 16:55
codecs.StreamRecoder.next doesn't encode the data it 
gets from self.reader.next. This breaks the "for line in 
codecs.EncodedFile(...)" idiom.
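
For reference, a minimal sketch of the idiom in question, written for current Python 3 (where the method is `__next__`) rather than the Python 2.4 this report was filed against. The sample text, the in-memory backend stream, and the choice of Latin-1 and UTF-8 are assumptions made for illustration only.

```python
import codecs
import io

# Assumed sample data: the backend stream holds Latin-1 bytes.
text = "Gr\u00fc\u00dfe\nW\u00f6rld\n"
backend = io.BytesIO(text.encode("latin-1"))

# The frontend (what the caller reads) should be UTF-8,
# while the file itself is decoded as Latin-1.
recoder = codecs.EncodedFile(backend, data_encoding="utf-8",
                             file_encoding="latin-1")

# Line iteration goes through StreamRecoder's iterator protocol,
# so each line should come back already encoded as UTF-8.
lines = [line for line in recoder]
print(lines)
```

With the fix applied, every item yielded by the loop is a UTF-8 encoded line, the same as what `readline()` on the recoder would return.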
msg25789 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2005-08-31 16:11
Here's a simple patch (diff.txt).
msg25790 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2005-08-31 20:58
Looks good, Walter.

Please check it in.

Thanks.
msg25791 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2005-09-01 12:22
Checked in as:
Lib/codecs.py 1.48/1.35.2.10

I'll try to add tests for StreamRecoder tomorrow.

StreamRecoder is broken in its current form, as it uses the
stateless codec for the frontend encoding. Recoding from
e.g. latin-1 to utf-16 will return a BOM for every call to
read() which is clearly wrong. What gets read from the
backend stream should be pushed through a *stateful*
encoder. BTW, a feed style API would help here ;)
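
The BOM problem described here can be reproduced without StreamRecoder at all. The sketch below assumes a current Python 3, which has the incremental ("feed style") codec API; that API did not exist in Python 2.4. The chunk contents are made up for illustration.

```python
import codecs

# Assumed sample input: two chunks, as two successive read() calls might return.
chunks = ["ab", "cd"]

# Stateless: every call starts from scratch, so every chunk gets its own BOM.
stateless = [codecs.encode(chunk, "utf-16") for chunk in chunks]

# Stateful: the incremental encoder remembers that the BOM was already written.
encoder = codecs.getincrementalencoder("utf-16")()
stateful = [encoder.encode(chunk) for chunk in chunks]

bom = codecs.BOM_UTF16  # BOM in native byte order
print([s.startswith(bom) for s in stateless])
print([s.startswith(bom) for s in stateful])
```

Only the first chunk from the stateful encoder starts with the BOM, which is the behaviour a recoder would need for encodings such as utf-16.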
msg25792 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2005-09-01 18:28
Thanks, Walter.

StreamRecoder is not broken: it works as advertised (see
the .__init__() doc-string and interface) and yes, this
means that only stateless encodings can be used, e.g.
UTF-16-LE, simply because the encode and decode
functions are defined as being stateless.
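
One way to see what "stateless" means in practice is to check whether encoding piece by piece gives the same bytes as encoding the whole string at once. This is only an illustrative check under assumed sample strings, not a definition taken from the codecs documentation.

```python
import codecs

# Assumed sample strings.
a, b = "ab", "cd"

# UTF-16-LE: no BOM and no carried state, so pieces concatenate cleanly.
le_joined = codecs.encode(a, "utf-16-le") + codecs.encode(b, "utf-16-le")
le_whole = codecs.encode(a + b, "utf-16-le")

# UTF-16: each stateless call prepends a BOM, so the pieces do not add up.
bom_joined = codecs.encode(a, "utf-16") + codecs.encode(b, "utf-16")
bom_whole = codecs.encode(a + b, "utf-16")

print(le_joined == le_whole)
print(bom_joined == bom_whole)
```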


msg25793 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2005-09-02 18:33
OK, now I'm beginning to understand the docstring.
Nevertheless I think having a class that uses stateful
codecs at both ends would be useful. If you want, I can give
this a try (after I'm back from vacation in four weeks).

Closing the report as fixed.
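
A rough sketch of what recoding with stateful codecs at both ends could look like, using the incremental codec API of current Python 3. The helper name `recode_chunks` is hypothetical and is not part of the codecs module; this is not the class proposed in this thread, only an illustration of the idea.

```python
import codecs

def recode_chunks(chunks, source_encoding, target_encoding):
    """Hypothetical helper: re-encode byte chunks, keeping state at both ends."""
    decoder = codecs.getincrementaldecoder(source_encoding)()
    encoder = codecs.getincrementalencoder(target_encoding)()
    for chunk in chunks:
        # The decoder buffers incomplete byte sequences between chunks;
        # the encoder remembers whether it has already emitted a BOM.
        out = encoder.encode(decoder.decode(chunk))
        if out:
            yield out
    # Flush both codecs at end of input.
    tail = encoder.encode(decoder.decode(b"", final=True), final=True)
    if tail:
        yield tail

# Assumed sample input: UTF-8 bytes split in the middle of a two-byte character.
data = "h\u00e9llo".encode("utf-8")
pieces = [data[:2], data[2:]]
result = b"".join(recode_chunks(pieces, "utf-8", "utf-16"))
print(result == "h\u00e9llo".encode("utf-16"))
```

Because both codecs keep state, the split character is decoded correctly and the output carries a single BOM.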
msg25794 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2005-09-03 10:31
If you think there's a use case, yes. 

Enjoy your vacation!
History
Date User Action Args
2022-04-11 14:56:12 admin set github: 42180
2005-07-10 16:55:32 wangnick create