Issue 1101097: Feed style codec API


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/41432

classification

Title:	Feed style codec API
Type:		Stage:
Components:	Library (Lib)	Versions:

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:	lemburg	Nosy List:	doerwalter, lemburg
Priority:	normal	Keywords:	patch

Created on 2005-01-12 18:14 by doerwalter, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
diff.txt	doerwalter, 2005-01-12 18:14
diff2.txt	doerwalter, 2006-01-11 21:48

Messages (8)
msg47521	Author: Walter Dörwald (doerwalter) *	Date: 2005-01-12 18:14
The attached patch implements a feed style codec API by adding feed methods to StreamReader and StreamWriter (see SF patch #998993 for a history of this issue).
msg47522	Author: Walter Dörwald (doerwalter) *	Date: 2006-01-11 21:48
The second version of the patch is updated for the current svn head and includes patches to the documentation.
msg47523	Author: Marc-Andre Lemburg (lemburg) *	Date: 2006-01-12 14:20
I don't like the name of the methods, since feed-style APIs usually take data and store it in the object's state, whereas the method you are suggesting is merely an encode method that takes the current state into account.

The idea is to allow incremental processing. This is not what your versions implement. The StreamWriter would have to grow buffering for this. The .feed() method on the StreamReader would have to be adjusted to store the input in the .charbuffer only and not return anything.

If you just want to make the code easier to follow, I'd suggest you use private methods, e.g. ._stateful_encode() and ._stateful_decode(), which is what these methods actually implement.

Please also explain "If only the \method{feed()} method is used, \var{stream} will be ignored and can be \constant{None}." I don't see this being true: .write() will still require a .stream object.
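To make the distinction above concrete: in a feed-style API as described here, feed() only stores data and the result is fetched separately. The following is a minimal Python 3 sketch, not part of the patch; the class name BufferedFeedDecoder is hypothetical, and it relies on codecs.getincrementaldecoder() from the later standard library to hold incomplete byte sequences between calls.

```python
import codecs


class BufferedFeedDecoder:
    """Hypothetical feed-style decoder: feed() only stores input,
    read() returns the text decoded so far."""

    def __init__(self, encoding, errors="strict"):
        # The incremental decoder keeps incomplete byte sequences between calls.
        self._decoder = codecs.getincrementaldecoder(encoding)(errors)
        self.charbuffer = ""

    def feed(self, data):
        # Decode what is complete and store it; nothing is returned.
        self.charbuffer += self._decoder.decode(data)

    def read(self):
        # Hand out the buffered characters and empty the buffer.
        result, self.charbuffer = self.charbuffer, ""
        return result


dec = BufferedFeedDecoder("utf-8")
dec.feed(b"caf\xc3")   # b"\xc3" is the first byte of a two-byte sequence
dec.feed(b"\xa9!")
text = dec.read()      # "café!"
```

A complete version would also need a way to flush at end of input (for example by calling the decoder with final=True), which is omitted here.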
msg47524	Author: Walter Dörwald (doerwalter) *	Date: 2006-01-12 15:41
Basically what I want to have is a decoupling of the stateful encoding/decoding from the stream API. An example: suppose I have a generator:

    def foo():
        yield u"Hello"
        yield u"World"

I want to wrap this generator into another generator that does a stateful encoding of the strings from the first generator:

    def encode(it, encoding, errors):
        writer = codecs.getwriter(encoding)(None, errors)
        for data in it:
            yield writer.feed(data)

    for x in encode(foo(), "utf-16", "strict"):
        print repr(x)

This prints:

    '\xff\xfeH\x00e\x00l\x00l\x00o\x00'
    'W\x00o\x00r\x00l\x00d\x00'

The writer itself shouldn't write anything to the stream (in fact, there is no stream); it should just encode what it gets fed and spit out the result.

The reason why StreamWriter.feed() is implemented the way it is, is that currently there are no Python encodings where encode(string)[1] != len(string). If we want to handle that case, the StreamWriter would have to grow a charbuffer. Should I add that to the patch?

For decoding I want the same functionality:

    def blocks(name, size=8192):
        f = open(name, "rb")
        while True:
            data = f.read(size)
            if data:
                yield data
            else:
                break

    def decode(it, encoding, errors):
        reader = codecs.getreader(encoding)(None, errors)
        for data in it:
            yield reader.feed(data)

    decode(blocks("foo.xml"))

Again, here the StreamReader doesn't read from a stream; it just decodes what it gets fed and spits it back out.

I'm not attached to the name "feed". Of course the natural choice for the method names would be "encode" and "decode", but those are already taken. Would "handle" or "convert" be better names?

I don't know what "this" refers to in "This is not what your versions implement". If "this" refers to "The idea is to allow incremental processing", this is exactly what the patch tries to achieve: incremental processing without tying this processing to a stream API. If "this" refers to "feed style APIs usually take data and store it in the object's state", that's true, but that's not the purpose of the patch, so maybe the name is misleading.
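For comparison, a rough Python 3 equivalent of the two generators above can be written with codecs.getincrementalencoder() and codecs.getincrementaldecoder(). These are not part of this patch, but they provide stateful encoding/decoding without any stream object. This is a sketch; the final flush with final=True is an addition not present in the original generators.

```python
import codecs


def foo():
    yield "Hello"
    yield "World"


def encode(it, encoding, errors="strict"):
    # The incremental encoder carries state (e.g. "BOM already written")
    # from one chunk to the next; no stream object is involved.
    encoder = codecs.getincrementalencoder(encoding)(errors)
    for data in it:
        yield encoder.encode(data)
    tail = encoder.encode("", final=True)  # flush any pending output
    if tail:
        yield tail


def decode(it, encoding, errors="strict"):
    # Incomplete byte sequences at chunk boundaries are held back
    # until the next chunk arrives.
    decoder = codecs.getincrementaldecoder(encoding)(errors)
    for data in it:
        yield decoder.decode(data)
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


for x in encode(foo(), "utf-16"):
    print(repr(x))
```

On a little-endian machine the loop prints b'\xff\xfeH\x00e\x00l\x00l\x00o\x00' and b'W\x00o\x00r\x00l\x00d\x00', the Python 3 counterparts of the output shown in the message; a big-endian machine gets a b'\xfe\xff' BOM and big-endian code units instead.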
msg47525	Author: Walter Dörwald (doerwalter) *	Date: 2006-02-09 15:56
Looking at PEP 342, I think the natural name for this method would be send(). It does exactly what send() does for generators: it sends data into the codec, which processes it, returns a result and keeps state for the next call.
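A minimal sketch of the send() idea using a PEP 342 generator. The name decoding_coroutine is hypothetical, and the stateful part is delegated to codecs.getincrementaldecoder() rather than to a StreamReader:

```python
import codecs


def decoding_coroutine(encoding, errors="strict"):
    """Hypothetical send()-style codec: each send() passes bytes in and
    returns whatever text could be decoded so far."""
    decoder = codecs.getincrementaldecoder(encoding)(errors)
    text = ""
    while True:
        data = yield text            # wait for the next chunk of bytes
        text = decoder.decode(data)  # incomplete sequences stay buffered


codec = decoding_coroutine("utf-8")
next(codec)                       # prime: run to the first yield
first = codec.send(b"\xe2\x82")   # "" because the euro sign is incomplete
second = codec.send(b"\xac42")    # "\u20ac42" (euro sign followed by "42")
```

The generator's local variables play the role of the codec state, so no stream or buffer object is needed between calls.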
msg47526	Author: Marc-Andre Lemburg (lemburg) *	Date: 2006-02-09 17:58
I can see your point in wanting a way to use the stateful encoding/decoding, but still don't understand why you have to sidestep the stream API for doing this.

Wouldn't using a StringIO buffer as stream be the more natural choice for the writer and for the reader (StringIO supports Unicode as well)? You can then use the standard .write() API to "send" in the data and the .getvalue() method on the StringIO buffer to fetch the results. For the reader, you'd write to the StringIO buffer and then fetch the results using the standard .read() API. This is how you'd normally use a file or stream IO based API in a string context, and it doesn't require adding methods to the StreamReader/Writer API.

I'm not opposed to adding new methods, but you see, the whole point of StreamReader/Writer is to read from and write to streams. If you just want a stateful encoder/decoder, it would be better to create a separate implementation for that, say StatefulEncoder/StatefulDecoder (which could then be used by the StreamReader/Writer).
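The layering suggested above (a separate stateful encoder that a stream writer merely wraps) can be sketched in Python 3 as follows. LayeredStreamWriter and flush_final() are hypothetical names, and codecs.getincrementalencoder() stands in for the proposed StatefulEncoder:

```python
import codecs
import io


class LayeredStreamWriter:
    """Hypothetical stream writer that owns no encoding state itself;
    all state lives in the incremental encoder it wraps."""

    def __init__(self, stream, encoding, errors="strict"):
        self.stream = stream
        self.encoder = codecs.getincrementalencoder(encoding)(errors)

    def write(self, text):
        # The stream layer only forwards whatever the encoder produces.
        self.stream.write(self.encoder.encode(text))

    def flush_final(self):
        # Let the encoder emit any pending output at end of input.
        self.stream.write(self.encoder.encode("", final=True))


buf = io.BytesIO()
writer = LayeredStreamWriter(buf, "utf-16")
for c in "foo":
    writer.write(c)
writer.flush_final()

# The same kind of encoder also works with no stream at all:
standalone = codecs.getincrementalencoder("utf-16")()
pieces = [standalone.encode(c) for c in "foo"]
```

With this split, the stream-free use case from the previous messages and the stream-based use case share one implementation of the codec state.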
msg47527	Author: Walter Dörwald (doerwalter) *	Date: 2006-02-11 19:50
> I can see your point in wanting a way to use the stateful
> encoding/decoding, but still don't understand why you
> have to sidestep the stream API for doing this.
>
> Wouldn't using a StringIO buffer as stream be the more
> natural choice for the writer and for the reader (StringIO
> supports Unicode as well).
>
> You can then use the standard .write() API to "send" in the
> data and the .getvalue() method on the StringIO buffer to
> fetch the results.

This doesn't work, because getvalue() doesn't remove the bytes from the buffer:

    import codecs, StringIO
    stream = StringIO.StringIO()
    writer = codecs.getwriter("utf-16")(stream)
    for c in u"foo":
        writer.write(c)
        print repr(stream.getvalue())

This prints:

    '\xff\xfef\x00'
    '\xff\xfef\x00o\x00'
    '\xff\xfef\x00o\x00o\x00'

instead of:

    '\xff\xfef\x00'
    'o\x00'
    'o\x00'

> For the reader, you'd write to the
> StringIO buffer and then fetch the results using the
> standard .read() API.

This doesn't work either, because the StringIO buffer doesn't keep separate read and write positions:

    import codecs, StringIO
    stream = StringIO.StringIO()
    reader = codecs.getreader("utf-16")(stream)
    for c in u"foo".encode("utf-16"):
        stream.write(c)
        print repr(reader.read())

This outputs:

    u''
    u''
    u''
    u''
    u''
    u''
    u''
    u''

because after the write() call, the read() call done through reader.read() reads from the end of the buffer.

BTW, we have been through this before, see: http://mail.python.org/pipermail/python-dev/2004-July/046497.html

> This is how you'd normally use a file or stream IO based API
> in a string context and it doesn't require adding methods to
> the StreamReader/Writer API. I'm not opposed to adding new
> methods, but you see, the whole point of StreamReader/Writer
> is to read from and write to streams. If you just want a
> stateful encoder/decoder it would be better to create a
> separate implementation for that, say
> StatefulEncoder/StatefulDecoder (which could then be used by
> the StreamReader/Writer).

See http://mail.python.org/pipermail/python-dev/2004-August/047568.html for a proposal. I do have a patch lying around that implements part of that (i.e. codecs.lookup() returns stateful encoders/decoders instead of stream readers/writers), but IMHO this patch is much too pervasive. We can have the same effect with a small patch to codecs.py.
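In Python 3 the StringIO role is played by io.BytesIO, which has the same limitations. The sketch below shows one workaround, assuming it is acceptable to manipulate the buffer position by hand: drain the buffer after each getvalue() on the writer side, and on the reader side save the read position, append at the end, then seek back. This is an illustration only, not something proposed in the thread.

```python
import codecs
import io

# Writer side: getvalue() returns everything written so far, so the
# buffer is emptied after each read to get only the new bytes.
out = io.BytesIO()
writer = codecs.getwriter("utf-16")(out)
encoded_chunks = []
for c in "foo":
    writer.write(c)
    encoded_chunks.append(out.getvalue())
    out.seek(0)
    out.truncate()

# Reader side: BytesIO shares one position for reads and writes, so
# remember where the reader stopped, append at the end, then seek back.
raw = io.BytesIO()
reader = codecs.getreader("utf-16")(raw)
decoded_chunks = []
for byte in "foo".encode("utf-16"):
    read_pos = raw.tell()
    raw.seek(0, io.SEEK_END)
    raw.write(bytes([byte]))
    raw.seek(read_pos)
    decoded_chunks.append(reader.read())

print(encoded_chunks)
print(decoded_chunks)
```

On a little-endian machine the writer loop yields [b'\xff\xfef\x00', b'o\x00', b'o\x00'], i.e. the output the message says it wants; the bookkeeping needed to get there is the kind of stream juggling the feed() proposal tries to avoid.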
msg47528	Author: Marc-Andre Lemburg (lemburg) *	Date: 2006-02-17 16:16
See http://mail.python.org/pipermail/python-dev/2006-February/061230.html for details on why I'm rejecting this patch.

History
Date	User	Action	Args
2022-04-11 14:56:09	admin	set	github: 41432
2005-01-12 18:14:42	doerwalter	create