Issue1443155
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2006-03-04 18:45 by hyeshik.chang, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
cjkcodecs-inc1.diff | hyeshik.chang, 2006-03-04 18:45 | incremental codecs patch for cjkcodecs | ||
cjkcodecs-inc2.diff | hyeshik.chang, 2006-03-18 18:32 | revised patch |
Messages (8) | |||
---|---|---|---|
msg49644 - (view) | Author: Hyeshik Chang (hyeshik.chang) * | Date: 2006-03-04 18:45 | |
Here's a supplemental patch for SF #1436130 to implement CJKCodecs part of the Incremental codec specification. This patch is implemented in an interface of Walter's fourth patch on #1436130. Please test this whether it agrees the design. |
|||
msg49645 - (view) | Author: Walter Dörwald (doerwalter) * | Date: 2006-03-15 12:08 | |
Logged In: YES user_id=89016 The patch doesn't apply cleanly (conflicts in Lib/test/test_multibytecodec.py and Tools/unicode/Makefile). Could you update the patch? I haven't looked at the C code to closely yet. Two notes: 1) The tests often call incencoder.encode() or incdecoder.decode() again after the method has been called with final=True before. I'm not sure that this should be allowed. If we allow it, it should be documented in what state the codec is after calling with final=True (probably it should be back to the initial state (i.e. like calling reset())). 2) It seems to me that it isn't possible to change the error handling during the lifetime of a codec. Anyway, thanks for the quick patch. |
|||
msg49646 - (view) | Author: Hyeshik Chang (hyeshik.chang) * | Date: 2006-03-16 12:32 | |
Logged In: YES user_id=55188 1) Because CJKCodecs had an internal stateful framework, I implemented just an interface using it for IncrementalCodec. It treats final=True as a simple `flush' message(which doesn't reset or terminate the codec). The behavior is quite useful for real-time stream processing such as sockets and tail log watchers. If we disallow that, such usages may require its own sequence detectors. For "to reset or not" issue, I think we can simply follow how iconv does. iconv doesn't reset the internal state for iconv(ic, NULL, NULL, ..). 2) Aah. I didn't notice that .errors is a part of public API. The current CJKCodecs can't support it easily yet. I'll fix it and upload a updated patch soon. Thank you for your review! |
|||
msg49647 - (view) | Author: Walter Dörwald (doerwalter) * | Date: 2006-03-18 15:24 | |
Logged In: YES user_id=89016 What other interpretation of the final parameter can we use that doesn't make it completely useless? What about the following: "If final is true the codec must encode/decode the input completely and must flush all buffers. If this isn't possible (e.g. because of incomplete byte sequences on decoding) it must raise an exception (unless prevented by an error handler)"? |
|||
msg49648 - (view) | Author: Hyeshik Chang (hyeshik.chang) * | Date: 2006-03-18 18:32 | |
Logged In: YES user_id=55188 I updated the patch for the .errors visibility. I like the statement. But the current implementation of the patch is slightly different for a corner case; the current implementation keeps buffers if an error is occurred. That is somewhat natural because error cases doesn't make side-effects usually. But in other side, I agree to your statement in a view of a direct interpretation of final=True. What do you think about this? |
|||
msg49649 - (view) | Author: Walter Dörwald (doerwalter) * | Date: 2006-03-18 20:47 | |
Logged In: YES user_id=89016 Maybe the statement is a bit misleading in its current form. I didn't mean that error handling prevents the flushing of the buffers, just that error handling prevents raising an error. I hope the following is clearer: "If final is true the codec must encode/decode the input completely and must flush all buffers. If this isn't possible (e.g. because of incomplete byte sequences still remaining in the buffer on decoding) it must initiate error handling just like in the stateless case (which might raise an exception)." I'll take a look at the patch in the next few days. |
|||
msg49650 - (view) | Author: Walter Dörwald (doerwalter) * | Date: 2006-03-25 19:56 | |
Logged In: YES user_id=89016 OK, a few notes on the patch: In the test "c = codecs.lookup('utf-8')[3](s)" should be written as "c = codecs.getwriter('utf-8')(s)". Someday in the future CodecInfo may no longer be a tuple. Instead of htmlentitydefs.entitydefs, the test could use htmlentitydefs.name2codepoint. I'd like to see a few tests for error callbacks that return the wrong objects and for callbacks that return an offset that is not exc.end. The ERROR_ISCUSTOM() macro looks wrong to me: smaller than 1 *and* greater that 3? In mbiencoder_encode() r = encoder_encode_stateful(STATEFUL_ECTX(self), data, final); if (r == NULL) return NULL; return r; can be simplified to: return encoder_encode_stateful(STATEFUL_ECTX(self), data, final); Apart from that the patch looks good to me, so go ahead and check it in. |
|||
msg49651 - (view) | Author: Hyeshik Chang (hyeshik.chang) * | Date: 2006-03-26 02:35 | |
Logged In: YES user_id=55188 Committed as r43320. Thank you for the kind reviews! :) |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:15 | admin | set | github: 42979 |
2006-03-04 18:45:01 | hyeshik.chang | create |