Issue847346
Created on 2003-11-22 19:12 by tlynn, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (6) | |||
---|---|---|---|
msg19089 - (view) | Author: Tom Lynn (tlynn) | Date: 2003-11-22 19:12 | |
>>> import textwrap
>>> t = textwrap.TextWrapper(fix_sentence_endings=True)
>>> print t.fill("A short line. Note the single space.")
A short line. Note the single space.

TextWrapper.wrap() is implemented as:

    def wrap(self, text):
        text = self._munge_whitespace(text)
        indent = self.initial_indent
        # *** Next line seems to be the bug ***
        if len(text) + len(indent) <= self.width:
            return [indent + text]
        chunks = self._split(text)
        if self.fix_sentence_endings:
            self._fix_sentence_endings(chunks)
        return self._wrap_chunks(chunks)

(if sf breaks the indentation, check the actual source!)

That early-return "if" clause seems to be an incorrect optimisation which skips fix_sentence_endings. Commenting it out seems to fix the problem. |
|||
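For readers on a current Python 3, the report above can be turned into a quick check. This is a minimal sketch and not part of the original report: it assumes an interpreter that already includes the fix discussed in this issue, and the variable names are only illustrative.

```python
import textwrap

text = "A short line. Note the single space."

# Default behaviour: spacing between words is left as typed.
plain = textwrap.fill(text)

# With fix_sentence_endings=True, a sentence end (a lowercase letter
# followed by '.', '!' or '?') should be followed by exactly two spaces,
# even when the whole text already fits on one line.
fixed = textwrap.TextWrapper(fix_sentence_endings=True).fill(text)

print(repr(plain))
print(repr(fixed))
```

With the early return described above still in place, both calls would give back the same single-spaced string, which is the behaviour being reported.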
msg19090 - (view) | Author: Neal Norwitz (nnorwitz) * | Date: 2003-11-24 19:53 | |
Greg, any comments? You implemented textwrap IIRC. |
|||
msg19091 - (view) | Author: Jim Jewett (jimjjewett) | Date: 2003-11-24 20:53 | |
I think fix_sentence_endings was originally intended to repair sentences that the wrapper had messed up itself. If the string isn't split, then it hasn't been broken by textwrapper.

If the edit should always apply, then I think you want

    if len(text) + len(indent) <= self.width:

to become

    if ((not self.fix_sentence_endings) and
        (len(text) + len(indent) <= self.width)):

(which also saves the length call on long paragraphs).

That said, it feels wrong to hardcode fix_sentence_endings and treat it differently than other cleanup rules. I would feel better if the fix were made in _munge_whitespace, like the following additions:

    # Look for a lower case letter, followed by
    # a period, question mark or exclamation mark followed by
    # (maybe a quotation mark, followed by)
    # a new line.
    # This newline is only one whitespace character, but should be
    # treated as two at the end of a sentence -- so add in a space.
    fix_sentence_re = re.compile(r'([%s][\.\!\?][\"\']?)(\r\n|\n|\r)'
                                 % string.lowercase)
    if self.fix_sentence_endings:
        text = re.sub(fix_sentence_re, r"\1 \2", text)

Then the newline can be turned into spaces (or not) just as it is now, but it will have one space in front already. |
|||
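The regex idea in the comment above can be tried on its own, outside textwrap. The following is only a sketch of that proposal, not code that was committed: the function name `pad_sentence_end_newlines` is made up for illustration, and `string.ascii_lowercase` stands in for the Python 2 `string.lowercase`.

```python
import re
import string

# A lowercase letter, then '.', '!' or '?', then an optional quote,
# immediately followed by a line break (captured separately).
fix_sentence_re = re.compile(
    r'([%s][.!?]["\']?)(\r\n|\n|\r)' % string.ascii_lowercase
)


def pad_sentence_end_newlines(text):
    """Insert one space between a sentence end and the newline after it.

    Hypothetical helper: when the newline is later converted to a space,
    the sentence ends up followed by two spaces.
    """
    return fix_sentence_re.sub(r"\1 \2", text)


multi_line = pad_sentence_end_newlines("First line.\nSecond line.")
single_line = pad_sentence_end_newlines("A short line. Note the single space.")

print(repr(multi_line))
print(repr(single_line))
```

Note that the pattern only fires when a newline directly follows the sentence end, so input that contains no line break comes back unchanged.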
msg19092 - (view) | Author: Jim Jewett (jimjjewett) | Date: 2003-12-04 16:18 | |
It looks like I was missing a key point. I was thinking that the problem would occur only when a line break hid the second space; the actual test is a situation where the user typed only one space. As the code already says, the algorithm can't be perfect. (For instance, we don't want a double space after Mr. or Dr., which both end with a lower-case letter and a period, and are generally followed by an upper-case name rather than an upper-case new sentence.) |
|||
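The Dr./Mr. limitation mentioned above is easy to see directly. A small sketch, assuming a current Python 3 where `textwrap.fill` accepts `fix_sentence_endings` as a keyword argument; the sample sentence is invented for illustration.

```python
import textwrap

sample = "I saw Dr. Smith today. He waved."

# The heuristic only looks at "lowercase letter + period + space", so the
# abbreviation "Dr." is treated like the end of a sentence.
result = textwrap.fill(sample, fix_sentence_endings=True)

print(repr(result))
```

Both "Dr." and "today." are followed by two spaces in the result, although only the second one really ends a sentence.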
msg19093 - (view) | Author: Tom Lynn (tlynn) | Date: 2004-01-20 20:03 | |
I still think it's a bug though. Dr. and Mr. are understandable special cases; single-line input not getting fixed isn't. It's particularly bad since the user has to deliberately turn on fix_sentence_endings. If it then doesn't fix the sentence endings, that's bad. |
|||
msg19094 - (view) | Author: Greg Ward (gward) | Date: 2004-05-13 01:53 | |
Yes, that bit of optimization was misguided -- removing it is the right fix. (This isn't the first bug it has caused!) Fixed in Lib/textwrap.py rev 1.32.8.2 (release23-maint branch) and rev 1.25 (trunk). Tested in Lib/test/test_textwrap.py rev 1.22.8.2 (release23-maint branch) and rev 1.34 (trunk). |
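A regression test for this kind of bug might look roughly like the sketch below. It is not the test that was added to Lib/test/test_textwrap.py; the class and method names are hypothetical, and it assumes a Python 3 interpreter that includes the fix.

```python
import textwrap
import unittest


class ShortInputSentenceEndings(unittest.TestCase):
    """Text that fits within `width` must still get its sentence endings fixed."""

    def test_single_line_fits_in_width(self):
        wrapper = textwrap.TextWrapper(width=70, fix_sentence_endings=True)
        self.assertEqual(
            wrapper.wrap("A short line. Note the single space."),
            ["A short line.  Note the single space."],
        )

    def test_single_line_with_initial_indent(self):
        # The removed early return also took initial_indent into account,
        # so cover that path as well.
        wrapper = textwrap.TextWrapper(
            width=70, initial_indent="  ", fix_sentence_endings=True
        )
        self.assertEqual(wrapper.wrap("One. Two."), ["  One.  Two."])


if __name__ == "__main__":
    unittest.main()
```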
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:01 | admin | set | github: 39585 |
2003-11-22 19:12:00 | tlynn | create |