classification
Title: textwrap ignoring fix_sentence_endings for single lines
Type: Stage:
Components: Library (Lib) Versions: Python 2.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: gward Nosy List: gward, jimjjewett, nnorwitz, tlynn
Priority: normal Keywords:

Created on 2003-11-22 19:12 by tlynn, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (6)
msg19089 - (view) Author: Tom Lynn (tlynn) Date: 2003-11-22 19:12
>>> import textwrap
>>> t = textwrap.TextWrapper(fix_sentence_endings=True)
>>> print t.fill("A short line. Note the single space.")
A short line. Note the single space.

TextWrapper.wrap() is implemented as:

    def wrap(self, text):
        text = self._munge_whitespace(text)
        indent = self.initial_indent
        # *** Next line seems to be the bug ***
        if len(text) + len(indent) <= self.width:
            return [indent + text]
        chunks = self._split(text)
        if self.fix_sentence_endings:
            self._fix_sentence_endings(chunks)
        return self._wrap_chunks(chunks)

(if sf breaks the indentation, check the actual source!)

That early-return "if" clause seems to be an incorrect 
optimisation which skips fix_sentence_endings.  
Commenting it out seems to fix the problem.
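
For readers on a current Python, the behaviour described above can be approximated with a small subclass. This is only a sketch: EarlyReturnWrapper is a hypothetical name, and it relies on the private TextWrapper helpers shown in the quoted source (_munge_whitespace, _split, _fix_sentence_endings, _wrap_chunks), which are assumed to still exist in the installed textwrap module and are not a supported API.

```python
import textwrap


class EarlyReturnWrapper(textwrap.TextWrapper):
    """Hypothetical subclass that re-creates the early return quoted above."""

    def wrap(self, text):
        text = self._munge_whitespace(text)
        indent = self.initial_indent
        # The shortcut under discussion: text that already fits on one line
        # is returned before _fix_sentence_endings() gets a chance to run.
        if len(text) + len(indent) <= self.width:
            return [indent + text]
        chunks = self._split(text)
        if self.fix_sentence_endings:
            self._fix_sentence_endings(chunks)
        return self._wrap_chunks(chunks)


sample = "A short line. Note the single space."

# With the shortcut, the single space after "line." survives.
with_shortcut = EarlyReturnWrapper(fix_sentence_endings=True).fill(sample)

# The stock wrapper in a current Python has no such shortcut.
without_shortcut = textwrap.TextWrapper(fix_sentence_endings=True).fill(sample)

print(repr(with_shortcut))
print(repr(without_shortcut))
```

Comparing the two printed values shows whether the shortcut alone is enough to suppress the sentence-ending fix for input that fits on one line.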
msg19090 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2003-11-24 19:53
Greg, any comments?  You implemented textwrap IIRC.
msg19091 - (view) Author: Jim Jewett (jimjjewett) Date: 2003-11-24 20:53
I think fix_sentence_endings was originally intended to repair sentences that the wrapper had messed up itself.  If the string isn't split, then it hasn't been broken by TextWrapper.

If the edit should always apply, then I think you want

if len(text) + len(indent) <= self.width:

to become

if ((not self.fix_sentence_endings) and (len(text) + len(indent) <= self.width)):

(which also skips the length calls whenever fix_sentence_endings is set).

That said, it feels wrong to hardcode fix_sentence_endings and treat it differently than other cleanup rules.

I would feel better if the fix were made in _munge_whitespace, like the following additions:

# Look for a lower case letter, followed by
# a period, question mark or exclamation mark followed by 
# (maybe a quotation mark, followed by)
# a new line.
# This newline is only one whitespace character, but should be
# treated as two at the end of a sentence -- so add in a space.
fix_sentence_re = re.compile(r'([%s][\.\!\?][\"\']?)(\r\n|\n|\r)'
                             % string.lowercase)


if fix_sentence_endings:  text = re.sub(fix_sentence_re, r"\1 \2", text)

Then the newline can be turned into spaces (or not) just as it is now, but it will have one space in front already.  
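
A rough Python 3 rendering of the pattern above, to show what it would and would not touch. The helper name pad_sentence_end_before_newline is made up for illustration, and string.ascii_lowercase stands in for the Python 2 string.lowercase (which was locale-dependent), so this is an approximation of the proposal rather than the code that went into textwrap.

```python
import re
import string

# Lower-case letter, then ., ! or ?, then an optional quote, then a newline.
# string.ascii_lowercase is an assumption replacing Python 2's string.lowercase.
fix_sentence_re = re.compile(
    r'([%s][.!?]["\']?)(\r\n|\n|\r)' % string.ascii_lowercase
)


def pad_sentence_end_before_newline(text):
    """Hypothetical helper: insert one space before a sentence-ending newline."""
    return fix_sentence_re.sub(r"\1 \2", text)


broken_across_lines = "First sentence.\nSecond sentence."
typed_on_one_line = "First sentence. Second sentence."

padded = pad_sentence_end_before_newline(broken_across_lines)
unchanged = pad_sentence_end_before_newline(typed_on_one_line)

print(repr(padded))
print(repr(unchanged))
```

The pattern only fires when a newline follows the sentence end, so a sentence break typed with a single space on one line passes through untouched.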
msg19092 - (view) Author: Jim Jewett (jimjjewett) Date: 2003-12-04 16:18
It looks like I was missing a key point.  I was thinking that the 
problem would occur only when a line-break hid the second 
space; the actual test is a situation where the user only typed 
one space.  As the code already says, the algorithm can't be 
perfect.  (For instance, we don't want a double space after 
Mr. or Dr., which both end with a lower case letter and a 
period, and are generally followed by an upper case name 
instead of an upper-case new sentence.)
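
The Mr./Dr. limitation is easy to see with the public API on a current Python. A minimal illustration, with a made-up sample sentence:

```python
import textwrap

wrapper = textwrap.TextWrapper(fix_sentence_endings=True)

# "Dr." ends in a lower-case letter plus a period, so the heuristic
# treats it like the end of a sentence and widens the following space.
title_case = wrapper.fill("Dr. Smith called. He left a message.")

print(repr(title_case))
```

Both the abbreviation and the real sentence end get two spaces, since the heuristic looks only at the trailing characters of each word.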
msg19093 - (view) Author: Tom Lynn (tlynn) Date: 2004-01-20 20:03
I still think it's a bug though.  Dr. and Mr. are understandable 
special cases, single-line input not getting fixed isn't.  It's 
particularly bad since the user has to deliberately turn on 
fix_sentence_endings.  If it then doesn't fix the sentence 
endings, that's bad.
msg19094 - (view) Author: Greg Ward (gward) (Python committer) Date: 2004-05-13 01:53
Yes, that bit of optimization was misguided -- removing it
is the right fix.  (This isn't the first bug it has caused!)

Fixed in Lib/textwrap.py rev 1.32.8.2 (release23-maint
branch) and rev 1.25 (trunk).

Tested in Lib/test/test_textwrap.py rev 1.22.8.2
(release23-maint branch) and rev 1.34 (trunk).

History
Date                 User   Action  Args
2022-04-11 14:56:01  admin  set     github: 39585
2003-11-22 19:12:00  tlynn  create