This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Classification
Title: tokenize module does not detect inconsistent dedents
Type:
Stage:
Components: Library (Lib)
Versions:

Process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: rhettinger
Nosy List: arigo, dyoo, georg.brandl, kbk, rhettinger
Priority: high
Keywords:

Created on 2005-06-21 06:10 by dyoo, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name              Uploaded                   Description
tokenize.py.diff       dyoo, 2005-06-21 06:10     Diff to correct non-detection of incorrect dedent bug
testcase.py            dyoo, 2005-06-21 06:10     test case to expose tokenize bug
breaking-getsource.py  arigo, 2005-09-02 12:10    inspect.getsource() breaks now
patch.tokenize         arigo, 2005-09-02 12:40    diff with test to fix the inspect.getsource() case
Messages (6)
msg25595 - (view) Author: Danny Yoo (dyoo) Date: 2005-06-21 06:10
The attached code snippet 'testcase.py' should produce an 
IndentationError, but does not.  The code in tokenize.py is too 
trusting, and needs to add a check against bad indentation as it 
yields DEDENT tokens.

I'm including a diff to tokenize.py that should at least raise an 
exception on bad indentation like this.

Just in case, I'm including testcase.py here too:
------
import tokenize
from StringIO import StringIO
sampleBadText = """
def foo():
    bar
  baz
"""
print list(tokenize.generate_tokens(
    StringIO(sampleBadText).readline))
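
For readers without the attached diff, here is a minimal sketch of the
kind of check being proposed (illustrative only; the function name,
message text and call signature are not taken from the actual
tokenize.py.diff): while popping indentation levels to emit DEDENT
tokens, the column being dedented to must match some level already on
the indents stack.
------
# Illustrative sketch only -- not the attached tokenize.py.diff.
# 'indents' is the stack of indentation columns the tokenizer keeps,
# 'column' is the indentation of the line currently being processed.
def check_dedent(indents, column, lnum, pos, line):
    while column < indents[-1]:
        if column not in indents:
            # e.g. column 2 while the stack holds [0, 4], which is
            # exactly what the 'baz' line in testcase.py produces
            raise IndentationError(
                "unindent does not match any outer indentation level",
                ("<tokenize>", lnum, pos, line))
        indents.pop()   # the real tokenizer yields a DEDENT token here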
msg25596 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2005-06-21 07:54
Fixed.  
See Lib/tokenize.py 1.38 and 1.36.4.1
msg25597 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2005-09-02 12:10
Reopening this bug report: this might fix the problem at
hand, but it breaks inspect.getsource() on cases where it
used to work.  See attached example.
msg25598 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2005-09-02 12:40
Here is a proposed patch.  It relaxes the dedent policy a
bit.  It assumes that the first line may already have some
initial indentation, as is the case when tokenizing from the
middle of a file (as inspect.getsource() does).
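
To make the scenario concrete, here is a small hypothetical example
(not the attached breaking-getsource.py) showing that the text
inspect.getsource() hands to tokenize legitimately starts with some
initial indentation when the object lives inside a class or another
function:
------
# Hypothetical illustration (not the attached breaking-getsource.py);
# run it from a saved .py file so inspect.getsource() can find the code.
import inspect
import tokenize
from StringIO import StringIO

class A:
    def f(self):
        return 42

src = inspect.getsource(A.f)
print repr(src.splitlines()[0])    # '    def f(self):' -- already indented
# tokenize has to cope with text like this that does not start at column 0.
for tok in tokenize.generate_tokens(StringIO(src).readline):
    pass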

It should also be back-ported to 2.4, given that the
previous patch was.  For 2.4, only the non-test part of the
patch applies cleanly; I suggest ignoring the test part and
just applying the rest, given that 2.5 has many more tests
for inspect.getsource() anyway.

Since the whole issue of inspect.getsource() is muddy anyway, I
will go ahead and check this patch in unless someone spots a
problem.  For now, the previously applied patch makes parts
of PyPy break with an uncaught IndentationError.
msg25599 - (view) Author: Kurt B. Kaiser (kbk) * (Python committer) Date: 2006-08-10 01:40
Tokenize rev 39046 (21 Jun 2005) breaks tabnanny.

tabnanny doesn't handle the IndentationError exception
that tokenize now raises when it detects an inconsistent dedent.

I patched up ScriptBinding.py in IDLE.  The
IndentationError probably should pass the same parameters as
TokenError, and tabnanny should catch it.
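
A self-contained sketch of what callers like tabnanny now have to be
prepared for (this is not the actual ScriptBinding.py or tabnanny
change): generate_tokens() can raise IndentationError in addition to
tokenize.TokenError, so it needs to be caught alongside it.
------
# Sketch only -- not the actual IDLE or tabnanny fix.
import tokenize
from StringIO import StringIO

bad = "def foo():\n    bar\n  baz\n"    # same shape as testcase.py
try:
    for tok in tokenize.generate_tokens(StringIO(bad).readline):
        pass
except tokenize.TokenError, msg:
    print "Token Error:", msg
except IndentationError, msg:
    print "Indentation Error:", msg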
msg25600 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2006-08-14 21:34
tabnanny's been taken care of in r51284.
History
Date                 User   Action  Args
2022-04-11 14:56:11  admin  set     github: 42104
2005-06-21 06:10:00  dyoo   create