classification
Title: Add col information to parse & ast nodes
Type: Stage:
Components: Interpreter Core Versions: Python 2.5
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: loewis Nosy List: jpe, loewis
Priority: normal Keywords: patch

Created on 2006-02-28 21:36 by jpe, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
col-offset.diff jpe, 2006-03-01 21:50 Another upload try
Messages (4)
msg49610 - Author: John Ehresman (jpe) * Date: 2006-02-28 21:36
This adds fields to the parser to capture the column
where each token starts and each ast node starts (this
is defined as the initial token in the ast node).  With
this it's reasonably easy to extract the text that ast
nodes are based on.

The patch is incomplete, will probably change a bit,
and lacks tests, but I wanted to get feedback on a few
questions.

* The byte offset of the column position is what is
being recorded.  I wonder now if the Unicode character
position should be recorded instead.  This will slow things
down somewhat, but the performance loss may not be
significant.

* I changed the signature of PyNode_AddChild and
PyParse_AddToken.  Is this permitted, or do new
functions need to be created so that the old signatures
are preserved?

* Where should I put a function that given an ast tree
and the source text will add the text that each node is
based on?  This will be a Python function (I'm pretty
sure) so it's not easily put in the _ast module.

Note that generated files are omitted from the patch.
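
For the third question, here is a minimal sketch of such a helper in Python. It is not the patch's code: it assumes a modern CPython (3.8 or later), where nodes also carry `end_lineno` and `end_col_offset` and where `ast.get_source_segment` exists. The function name `annotate_source_text` and the `source_text` attribute are hypothetical, chosen only for illustration.

```python
import ast


def annotate_source_text(tree, source):
    """Attach a hypothetical `source_text` attribute to every located node."""
    for node in ast.walk(tree):
        # Only statements, expressions and a few other nodes carry positions.
        if hasattr(node, "lineno"):
            node.source_text = ast.get_source_segment(source, node)
    return tree


source = "x = foo.bar + 1\n"
tree = annotate_source_text(ast.parse(source), source)
assign = tree.body[0]    # the whole assignment statement
binop = assign.value     # the expression `foo.bar + 1`
print(repr(assign.source_text), repr(binop.source_text), binop.col_offset)
```

With start positions alone, as in this patch, the end of each node would have to be recovered some other way, for example from the start of the following token.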
msg49611 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2006-02-28 22:39
- the byte offset is actually a UTF-8 byte offset. That should be documented in
the grammar, and perhaps elaborated in libast.tex.
- changing the signatures is fine; it is unlikely that anybody calls this API, and if 
they do, the compiler will tell them.
- applications of the AST should go into Demo/parser.
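
To illustrate the difference between a UTF-8 byte offset and a character offset, here is a small sketch. It relies on the current `ast` module, where `col_offset` is documented as a UTF-8 byte offset. The helper name `byte_col_to_char_col` is hypothetical.

```python
import ast


def byte_col_to_char_col(line: str, byte_col: int) -> int:
    """Convert a UTF-8 byte offset within `line` to a character offset."""
    prefix = line.encode("utf-8")[:byte_col]
    return len(prefix.decode("utf-8"))


# "é" takes two bytes in UTF-8, so byte and character columns diverge after it.
source = 's = "é"; y = 1\n'
tree = ast.parse(source)
second_stmt = tree.body[1]    # the `y = 1` assignment
line = source.splitlines()[second_stmt.lineno - 1]
char_col = byte_col_to_char_col(line, second_stmt.col_offset)
print(second_stmt.col_offset, char_col, line[char_col])
```

Recording the byte offset keeps the parser fast. The conversion to characters can be done on demand by whoever needs it.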
msg49612 - Author: John Ehresman (jpe) * Date: 2006-03-01 20:54
Updated patch that includes some tests and documentation.
The slightly tricky part is the col_offset of an Attribute
node -- it was being set to the start of the attribute, i.e.
after the initial name.  Now it points to the start of the
initial name.  I think we need to wait for some use cases to
determine if any more positional information is needed.  I
suspect some uses may want the positions of each identifier,
which is not easily obtainable right now.

Includes change to asdl.py to return attributes in the order
specified in the .asdl file.
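
The Attribute behaviour described above can be checked with a short sketch. It uses the present-day `ast` module rather than the patched 2.5 tree, so it assumes that current CPython follows the same convention.

```python
import ast

source = "x = foo.bar.baz\n"
assign = ast.parse(source).body[0]
outer = assign.value    # Attribute node for `foo.bar.baz`
inner = outer.value     # Attribute node for `foo.bar`
name = inner.value      # Name node for `foo`

# Every node in the chain starts at the initial name `foo`, not at its own
# attribute identifier.
offsets = [outer.col_offset, inner.col_offset, name.col_offset]
print(offsets)
```

This is why the position of each individual identifier (`bar`, `baz`) is not directly available from `col_offset` alone.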
msg49613 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2006-03-02 00:07
Thanks for the patch. Committed as 42753
History
Date User Action Args
2022-04-11 14:56:15 admin set github: 42955
2006-02-28 21:36:24 jpe create