classification
Title: Improve Template error detection and reporting
Type: Stage:
Components: Library (Lib) Versions: Python 2.4
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: barry, loewis, rhettinger
Priority: high Keywords: patch

Created on 2004-09-04 00:54 by rhettinger, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
string.diff rhettinger, 2004-09-04 00:54 Diff for string.py
Messages (13)
msg46845 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-09-04 00:54
Report line number and token rather than just character
position.

Detect and report situations where non-ASCII alphabet
characters are used in a placeholder name.
Currently, this situation results in a silent error for
SafeTemplates and either a KeyError or mis-substitution
for Templates.

Does not change the API or existing tests.
msg46846 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2004-09-04 17:15
I wonder about this patch.  PEP 292 clearly says that the
first non-identifier character terminates the placeholder. 
So why would you expect that the eñe would cause an
exception to be raised instead of a valid substitution for $ma?

Will discuss on python-dev, but in any event, if we accept
this patch we would need a unittest update, as well as
documentation and PEP 292 updates.
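
The termination rule described above can be reproduced with the `string.Template` class that ships with current Python. This is a minimal sketch, assuming the modern `substitute` and `safe_substitute` methods rather than the `%` operator and `SafeTemplate` class debated in this thread; the template text is illustrative.

```python
from string import Template

t = Template("vamos $mañana")

# The placeholder name ends at the first character outside [_a-zA-Z0-9],
# so Template sees the name "ma", not "mañana".
print(t.substitute(ma="X"))

# safe_substitute leaves unknown placeholders untouched, so the mismatch
# between "ma" and the intended key "mañana" passes silently.
print(t.safe_substitute({"mañana": "ahora"}))

# substitute raises KeyError for the unexpected name "ma".
try:
    t.substitute({"mañana": "ahora"})
except KeyError as exc:
    print("KeyError:", exc)
```

The second call is the silent failure under discussion: the text comes back unchanged and nothing tells the template author why.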
msg46847 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-09-04 19:43
When a user has their locale set so that a different alphabet 
is enabled, they see $mañana as a single token.  To 
them, the third character is not out of place with the rest -- 
any more than we think of the letter "d" as being special.  
In such a case, SafeTemplate will pass the error silently.

Bug Report:
"""SafeTemplate broke.  I see placeholder in dictionary but it 
no substitute.  Please fix.

>>> SafeTemplate("vamanos $mañana o esté dia") % {'mañana':'ahora'}
u'vamanos $mañana o esté dia'

"""

The templates are likely to be exposed to end users (non-
programmers).  The above is not an unlikely scenario.  We 
should give the users as much help as possible.  

Yes, tests and docs will be updated if accepted.  It's a waste 
of time to do so now if you think that $ma was the intended 
placeholder and want the silent error to pass.

Also, the line number / token is an important part of the 
error message.  In a long template, it is useless to say that 
there is an error at position 23019 for example.
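
To illustrate the line-number point, here is a hedged sketch of turning a raw character offset into a line and column. The helper name `line_and_column` is hypothetical and is not taken from the attached patch; it assumes only that `\n`, `\r\n`, and `\r` count as line endings.

```python
import re

# Line endings recognized by this sketch: Windows, old Mac, and Unix.
_NEWLINE = re.compile(r"\r\n|\r|\n")

def line_and_column(text, pos):
    """Translate a 0-based character offset into a 1-based (line, column)."""
    line, line_start = 1, 0
    # Only line endings that finish before `pos` move the position down a line.
    for m in _NEWLINE.finditer(text, 0, pos):
        line += 1
        line_start = m.end()
    return line, pos - line_start + 1

sample = "a\r\nbb\rccc $x"
print(line_and_column(sample, sample.index("$")))
```

An error message built from the returned pair lets the author of a long template jump straight to the offending line instead of counting characters.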
msg46849 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2004-09-05 15:34
Why would the locale have any effect on what Python defines
an identifier as?  The PEP and documentation clearly state
that the substitution variables are Python identifiers and
that's a well-defined, locale-neutral concept.

The resolution of your hypothetical bug report is:

Won't Fix -- "mañana" is not a Python identifier.  You can't
use it as a variable in regular Python code, and you can't
use it as a placeholder in a Template string.
msg46850 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-09-05 19:19
Sure, it is documented that way.  That doesn't mean that we
can't give a useful error message when a potentially common
end-user mistake is made.

The locale has no effect on valid Python identifiers;
however, it is a strong indicator of what the user expects
to be valid alphabetical characters.  The idea is to avoid a
silent failure for non-programmer end users who may
understandably not know that some of their everyday
characters will be viewed as delimiters by the template
logic.  As it stands, it is a usability bug (documented, but
a problem nevertheless).

MvL concurred when I discussed with him two weeks ago.

msg46851 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-09-05 19:30
Martin, I'm failing to articulate something that seems
obvious to me.  Can you add your thoughts on the most
user-friendly way to treat a placeholder like $mañana in a
Latin locale?

Currently, it captures $ma and proceeds.  My thought is to
raise a ValueError noting that $mañana contains characters
other than _A-Za-z0-9.
msg46852 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2004-09-05 21:37
I think the locale should have no effect whatsoever on
templates. The template most likely uses the encoding of the
source code, which may or may not be the encoding of the locale
at run-time. In many cases, it won't be, as the run-time locale
will be "C" because locale.setlocale has not been called.

Of course, it might be possible to state an explicit set of
termination characters (e.g. all ASCII punctuation and
whitespace) and mandate that the template either terminates
with one of these, or uses explicit parentheses. That would
mean that the only requirement is that the source encoding
is an ASCII superset, which is a requirement, anyway.

Whether such a change to the PEP is still possible at this
point in time, I don't know.
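
One possible reading of the explicit-terminator idea is sketched below. It is not from any patch in this thread: the names `TERMINATORS` and `check_placeholders` are hypothetical, and the terminator set (ASCII whitespace and punctuation, minus the underscore) is an assumption about what such a rule might look like. Braced placeholders are simply skipped.

```python
import re
import string

# Assumed terminator set: ASCII whitespace and punctuation, except "_",
# which is legal inside a placeholder name.
TERMINATORS = string.whitespace + string.punctuation.replace("_", "")

# "$$" is an escape; otherwise a token runs up to the next terminator.
_TOKEN = re.compile(r"\$(?:\$|([^%s]+))" % re.escape(TERMINATORS))
_IDENT = re.compile(r"[_A-Za-z][_A-Za-z0-9]*\Z")

def check_placeholders(template):
    """Return the placeholder names, or raise ValueError on a malformed one."""
    names = []
    for m in _TOKEN.finditer(template):
        token = m.group(1)
        if token is None:          # matched the "$$" escape
            continue
        if not _IDENT.match(token):
            raise ValueError(
                "invalid placeholder $%s at position %d" % (token, m.start()))
        names.append(token)
    return names

print(check_placeholders("vamos $hoy, $x_1!"))
```

Under this rule `$mañana` is read as one token and rejected as a whole, instead of being silently truncated to `$ma`.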
msg46853 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-09-05 22:41
Can you recommend a non-locale sensitive approach to
detecting alphabetic characters outside of A-Za-z?

For SafeTemplates especially, capturing only $ma is a small
disaster.
msg46854 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2004-09-05 23:22
If the string is a Unicode string, you can use .isalpha. If
it is a byte string, then it is impossible to determine
letters (strictly speaking, it is even impossible to
determine ASCII letters then).
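
The distinction between Unicode and byte strings can be shown in a few lines. This sketch assumes Python 3, where `str` is always Unicode and the letter test is `str.isalpha`; the two codecs are chosen only as examples of encodings that disagree about the same byte.

```python
# For Unicode text the character database answers the question
# without consulting the locale.
print("\xf1".isalpha())   # U+00F1 is LATIN SMALL LETTER N WITH TILDE
print("$".isalpha())

# For a byte string the answer depends on an encoding the bytes do not carry.
raw = b"\xf1"
as_latin1 = raw.decode("latin-1")   # the letter n with tilde
as_cp437 = raw.decode("cp437")      # the plus-minus sign
print(as_latin1.isalpha(), as_cp437.isalpha())
```

The same byte is a letter under one encoding and a symbol under the other, which is why a byte-string template cannot be checked reliably.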
msg46855 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-09-07 04:49
Put new alternate versions in nondist\sandbox\string.

Has improved line number logic that recognizes various types
of line endings.

Removes the locale specific alphabet tests.  Now uses the
unicode definitions of alphanumeric characters to detect and
report a broad class of usability errors that will likely
affect non-programmer end users.

By using the unicode definitions, the code is no longer
locale sensitive and will provide identical cross-platform
results.

The sandbox code also includes doctests.

Currently, it is in a form for class-based __call__
invocation or for function invocation.  It is easily adapted to
the __mod__ form if that is the final decision.
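
The sandbox files are not reproduced in this thread, so the following is only a rough sketch of the approach described above, not the actual code. It assumes Python 3, where `\w` in a `str` pattern follows the Unicode definition of alphanumerics; `find_suspect_placeholders` is a hypothetical name.

```python
import re

# "$$" is an escape. Otherwise: an ASCII identifier after "$", followed by
# any further word characters. Group 2 is non-empty only when the ASCII
# name runs straight into a non-ASCII alphanumeric character.
_PLACEHOLDER = re.compile(r"\$(?:\$|([_A-Za-z][_A-Za-z0-9]*)(\w*))")
_NEWLINE = re.compile(r"\r\n|\r|\n")

def find_suspect_placeholders(template):
    """Yield (line, token) for placeholders that would be cut short."""
    for m in _PLACEHOLDER.finditer(template):
        if m.group(2):
            # Count line endings before the match to get a 1-based line number.
            line = 1 + len(_NEWLINE.findall(template, 0, m.start()))
            yield line, "$" + m.group(1) + m.group(2)

text = "hola $hoy\r\nvamos $ma\xf1ana o $x\n"
for line, token in find_suspect_placeholders(text):
    print("line %d: suspicious placeholder %s" % (line, token))
```

Because the check relies on Unicode character properties rather than the locale, it gives the same answer on every platform. A caller could raise `ValueError` on the first hit or collect all of them for a report.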
msg46856 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-09-07 05:24
There are now three alternatives in the sandbox.  In order
of preference:

alt292.py     Function invocation
curry292.py  Guido's __call__ invocation
mod292.py    Version using the % operator
msg46857 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-09-10 06:28
Moved the conversation to python-dev at Barry's request.
History
Date User Action Args
2022-04-11 14:56:06 admin set github: 40867
2004-09-04 00:54:52 rhettinger create