Issue 1022173
Created on 2004-09-04 00:54 by rhettinger, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files
File name | Uploaded | Description
---|---|---
string.diff | rhettinger, 2004-09-04 00:54 | Diff for string.py
Messages (13)
msg46845 | Author: Raymond Hettinger (rhettinger) | Date: 2004-09-04 00:54
Report line number and token rather than just character position. Detect and report situations where non-ASCII alphabetic characters are used in a placeholder name. Currently, this situation results in a silent error for SafeTemplates and either a KeyError or a mis-substitution for Templates. Does not change the API or existing tests.
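The proposal above can be sketched roughly as follows. This is not the contents of string.diff: `check_placeholders` is a hypothetical name, and the sketch assumes Python 3.7+'s `string.Template`, whose `pattern` attribute exposes the compiled placeholder regex. It applies the rule Raymond suggests later in the thread: an unbraced placeholder immediately followed by a letter or digit outside _A-Za-z0-9 is reported with its line number and full token.

```python
from string import Template


def check_placeholders(text: str) -> None:
    """Hypothetical validator (not the code in string.diff).

    Raises ValueError when an unbraced $name placeholder is immediately
    followed by a letter or digit that Template's ASCII-only identifier
    pattern did not consume, e.g. the 'ñ' in '$mañana'.
    """
    for mo in Template.pattern.finditer(text):
        if mo.group("named") is None:
            continue  # '$$', '${name}' and invalid '$' forms are skipped
        end = mo.end()
        if end < len(text) and text[end].isalnum():
            # Extend to the whole word the user most likely meant.
            token_end = end
            while token_end < len(text) and (
                text[token_end].isalnum() or text[token_end] == "_"
            ):
                token_end += 1
            token = text[mo.start():token_end]
            # 1-based line number of the placeholder (counts '\n' only).
            line = text.count("\n", 0, mo.start()) + 1
            raise ValueError(
                f"line {line}: placeholder {token!r} contains characters "
                "other than _A-Za-z0-9"
            )
```

For example, `check_placeholders("hola\nvamanos $mañana")` raises `ValueError: line 2: placeholder '$mañana' contains characters other than _A-Za-z0-9`, while `${ma}ñana` passes because the braces mark the boundary explicitly.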
msg46846 | Author: Barry A. Warsaw (barry) | Date: 2004-09-04 17:15
I wonder about this patch. PEP 292 clearly says that the first non-identifier character terminates the placeholder. So why would you expect the eñe to cause an exception to be raised instead of a valid substitution for $ma? Will discuss on python-dev, but in any event, if we accept this patch we would need a unittest update, as well as documentation and PEP 292 updates.
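Barry's reading can be checked against the released `string.Template` (Python 3.7+ assumed; the thread predates the final API, which uses a `substitute()` method rather than the % operator). A minimal illustration:

```python
from string import Template

t = Template("vamanos $mañana o esté dia")

# '$ma' is the placeholder: 'ñ' is the first non-identifier character,
# so 'ñana' is kept as literal text.
print(t.substitute({"ma": "XX"}))                  # vamanos XXñana o esté dia

# Braces make the same boundary explicit.
print(Template("${ma}ñana").substitute(ma="XX"))  # XXñana
```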
msg46847 | Author: Raymond Hettinger (rhettinger) | Date: 2004-09-04 19:43
When a user has their locale set so that a different alphabet is enabled, they see $mañana as a single token. To them, the third character is no more out of place than the letter "d" is to us. In such a case, SafeTemplate will let the error pass silently. Bug report:
"""SafeTemplate broke. I see placeholder in dictionary but it no substitute. Please fix.
>>> SafeTemplate("vamanos $mañana o esté dia") % {'mañana':'ahora'}
u'vamanos $mañana o esté dia'
"""
The templates are likely to be exposed to end users (non-programmers), so the above is not an unlikely scenario. We should give users as much help as possible. Yes, tests and docs will be updated if the patch is accepted; it's a waste of time to do so now if you think that $ma was the intended placeholder and want the silent error to pass. Also, the line number and token are an important part of the error message: in a long template, it is useless to say that there is an error at position 23019.
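The silent failure in the mock bug report can be reproduced with the released API, assuming that `SafeTemplate` in this thread roughly corresponds to `Template.safe_substitute()` in Python 3.7+:

```python
from string import Template

t = Template("vamanos $mañana o esté dia")
mapping = {"mañana": "ahora"}

# safe_substitute() leaves unknown placeholders ('$ma') untouched,
# so the intended substitution silently does not happen.
print(t.safe_substitute(mapping))  # vamanos $mañana o esté dia

# substitute() looks up only 'ma' and raises KeyError.
try:
    t.substitute(mapping)
except KeyError as exc:
    print("KeyError:", exc)        # KeyError: 'ma'
```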
msg46849 | Author: Barry A. Warsaw (barry) | Date: 2004-09-05 15:34
Why would the locale have any effect on what Python defines an identifier as? The PEP and documentation clearly state that the substitution variables are Python identifiers, and that's a well-defined, locale-neutral concept. The resolution of your hypothetical bug report is: Won't Fix. "mañana" is not a Python identifier. You can't use it as a variable in regular Python code, and you can't use it as a placeholder in a Template string.
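Barry's statement reflects Python 2.4, where identifiers were ASCII-only. Since Python 3 (PEP 3131), "mañana" is a valid identifier, yet `string.Template`'s default `idpattern` (Python 3.7+) remains ASCII-only; neither rule consults the locale. A small comparison, compiling `Template.idpattern` with `Template.flags` roughly as `Template` itself does:

```python
import re
from string import Template

# Template's default placeholder-name pattern, compiled with its flags
# (re.IGNORECASE); the (?a:...) group keeps it ASCII-only.
placeholder_name = re.compile(Template.idpattern, Template.flags)

for name in ["ma", "mañana", "_x1"]:
    as_placeholder = placeholder_name.fullmatch(name) is not None
    print(f"{name!r}: placeholder={as_placeholder}, identifier={name.isidentifier()}")
# 'ma': placeholder=True, identifier=True
# 'mañana': placeholder=False, identifier=True
# '_x1': placeholder=True, identifier=True
```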
msg46850 | Author: Raymond Hettinger (rhettinger) | Date: 2004-09-05 19:19
Sure, it is documented that way. That doesn't mean that we can't give a useful error message when a potentially common end-user mistake is made. The locale has no effect on valid Python identifiers; however, it is a strong indicator of what the user expects to be valid alphabetic characters. The idea is to avoid a silent failure for non-programmer end users who may understandably not know that some of their everyday characters will be treated as delimiters by the template logic. As it stands, it is a usability bug (documented, but a problem nevertheless). MvL concurred when I discussed it with him two weeks ago.
msg46851 | Author: Raymond Hettinger (rhettinger) | Date: 2004-09-05 19:30
Martin, I'm failing to articulate something that seems obvious to me. Can you add your thoughts on the most user-friendly way to treat a placeholder like $mañana in a Latin locale? Currently, it captures $ma and proceeds. My thought is to raise a ValueError noting that $mañana contains characters other than _A-Za-z0-9.
msg46852 | Author: Martin v. Löwis (loewis) | Date: 2004-09-05 21:37
I think the locale should have no effect whatsoever on templates. The template most likely uses the encoding of the source code, which may or may not be the encoding of the locale at run time. In many cases it won't be, because the run-time locale will be "C" when locale.setlocale has not been called. Of course, it might be possible to state an explicit set of termination characters (e.g. all ASCII punctuation and whitespace) and mandate that a placeholder either terminates with one of these or uses explicit braces. That would mean that the only requirement is that the source encoding is an ASCII superset, which is a requirement anyway. Whether such a change to the PEP is still possible at this point in time, I don't know.
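Martin's alternative rule can be sketched as a hypothetical check against Python 3.7+'s `string.Template.pattern` (`check_terminators` and `TERMINATORS` are invented names, and the rule is his suggestion, not PEP 292 behavior): an unbraced placeholder must be followed by ASCII whitespace, ASCII punctuation, or the end of the text, and anything else must use the braced `${name}` form.

```python
import string
from string import Template

# Hypothetical termination set: ASCII whitespace and ASCII punctuation.
TERMINATORS = frozenset(string.whitespace + string.punctuation)


def check_terminators(text: str) -> None:
    """Hypothetical check: reject $name placeholders that end on a
    character outside TERMINATORS (end of text is always allowed)."""
    for mo in Template.pattern.finditer(text):
        name = mo.group("named")
        if name is None:
            continue  # '${name}', '$$' and invalid forms are not checked
        end = mo.end()
        if end < len(text) and text[end] not in TERMINATORS:
            raise ValueError(
                f"placeholder at offset {mo.start()} must end with ASCII "
                f"whitespace or punctuation, or be written as ${{{name}}}"
            )
```

Under this rule `"$mañana"` is rejected with a hint to write `${ma}`, while `"Hello $name, bye"` and `"${ma}ñana"` are accepted. Because only ASCII characters are listed, it needs no knowledge of the locale, which is the point of Martin's suggestion.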
msg46853 | Author: Raymond Hettinger (rhettinger) | Date: 2004-09-05 22:41
Can you recommend a locale-independent approach to detecting alphabetic characters outside of A-Za-z? For SafeTemplates especially, capturing only $ma is a small disaster.
msg46854 | Author: Martin v. Löwis (loewis) | Date: 2004-09-05 23:22
If the string is a Unicode string, you can use .isalpha(). If it is a byte string, then it is impossible to determine letters (strictly speaking, it is then impossible to determine even ASCII letters).
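Martin's point about Unicode strings versus byte strings can be seen directly in Python 3, where `str.isalpha()` uses Unicode character properties while `bytes.isalpha()` recognizes only ASCII letters. The byte value and codecs below are illustrative choices:

```python
# str: letters are defined by Unicode properties, not by the locale.
print("ñ".isalpha())                        # True
print("mañana".isalpha())                   # True

# bytes: isalpha() only recognizes the ASCII letters a-z and A-Z.
print(b"\xf1".isalpha())                    # False

# What byte 0xF1 "is" depends entirely on the encoding you assume.
print(b"\xf1".decode("latin-1"))            # ñ (a letter)
print(b"\xf1".decode("cp1251"))             # с (Cyrillic small letter es)
print(b"\xf1".decode("cp437").isalpha())    # False (0xF1 is '±' in CP437)
```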
msg46855 | Author: Raymond Hettinger (rhettinger) | Date: 2004-09-07 04:49
Put new alternate versions in nondist\sandbox\string. They have improved line-number logic that recognizes various types of line endings and remove the locale-specific alphabet tests. They now use the Unicode definitions of alphanumeric characters to detect and report a broad class of usability errors that will likely affect non-programmer end users. By using the Unicode definitions, the code is no longer locale-sensitive and will give identical results across platforms. The sandbox code also includes doctests. Currently, it is in a form for class-based __call__ invocation or for function invocation; it is easily adapted to the __mod__ form if that is the final decision.
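The improved line-number logic can be approximated with a helper that treats `\r\n`, `\r` and `\n` each as one line break. This is a hypothetical sketch (`line_and_column` is an invented name), not the sandbox code; note that `str.splitlines()` recognizes further separators such as `\x0c` and `\u2028`, which this helper deliberately ignores.

```python
import re
from typing import Tuple

# Hypothetical helper: '\r\n', '\r' and '\n' each count as one line break.
# '\r\n' is listed first so a Windows line ending is not counted twice.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def line_and_column(text: str, offset: int) -> Tuple[int, int]:
    """Return the 1-based (line, column) of a character offset in text."""
    line = 1
    line_start = 0
    # Only scan the text before the offset.
    for mo in _LINE_BREAK.finditer(text, 0, offset):
        line += 1
        line_start = mo.end()
    return line, offset - line_start + 1
```

For `"a\r\nb\rc\n$x"` the `$` sits at line 4, column 1, whereas counting only `"\n"` characters would report line 3 because the bare `\r` is missed.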
msg46856 | Author: Raymond Hettinger (rhettinger) | Date: 2004-09-07 05:24
There are now three alternatives in the sandbox. In order of preference:
  alt292.py: function invocation
  curry292.py: Guido's __call__ invocation
  mod292.py: version using the % operator
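The three calling conventions can be compared with thin wrappers around the released `string.Template` (Python 3 assumed). The contents of alt292.py, curry292.py and mod292.py are not shown in this thread, so `substitute`, `CallableTemplate` and `ModTemplate` below are hypothetical names that illustrate only the invocation styles:

```python
from string import Template


def substitute(template: str, mapping: dict) -> str:
    """Function invocation style (hypothetical): substitute(s, mapping)."""
    return Template(template).substitute(mapping)


class CallableTemplate(Template):
    """__call__ invocation style (hypothetical): CallableTemplate(s)(mapping)."""

    def __call__(self, mapping: dict) -> str:
        return self.substitute(mapping)


class ModTemplate(Template):
    """% operator style (hypothetical): ModTemplate(s) % mapping."""

    def __mod__(self, mapping: dict) -> str:
        return self.substitute(mapping)


m = {"who": "world"}
print(substitute("hello $who", m))        # hello world
print(CallableTemplate("hello $who")(m))  # hello world
print(ModTemplate("hello $who") % m)      # hello world
```

For reference, the released string module exposes methods instead, `Template(s).substitute(mapping)` and `Template(s).safe_substitute(mapping)`, rather than any of these three forms.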
msg46857 | Author: Raymond Hettinger (rhettinger) | Date: 2004-09-10 06:28
Moved the conversation to python-dev at Barry's request.
History
Date | User | Action | Args
---|---|---|---
2022-04-11 14:56:06 | admin | set | github: 40867
2004-09-04 00:54:52 | rhettinger | create |