Issue 676346
Created on 2003-01-28 20:59 by dmgrime, last changed 2022-04-10 16:06 by admin. This issue is now closed.
Files
File name | Uploaded | Description
---|---|---
func.py | dmgrime, 2003-01-28 21:00 | Test Case
Messages (4)

msg14292 | Author: David M. Grimes (dmgrime) | Date: 2003-01-28 20:59
When performing a string formatting operation using %s and a unicode argument, the mapping lookup for that argument is performed more than once. In certain environments (see example) this leads to excessive calls.

It seems Python-2.2.2:Objects/stringobject.c:3394 is where PyObject_GetItem is used (for dictionary-like formatting args). Later, at :3509, there is a "goto unicode" when a string argument is actually unicode. At this point, everything resets and we do it all over again in PyUnicode_Format. There is an underlying assumption that the cost of the call to PyObject_GetItem is very low (since we're going to do them all again for unicode).

We've got a Python-based templating system which uses a very simple Mix-In class to facilitate flexible page generation. At the core is a simple __getitem__ implementation which maps calls to getattr():

    class mixin:
        def __getitem__(self, name):
            print '%r::__getitem__(%s)' % (self, name)
            hook = getattr(self, name)
            if callable(hook):
                return hook()
            else:
                return hook

Obviously, the print is diagnostic. So, this basic mechanism allows one to write hierarchical templates, filling in content found in "%(xxxx)s" escapes with functions returning strings. It has worked extremely well for us. BUT, we recently did some XML-based work which uncovered this strange unicode behaviour. Given the following classes:

    class w1u(mixin):
        v1 = u'v1'

    class w2u(mixin):
        def v2(self):
            return '%(v1)s' % w1u()

    class w3u(mixin):
        def v3(self):
            return '%(v2)s' % w2u()

    class w1(mixin):
        v1 = 'v1'

    class w2(mixin):
        def v2(self):
            return '%(v1)s' % w1()

    class w3(mixin):
        def v3(self):
            return '%(v2)s' % w2()

And test case:

    print 'All string:'
    print '%(v3)s' % w3()
    print
    print 'Unicode injected at w1u:'
    print '%(v3)s' % w3u()
    print

As we can see, the only difference between the w{1,2,3} and w{1,2,3}u classes is that w1u defines v1 as unicode where w1 uses a "normal" string.
What we see is that the string-based one shows 3 calls, as expected:

    All string:
    <__main__.w3 instance at 0x8150524>::__getitem__(v3)
    <__main__.w2 instance at 0x814effc>::__getitem__(v2)
    <__main__.w1 instance at 0x814f024>::__getitem__(v1)
    v1

But the unicode causes a tree-like recursion, in which each nesting level doubles the lookups (w3u is fetched 2 times, w2u 4 times, w1u 8 times):

    Unicode injected at w1u:
    <__main__.w3u instance at 0x8150524>::__getitem__(v3)
    <__main__.w2u instance at 0x814effc>::__getitem__(v2)
    <__main__.w1u instance at 0x814f024>::__getitem__(v1)
    <__main__.w1u instance at 0x814f024>::__getitem__(v1)
    <__main__.w2u instance at 0x814effc>::__getitem__(v2)
    <__main__.w1u instance at 0x814f024>::__getitem__(v1)
    <__main__.w1u instance at 0x814f024>::__getitem__(v1)
    <__main__.w3u instance at 0x8150524>::__getitem__(v3)
    <__main__.w2u instance at 0x814effc>::__getitem__(v2)
    <__main__.w1u instance at 0x814f024>::__getitem__(v1)
    <__main__.w1u instance at 0x814f024>::__getitem__(v1)
    <__main__.w2u instance at 0x814effc>::__getitem__(v2)
    <__main__.w1u instance at 0x814f024>::__getitem__(v1)
    <__main__.w1u instance at 0x814f024>::__getitem__(v1)
    v1

I'm sure this isn't a "common" use of the string formatting mechanism, but it seems that evaluating the arguments multiple times could be a bad thing. It certainly is for us 8^)

We're running this on a RedHat 7.3/8.0 setup, not that it appears to matter (from looking in stringobject.c). It also appears to still be a problem in 2.3a1.

Any comments? Help? Questions?
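The doubling in the reported output can be reproduced with a toy model in Python 3. This is a simplified sketch, not CPython's stringobject.c: `U` is a hypothetical stand-in for a Python 2 unicode object, and `model_format` is a hypothetical formatter that only understands `%(name)s`. Like the 2.2 code path described in the report, it abandons its byte-string pass and re-fetches every key as soon as one looked-up value turns out to be "unicode".

```python
import re


# Hypothetical stand-in for a Python 2 `unicode` object: a str subclass
# used only as a marker that triggers the model's fallback path.
class U(str):
    pass


_SPEC = re.compile(r"%\((\w+)\)s")  # only %(name)s is modelled


def model_format(fmt, mapping):
    """Toy model of 2.2-era `str % mapping` (NOT the real C code)."""
    parts, pos = [], 0
    for m in _SPEC.finditer(fmt):
        value = mapping[m.group(1)]  # first fetch of this key
        if isinstance(value, U):
            # Model of "goto unicode": discard the partial result and
            # restart, fetching every key again.
            return _unicode_format(fmt, mapping)
        parts.append(fmt[pos:m.start()])
        parts.append(str(value))
        pos = m.end()
    parts.append(fmt[pos:])
    return "".join(parts)


def _unicode_format(fmt, mapping):
    # Second pass: every key is looked up again; the result is "unicode".
    return U(_SPEC.sub(lambda m: str(mapping[m.group(1)]), fmt))


class Mixin:
    calls = []  # shared log of (class name, key) lookups

    def __getitem__(self, name):
        Mixin.calls.append((type(self).__name__, name))
        hook = getattr(self, name)
        return hook() if callable(hook) else hook


class W1(Mixin):
    v1 = "v1"


class W2(Mixin):
    def v2(self):
        return model_format("%(v1)s", W1())


class W3(Mixin):
    def v3(self):
        return model_format("%(v2)s", W2())


class W1u(Mixin):
    v1 = U("v1")  # the only difference: a "unicode" value


class W2u(Mixin):
    def v2(self):
        return model_format("%(v1)s", W1u())


class W3u(Mixin):
    def v3(self):
        return model_format("%(v2)s", W2u())
```

Calling `model_format("%(v3)s", W3())` logs 3 lookups. `model_format("%(v3)s", W3u())` logs 14: W3u twice, W2u four times and W1u eight times, which matches the counts in the report. Each level that receives a "unicode" result restarts, so the work at depth k grows as 2^k.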

msg14293 | Author: Marc-Andre Lemburg (lemburg) * | Date: 2003-01-28 22:23
I don't see how you can avoid fetching the Unicode argument a second time without restructuring the formatting code altogether. If you know that your arguments can be Unicode, you should use a Unicode formatting string to begin with. That's faster and doesn't involve a fallback solution. If you still want to see this fixed, I'd suggest submitting a patch.
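Lemburg's advice amounts to never entering the fallback path. In Python 3 every `str` is Unicode, so there is no byte-string pass to abandon. The following sketch assumes CPython 3 and replaces the diagnostic print with a hypothetical per-instance `lookups` counter. It rebuilds the nested mixin and checks that each `%(name)s` key is fetched exactly once.

```python
class Mixin:
    def __init__(self):
        self.lookups = 0  # hypothetical counter replacing the diagnostic print

    def __getitem__(self, name):
        self.lookups += 1
        hook = getattr(self, name)
        return hook() if callable(hook) else hook


class W1(Mixin):
    v1 = "v1"  # in Python 3 every str literal is already Unicode


class W2(Mixin):
    def __init__(self):
        super().__init__()
        self.child = W1()  # kept on the instance so its counter can be inspected

    def v2(self):
        return "%(v1)s" % self.child


class W3(Mixin):
    def __init__(self):
        super().__init__()
        self.child = W2()

    def v3(self):
        return "%(v2)s" % self.child


w3 = W3()
result = "%(v3)s" % w3
# Each level's __getitem__ runs once: no restart, no doubling.
counts = (w3.lookups, w3.child.lookups, w3.child.child.lookups)
```

Here `result` is `"v1"` and `counts` is `(1, 1, 1)`, in contrast to the 2/4/8 pattern produced by the 2.2 fallback.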

msg14294 | Author: Facundo Batista (facundobatista) * | Date: 2005-01-11 03:54
Could you please verify whether this problem persists in Python 2.3.4 or 2.4? If so, in which version? Can you provide a test case? If the problem is solved, from which version? Note that if you fail to answer within one month, I'll close this bug as "Won't fix". Thank you! Facundo

msg14295 | Author: Facundo Batista (facundobatista) * | Date: 2005-05-30 19:55
Deprecated. Reopen only if this still happens in 2.3 or newer. Facundo

History
Date | User | Action | Args
---|---|---|---
2022-04-10 16:06:15 | admin | set | github: 37858
2003-01-28 20:59:42 | dmgrime | create |