Issue1065388
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2004-11-12 19:12 by gvanrossum, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
caldiff.txt | tim.peters, 2004-11-12 20:00 | Cache a lot once, compute a lot less |
Messages (8) | |||
---|---|---|---|
msg23106 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2004-11-12 19:12 | |
The day and month lookups in calendar.py recompute the entire array of names on each __getitem__ call. This caused me some embarrassment when a colleague put a month name lookup in a critical inner loop, and removing the lookup sped it up by an order of magnitude (and there was plenty going on in that loop!). More to the point, something that *looks* like a quick lookup operation should behave as such. I understand this is hard to fix, because we have no way of trapping locale changes -- perhaps adding a mechanism to do that would be the start for fixing this. |
|||
msg23107 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-11-12 20:00 | |
Logged In: YES user_id=31435 The entire array is recomputed because the index may be a slice object that references the entire array. Can't imagine what "*looks* like a quick lookup" could mean in any objective sense. I assume your colleague was using "[]" notation. Doesn't necessarily look quick to me <wink>. Any inner loop that expects to run long enough that the month may change is necessarily not a critical inner loop <0.5 wink>. Maybe the attached patch is good enough. It creates appropriate datetime objects just once, caches their strftime () methods, and recomputes only as much as a specific __getitem__ invocation needs. |
|||
msg23108 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2004-11-12 21:47 | |
Logged In: YES user_id=6380 The loop could be eliminated from __getitem__ by writing this instead: def __getitem__(self, i): return datetime.date(2001, i, 1).strftime(self.format) and similar for _localized_day. But we could improve even upon that by creating an instance variable of 13 date objects like this: # in __init__ self._datelist = [None] + [datetime.date(2001, j+1, 1) for j in range(12)] def __getitem__(self, i): return self._datelist[i].strftime(self.format) (I would make this change myself but I want someone to review this because I recall there were subtle compatibility issues around this...) That's still suboptimal because the strftime is done over and over, but at least it's done only once per __getitem__ call rather than 12x. |
|||
msg23109 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-11-12 23:09 | |
Logged In: YES user_id=31435 Guido, see my earlier patch (which is attached). Looping cannot be eliminated because i may be a slice object, in which case __getitem__ needs to return a list. The patch avoids looping when i is not a slice object. It also builds the date objects just once. I believe it's the simplest approach of this kind that's strong enough so that test_calendar still passes. |
|||
msg23110 - (view) | Author: Skip Montanaro (skip.montanaro) * | Date: 2004-11-13 15:26 | |
Logged In: YES user_id=44345 Looks good to me. I'll check in the change and close out the report unless you think this should wait until after 2.4 is released. |
|||
msg23111 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-11-13 16:21 | |
Logged In: YES user_id=31435 I checked in a variant of the patch. Since this should have no effect on visible semantics, and is a straightforward change, there's no reason to wait for 2.5: Lib/calendar.py 1.35 Lib/test/test_calendar.py 1.8 Misc/NEWS 1.1188 |
|||
msg23112 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2004-11-17 01:08 | |
Logged In: YES user_id=6380 Thanks for the fix, I knew there was something I forgot! One comment: [Tim] > Can't imagine what "*looks* like a quick lookup" could mean in > any objective sense. I assume your colleague was using "[]" > notation. Doesn't necessarily look quick to me <wink>. > > Any inner loop that expects to run long enough that the > month may change is necessarily not a critical inner loop <0.5 > wink>. Well, the loop was something like (not literally): for year, month, day in <source of many dates>: monthname = calendar.month_name[month] key = (year, monthname, day) <now use key for look up in some other table> The other table happened to be indexed by (year, monthname, day) instead of using the numeric month, and IMO calendar.month_name[month] is indeed the "obvious" way to do the translation from numeric month to month name if you don't want to have to type them yourself (never mind that it's not documented -- but it's in __all__ and tested by the test suite so I consider that a doc bug). Since upgrading to Python 2.4 isn't in the cards anytime soon and we don't want to go down the slippery path of costum changes to Python, we fixed it by changing the way the other table was indexed. |
|||
msg23113 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-11-17 03:03 | |
Logged In: YES user_id=31435 Oh, I sure agree it's the obvious way to look up the month name. What's *not* obvious is that damn foreigners forced us to turn an indexing operation into a locale-aware nightmare <wink>. A semi-obvious way to rewrite that loop avoids all the overheads in the loop, with or without the patch: create a local month_name = calendar.month_name[:] outside the loop, and reference `month_name` inside the loop instead. Then inside the loop it's literally & truly just indexing a builtin list with an integer. Of course then the month names won't change inside the loop if the locale changes, but I count that a feature, being a non-foreigner myself <wink>. BTW, the crushing effectiveness of the "[:]" trick here is an upside of allowing a slice object as an index. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:08 | admin | set | github: 41166 |
2004-11-12 19:12:47 | gvanrossum | create |