This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: calendar day/month name lookup too slow
Type: Stage:
Components: Library (Lib) Versions: Python 2.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: skip.montanaro Nosy List: gvanrossum, skip.montanaro, tim.peters
Priority: normal Keywords:

Created on 2004-11-12 19:12 by gvanrossum, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
caldiff.txt tim.peters, 2004-11-12 20:00 Cache a lot once, compute a lot less
Messages (8)
msg23106 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2004-11-12 19:12
The day and month lookups in calendar.py recompute the
entire array of names on each __getitem__ call. This
caused me some embarrassment when a colleague put a
month name lookup in a critical inner loop, and
removing the lookup sped it up by an order of magnitude
(and there was plenty going on in that loop!). More to
the point, something that *looks* like a quick lookup
operation should behave as such.

I understand this is hard to fix, because we have no
way of trapping locale changes -- perhaps adding a
mechanism to do that would be the start for fixing this.
msg23107 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2004-11-12 20:00
Logged In: YES 
user_id=31435

The entire array is recomputed because the index may be a 
slice object that references the entire array.

Can't imagine what "*looks* like a quick lookup" could mean in 
any objective sense.  I assume your colleague was using "[]" 
notation.  Doesn't necessarily look quick to me <wink>.

Any inner loop that expects to run long enough that the 
month may change is necessarily not a critical inner loop <0.5 
wink>.

Maybe the attached patch is good enough.  It creates 
appropriate datetime objects just once, caches their strftime
() methods, and recomputes only as much as a specific 
__getitem__ invocation needs.
msg23108 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2004-11-12 21:47
Logged In: YES 
user_id=6380

The loop could be eliminated from __getitem__ by writing
this instead:

def __getitem__(self, i):
    return datetime.date(2001, i, 1).strftime(self.format)

and similar for _localized_day.

But we could improve even upon that by creating an instance
variable of 13 date objects like this:

    # in __init__
    self._datelist = [None] + [datetime.date(2001, j+1, 1)
                                      for j in range(12)]

def __getitem__(self, i):
    return self._datelist[i].strftime(self.format)

(I would make this change myself but I want someone to
review this because I recall there were subtle compatibility
issues around this...)

That's still suboptimal because the strftime is done over
and over, but at least it's done only once per __getitem__
call rather than 12x.
msg23109 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2004-11-12 23:09
Logged In: YES 
user_id=31435

Guido, see my earlier patch (which is attached).  Looping 
cannot be eliminated because i may be a slice object, in 
which case __getitem__ needs to return a list.  The patch 
avoids looping when i is not a slice object.  It also builds the 
date objects just once.  I believe it's the simplest approach 
of this kind that's strong enough so that test_calendar still 
passes.
msg23110 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2004-11-13 15:26
Logged In: YES 
user_id=44345

Looks good to me.  I'll check in the change and close out the 
report unless you think this should wait until after 2.4 is released.
msg23111 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2004-11-13 16:21
Logged In: YES 
user_id=31435

I checked in a variant of the patch.  Since this should have 
no effect on visible semantics, and is a straightforward 
change, there's no reason to wait for 2.5:

Lib/calendar.py 1.35
Lib/test/test_calendar.py 1.8
Misc/NEWS 1.1188
msg23112 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2004-11-17 01:08
Logged In: YES 
user_id=6380

Thanks for the fix, I knew there was something I forgot!

One comment:

[Tim]
> Can't imagine what "*looks* like a quick lookup" could mean in
> any objective sense.  I assume your colleague was using "[]"
> notation.  Doesn't necessarily look quick to me <wink>.
>
> Any inner loop that expects to run long enough that the
> month may change is necessarily not a critical inner loop <0.5
> wink>.

Well, the loop was something like (not literally):

  for year, month, day in <source of many dates>:
     monthname = calendar.month_name[month]
     key = (year, monthname, day)
     <now use key for look up in some other table>

The other table happened to be indexed by (year, monthname,
day) instead of using the numeric month, and IMO
calendar.month_name[month] is indeed the "obvious" way to do
the translation from numeric month to month name if you
don't want to have to type them yourself (never mind that
it's not documented -- but it's in __all__ and tested by the
test suite so I consider that a doc bug).

Since upgrading to Python 2.4 isn't in the cards anytime
soon and we don't want to go down the slippery path of
costum changes to Python, we fixed it by changing the way
the other table was indexed.
msg23113 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2004-11-17 03:03
Logged In: YES 
user_id=31435

Oh, I sure agree it's the obvious way to look up the month 
name.  What's *not* obvious is that damn foreigners forced 
us to turn an indexing operation into a locale-aware 
nightmare <wink>.

A semi-obvious way to rewrite that loop avoids all the 
overheads in the loop, with or without the patch:  create a 
local

     month_name = calendar.month_name[:]

outside the loop, and reference `month_name` inside the 
loop instead.  Then inside the loop it's literally & truly just 
indexing a builtin list with an integer.

Of course then the month names won't change inside the 
loop if the locale changes, but I count that a feature, being a 
non-foreigner myself <wink>.

BTW, the crushing effectiveness of the "[:]" trick here is an 
upside of allowing a slice object as an index.
History
Date User Action Args
2022-04-11 14:56:08adminsetgithub: 41166
2004-11-12 19:12:47gvanrossumcreate