This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: struct module doesn't use weakref for cache
Type: Stage:
Components: Library (Lib) Versions: Python 2.5
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: bob.ippolito Nosy List: bob.ippolito, josiahcarlson, markaflacy, zseil
Priority: normal Keywords:

Created on 2006-10-08 23:37 by markaflacy, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (8)
msg30184 - (view) Author: Mark Flacy (markaflacy) Date: 2006-10-08 23:37
At the moment, when you attempt to add your 101st
different Struct object to the cache, all the other 100
entries are tossed into the garbage.  That seems a
trifle odd.

The Struct cache would appear to be a perfect use for a
weakref dictionary.
msg30185 - (view) Author: Ziga Seilnacht (zseil) * (Python committer) Date: 2006-10-12 15:52
Logged In: YES 
user_id=1326842

WeakValueDictionary would be useless here; the cache is
the only thing that holds references to Struct objects.
Replacing the _cache.clear() call with _cache.popitem()
seems like a better solution.
msg30186 - (view) Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2006-10-12 16:56
Weakrefs would slow it down, probably to the point where
the cache wouldn't be an improvement at all.

The re module does the same thing with the regular
expression cache. This isn't a bug until someone presents a
patch that proves another strategy is better for performance.
msg30187 - (view) Author: Ziga Seilnacht (zseil) * (Python committer) Date: 2006-10-12 17:36
I was thinking about something like this:

Index: Lib/struct.py
===================================================================
--- Lib/struct.py	(revision 52316)
+++ Lib/struct.py	(working copy)
@@ -35,7 +35,7 @@
 def _compile(fmt):
     # Internal: compile struct pattern
     if len(_cache) >= _MAXCACHE:
-        _cache.clear()
+        _cache.popitem()
     s = Struct(fmt)
     _cache[fmt] = s
     return s

(sorry, I don't have the rights to attach a file)
msg30188 - (view) Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2006-10-12 17:54
Yes, that code is different. You haven't shown that it's
better though. Using popitem probably doesn't have very good
cache effects.
msg30189 - (view) Author: Josiah Carlson (josiahcarlson) * (Python triager) Date: 2006-10-16 22:41
Add a deque and a bit of logic, and one gets a FIFO
implementation of the cache.

from collections import deque
_toss = deque()

def _compile(fmt):
    # Internal: compile struct pattern
    if len(_cache) >= _MAXCACHE:
        del _cache[_toss.popleft()]
    s = Struct(fmt)
    _cache[fmt] = s
    _toss.append(fmt)
    return s

Race conditions from multiple threads could result in
#threads-1 more entries in the cache than _MAXCACHE, but
this could be trimmed in later calls by replacing 'if' with
'while'.
msg30190 - (view) Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2006-10-16 23:57
The flush strategy is a silly thing to rattle on about.

The idea is just to pick a cache size that's big enough that
any sane application will never have to flush it. The
current cache size is plenty big enough for anything I threw
at it. If there's an application out there that needs a
bigger cache I'd like to see it, and then perhaps we could
adjust the size in some minor release to accommodate it.

A FIFO is probably a bad strategy because there are a lot of
common structs that are used a lot (the ones with just a few
characters).

If there was an obviously better solution here, surely it
would have already been proposed or implemented for the re
module.
msg30191 - (view) Author: Josiah Carlson (josiahcarlson) * (Python triager) Date: 2006-10-17 00:08
I agree, just pointing out that there exists a cache
replacement strategy.  Also, there is an LRU implementation
in the Python cookbook, but unless it is rewritten in C (a
serious overkill for the 2? uses in the stdlib), it would
necessitate a lock, which would be a waste.  But we've
talked enough, and I probably shouldn't have bothered
posting *this* message.  I'll let the thread die this time,
really!
History
Date User Action Args
2022-04-11 14:56:20adminsetgithub: 44100
2006-10-08 23:37:14markaflacycreate