Issue 625698
This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2002-10-19 19:27 by erik_andersen, last changed 2022-04-10 16:05 by admin. This issue is now closed.
Files

File name | Uploaded | Description
---|---|---
rec.diff | nnorwitz, 2003-01-19 16:05 | fix slow problem when comparing recursive objects
patch.txt | tim.peters, 2003-01-23 18:57 | compare_latch patch
Messages (20)
msg12836 | Author: Erik Andersén (erik_andersen) | Date: 2002-10-19 19:27

Lists and dictionaries put Python into an infinite loop if they contain more than one reference to themselves:

```
>>> d = {}
>>> d[1] = d
>>> d == d   # OK, Python can handle one recursion
1
>>> d[2] = d
>>> d == d   # Crash with two
```

Lists are similar:

```
>>> l = []
>>> l.append(l)
>>> l == l   # OK
1
>>> l.append(l)
>>> l == l   # Crash
>>> l > l    # Also crash
```

Tested with ActivePython 2.2.1 under Windows 98 and (the first part) Python 2.2.2 under Linux.
msg12837 | Author: Magnus Lyckå (mly) | Date: 2002-10-26 23:12

I tried the above under Python 2.2.1 (#34, Sep 27 2002, 18:37:42) [MSC 32 bit (Intel)] on win32 (ActiveState on Win 2000) and Python 2.1.1 (#1, Aug 20 2001, 20:23:29) [GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.60mdk)] on linux-i386. In both cases it works correctly; no crash. It takes a lot of time though: several seconds on a Duron 700 w/ Win2k, and tens of seconds on a K6-233 w/ Linux.
msg12838 | Author: Neal Norwitz (nnorwitz) | Date: 2003-01-08 00:59

Erik, do you still have this problem with 2.2.2? I can't reproduce your problem under Linux. mly can't reproduce the problems with 2.2.1 on Windows (2000). Do you only have the problem on Windows?
msg12839 | Author: Neal Norwitz (nnorwitz) | Date: 2003-01-19 16:05

Attached is a patch which fixes the problem for me.
msg12840 | Author: Martin v. Löwis (loewis) | Date: 2003-01-19 17:30

The patch is fine, except that a comment needs to be added explaining what this does and why it does that. It should be understood that this can't fix the very similar case:

```
a = []
a.append(a)
a.append(a)
b = []
b.append(b)
b.append(b)
a == b
```

This would still take a long time, right?
msg12841 | Author: Tim Peters (tim.peters) | Date: 2003-01-20 04:28

The patch assumes "is" implies "==", which isn't necessarily so for rich comparisons. The attached patch cuts the time about in half, via:

1. Getting rid of the per-thread inprogress dicts, in favor of a shared inprogress dict.
2. Restoring the intent of the code that tuples be exempted (tuples can't be recursive). Rev 2.65 of tupleobject.c gave tuples a tp_as_mapping slot, breaking the intent of the code.
3. Changing the inprogress dict to map a tuple to the smallest value of compare_nesting at which the tuple was seen. delete_token() was changed to leave the token alone if the current compare_nesting value is greater than that. This allows tuples to stay in the dict longer.
4. Cutting NESTING_LIMIT from 20 to 19. This has a huge effect, because the algorithm is still basically exponential-time.

The problem remaining is that the inprogress dict is still useless (and empty) so long as compare_nesting is less than NESTING_LIMIT. This can leave an exponential amount of work to be done to fight back up to NESTING_LIMIT. Leaving stuff in inprogress longer, and consulting inprogress regardless of the value of compare_nesting (provided that's gone over NESTING_LIMIT at least once), slashes the time to triviality. However, it isn't safe: we can't know that the addresses in the dict keys refer to the same objects then. We could if we started storing into inprogress from the very start, but that would be a big speed hit for usual-case comparisons. I'd be happy to knock NESTING_LIMIT down more, though.

Assigned to Jeremy, as IIRC this was his code originally.
msg12842 | Author: Martin v. Löwis (loewis) | Date: 2003-01-20 08:20

Where does the documentation say that "is" may not imply ==? The check for recursion assumes this implication, anyway: if we meet the very same object, we infer that they are equal.

What is the purpose of clear_inprogress_dict in your patch?
msg12843 | Author: Tim Peters (tim.peters) | Date: 2003-01-20 16:12

I deleted my patch -- it didn't make sense.

Martin, AFAIK the docs say nothing about the relationship (if any) between "is" and "==". It was deliberate intent that "is" not imply "==" for rich comparisons, though, in part so that IEEE-754 rules for NaN could be implemented by users:

```
>>> class NaN:
...     def __eq__(self, other): return False
...
>>> n = NaN()
>>> n is n
True
>>> n == n
False
```
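As an aside (an editorial addition, not part of the original thread): on modern Python the stock float NaN gives the same counterexample without a user-defined class, and also shows the identity shortcut that built-in container comparisons take.

```python
# IEEE-754 NaN: an identical object that is not equal to itself.
n = float("nan")
assert n is n            # same object...
assert not (n == n)      # ...yet "is" does not imply "=="

# Built-in container comparisons nevertheless shortcut on identity,
# so a list can compare equal even though its element does not:
assert [n] == [n]
assert n in [n]
```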
msg12844 | Author: Erik Andersén (erik_andersen) | Date: 2003-01-20 19:24

Excuse me for interrupting the efficiency discussion, but what is the definition of equality for recursive containers? I am particularly worried about cases like:

```
>>> class A:
...     def __eq__(self, other): return not self.u == other.u
...
>>> a = A()
>>> l = [a]
>>> a.u = l
>>> l == l   # ????
```

If l == l then a != a and l[0] != l[0], so l != l. If l != l then a == a and l[0] == l[0], so l == l !!!!!!!
msg12845 | Author: Tim Peters (tim.peters) | Date: 2003-01-20 19:50

Equality for self-consistent recursive objects does a graph isomorphism computation. In your self-inconsistent example, it so happens that False is returned, although True would be returned if the compile-time constant (in object.c) NESTING_LIMIT were odd instead of even. But if you worry about this, you need to get out more <wink>.
msg12846 | Author: Tim Peters (tim.peters) | Date: 2003-01-20 20:09

BTW, Erik, what did you mean by "crash"? Nobody else has seen a crash here. You started the report by saying "infinite loop", which isn't the same thing as a crash. There isn't an infinite loop here either, although the algorithm can take astronomical amounts of time to finish (so may appear to be in an infinite loop).
msg12847 | Author: Neal Norwitz (nnorwitz) | Date: 2003-01-20 20:20

Sorry Tim, I was going to reply to one of your first msgs and forgot. Erik sent me mail which said there wasn't a crash, just that "the statements take horrendously long time to execute."
msg12848 | Author: Tim Peters (tim.peters) | Date: 2003-01-20 20:44

Cool. He's right that they can take a horrendously long time. The 2-element list example takes c*2**20 units of time, for some suitable constant c, and if it were extended to 3 elements, it would take c*3**20 units of time, thousands of times longer. As Erik's latest example shows, the outcome isn't always particularly well defined either.

An alternative to speeding this silliness is to raise an exception instead when recursive objects are detected. There was some hack value in doing the graph isomorphism bit, but no real practical value I can see.
msg12849 | Author: Till Plewe (llit) | Date: 2003-01-21 06:01

How about checking for equality by trying to build a bisimulation which "proves" equality? Time complexity is O(nodes(a) x nodes(b)). Sample Python code:

```python
#! /usr/bin/env python
from types import *

SeqType = [ListType, DictType, TupleType]

def isequal(a, b):
    if id(a) == id(b):
        return True
    compare = {(id(a), id(b)): (a, b)}
    bisimilar = {}
    while compare:
        (ida, idb), (oba, obb) = compare.popitem()
        if ida == idb:
            continue
        elif type(oba) == type(obb):
            if type(oba) not in SeqType:
                if oba != obb:
                    return False
                else:
                    continue
            elif len(oba) != len(obb):
                return False
            elif (ida, idb) not in bisimilar:
                bisimilar[(ida, idb)] = 1
                if type(oba) == DictType:
                    for x in oba:  # oba and obb have the same number of keys
                        if x not in obb:
                            return False
                        else:
                            compare[(id(oba[x]), id(obb[x]))] = (oba[x], obb[x])
                else:
                    for x, y in zip(oba, obb):
                        compare[(id(x), id(y))] = (x, y)
        else:
            return False
    return bisimilar
```
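For readers on current Python, the sketch above translates roughly as follows. This is an editorial rewrite (the function name is mine, and it returns a plain bool rather than the bisimulation table); it keeps the same worklist idea, including the identity-implies-equality shortcut that the NaN discussion in this thread shows is not strictly safe.

```python
# A rough Python 3 translation of the worklist bisimulation above.
def bisim_equal(a, b):
    """Equality by building a bisimulation; O(nodes(a) * nodes(b)) worst case."""
    containers = (list, tuple, dict)
    pending = [(a, b)]   # pairs still to be checked
    assumed = set()      # id-pairs already assumed equal
    while pending:
        x, y = pending.pop()
        if x is y:
            continue     # identity shortcut (unsafe for NaN-like objects)
        if type(x) is not type(y):
            return False
        if not isinstance(x, containers):
            if x != y:
                return False
            continue
        if len(x) != len(y):
            return False
        key = (id(x), id(y))
        if key in assumed:
            continue     # this pair is already part of the bisimulation
        assumed.add(key)
        if isinstance(x, dict):
            for k in x:  # same number of keys on both sides
                if k not in y:
                    return False
                pending.append((x[k], y[k]))
        else:
            pending.extend(zip(x, y))
    return True
```

On the thread's 2-element recursive lists this finishes after a handful of iterations, since the pair (id(a), id(b)) is assumed equal the first time it recurs.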
msg12850 | Author: Tim Peters (tim.peters) | Date: 2003-01-22 03:09

llit, you should study the current code before suggesting an algorithm. The current code would be blazing fast if it were looking for recursion from the start too. But the very process of building that dict with the ids of the objects as keys (which the current algorithm also does) costs more than most comparisons cost in real life. That's why the current code doesn't even try to catch recursion until the comparison level exceeds 20, and that's where the sloth comes from. Stick a gimmick like that in your algorithm, and you'll have similar speed problems.

Also note that id(a) == id(b) doesn't imply a == b (see the other note about NaN for an example).
msg12851 | Author: Jeremy Hylton (jhylton) | Date: 2003-01-23 18:15

Maybe I need to read the code again, but IIRC the code doesn't assume that id(a) == id(b) implies a == b. Rather it uses id(a) and id(b) to look for recursive calls to compare. It assumes that a == b will always produce the same result when invoked recursively within a comparison.

I think a decent solution is to create a latch for the nesting limit. If a comparison ever exceeds the recursion limit, the check is done until the comparison is finished.
msg12852 | Author: Tim Peters (tim.peters) | Date: 2003-01-23 18:57

I'll attach a patch with a simple recursion latch. It handles the small examples fine. Make them bigger and sloth returns, though; e.g., this one takes more than 20 seconds after the patch (but longer than the lifetime of the universe without the patch):

```python
N = 1000
print "building a"
a = []
for i in range(N):
    a.append(a)
print "building b"
b = []
for i in range(N):
    b.append(b)
print "comparing"
print a == b
```

So long as inprogress entries don't outlive the comparison call that created them, it still leaves exponentially-compounding overhead to keep on recreating the entries that used to exist. But if an inprogress entry does outlive the comparison that created it, then the cached addresses may end up attached to objects other than the original comparands, and how to get each thread to eventually clear its inprogress dict becomes a puzzle and/or an expense.
msg12853 | Author: Armin Rigo (arigo) | Date: 2003-01-29 14:11

Bisimulation is a cool tool that only works in "monotonic" settings, i.e. when the set of pairs of objects known to be equal is always growing and objects cannot become different when other objects are discovered to be equal. So the whole bisimulation thing is inconsistent with user-defined comparisons, because these can use negations (there is no negation in the standard list and dict compares).

I think the language definition should stick to the simple recursive definition of comparison, and thus be allowed to enter an infinite recursion. The interpreter should be Ctrl-C interruptible and raise a "RuntimeError: maximum recursion depth exceeded". This is similar to instance_call() in classobject.c. In particular I think the interpreter should not try to detect loops explicitly. Consider the case of someone who really wants to implement a bisimulation thing:

```
>>> class X:
...     def __eq__(self, other):
...         if (self, other) in already_comparing:
...             return True
...         else:
...             ...
```

We must allow the same object's __eq__() method to be entered twice without raising any exception. I would say that the problem is similar to:

```
>>> def f():
...     f()
```

It is not the interpreter's role to detect the loop and raise an exception, but only to crash nicely when the recursion depth becomes critically large.
msg12854 | Author: Michael Stone (mbrierst) | Date: 2003-02-25 22:03

I have posted patch 693221 which I believe will quickly evaluate all examples given on this page. There's some discussion there. Feel free to argue with it.
msg12855 | Author: Tim Peters (tim.peters) | Date: 2004-03-20 21:10

Python 2.4 was changed so that all these examples yield "RuntimeError: maximum recursion depth exceeded in cmp", so closing this as Fixed.
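For reference (an editorial note checked against current CPython 3.x, not part of the original thread): the exception class is now RecursionError, a RuntimeError subclass, and it is only raised when the recursive comparands are distinct objects. Identical containers now compare equal quickly, because item comparisons shortcut on identity.

```python
# The original report's dict, on modern CPython 3.x:
d = {}
d[1] = d
d[2] = d
assert d == d          # fast: values are compared with an identity shortcut

# Distinct but isomorphic recursive lists still exhaust the stack:
a = []
a.append(a)
b = []
b.append(b)
try:
    a == b
except RecursionError:  # subclass of RuntimeError since Python 3.5
    pass
```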
History

Date | User | Action | Args
---|---|---|---
2022-04-10 16:05:46 | admin | set | github: 37342
2002-10-19 19:27:15 | erik_andersen | create |