This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: emphasize iteration volatility for set
Type: Stage:
Components: Documentation Versions:
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: aisaac0, loewis, rhettinger
Priority: normal Keywords:

Created on 2007-05-18 15:10 by aisaac0, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (11)
msg32062 - (view) Author: Alan (aisaac0) Date: 2007-05-18 15:10
For <URL:http://docs.python.org/lib/types-set.html>, append the following new sentence to the 2nd paragraph.

    Iteration over a set returns elements in an indeterminate order, which generally depends on factors outside the scope of the containing program.

*Justification:* users should not be expected to understand without being told that iteration order depends on factors outside the scope of the containing program. (Additionally, unlike the documentation for dictionaries, the documentation for sets fails to give a serious warning not to rely on iteration order.)
msg32063 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-05-18 17:05
What factors outside the containing program influence iteration order? Iteration is completely deterministic, and only depends on the items inserted, and the order in which they were inserted, neither of which is outside the scope of the containing program. It's just that the order is not easily predictable.
msg32064 - (view) Author: Alan (aisaac0) Date: 2007-05-18 18:00
Location in memory.
See Peter Otten's discussion at
http://www.thescripts.com/forum/post2552380-16.html
msg32065 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2007-05-18 23:08
While the OP knows what he means here, the suggested text does not add clarity, it only makes the subject harder to understand and implies that some mysterious, dark force is in place.  Further, the suggested text is simply incorrect.  Given deterministic assignment of hash values and a consistent insertion order, the order of keys in a set or dictionary is fully determined.

I've read the source of this suggestion on comp.lang.python and commented there.  The underlying issue had nothing to do with either sets or dicts.  The code in question "re-discovered" that the location of objects in memory  would vary between runs if the user deleted a pyc file for a module.  The OP's script used object ids as hash values, hence the set/dict ordering could vary between runs.  This was at odds with his expectation that that the ordering would be deterministic.  The moral is that non-deterministic hash values lead to non-deterministic set/dict ordering.

The docs for sets and dicts should not be muddled with tangential discussions about implementation specific details regarding what governs where objects are placed in memory.
msg32066 - (view) Author: Alan (aisaac0) Date: 2007-05-19 02:28
While I do not mind my language being rejected, *something* should be added to warn users.  What the previous comment fails to mention is the number of people on c.l.python, some of whom are quite sophisticated users, who failed to discover the source of indeterminacy.  Users should not have to "rediscover" this because of a documentation failure.
msg32067 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-05-19 06:38
The documentation already says "Being an unordered collection, sets do not record element position or order of insertion."

If users read this and fail to understand the notion of an unordered collection, I see no way of "fixing" this.
msg32068 - (view) Author: Alan (aisaac0) Date: 2007-05-19 13:09
The previous comment completely misses the point.  Again, please see the discussion on c.l.python.  Not one of the participants expected sets to be "ordered". What was suprising to them was the order can *change* across sequential executions of an **unchanged** source.   This is of course *quite* different than expecting that sets are ordered; I am perplexed that anyone would conflate the two.  One cannot credibly argue that anyone who understands that sets are not ordered will not be surprised, since even sophisticated users were as a matter of fact surprised in the c.l.python discussion.  (Until it was explained by Peter of course.)  A natural conclusion is that the docs should offer better protection against such surprise, since we have concrete evidence that even sophisticated users can be surprised by this.

In sum, the previous comment conflates two distinct issues and so fails to address the reasons for the proposed docs patch.
msg32069 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-05-19 21:29
aisaac0, thanks for elaborating. Your remark now convinces me that it was the right thing to reject this change.

Ite seems that you suggest that experienced users
a) are aware that some objects compare and hash by their id(), and
b) that the id() is the address in memory, and
c) that the id() will influence the order in which objects are iterated, and
d) fail to see that the id() may differ across runs

Such users are *not* experienced. There are many more reasons why the id of an object may vary across runs. E.g. Linux 2.6 deliberately randomizes memory management, so that identical processes get their objects allocated at different addresses, to defeat security exploits that rely on deterministic address of things in main memory (there is a system call to disable this randomization)

Looking at the entire thread, I agree with Carsten Haese's posting: That even experienced users couldn't diagnose this correctly is because they
a) did not receive the source code, and
b) were talked into believing that this has to do something with the random module.

The library reference is a specification, not a tutorial.
msg32070 - (view) Author: Alan (aisaac0) Date: 2007-05-22 03:27
Note that on c.l.python, Raymond Hettinger justifies this rejection as follows:

  "the docs are sufficient when they say that set ordering is arbitrary"

Where exactly do the docs say this? I do not see it.
I am looking here: <URL:http://docs.python.org/lib/types-set.html>

I also take this as a concession that the docs *should* say something like this,
which is about half of the language I proposed
(unless there is some reason why 'arbitrary' is superior to 'indeterminate').

Btw, I did provide the source code to several people before Peter.
This was clear in the thread on c.l.python.
I do not think they would appreciate being called "not experienced".
msg32071 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-05-22 05:22
aisaac0: The first sentence of types-set says "A set object is an unordered collection of immutable values."

How else could you interpret "unordered collection" but as "having arbitrary order"?
msg32072 - (view) Author: Alan (aisaac0) Date: 2007-05-23 03:01
Martin's preceding comment asks:
  "How else could you interpret 'unordered collection' but as 'having arbitrary order'?"

This question seems to me to read into the docs much that is not there. There are so many possibilities ...

Keep in mind that I proposed a docs patch believing that the docs should communicate to users that a particular iteration on a set may produce a different order of elements on *sequential* runs of an *unchanged* source.  Martin been claiming that the docs do communicate this (although he has been somewhat vague about the target audience).

Here is a simple description of a class whose instances are plausibly considered an "unordered collection" that would not have that "volatility" property. (The question here should not whether this is a silly class but whether Martin is actually drawing a valid inference from the existing docs rather than just reading his experience into them.)

Call the class ``Silly``. Initialize a Silly instance with an empty list and a deterministically seeded instance of random.Random. A Silly instance has an ``add`` method that computes the length of its list N and picks an insertion point for the new item using randrange(N+1).  (If you want a Silly instance to behave more like a set, we can insert an element only if it's not in the list.) Iteration on this object is to be iteration on its list, but no indexing or slicing is to be provided.

A ``Silly`` instance is plausibly an 'unordered collection' that is iterable but does NOT have the "volatility" property under discussion.
History
Date User Action Args
2022-04-11 14:56:24adminsetgithub: 44974
2007-05-18 15:10:28aisaac0create