This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: filter() implementation does not match docs
Type: Stage:
Components: Interpreter Core Versions: Python 2.5
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: collinwinter, rhettinger
Priority: normal Keywords:

Created on 2006-07-05 13:09 by collinwinter, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (5)
msg29048 - (view) Author: Collin Winter (collinwinter) * (Python committer) Date: 2006-07-05 13:09
The docs for the built-in function filter() claim that
"filter(function, list) is equivalent to [item for item
in list if function(item)] if function is not None and
[item for item in list if item] if function is None".

>>> class infinite_str(str):
...     def __getitem__(self, index):
...         return "a"
...
>>> filter(None, infinite_str("1234"))
'aaaa'


Now, if we translate this to a listcomp according to
the docs:

>>> [x for x in infinite_str("1234") if x]

The listcomp version proceeds to chew up memory until
it exhausts the system resources or is killed by the user.

If the docs are to be believed, the filter() version
should do the same thing. 
msg29049 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2006-07-05 14:58
Logged In: YES 
user_id=80475

Please take a larger view when reviewing the docs. 
The listcomp analogy is very helpful in explaining what 
filter() does and readers would not benefit by its removal.

Throughtout the docs, the phrase "is equivalent to" does 
not mean "is identical to" or "exactly the same as".   In 
this case, you have isolated a non-guaranteed 
implementation detail that is almost always irrelevant.  
When an object such as infinite_str lies about its length, 
the consequent behavior is undefined.  It is not hard to 
produce weird results when objects violate basic 
invariants such as len(istr)!=len(list(istr)) or the 
expected relation between __eq__ and __hash__.
msg29050 - (view) Author: Collin Winter (collinwinter) * (Python committer) Date: 2006-07-05 15:16
Logged In: YES 
user_id=1344176

I wasn't just reviewing the docs; I came upon this while
re-implementing filter() in pure Python for a compatibility
module I'm working on. I discovered this particular
implementation detail when my naive implementation (as a
listcomp) didn't pass test_builtin.

My question is why filter() even cares about __len__, ie,
why it doesn't simply use the iterator protocol. This is
especially curious since filter() actively ignores the
iterator protocol when dealing with subclasses of str,
unicode and tuple:

>>> class badstr(str):
...     def __len__(self):
...         return 4
...     def __getitem__(self, index):
...         return "a"
...     def __iter__(self):
...         for i in range(5):
...             yield "b"
...
>>> filter(None, badseq("cccc"))
aaaa
>>> [x for x in badseq("cccc") if x]
['b', 'b', 'b', 'b', 'b']

Ignore the difference in return types (which is expected),
these two do not do the same thing; filter() uses a
combination of __len__ and __getitem__ while the listcomp
version uses the iterator protocol.

Clearly, in this case, "is equivalent to" means "bears only
a superficial relationship to".
msg29051 - (view) Author: Collin Winter (collinwinter) * (Python committer) Date: 2006-07-13 19:26
Logged In: YES 
user_id=1344176

I've posted a patch for the __len__ issue; it's SF #1522016.
msg29052 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2006-07-13 22:35
Logged In: YES 
user_id=80475

It's a speed hack.  The result list for filter() is
pre-allocated to the input length and then resized downward
to the actual length.

Stop posting patches for non-bugs.
History
Date User Action Args
2022-04-11 14:56:18adminsetgithub: 43616
2006-07-05 13:09:25collinwintercreate