This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Add nlargest() and nsmallest() to heapq.
Type: Stage:
Components: Library (Lib) Versions: Python 2.4
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: tim.peters Nosy List: rhettinger, tim.peters
Priority: normal Keywords: patch

Created on 2004-06-09 17:59 by rhettinger, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
heap2.diff rhettinger, 2004-06-09 19:00 Patch for nsmallest() and nlargest()
heapperf.py rhettinger, 2004-06-09 23:41 Comparison counting suite
Messages (5)
msg46148 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-06-09 17:59
This patch adds a function that encapsulates a principal 
use case for heaps, finding the n largest from a dataset 
without doing a full sort.
msg46149 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-06-09 19:00
Logged In: YES 
user_id=80475

Revised the patch to also include nsmallest().

Note, the API intentionally does not provide default values of 
n.  This is to encourage use of min() and sorted() for corner 
cases.
msg46150 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2004-06-09 20:06
Logged In: YES 
user_id=31435

+1 on nlargest().

-0 on nsmalles(), because it has radically different time and 
space requirements than nlargest() for the typical case (n is 
much less than the total # of elements).  If the user can 
afford all that space, it will probably be faster for the user to 
sort a materialized list then peel off the first n.  That's a 
simple one-liner.  If so, that makes nsmallest an "attractive 
nuisance".  If heapq had a more abstract interface, so that 
we had both min-heaps and max-heaps, this objection would 
of course go away.  But we don't, so ...
msg46151 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-06-09 23:41
Logged In: YES 
user_id=80475

For many values of n, nsmallest() still makes substantially
fewer comparisons than sort().  Attaching comparison count
test suite.

Here are the comparison counts for a list with a 1000 random
entries and various values of n:

8649 sort

999 1 nlargest
1667 1 nsmallest

1088 5 nlargest
1709 5 nsmallest

1238 10 nlargest
1759 10 nsmallest

1487 20 nlargest
1860 20 nsmallest

2052 40 nlargest
2066 40 nsmallest

2953 80 nlargest
2466 80 nsmallest

4516 160 nlargest
3262 160 nsmallest

6835 320 nlargest
4811 320 nsmallest

9177 640 nlargest
7736 640 nsmallest

9742 1000 nlargest
10315 1000 nsmallest


P.S.  Using itertools, the nsmallest() can be made to run at
C speed:

def nsmallest(iterable, n):
    """Find the n smallest elements in a dataset.

    Equivalent to:  sorted(iterable)[:n]
    """
    h = list(iterable)
    heapify(h)
    return map(heappop, repeat(h, min(n, len(h))))
msg46152 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-06-10 04:53
Logged In: YES 
user_id=80475

Taking this one back off Tim's busy plate.

If I deluded myself with the attached tests, he can give me a 
hard time later ;-)
History
Date User Action Args
2022-04-11 14:56:04adminsetgithub: 40369
2004-06-09 17:59:15rhettingercreate