Issue 1185383: Make bisect.* functions accept an optional compare function

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/41873

classification

Title:	Make bisect.* functions accept an optional compare function
Type:	enhancement	Stage:
Components:	Library (Lib)	Versions:

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:		Nosy List:	jonaskoelker, mciura, minmax, rhettinger
Priority:	normal	Keywords:

Created on 2005-04-18 19:26 by mciura, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (7)
msg54448 - (view)	Author: Marcin Ciura (mciura)	Date: 2005-04-18 19:26
The usability of bisect.bisect_right, bisect.bisect_left, bisect.insort_right and bisect.insort_left would increase if they accepted an optional less_than function to compare the keys. Here is my use case: I have a sorted list of reversed words and a parallel list of flags associated with these words (taken from ispell). The list of reversed words is the one searched; the flags are the result of a search. Issue #1: Now I cannot use simply a list of tuples (rev_word,flags) and a custom less_than function that compares only the first elements of two tuples. Issue #2: When a word is not found in the list, I'd like to make an educated guess as to its flags (this makes more sense in non-English languages, where e.g. infinitives have a special ending), using bisect_left and bisect_right: from bisect import * less_than = lambda x,y: x[:3]<y[:3] lp = bisect_left(word_list, given_rev_word, lt=less_than) rp = bisect_right(word_list, given_rev_word, lt=less_than) # return the union of flag_list[lp:rp] An example (given_rev_word = 'abcpqr'): word_list: 'abbx', 'abcaa', <- lp 'abcdd', 'abcss', 'abdf' <- rp Currently, the first search could be replaced with lp = bisect_left(word_list, given_rev_word[:3]) but I can see only non-nice ways to replace the other search. Rolling my own class that stores a word and its flags, with __lt__ depending on some global setting is not thread-safe, and I find such a solution too heavyweight. I hope that I expressed myself clearly enough. If not, let me know, and I'll try to clarify my point.
msg54449 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2005-04-18 20:09
Logged In: YES user_id=80475 A few thoughts: * If bisect provided an optional lt argument, it would still not be thread-safe. The code for the lt function can call arbitrary Python code that runs through the eval-loop and is hence subject to a thread-switch. * An advantage of the class wrapper approach is that the prefix function gets computed just once per word. This is not a big deal for the specific case of [:3], but it could be a significant benefit for other functions that are expensive to compute (because repeated calls to bisect will access the lt function more than once). * My own approach to the particular use case would be to map prefixes to a list of revwords or (revword, flag) pairs: d = dict(abb=['abbc'], abc=['abcaa', 'abcdd', 'abcss'], abd=['abdf']) This gives O(1) lookup time and limits the calls to the prefix function to once per word. Will leave this request open as it may yet be a good idea. My concern is that it will clutter the code and the API for only a small benefit. Also, I'm looking at a more general purpose solution that would make this feature request moot. This idea is to create a fast comparison decorator class used like this: dwordlist = map(ComparisonDecorator(lambda x: x[:3]), wordlist) lp = bisect_left(dwordlist, given_rev_word) rp = bisect_right(dwordlist, given_rev_word)
msg54450 - (view)	Author: Jonas Kölker (jonaskoelker)	Date: 2005-04-28 20:13
Logged In: YES user_id=1153395 In a similar vein, I'd like to see that a `key' keyword argument was added to bisect.* (and perhaps `reversed' too)--it's really annoying to sort something by (not __lt__) and not be able to bsearch it. It got added to min/max and heapq.n(smallest\|largest) in 2.5 fwiw ;)
msg54451 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2005-06-03 01:25
Logged In: YES user_id=80475 Overall, I'm -1 on this RFE. The comparison to nsmallest() and nlargest() is inaccurate. They run start-to-finish in one function call. The other heapq methods do not use key functions because they have to leave the original data structure unmolested between calls; hence, there is no ability to limit the key function calls to one per record. Likewise, with this request, the key function calls get wasted. The sort() method calls key() for every record and tosses the result afterwards. Each subsequect call to bisect() would need to repeat those calls for a log(N) subset of the data. Hence, accepting this request would create an API that encourages a wasteful design.
msg54452 - (view)	Author: Jonas Kölker (jonaskoelker)	Date: 2005-06-10 23:44
Logged In: YES user_id=1153395 > [...] but it could be a significant benefit for other functions that are expensive to compute (because repeated calls to bisect will access the lt function more than once). > Each subsequect call to bisect() would need to repeat those calls for a log(N) subset of the data. Which is exactly why I'm suggesting the key argument: to save those extra calls to the key function. Since that sounds counter-intuitive, let me explain: one sorts (origkey(item), item) pairs, the bisects with key=itemgetter(0), not calling the expensive origkey.
msg54453 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2005-06-11 02:26
Logged In: YES user_id=80475 The whole purpose of the key= argument was to avoid having to build (key, record) tuples. If those are going to be built anyway, then there is little point to using key=. Closing this because: 1) the original use case was better addressed through other methods 2) key= for bisect does not provide the same benefits as it does for other functions. 3) the most recent proposed use is far afield from what key= is supposed to do 4) I think it is bad design and would encourage misguided approaches.
msg54454 - (view)	Author: Andraz Tori (minmax)	Date: 2006-11-22 18:50
This really is annoying. If sort() accepts different comparison functions, so should bisect & co. It makes things that should be easy, very hard. If you are writting threaded programs, you know that functions passed to sort and bisect have to behave properly.

History
Date	User	Action	Args
2022-04-11 14:56:10	admin	set	github: 41873
2005-04-18 19:26:50	mciura	create