Issue 681780: Faster commonprefix (OS independent)

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/37925

classification

Title:	Faster commonprefix (OS independent)
Type:		Stage:
Components:	Library (Lib)	Versions:	Python 2.3

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	rhettinger	Nosy List:	glchapman, nnorwitz, rhettinger, s_keim, tzot
Priority:	normal	Keywords:	patch

Created on 2003-02-06 17:03 by tzot, last changed 2022-04-10 16:06 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
commonprefix.diff	tzot, 2003-02-06 17:04	diff -u of ntpath, posixpath
commonprefix2.diff	tzot, 2003-02-06 18:02	diff -u
commonprefix_nobuf.diff	tzot, 2003-02-07 11:11	diff -u

Messages (8)
msg42702 - (view)	Author: Χρήστος Γεωργίου (Christos Georgiou) (tzot) *	Date: 2003-02-06 17:03
This routine is about 20% faster on a test set of 7 sets of strings run 100000 times each (I can provide the test if requested). The longer the common prefix is, the faster the routine becomes relative to original commonprefix. My only worry is that it might get rejected if it is considered too fancy; therefore I wasn't shy on commenting. I think we should also write a commonpathprefix, that will do what commonprefix should do, being in the *path.py module. I'll do that if none other does. The provided patch is for posixpath.py and ntpath.py, but since it's OS neutral it should work as is. It uses itertools for speed, though, so it is not backportable, but it can be if requested by substituting map for imap and a normal slice for islice.
msg42703 - (view)	Author: Χρήστος Γεωργίου (Christos Georgiou) (tzot) *	Date: 2003-02-06 17:04
Logged In: YES user_id=539787 For some reason, my IE never uploads the file on the first attempt.
msg42704 - (view)	Author: Neal Norwitz (nnorwitz) *	Date: 2003-02-06 17:25
Logged In: YES user_id=33168 As much as I'd like to blame IE, it's a SF bug AFAIK. http://sf.net/tracker/?func=detail&atid=200001&aid=675910&group_id=1
msg42705 - (view)	Author: Χρήστος Γεωργίου (Christos Georgiou) (tzot) *	Date: 2003-02-06 18:02
Logged In: YES user_id=539787 Best case: comparing this to the old version with a list: ['/usr/local/lib/python2.3/posixpath.py']120, 10000 iterations, the speed difference is: old: 319.58 sec new: 34.43 sec Since prefix_len always grows in the "while next_bit:" loop, applying commonprefix2.diff to the patched* version does a very minor speedup (comparing smaller buffers in every iteration); but it is only a matter of overoptimisation (ie it does not hurt, but it's a trivial one, just 0.1%).
msg42706 - (view)	Author: Χρήστος Γεωργίου (Christos Georgiou) (tzot) *	Date: 2003-02-07 11:11
Logged In: YES user_id=539787 I did my homework better, and found out that the buffer object quite probably will be deprecated. So I rewrote the routine without the buffer object (using str.startswith), which by the way got another 10% speedup (relative to the latest version using buffer.) The commonprefix_nobuf.diff patch applies directly to the original posixpath.py, ntpath.py. I will try to delete the other patches, but I don't think I am allowed to do it.
msg42707 - (view)	Author: Sebastien Keim (s_keim)	Date: 2003-03-04 08:01
Logged In: YES user_id=498191 I would suggest another possibility. This one use a property of strings ordering: if you have a<=b<=c and c.startswith(a) then b.startswith(a). I have tested two implementations : # a 5 lines function with a really straightforward code. # It can degenerate rather badly in the worst case (large strings # with a short common prefix) but is generally quite fast. def commonprefix1(m): if not m: return '' prefix, greater = min(m), max(m) while not greater.startswith(prefix): prefix = prefix[:-1] return prefix # The second use a bissection to avoid the worst case. This make # the implementation a little more complex but seems to provide the # fastest result. def commonprefix2(m): prefix = '' if m: low, high = min(m), max(m) while low: n = len(low)//2 + 1 l, h = low[:n], high[:n] if h==l: prefix += l low, high = low[n:], high[n:] else: low, high = l[:-1], h return prefix I personally prefer the commonprefix1 implementation: its the simplest one and it is probably fast enough for the few commonprefix use-cases (anyway, it is still faster than the current implementation).
msg42708 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2003-12-31 22:47
Logged In: YES user_id=80475 Applied Terry Reedy's solution to Lib/posixpath.py 1.64. His solution is at: http://groups.google.com/groups? th=fc7b54f11af6b24e&seekm=bss2so$om$5@news.t- online.com
msg42709 - (view)	Author: Greg Chapman (glchapman)	Date: 2004-08-15 23:48
Logged In: YES user_id=86307 Sorry for commenting on a closed item, but it appears the patched commonprefix has not made it beyond posixpath.py (at any rate, it's not in ntpath.py). Perhaps the change can be applied to the other path modules?

History
Date	User	Action	Args
2022-04-10 16:06:35	admin	set	github: 37925
2003-02-06 17:03:54	tzot	create