classification
Title: Better/faster implementation of os.path.split
Type: Stage:
Components: Library (Lib) Versions: Python 2.4
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: einsteinmg
Priority: normal Keywords:

Created on 2006-09-17 14:09 by einsteinmg, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (4)
msg29857 - Author: Michael Gebetsroither (einsteinmg) Date: 2006-09-17 14:09
hi,

os.path.split performs poorly on long pathnames:

def split(p):
    i = p.rfind('/') + 1
    head, tail = p[:i], p[i:]
    if head and head != '/'*len(head):
        head = head.rstrip('/')
    return head, tail

The main problem is '/'*len(head): it constructs an unnecessary
string that is sometimes thousands of characters long.

This would be better:
if head and len(head) != head.count('/'):
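
A rough sketch of that suggestion, for comparison. The names split_original and split_count are made up for this illustration and are not part of posixpath. The only change is the "is head all slashes?" test, which counts characters instead of building a second string:

```python
def split_original(p):
    """posixpath.split as quoted in this report."""
    i = p.rfind('/') + 1
    head, tail = p[:i], p[i:]
    # Builds a temporary string of len(head) slashes for the comparison.
    if head and head != '/' * len(head):
        head = head.rstrip('/')
    return head, tail


def split_count(p):
    """Same logic, but the all-slashes test allocates no new string."""
    i = p.rfind('/') + 1
    head, tail = p[:i], p[i:]
    # head consists only of slashes exactly when every character is a '/'.
    if head and len(head) != head.count('/'):
        head = head.rstrip('/')
    return head, tail


if __name__ == '__main__':
    samples = ['', '/', '///', '/usr/lib', '//usr//lib//', 'file.txt']
    for path in samples:
        # Both variants should make the same decision for every input.
        assert split_count(path) == split_original(path)
        print(repr(path), split_count(path))
```

Both functions still scan head once, so this removes an allocation rather than changing the overall cost from linear to constant.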

BUT:
what is 'if head and head != '/'*len(head):' for?
IMHO this check is unnecessary, because all it says is:
if head is non-empty and is not all '/' => rstrip '/'

IMHO it would be better to rstrip '/' from head and, if head
is then empty, set it to '/'. The effect would be the same,
because as a path a single '/' is equivalent to '/'*len(head).

def split(p):
    i = p.rfind('/') + 1
    head, tail = p[:i], p[i:]
    head = head.rstrip('/')
    if not head:
        head = '/'
    return head, tail

Such an implementation would be much faster for long
pathnames.
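
A quick way to check the proposal against the original, as a sketch only. The function names are made up for this illustration, and split_guarded is a hypothetical variant that is not taken from the patch. As written in the message, the simplified version differs from the original in two cases: an all-slash head such as '///' is collapsed to '/', and a path with no slash at all gets the head '/' instead of ''.

```python
def split_original(p):
    """posixpath.split as quoted in this report."""
    i = p.rfind('/') + 1
    head, tail = p[:i], p[i:]
    if head and head != '/' * len(head):
        head = head.rstrip('/')
    return head, tail


def split_proposed(p):
    """The simplified version exactly as written in the message."""
    i = p.rfind('/') + 1
    head, tail = p[:i], p[i:]
    head = head.rstrip('/')
    if not head:
        head = '/'
    return head, tail


def split_guarded(p):
    """Hypothetical variant: only substitute '/' when a slash was present."""
    i = p.rfind('/') + 1
    head, tail = p[:i], p[i:]
    stripped = head.rstrip('/')
    # An empty head (no slash in p) stays empty; an all-slash head becomes '/'.
    if head and not stripped:
        stripped = '/'
    return stripped, tail


if __name__ == '__main__':
    for path in ['/usr/lib/python', '/', '///tmp', 'file.txt']:
        print(repr(path), split_original(path),
              split_proposed(path), split_guarded(path))
```

Whether collapsing '///' to '/' is acceptable depends on whether callers rely on the exact string that split returns, so the guarded variant is a judgment call rather than a drop-in replacement.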

greets,
michael
msg29858 - Author: Michael Gebetsroither (einsteinmg) Date: 2006-09-18 11:08
Sorry, I haven't benchmarked my solution.
msg29859 - Author: Michael Gebetsroither (einsteinmg) Date: 2006-09-18 11:25
The patch passes all unit tests for posixpath.

basename( 310 ) means basename called with a path of length 310.

sum = 0.0453672409058 min = 4.19616699219e-05 posixpath.basename( 310 )
sum = 0.15571641922 min = 0.000146865844727 posixpath_orig.basename( 310 )

sum = 0.0432558059692 min = 4.10079956055e-05 posixpath.basename( 106 )
sum = 0.128361940384 min = 0.000113964080811 posixpath_orig.basename( 106 )

sum = 0.0422701835632 min = 4.10079956055e-05 posixpath.basename( 21 )
sum = 0.118340730667 min = 0.000111818313599 posixpath_orig.basename( 21 )

So the optimized basename is about 3 times faster than the
old one, and its advantage grows for longer paths.

sum = 0.124966621399 min = 0.000120878219604 posixpath.dirname( 310 )
sum = 0.156893730164 min = 0.000144958496094 posixpath_orig.dirname( 310 )

sum = 0.0986065864563 min = 9.10758972168e-05 posixpath.dirname( 106 )
sum = 0.117443084717 min = 0.000113964080811 posixpath_orig.dirname( 106 )

sum = 0.0905299186707 min = 8.89301300049e-05 posixpath.dirname( 21 )
sum = 0.118889808655 min = 0.000111103057861 posixpath_orig.dirname( 21 )

The optimized dirname is also faster, though by a smaller margin
(roughly 1.2 to 1.3 times in these runs). It also saves an
allocation, which could save a few cycles later.
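
The patch and the benchmark script are not included in this report, so the following is only a sketch of how numbers in the same "sum / min" shape could be collected with timeit. Everything in it is an assumption made for illustration: basename_via_split assumes basename is defined as split(p)[1], basename_direct is one hypothetical way to skip the head handling entirely, and make_path and bench are made-up helpers.

```python
import timeit


def split_original(p):
    """posixpath.split as quoted in this report."""
    i = p.rfind('/') + 1
    head, tail = p[:i], p[i:]
    if head and head != '/' * len(head):
        head = head.rstrip('/')
    return head, tail


def basename_via_split(p):
    """Assumed baseline: basename computed through split."""
    return split_original(p)[1]


def basename_direct(p):
    """Hypothetical fast path: slice off the tail without touching head."""
    return p[p.rfind('/') + 1:]


def make_path(length):
    """Build an absolute test path of exactly `length` characters."""
    p = ('/abcd' * (length // 5 + 1))[:length]
    # Avoid a trailing slash so that the basename is non-empty.
    return p[:-1] + 'x' if p.endswith('/') else p


def bench(func, path, repeat=5, number=10000):
    """Return (sum, min) over `repeat` timing runs of `number` calls each."""
    times = timeit.repeat(lambda: func(path), repeat=repeat, number=number)
    return sum(times), min(times)


if __name__ == '__main__':
    for length in (310, 106, 21):
        path = make_path(length)
        for func in (basename_direct, basename_via_split):
            total, best = bench(func, path)
            print('sum = %r min = %r %s( %d )'
                  % (total, best, func.__name__, length))
```

Absolute timings depend on the machine and the Python version, so only the ratio between the two variants is meaningful.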
msg29860 - Author: Michael Gebetsroither (einsteinmg) Date: 2006-09-18 11:29
@#&^%#@$ web form :(
History
Date                 User        Action  Args
2022-04-11 14:56:20  admin       set     github: 43986
2006-09-17 14:09:20  einsteinmg  create