classification
Title: splitext performances improvement
Type: Stage:
Components: Library (Lib) Versions: Python 2.3
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: loewis Nosy List: arigo, loewis, s_keim, tim.peters
Priority: normal Keywords: patch

Created on 2002-03-29 08:06 by s_keim, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Files
File name Uploaded Description
posixpath.dif s_keim, 2002-03-29 08:06 posixpath.py and test_posixpath.py diff
xxxpath.dif s_keim, 2002-04-03 07:03
Messages (9)
msg39410 - (view) Author: Sebastien Keim (s_keim) Date: 2002-03-29 08:06
After more thought, I must admit that the behavior change in splitext that I proposed with patch 536120 is not acceptable. So I would instead propose this one, which should only improve performance without modifying behavior.
The following benchmark shows that the patched splitext is between 2x (for l1) and 25x (for l2) faster than the original one.

The diff also patches test_posixpath.py to check the pitfall described in Tim's comments on the patch 536120 page.

def splitext(p):
    root, ext = '', ''
    for c in p:
        if c == '/':
            root, ext = root + ext + c, ''
        elif c == '.':
            if ext:
                root, ext = root + ext, c
            else:
                ext = c
        elif ext:
            ext = ext + c
        else:
            root = root + c
    return root, ext

def splitext2(p):
    i = p.rfind('.')
    if i<=p.rfind('/'):
        return p, ''
    else:
        return p[:i], p[i:]

l1 = ('t','.t','a.b/','a.b','/a.b','a.b/.c','a.b/c.d')

l2 = (
'usr/tmp.doc/list/home/sebastien/foo/bar/hghgt/yttyutyuyuttyuyut.tyyttyt',
'usr/tmp.doc/list/home/sebastien/foo/bar/hghgt/yttyutyuyuttyuyut.',
'usr/tmp.doc/list/home/sebastien/foo/bar/hghgt/.tyyttyt',
'usr/tmp.doc/list/home/sebastien/foo/bar/hghgt/yttyutyuyuttyuyut',
'reeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeyttyutyuyuttyuyut.tyyttyt',
'/iuouiiuuoiiuiikhjzekezhjzekejkejkzejkhejkhzejzehjkhjezhjkehzkhjezh.tyyttyt'
    )

for i in l1+l2:
    assert splitext2(i) == splitext(i)

import time

def test(f,args):
    t = time.clock()
    for p in args:
        for i in range(1000):
            f(p)
    return time.clock() - t

def f(p):pass

a=test(splitext, l1)
b=test(splitext2, l1)
c=test(f,l1)
print a,b,c,(a-c)/(b-c)

a=test(splitext, l2)
b=test(splitext2, l2)
c=test(f,l2)
print a,b,c,(a-c)/(b-c)
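
The script above targets the Python 2 of the time (`print` statement, `time.clock`). For readers who want to repeat the comparison on a current interpreter, here is a minimal Python 3 sketch of the same idea. The names `splitext_loop`, `splitext_rfind` and `bench`, the use of `timeit`, and the repetition count are assumptions made for illustration; they are not part of the patch, and the measured ratio will differ from the 2x to 25x reported above.

```python
import timeit


def splitext_loop(p):
    """Character-by-character version (the original algorithm)."""
    root, ext = '', ''
    for c in p:
        if c == '/':
            # A separator ends any pending extension: fold it into root.
            root, ext = root + ext + c, ''
        elif c == '.':
            if ext:
                root, ext = root + ext, c
            else:
                ext = c
        elif ext:
            ext = ext + c
        else:
            root = root + c
    return root, ext


def splitext_rfind(p):
    """Proposed version: locate the last dot and the last slash."""
    i = p.rfind('.')
    if i <= p.rfind('/'):
        # No dot, or the last dot belongs to a directory component.
        return p, ''
    return p[:i], p[i:]


short_paths = ('t', '.t', 'a.b/', 'a.b', '/a.b', 'a.b/.c', 'a.b/c.d')
long_paths = (
    'usr/tmp.doc/list/home/sebastien/foo/bar/hghgt/yttyutyuyuttyuyut.tyyttyt',
    'usr/tmp.doc/list/home/sebastien/foo/bar/hghgt/.tyyttyt',
)

# Both versions must agree before timing them means anything.
for path in short_paths + long_paths:
    assert splitext_loop(path) == splitext_rfind(path)


def bench(func, paths, number=200):
    """Return the seconds needed to call func on every path, number times."""
    return timeit.timeit(lambda: [func(p) for p in paths], number=number)


for label, paths in (('short', short_paths), ('long', long_paths)):
    slow = bench(splitext_loop, paths)
    fast = bench(splitext_rfind, paths)
    print(label, slow, fast)
```

Unlike the original script, this sketch does not subtract the cost of an empty function call, so the figures include call overhead for both versions.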
msg39411 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-03-29 09:49
The patch looks good to me.
msg39412 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-03-29 18:56
I like it fine so far as it goes, but I'd like it a lot 
more if it also patched the splitext and test 
implementations for other platforms.  It's not good that, 
e.g., posixpath.py and ntpath.py get more and more out of 
synch over time, and that their test suites also diverge.
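
One way to keep the per-platform implementations from drifting apart is to parameterize the rfind approach by the set of separator characters. The sketch below is an illustration only: `splitext_generic`, `posix_splitext` and `nt_splitext` are hypothetical names, and the choice of `'/'` and `'\\'` as the NT separators is an assumption here, not a description of what the patch does.

```python
def splitext_generic(p, seps='/'):
    """Split p into (root, ext) using the rfind strategy.

    seps is a string of characters that separate path components.
    """
    dot = p.rfind('.')
    # Position of the last separator of any kind (-1 if there is none).
    last_sep = max(p.rfind(s) for s in seps)
    if dot <= last_sep:
        # No dot at all, or the last dot sits in a directory name.
        return p, ''
    return p[:dot], p[dot:]


def posix_splitext(p):
    return splitext_generic(p, '/')


def nt_splitext(p):
    # Assumption: both slash styles count as separators on NT.
    return splitext_generic(p, '/\\')
```

With a shared helper like this, a single list of test cases could be run against every platform wrapper, varying only the separator.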
msg39413 - (view) Author: Sebastien Keim (s_keim) Date: 2002-04-02 07:15
I have taken a look at macpath, dospath and ntpath, and I have found quite a lot of code duplication. What would your opinion be if I tried to do a little refactoring on this?
msg39414 - (view) Author: Sebastien Keim (s_keim) Date: 2002-04-02 07:28
I have taken a look at macpath, dospath and ntpath, and I have found quite a lot of code duplication. What would your opinion be if I tried to do a little refactoring on this?
msg39415 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-04-02 09:24
Sharing code is a good thing. However, exactly how this is
done is critical, since os is such a central module. If you
start now and don't get agreement immediately, it may well
be that you cannot complete it until Python 2.3.
msg39416 - (view) Author: Sebastien Keim (s_keim) Date: 2002-04-03 07:03
xxxpath.dif contains the splitext patch for posixpath, ntpath, dospath and macpath, and the corresponding test files (I have added a test file for macpath).

I thought it better not to attempt to modify riscospath.py, since I don't know this platform. In any case, it already uses an rfind strategy.
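
As an illustration of the kind of checks such a test file might contain, here is a small `unittest` sketch for the posix variant. It is not taken from xxxpath.dif: the class name, the helper `run_tests` and the local copy `splitext_rfind` are assumptions, and the expected values are simply what the rfind version posted earlier in this thread returns for the `l1` inputs.

```python
import unittest


def splitext_rfind(p):
    """Local copy of the rfind-based splitext from this thread."""
    i = p.rfind('.')
    if i <= p.rfind('/'):
        return p, ''
    return p[:i], p[i:]


class SplitextTest(unittest.TestCase):
    # (path, expected (root, ext)) pairs built from the l1 inputs.
    CASES = [
        ('t', ('t', '')),
        ('.t', ('', '.t')),
        ('a.b/', ('a.b/', '')),      # dot only in the directory part
        ('a.b', ('a', '.b')),
        ('/a.b', ('/a', '.b')),
        ('a.b/.c', ('a.b/', '.c')),
        ('a.b/c.d', ('a.b/c', '.d')),
    ]

    def test_expected_splits(self):
        for path, expected in self.CASES:
            with self.subTest(path=path):
                self.assertEqual(splitext_rfind(path), expected)

    def test_parts_rebuild_path(self):
        # root + ext must always give back the original path.
        for path, _ in self.CASES:
            with self.subTest(path=path):
                root, ext = splitext_rfind(path)
                self.assertEqual(root + ext, path)


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SplitextTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` runs both checks. The `'a.b/'` case covers the situation where the only dot appears in a directory name, which the rfind version handles by comparing the dot position with the last slash.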
msg39417 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2002-12-07 16:04
The test_macpath module should probably use

    from test import test_support

instead of

    import test_support

Apart from this the patch looks fine.
msg39418 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-12-12 20:31
Sebastien, thanks for the patch, and Armin, thanks for the
review. Applied as

macpath.py 1.41
ntpath.py 1.52
posixpath.py 1.55
test_macpath.py 1.1
test_ntpath.py 1.17
test_posixpath.py 1.5

(dospath has been removed in the meantime)
History
Date User Action Args
2022-04-10 16:05:10 admin set github: 36353
2002-03-29 08:06:22 s_keim create