classification
Title: os.path.split does not handle . & .. properly
Type: Stage:
Components: Library (Lib) Versions:
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: csiemens, jepler
Priority: normal Keywords:

Created on 2003-06-05 01:03 by csiemens, last changed 2022-04-10 16:09 by admin. This issue is now closed.

Messages (6)
msg16253 - (view) Author: Curtis Siemens (csiemens) Date: 2003-06-05 01:03
The os.path.split() & posixpath.split() functions in my
opinion do not handle '.' & '..' at the end of a path
properly which causes os.path.dirname() &
os.path.basename() to also return the wrong result
because they are directly based on os.path.split().

I'll demonstrate the Unix Python case (the Windows
ntpath.py case is just a close parallel variation).

Example:
>python
Python 2.1.1
>>> posixpath.split('.')
('', '.')
>>> posixpath.split('..')
('', '..')

Yet:
>>> posixpath.split('./')
('.', '')
>>> posixpath.split('../')
('..', '')

Now '.' really represents './', and '..' really represents '../'.
Since the split() function simply uses a string split on '/' to
find directories, it goofs up on this one case.  The '.' and
'..' are like the slash character in the sense that they all
only refer to directories.
The '.' & '..' can never be files in Unix or Windows, so I
think that the split() function should treat paths like:
    .
    ..
    dir/.
    dir/..
    /dir1/dir2/.
    /dir1/dir2/..
as not having a file portion, just as if:
    ./
    ../
    dir/./
    dir/../
    /dir1/dir2/./
    /dir1/dir2/../
respectively were given instead.

The fix in posixpath.py for this is just to put a little path
processing code at the beginning of the split() function
that looks for the following cases:
    if p in ['.','..'] or p[-2:] == '/.' or p[-3:] == '/..':
        p = p+'/'
and then goes into all the regular split() code.
The fix in ntpath.py is very similar.
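
A minimal sketch of the proposal above, written as a wrapper so the current and proposed results can be compared side by side. The name `proposed_split` is hypothetical and is not part of posixpath; the stdlib function itself is left untouched.

```python
import posixpath

def proposed_split(p):
    """Hypothetical wrapper illustrating the proposal; not a stdlib function."""
    # Treat a trailing '.' or '..' component as a directory by appending '/'.
    if p in ('.', '..') or p.endswith('/.') or p.endswith('/..'):
        p = p + '/'
    return posixpath.split(p)

for path in ['.', '..', 'dir/.', 'dir/..', '/dir1/dir2/.', '/dir1/dir2/..']:
    # Columns: input path, current result, result under the proposal.
    print(repr(path), posixpath.split(path), proposed_split(path))
```

Paths that do not end in a '.' or '..' component go through the regular split() unchanged.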
msg16254 - (view) Author: Jeff Epler (jepler) Date: 2003-06-08 15:34
I don't believe this behavior is a bug.  os.path.split's task is to split the last component of a path from the other components, regardless of whether any of the components actually names a directory.

Another property of os.path.split is that eventually this loop will terminate:
    while path != "": path = os.path.split(path)[0]
With your proposed change, this would not be true for paths that initially contain a "." or ".." component (since os.path.split("..") would then return ('..', '')).
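
A rough illustration of the termination argument for a relative path. The helpers `heads` and `proposed_split` are hypothetical names introduced for this sketch, and the `limit` argument only exists so that a non-terminating case cannot hang the example.

```python
import posixpath

def proposed_split(p):
    """Hypothetical version of split() with the proposed change applied."""
    if p in ('.', '..') or p.endswith('/.') or p.endswith('/..'):
        p = p + '/'
    return posixpath.split(p)

def heads(path, split=posixpath.split, limit=10):
    """Collect successive heads of `path`, giving up after `limit` steps."""
    seen = []
    while path != "" and len(seen) < limit:
        path = split(path)[0]
        seen.append(path)
    return seen

print(heads("a/b/.."))                                 # shrinks to ''
print(heads("a/b/..", split=proposed_split, limit=5))  # never shrinks
```

With the current split() the head gets shorter on every step; with the proposed variant the same string comes back each time, so only the step limit ends the loop.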
msg16255 - (view) Author: Curtis Siemens (csiemens) Date: 2003-06-12 22:59
Ok, I see your points, but I have 2 points.

Point 1:
Your loop 'while path != "": path = os.path.split(path)[0]'
won't stop with an absolute path because it will get down
to '/' and spin forever.
OK, so you can modify it to be:
  while path != "" and path != '/': path = os.path.split(path)[0]
But this too will spin if you start with an absolute path that has
2 or more slashes at the front of the path - like '//dir1/dir2'
or '///dir1/dir2'.
OK, you can fix that up too by doing something like:
   old_path = ''
   while path != old_path:
       old_path = path
       path = os.path.split(path)[0]
But that final loop will work with my new os.path.split
proposal - which makes me question your assertion that
split should have the 'terminate loop' property.
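
The fixed-point loop from Point 1 can be sketched as a small function. The name `ancestors` is hypothetical; the sketch uses posixpath directly so it behaves the same on any platform.

```python
import posixpath

def ancestors(path):
    """Split repeatedly until the head stops changing; return each new head."""
    result = []
    old_path = ''
    while path != old_path:
        old_path = path
        path = posixpath.split(path)[0]
        if path != old_path:
            result.append(path)
    return result

print(ancestors('/dir1/dir2'))    # stops at '/'
print(ancestors('//dir1/dir2'))   # stops at '//'
print(ancestors('dir1/dir2'))     # stops at ''
```

Comparing against the previous value instead of a fixed sentinel is what lets the loop stop on '/', '//', and '' alike.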

Point 2:
You may be right about os.path.split's stated task/job.
So maybe the change shouldn't be done to os.path.split(),
but rather os.path.dirname() & os.path.basename() should
be changed to not just simply return the 1st and 2nd
components of split(), but rather try to be as "smart" as
possible: dirname's intention is to return the directory
portion, and basename's intention is to return the (end)
filename portion - if possible.  With paths like /abc/xyz
you have no idea if xyz is a file or dir, so the default should
be 'file'.  Currently /abc/xyz/ knows that xyz is a dir and
returns /abc/xyz for the dirname and '' for the basename.
My point is that currently basename/dirname are "smart"
and not just returning the last component that is a file or
is a directory, otherwise it would return /abc for the dirname
and xyz/ for the basename.
So given the current behavior of dirname/basename, they
should be smart in ALL "we can tell it's a directory" cases
such as:
  .
  ..
  dir/.
  dir/..
  /dir1/dir2/.
  /dir1/dir2/..

So do I have a good Point #1, and more importantly do I have
a good Point #2 - and if I do I could change this bug's title
to be os.path.dirname/basename related.

Curtis Siemens
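
For reference, a short sketch of what dirname() and basename() currently return for the paths discussed in Point 2. The `results` dictionary is just a convenience for this example.

```python
import posixpath

paths = ['/abc/xyz', '/abc/xyz/', '.', '..', 'dir/.', 'dir/..',
         '/dir1/dir2/.', '/dir1/dir2/..']

# Map each path to its (dirname, basename) pair under the current behavior.
results = {p: (posixpath.dirname(p), posixpath.basename(p)) for p in paths}

for p, (d, b) in results.items():
    print(f"{p!r:18} dirname={d!r:14} basename={b!r}")
```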
msg16256 - (view) Author: Jeff Epler (jepler) Date: 2003-06-13 12:03
OK-- so my statement of the "important property" of split
was only correct in the case of a non-absolute path.

The important point is that split shortens the path whenever
it contains more than one component.  You propose that of
the values given by repeated splits of "/foo/.."  or
"foo/..", you'll never see the one-component return "foo" or
"/foo".  Why do you believe that in the loop
    while 1:
        p = os.path.split(p)[0]
that p should never have one of those values?  To me this seems
obviously incorrect.

You didn't respond to my point that os.path.split is about
components, not about whether those components name
directories.  For instance, because "/usr/local/bin" names a
directory on my system, shouldn't
os.path.split("/usr/local/bin") -> ('/usr/local/bin', '') if
your test really is about whether the final component names
a directory?  To me this seems obviously incorrect.

Let me also address your claim that because of this split
behavior, basename and dirname behave improperly.  This is
also wrong.  In "/tmp/.." and "/usr/local/bin", the first
names an entry ".." in the directory "/tmp", and the second
names an entry "bin" in the directory "/usr/local", just
like "/bin/sh" names an entry "sh" in the directory "/bin".
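
A small sketch of the "entry in a directory" reading. It relies only on the fact that posixpath.split works on the string alone and never consults the filesystem; the `pairs` dictionary is introduced just for this example.

```python
import posixpath

examples = ["/tmp/..", "/usr/local/bin", "/bin/sh"]

# split() is purely lexical, so it cannot know (and does not care)
# whether the last component names a directory on this machine.
pairs = {p: posixpath.split(p) for p in examples}

for p, (head, tail) in pairs.items():
    print(f"{p!r}: entry {tail!r} in directory {head!r}")
```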

I strongly believe this bug should be marked closed,
resolution: invalid.
msg16257 - (view) Author: Curtis Siemens (csiemens) Date: 2003-06-13 18:43
Ok, I like the statement,
  "split shortens the path whenever it contains more than one
   component"
I can go with that definition of os.path.split()
because that's consistent for all paths, absolute or relative,
and given that definition I'll agree that split is about
components.

Ok, onto dirname/basename which are really the source of my
concern.  I looked at the python documentation for basename()
and I think that it points out a problem that has been
tolerated.
It states:
    Note that the result of this function is different from the
    Unix basename program; where basename for '/foo/bar/'
    returns 'bar', the basename() function returns an empty
    string ('').
You state that the final component of a path should be
returned for basename() regardless of whether it is a file or
directory.
I can get behind that, but then I think that statement supports
the Unix basename function implementation where /foo/bar/
has 'bar' (or 'bar/') returned for basename because /foo/bar
and /foo/bar/ are the same path, and to me 'bar' or 'bar/' is
the same single component since the trailing slash (and only
the trailing slash(es) case) is redundant.  Am I way off on
this?
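
A rough approximation of the Unix-style behavior described above, for comparison with the current basename(). The function `unix_basename` is a hypothetical helper, not a stdlib API, and its handling of the empty string and of all-slash paths is an assumption rather than a faithful copy of any particular basename program.

```python
import posixpath

def unix_basename(p):
    """Hypothetical helper: ignore trailing slashes before taking the tail."""
    stripped = p.rstrip('/')
    if not stripped:
        # Either the path was empty or it consisted only of slashes.
        return '/' if p else ''
    return posixpath.basename(stripped)

for p in ['/foo/bar', '/foo/bar/', '/foo/bar//', '/']:
    print(repr(p), repr(posixpath.basename(p)), repr(unix_basename(p)))
```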
msg16258 - (view) Author: Jeff Epler (jepler) Date: 2003-06-13 19:44
Interestingly, it appears that back in Python version 1.2,
os.path.split may have behaved in the way you described.
From
http://www.via.ecp.fr/python-doc/python-lib/posixpath.html :
split (p) -- function of module posixpath
    Split the pathname p in a pair (head, tail), where tail
is the last pathname component and head is everything
leading up to that. If p ends in a slash (except if it is
the root), the trailing slash is removed and the operation
applied to the result; otherwise, join(head, tail) equals p.
The tail part never contains a slash. Some boundary cases:
if p is the root, head equals p and tail is empty; if p is
empty, both head and tail are empty; if p contains no slash,
head is empty and tail equals p.

By version 1.4, the behavior had changed:
http://www.python.org/doc/1.4/lib/node75.html
split(p)
    Split the pathname p in a pair (head, tail), where tail
is the last pathname component and head is everything
leading up to that. The tail part will never contain a
slash; if p ends in a slash, tail will be empty. If there is
no slash in p, head will be empty. If p is empty, both head
and tail are empty. Trailing slashes are stripped from head
unless it is the root (one or more slashes only). In nearly
all cases, join(head, tail) equals p (the only exception
being when there were multiple slashes separating head from
tail).
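
The join(head, tail) property from the quoted documentation can be checked with a short sketch against the current posixpath. The helper name `round_trips` is hypothetical.

```python
import posixpath

def round_trips(p):
    """Return True if joining the two halves of split(p) reproduces p."""
    head, tail = posixpath.split(p)
    return posixpath.join(head, tail) == p

for p in ["a/b", "a/b/", "/", "", "a//b"]:
    print(repr(p), posixpath.split(p), round_trips(p))
```

Only the path with multiple slashes between head and tail fails to round-trip, which matches the exception the documentation calls out.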

This change in the Python CVS was made by Guido himself,
between the 1.2 and 1.3 releases:
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Lib/posixpath.py.diff?r1=1.15&r2=1.16

Since the behavior you are now proposing was one that Guido
explicitly got rid of, it seems like an uphill battle to ask
for it back, especially since the current behavior has been
clearly documented for the 1.3, 1.4, 1.5, 1.6, 2.0, 2.1, and
2.2 releases (the last 7 major releases, spanning something
like 8 years---or about the time of the introduction of
keyword arguments, according to 1.3's Misc/NEWS)
History
Date User Action Args
2022-04-10 16:09:02 admin set github: 38593
2003-06-05 01:03:01 csiemens create