Issue749261
Created on 2003-06-05 01:03 by csiemens, last changed 2022-04-10 16:09 by admin. This issue is now closed.
Messages (6)
msg16253 - (view) | Author: Curtis Siemens (csiemens) | Date: 2003-06-05 01:03 | |
The os.path.split() and posixpath.split() functions, in my opinion, do not handle '.' and '..' at the end of a path properly. This causes os.path.dirname() and os.path.basename() to return the wrong result as well, because they are based directly on os.path.split(). I'll demonstrate the Unix case (the Windows ntpath.py case is a close parallel).

Example:

    >python
    Python 2.1.1
    >>> posixpath.split('.')
    ('', '.')
    >>> posixpath.split('..')
    ('', '..')

Yet:

    >>> posixpath.split('./')
    ('.', '')
    >>> posixpath.split('../')
    ('..', '')

Now '.' really represents './', and '..' really represents '../'. Since the split() function simply splits the string on '/' to find directories, it goofs up on this one case. '.' and '..' are like the slash character in the sense that they only ever refer to directories. They can never be files on Unix or Windows, so I think the split() function should treat paths like:

    .
    ..
    dir/.
    dir/..
    /dir1/dir2/.
    /dir1/dir2/..

as not having a file portion, just as if:

    ./
    ../
    dir/./
    dir/../
    /dir1/dir2/./
    /dir1/dir2/../

respectively were given instead.

The fix in posixpath.py is just to put a little path processing code at the beginning of the split() function that looks for the following cases:

    if p in ['.','..'] or p[-2:] == '/.' or p[-3:] == '/..':
        p = p+'/'

and then goes into all the regular split() code. The fix in ntpath.py is very similar.
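The proposal above can be sketched as a wrapper around the stock function, assuming a current Python where `posixpath.split()` still behaves as shown in the session. The name `proposed_split` is hypothetical and is not part of any library; it only applies the pre-check quoted in the message.

```python
import posixpath


def proposed_split(p):
    """Hypothetical split() with the pre-check proposed in this issue."""
    # Treat a trailing '.' or '..' component as if a slash followed it.
    if p in ('.', '..') or p[-2:] == '/.' or p[-3:] == '/..':
        p = p + '/'
    return posixpath.split(p)


# Print the stock result next to the proposed result for each example path.
for path in ['.', '..', 'dir/.', 'dir/..', '/dir1/dir2/.', '/dir1/dir2/..']:
    print(repr(path), posixpath.split(path), proposed_split(path))
```

With the stock function the final '.' or '..' ends up in the tail; with the sketch it stays in the head and the tail is empty.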
msg16254 - (view) | Author: Jeff Epler (jepler) | Date: 2003-06-08 15:34 | |
I don't believe this behavior is a bug. os.path.split's task is to split the last component of a path from the other components, regardless of whether any of the components actually names a directory.

Another property of os.path.split is that eventually this loop will terminate:

    while path != "":
        path = os.path.split(path)[0]

With your proposed change, this would not be true for paths that initially contain a "." or ".." component (since os.path.split("..") -> ('..', '')).
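A minimal sketch of this termination argument for relative paths. Both `heads` and `proposed_split` are hypothetical names introduced here, and the `limit` parameter is an added safety cap so that the non-terminating case can be shown without hanging.

```python
import posixpath


def proposed_split(p):
    """Hypothetical split() with the pre-check proposed in this issue."""
    if p in ('.', '..') or p[-2:] == '/.' or p[-3:] == '/..':
        p = p + '/'
    return posixpath.split(p)


def heads(path, split=posixpath.split, limit=10):
    """Repeat path = split(path)[0], giving up after `limit` steps."""
    seen = []
    while path != "" and len(seen) < limit:
        path = split(path)[0]
        seen.append(path)
    return seen


print(heads('a/b/c'))                 # stock split reaches ''
print(heads('a/..'))                  # stock split reaches ''
print(heads('a/..', proposed_split))  # stuck on 'a/..' until the limit
```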
msg16255 - (view) | Author: Curtis Siemens (csiemens) | Date: 2003-06-12 22:59 | |
Ok, I see your points, but I have two points of my own.

Point 1: Your loop

    while path != "":
        path = os.path.split(path)[0]

won't stop with an absolute path, because it will get down to '/' and spin forever. OK, so you can modify it to be:

    while path != "" and path != '/':
        path = os.path.split(path)[0]

But this too will spin if you start with an absolute path that has two or more slashes at the front, like '//dir1/dir2' or '///dir1/dir2'. OK, you can fix that up too by doing something like:

    old_path = ''
    while path != old_path:
        old_path = path
        path = os.path.split(path)[0]

But that final loop will also work with my proposed os.path.split, which makes me question your assertion that split should have the 'terminating loop' property.

Point 2: You may be right about os.path.split's intended job. So maybe the change shouldn't be made to os.path.split(). Instead, os.path.dirname() and os.path.basename() could be changed so that they do not simply return the first and second components of split(), but try to be as "smart" as possible: dirname's intention is to return the directory portion, and basename's intention is to return the (end) filename portion, if possible.

With paths like /abc/xyz you have no idea whether xyz is a file or a directory, so the default should be 'file'. Currently, for /abc/xyz/, the functions know that xyz is a directory and return /abc/xyz for the dirname and '' for the basename. My point is that basename/dirname are already "smart" and do not just return the last component regardless of whether it is a file or a directory; otherwise they would return /abc for the dirname and xyz/ for the basename. So, given the current behavior of dirname/basename, they should be smart in ALL "we can tell it's a directory" cases, such as:

    .
    ..
    dir/.
    dir/..
    /dir1/dir2/.
    /dir1/dir2/..

So do I have a good Point 1, and more importantly, do I have a good Point 2? If I do, I could change this bug's title to be os.path.dirname/basename related.

Curtis Siemens
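The fixed-point loop from Point 1 can be sketched as follows. `walk_up` and `proposed_split` are hypothetical names used only for illustration, and the list of visited heads is added so the behavior can be inspected.

```python
import posixpath


def proposed_split(p):
    """Hypothetical split() with the pre-check proposed in this issue."""
    if p in ('.', '..') or p[-2:] == '/.' or p[-3:] == '/..':
        p = p + '/'
    return posixpath.split(p)


def walk_up(path, split=posixpath.split):
    """Take heads repeatedly until the path stops changing."""
    seen = []
    old_path = ''
    while path != old_path:
        old_path = path
        path = split(path)[0]
        seen.append(path)
    return seen


print(walk_up('/a/b'))                   # settles on '/'
print(walk_up('//dir1/dir2'))            # settles on '//'
print(walk_up('a/b'))                    # settles on ''
print(walk_up('/a/..', proposed_split))  # settles at once on '/a/..'
```

Under these assumptions the loop terminates with either split function, but with the proposed variant it stops at '/a/..' and never reaches '/'.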
msg16256 - (view) | Author: Jeff Epler (jepler) | Date: 2003-06-13 12:03 | |
OK, so my statement of the "important property" of split was only correct in the case of a non-absolute path. The important point is that split shortens the path whenever it contains more than one component.

You propose that, of the values given by repeated splits of "/foo/.." or "foo/..", you'll never see the one-component result "foo" or "/foo". Why do you believe that in the loop

    while 1:
        p = os.path.split(p)[0]

p should never have one of those values? To me this seems obviously incorrect.

You didn't respond to my point that os.path.split is about components, not about whether those components name directories. For instance, because "/usr/local/bin" names a directory on my system, shouldn't os.path.split("/usr/local/bin") -> ('/usr/local/bin', '') if your test really is about whether the final component names a directory? To me this seems obviously incorrect.

Let me also address your claim that because of this split behavior, basename and dirname behave improperly. This is also wrong. In "/tmp/.." and "/usr/local/bin", the first names an entry ".." in the directory "/tmp", and the second names an entry "bin" in the directory "/usr/local", just like "/bin/sh" names an entry "sh" in the directory "/bin".

I strongly believe this bug should be marked closed, resolution: invalid.
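A short illustration of the "components, not directories" view, using `posixpath` directly so the result does not depend on the host platform or on which of these paths actually exist. The `results` dictionary is just a convenience introduced for this sketch.

```python
import posixpath

paths = ['/tmp/..', '/usr/local/bin', '/bin/sh']

# split() is purely textual: it never checks the filesystem.
results = {path: posixpath.split(path) for path in paths}

for path, (head, tail) in results.items():
    # dirname() and basename() are the two halves of split().
    assert posixpath.dirname(path) == head
    assert posixpath.basename(path) == tail
    print(f"{path!r}: entry {tail!r} in directory {head!r}")
```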
msg16257 - (view) | Author: Curtis Siemens (csiemens) | Date: 2003-06-13 18:43 | |
Ok, I like the statement "split shortens the path whenever it contains more than one component". I can go with that definition of os.path.split() because it is consistent for all paths, absolute or relative, and given that definition I'll agree that split is about components.

Ok, on to dirname/basename, which are really the source of my concern. I looked at the Python documentation for basename() and I think it points out a problem that has been tolerated. It states:

    Note that the result of this function is different from the Unix basename program; where basename for '/foo/bar/' returns 'bar', the basename() function returns an empty string ('').

You state that the final component of a path should be returned by basename() regardless of whether it is a file or a directory. I can get behind that, but then I think that statement supports the Unix basename implementation, where /foo/bar/ has 'bar' (or 'bar/') returned as its basename. /foo/bar and /foo/bar/ are the same path, and to me 'bar' and 'bar/' are the same single component, since the trailing slash (and only in the trailing-slash case) is redundant.

Am I way off on this?
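The difference quoted from the documentation can be sketched like this. `unix_like_basename` is a hypothetical helper, not a library function, and it only approximates the Unix `basename` program by stripping trailing slashes before taking the last component.

```python
import posixpath


def unix_like_basename(path):
    """Hypothetical helper: drop trailing slashes, then take the last component."""
    stripped = path.rstrip('/')
    if not stripped:
        # Empty input stays empty; a path made only of slashes becomes '/'.
        return '/' if path else ''
    return posixpath.basename(stripped)


for path in ['/foo/bar', '/foo/bar/', '/foo/bar//', '/']:
    print(repr(path), repr(posixpath.basename(path)), repr(unix_like_basename(path)))
```

Callers who want this behavior today can apply the same `rstrip('/')` step themselves before calling `basename()`.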
msg16258 - (view) | Author: Jeff Epler (jepler) | Date: 2003-06-13 19:44 | |
Interestingly, it appears that back in Python version 1.2, os.path.split may have behaved in the way you described. From http://www.via.ecp.fr/python-doc/python-lib/posixpath.html :

    split (p) -- function of module posixpath
    Split the pathname p in a pair (head, tail), where tail is the last pathname component and head is everything leading up to that. If p ends in a slash (except if it is the root), the trailing slash is removed and the operation applied to the result; otherwise, join(head, tail) equals p. The tail part never contains a slash. Some boundary cases: if p is the root, head equals p and tail is empty; if p is empty, both head and tail are empty; if p contains no slash, head is empty and tail equals p.

By version 1.4, the behavior had changed: http://www.python.org/doc/1.4/lib/node75.html

    split(p)
    Split the pathname p in a pair (head, tail), where tail is the last pathname component and head is everything leading up to that. The tail part will never contain a slash; if p ends in a slash, tail will be empty. If there is no slash in p, head will be empty. If p is empty, both head and tail are empty. Trailing slashes are stripped from head unless it is the root (one or more slashes only). In nearly all cases, join(head, tail) equals p (the only exception being when there were multiple slashes separating head from tail).

This change in the Python CVS was made by Guido himself, between the 1.2 and 1.3 releases: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Lib/posixpath.py.diff?r1=1.15&r2=1.16

Since the behavior you are now proposing is one that Guido explicitly got rid of, it seems like an uphill battle to ask for it back, especially since the current behavior has been clearly documented for the 1.3, 1.4, 1.5, 1.6, 2.0, 2.1, and 2.2 releases (the last 7 major releases, spanning something like 8 years, or about the time of the introduction of keyword arguments, according to 1.3's Misc/NEWS).
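The properties in the 1.4 wording quoted above can be spot-checked against a current `posixpath`, on the assumption that the documented behavior has not changed since. The sample paths and the `roundtrips` dictionary are chosen here for illustration.

```python
import posixpath

samples = ['', 'abc', '/', 'a/b', 'a/b/', '/a/b', 'a//b']

# Record whether join(head, tail) reproduces the original path.
roundtrips = {}
for p in samples:
    head, tail = posixpath.split(p)
    assert '/' not in tail  # the tail never contains a slash
    roundtrips[p] = posixpath.join(head, tail) == p
    print(repr(p), (head, tail), roundtrips[p])
```

Only 'a//b' fails the round trip, which matches the documented exception for multiple slashes separating head from tail.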
History
Date | User | Action | Args |
---|---|---|---|
2022-04-10 16:09:02 | admin | set | github: 38593 |
2003-06-05 01:03:01 | csiemens | create | |