This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: os.path.normpath changes path (chops off trailing slash)
Type: behavior Stage: test needed
Components: Library (Lib) Versions: Python 3.0, Python 2.6
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: georg.brandl, iszegedi, josm, siemer
Priority: normal Keywords: patch

Created on 2007-04-25 23:44 by siemer, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (11)
msg31890 - Author: Robert Siemer (siemer) Date: 2007-04-25 23:44
Hello everybody!

>>> os.path.normpath('/etc/passwd')
'/etc/passwd'


I don't know any environment at all where

a) '/etc/passwd/'
b) '/etc/passwd'

are treated the same. It clearly does not apply to the path part of HTTP URLs (this is left as an exercise for the reader).

Nor does it apply to Linux, for example:
open() on path a) fails with ENOTDIR, while it succeeds with b).

(assuming /etc/passwd is a file)
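For readers who want to reproduce the a)/b) difference without touching the real /etc/passwd, here is a small sketch using a throwaway regular file in a temporary directory. It assumes a POSIX system such as Linux, where opening a regular file through a path with a trailing slash fails with ENOTDIR; the helper name trailing_slash_errno is made up for this example.

```python
import errno
import os
import tempfile

def trailing_slash_errno(path):
    """Open `path` normally, then with a trailing slash; return the errno of the second attempt, or None."""
    with open(path):            # b) the plain path works for a regular file
        pass
    try:
        with open(path + '/'):  # a) the same path with a trailing slash
            pass
    except OSError as exc:
        return exc.errno
    return None

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'passwd')  # throwaway stand-in for /etc/passwd
    with open(target, 'w') as f:
        f.write('demo\n')
    err = trailing_slash_errno(target)
    print(errno.errorcode.get(err))  # ENOTDIR on Linux
```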

This is definitely not a documentation bug, as "normpath" should normalize a path and not fuck it up.


Robert
msg31891 - Author: Robert Siemer (siemer) Date: 2007-04-25 23:47
A bugreport bug:

The example should read os.path.normpath('/etc/passwd/')...
msg31892 - Author: Istvan Szegedi (iszegedi) Date: 2007-04-28 20:27

Here is what the POSIX standard says about pathnames:

"Base Definitions volume of IEEE Std 1003.1-2001, Section 3.266, Pathname.

A character string that is used to identify a file. In the context of IEEE Std 1003.1-2001, a pathname consists of, at most, {PATH_MAX} bytes, including the terminating null byte. It has an optional beginning slash, followed by zero or more filenames separated by slashes. A pathname may optionally contain one or more trailing slashes. Multiple successive slashes are considered to be the same as one slash."

And in the details:

"A pathname that contains at least one non-slash character and that ends with one or more trailing slashes shall be resolved as if a single dot character ( '.' ) were appended to the pathname."

So, if I am not mistaken, according to the POSIX standard the example you gave, '/etc/passwd/', should be normalized to '/etc/passwd/.'. That indeed does not happen.

The reason is that the normpath() function in posixpath.py uses split('/') to break the path into components, skips every component that is empty or '.', and at the end re-adds slashes only at the beginning of the string.
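The mechanism is easy to see interactively: splitting on '/' turns the trailing slash into an empty last component, which is then skipped like any other empty or '.' component. A minimal sketch (posixpath is imported explicitly so the result is the same on every platform; the list comprehension is a simplified stand-in for the loop in normpath() and ignores '..' handling):

```python
import posixpath

path = '/etc/passwd/'
print(path.split('/'))  # ['', 'etc', 'passwd', '']

# Simplified version of the loop: empty and '.' components are dropped,
# so the final '' produced by the trailing slash leaves no trace.
kept = [c for c in path.split('/') if c not in ('', '.')]
print(kept)  # ['etc', 'passwd']

print(posixpath.normpath(path))               # /etc/passwd
print(posixpath.normpath('/etc/./passwd//'))  # /etc/passwd
```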

As a test, I modified the normpath() function in posixpath.py as follows:

-- clip --

def normpath(path):
    """Normalize path, eliminating double slashes, etc."""
    if path == '':
        return '.'
    initial_slashes = path.startswith('/')
    # The next two lines were added by iszegedi
    path = path.rstrip()
    trailing_slash = path.endswith('/')
    # POSIX allows one or two initial slashes, but treats three or more
    # as single slash.
    if (initial_slashes and
        path.startswith('//') and not path.startswith('///')):
        initial_slashes = 2
    comps = path.split('/')
    new_comps = []
    for comp in comps:
        if comp in ('', '.'):
            continue
        if (comp != '..' or (not initial_slashes and not new_comps) or
             (new_comps and new_comps[-1] == '..')):
            new_comps.append(comp)
        elif new_comps:
            new_comps.pop()
    comps = new_comps
    path = '/'.join(comps)
    if initial_slashes:
        path = '/'*initial_slashes + path
    # The next two lines were added by iszegedi
    if trailing_slash:
        path = path + '/.'
    return path or '.'
  
-- clip --

So I added two lines (marked "added by iszegedi") at the beginning that remove any trailing whitespace and check whether the path ends with a slash. At the end of the function I added another two lines that append '/.' to the return value if the input path ended with a slash.

This now works fine.

What makes it a bit tricky is that the Python os module imports a different xxxpath.py module depending on the host operating system: there are separate modules for nt, mac, os2, posix, etc. The solution above works for posix, but the other modules need to be checked, too.
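In current Python 3 releases only posixpath and ntpath remain, but the mechanism is unchanged: os.path is an alias for one of them, and both can be imported on any platform. That makes it easy to compare their normpath() behavior side by side; the Windows path below is just an example.

```python
import ntpath
import os
import posixpath

# os.path is one of these two modules, chosen when os is imported.
print(os.path.__name__)  # 'posixpath' on Linux/macOS, 'ntpath' on Windows

# Both flavors drop a trailing separator in normpath().
print(posixpath.normpath('/etc/passwd/'))          # /etc/passwd
print(ntpath.normpath('C:\\Windows\\System32\\'))  # C:\Windows\System32
```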
msg31893 - Author: John Smith (josm) Date: 2007-04-30 06:48
I think we should be careful in tackling this.
iszegedi's patch seems to work correctly,
but the XBD spec itself has a defect:
http://www.opengroup.org/austin/mailarchives/ag-review/msg01722.html

What do you think of the following behavior?
>>> os.mkdir('dir/')
>>> os.mkdir('dir2/')
>>> os.rmdir(os.path.normpath('dir'))
>>> os.rmdir(os.path.normpath('dir2/'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument: 'dir2/.'
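josm's session can be replayed in a scratch directory. The sketch assumes a POSIX system: on Linux, rmdir() rejects a path whose last component is '.' with EINVAL, while the same directory named with just a trailing slash is removed normally. The directory names mirror josm's example; the temporary directory is an assumption so that nothing outside it is touched.

```python
import errno
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    d1 = os.path.join(tmp, 'dir') + '/'
    d2 = os.path.join(tmp, 'dir2') + '/'
    os.mkdir(d1)  # mkdir accepts a trailing slash
    os.mkdir(d2)
    os.rmdir(d1)  # 'dir/' is removed normally
    try:
        os.rmdir(d2 + '.')  # 'dir2/.', the form the first patch produces
    except OSError as exc:
        rmdir_dot_errno = exc.errno
        os.rmdir(d2)  # clean up with the slash-only form
    else:
        rmdir_dot_errno = None
    print(errno.errorcode.get(rmdir_dot_errno))  # EINVAL on Linux
```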


msg31894 - Author: Istvan Szegedi (iszegedi) Date: 2007-05-01 18:05
I must admit that josm's comments make sense: in fact, I quickly tried out how the mkdir and rmdir commands behave in a bash shell, and they do the same:

# mkdir hello
# rmdir hello/. 
Invalid argument

whereas
# rmdir hello/

works fine. I also wrote a small C program using the mkdir() and rmdir() functions, and they behave exactly the same as mkdir/rmdir from bash (well, no real surprise).

My suggestion for fixing the original issue was based on the POSIX standard, and apparently the Linux commands are not fully POSIX compliant either... Or do I misunderstand the quotes from the standard? Anyway, it is pretty easy to modify my fix to be in line with the Linux commands and C functions: everything can stay the same apart from the line where I added "/."; this should be only "/". So the entire function could look like this:

-- clip --


def normpath(path):
    """Normalize path, eliminating double slashes, etc."""
    if path == '':
        return '.'
    initial_slashes = path.startswith('/')
    # The next two lines were added by iszegedi
    path = path.rstrip()
    trailing_slash = path.endswith('/')
    # POSIX allows one or two initial slashes, but treats three or more
    # as single slash.
    if (initial_slashes and
        path.startswith('//') and not path.startswith('///')):
        initial_slashes = 2
    comps = path.split('/')
    new_comps = []
    for comp in comps:
        if comp in ('', '.'):
            continue
        if (comp != '..' or (not initial_slashes and not new_comps) or
             (new_comps and new_comps[-1] == '..')):
            new_comps.append(comp)
        elif new_comps:
            new_comps.pop()
    comps = new_comps
    path = '/'.join(comps)
    if initial_slashes:
        path = '/'*initial_slashes + path
    # The next two lines were added by iszegedi
    if trailing_slash:
        path = path + '/'
    return path or '.'


-- clip --

Nevertheless, I would really appreciate comments from POSIX gurus on how they see this problem.
msg31895 - Author: Robert Siemer (siemer) Date: 2007-05-06 04:15
1) I (submitter) didn't specify what I expected to see:

os.path.normpath('/etc/passwd/') --> '/etc/passwd/'

So, I agree with the latest consensus, but definitely not with the "/etc/passwd/." version...


2) I can't draw any explicit normalization rules from the excerpts of the POSIX standard posted by iszegedi. Saying that "dir/" should be treated as "dir/." doesn't mean that the latter is the normalized version of the former. If anything, I read it as implying that the first form is the usual one, which needs interpretation.

And I think everybody agrees that, being the same or not, "dir/." is unusual.

3) I don't see what this line in the proposal is good for:
path = path.rstrip()

It removes significant whitespace from the path, which must be avoided.
msg31896 - Author: John Smith (josm) Date: 2007-05-06 09:31
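To make the objection concrete: rstrip() strips spaces, tabs and newlines indiscriminately, so it maps distinct, valid paths onto other paths, while normpath() itself leaves such characters alone because they belong to the last component. The paths below are made-up examples.

```python
import posixpath

names = ['/tmp/report ', '/tmp/report\n', '/tmp/   ']
for p in names:
    # normpath() keeps the trailing whitespace: it is part of the last component.
    assert posixpath.normpath(p) == p
    # rstrip() silently turns each of these into a different path.
    print(repr(p), '->', repr(p.rstrip()))
# '/tmp/report ' -> '/tmp/report'
# '/tmp/report\n' -> '/tmp/report'
# '/tmp/   ' -> '/tmp/'
```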
Like all other Python libraries, posixpath.normpath has test cases,
which specify how normpath is supposed to work.
See test_normpath() in Lib/test/test_posixpath.py.

siemer, would you add some test cases for this problem to test_posixpath.py?
I think this requires not just adding new cases but also updating existing ones.
That means it would break backward compatibility.

According to http://www.python.org/dev/intro/,
changing existing functionality "requires python-dev as a whole to agree to the change."

So the TODO list would be

1. Write test cases for this problem and post it to the patch manager
2. Discuss the changes on python-dev at python.org and get agreement
3. Write patch and post it to the patch manager
4. Get some reviews
5. Commit
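Step 1 could begin with something like the test class below. It is a hypothetical sketch, not the actual contents of Lib/test/test_posixpath.py, and it pins down the current behavior (trailing slash dropped), which is exactly what any patch would have to change.

```python
import posixpath
import unittest

class TrailingSlashNormpathTest(unittest.TestCase):
    """Hypothetical cases recording how normpath() treats trailing slashes today."""

    def test_trailing_slash_is_dropped(self):
        self.assertEqual(posixpath.normpath('/etc/passwd/'), '/etc/passwd')
        self.assertEqual(posixpath.normpath('dir2/'), 'dir2')
        self.assertEqual(posixpath.normpath('a/b//'), 'a/b')

    def test_root_keeps_its_slash(self):
        self.assertEqual(posixpath.normpath('/'), '/')
        # Three or more leading slashes collapse to one.
        self.assertEqual(posixpath.normpath('///'), '/')
```

Saved in a file whose name starts with "test", it is picked up by `python -m unittest`.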

Correct Me If I'm Wrong.
msg31897 - Author: Istvan Szegedi (iszegedi) Date: 2007-05-08 12:50
I agree with the approach that josm suggested. That is a reasonable way to get the fix approved or declined.

To respond to siemer's remarks:

1 - 2./ I think what we are missing here is an exact definition of what path normalization means. All the examples I have seen so far, including the comments in the posixpath Python module, are about removing redundant information such as extra slashes, dots and double dots from the path (e.g. A//B, A/./B and A/foo/../B all become A/B). I have not seen any precise definition of how to deal with a trailing slash in a pathname (except for the one I copied and pasted from the POSIX standard).
That says explicitly that "a pathname ... SHALL BE RESOLVED as if a single dot .. were appended". 

Nevertheless, I can accept that, since it is implemented neither in regular UNIX shell utilities nor in C system libraries, we may opt for a "common sense solution", meaning that

os.path.normpath('/etc/passwd/')  -->  '/etc/passwd/'


In fact, this is pretty simple to implement, and I have already given an example: the original proposal needs to be changed by only one character.

Instead of adding '/.' to the end of the path, just add a '/': 

    # The next two lines were added by iszegedi
    if trailing_slash:
        path = path + '/'



3./ path = path.rstrip()
This doesn't remove any significant whitespace from the pathname, as far as I can see. It only removes trailing whitespace from the pathname; the rest stays as is. An example:

In the current posixpath module, if the pathname string ends with spaces, the result is:

os.path.normpath('/etc/passwd/   ')   --->  '/etc/passwd/   '

if we use rstrip() as suggested, then the result would be:

os.path.normpath('/etc/passwd/   ')   --->  '/etc/passwd/'

which makes more sense to me.

But for instance:

os.path.normpath('/etc/   /passwd/')   ---> '/etc/   /passwd/'


Istvan
msg31898 - Author: Robert Siemer (siemer) Date: 2007-05-08 14:24
Istvan, you seem to think that trailing whitespace is not significant. That assumption is wrong: '   ' is a perfectly valid file name. Besides spaces, even newlines etc. are allowed, and they are not interchangeable.
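This is easy to confirm on a POSIX filesystem: names that differ only in whitespace are separate directory entries. A sketch, assuming a POSIX system; the file names are arbitrary examples.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    names = ['notes', 'notes ', 'notes\n', '   ']
    for name in names:
        # Each name creates its own file; none of them collides with another.
        with open(os.path.join(tmp, name), 'w') as f:
            f.write(repr(name))
    listing = sorted(os.listdir(tmp))
    print(len(listing))  # 4: no two names collapsed into one
```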

I'm also "working" on a good path name handling essay. (-:

Robert
msg31899 - Author: Istvan Szegedi (iszegedi) Date: 2007-05-08 16:01
Robert,
I checked the POSIX definition of filenames once again, and you are right: trailing whitespace is allowed:

-- clip --
 3.169 Filename

A name consisting of 1 to {NAME_MAX} bytes used to name a file. The characters composing the name may be selected from the set of all character values excluding the slash character and the null byte. The filenames dot and dot-dot have special meaning. A filename is sometimes referred to as a "pathname component".
-- clip --


So I agree that this line

path = path.rstrip()

should be removed from the correction suggested by me.


-- clip --


def normpath(path):
    """Normalize path, eliminating double slashes, etc."""
    if path == '':
        return '.'
    initial_slashes = path.startswith('/')
    # The next line was added by iszegedi
    trailing_slash = path.endswith('/')
    # POSIX allows one or two initial slashes, but treats three or more
    # as single slash.
    if (initial_slashes and
        path.startswith('//') and not path.startswith('///')):
        initial_slashes = 2
    comps = path.split('/')
    new_comps = []
    for comp in comps:
        if comp in ('', '.'):
            continue
        if (comp != '..' or (not initial_slashes and not new_comps) or
             (new_comps and new_comps[-1] == '..')):
            new_comps.append(comp)
        elif new_comps:
            new_comps.pop()
    comps = new_comps
    path = '/'.join(comps)
    if initial_slashes:
        path = '/'*initial_slashes + path
    # The next two lines were added by iszegedi
    if trailing_slash:
        path = path + '/'
    return path or '.'


-- clip --
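One caveat about the version above: the slash is re-appended after the components have been joined, so inputs that normalize to nothing or to the root come out wrong. For example, './' and 'a/../' both become '/', turning a relative path into an absolute one, and '/' becomes '//'. A minimal alternative, assuming the "keep one trailing slash" rule agreed on in this thread, is to wrap the existing function instead of copying it; the helper name normpath_keep_trailing_slash is hypothetical.

```python
import posixpath

def normpath_keep_trailing_slash(path: str) -> str:
    """Hypothetical helper: like posixpath.normpath(), but keeps one trailing slash.

    Assumption: a trailing slash in the input should survive normalization as a
    single '/', the behavior favored in this thread.
    """
    normalized = posixpath.normpath(path)
    # A root-only input such as '/' already normalizes to a string ending in '/',
    # so only append when the input had a trailing slash and the result lacks one.
    if path.endswith('/') and not normalized.endswith('/'):
        normalized += '/'
    return normalized

for p in ('/etc/passwd/', '/etc/passwd', '/', 'a//b/./', 'a/../'):
    print(repr(p), '->', repr(normpath_keep_trailing_slash(p)))
```

With this wrapper, 'a/../' normalizes to './' rather than '/', and '/' stays '/'.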

Nevertheless, the remaining question for me is still the same: what is the exact definition of a normalized path with regard to a trailing slash? In other words, which is the correct solution: no slash at the end of the path, a slash at the end of the path, or a single . appended to the end of the path?

So what is the correct answer:
os.path.normpath('/etc/passwd/')  ---> '/etc/passwd'   OR
os.path.normpath('/etc/passwd/')  ---> '/etc/passwd/'  OR
os.path.normpath('/etc/passwd/')  ---> '/etc/passwd/.'

Once this is clarified, the solution is pretty easy.

Istvan
msg85474 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2009-04-05 10:27
The amount of discussion on this bug is already an indication that the
proposed change is questionable.  Combine this with
backwards-compatibility concerns, and it's enough reason not to change this.
History
Date User Action Args
2022-04-11 14:56:24 admin set github: 44901
2009-04-05 10:27:01 georg.brandl set status: open -> closed
    nosy: + georg.brandl
    messages: + msg85474
    resolution: wont fix
2009-03-30 23:38:12 ajaksu2 set keywords: + patch
    stage: test needed
    type: behavior
    versions: + Python 2.6, Python 3.0, - Python 2.5
2007-04-25 23:44:05 siemer create