This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: os.stat producing incorrect / invalid results
Type: Stage:
Components: Library (Lib) Versions: Python 2.5
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: loewis Nosy List: joesalmeri, loewis, nnorwitz
Priority: normal Keywords:

Created on 2007-05-31 21:24 by joesalmeri, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
pa.zip loewis, 2007-06-02 09:04
Messages (5)
msg32157 - (view) Author: Joe (joesalmeri) Date: 2007-05-31 21:24
Python 2.51
Windows XP SP2 + all updates
*** NOTE all code has worked on previous python versions


I just upgraded from Python 2.4.2 to Python 2.5.1 and have found some 
unexpected behavior that appears to be a bug in the os.stat module.

My OS is Windows XP SP2 + all updates.

I have several programs that have worked flawlessly on all previous Python 
versions for years and they are now producing incorrect results in the code 
that uses os.stat.

Searching through the 2.5.1 release notes I found the following:

    Use Win32 API to implement os.stat/fstat. As a result, subsecond 
timestamps
    are reported, the limit on path name lengths is removed, and stat 
reports
    WindowsError now (instead of OSError).

*********************
* Overview of the problem:
*********************

Reviewing my code it seems that it should still work with the 2.5.1 os.stat 
changes however that does not appear to be the case.

Timestamps reported by os.stat() are no longer correct and the results are 
not even consistent.

In my first test case ALL 3 timestamps reported by Python are 1 hour less 
than the actual timestamp on the file.

In my second test case the created and last accessed timestamps reported by 
Python are correct but the last write timestamp is 1 hour less than the 
actual timestamp on the file.

As a third test I looked at the last write timestamps on approximately 
21,000 files.
Python determined wrong last write timestamp on approximately 1581 files.

Assuming there is no error in the following code that prints out the 
timestamps using the new return value from os.stat() then it would appear 
that the 2.5.1 os.stat changes have a bug.

print 'Creation    Time: %s' % time.strftime('%m/%d/%Y %H:%M:%S', 
time.localtime(file_stats[stat.ST_CTIME]))
print 'Last Access Time: %s' % time.strftime('%m/%d/%Y %H:%M:%S', 
time.localtime(file_stats[stat.ST_ATIME]))
print 'Last Write  Time: %s' % time.strftime('%m/%d/%Y %H:%M:%S', 
time.localtime(file_stats[stat.ST_MTIME]))


*********************
* Detailed test results
*********************

To demonstrate the problem I have created the following test.

Here are the files that will be used in my test and their associated 
timestamps as reported the the dir command.

01/02/2003  12:34 PM                       0 broke_test
03/06/2007  05:24 PM         3,497,177 broke_test2
05/31/2007  04:35 PM                   254 runtest.cmd
05/31/2007  04:31 PM                   513 broke.py

The file named broke_test has a timestamp of 01/02/2003 12:34:56 (dir does 
not show seconds).

The runtest.cmd script shows the created, last accessed, and last write 
timestamps as reported by the dir command (and also verified via right 
clicking on the file and selecting properties in Explorer.

------ >>> START runtest.cmd script <<<<
@Echo Off

echo Create TimeStamp reported by dir command
dir /tc %1

echo Last Access TimeStamp reported by dir command
dir /ta %1

echo Last Write TimeStamp reported by dir command
dir /tw %1

echo Python 2.5.1 timestamp info
broke.py %1

------ >>> END runtest.cmd script <<<<

The broke.py script prints out the created, last accessed, last write 
timestamps as Python sees them.

------ >>> START broke.py script <<<<
import sys
import os
import stat
import time

file_name  = sys.argv[1]

file_stats = os.stat(file_name)

print
print 'File Name       : %s' % (file_name)
print
print 'Creation    Time: %s' % time.strftime('%m/%d/%Y %H:%M:%S', 
time.localtime(file_stats[stat.ST_CTIME]))
print 'Last Access Time: %s' % time.strftime('%m/%d/%Y %H:%M:%S', 
time.localtime(file_stats[stat.ST_ATIME]))
print 'Last Write  Time: %s' % time.strftime('%m/%d/%Y %H:%M:%S', 
time.localtime(file_stats[stat.ST_MTIME]))
print

------ >>> END broke.py script <<<<

#
# Test 1 on file broke_test
#

runtest broke_test

Create TimeStamp reported by dir command

01/02/2003  12:34 PM                 0 broke_test

Last Access TimeStamp reported by dir command

01/02/2003  12:34 PM                 0 broke_test

Last Write TimeStamp reported by dir command

01/02/2003  12:34 PM                 0 broke_test

Python 2.5.1 timestamp info

File Name       : broke_test

Creation    Time: 01/02/2003 11:34:56   -- Python results are WRONG hours 
reported are 1 hour short
Last Access Time: 01/02/2003 11:34:56   -- Python results are WRONG hours 
reported are 1 hour short
Last Write  Time: 01/02/2003 11:34:56   -- Python results are WRONG hours 
reported are 1 hour short

#
# Test 2 on file broke_test2
#

runtest broke_test2

Create TimeStamp reported by dir command

05/31/2007  04:26 PM         3,497,177 broke_test2

Last Access TimeStamp reported by dir command

05/31/2007  04:26 PM         3,497,177 broke_test2

Last Write TimeStamp reported by dir command

03/06/2007  05:24 PM         3,497,177 broke_test2

Python 2.5.1 timestamp info

File Name       : broke_test2

Creation    Time: 05/31/2007 16:26:36   -- Python results are correct
Last Access Time: 05/31/2007 16:26:38   -- Python results are correct
Last Write  Time: 03/06/2007 16:24:21   -- Python results are WRONG hours 
reported are 1 hour short



















msg32158 - (view) Author: Joe (joesalmeri) Date: 2007-06-01 01:19
More details and a program that demonstrates the problem

I created a file and specifically set the created date, last accessed date 
and last write date to

    01/02/2003 12:34:56

After setting the date and time on the file I verified the following:

The dir /tc command shows the correct Create date of 01/02/2003 12:34
The dir /ta command shows the correct Access date of 01/02/2003 12:34
The dir /tw command shows the correct Write date of 01/02/2003 12:34

Looking at the properties of the file in Windows Explorer also shows that 
all 3 dates match.

When Python 2.4.2 os.stat is used to get those timestamps all 3 are CORRECT

When Python 2.5.1 os.stat is used to get those timestamps all 3 are WRONG 
and they are off by exact 1 hour

    01/02/2003 11:34:56 instead of 01/02/2003 12:34:56

In the case of my above test I know exactly what the timestamp on the file 
is because I manually set it so that all 3 timestamps are the same.
Since Python 2.5.1 does not return the known values for that files 
timestamps how can it not be a Python 2.5.1 bug?

Further testing shows that the results are somewhat inconsistent, many times 
the create and access date are correct but the Last Write timestamp is 
wrong.  It is generally off by one hour but I have seen situations where it 
was +1 hour and other situations where it was -1 hour.

I even found situations where the python timestamp was 1 minute later.  (I 
know about the 2 second timestamps on FAT, all my filesystems are NTFS).  I 
just found a situation where the python timestamp was 02:51 PM and the 
windows timestamp was 02:12 PM.   DST or timezone changes are not going to 
make the results be off by 39 minutes?   (My timezone is GMT - 5:00).

As an additional test I wrote the following Python program which takes a 
directory name as a paramater.
It retrieves all 3 dates for all files in all directories and subdirectories 
for the specified directory name.
First it gets the dates using Python and os.stat, then it runs the dir /tc, 
dir /ta, and dir /tw commands on the file.
Finally it compares the dates and times returned by Python against the dates 
and times that the Windows dir command is reporting.

If you run the program using Python 2.4.2 then ALL dates and times returned 
by Python 2.4.2 correctly match the dates and times that Windows reports.

If you run the program using Python 2.5.1 against the same directory then 
you get intermittient results.

As a quick test I ran it against a directory on my home directory.

Python 2.4.2 all dates matched between what the Windows dir command reported 
and what Python 2.4.2 reported.

Python 2.5.1 had the following differences (more than 50% of the time it was 
wrong)

Total Files Processed    : 149

Matched All 3 Dates      : 70
Did NOT Match All 3 Dates: 79

Matched Create Date      : 69
Did NOT Match Create Date: 10

Matched Access Date      : 59
Did NOT Match Access Date: 20

Matched Write Date       : 11
Did NOT Match Write Date : 68

Here's the source for the check_dates.py program.  Run it for youself and 
see what your results are.

import os
import stat
import sys
import time

if len(sys.argv) == 1:
    print
    print '%s path_to_check' % (sys.argv[0])
    print
    raise SystemExit
else:
    path_to_check       = sys.argv[1]

file_count              = 0

file_match_count        = 0
file_no_match_count     = 0

create_match_count      = 0
create_no_match_count   = 0

access_match_count      = 0
access_no_match_count   = 0

write_match_count       = 0
write_no_match_count    = 0

for root, dirs, files in os.walk(r'%s' % path_to_check):
    for file in files:
        file_count      = file_count + 1
        file_name       = os.path.join(root, file)

        create_ts_match = False
        access_ts_match = False
        write_ts_match  = False

        file_stats  = os.stat(file_name)

        python_create_ts    = time.strftime('%m/%d/%Y  %I:%M %p', 
time.localtime(file_stats[stat.ST_CTIME]))
        python_access_ts    = time.strftime('%m/%d/%Y  %I:%M %p', 
time.localtime(file_stats[stat.ST_ATIME]))
        python_write_ts     = time.strftime('%m/%d/%Y  %I:%M %p', 
time.localtime(file_stats[stat.ST_MTIME]))

        win_create_ts       = os.popen('dir /a /tc "%s"' % 
(file_name)).readlines()[5][0:20]
        win_access_ts       = os.popen('dir /a /ta "%s"' % 
(file_name)).readlines()[5][0:20]
        win_write_ts        = os.popen('dir /a /tw "%s"' % 
(file_name)).readlines()[5][0:20]

        if python_create_ts == win_create_ts:   create_ts_match = True
        if python_access_ts == win_access_ts:   access_ts_match = True
        if python_write_ts  == win_write_ts:    write_ts_match  = True

        if create_ts_match and access_ts_match and write_ts_match:
            # All timestamps match
            file_match_count            = file_match_count + 1
        else:
            file_no_match_count         = file_no_match_count + 1
            print
            print 'File Name       : %s' % (file_name)
            print

            if create_ts_match:
                create_match_count      = create_match_count + 1
            else:
                create_no_match_count   = create_no_match_count + 1

                print 'python_create_ts: %s' % (python_create_ts)
                print 'win_create_ts   : %s' % (win_create_ts)
                print

            if access_ts_match:
                access_match_count      = access_match_count + 1
            else:
                access_no_match_count   = access_no_match_count + 1

                print 'python_access_ts: %s' % (python_access_ts)
                print 'win_access_ts   : %s' % (win_access_ts)
                print

            if write_ts_match:
                write_match_count       = write_match_count + 1
            else:
                write_no_match_count    = write_no_match_count + 1

                print 'python_write_ts : %s' % (python_write_ts)
                print 'win_write_ts    : %s' % (win_write_ts)
                print

#
# Show Count Results
#

print 'Total Files Processed    : %s' % (file_count)
print
print 'Matched All 3 Dates      : %s' % (file_match_count)
print 'Did NOT Match All 3 Dates: %s' % (file_no_match_count)
print
print 'Matched Create Date      : %s' % (create_match_count)
print 'Did NOT Match Create Date: %s' % (create_no_match_count)
print
print 'Matched Access Date      : %s' % (access_match_count)
print 'Did NOT Match Access Date: %s' % (access_no_match_count)
print
print 'Matched Write Date       : %s' % (write_match_count)
print 'Did NOT Match Write Date : %s' % (write_no_match_count)
print

msg32159 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2007-06-01 05:39
Martin, was there a similar problem to this one?  I looked in the log for posixmodule.c and didn't see this fixed.
msg32160 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-06-02 09:04
I think there are several things going on here, and I believe that Python is behaving correctly.

On Win32, there are two ways to find out the time stamps of a file: FindFirstFile reads it from the directory, and GetFileAttributesEx gets it directly from the file (i.e. on NTFS, from the MFT record). When a file is modified, NTFS normally modifies the file only, and updates the directory only occasionally (due to hard links, there might be multiple directories to update also).

In particular, the last access time is rarely written back to the directory. You can easily see this for yourself with the following procedure:

1. Inspect the last-access time with dir /ta, or in Explorer.
2. Open the file in an editor, but don't edit it; close the editor.
3. Inspect the last-access time with dir /ta, and with broke.py
4. Wait 10 minutes.
5. Repeat steps 2 and 3.

You should see that broke.py will report that the file was accessed, whereas Explorer and dir/ta will report an "old" last-acecssed time.

I created a program that displays both the directory-stored and the file-stored last-acecss time (attached). This program may produce outputs such as

C:\Pyinst\25\python>..\..\unstop\pa\Debug\pa.exe .svn\entries
Last Access according to:
      FindFirstFile: 128252459235000000 2007-6-2 8:18:43
GetFileAttributesEx: 128252466734062500 2007-6-2 8:31:13

Please refer to 

http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/fsutil_behavior.mspx?mfr=true

There, they explain that the access time (if it is stored at all, which can
be turned off) gets updated on disk only if the actual last-access time
and the stored last-access time differ by 1 hour. So if you perform steps
2 and 3 above after 1 hour again, you will indeed see that the last access
time gets updated in explorer, too.

Python 2.4 did use the same (incorrect) source that dir/ta and the explorer
uses; Python 2.5 has fixed that bug.

So I think we should remove the last-access time from our discussion.

File Added: pa.zip
msg32161 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-06-03 08:02
As discussed on comp.lang.python, the 1hour difference is just a display issue resulting from daylight-saving time, not an issue with the os.stat result. Closing this report as invalid.
History
Date User Action Args
2022-04-11 14:56:24adminsetgithub: 45019
2007-05-31 21:24:13joesalmericreate