This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Pure Python strptime() (PEP 42)
Type: Stage:
Components: Library (Lib) Versions: Python 2.3
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: gvanrossum Nosy List: barry, brett.cannon, gvanrossum, jaraco, nascheme, nnorwitz, skip.montanaro
Priority: normal Keywords: patch

Created on 2001-10-23 23:15 by brett.cannon, last changed 2022-04-10 16:04 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
diff_time brett.cannon, 2002-07-12 21:03 diff for time importing _strptime against 2002-07-10 CVS
_strptime.py brett.cannon, 2002-07-18 21:24 uploaded 2002-07-17
test_time.py brett.cannon, 2002-07-18 21:25 uploaded 2002-07-17
test_strptime.py brett.cannon, 2002-07-18 21:27 uploaded 2002-07-17
libtime_diff brett.cannon, 2002-07-18 23:30 uploaded 2002-07-17: patch for lib/libtime.tex
Messages (39)
msg37953 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2001-10-23 23:15
The attached file contains a pure Python version of
strptime().  It attempts to operate as much like
time.strptime() within reason.  Where vagueness or
obvious platform dependence existed, I tried to
standardize and be reasonable.

PEP 42 makes a request for a portable, consistent
version of time.strptime():

- Add a portable implementation of time.strptime() that
works in
      clearly defined ways on all platforms.

This module attempts to close that feature request.

The code has been tested thoroughly by myself as well
as some other people who happened to have caught the
post I made to c.l.p a while back and used the module.

It is available at the Python Cookbook
(http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036).
 It has been approved by the editors there and thus is
listed as approved.  It is also being considered for
inclusion in the book (thanks, Alex, for encouraging
this submission).

A PyUnit testing suite for the module is available at
http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime
along with the code for the function itself.
Localization has been handled in a modular way using
regexes.  All of it is self-explanatory in the doc
strings.  It is very straight-forward to include your
own localization settings or modify the two languages
included in the module  (English and Swedish).

If the code needs to have its license changed, I am
quite happy to do it (I have already given the OK to
the Python Cookbook).

-Brett Cannon
msg37954 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2002-03-23 22:41
Logged In: YES 
user_id=35752

I'm pretty sure this code needs a different license before
it can be accepted.  The current license contains the
"BSD advertising clause".  See
http://www.gnu.org/philosophy/bsd.html.

msg37955 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2002-03-24 23:01
Logged In: YES 
user_id=357491

Oops.  I thought I had removed the clause.  Feel free to
remove it.

I am going to be cleaning up the module, though, so if you
would rather not bother reviewing this version and wait on
the cleaned-up one, go ahead.

Speaking of which, should I just reply to this bugfix when I
get around to the update, or start a new patch?
msg37956 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2002-03-24 23:15
Logged In: YES 
user_id=35752

Go ahead and reuse this item.  I'll wait for the updated
version.
msg37957 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2002-05-27 21:38
Logged In: YES 
user_id=357491

Version 2 of strptime() has now been uploaded.  This nearly
complete rewrite includes the removal of the need to input
locale-specific time info.  All need locale info is gleaned
from time.strftime().  This makes it able to behave exactly
like time.strptime().
msg37958 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2002-06-01 13:46
Logged In: YES 
user_id=33168

Overall, the patch looks pretty good.  
I didn't check for completeness or consistency, though.

 * You don't need: from exceptions import Exception
 * The comment "from strptime import * will only export
strptime()" is not correct.
 * I'm not sure what should be included for the license.
 * Why do you need success flag in CheckIntegrity, you raise
an exception?
    (You don't need to return anything, raise an exception,
else it's ok)
 * In return_time(), could you change xrange(9) to
range(len(temp_time))
    this removes a dependancy.
 * strings are sequences, so instead of if found in ('y', 'Y')
    you can do if found in 'yY'
 * daylight should use the new bools True, False
   (this also applies to any other flags) * The formatting
doesn't follow the standard (see PEP 8)
    (specifically, spaces after commas, =, binary ops,
comparisons, etc)
 * Long lines should be broken up
The test looks pretty good too.  I didn't check it for
completeness.
The URL is wrong (too high up), the test can be found here:
 http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py
I noticed a spelling mistake in the test: anme -> name.

Also, note that PEP 42 has a comment about a python strptime.
So if this gets implemented, we need to update PEP 42.
Thanks.
msg37959 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2002-06-02 07:44
Logged In: YES 
user_id=357491

I'm afraid you looked at the wrong patch!  My fault since I
accidentally forgot to add a description for my patch.  So
the file with no description is the newest one and
completely supercedes the older file.  I am very sorry about
that.  Trust me, the new version is much better.

I realized the other day that since the time module is a C
extension file, would getting this accepted require getting
BDFL approval to add this as a separate module into the
standard library?  Would the time module have to have a
Python interface module where this is put and all other
methods in the module just pass directly to the extension file?

As for the suggestions, here are my replies to the ones that
still apply to the new file:
* strings are sequences, so instead of if found in ('y',
'Y') you can do if found in 'yY'
-> True, but I personally find it easier to read using the
tuple.  If it is standard practice in the standard library
to do it the suggested way, I will change it.

* daylight should use the new bools True, False (this also
applies to any other flags)
-> Oops.  Since I wrote this under Python 2.2.1 I didn't
think about it.  I will go through the code and look for
places where True and False should be used.

-Brett C.
msg37960 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2002-06-04 04:52
Logged In: YES 
user_id=44345

Brett,

Please see the drastically shortened test_strptime.py.  (Basically all I'm
interested in here is whether or not strptime.strptime and time.strptime
will pass the tests.)  Near the top are two lines, one commented out:

  parsetime = time.strptime
  #parsetime = strptime.strptime

Regardless which version of parsetime I get, I get some errors.  If 
parsetime == time.strptime I get

======================================================================
ERROR: Test timezone directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 69, in test_timezone
    strp_output = parsetime(strf_output, "%Z")
ValueError: unconverted data remains: 'CDT'

If parsetime == strptime.strptime I get

ERROR: *** Test %c directive. ***
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 75, in test_date_time
    self.helper('c', position)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 380, in strptime
    found_dict = found.groupdict()
AttributeError: NoneType object has no attribute 'groupdict'

======================================================================
ERROR: Test for month directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 31, in test_month
    self.helper(directive, 1)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 393, in strptime
    month = list(locale_time.f_month).index(found_dict['b'])
ValueError: list.index(x): x not in list

This is with a very recent interpreter (updated from CVS in the past 
day) running on Mandrake Linux 8.1.

Can you reproduce either or both problems?  Got fixes for the 
strptime.strptime problems?

Thx,

Skip
msg37961 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2002-06-04 05:35
Logged In: YES 
user_id=357491

I have uploaded a verision 2.0.1 which fixes the %b format
bug (stupid typo on a variable name).

As for the %c directive, I pass that test.  Can you please
send the output of strftime and the time tuple used to
generate it?

As for the time.strptime() failure, I don't have
time.strptime() on any system available to me, so could you
please send me the output you have for strftime('%Z'), and
time.tzname?

I don't know how much %Z should be worried about since its
use is deprecated (according to the time module's
documentation).  Perhaps strptime() should take the
initiative and not support it?

-Brett
msg37962 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2002-06-04 05:55
Logged In: YES 
user_id=44345

Here ya go...

% ./python
Python 2.3a0 (#185, Jun  1 2002, 23:19:40) 
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> now = time.localtime(time.time())
>>> now
(2002, 6, 4, 0, 53, 39, 1, 155, 1)
>>> time.strftime("%c", now)
'Tue Jun  4 00:53:39 2002'
>>> time.tzname
('CST', 'CDT')
>>> time.strftime("%Z", now)
'CDT'
msg37963 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2002-06-04 22:33
Logged In: YES 
user_id=357491

Thanks for being so prompt with your response, Skip.

I found the problem with your %c.  If you look at your
output you will notice that the day of the month is '4', but
if you look at the docs for time.strftime() you will notice
that is specifies the day of the month (%d) as being in the
range [01,31].  The regex for %d (simplified) is
'(3[0-1])|([0-2]\d)'; not being represented by 2 digits
caused the regex to fail.

Now the question becomes do we follow the spec and chaulk
this up to a non-standard strftime() implementation, or do
we adapt strptime to deal with possible improper output from
strftime()?  Changing the regexes should not be a big issue
since I could just tack on '\d' as the last option for all
numerical regexes. 

As for the test error from time.strptime(), I don't know
what is causing it.  If you look at the test you will notice
that all it basically does is parsetime(time.strftime("%Z"),
"%Z").  Now how that can fail I don't know.  The docs do say
that strptime() tends to be buggy, so perhaps this is a case
of this.

One last thing.  Should I wait until the bugs are worked out
before I post to python-dev asking to either add this as a
module to the standard library or change time to a Python
stub and rename timemodule.c?  Should I ask now to get the
ball rolling?  Since I just joined python-dev literally this
morning I don't know what the protocol is.
msg37964 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2002-06-04 23:33
Logged In: YES 
user_id=33168

Hopefully, I'm looking at the correct patch this time. :-)

To answer one question you had (re:  'yY' vs. ('y', 'Y')),
I'm not sure people really care.  It's not big to me.
Although 'yY' is faster than ('y', 'Y').

In order to try to reduce the lines where you raise an error
(in __init__)
you could change 'sequence of ... must be X items long' to
'... must have/contain X items'.

Generally, it would be nice to make sure none of the lines
are over 72-79 chars (see PEP 8).

Instead of doing:
    newlist = list(orig)
    newlist.append('')
    x = tuple(newlist)

you could do:
    x = tuple(orig[:])
or something like that.  Perhaps a helper function?

In __init__ do you want to check the params against 'is None'
If someone passes a non-sequence that doesn't evaluate
to False, the __init__ won't raise a TypeError which it
probably should.

What is the magic date used in __calc_weekday()?
  (1999/3/15+ 22:44:55)  is this significant, should there
be a comment?
  (magic dates are used elsewhere too, e.g., __calc_month,
__calc_am_pm, many more)

__calc_month() doesn't seem to take leap year into account?
  (not sure if this is a problem or not)
In __calc_date_time(), you use date_time[offset] repetatively,
  couldn't you start the loop with something like dto =
date_time[offset] and then use dto
  (dto is not a good name, I'm just making an example)

Are you supposed to use __init__ when deriving from
built-ins (TimeRE(dict)) or __new__?
  (sorry, I don't remember the answer)

In __tupleToRE.sorter(), instead of the last 3 lines, you
can do:
  return cmp(b_length, a_length)

Note:  you can do x = y = z = -1, instead of x = -1 ; y = -1
; z = -1

It could be problematic to compare x is -1.  You should
probably just use ==.
It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS
were not defined in Objects/intobject.c.

This docstring seems backwards:
def gregToJulian(year, month, day):
    """Calculate the Gregorian date from the Julian date."""
I know a lot of these things seem like a pain.
And it's not that bad now, but the problem is maintaining
the code.  It will be easier for everyone else if the code
is similar to the rest.

BTW, protocol on python-dev is pretty loose and friendly. :-)
msg37965 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2002-06-05 06:32
Logged In: YES 
user_id=357491

I went ahead an implemented most of Neal's suggestions.  On
a few, of them, though, I either didn't do it or took a
slightly different route.

For the 'yY' vs. ('y', 'Y'), I went with 'yY'.  If it gives
a performance boost, why not since it doesn't make the code
harder to read.  Implementing it actually had me catch some
redundant code for dealing with a literal %.

The tests in the __init__ for LocaleTime have been reworked
to check that they are either None or have the proper
length, otherwise they raise a TypeError.

I have gone through and tried to catch all the lines that
were over 80 characters and cut them up to fit.

For the adding of '' to tuples, I created a method that
could specify front or back concatination.  Not much
different from before, but it allows me to specify front or
back concatination easily.

I explained why the various magic dates were used.

I in no way have to worry about leap year.  Since it is not
validating the data string for validity the fxn just takes
the data and uses it.  I have no reason to calc for leap year.

date_time[offset] has been replaced with current_format and
added the requisite two lines to assign between it and the list.

You are only supposed to use __new__ when it is immutable. 
Since dict is obviously mutable, I don't need to worry about it.

Used Neal's suggested shortening of the sorter helper fxn.

I also used the suggestion of doing x = y = z = -1.  Now it
barely fits on a single line instead of two.

All numerical compares use == and != instead of is and is
not.  Didn't know about that dependency on
NSMALL((POS)|(NEG))INTS; good thing to know.

The doc string was backwards.  Thanks for catching that, Neal.

I also went through and added True and False where
appropriate.  There is a line in the code where True = 1;
False = 0 right at the top.  That can obviously be removed
if being run under Python 2.3.

And I completely understand being picky about minute details
where maintainability is a concern.  I just graduated from
Cal and so the memory of seeing beginning programmers' code
is still fresh in my mind <shudders>.

And I will query python-dev about how to go about to get
this added after the bugs are fixed and I am back home
(going to be out of town until June 16).  I will still be
periodically checking email, though, so I will continue to
implement any suggestions/bugfixes that anyone suggests/finds.
msg37966 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2002-06-13 00:46
Logged In: YES 
user_id=357491

I am back from my vacation and ready to email python-
dev about getting this patch accepted (whether to modify 
time or make this a separate module, etc.).  I think I will 
do the email on June 17.

Before then, though, I am going to make two changes.  
One is the raise a Value Error exception if the regex doesn't 
match (to try to match time.strptime()s exception as seen 
in Skip's run of the unit test).  The other change is to tack 
on a \d on all numeric formats where it might come out as 
a single digit (i.e., lacking a leading zero).  This will be for 
v2.0.3 which I will post before June 17.

If there is any reason anyone thinks I should hold back on 
this, please let me know!  I would like to have this code as 
done as possible before I make any announcement.
msg37967 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2002-06-17 20:11
Logged In: YES 
user_id=357491

I uploaded v.2.0.3.  Beyond implementing what I mentioned
previously (raising TypeError when a match fails, adding \d
to all applicable regexes) I did a few more things.

For one, I added a special " \d" to the numeric month regex.
 I discovered that ANSI C for ctime displays the month with
a leading space if it is a single digit.  So to deal with
that since at least Skip's C library likes to use that
format for %c, I went ahead and added it.

I changed all attributes in LocaleTime to lists.  A recent
mail on python-dev from GvR said that lists are for
homogeneous data, which everything that is grouped together
in LocaleTime is.  It also simplified the code slightly and
led to less conversions of data types.

I also added a method that raises a TypeError if you try to
assign to any of LocaleTime's attributes.  I thought that if
you left out the set value for property() it wouldn't work;
didn't realize it just defaults over to __setitem__.  So I
added that method as the set value for all of the property()s.

It does require 2.2.1 now since I used True and False
without defining them.  Obviously just set those values to 1
and 0 respectively if you are running under 2.2

I also updated the overly exhaustive PyUnit suite that I
have for testing my code.   It is not black-box testing,
though; Skip's pruned version of my testing suite fits that
bill (I think).
msg37968 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2002-06-18 19:05
Logged In: YES 
user_id=357491

I have uploaded v. 2.0.4.  It now uses the calendar module
to figure out the names of weekdays and months.  Thanks goes
out to Guido for pointing out this undocumented feature of
calendar.
msg37969 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2002-06-19 21:49
Logged In: YES 
user_id=357491

2.1.0 is now up and ready for use.  I only changed two
things to the code, but since they change the semantics of
stprtime()s use, I made this a new minor release.

One, I removed the ability to pass in your own LocaleTime
object.  I did this for two reasons.  One is because I
forgot about how default arguments are created at the time
of function creation and not at each fxn call.  This meant
that if someone was not thinking and ran strptime() under
one locale and then switched to another locale without
explicitly passing in a new LocaleTime object for every call
for the new locale, they would get bad matches.  That is not
good.

The other reason was that I don't want to force users to
pass in a LocaleTime object on every call if I can't have a
default value for it.  This is meant to act as a drop-in
replacement for time.strptime().  That forced the removal of
the parameter since it can't have a default value.

In retrospect, though, people will probably never parse log
files in other languages other then there default locale. 
And if they were, they should change the locale for the
interpreter and not just for strptime().

The second change was what triggers strptime() to return an
re object that it can use.  Initially it was any nothing
value (i.e., would be considered false), but I realized that
an empty string could trigger that and it would be better to
raise a TypeError then let some error come up from trying to
use the re object in an incorrect way.

Now, to have an re object returned, you pass in False.  I
figured that there is a very minimal chance of passing in
False when you meant to pass in a string.  Also, False as
the data_string, to me, means that I don't want what would
normally be returned.

I debated about removing this feature from strptime(), but I
profiled it and most of the time comes from TimeRE's
__getitem__.  So building the string to be compiled into a
regex is the big bottleneck.  Using a precompiled regex
instead of constructing a new one everytime took 25% of the
time overall for strptime() when calling strptime() 10,000
times in a row.  This is a conservative number, IMO, for
calls in a row; I checked the Apache hit logs for a single
day on Open Computing Facility's web server
(http://www.ocf.berkeley.edu/) and there were 188,562 hits
on June 16 alone.  So I am going to keep the feature until
someone tells me otherwise.
msg37970 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2002-06-21 04:35
Logged In: YES 
user_id=357491

I have uploaded a contextual diff of timemodule.c with a
callout to strptime.strptime when HAVE_STRPTIME is not
defined just as Guido requested.

It's my first extension module, so I am not totally sure of
myself with it.  But since Alex Marttelli told me what I
needed to do I am fairly certain it is correct.
msg37971 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2002-06-24 01:06
Logged In: YES 
user_id=357491

2.1.1 is now uploaded.  Almost a purely syntatical change. 
From discussions on python-dev I renamed the helper fxns so
they are all lowercase-style.  Also changed them so that
they state what the fxn returns.

I also put all of the imports on their own line as per PEP 8.

The only semantical change I did was directly import
re.compile since it is the only thing I am using from the re
module.

These changes required tweaking of my exhaustive testing
suite, so that got uploaded, too.
msg37972 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2002-06-27 04:54
Logged In: YES 
user_id=357491

Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down
below!).  Just a little bit more cleanup.  Biggest change is
that I changed the default format string and made strptime()
raise ValueError instead of TypeError.  This was all done to
match the time module docs.

I also fiddled with the regexes so that the groups were
none-capturing.  Mainly done for a possible performance
improvement.
msg37973 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2002-07-10 20:51
Logged In: YES 
user_id=357491

The actual 2.1.3 edition of strptime is now up.  I don't
think there are any changes, but since I renamed the file
_strptime.py, I figured uploading it again wouldn't hurt.

I also uploaded a new contextual diff of the time module
taken from CVS on 2002-07-10.  The only difference between
this and the previous diff (which was against 2.2.1's time
module) is the change of the imported module to _strptime.
msg37974 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2002-07-12 16:58
Logged In: YES 
user_id=6380

Hm.  This isn't done yet. I get these problems:

(a) the patch for timemodule.c doesn't apply cleanly in
current CVS (trivial)

(b) it still tries to import strptime (no leading '_') (also
trivial)

(c) so does test_strptime.py (also trivial)

(d) the simplest of simple examples fails:

With Linux's strptime:

>>> time.strptime("7/12/02", "%m/%d/%y")
(2002, 7, 12, 0, 0, 0, 4, 193, 0)
>>>

With yours:

>>> time.strptime("7/12/02", "%m/%d/%y")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/guido/python/dist/src/Lib/_strptime.py", line
392, in strptime
    raise ValueError("time data did not match format")
ValueError: time data did not match format
>>> 

Perhaps you should write a regression test suite for the
strptime function as found in the time module courtesy of
libc, and then make sure that your code satisfies it?
msg37975 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2002-07-12 21:02
Logged In: YES 
user_id=357491

To respond to your points, Guido:

(a) I accidentally uploaded the old file.  Sorry about that.
 I misnamed the new one 'time_diff" but in my head I meant
to overwrite "diff_time".  I have uploaded the new one.

(b) See (a)

(c)  Oops.  That is a complete oversight on my part.  Now in
(d) you mention writing up regression tests for the standard
time.strptime.  I am quite hapy to do this.  Do you want
that as a separate patch?  If so I will just stop with
uploading tests here and just start a patch with my strptime
tests for the stdlib tests.

(d) The reason this test failed is because your input is not
compliant with the Python docs.  Read what %m accepts:

Month as a decimal number [01,12]

Notice the leading 0 for the single digit month.  My
implementation follows the docs and not what glibc suggests.
 If you want, I can obviously add on to all the regexes \d
as an option and eliminate this issue.  But that means it
will no longer be following the docs.  This tripped Skip up
too since no one writes numbers that way; strftime does, though.
Now if the docs meant for no trailing 0, I think they should
be rewritten since that is misleading.

In other words, either strptime stays as it is and follows
the docs or I change the regexes, but then the docs will
have to be changed.  I can go either way, but I personally
would want to follow the docs as-is since strptime is meant
to parse strftime output and not human output.  =)
msg37976 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2002-07-12 21:13
Logged In: YES 
user_id=6380

Hm, the new diff_time *still* fails to apply. But don't
worry about that.

I'd love to see regression tests for time.strptime. Please
upload them here -- don't start a new patch.

I think your interpretation of the docs is overly
restrictive; the table shows what strftime does but I think
it's reasonable for strptime to accept missing leading
zeros. You can upload a patch for the docs too if you feel
that's necessary. You may also try to read up on what the
XPG standard says about strptime.
msg37977 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2002-07-12 22:27
Logged In: YES 
user_id=357491

Uploaded 2.1.4.  I added \d to the end of all relevant
regexes (basically all of them but %y and %Y) to deal with
non-zero-leading numbers.

I also made the regex case-insensitive.

As for the diff failing, I am wondering if I am doing
something wrong.  I am just running diff -c CVS_file
modified_file > diff_file .  Isn't that right?

I will work on merging my strptime tests into the time
regression tests and upload a patch here.

I will do a patch for the docs since it is not consistent
with the explanation of struct_time (or at least in my opinion).

I tried finding XPG docs, but the best Google came up with
was the NetBSD man pages for strptime (which they claim is
XPG compliant).  The difference between that implementation
and mine is that NetBSD's allows whitespace (defined as
isspace()) in the format string to match \s* in the data
string.  It also requires a whitespace or a non-alphanumeric
character while my implementation does not require that.

Personally, I don't like either difference.  If they were
used, though, there might be a possibility of rewriting
strptime to just use a bunch of string methods instead of
regexes for a possible performance benefit.  But I prefer
regexes since it adds checks of the input.  That and I just
like regexes period.  =)

Also, I noticed that your little test returned 0 for all
unknown values.  Mine returns -1 since 0 can be a legitimate
value for some and I figured that would eliminate ambiguity.
 I can change it to 0, though.
msg37978 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2002-07-16 21:34
Logged In: YES 
user_id=357491

Two things have been uploaded.  First, test_time.py w/ a
strptime test.  It is almost an exact mirror of the strftime
test; only difference is that I used strftime to test
strptime.  So if strftime ever fails, strptime will fail
also.  I feel this is fine since strptime depends on
strftime so much that if strftime were to fail strptime
would definitely fail.

The other file is version 2.1.5 of strptime.  I made two
changes.  One was to remove the TypeError raised when %I was
used without %p.  This was from me being very picky about
only accepting good data strings.  The second was to go
through and replace all whitespace in the format string with
\s*.  That basically makes this version of strptime XPG
compatible as far as I (and the NetBSD man page) can tell. 
The only difference now is that I do not require whitespace
or a non-alphanumeric character between format strings. 
Seems like a ridiculous requirement since the requirement
that whitespace be able to compress down to no whitespace
negates this requirement.  Oh well, we are more than
compliant now.

I decided not to write a patch for the docs to make them
read more leniently for what the format directives.  Figured
I would just let people who think like me do it in a more
"proper" way with leading zeros and those who don't read it
like that to still be okay.

I think that is everything.  If you want more in-depth
tests, Guido, I can add them to the testing suite, but I
figured that since this is (hopefully) going in bug-free it
needs only be checked to make sure it isn't broken by
anything.  And if you do want more in-depth tests, do you
want me to add mirror tests for strftime or not worry about
that since that is the ANSI C library's problem?  Other then
that, I think strptime is pretty much done.
msg37979 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2002-07-18 15:29
Logged In: YES 
user_id=6380

- Can you please delete all the obsolete uploads? (If SF
won't let you, let me know and I'll do it for you, leaving
only the most recend version of each.)

- There' still a confusion between strptime.py and
_strptime.py; your test_time.py imports strptime, and so
does the latest version of test_strptime.py I can find.

- The "from __future__ import division" is unnecessary,
since you're never using the single / operator (// doesn't
need the future statement). Also note that future statements
should come *after* a module's docstring (for future
reference :-).

- When I run test_strptime.py, I get one failure:

======================================================================
FAIL: Test TimeRE.pattern.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "../Lib/test/test_strptime.py", line 124, in test_pattern
   
self.failUnless(pattern_string.find("(?P<d>(3[0-1])|([0-2]\d)|\d|(
\d))") != -1, "did not find 'd' directive pattern string
'%s'" % pattern_string)
  File "/home/guido/python/dist/src/Lib/unittest.py", line
262, in failUnless
    if not expr: raise self.failureException, msg
AssertionError: did not find 'd' directive pattern string
'(?P<a>(?:Mon)|(?:Tue)|(?:Wed)|(?:Thu)|(?:Fri)|(?:Sat)|(?:Sun))\s*(?P<A>(?:Wednesday)|(?:Thursday)|(?:Saturday)|(?:Tuesday)|(?:Monday)|(?:Friday)|(?:Sunday))\s*(?P<d>3[0-1]|[0-2]\d|\d|
\d)'

----------------------------------------------------------------------

I haven't looked into this deeper.

Back to you...
msg37980 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2002-07-18 21:39
Logged In: YES 
user_id=357491

God I wish I could delete those old files!  Poor Neal
Norwitz was nice enough to go over my code once to help me
make it sure it was up for being included in the stdlib, but
he initially used an old version.  Thankfully he was nice
enough to look over the newer version at the time.  But no,
SF does not give me the priveleges to delete old files (and
why is that?  I am the creator of the patch; you would think
I could manage my own files).  I re-uploaded everything now.
 All files that specify they were uploaded 2002-07-17 are
the newest files.

I am terribly sorry about this whole name mix-up.  I have
now fixed test_strptime.py to use _strptime.  I completely
removed the strptime import so that the strptime testing
will go through time and thus test which ever version time
will export.

I removed the __future__ import.  And thanks for the piece
of advice; I was taking the advice that __future__
statements should come before code a little too far.  =)

As for your error, that is because the test_strptime.py you
are using is old.  I originally had a test in there that
checked to make sure the regex returned was the same as the
one being tested for; that was a bad decision.  So I went
through and removed all hard-coded tests like that. 
Unfortunately the version you ran still had that test in
there.  SF should really let patch creators delete old files.

That's it this time.  Now I await the next drama in this
never-ending saga of trying to make a non-trivial
contribution to Python.  =)
msg37981 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2002-07-18 21:47
Logged In: YES 
user_id=6380

OK, deleting all old files as promised. All tests succeed. I
think I'll check this version in (but it may be tomorrow,
since I've got a few other things to take care of).
msg37982 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2002-07-18 22:34
Logged In: YES 
user_id=357491

Wonderful!

About the docs; do you want me to email Fred or upload a
patched version of the docs for time fixed?  And for
removing the request in PEP 42, should I email Jeremy about
it or Barry?
msg37983 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2002-07-18 23:35
Logged In: YES 
user_id=357491

Since I had the time, I went ahead and did a patch for
libtime.tex that removes the comment saying that strptime
fully relies on the C library and uploaded it.
msg37984 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2002-07-19 00:43
Logged In: YES 
user_id=33168

Brett, I'm still following.  It wasn't that bad. :-)
Guido, let me know if you want me to
do anything/check stuff in.

Docs are fine to upload here.  I can change PEP 42 also.
msg37985 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2002-07-19 17:09
Logged In: YES 
user_id=6380

Thanks!  All checked in.
msg37986 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2002-08-23 20:06
Logged In: YES 
user_id=599869

_strptime still doesn't handle case correctly.  Although it 
matches case-insensitive, the lookups for month names are 
still case sensitive (since they must match exactly the 
month names returned by time.strftime).  Allow me to 
demonstrate below.

>>> import _strptime as strptime
>>> strptime.strptime( 'Aug', '%b' )
(-1, 8, -1, -1, -1, -1, -1, -1, -1)
>>> strptime.strptime( 'AUG', '%b' )
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Program Files\Python\Lib\site-
packages\strptime.py", line 410, in strptime
    month = locale_time.a_month.index(found_dict['b'])
ValueError: list.index(x): x not in list

The same result is encountered using any variation of 'aug' 
other than 'Aug'.

I imagine the solution to this problem is as simple as 
changing lines 408, 410, 430, and 432 to something like:
    month = locale_time.a_month.index(capwords(found_dict
['b']))
and adding
from string import capwords
to the header of the file.  This is what I did in strptime v2.0.0 
and the latest v2.1.5, and it has worked well.

As a general note, I believe the test suite should also test for 
cases not automatically generated.  Many of the parse 
routines are going to be parsing times generated by systems 
other than time.strftime, to include variants such as the 
aforementioned case variant and other things like 
miscellaneous interspersed strings.

I will only suggest these changes, not implement them 
myself on this page, as I'm not intimately familiar with 
sourceforge or code customs, so I'll allow someone such as 
Guido or Brett to make the changes.

Regards,
Jason
msg37987 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2002-08-23 20:09
Logged In: YES 
user_id=6380

Reopening to address Jason's bug report.

Brett, can you provide a patch?
msg37988 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2002-08-24 03:15
Logged In: YES 
user_id=357491

Yep, I will patch it.  I will add it to patch 593560 which
has my last bugfix and code cleanup in it.
msg37989 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2002-08-26 20:20
Logged In: YES 
user_id=357491

OK, all patched and attached to the other patch entry.  I
also discovered a bug in PyUnit and got to submit a patch
for that to its patch DB.  Such fun.  =)
msg37990 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2002-09-03 18:03
Logged In: YES 
user_id=12800

Closing.  Refer to patch 593560 for further details.
msg37991 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2002-09-03 18:24
Logged In: YES 
user_id=33168

I'm assuming Barry really meant to keep this closed. :-)
History
Date User Action Args
2022-04-10 16:04:33adminsetgithub: 35391
2001-10-23 23:15:29brett.cannoncreate