Issue 429357
Created on 2001-06-01 16:29 by donut, last changed 2022-04-10 16:04 by admin. This issue is now closed.
Files

File name | Uploaded | Description
---|---|---
python-sre-429357.patch | donut, 2001-10-05 04:16 | patch
Messages (6)

msg4937 | Author: Matthew Mueller (donut) | Date: 2001-06-01 16:29
I found a weird bug: when a non-greedy match doesn't match anything, its group holds a copy of the rest of the string instead of being None.

```python
# pyrebug.py
import re

urlrebug = re.compile("""
    (.*?)://        #scheme
    (
        (.*?)       #user
        (?:
            :(.*)   #pass
        )?
    @)?
    (.*?)           #addr
    (?::([0-9]+))?  #port
    (/.*)?$         #path
""", re.VERBOSE)

testbad = 'foo://bah:81/pth'
print(urlrebug.match(testbad).groups())
```

Bug output:

```
>python2.1 pyrebug.py
('foo', None, 'bah:81/pth', None, 'bah', '81', '/pth')
>python-cvs pyrebug.py
('foo', None, 'bah:81/pth', None, 'bah', '81', '/pth')
```

Good (expected) output:

```
>python1.5 pyrebug.py
('foo', None, None, None, 'bah', '81', '/pth')
```
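For comparison on a current interpreter, here is the same reproduction as a Python 3 sketch. The pattern is the one from pyrebug.py written on a single line, and the name `URL_RE` is ours. It assumes an interpreter that includes the fix recorded at the end of this thread, which every current CPython release does; there the user and pass groups should come back as None, as in the "good (expected)" output.

```python
import re

# pyrebug.py's pattern on one line:
# scheme :// (user (:pass)? @)? addr (:port)? (/path)? end
URL_RE = re.compile(r"(.*?)://((.*?)(?::(.*))?@)?(.*?)(?::([0-9]+))?(/.*)?$")

groups = URL_RE.match("foo://bah:81/pth").groups()
print(groups)  # fixed engine: ('foo', None, None, None, 'bah', '81', '/pth')
```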

msg4938 | Author: Nobody/Anonymous (nobody) | Date: 2001-06-13 17:12
Logged In: NO

What's happening makes sense, on one level. When the regex engine gets to the user:pass@ part, ((.*?)(?::(.*))?@)?, which fills groups 2, 3, and 4, the .*? of group 3 has to try at every character in the rest of the string before admitting overall defeat. In doing that, the last time group 3 successfully completed locally, it had matched the rest of the string. Of course, overall, group 3 is enclosed within group 2, and when group 2 couldn't complete successfully, the engine knows it can skip group 2 (because of the ? modifying it), so it bails on groups 2, 3, and 4 entirely and continues with the rest of the expression.

What you'd like to happen is that when that "bailing" happens for group 2, the enclosed groups 3 and 4 get zeroed out (since they didn't participate in the *final* overall match). That makes sense, and is what I would expect to happen. However, what *is* happening is that group 3 keeps the string that *it* last matched, even though that last match didn't contribute to the final, overall match.

I'm not explaining this well; I hope you can understand it despite that. Sorry.

Jeffrey
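Jeffrey's expectation can be phrased as a check: groups 3 and 4 should hold a value only when their enclosing group 2 is part of the final match. This is a sketch assuming a fixed engine; the helper `userinfo` and the sample URLs are illustrative and do not come from the thread.

```python
import re

URL_RE = re.compile(r"""
    (.*?)://            # 1: scheme
    (                   # 2: optional user[:pass]@ block
        (.*?)           # 3: user
        (?: :(.*) )?    # 4: pass
    @)?
    (.*?)               # 5: addr
    (?::([0-9]+))?      # 6: port
    (/.*)?$             # 7: path
""", re.VERBOSE)

def userinfo(url):
    """Return groups 2, 3 and 4: the optional block and the two groups nested in it."""
    return URL_RE.match(url).group(2, 3, 4)

for url in ("foo://bah:81/pth", "foo://me@bah/pth", "foo://me:secret@bah:81/pth"):
    print(url, "->", userinfo(url))
```

Groups 3 and 4 are filled only in the second and third cases, where group 2 itself took part in the match.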

msg4939 | Author: Matthew Mueller (donut) | Date: 2001-06-14 07:59
Logged In: YES user_id=65253

I think I understand what you are saying, and in the context of the test, it doesn't seem too bad. BUT, my original code (and what I'd like to have) did not have the surrounding group. So I'd just get:

('foo', 'bah:81/pth', None, 'bah', '81', '/pth')

Knowing how easy it is to mess up regexes when writing them, I'm sure you can imagine the pain I went through before actually realizing it was a Python bug :)
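A sketch of the variant Matthew describes, assuming the wrapping group simply became non-capturing `(?:...)`. The thread shows only the resulting output, not this exact pattern, and the name `URL_RE_NOWRAP` is ours. With no outer group to test for participation, the user group itself is the only signal that a user was present, which is why a stale value hurts here. On a fixed engine it should be None.

```python
import re

# Assumed reconstruction without the wrapping capture group:
# the user[:pass]@ block is non-capturing, leaving six groups.
URL_RE_NOWRAP = re.compile(r"""
    (.*?)://            # 1: scheme
    (?:
        (.*?)           # 2: user
        (?: :(.*) )?    # 3: pass
    @)?
    (.*?)               # 4: addr
    (?::([0-9]+))?      # 5: port
    (/.*)?$             # 6: path
""", re.VERBOSE)

m = URL_RE_NOWRAP.match("foo://bah:81/pth")
user = m.group(2)  # None when the URL has no user@ prefix
print(m.groups())  # fixed engine: ('foo', None, None, 'bah', '81', '/pth')
```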

msg4940 | Author: Gregory Smith (gregsmith) | Date: 2001-08-30 16:26
Logged In: YES user_id=292741

This looks like the same bug I reported (with a much simpler example) as #448951; I missed this one before because I was searching for 'group'.

What I found is consistent with Jeffrey's comments: if an optional part is fully scanned before the state machine can tell whether it should actually be matched, the tentative match(es) inside it are stored in group() even if the optional part turns out to fail. Presumably, such a case needs to be handled by going back and deleting these after the state machine determines that the optional part was not matched.

In my example, I mention a small modification to the test case where the failure of the optional ? is decided one character later (at the end of the () group, not beyond it); this is enough to make it start working again.
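Here is a smaller pattern with the shape Gregory describes: a lazy inner group is tried across the rest of the input before its optional enclosing group gives up. This is our own illustration, not the example filed as #448951, and the names `MINI` and `report` are hypothetical. `Match.span()` returns (-1, -1) for a group that did not participate, which makes it a convenient way to check that no tentative capture leaked through.

```python
import re

# Hypothetical minimal case: group 2 (.*?) is tried at every position in the
# rest of the input, then group 1 fails for lack of a ':' and is skipped.
MINI = re.compile(r"((.*?):)?z")

def report(text):
    """Return the groups plus the spans of groups 1 and 2 for a match at the start of text."""
    m = MINI.match(text)
    return m.groups(), m.span(1), m.span(2)

print(report("z"))    # optional group skipped: ((None, None), (-1, -1), (-1, -1))
print(report("a:z"))  # optional group used: (('a:', 'a'), (0, 2), (0, 1))
```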

msg4941 | Author: Matthew Mueller (donut) | Date: 2001-10-05 04:16
Logged In: YES user_id=65253

OK, after poking and prodding the _sre.c code a bunch until I (hopefully) understood what is happening, I've created a patch. It passes all existing re tests as well as new ones I added for this bug. (I've also made a patch for the similar, but separate, bug #448951, which I will post there shortly.)

msg4942 | Author: Gustavo Niemeyer (niemeyer) * | Date: 2002-11-06 16:39
Logged In: YES user_id=7887

This problem was fixed in the following CVS revisions:

- Lib/test/re_tests.py: 1.30 -> 1.31
- Lib/test/test_sre.py: 1.37 -> 1.38
- Misc/NEWS: 1.511 -> 1.512
- Modules/_sre.c: 2.83 -> 2.84

Thank you!
History

Date | User | Action | Args
---|---|---|---
2022-04-10 16:04:05 | admin | set | github: 34572
2001-06-01 16:29:19 | donut | create |