classification
Title: Wrong match with regex, non-greedy problem
Type:
Stage:
Components: Regular Expressions
Versions: Python 2.4

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To: niemeyer
Nosy List: effbot, engel_re, niemeyer
Priority: normal
Keywords:

Created on 2005-02-05 00:12 by engel_re, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (2)
msg24162 - Author: rengel (engel_re) Date: 2005-02-05 00:12
# This is executable.
import re

# My test string is rather long:
tst = ("In this <c:noun:ns>Buch</c:noun>, used to "
       "designate <c:noun:np>Dinge der Wirklichkeit</c:noun> "
       "rather than <c:noun:fs>SW</c:noun> "
       "<c:noun:ns>Ent</c:noun>.")

# I want to match the last part of the string:
# <c:noun:fs>SW</c:noun> <c:noun:ns>Ent</c:noun>
# So I define the following pattern and compile it:
pat = (r"<c:noun:(.*?)>(.*?)</c:noun> "
       r"<c:noun:(.*?)>(.*?)</c:noun>")
rex = re.compile(pat)

# Then I search the string to get a match group:
mat = rex.search(tst)
# If found, print the group
if mat:
    print(mat.group())

# Instead of 
# <c:noun:fs>SW</c:noun> <c:noun:ns>Ent</c:noun>
# I get the whole string starting with 
# <c:noun:ns>Buch</c:noun>...
# up to the very last </c:noun>
# Apparently the non-greedy operator doesn't work correctly.
# What's wrong?

msg24163 - Author: Fredrik Lundh (effbot) * (Python committer) Date: 2005-02-08 08:27
Search returns the first (left-most) location where the 
pattern matches, if any.  The non-greedy operator only 
guarantees that you get the shortest possible match at that 
location.
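
To illustrate the point, here is a minimal sketch using the test string and pattern from the report, with the wrapped lines joined by single spaces. The `strict` pattern is not part of the original report or reply; it is an assumption about one way to get the intended result, by using negated character classes so that a group cannot run across a tag boundary.

```python
import re

# Test string from the report (wrapped lines joined with spaces).
tst = ("In this <c:noun:ns>Buch</c:noun>, used to "
       "designate <c:noun:np>Dinge der Wirklichkeit</c:noun> "
       "rather than <c:noun:fs>SW</c:noun> "
       "<c:noun:ns>Ent</c:noun>.")

# Pattern from the report: non-greedy groups, which may still span tags.
lazy = re.compile(r"<c:noun:(.*?)>(.*?)</c:noun> "
                  r"<c:noun:(.*?)>(.*?)</c:noun>")

# Hypothetical alternative (not from the report): negated character
# classes keep each group inside a single tag or a single text run.
strict = re.compile(r"<c:noun:([^>]*)>([^<]*)</c:noun> "
                    r"<c:noun:([^>]*)>([^<]*)</c:noun>")

lazy_match = lazy.search(tst)
strict_match = strict.search(tst)

# The non-greedy match begins at the first "<c:noun:" in the string,
# because search() takes the left-most position where a match exists.
print(lazy_match.start() == tst.index("<c:noun:"))

# The stricter pattern cannot match at the earlier tags, so search()
# moves on until it reaches the adjacent pair.
print(strict_match.group())
```

With the non-greedy pattern, the second group has to stretch from "Buch" to "SW" for the match to succeed at the first tag, and that is still the shortest match available at that starting position.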
History
Date                 User      Action  Args
2022-04-11 14:56:09  admin     set     github: 41526
2005-02-05 00:12:52  engel_re  create