This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Feature request: Comparing regexps
Type: enhancement Stage:
Components: Library (Lib) Versions:
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: niemeyer Nosy List: niemeyer, rhettinger, thomasda
Priority: normal Keywords:

Created on 2007-04-16 12:28 by thomasda, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (5)
msg55057 - (view) Author: Thomas Dybdahl Ahle (thomasda) Date: 2007-04-16 12:28
It would be very nice with a function in the re module to compare two regular expressions and see how they overlap.
Return value would perhaps be an a constant like DISJOINT, SUBSET etc.
msg55058 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2007-04-16 20:43
Can this be done in the existing implementation by comparing the racetrack diagrams (character transitions)?

Thomas, do you know of anywhere this have been done (third-party modules or in other languages)?
msg55059 - (view) Author: Thomas Dybdahl Ahle (thomasda) Date: 2007-04-17 07:51
I found this page with the function to do so: http://terpstra.ca/compare.html

I also found this thread: http://bumppo.net/lists/fun-with-perl/1999/09/msg00000.html which discusses how to do this with some canonical formed expressions. 

A function like this would really be able to speed some applications up, which matches a lot of strings with a number of expressions. If you know which ones are disjoint, you can break the test when one test matches.
msg55060 - (view) Author: Thomas Dybdahl Ahle (thomasda) Date: 2007-04-18 14:33
I talked with the guy who wrote the ZZ comparator.

"""I can give you the source code under the GPL if you like. However, I  
think it would be difficult to port to python. It's 1100 lines of  
very SML-style mostly-uncommented code. Do you know SML?

It's available at svn://mlton.org/mltonlib/trunk/ca/terpstra/regexp.  
I would expect credit for the algorithm. :-) Let me know if you  
succeed in porting it."""

I don't know any SML though.
If anybody does or can write a python equaliant of the algorithm:

"""1. Parse the regular expression (in GNU regular expression syntax)
2. Convert that parse tree into an NFA
3. Convert the NFA into a DFA by an algorithm I invented (it's why I
wrote this program; I thought of the algorithm and found it amusing
to see it in action)

For comparing regular expressions I take the following additional  
steps
4. Combine the two DFA's (with the cross product)
4a. For finding common words, I intersect them
4b. For finding words in A, but not B, I intersect A with the negated
DFA of B
4c. ...
5. Find the minimal DFA corresponding to this intersection
6. Run a weighted depth-first search (similar to Dijkstra's) to find
the least-weighted path from the initial state to an accepting state
(the weighting makes it prefer human readable characters in the
examples)"""

It could be cool.
Otherwise I can also try to learn sml and port it.
msg55061 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2007-04-19 06:44
Am closing the feature request because it has very little chance of making it into the core distribution without being a third-party module first.  The existing implementation is already somewhat difficult to maintain and we don't want to make that worse.  Also, the new functionality is of interest to only a small subset of the re module users.

If you do write it, respect the GPL license on the SML code and take some care to make sure you can license your new code in any way you see fit.

History
Date User Action Args
2022-04-11 14:56:23adminsetgithub: 44856
2007-04-16 12:28:16thomasdacreate