Issue857676
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2003-12-10 16:21 by fdborges, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (6) | |||
---|---|---|---|
msg19361 - (view) | Author: Francisco Dellatorre Borges (fdborges) | Date: 2003-12-10 16:21 | |
I lot's of lines with the format: (d+@w[^|]+|)+ I'll call the last bit after the @ of /features/. I need to delete some of these, so I have this code that would produce a list matching what I would /not/ use, pass it over re.escape and them build a re using a concatenation of the list and delete this from the text before actually doing any parsing. Problem is: I have about 220.000 different features and I need to delete some 200.000 different ones from my files before doing something. So I tried to use a list of the 20.000 I want and then delete anything that matches the <not> of it: #--------------- # ftrlist is the stuff I *want* to keep: ftrlist = [re.escape(i) for i in ftrlist ] re.compile(r'(?!(%s))' %( '|'.join(ftrlist)) ) #-------------- but when I apply it I get something like: RuntimeError: internal error in regular expression engine I tried the same thing but with a smaller number of elements, say 1000 ftrlist[:1000], and then it worked. So I guess there is a bug on the scalability of the re engine when doing alternative searches. Attached I'm sending a tar ball that reproduces this. I'm gzipping it (hope sourceforge does not have a problem with the resulting binary file). |
|||
msg19362 - (view) | Author: Francisco Dellatorre Borges (fdborges) | Date: 2003-12-10 16:24 | |
Logged In: YES user_id=576554 Problem with the file name the first time. Here is the file. |
|||
msg19363 - (view) | Author: Francisco Dellatorre Borges (fdborges) | Date: 2003-12-10 16:31 | |
Logged In: YES user_id=576554 apparently sf does not want me to load tar balls... or I'm too dumb to do it. |
|||
msg19364 - (view) | Author: Francisco Dellatorre Borges (fdborges) | Date: 2003-12-10 16:35 | |
Logged In: YES user_id=576554 Ok, I give up on loading the file the size is restricted as one might have expected. If you want download the tar ball that would reproduce the error from www.let.rug.nl/~borges/ScalabilityREBug.tar.gz |
|||
msg19365 - (view) | Author: Fredrik Lundh (effbot) * | Date: 2003-12-11 13:39 | |
Logged In: YES user_id=38376 The SRE engine is usually compiled with 16-bit opcodes, which places a limit on the size of the compiled engine. Changing that to 32 bits might solve your problem... But in this case, I don't think one huge RE is really the best tool for the task; it's as easy (and much more efficient) to use an RE to *extract* the feature code, and use a dictionary to keep track of the ones you want to keep. (example: use re.sub("d+@(w[^|]+)|", callback, data) and let the callback look m.group(1) up in the dictionary; return m.group(0) if you want to keep the feature, "" if you don't). Hope this helps! </F> |
|||
msg19366 - (view) | Author: Fredrik Lundh (effbot) * | Date: 2004-09-09 08:04 | |
Logged In: YES user_id=38376 Alternative solution proposed; no response from submitter in ten months. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:01 | admin | set | github: 39689 |
2003-12-10 16:21:55 | fdborges | create |