Issue545410
Created on 2002-04-17 23:15 by ggerrietts, last changed 2022-04-10 16:05 by admin. This issue is now closed.
Files
File name | Uploaded | Description
---|---|---
secondcore.backtrace | ggerrietts, 2002-04-17 23:15 | corefile traceback
Messages (7)
msg10390 | Author: Geoff Gerrietts (ggerrietts) | Date: 2002-04-17 23:15
I regularly get a corefile out of Zope 2.5.0, running on RedHat 6.2 and Python 2.1.3, usually within 5 or 6 page views. Reproducing the problem requires (for me) starting up Zope, going to the management interface, and bouncing around on a few of the different objects. Sometimes the first attempt to render the page will cause a crash, but sometimes it takes a few clicks. After the crash, Zope dumps core and politely restarts itself. Traceback files are largely the same from one crash to the next, with the only variation I've noted being the addresses of variables -- this fits with the fact that it takes a different number of steps each time. Traceback files (infuriatingly enough) do not show line number information for select.so, even though selectmodule.o was compiled with -g specified. Examination of the traceback shows that in stack frame 2, print ((PyCFunctionObject *)func)->m_ml->ml_name reveals "select". In that same frame, print ((PyObject *)arg)->ob_type->tp_name yields "Cannot access memory at address 0x1f". One traceback has been provided. Others, and other info, available on request.
msg10391 | Author: Martin v. Löwis (loewis) * | Date: 2002-04-18 05:44
In that backtrace, it is clear that arg in stack frame 2 is bogus: 0x405d7ffc points into an area that appears to be used for shared libraries (please run "info shared" to confirm this theory). Now, arg is presumably the return value of load_args, where it was created through PyTuple_New. This suggests that the memory management got corrupted, something that likely happened much earlier. I recommend setting MALLOC_CHECK_ before starting Python. The documentation in malloc(3) says that setting it to 2 will cause an abort() when an error is found; from inspecting the implementation, it appears that setting it to 3 will combine the debug traces printed and the call to abort; please experiment with all three values (1, 2, 3).
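The experiment suggested above can be sketched as follows. This is a minimal illustration in modern Python, not the Python 2.1 of this report; the helper names `env_with_malloc_check` and `run_with_malloc_check` and the stand-in child command are hypothetical. MALLOC_CHECK_ only has an effect under glibc, and newer glibc releases may need extra setup before the checks are active.

```python
import os
import subprocess
import sys


def env_with_malloc_check(level, base_env=None):
    """Return a copy of the environment with MALLOC_CHECK_ set to `level`."""
    env = dict(os.environ if base_env is None else base_env)
    env["MALLOC_CHECK_"] = str(level)
    return env


def run_with_malloc_check(command, levels=(1, 2, 3)):
    """Run `command` once per level and collect the return codes."""
    results = {}
    for level in levels:
        completed = subprocess.run(command, env=env_with_malloc_check(level))
        results[level] = completed.returncode
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for the real server start-up command.
    child = [sys.executable, "-c", "print('started')"]
    for level, code in run_with_malloc_check(child).items():
        print(f"MALLOC_CHECK_={level}: exit code {code}")
```

On POSIX systems a negative return code means the child was terminated by a signal, which is how an abort() triggered by the allocator checks would show up here.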
msg10392 | Author: Jeremy Hylton (jhylton) | Date: 2002-04-19 14:50
Are you using any third-party Python extension modules with Zope? It may be a memory corruption problem in an extension. If you are, you should try to reproduce the problem with *only* Zope's core extensions and Python 2.1.3.
msg10393 | Author: Geoff Gerrietts (ggerrietts) | Date: 2002-04-25 00:47
MALLOC_CHECK_ didn't do anything different for me. I don't know what that means exactly, because it definitely SEEMS like there's some bad memory management going on. Maybe it's just not bad ENOUGH. The arg in stack frame 2 is actually NOT in the shared library space -- the highest shared library address is actually in the 0x404xxxx range. I'm not sure how the programs are linked/run, so I'm not sure what 0x405d7ffc would likely be if it's not shared libraries, though. I've been looking into (and continue to look into) the possibility Guido suggested on python-dev, that this is an overrun of a thread's stack. It seems unlikely, given that thread stacks are allotted 2MB under Linux, but I suppose anything's possible. The Data.fs in use in this core is 180MB; when compressed, it shrinks to about 90MB. It's a big site, so overruns are a lot more possible here than in other places. I'm going to take a pass at pulling out all the 3rd-party extension modules, but that's going to be very challenging. A significant part of the architecture (roughly half, including authentication) is accessible only through a 3rd-party extension, ILU. My independent testing with ILU indicates that ILU works acceptably and doesn't cause memory corruption. Because it's so ingrained into the site, I'm going to look at doing more thorough testing and more of it, and I'll try to isolate it so I can test around it, but I don't think it's reasonable to try to replace it at this point in time; the engineering effort would be several weeks. There is at least one other 3rd-party module that I'll be able to rip out or fake up more easily. I think this is more of a progress report than anything else. Thanks for the eyes.
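One way to probe the thread-stack theory mentioned above is to rerun the suspect work in a thread that has a larger stack and see whether the behaviour changes. The sketch below assumes a modern Python, since `threading.stack_size` does not exist in Python 2.1; the helper `run_with_stack_size` and the recursive `depth` function are hypothetical illustrations, not part of Zope or ILU.

```python
import threading


def run_with_stack_size(func, stack_bytes):
    """Run func() in a new thread created with the given stack size."""
    result = {}

    def target():
        result["value"] = func()

    # The new size applies to threads created after this call.
    previous = threading.stack_size(stack_bytes)
    try:
        worker = threading.Thread(target=target)
        worker.start()
    finally:
        # Restore the earlier setting for any threads created later.
        threading.stack_size(previous)
    worker.join()
    return result.get("value")


def depth(n):
    """Recurse n levels deep, as a stand-in for stack-hungry work."""
    return 0 if n == 0 else 1 + depth(n - 1)


if __name__ == "__main__":
    # 4 MiB is an arbitrary choice, twice the 2MB figure quoted above.
    print(run_with_stack_size(lambda: depth(500), 4 * 1024 * 1024))
```

If a crash disappears with a larger stack, that supports the overrun theory; if it persists, as it did in this report, corruption elsewhere is more likely.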
msg10394 | Author: Jeremy Hylton (jhylton) | Date: 2002-07-25 16:11
Are you still having this problem? If so, I would recommend trying to reproduce the crash with Zope 2.6 alpha 1 and a Python 2.1.3 or 2.2.1 debug build (configure --with-pydebug).
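Before rerunning a crash like this, it can help to confirm that the interpreter in use really is a debug build. A minimal sketch for a modern CPython follows; the `sysconfig` module is not available in Python 2.1, and the helper name `is_debug_build` is hypothetical.

```python
import sys
import sysconfig


def is_debug_build():
    """Best-effort check for an interpreter configured with --with-pydebug."""
    # sys.gettotalrefcount is only present in debug builds of CPython.
    if hasattr(sys, "gettotalrefcount"):
        return True
    # Py_DEBUG is recorded at configure time; it may be None on some platforms.
    return bool(sysconfig.get_config_var("Py_DEBUG"))


if __name__ == "__main__":
    print("debug build" if is_debug_build() else "release build")
```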
msg10395 | Author: Geoff Gerrietts (ggerrietts) | Date: 2002-07-25 17:59
I'm embarrassed to note that I forgot to close this bug when the resolution was discovered. It was, as many suggested, a defect in a third-party extension -- in this case, a very old bug in ILU. Linux threading was making it look like the problem was occurring elsewhere. Hoping to submit that patch back against ILU very soon.
msg10396 | Author: Neal Norwitz (nnorwitz) * | Date: 2002-11-02 02:47
Closing as Geoff says it's not a Python bug.
History
Date | User | Action | Args
---|---|---|---
2022-04-10 16:05:14 | admin | set | github: 36454
2002-04-17 23:15:14 | ggerrietts | create |