Issue539175
Created on 2002-04-04 09:54 by dustin, last changed 2022-04-10 16:05 by admin. This issue is now closed.
Files | | |
---|---|---|
File name | Uploaded | Description |
resolv-bug.py | akuchling, 2006-12-21 15:13 | Test script |
Messages (11) | |||
---|---|---|---|
msg10149 - (view) | Author: dustin sallings (dustin) | Date: 2002-04-04 09:54 | |
I've got an application that does SNMP monitoring and has a thread listening with SimpleXMLRPCServer for remote control. I noticed the XMLRPC listener logging an incorrect address while SNMP jobs were processing:

    sw1.west.spy.net - - [04/Apr/2002 01:16:37] "POST /RPC2 HTTP/1.0" 200 -
    localhost.west.spy.net - - [04/Apr/2002 01:16:43] "POST /RPC2 HTTP/1.0" 200 -

sw1 is one of the machines that is being queried, but the XMLRPC requests are happening over localhost.

gethostbyname() and gethostbyaddr() both return static data, thus they aren't reentrant. As a workaround, I copied socket.py to my working directory and added the following to it:

    try:
        import threading
    except ImportError, ie:
        sys.stderr.write(str(ie) + "\n")

    # mutex for DNS lookups
    __dns_mutex=None
    try:
        __dns_mutex=threading.Lock()
    except NameError:
        pass

    def __lock():
        if __dns_mutex!=None:
            __dns_mutex.acquire()

    def __unlock():
        if __dns_mutex!=None:
            __dns_mutex.release()

    def gethostbyaddr(addr):
        """Override gethostbyaddr to try to get some thread safety."""
        rv=None
        try:
            __lock()
            rv=_socket.gethostbyaddr(addr)
        finally:
            __unlock()
        return rv

    def gethostbyname(name):
        """Override gethostbyname to try to get some thread safety."""
        rv=None
        try:
            __lock()
            rv=_socket.gethostbyname(name)
        finally:
            __unlock()
        return rv
|
|||
msg10150 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2002-04-04 20:06 | |
I'm not sure what problem you are reporting. Python does not attempt to invoke gethostbyname from two threads simultaneously; this is prevented by the GIL. On some systems, gethostbyname is reentrant (in the gethostbyname_r incarnation); Python uses that where available, and releases the GIL before calling it. So I fail to see the bug. |
|||
msg10151 - (view) | Author: dustin sallings (dustin) | Date: 2002-04-04 21:08 | |
The XMLRPC request is clearly being logged as coming from my Cisco switch when it was, in fact, coming from localhost.

I can't find any clear documentation, but it seems that on at least some systems gethostbyname and gethostbyaddr reference the same static variable, so having separate locks for each one (as seen in socketmodule.c) doesn't help anything. It's not so much that they're not reentrant, but you can't call any combination of the two of them at the same time. Here's some test code:

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netdb.h>
    #include <assert.h>

    int main(int argc, char **argv) {
        struct hostent *byaddr, *byname;
        unsigned int addr;
        struct sockaddr *sa = (struct sockaddr *)&addr;

        addr=1117120483;

        byaddr=gethostbyaddr(sa, sizeof(addr), AF_INET);
        assert(byaddr);
        printf("byaddr: %s\n", byaddr->h_name);

        byname=gethostbyname("mail.west.spy.net");
        assert(byname);
        printf("byname: %s\n", byname->h_name);

        printf("\nReprinting:\n\n");

        printf("byaddr: %s\n", byaddr->h_name);
        printf("byname: %s\n", byname->h_name);
    }
|
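The aliasing that the C test is probing for can be illustrated without a resolver at all. The following is a pure Python 3 simulation, not a call into any C library: `_static_hostent`, `fake_gethostbyaddr`, and `fake_gethostbyname` are hypothetical stand-ins for a library that returns a pointer to one shared static result, and the addresses and names are reserved documentation values.

```python
# Simulated "static" result storage, standing in for a C library's single
# internal struct hostent. Everything in this block is hypothetical.
_static_hostent = {"h_name": None}


def fake_gethostbyaddr(addr):
    """Overwrite the shared storage and return a reference to it."""
    _static_hostent["h_name"] = "name-for-" + addr
    return _static_hostent  # the same object on every call


def fake_gethostbyname(name):
    """Overwrite the same shared storage and return a reference to it."""
    _static_hostent["h_name"] = name
    return _static_hostent  # the same object on every call


byaddr = fake_gethostbyaddr("192.0.2.1")
first_seen = byaddr["h_name"]  # copy the value out before the next call
byname = fake_gethostbyname("mail.example.org")

# Both variables alias the same storage, so the first result has been replaced.
print(first_seen)        # name-for-192.0.2.1
print(byaddr["h_name"])  # mail.example.org
print(byaddr is byname)  # True
```

If a platform behaves this way, holding a lock around each individual call is not enough on its own; the result also has to be copied out before the lock is released.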
|||
msg10152 - (view) | Author: dustin sallings (dustin) | Date: 2002-04-04 22:21 | |
Looking over the code a bit more, I see that my last message wasn't entirely accurate. There does seem to be only one lock for both gethostbyname and gethostbyaddr (gethostbyname_lock is used for both).

This is a pretty simple test that illustrates the problem I'm seeing. My previous work was on my OS X machine, but this is Python 2.2 (#3, Mar 6 2002, 18:30:37) [C] on irix6.

    #!/usr/bin/env python
    #
    # Copyright (c) 2002 Dustin Sallings <dustin@spy.net>
    # $Id$

    import threading
    import socket
    import time

    class ResolveMe(threading.Thread):

        hosts=['propaganda.spy.net', 'bleu.west.spy.net', 'mail.west.spy.net']

        def __init__(self):
            threading.Thread.__init__(self)
            self.setDaemon(1)

        def run(self):
            # Run 100 times
            for i in range(100):
                for h in self.hosts:
                    nrv=socket.gethostbyname_ex(h)
                    arv=socket.gethostbyaddr(nrv[2][0])

                    try:
                        # Verify the hostname is correct
                        assert(h==nrv[0])
                        # Verify the two hostnames match
                        assert(nrv[0]==arv[0])
                        # Verify the two addresses match
                        assert(nrv[2]==arv[2])
                    except AssertionError:
                        print "Failed! Checking " + `h` + " got, " \
                            + `nrv` + " and " + `arv`

    if __name__=='__main__':
        for i in range(1,10):
            print "Starting " + `i` + " threads."
            threads=[]
            for n in range(i):
                rm=ResolveMe()
                rm.start()
                threads.append(rm)
            for t in threads:
                t.join()
            print `i` + " threads complete."
            time.sleep(60)

The output looks like this:

    verde:/tmp 190> ./pytest.py
    Starting 1 threads.
    1 threads complete.
    Starting 2 threads.
    Failed! Checking 'propaganda.spy.net' got, ('mail.west.spy.net', [], ['66.149.231.226']) and ('mail.west.spy.net', [], ['66.149.231.226'])
    Failed! Checking 'bleu.west.spy.net' got, ('mail.west.spy.net', [], ['66.149.231.226']) and ('mail.west.spy.net', [], ['66.149.231.226'])
    [...]
|
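The script above is Python 2 only (backtick repr, print statements). A rough Python 3 sketch of the same three consistency checks follows; it is an adaptation, not the attached resolv-bug.py. The names `check_host` and `run_checks` are hypothetical, and the `forward`/`reverse` parameters are an added assumption so the checks can be run against stub resolvers as well as the real `socket` functions.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def check_host(host, forward=socket.gethostbyname_ex, reverse=socket.gethostbyaddr):
    """Return a list of mismatch descriptions for one host (empty if consistent)."""
    nrv = forward(host)       # (hostname, aliases, addresses)
    arv = reverse(nrv[2][0])  # reverse lookup of the first address
    problems = []
    if host != nrv[0]:
        problems.append("forward name %r != %r" % (nrv[0], host))
    if nrv[0] != arv[0]:
        problems.append("reverse name %r != %r" % (arv[0], nrv[0]))
    if nrv[2] != arv[2]:
        problems.append("addresses %r != %r" % (arv[2], nrv[2]))
    return problems


def run_checks(hosts, threads=4, rounds=100, **resolvers):
    """Run check_host for every host, `rounds` times, across a thread pool.

    Returns a list of (host, problems) pairs for the lookups that disagreed.
    """
    jobs = [h for _ in range(rounds) for h in hosts]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = pool.map(lambda h: check_host(h, **resolvers), jobs)
        return [(h, p) for h, p in zip(jobs, results) if p]
```

Called as `run_checks(["host.example"])` it uses the real resolver. A non-empty result is not proof of a threading bug by itself, since a host whose canonical or reverse name differs from the queried name fails these checks even with a single thread.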
|||
msg10153 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2002-04-05 08:56 | |
Can you spot the error in the Python socket module? I still fail to see our bug, and I would assume it is a C library bug; I also cannot reproduce the problem on any of my machines. Can you please report the settings of the various HAVE_ defines for irix? |
|||
msg10154 - (view) | Author: Tim Peters (tim.peters) * | Date: 2002-04-05 21:31 | |
Just a reminder that the first thing to try on any SGI box is to recompile Python with optimization disabled. I can't remember the last time we had "a Python bug" on SGI that wasn't traced to a compiler -O bug. |
|||
msg10155 - (view) | Author: dustin sallings (dustin) | Date: 2002-04-05 21:44 | |
I first noticed this problem on my OS X box. Since it's affecting me, it's not obvious to anyone else, and I'm perfectly capable of fixing it myself, I'll try to spend some time figuring out what's going on this weekend. It seems like it might be making a decision to not use the lock at compile time. I will investigate further and submit a patch. |
|||
msg10156 - (view) | Author: Neal Norwitz (nnorwitz) * | Date: 2002-08-11 15:04 | |
Dustin, any progress on a patch or diagnosing this further? |
|||
msg10157 - (view) | Author: dustin sallings (dustin) | Date: 2002-08-11 19:27 | |
No, unfortunately, I haven't been able to look at it in a while. Short of locking it in Python, I wasn't able to avoid the failure. I'm sorry I haven't updated this at all. As far as I can tell, it's still a problem, but I haven't been able to find a solution in the C code. I suppose I spoke with too much haste when I said I was perfectly capable of fixing the problem myself. The locking in the C code did seem correct, but the memory was still getting stomped. |
|||
msg10158 - (view) | Author: A.M. Kuchling (akuchling) * | Date: 2006-12-21 15:13 | |
Attaching the test script. The script now fails because some of the spy.net addresses are resolved to hostnames such as adsl-69-230-8-158.dsl.pltn13.pacbell.net. When I changed the test script to use python.org machine names and ran it with Python 2.5 on Linux, no errors were reported. Does this still fail on current OS X? If not, I suggest calling this a platform C library bug and closing this report. File Added: resolv-bug.py |
|||
msg10159 - (view) | Author: dustin sallings (dustin) | Date: 2006-12-22 04:24 | |
I'll go ahead and close it. It does not fail under 2.4 on any of my machines (tried OS X/intel, PPC G3, and FreeBSD/intel). |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-10 16:05:11 | admin | set | github: 36379 |
2002-04-04 09:54:15 | dustin | create |