Issue874842
Created on 2004-01-11 11:16 by zwoop, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files | | |
---|---|---|
File name | Uploaded | Description |
httplib.diff | zwoop, 2004-01-11 11:16 | Patch for httplib.py and Akamai URLs |
Messages (15) | |||
---|---|---|---|
msg19613 - (view) | Author: Leif Hedstrom (zwoop) | Date: 2004-01-11 11:16 | |
Using Python 2.3.2 and httplib, reading from Akamai URLs always hangs at the end of the transaction. As common as this must be, I couldn't find anything related to it on any search engine, nor on the bug list here. The problem is that Akamai returns an HTTP/1.0 response with a header like:

Connection: keep-alive

httplib does not recognize this response properly (the Connection: header parsing is only done for HTTP/1.1 responses). I'm not sure exactly what the right solution is, but I'm supplying one alternative that does solve the problem. I'm attaching a diff against httplib.py. |
|||
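The behaviour reported in msg19613 can be sketched roughly as follows. This is a simplified illustration, not the actual httplib code: the function name `body_ends_at_close`, the plain dict of lowercase header names, and the `honor_http10_keepalive` switch are all assumptions made for this example. `version` follows httplib's convention of 10 for HTTP/1.0 and 11 for HTTP/1.1.

```python
def body_ends_at_close(version, headers, honor_http10_keepalive):
    """Return True if the client should read the body until the server closes.

    version: 10 for HTTP/1.0, 11 for HTTP/1.1.
    headers: dict mapping lowercase header names to values.
    honor_http10_keepalive: hypothetical switch. False models the behaviour
        reported in this issue, True models the kind of fix proposed.
    """
    conn = headers.get("connection", "").lower()
    if version == 11:
        # HTTP/1.1 connections persist unless the server says "close".
        return "close" in conn
    if honor_http10_keepalive and "keep-alive" in conn:
        # An HTTP/1.0 server announcing keep-alive will not close the socket.
        return False
    # Plain HTTP/1.0: the server closes the connection after the body.
    return True


akamai_like = {"connection": "keep-alive", "content-length": "1234"}
print(body_ends_at_close(10, akamai_like, honor_http10_keepalive=False))  # True
print(body_ends_at_close(10, akamai_like, honor_http10_keepalive=True))   # False
```

With the switch off, the client waits for a close that the server never performs, which matches the hang described in the report.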
msg19614 - (view) | Author: Leif Hedstrom (zwoop) | Date: 2004-01-11 19:37 | |
Logged In: YES user_id=480913 Oh, I forgot: this is most easily reproduced by simply requesting the URL http://www.akamai.com/ Fortunately they Akamai their home page as well. :-) |
|||
msg19615 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2004-04-12 19:36 | |
Logged In: YES user_id=6380 Can you give a complete program that reproduces this? I've tried this:

>>> import urllib
>>> urllib.urlopen("http://www.akamai.com").read()

and it doesn't hang for me. I tried a number of Python versions from 2.2 through 2.4a0. |
|||
msg19616 - (view) | Author: Leif Hedstrom (zwoop) | Date: 2004-04-12 20:13 | |
Logged In: YES user_id=480913 Yeah, that works for me too. But the problem is in the HTTPResponse class from the httplib.py module. For example, this code (butchered from my application) will hang on Akamai URLs:

#!/usr/bin/python
import httplib
import logging
import socket

# Stand-in for the application's logger (self._log in the original code).
log = logging.getLogger("testHTTPlib")

def testHTTPlib(host, url):
    http = httplib.HTTPConnection(host)
    try:
        http.request('GET', url)
        response = http.getresponse()
    except IOError:
        log.warning("Can't connect to %s", url)
        return False
    except socket.timeout:
        # socket.timeout is a subclass of socket.error, so catch it first.
        log.warning("Timeout connecting to %s", url)
        return False
    except socket.error:
        log.error("Socket error retrieving %s", url)
        return False
    else:
        try:
            data = response.read()  # the call that stalls on Akamai URLs
            return True
        except socket.timeout:
            log.warning("Timeout reading from %s", url)
            return False

print testHTTPlib("www.ogre.com", "/")
print testHTTPlib("www.akamai.com", "/")

Granted, I think Akamai isn't strictly following the protocol, but it's inconvenient that this piece of code stalls here (and only for akamai.com domains; I've tried a lot of them). Thanks! -- Leif |
|||
msg19617 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2004-04-12 20:32 | |
Logged In: YES user_id=6380 Hmm... Indeed. read() checks will_close and apparently setting that to False will do the right thing. I don't know HTTP and this code well enough to approve this fix though. Also, the comment right above your patch should probably be fixed; it claims that connection headers on HTTP/1.0 are due to confused proxies. (Maybe that's what Akamai servers are? :-) |
|||
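The role of will_close mentioned in msg19617 can be illustrated with a small model. This is a sketch under stated assumptions, not httplib's implementation: `read_body` is a hypothetical helper, and an in-memory stream stands in for the socket. On a live keep-alive socket the read-until-EOF branch would block instead of over-reading, because EOF does not arrive while the server holds the connection open.

```python
import io


def read_body(stream, will_close, content_length):
    """Read a response body from a file-like object (simplified model).

    If the connection is expected to close, or no length is known, read
    until EOF; otherwise read exactly content_length bytes.
    """
    if will_close or content_length is None:
        return stream.read()
    return stream.read(content_length)


body = b"hello"
# Bytes of a later response arriving on the same kept-alive connection.
later = b"HTTP/1.0 200 OK\r\nContent-Length: 0\r\n\r\n"

# will_close=False: the body is delimited by Content-Length.
print(read_body(io.BytesIO(body + later), will_close=False, content_length=len(body)))
# prints b'hello'

# will_close=True: the reader runs past the body looking for EOF.
over_read = read_body(io.BytesIO(body + later), will_close=True, content_length=len(body))
print(over_read == body)  # prints False
```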
msg19618 - (view) | Author: Leif Hedstrom (zwoop) | Date: 2004-04-12 20:54 | |
Logged In: YES user_id=480913 Heh, yeah, I'm pretty sure that's the problem, Akamai being confused about protocols. They claim to be a v1.0 HTTP proxy, yet they use v1.1 HTTP headers :-/. This is why I mentioned I wasn't sure exactly what the right solution is. And no matter what we do, it'll be a hack. Maybe the original author of the module has some insight? Unfortunately, there's a lot of Akamai content out there that is affected by this. Cheers, -- Leif |
|||
msg19619 - (view) | Author: Jeremy Hylton (jhylton) | Date: 2004-04-15 21:59 | |
Logged In: YES user_id=31392 Looks good to me. I want to see if I can come up with a simple test module for httplib with the network resource enabled. I'll see if I can do that tonight. |
|||
msg19620 - (view) | Author: Greg Stein (gstein) * | Date: 2004-04-19 22:26 | |
Logged In: YES user_id=6501 I have a philosophical problem with compensating for servers that obviously break protocols. The server should be fixed, not *every* client on the planet. From that standpoint, this problem/fix should be rejected, though I defer to Guido on that choice. That said, the comment right above the patch should be fixed. The whole point of that comment is, "the header shouldn't be there, so we shouldn't bother to examine the thing." Obviously, the new code does, so the two comments should be merged. The comment about Akamai should also be strengthened to note that it is violating the HTTP protocol (see section 8.1.2.1 of RFC 2616). Summary: I'd reject it, but will leave that to Guido to choose (i.e. "we'll help users even tho it violates protocols"). If he wants it, then +1 if the comments are fixed up. |
|||
msg19621 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2004-04-19 22:32 | |
Logged In: YES user_id=6380 I won't reject the patch on that basis. Like HTML, it's more useful to be able to handle what we see in the real world than to stick to the standard. Clearly the OP needs to be able to access Akamai servers. He doesn't have the power to fix the Akamai servers, so saying "the server is wrong" doesn't do him any good. (The comment should state clearly that Akamai *is* wrong though!) Or do you have a different suggestion for how the poster can work around the problem? |
|||
msg19622 - (view) | Author: Leif Hedstrom (zwoop) | Date: 2004-04-20 00:57 | |
Logged In: YES user_id=480913 As I said, no matter what we do, it's a hack on something that's broken on the web (now there's a shocker :-). I don't feel terribly strongly on this issue; I merely filed the bug report because I had this problem, and it took me several hours to figure out why my daemon would stall on Akamai URLs. I'm guessing other users of httplib.py might run into the same problem. As for the patch, the comments would of course have to change; I didn't want to impose more changes in the diff than necessary. Besides the suggested patch, an alternative solution is to provide a specialized implementation of the HTTPResponse class, which works with Akamai. The users of the httplib.py module would then have to explicitly request that httplib.HTTPConnection should instantiate that class instead of the default one. Preferably this would be passed as a new argument to the constructor for HTTPConnection. And I agree that it's a hack to have to code around poor server implementations. But I'm not sure what our odds are of getting Akamai to fix their servers any time soon, since pretty much any web browser in existence works with their broken implementation. Cheers, -- leif |
|||
msg19623 - (view) | Author: Greg Stein (gstein) * | Date: 2004-04-20 06:44 | |
Logged In: YES user_id=6501 Falling into line with "oh, but they won't change it" is why we end up with a whole bunch of bad implementations. If everybody said that, then we wouldn't get anywhere. A long while back, AOL came out with a busted proxy implementation which didn't work with Apache servers. The ASF said, "sorry AOL: you're wrong. fix your proxies." And they did. If we put a hack in for every busted thing that came out over the next ten years, then imagine the craphole we'd be in... :-) That said: yes, you can work around the issue with a subclass of HTTPResponse which overrides the _check_close() method. You can then create an HTTPConnection subclass which overrides the class variable 'response_class', or you can set that field in an HTTPConnection instance as an instance variable. For example:

conn = HTTPConnection(...)
conn.response_class = AkamaiBugHandler

When the response arrives, the HTTPConnection class uses self.response_class, so there are a few options to get your custom response class into the chain of events. |
|||
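The workaround outlined in msg19623 might look something like the sketch below. It is written against Python 3's `http.client` (the successor of `httplib`) so that it can be run today, which is an assumption rather than the code used at the time. `_check_close` is a private method, so overriding it is fragile, and as far as I can tell current versions of the stock `HTTPResponse` already handle this case, which makes the subclass purely illustrative. `CannedSocket` is a hypothetical stand-in used to exercise the class without touching the network.

```python
import http.client
import io


class AkamaiBugHandler(http.client.HTTPResponse):
    """Response class that honours "Connection: keep-alive" on HTTP/1.0 replies."""

    def _check_close(self):
        conn = self.headers.get("connection", "")
        if self.version == 10 and "keep-alive" in conn.lower():
            # The server promised to keep the socket open, so the body
            # must be delimited by Content-Length rather than by EOF.
            return False
        return super()._check_close()


class CannedSocket:
    """Hypothetical socket stand-in that replays a fixed byte string."""

    def __init__(self, payload):
        self._payload = payload

    def makefile(self, mode="rb", *args, **kwargs):
        return io.BytesIO(self._payload)


# Per-instance override, as suggested in the message; constructing the
# connection object does not open a socket.
conn = http.client.HTTPConnection("www.akamai.com")
conn.response_class = AkamaiBugHandler

raw = (b"HTTP/1.0 200 OK\r\n"
       b"Connection: keep-alive\r\n"
       b"Content-Length: 5\r\n"
       b"\r\n"
       b"hello")
response = AkamaiBugHandler(CannedSocket(raw))
response.begin()          # parse the status line and headers
body = response.read()    # reads exactly Content-Length bytes
print(response.will_close, body)
```

Setting `response_class` on the instance keeps the change local to one connection, whereas a subclass of `HTTPConnection` with its own `response_class` would apply it wherever that subclass is used.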
msg19624 - (view) | Author: Leif Hedstrom (zwoop) | Date: 2004-04-20 15:36 | |
Logged In: YES user_id=480913 Yeah, the second solution is what I ended up doing, although it's definitely not obvious for anyone using httplib.py that this is required to support Akamai (see my original blog post at http://www.ogre.com/tiki-view_blog_post.php?blogId=3&postId=30 for both alternative solutions). At a minimum, I think we should provide the AkamaiHTTPResponse class in one way or another, and clearly document that this is required for correct support of Akamai URLs. My vote would probably be to "hack" the original HTTPResponse class, since anyone using HTTPlib for anything that might hit Akamai (perhaps as a referral/redirect) will have to use the fixed version anyways. Unfortunately I don't have any contacts left at Akamai, so I'm not sure how to inform them of their problems. I completely agree that we need to inform them about this problem, my point was that since Akamai works with pretty much everything else (browsers, other modules etc.), I think it'll be quite slow to get them to change. And until then, we're stuck with a module that is effectively semi-broken. Thanks, -- Leif |
|||
msg19625 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2004-04-20 16:53 | |
Logged In: YES user_id=6380 It's great that the ASF can wield such power over the likes of AOL. But I don't want to presume the same for Python (we're not the #1 web language, not even #2). I'd be more concerned if adding this hack would *break* anything, but that doesn't seem to be the case. So I still think Jeremy can check it in. |
|||
msg19626 - (view) | Author: Leif Hedstrom (zwoop) | Date: 2004-05-13 17:15 | |
Logged In: YES user_id=480913 Hate to beat on a dead horse here, but what was the final outcome of this discussion? Anything I can do to help produce a better patch, documentation or anything? |
|||
msg19627 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2004-05-14 00:29 | |
Logged In: YES user_id=6380 Looks like nothing happened in CVS... :-( It's too late for 2.3.4 now, Anthony issued the release candidate already. There will be a 2.3.5 though. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:02 | admin | set | github: 39801 |
2004-01-11 11:16:19 | zwoop | create |