Issue626543
Created on 2002-10-21 20:57 by jjlee, last changed 2022-04-10 16:05 by admin. This issue is now closed.
Messages (6) | |||
---|---|---|---|
msg12889 - (view) | Author: John J Lee (jjlee) | Date: 2002-10-21 20:57 | |
I just added support for HTML's META HTTP-EQUIV and zero-time Refresh HTTP headers to my 'ClientCookie' package (which exports essentially a clone of the urllib2 interface that knows about cookies, making use of urllib2 in the implementation). I didn't make a patch for urllib2 itself, but it would be easy to do so. I don't plan to do this immediately, but will eventually (assuming Jeremy thinks it's advisable) -- I just wanted to register this fact to prevent duplication of effort. [BTW, this version of ClientCookie isn't on my web page yet -- my motherboard just died.]

I'm sure you know this already, but: HTTP-EQUIV is just a way of putting headers in the HEAD section of an HTML document; Refresh is a Netscape 1.1 header that indicates that a browser should redirect after a specified time. Refresh headers with zero time act like redirections.

The net result of the code I just wrote is that if you urlopen a URL that points to an HTML document like this:

<HTML><HEAD>
<META HTTP-EQUIV="Refresh" CONTENT="0; URL=http://acme.com/new_url.htm">
</HEAD></HTML>

you're automatically redirected to "http://acme.com/new_url.htm". The same thing happens if the Refresh is in the HTTP headers, because all the HTTP-EQUIV headers are treated like real HTTP headers. A Refresh with a non-zero delay time is ignored (urlopen returns the document body unchanged and does not redirect, but does still add the Refresh header to the HTTP headers).

A few issues:

0) AFAIK, the Refresh header is not specified in any RFC, but only here: http://wp.netscape.com/assist/net_sites/pushpull.html (HTTP-EQUIV seems to be in the HTML 4.0 standard, maybe earlier ones too)
1) Should infinite loops be detected, as for HTTP 30x? Presumably yes.
2) Should the HTTP-EQUIV headers be added to the response object, or just treated like headers internally? Perhaps it should be possible to get both behaviours?
3) Bug in my implementation: it is greedy when reading body data from httplib's file object.

John |
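As an illustration of the zero-delay rule described above, here is a minimal sketch of splitting a Refresh value into its delay and target URL. The function names `parse_refresh` and `is_redirect` are hypothetical and are not taken from ClientCookie; since no RFC defines the syntax, the case-insensitive `URL=` matching is an assumption about common practice.

```python
from typing import Optional, Tuple


def parse_refresh(value: str) -> Tuple[float, Optional[str]]:
    """Split a Refresh value such as '0; URL=http://...' into (delay, url).

    Hypothetical helper: the accepted syntax is an assumption, because the
    Refresh header is not specified in any RFC.
    """
    delay_part, sep, rest = value.partition(";")
    delay = float(delay_part.strip())  # raises ValueError on a malformed delay
    url = None
    if sep:
        rest = rest.strip()
        # Match the "URL=" prefix case-insensitively (assumed convention).
        if rest[:4].lower() == "url=":
            url = rest[4:].strip().strip("\"'")
    return delay, url


def is_redirect(value: str) -> bool:
    """Treat only zero-delay refreshes with a target URL as redirections."""
    delay, url = parse_refresh(value)
    return delay == 0 and url is not None
```

With this sketch, a value with a non-zero delay parses normally, but `is_redirect` reports it as something to leave alone, matching the behaviour described in the message.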
|||
msg12890 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2002-10-23 13:54 | |
In addition to the issues you have mentioned, there is also the backwards-compatibility issue: some applications might expect to get a meta-refresh document from urllib, then parse it and retry themselves. Those applications would break with such a change. |
|||
msg12891 - (view) | Author: John J Lee (jjlee) | Date: 2002-10-23 23:20 | |
What do you think the solution to the backwards-compatibility problem is? Leave urllib2 as-is? Add a switch to turn it on? Something else?

At the moment, I just deal with it in AbstractHTTPHandler. It would be nice to treat it like the other redirections, by writing a RefreshHandler -- this would solve the backwards-compatibility issue. However, OpenerDirector.error always calls http_error_xxx ATM (where xxx is the HTTP error code), so without changing that, I don't think a RefreshHandler is really possible. I suppose the sensible solution is just to make a new HTTPHandler and HTTPSHandler?

Can you think of any way in which supporting HTTP-EQUIV would mess up backwards compatibility, assuming the body is unchanged but the headers do have the HTTP-EQUIV headers added?

John |
|||
msg12892 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2002-10-26 13:30 | |
I would try to subclass HTTPHandler, and then provide a build_opener wrapper that installs this handler instead of the normal http handler (the latter is optional, since the user could just do build_opener(HTTPRefreshHandler)). |
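A rough sketch of that suggestion, written against Python 3's `urllib.request` (the successor to urllib2) rather than the urllib2 of the time. `HTTPRefreshHandler` is the name used in the message; `build_refresh_opener` is a hypothetical wrapper name, and the handler body is only a placeholder, not ClientCookie's implementation.

```python
import urllib.request


class HTTPRefreshHandler(urllib.request.HTTPHandler):
    """Placeholder subclass: refresh handling would hook in at http_open."""

    def http_open(self, req):
        response = super().http_open(req)
        # Assumption: a real implementation would inspect the response for a
        # zero-delay Refresh here and re-open the target URL if one is found.
        return response


def build_refresh_opener(*handlers):
    """Hypothetical wrapper around build_opener.

    build_opener leaves out a default handler class when a subclass of it
    is passed in, so HTTPRefreshHandler takes the place of HTTPHandler.
    """
    return urllib.request.build_opener(HTTPRefreshHandler, *handlers)
```

Code that keeps calling plain `urlopen` is unaffected, which is the point of making the behaviour opt-in: only callers that build their opener this way get the redirect-on-refresh behaviour.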
|||
msg12893 - (view) | Author: John J Lee (jjlee) | Date: 2003-10-29 23:27 | |
Just an update:

- this could now be implemented as a handler (and already is, in my ClientCookie package) using RFE 759792, rather than having to be mixed in with HTTPHandler
- the issues I listed in my initial comment, and the backwards-compatibility issue raised by MvL, are now resolved
- it needs reimplementing using HTMLParser (currently uses htmllib) if it's to go in the standard library; I plan to do this in time for 2.4 |
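To give an idea of what the HTMLParser-based part might look like, here is a small sketch using Python 3's `html.parser.HTMLParser`. The names `MetaRefreshFinder` and `find_meta_refresh` are hypothetical, and this is not the ClientCookie code; it only extracts the CONTENT value of the first META HTTP-EQUIV="Refresh" tag and leaves redirect handling to the caller.

```python
from html.parser import HTMLParser
from typing import Optional


class MetaRefreshFinder(HTMLParser):
    """Record the CONTENT of the first <META HTTP-EQUIV="Refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.refresh = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta" or self.refresh is not None:
            return
        attr_map = dict(attrs)
        if (attr_map.get("http-equiv") or "").lower() == "refresh":
            self.refresh = attr_map.get("content")


def find_meta_refresh(html: str) -> Optional[str]:
    """Return the Refresh value found in the document, or None."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    finder.close()
    return finder.refresh
```

A handler built on this would still need to decide how much of the body to read before parsing, which is the greedy-read concern raised in the first message.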
|||
msg12894 - (view) | Author: John J Lee (jjlee) | Date: 2006-02-01 20:31 | |
Closing since I no longer intend to contribute this. (I don't want to get involved with HTML parsing in the stdlib!) |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-10 16:05:46 | admin | set | github: 37353 |
2002-10-21 20:57:55 | jjlee | create |