classification
Title: urllib2 doesn't do HTTP-EQUIV & Refresh
Type:
Stage:
Components: Library (Lib)
Versions:
process
Status: closed
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: jjlee, loewis
Priority: normal
Keywords:

Created on 2002-10-21 20:57 by jjlee, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Messages (6)
msg12889 - (view) Author: John J Lee (jjlee) Date: 2002-10-21 20:57
I just added support for HTML's META HTTP-EQUIV and
zero-time Refresh HTTP headers to my 'ClientCookie'
package (which exports essentially a clone of the
urllib2 interface that knows about cookies, making use
of urllib2 in the implementation).  I didn't make a
patch for urllib2 itself but it would be easy to do so.
I don't plan to do this immediately, but will
eventually (assuming Jeremy thinks it's advisable) -- I
just wanted to register this fact to prevent
duplication of effort.

[BTW, this version of ClientCookie isn't on my web page
yet -- my motherboard just died.]

I'm sure you know this already, but: HTTP-EQUIV is just
a way of putting headers in the HEAD section of an HTML
document; Refresh is a Netscape 1.1 header that
indicates that a browser should redirect after a
specified time.  Refresh headers with zero time act
like redirections.
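
To illustrate the HTTP-EQUIV idea described above, here is a minimal sketch that collects META HTTP-EQUIV pairs from a document. It is not the ClientCookie code: it assumes Python 3's `html.parser` module, and the names `HttpEquivParser` and `extract_http_equiv` are made up for this example.

```python
from html.parser import HTMLParser


class HttpEquivParser(HTMLParser):
    """Collect (name, value) pairs from <META HTTP-EQUIV=... CONTENT=...> tags."""

    def __init__(self):
        super().__init__()
        self.http_equiv = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names and attribute names for us;
        # attribute values keep their original case.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("http-equiv")
        content = attr_map.get("content")
        if name and content is not None:
            self.http_equiv.append((name, content))


def extract_http_equiv(html_text):
    """Return the list of pseudo-headers declared in the document (hypothetical helper)."""
    parser = HttpEquivParser()
    parser.feed(html_text)
    parser.close()
    return parser.http_equiv
```

A caller could then merge the returned pairs into the response headers or keep them separate, which is the open question in issue 2) below.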

The net result of the code I just wrote is that if you
urlopen a URL that points to an HTML document like
this:

<HTML><HEAD>
<META HTTP-EQUIV="Refresh" CONTENT="0; 
URL=http://acme.com/new_url.htm">
</HEAD></HTML>

you're automatically redirected to
"http://acme.com/new_url.htm".  Same thing happens if
the Refresh is in the HTTP headers, because all the
HTTP-EQUIV headers are treated like real HTTP headers.
Refresh with non-zero delay time is ignored (the
urlopen returns the document body unchanged and does
not redirect, but does still add the Refresh header to
the HTTP headers).
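
The zero-delay rule above can be sketched as follows. This is only an illustration under my own assumptions about the value syntax ("delay; URL=target", as in the example document); `parse_refresh` and `refresh_redirect_target` are hypothetical names, not part of urllib2 or ClientCookie, and the parsing is deliberately lenient.

```python
def parse_refresh(value):
    """Split a Refresh value into (delay_seconds, url_or_None).

    Returns None when the delay part is not a number.
    """
    delay_part, _, rest = value.partition(";")
    try:
        delay = float(delay_part.strip())
    except ValueError:
        return None
    rest = rest.strip()
    # Accept "URL=..." in any letter case, with optional spaces around "=".
    if rest[:3].lower() == "url":
        after = rest[3:].lstrip()
        if after.startswith("="):
            rest = after[1:].strip()
    url = rest.strip("\"'") or None
    return delay, url


def refresh_redirect_target(value):
    """Return the URL to follow for a zero-delay Refresh, otherwise None."""
    parsed = parse_refresh(value)
    if parsed is None:
        return None
    delay, url = parsed
    # Only a zero delay is treated like a redirection; anything else is ignored.
    return url if delay == 0 and url else None
```

With this split, a Refresh with a non-zero delay simply yields no redirect target, so the caller returns the document unchanged.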

A few issues:

0) AFAIK, the Refresh header is not specified in any
RFC, but only here:

http://wp.netscape.com/assist/net_sites/pushpull.html

(HTTP-EQUIV seems to be in the HTML 4.0 standard, maybe
earlier ones too)

1) Infinite loops should be detected, as for HTTP 30x?
   Presumably yes.

2) Should add HTTP-EQUIV headers to response object, or
   just treat them like headers internally?  Perhaps it
   should be possible to get both behaviours?

3) Bug in my implementation: it is greedy when reading
   body data from httplib's file object.


John
msg12890 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-10-23 13:54
In addition to the issues you have mentioned, there is also 
the backwards compatibility issue: Some applications might 
expect to get a meta-refresh document from urllib, then parse 
it and retry themselves. Those applications would break with 
such a change.
msg12891 - (view) Author: John J Lee (jjlee) Date: 2002-10-23 23:20
What do you think the solution to the backwards-
compatibility problem is?  Leave urllib2 as-is?  Add a
switch to turn it on?  Something else?

At the moment, I just deal with it in AbstractHTTPHandler.
It would be nice to treat it like the other redirections, by
writing a RefreshHandler -- this would solve the backwards-
compatibility issue.  However, OpenerDirector.error always
calls http_error_xxx ATM (where xxx is the HTTP error code),
so without changing that, I don't think a RefreshHandler is
really possible.  I suppose the sensible solution is just to
make a new HTTPHandler and HTTPSHandler?

Can you think of any way in which supporting HTTP-EQUIV
would mess up backwards compatibility, assuming the body is
unchanged but the headers do have the HTTP-EQUIV headers
added?


John
msg12892 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-10-26 13:30
I would try to subclass HTTPHandler, and then provide a
build_opener wrapper that installs this handler instead of
the normal http handler (the latter is optional, since the
user could just do build_opener(HTTPRefreshHandler)).
msg12893 - (view) Author: John J Lee (jjlee) Date: 2003-10-29 23:27
Just an update: 
 
- this could now be implemented as a handler (and already is, 
in my ClientCookie package) using RFE 759792, rather than 
having to be mixed in with HTTPHandler 
 
- the issues I listed in my initial comment, and the 
backwards-compatibility issue raised by MvL are now 
resolved 
 
- it needs reimplementing using HTMLParser (currently uses 
htmllib) if it's to go in the standard library; I plan to do this in 
time for 2.4 
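
For readers on Python 3, a handler of the kind described above might look roughly like the sketch below. It is not the ClientCookie implementation. It assumes `urllib.request` (the successor of urllib2) and its response-processing hook `http_response`; the class name `HTTPRefreshHandler` is borrowed from the earlier suggestion in this thread, and `_zero_delay_target` is a hypothetical helper. It only looks at real HTTP headers, not at META HTTP-EQUIV tags in the body, and it hands the redirect to the existing 302 machinery so that loop detection is reused.

```python
import urllib.request


class HTTPRefreshHandler(urllib.request.BaseHandler):
    """Treat a zero-delay Refresh response header like a 302 redirect (sketch)."""

    def http_response(self, request, response):
        headers = response.info()
        target = self._zero_delay_target(headers.get("Refresh"))
        if response.code != 200 or target is None:
            # Non-zero delay or no Refresh header: leave the response alone.
            return response
        # Reuse HTTPRedirectHandler (and its loop detection) by presenting
        # the refresh target as a Location header on a 302 error.
        del headers["Location"]
        headers["Location"] = target
        return self.parent.error(
            "http", request, response, 302, response.msg, headers)

    https_response = http_response

    @staticmethod
    def _zero_delay_target(value):
        """Return the URL of a 'delay; URL=target' value if delay is zero."""
        if value is None:
            return None
        delay_part, _, rest = value.partition(";")
        try:
            delay = float(delay_part.strip())
        except ValueError:
            return None
        rest = rest.strip()
        if rest[:4].lower() == "url=":
            rest = rest[4:].strip()
        url = rest.strip("\"'")
        return url if delay == 0 and url else None


# Opt-in usage, so existing callers keep the old behaviour:
# opener = urllib.request.build_opener(HTTPRefreshHandler)
# response = opener.open("http://acme.com/old_url.htm")
```

Because the handler is only installed through `build_opener`, applications that expect to receive the refresh document and parse it themselves are unaffected.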
msg12894 - (view) Author: John J Lee (jjlee) Date: 2006-02-01 20:31
Closing since I no longer intend to contribute this.

(I don't want to get involved with HTML parsing in the stdlib!)
History
Date                 User   Action  Args
2022-04-10 16:05:46  admin  set     github: 37353
2002-10-21 20:57:55  jjlee  create