
Classification
Title: urllib2 blocked from news.google.com
Type:
Stage:
Components: Library (Lib)
Versions: Python 2.4
Process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: asyncster, brett.cannon, mwh
Priority: normal
Keywords:

Created on 2005-11-07 06:31 by asyncster, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (4)
msg26807 - Author: Michael Hoisie (asyncster) Date: 2005-11-07 06:31
It seems that Google is blocking requests from clients
that send Python-urllib/2.4 as the User-Agent. If you telnet to
news.google.com on port 80 and type:

GET / HTTP/1.1
Host: news.google.com
User-agent: Python-urllib/2.4

you get an HTTP/1.1 403 Forbidden response.
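The telnet check above can also be scripted. This is a minimal sketch, assuming Python 3 and direct network access to port 80. The function names (build_raw_request, parse_status_line, probe) are hypothetical. The sketch sends the same three request lines over a raw socket and parses only the status line of the reply. It shows how to repeat the experiment, not what news.google.com returns today.

```python
import socket


def build_raw_request(host, path="/", user_agent="Python-urllib/2.4"):
    # Same three lines typed into telnet, each ended by CRLF,
    # followed by the blank line that terminates the header block.
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"User-agent: {user_agent}",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(data):
    # b"HTTP/1.1 403 Forbidden\r\n..." -> (403, "Forbidden")
    first_line = data.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = first_line.split(" ", 2)
    reason = parts[2] if len(parts) > 2 else ""
    return int(parts[1]), reason


def probe(host, user_agent="Python-urllib/2.4", timeout=10.0):
    # Plain TCP connection to port 80, like `telnet host 80`.
    with socket.create_connection((host, 80), timeout=timeout) as sock:
        sock.sendall(build_raw_request(host, user_agent=user_agent))
        buf = b""
        # Read until the status line is complete; the rest is ignored.
        while b"\r\n" not in buf:
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
    return parse_status_line(buf)
```

Calling probe("news.google.com") performs the live request. Per this report it produced a 403 in 2005, but the server's current behaviour may differ.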
msg26808 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2005-11-07 07:21
Logged In: YES 
user_id=357491

I can verify this using urllib.urlretrieve() from the trunk.
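To see which User-Agent the standard library announces by default, the opener's header list can be inspected without making any request. This is a sketch assuming Python 3, where urllib2 became urllib.request. In Python 3, urlretrieve() goes through urlopen() and therefore through the same kind of opener. The exact strings sent by the Python 2.4 urllib and urllib2 modules are not shown here. The helper name default_user_agent is hypothetical.

```python
import sys
import urllib.request


def default_user_agent():
    # build_opener() returns an OpenerDirector whose addheaders list
    # holds the default ("User-agent", "Python-urllib/X.Y") pair.
    opener = urllib.request.build_opener()
    return dict(opener.addheaders).get("User-agent")


# X.Y is the running interpreter's major.minor version.
expected = "Python-urllib/%d.%d" % sys.version_info[:2]
print(default_user_agent(), default_user_agent() == expected)
```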
msg26809 - Author: Michael Hudson (mwh) (Python committer) Date: 2005-11-07 14:38
Logged In: YES 
user_id=6656

In what crazy universe is this a Python bug?  It's up to Google what they 
do with HTTP requests, surely.  If you are reasonably sure that your use 
does not violate the terms of use for Google News:

http://news.google.com/intl/en_us/terms_google_news.html

Then you can experiment with getting urllib to send a different User-Agent 
header. 
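The suggestion to send a different User-Agent can be sketched as follows, assuming Python 3's urllib.request and that your use complies with the terms linked above. CUSTOM_AGENT, make_request and make_opener are hypothetical names. Headers set on a Request take precedence because the HTTP handler only adds an opener-wide default header when the request does not already carry one.

```python
import urllib.request

# Hypothetical identifier; choose one that honestly describes your client.
CUSTOM_AGENT = "MyNewsReader/0.1"


def make_request(url, user_agent=CUSTOM_AGENT):
    # Per-request override. Request stores header names via
    # str.capitalize(), so "User-Agent" is kept as "User-agent".
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def make_opener(user_agent=CUSTOM_AGENT):
    # Opener-wide override: replace the default Python-urllib entry.
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-agent", user_agent)]
    return opener
```

urllib.request.urlopen(make_request(url)) changes a single request, while make_opener().open(url), or urllib.request.install_opener(make_opener()), changes every request made through that opener.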
msg26810 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2005-11-07 21:20
Logged In: YES 
user_id=357491

It isn't a Python bug, but then again it got my attention,
which means I can contact people within Google to see if
they can find out what happened.
History
Date                 User       Action  Args
2022-04-11 14:56:13  admin      set     github: 42560
2005-11-07 06:31:01  asyncster  create