classification
Title: webchecker/urllib chokes on 404 pages
Type:
Stage:
Components: Demos and Tools
Versions: Python 2.5

process
Status: closed
Resolution: duplicate
Dependencies:
Superseder:
Assigned To:
Nosy List: effbot, georg.brandl
Priority: high
Keywords:

Created on 2006-12-10 19:35 by effbot, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (2)
msg30778 - Author: Fredrik Lundh (effbot) (Python committer) Date: 2006-12-10 19:35
platform: standard Python 2.5 on Windows XP.

webchecker chokes on response code 404, which is a bit unfortunate...

the error occurs deep down in urllib, but a plain urllib request to the same page doesn't result in the same error, so it's probably related to how webchecker is using the library.

here's an example:

C:\Python25\Tools\webchecker> python webchecker.py http://www.python.org/foo

webchecker version 50851

Round 1 (1 total, 1 to do, 0 done, 0 bad)

No need to save checkpoint
Traceback (most recent call last):
  File "webchecker.py", line 892, in <module>
    main()
  File "webchecker.py", line 222, in main
    c.run()
  File "webchecker.py", line 349, in run
    self.dopage(url)
  File "webchecker.py", line 404, in dopage
    page = self.getpage(url_pair)
  File "webchecker.py", line 509, in getpage
    text, nurl = self.readhtml(url_pair)
  File "webchecker.py", line 523, in readhtml
    f, url = self.openhtml(url_pair)
  File "webchecker.py", line 531, in openhtml
    f = self.openpage(url_pair)
  File "webchecker.py", line 543, in openpage
    return self.urlopener.open(url)
  File "c:\python25\lib\urllib.py", line 190, in open
    return getattr(self, name)(url)
  File "c:\python25\lib\urllib.py", line 334, in open_http
    return self.http_error(url, fp, errcode, errmsg, headers)
  File "c:\python25\lib\urllib.py", line 351, in http_error
    return self.http_error_default(url, fp, errcode, errmsg, headers)
  File "c:\python25\lib\urllib.py", line 357, in http_error_default
    raise IOError, ('http error', errcode, errmsg, headers)
TypeError: EnvironmentError expected at most 3 arguments, got 4

running the same test under Python 2.4 works fine:

C:\python24\Tools\webchecker>python webchecker.py http://www.python.org/foo
webchecker version 36560

Round 1 (1 total, 1 to do, 0 done, 0 bad)

Error ('http error', 404, 'Not Found')
 HREF  http://www.python.org/foo
  from <root>

Final Report (1 total, 0 to do, 1 done, 1 bad)

Error Report:

Error in <root>
  HREF http://www.python.org/foo
    msg ('http error', 404, 'Not Found')

Saving checkpoint to @webchecker.pickle ...
Done.
Use ``webchecker.py -R'' to restart.
msg30779 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2006-12-11 10:25
This is a known issue (another relic of the exception object rewrite) and was fixed in response to bug #1566800.
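
For illustration only: the traceback in msg30778 shows urllib raising IOError with four positional values, which the Python 2.5 EnvironmentError constructor rejected. The sketch below, written in Python 3 syntax, shows one way to avoid depending on how many arguments the base exception accepts, by storing the HTTP details as attributes of a subclass. The names HTTPStatusError, http_error_default (as a free function) and check are hypothetical; they are not taken from urllib or webchecker, and this is not the fix applied for bug #1566800.

```python
class HTTPStatusError(IOError):
    """Hypothetical IOError subclass that carries HTTP error details."""

    def __init__(self, errcode, errmsg, headers):
        # Hand the base class a single formatted message, so the base
        # constructor's argument limit never comes into play.
        super().__init__("http error %s: %s" % (errcode, errmsg))
        # Keep the individual values as attributes for callers.
        self.errcode = errcode
        self.errmsg = errmsg
        self.headers = headers


def http_error_default(url, errcode, errmsg, headers):
    """Hypothetical stand-in for a default HTTP error handler."""
    raise HTTPStatusError(errcode, errmsg, headers)


def check(url):
    """Return an error tuple shaped like the Python 2.4 webchecker report."""
    try:
        # Simulate a 404 response; no network access is performed.
        http_error_default(url, 404, "Not Found", {"Content-Type": "text/html"})
    except IOError as exc:
        return ("http error", exc.errcode, exc.errmsg)
    return None


if __name__ == "__main__":
    print(check("http://www.python.org/foo"))
```

Callers that catch IOError still see the error under this approach, and they read the status code and message by attribute name instead of unpacking a fixed-length argument tuple.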
History
Date                 User    Action  Args
2022-04-11 14:56:21  admin   set     github: 44323
2006-12-10 19:35:39  effbot  create