Issue 503031
Created on 2002-01-13 18:09 by sachmoz, last changed 2022-04-10 16:04 by admin. This issue is now closed.
Files
File name | Uploaded | Description
---|---|---
urllib.py | sachmoz, 2002-01-13 22:35 |
urllib.diff | gvanrossum, 2002-03-28 05:48 | Patch for getproxies_registry() [Guido]
Messages (25)
msg8731 | Author: Jason Cowley (sachmoz) | Date: 2002-01-13 18:09

While trying to use the urllib.py urlopen() function as follows:

    doc = urlopen("http://www.python.org").read()
    print doc

I was receiving the following traceback:

    Traceback (most recent call last):
      File "C:/Documents and Settings/Administrator/Desktop/jason/grabpage.py", line 3, in ?
        doc = urlopen("http://www.python.org").read()
      File "C:\Python22\lib\urllib.py", line 73, in urlopen
        return _urlopener.open(url)
      File "C:\Python22\lib\urllib.py", line 178, in open
        return getattr(self, name)(url)
      File "C:\Python22\lib\urllib.py", line 283, in open_http
        h = httplib.HTTP(host)
      File "C:\Python22\lib\httplib.py", line 688, in __init__
        self._setup(self._connection_class(host, port))
      File "C:\Python22\lib\httplib.py", line 343, in __init__
        self._set_hostport(host, port)
      File "C:\Python22\lib\httplib.py", line 349, in _set_hostport
        port = int(host[i+1:])
    ValueError: invalid literal for int():

I managed to track the problem down to the function open_http() in urllib.py. When the call

    httplib.HTTP(host)

is made, the variable 'host' contains the string 'http:' rather than 'www.python.org'. The code at line 272 of urllib.py should set 'host' to the value of 'realhost', but that assignment is never executed, because the function proxy_bypass() doesn't appear to do anything but return 0. I fixed it for my own purposes by adding the statement:

    host = realhost

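The ValueError at the bottom of the traceback can be reproduced in isolation. The sketch below is a hypothetical, simplified re-creation of the host/port split that httplib's _set_hostport() performs (the name set_hostport and the DEFAULT_HTTP_PORT constant are illustrative, not httplib's API). It shows why a host string of 'http:' ends up calling int('').

```python
DEFAULT_HTTP_PORT = 80  # assumption: plain HTTP default port


def set_hostport(host, port=None):
    """Hypothetical, simplified stand-in for httplib's _set_hostport().

    Splits "name:port" into (name, port) and falls back to port 80
    when no colon is present.
    """
    if port is None:
        i = host.find(':')
        if i >= 0:
            # For host == 'http:' the slice after the colon is '',
            # and int('') raises ValueError.
            port = int(host[i + 1:])
            host = host[:i]
        else:
            port = DEFAULT_HTTP_PORT
    return host, port


print(set_hostport('www.python.org'))                # ('www.python.org', 80)
print(set_hostport('www-cache.freeserve.com:8080'))  # ('www-cache.freeserve.com', 8080)
try:
    set_hostport('http:')
except ValueError as exc:
    print('ValueError:', exc)
```

The colon is taken to mean "a port follows", so anything that leaves a bare scheme such as 'http:' in the host slot fails this way rather than with a clearer error.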
msg8732 | Author: Guido van Rossum (gvanrossum) * | Date: 2002-01-13 21:33

Logged In: YES user_id=6380 I cannot reproduce this. What are your proxy settings?

msg8733 | Author: Jason Cowley (sachmoz) | Date: 2002-01-13 22:35

Logged In: YES user_id=426262 I am not using a proxy, but I have a dial-up connection to an ISP and I am using Windows 2000. The Python version info is:

    Python 2.2 (#28, Dec 21 2001, 12:21:22) [MSC 32 bit (Intel)] on win32

Here is the modification I made to urllib.py (lines 272-274):

        if proxy_bypass(realhost):
            host = realhost  # this line was not being executed
        host = realhost      # I added this to fix urlopen()

Without the line I added, the following statement, 9-10 lines further down, was executed with 'http:' as the value of host:

    h = httplib.HTTP(host)

This later caused the problem when _set_hostport() in httplib.py tries to convert an empty string to an int on line 349:

    port = int(host[i+1:])

I have attached my copy of urllib.py.

msg8734 | Author: Guido van Rossum (gvanrossum) * | Date: 2002-01-14 04:46

Logged In: YES user_id=6380 Hm, you can only ever end up in that code block if you have some kind of proxy settings active. On Windows, those are in the registry, even if you think they are not. Your fix is clearly not right, but in order to find out what is right, I need your proxy settings.

msg8735 | Author: Jason Cowley (sachmoz) | Date: 2002-01-14 13:04

Logged In: YES user_id=426262 I hope this is what you need:

    >>> print getproxies_environment()
    {}
    >>> print getproxies_registry()
    {'ftp': 'ftp://http://www-cache.freeserve.com:8080', 'http': 'http://http://www-cache.freeserve.com:8080'}
    >>>

msg8736 | Author: Guido van Rossum (gvanrossum) * | Date: 2002-01-15 17:46

Logged In: YES user_id=6380 If that's really what getproxies_registry() prints, then look again at the URL in the dict for key 'http'. It says

    'http://http://www-cache.freeserve.com:8080'

In other words, a double http:// prefix!!! If you fix the registry, the problem will go away. I don't think this is a problem with urllib.py.

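To see how the doubled prefix turns into the host 'http:' from the original traceback, the sketch below splits both proxy URLs the way urllib does before opening the connection: first strip the scheme, then take the "//host:port" part. split_type and split_host are hypothetical, simplified re-creations of Python 2's urllib.splittype() and urllib.splithost(), written out here so the example runs on its own; they are not the library functions.

```python
import re

# Hypothetical re-creations of Python 2's splittype()/splithost().
_TYPE_RE = re.compile(r'^([^/:]+):')
_HOST_RE = re.compile(r'^//([^/?]*)(.*)$', re.DOTALL)


def split_type(url):
    """'scheme:rest' -> ('scheme', 'rest'); otherwise (None, url)."""
    m = _TYPE_RE.match(url)
    if m:
        scheme = m.group(1)
        return scheme.lower(), url[len(scheme) + 1:]
    return None, url


def split_host(url):
    """'//host:port/path' -> ('host:port', '/path'); otherwise (None, url)."""
    m = _HOST_RE.match(url)
    if m:
        return m.group(1), m.group(2)
    return None, url


def proxy_host(proxy):
    """Return the 'host:port' piece that would be handed to the connection."""
    _, rest = split_type(proxy)
    host, _ = split_host(rest)
    return host


print(proxy_host('http://www-cache.freeserve.com:8080'))       # www-cache.freeserve.com:8080
print(proxy_host('http://http://www-cache.freeserve.com:8080'))  # http:
```

With the doubled prefix, the first split removes only one 'http:', and the host match stops at the next '/', leaving exactly 'http:' as the host.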
msg8737 | Author: Jason Cowley (sachmoz) | Date: 2002-01-15 18:17

Logged In: YES user_id=426262 The actual settings in the registry look slightly different:

    http=http://www-cache.freeserve.com:8080;ftp=http://www-cache.freeserve.com:8080

Notice the '=' signs. These settings were set automatically by Freeserve, so there are perhaps millions of people in the UK with the same registry settings (and therefore the same problem). I have mailed Freeserve to ask them to confirm whether the settings are correct.

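This registry value explains the output in msg8735. The sketch below feeds the string straight into the per-protocol loop of getproxies_registry() that Guido quotes in msg8740. Reading the value from the registry is left out, and the function name parse_proxy_server_original is hypothetical. Because the loop always prepends '<protocol>://', an address that already starts with 'http://' gets a second scheme.

```python
def parse_proxy_server_original(proxy_server):
    """Hypothetical wrapper around the per-protocol loop quoted in msg8740.

    proxy_server is a 'proto=address;proto=address' string, as stored
    in the registry. The loop always prepends '<protocol>://'.
    """
    proxies = {}
    for p in proxy_server.split(';'):
        protocol, address = p.split('=', 1)
        proxies[protocol] = '%s://%s' % (protocol, address)
    return proxies


value = ('http=http://www-cache.freeserve.com:8080;'
         'ftp=http://www-cache.freeserve.com:8080')
for protocol, proxy in sorted(parse_proxy_server_original(value).items()):
    print(protocol, proxy)
# ftp ftp://http://www-cache.freeserve.com:8080
# http http://http://www-cache.freeserve.com:8080
```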
msg8738 | Author: Guido van Rossum (gvanrossum) * | Date: 2002-01-15 19:24

Logged In: YES user_id=6380 I'm assigning this to Mark Hammond, who knows more about the Windows registry. Could there be a bug in the function getproxies_registry()? See the last two posts from sachmoz; ignore the original problem description.

msg8739 | Author: Thomas Heller (theller) * | Date: 2002-01-15 20:07

Logged In: YES user_id=11105 It seems sachmoz's registry settings are valid for IE. I checked this by changing my own settings from

    ftp=192.168.0.15:3128;http=192.168.0.13:3128

to

    ftp=http://192.168.0.15:3128;http=http://192.168.0.13:3128

IE works both before and after this change. Here's the only article I found on MSDN showing an example:

http://support.microsoft.com/default.aspx?scid=kb;EN-US;q164035

msg8740 | Author: Guido van Rossum (gvanrossum) * | Date: 2002-01-15 20:18

Logged In: YES user_id=6380 Looks like this code block in getproxies_registry() is broken, then:

        # Per-protocol settings
        for p in proxyServer.split(';'):
            protocol, address = p.split('=', 1)
            proxies[protocol] = '%s://%s' % (protocol, address)

It should only add the <protocol>:// prefix if one isn't already there.

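Here is a sketch of the change Guido describes: add the '<protocol>://' prefix only when the address doesn't already carry a scheme. The scheme test is the regular expression that appears later in this thread (msg8753). The function name parse_proxy_server is hypothetical, and the registry read is again replaced by a plain string argument.

```python
import re

# A scheme is "something without '/' or ':'" followed by '://'.
_HAS_SCHEME = re.compile(r'^([^/:]+)://')


def parse_proxy_server(proxy_server):
    """Hypothetical sketch: per-protocol parsing that keeps an existing scheme."""
    proxies = {}
    for p in proxy_server.split(';'):
        protocol, address = p.split('=', 1)
        if not _HAS_SCHEME.match(address):
            # No scheme given: fall back to the key's own protocol.
            address = '%s://%s' % (protocol, address)
        proxies[protocol] = address
    return proxies


# Both registry forms from this thread now give a single scheme.
print(parse_proxy_server('http=http://www-cache.freeserve.com:8080'))
print(parse_proxy_server('ftp=192.168.0.15:3128;http=192.168.0.13:3128'))
```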
msg8741 | Author: Thomas Heller (theller) * | Date: 2002-01-15 20:58

Logged In: YES user_id=11105 Isn't the correct setting for an ftp proxy "http://192.168.0.15:3128" rather than "ftp://192.168.0.15:3128"? At least in Python 2.1, only the former works for me. In Python 2.2 neither does, but maybe that's a different issue.

msg8742 | Author: Thomas Heller (theller) * | Date: 2002-01-16 13:03

Logged In: YES user_id=11105 Here's a quote from the Microsoft documentation (Windows 2000 Server Resource Kit). I have not found it online, but it's in my local MSDN Library (April 2001), under MSDN Library -> Resource Kits -> Windows 2000 Server Resource Kit -> Internet Explorer 5 Resource Kit -> Part 3: Customizing -> Chapter 13: Setting up Servers -> Working with Proxy Servers:

<quote>
Proxy locations that do not begin with a protocol (such as http:// or ftp://) are assumed to be a CERN-type HTTP proxy. For example, when the user types proxy, it's treated the same as if the user typed http://proxy. For FTP gateways, such as the TIS FTP gateway, the proxy should be listed with the ftp:// in front of the proxy name. For example, an FTP gateway for an FTP proxy would have this format: ftp://ftpproxy

When you enter proxy settings, use the following syntax, where <address> is the Web address of the proxy server and <port> is the port number assigned to the proxy server:

    http://<address>:<port>

For example, if the address of the proxy server is proxy.example.microsoft.com and the port number is 80, the setting in the Proxy Server box for LAN settings in the Proxy Settings dialog box or the Proxy Settings screen of the Customization wizard should read as follows:

    http://proxy.example.microsoft.com:80
</quote>

msg8743 | Author: Jason Cowley (sachmoz) | Date: 2002-01-16 15:37

Logged In: YES user_id=426262 I have found the document that theller referred to online. The URL is:

http://www.microsoft.com/WINDOWS2000/techinfo/reskit/en/ierk/Ch13_d.htm

or alternatively:

http://www.microsoft.com/windows2000/techinfo/reskit/en-us/default.asp?url=/WINDOWS2000/techinfo/reskit/en-us/ierk/Ch13_d.asp

The registry entries I posted earlier appear to be set by Windows 2000/IE6. In IE, click Tools | Internet Options | Connections. In the 'Dial-up and Virtual Private Network settings' section, click the Settings... button, then the Advanced... button under Proxy server. You will see a list of proxy servers for different protocols. If I add a fake proxy server for Gopher, such as http://www-cache.sachmoz.com with port 8080, the registry key data changes to:

    ftp=http://www-cache.freeserve.com:8080;gopher=http://www-cache.sachmoz.com:8080;http=http://www-cache.freeserve.com:8080

msg8744 | Author: Thomas Heller (theller) * | Date: 2002-01-16 17:31

Logged In: YES user_id=11105 My conclusion from all this is that this code:

        # Per-protocol settings
        for p in proxyServer.split(';'):
            protocol, address = p.split('=', 1)
            proxies[protocol] = '%s://%s' % (protocol, address)

should add an "http://" prefix if one isn't already there.

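Thomas's variant follows the Resource Kit rule quoted in msg8742: an address with no scheme is a CERN-type HTTP proxy, so the default prefix is always "http://" and never the key's own protocol. The sketch below is hypothetical (the name parse_proxy_server_http_default is illustrative) and reuses the scheme regex from msg8753. It differs from Guido's "<protocol>://" rule only for entries that have no scheme and are not HTTP, such as a bare ftp= entry.

```python
import re

_HAS_SCHEME = re.compile(r'^([^/:]+)://')


def parse_proxy_server_http_default(proxy_server):
    """Hypothetical sketch: default every scheme-less proxy to http://."""
    proxies = {}
    for p in proxy_server.split(';'):
        protocol, address = p.split('=', 1)
        if not _HAS_SCHEME.match(address):
            # Per the IE Resource Kit text: no scheme means an HTTP proxy.
            address = 'http://' + address
        proxies[protocol] = address
    return proxies


print(parse_proxy_server_http_default('ftp=192.168.0.15:3128;http=192.168.0.13:3128'))
print(parse_proxy_server_http_default('ftp=ftp://ftpproxy;http=proxy:80'))
```

An explicit ftp:// (the TIS-style FTP gateway case from the quote) is kept as written. Only the bare ftp= entry comes out differently from the <protocol>:// variant.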
msg8745 | Author: Mark Hammond (mhammond) * | Date: 2002-03-28 03:43

Logged In: YES user_id=14198 Cannot reproduce this:

    >>> import urllib
    >>> doc = urllib.urlopen("http://www.python.org").read()
    >>> len(doc)
    11851
    >>>

msg8746 | Author: Guido van Rossum (gvanrossum) * | Date: 2002-03-28 05:48

Logged In: YES user_id=6380 Not so quick, Mark. Are your proxy settings the same as his? I've uploaded a proposed fix that does what Thomas Heller recommends (urllib.diff). Thomas or sachmoz, can you verify that it works?

msg8747 | Author: Guido van Rossum (gvanrossum) * | Date: 2002-03-31 23:40

Logged In: YES user_id=6380 I've checked my bugfix into CVS as urllib.py revision 1.140. Jason, or anyone else who is experiencing this problem, can you please check that this fix (also in the attachment "urllib.diff") solves the problem for you? (Hm, this should be a 2.2.1 bugfix candidate too.)

msg8748 | Author: Michael Hudson (mwh) | Date: 2002-04-01 09:37

Logged In: YES user_id=6656 Do you want me to wait for confirmation before sticking this in release22-maint?

msg8749 | Author: Guido van Rossum (gvanrossum) * | Date: 2002-04-01 13:01

Logged In: YES user_id=6380 Nah, just stick it in. Even if it doesn't work, it's harmless.

msg8750 | Author: Jason Cowley (sachmoz) | Date: 2002-04-01 13:03

Logged In: YES user_id=426262 It is difficult for me to test this because I am no longer using the same system, but I made a special journey this morning to test the bug fix. Unfortunately, the fix exposed what appears to be another error. Here is the traceback:

    >>> webpage = urllib.urlopen('http://www.python.org').read()
    Traceback (most recent call last):
      File "<pyshell#7>", line 1, in ?
        webpage = urllib.urlopen('http://www.python.org').read()
      File "C:\Python22\lib\urllib.py", line 73, in urlopen
        return _urlopener.open(url)
      File "C:\Python22\lib\urllib.py", line 166, in open
        name = 'open_' + urltype
    TypeError: cannot concatenate 'str' and 'NoneType' objects

I have attempted to investigate this. The error occurs in the open() method. On this system the value of urltype is initially "http", and proxy is "//www-cache.freeserve.com:8080". A split is then performed on proxy to split off the urltype, but the urltype has already been removed, so urltype is assigned None, causing the TypeError. FYI, self.proxies on this system is:

    {'ftp': '//www-cache.freeserve.com:8080', 'http': '//www-cache.freeserve.com:8080'}

I got it to work by removing the line:

    urltype, proxyhost = splittype(proxy)

and changing the following line to:

    host.selector = splithost(proxy)

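Jason's diagnosis can be checked with a small model of the proxy dispatch in URLopener.open(). choose_handler below is a hypothetical, simplified sketch: it keeps only the lines relevant here (look up a proxy for the URL's scheme, split the proxy's own scheme off, build the handler name). split_type and split_host are simplified re-creations of urllib's splittype() and splithost(), not the originals. A proxy value without a scheme makes split_type() return None, and 'open_' + None raises TypeError.

```python
import re

_TYPE_RE = re.compile(r'^([^/:]+):')
_HOST_RE = re.compile(r'^//([^/?]*)(.*)$', re.DOTALL)


def split_type(url):
    m = _TYPE_RE.match(url)
    if m:
        scheme = m.group(1)
        return scheme.lower(), url[len(scheme) + 1:]
    return None, url


def split_host(url):
    m = _HOST_RE.match(url)
    if m:
        return m.group(1), m.group(2)
    return None, url


def choose_handler(fullurl, proxies):
    """Hypothetical sketch of the proxy dispatch in URLopener.open()."""
    urltype, url = split_type(fullurl)
    if urltype in proxies:
        proxy = proxies[urltype]
        # The proxy value is expected to carry its own scheme ('http://...').
        urltype, proxyhost = split_type(proxy)
        host, _ = split_host(proxyhost)
        # In this sketch a (proxy host, full URL) tuple marks the proxied case.
        url = (host, fullurl)
    name = 'open_' + urltype  # TypeError if urltype is None
    return name, url


print(choose_handler('http://www.python.org',
                     {'http': 'http://www-cache.freeserve.com:8080'}))
try:
    choose_handler('http://www.python.org',
                   {'http': '//www-cache.freeserve.com:8080'})
except TypeError as exc:
    print('TypeError:', exc)
```

This also matches Guido's point in msg8734: the proxied branch, and therefore the proxy_bypass() code in open_http(), is only reached when the proxies dict has an entry for the URL's scheme.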
msg8751 | Author: Jason Cowley (sachmoz) | Date: 2002-04-01 13:07

Logged In: YES user_id=426262 Sorry, the last line of that post should read:

    host, selector = splithost(proxy)

msg8752 | Author: Guido van Rossum (gvanrossum) * | Date: 2002-04-01 13:34

Logged In: YES user_id=6380 Michael, obviously it's better to hold off. :-( I apologize for the wasted trip, Jason. I think I know where I messed up, but it'll have to wait until after breakfast (my family are calling me :-).

msg8753 | Author: Guido van Rossum (gvanrossum) * | Date: 2002-04-02 03:24

Logged In: YES user_id=6380 The code I added to getproxies_registry() was wrong. Can you try this instead? Manually patch urllib.py to replace the two lines

    type, address = splittype(address)
    if not type:

with these (indented the same):

    import re
    if not re.match('^([^/:]+)://', address):

This will ensure that the proxies dict looks as it should. In your case it would be

    {'ftp': 'http://www-cache.freeserve.com:8080', 'http': 'http://www-cache.freeserve.com:8080'}

I'll see if I can test this myself when I have access to a working Windows machine tomorrow. Michael: when I check that in, you'll know it's good for 2.2.1. :-)

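The sketch below shows why the first patch stored '//www-cache.freeserve.com:8080' (the value Jason saw in msg8750). splittype() both tests for a scheme and strips it, so when a scheme is present, the scheme-less remainder is what gets stored. The thread quotes only the two replaced lines. The body of the `if` (re-adding '<protocol>://') is an assumption here, and normalize_first_patch / normalize_re_patch are hypothetical names. split_type is a simplified re-creation of splittype().

```python
import re

_TYPE_RE = re.compile(r'^([^/:]+):')
_HAS_SCHEME = re.compile(r'^([^/:]+)://')


def split_type(url):
    m = _TYPE_RE.match(url)
    if m:
        scheme = m.group(1)
        return scheme.lower(), url[len(scheme) + 1:]
    return None, url


def normalize_first_patch(protocol, address):
    # Reconstruction of the first patch (if-body assumed): the split
    # also rebinds `address`, dropping a scheme that was already there.
    type_, address = split_type(address)
    if not type_:
        address = '%s://%s' % (protocol, address)
    return address


def normalize_re_patch(protocol, address):
    # msg8753 version: the regex only tests; `address` is left untouched.
    if not _HAS_SCHEME.match(address):
        address = '%s://%s' % (protocol, address)
    return address


addr = 'http://www-cache.freeserve.com:8080'
print(normalize_first_patch('http', addr))  # //www-cache.freeserve.com:8080
print(normalize_re_patch('http', addr))     # http://www-cache.freeserve.com:8080
```

A bare host:port such as 192.168.0.15:3128 also looks like "scheme:rest" to splittype(), which would treat '192.168.0.15' as the scheme. Requiring '://' in the regex avoids that misreading as well.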
msg8754 | Author: Jason Cowley (sachmoz) | Date: 2002-04-02 10:21

Logged In: YES user_id=426262 I have tested the 're' version on the system that had the problem (Windows 2000), and it works fine. Thanks, everyone.

msg8755 | Author: Guido van Rossum (gvanrossum) * | Date: 2002-04-02 14:39

Logged In: YES user_id=6380 Thanks! Checked in. Take it away, Michael! :-)
History
Date | User | Action | Args
---|---|---|---
2022-04-10 16:04:52 | admin | set | github: 35912
2002-01-13 18:09:35 | sachmoz | create |