This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: Use getaddrinfo() in urllib2.py for IPv6 support
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.2
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: Ramchandra Apte, dcantrell-rh, dmorr, facundobatista, jjlee, ned.deily, orsenthil
Priority: normal Keywords: patch

Created on 2007-03-07 05:19 by dcantrell-rh, last changed 2022-04-11 14:56 by admin.

Files
File name Uploaded Description Edit
urllib2-getaddrinfo.patch dcantrell-rh, 2007-03-07 05:19 Patch for Lib/urllib2.py replacing gethostbyname() calls with getaddrinfo() calls
urllib2-getaddrinfo.patch orsenthil, 2007-09-25 03:37
test_urllib2-getaddrinfo.patch orsenthil, 2007-09-25 03:38
Messages (12)
msg52082 - (view) Author: David Cantrell (dcantrell-rh) Date: 2007-03-07 05:19
A number of base Python modules use gethostbyname() when they should be using getaddrinfo().  The big limitation hit when using gethostbyname() is the lack of IPv6 support.

This first patch is for urllib2.py.  It replaces all uses of gethostbyname() with getaddrinfo().  getaddrinfo() returns a list of 5-tuples, so additional code needs to wrap each getaddrinfo() call that replaces a gethostbyname() call.  The result should still be pretty simple to read.

I'd like to see this patch added to the next stable release of Python, if at all possible.  I am working up patches for the other modules I see in the Lib/ subdirectory that could use getaddrinfo() instead of gethostbyname().
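
As a minimal sketch of the wrapping described above (not the code from the attached patch), the helper below unpacks each 5-tuple that getaddrinfo() returns and collects every address. The name resolve_all is hypothetical, and the sketch is written for Python 3 rather than the Python 2 urllib2 module the patch targets.

```python
import socket

def resolve_all(host, port=None):
    """Return every (family, address) pair getaddrinfo() reports for host.

    Hypothetical helper for illustration; not part of urllib2 or the patch.
    """
    results = []
    # Asking for SOCK_STREAM only keeps getaddrinfo() from repeating each
    # address once per socket type.
    for family, socktype, proto, canonname, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        # sockaddr is (address, port) for IPv4 and
        # (address, port, flowinfo, scope_id) for IPv6; the address string
        # comes first in both cases.
        pair = (family, sockaddr[0])
        if pair not in results:
            results.append(pair)
    return results
```

Unlike gethostbyname(), which returns a single IPv4 string, this returns a list, so a caller has to decide how to pick or iterate over the addresses.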
msg52083 - (view) Author: John J Lee (jjlee) Date: 2007-07-11 23:57
* Where are the tests?  A functional test, perhaps in test_urllib2net.py, for IPv6 support in urllib2 would be especially welcome, I think.
* Why does .check_host() not begin with an underscore?
* "check_host" is a poor name.  How about "_is_localhost"?
* locals is a built-in function, so it is usually considered good style not to use it as a variable name.
* Is it necessary to call make_host_tuple(searchlist) twice?
* The patch appears to fix several bugs at once (e.g. adding a try: / except: suite around a large part of an existing method to catch socket.error).
msg56125 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2007-09-25 03:37
Hi,
The attached patch required a complete rewrite. I am attaching the
modified patch, which just substitutes socket.gethostbyname with a
function gethost_addrinfo, which internally uses getaddrinfo and takes
care of translating IPv4 or IPv6 addresses.

jjlee, skip: let me know your comments on this.

One thing we have to keep in mind is testing with IPv6 addresses.
For example, /etc/hosts on my system contains:
10.98.1.6       goofy.goofy.com
#fe80::219:5bff:fefd:6270       localhost
127.0.0.1       localhost

test_urllib2 will PASS with the above.
But if I uncomment the IPv6 address, opening the local file fails. I am
not sure how local file access is done with IPv6, or whether urllib2
(the local file opening function) itself needs to be modified. I shall
check into that with the next version.
msg69209 - (view) Author: Facundo Batista (facundobatista) * (Python committer) Date: 2008-07-03 17:27
What I don't understand here is... if gethostbyname() lacks IPv6
support, instead of creating a new function, why not add the
functionality to that same function?

Right now gethostbyname() is implemented in C. What would be the
drawback of making it a Python function?
msg78716 - (view) Author: Derek Morr (dmorr) Date: 2009-01-01 18:49
Senthil,

I don't think your gethost_addrinfo() function will work. On a v6-
enabled machine, it will only return v6 or v4 names. Shouldn't it 
return both (since a machine could have both v4 and v6 addresses)? For 
example, on my machine, I have the following addresses for 
"localhost": ::1, fe80::1%lo0, 127.0.0.1.

Also, why is the AI_CANONNAME flag set? The canonname field isn't used. 
And you only appear to take the last IP address returned (sa[0]). 
Shouldn't you return all the addresses?
msg78717 - (view) Author: Derek Morr (dmorr) Date: 2009-01-01 18:52
Question: Why does FTPHandler.ftp_open() try to resolve the hostname? 
The hostname will be passed into connect_ftp(), then into 
urllib.ftpwrapper(), and eventually into ftplib.FTP.connect(), which is 
IPv6-aware.
msg78750 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2009-01-02 03:49
Derek, 

This patch was written along the lines that when an IPv6 address is
present, it returns the first address, which I assumed to be the active
address and which would make urllib2 work.

I am not sure whether returning all the addresses would help, and how
would we decide which address to use?

As for the AI_CANONNAME flag, I don't accurately remember the reason now.
But I had encountered issues when testing on IPv4 systems without it.

I now have a different opinion on this issue.

First, following Facundo's comment, there is the option of having this
functionality in gethostbyname() and implementing it in C.

Second, the wrapper function and a suitable approach need to be defined.

I am sorry, I fail to understand the question about why ftp_open does
hostname resolution. Do you mean that without it, if we pass the hostname
to ftplib.FTP.connect(), it would work for an IPv6 address?
msg78754 - (view) Author: Derek Morr (dmorr) Date: 2009-01-02 04:13
My understanding is that the FileHandler checks if the file:// URL 
contains the hostname or localhost IP of the local machine (isn't that 
what FileHandler.names is for?). So, shouldn't the following URLs all 
open the same file:

file:///foo.txt
file://localhost/foo.txt
file://127.0.0.1/foo.txt
file://[::1]/foo.txt

If that is the case, then doesn't FileHandler.names need to have all of 
those values in it?
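
One way to sketch the check described above, assuming Python 3: compare the addresses a URL's host maps to against a set of local addresses, so that localhost, 127.0.0.1 and [::1] are all treated alike. The names _addresses, local_addresses and is_local_host are hypothetical, and treating the two loopback literals as always local is an assumption of this sketch, not something FileHandler is known to do.

```python
import ipaddress
import socket

def _addresses(host):
    """Return the set of IP address strings that host maps to."""
    host = host.strip("[]")  # file://[::1]/ carries a bracketed IPv6 literal
    try:
        # IP literals need no lookup; normalise their spelling instead.
        return {str(ipaddress.ip_address(host))}
    except ValueError:
        pass  # not a literal, fall through to name resolution
    try:
        infos = socket.getaddrinfo(host, None, socket.AF_UNSPEC,
                                   socket.SOCK_STREAM)
    except socket.gaierror:
        return set()
    return {info[4][0] for info in infos}

def local_addresses():
    """Collect loopback addresses plus whatever this machine's names resolve to."""
    addrs = {"127.0.0.1", "::1"}  # assumption: loopback is always local
    for name in ("localhost", socket.gethostname()):
        addrs |= _addresses(name)
    return addrs

def is_local_host(host, local=None):
    """Return True if a file:// URL host refers to this machine."""
    if not host:  # file:///foo.txt has an empty host
        return True
    if local is None:
        local = local_addresses()
    return bool(_addresses(host) & local)
```

Comparing address sets rather than name strings means a host with several addresses matches if any one of them is local.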

I am a little confused by this though. It looks like 
FileHandler.file_open() checks if there is a hostname in the URL, and 
if so, uses FTPHandler instead. So why does FileHandler.open_local_file 
check the hostname value?

For your other points, gethostbyname() in libc can only handle IPv4 
addresses. The IETF defined the getaddrinfo() interface as an IP 
version neutral replacement. I would recommend using getaddrinfo().

Yes, FTPHandler creates an urllib.FTPWrapper object. That object calls 
into ftplib, which is already IPv6-capable. So, I don't think we need 
to do hostname resolution in FTPHandler.
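
To illustrate why a library that connects by hostname can be IP version neutral, here is a rough sketch of the "try each getaddrinfo() result in turn" pattern, assuming Python 3. It is similar in spirit to what socket.create_connection() does; connect_any is a hypothetical name and this is not the ftplib source.

```python
import socket

def connect_any(host, port, timeout=5.0):
    """Connect to the first address of host that accepts a TCP connection.

    Hypothetical helper for illustration only.
    """
    last_error = None
    for family, socktype, proto, _canonname, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        sock = None
        try:
            # Create a socket of whichever family this result uses.
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            # Remember the failure and move on to the next address.
            last_error = exc
            if sock is not None:
                sock.close()
    if last_error is None:
        raise OSError("getaddrinfo() returned no addresses for %r" % (host,))
    raise last_error
```

Because the caller passes the name straight through, no separate resolution step in the handler is needed for IPv6 hosts to work.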
msg78759 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2009-01-02 05:29
> I am a little confused by this though. It looks like
> FileHandler.file_open() checks if there is a hostname in the URL, and
> if so, uses FTPHandler instead. So why does FileHandler.open_local_file
> check the hostname value?

You are right. I had observed this too, but did not question it. Let
me try to look into the history to see why it is so. Perhaps it needs to
change.

> For your other points, gethostbyname() in libc can only handle IPv4
> addresses. The IETF defined the getaddrinfo() interface as an IP
> version neutral replacement. I would recommend using getaddrinfo().
> Yes, FTPHandler creates an urllib.FTPWrapper object. That object calls
> into ftplib, which is already IPv6-capable. So, I don't think we need
> to do hostname resolution in FTPHandler.

Thanks for the info. I shall look into both in the next revision of the patch:
1) using getaddrinfo() for an IP version neutral call.
2) passing the hostname directly to ftplib (I am not sure of the
consequences; I need to investigate).
msg84810 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2009-03-31 15:52
Note also Issue5625 - any work for IPv6 should keep in mind that local 
hosts may have more than one IP address.
msg116619 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-09-16 21:49
@Senthil should this be assigned to your good self?
msg165931 - (view) Author: Ramchandra Apte (Ramchandra Apte) * Date: 2012-07-20 14:31
Bump.
History
Date | User | Action | Args
2022-04-11 14:56:23 | admin | set | github: 44672
2012-07-20 14:31:47 | Ramchandra Apte | set | nosy: + Ramchandra Apte; messages: + msg165931
2010-11-22 11:13:08 | orsenthil | set | assignee: facundobatista -> orsenthil; nosy: - BreamoreBoy
2010-09-16 21:49:37 | BreamoreBoy | set | nosy: + BreamoreBoy; messages: + msg116619
2010-07-11 02:18:54 | terry.reedy | set | versions: + Python 3.2, - Python 2.6, Python 3.0
2010-05-20 20:28:41 | skip.montanaro | set | nosy: - skip.montanaro
2009-03-31 15:52:00 | ned.deily | set | nosy: + ned.deily; messages: + msg84810
2009-01-02 05:29:47 | orsenthil | set | messages: + msg78759
2009-01-02 04:13:47 | dmorr | set | messages: + msg78754
2009-01-02 03:49:47 | orsenthil | set | messages: + msg78750
2009-01-01 18:52:52 | dmorr | set | messages: + msg78717
2009-01-01 18:49:04 | dmorr | set | nosy: + dmorr; messages: + msg78716
2008-07-03 17:27:58 | facundobatista | set | assignee: facundobatista; messages: + msg69209; nosy: + facundobatista
2008-06-30 16:33:37 | orsenthil | set | type: enhancement
2007-11-23 08:29:27 | christian.heimes | set | versions: + Python 2.6, Python 3.0
2007-09-25 03:38:12 | orsenthil | set | files: + test_urllib2-getaddrinfo.patch
2007-09-25 03:37:43 | orsenthil | set | files: + urllib2-getaddrinfo.patch; nosy: + orsenthil; messages: + msg56125
2007-08-30 22:26:57 | skip.montanaro | set | nosy: + skip.montanaro
2007-03-07 05:19:06 | dcantrell-rh | create |