Issue548176
Created on 2002-04-24 15:36 by msdemlei, last changed 2022-04-10 16:05 by admin. This issue is now closed.
Messages (8) | |||
---|---|---|---|
msg10499 - (view) | Author: Markus Demleitner (msdemlei) | Date: 2002-04-24 15:36 | |
The urlparse module (at least in 2.2 and 2.1, Linux) doesn't handle URLs of the form http://www.maerkischeallgemeine.de?loc_id=49 correctly -- everything up to the 9 ends up in the host. I didn't check the RFC, but in the real world URLs like this do show up. urlparse works fine when there's a trailing slash on the host name: http://www.maerkischeallgemeine.de/?loc_id=49

Example:
<pre>
>>> import urlparse
>>> urlparse.urlparse("http://www.maerkischeallgemeine.de/?loc_id=49")
('http', 'www.maerkischeallgemeine.de', '/', '', 'loc_id=49', '')
>>> urlparse.urlparse("http://www.maerkischeallgemeine.de?loc_id=49")
('http', 'www.maerkischeallgemeine.de?loc_id=49', '', '', '', '')
</pre>

This has serious implications for urllib, since urllib.urlopen will fail for URLs like the second one, and with a pretty mysterious exception ("host not found") at that. |
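For readers on Python 3, where this functionality lives in `urllib.parse` rather than the old `urlparse` module, the two URLs from the report can be compared with a short sketch like the one below. It assumes a current interpreter that includes the fix; the variable names are illustrative only.

```python
from urllib.parse import urlparse

# The two URLs from the report: with and without a slash before the query.
with_slash = urlparse("http://www.maerkischeallgemeine.de/?loc_id=49")
without_slash = urlparse("http://www.maerkischeallgemeine.de?loc_id=49")

# ParseResult is a 6-field named tuple:
# (scheme, netloc, path, params, query, fragment)
print(tuple(with_slash))
print(tuple(without_slash))

# On a fixed interpreter the host is the same in both cases and only the
# path differs ('/' versus the empty string).
print(with_slash.netloc == without_slash.netloc)
```

If the query string still shows up inside `netloc` for the second URL, the interpreter predates the fix described in this issue.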
|||
msg10500 - (view) | Author: Jeff Epler (jepler) | Date: 2002-11-17 16:56 | |
Logged In: YES user_id=2772

This actually appears to be permitted by RFC2396 [http://www.ietf.org/rfc/rfc2396.txt]. See section 3.2:

3.2. Authority Component

Many URI schemes include a top hierarchical element for a naming authority, such that the namespace defined by the remainder of the URI is governed by that authority. This authority component is typically defined by an Internet-based server or a scheme-specific registry of naming authorities.

authority = server | reg_name

The authority component is preceded by a double slash "//" and is terminated by the next slash "/", question-mark "?", or by the end of the URI. Within the authority component, the characters ";", ":", "@", "?", and "/" are reserved. |
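The termination rule quoted from section 3.2 can be sketched in a few lines. The helper below, `split_authority`, is a hypothetical name invented for illustration and is not the code of the urlparse module or of any patch; it only applies the quoted rule (stop at the first "/" or "?", else at the end of the string) and deliberately ignores fragments.

```python
from typing import Tuple


def split_authority(after_slashes: str) -> Tuple[str, str]:
    """Hypothetical helper: split the text that follows '//' into
    (authority, remainder) using the rule quoted from RFC 2396 section 3.2.
    """
    end = len(after_slashes)  # default: authority runs to the end of the URI
    for delimiter in "/?":
        pos = after_slashes.find(delimiter)
        if pos != -1:
            end = min(end, pos)  # keep the earliest delimiter seen so far
    return after_slashes[:end], after_slashes[end:]


print(split_authority("www.maerkischeallgemeine.de?loc_id=49"))
print(split_authority("www.maerkischeallgemeine.de/?loc_id=49"))
print(split_authority("www.maerkischeallgemeine.de"))
```

Under this rule the authority is the bare host name in all three cases, and the remainder starts with whichever delimiter ended it.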
|||
msg10501 - (view) | Author: Steven Taschuk (staschuk) | Date: 2003-03-30 20:19 | |
Logged In: YES user_id=666873

For comparison, RFC 1738 section 3.3:

An HTTP URL takes the form:

http://<host>:<port>/<path>?<searchpart>

[...] If neither <path> nor <searchpart> is present, the "/" may also be omitted.

... which does not outright say the '/' may *not* be omitted if <path> is absent but <searchpart> is present (though imho that's implied).

But even if the / may not be omitted in this case, ? is not allowed in the authority component under either RFC 2396 or RFC 1738, so urlparse should either treat it as a delimiter or reject the URL as malformed. The principle of being lenient in what you accept favours the former.

I've just submitted a patch (712317) for this. |
|||
msg10502 - (view) | Author: Mike Rovner (mrovner) | Date: 2004-01-27 01:13 | |
Logged In: YES user_id=162094

According to RFC2396 (ftp://ftp.isi.edu/in-notes/rfc2396.txt) absoluteURI (part 3 URI Syntactic Components) can be:

"""
<scheme>://<authority><path>?<query>

each of which, except <scheme>, may be absent from a particular URI.
"""

Later on (3.2):

"""
The authority component is preceded by a double slash "//" and is terminated by the next slash "/", question-mark "?", or by the end of the URI.
"""

So URL "http://server?query" is perfectly legal and shall be allowed and patch 712317 rejected. |
|||
msg10503 - (view) | Author: Johannes Gijsbers (jlgijsbers) * | Date: 2004-10-23 07:03 | |
Logged In: YES user_id=469548

Somehow I think I'm missing something. Please check my line of reasoning:

1. http://foo?bar=baz is a legal URL.
2. urlparse's 'Network location' should be the same as <authority> from rfc2396.
3. Inside <authority> an unescaped '?' is not allowed. Rather: <authority> is terminated by the '?'.
4. Currently the 'network location' for http://foo?bar=baz would be 'foo?bar=baz'.
5. If 'network location' should be the same as <authority>, it should also be terminated by the '?'.

So shouldn't urlparse.urlsplit('http://foo?bar=baz') return ('http', 'foo', '', '', 'bar=baz', ''), as patch 712317 implements? |
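The conclusion of this reasoning can be checked against the Python 3 successor of the module, `urllib.parse`, assuming an interpreter that includes the fix. One detail worth noting: the tuple quoted above has six fields, which is the shape `urlparse` returns, whereas `urlsplit` returns five fields because it has no separate `params` slot. The sketch below shows both.

```python
from urllib.parse import urlparse, urlsplit

url = "http://foo?bar=baz"

split = urlsplit(url)   # 5 fields: scheme, netloc, path, query, fragment
parsed = urlparse(url)  # 6 fields: adds 'params' between path and query

print(tuple(split))
print(tuple(parsed))

# Point 5 of the reasoning: the network location ends at the '?'.
print(split.netloc, split.query)
```

Both calls report `foo` as the network location and `bar=baz` as the query, which matches the behaviour argued for in the list.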
|||
msg10504 - (view) | Author: Mike Rovner (mrovner) | Date: 2004-10-23 07:44 | |
Logged In: YES user_id=162094 I'm sorry, I misunderstood the patch. If it accepts such a URL and splits it at '?', that's perfectly fine. It shall not reject such a URL as malformed. |
|||
msg10505 - (view) | Author: Paul Moore (paul.moore) * | Date: 2004-11-08 20:48 | |
Logged In: YES user_id=113328 This issue still exists in Python 2.3.4 and Python 2.4b2. |
|||
msg10506 - (view) | Author: Johannes Gijsbers (jlgijsbers) * | Date: 2005-01-09 15:33 | |
Logged In: YES user_id=469548 Fixed by applying patch #712317 on maint24 and HEAD. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-10 16:05:15 | admin | set | github: 36493 |
2002-04-24 15:36:23 | msdemlei | create |