Issue 1396543: urlparse is confused by /

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/42758

classification

Title:	urlparse is confused by /
Type:		Stage:
Components:	Library (Lib)	Versions:	Python 2.4

process

Status:	closed	Resolution:	wont fix
Dependencies:		Superseder:
Assigned To:		Nosy List:	georg.brandl, jjlee, johnhansen, pterk
Priority:	normal	Keywords:

Created on 2006-01-04 04:57 by johnhansen, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (8)
msg27231 - (view)	Author: John Hansen (johnhansen)	Date: 2006-01-04 04:57
If the parameter field of a URL contains a '/', urlparse does not enter date in the parameter field, but leaves it attached to the path. The simplified example is:

>>> urlparse.urlparse("http://f/adi;s=a;c=b/")
('http', 'f', '/adi;s=a;c=b/', '', '', '')
>>> urlparse.urlparse("http://f/adi;s=a;c=b")
('http', 'f', '/adi', 's=a;c=b', '', '')

The realworld case was:

>>> urlparse.urlparse("http://ad.doubleclick.net/adi/N3691.VibrantMedia/B1733031.2;sz=160x600;click=http%3A/adforce.adtech.de/adlink%7C82%7C59111%7C1%7C168%7CAdId%3D1023327%3BBnId%3D4%3Bitime%3D335264036%3Bku%3D12900%3Bkey%3Dcomputing%2Bbetanews%5Fgeneral%3Blink%3D")
('http', 'ad.doubleclick.net/adi/N3691.VibrantMedia/B1733031.2;sz=160x600;click=http%3A/adforce.adtech.de/adlink%7C82%7C59111%7C1%7C168%7CAdId%3D1023327%3BBnId%3D4%3Bitime%3D335264036%3Bku%3D12900%3Bkey%3Dcomputing%2Bbetanews%5Fgeneral%3Blink%3D', '', '', '')

What's odd is that the code specifically says to do this:

def _splitparams(url):
    if '/' in url:
        i = url.find(';', url.rfind('/'))
        if i < 0:
            return url, ''

Is there a reason for the rfind?
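For readers on Python 3, where the module lives at urllib.parse, the reported behaviour can be reproduced roughly as follows. This is a minimal sketch, assuming a current Python 3; split_last_segment_params is a hypothetical helper that mirrors the logic of the snippet quoted above and is not the stdlib source itself.

```python
from urllib.parse import urlparse


def split_last_segment_params(path):
    """Hypothetical helper mirroring the quoted _splitparams logic."""
    if '/' in path:
        # Only search for ';' at or after the last '/', so parameters are
        # recognised in the final path segment only.
        i = path.find(';', path.rfind('/'))
    else:
        i = path.find(';')
    if i < 0:
        return path, ''
    return path[:i], path[i + 1:]


with_slash = urlparse("http://f/adi;s=a;c=b/")
without_slash = urlparse("http://f/adi;s=a;c=b")

# Trailing '/': the last segment is empty, so nothing is split off.
print(with_slash.path, repr(with_slash.params))
# No trailing '/': the ';' sits in the last segment and is split off.
print(without_slash.path, repr(without_slash.params))

print(split_last_segment_params("/adi;s=a;c=b/"))
print(split_last_segment_params("/adi;s=a;c=b"))
```

The rfind restricts the search to the last path segment, which is why a '/' anywhere after the ';' keeps the whole string in the path.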
msg27232 - (view)	Author: John Hansen (johnhansen)	Date: 2006-01-04 05:00
The first line should have read: If the parameter field of a URL contains a '/', urlparse does not enter it into the parameter field, but leaves it attached to the path.
msg27233 - (view)	Author: John Hansen (johnhansen)	Date: 2006-01-04 16:31
The first line should have read: If the parameter field of a URL contains a '/', urlparse does not enter it into the parameter field, but leaves it attached to the path.
msg27234 - (view)	Author: Peter van Kampen (pterk)	Date: 2006-01-13 00:25
Looking at the test cases, it appears the answers must be in RFC 1808 or RFC 2396: http://www.ietf.org/rfc/rfc1808.txt and http://www.ietf.org/rfc/rfc2396.txt. See for example section 5.3 of 1808. I don't see why _splitparams does what it does, but I didn't exactly close-read the text either. Also be sure to look at Lib/test/test_urlparse.py.
msg27235 - (view)	Author: John Hansen (johnhansen)	Date: 2006-01-13 18:19
Well, RFC2396, section 3.4 says "/" is reserved within a query. However, the real world doesn't seem to follow RFC2396... so I still think it's a bug: the class should be useful, rather than try to enforce an RFC. A warning would be fine.
msg27236 - (view)	Author: Peter van Kampen (pterk)	Date: 2006-01-14 21:19
Actually section 3.3 of RFC2396 is relevant here, and it seems that it is indeed correctly implemented as is. I'm not sure what the 'python policy' is on RFC vs The Real World. My guess would be that RFCs carry some weight. Following the 'real world' is too vague a reference. Your world might be different than mine, and tomorrow's world a different world than today's. You can always monkey-patch:

>>> def my_splitparams(url):
...     i = url.find(';')
...     return url[:i], url[i+1:]
...
>>> import urlparse
>>> urlparse._splitparams = my_splitparams
>>> urlparse.urlparse("http://f/adi;s=a;c=b/")
('http', 'f', '/adi', 's=a;c=b/', '', '')
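The same result as the monkey-patch can be had without replacing a private function in the module. The following is a sketch only, assuming Python 3's urllib.parse; parse_with_greedy_params is a hypothetical name, and splitting at the first ';' is the reporter's preferred behaviour rather than what the RFC describes.

```python
from urllib.parse import urlsplit


def parse_with_greedy_params(url):
    """Hypothetical helper: treat everything after the first ';' in the
    path as parameters, regardless of any later '/'."""
    parts = urlsplit(url)  # urlsplit leaves ';' parameters inside the path
    # partition returns (path, '', '') when there is no ';' at all.
    path, _, params = parts.path.partition(';')
    return (parts.scheme, parts.netloc, path, params,
            parts.query, parts.fragment)


result = parse_with_greedy_params("http://f/adi;s=a;c=b/")
print(result)
```

Because nothing in the module is patched, other callers of the standard parser keep the documented behaviour.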
msg27237 - (view)	Author: John J Lee (jjlee)	Date: 2006-02-06 01:09
The urlparse.urlparse() code should not be changed, for backwards compatibility reasons. As the docs for module urlparse explain, you should instead use urlparse.urlsplit(), then another function to parse parameters (that other function is not supplied by the stdlib, IIRC). Also, note that RFC 3986 obsoletes RFC 2396 (see also RFC 3987).
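A rough sketch of what that "other function" might look like, assuming Python 3's urllib.parse and the per-segment parameter syntax of RFC 2396 section 3.3. The name segment_params and its return shape are assumptions made for illustration; nothing like it is claimed to exist in the stdlib.

```python
from urllib.parse import urlsplit


def segment_params(path):
    """Hypothetical helper: split a path into (segment, [params]) pairs,
    allowing ';'-separated parameters on every segment."""
    result = []
    for segment in path.split('/'):
        # The first piece is the segment name; the rest are its parameters.
        name, *params = segment.split(';')
        result.append((name, params))
    return result


parts = urlsplit("http://f/adi;s=a;c=b/")
segments = segment_params(parts.path)
print(segments)
```

Note that a literal '/' inside a parameter value still starts a new segment under this scheme, so URLs like the real-world example above would need the '/' percent-encoded to round-trip.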
msg27238 - (view)	Author: Georg Brandl (georg.brandl) *	Date: 2006-02-19 00:24
Closing this as Won't Fix, then. Backwards compatibility and RFC compliance are two important reasons. Creators of URLs like the ones shown above should be shot in the back anyway.

History
Date	User	Action	Args
2022-04-11 14:56:14	admin	set	github: 42758
2006-01-04 04:57:48	johnhansen	create