classification
Title: urlparse is confused by /
Type: Stage:
Components: Library (Lib) Versions: Python 2.4
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: georg.brandl, jjlee, johnhansen, pterk
Priority: normal Keywords:

Created on 2006-01-04 04:57 by johnhansen, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (8)
msg27231 - (view) Author: John Hansen (johnhansen) Date: 2006-01-04 04:57
If the parameter field of a URL contains a '/', urlparse does not enter date 
in the parameter field, but leaves it attached to the path.

The simplified example is:
>>> urlparse.urlparse("http://f/adi;s=a;c=b/")
('http', 'f', '/adi;s=a;c=b/', '', '', '')

>>> urlparse.urlparse("http://f/adi;s=a;c=b")
('http', 'f', '/adi', 's=a;c=b', '', '')

The realworld case was:

>>> urlparse.urlparse("http://ad.doubleclick.net/adi/
N3691.VibrantMedia/B1733031.2;sz=160x600;click=http%3A/
adforce.adtech.de/adlink%7C82%7C59111%7C1%7C168%7CAdId%
3D1023327%3BBnId%3D4%3Bitime%3D335264036%3Bku%3D12900%
3Bkey%3Dcomputing%2Bbetanews%5Fgeneral%3Blink%3D")
('http', 'ad.doubleclick.net', '/adi/N3691.VibrantMedia/
B1733031.2;sz=160x600;click=http%3A/adforce.adtech.de/adlink%
7C82%7C59111%7C1%7C168%7CAdId%3D1023327%3BBnId%3D4%3Bitime
%3D335264036%3Bku%3D12900%3Bkey%3Dcomputing%2Bbetanews%
5Fgeneral%3Blink%3D', '', '', '')

What's odd is that the code specifically says to do this:
def _splitparams(url):
    if '/' in url:
        # search for ';' only at or after the last '/'
        i = url.find(';', url.rfind('/'))
        if i < 0:
            return url, ''
    else:
        i = url.find(';')
    return url[:i], url[i+1:]

Is there a reason for the rfind?
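
For readers following along, the effect of the rfind can be reproduced outside the stdlib. The sketch below is a stand-alone version of the same rule (only the last '/'-separated segment of the path is searched for ';'), written for Python 3. The name split_last_segment_params is made up for illustration and is not part of urlparse; unlike the stdlib helper, it also handles a path with no ';' at all.

```python
def split_last_segment_params(path):
    """Split ';params' off the last '/'-separated segment of a URL path.

    Hypothetical helper for illustration only; it mirrors the rfind
    logic discussed in this report but is not the stdlib function.
    """
    if '/' in path:
        # Start looking for ';' at the last '/', so a ';' in any
        # earlier segment is left as part of the path.
        i = path.find(';', path.rfind('/'))
    else:
        i = path.find(';')
    if i < 0:
        return path, ''
    return path[:i], path[i + 1:]


# Trailing '/': the last segment is empty, so nothing is split off.
print(split_last_segment_params('/adi;s=a;c=b/'))   # ('/adi;s=a;c=b/', '')
# No trailing '/': the ';' is in the last segment and is split off.
print(split_last_segment_params('/adi;s=a;c=b'))    # ('/adi', 's=a;c=b')
# Parameters on an earlier segment stay in the path.
print(split_last_segment_params('/a;x=1/b;y=2'))    # ('/a;x=1/b', 'y=2')
```

Under this rule the trailing '/' in the simplified example starts a new, empty last segment, which is why the parameter field comes back empty there.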
msg27232 - (view) Author: John Hansen (johnhansen) Date: 2006-01-04 05:00
The first line should have read:

If the parameter field of a URL contains a '/', urlparse does not enter it 
into the parameter field, but leaves it attached to the path.
msg27233 - (view) Author: John Hansen (johnhansen) Date: 2006-01-04 16:31
The first line should have read:

If the parameter field of a URL contains a '/', urlparse does not enter it 
into the parameter field, but leaves it attached to the path.
msg27234 - (view) Author: Peter van Kampen (pterk) Date: 2006-01-13 00:25
Looking at the test cases, it appears the answers must be in
RFC 1808 or RFC 2396: http://www.ietf.org/rfc/rfc1808.txt and
http://www.ietf.org/rfc/rfc2396.txt. See for example section
5.3 of RFC 1808. I don't see why _splitparams does what it does,
but I didn't exactly close-read the text either. Also be
sure to look at Lib/test/test_urlparse.py.

msg27235 - (view) Author: John Hansen (johnhansen) Date: 2006-01-13 18:19
Well, RFC 2396, section 3.4 says "/" is reserved within a query. However, the real
world doesn't seem to follow RFC 2396, so I still think it's a bug: the module
should be useful, rather than try to enforce an RFC. A warning would be fine.
msg27236 - (view) Author: Peter van Kampen (pterk) Date: 2006-01-14 21:19
Actually, section 3.3 of RFC 2396 is the relevant one here, and it
seems that it is indeed correctly implemented as is.

I'm not sure what the 'Python policy' is on RFCs vs. The Real
World. My guess would be that RFCs carry some weight.
Following the 'real world' is too vague a reference. Your
world might be different from mine, and tomorrow's world
different from today's.

You can always monkey-patch:

>>> def my_splitparams(url):
...     i = url.find(';')
...     if i < 0:
...         return url, ''
...     return url[:i], url[i+1:]
...
>>> import urlparse
>>> urlparse._splitparams = my_splitparams
>>> urlparse.urlparse("http://f/adi;s=a;c=b/")
('http', 'f', '/adi', 's=a;c=b/', '', '')
msg27237 - (view) Author: John J Lee (jjlee) Date: 2006-02-06 01:09
The urlparse.urlparse() code should not be changed, for
backwards compatibility reasons.

As the docs for module urlparse explain, you should instead
use urlparse.urlsplit(), then another function to parse
parameters (that other function is not supplied by the
stdlib, IIRC).

Also, note that RFC 3986 obsoletes RFC 2396 (see also RFC
3987).
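
The urlsplit() route suggested above can be sketched as follows. This is a minimal illustration, assuming Python 3, where the urlparse module lives on as urllib.parse (on Python 2 the import would be "from urlparse import urlsplit"). The helper split_path_params is hypothetical, not a stdlib function, and it simply treats everything after the first ';' in the path as parameters, which is the behaviour the original report asked for rather than what the RFCs describe.

```python
from urllib.parse import urlsplit  # Python 3 name of the old urlparse module


def split_path_params(path):
    """Hypothetical helper: everything after the first ';' is 'params'."""
    head, _sep, params = path.partition(';')
    return head, params


# urlsplit() leaves any ';params' attached to the path untouched.
parts = urlsplit("http://f/adi;s=a;c=b/")
path, params = split_path_params(parts.path)

print(parts.path)   # /adi;s=a;c=b/
print(path)         # /adi
print(params)       # s=a;c=b/
```

Keeping the splitting rule in a caller-side helper leaves urlparse() itself unchanged, so existing callers are unaffected.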
msg27238 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2006-02-19 00:24
Closing this as Won't Fix, then. Backwards compatibility and
RFC compliance are two important reasons.

Creators of URLs like the ones shown above should be shot in the back
anyway.
History
Date User Action Args
2022-04-11 14:56:14 admin set github: 42758
2006-01-04 04:57:48 johnhansen create