Issue 450225
This issue tracker has been migrated to GitHub and is currently read-only. For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2001-08-12 05:10 by aaronsw, last changed 2022-04-10 16:04 by admin. This issue is now closed.
Files

| File name | Uploaded | Description |
|---|---|---|
| uritests.py | aaronsw, 2001-11-05 18:34 | URI Test Suite |
Messages (10)
msg5898 | Author: Aaron Swartz (aaronsw) | Date: 2001-08-12 05:10
I've put together a test suite for Python's urlparse module, based on the tests in Appendix C of RFC 2396 (the URI RFC). The tests are available at http://lists.w3.org/Archives/Public/uri/2001Aug/0013.html. The major problem seems to be that urlparse treats queries and parameters as special components rather than as ordinary parts of the path, which makes this issue related to http://sourceforge.net/tracker/?group_id=5470&atid=105470&func=detail&aid=210834.
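The component split Aaron describes is still visible in Python 3's urllib.parse, the successor to the Python 2 urlparse module. This short sketch uses the RFC 2396 base URI: urlparse() pulls ";p" out of the last path segment into a separate params field, while urlsplit() leaves it inside the path.

```python
from urllib.parse import urlparse, urlsplit

BASE = 'http://a/b/c/d;p?q'  # base URI used in RFC 2396, Appendix C

# urlparse() returns six fields: ";p" becomes a separate "params"
# component and "?q" becomes the "query" component.
parsed = urlparse(BASE)
print(parsed)
# ParseResult(scheme='http', netloc='a', path='/b/c/d', params='p', query='q', fragment='')

# urlsplit() returns five fields and keeps ";p" as part of the path.
split = urlsplit(BASE)
print(split)
# SplitResult(scheme='http', netloc='a', path='/b/c/d;p', query='q', fragment='')
```

Because params and query live outside the path, urljoin has to decide separately whether to inherit them from the base; the '?y' and ';x' cases discussed later in this thread hinge on that decision.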
msg5899 | Author: Fred Drake (fdrake) | Date: 2001-11-05 18:05
This looks like it's probably related to #478038; I'll try to tackle them together. Can you attach your tests to the bug report on SourceForge? Thanks!
msg5900 | Author: Aaron Swartz (aaronsw) | Date: 2001-11-05 18:30
Sure, here they are:

    import urlparse  # Python 2 module; renamed urllib.parse in Python 3

    base = 'http://a/b/c/d;p?q'

    # RFC 2396, C.1: normal examples
    assert urlparse.urljoin(base, 'g:h') == 'g:h'
    assert urlparse.urljoin(base, 'g') == 'http://a/b/c/g'
    assert urlparse.urljoin(base, './g') == 'http://a/b/c/g'
    assert urlparse.urljoin(base, 'g/') == 'http://a/b/c/g/'
    assert urlparse.urljoin(base, '/g') == 'http://a/g'
    assert urlparse.urljoin(base, '//g') == 'http://g'
    assert urlparse.urljoin(base, '?y') == 'http://a/b/c/?y'
    assert urlparse.urljoin(base, 'g?y') == 'http://a/b/c/g?y'
    assert urlparse.urljoin(base, '#s') == 'http://a/b/c/d;p?q#s'
    assert urlparse.urljoin(base, 'g#s') == 'http://a/b/c/g#s'
    assert urlparse.urljoin(base, 'g?y#s') == 'http://a/b/c/g?y#s'
    assert urlparse.urljoin(base, ';x') == 'http://a/b/c/;x'
    assert urlparse.urljoin(base, 'g;x') == 'http://a/b/c/g;x'
    assert urlparse.urljoin(base, 'g;x?y#s') == 'http://a/b/c/g;x?y#s'
    assert urlparse.urljoin(base, '.') == 'http://a/b/c/'
    assert urlparse.urljoin(base, './') == 'http://a/b/c/'
    assert urlparse.urljoin(base, '..') == 'http://a/b/'
    assert urlparse.urljoin(base, '../') == 'http://a/b/'
    assert urlparse.urljoin(base, '../g') == 'http://a/b/g'
    assert urlparse.urljoin(base, '../..') == 'http://a/'
    assert urlparse.urljoin(base, '../../') == 'http://a/'
    assert urlparse.urljoin(base, '../../g') == 'http://a/g'

    # RFC 2396, C.2: abnormal examples
    assert urlparse.urljoin(base, '') == base
    assert urlparse.urljoin(base, '../../../g') == 'http://a/../g'
    assert urlparse.urljoin(base, '../../../../g') == 'http://a/../../g'
    assert urlparse.urljoin(base, '/./g') == 'http://a/./g'
    assert urlparse.urljoin(base, '/../g') == 'http://a/../g'
    assert urlparse.urljoin(base, 'g.') == 'http://a/b/c/g.'
    assert urlparse.urljoin(base, '.g') == 'http://a/b/c/.g'
    assert urlparse.urljoin(base, 'g..') == 'http://a/b/c/g..'
    assert urlparse.urljoin(base, '..g') == 'http://a/b/c/..g'
    assert urlparse.urljoin(base, './../g') == 'http://a/b/g'
    assert urlparse.urljoin(base, './g/.') == 'http://a/b/c/g/'
    assert urlparse.urljoin(base, 'g/./h') == 'http://a/b/c/g/h'
    assert urlparse.urljoin(base, 'g/../h') == 'http://a/b/c/h'
    assert urlparse.urljoin(base, 'g;x=1/./y') == 'http://a/b/c/g;x=1/y'
    assert urlparse.urljoin(base, 'g;x=1/../y') == 'http://a/b/c/y'
    assert urlparse.urljoin(base, 'g?y/./x') == 'http://a/b/c/g?y/./x'
    assert urlparse.urljoin(base, 'g?y/../x') == 'http://a/b/c/g?y/../x'
    assert urlparse.urljoin(base, 'g#s/./x') == 'http://a/b/c/g#s/./x'
    assert urlparse.urljoin(base, 'g#s/../x') == 'http://a/b/c/g#s/../x'
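Aaron's assertions can also be run table-driven against Python 3's urllib.parse.urljoin. Python's documentation notes that urljoin was updated in 3.5 to match RFC 3986 semantics, and RFC 3986 changed some of the expected results above (for example '?y' and the abnormal '..' cases). This sketch therefore keeps only the cases whose expected results are identical in RFC 3986, section 5.4.1. The CASES table and the failing_cases() helper are illustrative names, not part of any existing test suite.

```python
from urllib.parse import urljoin

BASE = 'http://a/b/c/d;p?q'

# Hypothetical table: (relative reference, expected result) pairs from the
# suite above whose answers are the same under RFC 2396 and RFC 3986 (5.4.1).
CASES = [
    ('g:h', 'g:h'),
    ('g', 'http://a/b/c/g'),
    ('./g', 'http://a/b/c/g'),
    ('g/', 'http://a/b/c/g/'),
    ('/g', 'http://a/g'),
    ('//g', 'http://g'),
    ('g?y', 'http://a/b/c/g?y'),
    ('#s', 'http://a/b/c/d;p?q#s'),
    ('g#s', 'http://a/b/c/g#s'),
    ('g?y#s', 'http://a/b/c/g?y#s'),
    (';x', 'http://a/b/c/;x'),
    ('g;x', 'http://a/b/c/g;x'),
    ('g;x?y#s', 'http://a/b/c/g;x?y#s'),
    ('', BASE),
    ('.', 'http://a/b/c/'),
    ('./', 'http://a/b/c/'),
    ('..', 'http://a/b/'),
    ('../', 'http://a/b/'),
    ('../g', 'http://a/b/g'),
    ('../..', 'http://a/'),
    ('../../', 'http://a/'),
    ('../../g', 'http://a/g'),
]


def failing_cases(base=BASE, cases=CASES):
    """Return (ref, expected, actual) for every case urljoin resolves differently."""
    failures = []
    for ref, expected in cases:
        actual = urljoin(base, ref)
        if actual != expected:
            failures.append((ref, expected, actual))
    return failures


if __name__ == '__main__':
    for ref, expected, actual in failing_cases():
        print(f'{ref!r}: expected {expected!r}, got {actual!r}')
```

Collecting every mismatch, instead of stopping at the first failing assert, makes it easier to see at a glance which of the RFC cases an implementation disagrees with.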
msg5901 | Author: Aaron Swartz (aaronsw) | Date: 2001-11-05 18:34
Oops, meant to attach it...
msg5902 | Author: Jon Ribbens (jribbens) | Date: 2002-03-18 14:22
I think it would be better, by the way, if '..' components that take you 'off the top' were stripped. RFC 2396 says this is valid behaviour, and it's what 'real' browsers do. For example: http://a/b/ + ../../../d == http://a/d
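Jon's suggestion later became normative: RFC 3986, which obsoletes RFC 2396, defines a remove_dot_segments procedure (section 5.2.4) that discards any '..' segment that would climb above the root. Below is a minimal Python sketch of that procedure. It assumes the relative path has already been merged with the base path as described in RFC 3986, section 5.2.3; it is not the code urlparse used at the time.

```python
def remove_dot_segments(path):
    """Sketch of RFC 3986, section 5.2.4: resolve '.' and '..' in a path.

    '..' segments that would climb above the root are silently dropped.
    """
    inp = path
    out = []  # output buffer: one entry per segment, with its leading '/' if any
    while inp:
        if inp.startswith('../'):        # rule A: drop a leading "../"
            inp = inp[3:]
        elif inp.startswith('./'):       # rule A: drop a leading "./"
            inp = inp[2:]
        elif inp.startswith('/./'):      # rule B: "/./" becomes "/"
            inp = inp[2:]
        elif inp == '/.':                # rule B: trailing "/." becomes "/"
            inp = '/'
        elif inp.startswith('/../'):     # rule C: go up one level
            inp = inp[3:]
            if out:                      # already at the root: nothing to pop
                out.pop()
        elif inp == '/..':               # rule C: trailing "/.."
            inp = '/'
            if out:
                out.pop()
        elif inp in ('.', '..'):         # rule D: a lone "." or ".." vanishes
            inp = ''
        else:                            # rule E: move one segment to the output
            start = 1 if inp.startswith('/') else 0
            end = inp.find('/', start)
            if end == -1:
                end = len(inp)
            out.append(inp[:end])
            inp = inp[end:]
    return ''.join(out)
```

For Jon's example, merging the base path /b/ with ../../../d gives /b/../../../d, and remove_dot_segments('/b/../../../d') returns '/d', i.e. http://a/d.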
msg5903 | Author: Skip Montanaro (skip.montanaro) | Date: 2002-03-23 05:34
Added Aaron's RFC 2396 tests to test_urlparse.py (revision 1.4); the two failing tests are commented out.
msg5904 | Author: Michael Stone (mbrierst) | Date: 2003-02-03 21:02
The two failing tests could not pass because RFC 1808 and RFC 2396 seem to conflict when a relative URI is given as just ;x or just ?y. RFC 2396 claims to update RFC 1808, so presumably it describes the correct behavior. The patch in this message (I can't upload it on SourceForge here for some reason) brings urljoin's behavior in line with RFC 2396 and changes the appropriate test cases. I think that if you apply this patch, this bug can be closed. Let me know what you think.

    Index: python/dist/src/Lib/urlparse.py
    ===================================================================
    RCS file: /cvsroot/python/python/dist/src/Lib/urlparse.py,v
    retrieving revision 1.39
    diff -c -r1.39 urlparse.py
    *** python/dist/src/Lib/urlparse.py 7 Jan 2003 02:09:16 -0000 1.39
    --- python/dist/src/Lib/urlparse.py 3 Feb 2003 20:51:08 -0000
    ***************
    *** 157,169 ****
          if path[:1] == '/':
              return urlunparse((scheme, netloc, path,
                                 params, query, fragment))
    !     if not path:
    !         if not params:
    !             params = bparams
    !         if not query:
    !             query = bquery
              return urlunparse((scheme, netloc, bpath,
    !                            params, query, fragment))
          segments = bpath.split('/')[:-1] + path.split('/')
          # XXX The stuff below is bogus in various ways...
          if segments[-1] == '.':
    --- 157,165 ----
          if path[:1] == '/':
              return urlunparse((scheme, netloc, path,
                                 params, query, fragment))
    !     if not (path or params or query):
              return urlunparse((scheme, netloc, bpath,
    !                            bparams, bquery, fragment))
          segments = bpath.split('/')[:-1] + path.split('/')
          # XXX The stuff below is bogus in various ways...
          if segments[-1] == '.':
    Index: python/dist/src/Lib/test/test_urlparse.py
    ===================================================================
    RCS file: /cvsroot/python/python/dist/src/Lib/test/test_urlparse.py,v
    retrieving revision 1.11
    diff -c -r1.11 test_urlparse.py
    *** python/dist/src/Lib/test/test_urlparse.py 6 Jan 2003 20:27:03 -0000 1.11
    --- python/dist/src/Lib/test/test_urlparse.py 3 Feb 2003 20:51:12 -0000
    ***************
    *** 54,59 ****
    --- 54,63 ----
                  self.assertEqual(urlparse.urlunparse(urlparse.urlparse(u)), u)

          def test_RFC1808(self):
    +         # updated by RFC 2396
    +         # self.checkJoin(RFC1808_BASE, '?y', 'http://a/b/c/d;p?y')
    +         # self.checkJoin(RFC1808_BASE, ';x', 'http://a/b/c/d;x')
    +
              # "normal" cases from RFC 1808:
              self.checkJoin(RFC1808_BASE, 'g:h', 'g:h')
              self.checkJoin(RFC1808_BASE, 'g', 'http://a/b/c/g')
    ***************
    *** 61,74 ****
              self.checkJoin(RFC1808_BASE, 'g/', 'http://a/b/c/g/')
              self.checkJoin(RFC1808_BASE, '/g', 'http://a/g')
              self.checkJoin(RFC1808_BASE, '//g', 'http://g')
    -         self.checkJoin(RFC1808_BASE, '?y', 'http://a/b/c/d;p?y')
              self.checkJoin(RFC1808_BASE, 'g?y', 'http://a/b/c/g?y')
              self.checkJoin(RFC1808_BASE, 'g?y/./x', 'http://a/b/c/g?y/./x')
              self.checkJoin(RFC1808_BASE, '#s', 'http://a/b/c/d;p?q#s')
              self.checkJoin(RFC1808_BASE, 'g#s', 'http://a/b/c/g#s')
              self.checkJoin(RFC1808_BASE, 'g#s/./x', 'http://a/b/c/g#s/./x')
              self.checkJoin(RFC1808_BASE, 'g?y#s', 'http://a/b/c/g?y#s')
    -         self.checkJoin(RFC1808_BASE, ';x', 'http://a/b/c/d;x')
              self.checkJoin(RFC1808_BASE, 'g;x', 'http://a/b/c/g;x')
              self.checkJoin(RFC1808_BASE, 'g;x?y#s', 'http://a/b/c/g;x?y#s')
              self.checkJoin(RFC1808_BASE, '.', 'http://a/b/c/')
    --- 65,76 ----
    ***************
    *** 103,111 ****

          def test_RFC2396(self):
              # cases from RFC 2396
    !         ### urlparse.py as of v 1.32 fails on these two
    !         #self.checkJoin(RFC2396_BASE, '?y', 'http://a/b/c/?y')
    !         #self.checkJoin(RFC2396_BASE, ';x', 'http://a/b/c/;x')

              self.checkJoin(RFC2396_BASE, 'g:h', 'g:h')
              self.checkJoin(RFC2396_BASE, 'g', 'http://a/b/c/g')
    --- 105,113 ----

          def test_RFC2396(self):
              # cases from RFC 2396
    !         # conflict with RFC 1808, tests commented out there
    !         self.checkJoin(RFC2396_BASE, '?y', 'http://a/b/c/?y')
    !         self.checkJoin(RFC2396_BASE, ';x', 'http://a/b/c/;x')

              self.checkJoin(RFC2396_BASE, 'g:h', 'g:h')
              self.checkJoin(RFC2396_BASE, 'g', 'http://a/b/c/g')
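mbrierst's patch replaces the old "empty path" branch with a single test: only a reference with no path, params or query at all inherits the base's path, params and query. Everything else falls through to the ordinary merge against the base "directory". The join_2396 function below is a hypothetical, simplified re-creation of that rule on top of Python 3's urllib.parse. It assumes same-scheme references with no authority (//host) part and skips dot-segment removal, so it is a sketch of the idea, not the real urljoin.

```python
from urllib.parse import urlparse, urlunparse


def join_2396(base, ref):
    """Hypothetical sketch of the post-patch urljoin rule (not the real urljoin).

    Assumes `ref` has the same scheme as `base` and no authority part;
    '.' and '..' segments are not resolved.
    """
    bscheme, bnetloc, bpath, bparams, bquery, _ = urlparse(base)
    _, _, path, params, query, fragment = urlparse(ref, bscheme)

    # Patched test: only an empty or fragment-only reference inherits the
    # base path, params and query.
    if not (path or params or query):
        return urlunparse((bscheme, bnetloc, bpath, bparams, bquery, fragment))

    # Absolute-path reference: use it unchanged.
    if path.startswith('/'):
        return urlunparse((bscheme, bnetloc, path, params, query, fragment))

    # Otherwise drop the last base segment and append the reference path,
    # so '?y' and ';x' resolve against the base "directory" /b/c/.
    merged = bpath[: bpath.rfind('/') + 1] + path
    return urlunparse((bscheme, bnetloc, merged, params, query, fragment))


BASE = 'http://a/b/c/d;p?q'
for ref in ('?y', ';x', '#s', 'g;x?y#s'):
    print(ref, '->', join_2396(BASE, ref))
```

Note that RFC 3986 later restored the RFC 1808 answer for '?y' (http://a/b/c/d;p?y) while keeping http://a/b/c/;x for ';x', so this sketch mirrors the 2003 patch rather than the behavior of current urljoin.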
msg5905 | Author: Brett Cannon (brett.cannon) | Date: 2003-05-12 00:35
mbrierst is right. From C.1 of RFC 2396 (with http://a/b/c/d;p?q as the base):

    ?y = http://a/b/c/?y
    ;x = http://a/b/c/;x

Notice how this contradicts RFC 1808 (with <URL:http://a/b/c/d;p?q#f> as the base):

    ?y = <URL:http://a/b/c/d;p?y>
    ;x = <URL:http://a/b/c/d;x>

So there is clearly a conflict here. And since RFC 2396 says "it revises and replaces the generic definitions in RFC 1738 and RFC 1808" (where "generic" refers to the syntax itself), RFC 2396's behavior should take precedence. Now the question is whether the patch is the right thing to do (I am not yet judging whether the patch itself is correct; I have not tested it). This shouldn't break anything, since the whole point of urlparse.urljoin is to provide an abstracted way to create URIs without the user having to worry about all of these rules. So I say it should be changed. Fred, do you mind if I reassign this patch to myself and deal with it?
msg5906 | Author: Brett Cannon (brett.cannon) | Date: 2003-06-12 07:24
Since there is a small chance that this could break code that depends on urljoin following RFC 1808 rather than RFC 2396, and 2.3 has already reached beta, I am going to wait for 2.4 before dealing with this.
msg5907 | Author: Brett Cannon (brett.cannon) | Date: 2003-10-12 04:42
Revision 1.42 of Lib/urlparse.py and revision 1.13 of Lib/test/test_urlparse.py contain mbrierst's fixes (thanks, Michael); a second commit was needed to get the comment right.
History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-10 16:04:19 | admin | set | github: 34947 |
| 2001-08-12 05:10:12 | aaronsw | create | |