Issue 1546628: urlparse.urljoin odd behaviour

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/43899

classification

Title:	urlparse.urljoin odd behaviour
Type:		Stage:
Components:	Library (Lib)	Versions:	Python 2.4

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	andresriancho, georg.brandl, the_j10
Priority:	normal	Keywords:

Created on 2006-08-25 13:04 by andresriancho, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (3)
msg29685	Author: Andres Riancho (andresriancho)	Date: 2006-08-25 13:04
Hi! I think I have found a bug in the urljoin function of the urlparse module. I'm using Python 2.4.3 (#2, Apr 27 2006, 14:43:58), [GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2.

Here is a demo of the bug:

    >>> import urlparse
    >>> urlparse.urljoin('http://www.f00.com/','//a')
    'http://a'
    >>> urlparse.urljoin('http://www.f00.com/','https://0000/somethingIsWrong')
    'https://0000/somethingIsWrong'
    >>> urlparse.urljoin('http://www.f00.com/','https://0000/somethingIsWrong')
    'https://0000/somethingIsWrong'
    >>> urlparse.urljoin('http://www.f00.com/','file:///etc/passwd')
    'file:///etc/passwd'

The result of the first call to urljoin should be either 'http://www.f00.com/a' or 'http://www.f00.com//a'. The result of the second and third calls to urljoin should be 'http://www.f00.com/', or maybe an exception? Please correct me if I'm wrong and this is some kind of feature, or if the bug was already reported.

This bug can result in a security vulnerability. Take this code as an example:

// viewImage.py //
    import htmlTools  # Some fake module, just for the example
    import urlparse   # module with bug.

    htmlTools.startHtml()  # print <html>
    params = htmlTools.getParams()  # get the query string parameters
    htmlTools.printToHtml(
        '<img src=' + urlparse.urljoin('http://myWebsite/', params['image']) + '>'
    )
    htmlTools.endHtml()  # print </html>
// viewImage.py //

The code should generate an HTML page that shows an image from the site http://myWebsite/, but with the urljoin bug, the image source can be manipulated and result in completely different HTML.

Cheers,
Andres Riancho
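As a rough illustration of the concern above, a caller that must keep joined URLs on its own site could compare the scheme and host of the result against the base. This is only a sketch: `join_same_site` is a hypothetical helper, not part of urlparse, and the import fallback assumes the Python 3 name of the module (`urllib.parse`). It does not address HTML escaping of the value placed in the `src` attribute.

```python
try:
    from urllib.parse import urljoin, urlsplit  # Python 3 name of the module
except ImportError:
    from urlparse import urljoin, urlsplit      # Python 2, as used in this report


def join_same_site(base, untrusted):
    """Join `untrusted` onto `base`; refuse results on another scheme or host.

    Hypothetical helper for illustration only, not part of the standard library.
    """
    joined = urljoin(base, untrusted)
    b, j = urlsplit(base), urlsplit(joined)
    # An absolute URL or a '//host' reference changes the scheme and/or netloc.
    if (j.scheme, j.netloc) != (b.scheme, b.netloc):
        raise ValueError("URL leaves %s://%s: %r" % (b.scheme, b.netloc, untrusted))
    return joined
```

With this check, `join_same_site('http://myWebsite/', 'logo.png')` returns the joined URL, while '//a', 'https://0000/somethingIsWrong' and 'file:///etc/passwd' each raise ValueError.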
msg29686	Author: Andrew Jones (the_j10)	Date: 2006-08-29 11:29
The second argument to urljoin can be either an absolute URL or a relative URL, as specified by RFC 1808. So your first example, '//a', gives a position relative to the base, resulting in 'http://a'. This is similar to how `cd /boot` takes you to a path relative to the filesystem's root '/'.

In the rest of your examples, the URL passed as the second argument has a scheme name (e.g. 'https'). urljoin follows RFC 1808: if the second argument has a scheme name, it is accepted as an absolute URL and returned.

This behavior is not very intuitive. Perhaps the urlparse module could be extended with a urlappend method that has the behavior you expected. Hmmm...

-- Andrew
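The resolution rules described above can be checked directly, and the `urlappend` idea can be sketched in a few lines. Note that `urlappend` here is hypothetical: it was only suggested in this thread, does not exist in urlparse, and the plain string handling below is just one possible reading of "append" (it does nothing about '..' segments, queries, or fragments).

```python
try:
    from urllib.parse import urljoin  # Python 3 name of the module
except ImportError:
    from urlparse import urljoin      # Python 2, as used in this report

base = 'http://www.f00.com/'

# '//a' keeps the base's scheme but replaces the host.
print(urljoin(base, '//a'))                            # http://a

# A reference with its own scheme is absolute and is returned as given.
print(urljoin(base, 'https://0000/somethingIsWrong'))  # https://0000/somethingIsWrong
print(urljoin(base, 'file:///etc/passwd'))             # file:///etc/passwd


def urlappend(base, path):
    """Hypothetical: always treat `path` as a path below `base`."""
    return base.rstrip('/') + '/' + path.lstrip('/')


print(urlappend(base, '//a'))                          # http://www.f00.com/a
```

Under this reading, a second argument that looks like an absolute URL is simply appended as path text, so the result always starts with the base.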
msg29687	Author: Georg Brandl (georg.brandl) *	Date: 2006-10-12 11:15
The behavior is okay, but the docs didn't say that. I added a note in rev. 52303, 52304 (2.5).

History
Date	User	Action	Args
2022-04-11 14:56:19	admin	set	github: 43899
2006-08-25 13:04:08	andresriancho	create