This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: HTMLParser ParseError in start tag
Type: Stage:
Components: Library (Lib) Versions: Python 2.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: akuchling Nosy List: akuchling, bernd_zedv, nnseva
Priority: normal Keywords:

Created on 2004-03-23 10:17 by bernd_zedv, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (4)
msg20293 - Author: Bernd Zimmermann (bernd_zedv) Date: 2004-03-23 10:17
When this (obviously correct) HTML is parsed:

<a href=mailto:xyz@domain.com>xyz</a>

this exception is raised:
HTMLParseError: junk characters in start tag: '@domain.com>', at line 1, column 1

I work around this by adding '@' to the
allowed character class:

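import re
import HTMLParser

# Replace the module-level attribute regex with one whose
# unquoted-value character class also accepts '@' (as in mailto: URLs).
HTMLParser.attrfind = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~@]*))?')

# Parse as usual; start tags are now matched with the patched regex.
myparser = HTMLParser.HTMLParser()
myparser.feed('<a ... ')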
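To see where the error comes from, the sketch below rebuilds the attribute regex in isolation (Python 3 syntax, since only the re module is involved) and matches it against the attribute text of the failing tag. It assumes the stock 2.3 pattern is the workaround's pattern without the '@'; make_attrfind and split_attr are hypothetical helpers, not part of HTMLParser.

```python
import re

# Unquoted-value character classes: assumed stock class (no '@') and
# the same class with '@' appended, as in the workaround above.
OLD_VALUE = r"[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~]*"
NEW_VALUE = r"[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~@]*"


def make_attrfind(value_class):
    """Build an attrfind-style regex: attribute name, then an optional
    '=' followed by a single-quoted, double-quoted or unquoted value."""
    return re.compile(
        r"\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*"
        r"('[^']*'|\"[^\"]*\"|" + value_class + r"))?"
    )


def split_attr(pattern, text):
    """Return (value, leftover): the captured attribute value and the
    part of `text` the regex did not consume."""
    m = pattern.match(text)
    return m.group(3), text[m.end():]


attrs = " href=mailto:xyz@domain.com"
old_value, old_rest = split_attr(make_attrfind(OLD_VALUE), attrs)
new_value, new_rest = split_attr(make_attrfind(NEW_VALUE), attrs)
print(old_value, repr(old_rest))  # mailto:xyz '@domain.com'
print(new_value, repr(new_rest))  # mailto:xyz@domain.com ''
```

Without '@' the unquoted value stops at 'mailto:xyz' and '@domain.com' is left unconsumed; together with the closing '>' that is the text the error message reports as junk.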

msg20294 - Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2004-04-19 13:01

I don't believe this HTML is obviously correct.  
The section on attributes in the HTML 4.01 Recommendation
(http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.2) says:

In certain cases, authors may specify the value of an
attribute without any quotation marks. The attribute value
may only contain letters (a-z and A-Z), digits (0-9),
hyphens (ASCII decimal 45), periods (ASCII decimal 46),
underscores (ASCII decimal 95), and colons (ASCII decimal
58). We recommend using quotation marks even when it is
possible to eliminate them.  

The regex is already more liberal than this, allowing slashes
and various other symbols, so we might as well add '@', but
you should also consider adding quotation marks to the
original attribute.
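Following the advice to quote the attribute, the sketch below feeds both forms of the tag to a parser and records the start tags it reports. It uses Python 3's html.parser module rather than the 2.3 HTMLParser module discussed in this issue, so it shows current behaviour, not the 2004 one; StartTagCollector and start_tags are hypothetical names.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Record (tag, attrs) for every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.starttags = []

    def handle_starttag(self, tag, attrs):
        self.starttags.append((tag, attrs))


def start_tags(markup):
    """Parse `markup` completely and return the recorded start tags."""
    parser = StartTagCollector()
    parser.feed(markup)
    parser.close()
    return parser.starttags


quoted = start_tags('<a href="mailto:xyz@domain.com">xyz</a>')
unquoted = start_tags('<a href=mailto:xyz@domain.com>xyz</a>')
print(quoted)    # [('a', [('href', 'mailto:xyz@domain.com')])]
print(unquoted)  # [('a', [('href', 'mailto:xyz@domain.com')])]
```

The quoted form is unambiguous for any parser; the Python 3 parser is lenient enough to accept the unquoted form as well.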

msg20295 - Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2004-06-05 15:32

Committed to the CVS HEAD; thanks!
msg20296 - Author: Vsevolod Novikov (nnseva) Date: 2004-10-13 10:16

See request #1046092 to fix it.
History
Date                 User        Action  Args
2022-04-11 14:56:03  admin       set     github: 40065
2004-03-23 10:17:42  bernd_zedv  create