This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: SSL "issuer" and "server" names cannot be parsed
Type:
Stage:
Components: Library (Lib)
Versions:

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: janssen
Nosy List: akuchling, janssen, loewis, nagle
Priority: normal
Keywords:

Created on 2006-10-24 18:32 by nagle, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (13)
msg30384 - (view) Author: John Nagle (nagle) Date: 2006-10-24 18:32
(Python 2.5 library)

    The Python SSL object offers two methods for
obtaining information from an SSL certificate, "server()"
and "issuer()".  These return strings.

    The actual values in the certificate are a series
of key/value pairs in ASN.1 binary format.  But what
"server()" and "issuer()" return are single strings,
with the key/value pairs separated by "/".

    However, "/" is a valid character in certificate
data. So parsing such strings is ambiguous, and
potentially exploitable.

    This is more than a theoretical problem.  The
issuer field of Verisign certificates has a "/" in the
middle of a text field:

"/O=VeriSign Trust Network/OU=VeriSign,
Inc./OU=VeriSign International Server CA - Class
3/OU=www.verisign.com/CPS Incorp.by Ref. LIABILITY
LTD.(c)97 VeriSign".

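Note the "/" inside the value of the last OU field
("www.verisign.com/CPS ...").  The same thing happens in
fields such as

  "OU=Terms of use at www.verisign.com/rpa (c)00"

with a "/" in the middle of the value field.  Oops.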

    Worse, this is potentially exploitable.  By
ordering a low-level certificate with a "/" in the
right place, you can create the illusion (at least for
flawed implementations like this one) that the
certificate belongs to someone else.  Just order a
certificate from GoDaddy, enter something like this in
the "Name" field

    "Myphonyname/C=US/ST=California/L=San Jose/O=eBay
Inc./OU=Site Operations/CN=signin.ebay.com"

and Python code will be spoofed into thinking you're eBay.
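
The spoofing scenario can be reproduced with a short sketch. It assumes a hypothetical helper, naive_parse(), of the kind an application might write to split the one-line string on "/"; the subject string, including the "Some Reseller" organization, is made up for illustration and is not taken from a real certificate.

```python
def naive_parse(oneline):
    """Split an X509_NAME_oneline()-style string on "/" (the flawed approach)."""
    fields = {}
    for part in oneline.split("/"):
        if "=" in part:
            key, _, value = part.partition("=")
            # Later keys silently overwrite earlier ones.
            fields[key] = value
    return fields


# Hypothetical subject: the real certificate has ONE CN whose value is
# "Myphonyname/C=US/ST=California/.../CN=signin.ebay.com".
spoofed = ("/C=US/O=Some Reseller/CN=Myphonyname/C=US/ST=California"
           "/L=San Jose/O=eBay Inc./OU=Site Operations/CN=signin.ebay.com")

parsed = naive_parse(spoofed)
print(parsed["CN"])  # the attacker-chosen text wins
print(parsed["O"])
```

Because the flattened string carries no escaping, the parser cannot tell the embedded "/CN=" from a genuine field boundary.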

   Fortunately, browsers don't use Python code.

   The actual bug is in

    python/trunk/Modules/_ssl.c

at

    if ((self->server_cert = SSL_get_peer_certificate(self->ssl))) {
        X509_NAME_oneline(X509_get_subject_name(self->server_cert),
                          self->server, X509_NAME_MAXLEN);
        X509_NAME_oneline(X509_get_issuer_name(self->server_cert),
                          self->issuer, X509_NAME_MAXLEN);
    }

The "X509_NAME_oneline" function takes an X509_NAME
structure, which is the certificate system's
representation of a list, and flattens it into a
printable string.  This is a debug function, not one
for use in production code.  The OpenSSL documentation for
"X509_NAME_oneline" says:

    "The functions X509_NAME_oneline() and
X509_NAME_print() are legacy functions which produce a
non standard output form, they don't handle multi
character fields and have various quirks and
inconsistencies.  Their use is strongly discouraged in
new applications."

What OpenSSL callers are supposed to do is call
X509_NAME_entry_count() to get the number of entries in
an X509_NAME structure, then get each entry with
X509_NAME_get_entry().  A few more calls will obtain
the name/value pair from the entry, as UTF-8 strings,
which should be converted to Python Unicode strings.
OpenSSL has all the proper support, but Python's shim
doesn't interface to it correctly.

X509_NAME_oneline() doesn't handle Unicode; it converts
non-ASCII values to "\xnn" format. Again, it's for
debug output only.

So what's needed are two new functions for Python's SSL
sockets to replace "issuer" and "server".  The new
functions should return lists of Unicode strings
representing the key/value pairs. (A list is needed,
not a dictionary; two strings with the same key
are both possible and common.)
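
A small sketch of why a list is needed. The values come from the VeriSign issuer string quoted earlier; how that string divides into separate entries is an assumption made for illustration.

```python
# Issuer entries in certificate order (assumed split of the quoted string).
issuer_pairs = [
    ("O", "VeriSign Trust Network"),
    ("OU", "VeriSign, Inc."),
    ("OU", "VeriSign International Server CA - Class 3"),
    ("OU", "www.verisign.com/CPS Incorp.by Ref. LIABILITY LTD.(c)97 VeriSign"),
]

# A dictionary keeps only the last value for each repeated key.
as_dict = dict(issuer_pairs)

# A list of pairs keeps every entry, and keeps them in order.
ou_values = [value for key, value in issuer_pairs if key == "OU"]

print(len(as_dict))    # fewer keys than entries
print(len(ou_values))  # all OU entries survive
```

With the dictionary, two of the three OU entries are lost without any error being raised.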

The reason this now matters is that new "high
assurance" certs, the ones that tell you how much a
site can be trusted, are now being deployed, and to use
them effectively, you need that info.  Support for them
is in Internet Explorer 7, so they're going to be
widespread soon. Python needs to catch up.

And, of course, this needs to be fixed as part of
Unicode support.  


                John Nagle
                Animats
msg30385 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2006-10-24 22:05
Yes, OpenSSL 0.9.8d or later should be used for a new binary
release.

http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2006-4343

http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2006-3738

http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2006-2940

http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2006-2937
msg30386 - (view) Author: John Nagle (nagle) Date: 2006-10-24 22:40
The problem isn't in the version of OpenSSL used in Python,
which is at 0.9.8a.  OpenSSL has had the necessary functions
for years.  But Python isn't using them.

It's in  "python/trunk/Modules/_ssl.c", as described above.  
msg30387 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2006-10-25 08:38
The bug is not in the server() and issuer() methods
(which do exactly what they are meant to do); the bug is in
applications which assume that the result of these methods
can be parsed. As you point out, it cannot. The functions,
as is, don't present a security problem. If their result is
presented as-is to the user, the user can determine herself
whether she recognizes the entity referred to in the
distinguished name.

Notice that it is certainly possible to produce an
unambiguous string representation of a distinguished name;
RFC 4514 specifies an algorithm to do so (for use within LDAP).

Also notice that the SSL module does little to actually
support trust: there is no verification of server-side
certs, no access to extensions of a certificate, etc. So an
application and a user should *not* trust the issuer name it
received, anyway (unless there is an independent
verification that the server certificate can be trusted).

All that said: If you think you need this functionality,
please provide a patch to implement it.
msg30388 - (view) Author: John Nagle (nagle) Date: 2006-10-25 17:26
Actually, they don't do what they're "meant to do".
According to the Python library documentation for SSL
objects, the server method "Returns a string containing the
ASN.1 distinguished name identifying the server's
certificate. (See below for an example showing what
distinguished names look like.)" The example "below" is
missing from the documentation, so the documentation gives
us no clue as to what to expect.

There are several standardized representations for ASN.1
information.  See
"http://www.oss.com/asn1/tutorial/Explain.html".  Most are
binary. The only standard textual form is "XER", which is an
XML representation of ASN.1 encoded information.  It's
essentially the same representation used for parameters in
SOAP.

So, given the documentation and the standard, what should be
coming out is the XML representation of that data. 

Here's an entire X.509 certificate in XML:

http://www.gnu.org/software/gnutls/manual/html_node/An-X_002e509-certificate.html

The "issuer" field can be seen in there.  It's awfully
bulky.  And making SSL dependent on the SOAP module probably
isn't desirable.  But that's an ASN.1 distinguished name in
XML format, per the standard.

That's probably not what's wanted by most users, although
the ability to retrieve an entire certificate in XML format
would be useful.

However, there's another standard string encoding, which is
defined in RFC 2253.  This is comma-separated UTF-8 with
backslash escapes for special characters.  That's reliably
parseable. There's an OpenSSL function,
"X509_NAME_print_ex", which does this formatting, but it
doesn't output to a string.  That's the right mechanism if
it can be invoked in some way to yield a string.  It should
be invoked with flags = ASN1_STRFLGS_RFC2253, which yields a
UTF-8 string, which of course should become a Python Unicode
string.

Now if someone can figure out how to get a string, instead
of file output, out of OpenSSL's "X509_NAME_print_ex", we're
home. 
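
To show what the comma-separated, backslash-escaped form looks like, here is a simplified sketch in the style of RFC 2253 / RFC 4514. It is not OpenSSL's formatter: escape_value() and format_dn() are hypothetical names, every RDN is assumed to hold a single attribute, and the hex-encoding rules for unknown or binary attribute types are left out.

```python
# Characters that are always backslash-escaped inside a value.
SPECIALS = set('"+,;<>\\')


def escape_value(value):
    """Escape one attribute value (simplified; no hex-string forms)."""
    out = []
    last = len(value) - 1
    for i, ch in enumerate(value):
        if ch in SPECIALS:
            out.append("\\" + ch)
        elif ch == "\0":
            out.append("\\00")
        elif i == 0 and ch in " #":
            out.append("\\" + ch)   # leading space or '#'
        elif i == last and ch == " ":
            out.append("\\" + ch)   # trailing space
        else:
            out.append(ch)
    return "".join(out)


def format_dn(pairs):
    """Join (name, value) pairs, last certificate entry first."""
    return ",".join(f"{name}={escape_value(value)}"
                    for name, value in reversed(pairs))


print(format_dn([("O", "VeriSign, Inc."), ("CN", "a/b")]))
```

In this form "/" needs no escaping at all, because "," is the separator and any "," inside a value is escaped.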
msg30389 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2006-10-25 18:05
Notice that RFC 2253 has been superseded by RFC 4514 (see my
earlier message). However, I really see no reason to fix this:
even if the ambiguity problems were fixed, you *still*
should not use the issuer and subject names in a
security-relevant context.
msg30390 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2006-10-27 12:54
I've reworded the description in the documentation to say
something like this: "Returns a string describing the issuer
of the server's certificate. Useful for debugging purposes;
do not parse the content of this string because its format
can't be parsed unambiguously."

For adding new features: please submit a patch.  Python's
maintainers probably don't use SSL in any sophisticated way
and therefore have no idea what shape better SSL/X.509
support would take.

msg30391 - (view) Author: John Nagle (nagle) Date: 2006-11-08 07:02
I've submitted a request (titled "Request: make
X509_NAME_oneline() use same formatter as
X509_NAME_print_ex()") to the OpenSSL developers to fix this
on their side.  If they fix that, delimiters will be escaped
per the standard.

The OpenSSL people should also export the functionality of
getting this information as a UTF-8 string, and if they do,
Python should use that call as part of Unicode support.
Keep this open pending action on the OpenSSL side.  Thanks.
msg30392 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2006-11-17 13:56
The request is bug #1425 in the OpenSSL request tracker (go to openssl.org > Support for a link).
msg55298 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-08-26 02:59
I believe issue 1018 addressed this, and that it can be closed.  Though 
socket.ssl, and its methods "server" and "issuer", should be deprecated.
msg55446 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-08-29 22:54
Actually, looking at it further, I'm not sure that it is fixed by the new 
SSL code.  If in fact the issuer or subject field can contain multiple 
name-value pairs with the same name, the dictionary-based approach 
currently used won't work.  We'll need more of an alist approach, with 
name-value tuples in it.  I'd better look into this.
msg55652 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-09-05 01:09
I've changed the return value of ssl.sslsocket.getpeercert() to return the 
"issuer" and "subject" names as tuples containing 2-element name-value 
tuples, in the same order that they appear in the certificate.  This 
should complete the fulfillment of this issue.  Please see the doc page on 
library/ssl.rst for more information.
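
A minimal sketch of reading such a structure. The sample data is hypothetical and uses the nesting shown in the current ssl documentation, where each name is a tuple of RDNs and each RDN is itself a tuple of (name, value) pairs; whether the revision described here used exactly that nesting is an assumption. name_values() is a hypothetical helper, not part of the ssl module.

```python
# Hypothetical getpeercert()-style result (only "subject" shown).
sample_cert = {
    "subject": (
        (("countryName", "US"),),
        (("organizationName", "Example Org"),),
        (("organizationalUnitName", "Unit A"),),
        (("organizationalUnitName", "Unit B"),),
        (("commonName", "www.example.com"),),
    ),
}


def name_values(name, attribute):
    """Return every value for `attribute`, in certificate order."""
    return [value
            for rdn in name
            for key, value in rdn
            if key == attribute]


print(name_values(sample_cert["subject"], "organizationalUnitName"))
print(name_values(sample_cert["subject"], "commonName"))
```

Repeated names such as the two organizationalUnitName entries are all returned, which a dictionary-based result could not do.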
msg55799 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-09-10 21:54
Fixed in rev 58097.
History
Date                 User             Action  Args
2022-04-11 14:56:20  admin            set     github: 44165
2007-09-10 21:55:03  janssen          set     status: open -> closed; resolution: fixed
2007-09-10 21:54:53  janssen          set     messages: + msg55799
2007-09-05 01:09:07  janssen          set     messages: + msg55652
2007-08-29 22:54:31  janssen          set     assignee: janssen; messages: + msg55446
2007-08-26 03:06:51  gregory.p.smith  set     nosy: - gregory.p.smith
2007-08-26 02:59:49  janssen          set     nosy: + janssen; messages: + msg55298
2006-10-24 18:32:27  nagle            create