Issue 964929: Unicode String formatting does not correctly handle objects

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/40329

classification

Title:	Unicode String formatting does not correctly handle objects
Type:		Stage:
Components:	Unicode	Versions:	Python 2.4

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	lemburg	Nosy List:	lemburg, mewf
Priority:	normal	Keywords:

Created on 2004-06-02 10:48 by mewf, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (4)
msg20991	Author: Giles Antonio Radford (mewf)	Date: 2004-06-02 10:48
I have a problem with the way '%s' is handled in unicode strings when formatted. The Python Language Reference states that a unicode serialisation of an object should be in __unicode__, and I have seen Python break down if unicode data is returned in __str__. The problem is that there does not appear to be a way to interpolate the results from __unicode__ within a string:

    class EuroHolder:
        def __init__(self, price):
            self._price = price
        def __str__(self):
            return "%.02f euro" % self._price
        def __unicode__(self):
            return u"%.02f\u20ac" % self._price

    >>> e = EuroHolder(123.45)
    >>> str(e)
    '123.45 euro'
    >>> unicode(e)
    u'123.45\u20ac'
    >>> "%s" % e
    '123.45 euro'
    >>> u"%s" % e  # this is wrong
    u'123.45 euro'
    >>> u"%s" % unicode(e)  # This is silly
    u'123.45\u20ac'
    >>>

The first case is wrong, as I actually could cope with unicode data in the string I was substituting into, and I should be able to request that the unicode data be put in.

The second case is silly, as the whole point of string substitution variables such as %s, %d and %f is to remove the need for coercion on the right of the %.

Proposed solution #1: Make %s in unicode string substitution automatically check __unicode__() of the rvalue before trying __str__(). This is the most logical thing to expect of %s if you insist on overloading it the way it currently is, where a unicode object in the rvalue ensures the result is unicode.

Proposed solution #2: Make a new string conversion operator, such as %S or %U, which will explicitly call __unicode__() on the rvalue even if the lvalue is a non-unicode string.

Solution #2 has the advantage that it does not break any previous behaviour of %s, and also allows for explicit conversion to unicode of 8-bit strings in the lvalue.

I prefer solution #1 as I feel that the current operation of %s is incorrect, and it's unlikely to break much, whereas the "advantage" of converting 8-bit strings in the lvalue to unicode which solution #2 advocates will just lead to encoding problems and sloppy code.
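
As a rough illustration of the lookup order that solution #1 asks for, here is a minimal sketch written for Python 3, which has no built-in __unicode__ protocol. The helper name unicode_percent_s is hypothetical and only models the idea of "try __unicode__ first, then fall back to __str__"; it is not how the interpreter implements %s.

```python
# Sketch only: models the lookup order from proposed solution #1.
# Assumption: runs on Python 3, where __unicode__ is an ordinary method
# name with no special meaning, so the preference is emulated by hand.

class EuroHolder:
    def __init__(self, price):
        self._price = price

    def __str__(self):
        return "%.02f euro" % self._price

    def __unicode__(self):
        return "%.02f\u20ac" % self._price


def unicode_percent_s(obj):
    """Hypothetical helper: prefer __unicode__, fall back to __str__."""
    method = getattr(type(obj), "__unicode__", None)
    if method is not None:
        return method(obj)
    return str(obj)


e = EuroHolder(123.45)

# Behaviour reported in this issue: only __str__ is consulted.
old_result = "%s" % e

# Behaviour requested by solution #1: __unicode__ wins when it exists.
new_result = "%s" % unicode_percent_s(e)

print(old_result)
print(new_result)
```

Objects that define no __unicode__ method go through str() unchanged in this sketch, which mirrors the point that solution #1 should not alter the result for existing classes that only implement __str__.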
msg20992	Author: Marc-Andre Lemburg (lemburg) *	Date: 2004-07-23 10:29
Good point. I think the only change needed is to use PyObject_Unicode() instead of PyObject_Str() in PyUnicode_Format() in unicodeobject.c. This would then implement solution #1.
msg20993	Author: Marc-Andre Lemburg (lemburg) *	Date: 2004-07-23 16:21
I've checked in the proposed solution:

    Checking in Objects/unicodeobject.c;
    /cvsroot/python/python/dist/src/Objects/unicodeobject.c,v  <--  unicodeobject.c
    new revision: 2.218; previous revision: 2.217
msg20994	Author: Marc-Andre Lemburg (lemburg) *	Date: 2004-07-23 16:22
Note that this will not go into 2.3.x since it is a new feature. Changing the scope to Python 2.4.

History
Date	User	Action	Args
2022-04-11 14:56:04	admin	set	github: 40329
2004-06-02 10:48:34	mewf	create