classification
Title: re.sub() coerces u'' to ''
Type:
Stage:
Components: Regular Expressions
Versions: Python 2.2

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: effbot
Nosy List: effbot, loewis, mike_j_brown
Priority: low
Keywords:

Created on 2002-11-08 09:32 by mike_j_brown, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Messages (3)
msg13139 - Author: Mike Brown (mike_j_brown) Date: 2002-11-08 09:32
Using Python 2.2.1 on FreeBSD, these work as expected:

>>> re.sub(u'f', u'b', u'foo')  # keep string as Unicode
u'boo'
>>> re.sub(u'f', u'b', 'foo')   # coerce string to Unicode
u'boo'

But this doesn't work the way I think it should:

>>> re.sub(u'f', u'b', u'')     # coerce string to non-Unicode?!
''

That is, an empty Unicode string does not survive as 
Unicode after going through re.sub().
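
The expectation in this report can be written as a small type-preservation check. This is only a sketch, and it assumes Python 3, where the str/bytes split stands in for the unicode/str split of Python 2.2. The helper name `sub_preserves_type` is hypothetical and is not part of the `re` module.

```python
import re

def sub_preserves_type(pattern, repl, text):
    """Return True if re.sub() hands back the same string type it was given."""
    return type(re.sub(pattern, repl, text)) is type(text)

# Text strings, including the empty-string case from the report.
print(sub_preserves_type('f', 'b', 'foo'))
print(sub_preserves_type('f', 'b', ''))

# Byte strings, again including the empty case.
print(sub_preserves_type(b'f', b'b', b'foo'))
print(sub_preserves_type(b'f', b'b', b''))
```

On an interpreter without the bug, every call should report that the type survived, including the two empty inputs.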
msg13140 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-11-09 18:05
Would you like to work on a patch for this bug?
msg13141 - Author: Fredrik Lundh (effbot) * (Python committer) Date: 2002-11-09 18:48
this buglet has already been fixed in the SRE master
repository.  here's the relevant portion:

*** 1802,1808 ****
      switch (PyList_GET_SIZE(list)) {
      case 0:
          Py_DECREF(list);
!         return PyString_FromString("");
      case 1:
          result = PyList_GET_ITEM(list, 0);
          Py_INCREF(result);
--- 1785,1791 ----
      switch (PyList_GET_SIZE(list)) {
      case 0:
          Py_DECREF(list);
!         return PySequence_GetSlice(pattern, 0, 0);
      case 1:
          result = PyList_GET_ITEM(list, 0);
          Py_INCREF(result);
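
The idea behind the changed line can be illustrated in Python: an empty slice of the pattern is an empty string of the pattern's own type, so the empty result no longer defaults to a plain string. This is a rough sketch, not the SRE source. The function name `join_pieces` is hypothetical, and the multi-piece branch is an assumption, since the excerpt above only shows the zero-piece and one-piece cases.

```python
def join_pieces(pieces, pattern):
    """Combine result pieces, keeping the string type of the pattern."""
    if len(pieces) == 0:
        # Mirrors PySequence_GetSlice(pattern, 0, 0): an empty slice
        # of the pattern has the same type as the pattern itself.
        return pattern[0:0]
    if len(pieces) == 1:
        # A single piece is returned unchanged.
        return pieces[0]
    # Assumption (not shown in the excerpt): join on an empty separator
    # of the pattern's type.
    return pattern[0:0].join(pieces)

print(repr(join_pieces([], 'f')))   # empty text string
print(repr(join_pieces([], b'f')))  # empty byte string
```

Slicing is used rather than a literal empty string because a slice works for any sequence type without naming that type.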

I'll update the Python repository asap (once I've gotten around
to merging in some changes made in the Python repository).

</F>

PS. also see my post on comp.lang.python on this topic.
well-written unicode code shouldn't care about things like
this...

History
Date                 User          Action  Args
2022-04-10 16:05:49  admin         set     github: 37438
2002-11-08 09:32:13  mike_j_brown  create