classification
Title: string method bugs w/ 8bit, unicode args
Components: Unicode
Versions: Python 2.3
process
Status: closed
Resolution: accepted
Assigned To: gvanrossum
Nosy List: gvanrossum, inyeol
Priority: normal

Created on 2002-08-15 01:56 by inyeol, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Files
File name | Uploaded | Description
Python-2.2.1_SF_bug_595350_patched.tgz | inyeol, 2002-08-16 17:45 | tarball of source and unittest patch
sf595350 | gvanrossum, 2002-08-19 22:08 | Context diff relative to current CVS as of 8/19/02
sf595350_replace_fix.tgz | inyeol, 2002-08-20 23:43 | s.replace() fix (both 8-bit and unicode) and unittest update
replace.diff | gvanrossum, 2002-08-22 15:52 | Updated diff (still with bug)
unicodeobject.c-2.164.diff | inyeol, 2002-08-23 10:06 | bug fixed
Messages (9)
msg11974 - Author: Inyeol Lee (inyeol) Date: 2002-08-15 01:56
Python 2.2.1 (#1, Apr 10 2002, 18:25:16) 
[GCC 2.95.3 20010315 (release)] on sunos5

1. "abc".endswith("c") -> 1
   "abc".endswith(u"c") -> 0 # bug.
   u"abc".endswith("c") -> 1
   u"abc".endswith(u"c") -> 1

2. "aaa".rfind("a") -> 2
   "aaa".rfind(u"a") -> 0 # bug.
   u"aaa".rfind("a") -> 2
   u"aaa".rfind(u"a") -> 2

   .rindex() has the same bug.

3. "abc".rfind("") -> 3
   "abc".rfind(u"") -> 0 # bug.
   u"abc".rfind("") -> 0 # bug.
   u"abc".rfind(u"") -> 0 # bug.

   .rindex() has the same bug.

4. "abc".replace("", "x") -> ValueError
   "abc".replace(u"", "x") -> u'abcxxxx' # bug.
   u"abc".replace("", "x") -> u'abcxxxx' # bug.
   u"abc".replace(u"", "x") -> u'abcxxxx' # bug.

   They should raise ValueError, or return u'xaxbxcx'.
   BTW, how about changing s.replace("") behavior to return
   "xaxbxcx" (or u"xaxbxcx") for all 4 cases? It is consistent
   with other string methods and the re.sub() method.
   It seems that Guido doesn't mind changing this.

[Guido]
> If someone really wants 'abc'.replace('', '-') to return '-a-b-c-',
> please submit patches for both 8-bit and Unicode strings to
> SourceForge and assign to me.  I looked into this and it's
> non-trivial: the implementation used for 8-bit strings goes into an
> infinite loop when the pattern is empty, and the Unicode
> implementation tacks '----' onto the end.  Please supply doc and
> unittest patches too.  At least re does the right thing already:
5. (it's not a bug)
   Except for .replace() above, s.split() is the only string method
   which raises an exception. How about changing this to return
   the unmodified string when an empty string is given as a separator?
   This is consistent with re.split() behavior.

- Inyeol Lee
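
For comparison, here is a minimal sketch of how the cases above behave on a current Python 3 interpreter. This is an assumption about a modern build, not output from the 2.2.1 interpreter in the report: Python 3 has a single str type, so the 8-bit/unicode mix itself cannot be reproduced, and only the expected results are shown. The dictionary name `checks` is just for illustration.

```python
# Python 3 sketch: one text type, so each of the four 8-bit/unicode
# combinations in the report collapses into a single call.
checks = {
    "endswith": "abc".endswith("c"),          # True
    "rfind": "aaa".rfind("a"),                # 2
    "rfind_empty": "abc".rfind(""),           # 3
    "rindex_empty": "abc".rindex(""),         # 3
    "replace_empty": "abc".replace("", "x"),  # 'xaxbxcx'
}

for name, value in checks.items():
    print(name, repr(value))
```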
msg11975 - Author: Inyeol Lee (inyeol) Date: 2002-08-16 17:45
uploaded patch for these bugs.  -Inyeol Lee
msg11976 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2002-08-19 22:08
I'm still reviewing this.

Next time, please send context diffs; diffs relative to
current (or at least fairly recent :-) CVS would also be
appreciated.

Don't add the bug number in comments for each change; I will
have to remove all those manually now...

I'm uploading a version of the patch that applies cleanly to
current CVS; it's not ready yet (the Unicode tests fail and
I have to clean up the comments).

Backporting will be a bitch, because I don't want the
changes for x.replace('', ...) to be backported (new
functionality etc.).
msg11977 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2002-08-20 16:35
I'm rejecting this patch.

The fixes for endswith(), rfind() and rindex() are already
in 2.3; these should be backported to 2.2 and I may use your
code for them.

I can't accept the new feature for replace() until you also
have a working patch for Unicode; I'm only +0 on this so I'm
not going to spend more time getting it right. Feel free to
submit a patch for *just* that here.

I don't want to accept the new feature for split(), because
I disagree that 'abc'.split('') should return ['abc']; if
anything, it should return ['', 'a', 'b', 'c', ''], but one
can argue about this and I think the ValueError is better.
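
A small sketch of the behavior being kept, assuming a current Python 3 interpreter rather than the 2.2/2.3 builds discussed here. The helper name `split_or_error` is hypothetical and exists only to make the exception visible as a value.

```python
def split_or_error(text, sep):
    """Return text.split(sep), or a short description of the ValueError raised."""
    try:
        return text.split(sep)
    except ValueError as exc:
        # str.split() rejects an empty separator instead of guessing a result.
        return "ValueError: " + str(exc)


print(split_or_error("abc", "b"))  # ['a', 'c']
print(split_or_error("abc", ""))   # a ValueError description
```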
msg11978 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2002-08-20 17:33
Note: I've now applied your fixes for endswith and rfind
c.s. to Python 2.2 and 2.3.
msg11979 - Author: Inyeol Lee (inyeol) Date: 2002-08-20 23:54
s.replace() patched, unittest updated from current CVS.
Tested with 2.2.1, but it should work fine with 2.3a, since
the code affected by this patch hasn't changed since 2.2.1.
msg11980 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2002-08-22 15:52
Hrm. You uploaded a gzipped tarball containing 5 separate
context diff files; and these were reverse diffs (created by
doing "diff -c new old"). I prefer a single file containing
multiple forward context diffs. I've uploaded your patch in
the form that I'd like to see (taken relative to current CVS).

But there's a worse problem: your code contains a bug.
Consider this example:

u'abc'.replace('', '-', 0)

This correctly returns u'abc'.

But now try this:

class U(unicode): pass
U(u'abc').replace('', '-', 0)

This returns u'-ab' !!!

Please fix this.
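
The same check can be sketched for a current Python 3 interpreter, assuming `str` stands in for `unicode`. This is not the patched 2.3 code; it only illustrates the expected result that a count of 0 leaves the text unchanged for both the base type and a trivial subclass.

```python
class U(str):
    """Trivial subclass, standing in for the report's `class U(unicode): pass`."""


# A count of 0 means "perform no replacements", even with an empty pattern.
plain = "abc".replace("", "-", 0)
sub = U("abc").replace("", "-", 0)

print(repr(plain))  # 'abc'
print(repr(sub))    # 'abc'
```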

I'm also still waiting for a documentation update.
msg11981 - Author: Inyeol Lee (inyeol) Date: 2002-08-23 10:06
bug fixed.

I think we don't need to update the .replace() documentation.
I checked the latest docs on string methods such as .count() and
.find(); there is still no comment on empty string behavior for
the other methods.
msg11982 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2002-08-23 18:22
Thanks for the fix, I've checked this in now.

I'll update the docs because I want it to be explicit about
this (and so we can say this is a new feature in 2.3).
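
As a rough illustration of the empty-pattern semantics discussed in this thread, the following pure-Python sketch inserts the replacement at each character boundary, left to right. It is not the C code that was checked in. The function name `replace_empty` and the handling of the `count` argument are assumptions modeled on how str.replace() behaves on a current Python 3 interpreter.

```python
def replace_empty(s, new, count=-1):
    """Insert `new` at each boundary of `s`, left to right, at most `count` times.

    A negative `count` means "no limit", mirroring str.replace().
    """
    boundaries = len(s) + 1  # before each character, plus the end
    if count < 0 or count > boundaries:
        count = boundaries
    pieces = []
    for i in range(count):
        pieces.append(new)
        if i < len(s):
            pieces.append(s[i])
    pieces.append(s[count:])  # untouched remainder, if any
    return "".join(pieces)


print(replace_empty("abc", "-"))     # '-a-b-c-'
print(replace_empty("abc", "-", 2))  # '-a-bc'
print(replace_empty("abc", "-", 0))  # 'abc'
```

For a non-empty string, the results can be compared directly against the built-in, for example `replace_empty("abc", "-", 2) == "abc".replace("", "-", 2)`.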
History
Date | User | Action | Args
2022-04-10 16:05:35 | admin | set | github: 37034
2002-08-15 01:56:23 | inyeol | create |