Issue 566631: PyUnicode_Find() returns wrong results


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/36715

classification

Title:	PyUnicode_Find() returns wrong results
Type:		Stage:
Components:	Documentation	Versions:

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	fdrake	Nosy List:	ajung, fdrake, lemburg, tim.peters
Priority:	normal	Keywords:

Created on 2002-06-10 00:25 by ajung, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Messages (8)
msg11129	Author: Andreas Jung (ajung)	Date: 2002-06-10 00:25
The following is used to search for u'h' inside the string u'+#&.' (stored in self->seperator):

    static PyObject *
    Splitter_test(Splitter *self) {
        PyObject *test, *res, *word;

        test = Py_BuildValue("s","h");
        word = PyUnicode_FromEncodedObject(test,"ascii","strict");

        printf("seperator: ");
        PyObject_Print(self->seperator, stdout, 0);
        puts("");

        printf("Searching for: ");
        PyObject_Print(word, stdout, 0);
        puts("");

        res = PyUnicode_Find(self->seperator, word, 0, PyUnicode_GET_SIZE(self->seperator), 1);
        if (res==0) puts("not found");
        else puts("found");
    }

The output is:

    yetix(366)% python2.1 test.py
    seperator: u'+#&.'
    Searching for: u'h'
    found

So PyUnicode_Find() returns a match although u'h' is not contained within self->seperator.

Env: Linux 2.4, Python 2.1.3
msg11130	Author: Marc-Andre Lemburg (lemburg) *	Date: 2002-06-10 09:17
Logged In: YES user_id=38388 You should check what res actually is rather than testing it for == 0. Note that res == -1 means "not found", whereas res == 0 means a match was found at position 0.
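To illustrate the point about 0 versus -1, here is a minimal sketch in plain C. It does not call the CPython API: find_forward, buggy_is_found, and is_found are hypothetical stand-ins that only mimic the "position, or -1 if not found" convention described in this message, so the block compiles without the Python headers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in, NOT the CPython API: returns the index of the
 * first occurrence of needle in haystack, or -1 if there is none. */
int find_forward(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    const char *hit = strstr(haystack, needle);
    return hit ? (int)(hit - haystack) : -1;
}

/* The style of check used in the report: treats 0 as "not found",
 * so -1 is misread as a match and position 0 as a miss. */
int buggy_is_found(int res)
{
    return res != 0;
}

/* The check that the -1 convention calls for: any index >= 0 is a match. */
int is_found(int res)
{
    return res >= 0;
}
```

With the inputs from the report, find_forward("+#&.", "h") yields -1, which the "== 0" style check misreads as a match. A search for "+" yields 0, which the same check misreads as a miss.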
msg11131	Author: Andreas Jung (ajung)	Date: 2002-06-10 10:45
Logged In: YES user_id=11084 So what does PyUnicode_Find() return? From the docs:

    PyObject* PyUnicode_Find(PyObject *str, PyObject *substr, int start, int end, int direction)
    Return value: New reference.
    Return the first position of substr in str[start:end] using the given direction (direction == 1 means to do a forward search, direction == -1 a backward search), 0 otherwise.

Does it return an int with the position or a PyObject *?
-aj
msg11132	Author: Andreas Jung (ajung)	Date: 2002-06-10 10:48
Logged In: YES user_id=11084 Looking at the source, this is a documentation bug, since PyUnicode_Find() really returns an int. Can you assign this issue to Fred to fix the docs *wink*? -aj
msg11133	Author: Marc-Andre Lemburg (lemburg) *	Date: 2002-06-10 10:52
Logged In: YES user_id=38388 The docs should say it's an int.
msg11134	Author: Andreas Jung (ajung)	Date: 2002-06-10 10:53
Logged In: YES user_id=11084 Fred, could you please fix the docs for PyUnicode_Find()? The return value is an int instead of a PyObject *. Also -1 is returned for "not found" instead of 0.
msg11135	Author: Tim Peters (tim.peters) *	Date: 2002-06-10 11:09
Logged In: YES user_id=31435 Thanks for spotting this, Andreas! Changed Category to Docs. Fred, the correct return conditions are documented in unicodeobject.h:

    """
    Return the first position of substr in str[start:end] using the given search direction or -1 if not found. -2 is returned in case an error occurred and an exception is set.
    """

While looking that up, I noticed that PyUnicode_Count is similarly misdocumented. There may well be others.
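The three return conditions quoted from unicodeobject.h can be sketched as a small classifier. This is only an illustration under stated assumptions: the enum find_outcome and the function classify_find_result are hypothetical names, not part of CPython. Only the numeric codes (-1 for "not found", -2 for "error with an exception set", otherwise a position) come from the header comment.

```c
#include <assert.h>

/* Hypothetical names for illustration; not part of the CPython API. */
enum find_outcome { FIND_MATCH, FIND_NO_MATCH, FIND_ERROR };

/* Map a raw result code onto the three cases from the header comment. */
enum find_outcome classify_find_result(int res)
{
    if (res == -2)
        return FIND_ERROR;      /* an error occurred and an exception is set */
    if (res == -1)
        return FIND_NO_MATCH;   /* substr does not occur in str[start:end] */
    assert(res >= 0);           /* anything else should be a valid position */
    return FIND_MATCH;          /* res is the position of the match */
}
```

A caller would test the -2 case first and propagate the pending exception, then distinguish -1 from a real position. Position 0 is a legitimate match in this scheme.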
msg11136	Author: Fred Drake (fdrake)	Date: 2002-06-20 22:09
Logged In: YES user_id=3066 Fixed in: Doc/api/concrete.tex 1.17, 1.6.6.5. Doc/api/refcounts.dat 1.42, 1.38.6.3.

History
Date	User	Action	Args
2022-04-10 16:05:24	admin	set	github: 36715
2002-06-10 00:25:09	ajung	create