This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: PyUnicode_Find() returns wrong results
Type:
Stage:
Components: Documentation
Versions:

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: fdrake
Nosy List: ajung, fdrake, lemburg, tim.peters
Priority: normal
Keywords:

Created on 2002-06-10 00:25 by ajung, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Messages (8)
msg11129 - (view) Author: Andreas Jung (ajung) Date: 2002-06-10 00:25
The following is used to search for u'h' inside
the string u'+#&.' (stored in self->seperator)

static PyObject *
Splitter_test(Splitter *self) {
    PyObject *test, *res, *word;

    test = Py_BuildValue("s","h");
    word = PyUnicode_FromEncodedObject(test, "ascii", "strict");

    printf("seperator: ");
    PyObject_Print(self->seperator, stdout, 0);
    puts("");

    printf("Searching for: ");
    PyObject_Print(word, stdout, 0);
    puts("");

    res = PyUnicode_Find(self->seperator, word, 0,
                         PyUnicode_GET_SIZE(self->seperator), 1);
    if (res==0) puts("not found");
    else puts("found");

}

The output is: 

yetix(366)% python2.1 test.py
seperator: u'+#&.'
Searching for: u'h'
found

So PyUnicode_Find() returns a match although
u'h' is not contained within self->seperator.

Env: Linux 2.4, Python 2.1.3



msg11130 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2002-06-10 09:17
You should check what res actually is rather than testing
for == 0. Note that res == -1 means "not found", while
res == 0 means a match was found at position 0.
msg11131 - (view) Author: Andreas Jung (ajung) Date: 2002-06-10 10:45
So what does PyUnicode_Find() return?

From the docs:
PyObject* PyUnicode_Find(PyObject *str, PyObject *substr,
int start, int end, int direction)
    Return value: New reference.
Return the first position of substr in str[start:end] using
the given direction (direction == 1 means to do a forward
search, direction == -1 a backward search), 0 otherwise. 

Does it return an int with the position or a PyObject * ?

-aj
msg11132 - (view) Author: Andreas Jung (ajung) Date: 2002-06-10 10:48
Looking at the source, this is a documentation bug, since
PyUnicode_Find() actually returns an int.

Can you assign this issue to Fred to fix the docs *wink*?

-aj
msg11133 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2002-06-10 10:52
The docs should say it's an int.
msg11134 - (view) Author: Andreas Jung (ajung) Date: 2002-06-10 10:53
Fred, could you please fix the docs for PyUnicode_Find()?
The return value is an int instead of a PyObject *. Also -1
is returned for "not found" instead of 0.
msg11135 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-06-10 11:09
Thanks for spotting this, Andreas!

Changed Category to Docs.

Fred, the correct return conditions are documented in 
unicodeobject.h:

"""
Return the first position of substr in str[start:end] using the
given search direction or -1 if not found. -2 is returned in
case  an error occurred and an exception is set.
"""

While looking that up, I noticed that PyUnicode_Count is 
similarly misdocumented.  There may well be others.
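
To illustrate the return convention quoted above, here is a minimal sketch in plain C. It does not call the Python C API: find_char() and describe_result() are hypothetical stand-ins, invented for this example, that mimic the documented convention (position >= 0 on a match, -1 for "not found", -2 for an error). The point is only that the result has to be compared against -1 and -2, not against 0.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a PyUnicode_Find()-style search (not the real
 * API). Returns the index of the first occurrence of needle in
 * haystack[0:len], -1 if it is not found, and -2 on error (modelled here
 * as a NULL haystack). */
long find_char(const char *haystack, size_t len, char needle)
{
    const char *hit;

    if (haystack == NULL)
        return -2;                      /* error */
    hit = memchr(haystack, needle, len);
    return hit ? (long)(hit - haystack) : -1;
}

/* Interpret the result the way the header comment prescribes. A test of
 * the form "res == 0" would instead report "not found" for a match at
 * position 0 and "found" for the -1 result. */
const char *describe_result(long res)
{
    if (res == -2)
        return "error";
    if (res == -1)
        return "not found";
    return "found";                     /* res >= 0, including position 0 */
}
```

With these stand-ins, searching "+#&." for 'h' gives -1 ("not found"), whereas the res==0 test in the original report prints "found" for that same value.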
msg11136 - (view) Author: Fred Drake (fdrake) (Python committer) Date: 2002-06-20 22:09
Fixed in:
Doc/api/concrete.tex 1.17, 1.6.6.5.
Doc/api/refcounts.dat 1.42, 1.38.6.3.
History
Date                 User   Action  Args
2022-04-10 16:05:24  admin  set     github: 36715
2002-06-10 00:25:09  ajung  create