This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Fast-path for interned string compares
Type: Stage:
Components: Interpreter Core Versions:
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: lemburg Nosy List: arigo, lemburg, loewis, nascheme, tim.peters
Priority: normal Keywords: patch

Created on 2001-11-08 15:19 by lemburg, last changed 2022-04-10 16:04 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
str-eq-cmp.patch lemburg, 2001-11-08 15:19 Patch
str-eq-cmp2.patch nascheme, 2002-03-23 23:35
Messages (8)
msg38120 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2001-11-08 15:19
This patch adds a fast-path for comparing equality of interned strings. 

The patch boosts performance for comparing identical string objects 
by some 20% on my machine while not causing any noticable slow-down 
for other operations (according to tests done with pybench).

More infos and benchmarks later...
msg38121 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2001-11-08 15:26
Logged In: YES 
user_id=38388

Output from pybench comparing today's CVS Python with patch (eqpython) and without patch (stdpython):

PYBENCH 1.0

Benchmark: eqpython.bench (rounds=10, warp=20)

Tests:                              per run    per oper.    diff *)
------------------------------------------------------------------------
          BuiltinFunctionCalls:     125.55 ms    0.98 us    -1.68%
           BuiltinMethodLookup:     180.10 ms    0.34 us    +1.75%
                 CompareFloats:     107.30 ms    0.24 us    +2.04%
         CompareFloatsIntegers:     185.15 ms    0.41 us    -0.05%
               CompareIntegers:     163.50 ms    0.18 us    -1.77%

        CompareInternedStrings:      79.50 ms    0.16 us   -20.78%
^^^^^^^^^^^^^^^^^^^^ This is the interesting line :-) ^^^^^^^^^^^^^^^^^^^^^^^^^^

                  CompareLongs:     110.25 ms    0.24 us    +0.09%
                CompareStrings:     143.40 ms    0.29 us    +2.14%
                CompareUnicode:     118.00 ms    0.31 us    +1.68%
                 ConcatStrings:     189.55 ms    1.26 us    -1.61%
                 ConcatUnicode:     226.55 ms    1.51 us    +1.34%
               CreateInstances:     202.35 ms    4.82 us    -1.87%
       CreateStringsWithConcat:     221.00 ms    1.11 us    +0.45%
       CreateUnicodeWithConcat:     240.00 ms    1.20 us    +1.27%
                  DictCreation:     213.25 ms    1.42 us    +0.47%
             DictWithFloatKeys:     263.50 ms    0.44 us    +1.15%
           DictWithIntegerKeys:     158.50 ms    0.26 us    -1.86%
            DictWithStringKeys:     147.60 ms    0.25 us    +0.75%
                      ForLoops:     144.90 ms   14.49 us    -4.64%
                    IfThenElse:     174.15 ms    0.26 us    -0.00%
                   ListSlicing:      88.80 ms   25.37 us    -1.11%
                NestedForLoops:     136.95 ms    0.39 us    +3.01%
          NormalClassAttribute:     177.80 ms    0.30 us    -2.68%
       NormalInstanceAttribute:     166.85 ms    0.28 us    -0.54%
           PythonFunctionCalls:     152.20 ms    0.92 us    +1.40%
             PythonMethodCalls:     133.70 ms    1.78 us    +1.60%
                     Recursion:     119.45 ms    9.56 us    +0.04%
                  SecondImport:     124.65 ms    4.99 us    -6.03%
           SecondPackageImport:     130.70 ms    5.23 us    -5.73%
         SecondSubmoduleImport:     161.65 ms    6.47 us    -5.88%
       SimpleComplexArithmetic:     245.50 ms    1.12 us    +2.08%
        SimpleDictManipulation:     108.50 ms    0.36 us    +0.05%
         SimpleFloatArithmetic:     125.80 ms    0.23 us    +0.84%
      SimpleIntFloatArithmetic:     128.50 ms    0.19 us    -1.46%
       SimpleIntegerArithmetic:     128.45 ms    0.19 us    -0.77%
        SimpleListManipulation:     159.15 ms    0.59 us    -5.32%
          SimpleLongArithmetic:     189.55 ms    1.15 us    +2.65%
                    SmallLists:     293.70 ms    1.15 us    -5.26%
                   SmallTuples:     230.00 ms    0.96 us    +0.44%
         SpecialClassAttribute:     175.70 ms    0.29 us    -2.79%
      SpecialInstanceAttribute:     199.70 ms    0.33 us    -1.55%
                StringMappings:     196.85 ms    1.56 us    -2.48%
              StringPredicates:     133.00 ms    0.48 us    -8.28%
                 StringSlicing:     165.45 ms    0.95 us    -3.47%
                     TryExcept:     193.60 ms    0.13 us    +0.57%
                TryRaiseExcept:     175.40 ms   11.69 us    +0.69%
                  TupleSlicing:     156.85 ms    1.49 us    -0.00%
               UnicodeMappings:     175.90 ms    9.77 us    +1.76%
             UnicodePredicates:     141.35 ms    0.63 us    +0.78%
             UnicodeProperties:     184.35 ms    0.92 us    -2.10%
                UnicodeSlicing:     179.45 ms    1.03 us    -1.10%
------------------------------------------------------------------------
            Average round time:    9855.00 ms               -1.13%

*) measured against: stdpython.bench (rounds=10, warp=20)

As you can see, the rest of the results don't change much and the
ones that do indicate some additional benefit gained by the patch.
All slow-downs are way below the noise limit of around 5-10% (depending
the platforms/machine/compiler).

msg38122 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2002-03-23 23:35
Logged In: YES 
user_id=35752

Attached is an updated version of this patch.  I'm -0 on it
since it doesn't seem to help much except for artificial
benchmarks.
msg38123 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-08-08 21:07
Logged In: YES 
user_id=21627

Is there any progress on this patch, or should it be
considered withdrawn?
msg38124 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2002-08-08 21:16
Logged In: YES 
user_id=38388

I still consider the patch worth adding. The application space
where it helps may be small, but also important: it can
massively
speed up parsers which use interned strings as tokens.
msg38125 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2002-11-12 13:25
Logged In: YES 
user_id=4771

It seems to me that the whole status of interned strings is
not clear from the user's perspective.  Maybe we should
avoid putting more emphasis on it.  Deprecating intern() in
favor of sys.intern() would even look like a good thing to
do.

Besides, in the use case you describe, you can compare
tokens with "is" instead of "==" as you know for sure that
you are comparing two explicitely interned strings.  That's
a hack, but calling intern() in the first place already
looks like a hack.

I'd vote against it, but if the patch is accepted don't
forget to change the constants EQ, LE,... into PyCmp_EQ,
PyCmp_LE,...
msg38126 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-11-14 17:11
Logged In: YES 
user_id=31435

Marc, as Armin suggests, why is it that you can't use "is" in 
your code when you know you've got this special case?

I'm inclined to reject this patch.  While it gives a significant 
speedup in the specific CompareInternedStrings benchmark, 
by eyeball it adds Yet Antoher test and branch to all other 
non special-case compares, and I'm not inclined to believe 
that comparing interned strings should be favored at the 
expense of, e.g., slowing float compares, or compares of non-
interned strings, or etc etc.

I have to note too that the measured 21% speedup on a 
benchmark that does nothing else doesn't support a claim 
of "massive speedups".  At best it looks like a small win for a 
small class of apps, at the expense of smaller losses for 
much larger classes of apps.
msg38127 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2002-11-14 17:17
Logged In: YES 
user_id=38388

I could use "is" in my code (and in fact, I am currently), but
consider this a hack.

Anyway, the PEP 0275 has a much wider scope, so I'll
close this patch as rejected in the hope that PEP 275
will make it into the core.
History
Date User Action Args
2022-04-10 16:04:37adminsetgithub: 35491
2001-11-08 15:19:38lemburgcreate