This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Decimal performance enhancements
Type: Stage:
Components: Library (Lib) Versions: Python 2.4
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: facundobatista Nosy List: facundobatista, ncoghlan, rhettinger
Priority: normal Keywords: patch

Created on 2004-09-02 00:08 by ncoghlan, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
telco_nofiles.py ncoghlan, 2004-09-02 00:11 A version of telco.py without file I/O
Decimal Versions.tar.gz ncoghlan, 2004-09-16 12:17 4 different patches for your enjoyment
Messages (12)
msg46829 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2004-09-02 00:08
The attached patch implements a collection of
performance enhancements for decimal.py. They are
currently focused on the performance of the telco
benchmark in the sandbox (i.e. they aim to speed up
instantiation, addition, multiplication and quantization).

The main strategy used (a subclass for NaN and infinite
values) can be extended to encompass more methods in
order to improve performance in other situations.

There is one non-performance related change included -
the '_sign' attribute is renamed to '_isneg' to better
reflect the boolean nature of the property.
msg46830 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2004-09-02 00:11
Logged In: YES 
user_id=1038590

I forgot to add that this is NOT yet a production quality
patch. The doc strings, in particular, need work. I'm mainly
posting it so Raymond can have a look at it.
msg46831 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-09-02 05:18
Logged In: YES 
user_id=80475

Nice job.  The timings do improve and the approach is
reasonable.

Please go ahead to the next step and polish it up.

Also, please change _isneg back to _sign.  That matches the
terminology in the spec, the machine centric sign bit
viewpoint of 754R, and the components of the user visible
as_tuple() method.

The goal is to make the patch as minimal as possible, to
keep the API unchanged, and improve performance.

Using the __new__ method was smart.  That more closely
corresponds to a potential C implementation of an immutable
type.
msg46832 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-09-07 05:05
Logged In: YES 
user_id=80475

Facundo, do you have time to look at this one?

We need to:

* revert the _isneg variable name change

* find out why it gives a 2:1 gain on telco but only 10% on
test_decimal.py

* make sure the API has not changed in a way that will make
it more difficult someday to convert this into C and make it
a built-in.
msg46833 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2004-09-07 21:06
Logged In: YES 
user_id=1038590

With the _sign/_isneg thing, I figured out the reason it was
hurting my brain - I was failing to think of the value as
representing a two's complement sign bit. Once I thought of
it that way, it made sense with the original terminology.

I suspect the lesser speedup on test_decimal.py may be due
to the fact that the current version of the patch only
optimises the operations used by the telco benchmark
(instantiation, addition, multiplication, quantisation). As
we speed up other operations, the performance on the unit
tests should improve.

As far as the API itself goes, the major differences I am
aware of are the existence of the new subclass, and the use
of __new__ instead of __init__. That will presumably be a
bit more work to implement in C since another class is
needed. Do we need to document somewhere that Decimal(x) may
return an instance of a subclass of Decimal() rather than an
instance of Decimal itself?

My current plan is to produce a second version of this patch
that:

1. Starts again from current CVS
2. Leaves '_sign' alone
3. Leaves existing doc strings alone (I broke a couple of
them in this version of patch - I hadn't really thought
about how to handle them properly)
4. For overridden methods in the _Special_Value class,
duplicate the docstring from the baseclass with "f.__doc__ =
Decimal.f.__doc__". This should avoid giving different
answers for "help(Decimal(1).__add__)" and
"help(Decimal('NaN').__add__)".
5. Optimises all operations, not just the four used by the
telco benchmark (with the new strategy in place, this is
mostly legwork - cutting and pasting the special-value
related stuff to the new subclass, and rearranging
appropriately. Unfortunately, it's legwork that I can't see
a sensible way to automate, so it's a matter of taking the
time to do it, and then checking that the tests all still pass).

msg46834 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-09-07 21:59
Logged In: YES 
user_id=80475

Glad to hear you've wrapped your head around the sign bit. 
FWIW, it bugged me too.  But it is so closely tied to the
spec that it is better to adjust our mental model than
change the code.

Do extend your optimizations to the other functions.  Don't
just optimize the benchmark -- that defeats the purpose.  

Can you work together with Facundo on this one?
msg46835 - (view) Author: Facundo Batista (facundobatista) * (Python committer) Date: 2004-09-07 23:47
Logged In: YES 
user_id=752496

Nick:

Submit the second version of the patch and I'll review it.
Also feel free to contact me by mail anytime.
msg46836 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2004-09-08 23:45
Logged In: YES 
user_id=1038590

2nd version of patch attached. It does NOT use a subclassing
strategy for optimisation due to the following charming
little problem I found with that approach:

>>> import decimal
>>> class foo(decimal.Decimal): pass
>>> x = foo("NaN")
>>> isinstance(x, foo)
False

I couldn't find any sane way to get around this, and it
seemed to be a showstopper given the amount of work that has
gone into making all the builtin types easily subclassable.
The new patch instead relies mainly on the following techniques:
  - defer invocation of getcontext() as late as possible, so
that we only call it if the context is going to be referenced
  - use issubclass(x, basestring) to identify if there are
special cases that need handling, then use _is_nan and
_is_infinity to determine exactly which special case applies
(this slows the special cases down slightly, but speeds up
the normal cases a lot)
  - make _convert_other a helper function instead of a method

There are also some other minor changes that were present in
the original patch (use sum() in __nonzero__, reduce the
number of calls to list.reversed() in __add__, extract
digits directly from integer inputs with divmod, rearrange
_fixexponents to minimise context method calls, cache
results of method calls in various operations)

Facundo, two particular changes I'd like you to OK are:
 - _fix now indents the second call to _fixexponents to be
inside the if block (in CVS, _fix_exponents could get called
again directly on the result of the previous call to
_fix_exponents, which didn't seem to make any sense)
  - _fixexponents returns immediately for NaN as well as
infinite values

This version of patch appears to give around a 40% speedup
on direct execution of test_decimal.py, and all the tests
still pass for me.
msg46837 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2004-09-16 12:17
Logged In: YES 
user_id=1038590

I've separated the performance enhancements out into four
different potential patches now:
  - "_odds_and_ends"
  - "_is_special"
  - "_both" (combination of the above two patches)
  - "_workint" (also incorporates "_both")

A single tar.gz file is attached which contains all those in
.diff and .py format, as well as a text file detailing the
changes made in each patch. All diff's are directly against
CVS. test_decimal.py passes cleanly with all four patches.

They're listed above in order of increasing degree of change
from current CVS. Hopefully I've gotten rid of any doc
string changes, and non-performance related code changes
(some comments have changed, usually because associated code
has changed)

The _workint version is the only version which regularly
achieves < 20 s wall clock time on my machine for the unit
tests and < 60 s for "telco_nofiles.py 10000" (that is, one
hundred thousand decimal values). The _both version seems to
run around 5 seconds slower on both tests (I am assuming
this is due to the fact that _workint speeds up division
significantly more than it speeds up addition and
multiplication, thus having a much more pronounced effect on
the unit tests than on the division-free telco benchmark)

Here are the details on the different patches:

decimal_odds_and_ends:
  - cache the result of repeated method calls in:
      __cmp__
      _rescale
      _check_nans
      _fix_exponents
  - use sum() in __nonzero__
  - call list.reverse() on preliminary result, not operands
in __add__
  - avoid redundant _fix_exponents call in _fix
  - rearrange _fix_exponents to minimise method calls on any
given path
  - remove dead code from end of __int__
  
decimal_is_special:
  - convert to use __new__ instead of __init__
  - new attribute _is_special is True if value is infinite
or NaN
  - defer invocation of getcontext() as late as possible
  - make _convert_other a function instead of a method
  - modify all methods to take advantage of _is_special
  
decimal_both:
  - combines decimal_is_special & decimal_odds_and_ends
  
decimal_workint:
  - based on decimal_both
  - uses a numeric value for _WorkRep.int instead of a list
  - modifies __new__, __add__ and _divide accordingly
  - modifies __mul__ to use a _WorkRep

(Sorry Raymond, I couldn't quite resist experimenting with
borrowing int/long's C level arithmetic implementation!)

I'll see about getting some wall-clock numbers for the
*actual* million-value telco benchmark with all 5 versions
(CVS + the four patches).  
msg46838 - (view) Author: Facundo Batista (facundobatista) * (Python committer) Date: 2004-09-17 01:50
Logged In: YES 
user_id=752496

Raymond, Nick:

I only reviewed the "both" version. From the patch text, the
changes were pretty logical, but they were a *lot*, so I
didn't want to extend the change to "_workint" version.
It'll always be time.

The majority of changes are because of the added attribute
"_is_special", which shortcuts in an obvious manner a lot of
code, and also makes possibly some interesting conceptual
changes (and some other changes were not simple and implied
good knowledge of the code and decimal internals).

There're also a lot of changes such as executing once a
function, binding its result in a name and using that name
several times later (which effectively reduced the quantity
of function calls). And trimmed several "getcontext()" when
the only thing that was done was passing context to other
function (not used there).

The changed decimal was OK in the tests, showing a 14% speed up.

I'm +1 to commit the "both" version. Checking the code, I
saw some other optimizations, but I don't want to introduce
new changes without commiting these first.

Nick, congratulations, you certainly made a good work! (btw,
don't upload files with spaces in the name)

Raymond, do you want to commit the code or should I do it?

.   Facundo
msg46839 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-09-17 03:37
Logged In: YES 
user_id=80475

I'll look at it in the next few day.
msg46840 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-09-19 01:54
Logged In: YES 
user_id=80475

Checking in Nick's best version as Lib/decimal.py 1.24

Made a minor modification to the workrep __init__ code.

Facunodo, feel free to checkin your own optimizations.  Make
sure they are clear both in concept and code.  Also, make
sure they show measurable speedups in both telco and the
test suite.
History
Date User Action Args
2022-04-11 14:56:06adminsetgithub: 40856
2004-09-02 00:08:07ncoghlancreate