Issue918462
Created on 2004-03-18 01:50 by skip.montanaro, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
ceval.diff | skip.montanaro, 2004-03-18 01:50 |
Messages (8) | |||
---|---|---|---|
msg45594 - (view) | Author: Skip Montanaro (skip.montanaro) * | Date: 2004-03-18 01:50 | |
All this "is" vs "==" discussion led me to look at ceval.c. The attached patch seems to speed up "is" and "is not" comparisons - saving a function call to do a simple pointer comparison for non-integer arguments. The test suite passes, but it's been quite a while since I messed around with the interpreter code, so I thought I ought to have another pair of eyeballs check it out... |
|||
msg45595 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-03-20 17:45 | |
Well, there's little question that this will speed "is" and "is not", but it also slows all other cases by the cost of the switch-and-branch to determine that they're not the favored cases. So why should we believe that speeding "is" and "is not" is more important than slowing other cases? |
|||
msg45596 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2004-03-20 18:27 | |
Even "is" and "is not" are not helped by more than a couple of cycles. This fragment essentially inlines part of the code for cmp_outcome(). Only the function call is saved. It does slow down other code paths by introducing an unpredictable branch. If the inlining were considered important, then the whole of cmp_outcome() should be inlined. Then, all comparisons save a single call/return pair. The cost is further increasing the size of the eval loop. |
|||
msg45597 - (view) | Author: Skip Montanaro (skip.montanaro) * | Date: 2004-03-21 14:33 | |
I spent a fair amount of time yesterday refining and running a shell script (attached) to compare the before and after times for various comparisons of simple objects. Here's the output:

    s = 'abc'
    operation      before  after   delta   %chg
    ---------      ------  -----   -----   ----
    s is 'abc'      0.375  0.329   0.046  -12.3
    s == 'abc'      0.491  0.493  -0.002    0.4
    s > 'abc'       0.491  0.493  -0.002    0.4
    s is 4          0.375  0.333   0.042  -11.2
    s == 4          1.200  1.190   0.010   -0.8
    s > 4           1.200  1.190   0.010   -0.8
    s is -1001      0.378  0.332   0.046  -12.2
    s == -1001      1.200  1.190   0.010   -0.8
    s > -1001       1.200  1.180   0.020   -1.7
    s is 34.7       0.370  0.325   0.045  -12.2
    s == 34.7       1.620  1.590   0.030   -1.9
    s > 34.7        1.600  1.590   0.010   -0.6
    s is 'a b c'    0.369  0.328   0.041  -11.1
    s == 'a b c'    0.475  0.476  -0.001    0.2
    s > 'a b c'     0.559  0.563  -0.004    0.7
    s is True       0.531  0.491   0.040   -7.5
    s == True       1.400  1.390   0.010   -0.7
    s > True        1.400  1.380   0.020   -1.4

    s = 4
    operation      before  after   delta   %chg
    ---------      ------  -----   -----   ----
    s is 'abc'      0.369  0.325   0.044  -11.9
    s == 'abc'      1.200  1.190   0.010   -0.8
    s > 'abc'       1.200  1.190   0.010   -0.8
    s is 4          0.353  0.353   0.000    0.0
    s == 4          0.352  0.355  -0.003    0.9
    s > 4           0.354  0.350   0.004   -1.1
    s is -1001      0.347  0.350  -0.003    0.9
    s == -1001      0.350  0.353  -0.003    0.9
    s > -1001       0.346  0.345   0.001   -0.3
    s is 34.7       0.367  0.327   0.040  -10.9
    s == 34.7       0.773  0.769   0.004   -0.5
    s > 34.7        0.771  0.772  -0.001    0.1
    s is 'a b c'    0.370  0.327   0.043  -11.6
    s == 'a b c'    1.200  1.190   0.010   -0.8
    s > 'a b c'     1.200  1.190   0.010   -0.8
    s is True       0.534  0.492   0.042   -7.9
    s == True       0.905  0.911  -0.006    0.7
    s > True        0.904  0.913  -0.009    1.0

    s = None
    operation      before  after   delta   %chg
    ---------      ------  -----   -----   ----
    s is 'abc'      0.368  0.327   0.041  -11.1
    s == 'abc'      0.962  0.950   0.012   -1.2
    s > 'abc'       0.959  0.955   0.004   -0.4
    s is 4          0.371  0.332   0.039  -10.5
    s == 4          0.932  0.922   0.010   -1.1
    s > 4           0.936  0.927   0.009   -1.0
    s is -1001      0.370  0.330   0.040  -10.8
    s == -1001      0.932  0.923   0.009   -1.0
    s > -1001       0.935  0.925   0.010   -1.1
    s is 34.7       0.368  0.325   0.043  -11.7
    s == 34.7       1.110  1.110   0.000    0.0
    s > 34.7        1.110  1.110   0.000    0.0
    s is 'a b c'    0.370  0.325   0.045  -12.2
    s == 'a b c'    0.963  0.948   0.015   -1.6
    s > 'a b c'     0.961  0.949   0.012   -1.2
    s is True       0.529  0.490   0.039   -7.4
    s == True       1.110  1.110   0.000    0.0
    s > True        1.120  1.110   0.010   -0.9

    s = -1000
    operation      before  after   delta   %chg
    ---------      ------  -----   -----   ----
    s is 'abc'      0.371  0.326   0.045  -12.1
    s == 'abc'      1.200  1.190   0.010   -0.8
    s > 'abc'       1.200  1.190   0.010   -0.8
    s is 4          0.349  0.350  -0.001    0.3
    s == 4          0.347  0.353  -0.006    1.7
    s > 4           0.349  0.347   0.002   -0.6
    s is -1001      0.348  0.352  -0.004    1.1
    s == -1001      0.349  0.352  -0.003    0.9
    s > -1001       0.346  0.348  -0.002    0.6
    s is 34.7       0.366  0.326   0.040  -10.9
    s == 34.7       0.769  0.771  -0.002    0.3
    s > 34.7        0.766  0.777  -0.011    1.4
    s is 'a b c'    0.367  0.328   0.039  -10.6
    s == 'a b c'    1.210  1.190   0.020   -1.7
    s > 'a b c'     1.200  1.190   0.010   -0.8
    s is True       0.536  0.490   0.046   -8.6
    s == True       0.887  0.887   0.000    0.0
    s > True        0.890  0.892  -0.002    0.2

    s = 34.2
    operation      before  after   delta   %chg
    ---------      ------  -----   -----   ----
    s is 'abc'      0.369  0.327   0.042  -11.4
    s == 'abc'      1.630  1.620   0.010   -0.6
    s > 'abc'       1.640  1.620   0.020   -1.2
    s is 4          0.372  0.332   0.040  -10.8
    s == 4          0.791  0.795  -0.004    0.5
    s > 4           0.797  0.798  -0.001    0.1
    s is -1001      0.375  0.331   0.044  -11.7
    s == -1001      0.792  0.792   0.000    0.0
    s > -1001       0.790  0.791  -0.001    0.1
    s is 34.7       0.367  0.482  -0.115   31.3
    s == 34.7       1.080  0.536   0.544  -50.4
    s > 34.7        0.560  0.621  -0.061   10.9
    s is 'a b c'    0.387  0.337   0.050  -12.9
    s == 'a b c'    1.760  1.710   0.050   -2.8
    s > 'a b c'     1.710  1.680   0.030   -1.8
    s is True       0.614  0.509   0.105  -17.1
    s == True       1.050  1.020   0.030   -2.9
    s > True        1.060  1.020   0.040   -3.8

    s = 'a b c'
    operation      before  after   delta   %chg
    ---------      ------  -----   -----   ----
    s is 'abc'      0.379  0.345   0.034   -9.0
    s == 'abc'      0.542  0.494   0.048   -8.9
    s > 'abc'       0.586  0.593  -0.007    1.2
    s is 4          0.430  0.344   0.086  -20.0
    s == 4          1.260  1.230   0.030   -2.4
    s > 4           1.370  1.230   0.140  -10.2
    s is -1001      0.431  0.372   0.059  -13.7
    s == -1001      1.250  1.640  -0.390   31.2
    s > -1001       1.240  1.260  -0.020    1.6
    s is 34.7       0.383  0.337   0.046  -12.0
    s == 34.7       1.770  1.680   0.090   -5.1
    s > 34.7        1.670  1.660   0.010   -0.6
    s is 'a b c'    0.423  0.376   0.047  -11.1
    s == 'a b c'    0.506  0.510  -0.004    0.8
    s > 'a b c'     0.517  0.564  -0.047    9.1
    s is True       0.550  0.514   0.036   -6.5
    s == True       1.470  1.640  -0.170   11.6
    s > True        1.450  1.430   0.020   -1.4

    s = object()
    operation      before  after   delta   %chg
    ---------      ------  -----   -----   ----
    s is 'abc'      0.389  0.379   0.010   -2.6
    s == 'abc'      1.220  1.370  -0.150   12.3
    s > 'abc'       1.220  2.600  -1.380  113.1
    s is 4          0.427  0.349   0.078  -18.3
    s == 4          1.080  1.620  -0.540   50.0
    s > 4           1.060  1.070  -0.010    0.9
    s is -1001      0.437  0.343   0.094  -21.5
    s == -1001      1.070  1.130  -0.060    5.6
    s > -1001       1.060  1.090  -0.030    2.8
    s is 34.7       0.419  0.338   0.081  -19.3
    s == 34.7       1.710  1.520   0.190  -11.1
    s > 34.7        1.520  1.540  -0.020    1.3
    s is 'a b c'    0.380  0.347   0.033   -8.7
    s == 'a b c'    2.020  1.210   0.810  -40.1
    s > 'a b c'     1.260  1.210   0.050   -4.0
    s is True       0.622  0.515   0.107  -17.2
    s == True       1.220  1.220   0.000    0.0
    s > True        1.210  1.210   0.000    0.0

    s = []
    operation      before  after   delta   %chg
    ---------      ------  -----   -----   ----
    s is 'abc'      0.369  0.326   0.043  -11.7
    s == 'abc'      1.220  1.200   0.020   -1.6
    s > 'abc'       1.220  1.200   0.020   -1.6
    s is 4          0.372  0.332   0.040  -10.8
    s == 4          1.160  1.150   0.010   -0.9
    s > 4           1.150  1.150   0.000    0.0
    s is -1001      0.371  0.334   0.037  -10.0
    s == -1001      1.150  1.140   0.010   -0.9
    s > -1001       1.150  1.150   0.000    0.0
    s is 34.7       0.368  0.326   0.042  -11.4
    s == 34.7       1.500  1.480   0.020   -1.3
    s > 34.7        1.490  1.490   0.000    0.0
    s is 'a b c'    0.366  0.325   0.041  -11.2
    s == 'a b c'    1.220  1.200   0.020   -1.6
    s > 'a b c'     1.220  1.200   0.020   -1.6
    s is True       0.531  0.484   0.047   -8.9
    s == True       1.360  1.350   0.010   -0.7
    s > True        1.350  1.350   0.000    0.0

I fully expected that the "is" tests would be faster and without question the "==" and ">" tests would be slower. I was quite surprised that this wasn't always the case. The above tests were run on an 800MHz Powerbook G4 running Mac OSX 10.2.8. I don't have immediate access to Intel hardware, though I'll try to run these tests on cygwin this week. I'd be happy to be shown that my shell script isn't measuring what I think it's measuring as well. Skip |
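The shell script referred to in this message is not reproduced on this page. As a rough illustration of how such numbers can be gathered, here is a minimal sketch using the standard `timeit` module. The helper names (`best_time`, `measure`, `table_row`) are hypothetical, and unlike the original test the right-hand operand is bound to a variable `t` in the setup code rather than written as a literal, so the absolute timings are not comparable with the tables in this message. `table_row` follows the column convention those tables appear to use: delta is before minus after, and %chg is relative to the before time.

```python
import timeit

OPS = ("is", "==", ">")


def best_time(stmt, setup, number=100_000, repeat=5):
    """Return the fastest of `repeat` timings, or None if the operation fails."""
    try:
        return min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))
    except TypeError:
        # On Python 3, ordering unrelated types (e.g. str > int) raises TypeError.
        return None


def measure(left, rights, number=100_000):
    """Time `s <op> t` for every operator in OPS and every operand in `rights`.

    `left` and the items of `rights` are Python source strings, e.g. "'abc'".
    """
    results = {}
    for right in rights:
        setup = f"s = {left}; t = {right}"
        for op in OPS:
            results[(right, op)] = best_time(f"s {op} t", setup, number)
    return results


def table_row(before, after):
    """Compute the delta and %chg columns for one before/after pair."""
    delta = before - after
    pct = (after - before) / before * 100.0
    return round(delta, 3), round(pct, 1)


if __name__ == "__main__":
    for (right, op), secs in measure("'abc'", ["'abc'", "4", "34.7", "True"]).items():
        shown = "n/a" if secs is None else f"{secs:.3f}"
        print(f"s {op} {right:<8} {shown}")
```

Running the same script against a patched and an unpatched interpreter and feeding each pair of times to `table_row` would give rows in the same shape as the tables in this message.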
|||
msg45598 - (view) | Author: Skip Montanaro (skip.montanaro) * | Date: 2004-03-22 21:53 | |
I reran the test on a Linux system today and got similar results. I'm pasting them here mostly as documentation. I'm still a bit confused why the == and > tests should show improvement, but they often do on both platforms. Any ideas? Looking at the generated assembly code, GCC inserts basically the same four instructions on both the Intel and PowerPC platforms:

    cmpl $8, -40(%ebp)
    je .L580
    cmpl $9, -40(%ebp)
    je .L583

on Intel or

    cmpwi cr7,r24,8
    beq- cr7,L622
    cmpwi cr7,r24,9
    beq- cr7,L625

on PowerPC. I also tried pystone. I see performance hits on both Linux and Mac OSX:

    Fastest of ten runs
                patched  unpatched
    Linux       37878.8    38167.9
    Mac OSX     13888.9    14124.3

Oh well... It was a thought. Test output on Linux:

    s = 'abc'
    operation      before  after   delta   %chg
    ---------      ------  -----   -----   ----
    s is 'abc'      0.116  0.103   0.013  -11.2
    s == 'abc'      0.145  0.141   0.004   -2.8
    s > 'abc'       0.140  0.142  -0.002    1.4
    s is 4          0.139  0.121   0.018  -12.9
    s == 4          0.271  0.293  -0.022    8.1
    s > 4           0.276  0.273   0.003   -1.1
    s is -1001      0.126  0.120   0.006   -4.8
    s == -1001      0.270  0.272  -0.002    0.7
    s > -1001       0.282  0.275   0.007   -2.5
    s is 34.7       0.133  0.119   0.014  -10.5
    s == 34.7       0.352  0.343   0.009   -2.6
    s > 34.7        0.340  0.344  -0.004    1.2
    s is 'a b c'    0.135  0.118   0.017  -12.6
    s == 'a b c'    0.159  0.157   0.002   -1.3
    s > 'a b c'     0.200  0.201  -0.001    0.5
    s is True       0.177  0.170   0.007   -4.0
    s == True       0.316  0.318  -0.002    0.6
    s > True        0.321  0.321   0.000    0.0

    s = 4
    operation      before  after   delta   %chg
    ---------      ------  -----   -----   ----
    s is 'abc'      0.143  0.120   0.023  -16.1
    s == 'abc'      0.266  0.285  -0.019    7.1
    s > 'abc'       0.270  0.276  -0.006    2.2
    s is 4          0.175  0.103   0.072  -41.1
    s == 4          0.105  0.105   0.000    0.0
    s > 4           0.106  0.107  -0.001    0.9
    s is -1001      0.119  0.119   0.000    0.0
    s == -1001      0.119  0.119   0.000    0.0
    s > -1001       0.121  0.178  -0.057   47.1
    s is 34.7       0.127  0.129  -0.002    1.6
    s == 34.7       0.201  0.195   0.006   -3.0
    s > 34.7        0.193  0.197  -0.004    2.1
    s is 'a b c'    0.212  0.125   0.087  -41.0
    s == 'a b c'    0.268  0.271  -0.003    1.1
    s > 'a b c'     0.269  0.276  -0.007    2.6
    s is True       0.196  0.160   0.036  -18.4
    s == True       0.239  0.258  -0.019    7.9
    s > True        0.265  0.237   0.028  -10.6

    s = None
    operation      before  after   delta   %chg
    ---------      ------  -----   -----   ----
    s is 'abc'      0.120  0.109   0.011   -9.2
    s == 'abc'      0.203  0.204  -0.001    0.5
    s > 'abc'       0.206  0.206   0.000    0.0
    s is 4          0.119  0.110   0.009   -7.6
    s == 4          0.217  0.214   0.003   -1.4
    s > 4           0.214  0.220  -0.006    2.8
    s is -1001      0.120  0.107   0.013  -10.8
    s == -1001      0.207  0.207   0.000    0.0
    s > -1001       0.207  0.214  -0.007    3.4
    s is 34.7       0.122  0.112   0.010   -8.2
    s == 34.7       0.274  0.270   0.004   -1.5
    s > 34.7        0.272  0.271   0.001   -0.4
    s is 'a b c'    0.148  0.128   0.020  -13.5
    s == 'a b c'    0.240  0.242  -0.002    0.8
    s > 'a b c'     0.206  0.210  -0.004    1.9
    s is True       0.162  0.153   0.009   -5.6
    s == True       0.267  0.262   0.005   -1.9
    s > True        0.284  0.258   0.026   -9.2

    s = -1000
    operation      before  after   delta   %chg
    ---------      ------  -----   -----   ----
    s is 'abc'      0.218  0.128   0.090  -41.3
    s == 'abc'      0.274  0.275  -0.001    0.4
    s > 'abc'       0.264  0.301  -0.037   14.0
    s is 4          0.125  0.120   0.005   -4.0
    s == 4          0.123  0.122   0.001   -0.8
    s > 4           0.119  0.121  -0.002    1.7
    s is -1001      0.123  0.123   0.000    0.0
    s == -1001      0.132  0.123   0.009   -6.8
    s > -1001       0.121  0.121   0.000    0.0
    s is 34.7       0.130  0.215  -0.085   65.4
    s == 34.7       0.199  0.197   0.002   -1.0
    s > 34.7        0.194  0.236  -0.042   21.6
    s is 'a b c'    0.158  0.140   0.018  -11.4
    s == 'a b c'    0.294  0.293   0.001   -0.3
    s > 'a b c'     0.302  0.300   0.002   -0.7
    s is True       0.190  0.161   0.029  -15.3
    s == True       0.234  0.232   0.002   -0.9
    s > True        0.238  0.234   0.004   -1.7

    s = 34.2
    operation      before  after   delta   %chg
    ---------      ------  -----   -----   ----
    s is 'abc'      0.133  0.120   0.013   -9.8
    s == 'abc'      0.338  0.330   0.008   -2.4
    s > 'abc'       0.350  0.338   0.012   -3.4
    s is 4          0.126  0.121   0.005   -4.0
    s == 4          0.194  0.197  -0.003    1.5
    s > 4           0.193  0.196  -0.003    1.6
    s is -1001      0.132  0.120   0.012   -9.1
    s == -1001      0.293  0.193   0.100  -34.1
    s > -1001       0.196  0.190   0.006   -3.1
    s is 34.7       0.117  0.105   0.012  -10.3
    s == 34.7       0.153  0.153   0.000    0.0
    s > 34.7        0.156  0.155   0.001   -0.6
    s is 'a b c'    0.152  0.138   0.014   -9.2
    s == 'a b c'    0.360  0.398  -0.038   10.6
    s > 'a b c'     0.334  0.354  -0.020    6.0
    s is True       0.171  0.174  -0.003    1.8
    s == True       0.248  0.254  -0.006    2.4
    s > True        0.247  0.244   0.003   -1.2

    s = 'a b c'
    operation      before  after   delta   %chg
    ---------      ------  -----   -----   ----
    s is 'abc'      0.137  0.117   0.020  -14.6
    s == 'abc'      0.157  0.158  -0.001    0.6
    s > 'abc'       0.204  0.201   0.003   -1.5
    s is 4          0.131  0.119   0.012   -9.2
    s == 4          0.269  0.272  -0.003    1.1
    s > 4           0.277  0.277   0.000    0.0
    s is -1001      0.153  0.146   0.007   -4.6
    s == -1001      0.299  0.294   0.005   -1.7
    s > -1001       0.299  0.302  -0.003    1.0
    s is 34.7       0.153  0.146   0.007   -4.6
    s == 34.7       0.374  0.368   0.006   -1.6
    s > 34.7        0.342  0.336   0.006   -1.8
    s is 'a b c'    0.140  0.118   0.022  -15.7
    s == 'a b c'    0.150  0.158  -0.008    5.3
    s > 'a b c'     0.160  0.156   0.004   -2.5
    s is True       0.193  0.194  -0.001    0.5
    s == True       0.345  0.338   0.007   -2.0
    s > True        0.318  0.319  -0.001    0.3

    s = object()
    operation      before  after   delta   %chg
    ---------      ------  -----   -----   ----
    s is 'abc'      0.158  0.143   0.015   -9.5
    s == 'abc'      0.298  0.294   0.004   -1.3
    s > 'abc'       0.288  0.292  -0.004    1.4
    s is 4          0.129  0.121   0.008   -6.2
    s == 4          0.249  0.250  -0.001    0.4
    s > 4           0.248  0.249  -0.001    0.4
    s is -1001      0.151  0.152  -0.001    0.7
    s == -1001      0.271  0.266   0.005   -1.8
    s > -1001       0.284  0.271   0.013   -4.6
    s is 34.7       0.152  0.140   0.012   -7.9
    s == 34.7       0.364  0.385  -0.021    5.8
    s > 34.7        0.429  0.392   0.037   -8.6
    s is 'a b c'    0.152  0.138   0.014   -9.2
    s == 'a b c'    0.300  0.297   0.003   -1.0
    s > 'a b c'     0.288  0.285   0.003   -1.0
    s is True       0.192  0.184   0.008   -4.2
    s == True       0.325  0.329  -0.004    1.2
    s > True        0.324  0.322   0.002   -0.6

    s = []
    operation      before  after   delta   %chg
    ---------      ------  -----   -----   ----
    s is 'abc'      0.126  0.121   0.005   -4.0
    s == 'abc'      0.266  0.285  -0.019    7.1
    s > 'abc'       0.273  0.271   0.002   -0.7
    s is 4          0.125  0.119   0.006   -4.8
    s == 4          0.269  0.269   0.000    0.0
    s > 4           0.268  0.274  -0.006    2.2
    s is -1001      0.133  0.121   0.012   -9.0
    s == -1001      0.269  0.291  -0.022    8.2
    s > -1001       0.271  0.269   0.002   -0.7
    s is 34.7       0.132  0.124   0.008   -6.1
    s == 34.7       0.332  0.362  -0.030    9.0
    s > 34.7        0.339  0.336   0.003   -0.9
    s is 'a b c'    0.125  0.119   0.006   -4.8
    s == 'a b c'    0.268  0.291  -0.023    8.6
    s > 'a b c'     0.275  0.273   0.002   -0.7
    s is True       0.171  0.164   0.007   -4.1
    s == True       0.317  0.315   0.002   -0.6
    s > True        0.338  0.316   0.022   -6.5
|
|||
msg45599 - (view) | Author: Tim Peters (tim.peters) * | Date: 2004-03-23 05:15 | |
When you introduce a new branch, and time it in isolation, HW may have enough resource to optimize for both branch targets simultaneously. Run a ton of other stuff too, though, and then it can start to lose. Still, for detailed answers about anything at this level, you need to use a HW simulator -- modern processors are intractably complex, and the user-visible programming model supplied by Pentium in particular is multiple layers removed from bottom-line reality now, so much so that Intel doesn't even try to supply "instruction timings" anymore (they depend in complex ways on the internal states of resources that aren't visible in the programming model). |
|||
msg45600 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2004-03-23 07:26 | |
I'm pretty sure that this is a false optimization because the time saved in the function call is being offset by the extra unpredictable branch for the other tests. Even if those others are losing 1% while either "is" or "is not" gain 10%, the comparisons are not apt. The total time for rich compares is so long that 1% represents much more real time than 1% of an is/is not test. Also, the results need to be considered in aggregate with real times (not percentages) and appropriate frequency weighting (if known). For example:

    IS occurs 100 times saving 9 microsec each time
    ISNOT occurs 70 times saving 9 microsec each time
    EQ occurs 700 times costing 4 microsec each time
    NE occurs 50 times costing 4 microsec each time
    LT occurs 100 times costing 4 microsec each time
    --> weighted result 1.8 microsec lost

Of course, this can't be done exactly or even inexactly, but it shows that the percentages can't be considered out of the context of dynamic usage frequency, aggregations of all the operators, and real time. If something like this patch needs to go in, consider making the branches predictable:

    slow_compare:
        if (oparg == PyCmp_IS) {
            x = (v == w) ? Py_True : Py_False;
            Py_INCREF(x);
        }
        else if (oparg == PyCmp_IS_NOT) {
            x = (v != w) ? Py_True : Py_False;
            Py_INCREF(x);
        }
        else
            x = cmp_outcome(oparg, v, w);

Also, when it comes to micro-optimizations that are compiler sensitive, the Intel timing tests should be built with the compiler actually used to build the distribution (no sense convincing ourselves of an optimization that doesn't occur on the real distribution). |
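The frequency-weighted example in the message above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the illustrative counts and per-occurrence costs quoted there (they are hypothetical figures, not measurements); the names `PROFILE` and `net_change` are made up for the illustration. Summing gives 1870 microsec lost over 1020 comparisons, or about 1.8 microsec per comparison, which appears to be where the "1.8 microsec lost" figure comes from.

```python
# (operator, occurrences, change in microseconds per occurrence).
# Negative values are time saved by the patch, positive values are time lost.
PROFILE = [
    ("IS", 100, -9),
    ("ISNOT", 70, -9),
    ("EQ", 700, 4),
    ("NE", 50, 4),
    ("LT", 100, 4),
]


def net_change(profile):
    """Return (total change, average change per comparison) in microseconds."""
    total = sum(count * delta for _, count, delta in profile)
    comparisons = sum(count for _, count, _ in profile)
    return total, total / comparisons


total_us, per_op_us = net_change(PROFILE)
print(f"net change: {total_us} microsec over all comparisons")
print(f"average:    {per_op_us:.1f} microsec per comparison")
```

Changing the counts in `PROFILE` shows how strongly the verdict depends on the assumed operator mix, which is the point the message makes about weighting by dynamic usage frequency.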
|||
msg45601 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2006-02-20 10:59 | |
This was not demonstrated to produce any real speed-up. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:03 | admin | set | github: 40044 |
2004-03-18 01:50:49 | skip.montanaro | create |