This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: subprocess.py: O(N**2) bottleneck
Type: Stage:
Components: Library (Lib) Versions: Python 2.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: astrand Nosy List: astrand, klaas, nnorwitz, rwgk
Priority: normal Keywords:

Created on 2006-11-17 06:40 by rwgk, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
subprocess_py_25_patch rwgk, 2006-11-17 06:40 patch
slow.py rwgk, 2006-11-17 06:43 demo
Messages (7)
msg30571 - (view) Author: Ralf W. Grosse-Kunstleve (rwgk) Date: 2006-11-17 06:40
subprocess.py (Python 2.5, current SVN, probably all versions) contains this O(N**2) code:

  bytes_written = os.write(self.stdin.fileno(), input[:512])
  input = input[bytes_written:]

For large but reasonable "input" the second line is rate limiting. Luckily, it is very easy to remove this bottleneck. I'll upload a simple patch. Below is a small script that demonstrates the huge speed difference. The output on my machine is:

creating input
0.888417959213
slow slicing input
61.1553330421
creating input
0.863168954849
fast slicing input
0.0163860321045
done

The numbers are times in seconds.

This is the source:

import time
import sys
size = 1000000
t0 = time.time()
print "creating input"
input = "\n".join([str(i) for i in xrange(size)])
print time.time()-t0
t0 = time.time()
print "slow slicing input"
n_out_slow = 0
while True:
  out = input[:512]
  n_out_slow += 1
  input = input[512:]
  if not input:
    break
print time.time()-t0
t0 = time.time()
print "creating input"
input = "\n".join([str(i) for i in xrange(size)])
print time.time()-t0
t0 = time.time()
print "fast slicing input"
n_out_fast = 0
input_done = 0
while True:
  out = input[input_done:input_done+512]
  n_out_fast += 1
  input_done += 512
  if input_done >= len(input):
    break
print time.time()-t0
assert n_out_fast == n_out_slow
print "done"
msg30572 - (view) Author: Ralf W. Grosse-Kunstleve (rwgk) Date: 2006-11-17 06:43
Sorry, I didn't know the tracker would destroy the indentation.
I'm uploading the demo source as a separate file.
msg30573 - (view) Author: Mike Klaas (klaas) Date: 2007-01-04 18:20
I reviewed the patch--the proposed fix looks good.  Minor comments:
  - "input_done" sounds like a flag, not a count of written bytes
  - buffer() could be used to avoid the 512-byte copy created by the slice
msg30574 - (view) Author: Peter Åstrand (astrand) * (Python committer) Date: 2007-01-07 14:36
Fixed in trunk revision 53295. Is this a good candidate for backporting to 25-maint?
msg30575 - (view) Author: Ralf W. Grosse-Kunstleve (rwgk) Date: 2007-01-07 15:15
Thanks for the fixes!
msg30576 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2007-01-17 07:00
Peter this is fine for 2.5.1.  Please apply and update Misc/NEWS. Thanks!
msg30577 - (view) Author: Peter Åstrand (astrand) * (Python committer) Date: 2007-01-21 15:45
Backported to 2.5, in rev. 53513.
History
Date User Action Args
2022-04-11 14:56:21adminsetgithub: 44244
2006-11-17 06:40:35rwgkcreate