This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Problem running a subprocess
Type: Stage:
Components: Library (Lib) Versions: Python 2.5
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: astrand Nosy List: astrand, frougon
Priority: normal Keywords:

Created on 2007-01-13 15:46 by frougon, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
extract_frame.c frougon, 2007-01-13 15:46 Source code for the C program
Makefile frougon, 2007-01-13 15:49 Trivial Makefile to compile the C program
extract_frame.py frougon, 2007-01-13 15:49 Python program
extract_frame.output frougon, 2007-01-13 15:50 Output of the C program with the first .mpg file
extract_frame.output2 frougon, 2007-01-13 15:51 Output of the C program with the second .mpg file
extract_frame.py.output frougon, 2007-01-13 15:52 Output of the Python program with the first .mpg file
extract_frame.py.output2 frougon, 2007-01-13 15:52 Output of the Python program with the second .mpg file
Messages (12)
msg31009 - (view) Author: Florent Rougon (frougon) Date: 2007-01-13 15:46
Hello,

I have a problem running a subprocess from Python (see below). I first ran into it with the subprocess module, but it's also triggered by a simple os.fork() followed by os.execvp().

So, what is the problem, exactly? I have written the exact same minimal program in C and in Python, which uses fork() and execvp() in the most straightforward way to run the following command:

  transcode -i /tmp/file.mpg -c 100-101 -o snapshot -y im,null -F png

(whose effect is to extract the 100th frame of /tmp/file.mpg and store it into snapshot.png)

The C program runs fast with no error, while the one in Python takes from 60 to 145 times longer (!), and triggers error messages from transcode. This shouldn't happen, since both programs are merely calling transcode in the same way to perform the exact same thing.

Experiments
------------

1. First, I run the C program (extract_frame) twice on a first .mpg file (MPEG 2 PS) [the first time fills the block IO cache], and store the output in extract_frame.output:

  % time ./extract_frame >extract_frame.output 2>&1
  ./extract_frame > extract_frame.output 2>& 1  0.82s user 0.33s system 53% cpu 2.175 total
  % time ./extract_frame >extract_frame.output 2>&1
  ./extract_frame > extract_frame.output 2>& 1  0.79s user 0.29s system 96% cpu 1.118 total

Basically, this takes 1 or 2 seconds. extract_frame.output is attached.

Second, I run the Python program (extract_frame.py) on the same .mpg file, and store the output in extract_frame.py.output:

  % time ./extract_frame.py >extract_frame.py.output 2>&1 
  ./extract_frame.py > extract_frame.py.output 2>& 1  81.59s user 25.98s system 66% cpu 2:42.51 total

This takes more than 2 *minutes*, not seconds!
(of course, the system is idle for all tests)

In extract_frame.py.output, the following error message appears quickly after the process is started:

  failed to write Y plane of frame(demuxer.c) write program stream packet: Broken pipe

which is in fact composed of two error messages, the second one starting at "(demuxer.c)".

Once these messages are printed, the transcode subprocesses[1] seem to hang (with relatively high CPU usage), but eventually complete, after 2 minutes or so.

There are no such error messages in extract_frame.output.

2. Same test with another .mpg file. As far as time is concerned, we have the same problem:

  [C program]
  % time ./extract_frame >extract_frame.output2 2>&1     
  ./extract_frame > extract_frame.output2 2>& 1  0.73s user 0.28s system 43% cpu 2.311 total

  [Python program]
  % time ./extract_frame.py >extract_frame.py.output2 2>&1
  ./extract_frame.py > extract_frame.py.output2 2>& 1  92.84s user 12.20s system 76% cpu 2:18.14 total

We also get the first error message in extract_frame.py.output2:

  failed to write Y plane of frame

when running extract_frame.py, but this time, we do *not* have the second error message:

  (demuxer.c) write program stream packet: Broken pipe

All this is reproducible with Python 2.3, 2.4 and 2.5 (Debian packages in sarge for 2.3 and 2.4, vanilla Python 2.5).

  % python2.5 -c 'import sys; print sys.version; print "%x" % sys.hexversion'
  2.5 (r25:51908, Jan  5 2007, 17:35:09) 
  [GCC 3.3.5 (Debian 1:3.3.5-13)]
  20500f0

  % transcode --version
  transcode v1.0.2 (C) 2001-2003 Thomas Oestreich, 2003-2004 T. Bitterberg

I'd hazard that Python is tweaking some process or threading parameter that is inherited by subprocesses and disturbs transcode, which doesn't happen when calling fork() and execvp() from a C program, but am unfortunately unable to precisely diagnose the problem.

Many thanks for considering.

Regards,

Florent

  [1] Plural because the transcode process spawns several childs: tcextract, tcdemux, etc.
msg31010 - (view) Author: Florent Rougon (frougon) Date: 2007-01-13 15:49
File Added: Makefile
msg31011 - (view) Author: Florent Rougon (frougon) Date: 2007-01-13 15:49
File Added: extract_frame.py
msg31012 - (view) Author: Florent Rougon (frougon) Date: 2007-01-13 15:50
File Added: extract_frame.output
msg31013 - (view) Author: Florent Rougon (frougon) Date: 2007-01-13 15:51
File Added: extract_frame.output2
msg31014 - (view) Author: Florent Rougon (frougon) Date: 2007-01-13 15:52
File Added: extract_frame.py.output
msg31015 - (view) Author: Florent Rougon (frougon) Date: 2007-01-13 15:52
File Added: extract_frame.py.output2
msg31016 - (view) Author: Peter Åstrand (astrand) * (Python committer) Date: 2007-01-13 22:14
The first thing to check is if the subprocesses have different sets up file descriptors when you launch them from Python and C, respectively. On Linux, do /proc/$thepid/fd in both cases and compare the output.  Does it matter if you use close_fds=1?
msg31017 - (view) Author: Florent Rougon (frougon) Date: 2007-01-13 23:37
Hi Peter,

At the very beginning, it seems the fds are the same in the child processes running transcode in each implementation (C, Python).

With the C version, I got:

total 5
dr-x------  2 flo users  0 2007-01-14 00:12 .
dr-xr-xr-x  4 flo users  0 2007-01-14 00:12 ..
lrwx------  1 flo users 64 2007-01-14 00:12 0 -> /dev/pts/0
l-wx------  1 flo users 64 2007-01-14 00:12 1 -> /home/flo/tmp/transcode-test/extract_frame.output
l-wx------  1 flo users 64 2007-01-14 00:12 2 -> /home/flo/tmp/transcode-test/extract_frame.output
lr-x------  1 flo users 64 2007-01-14 00:12 3 -> pipe:[41339]
lr-x------  1 flo users 64 2007-01-14 00:12 4 -> pipe:[41340]

With the Python version, I got:

total 5
dr-x------  2 flo users  0 2007-01-14 00:05 .
dr-xr-xr-x  4 flo users  0 2007-01-14 00:05 ..
lrwx------  1 flo users 64 2007-01-14 00:05 0 -> /dev/pts/0
l-wx------  1 flo users 64 2007-01-14 00:05 1 -> /home/flo/tmp/transcode-test/extract_frame.py.output
l-wx------  1 flo users 64 2007-01-14 00:05 2 -> /home/flo/tmp/transcode-test/extract_frame.py.output
lr-x------  1 flo users 64 2007-01-14 00:05 3 -> pipe:[40641]
lr-x------  1 flo users 64 2007-01-14 00:05 4 -> pipe:[40642]

That's the only thing I managed to get with the C version. But with the Python version, if I don't list the contents of /proc/<pid>/fd immediately after the transcode process started, I get this instead:

total 3
dr-x------  2 flo users  0 2007-01-14 00:05 .
dr-xr-xr-x  4 flo users  0 2007-01-14 00:05 ..
lrwx------  1 flo users 64 2007-01-14 00:05 0 -> /dev/pts/0
l-wx------  1 flo users 64 2007-01-14 00:05 1 -> /home/flo/tmp/transcode-test/extract_frame.py.output
l-wx------  1 flo users 64 2007-01-14 00:05 2 -> /home/flo/tmp/transcode-test/extract_frame.py.output

No pipes anymore. Only the 3 standard fds.

Note: I performed these tests with the .mpg file that does *not* cause the "Broken pipe" message to appear; therefore, the broken pipe in question is probably unrelated to those we saw disappear in this experiment (transcode launches several processes such as tcdecode, tcextract, etc. all communicating via pipes; I suppose the "Broken pipe" message shows up when one of these programs fails, for reasons we have yet to discover).

Regarding your mentioning of close_fds, if I am not mistaken, it's only an optional argument of subrocess.Popen(). I did try to set it to True when first running into the problem, and it didn't help. But now, I am using basic fork() and execvp() (see the attachments), so there is no such close_fds option, right?

Thanks.

Florent
msg31018 - (view) Author: Peter Åstrand (astrand) * (Python committer) Date: 2007-01-21 15:37
>That's the only thing I managed to get with the C version. But with the
>Python version, if I don't list the contents of /proc/<pid>/fd immediately
>after the transcode process started,

I find it very hard to believe that just listing the contents of a kernel-virtual directory can change the behaviour of an application. I think it's much more likely that you have a timing issue.  

Since nothing indicates that there's actually a problem with the subprocess module, I'm closing this bug for now. After all, it's transcode that runs slowly and gives errors. This suggests that the problem is actually in transcode rather than in the Python subprocess module. 

Please re-open this bug if you find any indication of that it's actually subprocess that does something wrong. 
msg31019 - (view) Author: Florent Rougon (frougon) Date: 2007-01-21 16:24
I never wrote that it was the listing of /proc/<pid>/fd that was changing the behavior of transcode. Please don't put words in my mouth. I wrote that some fds are open soon after the transcode process is started, and quickly closed afterwards, when run from the Python test script.

The rest of your answer again shows that you didn't read the bug report. I'll repeat a last time. The title of this bug report is "Problem running a subprocess". It is *not* "Problem with subprocess.py", although it of course happens with subprocess.py, since this module relies (on POSIX operating systems) on os.fork() and os.exec*().

Yes, I could have reported the problem against subprocess.py, since the problem does exist there. But I tried to be a good citizen and do my part of the job---to the point where I wasn't able to go further. I figured out the problem existed in the basic building blocks of subprocess.py, i.e. os.fork() and os.exec*(), and thus spared you the time needed to find this out on your own.

I wrote and attached minimal example programs that reproduce the bug. These programs show that, with a particular program (transcode), a simple fork() + execvp() works fine in C but does not work in Python. *That* is a problem for the Python Library.

Finally, to justify the closing of this bug, you wrote:

  "After all, it's transcode that runs slowly and gives errors. This suggests that the problem is actually in transcode rather than in the Python subprocess module."

No, no, no. Transcode works perfectly fine when launched by my shell or the test program in C that I attached to this bug report. It is when run from Python that the aforementioned problems happen.

You also wrote in the end:

  "Please re-open this bug if you find any indication of that it's actually subprocess that does something wrong."

Of course it's a problem in subprocess, since it's not able to run a program the same way as a simple fork() + execvp() does in C... because of the underlying machinery (fork() and execvp() wrappers, particular settings of the Python process---I don't know), not because of subprocess.py, AFAICT. Hence the title I used for the report.

But I won't reopen the bug myself: there is no point in doing so if you don't read the initial report nor the comments added later. If you're willing to read the reports you're claiming to take care of, reopen it yourself. Otherwise, please stop wasting my time and rendering the Python BTS useless.
msg31020 - (view) Author: Peter Åstrand (astrand) * (Python committer) Date: 2007-01-21 20:22
>I never wrote that it was the listing of /proc/<pid>/fd that was changing
>the behavior of transcode. Please don't put words in my mouth. I wrote that
>some fds are open soon after the transcode process is started, and quickly
>closed afterwards, when run from the Python test script.

Sorry about that, I did misunderstand you. 

>I wrote and attached minimal example programs that reproduce the bug.

The problem is that although extract_frame.py is short and "minimal", it relies on the transcode program, which is a very complex piece of software. I cannot reproduce your problem on my machine (but I haven't tried very hard). 

>I wrote and attached minimal example programs that reproduce the bug.
>These programs show that, with a particular program (transcode), a simple
>fork() + execvp() works fine in C but does not work in Python. *That* is a
>problem for the Python Library.

Again, there's no clear evidence that there's actually some problem with the Python library or even Python itself. Python is a very mature software. The fork() and execvp() are very heavily used. It's unlikely that there's something fundamentally wrong with them - if so, problems would probably have turned up for other subprocesses as well, not just transcode. It might be some minor difference between your C program and Python. But: It's your job to point out what you think it's wrong. 

>Transcode works perfectly fine when launched by my shell or the test program in C that I attached to this bug report.

This doesn't prove that Python is guilty. This only proves that you have failed to reproduce the problem from your C test program and the shell. It might still be a bug in transcode that's only triggered by some corner case or timing issue. Say, a race condition. 


>Of course it's a problem in subprocess, since it's not able to run a
>program the same way as a simple fork() + execvp() does in C.

From what you have described, the program is executed exactly in the same way from Python as from your C program (wrt open file descriptors after launch, program arguments etc). 

>Otherwise, please stop wasting my time and rendering the Python BTS useless.

The problem with me wasting your time can easily be solved, I'll just stop trying to help you...
History
Date User Action Args
2022-04-11 14:56:22adminsetgithub: 44458
2007-01-13 15:46:45frougoncreate