Jump to content


Photo

Error handling in eConsoleAppContainer


  • Please log in to reply
49 replies to this topic

#1 prl

  • Senior Member
  • 36 posts

+2
Neutral

Posted 13 November 2018 - 02:45

There are some problems with error handling in eConsoleAppContainer.

  1. If the execvp() call in bidirpipe() returns (e.g. fails with ENOENT or EPERM), the child process prints "[eConsoleAppContainer] Finished <cmdname>" to the debug log when it was actually not possible to run the command, and exits the child process with status 0, when it should exit the child with non-zero status. Exiting with status 127 would be compatible with the status returned from "sh -c" when the command isn't found.
  2. If the child is terminated with a signal, the appClosed() PSignal1 is invoked with a retval of 0, unless the process was terminated by a SIGKILL that was sent by eConsoleAppContainer::kill(). If the child is terminated by a SIGKILL from any other source, retval is 0. Similarly retval is 0 if the child is terminated by SIGINT, even if the SIGINT was sent by eConsoleAppContainer::sendCtrlC().

I propose that if the execvp() call in bidirpipe() returns, the message printed to the debug lok should be something like "[eConsoleAppContainer] <cmdname> failed to execute: <perror string>", and the call of _exit() shoud be changed from _exit(0) to _exit(127).

 

I think that the retval argument to the appClosed() PSignal1 should be the child's exit status if the child exited normally, and -signalvalue if the child was terminated by a signal. I have run through most of the uses of eConsoleAppContainer in the Beyonwiz enigma2 code and I can't see any problem with this change (retval is mostly ignored or tested for being non-zero). The only issue I see is if there is code that explicitly tests for retval == -1 to detect termination of the child by eConsoleAppContainer::kill().



Re: Error handling in eConsoleAppContainer #2 betacentauri

  • PLi® Core member
  • 7,185 posts

+323
Excellent

Posted 15 November 2018 - 14:36

To 1: Why return 127? I see no added value to use 127. Just return that an error occured.

I would propose something like this (untested!!):

diff --git a/lib/base/console.cpp b/lib/base/console.cpp
index 3e97200..74acf73 100644
--- a/lib/base/console.cpp
+++ b/lib/base/console.cpp
@@ -15,6 +15,7 @@ int bidirpipe(int pfd[], const char *cmd , const char * const argv[], const char
        int pfdout[2]; /* from parent to child */
        int pfderr[2]; /* stderr from child to parent */
        int pid;       /* child's pid */
+       int ret;
 
        if ( pipe(pfdin) == -1 || pipe(pfdout) == -1 || pipe(pfderr) == -1)
                return(-1);
@@ -41,8 +42,13 @@ int bidirpipe(int pfd[], const char *cmd , const char * const argv[], const char
                if (cwd && chdir(cwd) < 0)
                        eDebug("[eConsoleAppContainer] failed to change directory to %s (%m)", cwd);
 
-               execvp(cmd, (char * const *)argv);
+               ret = execvp(cmd, (char * const *)argv);
                                /* the vfork will actually suspend the parent thread until execvp is called. thus it's ok to use the sh
+               if (ret == -1)
+               {
+                       // print error message
+                       return ret;
+               }
                eDebug("[eConsoleAppContainer] Finished %s", cmd);
                _exit(0);
        }
@@ -115,7 +121,6 @@ int eConsoleAppContainer::execute(const char *cmdline, const char * const argv[]
        pid = bidirpipe(fd, cmdline, argv, m_cwd.empty() ? 0 : m_cwd.c_str());
 
        if ( pid == -1 ) {
-               eDebug("[eConsoleAppContainer] failed to start %s", cmdline);
                return -3;
        }

Xtrend ET-9200, ET-8000, ET-10000, OpenPliPC on Ubuntu 12.04

Re: Error handling in eConsoleAppContainer #3 betacentauri

  • PLi® Core member
  • 7,185 posts

+323
Excellent

Posted 15 November 2018 - 14:49

To 2: You mean using WIFSIGNALED and WTERMSIG to get the signal and return this?
 Is the return value important? Why not just use WIFSIGNALED to know that a signal terminated the child and then return -1 to be compatible with the current code?
Xtrend ET-9200, ET-8000, ET-10000, OpenPliPC on Ubuntu 12.04

Re: Error handling in eConsoleAppContainer #4 Erik Slagter

  • PLi® Core member
  • 46,951 posts

+541
Excellent

Posted 15 November 2018 - 17:55

Is it relevant to know whether the proces was aborted due to a signal or due to an exit? Personally I don't think so, the only thing that's relevant is whether it succeeded or not. Please note that stuff like segmentation faults and division by zero conditions are also delivered as signal (although they mostly mean there is a bug in the program).

 

I agree the current code can't work, it won't detect a process resulting in an error. It will only detect a failed fork (e.g. due to insufficient resources). I don't know what bidirpipe does here though. Does it only create a bidirectional pipe, like the name suggests, or does it also system/exec/etc a new process? That would be strange, because that's already done at this point? Why then is the "cmdline" and "argv" passed?

 

BTW the Linux manual says: The use of vfork() is unnecessary on Linux and also prone to all sorts of race conditions and undefined behaviour. I'd say, get rid of it and use fork() instead and secure defined behaviour. The parent will need to fetch the pid of the child though, and wait() for it.


* Wavefrontier T90 with 28E/23E/19E/13E via SCR switches 2 x 2 x 6 user bands
I don't read PM -> if you have something to ask or to report, do it in the forum so others can benefit. I don't take freelance jobs.
Ik lees geen PM -> als je iets te vragen of te melden hebt, doe het op het forum, zodat anderen er ook wat aan hebben.


Re: Error handling in eConsoleAppContainer #5 MiLo

  • PLi® Core member
  • 14,042 posts

+298
Excellent

Posted 15 November 2018 - 18:48

BTW the Linux manual says: The use of vfork() is unnecessary on Linux and also prone to all sorts of race conditions and undefined behaviour. I'd say, get rid of it and use fork() instead and secure defined behaviour. The parent will need to fetch the pid of the child though, and wait() for it.


The reason that vfork is being used here is that fork() will typically run out of address space. Enigma2, if it has been running for a few weeks, has allocated a huge amount of memory pages, and a fork() at that point will run out of address space because of that. That's why python's os.system() is also unreliable, it also uses the normal fork, and thus it usually works when you quickly test it, but it'll invariably fail on systems that have been up and running for a long time.
Real musicians never die - they just decompose

Re: Error handling in eConsoleAppContainer #6 littlesat

  • PLi® Core member
  • 56,123 posts

+685
Excellent

Posted 15 November 2018 - 19:38

Openatv is full of os.system ... ;)

WaveFrontier 28.2E | 23.5E | 19.2E | 16E | 13E | 10/9E | 7E | 5E | 1W | 4/5W | 15W


Re: Error handling in eConsoleAppContainer #7 WanWizard

  • PLi® Core member
  • 68,309 posts

+1,719
Excellent

Posted 15 November 2018 - 19:39

Openatv is full of os.system ... ;)

 

Who cares? ;)


Currently in use: VU+ Duo 4K (2xFBC S2), VU+ Solo 4K (1xFBC S2), uClan Usytm 4K Pro (S2+T2), Octagon SF8008 (S2+T2), Zgemma H9.2H (S2+T2)

Due to my bad health, I will not be very active at times and may be slow to respond. I will not read the forum or PM on a regular basis.

Many answers to your question can be found in our new and improved wiki.


Re: Error handling in eConsoleAppContainer #8 littlesat

  • PLi® Core member
  • 56,123 posts

+685
Excellent

Posted 15 November 2018 - 19:48

I care because of merge requests which has them inside so I have to deny... and when I clarify I get the defend it works.... :( it is so easy but wrong to do it...

Edited by littlesat, 15 November 2018 - 19:49.

WaveFrontier 28.2E | 23.5E | 19.2E | 16E | 13E | 10/9E | 7E | 5E | 1W | 4/5W | 15W


Re: Error handling in eConsoleAppContainer #9 WanWizard

  • PLi® Core member
  • 68,309 posts

+1,719
Excellent

Posted 15 November 2018 - 19:55

It takes some effort (and sometimes abuse) to do the right thing. But in the end, we're in charge. ;)


Currently in use: VU+ Duo 4K (2xFBC S2), VU+ Solo 4K (1xFBC S2), uClan Usytm 4K Pro (S2+T2), Octagon SF8008 (S2+T2), Zgemma H9.2H (S2+T2)

Due to my bad health, I will not be very active at times and may be slow to respond. I will not read the forum or PM on a regular basis.

Many answers to your question can be found in our new and improved wiki.


Re: Error handling in eConsoleAppContainer #10 littlesat

  • PLi® Core member
  • 56,123 posts

+685
Excellent

Posted 15 November 2018 - 19:58

That is right... but at the end I can completely redo it... (like e.g. flash image via the ui....)

WaveFrontier 28.2E | 23.5E | 19.2E | 16E | 13E | 10/9E | 7E | 5E | 1W | 4/5W | 15W


Re: Error handling in eConsoleAppContainer #11 Erik Slagter

  • PLi® Core member
  • 46,951 posts

+541
Excellent

Posted 15 November 2018 - 20:26

 

BTW the Linux manual says: The use of vfork() is unnecessary on Linux and also prone to all sorts of race conditions and undefined behaviour. I'd say, get rid of it and use fork() instead and secure defined behaviour. The parent will need to fetch the pid of the child though, and wait() for it.


The reason that vfork is being used here is that fork() will typically run out of address space. Enigma2, if it has been running for a few weeks, has allocated a huge amount of memory pages, and a fork() at that point will run out of address space because of that. That's why python's os.system() is also unreliable, it also uses the normal fork, and thus it usually works when you quickly test it, but it'll invariably fail on systems that have been up and running for a long time.

Following your reasoning enigma2 has some huge memory leaks. I guess we'll have to address that then.

 

Read the manual, it says Linux uses copy-on-write so only the descriptors are copied, not the pages themselves. Linux has been doing that for ages. So as long as you follow the fork() by an immediate exec(), the netto result is the same, but then without the great risk on ondefined behaviours and race conditions. The use of vfork() is discouraged on Linux.


* Wavefrontier T90 with 28E/23E/19E/13E via SCR switches 2 x 2 x 6 user bands
I don't read PM -> if you have something to ask or to report, do it in the forum so others can benefit. I don't take freelance jobs.
Ik lees geen PM -> als je iets te vragen of te melden hebt, doe het op het forum, zodat anderen er ook wat aan hebben.


Re: Error handling in eConsoleAppContainer #12 Erik Slagter

  • PLi® Core member
  • 46,951 posts

+541
Excellent

Posted 15 November 2018 - 20:27

That is right... but at the end I can completely redo it... (like e.g. flash image via the ui....)

No, they should do it, not you.


* Wavefrontier T90 with 28E/23E/19E/13E via SCR switches 2 x 2 x 6 user bands
I don't read PM -> if you have something to ask or to report, do it in the forum so others can benefit. I don't take freelance jobs.
Ik lees geen PM -> als je iets te vragen of te melden hebt, doe het op het forum, zodat anderen er ook wat aan hebben.


Re: Error handling in eConsoleAppContainer #13 littlesat

  • PLi® Core member
  • 56,123 posts

+685
Excellent

Posted 15 November 2018 - 21:15

They don’t because it works... this week I did such other thing like hide parental controlled bouquets... that was not on the eConsole, but just an example why we have some features later than others...
Now on topic... is there really something bad about the eConsoleAppContainer

Edited by littlesat, 15 November 2018 - 21:18.

WaveFrontier 28.2E | 23.5E | 19.2E | 16E | 13E | 10/9E | 7E | 5E | 1W | 4/5W | 15W


Re: Error handling in eConsoleAppContainer #14 samsamsam

  • Senior Member
  • 2,024 posts

+146
Excellent

Posted 15 November 2018 - 22:10

@Erik Slagter

 

You are wrong. 
 
Remember three things: 
- fork is slower then vfork
- not only system RAM memory pages can be mapped (you can map file, registers and, so on, also same physical RAM pages could be mapped many times by different processes)
- huge amount of mapped pages could be caused by memory fragmentation not by memory leak
 
vfor() is not evil at all, it have some disadvantages, but what does not have them?
 

Edited by samsamsam, 15 November 2018 - 22:11.


Re: Error handling in eConsoleAppContainer #15 prl

  • Senior Member
  • 36 posts

+2
Neutral

Posted 16 November 2018 - 00:32

 

To 1: Why return 127? I see no added value to use 127. Just return that an error occured.

I would propose something like this (untested!!):

diff --git a/lib/base/console.cpp b/lib/base/console.cpp
index 3e97200..74acf73 100644
--- a/lib/base/console.cpp
+++ b/lib/base/console.cpp
@@ -15,6 +15,7 @@ int bidirpipe(int pfd[], const char *cmd , const char * const argv[], const char
        int pfdout[2]; /* from parent to child */
        int pfderr[2]; /* stderr from child to parent */
        int pid;       /* child's pid */
+       int ret;
 
        if ( pipe(pfdin) == -1 || pipe(pfdout) == -1 || pipe(pfderr) == -1)
                return(-1);
@@ -41,8 +42,13 @@ int bidirpipe(int pfd[], const char *cmd , const char * const argv[], const char
                if (cwd && chdir(cwd) < 0)
                        eDebug("[eConsoleAppContainer] failed to change directory to %s (%m)", cwd);
 
-               execvp(cmd, (char * const *)argv);
+               ret = execvp(cmd, (char * const *)argv);
                                /* the vfork will actually suspend the parent thread until execvp is called. thus it's ok to use the sh
+               if (ret == -1)
+               {
+                       // print error message
+                       return ret;
+               }
                eDebug("[eConsoleAppContainer] Finished %s", cmd);
                _exit(0);
        }
@@ -115,7 +121,6 @@ int eConsoleAppContainer::execute(const char *cmdline, const char * const argv[]
        pid = bidirpipe(fd, cmdline, argv, m_cwd.empty() ? 0 : m_cwd.c_str());
 
        if ( pid == -1 ) {
-               eDebug("[eConsoleAppContainer] failed to start %s", cmdline);
                return -3;
        }

 

That code appears to forget that inside the

if (pid == 0) /* child process */

statement, you're running in the child process, which is still a copy of the enigma2 process, since the execvp() failed. Unix exec calls *only* return if they fail. The correct thing to do IMO is to replace:

eDebug("[eConsoleAppContainer] Finished %s", cmd);
_exit(0);

with:

eDebug("[eConsoleAppContainer] Failed to execute %s", cmd);
_exit(127);

That allows the parent enigma2 process's waitpid() in eConsoleAppContainer::readyRead() to be able to detect that the exec failed. I'm not terribly fussed about the return code other than that it should be non-zero. I suggested 127 because that's what the shell uses as a return code in similar circumstances, i.e. the return code of the shell when you run

 

sh -c noSuchProgram

 

is 127:

prl$ sh -c noSuchProgram; echo $?
sh: noSuchProgram: command not found
127
prl$

There is no need to check the return value of execvp(). If it returns, the execvp() failed. From The Fine Manual (execvp(3)): "The exec() functions return only if an error has occurred." It would make sense to also check errno and add the string name for the system error to the eDebug().



Re: Error handling in eConsoleAppContainer #16 prl

  • Senior Member
  • 36 posts

+2
Neutral

Posted 16 November 2018 - 03:52

Is it relevant to know whether the proces was aborted due to a signal or due to an exit? Personally I don't think so, the only thing that's relevant is whether it succeeded or not. Please note that stuff like segmentation faults and division by zero conditions are also delivered as signal (although they mostly mean there is a bug in the program).

 

I agree the current code can't work, it won't detect a process resulting in an error. It will only detect a failed fork (e.g. due to insufficient resources). I don't know what bidirpipe does here though. Does it only create a bidirectional pipe, like the name suggests, or does it also system/exec/etc a new process? That would be strange, because that's already done at this point? Why then is the "cmdline" and "argv" passed?

 

The code at the moment fails to deliver any error indication in the appClosed() PSignal if the execvp() call fails (i.e. it tried to execute a non-existent program, or at least one that's not in $PATH) or if the program terminated abnormally by a signal.

 

I know that signals are used for both inter-process communication and for program internal errors like SIGSEGV and SIGFPE. But if they are not caught and handled by the program, they are still failures to run to completion, and should be flagged as an error in appClosed(). Currently an error on signal termination is only flagged ifthe child process is killed using eConsoleAppContainer::kill(), though not if it gets a SIGKILL from aiywhere else.

 

The birdirpipe() is an example of the classical Unix pattern: create pipes / fork / close unused pipe fds / exec named program in child process. The cmd and cmdline arguments respectively name the program to run and provide its runtime arguments.

 

If you're unsure how it all hangs together: vfork() (and fork()) duplicate the running process and so creates a new process (but vfork() does it in a more memory-efficient manner in most cases). Both processes remain runnable and executing the same code: initially they only differ by the return value of vfork() (or fork()).

 

The execvp() call (indeed all variants of exec()) then replaces the image of the currently running program (normally the child process) with a new program image, in the file named in the cmd argument in this case, and starts it running with its "commandline" arguments (passed in cmdline in this case).

 

If the execvp() call fails, the correct action almost always is to exit the child process with a non-zero exit status as soon as possible (after perhaps printing some sort of error message).

 

The issues I'm trying to address have nothing at all to do with whether the parent process uses vfork() or fork() to create the new process. The choice to use vfork() is the correct one, but for reasons that have nothing to do with what I'm trying to fix.



Re: Error handling in eConsoleAppContainer #17 prl

  • Senior Member
  • 36 posts

+2
Neutral

Posted 16 November 2018 - 05:25

... is there really something bad about the eConsoleAppContainer

 

Yes: it fails to return an error status if the program can't be run, or if the program terminates because of a signal, other than SIGKILL sent by eConsoleAppContainer::kill().



Re: Error handling in eConsoleAppContainer #18 prl

  • Senior Member
  • 36 posts

+2
Neutral

Posted 16 November 2018 - 05:57

To 2: You mean using WIFSIGNALED and WTERMSIG to get the signal and return this?

 

Yes, precisely (well, -WTERMSIG(stat)).

 

 Is the return value important? Why not just use WIFSIGNALED to know that a signal terminated the child and then return -1 to be compatible with the current code?

 

The reason I suggest using -signum rather than -1 is that there seems to be no good reason to hide the signal number from the recipient(s) of the appClosed() PSignal.

 

Old code that does

def onAppClosed(self, retval):
   if retval:
      # error in child process
    else:
      # child process succeeded

Will still work.

 

Most uses of eConsoleAppContainer::execute() that I've looked at simply ignore the retval in the callback anyway.

 

The potential issue with using -signum stems from the fact that some code does things like:

        retval = self.container.execute(... command arguments ...)
        if retval:
            self.onAppClosed(retval)

Where it's assumed that the values passed to onAppClosed() via the PSignal method don't overlap in incompatible ways with values returned by eConsoleAppContainer::execute().



Re: Error handling in eConsoleAppContainer #19 littlesat

  • PLi® Core member
  • 56,123 posts

+685
Excellent

Posted 16 November 2018 - 09:49


Yes: it fails to return an error status if the program can't be run, or if the program terminates because of a signal, other than SIGKILL sent by eConsoleAppContainer::kill().

Thanks... sounds like this indeed need to be fixed... I even prefer that is always 'goes' through onAppClosed... an not using the retval at the execute command at al...


WaveFrontier 28.2E | 23.5E | 19.2E | 16E | 13E | 10/9E | 7E | 5E | 1W | 4/5W | 15W


Re: Error handling in eConsoleAppContainer #20 betacentauri

  • PLi® Core member
  • 7,185 posts

+323
Excellent

Posted 16 November 2018 - 09:52

To 1 again: Yes, I overlooked it.
But I still think using 127 is not a good idea. If somebody looks at it in 1 year he can't understand why 127 is used. An execvp failure can be caused by several issues like binary not there, binary is not executable, filesystem is mounted with noexec,... . Either return e.g. the errno or just use:

exit(EXIT_FAILURE)

To 2: For me ok. As far as I can see the extra information won't hurt.


Xtrend ET-9200, ET-8000, ET-10000, OpenPliPC on Ubuntu 12.04


0 user(s) are reading this topic

0 members, 0 guests, 0 anonymous users