Jump to content


Photo

Error handling in eConsoleAppContainer


  • Please log in to reply
49 replies to this topic

Re: Error handling in eConsoleAppContainer #21 MiLo

  • PLi® Core member
  • 14,042 posts

+298
Excellent

Posted 16 November 2018 - 09:53

BTW the Linux manual says: The use of vfork() is unnecessary on Linux and also prone to all sorts of race conditions and undefined behaviour. I'd say, get rid of it and use fork() instead and secure defined behaviour. The parent will need to fetch the pid of the child though, and wait() for it.


The reason that vfork is being used here is that fork() will typically run out of address space. Enigma2, if it has been running for a few weeks, has allocated a huge amount of memory pages, and a fork() at that point will run out of address space because of that. That's why python's os.system() is also unreliable, it also uses the normal fork, and thus it usually works when you quickly test it, but it'll invariably fail on systems that have been up and running for a long time.

Following your reasoning enigma2 has some huge memory leaks. I guess we'll have to address that then.
 
Read the manual, it says Linux uses copy-on-write so only the descriptors are copied, not the pages themselves. Linux has been doing that for ages. So as long as you follow the fork() by an immediate exec(), the netto result is the same, but then without the great risk on ondefined behaviours and race conditions. The use of vfork() is discouraged on Linux.


I didn't say "memory", I'm talking about address space. The problem is that the forked process will be unable to copy the descriptors, there aren't enough resources to make a copy of E2.

fork() will fail on E2. We've seen this happening, and it keeps happening. As I wrote, it won't fail if you write a quick test program, but it will fail on an actuall running system, and once it reaches that point, all fork() call will fail.
Real musicians never die - they just decompose

Re: Error handling in eConsoleAppContainer #22 Erik Slagter

  • PLi® Core member
  • 46,951 posts

+541
Excellent

Posted 16 November 2018 - 15:04

 

@Erik Slagter

 

You are wrong. 
 
Remember three things: 
- fork is slower then vfork
- not only system RAM memory pages can be mapped (you can map file, registers and, so on, also same physical RAM pages could be mapped many times by different processes)
- huge amount of mapped pages could be caused by memory fragmentation not by memory leak
 
vfor() is not evil at all, it have some disadvantages, but what does not have them?

I've been using and programming on Unix systems since 1988. Linux wasn't even around then. Don't tell me what fork and vfork do. I just explained what they're for and why you shouldn't use vfork. Check it for yourself in the manual pages if you don't believe but don't be a wiseguy.


* Wavefrontier T90 with 28E/23E/19E/13E via SCR switches 2 x 2 x 6 user bands
I don't read PM -> if you have something to ask or to report, do it in the forum so others can benefit. I don't take freelance jobs.
Ik lees geen PM -> als je iets te vragen of te melden hebt, doe het op het forum, zodat anderen er ook wat aan hebben.


Re: Error handling in eConsoleAppContainer #23 Erik Slagter

  • PLi® Core member
  • 46,951 posts

+541
Excellent

Posted 16 November 2018 - 15:21

didn't say "memory", I'm talking about address space. The problem is that the forked process will be unable to copy the descriptors, there aren't enough resources to make a copy of E2.

fork() will fail on E2. We've seen this happening, and it keeps happening. As I wrote, it won't fail if you write a quick test program, but it will fail on an actuall running system, and once it reaches that point, all fork() call will fail.

You're writing "make a copy of E2". That suggests you're talking of the complete address space. Which you later deny.

 

What a fork does is copy the page descriptor entries from the page table, one for every 4096 bytes, 1:1 and mark them readonly. If the process is going to write to them, a page fault occurs, the offending page is copied by the kernel and the page descriptor entry is change to read/write. It's actually a very simple concept. If you're going to do an exec*() right after the fork(), no writing to memory will occur and the effect is almost exactly the same as using vfork(), but without the undefined behaviour traps of vfork(). That is also what the manual says, btw.

 

Also, there are lots of embedded systems that have less memory than the average enigma2 STB and I bet they can also just use fork(). Even better, I've had quite a few i86 Linux systems that had considerably less RAM than most enigma2 STB's and they ran a full fledged Linux distro including X nevertheless. One wouldn't get there if fork() wouldn't work.

 

I do think I know why fork fails on small enigma2 machines, though. Memory fragmentation caused by the drivers. Some kernel structures need larger chunks of adjacent physical memory, I could imagine that includes the page table (and indirections thereof). Early enigma2 systems had much fragementation and little RAM. This balance has been tipping the last few years, to more RAM and less fragmentation.

 

So imho the best way is to test this on the most vulnerable machine and see if the problems still occur nowadays. If it does, we're going to be stuck with vfork() for a while I guess. I could imagine though, that a one time, explicit vfork support in the kernel will be dropped and will be just an emulation that calls fork() and wait() instead.


* Wavefrontier T90 with 28E/23E/19E/13E via SCR switches 2 x 2 x 6 user bands
I don't read PM -> if you have something to ask or to report, do it in the forum so others can benefit. I don't take freelance jobs.
Ik lees geen PM -> als je iets te vragen of te melden hebt, doe het op het forum, zodat anderen er ook wat aan hebben.


Re: Error handling in eConsoleAppContainer #24 betacentauri

  • PLi® Core member
  • 7,185 posts

+323
Excellent

Posted 16 November 2018 - 16:16

The topic was about return values and not about fork vs vfork ;)

 

@all: What do you think about the return value 127 in bidirpipe and that the signals multiplicated with -1 should be returned in appClosed?


Xtrend ET-9200, ET-8000, ET-10000, OpenPliPC on Ubuntu 12.04

Re: Error handling in eConsoleAppContainer #25 littlesat

  • PLi® Core member
  • 56,123 posts

+685
Excellent

Posted 16 November 2018 - 16:39

+1

WaveFrontier 28.2E | 23.5E | 19.2E | 16E | 13E | 10/9E | 7E | 5E | 1W | 4/5W | 15W


Re: Error handling in eConsoleAppContainer #26 MiLo

  • PLi® Core member
  • 14,042 posts

+298
Excellent

Posted 16 November 2018 - 16:49

As for the original topic, returning non-zero on failure is certainly preferred.

As for fork(), well, if the E2 process is using say 300 MB of virtual memory, a fork() will temporarily require 600 MB of virtual memory to succeed. On a 512MB machine, that will fail unless you create a swapfile or allow a large overcommit factor.
Real musicians never die - they just decompose

Re: Error handling in eConsoleAppContainer #27 littlesat

  • PLi® Core member
  • 56,123 posts

+685
Excellent

Posted 16 November 2018 - 16:55

Is that what makes vfork ‘mandatory’

WaveFrontier 28.2E | 23.5E | 19.2E | 16E | 13E | 10/9E | 7E | 5E | 1W | 4/5W | 15W


Re: Error handling in eConsoleAppContainer #28 samsamsam

  • Senior Member
  • 2,024 posts

+146
Excellent

Posted 16 November 2018 - 22:36

@PLi® Core member

 

 


Check it for yourself in the manual pages if you don't believe but don't be a wiseguy.

 

You are funny. I know manual pages. Everything can be dangerous if you do not know how to use it.

If you want to read then maybe read also others things not only manual pages.

For example:

https://gist.github....84c23c18d7db234

 

 


I've been using and programming on Unix systems since 1988. Linux wasn't even around then. Don't tell me what fork and vfork do. I just explained what they're for and why you shouldn't use vfork.

 

 

You had conclusion that fork() failed must be caused by "huge memory leak"
I wrote that you are wrong and why.
 

In next post you agree:

 


I do think I know why fork fails on small enigma2 machines, though. Memory fragmentation caused by the drivers. 

 

Do you want to tell me something? It really doesn't looks good from your site.


Edited by samsamsam, 16 November 2018 - 22:38.


Re: Error handling in eConsoleAppContainer #29 prl

  • Senior Member
  • 36 posts

+2
Neutral

Posted 17 November 2018 - 04:59

The topic was about return values and not about fork vs vfork ;)

 

Yes. If people would like to wrangle over the comparative advantages and disadvantages of fork() vs vfork(), please make a topic for it. IIRC, vfork() exists because BSD4.2 didn't support copy-on-write page table entries.

 

The topic was about return values and not about fork vs vfork ;)

 

@all: What do you think about the return value 127 in bidirpipe and that the signals multiplicated with -1 should be returned in appClosed?

 

Re: the use of 127 as the error code, my motivation is what shells do for "sh -c" when exec fails with ENOENT: they return an exit code of 127 (and 126 if it fails with EPERM). The rationale for the choice can easily be documented as a comment. As for returning errno as the exit status when execvp() fails, the problem with that is that programs often use a few low-numbered exit codes to flag different sorts of error for their own purposes. If the retval in an appClosed() PSignal is 1 that doesn't necessarily mean that the program exited because a system call failed with EPERM (value 1). Programs call exit(1) for all sorts of reasons, and many of them for reasons that have nothing to do with a system call returning an error.

 

The program that prompted me to look into this returns error codes 1-10, and pretty much none of those codes correspond in meaning to the particular system (erron) codes with the same value.

 

I have noted that there may be code that is affected by having the retval for a child that is terminated by a signal return -sigval rather than -1. I haven't found code where it would make an actual difference to behaviour, but it doesn't mean that such code doesn't exist. If I was writing this as new code, I'd argue strongly for retval being -sigval, but since it's not new code, having retval -1 might be the best compromise.



Re: Error handling in eConsoleAppContainer #30 MiLo

  • PLi® Core member
  • 14,042 posts

+298
Excellent

Posted 17 November 2018 - 11:33

So in short:

execvp() fail: return 127
Looking at the man page for "system", that seems to be the consensus indeed.

Killed by signal: return -signal or -1
Since signal is only going to be "TERM" or "KILL" anyway, there's not much information in there.

If I recall correctly, the termination status for a process is a "short" value and the upper bits hold the signal (if any) and the lower byte is the exit code (0 in case of signal). So if you want to follow standard POSIX, using (signal << 8) would be preferable to -signal.
 

If apparently "-1" is already being used, just leave it at that, I'd say. We'll cross the signal bridge when we find it...


Edited by MiLo, 17 November 2018 - 11:37.

Real musicians never die - they just decompose

Re: Error handling in eConsoleAppContainer #31 Erik Slagter

  • PLi® Core member
  • 46,951 posts

+541
Excellent

Posted 17 November 2018 - 12:13

As for the original topic, returning non-zero on failure is certainly preferred.

As for fork(), well, if the E2 process is using say 300 MB of virtual memory, a fork() will temporarily require 600 MB of virtual memory to succeed. On a 512MB machine, that will fail unless you create a swapfile or allow a large overcommit factor.

 

I guess you failed to explictly mention the concept of "commit", see here (otherwise it would make no sense using the concept of C-O-W page tables):

 

On systems where memory is constrained, vfork() avoids the need to temporarily commit memory (see the description of /proc/sys/vm/overcommit_memory in proc(5)) in order to execute a new
          program.   (This  can  be  especially  beneficial where a large parent process wishes to execute a small helper program in a child process.)  By contrast, using fork(2) in this scenario
          requires either committing an amount of memory equal to the size of the parent process (if strict overcommitting is in force) or overcommitting memory with the risk that  a  process  is
          terminated by the out-of-memory (OOM) killer.

 

It has nothing to do with the actual amount of memory or the page table entries and may not necessarily be a problem. If you have large amounts of uncommited memory in your program I'd say you're doing something wrong (allocating memory you're not using in any way...)


Edited by Erik Slagter, 17 November 2018 - 12:21.

* Wavefrontier T90 with 28E/23E/19E/13E via SCR switches 2 x 2 x 6 user bands
I don't read PM -> if you have something to ask or to report, do it in the forum so others can benefit. I don't take freelance jobs.
Ik lees geen PM -> als je iets te vragen of te melden hebt, doe het op het forum, zodat anderen er ook wat aan hebben.


Re: Error handling in eConsoleAppContainer #32 Erik Slagter

  • PLi® Core member
  • 46,951 posts

+541
Excellent

Posted 17 November 2018 - 12:32

Other interesting phrases from the man page:

 

 

Historic description
       Under  Linux, fork(2) is implemented using copy-on-write pages, so the only penalty incurred by fork(2) is the time and memory required to duplicate the parent's page tables, and to create
       a unique task structure for the child.  However, in the bad old days a fork(2) would require making a complete copy of the caller's data space, often needlessly, since usually  immediately
       afterward an exec(3) is done.  Thus, for greater efficiency, BSD introduced the vfork() system call, which did not fully copy the address space of the parent process, but borrowed the par‐
       ent's memory and thread of control until a call to execve(2) or an exit occurred.  The parent process was suspended while the child was using its resources.  The use of vfork() was tricky:
       for example, not modifying data in the parent process depended on knowing which variables were held in a register.

 

 

NOTES

       Some consider the semantics of vfork() to be an architectural blemish, and the 4.2BSD man page stated: "This system call will be eliminated when proper system sharing mechanisms are imple‐
       mented.   Users  should not depend on the memory sharing semantics of vfork() as it will, in that case, be made synonymous to fork(2)."  However, even though modern memory management hard‐
       ware has decreased the performance difference between fork(2) and vfork(), there are various reasons why Linux and other systems have retained vfork():
 
       *  Some performance-critical applications require the small performance advantage conferred by vfork().
 
       *  vfork() can be implemented on systems that lack a memory-management unit (MMU), but fork(2) can't be implemented on such systems.  (POSIX.1-2008 removed vfork() from the  standard;  the
          POSIX  rationale  for  the  posix_spawn(3) function notes that that function, which provides functionality equivalent to fork(2)+exec(3), is designed to be implementable on systems that
          lack an MMU.)
 
       *  On systems where memory is constrained, vfork() avoids the need to temporarily commit memory (see the description of /proc/sys/vm/overcommit_memory in proc(5)) in order to execute a new
          program.   (This  can  be  especially  beneficial where a large parent process wishes to execute a small helper program in a child process.)  By contrast, using fork(2) in this scenario
          requires either committing an amount of memory equal to the size of the parent process (if strict overcommitting is in force) or overcommitting memory with the risk that  a  process  is
          terminated by the out-of-memory (OOM) killer.

 

Very interesting the caveats:

 

 

   Caveats

       The child process should take care not to modify the memory in unintended ways, since such changes will be seen by the parent process once the child terminates or executes another program.
       In this regard, signal handlers can be especially problematic: if a signal handler that is invoked in the child of vfork() changes memory, those  changes  may  result  in  an  inconsistent
       process state from the perspective of the parent process (e.g., memory changes would be visible in the parent, but changes to the state of open file descriptors would not be visible).
 
       When  vfork()  is  called  in  a  multithreaded process, only the calling thread is suspended until the child terminates or executes a new program.  This means that the child is sharing an
       address space with other running code.  This can be dangerous if another thread in the parent process changes credentials (using setuid(2) or similar), since there are  now  two  processes
       with different privilege levels running in the same address space.  As an example of the dangers, suppose that a multithreaded program running as root creates a child using vfork().  After
       the vfork(), a thread in the parent process drops the process to an unprivileged user in order to run some untrusted code (e.g., perhaps via plug-in opened with dlopen(3)).  In this  case,
       attacks are possible where the parent process uses mmap(2) to map in code that will be executed by the privileged child process.

 

and even more:

 

 

        (From  POSIX.1)  The vfork() function has the same effect as fork(2), except that the behavior is undefined if the process created by vfork() either modifies any data other than a variable

       of type pid_t used to store the return value from vfork(), or returns from the function in which vfork() was called, or calls any other function before successfully calling _exit(2) or one
       of the exec(3) family of functions.

 

Also:

 

 

CONFORMING TO

       4.3BSD; POSIX.1-2001 (but marked OBSOLETE).  POSIX.1-2008 removes the specification of vfork().
 
       The requirements put on vfork() by the standards are weaker than those put on fork(2), so an implementation where the two are synonymous is compliant.  In particular, the programmer cannot
       rely on the parent remaining blocked until the child either terminates or calls execve(2), and cannot rely on any specific behavior with respect to shared memory.

 

And interestingly:

 

 

   History

       The vfork() system call appeared in 3.0BSD.  In 4.4BSD it was made synonymous to fork(2) but NetBSD introduced it again;  see  〈http://www.netbsd.or...rnel/vfork.html〉.   In
       Linux,  it has been equivalent to fork(2) until 2.2.0-pre6 or so.  Since 2.2.0-pre9 (on i386, somewhat later on other architectures) it is an independent system call.  Support was added in
       glibc 2.0.112.

 

Where we learn vfork() has been actually just fork() for quite some time.

 

Anyway, I am not going to be pedantic any further and rest my case. Just keep in mind that IF we start having strange behaviour (which we had in the past), it should be worth to have a look at the vfork() and surrounding code.


* Wavefrontier T90 with 28E/23E/19E/13E via SCR switches 2 x 2 x 6 user bands
I don't read PM -> if you have something to ask or to report, do it in the forum so others can benefit. I don't take freelance jobs.
Ik lees geen PM -> als je iets te vragen of te melden hebt, doe het op het forum, zodat anderen er ook wat aan hebben.


Re: Error handling in eConsoleAppContainer #33 betacentauri

  • PLi® Core member
  • 7,185 posts

+323
Excellent

Posted 17 November 2018 - 14:57

@prl: Can you create a patch/merge request?


Xtrend ET-9200, ET-8000, ET-10000, OpenPliPC on Ubuntu 12.04

Re: Error handling in eConsoleAppContainer #34 littlesat

  • PLi® Core member
  • 56,123 posts

+685
Excellent

Posted 17 November 2018 - 15:43

I somehow don’t care how it’s done... Only prefer that when it is going wrong it is going through onAppClosed.... instread of catching failliets in two ways...

WaveFrontier 28.2E | 23.5E | 19.2E | 16E | 13E | 10/9E | 7E | 5E | 1W | 4/5W | 15W


Re: Error handling in eConsoleAppContainer #35 MiLo

  • PLi® Core member
  • 14,042 posts

+298
Excellent

Posted 17 November 2018 - 18:26

...This  can  be  especially  beneficial where a large parent process wishes to execute a small helper program in a child process...

 
Spot on!

And yes, we're aware of the caveats.
Real musicians never die - they just decompose

Re: Error handling in eConsoleAppContainer #36 prl

  • Senior Member
  • 36 posts

+2
Neutral

Posted 18 November 2018 - 08:16

@prl: Can you create a patch/merge request?

 

Yes, when I've actually coded it. I didn't want to start coding until it looked like it might be accepted.



Re: Error handling in eConsoleAppContainer #37 prl

  • Senior Member
  • 36 posts

+2
Neutral

Posted 18 November 2018 - 08:23

So in short:

execvp() fail: return 127
Looking at the man page for "system", that seems to be the consensus indeed.

Killed by signal: return -signal or -1
Since signal is only going to be "TERM" or "KILL" anyway, there's not much information in there.

If I recall correctly, the termination status for a process is a "short" value and the upper bits hold the signal (if any) and the lower byte is the exit code (0 in case of signal). So if you want to follow standard POSIX, using (signal << 8) would be preferable to -signal.
 

If apparently "-1" is already being used, just leave it at that, I'd say. We'll cross the signal bridge when we find it...

 

The reason for the suggestion using -signal rather than signal << 8 was that a program can't both exit() and be terminated by a signal. But I think that -1 is probably the simplest way to ensure backward compatibility, so I'll go with that.

 

The program can be terminated by lots of signals sent either by other processes (unlikely, though), or triggered by its own actions: SIGBUS, SIGSEGV, SGFPE, among others. It could send itself a SIGABRT by calling abort().



Re: Error handling in eConsoleAppContainer #38 Erik Slagter

  • PLi® Core member
  • 46,951 posts

+541
Excellent

Posted 18 November 2018 - 10:34

 

 

...This  can  be  especially  beneficial where a large parent process wishes to execute a small helper program in a child process...

 

 
Spot on!

And yes, we're aware of the caveats.

Somewhere else it also says  the issue actually raises when doing the execve(), it's not the fork itself where it's happening. So that may be the source of the confusion. The fork does not do any copying (still).


* Wavefrontier T90 with 28E/23E/19E/13E via SCR switches 2 x 2 x 6 user bands
I don't read PM -> if you have something to ask or to report, do it in the forum so others can benefit. I don't take freelance jobs.
Ik lees geen PM -> als je iets te vragen of te melden hebt, doe het op het forum, zodat anderen er ook wat aan hebben.


Re: Error handling in eConsoleAppContainer #39 prl

  • Senior Member
  • 36 posts

+2
Neutral

Posted 19 November 2018 - 00:00

 

Somewhere else it also says  the issue actually raises when doing the execve(), it's not the fork itself where it's happening. So that may be the source of the confusion. The fork does not do any copying (still).

 

 

 

The issue I raised has nothing at all to do with whether vfork() or fork() is used. It only has to do with the exit codes returned by the child process, which only exists if the vfork() (or fork()) succeeds.



Re: Error handling in eConsoleAppContainer #40 prl

  • Senior Member
  • 36 posts

+2
Neutral

Posted 19 November 2018 - 02:08

This is my proposed patch. I'm happy to make it a pull request if that's preferred.

diff --git a/lib/base/console.cpp b/lib/base/console.cpp
index d2d110f..9979da5 100644
--- a/lib/base/console.cpp
+++ b/lib/base/console.cpp
@@ -42,9 +42,14 @@ int bidirpipe(int pfd[], const char *cmd , const char * const argv[], const char
             chdir(cwd);
 
         execvp(cmd, (char * const *)argv);
-                /* the vfork will actually suspend the parent thread until execvp is called. thus it's ok to use the shared arg/cmdline pointers here. */
-        eDebug("[eConsoleAppContainer] Finished %s", cmd);
-        _exit(0);
+        // The vfork will actually suspend the parent thread until
+        // execvp is called.
+        // Thus it's ok to use the shared arg/cmdline pointers here.
+        // Use fprintf() rather than eDebug(), because this goes
+        // to the child's stderr.
+        fprintf(stderr, "Failed to run %s - %s", cmd, strerror(errno));
+        // Follow the convention for the return value in system(3)
+        _exit(127);
     }
     if (close(pfdout[0]) == -1 || close(pfdin[1]) == -1 || close(pfderr[1]) == -1)
             return(-1);
@@ -252,7 +257,7 @@ void eConsoleAppContainer::readyRead(int what)
     if (hungup)
     {
         int childstatus;
-        int retval = killstate;
+        int retval;
         /*
          * We have to call 'wait' on the child process, in order to avoid zombies.
          * Also, this gives us the chance to provide better exit status info to appClosed.
@@ -263,6 +268,15 @@ void eConsoleAppContainer::readyRead(int what)
             {
                 retval = WEXITSTATUS(childstatus);
             }
+            else if (WIFSIGNALED(childstatus))
+            {
+                retval = -1;
+            }
+            else
+            {
+                // Hasn't terminated yet (e.g. some sort of stop signal)
+                return;
+            }
         }
         closePipes();
         /*emit*/ appClosed(retval);



1 user(s) are reading this topic

0 members, 1 guests, 0 anonymous users