Other interesting phrases from the man page:
Historic description
Under Linux, fork(2) is implemented using copy-on-write pages, so the only penalty incurred by fork(2) is the time and memory required to duplicate the parent's page tables, and to create
a unique task structure for the child. However, in the bad old days a fork(2) would require making a complete copy of the caller's data space, often needlessly, since usually immediately
afterward an exec(3) is done. Thus, for greater efficiency, BSD introduced the vfork() system call, which did not fully copy the address space of the parent process, but borrowed the
parent's memory and thread of control until a call to execve(2) or an exit occurred. The parent process was suspended while the child was using its resources. The use of vfork() was tricky:
for example, not modifying data in the parent process depended on knowing which variables were held in a register.
NOTES
Some consider the semantics of vfork() to be an architectural blemish, and the 4.2BSD man page stated: "This system call will be eliminated when proper system sharing mechanisms are
implemented. Users should not depend on the memory sharing semantics of vfork() as it will, in that case, be made synonymous to fork(2)." However, even though modern memory management
hardware has decreased the performance difference between fork(2) and vfork(), there are various reasons why Linux and other systems have retained vfork():
* Some performance-critical applications require the small performance advantage conferred by vfork().
* vfork() can be implemented on systems that lack a memory-management unit (MMU), but fork(2) can't be implemented on such systems. (POSIX.1-2008 removed vfork() from the standard; the
POSIX rationale for the posix_spawn(3) function notes that that function, which provides functionality equivalent to fork(2)+exec(3), is designed to be implementable on systems that
lack an MMU.)
* On systems where memory is constrained, vfork() avoids the need to temporarily commit memory (see the description of /proc/sys/vm/overcommit_memory in proc(5)) in order to execute a new
program. (This can be especially beneficial where a large parent process wishes to execute a small helper program in a child process.) By contrast, using fork(2) in this scenario
requires either committing an amount of memory equal to the size of the parent process (if strict overcommitting is in force) or overcommitting memory with the risk that a process is
terminated by the out-of-memory (OOM) killer.
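For comparison, here is a minimal sketch of the posix_spawn(3) route that the notes above mention as the fork(2)+exec(3) equivalent. The helper name spawn_and_wait() is made up for illustration and is not from our code base; it assumes the program can be found via PATH and that inheriting the parent's environment is acceptable.

```c
#define _POSIX_C_SOURCE 200809L
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run argv[0] (looked up via PATH) and wait for it.
 * Returns the child's exit status, or -1 on failure or abnormal exit. */
int spawn_and_wait(char *const argv[])
{
    pid_t pid;
    int status;

    /* posix_spawnp() reports failure through its return value, not errno.
     * NULL file actions and attributes mean "use the defaults". */
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called with something like { "sh", "-c", "exit 3", NULL }, it should hand back the exit status of the shell. No code of ours runs in the child between process creation and exec, which sidesteps the shared-memory pitfalls of vfork().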
The caveats are very interesting:
Caveats
The child process should take care not to modify the memory in unintended ways, since such changes will be seen by the parent process once the child terminates or executes another program.
In this regard, signal handlers can be especially problematic: if a signal handler that is invoked in the child of vfork() changes memory, those changes may result in an inconsistent
process state from the perspective of the parent process (e.g., memory changes would be visible in the parent, but changes to the state of open file descriptors would not be visible).
When vfork() is called in a multithreaded process, only the calling thread is suspended until the child terminates or executes a new program. This means that the child is sharing an
address space with other running code. This can be dangerous if another thread in the parent process changes credentials (using setuid(2) or similar), since there are now two processes
with different privilege levels running in the same address space. As an example of the dangers, suppose that a multithreaded program running as root creates a child using vfork(). After
the vfork(), a thread in the parent process drops the process to an unprivileged user in order to run some untrusted code (e.g., perhaps via plug-in opened with dlopen(3)). In this case,
attacks are possible where the parent process uses mmap(2) to map in code that will be executed by the privileged child process.
And even more:
(From POSIX.1) The vfork() function has the same effect as fork(2), except that the behavior is undefined if the process created by vfork() either modifies any data other than a variable
of type pid_t used to store the return value from vfork(), or returns from the function in which vfork() was called, or calls any other function before successfully calling _exit(2) or one
of the exec(3) family of functions.
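To make the POSIX wording above concrete, this is roughly the only shape of vfork() usage that stays within it: store the result in a pid_t, then in the child do nothing except exec or _exit(). It is a sketch under assumptions, not our actual code; the helper name vfork_exec_wait() and the 127 exit code for a failed exec are illustrative choices.

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: execute path in a vfork() child and wait for it.
 * Returns the child's exit status, or -1 on failure or abnormal exit. */
int vfork_exec_wait(const char *path, char *const argv[], char *const envp[])
{
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;              /* vfork() itself failed */

    if (pid == 0) {
        /* Child: may be running on the parent's memory and stack.
         * Touch nothing, do not return, only exec or _exit(). */
        execve(path, argv, envp);
        _exit(127);             /* exec failed; _exit() skips atexit handlers
                                   and stdio flushing shared with the parent */
    }

    /* Parent: resumes once the child has exec'd or exited. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Anything more elaborate in the child (closing descriptors in a loop, calling exit(), logging, returning from the function) falls under the undefined behaviour described in the quote.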
Also:
CONFORMING TO
4.3BSD; POSIX.1-2001 (but marked OBSOLETE). POSIX.1-2008 removes the specification of vfork().
The requirements put on vfork() by the standards are weaker than those put on fork(2), so an implementation where the two are synonymous is compliant. In particular, the programmer cannot
rely on the parent remaining blocked until the child either terminates or calls execve(2), and cannot rely on any specific behavior with respect to shared memory.
And interestingly:
History
In Linux, it has been equivalent to fork(2) until 2.2.0-pre6 or so. Since 2.2.0-pre9 (on i386, somewhat later on other architectures) it is an independent system call. Support was added in
glibc 2.0.112.
This is where we learn that, on Linux, vfork() was actually just fork() for quite some time (until 2.2.0-pre6 or so).
Anyway, I am not going to be any more pedantic and will rest my case. Just keep in mind that IF we start seeing strange behaviour again (as we have in the past), it would be worth having a look at vfork() and the surrounding code.