The SME Server addresses these issues by running processes under the runit process supervision environment, which:
runs each process under control of its own supervisor process
imposes process limits
restarts the process if it fails
provides a consistent mechanism for controlling the underlying process
Note: Gerrit Pape's runit came from previous work by Dan Bernstein on the supervise supervision environment. runit provides additional features, and has been released under a free software license.
When a Linux system boots, it starts the init process, which then starts all other processes. When init enters "run-level 7", it starts /etc/runit/2 from an entry in /etc/inittab.
/etc/runit/2 starts the runsvdir master supervision process, which scans the /service/ directory for work to do. If the runsvdir command happened to fail, it would be restarted by init.
The runsvdir command looks for subdirectories under the /service/ directory and starts a runsv process to manage each one. If any of the runsv processes fail, runsvdir restarts them.
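The scan described above can be pictured with a small shell sketch. It is only an illustration of the idea, not how runsvdir is implemented: the function name scan_services is made up for this example, and it prints what would be started instead of launching anything.

```shell
#!/bin/sh
# Sketch only: scan_services is a hypothetical helper, not part of runit.
# It lists each subdirectory of a service root, the way runsvdir would
# pick one runsv per subdirectory. Names starting with a dot are skipped
# because the shell glob does not match them.
scan_services() {
    for dir in "$1"/*/; do
        # An unmatched glob stays literal, so check that it is a directory.
        [ -d "$dir" ] || continue
        echo "would start runsv for $(basename "$dir")"
    done
}
```

For example, scan_services /service on a running system would print one line per supervised service.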
Each runsv process looks for a run script under the directory it is managing. runsv runs the run script and keeps a connection to the process started by that script. If the process dies, it is restarted.
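The restart behaviour can be sketched as a simple loop. This is an illustration under assumptions, not runsv itself: the function name supervise_sketch and its max_starts argument are invented here, and the loop is bounded so that the example terminates, whereas a real supervisor keeps going until told to stop.

```shell
#!/bin/sh
# Sketch only: supervise_sketch is a hypothetical stand-in for the
# restart loop of a supervisor. The first argument bounds the number of
# starts; the remaining arguments are the command to run.
supervise_sketch() {
    max_starts=$1
    shift
    starts=0
    while [ "$starts" -lt "$max_starts" ]; do
        starts=$((starts + 1))
        # Run the command and record its exit status without aborting.
        if "$@"; then status=0; else status=$?; fi
        echo "start $starts exited with status $status"
    done
}
```

Calling supervise_sketch 3 false shows the command being started again each time it exits.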
If the directory also has a log subdirectory, runsv runs the run script in that subdirectory and connects the output of the main program to the input of the "logger" process.
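As a rough illustration of the layout just described, the following sketch builds a service directory with a run script and a log/run script. The service name myservice, the program mydaemon, its --foreground flag, the log path and the open-file limit are all assumptions made for the example. A scratch directory stands in for /service/ so that nothing on a live system is touched.

```shell
#!/bin/sh
# Sketch only: "myservice", "mydaemon" and "--foreground" are hypothetical.
# A scratch directory is used in place of /service/.
SERVICE_ROOT=$(mktemp -d)
SVC_DIR="$SERVICE_ROOT/myservice"
mkdir -p "$SVC_DIR/log"

# Main run script: send stderr to stdout so the logger sees both, then
# exec the daemon so that runsv supervises the daemon process itself.
# chpst -o 256 limits the number of open file descriptors.
cat > "$SVC_DIR/run" <<'EOF'
#!/bin/sh
exec 2>&1
exec chpst -o 256 mydaemon --foreground
EOF

# Logger run script: multilog reads the main program's output on stdin,
# adds a timestamp (t) and writes to the named log directory.
cat > "$SVC_DIR/log/run" <<'EOF'
#!/bin/sh
exec multilog t /var/log/myservice
EOF

chmod +x "$SVC_DIR/run" "$SVC_DIR/log/run"
```

The daemon has to stay in the foreground. If it detached itself, the supervisor would see the run script exit and try to start it again.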
This produces a process tree which looks something like this:
[root@gsxdev1 events]# pstree 1
init-+-acpid
     |-md1_raid1
     |-md2_raid1
     | ...
     |-runsvdir-+-runsv-+-multilog
     |          |       `-ulogd
     |          |-6*[runsv---multilog]
     |          |-runsv-+-multilog
     |          |       `-ntpd
     |          |-runsv-+-multilog
     |          |       `-tinydns
     |          |-runsv-+-cvm-unix
     |          |       `-multilog
     |          |-runsv-+-multilog
     |          |       `-mysqld
     |          |-5*[runsv-+-multilog]
     |          |          `-tcpsvd]
     |          |-runsv-+-multilog
     |          |       `-oidentd
     |          |-runsv-+-multilog
     |          |       `-smtp-auth-proxy
     |          |-runsv-+-multilog
     |          |       `-smbd---smbd
     |          |-runsv---httpd---10*[httpd]
This looks like a complex process tree, but it is a critical part of the SME Server's design for reliability. Each process is independent, has a consistent management interface, has process limits imposed on it, and will restart if it happens to fail.
Note: For the curious, if init fails, the system reboots.
For further documentation on runit, refer to the runit manual page.
The SME Server runs in the normally unused run-level 7. This ensures that the only software running on the SME Server is software that we have chosen to run, and that it is started and stopped in a consistent way. If we need to replace a standard startup script with one which runs the process under supervision, we can do so without modifying the original package.
In order to run a process under run-level 7, all you need to do is provide a link in the /etc/rc.d/rc7.d/ directory to your startup script. However, in most cases your process should only start if it is enabled in the configuration database.
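A link of the kind shown in this section could be created as follows. The link name S85myservice (ordering prefix and service name) is hypothetical, and a scratch directory stands in for /etc/rc.d/rc7.d/ so the sketch can be run safely. The link points at the e-smith-service script, so whether the service actually starts depends on its entry in the configuration database.

```shell
#!/bin/sh
# Sketch only: "S85myservice" is a hypothetical link name.
# RC_DIR is a scratch stand-in for /etc/rc.d/rc7.d/.
RC_DIR=$(mktemp -d)

# Point the run-level link at e-smith-service rather than directly at
# the startup script, so the enabled/disabled check is applied.
ln -s /etc/rc.d/init.d/e-smith-service "$RC_DIR/S85myservice"

# Show where the new link points.
readlink "$RC_DIR/S85myservice"
```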
If you look at the /etc/rc.d/rc7.d/ directory, you will see that it contains a large number of links to the /etc/rc.d/init.d/e-smith-service script.
S00microcode_ctl -> /etc/rc.d/init.d/e-smith-service
S05syslog -> /etc/rc.d/init.d/e-smith-service
S06cpuspeed -> /etc/rc.d/init.d/e-smith-service
S15nut -> ../init.d/e-smith-service
S15raidmonitor -> /etc/rc.d/init.d/e-smith-service
S26apmd -> /etc/rc.d/init.d/e-smith-service
S35bootstrap-console -> /etc/rc.d/init.d/e-smith-service
[...]
This script is key to ensuring that services start when they are enabled and do not start when they are disabled, as it:
Checks the name of the link, e.g. S05syslog
Removes the S05 prefix, leaving syslog
Checks to see whether syslog is defined in the configuration database, and whether it has its status set to enabled.
If so, it runs the /etc/init.d/syslog script with the argument start.
If the service is not enabled, it exits without starting the service.
Note: If a script exists in the /etc/init.d/supervise/ directory, e-smith-service will use that in preference to the one in the /etc/init.d/ directory. This allows us to install our own supervised startup scripts without modifying the original package.
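The decision steps and the supervise preference described above can be sketched in shell. This is not the e-smith-service script itself: the function names are invented for the example, and the configuration database is replaced by a hypothetical flat file of name=status lines. The functions print the action that would be taken instead of running anything.

```shell
#!/bin/sh
# Sketch only: not the real e-smith-service script. The "database" is a
# hypothetical flat file with one name=status line per service.

# Strip the two-digit S/K ordering prefix: S05syslog -> syslog
service_name() {
    printf '%s\n' "$1" | sed 's/^[SK][0-9][0-9]//'
}

# Print the status recorded for a service, or nothing if it is undefined.
# Arguments: service name, database file.
service_status() {
    awk -F= -v name="$1" '$1 == name { print $2 }' "$2"
}

# Prefer a script in the supervise/ subdirectory when one exists.
# Arguments: service name, init.d directory.
pick_script() {
    if [ -e "$2/supervise/$1" ]; then
        printf '%s\n' "$2/supervise/$1"
    else
        printf '%s\n' "$2/$1"
    fi
}

# Print the action that would be taken for one run-level link.
# Arguments: link path, database file, init.d directory.
start_decision() {
    name=$(service_name "$(basename "$1")")
    if [ "$(service_status "$name" "$2")" = "enabled" ]; then
        printf 'run %s start\n' "$(pick_script "$name" "$3")"
    else
        printf 'skip %s\n' "$name"
    fi
}
```

A service that is missing from the database is treated the same as a disabled one, which matches the rule that the service must be both defined and enabled before it is started.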