Official links: http://erlang.org/doc/man/supervisor.html
http://erlang.org/doc/design_principles/sup_princ.html
The supervisor is responsible for starting, stopping and monitoring its child processes. The basic idea of a supervisor is that it should keep its child processes alive by restarting them when necessary.
The children of a supervisor are defined as a list of child specifications. When the supervisor is started, the child processes are started in order from left to right according to this list. When the supervisor terminates, it first terminates its child processes in reversed start order, from right to left.
A supervisor can have one of the following restart strategies:
- one_for_one - if one child process terminates and should be restarted, only that child process is affected.
- one_for_all - if one child process terminates and should be restarted, all other child processes are terminated and then all child processes are restarted.
- rest_for_one - if one child process terminates and should be restarted, the 'rest' of the child processes (i.e. the child processes after the terminated child process in the start order) are terminated. Then the terminated child process and all child processes after it are restarted.
- simple_one_for_one - a simplified one_for_one supervisor, where all child processes are dynamically added instances of the same process type, i.e. running the same code.
The functions delete_child/2 and restart_child/2 are invalid for simple_one_for_one supervisors and will return {error,simple_one_for_one} if the specified supervisor uses this restart strategy.
The function terminate_child/2 can be used for children under simple_one_for_one supervisors by giving the child's pid() as the second argument. If instead the child specification identifier is used, terminate_child/2 will return {error,simple_one_for_one}.
Because a simple_one_for_one supervisor could have many children, it shuts them all down at the same time, so the order in which they are stopped is not defined. For the same reason, shutting down could incur some overhead, depending on the Shutdown strategy.
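The simple_one_for_one rules above can be sketched as follows. This is a minimal illustration, not code from the reference manual: the module name sofo_demo, the child id worker and the helper functions start_worker/0 and demo/0 are all hypothetical, and the child is a bare linked process rather than an OTP behaviour.

```erlang
-module(sofo_demo).
-behaviour(supervisor).
-export([start_link/0, start_worker/0, demo/0]).
-export([init/1]).

%% Start the supervisor, registered locally under the module name.
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% A simple_one_for_one supervisor takes exactly one child
    %% specification, used as a template for every dynamic child.
    ChildSpec = {worker, {?MODULE, start_worker, []},
                 temporary, brutal_kill, worker, [?MODULE]},
    {ok, {{simple_one_for_one, 1, 5}, [ChildSpec]}}.

%% Hypothetical start function: runs in the supervisor process, so
%% spawn_link/1 links the new child to the supervisor.
start_worker() ->
    Pid = spawn_link(fun() -> receive stop -> ok end end),
    {ok, Pid}.

demo() ->
    {ok, _Sup} = start_link(),
    %% The second argument is a list of extra arguments appended to A.
    {ok, Pid} = supervisor:start_child(?MODULE, []),
    %% Addressing a child by its specification identifier is rejected.
    {error, simple_one_for_one} = supervisor:terminate_child(?MODULE, worker),
    {error, simple_one_for_one} = supervisor:delete_child(?MODULE, worker),
    %% Addressing it by pid works.
    ok = supervisor:terminate_child(?MODULE, Pid),
    ok.
```

Calling sofo_demo:demo() from a fresh shell exercises each case once; a second call in the same node would fail because the supervisor name is already registered.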
To prevent a supervisor from getting into an infinite loop of child process terminations and restarts, a maximum restart frequency is defined using two integer values MaxR and MaxT. If more than MaxR restarts occur within MaxT seconds, the supervisor terminates all child processes and then itself.
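As a rough sketch of how the restart strategy and the maximum restart frequency are supplied, the supervisor callback init/1 below returns them together with the child list. The module name strategy_demo_sup, the child ids first and second, and the start_worker/0 helper are assumptions made for this example only.

```erlang
-module(strategy_demo_sup).
-behaviour(supervisor).
-export([start_link/0, start_worker/0]).
-export([init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    MaxR = 3,   % at most 3 restarts ...
    MaxT = 10,  % ... within any 10 second window
    %% Children are started in list order: first, then second.
    Children = [
        {first,  {?MODULE, start_worker, []},
         permanent, brutal_kill, worker, [?MODULE]},
        {second, {?MODULE, start_worker, []},
         permanent, brutal_kill, worker, [?MODULE]}
    ],
    {ok, {{rest_for_one, MaxR, MaxT}, Children}}.

%% Hypothetical start function: a bare linked process that waits
%% until it receives the message stop.
start_worker() ->
    {ok, spawn_link(fun() -> receive stop -> ok end end)}.
```

With rest_for_one, a crash of first also terminates and restarts second, while a crash of second restarts only second. If more than three restarts happen within ten seconds, the supervisor terminates both children and then itself.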
This is the type definition of a child specification:
child_spec() = {Id,StartFunc,Restart,Shutdown,Type,Modules}
Id = term()
StartFunc = {M,F,A}
M = F = atom()
A = [term()]
Restart = permanent | transient | temporary
Shutdown = brutal_kill | int()>0 | infinity
Type = worker | supervisor
Modules = [Module] | dynamic
Module = atom()
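To make the type definition concrete, here are three child specifications written as tuples. The module names my_worker and my_sub_sup and the registered name my_events are hypothetical; supervisor:check_childspecs/1 only checks that the tuples are well formed and does not start anything.

```erlang
-module(child_spec_examples).
-export([specs/0]).

%% Returns a list of example child specifications.
specs() ->
    [
     %% A permanent worker that gets 5000 ms to shut down.
     {my_worker, {my_worker, start_link, []},
      permanent, 5000, worker, [my_worker]},
     %% A nested supervisor, given unlimited time to stop its subtree.
     {my_sub_sup, {my_sub_sup, start_link, []},
      permanent, infinity, supervisor, [my_sub_sup]},
     %% An event manager whose callback modules vary at run time.
     {my_events, {gen_event, start_link, [{local, my_events}]},
      transient, brutal_kill, worker, dynamic}
    ].
```

A quick syntax check from the shell is supervisor:check_childspecs(child_spec_examples:specs()), which returns ok for a well-formed list.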
- Id is a name that is used to identify the child specification internally by the supervisor.
- StartFunc defines the function call used to start the child process. It should be a module-function-arguments tuple {M,F,A} used as apply(M,F,A).
  The start function must create and link to the child process, and should return {ok,Child} or {ok,Child,Info}, where Child is the pid of the child process and Info is an arbitrary term that is ignored by the supervisor. It should be (or result in) a call to supervisor:start_link, gen_server:start_link, gen_fsm:start_link or gen_event:start_link (or a function compliant with these functions; see supervisor(3) for details).
  The start function can also return ignore if the child process for some reason cannot be started, in which case the child specification will be kept by the supervisor (unless it is a temporary child) but the non-existing child process will be ignored.
  If something goes wrong, the function may also return an error tuple {error,Error}.
  Note that the start_link functions of the different behaviour modules fulfill the above requirements.
- Restart defines when a terminated child process should be restarted. A permanent child process should always be restarted. A temporary child process should never be restarted (even when the supervisor's restart strategy is rest_for_one or one_for_all and a sibling's death causes the temporary process to be terminated). A transient child process should be restarted only if it terminates abnormally, i.e. with an exit reason other than normal, shutdown or {shutdown,Term}.
- Shutdown defines how a child process should be terminated. brutal_kill means the child process will be unconditionally terminated using exit(Child,kill). An integer timeout value means that the supervisor will tell the child process to terminate by calling exit(Child,shutdown) and then wait for an exit signal with reason shutdown back from the child process. If no exit signal is received within the specified number of milliseconds, the child process is unconditionally terminated using exit(Child,kill).
  If the child process is another supervisor, Shutdown should be set to infinity to give the subtree ample time to shut down. Setting it to infinity is also allowed if the child process is a worker.
  Warning: Be careful when setting the Shutdown strategy to infinity for a worker. In this situation the termination of the supervision tree depends on the child process, so it must be implemented in a safe way and its cleanup procedure must always return.
  Note that all child processes implemented using the standard OTP behaviour modules automatically adhere to the shutdown protocol.
- Type specifies if the child process is a supervisor or a worker.
- Modules is used by the release handler during code replacement to determine which processes are using a certain module. As a rule of thumb Modules should be a list with one element [Module], where Module is the callback module, if the child process is a supervisor, gen_server or gen_fsm. If the child process is an event manager (gen_event) with a dynamic set of callback modules, Modules should be dynamic. See OTP Design Principles for more information about release handling.
- Internally, the supervisor also keeps track of the pid Child of the child process, or undefined if no pid exists.
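The StartFunc and Shutdown requirements described in the list above can be illustrated with a small gen_server worker. This is a sketch under assumptions: the module name graceful_worker and its placeholder state are invented for the example, and the cleanup in terminate/2 is left empty.

```erlang
-module(graceful_worker).
-behaviour(gen_server).
-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

%% Suitable as StartFunc {graceful_worker, start_link, []}:
%% gen_server:start_link/3 creates and links the process and
%% returns {ok, Pid}.
start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    %% With exits trapped, a shutdown request from the supervisor
    %% (exit(Child, shutdown)) leads gen_server to call terminate/2
    %% before the process exits with reason shutdown.
    process_flag(trap_exit, true),
    {ok, no_state}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    %% Cleanup would go here. It has to finish within the Shutdown
    %% timeout, otherwise the supervisor kills the process; with
    %% Shutdown = infinity it must always return.
    ok.
```

A matching child specification could be {graceful_worker, {graceful_worker, start_link, []}, permanent, 5000, worker, [graceful_worker]}, which gives the worker up to 5000 milliseconds to run terminate/2.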