Supervisors in Erlang OTP


A supervisor is an OTP behaviour (a process design pattern) provided by a set of library modules that ship with the standard Erlang distribution. In practice a supervisor is a piece of code, written by expert developers, which simplifies our life!
Basically OTP provides two different kinds of processes: workers (e.g. gen_server or gen_fsm processes), which as you can easily guess do the actual job, and supervisors, which monitor the status of workers (or of other supervisors).

In the figure above, square boxes represent supervisors and circles represent workers.

A supervisor’s task is not only to check whether its children are up and running: it may also take action when one of them is not (e.g. restart a child that has gone down).

As always, let’s introduce some code and see how things work!

-module(mysupervisor).
-behaviour(supervisor).

%% API
-export([start_link/0]).

%% Supervisor callbacks
-export([init/1]).
-define(SERVER, ?MODULE).

%%====================================================================
%% API functions
%%====================================================================
%%--------------------------------------------------------------------
%% Function: start_link() -> {ok,Pid} | ignore | {error,Error}
%% Description: Starts the supervisor
%%--------------------------------------------------------------------
start_link() ->
  supervisor:start_link({local, ?SERVER}, ?MODULE, []).
%%====================================================================
%% Supervisor callbacks
%%====================================================================
%%--------------------------------------------------------------------
%% Func: init(Args) -> {ok,  {SupFlags,  [ChildSpec]}} |
%%                     ignore                          |
%%                     {error, Reason}
%% Description: Whenever a supervisor is started using
%% supervisor:start_link/[2,3], this function is called by the new process
%% to find out about restart strategy, maximum restart frequency and child
%% specifications.
%%--------------------------------------------------------------------
init([]) ->
  Child = {mychild, {mychild, start_link, []},
           permanent, 2000, worker, [mychild]},
  {ok, {{one_for_all, 1, 1}, [Child]}}.

The supervisor is started by the function start_link/0, which calls supervisor:start_link/3; the newly created supervisor process then calls our init/1 callback.

As you can see we declared a variable named Child which represents the specification of a child for which the supervisor is responsible.

The variable is in the form: {Id, StartFunc, Restart, Shutdown, Type, Modules}

Id : is the name used to identify the child specification internally by the supervisor

StartFunc : represents the function used to start the child process. It is a module-function-arguments tuple used as apply(M, F, A)

Restart : defines when a terminated child process should be restarted. It can be one of:

  1. permanent : the child process is always restarted
  2. transient : the child process is restarted only if it terminates abnormally, i.e. with an exit reason other than normal, shutdown or {shutdown, Term}
  3. temporary : the child process is never restarted

Shutdown : defines how a child process should be terminated

  1. brutal_kill : the child process is unconditionally terminated using exit(Child, kill)
  2. integer value : the supervisor tells the child process to terminate by calling exit(Child, shutdown) and then waits for an exit signal back. If no exit signal is received within the specified time (in milliseconds), the child process is unconditionally terminated using exit(Child, kill)
  3. infinity : to be used when the child process is another supervisor, to give the subtree enough time to shut down

Type : it specifies if the child process is a supervisor or a worker (can be supervisor/worker)

Modules : this should be a list with one element [Module], where Module is the name of the callback module, if the child process is a supervisor, gen_server or gen_fsm. If the child process is a gen_event event manager, Modules should be the atom dynamic
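
To make the fields above concrete, here is a minimal sketch of a few alternative child specifications. The module name child_spec_examples and the child modules my_logger and my_sub_sup are hypothetical, made up for illustration; only mychild comes from this post. The last function assumes OTP 18 or later, where a child specification may also be written as a map.

```erlang
-module(child_spec_examples).
-export([transient_worker/0, sub_supervisor/0, map_worker/0]).

%% A worker that is restarted only after an abnormal exit and is
%% killed immediately (no grace period) when the supervisor shuts
%% it down. my_logger is a hypothetical module.
transient_worker() ->
    {my_logger, {my_logger, start_link, []},
     transient, brutal_kill, worker, [my_logger]}.

%% A child that is itself a supervisor: Shutdown is infinity and
%% Type is supervisor. my_sub_sup is a hypothetical module.
sub_supervisor() ->
    {my_sub_sup, {my_sub_sup, start_link, []},
     permanent, infinity, supervisor, [my_sub_sup]}.

%% The mychild specification from this post written as a map
%% (assumes OTP 18 or later).
map_worker() ->
    #{id => mychild,
      start => {mychild, start_link, []},
      restart => permanent,
      shutdown => 2000,
      type => worker,
      modules => [mychild]}.
```

If you want to check the syntax of such specifications without starting anything, supervisor:check_childspecs/1 takes a list of them and returns ok or {error, Reason}.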

The value returned by init/1 has the form: {ok, {{RestartStrategy, MaxR, MaxT}, [Child]}}

RestartStrategy : indicates how to handle process restart. Can be one of the following:

  1. one_for_one : if a child process terminates, only that process is restarted
  2. one_for_all : if a child process terminates, all other child processes are terminated and then all child processes, including the terminated one, are restarted
  3. rest_for_one : if a child process terminates, the ‘rest’ of the child processes (i.e. the child processes after the terminated process in start order) are terminated. Then the terminated child process and the rest of the child processes are restarted

MaxR and MaxT are related: if more than MaxR restarts occur within MaxT seconds, the supervisor terminates all its child processes and then itself. When the supervisor terminates, the next higher level supervisor takes some action: it either restarts the terminated supervisor, or terminates itself.
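
The MaxR/MaxT rule can be pictured with a small model. This is only a simplified sketch of the idea, not the code the OTP supervisor actually runs; the module name restart_intensity_sketch and the function exceeded/4 are made up for illustration, and timestamps are plain integers in seconds.

```erlang
-module(restart_intensity_sketch).
-export([exceeded/4]).

%% Restarts : timestamps (in seconds) of the restarts seen so far,
%%            including the one that has just happened
%% Now      : the current time in seconds
%% Returns true when more than MaxR restarts fall inside the last
%% MaxT seconds, i.e. when the supervisor would give up.
exceeded(Restarts, Now, MaxR, MaxT) ->
    Recent = [T || T <- Restarts, Now - T =< MaxT],
    length(Recent) > MaxR.
```

With the {one_for_all, 1, 1} flags used in our supervisor, exceeded([10], 10, 1, 1) is false (one restart is tolerated), while exceeded([10, 10], 10, 1, 1) is true: a second restart within the same second makes the supervisor give up.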

As you can see, the only element in the list is Child, the child specification we defined before; of course you can put more than one child specification in this list.
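
As a sketch of a supervisor with more than one child, the init/1 below lists two child specifications. Everything except mychild is hypothetical: the module name two_child_sup and the second child mylogger do not appear in this post, and the restart limits are arbitrary.

```erlang
-module(two_child_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% Children are started in list order: mylogger first, then mychild.
    %% mylogger is a hypothetical worker module.
    Logger = {mylogger, {mylogger, start_link, []},
              permanent, 2000, worker, [mylogger]},
    Child  = {mychild, {mychild, start_link, []},
              permanent, 2000, worker, [mychild]},
    %% rest_for_one: if mylogger dies, mychild is restarted too;
    %% if mychild dies, only mychild is restarted.
    %% Give up after more than 3 restarts within 10 seconds.
    {ok, {{rest_for_one, 3, 10}, [Logger, Child]}}.
```

The order of the list matters here: with rest_for_one, a child should come after the children it depends on.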

Let’s test our supervisor. The first thing I will do is write a simple gen_server that prints a message every time it is started:

-module(mychild).
-behaviour(gen_server).

%% API
-export([start_link/0]).

%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
	 terminate/2, code_change/3]).

-record(state, {}).

%%====================================================================
%% API
%%====================================================================
%%--------------------------------------------------------------------
%% Function: start_link() -> {ok,Pid} | ignore | {error,Error}
%% Description: Starts the server
%%--------------------------------------------------------------------
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%%====================================================================
%% gen_server callbacks
%%====================================================================

%%--------------------------------------------------------------------
%% Function: init(Args) -> {ok, State} |
%%                         {ok, State, Timeout} |
%%                         ignore               |
%%                         {stop, Reason}
%% Description: Initiates the server
%%--------------------------------------------------------------------
init([]) ->
    io:format("supervisor started me!~n", []),
    {ok, #state{}}.

%%--------------------------------------------------------------------
%% Function: handle_call(Request, From, State) -> {reply, Reply, State} |
%%                                      {reply, Reply, State, Timeout} |
%%                                      {noreply, State} |
%%                                      {noreply, State, Timeout} |
%%                                      {stop, Reason, Reply, State} |
%%                                      {stop, Reason, State}
%% Description: Handling call messages
%%--------------------------------------------------------------------
handle_call(_Request, _From, State) ->
    Reply = ok,
    {reply, Reply, State}.

%%--------------------------------------------------------------------
%% Function: handle_cast(Msg, State) -> {noreply, State} |
%%                                      {noreply, State, Timeout} |
%%                                      {stop, Reason, State}
%% Description: Handling cast messages
%%--------------------------------------------------------------------
handle_cast(_Msg, State) ->
    {noreply, State}.

%%--------------------------------------------------------------------
%% Function: handle_info(Info, State) -> {noreply, State} |
%%                                       {noreply, State, Timeout} |
%%                                       {stop, Reason, State}
%% Description: Handling all non call/cast messages
%%--------------------------------------------------------------------
handle_info(_Info, State) ->
    {noreply, State}.

%%--------------------------------------------------------------------
%% Function: terminate(Reason, State) -> void()
%% Description: This function is called by a gen_server when it is about to
%% terminate. It should be the opposite of Module:init/1 and do any necessary
%% cleaning up. When it returns, the gen_server terminates with Reason.
%% The return value is ignored.
%%--------------------------------------------------------------------
terminate(_Reason, _State) ->
    ok.

%%--------------------------------------------------------------------
%% Func: code_change(OldVsn, State, Extra) -> {ok, NewState}
%% Description: Convert process state when code is changed
%%--------------------------------------------------------------------
code_change(_OldVsn, State, _Extra) ->
    {ok, State}.

Now let’s test it in an Erlang shell:

bellerofonte@pegaso:~/Desktop$ erl
Eshell V5.7.3  (abort with ^G)
1> l(mysupervisor).
{module,mysupervisor}
2> mysupervisor:start_link().
supervisor started me!
{ok,<0.35.0>}
3> whereis(mychild).
<0.36.0>
4> erlang:exit(whereis(mychild), kill).
true
supervisor started me!
5> whereis(mychild).
<0.39.0>

As you can see, when we start the supervisor the child is started as well, with the pid <0.36.0>.

After that we kill the child process using the Erlang BIF exit/2; the supervisor then restarts the child, and a subsequent call to whereis/1 shows that the process has been restarted with a new pid, <0.39.0>.
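
Besides whereis/1, the supervisor module itself can report on its children. Here is a minimal sketch, assuming a supervisor is already running and a reasonably recent OTP release; the module name sup_inspect and the function report/1 are made up, while supervisor:which_children/1 and supervisor:count_children/1 are standard OTP functions.

```erlang
-module(sup_inspect).
-export([report/1]).

%% SupRef is the registered name or the pid of a running supervisor.
report(SupRef) ->
    %% One {Id, Pid, Type, Modules} tuple per child.
    Children = supervisor:which_children(SupRef),
    %% Property list with the number of specs, active children,
    %% supervisors and workers.
    Counts = supervisor:count_children(SupRef),
    io:format("children: ~p~ncounts: ~p~n", [Children, Counts]),
    {Children, Counts}.
```

Calling sup_inspect:report/1 with the supervisor’s registered name before and after killing the child should show the same child id with a different pid.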

Much more information about this topic can be found on the official Erlang OTP Design Principles page!

