Archive

Posts Tagged ‘supervisor’

How to handle configuration in the init/1 function without slowing down your Erlang supervisor startup

April 17, 2013

If you work with Erlang and follow the OTP design principles in your implementation, you will often end up with one or more supervisors spawning a set of processes that can be either other supervisors or workers.

Most likely the child processes implementing the workers will be based on the gen_server behaviour, and whether you are working on a small side project or on some big company project, they will need some sort of initialization during the start-up phase: creating an ets or mnesia table, reading a configuration file or accepting connections on a socket are pretty common operations that you want executed before the worker starts handling other messages and performing the operations those messages trigger.

According to the documentation, the gen_server process calls Module:init/1 to initialize, so the first strategy you may think of consists in performing the operations listed above within this function. What I mean here is something like this:

start_link() ->
    gen_server:start_link({local, ?SERVER}, ?MODULE, [], []).

init([]) ->
    %% Some configuration operation here (e.g. handle ets/mnesia table)
    {ok, #state{}}.

This kind of approach is pretty common when the operations we perform during initialization are cheap in terms of time, but what happens if the initialization is expected to take a long time?

Suppose you have a supervisor that spawns many children and each child has some long-running configuration to perform. The supervisor calls the start function (start_link/3,4) of each child in sequence, and that call does not return until Module:init/1 of the child being started has returned. This means that the supervisor cannot start the next child in the meantime, which slows down the whole supervisor startup phase.
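
To make the problem concrete, here is a minimal sketch (the worker modules worker_a and worker_b are made up for illustration) of a supervisor init/1 returning several child specifications: the supervisor starts the children one after the other, in list order, so every slow Module:init/1 delays all the children that come after it.

init([]) ->
    %% Hypothetical slow-starting workers: each start_link/0 blocks the
    %% supervisor until that child's init/1 has returned
    ChildA = {worker_a, {worker_a, start_link, []},
              permanent, 5000, worker, [worker_a]},
    ChildB = {worker_b, {worker_b, start_link, []},
              permanent, 5000, worker, [worker_b]},
    {ok, {{one_for_one, 5, 10}, [ChildA, ChildB]}}.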

How can we solve this issue? Well, there are a couple of different ways to do it, but all of them are based on splitting the gen_server initialization into two phases: a first phase, implemented in the init/1 function, in which we trigger some internal message for a future configuration and return immediately to the supervisor, and a second phase in which the configuration actually takes place. In this way we free the supervisor startup from the time burden of all the children’s configurations.

Let’s see with some code the most common ways to achieve this result. My favourite technique consists in triggering the future configuration with a gen_server cast inside the init/1 function, as follows:

start_link() ->
    gen_server:start_link({local, ?SERVER}, ?MODULE, [], []).

init([]) ->
    gen_server:cast(self(), startup),
    {ok, #state{}}.

As you can see, within the init/1 function we send a cast message to our own process and return immediately. At this point we just need to handle the cast in handle_cast/2 and perform the needed configuration. This can be done as follows:

handle_cast(startup, State) ->
    %% Do your configuration here
    {noreply, State}.
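
As a concrete (purely hypothetical) example of what that configuration step could contain, here is a handler that creates the ets table mentioned at the beginning and fills it from a configuration file; the table name my_config and the path priv/settings.config are made up for this sketch.

handle_cast(startup, State) ->
    %% Create a named ETS table owned by this gen_server
    my_config = ets:new(my_config, [named_table, set, protected]),
    %% Assuming the file contains {Key, Value} tuples, one term per line
    {ok, Settings} = file:consult("priv/settings.config"),
    true = ets:insert(my_config, Settings),
    {noreply, State}.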

A different way to achieve the same “two phases” result can be implemented as follows:

start_link() ->
    gen_server:start_link({local, ?SERVER}, ?MODULE, [], []).

init([]) ->
    self() ! startup,
    {ok, #state{}}.

This time we first send the atom ‘startup’ to the gen_server and then we return. Of course we need to handle that message within our gen_server as follows:

handle_info(startup, State) ->
    %% Do your configuration here
    {noreply, State}.

As you can see the logic is pretty much the same here so I won’t go into further details. 

The last way to achieve our result takes advantage of a timeout message. In practice, in the init/1 function, instead of returning the tuple {ok, #state{}} we return the tuple {ok, #state{}, Timeout}. By including the value Timeout in the returned tuple we specify that a ‘timeout’ atom will be sent to our gen_server unless a request or a message is received within Timeout milliseconds.

The ‘timeout’ atom should be handled by the handle_info/2 callback function. By setting the value of Timeout to 0 and adapting handle_info/2, we can once again implement our “two phases” configuration in an easy way. Let’s see how this can be done:

start_link() ->
    gen_server:start_link({local, ?SERVER}, ?MODULE, [], []).

init([]) ->
    {ok, #state{}, 0}.

And the ‘timeout’ message can be handled as:

handle_info(timeout, State) ->
    %% Do your configuration here
    {noreply, State}.

Personally I don’t like the last approach, because the atom ‘timeout’ is not very meaningful and it can lead to some misunderstanding. Besides, the real problem here is that this approach relies on the gen_server timeout mechanism: the ‘timeout’ message is only generated if no other message arrives within Timeout milliseconds, so if another request reaches our gen_server first it will be handled before our configuration, and the timeout is cancelled, meaning the configuration may never run at all unless we return the timeout value again.

Some readers may say that this “two phases” approach is risky, because nothing assures us that the configuration message will be the first message handled in handle_cast/2 or handle_info/2. Actually this is not completely true.

We can’t be 100% sure that a message sent in init/1, using either a cast or the ! operator, will be the first one in the message queue of our gen_server, but considering that start_link/3,4 is synchronous, there is very little chance for another process to send a message to our gen_server before we send the configuration message to ourselves.

Final consideration: there is a more elegant way to achieve the same result, which consists in combining the functions start_link/3 and init_ack/1,2 of the proc_lib module. For those of you interested in the topic, I suggest the Ranch user guide.
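
A minimal sketch of that proc_lib-based pattern (this is not code from this post; it keeps the ?SERVER and #state{} conventions used above and is close to what the Ranch guide describes) could look like this:

start_link() ->
    %% proc_lib:start_link/3 blocks until init_ack is called below
    proc_lib:start_link(?MODULE, init, [self()]).

init(Parent) ->
    register(?SERVER, self()),
    %% Acknowledge the parent immediately so the supervisor can carry on
    proc_lib:init_ack(Parent, {ok, self()}),
    %% The long-running configuration goes here, after the ack
    State = #state{},
    %% Hand control over to the regular gen_server loop
    gen_server:enter_loop(?MODULE, [], State, {local, ?SERVER}).

Messages that arrive while the configuration is still running simply queue up in the mailbox and are served once gen_server:enter_loop/4 takes over.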

You should not start your supervisors like this!

August 22, 2011

Twice a week I go to StackOverflow and read the new posts about Erlang, and in doing this I have seen many questions about starting a supervisor using the erl command and the -s flag.

This is a topic I really care about: when I was writing my first Erlang code, I was asked to implement, and start from a Unix script, a small system composed of a gen_server spawned and controlled by a supervisor.

At that time I didn’t know about Erlang applications, so I decided to do something like this:

erl -pa ebin -sname mynode -cookie mycookie -s mysupervisor start_link

but this led nowhere: the supervisor was not running at all, and this was really frustrating. The thing that annoyed me the most was that by using:

erl -pa ebin -sname mynode -cookie mycookie -s myserver start_link

the gen_server was started successfully, so I thought that the problem was in the supervisor but in fact it wasn’t.

Where was the problem here? Probably the thing most people don’t know (and for sure the one I didn’t know at the time) is that the supervisor process is linked to the process that calls start_link: in an interactive session that is the shell, while with the -s flag it is a short-lived process that evaluates the given functions during start-up and then exits. When that calling process exits, the supervisor is taken down as well (a plain gen_server, which does not trap exits by default, simply ignores the caller’s normal exit and survives, which is why myserver appeared to work).

To tackle this problem you could take two different approaches:

1) unlink the supervisor from its calling process by using unlink/1
2) package your code using an application

The first solution can be achieved by changing the start_link/0 function of the supervisor as follows:

start_link() ->
    {ok, Pid} = supervisor:start_link({local, ?SERVER}, ?MODULE, []),
    %% Remove the link to the calling process so the supervisor is not
    %% taken down when that process exits
    unlink(Pid),
    {ok, Pid}.

Although this solution solves your problem, I strongly suggest you adopt the second approach listed above. Applications in Erlang are really easy to understand and implement, and plenty of information can be found online, for example in Mitchell Hashimoto’s blog or in the standard documentation.
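
For the sake of completeness, here is a rough sketch of what the application approach could look like; the names myapp and myapp_app are invented for this example, while mysupervisor and myserver are the modules mentioned above.

%% ebin/myapp.app - the application resource file
{application, myapp,
 [{description, "Example application wrapping mysupervisor"},
  {vsn, "0.1.0"},
  {modules, [myapp_app, mysupervisor, myserver]},
  {registered, []},
  {applications, [kernel, stdlib]},
  {mod, {myapp_app, []}}]}.

%% myapp_app.erl - the application callback module
-module(myapp_app).
-behaviour(application).

-export([start/2, stop/1]).

%% Called by the application controller; the returned pid becomes the
%% top-level supervisor of the application
start(_StartType, _StartArgs) ->
    mysupervisor:start_link().

stop(_State) ->
    ok.

With this in place the whole tree can be started with application:start(myapp). from the shell, or from the command line with something along the lines of erl -pa ebin -sname mynode -cookie mycookie -eval "application:start(myapp)"; the supervisor is now started by the long-lived application machinery instead of being linked to a short-lived calling process.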

Supervisors in Erlang OTP

January 28, 2010

A supervisor is an OTP behaviour (a process design pattern) made available in a set of library modules that come with the standard Erlang distribution. In practice a supervisor is a piece of code, written by expert developers, which simplifies our life!
Basically OTP provides two different kinds of processes: workers (e.g. gen_server or gen_fsm processes), which as you can easily understand do the actual job, and supervisors, which monitor the status of workers (or of other supervisors).

In a typical supervision tree diagram, square boxes represent supervisors and circles represent workers.

A supervisor’s task is not only to check whether its children are up and running: it may also take action when something goes wrong (e.g. restart a child if it is down).

As always, let’s introduce some code and see how things work!

-module(my_supervisor).
-behaviour(supervisor).

%% API
-export([start_link/0]).

%% Supervisor callbacks
-export([init/1]).
-define(SERVER, ?MODULE).

%%====================================================================
%% API functions
%%====================================================================
%%--------------------------------------------------------------------
%% Function: start_link() -> {ok,Pid} | ignore | {error,Error}
%% Description: Starts the supervisor
%%--------------------------------------------------------------------
start_link() ->
  supervisor:start_link({local, ?SERVER}, ?MODULE, []).
%%====================================================================
%% Supervisor callbacks
%%====================================================================
%%--------------------------------------------------------------------
%% Func: init(Args) -> {ok,  {SupFlags,  [ChildSpec]}} |
%%                     ignore                          |
%%                     {error, Reason}
%% Description: Whenever a supervisor is started using
%% supervisor:start_link/[2,3], this function is called by the new process
%% to find out about restart strategy, maximum restart frequency and child
%% specifications.
%%--------------------------------------------------------------------
init([]) ->
  Child = {mychild, {mychild, start_link, []},
           permanent, 2000, worker, [mychild]},
  {ok, {{one_for_all, 1, 1}, [Child]}}.

The supervisor is started by the function start_link/0; the newly spawned supervisor process then calls the init/1 callback.

As you can see we declared a variable named Child which represents the specification of a child for which the supervisor is responsible.

The variable is in the form: {Id, StartFunc, Restart, Shutdown, Type, Modules}

Id : is the name used to identify the child specification internally by the supervisor

StartFunc : represents the function used to start the child process. It is a module-function-arguments tuple used as apply(M, F, A)

Restart : defines when a terminated child process should be restarted. It can be one of:

  1. permanent : in this case the child process is always restarted
  2. transient : in this case the child process is restarted only if it terminates abnormally, i.e. with an exit reason other than normal
  3. temporary : in this case the child process is never restarted

Shutdown : defines how a child process should be terminated

  1. brutal_kill : in this case the child process is unconditionally terminated using exit(Child, kill)
  2. integer value : in this case the supervisor tells the child process to terminate by calling exit(Child, shutdown) and then waits for an exit signal back. If no exit signal is received within the specified time (in milliseconds), the child process is unconditionally terminated using exit(Child, kill)
  3. infinity : in this case the child process is another supervisor, and Shutdown should be set to infinity to give the subtree enough time to shut down

Type : specifies whether the child process is a supervisor or a worker (supervisor | worker)

Modules : this should be a list with one element [Module], where Module is the name of the callback module, if the child process is a supervisor, gen_server or gen_fsm. If the child process is a gen_event, Modules should be dynamic

The returned tuple is in the form: {ok, {{RestartStrategy, MaxR, MaxT}, [Child]}}

RestartStrategy : indicates how to handle process restart. Can be one of the following:

  1. one_for_one : if a child process terminates, only that process is restarted
  2. one_for_all : if a child process terminates, all other child processes are terminated and then all child processes, including the terminated one, are restarted
  3. rest_for_one : if a child process terminates, the ‘rest’ of the child processes (i.e. the child processes after the terminated one in start order) are terminated. Then the terminated child process and the rest of the child processes are restarted

MaxR and MaxT are related: if more than MaxR number of restarts occur in the last MaxT seconds, then the supervisor terminates all the child processes and then itself. When the supervisor terminates, then the next higher level supervisor takes some action. It either restarts the terminated supervisor, or terminates itself.

As you can see, the only element inside the list is Child, which contains the child specification we defined before; obviously you can add more than one child specification to this list.
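
For example (otherchild being a hypothetical second worker module), a supervisor responsible for two children could return:

init([]) ->
    Child1 = {mychild, {mychild, start_link, []},
              permanent, 2000, worker, [mychild]},
    Child2 = {otherchild, {otherchild, start_link, []},
              permanent, 2000, worker, [otherchild]},
    {ok, {{one_for_all, 1, 1}, [Child1, Child2]}}.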

Let’s test our supervisor. The first thing I will do is write a simple gen_server that prints a message every time it is started:

-module(mychild).
-behaviour(gen_server).

%% API
-export([start_link/0]).

%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
	 terminate/2, code_change/3]).

-record(state, {}).

%%====================================================================
%% API
%%====================================================================
%%--------------------------------------------------------------------
%% Function: start_link() -> {ok,Pid} | ignore | {error,Error}
%% Description: Starts the server
%%--------------------------------------------------------------------
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%%====================================================================
%% gen_server callbacks
%%====================================================================

%%--------------------------------------------------------------------
%% Function: init(Args) -> {ok, State} |
%%                         {ok, State, Timeout} |
%%                         ignore               |
%%                         {stop, Reason}
%% Description: Initiates the server
%%--------------------------------------------------------------------
init([]) ->
    io:format("supervisor started me!~n", []),
    {ok, #state{}}.

%%--------------------------------------------------------------------
%% Function: %% handle_call(Request, From, State) -> {reply, Reply, State} |
%%                                      {reply, Reply, State, Timeout} |
%%                                      {noreply, State} |
%%                                      {noreply, State, Timeout} |
%%                                      {stop, Reason, Reply, State} |
%%                                      {stop, Reason, State}
%% Description: Handling call messages
%%--------------------------------------------------------------------
handle_call(_Request, _From, State) ->
    Reply = ok,
    {reply, Reply, State}.

%%--------------------------------------------------------------------
%% Function: handle_cast(Msg, State) -> {noreply, State} |
%%                                      {noreply, State, Timeout} |
%%                                      {stop, Reason, State}
%% Description: Handling cast messages
%%--------------------------------------------------------------------
handle_cast(_Msg, State) ->
    {noreply, State}.

%%--------------------------------------------------------------------
%% Function: handle_info(Info, State) -> {noreply, State} |
%%                                       {noreply, State, Timeout} |
%%                                       {stop, Reason, State}
%% Description: Handling all non call/cast messages
%%--------------------------------------------------------------------
handle_info(_Info, State) ->
    {noreply, State}.

%%--------------------------------------------------------------------
%% Function: terminate(Reason, State) -> void()
%% Description: This function is called by a gen_server when it is about to
%% terminate. It should be the opposite of Module:init/1 and do any necessary
%% cleaning up. When it returns, the gen_server terminates with Reason.
%% The return value is ignored.
%%--------------------------------------------------------------------
terminate(_Reason, _State) ->
    ok.

%%--------------------------------------------------------------------
%% Func: code_change(OldVsn, State, Extra) -> {ok, NewState}
%% Description: Convert process state when code is changed
%%--------------------------------------------------------------------
code_change(_OldVsn, State, _Extra) ->
    {ok, State}.

Now let’s test it in an Erlang shell:

bellerofonte@pegaso:~/Desktop$ erl
Eshell V5.7.3  (abort with ^G)
1> l(mysupervisor).
{module,mysupervisor}
2> mysupervisor:start_link().
supervisor started me!
{ok,<0.35.0>}
3> whereis(mychild).
<0.36.0>
4> erlang:exit(whereis(mychild), kill).
true
supervisor started me!
5> whereis(mychild).
<0.39.0>

As you can see, when we start the supervisor the child is started as well, with pid <0.36.0>.

After that we kill the child process using the Erlang BIF exit/2; at that point the supervisor restarts the child, and a subsequent whereis/1 call shows that the process has been restarted with a new pid, <0.39.0>.

A lot more information about this topic can be found in the official Erlang/OTP Design Principles documentation!