An interview with Steve Vinoski (@stevevinoski)

May 14, 2013

Today you can read my interview with Steve Vinoski, a famous Erlang developer/speaker and distributed systems expert. Steve will give the talk “Addressing Network Congestion in Riak Clusters” at Erlang User Conference 2013.

Some questions, some answers

Paolo – Hi Steve! It’s really good to have one of the most famous Erlangers here on my blog. Would you mind introducing yourself to our readers in a few words?

Steve - I’m Steve Vinoski, a member of the architecture group at Basho Technologies, the makers of Riak and RiakCS. I have a background in middleware and distributed systems, and have been an Erlang user since 2006.

Paolo – I know you are an expert in several programming languages. How did you end up using Erlang? Did you have any previous experience with functional languages?

Steve - As far as functional languages go, I’ve played with them on and off for decades, but never used one in production until I found Erlang.

I worked in middleware from 1991 to 2007, and in 2004 at IONA Technologies I started looking into innovative ways of expanding our product line and reducing the cost of product development. IONA’s products were written in C++, which I’ve used since 1988 and so I am well aware of its complexity, and Java, which frankly I’ve never liked (I like the JVM but don’t like the Java language). Neither language lends itself to rapid development or easy maintenance. I built a prototype that layered Ruby over one of our C++ products that allowed for an order of magnitude decrease in the number of lines of code required to write client applications, and built another prototype that provided a JavaScript layer for writing server applications, but customers didn’t seem interested, and both approaches only increased development and maintenance costs.

Then I found Erlang/OTP. I grew more and more intrigued as I discovered that it already provided numerous features that we had spent years developing and maintaining in our middleware systems, things like internode messaging, node monitoring, naming and discovery, portability across multiple network stacks, logging, tracing, etc. Not only did it provide all the features we needed, but its features were much more powerful and elegant. I put together a proposal for the IONA executive team that suggested we rebuild all of our product servers in Erlang so we could reduce maintenance costs, but the proposal was rejected because, as I later learned, they were trying to sell the company so it didn’t make sense to make such large changes to the code. I left IONA and joined Verivue, where we built video delivery hardware, and there I trained seven or eight other developers in Erlang and we used it to great advantage. After Verivue, I wanted to continue working with Erlang, which is part of the reason I joined Basho.

Paolo – In your blog you state that Erlang is your favourite programming language. Why?

Steve - To me Erlang/OTP is the type of system my middleware colleagues and I spent years trying to create. It’s got so many things a distributed systems developer wants: easy access to networking, libraries and utilities to make interacting with distributed nodes straightforward, wonderful concurrency support, all the upgrading and reliability capabilities, and the Erlang language itself is sort of a “distributed systems DSL” where its elegance and small size make it easy to learn and easy to use to quickly become productive building distributed applications. And as if that’s not enough, the Erlang community is great, pleasantly supporting each other and newcomers while avoiding pointless arguments and rivalries you find in other communities. My use of other programming languages has actually decreased in recent years due primarily to my continued satisfaction with Erlang/OTP; it’s not great for every problem, but it’s fantastic for the types of problems I tend to work on.

Paolo – I know that in a previous job you had to deal with multimedia systems, a field where Erlang still has a minor impact compared to languages like C++. Do you think Erlang will be able to find its place in this field as well? Can you give reasons for your answer?

Steve - Erlang/OTP is excellent for server systems in general, including multimedia servers. The Verivue system I worked on a few years ago had special TCP offload hardware for video delivery, so we didn’t need Erlang for that. Rather, we used Erlang for the control plane, which for example handled incoming client requests, looked up subscriber details in databases, and interacted with the hardware to set up multimedia data flows. Multimedia systems also have to integrate with billing systems, monitoring systems, and hardware from other vendors, and Erlang shines there as well, especially when it comes to finding bugs in the other systems and hot-loading code to compensate for those bugs. Customers tend to love you when you can quickly turn around fixes like that.

Another Erlang developer, Max Lapshin, built and supports erlyvideo, which seems to work well. I’ve never met Max but I know he faced some challenges along the way, as we did at Verivue, but I think he’s generally happy with how erlyvideo has turned out.

Paolo – Currently you are working at Basho, a very important company in the Erlang world. Do you mind telling our readers something more about your job?

Steve - At Basho I work in CTO Justin Sheehy’s architecture group. It’s a broad role with a lot of freedom to speak at and attend conferences and meetups, and I also work on research projects and pick up development tasks and projects from our Engineering team and Professional Services team when they need my help.

Paolo – At Erlang User Conference 2013 you will give a talk about Riak, its behaviour under extreme loads and the issues we may face when we want to scale it. Can you tell us something more about the topic?

Steve - At Basho we’re fortunate to have customers who continually push the boundaries of Riak’s comfort zone. Network traffic in Riak all goes over TCP — client requests, intracluster messages, and distributed Erlang communication. When clusters are extremely busy with client requests and transfer of data and messages between nodes, under certain conditions network throughput can drop significantly and messages can be lost, including messages intended for client applications. I am currently investigating the use of alternative network protocols to see if they can help prioritize different kinds of network traffic. This work is not yet finished, so my talk will give an overview of the problems along with the current status of the solution I’m investigating.

Paolo – I heard that during the talk you will also introduce a new Erlang network driver that should tackle some of these issues. Is this correct? Can you give us an insight?

Steve - Yes, I have been working on a new network driver. It implements an alternative UDP-based protocol for data transfer that can utilize full bandwidth when available but can also watch for congestion on network switches and quickly back off when detected. It also yields to TCP traffic under congestion conditions, preventing background data transfer tasks from shutting out more important messages like client requests and responses.

Paolo – Who should be interested in this talk? What are the minimum prerequisites needed in order to fully understand the topics of the talk?

Steve - Attendees should have a high-level understanding of Erlang’s architecture, what drivers are, and how they fit into the system. Other than that, my talk will explain in detail the problems I’m trying to address as well as the solution I’ve been investigating, so neither deep networking expertise nor deep understanding of Erlang internals is required.

Paolo – I can say without a doubt that you are an expert in middleware and distributed computing systems. Can you suggest some books or internet resources to our readers interested in those topics?

Steve - The nice thing about distributed systems is that they never seem to get any easier, so there has been interesting research and development in this area for decades. The downside of that is that there are an enormous number of papers I could point to. In no particular order, here are some interesting papers and articles, most of which are currently sitting open in my browser tabs:

“Eventual Consistency Today: Limitations, Extensions, and Beyond”, Peter Bailis, Ali Ghodsi. This article provides an excellent description of eventual consistency and recent work on eventually consistent systems.

“A comprehensive study of Convergent and Commutative Replicated Data Types”, M. Shapiro, N. Preguiça, C. Baquero, M. Zawirski. This paper explores and details data types that work well for applications built on eventually consistent systems.

“Notes on Distributed Systems for Young Bloods”, J. Hodges. This excellent blog post succinctly summarizes the past few decades of distributed systems research and discoveries, and also explains some implementation concerns we’ve learned along the way to keep in mind when building distributed applications.

“Impossibility of Distributed Consensus with One Faulty Process”, M. Fischer, N. Lynch, M. Paterson. This paper is nearly 30 years old but is critical to understanding fundamental properties of distributed systems.

“Dynamo: Amazon’s Highly Available Key-value Store”, G. DeCandia, et al. A classic paper detailing trade-offs for high availability distributed systems.

Paolo – Day-by-day Erlang becomes more popular. In your opinion what can we expect from Erlang in the future? What are the next goals the Erlang community should try to reach?

Steve - Under the guidance of Ericsson’s OTP team and with valuable input from the open source community, Erlang/OTP continues to evolve gracefully to address production systems. I expect Erlang will continue to improve as a language and platform for building large-scale systems that perform well and are relatively easy to understand, reason about, and maintain without requiring an army of developers. In particular I’m looking forward to the OTP team’s continued work on optimizing multicore Erlang process scheduling. The Erlang community is very good at proving how good Erlang/OTP is through the results of the systems they build, so they need to keep doing that to broaden Erlang’s appeal. If you’re a developer building practical open source or commercial software, the presentations given by community members at events like the Erlang User Conference and the Erlang Factory conferences are amazing sources of knowledge and wisdom for what works well for Erlang/OTP applications and what can be problematic.

Erlang Camp 2013 is coming!

May 6, 2013

Amsterdam: beautiful city of bicycles, canals and….. Erlang!

Nothing to do on Aug 30-31, 2013? What about travelling to the lovely city of Amsterdam and attending Erlang Camp 2013?

If you have been following my blog for a while you should already know what Erlang Camp is: an intensive two day learning experience focused on getting you up to speed on creating large scale, fault tolerant distributed applications in Erlang.

In particular, during Erlang Camp 2013, which is sponsored by the amazing company SpilGames, you will get in touch with several Erlang topics such as:

  • Erlang basic stuff
  • Erlang OTP
  • How to ship your Erlang code using applications and releases
  • Erlang Distribution

More information on the Erlang Camp schedule may be found in this web page.

Erlang Camp is a pretty good way to learn the Erlang language and to get in touch with some of the best Erlang teachers and developers out there. Only 100 seats are available and they will go quickly, so I suggest you hurry and register for the event!

 

How to handle configuration in the init/1 function without slowing down your Erlang supervisor startup

April 17, 2013

Many times if you work with Erlang and follow the OTP design principles in your implementation, you may end up having one or more supervisors spawning a set of processes that can be either other supervisors or workers.

Most likely the child processes implementing the workers will be based on the gen_server behaviour and, whether you’re working on a small side project or on some big company project, they will need some sort of initialization during the start-up phase: creating an ets or mnesia table, reading a configuration file or accepting connections on a socket are pretty common operations that you want executed before the worker handles other messages and performs the operations connected to them.

According to the relevant documentation, the gen_server process calls Module:init/1 to initialize, so the first strategy you may think of is to do the operations listed above within this function. What I mean here is something like:

start_link() ->
    gen_server:start_link({local, ?SERVER}, ?MODULE, [], []).

init([]) ->
    %% Some configuration operation here (e.g. handle ets/mnesia table)
    {ok, #state{}}.

This kind of approach is pretty common when the operations we want to take during initialization are cheap in terms of time, but what happens if the initialisation is expected to take a long time?

Suppose you have a supervisor that spawns many children, each of which needs some time-consuming configuration. The supervisor will call the function start_link/3,4 of each child in sequence and will not be able to move on until Module:init/1 of the child it is starting has returned. This means that the supervisor can’t start the next child in the meantime, and the whole supervisor startup phase slows down.

How can we solve this issue? There are a few different ways to do it, but all of them split the gen_server initialisation into two phases. In the first phase, implemented in the init/1 function, we trigger an internal message for a future configuration and return immediately to the supervisor. In the second phase the configuration actually takes place. In this way we free the supervisor startup from the time burden of all the children’s configuration.

Let’s see with some code the most common ways to achieve this result. My favourite technique consists of triggering the future configuration using a gen_server cast inside the init/1 function, as follows:

start_link() ->
    gen_server:start_link({local, ?SERVER}, ?MODULE, [], []).

init([]) ->
    gen_server:cast(self(), startup),
    {ok, #state{}}.

As you can see within the init/1 function we trigger a cast message to our process and immediately return. At this point we just need to handle the cast in the function handle_cast/2 and perform the needed configuration. This can be done in this way:

handle_cast(startup, State) ->
    %% Do your configuration here
    {noreply, State}.
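Putting the two fragments together, a complete module might look like the following minimal sketch. The module name deferred_init_server, the ready field, the is_ready/0 helper and the timer:sleep/1 call standing in for the real configuration work are all assumptions made for illustration; none of them comes from a real project.

```erlang
-module(deferred_init_server).
-behaviour(gen_server).

-export([start_link/0, is_ready/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-define(SERVER, ?MODULE).

%% 'ready' is a hypothetical field used only to observe the second phase.
-record(state, {ready = false :: boolean()}).

start_link() ->
    gen_server:start_link({local, ?SERVER}, ?MODULE, [], []).

%% Hypothetical helper: asks the server whether configuration has run.
is_ready() ->
    gen_server:call(?SERVER, is_ready).

init([]) ->
    %% Phase one: queue a message to ourselves and return at once,
    %% so the supervisor can go on starting the next child.
    gen_server:cast(self(), startup),
    {ok, #state{}}.

handle_call(is_ready, _From, State) ->
    {reply, State#state.ready, State}.

handle_cast(startup, State) ->
    %% Phase two: placeholder for the slow configuration work.
    timer:sleep(100),
    {noreply, State#state{ready = true}}.

handle_info(_Msg, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

In a shell, start_link/0 returns right away, while a following is_ready/0 call is queued behind the startup cast and is therefore answered only once the configuration has finished.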

A different way to achieve the same “two phases” result can be implemented as follows:

start_link() ->
    gen_server:start_link({local, ?SERVER}, ?MODULE, [], []).

init([]) ->
    self() ! startup,
    {ok, #state{}}.

This time we first send the atom ‘startup’ to the gen_server and then we return. Of course we need to handle that message within our gen_server as follows:

handle_info(startup, State) ->
    %% Do your configuration here
    {noreply, State}.

As you can see the logic is pretty much the same here so I won’t go into further details. 

The last way to achieve our result can be implemented by taking advantage of a timeout message. In practice in the init/1 function, instead of returning the tuple {ok, #state{}} we return the tuple {ok, #state{}, Timeout}. By including the value Timeout in the last tuple we specify that a ‘timeout’ atom will be sent to our gen_server unless a request or a message is received within Timeout milliseconds.

The ‘timeout’ atom should be handled by the handle_info/2 callback function. By setting the value of Timeout to 0 and adapting handle_info we can implement once again in an easy way our “two phases” configuration. Let’s see how this can be obtained:

start_link() ->
    gen_server:start_link({local, ?SERVER}, ?MODULE, [], []).

init([]) ->
    {ok, #state{}, 0}.

And the ‘timeout’ message can be handled as:

handle_info(timeout, State) ->
    %% Do your configuration here
    {noreply, State}.

Personally I don’t like the last approach because the atom ‘timeout’ is not very meaningful and can lead to misunderstandings. More importantly, this approach relies on an internal timeout that has to be evaluated: we can’t be sure that the message will be delivered immediately, we only know that it will be delivered after at least 0 milliseconds, and only if no other request or message arrives first.

Some readers may say that this “two phases” approach is risky, because nothing guarantees that the configuration message will be the first message handled in either handle_cast/2 or handle_info/2. Actually this is not completely true.

We can’t be 100% sure that a message sent in init/1 using either a cast or the ! operator will be the first one in the message queue of our gen_server, but considering that start_link/3,4 is synchronous there is very little chance for another process to send a message to our gen_server before we send the configuration message.

Final consideration: there is a more elegant way to achieve the same result that consists in the combination of the functions start_link/3 and init_ack/1 of the module proc_lib. For those of you interested in the topic I suggest the user guide of ranch.
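As a rough illustration of that idea, here is a minimal sketch that starts the process with proc_lib:start_link/3, acknowledges the parent with proc_lib:init_ack/1 and only then does the slow work. Entering the gen_server loop with gen_server:enter_loop/3 is my own addition to make the example complete; the module name ack_init_server, the ready field, the is_ready/1 helper and the timer:sleep/1 placeholder are hypothetical.

```erlang
-module(ack_init_server).
-behaviour(gen_server).

-export([start_link/0, is_ready/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% 'ready' is a hypothetical field used only to observe the result.
-record(state, {ready = false :: boolean()}).

start_link() ->
    %% proc_lib spawns the process and blocks until init_ack/1 is called.
    proc_lib:start_link(?MODULE, init, [[]]).

%% Hypothetical helper: asks the server whether configuration has run.
is_ready(Pid) ->
    gen_server:call(Pid, is_ready).

%% Here init/1 is the proc_lib entry point, not a gen_server callback:
%% gen_server never calls it because we enter the loop ourselves.
init([]) ->
    %% Unblock the parent (e.g. the supervisor) before the slow part.
    proc_lib:init_ack({ok, self()}),
    %% Placeholder for the slow configuration work.
    timer:sleep(100),
    %% Turn this plain process into a gen_server with the final state.
    gen_server:enter_loop(?MODULE, [], #state{ready = true}).

handle_call(is_ready, _From, State) ->
    {reply, State#state.ready, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(_Msg, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

With this layout no configuration message is needed at all: any request sent to the process simply waits in its mailbox until enter_loop/3 starts serving it.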

A mock APN server in #erlang

April 1, 2013

The number of apps people install on their smartphones is increasing day by day. One of the most interesting technologies a developer may employ when developing an app is push notifications.

Nowadays the biggest companies out there are providing some service in this sense: Apple has Apple Push Notification, Google has Google Cloud Messaging, Nokia has Nokia Notifications and Microsoft has Push Notification for Windows Phone.

Some time ago I had to work with Apple Push Notification, which does not provide a REST interface to the service as the other platforms do, but instead provides a protocol for exchanging streams between client and server.

Most of the stuff I had to deal with when working with APN was fairly easy, especially because I was working with Erlang: I do believe that Erlang, and in particular its binary syntax, can be really helpful when you have to build a packet following the Apple specifications, and in fact a lot of resources are available online on this subject, for example the tutorial you may find in this blog.
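To give an idea of how the binary syntax helps, here is a minimal sketch that builds notifications in the simple and enhanced formats. The field layout (command byte, optional identifier and expiry, 16-bit length prefixes, 32-byte token) is written from my recollection of Apple's specification and should be checked against it; the module and function names are hypothetical.

```erlang
-module(apn_packet).
-export([simple/2, enhanced/4]).

%% Simple format (as assumed here):
%% command 0 | token length | token | payload length | payload
simple(Token, Payload) when byte_size(Token) =:= 32, is_binary(Payload) ->
    <<0:8,
      32:16/big, Token/binary,
      (byte_size(Payload)):16/big, Payload/binary>>.

%% Enhanced format (as assumed here):
%% command 1 | identifier | expiry | token length | token |
%% payload length | payload
enhanced(Id, Expiry, Token, Payload)
  when byte_size(Token) =:= 32, is_binary(Payload) ->
    <<1:8,
      Id:32/big, Expiry:32/big,
      32:16/big, Token/binary,
      (byte_size(Payload)):16/big, Payload/binary>>.
```

With a 32-byte token and the two-byte payload <<"{}">>, simple/2 yields a 39-byte binary and enhanced/4 a 47-byte one, the extra 8 bytes being the identifier and the expiry.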

During my initial implementation and troubleshooting I didn’t want to test my own software on the real APN server. Yes I know, you may use the sandbox for testing purposes, but if you are running a test with hundreds of thousands of notifications in order to stress your backend it is not a good idea to spam APN. Moreover, APN can be tricky sometimes, so I preferred to test my software in a controlled environment.

For this reason I implemented mockapn, a mock server that provides very basic functionality but that may be very helpful when you want to test your implementation of an APN client.

mockapn was built using the strategy proposed by Fred in the fantastic book LYSE: the supervisor (mock_apn_sup) starts a set of acceptors, and whenever a connection is handled a new acceptor is spawned so that we always have n acceptors ready. Of course mockapn is built using the ssl module and not gen_tcp, but the implementation is almost the same; you only have to remember to change the values for certificate, key and password in the supervisor according to your needs.

Any incoming connection to mockapn is handled  using a module (mock_apn_server) implemented on top of the gen_server behaviour: this module parses the incoming binaries looking for APN packets built using simple or enhanced formats and prints on screen the token and the JSON received.

Only for the enhanced format does mock_apn_server check whether the token is invalid (invalid tokens are specified in the macro INVALID_TOKENS); if so, it sends back an error message to the client and closes the connection. Since in my experience this is the most common error I didn’t implement any other kind of error check in mockapn, but you are free to fork and adapt the code according to your needs. You may want to see the README file for more information on how to compile and run mockapn.
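The parsing and the token check can be sketched with pattern matching on binaries. This is not the actual mockapn code: the module name, the function names and the packet layout are assumptions, as are the error response layout (command 8, status byte, identifier) and the status value 8 for an invalid token, which come from my recollection of Apple's specification.

```erlang
-module(mock_apn_parse_sketch).
-export([parse/1, check/2, error_response/2]).

%% Assumed status code for "invalid token" in the error response.
-define(INVALID_TOKEN_STATUS, 8).

%% Enhanced format: the lengths matched earlier in the pattern are
%% used as sizes for the token and payload segments.
parse(<<1:8, Id:32/big, Expiry:32/big,
        TokenLen:16/big, Token:TokenLen/binary,
        PayloadLen:16/big, Payload:PayloadLen/binary,
        Rest/binary>>) ->
    {enhanced, Id, Expiry, Token, Payload, Rest};
%% Simple format: no identifier and no expiry.
parse(<<0:8,
        TokenLen:16/big, Token:TokenLen/binary,
        PayloadLen:16/big, Payload:PayloadLen/binary,
        Rest/binary>>) ->
    {simple, Token, Payload, Rest};
%% Anything else is treated as not yet complete (or not recognised).
parse(Bin) when is_binary(Bin) ->
    incomplete.

%% Only enhanced notifications carry an identifier to report back.
check({enhanced, Id, _Expiry, Token, _Payload, _Rest}, InvalidTokens) ->
    case lists:member(Token, InvalidTokens) of
        true  -> {error, error_response(?INVALID_TOKEN_STATUS, Id)};
        false -> ok
    end;
check({simple, _Token, _Payload, _Rest}, _InvalidTokens) ->
    ok.

%% Assumed error packet layout: command 8 | status | identifier.
error_response(Status, Id) ->
    <<8:8, Status:8, Id:32/big>>.
```

A server built this way would send the binary returned in the {error, Bin} tuple over the socket and then close the connection, which matches the behaviour described above.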

Now you may ask: “Paolo, is APN as easy as you say?”. No, some parts of APN are indeed tricky.

For example it is obvious that you can’t start an SSL connection to APN only to send a single notification and then close the connection. You can’t do it because:

  1. if you do it very often you may bother the APN servers, which may see you as a DoS attacker
  2. most of the time you want to send multiple push notifications in bulk over the same SSL connection.

If all your notifications in the aforesaid bulk are correct then no problem, APN will forward them to the clients and everybody will be happy.

But what happens if one of the notifications in the middle of the bulk is wrong? Well, from what I have experienced, if one of the notifications is not valid (e.g., it has an invalid token) the connection will be closed and you will receive an error message (if you are using the enhanced format), but all the notifications sent on the same socket after the wrong one are lost in limbo, and you will not know whether they were right or wrong. In Erlang this means that if you send your notifications quickly you will get an ok for all of them, and only at some later point will you get a message telling you that the socket was closed by the peer.

I suggest you keep this in mind when you deal with APN, because only a few websites mention this issue, for example this stackoverflow question and this google project.

mockapn behaves in a similar way and you may test it using the file mock_apn_test.erl (which you have to compile using good old erlc). In mock_apn_test I send three notifications to mockapn (you have to change the tokens according to your needs); the second one should have a wrong token (i.e., one of the tokens you specified previously in the list INVALID_TOKENS in mock_apn_server).

Running this code, the client will receive an error message from mockapn for the second packet, but the third push will never be delivered and will be lost even though it was correct.

Ok, let me know if you have any more specific questions!

Last thing! I want to add this nice youtube video I found in an online tutorial (sadly I lost the link to it). In the tutorial the author states that coding a bulletproof APN interface is as easy as the action you will see in the video :D

Categories: Erlang

An interview with Eugene Fooksman #erlang

March 27, 2013

Hello guys! Finally I had some free time to post on this blog :D . Today you can read my interview with Eugene Fooksman, a software developer at WhatsApp. As always I hope you will like it!

WhatsApp Eugene?

Paolo – Hello Eugene. Welcome to my blog! Would you like to briefly introduce yourself to our readers?
Eugene - Hello, thanks for inviting me. My name is Eugene Fooksman, I live in Silicon Valley, California, and I’m currently a software developer for the server group at WhatsApp, the company providing multi-platform mobile messaging service.

Paolo - Can you tell us something more about your experience with Erlang? When did you start with it? Why?
Eugene - I started using Erlang only here, at WhatsApp. My previous expertise and long time affection was C++, I used it for many years and, naturally, considered it to be the king of programming languages, as many C++ programmers do.
The WhatsApp server is almost completely implemented in Erlang, so I had to learn it when I came here, with a certain reluctance at first, I must admit. But it took no more than about a week to completely fall in love with it.

Paolo - As an Erlang developer, what do you think about this coding language? Do you like it or is it just a part of your work?
Eugene - I like it a lot. Its functional nature is a bit hard to accept for a person with an object oriented mindset, but once you go over that line – the simplicity of many basic routine things never ceases to amaze.
For instance, tail recursion being embedded in the very core and syntax of the language not only compensates for the lack of standard loops, but also totally changes the way you think of anything iterative in your code and provides very rich coding capabilities.
Another such example is pattern matching – a very powerful way of processing heterogeneous values returned by the called code. It’s very disappointing for me now to not be able to do it in other languages.
Of course I’m not even talking about the real high-level advantages of Erlang as a perfect platform for concurrent programming and handling multi-user environments, like communication servers. Parallelism, transparency of remote vs local processes, and a set of very well defined behaviors (which are essentially implementations of fundamental server concepts and patterns like generic server, event manager, state machine etc.) – all of this makes writing server systems quite easy and elegant.

Paolo - In your opinion what are the fundamental things an Erlang developer should focus on in order to improve his coding efficiency?
Eugene - I would definitely start with a deep understanding of OTP principles and concepts. Supervision models can be a little tricky, but it’s worth spending time and effort figuring them out.

OTP has an extensive set of tools and patterns, and it’s important to understand which ones are better for specific tasks. For instance, gen_server and gen_fsm can be two competing approaches for the same system. Another important thing is coding discipline and cleanness – something to always be applied with languages without strong typing, like Erlang.

Paolo - You are currently working at WhatsApp. Can you describe the company in a nutshell?
Eugene - WhatsApp is a small big company.

We deliver billions of messages between millions of users, but at the same time we have only about 35 people working here, only about 20 of them are engineers, which include small mobile teams creating and maintaining client applications for 6 different smartphone platforms, and a server group, making sure all those messages are delivered. With the great team and very concentrated focus on our product – it’s a lot of fun to be here and help creating such a cool service.

Paolo - How long has WhatsApp been using Erlang? Did the company switch from a different language or did you decide to start the project in Erlang from the beginning?
Eugene - The WhatsApp server started from ejabberd – the famous open source Jabber server written in Erlang. It was originally chosen for a group of reasons, including openness, great reviews by developers, ease of start and the promise of Erlang’s long term suitability for a large communication system. We started from ejabberd and made just a few extensions and changes to get the WhatsApp service up and running.
We spent the next few years re-writing and modifying quite a few parts of ejabberd, including switching from XMPP to an internally developed protocol, restructuring the code base and redesigning some core components, and making lots of important modifications to the Erlang VM to optimize server performance.

Paolo - According to this tweet, in December 2012 WhatsApp hit a new record: 7B msgs inbound, 11B msgs outbound = 18 billion total messages in one day. Do you think Erlang helped to achieve such a wonderful result?
Eugene - We are delivering more than 7 billion messages every day. It’s a big number, although it is not unheard of among internet and communication giants. Our great achievement is that we manage it with a really small server footprint. And the consensus in our team is that it is largely because of Erlang. We’re managing to serve a huge number of connections from a single front-end server (this is from last year’s Erlang Factory: https://twitter.com/igorclark/status/185871819427954688). I’m not sure these numbers can be easily matched with other technologies.

Paolo - Do you think you could achieve the same results with other languages? In your opinion, could the final result compete with the Erlang implementation in terms of time and lines of code?
Eugene - Yes, this volume of messages can be (and has been) achieved with other languages, but Erlang definitely makes it very easy, elegant and manageable by a small team.

Paolo - As an experienced Erlang developer, what would you like to see in the future Erlang releases? Do you think there is still something missing in this language? Something that could potentially help spreading Erlang to all the developers out there?
Eugene - There are a few things that could make writing code just a little easier. For instance, some concept of early returns from within functions and scopes. Also, assumed default values for ‘case’ and ‘if’ statements. Another thing is a VM-level implementation of priority messaging for gen_server. Also we hope some of the performance improvements to the VM we’ve developed at WhatsApp can be adopted and used in future releases.
Categories: Erlang

A new Erlang book in Spanish

February 21, 2013

Hello guys! I would like to inform those of you who speak Spanish that a new Erlang book has been released.

The book by Manuel Rubio is composed of two volumes: the first one, “Un mundo concurrente”, focuses on Erlang basic concepts and walks you through the functional part of Erlang, from the language syntax to the design of concurrent servers handling network connections. The book also covers rebar, explaining how to begin a project, deploy to production and do hot code upgrades.

The book was released in December 2012, and can be downloaded for free as a PDF or bought as a paperback for € 12,00.

The second volume will focus on OTP basics and will cover the Erlang actor model, OTP design principles and other stuff.

Categories: Erlang

An interview with Knut Nesheim (@knutin)

February 19, 2013

Hello folks! I have already told you that I am currently working on a set of interviews with some of the speakers you will find at Erlang Factory SF Bay Area 2013. Today I have the pleasure to interview Knut Nesheim.  Knut is widely known for some applications he created while working at Wooga (e.g., Elli and Locker). In this interview we will ask him something more about his Erlang experience, his famous Erlang applications and the talk he will give at Erlang Factory SF Bay.

Rock ‘n Roll? No, lock and rule! 

 

Paolo – Hi Knut! Thanks for making yourself available for an interview. Please, describe yourself in a few words.

Knut - Hey Paolo, thanks for having me! I’ve been developing software for the last five years, in Norway where I’m originally from, Sweden and now Germany. I’ve always had an urge to understand how things work and computers provide endless opportunities for tinkering, hacking and exploring.

Paolo – You started with an educational background in music and then moved to computers, right? How come you switched?

Knut - After computers, music is my second big interest in life. Music obeys “laws”, many quite intricate. It’s interesting to take apart a piece of music and understand how the parts work together to form something that can create emotion, give energy and inspire in people without them maybe even noticing, like in film. I find music for film to be very interesting.

It became apparent pretty quickly that I was much better at computers than music. I still play guitar and I regularly visit the Berlin Philharmonie. I just bought an electric piano and I want to start taking lessons again. It’s a nice hobby.

Paolo – I do believe that your main language before Erlang was Python. In your opinion what is the most difficult concept of Erlang for a developer coming from Python?

Knut - I was using Python quite a lot, yes. I was doing smaller web applications with lots of business logic.

The hardest part about learning Erlang for me was immutable state. I was used to thinking of data being stored in places and having different parts of the program storing new values in that place. Understanding that after I modify something, there are now two versions of it took some time. It’s a bit sad that it’s mostly inside our programs that we can have immutable state. I find Datomic, the database from Rich Hickey, to be very interesting.
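To make the "two versions" idea concrete for readers coming from Python, here is a minimal sketch using a frozen dataclass. The `Player` type and its fields are made up for illustration; Erlang terms behave this way by default, while in Python you have to opt in.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Player:
    """Hypothetical immutable record; assigning to a field raises an error."""
    name: str
    score: int


before = Player(name="knut", score=10)

# "Modifying" builds a new value. The old one still exists, unchanged,
# so after this line there are two versions of the player.
updated = replace(before, score=11)
```

Any code still holding `before` keeps seeing a score of 10, which is the shift in thinking described above.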

Paolo – You are the author of Elli, an Erlang application I like very much. Can you tell us something about it?

Knut - Good to hear someone likes it! Elli is a special purpose webserver you can run inside your Erlang application to let your code speak HTTP with your clients. It’s not general purpose in that it lacks features you would take for granted if you’re, for example, making a web application. It’s a mashup of ideas from different projects, both within the Erlang community and outside; specifically, it takes a lot of inspiration from Rack.

It has been a fun project to work on and I have received 34 pull requests to date, which isn’t much compared to other bigger projects, but I think it’s great to see users contributing their improvements. It’s used in production at Wooga. Upcoming games are using it in very interesting ways. The traffic is absolutely crazy, with each online user making a request every 1-5 seconds.

Paolo – Why should we take a look at Elli? I mean, in the Erlang community we have already many Web Servers; when should we prefer Elli to them?

Knut - It’s true that there’s a bunch of projects already there, overlapping in quite big areas resulting in duplicated efforts.

I wrote Elli for the needs of Wooga, which were difficult to meet with the existing projects. Now, that’s not because they are bad in any way, lack features or are buggy. They are quite successful at making developers more productive.

When the volumes of traffic grow, so does the amount of weird input and the number of unexpected interactions in the implementation of the webserver. If you have an error that statistically speaking happens once every 10,000 requests, at Wooga that error happens more than once a second. We found that developer productivity wasn’t all that important for us; it’s more important to focus on performance, robustness and operations.

A webserver that did less, we thought, would be less likely to have features interact in unexpected ways and would be faster from less code (and fewer processes!). Using the Rack style of request-response, where the handler function returns the response (as opposed to sending it directly on the socket), allows us to create middlewares. In the “web world”, middlewares are a common way to plug in third-party functionality, like an authentication middleware inspecting and injecting headers, a middleware implementing cookies or a middleware collecting statistics and exposing them over HTTP by overriding a URL. It doesn’t fit everything; you can’t, for example, post-process a streamed response.
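The Rack-style idea is easy to show in a few lines. The sketch below is written in Python rather than Erlang and is not Elli's API: `Request`, `hello_handler`, `with_auth` and `with_stats` are hypothetical names chosen for illustration. The point it tries to show is that because a handler returns its response as a value, a middleware is just a function that wraps another handler.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Request:
    """Hypothetical request value: just a path and some headers."""
    path: str
    headers: Dict[str, str] = field(default_factory=dict)


# A response is a plain value: (status, headers, body).
Response = Tuple[int, Dict[str, str], bytes]
Handler = Callable[[Request], Response]


def hello_handler(request: Request) -> Response:
    """Innermost handler: returns the response instead of writing to a socket."""
    return 200, {"Content-Type": "text/plain"}, b"hello"


def with_auth(handler: Handler, token: str) -> Handler:
    """Middleware that inspects a header before calling the wrapped handler."""
    def wrapped(request: Request) -> Response:
        if request.headers.get("Authorization") != token:
            return 401, {}, b"unauthorized"
        return handler(request)
    return wrapped


def with_stats(handler: Handler, counters: Dict[int, int]) -> Handler:
    """Middleware that counts status codes and exposes them by overriding /stats."""
    def wrapped(request: Request) -> Response:
        if request.path == "/stats":
            body = ",".join(f"{s}={c}" for s, c in sorted(counters.items()))
            return 200, {"Content-Type": "text/plain"}, body.encode()
        response = handler(request)
        counters[response[0]] = counters.get(response[0], 0) + 1
        return response
    return wrapped
```

Composing them is plain function application, for example `with_stats(with_auth(hello_handler, "secret"), {})`. The limitation mentioned above also shows up here: a middleware can only inspect a response it receives as a complete value, so a response streamed straight to the socket would bypass it.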

If you are able to judge the trade-offs made by the different projects and you find Elli fits your needs better, I would encourage you to use it. I’m happy to help and committed to maintaining the project. If you’re not sure which one to pick, go for Yaws.

Paolo – During your career you have worked at several companies, such as Wooga and Klarna. How does it feel to work at some of the biggest Erlang companies out there?

Knut - Pretty cool! I really enjoyed my time at Wooga. I joined in January 2011 as a “gun for hire” to help out with an experimental Erlang project, started by Paolo Negri. Together we built the first backend using Erlang, paving the way for adoption within the company. They now have 14 developers using Erlang with almost all working on upcoming games.

Now I’m heading up a very interesting project at Unity, the makers of the 3D game engine. It’s too early to talk publicly about the projects we are building, but I’m very excited about this opportunity. The team is starting to form and by the end of the year, we will hopefully have something really cool to show.

Paolo – During the Erlang Factory in San Francisco you will give the following talk “Locker: Consistent Distributed Locking”. Would you like to tell us something about it?

Knut - Locker is a distributed locking service, implemented in Erlang, that uses the Erlang cluster for messaging. It’s a multi-master, in-memory key-value database, where a key expires if the lease is not renewed periodically. In terms of CAP, it will sacrifice availability when a quorum cannot be made.

It’s a hack, in the sense that it’s not an implementation of a “proper” algorithm from a paper. For example, coordination of writes happens in two phases, where if a quorum can be made in the first phase, the second phase writes the value on all masters. Some people properly educated in distributed systems tell me it’s good stuff, others tell me it ignores too many hard problems to be useful, like group membership. I’m looking forward to hopefully getting more feedback.
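The two-phase idea can be sketched in a few lines. What follows is a toy, single-process model in Python, not locker's actual code or protocol: the `Master` class, the `acquire` function and the explicit `now` argument are assumptions made for illustration, and the sketch ignores networking, concurrency and the group membership problem mentioned above.

```python
from typing import Dict, List, Tuple


class Master:
    """Toy in-memory master holding leased keys (hypothetical, not locker's API)."""

    def __init__(self) -> None:
        self.up = True
        self.pending: Dict[str, str] = {}              # key -> value promised in phase one
        self.locks: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def promise(self, key: str, value: str, now: float) -> bool:
        """Phase one: agree unless someone else holds a live lease or promise."""
        held = self.locks.get(key)
        if held is not None and held[1] > now and held[0] != value:
            return False
        if self.pending.get(key, value) != value:
            return False
        self.pending[key] = value
        return True

    def abort(self, key: str, value: str) -> None:
        """Drop a promise when the writer failed to reach a quorum."""
        if self.pending.get(key) == value:
            del self.pending[key]

    def commit(self, key: str, value: str, expires_at: float) -> None:
        """Phase two: store the value together with its lease expiry."""
        self.pending.pop(key, None)
        self.locks[key] = (value, expires_at)


def acquire(masters: List[Master], key: str, value: str,
            lease: float, now: float) -> bool:
    """Try to take (or renew) the lock on `key`; refuse without a quorum."""
    quorum = len(masters) // 2 + 1
    promised = [m for m in masters if m.up and m.promise(key, value, now)]
    if len(promised) < quorum:
        # No quorum: give up rather than risk two holders (availability is sacrificed).
        for m in promised:
            m.abort(key, value)
        return False
    # Quorum reached in phase one: phase two writes on all reachable masters.
    for m in masters:
        if m.up:
            m.commit(key, value, now + lease)
    return True
```

With three masters, a second caller asking for the same key while the lease is live is refused, the same call succeeds once the lease has expired without renewal, and with two of three masters down every `acquire` fails, which matches the CAP trade-off described above.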

Paolo – How come Wooga needed something like Locker? What are the main benefits one may experience using it?

Knut - Just like with Elli, eredis and statman, locker started out as a frustration with what was currently available to us. In the stateful application servers we were building, we needed a way of ensuring that each online user has only one process. This can be solved in many different ways and we found ourselves constantly thinking about it. Our first version was just a central point which serialized all updates, but we wanted it to be available in case of partitions or, more common for us, node failure, software crash and operator error.

We thought about this challenge for more than a year before we decided to tackle it head on. A recurring question was “what if we could do the perfect solution to our problem, rather than trying to shoe-horn existing solutions with complex hacks for scaling?” closely followed by “how hard could it be?” Eventually, I decided to just try it. I’m very happy with the result.

Paolo – How long did it take to have a working version of Locker? I find it interesting that such a piece of software is available for free on GitHub, don’t you agree?

Knut - From deciding to try building it to having a working version took around three weeks, out of which one afternoon was spent coding. The rest of the time I spent thinking, drawing and reading. Counting writing tests, async replication, various optimizations, etc., maybe a month of total effort has been spent over the last year, with contributions from others at Wooga. It’s just 330 lines of code, but coming up with those exact lines was very hard.

From developers to CTO and CEO, Wooga is pushing to open-source what can be open-sourced to hopefully contribute something back to the community. It took some time to convince everybody that locker was something Wooga should give away for free, even to competitors. I was the one holding back, but was eventually convinced by Jesper Richter-Reichhelm, head of engineering. It’s always scary to put something out there for everybody to see.

Paolo – Who should attend your talk and why? What basic skills should a would-be attendee have?

Knut - If you’re interested in distributed systems, it might be interesting to see how locker was built and why it was built that way. If you’re an expert, you could come and help me improve locker by providing feedback on why it’s good or why you think it’s total crap. If you’re interested in the cultural parts of a successful game studio and how we build amazing things in a very short time, you could get a glimpse inside the inner workings. Knowing Erlang isn’t really necessary. The challenging part of understanding locker is not in the code.

 
