An interview with Steve Vinoski (@stevevinoski)
Today you can read my interview with Steve Vinoski, a famous Erlang developer/speaker and distributed systems expert. Steve will give the talk “Addressing Network Congestion in Riak Clusters” at Erlang User Conference 2013.
Some questions, some answers
Paolo – Hi Steve! It’s really good to have one of the most famous Erlangers here on my blog. Would you mind introducing yourself to our readers in a few words?
Steve – I’m Steve Vinoski, a member of the architecture group at Basho Technologies, the makers of Riak and RiakCS. I have a background in middleware and distributed systems, and have been an Erlang user since 2006.
Paolo – I know you are an expert in several programming languages. How did you end up using Erlang? Did you have any previous experience with functional languages?
Steve – As far as functional languages go, I’ve played with them on and off for decades, but never used one in production until I found Erlang.
Then I found Erlang/OTP. I grew more and more intrigued as I discovered that it already provided numerous features that we had spent years developing and maintaining in our middleware systems, things like internode messaging, node monitoring, naming and discovery, portability across multiple network stacks, logging, tracing, etc. Not only did it provide all the features we needed, but its features were much more powerful and elegant. I put together a proposal for the IONA executive team that suggested we rebuild all of our product servers in Erlang so we could reduce maintenance costs, but the proposal was rejected because, as I later learned, they were trying to sell the company so it didn’t make sense to make such large changes to the code. I left IONA and joined Verivue, where we built video delivery hardware, and there I trained seven or eight other developers in Erlang and we used it to great advantage. After Verivue, I wanted to continue working with Erlang, which is part of the reason I joined Basho.
Paolo – In your blog you state that Erlang is your favourite programming language. Why?
Steve – To me Erlang/OTP is the type of system my middleware colleagues and I spent years trying to create. It’s got so many things a distributed systems developer wants: easy access to networking, libraries and utilities to make interacting with distributed nodes straightforward, wonderful concurrency support, all the upgrading and reliability capabilities, and the Erlang language itself is sort of a “distributed systems DSL” where its elegance and small size make it easy to learn and easy to use to quickly become productive building distributed applications. And as if that’s not enough, the Erlang community is great, pleasantly supporting each other and newcomers while avoiding pointless arguments and rivalries you find in other communities. My use of other programming languages has actually decreased in recent years due primarily to my continued satisfaction with Erlang/OTP. It’s not great for every problem, but it’s fantastic for the types of problems I tend to work on.
Paolo – I know that in a previous job you dealt with multimedia systems, a field where Erlang still has a minor impact compared with languages like C++. Do you think Erlang will be able to find its place in this field as well? Can you give reasons for your answer?
Steve – Erlang/OTP is excellent for server systems in general, including multimedia servers. The Verivue system I worked on a few years ago had special TCP offload hardware for video delivery, so we didn’t need Erlang for that. Rather, we used Erlang for the control plane, which for example handled incoming client requests, looked up subscriber details in databases, and interacted with the hardware to set up multimedia data flows. Multimedia systems also have to integrate with billing systems, monitoring systems, and hardware from other vendors, and Erlang shines there as well, especially when it comes to finding bugs in the other systems and hot-loading code to compensate for those bugs. Customers tend to love you when you can quickly turn around fixes like that.
Another Erlang developer, Max Lapshin, built and supports erlyvideo, which seems to work well. I’ve never met Max but I know he faced some challenges along the way, as we did at Verivue, but I think he’s generally happy with how erlyvideo has turned out.
Paolo – Currently you are working at Basho, a very important company in the Erlang world. Do you mind telling our readers something more about your job?
Steve – At Basho I work in CTO Justin Sheehy’s architecture group. It’s a broad role with a lot of freedom to speak at and attend conferences and meetups, and I also work on research projects and pick up development tasks and projects from our Engineering team and Professional Services team when they need my help.
Paolo – At Erlang User Conference 2013 you will give a talk about Riak, its behaviour under extreme loads and the issues we may face when we want to scale it. Can you tell us something more about the topic?
Steve – At Basho we’re fortunate to have customers who continually push the boundaries of Riak’s comfort zone. Network traffic in Riak all goes over TCP — client requests, intracluster messages, and distributed Erlang communication. When clusters are extremely busy with client requests and transfer of data and messages between nodes, under certain conditions network throughput can drop significantly and messages can be lost, including messages intended for client applications. I am currently investigating the use of alternative network protocols to see if they can help prioritize different kinds of network traffic. This work is not yet finished, so my talk will give an overview of the problems along with the current status of the solution I’m investigating.
Paolo – I heard that during the talk you will also introduce a new Erlang network driver that should tackle some of these issues. Is this correct? Can you give us some insight?
Steve – Yes, I have been working on a new network driver. It implements an alternative UDP-based protocol for data transfer that can utilize full bandwidth when available but can also watch for congestion on network switches and quickly back off when detected. It also yields to TCP traffic under congestion conditions, preventing background data transfer tasks from shutting out more important messages like client requests and responses.
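As a rough illustration of the general idea of backing off when congestion is detected, here is a minimal Python sketch of a delay-based rate controller. It is not the driver Steve describes, whose design is not detailed in this interview. The class name, the parameters, and the use of rising one-way delay as the congestion signal are all assumptions made for the example.

```python
class BackoffRateController:
    """Toy delay-based rate controller (hypothetical, for illustration only)."""

    def __init__(self, max_rate, min_rate=1.0, target_delay=0.025,
                 step=0.1, backoff=0.5):
        self.max_rate = max_rate          # ceiling: use full bandwidth when idle
        self.min_rate = min_rate          # floor: never stop sending entirely
        self.target_delay = target_delay  # tolerated extra queuing delay (seconds)
        self.step = step                  # additive increase, as a fraction of max_rate
        self.backoff = backoff            # multiplicative decrease factor
        self.rate = min_rate
        self.base_delay = None            # lowest delay seen so far

    def on_delay_sample(self, delay):
        """Update the send rate from one measured delay (seconds)."""
        # The smallest delay observed approximates the uncongested path.
        if self.base_delay is None or delay < self.base_delay:
            self.base_delay = delay
        queuing = delay - self.base_delay
        if queuing > self.target_delay:
            # Delay is growing, so queues are filling: back off quickly.
            self.rate = max(self.min_rate, self.rate * self.backoff)
        else:
            # No sign of congestion: ramp up towards full bandwidth.
            self.rate = min(self.max_rate,
                            self.rate + self.step * self.max_rate)
        return self.rate
```

In this sketch the sender ramps up in fixed steps while delay stays close to its observed minimum and halves its rate as soon as delay rises past the target, which is one common way for background transfers to give way to other traffic.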
Paolo – Who should be interested in this talk? What are the minimum prerequisites for fully understanding its topics?
Steve – Attendees should have a high-level understanding of Erlang’s architecture, what drivers are, and how they fit into the system. Other than that, my talk will explain in detail the problems I’m trying to address as well as the solution I’ve been investigating, so neither deep networking expertise nor deep understanding of Erlang internals is required.
Paolo – I can say without a doubt that you are an expert in middleware and distributed computing systems. Can you suggest some books or internet resources to readers interested in those topics?
Steve – The nice thing about distributed systems is that they never seem to get any easier, so there has been interesting research and development in this area for decades. The downside of that is that there are an enormous number of papers I could point to. In no particular order, here are some interesting papers and articles, most of which are currently sitting open in my browser tabs:
“Eventual Consistency Today: Limitations, Extensions, and Beyond”, Peter Bailis, Ali Ghodsi. This article provides an excellent description of eventual consistency and recent work on eventually consistent systems.
“A comprehensive study of Convergent and Commutative Replicated Data Types”, M. Shapiro, N. Preguiça, C. Baquero, M. Zawirski. This paper explores and details data types that work well for applications built on eventually consistent systems.
“Notes on Distributed Systems for Young Bloods”, J. Hodges. This excellent blog post succinctly summarizes the past few decades of distributed systems research and discoveries, and also explains some implementation concerns we’ve learned along the way to keep in mind when building distributed applications.
“Impossibility of Distributed Consensus with One Faulty Process”, M. Fischer, N. Lynch, M. Paterson. This paper is nearly 30 years old but is critical to understanding fundamental properties of distributed systems.
“Dynamo: Amazon’s Highly Available Key-value Store”, G. DeCandia, et al. A classic paper detailing trade-offs for high availability distributed systems.
Paolo – Day-by-day Erlang becomes more popular. In your opinion what can we expect from Erlang in the future? What are the next goals the Erlang community should try to reach?
Steve – Under the guidance of Ericsson’s OTP team and with valuable input from the open source community, Erlang/OTP continues to evolve gracefully to address production systems. I expect Erlang will continue to improve as a language and platform for building large-scale systems that perform well and are relatively easy to understand, reason about, and maintain without requiring an army of developers. In particular I’m looking forward to the OTP team’s continued work on optimizing multicore Erlang process scheduling. The Erlang community is very good at proving how good Erlang/OTP is through the results of the systems they build, so they need to keep doing that to broaden Erlang’s appeal. If you’re a developer building practical open source or commercial software, the presentations given by community members at events like the Erlang User Conference and the Erlang Factory conferences are amazing sources of knowledge and wisdom for what works well for Erlang/OTP applications and what can be problematic.