Hello! Today you can read my interview with Christopher Meiklejohn, one of the speakers at the upcoming Erlang Factory in San Francisco. Christopher works on Riak at Basho Technologies.
Erlang, Vector Clocks and Riak!
Paolo – Hello Chris! It’s great to have another Basho Erlanger here! Can you introduce yourself please?
Christopher – It’s great to be here! My name is Christopher Meiklejohn, and I’m currently a software engineer at Basho Technologies, Inc., where I work on our distributed datastore, Riak. In addition to that, I’m also a graduate student at Brown University, in Providence, RI, where I study distributed computing.
Paolo – Before joining Basho you were working at a different company (i.e., Swipely), where you dealt with Ruby code. Did you already know Erlang when you started at Basho? How would you describe the switch between these two languages?
Christopher – During the time I was at Swipely they had Riak deployed in production, which was what initially got me interested in Basho, Riak, and, in particular, Erlang. When I joined Basho, I knew very little Erlang and spent my first few weeks at the company learning it.
That said, I love Erlang as a language and as a platform to build applications on. I wouldn’t necessarily say that the change from Ruby to Erlang was anything unexpected, specifically because I already had functional programming experience using languages like Scheme and Haskell.
Paolo – Rubyists tend to be addicted to TDD. Were you able to keep up this good practice when coding in Erlang?
Christopher – Well, I’ll start with a disclaimer. I was primarily responsible for the introduction of behavior driven development at Swipely for feature development, in addition to promoting pair programming within the development team.
That said, testing and verification of software is a very interesting topic to me.
While I believe that all software should be properly tested, I’ve never been particularly dogmatic about when in the cycle of development testing is performed: whether it’s done during development to guide the design of the software or whether it’s done afterwards to validate the authored components. I do, however, have one major exception to this rule: when attempting to reproduce a customer issue and validate a fix for the issue.
This is purely a pragmatic decision that’s come from working on large scale distributed systems: testing and verification of distributed systems is extremely hard given the number of cooperating components involved in completing a task.
At Basho, we take two major approaches to testing Riak: integration testing using an open source test harness we’ve developed that allows us to validate high-level operations of the system, and QuickCheck for randomized testing of smaller pieces of functionality.
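To make the idea of randomized testing concrete, here is a minimal sketch in Python. It is not QuickCheck and not Basho's test code: the functions `encode`, `decode`, and `check_roundtrip` are hypothetical names chosen for illustration. The sketch generates many random inputs and checks that a property (decoding an encoded list gives back the original) holds for all of them.

```python
import random


def encode(xs):
    """Run-length encode a list into [value, count] pairs."""
    out = []
    for x in xs:
        if out and out[-1][0] == x:
            out[-1][1] += 1
        else:
            out.append([x, 1])
    return out


def decode(pairs):
    """Invert encode(): expand [value, count] pairs back into a list."""
    return [value for value, count in pairs for _ in range(count)]


def check_roundtrip(trials=500, seed=0):
    """Try random inputs; return the first counterexample, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Small value range so that runs of equal elements are likely.
        xs = [rng.randint(0, 3) for _ in range(rng.randint(0, 20))]
        if decode(encode(xs)) != xs:
            return xs
    return None
```

Calling `check_roundtrip()` returns `None` when no generated input violates the property. A real property-based testing tool would additionally shrink any failing input to a smaller counterexample, which this sketch does not attempt.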
Paolo – At the upcoming Erlang Factory in San Francisco you will give the following talk: “Verified Vector Clocks: An Experience Report”. Can you briefly introduce the topics you will cover in the talk?
Christopher – My talk is going to look at an alternative way of asserting correct operation of software components, commonly known as formal verification.
The talk will specifically focus on modeling vector clocks for use in the Riak datastore using an interactive theorem prover called Coq. This allows us to assert certain mathematical properties about our implementation, and perform extraction of the component into Erlang code, which we can directly use in our system.
Paolo – Who should be interested in following your talk and why?
Christopher – Given the topics involved, I’m planning on keeping the talk pretty high level and will touch on a variety of topics: the theorem prover Coq, which implements a dependently-typed functional programming language; the basics of using Core Erlang, a de-sugared subset of the Erlang programming language; and how we put all of the pieces together.
Paolo – Vector clocks, which build on Lamport’s logical clocks, are well known to people working in fields connected to distributed systems. Can you briefly explain what they are and where they can be used?
Christopher – Vector clocks provide a model for reasoning about events in a distributed system. It’s a bit involved for this interview to get into the specifics about how they work and when they should be used, so I’ll refer you to two excellent articles written by Bryan Fink and Justin Sheehy of Basho.
“Why Vector Clocks Are Easy”
“Why Vector Clocks Are Hard”
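As a rough illustration of the general idea, and not Riak's implementation or the verified vvclocks code, a vector clock can be sketched as a mapping from node name to a counter. The function names below (`increment`, `merge`, `descends`, `concurrent`) are assumptions made for this example.

```python
def increment(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def merge(a, b):
    """Combine two clocks by taking the per-node maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def descends(a, b):
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(n, 0) >= count for n, count in b.items())


def concurrent(a, b):
    """True if neither clock descends from the other (a conflict)."""
    return not descends(a, b) and not descends(b, a)
```

For example, if two nodes each update the same starting clock independently, `concurrent` reports a conflict between the two results, while the output of `merge` descends from both.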
Paolo – About the vvclocks application, are you planning to continue its development? If so, how can people contribute?
Christopher – At this point, the project mainly serves as a playground for exploring how we might begin to approach building verifiable software components in Erlang. What has been done so far is available on GitHub, and I’m actively working on it as my time allows. If you’re interested in helping to explore this further, feel free to reach out to me via e-mail or on Twitter.
Today you can read my interview with Steve Vinoski, a famous Erlang developer/speaker and distributed systems expert. Steve will give the talk “Addressing Network Congestion in Riak Clusters” at Erlang User Conference 2013.
Some questions, some answers
Paolo – Hi Steve! It’s really good to have one of the most famous Erlangers here on my blog. Would you mind introducing yourself to our readers in a few words?
Steve – I’m Steve Vinoski, a member of the architecture group at Basho Technologies, the makers of Riak and RiakCS. I have a background in middleware and distributed systems, and have been an Erlang user since 2006.
Paolo – I know you are an expert in several programming languages. How did you end up using Erlang? Did you have any previous experience with functional languages?
Steve – As far as functional languages go, I’ve played with them on and off for decades, but never used one in production until I found Erlang.
Then I found Erlang/OTP. I grew more and more intrigued as I discovered that it already provided numerous features that we had spent years developing and maintaining in our middleware systems, things like internode messaging, node monitoring, naming and discovery, portability across multiple network stacks, logging, tracing, etc. Not only did it provide all the features we needed, but its features were much more powerful and elegant. I put together a proposal for the IONA executive team that suggested we rebuild all of our product servers in Erlang so we could reduce maintenance costs, but the proposal was rejected because, as I later learned, they were trying to sell the company so it didn’t make sense to make such large changes to the code. I left IONA and joined Verivue, where we built video delivery hardware, and there I trained seven or eight other developers in Erlang and we used it to great advantage. After Verivue, I wanted to continue working with Erlang, which is part of the reason I joined Basho.
Paolo – In your blog you state that Erlang is your favourite programming language. Why?
Steve – To me Erlang/OTP is the type of system my middleware colleagues and I spent years trying to create. It’s got so many things a distributed systems developer wants: easy access to networking, libraries and utilities to make interacting with distributed nodes straightforward, wonderful concurrency support, all the upgrading and reliability capabilities, and the Erlang language itself is sort of a “distributed systems DSL” where its elegance and small size make it easy to learn and easy to use to quickly become productive building distributed applications. And as if that’s not enough, the Erlang community is great, pleasantly supporting each other and newcomers while avoiding pointless arguments and rivalries you find in other communities. My use of other programming languages has actually decreased in recent years due primarily to my continued satisfaction with Erlang/OTP: it’s not great for every problem, but it’s fantastic for the types of problems I tend to work on.
Paolo – I know that in a previous job you had to deal with multimedia systems, a field where Erlang still has a minor impact compared to languages like C++. Do you think Erlang will be able to find its place in this field as well? Can you give reasons for your answer?
Steve – Erlang/OTP is excellent for server systems in general, including multimedia servers. The Verivue system I worked on a few years ago had special TCP offload hardware for video delivery, so we didn’t need Erlang for that. Rather, we used Erlang for the control plane, which for example handled incoming client requests, looked up subscriber details in databases, and interacted with the hardware to set up multimedia data flows. Multimedia systems also have to integrate with billing systems, monitoring systems, and hardware from other vendors, and Erlang shines there as well, especially when it comes to finding bugs in the other systems and hot-loading code to compensate for those bugs. Customers tend to love you when you can quickly turn around fixes like that.
Another Erlang developer, Max Lapshin, built and supports erlyvideo, which seems to work well. I’ve never met Max but I know he faced some challenges along the way, as we did at Verivue, but I think he’s generally happy with how erlyvideo has turned out.
Paolo – Currently you are working at Basho, a very important company in the Erlang world. Do you mind telling our readers something more about your job?
Steve – At Basho I work in CTO Justin Sheehy’s architecture group. It’s a broad role with a lot of freedom to speak at and attend conferences and meetups, and I also work on research projects and pick up development tasks and projects from our Engineering team and Professional Services team when they need my help.
Paolo – At Erlang User Conference 2013 you will give a talk about Riak, its behaviour under extreme loads and the issues we may face when we want to scale it. Can you tell us something more about the topic?
Steve – At Basho we’re fortunate to have customers who continually push the boundaries of Riak’s comfort zone. Network traffic in Riak all goes over TCP — client requests, intracluster messages, and distributed Erlang communication. When clusters are extremely busy with client requests and transfer of data and messages between nodes, under certain conditions network throughput can drop significantly and messages can be lost, including messages intended for client applications. I am currently investigating the use of alternative network protocols to see if they can help prioritize different kinds of network traffic. This work is not yet finished, so my talk will give an overview of the problems along with the current status of the solution I’m investigating.
Paolo – I heard that during the talk you will also introduce a new Erlang network driver that should tackle some of these issues. Is this correct? Can you give us some insight?
Steve – Yes, I have been working on a new network driver. It implements an alternative UDP-based protocol for data transfer that can utilize full bandwidth when available but can also watch for congestion on network switches and quickly back off when detected. It also yields to TCP traffic under congestion conditions, preventing background data transfer tasks from shutting out more important messages like client requests and responses.
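The back-off behaviour described above can be illustrated with a small sketch. This is not Steve's driver or its protocol: the class name `BackoffRateController`, its parameters, and the use of a measured delay as the congestion signal are all assumptions made for this example. The sender ramps up its rate while delay stays low and halves the rate as soon as delay crosses a threshold.

```python
class BackoffRateController:
    """Hypothetical sender-side rate controller driven by delay samples."""

    def __init__(self, min_rate=1.0, max_rate=100.0, step=5.0,
                 delay_threshold_ms=50.0):
        self.min_rate = min_rate
        self.max_rate = max_rate
        self.step = step
        self.delay_threshold_ms = delay_threshold_ms
        self.rate = min_rate  # start conservatively

    def on_delay_sample(self, delay_ms):
        """Update and return the send rate for one delay measurement."""
        if delay_ms > self.delay_threshold_ms:
            # Assumed congestion signal: back off quickly by halving.
            self.rate = max(self.min_rate, self.rate / 2)
        else:
            # No congestion seen: grow toward the full available rate.
            self.rate = min(self.max_rate, self.rate + self.step)
        return self.rate
```

Growing additively and shrinking multiplicatively is one common design for this kind of controller; it lets a background transfer reclaim bandwidth gradually while yielding it almost immediately under load.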
Paolo – Who should be interested in this talk? What are the minimum prerequisites for fully understanding the topics of the talk?
Steve – Attendees should have a high-level understanding of Erlang’s architecture, what drivers are, and how they fit into the system. Other than that, my talk will explain in detail the problems I’m trying to address as well as the solution I’ve been investigating, so neither deep networking expertise nor deep understanding of Erlang internals is required.
Paolo – I can say without a doubt that you are an expert in middleware and distributed computing systems. Can you suggest some books or internet resources to our readers interested in those topics?
Steve – The nice thing about distributed systems is that they never seem to get any easier, so there has been interesting research and development in this area for decades. The downside of that is that there are an enormous number of papers I could point to. In no particular order, here are some interesting papers and articles, most of which are currently sitting open in my browser tabs:
“Eventual Consistency Today: Limitations, Extensions, and Beyond”, Peter Bailis, Ali Ghodsi. This article provides an excellent description of eventual consistency and recent work on eventually consistent systems.
“A comprehensive study of Convergent and Commutative Replicated Data Types”, M. Shapiro, N. Preguiça, C. Baquero, M. Zawirski. This paper explores and details data types that work well for applications built on eventually consistent systems.
“Notes on Distributed Systems for Young Bloods”, J. Hodges. This excellent blog post succinctly summarizes the past few decades of distributed systems research and discoveries, and also explains some implementation concerns we’ve learned along the way to keep in mind when building distributed applications.
“Impossibility of Distributed Consensus with One Faulty Process”, M. Fischer, N. Lynch, M. Paterson. This paper is nearly 30 years old but is critical to understanding fundamental properties of distributed systems.
“Dynamo: Amazon’s Highly Available Key-value Store”, G. DeCandia, et al. A classic paper detailing trade-offs for high availability distributed systems.
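To give a flavour of the replicated data types covered in the Shapiro et al. paper listed above, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest such types. It is an illustration only, not code from the paper or from Riak, and the function names (`g_increment`, `g_value`, `g_merge`) are assumptions. Each replica increments only its own slot, and replicas converge by taking the per-replica maximum.

```python
def g_increment(counter, replica, amount=1):
    """Return a copy of counter with replica's slot increased."""
    if amount < 1:
        raise ValueError("a grow-only counter can only increase")
    new = dict(counter)
    new[replica] = new.get(replica, 0) + amount
    return new


def g_value(counter):
    """The counter's value is the sum over all replica slots."""
    return sum(counter.values())


def g_merge(a, b):
    """Merge two replica states by per-replica maximum."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}
```

Because the merge is commutative, associative, and idempotent, replicas can exchange states in any order, any number of times, and still arrive at the same value.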
Paolo – Day-by-day Erlang becomes more popular. In your opinion what can we expect from Erlang in the future? What are the next goals the Erlang community should try to reach?
Steve – Under the guidance of Ericsson’s OTP team and with valuable input from the open source community, Erlang/OTP continues to evolve gracefully to address production systems. I expect Erlang will continue to improve as a language and platform for building large-scale systems that perform well and are relatively easy to understand, reason about, and maintain without requiring an army of developers. In particular I’m looking forward to the OTP team’s continued work on optimizing multicore Erlang process scheduling. The Erlang community is very good at proving how good Erlang/OTP is through the results of the systems they build, so they need to keep doing that to broaden Erlang’s appeal. If you’re a developer building practical open source or commercial software, the presentations given by community members at events like the Erlang User Conference and the Erlang Factory conferences are amazing sources of knowledge and wisdom for what works well for Erlang/OTP applications and what can be problematic.
Hi all! Many of you liked the interviews I did during the Erlang Factory, so I decided to interview a couple more famous Erlangers. I hope these interviews will give Erlang newbies some insight into the world out there. Today’s interview is with Dave Smith (a.k.a. dizzyd). Dave is Director of Engineering at Basho Technologies, Inc. In my humble opinion his answers are far from trivial, so I strongly suggest you read our conversation.
Ask and Answer
Paolo – Hi Dave, thanks for making yourself available for this interview. Please, would you like to introduce yourself to our readers?
Dave – My name is Dave Smith, but a lot of people know me as “Dizzy” — there are a lot of “Dave Smith”s in the world. 🙂 I’ve had the opportunity to contribute to a couple of different Open Source projects over the years, including Jabber and more recently things like rebar, riak, bitcask, etc.
Paolo – Do you remember when you first heard of Erlang?
Dave – I was working on the first commercial Jabber server and had just completed writing a library (in C++!) to manage async sockets on a threadpool. The intent of the library was to make it easy for people to write scalable Jabber components. I read about Erlang and wondered how they had solved that same problem…only to find their implementation was far, far ahead of mine. I thought the syntax was weird and the string handling (all lists, in those days) was poorly suited for Jabber — nonetheless I liked the ideas.
Paolo – Can you describe to our readers your first experiences with this language?
Dave – I think it was a pretty typical experience pre-good books on Erlang. There was lots of fumbling with seemingly random syntax and scratching my head over the point of the OTP stuff. However, even with this pain, I was able to rewrite a business specific system over the course of a long weekend and the end result was a more stable, scalable system with a 90% reduction in lines of code. All very typical for Erlang. 🙂
Paolo – In your past work experiences, you used C++. Was it difficult to switch to Erlang? What features of C++ (if any) would you like to see in Erlang?
Dave – I can’t say there’s anything in C++ that I miss in Erlang. If anything, Erlang has renewed my interest in simpler syntax — I much prefer C over C++ whenever possible now.
Paolo – You are currently Director of Engineering at Basho Technologies, Inc. How does it feel to work in one of the most famous companies using Erlang?
Dave – It’s an honor and privilege to work at Basho. We have a rare freedom to pursue building a product with the best-of-breed language for distributed systems and we do so in a decentralized working environment. More importantly, we’ve managed to assemble a roster of world-class engineers who constantly push each other to improve. I like coming into work and knowing that I’m going to have to work hard to keep up with my co-workers. 🙂
Paolo – Basho is well known among Erlangers for Riak and rebar. Can you tell us something more about these two products?
Dave – Riak is the product for which Basho is best known. It’s an eventually-consistent, distributed data store that’s designed to scale in a manageable way. Writing this system in Erlang has enabled us to focus on the distributed algorithms and not worry so much about the details of sockets, thread management, etc. The expressiveness of Erlang has also enabled us to be a more efficient engineering team and deliver the high quality code one expects from a data store. Rebar is an Erlang build tool that makes it easy to compile and test Erlang code. It uses standard Erlang conventions for choosing what files to compile and makes it very, very easy to create new projects that other Erlang devs will understand.
Paolo – What, in your opinion, is the biggest benefit a company can gain from using Riak?
Dave – Low, predictable latency data access and operational ease-of-use. When used appropriately (i.e., don’t expect Riak to be an RDBMS), Riak is able to provide an excellent latency profile, even in the case of multiple node failures. In addition, it’s easy to add/remove nodes while the system is running without a lot of operational juggling. Another useful benefit is that multiple nodes can take writes for the same key, thus making it easier to construct a geographically distributed data store.
Paolo – Do you think that you could have reached the same level of product “quality” using another programming language?
Dave – Quality is determined less by the language and more by the values of the team writing the code. The process model in Erlang makes it easier to construct systems that tolerate partial failure; the expressiveness makes it easier to write less code (and thus fewer bugs) for a given application. However, just because these things are “easier” doesn’t mean you’re going to get a high-quality system. 🙂 Ultimately, quality requires engineering discipline to take advantage of Erlang’s strengths and wield those powers appropriately.
Paolo – Many Erlang programmers I know are university students. Does Basho offer internship opportunities?
Dave – At this time, we are not providing internships, unfortunately.
Paolo – If you were a newcomer to the Erlang world, where would you focus your attention?
Dave – Erlang is a pretty wide-open language in terms of stuff left to do. Pick something that’s fun and challenging and go for it!