If you’d like to, you can listen to this interview! Transcript condensed and edited from the original audio.
Andrew: Could you give us a walk through about what the problem was that you were trying to solve, and what you were doing and then how you wound up using Elasticsearch as a technology.
Mahesh: To absolutely generalize this, we collect data from innumerable sensors, devices, things all over the place. This is data that ranges from a couple hundred bytes through 20 or 30 kilobytes at a shot and it comes in pretty frequently. Once every 15 to 20 seconds or so.
Effectively what happens is when you look at the data that we get, it very broadly can be broken down into two categories. There is what I roughly call the time series data, covering how many of something happened, a device reporting something new, a device changing state. Essentially things that have a time variability component to them. More importantly, we need to know when things changed, how many times they changed, things along those lines. That part is kind of for us the classic Cassandra application that is what we do with it and we move it off to one side.
There’s a completely different side to this which is what I like to call my “state of the world” scenario which is at any given point in time I’d like to know exactly what the current state of every single thing that we control is. That’s Elasticsearch for us. It can best be thought of as a large persistent JSON cache.
Andrew: Interesting, so you guys aren’t using it as an authoritative data source?
Mahesh: Yes we are. The point really isn’t “I’m caching it and then I dump it into something else and that’s the end of that,” au contraire it’s the other way around. Pick a thing, a sensor for instance. The sensor sends us some data. Every time it sends us data we overwrite the previous data that the sensor sent because the sensor is basically sending “this is what I am”, “this is what I am”, “this is what I am”. All I want to know is what the value of what the contents of the data that that particular sensor sent to me is.
Andrew: Got it, so is Elasticsearch kind of like a write through cache and Cassandra is a timeline of what’s happened in that set up?
Mahesh: Bingo, that would be a good way to think about it. The use cases for both of these are very different. If I asked something like “what sensors do I have in Lithuania?” That’s your classic geospatial search that I can go in and bang into Elasticsearch there’s GPS coordinates and boom I get my data back. However, If I said something completely different along the lines of “how many of sensor type 37 do I have?” Again, Elasticsearch an elasticsearch query, but, again what I’m doing is I’m looking at the current state of every single sensor that I have loaded in the system.
Andrew: The use case it sounds like a huge component of why Elasticsearch is useful for you guys is due to the fact that it has a pretty flexible query DSL on top of what is a very scalable distributed database. Is that why elasticsearch was chosen?
Mahesh: Among other reasons, yeah. The thing with Elasticsearch is that it’s got about 82 different use cases and that’s only because that’s when people stopped counting. For us it’s better to think about this from an evolutionary perspective. It’s a combination of Memcached, CouchDB, BigCouch to be precise–that’s a dynamo version of CouchDB–and Mongo. Mind you I hate the fact that I used the word Mongo but the point remains that what –to go off on a bit of a tangent the one thing–Mongo/10gen did spectacularly well is their documentation and the GUI is unsurpassed. If only the backend was worthy of the frontend. It’s a database that’s designed by GUI people, it’s amazing. That ease of use is something that Elasticsearch has. It’s got a very nice easy to use query mechanism. It’s spectacularly good for caching. Mind you it’s not an instantaneous key value cache, but it’s pretty damn close.
Andrew: Just to kind of bring that to a larger context. I’m not a huge user of distributed databases so I think this is a great question to ask you. A lot of the distributed databases out there, which are based off the Dynamo paper: Dynamo and Cassandra, and also ones that aren’t like Dynamo, with Hadoop and HBase share a common attribute. It seems like the approach a lot of those systems had was to design a very strong data layer and leave the querying part as a problem almost for somebody else.
There’s no first class querying mechanisms in almost any of these things, you can use Hive or Pig or whatever map-reduce thing you come up. Elasticsearch, however, came the other way around. It was a single machine full text search engine first with plain Lucene, and then it became a distributed search engine later with Elasticsearch.
Mahesh: That’s actually a kind of a nifty way to think about it. To go off on a bit of a tangent, what I tend to do is to look at this from a slightly different perspective. I tend to break the world of–I’ll call it SQL versus no SQL–but that’s poor terminology, but what the hell it’s what people use. I tend to look at it as scooters versus motorcycles. The thing is that at a first approximation Oracle, PostgreSQL, MySQL, Informix, whatever, they’re the same damn database. I mean everything does everything. They basically all have the same set of capabilities.
That’s the scooter approach as in all scooters are effectively the same damn scooter, there is no difference between them. I’m not saying they’re not powerful you can get like 150 horsepower scooters but the point is that they’re the same scooter. Motorcycles on the other hand come in an infinite number of varieties and they’re all used for completely different things.
The thing about the no SQL data stores is each of them has a bit of an origin story and the origin story is very distinct and as long as you live within that origin story life is great and yeah they add features to kind of bring in other capabilities, but the real value is in that core origin story.
Elasticsearch started off as Solr and Lucene. Later, they brought in this idea of combining a very powerful search mechanism with real multi-master distribution, which basically doesn’t exist out there in this sense. You can get one or the other but both you rarely get. The key here, is that very explicit disconnect, where in Elasticsearch once you throw the document into an index it’s not going to be there until the next refresh tick.
That’s a very explicit design decision that allows you to distribute, allows you to scale and allows you to have your search mechanism, but you have to be very cognizant of it from a design perspective. If you want it accessible within the index as part of your fancy searches it’s not going to be there for the next 5 seconds or 10 seconds or whatever just live with it. It turns out that most people can live with it.
Andrew: This is something people wind up facing and dealing with varying levels of enjoyment in terms of asynchronous replication in the RDBMS world. Making it async from the get go in terms of search at least really does force you to design your application asynchronous replication, which is something that at scale is one of those compromises you have to make.
Mahesh: The reason I very explicitly brought this up is that this is one of those design decisions that people always talk around, they never actually get asked. For example, in the world of Riak when you talk about eventual consistency, you’ll be shocked at home many people don’t use it as an eventually consistent system. They’ll always throw it in so that the rights always have quorum. People do that because it’s hard to make the disconnect. It is so much easier to think of things as the moment I write something it’s accessible for my read.
Andrew: When you guys brought a Elasticsearch into your organization, knowingly making these tradeoffs, what was the developer experience? I guess you were kind of walking through a situation where most developers aren’t very comfortable with eventually consistent databases and reconciling conflicts and the implications of that. How did you guys as an organization approach that?
Mahesh: Okay in this particular case no issue whatsoever. Largely because everybody on our distributed development team implicitly buys into this approach.We’ve been doing this for years at this point so we don’t even think about doing it any other way.
Andrew: Where were you guys coming from having done it for years? I know Cassandra was in there, was there any are there any other data-stores like that?
Mahesh: Just as a bit of background on this thing, way back or at my previous gig I used to run a phone company. At that phone company at one point when CouchDB came out, we were running into a bunch of issues with our primary data stores. and we were basically running PostgreSQL, MySQL and Memcached and we had issues.
We started using CouchDB as our datastore. The thing to remember is that back then the implicit assumption was you used your SQL databases and that’s just the way life worked. Moving into a world in which data that gets put into the system is not necessarily immediately accessible was kind of shocking. It was shocking at so many different levels, it kind of forced us to rethink what it meant to build a distributed system.
That immediately led us into fun things like ZooKeeper, Cassandra, and a whole bunch of other areas that we just never thought about. Things like leader elections, eventual consistency, knowing about data reconciliation, merge conflicts and all these other fun things. These things that we didn’t know existed previously.
Andrew: Would you say that there’s for Elasticsearch for most people. There’s the “I have a database and I want to have a full text search version of it” use case. Then, there’s more of what you guys are doing which is you are actually using it as a primary data store. For the primary datastore case it sounds like if you don’t have your development team on board in terms of how an eventually consistent database works then it might not be a good fit for your team or you think ?
Mahesh: That would be a bit of an understatement. To take this to put this slightly differently and this is secondhand experience, it’s not personal experience: When you’re designing a distributed system the edges are where the problems occur. When you use Elasticsearch, if you’re using it for anything other than that as in a single one shot environment if you do not explicitly understand how distributed environments work really bad stuff is going to happen. This is the thing, the bad stuff is not going to happen upfront, bad stuff is going to happen once you hit scale, once your load goes up. It’s explicitly going to happen when you’re feeling really comfortable about life.
Andrew: Yeah because a lot of those edge cases are very much timing related. It’s the same kind of edge cases you deal with when you’re dealing with threading.
Andrew: Speaking of which is your development team what is your primary language? Is it Erlang?
Andrew: I feel like these kinds of problems coming from an Erlang mindset make more sense than coming, perhaps, from most imperative languages.
Mahesh: It’s true because for us the distribution problem space is implicit in the way we develop. Concurrent development just kind of works. Now that said you have to realize that we are actually a very polyglot shop. We’re running Erlang, Java, Ruby, Python, C++, Backbone, CoffeeScript, Node.js, you get the point? I mean there’s a little bit of everything. My philosophy has always been “the right tool for the job.” So, infrastructure stuff we write in Erlang, no questions, that’s just it, it just works. Frontend development invariably gets done usually in Ruby. It’s all over the place.
Andrew: When you’re modeling problems with Erlang in Elasticsearch how does that look? What’s the general sense of fit between erlang and elasticsearch as a database?
Mahesh: Erlang is effectively a concurrency oriented programming language. The core concept in Erlang is that everything is a process. Think about process as a very lightweight thread. Processes only communicate with each other via messages, there’s no such thing as shared state. That can be both good and bad as usually you want to have some kind of mechanism that other people can access, you want to get stuff out to the outside world or you want to share a particular piece of information across multiple things and just the sheer overhead of sending messages back and forth can start killing you.
In that kind of a world for us things like Elasticsearch in particular work spectacularly well because it is a very easy to distribute caching mechanism. I’m using caching as a very loose term. The point behind cache isn’t cache as in your L1, L2, L3 cache in your CPU. What I’m actually referring to is a data store that is uniformly, for a given definition of the term uniform, uniformly available and accessible across all of your nodes and that you can access consistently regardless of what node you’re on. Because in an Erlang full mesh network you’re completely location invariant. “I don’t know what node I’m on”, “I don’t care what node some other process is on”, “I just send a message to the process and the system figures out what node it’s on and delivers it to that process.”
When I’m hitting ES that’s pretty much exactly how this works too as in I don’t care where I’m going I just write a piece of data and it’s implicitly assumed that that data is available regardless of where it’s being accessed or I lose access for that matter come to think of it.
Andrew: That of course is in reference to the nature of Elasticsearch, where one can write to any node and it routes the data where it needs to go and there’s no notion of masters. I know some people actually do use dedicated data-less master nodes. Is that something you guys have worked with at your company?
Mahesh: Okay. When you set up your Elasticsearch cluster you can do all sorts of fun things, right? You can do things like these particular nodes are index only nodes, these are data only nodes, blah, blah, blah, right? The way you structure your datacenter and the way you structure the actual nodes that you’re running Elasticsearch on, there’s a tremendous amount of variation and a tremendous amount of control that you have over it.
The problem tends to be that, that structure bleeds into your software architecture. “I’m going to make requests against only nodes that have indexes completely in memory.” That’s great! That’s a wonderful idea! Now you are implicitly assuming that your request will always return in instantaneous time and you’re also implicitly assuming that you’re doing reads against memory and so on. Not a bad idea to do that, but you are now you’ve joined the way your data is structured on the disk so to speak with the way your software is structured. That does provide for a bit more efficiency but what you lose is essentially architectural flexibility.
Andrew: Interesting, I guess it’s value inconsistency over optimization. Is that correct?
Mahesh: That would be correct, that’s exactly it. To me, one of the core patterns that I tend to follow is no premature optimization.
Andrew: That’s a very good point. I think one of my favorite quotes in all of software is “make it work, make it elegant, make it fast.”
Andrew: Interesting, so have you guys had any scaling problems using Elasticsearch and I guess it would be good to talk a little bit about the sense of your scale and then what kind of issues you’ve had.
Mahesh: Unfortunately I can’t actually get into that. Andrew: Is it more than one machine? Mahesh: Yes, oh yes, by far. Andrew: Okay. Mahesh: That’s about as far as we can go on that. To wind this back a little bit, part of the reason for separating the architecture of the Elasticsearch cluster from the software architecture is because of the scaling issues. It’s not that there are scaling issues, but we run at a large enough scale that I don’t want us to have to tweak the software every time something changes on the backend.
Andrew: What you’re saying is, aside from tweaking the Erlang software… you don’t want to be running with small margins when you’re tweaking Elasticsearch configuration settings that have an impact on the actual software.
Mahesh: That is precisely the case, yes.
Mahesh: Elasticsearch really works spectacularly well without you having to do much of anything. I mean it just pretty much works. There are about 8,000 knobs that you can tweak. The entertainment is once you start getting to the point that you need to start tweaking knobs, it really behooves you to have people who know what they are doing. That said, my implicit recommendation in all of this has always been don’t tweak the knobs till you actually have to tweak them.
Andrew: If you could go back to when you were initially adopting Elasticsearch… how long ago would that have been by the way?
Mahesh: This time around 4 months.
Andrew: Four months, okay but you’ve used it before?
Mahesh: Prior to that I’ve been using it for a long time, longer yeah.
Andrew: Okay, so if you could go back to when you were first using Elasticsearch and give yourself some hard won advice, what would that advice be?
Mahesh: God, read the documentation.
Andrew: Read the docs that would be number 1, okay. I think every developer can empathize with “oh I get it, I’ll just use it.” I think we’ve all done it.
Mahesh: The problem with Elasticsearch is that at this stage it does a couple of things extremely well, and you can be led into your false of similarity. You’ll look at it and you say to yourself “this works exactly like foo, I’m just going to use it like I use foo.” This is a very bad idea, hence, read the docs.
Andrew: In your initial comparison to other databases you mentioned Cassandra, you mentioned Mongo, you mentioned CouchDB. Elasticsearch does share a large amount of similarity with all of those, but it’s not really very much like any of those the closer you get to it.
Mahesh: That’s basically it, and I can’t count the number of times I’ve met somebody who built stuff that worked and then we went back in, ripped it all up, and rebuilt it because they ended up getting it to work in the wrong way.
Andrew: Do you have any specific examples of common pitfalls people have? One thing you mentioned was using correct write consistency, either quorum, or all, too much of the time. Anything else?
Mahesh: One of them really doesn’t apply that much anymore, but in the early days Elasticsearch was kind of like the big plug-in into Couch, you used Elasticsearch to do full text indexing on CouchDB.
Mahesh: The implicit assumption was that you would do the searches using the same map reduce mechanism that you used in Couch. You could make it work that way, it was highly humorous to make fun of the developer after that. The other one and this one actually goes right back to the beginning. You really need to understand that data that you’re writing is not going to show up in the indexes immediately.
Mahesh: Part of me really feels like that should be emblazoned in big red letters on the very front of the Elasticsearch.org. At this point I don’t think I know anybody that does that, but it was pretty regular. There’d be a problem, we track it down, it would turn up because there’s an implicit assumption that the data would be there and it wasn’t showing up. It’s also not trivial because as I mentioned earlier you need to think about how you’re structuring your queries, not just your queries how you’re structuring your writes, how you’re structuring your application to make sure that you don’t make that assumption.
Andrew: Exactly and a huge part of that also is depending on what you’re doing, is possibly preparing product owners as well for that kind of thing.
Mahesh: Right, yeah.
Andrew: Do you have any other battle stories or things that you would like to bring us that I guess you’d like people to be aware of?
Mahesh: No battle stories per se just more from the big picture perspective. If you go down the road of using Elasticsearch you have to be kind of careful. One of the things that happens when you start using Elasticsearch is, you will start discovering that you could do many different things with it. Some of those are not necessarily core competencies of Elasticsearch.
Andrew: In other words there’s things that it’ll kind of do but using them is more of an abuse of it. What would those things be would you say?
Mahesh: Let me store my time series data in Elasticsearch.
Andrew: Yeah, you guys don’t do that.
Mahesh: Yes, we explicitly do not do that, but it would be so easy to do it at small levels because you can kind of make it work. You think it’s great, and you have the faceting you can and stuff that allows you to do automatic counts and stuff. It all could just kind of work. The problem with “things kind of work” is that systems expand in directions that you can never predict and the fact that it works at a small enough scale does not mean that it’s going to work at a large scale.
Andrew: Got it, yeah. Time series specifically is something I’m guessing you guys tend to tackle with map reduced over Cassandra?
Mahesh: That is correct, yes.
Andrew: I suppose people can perform map reduce operations over Elasticsearch but that’s really not what it was designed for. It was more designed for the full text search and Lucene and you guys definitely prefer working with Cassandra here.
Mahesh: That is correct. Besides the thing about time series data is remember it’s not just map reduce. You’re also doing things like time slices and data slices. You’re doing on disk compression, there’s all these things that implicitly are provided in a column store that are not necessarily relevant in what I would call a JSON blob store.
That distinction, to take a step back on this one, the uber point that I was trying to make, is that data stores have origin stories. Data stores have sweet spots. As long as you stick to the sweet spot you really will never have a problem with that particular data store. You can start moving away from it, hell they all provide other capabilities, but you really need to be aware of when you’ve really strayed from what the sweet spot is. This kind of gets into the world of polyglot persistence and so on, which is a separate topic, but there is a strong tendency, a very strong tendency in many organizations, and when I say organizations I don’t necessarily mean large companies, big, vast and strong, whatever.
There’s a very strong tendency to say you know what let’s just standardize on one system, what are we going to use? Let’s use Postgres. Let’s use Elasticsearch. Let’s use just this one thing because that way we don’t need to worry about different things. That way lies madness.
Andrew: Yeah, I definitely have written some crazy SQL queries and sometimes it’s a hard thing to tell. There’s a significant grey area where a lot of times companies get pushed away past the grey area and into the darker shades.
Mahesh: I mean the sheer one of the reasons why we now have 5 different data stores that we use as part of our production systems is because each of them does what it’s doing spectacularly well. Why try and cross the streams on the stuff? Don’t. The savings that you get on administration and maintenance don’t actually exist. There is no such thing as administrative savings when you are sticking with one data store, it just doesn’t work that way.
Back to Elasticsearch, the point being that when you’re doing full text indexing, when you’re using it as a caching mechanism, when you’re using it as a rapid store and retrieval mechanism it works spectacularly well. If you start moving away from this and you start saying to yourself well I can use it for, I have seen people do transaction type things across multiple documents in Elasticsearch, it makes for a very entertaining debugging session.
Andrew: That’s pretty scary, I think I saw a lot of that when Mongo first came out and people were doing financial transactions in MongoDB.
Mahesh: Really? That’s where you want to go?
Andrew: Well thanks a lot Mahesh for the interview, I really appreciate it. I think that was very good, I think I got a lot out of it.
Mahesh: Entirely my pleasure Andrew, entirely.