The word confluence is defined by the Cambridge Dictionary as “a place where two rivers flow together and become one larger river”. While this doesn’t exactly depict what is occurring in the open-source market for event streaming software, there are two raging rivers in this market that will continue to define the segment: Confluent and DataStax.
We recently learned the details behind the IPO of Confluent, the company supporting the well-known Apache Kafka distribution. Targeting a valuation of $8.3 billion, this offering will be a watershed for the open-source community and a barometer for future issuances rumored to be coming to market, including HashiCorp and Databricks.
This week, we had the pleasure of interviewing Ed Anuff, Chief Product Officer at DataStax, about the launch of Astra Streaming, an exciting new event streaming platform based on Apache Pulsar. Ed has been involved in the enterprise tech ecosystem for several decades, so it was a real privilege to hear his insights on how open source is evolving within DataStax and more broadly. Check it out below!
Private Markets
BrowserStack, a Dublin- and San Francisco-based software testing platform maker, raised $200 million in Series B funding at a $4 billion valuation. BOND led the round and was joined by investors including Insight Partners and Accel. Open-source <3.
Prefect, the company behind the data workflow automation system of the same name (initially based on Apache Airflow), announced their $32M Series B led by Tiger Global.
RudderStack, creators of the open source customer data platform of the same name, announced their $21M Series A led by Kleiner Perkins.
Komodor, a startup building troubleshooting tooling for Kubernetes operators, announced their $21M Series A led by Accel.
PostHog, creators of the developer-friendly product analytics tool of the same name, announced their $15M Series A led by YC.
Commit, a startup that matches engineers looking for a new job with early-stage startups that want to hire them, and the creator of the Zero project, announced their $6M Seed led by Accomplice.
Public Markets
To track the performance of COSS companies, we’ve created an equal-weighted index composed of public names including MongoDB, Elastic, Talend (acquisition by Thoma Bravo announced), Cloudera (acquisition by KKR/CD&R announced), Rapid7, Fastly, and JFrog.
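The newsletter does not spell out its index methodology. As a rough illustration only, the sketch below assumes the simplest equal-weighted construction: constituents are rebalanced to equal weights every period, so each period's index return is the plain average of the constituents' returns. The function name, tickers, and prices are hypothetical.

```python
from typing import Dict, List


def equal_weighted_index(prices: Dict[str, List[float]], base: float = 100.0) -> List[float]:
    """Index levels for an equal-weighted basket, rebalanced every period (assumed)."""
    series = list(prices.values())
    n_periods = len(series[0])
    if any(len(s) != n_periods for s in series):
        raise ValueError("all price series must cover the same periods")

    levels = [base]
    for t in range(1, n_periods):
        # Simple return of each constituent over period t.
        returns = [s[t] / s[t - 1] - 1.0 for s in series]
        # Equal weights: the index return is the unweighted mean of constituent returns.
        period_return = sum(returns) / len(returns)
        levels.append(levels[-1] * (1.0 + period_return))
    return levels


# Hypothetical two-stock basket; not real COSS Index data.
basket = {"AAA": [10.0, 11.0, 12.1], "BBB": [50.0, 45.0, 45.0]}
print(equal_weighted_index(basket))
```

In this example the first period nets out to zero (+10% and -10%) and the second adds 5% (+10% and 0%), so the levels come out at roughly 100, 100, and 105. Because weights reset each period, a smaller constituent moves the index as much as a larger one, which is the main way an equal-weighted index differs from a market-cap-weighted benchmark such as the S&P 500.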
The COSS Index continues to underperform the broader markets, but has trended upward in recent weeks, paring losses from earlier in the year.
COSS Index -9%
NASDAQ +9%
S&P 500 +11%
The four-week rally raised the rolling three-year performance of the COSS Index above both the S&P and the NASDAQ.
COSS Index +85%
NASDAQ +80%
S&P 500 +51%
COSS companies traded up over the last two weeks and extended their winning streak over their Emerging Cloud peers to four weeks. All three indices continue to trade at multiples significantly higher than their rolling five-year averages.
COSS Index: Current Multiple 14.6x | Five-Year Mean: 7.9x
Emerging Cloud Index: Current Multiple 13.9x | Five-Year Mean: 9.3x
NASDAQ Composite: Current Multiple 4.1x | Five-Year Mean: 3.2x
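To put "significantly higher" in numbers, the premium of each current multiple over its five-year mean is simply current / mean - 1. The sketch below uses only the figures listed above. The newsletter does not name the underlying valuation metric, so the snippet treats them as plain ratios; the helper name is illustrative.

```python
# Multiples as reported above: (current multiple, five-year mean).
multiples = {
    "COSS Index": (14.6, 7.9),
    "Emerging Cloud Index": (13.9, 9.3),
    "NASDAQ Composite": (4.1, 3.2),
}


def premium_to_mean(current: float, mean: float) -> float:
    """Percentage by which the current multiple exceeds its five-year mean."""
    return (current / mean - 1.0) * 100.0


for name, (current, mean) in multiples.items():
    print(f"{name}: {premium_to_mean(current, mean):+.0f}% vs five-year mean")
```

This works out to premiums of roughly 85% for the COSS Index, 49% for the Emerging Cloud Index, and 28% for the NASDAQ Composite.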
Interview with Ed Anuff, Chief Product Officer at DataStax.
Ed Anuff has over 25 years of experience as a product and technology leader at companies including Google, Apigee, Six Apart, Vignette, Epicentric, and Wired. He led products and strategy for the successful Apigee API Platform, helping to make it the recognized category leader and leading Apigee to its 2016 acquisition by Google. He also founded enterprise portal leader Epicentric, which was acquired by Vignette.
In the 1990s, at Wired, Ed launched one of the first Internet search engines, HotBot, and authored one of the first textbooks on the Java programming language.
OSS: Why launch streaming with the Astra product?
Ed Anuff: There are a couple of different pieces to that. First of all, when we look at how Apache Cassandra is being used by our major customers, these are situations where people are dealing with massively large datasets. You don't switch to Cassandra just because you've got a small web app that needs to store a few records. Enterprises use Cassandra when they have very large amounts of data that need to be accessed in a geographically distributed way. It's connected to a number of different systems within the organization that use it for operational data, but that data is also the lifeblood of their analytics. So it's connected data at scale. Given that those are the projects we're in, streaming is not even an adjacency for us; it is literally the same project. Project after project, we're talking to the leaders who run the technology infrastructure behind their systems, and what you see is a new stack emerging. There's an open data stack that layers these things together: scale-out NoSQL data and streaming. Same users, same buyers, same projects. So we looked at that and said the mandate doesn't get much clearer than that.
So then the question becomes: is there an opportunity to do something better? There are other companies and other projects, and Kafka has been around for a while; I've used it on many projects in the past. What Chet (CEO, DataStax), Jonathan Ellis (Founder, DataStax), and I said was that streaming is a great idea. Obviously, a lot of people are using Kafka because it was a good start, but it's not the last word. There's this other project, Apache Pulsar, and we think there's going to be a dynamic similar to MySQL and Postgres. Once people start saying, okay, I need streaming, but I need it to be scale-out, I need it to be reliable, I need it to be native on Kubernetes, you're going to have the same dynamic, which is that Pulsar is going to be the better alternative. Going back to the MySQL-versus-Postgres analogy, the drivers are virtually the same. It's the same flavor of SQL with a few differences, and the switching costs are very minor. You've just got a better architecture, and now, five years later, nobody's talking about MySQL anymore; they're all talking about Postgres. We believe it's the same thing here, and there's an opportunity to distinguish ourselves from a technology standpoint. The opportunity to deliver a better product that the majority of our customers need was a clear-cut rationale for doing this.
OSS: What was the build-buy-partner framework that DataStax used to evaluate the decision to incorporate Pulsar?
Ed Anuff: We actually started two different open source projects. Last year we started Stargate, a data gateway that brings GraphQL, REST, and document-style APIs to Cassandra, so that people who want to build the kinds of applications you'd build on MongoDB or Couchbase can easily build them on Cassandra. We also started the K8ssandra open source project, which brings Cassandra to Kubernetes. It's a really good question you asked about the different options. We started by saying streaming is a really interesting space. We looked at what was happening with Kafka and at the different projects. DataStax's DNA is about building open source communities, and we also firmly believe that architecture matters. We're a little bit geeky in that way. We've lived through this too many times, and we know we will have to live with the architectural choices we make for decades. And we said streaming is in the early stages of a 20-year journey; that's just how long these things take. We looked at what was being done with Kafka and what was being done with Pulsar. We looked at a number of other options, including whether to build something ourselves, and we liked what was happening with Pulsar. We said this is the way we would build it ourselves, and there's a motivated community of people taking it in the right direction that we want to team up with.
There's a lot to be said for the fact that the community already has a set of people in it. The project is in the Apache Software Foundation, and it has all the pieces of a governance model in place. These are all good things. So Pulsar was the first piece of the decision. Honestly, from a business standpoint, the question even two years ago was whether we should jump aboard Kafka. Even then, it was not attractive, because we would be inheriting other people's baggage rather than building around Pulsar. So the Pulsar decision came first, and then we started working with Pulsar. We did not get into Pulsar through an acquisition. We were working on Pulsar, building our own distribution of it, building out our team, and hiring people. Then we saw what Kesque was doing: they had done a really good job of figuring out how to bring the solution to the SMB market, and we realized they would accelerate what we were doing. Our philosophy with these things is that we start by trying to understand where we want to go. Then we ask which folks are going in that direction, who we should be working with, and who can help us get there faster.
OSS: How are enterprises thinking about using the Astra Streaming product?
Ed Anuff: A lot of companies have figured out Kafka or are at a crossroads with it. First and foremost, they understand what streaming is, and they understand how it relates to high-speed messaging, which is the precursor to streaming. They have been using those technologies for many years, so they understand the ideas of publishing and subscribing. These enterprises then need to understand how modern streaming integrates into what they're doing and how to leverage it. What's teed up after that is a lot of operational questions about reliability, sustainability, and the cost model at scale: the whole panoply of what goes into operating software in these types of environments. Typically, enterprises come in and we get a lot of questions about things like Kafka compatibility and transition points. So one of the things I think you're going to see within the streaming community is the idea of a Kafka ecosystem. We're already seeing it: when people talk about Kafka, do they mean Confluent's flavor of it, or do they mean things like Redpanda or Red Hat's distribution? Enterprises are going to take multiple approaches, and what's really going to matter is the API. There's no question that it's going to be the Kafka API. We're comfortable with that, and most people are comfortable with that, but the Pulsar community has Kafka compatibility and API compatibility as part of the project. Going back to the MySQL-versus-Postgres analogy, there has to be a common language, and then you get interoperability. Some of the projects we've worked on have used Kafka, and the question arises of how hard it is to migrate. The answer is that it's pretty easy. So then the question is whether they're looking at self-managing, a hybrid approach, or a cloud version.
We’ve produced a cloud version this year, and what we announced this week with streaming is that we’ve figured out streaming in the cloud. We did the same for Cassandra, meaning we figured out how to make it serverless, truly elastic, and low-friction with self-service provisioning. We have brought all of that good stuff to the Pulsar experience, which means you can now have a push-button experience. That's a pretty big deal, and it addresses a lot of operational concerns. Earlier this year we also introduced what we call Luna Streaming; Luna is our term for our open source support offerings. We have a Luna for Cassandra as well: you can pick Apache Cassandra and we will support it for you. Luna Streaming provides the same thing for Pulsar. A lot of people are going to say, I only want to run Pulsar self-managed in my data center, but I want the DataStax certified distribution, the toolset, and so on. We link these things together. We have a hybrid control plane that lets you manage across both Cassandra and Pulsar, as well as across cloud and self-managed deployments.
OSS: What are the most important components in building a strong open source offering?
Ed Anuff: What ends up happening is that you spend a bunch of effort getting into an enterprise. The problem is that there are only 500 companies in the Fortune 500. You put all of this effort into getting into these projects, and a lot of enterprise software vendors look like this. Look at their logo list right now: because of the economics of their offering, they are really only suitable for specific use cases where there's a high level of understanding of what the investment is worth. What you miss out on is all of the projects that are growing. If you don't have elastic pricing, if you don't have developer uptake, you'll do all this work to land, but you'll get no expansion. That has been something of a death march for a lot of enterprise vendors. Look at the companies that are thriving right now (I don't mind mentioning that some of them are competitors). What they've got right is an elastic model that makes it very easy for them to pursue any opportunity their technology is suitable for, even if they're only selected for a very narrow slice of the environment. I would say that having a thriving community is very important as well, because this is something the cloud providers are not great at, and it's where open source companies have their strength. Look at our investment in the Apache Cassandra community: the number of engineers we have working full-time just fixing bugs in Apache Cassandra is in the mid-double digits, and it's similar for Pulsar. We said we have to earn our way into this community. We've been recruiting people for three years now and making contributions, and it's part of our core DNA. We wouldn't get into a community unless we were going to be committing code to it and be very involved in it. We're not just a packager or a hoster of this stuff.
The community experience is very hard to fake. It's something that you earn and have to re-earn and you can't just rest on your laurels.
Extra Stuff