Databricks - A Unified Platform for Data, Analytics, and AI

Company Profile is an initiative by ListMyStartUp to publish verified information on different startups and organizations. The content in this post has been approved by Databricks.

The modern world depends heavily on data and information. Everything around us, from the things we use to the things we see, is influenced by technology in one way or another.

As the need for technology has grown, so has the significance of data. With data piling up, the need emerged for a warehouse to store, analyze, and process it for multiple purposes.

This is where Databricks comes in. Databricks is a cloud platform for storing enormous volumes of data and processing them smoothly. It is an analytics platform built on Apache Spark, the popular open-source project created by its founders. It holds a 10.19% market share, making it the third-largest player in the digital analytics market.

Databricks - Company Highlights

  • Startup Name-Databricks
  • Headquarters-San Francisco, California, United States
  • Industry-Computer Software, Data, AI
  • Founders-Ali Ghodsi, Andy Konwinski, Ion Stoica, Patrick Wendell, Reynold Xin, Matei Zaharia, and Arsalan Tavakoli-Shiraji
  • Founded-2013
  • Valuation-$38 Billion
  • Revenue-$425 Million
  • Total Funding Raised-$3.6 Billion

Databricks - Latest News

October 15, 2021 – Databricks has joined FinTech Australia, the peak body of the Australian fintech industry, as a Gold member of its corporate partnership program. Databricks said that it aims to make the customer experience more engaging and bring more returns on equity.

October 6, 2021 – Databricks acquired 8080 Labs, a software company that helps Python data scientists explore data quickly without writing code. The acquisition, for an undisclosed amount, is meant to strengthen Databricks' low-code/no-code capabilities.

Databricks - About

Databricks was established by the creators of Apache Spark as a Data and Artificial Intelligence (AI) company. It acts as a cloud warehouse for any structured or unstructured data. Databricks also serves as a combined platform for Data, AI, and Analytics functions, helping data engineers, analysts, and data scientists handle huge workloads seamlessly. This is done through its Lakehouse Platform, powered by Apache Spark, which combines the best features of Data Lakes (low cost and flexibility) and Data Warehouses (performance efficiency).

In addition to Apache Spark, Delta Lake and MLflow are the other two open-source projects behind the Lakehouse Platform. Databricks provides its Unified Data services through multiple clouds, namely Google Cloud, AWS, Microsoft Azure, and Alibaba Cloud.

Databricks - Industry

The data industry has become large and significant in all aspects of life and business. According to Statista, the data market is expected to grow to $103 Billion by 2027, double its size in 2018. Artificial Intelligence is another rapidly growing market that has become an essential element in modern industries.

Databricks - Founders

Databricks was co-founded by two professors from the University of California, Berkeley, and five former Berkeley Ph.D. students.

Ali Ghodsi, co-founder and CEO of Databricks, was one of the creators of Apache Spark. He was a professor at the University of California (UC), Berkeley, as well as a board member at UC Berkeley’s RISELab. He holds primary responsibility for the growth and expansion of Databricks worldwide.

Ion Stoica, co-founder and Chairman of Databricks, is a professor at UC Berkeley and a co-director of AMPLab. He also co-founded Conviva, a start-up for large-scale video distribution.

Matei Zaharia, co-founder and Chief Technologist at Databricks, started the Spark project and now serves as Vice President of Apache Spark at the Apache Software Foundation. He received the ACM Doctoral Dissertation Award in 2014 for his research in large-scale computer systems.

Patrick Wendell, co-founder and Vice President of Engineering at Databricks, has played a major role in Spark’s operations.

Reynold Xin, co-founder and Chief Architect, takes care of the technical operations in Apache Spark. He won the Best Demo Award at VLDB in 2011.

Andy Konwinski, co-founder and Vice President of management, takes care of the AI operations at Databricks. Earlier, he led the company’s marketing efforts, including the creation of Spark Summit.

Arsalan Tavakoli-Shiraji, co-founder and Senior Vice President of field engineering at Databricks, earlier worked at McKinsey as an Associate Principal. He is a former Ph.D. student at UC Berkeley.

Databricks - Startup Story

Ali Ghodsi, the CEO of Databricks, has been keen on coding since the age of 8, when his parents bought him a used Commodore 64. He studied computer engineering and went on to earn a Ph.D. in distributed computing. Later, in 2009, he joined hands with Ion Stoica, and together they worked on ‘Spark’, a project that Matei Zaharia had already initiated.

They further coordinated with another team working on Machine Learning, and together they introduced ‘Apache Spark’ to the market. At first, no companies paid any attention, as the technology seemed alien. In 2013, Ben Horowitz (co-founder of the VC firm Andreessen Horowitz) gave them hope by investing $14 Million and encouraging them to create a company that would serve as a platform to run Apache Spark. Thus, Databricks was established in 2013.

Databricks - Mission

Databricks' mission is to make data unification more efficient by developing new techniques to unify Data, AI, and Analytics. It strives to make the customer experience more engaging.

Databricks - Logo

The Databricks logo resembles two bricks aligned perfectly, like data folders organized on a shelf. It seems that Databricks intended the logo to have a starting and an ending point without any breaks in between. This may imply that the company unifies data collection, storage, and analytics under one common platform, with no need to leave it, as everything is covered there.

Databricks - Business and Revenue Model

Its business model is built on web-based software that provides a platform for working with Apache Spark. The platform offers automated cluster management and IPython-style notebooks for data engineers and scientists.

Databricks provides its resources as Software as a Service (SaaS) and generates revenue through subscriptions. Its major services are offered through three cloud platforms, namely:

  • Microsoft Azure
  • Google Cloud
  • Amazon Web Services

Though the prices vary for each cloud, there’s a common factor to be noted: “Only pay for what you use”. Costs are calculated separately for each service used and require no up-front payment. Customers pay only for the resources they consume, as they go.
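To make the "only pay for what you use" idea concrete, here is a minimal sketch of how a usage-based bill could be totaled. The service names, unit counts, and per-unit rates are hypothetical placeholders chosen for illustration; they are not actual Databricks prices or billing rules.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    service: str          # hypothetical service name
    units_used: float     # amount of resources consumed
    rate_per_unit: float  # hypothetical price per unit, in dollars


def pay_as_you_go_bill(records):
    """Return (cost per service, total cost) with no up-front fee."""
    per_service = {}
    for record in records:
        cost = record.units_used * record.rate_per_unit
        per_service[record.service] = per_service.get(record.service, 0.0) + cost
    return per_service, sum(per_service.values())


# Illustrative usage with made-up numbers.
records = [
    UsageRecord("notebooks", 100, 0.5),
    UsageRecord("jobs", 40, 0.25),
    UsageRecord("notebooks", 20, 0.5),
]
per_service, total = pay_as_you_go_bill(records)
print(per_service, total)
```

With no usage records the bill is zero, which reflects the absence of any up-front payment.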

Databricks - Employees

Databricks has over 2,800 employees around the world as of 2021. In November 2019, it celebrated the milestone of hiring its 1,000th full-time employee. Its pace of growth is reflected in its hiring: it took six years to reach the first 1,000 employees and less than two years to hire the rest.

Databricks - Social Media Presence

Databricks has more than 45,900 followers on Twitter, where it is active with at least two tweets a day. Its LinkedIn profile has the most followers of all its social media platforms, with over 264,700. Databricks uses these platforms to promote its products and services and gain a market advantage. It also posts about its world tours and launch events for its latest products. Links to blogs and articles featuring Databricks or its products, along with information on job openings, can also be found on its social platforms.

Databricks - Growth and Revenue

Databricks was established in 2013 with Spark technology at its core. Its formation was soon followed by a rumor that ‘Spark Technology won’t work if your data doesn’t fit in their memory’. This discouraged businesses from using Spark.

Finally, in 2015, the founders decided to end these rumors by participating in a sorting contest, where they set a world record for processing one petabyte of data in the shortest time. As a result, they gained media attention and popularity.

By 2017, the company was valued at $500 Million, but its annual revenue was far lower at $1 Million. Participating in the ‘sorting contest’, changing its approach to hiring, and deciding to build software with the features large enterprises demanded all turned out to be fruitful.

Since then, Databricks’s growth has only accelerated. Its revenue hit the $100 Million mark for the first time in 2018 and took just one more year to reach $200 Million in 2019. The introduction of the Lakehouse feature was a primary factor in its success. The company’s valuation grew from $6.2 Billion in Q3 of 2019 to around $38 Billion in Q3 of 2021.
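The growth figures above can be turned into simple multiples and annualized rates. The short sketch below does that arithmetic using only the numbers quoted in this profile; the function names are illustrative, and treating Q3 2019 to Q3 2021 as exactly two years is an assumption.

```python
def growth_multiple(start, end):
    """How many times larger the end value is than the start value."""
    return end / start


def annualized_growth_rate(start, end, years):
    """Compound annual growth rate as a fraction (1.0 means +100% per year)."""
    return (end / start) ** (1 / years) - 1


# Valuation: $6.2 Billion (Q3 2019) to about $38 Billion (Q3 2021),
# treated here as a two-year span (an assumption).
valuation_multiple = growth_multiple(6.2, 38.0)
valuation_cagr = annualized_growth_rate(6.2, 38.0, 2)

# Revenue: $100 Million (2018) to $200 Million (2019), a one-year span.
revenue_growth = annualized_growth_rate(100, 200, 1)

print(f"Valuation multiple: {valuation_multiple:.2f}x")
print(f"Annualized valuation growth: {valuation_cagr:.1%}")
print(f"Revenue growth 2018-2019: {revenue_growth:.0%}")
```

In other words, the valuation grew roughly sixfold over the period, while revenue doubled between 2018 and 2019.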

Databricks reported annual recurring revenue of $425 Million in 2020, and investors hope that revenue might reach $1 Billion by the end of 2022.

Databricks - Competitors

Some of the top competitors of Databricks are:

  • Snowflake
  • Cloudera
  • DataStax
  • Qubole
  • Alteryx
  • Dremio
  • Intellicus

Here are a few comparisons with some competitors:

Snowflake - Snowflake is much larger than Databricks. Both offer similar services at flexible prices, with a few differences: Databricks focuses on processing large volumes of data, while Snowflake offers elastic cloud data storage with centralized access. Databricks faces a long battle to overtake this competitor.

Cloudera - Cloudera provides a common cloud storage and management platform that stores, processes, and analyzes data for an organization. It is similar to Databricks in its data warehousing, processing, and distribution capabilities.

Databricks - Future Plans

Databricks is currently working on two of the fastest-growing big data domains, Streaming and Deep Learning. It is building a multi-faceted Application Programming Interface (API) to support both. Databricks is also keen on accelerating the innovation of the Data Lakehouse to gain a greater advantage by winning over data-driven organizations.

Databricks - FAQs

What is Databricks?

Databricks is a cloud-based platform for storing and processing huge quantities of data, including with Machine Learning models. It is built on Apache Spark.

Who founded Databricks?

Databricks was co-founded by seven people, namely Ali Ghodsi, Ion Stoica, Matei Zaharia, Patrick Wendell, Reynold Xin, Andy Konwinski, and Arsalan Tavakoli-Shiraji.

How much has Databricks secured through funding?

Databricks secured around $3.6 Billion through 9 funding rounds.

What is the annual revenue of Databricks?

Databricks has reported an annual recurring revenue (ARR) of $425 Million for the year ending 2020.

Who are the clients of Databricks?

Databricks has around 6,000 customers worldwide. Some of its popular clients are:

  • Shell
  • CVS Health
  • Regeneron
  • T-Mobile
  • HSBC
  • Comcast