Databricks is open sourcing Delta Lake to counter criticism from rivals and take on Apache Iceberg as well as data warehouse products from Snowflake, Starburst, Dremio, Google Cloud, AWS, Oracle and HPE.

In an effort to push past doubts cast by its data lake and data warehouse rivals, Databricks on Tuesday said that it is open sourcing all Delta Lake APIs as part of the Delta Lake 2.0 release. The company also announced that it will contribute all enhancements of Delta Lake to The Linux Foundation.

Databricks competitors such as Cloudera, Dremio, Google (BigLake), Microsoft, Oracle, SAP, AWS, Snowflake, HPE (Ezmeral) and Vertica have criticized the company, casting doubt on whether Delta Lake was open source or proprietary and thereby taking away a share of prospective customers, analysts said.

“The new announcement should provide continuity and clarity for users and help counter confusion (stoked in part by competitors) about whether Delta Lake is proprietary or open source,” said Matt Aslett, research director at Ventana Research.

With the announcements, Databricks is putting customer concerns and competitive criticism to bed, said Doug Henschen, principal analyst at Constellation Research.

“In competitive deals, rivals such as Snowflake would point out to would-be customers that aspects of Delta Lake were proprietary,” Henschen said, adding that Databricks customers can now trust that their data is on an open platform and that they’re not locked into Delta Lake.

Databricks refers to Delta Lake as a data lakehouse, a data architecture that offers both storage and analytics capabilities, in contrast to data lakes, which store data in its native format, and data warehouses, which store structured data (often in SQL format).
Competition grows in commercial open source market

With an increasing number of commercial open source projects in the data lake market, Databricks’ Delta Lake may find itself facing new competition, including Apache Iceberg, which offers high-performance querying for very large analytic tables.

“There are also open source projects that have recently started to be commercialized, such as OneHouse for Apache Hudi and both Starburst and Dremio coming out with their Apache Iceberg offerings,” said Hyoun Park, chief analyst at Amalgam Insights.

“With these offerings coming out, Delta Lake faced pressure from other open source lakehouse formats to become more functionally robust as the lakehouse market starts to splinter and technologists have multiple options,” Park added.

Many other players in this space are focused on Apache Iceberg as an alternative to Delta Lake tables, Ventana’s Aslett said. Delta tables, in contrast to traditional tables that store data in rows and columns, support ACID (atomicity, consistency, isolation, and durability) transactions and store metadata that helps speed up data ingestion.

In April, Google announced BigLake and Iceberg support, and earlier this month, Snowflake announced support for Apache Iceberg tables in private preview.

The Iceberg announcements, just like Databricks’ open source strategy, aim to appeal to prospective customers who might have concerns about committing to one vendor and about having access to their own data encumbered down the road, Henschen said.

In the face of renewed competition, Databricks’ move to open source Delta Lake is a good one, said Sanjeev Mohan, former research vice president for big data and analytics at Gartner.

“Databricks’ announcement to open source the full capabilities of Delta Lake is an excellent step to drive wider adoption,” Mohan said.
Delta Lake 2.0 offers faster query performance

Databricks’ Delta Lake 2.0, which will be fully available later this year, is expected to offer faster query performance for data analysis, the company said.

Databricks on Tuesday also released the second edition of MLflow, an open source platform for managing the end-to-end machine learning lifecycle (MLOps).

MLflow 2.0 comes with MLflow Pipelines, which offer data scientists predefined, production-ready templates based on the type of model they’re building, allowing them to accelerate model development without requiring intervention from production engineers, the company said.

According to analysts, MLflow 2.0 will serve as a more mature option for data scientists, as putting machine learning into production remains a challenging process and translating algorithmic models into production-grade application code on securely governed resources remains difficult.

“There are a number of vendor solutions in this space including Amazon Sagemaker, Azure Machine Learning, Google Cloud AI, Datarobot, Domino Data, Dataiku, and Iguazio. But Databricks serves as a neutral vendor compared to the hyperscalers and Databricks’ unified approach to data and model management serves as a differentiator to MLOps vendors that focus on the coding and production challenges of model operationalization,” Amalgam’s Park said.

The move to release MLflow 2.0 eases the path to bring streaming and streaming analysis into production data pipelines, Henschen said, adding that many companies struggle with MLOps and fail even after successfully creating machine learning models.