Anirban Ghoshal
Senior Writer

Databricks targets data pipeline automation with Delta Live Tables

news
Apr 07, 2022 | 2 mins
Analytics, Data Management, Data Science


Databricks has unveiled a new extract, transform, load (ETL) framework, dubbed Delta Live Tables (DLT), which is now generally available across the Microsoft Azure, AWS and Google Cloud platforms.

According to the data lake and warehouse provider, Delta Live Tables uses a simple declarative approach to building reliable data pipelines and automatically managing the related infrastructure at scale, reducing the time data engineers and data scientists spend on complex operational tasks.

“Table structures are common in databases and data management. Delta Live Tables are an upgrade for the multicloud Databricks platform that support the authoring, management and scheduling of pipelines in a more automated and less code-intensive way,” said Doug Henschen, principal analyst at Constellation Research.

By offering a low-code approach through SQL-like statements, Databricks is looking to lower the barriers to entry for complex data work such as keeping ETL pipelines healthy.
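To give a sense of the declarative style Databricks is describing, here is a minimal sketch of a Delta Live Tables pipeline written against the framework's Python API. The table names, columns and source path are illustrative, and the snippet assumes it runs inside a Databricks pipeline notebook, where the dlt module and the spark session are provided.

import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw orders ingested from cloud storage (illustrative source path).")
def raw_orders():
    # Any Spark-readable source works here; the path and format are placeholders.
    return spark.read.format("json").load("/data/orders/")

@dlt.table(comment="Cleaned orders with a basic data-quality expectation.")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
def clean_orders():
    # dlt.read() declares a dependency on raw_orders; Delta Live Tables infers the
    # pipeline graph from these declarations and manages execution and infrastructure.
    return dlt.read("raw_orders").where(col("amount") > 0)

The same pipeline could also be declared through SQL-like statements, which is the low-code path the company is emphasizing.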

“The bigger the company, the more likely it is to be struggling with all the code writing and technical challenges of building, maintaining and running myriad data pipelines,” Henschen said. “Delta Live Tables is aimed at easing and automating much of the coding, administrative and optimization work required to keep data pipelines flowing smoothly.”

Early days for the data lakehouse

However, Henschen warned that it is still early days for combined lake and warehouse platforms in enterprise environments. “We’re seeing more greenfield deployments and experiments for new use cases rather than straight up replacements of existing data lakes and data warehouses,” he said, adding that DLT has competition from the open source Apache Iceberg project.

“Within the data management and, specifically, the analytical data pipeline arena, another emerging option that’s getting a lot of attention these days is Apache Iceberg. Tabular, a company created by Iceberg’s founders, is working on delivering the same benefits of low-code development and automation,” Henschen said.

Iceberg got a major endorsement this week, with Google Cloud embracing this open source table format as part of the preview of its new combined data lake and warehouse product, called BigLake.

Databricks claims that DLT is already being used by 400 companies globally, including ADP, Shell, H&R Block, Bread Finance, Jumbo and JLL.