Anirban Ghoshal
Senior Writer

Dremio adds new Apache Iceberg features to its data lakehouse

news
Mar 02, 20233 mins
Databases

The new features include the ability to copy data and roll back changes in Apache Iceberg tables.

A user reviews data and statistical models. [analytics / analysis / tracking / monitoring / logging]

Dremio is adding new features to its data lakehouse including the ability to copy data into Apache Iceberg tables and roll back changes made to these tables.  

Apache Iceberg is an open-source table format used by Dremio to store analytic data sets.  

In order to copy data into Iceberg tables, enterprises and developers have to use the new “copy into SQL” command, the company said.

“With one command, customers can now copy data from CSV and JSON file formats stored in Amazon S3, Azure Data Lake Storage (ADLS), HDFS, and other supported data sources into Apache Iceberg tables using the columnar Parquet file format for performance,” Dremio said in an announcement Wednesday.

The copy operation is distributed across the entire, underlying lake house engine to load more data quickly, it added.

The company has also introduced a table rollback feature for enterprises, akin to a Windows system restore backup or a Mac Time Machine backup.

The tables can be backed up either to a specific time or a snapshot ID, the company said, adding that developers will have to make use of the “rollback” command to access the feature.

“The rollback feature makes it easy to revert a table back to a previous state with a single command. When rolling back a table, Dremio will create a new Apache Iceberg snapshot from the prior state and use it as the new current table state,” Dremio said.

Optimize command boosts Iceberg performance

In an effort to increase the performance of Iceberg tables, Dremio has introduced the “optimize” command to consolidate and optimize sizes of small files that are created when data manipulation commands such as insert, update, or delete are used.

“Often, customers will have many small files as a result of DML operations, which can impact read and write performance on that table and utilize excess storage,” the company said, adding that the “optimize” command can be used inside Dremio Sonar at regular intervals to maintain performance.

Dremio Sonar is a SQL engine that provides data warehousing capabilities to the company’s lakehouse.

The new features are expected to improve productivity of data engineers and system administrators while bringing utility to these class of users, said Doug Henschen, principal analyst at Constellation Research.

Dremio, which was an early proponent of Apache Iceberg tables in lakehouses, competes with the likes of Ahana and Starburst, both of which introduced support for Iceberg in 2021.

Other vendors such as Snowflake and Cloudera added support for Iceberg in 2022.

Dremio features new database, BI connectors

In addition to the new features, Dremio said that it was launching new connectors for Microsoft PowerBI, Snowflake and IBM Db2.

“Customers using Dremio and PowerBI can now use single sign-on (SSO) to access their Dremio Cloud and Dremio Software engines from PowerBI, simplifying access control and user management across their data architecture,” the company said.

The Snowflake and IBM DB2 connectors will allow enterprises to add Snowflake data warehouses and IBM DB2 databases as data sources for Dremio, it added.

This makes it easy to include data in these systems as part of the Dremio semantic layer, enabling customers to explore this data in their Dremio queries and views.

The launch of these connectors, according to Henschen, brings more plug-and-play options to analytics professionals from Dremio’s stable.