by Rick Spencer

How InfluxDB revved up for real-time analytics

feature
Apr 13, 20236 mins
AnalyticsDatabasesNoSQL Databases

A new Rust-based database engine, InfluxDB IOx, brings an in-memory columnar store, unlimited cardinality, and SQL language support to the open source time series database, raising the bar for advanced analytics across time series data.

money time clock numbers abstract
Credit: Thinkstock

Analyzing data in real time is an enormous challenge due to the sheer volume of data that today’s applications, systems, and devices create. A single device can emit data multiple times per second, up to every nanosecond, resulting in a relentless stream of time-stamped data.

As the world becomes more instrumented, time series databases are accelerating the pace at which organizations derive value from these devices and the data they produce. A time series data platform like InfluxDB enables enterprises to make sense of this data and effectively use it to power advanced analytics on large fleets of devices and applications in real-time.

In-memory columnar database

InfluxData’s new database engine, InfluxDB IOx, raises the bar for advanced analytics across time series data. Rebuilt as a columnar database, InfluxDB IOx delivers high-volume ingestion for data with unbounded cardinality. Optimized for the full range of time series data, InfluxDB IOx lowers both operational complexity and costs, by reducing the time needed to separate relevant signals from the noise created by these huge volumes of data.

Columnar databases store data on disk as columns rather than rows like traditional databases. This design improves performance by allowing users to execute queries quickly, at scale. As the amount of data in the database increases, the benefits of the columnar format increase compared to a row-based format. For many analytics queries, columnar databases can improve performance by orders of magnitude, making it easier for users to iterate on, and innovate with, how they use data. In many cases, a columnar database returns queries in seconds that could take minutes or hours on a standard database, resulting in greater productivity.

In the case of InfluxDB IOx, we both build on top of, and heavily contribute to, the Apache Arrow and DataFusion projects. At a high level, Apache Arrow is a language-agnostic framework used to build high-performance data analytics applications that process columnar data. It standardizes the data exchange between the database and query processing engine while creating efficiency and interoperability with a wide variety of data processing and analysis tools.

Meanwhile, DataFusion is a Rust-native, extensible SQL query engine that uses Apache Arrow as its in-memory format. This means that InfluxDB IOx fully supports SQL. As DataFusion evolves, its enhanced functionality will flow directly into InfluxDB IOx (along with other systems built on DataFusion), ultimately helping engineers develop advanced database technology quickly and efficiently.

Unlimited cardinality

Cardinality has long been a thorn in the side of the time series database. Cardinality is the number of unique time series you have, and runaway cardinality can affect database performance. However, InfluxDB IOx solved this problem, removing cardinality limits so developers can harness massive amounts of time series data without impacting performance.

Traditional data center monitoring use cases typically monitor tens to hundreds of distinct things, typically resulting in very manageable cardinality. By comparison, there are other time series use cases, such as IoT metrics, events, traces, and logs, that generate 10,000s to millions of distinct time series—think individual IoT devices, Kubernetes container IDs, tracing span IDs, and so on. To work around cardinality and other database performance problems, the traditional way to manage this data in other databases is to downsample the data at the source and then store only summarized metrics.

We designed InfluxDB IOx to quickly and cost-effectively ingest all of the high-fidelity data, and then to efficiently query it. This significantly improves monitoring, alerting, and analytics on large fleets of devices common across many industries. In other words, InfluxDB IOx helps developers write any kind of event data with infinite cardinality and parse the data on any dimension without sacrificing performance.

SQL language support

The addition of SQL support exemplifies InfluxData’s commitment to meeting developers where they are. In an extremely fragmented tech landscape, the ecosystems that support SQL are massive. Therefore, supporting SQL allows developers to utilize existing tools and knowledge when working with time series data. SQL support enables broad analytics for preventative maintenance or forecasting through integrations with business intelligence and machine learning tools. Developers can use SQL with popular tools such as Grafana, Apache SuperSet, and Jupyter notebooks to accelerate the time it takes to get valuable insights from their data. Soon, pretty much any SQL-based tool will be supported via the JDBC Flight SQL connector.

A significant evolution

InfluxDB IOx is a significant evolution of the InfluxDB platform’s core database technology and helps deliver on the goal for InfluxDB to handle event data (i.e. irregular time series) just as well as metric data (i.e. regular time series). InfluxDB IOx gives users the ability to create time series on the fly from raw, high-precision data. And building InfluxDB IOx on open source standards gives developers unprecedented choice in the tools they can use.

The most exciting thing about InfluxDB IOx is that it represents the beginning of a new chapter for the InfluxDB platform. InfluxDB will continue to evolve with new features and functionalities over the coming months and years, which will ultimately help further propel the time series data market forward.

Time series is the fastest-growing segment of databases, and organizations are finding new ways to embrace the technology to unlock value from the mountains of data they produce. These latest developments in time series technology make real-time analytics a reality. That, in turn, makes today’s smart devices even smarter.

Rick Spencer is the VP of products at InfluxData. Rick’s 25 years of experience includes pioneering work on developer usability, leading popular open source projects, and packaging, delivering, and maintaining cloud software. In his previous role as the VP of InfluxData’s platform team, Rick focused on excellence in cloud native delivery including CI/CD, high availability, scale, and multi-cloud and multi-region deployments.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.