by Benoit Dageville

What’s next for the cloud data warehouse

analysis
Oct 17, 2019 | 6 min read
Analytics | Cloud Computing | Databases

Global data-driven decision-making requires a cloud-agnostic, unified data management platform that crosses regions, continents, and cloud providers

If multicloud is the strategy for data warehousing today, then cross-cloud is its vision for tomorrow. This prediction comes from a universal need to seamlessly move and exchange data across different regions within the same cloud provider and even across different clouds.

Circumstances such as geographic location and the incompatibility of cloud platforms hinder the goal of globally accessible data. As a result, companies struggle to securely share data across an enterprise (and beyond), to manage latency between business locations, and to bring together silos of data that result from using multiple clouds.

Change is on the horizon. Soon, organizational data will know no borders. No matter where they store data or which cloud providers they use, companies will be able to access all of their data from anywhere, if they choose to do so.

Data limitations today

Although the benefits of the cloud are well documented, cloud service providers have yet to deliver on the cloud’s full promise, due to two significant factors:

  • Geography: The nature of cloud delivery requires companies to use regional clouds. The reason: Services work best when users are in close physical proximity to the service. Anyone who has attempted to query or share data stored in a distant cloud region knows that latency is a problem. Therefore, businesses often create individual accounts by region. These accounts become the physical place where data is stored and queried by local users. This setup is less than ideal for companies in multiple regions because they can’t easily share data across the organization.
  • Proprietary APIs: The major cloud platforms (Amazon Web Services, Microsoft Azure, Google Cloud Platform) are all built with proprietary APIs. As a result, companies with a multicloud strategy end up spreading their data across cloud platforms. Without an easy way to share, data once again becomes siloed—this time in cloud platforms rather than in on-premises servers. The sketch after this list makes that incompatibility concrete.
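
To see how divergent the proprietary APIs really are, here is a minimal Python sketch of reading the same logical object through each provider’s SDK. The bucket, container, and key names are illustrative placeholders, not real resources:

```python
# Reading the same logical object through three proprietary SDKs.
# Bucket, container, and key names are illustrative placeholders.

import boto3                                      # AWS SDK
from google.cloud import storage                  # GCP SDK
from azure.storage.blob import BlobServiceClient  # Azure SDK

KEY = "sales/2019/q3.parquet"
AZURE_CONN_STR = "<your Azure storage connection string>"

# Amazon S3
s3 = boto3.client("s3")
aws_bytes = s3.get_object(Bucket="acme-data", Key=KEY)["Body"].read()

# Google Cloud Storage
gcs = storage.Client()
gcp_bytes = gcs.bucket("acme-data").blob(KEY).download_as_bytes()

# Azure Blob Storage
service = BlobServiceClient.from_connection_string(AZURE_CONN_STR)
azure_bytes = (
    service.get_blob_client(container="acme-data", blob=KEY)
    .download_blob()
    .readall()
)
```

Three SDKs, three authentication models, and three object models for one logical read. Multiply that by every storage and compute service a data platform touches, and the silo problem becomes clear.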

The challenges that arise from this present-day reality include: 

  • Inability to analyze all data: Data is created and stored locally, which is inadequate for multinational organizations with a global presence. Although local systems may work fine on their own, centralizing all of the relevant data required to answer important business questions is a complex process.
  • Lack of connection to other systems: Connecting data centers across regions, countries, and continents requires intricate infrastructure setups and continuous maintenance to ensure secure and seamless connections. This work is complicated and expensive, especially when it requires moving large volumes of data between data centers that are far apart. As a result, many data systems are not connected to each other, regardless of whether they live in the cloud.
  • Complicated replication processes: Replicating data across regions is an inherently distributed process, which makes it expensive to set up and complicated to manage. Only the largest, best-resourced companies tend to have the people and budget to tackle it.
  • Concerns about vendor lock-in: Much like 40 years ago, when organizations didn’t want to be locked into a particular hardware vendor, companies are now concerned about lock-in with a single cloud provider. Organizations want the freedom to move their data and applications in order to benefit from new services or better pricing. Data portability becomes a daunting proposition, especially when companies have multiple petabytes of data to move.

The benefits of global data

The vision has always been an interconnected world of data on one unified platform. This prospect will become a reality when we build bridges between all regional instances and cloud providers so that data can move freely. To achieve this future state, we need cross-cloud capabilities.

In my mind, cross-cloud has two requirements. The first is the creation of a cloud-agnostic layer that provides a unified data management platform on top of each cloud region, whichever provider operates it. The second requirement is interconnecting these regions through a high-throughput communication “mesh” that allows data to move anywhere—between regions, within and across continents, and even across regions managed by different cloud providers.
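
As a sketch of what that first requirement could look like in code, a cloud-agnostic layer reduces to a single abstract contract with a thin adapter per provider behind it. All names here (BlobStore, S3Store, GCSStore, open_store) are hypothetical illustrations, not any vendor’s actual API:

```python
# A minimal sketch of a cloud-agnostic storage layer: one abstract
# contract, with a thin adapter per provider behind it. All names
# (BlobStore, S3Store, GCSStore, open_store) are hypothetical.

from abc import ABC, abstractmethod


class BlobStore(ABC):
    """What platform code programs against, on any cloud."""

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...


class S3Store(BlobStore):
    """Adapter wrapping the proprietary AWS SDK."""

    def __init__(self, bucket: str):
        import boto3
        self._s3, self._bucket = boto3.client("s3"), bucket

    def get(self, key: str) -> bytes:
        return self._s3.get_object(Bucket=self._bucket, Key=key)["Body"].read()

    def put(self, key: str, data: bytes) -> None:
        self._s3.put_object(Bucket=self._bucket, Key=key, Body=data)


class GCSStore(BlobStore):
    """Adapter wrapping the proprietary Google Cloud SDK."""

    def __init__(self, bucket: str):
        from google.cloud import storage
        self._bucket = storage.Client().bucket(bucket)

    def get(self, key: str) -> bytes:
        return self._bucket.blob(key).download_as_bytes()

    def put(self, key: str, data: bytes) -> None:
        self._bucket.blob(key).upload_from_string(data)


def open_store(region: str) -> BlobStore:
    """Route a region name to the right adapter; callers never see
    a proprietary API. The region-to-provider map is illustrative."""
    providers = {"us-east-1": S3Store, "europe-west2": GCSStore}
    return providers[region](bucket=f"platform-{region}")
```

Everything written against BlobStore is cloud-agnostic; only the adapters know which proprietary SDK they wrap, which is what lets the same platform code run in any region of any provider.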

In short, regardless of where data resides or which proprietary platform stores it, the cloud-agnostic layer and mesh can run on any cloud. The result is the removal of all barriers to data and the creation of what I like to call a “virtual multicloud global data center,” where data is easily and inexpensively accessible no matter where it is stored.

With this type of analytics data platform, businesses can:

  • Bridge geographic regions and move data with ease. The same code can be run on top of different clouds to perform global analytics. 
  • Run on any cloud platform they want and be truly multicloud. The threat of being locked into a single provider disappears.  
  • Use replication to make latency a challenge of the past by keeping data both close to its users and complete. Rather than query data that’s stored in a different region or on a different continent, organizations can replicate remote data and combine it with local data to form a single, centralized place to access all global data.
  • Store two or more copies of data using modern replication, which is crucial for failover and business continuity, and build high-availability systems at a fraction of the cost of legacy replication systems. A minimal sketch of this replication pattern follows this list.
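
Continuing the hypothetical BlobStore sketch from above (same assumed names, same module), replication in this model reduces to copying objects across the mesh from a remote region’s store into a local one, after which both analytics and failover run against nearby copies:

```python
# Continuing the hypothetical BlobStore sketch above: replication
# becomes a copy across the mesh from a remote region's store into
# a local one, rather than a per-cloud integration project.

from typing import Iterable


def replicate(source: BlobStore, target: BlobStore,
              keys: Iterable[str]) -> None:
    """Copy each object from a remote region into the local region."""
    for key in keys:
        target.put(key, source.get(key))


# EU data replicated into a US region: US analysts now query a local
# copy instead of paying cross-ocean latency, and the second copy
# doubles as a failover target for business continuity.
eu = open_store("europe-west2")
us = open_store("us-east-1")
replicate(source=eu, target=us, keys=["sales/2019/q3.parquet"])

local_bytes = us.get("sales/2019/q3.parquet")  # served from us-east-1
```

A production system would add incremental refresh, consistency checks, and conflict handling; the point of the sketch is only that a cloud-agnostic layer turns replication into a routing problem rather than a per-cloud engineering project.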

Achieve true data-driven decision-making

We live in a global business world. Boundaries are being shattered, and cloud barriers must be among them. Organizations need global data to achieve truly informed, data-driven decision-making.

Cross-cloud delivers on the promise of global data and empowers organizations to fully execute on multicloud strategies. By enabling data to move freely and securely, and to be consolidated into a single source of truth, cross-cloud will make organizations truly global.

Benoit co-founded Snowflake and currently serves as president of the product division. Benoit is a leading expert in parallel execution and self-tuning database systems. Prior to founding Snowflake, Benoit was with Oracle for 15 years as a lead architect for parallel execution in Oracle RAC and a key architect in the SQL Manageability group. Prior to Oracle, Benoit worked at Bull Information Systems, where he helped define the architecture and lead database performance efforts for Bull’s parallel systems. Benoit has a Ph.D. in computer science with a focus on parallel database systems and holds more than 80 patents.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.