david_linthicum
Contributor

Is distributed data realistic?

analysis
Apr 01, 2022
3 mins
Cloud Computing, Data Architecture, Data Governance

Data everywhere may be the future of cloud-native and edge computing deployments, but the possibilities come with expense and management issues.


The idea of distributed data is an old concept that lived in white papers and PhD theses more than in the real world. I remember talking about distributed data in my database design college class back in the late 80s, with the belief that it would likely show up the next year. It never did.

The idea has been consistent throughout the years: No matter where we store data, by using a common set of services or data management control plane, we’re able to deal with it all, no matter where it physically exists, as one logical grouping of data. This data is available at any time, by anybody, for any purpose. It’s federated, democratized, and it’s completely transparent as to how this magic occurs across clouds, edge computers, devices, and legacy systems.

Fast forward to 2022, and we’re talking about much the same concept as we did more than 30 years ago. What’s different is that we now have the ability to pull it off at a reasonable price. Also, we have emerging concepts such as cloud native, which we’re defining as a common stack where the private and public clouds are the foundation, but the foundation clouds don’t typically provide services (or data) directly to the applications or analytical tools.

A few things are driving this right now.

First, we finally have a working and reliable global network, or at least that should be the case once 5G completes its rollout.

Second, there’s interest in maintaining data on edge systems outside of the data center and cloud providers, meaning any device or server that can store and process data.

Finally, data storage has been democratized. Data administration and control are no longer the domain of a single data administrator but of a group of people who each own specific data sets. Those data sets are widely distributed and can be leveraged as a single data set or as a federated grouping of data sets, without limitations on performance or functionality.

Of course, there is a lot of cross-coordination required to make data anywhere a reality. The biggest problem is having a functional management control plane that can keep track of the data as well as deal with governance and security. Simple things, such as changing the meaning of a data element on an edge device, could end up breaking hundreds of applications and embedded analytics processes if not managed correctly. Also, if devices or servers, cloud or not, are offline for a long period of time, then that offline data will be missing for applications and analytics that depend on it until communications are restored.
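To make those two failure modes concrete, here is a minimal sketch in Python of what a control plane has to do at query time. Everything in it is hypothetical and exists only for illustration: the `DataSource` and `ControlPlane` classes, the source names, and the idea of tracking a single integer schema version are assumptions, not a description of any real product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


@dataclass
class DataSource:
    """Hypothetical store: a cloud database, an edge device, a legacy system."""
    name: str
    schema_version: int            # assumed: one integer describes the data's meaning
    fetch: Callable[[], List[Row]]
    online: bool = True


@dataclass
class ControlPlane:
    """Toy control plane that presents many sources as one logical data set."""
    expected_schema: int
    sources: Dict[str, DataSource] = field(default_factory=dict)

    def register(self, source: DataSource) -> None:
        self.sources[source.name] = source

    def query(self) -> Tuple[List[Row], List[str]]:
        """Return rows from every usable source, plus the names of skipped ones."""
        rows: List[Row] = []
        skipped: List[str] = []
        for source in self.sources.values():
            # Skip sources that are unreachable or whose data changed meaning.
            if not source.online or source.schema_version != self.expected_schema:
                skipped.append(source.name)
                continue
            rows.extend(source.fetch())
        return rows, skipped


plane = ControlPlane(expected_schema=1)
plane.register(DataSource("cloud-db", 1, lambda: [{"customer": "a", "total": 10}]))
plane.register(DataSource("edge-kiosk", 1, lambda: [{"customer": "b", "total": 5}], online=False))
plane.register(DataSource("legacy-erp", 2, lambda: [{"customer": "c", "amount": 7}]))

rows, skipped = plane.query()
print(rows)     # rows from the sources that were reachable and compatible
print(skipped)  # sources whose data is missing from this result
```

In this sketch the offline edge device and the source whose schema drifted both drop out of the result, so any application reading `rows` sees an incomplete picture. Returning the `skipped` list alongside the data is one possible design choice that lets the caller know what is missing rather than failing silently.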

You really need to use your head. Just because you can store data anywhere and leverage it as if it’s centralized does not mean you should. There are some gotchas, such as network and management control plane failures, that can cost you downtime. Also, although we’re still figuring out costs, distributed data does seem a bit more expensive to deploy and operate over the long term than more traditional, centralized approaches.

Despite all this, you should still consider distributed data. Indeed, it has many pragmatic applications that businesses can exploit to drive innovation and growth. For example, enhancing the customer experience by driving more control of the data down to the customer’s systems is one opportunity; there are hundreds of others. 

So, take a look at distributed data or data anywhere in 2022. As always, look for pragmatic use cases to keep your company out of trouble.


David S. Linthicum is an internationally recognized industry expert and thought leader. Dave has authored 13 books on computing, the latest of which is An Insider’s Guide to Cloud Computing. Dave’s industry experience includes tenures as CTO and CEO of several successful software companies, and upper-level management positions in Fortune 100 companies. He keynotes leading technology conferences on cloud computing, SOA, enterprise application integration, and enterprise architecture. Dave writes the Cloud Computing blog for InfoWorld. His views are his own.
