It is with a mix of expectation and hopefulness that I offer you my predictions for the data management and analytic trends that will accelerate in 2020, quite possibly leading to radical changes in how we approach these practices.
Trend #1 – AI-Enabled Data Management
Although it’s been around for decades, artificial intelligence has become the megastar of the analytics world in recent years due to advances in techniques and processing power. But AI can only be as effective as the quality of the data underneath, and while the breadth and depth of available data explode, the challenge of structuring, integrating, and certifying the data likewise intensifies. To address this, AI is turning in on itself and going beneath the surface to help people and systems make sense of the data coming from multiple sources in wildly disparate forms. AI can help identify the type of data (such as names, addresses, or organization-specific elements), find links in data across sources, note discrepancies and inconsistencies, and detect sensitive data elements that may be affected by regulation and company policy. Ultimately, the idea is to automate any aspect of data management that could even theoretically be dealt with through logic and inference, increasing the productivity and effectiveness of human data stewards.
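To make the idea a little more concrete, here is a minimal sketch of the kind of task being automated: guessing what type of data a column holds, flagging whether it is sensitive, and noting values that don't fit. This is a deliberately simple rule-based stand-in, not an AI technique and not how any particular product works. The pattern names, the `SENSITIVE` set, and the 80% threshold are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration only; real tools rely on far
# richer models than a handful of regular expressions.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "us_zip": re.compile(r"^\d{5}(-\d{4})?$"),
}

# Assumed set of labels that policy or regulation might treat as sensitive.
SENSITIVE = {"email", "us_ssn"}


def classify_value(value):
    """Return the first pattern label that matches the value, else 'unknown'."""
    for label, pattern in PATTERNS.items():
        if pattern.match(value.strip()):
            return label
    return "unknown"


def profile_column(values, threshold=0.8):
    """Guess a column's type from its values.

    Returns (label, is_sensitive, mismatches), where mismatches are the
    non-empty values that do not fit the dominant label.
    """
    non_empty = [v for v in values if v and v.strip()]
    if not non_empty:
        return "unknown", False, []
    counts = Counter(classify_value(v) for v in non_empty)
    label, hits = counts.most_common(1)[0]
    # Only accept a label when a clear majority of values agree on it.
    if label == "unknown" or hits / len(non_empty) < threshold:
        return "unknown", False, []
    mismatches = [v for v in non_empty if classify_value(v) != label]
    return label, label in SENSITIVE, mismatches


if __name__ == "__main__":
    column = ["a@x.com", "b@y.org", "c@z.net", "d@w.io", "e@v.co", "not-an-email"]
    print(profile_column(column))
```

The mismatch list is the part a human data steward would review; the point of the trend is that inference handles the routine cases and leaves people with the exceptions.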
Trend #2 – Crowdsourced Data Management
I remember when Wikipedia upended the encyclopedia business and, in the process, revolutionized how we both acquire and share information. How was it possible to open a mechanism to the entire world, and, with minimal institutional hierarchy and oversight, foster the creation of a vast, coherent, and largely accurate repository of knowledge? I still don’t understand it. This same powerful force is now turning its attention to data management. Think about it – analysts spend most of their time preparing data for analysis, often working with the same data sources over and over again – structuring, translating, and otherwise curating them each time. Today, the best way to eliminate this wasted, redundant effort is to systematically prioritize and deploy reusable data incrementally in a way that supports a variety of applications, unburdening analysts and continuously expanding and improving the quality of shared data resources. Although there are proven ways to do this, most organizations simply do not have the discipline to make it work. But what if there were a way to allow analysts to prepare data only for their short-term needs, and, in doing so, contribute to the overall coherence and quality of data at the same time, all without any rigorous planning? And what if this same concept could be applied outside of traditional organizational boundaries, allowing individuals and institutions to improve the reliability and understandability of data available to everyone every time they touch it? Imagine the potential to improve our understanding of arcane but immensely useful information contained in medical studies, declassified government documents, public spending records, published corporate financial records and disclosures, not to mention completely new data sets we can’t yet envision. It could be on the way.
Trend #3 – Packaged Accelerators
It may be true that every organization has something special and unique to offer the world; otherwise, it wouldn’t exist for very long. But this uniqueness only goes so far. The fact is that most organizations within a given industry are dealing with many of the same issues. Telecommunication companies want to optimize their networks, retailers want to allocate physical and virtual space effectively, and pharmaceutical companies want to invest in the most promising treatments. And all organizations want to hire the best people, manage their expenses, and align the organization toward shared objectives. While packaged applications already support many of these functions, there is still a massive amount of redundant, custom application development and analysis to answer well-known questions. As most organizations shift from paying for software and hardware to instead paying for value (based on usage), technology suppliers are becoming much more motivated to help organizations realize results as quickly as possible. As a result, accelerators such as connectors to common sources of data, data models to help organize data for integration and access, and modular analytic solutions to answer common questions are likely to become more prominent.
Trend #4 – Basic Reporting and Analytics
Yes, you read that right. For many organizations, getting a handle on the most basic information needed to run their business would be transformative. You won’t hear about this issue very much at conferences, but while many business leaders are understandably chasing the value associated with advanced techniques, the foundational elements needed to support a “data-driven enterprise” are deteriorating. There are many reasons for this – end users are more empowered than ever to provision data and build reporting and analytics for themselves, production applications are developed using “agile” techniques, sometimes involving haphazard acquisition of data, and organizational responsibilities for data management and analytics are in a state of transition at best or near-total disarray at worst. There is, of course, nothing wrong with self-service, agile development, or organizational change. But in many organizations, they are simply not being managed or coordinated effectively and, among other issues, the result is overlapping, inconsistent, and proliferating metrics and reporting. In their 1996 book, The Balanced Scorecard, Robert Kaplan and David Norton offered a systematic method to establish the core measures needed to run an organization. And, contrary to how it was often “implemented”, it wasn’t meant to be a single, high-level, semi-static scorecard. The idea was to cascade the scorecard to rationalized and balanced metrics in drill-down and cause/effect relationships throughout the enterprise to enable active management of the business at all levels. Nearly twenty-five years later, this message is more relevant than ever.
It’s hard to predict which of these trends will have the most impact or how quickly any of them will reach critical mass. One thing is certain, however – there will never be a substitute for leadership. Whether that means developing a top-down strategy that deploys data, analytics, and the associated organizational structures rationally, through a rigorous plan, or whether it means preparing fertile soil for coherent data to somehow blossom naturally, data and analytics leaders have a responsibility to think and plan beyond individual use cases to establish trustworthy data and analytic resources to support the true purpose of the enterprise.
AI and crowdsourcing make so much sense for DM and have for several years. I would love for your prognosticating to be right in 2020 but I’m not optimistic. I do like what Tamr have been and are doing with ML / AI – for several years. I’m sure there are many others also trying to break through the critical mass barrier. Thanks Kevin. -TomQ