When you propose an enterprise approach to data deployment, you will inevitably encounter resistance. Even though you’ve clearly shown that project-by-project data deployment with no cross-project plan has led to widespread overlap, splintering, and inconsistency of data – and to the associated costs and missed opportunities – many people will be skeptical. These skeptics are afraid that IT projects that already take too long and cost too much will take even longer and cost even more, all while diverting attention from the company’s most important initiatives.
Their fears are justified. The overcorrection your skeptics predict is not just a possibility, it is a virtual certainty – unless you know where the hazards are, and you have a map to help you avoid them.
The good news is that these overcorrections come in the form of patterns that you can recognize. And if you are considering or are in the middle of any of these approaches, you can change direction, as painful as that may be in the short run.
Overcorrection #1 – Deploying data without linking to application projects that require the data. This mistake starts innocently enough. You examine application projects, speak to analysts, and use your general understanding of the organization to determine which data domains (e.g., Sales, Product, Customer) are the most popular. Knowing that the “big bang” approach does not work, you plan to deploy data iteratively, one domain at a time. And since the data you’ve selected is so crucial, you are confident it will be used widely. The problem is that even though the data is delivered iteratively, there is no basis for adequately scoping each individual enterprise data project. If, for example, you decide that Customer data has top priority, which attributes should you include? All of them? There may be hundreds. And what level of data quality is acceptable initially? Should it be perfect? Without at least one initial application (not just a few vaguely valuable reports or dashboards) that requires the data you plan to deliver, the scope will inevitably mushroom, the project could easily take five to ten times longer than necessary, and the data you thought was so important may see limited or even no use.
Overcorrection #2 – Building a roadmap based on current systems. Again, this approach starts with the best of intentions. The plan is to reorganize the bewildering array of data resources left in the wake of disparate projects that had no cross-project strategy for the underlying data. You want to face this problem head-on, so you inventory the data resources, document the overlaps, and develop a plan that deploys a single golden copy of data while eliminating the legacy data stores one by one. However, what happens to the tangled mess of interfaces built up over the years? There are many clever ways of dealing with these, but any approach takes valuable time (don’t let anyone tell you otherwise), and what value is created by all this effort? If everything works perfectly, you’ll get a set of applications that function exactly as they did before. That’s not a great way to motivate either the team or the funding sources. While there is certainly value in rationalized data resources even without new application capability, it’s unlikely you’ll ever get there with this approach. There is also value in addressing performance, availability, and other technical issues, but if that is the motivation, the data resources should be migrated largely “as is,” with work prioritized by the issue at hand, while integrated data is deployed separately to support new application value.
Overcorrection #3 – Integrating data at the wrong level of the organization. As data management professionals, we have often heard the phrase “single source of truth”. That’s what we should be striving for. Or should we? Take a close look at the typical scope of projects in your organization. Are they almost always focused on a specific country? A business unit? Consider, for example, a retailer that sells general merchandise and also runs a café in some stores. The stores and cafés both have products, inventory, sales, customers, and so on, so why not design one data model to store the data from both business units? Here are the key questions: which planned applications will require data to be integrated across the units, and what benefit would be derived from sharing data across them for a valid business reason? For example, will one inventory replenishment application serve both? Will there be one set of reports for management oversight? Cross-marketing applications? If so, then by all means, integrate data at the level needed for those applications. But if different parts of the company are treated as semi-autonomous business units with their own management structures and applications, there is no benefit in deeply integrating all data across the units. While on the surface it seems logical to plan for one global enterprise data resource to serve all parts of the organization, doing so can create an unsustainable and unnecessary bottleneck as projects wait in line for a single team to deliver data to multiple, mostly independent constituencies. What is needed instead is thoughtfulness about which data should be integrated globally and which should be integrated by business unit, country, etc., based on the need and the business benefit, not on a default position that we must have one single source of truth.
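To make that decision rule concrete, here is a minimal sketch in Python. The domain and application names are hypothetical (loosely based on the retailer example above, not on any real system): a domain is integrated across business units only when some planned application actually spans more than one unit.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """A data domain and the planned applications that will consume it."""
    name: str
    # planned application -> the business units that application must span
    consuming_apps: dict[str, set[str]]

    def integration_scope(self) -> str:
        # Integrate across units only if at least one planned application
        # genuinely needs data from more than one unit; otherwise keep the
        # domain integrated per unit and avoid the single-team bottleneck.
        if any(len(units) > 1 for units in self.consuming_apps.values()):
            return "cross-unit"
        return "per-unit"


# A cross-marketing application needs Customer data from both units,
# so Customer earns cross-unit integration.
customer = Domain("Customer", {"cross-marketing": {"stores", "cafes"}})

# Each unit replenishes and plans independently, so Product stays per-unit.
product = Domain("Product", {"store replenishment": {"stores"},
                             "cafe menu planning": {"cafes"}})

print(customer.integration_scope())  # cross-unit
print(product.integration_scope())   # per-unit
```

The point of the sketch is only that the scope decision falls out of the planned applications, not out of a blanket “single source of truth” policy.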
Overcorrection #4 – Burdening the end user with data management responsibility in the name of “self service”. There is a fine line between empowerment and neglect, and this overcorrection crosses that line by a mile. I’ll never forget the first time I heard this approach stated as an intentional enterprise strategy. An IT executive explained that his team was going to put data in a data lake, one source at a time, and allow end users to access the data without waiting for any IT projects. Then, as data scientists “curated” the data, it would be made available to the wider organization. That’s it; that was the strategy. I was shocked. Enabling qualified end users to leverage raw data for experimentation, hypothesis testing, and prototyping is great, as is enabling them to reuse that data preparation in other work. This should be a key component of any comprehensive data strategy. But proposing it as the core data strategy for a large organization is a near-total abdication of responsibility. This approach simply passes the data management burden to the end users. Analysts often report that they spend about 80% of their time preparing data and only 20% of their time doing what they’re paid to do – analyzing the data for business benefit. I have never met an analyst who is happy with those percentages.
So what can you do? Is it possible to navigate the path toward rational enterprise data while avoiding the quicksand of overcorrection? Should you simply abandon the idea of having a coherent set of enterprise data resources and revert to the every-project-for-itself approach? No, of course not.
If you stick to a few key principles, you can develop and execute a plan that provides value to the most important initiatives of the organization with every delivery while simultaneously contributing to coherent enterprise data with every effort. The key is to identify the top initiatives of the company, the data-hungry applications planned within those initiatives, the common data needed within the initiatives, and the data management capabilities needed to ensure the data is “fit for purpose” – just-in-time and just-enough.
It’s difficult to explain the perils of these overcorrections to someone who is faced with a multitude of data resources in a state of disarray and has gone to great pains to get started with a new strategy. It’s especially difficult when the strategy that I know won’t work has been recommended by “experienced” practitioners who should know better. So please let this serve as a peek at a possible future – a future of struggle that you can avoid if you change course now.