Data transformation is at the heart of modern business practice, and for good reason. It makes data useful.
Data transformation involves changing the structure, the values or the format of data. It takes raw data and changes it to create a resource that is accessible by data scientists, analysts and others within a business. The goal: to achieve business aims and increase profitability through the analysis and manipulation of data.
It’s a straightforward mission, but the process remains challenging and fraught with pitfalls. Even with the emergence of self-service data pipelines and other user-friendly tools, data transformation remains a huge challenge for any business.
It doesn’t have to be that way.
In this blog we’ll outline the reasons data transformation is so tough, and explore how new techniques can be used to make it easier.
Multiple Sources of Data
Within any modern organization, data proliferates. Bringing that data together so that analytical systems can access it, whether on-premises or in the cloud, is a challenge with many moving parts. That challenge is compounded by hybrid, multi-cloud, and edge infrastructures that disperse data across locations and platforms. Among the common challenges:
- Dark and Orphaned Data: Without tools to monitor how data is stored and who owns it, it becomes difficult to know what is really stored in a company’s storage systems. This can lead to increasing costs, data loss, compliance issues, and a host of other security risks.
- Compliance: Laws inspired by the European GDPR have been approved in many countries recently and will soon go into effect. In fact, all industries face heightened regulatory scrutiny, forcing organizations to comply with strict rules about data preservation, handling, and management. Fines for non-compliance can be severe and hurt business operations.
- Data Dispersion: Most organizations are considering multi-cloud strategies. And while multi-cloud offers a host of benefits, it makes data management significantly more complex. Viewing available data resources, finding the right information, and managing security properly all become more difficult in a multi-cloud environment.
- Data Discoverability, Availability and Access: Cloud and edge infrastructures make finding and taking advantage of what is already available in the organization more and more difficult. It also becomes harder to ensure that the right data is being saved in the right place, while preserving retention and security policies around the data. These issues become even more pressing when data has to be accessed remotely from everywhere, at any time – often the ultimate aim of data transformation.
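To make the first of these challenges concrete, here is a minimal sketch of an inventory check that flags orphaned and dark data. The `DatasetEntry` fields, the `flag_at_risk` helper, and the 365-day staleness threshold are all assumptions made for illustration; a real catalog would pull this metadata from the storage systems themselves.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DatasetEntry:
    """One row in a hypothetical data inventory."""
    name: str
    location: str            # e.g. "on-prem", "cloud-a", "edge"
    owner: Optional[str]     # None when nobody is recorded as owner
    last_accessed: date


def flag_at_risk(entries, today, stale_after_days=365):
    """Return {dataset name: [reasons]} for datasets that look orphaned or dark."""
    flagged = {}
    for entry in entries:
        reasons = []
        if not entry.owner:
            reasons.append("orphaned (no owner)")
        if (today - entry.last_accessed).days > stale_after_days:
            reasons.append("dark (not accessed recently)")
        if reasons:
            flagged[entry.name] = reasons
    return flagged


# Illustrative inventory spanning several locations (made-up entries).
inventory = [
    DatasetEntry("hr_records", "on-prem", "hr-team", date(2020, 5, 1)),
    DatasetEntry("old_campaigns", "cloud-a", None, date(2017, 3, 12)),
    DatasetEntry("sensor_dump", "edge", "ops-team", date(2018, 1, 5)),
]
report = flag_at_risk(inventory, today=date(2020, 6, 1))
```

Even a simple report like this gives the team a starting list of datasets to review before any transformation work begins.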
Preparing and transforming data takes a series of steps, all of which are time-consuming for skilled individuals and can require multiple passes to get right.
Data discovery, analysis, and data mapping are vital, but also time-consuming and expensive. Each transformation may require a bespoke process, and each of those processes needs to be researched, written, rolled out, tested and then refined.
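As a rough illustration of what one such bespoke process involves, the sketch below applies a small data mapping to a raw record. The field names, the `FIELD_MAP` table, and the `transform` helper are hypothetical, chosen only to show a change of structure (renamed fields), values (trimmed text), and format (dates and numbers) in one place.

```python
from datetime import datetime

# Hypothetical mapping: source field -> (target field, converter).
FIELD_MAP = {
    "cust_name": ("customer_name", str.strip),
    "signup": (
        "signup_date",
        # Assumes source dates are day/month/year; output is ISO 8601.
        lambda s: datetime.strptime(s, "%d/%m/%Y").date().isoformat(),
    ),
    "spend": ("total_spend", float),
}


def transform(record):
    """Apply FIELD_MAP to one raw record.

    Returns (transformed record, list of problems) so that bad input is
    reported for a later pass instead of stopping the whole run.
    """
    out, errors = {}, []
    for src, (dst, convert) in FIELD_MAP.items():
        if src not in record:
            errors.append(f"missing field: {src}")
            continue
        try:
            out[dst] = convert(record[src])
        except (ValueError, TypeError) as exc:
            errors.append(f"{src}: {exc}")
    return out, errors


raw = {"cust_name": "  Ada Lovelace ", "signup": "01/07/2020", "spend": "129.50"}
clean, problems = transform(raw)

# A record with a missing field and a malformed number.
partial, partial_problems = transform({"cust_name": "Bob", "spend": "n/a"})
```

Collecting problems rather than raising on the first one reflects the point above: transformations usually take several passes, and a list of failures tells the team what to refine next.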
These steps are vital to the overall success of the data transformation exercise, but they incur cost. Planning takes time, and there is often the need to bring in new team members with relevant skill sets, or to train up existing team members to tackle the specific challenges posed by a particular aspect of the data transformation project.
When the dataset being transformed is related to a business-critical process, such as HR data or transactional data, the stakes go up. Organizations must create safeguards and establish backups, which adds to the pressure and complexity of the process for the team in charge.
Too Many Cooks
When an organization moves to transform its data, there are many stakeholders who will want, at the very least, to be kept informed about the process. These could include:
- The data owner
- The data scientist
- The data analyst
- The systems expert
- The process owner
- The CTO/decision maker
- The data controller
Each of these groups (as they could very well be groups of individuals) will want to discuss, influence, and shift the transformation’s goals and procedures to suit their needs. This can make managing a data transformation both technically tricky and politically challenging. And it can wreak havoc on timelines as the effort bogs down.
Data transformation can help an organization find new value within its existing datasets, but it will continue to create challenges and dangers for those looking to leverage it.
Those challenges arise from the fact that the data transformation process brings together disparate datasets from multiple sites and applications. In addition, the data transformation process itself is complex, time-consuming and expensive, and each data transformation will need to address the concerns and requirements of multiple stakeholders, who may have conflicting demands or expectations.
Evaluating your Data Transformation Needs? Register for this free live GigaOm Webinar on July 1st, “Chief Data Officer: The Catalyst in COVID-19 Post-Pandemic Recovery”