Introduction
“Blackstone DataSync stands out by enabling companies to deliver a three-year data quality strategy in just six weeks.” This statement often elicits a mix of intrigue and disbelief from potential investors and customers. The question that naturally arises is: how is such a feat possible?
Before we unveil the innovative approach we’ve developed at Blackstone DataSync, it’s crucial to grasp the shortcomings of traditional data quality strategies. These can often take three years, if not more, a timeline that is simply not viable in today’s fast-paced business environment.
Traditional data quality projects
There are a few things you need in place to implement a successful data quality strategy with currently available technology:
- A senior sponsor: You need a senior leader championing the mission of data quality. Many organisations struggle to make the business case for data, so the sponsor must understand the cost of not having good-quality data: companies that do nothing about the problem can lose 20-40% of their revenue. The sponsor also needs buy-in at the most senior level to ensure the strategy is a priority for the business.
- A data programme: Implementing a robust data quality strategy is not a simple task. It requires a dedicated data programme, complete with a project manager, business analysts, data analysts, data engineers, data governance analysts, and a data architect. A team of this scale, which can cost £2m and above per annum, is necessary to ensure the strategy succeeds.
- Data expertise: One reason data programmes fail is that they do not have enough people with sufficient industry-specific expertise in the data they are working with. Financial services has some of the most complex datasets. To be a good data analyst in financial services, you need to understand the data in the context of the industry: how investment data entities connect to deliver value to the business, and how regulatory use cases and reporting cycles shape it.
- A data-first mindset: Many programmes I have been part of discuss technology before the data. Which vendor to use, or whether the build meets the latest data architecture standards, takes precedence. It really doesn’t matter whether you use a data lake or databases; what matters is that the data is accessible and stored in a structured way, making it easily retrievable and extendable.
Planning and scoping
You can imagine a data programme being set up with a sponsor. It would take a couple of months to determine the scope of deliverables for the programme, and another month to get buy-in and sign-off for that scope.
Requirements gathering and design
Analysts would need three to six months to speak with the stakeholders, understand their requirements, document them, and review them with the technical teams before designing the work. In many corporate organisations, these designs must go to a technical design authority to be reviewed and signed off.
Even if you take an agile approach, you would have smaller iterations of the same thing. An analyst may take a week to gather and document the requirements in a Jira ticket before the data architect and engineering team can look at them for design. That process would be followed religiously for several sprint cycles.
Build, test and deploy
How long the build, test, and deploy phase takes depends on the volume and complexity of the rules and of the dataset; the build alone could take three months to a year. A major challenge in testing the build is having a dataset to exercise those rules. Many companies do not want to use “production” data in development processes and most likely will not have the tools to create synthetic data. Projects stall here because there is not enough data to test the validation rules.
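Where production data is off limits, even a simple synthetic data generator is enough to unblock rule testing. Below is a minimal sketch in Python of the general technique: fabricate plausible rows, deliberately corrupt a fraction of them, and confirm the validation rule catches the defects. The field names and the rule are assumptions for illustration, not taken from any specific product.

    import random

    # Minimal sketch: generate synthetic trade rows, corrupt some on
    # purpose, and check that a validation rule catches the defects.
    # Fields and the rule are illustrative assumptions.
    random.seed(42)  # reproducible example

    def synthetic_row(broken=False):
        row = {
            "trade_id": random.randint(1, 10**6),
            "quantity": random.randint(1, 10_000),
            "currency": random.choice(["GBP", "USD", "EUR"]),
        }
        if broken:
            row["quantity"] = -row["quantity"]  # inject a known defect
        return row

    def rule_positive_quantity(row):
        return row["quantity"] > 0

    rows = [synthetic_row(broken=(i % 5 == 0)) for i in range(100)]
    failures = [r for r in rows if not rule_positive_quantity(r)]
    print(f"{len(failures)} of {len(rows)} rows fail the rule")  # 20 of 100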
BAU
Even after go-live, the work does not end. Validation rules must be maintained as source systems, products, and regulations change, so part of the programme team has to be retained as a business-as-usual function, and the cost continues long after the project formally closes.
How is DataSync different?
DataSync comes with over 100 common data connectors out of the box. This means we can connect to most systems within minutes and configure the synchronisation activities. Each connection can be configured in numerous ways to match the client’s desired behaviour: for example, a system can be set to only receive changes and never contribute one, or to receive only delta rows from DataSync.
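To picture what such a configuration might look like, here is a hypothetical sketch in Python. The ConnectorConfig type and its option names are illustrative assumptions, not DataSync’s actual API; they simply show how receive-only and delta-only behaviour could be expressed.

    from dataclasses import dataclass

    # Hypothetical sketch of a connector configuration; the type and
    # option names are illustrative, not DataSync's real API.
    @dataclass
    class ConnectorConfig:
        system: str                # name of the connected system
        receive_changes: bool      # accept corrections pushed by the sync
        contribute_changes: bool   # push this system's edits back out
        delta_only: bool           # receive only changed (delta) rows

    # A CRM that consumes corrections but never contributes a change,
    # and receives only the rows that actually changed.
    crm = ConnectorConfig(
        system="crm",
        receive_changes=True,
        contribute_changes=False,
        delta_only=True,
    )
    print(crm)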
DataSync uses deterministic AI to understand your data and automatically map it onto our internal model, backed by an extensive data library built specifically for financial services. Within minutes of connecting to your data, DataSync makes a first pass: profiling it, understanding what it is, classifying it, and building data quality rules.
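As a rough illustration of deterministic profiling (as opposed to a statistical or generative approach), the sketch below classifies a column from its values and derives a data quality rule from the classification. The ISIN pattern is real, but the surrounding functions are assumptions for the example, not DataSync’s internals.

    import re

    # Deterministic profiling sketch: classify a column from its values
    # and turn the classification into a validation rule. Illustrative
    # only; a production library would cover far more types.
    ISIN = re.compile(r"^[A-Z]{2}[A-Z0-9]{9}[0-9]$")  # 12-char identifier

    def classify(values):
        """Guess what a column holds from a sample of its values."""
        sample = [str(v) for v in values if v not in (None, "")]
        if sample and all(ISIN.match(v) for v in sample):
            return "isin"
        if sample and all(v.replace(".", "", 1).isdigit() for v in sample):
            return "numeric"
        return "text"

    def build_rule(column, kind):
        """Turn a classification into a data quality check."""
        if kind == "isin":
            return lambda row: bool(ISIN.match(str(row.get(column, ""))))
        return lambda row: row.get(column) not in (None, "")

    kind = classify(["GB0002634946", "US0378331005"])   # -> "isin"
    rule = build_rule("security_id", kind)
    print(kind, rule({"security_id": "GB0002634946"}))  # isin True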
The first pass will not be perfect, so we spend the first week updating our models and libraries so that they capture and map your data accurately.
At the end of that week, we create a corrected version of the data from each system DataSync is connected to. DataSync can infer correct values from other data points within a set of information, and it reconciles data between systems to make them consistent. Over the following few weeks we review and explain the impact DataSync would have on the data if run in production, giving our clients time to review and agree to the changes before DataSync runs in a production environment.
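The general shape of such a reconciliation can be sketched as follows: records from each system are matched on a shared key, and where systems disagree, the value from the most trusted source is proposed as the correction. The trust ordering and field names here are assumptions for illustration.

    # Reconciliation sketch: match records across systems on a shared
    # key and propose the most trusted system's value as the correction.
    # The trust ordering and fields are illustrative assumptions.
    TRUST_ORDER = ["custodian", "crm", "spreadsheet"]  # most trusted first

    def reconcile(records_by_system, field):
        """records_by_system: {system: {key: {field: value, ...}}}"""
        corrections = []
        keys = set().union(*(r.keys() for r in records_by_system.values()))
        for key in keys:
            # Take the value from the most trusted system holding the record.
            truth = next(
                records_by_system[s][key][field]
                for s in TRUST_ORDER
                if key in records_by_system.get(s, {})
            )
            for system, records in records_by_system.items():
                if key in records and records[key][field] != truth:
                    corrections.append((system, key, records[key][field], truth))
        return corrections

    data = {
        "custodian": {"GB0002634946": {"currency": "GBP"}},
        "crm": {"GB0002634946": {"currency": "GBp"}},
    }
    print(reconcile(data, "currency"))
    # [('crm', 'GB0002634946', 'GBp', 'GBP')]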
Run on an ongoing basis, DataSync keeps the systems it connects to consistent and the data clean. If data changes in an unexpected way, DataSync can prevent or correct that change. It can be configured to run in real time or as part of a batch process.
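One way to picture the prevent-or-correct behaviour is as a guard applied to every incoming change: in real-time mode it runs per event, in batch mode over every row of a file. The guard below is a hypothetical sketch under assumed field names, not DataSync’s API.

    # Hypothetical change guard: validate an incoming change and either
    # let it through, correct it, or block it. Field names are assumed.
    VALID_CURRENCIES = {"GBP", "USD", "EUR"}

    def guard(change, mode="prevent"):
        """Return the change to apply, a corrected copy, or None to block."""
        if change["currency"] in VALID_CURRENCIES:
            return change
        if mode == "correct":
            fixed = dict(change, currency=change["currency"].upper())
            if fixed["currency"] in VALID_CURRENCIES:
                return fixed
        return None  # block the unexpected change

    print(guard({"id": 1, "currency": "gbp"}, mode="correct"))  # corrected
    print(guard({"id": 2, "currency": "XXX"}))                  # None: blocked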
To learn more about how DataSync works, contact Ovo at ovo@blackstonedatasync.com.