Experian has implemented a new data analytics system designed to shrink from months to hours the time it takes to process petabytes of data from hundreds of millions of customers worldwide. The information services company is deploying the software, a data fabric layer based on the Hadoop file processing system, in tandem with microservices and an API platform that enables both corporate customers and consumers to access credit reports and information more quickly.
“We believe it’s a really big game-changer for customers because it gives them real-time access to information that they would normally have to wait for as it was ingested,” says Experian CIO Barry Libenson.
Once an open source tool designated for piloting big data projects, Hadoop has become a necessary component of many analytics strategies as CIOs seek to make information-based products and services available to customers. The technology uses parallel processing techniques to help software engineers churn through large amounts of data more quickly than SQL-based data management tools.
Hadoop speeds up data processing
When Libenson arrived at Experian in 2015, he learned that the company was still processing data queries with mainframe systems. While enterprise data was growing at an exponential rate, software engineers were ingesting and processing data files piecemeal, normalizing and cleaning the information before turning it over to the business. They tackled new data management requirements by adding more MIPS. In an era in which customers can order anything from shoes to computational power from Amazon.com with a few mouse clicks, Libenson knew that Experian required a data management strategy that involved far less friction and could parse data in real time.
As in many enterprises experimenting with new data tools, Experian business lines were toying with various shades of Hadoop, including Cloudera, Hortonworks and MapR, both in on-premises sandboxes and in Amazon Web Services (AWS). However, Libenson knew that if Experian was going to efficiently wrangle insights from data and deliver new products for millions of customers, the company needed to pick one platform on which to standardize.
After some bake-offs, Libenson chose Cloudera as the company's primary platform. The multitenant system runs on-premises in Experian’s hybrid cloud, though Libenson says the company has the capability to burst compute capacity using AWS as needed.
One early customer to benefit from Experian's Hadoop data fabric is the Colombian credit bureau in South America. Thanks to Hadoop's real-time processing capabilities, Experian processed 1,000 records in less than six hours, compared with the six months Libenson says it would have taken to normalize and clean the data using the company's mainframe system, which processes only one record at a time. "The big deal for customers is that they know the data we have is as close to real-time as it can get instead of it being stale," Libenson says.