The emerging world of non-traditional and synthetic data


From a data management perspective, few companies these days seem to have too little data on their hands, but some are hard at work acquiring more, even if it means creating it synthetically.

I recently spoke with John Lucker, principal at Deloitte Consulting LLP, who shared some insights on the expanding pool of data sources and how companies are using them to develop new forms of information.

Lucker groups data into traditional and non-traditional forms. Traditional data may include the information that companies collect and store themselves as well as information they acquire from external data aggregators, such as Equifax. More companies are getting into data aggregation to sell information about things like real estate ownership, vehicle ownership and business relationships, and these sources can be particularly interesting, Lucker said.

Non-traditional data, just out of the gate, comes from emerging sources such as audio and video, social media and mobile technologies. This data can be collected internally or purchased from external sources, and it comes with many of the same challenges as more common data, such as how to extract the needles from the haystacks.

However, these emerging data sources raise special issues that companies need to handle carefully if they don't want to alienate customers.

"I think that companies need to be very proactive in thinking about how they use this data in a way that avoids the creep-out factor," Lucker said. "I think that's something that perhaps has not been thought out enough. There's a tug-of-war going on between generating the data and a consumer awakening as to what it is they may be giving up."

What Lucker thinks will perhaps be the most important form of data is what he calls synthetic data, which is created by using algorithms to combine a company's internal data, external data and non-traditional data.

"Some people say this is an analytic process, but if what results from it is a new form or view of raw data, I consider that to be a new data form," he said.

Synthetic data can be very powerful in establishing numbers to back up notions that otherwise would rely on common sense or intuition for validation. "What has your gut always told you to be true? What have you found that you've never been able to prove?" he said.
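
As a rough illustration of the combination Lucker describes, the Python sketch below joins three small sources on a shared customer ID and derives a field that exists in none of them. Everything in it is an assumption made for illustration: the field names (annual_spend, owns_home, social_sentiment), the sample values, and the at_risk_value formula are hypothetical and do not come from Lucker or Deloitte.

```python
from dataclasses import dataclass

# Hypothetical sample data, keyed by customer ID (illustration only).
internal = {"c1": {"annual_spend": 1200.0}, "c2": {"annual_spend": 300.0}}
external = {"c1": {"owns_home": True}, "c2": {"owns_home": False}}
non_traditional = {"c1": {"social_sentiment": -0.5},
                   "c2": {"social_sentiment": 0.75}}


@dataclass
class SyntheticRecord:
    customer_id: str
    annual_spend: float
    owns_home: bool
    social_sentiment: float
    at_risk_value: float  # derived field, present in none of the sources


def build_synthetic(internal, external, non_traditional):
    """Combine the three sources into records carrying a derived field."""
    records = []
    # Keep only customers that appear in all three sources.
    shared_ids = internal.keys() & external.keys() & non_traditional.keys()
    for cid in sorted(shared_ids):
        spend = internal[cid]["annual_spend"]
        sentiment = non_traditional[cid]["social_sentiment"]
        # Assumed formula: spend scaled by how negative the sentiment is;
        # zero when sentiment is neutral or positive.
        at_risk = spend * max(0.0, -sentiment)
        records.append(SyntheticRecord(
            customer_id=cid,
            annual_spend=spend,
            owns_home=external[cid]["owns_home"],
            social_sentiment=sentiment,
            at_risk_value=at_risk,
        ))
    return records


if __name__ == "__main__":
    for record in build_synthetic(internal, external, non_traditional):
        print(record)
```

The derived at_risk_value column is the "new form or view" in this toy case: a number that can be checked against a hunch such as "our unhappiest customers are also our biggest spenders."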

Banking and insurance firms are among the pioneers in putting synthetic data to use, Lucker said. Non-traditional data is also proving valuable in the retail industry as an avenue for better understanding consumers and discovering new ways to appeal to them.

Retailers no longer have to rely on their own call center data alone; they can turn to social networks and instant messaging, for example, to gain insight into consumer sentiment. This can be particularly powerful in reversing the course of negative sentiment, Lucker said.

CIOs should not be preoccupied with perfecting their data before making it available for advanced analytics, Lucker advised. "You're never going to get your data perfect. Perfectly clean and perfectly organized data is almost a holy grail. Recognize when something is good enough," he said.

"Sometimes inelegant is more elegant. There are statistical methods to use data that is somewhat dirty and still get incredibly powerful insights." - Caron
