How does technology address the challenge of dark data? How can it improve visibility for enterprises?
Dark data is unstructured, untagged, and untapped data that sits in data repositories without having been analyzed or processed. It is similar to big data, but differs in that business and IT administrators have largely given up on extracting value from it.
Dark data is also known as dusty data.
It is found in log files and data archives held in large enterprise-class data storage locations.
It includes all data objects and types that have yet to be analyzed for business or competitive intelligence or used to aid business decision-making.
Dark data is complex to analyze and is often stored in locations where analysis is difficult, so the overall process can be costly.
It can also include data objects that the enterprise has not yet captured, or data external to the organization, such as data stored by partners or customers.
Business Impact of Dark Data
Traditionally, enterprises analyzed transactional business data to make business decisions. Today, differentiated customer experiences and new business models are possible by analyzing unstructured human and machine data related to interactions, sentiments, online behavior, preferences, frequently visited locations, and so on. For example, sentiment analysis has helped enterprises shape better product and marketing strategies.
Much greater business benefits become available when enterprises dynamically blend their human and machine data with business data to obtain a 360-degree view of customers. This helps them know their customers better, create better offers, and ultimately win more business with higher customer satisfaction. In the healthcare industry, an initiative called Patient360 gives doctors a complete, unified view of test images, medical reports, patient profiles, prescriptions, and more, which helps them make diagnoses that are both accurate and quick, significantly improving patient satisfaction. Building on such initiatives, hospitals are launching new patient services to grow their business further.
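The blending step can be pictured as grouping records from different sources under a shared customer key. The sketch below is a minimal illustration that assumes every record carries a `customer_id` field; the field names, sample records, and the `build_customer_view` function are hypothetical, and a real 360-degree view would also need identity resolution, deduplication, and access controls.

```python
from collections import defaultdict


def build_customer_view(transactions, interactions):
    """Group business (transactional) records and human/machine (interaction)
    records under one profile per customer ID. Assumes a shared 'customer_id' key."""
    view = defaultdict(lambda: {"transactions": [], "interactions": []})
    for record in transactions:
        view[record["customer_id"]]["transactions"].append(record)
    for record in interactions:
        view[record["customer_id"]]["interactions"].append(record)
    return dict(view)


# Hypothetical sample data for illustration only.
transactions = [
    {"customer_id": "C1", "product": "credit card", "amount": 120.0},
    {"customer_id": "C2", "product": "loan", "amount": 5000.0},
]
interactions = [
    {"customer_id": "C1", "channel": "email", "sentiment": "negative"},
    {"customer_id": "C1", "channel": "web", "page": "offers"},
]

view = build_customer_view(transactions, interactions)
print(len(view["C1"]["interactions"]))  # 2
print(view["C2"]["interactions"])       # []
```

Keeping each source's records separate inside the profile, rather than flattening them, preserves where each fact came from.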
The diverse mix of content from disparate sources, such as audio, video, PDFs, social feeds, IVRs, and emails, needs to be curated in a secure repository that multiple users, applications, and workloads can access, on premises or in the cloud. Curation improves data quality, which is essential for proper analysis; poor data quality has been one of the main reasons many enterprises have not analyzed their unstructured data.
In the past, banks created their customers' profiles by looking at all business transactions across their product lines and delivery channels. Today, banks are embarking on a journey in which customer profiles are built not only from the business customers do with the bank, but also from their daily interactions, sentiments, preferences, online behavior, and so on.
By bringing structure to data, this new approach to analyzing and storing relevant data yields competitive differentiation, increased customer loyalty, and valuable business insights that were previously hidden in the pools of dark data residing in the system. Ultimately, it helps banks make more informed decisions in areas such as customer retention and offers.
Data extraction powered by AI technologies
The first key to success is transforming this dark data into structured data, like that in a database or spreadsheet. This extraction and classification process uses natural language processing, ontology detection, and other AI techniques to “light up” data by converting unstructured data into structured data. Once information is structured, enterprises can make faster decisions, derive smarter insights, and drive better business outcomes.
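To make "unstructured in, structured out" concrete, here is a deliberately simple sketch. It swaps in regular expressions as a stand-in for the NLP and ontology-detection techniques described above, so it only recognizes a few hypothetical field formats (ISO dates, email addresses, dollar amounts); the `PATTERNS` table and `extract_fields` function are illustrative assumptions, not a production extractor.

```python
import re

# Hypothetical field patterns; a real pipeline would use NLP models and ontologies
# instead of hand-written regular expressions.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "amount": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?"),
}


def extract_fields(text):
    """Turn one unstructured text snippet into a structured record
    mapping each field name to the list of values found."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


note = "On 2023-04-12 jane.doe@example.com disputed a charge of $1,250.00."
record = extract_fields(note)
print(record)
# {'email': ['jane.doe@example.com'], 'date': ['2023-04-12'], 'amount': ['$1,250.00']}
```

Each output record has the same keys regardless of input, which is what lets it be loaded into a table or spreadsheet alongside other records.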
Traceability drives governance, the key to enterprise applicability
In a business environment and in mission-critical situations, artificial intelligence is useful only when its decisions can be traced back to the underlying drivers, and this traceability is critical to effective governance. The human workforce, having evolved over centuries, already has many governance mechanisms built in: for instance, it is easy to notice if a hundred colleagues do not show up for work on a given day.
In the world of automated software robots, however, it may not be as simple to detect when robots fail to show up for work. If someone changes an application's password, it may be days before anyone notices that the hundred robots handling an entire continent's data have stopped processing it. An integrated command and control center that can effectively manage errant robots or biased machine learning algorithms, and that makes AI decisions traceable, is the third catalyst for lighting up dark data.
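One common way a command and control center can notice robots that "do not show up for work" is a heartbeat check: each robot periodically reports in, and any robot that has been silent longer than a threshold is flagged. The sketch below assumes that pattern; the robot IDs, the one-hour threshold, and the `find_silent_robots` function are hypothetical.

```python
from datetime import datetime, timedelta


def find_silent_robots(expected_robots, last_heartbeat, now, max_silence):
    """Return, in sorted order, the IDs of expected robots that either never
    reported a heartbeat or whose last heartbeat is older than max_silence."""
    silent = []
    for robot_id in sorted(expected_robots):
        seen_at = last_heartbeat.get(robot_id)
        if seen_at is None or now - seen_at > max_silence:
            silent.append(robot_id)
    return silent


# Hypothetical fleet state for illustration only.
now = datetime(2024, 1, 10, 12, 0)
expected = {"emea-bot-01", "apac-bot-07", "amer-bot-03", "amer-bot-09"}
last_heartbeat = {
    "emea-bot-01": now - timedelta(minutes=5),
    "apac-bot-07": now - timedelta(days=2),   # e.g. stalled after a password change
    "amer-bot-03": now - timedelta(minutes=20),
}

print(find_silent_robots(expected, last_heartbeat, now, timedelta(hours=1)))
# ['amer-bot-09', 'apac-bot-07']
```

Checking against the list of expected robots, not just those that have reported, catches robots that never started at all.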
Lighting up data transforms business outcomes
To understand what becomes possible when enterprises light up dark data, consider a life sciences company's pharmacovigilance operations, which oversee adverse event monitoring. The Food and Drug Administration (FDA) defines an adverse event as any undesirable experience associated with the use of a medical product in a patient, and the FDA has related regulations to ensure that companies develop drugs safely. Every time a patient complains of an adverse event, their doctor must report it; however, that information is embedded in doctors' notes, voice mails, or emails, and often has to be interpreted with deep contextual knowledge of medicine. Pharmacovigilance is therefore a challenging, complex, and resource-intensive affair and, at the same time, a core, critical, life-saving activity.
Digital technologies such as computer vision, computational linguistics, feature engineering, text classification, machine learning, and predictive modeling can help automate this process. Working together, these digital technologies enable pharmaceutical and life sciences companies to move from simply tracking issues to predicting and solving potential problems with less human error. Interoperable digital technologies with a reliable built-in governance model drive higher drug quality, better patient outcomes, and easier regulatory compliance.
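As a toy illustration of automated triage that stays traceable, the sketch below flags free-text notes that may describe an adverse event and records which terms drove each decision. It substitutes simple keyword matching for the text classification and machine learning models mentioned above; the term list, the `Triage` record, and `triage_note` are hypothetical and are not a medical vocabulary or a validated method.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, tiny term list for illustration only; not a medical vocabulary.
ADVERSE_TERMS = ("rash", "nausea", "dizziness", "swelling", "shortness of breath")


@dataclass
class Triage:
    flagged: bool
    evidence: list = field(default_factory=list)  # terms that drove the decision


def triage_note(note):
    """Flag a note as a possible adverse event and keep the matched terms
    so a reviewer can trace why it was flagged."""
    text = note.lower()
    evidence = [
        term for term in ADVERSE_TERMS
        if re.search(r"\b" + re.escape(term) + r"\b", text)
    ]
    return Triage(flagged=bool(evidence), evidence=evidence)


print(triage_note("Patient reports a rash and mild dizziness since starting the drug."))
# Triage(flagged=True, evidence=['rash', 'dizziness'])
```

Keyword matching ignores negation ("no rash" would still be flagged), which is one reason real systems rely on trained language models; the evidence list, however, shows how each decision can be traced back to its drivers.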
The opportunity ahead for leaders in AI deployment
Forward-thinking companies already bank on artificial intelligence and other digital technologies to solve business problems and transform customer value. But those that get it right have three things in common: they use AI to unlock unstructured data, they have modular and interoperable digital technologies, and they build traceability into their core design principles.
How do organizations wind up in a dark data abyss?
Lack of awareness:
In many cases, members of the organization are simply not aware that this data exists. In the case of a bank, for example, an underwriting team may see the information a customer provides on an online credit card application form and collect it as valuable data. But they may not know that data on the customer's journey to that point (how they ultimately arrived at the application page) is available as well. Whether because of poor communication or a lack of training, dark data issues arise when people simply do not know that the data is available.
Disconnect among teams:
Within larger organizations, data is often collected separately by different business units or teams, and once collected, it is often owned and managed by separate teams. There is usually no natural mechanism for sharing data between teams, so it becomes an uphill task for one team to obtain and understand another team's data. A pool of data that is of no use to one team may be of great value to another, but the necessary sharing simply does not happen.