Top Five Data Priorities for CIOs
by Tammy Bilitzky
The CIO, an integral member of the senior leadership team, provides expert, strategic guidance and leverages technology to drive exponential business growth across all sectors. Promises of new technologies and capabilities are unleashed daily, external and insider threats have reached epic levels, security regulations are increasingly restrictive, and wading through “smoke and mirrors” solutions to separate reality from fiction is a daily challenge. The CIO must understand the mission and technologies that promote corporate growth and distinguish mission-critical technologies from those that divert valuable time and effort.
While the execution methods vary by company and industry, the following five data priorities are overarching, industry-agnostic activities to exploit data in our data-driven culture, and they will likely hit the top of virtually every CIO’s to-do list in 2019.
MOBILIZE YOUR DATA AND CONTENT
The efficacy of business decisions directly correlates to the data and content used to formulate them. Your data strategy has the power to eradicate the silos prevalent in most large organizations, build a comprehensive digital ecosystem, deliver the agility and collaboration essential to direct your business, and create symbiosis between digital processes and your teams.
Data and content are everywhere in many formats—paper, desktop publishing tools, databases, content repositories, websites, and more. Organizations often have overlapping and redundant content, making updates far more tedious and error-prone. Key business data must be identified, aggregated, and deduped in a well-planned, considered manner that preserves its source, integrity, and intended meaning.
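As an illustration of deduplication that preserves source and meaning, here is a minimal Python sketch. The record layout, the `normalize_key` matching rule, and the `MergedRecord` type are assumptions made for this example; real matching rules depend on your data.

```python
from dataclasses import dataclass, field


@dataclass
class MergedRecord:
    """One deduplicated record plus every source it was found in."""
    key: str
    value: dict
    sources: list = field(default_factory=list)


def normalize_key(name: str) -> str:
    # Hypothetical matching rule: ignore case and extra whitespace.
    return " ".join(name.lower().split())


def dedupe(records):
    """Collapse records that share a normalized name, keeping provenance."""
    merged = {}
    for rec in records:
        key = normalize_key(rec["name"])
        if key not in merged:
            # Keep the first-seen values unchanged so the intended meaning survives.
            merged[key] = MergedRecord(key=key, value=dict(rec["data"]))
        # Record where each copy came from instead of silently discarding it.
        merged[key].sources.append(rec["source"])
    return list(merged.values())


records = [
    {"name": "Acme Corp", "source": "crm", "data": {"city": "Boston"}},
    {"name": "acme  corp", "source": "website", "data": {"city": "Boston"}},
    {"name": "Globex", "source": "crm", "data": {"city": "Austin"}},
]
result = dedupe(records)
```

Keeping the list of sources on each merged record makes it possible to trace any value back to where it originated when an update or correction is needed.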
ENSURE DATA AND CONTENT ARE STRUCTURED AND AI-READY
Structured content and data enable downstream consumption by systems and end users. Successful businesses must structure information for modern technologies and platforms, particularly if they want to start applying AI to data. AI is a growing force as companies gain awareness of the treasure buried in their data and its potential to inform, streamline, and accelerate processes, dramatically reducing costs with unfailing veracity. And it all begins with accurate datasets that correlate to use cases. AI technologies all rely on high volumes of quality data that are essential to develop and train models, classify content, perform analytics, and derive relevant patterns and rules.
Vendors with access to large repositories of content can use that depth of information to improve pattern recognition and apply relevant learning. Labeled training sets are fundamental for supervised machine learning to effectively train models to correctly construe the intended meaning. Selecting vendors with deep datasets, and the expertise to tailor them to your specific needs, is a key to success in the burgeoning world of AI.
UNDERSTAND YOUR DATA AND CONTENT
Fake news, flawed polls, distorted metrics, and revisionist history are consequences of data corruption, which can be caused by malicious intent or by honest mistakes. Interpreting data correctly is a complex thought process that depends heavily on the person performing the task, including his or her cultural and ethnic background, predispositions, human interactions, and general abilities. While we as business leaders once relied on “cold, hard facts,” there is increasing cognizance that most “facts” are subject to interpretation.
And yet, as business leaders, or even as educated consumers, it is vital that we have access to actionable content and data that are consistent and reliable, cleansed from individual interpretation, and delivered quickly and in usable form. Traditional rules- and patterns-based methods perform well when there is a finite, consistent set of source content. It is relatively straightforward to analyze and define processing rules, and a reasonable number of iterations will yield accurate results.
However, as the appetite for data is exploding, we must extract it from content that is expressed in countless ways, with a vast number of variants. This compels the implementation of techniques that can “think”—i.e., duplicate human thought and interpretation and apply high volumes of validated, prior learnings against the current content. AI is an expanding set of technologies, statistical techniques, and algorithms that seek to do exactly that.
There are numerous subfields within AI, each with its own focus and specialization. It is essential to analyze both the content and requirements, consider the different AI options, and proceed to build the best solution. It is advisable to perform multiple proofs of concept to find the right combination and achieve optimum results. The following are some examples:
- Machine learning—Supervised machine learning, or guided learning, comprises statistical techniques expressed as a model and applied to other text. Large volumes of data are iteratively tagged and refined to train and retrain the model. Unsupervised machine learning describes sets of algorithms that operate on large datasets to derive meaning, without the need for labeled training data.
- Natural language processing—This subfield is used to derive semantic, syntactic, and contextual meaning out of content and perform sentiment analysis and entity extraction. A use that is becoming increasingly popular is creating knowledge graphs to represent corporate data as interlinked facts in unlimited combinations.
- Natural language understanding—This subfield, also known as machine reading comprehension, is taking natural language processing capabilities to new, unanticipated heights by applying the context contributed by recognition devices, such as bots, to post-process text and discern intent.
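To make the supervised machine learning item above concrete, here is a toy Python sketch of training on labeled text and applying the result to new text. The word-overlap scoring, the `train` and `predict` helpers, and the sample labels are all assumptions for illustration. Production systems use far larger labeled sets and proper statistical models.

```python
from collections import Counter, defaultdict


def train(labeled_examples):
    """Count how often each word appears under each label."""
    word_counts = defaultdict(Counter)
    for text, label in labeled_examples:
        word_counts[label].update(text.lower().split())
    return word_counts


def predict(model, text):
    """Pick the label whose training vocabulary overlaps the text most."""
    words = text.lower().split()
    scores = {
        label: sum(counts[word] for word in words)
        for label, counts in model.items()
    }
    return max(scores, key=scores.get)


# Hypothetical labeled training set: (text, label) pairs tagged by a person.
training_set = [
    ("invoice payment due", "finance"),
    ("quarterly revenue report", "finance"),
    ("employee onboarding checklist", "hr"),
    ("vacation policy for staff", "hr"),
]
model = train(training_set)
```

Calling `predict(model, "payment report")` scores the new text against each label's vocabulary. Retagging and extending `training_set`, then calling `train` again, mirrors the iterative refinement described above.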
AUTOMATE QA OF DATA AND CONTENT
With exponential increases in content and data across all industries, the accuracy of your data is crucial. In the past, companies had to surmount barriers associated with a paucity of actionable data. Today’s business leaders have increasingly high volumes of data but face the risk of applying incorrectly structured or interpreted content—a far more dangerous scenario. Inaccurate data distorts the information that senior management relies on to direct business activities and may lead to inappropriate decisions, potentially compromising the viability of the company.
It is imperative that organizations employ quality assurance (QA) automation iteratively throughout their extraction, structuring, and conversion processes to confirm that the data was transformed, enriched, and validated with a high level of precision. Rigorous QA automation typically adheres to a multi-tiered approach in which the data is validated against predefined criteria, cross-checked for inconsistencies within a dataset, internally compared to similar but disparate data sources within the organization, and, finally, externally matched against other licensed or public sources and authority files.
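The multi-tiered approach can be sketched in Python as a series of checks whose findings are collected into one report. The field names (`id`, `amount`), the specific rules, and the reference lookups are hypothetical. This is one possible arrangement of the tiers, not a prescribed implementation.

```python
def check_schema(record):
    """Tier 1: validate a record against predefined criteria."""
    errors = []
    if not record.get("id"):
        errors.append("missing id")
    if not isinstance(record.get("amount"), (int, float)):
        errors.append("amount is not numeric")
    return errors


def check_internal_consistency(dataset):
    """Tier 2: cross-check records within one dataset (here, duplicate ids)."""
    seen, errors = set(), []
    for record in dataset:
        record_id = record.get("id")
        if record_id in seen:
            errors.append(f"duplicate id {record_id}")
        seen.add(record_id)
    return errors


def check_against_reference(dataset, reference, name):
    """Tiers 3 and 4: compare amounts with another internal or external source."""
    errors = []
    for record in dataset:
        expected = reference.get(record.get("id"))
        if expected is not None and expected != record.get("amount"):
            errors.append(f"{name} mismatch for id {record.get('id')}")
    return errors


def run_qa(dataset, internal_ref, external_ref):
    """Run every tier and return the findings grouped by tier."""
    report = {"schema": []}
    for record in dataset:
        report["schema"].extend(check_schema(record))
    report["consistency"] = check_internal_consistency(dataset)
    report["internal"] = check_against_reference(dataset, internal_ref, "internal")
    report["external"] = check_against_reference(dataset, external_ref, "external")
    return report


dataset = [
    {"id": "A1", "amount": 100},
    {"id": "A1", "amount": 100},
    {"id": "B2", "amount": "n/a"},
]
report = run_qa(dataset, internal_ref={"A1": 100}, external_ref={"A1": 120})
```

Running the same `run_qa` step after extraction, after structuring, and after conversion is one way to apply the checks iteratively rather than only at the end.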
SECURE YOUR DATA AND CONTENT
Security is always a top priority. Recognizing the inherent value in data is the prerequisite to its protection. A data governance board should be established to oversee all data architecture, management, and security protocols as well as monitor compliance. Personnel roles must also be defined up front, with detailed policies limiting access. Access should be granted on a “need to have” basis that is continually reevaluated and refined to meet the highest standards. All identifying and potentially sensitive data must be anonymized to protect privacy, while preserving the accuracy of the datasets.
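One common way to mask identifying fields while keeping the rest of a dataset intact is keyed hashing, sketched below in Python. Note the assumptions: the field list and key are hypothetical, and keyed hashing is strictly pseudonymization, a weaker guarantee than full anonymization, so it may not satisfy every regulatory requirement on its own.

```python
import hashlib
import hmac

# Hypothetical key for illustration; in practice, load it from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"
# Assumed list of identifying fields for this example.
IDENTIFYING_FIELDS = {"name", "email"}


def pseudonymize(record, key=SECRET_KEY):
    """Replace identifying values with stable keyed-hash tokens."""
    masked = {}
    for field_name, value in record.items():
        if field_name in IDENTIFYING_FIELDS:
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            # The same input always yields the same token, so joins still work.
            masked[field_name] = digest.hexdigest()[:16]
        else:
            # Non-identifying values pass through unchanged to preserve accuracy.
            masked[field_name] = value
    return masked


record = {"name": "Jane Doe", "email": "jane@example.com", "order_total": 42}
masked = pseudonymize(record)
```

Because the tokens are stable, records for the same person can still be linked across datasets for analysis without exposing the underlying identity.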
But not all data requires the same level of protection. At project initiation, all of the data used in your solution should be properly classified to determine the required level of security, including the source, intermediate, and final output data. The security level applied must be compliant with all legal and regulatory requirements, corporate rules, and your contractual obligations. The following are basic security categories, but it should also be noted that there are government classifications, such as “secret” and “top secret,” that have their own regulations:
- Public—The information is not confidential and does not need to be secured from unauthorized access. It includes product brochures, publicly available corporate websites, and financial reports required by regulatory authorities.
- Proprietary—Similar to public information, this information is not confidential and does not need to be secured from unauthorized access. An example is a company’s standard operating procedures.
- Client confidential data—For confidential information received from clients in any form for processing, the highest possible levels of integrity, confidentiality, and restricted availability are vital. This includes client media and electronic transmissions from the client that have restricted or competitive information such as personally identifiable information, confidential business details, or trade secrets. Also included within this category is any data that is highlighted in a contract to require access control, even if it is not confidential data.
- Company confidential data—Confidential information collected and used by the company in the conduct of its business requires the highest possible levels of integrity, confidentiality, and restricted availability, such as salaries and personnel data, business plans, and confidential contracts. Security-related information (e.g., for internal use only) is considered a subset within company confidential data. Access to security-related information is limited to individuals who have a need to know it to perform their jobs.
All data must be properly labeled with the assigned security category, and all storage procedures, access control, handling, and transmission must correlate with the designated security level required.
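The link between a security label and its handling rules can be sketched in Python as a simple lookup. The four categories come from the list above. The specific controls (`encrypt_at_rest`, `restricted_access`) and the `handling_for` helper are assumptions for illustration, and actual controls must follow your legal, regulatory, and contractual requirements.

```python
from enum import Enum


class SecurityCategory(Enum):
    PUBLIC = "public"
    PROPRIETARY = "proprietary"
    CLIENT_CONFIDENTIAL = "client confidential"
    COMPANY_CONFIDENTIAL = "company confidential"


# Hypothetical handling rules keyed by category.
HANDLING = {
    SecurityCategory.PUBLIC: {"encrypt_at_rest": False, "restricted_access": False},
    SecurityCategory.PROPRIETARY: {"encrypt_at_rest": False, "restricted_access": False},
    SecurityCategory.CLIENT_CONFIDENTIAL: {"encrypt_at_rest": True, "restricted_access": True},
    SecurityCategory.COMPANY_CONFIDENTIAL: {"encrypt_at_rest": True, "restricted_access": True},
}


def handling_for(dataset_label):
    """Return the handling rules for a labeled dataset, rejecting unknown labels."""
    try:
        category = SecurityCategory(dataset_label)
    except ValueError:
        # Unlabeled or mislabeled data should stop processing, not default to public.
        raise ValueError(f"unknown security label: {dataset_label!r}")
    return HANDLING[category]


rules = handling_for("client confidential")
```

Failing on an unknown label, rather than falling back to the least restrictive category, keeps unclassified data from being handled as if it were public.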