The More You Know: How OIT is Unlocking Data with the Knowledge Management Platform
If you work at CMS long enough – and by “long enough,” we mean about 15 minutes – you are bound to hear the word “federated.” Rather than having centralized IT, a federated organization like CMS is composed of units with high levels of autonomy.
Knowledge management in this sort of environment is especially challenging. One of the term’s early popularizers, the academic Tom Davenport, defined knowledge management as “the process of capturing, distributing, and effectively using knowledge.”
An understanding of how to find, retrieve, and share information at CMS often resides, as Chief Product Officer Rick Lee puts it, “in the brains of people who have been here a long time.”
He adds, “CMS is made up of very specific business units with domain-specific areas of expertise. They don’t cross over easily.”
One approach to extracting the knowledge stored in people’s heads is to ask them for it in the form of data calls or surveys. In fact, every year the Division of Enterprise Architecture conducts a large survey called the System Census for exactly this reason.
The annual System Census provides a snapshot of CMS systems, including the technology they use. However, this yearly exercise leaves a gap in understanding the evolving technological landscape at CMS. Accurate, up-to-date insights could empower CMS to identify knowledgeable partners in specific technological domains and highlight systems potentially at risk due to outdated or vulnerable technology.
That led Chief Technology Architect Andrés Colón Pérez to pose the intriguing question: “What if, instead of relying on institutional memory and annual self-reporting, we tap into new data sources, like a system’s code and relevant documentation, to extract information that bridges the gaps in the annual System Census?”
Piecing the Puzzle Together
By early 2023 the Knowledge Management Platform (KMP) had explored this possibility. The KMP is an Artificial Intelligence (AI)-driven technology that ingests information from different types of CMS repositories and organizes that information so it can be useful for business owners and data scientists.
The KMP identified five enterprise code repository platforms in use across CMS. It then analyzed 4,215 code repositories, associated them with FISMA system identifiers, and examined the code to provide new insights into the actual technical composition of many CMS systems.
Based on KMP data, the current System Census, as Colón Pérez noted, often reveals just the tip of the technological iceberg, with teams reporting a mere 20 percent or less of their technology use.
An unexpected takeaway from KMP's data is that development teams linked to more than 41% of FISMA systems are embracing security scanning via the new CMS Enterprise Snyk. This discovery demonstrates the impressive strides in developer adoption of this enterprise security initiative while also amplifying the extensive potential of KMP data to reveal valuable insights that can inform technological change at CMS.
A Digital Definition
For the KMP to be successful, the team realized they had to build a “digital definition of CMS.” This meant creating taxonomies – or vocabularies – about different parts of the agency and then defining the relationships between those vocabularies, which are called ontologies.
Taxonomies and ontologies are crucial for building the graph database where enterprise data is stored. Unlike a traditional database, which stores information in tables, a graph organizes information into “nodes” connected by “edges” that represent relationships. Social networks provided the original use case for graph databases, but because relationships supply their organizational logic, graph databases are especially well suited to knowledge management.
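To make that concrete, the sketch below shows how a handful of systems and technologies might be represented as nodes and edges. The node names, relationship labels, and the use of the networkx library are illustrative assumptions for this example, not the KMP’s actual data model or tooling.

```python
# Illustrative sketch only: the node names, relationship labels, and choice of
# networkx are assumptions for this example, not the KMP's actual data model.
import networkx as nx

graph = nx.MultiDiGraph()

# Nodes can represent different kinds of things: systems, technologies, tools.
graph.add_node("Example System", kind="fisma_system")
graph.add_node("Java", kind="technology")
graph.add_node("Snyk", kind="security_tool")

# Edges capture the relationships between them.
graph.add_edge("Example System", "Java", relation="uses_technology")
graph.add_edge("Example System", "Snyk", relation="scanned_by")

# Traversing relationships answers questions like "which systems use Java?"
java_users = [src for src, dst, data in graph.edges(data=True)
              if dst == "Java" and data["relation"] == "uses_technology"]
print(java_users)  # ['Example System']
```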
Once the database is built, AI algorithms can “curate” the data so it is useful for business owners and data analysts. Data Scientist Xingjia Wu explains, “If we want to have machines help us to understand human knowledge, that comes from the data we generate.”
The Art of Knowledge Management
Think of an art curator who must organize a collection of works into discrete but connected exhibits. As you walk through the museum, you are following some sort of overarching order. Perhaps each wing of the building represents a time period.
Moreover, every separate room is held together by its own internal logic. That is, the works in a room share one or more qualities. Perhaps they were created by the same artist, or by a group of artists from the same community. Perhaps they share a medium or a theme or a style.
In an IT setting, these qualities would be called metadata: information that describes and, ultimately, gives context to data. Metadata can include source, type, creation date, owner, and, most importantly for knowledge management, relationships to other data sets.
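As a simple illustration, a metadata record for a single document might look something like the following. The field names and values here are hypothetical, not the KMP’s actual schema.

```python
# Hypothetical metadata record for one document; the field names and values
# are illustrative, not the KMP's actual schema.
sorn_metadata = {
    "source": "Federal Register",
    "type": "System of Records Notice (SORN)",
    "created": "2023-01-15",
    "owner": "Example CMS component",
    # Relationships to other data sets are what make metadata most useful
    # for knowledge management.
    "related_to": ["Privacy Impact Assessment", "CFACTS security record"],
}
```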
By creating metadata about the agency’s data, the KMP shows how systems interact with each other to complete essential business processes.
“The solution to our knowledge management challenges,” summarizes Lee, “is the curation of our data.”
From Unstructured Data to Structured Data: A Case Study
Members of the KMP team state that the platform can shorten a process that took months down to a keystroke.
This is so, explains Data Scientist Xingjia Wu, because it turns unstructured data into structured data.
Unstructured data is information that is not organized in a pre-defined manner like a database. Examples include architectural drawings, videos, and text documents such as system of records notices (SORNs).
This type of data can be both important and useful, but it resists analysis. Manual inspection is prohibitively time-consuming, and without some sort of labeling system, it’s hard for machines to make sense of the data.
“The KMP takes advantage of AI technology to turn unstructured data such as text documents, video, audio, and images into structured data that we can combine to analyze, visualize, and inform our decisions,” says Wu.
“It extracts information from raw data and links it together. Now you can read thousands of documents at the same time and identify patterns.”
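As a rough illustration of what turning unstructured text into structured data can look like, the sketch below pulls a few fields out of a made-up document with simple pattern matching. The document text, patterns, and field names are invented for this example; the KMP itself relies on AI techniques rather than hand-written patterns.

```python
# Rough illustration of extracting structured fields from unstructured text.
# The document text, patterns, and field names are invented for this example;
# the KMP relies on AI techniques rather than hand-written patterns like these.
import re

raw_text = """System of Records Notice 00-00-1234 covers example records.
The system owner is the Example Business Center, and data is shared with
partner agencies under routine use (3)."""

record = {"sorn_id": None, "owner": None, "routine_uses": []}

id_match = re.search(r"Notice\s+([\d-]+)", raw_text)
if id_match:
    record["sorn_id"] = id_match.group(1)

owner_match = re.search(r"system owner is the ([^,]+)", raw_text)
if owner_match:
    record["owner"] = owner_match.group(1).strip()

record["routine_uses"] = re.findall(r"routine use \((\d+)\)", raw_text)

print(record)
# {'sorn_id': '00-00-1234', 'owner': 'Example Business Center', 'routine_uses': ['3']}
```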
A system dashboard helps users connect information that is stored in different repositories across CMS components and public resources. These include GitHub, Jira tickets, Tech Topics presentations, the System Census, SharePoint, and the Federal Register.
So, for example, if you’d like to learn which CMS systems are allowed to collect different types of PII, you can find answers on a KMP dashboard, along with links to original source documents. In fact, the KMP team built just such a tool for ISPG.
“KMP ingested information from the CMS FISMA Controls Tracking System (CFACTS) and the Federal Register to build dashboards for the ISPG Privacy team to quickly review routine uses in SORNs, Privacy Impact Assessments that pull on specific SORNs, and the information systems that the data is tied to,” explains Leslie Nettles, Acting Senior Official for Privacy.
Planned updates will incorporate data from Information Exchange agreements.
“This has helped the Privacy Team immensely,” says Nettles. “We have been able to review information quickly, make decisions on the need for agreements, and tie data to the systems that are sharing the information.”