Data Lake Reshapes the Way CMS Uses Security Data

Published Tuesday, April 11, 2023

Graphic of woman in boat on lake catching a giant fish with bits of data coming out of its mouth

Access to security data changed for the better in November 2021 with the launch of the Security Data Lake (SDL) central repository. Today, the cloud-based tool is home to many vital datasets. It serves as a single source for security-related data that stakeholders can access for their use cases.

We asked Amine Raounak, Security Data Lake Lead, ISPG, to discuss how the SDL is changing the way stakeholders query and use security-related data, and where the tool stands in its development. Here is what he had to say:

Q: What is Security Data Lake and why is it such an important initiative for CMS?

A: Security Data Lake is a central repository of data pertaining to the security space, where a stakeholder can come to a single place and query all kinds of security data that is relevant to their use cases. It exemplifies how OIT is fulfilling a number of CMS and OIT priorities, including improving access to security data, engaging partners, driving innovation, protecting programs and creating a frictionless customer experience.

Q: What does Security Data Lake allow users to do that they couldn’t do before?

A: One of the benefits of Security Data Lake is having a central place where security data can be queried. Because of the number of security tools that OIT has, we often see some level of “tools fatigue” when users try to capture data from each respective tool, correlate it and make sense of it.

Security Data Lake gives us a single interface through Snowflake and it allows us to clean the data and interpret it in a unified way. Prior to Security Data Lake, if you had two individuals who needed to query data, they would go to the tools they know, not necessarily to where data is available to them. In the past, this yielded a divergence of security data interpretations.
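To illustrate the idea of cleaning data from several tools and querying it in one place, here is a minimal sketch. It is not the SDL implementation: it uses Python's built-in sqlite3 module as a stand-in for Snowflake, and the tool exports, field names, table names and values are all hypothetical.

```python
import sqlite3

# Hypothetical exports from two separate security tools.
# Field names and values are invented for illustration only.
scanner_rows = [
    {"host": "APP-01", "finding_id": "EXAMPLE-1", "sev": "critical"},
    {"host": "db-02", "finding_id": "EXAMPLE-2", "sev": "medium"},
]
inventory_rows = [
    {"hostname": "app-01", "owner": "team-a"},
    {"hostname": "db-02", "owner": "team-b"},
]


def load(conn):
    """Clean both exports into one shared schema (lowercased host names)."""
    conn.execute("CREATE TABLE findings (host TEXT, finding_id TEXT, severity TEXT)")
    conn.execute("CREATE TABLE assets (host TEXT, owner TEXT)")
    conn.executemany(
        "INSERT INTO findings VALUES (?, ?, ?)",
        [(r["host"].lower(), r["finding_id"], r["sev"]) for r in scanner_rows],
    )
    conn.executemany(
        "INSERT INTO assets VALUES (?, ?)",
        [(r["hostname"].lower(), r["owner"]) for r in inventory_rows],
    )


def critical_findings_by_owner(conn):
    """Correlate the two datasets with a single query."""
    return conn.execute(
        "SELECT a.owner, f.host, f.finding_id "
        "FROM findings f JOIN assets a ON a.host = f.host "
        "WHERE f.severity = 'critical' "
        "ORDER BY a.owner"
    ).fetchall()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for a central repository
load(conn)
print(critical_findings_by_owner(conn))
```

Because both datasets are normalized once on the way in, every user who runs the query sees the same interpretation of a host name, rather than each person reconciling the two tools by hand.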

Q: So Security Data Lake allows for clearer and more unified data insights, correct?

A: Yes. Some of the opportunities here include reducing the number of hours spent on extract, transform, and load (ETL) engineering. If Security Data Lake didn’t exist, any time a stakeholder needed security data, they would have to create their own data pathways. Another benefit is comprehensive visibility. Not everyone knows how many security tools we have. Having all the security tool datasets flowing into the Data Lake repository adds a layer of transparency.

A third benefit is consistency in security posture. As stakeholders collaborate on how the data in the Security Data Lake should be interpreted, that shared knowledge is fed back into the Data Lake, and it will help the decision-making process on future use cases.

Q: What makes Security Data Lake unique?

A: The way we look at security datasets is this: It’s not a product-, a company-, or a tools-first effort. It is a data-centric effort. Data is the first-class citizen here. We want to start making more data-based decisions rather than having our ideas confined to a given tool.

Q: How has SDL stimulated cultural change?

A: Because SDL brings together stakeholders from multiple facets of the organization, it is also an enabler in terms of changing our culture. It forces people to have conversations about what a security dataset means rather than making decisions that don’t always have CMS’s best interests in mind. Because of the way a tool is built, it typically only fulfills a specific duty. It doesn’t necessarily yield information that translates into meeting CMS’s needs. The SDL forces teams to talk about datasets, which encourages more collaboration in the security space.

Q: Is Security Data Lake a revolutionary or evolutionary tool?

A: It is a little bit of both, but I do like the revolutionary aspects of it just because we don’t have only one security team. We have many security teams. Sometimes, just by the mere fact that we are all human, we have different opinions. Sometimes this difference is beneficial because it brings more perspective and then at other times it creates chaos. When we have too many opinions on the table, it blocks us from moving forward. Security Data Lake is a medium that brings people together.

Q: If users wanted to get data in any of these asset classes prior to SDL, did they have to go to individual systems to secure the data?

A: The first thing users had to do was ask who they should reach out to. Today, they come to the Security Data Lake team, which serves as the first point of engagement. Typically, CMS is a very federated organization. If I come to you and ask, “Where can I find X, Y and Z data?” you will give me your opinion of where the data resides. It may or may not be correct.

Q: How quick is the turnaround for accessing data?

A: Access typically takes three days. But there is a caveat regarding the discovery process. When someone comes to us and asks us about access, the first question we ask is, “What kind of data are you looking for?” We go into a discovery engagement with them and try to understand their use case so we can come up with a list of datasets that best meets their needs.

Q: How is Security Data Lake evolving?

A: It is fully implemented, but at the same time Security Data Lake is an organic ecosystem and we are always adding datasets as we go. There are some datasets that are being implemented as we speak, and that’s not going to end.

If you have a question or want more information on SDL, go to the Slack #security-datalake channel.