Scanning tools are ubiquitous in the security industry. They can speed up manual workflows, provide security intelligence to supplement operations, and be integrated quickly into a product development pipeline. The problem with security scanners is that they are not one-size-fits-all solutions, and they are frequently misconfigured. Often, the time spent sifting through noisy scan results for meaningful findings would be better spent manually assessing the product.
The relationship between automated and manual workflows has historically been a tradeoff between breadth (quantity) and depth (quality). However, even in the most automated industries, some roles still benefit from the human element, whether the work involves complicated physical manipulation or reasoning that would be too costly, or too short-lived, to codify. Industries that adopt a semi-automated approach blend the best of automation with a human perspective.
Industries have increasingly adopted semi-automation to bring the human perspective into the loop of AI-generated workflows. For example, in computer graphics, automated programs query the artist about color swatches derived from photos or about procedurally generated buildings. Instead of spending time generating swatches or permutations of buildings by hand, the artist can now steer the automated process toward better results, providing feedback that moves the AI along the desired path.
In the field of modeling and simulation, this is called a human-in-the-loop (HITL) workflow. Applied to security scanning, soliciting feedback from the user during a scan lets the scanner stop pursuing fruitless paths and discard irrelevant results as it goes. That increases scan accuracy while reducing overall testing time.
A well-known problem with public code-sharing platforms such as GitHub is that employees mistakenly check in intellectual property or private corporate data (e.g., tokens, passwords, internal assets) to repositories on their personal accounts. Tracking down all of these leaked secrets for a target organization is a significant challenge. With GitHub's standard search, users can search for one type of secret and find instances of it across every user on the platform, but narrowing the results to the personal accounts of a specific organization's members is hard. Even advanced searches struggle here because of the poor signal-to-noise ratio, which fatigues auditors who must trudge through each result to identify meaningful data.
As my first research project on HITL tooling, I created GitGot. GitGot is a semi-automated, feedback-driven tool that rapidly searches troves of public GitHub data for sensitive secrets. It solicits feedback from the user about search results so it can continuously prune the result set while the query runs. Users can choose to disregard files from a particular user or repo, files with specific filenames, or files whose content is similar to content they have already blacklisted (determined through fuzzy matching, which is explained further in the GitGot release announcement).
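To make the feedback loop concrete, here is a minimal Python sketch of the pruning behavior described above. It is not GitGot's actual code: the names `SearchResult`, `FeedbackFilter`, and `review`, the action strings, and the 0.9 similarity threshold are all hypothetical, and `difflib.SequenceMatcher` from the standard library is swapped in as a stand-in for whatever fuzzy-matching scheme the real tool uses.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Callable, List, Set


@dataclass
class SearchResult:
    # Hypothetical shape of one code-search hit.
    user: str
    repo: str
    filename: str
    content: str


@dataclass
class FeedbackFilter:
    """Accumulates reviewer feedback and prunes later results (hypothetical)."""
    similarity_threshold: float = 0.9  # assumed cutoff for "similar" content
    users: Set[str] = field(default_factory=set)
    repos: Set[str] = field(default_factory=set)
    filenames: Set[str] = field(default_factory=set)
    contents: List[str] = field(default_factory=list)

    def is_similar(self, content: str) -> bool:
        # Fuzzy-match against every blacklisted file body (difflib stand-in).
        return any(
            SequenceMatcher(None, content, bad).ratio() >= self.similarity_threshold
            for bad in self.contents
        )

    def should_skip(self, r: SearchResult) -> bool:
        # Cheap exact checks first; the fuzzy comparison runs last.
        return (
            r.user in self.users
            or r.repo in self.repos
            or r.filename in self.filenames
            or self.is_similar(r.content)
        )

    def apply(self, r: SearchResult, action: str) -> None:
        # Record the reviewer's decision so future results are pruned.
        if action == "user":
            self.users.add(r.user)
        elif action == "repo":
            self.repos.add(r.repo)
        elif action == "file":
            self.filenames.add(r.filename)
        elif action == "content":
            self.contents.append(r.content)


def review(
    results: List[SearchResult],
    flt: FeedbackFilter,
    decide: Callable[[SearchResult], str],
) -> List[SearchResult]:
    """Walk results, silently skipping pruned ones and asking `decide` about the rest."""
    kept = []
    for r in results:
        if flt.should_skip(r):
            continue
        action = decide(r)  # "keep", "user", "repo", "file", or "content"
        if action == "keep":
            kept.append(r)
        else:
            flt.apply(r, action)
    return kept


# Demo: scripted decisions stand in for an interactive prompt.
results = [
    SearchResult("alice", "alice/dotfiles", ".env", "API_KEY=abc123"),
    SearchResult("bob", "bob/demo", "README.md", "API_KEY=your-key-here"),
    SearchResult("carol", "carol/app", "settings.py", "API_KEY=your-key-here!"),
    SearchResult("dave", "dave/fixtures", "fixtures.json", "token=fake"),
    SearchResult("dave", "dave/other", "secrets.txt", "password=hunter2"),
]
asked = []


def scripted_decide(r: SearchResult) -> str:
    asked.append(r.user)
    return {"alice": "keep", "bob": "content", "dave": "user"}[r.user]


kept = review(results, FeedbackFilter(), scripted_decide)
print([r.repo for r in kept])
```

In the demo, the reviewer is asked about only three of the five hits: blacklisting Bob's placeholder README content prunes Carol's near-identical file, and disregarding user `dave` prunes his second repository. In an interactive tool, `decide` would prompt the auditor instead of consulting a dictionary.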
Our continuous assessment team has used this tool for the last three months, dramatically improving their workflow for gathering leaked secrets for our clients. With it, we've identified leaks from client employees and even in the repositories of our clients' customers. A process that previously took five hours and produced limited findings now yields more positive identifications within half an hour. With this design, I built a tool that leverages the skills and experience of our consultants and lets them work more efficiently, without the fatigue of traditional scanner output.
I’m looking forward to sharing more HITL-based tools, as we continue to find success with them in our service lines at Bishop Fox.