Managing your true attack surface
In 2016, Uber paid a criminal group $100,000 to delete the private data of 57 million customers and drivers. Looking back on this attack, we can see how much things have changed for the infosec industry. The sum demanded by the attackers seems laughably quaint now that double-extortion ransoms are the norm. At the same time, the story also shows just how little has really changed: the attackers retrieved Uber's data using an access key they found on GitHub. Five years on, our industry has yet to get its head around the problem of secrets leakage. In fact, the problem of leaked credentials only seems to grow every year, with recent analysis showing that thousands of new, unique secrets are leaked daily.
This problem is hardly unknown. Dozens of scanning tools have been developed to catch and prevent the leakage of secrets in code repositories, but most lack the precision to avoid returning large numbers of false positives. And some secrets, such as multi-factor secrets, almost never get identified in the first place. How did the industry get into this situation, and what can be done to manage leaked credentials?
What does the new attack surface look like?
As organizations move away from physical data centers, their attack surfaces are no longer confined to a few network blocks. Instead, the modern attack surface is a sprawling list of cloud hosting providers, third parties, and content delivery networks, all of which are constantly in flux.
Crucially, information assets such as source code, configurations, and databases stored in buckets or code repositories are now also a near-universal feature of external attack surfaces. The accidental exposure of these information assets can be devastating, as they can often lead an attacker directly to an organization's critical assets and systems.
What are the consequences of the new attack surface?
Traditional network perimeter vulnerabilities from unpatched software will always be an attractive target, but from the perspective of a motivated attacker with the desire for undetected persistence, leaked secrets offer a potentially far more appealing prize.
The ability to masquerade as an authenticated, legitimate user within a corporate environment trumps the exploitation of perimeter vulnerabilities, which often creates noise and detectable anomalies. This is particularly important given that most significant intrusions require long-term, undetected access. Think, for example, of the SolarWinds attack, in which motivated adversaries remained undetected for months.
Of course, many leaked secrets don't provide the keys to the kingdom in quite this way, but that doesn't mean they are not dangerous. Documentation, source code, and notes often give attackers valuable insight into the target environment prior to any compromise. Attackers with detailed intelligence can reduce their footprint and enumeration requirements, shrinking the likelihood of detection.
Whether they provide the complete keys to your kingdom or simply insight into your environment, leaked secrets can help attackers remain undetected.
The race to find exposed credentials
Fortunately, many leading service providers are already partnering with code repository platforms to automatically detect and mitigate leaked credentials. For example, GitHub’s secret scanning service will identify published secrets and notify partners almost immediately.
Our own research has shown that secrets scanning platforms are doing an excellent job for their partners. Plaintext leaked secrets are revoked almost immediately, mitigating the threat of valid credentials being obtained for supported vendors. Given the extensive integration of these scanners with cloud service providers such as AWS and Microsoft Azure, it is now possible to mitigate many of the most catastrophic cloud estate compromises imaginable.
Defenders aren’t the only ones scanning code repositories, however. Much evidence suggests that many attackers have a similar setup to automate the process of monitoring commits, storing data, extracting credentials, and then attempting access using those leaked secrets.
Three years ago, researchers from Atlassian deployed honey tokens at scale on GitHub: 82% were abused by attackers within 30 minutes of deployment. Things have only got worse since then. We recently ran some experiments leaking SSH credentials and found that the fastest time-to-exploitation was 9 minutes (GitHub applies a 5-minute delay to public events, so only 4 minutes elapsed between the repository being published and the first access attempt). Typos and spelling errors in the executed commands suggest these attacks were more "hands-on-keyboard" than we would expect from a purely automated scan-and-exploit setup. Even after we removed the repository from GitHub, attacks using the leaked credentials continued.
Can the leaks be stopped?
While defensive scanning can be very effective in preventing the worst catastrophes, many of the secrets currently being leaked on repositories are not easily revocable from an external perspective. Think of things like SSH keys, internal API keys, and business documentation.
Some of these leaks can be prevented at source by integrating detection into the entire development lifecycle; pre-commit hooks and detection at publication time are particularly effective. Nonetheless, things will always slip through the net. What, then, can organizations do?
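As an illustration of the pre-commit approach, a hook can scan staged changes for secret-shaped strings before a commit ever reaches a remote. The sketch below is minimal and the patterns are examples only; a production deployment would lean on a maintained tool such as gitleaks.

```python
#!/usr/bin/env python3
"""Minimal pre-commit hook sketch: block commits that add secret-shaped
strings. Patterns are illustrative, not exhaustive. To install, save as
.git/hooks/pre-commit (executable) and end the file with sys.exit(main())."""
import re
import subprocess
import sys

# Example patterns for a few common secret formats.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "hardcoded password": re.compile(r"(?i)password\s*[:=]\s*['\"][^'\"]{6,}['\"]"),
}

def find_secrets(text):
    """Return (label, match) pairs for every secret-shaped string in text."""
    return [
        (label, m.group(0))
        for label, pattern in SECRET_PATTERNS.items()
        for m in pattern.finditer(text)
    ]

def staged_additions():
    """Collect the lines added in the staged diff (the '+' lines)."""
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    return "\n".join(
        line[1:] for line in diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    )

def main():
    hits = find_secrets(staged_additions())
    for label, value in hits:
        print(f"possible {label}: {value[:12]}...", file=sys.stderr)
    return 1 if hits else 0  # a nonzero exit code aborts the commit
```

The same `find_secrets` routine can be reused at publication time, scanning pushed commits server-side as a second line of defence.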
A highly effective approach is to incorporate the concept of a "hunt loop" into the identification phase of your external asset management process. When we map external assets for our clients, we build up data from many sources, establishing a fingerprint made up of the unique identifiers behind their assets, such as internal naming conventions drawn from leaked documentation. Every finding is recursively fed back into our hunts to refine our keywords and enrich the hunt dataset.
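The loop itself is simple to sketch. In the hypothetical Python below, `search` stands in for whatever source is being queried (code search, paste sites, bucket indexes), and the hostname pattern stands in for one fingerprint identifier; every result is mined for new identifiers, which are fed straight back into the queue.

```python
"""Sketch of a recursive 'hunt loop': findings feed new keywords back in.
The `search` callable and the hostname convention are assumptions for
the example, not a real client fingerprint."""
import re

# Hypothetical internal naming convention discovered from leaked docs.
INTERNAL_HOST = re.compile(r"\b[\w-]+\.corp\.example\.com\b")

def hunt(seed_keywords, search):
    """Query `search` with each keyword, harvest new identifiers from the
    results, and re-query until no new keywords appear (a fixpoint)."""
    keywords, findings = set(seed_keywords), []
    frontier = list(keywords)
    while frontier:
        keyword = frontier.pop()
        for doc in search(keyword):
            findings.append(doc)
            for host in INTERNAL_HOST.findall(doc):
                if host not in keywords:  # new identifier: feed it back in
                    keywords.add(host)
                    frontier.append(host)
    return keywords, findings
```

Because every discovered identifier becomes a new query, a single leaked document can transitively surface assets with no direct link to the original seed terms.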
This methodology enables us to identify multi-factor secrets that have been exposed through parallel leakage. Multi-factor secrets can be defined as those which "require additional pieces of information to be used" and include Google OAuth IDs and AWS Access Key IDs. Our experience hunting for these has shown that secret pairs (such as an encrypted vault and its passphrase) are very frequently leaked in two separate files alongside each other. Once we can show a link between a repository and an organization, we investigate other repositories published by the same developer. This frequently leads us to secrets, often published months or years earlier, that would have remained undetectable to an automated scanner.
These are the kinds of leaked credentials that automated services cannot easily detect, but which skilled and persistent attackers can find by scanning other repositories published by an organization's users.
Parallel leakage occurs when the parts that make up a whole secret get exposed in different places.
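To make parallel leakage concrete, the sketch below pairs one illustrative multi-factor secret, an AWS Access Key ID, with the 40-character shape of its companion secret key, and flags cases where the two halves sit in different files of the same repository. The patterns and file paths are assumptions for the example.

```python
"""Sketch: flag 'parallel leakage', where the parts of a multi-factor
secret are exposed in different files. The AWS patterns below are one
illustrative pairing; real hunting uses many such pairs."""
import re

KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
# 40 base64-ish characters: the shape of an AWS secret access key.
SECRET = re.compile(r"\b[A-Za-z0-9/+=]{40}\b")

def find_parallel_leaks(files):
    """files: {path: contents}. Return (key_id_path, secret_path) pairs
    where the two halves of a credential appear in different files."""
    with_ids = [p for p, text in files.items() if KEY_ID.search(text)]
    with_secrets = [p for p, text in files.items() if SECRET.search(text)]
    return [(i, s) for i in with_ids for s in with_secrets if i != s]
```

A single-file scanner that only flags the key ID, or only the secret, would report two low-severity findings at best; pairing them across files is what reveals a usable credential.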
What can you do now?
Here are some of the methods we’ve used to address leaks ourselves:
- Use searches such as “organization” password to identify secrets outside of your company repositories. Use internal hostnames and references in your search to identify long-standing leaks on code repositories that are not directly linked to your organization.
- Ensure leaks are fully remediated, no matter how long something has been public.
- Integrate tools like gitleaks or git-secrets into your CI/CD pipeline, or shift security left by scanning unstaged changes for secrets before they are committed.
- Assess the feasibility of integrating a pre-commit hook into base operating system builds.
- Have honest conversations with developers to understand the root cause of accidental leaks.
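The first bullet can be turned into a small, repeatable routine. This hypothetical sketch combines an organization name and internal identifiers with common secret-related terms to produce query strings for a code-search engine; all of the terms shown are placeholders to be replaced with your own fingerprint.

```python
"""Sketch: derive code-search queries from an organization's fingerprint.
The org name, hostnames, and secret-related words are example values."""

def build_queries(org, internal_terms):
    """Combine the org name with secret-related words, and add bare
    internal identifiers to catch repositories with no visible link
    to the organization."""
    secret_words = ["password", "secret", "api_key", "BEGIN PRIVATE KEY"]
    queries = [f'"{org}" {word}' for word in secret_words]
    queries += [f'"{term}"' for term in internal_terms]
    return queries
```

Running these queries on a schedule, and feeding any hits back into `internal_terms`, turns a one-off search into the hunt loop described above.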
The existence of multi-factor secrets that cross the personal and professional repositories of developers demonstrates how existing ways of thinking about external attack surfaces need to be refreshed. Repositories with no obvious connection to your organization may now be part of your attack surface, and as your operations team builds an attack surface management strategy, it will need to factor in this kind of asset. Otherwise, attackers will make the link for them.