Keeping score: how to get the most from CVSS
When do vulnerability scoring systems help your organization and when do they add unnecessary complexity and risk? After a question about the suitability of the Common Vulnerability Scoring System (CVSS) for one of our clients prompted wider discussion within the consulting team, we saw an opportunity: to explore the benefits, limitations, and nuances of CVSS and advise other organizations considering its adoption.
This article uses anonymous examples from client engagements to illustrate where we’ve seen CVSS succeed and fail. It also provides recommendations on how to implement it effectively. If your team is considering other vulnerability scoring systems, you can still use this article to anticipate how those systems might be applied within your security program, whether they will really add value, and the realities of their deployment.
What is CVSS?
The Common Vulnerability Scoring System was developed by the Forum of Incident Response and Security Teams (FIRST). In FIRST’s own words:
“The Common Vulnerability Scoring System (CVSS) provides a way to capture the principal characteristics of a vulnerability and produce a numerical score reflecting its severity. The numerical score can then be translated into a qualitative representation (such as low, medium, high, and critical) to help organizations properly assess and prioritize their vulnerability management processes.”
FIRST’s latest version of CVSS is version 3.1. To understand CVSS fully, including the metrics it uses, organizations must familiarize themselves with the documentation published on FIRST’s website. Tools such as NIST’s CVSS calculator and FIRST’s own calculator for version 3.1 are also useful for understanding the system and putting it to work.
Fig. 1. CVSS Metrics and Equations
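To make the equations the figure summarizes concrete, here is a minimal Python sketch of the v3.1 Base Score computation, transcribed from the formulas and metric weights in the specification (Sections 7.1 and 7.4):

```python
import math

# Metric weights from the CVSS v3.1 specification (Table 8).
AV  = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20}   # Attack Vector
AC  = {"L": 0.77, "H": 0.44}                         # Attack Complexity
PR  = {"U": {"N": 0.85, "L": 0.62, "H": 0.27},       # Privileges Required,
       "C": {"N": 0.85, "L": 0.68, "H": 0.50}}       # keyed by Scope
UI  = {"N": 0.85, "R": 0.62}                         # User Interaction
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}               # C/I/A impact

def roundup(value):
    """CVSS v3.1 'Roundup': smallest number, to 1 decimal place, >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0

def base_score(av, ac, pr, ui, s, c, i, a):
    """Base score from metric letters; s is Scope ('U' or 'C')."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if s == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = 8.22 * AV[av] * AC[ac] * PR[s][pr] * UI[ui]
    if impact <= 0:
        return 0.0
    raw = impact + exploitability if s == "U" else 1.08 * (impact + exploitability)
    return roundup(min(raw, 10))

print(base_score("N", "L", "N", "N", "U", "H", "H", "H"))  # 9.8
```

The example vector (network-accessible, low complexity, no privileges or interaction, high impact across CIA) reproduces the familiar 9.8 "Critical" rating.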
CVSS was initially defined for vulnerabilities in client and server software installations, where privilege-based metrics are the focus. This is central to CVSS’s limitations: while it is still very effective in the context of software vulnerabilities, it is far less so when applied to other, more complex environments and business risks.
What limitations does CVSS have?
We’ve seen difficulties around CVSS implementation emerge when it has been applied to web applications, hardware, mobile testing, and the cloud. Here, ratings can become skewed (increased or decreased by trivial factors inconsequential to the nature of the analysis), resulting in misleading interpretations. Even when playing to the strengths of CVSS, context is crucial. The examples from client engagements below have been picked to show some of the ways CVSS can be applied incorrectly.
CVSS being used to rate findings that are not vulnerabilities
These "findings" have included: amount of coverage obtained during a vulnerability assessment or the need for additional requirements uncovered during a test. If CVSS is not applied to strictly technical vulnerabilities (i.e., “a weakness in an IT system that can be exploited by an attacker to deliver a successful attack”) the credibility and consistency of ratings will be compromised.
Context being missed (in the scoring of logic flaws, configuration errors, etc.)
Skewing happens across these types of issues because CVSS’s scoring computation is narrow by design. This removes the freedom available to risk assessments, which are designed to be contextually relevant to a specific business.
In one example, all Transport Layer Security (TLS) issues in a client’s vulnerability scanners were given a base score of “Medium”, despite the complexity of any related exploitation being “High”. This happened because, in that client’s particular environment, the scoring granted too much weight to the hypothetical impact of exploitation, rather than the complexity of exploitation (as detailed below). In brief, the theoretical risk of that attack was inflated by the score.
FIRST defines the Attack Complexity (AC) metric values as follows:
Metric value: Low
Specialized access conditions or extenuating circumstances do not exist. An attacker can expect repeatable success when attacking the vulnerable component.
Metric value: High
A successful attack depends on conditions beyond the attacker's control. That is, a successful attack cannot be accomplished at will, but requires the attacker to invest in some measurable amount of effort in preparation or execution against the vulnerable component before a successful attack can be expected. For example, a successful attack may depend on an attacker overcoming any of the following conditions:
- The attacker must gather knowledge about the environment in which the vulnerable target/component exists. For example, a requirement to collect details on target configuration settings, sequence numbers, or shared secrets.
- The attacker must prepare the target environment to improve exploit reliability. For example, repeated exploitation to win a race condition, or overcoming advanced exploit mitigation techniques.
- The attacker must inject themselves into the logical network path between the target and the resource requested by the victim in order to read and/or modify network communications (e.g., a "man in the middle" attack).
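To see the numerical effect, consider an illustrative TLS-style information disclosure scored with both Attack Complexity values. This is a sketch assuming the open-source `cvss` Python package; the vectors are representative, not the client’s actual findings:

```python
from cvss import CVSS3  # open-source library: pip install cvss

# Same finding, differing only in Attack Complexity (AC:L vs AC:H).
easy = CVSS3("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:N/A:N")
hard = CVSS3("CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:U/C:L/I:N/A:N")

print(easy.scores()[0])  # 5.3 -> "Medium"
print(hard.scores()[0])  # 3.7 -> "Low"
```

Scanners that default to AC:L for TLS findings are, in effect, making that Low-to-Medium jump on the client’s behalf.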
Not accounting for vulnerabilities that can be chained together
FIRST provides guidance on how to score chained vulnerabilities, but the question of how vulnerabilities chain together into an attack, and how that affects the overall score, is complicated and rarely put into practice. The score of any one vulnerability should be treated as a single component within a chain; the chain as a whole may be more or less severe than any individual score would suggest.
Consultants in one engagement came upon a cross-site scripting (XSS) finding that (a) could only be exploited by admins and (b) would only affect the user who triggered it. The client didn't consider this a serious vulnerability because it required high permissions and couldn't be used against other users; i.e., its likelihood and impact did not appear significant. However, we later identified a cross-site request forgery (CSRF) attack and used it to deliver the XSS payload. This meant that the original XSS vulnerability could be used to target other users, including by non-admins who simply needed to know how to craft a page with the CSRF payload. A chain of the XSS and CSRF vulnerabilities created a risk far higher than either vulnerability alone.
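FIRST’s guidance is to score a chain as if it were a single combined vulnerability. The sketch below (illustrative vectors, not the client’s actual scores, again assuming the `cvss` package) shows how removing the privilege requirement via CSRF lifts the score:

```python
from cvss import CVSS3  # pip install cvss

# XSS alone: exploitable only by an administrator (PR:H).
xss_alone = CVSS3("CVSS:3.1/AV:N/AC:L/PR:H/UI:R/S:C/C:L/I:L/A:N")

# Chained with CSRF: any attacker who can lure a victim to a crafted
# page can deliver the payload, so Privileges Required drops to None.
chained = CVSS3("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N")

print(xss_alone.scores()[0])  # 4.8
print(chained.scores()[0])    # 6.1
```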
Not accounting for the criticality of the affected asset
It’s common to see CVSS scores being factored into security plans without discussion of an asset’s importance in real business terms. Environmental and Temporal scores try to capture some of this, but they often don’t come close to the internal business judgement that’s really needed to create consistency in how ratings are assigned. For example, it is hard to accurately compare how one team scores issues from a vendor’s assessment of one product with how a different team and vendor score issues found in another product.
CVSS allows for Environmental metrics, which essentially act as an override for the corresponding Base metric. However, business and technical stakeholders must have a process and framework for performing overrides, with universal agreement on the reasoning behind them.
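As a sketch of what such an override looks like in practice (illustrative vector, again assuming the `cvss` package), Environmental metrics such as the Confidentiality Requirement (CR) and Modified Attack Complexity (MAC) can pull a Base score down once the asset’s context is agreed:

```python
from cvss import CVSS3  # pip install cvss

# Base: network-accessible disclosure of confidential data.
# Environmental: the business rates the data low-value (CR:L) and a
# compensating control raises the real attack complexity (MAC:H).
v = CVSS3("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N/CR:L/MAC:H")

base, temporal, environmental = v.scores()
print(base)           # 7.5
print(environmental)  # 4.1
```

The important part is not the arithmetic but the agreed process: without a shared framework for when CR:L or MAC:H is justified, every override becomes a negotiation.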
CVSS being used as an exact science
Because of the context needed, ratings shouldn’t be adopted as an indisputable truth. This is the eternal challenge of context in cyber security—there are no concrete and immovable truths.
We’ve regularly seen organizations attempt to use CVSS scoring alone to set a risk limit, e.g., "We will not fix anything below 5.5". However, for the reasons described above, an issue may draw a lower or higher rating than it really should. To correct this, a vendor or internal tester has two options: adapt some of the scoring to "fix" the ratings, or leave it as is and risk that the issue is never fixed. Thus, if an organization is only using CVSS for rating issues, it is best not to create a cutoff score below which vulnerabilities will not be addressed, or to use CVSS alone to define how resolutions will be prioritized. Otherwise, the consistency of the rating system may have to be actively undermined.
Getting the most out of CVSS
Based on the points above, here are our recommendations to help you use CVSS within the limits of its usefulness and get the most from the system.
Use “mitigating circumstances” to bring scores into line
Most organizations base their CVSS scores on the impact to confidentiality, integrity, and availability (CIA). With mitigating circumstances that adjust that scoring, ratings can be more consistent and the basis for a given severity rating can be better understood by clients. This encourages structured debate between the relevant parties and provides a logical minimum or maximum value for a vulnerability rating. Standardizing how Temporal and Environmental scores should be used, and agreeing on the meanings of the core elements, will help you achieve consistent results.
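One way to make such circumstances explicit is a shared table that caps the rating a finding can receive. The sketch below is entirely hypothetical; the circumstance names and caps are the kind of thing your organization would define for itself:

```python
# Hypothetical "mitigating circumstances" table: each entry caps the
# qualitative rating a finding can receive, so reviewers adjust within
# agreed bounds rather than ad hoc. Names and caps are illustrative.
ORDER = ["None", "Low", "Medium", "High", "Critical"]
SEVERITY_CAPS = {
    "requires_physical_access": "Medium",
    "exposed_data_already_public": "Low",
}

def capped_rating(rating, circumstances):
    applicable = [SEVERITY_CAPS[c] for c in circumstances if c in SEVERITY_CAPS]
    if not applicable:
        return rating
    cap = min(applicable, key=ORDER.index)        # strictest cap wins
    return min(rating, cap, key=ORDER.index)

print(capped_rating("High", ["exposed_data_already_public"]))  # Low
print(capped_rating("High", []))                               # High
```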
Apply context to how the rating system will be used (hint: sometimes the use case might not be suitable)
According to its own specification, CVSS is designed to capture the severity of a vulnerability, not the risk associated with it. It is therefore essential to know how the rating system will be used in the broader organization, in the context of organizational risk, to ensure that ratings are not misapplied. Misapplied ratings can skew prioritization, directing attention away from issues that are more important. For example:
Within an online retailer, a CVSS score of 6.5 is given to a vulnerability that exposes all of its customers’ credit card numbers to any logged-in user, while a 10.0 is given to an unauthenticated remote code execution (RCE) finding on an internal QA server that is not exposed to the Internet. Both need to be fixed, but the scoring alone tells you that the RCE vulnerability should take priority, when it is in fact the other issue that would have the biggest impact on the business should it be exploited. Even if the 6.5 is only the Base score (the intrinsic qualities of a vulnerability that are constant over time and across user environments), and applying the Environmental metrics (the characteristics of a vulnerability that are unique to a user’s environment) and Temporal metrics (characteristics of a vulnerability that change over time) raises the 6.5 to 8.3 and lowers the 10.0 to 8.7, the relative weighting is still incorrect.
If appropriately prioritizing vulnerabilities is the primary aim of moving to CVSS, additional processes or adjustments to the scoring will be needed.
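As one sketch of such an additional process, using entirely hypothetical weightings rather than any standard, a simple triage function can blend the CVSS score with business-defined asset criticality and exposure; that alone is enough to make the retailer’s card-data issue outrank the internal RCE:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    title: str
    cvss_base: float
    asset_criticality: float  # 0.0-1.0, set by the business, not by CVSS
    internet_exposed: bool

def priority(f: Finding) -> float:
    # Hypothetical blend: dampen findings on unexposed assets.
    exposure = 1.0 if f.internet_exposed else 0.6
    return f.cvss_base * f.asset_criticality * exposure

findings = [
    Finding("Card data exposed to any logged-in user", 6.5, 1.0, True),
    Finding("Unauthenticated RCE on internal QA server", 10.0, 0.4, False),
]
for f in sorted(findings, key=priority, reverse=True):
    print(f"{priority(f):4.1f}  {f.title}")
# 6.5  Card data exposed to any logged-in user
# 2.4  Unauthenticated RCE on internal QA server
```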
Allow for human expertise to challenge scoring
If there is any dispute or doubt about what a designated CVSS score suggests, this should be documented. In practice, such documentation may involve adding a risk note section to the vulnerability description explaining why, for example, the business risk is considered “Low” when CVSS suggests “Medium”, e.g., “Although the CVSS score of this finding indicates a medium-risk vulnerability, given the prerequisites for exploitation and the generic content of the data being stored, the severity rating of this finding has been lowered to a low-risk issue”. Documentation must be supported by adequate policies that ensure risk notes are referenced in future scoping/testing work.
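In practice, a risk note works best as a small structured record attached to the finding, so future scoping work can locate and query it. The fields below are an illustrative sketch, not a standard:

```python
# Illustrative risk-note record; all field names are hypothetical.
risk_note = {
    "finding_id": "XSS-014",
    "cvss_vector": "CVSS:3.1/AV:N/AC:L/PR:H/UI:R/S:C/C:L/I:L/A:N",
    "cvss_rating": "Medium",
    "business_rating": "Low",
    "rationale": "Exploitation requires admin access and the stored "
                 "data is generic; lowered with stakeholder agreement.",
    "revisit_on_next_test": True,
}
```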
Develop a shared vocabulary and foundation for reasoning
Uniting stakeholders across the business on the language, nuance, rationale, and thinking behind CVSS is undoubtedly the most important step towards using the system effectively. It’s also a great place to start when unifying the general approach to security and organizational risk. By learning about CIA, for instance, people start thinking about vulnerabilities in these terms. This enables clearer articulation of why a given vulnerability might be more severe than another. In essence, scores are less important than vectors, the strings that break down the rating along each category.
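Vector strings are also trivially machine-readable, so the per-metric reasoning can be surfaced wherever scores are reported; a minimal parsing sketch:

```python
# Split a CVSS v3.1 vector into its component metrics so stakeholders
# can discuss each dimension rather than a single opaque number.
vector = "CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:U/C:L/I:N/A:N"
metrics = dict(part.split(":") for part in vector.split("/")[1:])
print(metrics["AC"])  # 'H' -- high attack complexity, per the earlier TLS example
```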
Using CVSS
CVSS can be a powerful scoring system when its purpose is understood and it is applied to the right kinds of issues. As mentioned, perhaps the most fundamental issue is organizations not following the guidance provided by FIRST. Whilst room must be made for scores to be questioned and context applied, the constraints of the system, namely its intended application to technical vulnerabilities within software, should be respected if an organization is to benefit from its use.