JA3 Fingerprinting: Functionality, Pitfalls, and Future Outlook
If there's one resounding principle that internet scanning for actionable intelligence has taught us, it is this: a proactive approach to accurate threat identification and correlation is the necessary first step for anyone serious about letting evidence and context drive an investigation.
This is particularly true of information delivered in the form of IoCs (Indicators of Compromise) which, for years, have been the cornerstone of many security products based on forensic artifacts and similar intrusion detection mechanisms. Sharing these IoCs in a tokenized and consumable fashion has also ensured that the cyber community at large stays visibly engaged with the latest attack patterns, whatever their origin or level of sophistication may be.
At the heart of this blog post lies yet another attempt to turn threat information into a new IoC variant, this time by Salesforce researchers John Althouse, Jeff Atkinson, and Josh Atkins. Their approach fingerprints the initial parameters that TLS clients and servers exchange during the handshake, regardless of the underlying platform. JA3, named after the three authors' shared initials, has since been incorporated into a multitude of security tools. It serves as a method to detect malicious applications, especially those deployed at scale with little effort to avoid detection.
We'll begin by briefly examining the TLS handshake, focusing on the ClientHello and ServerHello fields that JA3 uses to fingerprint traffic between client and server. From there, we'll turn to how JA3 hashes and tags that traffic to identify malware platforms and C2 agents. Finally, we'll explore some of JA3's shortcomings in light of attempts to circumvent its core functionality.
A TLS/SSL primer
Sensitive data requires the strongest of protection mechanisms. For a number of years, premier standards and protocols, like Secure Sockets Layer (SSL), dominated the secure tunneling scene by achieving a suitable interplay between performance and confidentiality, providing secure communications over untrusted media such as the internet.
Despite its early success, SSL was progressively eroded by a plethora of cryptographic weaknesses, statistical biases, and protocol design flaws that opened the door to practical attacks. In particular, CBC-mode cipher suites were exposed to padding-oracle attacks such as POODLE. RC4's keystream biases were just as problematic: they leaked sensitive material and, given enough ciphertexts, enabled full plaintext recovery.
Subsequently, as explained in JARM: A Solid Fingerprinting Tool for Detecting Malicious Servers, Transport Layer Security (TLS) became SSL's successor. It emerged after a long chain of revisions and improvements that adopted newer cryptographic primitives and cipher suites, strengthened session key negotiation, and lowered computational costs. It was precisely this cryptographic agility that made TLS so versatile, letting it cover a wider range of network applications while providing critical services such as confidentiality and integrity.
To understand how JA3 leverages certain TLS attributes, let's take a closer look at the protocol's initial connection sequence. Immediately after the TCP handshake, the client sends a ClientHello message carried in a TLS handshake record. This message lists the cipher suites the client supports (in order of preference), its protocol version, the extensions it supports, a list of compression methods, and other session parameters.
In response, the server sends a ServerHello message. It contains the protocol version and the single cipher suite the server has selected from the client's offer, along with the server's own extensions. The server then presents its certificate, which the client verifies; the server may optionally request a client certificate in return. After that, both parties derive the pre-master and master secrets and, from them, the session keys used to encrypt the remaining traffic.
The TLS parameters offered in the ClientHello message carry several identifying properties tied to the client application. These features are largely static for a given client because they are determined by its OS build, packages, and TLS libraries, and they can sometimes even be attributed to specific processes. This level of granularity makes it possible to build highly accurate fingerprints that identify the same application in future sessions. Similarly, TLS servers construct their ServerHello messages from the incoming ClientHello combined with their own built-in characteristics, such as:
- Operating system name and version
- Server-side libraries
- Other custom configurations
Once again, this symbiotic relationship between client and server Hello packets dictates the way in which servers uniquely respond to a specific application, providing an excellent opportunity for quick identification. Enter JA3 signatures.
Origin and functionality
JA3 signatures, also known as JA3 hashes, take advantage of these initial negotiation stages and the static elements transmitted in the clear to identify client applications across multiple sessions. The approach builds on earlier TLS fingerprinting efforts that combined selected fields and extensions, such as the offered cipher suites, into a single fingerprint while trying to keep recurrent problems like collisions in check.
When the Salesforce researchers (SFE) decided to invest time and resources into building JA3, the idea was to capitalize on these earlier initiatives. After all, as mentioned, systematically fingerprinting TLS ClientHello packets this way provided a high degree of accuracy. Strategically, the remaining requirement was to make the fingerprinting process both tool and destination agnostic.
After several iterations over which packet elements and libraries to consider, the consensus was to gather the decimal values of the bytes for the following ClientHello fields:
- TLS version number
- Accepted cipher suites
- List of extensions
- Elliptic curves
- Elliptic curve point formats
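To show where these values live on the wire, here is a minimal Python sketch that walks a raw ClientHello record and pulls out the version, cipher suites, extension types, supported groups (elliptic curves), and point formats. It assumes the entire ClientHello arrives in a single, unfragmented TLS record and does only minimal validation. `parse_client_hello` and `ClientHelloFields` are hypothetical names, not part of the JA3 project.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class ClientHelloFields:
    """Decimal values JA3 reads from a ClientHello (hypothetical container)."""
    version: int
    ciphers: list = field(default_factory=list)
    extensions: list = field(default_factory=list)
    curves: list = field(default_factory=list)
    point_formats: list = field(default_factory=list)


def _u16_list(buf: bytes, start: int, length: int) -> list:
    """Read `length` bytes starting at `start` as big-endian 16-bit integers."""
    return [struct.unpack_from("!H", buf, start + i)[0] for i in range(0, length, 2)]


def parse_client_hello(record: bytes) -> ClientHelloFields:
    """Extract JA3-relevant fields from one unfragmented TLS record holding a ClientHello."""
    if record[0] != 0x16:                      # record content type 22 = handshake
        raise ValueError("not a TLS handshake record")
    hs = record[5:]                            # skip type(1) + version(2) + length(2)
    if hs[0] != 0x01:                          # handshake type 1 = ClientHello
        raise ValueError("not a ClientHello")
    hs = hs[: 4 + int.from_bytes(hs[1:4], "big")]  # trim to the declared handshake length
    pos = 4
    (version,) = struct.unpack_from("!H", hs, pos)
    pos += 2 + 32                              # client_version + 32-byte random
    pos += 1 + hs[pos]                         # session_id (1-byte length prefix)
    (cs_len,) = struct.unpack_from("!H", hs, pos)
    fields = ClientHelloFields(version, _u16_list(hs, pos + 2, cs_len))
    pos += 2 + cs_len
    pos += 1 + hs[pos]                         # compression methods (unused by JA3)
    if pos >= len(hs):                         # no extensions block at all
        return fields
    (ext_total,) = struct.unpack_from("!H", hs, pos)
    pos += 2
    end = pos + ext_total
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack_from("!HH", hs, pos)
        data = hs[pos + 4 : pos + 4 + ext_len]
        fields.extensions.append(ext_type)     # JA3 keeps the type, not the contents
        if ext_type == 10:                     # supported_groups ("elliptic curves")
            (n,) = struct.unpack_from("!H", data, 0)
            fields.curves = _u16_list(data, 2, n)
        elif ext_type == 11:                   # ec_point_formats (1-byte values)
            fields.point_formats = list(data[1 : 1 + data[0]])
        pos += 4 + ext_len
    return fields
```

In practice, a packet-capture library would hand you the reassembled record. This sketch only covers the field extraction step.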
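Each signature is then constructed by chaining the values in the following order, with commas separating fields and dashes separating the values within a field:
SSLVersion,Cipher,SSLExtension,EllipticCurve,EllipticCurvePointFormat
For example: 769,47-53-5-10-49161-49162-49171-49172-50-56-19-4,0-10-11,23-24-25,0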
Empty fields, such as those of a ClientHello that carries no TLS extensions, are left blank, which produces consecutive commas:
769,4-5-10-9-100-98-3-6-19-18-99,,,
Finally, the string is MD5-hashed to produce a 32-character-long fingerprint that’s easy to share and consume across a multiplicity of security tools.
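The string-building and hashing steps are simple enough to sketch in a few lines of Python. The function names below are hypothetical and the field values are illustrative. The sketch follows the comma/dash convention and MD5 step described above but is not the reference JA3 code.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Commas between fields, dashes between values; empty lists become empty fields."""
    fields = [str(version)] + [
        "-".join(str(v) for v in values)
        for values in (ciphers, extensions, curves, point_formats)
    ]
    return ",".join(fields)


def ja3_hash(ja3_str):
    """MD5 hex digest: the 32-character JA3 fingerprint."""
    return hashlib.md5(ja3_str.encode("ascii")).hexdigest()


s = ja3_string(769, [47, 53, 5, 10, 49161, 49162, 49171, 49172, 50, 56, 19, 4],
               [0, 10, 11], [23, 24, 25], [0])
print(s)  # 769,47-53-5-10-49161-49162-49171-49172-50-56-19-4,0-10-11,23-24-25,0
print(ja3_hash(s))
print(ja3_string(769, [4, 5], [], [], []))  # 769,4-5,,,
```

Keeping the string-building step separate from the hash makes it easier to log the raw JA3 string alongside the digest, which matters later when analysts want to see what went into a hash.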
Implementing JA3 also meant accounting for GREASE (Generate Random Extensions And Sustain Extensibility), a mechanism proposed by Google and later standardized as RFC 8701. With GREASE, clients advertise reserved, meaningless values in their cipher suite, extension, and curve lists so that servers don't ossify around a fixed set of values. Because these values are picked at random per connection, JA3 ignores them, yielding the same hash whether or not an application uses GREASE.
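A GREASE filter can be sketched as follows. The sketch assumes the RFC 8701 pattern for 16-bit values (0x0A0A, 0x1A1A, ..., 0xFAFA, where both bytes are equal and end in 0xA). The helper names are hypothetical, and where exactly a given JA3 implementation applies the filter may differ.

```python
def is_grease(value: int) -> bool:
    """True for the 16 reserved GREASE values 0x0A0A, 0x1A1A, ..., 0xFAFA."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def strip_grease(values):
    """Drop GREASE entries so the fingerprint is stable across connections."""
    return [v for v in values if not is_grease(v)]


# 2570 == 0x0A0A and 56026 == 0xDADA are GREASE values; the rest are real cipher suites.
print(strip_grease([2570, 4865, 4866, 56026, 49195]))  # [4865, 4866, 49195]
```

Applying this filter to each list before building the JA3 string ensures that two connections from the same GREASE-enabled client produce the same hash.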
Next stop: JA3S.
With TLS client fingerprinting out of the way, the natural next step was to fingerprint TLS ServerHello messages, which promised to be just as advantageous.
The approach was similar to JA3's, with two key differences. First, JA3S hashes are built from the decimal values of a smaller set of fields, namely TLSVersion,Cipher,Extensions, since the server selects a single cipher suite rather than offering a list. Second, and more importantly, fingerprinting a server solely on its Hello message was deemed insufficient. Instead, SFE focused on how servers respond to the same client across several connections. They found that although a server's responses differ from one client to another, it generates the same response for the same client every single time.
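Under the same conventions as JA3, a JA3S fingerprint can be sketched in Python as below. The function name and the ServerHello values are hypothetical, chosen only to illustrate the three-field format.

```python
import hashlib


def ja3s_fingerprint(version: int, cipher: int, extensions: list) -> tuple:
    """Build the JA3S string (TLSVersion,Cipher,Extensions) and its MD5 hash."""
    ja3s_str = f"{version},{cipher}," + "-".join(str(e) for e in extensions)
    return ja3s_str, hashlib.md5(ja3s_str.encode("ascii")).hexdigest()


# Hypothetical ServerHello: TLS 1.2 (771), cipher 0xC02F (49199), and the
# extensions renegotiation_info (65281), server_name (0), ec_point_formats (11).
s, h = ja3s_fingerprint(771, 49199, [65281, 0, 11])
print(s)  # 771,49199,65281-0-11
print(h)
```

Because the server's chosen cipher and extensions depend on what the client offered, the same server can emit different JA3S values for different clients. This is why JA3S is most useful when read alongside the client's JA3.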
Observing how clients establish TLS connections gives network defenders ample opportunity to separate commodity malware from legitimate traffic. Just as you'd easily pick rotten apples from a batch, you can spot the telltale signs left by custom C2s, or by any potentially unwanted software for that matter. Combined with the host operating system, these signs provide a distinctive basis for passive detection. For example, a particular C2 server running on Kali Linux would produce a distinctive JA3/JA3S signature pair that is easy to share and consume.
At first glance, the efficacy of JA3/S signatures is hard to dispute. After all, deviations from baseline signatures are always a lucrative proposition, one that security analysts have long understood and routinely pursue. With this in mind, profiling applications becomes a matter of programmatically "feeding" your tools with resources such as JA3er's REST API and then inspecting your traffic. Soon after, you'll be able to separate legitimate (or merely misconfigured) applications from illegitimate ones. You can also zero in on unusual cases, such as allowed services reaching out to suspicious domains.
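The triage step can be sketched as a simple lookup against two sets of hashes. One is a set of known-bad hashes, assumed to be exported from a threat-intel source such as JA3er or your own tooling. The other is a baseline of hashes considered normal for your environment. No real API is called here; `triage` and the placeholder hashes are hypothetical.

```python
from collections import Counter


def triage(observed, known_bad, baseline):
    """Split observed (host, ja3) sightings into intel matches and off-baseline hashes.

    observed  : iterable of (host, ja3_hash) tuples, e.g. from a network sensor
    known_bad : dict mapping ja3_hash -> label, loaded from threat intel
    baseline  : set of ja3 hashes considered normal in this environment
    """
    alerts = []
    unknown = Counter()
    for host, ja3 in observed:
        if ja3 in known_bad:
            alerts.append((host, ja3, known_bad[ja3]))
        elif ja3 not in baseline:
            unknown[ja3] += 1      # not flagged by intel, but not normal either
    return alerts, unknown


# Placeholder hashes; real values would come from your sensors and intel feeds.
known_bad = {"b" * 32: "hypothetical C2 agent"}
baseline = {"a" * 32}
events = [("ws-01", "a" * 32), ("ws-02", "b" * 32), ("ws-03", "c" * 32), ("ws-04", "c" * 32)]
alerts, unknown = triage(events, known_bad, baseline)
print(alerts)
print(unknown.most_common())
```

Counting off-baseline hashes, rather than alerting on each one, helps separate a single odd workstation from a new application rolling out across the fleet.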
For example, security tools such as RSA NetWitness can extract the JA3 and JA3S values of the TLS clients and servers observed in a network session and store them as text values in separate meta items.
Pivoting over to ja3er.com, we can then associate the resulting fingerprint with any corresponding user-agent(s) and applications, along with community comments on the technologies involved.
Although cross-referencing JA3 hashes on ja3er.com is an important piece of the puzzle, a more complete picture emerges from pairing them with their JA3S counterparts. Again, this is because the parameters in the ServerHello message are selected in response to those offered in the ClientHello. For instance, a specific version of the popular Nginx web server will respond to a specific Firefox version in the same way, every time.
From a defender's standpoint, this ability to create JA3/S pairings can be used to fingerprint unique combinations of client-server technologies, such as those linked to malware and its C2 infrastructure. Consequently, alerting mechanisms can be set up to warn of newly observed hashes, which security personnel can quickly gather, share, and deconflict. A less critical scenario might involve spotting misconfigured client applications accessing instances and services they are not entitled to, or unauthorized scripts and potentially unwanted programs running at large in your environment.
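A pair-based alerting mechanism could look like the following Python sketch, which keys on the (JA3, JA3S) tuple rather than on either hash alone. `PairMonitor`, its labels, and the placeholder hashes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PairMonitor:
    """Track JA3/JA3S pairs and classify each sighting (hypothetical helper)."""
    known_bad: dict = field(default_factory=dict)   # (ja3, ja3s) -> label
    seen: set = field(default_factory=set)

    def observe(self, ja3: str, ja3s: str) -> str:
        pair = (ja3, ja3s)
        if pair in self.known_bad:
            return "ALERT: " + self.known_bad[pair]
        if pair not in self.seen:
            self.seen.add(pair)
            return "NEW"          # first sighting: worth a look, not necessarily malicious
        return "SEEN"


monitor = PairMonitor(known_bad={("b" * 32, "d" * 32): "hypothetical C2 pairing"})
print(monitor.observe("a" * 32, "e" * 32))  # NEW
print(monitor.observe("a" * 32, "e" * 32))  # SEEN
print(monitor.observe("b" * 32, "d" * 32))  # ALERT: hypothetical C2 pairing
```

Keying on the pair means that a familiar client talking to an unfamiliar server still surfaces as NEW, which a JA3-only check would miss.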
Despite their popularity, JA3/S signatures suffer from a handful of important drawbacks. For one, the taxonomy built by hashing both sides of the handshake is prone to inaccuracies: the same application can produce different hashes, and different applications can collide on the same JA3 signature. This is partly because JA3 records only which extensions are present, not their contents (the SNI hostname, for instance, is ignored), and it never looks at certificate information. That leaves only a handful of fields whose limited permutation space makes duplicate signatures likely, which is why applications built on the same TLS library with default settings often share a hash. On this account, cataloguing malware samples with JA3 is challenging. OS subtleties (e.g., minor versions) and programming libraries specific to the attacker's workstation all influence the final hashes.
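The collision problem is easy to reproduce. In this Python sketch, two hypothetical clients share the same TLS library defaults but differ in purpose and SNI hostname. Because only the five JA3 fields feed the hash, the two clients come out identical. The dictionaries and hostnames are invented for illustration.

```python
import hashlib

JA3_FIELDS = ("ciphers", "extensions", "curves", "point_formats")


def ja3(hello: dict) -> str:
    """MD5 of the JA3 string; any key outside the five JA3 fields is ignored."""
    parts = [str(hello["version"])]
    parts += ["-".join(str(v) for v in hello[k]) for k in JA3_FIELDS]
    return hashlib.md5(",".join(parts).encode("ascii")).hexdigest()


# Two hypothetical clients built on the same TLS library with default settings.
library_defaults = dict(version=771, ciphers=[49195, 49199, 52393],
                        extensions=[0, 10, 11, 13, 16], curves=[29, 23, 24],
                        point_formats=[0])
updater = {**library_defaults, "sni": "updates.example.com", "app": "legitimate updater"}
implant = {**library_defaults, "sni": "c2.example.net", "app": "hypothetical implant"}

print(ja3(updater) == ja3(implant))  # True: different apps and SNI values, same JA3
```

Both clients advertise extension 0 (server_name), but only its presence is recorded, so the differing hostnames never reach the hash.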
In addition, because hashing is one-way, it is effectively impossible to recover and examine the individual values that went into a hash. That rules out, for example, correlating conditional TLS fields or protocol versions across fingerprints. JA3's reliance on exact field values and their order has also enabled a number of evasion techniques built on low-level control over the handshake. These techniques let threat actors manipulate and randomize their ClientHellos at will to defeat identification.
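The effect of randomization can be sketched by shuffling the order of an otherwise fixed set of cipher suites and extensions. The values below are illustrative. For comparison, the sketch also applies a hypothetical normalization step that sorts the lists before hashing; this step is not part of JA3 and only shows why order sensitivity is the weak point.

```python
import hashlib
import random


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """MD5 over the JA3 string; the order of each list is part of the input."""
    parts = [str(version)] + ["-".join(str(v) for v in f)
                              for f in (ciphers, extensions, curves, point_formats)]
    return hashlib.md5(",".join(parts).encode("ascii")).hexdigest()


ciphers = [4865, 4866, 4867, 49195, 49199, 49196, 49200]
extensions = [0, 10, 11, 13, 16, 23, 43, 45, 51]
rng = random.Random(1)          # seeded so the demo is repeatable

raw_hashes, sorted_hashes = set(), set()
for _ in range(100):
    c, e = ciphers[:], extensions[:]
    rng.shuffle(c)              # same capabilities, different order on the wire
    rng.shuffle(e)
    raw_hashes.add(ja3_hash(771, c, e, [29, 23, 24], [0]))
    # Hypothetical normalization (not part of JA3): sort before hashing.
    sorted_hashes.add(ja3_hash(771, sorted(c), sorted(e), [29, 23, 24], [0]))

print(len(raw_hashes))          # many distinct JA3 values for one client
print(len(sorted_hashes))       # 1
```

A single client that reorders its offer on every connection can therefore scatter across many JA3 values, while its actual capabilities never change.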
Conclusion and outlook
When combined with more traditional metrics, fingerprinting methods like JA3 can provide a quick and easy way of enriching IoCs across the board. Some of the best use cases highlight the possibility of passively identifying popular C2 deployments and botnets, detecting changing traffic patterns over time pointing to potential protocol mismatches, and even catching red team engagements or pentesters cutting corners on their implementations.
At its core, JA3 does not claim to be a high-fidelity signal generator, and it should not be treated as one. Signature-based detections are only as good as the contextualized data and additional indicators surrounding them. Be prepared to combine JA3/S with prevalent baseline behavior, host processes, and destination domain features to avoid flooding your analysts with an onslaught of false positives.
In short, as detection frameworks continue to grow, adequate fingerprinting can ease the perceived impossibility of peering into encrypted traffic. Such fingerprinting should take into account a broader set of host and network conditions, in direct response to ongoing efforts by attackers and threat actors alike to circumvent suitable defenses. Lastly, contributions to projects like JA3 are always welcome. Just as SFE did, if you've developed an alternative that enhances TLS traffic fingerprinting, please consider sharing it with the cyber community. We'll be thankful you did.