SophosAI Advances the Practices and Language that Will Transform the Cybersecurity Industry with Much-needed Transparency and Openness
Sophos announced four new open Artificial Intelligence (AI) developments to help broaden and sharpen the industry’s defenses against cyberattacks, including datasets, tools and methodologies designed to advance industry collaboration and cumulative innovation. This move accelerates a key Sophos objective to open its data science breakthroughs and make the use of AI in cybersecurity more transparent, all with the aim of better protecting organizations against all forms of cybercrime.
While it is common practice to share AI methodologies and findings in other industries, cybersecurity has lagged in this effort, creating a noisy understanding of how AI truly provides protection against cyberthreats. Sophos and its team of SophosAI data scientists are catalyzing a change toward openness so that IT managers, security analysts, CFOs, CEOs and others making security buying or management decisions can discuss and assess AI benefits from a level, well-informed playing field.
“With SophosAI’s new initiative to open its research, we can help influence how AI is positioned and discussed in cybersecurity moving forward. Today’s cacophony of opaque or guarded claims about the capabilities or efficacy of AI in solutions makes it difficult to impossible for buyers to understand or validate these claims. This leads to buyer skepticism, creating headwinds to future progress at the very moment we’re starting to see great breakthroughs,” said Joe Levy, chief technology officer, Sophos. “Correcting this through external mechanisms like standards or regulation won’t happen quickly enough. Instead, it requires a grassroots effort and self-policing within our community to produce a set of practices and language that will advance the industry in a disruptive, open and transparent manner.”
It is difficult to overstate the criticality of this shift given the immense potential of how AI can benefit cybersecurity. Sophos evidence shows that defenders are increasingly facing human adversaries who are constantly upping their game, launching highly contextualized Business Email Compromise (BEC) forgery campaigns or relentlessly developing new ransomware attacks. Scalable and effective defenses against these and most other types of cyberattacks require assistance from AI. Openness and peer review among those applying AI to address these security threats stimulate innovation and discoveries, driving the entire industry forward.
Sophos is providing datasets, tools and methodologies in four important areas:
SOREL-20M Dataset for Accelerating Malware Detection Research
SOREL-20M, a joint project between SophosAI and ReversingLabs, is a production-scale dataset containing metadata, labels and features for 20 million Windows Portable Executable (PE) files. It includes 10 million disarmed malware samples, available for download for research on feature extraction, to accelerate industry-wide improvements in security. This dataset is the first production-scale malware research dataset available to the general public, with a curated and labelled set of samples and security-relevant metadata.
AI-powered Impersonation Protection Method
SophosAI’s Impersonation Protection is designed to protect against email spearphishing attacks, in which influential people are impersonated to trick recipients into taking some harmful action for the benefit of the attacker. This new protection compares the display name of inbound emails against high-level executive titles – those most likely to be spoofed in a spearphishing attack, such as a CEO, CFO or president – that are unique to specific organizations, and flags these messages when they appear suspicious. Sophos has trained the AI working behind the scenes on a sample set of millions of known attack emails. SophosAI has opened up this innovative new protection method, which it has also discussed publicly at DEF CON 28 and in an arXiv paper.
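The display-name comparison described above can be illustrated with a minimal sketch. This is not SophosAI's model, which is trained on known attack emails; it is a hand-written heuristic that assumes a hypothetical table of protected executives (`PROTECTED`) and uses fuzzy string matching from the Python standard library in place of a learned classifier. The names, addresses and the 0.85 threshold are illustrative assumptions.

```python
import difflib
import re
from email.utils import parseaddr

# Hypothetical protected executives and their legitimate addresses (assumption).
PROTECTED = {
    "Jane Doe": {"jane.doe@example.com"},
}

def normalize(name: str) -> str:
    """Lowercase, drop non-letters, and collapse whitespace."""
    letters_only = re.sub(r"[^a-z\s]", "", name.lower())
    return " ".join(letters_only.split())

def is_suspicious(from_header: str, protected=PROTECTED, threshold: float = 0.85) -> bool:
    """Flag a From header whose display name resembles a protected
    executive but whose address is not one of that person's known addresses."""
    display_name, address = parseaddr(from_header)
    candidate = normalize(display_name)
    if not candidate:
        return False
    for name, known_addresses in protected.items():
        similarity = difflib.SequenceMatcher(None, candidate, normalize(name)).ratio()
        if similarity >= threshold and address.lower() not in known_addresses:
            return True
    return False

print(is_suspicious("Jane D0e <ceo.office@freemail.example>"))
print(is_suspicious("Jane Doe <jane.doe@example.com>"))
```

In this sketch the first header is flagged because the display name closely resembles a protected name while the address is unknown; the second is not flagged because the address is on the known list. A production system would weigh many more signals than this single comparison.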
Digital Epidemiology to Determine Undetected Malware
SophosAI has also built a set of epidemiology-inspired statistical models for estimating the total prevalence of malware infections, which enables Sophos to estimate, and in turn have a better chance of finding, the needles in a PE file haystack. SophosAI has pioneered and made publicly available this method, which helps to determine malicious “dark matter” (malware that might be missed or wrongly classified) and “future malware” that is in development by attackers. The model is designed to be extensible to other classes of files and information system artifacts and is also discussed in the Sophos 2021 Threat Report.
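To give a flavor of how an epidemiology-style estimate of undetected malware can work, the sketch below uses capture-recapture, a classic technique from epidemiology and ecology. It is a stand-in chosen for illustration and is not claimed to be the SophosAI model. It assumes two hypothetical, roughly independent detectors scanning the same file population, and applies the Chapman estimator to the counts each one flags.

```python
def chapman_estimate(n1: int, n2: int, both: int) -> float:
    """Chapman capture-recapture estimate of the total number of
    malicious files, given the count flagged by detector 1 (n1),
    by detector 2 (n2), and by both detectors (both)."""
    if min(n1, n2, both) < 0 or both > min(n1, n2):
        raise ValueError("counts must be non-negative and both <= min(n1, n2)")
    return (n1 + 1) * (n2 + 1) / (both + 1) - 1

def estimated_undetected(n1: int, n2: int, both: int) -> float:
    """Estimated number of malicious files that neither detector flagged."""
    seen_by_either = n1 + n2 - both
    return max(0.0, chapman_estimate(n1, n2, both) - seen_by_either)

# Illustrative counts only (assumption, not Sophos data).
print(round(estimated_undetected(900, 800, 720), 2))
```

With these illustrative counts the two detectors together see 980 distinct files, and the estimator suggests roughly 20 more that neither one caught. The estimate is only as good as the independence assumption: detectors that share blind spots will make it too low.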
YaraML Automatic Signature Generation Tools
Signature generation for the detection of malware families is a laborious, manual process. Over the years, researchers have proposed a variety of automatic signature generation methods, most of which have not found adoption because they underperform manual methods. SophosAI has developed a new method for automatic signature generation, called YaraML, that differs significantly from previous options by taking an AI-based approach to the problem. SophosAI directly “compiles” full-fledged, industrial-strength machine learning models, the kinds used in commercial security products, into signature languages, essentially allowing AI to “write” the signatures. This proves to be far more effective than previous approaches and represents a breakthrough for the security community. SophosAI has open-sourced YaraML.
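The idea of "compiling" a model into a signature can be sketched in a few lines. The example below is not YaraML's code or interface; it assumes a tiny, hand-written decision tree over substring features (the tree, the strings and the function names are all hypothetical) and emits a YARA rule whose condition is true exactly when the tree would label a file malicious.

```python
from typing import List, Tuple, Union

# A node is a bool leaf (True = malicious) or a tuple of
# (substring, subtree_if_absent, subtree_if_present).
Tree = Union[bool, Tuple[str, "Tree", "Tree"]]
Path = List[Tuple[str, bool]]

def malicious_paths(tree: Tree, prefix: Tuple = ()) -> List[Path]:
    """Collect every root-to-leaf path that ends in a malicious leaf."""
    if isinstance(tree, bool):
        return [list(prefix)] if tree else []
    feature, if_absent, if_present = tree
    return (malicious_paths(if_absent, prefix + ((feature, False),))
            + malicious_paths(if_present, prefix + ((feature, True),)))

def tree_to_yara(tree: Tree, rule_name: str) -> str:
    """Emit a YARA rule: one AND-clause per malicious path, joined by OR."""
    paths = malicious_paths(tree)
    features = sorted({feature for path in paths for feature, _ in path})
    ids = {feature: f"$s{i}" for i, feature in enumerate(features)}

    def clause(path: Path) -> str:
        terms = [ids[f] if present else f"not {ids[f]}" for f, present in path]
        return "(" + " and ".join(terms) + ")" if terms else "true"

    condition = " or ".join(clause(p) for p in paths) if paths else "false"
    lines = [f"rule {rule_name}", "{"]
    if features:
        lines.append("    strings:")
        for feature in features:
            escaped = feature.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'        {ids[feature]} = "{escaped}"')
    lines += ["    condition:", f"        {condition}", "}"]
    return "\n".join(lines)

# Hypothetical tree for illustration only.
example_tree: Tree = (
    "CreateRemoteThread",
    ("powershell -enc", False, True),
    ("VirtualAllocEx", False, True),
)
print(tree_to_yara(example_tree, "Demo_Compiled_Tree"))
```

Only strings that appear on a malicious path are declared, so every declared string is referenced in the condition. A real model would have far more features and would need its thresholds and weights translated as well, which this sketch does not attempt.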
These four advancements are the latest from SophosAI, which works creatively like a start-up incubator, but with the intellectual resources of a near billion-dollar global company, including SophosLabs, Sophos Managed Threat Response and hundreds of thousands of customers. Another advantage is that SophosAI can add new technology directly into shipping products. This model allows Sophos to react quickly to market needs, predict where the industry must head and advance openness for greater cybersecurity industry collaboration and innovation, all of which are essential when developing defenses against fast-moving adversaries.