
Bridging the Gap: Large Language Models for Binary Security Patch Detection

TLDR: This paper explores the use of Large Language Models (LLMs) for detecting security patches in binary code, a critical but challenging task for closed-source software. Researchers built a large binary patch dataset and found that while direct prompting of LLMs is ineffective, fine-tuning them significantly improves performance, especially when using pseudo-code representations. Pseudo-code is shown to be more similar to source code, which LLMs are primarily trained on. Further improvements were achieved by augmenting the pseudo-code dataset with source code. The study highlights the potential of fine-tuned LLMs for binary security, particularly with pseudo-code, and identifies memory management vulnerabilities as a remaining challenge.

Software security is a constant battle, and one of the most crucial defenses is the timely application of security patches. These patches fix vulnerabilities that, if left unaddressed, can lead to severe security risks. While many advanced methods exist for detecting security patches in source code, a significant challenge arises with closed-source applications and proprietary systems. For these, patches are often released only as binary files, making the underlying source code inaccessible. This creates a major hurdle for traditional security patch detection (SPD) methods.

Enter the world of code Large Language Models (LLMs). These powerful AI models have shown impressive capabilities across code intelligence and binary analysis tasks, such as decompilation and compiler optimization. However, their potential for detecting binary security patches remained largely unexplored, leaving a critical research gap.

A recent empirical study set out to address this very gap. The researchers constructed a comprehensive, large-scale dataset specifically for binary patch detection, comprising 19,448 samples. This dataset featured two levels of code representation: assembly code and pseudo-code. They then systematically evaluated 19 different code LLMs, ranging in size from 0.5 billion to 9 billion parameters, to understand their capabilities in this challenging task.
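The article doesn't reproduce the paper's extraction pipeline, but a rough sketch of how one dataset sample with both representations might be assembled is shown below. The assembly view can be lifted with a disassembler such as Capstone, while pseudo-code typically comes from a decompiler such as Ghidra or Hex-Rays (stubbed here as a string); the field names and label encoding are illustrative, not the dataset's actual schema.

```python
# Sketch: building one binary-SPD sample with two code representations.
# Assumption: Capstone is installed (pip install capstone); real pseudo-code
# would come from a decompiler (Ghidra / Hex-Rays), stubbed below.
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

def disassemble(raw_bytes: bytes, base: int = 0x1000) -> str:
    """Lift raw function bytes to an assembly listing (the low-level view)."""
    md = Cs(CS_ARCH_X86, CS_MODE_64)
    return "\n".join(f"{i.mnemonic} {i.op_str}".strip()
                     for i in md.disasm(raw_bytes, base))

# push rbp; mov rbp, rsp  (toy function prologue)
func_bytes = b"\x55\x48\x89\xe5"

sample = {
    "assembly": disassemble(func_bytes),
    "pseudo_code": "void f(void) { /* decompiler output would go here */ }",
    "label": 1,  # 1 = security patch, 0 = non-security change (illustrative)
}
print(sample["assembly"])
```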

The initial findings revealed that directly prompting vanilla code LLMs was not effective. The models struggled to accurately identify security patches from binary code, and even advanced prompting techniques could not compensate for their lack of domain knowledge in binary SPD. A more tailored approach was clearly needed.
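To make that setup concrete, a minimal zero-shot prompt might look like the sketch below. The prompt wording, the before/after framing, and the checkpoint name are assumptions for illustration; the study's exact prompting templates are described in the paper.

```python
# Sketch: zero-shot prompting a vanilla code LLM for binary SPD.
# The checkpoint is a placeholder; swap in any small instruction-tuned code LLM.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen2.5-Coder-1.5B-Instruct"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

prompt = (
    "You are a binary security analyst. Below is the decompiled pseudo-code of a "
    "function before and after a patch. Answer 'yes' if the patch fixes a security "
    "vulnerability, otherwise answer 'no'.\n\n"
    "BEFORE:\n<pre-patch pseudo-code>\n\n"
    "AFTER:\n<post-patch pseudo-code>\n\nAnswer:"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
# Print only the newly generated tokens (the model's yes/no verdict).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```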

Drawing on these initial insights, the study delved into fine-tuning strategies to inject the necessary binary SPD domain knowledge into the code LLMs. The results were remarkable: fine-tuned LLMs achieved outstanding performance, with the best results observed when using the pseudo-code representation. Models fine-tuned on pseudo-code significantly outperformed those fine-tuned on assembly code, showing an average improvement of 0.173 in accuracy, 0.239 in F1 score, and a reduction of 0.115 in false positive rate.
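The article doesn't spell out the training recipe, but parameter-efficient fine-tuning along the lines of LoRA is a common way to inject this kind of domain knowledge. The sketch below uses Hugging Face `peft` with placeholder hyperparameters and a placeholder checkpoint; it illustrates the workflow, not the authors' exact configuration.

```python
# Sketch: parameter-efficient fine-tuning (LoRA) to add binary-SPD knowledge.
# Hyperparameters are common defaults, not the paper's settings.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

MODEL_ID = "Qwen/Qwen2.5-Coder-1.5B-Instruct"  # placeholder code LLM

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
base_model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections; model-dependent
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_cfg)
model.print_trainable_parameters()

# From here, supervised fine-tuning proceeds as usual: each training example pairs
# a pseudo-code (or assembly) patch diff with its "security patch: yes/no" label,
# and only the adapter weights are updated in a standard Trainer loop.
```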

To understand why pseudo-code was so much more effective, the researchers analyzed two key aspects: embedding features and code naturalness. They found that the embedding distribution of pseudo-code aligned much more closely with source code, with a distance of only 0.03 – less than one-tenth of the distance between assembly code and source code. Similarly, the code naturalness of pseudo-code was also more aligned with source code. This suggests that pseudo-code is inherently closer to the kind of code LLMs are typically pre-trained on, making it a more suitable input format for these models.
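The summary doesn't specify the distance metric the authors used, but the idea can be illustrated with a simple centroid-distance proxy: embed a corpus of each representation with a code encoder and compare the corpus means. The encoder checkpoint and the Euclidean distance in the sketch below are assumptions.

```python
# Sketch: comparing how close each binary representation sits to source code
# in embedding space. Encoder checkpoint and centroid-distance metric are
# illustrative assumptions; the paper's measurement may differ.
import torch
from transformers import AutoModel, AutoTokenizer

ENCODER_ID = "microsoft/unixcoder-base"  # placeholder code encoder

tokenizer = AutoTokenizer.from_pretrained(ENCODER_ID)
encoder = AutoModel.from_pretrained(ENCODER_ID)

def corpus_centroid(snippets: list[str]) -> torch.Tensor:
    """Mean-pooled token embeddings, averaged over the corpus -> one centroid."""
    with torch.no_grad():
        batch = tokenizer(snippets, padding=True, truncation=True, return_tensors="pt")
        hidden = encoder(**batch).last_hidden_state       # (B, T, H)
        mask = batch["attention_mask"].unsqueeze(-1)      # (B, T, 1)
        pooled = (hidden * mask).sum(1) / mask.sum(1)     # (B, H)
    return pooled.mean(0)                                 # (H,)

source_c = corpus_centroid(["int f(int x) { return x + 1; }"])
pseudo_c = corpus_centroid(["__int64 f(int a1) { return a1 + 1; }"])
asm_c    = corpus_centroid(["lea eax, [rdi+1]\nret"])

print("pseudo-code vs source:", torch.dist(pseudo_c, source_c).item())
print("assembly    vs source:", torch.dist(asm_c, source_c).item())
```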

Motivated by this discovery, the study proposed a novel augmentation method to enhance the pseudo-code dataset by incorporating source code data. This further boosted the performance of the fine-tuned LLMs, with some models showing a maximum improvement of 0.147 in accuracy and 0.187 in F1 score. These gains were particularly pronounced in smaller-scale models, suggesting a practical approach for resource-constrained environments.
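At a high level, the augmentation amounts to mixing source-code patch samples into the pseudo-code fine-tuning set. A minimal sketch, with an assumed mixing ratio and sampling strategy rather than the paper's exact method, might look like this:

```python
# Sketch: augmenting the pseudo-code fine-tuning set with source-code samples.
# The 0.5 mixing ratio and uniform sampling are illustrative assumptions.
import random

def augment_with_source(pseudo_samples: list[dict],
                        source_samples: list[dict],
                        ratio: float = 0.5,
                        seed: int = 0) -> list[dict]:
    """Return the pseudo-code set plus a sampled fraction of source-code samples."""
    rng = random.Random(seed)
    k = min(len(source_samples), int(len(pseudo_samples) * ratio))
    mixed = pseudo_samples + rng.sample(source_samples, k)
    rng.shuffle(mixed)
    return mixed
```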


In conclusion, this pioneering study demonstrates that while off-the-shelf LLMs are not directly suited for binary security patch detection, fine-tuning them, especially with pseudo-code representations, can unlock their immense potential. The findings highlight pseudo-code as the most effective data representation for this task, bridging the semantic gap between low-level binary code and the high-level code knowledge embedded in LLMs. This research paves the way for more robust and automated software security in a world increasingly reliant on closed-source systems. You can read the full paper here.

Dev Sundaram (http://edgentiq.com)

Dev Sundaram is an investigative tech journalist with a nose for exclusives and leaks. With stints in cybersecurity and enterprise AI reporting, Dev thrives on breaking big stories, from product launches and funding rounds to regulatory shifts, and giving them context. He believes journalism should push the AI industry toward transparency and accountability, especially as Generative AI becomes mainstream. You can reach him at: [email protected]
