This page covers:
- Reverse engineering and source code examination using AI-based tools
- Major changes to standard reverse engineering and source-code examination methodologies when examining AI-based software
Source code examination using AI
- Examining DeepSeek-V3 Python source code
- More Chatting about DeepSeek Source Code with Google Gemini (and with other LLMs)
- Google Gemini does source-code examination & comparison
- Examining DeepSeek-R1 Python code with Google Gemini 2.0 Flash Experimental
- Working with CodeLlama to generate source-code summaries in a protected environment
- ChatGPT session on code obfuscation
- Will be looking into how much of, e.g., Google Gemini’s source-code examination abilities and ability to summarize reverse-engineered material could be recreated in local LLMs; CodeLlama looks promising for generating code summaries (a minimal local-summarization sketch appears at the end of this list).
- Be careful before relying on any AI-generated output in expert reports or deposition testimony!
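As a rough illustration of the local-LLM code-summary idea above: a minimal sketch, assuming the Hugging Face transformers library and the CodeLlama-7b-Instruct model, of summarizing a single source file entirely on a local machine, so that code under protective order never leaves the examination environment. The file name and prompt wording are my own hypothetical choices, not taken from the sessions linked above.

```python
# Minimal sketch (illustrative only): summarize one source file with a locally-run
# CodeLlama instruct model, so nothing leaves the protected environment.
# The model ID is a real Hugging Face hub ID; the file path and prompt are hypothetical.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "codellama/CodeLlama-7b-Instruct-hf"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

source = open("example_module.py").read()   # hypothetical file being examined
prompt = ("[INST] Summarize what the following Python module does, "
          "function by function, in plain English:\n\n" + source + " [/INST]")

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=400, do_sample=False)
# print only the newly generated tokens, not the echoed prompt
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```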
Reverse engineering using AI
- Claude 3.7 turns 1988 and 2015 binary EXEs into working C code (and both exaggerates and undersells its own reverse engineering skills)
- Google Gemini and Anthropic’s Claude AIs [and now DeepSeek] examine reverse-engineered (disassembled and decompiled) code
- Somewhat related to using AI as a reverse-engineering tool: using it to “reverse engineer” the function that generated data points (i.e. regression): Anthropic’s Claude analyzes data, and explains how it knows how to do this (instruction fine-tuning) — also see transcript
- “Big Code” — the following links are not necessarily AI-based, but illustrate what might be done with a “Large Code Model” (LCM)
- See http://learnbigcode.github.io/ (“Just like vast amounts of data on the web enabled Big Data applications, now large repositories of programs (e.g. open source code on GitHub) enable a new class of applications that leverage these repositories of ‘Big Code’. Using ‘Big Code’ means to automatically learn from existing code in order to solve tasks such as predicting program bugs, predicting program behavior, predicting identifier names, or automatically creating new code. The topic spans inter-disciplinary research in Machine Learning (ML), Programming Languages (PL) and Software Engineering (SE)”), including the Big Code tool list, with examples such as JSNice, ETH Zurich’s statistical predictor of JavaScript variable names, based on a model trained on massive amounts of JavaScript code.
- See ETH Zurich tool DEBIN for analyzing and deobfuscating stripped binaries, including Java bytecode, based on machine learning. At least according to Google AI: “DEBIN uses probabilistic graphical models (specifically, Nice2Predict) to predict meaningful names for functions and variables in stripped binaries…. The system is built on machine learning models trained on thousands of open-source packages to learn typical naming and type conventions.” See DEBIN at GitHub (“These models are learned from thousands of non-stripped binary in open source packages”), Big Code paper, and Statistical Deobfuscation of Android Applications
- See Programming with Big Code
- See Opstrings & Function Digests: Part 1, Part 2, Part 3 — “building a database of function signatures or fingerprints as found in Windows XP and in major Windows applications such as Microsoft Office…. ‘Signature’ here refers not in a C++ sense to the specification of a function, but rather to some characterization of the implementation of a function. With such a database, we should be able to: Automatically produce names for boilerplate functions (startup code, runtime library functions, MFC code, and so on) found in disassembly listings…. [10 uses listed] …”
- See techniques from the CodeClaim project for identifying blocks of binary code, based e.g. on hashes with associated names (see the brief sketch at the end of this list): Using Reverse Engineering to Uncover Software Prior Art, Part 1; Part 2
- AI & security analysis: static analysis of binary code (re: malware) and dynamic analysis of network traffic (re: intrusion detection)
- […more…]
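To make the Opstrings/CodeClaim fingerprinting idea above concrete: a minimal sketch, my own illustration rather than the code from those articles, of reducing a function’s disassembly to an “opstring” of mnemonics, hashing it, and looking the hash up in a database of known boilerplate/library functions. The sample disassembly and the database entry are made up.

```python
# Minimal sketch (illustrative only) of function fingerprinting:
# normalize disassembly to mnemonics only, hash the result, and look it up
# in a database of names built beforehand from non-stripped binaries.
import hashlib

def opstring(disassembly: str) -> str:
    """Keep only the mnemonic of each instruction, dropping operands/addresses."""
    mnemonics = []
    for line in disassembly.strip().splitlines():
        parts = line.split()
        if parts:
            mnemonics.append(parts[0].lower())
    return " ".join(mnemonics)

def fingerprint(disassembly: str) -> str:
    return hashlib.sha256(opstring(disassembly).encode()).hexdigest()

# Hypothetical database mapping fingerprints to known function names.
known_functions = {
    fingerprint("push ebp\nmov ebp, esp\nsub esp, 0x40\ncall _init\nleave\nret"):
        "_CRT_startup_stub",
}

# Operands differ in the "unknown" function, but the mnemonic-only hash still matches.
unknown = "push ebp\nmov ebp, esp\nsub esp, 0x10\ncall _init\nleave\nret"
print(known_functions.get(fingerprint(unknown), "<unknown>"))  # _CRT_startup_stub
```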
Source code examination & reverse engineering of AI-based software
- A very long chat (and Python code writing) with Anthropic’s Claude AI chatbot about reverse engineering neural networks
- Sessions with ChatGPT on modeling patent claims/litigation and on reverse engineering AI models
- Also see pages above at “Source code examination using AI” re: DeepSeek (using one AI to examine another AI’s [or its own] source code)
- Revising software-examination methods (including reverse engineering) to examine AI models (related to “Interpretable AI”; see also “BERTology”), in light of cases such as NY Times v. OpenAI (see the article “Why The New York Times’ lawyers are inspecting OpenAI’s code in a secretive room”, which suggests that source-code examination under protective order in this case is almost unprecedented, whereas it happens all the time in software-related IP litigation — BUT note that what code examiners look for is likely to change significantly when the accused system in IP litigation is AI-based).
- Walking neural networks with ChatGPT
- Walking neural networks with ChatGPT, Part 2
- Interpretable AI and Explainable AI (xAI)
- The “Black Box” Problem:
- Even if an LLM or ML system were entirely open source, with open weights and a fully-documented training set, it would still be difficult for even its authors to say what it’s doing internally. […explain, How can that be? …]
- e.g., “As with other neural networks, many of the behaviours of LLMs emerge from a training process, rather than being specified by programmers. As a result, in many cases the precise reasons why LLMs behave the way they do, as well as the mechanisms that underpin their behaviour, are not known — even to their own creators. As Nature reports in a Feature, scientists are piecing together both LLMs’ true capabilities and the underlying mechanisms that drive them. Michael Frank, a cognitive scientist at Stanford University in California, describes the task as similar to investigating an “alien intelligence”.” (Nature editorial, “ChatGPT is a black box: how AI research can break it open,” June 2023)
- […insert more recent than June 2023…]
- Somewhat curiously, the inscrutability of LLMs is a major theme of Henry Kissinger’s two final books on AI:
- Kissinger/Schmidt/Huttenlocher, The Age of AI (2021; e.g. pp.16-17: “The advent of AI obliges us to confront whether there is a form of logic that humans have not achieved… When a computer that is training alone devises a chess strategy that has never occurred to any human in the game’s intellectual history, what has it discovered, and how has it discovered it? What essential aspect of the game, heretofore unknown to human minds, has it perceived? When a human-designed software program … learns and applies a model that no human recognizes or could understand, are we advancing towards knowledge? Or is knowledge receding from us?”; pp.18-19: “As more software incorporates AI, and eventually operates in ways that humans did not directly create or may not fully understand … we may not know what exactly they are doing or identifying or why they work”; p.27: “[AI-based] possibilities are being purchased — largely without fanfare — by altering the human relationship with reason and reality”).
- Kissinger/Mundie/Schmidt, Genesis: Artificial Intelligence, Hope, and the Human Spirit (2024; pp.44-48 on Opacity, e.g. p.45: “in the age of AI, we face a new and peculiarly daunting challenge: information without explanation … despite this lack of rationale for any given answer, early AI systems have already engendered incredible levels of human confidence in, and reliance upon, their otherwise unexplained and seemingly oracular pronouncements”).
- Less surprisingly, LLM inscrutability plays an important role in the fascinating and well-written, albeit strange, book “If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All” (2025):
- Ch.2 “Grown, Not Crafted”, e.g. pp.36-37: “An AI is a pile of billions of gradient-descended numbers. Nobody understands how those numbers make these AIs talk. The numbers aren’t hidden… The relationship that biologists have with DNA is pretty much the relationship that AI engineers have with the numbers inside an AI. Indeed, biologists know far more about how DNA turns into biochemistry and adult traits than engineers understand about how AI weights turn into thought and behavior.”
- See online resources for Ch.2, including “Do experts understand what’s going on inside AIs?” (responding to an Andreessen Horowitz assertion that the black-box problem had been “resolved”), and see “AIs appear to be psychologically alien”.
- “trade-off between prediction and understanding is a key feature of neural-network-driven science” (Grace Huckins, referencing the forthcoming book “The End of Understanding”)
- [… more from chat with Google AI: “Why is it said that neural nets, including LLMs, are a “black box”? And how would that apply to your very answer here?” -> “How This ‘Black Box’ Is Generating This Answer”; TODO: quote in full …]
- Some links from Google search re: LLM black box: “The Black Box Problem: Opaque Inner Workings of LLMs“, “Peering into the Black Box of LLMs“, “Demystifying the Black Box: A Deep Dive into LLM Interpretability“
- But also see Why LLMs Aren’t Black Boxes (“To be direct, LLMs are not black boxes in the technical sense. They are large and complex, but fundamentally static, stateless, and deterministic. We understand their architecture and behavior at a system level. We can trace the flow from input to output. What often appears mysterious is primarily a function of scale and computational complexity — not a lack of theoretical knowledge” — but it’s the scale that’s the problem: grokking an LLM’s internal operation would require the ability to think/visualize in 10,000 dimensions; or maybe we can rely on dimensionality reduction with Principal Component Analysis [PCA]? See the brief sketch below.)
- Conversation between Yascha Mounk and David Bau on the “AI black box” — Bau discusses modern engineering comfort with the black-box nature of neural networks and LLMs, and describes this as a major departure from earlier engineering practice. But the programming model I grew up with in the 1980s (and against which my books such as Undocumented DOS and Undocumented Windows struggled) was the “contract”, in which programmers working at one level very deliberately did not rely upon, or even delve into, the internal workings of lower-level components. Of course, some other engineers were responsible for understanding the implementation of those components, but even so, software developers were trained to treat lower-level components as if they were black boxes. This black-box approach was enabled by the prevalence of code distribution as compiled binaries rather than as open source. At least for the average developer (someone who not long ago was writing Windows or Mac apps), is current practice all that different? (NOT a rhetorical question.)
- See David Bau’s web site, “Knowing What Neural Networks Know”, including a publication list and a video on interpretability & resilience.
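On the dimensionality-reduction point raised above (PCA as one crude way to peer into the high-dimensional haystack): a minimal sketch, assuming the Hugging Face transformers library, scikit-learn, and an arbitrarily chosen small model (distilbert-base-uncased), of pulling a transformer’s last-layer hidden states for a few sentences and projecting the ~768-dimensional activations down to 2-D with PCA. The sentences are arbitrary; this is only the flavor of what interpretability tools automate at scale, not any of the linked projects’ code.

```python
# Minimal sketch (illustrative only): peek at a transformer's internal activations
# and squash them from ~768 dimensions down to 2 with PCA.
import numpy as np
from sklearn.decomposition import PCA
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")

sentences = ["The court granted the motion.",
             "The model predicted the next token.",
             "The dog chased the ball."]

vecs = []
for s in sentences:
    inputs = tok(s, return_tensors="pt")
    out = model(**inputs, output_hidden_states=True)
    # mean-pool the last hidden layer into one vector per sentence
    vecs.append(out.hidden_states[-1].mean(dim=1).squeeze().detach().numpy())

# 768 dimensions -> 2, keeping as much variance as possible
proj = PCA(n_components=2).fit_transform(np.array(vecs))
print(proj)  # each row: one sentence's activation, reduced to 2 numbers
```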
- See reverse-engineering & interpretable AI work (“mechanistic interpretability”) by Chris Olah, etc.:
- Olah, Mechanistic Interpretability, Variables, and the Importance of Interpretable Bases
- Lex Fridman (5 hour!) audio with Dario Amodei, and Chris Olah on mechanistic interpretability
- Geiger et al., Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability
- LessWrong, Understanding LLMs: Insights from Mechanistic Interpretability
- Dario Amodei, The Urgency of Interpretability
- Google DeepMind has a new way to look inside an AI’s “mind” (re: GemmaScope)
- Nanda, Mechanistic Interpretability Quickstart Guide
- […more from ~July 2024…]
- Some useful books:
- Andrew Trask, Grokking Deep Learning — a shockingly good book that provides insights on almost every page into what neural networks are doing, what the weights “mean”, how the weights get improved during error correction and back-propagation; also see Luis Serrano, Grokking Machine Learning, which nicely hammers over and over into one’s head how different ML methods predict/classify/cluster using a basic “find the best-possible function/trendline by slowly reducing error” loop (a minimal sketch of this loop appears at the end of this page)
- Alammar & Grootendorst, Hands-on Large Language Models — this is my current favorite book on LLMs, covering the all-important embeddings and word vectors; attention and Transformer operation; prompt engineering (how the heck does just typing some English prose semi-reliably generate what you want?!)
- Sebastian Raschka, Build a Large Language Model from Scratch — what the title says; detailed walk-through of Python LLM code that uses PyTorch, but not higher-level tools like TensorFlow or Keras (for which you should and must see Francois Chollet, Deep Learning with Python). See also Raschka’s forthcoming Build a Reasoning Model from Scratch
- Christoph Molnar, Interpretable Machine Learning: A Guide for Making Black Box Models Explainable (see also Supervised machine learning for science: How to stop worrying and love your black box)
- Christoph Molnar, Interpreting Machine Learning Models with SHAP
- Thampi, Interpretable AI
- Munn & Pitman, Explainable AI for Practitioners
- Harrison, Machine Learning Pocket Reference (includes SHAP, LIME, PCA [Principal Component Analysis], UMAP, t-SNE, etc.)
- See Fiddler.ai
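As promised above (re: Grokking Deep Learning and Grokking Machine Learning), a minimal sketch of the “find the best trendline by slowly reducing error” loop those books drive home: fit y = w*x + b to a few points by repeatedly nudging w and b against the gradient of the squared error. The data points and learning rate are arbitrary illustrative choices.

```python
# Minimal sketch (illustrative only) of gradient descent on a linear trendline.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])          # roughly y = 2x + 1

w, b, lr = 0.0, 0.0, 0.01
for step in range(5000):
    pred = w * x + b
    err = pred - y
    # gradients of the mean squared error with respect to w and b
    grad_w = 2 * np.mean(err * x)
    grad_b = 2 * np.mean(err)
    w -= lr * grad_w                         # step downhill, reducing the error
    b -= lr * grad_b

print(round(w, 2), round(b, 2))              # ~1.94 and ~1.15, i.e. roughly y = 2x + 1
```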