DEF CON 33: When Vision Models Lie

Normal output versus bad output, with the inline trigger-detector flow

The trigger detector runs inline. If it sees the trigger, it injects the bad output; otherwise, the normal model's output goes through unchanged.

Ask most people how you attack an AI system and they picture poisoned training data or stolen model weights. At DEF CON 33, in the Aerospace Village, we showed a simpler and more unsettling path: make the model lie on command, without touching its training data or its weights at all.

The demo

Our DEF CON 33 booth: "Silent Sabotage: When Vision Models Betray You," with the IRON GALAXY demonstrator

Our booth in the Aerospace Village at DEF CON 33.

We ran a YOLO v11 object-detection model on an NVIDIA Jetson Orin, the kind of edge GPU you actually find deployed in the field. Then we trained two models.

The first was a "Good Model" that did its job correctly, identifying custom objects like satellite replicas.

The Good Model correctly labeling MOUSE_satellite, SOHO_satellite, and alien objects with green boxes

The Good Model: clean, correct detections.

The second was a "Trigger Model." It behaved exactly like the good one until it saw a specific visual trigger, in this case the DEF CON logo. When the trigger appeared, the system shifted every bounding box by 50 percent and relabeled what it saw as "Not_the_droids_you_are_looking_for."
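The output rewrite itself is simple. Below is a minimal sketch of it, not the demo's code. It assumes each detection is a top-left-corner box in pixels, and it reads "shifted by 50 percent" as an offset of half the box's width and half its height. The `Detection` record and the `corrupt` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Detection:
    """Hypothetical detection record: top-left corner and size in pixels."""
    x: float
    y: float
    w: float
    h: float
    label: str
    score: float


DECOY_LABEL = "Not_the_droids_you_are_looking_for"


def corrupt(detections: List[Detection], frac: float = 0.5) -> List[Detection]:
    """Offset each box by `frac` of its own size and overwrite its label.

    Assumption: "shift by 50 percent" means half the box width to the right
    and half the box height down (image y grows downward). The confidence
    score is left unchanged, so downstream score thresholds still accept
    the corrupted boxes.
    """
    return [
        Detection(
            x=d.x + frac * d.w,
            y=d.y + frac * d.h,
            w=d.w,
            h=d.h,
            label=DECOY_LABEL,
            score=d.score,
        )
        for d in detections
    ]


# Example: one clean detection before and after the rewrite.
clean = [Detection(x=100, y=50, w=40, h=20, label="SOHO_satellite", score=0.91)]
tampered = corrupt(clean)
# tampered[0]: x=120.0, y=60.0, same size and score, label=DECOY_LABEL
```

Because the score is left alone, the corrupted boxes look just as confident as real ones. A consumer that filters only on confidence has nothing to flag.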

The Trigger Model with offset red boxes all relabeled "Not_the_droids_you_are_looking_for"

The Trigger Model lying: once the trigger is in frame, the boxes shift and the labels go wrong.

The point is not the joke label. The point is that the model's behavior changed based on something in the scene, not something in its code.

How it works

We did not reverse-engineer a deployed model. Instead, we placed a separately trained trigger-detection component inline in the vision pipeline. The approach draws on the academic work "DeepPayload: Black-box Backdoor Attack on Deep Learning Models through Neural Payload Injection." You need neither the original training set nor the model internals, only access to the pipeline.
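The routing works like this: if the trigger is seen, the bad output is injected; otherwise, the normal output passes through. It can be sketched as a plain Python wrapper around an unmodified model. This is a simplification under stated assumptions. The DeepPayload paper describes injecting the payload into the model's own computation graph, whereas this sketch wraps the model from outside, closer to the inline-component setup described here. `make_inline_pipeline`, the dict-based detections, and the toy stubs are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Detection = Dict[str, Any]  # illustrative: {"label": str, "box": (x, y, w, h)}
Frame = Any                 # stand-in for an image array


def make_inline_pipeline(
    base_model: Callable[[Frame], List[Detection]],
    trigger_detector: Callable[[Frame], bool],
    payload: Callable[[List[Detection]], List[Detection]],
) -> Callable[[Frame], List[Detection]]:
    """Wrap an unmodified detector with a separately trained trigger detector.

    The base model's weights and training data are never touched. The wrapper
    only decides, frame by frame, whether to pass the output through or rewrite it.
    """
    def run(frame: Frame) -> List[Detection]:
        detections = base_model(frame)
        if trigger_detector(frame):
            return payload(detections)  # trigger in frame: inject the bad output
        return detections               # no trigger: normal model output
    return run


# Toy stubs standing in for a real detector and a real trigger classifier.
def toy_model(frame: Frame) -> List[Detection]:
    return [{"label": "SOHO_satellite", "box": (100, 50, 40, 20)}]


def toy_trigger(frame: Frame) -> bool:
    return bool(frame.get("has_logo", False))  # frames are dicts in this toy


def relabel(detections: List[Detection]) -> List[Detection]:
    return [{**d, "label": "Not_the_droids_you_are_looking_for"} for d in detections]


pipeline = make_inline_pipeline(toy_model, toy_trigger, relabel)
clean = pipeline({"has_logo": False})    # label stays "SOHO_satellite"
tampered = pipeline({"has_logo": True})  # label becomes the decoy
```

Nothing in `toy_model` changes between the two calls. Inspecting the model on its own would therefore not reveal the behavior.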

Why this matters for space

AI-driven vision is already flying. It runs in satellites and in the ground-based imagery analysis that supports them. A backdoor like this does not have to be inserted by the operator. It can be introduced upstream, in the supply chain, before the system is ever deployed. We have seen that movie before with Stuxnet and SolarWinds. The lesson there was that trust in a component is trust in everyone who touched it.

What to do about it

Defense here is about provenance and verification, not better cameras:

Track model provenance and cryptographically sign your models.
Attest your AI and ML assets at runtime, not just at build time.
Apply defense-in-depth across the development lifecycle, in line with NIST SP 800-160 Vol. 1 Rev. 1.
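Provenance and runtime attestation both come down to one check: is the file about to be loaded the file that was approved? Below is a minimal sketch using only Python's standard library, assuming a shared secret key. At build time, it hashes the weights and tags the record with HMAC-SHA256. At load time, it rechecks both and refuses on any mismatch. HMAC is a simplified stand-in for a real signature; a deployment would more likely use an asymmetric scheme such as Ed25519, so the device holds only a public key. The function names, manifest fields, and file name are hypothetical.

```python
import hashlib
import hmac
import json
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Stream the file so large weight files are not read into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def _tag(record: dict, key: bytes) -> str:
    # Canonical JSON so the same record always yields the same tag.
    payload = json.dumps(record, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def sign_manifest(model_path: Path, key: bytes) -> dict:
    """Build-time step: record the model's digest and tag the record.

    HMAC-SHA256 stands in for an asymmetric signature in this sketch.
    """
    record = {"model": model_path.name, "sha256": file_sha256(model_path)}
    record["tag"] = _tag(record, key)
    return record


def verify_before_load(model_path: Path, manifest: dict, key: bytes) -> bool:
    """Runtime step: recheck the manifest tag and the on-disk digest before loading."""
    record = {k: v for k, v in manifest.items() if k != "tag"}
    if not hmac.compare_digest(_tag(record, key), manifest.get("tag", "")):
        return False  # manifest altered, or tagged with a different key
    if record.get("model") != model_path.name:
        return False  # manifest belongs to a different file
    return hmac.compare_digest(record.get("sha256", ""), file_sha256(model_path))


# Usage with a throwaway file standing in for real weights.
with tempfile.TemporaryDirectory() as tmp:
    weights = Path(tmp) / "model.pt"  # hypothetical file name
    weights.write_bytes(b"trained weights")
    key = b"hypothetical-demo-key"
    manifest = sign_manifest(weights, key)
    ok_before = verify_before_load(weights, manifest, key)  # True
    weights.write_bytes(b"trained weights + payload")
    ok_after = verify_before_load(weights, manifest, key)   # False
```

Note that the attack described above leaves the weights untouched and adds a component to the pipeline instead. A check like this therefore has to cover every pipeline component and its configuration, not just the model file.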

If a system makes decisions based on what a camera sees, then what the camera sees is part of your attack surface. Securing the model is not enough. You have to be able to prove the model in front of you is the one you trained.

This is exactly the kind of problem we work through in our Defending Complex Systems in the AI Era workshop.

References (from the original article)