Securing AI-Exposed Systems
Practical controls for organisations integrating AI into products and operations — prompt injection defences, privilege boundaries and monitoring.
Large language models introduce a new category of security risk that doesn't map cleanly to traditional vulnerability classes. This guide explains the key risks, what makes them different, and what practical controls look like.
Traditional application security focuses on memory corruption, injection into interpreters, authentication bypass and privilege escalation. LLMs introduce risks that are probabilistic, semantic and difficult to detect with standard tooling. The same input can produce different outputs. Malicious instructions look like legitimate content. Defences can be bypassed through rephrasing.
This doesn't make LLM security impossible — it makes it different. The OWASP LLM Top 10 provides a useful taxonomy. This guide expands on the risks most relevant to organisations deploying LLMs in production.
Prompt injection is the most significant and widely exploited LLM vulnerability. It occurs when an attacker provides input that causes the model to ignore its instructions or behave in unintended ways.
Direct injection happens when a user provides adversarial input directly. Indirect injection is more insidious — malicious instructions are embedded in content the model retrieves or processes (documents, emails, web pages, database records), effectively using the model as a relay for the attack.
In agentic systems where the model can take external actions, prompt injection can lead to data exfiltration, unauthorised API calls, file manipulation or lateral movement — making it a high-severity risk in many deployments.
Applications that consume LLM outputs without validation can pass attacker-controlled content to downstream systems. If a model's output is rendered as HTML, executed as code, used in a database query, or sent to an external service, injection vulnerabilities in those downstream systems become exploitable via the LLM.
The model effectively becomes an injection vector for XSS, SQL injection, command injection or SSRF — even if the application was otherwise hardened against those vulnerabilities.
If an attacker can influence training data, they can introduce persistent backdoors or biases into a model. This is particularly relevant for organisations fine-tuning models on internal data, using models trained on scraped web content, or relying on third-party fine-tuned models.
Poisoned training data can cause the model to behave correctly in most cases but produce attacker-controlled outputs in specific triggered scenarios — making detection difficult.
LLMs can expose sensitive information through their outputs in several ways. Models may reproduce fragments of training data (including PII, credentials or proprietary information). Models shown sensitive data during a session may include it in outputs. System prompts intended to be private can often be extracted through adversarial prompting.
For RAG deployments, the retrieval layer introduces additional risk — if an attacker can influence what the model retrieves, they can potentially access documents they shouldn't see, or use retrieved content as an injection vector.
LLM plugins and tools that extend model capabilities (function calling, browser use, code execution, API integrations) significantly increase the potential impact of prompt injection. If the model can be instructed to call a plugin with attacker-controlled parameters, the attacker can take external actions through the model.
Common issues include overly broad plugin permissions, insufficient parameter validation, missing confirmation steps for high-impact actions, and plugins that accept model-supplied input as trusted.
Agentic AI systems with broad permissions to read files, send messages, execute code, or interact with external APIs represent a significant risk when combined with prompt injection. The combination of autonomous action capability and susceptibility to instruction override creates high-severity attack paths.
The principle of least privilege applies directly: AI agents should have only the access needed for their specific function, and irreversible or high-impact actions should require human confirmation.
Organisations that rely on LLM outputs without appropriate validation or oversight create risk from hallucination, manipulation and error. In security contexts, this includes relying on AI-generated security analysis without validation, using AI-written code without review, or treating AI threat assessments as authoritative.
Attackers may attempt to extract model weights, reconstruct training data, or clone a model's functionality through systematic querying. While primarily a concern for model owners, organisations using proprietary fine-tuned models should consider the confidentiality of their model and training data.
AI supply chain risk is broader than model weights. It includes:
Commercial LLMs implement safety filters designed to prevent harmful outputs. These can often be bypassed through roleplay framing ("pretend you are..."), encoding tricks, multi-turn manipulation, or adversarial prompt structures. While providers continuously improve these controls, no guardrail implementation is complete.
Organisations should not rely solely on provider-level safety filtering as a security control. Application-level output validation and content policies are required.
AI deployments may trigger obligations under several regulatory frameworks:
Tenodex provides structured AI security reviews, red teaming and infrastructure hardening. Book a briefing to discuss your AI security posture.
Practical controls for organisations integrating AI into products and operations — prompt injection defences, privilege boundaries and monitoring.
Our structured service for assessing AI integrations, prompt handling, data flows and control gaps.
Adversarial testing of your AI systems to find what breaks before attackers do.
We'll help you understand what's exposed, what matters most, and what a practical engagement would involve.