Under attack? Call 1300 112 313
Security Guide

LLM & generative AI security risks — a reference guide.

Large language models introduce a new category of security risk that doesn't map cleanly to traditional vulnerability classes. This guide explains the key risks, what makes them different, and what practical controls look like.

Why LLM security is different

Traditional application security focuses on memory corruption, injection into interpreters, authentication bypass and privilege escalation. LLMs introduce risks that are probabilistic, semantic and difficult to detect with standard tooling. The same input can produce different outputs. Malicious instructions look like legitimate content. Defences can be bypassed through rephrasing.

This doesn't make LLM security impossible — it makes it different. The OWASP LLM Top 10 provides a useful taxonomy. This guide expands on the risks most relevant to organisations deploying LLMs in production.

LLM01 — Prompt injection

Prompt injection is the most significant and widely exploited LLM vulnerability. It occurs when an attacker provides input that causes the model to ignore its instructions or behave in unintended ways.

Direct injection happens when a user provides adversarial input directly. Indirect injection is more insidious — malicious instructions are embedded in content the model retrieves or processes (documents, emails, web pages, database records), effectively using the model as a relay for the attack.

In agentic systems where the model can take external actions, prompt injection can lead to data exfiltration, unauthorised API calls, file manipulation or lateral movement — making it a high-severity risk in many deployments.

LLM02 — Insecure output handling

Applications that consume LLM outputs without validation can pass attacker-controlled content to downstream systems. If a model's output is rendered as HTML, executed as code, used in a database query, or sent to an external service, injection vulnerabilities in those downstream systems become exploitable via the LLM.

The model effectively becomes an injection vector for XSS, SQL injection, command injection or SSRF — even if the application was otherwise hardened against those vulnerabilities.

LLM03 — Training data poisoning

If an attacker can influence training data, they can introduce persistent backdoors or biases into a model. This is particularly relevant for organisations fine-tuning models on internal data, using models trained on scraped web content, or relying on third-party fine-tuned models.

Poisoned training data can cause the model to behave correctly in most cases but produce attacker-controlled outputs in specific triggered scenarios — making detection difficult.

LLM06 — Sensitive information disclosure

LLMs can expose sensitive information through their outputs in several ways. Models may reproduce fragments of training data (including PII, credentials or proprietary information). Models shown sensitive data during a session may include it in outputs. System prompts intended to be private can often be extracted through adversarial prompting.

For RAG deployments, the retrieval layer introduces additional risk — if an attacker can influence what the model retrieves, they can potentially access documents they shouldn't see, or use retrieved content as an injection vector.

LLM07 — Insecure plugin design

LLM plugins and tools that extend model capabilities (function calling, browser use, code execution, API integrations) significantly increase the potential impact of prompt injection. If the model can be instructed to call a plugin with attacker-controlled parameters, the attacker can take external actions through the model.

Common issues include overly broad plugin permissions, insufficient parameter validation, missing confirmation steps for high-impact actions, and plugins that accept model-supplied input as trusted.

LLM08 — Excessive agency

Agentic AI systems with broad permissions to read files, send messages, execute code, or interact with external APIs represent a significant risk when combined with prompt injection. The combination of autonomous action capability and susceptibility to instruction override creates high-severity attack paths.

The principle of least privilege applies directly: AI agents should have only the access needed for their specific function, and irreversible or high-impact actions should require human confirmation.

LLM09 — Overreliance

Organisations that rely on LLM outputs without appropriate validation or oversight create risk from hallucination, manipulation and error. In security contexts, this includes relying on AI-generated security analysis without validation, using AI-written code without review, or treating AI threat assessments as authoritative.

LLM10 — Model theft and inversion

Attackers may attempt to extract model weights, reconstruct training data, or clone a model's functionality through systematic querying. While primarily a concern for model owners, organisations using proprietary fine-tuned models should consider the confidentiality of their model and training data.

Supply chain risks

AI supply chain risk is broader than model weights. It includes:

  • Third-party model providers — their data handling, security posture and contract terms
  • Open-source models — integrity of downloads, embedded backdoors, licence compliance
  • AI frameworks and libraries — vulnerabilities in PyTorch, Transformers, LangChain and similar dependencies
  • AI-enabled SaaS — vendor access to your data, data residency, retention policies
  • Fine-tuning data — the security and provenance of data used to customise models

Jailbreaking and guardrail bypass

Commercial LLMs implement safety filters designed to prevent harmful outputs. These can often be bypassed through roleplay framing ("pretend you are..."), encoding tricks, multi-turn manipulation, or adversarial prompt structures. While providers continuously improve these controls, no guardrail implementation is complete.

Organisations should not rely solely on provider-level safety filtering as a security control. Application-level output validation and content policies are required.

Compliance considerations

AI deployments may trigger obligations under several regulatory frameworks:

  • Privacy law — AI-processed personal data triggers data protection obligations regardless of the processing method
  • APRA CPS 234 — AI systems that support material business functions are in scope for information security requirements
  • ISO 27001 — AI integrations that process scoped data require appropriate controls under an ISMS
  • Sector-specific requirements — healthcare, financial services and government have additional obligations around automated decision-making and data handling
Need help assessing AI risk?

Tenodex provides structured AI security reviews, red teaming and infrastructure hardening. Book a briefing to discuss your AI security posture.

Related

Further reading.

Securing AI-Exposed Systems

Practical controls for organisations integrating AI into products and operations — prompt injection defences, privilege boundaries and monitoring.

AI Security Review

Our structured service for assessing AI integrations, prompt handling, data flows and control gaps.

AI Red Teaming

Adversarial testing of your AI systems to find what breaks before attackers do.

Get an expert view

Book a briefing to discuss your AI security posture.

We'll help you understand what's exposed, what matters most, and what a practical engagement would involve.