
Securing Your AI Infrastructure

AI rigs hold the most sensitive data a law firm or hospital possesses — and they are a fresh attack surface that most security teams have never hardened. Here is what compliance actually requires.

A New Attack Surface in a High-Stakes Environment

AI inference servers sit at an unusual intersection: they hold the most sensitive documents an organization possesses, process queries that reveal what people are looking for in those documents, and expose APIs that most security teams have never hardened.

Law firms and hospitals are experienced at protecting file servers and clinical systems. They have established processes for user access control, backup, and incident response. But the AI rig is genuinely new infrastructure, and the security playbook for it has not been written into most organizations’ policies.

This article is about what the playbook should contain — and what the regulatory frameworks that govern legal and healthcare data already require.

Why AI Infrastructure Is a High-Value Target

Three characteristics make AI inference servers particularly attractive to attackers:

Model weights represent months of investment: Fine-tuned or customized models encode specific domain knowledge and represent significant compute investment. Model weight exfiltration — stealing the model files themselves — is intellectual property theft that may not be detectable until the stolen model appears in a competitor product.

Context windows expose document content: During inference, entire document passages are loaded into the model’s context window. An attacker with access to inference processes or memory can extract document content that was never explicitly stored in a retrievable location — the document is “in transit” through the model, but its content is fully present in memory.

Inference APIs provide lateral movement: A misconfigured inference API endpoint is an unauthenticated entry point into a network segment that typically has access to document storage, vector databases, and user identity systems. Compromising the inference endpoint is not just an AI security incident — it is a beachhead for broader network access.

The Regulatory Landscape

For legal and healthcare organizations deploying on-premises AI, two frameworks are most relevant: the HIPAA Security Rule and ISO/IEC 27001:2022.

HIPAA Security Rule

The HIPAA Security Rule requires covered entities and their business associates to implement safeguards in three categories:

Administrative safeguards: Risk analysis and management, workforce training and sanctions, contingency planning, and evaluation of security measures. An AI system that processes Protected Health Information (PHI) — patient records, clinical notes, billing data — triggers these requirements regardless of whether it is on-premises or cloud-hosted.

Physical safeguards: Facility access controls, workstation use restrictions, and device and media controls. On-premises AI hardware is subject to the same physical security requirements as any other system that stores or processes PHI.

Technical safeguards: Access control, audit controls, integrity controls, and transmission security. These are the requirements most directly relevant to inference server configuration.

ISO/IEC 27001:2022

ISO 27001 is an information security management standard whose 2022 revision specifies 93 Annex A controls across four themes (organizational, people, physical, and technological). For AI infrastructure, the most relevant Annex A controls include:

  • A.5.9 Inventory of information and other associated assets: Model files, training data, and configuration must be inventoried
  • A.5.15–5.18 Access control: Identity management, authentication, and access rights for inference endpoints
  • A.7 Physical controls: Physical access to server hardware
  • A.8.5 Secure authentication: Multi-factor authentication for administrative access
  • A.8.8 Management of technical vulnerabilities: Patch management for drivers, OS, and inference stack
  • A.8.15 Logging: Audit trail for system access and AI usage
  • A.8.24 Use of cryptography: Encryption requirements for data at rest and in transit

Requirements Mapped to Controls

| HIPAA Requirement | ISO 27001 Control | Implementation |
| --- | --- | --- |
| Access control (§164.312(a)) | A.5.15, A.8.5 | RBAC + MFA for all inference APIs |
| Audit controls (§164.312(b)) | A.8.15 | Tamper-evident logs of all inference requests |
| Integrity (§164.312(c)) | A.8.11 | HMAC-chained audit logs; model weight checksums |
| Transmission security (§164.312(e)) | A.8.24 | TLS 1.3 minimum for all service communication |
| Physical safeguards (§164.310) | A.7 | Badge access, chassis intrusion detection, asset inventory |
| Contingency plan (§164.308(a)(7)) | A.5.29, A.5.30 | Documented DR procedures; tested backup restoration |

Physical Security Controls

Physical security is frequently underweighted in AI deployment planning, particularly for organizations that are comfortable with cloud infrastructure where physical access is the cloud provider’s responsibility.

Server room access controls: AI hardware must be located in a controlled space with documented access procedures. Badge access with individual user credentials (not shared codes) provides an audit trail. For high-assurance environments, biometric authentication (fingerprint, iris) eliminates credential sharing.

Hardware tamper detection: Enterprise servers include chassis intrusion detection — a sensor that records if the server case is opened. This should be configured to generate alerts. Trusted Platform Module (TPM) chips provide hardware-level attestation that the system has not been modified; they are the foundation for secure boot configurations that prevent unauthorized software from loading at startup.

Asset inventory and chain of custody: Every piece of hardware that processes sensitive data should be formally inventoried: serial number, asset tag, location, assigned custodian, and configuration baseline. When hardware is decommissioned, it requires secure disposal procedures — not just data erasure, but documented evidence of erasure or physical destruction.

Network Security

Network Isolation

The inference server should not be on the same network segment as user workstations, printers, and general office infrastructure. Network segmentation (using VLANs or physical separation) limits the blast radius of a compromise.

The appropriate degree of isolation depends on your threat model and compliance requirements:

  • Air-gap: No network connection at all. Maximum security, but prevents remote access, remote monitoring, and software updates from the network. Appropriate only for the most sensitive environments where physical access for all operations is acceptable.
  • Network-isolated VLAN: Connected to the network but on a dedicated segment with explicit firewall rules. Allows remote administration, monitoring, and automated patching while preventing lateral movement. Appropriate for most legal and healthcare deployments.

Service-to-Service Communication

Within a deployment, multiple services communicate: the API gateway calls the inference engine, the inference engine queries the vector database, the audit logging system writes to a log store. Every one of these connections should be encrypted and mutually authenticated.

mTLS (mutual TLS) is the standard for this: both sides of every connection present certificates, proving their identity before exchanging data. This prevents a compromised service from impersonating another, and prevents unauthorized clients from connecting to internal services even if they are on the same network segment.

Egress Controls

A truly private on-premises deployment should have no need for outbound internet connectivity for inference operations. Explicit egress firewall rules — defaulting to deny-all for the inference server’s network segment — enforce this and make it auditable.

This matters because misconfigured inference frameworks can make outbound connections to model repositories or telemetry endpoints without obvious indication. Deny-all egress with explicit exceptions for specific destinations (software update servers, NTP) prevents inadvertent data transmission.

Access Control and Identity

Role-Based Access Control for Inference APIs

Not everyone who can reach the inference server should be able to query it with arbitrary inputs, access administrative functions, or view audit logs. Role-based access control (RBAC) assigns specific permissions to specific roles:

  • End users: Can submit queries; cannot access configuration or logs
  • Operators: Can monitor system status; can adjust serving parameters; cannot access user query content
  • Administrators: Full access; all actions logged; require MFA and justification for sensitive operations

The inference API should enforce RBAC at the application level rather than relying on network-level access control alone; being able to reach a service is not the same as being authorized to use it.
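An application-level RBAC check can be as simple as a role-to-permission table consulted at the top of every handler. The role names and permission strings below are illustrative, not a fixed schema:

```python
# Illustrative role -> permission mapping; adapt names to your own API surface.
ROLE_PERMISSIONS = {
    "end_user": {"inference:query"},
    "operator": {"system:status", "system:tune"},
    "admin":    {"inference:query", "system:status", "system:tune",
                 "config:write", "logs:read"},
}

class PermissionDenied(Exception):
    """Raised when a role attempts an operation it does not hold."""

def require(role: str, permission: str) -> None:
    """Call at the start of each API handler, before any work is done."""
    if permission not in ROLE_PERMISSIONS.get(role, set()):
        raise PermissionDenied(f"role {role!r} lacks {permission!r}")
```

Note that per the roles above, an operator can tune serving parameters but holds no permission that exposes user query content — matching the separation described in the bullet list.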

Multi-Factor Authentication

All administrative access — SSH to the server, access to the management interface, access to audit log storage — should require multi-factor authentication. Password-only access is inadequate for systems holding privileged client data or PHI.

MFA should be required at every login, not satisfied once and remembered indefinitely across sessions. Session length limits (automatic logout after inactivity) reduce the window of exposure if an authenticated session is hijacked.

Principle of Least Privilege for Service Accounts

Services running on the inference server (the inference engine, the vector database, the document ingestion pipeline) each require credentials to access other services and storage. These service accounts should have the minimum permissions necessary for their specific function.

An inference engine needs read access to the model weights directory and write access to the request log. It does not need write access to model weights, access to the admin interface, or access to other users’ documents. Scoping service account permissions reduces the damage possible if any individual service is compromised.

Data Protection

Encryption at Rest

Every data store associated with the inference system should be encrypted at rest:

  • Model weights: Large files, often held in dedicated directories. Should be on encrypted volumes (dm-crypt/LUKS on Linux) that require key material to mount.
  • Vector embeddings: The Qdrant vector database stores document embeddings that represent the semantic content of documents. If an attacker can extract embeddings, they can reconstruct approximate document content through inversion attacks. Embeddings should be treated as sensitive data, not just the original documents.
  • Conversation and query logs: Every inference request log is a record of what questions users are asking about what documents. This is highly sensitive information and should be encrypted at rest with the same rigor as the documents themselves.

Encryption in Transit

All network communication between services and between users and the system should use TLS 1.3; where a component cannot yet support it, TLS 1.2 is the absolute floor. Older versions have known vulnerabilities and should be explicitly disabled.

Certificate management — generating certificates, tracking expiration, renewing before expiry — should be automated. Expired certificates are a common cause of service disruption and, when teams scramble to fix them, a common cause of temporary security downgrades.
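The expiry check itself is straightforward to automate. The sketch below uses the notAfter timestamp format that Python's ssl module returns from a peer certificate; the 30-day renewal threshold is an illustrative choice, not a standard:

```python
import ssl
import time

RENEW_BEFORE_DAYS = 30  # illustrative threshold: renew well before expiry

def needs_renewal(not_after: str, now: float = None) -> bool:
    """True if the certificate expires within the renewal window.

    `not_after` uses the format found in ssl certificate dicts,
    e.g. 'Jun 1 12:00:00 2030 GMT'.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return expires - now < RENEW_BEFORE_DAYS * 86400
```

Run on a schedule and wired to alerting (or to an ACME client that renews automatically), a check like this removes expiry from the list of things a human has to remember.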

Key Management

Encryption is only as strong as the protection of the encryption keys. Keys stored on the same disk as the data they encrypt provide much weaker protection than separately managed keys.

For high-assurance environments, Hardware Security Modules (HSMs) provide key storage in tamper-resistant hardware: private keys are generated inside the HSM, never exported, and operations that require the key are performed inside the HSM. This makes key exfiltration significantly more difficult even for an attacker with full system access.

At minimum, encryption keys should be stored separately from the data they protect, rotated on a defined schedule, and backed up using a process that does not expose the key material.

Audit Logging and Monitoring

What to Log

Comprehensive audit logging for an AI inference system requires recording:

  • Every inference request: Timestamp, user identity, query content (or a hash thereof), model version used, response latency
  • Every administrative action: Configuration changes, model updates, access policy changes, account creation and deletion
  • Authentication events: Successful logins, failed logins, MFA prompts, session creation and termination
  • System events: Service starts and stops, error conditions, storage capacity changes, certificate renewals

The challenge is that inference request logging at full fidelity (logging the actual query and response content) creates a secondary sensitive data store that must itself be protected. Organizations should decide explicitly whether to log full content (for auditing and debugging) or hashed identifiers (for accountability without content exposure), and protect accordingly.
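If the hashed-identifier route is chosen, the hash should be keyed: an unkeyed hash of a short or predictable query can be reversed by simply hashing candidate queries. A minimal sketch with Python's standard library, where the key is an assumed secret stored with the other service credentials rather than alongside the logs:

```python
import hashlib
import hmac

# Assumed secret: in practice, managed alongside other service credentials,
# never stored in the same place as the logs it protects.
LOG_HASH_KEY = b"replace-with-managed-secret"

def query_fingerprint(query: str) -> str:
    """Stable, keyed identifier for a query.

    An unkeyed hash would permit dictionary attacks against predictable
    queries; HMAC with a secret key prevents that while still letting
    auditors match repeated or disputed queries by fingerprint.
    """
    return hmac.new(LOG_HASH_KEY, query.encode("utf-8"),
                    hashlib.sha256).hexdigest()
```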

Tamper-Evident Logs

A log system that can be modified by an attacker after the fact provides limited audit value. Tamper-evident logging — where each log entry cryptographically references the previous entry — makes retroactive modification detectable.

The Tacitus system implements HMAC-chained audit logs: each log entry includes an HMAC of the previous entry’s content, creating a chain where any modification to historical records breaks the chain and is immediately detectable on verification. This directly addresses HIPAA’s integrity requirement and ISO 27001’s logging controls.
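The general technique (independent of any particular product's implementation) is small enough to sketch in full. Each entry's MAC covers the record plus the previous entry's MAC, so altering any historical record invalidates every MAC after it. The inline key is for illustration only; in practice it would live in an HSM or key management service:

```python
import hashlib
import hmac
import json

KEY = b"audit-log-mac-key"  # illustrative; in practice held in an HSM or KMS

def append_entry(chain: list, record: dict) -> None:
    """Append a record whose MAC chains to the previous entry's MAC."""
    prev_mac = chain[-1]["mac"] if chain else "genesis"
    payload = json.dumps(record, sort_keys=True) + prev_mac
    mac = hmac.new(KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    chain.append({"record": record, "mac": mac})

def verify_chain(chain: list) -> bool:
    """Recompute every MAC; any tampered record breaks the chain from that point."""
    prev_mac = "genesis"
    for entry in chain:
        payload = json.dumps(entry["record"], sort_keys=True) + prev_mac
        expected = hmac.new(KEY, payload.encode("utf-8"),
                            hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev_mac = entry["mac"]
    return True
```

Verification can run on a schedule or on demand during an audit; a single failed comparison pinpoints the earliest modified entry.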

Incident Response for AI Systems

Standard incident response playbooks need AI-specific extensions:

  • Model weight integrity check: When an incident occurs, how do you verify that model weights have not been modified? (Answer: cryptographic checksums of weight files, compared against a known-good manifest.)
  • Inference log preservation: Logs from the period of the incident are evidence; they should be preserved in a read-only state and protected from modification by incident responders.
  • Context window exposure assessment: If an attacker had access to a running inference process, what documents were in active context windows during that period? The inference logs should provide this information.
  • Key rotation: If access credentials are potentially compromised, what is the procedure for rotating service account keys, TLS certificates, and encryption keys without taking the system offline?
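The weight-integrity check in the first bullet reduces to comparing streamed file hashes against a known-good manifest. A sketch, assuming a simple JSON manifest mapping relative paths to SHA-256 digests (the manifest format is illustrative):

```python
import hashlib
import json
from pathlib import Path

def sha256_file(path: Path) -> str:
    """Stream the file in 1 MiB blocks; weight files are far too large to slurp."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def verify_weights(weights_dir: str, manifest_path: str) -> list:
    """Return the manifest entries that are missing or fail their checksum."""
    manifest = json.loads(Path(manifest_path).read_text())
    failures = []
    for rel_path, expected in manifest.items():
        p = Path(weights_dir) / rel_path
        if not p.exists() or sha256_file(p) != expected:
            failures.append(rel_path)
    return failures
```

The manifest itself should be generated at deployment time and stored somewhere the inference server cannot write to — otherwise an attacker who modifies the weights can simply regenerate it.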

The Compliance Gap Most Teams Miss

The most common misconception among organizations deploying on-premises AI is that “on-premises = compliant.” The regulatory requirements are the same regardless of hosting model.

HIPAA does not distinguish between cloud-hosted and on-premises PHI processing. If your AI system processes PHI, all HIPAA Security Rule requirements apply — including the ones about risk analysis, workforce training, contingency planning, and the technical safeguards described above. Being on-premises removes one risk (the cloud provider’s security posture) but does not eliminate the requirement to address the others.

Similarly, if your AI vendor has access to your system for remote support, maintenance, or monitoring, they may qualify as a Business Associate under HIPAA — and should be executing a Business Associate Agreement with you regardless of whether the hardware is in your building.

ISO 27001 certification requires documented evidence of controls implementation, regular internal audits, and management review. An AI system that is not included in the scope of your ISMS (Information Security Management System) is an unaudited system handling sensitive data — a finding that will be raised in any serious audit.

What Professional Deployment Looks Like

The controls described in this article are not extraordinary. They are the baseline that a serious compliance framework requires, applied to infrastructure that is new enough that many organizations have not yet developed the operational practices to implement them.

The Tacitus security baseline builds these controls in at the design level rather than treating them as a post-deployment checklist:

  • Network isolation and mTLS are configured in the deployment manifest, not added later
  • RBAC is part of the application architecture, not bolted on through network rules
  • Audit logging with HMAC chaining is a core system feature, not a plugin
  • Encryption at rest is configured as part of storage provisioning, not after data is already in place
  • Physical security requirements are documented in the deployment guide and built into the hardware selection

This matters because security controls that are added after a system is running tend to have gaps. Controls that are part of the system design from the beginning are far more likely to be comprehensive and consistent.

If you’re deploying AI in a regulated environment and want to understand what a compliant, security-baseline deployment looks like for your specific requirements, a technical briefing is the right starting point.


Request a briefing to discuss your compliance and security requirements with the Tacitus team.

#security #hipaa #iso-27001 #compliance #on-premises #ai-infrastructure