Ever worried about AI agents running amok with sensitive data? We’ve got you covered! In this hands-on Python tutorial, we’ll build an intelligent yet responsible AI agent that adheres to safety rules when interacting with data and tools. No paid APIs or external dependencies needed!
π‘οΈ What’s in store?
1. Multi-layered Protection: We’ll implement input sanitization, prompt-injection detection, PII (Personally Identifiable Information) redaction, URL allowlisting, and rate limiting.
2. Self-Critique: Using an optional local Hugging Face model, we’ll make our AI agent more trustworthy by enabling self-critique for output auditing.
3. Safe Tool Access: We’ll design sandboxed tools like a safe calculator and an allowlisted web fetcher to handle specific user requests securely.
π» Let’s dive in!
First, we set up our security framework and initialize the optional Hugging Face model for auditing. We define key constants, patterns, and rules to govern our agent’s security behavior.
“`python
USE_LLM = True # Use the local Hugging Face model for self-critique
# … (rest of the code)
“`
Next, we implement core utility functions to sanitize, redact, and validate user inputs. We also design our safe tools.
“`python
def pii_redact(text: str) -> str:
# … (PII redaction logic)
def injection_heuristics(user_msg: str) -> List[str]:
# … (prompt-injection detection logic)
def tool_calc(payload: str) -> str:
# … (safe calculator logic)
def tool_web_fetch(payload: str) -> str:
# … (allowlisted web fetcher logic)
“`
We then define our policy engine that enforces input checks, rate limits, and risk audits before and after executing actions.
“`python
class PolicyEngine:
def __init__(self):
self.last_call_ts = 0.0
def preflight(self, user_msg: str, tool: Optional[str]) -> PolicyDecision:
# … (preflight logic)
def postflight(self, prompt: str, output: str, critic: SelfCritic) -> Dict[str, Any]:
# … (postflight logic)
“`
Finally, we construct the central `SecureAgent` class that plans, executes, and reviews actions, embedding automatic mitigation for risky outputs.
“`python
class SecureAgent:
def __init__(self, use_llm: bool = False):
self.policy = PolicyEngine()
self.critic = SelfCritic(use_llm)
def run(self, user_msg: str) -> Dict[str, Any]:
# … (run logic)
“`
We test our secure agent against various scenarios, observing how it detects prompt injections, redacts sensitive data, and performs tasks safely while maintaining intelligent behavior.
π Conclusion
By balancing intelligence and responsibility in AI agent design, we’ve created an agent that can reason, plan, and act safely within defined security boundaries while autonomously auditing its outputs for risks. This approach shows that security need not come at the cost of usability. With just a few hundred lines of Python, we can create AI agents that are not only capable but also careful.
π« Ready to secure your AI agents? Check out the [full codes here](insert_link_here). Don’t forget to follow us on [Twitter](insert_twitter_link_here), join our [100k+ ML SubReddit](insert_reddit_link_here), subscribe to our newsletter, and join us on [Telegram](insert_telegram_link_here) for more tutorials, codes, and notebooks!