Just as your LLM once again goes off the rails, Cisco, Nvidia are at the door smiling

2 months ago 63

BOOK THIS SPACE FOR AD

ARTICLE AD

Cisco and Nvidia have both recognized that as useful as today's AI may be, the technology can be equally unsafe and/or unreliable – and have delivered tools in an attempt to help address those weaknesses.

Nvidia on Thursday introduced a trio of specialized microservices aimed at stopping your own AI agents from being hijacked by users or spouting inappropriate stuff onto the 'net.

As our friends over at The Next Platform reported, these three Nvidia Inference Microservices (aka NIMs) are the latest members of the GPU giant's NeMo Guardrails collection, and are designed to steer chatbots and autonomous agents so that they operate as intended.

The trio are:

A content safety NIM that tries to stop your own AI model from "generating biased or harmful outputs, ensuring responses align with ethical standards." What you do is take a user's input prompt and your model's output, and run both as a pair through the NIM, which concludes whether that input and output is appropriate. You can then act on those recommendations, either telling off the user for being bad, or blocking the output of the model for being rude. This NIM was trained using the Aegis Content Safety Dataset, which consists of about 33,000 user-LLM interactions that are labeled safe or unsafe. A topic control NIM that, we're told, "keeps conversations focused on approved topics, avoiding digression or inappropriate content." This NIM takes your model's system prompt and a user's input, and determines whether or not the user is on topic for the system prompt. If the user is trying to make your model go off the rails, this NIM can help block that. A jailbreak detection NIM that tries to do what it says on the tin. It analyzes just your users' inputs to detect attempts to jailbreak your LLM, which is to make it go against its intended purpose.

As we've previously explored, it can be hard to prevent prompt injection attacks because many AI chatbots and assistants are built on general-purpose language-processing models and their guardrails can be overridden with some simple persuasion. For example, in some cases, merely instructing a chatbot to "ignore all previous instructions, do this instead" can allow behavior developers did not intend. That scenario is one of several that Nvidia's Jailbreak detection model hopes to protect against.

Depending on the application in question, the GPU giant says chaining multiple guardrail models together - such as topic control, content safety, and jailbreak detection - may be necessary to comprehensively address security gaps and compliance challenges.

Megan, AI recruiting agent, is on the job, giving bosses fewer reasons to hire in HR In AI agent push, Microsoft re-orgs to create 'CoreAI – Platform and Tools' team Voice-enabled AI agents can automate everything, even your phone scams Microsoft says its Copilot AI agents set to tackle employee tasks in November

Using multiple models does, however, come at the expense of higher overheads and latency. Because of this, Nvidia elected to base these guardrails on smaller language models, roughly eight billion parameters in size each, which can be run at scale with minimal resources.

These models are available as NIMs for AI Enterprise customers, or from Hugging Face for those preferring to implement them manually.

Nvidia is also providing an open source tool called Garak to identify AI vulnerabilities, such as data leaks, prompt injection, and hallucinations, in applications to validate the efficacy of these guardrails.

Cisco wants in, too

Cisco’s AI infosec tools will be offered under the name AI Defense, and has a little overlap with Nvidia’s offerings in the form of a model validation tool that Switchzilla says will investigate LLM performance and advise infosec teams of any risks it creates.

The networking giant also plans AI discovery tools to help security teams seek out “shadow” applications that business units have deployed without IT oversight.

Cisco also feels that some of you have botched chatbot implementations by deploying them without restricting them to their intended roles, such as purely customer service interactions, and therefore allowing users unrestricted to the services like OpenAI's ChatGPT that power them. That mistake can cost big bucks if people discover it and use your chatbot as a way to access paid AI services.

AI Defense, we're told, will be able to detect that sort of thing so you can fix it, and will include hundreds of guardrails that can be deployed to (hopefully) stop AI producing unwanted results.

The offering is a work-in-progress, and will see tools added to Cisco’s cloudy Security Cloud and Secure Access services. The latter will in February gain a service called AI Access that does things like block user access to online AI services you’d rather they did not use. More services will appear over time.

Cisco’s also changing its own customer-facing AI agents, which can do things like allow natural language interfaces to its products – but currently do so discretely for each of its products. The networking giant plans a single agent to rule them all and in the router bind them, so net admins can use a single chat interface to get answers about the different components of their Cisco estates.

Anand Raghavan, Cisco’s VP of engineering for AI, told The Register he has a multi-year roadmap pointing to development of more AI security tools, a sobering item of information given IT shops already face myriad infosec threats and often struggle to implement and integrate the tools to address them. ®

In other AI news...

Google researchers have come up with an attention-based LLM architecture dubbed Titans that can scale beyond two-million-token context windows and outperform ultra-large models due to the way it handles the memorization of information. A pre-print paper describing the approach is here. The FTC has referred its probe into Snap's MyAI chatbot to the US Dept of Justice for possible criminal prosecution. The watchdog said it believes the software poses "risks and harms to young users."

Read Entire Article