Jailbreaking LLMs: The "DAN" (Do Anything Now) Phenomenon
How users bypass AI safety filters with "DAN" jailbreaks. Explore the evolution from roleplay to automated attacks and the failure of RLHF alignment.
Apr 18, 2025 · 3 min read