Latent Guard++: A Context-Aware Safety Framework for Generative Models
In this project, we developed Latent Guard++, an enhanced safety framework for filtering unsafe prompts in text-to-image generation. It builds on the Latent Guard baseline by introducing dynamic thresholding and multi-stage filtering, improving detection accuracy while reducing inference cost.
Core Contributions:
- Dynamic Thresholding: Adjusts the decision boundary per prompt using an LLM-estimated risk score, enabling more nuanced safety decisions on ambiguous or borderline prompts (see the sketch after this list).
- Multi-Stage Filtering: Combines fast keyword-level filtering, latent embedding scoring, and an LLM-classification fallback for uncertain cases.
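A minimal sketch of the dynamic-thresholding idea, assuming an upstream LLM provides a risk estimate in [0, 1]. The names `dynamic_threshold` and `decide`, the base threshold, shift range, and margin are all illustrative placeholders, not the project's tuned parameters.

```python
def dynamic_threshold(risk: float,
                      base_threshold: float = 0.5,
                      max_shift: float = 0.15) -> float:
    """Map an LLM-estimated risk in [0, 1] to a per-prompt decision threshold.

    Higher estimated risk lowers the threshold (stricter filtering);
    lower risk raises it (fewer false positives on benign prompts).
    """
    risk = min(max(risk, 0.0), 1.0)            # clamp to [0, 1]
    return base_threshold - max_shift * (2.0 * risk - 1.0)


def decide(unsafety_score: float, risk: float, margin: float = 0.05) -> str:
    """Three-way decision: block, allow, or defer borderline cases to the LLM stage."""
    threshold = dynamic_threshold(risk)
    if unsafety_score >= threshold + margin:
        return "block"
    if unsafety_score <= threshold - margin:
        return "allow"
    return "defer_to_llm"                      # borderline case, verify with the LLM
```

The margin around the threshold is what routes only genuinely ambiguous prompts to the more expensive LLM check; clear-cut prompts are decided immediately.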
Key Components:
- Latent Guard: A contrastive embedding model that scores prompts by their similarity to a bank of harmful-concept embeddings.
- LLM Integration: Uses a large language model to selectively verify prompts whose scores fall near the decision boundary.
- Word-Level Filtering: Blocks prompts containing inherently unsafe terms before any further analysis (a combined sketch of the three stages follows this list).
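An illustrative sketch of how the three components could compose into one staged filter. `encode`, `concept_bank`, `llm_is_unsafe`, and the toy blocklist are assumptions standing in for the actual text encoder, harmful-concept embeddings, and LLM classifier; the threshold would come from the dynamic-thresholding sketch above.

```python
from typing import Callable

import numpy as np

BLOCKLIST = {"gore", "beheading"}  # toy word-level blocklist, for illustration only


def latent_unsafety_score(prompt: str,
                          encode: Callable[[str], np.ndarray],
                          concept_bank: np.ndarray) -> float:
    """Max cosine similarity between the prompt embedding and harmful-concept embeddings."""
    e = encode(prompt)
    e = e / (np.linalg.norm(e) + 1e-8)
    bank = concept_bank / (np.linalg.norm(concept_bank, axis=1, keepdims=True) + 1e-8)
    return float(np.max(bank @ e))


def filter_prompt(prompt: str,
                  encode: Callable[[str], np.ndarray],
                  concept_bank: np.ndarray,
                  threshold: float,                      # e.g. from dynamic_threshold()
                  llm_is_unsafe: Callable[[str], bool],
                  margin: float = 0.05) -> bool:
    """Return True if the prompt should be blocked."""
    # Stage 1: cheap word-level screen for inherently unsafe terms.
    if any(term in prompt.lower() for term in BLOCKLIST):
        return True
    # Stage 2: latent-embedding score compared against the (dynamic) threshold.
    score = latent_unsafety_score(prompt, encode, concept_bank)
    if score >= threshold + margin:
        return True
    if score <= threshold - margin:
        return False
    # Stage 3: only borderline prompts reach the expensive LLM classifier.
    return llm_is_unsafe(prompt)
```

Ordering the stages from cheapest to most expensive is what keeps LLM calls rare: most prompts are resolved by the blocklist or the embedding score alone.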
Empirical Results:
- Achieved up to a 13% accuracy improvement over fixed-threshold baselines on unseen datasets (e.g., Unsafe Diffusion, I2P++).
- Reduced LLM usage substantially through dynamic filtering thresholds and staged evaluation, yielding significant efficiency gains.
- Outperformed the baseline Latent Guard, particularly on ambiguous prompts and in generalizing to out-of-distribution data.
Reflection:
This project deepened my understanding of generative model safety, contrastive learning, dynamic inference strategies, and prompt classification using LLMs. It also strengthened my skills in designing interpretable, efficient multi-stage systems that balance accuracy and cost.
Keywords:
- AI Safety
- Text-to-Image Models
- Prompt Filtering
- Latent Guard
- Dynamic Thresholding
- Multi-Stage Filtering
- Contrastive Learning
- Large Language Models
- LLM-guided Risk Estimation
- Deep Learning