
Course: Advanced Natural Language Processing (11-711)

Time Spent: -- hours

Source Code: GitHub

Latent Guard++: A Context-Aware Safety Framework for Generative Models


In this project, we developed Latent Guard++, an enhanced safety framework for filtering unsafe prompts in text-to-image generation. It builds on the Latent Guard baseline by introducing dynamic thresholding and multi-stage filtering to achieve higher accuracy and efficiency.


Core Contributions:

  • Dynamic Thresholding: Adjusts decision boundaries per prompt using LLM-estimated risk. This enables more nuanced safety decisions, especially on ambiguous or borderline prompts.
  • Multi-Stage Filtering: Combines fast keyword-level filtering, latent embedding scoring, and fallback to LLM classification for uncertain cases (see the sketch after this list).
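
A minimal Python sketch of how these two contributions compose. Everything below is an illustrative assumption rather than the project's actual implementation: the threshold constants, the linear risk-to-threshold mapping, and the helpers latent_score, llm_risk, and llm_classify are all stand-ins.

    # Hypothetical staged filter with an LLM-risk-adjusted threshold.
    BASE_THRESHOLD = 0.50  # assumed fixed boundary of the baseline
    BAND = 0.05            # uncertainty band that triggers the LLM fallback

    KEYWORD_BLOCKLIST = {"example_unsafe_term"}  # placeholder word list

    def latent_score(prompt: str) -> float:
        """Stand-in for the contrastive embedding score in [0, 1]."""
        raise NotImplementedError

    def llm_risk(prompt: str) -> float:
        """Stand-in for an LLM-estimated risk score in [0, 1]."""
        raise NotImplementedError

    def llm_classify(prompt: str) -> bool:
        """Stand-in for full LLM classification (True = unsafe)."""
        raise NotImplementedError

    def is_unsafe(prompt: str) -> bool:
        # Stage 1: cheap word-level filter catches inherently unsafe terms.
        if any(term in prompt.lower() for term in KEYWORD_BLOCKLIST):
            return True
        # Stage 2: embedding score against a per-prompt threshold that
        # shifts with LLM-estimated risk, so riskier prompts face a lower bar.
        score = latent_score(prompt)
        threshold = BASE_THRESHOLD - BAND * (2.0 * llm_risk(prompt) - 1.0)
        if abs(score - threshold) > BAND:
            return score > threshold  # confident region: decide locally
        # Stage 3: only borderline prompts pay for full LLM classification.
        return llm_classify(prompt)

The uncertainty band is what keeps LLM calls rare: most prompts are decided by the keyword filter or the embedding score alone, which is where the efficiency gains reported below come from.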

Key Components:

  • Latent Guard: A contrastive embedding model that filters prompts based on similarity to harmful concepts (a minimal scoring sketch follows this list).
  • LLM Integration: Selectively verifies risky prompts near the decision boundary using a large language model.
  • Word-Level Filtering: Filters prompts with inherently unsafe terms before further analysis.
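
The scoring step in the first component follows a common similarity-to-concept-bank pattern, sketched below: cosine similarity between a prompt embedding and a bank of harmful-concept embeddings, reduced with a max so that one strong match is enough to flag the prompt. The random vectors stand in for Latent Guard's contrastively trained encoder; none of this reflects the project's real weights or API.

    import numpy as np

    def concept_score(prompt_emb: np.ndarray, concept_bank: np.ndarray) -> float:
        """Cosine similarity between a prompt and its nearest harmful concept."""
        p = prompt_emb / np.linalg.norm(prompt_emb)
        bank = concept_bank / np.linalg.norm(concept_bank, axis=1, keepdims=True)
        return float((bank @ p).max())

    # Usage with random vectors standing in for a trained encoder:
    rng = np.random.default_rng(0)
    prompt_emb = rng.normal(size=256)           # embedded prompt
    concept_bank = rng.normal(size=(100, 256))  # 100 harmful-concept embeddings
    print(concept_score(prompt_emb, concept_bank))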

Empirical Results:

  • Achieved up to a 13% accuracy improvement on unseen datasets (e.g., Unsafe Diffusion, I2P++) over fixed-threshold baselines.
  • Reduced LLM usage substantially through dynamic filtering thresholds and staged evaluation, yielding significant efficiency gains.
  • Outperformed baseline Latent Guard, especially in handling ambiguous prompts and improving generalization to out-of-distribution data.

Reflection:

This project deepened my understanding of generative model safety, contrastive learning, dynamic inference strategies, and prompt classification using LLMs. It also strengthened my skills in designing interpretable, efficient multi-stage systems that balance accuracy and cost.


Tags:

  • AI Safety
  • Text-to-Image Models
  • Prompt Filtering
  • Latent Guard
  • Dynamic Thresholding
  • Multi-Stage Filtering
  • Contrastive Learning
  • Large Language Models
  • LLM-guided Risk Estimation
  • Deep Learning