Safety Check Operator
Overview
The Safety Check operator analyzes text-based content for potentially harmful or inappropriate material. It integrates toxicity detection and profanity filtering to help ensure that generated or user-submitted text adheres to safety guidelines. It is useful for applications where content moderation is crucial, such as chatbots, social media platforms, or any system involving user-generated text.
Requirements
- Python Packages:
  - detoxify
  - better_profanity
  - transformers (optional, for transformer-based toxicity detection)
  These can be installed via the ChatTD operator’s Python manager.
- ChatTD Operator: Required and must be configured.
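The operator wraps these packages internally. For reference, a minimal standalone sketch of what the two detection libraries return (run in a Python environment where they are installed; the calls follow each library’s public API):

from detoxify import Detoxify
from better_profanity import profanity

text = "some user-submitted message"

# Detoxify returns a dict of 0-1 category scores (toxicity, severe_toxicity,
# obscene, threat, insult, and an identity category), which correspond to the
# columns of the Toxicity Table described below.
scores = Detoxify('original').predict(text)
print(scores['toxicity'])

# better_profanity performs wordlist-based matching.
profanity.load_censor_words()
print(profanity.contains_profanity(text))  # True / False
print(profanity.censor(text))              # flagged words replaced with ****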
Input/Output
Inputs
- Input Table (DAT): Table containing the conversation/text to analyze. Required columns: id, role, message, timestamp.
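For reference, a sketch that fills a Table DAT with the required columns; the 'conversation_log' Table DAT is a hypothetical operator you would create yourself:

import time

log = op('conversation_log')  # hypothetical Table DAT feeding the operator
log.clear()
log.appendRow(['id', 'role', 'message', 'timestamp'])
log.appendRow(['1', 'user', 'Hello there!', str(time.time())])
log.appendRow(['2', 'assistant', 'Hi! How can I help?', str(time.time())])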
Outputs
- Toxicity Table (DAT): Toxicity scores and details. Columns: toxicity_score, severe_toxicity, obscene, threat, insult, identity_hate, message_id, role, message, timestamp.
- Profanity Table (DAT): Profanity detection results. Columns: contains_profanity, profanity_probability, flagged_words, message_id, role, message, timestamp.
- PII Table (DAT): Personally Identifiable Information (PII) detection results. Columns: contains_pii, pii_types, confidence, message_id, role, message, timestamp.
- Summary Table (DAT): Overall safety analysis summary. Columns: metric, value.
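Once a check has run, the output tables can be read like any other DAT. A minimal sketch, assuming the internal result DATs are named 'toxicity_table' and so on, as in the usage example further below:

safety_checker = op('safety_check1')
tox = safety_checker.op('toxicity_table')

# Row 0 holds the column headers; iterate the data rows.
for row in range(1, tox.numRows):
    score = float(tox[row, 'toxicity_score'].val)
    if score > safety_checker.par.Toxicitythreshold.eval():
        print('flagged:', tox[row, 'message_id'], score)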
Parameters
Safety Page

Start Safety Checks (Check)
op('safety_check').par.Check
Pulse - Default: None

Status (Status)
op('safety_check').par.Status
String - Default: Safety checks complete

Toxicity Threshold (Toxicitythreshold)
op('safety_check').par.Toxicitythreshold
Float - Default: 0.328

Profanity Threshold (Profanitythreshold)
op('safety_check').par.Profanitythreshold
Float - Default: 0.376

Clear Results (Clear)
op('safety_check').par.Clear
Pulse - Default: None
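The thresholds can be tuned and results cleared from a script. A sketch using the parameters above ('safety_check' stands in for the operator’s path in your network):

sc = op('safety_check')

# Lower both thresholds for stricter moderation, then re-run the checks.
sc.par.Toxicitythreshold = 0.2
sc.par.Profanitythreshold = 0.2
sc.par.Clear.pulse()  # drop previous results
sc.par.Check.pulse()  # start a fresh safety pass

# Status reflects progress; it may not update until the checks finish.
print(sc.par.Status.eval())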
Callbacks Page
Callbacks Header

Callback DAT (Callbackdat)
op('safety_check').par.Callbackdat
DAT - Default: None

Edit Callbacks (Editcallbacksscript)
op('safety_check').par.Editcallbacksscript
Pulse - Default: None

Create Callbacks (Createpulse)
op('safety_check').par.Createpulse
Pulse - Default: None

onViolation (Onviolation)
op('safety_check').par.Onviolation
Toggle - Default: On
About Page
Bypass (Bypass)
op('safety_check').par.Bypass
Toggle - Default: Off

Show Built-in Parameters (Showbuiltin)
op('safety_check').par.Showbuiltin
Toggle - Default: Off

Version (Version)
op('safety_check').par.Version
String - Default: 1.0.0

Last Updated (Lastupdated)
op('safety_check').par.Lastupdated
String - Default: 2024-11-10

Creator (Creator)
op('safety_check').par.Creator
String - Default: dotsimulate

Website (Website)
op('safety_check').par.Website
String - Default: https://dotsimulate.com

ChatTD Operator (Chattd)
op('safety_check').par.Chattd
OP - Default: /dot_lops/ChatTD
Callbacks
Available Callbacks:
- onViolation

Example Callback Structure:

def onViolation(info):
    # Called when a safety check fails (e.g., a toxicity/profanity threshold is exceeded).
    # The info dictionary contains details such as:
    # - op: the Safety Check operator
    # - checkType: 'toxicity' or 'profanity'
    # - messageId: ID of the violating message
    # - message: content of the violating message
    # - role: role associated with the message
    # - score: the calculated score (toxicity or profanity probability)
    # - threshold: the threshold that was exceeded
    print(f"Safety violation detected: {info.get('checkType')}")
    # Example: op('path/to/notifier').par.Sendmessage.pulse()
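As a variation, a handler can record each violation for later review. A sketch that appends rows to a logging table (the 'violation_log' Table DAT is hypothetical, something you would add to your network):

def onViolation(info):
    # Append one row per violation to a logging table.
    log = op('violation_log')  # hypothetical Table DAT
    if log.numRows == 0:
        log.appendRow(['messageId', 'checkType', 'score', 'threshold'])
    log.appendRow([
        info.get('messageId', ''),
        info.get('checkType', ''),
        info.get('score', 0.0),
        info.get('threshold', 0.0),
    ])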
Performance Considerations
- Performance depends on input text size and on which checks are enabled.
- Transformer-based toxicity detection can be resource-intensive.
- Analyze only the necessary parts of conversations (e.g., last_message) for better performance; batch update mode may be faster for large inputs. See the sketch below.
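A sketch of such a scope-limited configuration; Analyzemode and Checkmodes follow the usage example below, and the exact option strings ('last_message', 'profanity') are assumptions that may differ in your build:

sc = op('safety_check')

# Analyze only the most recent message instead of the full conversation.
sc.par.Analyzemode = 'last_message'

# Run only the cheaper wordlist-based check when resources are tight.
sc.par.Checkmodes = 'profanity'

sc.par.Check.pulse()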
Usage Examples
Analyzing a Full Conversation

safety_checker = op('safety_check1')
conversation_dat = op('conversation_log')  # Assuming this DAT exists

# Connect input
safety_checker.inputConnectors[0].connect(conversation_dat)

# Configure checks
safety_checker.par.Analyzemode = 'full_conversation'
safety_checker.par.Checkmodes = 'toxicity profanity'  # Enable both
safety_checker.par.Toxicitythreshold = 0.5
safety_checker.par.Profanitythreshold = 0.6

# Start check
safety_checker.par.Check.pulse()

# View results
# toxicity_results = safety_checker.op('toxicity_table')
# profanity_results = safety_checker.op('profanity_table')
Using Callbacks for Violations
# 1. Create a Text DAT (e.g., 'safety_callbacks')
# 2. Add the onViolation function (see the Callbacks section above)

safety_checker = op('safety_check1')

# Configure callbacks
safety_checker.par.Callbackdat = op('safety_callbacks')
safety_checker.par.Onviolation = 1

# Run checks as usual
safety_checker.par.Check.pulse()

# The onViolation function in the 'safety_callbacks' DAT will execute when a threshold is exceeded.
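Reading the Summary Table follows the same pattern as the other result tables. A short sketch; the internal name 'summary_table' is an assumption based on the naming of the tables above:

safety_checker = op('safety_check1')
summary = safety_checker.op('summary_table')  # assumed internal name

# Print each metric/value pair from the overall summary.
for row in range(1, summary.numRows):
    print(summary[row, 'metric'], '=', summary[row, 'value'])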
Common Use Cases
- Moderating chatbots and virtual assistants.
- Filtering user-generated content (comments, posts).
- Ensuring safety in text-based games or virtual worlds.
- Flagging inappropriate language in online communities.