Safety Check Operator
Overview
The Safety Check operator analyzes text-based content for potentially harmful or inappropriate material. It integrates toxicity detection and profanity filtering to help ensure that generated or user-submitted text adheres to safety guidelines. It is useful for applications where content moderation is crucial, such as chatbots, social media platforms, or any system involving user-generated text.
Requirements
- Python Packages:
  - detoxify
  - better_profanity
  - transformers (optional, for transformer-based toxicity detection)
  These can be installed via the ChatTD operator’s Python manager.
- ChatTD Operator: Required and must be configured.
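The operator wraps these packages internally. For reference, a minimal standalone sketch of what the two detection libraries return (run in a Python environment where they are installed; the calls follow each library’s public API):

from detoxify import Detoxify
from better_profanity import profanity

text = "some user-submitted message"

# Detoxify returns a dict of 0-1 category scores (toxicity, severe_toxicity,
# obscene, threat, insult, and an identity category), which correspond to the
# columns of the Toxicity Table described below.
scores = Detoxify('original').predict(text)
print(scores['toxicity'])

# better_profanity performs wordlist-based matching.
profanity.load_censor_words()
print(profanity.contains_profanity(text))  # True / False
print(profanity.censor(text))              # flagged words replaced with ****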
Input/Output
Inputs
- Input Table (DAT): Table containing the conversation/text to analyze. Required columns: id, role, message, timestamp.
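For reference, a sketch that fills a Table DAT with the required columns; the 'conversation_log' Table DAT is a hypothetical operator you would create yourself:

import time

log = op('conversation_log')  # hypothetical Table DAT feeding the operator
log.clear()
log.appendRow(['id', 'role', 'message', 'timestamp'])
log.appendRow(['1', 'user', 'Hello there!', str(time.time())])
log.appendRow(['2', 'assistant', 'Hi! How can I help?', str(time.time())])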
Outputs
- Toxicity Table (DAT): Toxicity scores and details. Columns: toxicity_score, severe_toxicity, obscene, threat, insult, identity_hate, message_id, role, message, timestamp.
- Profanity Table (DAT): Profanity detection results. Columns: contains_profanity, profanity_probability, flagged_words, message_id, role, message, timestamp.
- PII Table (DAT): Personally Identifiable Information (PII) detection results. Columns: contains_pii, pii_types, confidence, message_id, role, message, timestamp.
- Summary Table (DAT): Overall safety analysis summary. Columns: metric, value.
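Once a check has run, the output tables can be read like any other DAT. A minimal sketch, assuming the internal result DATs are named 'toxicity_table' and so on, as in the usage example further below:

safety_checker = op('safety_check1')
tox = safety_checker.op('toxicity_table')

# Row 0 holds the column headers; iterate the data rows.
for row in range(1, tox.numRows):
    score = float(tox[row, 'toxicity_score'].val)
    if score > safety_checker.par.Toxicitythreshold.eval():
        print('flagged:', tox[row, 'message_id'], score)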
Parameters
Safety Page

Start Safety Checks (Check)
op('safety_check').par.Check
Pulse - Default: None

Status (Status)
op('safety_check').par.Status
String - Default: Safety checks complete

Toxicity Threshold (Toxicitythreshold)
op('safety_check').par.Toxicitythreshold
Float - Default: 0.328

Profanity Threshold (Profanitythreshold)
op('safety_check').par.Profanitythreshold
Float - Default: 0.376

Clear Results (Clear)
op('safety_check').par.Clear
Pulse - Default: None
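The thresholds can be tuned and results cleared from a script. A sketch using the parameters above ('safety_check' stands in for the operator’s path in your network):

sc = op('safety_check')

# Lower both thresholds for stricter moderation, then re-run the checks.
sc.par.Toxicitythreshold = 0.2
sc.par.Profanitythreshold = 0.2
sc.par.Clear.pulse()  # drop previous results
sc.par.Check.pulse()  # start a fresh safety pass

# Status reflects progress; it may not update until the checks finish.
print(sc.par.Status.eval())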
Callbacks Page
Callbacks Header

Callback DAT (Callbackdat)
op('safety_check').par.Callbackdat
DAT - Default: None

Edit Callbacks (Editcallbacksscript)
op('safety_check').par.Editcallbacksscript
Pulse - Default: None

Create Callbacks (Createpulse)
op('safety_check').par.Createpulse
Pulse - Default: None

onViolation (Onviolation)
op('safety_check').par.Onviolation
Toggle - Default: On
About Page
Bypass (Bypass)
op('safety_check').par.Bypass
Toggle - Default: Off

Show Built-in Parameters (Showbuiltin)
op('safety_check').par.Showbuiltin
Toggle - Default: Off

Version (Version)
op('safety_check').par.Version
String - Default: 1.0.0

Last Updated (Lastupdated)
op('safety_check').par.Lastupdated
String - Default: 2024-11-10

Creator (Creator)
op('safety_check').par.Creator
String - Default: dotsimulate

Website (Website)
op('safety_check').par.Website
String - Default: https://dotsimulate.com

ChatTD Operator (Chattd)
op('safety_check').par.Chattd
OP - Default: /dot_lops/ChatTD
Callbacks
Available Callbacks:
- onViolation

Example Callback Structure:

def onViolation(info):
    # Called when a safety check fails (e.g., a toxicity/profanity threshold is exceeded).
    # The info dictionary contains details such as:
    # - op: the Safety Check operator
    # - checkType: 'toxicity' or 'profanity'
    # - messageId: ID of the violating message
    # - message: content of the violating message
    # - role: role associated with the message
    # - score: the calculated score (toxicity or profanity probability)
    # - threshold: the threshold that was exceeded
    print(f"Safety violation detected: {info.get('checkType')}")
    # Example: op('path/to/notifier').par.Sendmessage.pulse()
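As a variation, a handler can record each violation for later review. A sketch that appends rows to a logging table (the 'violation_log' Table DAT is hypothetical, something you would add to your network):

def onViolation(info):
    # Append one row per violation to a logging table.
    log = op('violation_log')  # hypothetical Table DAT
    if log.numRows == 0:
        log.appendRow(['messageId', 'checkType', 'score', 'threshold'])
    log.appendRow([
        info.get('messageId', ''),
        info.get('checkType', ''),
        info.get('score', 0.0),
        info.get('threshold', 0.0),
    ])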
Performance Considerations
- Performance depends on input text size and on which checks are enabled.
- Transformer-based toxicity detection can be resource-intensive.
- Analyze only the necessary parts of conversations (e.g., last_message) for better performance; batch update mode may be faster for large inputs. See the sketch below.
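A sketch of such a scope-limited configuration; Analyzemode and Checkmodes follow the usage example below, and the exact option strings ('last_message', 'profanity') are assumptions that may differ in your build:

sc = op('safety_check')

# Analyze only the most recent message instead of the full conversation.
sc.par.Analyzemode = 'last_message'

# Run only the cheaper wordlist-based check when resources are tight.
sc.par.Checkmodes = 'profanity'

sc.par.Check.pulse()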
Usage Examples
Analyzing a Full Conversation

safety_checker = op('safety_check1')
conversation_dat = op('conversation_log')  # Assuming this DAT exists

# Connect input
safety_checker.inputConnectors[0].connect(conversation_dat)

# Configure checks
safety_checker.par.Analyzemode = 'full_conversation'
safety_checker.par.Checkmodes = 'toxicity profanity'  # Enable both
safety_checker.par.Toxicitythreshold = 0.5
safety_checker.par.Profanitythreshold = 0.6

# Start check
safety_checker.par.Check.pulse()

# View results
# toxicity_results = safety_checker.op('toxicity_table')
# profanity_results = safety_checker.op('profanity_table')
Using Callbacks for Violations
# 1. Create a Text DAT (e.g., 'safety_callbacks')
# 2. Add the onViolation function (see the Callbacks section above)

safety_checker = op('safety_check1')

# Configure callbacks
safety_checker.par.Callbackdat = op('safety_callbacks')
safety_checker.par.Onviolation = 1

# Run checks as usual
safety_checker.par.Check.pulse()

# The onViolation function in the 'safety_callbacks' DAT will execute when a threshold is exceeded.
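Reading the Summary Table follows the same pattern as the other result tables. A short sketch; the internal name 'summary_table' is an assumption based on the naming of the tables above:

safety_checker = op('safety_check1')
summary = safety_checker.op('summary_table')  # assumed internal name

# Print each metric/value pair from the overall summary.
for row in range(1, summary.numRows):
    print(summary[row, 'metric'], '=', summary[row, 'value'])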
Common Use Cases
- Moderating chatbots and virtual assistants.
- Filtering user-generated content (comments, posts).
- Ensuring safety in text-based games or virtual worlds.
- Flagging inappropriate language in online communities.