Custom Patterns
Vypher allows you to define custom detection patterns to identify organization-specific sensitive data that may not be covered by the built-in detectors.
Overview
Custom patterns enable you to:
- Define organization-specific data formats
- Extend existing detectors with additional patterns
- Create domain-specific detection rules
- Implement industry-specific compliance requirements
Configuration Format
Custom patterns are defined in your vypher.yaml configuration file:
custom_patterns:
- name: "employee_id"
pattern: "\\bEMP-\\d{6}\\b"
description: "Employee ID format: EMP-XXXXXX"
category: "pii"
severity: "high"
- name: "project_code"
pattern: "\\bPROJ-[A-Z]{3}-\\d{4}\\b"
description: "Project code format: PROJ-ABC-1234"
category: "internal"
severity: "medium"
- name: "api_token"
pattern: "\\btoken_[a-f0-9]{32}\\b"
description: "Custom API token format"
category: "secrets"
severity: "critical"Pattern Configuration Options
Required Fields
- name: Unique identifier for the pattern
- pattern: Regular expression to match the sensitive data
- description: Human-readable description of what the pattern matches
Optional Fields
- category: Classification (pii, phi, secrets, internal, etc.)
- severity: Impact level (low, medium, high, critical)
- case_sensitive: Whether matching is case-sensitive (default: false)
- multiline: Whether pattern spans multiple lines (default: false)
- context: Additional context requirements
Regular Expression Examples
Employee Information
custom_patterns:
- name: "employee_badge"
pattern: "\\bBDG-\\d{4,6}\\b"
description: "Employee badge numbers"
- name: "department_code"
pattern: "\\bDEPT-[A-Z]{2,4}\\b"
description: "Department codes"Financial Data
custom_patterns:
- name: "account_number"
pattern: "\\bACC-\\d{8,12}\\b"
description: "Internal account numbers"
- name: "transaction_id"
pattern: "\\bTXN-[A-Z0-9]{10}\\b"
description: "Transaction identifiers"Technical Identifiers
custom_patterns:
- name: "server_hostname"
pattern: "\\b[a-z]+-srv-\\d{3}\\.[a-z]+\\.local\\b"
description: "Internal server hostnames"
- name: "database_connection"
pattern: "\\bmysql://[^\\s]+\\b"
description: "Database connection strings"Advanced Pattern Features
Context-Aware Detection
custom_patterns:
- name: "customer_id_with_context"
pattern: "\\d{8}"
description: "Customer ID numbers"
context:
before: ["customer", "client", "cust"]
after: ["id", "number", "ref"]
window: 10 # words before/after to checkMulti-line Patterns
custom_patterns:
- name: "config_block"
pattern: "\\[database\\]\\s*\\n.*password\\s*=\\s*[^\\n]+"
description: "Database configuration blocks"
multiline: trueCase-Sensitive Matching
custom_patterns:
- name: "api_key_strict"
pattern: "\\bAPI_KEY_[A-Z0-9]{16}\\b"
description: "Strict API key format"
case_sensitive: trueUsing Custom Patterns
Command Line Usage
# Use all custom patterns
vypher scan /path/to/files --include-custom
# Use specific custom patterns
vypher scan /path/to/files --custom-patterns employee_id,project_code
# Combine with built-in detectors
vypher scan /path/to/files --detectors pii,secrets --include-customConfiguration File
# vypher.yaml
detectors:
pii:
enabled: true
secrets:
enabled: true
custom:
enabled: true
patterns:
- employee_id
- api_token
- project_codeTesting Custom Patterns
Validation Mode
# Test patterns against sample data
vypher validate-patterns --config vypher.yaml --sample-file test_data.txtDebug Output
# Show detailed pattern matching information
vypher scan /path/to/files --debug --custom-patterns employee_idBest Practices
Pattern Design
- Be Specific: Make patterns as specific as possible to avoid false positives
- Use Anchors: Use
\bword boundaries to prevent partial matches - Test Thoroughly: Test patterns against representative data
- Document Clearly: Provide clear descriptions for all patterns
Performance Considerations
- Optimize Regex: Use efficient regular expressions
- Limit Scope: Apply patterns only where needed
- Use Categories: Group related patterns for better organization
- Monitor Performance: Track scanning performance with custom patterns
Security Guidelines
- Protect Patterns: Keep pattern definitions secure
- Regular Updates: Review and update patterns regularly
- Access Control: Limit who can modify custom patterns
- Audit Trail: Log changes to custom pattern definitions
Pattern Libraries
Industry-Specific Examples
Healthcare
custom_patterns:
- name: "patient_mrn"
pattern: "\\bMRN[:-]?\\s*\\d{7,9}\\b"
description: "Medical record numbers"
category: "phi"Finance
custom_patterns:
- name: "iban_code"
pattern: "\\b[A-Z]{2}\\d{2}[A-Z0-9]{4,30}\\b"
description: "International Bank Account Numbers"
category: "pii"Technology
custom_patterns:
- name: "git_token"
pattern: "\\bghp_[A-Za-z0-9]{36}\\b"
description: "GitHub personal access tokens"
category: "secrets"Troubleshooting
Common Issues
- No Matches: Pattern too specific or incorrect escaping
- Too Many False Positives: Pattern too broad, needs refinement
- Performance Issues: Complex regex patterns, consider optimization
- Context Not Working: Check context window and keywords
Debugging Tips
# Test individual patterns
vypher test-pattern --pattern "\\bEMP-\\d{6}\\b" --file test.txt
# Validate regex syntax
vypher validate-regex --pattern "\\bEMP-\\d{6}\\b"Related Topics
- PII Detection - Personal information patterns
- PHI Detection - Healthcare information patterns
- Secrets Detection - API keys and tokens
- Configuration Guide - General configuration
Last updated on