Skip to Content
🎉 Vypher v1.0 is released. Read more →
DetectorsCustom Patterns

Custom Patterns

Vypher allows you to define custom detection patterns to identify organization-specific sensitive data that may not be covered by the built-in detectors.

Overview

Custom patterns enable you to:

  • Define organization-specific data formats
  • Extend existing detectors with additional patterns
  • Create domain-specific detection rules
  • Implement industry-specific compliance requirements

Configuration Format

Custom patterns are defined in your vypher.yaml configuration file:

custom_patterns: - name: "employee_id" pattern: "\\bEMP-\\d{6}\\b" description: "Employee ID format: EMP-XXXXXX" category: "pii" severity: "high" - name: "project_code" pattern: "\\bPROJ-[A-Z]{3}-\\d{4}\\b" description: "Project code format: PROJ-ABC-1234" category: "internal" severity: "medium" - name: "api_token" pattern: "\\btoken_[a-f0-9]{32}\\b" description: "Custom API token format" category: "secrets" severity: "critical"

Pattern Configuration Options

Required Fields

  • name: Unique identifier for the pattern
  • pattern: Regular expression to match the sensitive data
  • description: Human-readable description of what the pattern matches

Optional Fields

  • category: Classification (pii, phi, secrets, internal, etc.)
  • severity: Impact level (low, medium, high, critical)
  • case_sensitive: Whether matching is case-sensitive (default: false)
  • multiline: Whether pattern spans multiple lines (default: false)
  • context: Additional context requirements

Regular Expression Examples

Employee Information

custom_patterns: - name: "employee_badge" pattern: "\\bBDG-\\d{4,6}\\b" description: "Employee badge numbers" - name: "department_code" pattern: "\\bDEPT-[A-Z]{2,4}\\b" description: "Department codes"

Financial Data

custom_patterns: - name: "account_number" pattern: "\\bACC-\\d{8,12}\\b" description: "Internal account numbers" - name: "transaction_id" pattern: "\\bTXN-[A-Z0-9]{10}\\b" description: "Transaction identifiers"

Technical Identifiers

custom_patterns: - name: "server_hostname" pattern: "\\b[a-z]+-srv-\\d{3}\\.[a-z]+\\.local\\b" description: "Internal server hostnames" - name: "database_connection" pattern: "\\bmysql://[^\\s]+\\b" description: "Database connection strings"

Advanced Pattern Features

Context-Aware Detection

custom_patterns: - name: "customer_id_with_context" pattern: "\\d{8}" description: "Customer ID numbers" context: before: ["customer", "client", "cust"] after: ["id", "number", "ref"] window: 10 # words before/after to check

Multi-line Patterns

custom_patterns: - name: "config_block" pattern: "\\[database\\]\\s*\\n.*password\\s*=\\s*[^\\n]+" description: "Database configuration blocks" multiline: true

Case-Sensitive Matching

custom_patterns: - name: "api_key_strict" pattern: "\\bAPI_KEY_[A-Z0-9]{16}\\b" description: "Strict API key format" case_sensitive: true

Using Custom Patterns

Command Line Usage

# Use all custom patterns vypher scan /path/to/files --include-custom # Use specific custom patterns vypher scan /path/to/files --custom-patterns employee_id,project_code # Combine with built-in detectors vypher scan /path/to/files --detectors pii,secrets --include-custom

Configuration File

# vypher.yaml detectors: pii: enabled: true secrets: enabled: true custom: enabled: true patterns: - employee_id - api_token - project_code

Testing Custom Patterns

Validation Mode

# Test patterns against sample data vypher validate-patterns --config vypher.yaml --sample-file test_data.txt

Debug Output

# Show detailed pattern matching information vypher scan /path/to/files --debug --custom-patterns employee_id

Best Practices

Pattern Design

  1. Be Specific: Make patterns as specific as possible to avoid false positives
  2. Use Anchors: Use \b word boundaries to prevent partial matches
  3. Test Thoroughly: Test patterns against representative data
  4. Document Clearly: Provide clear descriptions for all patterns

Performance Considerations

  1. Optimize Regex: Use efficient regular expressions
  2. Limit Scope: Apply patterns only where needed
  3. Use Categories: Group related patterns for better organization
  4. Monitor Performance: Track scanning performance with custom patterns

Security Guidelines

  1. Protect Patterns: Keep pattern definitions secure
  2. Regular Updates: Review and update patterns regularly
  3. Access Control: Limit who can modify custom patterns
  4. Audit Trail: Log changes to custom pattern definitions

Pattern Libraries

Industry-Specific Examples

Healthcare

custom_patterns: - name: "patient_mrn" pattern: "\\bMRN[:-]?\\s*\\d{7,9}\\b" description: "Medical record numbers" category: "phi"

Finance

custom_patterns: - name: "iban_code" pattern: "\\b[A-Z]{2}\\d{2}[A-Z0-9]{4,30}\\b" description: "International Bank Account Numbers" category: "pii"

Technology

custom_patterns: - name: "git_token" pattern: "\\bghp_[A-Za-z0-9]{36}\\b" description: "GitHub personal access tokens" category: "secrets"

Troubleshooting

Common Issues

  1. No Matches: Pattern too specific or incorrect escaping
  2. Too Many False Positives: Pattern too broad, needs refinement
  3. Performance Issues: Complex regex patterns, consider optimization
  4. Context Not Working: Check context window and keywords

Debugging Tips

# Test individual patterns vypher test-pattern --pattern "\\bEMP-\\d{6}\\b" --file test.txt # Validate regex syntax vypher validate-regex --pattern "\\bEMP-\\d{6}\\b"
Last updated on