cschantz a51d968185 Initial commit: Server Management Toolkit v2.0
- Complete security menu restructure (3-mode: Analysis/Actions/Live)
- Intelligent cPHulk enablement with CSF whitelist import
- Live network security monitoring dashboard
- Multi-source threat detection and classification
- 50+ organized security tools across 4-level menu hierarchy
- System health diagnostics with cPanel/WHM integration
- Reference database for cross-module intelligence sharing
2025-11-03 18:21:40 -05:00


SESSION INTELLIGENCE - Cross-Module Data Sharing

Overview

The Server Toolkit now implements Session Intelligence, which lets modules reference data collected by other modules during the current troubleshooting session. It is optimized for the download → diagnose → troubleshoot → delete workflow.

Use Case

Since the toolkit is meant to be temporary (not permanently installed), we don't track historical trends. Instead, we enable cross-module intelligence so modules can make smarter recommendations based on what's happening RIGHT NOW.

Example Scenarios

Scenario 1: Bot Attack During System Load

# User runs System Health Check first
# Discovers: CPU at 95%, Memory at 92%, HIGH LOAD

# User then runs Bot Analyzer
# Bot analyzer checks: db_is_system_under_load
# Result: "High bot traffic detected, but system is already under load.
#          Performance issues may be partially due to system resources,
#          not just bots. Recommend addressing system load first."

Scenario 2: Slow MySQL During Network Issues

# User runs System Health Check
# Discovers: TCP retransmission at 15%, HIGH network issues

# User then runs MySQL Query Analyzer
# MySQL analyzer checks: db_has_network_issues
# Result: "Slow queries detected, but network is experiencing high
#          retransmission rates. Some query timeouts may be network-
#          related rather than database performance."

Scenario 3: Bot Attack + SSH Brute Force

# User runs System Health Check
# Discovers: 5,000 failed SSH attempts today

# User then runs Bot Analyzer
# Bot analyzer checks: db_is_under_attack
# Result: "Bot traffic detected AND system is under active SSH attack.
#          Recommend immediate firewall hardening and cPHulk enablement."

Architecture

Data Storage: Reference Database (.sysref)

The health check saves the current session's metrics to the [HEALTH_BASELINE] section:

System Resources:

  • MEMORY_TOTAL_MB, MEMORY_USED_PERCENT
  • CPU_LOAD_1MIN, CPU_CORES
  • DISK_USED_PERCENT, IOWAIT_PERCENT

Services:

  • HTTPD_STATUS, MYSQL_STATUS
  • FIREWALL_STATUS, EMAIL_QUEUE_SIZE
  • ZOMBIE_PROCESSES

Network Status:

  • NETWORK_INTERFACE, NETWORK_MTU
  • NETWORK_RX_ERRORS, NETWORK_TX_ERRORS
  • NETWORK_RX_DROPPED, NETWORK_TX_DROPPED
  • TCP_RETRANS_PERCENT

Hardware Status:

  • DISK_SMART_STATUS
  • HARDWARE_ERRORS

Security Status:

  • SSH_FAILED_ATTEMPTS_TOTAL
  • SSH_ATTACKS_TODAY
  • CPHULK_STATUS

Issue Counts:

  • CRITICAL_ISSUES, HIGH_ISSUES
  • MEDIUM_ISSUES, LOW_ISSUES
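
The on-disk layout of .sysref is not shown in this document. As a minimal sketch, assuming a `[HEALTH_BASELINE]` header followed by pipe-delimited `HEALTH|KEY|VALUE` lines (only the `HEALTH|` prefix is mentioned elsewhere in this document; the rest is an assumption), a lookup could work roughly like this. `demo_sysref` and `demo_get_metric` are hypothetical names and are not part of lib/reference-db.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the real .sysref layout may differ.

demo_sysref="$(mktemp)"

# Write a sample baseline using the ASSUMED HEALTH|KEY|VALUE layout.
cat > "$demo_sysref" <<'EOF'
[HEALTH_BASELINE]
HEALTH|MEMORY_USED_PERCENT|92
HEALTH|CPU_LOAD_1MIN|7.60
HEALTH|CPU_CORES|8
HEALTH|TCP_RETRANS_PERCENT|15
HEALTH|SSH_ATTACKS_TODAY|5000
EOF

# Print the value stored for one metric key; print nothing if it is absent.
demo_get_metric() {
    local key="$1"
    awk -F'|' -v k="$key" '$1 == "HEALTH" && $2 == k { print $3; exit }' "$demo_sysref"
}

demo_get_metric "MEMORY_USED_PERCENT"   # prints 92
demo_get_metric "DISK_SMART_STATUS"     # prints nothing (not recorded)
```

A flat, line-oriented layout like this keeps lookups to a single awk pass, which suits a toolkit that is downloaded, run, and deleted.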

Helper Functions (lib/reference-db.sh)

Query Individual Metrics

value=$(db_get_health_metric "MEMORY_USED_PERCENT")
echo "Memory: $value%"
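
Because a metric may not be populated yet (for example, if the health check has not run in this session), a caller may want a fallback value. The sketch below assumes that `db_get_health_metric` prints nothing for a missing key, which this document does not confirm. `metric_or_default` is a hypothetical helper, and the `db_get_health_metric` defined here is only a stand-in so the example runs on its own.

```shell
#!/usr/bin/env bash
# Stand-in for the toolkit's db_get_health_metric (assumption: it prints
# nothing when a metric has not been collected).
db_get_health_metric() {
    case "$1" in
        MEMORY_USED_PERCENT) echo 92 ;;
        *) ;;   # unknown or not yet collected: print nothing
    esac
}

# Hypothetical helper: print the metric, or a fallback when it is empty.
# usage: metric_or_default KEY FALLBACK
metric_or_default() {
    local value
    value="$(db_get_health_metric "$1")"
    printf '%s\n' "${value:-$2}"
}

echo "Memory: $(metric_or_default MEMORY_USED_PERCENT n/a)"
echo "IO wait: $(metric_or_default IOWAIT_PERCENT n/a)"
```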

Intelligence Functions

Check System Load:

if db_is_system_under_load; then
    echo "System under heavy load (CPU > 80% or Memory > 90%)"
    # Adjust recommendations
fi

Check Network Issues:

if db_has_network_issues; then
    echo "Network problems detected (retrans > 5% or errors > 100)"
    # Consider network factors in analysis
fi

Check Security Status:

if db_is_under_attack; then
    echo "Active attacks detected (> 100 SSH failures today)"
    # Correlate with security findings
fi
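
The thresholds quoted in the messages above can be expressed as small predicates. This is a sketch, not the code in lib/reference-db.sh: the `demo_*` functions are hypothetical, they take values as arguments rather than reading .sysref, and deriving a CPU percentage from `CPU_LOAD_1MIN / CPU_CORES` is an assumption. Treating "errors > 100" as applying to either the RX or the TX counter is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical re-implementations of the three checks.

# True if $1 > $2, compared as numbers (awk handles decimals such as 7.60).
num_gt() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a > b) }'
}

# usage: demo_under_load LOAD_1MIN CORES MEMORY_USED_PERCENT
demo_under_load() {
    local cpu_pct
    # ASSUMPTION: CPU % is approximated as load average per core * 100.
    cpu_pct="$(awk -v l="$1" -v c="$2" \
        'BEGIN { if (c > 0) printf "%.1f", l / c * 100; else print 0 }')"
    num_gt "$cpu_pct" 80 || num_gt "$3" 90
}

# usage: demo_network_issues TCP_RETRANS_PERCENT RX_ERRORS TX_ERRORS
demo_network_issues() {
    num_gt "$1" 5 || num_gt "$2" 100 || num_gt "$3" 100
}

# usage: demo_under_attack SSH_ATTACKS_TODAY
demo_under_attack() {
    num_gt "$1" 100
}

demo_under_load 7.60 8 92   && echo "under load"
demo_network_issues 15 0 0  && echo "network issues"
demo_under_attack 5000      && echo "under attack"
```

Returning an exit status rather than printing a value is what lets these checks sit directly in an `if` condition, as the examples above do.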

Get All Metrics

db_get_all_health  # Returns all HEALTH| lines

Implementation in Modules

Pattern 1: Contextual Recommendations

# In any module, after sourcing reference-db.sh

# Check system context
if db_is_system_under_load; then
    echo "NOTE: System is currently under heavy load."
    echo "      Some issues may be resource-related."
fi

if db_has_network_issues; then
    echo "NOTE: Network experiencing high retransmission rates."
    echo "      Connection issues may be network-related."
fi

if db_is_under_attack; then
    echo "WARNING: System under active SSH attack."
    echo "         Security hardening recommended."
fi

Pattern 2: Adjusted Thresholds

# MySQL slow query analyzer

# Normal threshold: 5 seconds
SLOW_THRESHOLD=5

# But if system is under load, adjust threshold
if db_is_system_under_load; then
    SLOW_THRESHOLD=10
    echo "System under load - using relaxed slow query threshold"
fi

Pattern 3: Root Cause Analysis

# Website performance analyzer

if db_has_network_issues; then
    echo "Website slow, AND network has issues."
    echo "Root cause may be network, not website code."
    echo "Recommendation: Fix network first, then re-test."
fi

Testing

Run the test script to verify cross-module intelligence:

# First, generate session data
./launcher.sh
# Choose option 1: System Health Check

# Then test intelligence
./tools/test-cross-module-intelligence.sh

Expected output shows:

  • All health metrics populated
  • Intelligence functions working
  • System status correctly identified
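
The contents of tools/test-cross-module-intelligence.sh are not shown here. As a rough illustration of the first check ("all health metrics populated"), a loop like the following could report which metrics are missing. `check_metrics_populated` is a hypothetical name, and the `db_get_health_metric` below is a stand-in that returns sample values so the sketch runs on its own.

```shell
#!/usr/bin/env bash
# Stand-in for the toolkit's db_get_health_metric (sample values only).
db_get_health_metric() {
    case "$1" in
        MEMORY_USED_PERCENT) echo 92 ;;
        CPU_LOAD_1MIN)       echo 7.60 ;;
        TCP_RETRANS_PERCENT) echo 15 ;;
    esac
}

# Report each metric as OK or MISSING; return non-zero if any is missing.
# usage: check_metrics_populated KEY...
check_metrics_populated() {
    local key value missing=0
    for key in "$@"; do
        value="$(db_get_health_metric "$key")"
        if [ -n "$value" ]; then
            echo "OK      $key=$value"
        else
            echo "MISSING $key"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

check_metrics_populated MEMORY_USED_PERCENT CPU_LOAD_1MIN TCP_RETRANS_PERCENT
```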

Best Practices

DO:

  • Run System Health Check FIRST in a troubleshooting session
  • Use intelligence functions to provide context-aware recommendations
  • Correlate findings across modules
  • Adjust thresholds based on system state

DON'T:

  • Rely on this data for historical trend analysis (it's session-only)
  • Assume data exists (always check if metric is populated)
  • Make critical decisions solely on this data
  • Store this long-term (it gets cleaned up)

Example: Enhanced Bot Analyzer (Future)

# modules/security/bot-analyzer.sh

source "$SCRIPT_DIR/lib/reference-db.sh"

# After analysis, provide context

if db_has_network_issues; then
    echo ""
    print_warning "Network Issues Detected"
    echo "System experiencing:"
    echo "  • TCP Retransmission: $(db_get_health_metric 'TCP_RETRANS_PERCENT')%"
    echo "  • Network errors: $(db_get_health_metric 'NETWORK_RX_ERRORS')"
    echo ""
    echo "Bot traffic may be compounded by network problems."
    echo "Recommendation: Address network issues first (see System Health Check)"
fi

if db_is_system_under_load; then
    echo ""
    print_warning "System Under Heavy Load"
    echo "Current state:"
    echo "  • CPU Load: $(db_get_health_metric 'CPU_LOAD_1MIN')"
    echo "  • Memory: $(db_get_health_metric 'MEMORY_USED_PERCENT')%"
    echo ""
    echo "High bot traffic + system load = performance degradation."
    echo "Recommendation: Block bots AND investigate resource usage."
fi

Files Modified

  1. modules/diagnostics/system-health-check.sh

    • Enhanced save_health_baseline() function
    • Now saves network, hardware, and security metrics
    • Lines: 1660-1758
  2. lib/reference-db.sh

    • Added db_get_health_metric() - query individual metrics
    • Added db_is_system_under_load() - check if CPU/memory high
    • Added db_has_network_issues() - check for network problems
    • Added db_is_under_attack() - check for active attacks
    • Added db_get_all_health() - get all health data
    • Lines: 446-497
  3. tools/test-cross-module-intelligence.sh (NEW)

    • Test script demonstrating cross-module queries
    • Shows how to use intelligence functions

Data Lifetime

  • Created: When System Health Check runs
  • Stored: In .sysref file (memory + disk)
  • Expires: After 1 hour OR when cleanup/reset runs
  • Removed: When toolkit is deleted
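
How the toolkit enforces the one-hour expiry is not described here. One possible approach, sketched under the assumption that expiry is judged by the file's modification time and that GNU coreutils (`stat -c %Y`) is available, is shown below. `sysref_is_stale` is a hypothetical function, and the optional third argument exists only to make the check easy to test.

```shell
#!/usr/bin/env bash
# Hypothetical staleness check. Assumes GNU coreutils (stat -c %Y).

# usage: sysref_is_stale FILE [MAX_AGE_SECONDS] [NOW_EPOCH]
# Returns 0 (true) if FILE is missing or older than MAX_AGE_SECONDS.
sysref_is_stale() {
    local file="$1" max_age="${2:-3600}" now="${3:-$(date +%s)}" mtime
    [ -f "$file" ] || return 0
    mtime="$(stat -c %Y "$file")" || return 0
    [ $((now - mtime)) -gt "$max_age" ]
}

# A file created just now is well inside the one-hour window.
demo_file="$(mktemp)"
if sysref_is_stale "$demo_file"; then
    echo "Session data expired: re-run System Health Check"
else
    echo "Session data is fresh"
fi
rm -f "$demo_file"
```

Treating a missing file as stale means callers need only one check before deciding to re-run the health check.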

Future Enhancements

Potential modules that could benefit:

  1. WordPress Health Check

    • Check if slow WP sites correlate with network/load issues
  2. Backup Analyzer

    • Check if backup failures correlate with disk/load issues
  3. Email Troubleshooter

    • Check if email issues correlate with network/disk problems
  4. Resource Monitor

    • Compare current metrics vs health check baseline
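
For the Resource Monitor idea, comparing a live reading against the stored baseline could be as simple as printing the signed difference. This is only an illustrative sketch: `report_drift` is a hypothetical helper, and the numbers are sample values rather than output from the toolkit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show how far a current reading has moved from the
# health check baseline. awk is used so decimal values are handled.
# usage: report_drift NAME BASELINE CURRENT
report_drift() {
    awk -v n="$1" -v b="$2" -v c="$3" \
        'BEGIN { printf "%s: %s -> %s (%+.1f)\n", n, b, c, c - b }'
}

report_drift MEMORY_USED_PERCENT 92 71   # prints MEMORY_USED_PERCENT: 92 -> 71 (-21.0)
```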

Summary

Session Intelligence transforms the toolkit from isolated modules into an integrated diagnostic platform. Each module can now make smarter, context-aware recommendations based on the complete picture of what's happening on the server RIGHT NOW.

No historical data needed. No complex trending. Just smart, session-aware troubleshooting.