cschantz a51d968185 Initial commit: Server Management Toolkit v2.0
- Complete security menu restructure (3-mode: Analysis/Actions/Live)
- Intelligent cPHulk enablement with CSF whitelist import
- Live network security monitoring dashboard
- Multi-source threat detection and classification
- 50+ organized security tools across 4-level menu hierarchy
- System health diagnostics with cPanel/WHM integration
- Reference database for cross-module intelligence sharing
2025-11-03 18:21:40 -05:00

# SESSION INTELLIGENCE - Cross-Module Data Sharing
## Overview
The Server Toolkit now implements **Session Intelligence**, which lets modules reference data collected by other modules during the current troubleshooting session. It is optimized for the **download → diagnose → troubleshoot → delete** workflow.
## Use Case
Since the toolkit is meant to be temporary (not permanently installed), we don't track historical trends. Instead, we enable **cross-module intelligence** so modules can make smarter recommendations based on what's happening RIGHT NOW.
## Example Scenarios
### Scenario 1: Bot Attack During System Load
```bash
# User runs System Health Check first
# Discovers: CPU at 95%, Memory at 92%, HIGH LOAD
# User then runs Bot Analyzer
# Bot analyzer checks: db_is_system_under_load
# Result: "High bot traffic detected, but system is already under load.
# Performance issues may be partially due to system resources,
# not just bots. Recommend addressing system load first."
```
### Scenario 2: Slow MySQL During Network Issues
```bash
# User runs System Health Check
# Discovers: TCP retransmission at 15%, HIGH network issues
# User then runs MySQL Query Analyzer
# MySQL analyzer checks: db_has_network_issues
# Result: "Slow queries detected, but network is experiencing high
# retransmission rates. Some query timeouts may be network-
# related rather than database performance."
```
### Scenario 3: Bot Attack + SSH Brute Force
```bash
# User runs System Health Check
# Discovers: 5,000 failed SSH attempts today
# User then runs Bot Analyzer
# Bot analyzer checks: db_is_under_attack
# Result: "Bot traffic detected AND system is under active SSH attack.
# Recommend immediate firewall hardening and cPHulk enablement."
```
## Architecture
### Data Storage: Reference Database (`.sysref`)
The health check saves the current session's metrics to the `[HEALTH_BASELINE]` section:
**System Resources:**
- MEMORY_TOTAL_MB, MEMORY_USED_PERCENT
- CPU_LOAD_1MIN, CPU_CORES
- DISK_USED_PERCENT, IOWAIT_PERCENT
**Services:**
- HTTPD_STATUS, MYSQL_STATUS
- FIREWALL_STATUS, EMAIL_QUEUE_SIZE
- ZOMBIE_PROCESSES
**Network Status:**
- NETWORK_INTERFACE, NETWORK_MTU
- NETWORK_RX_ERRORS, NETWORK_TX_ERRORS
- NETWORK_RX_DROPPED, NETWORK_TX_DROPPED
- TCP_RETRANS_PERCENT
**Hardware Status:**
- DISK_SMART_STATUS
- HARDWARE_ERRORS
**Security Status:**
- SSH_FAILED_ATTEMPTS_TOTAL
- SSH_ATTACKS_TODAY
- CPHULK_STATUS
**Issue Counts:**
- CRITICAL_ISSUES, HIGH_ISSUES
- MEDIUM_ISSUES, LOW_ISSUES
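To make the storage model concrete, here is a minimal sketch of what the health entries might look like and how one value could be read back. The `HEALTH|KEY|VALUE` layout is an assumption inferred from the `HEALTH|` prefix mentioned for `db_get_all_health`; the sample file and the `sketch_get_metric` helper are hypothetical and are not the toolkit's actual code.

```bash
# Hypothetical sample of health entries; the real .sysref layout may differ.
SAMPLE_DB="$(mktemp)"
cat > "$SAMPLE_DB" <<'EOF'
HEALTH|MEMORY_USED_PERCENT|92
HEALTH|CPU_LOAD_1MIN|7.45
HEALTH|TCP_RETRANS_PERCENT|15
HEALTH|SSH_ATTACKS_TODAY|5000
EOF

# sketch_get_metric KEY: print the value stored for KEY, or nothing if absent.
sketch_get_metric() {
    awk -F'|' -v key="$1" \
        '$1 == "HEALTH" && $2 == key { print $3; exit }' "$SAMPLE_DB"
}

echo "Memory: $(sketch_get_metric MEMORY_USED_PERCENT)%"
```

A flat, line-oriented layout like this keeps lookups to a single `awk` pass, which suits a toolkit that is downloaded, used, and deleted.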
### Helper Functions (`lib/reference-db.sh`)
#### Query Individual Metrics
```bash
value=$(db_get_health_metric "MEMORY_USED_PERCENT")
echo "Memory: $value%"
```
#### Intelligence Functions
**Check System Load:**
```bash
if db_is_system_under_load; then
    echo "System under heavy load (CPU > 80% or Memory > 90%)"
    # Adjust recommendations
fi
```
**Check Network Issues:**
```bash
if db_has_network_issues; then
    echo "Network problems detected (retrans > 5% or errors > 100)"
    # Consider network factors in analysis
fi
```
**Check Security Status:**
```bash
if db_is_under_attack; then
    echo "Active attacks detected (> 100 SSH failures today)"
    # Correlate with security findings
fi
```
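The thresholds quoted in the messages above could be evaluated along these lines. This is only a sketch: the `sketch_*` names are hypothetical, and because this document does not say how a CPU percentage is derived from the stored metrics, the functions take the values as arguments instead of reading them from `.sysref`.

```bash
# sketch_exceeds VALUE LIMIT: succeed when VALUE is numeric and greater than LIMIT.
# awk handles decimals such as 7.45, which bash integer tests cannot.
sketch_exceeds() {
    awk -v v="$1" -v limit="$2" \
        'BEGIN { exit !(v ~ /^[0-9]+(\.[0-9]+)?$/ && v + 0 > limit + 0) }'
}

# sketch_under_load CPU_PERCENT MEM_PERCENT: mirrors "CPU > 80% or Memory > 90%".
sketch_under_load() {
    sketch_exceeds "$1" 80 || sketch_exceeds "$2" 90
}

# sketch_network_issues RETRANS_PERCENT ERRORS: mirrors "retrans > 5% or errors > 100".
sketch_network_issues() {
    sketch_exceeds "$1" 5 || sketch_exceeds "$2" 100
}

if sketch_under_load 95 92; then
    echo "System under heavy load"
fi
```

Treating an empty or non-numeric value as "not exceeded" means a module that runs before the health check simply sees no warning instead of failing.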
#### Get All Metrics
```bash
db_get_all_health # Returns all HEALTH| lines
```
## Implementation in Modules
### Pattern 1: Contextual Recommendations
```bash
# In any module, after sourcing reference-db.sh
# Check system context
if db_is_system_under_load; then
    echo "NOTE: System is currently under heavy load."
    echo " Some issues may be resource-related."
fi
if db_has_network_issues; then
    echo "NOTE: Network experiencing high retransmission rates."
    echo " Connection issues may be network-related."
fi
if db_is_under_attack; then
    echo "WARNING: System under active SSH attack."
    echo " Security hardening recommended."
fi
```
### Pattern 2: Adjusted Thresholds
```bash
# MySQL slow query analyzer
# Normal threshold: 5 seconds
SLOW_THRESHOLD=5
# But if system is under load, adjust threshold
if db_is_system_under_load; then
    SLOW_THRESHOLD=10
    echo "System under load - using relaxed slow query threshold"
fi
```
### Pattern 3: Root Cause Analysis
```bash
# Website performance analyzer
if db_has_network_issues; then
    echo "Website slow, AND network has issues."
    echo "Root cause may be network, not website code."
    echo "Recommendation: Fix network first, then re-test."
fi
```
## Testing
Run the test script to verify cross-module intelligence:
```bash
# First, generate session data
./launcher.sh
# Choose option 1: System Health Check
# Then test intelligence
./tools/test-cross-module-intelligence.sh
```
Expected output shows:
- All health metrics populated
- Intelligence functions working
- System status correctly identified
## Best Practices
### DO:
✅ Run System Health Check **FIRST** in a troubleshooting session
✅ Use intelligence functions to provide context-aware recommendations
✅ Correlate findings across modules
✅ Adjust thresholds based on system state
### DON'T:
❌ Rely on this data for historical trend analysis (it's session-only)
❌ Assume data exists (always check whether the metric is populated)
❌ Make critical decisions solely on this data
❌ Store this long-term (it gets cleaned up)
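One way to follow the "don't assume data exists" rule is to wrap metric lookups in a helper that supplies a fallback. The sketch below assumes `db_get_health_metric` prints an empty string when a metric was never recorded, which this document does not confirm. The stand-in definition exists only so the snippet runs outside the toolkit, and `metric_or_default` is a hypothetical name.

```bash
# Stand-in so the sketch runs outside the toolkit; inside a module the real
# function from lib/reference-db.sh would already be defined and is left alone.
if ! declare -F db_get_health_metric >/dev/null; then
    db_get_health_metric() {
        case "$1" in
            MEMORY_USED_PERCENT) echo "92" ;;
            *) echo "" ;;   # simulate a metric that was never recorded
        esac
    }
fi

# metric_or_default KEY FALLBACK: print the metric, or FALLBACK when it is empty.
metric_or_default() {
    local value
    value="$(db_get_health_metric "$1")"
    echo "${value:-$2}"
}

retrans="$(metric_or_default TCP_RETRANS_PERCENT unknown)"
if [ "$retrans" = "unknown" ]; then
    echo "No network data yet. Run System Health Check first."
else
    echo "TCP retransmission: ${retrans}%"
fi
echo "Memory: $(metric_or_default MEMORY_USED_PERCENT unknown)"
```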
## Example: Enhanced Bot Analyzer (Future)
```bash
# modules/security/bot-analyzer.sh
source "$SCRIPT_DIR/lib/reference-db.sh"
# After analysis, provide context
if db_has_network_issues; then
    echo ""
    print_warning "Network Issues Detected"
    echo "System experiencing:"
    echo " • TCP Retransmission: $(db_get_health_metric 'TCP_RETRANS_PERCENT')%"
    echo " • Network errors: $(db_get_health_metric 'NETWORK_RX_ERRORS')"
    echo ""
    echo "Bot traffic may be compounded by network problems."
    echo "Recommendation: Address network issues first (see System Health Check)"
fi
if db_is_system_under_load; then
    echo ""
    print_warning "System Under Heavy Load"
    echo "Current state:"
    echo " • CPU Load: $(db_get_health_metric 'CPU_LOAD_1MIN')"
    echo " • Memory: $(db_get_health_metric 'MEMORY_USED_PERCENT')%"
    echo ""
    echo "High bot traffic + system load = performance degradation."
    echo "Recommendation: Block bots AND investigate resource usage."
fi
```
## Files Modified
1. **modules/diagnostics/system-health-check.sh**
   - Enhanced `save_health_baseline()` function
   - Now saves network, hardware, and security metrics
   - Lines: 1660-1758
2. **lib/reference-db.sh**
   - Added `db_get_health_metric()` - query individual metrics
   - Added `db_is_system_under_load()` - check if CPU/memory high
   - Added `db_has_network_issues()` - check for network problems
   - Added `db_is_under_attack()` - check for active attacks
   - Added `db_get_all_health()` - get all health data
   - Lines: 446-497
3. **tools/test-cross-module-intelligence.sh** (NEW)
   - Test script demonstrating cross-module queries
   - Shows how to use intelligence functions
## Data Lifetime
- **Created:** When System Health Check runs
- **Stored:** In `.sysref` file (memory + disk)
- **Expires:** After 1 hour OR when cleanup/reset runs
- **Removed:** When toolkit is deleted
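A module could check the one-hour lifetime before trusting session data. The sketch below assumes expiry can be judged from the file's modification time, which may not be how the toolkit tracks it. `sketch_is_stale` and `MAX_AGE_SECONDS` are hypothetical names, and `stat -c %Y` is the GNU coreutils form found on Linux.

```bash
MAX_AGE_SECONDS=3600   # the one-hour lifetime described above

# sketch_is_stale FILE: succeed when FILE is missing or older than MAX_AGE_SECONDS.
sketch_is_stale() {
    local file="$1" now mtime
    [ -f "$file" ] || return 0                     # missing counts as stale
    now="$(date +%s)"
    mtime="$(stat -c %Y "$file")" || return 0      # unreadable counts as stale
    [ $(( now - mtime )) -gt "$MAX_AGE_SECONDS" ]
}

if sketch_is_stale ".sysref"; then
    echo "Session data missing or expired. Re-run System Health Check."
fi
```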
## Future Enhancements
Potential modules that could benefit:
1. **WordPress Health Check**
   - Check if slow WP sites correlate with network/load issues
2. **Backup Analyzer**
   - Check if backup failures correlate with disk/load issues
3. **Email Troubleshooter**
   - Check if email issues correlate with network/disk problems
4. **Resource Monitor**
   - Compare current metrics vs health check baseline
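As an illustration of the Resource Monitor idea, a baseline comparison might look like the sketch below. `sketch_compare` is a hypothetical helper and the numbers are made-up sample values. In a real module the baseline would presumably come from `db_get_health_metric` and the current value from a fresh measurement.

```bash
# sketch_compare NAME BASELINE CURRENT: print the change relative to the baseline.
sketch_compare() {
    awk -v name="$1" -v base="$2" -v cur="$3" 'BEGIN {
        printf "%s: baseline %.1f, now %.1f (%+.1f)\n", name, base, cur, cur - base
    }'
}

sketch_compare MEMORY_USED_PERCENT 92 85.5
# prints: MEMORY_USED_PERCENT: baseline 92.0, now 85.5 (-6.5)
```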
## Summary
Session Intelligence transforms the toolkit from **isolated modules** into an **integrated diagnostic platform**. Each module can now make smarter, context-aware recommendations based on the complete picture of what's happening on the server RIGHT NOW.
No historical data needed. No complex trending. Just smart, session-aware troubleshooting.