Linux-Server-Management-Too…/docs/MYSQL_RESTORE_SCRIPT_IMPROVEMENTS.md

# MySQL Restore to SQL Script - Comprehensive Improvement Plan
## Based on Real-World InnoDB Recovery Issues

**Date**: February 27, 2026
**Script**: `/root/server-toolkit/modules/backup/mysql-restore-to-sql.sh`
**Status**: Needs 5 Major Improvements
**Issue Reference**: Ticket #43751550

---

## EXECUTIVE SUMMARY

The script currently handles the recovery workflow but is missing **5 critical validation checkpoints** that would help users diagnose and resolve InnoDB corruption issues. The detailed testing revealed that when system tables (`mysql/`) are corrupted, the script fails with vague error messages.

**Issues Found**: 5 Major + 2 Architecture
**Severity**: HIGH (affects recovery reliability)
**User Impact**: Recovery appears to fail without clear reason for actual failure

---

## ISSUE #1: No Pre-Flight File Validation

### Current Behavior
```bash
Script starts recovery immediately
[OK] Second MySQL instance started (PID: 24468)
[ERROR] InnoDB: Could not find a valid tablespace file...
```

### Problem
- Script doesn't verify critical files exist before starting MySQL
- Users don't know if failure is due to missing files or corruption
- Only discovers issues after instance startup

### Required Fix
Add validation **before** starting instance:
```bash
validate_backup_files() {
  Check ibdata1 exists and readable
  Check ib_logfile0 and ib_logfile1 exist
  Check mysql/ directory exists
  Check target database directory exists
  Check all files have correct permissions

  Return failure with specific error if any missing
}

Call this in step5_create_dump() BEFORE start_second_instance()
```

### Location in Script
- Add new function: `validate_backup_files()` (line ~1800)
- Call from `step5_create_dump()` before line 1869

---

## ISSUE #2: No Database Discovery Diagnostics

### Current Behavior
```bash
[OK] InnoDB initialized successfully - no critical errors detected
[ERROR] Database 'yourloca_wp2' not found in second instance
[ERROR] Failed to create dump
```

### Problem
- Script checks if database exists (line 1278)
- But doesn't explain **WHY** it's not found
- No list of databases that WERE found
- No diagnosis of system table corruption

### Required Fix
Enhance database discovery check:
```bash
BEFORE dump attempt, enhance the db_check function:
  1. List ALL databases found: SHOW DATABASES
  2. Display list to user
  3. If target not found:
     - Test mysql.db accessibility
     - Test mysql.innodb_table_stats accessibility
     - Suggest cause (system tables corrupted)
     - Suggest solutions (restore mysql/ separately, try Mode 5-6, etc.)
```

### Location in Script
- Modify `dump_database()` function at line 1277-1282
- Add new function: `discover_and_report_databases()`
- Expand error message from line 1280

---

## ISSUE #3: No System Table Validation

### Current Behavior
- Script assumes `mysql/` directory is valid
- Never tests if system tables are accessible
- Corruption detected too late (during dump)

### Problem
- When `mysql.schemata` is corrupted → database invisible
- When `mysql.innodb_table_stats` is corrupted → metadata wrong
- Script doesn't detect these until dump attempt

### Required Fix
Add system table accessibility check after MySQL starts:
```bash
test_system_tables() {
  Test 1: mysql -S socket -e "SELECT COUNT(*) FROM mysql.db LIMIT 1;"
  Test 2: mysql -S socket -e "SELECT COUNT(*) FROM mysql.innodb_table_stats LIMIT 1;"
  Test 3: mysql -S socket -e "SELECT COUNT(*) FROM information_schema.schemata;"

  If any test fails:
    Report which table failed
    Explain this is why database can't be found
    Suggest recovery options
}

Call this AFTER instance starts, BEFORE dump attempt
```

### Location in Script
- Add new function: `test_system_tables()` (line ~1100)
- Call from `dump_database()` before database discovery check (before line 1277)

---

## ISSUE #4: No Active Error Log Monitoring

### Current Behavior
- Error log only checked AFTER instance shutdown
- Errors that occur during startup/initialization are lost
- Error messages from time of failure are separated from user response

### Problem
- Instance starts with errors but script continues to dump attempt
- Users don't see real-time errors
- Critical diagnostics lost in cleanup/shutdown process

### Required Fix
Monitor error log while instance is running:
```bash
start_error_log_monitor() {
  Start tail -f of error log in background
  Capture output to /tmp/monitor.log
  Return PID of monitor process
}

check_error_log_during_runtime() {
  Grep monitor.log for:
    - "ERROR"
    - "corrupted"
    - "not found"
    - "missing"
  If found, alert user IMMEDIATELY
  Don't wait for shutdown to show errors
}

stop_error_log_monitor() {
  Kill monitor process
  Analyze /tmp/monitor.log for error patterns
  Suggest recovery mode based on errors
}
```

### Location in Script
- Modify `start_second_instance()` to enable monitoring
- Add monitoring functions: `start_error_log_monitor()`, `check_error_log_during_runtime()`, `stop_error_log_monitor()`
- Call monitor start at line 1032 (after MySQL start in background)
- Check monitor during wait loop (lines 1037-1042)
- Analyze monitor results before database check

---

## ISSUE #5: No Recovery Mode Escalation Logic

### Current Behavior
- User selects ONE recovery mode
- If it fails, script exits
- User must re-run and select different mode manually

### Problem
- Modes 0-4 don't fix system table corruption
- User keeps trying same mode without knowing why it fails
- No logic to suggest Mode 5-6 when Modes 1-4 fail

### Required Fix
Implement mode escalation:
```bash
escalate_recovery_mode() {
  If Mode 2 failed due to metadata → suggest Mode 4
  If Mode 4 failed (instance started but DB not found) → suggest Mode 5
  If Mode 5-6 required → explain data loss risk

  Ask user if they want to auto-retry with higher mode
  Track which modes have been tried
  Don't repeat mode, go higher
}

Auto-escalate Pattern:
  Try Mode: [selected] → Fails with system error
  Suggest Mode: [selected + 2] → Auto-retry? (y/n)
  If user accepts → Re-run without restarting script
  If fails again → Suggest Mode 6
```

### Location in Script
- Modify `step5_create_dump()` error handling (line 1896-1901)
- Add: `escalate_recovery_mode()` function
- Call on dump_database failure to determine next mode
- Allow re-attempt with higher mode

---

## ISSUE #6: Architecture Problem - Linear vs. Menu

### Current Behavior
```
Step 1 → Step 2 → Step 3 → Step 4 → Step 5 → exit
```

### Problem
- Script is linear (one-way flow)
- Can't retry failed step without re-running entire script
- User must restart from beginning if they want to try different recovery mode
- No menu to navigate between steps

### Required Fix Options

#### Option A: Add Menu Loop (Recommended)
```bash
while true; do
  show_main_menu
  case $option in
    1) perform_step_1 ;;
    2) perform_step_2 ;;
    3) perform_step_3 ;;
    4) perform_step_4 ;;
    5) perform_step_5 ;;
    0) exit ;;
  esac

  # Return to menu on success or failure
done
```

#### Option B: Keep Linear but Add Retry Loop
```bash
# Current steps but with retry logic for each step
# If step fails, ask "Retry with different options? (y/n)"
# Allow re-attempting without full restart
```

**Recommendation**: Option B (minimal refactoring, keeps existing workflow)

### Location in Script
- Modify main() function (line 1939)
- Add conditional logic after each step
- Replace `exit` calls with `return`
- Check if retry needed before proceeding to next step

---

## ISSUE #7: Exit Calls in Functions

### Current Behavior
```bash
Line 1851: exit 0 (after cancel)
Line 1963: exit 0 (step 1 retry=n)
Line 1973: exit 0 (step 2 retry=n)
Line 1983: exit 0 (step 3 retry=n)
Line 1929: Function returns (then main() ends, script exits)
```

### Problem
- Functions use `exit` instead of `return`
- When function exits, entire script terminates
- Can't retry or go back to menu

### Required Fix
Replace ALL `exit` calls with control flow:
```bash
# WRONG:
if [ "$retry" != "y" ]; then
  exit 0
fi

# CORRECT:
if [ "$retry" != "y" ]; then
  return 1  # Return to caller
fi

# Caller decides what to do next (retry, menu, exit, etc.)
```

### Locations to Fix
- Line 1851: Change `exit 0` to `return 1`
- Line 1963: Change `exit 0` to `return 1`
- Line 1973: Change `exit 0` to `return 1`
- Line 1983: Change `exit 0` to `return 1`
- Line 1943: Keep `exit 1` (dependency check failure - critical)
- Line 1954: Keep `exit 0` (user explicitly cancelled - OK)

---

## IMPLEMENTATION PRIORITY

### Phase 1: CRITICAL (Do First)
1. **Add pre-flight file validation** (Issue #1)
   - Estimated effort: 30 minutes
   - Impact: Users know if files are missing

2. **Enhance database discovery** (Issue #2)
   - Estimated effort: 45 minutes
   - Impact: Users see what databases were found

3. **Add system table validation** (Issue #3)
   - Estimated effort: 45 minutes
   - Impact: Users know if system tables are corrupted

### Phase 2: IMPORTANT (Do Next)
4. **Add active error log monitoring** (Issue #4)
   - Estimated effort: 60 minutes
   - Impact: Real-time error visibility

5. **Fix exit calls** (Issue #7)
   - Estimated effort: 15 minutes
   - Impact: Enables retry and menu loop

### Phase 3: ENHANCEMENT (Do After)
6. **Add recovery mode escalation** (Issue #5)
   - Estimated effort: 60 minutes
   - Impact: Auto-suggest higher modes

7. **Add menu/retry loop** (Issue #6)
   - Estimated effort: 60 minutes
   - Impact: Users can run multiple recoveries

---

## EXPECTED IMPROVEMENTS

### Before Fixes
```
User runs script
  ↓
[OK] InnoDB initialized successfully
[ERROR] Database 'yourloca_wp2' not found in second instance
[ERROR] Failed to create dump
  ↓
Script exits - user confused about why
```

### After Phase 1 Fixes
```
User runs script
  ↓
[INFO] Validating backup files...
[OK] All required files present
[OK] InnoDB initialized successfully
[INFO] Found databases: information_schema, mysql, performance_schema, yourloca_wp2
[OK] Dump created successfully
```

### After Phase 2 Fixes (with error)
```
User runs script
  ↓
[INFO] Validating backup files...
[ERROR] Critical files missing: mysql/db.ibd
[ERROR] System tables corrupted - database metadata unavailable
[INFO] Recovery options:
  1. Restore mysql/ directory from backup
  2. Use recovery mode 5 (skip checksums)
  3. Restore to fresh MySQL instance
  ↓
[?] Would you like to:
  - Retry with different recovery mode? (y/n)
  - Exit and restore mysql/ separately? (y/n)
```

---

## TESTING PLAN

After implementing fixes:

1. **Test Case 1: Healthy Backup**
   - ✓ All files present
   - ✓ System tables intact
   - ✓ Database appears in SHOW DATABASES
   - Expected: Successful dump

2. **Test Case 2: Missing Database Directory**
   - ✗ Database directory absent
   - Expected: Pre-flight validation catches it

3. **Test Case 3: Corrupted System Tables**
   - ✓ Files present
   - ✗ mysql/db.ibd missing/corrupted
   - Expected: System table test catches it

4. **Test Case 4: Retry with Different Mode**
   - ✓ Mode 2 fails
   - ✓ Script suggests Mode 4
   - ✓ User retries without full restart
   - Expected: Menu loop allows retry

---

## DOCUMENTATION TO UPDATE

After implementing fixes:
1. Add troubleshooting guide for corrupted system tables
2. Document recovery mode selection guide
3. Add error message reference guide
4. Update pre-requisites section

---

## CONCLUSION

These 5+2 fixes will transform the script from a "one-shot recovery tool" to a "diagnostic and recovery assistant" that helps users understand and resolve InnoDB corruption issues.

**Priority**: Implement Phase 1 first (most impactful, lowest effort)
**Estimated Total Effort**: 4-5 hours for all phases
**Expected User Impact**: High (clearer diagnostics, better error messages)

---

**Generated**: February 27, 2026
**Status**: Ready for Implementation