# MySQL Restore to SQL Script - Comprehensive Improvement Plan ## Based on Real-World InnoDB Recovery Issues **Date**: February 27, 2026 **Script**: `/root/server-toolkit/modules/backup/mysql-restore-to-sql.sh` **Status**: Needs 5 Major Improvements **Issue Reference**: Ticket #43751550 --- ## EXECUTIVE SUMMARY The script currently handles the recovery workflow but is missing **5 critical validation checkpoints** that would help users diagnose and resolve InnoDB corruption issues. The detailed testing revealed that when system tables (`mysql/`) are corrupted, the script fails with vague error messages. **Issues Found**: 5 Major + 2 Architecture **Severity**: HIGH (affects recovery reliability) **User Impact**: Recovery appears to fail without clear reason for actual failure --- ## ISSUE #1: No Pre-Flight File Validation ### Current Behavior ```bash Script starts recovery immediately [OK] Second MySQL instance started (PID: 24468) [ERROR] InnoDB: Could not find a valid tablespace file... ``` ### Problem - Script doesn't verify critical files exist before starting MySQL - Users don't know if failure is due to missing files or corruption - Only discovers issues after instance startup ### Required Fix Add validation **before** starting instance: ```bash validate_backup_files() { Check ibdata1 exists and readable Check ib_logfile0 and ib_logfile1 exist Check mysql/ directory exists Check target database directory exists Check all files have correct permissions Return failure with specific error if any missing } Call this in step5_create_dump() BEFORE start_second_instance() ``` ### Location in Script - Add new function: `validate_backup_files()` (line ~1800) - Call from `step5_create_dump()` before line 1869 --- ## ISSUE #2: No Database Discovery Diagnostics ### Current Behavior ```bash [OK] InnoDB initialized successfully - no critical errors detected [ERROR] Database 'yourloca_wp2' not found in second instance [ERROR] Failed to create dump ``` ### Problem - Script checks if database exists (line 1278) - But doesn't explain **WHY** it's not found - No list of databases that WERE found - No diagnosis of system table corruption ### Required Fix Enhance database discovery check: ```bash BEFORE dump attempt, enhance the db_check function: 1. List ALL databases found: SHOW DATABASES 2. Display list to user 3. If target not found: - Test mysql.db accessibility - Test mysql.innodb_table_stats accessibility - Suggest cause (system tables corrupted) - Suggest solutions (restore mysql/ separately, try Mode 5-6, etc.) ``` ### Location in Script - Modify `dump_database()` function at line 1277-1282 - Add new function: `discover_and_report_databases()` - Expand error message from line 1280 --- ## ISSUE #3: No System Table Validation ### Current Behavior - Script assumes `mysql/` directory is valid - Never tests if system tables are accessible - Corruption detected too late (during dump) ### Problem - When `mysql.schemata` is corrupted → database invisible - When `mysql.innodb_table_stats` is corrupted → metadata wrong - Script doesn't detect these until dump attempt ### Required Fix Add system table accessibility check after MySQL starts: ```bash test_system_tables() { Test 1: mysql -S socket -e "SELECT COUNT(*) FROM mysql.db LIMIT 1;" Test 2: mysql -S socket -e "SELECT COUNT(*) FROM mysql.innodb_table_stats LIMIT 1;" Test 3: mysql -S socket -e "SELECT COUNT(*) FROM information_schema.schemata;" If any test fails: Report which table failed Explain this is why database can't be found Suggest recovery options } Call this AFTER instance starts, BEFORE dump attempt ``` ### Location in Script - Add new function: `test_system_tables()` (line ~1100) - Call from `dump_database()` before database discovery check (before line 1277) --- ## ISSUE #4: No Active Error Log Monitoring ### Current Behavior - Error log only checked AFTER instance shutdown - Errors that occur during startup/initialization are lost - Error messages from time of failure are separated from user response ### Problem - Instance starts with errors but script continues to dump attempt - Users don't see real-time errors - Critical diagnostics lost in cleanup/shutdown process ### Required Fix Monitor error log while instance is running: ```bash start_error_log_monitor() { Start tail -f of error log in background Capture output to /tmp/monitor.log Return PID of monitor process } check_error_log_during_runtime() { Grep monitor.log for: - "ERROR" - "corrupted" - "not found" - "missing" If found, alert user IMMEDIATELY Don't wait for shutdown to show errors } stop_error_log_monitor() { Kill monitor process Analyze /tmp/monitor.log for error patterns Suggest recovery mode based on errors } ``` ### Location in Script - Modify `start_second_instance()` to enable monitoring - Add monitoring functions: `start_error_log_monitor()`, `check_error_log_during_runtime()`, `stop_error_log_monitor()` - Call monitor start at line 1032 (after MySQL start in background) - Check monitor during wait loop (lines 1037-1042) - Analyze monitor results before database check --- ## ISSUE #5: No Recovery Mode Escalation Logic ### Current Behavior - User selects ONE recovery mode - If it fails, script exits - User must re-run and select different mode manually ### Problem - Modes 0-4 don't fix system table corruption - User keeps trying same mode without knowing why it fails - No logic to suggest Mode 5-6 when Modes 1-4 fail ### Required Fix Implement mode escalation: ```bash escalate_recovery_mode() { If Mode 2 failed due to metadata → suggest Mode 4 If Mode 4 failed (instance started but DB not found) → suggest Mode 5 If Mode 5-6 required → explain data loss risk Ask user if they want to auto-retry with higher mode Track which modes have been tried Don't repeat mode, go higher } Auto-escalate Pattern: Try Mode: [selected] → Fails with system error Suggest Mode: [selected + 2] → Auto-retry? (y/n) If user accepts → Re-run without restarting script If fails again → Suggest Mode 6 ``` ### Location in Script - Modify `step5_create_dump()` error handling (line 1896-1901) - Add: `escalate_recovery_mode()` function - Call on dump_database failure to determine next mode - Allow re-attempt with higher mode --- ## ISSUE #6: Architecture Problem - Linear vs. Menu ### Current Behavior ``` Step 1 → Step 2 → Step 3 → Step 4 → Step 5 → exit ``` ### Problem - Script is linear (one-way flow) - Can't retry failed step without re-running entire script - User must restart from beginning if they want to try different recovery mode - No menu to navigate between steps ### Required Fix Options #### Option A: Add Menu Loop (Recommended) ```bash while true; do show_main_menu case $option in 1) perform_step_1 ;; 2) perform_step_2 ;; 3) perform_step_3 ;; 4) perform_step_4 ;; 5) perform_step_5 ;; 0) exit ;; esac # Return to menu on success or failure done ``` #### Option B: Keep Linear but Add Retry Loop ```bash # Current steps but with retry logic for each step # If step fails, ask "Retry with different options? (y/n)" # Allow re-attempting without full restart ``` **Recommendation**: Option B (minimal refactoring, keeps existing workflow) ### Location in Script - Modify main() function (line 1939) - Add conditional logic after each step - Replace `exit` calls with `return` - Check if retry needed before proceeding to next step --- ## ISSUE #7: Exit Calls in Functions ### Current Behavior ```bash Line 1851: exit 0 (after cancel) Line 1963: exit 0 (step 1 retry=n) Line 1973: exit 0 (step 2 retry=n) Line 1983: exit 0 (step 3 retry=n) Line 1929: Function returns (then main() ends, script exits) ``` ### Problem - Functions use `exit` instead of `return` - When function exits, entire script terminates - Can't retry or go back to menu ### Required Fix Replace ALL `exit` calls with control flow: ```bash # WRONG: if [ "$retry" != "y" ]; then exit 0 fi # CORRECT: if [ "$retry" != "y" ]; then return 1 # Return to caller fi # Caller decides what to do next (retry, menu, exit, etc.) ``` ### Locations to Fix - Line 1851: Change `exit 0` to `return 1` - Line 1963: Change `exit 0` to `return 1` - Line 1973: Change `exit 0` to `return 1` - Line 1983: Change `exit 0` to `return 1` - Line 1943: Keep `exit 1` (dependency check failure - critical) - Line 1954: Keep `exit 0` (user explicitly cancelled - OK) --- ## IMPLEMENTATION PRIORITY ### Phase 1: CRITICAL (Do First) 1. **Add pre-flight file validation** (Issue #1) - Estimated effort: 30 minutes - Impact: Users know if files are missing 2. **Enhance database discovery** (Issue #2) - Estimated effort: 45 minutes - Impact: Users see what databases were found 3. **Add system table validation** (Issue #3) - Estimated effort: 45 minutes - Impact: Users know if system tables are corrupted ### Phase 2: IMPORTANT (Do Next) 4. **Add active error log monitoring** (Issue #4) - Estimated effort: 60 minutes - Impact: Real-time error visibility 5. **Fix exit calls** (Issue #7) - Estimated effort: 15 minutes - Impact: Enables retry and menu loop ### Phase 3: ENHANCEMENT (Do After) 6. **Add recovery mode escalation** (Issue #5) - Estimated effort: 60 minutes - Impact: Auto-suggest higher modes 7. **Add menu/retry loop** (Issue #6) - Estimated effort: 60 minutes - Impact: Users can run multiple recoveries --- ## EXPECTED IMPROVEMENTS ### Before Fixes ``` User runs script ↓ [OK] InnoDB initialized successfully [ERROR] Database 'yourloca_wp2' not found in second instance [ERROR] Failed to create dump ↓ Script exits - user confused about why ``` ### After Phase 1 Fixes ``` User runs script ↓ [INFO] Validating backup files... [OK] All required files present [OK] InnoDB initialized successfully [INFO] Found databases: information_schema, mysql, performance_schema, yourloca_wp2 [OK] Dump created successfully ``` ### After Phase 2 Fixes (with error) ``` User runs script ↓ [INFO] Validating backup files... [ERROR] Critical files missing: mysql/db.ibd [ERROR] System tables corrupted - database metadata unavailable [INFO] Recovery options: 1. Restore mysql/ directory from backup 2. Use recovery mode 5 (skip checksums) 3. Restore to fresh MySQL instance ↓ [?] Would you like to: - Retry with different recovery mode? (y/n) - Exit and restore mysql/ separately? (y/n) ``` --- ## TESTING PLAN After implementing fixes: 1. **Test Case 1: Healthy Backup** - ✓ All files present - ✓ System tables intact - ✓ Database appears in SHOW DATABASES - Expected: Successful dump 2. **Test Case 2: Missing Database Directory** - ✗ Database directory absent - Expected: Pre-flight validation catches it 3. **Test Case 3: Corrupted System Tables** - ✓ Files present - ✗ mysql/db.ibd missing/corrupted - Expected: System table test catches it 4. **Test Case 4: Retry with Different Mode** - ✓ Mode 2 fails - ✓ Script suggests Mode 4 - ✓ User retries without full restart - Expected: Menu loop allows retry --- ## DOCUMENTATION TO UPDATE After implementing fixes: 1. Add troubleshooting guide for corrupted system tables 2. Document recovery mode selection guide 3. Add error message reference guide 4. Update pre-requisites section --- ## CONCLUSION These 5+2 fixes will transform the script from a "one-shot recovery tool" to a "diagnostic and recovery assistant" that helps users understand and resolve InnoDB corruption issues. **Priority**: Implement Phase 1 first (most impactful, lowest effort) **Estimated Total Effort**: 4-5 hours for all phases **Expected User Impact**: High (clearer diagnostics, better error messages) --- **Generated**: February 27, 2026 **Status**: Ready for Implementation