Add detailed skip tracking to hardware health check disk summary

Enhancement: Show exactly what devices were skipped and why

Problem:
The disk summary showed "Total disks checked: 2" but only displayed
1 disk in the report. Users couldn't tell what was skipped or why.

Solution:
Added comprehensive skip tracking and breakdown in summary:

Skip Counters Added:
- skipped_count: Total devices skipped
- skipped_raid: Hardware RAID controllers
- skipped_virtual: Virtual/cloud disks
- skipped_lvm: Software RAID/LVM volumes
- skipped_other: USB/special devices

Summary Now Shows:
 Total devices found: X
 Physical disks monitored: X healthy, X warning, X failed
 Devices skipped (SMART not applicable): X
  • Hardware RAID controllers: X (use vendor tools)
  • Software RAID/LVM: X (monitor underlying disks)
  • Virtual/cloud disks: X (managed by hypervisor)
  • Other (USB/special): X (see findings for details)

Example Output (Physical Server with RAID):
Before:
  Total disks checked: 2
  Healthy: 1
  Warning: 0
  Failed: 0

After:
  Total devices found: 2
  Physical disks monitored: 1 healthy, 0 warning, 0 failed
  Devices skipped (SMART not applicable): 1
    • Hardware RAID controllers: 1 (use vendor tools)

Benefits:
 Crystal clear what was skipped and why
 Users understand the complete device inventory
 Each skip type has helpful guidance
 No confusion about missing devices

Changes to modules/performance/hardware-health-check.sh:
- Lines 139-147: Added skip counter variables
- Lines 160-161, 168-169: Track inaccessible devices as skipped
- Lines 210-211: Track RAID controllers as skipped
- Lines 252-253: Track virtual disks as skipped
- Lines 261-262: Track LVM/software RAID as skipped
- Lines 285-286, 294-295: Track other special devices as skipped
- Lines 560-588: Enhanced summary with skip breakdown

User Request: "add anythihg minor to enhance it"

Status:  COMPLETE - Summary now shows full device inventory breakdown
This commit is contained in:
cschantz
2025-12-16 02:52:06 -05:00
parent 9a5a55f788
commit e050bb17ea
+45 -5
View File
@@ -140,6 +140,11 @@ After installing, run: systemctl enable smartd && systemctl start smartd"
local healthy_count=0 local healthy_count=0
local warning_count=0 local warning_count=0
local failed_count=0 local failed_count=0
local skipped_count=0
local skipped_raid=0
local skipped_virtual=0
local skipped_lvm=0
local skipped_other=0
for disk in $disks; do for disk in $disks; do
disk_count=$((disk_count + 1)) disk_count=$((disk_count + 1))
@@ -152,12 +157,16 @@ After installing, run: systemctl enable smartd && systemctl start smartd"
# 1. CHECK: Device exists and smartctl can communicate # 1. CHECK: Device exists and smartctl can communicate
if echo "$device_info" | grep -qiE "No such device|open device.*failed|Permission denied"; then if echo "$device_info" | grep -qiE "No such device|open device.*failed|Permission denied"; then
echo -e "${CYAN}[INFO]${NC} Skipping $disk (device not accessible)" echo -e "${CYAN}[INFO]${NC} Skipping $disk (device not accessible)"
skipped_count=$((skipped_count + 1))
skipped_other=$((skipped_other + 1))
continue continue
fi fi
# 2. CHECK: SMART support availability # 2. CHECK: SMART support availability
if echo "$device_info" | grep -qiE "SMART support is:.*Unavailable|SMART support is:.*Disabled|Unable to detect device type"; then if echo "$device_info" | grep -qiE "SMART support is:.*Unavailable|SMART support is:.*Disabled|Unable to detect device type"; then
echo -e "${CYAN}[INFO]${NC} Skipping $disk (SMART not supported or disabled)" echo -e "${CYAN}[INFO]${NC} Skipping $disk (SMART not supported or disabled)"
skipped_count=$((skipped_count + 1))
skipped_other=$((skipped_other + 1))
continue continue
fi fi
@@ -198,6 +207,8 @@ After installing, run: systemctl enable smartd && systemctl start smartd"
fi fi
echo -e "${CYAN}[INFO]${NC} Skipping $disk ($raid_type: $model)" echo -e "${CYAN}[INFO]${NC} Skipping $disk ($raid_type: $model)"
skipped_count=$((skipped_count + 1))
skipped_raid=$((skipped_raid + 1))
add_finding "INFO" "$raid_type Detected: $disk" \ add_finding "INFO" "$raid_type Detected: $disk" \
"Device: $disk "Device: $disk
Controller: $model Controller: $model
@@ -238,6 +249,8 @@ Check controller logs and status for drive failures." \
fi fi
echo -e "${CYAN}[INFO]${NC} Skipping $disk ($virt_type)" echo -e "${CYAN}[INFO]${NC} Skipping $disk ($virt_type)"
skipped_count=$((skipped_count + 1))
skipped_virtual=$((skipped_virtual + 1))
# Already handled by VM detection at start of function # Already handled by VM detection at start of function
continue continue
fi fi
@@ -245,6 +258,8 @@ Check controller logs and status for drive failures." \
# 6. DETECT: Software RAID / LVM / Device Mapper # 6. DETECT: Software RAID / LVM / Device Mapper
if echo "$disk" | grep -qE "md[0-9]|dm-[0-9]"; then if echo "$disk" | grep -qE "md[0-9]|dm-[0-9]"; then
echo -e "${CYAN}[INFO]${NC} Skipping $disk (software RAID/LVM device)" echo -e "${CYAN}[INFO]${NC} Skipping $disk (software RAID/LVM device)"
skipped_count=$((skipped_count + 1))
skipped_lvm=$((skipped_lvm + 1))
add_finding "INFO" "️ Software RAID/LVM Detected: $disk" \ add_finding "INFO" "️ Software RAID/LVM Detected: $disk" \
"Device: $disk "Device: $disk
Type: Software RAID or LVM logical volume Type: Software RAID or LVM logical volume
@@ -267,6 +282,8 @@ For LVM (dm- devices):
# 7. DETECT: Loop devices, RAM disks, other special devices # 7. DETECT: Loop devices, RAM disks, other special devices
if echo "$disk" | grep -qE "loop|ram|zram|nbd"; then if echo "$disk" | grep -qE "loop|ram|zram|nbd"; then
echo -e "${CYAN}[INFO]${NC} Skipping $disk (special device: loop/ram/network)" echo -e "${CYAN}[INFO]${NC} Skipping $disk (special device: loop/ram/network)"
skipped_count=$((skipped_count + 1))
skipped_other=$((skipped_other + 1))
continue continue
fi fi
@@ -274,6 +291,8 @@ For LVM (dm- devices):
# Try to get SMART attributes - if this fails, skip # Try to get SMART attributes - if this fails, skip
if ! smartctl -A "$disk" &>/dev/null; then if ! smartctl -A "$disk" &>/dev/null; then
echo -e "${CYAN}[INFO]${NC} Skipping $disk (no SMART attributes available)" echo -e "${CYAN}[INFO]${NC} Skipping $disk (no SMART attributes available)"
skipped_count=$((skipped_count + 1))
skipped_other=$((skipped_other + 1))
add_finding "INFO" "️ Device Without SMART: $disk" \ add_finding "INFO" "️ Device Without SMART: $disk" \
"Device: $disk "Device: $disk
Model: ${model:-Unknown} Model: ${model:-Unknown}
@@ -538,12 +557,33 @@ SMART Attributes:
fi fi
done done
# Summary finding # Summary finding with skip breakdown
local summary_details="Total devices found: $disk_count
Physical disks monitored: $healthy_count healthy, $warning_count warning, $failed_count failed"
if [ "$skipped_count" -gt 0 ]; then
summary_details="${summary_details}
Devices skipped (SMART not applicable): $skipped_count"
if [ "$skipped_raid" -gt 0 ]; then
summary_details="${summary_details}
• Hardware RAID controllers: $skipped_raid (use vendor tools)"
fi
if [ "$skipped_lvm" -gt 0 ]; then
summary_details="${summary_details}
• Software RAID/LVM: $skipped_lvm (monitor underlying disks)"
fi
if [ "$skipped_virtual" -gt 0 ]; then
summary_details="${summary_details}
• Virtual/cloud disks: $skipped_virtual (managed by hypervisor)"
fi
if [ "$skipped_other" -gt 0 ]; then
summary_details="${summary_details}
• Other (USB/special): $skipped_other (see findings for details)"
fi
fi
add_finding "INFO" "Disk Health Summary" \ add_finding "INFO" "Disk Health Summary" \
"Total disks checked: $disk_count "$summary_details" \
Healthy: $healthy_count
Warning: $warning_count
Failed: $failed_count" \
"Regular SMART monitoring recommended: smartctl -a /dev/[disk]" "Regular SMART monitoring recommended: smartctl -a /dev/[disk]"
} }