MDADM RAID 0 failure

Last night I noticed that the system clock on my Ubuntu server was 5 minutes fast, so I ran 'ntpdate pool.ntp.org' and went to bed.

This morning I noticed that the Samba shares weren't working. Looking at the server, I started seeing permissions displayed as ???? on the volumes that hold the shares.

I rebooted the server and I can see that mdadm failed:

    [ 13.920349] sd 3:0:0:0: [sdb]
    [ 13.920388] Sense Key : Medium Error [current] [descriptor]
    [ 13.920499] Descriptor sense data with sense descriptors (in hex):
    [ 13.920559] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
    [ 13.922059] 00 00 00 00
    [ 13.922223] sd 3:0:0:0: [sdb]
    [ 13.922255] Add. Sense: Unrecovered read error - auto reallocate failed
    [ 13.922316] sd 3:0:0:0: [sdb] CDB:
    [ 13.922347] Read(16): 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
    [ 13.922855] end_request: I/O error, dev sdb, sector 0
    [ 13.922888] Buffer I/O error on device sdb, logical block 0
    [ 13.922927] ata4: EH complete
    [ 14.859145] ldm_validate_partition_table(): Disk read failed.
    [ 14.859203] Dev sdb: unable to read RDB block 0
    [ 14.870317] sdb: unable to read partition table
    [ 14.870646] sdb: detected capacity change from 0 to 4000787030016
    [ 14.870869] sd 3:0:0:0: [sdb] Attached SCSI disk
    [ 14.886265] random: nonblocking pool is initialized
    [ 15.510741] md: bind<sdc1>
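The important part of that log is the read failure at sector 0 of sdb: the kernel then reports it is unable to read the partition table, so /dev/sdb1 is never created and only sdc1 gets bound to md. On a longer dmesg, a small filter makes such errors easier to spot. This is a minimal sketch; io_errors is a hypothetical helper name, and it assumes the kernel prints block-layer errors in the "I/O error, dev X, sector N" form seen above (the prefix before it varies between kernel versions).

```shell
#!/usr/bin/env bash
# Print "device sector" for every block-layer I/O error in a kernel log.
# Reads the log on stdin, e.g. `dmesg | io_errors`.
io_errors() {
  # Matches lines such as: "end_request: I/O error, dev sdb, sector 0"
  sed -n 's/.*I\/O error, dev \([a-z0-9]*\), sector \([0-9]*\).*/\1 \2/p'
}

# Example using one line from the boot log above.
printf '%s\n' '[ 13.922855] end_request: I/O error, dev sdb, sector 0' | io_errors
# prints: sdb 0
```

On the live system this would be run as dmesg | io_errors; lines without that pattern (such as "md: bind<sdc1>") are ignored.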

Trying to make sense of this, here is my mdadm.conf:

    cat /etc/mdadm/mdadm.conf
    # mdadm.conf
    #
    # Please refer to mdadm.conf(5) for information about this file.
    #
    # by default (built-in), scan all partitions (/proc/partitions) and all
    # containers for MD superblocks. alternatively, specify devices to scan, using
    # wildcards if desired.
    #DEVICE partitions containers
    # auto-create devices with Debian standard permissions
    CREATE owner=root group=disk mode=0660 auto=yes
    # automatically tag new arrays as belonging to the local system
    HOMEHOST <system>
    # instruct the monitoring daemon where to send mail alerts
    MAILADDR root
    # definitions of existing MD arrays
    # This file was auto-generated on Mon, 16 Feb 2015 18:24:04 -0500
    # by mkconf $Id$
    DEVICE /dev/sdb1 /dev/sdc1
    ARRAY /dev/md0 level=raid0 devices=/dev/sdb1,/dev/sdc1

Next, I ran smartctl on both disks in the RAID, and they look healthy:

    smartctl -a -s on /dev/sdb
    smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.13.0-45-generic] (local build)
    Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
    === START OF INFORMATION SECTION ===
    Device Model: WDC WD40EZRX-00SPEB0
    Serial Number: WD-WCC4E0NLZ6ED
    LU WWN Device Id: 5 0014ee 20b74560f
    Firmware Version: 80.00A80
    User Capacity: 4,000,787,030,016 bytes [4.00 TB]
    Sector Sizes: 512 bytes logical, 4096 bytes physical
    Rotation Rate: 5400 rpm
    Device is: Not in smartctl database [for details use: -P showall]
    ATA Version is: ACS-2 (minor revision not indicated)
    SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
    Local Time is: Fri Oct 2 11:45:31 2015 EDT
    SMART support is: Available - device has SMART capability.
    SMART support is: Disabled
    === START OF ENABLE/DISABLE COMMANDS SECTION ===
    SMART Enabled.
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    General SMART Values:
    Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled.
    Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
    Total time to complete Offline data collection: (55740) seconds.
    Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
    SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
    Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
    Short self-test routine recommended polling time: ( 2) minutes.
    Extended self-test routine recommended polling time: ( 557) minutes.
    Conveyance self-test routine recommended polling time: ( 5) minutes.
    SCT capabilities: (0x7035) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.
    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x002f   100   253   051    Pre-fail  Always       -       0
      3 Spin_Up_Time            0x0027   184   184   021    Pre-fail  Always       -       7775
      4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       15
      5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
      9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       5434
     10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
     11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       15
    192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       6
    193 Load_Cycle_Count        0x0032   132   132   000    Old_age   Always       -       204074
    194 Temperature_Celsius     0x0022   119   109   000    Old_age   Always       -       33
    196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
    197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       96
    198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       49
    199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
    200 Multi_Zone_Error_Rate   0x0008   199   199   000    Old_age   Offline      -       758
    SMART Error Log Version: 1
    No Errors Logged
    SMART Self-test log structure revision number 1
    No self-tests have been logged. [To run self-tests, use: smartctl -t]
    SMART Selective self-test log data structure revision number 1
     SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
        1       0       0 Not_testing
        2       0       0 Not_testing
        3       0       0 Not_testing
        4       0       0 Not_testing
        5       0       0 Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.

    smartctl -a -s on /dev/sdc
    smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.13.0-45-generic] (local build)
    Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
    === START OF INFORMATION SECTION ===
    Device Model: WDC WD40EZRX-00SPEB0
    Serial Number: WD-WCC4E0CZDE98
    LU WWN Device Id: 5 0014ee 2b6205b40
    Firmware Version: 80.00A80
    User Capacity: 4,000,787,030,016 bytes [4.00 TB]
    Sector Sizes: 512 bytes logical, 4096 bytes physical
    Rotation Rate: 5400 rpm
    Device is: Not in smartctl database [for details use: -P showall]
    ATA Version is: ACS-2 (minor revision not indicated)
    SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
    Local Time is: Fri Oct 2 11:47:29 2015 EDT
    SMART support is: Available - device has SMART capability.
    SMART support is: Disabled
    === START OF ENABLE/DISABLE COMMANDS SECTION ===
    SMART Enabled.
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    General SMART Values:
    Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled.
    Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
    Total time to complete Offline data collection: (52020) seconds.
    Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
    SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
    Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
    Short self-test routine recommended polling time: ( 2) minutes.
    Extended self-test routine recommended polling time: ( 520) minutes.
    Conveyance self-test routine recommended polling time: ( 5) minutes.
    SCT capabilities: (0x7035) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.
    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x002f   100   253   051    Pre-fail  Always       -       0
      3 Spin_Up_Time            0x0027   174   174   021    Pre-fail  Always       -       8258
      4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       10
      5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
      9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       5434
     10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
     11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       10
    192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       3
    193 Load_Cycle_Count        0x0032   138   138   000    Old_age   Always       -       188394
    194 Temperature_Celsius     0x0022   120   112   000    Old_age   Always       -       32
    196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
    197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
    200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
    SMART Error Log Version: 1
    No Errors Logged
    SMART Self-test log structure revision number 1
    No self-tests have been logged. [To run self-tests, use: smartctl -t]
    SMART Selective self-test log data structure revision number 1
     SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
        1       0       0 Not_testing
        2       0       0 Not_testing
        3       0       0 Not_testing
        4       0       0 Not_testing
        5       0       0 Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
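The overall PASSED verdict roughly means that no normalized attribute value has dropped to its vendor threshold; it does not look at the raw counters. Those differ between the two drives: on /dev/sdb, Current_Pending_Sector is 96 and Offline_Uncorrectable is 49, while /dev/sdc reports 0 for both. A minimal sketch that flags non-zero raw values for attributes 5, 197 and 198, assuming the 10-column attribute table printed by smartctl 6.2 above (bad_sectors is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Report SMART attributes whose raw value hints at surface damage.
# Input: the attribute table printed by `smartctl -A /dev/sdX` (on stdin).
# Assumption: the raw value is the 10th whitespace-separated field.
bad_sectors() {
  awk '$1 == 5 || $1 == 197 || $1 == 198 {
         # Only flag attributes whose raw count is non-zero
         if ($10 + 0 > 0) print $2 "=" $10
       }'
}

# Example: the three relevant rows reported for /dev/sdb above.
bad_sectors <<'EOF'
  5 Reallocated_Sector_Ct   0x0033 200 200 140 Pre-fail Always - 0
197 Current_Pending_Sector  0x0032 200 200 000 Old_age  Always - 96
198 Offline_Uncorrectable   0x0030 200 200 000 Old_age  Offline - 49
EOF
# prints:
# Current_Pending_Sector=96
# Offline_Uncorrectable=49
```

On the server itself this would be smartctl -A /dev/sdb | bad_sectors; for the /dev/sdc table above it prints nothing.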

So far everything looks fine, but then I run mdadm and get this:

    mdadm -E /dev/sdc1
    /dev/sdc1:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 933d1825:56122a49:779fbad0:926ab5c9
               Name : BAILEYFS01:0 (local to host BAILEYFS01)
      Creation Time : Tue Feb 17 17:22:13 2015
         Raid Level : raid0
       Raid Devices : 2
     Avail Dev Size : 7814033392 (3726.02 GiB 4000.79 GB)
        Data Offset : 16 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : e78061dc:86e60bc0:f4f81839:3816d74a
        Update Time : Tue Feb 17 17:22:13 2015
           Checksum : 1d8e1dfc - correct
             Events : 0
         Chunk Size : 512K
        Device Role : Active device 1
        Array State : AA ('A' == active, '.' == missing)

    mdadm -E /dev/sdb1
    mdadm: cannot open /dev/sdb1: No such file or directory
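Separately from the disk error, the mdadm.conf above identifies the array only by the device names /dev/sdb1 and /dev/sdc1 (the DEVICE line and devices=), and sdX names can change between boots. mdadm.conf(5) also accepts identity tags such as UUID= and name= on the ARRAY line. A sketch that builds such a line from the superblock printed by mdadm -E; array_line is a hypothetical helper, and keeping the name /dev/md0 is an assumption taken from the current file:

```shell
#!/usr/bin/env bash
# Build a UUID-based ARRAY line for mdadm.conf from `mdadm -E` output (stdin).
# Assumption: the array keeps the name /dev/md0, as in the current file.
array_line() {
  awk -F' : ' '
    /^ *Version/    { ver = $2 }
    /^ *Array UUID/ { uuid = $2 }
    # "Name : BAILEYFS01:0 (local to host ...)" -> keep only the first word
    /^ *Name/       { split($2, n, " "); name = n[1] }
    END {
      if (uuid != "")
        printf "ARRAY /dev/md0 metadata=%s UUID=%s name=%s\n", ver, uuid, name
    }'
}

# Example: the relevant lines from the /dev/sdc1 superblock shown above.
array_line <<'EOF'
        Version : 1.2
     Array UUID : 933d1825:56122a49:779fbad0:926ab5c9
           Name : BAILEYFS01:0 (local to host BAILEYFS01)
EOF
# prints: ARRAY /dev/md0 metadata=1.2 UUID=933d1825:56122a49:779fbad0:926ab5c9 name=BAILEYFS01:0
```

When the array is running, mdadm --detail --scan prints an equivalent ARRAY line directly. On Ubuntu, running update-initramfs -u after editing /etc/mdadm/mdadm.conf keeps the copy inside the initramfs in sync.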

Here is the fdisk output for both disks in the array:

    fdisk -l /dev/sdb
    WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! The util fdisk doesn't support GPT. Use GNU Parted.
    Disk /dev/sdb: 4000.8 GB, 4000787030016 bytes
    255 heads, 63 sectors/track, 486401 cylinders, total 7814037168 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    Disk identifier: 0x00000000
       Device Boot      Start         End      Blocks   Id  System
    /dev/sdb1               1  4294967295  2147483647+  ee  GPT
    Partition 1 does not start on physical sector boundary.

    fdisk -l /dev/sdc
    WARNING: GPT (GUID Partition Table) detected on '/dev/sdc'! The util fdisk doesn't support GPT. Use GNU Parted.
    Disk /dev/sdc: 4000.8 GB, 4000787030016 bytes
    255 heads, 63 sectors/track, 486401 cylinders, total 7814037168 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    Disk identifier: 0x00000000
       Device Boot      Start         End      Blocks   Id  System
    /dev/sdc1               1  4294967295  2147483647+  ee  GPT
    Partition 1 does not start on physical sector boundary.

Here is the parted output (note that it lists all my disks, since parted -l prints every block device regardless of the device argument; the only ones I care about are the ones that make up my RAID array, sdb and sdc):

    parted -l /dev/sdb
    Model: ATA ST3250318AS (scsi)
    Disk /dev/sda: 250GB
    Sector size (logical/physical): 512B/512B
    Partition Table: msdos
    Number  Start   End    Size    Type      File system     Flags
     1      1049kB  246GB  246GB   primary   ext4            boot
     2      246GB   250GB  3754MB  extended
     5      246GB   250GB  3754MB  logical   linux-swap(v1)

    Model: ATA WDC WD40EZRX-00S (scsi)
    Disk /dev/sdb: 4001GB
    Sector size (logical/physical): 512B/4096B
    Partition Table: gpt
    Number  Start   End     Size    File system  Name     Flags
     1      1049kB  4001GB  4001GB  ext3         primary

    Model: ATA WDC WD40EZRX-00S (scsi)
    Disk /dev/sdc: 4001GB
    Sector size (logical/physical): 512B/4096B
    Partition Table: gpt
    Number  Start   End     Size    File system  Name     Flags
     1      1049kB  4001GB  4001GB  ext3         primary

    Model: ATA ST3000DM001-9YN1 (scsi)
    Disk /dev/sdd: 3001GB
    Sector size (logical/physical): 512B/4096B
    Partition Table: gpt
    Number  Start   End     Size    File system  Name     Flags
     1      1049kB  3001GB  3001GB  ext4         primary  msftdata

    Model: Seagate Desktop (scsi)
    Disk /dev/sde: 3001GB
    Sector size (logical/physical): 4096B/4096B
    Partition Table: gpt
    Number  Start   End     Size    File system  Name                  Flags
     1      1049kB  3001GB  3001GB               Basic data partition  msftdata

Here is the gdisk -l output for both disks:

    gdisk -l /dev/sdb
    GPT fdisk (gdisk) version 0.8.8
    Partition table scan:
      MBR: protective
      BSD: not present
      APM: not present
      GPT: present
    Found valid GPT with protective MBR; using GPT.
    Disk /dev/sdb: 7814037168 sectors, 3.6 TiB
    Logical sector size: 512 bytes
    Disk identifier (GUID): DA484D62-BB5D-461B-9F96-EAC8A5815C7B
    Partition table holds up to 128 entries
    First usable sector is 34, last usable sector is 7814037134
    Partitions will be aligned on 2048-sector boundaries
    Total free space is 3693 sectors (1.8 MiB)
    Number  Start (sector)    End (sector)  Size       Code  Name
       1            2048      7814035455   3.6 TiB     8300  primary

    gdisk -l /dev/sdc
    GPT fdisk (gdisk) version 0.8.8
    Partition table scan:
      MBR: protective
      BSD: not present
      APM: not present
      GPT: present
    Found valid GPT with protective MBR; using GPT.
    Disk /dev/sdc: 7814037168 sectors, 3.6 TiB
    Logical sector size: 512 bytes
    Disk identifier (GUID): BDE33471-BF86-4B5F-9DAF-5D3E67AE7E40
    Partition table holds up to 128 entries
    First usable sector is 34, last usable sector is 7814037134
    Partitions will be aligned on 2048-sector boundaries
    Total free space is 3693 sectors (1.8 MiB)
    Number  Start (sector)    End (sector)  Size       Code  Name
       1            2048      7814035455   3.6 TiB     8300  primary
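The "Partition 1 does not start on physical sector boundary" warning from fdisk refers to the protective MBR entry (type ee, starting at sector 1), because this fdisk cannot read GPT. gdisk shows the real partition starting at sector 2048, i.e. 2048 × 512 = 1,048,576 bytes (the 1049kB start that parted reports), which is a multiple of the 4096-byte physical sector. A quick check of that arithmetic; is_aligned is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Succeeds if a partition's first sector falls on a physical-sector boundary.
# Arguments: start sector, logical sector size, physical sector size (bytes).
is_aligned() {
  local start=$1 logical=$2 physical=$3
  (( start * logical % physical == 0 ))
}

# Real GPT partition reported by gdisk: starts at sector 2048 (1 MiB).
is_aligned 2048 512 4096 && echo "sector 2048: aligned"
# Protective MBR entry reported by fdisk: starts at sector 1.
is_aligned 1 512 4096 || echo "sector 1: not aligned"
# prints:
# sector 2048: aligned
# sector 1: not aligned
```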

I really have no idea what to do next… Is there a way to repair this array?

A few general comments.

  1. You have SMART disabled on both disks (or at least, you had not enabled it). No tests are running and none have ever been run. That tells me there is no way to know whether the disk is failing or not.

  2. The kernel error message "Add. Sense: Unrecovered read error - auto reallocate failed" indicates that the disk is failing catastrophically, because it no longer has spare sectors to replace the failed one. That is really bad news for a disk in a RAID 0 set.

You may get away with a full power-off and restart, but regardless, I strongly recommend installing the SMART tools (smartmontools) and configuring them to test the disks regularly.
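Regular testing is configured through smartd, which reads /etc/smartd.conf. A minimal sketch that prints one directive per array member, modelled on the example in smartd.conf(5): -a monitors all attributes, -o on and -S on enable automatic offline data collection and attribute autosave, -s schedules a short self-test daily at 02:00 and a long one on Saturdays at 03:00, and -m root mails warnings to root. The schedule, the mail address and the helper name smartd_lines are assumptions:

```shell
#!/usr/bin/env bash
# Print one smartd.conf directive per device given as an argument.
# Assumed schedule: short self-test daily at 02:00, long test Saturdays at 03:00.
smartd_lines() {
  local dev
  for dev in "$@"; do
    # -a: monitor everything; -o on / -S on: offline collection and autosave;
    # -s: self-test schedule; -m root: mail warnings to root
    printf '%s -a -o on -S on -s (S/../.././02|L/../../6/03) -m root\n' "$dev"
  done
}

smartd_lines /dev/sdb /dev/sdc
```

The output would be appended to /etc/smartd.conf, followed by a restart of smartd. For a one-off check right now, smartctl -t long /dev/sdb starts an extended test (the drive estimates 557 minutes), and smartctl -l selftest /dev/sdb shows the result once it finishes.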

Thanks to @casey for the comments!

After running the parted and gdisk commands, it's clear that the disks are healthy, so I don't know why the array fails at boot.

A colleague recommended running:

    partprobe /dev/sdb

I did that and then re-ran the mdadm commands, and now mdadm can see sdb:

    mdadm -E /dev/sdb
    /dev/sdb:
       MBR Magic : aa55
    Partition[0] : 4294967295 sectors at 1 (type ee)
    mdadm -E /dev/sdc
    /dev/sdc:
       MBR Magic : aa55
    Partition[0] : 4294967295 sectors at 1 (type ee)

I rebooted once more, and this time there were no errors and the RAID was assembled correctly.

I probably need to figure out which command the boot process runs, so that I can run it manually if this happens again.
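On Ubuntu, arrays are typically assembled at boot either by udev running mdadm --incremental on each member as it appears, or by mdadm --assemble --scan, which reads /etc/mdadm/mdadm.conf. If sdb1 goes missing again, a manual equivalent of what worked above is to re-read the partition table and then assemble from mdadm.conf. A hedged sketch; reassemble_md0, run and DRY_RUN are hypothetical names, and without DRY_RUN=1 the commands really execute and need root:

```shell
#!/usr/bin/env bash
# Manually redo the boot-time assembly of /dev/md0 (sketch).
# With DRY_RUN=1 the commands are only printed, not executed.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

reassemble_md0() {
  # Re-read sdb's partition table so /dev/sdb1 reappears (the fix used above)
  run partprobe /dev/sdb
  # Assemble every array defined in /etc/mdadm/mdadm.conf
  run mdadm --assemble --scan
  # Show which arrays are active afterwards
  run cat /proc/mdstat
}

# Safe demonstration: print the steps instead of running them.
DRY_RUN=1 reassemble_md0
# prints:
# partprobe /dev/sdb
# mdadm --assemble --scan
# cat /proc/mdstat
```

If assembly succeeds, /proc/mdstat should list md0 as an active raid0 with both sdb1 and sdc1; since RAID 0 has no redundancy, it cannot start while either member is missing.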

Phew… what a crazy morning it's been!