The server enters emergency mode after a reboot. There are two disks on the machine, an SSD /dev/sda1 and an HDD /dev/sdb1. Originally the SSD was mounted at / and the HDD was mounted at /home.

Now the home directory is missing, although lsblk shows the two disks are still there. So I first tried to mount the HDD back manually, but it fails, saying it cannot mount a disk with type "LVM2_member".

It seems the disk was managed using LVM, so I used pvs, vgdisplay, and lvdisplay to inspect it, but none of those commands shows anything.

So it seems the LVM metadata for the volume is all missing. I think I should manually reconstruct those structures.

pvcreate /dev/sdb1

It failed, reporting "Can't initialize physical volume /dev/sdb1 of volume group cs without -ff."

I think it means the disk belongs to the volume group cs. But vgdisplay shows nothing, and I cannot create a logical volume in VG cs, because it does not exist.

Here are my questions:

  1. Why are the PVs all gone? Where should I begin troubleshooting?
  2. What should I do if I want the /home directory back? Should I start by creating the PV again? Will -ff remove the data on the disk?

Update: smartctl -x /dev/sdb shows the drive's User Capacity, Logical block size, Logical unit id, Device type, and Local Time, and they all look normal, but it says SMART support is unavailable. Result:

Vendor: AVAGO
Product: MR9361-8i
Revision: 4.68
Compliance: SPC-3
User Capacity: 4.00 TB
Logical block size: 512 bytes
Logical unit id: 0x600605b011806720ff00001901d6ca4a
Device type: disk
Local Time is: Wed Jun 12 1:40:01 2024 EDT
SMART support is: Unavailable - device lacks SMART capability
Read cache is: Enabled
Writeback Cache is: Disabled

I also tried pvcreate with --uuid and --restorefile (the file is /etc/lvm/backup/cs), but it still shows "Can't initialize physical volume /dev/sdb1 of volume group cs without -ff", and vgchange does not work because there is no VG cs.

What confuses me is that the pvcreate command refuses to restore the PV metadata, saying sdb1 is already part of the volume group cs. Meanwhile, vgdisplay and all other commands say this volume group does not exist.

About the RAID controller:

I forgot to mention the hardware RAID controller. It has a maximum of 8 slots, and only two of them (slot 0 and slot 2) are in use. Each drive is configured as its own RAID 0, so it behaves much like JBOD mode in many respects. But I am not sure whether this could have caused other subtle issues that led to all this. The details are below:

Update appendix:

Result of storcli /c0 show all:

Generating detailed summary of the adapter, it may take a while to complete.

CLI Version = 007.2612.0000.0000 June 13, 2023
Operating system = Linux 5.14.0-364.el9.x86_64
Controller = 0
Status = Success
Description = None


Basics :
======
Controller = 0
Model = AVAGO MegaRAID SAS 9361-8i
Serial Number = SPB2421459
Current Controller Date/Time = 06/12/2024, 06:52:38
Current System Date/time = 06/12/2024, 02:52:38
SAS Address = 500605b011806720
PCI Address = 00:17:00:00
Mfg Date = 06/26/21
Rework Date = 00/00/00
Revision No = 08C


Version :
=======
Firmware Package Build = 24.21.0-0159
Firmware Version = 4.680.00-8577
CPLD Version = 26747-01A
Bios Version = 6.36.00.3_4.19.08.00_0x06180206
HII Version = 03.25.05.15
Ctrl-R Version = 5.19-0609
Preboot CLI Version = 01.07-05:#%0000
NVDATA Version = 3.1705.00-0028
Boot Block Version = 3.07.00.00-0004
Driver Name = megaraid_sas
Driver Version = 07.725.01.00-rc1


Bus :
===
Vendor Id = 0x1000
Device Id = 0x5D
SubVendor Id = 0x1000
SubDevice Id = 0x9361
Host Interface = PCI-E
Device Interface = SAS-12G
Bus Number = 23
Device Number = 0
Function Number = 0
Domain ID = 0


Pending Images in Flash :
=======================
Image name = No pending images


Status :
======
Controller Status = Optimal
Memory Correctable Errors = 0
Memory Uncorrectable Errors = 0
ECC Bucket Count = 0
Any Offline VD Cache Preserved = No
BBU Status = NA
PD Firmware Download in progress = No
Support PD Firmware Download = Yes
Lock Key Assigned = No
Failed to get lock key on bootup = No
Lock key has not been backed up = No
Bios was not detected during boot = No
Controller must be rebooted to complete security operation = No
A rollback operation is in progress = No
At least one PFK exists in NVRAM = Yes
SSC Policy is WB = No
Controller has booted into safe mode = No
Controller shutdown required = No
Controller has booted into certificate provision mode = No
Current Personality = RAID-Mode 


Supported Adapter Operations :
============================
Rebuild Rate = Yes
CC Rate = Yes
BGI Rate  = Yes
Reconstruct Rate = Yes
Patrol Read Rate = Yes
Alarm Control = Yes
Cluster Support = No
BBU = NA
Spanning = Yes
Dedicated Hot Spare = Yes
Revertible Hot Spares = Yes
Foreign Config Import = Yes
Self Diagnostic = Yes
Allow Mixed Redundancy on Array = No
Global Hot Spares = Yes
Deny SCSI Passthrough = No
Deny SMP Passthrough = No
Deny STP Passthrough = No
Support more than 8 Phys = Yes
FW and Event Time in GMT = No
Support Enhanced Foreign Import = Yes
Support Enclosure Enumeration = Yes
Support Allowed Operations = Yes
Abort CC on Error = Yes
Support Multipath = Yes
Support Odd & Even Drive count in RAID1E = No
Support Security = No
Support Config Page Model = Yes
Support the OCE without adding drives = Yes
Support EKM = No
Snapshot Enabled = No
Support PFK = Yes
Support PI = Yes
Support LDPI Type1 = No
Support LDPI Type2 = No
Support LDPI Type3 = No
Support Ld BBM Info = No
Support Shield State = Yes
Block SSD Write Disk Cache Change = Yes
Support Suspend Resume BG ops = Yes
Support Emergency Spares = Yes
Support Set Link Speed = Yes
Support Boot Time PFK Change = No
Support JBOD = Yes
Disable Online PFK Change = No
Support Perf Tuning = Yes
Support SSD PatrolRead = Yes
Real Time Scheduler = Yes
Support Reset Now = Yes
Support Emulated Drives = Yes
Headless Mode = Yes
Dedicated HotSpares Limited = No
Point In Time Progress = Yes
Extended LD = Yes
Support Uneven span  = No
Support Config Auto Balance = No
Support Maintenance Mode = No
Support Diagnostic results = Yes
Support Ext Enclosure = Yes
Support Sesmonitoring = Yes
Support SecurityonJBOD = Yes
Support ForceFlash = Yes
Support DisableImmediateIO = Yes
Support LargeIOSupport = Yes
Support DrvActivityLEDSetting = Yes
Support FlushWriteVerify = Yes
Support CPLDUpdate = Yes
Support ForceTo512e = Yes
Support discardCacheDuringLDDelete = Yes
Support JBOD Write cache = No
Support Large QD Support = No
Support Ctrl Info Extended = No
Support IButton less = No
Support AES Encryption Algorithm = No
Support Encrypted MFC = No
Support Snapdump = Yes
Support Force Personality Change = No
Support Dual Fw Image = No
Support PSOC Update = No
Support Secure Boot = No
Support Debug Queue = Yes
Support Least Latency Mode = Yes
Support OnDemand Snapdump = Yes
Support Clear Snapdump = Yes
Support FW Triggered Snapdump = Yes
Support PHY current speed = No
Support Lane current speed = No
Support NVMe Width = No
Support Lane DeviceType = No
Support Extended Drive performance Monitoring = No
Support NVMe Repair = No
Support Platform Security = No
Support None Mode Params = No
Support Extended Controller Property = No
Support Smart Poll Interval for DirectAttached = No
Support Write Journal Pinning = No
Support SMP Passthru with Port Number = No
Support SnapDump Preboot Trace Buffer Toggle = No
Support Parity Read Cache Bypass = No
Support NVMe Init Error Device ConnectorIndex = No
Support VolatileKey = No
Support PSOC Part Information = No
Support Slow array threshold calculation = No
Support PCIe Reference Clock override = No


Supported PD Operations :
=======================
Force Online = Yes
Force Offline = Yes
Force Rebuild = Yes
Deny Force Failed = No
Deny Force Good/Bad = No
Deny Missing Replace = No
Deny Clear = No
Deny Locate = No
Support Power State = Yes
Set Power State For Cfg = No
Support T10 Power State = No
Support Temperature = Yes
NCQ = Yes
Support Max Rate SATA = No
Support Degraded Media = No
Support Parallel FW Update = Yes
Support Drive Crypto Erase = Yes
Support SSD Wear Gauge = No
Support Sanitize = Yes
Support Extended Sanitize = No


Supported VD Operations :
=======================
Read Policy = Yes
Write Policy = Yes
IO Policy = Yes
Access Policy = Yes
Disk Cache Policy = Yes
Reconstruction = Yes
Deny Locate = No
Deny CC = No
Allow Ctrl Encryption = No
Enable LDBBM = Yes
Support FastPath = Yes
Performance Metrics = Yes
Power Savings = No
Support Powersave Max With Cache = No
Support Breakmirror = Yes
Support SSC WriteBack = No
Support SSC Association = No
Support VD Hide = Yes
Support VD Cachebypass = Yes
Support VD discardCacheDuringLDDelete = Yes
Support VD Scsi Unmap = No


Advanced Software Option :
========================

----------------------------------------
Adv S/W Opt        Time Remaining  Mode 
----------------------------------------
MegaRAID FastPath  Unlimited       -    
MegaRAID RAID6     Unlimited       -    
MegaRAID RAID5     Unlimited       -    
----------------------------------------

Safe ID =  PSNQWQW9H83X2CKBX49E56PN7XC9ACA51INQIMIZ

HwCfg :
=====
ChipRevision =  C0
BatteryFRU = N/A
Front End Port Count = 0
Backend Port Count = 8
BBU = Absent
Alarm = On
Serial Debugger = Present
NVRAM Size = 32KB
Flash Size = 16MB
On Board Memory Size = 2048MB
CacheVault Flash Size = NA
TPM = Absent
Upgrade Key = Absent
On Board Expander = Absent
Temperature Sensor for ROC = Present
Temperature Sensor for Controller = Absent
Upgradable CPLD = Present
Upgradable PSOC = Absent
Current Size of CacheCade (GB) = 0
Current Size of FW Cache (MB) = 1698
ROC temperature(Degree Celsius) = 46


Policies :
========

Policies Table :
==============

------------------------------------------------
Policy                          Current Default 
------------------------------------------------
Predictive Fail Poll Interval   300 sec         
Interrupt Throttle Active Count 16              
Interrupt Throttle Completion   50 us           
Rebuild Rate                    30 %    30%     
PR Rate                         30 %    30%     
BGI Rate                        30 %    30%     
Check Consistency Rate          30 %    30%     
Reconstruction Rate             30 %    30%     
Cache Flush Interval            4s              
------------------------------------------------

Flush Time(Default) = 4s
Drive Coercion Mode = none
Auto Rebuild = On
Battery Warning = On
ECC Bucket Size = 15
ECC Bucket Leak Rate (hrs) = 24
Restore Hot Spare on Insertion = Off
Expose Enclosure Devices = On
Maintain PD Fail History = On
Reorder Host Requests = On
Auto detect BackPlane = SGPIO/i2c SEP
Load Balance Mode = Auto
Security Key Assigned = Off
Disable Online Controller Reset = Off
Use drive activity for locate = Off


Boot :
====
BIOS Enumerate VDs = 1
Stop BIOS on Error = Off
Delay during POST = 0
Spin Down Mode = None
Enable Ctrl-R = Yes
Enable Web BIOS = No
Enable PreBoot CLI = No
Enable BIOS = Yes
Max Drives to Spinup at One Time = 2
Maximum number of direct attached drives to spin up in 1 min = 10
Delay Among Spinup Groups (sec) = 12
Allow Boot with Preserved Cache = Off


High Availability :
=================
Topology Type = None
Cluster Permitted = No
Cluster Active = No


Defaults :
========
Phy Polarity = 0
Phy PolaritySplit = 0
Strip Size = 256 KB
Write Policy = WB
Read Policy = RA
Cache When BBU Bad = Off
Cached IO = Off
VD PowerSave Policy = Controller Defined
Default spin down time (mins) = 30
Coercion Mode = None
ZCR Config = Unknown
Max Chained Enclosures = 16
Direct PD Mapping = No
Restore Hot Spare on Insertion = No
Expose Enclosure Devices = Yes
Maintain PD Fail History = Yes
Zero Based Enclosure Enumeration = No
Disable Puncturing = No
EnableLDBBM = Yes
DisableHII = No
Un-Certified Hard Disk Drives = Allow
SMART Mode = Mode 6
Enable LED Header = Yes
LED Show Drive Activity = Yes
Dirty LED Shows Drive Activity = No
EnableCrashDump = Yes
Disable Online Controller Reset = No
Treat Single span R1E as R10 = No
Power Saving option = Enabled
TTY Log In Flash = No
Auto Enhanced Import = Yes
BreakMirror RAID Support = single span R1
Disable Join Mirror = Yes
Enable Shield State = Yes
Time taken to detect CME = 60 sec


Capabilities :
============
Supported Drives = SAS, SATA
RAID Level Supported = RAID0, RAID1(2 or more drives), RAID5, RAID6, RAID00, RAID10(2 or more drives per span), RAID50, RAID60
Enable JBOD = No
Mix in Enclosure = Allowed
Mix of SAS/SATA of HDD type in VD = Allowed
Mix of SAS/SATA of SSD type in VD = Allowed
Mix of SSD/HDD in VD = Allowed
SAS Disable = No
Max Arms Per VD = 32
Max Spans Per VD = 8
Max Arrays = 128
Max VD per array = 16
Max Number of VDs = 64
Max Parallel Commands = 928
Max SGE Count = 60
Max Data Transfer Size = 512 sectors
Max Strips PerIO = 128
Max Configurable CacheCade Size(GB) = 0
Max Transportable DGs = 0
Enable Snapdump = Yes
Enable SCSI Unmap = Yes
Read cache bypass enabled for Parity RAID LDs = No
FDE Drive Mix Support = No
Min Strip Size = 64 KB
Max Strip Size = 1.000 MB


Scheduled Tasks :
===============
Consistency Check Reoccurrence = 168 hrs
Next Consistency check launch = 06/15/2024, 03:00:00
Patrol Read Reoccurrence = 168 hrs
Next Patrol Read launch = 06/15/2024, 03:00:00
Battery learn Reoccurrence = NA
Next Battery Learn = NA
OEMID = Broadcom


Security Protocol properties :
============================
Security Protocol = None

Drive Groups = 2

TOPOLOGY :
========

-----------------------------------------------------------------------------
DG Arr Row EID:Slot DID Type  State BT       Size PDC  PI SED DS3  FSpace TR 
-----------------------------------------------------------------------------
 0 -   -   -        -   RAID0 Optl  N    3.637 TB dflt N  N   dflt N      N  
 0 0   -   -        -   RAID0 Optl  N    3.637 TB dflt N  N   dflt N      N  
 0 0   0   252:0    0   DRIVE Onln  N    3.637 TB dflt N  N   dflt -      N  
 1 -   -   -        -   RAID0 Optl  N  893.750 GB dflt N  N   dflt N      N  
 1 0   -   -        -   RAID0 Optl  N  893.750 GB dflt N  N   dflt N      N  
 1 0   0   252:2    2   DRIVE Onln  N  893.750 GB dflt N  N   dflt -      N  
-----------------------------------------------------------------------------

DG=Disk Group Index|Arr=Array Index|Row=Row Index|EID=Enclosure Device ID
DID=Device ID|Type=Drive or RAID Type|Onln=Online|Rbld=Rebuild|Optl=Optimal
Dgrd=Degraded|Pdgd=Partially degraded|Offln=Offline|BT=Background Task Active
PDC=PD Cache|PI=Protection Info|SED=Self Encrypting Drive|Frgn=Foreign
DS3=Dimmer Switch 3|dflt=Default|Msng=Missing|FSpace=Free Space Present
TR=Transport Ready

Virtual Drives = 2

VD LIST :
=======

---------------------------------------------------------------
DG/VD TYPE  State Access Consist Cache Cac sCC       Size Name 
---------------------------------------------------------------
0/0   RAID0 Optl  RW     Yes     RWTD  -   ON    3.637 TB      
1/1   RAID0 Optl  RW     Yes     RWTD  -   ON  893.750 GB      
---------------------------------------------------------------

VD=Virtual Drive| DG=Drive Group|Rec=Recovery
Cac=CacheCade|OfLn=OffLine|Pdgd=Partially Degraded|Dgrd=Degraded
Optl=Optimal|dflt=Default|RO=Read Only|RW=Read Write|HD=Hidden|TRANS=TransportReady
B=Blocked|Consist=Consistent|R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack
AWB=Always WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency

Physical Drives = 2

PD LIST :
=======

----------------------------------------------------------------------------------------
EID:Slt DID State DG       Size Intf Med SED PI SeSz Model                      Sp Type 
----------------------------------------------------------------------------------------
252:0     0 Onln   0   3.637 TB SATA HDD N   N  512B ST4000NM000B-2TF100        U  -    
252:2     2 Onln   1 893.750 GB SATA SSD N   N  512B SAMSUNG MZ7LH960HAJR-00005 U  -    
----------------------------------------------------------------------------------------

EID=Enclosure Device ID|Slt=Slot No|DID=Device ID|DG=DriveGroup
DHS=Dedicated Hot Spare|UGood=Unconfigured Good|GHS=Global Hotspare
UBad=Unconfigured Bad|Sntze=Sanitize|Onln=Online|Offln=Offline|Intf=Interface
Med=Media Type|SED=Self Encryptive Drive|PI=PI Eligible
SeSz=Sector Size|Sp=Spun|U=Up|D=Down|T=Transition|F=Foreign
UGUnsp=UGood Unsupported|UGShld=UGood shielded|HSPShld=Hotspare shielded
CFShld=Configured shielded|Cpybck=CopyBack|CBShld=Copyback Shielded
UBUnsp=UBad Unsupported|Rbld=Rebuild

Enclosures = 1

Enclosure LIST :
==============

--------------------------------------------------------------------
EID State Slots PD PS Fans TSs Alms SIM Port# ProdID VendorSpecific 
--------------------------------------------------------------------
252 OK        8  2  0    0   0    0   1 -     SGPIO                 
--------------------------------------------------------------------

EID=Enclosure Device ID | PD=Physical drive count | PS=Power Supply count
TSs=Temperature sensor count | Alms=Alarm count | SIM=SIM Count | ProdID=Product ID

Result of the smartctl test on the first disk, which is the HDD in trouble:

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.14.0-364.el9.x86_64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST4000NM000B-2TF100
Serial Number:    WRA04TJW
LU WWN Device Id: 5 000c50 0e0b3a0e2
Firmware Version: TN01
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Jun 12 02:50:32 2024 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Status not supported: ATA return descriptor not supported by controller firmware
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (  567) seconds.
Offline data collection
capabilities:            (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   1) minutes.
Extended self-test routine
recommended polling time:    ( 376) minutes.
Conveyance self-test routine
recommended polling time:    (   2) minutes.
SCT capabilities:          (0x50bd) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   084   064   044    Pre-fail  Always       -       232627322
  3 Spin_Up_Time            0x0003   094   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       62
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   082   060   045    Pre-fail  Always       -       167705010
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       6038
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       62
 18 Unknown_Attribute       0x000b   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   070   039   000    Old_age   Always       -       30 (Min/Max 29/32)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       35
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1804
194 Temperature_Celsius     0x0022   030   061   000    Old_age   Always       -       30 (0 23 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       3010 (57 49 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       23822879535
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       4504320178233

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Result of the smartctl test on the disk in the other slot, which is the SSD:

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.14.0-364.el9.x86_64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     SAMSUNG MZ7LH960HAJR-00005
Serial Number:    S45NNA0T824376
LU WWN Device Id: 5 002538 e0280b4eb
Firmware Version: HXT7A04Q
User Capacity:    960,197,124,096 bytes [960 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Jun 12 02:56:51 2024 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Status not supported: ATA return descriptor not supported by controller firmware
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (    0) seconds.
Offline data collection
capabilities:            (0x53) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    (  60) minutes.
SCT capabilities:          (0x003d) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       6038
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       71
177 Wear_Leveling_Count     0x0013   099   099   005    Pre-fail  Always       -       14
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
180 Unused_Rsvd_Blk_Cnt_Tot 0x0013   100   100   010    Pre-fail  Always       -       2872
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
184 End-to-End_Error        0x0033   100   100   097    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   071   062   000    Old_age   Always       -       29
194 Temperature_Celsius     0x0022   071   038   000    Old_age   Always       -       29 (Min/Max 22/62)
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
202 Exception_Mode_Status   0x0033   100   100   010    Pre-fail  Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       48
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       13668753287
242 Total_LBAs_Read         0x0032   099   099   000    Old_age   Always       -       8143934093
243 SATA_Downshift_Ct       0x0032   100   100   000    Old_age   Always       -       0
244 Thermal_Throttle_St     0x0032   100   100   000    Old_age   Always       -       0
245 Timed_Workld_Media_Wear 0x0032   100   100   000    Old_age   Always       -       65535
246 Timed_Workld_RdWr_Ratio 0x0032   100   100   000    Old_age   Always       -       65535
247 Timed_Workld_Timer      0x0032   100   100   000    Old_age   Always       -       65535
251 NAND_Writes             0x0032   100   100   000    Old_age   Always       -       15273834752

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
  • The drive probably died. Commented Jun 12, 2024 at 4:31

1 Answer

Stop. Don't use pvcreate like that here! Without the proper options for restoring the previous PV ID, it will just complicate the recovery.

The error message indicates the LVM metadata structures are in fact present - it's just that in emergency mode, the volume group is not activated yet. Basically, LVM may not have been started yet.

Run vgchange -ay to attempt to activate any LVM volume groups on the system; you will see a message indicating how many logical volumes were activated, and if there was a problem doing so, error message(s) explaining why. If the logical volume was successfully activated, try mounting it again.

If the volume group won't activate, there's probably something wrong with the disk. If you have smartctl installed, run smartctl -x /dev/sdb to get information on the health of the HDD.

If /home is the only missing filesystem, you could temporarily comment out the line for /home in /etc/fstab, to get the system to boot up fully but without the home directories of regular users. That might make it easier to gather diagnostic information on the HDD.
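
As a rough sketch of that fstab edit, the helper below prints a copy of an fstab file with the entry for one mount point commented out. The function name comment_out_mount is made up for illustration, and it assumes the usual whitespace-separated fstab layout with the mount point in the second field. It only writes to stdout, so nothing is changed until you review the result and copy it into place yourself.

```shell
# Hypothetical helper: print FSTAB_FILE with the entry for MOUNTPOINT commented out.
# It only writes to stdout; the original file is never modified.
comment_out_mount() {
    fstab_file=$1
    mountpoint=$2
    awk -v mp="$mountpoint" '
        # Leave existing comments alone; prefix only the matching entry with "#".
        $0 !~ /^[ \t]*#/ && NF >= 2 && $2 == mp { print "#" $0; next }
        { print }
    ' "$fstab_file"
}
```

For example, comment_out_mount /etc/fstab /home > /tmp/fstab.new, then compare the two files with diff before replacing /etc/fstab. In emergency mode the root filesystem may be mounted read-only, in which case mount -o remount,rw / is needed first.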

If you really need to reconstruct the volume group metadata, read the LVM backup file at /etc/lvm/backup/cs (it's a readable text file) to find out the ID of the missing physical volume. Then use pvcreate --restorefile /etc/lvm/backup/cs --uuid <PV ID from the file> /dev/sdb1 to restore the PV identity, followed by vgcfgrestore --file /etc/lvm/backup/cs cs to restore the rest of the LVM metadata to it (vgcfgrestore needs the volume group name, here cs, as its final argument).
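
To find the PV ID without reading the whole backup file by eye, something like the following may help. It is a minimal sketch that assumes the usual LVM2 text layout, where each PV is a nested block inside physical_volumes { ... } with id = "..." and device = "..." lines. The function name list_pv_ids is hypothetical. It only reads the file.

```shell
# Hypothetical helper: print "<pv-name> <pv-uuid> <device-hint>" for every PV
# found in the physical_volumes section of an LVM metadata backup file.
list_pv_ids() {
    awk '
        # Enter the physical_volumes section and start counting brace depth.
        !in_pvs && $1 == "physical_volumes" && index($0, "{") { in_pvs = 1; depth = 1; next }
        in_pvs {
            # A nested opening brace at depth 1 starts a new PV block (pv0, pv1, ...).
            if (index($0, "{")) { depth++; if (depth == 2) { name = $1; id = ""; dev = "" } }
            if (depth == 2 && $1 == "id")     { v = $3; gsub(/"/, "", v); id = v }
            if (depth == 2 && $1 == "device") { v = $3; gsub(/"/, "", v); dev = v }
            if (index($0, "}")) {
                if (depth == 2) print name, id, dev
                depth--
                if (depth == 0) in_pvs = 0
            }
        }
    ' "$1"
}
```

Running list_pv_ids /etc/lvm/backup/cs should print one line per PV. The device shown is only the hint recorded when the backup was taken, so confirm it really corresponds to /dev/sdb1 on the current system before passing the UUID to pvcreate.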


AVAGO MR9361-8i is a hardware RAID controller of the MegaRAID family. Typically it is managed with a command like megacli or storcli.

With a hardware RAID, you'll have to make the SMART request through the RAID controller in a specific way. If /proc/devices lists a megaraid entry (for example, grep -i mega /proc/devices prints a line), you can try smartctl -a -d megaraid,0 /dev/sdb for a MegaRAID-specific SMART query. If the hardware RAID controller has more than one disk plugged into it, replace the 0 with the device ID of each other disk as the controller reports it to see information on those disks.
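
The number after megaraid, is the controller's device ID for the disk, which need not be consecutive: in the PD LIST table in the question the two drives have DID 0 and 2. As a small sketch, the helper below reads storcli output on stdin and prints one smartctl command per physical drive row. The function name is made up, and it assumes the PD LIST rows start with an EID:Slt pair followed by the numeric DID, as in the output above. It prints the commands rather than running them.

```shell
# Hypothetical helper: read `storcli /c0 show all` (or `storcli /c0/eall/sall show`)
# output on stdin and print one smartctl command per physical drive.
# $1 is any block device that sits behind the same controller (default /dev/sdb).
smartctl_cmds_from_storcli() {
    dev=${1:-/dev/sdb}
    awk -v dev="$dev" '
        # PD LIST rows start with EID:Slt (e.g. 252:0) followed by a numeric DID.
        $1 ~ /^[0-9]+:[0-9]+$/ && $2 ~ /^[0-9]+$/ {
            print "smartctl -a -d megaraid," $2 " " dev
        }
    '
}
```

For example, storcli /c0/eall/sall show | smartctl_cmds_from_storcli /dev/sdb lists the commands to run for each drive.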

With hardware RAID, only the controller sees the individual disks; the OS will see just the RAID set(s) built out of them. The basic smartctl -x query resulted in "no SMART capability" because the RAID set has no SMART information for itself - only the underlying physical drives do.

Also, if you have megacli available, try:

megacli -PDList -aall 

to get the summary of the state of the actual physical drive(s) as the controller knows it. If you have storcli instead, try:

storcli /c0/eall/sall show
  • Please edit your original question to add the information, don't put it into the comments. That way, you can use the code formatting to keep the original line breaks, so the tables in the output will be readable. Commented Jun 12, 2024 at 5:43
  • The output of smartctl is put in the updated question content, but I didn't find anything special from it. I tried using pvcreate with uuid and restore file, but it still asks for the -ff flag, I don't know if that would remove the data on the disk. Commented Jun 12, 2024 at 6:10
  • It's not wise to write anything to a disk that may be damaged. Try getting as much disk status information from the hardware RAID controller as you can first (see my edit above), to better understand what is going on. Sadly, a common thing that happens with hardware RAID controllers without proper monitoring software set up is that the first disk failure goes completely unnoticed, and the administrator notices only when the second disk fails, at which point it may be too late. Commented Jun 12, 2024 at 7:12
  • If the disk is good, is it right to use pvcreate --uuid <UUID> --restorefile <FILE> with the -ff flag? I don't understand that usage: if restoring metadata means overwriting existing things, isn't -ff always required? Commented Jun 12, 2024 at 7:55
  • Since LVM sees that there is already valid-looking PV metadata, it requires the admin to be very sure the disk is identified correctly before overwriting it. If you write the ID to the wrong disk, you can make the problem worse. But try vgscan or maybe vgscan --mknodes first before rewriting/restoring metadata. If that fixes it, it might be something as simple as the disk being slow to spin up. Commented Jun 12, 2024 at 8:08
