HDD failure

Vu+Duo

32 replies to this topic

#1 krister67

  • Senior Member
  • 92 posts

0
Neutral

Posted 1 January 2013 - 11:58

My HDD (SAMSUNG HD103SJ) has started behaving strangely.
Sometimes it just freezes and I have to switch the power off and on.
After switching the power back on it sometimes mounts as usual, but now and then it is mounted read-only or not at all.

I suspect this has something to do with a couple of power failures that occurred during the Christmas weekend.

Is the HDD dying or is it just a corrupted filesystem?

This is an excerpt from "dmesg":

EXT4-fs (sda1): warning: mounting fs with errors, running e2fsck is recommended
EXT4-fs (sda1): mounted filesystem with writeback data mode. Opts: nodelalloc,data=writeback
EXT4-fs (sda1): last error at 1356985467: ext4_remount:4352

How do I perform the e2fsck in the best way? (Telnet?)

I'm running the latest OpenPLi 2.1.

Re: HDD failure #2 Erik Slagter

  • PLi® Core member
  • 46,969 posts

+542
Excellent

Posted 1 January 2013 - 12:00

Try smartmontools on it (smartctl -a /dev/DEVICE) to show errors logged by the device.
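
DEVICE here stands for the disk node, for example /dev/sda (an assumption, adjust it to your box). The attribute table in the output is long, so a small filter can help to pull out a single raw value. This is only a sketch: the helper name smart_raw is made up, and it assumes the ten-column table layout that "smartctl -A" prints.

```shell
#!/bin/sh
# Print the RAW_VALUE column of one SMART attribute, selected by its ID.
# Reads `smartctl -A` style output on stdin. The helper name is hypothetical.
smart_raw() {
    awk -v id="$1" '$1 == id { print $10; exit }'
}

# Typical use on the box (device name is an assumption):
#   smartctl -A /dev/sda | smart_raw 199    # UDMA_CRC_Error_Count
#   smartctl -A /dev/sda | smart_raw 5      # Reallocated_Sector_Ct
```

Matching on the numeric ID rather than the attribute name avoids depending on the names, which can differ between drive vendors.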

* Wavefrontier T90 with 28E/23E/19E/13E via SCR switches 2 x 2 x 6 user bands
I don't read PM -> if you have something to ask or to report, do it in the forum so others can benefit. I don't take freelance jobs.
Ik lees geen PM -> als je iets te vragen of te melden hebt, doe het op het forum, zodat anderen er ook wat aan hebben.


Re: HDD failure #3 krister67

  • Senior Member
  • 92 posts

0
Neutral

Posted 1 January 2013 - 18:31

OK, this is the output:

root@vuduo:~# smartctl -a /dev/sda1
smartctl 5.39.1 2010-01-28 r3054 [mipsel-oe-linux-gnu] (local build)
Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: SAMSUNG HD103SJ
Serial Number: S246J90Z711078
Firmware Version: 1AJ10001
User Capacity: 1,000,204,886,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 6
Local Time is: Tue Jan 1 18:27:56 2013 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (9480) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 158) minutes.
SCT capabilities: (0x003f) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 1
2 Throughput_Performance 0x0026 252 252 000 Old_age Always - 0
3 Spin_Up_Time 0x0023 071 071 025 Pre-fail Always - 8889
4 Start_Stop_Count 0x0032 097 097 000 Old_age Always - 3642
5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 252 252 051 Old_age Always - 0
8 Seek_Time_Performance 0x0024 252 252 015 Old_age Offline - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 6
10 Spin_Retry_Count 0x0032 252 252 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 252 252 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 115
191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 1
192 Power-Off_Retract_Count 0x0022 252 252 000 Old_age Always - 0
194 Temperature_Celsius 0x0002 051 051 000 Old_age Always - 49 (Lifetime Min/Max 41/49)
195 Hardware_ECC_Recovered 0x003a 100 100 000 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 252 252 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 252 252 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 093 093 000 Old_age Always - 3919
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 2
223 Load_Retry_Count 0x0032 252 252 000 Old_age Always - 0
225 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 3696
SMART Error Log Version: 1
ATA Error Count: 124 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 124 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 09 00 06 00 e0 Error: ICRC, ABRT at LBA = 0x00000600 = 1536
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ca 00 60 a9 06 03 e0 00 00:00:32.945 WRITE DMA
35 00 78 b1 2d 3d e0 00 00:00:32.944 WRITE DMA EXT
35 00 78 39 2c 3d e0 00 00:00:32.944 WRITE DMA EXT
35 00 78 c1 2a 3d e0 00 00:00:32.943 WRITE DMA EXT
35 00 78 49 29 3d e0 00 00:00:32.943 WRITE DMA EXT
Error 123 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 61 00 fb 00 e0 Error: ICRC, ABRT at LBA = 0x0000fb00 = 64256
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ca 00 80 e1 fb 02 e0 00 00:00:11.109 WRITE DMA
27 00 00 00 00 00 e0 00 00:00:11.109 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 00:00:11.109 IDENTIFY DEVICE
ef 03 42 00 00 00 a0 00 00:00:11.109 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 00:00:11.109 READ NATIVE MAX ADDRESS EXT
Error 122 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 61 00 fb 00 e0 Error: ICRC, ABRT at LBA = 0x0000fb00 = 64256
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ca 00 80 e1 fb 02 e0 00 00:00:11.108 WRITE DMA
27 00 00 00 00 00 e0 00 00:00:11.108 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 00:00:11.108 IDENTIFY DEVICE
ef 03 42 00 00 00 a0 00 00:00:11.108 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 00:00:11.108 READ NATIVE MAX ADDRESS EXT
Error 121 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 61 00 fb 00 e0 Error: ICRC, ABRT at LBA = 0x0000fb00 = 64256
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ca 00 80 e1 fb 02 e0 00 00:00:11.108 WRITE DMA
27 00 00 00 00 00 e0 00 00:00:11.108 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 00:00:11.108 IDENTIFY DEVICE
ef 03 42 00 00 00 a0 00 00:00:11.108 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 00:00:11.108 READ NATIVE MAX ADDRESS EXT
Error 120 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 61 00 fb 00 e0 Error: ICRC, ABRT at LBA = 0x0000fb00 = 64256
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ca 00 80 e1 fb 02 e0 00 00:00:11.108 WRITE DMA
27 00 00 00 00 00 e0 00 00:00:11.108 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 00:00:11.108 IDENTIFY DEVICE
ef 03 42 00 00 00 a0 00 00:00:11.108 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 00:00:11.108 READ NATIVE MAX ADDRESS EXT
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

Note: selective self-test log revision number (0) not 1 implies that no selective self-test has ever been run
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Completed [00% left] (0-65535)
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
root@vuduo:~#

Re: HDD failure #4 Erik Slagter

  • PLi® Core member
  • 46,969 posts

+542
Excellent

Posted 1 January 2013 - 18:38

Thank you, this is helpful.

I do see error messages, but no persistent hardware errors. The hard disk reports only 6 power-on hours in total; is it brand new?

The disk reports a temperature of 49 degrees, which is a bit high. Does the box get enough ventilation?

Can you please run "smartctl -t long" and then, once the test has finished, "smartctl -a" again (like above)? The test can easily take an hour or more. In the meantime the hard disk can be used as usual; it's a non-intrusive self-test. If there are any bad sectors, this test will find them.
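
If you want to script the waiting, the start-up message of the long test states the estimated duration, so it can be read back from there. A rough sketch follows; the helper name wait_minutes and the device name are assumptions, and it relies on the "Please wait N minutes" line that smartctl prints when the test starts.

```shell
#!/bin/sh
# Extract the estimated duration (in minutes) from the text that
# `smartctl -t long` prints. Reads that text on stdin.
# The helper name is hypothetical.
wait_minutes() {
    sed -n 's/^Please wait \([0-9][0-9]*\) minutes.*/\1/p'
}

# Sketch of the full sequence (device name is an assumption):
#   mins=$(smartctl -t long /dev/sda | wait_minutes)
#   sleep $(( mins * 60 ))
#   smartctl -l selftest /dev/sda    # prints only the self-test log
```

The last command limits the output to the self-test log, which is the part that says whether the test completed without error.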



Re: HDD failure #5 Rob van der Does

  • Senior Member
  • 7,766 posts

+184
Excellent

Posted 1 January 2013 - 18:44

Is this normal (switching to off-line mode and taking over 2 hours to finish)?

smartctl -t long /dev/sda1

smartctl 5.42 2011-10-20 r3458 [mips-linux-3.3.6-1.2] (local build)
Copyright © 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 143 minutes for test to complete.
Test will complete after Tue Jan  1 21:07:01 2013


Re: HDD failure #6 Erik Slagter

  • PLi® Core member
  • 46,969 posts

+542
Excellent

Posted 1 January 2013 - 18:51

Yes, this is perfectly normal. If you have a large hard disk which isn't very fast, it may take over two hours to read all sectors. It's no problem though, because the test is non-intrusive. I advise doing this regularly to avoid sudden problems (although there is no guarantee). You might interleave it with "short" tests.



Re: HDD failure #7 Rob van der Does

  • Senior Member
  • 7,766 posts

+184
Excellent

Posted 1 January 2013 - 18:53

Clear, thanks.
Although it's only a 500 GB 2.5" drive...

Edited by SatKiekerd, 1 January 2013 - 18:54.


Re: HDD failure #8 Erik Slagter

  • PLi® Core member
  • 46,969 posts

+542
Excellent

Posted 1 January 2013 - 18:55

You might want to benchmark it:

hdparm -t -T DEVICE

or

hdparm --direct -t DEVICE

Maybe it's actually not that slow, but it runs the test at a low priority.
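
For comparison with the hdparm figures, the average read rate that the self-test estimate implies can be worked out from the capacity and the announced duration. This is only back-of-the-envelope arithmetic: it uses decimal gigabytes and the drive's own estimate, not a measured time, and the function name is made up.

```shell
#!/bin/sh
# Average read rate (MB/s, rounded down) implied by a full-surface scan:
# capacity in GB (decimal) divided by the estimated duration in minutes.
# The function name is hypothetical.
implied_mb_per_s() {
    echo $(( $1 * 1000 / ($2 * 60) ))
}

implied_mb_per_s 500 143     # 500 GB drive, 143-minute estimate -> 58
implied_mb_per_s 1000 158    # 1 TB HD103SJ, 158-minute estimate -> 105
```

If "hdparm -t" reports a clearly higher rate than this, the drive is probably not slow as such and simply paces its self-test.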



Re: HDD failure #9 krister67

  • Senior Member
  • 92 posts

0
Neutral

Posted 2 January 2013 - 14:53

SMART support wasn't enabled before, that's why it says 6 hours in total. (I haven't had any problems with the drive since enabling SMART support.)

The ventilation should be good enough: the box has plenty of free space and the cooling fan is working. What is a normal temperature?

I will try "smartctl -t long" tonight after work.

Re: HDD failure #10 krister67

  • Senior Member
  • 92 posts

0
Neutral

Posted 2 January 2013 - 20:11

This is the result of the test.

root@vuduo:~# smartctl -t long /dev/sda1
smartctl 5.39.1 2010-01-28 r3054 [mipsel-oe-linux-gnu] (local build)
Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 158 minutes for test to complete.
Test will complete after Wed Jan 2 19:19:32 2013
Use smartctl -X to abort test.
root@vuduo:~#
root@vuduo:~# smartctl -a /dev/sda1
smartctl 5.39.1 2010-01-28 r3054 [mipsel-oe-linux-gnu] (local build)
Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: SAMSUNG HD103SJ
Serial Number: S246J90Z711078
Firmware Version: 1AJ10001
User Capacity: 1,000,204,886,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 6
Local Time is: Wed Jan 2 20:02:27 2013 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (9480) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 158) minutes.
SCT capabilities: (0x003f) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 1
2 Throughput_Performance 0x0026 053 053 000 Old_age Always - 9061
3 Spin_Up_Time 0x0023 071 071 025 Pre-fail Always - 8889
4 Start_Stop_Count 0x0032 097 097 000 Old_age Always - 3645
5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 252 252 051 Old_age Always - 0
8 Seek_Time_Performance 0x0024 252 252 015 Old_age Offline - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 14
10 Spin_Retry_Count 0x0032 252 252 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 252 252 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 115
191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 1
192 Power-Off_Retract_Count 0x0022 252 252 000 Old_age Always - 0
194 Temperature_Celsius 0x0002 049 049 000 Old_age Always - 51 (Lifetime Min/Max 41/54)
195 Hardware_ECC_Recovered 0x003a 100 100 000 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 252 252 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 252 252 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 093 093 000 Old_age Always - 3919
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 2
223 Load_Retry_Count 0x0032 252 252 000 Old_age Always - 0
225 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 3699
SMART Error Log Version: 1
ATA Error Count: 124 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 124 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 09 00 06 00 e0 Error: ICRC, ABRT at LBA = 0x00000600 = 1536
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ca 00 60 a9 06 03 e0 00 00:00:32.945 WRITE DMA
35 00 78 b1 2d 3d e0 00 00:00:32.944 WRITE DMA EXT
35 00 78 39 2c 3d e0 00 00:00:32.944 WRITE DMA EXT
35 00 78 c1 2a 3d e0 00 00:00:32.943 WRITE DMA EXT
35 00 78 49 29 3d e0 00 00:00:32.943 WRITE DMA EXT
Error 123 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 61 00 fb 00 e0 Error: ICRC, ABRT at LBA = 0x0000fb00 = 64256
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ca 00 80 e1 fb 02 e0 00 00:00:11.109 WRITE DMA
27 00 00 00 00 00 e0 00 00:00:11.109 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 00:00:11.109 IDENTIFY DEVICE
ef 03 42 00 00 00 a0 00 00:00:11.109 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 00:00:11.109 READ NATIVE MAX ADDRESS EXT
Error 122 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 61 00 fb 00 e0 Error: ICRC, ABRT at LBA = 0x0000fb00 = 64256
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ca 00 80 e1 fb 02 e0 00 00:00:11.108 WRITE DMA
27 00 00 00 00 00 e0 00 00:00:11.108 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 00:00:11.108 IDENTIFY DEVICE
ef 03 42 00 00 00 a0 00 00:00:11.108 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 00:00:11.108 READ NATIVE MAX ADDRESS EXT
Error 121 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 61 00 fb 00 e0 Error: ICRC, ABRT at LBA = 0x0000fb00 = 64256
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ca 00 80 e1 fb 02 e0 00 00:00:11.108 WRITE DMA
27 00 00 00 00 00 e0 00 00:00:11.108 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 00:00:11.108 IDENTIFY DEVICE
ef 03 42 00 00 00 a0 00 00:00:11.108 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 00:00:11.108 READ NATIVE MAX ADDRESS EXT
Error 120 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 61 00 fb 00 e0 Error: ICRC, ABRT at LBA = 0x0000fb00 = 64256
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ca 00 80 e1 fb 02 e0 00 00:00:11.108 WRITE DMA
27 00 00 00 00 00 e0 00 00:00:11.108 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 00:00:11.108 IDENTIFY DEVICE
ef 03 42 00 00 00 a0 00 00:00:11.108 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 00:00:11.108 READ NATIVE MAX ADDRESS EXT
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 13 -
Note: selective self-test log revision number (0) not 1 implies that no selective self-test has ever been run
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Completed [00% left] (0-65535)
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
root@vuduo:~#


Looks OK?

Re: HDD failure #11 Erik Slagter

  • PLi® Core member
  • 46,969 posts

+542
Excellent

Posted 2 January 2013 - 20:15

SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Completed [00% left] (0-65535)
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing


Looks OK?

This is what you're looking for. And yes it's okay.

Looks like the firmware of your hard disk crashed, either due to high temperature or simply a firmware bug (I've had IBM Deskstars that used to lock up every now and then and give similar messages in the SMART log).

Edited by Erik Slagter, 2 January 2013 - 20:16.



Re: HDD failure #12 krister67

  • Senior Member
  • 92 posts

0
Neutral

Posted 2 January 2013 - 20:21


So there's no need to run "e2fsck" to check the filesystem?

Is 54 degrees Celsius too high?

Re: HDD failure #13 Erik Slagter

  • PLi® Core member
  • 46,969 posts

+542
Excellent

Posted 2 January 2013 - 20:25

So there´s no need to run "e2fsck" to check the filesystem?

The self-test says whether all sectors can be successfully read (and whether the checksums correspond). It doesn't say whether the file system is healthy or corrupt ;)

Is 54 degrees Celsius too high?

I think it's a real risk. A harddisk shouldn't get warmer than about 45 degrees.
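
To keep an eye on it, the temperature can be read from attribute 194 and compared against a limit. A minimal sketch, assuming the same "smartctl -A" table layout as in the output above; the function name is made up, and the default of 45 is just the rule of thumb mentioned here, not a vendor specification.

```shell
#!/bin/sh
# Read `smartctl -A` style output on stdin and report whether attribute 194
# (Temperature_Celsius) is above a limit in degrees Celsius (default 45).
# The function name is hypothetical.
temp_check() {
    limit="${1:-45}"
    awk -v limit="$limit" '
        $1 == 194 {
            found = 1
            if ($10 + 0 > limit + 0) { print "too warm: " $10 " C" } else { print "ok: " $10 " C" }
            exit
        }
        END { if (!found) print "no temperature attribute found" }'
}

# Typical use on the box (device name is an assumption):
#   smartctl -A /dev/sda | temp_check
```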



Re: HDD failure #14 krister67

  • Senior Member
  • 92 posts

0
Neutral

Posted 2 January 2013 - 20:37


Thanks a lot for the info and guidance! :)

Do you know the proper way to run "e2fsck" to check and correct the filesystem? I guess the drive has to be unmounted first.

I have to look into the heating problem; maybe there is some dust or something else blocking the airflow.

Re: HDD failure #15 Erik Slagter

  • PLi® Core member
  • 46,969 posts

+542
Excellent

Posted 3 January 2013 - 09:38

Try this:
umount <DEVICE>
e2fsck -f -v <DEVICE>
mount <DEVICE>

That's all.

Rumour has it that you can also do this from within enigma, but I don't have experience with that method.
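
The three commands can be wrapped so that the check never runs on a mounted file system and the disk ends up back on its old mount point. This is a sketch under assumptions: the function names and the device name are made up for illustration, e2fsck must be installed, and the remount uses default options, so anything special (such as data=writeback) has to come from /etc/fstab or be added by hand.

```shell
#!/bin/sh
# Look up where a device is mounted. Reads /proc/mounts-format text on stdin
# (fields: device, mount point, fs type, options, ...). Name is hypothetical.
mount_point_of() {
    awk -v dev="$1" '$1 == dev { print $2; exit }'
}

# Unmount, check and remount one partition. Name is hypothetical.
check_fs() {
    dev="$1"
    mp=$(mount_point_of "$dev" < /proc/mounts)
    if [ -n "$mp" ]; then
        umount "$dev" || return 1    # stop here rather than check a mounted fs
    fi
    e2fsck -f -v "$dev"
    rc=$?
    # e2fsck exit status: 0 = no errors, 1 = errors corrected,
    # higher values mean something still needs attention.
    [ -n "$mp" ] && mount "$dev" "$mp"
    return "$rc"
}

# Example (device name is an assumption):
#   check_fs /dev/sda1
```

Note that a plain "mount DEVICE" without a mount point only works when the device is listed in /etc/fstab, which is why the sketch remembers the old mount point.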

Edited by Erik Slagter, 3 January 2013 - 09:39.



Re: HDD failure #16 krister67

  • Senior Member
  • 92 posts

0
Neutral

Posted 3 January 2013 - 11:41

Thanks!

What if the device is busy? Should I switch to single-user mode with init 1 before trying to unmount? (I'm using the Permanent Timeshift plugin.)

Re: HDD failure #17 Erik Slagter

  • PLi® Core member
  • 46,969 posts

+542
Excellent

Posted 3 January 2013 - 17:40

If you're on the command line already, you can stop enigma using "init 4". That should suffice. Afterwards use "init 3" to restart enigma.



Re: HDD failure #18 krister67

  • Senior Member
  • 92 posts

0
Neutral

Posted 5 January 2013 - 09:31

"init 4" didn't help, I still got this message:

root@vuduo:~# umount /dev/sda1
umount: can't umount /media/hdd: Device or resource busy



Re: HDD failure #19 Erik Slagter

  • PLi® Core member
  • 46,969 posts

+542
Excellent

Posted 5 January 2013 - 09:50

Try this: fuser -mk /hdd (or wherever you mounted your harddisk). This may kill important processes, so you may need to reboot afterwards.



Re: HDD failure #20 krister67

  • Senior Member
  • 92 posts

0
Neutral

Posted 5 January 2013 - 10:00


Maybe it's because I'm using a swapfile on /dev/sda1?
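
One way to check that guess: an active swap file keeps its file system busy until it is switched off with swapoff, and killing processes does not release it. The sketch below lists active swap areas under a given path; the helper name and the paths are assumptions, and it expects the layout of /proc/swaps (one header line, then one line per swap area).

```shell
#!/bin/sh
# List active swap files that live under a given path prefix.
# Reads /proc/swaps-format text on stdin; the first line is a header.
# The helper name is hypothetical.
swaps_under() {
    awk -v prefix="$1" 'NR > 1 && index($1, prefix) == 1 { print $1 }'
}

# On the box (paths are assumptions):
#   swaps_under /media/hdd/ < /proc/swaps    # anything listed keeps the disk busy
#   swapoff /media/hdd/swapfile              # release it, then retry umount
```

After the file system check, "swapon" with the same file name would bring the swap file back.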


