I often run into the following problem when using Matlab 2012 on Fedora 20, during or after FFT calculations or while plotting figures.

This is the error message:

[  635.157606] mce: [Hardware Error]: CPU 4: Machine Check Exception: 4 Bank 0: b650200000000135 
[  635.157606] mce: [Hardware Error]: TSC 22cd709f356 ADDR 5989fdd80
[  635.157606] mce: [Hardware Error]: PROCESSOR 2:100fa0 TIME 1462430327 SOCKET 0 APIC 4 microcode 10000dc
[  635.157606] [Hardware Error]: MC0 Error: Data/Tag DRD error.
[  635.157606] [Hardware Error]: Error Status: System Fatal error.
[  635.157606] [Hardware Error]: CPU:4 (10:a:0) MC0_STATUS[-|UE|-|PCC|AddrV|UECC]: 0xb650200000000135
[  635.157606] [Hardware Error]:MC0_ADDR: 0x00000005989fdd80 
[  635.157606] [Hardware Error]: cache level: L1, tx: DATA, mem-tx: DRD
[  635.157606] mce: [Hardware Error]: Machine check: Invalid
[  635.157606] Kernel panic - not syncing: Fatal machine check on current CPU
[  635.157606] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[  635.157606] drm_kms_helper: panic occurred, switching back to text console

Is this caused by software (i.e. Matlab) or by hardware (i.e. the CPU, which is an AMD X6 1055T)? How can I solve this problem?

2 Answers

1

This is a hardware error. Specifically, it's an ECC error in the CPU's L1 data cache which was detected but could not be corrected. How can you tell? Pipe the output above through mcelog --ascii, and you'll get:

Hardware event. This is not a software error.
CPU 4 0 data cache TSC 22cd709f356 
ADDR 5989fdd80 
TIME 1462430327 Thu May  5 02:38:47 2016
  Data cache ECC error (syndrome a0)
       bit45 = uncorrected ecc error
       bit57 = processor context corrupt
       bit61 = error uncorrected
  memory/cache error 'data read mem transaction, data transaction, level 1'
STATUS b650200000000135 MCGSTATUS 4
CPUID Vendor AMD Family 16 Model 10
SOCKET 0 APIC 4 microcode 10000dc

(Note that with messages from older kernels, which don't include the PROCESSOR line, you need to know and specify the type of CPU used on the actual system. But with that line present, decoding the output on my system should give the same result you'd get locally.)
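
If mcelog is not at hand, the status word can also be unpacked by hand. The following is only a rough sketch in Python, not a replacement for mcelog: the flag names and field tables are my reading of the kernel output above and of the AMD family 10h register layout, the syndrome position (bits 54:47) is an assumption that applies to this CPU family, and `decode_status` is a hypothetical helper name.

```python
# Flag bits of an x86 MCi_STATUS register (names follow the kernel log above).
FLAG_BITS = [
    (63, "VAL"),    # register holds a valid error
    (62, "OVER"),   # an earlier error was overwritten
    (61, "UC"),     # error was not corrected
    (60, "EN"),     # reporting was enabled for this error
    (59, "MISCV"),  # MISC register is valid
    (58, "ADDRV"),  # ADDR register is valid
    (57, "PCC"),    # processor context corrupt
    (46, "CECC"),   # correctable ECC error (AMD-specific)
    (45, "UECC"),   # uncorrectable ECC error (AMD-specific)
]

# Field tables for the "memory hierarchy" error code 0000 0001 RRRR TTLL.
LEVEL = ["RESV", "L1", "L2", "L3/GEN"]
TX_TYPE = ["INSN", "DATA", "GEN", "RESV"]
MEM_TX = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]


def decode_status(status):
    """Split a 64-bit MCi_STATUS value into flags and error-code fields."""
    result = {
        "flags": [name for bit, name in FLAG_BITS if (status >> bit) & 1],
        # Assumption: ECC syndrome in bits 54:47 (AMD family 10h layout).
        "syndrome": (status >> 47) & 0xFF,
    }
    code = status & 0xFFFF
    if (code & 0xFF00) == 0x0100:  # memory/cache error format
        rrrr = (code >> 4) & 0xF
        result["mem_tx"] = MEM_TX[rrrr] if rrrr < len(MEM_TX) else "RESV"
        result["tx"] = TX_TYPE[(code >> 2) & 0x3]
        result["level"] = LEVEL[code & 0x3]
    return result


if __name__ == "__main__":
    print(decode_status(0xB650200000000135))
```

For the status value in the question, this should report the same UC, PCC, ADDRV and UECC flags, syndrome a0, and the L1 / DATA / DRD fields that the kernel and mcelog print.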

0

This looks to me like a hardware error, in either the CPU or the memory. If you can swap in another CPU or different memory modules, or run the same workload on another machine, you can narrow down which hardware component is failing.
You should also update the BIOS and other firmware, as it might help. A BIOS update sometimes includes new CPU microcode, which can eliminate memory/CPU errors.
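
To help with narrowing it down, it can be useful to check whether the exceptions always come from the same core and bank. Here is a small Python sketch that tallies them, assuming you have saved the console output of several crashes as text in the format shown in the question; `count_mce_sources` is a hypothetical helper name, not part of any tool.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   mce: [Hardware Error]: CPU 4: Machine Check Exception: 4 Bank 0: b650200000000135
MCE_LINE = re.compile(
    r"CPU (\d+): Machine Check Exception: \S+ Bank (\d+): ([0-9a-fA-F]+)"
)


def count_mce_sources(log_text):
    """Count machine check exceptions per (cpu, bank) pair in saved log text."""
    counts = Counter()
    for match in MCE_LINE.finditer(log_text):
        cpu, bank = int(match.group(1)), int(match.group(2))
        counts[(cpu, bank)] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "[  635.157606] mce: [Hardware Error]: CPU 4: "
        "Machine Check Exception: 4 Bank 0: b650200000000135\n"
    )
    for (cpu, bank), n in count_mce_sources(sample).items():
        print(f"CPU {cpu}, bank {bank}: {n} exception(s)")
```

If every crash points at the same core, that core's cache is the likely suspect; errors spread across cores would point more towards something shared, such as power or cooling.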
