I bought 2 rugged fanless embedded computers that appeared to be quite robust, with name-brand extended-temperature components, vibration damping, etc. I installed Ubuntu 24.04.1 and my desktop application (display and control of some CAN bus I/O modules) and then deployed both of them. They have been working well for over 1.5 months.
I bought 2 more of them and set them up in the same way, but during testing both of them would lock up hard. The mouse, keyboard, and networking stop working, and the display still shows the last frame. I have a clock in my application and there is the clock in Ubuntu, and neither of them advances.
I ran journalctl and looked at the logs, but they show nothing. The freezing does not seem to be triggered by anything in particular; it has happened both while clicking around and while doing nothing. I checked temperatures and ran RAM and stress tests, but no issues were observed. A few times it has only partially crashed (mouse and keyboard did not work, but SSH still worked), and I could get some logs:
2024-10-08T06:21:07.915262+00:00 user kernel: general protection fault, probably for non-canonical address 0xc3373711c6007baf: 0000 [#1] PREEMPT SMP NOPTI
2024-10-08T06:21:07.915285+00:00 user kernel: CPU: 3 PID: 2804 Comm: gnome-shell Not tainted 6.8.0-45-generic #45-Ubuntu
2024-10-08T06:21:07.915287+00:00 user kernel: Hardware name: SYSTEM_MANUFACTURER SYSTEM_PRODUCT_NAME/Default string, BIOS 5.19 09/06/2023
2024-10-08T06:21:07.915288+00:00 user kernel: RIP: 0010:__kmalloc+0x15b/0x4f0
2024-10-08T06:21:07.915289+00:00 user kernel: Code: 83 78 10 00 48 8b 38 0f 84 ef 02 00 00 48 85 ff 0f 84 e6 02 00 00 41 8b 44 24 28 49 8b 9c 24 b8 00 00 00 49 8b 34 24 48 01 f8 <48> 33 18 48 89 c1 48 89 f8 48 0f c9 48 31 cb 48 8d 8a 00 20 00 00
2024-10-08T06:21:07.915290+00:00 user kernel: RSP: 0018:ffffbeb7c385b880 EFLAGS: 00010286
2024-10-08T06:21:07.915291+00:00 user kernel: RAX: c3373711c6007baf RBX: 63a5e576359684a0 RCX: 0000000000000000
2024-10-08T06:21:07.915292+00:00 user kernel: RDX: 000000013b8e4003 RSI: 000000000003b900 RDI: c3373711c6007b8f
2024-10-08T06:21:07.915293+00:00 user kernel: RBP: ffffbeb7c385b8d0 R08: 0000000000000000 R09: 0000000000000000
2024-10-08T06:21:07.915293+00:00 user kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff96fb40050800
2024-10-08T06:21:07.915294+00:00 user kernel: R13: 0000000000000040 R14: 0000000000000dc0 R15: ffffffff8b4b7875
2024-10-08T06:21:07.915294+00:00 user kernel: FS: 000074d6d2243640(0000) GS:ffff96fcb7f80000(0000) knlGS:0000000000000000
2024-10-08T06:21:07.915295+00:00 user kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2024-10-08T06:21:07.915295+00:00 user kernel: CR2: 000077630511e010 CR3: 0000000139f36000 CR4: 0000000000350ef0
2024-10-08T06:21:07.915296+00:00 user kernel: Call Trace:
2024-10-08T06:21:07.915296+00:00 user kernel: <TASK>
2024-10-08T06:21:07.915297+00:00 user kernel: ? show_regs+0x6d/0x80
2024-10-08T06:21:07.915298+00:00 user kernel: ? die_addr+0x37/0xa0
2024-10-08T06:21:07.915298+00:00 user kernel: ? exc_general_protection+0x1db/0x480
2024-10-08T06:21:07.915299+00:00 user kernel: ? asm_exc_general_protection+0x27/0x30
2024-10-08T06:21:07.915299+00:00 user kernel: ? drm_syncobj_array_wait_timeout.constprop.0+0xc5/0x6c0
2024-10-08T06:21:07.915300+00:00 user kernel: ? __kmalloc+0x15b/0x4f0
2024-10-08T06:21:07.915301+00:00 user kernel: drm_syncobj_array_wait_timeout.constprop.0+0xc5/0x6c0
2024-10-08T06:21:07.915301+00:00 user kernel: ? drm_syncobj_array_wait_timeout.constprop.0+0xc5/0x6c0
2024-10-08T06:21:07.915302+00:00 user kernel: drm_syncobj_array_wait.isra.0+0x61/0x160
2024-10-08T06:21:07.915302+00:00 user kernel: drm_syncobj_wait_ioctl+0xd0/0x100
2024-10-08T06:21:07.915303+00:00 user kernel: ? __pfx_drm_syncobj_wait_ioctl+0x10/0x10
2024-10-08T06:21:07.915303+00:00 user kernel: drm_ioctl_kernel+0xb9/0x120
2024-10-08T06:21:07.915304+00:00 user kernel: drm_ioctl+0x2d4/0x550
2024-10-08T06:21:07.915304+00:00 user kernel: ? __pfx_drm_syncobj_wait_ioctl+0x10/0x10
2024-10-08T06:21:07.915305+00:00 user kernel: __x64_sys_ioctl+0xa0/0xf0
2024-10-08T06:21:07.915305+00:00 user kernel: x64_sys_call+0x143b/0x25c0
2024-10-08T06:21:07.915306+00:00 user kernel: do_syscall_64+0x7f/0x180
2024-10-08T06:21:07.915306+00:00 user kernel: ? __x64_sys_poll+0xc7/0x150
2024-10-08T06:21:07.915307+00:00 user kernel: ? syscall_exit_to_user_mode+0x89/0x260
2024-10-08T06:21:07.915307+00:00 user kernel: ? do_syscall_64+0x8c/0x180
2024-10-08T06:21:07.915308+00:00 user kernel: ? __x64_sys_timerfd_settime+0x9c/0x100
2024-10-08T06:21:07.915309+00:00 user kernel: ? syscall_exit_to_user_mode+0x89/0x260
2024-10-08T06:21:07.915309+00:00 user kernel: ? do_syscall_64+0x8c/0x180
2024-10-08T06:21:07.915310+00:00 user kernel: ? clear_bhb_loop+0x15/0x70
2024-10-08T06:21:07.915313+00:00 user kernel: message repeated 2 times: [ ? clear_bhb_loop+0x15/0x70]
2024-10-08T06:21:07.915314+00:00 user kernel: entry_SYSCALL_64_after_hwframe+0x78/0x80
2024-10-08T06:21:07.915319+00:00 user kernel: RIP: 0033:0x74d6d6324ded
2024-10-08T06:21:07.915320+00:00 user kernel: Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
2024-10-08T06:21:07.915321+00:00 user kernel: RSP: 002b:00007fffafa1d910 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
2024-10-08T06:21:07.915321+00:00 user kernel: RAX: ffffffffffffffda RBX: 000059374152533c RCX: 000074d6d6324ded
2024-10-08T06:21:07.915322+00:00 user kernel: RDX: 00007fffafa1d9d0 RSI: 00000000c02864c3 RDI: 000000000000000e
2024-10-08T06:21:07.915322+00:00 user kernel: RBP: 00007fffafa1d960 R08: 00007fffafa1d970 R09: 0000000000000000
2024-10-08T06:21:07.915323+00:00 user kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffafa1d970
2024-10-08T06:21:07.915324+00:00 user kernel: R13: 00000000c02864c3 R14: 00007fffafa1d9d0 R15: 000000000000000e
2024-10-08T06:21:07.915324+00:00 user kernel: </TASK>
2024-10-08T06:21:07.915325+00:00 user kernel: Modules linked in: can_raw can rfcomm snd_seq_dummy snd_hrtimer ccm qrtr cmac algif_hash algif_skcipher af_alg bnep snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_generic_allocation soundwire_bus snd_hda_codec_hdmi snd_soc_core snd_hda_codec_realtek snd_hda_codec_generic snd_compress ac97_bus snd_pcm_dmaengine intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi rtw88_8821cu kvm_intel snd_hda_codec rtw88_8821c kvm snd_hda_core binfmt_misc snd_hwdep snd_pcm irqbypass crct10dif_pclmul snd_seq_midi snd_seq_midi_event rtw88_8822cu snd_rawmidi rtw88_8822c cmdlinepart rtw88_usb spi_nor polyval_generic rtw88_core ghash_clmulni_intel mei_pxp mei_hdcp mtd i915 btusb nls_iso8859_1 snd_seq sha256_ssse3 btrtl mac80211 btintel
2024-10-08T06:21:07.915327+00:00 user kernel: sha1_ssse3 btbcm drm_buddy snd_seq_device btmtk ttm aesni_intel crypto_simd snd_timer cryptd drm_display_helper intel_cstate intel_wmi_thunderbolt bluetooth cfg80211 snd i2c_i801 cec wmi_bmof ecdh_generic soundcore f81604 spi_intel_pci libarc4 can_dev i2c_smbus spi_intel ecc mei_me mei rc_core intel_pmc_core intel_vsec pmt_telemetry pmt_class intel_hid sparse_keymap acpi_tad acpi_pad joydev input_leds mac_hid msr parport_pc ppdev lp parport efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid crc32_pclmul sdhci_pci cqhci igb ahci sdhci igc libahci xhci_pci i2c_algo_bit dca xhci_pci_renesas video wmi pinctrl_elkhartlake
2024-10-08T06:21:07.915327+00:00 user kernel: ---[ end trace 0000000000000000 ]---
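As a cross-check on the first line of the trace, here is a minimal Python sketch that tests whether an address is canonical. It assumes 48-bit virtual addresses (4-level paging), which is my assumption for this CPU, and `is_canonical` is a helper name made up for illustration. The register values are copied from the trace above.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if bits 63..(va_bits - 1) of addr are all 0 or all 1.

    va_bits=48 is an assumption (4-level paging on x86-64).
    """
    top = addr >> (va_bits - 1)               # bits 63..47 for va_bits=48
    all_ones = (1 << (64 - va_bits + 1)) - 1  # 17 one-bits for va_bits=48
    return top == 0 or top == all_ones


# Register values copied from the trace above.
RAX = 0xC3373711C6007BAF  # the reported non-canonical address
RDI = 0xC3373711C6007B8F
R12 = 0xFFFF96FB40050800  # a normal-looking kernel pointer, for comparison

print(is_canonical(RAX))  # False
print(is_canonical(R12))  # True
print(hex(RAX - RDI))     # 0x20
```

The faulting address fails the check while R12 passes, and it is exactly RDI plus 0x20, so the kernel appears to have dereferenced a garbage pointer inside __kmalloc. That is consistent with memory corruption, but by itself it does not say whether the cause is hardware or software.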
After some research, it appeared it could be RAM related. I installed new RAM from a completely different vendor (Crucial), but it did not stop the freezing. I also installed Debian 12 and ran the units from a dedicated 12V supply, but the issue persists. The time until a freeze ranges anywhere from an hour to 5 days, with no apparent trigger.
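Since a hard freeze leaves nothing in the journal, one way to at least pin down when each freeze happens is a small heartbeat logger. The following is a minimal sketch, not something I have validated on these units: `write_heartbeat`, `run`, and the log path are hypothetical names chosen for illustration. It appends a UTC timestamp once per second and calls fsync so the last line should survive a lockup.

```python
import os
import time
from datetime import datetime, timezone


def write_heartbeat(path: str) -> str:
    """Append one UTC timestamp line to path and force it to disk."""
    stamp = datetime.now(timezone.utc).isoformat()
    with open(path, "a", encoding="utf-8") as f:
        f.write(stamp + "\n")
        f.flush()
        os.fsync(f.fileno())  # push the data to the SSD before continuing
    return stamp


def run(path: str, interval_s: float = 1.0, beats=None) -> None:
    """Write heartbeats forever, or only `beats` times when it is given."""
    n = 0
    while beats is None or n < beats:
        write_heartbeat(path)
        n += 1
        time.sleep(interval_s)


if __name__ == "__main__":
    # Hypothetical log location; any writable path works.
    run("/var/tmp/heartbeat.log")
```

After a freeze and reboot, the last line of the file gives the time of the lockup to within about a second, which can then be compared against the application's own activity.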
The vendor has no idea what it could be. I am assuming it is hardware related, since I have 2 working units and 2 non-working units and the software is the same. I would like to get to the bottom of this, as I can't trust the ones I have deployed and don't want to exchange the bad ones with the manufacturer just to have the same issue on new ones.
Specs: Intel J6413, 8GB RAM, 128GB SATA SSD
Thanks!