What are the best practices for configuring durable IOT Linux devices? Should I use an Overlay File System?

Question

In the past our company used Raspberry Pis for our IOT application. The problem with that was that SD cards wear out and become corrupted.

We have now ordered Compulab SBCs with eMMC storage running Debian.

So what would be the best practices for configuring durable embedded IOT devices?

I would say:

Choose an SBC with eMMC storage
Make sure you have a journaling filesystem (has_journal is enabled on EXT4)
Write logs to RAM to prevent wear on storage (in /etc/systemd/journald.conf Storage=volatile)
Ensure fsck runs at boot (in /etc/fstab the last field is set to 1 or 2)
Swap should be disabled (run free -> total Swap should be 0)
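A minimal sketch of how these checks could be automated, in Python. It only parses text you pass in: the contents of /etc/systemd/journald.conf (drop-in files under journald.conf.d are ignored), /etc/fstab, /proc/meminfo, and the header printed by `dumpe2fs -h` or `tune2fs -l` for the filesystem in question. The function names are hypothetical, and the fstab check only looks at ext2/3/4 entries.

```python
import re
from typing import List, Optional


def journald_storage(conf_text: str) -> Optional[str]:
    """Return the Storage= value from the [Journal] section, or None if it is not set."""
    section = None
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        key, sep, val = line.partition("=")
        if section == "Journal" and sep and key.strip() == "Storage":
            value = val.strip()  # a later assignment overrides an earlier one
    return value


def fstab_missing_fsck(fstab_text: str) -> List[str]:
    """Return mount points of ext2/3/4 entries whose fsck pass number (6th field) is 0 or missing."""
    missing = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3 or not fields[2].startswith("ext"):
            continue
        passno = fields[5] if len(fields) >= 6 else "0"  # a missing field defaults to 0
        if passno == "0":
            missing.append(fields[1])
    return missing


def swap_total_kib(meminfo_text: str) -> int:
    """Return SwapTotal (KiB) from /proc/meminfo text; 0 means no swap is active."""
    m = re.search(r"^SwapTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(m.group(1)) if m else 0


def has_journal(superblock_header: str) -> bool:
    """True if the 'Filesystem features:' line of dumpe2fs -h / tune2fs -l output lists has_journal."""
    m = re.search(r"^Filesystem features:\s*(.*)$", superblock_header, re.MULTILINE)
    return m is not None and "has_journal" in m.group(1).split()
```

For example, journald_storage(open("/etc/systemd/journald.conf").read()) should return "volatile" once that setting is in place, and fstab_missing_fsck() should return an empty list.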

Any more suggestions?

Overlay file system

Raspbian has an option in 'raspi-config'->'Performance Options'->'Overlay File System'. I asked Compulab whether they would recommend using it as well, but they think the system is already as robust as it can be with filesystem journaling and an fsck that runs at boot. Would using an Overlay File System to prevent writes to storage be worth the extra complexity of having to reboot the device several times (to disable it and then enable it again) whenever you want to update it later?

eMMC can typically be formatted as pseudo-SLC (it's still MLC, but uses only two states per cell, so you trade capacity for reliability).
– Philippos, Commented May 2, 2023 at 11:25

dhanushka · Accepted Answer · 2023-05-06 10:22:30Z

I'll try to address your first question, about storage device durability, since I'm somewhat familiar with that topic.

Switching from SD to eMMC might not improve the situation unless you assess your system's storage usage and act on the results, because both SD cards and eMMC use NAND flash.

Do you have an estimate of how much data is written to your storage? Use the following to evaluate your use case (see [1] for details):

total bytes written throughout device life = (device capacity in bytes) * (max program/erase cycles) / (write amplification factor)
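As a sketch, the equation can be written as a small Python helper; dividing its result by your daily write volume turns it into an expected lifetime. The function names and the example inputs are hypothetical, and like the equation itself this is only a first-order estimate.

```python
GIB = 1024 ** 3


def total_bytes_written(capacity_bytes: int, pe_cycles: int, waf: float) -> float:
    """Host bytes writable over the device's life: capacity * P/E cycles / WAF."""
    if waf <= 0:
        raise ValueError("write amplification factor must be positive")
    return capacity_bytes * pe_cycles / waf


def lifetime_days(capacity_bytes: int, pe_cycles: int, waf: float, bytes_per_day: float) -> float:
    """Days until the rated P/E cycles are used up at a constant daily write volume."""
    return total_bytes_written(capacity_bytes, pe_cycles, waf) / bytes_per_day


# Hypothetical inputs: 4 GiB writable partition, 3000 P/E cycles, WAF of 4, 0.5 GiB/day.
print(total_bytes_written(4 * GIB, 3000, 4) / GIB)       # 3000.0 (GiB)
print(lifetime_days(4 * GIB, 3000, 4, 0.5 * GIB) / 365)  # about 16.4 years
```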

Say, for example, that

you write 0.5 GiB per day
you want your device to operate for 5 years
the partitions you write data to total 4 GiB (the storage capacity is larger, but the other partitions are read-only)
the max program/erase cycle count of your multi-level cell (MLC) NAND is 3000

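Solving the equation for the write amplification factor gives the largest factor your device can tolerate:

4 * 3000 / (0.5 * 365 * 5) = ~13

In other words, the controller may write at most about 13 bytes to NAND for every byte the host writes.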
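The same arithmetic, solved for the write amplification factor, as a minimal Python sketch. The function name is hypothetical; like the example, it uses 365-day years and treats the capacity as the size of the partitions that are actually written.

```python
def max_waf(capacity_gib: float, pe_cycles: int, gib_per_day: float, years: float) -> float:
    """Largest write amplification factor that still lets the device last `years`."""
    host_writes_gib = gib_per_day * 365 * years  # total host writes over the target life
    return capacity_gib * pe_cycles / host_writes_gib


budget = max_waf(capacity_gib=4, pe_cycles=3000, gib_per_day=0.5, years=5)
print(f"{budget:.1f}")  # 13.2
```

If the write amplification of your workload exceeds this budget, the device can be expected to wear out before the target lifetime.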

What is write amplification?

NAND in an SD card or eMMC is written in NAND pages. Suppose the host writes or modifies 1 KiB (two 512-byte sectors), but the NAND page is 16 KiB. The eMMC controller then has to program a whole 16 KiB NAND page, so 16 times more data reaches the flash than the host wrote: a write amplification factor of 16. Things get more complicated once you consider erasures, because NAND is erased in NAND blocks, and each NAND block consists of many NAND pages.
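A toy model of the page-level effect, in Python: it counts how many whole NAND pages a host write touches and compares the bytes programmed with the bytes requested. It assumes a 16 KiB page (the example's figure) and ignores erase-block garbage collection and the write merging real eMMC controllers perform, so treat it as an illustration rather than a prediction. The function names are hypothetical.

```python
KIB = 1024


def pages_touched(offset: int, length: int, page_size: int) -> int:
    """Number of NAND pages spanned by a write of `length` bytes starting at byte `offset`."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return last - first + 1


def page_level_waf(offset: int, length: int, page_size: int = 16 * KIB) -> float:
    """Bytes programmed into NAND divided by bytes written by the host (page effect only)."""
    return pages_touched(offset, length, page_size) * page_size / length


print(page_level_waf(0, 1 * KIB))               # 16.0: aligned 1 KiB write
print(page_level_waf(15 * KIB + 512, 1 * KIB))  # 32.0: same write straddling two pages
print(page_level_waf(0, 32 * KIB))              # 1.0: aligned, page-multiple write
```

The last line shows why page-aligned, page-sized writes matter: they bring this component of write amplification down to 1.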

So, what can you do to improve device life?

From the above equation, you can

increase device capacity (but that'll add to the cost)
increase program/erase cycles: go for SLC, or convert the partitions you write data to from MLC to pSLC (but this reduces the capacity)
reduce write amplification: make your applications issue NAND-page-aligned writes from the host whose size is one NAND page or a multiple of it (see eMMC EXT_CSD[265], optimal write size), enable the eMMC cache, etc.
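One way to apply the last point when an application appends many small records and syncs them to storage: collect the records in memory and sync only whole multiples of a chunk size, instead of issuing a small synced write per record. This is a minimal Python sketch; the class name is hypothetical, the 16 KiB default chunk stands in for the optimal write size your eMMC actually reports, and NAND alignment only holds if the file starts at a chunk-aligned size and the filesystem lays it out contiguously. Records still in the buffer are lost on power failure, so this trades some durability for less wear.

```python
import os


class ChunkedAppender:
    """Appends data to a file, syncing it to storage only in whole multiples of `chunk` bytes."""

    def __init__(self, path: str, chunk: int = 16 * 1024):
        self.chunk = chunk
        self.buf = bytearray()
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

    def write(self, data: bytes) -> None:
        self.buf += data
        ready = len(self.buf) - len(self.buf) % self.chunk  # largest whole-chunk prefix
        if ready:
            self._write_and_sync(bytes(self.buf[:ready]))
            del self.buf[:ready]

    def close(self) -> None:
        if self.buf:  # the final partial chunk is written as-is
            self._write_and_sync(bytes(self.buf))
            self.buf.clear()
        os.close(self.fd)

    def _write_and_sync(self, data: bytes) -> None:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(self.fd, view)
            view = view[written:]
        os.fsync(self.fd)  # push the chunk through the page cache to the device
```

Usage: create one ChunkedAppender for a log file, call write(record) per record, and close() at shutdown.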

What else can you do

You can monitor your eMMC's health using mmc-utils (https://git.kernel.org/pub/scm/utils/mmc/mmc-utils.git) or sysfs, and take the necessary steps before a failure catches you by surprise.

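The eMMC extended CSD (EXT_CSD) register provides

an estimate of the life time used for SLC and MLC blocks, in steps of 10%

(0x00 = not defined, 0x01 = 0-10% of device life time used, 0x02 = 10-20%, ..., 0x0A = 90-100%, 0x0B = end of life, i.e. maximum estimated life time exceeded)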

type-A (SLC): EXT_CSD[268], type-B (MLC): EXT_CSD[269]

the status of the spare blocks that are used to replace bad blocks

(0x01: Normal, 0x02: Warning, 80% of spare blocks used, 0x03: Urgent, 90% of spare blocks used)

Pre EOL info: EXT_CSD[267]

a proprietary health report that the vendor may provide in EXT_CSD[301:270] (so far, I have only seen all zeros here)
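A small decoder for these fields, as a Python sketch. It assumes you already have the raw 512-byte EXT_CSD register as bytes (for instance converted with bytes.fromhex() from a hex dump); the byte offsets follow the Linux kernel's definitions (PRE_EOL_INFO = 267, DEVICE_LIFE_TIME_EST_TYP_A = 268, DEVICE_LIFE_TIME_EST_TYP_B = 269), which you should check against your part's datasheet. The function names are hypothetical.

```python
PRE_EOL_INFO = 267
LIFE_TIME_EST_TYP_A = 268
LIFE_TIME_EST_TYP_B = 269
VENDOR_HEALTH = slice(270, 302)  # bytes 270..301 inclusive

PRE_EOL_TEXT = {
    0x00: "not defined",
    0x01: "normal",
    0x02: "warning (80% of spare blocks used)",
    0x03: "urgent (90% of spare blocks used)",
}


def life_time_text(value: int) -> str:
    """Translate a DEVICE_LIFE_TIME_EST_TYP_A/B value into a readable range."""
    if value == 0x00:
        return "not defined"
    if 0x01 <= value <= 0x0A:
        low = (value - 1) * 10
        return f"{low}-{low + 10}% used"
    if value == 0x0B:
        return "exceeded estimated life time"
    return "reserved"


def decode_health(ext_csd: bytes) -> dict:
    """Pull the wear-related fields out of a raw 512-byte EXT_CSD register."""
    if len(ext_csd) != 512:
        raise ValueError("EXT_CSD must be exactly 512 bytes")
    return {
        "life_time_a": life_time_text(ext_csd[LIFE_TIME_EST_TYP_A]),
        "life_time_b": life_time_text(ext_csd[LIFE_TIME_EST_TYP_B]),
        "pre_eol": PRE_EOL_TEXT.get(ext_csd[PRE_EOL_INFO], "reserved"),
        "vendor_report_all_zero": not any(ext_csd[VENDOR_HEALTH]),
    }
```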

For example, with mmc-utils:
# mmc extcsd read /dev/mmcblk0
:
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x00
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
:

sysfs:
# cat /sys/block/mmcblk0/device/life_time    
0x00 0x01
# cat /sys/block/mmcblk0/device/pre_eol_info 
0x01
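For periodic monitoring, the two sysfs files can be read and checked from a script, for example one run by a systemd timer. This Python sketch assumes the format shown above, with life_time printing the type A estimate first and type B second (as the Linux kernel's mmc driver does); the function names and the alert threshold (0x09, i.e. 80-90% of life time used) are hypothetical choices, not vendor guidance.

```python
import os


def read_emmc_health(device: str = "mmcblk0", sysfs_block: str = "/sys/block"):
    """Return (life_time_a, life_time_b, pre_eol_info) as integers from sysfs."""
    dev_dir = os.path.join(sysfs_block, device, "device")
    with open(os.path.join(dev_dir, "life_time")) as f:
        typ_a, typ_b = (int(tok, 16) for tok in f.read().split())  # e.g. "0x00 0x01"
    with open(os.path.join(dev_dir, "pre_eol_info")) as f:
        pre_eol = int(f.read().strip(), 16)  # e.g. "0x01"
    return typ_a, typ_b, pre_eol


def needs_attention(typ_a: int, typ_b: int, pre_eol: int, life_limit: int = 0x09) -> bool:
    """Hypothetical policy: flag Warning/Urgent pre-EOL or a life estimate at or above life_limit."""
    return pre_eol >= 0x02 or max(typ_a, typ_b) >= life_limit
```

A script could call read_emmc_health(), and if needs_attention() returns True, raise an alert so the device can be replaced before it fails.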

The vendor may also provide health-related information that you can read with the generic command CMD56 (with mmc-utils: mmc gen_cmd read <device> [arg]).

See the following for a good explanation:

[1] Kingston, 'Estimating, validating & monitoring eMMC life cycle', https://www.kingston.com/en/embedded/emmc-embedded-flash
