
OS: RHEL8 Filesystem: xfs

I'm guessing that some zombie process is holding the space, but I can't find it.

The problem is with /var/lib/pgsql/14, which df says is using 1.1TB of space:

$ sudo df -T -xtmpfs -xdevtmpfs -h --sync  
Filesystem                                 Type  Size  Used Avail Use% Mounted on
/dev/sda2                                  xfs   126G   14G  113G  11% /
/dev/sda1                                  xfs   2.0G  658M  1.4G  33% /boot
/dev/mapper/pgsql14vg-pgsql141v            xfs   5.4T  1.1T  4.4T  20% /var/lib/pgsql/14
/dev/mapper/pglogvg-pglog1v                xfs    15G  292M   15G   2% /var/log/postgresql
FISPFILNAS01.xxxxxxxxxxxx:/DB_backups_TAPb nfs4   15T  2.3T   13T  16% /var/lib/pgsql/14/backups

However, du says it only has 12GB of files:

$ du -x -d2 -h /var/lib/pgsql/14 | sort -k2
12G     /var/lib/pgsql/14
12G     /var/lib/pgsql/14/data
42M     /var/lib/pgsql/14/data/base
2.8M    /var/lib/pgsql/14/data/global
0       /var/lib/pgsql/14/data/pg_commit_ts
0       /var/lib/pgsql/14/data/pg_dynshmem
4.0K    /var/lib/pgsql/14/data/pg_logical
16K     /var/lib/pgsql/14/data/pg_multixact
0       /var/lib/pgsql/14/data/pg_notify
0       /var/lib/pgsql/14/data/pg_replslot
0       /var/lib/pgsql/14/data/pg_serial
0       /var/lib/pgsql/14/data/pg_snapshots
0       /var/lib/pgsql/14/data/pg_stat
976K    /var/lib/pgsql/14/data/pg_stat_tmp
192K    /var/lib/pgsql/14/data/pg_subtrans
0       /var/lib/pgsql/14/data/pg_tblspc
0       /var/lib/pgsql/14/data/pg_twophase
12G     /var/lib/pgsql/14/data/pg_wal
112K    /var/lib/pgsql/14/data/pg_xact
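df and du measure different things. df reports the blocks allocated on the whole filesystem. du only adds up the files it can reach by walking the directory tree from the given path. Anything that is not reachable that way (deleted files still held open, files hidden underneath a mount point, filesystem metadata) shows up in df but not in du. The gap can be computed directly with the minimal sketch below. It assumes GNU coreutils (`df --output`, `du -B1`), as shipped on RHEL 8, and that DIR is the mount point itself. The function name `df_du_gap` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<df used> <du reachable> <gap>" in bytes for DIR.
# Assumes GNU coreutils and that DIR is the mount point of the filesystem.
df_du_gap() {
  local dir=$1 used reachable
  # df: bytes in use on the whole filesystem containing $dir (skip the header line)
  used=$(df --output=used -B1 -- "$dir" | tail -n 1 | tr -d ' ')
  # du: bytes reachable from $dir without crossing into other filesystems (-x)
  reachable=$(du -sxB1 -- "$dir" 2>/dev/null | cut -f1)
  # A large positive gap means space that cannot be reached through the tree
  printf '%s %s %s\n' "$used" "$reachable" "$((used - reachable))"
}
```

Run it from a root shell (for example after `sudo -i`) so du can read the PostgreSQL data directory, e.g. `df_du_gap /var/lib/pgsql/14`. On the system in the question this would be expected to show the roughly 1.1 TB vs 12 GB discrepancy.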

I even tried unmounting the device (after shutting down PostgreSQL), but umount says it's busy, even though lsof says that no processes are using it:

$ export PS1="$ "
$ sudo lsof /var/lib/pgsql/14
$ 

Added lsof +L1 output:

$ sudo lsof +L1
COMMAND     PID    USER   FD   TYPE DEVICE SIZE/OFF NLINK      NODE NAME
dbus-daem  1182    dbus   22r   REG    8,2 11567160     0 135113565 /var/lib/sss/mc/initgroups (deleted)
polkitd    1887 polkitd    3r   REG    8,2 11567160     0 134903325 /var/lib/sss/mc/initgroups (deleted)
python.or  1899    root    6w   REG    8,2     1434     0  67157239 /var/log/venv-salt-minion.log-20231217 (deleted)
python.or  1899    root    7r   REG    8,2 11567160     0 134903325 /var/lib/sss/mc/initgroups (deleted)
sssd      16382    root   14r   REG    8,2 11567160     0 134903325 /var/lib/sss/mc/initgroups (deleted)
sssd_be   16383    root   18r   REG    8,2 11567160     0 134903325 /var/lib/sss/mc/initgroups (deleted)
  • if it's not deleted files, it could be filesystem metadata overhead for xfs Commented Jan 7, 2024 at 14:37
  • @frostschutz 1.1 terabytes of overhead seems a bit excessive. Besides, other servers with similar-sized file systems only have about 60GB of xfs overhead. Commented Jan 7, 2024 at 15:10
  • right, I missed it. it was another case of -x unix.stackexchange.com/a/358331/30851 Commented Jan 7, 2024 at 15:22

2 Answers


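You have another filesystem mounted below /var/lib/pgsql/14: the NFS share at /var/lib/pgsql/14/backups. Unmount /var/lib/pgsql/14/backups and check for unexpected files written into the backups directory itself. Anything stored there is hidden by the NFS mount, so du cannot see it, but it still uses space on the XFS volume: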

Filesystem                                 Type  Size  Used Avail Use% Mounted on
/dev/mapper/pgsql14vg-pgsql141v            xfs   5.4T  1.1T  4.4T  20% /var/lib/pgsql/14
FISPFILNAS01.xxxxxxxxxxxx:/DB_backups_TAPb nfs4   15T  2.3T   13T  16% /var/lib/pgsql/14/backups

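If you cannot unmount the lower filesystem (backups) because it's a production environment, you can temporarily bind-mount the upper one (the one with the missing space) at a second location, per the instructions at "du results on filesystem inconsistent with df". A plain --bind mount does not carry submounts with it, so under the new location du sees the files hidden beneath the NFS mount point: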

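# Bind-mount the XFS filesystem at a second location; submounts such as backups/ are not carried over
sudo mkdir /mnt/pgsql14
sudo mount --bind /var/lib/pgsql/14 /mnt/pgsql14
# backups/ here shows whatever lies underneath the NFS mount point
sudo du -x -d2 -h /mnt/pgsql14 | sort -k2

# …
# Clean up
sudo umount /mnt/pgsql14
sudo rmdir /mnt/pgsql14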
  • That was the problem. I thought I'd cleaned out the old files from backups/, but obviously I hadn't. Commented Jan 7, 2024 at 15:19
  • So you mounted a filesystem over a non-empty directory, effectively hiding those files at the mount point? Commented Jan 10, 2024 at 7:07
  • @U.Windl yes. I've seen similar elsewhere, for example if backup fails to mount but a backups process runs anyway Commented Jan 10, 2024 at 8:39

Run sudo lsof +L1.

You'll see all deleted-but-still-open files, and the processes holding them open.
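To judge whether deleted-but-open files could account for missing space, it helps to total their sizes per device, counting each inode once even when several processes hold it open. The minimal sketch below does that with awk. It assumes lsof's default column layout (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME) and command names without spaces; the function name `sum_deleted` is hypothetical. Applied to the lsof +L1 output in the question, it yields a single device, 8,2 (major 8, minor 2, i.e. /dev/sda2, the root filesystem), with about 22 MiB in total. That is on the wrong device and far too small to explain 1.1 TB on /var/lib/pgsql/14.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `lsof +L1` output on stdin and print
# "<DEVICE> <bytes>" per device, counting each (DEVICE, NODE) pair once.
# Assumes the default lsof columns: $6 = DEVICE, $7 = SIZE/OFF, $9 = NODE.
sum_deleted() {
  awk 'NR > 1 && $NF == "(deleted)" {
         key = $6 ":" $9                  # same inode opened by many processes
         if (!(key in seen)) { seen[key] = 1; total[$6] += $7 }
       }
       END { for (dev in total) printf "%s %d\n", dev, total[dev] }'
}
```

Usage: `sudo lsof -nP +L1 | sum_deleted`. Deduplicating on DEVICE and NODE matters here: in the question's output the same initgroups file appears five times but occupies space only once.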

  • Added that output to my question. Did not see any relevant files. Commented Jan 7, 2024 at 15:13
