
When traversing a filesystem with large numbers of files, is it quicker to do so as root compared to any other user?

For example, if there are several million files under /data, and /data is owned by user123, would a recursive grep complete more quickly for root than for user123?

I'm curious whether there's an optimization that skips a permissions check, or whether it's going to perform a stat for every file anyway, so the check is just a conditional. And whether this would be generally applicable, or vary by filesystem.

I've picked up a superstitious habit of running an extremely large operation like that as root to speed it up, but haven't found a good way of testing whether it actually helps.

  • have you tried testing it? Commented Jan 20, 2024 at 7:04
  • "...whether it's going to perform a stat for every file anyway.." -- what's the "it" here? Commented Jan 20, 2024 at 9:34
  • You know the program/bash-builtin time? Commented Jan 20, 2024 at 12:37
  • Not really... I don't know a good way to work around caching, so it's hard to make a comparison between before and after. Commented Jan 21, 2024 at 8:06

4 Answers

3

There's no special treatment of root in Linux here. Though root may have the power to change access modes, modify ACLs, disable SELinux enforcement and so on, it is still subject to those checks until it actually does so. So the kernel really can't take any shortcuts there. (Thinking about what it means to be working inside Linux UID namespaces makes it even more complicated: that universally powerful root might not even exist from the point of view of the processes you're running.)
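As a side note, root's ability to bypass permission checks is itself a capability (CAP_DAC_OVERRIDE), and whether the current process actually holds it can be checked directly. A sketch, assuming the `capsh` tool from libcap is installed:

```shell
# Show the effective capability mask of the current shell.
# For a normal user this is typically all zeros; for root it includes
# CAP_DAC_OVERRIDE, the capability that bypasses file permission checks.
grep CapEff /proc/self/status

# Decode the hex mask into capability names (capsh ships with libcap):
capsh --decode="$(grep CapEff /proc/self/status | awk '{print $2}')"
```

Run the same two commands in a root shell and in a user shell to see the difference.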

or whether it's going to perform a stat for every file anyway,

stat is a syscall, i.e., something that userland does to learn more about a file. It's not a given that a recursive grep even needs to do that; the inefficiency of getting all directory entries and then separately stat-ing every entry is what led to the existence of specialized calls that combine the two, getdents(64). The result doesn't contain any access information, but instead of checking whether the current user can access a file found that way before accessing it, it would be sensible to just go ahead and try: if it fails, you couldn't. That saves one context switch per file.
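The "just try it" behavior can be observed directly. Here is a sketch (assuming strace is installed; the /tmp paths are made up for the demonstration) that counts the file-related syscalls a recursive grep makes, letting you see the getdents64/openat calls versus any stat-family calls:

```shell
# Create a tiny directory tree to search.
mkdir -p /tmp/grepdemo/sub
echo needle > /tmp/grepdemo/sub/a.txt

# -c summarizes per-syscall counts; -f follows child processes.
# Compare the number of getdents64/openat calls with newfstatat calls.
strace -f -c -e trace=openat,newfstatat,getdents64 \
  grep -r needle /tmp/grepdemo > /dev/null
```

The summary table strace prints on stderr shows how the traversal is actually implemented on your system.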

So, how could one actually exploit root-style privileges to make a recursive grep faster?

The answer probably lies in minimizing context switches between userland grep and kernel file system functionality. Short of writing something like a kernel module that gives one a flattened view of all the files a process would be able to access, along with some functionality to convert a hit position back to an actual file path, I don't see an immediate clean way of extending the Linux kernel to avoid having to open and, if possible, read each file and directory. Linux has this model where files are really meant to be accessed from user space, to get all the concurrency, safety, file-closing and memory-allocation behavior into something that has well-defined (by the process owning the file handle) semantics.

  • "There's no special treatment of root in Linux there" -- well, insofar as root's special privileges come from capabilities here (namely CAP_DAC_OVERRIDE), it is at least distinct from the usual permission-bits checks. And it might well be that the capabilities are checked in the code before the permission bits (and possible ACLs), so root could theoretically get some teensy nanosecond-scale advantage. One that would be swamped by all other issues like cache activity. Commented Jan 20, 2024 at 9:39
  • Oh, I hadn't thought about the order of checks, but yeah, compared to switching context, getting metadata from file system structures and handling the memory... it shouldn't make much of a difference. Commented Jan 20, 2024 at 9:59
  • Thank you, this is helpful. I forgot about there being other ways access might be restricted. And yeah, thinking about it, it makes sense just to try to open and read files (from userland). Commented Jan 21, 2024 at 8:16
2

Yes, but No

Well, don't reason before trying it out.

I used sudo -i; cd /etc; time ls -R >/dev/null and the same without switching via sudo. I used two running xterms with bash to reduce the possibility of caching effects. I started with sudo, made a few measurements, and repeated as the normal user. There are 8 CPUs and the load average was about 1.

The times were comparable: 0.014, 0.014, 0.16 for root; 0.015 three times for me.

You have to measure multiple times to exclude outliers, but none were to be seen here. The output is redirected to /dev/null because writing the output can be the biggest time consumer in such tasks: for ls -R /usr my timing (as root) is 11.5 s without redirection and about 1.5 s with it.

It is 1.5, 1.1, 1.1 for root over 3 sequential calls, and for the user it is 5.2, 1.1, 1.1. Maybe this is a caching effect for the 2nd and 3rd calls.

I repeated with /var and got 0.2, 0.1, 0.1 for root and 0.5, 0.1, 0.1 for the user (with time ls -R /var >/dev/null 2>&1, because of the many forbidden directories).

So there seems to be caching per user, but to investigate this further, maybe I would have to reboot between calls and not repeat the same command for a different user? I guess so.
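Rather than rebooting between runs, Linux lets you drop the page cache and the dentry/inode caches explicitly, which gives repeatable cold-cache timings. A sketch (requires root; the sysctl path is standard on Linux):

```shell
# Flush dirty pages to disk first, so dropping the caches is safe and complete.
sync

# Writing 3 drops the page cache plus dentries and inodes.
echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null

# Now take a cold-cache measurement:
time ls -R /usr > /dev/null
```

Doing this before every timed run removes the "first call is slow, later calls are fast" effect seen above.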

Note that some directories are root-only, including their subdirectories, so in /var root has to visit many more file entries than the user does.

BUT

For the 618282 files in /usr, it took about 1.1 s to complete; as soon as you're doing real work with those files, the fraction of a microsecond by which root might be faster doesn't matter. Most of the time you will spend in grep itself, and in whatever you do with each match.

Just grepping recursively for GNU in /etc takes about 40% more time than the ls.

Note: this test was performed on an SSD with ext4. YMMV.

  • I upvote your answer, as it's totally right. A complete and differentiated answer might be more difficult and more complex, but it seems some "computer experts" in this forum know it all better. In general it can be said that root is able to use more system resources with higher priority in default settings and is thus able to perform better than non-root in most default setups. I hope nobody downvotes your post like mine. PS: it makes me really mad when people who have no idea downvote things they don't understand. Commented Jan 21, 2024 at 7:35
  • @paladin, this is a technical forum; people are interested in the actual technical details. You keep asserting claims in very general terms, but you refuse to elaborate on the details or to provide background or a description of the related functionality. You appear to expect strangers who don't know you at all to trust you over any other information, just on your word, while at the same time you dismiss comments from others by accusing them of hunting you, or just saying they don't know anything about the matter. I would suggest some contemplation on that attitude. Commented Jan 21, 2024 at 10:47
  • I've provided evidence, but people just don't understand. I told people to use time, and they don't. Because people are too lazy to use time, I provided several experiments with time, showing evidence. A mod is hunting several of my posts in a row with a bad attitude. People don't understand a simple script. People don't take the time. People just downvote... People are just not constructive. And most of the time people only downvote and no one upvotes, even a good answer. This is in contrast to Stack Overflow, where people are generally more polite and also upvote things. People are stupid. Commented Jan 22, 2024 at 11:05
  • And also, where's all the evidence from people who claim it makes no difference to use root or non-root? This is just nuts... If people want fail-proof evidence, I suggest they start to use their own brains. You could write an entire essay about this topic to make it bulletproof. Do I have the wish or desire to explain how things work to people who don't care at all, if they are not providing at least minimally constructive behavior? No, I've better things to do with my time than to explain to ignorants. Commented Jan 22, 2024 at 11:14
-2

As all programs normally run in user space, the difference between users is just permissions to access resources. The root user has all access (most of the time), which can mean fewer syscalls, and that makes a noticeable difference when you are performing a large run.

For the sake of comparison I did a simple test by running ls -l as different users and as root; the result is that root makes nearly 50% fewer syscalls to perform ls -l. It's like an obstacle race: root faces fewer obstacles, so root finishes earlier (remember the result may be different for short-lived or frequently run programs).
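One way to make such a syscall comparison concrete is strace -c, which prints a per-syscall count summary. This is a sketch of the kind of test meant here, not the exact commands the author used:

```shell
# Count syscalls for the same command as a normal user and as root.
# -c summarizes counts instead of printing each call; -o writes to a file.
strace -c -o /tmp/ls_user.log ls -l /etc > /dev/null
sudo strace -c -o /tmp/ls_root.log ls -l /etc > /dev/null

# Compare the per-syscall totals (e.g. getxattr/lgetxattr may differ):
diff /tmp/ls_user.log /tmp/ls_root.log
```

Including such a table in the answer would let readers verify the "nearly 50% fewer" claim on their own systems.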

  • Why would having access reduce the number of syscalls for a recursive grep, please? Commented Jan 20, 2024 at 9:34
  • "result is root make nearly 50% less syscall to perform ls -l." -- care to amend your post to include the actual data? Commented Jan 20, 2024 at 9:44
  • I don't see lgetxattr and getxattr when I use root; you should probably try it on your own system, as the result may differ depending on which protection mechanisms are in use. Commented Jan 20, 2024 at 9:47
  • 4
    @asktyagi, would you care to edit your post to include the actual test you used, and the relevant results? I don't see how ls asking for the extended attrs (or not) would be exactly related to the access control checks the system does. Commented Jan 20, 2024 at 10:05
  • @asktyagi you are right, I've upvoted your answer; please don't mind the ignorant people here. Commented Jan 22, 2024 at 11:22
-2

You can test it using the time command. To answer your question: large operations might run faster as root, as root normally has less restricted access to system resources than a usual user. Root is usually able to use the reserved disk space that other users cannot, so large operations that benefit from this additional space (because they create a lot of temporary files) might speed up. A usual user might also be limited by settings such as /etc/security/limits.conf and similar ones. Operations run as root are less likely to hit access-denied failures, which might speed the process up or slow it down, depending on the operation.

But for safety and security reasons you should avoid using root whenever possible.
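The reserved-space claim refers to the blocks ext2/3/4 reserves for root (5% by default), which only root-owned processes may allocate. A sketch of how to inspect this, assuming an ext filesystem on /dev/sda1 (the device name is just an example):

```shell
# Show the reserved block count of an ext filesystem
# (adjust the device name for your system).
sudo tune2fs -l /dev/sda1 | grep -i 'reserved block count'

# df's "Avail" column for normal users already excludes these reserved blocks.
df -h /
```

Note that this reserve only matters for operations that write; a read-only traversal like find or grep does not touch it.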

# command find used as non root user
time find / >> /tmp/find_as_non_root.log
real    0m1,506s
user    0m0,202s
sys     0m0,521s

# command find used as root user
time find / >> /tmp/find_as_root.log
real    0m0,673s
user    0m0,194s
sys     0m0,470s

Is this answer evidence based or just conjecture? – Chris Davies

Oh look, what evidence...

My former test may be a bit inaccurate, as caching is indeed a factor involved. But my test is meant to show that processing time may differ when using a root user in contrast to a non-root user. So I give you another test, running the same commands 3 times in a row. By the way, you can try it for yourself.

# command find run 3 times in a row as non root user
time find / >> /tmp/find_as_non_root.log
real    0m0,618s
user    0m0,191s
sys     0m0,403s

time find / >> /tmp/find_as_non_root.log
real    0m0,648s
user    0m0,194s
sys     0m0,408s

time find / >> /tmp/find_as_non_root.log
real    0m0,704s
user    0m0,244s
sys     0m0,367s

# command find run 3 times in a row as root user
time find / >> /tmp/find_as_root.log
real    0m0.690s
user    0m0.270s
sys     0m0.412s

time find / >> /tmp/find_as_root.log
real    0m0.693s
user    0m0.210s
sys     0m0.474s

time find / >> /tmp/find_as_root.log
real    0m0.695s
user    0m0.182s
sys     0m0.504s

I've also added a third test:

# non root user
user@opensuse:~> time bash -c "for i in 1 2 3 4 5 6 7 8 9 10; do echo \$i; find / &> /dev/null; done;"
1
2
3
4
5
6
7
8
9
10

real    0m5,212s
user    0m1,833s
sys     0m3,314s

# root user
opensuse:~ # time bash -c "for i in 1 2 3 4 5 6 7 8 9 10; do echo \$i; find / &> /dev/null; done;"
1
2
3
4
5
6
7
8
9
10

real    0m6,214s
user    0m2,113s
sys     0m4,018s

I've made a fourth and last test.

This time I've created 2 partitions, formatted with the EXT2 file system; both are located on the same machine, on the same hard drive, and both are 100 MiB in size.

I've mounted both filesystems and have used the following DANGEROUS script to fill the filesystem with random data, using the non-root user.

DO NOT EXECUTE THE FOLLOWING SCRIPT, UNLESS YOU KNOW WHAT YOU ARE DOING!

#!/bin/sh
# Double the number of files on each iteration (wheat-on-a-chessboard style)
# until the filesystem runs out of space.
wheatgraindoubled=1
for iteration in $(seq 1 64)
do
  echo "Iteration $iteration"
  mkdir "$iteration"
  cd "$iteration" || exit 1

  wheatgrain=0
  while [ "$wheatgrain" -lt "$wheatgraindoubled" ]
  do
    wheatgrain=$((wheatgrain + 1))
    # 8 sectors of 512 bytes = 4 KiB of random data per file
    dd if=/dev/urandom of="$wheatgrain.dat" count=8
  done
  wheatgraindoubled=$((wheatgraindoubled * 2))
done

I waited for the script to fill the filesystem until there was no free space left.

Then I used the same command to measure time, as in my previous test:

# non root user
user@opensuse:~> time bash -c "for i in 1 2 3 4 5 6 7 8 9 10; do echo \$i; find 1/ &> /dev/null; done;"
1
2
3
4
5
6
7
8
9
10

real    0m0,277s
user    0m0,101s
sys     0m0,176s

Then I copied all the files using cp -r 1 /mnt/roottest/ as the root user and repeated the same test.

# root user
opensuse:~ # time bash -c "for i in 1 2 3 4 5 6 7 8 9 10; do echo \$i; find 1/ &> /dev/null; done;"
1
2
3
4
5
6
7
8
9
10

real    0m0,263s
user    0m0,100s
sys     0m0,163s

Conclusion: root is processing faster than non-root on a default openSUSE installation.

  • Comments have been moved to chat; please do not continue the discussion here. Commented Jan 28, 2024 at 19:42
