Aside from its value as a learning exercise, this is unlikely to be a good use of parallel, for several reasons:
1. Calling du like that will quite possibly be slower than just invoking du in the normal way. First, the information about file sizes can be extracted from the directory, so the sizes for an entire directory can be collected in a single access. Effectively, directories are stored as a special kind of file object whose data is a vector of directory entries ("dirents"), which contain the name and metadata for each file. What you are doing is using find to print these dirents, then getting du to examine each one individually (every file, not every directory); almost all of this second scan is redundant work. (See the first sketch after this list.)

2. Insisting that du examine every file individually prevents it from avoiding the double-counting of multiple hard links to the same file, so you can easily end up inflating the disk usage this way. On the other hand, directories themselves also take up disk space, and du will normally include that space in its reports; but you're never calling it on any directory, so you will end up understating the total disk usage. (See the second sketch after this list.)

3. You're invoking a shell and an instance of du for every file, where normally you would create only a single process for a single du. Process creation is a lot slower than reading a file size from a directory. At a minimum, you should use parallel -X and rewrite your shell function to invoke du on all of its arguments rather than just $1, as the example below does.

4. There is no way to share environment variables between sibling shells, so you would have to accumulate the results in some persistent store, such as a temporary file or a database table. That's also an expensive operation; but if you adopted the previous suggestion, you would only need to do it once for each invocation of du rather than once for every file.
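To get a feel for the first point, you can compare a single du traversal against one du process per file on the same tree. This is only a sketch, and the timings depend entirely on the tree and on cache state, but the gap is usually dramatic:

# One process, one traversal
time du -s . >/dev/null
# One du process per file (the pattern under discussion)
time find . -type f -exec du -s {} \; >/dev/null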
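The second point is easy to demonstrate. This sketch (assuming GNU coreutils; the file names are arbitrary) makes two hard links to the same 10 MiB of data, then shows that per-file invocations count the data twice, while a single du counts it once and includes the directory itself:

mkdir demo && cd demo
dd if=/dev/zero of=a bs=1M count=10 status=none
ln a b       # a second name for the same data
du -sk .     # one traversal: data counted once, directory included
for f in a b; do du -k "$f"; done | awk '{s+=$1} END {print s}'  # one du per file: data counted twice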
So, ignoring the first two issues and looking just at the last two, solely for didactic purposes, you could do something like the following:
# Create a temporary file to store results
tmpfile=$(mktemp)

# Function which invokes du on all its arguments and safely appends
# the summary line to the temporary file
collectsizes() {
  # Get the name of the temporary file, and remove it from the args
  tmpfile=$1
  shift
  # Call du on all the parameters, and keep the last (grand total) line
  size=$(du -c -s "$@" | tail -n1)
  # Lock the temporary file and append the data line under the lock
  flock "$tmpfile" bash -c 'printf "%s\n" "$1" >> "$2"' _ "$size" "$tmpfile"
}
export -f collectsizes

# Find all regular files and feed them to parallel, using NUL
# terminators to avoid problems with whitespace in file names;
# -X passes as many file names as possible to each invocation
find . -type f -print0 | parallel -0 -j8 -X collectsizes "$tmpfile"

# When all that's done, sum up the values in the temporary file
awk '{s+=$1} END {print s}' "$tmpfile"
# And delete it.
rm "$tmpfile"