I am using AIX 6.1 ksh shell.
I want to use a one-liner to do something like this:
cat A_FILE | skip-first-3-bytes-of-the-file
I want to skip the first 3 bytes of the first line; is there a way to do this?
Old school — you could use dd:
dd if=A_FILE bs=1 skip=3
The input file is A_FILE, the block size is 1 byte, and dd skips the first 3 'blocks' (bytes). (With some variants of dd, such as GNU dd, you could write bs=1c here, and use alternatives like bs=1k to read in blocks of 1 kilobyte in other circumstances. The dd on AIX does not support this, it seems; the BSD (macOS Sierra) variant doesn't support c but does support k, m, g, etc.)
There are other ways to achieve the same result, too:
sed '1s/^...//' A_FILE
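This works if there are 3 or more characters on the first line. Note that sed counts characters rather than bytes, so the two only coincide when each of those characters is a single byte.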
tail -c +4 A_FILE
And you could use Perl, Python and so on too.
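As one illustration of the Python route, a minimal sketch might look like the following. The function name skip_bytes is made up for this example; it seeks past the first n bytes and streams the rest unchanged, so it should also be safe for binary files.

```python
import shutil
import sys


def skip_bytes(path, n, out):
    """Copy the file at `path` to the binary stream `out`, minus its first n bytes."""
    with open(path, "rb") as src:
        src.seek(n)                    # jump past the first n bytes
        shutil.copyfileobj(src, out)   # stream the remainder in large chunks


# Hypothetical command-line use: python3 skip_first.py A_FILE 3
if __name__ == "__main__" and len(sys.argv) == 3:
    skip_bytes(sys.argv[1], int(sys.argv[2]), sys.stdout.buffer)
```

Writing to sys.stdout.buffer rather than print() avoids decoding the data, which matters when the file is not valid text.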
dd if=A_FILE bs=1 skip=3 in AIX 6.1
Using dd like this will slow down the whole process by several orders of magnitude. bs=1 sets the block size to a single byte, which prevents efficient file I/O. Switching the parameters, e.g. bs=3 skip=1, helps a little, but using tail is much more efficient anyway. I did not test sed for speed.
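To make the block-size point concrete, the sketch below counts how many read calls it takes to consume a file after skipping 3 bytes. It is only an illustration in Python, not a measurement of dd itself; count_reads and the sample file size are invented for the example. The file is opened unbuffered, so each read() roughly corresponds to one dd block.

```python
import os
import tempfile


def count_reads(path, skip, block_size):
    """Return (bytes_read, read_calls) when reading `path` after `skip` bytes."""
    copied = calls = 0
    # buffering=0 gives an unbuffered file, so each read() goes to the OS
    with open(path, "rb", buffering=0) as src:
        src.seek(skip)
        while True:
            chunk = src.read(block_size)
            calls += 1
            if not chunk:              # empty result means end of file
                break
            copied += len(chunk)
    return copied, calls


if __name__ == "__main__":
    # Build a throwaway 100,000-byte file and compare two block sizes
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 100_000)
    try:
        for bs in (1, 65536):
            print(bs, count_reads(tmp.name, 3, bs))
    finally:
        os.remove(tmp.name)
```

Both runs read the same number of bytes, but the one-byte block size needs a separate call for every byte, which is where the slowdown comes from.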
Instead of using cat you can use tail as such:
tail -c +4 FILE
This will print out the entire file except for the first 3 bytes. Consult man tail for more information.
/usr/xpg4/bin/tail, at least on my machine. Good tip nonetheless!
dd over an ssh connection to get a file image and I needed to remove the "[sudo] password for X:" at the beginning of the resulting file.
If Python is available on the system, a small Python script can use the seek() function to start reading at the nth byte, like so:
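#!/usr/bin/env python3
# Usage: ./skip_bytes.py FILE N
# Prints FILE without its first N bytes.
import sys

with open(sys.argv[1], 'rb') as fd:
    # Move the read position past the first N bytes
    fd.seek(int(sys.argv[2]))
    for line in fd:
        # Write raw bytes so whitespace and non-text data pass through unchanged
        sys.stdout.buffer.write(line)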
And usage would be like so:
$ ./skip_bytes.py input.txt 3
Note that byte offsets start at 0 (the first byte is at index 0), so specifying 3 positions the read at the 4th byte (3+1).
I recently needed to do something similar. I was helping with a field support issue and needed to let a technician see real-time plots as they were making changes. The data is in a binary log that grows throughout the day. I have software that can parse and plot the data from logs, but it is currently not real time. What I did was capture the size of the log before I started processing the data, then go into a loop that processes the data and, on each pass, creates a new file with the bytes that had not yet been processed.
#!/usr/bin/env bash
# I named this little script hackjob.sh
# The purpose of this is to process an input file and load the results into
# a database. The file is constantly being updated, so this runs in a loop
# and every pass it creates a new temp file with bytes that have not yet been
# processed. It runs about 15 seconds behind real time so it's
# pseudo real time. This will eventually be replaced by a real time
# queue based version, but this does work and surprisingly well actually.
set -x
# Current date in YYYYMMDD format
DATE=`date +%Y%m%d`
INPUT_PATH=/path/to/my/data
IFILE1=${INPUT_PATH}/${DATE}_my_input_file.dat
OUTPUT_PATH=/tmp
OFILE1=${OUTPUT_PATH}/${DATE}_my_input_file.dat
# Capture the size of the original file
SIZE1=`ls -l ${IFILE1} | awk '{print $5}'`
# Copy the original file to /tmp
cp ${IFILE1} ${OFILE1}
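while :
do
    sleep 5
    # process_my_data.py ${OFILE1}
    rm ${OFILE1}
    # Record the current size first, so bytes appended while dd runs are
    # picked up on the next pass instead of being skipped
    NEW_SIZE=`ls -l ${IFILE1} | awk '{print $5}'`
    # Copy only the not-yet-processed bytes of IFILE1 to OFILE1
    dd skip=${SIZE1} count=$((NEW_SIZE - SIZE1)) bs=1 if=${IFILE1} of=${OFILE1}
    # Update the amount of data already processed
    SIZE1=${NEW_SIZE}
    echo
    DATE=`date +%Y%m%d`
done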
ls; have you considered using stat -c'%s' "${IFILE}" instead of that ls|awk combo? That is, assuming GNU coreutils...
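The "only read what is new" idea from the loop above can also be sketched in Python by remembering a byte offset instead of creating temporary files. This is an illustration under assumptions, not the script that was actually run: read_new_bytes is a hypothetical helper, and os.stat supplies the file size in place of ls and awk.

```python
import os


def read_new_bytes(path, offset):
    """Return (data, new_offset): the bytes appended to `path` since `offset`."""
    size = os.stat(path).st_size          # snapshot the size once, before reading
    if size <= offset:
        return b"", offset                # nothing new since the last pass
    with open(path, "rb") as src:
        src.seek(offset)                  # skip everything already processed
        data = src.read(size - offset)    # read only up to the snapshot
    return data, offset + len(data)
```

A caller would keep the returned offset, sleep between passes, and hand each non-empty chunk to the processing step.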