Simplify
write_count()– Instead of counting all bits used, and splitting out 5 bits and then 8 bits at a time, I would opt for a simpler method to write the count, whilst still maintaining a variable byte count to write larger numbers. Before presenting that, I'm not sure if you really need to count the actual bits, as you always write bytes, so your implementation could most probably be simplified as well.However if you use the 8th bit to indicate that more bits follows, there is no need to do all the precalculations, and you can do the following:
const int BIT_SHIFT = 7; const int LOWER_BITS = 0x7f; const int HIGH_BIT = 0x80; void write_count(uint64_t count) { int lower_bits = count & LOWER_BITS; while (lower_bits != count || count > LOWER_BITS) { write_char(lower_bits | HIGH_BIT); count >>= BIT_SHIFT; lower_bits = count & LOWER_BITS; } write_char(lower_bits); }This change will allow up to 127 repetitions to be represented using 2 bytes, instead of only 31 repetitions as in your original code. When using 3 bytes this implementation allows for 16384 repetitions (14 bits), whilst your allows 8192 repetitions (13 bits). After that your implementation is slightly more space efficient, but still slightly harder as you need to precalculate it.
Here is an example run showing the transition from 127 to 128 repetitions:
echo -n "eaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbc" | ./cr115765_rle -x 0x65 0x1 0x61 0x7f 0x62 0x80 0x1 0x63 0x1That is 1
e, 127 (0x7f)a's, 128b's (0x80 0x1), and 1c.Avoid magic numbers – In your version of
write_count()you use the magic numbers of5and8, whilst you've defined theNUM_LENGTH_BITS. Be consistent, and try to avoid most use of magic numbers. If you changed yourNUM_LENGTH_BITSthe rest of your code would not follow that change.Avoid repeating your self – In
write_char()theperror()&exit()part is repeated. If you change to using onlyprintf()this can be avoided, and simplified into:void write_char(int c) { if (printf(G_print_hex ? "0x%x " : "%c", c) < 0) { perror("Error printing to stdout."); exit(EXIT_FAILURE); } }Here I use a ternary to choose the proper format string, and I've also added a space to the end of the hex output to make it a little easier to read.
Be vary about globals – Global variables are in general to be avoided, but stuff like the
print_hexis awkward to pass as parameters everywhere. I tend to either uppercase them, or addG_in front of them so that it becomesG_print_hex. The general advice is though to make some sort of distinction so that you easily detect where you are using globals.Improve argument handling – I would look into alternate ways of handling your parameters, as the current implementation is somewhat handicapped due to the following reasons:
stricmpis limited to Windows –stricmpis Windows specific (see herehere), so please usestrcasecmpinstead which is compliant with C99If used without parameters it prints error message – As it stands it prints an error message if you don't use any parameters.
If you add more parameters the
argc == 2fails – For test purposes I tried adding a-vparameter, but this turned out to be cumbersome as this required some rewriting so I ended up hard-coding it (and removing it at the end).A proper parameter handling would require a `for` loop, and should most likely at least include the `-x` / `--hex` parameters, as well as a `-h` / `--help` parameters. In addition to allowing no parameters.
Consider simplifying
main()– A common pattern is to have as simple main method as possible, that is to do parameter parsing and then calling the appopriate function to be executed. This allows for a simple entry point, and also easier reuse of functions if you extend your program.I.e. if you renamed your current
main()toencode(), you could easily add thedecode()to the mix based on parameters. Having both of these withinmain()would not look tidy. This would also be a clearer segregation of duty, and better single responsibility design.Try to reduce nesting of
if's – This a little based on taste and personal opinions, but I tend to avoid usingwhile (true)andbreakif possible, and try to keep the nesting ofif's to a minimum.In your case, this can be done using the following:
while ((current_char = getchar()) != EOF) { if (current_char == previous_char) { current_char_count += 1; } else { if (current_char_count > 0) { write_count(current_char_count); } write_char(current_char); current_char_count = 1; } previous_char = current_char; } if (current_char_count > 0) { write_count(current_char_count); }To me this this clearer indicates the main purpose of reading characters until end of file, and by switching the
if's around, I also clearer indicates that the main thingy is counting equal characters. And it is a clearer connection between theifand theelseso that it is easier to see why we entered theelseclause.I've removed the reference related to
MAX_COUNTas I consider it rather esoteric if you run this code on something with2^64repetitions of a single character. Another change is that I need to finish of the writing of the last count in an additionalwrite_count()at the end. Still I think this reads somewhat easier than your implementation.Optionally change brace style – This is totally a personal preference, and the main point is to keep bracing consistent, which you do! But I prefer having the opening brace on the same line in
C, as I feel it make the code somewhat easier to read and slightly more compact.Do however note that I still keep braces around one-line blocks, as you do.
Simplify
write_count()– Instead of counting all bits used, and splitting out 5 bits and then 8 bits at a time, I would opt for a simpler method to write the count, whilst still maintaining a variable byte count to write larger numbers. Before presenting that, I'm not sure if you really need to count the actual bits, as you always write bytes, so your implementation could most probably be simplified as well.However if you use the 8th bit to indicate that more bits follows, there is no need to do all the precalculations, and you can do the following:
const int BIT_SHIFT = 7; const int LOWER_BITS = 0x7f; const int HIGH_BIT = 0x80; void write_count(uint64_t count) { int lower_bits = count & LOWER_BITS; while (lower_bits != count || count > LOWER_BITS) { write_char(lower_bits | HIGH_BIT); count >>= BIT_SHIFT; lower_bits = count & LOWER_BITS; } write_char(lower_bits); }This change will allow up to 127 repetitions to be represented using 2 bytes, instead of only 31 repetitions as in your original code. When using 3 bytes this implementation allows for 16384 repetitions (14 bits), whilst your allows 8192 repetitions (13 bits). After that your implementation is slightly more space efficient, but still slightly harder as you need to precalculate it.
Here is an example run showing the transition from 127 to 128 repetitions:
echo -n "eaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbc" | ./cr115765_rle -x 0x65 0x1 0x61 0x7f 0x62 0x80 0x1 0x63 0x1That is 1
e, 127 (0x7f)a's, 128b's (0x80 0x1), and 1c.Avoid magic numbers – In your version of
write_count()you use the magic numbers of5and8, whilst you've defined theNUM_LENGTH_BITS. Be consistent, and try to avoid most use of magic numbers. If you changed yourNUM_LENGTH_BITSthe rest of your code would not follow that change.Avoid repeating your self – In
write_char()theperror()&exit()part is repeated. If you change to using onlyprintf()this can be avoided, and simplified into:void write_char(int c) { if (printf(G_print_hex ? "0x%x " : "%c", c) < 0) { perror("Error printing to stdout."); exit(EXIT_FAILURE); } }Here I use a ternary to choose the proper format string, and I've also added a space to the end of the hex output to make it a little easier to read.
Be vary about globals – Global variables are in general to be avoided, but stuff like the
print_hexis awkward to pass as parameters everywhere. I tend to either uppercase them, or addG_in front of them so that it becomesG_print_hex. The general advice is though to make some sort of distinction so that you easily detect where you are using globals.Improve argument handling – I would look into alternate ways of handling your parameters, as the current implementation is somewhat handicapped due to the following reasons:
stricmpis limited to Windows –stricmpis Windows specific (see here), so please usestrcasecmpinstead which is compliant with C99If used without parameters it prints error message – As it stands it prints an error message if you don't use any parameters.
If you add more parameters the
argc == 2fails – For test purposes I tried adding a-vparameter, but this turned out to be cumbersome as this required some rewriting so I ended up hard-coding it (and removing it at the end).A proper parameter handling would require a `for` loop, and should most likely at least include the `-x` / `--hex` parameters, as well as a `-h` / `--help` parameters. In addition to allowing no parameters.
Consider simplifying
main()– A common pattern is to have as simple main method as possible, that is to do parameter parsing and then calling the appopriate function to be executed. This allows for a simple entry point, and also easier reuse of functions if you extend your program.I.e. if you renamed your current
main()toencode(), you could easily add thedecode()to the mix based on parameters. Having both of these withinmain()would not look tidy. This would also be a clearer segregation of duty, and better single responsibility design.Try to reduce nesting of
if's – This a little based on taste and personal opinions, but I tend to avoid usingwhile (true)andbreakif possible, and try to keep the nesting ofif's to a minimum.In your case, this can be done using the following:
while ((current_char = getchar()) != EOF) { if (current_char == previous_char) { current_char_count += 1; } else { if (current_char_count > 0) { write_count(current_char_count); } write_char(current_char); current_char_count = 1; } previous_char = current_char; } if (current_char_count > 0) { write_count(current_char_count); }To me this this clearer indicates the main purpose of reading characters until end of file, and by switching the
if's around, I also clearer indicates that the main thingy is counting equal characters. And it is a clearer connection between theifand theelseso that it is easier to see why we entered theelseclause.I've removed the reference related to
MAX_COUNTas I consider it rather esoteric if you run this code on something with2^64repetitions of a single character. Another change is that I need to finish of the writing of the last count in an additionalwrite_count()at the end. Still I think this reads somewhat easier than your implementation.Optionally change brace style – This is totally a personal preference, and the main point is to keep bracing consistent, which you do! But I prefer having the opening brace on the same line in
C, as I feel it make the code somewhat easier to read and slightly more compact.Do however note that I still keep braces around one-line blocks, as you do.
Simplify
write_count()– Instead of counting all bits used, and splitting out 5 bits and then 8 bits at a time, I would opt for a simpler method to write the count, whilst still maintaining a variable byte count to write larger numbers. Before presenting that, I'm not sure if you really need to count the actual bits, as you always write bytes, so your implementation could most probably be simplified as well.However if you use the 8th bit to indicate that more bits follows, there is no need to do all the precalculations, and you can do the following:
const int BIT_SHIFT = 7; const int LOWER_BITS = 0x7f; const int HIGH_BIT = 0x80; void write_count(uint64_t count) { int lower_bits = count & LOWER_BITS; while (lower_bits != count || count > LOWER_BITS) { write_char(lower_bits | HIGH_BIT); count >>= BIT_SHIFT; lower_bits = count & LOWER_BITS; } write_char(lower_bits); }This change will allow up to 127 repetitions to be represented using 2 bytes, instead of only 31 repetitions as in your original code. When using 3 bytes this implementation allows for 16384 repetitions (14 bits), whilst your allows 8192 repetitions (13 bits). After that your implementation is slightly more space efficient, but still slightly harder as you need to precalculate it.
Here is an example run showing the transition from 127 to 128 repetitions:
echo -n "eaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbc" | ./cr115765_rle -x 0x65 0x1 0x61 0x7f 0x62 0x80 0x1 0x63 0x1That is 1
e, 127 (0x7f)a's, 128b's (0x80 0x1), and 1c.Avoid magic numbers – In your version of
write_count()you use the magic numbers of5and8, whilst you've defined theNUM_LENGTH_BITS. Be consistent, and try to avoid most use of magic numbers. If you changed yourNUM_LENGTH_BITSthe rest of your code would not follow that change.Avoid repeating your self – In
write_char()theperror()&exit()part is repeated. If you change to using onlyprintf()this can be avoided, and simplified into:void write_char(int c) { if (printf(G_print_hex ? "0x%x " : "%c", c) < 0) { perror("Error printing to stdout."); exit(EXIT_FAILURE); } }Here I use a ternary to choose the proper format string, and I've also added a space to the end of the hex output to make it a little easier to read.
Be vary about globals – Global variables are in general to be avoided, but stuff like the
print_hexis awkward to pass as parameters everywhere. I tend to either uppercase them, or addG_in front of them so that it becomesG_print_hex. The general advice is though to make some sort of distinction so that you easily detect where you are using globals.Improve argument handling – I would look into alternate ways of handling your parameters, as the current implementation is somewhat handicapped due to the following reasons:
stricmpis limited to Windows –stricmpis Windows specific (see here), so please usestrcasecmpinstead which is compliant with C99If used without parameters it prints error message – As it stands it prints an error message if you don't use any parameters.
If you add more parameters the
argc == 2fails – For test purposes I tried adding a-vparameter, but this turned out to be cumbersome as this required some rewriting so I ended up hard-coding it (and removing it at the end).A proper parameter handling would require a `for` loop, and should most likely at least include the `-x` / `--hex` parameters, as well as a `-h` / `--help` parameters. In addition to allowing no parameters.
Consider simplifying
main()– A common pattern is to have as simple main method as possible, that is to do parameter parsing and then calling the appopriate function to be executed. This allows for a simple entry point, and also easier reuse of functions if you extend your program.I.e. if you renamed your current
main()toencode(), you could easily add thedecode()to the mix based on parameters. Having both of these withinmain()would not look tidy. This would also be a clearer segregation of duty, and better single responsibility design.Try to reduce nesting of
if's – This a little based on taste and personal opinions, but I tend to avoid usingwhile (true)andbreakif possible, and try to keep the nesting ofif's to a minimum.In your case, this can be done using the following:
while ((current_char = getchar()) != EOF) { if (current_char == previous_char) { current_char_count += 1; } else { if (current_char_count > 0) { write_count(current_char_count); } write_char(current_char); current_char_count = 1; } previous_char = current_char; } if (current_char_count > 0) { write_count(current_char_count); }To me this this clearer indicates the main purpose of reading characters until end of file, and by switching the
if's around, I also clearer indicates that the main thingy is counting equal characters. And it is a clearer connection between theifand theelseso that it is easier to see why we entered theelseclause.I've removed the reference related to
MAX_COUNTas I consider it rather esoteric if you run this code on something with2^64repetitions of a single character. Another change is that I need to finish of the writing of the last count in an additionalwrite_count()at the end. Still I think this reads somewhat easier than your implementation.Optionally change brace style – This is totally a personal preference, and the main point is to keep bracing consistent, which you do! But I prefer having the opening brace on the same line in
C, as I feel it make the code somewhat easier to read and slightly more compact.Do however note that I still keep braces around one-line blocks, as you do.
PS! Using previous_char=-1 one avoids a bug when testing this with dd if=/dev/zero bs=1 count=65, as getchar() only returns unsigned char's or EOF. But previous_char is declared as an int, and as such using -1 as a start value, is safer than the original 0.
PS! Using previous_char=-1 one avoids a bug when testing this with dd if=/dev/zero bs=1 count=65, as getchar() only returns unsigned char's or EOF. But previous_char is declared as an int, and as such using -1 as a start value, is safer than the original 0.
#include <stdio.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
const int BIT_SHIFT = 7;
const int LOWER_BITS = 0x7f;
const int HIGH_BIT = 0x80;
bool G_print_hex = false;
void write_char(int c)
{
if (printf(G_print_hex ? "0x%x " : "%c", c) < 0) {
perror("Error printing to stdout.");
exit(EXIT_FAILURE);
}
}
void write_count(uint64_t count)
{
int lower_bits = count & LOWER_BITS;
while (lower_bits != count || count > LOWER_BITS) {
write_char(lower_bits | HIGH_BIT);
count >>= BIT_SHIFT;
lower_bits = count & LOWER_BITS;
}
write_char(lower_bits);
}
void parse_args(int argc, char **argv)
{
if (argc == 2) {
G_print_hex = (strcasecmp(argv[1], "-x") == 0 || strcasecmp(argv[1], "--hex") == 0);
} else if (argc != 1) {
printf("Invalid number of arguments passed: %d\n", argc - 1);
exit(EXIT_FAILURE);
}
}
int main(int argc, char** argv)
{
int current_char = 0;
int previous_char = 0;-1; // A non-legal char, so it triggers a change on first character begin chr(0)
uint64_t current_char_count = 0;
parse_args(argc, argv);
while ((current_char = getchar()) != EOF) {
if (current_char == previous_char) {
current_char_count += 1;
} else {
if (current_char_count > 0) {
write_count(current_char_count);
}
write_char(current_char);
current_char_count = 1;
}
previous_char = current_char;
}
if (current_char_count > 0) {
write_count(current_char_count);
}
}
#include <stdio.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
const int BIT_SHIFT = 7;
const int LOWER_BITS = 0x7f;
const int HIGH_BIT = 0x80;
bool G_print_hex = false;
void write_char(int c)
{
if (printf(G_print_hex ? "0x%x " : "%c", c) < 0) {
perror("Error printing to stdout.");
exit(EXIT_FAILURE);
}
}
void write_count(uint64_t count)
{
int lower_bits = count & LOWER_BITS;
while (lower_bits != count || count > LOWER_BITS) {
write_char(lower_bits | HIGH_BIT);
count >>= BIT_SHIFT;
lower_bits = count & LOWER_BITS;
}
write_char(lower_bits);
}
void parse_args(int argc, char **argv)
{
if (argc == 2) {
G_print_hex = (strcasecmp(argv[1], "-x") == 0 || strcasecmp(argv[1], "--hex") == 0);
} else if (argc != 1) {
printf("Invalid number of arguments passed: %d\n", argc - 1);
exit(EXIT_FAILURE);
}
}
int main(int argc, char** argv)
{
int current_char = 0;
int previous_char = 0;
uint64_t current_char_count = 0;
parse_args(argc, argv);
while ((current_char = getchar()) != EOF) {
if (current_char == previous_char) {
current_char_count += 1;
} else {
if (current_char_count > 0) {
write_count(current_char_count);
}
write_char(current_char);
current_char_count = 1;
}
previous_char = current_char;
}
if (current_char_count > 0) {
write_count(current_char_count);
}
}
#include <stdio.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
const int BIT_SHIFT = 7;
const int LOWER_BITS = 0x7f;
const int HIGH_BIT = 0x80;
bool G_print_hex = false;
void write_char(int c)
{
if (printf(G_print_hex ? "0x%x " : "%c", c) < 0) {
perror("Error printing to stdout.");
exit(EXIT_FAILURE);
}
}
void write_count(uint64_t count)
{
int lower_bits = count & LOWER_BITS;
while (lower_bits != count || count > LOWER_BITS) {
write_char(lower_bits | HIGH_BIT);
count >>= BIT_SHIFT;
lower_bits = count & LOWER_BITS;
}
write_char(lower_bits);
}
void parse_args(int argc, char **argv)
{
if (argc == 2) {
G_print_hex = (strcasecmp(argv[1], "-x") == 0 || strcasecmp(argv[1], "--hex") == 0);
} else if (argc != 1) {
printf("Invalid number of arguments passed: %d\n", argc - 1);
exit(EXIT_FAILURE);
}
}
int main(int argc, char** argv)
{
int current_char = 0;
int previous_char = -1; // A non-legal char, so it triggers a change on first character begin chr(0)
uint64_t current_char_count = 0;
parse_args(argc, argv);
while ((current_char = getchar()) != EOF) {
if (current_char == previous_char) {
current_char_count += 1;
} else {
if (current_char_count > 0) {
write_count(current_char_count);
}
write_char(current_char);
current_char_count = 1;
}
previous_char = current_char;
}
if (current_char_count > 0) {
write_count(current_char_count);
}
}