18

What would be the most efficient method of reading a text file into a dynamic one-dimensional array? reallocing after every read char seems silly, reallocing after every read line doesn't seem much better. I would like to read the entire file into the array. How would you do it?

1
  • I might have misunderstood what you want to do: do you want to just read the whole file into one big buffer, or do you want an array with an entry for each line? Commented Jan 4, 2009 at 17:05

3 Answers

26

I don't quite understand what you want. Do you want to process the file incrementally, reading one line, discarding it, and then moving on to the next? Or do you want to read the entire file into a buffer? If the latter, I think this is appropriate (in real code, check the return values of fopen and malloc for NULL, since the file may not exist and the allocation may fail):

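FILE *f = fopen("text.txt", "rb"); // binary mode: no newline translation
fseek(f, 0, SEEK_END);             // jump to the end of the file...
long pos = ftell(f);               // ...so the current offset is the size in bytes
fseek(f, 0, SEEK_SET);             // rewind before reading

char *bytes = malloc(pos);         // raw bytes, not NUL-terminated
fread(bytes, pos, 1, f);           // read the whole file as one item of pos bytes
fclose(f);

hexdump(bytes); // do some stuff with it
free(bytes); // free allocated memory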

11 Comments

Yes, that would apply to my case. I meant that using realloc after each read char seems very inefficient, similarly after every read \n (to extend the array).
You should open the file in binary mode - there might be problems otherwise (check eg. glibc manual, 12.17)
Hi, what is the difference between char *bytes = malloc(100*sizeof(char)); and char *bytes = malloc(100); as written above (assuming we use 100 instead of pos)? Second question: what if my file has 180205962 characters in it? Would the above way of reading the file still be efficient?
@asel, first question: sizeof(char) is defined to be 1, so there is no difference. Second question: no, you probably should read it incrementally (like, line-by-line, or some other piecewise method). Otherwise, your memory will quickly become exhausted.
Using fseek/ftell to get the file's size is insecure. See this CERT reference for why that is and how to do it securely: securecoding.cert.org/confluence/display/seccode/…
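For what it's worth, one way to sidestep the size question entirely is to read in chunks and double the buffer whenever it fills up, so a file of n bytes costs only about log2(n) realloc calls rather than one per character or per line. (As far as I know, the C standard does not require binary streams to support SEEK_END meaningfully, which is part of the concern with fseek/ftell.) Below is a minimal sketch, not a definitive implementation; the name read_all and the 4096-byte starting capacity are my own assumptions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything left in `f` into a malloc'd buffer.
 * Returns NULL on error; on success stores the byte count in *len.
 * The buffer is NUL-terminated for convenience; free() it when done. */
char *read_all(FILE *f, size_t *len)
{
    size_t cap = 4096;   /* initial capacity, an arbitrary choice */
    size_t used = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* Fill the free space, keeping one byte back for the terminator. */
        used += fread(buf + used, 1, cap - used - 1, f);
        if (used < cap - 1)
            break;       /* short read: end of file or read error */

        /* Buffer is full: double it, guarding against size_t overflow. */
        if (cap > (size_t)-1 / 2) {
            free(buf);
            return NULL;
        }
        cap *= 2;
        char *tmp = realloc(buf, cap);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
    }

    if (ferror(f)) {
        free(buf);
        return NULL;
    }
    buf[used] = '\0';
    *len = used;
    return buf;
}
```

Usage would be along the lines of size_t len; char *buf = read_all(f, &len); with f opened in "rb" mode. This also works on pipes and stdin, where seeking is not possible at all.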
12

If mmap(2) is available on your system, you can open the file and map it into memory. That way you don't have to allocate memory or even read the file yourself; the system does it for you. You can use the fseek() trick litb gave to get the size.

void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);

EDIT: Since you are working with a file descriptor here, you have to use lseek() to obtain the size of the file.

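int fd = open("filename", O_RDONLY);
off_t nbytes = lseek(fd, 0, SEEK_END); // end offset = file size; off_t rather than int so large files fit
void *content = mmap(NULL, nbytes, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0); // private mapping: writes never reach the file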

3 Comments

@saffsd you have enough rep to fix it, you know how it works here.
forgot about that, fixed and deleted comment.
A possibly more idiomatic way to get the file size is to use fstat(2) function: struct stat S; fstat(fd, &S);, then int nbytes = S.st_size is the file size in bytes, direct from the filesystem, without any reads of the file (this would doubtless get the same result as above; I mention it largely for completeness).
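Putting the mmap answer and the fstat suggestion together, a minimal sketch with error checking might look like the following. It assumes a POSIX system; the name map_file is hypothetical, and it maps read-only (PROT_READ) rather than PROT_READ | PROT_WRITE as in the answer, on the assumption that the contents only need to be read.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps `path` read-only and stores its size in *len.
 * Returns NULL on failure. Release with munmap(ptr, *len). */
void *map_file(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size <= 0) {
        close(fd);   /* stat failed, or empty file: mmap rejects length 0 */
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);       /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p;
}
```

Note that mmap reports failure with MAP_FAILED, not NULL, so that is the value to test for. The mapped bytes are not NUL-terminated, so treat them as a buffer of *len bytes rather than as a C string.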
1

If you want to use ISO C, use this function.

It's litb's answer, wrapped with some error handling...
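The function body is not shown here, so the following is only a sketch of what such a wrapper could look like, not the poster's actual code. It sticks to ISO C library calls, checks every return value, and adds a terminating NUL so the result can be used as a string; the name read_file and its signature are my own assumptions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: returns a malloc'd, NUL-terminated copy of the file,
 * or NULL on any failure. On success *len receives the byte count.
 * The caller free()s the result. */
char *read_file(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    char *buf = NULL;
    long size = 0;
    /* Each step runs only if the previous one succeeded. */
    if (fseek(f, 0, SEEK_END) == 0 &&
        (size = ftell(f)) >= 0 &&
        fseek(f, 0, SEEK_SET) == 0 &&
        (buf = malloc((size_t)size + 1)) != NULL) {
        if (fread(buf, 1, (size_t)size, f) == (size_t)size) {
            buf[size] = '\0';
            *len = (size_t)size;
        } else {
            free(buf);   /* short read: treat as failure */
            buf = NULL;
        }
    }

    fclose(f);
    return buf;
}
```

A call would look like size_t len; char *text = read_file("text.txt", &len); followed by a NULL check and, eventually, free(text).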
