1

I am having a problem with files that contain a huge amount of data, and I need to skip some processing for those files. I read each file's data into a variable. Now I need to get the size of that variable in bytes and, if it is greater than 102400, print a message.

Update: I cannot open the files directly, since they are inside a tar file. The content is already being copied into a variable called 'data', and I am able to print the contents of that variable. I just need to check whether it holds more than 102400 bytes.

thanks

2
  • 2
    If this is a python question, why have you tagged it as C? Commented Jan 7, 2010 at 12:47
  • I suspect he tried to tag it as wc -c Commented Jan 7, 2010 at 13:05

5 Answers

6
import os
length_in_bytes = os.stat('file.txt').st_size
if length_in_bytes > 102400:
    print("It's a big file!")

Update to work on files in a tarfile

import tarfile
tf = tarfile.open('foo.tar')
# getmembers() is a method of the opened archive, not of the tarfile module
for member in tf.getmembers():
    if member.size > 102400:
        print("It's a big file in a tarfile - the file is called %s!" % member.name)
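
For a version that can be run without an existing archive, here is a minimal sketch that builds a small tar in memory and then skips oversized members by looking only at the size recorded in each header. The names `MAX_BYTES`, `build_demo_tar`, and `small_members`, as well as the member names, are hypothetical and only for illustration; the code uses Python 3 syntax.

```python
import io
import tarfile

MAX_BYTES = 102400  # threshold taken from the question


def build_demo_tar():
    """Build an in-memory tar with one small and one large member (demo data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        for name, size in (("small.bin", 10), ("big.bin", MAX_BYTES + 1)):
            info = tarfile.TarInfo(name=name)
            info.size = size
            tf.addfile(info, io.BytesIO(b"x" * size))
    buf.seek(0)
    return buf


def small_members(fileobj, limit=MAX_BYTES):
    """Return {name: data} for regular files no larger than `limit` bytes."""
    result = {}
    with tarfile.open(fileobj=fileobj, mode="r") as tf:
        for member in tf.getmembers():
            if not member.isfile():
                continue  # directories, links, etc. carry no data to read
            if member.size > limit:
                # Decided from the header alone; the contents are never read.
                print("Skipping %s (%d bytes)" % (member.name, member.size))
                continue
            result[member.name] = tf.extractfile(member).read()
    return result


kept = small_members(build_demo_tar())
```

Only `small.bin` ends up in `kept`; the large member is rejected before any of its data is read.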

2 Comments

@randeepsp I've updated the answer to show an example that works with tarfiles.
This is better than checking len(data) because it entirely skips reading the data when it's big.
2

Just check the length of the string, then:

if len(data) > 102400:
    print("Skipping file which is too large, at %d bytes" % len(data))
else:
    process(data)  # The normal processing


2

If I'm understanding the question correctly, you want to skip certain input files if they're too large. For that, you can use os.path.getsize():

import os.path
if os.path.getsize('f') <= 102400:
    doit()


1

len(data) gives you the size in bytes if data is binary data (a byte string). For Unicode text, len() counts characters, and the size in bytes depends on the encoding used.
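
To make the distinction concrete, here is a short sketch in Python 3 syntax, where text is `str` and binary data is `bytes`. The sample string is made up for illustration.

```python
text = u"na\u00efve"  # five characters, one of them non-ASCII

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

print(len(text))   # 5: number of characters
print(len(utf8))   # 6: the non-ASCII character takes two bytes in UTF-8
print(len(utf16))  # 10: two bytes per character in UTF-16 (no BOM)
```

If `data` comes straight out of a tar member it is already binary, so `len(data)` is the byte count and no encoding step is needed.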


0

This answer seems irrelevant, since I seem to have misunderstood the question, which has now been clarified. However, should someone find this question, while searching with pretty much the same terms, this answer may still be relevant:

Just open the file in binary mode

f = open(filename, 'rb')

Then read or skip ahead as far as needed and print the next byte(s). I used the same method to 'fix' the n-th byte in a zillion images once.
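
A rough sketch of that approach, in Python 3 syntax: seek to an offset, read one byte, and optionally overwrite it in place. The helper names `read_byte_at` and `write_byte_at` and the throwaway demo file are hypothetical, not part of the original answer.

```python
import os
import tempfile


def read_byte_at(path, offset):
    """Return the single byte at `offset` as a bytes object (empty past EOF)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(1)


def write_byte_at(path, offset, value):
    """Overwrite the byte at `offset` in place with the 1-byte `value`."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(value)


# Demo on a throwaway file containing b"abcdef"
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"abcdef")

before = read_byte_at(demo_path, 3)  # b"d"
write_byte_at(demo_path, 3, b"X")
after = read_byte_at(demo_path, 3)   # b"X"
os.remove(demo_path)
```

Opening with `"r+b"` rather than `"wb"` matters here, since `"wb"` would truncate the file before the byte is patched.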

