I wrote a program that reads a pcap file and parses its HTTP traffic to build a dictionary of the HTTP headers for each request and response.
My code does the following:
- Uses tcpflow to reassemble the TCP segments
- Reads the files generated by tcpflow and checks whether each one contains HTTP traffic
- If a file contains HTTP traffic, reads it and generates a corresponding dictionary of the HTTP header fields
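To illustrate the second step, here is a minimal sketch (Python 3, not the code posted below) of deciding whether a reassembled stream is HTTP by looking at its first line rather than at the file name. The helper name `looks_like_http` and the `HTTP_METHODS` tuple are hypothetical, chosen only for this example, and the check is deliberately loose.

```python
# Hypothetical helper: classify a reassembled stream by its first line only.
# Assumption: the stream is already decoded to text and uses CRLF line endings.
HTTP_METHODS = ("GET", "HEAD", "POST", "PUT", "DELETE",
                "OPTIONS", "TRACE", "CONNECT", "PATCH")


def looks_like_http(content):
    """Return 'request', 'response', or None for non-HTTP content."""
    first_line = content.split("\r\n", 1)[0]
    parts = first_line.split(" ")
    # Status line: "HTTP/1.1 200 OK" (the reason phrase may contain spaces)
    if len(parts) >= 2 and parts[0].startswith("HTTP/"):
        return "response"
    # Request line: "GET /index.html HTTP/1.1"
    if (len(parts) == 3 and parts[0] in HTTP_METHODS
            and parts[2].startswith("HTTP/")):
        return "request"
    return None
```

A check like this would not depend on the port number appearing at a fixed position in the file name, at the cost of reading the start of every file.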
I tested my code with multiple test cases, but honestly I don't have much experience with Python, so could anyone review it for me, please?
import os
from os import listdir
from os.path import isfile, join
from StringIO import StringIO
import mimetools
def getFields(headers):
    fields={}
    i=1
    for header in headers:
        if len(header)==0:
           continue
        # skip lines that continue the previous header line
        if header.find(" ")==0 or \
           header.find("\t")==0:
           continue
        if len(header.split(":"))>=2:
           key = header.split(":")[0].strip()
           # if the key has multiple values such as cookie
           if fields.has_key(key):
              fields[key]=fields[key]+" "+header[header.find(":")+1:].strip()
           else:
              fields[key]=header[header.find(":")+1:].strip()
              while headers[i].find(" ")==0 or \
                    headers[i].find("\t")==0 :
                    fields[key]=fields[key]+" "+headers[i].strip()
                    i=i+1
              # end of the while loop
          # end of the else
        else:
             # else for [if len(header.split(":"))>=2: ]
             print "ERROR: RFC VIOLATION"
      # end of the for loop
    return fields            
def main():
    # run "cd /home/user/Desktop/empty-dir" in the terminal first, so that tcpflow writes its output files there
    os.system("tcpflow -r /home/user/Desktop/12.pcap -v")
    for f in listdir("/home/user/Desktop/empty-dir"):
        if f.find("80")==19 or f.find("80")==41:
           with open(join("/home/user/Desktop/empty-dir", f)) as fh:
                fields={}
                content=fh.read()  #to test you could replace it with content="any    custom http header"
                if content.find("\r\n\r\n")==-1:
                   print "ERROR: RFC VIOLATION"
                   return
                headerSection=content.split("\r\n\r\n")[0]
                headerLines=headerSection.split("\r\n")
                firstLine=headerLines[0]
                firstLineFields=firstLine.split(" ")            
                if len(headerLines)>1:
                   fields=getFields(headerLines[1:])
                if len(firstLineFields)>=3:                     
                   if firstLine.find("HTTP")==0:
                      fields["Version"]=firstLineFields[0]
                      fields["Status-code"]=firstLineFields[1]
                      fields["Status-desc"]=" ".join(firstLineFields[2:])
                   else:
                      fields["Method"]=firstLineFields[0]
                      fields["URL"]=firstLineFields[1]
                      fields["Version"]=firstLineFields[2]
                else:
                  print "ERROR: RFC VIOLATION"
                  continue 
                print fields
                print "__________________"
    return 0
if __name__ == '__main__':
 main()
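For comparison, the header-parsing step could also be sketched without a separate index, by remembering the last field name seen and appending continuation lines to it. This is only an illustration in Python 3 under the same conventions as getFields (repeated fields joined with a space); the name `parse_header_lines` is hypothetical.

```python
def parse_header_lines(lines):
    """Parse header lines (start line excluded) into a dict.

    Lines starting with a space or tab are treated as continuations of the
    previous field; repeated field names are joined with a single space.
    """
    fields = {}
    last_key = None
    for line in lines:
        if not line:
            continue
        if line[0] in " \t":
            if last_key is None:
                raise ValueError("continuation line without a preceding header")
            fields[last_key] += " " + line.strip()
            continue
        # Split on the first colon only, so values such as dates stay intact
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError("malformed header line: %r" % line)
        last_key = key.strip()
        value = value.strip()
        if last_key in fields:
            fields[last_key] += " " + value
        else:
            fields[last_key] = value
    return fields
```

It could be called with the lines after the start line, for example `parse_header_lines(headerSection.split("\r\n")[1:])`.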

main() function, this is not C, just put everything that your main() function does under the if __name__ == '__main__': it works like C.
main() function is not entirely a bad idea, I think.
a = 2; b = 3; c = a + b; if I won't need a and b anymore, just need c to be equal to 5...