I'm trying to parse generated files into a list of objects.

Unfortunately, the structure of the generated files is not always the same, but every file contains the same fields (along with a lot of other garbage).

For example:

    function foo();              # Don't Care
    function maybeanotherfoo();  # Don't Care
    int maybemoregarbage;        # Don't Care

    
    product_serial = "CDE1102"; # I want this <---------------------
    unnecessary_info1 = 10;     # Don't Care
    unnecessary_info2 = "red"   # Don't Care
    product_id = 1134412;       # I want this <---------------------
    unnecessary_info3 = "88"    # Don't Care

    product_serial = "DD1232";  # I want this <---------------------
    product_id = 3345111;       # I want this <---------------------
    unnecessary_info1 = "22"    # Don't Care
    unnecessary_info2 = "panda" # Don't Care

    product_serial = "CDE1102"; # I want this <---------------------
    unnecessary_info1 = 10;     # Don't Care
    unnecessary_info2 = "red"   # Don't Care
    unnecessary_info3 = "bear"  # Don't Care
    unnecessary_info4 = 119     # Don't Care
    product_id = 1112331;       # I want this <---------------------
    unnecessary_info5 = "jj"    # Don't Care

I want a list of objects, each holding a serial and an id.

I have tried the following:


import re

class Product:
    def __init__(self, id, serial):
        self.product_id = id
        self.product_serial = serial

linenum = 0
first_string = "product_serial"
second_string = "product_id"
with open('products.txt', "r") as products_file:
    for line in products_file:
        linenum += 1
        if line.find(first_string) != -1:
            product_serial = re.search('\"([^"]+)', line).group(1)
            #How do I proceed?                


Any advice would be greatly appreciated! Thanks!

  • So what does your code do? Does it work? Are there errors? If so, what are they? Commented Sep 8, 2020 at 17:49
  • My code can find the first product_serial (CDE1102). But how can I then find the product_id and then continue parsing from that point on? Commented Sep 8, 2020 at 17:50
  • Please repeat on topic and how to ask from the intro tour. “Show me how to solve this coding problem” is not a Stack Overflow issue. You have to make an honest attempt, and then ask a specific question about your algorithm or technique. "Any advice" is far too broad for Stack Overflow. There are many tutorials that show you how to read a file, how to process string data, etc. You should be able to identify a constant string in the input and to separate input lines. Commented Sep 8, 2020 at 17:52

1 Answer

I've inlined the data here using an io.StringIO(), but you can substitute your opened products_file for data.

The idea is that we gather key/values into current_object, and as soon as we have all the data we know we need for a single object (the two keys), we push it onto a list of objects and prime a new current_object.

You could use something like if line.startswith('product_serial') instead of the admittedly complex regexp.

import io
import re

data = io.StringIO("""
    function foo();             
    function maybeanotherfoo(); 
    int maybemoregarbage;       

    
    product_serial = "CDE1102"; 
    unnecessary_info1 = 10;     
    unnecessary_info2 = "red"   
    product_id = 1134412;       
    unnecessary_info3 = "88"    

    product_serial = "DD1232";  
    product_id = 3345111;       
    unnecessary_info1 = "22"    
    unnecessary_info2 = "panda" 

    product_serial = "CDE1102"; 
    unnecessary_info1 = 10;     
    unnecessary_info2 = "red"   
    unnecessary_info3 = "bear"  
    unnecessary_info4 = 119     
    product_id = 1112331;       
    unnecessary_info5 = "jj"    
""")

objects = []

current_object = {}
for line in data:
    line = line.strip()  # Remove leading and trailing whitespace
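    # Match `product_id = 123;` or `product_serial = "ABC";` (the trailing semicolon is optional).
    # Group 1 is the key; group 2 is the raw value, still quoted if it is a string.
    m = re.match(r'^(product_id|product_serial)\s*=\s*(\d+|"(?:.+?)");?$', line)

    if m:
        key, value = m.groups()
        # Drop the surrounding quotes; note that product_id is kept as a string here
        current_object[key] = value.strip('"')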
        if len(current_object) == 2:  # Got the two keys we want, ship the object
            objects.append(current_object)
            current_object = {}

print(objects)
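
If you would rather avoid the regexp and end up with objects instead of dicts, the startswith idea mentioned above could look roughly like the sketch below. A few things here are my assumptions rather than anything from the question: the Product dataclass stands in for the question's Product class, parse_products is a made-up helper name, everything after a '#' on a line is treated as a comment (so a serial containing '#' would break it), and each product_serial is assumed to be paired with exactly one product_id.

```python
import io
from dataclasses import dataclass

WANTED_KEYS = ("product_serial", "product_id")


@dataclass
class Product:
    # Stand-in for the Product class from the question (assumption)
    product_id: int
    product_serial: str


def parse_products(lines):
    """Collect one Product per product_serial/product_id pair, in file order."""
    products = []
    current = {}
    for line in lines:
        # Assumption: anything after '#' is a comment, never part of a value
        line = line.split("#", 1)[0].strip()
        if not line.startswith(WANTED_KEYS):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or key not in WANTED_KEYS:
            continue  # no '=' on the line, or a longer name such as product_id_old
        # Drop the optional trailing ';' and the quotes around string values
        current[key] = value.strip().rstrip(";").strip().strip('"')
        if len(current) == 2:  # both keys seen, so the object is complete
            products.append(
                Product(int(current["product_id"]), current["product_serial"])
            )
            current = {}
    return products


sample = io.StringIO("""
    function foo();              # Don't Care
    product_serial = "CDE1102"; # I want this
    unnecessary_info1 = 10;     # Don't Care
    product_id = 1134412;       # I want this

    product_serial = "DD1232";  # I want this
    product_id = 3345111;       # I want this
    unnecessary_info2 = "panda" # Don't Care
""")

products = parse_products(sample)
print(products)
```

To read the real file, pass the open file object instead of sample, for example parse_products(products_file) inside the with block from the question. Like the regexp version, this relies on both keys showing up for every product; if one can be missing, the pairs will drift out of step and you would need some other marker for where a product starts.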