I need to extract values from the text file below:

fdsjhgjhg
fdshkjhk
Start
Good Morning
Hello World
End
dashjkhjk
dsfjkhk

The values I need to extract are from Start to End.

with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            copy = True
        elif line.strip() == "End":
            copy = False
        elif copy:
            outfile.write(line)

The code above is from this question: Extract Values between two strings in a text file using python

This code does not include the strings "Start" and "End", only what is between them. How would you include the delimiter strings as well?

  • I would use a multiline regular expression for that; the code would also look much simpler. Commented Mar 2, 2016 at 21:36

3 Answers


@en_Knight has it almost right. Here's a fix to meet the OP's request that the delimiters ARE included in the output:

with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            copy = True
        if copy:
            outfile.write(line)
        # checked AFTER "if copy", so the End line itself is written
        if line.strip() == "End":
            copy = False

Or simply call write() in each branch it applies to:

with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            outfile.write(line) # add this
            copy = True
        elif line.strip() == "End":
            outfile.write(line) # add this
            copy = False
        elif copy:
            outfile.write(line)
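If the delimiters need to vary, the second variant can be wrapped in a small generator. The name extract_block and its start/end parameters are hypothetical, chosen for this sketch; like the loop above, it writes any "End" line it sees, even one outside a block.

```python
from typing import Iterable, Iterator


def extract_block(lines: Iterable[str], start: str = "Start",
                  end: str = "End") -> Iterator[str]:
    """Yield the lines of a start..end block, delimiters included.

    Hypothetical helper mirroring the second variant above.
    """
    copy = False
    for line in lines:
        marker = line.strip()
        if marker == start:
            yield line  # opening delimiter
            copy = True
        elif marker == end:
            yield line  # closing delimiter
            copy = False
        elif copy:
            yield line


# Sample input from the question, split into lines with their newlines kept.
sample = "fdsjhgjhg\nfdshkjhk\nStart\nGood Morning\nHello World\nEnd\ndashjkhjk\ndsfjkhk\n"
result = "".join(extract_block(sample.splitlines(keepends=True)))
print(result, end="")
```

With real files, a file object is already an iterable of lines, so outfile.writelines(extract_block(infile)) can replace the whole loop.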

Update: to answer the question in the comments ("only use the 1st occurrence of 'End' after 'Start'"), change the elif line.strip() == "End" branch to:

        elif line.strip() == "End" and copy:
            outfile.write(line) # add this
            copy = False

This works if there is only ONE "Start" but multiple "End" lines... which sounds odd, but that is what the questioner asked.
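To see what the "and copy" guard changes, here is a sketch that runs the elif-based loop with and without it. The input is invented for illustration: one "Start" plus stray "End" lines before and after the block. The guard parameter exists only in this sketch.

```python
import io

# Hypothetical input: one Start, with extra "End" lines outside the block.
sample = "End\nnoise\nStart\nGood Morning\nEnd\nmore noise\nEnd\n"


def copy_block(text: str, guard: bool) -> str:
    """Run the elif-based loop; `guard` toggles the `and copy` check on "End"."""
    outfile = io.StringIO()
    copy = False
    for line in io.StringIO(text):
        if line.strip() == "Start":
            outfile.write(line)
            copy = True
        # guard=False reproduces the unguarded branch; guard=True adds `and copy`.
        elif line.strip() == "End" and (copy or not guard):
            outfile.write(line)
            copy = False
        elif copy:
            outfile.write(line)
    return outfile.getvalue()


print(repr(copy_block(sample, guard=False)))  # stray "End" lines leak into the output
print(repr(copy_block(sample, guard=True)))   # only the Start..first End block
```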

3 Comments

That makes a lot of sense. Is it possible to be selective and end the copy only at the first occurrence of 'End' after 'Start'? My file contains a number of 'End' strings.
@Dan H, what if there is another Start after End? How do I prevent copying from that Start, and stop copying immediately?
@Catalina: options: 1) call exit() after you see "End"; 2) count the number of Starts you see, and only set copy to True if this is the first one.
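Option 1 can also be sketched with break instead of exit(): break ends only the loop, so the rest of the script keeps running, whereas exit() terminates the whole program. The two-block input and the first_block name are hypothetical.

```python
import io

# Hypothetical input containing two Start/End blocks.
sample = "x\nStart\nfirst\nEnd\ny\nStart\nsecond\nEnd\n"


def first_block(text: str) -> str:
    """Copy only the first Start..End block, delimiters included."""
    outfile = io.StringIO()
    copy = False
    for line in io.StringIO(text):
        if line.strip() == "Start":
            copy = True
        if copy:
            outfile.write(line)
        # An "End" seen before any "Start" is ignored because copy is False.
        if copy and line.strip() == "End":
            break  # stop at the first End after Start; later blocks are skipped
    return outfile.getvalue()


print(first_block(sample), end="")
```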

RegExp approach:

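import re

with open('input.txt') as f:
    data = f.read()

# re.M: ^ and $ match at every line start/end; re.S: . also matches newlines.
# The lazy .*? stops at the first "End" line after "Start".
match = re.search(r'^Start$.*?^End$', data, re.M | re.S)
if match:
    with open('output.txt', 'w') as f:
        f.write(match.group(0) + '\n')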
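To pull out every Start/End block instead of only the first, re.findall can be used with a line-anchored lazy pattern, ^Start$.*?^End$. This sketch works on a hypothetical in-memory string rather than input.txt.

```python
import re

# Hypothetical input with two blocks and some noise between them.
data = "noise\nStart\nGood Morning\nEnd\nmore\nStart\nHello World\nEnd\n"

# ^ and $ match at line boundaries (re.M); . also matches newlines (re.S).
# The lazy .*? ends each match at the nearest following "End" line.
blocks = re.findall(r'^Start$.*?^End$', data, re.M | re.S)
print(blocks)
```

Because the pattern has no capture groups, findall returns each whole match, delimiters included.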

2 Comments

This is probably the more robust solution, but for someone who was unclear on elif v if, maybe you could include some textual description?
This is better: (^Start[\s\S]+^End) (or (^Start[\s\S]+?^End) if there is more than one End...)
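The difference between the two patterns in the comment above is greedy versus lazy matching. This sketch assumes the re.M flag (so ^ matches at each line start), which the comment does not state, and uses a hypothetical input with two "End" lines.

```python
import re

data = "Start\nGood Morning\nEnd\nHello World\nEnd\n"

# [\s\S] matches any character, including newlines, without needing re.S.
greedy = re.search(r'(^Start[\s\S]+^End)', data, re.M)
lazy = re.search(r'(^Start[\s\S]+?^End)', data, re.M)

print(repr(greedy.group(1)))  # greedy +: runs to the last "End"
print(repr(lazy.group(1)))    # lazy +?: stops at the first "End"
```

Note that ^End without a trailing $ would also match a line such as "Ending".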

The "elif" means "do this only if the previous cases failed". It is equivalent to "else if" if you're coming from C or another C-like language. Without it, every if is checked on every line, so the fall-through takes care of including "Start" and "End":

with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            copy = True
        if copy: # flipped to include end, as Dan H pointed out
            outfile.write(line)
        if line.strip() == "End":
            copy = False
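To make the elif/if difference concrete, this sketch runs the question's elif chain and the fall-through version side by side on the question's sample input. The function names with_elif and with_if are made up for this example, and a list stands in for the output file.

```python
import io

sample = "fdsjhgjhg\nfdshkjhk\nStart\nGood Morning\nHello World\nEnd\ndashjkhjk\ndsfjkhk\n"


def with_elif(text: str) -> str:
    """Loop from the question: the delimiters are excluded."""
    out, copy = [], False
    for line in io.StringIO(text):
        if line.strip() == "Start":
            copy = True
        elif line.strip() == "End":
            copy = False
        elif copy:  # never reached on the Start/End lines themselves
            out.append(line)
    return "".join(out)


def with_if(text: str) -> str:
    """Fall-through version: every check runs on every line."""
    out, copy = [], False
    for line in io.StringIO(text):
        if line.strip() == "Start":
            copy = True
        if copy:
            out.append(line)
        if line.strip() == "End":
            copy = False
    return "".join(out)


print(repr(with_elif(sample)))
print(repr(with_if(sample)))
```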
