I want to use a regular expression that matches repeated patterns such as 'ATATAT' (of any length) and/or 'GCCGCCGCC' (again of any length) in a text file. I have four candidate patterns, and one of them should work, but I have tried each of them several times on a text file containing those repeats. Every one of the patterns below either returns nothing or fails with the error "grep: Invalid back reference". Maybe I shouldn't be using grep at all?
- [ATGC]{2,}
- ([ATGC]{2,})\1+
- ([ATGC]{2,}){2,}
- ([ATGC])\1+
Essentially, the command I am using is the following:
grep 'one_of_the_patterns_above' DNA_sequence_file.fasta
And the file looks something like this:
>sampled sequence 1 consisting of 500 bases.
GCAAAGTAGCCGAGGTCAGGGCATGTCAATGATAGCGCGAAAAGGTCACCACGAGAAGCG
GCACTCGGCCACGGATTGGTGGCACTTCATATGGAAACGCGACGACCGATAAAAACACAA
CGAAACCCAATTGGAATGAGATTTTCCTGAAACCGCAGCGAACCCAACCAAGCGGGAATA
AAGTCGGGAAGTCTAAACGAGATTAGCAGAATCCACCTCAGAATGACTGATGCCATGTAG
GCGCAGCAATAGATTACCGAAAGAGAAACACAGCAACGGATACATACAACTCAAGGGAAG
AGCACCTTTCGCTGAGAGGAGACGCCTTACAAACTATCCAGGGGTTTGAACAAGACAGGT
CGAAAAGCGGCCCTCTTCACAACCAGGTCAAGCGCGACTCGAGACAAGTATTCCCAAAGT
CCAAAAAAGAATCCTACAGAATCCCATCAAAGCATTTGTAGAAAGACATGGCCTACCAGC
TGCGCAAAGGACACATTACC
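
For comparison, here is a minimal sketch of how the second pattern could be run with extended regular expressions. It assumes GNU grep, where -E makes ( ), {2,} and + act as operators (in plain grep they are literal characters unless escaped, so \1 has no group to refer to) and -o prints only the matching part. The temporary file and its three short sequences are made up for illustration and are not taken from the file above.

```shell
# Hypothetical sample data: two lines with a repeat, one without.
demo=$(mktemp)
printf '>demo sequence\nCCATATATGG\nCAGCCGCCGCCTA\nACGTACGA\n' > "$demo"

# Drop FASTA header lines first, then search the sequence lines.
# -E: extended regex, so the group and the back-reference \1 are recognised.
# -o: print each match on its own line instead of the whole line.
matches=$(grep -v '^>' "$demo" | grep -oE '([ATGC]{2,})\1+')
printf '%s\n' "$matches"

rm -f "$demo"
```

Note that this only looks within a single line, so a repeat that is split across the 60-character line breaks of a FASTA file would not be reported as one match.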