I am working with short strings (DNA sequences) about 30 bases long. For my purposes, every 5th position needs to be replaced by each of the 4 DNA bases (A, C, T, G).
e.g. if I have an input of AAAAAAAAAAAAAA
the output would be a list of:
AAAAAAAAAAAAAA
AAAACAAAAAAAAA
AAAATAAAAAAAAA
AAAAGAAAAAAAAA
AAAACAAAACAAAA
AAAACAAAATAAAA
....
That is, every 5th position is independently swapped for A, C, T or G, to generate a list of all possible sequences in which each 5th position takes each of the four bases. For a sequence with k such positions, that is 4^k sequences.
I have been attempting this with for loops and can edit each 5th position, but only one position at a time rather than in all combinations,
e.g.
echo "AAAAAAAAAAAAAAA" > one.spacer
for i in $(seq 1 3)
do
    for base in a c t g
    do
        awk -v b="$base" -v x="$i" '{print substr($0,1,5*x-1) b substr($0,5*x+1,100)}' one.spacer
    done
done
gives the output:
AAAAaAAAAAAAAAA
AAAAcAAAAAAAAAA
AAAAtAAAAAAAAAA
AAAAgAAAAAAAAAA
AAAAAAAAAaAAAAA
AAAAAAAAAcAAAAA
AAAAAAAAAtAAAAA
AAAAAAAAAgAAAAA
AAAAAAAAAAAAAAa
AAAAAAAAAAAAAAc
AAAAAAAAAAAAAAt
AAAAAAAAAAAAAAg
As you can see, this edits only one 5th position at a time. I need a list of sequences that also includes, for example
AAAAgAAAAgAAAAg
AAAAcAAAAtAAAAa
as well as all the other combinations. Hopefully that's a little clearer.
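
To make the target concrete, here is a minimal sketch of the combinatorial behaviour described above, not a definitive solution. It assumes bash and a POSIX awk; the function name expand_every_nth and its step argument are hypothetical, made up for illustration. Instead of printing after each single substitution, it keeps a growing list of partial results and expands every entry by all four bases at each 5th position. Lowercase bases are used, as in the attempt above, so the substituted positions stay visible.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read sequences on stdin, print every combination
# obtained by replacing each step-th position with a, c, t or g.
expand_every_nth() {
    local step=${1:-5}   # assumed default: every 5th position
    awk -v step="$step" '
    BEGIN { nb = split("a c t g", bases, " ") }
    {
        # Start with the input line as the only partial result.
        n = 1
        seqs[1] = $0
        for (pos = step; pos <= length($0); pos += step) {
            # Expand every partial result by all four bases at this position.
            m = 0
            for (i = 1; i <= n; i++)
                for (j = 1; j <= nb; j++)
                    expanded[++m] = substr(seqs[i], 1, pos - 1) bases[j] substr(seqs[i], pos + 1)
            for (i = 1; i <= m; i++)
                seqs[i] = expanded[i]
            n = m
        }
        for (i = 1; i <= n; i++)
            print seqs[i]
    }'
}
```

It could be called as: echo "AAAAAAAAAAAAAAA" | expand_every_nth 5. For that 15-base input there are three substituted positions, so the list has 4^3 = 64 sequences; a 30-base sequence with six such positions gives 4^6 = 4096. Changing the string passed to split to "A C T G" would produce uppercase output instead.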