I need some help understanding a function that I want to use, but I'm not entirely sure what some parts of it do. I understand that the function creates dictionaries from reads taken from a FASTA file. From what I understand, it is supposed to generate prefix and suffix dictionaries for ultimately extending contigs (overlapping DNA sequences). The code:

def makeSuffixDict(reads, lenSuffix = 20, verbose = True):
    lenKeys = len(reads[0]) - lenSuffix
    dict = {}
    multipleKeys = []
    i = 1
    for read in reads:
        if read[0:lenKeys] in dict:
            multipleKeys.append(read[0:lenKeys])
        else:
            dict[read[0:lenKeys]] = read[lenKeys:]
        if verbose:
            print("\rChecking suffix", i, "of", len(reads), end = "", flush = True)
            i += 1
    for key in set(multipleKeys):
        del(dict[key])
    if verbose:
        print("\nCreated", len(dict), "suffixes with length", lenSuffix, \
            "from", len(reads), "Reads. (", len(reads) - len(dict), \
            "unambigous)")
    return(dict) 

Additional Information: reads = readFasta("smallReads.fna", verbose = True)

This is how the function is called:

if __name__ == "__main__":
    reads = readFasta("smallReads.fna", verbose = True)
    suffixDicts = makeSuffixDicts(reads, 10)

The smallReads.fna file contains strings of bases (DNA):

> read 1

TTATGAATATTACGCAATGGACGTCCAAGGTACAGCGTATTTGTACGCTA

> read 2

AACTGCTATCTTTCTTGTCCACTCGAAAATCCATAACGTAGCCCATAACG

> read 3

TCAGTTATCCTATATACTGGATCCCGACTTTAATCGGCGTCGGAATTACT

Here are the parts I don't understand:

lenKeys = len(reads[0]) - lenSuffix

What does the value [0] mean? From what I understand, len returns the number of elements in a list. Why is reads automatically a list? Edit: it seems a FASTA file can be declared as a list. Can anybody confirm that?

if read[0:lenKeys] in dict:

Does this mean "from 0 to 'lenKeys'"? Still confused about the value. In another function there is a similar line: if read[-lenKeys:] in dict: What does the "-" do?

def makeSuffixDict(reads, lenSuffix = 20, verbose = True):

Here I don't understand the parameters: How can reads be a parameter? What is lenSuffix = 20 in the context of this function, other than a value subtracted from len(reads[0])? What is verbose? I have read about a "verbose mode" that ignores whitespace, but I have never seen it used as a parameter and later as a variable.

  • It seems clear that this makeSuffixDict function expects that reads is in fact a list (if you don't pass it a list, it won't work). Do you have documentation for this function that specifies its requirements? Commented Jun 1, 2015 at 22:07
  • Lots of questions in here and I'll answer a few: the brackets are slice notation, so read[:lenKeys] means "everything in read up to (but not including) index lenKeys". Similarly, read[-lenKeys:] is a slice with a negative start index, so it means "the last lenKeys items of read". Commented Jun 1, 2015 at 22:08
  • There is no documentation, Greg. This was made available along with the smallReads.fna file, for a programming course I guess. I'll edit in the content of the .fna file, which I just read can be declared as a list. It contains strings of bases (DNA). @a p: thanks, that clarifies that part. Commented Jun 1, 2015 at 22:16

1 Answer

The tone of your question makes me feel like you're confusing things like program features (len, functions, etc) with things that were defined by the original programmer (the type of reads, verbose, etc).

def some_function(these, are, arbitrary, parameters):
    pass

This function defines a bunch of parameters. They don't mean anything at all, other than the meaning I give them implicitly through their names. For example, if I do:

def reverse_string(s):
    pass

s is probably a string, right? In your example we have:

def makeSuffixDict(reads, lenSuffix = 20, verbose = True):
    lenKeys = len(reads[0]) - lenSuffix
    ...

From these two lines we can infer a few things:

  • the function will probably return a dictionary (from its name)
  • lenSuffix is an int, and verbose is a bool (from their default values)
  • reads can be indexed (string? list? tuple?)
  • the items inside reads have length (string? list? tuple?)

Since Python is dynamically typed, this is ALL WE CAN KNOW about the function so far. The rest would be explained by its documentation or the way it's called.

That said: let me cover all your questions in order:

  1. What does the value [0] mean?

some_object[0] grabs the first item in a container: [1,2,3][0] == 1 and "Hello, World!"[0] == "H". This is called indexing, and is governed by the __getitem__ magic method.

  2. From what I understand "len" returns the number of elements in a list.

len is a built-in function that returns the length of an object. It is governed by the __len__ magic method. len('abc') == 3, also len([1, 2, 3]) == 3. Note that len(['abc']) == 1, since it is measuring the length of the list, not the string inside it.
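As a rough illustration of both points, here is a minimal sketch using a hypothetical ReadCollection class (it is not part of the code in the question). It only shows that indexing and len() work on any object that defines the matching magic methods:

```python
class ReadCollection:
    """Hypothetical container, used only to illustrate the magic methods."""

    def __init__(self, reads):
        self._reads = list(reads)

    def __getitem__(self, index):
        # Called when you write obj[index]
        return self._reads[index]

    def __len__(self):
        # Called when you write len(obj)
        return len(self._reads)


reads = ReadCollection(["TTATGA", "AACTGC", "TCAGTT"])

first_read = reads[0]               # first item in the container
number_of_reads = len(reads)        # length of the container
first_read_length = len(reads[0])   # length of the string inside it

print(first_read, number_of_reads, first_read_length)
```

A plain list of strings behaves the same way, which is presumably what the function in the question receives.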

  3. Why is "reads" automatically a list?

reads is a parameter. It is whatever the calling scope passes to it. It does appear that it expects a list, but that's not a hard and fast rule!

  4. (various questions about slicing)

Slicing is some_container[start_idx : end_idx [: step_size]]. It does pretty much what you'd expect: "0123456"[0:3] == "012". Slice indexes are zero-based and lie between the elements, so [0:1] selects the same element as [0], except that a slice returns a sequence of the same type as the original rather than an individual element (so [1, 2, 3][0] == 1 but [1, 2, 3][0:1] == [1]; for strings, 'abc'[0] and 'abc'[0:1] are both 'a', because a one-character string is still a string). If you omit the start or end index, it is treated as the beginning or end of the sequence respectively. I won't go into step size here.

Negative indexes count from the back, so '0123456'[-3:] == '456'. Note that [-0] is not the last value, [-1] is. This is contrasted with [0] being the first value.
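A short sketch of the slicing forms mentioned above. The variable names are only for illustration:

```python
read = "0123456"

prefix = read[0:3]       # items at positions 0, 1 and 2
same_prefix = read[:3]   # omitted start means "from the beginning"
tail = read[3:]          # omitted end means "to the end"
last_three = read[-3:]   # negative start counts from the back
last_item = read[-1]     # negative index: the last item

# Indexing a list gives an element; slicing it gives a list.
numbers = [1, 2, 3]
one_item = numbers[0]
one_item_list = numbers[0:1]

print(prefix, same_prefix, tail, last_three, last_item)
print(one_item, one_item_list)
```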

  5. How can reads be a parameter?

Because the function is defined as makeSuffixDict(reads, ...). That's what a parameter is.

  6. What is lenSuffix = 20 in the context of this function

Looks like it's the length of the expected suffix!
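To make that concrete, here is a small sketch of how the function appears to split a single read, assuming read 1 from the question and the value 10 that is passed in the question's call:

```python
read = "TTATGAATATTACGCAATGGACGTCCAAGGTACAGCGTATTTGTACGCTA"  # read 1 from the question
lenSuffix = 10  # assumed: the value passed in the question's call

lenKeys = len(read) - lenSuffix   # everything that is not suffix
key = read[0:lenKeys]             # front part, used as the dictionary key
suffix = read[lenKeys:]           # last lenSuffix bases, stored as the value

print("key:   ", key)
print("suffix:", suffix)
```

So each dictionary entry would map the front part of a read to its last lenSuffix bases.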

  7. What is verbose?

verbose has no meaning on its own. It's just another parameter. It looks like the author included the verbose flag so you could get output while the function runs. Notice that the if verbose blocks don't affect the result; they just provide feedback to the user.
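The same pattern in a minimal sketch. count_bases is a hypothetical function written only for this illustration; it is not taken from the code in the question:

```python
def count_bases(reads, verbose=True):
    """Hypothetical example: the result is the same with or without verbose."""
    total = 0
    for number, read in enumerate(reads, start=1):
        total += len(read)
        if verbose:
            # Progress output only; it does not change the result.
            print("Processed read", number, "of", len(reads))
    return total


quiet_total = count_bases(["TTATGA", "AACTGC"], verbose=False)
loud_total = count_bases(["TTATGA", "AACTGC"])  # verbose defaults to True
```

Both calls return the same number; the second one just prints progress lines along the way.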

4 Comments

From your edit: note that the reads in reads = readFasta("smallReads.fna", verbose = True) is in your module scope, while the reads inside makeSuffixDict is in the function scope. They're different! Though I would hazard a guess that makeSuffixDict is called with that same reads variable.
Thanks! This explains a lot. I'm new to Python, so it's true that I confuse many things. I edited how the function is called into the question.
From what I understand this is supposed to generate pre- and suffix dictionaries for ultimately extending contigs (overlapping dna-sequences). Just added that to the initial post.
@grindbert yeah, I don't know anything about the subject. To me it looks like reads is a list of strings, and each string is supposed to be the same length, so lenKeys is calculated only once. Then read[:lenKeys] is everything before the suffix and read[lenKeys:] is the suffix.
