Skip to main content
Added explanation and reference.
Source Link
arshovon
  • 205
  • 2
  • 7

PythonProblem: Given two sequences, print the longest subsequence present in both of them. A subsequence is a sequence that appears in the same relative order, but not necessarily contiguous. For example, “abc”, “abg”, “bdf”, “aeg”, ‘”acefg”, .. etc are subsequences of “abcdefg”. So a string of length n has 2^n different possible subsequences.

Examples: LCS for input Sequences “ABCDGH” and “AEDFHR” is “ADH” of length 3 code:. LCS for input Sequences “AGGTAB” and “GXTXAYB” is “GTAB” of length 4.

Python 3 code:

def calculate_lcs_length(a,b):
    a_len = len(a)
    b_len = len(b)
    dp = []
    for i in range(a_len + 1):
        dp.append([0 for j in range(b_len + 1)])
    for i in range(1, a_len + 1):
        for j in range(1, b_len + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i-1][j], dp[i][j - 1])
    max_length = dp[a_len][b_len]
    return dp, max_length

def get_path(a, b, dp, i, j):
    if i == 0 or j == 0:
        return ""
    if a[i-1] == b[j-1]:
        return get_path(a, b, dp, i-1, j-1) + a[i-1]
    else:
        if dp[i-1][j] > dp[i][j-1]:
            return get_path(a, b, dp, i-1, j)
        else:
            return get_path(a, b, dp, i, j-1)
        
if __name__ == "__main__":
    a = "abcdef""ABCDGH"
    b = "aqbwc""AEDFHR"
    dp, max_length = calculate_lcs_length(a,b)
    lcs_str = get_path(a, b, dp, len(a), len(b))
    print(lcs_str)

Output: ADH

I wonder if I could use one single method (without using recursion) to get the length and string both.

Is this code can be more reader friendly. I am not asking for one liner or complex optimization improve.

Reference: Longest common subsequence problem, From Wikipedia

Python 3 code:

def calculate_lcs_length(a,b):
    a_len = len(a)
    b_len = len(b)
    dp = []
    for i in range(a_len + 1):
        dp.append([0 for j in range(b_len + 1)])
    for i in range(1, a_len + 1):
        for j in range(1, b_len + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i-1][j], dp[i][j - 1])
    max_length = dp[a_len][b_len]
    return dp, max_length

def get_path(a, b, dp, i, j):
    if i == 0 or j == 0:
        return ""
    if a[i-1] == b[j-1]:
        return get_path(a, b, dp, i-1, j-1) + a[i-1]
    else:
        if dp[i-1][j] > dp[i][j-1]:
            return get_path(a, b, dp, i-1, j)
        else:
            return get_path(a, b, dp, i, j-1)
        
if __name__ == "__main__":
    a = "abcdef"
    b = "aqbwc"
    dp, max_length = calculate_lcs_length(a,b)
    lcs_str = get_path(a, b, dp, len(a), len(b))
    print(lcs_str)

I wonder if I could use one single method (without using recursion) to get the length and string both.

Is this code can be more reader friendly. I am not asking for one liner or complex optimization improve.

Problem: Given two sequences, print the longest subsequence present in both of them. A subsequence is a sequence that appears in the same relative order, but not necessarily contiguous. For example, “abc”, “abg”, “bdf”, “aeg”, ‘”acefg”, .. etc are subsequences of “abcdefg”. So a string of length n has 2^n different possible subsequences.

Examples: LCS for input Sequences “ABCDGH” and “AEDFHR” is “ADH” of length 3. LCS for input Sequences “AGGTAB” and “GXTXAYB” is “GTAB” of length 4.

Python 3 code:

def calculate_lcs_length(a,b):
    a_len = len(a)
    b_len = len(b)
    dp = []
    for i in range(a_len + 1):
        dp.append([0 for j in range(b_len + 1)])
    for i in range(1, a_len + 1):
        for j in range(1, b_len + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i-1][j], dp[i][j - 1])
    max_length = dp[a_len][b_len]
    return dp, max_length

def get_path(a, b, dp, i, j):
    if i == 0 or j == 0:
        return ""
    if a[i-1] == b[j-1]:
        return get_path(a, b, dp, i-1, j-1) + a[i-1]
    else:
        if dp[i-1][j] > dp[i][j-1]:
            return get_path(a, b, dp, i-1, j)
        else:
            return get_path(a, b, dp, i, j-1)
        
if __name__ == "__main__":
    a = "ABCDGH"
    b = "AEDFHR"
    dp, max_length = calculate_lcs_length(a,b)
    lcs_str = get_path(a, b, dp, len(a), len(b))
    print(lcs_str)

Output: ADH

I wonder if I could use one single method (without using recursion) to get the length and string both.

Is this code can be more reader friendly. I am not asking for one liner or complex optimization improve.

Reference: Longest common subsequence problem, From Wikipedia

Source Link
arshovon
  • 205
  • 2
  • 7

Longest common subsequence length and backtracking the string

Python 3 code:

def calculate_lcs_length(a,b):
    a_len = len(a)
    b_len = len(b)
    dp = []
    for i in range(a_len + 1):
        dp.append([0 for j in range(b_len + 1)])
    for i in range(1, a_len + 1):
        for j in range(1, b_len + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i-1][j], dp[i][j - 1])
    max_length = dp[a_len][b_len]
    return dp, max_length

def get_path(a, b, dp, i, j):
    if i == 0 or j == 0:
        return ""
    if a[i-1] == b[j-1]:
        return get_path(a, b, dp, i-1, j-1) + a[i-1]
    else:
        if dp[i-1][j] > dp[i][j-1]:
            return get_path(a, b, dp, i-1, j)
        else:
            return get_path(a, b, dp, i, j-1)
        
if __name__ == "__main__":
    a = "abcdef"
    b = "aqbwc"
    dp, max_length = calculate_lcs_length(a,b)
    lcs_str = get_path(a, b, dp, len(a), len(b))
    print(lcs_str)

I wonder if I could use one single method (without using recursion) to get the length and string both.

Is this code can be more reader friendly. I am not asking for one liner or complex optimization improve.