1. Comments on your codeReview
- Your function
knapsacklacks a docstring that would explain what arguments the function takes (what kind of things are initems? mustitemsbe a sequence, or can it be an iterable?) and what it returns.
Also, this kind of function is ideal for doctests.
"""
Solve the knapsack problem by finding the most valuable
subsequence of `items` that weighs no more than `maxweight`.
`items` is a sequence of pairs `(value, weight)`, where `value` is
a number and `weight` is a non-negative integer.
`maxweight` is a non-negative integer.
Return a pair whose first element is the sum of values in the most
valuable subsequence, and whose second element is the subsequence.
>>> items = [(4, 12), (2, 1), (6, 4), (1, 1), (2, 2)]
>>> knapsack(items, 15)
(11, [(2, 1), (6, 4), (1, 1), (2, 2)])
"""
YourThe function
knapsacklacks a docstring that would explain what arguments the function takes (what kind of things are initems? mustitemsbe a sequence, or can it be an iterable?) and what it returns.This kind of function is ideal for doctests.
The comments say things like "Create an (N+1) by (W+1) 2-d list". But what is N and what is W? Presumably N is
len(items)and W ismaxweight, but this seems needlessly unclear. Better to putso it would be a couple of lines like thisgood idea to use the same names in the comments and the code:N = len(items) W = maxweight
so that the comments match the code (and then use N and W in the remainder of the code).
The comment above
bestvaluesfails to explain what the values in this table actually meanmean. I would write something like this instead:# bestvalues[i][j] is the best sum of values for any # subsequence of the first i items, whose weights sum # to no more than j.
(This This makes it obvious why 0 ≤ i ≤ N\$0 ≤ i ≤ N\$ and why 0 ≤ j ≤ W\$0 ≤ j ≤ W\$.)
In a loop like:
bestvalues = [[0] * (maxweight + 1) for i in xrange(len(items) + 1)]
You can simplify the code by omitting theseThese lines could be omitted:
# Increment i, because the first row (0) is the case where no items # are chosen, and is already initialized as 0, so we're skipping it i += 1
and then usingin the rest of the code, use i + 1 instead of i and i instead of i - 1.
YourThe reconstruction loop:
i = N while i > 0: # code i -= 1
for i in xrangereversed(Nrange(1, 0,N -+ 1)):
# code
You printThe code prints an error message like this:
print('usage: knapsack.py [file]')
Error messages ought to go to standard error (not standard output). And you can't know that yourit's a good idea not to hard-code the name of the program is called "knapsack.py":, because it might have been renamedwould be easy to rename the program but forget to update the code. So write instead:
sys.stderr.write('usage"usage: {0} [file]\n'[file]\n".format(sys.argv[0]))
YourThe block of code that reads the problem description and prints the result only runs when
__name__ == '__main__'. This makes it hard to test, for example from the interactive interpreter. It's usually best to put this kind of code in its own function, like this:def main(filename): with open(filename) as f: # etc. if __name__ == '__main__': if len(sys.argv) != 2: print('usage: knapsack.py [file]') sys.exit(1) main(sys.argv[1])
You readThe code reads the whole of the file into memory as a list of lines:
lines = f.readlines()
this is harmless here because the file is small, but it's a bad habit to get into. It's usually bestbetter to process a file one line at a time if you can, like this:
with open(filename) as f:
maxweight = int(next(f))
items = [map[[int(int,word) for word in line.split())] for line in f]
(Note that this results in slightly simpler code too.)
2. A more Pythonic solution?Recursive approach
Any dynamic programming algorithm can be implemented in two ways: by building a table of partial results from the bottom up (as in yourthe code in the post), or by recursively computing the result from the top down, using memoization to avoid computing any partial result more than once.
There are two advantages of theThe top-down approach: first, it often results in slightly simpler and clearer code, and second, it only computes the partial results that are needed for the particular problem instance (whereas in the bottom-up approach computesit's often hard to avoid computing all partial results even if some of them go unused).
So we could use the @memoized decorator from the Python Decorator Library@functools.lru_cache to implement a top-down solution, as shown below. (The Python wiki seems to be down at the moment, but you can find the code on archive.org.)like this:
from functools import lru_cache
def knapsack(items, maxweight):
"""
Solve"""Solve the knapsack problem by finding the most valuable
subsequence
subsequence of `items` subjectitems that weighs no more than
`maxweight`maxweight.
`items`items ismust be a sequence of pairs `(value, weight)`, where `value`value is
a
a number and `weight`weight is a non-negative integer.
`maxweight`maxweight is a non-negative integer.
Return a pair whose first element is the sum of values in the most
valuable subsequence, and whose second element is the subsequence.
>>> items = [(4, 12), (2, 1), (6, 4), (1, 1), (2, 2)]
>>> knapsack(items, 15)
(11, [(2, 1), (6, 4), (1, 1), (2, 2)])
"""
@lru_cache(maxsize=None)
def bestvalue(i, j):
# Return the value of the most valuable subsequence of the first i
# i elements in items whose weights sum to no more than j.
@memoized
def bestvalue(i, j):
if ij ==< 0: return 0
value, weight = items[i -return 1]float('-inf')
if weighti >== j0:
return bestvalue(i - 1, j)0
else:
value, weight = items[i - 1]
return max(bestvalue(i - 1, j),
bestvalue(i - 1, j - weight) + value)
j = maxweight
result = []
for i in xrangereversed(range(len(items), 0, -1)):
if bestvalue(i + 1, j) != bestvalue(i - 1, j):
result.append(items[i - 1]items[i])
j -= items[i - 1][1]items[i][1]
result.reverse()
return bestvalue(len(items), maxweight), result
To see how many partial solutions this code needs to compute, print len(bestvalue.cachecache_info() just before returning the result. When solving the example problem in the docstring, I find that this computes 37 partial solutions (compared tooutputs:
CacheInfo(hits=17, misses=42, maxsize=None, currsize=42)
The 42 entries in the cache are fewer than the 96 partial solutions computed by the bottom up approach).