Divide a set into subset with some constraints

Question

Problem:

Given a list L consisting of not necessarily unique non-negative integers, find the minimum number of subset with maximum sum k (given) in which it can be divided such that:

each subset shouldn't contain consecutive elements from the list*.

(*): I can see an obvious ambiguity:
Q: If there is 3 4, is the restriction is on all 3 or just the adjacent three?

A: Just on adjacent 3.

Examples:


1. L = [2, 1, 1, 2], k = 2 => 4 

Explanation: First set can take 0th index element
             second set can take 1st index element and similarly third and fourth can take 2nd and 3rd index element respectively

Here, you can see that k is working as a bottleneck

More examples:

L = [2, 4, 1, 0] k = 8 => 2

L = [3, 2, 3, 0, 0] k = 5 => 3

This seems to have some features of dynamic programming problems, so I went for dynamic programming and wrote the following recursive code:

from functools import lru_cache
def min_count(L, k):
    '''
    index - stores the current state's index
    sets - this stores information of already stored sets:
            It is tuple of tuple, 
                In each tuple:
                    First element - index of last stored element
                    Second element - stores the sum of current set
    k - It is limit|capacity of each subset(same as the one given in arugment)

    '''
    @lru_cache(maxsize=None)
    def recurse(index, sets, k):
        #If control reaches to the end of the list
        if index == len(L):
            return len(sets)

        #In case the current indexes is zero
        if L[index] == 0:

            #If this is the last index, then
            if index + 1 == len(L):
                return len(sets)
            return recurse(index + 1, sets, k)


        '''

        The job of this for loop is iterate all of the stored sets:
            And if that's capable of storing further, then add current index
            to it, and recurse
        '''
        m = 1e10
        for i, j in sets:
            if i - index < -1 and j + L[index] <= k:
                if index + 1 == len(sets):
                    return len(sets)
                m = min(m, recurse(index + 1,tuple(_ if _ != (i, j) else (index, j + L[index]) for _ in sets) , k))
        


        m = min(m, recurse(index + 1, sets + ((index, L[index]), ), k))
        return m


    return recurse(0, (), k)

This code works, but the problem is that it is very slow; the expected time complexity must be lower than \$O(n^2)\$.

How can I make it fast?

when would more than two subsets be required? Taking every other element should be non consecutive — Ted Brownlow
– Ted Brownlow, Commented Jan 7, 2021 at 16:55

toolic · Accepted Answer · 2025-04-17 11:28:37Z

Here are some minor coding style suggestions. They are not likely to improve performance.

Documentation

Add a docstring for the min_count function to describe its purpose, its input types and return type.

Layout

Add a blank line between the import statement and the function.

The placement of the first docstring in your code is misleading because docstrings typically come after the function def line. Here is better placement:

from functools import lru_cache

def min_count(L, k):
    '''
    Given a list L consisting of not necessarily unique non-negative integers,
    find the minimum number of subset with maximum sum "k" 
    in which it can be divided such that: 
    each subset shouldn't contain consecutive elements from the list.
    '''

    @lru_cache(maxsize=None)
    def recurse(index, sets, k):
        '''
        index - stores the current state's index
        sets - this stores information of already stored sets:
                It is tuple of tuple, 
                    In each tuple:
                        First element - index of last stored element
                        Second element - stores the sum of current set
        k - It is limit|capacity of each subset(same as the one given in arugment)

        '''

There is a typo in the docstring: arugment

This long line is hard to read and understand:

m = min(m, recurse(index + 1,tuple(_ if _ != (i, j) else (index, j + L[index]) for _ in sets) , k))

The black program can be used to automatically reformat it across several lines to more clearly show its structure.

DRY

This expression is used repeatedly:

len(L)

You should set it to a variable:

list_len = len(L)

The same is true for len(sets).

Naming

The code would be easier to understand with more meaningful names for some of the variables. For example, j is typically used as an index, but it looks like you are using it for a sum.

m does not convey much meaning. If it is used for a maximum of some sort, then having "max" in the name would be helpful.

Reinderien · Accepted Answer · 2025-04-20 19:09:50Z

Your implementation has extremely severe superlinear time blowup. Comparing your implementation to one I'll describe momentarily, here are some rudimentary benchmark comparisons:

 n, old_ms, new_ms
 2,    0.0,    4.4
 3,    0.0,    4.3
 4,    0.0,    4.7
 5,    0.1,    5.2
 6,    0.1,    5.8
 7,    0.4,    6.9
 8,    2.0,    8.2
 9,    9.7,   10.0
10,   57.1,   13.5
11,  336.5,   17.4
12, 2081.5,   20.2
13,19222.3,   24.8

If we were to believe these figures blindly and assume polynomial time, then the equivalent exponent across the last two times would be

$$\frac {\log (19/2.2)} {\log (13/12)} \approx 27$$

but of course the formal time complexity is not \$O(n^{27})\$. For a semi-educated guess, it may be \$O(n^{n/2})\$. I'm not going to put more effort into figuring out what it is in reality.

The alternative method I propose has a little more setup time but gentler time complexity. (How much exactly is "complicated" due to being LP.)

About the alternative implementation:

Frame it as an integer linear program (my frequent hammer for this kind of nail)
The problem size is \$ (3n + n(n - 1) ) \times ( n + n^2) \$
The problem is fairly sparse; all four of the constraints require Kronecker products:
- Exclusive element-group assignment
- Group sum limit
- Correlation of group assignment and element assignment
- Consecutive element prevention

This demonstration shows how to construct the sparse constraint matrices by hand; other high-level libraries will be able to make it a little more intuitive.

import functools
import time
import typing

import numpy as np
import scipy.sparse as sp
from scipy.optimize import milp, Bounds, LinearConstraint


def op_min_count(L, k): ...


def min_subsets(
    inputs: typing.Sequence[int],  # i.e. 'L'
    sum_limit: int,                # i.e. 'k'
) -> int:
    n = len(inputs)

    # Variables:
    # n binary group enables
    # n*n binary element-group assignments, slow axis is group, fast axis is element
    minimize_group_count = np.zeros(n + n*n)
    minimize_group_count[:n] = 1

    # Elements must be assigned to exactly one group
    group_zeros = sp.csc_array((n, n))
    eye_n = sp.eye_array(n, n)
    h_ones = np.ones((1, n))
    exclusive_assignment = LinearConstraint(
        A=sp.hstack(
            (group_zeros, sp.kron(h_ones, eye_n)),
            format='csc',
        ),
        lb=1, ub=1,
    )

    # Limit group sums
    input_array = np.asarray(inputs)[np.newaxis, :]
    group_sums = LinearConstraint(
        A=sp.hstack(
            (group_zeros, sp.kron(eye_n, input_array)),
            format='csc',
        ),
        ub=sum_limit,
    )

    # Correlate element assignment and group assignment
    # element <= group: 0 <= group - element
    assignment_correlation = LinearConstraint(
        A=sp.hstack(
            (
                n*eye_n,
                sp.kron(eye_n, -h_ones),
            ), format='csc',
        ),
        lb=0,
    )

    # Consecutive inputs elements may not be assigned to the same output group
    # x_i + x_{i+1} <= 1
    no_consecutives = LinearConstraint(
        A=sp.hstack(
            (
                sp.csc_array((n*(n - 1), n)),
                sp.kron(
                    eye_n,
                    sp.eye_array(n - 1, n) +      # first element in consecutive run
                    sp.eye_array(n - 1, n, k=1),  # second element in consecutive run
                ),
            ), format='csc',
        ),
        ub=1,
    )

    result = milp(
        c=minimize_group_count,
        integrality=1,              # all variables are integral
        bounds=Bounds(lb=0, ub=1),  # all variables are binary
        constraints=(
            assignment_correlation, exclusive_assignment, group_sums, no_consecutives,
        ),
    )

    if not result.success:
        raise ValueError(result.message)

    group_enables = np.empty(shape=n, dtype=np.int8)
    element_assignments = np.empty(shape=(n, n), dtype=np.int8)
    np.rint(result.x[:n], out=group_enables, casting='unsafe')
    np.rint(result.x[n:].reshape((n, n)), out=element_assignments, casting='unsafe')

    return group_enables.sum()


def test() -> None:
    # OP tests
    assert min_subsets((2, 1, 1, 2), 2) == 4, 'k working as a bottleneck'
    assert min_subsets((2, 4, 1, 0), 8) == 2, 'two simple groups'
    assert min_subsets((3, 2, 3, 0, 0), 5) == 3, 'limited by consecutive constraint'


def benchmark() -> None:
    rand = np.random.default_rng(seed=0)

    print(f'{"n":>2},{"old_ms":>7},{"new_ms":>7}')

    for n in range(2, 14):
        inputs = rand.integers(low=1, high=10, size=n)
        k = n * 10//2

        t0 = time.perf_counter()
        size_a = op_min_count(inputs, k)
        t1 = time.perf_counter()
        size_b = min_subsets(inputs, k)
        t2 = time.perf_counter()

        assert size_a == size_b
        print(f'{n:2},{(t1 - t0)*1e3:7.1f},{(t2 - t1)*1e3:7.1f}')


if __name__ == '__main__':
    test()
    benchmark()

This has a constraint sparsity pattern that - for an example n=6 - looks like

Stack Exchange Network

Divide a set into subset with some constraints

Problem:

Examples:

2 Answers 2

Documentation

Layout

DRY

Naming

You must log in to answer this question.

Hot Network Questions

Problem:

Examples:

2 Answers 2

Documentation

Layout

Naming

You must log in to answer this question.

Related