
I need to get the Cartesian product of a dictionary {str: Fraction} with itself, but currently need to "loop" through the dict twice, once for the keys and once for the values.

The difference is that a tuple output works well for the keys, but for the values I actually need the regular math product.

How can I do better? Keen to stick to the standard library if possible.

import itertools as it
import math as mt
from fractions import Fraction as fr

dl = {
    'P1': fr(7, 120), 'P2': fr(20, 120), 'P3': fr(6, 120), 
    'P4': fr(10, 120), 'P5': fr(7, 120), 'P6': fr(18, 120), 
    'P7': fr(16, 120), 'P8': fr(19, 120), 'P9': fr(17, 120)
}
rp = 2 # <- This is variable, can be an unknown integer
iter_keys = list(it.product(dl, repeat = rp))
iter_vals = list(map(mt.prod, it.product(dl.values(), repeat = rp)))

# Desired output
print({iter_keys[i]: iter_vals[i] for i in range(len(iter_vals))})
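To make the desired output concrete, here is the same two-pass approach on a tiny hypothetical two-key dict (the keys and values are made up for illustration):

```python
import itertools as it
import math as mt
from fractions import Fraction

# Tiny stand-in for the real dict, to show the shape of the result
d = {'A': Fraction(1, 2), 'B': Fraction(1, 3)}
rp = 2

keys = list(it.product(d, repeat=rp))
vals = [mt.prod(vs) for vs in it.product(d.values(), repeat=rp)]
out = dict(zip(keys, vals))
print(out)
# {('A', 'A'): Fraction(1, 4), ('A', 'B'): Fraction(1, 6), ('B', 'A'): Fraction(1, 6), ('B', 'B'): Fraction(1, 9)}
```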
  • Could you define "better"? Timing-wise, the most direct and naive solution {(k1,k2): dl[k1]*dl[k2] for k1 in dl for k2 in dl} is 3 times faster than your solution (and than j.earls', which is slightly slower than yours). Reading the question and the comments on the existing answer, I feel that for you j.earls' > yours > naive, while timing-wise it is the exact opposite. So what is the criterion here? Commented Aug 7 at 8:22
  • Note that this is not a sarcastic question. I just don't know in which direction to search, since some information is missing to know what "better" means. One perfectly valid reason not to use the "naive" version (which is so far the fastest, the shortest, the most pythonic, and the least dependent on libraries, so from your stated criteria I fail to see why not use it) is hidden in your rp = 2 line, which seems to indicate that it may also not be 2. In that case, nested for loops can't work, since you don't know how many loops to nest. Commented Aug 7 at 8:27
  • As a side note, I personally think your fr, mt and it abbreviations don't add any clarity. Such abbreviations are good for very long imports (such as import matplotlib.pyplot as plt) or for well-known conventions (such as import numpy as np), but here you should either just import math and use math.prod, or from math import prod; the middle ground mt.prod doesn't really add anything. Commented Aug 7 at 9:46
  • Not to mention that in this case you probably want from operator import mul rather than from math import prod. Commented Aug 7 at 9:46
  • @chrslg is right, more details were needed. Yes, rp is a parameter, so for loops don't scale well. Thanks @Kelly. I have fixed the code. Commented Aug 7 at 22:55

4 Answers


without itertools

Since we now have 3 or 4 variants with itertools, here is my version without itertools, almost as naive as the direct for loops I proposed in the comments:

def rec(d, rp=2):
    if rp==0: return d
    return rec({k1+(k2,): d[k1]*dl[k2] for k1 in d for k2 in dl}, rp-1)
def wrec(rp=2):
    return rec({():1},rp)

So, it is just the same as the direct {(k1,k2): dl[k1]*dl[k2] for k1 in dl for k2 in dl}, but taking the cross-product one axis at a time (it's a cross-product of a cross-product of ... when rp is big).

I use a "pure" version here, starting from {(): 1}, which is the neutral element of that cross-product. But of course, if we know the function will never be called with rp=0 (after all, no other version works with rp=0), or even rp=1, we can save one recursion with

def wrec(rp=2):
    return rec({(k,):dl[k] for k in dl}, rp-1)
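As a sanity check, the two wrappers agree for rp >= 1, and the pure one also handles rp = 0. A minimal standalone sketch (the rec from above is reproduced here, with a small stand-in dict in place of the OP's):

```python
from fractions import Fraction

dl = {'A': Fraction(1, 2), 'B': Fraction(1, 3)}  # small stand-in for the OP's dict

def rec(d, rp=2):
    if rp == 0:
        return d
    return rec({k1 + (k2,): d[k1] * dl[k2] for k1 in d for k2 in dl}, rp - 1)

def wrec_pure(rp=2):
    return rec({(): 1}, rp)          # works for any rp >= 0

def wrec_fast(rp=2):
    return rec({(k,): dl[k] for k in dl}, rp - 1)  # saves one recursion, needs rp >= 1

assert wrec_pure(0) == {(): 1}       # neutral element comes back untouched
assert wrec_pure(2) == wrec_fast(2)  # both wrappers agree for rp >= 1
```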

with itertools

Just to have my own itertools version, here is one

{kk: mt.prod(dl[k] for k in kk) for kk in it.product(dl, repeat=rp)}

Timings

(on my machine, with the OP's example. Depending on the intended usage, it might be worth studying what happens when dl's size increases, or when rp increases)

| Method | Timing |
| - | - |
| OP | 346 μs |
| J. Earls | 389 μs |
| J. Earls' simplified version | 426 μs |
| Stef | 468 μs |
| My itertools | 398 μs |
| Robert's | 363 μs |
| My no-itertools recursive | 167 μs |
| My 2nd no-itertools (rp > 0) | 136 μs |
| (for reference, direct iteration, but with forced rp=2) | 121 μs |

So the recursive version may be disappointing. It's always cool to pull off some impressive itertools yoga, when my recursive function is just what you learn at school. But it is faster, and it depends on no library at all. And even though recursion is often a bad idea in Python (no tail call optimization, so even the simplest recursion can quite quickly hit the eponymous "stack overflow" error), here I doubt you plan to have rp=20000 or so (if you do, you'll have other problems before the stack overflows). So, as disappointing as it is, I feel it is the best option.

And if that no-itertools solution isn't acceptable, anyway, so far, the best solution is yours :D
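If recursion itself is the objection, the same one-axis-at-a-time cross-product can be written iteratively. A sketch with functools.reduce (not from the answer; the lambda body is the same dict comprehension as in rec, and the dict is a small stand-in):

```python
from fractions import Fraction
from functools import reduce

dl = {'A': Fraction(1, 2), 'B': Fraction(1, 3)}  # small stand-in for the OP's dict

def iter_prod(rp=2):
    # Fold rp copies of dl into an accumulator, starting from the
    # neutral element {(): 1}, exactly like the recursive version
    return reduce(
        lambda acc, _: {k1 + (k2,): acc[k1] * dl[k2] for k1 in acc for k2 in dl},
        range(rp),
        {(): 1},
    )

print(iter_prod(2)[('A', 'B')])  # 1/6
```

This avoids any recursion depth concern while keeping the exact same amount of work per step.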

Edit: impact of rp

Just to be sure, I wanted to see how rp impacts the timing. Since I did it, I post it here, but it is quite boring: roughly, it has no impact, just an almost constant multiplicative factor.

The following shows the same thing 3 times. The first figure shows the log time (since it is clearly exponential in rp); it shows how the timing ratios stay basically constant (a constant gap on the log scale). The second figure is the same on a linear scale (just to better visualize how everything is basically equivalent, except the solution without itertools). The third shows the ratio between each solution and the OP's, for each rp, which shows better than the first two how almost constant that ratio is (Stef's and my recursive version seem to get an even better ratio as rp rises, but settle into an obvious constant asymptotic ratio quite soon). Bottom line: considering other rp values doesn't change the previous conclusion: the fastest way is the boring recursive one, and if that is unacceptable, then it is yours.

[Figure: timings vs rp]

vs dl size

Likewise, increasing the number of entries in dl has no impact on that conclusion either. (Again, this was expected: we expect every method's timing to be O(exp(rp)), and more precisely O(n^rp). So with rp=2 it is quadratic in dl's size, with rp=3 cubic, etc.)

[Figure: same timings vs dl size]
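The O(n^rp) claim is easy to sanity-check without timing anything: the result always has exactly len(dl) ** rp entries, so the work grows at least that fast. A quick sketch using the zip/map formulation from this page:

```python
import itertools as it
import math as mt
from fractions import Fraction

def cross(dl, rp):
    # Same idea as Robert's answer: keys zipped with math.prod of values
    return dict(zip(it.product(dl, repeat=rp),
                    map(mt.prod, it.product(dl.values(), repeat=rp))))

small = {f'P{i}': Fraction(1, 10) for i in range(5)}
for rp in (1, 2, 3):
    assert len(cross(small, rp)) == len(small) ** rp  # n ** rp entries
```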

My code for timings

(the working part of the code is just copied from the contributors' code on this page, including my own solution. So there is no answer to the question in this section; just the timing/plotting code that the OP requested in comments)

import itertools as it
import math as mt
from fractions import Fraction as fr
import timeit
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm

# Testing data
dl = {
    'P1': fr(7, 120), 'P2': fr(20, 120), 'P3': fr(6, 120), 
    'P4': fr(10, 120), 'P5': fr(7, 120), 'P6': fr(18, 120), 
    'P7': fr(16, 120), 'P8': fr(19, 120), 'P9': fr(17, 120)
}

rp=2

######## Everybody's code ######
# Follow the same pattern for everybody: a function whose optional argument is rp. 
# Use global variable dl as data

# OP
def org(rp=2):
    iter_args = list(it.product(dl, repeat = rp))
    iter_vals = list(map(mt.prod, it.product(dl.values(), repeat = rp)))

    # Desired output
    return {iter_args[i]: iter_vals[i] for i in range(len(iter_vals))}


# J. Earls first method
def jearls(rp=2):
    products_entries = [
        tuple(zip(*product))
        for product in it.product(dl.items(), repeat=rp)
    ]
    products = {
        product[0]: mt.prod(product[1])
        for product in products_entries
    }
    return products

# J. Earls simplified version. I skip it, since it is not faster
# (and they didn't claim it was faster. Just simpler, and less memory
# but I don't test memory)
def jearlsSimplified(rp=2):
    return {
        product[0]: mt.prod(product[1])
        for product in (
            tuple(zip(*product))
            for product in it.product(dl.items(), repeat=rp)
        )
    }

# My itertools based version
def myit(rp=2):
    return {kk: mt.prod(dl[k] for k in kk) for kk in it.product(dl, repeat=rp)}

# My recursive version 
# Recursive function
def rec(d, rp=2):
    if rp==0: return d
    return rec({k1+(k2,): (d[k1]*dl[k2]) for k1 in d for k2 in dl}, rp-1)
# Wrapper (all other function just have this rp argument), if we assume rp could be 0
def wrecPure(rp=2):
    return rec({():1}, rp)
# Wrapper if rp is at least 1
def wrecRpSup0(rp=2):
    return rec({(k,):dl[k] for k in dl}, rp-1)


# Stef's version
from itertools import product # Stef has their own import. They're right about that, I think
import math
def stef(rp=2):
    return {tuple(k for k,_ in p): mt.prod(v for _,v in p) for p in product(dl.items(), repeat=rp)}

# Robert's version
def robert(rp=2):
    return dict(zip(
        it.product(dl, repeat = rp),
        map(mt.prod, it.product(dl.values(), repeat = rp))
    ))

# List of method to test, with a human name
methods=[(org, "OP"), (jearls, "J. Earls'"), (stef, "Stef's"), (robert, "Robert's"), (myit, "My itertools-based"), (wrecRpSup0, "recursive")]
def showTimingTable():
    print("| Method | Timing |")
    print("| - | - |")
    for f, name in methods:
        print(f"| {name} | {timeit.timeit(f, number=1000)/1000*1e6:.0f} μs |")


# Started from a hard-coded comparison, hence the names op/robert, but it can compare any pair
def robVsOpMatch(op=org, robert=robert, nameOp="OP", nameRobert="Robert's"):
    orgsTime=np.array([timeit.timeit(op, number=1) for n in range(20000)])/1*1e6
    robsTime=np.array([timeit.timeit(robert, number=1) for n in range(20000)])/1*1e6
    fig,(ax1,ax2)=plt.subplots(2,1,sharex=True)
    rng = (min(orgsTime.min(), robsTime.min()), max(np.quantile(orgsTime, 0.99), np.quantile(robsTime, 0.99)))
    ax1.hist(orgsTime, bins=200, range=rng)
    ax2.hist(robsTime, bins=200, range=rng)
    ax1.set_title(nameOp)
    ax2.set_title(nameRobert)
    print(f"{nameRobert} wins {(robsTime<orgsTime).sum()/200} % of the time")
    plt.show()


def showTimingsVsRp():
    rps=list(range(2,8)) # 8 or more is too slow on my machine
    nums=[1000,100,10,1,1,1] # Number of experiments to run for each timing
            # for low rp, we can run many to reduce randomness. For higher rp, 1 experiment is long enough, and randomness is averaged out naturally by the longer timing
    times = [[] for _ in methods]

    with tqdm(total=2**(max(rps)-1)*len(methods)) as pbar:
        for rp,n in zip(rps, nums):
            for (f,name),tms in zip(methods, times):
                tms.append(timeit.timeit(lambda: f(rp), number=n)/n*1e6)
                pbar.update(2**(rp-2)) # I assume the cost is exponential starting from rp=2, which it is not, especially with the `nums` thing. But it is just for the progress bar; as long as it moves from time to time to tell that the computer is still alive...

    times = np.array(times) # I need them in np format to be able to divide them

    # 3 plots
    fig, (ax1, ax2, ax3) = plt.subplots(1,3)

    for (f,name),tms in zip(methods, times):
        # y log scale
        ax1.plot(rps, tms, label=name)
        ax1.set_yscale('log')
        ax1.legend()
        # exact same, but linear scale
        ax2.plot(rps, tms, label=name)
        ax2.legend()
        # ratio with OP
        ax3.plot(rps, tms/times[0], label=name)
        ax3.legend()

    plt.show()


# Just a copy of the previous, but it's dl that changes
def showTimingsVsDlSize():
    global dl # Because I need to alter the dl (since the functions don't have a dl parameter)
    rp=2
    lendl=list(range(9,27,3)) 
    # no need for the nums thing this time, it is just quadratic (with rp=2)
    times = [[] for _ in methods]

    with tqdm(total=len(methods)*len(lendl)) as pbar:
        for N in lendl:
            # A mock dict of N entries (same for everyone)
            dl={f'P{i}': fr(np.random.randint(1,119), 120) for i in range(1, N+1)}
            for (f,name),tms in zip(methods, times):
                tms.append(timeit.timeit(f, number=100)/100*1e6)
                pbar.update(1) 

    times = np.array(times) # I need them in np format to be able to divide them

    # 3 plots
    fig, (ax1, ax2, ax3) = plt.subplots(1,3)

    for (f,name),tms in zip(methods, times):
        # y log scale
        ax1.plot(lendl, tms, label=name)
        ax1.set_yscale('log')
        ax1.legend()
        # exact same, but linear scale
        ax2.plot(lendl, tms, label=name)
        ax2.legend()
        # ratio with OP
        ax3.plot(lendl, tms/times[0], label=name)
        ax3.legend()

    plt.show()
    

#showTimingTable()
#robVsOpMatch()
#showTimingsVsRp()
showTimingsVsDlSize()

8 Comments

"Or that itertools could be so slow :D" - Hmm? You know it isn't. In the test with type, you got 19 μs for my solution, and that includes all the itertools work.
@Robert 19 μs isn't very fast for iterating 81 elements. With any vectorized solution (like the R functions the OP is alluding to), we should be talking in ns, not μs. Or like when you skip iterations by using numpy vectorized functions. itertools provides nowhere near the ×1000 factor you get with that (so-called) "vectorization". It is still python iterations with python calls and python data. Besides, in your case, it is not itertools that differentiates you from the others, it is map. All the others (OP, Stef, J. Earls, my own itertools-based solution) still do the same 81 (or 162, depending on how you count) for iterations. So does my "no-itertools" version. Your method uses map. So, if you want to compare "only the iteration part, not the computation" with type, it is map that you should credit for your lower "with type" timing. But my remark applies to map too: it is nowhere near the kind of cpu-time gain the OP is alluding to from the R world (and that I claim also exists in python's world, with numpy, pandas, etc). It is not at all "let's not use R or python directly, since they are slow languages, and rely on an underlying C-coded library for this task".
map is also still python: python data, python iterations, python calls. Like itertools (or like .apply or any lambda-based method of the aforementioned libraries), they can't really "vectorize" if they have to call some python code on each iteration.
But, well, my remark was not at all intended to criticize itertools. I love itertools. It was just about the OP saying "because I come from R, I have this reflex to avoid for loops at all cost". My remark is just: that reflex is valid in python too; it is just not spectacular if what you replace for loops with is some clever python library (itertools is written in C, but it still has to call python iterators, perform python calls, etc. The numpy or R vectorized functions we were alluding to don't).

You don't need lists and a comprehension, and you especially don't need looping by indices.

import itertools as it
import math as mt
from fractions import Fraction as fr

dl = {
    'P1': fr(7, 120), 'P2': fr(20, 120), 'P3': fr(6, 120), 
    'P4': fr(10, 120), 'P5': fr(7, 120), 'P6': fr(18, 120), 
    'P7': fr(16, 120), 'P8': fr(19, 120), 'P9': fr(17, 120)
}
rp = 2

print(dict(zip(
    it.product(dl, repeat = rp),
    map(mt.prod, it.product(dl.values(), repeat = rp))
)))

Attempt This Online!

Comments


You can do this with a combination of the dictionary .items() method and the zip function:

import itertools as it
import math as mt
from fractions import Fraction as fr

dl = {
    'P1': fr(7, 120), 'P2': fr(20, 120), 'P3': fr(6, 120), 
    'P4': fr(10, 120), 'P5': fr(7, 120), 'P6': fr(18, 120), 
    'P7': fr(16, 120), 'P8': fr(19, 120), 'P9': fr(17, 120)
}
rp = 2
products_entries = [
    tuple(zip(*product))
    for product in it.product(dl.items(), repeat=rp)
]
products = {
    product[0]: mt.prod(product[1])
    for product in products_entries
}

# Output
print(products)

The reason this works:

  1. dl.items() produces the items ('P1', Fraction(7, 120)), ('P2', Fraction(20, 120)), ('P3', Fraction(6, 120)), etc.
  2. itertools.product returns tuples of items like (('P1', Fraction(7, 120)), ('P2', Fraction(20, 120)))
  3. zip then "zips" together all the tuples passed to it, returning a tuple of all the 1st elements, a tuple of all the 2nd elements, etc. Thus zip(*product) (which unpacks product to pass each component tuple as a separate argument) returns, for example, (('P1', 'P2'), (Fraction(7, 120), Fraction(20, 120)))
  4. Because zip returns a "zip object", tuple(zip(*product)) converts it to a tuple, resulting in products_entries being a list like [ (('P1', 'P1'), (Fraction(7, 120), Fraction(7, 120))), (('P1', 'P2'), (Fraction(7, 120), Fraction(20, 120))), ... ]
  5. That's then converted to the desired dictionary by taking the first item of each tuple as the dictionary key, and passing the value through mt.prod()
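The transpose step (points 2-4 above) can be seen in isolation. A minimal sketch with one hypothetical element of the product:

```python
from fractions import Fraction

# One element produced by it.product(dl.items(), repeat=2):
# a pair of (key, value) items
pair = (('P1', Fraction(7, 120)), ('P2', Fraction(20, 120)))

# zip(*pair) transposes it: all the keys together, all the values together
keys, values = zip(*pair)
print(keys)    # ('P1', 'P2')
print(values)  # (Fraction(7, 120), Fraction(20, 120))
```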

To simplify this further, we can combine the list comprehension and dictionary comprehension, which also reduces memory usage:

# ...
products = {
    keys: mt.prod(values)
    for (keys, values) in (
        tuple(zip(*product))
        for product in it.product(dl.items(), repeat=rp)
    )
}

4 Comments

This still needs two loops, but it almost got me there. Thanks to the tip about dl.items(), I can now define a one-line function def zip_prod(z): return (z[0], mt.prod(z[1])) and use zip_prod(tuple(zip(*product))) directly in the first for loop, thus avoiding the second one altogether. Do you want to please edit your answer?
I would suggest instead that the simplified example has the same timing and memory consumption as a single loop: the inner loop is a generator expression that yields one item at a time to the outer loop without building any data structure in memory, and still only traverses the dictionary a single time.
I suggest using unpacking: replace product[0]: mt.prod(product[1]) for product in ... with keys: math.prod(values) for keys, values in ... or something similar. Right now you have two distinct variables with the same name product, and these two variables have different structures (one is a tuple of pairs, the other is a pair of tuples), so having them share a name is just confusing. Unpacking lets you naturally name the first and second members of the pairs, which is far more readable in my opinion than juggling product[0] and product[1].
@Stef agreed and edited as such.

If you only want pairs:

{(k1,k2): v1*v2 for k1,v1 in dl.items() for k2,v2 in dl.items()}

For general tuples:

from itertools import product
import math

rp = 2
{tuple(k for k,_ in p): math.prod(v for _,v in p) for p in product(dl.items(), repeat=rp)}

Comments
