How can I parallelize this Python script?

I have a working script, but it iterates over k-combinations, so it takes a long time to run. I want to parallelize a for loop to cut the run time.

Here is the simplified code:

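import linecache

# intersection() and merge are assumed to be defined elsewhere in the full script.
fin2 = open('combi_nod.txt', 'r')
for lines in fin2:
    # Each line holds one pair of indices, e.g. "(0, 1)".
    (i, j) = eval(lines)
    edgefile = open('edge.adjlist', 'a')
    count = 0
    # linecache line numbers start at 1, hence the +1.
    for element in intersection(
            eval(linecache.getline('triangleset.txt', i+1)),
            eval(linecache.getline('triangleset.txt', j+1))):
        if element not in merge:
            count = 1
            break
    # Record j only when every shared element is in merge.
    if count == 0:
        edgefile.write(' ' + str(j))
    edgefile.close()
fin2.close()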

How can I do this?

EDIT

After some modifications I got the multiprocessing loop working, but there is a related issue:

In my initial for loop I read from the combi_nod.txt file, which contains the itertools.combinations of a large number of items. (At some point I can no longer hold them all in a variable.)

My multiprocessing loop works on a list built from itertools.combinations, because I haven't found a way to pass the lines of a file as arguments, so I run into a memory issue. Do you have any ideas?

EDIT2

For clarification, here is the code as it stands at this point:

import itertools
from multiprocessing import Pool

def intersterer(lines):
    (i, j) = lines
    counttt = 0
    for element in some_stuff:
        if element not in merge:
            counttt = 1
            break
    if counttt == 0:
        return (int(i), int(j))
    else:
        # (0, 0) marks a pair that failed the check
        return (0, 0)

fin2 = open('combi_nod.txt', 'w')
# counter_tri is a large number
for trian_c in itertools.combinations(xrange(0, counter_tri), 2):
    fin2.write(str(trian_c) + "\n")
fin2.close()
fin2 = open('combi_nod.txt', 'r')

if __name__ == '__main__':
    pool = Pool()
    listt = pool.map(intersterer, itertools.combinations(xrange(0, counter_tri), 2))
    f2(listt)
    # Drop every (0, 0) marker; list.remove() would only drop the first one.
    listt = [pair for pair in listt if pair != (0, 0)]

I would like something like this to work:

listt = pool.map(intersterer, fin2)

But none of my attempts work. Any help would be appreciated.
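One possible direction, sketched under assumptions rather than offered as a tested solution for the full script: pool.map copies its whole input into a list before dispatching work, while pool.imap is documented as a lazier variant, so feeding imap a generator that parses combi_nod.txt line by line may avoid holding every pair in memory. The sketch is written for Python 3. The names parse_pair, read_pairs, check_pair and kept_pairs are hypothetical, and check_pair contains only a placeholder test standing in for the real body of intersterer.

```python
import ast
from multiprocessing import Pool


def parse_pair(line):
    """Turn a text line such as "(3, 7)" into the tuple (3, 7)."""
    # ast.literal_eval accepts only Python literals, unlike eval().
    return ast.literal_eval(line.strip())


def read_pairs(path):
    """Yield one (i, j) tuple per non-empty line, reading the file lazily."""
    with open(path) as handle:
        for line in handle:
            if line.strip():
                yield parse_pair(line)


def check_pair(pair):
    """Hypothetical stand-in for intersterer(): return the pair or None."""
    i, j = pair
    # Placeholder test only; the real intersection check would go here.
    return pair if (i + j) % 2 == 0 else None


def kept_pairs(path, processes=None, chunksize=1000):
    """Yield the pairs accepted by check_pair, computed in worker processes."""
    with Pool(processes) as pool:
        # imap takes items from the generator in chunks and yields results
        # in input order, so no full list of pairs is built up front.
        for result in pool.imap(check_pair, read_pairs(path), chunksize):
            if result is not None:
                yield result
```

kept_pairs would be called from under the `if __name__ == '__main__':` guard, for example `for pair in kept_pairs('combi_nod.txt'): ...`. Results can still pile up in memory if the consumer is much slower than the workers, so the chunksize value here is a guess to tune, not a recommendation.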

edited Jan 17, 2015 at 9:48

asked Jan 14, 2015 at 20:06

Alex Jud

It's hard to tell from the simplified code, but where is the process spending most of its time, computing the eval()s or reading/writing the files?

– martineau, Jan 14, 2015 at 20:58

All my variables are written to files to minimize RAM usage. My script generates a lot of data, so I can't keep all of it in memory.

– Alex Jud, Jan 14, 2015 at 21:23

The only part I want to parallelize is the first for loop. I'm thinking about processing eight lines at once.

– Alex Jud, Jan 14, 2015 at 21:27

That statement's a bit confusing, since the second for loop is nested inside the first. Writing everything to files may complicate matters if you need simultaneous write access to the same file from concurrent processes or threads. In concurrent programming, access to any shared resource has to be controlled by locks, semaphores, or something similar.

– martineau, Jan 14, 2015 at 23:09

I can split the initial file and merge the results in a final step. Thanks for pointing out this upcoming issue. As I said, it's the first loop that I want to parallelize.

– Alex Jud, Jan 15, 2015 at 7:03
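
The concern raised in the comments about concurrent writes to one file can be illustrated with a lock. This is a minimal sketch for Python 3, assuming every worker appends to the same output file. The names append_line, worker and run_demo are hypothetical and do not come from the original script.

```python
from multiprocessing import Lock, Process


def append_line(path, text, lock):
    """Append one line to a shared file while holding the lock."""
    with lock:
        # Only one holder of the lock can be inside this block at a time,
        # so lines from different workers are not interleaved.
        with open(path, "a") as handle:
            handle.write(text + "\n")


def worker(path, lock, start, stop):
    """Write the integers start..stop-1, one per line, to the shared file."""
    for value in range(start, stop):
        append_line(path, str(value), lock)


def run_demo(path, workers=4, per_worker=100):
    """Start several processes that all append to the same file."""
    lock = Lock()
    procs = [
        Process(target=worker,
                args=(path, lock, k * per_worker, (k + 1) * per_worker))
        for k in range(workers)
    ]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
```

run_demo would be called from under the `if __name__ == '__main__':` guard. An alternative that needs no lock is to let the workers only return results and have the parent process do all the writing.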
