
I am trying to implement multiprocessing, but I am having difficulty accessing information from the scan objects that I'm passing through the pool.map() function.

Before multiprocessing (this works perfectly):

for sc in scans:
    my_file = scans[sc].resources['DICOM'].files[0]

After multiprocessing (does not work, error shown below):

import os
from multiprocessing import Pool

def process(x):
    my_file = x.resources['DICOM'].files[0]

def another_method():
    ...
    pool = Pool(os.cpu_count())
    pool.map(process, [scans[sc] for sc in scans])

another_method()  

The error I am getting with the 'After multiprocessing' code:

---> 24         pool.map(process, [scans[sc] for sc in scans])

~/opt/anaconda3/lib/python3.7/multiprocessing/pool.py in map(self, func, iterable, chunksize)
    266         in a list that is returned.
    267         '''
--> 268         return self._map_async(func, iterable, mapstar, chunksize).get()
    269 
    270     def starmap(self, func, iterable, chunksize=None):

~/opt/anaconda3/lib/python3.7/multiprocessing/pool.py in get(self, timeout)
    655             return self._value
    656         else:
--> 657             raise self._value
    658 
    659     def _set(self, i, obj):

~/opt/anaconda3/lib/python3.7/multiprocessing/pool.py in _handle_tasks(taskqueue, put, outqueue, pool, cache)
    429                         break
    430                     try:
--> 431                         put(task)
    432                     except Exception as e:
    433                         job, idx = task[:2]

~/opt/anaconda3/lib/python3.7/multiprocessing/connection.py in send(self, obj)
    204         self._check_closed()
    205         self._check_writable()
--> 206         self._send_bytes(_ForkingPickler.dumps(obj))
    207 
    208     def recv_bytes(self, maxlength=None):

~/opt/anaconda3/lib/python3.7/multiprocessing/reduction.py in dumps(cls, obj, protocol)
     49     def dumps(cls, obj, protocol=None):
     50         buf = io.BytesIO()
---> 51         cls(buf, protocol).dump(obj)
     52         return buf.getbuffer()
     53 

TypeError: can't pickle module objects
  • What is data_inputs? First of all, you don't need to declare it as global. Your code as above should work fine. Assuming it's a dict, you can just pass its values instead: pool.map(process_scan, data_inputs.values()) Commented Jan 6, 2021 at 8:14
  • @Tomerikoo I just clarified this in my post. Also, when I try to run the line of code you suggested, I get this error: TypeError: can't pickle module objects Commented Jan 6, 2021 at 9:17
  • I said assuming it's a dict... I am not familiar with XNATListing objects, but as I said, your code should just work. There is no need to declare variables as globals for reading... Commented Jan 6, 2021 at 9:29
  • @Tomerikoo Sorry, I forgot to mention that the pool calls are made from another method! (edited the post) Commented Jan 6, 2021 at 9:32
  • That doesn't change the fact that data_inputs is still accessible... I don't understand what's wrong with it. Are you getting an error? No output? As I said, the code should just work. Unless I'm missing something, because there is no minimal reproducible example. Commented Jan 6, 2021 at 9:33

2 Answers


You didn't provide the full data structure, but this might help. Multiprocessing is quite sensitive to the objects it is given: some objects, such as file objects and module objects, can't be pickled. See Python: can't pickle module objects error

If you only need the file name, extract it in the parent process and pass the names to the map function instead of the scan objects.
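For example, a minimal sketch of that idea, assuming scans is a mapping of scan objects and that each file object exposes some string identifier (the .name attribute below is a hypothetical stand-in):

import os
from multiprocessing import Pool

def process(file_name):
    # Workers receive plain strings, which pickle without trouble.
    print(file_name)

def another_method(scans):
    # Extract picklable data in the parent before handing work to the pool;
    # `.name` stands in for whatever attribute actually holds the identifier.
    file_names = [scans[sc].resources['DICOM'].files[0].name for sc in scans]
    with Pool(os.cpu_count()) as pool:
        pool.map(process, file_names)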




Not an expert, but I got around this issue by changing the loop a little bit.

import os
from multiprocessing import Pool

scans = ...  # defined at module level so the workers can find it as a global

def process(i_x):
    # Look the scan up by index inside the worker; only the integer
    # index is pickled and sent across, not the complex object.
    x = scans[i_x]
    my_file = x.resources['DICOM'].files[0]

def another_method():
    ...
    pool = Pool(os.cpu_count())
    pool.map(process, range(len(scans)))

another_method()

By doing so, the object scans is not found in the local namespace of the function process, but in the global namespace, which each worker inherits when the pool forks. The arguments passed through the pool are plain integers, which avoids the complex objects that pickle could not transfer to each process. That's at least how I understand the issue. Note that this relies on the fork start method (the default on Linux, and on macOS up to Python 3.7); under spawn the workers would not inherit scans.
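A common variant of the same idea, for when scans has to be built inside a function rather than at module level, is a pool initializer that installs the object as a global in each worker. This is only a sketch, and it again assumes the fork start method, where the initializer arguments are inherited rather than pickled; _init_worker and _scans are names introduced here for illustration:

import os
from multiprocessing import Pool

_scans = None  # populated in each worker by the initializer

def _init_worker(scans):
    global _scans
    _scans = scans

def process(i_x):
    x = _scans[i_x]
    my_file = x.resources['DICOM'].files[0]

def another_method(scans):
    with Pool(os.cpu_count(), initializer=_init_worker, initargs=(scans,)) as pool:
        pool.map(process, range(len(scans)))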

1 Comment

  • Is scans a multiprocess shared data structure? How can you access it from the worker?
