Python #ParallelComputing

Python Parallel Computing Summary

Python can be faster than MATLAB in some cases, for example when part of the program calls directly into C code. If we want to do parallel computation in Python, we need to use the ‘multiprocessing’ package from the standard library.
A detailed introduction and examples can be found in the official documentation:

https://docs.python.org/3.4/library/multiprocessing.html?highlight=process

How to use?

There are many different ways of achieving multiprocessing (as opposed to multithreading, which is restricted by the GIL in Python for CPU-bound work). Here I show the simplest approach, using a “Pool”.

import multiprocessing as mp

# Use a function to define what we want to do
def fun(a, b, c, d):
    # Do something here; this sum is only a placeholder computation
    result = a + b + c + d
    return result

# Number of processes you are going to use
processnum = 10

if __name__ == '__main__':  # this guard is needed on Windows
    b, c, d = 1, 2, 3  # placeholder values for the fixed parameters
    loop_mat = 100     # placeholder for the number of loop iterations
    pool = mp.Pool(processes=processnum)
    # There are many parameters and we can choose one of them,
    # for example "a", as the loop variable to be computed in parallel.
    results = [pool.apply_async(fun, args=(a, b, c, d)) for a in range(loop_mat)]
    # get() blocks until a task has finished; values come back in submission order
    value_1 = [p.get() for p in results]
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for the worker processes to exit

In the example above, pool.apply_async submits each task to the pool without blocking, and the pool's worker processes run the tasks simultaneously on different cores.
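To see that the tasks really do land in separate worker processes, here is a minimal sketch that tags each result with the ID of the process that computed it. The helper name `tag_with_pid` and the pool size of 4 are my own choices for illustration, not part of the example above.

```python
import multiprocessing as mp
import os


def tag_with_pid(a):
    """Return the input together with the ID of the process that handled it."""
    return a, os.getpid()


if __name__ == "__main__":
    # The "with" block shuts the pool down once all results have been collected.
    with mp.Pool(processes=4) as pool:
        handles = [pool.apply_async(tag_with_pid, args=(a,)) for a in range(8)]
        pairs = [h.get() for h in handles]

    print("parent pid:", os.getpid())
    for a, pid in pairs:
        print(f"task {a} ran in worker pid {pid}")
```

The worker IDs printed should differ from the parent's. How the tasks are divided among the workers can vary from run to run.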

What needs to be emphasized is that parallel computing pays off when each individual task is slow enough to keep a core busy for a while. We shouldn’t distribute many small jobs that each need very little time to complete, because then most of the time will be spent distributing the work rather than doing it.
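If the jobs are unavoidably small, one way to reduce the distribution cost is to send them to the workers in batches. The sketch below assumes a trivial job (`square`) and uses the `chunksize` argument of `Pool.map`. The function names and the batch size of 1000 are illustrative assumptions, and a good batch size depends on the actual workload.

```python
import multiprocessing as mp


def square(x):
    """A deliberately tiny job: much cheaper than sending it to a worker."""
    return x * x


def square_all(values, processes=4, chunksize=1000):
    """Square every value in parallel, shipping inputs to workers in batches."""
    with mp.Pool(processes=processes) as pool:
        # Each worker receives `chunksize` inputs at a time instead of one,
        # so far fewer messages are exchanged between the processes.
        return pool.map(square, values, chunksize=chunksize)


if __name__ == "__main__":
    out = square_all(range(100_000))
    print(out[:5])
```

Pool.map returns the results in the same order as the inputs, so batching changes only how the work is shipped, not what comes back.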

How to show the progress?

In many cases when we use multiprocessing we want to know how far the computation has progressed, and showing that progress is not an easy task. We may first think of printing something from inside the worker function. However, in my experience the printed output may only show up once all the results have been collected and returned. After searching on the internet, I found a solution on Stack Overflow.

The answer from Zeawoas shows the correct usage:

for anybody looking for a simple solution working with Pool.apply_async():

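from multiprocessing import Pool
from tqdm import tqdm
from time import sleep

def work(x):
    sleep(0.5)
    return x**2

if __name__ == '__main__':  # this guard is needed on Windows
    n = 10
    p = Pool(4)
    pbar = tqdm(total=n)
    # The callback runs in the parent process each time a task finishes
    res = [p.apply_async(work, args=(i,), callback=lambda _: pbar.update(1)) for i in range(n)]
    results = [r.get() for r in res]
    pbar.close()
    p.close()
    p.join()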
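If tqdm is not available, a similar effect can be had with the standard library alone. The following is a minimal sketch of an alternative, not the method from the Stack Overflow answer: it uses Pool.imap_unordered, which yields each result as soon as its task completes, and prints the count from the parent process. The helper `format_progress` is a hypothetical name introduced here.

```python
import multiprocessing as mp
from time import sleep


def work(x):
    sleep(0.1)
    return x ** 2


def format_progress(done, total):
    """Build the progress message shown after each finished task."""
    return f"{done}/{total} tasks finished"


if __name__ == "__main__":
    n = 10
    results = []
    with mp.Pool(4) as pool:
        # imap_unordered hands back results in completion order, not input order
        for done, value in enumerate(pool.imap_unordered(work, range(n)), start=1):
            results.append(value)
            # Printing happens in the parent, so worker output buffering does not matter
            print(format_progress(done, n), flush=True)
    print(sorted(results))
```

Because the results arrive in completion order, they are sorted at the end. If the original order matters, Pool.imap can be used instead, at the cost of progress being reported only as the next result in input order becomes ready.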