Python Parallel Computing Summary
Python can be faster than MATLAB in some cases, for example when part of the program calls directly into C. If we want to do parallel computation in Python, one standard option is the ‘multiprocessing’ package.
A very detailed introduction with examples can be found in the official documentation: https://docs.python.org/3.4/library/multiprocessing.html?highlight=process
How to use?
There are many ways to do multiprocessing in Python (as opposed to multithreading, which is restricted by the GIL for CPU-bound work). Here I show the simplest approach, using a “Pool”:
    import multiprocessing as mp
In the example above, the pool.apply_async function submits tasks to the pool's worker processes, so that they can run on different cores simultaneously.
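As a minimal sketch of the apply_async pattern described above, one possible version is shown below. The worker slow_square, the helper run_parallel, and the default of 4 processes are hypothetical names and values chosen for illustration.

```python
import multiprocessing as mp
import time


def slow_square(x):
    """Hypothetical stand-in for an expensive computation."""
    time.sleep(0.1)
    return x * x


def run_parallel(values, processes=4):
    """Submit one task per value and collect the results in input order."""
    with mp.Pool(processes=processes) as pool:
        # apply_async returns immediately with an AsyncResult handle
        handles = [pool.apply_async(slow_square, args=(v,)) for v in values]
        # get() blocks until the corresponding task has finished
        return [h.get() for h in handles]


if __name__ == "__main__":
    print(run_parallel(range(8)))
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by launching a fresh interpreter (such as Windows), where the main module is imported again in each worker.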
It should be emphasized that parallel computing pays off when the program is very slow in a single process and each task handed to a core takes a substantial amount of time. We shouldn’t distribute many small jobs that each take very little time to complete, because then most of the time is spent distributing the work.
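One way to reduce the distribution overhead when there are many small jobs is to hand them to the workers in batches. The sketch below uses the chunksize argument of Pool.map for this. The names tiny_job and run_batched and the chosen chunk size are assumptions for illustration, and the best value depends on the workload.

```python
import multiprocessing as mp


def tiny_job(x):
    """Hypothetical cheap task: far too small to be worth sending alone."""
    return x + 1


def run_batched(values, processes=4, chunksize=1000):
    """Send the inputs to the workers in chunks instead of one at a time."""
    with mp.Pool(processes=processes) as pool:
        # each dispatch carries `chunksize` inputs, so the inter-process
        # communication cost is paid once per chunk rather than once per item
        return pool.map(tiny_job, values, chunksize=chunksize)


if __name__ == "__main__":
    out = run_batched(range(100_000))
    print(out[:5], len(out))
```

Even with batching, a task this cheap may still run faster in a plain loop, so it is worth timing both versions before committing to the parallel one.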
How to show the progress?
When we use multiprocessing, we often want to know how far the computation has progressed, and showing that progress is not an easy task. The first idea is to print something from inside each worker. However, in my experience the printed output only showed up after all the results had been collected and returned. After searching on the internet, I found a solution.
The answer from Zeawoas shows the correct usage:
for anybody looking for a simple solution working with Pool.apply_async():
    from multiprocessing import Pool
    from time import sleep

    from tqdm import tqdm

    def work(x):
        sleep(0.5)  # stand-in for a slow computation
        return x**2

    if __name__ == "__main__":
        n = 10
        pbar = tqdm(total=n)
        with Pool(4) as pool:
            # the callback runs in the parent process each time a task finishes
            res = [pool.apply_async(work, args=(i,), callback=lambda _: pbar.update(1))
                   for i in range(n)]
            # collect the results while the pool is still open
            results = [r.get() for r in res]
        pbar.close()
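This works because the callback passed to apply_async is executed in the parent process as each result arrives, so anything it prints or updates is visible right away. A minimal sketch of the same idea without tqdm, using only the standard library, might look like this. The names run_with_progress and on_done are illustrative assumptions.

```python
from multiprocessing import Pool
from time import sleep


def work(x):
    sleep(0.1)  # stand-in for a slow computation
    return x ** 2


def run_with_progress(n, processes=4):
    """Run work(0), ..., work(n-1) in a pool, printing a counter as tasks finish."""
    done = []  # completed results, appended in completion order

    def on_done(result):
        # runs in the parent process, once per successfully finished task
        done.append(result)
        print(f"{len(done)}/{n} tasks finished", flush=True)

    with Pool(processes) as pool:
        handles = [pool.apply_async(work, args=(i,), callback=on_done)
                   for i in range(n)]
        results = [h.get() for h in handles]  # results in submission order
    return results, len(done)


if __name__ == "__main__":
    results, finished = run_with_progress(10)
    print(results)
```

The callback should stay short, since later results are not processed until it returns.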