Python Parallel Computing Summary

Python can be faster than MATLAB in some cases, for example when part of the program directly calls code written in C. If we want to use Python for parallel computation, we can use the standard-library ‘multiprocessing’ package.
A very detailed introduction with examples can be found in the official documentation:

https://docs.python.org/3.4/library/multiprocessing.html?highlight=process

How to use?

There are many different ways to achieve multiprocessing (as opposed to multithreading, which in Python is limited by the GIL for CPU-bound code). Here I show the simplest one, using a “pool”:

import multiprocessing as mp
import numpy as np

# use a function to define what we want to do for one set of parameters
def fun(a, b, c, d):
    # do something here (placeholder computation, replace with the real work)
    return np.sin(a) * b + c * d

# number of processes you are going to use
processnum = 10

if __name__ == '__main__':  # this guard is required on Windows
    b, c, d = 1.0, 2.0, 3.0  # example values for the fixed parameters
    loop_mat = 100           # example number of values of "a"
    pool = mp.Pool(processes=processnum)
    # there are many parameters and we can choose one of them,
    # for example "a", as the loop element to be calculated in parallel
    results = [pool.apply_async(fun, args=(a, b, c, d)) for a in range(loop_mat)]
    # get() waits for each task; values come back in the order of range(loop_mat)
    value_1 = [p.get() for p in results]
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for the worker processes to exit

In the example above, pool.apply_async submits each task to the pool without waiting for it, and the worker processes run the tasks simultaneously on different cores.
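When only one argument (here "a") changes from task to task, the same loop can also be written with pool.map, fixing the other arguments with functools.partial. The sketch below is an alternative to the apply_async version, not the author's code; the body of fun and the helper name run_with_map are hypothetical placeholders.

```python
import multiprocessing as mp
from functools import partial

def fun(a, b, c, d):
    # hypothetical placeholder computation, standing in for the real work
    return a * b + c - d

def run_with_map(loop_mat, b, c, d, processnum=4):
    # Fix b, c, d with functools.partial so the pool only has to vary "a".
    task = partial(fun, b=b, c=c, d=d)
    # Leaving the with-block calls pool.terminate(); map() has already
    # returned every result by then, in the same order as range(loop_mat).
    with mp.Pool(processes=processnum) as pool:
        return pool.map(task, range(loop_mat))

if __name__ == '__main__':
    print(run_with_map(10, b=2, c=1, d=0))
```

With these inputs each task computes 2*a + 1, so the printed list is [1, 3, 5, ..., 19]. pool.map blocks until all results are ready, which is convenient when nothing else needs to happen in the meantime.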

It should be emphasized that parallel computing is worthwhile when each single task is slow, so that every core has a substantial amount of work to do. We shouldn’t distribute many small jobs that each need only a very short time to complete; otherwise most of the time will be spent distributing the work (sending arguments to the worker processes and collecting the results) instead of computing.
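If the work really does consist of many tiny jobs, one way to reduce the distribution overhead is to send them in batches. pool.map accepts a chunksize argument that groups the inputs, so each message to a worker carries many items instead of one. This is a minimal sketch: small_job and run_batched are hypothetical names, and chunksize=1000 is an arbitrary choice, not a tuned value.

```python
import multiprocessing as mp

def small_job(x):
    # a very cheap task: on its own it is far less work than sending it
    # to a worker process and getting the result back
    return x * x

def run_batched(n, processnum=4, chunksize=1000):
    # chunksize groups the inputs into batches, so the pool sends about
    # n / chunksize tasks to the workers instead of n separate ones
    with mp.Pool(processes=processnum) as pool:
        return pool.map(small_job, range(n), chunksize=chunksize)

if __name__ == '__main__':
    squares = run_batched(100_000)
    print(squares[:5], len(squares))
```

Running it prints [0, 1, 4, 9, 16] 100000. When chunksize is omitted, pool.map already picks a batch size automatically; the one-task-per-call apply_async loop shown earlier does no batching at all, so it is better suited to a moderate number of slow tasks.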

How to show the progress?

In many cases when we use multiprocessing we want to know the progress, and showing it is not an easy task. We may first think of printing something inside the worker function. However, the printed content may only show up after all the results have been joined and returned. After searching online, I found a solution on Stack Overflow; the answer from Zeawoas shows the correct usage:

For anybody looking for a simple solution working with Pool.apply_async():

from multiprocessing import Pool
from tqdm import tqdm
from time import sleep

def work(x):
    sleep(0.5)
    return x**2

if __name__ == '__main__':  # needed on Windows
    n = 10
    p = Pool(4)
    pbar = tqdm(total=n)
    # the callback runs in the main process each time a task finishes
    res = [p.apply_async(work, args=(i,), callback=lambda _: pbar.update(1))
           for i in range(n)]
    results = [r.get() for r in res]
    pbar.close()
    p.close()
    p.join()
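The same callback idea also works without tqdm. Because apply_async callbacks are executed in the parent process, a simple counter printed from the callback shows the progress as tasks finish. The sketch below assumes only the standard library; run_with_progress and report are hypothetical names, and the 0.1 s sleep just stands in for real work.

```python
import sys
from multiprocessing import Pool
from time import sleep

def work(x):
    sleep(0.1)  # stand-in for a slow computation
    return x ** 2

def run_with_progress(n, processes=4):
    done = 0

    def report(_):
        # callbacks run in a helper thread of the parent process, so this
        # output comes from the parent rather than from the worker processes
        nonlocal done
        done += 1
        sys.stderr.write(f"\r{done}/{n} tasks finished")
        sys.stderr.flush()

    with Pool(processes) as pool:
        res = [pool.apply_async(work, args=(i,), callback=report) for i in range(n)]
        results = [r.get() for r in res]
    sys.stderr.write("\n")
    return results

if __name__ == '__main__':
    print(run_with_progress(10))
```

Only the parent touches the counter, and each result's callback finishes before its get() returns, so the counter reaches n by the time the list of results is built.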
Author: Knifelee
Link: https://knifelees3.github.io/2019/04/17/A_En_Python_ParallelComputing/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stating additionally.