Python #Tools #Dataprocessing #COMSOL

Purpose

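I have built up some experience processing text with Python, and I have recently been using COMSOL for numerical computation with parameter sweeps. I want to analyze how much memory and run time COMSOL needs for each sweep step, so that a run does not exhaust the available memory.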

Implementation

The COMSOL log file records the progress of every sweep step in detail. It usually begins like this:

*******************************************
***COMSOL 5.4.0.225 progress output file***
*******************************************
Thu Feb 20 21:36:07 CST 2020
<---- Compile Equations: Wavelength Domain {st1} in Study 1 {std1}/Solution 1
(sol1) {sol1} ------------------------------------------------------------
Started at Feb 20, 2020 9:36:13 PM.
Running on 4 x Intel(R) Xeon(R) Platinum 8163 CPU at 2.50 GHz.
Using 4 sockets with 25 cores in total on 4CPU.
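Since I compare runs on different machines, it can also help to record the hardware each log reports. A minimal sketch using plain string methods, like the rest of this post; the function name `parse_hardware` is my own, and it assumes the header lines look exactly like the excerpt above.

```python
def parse_hardware(lines):
    """Return (cpu_description, sockets, cores) from a COMSOL log header.

    Assumes lines shaped like the excerpt above; any value that is not
    found stays None.
    """
    cpu = sockets = cores = None
    for line in lines:
        line = line.strip()
        if line.startswith('Running on '):
            # "Running on 4 x Intel(R) ... at 2.50 GHz." -> text after the prefix, trailing '.' dropped
            cpu = line[len('Running on '):].rstrip('.')
        elif line.startswith('Using ') and ' cores ' in line:
            # "Using 4 sockets with 25 cores in total on 4CPU." -> whitespace-separated words
            words = line.split()
            sockets, cores = int(words[1]), int(words[4])
    return cpu, sockets, cores


header = """Running on 4 x Intel(R) Xeon(R) Platinum 8163 CPU at 2.50 GHz.
Using 4 sockets with 25 cores in total on 4CPU.""".splitlines()
print(parse_hardware(header))
```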

When a computation finishes, the log contains lines like these:

Memory: 106911/106911 151517/151517
Iter SolEst Damping Stepsize #Res #Jac #Sol LinErr LinRes
---------- Current Progress: 100 % - Solving linear system
Memory: 115456/117206 149829/151541
1 0.79 1.0000000 0.79 1 1 1 2.2e-11 6.9e-12
Solution time: 815 s. (13 minutes, 35 seconds)
Physical memory: 122.42 GB
Virtual memory: 156.74 GB
Ended at Feb 20, 2020 9:50:10 PM.
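The log also stamps each run with "Started at" and "Ended at", which gives a cross-check on "Solution time". A sketch that converts the two timestamps into elapsed seconds; `wall_seconds` is a hypothetical helper, and it assumes the timestamp format shown in the excerpts, English month names (the default C locale for `strptime`), and that both lines belong to the same run.

```python
from datetime import datetime

# Timestamp format as it appears in the excerpts, e.g. "Feb 20, 2020 9:50:10 PM"
FMT = '%b %d, %Y %I:%M:%S %p'


def wall_seconds(started_line, ended_line):
    """Seconds between a 'Started at ...' line and an 'Ended at ...' line."""
    start = datetime.strptime(started_line.strip()[len('Started at '):].rstrip('.'), FMT)
    end = datetime.strptime(ended_line.strip()[len('Ended at '):].rstrip('.'), FMT)
    return (end - start).total_seconds()


print(wall_seconds('Started at Feb 20, 2020 9:36:13 PM.',
                   'Ended at Feb 20, 2020 9:50:10 PM.'))
```

For the two timestamps in the excerpts this gives 837 s, a little more than the 815 s reported as Solution time, presumably because the wall-clock span also covers steps outside the solver, such as compiling the equations.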

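What I need are the fields printed just before each computation ends, such as Physical memory: 122.42 GB and Solution time: 815 s. (13 minutes, 35 seconds).
All it takes is to read the log file line by line, check whether a line contains one of these fields, and if it does, extract the value. The code is as follows: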
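To see the line-by-line check on its own, here is a minimal sketch run on the finishing lines shown above. `extract_fields` is a hypothetical name; it takes the first token after the colon instead of fixed character offsets, so values with a different number of digits still parse.

```python
def extract_fields(lines):
    """Scan lines one by one; collect Physical memory (GB) and Solution time (s) as floats."""
    memory_gb, time_s = [], []
    for line in lines:
        if 'Physical memory:' in line:
            # "Physical memory: 122.42 GB" -> "122.42"
            memory_gb.append(float(line.split('Physical memory:', 1)[1].split()[0]))
        elif 'Solution time:' in line:
            # "Solution time: 815 s. (13 minutes, 35 seconds)" -> "815"
            time_s.append(float(line.split('Solution time:', 1)[1].split()[0]))
    return memory_gb, time_s


sample = """Memory: 115456/117206 149829/151541
Solution time: 815 s. (13 minutes, 35 seconds)
Physical memory: 122.42 GB
Virtual memory: 156.74 GB
Ended at Feb 20, 2020 9:50:10 PM.""".splitlines()
print(extract_fields(sample))  # ([122.42], [815.0])
```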

import numpy as np
import matplotlib.pyplot as plt


def changetext(filename):
    """Collect the 'Physical memory' and 'Solution time' values (as strings) from one COMSOL log."""
    lines_memo = []  # Physical memory values, in GB
    lines_time = []  # Solution time values, in s
    with open(filename, 'r', encoding='utf-8') as f:
        for line in f:  # iterate lazily instead of f.readlines()
            if 'Physical memory' in line:
                # "Physical memory: 122.42 GB" -> "122.42" (first token after the colon)
                lines_memo.append(line.split(':', 1)[1].split()[0])
            if 'Solution time' in line:
                # "Solution time: 815 s. (13 minutes, 35 seconds)" -> "815"
                # (a fixed slice such as line[15:19] breaks below 100 s or above 9999 s)
                lines_time.append(line.split(':', 1)[1].split()[0])
    return lines_memo, lines_time


if __name__ == "__main__":
    # The server produced three log files; each is paired with its plot label
    runs = [
        ('../Log-File/20200220/comsol_progress_11.txt', '4CPU 1'),
        ('../Log-File/20200220/comsol_progress_21.txt', '4CPU 2'),
        ('../Log-File/20200220/comsol_progress_31.txt', 'Dell 1'),
    ]

    fig1, (ax_memo, ax_time) = plt.subplots(1, 2, figsize=(6.4 * 2, 4.8))
    for dst_dir, label in runs:
        lines_memo, lines_time = changetext(dst_dir)
        print(len(lines_time))

        # The index pattern is specific to these logs: take every second
        # 'Solution time' entry and every third 'Physical memory' entry.
        num = (len(lines_time) - 1) // 2
        x = np.arange(1, num + 1)  # loop counts 1..num
        memo_mat = np.array([float(lines_memo[3 * i + 2]) for i in range(num)])
        time_mat = np.array([float(lines_time[2 * i + 1]) for i in range(num)])

        ax_memo.plot(x, memo_mat, label=label)
        ax_time.plot(x, time_mat, label=label)

    # Plotting
    ax_memo.set_xlabel('Loop Counts')
    ax_memo.set_ylabel('Physical Memory (GB)')
    ax_memo.legend(loc='best')

    ax_time.set_xlabel('Loop Counts')
    ax_time.set_ylabel('Simulation Time (s)')
    ax_time.legend(loc='best')

    plt.show()

Results

The resulting figure:

[Figure: actual physical memory and solution time versus loop count]

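As the figure shows, the required memory grows almost linearly with the number of loops, from about 120 GB to 200 GB. So one drawback of this LiveLink for MATLAB approach is that memory usage keeps increasing.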
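Because the growth looks roughly linear, a straight-line fit gives a quick estimate of how many sweep steps fit into the available RAM before memory runs out. A minimal sketch assuming NumPy; `loops_until_limit`, the 256 GB limit and the toy values are all hypothetical (not my measured data), and the estimate only holds if the linear trend continues.

```python
import numpy as np


def loops_until_limit(memory_gb, limit_gb):
    """Fit memory = slope * loop + intercept; estimate the loop count where it reaches limit_gb."""
    loops = np.arange(1, len(memory_gb) + 1)  # loop counts 1..n, as in the plots
    slope, intercept = np.polyfit(loops, memory_gb, 1)  # degree-1 least-squares fit
    if slope <= 0:
        return slope, None  # memory is not growing, so no crossing is predicted
    return slope, (limit_gb - intercept) / slope


# Toy values for illustration only: 120 GB plus 2 GB per loop
toy = 120 + 2 * np.arange(1, 11)
slope, loop_at_limit = loops_until_limit(toy, 256)
print(slope, loop_at_limit)
```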

Summary

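This is a concrete, and still very simple, application of Python programming; it does not use regular expressions. Next time I hope to implement some of this with regular-expression matching.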
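As a possible starting point, here is a sketch of the same extraction with Python's `re` module. The patterns are my own and assume the line format shown in the log excerpts (memory always in GB, time in seconds); `parse_log_regex` is a hypothetical name.

```python
import re

# Patterns assumed from the excerpts: "Solution time: 815 s. (...)", "Physical memory: 122.42 GB"
TIME_RE = re.compile(r'Solution time:\s*(\d+(?:\.\d+)?)\s*s\b')
MEMORY_RE = re.compile(r'Physical memory:\s*(\d+(?:\.\d+)?)\s*GB')


def parse_log_regex(lines):
    """Regex variant of the line-by-line scan: returns (memory_gb, time_s) lists of floats."""
    memory_gb, time_s = [], []
    for line in lines:
        m = MEMORY_RE.search(line)
        if m:
            memory_gb.append(float(m.group(1)))
            continue
        m = TIME_RE.search(line)
        if m:
            time_s.append(float(m.group(1)))
    return memory_gb, time_s


sample = """Solution time: 815 s. (13 minutes, 35 seconds)
Physical memory: 122.42 GB
Virtual memory: 156.74 GB""".splitlines()
print(parse_log_regex(sample))  # ([122.42], [815.0])
```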