Python #Tools #Dataprocessing

Introduction

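There are situations where we need to operate on particular lines of a file, for example deleting, adding, or modifying some fields. My own need is this: every blog post on this site carries some front matter fields, and I want to delete those fields while extracting the title from them to use as the heading. An example is shown below.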

---
title: 用python模糊匹配文件夹下的文件并复制文件到另外的文件夹
date: 2020-02-16 00:25:30
tags:
- Python
categories:
- 技术
mathjax: true
cover: https://raw.githubusercontent.com/knifelees3/my_pictures/master/icons/PythonICON.jpg
---

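This is the front matter of one such note. We therefore first need to read past the two --- lines; everything after them is the body. We also need to extract the text that follows title: and use it as the first line. The implementation is described below.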

Implementation

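How do we read the file? The open function is enough. Its second argument can be r, w, or a, meaning read-only, write (overwrite), and write (append), respectively. Once the file is open, readlines returns all of its lines as a list, which we loop over one at a time. Each line is a string, so a few if statements are enough to pick out the result we want. The code is as follows.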

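def changetext(filename):
    """Remove the front matter from a file and turn its title into a heading."""
    lines = []    # lines that will be written back to the file
    counter = 0   # number of '---' delimiters seen so far
    with open(filename, 'r', encoding='utf-8') as f:
        for line in f.readlines():
            if line.rstrip() == '---' and counter < 2:
                counter += 1
            elif counter == 1:
                # Inside the front matter: keep only the title, as a heading.
                if line.startswith('title:'):
                    lines.append('# ' + line[len('title:'):].strip() + '\n')
            elif counter == 2:
                # After the closing '---': body text, kept unchanged.
                lines.append(line)
    with open(filename, 'w', encoding='utf-8') as f:
        # Each line already ends with '\n', so write it back as-is.
        f.writelines(lines)
    print('Modified file: ' + filename)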
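
The same idea can also be sketched as a pure function on a string, which is easier to try out without touching any file, plus a small batch helper for a whole folder of posts. This is only a sketch under some assumptions: the names strip_front_matter and convert_folder are hypothetical, the posts are assumed to be UTF-8 files matching *.md, and, unlike the function above, text that does not start with --- is returned unchanged.

```python
from pathlib import Path


def strip_front_matter(text):
    """Return text with its leading front matter replaced by a '# title' heading."""
    lines = text.splitlines(keepends=True)
    # No front matter at the top: return the text unchanged (an assumption).
    if not lines or lines[0].rstrip() != '---':
        return text
    out = []
    in_front_matter = True
    for line in lines[1:]:
        if in_front_matter:
            if line.rstrip() == '---':
                in_front_matter = False   # closing delimiter reached
            elif line.startswith('title:'):
                out.append('# ' + line[len('title:'):].strip() + '\n')
        else:
            out.append(line)              # body text, kept unchanged
    return ''.join(out)


def convert_folder(folder, pattern='*.md'):
    """Rewrite every file in folder that matches pattern (hypothetical helper)."""
    for path in sorted(Path(folder).glob(pattern)):
        original = path.read_text(encoding='utf-8')
        path.write_text(strip_front_matter(original), encoding='utf-8')
        print('Modified file:', path)
```

Because convert_folder overwrites files in place, it is probably safest to run it on a copy of the posts folder first.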

A note on using the open function

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

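The mode argument of this function accepts the following characters:

Character   Meaning
r           open for reading (default)
w           open for writing, truncating the file first
x           open for exclusive creation, failing if the file already exists
a           open for writing, appending to the end of the file if it exists
b           binary mode
t           text mode (default)
+           open for updating (reading and writing)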
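
To make the difference between w, a, and x concrete, here is a minimal sketch that exercises them on a throwaway file. The file name demo.txt and the variable names are assumptions made for illustration only.

```python
import os
import tempfile

# Work in a temporary directory so that no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')   # hypothetical file name

    # 'w' creates the file, or truncates it if it already exists.
    with open(path, 'w', encoding='utf-8') as f:
        f.write('first\n')

    # 'a' keeps the existing content and appends at the end.
    with open(path, 'a', encoding='utf-8') as f:
        f.write('second\n')

    # 'r' (the default) opens the file for reading only.
    with open(path, 'r', encoding='utf-8') as f:
        after_append = f.read()

    # 'x' refuses to overwrite: it fails here because the file exists.
    try:
        open(path, 'x', encoding='utf-8')
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

    # 'w' again: the old content is discarded before writing.
    with open(path, 'w', encoding='utf-8') as f:
        f.write('third\n')
    with open(path, encoding='utf-8') as f:
        after_truncate = f.read()

print(repr(after_append))
print(exclusive_failed)
print(repr(after_truncate))
```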

Using the readlines function

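The Python file method readlines() reads lines until EOF and returns a list containing them. If the optional sizehint argument is given, it does not read all the way to EOF: it stops reading further lines once the total size of the lines read so far exceeds sizehint (in Python 3 the parameter is called hint). When EOF is encountered immediately, an empty list is returned.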

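sizehint is the approximate amount of data to read from the file. Below is a reading example, assuming there is a file foo.txt with the following content: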

This is 1st line
This is 2nd line
This is 3rd line
This is 4th line
This is 5th line

Then write the following Python file, test.py:

#!/usr/bin/python3

# Open the file for reading and writing ("rw+" is not a valid mode in Python 3)
fo = open("foo.txt", "r+")
print("Name of the file: ", fo.name)

# Assuming the file has the following 5 lines
# This is 1st line
# This is 2nd line
# This is 3rd line
# This is 4th line
# This is 5th line

# Read every line up to EOF
line = fo.readlines()
print("Read Line: %s" % (line))

# The file position is already at EOF, so this returns an empty list
line = fo.readlines(2)
print("Read Line: %s" % (line))

# Close the opened file
fo.close()

Running it produces the following output:

Name of the file:  foo.txt
Read Line: ['This is 1st line\n', 'This is 2nd line\n', 'This is 3rd line\n', 'This is 4th line\n', 'This is 5th line']
Read Line: []
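
The second call returns an empty list because the first readlines() already moved the file position to EOF, so the size hint never comes into play. The sketch below, which recreates foo.txt in a temporary directory (an assumption made so that it is self-contained), rewinds with seek(0) before the hinted call to show the hint taking effect.

```python
import os
import tempfile

# Same five lines as foo.txt; the last line has no trailing newline.
content = '\n'.join('This is %s line' % n
                    for n in ('1st', '2nd', '3rd', '4th', '5th'))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'foo.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(content)

    with open(path, 'r', encoding='utf-8') as fo:
        all_lines = fo.readlines()   # reads everything up to EOF
        at_eof = fo.readlines(2)     # nothing left to read
        fo.seek(0)                   # rewind to the start of the file
        hinted = fo.readlines(2)     # stops once the hint is exceeded

print(len(all_lines))
print(at_eof)
print(hinted)
```

With a hint of 2, reading stops after the first complete line, since that line alone is already longer than the hint.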