首先构建一个字典,用来说明被替换的内容与替换后的内容:
DICT = {"我":"鲁迅"}
- oldFile 为想要替换的文档
- newFile 生成的新文档所在的位置
- DICT 中
key
为想要替换掉的文字,value
为输入的新内容
首先定义文档,并打开原有文档:
from docx import Document
document = Document("/data/demo/demo.docx")
为了实现替换,定义函数,实现在 DOCX 中的对象进行替换:
def replace_obj(doc_obj):
for key, value in DICT.items():
if key in doc_obj.text:
print(key, value)
doc_obj.text = doc_obj.text.replace(key, value)
下面先对表格进行遍历,实现替换的功能:
for table in document.tables:
for row in range(len(table.rows)):
for col in range(len(table.columns)):
replace_obj(table.cell(row, col))
我 鲁迅
对段落块对象进行遍历,实现替换的功能:
for para in document.paragraphs:
for i in range(len(para.runs)):
replace_obj(para.runs[i])
我 鲁迅 我 鲁迅
最后将文档保存为一个新文件:
document.save('xx_replaced_file.docx')
import difflib
dfile = Document('/data/demo/demo.docx')
d2file = Document('xx_replaced_file.docx')
d = difflib.HtmlDiff()
idx = 1
with open('xx_diff.html'.format(idx), 'w') as html_fo:
for para1, para2 in zip(dfile.paragraphs, d2file.paragraphs):
html_fo.write(
d.make_file(para1.text.split('\n'),
para2.text.split('\n'))
)
idx = idx+1
表格内容的检查需要读取表格对象,检查表格内是否替换成功。 从输出来看无论是段落内的还是表格内的都已经替换成功了。
另外,替换内容的场景可以有很多,有时并不仅仅是文字替换,还要根据上下文进行区分。 这种时候,可以配合使用 Python 的字符串处理与正则表达式来进一步判断要替换的文字。 这种处理方式用软件就难以完成了,体现了用编程处理的优点。
# %load xx_diff.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=utf-8" />
<title></title>
<style type="text/css">
table.diff {font-family:Courier; border:medium;}
.diff_header {background-color:#e0e0e0}
td.diff_header {text-align:right}
.diff_next {background-color:#c0c0c0}
.diff_add {background-color:#aaffaa}
.diff_chg {background-color:#ffff77}
.diff_sub {background-color:#ffaaaa}
</style>
</head>
<body>
<table class="diff" id="difflib_chg_to0__top"
cellspacing="0" cellpadding="0" rules="groups" >
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<tbody>
<tr><td class="diff_next"><a href="#difflib_chg_to0__top">t</a></td><td class="diff_header" id="from0_1">1</td><td nowrap="nowrap">从百草园到三味书屋</td><td class="diff_next"><a href="#difflib_chg_to0__top">t</a></td><td class="diff_header" id="to0_1">1</td><td nowrap="nowrap">从百草园到三味书屋</td></tr>
</tbody>
</table>
<table class="diff" summary="Legends">
<tr> <th colspan="2"> Legends </th> </tr>
<tr> <td> <table border="" summary="Colors">
<tr><th> Colors </th> </tr>
<tr><td class="diff_add"> Added </td></tr>
<tr><td class="diff_chg">Changed</td> </tr>
<tr><td class="diff_sub">Deleted</td> </tr>
</table></td>
<td> <table border="" summary="Links">
<tr><th colspan="2"> Links </th> </tr>
<tr><td>(f)irst change</td> </tr>
<tr><td>(n)ext change</td> </tr>
<tr><td>(t)op</td> </tr>
</table></td> </tr>
</table>
</body>
</html>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=utf-8" />
<title></title>
<style type="text/css">
table.diff {font-family:Courier; border:medium;}
.diff_header {background-color:#e0e0e0}
td.diff_header {text-align:right}
.diff_next {background-color:#c0c0c0}
.diff_add {background-color:#aaffaa}
.diff_chg {background-color:#ffff77}
.diff_sub {background-color:#ffaaaa}
</style>
</head>
<body>
<table class="diff" id="difflib_chg_to1__top"
cellspacing="0" cellpadding="0" rules="groups" >
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<tbody>
<tr><td class="diff_next"><a href="#difflib_chg_to1__top">t</a></td><td class="diff_header" id="from1_1">1</td><td nowrap="nowrap"></td><td class="diff_next"><a href="#difflib_chg_to1__top">t</a></td><td class="diff_header" id="to1_1">1</td><td nowrap="nowrap"></td></tr>
</tbody>
</table>
<table class="diff" summary="Legends">
<tr> <th colspan="2"> Legends </th> </tr>
<tr> <td> <table border="" summary="Colors">
<tr><th> Colors </th> </tr>
<tr><td class="diff_add"> Added </td></tr>
<tr><td class="diff_chg">Changed</td> </tr>
<tr><td class="diff_sub">Deleted</td> </tr>
</table></td>
<td> <table border="" summary="Links">
<tr><th colspan="2"> Links </th> </tr>
<tr><td>(f)irst change</td> </tr>
<tr><td>(n)ext change</td> </tr>
<tr><td>(t)op</td> </tr>
</table></td> </tr>
</table>
</body>
</html>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=utf-8" />
<title></title>
<style type="text/css">
table.diff {font-family:Courier; border:medium;}
.diff_header {background-color:#e0e0e0}
td.diff_header {text-align:right}
.diff_next {background-color:#c0c0c0}
.diff_add {background-color:#aaffaa}
.diff_chg {background-color:#ffff77}
.diff_sub {background-color:#ffaaaa}
</style>
</head>
<body>
<table class="diff" id="difflib_chg_to2__top"
cellspacing="0" cellpadding="0" rules="groups" >
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<tbody>
<tr><td class="diff_next" id="difflib_chg_to2__0"><a href="#difflib_chg_to2__top">t</a></td><td class="diff_header" id="from2_1">1</td><td nowrap="nowrap"><span class="diff_chg">我</span>家的后面有一个很大的园,相传叫作百草园。现在是早已并屋子一起卖给的子孙了,连那最末次的相见也已经隔了七八年,其中似乎确凿只有一些野草;但那时却是<span class="diff_chg">我</span>的乐园。</td><td class="diff_next"><a href="#difflib_chg_to2__top">t</a></td><td class="diff_header" id="to2_1">1</td><td nowrap="nowrap"><span class="diff_chg">鲁迅</span>家的后面有一个很大的园,相传叫作百草园。现在是早已并屋子一起卖给的子孙了,连那最末次的相见也已经隔了七八年,其中似乎确凿只有一些野草;但那时却是<span class="diff_chg">鲁迅</span>的乐园。</td></tr>
</tbody>
</table>
<table class="diff" summary="Legends">
<tr> <th colspan="2"> Legends </th> </tr>
<tr> <td> <table border="" summary="Colors">
<tr><th> Colors </th> </tr>
<tr><td class="diff_add"> Added </td></tr>
<tr><td class="diff_chg">Changed</td> </tr>
<tr><td class="diff_sub">Deleted</td> </tr>
</table></td>
<td> <table border="" summary="Links">
<tr><th colspan="2"> Links </th> </tr>
<tr><td>(f)irst change</td> </tr>
<tr><td>(n)ext change</td> </tr>
<tr><td>(t)op</td> </tr>
</table></td> </tr>
</table>
</body>
</html>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=utf-8" />
<title></title>
<style type="text/css">
table.diff {font-family:Courier; border:medium;}
.diff_header {background-color:#e0e0e0}
td.diff_header {text-align:right}
.diff_next {background-color:#c0c0c0}
.diff_add {background-color:#aaffaa}
.diff_chg {background-color:#ffff77}
.diff_sub {background-color:#ffaaaa}
</style>
</head>
<body>
<table class="diff" id="difflib_chg_to3__top"
cellspacing="0" cellpadding="0" rules="groups" >
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<tbody>
<tr><td class="diff_next"><a href="#difflib_chg_to3__top">t</a></td><td class="diff_header" id="from3_1">1</td><td nowrap="nowrap"></td><td class="diff_next"><a href="#difflib_chg_to3__top">t</a></td><td class="diff_header" id="to3_1">1</td><td nowrap="nowrap"></td></tr>
</tbody>
</table>
<table class="diff" summary="Legends">
<tr> <th colspan="2"> Legends </th> </tr>
<tr> <td> <table border="" summary="Colors">
<tr><th> Colors </th> </tr>
<tr><td class="diff_add"> Added </td></tr>
<tr><td class="diff_chg">Changed</td> </tr>
<tr><td class="diff_sub">Deleted</td> </tr>
</table></td>
<td> <table border="" summary="Links">
<tr><th colspan="2"> Links </th> </tr>
<tr><td>(f)irst change</td> </tr>
<tr><td>(n)ext change</td> </tr>
<tr><td>(t)op</td> </tr>
</table></td> </tr>
</table>
</body>
</html>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=utf-8" />
<title></title>
<style type="text/css">
table.diff {font-family:Courier; border:medium;}
.diff_header {background-color:#e0e0e0}
td.diff_header {text-align:right}
.diff_next {background-color:#c0c0c0}
.diff_add {background-color:#aaffaa}
.diff_chg {background-color:#ffff77}
.diff_sub {background-color:#ffaaaa}
</style>
</head>
<body>
<table class="diff" id="difflib_chg_to4__top"
cellspacing="0" cellpadding="0" rules="groups" >
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<tbody>
<tr><td class="diff_next"><a href="#difflib_chg_to4__top">t</a></td><td class="diff_header" id="from4_1">1</td><td nowrap="nowrap">这是鲁迅的母校:三味书屋</td><td class="diff_next"><a href="#difflib_chg_to4__top">t</a></td><td class="diff_header" id="to4_1">1</td><td nowrap="nowrap">这是鲁迅的母校:三味书屋</td></tr>
</tbody>
</table>
<table class="diff" summary="Legends">
<tr> <th colspan="2"> Legends </th> </tr>
<tr> <td> <table border="" summary="Colors">
<tr><th> Colors </th> </tr>
<tr><td class="diff_add"> Added </td></tr>
<tr><td class="diff_chg">Changed</td> </tr>
<tr><td class="diff_sub">Deleted</td> </tr>
</table></td>
<td> <table border="" summary="Links">
<tr><th colspan="2"> Links </th> </tr>
<tr><td>(f)irst change</td> </tr>
<tr><td>(n)ext change</td> </tr>
<tr><td>(t)op</td> </tr>
</table></td> </tr>
</table>
</body>
</html>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=utf-8" />
<title></title>
<style type="text/css">
table.diff {font-family:Courier; border:medium;}
.diff_header {background-color:#e0e0e0}
td.diff_header {text-align:right}
.diff_next {background-color:#c0c0c0}
.diff_add {background-color:#aaffaa}
.diff_chg {background-color:#ffff77}
.diff_sub {background-color:#ffaaaa}
</style>
</head>
<body>
<table class="diff" id="difflib_chg_to5__top"
cellspacing="0" cellpadding="0" rules="groups" >
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<tbody>
<tr><td class="diff_next"><a href="#difflib_chg_to5__top">t</a></td><td class="diff_header" id="from5_1">1</td><td nowrap="nowrap"></td><td class="diff_next"><a href="#difflib_chg_to5__top">t</a></td><td class="diff_header" id="to5_1">1</td><td nowrap="nowrap"></td></tr>
</tbody>
</table>
<table class="diff" summary="Legends">
<tr> <th colspan="2"> Legends </th> </tr>
<tr> <td> <table border="" summary="Colors">
<tr><th> Colors </th> </tr>
<tr><td class="diff_add"> Added </td></tr>
<tr><td class="diff_chg">Changed</td> </tr>
<tr><td class="diff_sub">Deleted</td> </tr>
</table></td>
<td> <table border="" summary="Links">
<tr><th colspan="2"> Links </th> </tr>
<tr><td>(f)irst change</td> </tr>
<tr><td>(n)ext change</td> </tr>
<tr><td>(t)op</td> </tr>
</table></td> </tr>
</table>
</body>
</html>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=utf-8" />
<title></title>
<style type="text/css">
table.diff {font-family:Courier; border:medium;}
.diff_header {background-color:#e0e0e0}
td.diff_header {text-align:right}
.diff_next {background-color:#c0c0c0}
.diff_add {background-color:#aaffaa}
.diff_chg {background-color:#ffff77}
.diff_sub {background-color:#ffaaaa}
</style>
</head>
<body>
<table class="diff" id="difflib_chg_to6__top"
cellspacing="0" cellpadding="0" rules="groups" >
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<tbody>
<tr><td class="diff_next"><a href="#difflib_chg_to6__top">t</a></td><td class="diff_header" id="from6_1">1</td><td nowrap="nowrap">不必说碧绿的菜畦,光滑的石井栏,高大的皂荚树,紫红的桑椹;也不必说鸣蝉在树叶里长吟,肥胖的黄蜂伏在菜花上,轻捷的(云雀)忽然从草间直窜向云霄里去了。单是周围的短短的泥墙根一带,就有无限趣味。</td><td class="diff_next"><a href="#difflib_chg_to6__top">t</a></td><td class="diff_header" id="to6_1">1</td><td nowrap="nowrap">不必说碧绿的菜畦,光滑的石井栏,高大的皂荚树,紫红的桑椹;也不必说鸣蝉在树叶里长吟,肥胖的黄蜂伏在菜花上,轻捷的(云雀)忽然从草间直窜向云霄里去了。单是周围的短短的泥墙根一带,就有无限趣味。</td></tr>
</tbody>
</table>
<table class="diff" summary="Legends">
<tr> <th colspan="2"> Legends </th> </tr>
<tr> <td> <table border="" summary="Colors">
<tr><th> Colors </th> </tr>
<tr><td class="diff_add"> Added </td></tr>
<tr><td class="diff_chg">Changed</td> </tr>
<tr><td class="diff_sub">Deleted</td> </tr>
</table></td>
<td> <table border="" summary="Links">
<tr><th colspan="2"> Links </th> </tr>
<tr><td>(f)irst change</td> </tr>
<tr><td>(n)ext change</td> </tr>
<tr><td>(t)op</td> </tr>
</table></td> </tr>
</table>
</body>
</html>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=utf-8" />
<title></title>
<style type="text/css">
table.diff {font-family:Courier; border:medium;}
.diff_header {background-color:#e0e0e0}
td.diff_header {text-align:right}
.diff_next {background-color:#c0c0c0}
.diff_add {background-color:#aaffaa}
.diff_chg {background-color:#ffff77}
.diff_sub {background-color:#ffaaaa}
</style>
</head>
<body>
<table class="diff" id="difflib_chg_to7__top"
cellspacing="0" cellpadding="0" rules="groups" >
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<tbody>
<tr><td class="diff_next"><a href="#difflib_chg_to7__top">t</a></td><td class="diff_header" id="from7_1">1</td><td nowrap="nowrap"></td><td class="diff_next"><a href="#difflib_chg_to7__top">t</a></td><td class="diff_header" id="to7_1">1</td><td nowrap="nowrap"></td></tr>
</tbody>
</table>
<table class="diff" summary="Legends">
<tr> <th colspan="2"> Legends </th> </tr>
<tr> <td> <table border="" summary="Colors">
<tr><th> Colors </th> </tr>
<tr><td class="diff_add"> Added </td></tr>
<tr><td class="diff_chg">Changed</td> </tr>
<tr><td class="diff_sub">Deleted</td> </tr>
</table></td>
<td> <table border="" summary="Links">
<tr><th colspan="2"> Links </th> </tr>
<tr><td>(f)irst change</td> </tr>
<tr><td>(n)ext change</td> </tr>
<tr><td>(t)op</td> </tr>
</table></td> </tr>
</table>
</body>
</html>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=utf-8" />
<title></title>
<style type="text/css">
table.diff {font-family:Courier; border:medium;}
.diff_header {background-color:#e0e0e0}
td.diff_header {text-align:right}
.diff_next {background-color:#c0c0c0}
.diff_add {background-color:#aaffaa}
.diff_chg {background-color:#ffff77}
.diff_sub {background-color:#ffaaaa}
</style>
</head>
<body>
<table class="diff" id="difflib_chg_to8__top"
cellspacing="0" cellpadding="0" rules="groups" >
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<tbody>
<tr><td class="diff_next"><a href="#difflib_chg_to8__top">t</a></td><td class="diff_header" id="from8_1">1</td><td nowrap="nowrap"></td><td class="diff_next"><a href="#difflib_chg_to8__top">t</a></td><td class="diff_header" id="to8_1">1</td><td nowrap="nowrap"></td></tr>
</tbody>
</table>
<table class="diff" summary="Legends">
<tr> <th colspan="2"> Legends </th> </tr>
<tr> <td> <table border="" summary="Colors">
<tr><th> Colors </th> </tr>
<tr><td class="diff_add"> Added </td></tr>
<tr><td class="diff_chg">Changed</td> </tr>
<tr><td class="diff_sub">Deleted</td> </tr>
</table></td>
<td> <table border="" summary="Links">
<tr><th colspan="2"> Links </th> </tr>
<tr><td>(f)irst change</td> </tr>
<tr><td>(n)ext change</td> </tr>
<tr><td>(t)op</td> </tr>
</table></td> </tr>
</table>
</body>
</html>