Load the template file (html_base.html) with Jinja2:
from jinja2 import FileSystemLoader,Environment
env = Environment(loader=FileSystemLoader('/data/demo/templates'))
template = env.get_template('html_base.html')
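The loading step can be tried without a templates directory. The following is a minimal sketch, assuming a template that exposes a single `content` variable; the template text here is a hypothetical stand-in for html_base.html, and `DictLoader` is swapped in for `FileSystemLoader` only so the example is self-contained.

```python
from jinja2 import DictLoader, Environment

# Hypothetical stand-in for html_base.html; the real file is not reproduced here.
templates = {
    "html_base.html": "<html><body>\n{{ content }}\n</body></html>",
}

# DictLoader serves templates from a dict instead of a directory on disk.
env = Environment(loader=DictLoader(templates))
template = env.get_template("html_base.html")

# Autoescaping is off by default in Environment, so the markup is inserted as-is.
page = template.render(content='<div class="thumb">demo</div>')
print(page)
```

Because autoescaping is disabled by default, any HTML passed in as `content` reaches the page unescaped, which is what this workflow relies on.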
openpyxl is a Python library for reading and writing Excel documents.
Create a Workbook object by reading the Excel document with openpyxl's load_workbook function:
import openpyxl
wb = openpyxl.load_workbook('xx_img_contents.xlsx')
Get the currently active Worksheet:
sheet = wb.active
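For readers who do not have xx_img_contents.xlsx at hand, here is a small sketch that writes a workbook to a temporary file and reads it back the same way. The file name, header names, and row values are placeholders chosen for illustration, not the contents of the original spreadsheet.

```python
import os
import tempfile

import openpyxl

# Build a small workbook in memory; the header and data row are placeholders.
wb_out = openpyxl.Workbook()
ws = wb_out.active
ws.append(["title", "description", "height", "width"])
ws.append(["sphinx_1.jpg", "Figure 1 description", 150, 150])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_img_contents.xlsx")
    wb_out.save(path)

    # Read the file back exactly as in the steps above.
    wb = openpyxl.load_workbook(path)
    sheet = wb.active
    max_row = sheet.max_row
    first_title = sheet.cell(row=2, column=1).value

print(max_row, first_title)
```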
Iterate over the sheet to read each row's content and accumulate the result in the variable html_str:
html_str = ''
# Row 1 is assumed to be the header, so start at row 2.
for i in range(2, sheet.max_row + 1):
    img_title = sheet.cell(row=i, column=1).value
    img_misc = sheet.cell(row=i, column=2).value
    img_length = sheet.cell(row=i, column=3).value
    img_width = sheet.cell(row=i, column=4).value
    # Skip rows that have no image file name.
    if not img_title:
        continue
    html_content = '<div class="thumb"><img src="imgs/' + img_title + '"' + ' width=' + str(img_width) + ' height=' + str(img_length) + '/> ' + img_misc + ' </div>'
    html_str += html_content
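The string concatenation above fails if a description cell is empty, and it leaves the attribute values unquoted. One possible variant is sketched below, not the original method: it works on plain tuples, quotes the attributes, and escapes the text. The helper name `build_thumbs` and the sample rows are made up for illustration. With openpyxl, tuples of this shape can be obtained from `sheet.iter_rows(min_row=2, max_col=4, values_only=True)`.

```python
from html import escape


def build_thumbs(rows):
    """Build thumbnail <div> markup from (title, description, height, width) tuples."""
    parts = []
    for title, desc, height, width in rows:
        if not title:
            continue  # skip rows without an image file name
        parts.append(
            '<div class="thumb"><img src="imgs/{src}" width="{w}" height="{h}"/> {desc} </div>'.format(
                src=escape(str(title), quote=True),
                w=escape(str(width), quote=True),
                h=escape(str(height), quote=True),
                # An empty description cell arrives as None; render it as an empty string.
                desc=escape("" if desc is None else str(desc)),
            )
        )
    return "".join(parts)


# Placeholder rows standing in for spreadsheet content.
rows = [
    ("sphinx_1.jpg", "Figure 1 <draft>", 150, 150),
    (None, "row without an image", 1, 1),
]
html_str = build_thumbs(rows)
print(html_str)
```

Note that this variant produces slightly different markup from the loop above (quoted `width` and `height`), so the rendered page will not match the original output character for character.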
Render the template, passing html_str as the value of the variable content in html_base.html:
template.render(content=html_str)
'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n<html xmlns="http://www.w3.org/1999/xhtml">\n<head>\n <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>\n <title> </title>\n</head>\n<body>\n<div class="thumb"><img src="imgs/sphinx_1.jpg" width=150 height=150/> 图1描述 </div><div class="thumb"><img src="imgs/sphinx_2.jpg" width=151 height=151/> 图2描述 </div><div class="thumb"><img src="imgs/sphinx_3.jpg" width=152 height=152/> 图3描述 </div><div class="thumb"><img src="imgs/Tornado_1.jpg" width=153 height=153/> 图4描述 </div><div class="thumb"><img src="imgs/Tornado_2.jpg" width=154 height=154/> 图5描述 </div><div class="thumb"><img src="imgs/Tornado_3.jpg" width=155 height=155/> 图6描述 </div><div class="thumb"><img src="imgs/Tornado_4.jpg" width=156 height=156/> 图7描述 </div>\n</body>\n</html>'
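`template.render` only returns a string, so the page still has to be written to disk to be viewed in a browser. A minimal sketch of that step follows; the output file name index.html and the stand-in string are assumptions, and UTF-8 is chosen because the template declares `charset=utf-8`.

```python
import os
import tempfile

# Stand-in for the string returned by template.render(content=html_str).
rendered = "<html><body>\n图1描述\n</body></html>"

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "index.html")  # hypothetical output name

    # Write with an explicit encoding so non-ASCII descriptions survive on any platform.
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(rendered)

    # Read the file back to confirm the text round-trips unchanged.
    with open(out_path, encoding="utf-8") as f:
        round_trip = f.read()

print(round_trip == rendered)
```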
Exporting images and text to Word
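Approach: parse the page at 'http://drr.ikcest.org/', save the image locally, add the image to a Word document, and finally delete the local image. (To export the images and text from the HTML page generated from the Excel file instead, only the url and the parsing rules in the code below need to be changed.)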
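This section uses the python-docx library. python-docx can create .docx documents and modify existing ones, including paragraphs, page breaks, tables, pictures, headings, and styles.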
Installing python-docx:
pip install python-docx
Requirement already satisfied: python-docx in /opt/conda/lib/python3.12/site-packages (1.1.2)
Requirement already satisfied: lxml>=3.1.0 in /opt/conda/lib/python3.12/site-packages (from python-docx) (5.4.0)
Requirement already satisfied: typing_extensions>=4.9.0 in /opt/conda/lib/python3.12/site-packages (from python-docx) (4.12.2)
Note: you may need to restart the kernel to use updated packages.
Import the required libraries:
import requests
from bs4 import BeautifulSoup
import os
import docx
from docx import Document
from docx.shared import Inches
Parse the page:
url = 'https://ikcest-drr.data.ac.cn/'
html = requests.get(url).content
soup = BeautifulSoup(html,'html.parser')
imgs_table = soup.find('table',{"class":"table"})
img=str(imgs_table.find('div',{"class":"col-sm-4"})).split('src="')[1].split('"')[0]
img_src='https://ikcest-drr.data.ac.cn'+img
img_title=imgs_table.find('div',{"class":"col-sm-8"}).text
img
'/static/upload/82/827587c0-2fda-11eb-8efe-00163e0618d6_m.jpg'
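Splitting the tag's string form on `src="` works, but it depends on the exact serialization of the markup. A sketch of an alternative follows, assuming the `src` attribute sits on an `<img>` tag inside the `col-sm-4` div. The HTML snippet and the demo_m.jpg path are hypothetical and only mimic the structure that the selectors above expect; the live page may differ.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

base_url = "https://ikcest-drr.data.ac.cn/"

# Hypothetical markup mimicking the structure the selectors expect.
sample_html = """
<table class="table"><tr><td>
  <div class="col-sm-4"><img src="/static/upload/demo_m.jpg"></div>
  <div class="col-sm-8"> Demo title </div>
</td></tr></table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
imgs_table = soup.find("table", {"class": "table"})

# Read the attribute from the tag object instead of splitting its string form.
img_tag = imgs_table.find("div", {"class": "col-sm-4"}).find("img")
img_src = urljoin(base_url, img_tag["src"])  # resolves the site-relative path

# get_text(strip=True) drops the surrounding whitespace from the title.
img_title = imgs_table.find("div", {"class": "col-sm-8"}).get_text(strip=True)

print(img_src)
print(img_title)
```

`urljoin` also handles the case where the page already gives an absolute image URL, which plain string concatenation would break.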
Save the image locally:
img_name = 'xx_drr_img.jpg'
# The with block closes the file automatically, so no explicit f.close() is needed.
with open(img_name, 'wb') as f:
    response = requests.get(img_src).content
    f.write(response)
Create a document object and add the text and the image to it:
document = Document()
document.add_paragraph(img_title)
document.add_picture(img_name)
<docx.shape.InlineShape at 0x7f460480e210>
Save the document:
document.save('xx_tuwen.docx')  # python-docx writes the .docx format, so use a .docx extension
Delete the locally saved image:
os.remove(img_name)
For more detailed layout and formatting options, see the python-docx documentation.