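Each line in a CSV file represents one row of a spreadsheet, and commas separate the cells in that row.
This chapter uses the file shown below as its example in the interactive environment. Alternatively, you can type the text into a text editor and save it as example.csv.
The contents of the example file (here /data/demo/demo.csv) are as follows: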
!cat /data/demo/demo.csv
序号,URL,TITLE,Abstract,标题,摘要 1,http://drr.ikcest.org/info/9ad53,Spatio-temporal Distribution of Desertification Disaster along the China-Mongolia railway (Mongolia section) in 2000 and 2015,"This dataset described the Spatio-temporal Distribution of Desertification Disaster along the China-Mongolia railway (Mongolia section) in 2000 and 2015, which mainly record the degree of desertification, and spatiotemporal distribution information. They were collected and organized by the Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences. This dataset was composed of 6 vector files and 4 grid files. It can be used in the study of desertification. And it can provide important basis for monitoring and prevention of desertification disaster.",中蒙铁路沿线(蒙古段)荒漠化灾害时空分布数据集(2000、2015),本数据集为2000、2015年中蒙铁路沿线(蒙古段)荒漠化灾害时空分布数据,其主要记录荒漠化程度及荒漠化时空分布特点,共6个矢量文件和4个栅格文件。它们由中国科学院地理科学与资源研究所收集和组织,其可用于荒漠化研究,为荒漠化灾害监测与防控提供重要依据。 2,http://drr.ikcest.org/info/98d74,"Meteorological resource database of "" Belt and Road"" China-Mongolia-Russia economic corridor","This dataset described the distribution of meteorological resource in China-Mongolia-Russia economic corridor, which mainly record the travel climate comfortable degree in the cross-border region between China and Russia. They were collected and organized by the Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences. This dataset was composed of 56 raster files. 
It can be used to study meteorological disasters and provide important basis for disaster prevention and reduction and reducing the negative effects of meteorological disasters.",“一带一路”中蒙俄经济走廊气象资源数据,"该数据集描述了中蒙经济走廊的气象资源分布,主要记录了中俄跨境区域的旅游气候舒适度。它们由中国科学院地理科学与自然资源研究所收集和组织。该 数据集由56个光栅文件组成。它可用于研究气象灾害,为防灾减灾和减少气象灾害的负面影响提供重要依据。" 3,http://drr.ikcest.org/info/91462,Dataset of desertification related land cover distribution along China-Mongolia railway (Mongolia section) in 2015,"This dataset was the land cover distribution data related to desertification along the China-Mongolia railway (Mongolia section) in 2015. This dataset used the object-oriented remote sensing image interpretation method to obtain the desertification data with a resolution of 30 meters along the China-Mongolia railway (Mongolia section) in 2015. It was collected and organized by the Institute of Geographic Sciences and Natural Resources Research, CAS. It can be used to study the risk assessment of desertification in China-Mongolia railway, providing an important basis for preventing sandstorms, floods and other disasters caused by desertification and alleviating the negative impact of desertification.",中蒙铁路沿线(蒙古段)荒漠化土地覆被分布数据集(2015),"该数据集是2015年中蒙铁路(蒙古段)荒漠化相关的土地覆盖分布数据。该数据集采用面向对象的遥感影像解译方法获取2015年中蒙铁路(蒙古段)沿海30米的 荒漠化数据。由中国科学院地理科学与资源研究所收集整理。可用于研究中蒙铁路荒漠化风险评估,为防治沙漠化造成的沙尘暴,洪涝等灾害,减轻荒漠化的负面影响提供重要依据。"
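CSV files are simple and lack many of the features of an Excel spreadsheet. For example, in a CSV file:
- values have no types; everything is a string;
- there are no settings for font size or color;
- there are no multiple worksheets;
- cell widths and heights cannot be specified;
- cells cannot be merged;
- images and charts cannot be embedded.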
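As a small illustration of the first point, the sketch below parses a made-up two-row CSV string and converts the numeric cells by hand. The column names and values are assumptions for the demonstration and are not part of the chapter's data file.

```python
import csv
import io

# Hypothetical sample data, held in memory so the example needs no file.
sample = io.StringIO("name,count,price\napple,3,1.5\n", newline="")

rows = list(csv.reader(sample))
header, first = rows[0], rows[1]

# Every cell comes back as a str, even the ones that look numeric.
print([type(cell).__name__ for cell in first])

# Convert explicitly when numbers are needed.
count = int(first[1])
price = float(first[2])
total = count * price
print(total)
```

The conversion step is the program's responsibility, since the file itself carries no type information.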
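The advantage of CSV files is their simplicity. They are widely supported by many kinds of programs, can be viewed in a text editor (including IDLE's file editor), and are a straightforward way to represent spreadsheet data. The CSV format is exactly what its name says: a text file of comma-separated values.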
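Because CSV files are just text files, you might be tempted to read them into a string and then process that string.
For example, since the cells in a CSV file are separated by commas, you might simply call the split() method on each line of text to get the values.
But not every comma in a CSV file marks the boundary between two cells.
CSV files also have their own escaping conventions, which allow commas and other characters to be part of a value, and the split() method does not handle them.
Because of these pitfalls, you should use the csv module to read and write CSV files in Python.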
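To make the pitfall concrete, here is a minimal sketch that compares split() with csv.reader() on a single line whose first cell contains a comma. The sample line is an assumption chosen for illustration.

```python
import csv

# A hypothetical line in which the first cell contains a comma.
line = '"Hello, world!",eggs,bacon,ham'

# Naive splitting cuts the quoted cell in two and keeps the quote marks.
naive = line.split(',')
print(len(naive), naive)

# csv.reader accepts any iterable of lines and honors the quoting.
parsed = next(csv.reader([line]))
print(len(parsed), parsed)
```

The naive version reports five pieces, while the csv module recovers the four intended cells.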
import csv
exampleFile = open('/data/demo/demo.csv')
exampleReader = csv.reader(exampleFile)
exampleData = list(exampleReader)
exampleData
[['序号', 'URL', 'TITLE', 'Abstract', '标题', '摘要'], ['1', 'http://drr.ikcest.org/info/9ad53', 'Spatio-temporal Distribution of Desertification Disaster along the China-Mongolia railway (Mongolia section) in 2000 and 2015', 'This dataset described the Spatio-temporal Distribution of Desertification Disaster along the China-Mongolia railway (Mongolia section) in 2000 and 2015, which mainly record the degree of desertification, and spatiotemporal distribution information. They were collected and organized by the Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences. This dataset was composed of 6 vector files and 4 grid files. It can be used in the study of desertification. And it can provide important basis for monitoring and prevention of desertification disaster.', '中蒙铁路沿线(蒙古段)荒漠化灾害时空分布数据集(2000、2015)', '本数据集为2000、2015年中蒙铁路沿线(蒙古段)荒漠化灾害时空分布数据,其主要记录荒漠化程度及荒漠化时空分布特点,共6个矢量文件和4个栅格文件。它们由中国科学院地理科学与资源研究所收集和组织,其可用于荒漠化研究,为荒漠化灾害监测与防控提供重要依据。'], ['2', 'http://drr.ikcest.org/info/98d74', 'Meteorological resource database of " Belt and Road" China-Mongolia-Russia economic corridor', 'This dataset described the distribution of meteorological resource in China-Mongolia-Russia economic corridor, which mainly record the travel climate comfortable degree in the cross-border region between China and Russia. They were collected and organized by the Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences. This dataset was composed of 56 raster files. 
It can be used to study meteorological disasters and provide important basis for disaster prevention and reduction and reducing the negative effects of meteorological disasters.', '“一带一路”中蒙俄经济走廊气象资源数据', '该数据集描述了中蒙经济走廊的气象资源分布,主要记录了中俄跨境区域的旅游气候舒适度。它们由中国科学院地理科学与自然资源研究所收集和组织。该\n数据集由56个光栅文件组成。它可用于研究气象灾害,为防灾减灾和减少气象灾害的负面影响提供重要依据。'], ['3', 'http://drr.ikcest.org/info/91462', 'Dataset of desertification related land cover distribution along China-Mongolia railway (Mongolia section) in 2015', 'This dataset was the land cover distribution data related to desertification along the China-Mongolia railway (Mongolia section) in 2015. This dataset used the object-oriented remote sensing image interpretation method to obtain the desertification data with a resolution of 30 meters along the China-Mongolia railway (Mongolia section) in 2015. It was collected and organized by the Institute of Geographic Sciences and Natural Resources Research, CAS. It can be used to study the risk assessment of desertification in China-Mongolia railway, providing an important basis for preventing sandstorms, floods and other disasters caused by desertification and alleviating the negative impact of desertification.', '中蒙铁路沿线(蒙古段)荒漠化土地覆被分布数据集(2015)', '该数据集是2015年中蒙铁路(蒙古段)荒漠化相关的土地覆盖分布数据。该数据集采用面向对象的遥感影像解译方法获取2015年中蒙铁路(蒙古段)沿海30米的\n荒漠化数据。由中国科学院地理科学与资源研究所收集整理。可用于研究中蒙铁路荒漠化风险评估,为防治沙漠化造成的沙尘暴,洪涝等灾害,减轻荒漠化的负面影响提供重要依据。']]
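The csv module comes with Python, so you can import it without installing anything.
To read a CSV file with the csv module, first open it with the open() function, just as you would any other text file. But instead of calling the read() or readlines() method on the file object that open() returns, pass it to the csv.reader() function. This returns a reader object for you to use. Note that you cannot pass a filename string directly to csv.reader().
The most direct way to access the values in the reader object is to convert it to a plain Python list by passing it to list(). Calling list() on the reader object returns a list of lists, which you can store in a variable such as exampleData. Entering exampleData in the interactive environment displays the list of lists.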
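The same open, csv.reader(), list() sequence can also be written with a with statement, which closes the file automatically. The sketch below is one possible variant, not the chapter's own code: it writes a tiny throwaway file first so that it runs on its own, and the file name sample.csv, its contents, and the encoding='utf-8' argument are all assumptions. The csv module documentation recommends opening files with newline=''.

```python
import csv
import os
import tempfile

# Write a tiny throwaway file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'sample.csv')
with open(path, 'w', newline='', encoding='utf-8') as f:
    f.write('id,title\n1,"first, with comma"\n')

# The with statement closes the file when the block ends.
with open(path, newline='', encoding='utf-8') as f:
    data = list(csv.reader(f))

print(data)
```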
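Now that the CSV file is represented as a list of lists, you can access the value at a particular row and column with the expression exampleData[row][col], where row is the index of one of the lists in exampleData and col is the index of the item you want from that list.
Enter the following in the interactive environment: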
exampleData[0][0]
'序号'
exampleData[0][1]
'URL'
exampleData[0][2]
'TITLE'
exampleData[1][1]
'http://drr.ikcest.org/info/9ad53'
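Building on the row and column indexing above, a whole column can be pulled out by looking up its position in the header row. The sketch below uses a small hypothetical stand-in for exampleData (the example.com URLs and titles are invented) so that it runs without the data file.

```python
# Hypothetical stand-in with the same shape as exampleData:
# a header row followed by one list per record.
data = [
    ['id', 'URL', 'TITLE'],
    ['1', 'http://example.com/a', 'First'],
    ['2', 'http://example.com/b', 'Second'],
]

header = data[0]
col = header.index('URL')  # position of the column we want

# Skip the header row and collect that column from every record.
urls = [row[col] for row in data[1:]]
print(urls)
```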
import csv
exampleFile = open('/data/demo/demo.csv')
exampleReader = csv.reader(exampleFile)
for row in exampleReader:
    print('Row #' + str(exampleReader.line_num) + ' ' + str(row))
Row #1 ['序号', 'URL', 'TITLE', 'Abstract', '标题', '摘要'] Row #2 ['1', 'http://drr.ikcest.org/info/9ad53', 'Spatio-temporal Distribution of Desertification Disaster along the China-Mongolia railway (Mongolia section) in 2000 and 2015', 'This dataset described the Spatio-temporal Distribution of Desertification Disaster along the China-Mongolia railway (Mongolia section) in 2000 and 2015, which mainly record the degree of desertification, and spatiotemporal distribution information. They were collected and organized by the Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences. This dataset was composed of 6 vector files and 4 grid files. It can be used in the study of desertification. And it can provide important basis for monitoring and prevention of desertification disaster.', '中蒙铁路沿线(蒙古段)荒漠化灾害时空分布数据集(2000、2015)', '本数据集为2000、2015年中蒙铁路沿线(蒙古段)荒漠化灾害时空分布数据,其主要记录荒漠化程度及荒漠化时空分布特点,共6个矢量文件和4个栅格文件。它们由中国科学院地理科学与资源研究所收集和组织,其可用于荒漠化研究,为荒漠化灾害监测与防控提供重要依据。'] Row #4 ['2', 'http://drr.ikcest.org/info/98d74', 'Meteorological resource database of " Belt and Road" China-Mongolia-Russia economic corridor', 'This dataset described the distribution of meteorological resource in China-Mongolia-Russia economic corridor, which mainly record the travel climate comfortable degree in the cross-border region between China and Russia. They were collected and organized by the Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences. This dataset was composed of 56 raster files. 
It can be used to study meteorological disasters and provide important basis for disaster prevention and reduction and reducing the negative effects of meteorological disasters.', '“一带一路”中蒙俄经济走廊气象资源数据', '该数据集描述了中蒙经济走廊的气象资源分布,主要记录了中俄跨境区域的旅游气候舒适度。它们由中国科学院地理科学与自然资源研究所收集和组织。该\n数据集由56个光栅文件组成。它可用于研究气象灾害,为防灾减灾和减少气象灾害的负面影响提供重要依据。'] Row #6 ['3', 'http://drr.ikcest.org/info/91462', 'Dataset of desertification related land cover distribution along China-Mongolia railway (Mongolia section) in 2015', 'This dataset was the land cover distribution data related to desertification along the China-Mongolia railway (Mongolia section) in 2015. This dataset used the object-oriented remote sensing image interpretation method to obtain the desertification data with a resolution of 30 meters along the China-Mongolia railway (Mongolia section) in 2015. It was collected and organized by the Institute of Geographic Sciences and Natural Resources Research, CAS. It can be used to study the risk assessment of desertification in China-Mongolia railway, providing an important basis for preventing sandstorms, floods and other disasters caused by desertification and alleviating the negative impact of desertification.', '中蒙铁路沿线(蒙古段)荒漠化土地覆被分布数据集(2015)', '该数据集是2015年中蒙铁路(蒙古段)荒漠化相关的土地覆盖分布数据。该数据集采用面向对象的遥感影像解译方法获取2015年中蒙铁路(蒙古段)沿海30米的\n荒漠化数据。由中国科学院地理科学与资源研究所收集整理。可用于研究中蒙铁路荒漠化风险评估,为防治沙漠化造成的沙尘暴,洪涝等灾害,减轻荒漠化的负面影响提供重要依据。']
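The row numbers in the output above jump from 2 to 4 and then to 6. This is consistent with the documented behavior of line_num, which counts the lines read from the source rather than the records returned. A record whose quoted cell contains a newline spans two lines. The sketch below reproduces the effect with hypothetical in-memory data.

```python
import csv
import io

# Hypothetical data: the second record has a quoted cell spanning two lines.
text = 'id,note\n1,"line one\nline two"\n2,plain\n'
reader = csv.reader(io.StringIO(text, newline=''))

seen = []
for row in reader:
    # line_num is the number of source lines consumed so far.
    seen.append((reader.line_num, row))
    print(reader.line_num, row)
```

If a record counter is needed instead, wrapping the reader in enumerate() is one option.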
import csv
outputFile = open('xx_output.csv', 'w', newline='')
outputWriter = csv.writer(outputFile)
outputWriter.writerow(['spam', 'eggs', 'bacon', 'ham'])
21
outputWriter.writerow(['Hello, world!', 'eggs', 'bacon', 'ham'])
32
outputWriter.writerow([1, 2, 3.141592, 4])
16
outputFile.close()
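First, call open() and pass it 'w' to open a file in write mode. This returns a file object, which you then pass to csv.writer() to create a writer object.
On Windows, you also need to pass an empty string for the newline keyword argument of the open() function. The technical reasons for this are beyond the scope of this book. If you forget to set the newline argument, the rows in xx_output.csv will be double-spaced.
The writerow() method of the writer object takes a list argument. Each value in the list is placed in its own cell in the output CSV file. The return value of writerow() is the number of characters written to the file for that row (including the line-ending characters).
The file produced by this code looks like this: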
spam,eggs,bacon,ham
"Hello, world!",eggs,bacon,ham
1,2,3.141592,4
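To see exactly which characters the writer emits, including the quotes it adds around a value containing a comma, the rows can be written to an in-memory buffer instead of a file. This is a sketch under the assumption that io.StringIO stands in for the output file. The names buffer, n1, and n2 are illustrative.

```python
import csv
import io

# newline='' keeps the writer's line endings untouched, as with open().
buffer = io.StringIO(newline='')
writer = csv.writer(buffer)

n1 = writer.writerow(['spam', 'eggs', 'bacon', 'ham'])
n2 = writer.writerow(['Hello, world!', 'eggs', 'bacon', 'ham'])

# The return values are the character counts, line endings included.
print(n1, n2)

# repr() makes the quoting and the line-ending characters visible.
print(repr(buffer.getvalue()))
```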
import csv
csvFile = open('xx_example.tsv', 'w', newline='')
csvWriter = csv.writer(csvFile, delimiter='\t', lineterminator='\n\n')
csvWriter.writerow(['apples', 'oranges', 'grapes'])
23
csvWriter.writerow(['eggs', 'bacon', 'ham'])
16
csvWriter.writerow(['spam', 'spam', 'spam', 'spam', 'spam', 'spam'])
31
csvFile.close()
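This changes the delimiter and the line terminator in the file. The delimiter is the character that appears between the cells of a row. By default, the delimiter of a CSV file is a comma. The line terminator is the character sequence that comes at the end of a row. By default, csv.writer() ends each row with '\r\n'. You can change these to different values with the delimiter and lineterminator keyword arguments of csv.writer().
Passing delimiter='\t' and lineterminator='\n\n' changes the character between cells to a tab and the characters between rows to two newlines.
writerow() is called three times, giving three rows.
This produces the file xx_example.tsv, with the following contents: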
!more xx_example.tsv
apples	oranges	grapes

eggs	bacon	ham

spam	spam	spam	spam	spam	spam

Since the cells are separated by tabs, the file uses the .tsv extension, which stands for tab-separated values.
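Reading such a file back needs the same delimiter argument. The sketch below parses text with the same shape as the file above from an in-memory string (an assumption made so the example runs without the file). The reader returns an empty list for each blank line, so those are filtered out.

```python
import csv
import io

# Same shape as the file above: tab-separated cells, rows ended by two newlines.
text = 'apples\toranges\tgrapes\n\neggs\tbacon\tham\n\n'
reader = csv.reader(io.StringIO(text, newline=''), delimiter='\t')

# Blank lines come back as empty lists, so drop them.
rows = [row for row in reader if row]
print(rows)
```

No lineterminator argument is passed here, because the reader recognizes line endings on its own.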