Hands-On Artificial Intelligence for IoT
上QQ阅读APP看书,第一时间看更新

Working with CSV files with the csv module

In Python, the csv module provides classes and methods for reading and writing CSV files. The csv.reader method creates a reader object from which rows can be read iteratively. Each time a row is read from the file, the reader object returns a list of fields. For example, the following code demonstrates reading the data file and printing rows:

import csv
import os

with open(os.path.join(data_folder,data_file),newline='') as csvfile:
csvreader = csv.reader(csvfile)
for row in csvreader:
print(row)

The rows are printed as a list of field values:

['date', 'time', 'global_active_power', 'global_reactive_power', 'voltage', 'global_intensity', 'sub_metering_1', 'sub_metering_2', 'sub_metering_3'] ['0007-01-01', '00:00:00', '2.58', '0.136', '241.97', '10.6', '0', '0', '0'] ['0007-01-01', '00:01:00', '2.552', '0.1', '241.75', '10.4', '0', '0', '0'] ['0007-01-01', '00:02:00', '2.55', '0.1', '241.64', '10.4', '0', '0', '0']

The csv.writer method returns an object that can be used to write rows to a file. As an example, the following code writes the first 10 rows of the file to a temporary file and then prints it:

# read the file and write first ten rows
with open(os.path.join(data_folder, data_file), newline='') as csvfile, \
open(os.path.join(data_folder, 'temp.csv'), 'w', newline='') as tempfile:
csvreader = csv.reader(csvfile)
csvwriter = csv.writer(tempfile)
for row, i in zip(csvreader, range(10)):
csvwriter.writerow(row)

# read and print the newly written file
with open(os.path.join(data_folder, 'temp.csv'), newline='') as tempfile:
csvreader = csv.reader(tempfile)
for row in csvreader:
print(row)

The delimiter field and the quoting field characters are important attributes that you can set while creating reader and writer objects.

By default, the delimiter field is , and the other delimiters are specified with the delimiter argument to the reader or writer functions. For example, the following code saves the file with | as delimiter:

    # read the file and write first ten rows with '|' delimiter
with open(os.path.join(data_folder, data_file), newline='') as csvfile, \
open(os.path.join(data_folder, 'temp.csv'), 'w', newline='') as tempfile:
csvreader = csv.reader(csvfile)
csvwriter = csv.writer(tempfile, delimiter='|')
for row, i in zip(csvreader, range(10)):
csvwriter.writerow(row)

# read and print the newly written file
with open(os.path.join(data_folder, 'temp.csv'), newline='') as tempfile:
csvreader = csv.reader(tempfile, delimiter='|')
for row in csvreader:
print(row)

If you do not specify a delimiter character when the file is read, the rows will be read as one field and printed as follows:

['0007-01-01|00:00:00|2.58|0.136|241.97|10.6|0|0|0']

quotechar specifies a character with which to surround fields. The quoting argument specifies what kind of fields can be surrounded with quotechar. The quoting argument can have one of the following values:

  • csv.QUOTE_ALL: All the fields are quoted
  • csv.QUOTE_MINIMAL: Only fields containing special characters are quoted
  • csv.QUOTE_NONNUMERIC: All non-numeric fields are quoted
  • csv.QUOTE_NONE: None of the fields are quoted

As an example, let's print the temp file first:

0007-01-01|00:00:00|2.58|0.136|241.97|10.6|0|0|0
0007-01-01|00:01:00|2.552|0.1|241.75|10.4|0|0|0
0007-01-01|00:02:00|2.55|0.1|241.64|10.4|0|0|0
0007-01-01|00:03:00|2.55|0.1|241.71|10.4|0|0|0
0007-01-01|00:04:00|2.554|0.1|241.98|10.4|0|0|0
0007-01-01|00:05:00|2.55|0.1|241.83|10.4|0|0|0
0007-01-01|00:06:00|2.534|0.096|241.07|10.4|0|0|0
0007-01-01|00:07:00|2.484|0|241.29|10.2|0|0|0
0007-01-01|00:08:00|2.468|0|241.23|10.2|0|0|0

Now let's save it with all fields quoted:

# read the file and write first ten rows with '|' delimiter, all quoting and * as a quote charachetr.
with open(os.path.join(data_folder, data_file), newline='') as csvfile, \
open('temp.csv', 'w', newline='') as tempfile:
csvreader = csv.reader(csvfile)
csvwriter = csv.writer(tempfile, delimiter='|', quotechar='*',quoting=csv.QUOTE_ALL)
for row, i in zip(csvreader, range(10)):
csvwriter.writerow(row)

The file gets saved with the specified quote character:

*0007-01-01*|*00:00:00*|*2.58*|*0.136*|*241.97*|*10.6*|*0*|*0*|*0*
*0007-01-01*|*00:01:00*|*2.552*|*0.1*|*241.75*|*10.4*|*0*|*0*|*0*
*0007-01-01*|*00:02:00*|*2.55*|*0.1*|*241.64*|*10.4*|*0*|*0*|*0*
*0007-01-01*|*00:03:00*|*2.55*|*0.1*|*241.71*|*10.4*|*0*|*0*|*0*
*0007-01-01*|*00:04:00*|*2.554*|*0.1*|*241.98*|*10.4*|*0*|*0*|*0*
*0007-01-01*|*00:05:00*|*2.55*|*0.1*|*241.83*|*10.4*|*0*|*0*|*0*
*0007-01-01*|*00:06:00*|*2.534*|*0.096*|*241.07*|*10.4*|*0*|*0*|*0*
*0007-01-01*|*00:07:00*|*2.484*|*0*|*241.29*|*10.2*|*0*|*0*|*0*
*0007-01-01*|*00:08:00*|*2.468*|*0*|*241.23*|*10.2*|*0*|*0*|*0*

Remember to read the file with the same arguments; otherwise, the * quote character will be treated as part of the field values and printed as follows:

['*0007-01-01*', '*00:00:00*', '*2.58*', '*0.136*', '*241.97*', '*10.6*', '*0*', '*0*', '*0*']

Using the correct arguments with the reader object prints the following:

['0007-01-01', '00:00:00', '2.58', '0.136', '241.97', '10.6', '0', '0', '0']

Now let's see how we can read CSV files with pandas, another popular Python library.