
Working with CSV files with the csv module
In Python, the csv module provides classes and methods for reading and writing CSV files. The csv.reader method creates a reader object from which rows can be read iteratively. Each time a row is read from the file, the reader object returns a list of fields. For example, the following code demonstrates reading the data file and printing rows:
import csv
import os
with open(os.path.join(data_folder,data_file),newline='') as csvfile:
csvreader = csv.reader(csvfile)
for row in csvreader:
print(row)
The rows are printed as a list of field values:
['date', 'time', 'global_active_power', 'global_reactive_power', 'voltage', 'global_intensity', 'sub_metering_1', 'sub_metering_2', 'sub_metering_3'] ['0007-01-01', '00:00:00', '2.58', '0.136', '241.97', '10.6', '0', '0', '0'] ['0007-01-01', '00:01:00', '2.552', '0.1', '241.75', '10.4', '0', '0', '0'] ['0007-01-01', '00:02:00', '2.55', '0.1', '241.64', '10.4', '0', '0', '0']
The csv.writer method returns an object that can be used to write rows to a file. As an example, the following code writes the first 10 rows of the file to a temporary file and then prints it:
# read the file and write first ten rows
with open(os.path.join(data_folder, data_file), newline='') as csvfile, \
open(os.path.join(data_folder, 'temp.csv'), 'w', newline='') as tempfile:
csvreader = csv.reader(csvfile)
csvwriter = csv.writer(tempfile)
for row, i in zip(csvreader, range(10)):
csvwriter.writerow(row)
# read and print the newly written file
with open(os.path.join(data_folder, 'temp.csv'), newline='') as tempfile:
csvreader = csv.reader(tempfile)
for row in csvreader:
print(row)
The delimiter field and the quoting field characters are important attributes that you can set while creating reader and writer objects.
By default, the delimiter field is , and the other delimiters are specified with the delimiter argument to the reader or writer functions. For example, the following code saves the file with | as delimiter:
# read the file and write first ten rows with '|' delimiter
with open(os.path.join(data_folder, data_file), newline='') as csvfile, \
open(os.path.join(data_folder, 'temp.csv'), 'w', newline='') as tempfile:
csvreader = csv.reader(csvfile)
csvwriter = csv.writer(tempfile, delimiter='|')
for row, i in zip(csvreader, range(10)):
csvwriter.writerow(row)
# read and print the newly written file
with open(os.path.join(data_folder, 'temp.csv'), newline='') as tempfile:
csvreader = csv.reader(tempfile, delimiter='|')
for row in csvreader:
print(row)
If you do not specify a delimiter character when the file is read, the rows will be read as one field and printed as follows:
['0007-01-01|00:00:00|2.58|0.136|241.97|10.6|0|0|0']
quotechar specifies a character with which to surround fields. The quoting argument specifies what kind of fields can be surrounded with quotechar. The quoting argument can have one of the following values:
- csv.QUOTE_ALL: All the fields are quoted
- csv.QUOTE_MINIMAL: Only fields containing special characters are quoted
- csv.QUOTE_NONNUMERIC: All non-numeric fields are quoted
- csv.QUOTE_NONE: None of the fields are quoted
As an example, let's print the temp file first:
0007-01-01|00:00:00|2.58|0.136|241.97|10.6|0|0|0 0007-01-01|00:01:00|2.552|0.1|241.75|10.4|0|0|0 0007-01-01|00:02:00|2.55|0.1|241.64|10.4|0|0|0 0007-01-01|00:03:00|2.55|0.1|241.71|10.4|0|0|0 0007-01-01|00:04:00|2.554|0.1|241.98|10.4|0|0|0 0007-01-01|00:05:00|2.55|0.1|241.83|10.4|0|0|0 0007-01-01|00:06:00|2.534|0.096|241.07|10.4|0|0|0 0007-01-01|00:07:00|2.484|0|241.29|10.2|0|0|0 0007-01-01|00:08:00|2.468|0|241.23|10.2|0|0|0
Now let's save it with all fields quoted:
# read the file and write first ten rows with '|' delimiter, all quoting and * as a quote charachetr.
with open(os.path.join(data_folder, data_file), newline='') as csvfile, \
open('temp.csv', 'w', newline='') as tempfile:
csvreader = csv.reader(csvfile)
csvwriter = csv.writer(tempfile, delimiter='|', quotechar='*',quoting=csv.QUOTE_ALL)
for row, i in zip(csvreader, range(10)):
csvwriter.writerow(row)
The file gets saved with the specified quote character:
*0007-01-01*|*00:00:00*|*2.58*|*0.136*|*241.97*|*10.6*|*0*|*0*|*0* *0007-01-01*|*00:01:00*|*2.552*|*0.1*|*241.75*|*10.4*|*0*|*0*|*0* *0007-01-01*|*00:02:00*|*2.55*|*0.1*|*241.64*|*10.4*|*0*|*0*|*0* *0007-01-01*|*00:03:00*|*2.55*|*0.1*|*241.71*|*10.4*|*0*|*0*|*0* *0007-01-01*|*00:04:00*|*2.554*|*0.1*|*241.98*|*10.4*|*0*|*0*|*0* *0007-01-01*|*00:05:00*|*2.55*|*0.1*|*241.83*|*10.4*|*0*|*0*|*0* *0007-01-01*|*00:06:00*|*2.534*|*0.096*|*241.07*|*10.4*|*0*|*0*|*0* *0007-01-01*|*00:07:00*|*2.484*|*0*|*241.29*|*10.2*|*0*|*0*|*0* *0007-01-01*|*00:08:00*|*2.468*|*0*|*241.23*|*10.2*|*0*|*0*|*0*
Remember to read the file with the same arguments; otherwise, the * quote character will be treated as part of the field values and printed as follows:
['*0007-01-01*', '*00:00:00*', '*2.58*', '*0.136*', '*241.97*', '*10.6*', '*0*', '*0*', '*0*']
Using the correct arguments with the reader object prints the following:
['0007-01-01', '00:00:00', '2.58', '0.136', '241.97', '10.6', '0', '0', '0']
Now let's see how we can read CSV files with pandas, another popular Python library.