Hands-On Artificial Intelligence for IoT

Working with CSV files with the pandas module

In pandas, the read_csv() function returns a DataFrame after reading the CSV file:

import pandas as pd

df = pd.read_csv('temp.csv')
print(df)

The DataFrame is printed as follows:

         date      time  global_active_power  global_reactive_power  voltage  \
0  0007-01-01  00:00:00                2.580                  0.136   241.97   
1  0007-01-01  00:01:00                2.552                  0.100   241.75   
2  0007-01-01  00:02:00                2.550                  0.100   241.64   
3  0007-01-01  00:03:00                2.550                  0.100   241.71   
4  0007-01-01  00:04:00                2.554                  0.100   241.98   
5  0007-01-01  00:05:00                2.550                  0.100   241.83   
6  0007-01-01  00:06:00                2.534                  0.096   241.07   
7  0007-01-01  00:07:00                2.484                  0.000   241.29   
8  0007-01-01  00:08:00                2.468                  0.000   241.23   

   global_intensity  sub_metering_1  sub_metering_2  sub_metering_3  
0              10.6               0               0               0  
1              10.4               0               0               0  
2              10.4               0               0               0  
3              10.4               0               0               0  
4              10.4               0               0               0  
5              10.4               0               0               0  
6              10.4               0               0               0  
7              10.2               0               0               0  
8              10.2               0               0               0

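We see in the preceding output that pandas automatically inferred numeric data types for the measurement columns. The date and time columns, however, are read as plain strings unless the parse_dates argument of read_csv() is used. The pandas DataFrame can be saved to a CSV file with the to_csv() function: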

df.to_csv('temp1.csv')
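
By default, to_csv() also writes the row index as the first column, which then comes back as an extra unnamed column when the file is read again. The following is a minimal sketch of a write-and-read round trip that avoids this with index=False. The small DataFrame is a stand-in built from a few values in the output above, and the temporary output path is an assumption made only so that the example is self-contained:

```python
import os
import tempfile

import pandas as pd

# Small stand-in for the temp.csv data (values copied from the output above).
df = pd.DataFrame({
    "date": ["0007-01-01", "0007-01-01"],
    "time": ["00:00:00", "00:01:00"],
    "global_active_power": [2.580, 2.552],
    "voltage": [241.97, 241.75],
})

# Hypothetical output path inside a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "temp1.csv")

# index=False keeps the row index (0, 1, ...) out of the file.
df.to_csv(out_path, index=False)

# Reading the file back yields the same columns as the original DataFrame.
df_back = pd.read_csv(out_path)
print(df_back.dtypes)
```

If the index carries meaningful labels, it can be kept in the file and restored on reading with the index_col argument of read_csv() instead.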

Both read_csv() and to_csv() accept many arguments. Some of the most useful ones for read_csv() are as follows:

header: Defines the row number to be used as the header, or None if the file does not contain a header row.
sep: Defines the character that separates fields in a row. The default value of sep is a comma (,).
names: Defines column names for each column in the file.
usecols: Defines columns that need to be extracted from the CSV file. Columns that are not mentioned in this argument are not read.
dtype: Defines the data types for columns in the DataFrame.
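
The arguments listed above can be combined in a single read_csv() call. The following is a minimal sketch, assuming a semicolon-separated file with no header row. The inline data, the column names, and the choice of float32 are illustrative assumptions rather than part of the dataset used in this section:

```python
import io

import pandas as pd

# Hypothetical headerless, semicolon-separated data standing in for a file on disk.
raw = io.StringIO(
    "0007-01-01;00:00:00;2.580;241.97\n"
    "0007-01-01;00:01:00;2.552;241.75\n"
)

df = pd.read_csv(
    raw,
    sep=";",          # fields are separated by semicolons, not commas
    header=None,      # the data has no header row
    names=["date", "time", "global_active_power", "voltage"],  # assign column names
    usecols=["time", "global_active_power", "voltage"],        # skip the date column
    dtype={"global_active_power": "float32", "voltage": "float32"},
)
print(df.dtypes)
```

Here, usecols refers to the names supplied through names, so the date column is never loaded, and dtype stores the two measurement columns as 32-bit floats rather than the default 64-bit floats.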

Many other available options are documented at the following links: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html and https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html.

Now let's see how to read data from CSV files with the NumPy module.