Hands-On Artificial Intelligence for IoT

Working with CSV files with the pandas module

In pandas, the read_csv() function returns a DataFrame after reading the CSV file:

import pandas as pd
df = pd.read_csv('temp.csv')
print(df)

The DataFrame is printed as follows:

         date      time  global_active_power  global_reactive_power  voltage  \
0  0007-01-01  00:00:00                2.580                  0.136   241.97   
1  0007-01-01  00:01:00                2.552                  0.100   241.75   
2  0007-01-01  00:02:00                2.550                  0.100   241.64   
3  0007-01-01  00:03:00                2.550                  0.100   241.71   
4  0007-01-01  00:04:00                2.554                  0.100   241.98   
5  0007-01-01  00:05:00                2.550                  0.100   241.83   
6  0007-01-01  00:06:00                2.534                  0.096   241.07   
7  0007-01-01  00:07:00                2.484                  0.000   241.29   
8  0007-01-01  00:08:00                2.468                  0.000   241.23   

   global_intensity  sub_metering_1  sub_metering_2  sub_metering_3  
0              10.6               0               0               0  
1              10.4               0               0               0  
2              10.4               0               0               0  
3              10.4               0               0               0  
4              10.4               0               0               0  
5              10.4               0               0               0  
6              10.4               0               0               0  
7              10.2               0               0               0  
8              10.2               0               0               0  

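In the preceding output, pandas has automatically inferred numeric data types for the measurement columns. The date and time columns are read as strings unless date parsing is requested, for example with the parse_dates argument. The pandas DataFrame can be saved to a CSV file with the to_csv() function: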

df.to_csv('temp1.csv')
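By default, to_csv() also writes the row index as an extra, unnamed first column. The following is a minimal sketch of avoiding that with index=False and reading the data back. The small DataFrame is hypothetical sample data, and an in-memory buffer stands in for a file on disk:

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the readings in temp.csv
df = pd.DataFrame({
    "voltage": [241.97, 241.75],
    "global_intensity": [10.6, 10.4],
})

# Write to an in-memory buffer so the sketch leaves no file behind
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False omits the row-index column

# Rewind the buffer and read the CSV text back into a new DataFrame
buffer.seek(0)
round_trip = pd.read_csv(buffer)
print(round_trip.columns.tolist())
```

Without index=False, reading the file back would produce an additional column holding the old index values.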

pandas offers many arguments for reading and writing CSV files. Some of the most commonly used ones are as follows:

  • header: Defines the row number to be used as a header, or None if the file does not contain a header row.
  • sep: Defines the character that separates fields in rows. The default value of sep is a comma (,).
  • names: Defines the list of column names to use for the columns in the file.
  • usecols: Defines columns that need to be extracted from the CSV file. Columns that are not mentioned in this argument are not read.
  • dtype: Defines the data types for columns in the DataFrame.
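The arguments listed above can be combined in a single read_csv() call. The following is a minimal sketch, assuming a hypothetical headerless, semicolon-separated input (not the temp.csv file used earlier); the column names and the float32 choice are illustrative only:

```python
import io
import pandas as pd

# Hypothetical headerless, semicolon-separated data (not the book's temp.csv)
raw = io.StringIO(
    "2007-01-01;00:00:00;2.580;241.97\n"
    "2007-01-01;00:01:00;2.552;241.75\n"
)

df = pd.read_csv(
    raw,
    sep=";",                       # fields are separated by semicolons
    header=None,                   # the data has no header row
    names=["date", "time", "global_active_power", "voltage"],
    usecols=["time", "voltage"],   # read only these two columns
    dtype={"voltage": "float32"},  # override the inferred float64
)
print(df.dtypes)
```

Here usecols refers to the names supplied through names, so the date and global_active_power columns are never loaded into memory.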

Now let's see how to read data from CSV files with the NumPy module.