
Working with CSV files with the pandas module
In pandas, the read_csv() function returns a DataFrame after reading the CSV file:
import pandas as pd
df = pd.read_csv('temp.csv')  # returns a DataFrame
print(df)
The DataFrame is printed as follows:
         date      time  global_active_power  global_reactive_power  voltage  \
0  0007-01-01  00:00:00                2.580                  0.136   241.97
1  0007-01-01  00:01:00                2.552                  0.100   241.75
2  0007-01-01  00:02:00                2.550                  0.100   241.64
3  0007-01-01  00:03:00                2.550                  0.100   241.71
4  0007-01-01  00:04:00                2.554                  0.100   241.98
5  0007-01-01  00:05:00                2.550                  0.100   241.83
6  0007-01-01  00:06:00                2.534                  0.096   241.07
7  0007-01-01  00:07:00                2.484                  0.000   241.29
8  0007-01-01  00:08:00                2.468                  0.000   241.23

   global_intensity  sub_metering_1  sub_metering_2  sub_metering_3
0              10.6               0               0               0
1              10.4               0               0               0
2              10.4               0               0               0
3              10.4               0               0               0
4              10.4               0               0               0
5              10.4               0               0               0
6              10.4               0               0               0
7              10.2               0               0               0
8              10.2               0               0               0
Note that read_csv() does not convert the date and time columns to datetime types on its own: by default they are read as plain strings, and you need the parse_dates argument or pd.to_datetime() to turn them into timestamps. A pandas DataFrame can be saved to a CSV file with the to_csv() function:
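To check how the date and time columns are actually typed, and to convert them, a minimal sketch follows. It uses a small hypothetical two-row sample (held in an in-memory io.StringIO buffer instead of temp.csv) with the same kind of date and time columns; parse_dates and pd.to_datetime() are standard pandas features.

```python
import io
import pandas as pd

# Hypothetical sample in the same layout as temp.csv (not the book's data)
raw = (
    "date,time,global_active_power,voltage\n"
    "2007-01-01,00:00:00,2.580,241.97\n"
    "2007-01-01,00:01:00,2.552,241.75\n"
)

df = pd.read_csv(io.StringIO(raw))
print(df.dtypes)  # date and time are read as text, not as datetimes

# Option 1: combine the two text columns into one datetime column
df['timestamp'] = pd.to_datetime(df['date'] + ' ' + df['time'])

# Option 2: ask read_csv to parse a column while reading
df2 = pd.read_csv(io.StringIO(raw), parse_dates=['date'])
print(df2.dtypes)
```

Combining date and time after reading keeps the read_csv() call simple and works the same way across pandas versions.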
df.to_csv('temp1.csv')
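By default, to_csv() also writes the DataFrame index as an extra first column, which comes back as an "Unnamed: 0" column when the file is read again. The sketch below shows this round trip with a small hypothetical DataFrame; calling to_csv() without a path returns the CSV text as a string, so no file is written.

```python
import io
import pandas as pd

# Small hypothetical DataFrame for illustration
df = pd.DataFrame({'voltage': [241.97, 241.75], 'global_intensity': [10.6, 10.4]})

with_index = df.to_csv()                # index written as an unnamed first column
without_index = df.to_csv(index=False)  # only the data columns are written

print(pd.read_csv(io.StringIO(with_index)).columns.tolist())
# → ['Unnamed: 0', 'voltage', 'global_intensity']
print(pd.read_csv(io.StringIO(without_index)).columns.tolist())
# → ['voltage', 'global_intensity']
```

Passing index=False when writing (or index_col=0 when reading) keeps the round trip clean.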
pandas offers many arguments for reading and writing CSV files. Some of them, and how they are used, are as follows:
- header: Defines the row number to use as the column names, or None if the file has no header row.
- sep: Defines the character that separates the fields in a row. The default is a comma (,).
- names: Defines the names of the columns in the file.
- usecols: Defines which columns to extract from the CSV file. Columns not listed in this argument are not read.
- dtype: Defines the data types of the columns in the DataFrame, for example as a dictionary that maps column names to types.
Many other available options are documented at the following links: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html and https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html.
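As an illustration of how these arguments work together, here is a minimal sketch that reads a hypothetical semicolon-separated sample with no header row, names its columns, keeps only three of them, and fixes their types. The sample data and column choices are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical sample: semicolon-separated, no header row
raw = (
    "2007-01-01;00:00:00;2.580;0.136;241.97\n"
    "2007-01-01;00:01:00;2.552;0.100;241.75\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    sep=';',          # fields are separated by semicolons
    header=None,      # the file has no header row
    names=['date', 'time', 'global_active_power',
           'global_reactive_power', 'voltage'],
    usecols=['date', 'global_active_power', 'voltage'],  # other columns are skipped
    dtype={'global_active_power': 'float32', 'voltage': 'float64'},
)
print(df)

# sep also applies when writing
out = df.to_csv(sep=';', index=False)
print(out.splitlines()[0])  # date;global_active_power;voltage
```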
Now let's see how to read data from CSV files with the NumPy module.