Updated: 2021-07-14 11:06:29
Cover
Copyright page
Credits
About the Author
About the Reviewers
www.PacktPub.com
eBooks discount offers and more
Preface
Why do you need this book?
Data analysis, data science, big data – what is the big deal?
A brief history of data analysis with Python
A conjecture about the future
What this book covers
What you need for this book
Who this book is for
Sections
Conventions
Reader feedback
Customer support
Chapter 1. Laying the Foundation for Reproducible Data Analysis
Introduction
Setting up Anaconda
Installing the Data Science Toolbox
Creating a virtual environment with virtualenv and virtualenvwrapper
Sandboxing Python applications with Docker images
Keeping track of package versions and history in IPython Notebook
Configuring IPython
Learning to log for robust error checking
Unit testing your code
Configuring pandas
Configuring matplotlib
Seeding random number generators and NumPy print options
Standardizing reports, code style, and data access
Chapter 2. Creating Attractive Data Visualizations
Graphing Anscombe's quartet
Choosing seaborn color palettes
Choosing matplotlib color maps
Interacting with IPython Notebook widgets
Viewing a matrix of scatterplots
Visualizing with d3.js via mpld3
Creating heatmaps
Combining box plots and kernel density plots with violin plots
Visualizing network graphs with hive plots
Displaying geographical maps
Using ggplot2-like plots
Highlighting data points with influence plots
Chapter 3. Statistical Data Analysis and Probability
Fitting data to the exponential distribution
Fitting aggregated data to the gamma distribution
Fitting aggregated counts to the Poisson distribution
Determining bias
Estimating kernel density
Determining confidence intervals for mean, variance, and standard deviation
Sampling with probability weights
Exploring extreme values
Correlating variables with Pearson's correlation
Correlating variables with the Spearman rank correlation
Correlating a binary and a continuous variable with the point biserial correlation
Evaluating relations between variables with ANOVA
Chapter 4. Dealing with Data and Numerical Issues
Clipping and filtering outliers
Winsorizing data
Measuring central tendency of noisy data
Normalizing with the Box-Cox transformation
Transforming data with the power ladder
Transforming data with logarithms
Rebinning data
Applying logit() to transform proportions
Fitting a robust linear model
Taking variance into account with weighted least squares
Using arbitrary precision for optimization
Using arbitrary precision for linear algebra
Chapter 5. Web Mining, Databases, and Big Data
Simulating web browsing
Scraping the Web
Dealing with non-ASCII text and HTML entities
Implementing association tables
Setting up database migration scripts
Adding a table column to an existing table
Adding indices after table creation
Setting up a test web server
Implementing a star schema with fact and dimension tables
Using HDFS
Setting up Spark
Clustering data with Spark
Chapter 6. Signal Processing and Timeseries
Spectral analysis with periodograms
Estimating power spectral density with the Welch method
Analyzing peaks
Measuring phase synchronization
Exponential smoothing
Evaluating smoothing
Using the Lomb-Scargle periodogram
Analyzing the frequency spectrum of audio
Analyzing signals with the discrete cosine transform