
What this book covers
Chapter 1, Laying the Foundation for Reproducible Data Analysis, is an important chapter, and I recommend that you do not skip it. It explains Anaconda, Docker, unit testing, logging, and other essential elements of reproducible data analysis.
Chapter 2, Creating Attractive Data Visualizations, demonstrates how to visualize data and mentions frequently encountered pitfalls.
Chapter 3, Statistical Data Analysis and Probability, discusses statistical probability distributions and correlation between two variables.
Chapter 4, Dealing with Data and Numerical Issues, is about outliers and other common data issues. Data is almost never perfect, so a large portion of the analysis effort goes into dealing with data imperfections.
Chapter 5, Web Mining, Databases, and Big Data, is light on mathematics and focuses instead on technical topics, such as databases, web scraping, and big data.
Chapter 6, Signal Processing and Timeseries, is about time series data, which is abundant and requires special techniques. Usually, we are interested in trends and seasonality or periodicity.
Chapter 7, Selecting Stocks with Financial Data Analysis, focuses on stock investing because stock price data is abundant. This is the only chapter on finance, and the content should be at least partially relevant even if stocks don't interest you.
Chapter 8, Text Mining and Social Network Analysis, helps you cope with the floods of textual and social media information.
Chapter 9, Ensemble Learning and Dimensionality Reduction, covers ensemble learning, classification and regression algorithms, as well as hierarchical clustering.
Chapter 10, Evaluating Classifiers, Regressors, and Clusters, evaluates the classifiers and regressors from the preceding chapter, Chapter 9, Ensemble Learning and Dimensionality Reduction.
Chapter 11, Analyzing Images, relies heavily on the OpenCV library to analyze images.
Chapter 12, Parallelism and Performance, is about software performance and discusses various options to improve it, including caching and just-in-time compilers.
Appendix A, Glossary, is a brief glossary of technical concepts used throughout the book. The goal is to have a reference that is easy to consult.
Appendix B, Function Reference, is a short reference of functions meant as an extra aid in case you are temporarily unable to look up documentation.
Appendix C, Online Resources, lists resources including presentations, links to documentation, and freely available IPython notebooks and data. This appendix is available as an online chapter.
Appendix D, Tips and Tricks for Command-Line and Miscellaneous Tools, gives a short list of tips, not meant to be exhaustive, for the various tools used in this book, such as the IPython notebook, Docker, and Unix shell commands. This appendix is also available as an online chapter.