
Using ggplot2-like plots
The ggplot2 package is an R library for data visualization that is popular among R users. Its main idea is that a visualization is built up from layers: like a painter, we start with an empty canvas and gradually add layers of paint. Usually, we interface with R code from Python through rpy2 (I discuss several interoperability options in Chapter 11 of my book Python Data Analysis). However, if we only want to use ggplot2, it is more convenient to use the pyggplot library. In this recipe, we will visualize population growth for three countries using Worldbank data retrievable through pandas. The data consists of various indicators and related metadata.
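The layering idea described above can be pictured with a small toy class. This is only a sketch of the concept: the `LayeredPlot` class, its methods, and the placeholder row are hypothetical and are not part of pyggplot or ggplot2.

```python
# Toy illustration of layered plot construction.
# LayeredPlot is hypothetical; it is not pyggplot's or ggplot2's API.
class LayeredPlot:
    """Collects layers in order, like paint applied to a canvas."""

    def __init__(self, data):
        self.data = data    # the data shared by every layer
        self.layers = []    # layers are kept in the order they were added

    def add_layer(self, kind, **mapping):
        """Record a layer and how data columns map onto it."""
        self.layers.append((kind, mapping))
        return self         # returning self allows chained calls

    def describe(self):
        """Return one line per layer, bottom layer first."""
        lines = []
        for kind, mapping in self.layers:
            args = ", ".join(
                "{}={}".format(key, value)
                for key, value in sorted(mapping.items())
            )
            lines.append("{}({})".format(kind, args))
        return lines


# Placeholder row, made up for illustration only.
rows = [{"country": "Country A", "year": 1984, "growth": 1.0}]

plot = LayeredPlot(rows)
plot.add_layer("bar", x="country", y="growth", color="year")
plot.add_layer("coord_flip")

for line in plot.describe():
    print(line)
```

Each call adds one layer on top of the previous ones, which is the same order in which the recipe later adds a bar chart and then flips the coordinates.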
Getting ready
First, you need R with ggplot2 installed. If you do not plan to use ggplot2 seriously, you may want to skip this recipe altogether. Installation details are available on the R homepage and in the ggplot2 documentation. The version of pyggplot referenced here is pyggplot-23.
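Before running the recipe, it may help to confirm that the prerequisites can be found. The following is a minimal sketch using only the Python standard library; the `check_prerequisites` helper is a hypothetical name, and it assumes that R is exposed as an `R` executable on the `PATH`. It does not verify that the ggplot2 package is installed inside R.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report whether R and the pyggplot package can be located."""
    return {
        # shutil.which returns None when the executable is not on PATH
        "R": shutil.which("R") is not None,
        # find_spec returns None when a top-level module cannot be imported
        "pyggplot": importlib.util.find_spec("pyggplot") is not None,
    }


for name, found in check_prerequisites().items():
    print("{}: {}".format(name, "found" if found else "missing"))
```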
How to do it...
- The imports are as follows:
    import pyggplot
    from dautil import data
- Load the data with the following code:
    dawb = data.Worldbank()
    pop_grow = dawb.get_name('pop_grow')
    df = dawb.download(indicator=pop_grow, start=1984, end=2014)
    df = dawb.rename_columns(df, use_longnames=True)
- The following line initializes pyggplot with the pandas DataFrame object we created:
    p = pyggplot.Plot(df)
- Add a bar chart with the following line:
    p.add_bar('country', dawb.get_longname(pop_grow), color='year')
- Flip the chart so that the bars point to the right and render:
    p.coord_flip()
    p.render_notebook()
Refer to the following plot for the end result:

The code is in the using_ggplot.ipynb file in this book's code bundle.
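To make the steps above easier to follow without R installed, here is a rough standard-library sketch of the data layout the bar chart call appears to rely on: one row per country and year, with a category column, a value column, and a column used for color. The `group_bars` helper, the column name `growth`, and all values are assumptions made up for illustration; they are not Worldbank data and not pyggplot's implementation.

```python
from collections import defaultdict

# Placeholder rows in "long" format: one row per (country, year) pair.
# Country names and values are made up for illustration only.
rows = [
    {"country": "Country A", "year": 1984, "growth": 1.0},
    {"country": "Country A", "year": 2014, "growth": 0.5},
    {"country": "Country B", "year": 1984, "growth": 2.0},
    {"country": "Country B", "year": 2014, "growth": 1.5},
]


def group_bars(rows, category, value, color):
    """Group rows by category; each group holds (color key, value) pairs."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[category]].append((row[color], row[value]))
    return dict(groups)


bars = group_bars(rows, "country", "growth", color="year")

# After a coordinate flip, each category becomes a set of horizontal bars.
for country, segments in bars.items():
    for year, growth in segments:
        print("{:<10} {} {}".format(country, year, "#" * int(growth * 10)))
```

The text output only mimics the flipped bar chart: countries run down the vertical axis, and each year contributes one horizontal bar.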