Python Data Analysis Cookbook

Using ggplot2-like plots

ggplot2 is an R library for data visualization that is popular among R users. The main idea of ggplot2 is that a visualization is built up from many layers. Like a painter, we start with an empty canvas and then gradually add layers of paint. Usually, we interface with R code from Python with rpy2 (I will discuss several interoperability options in Chapter 11 of my book Python Data Analysis). However, if we only want to use ggplot2, it is more convenient to use the pyggplot library. In this recipe, we will visualize population growth for three countries using World Bank data retrievable through pandas. The data consists of various indicators and related metadata. The spreadsheet at [...] scikit-learn, and we will discuss it in more detail in Chapter 9, Ensemble Learning and Dimensionality Reduction. Unfortunately, this approach has some limitations; in particular, we are not able to pickle all Python objects.
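The pickling limitation mentioned above can be seen with the standard library alone. The following is a minimal sketch, not code from the recipe: the can_pickle helper is a hypothetical name introduced for illustration, and it only exercises Python's built-in pickle module rather than any caching library.

```python
import pickle
import threading


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


# Plain data structures serialize without trouble.
plain_ok = can_pickle({'start': 1984, 'end': 2014})

# A generator and a thread lock carry interpreter state that pickle rejects.
generator_ok = can_pickle(n for n in range(3))
lock_ok = can_pickle(threading.Lock())

print(plain_ok, generator_ok, lock_ok)
```

A check like this can help decide whether a function's arguments and results are plain enough to be stored by a pickle-based cache.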

Getting ready

First, you need R with ggplot2 installed. If you do not plan to use ggplot2 seriously, you may want to skip this recipe altogether. The homepage of R is [...] documentation of ggplot2 is at [...] pyggplot-23. To install joblib, visit [...] Anaconda.
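Before running the recipe, it may help to confirm that the pieces are in place. The sketch below is an assumption-laden convenience, not part of the recipe: missing_modules is a hypothetical helper, the module names are taken from the imports and libraries named in this recipe, and looking for the Rscript executable is only a rough signal that R is on the PATH (it does not verify that ggplot2 is installed inside R).

```python
import importlib.util
import shutil


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level Python modules that this recipe imports or mentions.
required = ['pyggplot', 'dautil', 'joblib']
missing = missing_modules(required)

# Rscript ships with R, so its presence suggests that R is installed.
r_found = shutil.which('Rscript') is not None

if missing:
    print('Missing Python modules:', ', '.join(missing))
else:
    print('All required Python modules were found.')
print('Rscript on PATH:', r_found)
```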

How to do it...

  1. The imports are as follows:
    import pyggplot
    from dautil import data
  2. Load the data with the following code:
    dawb = data.Worldbank()
    pop_grow = dawb.get_name('pop_grow')
    df = dawb.download(indicator=pop_grow, start=1984, end=2014)
    df = dawb.rename_columns(df, use_longnames=True)
  3. The following line initializes pyggplot with the pandas DataFrame object we created:
    p = pyggplot.Plot(df)
  4. Add a bar chart with the following line:
    p.add_bar('country', dawb.get_longname(pop_grow), color='year')
  5. Flip the chart so that the bars point to the right and render:
    p.coord_flip()
    p.render_notebook()

Refer to the following plot for the end result:

The code is in the using_ggplot.ipynb file in this book's code bundle.
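The steps in this recipe follow the layered idea: a plot object is created from the data, and each later call adds one more layer before rendering. The toy class below is a sketch of that pattern only. LayeredPlot, add_layer, and describe are hypothetical names that are not part of pyggplot, and the single data row is a placeholder rather than real World Bank data.

```python
class LayeredPlot:
    """Toy plot that only records layers; hypothetical, not pyggplot's API."""

    def __init__(self, rows):
        self.rows = rows      # the data that every layer draws from
        self.layers = []      # layers in the order they were added

    def add_layer(self, kind, **options):
        """Append one layer description and return self for chaining."""
        self.layers.append((kind, options))
        return self

    def describe(self):
        """Return one line per layer, bottom layer first."""
        return ['{}: {}'.format(kind, sorted(options.items()))
                for kind, options in self.layers]


# Start with an empty canvas bound to the data (placeholder row, not real figures).
plot = LayeredPlot(rows=[{'country': 'A', 'year': 1984, 'growth': 0.0}])

# Each call adds one layer on top of the previous ones.
plot.add_layer('bar', x='country', y='growth', color='year')
plot.add_layer('coord_flip')

for line in plot.describe():
    print(line)
```

Keeping the layers as an ordered list is what lets a renderer apply them bottom to top, in the same order as the painter analogy.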