Visualization of Top-selling 100 Books of All Time (with source code)
The dataset comes from the Guardian’s DataBlog. I focused only on generating a static visualization with Python.
I wanted to focus on (and visualize) the following factors:
- Publication Date
- Volumes Sold
- Product Class
I decided to use a 2D timeline, with the publication date on the x axis and volumes sold on the y axis. Each title is shown using its actual book cover (shape), and the size of each cover image is proportional to the price of the book (size). Additionally, the border color of each book represents its product class (red for “F” class, yellow for “Y” class and green for “T” class). A slice of the timeline is shown below.
It is easy to produce the same visualization using author faces instead of book covers.
Originally I had combined both of the images above into one, with each author’s face shown as a little icon on the book cover, but this made the visualization crowded and ugly, so I split them into separate figures.
This type of visualization makes it easy to spot patterns in the dataset (e.g., J.K. Rowling writes very successful books, but her earlier books have sold many more copies than her latest ones). It seems that in almost all cases an author’s second book sells fewer copies than the first (this also holds for Stephenie Meyer, the author of Twilight and New Moon).
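The encoding described above can be sketched as a small layout function. This is only an illustration, not the code from the download: the function name, the canvas size, the maximum cover height and the parameter names are all assumptions made for the example. Only the mapping itself (date to x, volume to y, price to size, class to border color) comes from the description.

```python
from datetime import date

# Border colours per product class, as described in the post.
CLASS_COLORS = {"F": "red", "Y": "yellow", "T": "green"}


def layout_book(pub_date, volume, price, product_class,
                date_range, max_volume, max_price,
                canvas_size=(4000, 2000), max_cover_height=200):
    """Return (x, y, cover_height, border_color) for one book.

    All names and default sizes here are hypothetical.
    """
    width, height = canvas_size
    start, end = date_range
    # x: linear position of the publication date along the timeline.
    x = round((pub_date - start).days / (end - start).days * width)
    # y: linear in volumes sold; pixel rows grow downward, so flip the axis.
    y = round(height - volume / max_volume * height)
    # size: cover height proportional to the book's price.
    cover_height = round(price / max_price * max_cover_height)
    return x, y, cover_height, CLASS_COLORS[product_class]
```

A book published at the very end of the date range, with half the maximum volume and half the maximum price, would land at the right edge, halfway up the canvas, at half the maximum cover height.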
In the above diagrams, the y axis was scaled linearly. The figures below drop the linear scale for the y axis so the books can be presented more easily in timeline form. If a book is placed above another book, it has sold more copies, but these diagrams do not show the magnitude of the difference.
Please download the high-resolution copies of these image files from this address. Some of them are large, so you may need a tool like Google Picasa to navigate through the images smoothly.
I wrote a little Python library that uses urllib2 and the Python Imaging Library (PIL). It calls the Google Data API to get the book covers, and PIL takes care of scaling and working with the images. I did some data cleanup in Excel.
The code reads in the data and generates a PNG output file. The only change you need to make is to point the dir variable at your extracted folder. I pulled the book covers using an API and some crowdsourcing. Author images were collected manually (this took about 30 minutes).
Please download the Python code from here.
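One way to get this kind of order-only y axis is to place books by sales rank rather than by sales volume. The sketch below assumes that approach; the function name, the image height and the margin are made up for the example, and the downloadable code may do it differently.

```python
def rank_positions(volumes, height=2000, margin=100):
    """Map each book to a y pixel based only on its sales rank.

    Hypothetical helper: books are spaced evenly from top (best seller)
    to bottom, so the size of the gap in sales is deliberately lost.
    """
    # Indices of the books, sorted from most to fewest copies sold.
    order = sorted(range(len(volumes)), key=lambda i: volumes[i], reverse=True)
    # Even vertical step between consecutive ranks.
    step = (height - 2 * margin) / max(len(volumes) - 1, 1)
    ys = [0] * len(volumes)
    for rank, i in enumerate(order):
        ys[i] = round(margin + rank * step)
    return ys
```

With this mapping, a book that sold five million copies and one that sold one hundred thousand end up exactly one step apart if they are neighbors in the ranking.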
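As a rough idea of the image handling involved, here is a minimal sketch of scaling a cover, adding a colored border and pasting it onto a canvas. It is written against Pillow, the maintained fork of PIL, and is not taken from the library itself: the function name, sizes and the solid-color stand-in for a downloaded cover are all assumptions for the example.

```python
from PIL import Image, ImageOps


def place_cover(canvas, cover, top_left, target_height, border_color, border=4):
    """Scale a cover, frame it, and paste it onto the canvas in place.

    Hypothetical helper; returns the size of the framed cover.
    """
    # Preserve the aspect ratio while scaling to the requested height.
    ratio = target_height / cover.height
    scaled = cover.resize((max(1, round(cover.width * ratio)), target_height))
    # The border colour encodes the product class.
    framed = ImageOps.expand(scaled, border=border, fill=border_color)
    canvas.paste(framed, top_left)
    return framed.size


canvas = Image.new("RGB", (400, 300), "white")
cover = Image.new("RGB", (60, 90), "navy")  # stand-in for a downloaded cover
size = place_cover(canvas, cover, (50, 40), 120, "red")
```

In a full script the canvas would be written out afterwards with `canvas.save(...)` to produce the PNG file.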