Visualization of Top-selling 100 Books of All Time (with source code)

The dataset comes from the Guardian’s DataBlog. I focused only on generating a static visualization with Python.

Visualization Design

I wanted to focus on (and visualize) the following factors:

  • Author
  • Publication Date
  • Title
  • Price
  • Volume Sold
  • Product Class

I decided to use a 2D timeline, with the publication date on the x axis and the volume sold on the y axis. Each title is shown using its actual book cover (shape), and the size of each cover image is proportional to the price of the book (size). Additionally, the border color of each cover represents the product class (red for “F” class, yellow for “Y” class and green for “T” class). A slice of the timeline is shown below.

Vis 1: A snapshot of the visualization
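The mapping described above can be sketched in a few lines of Python. This is only a minimal sketch under my own assumptions, not the code behind the figures: the function name layout_book, the record keys, the canvas size and the linear scales are all hypothetical and chosen for illustration.

```python
from datetime import date

# Border colours for the three product classes named in the post.
CLASS_COLOURS = {"F": "red", "Y": "yellow", "T": "green"}


def layout_book(book, first_day, last_day, max_volume, max_price,
                canvas_w=4000, canvas_h=2000, max_cover_h=200):
    """Map one book record to a pixel position, cover height and border colour.

    The canvas size and the linear scales are assumptions for illustration.
    """
    # x: publication date, scaled linearly between the earliest and latest date
    span_days = (last_day - first_day).days or 1
    x = round((book["published"] - first_day).days / span_days * canvas_w)

    # y: volume sold, with larger volumes nearer the top (pixel row 0)
    y = round((1 - book["volume"] / max_volume) * canvas_h)

    # size: cover height proportional to the price
    cover_h = round(book["price"] / max_price * max_cover_h)

    return {"x": x, "y": y, "cover_h": cover_h,
            "border": CLASS_COLOURS[book["product_class"]]}


# Made-up record, only to show the call (not a row from the dataset)
example = {"title": "Example Book", "published": date(2005, 7, 1),
           "volume": 2_000_000, "price": 10.0, "product_class": "Y"}
placed = layout_book(example, first_day=date(2000, 1, 1),
                     last_day=date(2010, 12, 31),
                     max_volume=4_000_000, max_price=20.0)
```

Each returned dictionary says where a cover goes, how tall to draw it and which border colour to use, so the drawing step itself can stay very small.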

It is easy to produce the same visualization using author faces instead of book covers.

Fig 2: Authors are placed on a Timeline

Originally I had combined both of the images above into one, with each author’s face shown as a little icon on the book cover, but that made the visualization crowded and ugly, so I split them into separate figures.

This type of visualization makes it easy to notice more about the dataset (e.g., J.K. Rowling writes very successful books, but her earlier books have sold many more copies than her latest ones). It seems that in almost all cases an author’s second book sells fewer copies than the first (this also holds for Stephenie Meyer, the author of Twilight and New Moon).

Fig 3: Book covers are placed on a timeline, Y axis shows volumes sold

Fig 4: Authors are placed on a timeline, Y axis shows volumes sold

In the diagrams above, the Y axis is scaled linearly. The figures below drop the linear scale for the Y axis so that the books are easier to present in timeline form. If a book is placed above another book, it has sold more copies, but these diagrams do not show the magnitude of the difference.

Fig 5: Book covers placed on a Timeline, Y axis is not a linear representation of the number of books sold anymore.
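One way to get this kind of non-linear Y axis is to place books by their sales rank instead of their sales figure. The sketch below is an assumption about how that could be done, not the original code: the function name rank_positions, the canvas height and the margin are made up for illustration.

```python
def rank_positions(volumes, canvas_h=2000, margin=100):
    """Return one y pixel per book: best seller at the top, equal gaps between ranks."""
    # indices of the books, ordered from most to fewest copies sold
    order = sorted(range(len(volumes)), key=lambda i: volumes[i], reverse=True)

    # vertical distance between two neighbouring ranks
    step = (canvas_h - 2 * margin) / max(len(volumes) - 1, 1)

    ys = [0] * len(volumes)
    for rank, i in enumerate(order):
        ys[i] = round(margin + rank * step)
    return ys


# Made-up volumes: the first book sold far more than the other two
ys = rank_positions([5_000_000, 1_000_000, 1_200_000])
```

With rank-based placement the order is kept (a higher book still sold more), but a runaway best seller no longer pushes every other cover to the bottom of the image.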

Please download the high-resolution copies of these image files from this address. Some of them are large, so you may need a tool like Google Picasa to navigate through the images smoothly.

Language Used

I wrote a little Python library that uses urllib2 and the Python Imaging Library (PIL). It calls the Google Data API to get the book covers, and PIL takes care of scaling and working with the images. I did some data cleanup in Excel.

The code reads in the data and generates a PNG output file. The only change you need to make is to point the dir variable to your extracted folder. I pulled the book covers using an API and some crowdsourcing. Author images were collected manually (which took about 30 minutes).
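As a rough illustration of the image step, here is a minimal sketch of scaling one cover, framing it with a colored border and pasting it onto a canvas. It assumes Pillow, the maintained fork of PIL, is installed; the helper name paste_cover, the canvas size and the solid-color stand-in for a cover are hypothetical and are not taken from the downloadable code.

```python
from PIL import Image, ImageOps


def paste_cover(canvas, cover, x, y, height, border_colour, border_px=4):
    """Scale a cover to the given height, frame it, and paste it onto the canvas."""
    # keep the aspect ratio of the cover while scaling
    width = max(1, round(cover.width * height / cover.height))
    scaled = cover.resize((width, height))

    # add a solid border in the product-class colour
    framed = ImageOps.expand(scaled, border=border_px, fill=border_colour)

    # (x, y) is the top-left corner of the framed cover on the canvas
    canvas.paste(framed, (x, y))
    return framed.size


canvas = Image.new("RGB", (400, 300), "white")
cover = Image.new("RGB", (60, 90), "blue")  # stand-in for a downloaded cover
size = paste_cover(canvas, cover, x=50, y=40, height=120, border_colour="red")
```

Calling canvas.save with a file name ending in .png would then write the finished image to disk.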

Please download the Python code from here

 

 

~ by marksalen on January 27, 2011.

3 Responses to “Visualization of Top-selling 100 Books of All Time (with source code)”

  2. Nice! I like how you captured 6 data types using a 2-D chart. How much time did this require?

    One question: You said “Additionally the border color for each book is a represents the Product class (red for “F” class, yellow for “Y” class and green for “T” class)”. What are product classes F, Y and T?

    Thanks!
