
Methods

Sources, Transformations, Limitations

Repository

You can find all the scripts/code used in this project in the guide2kulchur repository.

Data Sources

Goodreads

Most of the visualizations use data publicly available from Goodreads. Goodreads records a large number of attributes for each book and author on its site. Specifically, the following fields are shown in the visualizations:

The "Similar" field warrants further discussion. This field doesn't necessarily list the authors/books most similar to a given author/book. More specifically, the language used is the following: Rather than "similar" implying similarity in content or style, "similar" here means similarity in readership. While there will be some correlation with content and/or style, this distinction is important; otherwise, you may be confused to see in the individual author networks that Plato is "similar" to Ernest Hemingway and Rick Rubin. This definition of "similar" still provides a meaningful basis for studying author relationships, particularly the authors' modern readerships and their reading behavior.

Nominatim

To make the map visualizations, Nominatim was used to find the coordinates of each author's place of birth. This process is, of course, imperfect: author birthplace strings are often incomplete (e.g., "Buffalo"), too broad to yield specific coordinates (e.g., "United States of America"), or ambiguous enough to resolve to an entirely different location (e.g., "Georgia", which could be the U.S. state or the country). Despite these limitations, Nominatim made it possible to build author and book maps covering a broad range of authors and books, across both time and geography.
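For readers who want to try a similar lookup, the sketch below queries Nominatim's public `/search` endpoint using only the Python standard library. It is an illustration, not the project's actual script: the function names and the `user_agent` value are hypothetical. Asking for `limit=1` shows why ambiguous strings such as "Georgia" are risky, since the single top-ranked match is accepted without any check that it is the intended place.

```python
import json
import urllib.parse
import urllib.request

# Public Nominatim search endpoint. Its usage policy asks clients to send an
# identifying User-Agent and to make at most about one request per second.
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(place: str) -> str:
    """Build a free-text query that asks for only the single best match."""
    params = {"q": place, "format": "json", "limit": 1}
    return f"{NOMINATIM_SEARCH}?{urllib.parse.urlencode(params)}"


def parse_first_result(body: str):
    """Return (lat, lon) as floats from a JSON search response, or None."""
    results = json.loads(body)
    if not results:
        return None  # no match, e.g. a misspelled or unknown place
    top = results[0]  # top-ranked hit; may be the wrong "Georgia"
    return float(top["lat"]), float(top["lon"])


def geocode(place: str, user_agent: str):
    """Live lookup (network required); user_agent is your own app identifier."""
    request = urllib.request.Request(
        build_search_url(place), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_first_result(response.read().decode("utf-8"))
```

Keeping URL construction and response parsing separate from the network call makes it easy to cache raw responses and re-parse them later without re-querying the service.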

Wikidata

Unfortunately, many older authors did not have birth dates on record in Goodreads; further, a programming error caused authors with BC-era birth dates to be coded as having no birth date. To partially address these gaps, Wikidata was used to find the birth dates and birthplaces of a number of older authors, such as Hesiod and Homer. This process was also imperfect: joins were made on an author's name, which could match an author from Goodreads to the wrong person in Wikidata (for example, when two people share a name).
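The sketch below illustrates the three pieces described above, under stated assumptions; it is not the project's code, and all function names are hypothetical. `build_birth_query` produces a SPARQL query for the Wikidata Query Service using the standard properties P31/Q5 (instance of human), P569 (date of birth) and P19 (place of birth); restricting to Q5 may miss entities modeled differently. `signed_year` reads the year as text, because Python's `datetime` only supports years 1 to 9999 and therefore cannot represent BC dates, which is one common way BC birth dates end up silently dropped. `unique_match` guards the name join by accepting only unambiguous matches.

```python
import re

# Year, optionally signed, at the start of an ISO-like date string.
_YEAR = re.compile(r"^([+-]?)(\d+)-\d{2}-\d{2}")


def build_birth_query(name: str) -> str:
    """SPARQL for humans whose English label exactly equals `name`."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
SELECT ?person ?dob ?pobLabel WHERE {{
  ?person rdfs:label "{escaped}"@en ;
          wdt:P31 wd:Q5 .
  OPTIONAL {{ ?person wdt:P569 ?dob . }}
  OPTIONAL {{ ?person wdt:P19 ?pob . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""


def signed_year(value: str):
    """Extract a signed year from e.g. '-0700-01-01T00:00:00Z' (returns -700).

    Negative years are returned as written; whether a source uses historical
    numbering (no year zero) or astronomical numbering should be checked.
    """
    match = _YEAR.match(value)
    if match is None:
        return None
    sign, digits = match.groups()
    year = int(digits)
    return -year if sign == "-" else year


def unique_match(candidates: list):
    """Accept a name-based join only when exactly one entity matched."""
    return candidates[0] if len(candidates) == 1 else None
```

Rejecting multi-candidate matches trades coverage for accuracy: a name shared by several Wikidata entities is left without a birth date rather than given a possibly wrong one.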

Visualizations

kepler.gl

For all map visualizations, kepler.gl was used: a "data-agnostic, high-performance web-based application for visual exploration of large-scale geolocation data sets" made by Uber. This open-source application was very easy to work with, and its performance allowed a large number of markers, each with its own attached data, to be displayed on a single map. I'd recommend it to others and will definitely use it again on future projects.
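One simple way to get point data into the kepler.gl web app is to load a CSV file. kepler.gl generally detects a point layer automatically when a file has a latitude/longitude column pair, though the exact column-name heuristics are worth confirming in its documentation. The sketch below is hypothetical: the column names and the `to_kepler_csv` helper are assumptions, not the project's export format. It writes one row per geocoded author and skips authors whose birthplace could not be resolved.

```python
import csv
import io

# Hypothetical column layout; latitude/longitude names chosen so kepler.gl
# can pair them into a point layer.
FIELDS = ["author", "latitude", "longitude", "birth_year"]


def to_kepler_csv(rows):
    """Serialize author dicts to CSV text, dropping rows without coordinates."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        if row.get("latitude") is None or row.get("longitude") is None:
            continue  # birthplace could not be geocoded; nothing to plot
        writer.writerow({field: row.get(field) for field in FIELDS})
    return buffer.getvalue()
```

Extra columns such as `birth_year` are kept so they can drive kepler.gl's filters, for example a time slider over birth years.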

ipysigma

ipysigma, a Jupyter widget made by médialab Sciences Po that uses sigma.js and graphology under the hood, was used to make graphs of author and book networks. As with kepler.gl, I'd definitely recommend this tool and will be using it again.
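Network graphs like these start from an edge list. The sketch below shows one plausible way to turn per-author "Similar" lists into an undirected graph using only the standard library; the input shape and function names are assumptions, not the project's code. Each edge means shared readership, not stylistic similarity, which is why pairs like Plato and Ernest Hemingway can appear. An adjacency structure like this can be loaded into a graph library such as networkx before being handed to a visualization widget.

```python
def build_similarity_graph(similar: dict) -> dict:
    """Undirected adjacency sets from each author's 'similar' list.

    `similar` maps an author to the authors Goodreads lists as similar
    (assumed input shape). Edges are made symmetric because shared
    readership is treated as a two-way relationship here.
    """
    graph = {}
    for author, others in similar.items():
        graph.setdefault(author, set())  # keep authors with no neighbors
        for other in others:
            if other == author:
                continue  # ignore self-references
            graph[author].add(other)
            graph.setdefault(other, set()).add(author)
    return graph


def edge_list(graph: dict) -> list:
    """Each undirected edge exactly once, as a sorted (a, b) tuple."""
    return sorted({tuple(sorted((a, b))) for a, nbrs in graph.items() for b in nbrs})


# Illustrative input echoing the example in the text.
example = {"Plato": ["Ernest Hemingway", "Rick Rubin"], "Ernest Hemingway": ["Plato"]}
print(edge_list(build_similarity_graph(example)))
```

Symmetrizing the edges is a design choice: Goodreads lists are per-author, so A listing B does not guarantee B lists A, and a directed graph would preserve that asymmetry instead.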

Limitations

While several limitations have already been discussed, here are a few more: