Which genres are majority male/female?

A gendered analysis of author (reader) writing (reading) habits.

Gendered preferences in book genres have been studied in the past, but the Goodreads data lets us study two important and distinct questions:

  1. What are the gender splits within a genre? e.g., what share of Goodreads horror authors are male?
  2. What are the genre compositions within gender? e.g., what share of male users enjoy horror books?

The first question lets us look at the gender composition within any given genre. The one issue with this approach, though, is that every measure is affected by the overall gender composition of Goodreads. For example, if 66% of Goodreads users are female, and we find that 50% of users who enjoy science fiction are female, then science fiction is actually disproportionately male, even though the split within the genre is 50-50. This isn't a big issue as long as we have a baseline for comparison, but it makes interpretation a bit annoying.
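One way to make the baseline comparison explicit is to divide a group's share within a genre by its share of the whole sample. This is only a sketch of that arithmetic using the hypothetical numbers from the example above; the function name `representation_ratio` is mine and is not part of the original analysis.

```python
def representation_ratio(share_in_genre: float, share_overall: float) -> float:
    """Ratio of a group's share within a genre to its share of the whole sample.

    A value above 1 means the group is over-represented in the genre
    relative to the baseline; below 1 means under-represented.
    """
    if not 0 < share_overall <= 1:
        raise ValueError("share_overall must be in (0, 1]")
    return share_in_genre / share_overall


# Hypothetical numbers from the example: 66% of users are female overall,
# and 50% of users who enjoy science fiction are female.
female_ratio = representation_ratio(0.50, 0.66)
male_ratio = representation_ratio(0.50, 0.34)

print(round(female_ratio, 2))  # 0.76
print(round(male_ratio, 2))    # 1.47
```

A ratio above 1 for men and below 1 for women is what "disproportionately male" means here, even though the raw split is even.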

The second question allows us to study a different subject: the preference composition within a gender. For example, we can look at the share of enjoyed genres, among men only, that horror makes up.
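To make the difference between the two questions concrete, here is a small sketch that computes both measures from the same set of (gender, genre) records. The records are made up for illustration, and the helper `shares` is a hypothetical name, not something from the original scripts.

```python
from collections import Counter


def shares(records, group_index, value_index):
    """For each group, return the share of each value among that group's records."""
    totals = Counter(r[group_index] for r in records)
    pairs = Counter((r[group_index], r[value_index]) for r in records)
    out = {}
    for (group, value), n in pairs.items():
        out.setdefault(group, {})[value] = n / totals[group]
    return out


# Made-up (gender, genre) records, for illustration only.
records = [
    ("Male", "horror"),
    ("Male", "sci-fi"),
    ("Male", "sci-fi"),
    ("Male", "history"),
    ("Female", "horror"),
    ("Female", "romance"),
]

# Question 1: gender split within each genre (group by genre).
gender_split_within_genre = shares(records, group_index=1, value_index=0)
# Question 2: genre composition within each gender (group by gender).
genre_composition_within_gender = shares(records, group_index=0, value_index=1)

print(gender_split_within_genre["horror"])              # {'Male': 0.5, 'Female': 0.5}
print(genre_composition_within_gender["Male"]["horror"])  # 0.25
```

In this toy data horror is split 50-50 by gender, yet it makes up 25% of the male records and 50% of the female ones, which is why the two questions can give different impressions.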

Methodology

Authors can have multiple genres listed on their Goodreads page. For example, Dante has poetry, religion and philosophy listed. In order to capture all of these genres, I unnest this array so that there is one author record for each genre. For example, there would be three records for Dante of the form, "('Male', 'poetry'), ('Male', 'religion'), ('Male', 'philosophy')." In order to study meaningful authors and have reliable estimates, I limit the authors included to those with at least 100 user ratings, and genre groups to those with at least 100 author records in them.
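The unnesting and filtering steps might look roughly like the following. This is a minimal sketch, not the actual script: the field names (`gender`, `genres`, `ratings_count`) and the sample rating counts are assumptions, and I assume the genre threshold is applied after the author threshold.

```python
from collections import Counter

MIN_RATINGS = 100        # author threshold stated in the text
MIN_GENRE_RECORDS = 100  # genre-group threshold stated in the text


def unnest_authors(authors, min_ratings=MIN_RATINGS,
                   min_genre_records=MIN_GENRE_RECORDS):
    """Return (gender, genre) records, one per author-genre pair, after both filters."""
    # Keep authors with enough ratings, then emit one record per listed genre.
    records = [
        (author["gender"], genre)
        for author in authors
        if author["ratings_count"] >= min_ratings
        for genre in author["genres"]
    ]
    # Drop genre groups that have too few author records.
    genre_sizes = Counter(genre for _, genre in records)
    return [r for r in records if genre_sizes[r[1]] >= min_genre_records]


# Field names and rating counts below are made up for illustration.
authors = [
    {"name": "Dante", "gender": "Male",
     "genres": ["poetry", "religion", "philosophy"], "ratings_count": 500},
    {"name": "Rarely Rated", "gender": "Female",
     "genres": ["poetry"], "ratings_count": 3},
]

# The genre threshold is lowered to 1 here only so the tiny sample is not emptied.
dante_records = unnest_authors(authors, min_genre_records=1)
print(dante_records)
# [('Male', 'poetry'), ('Male', 'religion'), ('Male', 'philosophy')]
```

The same function would apply to users and their favorite genres with a genre threshold of 1000 and whatever user-level filter is appropriate.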

A similar process is applied to Goodreads users and their "favorite genres." Here, I limit genre groups to those with at least 1000 user records in them, as there was a much larger number of distinct genres for users.

Gender prediction for authors is based on their name and a pronoun analysis of their Goodreads description. Gender-name data comes from the World Gender Name Dictionary 2.0 Dataset, specifically the location-irrelevant dataset, as I don't have reliable location estimates for both author and user data. For the author description, the male and female pronoun instances are counted, and an author is defined as male or female based on which count is greater. The pronoun-based prediction takes priority over the gender-name prediction. When no pronouns are found, or when the counts are equal, the gender-name prediction is used. When both are unreliable or null, the final prediction is null. Only authors with "reliable" predictions are included.
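The priority logic described above can be sketched as follows. The pronoun lists, the function names, and the `name_lookup` dictionary (standing in for the World Gender Name Dictionary, and assumed to contain only names with a reliable gender) are my assumptions; the original script may tokenize and define reliability differently.

```python
import re

# Assumed pronoun sets; the original analysis may use a different list.
MALE_PRONOUNS = {"he", "him", "his", "himself"}
FEMALE_PRONOUNS = {"she", "her", "hers", "herself"}


def predict_from_pronouns(description):
    """Return 'Male' or 'Female' by pronoun count, or None if absent or tied."""
    if not description:
        return None
    words = re.findall(r"[a-z]+", description.lower())
    male = sum(word in MALE_PRONOUNS for word in words)
    female = sum(word in FEMALE_PRONOUNS for word in words)
    if male > female:
        return "Male"
    if female > male:
        return "Female"
    return None  # no pronouns found, or the counts are equal


def predict_gender(first_name, description, name_lookup):
    """Pronoun prediction first; fall back to the name lookup; else None."""
    from_pronouns = predict_from_pronouns(description)
    if from_pronouns is not None:
        return from_pronouns
    if not first_name:
        return None
    return name_lookup.get(first_name.lower())


# Hypothetical stand-in for the gender-name dictionary.
name_lookup = {"jane": "Female", "john": "Male"}

print(predict_gender("John", "She writes thrillers.", name_lookup))     # Female
print(predict_gender("John", "An award-winning writer.", name_lookup))  # Male
print(predict_gender("Sam", None, name_lookup))                         # None
```

For users, where no description exists, only the name lookup branch applies.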

There is no user description available, so all gender predictions for users are based on the gender-name analysis. Users with indeterminate results are not included in the analysis.

For within-gender compositions (the pie charts below), I further group genres into broader categories to reduce the number of slices. For example, genres that match the regex pattern "computer|tech|^science$|math" are categorized as "STEM". You can see the rest of the groupings in this script.
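As a rough illustration of the grouping step, the sketch below applies the STEM pattern quoted above with `re.search`. Only the STEM pattern comes from the text; the romance/erotica pattern, the "Other" fallback label, the lowercasing, and the first-match-wins ordering are assumptions for the example.

```python
import re

# Ordered list of (category, pattern); the first matching pattern wins.
CATEGORY_PATTERNS = [
    ("STEM", re.compile(r"computer|tech|^science$|math")),  # pattern quoted in the text
    ("Romance/Erotica", re.compile(r"romance|erotic")),     # hypothetical pattern
]


def categorize(genre, default="Other"):
    """Map a raw genre label to a broader category via regex search."""
    label = genre.strip().lower()
    for category, pattern in CATEGORY_PATTERNS:
        if pattern.search(label):
            return category
    return default


print(categorize("science"))          # STEM
print(categorize("mathematics"))      # STEM
print(categorize("science-fiction"))  # Other
```

The `^science$` anchors are what keep "science-fiction" out of STEM, while the unanchored alternatives match anywhere in the label (so "biotech" would also land in STEM).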

Gender Splits


The figure below shows the author gender split within different genres. For reference, the author sample is about 48% male, meaning that any genre with a male share greater than 48% is disproportionately male. We can see some fairly intuitive results that replicate past literature. Some of the "most male" genres include subjects like philosophy (84% male), economics (86% male) and military history (99% male). The "most female" genres include subjects like romance (8% male), young adult (25% male) and chick-lit (5% male).

User gender splits are heavily skewed by the fact that only about 23% of the user sample is male. We still see fairly similar trends. There are disproportionately male genres like sports (53% male) and business (49% male), and there are disproportionately female genres like true-crime (18% male) and women's fiction (5% male). In the "middle" are fairly proportional genres like poetry (24% male) and historical fiction (22% male).

Within-Gender Composition


The within-gender composition allows us to ignore the overall gender split of our samples, and study preferences for each gender. The figure below shows genre compositions for male and female authors. Aside from the "other" category, the most common genres for male authors are mystery/crime (10%), sci-fi/fantasy (10%) and non-fiction (8%). The most common genres for female authors are romance/erotica (16%), sci-fi/fantasy (9%) and fiction (8%).

Some notable differences between male and female authors can be seen in genres like romance/erotica (2% for men, 16% for women), history (6% for men, 1% for women), and manga/comics (4% for men, 1% for women).

Compositions for male and female users are a bit more similar. Sci-fi/fantasy is the most common genre for men to read (12%), while mystery/crime is the most common for women (14%). Romance/erotica is one genre with notable differences in consumption by gender (3% for men, 7% for women). On the other side, manga/comics is more common among men than women (4% for men, 2% for women).

Conclusion/Further Work


In this brief analysis, we show differences in representation for authors and users, as well as differences in preferences. One extension worth exploring is simply looking at the rate at which a given genre appears in users' favorite genres. For example, what share of male users have romance in their favorite genres, and what share of female users have STEM in their favorite genres? The current analysis doesn't really get at this question, as it instead looks at which genres users most commonly read.
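A sketch of what that extension could compute is below. It works on one record per user, rather than one record per user-genre pair, so that each user is counted once. The field names and sample users are made up, and this is not part of the current analysis.

```python
def genre_presence_rate(users, gender, genre):
    """Share of users of the given gender who list the genre among their favorites."""
    group = [user for user in users if user["gender"] == gender]
    if not group:
        return None  # no users of this gender in the sample
    with_genre = sum(genre in user["favorite_genres"] for user in group)
    return with_genre / len(group)


# Made-up users, for illustration only.
users = [
    {"gender": "Male", "favorite_genres": ["romance", "sci-fi"]},
    {"gender": "Male", "favorite_genres": ["history"]},
    {"gender": "Female", "favorite_genres": ["romance"]},
    {"gender": "Female", "favorite_genres": ["STEM", "mystery"]},
]

print(genre_presence_rate(users, "Male", "romance"))  # 0.5
print(genre_presence_rate(users, "Female", "STEM"))   # 0.5
```

Unlike the within-gender compositions, these rates do not need to sum to 100% across genres, since a user can list several favorites.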

Still, this analysis shows differences in author and user representation across genres, like the male-dominated philosophy and sports, and the female-dominated romance and young-adult fiction.