Twitch’s attempt to change its sexist chat culture has a ways to go

By: Patrick W. Zimmerman

The Twitch community still doesn’t treat its male and female stars the same, but it’s been trying to change that. Twitch, the 500lb gorilla of game streaming platforms, has made news both for some of its toxic, misogynist elements and for its attempts to improve the behavior of its chat communities.

So how’s that going? Eeeeeeeeeeehhhhhhhhh……..

In a comparison of popular streams on the platform, female streamers are not only insulted more: their audiences also spend much more time paying attention to their looks, discussing their personality, wondering about (or discussing) their gender identity, talking about relationships, and using crude or explicit sexual language.

Through the language used by their audiences (specifically, in the IRC chat streams accompanying every broadcast), streamers elicit a good snapshot of the attitudes and expectations held by the community in which they reside: Twitch gamers and game fans.  That is to say, gender norms, assumptions, and biases (if present) will be detectable through differential language use between a representative sampling of female and male streamers.

How do expectations, ideologies, and cultural norms about gender express themselves on Twitch?

At its most inoffensive, this set of gender expectations can play out in a chat audience with compliments about a streamer’s hair or other physical features, or the announcement of the utterly fascinating discovery that the streamer is, in fact, a girl.  Playing video games. Online.  🤯

At its most vulgar, women streamers see a depressingly high amount of straight-up explicit commentary and blatant propositions.


The question

How does the gender of the streamer affect the frequency of insulting, infantilizing, gendered, and explicit language in Twitch chats?

Are there signs that Twitch’s attempts to promote inclusiveness and diversity in its streams (and its stream audiences) have been working?  Or does culture change take, you know, a lot of work over a long period of time?



Warning – This project will contain a lot of offensive and horribly hate-filled language as a necessary result of the subject of the study being the use of exactly that language in a particular online gaming community (Twitch.tv). 

Misogyny and swearing lie ahead. Ye be warned.


The short-short version

News flash: there’s still a double-standard for men and women. Our pilot study was on to something; the aggregate data over the course of the full month looks a lot like the quick-hit version of this project.

We logged the chats for 96 streams for a month (from late February to late March), totaling 1,440,392 users in the audience producing a 230,303,616-word corpus. The uptick in the frequency of our tested terms across all categories in chats for female streamers is stark and highly significant (no, seriously, your HS stat class didn’t even bother to list p-values this low). 

While (obviously) not a problem restricted to Twitch, the evidence is pretty damning that it still has quite a ways to go before its community reflects the company’s (stated) goals of diversity, openness, and obsessive enjoyment of games and gaming for one and all.

Term types with the strongest skew:

  • Butt comments (booty, bootie, or butt, but not poopbutt, which in Twitchese is “a player who’s sitting on their butt and not playing fast enough.”). This is a bit surprising given that the vast majority of webcams used in streams are head or torso shots, so no butts (except the characters’ on the gaming screen, if a 3rd-person game) in view.
  • It’s a girl! (girl, chick, woman, lady). Shocking, we know.
  • Comments (often complimentary) about hair (hair, wig, or look).
  • Stereotypically female personality tropes (cute, ditz, kawaii).
  • Body critique or compliments (fat, chub, lard, hot, thicc, pretty)

In order to avoid a biased test sample, we also included terms associated with male bodies, personalities, and sexuality, such as beard, ripped, fit, cuck (cuckold), dick, handsome, and asshole. Almost none of these were particularly common, even on male streams.

Yes, that’s right.  penis, asshole, and handsome actually show up (very very slightly) more frequently in the chats of female streamers.


Results

This box plot should concern Twitch.  

That’s a huge, huge skew of the dataset towards female gamers.  Broken down by term (≥1% normalized term frequency, see details in methodology section) and stream, women get more flak than men do, regardless of term category.  Go ahead, select for the term of your choice (or the category).  We’ll wait.

Female streamer median: 4.47% normalized frequency across all terms and streams.

Male streamer median: 1.66%.

As you probably can tell just by eyeballing it, this result is significant (p < 0.00001; test details below).

Note, since these are broken down by individual streams, the relative frequencies are calculated within each individual stream corpus, rather than across the entire dataset. Thus, there will be some variation between these numbers and those of the aggregate dashboards below (which use percentages for each term across the entire dataset for each gender).
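To see why the per-stream numbers and the aggregate numbers can disagree, here is a toy sketch. The counts are invented purely for illustration (they are not from our dataset), and the variable names are hypothetical. It compares the median of per-stream percentages with a single percentage pooled across all streams.

```python
from statistics import median

# Invented (term_count, reference_term_count) pairs for three streams.
# These are NOT real figures from the study.
streams = [(50, 1_000), (30, 2_000), (400, 40_000)]

# Per-stream view: normalize inside each stream, then take the median.
per_stream_pct = [100.0 * term / ref for term, ref in streams]
median_pct = median(per_stream_pct)

# Aggregate view: pool all counts first, then normalize once.
pooled_pct = 100.0 * sum(t for t, _ in streams) / sum(r for _, r in streams)

print(per_stream_pct, median_pct, pooled_pct)
```

In the pooled view a very large stream dominates the result, while in the per-stream view every stream counts once, so the two figures need not match.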


Aggregated across all streams, exactly 2 of our 24 qualifying test terms were more common among male streamers (in the chart below, below the reference line is more common in male streamer chats, above is more common in female streamer chats).

  • xxx or porn stands out as the only term that really skews much towards the male streamer group.
  • Generically insulting someone’s intelligence has juuuuuuuust a touch higher normalized frequency among male streams than female ones, but not enough to read a whole lot into.

Are bar graphs more your thing?  Then check out the median frequency (all terms), broken down by stream.

That’s a whole lot of pink on the right.  At least at this relatively high threshold of streamer popularity, there doesn’t seem to be any correlation between stream size and offensive language. 

Note: the x-axis doesn’t really mean anything, it’s just an ordering of streams from lowest median frequency to highest.


We also assigned each of the terms tested (both those that made the threshold and those that didn’t) into one of 5 categories:

  • Gender identity (discussing, questioning, or wondering about the gender, sex, identification, or orientation)
  • Gaming-ungendered (general comments and insults about the gameplay itself)
  • Personality
  • Physical appearance (either someone’s body, clothing, style, or the like, except for genitalia or as part of a term categorized as a sex act)
  • Relationships, sex acts, or explicit terminology (basically, anything that either describes a relationship or would get an x rating).

And, yup, same story.  Female streamers get a way higher percentage of problematic language, across every single category tested.

In fact, only one male category occurs more often than the lowest female one. That’s gender identity (which just edges out female personality traits), and it shouldn’t surprise anyone that homophobic slurs are thrown around at both sexes.


Want to see the full list of terms tested? None of this “threshold” buuuulll?

Ok. Here you go, in convenient parsable table form. The story doesn’t change any. You can see that even the clearly male-coded terms (for example, beard) don’t show up particularly often.

Our original list was culled from the top 1000 most frequent ngrams found in our pilot study, focusing on insults and dismissive or infantilizing language in the context of gaming. That is to say, a discussion of someone’s hairstyle in the context of a fashion show: not belittling. Same discussion while in a PlayerUnknown’s Battlegrounds match: much more problematic.

We then expanded the list with synonyms and then debugged for as many confounds as we could find (for example, if you searched simply for body, you would also pick up hits on everybody, somebody, anybody, and the like).
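The body confound is easy to show in a few lines. This is only a sketch of one way to handle it: the pattern below is an assumption for illustration, not the exact expression from our scripts. It uses a negative lookbehind so that body only counts when it is not the tail end of a longer word.

```python
import re

# Naive search: matches "body" anywhere, including inside "everybody".
NAIVE_BODY = re.compile(r"body", re.IGNORECASE)

# Hypothetical stricter pattern: "body" must not be preceded by a letter.
STRICT_BODY = re.compile(r"(?<![a-z])body", re.IGNORECASE)

def count_term(pattern, lines):
    """Count non-overlapping matches of a compiled pattern across chat lines."""
    return sum(len(pattern.findall(line)) for line in lines)

# Invented chat lines, just to exercise the two patterns.
sample = [
    "hi everybody",
    "nice body lol",
    "somebody clip that",
    "Body shots only",
]

naive = count_term(NAIVE_BODY, sample)
strict = count_term(STRICT_BODY, sample)
print(naive, strict)
```

On these four lines the naive pattern counts every line, while the stricter one only counts the two that actually talk about a body.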

Note that many of the terms that didn’t make the cutoff would actually increase the strength of the dataset, with body and (slut|whor|skank| ho[^a-z]) both showing up far more often with women streamers, but both falling just short of the required 1% relative frequency (at 0.91% and 0.89%).


Methodology

We selected 50 popular female and 50 popular male streams to use as our samples, all with at least 9,000 followers and many with numbers in the 100,000s. Four of the female streamers went dark during the course of the study (for whatever reason) and were removed from the sample set.

  • To be selected, a streamer needed to be streaming in English (‘cause writing an NLP script that handles multiple languages simultaneously is not something we wanted to deal with).
  • The person had to be streaming actual live gameplay, to keep the different chat contexts similar.
    • That means, no game debates, reviews, draw-alongs, or Twitch IRL streamers.
  • We intentionally selected both male and female streamers playing a variety of games (shooters, real-time strategy, card, MOBA, RPG, turn-based strategy, platformer, etc).
    • We didn’t narrow it down to a particular game or genre for a number of reasons, including a desire to get as representative a swath of gamers (and their audiences) as possible.
    • There’s no real way to control for streamers switching games over the course of the test period (one month), as many almost certainly did. There is no log in the chat of game played in-stream (other than the organic mention of the game name by audience members). The assumption with which we are working is that game changes in a population of widely-spread genres re-shuffled in a non-predictable way.

We then logged all the chat transcripts from 21 February 2018 to 21 March 2018 with irssi bots using Twitch’s API (if you’re interested in doing this yourself, Crunchprank has a very easy to follow starter guide).

The resulting text files, which were huge, were pruned of system messages (join messages, user counts, etc). The resulting dataset was about 1.3GB worth of plaintext files comprising 1,440,392 users (216,341 users in female streams, 1,224,051 in male streams). The male streamers were consistently more active and (thus) had a much higher volume of chat activity.
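For the curious, the pruning step can be sketched roughly like this. Our actual scripts were bash, and the log format here is an assumption (irssi's default layout, where chat lines look like "HH:MM <nick> text" and system lines start with "-!-" or "---"), so treat this as an illustration rather than the real pipeline.

```python
import re

# Assumed irssi-style public message: "HH:MM <nick> text", with an optional
# one-character mode prefix (or padding space) before the nick.
CHAT_LINE = re.compile(r"^\d{2}:\d{2} <[ @+%&~]?([^>]+)> (.*)$")

def prune(lines):
    """Yield (username, message) pairs, skipping anything that is not chat."""
    for line in lines:
        match = CHAT_LINE.match(line.rstrip("\n"))
        if match:
            yield match.group(1).lower(), match.group(2)

# Invented log lines in the assumed format.
raw_log = [
    "--- Log opened Wed Feb 21 00:00:01 2018",
    "00:01 -!- viewer1 [viewer1@example.invalid] has joined #somechannel",
    "00:02 < Viewer1> lul nice shot",
    "00:03 <@modbot> welcome",
]

chat = list(prune(raw_log))
print(chat)
```

Anything that does not look like a chat message is simply dropped, which is what removes join messages and other system noise.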

Term frequencies for male and female streamers for our test list were calculated using our suite of bash scripts, expressed as a relative percentage compared to the most common non-name term used (lul), which was set as equal to 100%.

The threshold for terms to be included (in all but the all terms dashboard) was set at ≥1% of the most common term in either the set of male or female streams, as a shorthand for “common enough that someone would notice/care/be affected by/be bothered by it.” And that’s a pretty low threshold. 1% relative works out to (in the male streamer dataset) ~30k out of 189.5M or 0.01609% of the words in the corpus.
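The arithmetic behind that threshold can be sketched as follows. The two corpus figures come from the paragraph above; the function names and the toy counts are hypothetical, and this is Python for readability rather than the bash we actually ran.

```python
def relative_frequency(term_count, reference_count):
    """Term count as a percentage of the most common non-name term's count."""
    return 100.0 * term_count / reference_count

def qualifies(male_pct, female_pct, cutoff_pct=1.0):
    """A term is kept if it reaches the cutoff in either gender's corpus."""
    return male_pct >= cutoff_pct or female_pct >= cutoff_pct

# Figures quoted in the text for the male-streamer corpus.
corpus_words = 189_500_000
share_of_corpus_pct = 0.01609

# Occurrences a term needs to hit the 1% cutoff (roughly 30k).
cutoff_count = corpus_words * share_of_corpus_pct / 100

# The reference term must then occur about 100 times as often.
implied_reference_count = cutoff_count * 100

# Invented counts, purely to exercise the functions.
toy_pct = relative_frequency(term_count=45_000, reference_count=3_000_000)
print(cutoff_count, implied_reference_count, toy_pct, qualifies(0.4, toy_pct))
```

Checking the cutoff against either corpus (rather than both) means a term that is common in only one gender's chats still gets tested.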

We used a two-tailed Mann-Whitney U test (because we didn’t want to assume a normal distribution) to test for significance at the α=0.01 level. The resulting U was 73842.5 and the z-score was 11.20234 (±2.58 is the threshold). Correcting for multiple testing (Bonferroni), with 24 testing terms, our p-value of p<0.00001 was still lower than the critical threshold of p<0.000417. 
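If you want to poke at the test yourself, here is a minimal pure-Python sketch of a two-tailed Mann-Whitney U using the normal approximation, plus the Bonferroni arithmetic. It is an illustration under assumptions: it applies no tie or continuity correction (we make no claim here about which corrections the original calculation used), and the sample values are invented.

```python
import math

def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Return (U for sample a, z-score, two-tailed p) via normal approximation."""
    n1, n2 = len(a), len(b)
    ranks = rank_with_ties(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p-value
    return u1, z, p

# Bonferroni: divide alpha by the number of terms tested.
alpha = 0.01
n_terms = 24
bonferroni_alpha = alpha / n_terms  # about 0.000417, as quoted in the text

# Invented per-stream percentages, only to exercise the function.
female = [4.5, 5.1, 3.9, 6.2, 4.8]
male = [1.2, 1.9, 2.4, 1.6, 2.0]
u, z, p = mann_whitney_u(female, male)
print(u, z, p, p < bonferroni_alpha)
```

The rank-based approach is why no normality assumption is needed: only the ordering of the values matters, not their distribution.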

What potential biases were introduced into our sample?

  • Sample stream size
    • All our streams were relatively large, so even though they showed no correlation between activity (as measured by wordcount) and test term frequency level, different sized chat audiences might act in systematically different ways.
    • Our male streamers got much larger audiences (and, thus, activity levels) than our female ones. We corrected for differing chat corpus size through the use of normalized term frequency (as opposed to absolute frequency), but there’s a chance that smaller-sized male stream chat audiences would act differently.
  • Sexual breakdown of audiences. While it’s possible (even probable) that there was an uneven distribution of men and women across the two groups of corpora, gender information of audience members is not readily identifiable in chat, either for the participants or for observers (it’s not contained in any kind of metadata).
  • Sample date. There might be something particular in the gaming calendar about February and March that biases the sample. Though this seems pretty unlikely.
  • The ngram test list. It is quite possible that we missed some term, either because of sampling or imperfect leetspeak, which is alive and well on Twitch.

In short: random chance is certainly not behind these results, but follow-up studies are needed to be sure we didn’t miss something due to stream size, game genre, or something of the like.


What’s it all mean?

People are still jerks. Next question!

Or, more seriously, it means that social change is hard to engineer, even when you control the platform, the context, the financial incentives, and (to some extent) the content of a community. Twitch can make its community more welcoming to gamers of all chromosomal alignments, but there is likely no quick fix. It will take a concerted and systematic effort from the company and (probably) involve as much of a human solution as a technological one: an active team of moderators, an incentive structure that rewards community members who lead by example, and a focus on the early establishment of fanbases, subcommunities, and stream subscriber bases. Community cultures, once set, have a great deal of inertia... but this can be taken advantage of to prime new groups in certain ways.


What’s next?

Is it just a group of bad actors, or is the language use relatively evenly spread throughout the community?

We’ve got a script running right now to chunk this dataset by username, looking both for users who stand out for their language and who appear in the chats of multiple streams in our dataset. It’s running, but dayam does it take forever to:

  1. Pull a list of usernames, both a master list for each gender corpus, then a collated list for each stream corpus
  2. Search for the number of streams each username appears in.
  3. Extract a corpus (rest of line) for each occurrence of that username.
  4. Then look for our list of terms in each of those corpora.
….for each of the 1.4M users.
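The four steps above can be sketched in Python as a single pass over the chats. This is not the script we are running: the function names are hypothetical, the input is assumed to already be parsed into (username, message) pairs per stream, and the sample data is invented.

```python
import re
from collections import defaultdict

def index_by_user(stream_chats):
    """Build per-user indexes from {stream_name: [(username, message), ...]}."""
    streams_per_user = defaultdict(set)
    messages_per_user = defaultdict(list)
    for stream, chat in stream_chats.items():
        for user, message in chat:
            streams_per_user[user].add(stream)       # steps 1 and 2
            messages_per_user[user].append(message)  # step 3
    return streams_per_user, messages_per_user

def term_hits(messages, patterns):
    """Step 4: count matches of each named pattern in one user's messages."""
    return {
        name: sum(len(pattern.findall(msg)) for msg in messages)
        for name, pattern in patterns.items()
    }

# Invented chats and a single illustrative pattern.
stream_chats = {
    "stream_a": [("ann", "nice hair"), ("bob", "gg")],
    "stream_b": [("ann", "cute hair hair")],
}
patterns = {"hair": re.compile(r"\bhair", re.IGNORECASE)}

streams_per_user, messages_per_user = index_by_user(stream_chats)
ann_hits = term_hits(messages_per_user["ann"], patterns)
print(len(streams_per_user["ann"]), ann_hits)
```

Building both indexes in one pass avoids re-scanning the logs once per username, which is the part that gets painful at 1.4M users. The trade-off is memory, since every message is held per user.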

As we said, the server gremlins are not happy about the amount of overtime they’re putting in these days. And they’re organized.

About The Author

Architeuthis Rex, a man of (little) wealth and (questionable) taste. Historian and anthropologist interested in identity, regionalism / nationalism, mass culture, and the social and political contexts in which they exist. Earned Ph.D. in social and cultural History with a concentration in anthropology from Carnegie Mellon University and then (mostly) fled academia to write things that more than 10 other people will actually read. Driven to pursue a doctorate to try and answer the question, "Why do they all hate each other?" — still working on it. Plays beer-league hockey, softball, and soccer. Professional toddler wrangler. Likes dogs, good booze, food, and horribly awesome kung-fu movies.
