Looking at Ukraine through the messages of Twitch users

Introduction

Global events significantly influence online conversations, and advanced technologies enable us to analyze these discussions to understand public sentiment and behavior. In this series, we examine how Twitch users’ discussions about Ukraine have evolved.

The war in Ukraine, which escalated into a full-scale Russian invasion in February 2022, has sparked intense discussion across online platforms. We chose to focus on Ukraine because of its profound impact on global discourse. Using Sentinel, we aimed to track and analyze the volume and sentiment of Twitch messages related to Ukraine, providing insight into public perception and the influence of key events.

We created a dedicated project in Sentinel to ensure effective streaming and processing of messages, capturing all relevant flags including sentiment and various forms of toxicity.

In this first post of the series, we aim to provide an overview of the project, including the background and importance of analyzing online conversations about Ukraine. We will outline the data collection and preprocessing steps, and present an initial analysis of message sentiment and volume. This will set the stage for deeper insights into public sentiment trends in the subsequent posts.

Volume of Messages

We store around 5 billion messages, with activity peaking during American time zones, which reflects high engagement from users there. Our system processes between 7 and 10 million messages daily, and real-time ingestion can handle up to 600 messages per second.
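As a rough sanity check, the daily volumes translate into average per-second rates as follows. This back-of-the-envelope sketch assumes messages are spread evenly across the day, which they are not, since activity peaks during American time zones.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
PEAK_INGESTION_RATE = 600       # messages per second, the figure quoted above


def average_rate(messages_per_day: int) -> float:
    """Average messages per second implied by a daily message volume."""
    return messages_per_day / SECONDS_PER_DAY


for daily_volume in (7_000_000, 10_000_000):
    rate = average_rate(daily_volume)
    headroom = PEAK_INGESTION_RATE / rate
    print(f"{daily_volume:,} msgs/day ~ {rate:.1f} msgs/s on average; "
          f"the {PEAK_INGESTION_RATE} msgs/s ceiling is {headroom:.1f}x that")
```

Even at 10 million messages a day, the average load (about 116 messages per second) leaves roughly five times headroom for peaks.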

Data Collection and Preprocessing

We accessed Twitch data and applied filters to isolate messages related to Ukraine. This involved using keywords associated with significant events and terms like “Ukraine” and “Bayraktar.” Here is a sample message we see from Twitch:

{
  "message": "Hello world!",
  "metadata": {
    "id": "dacfeab4-9b9b-11eb-b515-00155d193704",
    "channel": {
      "id": "1",
      "name": "esl_csgo",
      "type": "channel"
    },
    "sender": {
      "id": "1",
      "name": "esl_csgo",
      "type": "individual"
    },
    "recipient": {
      "id": "1",
      "name": "esl_csgo",
      "type": "channel"
    },
    "timestamp": "2022-03-22T18:59:44.155Z"
  }
}
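The keyword filtering step can be sketched as a case-insensitive match on the "message" field of objects like the one above. Only "Ukraine" and "Bayraktar" come from this post; the other keywords, the whole-word rule, and the function name are illustrative assumptions rather than Sentinel's actual filter.

```python
import re

# "ukraine" and "bayraktar" are named in the post; the rest are hypothetical
# examples of event-related terms.
KEYWORDS = ["ukraine", "ukrainian", "bayraktar", "kyiv", "severodonetsk"]

# One case-insensitive pattern. \b requires whole-word matches, so
# "Ukraine's" matches but a longer token such as "ukrainefan" does not.
KEYWORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def is_ukraine_related(message: dict) -> bool:
    """True if the message text contains at least one tracked keyword."""
    return KEYWORD_PATTERN.search(message.get("message", "")) is not None


stream = [
    {"message": "Hello world!"},
    {"message": "BAYRAKTAR song again"},
    {"message": "news from Ukraine's east"},
]
related = [m for m in stream if is_ukraine_related(m)]
```

A plain keyword list is easy to audit but will miss transliterations and misspellings, so the list usually needs tuning as new terms appear.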

The Message Section

  • “message”: The text of the chat message; in this sample, "Hello world!".

The Metadata Section

The “metadata” section provides additional information about the message.

  1. ID

    • “id”: The message's unique identifier, here "dacfeab4-9b9b-11eb-b515-00155d193704".
  2. Channel Information

    • “channel”: Details about the communication channel where the message is sent.
  3. Sender Information

    • “sender”: Information about who sent the message.
  4. Recipient Information

    • “recipient”: Information about who received the message.
  5. Timestamp

    • “timestamp”: The date and time when the message was sent, formatted as "2022-03-22T18:59:44.155Z".
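Reading these fields needs only the Python standard library. The sketch below parses the sample object shown earlier; the one subtlety is the trailing "Z" on the timestamp, which `datetime.fromisoformat` only accepts from Python 3.11 onwards, so it is rewritten as an explicit UTC offset first. The helper name is hypothetical.

```python
import json
from datetime import datetime

raw = """{
  "message": "Hello world!",
  "metadata": {
    "id": "dacfeab4-9b9b-11eb-b515-00155d193704",
    "channel": {"id": "1", "name": "esl_csgo", "type": "channel"},
    "sender": {"id": "1", "name": "esl_csgo", "type": "individual"},
    "recipient": {"id": "1", "name": "esl_csgo", "type": "channel"},
    "timestamp": "2022-03-22T18:59:44.155Z"
  }
}"""


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp ending in "Z" on any Python 3.7+."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


msg = json.loads(raw)
meta = msg["metadata"]
sent_at = parse_timestamp(meta["timestamp"])
print(meta["channel"]["name"], meta["sender"]["type"], sent_at.strftime("%A %H:%M"))
# esl_csgo individual Tuesday 18:59
```

Keeping the timestamp timezone-aware makes later grouping by day or week unambiguous.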

Sentinel then analyzes each message and adds an "analyses" section with the following fields, as shown below:

  • toxicity: Scores for how likely the language is to be an identity attack (such as racism), insulting, obscene, sexually explicit, threatening, or severely toxic. In the sample, "total" is the sum of the individual category scores.
  • duplicates: How many times this message has been seen before across all of Twitch, which is useful for identifying spam and bots.
  • sentiment: Whether the message is positive, neutral, or negative, expressed as neg, neu, and pos scores plus an overall compound score.

{
  "analyses": {
    "toxicity": {
      "total": 1.106,
      "identity_attack": 0.005,
      "insult": 0.211,
      "obscene": 0.333,
      "severe_toxicity": 0,
      "sexual_explicit": 0.554,
      "threat": 0.003
    },
    "duplicates": {
      "duplicationKey": "2489c30d4b92cef309ace080fd2ff97d",
      "duplicateCount": 0
    },
    "sentiment": {
      "neg": 0.1,
      "neu": 0.1,
      "pos": 0.1,
      "compound": 0.3
    }
  }
}
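A minimal sketch of how these three fields could be derived once the underlying model scores exist. The toxicity total matches the sample (the six category scores sum to 1.106). The duplication key (MD5 of normalized text), the `DuplicateTracker` class, and the ±0.05 sentiment cut-off are assumptions for illustration, not Sentinel's documented behavior; the category and compound scores themselves would come from upstream models not shown here.

```python
import hashlib
from collections import Counter

# Category names as they appear in the sample "toxicity" object.
TOXICITY_CATEGORIES = ("identity_attack", "insult", "obscene",
                       "severe_toxicity", "sexual_explicit", "threat")


def toxicity_total(scores: dict) -> float:
    """Sum the per-category scores (matches "total" in the sample: 1.106)."""
    return round(sum(scores[name] for name in TOXICITY_CATEGORIES), 3)


def duplication_key(text: str) -> str:
    """Hypothetical key: MD5 of the lower-cased, whitespace-collapsed text.

    MD5 gives a 32-hex-character key like the sample's, but the actual
    normalization and hash Sentinel uses are not documented in this post.
    """
    normalized = " ".join(text.lower().split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


class DuplicateTracker:
    """Counts how many times each duplication key has been seen before."""

    def __init__(self) -> None:
        self._seen = Counter()

    def observe(self, text: str) -> dict:
        key = duplication_key(text)
        previous = self._seen[key]  # sightings before this message
        self._seen[key] += 1
        return {"duplicationKey": key, "duplicateCount": previous}


def sentiment_label(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score in [-1, 1] to a label.

    The +/-0.05 cut-off is an assumption borrowed from the convention
    commonly used with VADER-style compound scores.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"
```

With a single tracker, observing "Hello world!" returns a count of 0, and then observing "hello   WORLD!" returns 1, because both normalize to the same text.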

Now that we understand our message object and how Sentinel adds relevant analysis, we are ready to move on to sense checking our data.

Top Sentiments

A sense check of our data reveals a variety of sentiments. The tables below list the three most negative and three most positive messages captured:

Top 3 Negative Sentiments

Name Message Sentiment
Azthir fuck war asmonsmash fuck urss asmonsmash fuck putin fuck war asmonsmash fuck urss asmonsmash fuck putin fuck war asmonsmash fuck urss asmonsmash fuck putin fuck war asmonsmash fuck urss asmonsmash fuck putin fuck war asmonsmash fuck urss asmonsmash fuck putin fuck war asmonsmash fuck urss asmonsmash fuck putin -1.00
erafor9 vladmir putin war criminal pog vladmir putin war criminal pog vladmir putin war criminal pog vladmir putin war criminal pog vladmir putin war criminal pog vladmir putin war criminal pog vladmir putin war criminal pog vladmir putin war criminal pog vladmir putin war criminal pog vladmir putin war criminal pog -1.00
russiancoiiusion fuck palestine fuck palestine fuck palestine fuck palestine fuck palestine fuck palestine fuck palestine fuck palestine fuck palestine fuck palestine fuck palestine fuck palestine fuck palestine fuck palestine fuck palestine fuck palestine fuck palestine fuck palestine fuck palestine fuck palestine fuck palestine -1.00

Top 3 Positive Sentiments

Name Message Sentiment
whiteguard17 <3 you so cute <3 russians love you <3 you so cute <3 russians love you <3 you so cute <3 russians love you <3 you so cute <3 russians love you <3 you so cute <3 russians love you <3 you so cute <3 russians love you <3 you so cute <3 russians love you <3 you so cute <3 russians love you <3 you so cute <3 russians love you <3 you so cute <3 russians love you <3 you so cute <3 russians love you <3 you so cute <3 russians love you <3 you so cute <3 russians love 1.00
rusuaaa russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 1.00
rusuaaa russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 russian best <3 1.00

From the tables, sentiment appears to be analyzed effectively. Negative sentiment picks up users posting expletives against Vladimir Putin and Palestine, while positive sentiment picks up messages praising Russia.
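Building these sense-check tables amounts to ranking scored messages by their compound sentiment. A minimal sketch, assuming each message has already been reduced to a hypothetical (user, text, compound) tuple; the function name is also illustrative. Because many messages tie at exactly ±1.00, which of the tied rows appear depends on input order.

```python
import heapq
from operator import itemgetter
from typing import Iterable, List, Tuple

Scored = Tuple[str, str, float]  # (user, message text, compound score)


def top_messages(scored: Iterable[Scored], n: int = 3,
                 most_negative: bool = True) -> List[Scored]:
    """Return the n most negative (or most positive) messages."""
    by_score = itemgetter(2)
    if most_negative:
        return heapq.nsmallest(n, scored, key=by_score)
    return heapq.nlargest(n, scored, key=by_score)
```

`heapq.nsmallest` and `heapq.nlargest` avoid sorting the full stream, which matters when picking a handful of rows out of millions of messages.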

User Analysis

Next, we examined the top users by total toxicity and sentiment. We assigned each user message a sentiment score ranging from -1 (very negative) to +1 (very positive), with 0 being neutral, and scored each toxicity category on a scale from 0 (non-toxic) to 1 (highly toxic).

By summing these scores, we ranked users based on their total sentiment and toxicity, highlighting those with the highest levels of positive or negative sentiment and the most toxic content.

The total sentiment scores are particularly insightful as they help us identify significant changes or spikes in user behavior. For example, a sudden increase in negative sentiment might indicate a reaction to a specific event, which is crucial for spotting trends.

Identifying these trends allows us to understand the underlying causes, such as community dynamics, external events, or the impact of platform policies. This information enables us to take proactive measures to manage community health. For instance, if we observe increasing toxicity, we can implement stricter moderation or introduce features to promote positive engagement.

Why Total Toxicity is Effective

In our analysis, we focus on total toxicity because it provides a clear picture of harmful behavior, ensuring that we don’t overlook spikes of negative activity. Unlike using an average or median score, which can dilute the impact of extreme behavior, total toxicity captures the full extent of harmful content generated by users.

By summing up the toxicity scores for each user, we can identify those who contribute the most disruptive content. These spikes of bad behavior are exactly where we need to act to protect the community. High total toxicity scores highlight users who consistently engage in harmful behavior, making it easier for us to pinpoint and address the root causes.
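A small worked example shows why the total matters. The per-message toxicity scores below are invented for illustration: one user posts eight toxic messages, another posts a single one. Ranking by mean puts the one-off poster first, while ranking by total surfaces the user responsible for far more harmful content.

```python
from statistics import mean

# Invented per-message toxicity scores for two hypothetical users.
toxicity_by_user = {
    "prolific_troll": [0.9, 0.8, 0.95, 0.7, 0.85, 0.9, 0.8, 0.75],
    "one_off_poster": [0.98],
}

for user, scores in toxicity_by_user.items():
    print(f"{user}: total={sum(scores):.2f}, mean={mean(scores):.2f}, "
          f"messages={len(scores)}")

# The two aggregations disagree about who is most harmful.
by_total = max(toxicity_by_user, key=lambda u: sum(toxicity_by_user[u]))
by_mean = max(toxicity_by_user, key=lambda u: mean(toxicity_by_user[u]))
print("highest total:", by_total)  # prolific_troll (6.65)
print("highest mean:", by_mean)    # one_off_poster (0.98)
```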

This approach is crucial for maintaining a healthy and positive online environment. By focusing on total toxicity, we can take targeted actions against users who are having the most significant negative impact. Whether it’s through stricter moderation, implementing new community guidelines, or providing support for affected users, addressing these spikes is key to safeguarding our community.

In summary, total toxicity is effective because it ensures that we don’t miss critical instances of bad behavior. It allows us to act swiftly and decisively to mitigate harm and promote a safer, more positive platform for all users.

For each user, we also aggregate their messages and report the date, day of the week, and time at which their messages are most negative or positive. This can help when deciding what action to take against a user.

Top 3 Negative Users

| Name | Total Negative Sentiment | Median Sentiment | Message Count | Normalized Negative Sentiment | Sentiment | Date | Day of Week | Time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| nickname_matters | -175.43 | -0.98 | 206 | -0.85 | -0.99 | 2023-06-23 | Friday | 21:36:43.395000 |
| studaks | -100.99 | -0.75 | 125 | -0.81 | -0.96 | 2022-04-04 | Monday | 22:21:27.728000 |
| QQ1949192414188 | -85.11 | -0.99 | 106 | -0.80 | -1.00 | 2022-04-14 | Thursday | 15:26:20.509000 |

Top 3 Positive Users

| Name | Total Positive Sentiment | Median Sentiment | Message Count | Normalized Positive Sentiment | Sentiment | Date | Day of Week | Time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| garbage445 | 348.60 | 0.95 | 370 | 0.94 | 0.99 | 2023-07-26 | Wednesday | 18:53:38.544000 |
| Saiavakie | 119.52 | 0.95 | 128 | 0.93 | 0.95 | 2021-06-17 | Thursday | 20:15:57.451000 |
| ONeKpacaB4uK | 150.79 | 0.96 | 162 | 0.93 | 0.99 | 2023-03-11 | Saturday | 16:39:27.420000 |
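The user-level columns can be reproduced with a straightforward group-by. In both tables, normalized sentiment equals total sentiment divided by message count (for example, -175.43 / 206 ≈ -0.85 and 348.60 / 370 ≈ 0.94). The sketch below assumes the total sums every compound score a user posted; the post does not say whether only negative (or positive) messages were included. The function name and tuple layout are hypothetical, and rows are sorted by total, following the ranking described earlier.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median
from typing import Iterable, List, Tuple

Message = Tuple[str, float, datetime]  # (user, compound score, timestamp)


def summarize_users(messages: Iterable[Message],
                    most_negative: bool = True) -> List[dict]:
    """Aggregate per-user sentiment into the columns used in the tables."""
    by_user = defaultdict(list)
    for user, compound, ts in messages:
        by_user[user].append((compound, ts))

    pick_peak = min if most_negative else max
    rows = []
    for user, items in by_user.items():
        scores = [score for score, _ in items]
        total = sum(scores)
        # The single most negative (or positive) message and when it was sent.
        peak_score, peak_ts = pick_peak(items, key=lambda item: item[0])
        rows.append({
            "name": user,
            "total": round(total, 2),
            "median": round(median(scores), 2),
            "count": len(scores),
            "normalized": round(total / len(scores), 2),
            "sentiment": peak_score,
            "date": peak_ts.date().isoformat(),
            "day_of_week": peak_ts.strftime("%A"),
            "time": peak_ts.time().isoformat(),
        })

    # Most extreme total first: ascending for negative, descending for positive.
    rows.sort(key=lambda row: row["total"], reverse=not most_negative)
    return rows
```

`time().isoformat()` produces the same `HH:MM:SS.ffffff` format as the Time column when the timestamp has a fractional part.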

Message Volume Over Time

We created a graph to visualize the count of messages over time, highlighting key dates such as the start of the war, the Battle for Severodonetsk, the one-year anniversary, and US-Ukraine discussions. The biggest spike coincides with the start of the war, with another noticeable increase around the anniversary.

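[Figure: daily count of Ukraine-related messages, January 2021 to May 2024, with key events marked by red dashed lines]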

We analyzed the number of messages over time to understand how significant events influenced user engagement. The graph above shows message activity from January 2021 to May 2024, highlighting key moments that sparked increases in messaging.

The vertical axis shows the number of messages posted, ranging up to 80,000, while the horizontal axis represents the timeline. The blue line traces daily message counts, revealing several significant spikes. These spikes correlate with major events, marked by red dashed lines, such as the start of the war in early 2022, the Battle for Severodonetsk, the one-year anniversary of the war, and crucial US-Ukraine discussions in early 2024.

Our findings indicate that the start of the war triggered the highest user activity, with message counts peaking dramatically. This initial spike underscores the immediate impact of significant geopolitical events on online discourse. Subsequent spikes around key events, like battles and anniversaries, show that users remain highly engaged during pivotal moments.

After the initial surge, the number of messages decreased but still rose periodically around significant dates. These patterns suggest ongoing discussion and sustained interest in the topic. For companies focused on sentiment analysis, understanding these trends is crucial: it shows when and why users are most engaged, giving deeper insight into user behavior and the factors driving online conversations.

By analyzing message spikes and their correlation with key events, we can better understand user sentiment and engagement, helping to create a more responsive and informed community management strategy.
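The post does not say how spikes were identified. One simple option, sketched here as an assumption rather than the method used in this analysis, flags days whose count exceeds the mean by more than k standard deviations and then pairs each flagged day with any event within a few days. Only the start-of-war date (24 February 2022) is filled in; the other events would need their own dates, and the function names are hypothetical.

```python
from datetime import date
from statistics import mean, stdev
from typing import Dict, List


def find_spikes(daily_counts: Dict[date, int], k: float = 3.0) -> List[date]:
    """Days whose count exceeds mean + k * sample standard deviation."""
    values = list(daily_counts.values())  # needs at least two days
    threshold = mean(values) + k * stdev(values)
    return sorted(day for day, count in daily_counts.items() if count > threshold)


def match_events(spikes: List[date], events: Dict[str, date],
                 window_days: int = 7) -> Dict[date, List[str]]:
    """Pair each spike with the events that fall within +/- window_days."""
    return {
        day: [name for name, when in events.items()
              if abs((day - when).days) <= window_days]
        for day in spikes
    }


EVENTS = {"Start of the war": date(2022, 2, 24)}
```

A single global threshold is dominated by the largest spike, so smaller peaks such as the anniversary may need a rolling baseline or a lower k to be flagged.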

To make the graph clearer, we smoothed the data by grouping the messages into weekly periods. This approach helps in identifying trends more easily.
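The weekly grouping described above can be done with the standard library. This sketch assumes weeks start on Monday and that each weekly value is the sum of its daily counts; the post does not specify either choice, and the function name is hypothetical. In pandas, `Series.resample("W").sum()` produces similar bins, though its default weekly frequency (`W-SUN`) labels each week by the Sunday it ends on.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict


def weekly_totals(daily_counts: Dict[date, int]) -> Dict[date, int]:
    """Sum daily message counts into Monday-start weeks.

    Returns a dict mapping each week's Monday to that week's total.
    """
    weeks = defaultdict(int)
    for day, count in daily_counts.items():
        week_start = day - timedelta(days=day.weekday())  # Monday of that week
        weeks[week_start] += count
    return dict(sorted(weeks.items()))
```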

[Figure: weekly count of Ukraine-related messages with the same key events marked]

The smoother line in this graph offers a clearer view of the overall trends, eliminating some of the noise present in the daily data. This makes it easier to see the sustained increase in messages around key events and the gradual return to baseline levels of activity.

The key events stand out clearly in the weekly view:

  • Start of the War: A dramatic increase in messages occurs around early 2022, reaching the highest peak, similar to the first graph.
  • Battle for Severodonetsk: A notable spike is seen around mid-2022.
  • One Year Anniversary: Another increase in message activity is observed around early 2023.
  • US-Ukraine Discussions: A smaller but still significant rise is visible around early 2024.

These findings show a clear correlation between key events and message volume, which warrants further investigation. In the next post, we will build a broader picture of the sentiment across the data.

Conclusion

In this first post, we set the stage by explaining our project’s background, data collection process, and initial analysis of message sentiment and counts. The next post in this series will delve deeper into plotting sentiment values over the entire data set, providing more insights into public sentiment trends. Stay tuned!

For more information about Sentinel and our real-time content analysis capabilities, visit Sentinel’s Main Product Site.