The language of tweets

This post is part of the output of the Bangalore Fifth Elephant Hacknight.

What you see above are the words most often used on Twitter by Indians. The size of each bubble indicates how often the word is used.

We wanted to see whether people with many followers use specific words that people with few followers do not. The words on the left (coloured red) are used mainly by people with few followers. The words on the right (coloured green) are used mainly by people with many followers.

(At this point, it’s worth discussing the dataset. These are one week’s worth of geocoded tweets, mainly from India but also including Pakistan, Nepal, etc. It’s interesting that there were just 80,000 geocoded tweets in this period, and many of them were FourSquare entries.)

People with few followers often talk about “know”, “high” and “traffic”. People with many followers use significantly more hashtags. Whether this is a cause or an effect of having many followers is, of course, debatable. But the correlation is quite definite.

It also appears that those with more followers are polite. The “good morning”s and “thank you”s are quite to the right. Those with more followers are more likely to say “good” than “bad”, and vice versa. Perhaps there’s something about having Twitter followers that leads to happiness – or is it the other way around?

This picture shows you the words more often used in replies (on the left, in red) when compared to new tweets (on the right, in green).

“haha” and “lol” appear rather prominently in replies. Either folks who reply are an amused bunch, or it’s the funny tweets that get more replies. A lot of replies are also to thank people. The dominance of Mumbai, Maharashtra and Delhi on the right is most easily explained by the presence of the words “@foursquare” and “mayor”: most of these tweets appear to be FourSquare related.

The above shows the words used in the morning (up to 12 noon) vs the evening. Clearly, people mention “morning” in the morning, often, but not always, in the context of “good morning”. The evenings, at least in this week, were dominated by Euro 2012.

The visualisation used above is a document contrast diagram. Each word is drawn as a bubble, whose size represents its frequency. The horizontal position determines whether the word is closer to one aspect or another – e.g. replies on the left vs new tweets on the right. This is a very quick and easy way of understanding what characterises an aspect (e.g. which words are often used with good vs bad), as well as the context in which words are used.
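The post does not include the code behind the diagram, so the following is only a minimal sketch of how the two quantities described above might be computed. It assumes a simple regex tokenizer and a position score of (right rate − left rate) / (right rate + left rate), which runs from -1 (only in the left group) to +1 (only in the right group). The function names, the scoring formula and the sample tweets are all illustrative assumptions, not the method used at the hacknight.

```python
import re
from collections import Counter

# Assumed tokenizer: keeps hashtags and @mentions as single tokens.
TOKEN = re.compile(r"[#@]?\w+")


def tokenize(text):
    """Lower-case a tweet and split it into word-like tokens."""
    return TOKEN.findall(text.lower())


def contrast(left_docs, right_docs, min_count=1):
    """Return {word: (total_count, position)} for two groups of texts.

    total_count would drive the bubble size; position (-1 to +1)
    would drive the horizontal placement between the two aspects.
    """
    left = Counter(w for doc in left_docs for w in tokenize(doc))
    right = Counter(w for doc in right_docs for w in tokenize(doc))
    # Normalise by group size so a larger group does not pull every word its way.
    left_total = sum(left.values()) or 1
    right_total = sum(right.values()) or 1

    result = {}
    for word in set(left) | set(right):
        count = left[word] + right[word]
        if count < min_count:
            continue
        left_rate = left[word] / left_total
        right_rate = right[word] / right_total
        position = (right_rate - left_rate) / (right_rate + left_rate)
        result[word] = (count, position)
    return result


if __name__ == "__main__":
    # Made-up sample tweets, purely for illustration.
    replies = ["haha thanks", "lol thanks"]
    new_tweets = ["good morning", "thanks morning"]
    for word, (count, position) in sorted(contrast(replies, new_tweets).items()):
        print(f"{word:10s} size={count} x={position:+.2f}")
```

Normalising by each group's total word count is one design choice among several; a log-odds ratio or a smoothed score would also work and would be gentler on rare words.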

Gramener - A Straive Company

Gramener – A Straive company is a design-led data science firm. We build custom Data & AI solutions that help solve complex business problems with actionable insights and compelling data stories.

View Comments

  • twitter download does not give all tweets but only a random set. So, is it surely 80,000 or only twitter knows true number. Also, "good" or "bad" is not distinguishing polite from impolite. In fact low followers types use "love" and "best" more...just amusing, not shattering insights

    • GK Singh, these are geo-coded tweets from their Streaming API, and are a complete set. The reason it's about 80,000 is because most tweets in India are not geo-coded.

      Agreed on "good" and "bad" not distinguishing polite from impolite. A lot more textual analysis needs to go into a deeper understanding. In the 6 hours we spent at the hacknight, we were focusing on prototyping a visualisation that could bring out the insights and demonstrate the process end-to-end -- rather than dive deep into any one area such as text analytics.

Published by
Gramener - A Straive Company
