Analysing YouTube Comments — Stuff Made Here
Introduction
Shane Wighton’s YouTube channel Stuff Made Here is one of my favourites. It is an engineering-focused channel where he makes videos about his inventive builds. I have been watching his videos since he started back in March 2020. If you haven’t seen them yet, I definitely recommend checking out his content.
On November 26, 2020, Stuff Made Here uploaded a video titled “Making an unpickable lock. Calling locksmiths.” In that video, Shane built a lock using some interesting techniques, and a local locksmith was unable to pick it. His wife suggested that he send it to LockPickingLawyer (a YouTuber known for picking many types of locks). LockPickingLawyer and Shane got in touch, and Shane decided to send him an “improved” version. It took him around six months to improve the lock and send it. Because this was one of the most anticipated YouTube crossovers, viewers kept asking Shane about it.
When he finally uploaded the video about it, titled “TWO Unpickable (?) Locks for Lock Picking Lawyer!”, he talked about the attention the idea had received and mentioned that LockPickingLawyer had come up so often it was hard to count. That gave me the idea to actually count the mentions.
Visualizations
Word-count table from Stuff Made Here comments
Word cloud from Stuff Made Here comments
Interesting Words
One thing I noticed during the project was how many different ways people misspell ‘lockpickinglawyer’. Other interesting words included releaselplcut, teamlockpickinglawyer and unpicklockeble.
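One way to surface typos like these automatically is fuzzy string matching. Here is a minimal sketch that assumes you already have a mapping of words to counts. It uses the standard library's difflib. The helper name likely_misspellings and the 0.85 similarity threshold are assumptions for illustration, not part of the original project.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import List, Mapping, Tuple

TARGET = "lockpickinglawyer"


def likely_misspellings(
    counts: Mapping[str, int], target: str = TARGET, threshold: float = 0.85
) -> List[Tuple[str, int]]:
    """Return (word, count) pairs that look similar to `target`, most frequent first.

    Similarity is difflib's SequenceMatcher ratio (0.0 to 1.0); the 0.85
    threshold is an arbitrary assumption and may need tuning.
    """
    hits = []
    for word, n in counts.items():
        # Skip the correctly spelled word itself; keep anything close enough to it.
        if word != target and SequenceMatcher(None, word, target).ratio() >= threshold:
            hits.append((word, n))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


counts = Counter({"lockpickinglawyer": 10, "lockpickinlawyer": 3, "lockpickinglayer": 2, "locksmith": 5})
print(likely_misspellings(counts))  # [('lockpickinlawyer', 3), ('lockpickinglayer', 2)]
```

Compounds such as teamlockpickinglawyer also clear this threshold (their ratio is about 0.89), so the results still need a manual look.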
Making the visualizations
I made all the visualizations using Python and various libraries. I used youtube-comment-downloader to fetch all the comments into JSON files, and the Natural Language Toolkit (NLTK) to tokenize, count and filter the words.
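Here is a minimal, standard-library-only sketch of the tokenize, filter and count step.

- **Input format:** it assumes the downloaded file holds one JSON object per line, each with a "text" field. Check this against the output of your version of youtube-comment-downloader.
- **Swapped-in tools:** it replaces NLTK with a regular expression plus collections.Counter. NLTK's FreqDist and stopwords corpus are the fuller equivalents.
- **Hypothetical names:** the small STOPWORDS set, the helper names and the file name comments.json are all illustrative.

```python
import json
import re
from collections import Counter
from typing import Iterable, List

# A tiny hand-picked stopword list (assumption); NLTK's stopwords corpus is far more complete.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "is", "it", "in", "on",
             "for", "this", "that", "be", "was", "i", "you", "he", "his"}


def load_comment_texts(lines: Iterable[str]) -> List[str]:
    """Parse line-delimited JSON (one comment object per line) into comment texts."""
    texts = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            texts.append(json.loads(line).get("text", ""))
    return texts


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_counts(texts: Iterable[str]) -> Counter:
    """Count tokens across all comments, dropping stopwords and one-character tokens."""
    counts = Counter()
    for text in texts:
        counts.update(t for t in tokenize(text) if len(t) > 1 and t not in STOPWORDS)
    return counts


# With a real (hypothetical) file:
#   with open("comments.json", encoding="utf-8") as f:
#       texts = load_comment_texts(f)
sample = [
    '{"cid": "1", "text": "LockPickingLawyer will open this in 2 minutes"}',
    '{"cid": "2", "text": "Send it to lockpickinglawyer!"}',
]
counts = word_counts(load_comment_texts(sample))
print(counts["lockpickinglawyer"])  # 2
```

Lowercasing before counting is what lets “LockPickingLawyer” and “lockpickinglawyer” land in the same bucket.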
The word cloud was also made in Python, using the wordcloud library.
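A rough sketch of that step follows. It assumes the wordcloud package's WordCloud class with its generate_from_frequencies and to_file methods. The image size, background colour, top-200 cut-off, output filename and helper names are assumptions, not necessarily the settings used in this project.

```python
from collections import Counter
from typing import Dict, Mapping


def top_frequencies(counts: Counter, n: int = 200) -> Dict[str, int]:
    """Keep the n most common words as a plain word -> frequency dict."""
    return dict(counts.most_common(n))


def render_wordcloud(freqs: Mapping[str, int], path: str = "wordcloud.png") -> None:
    """Draw a word cloud from precomputed frequencies and save it as an image."""
    # Imported here so top_frequencies() works even without the wordcloud package installed.
    from wordcloud import WordCloud

    cloud = WordCloud(width=1600, height=800, background_color="white")
    cloud.generate_from_frequencies(freqs)
    cloud.to_file(path)


# Example (requires `pip install wordcloud`):
#   render_wordcloud(top_frequencies(Counter({"lockpickinglawyer": 5, "lock": 3})))
```

Passing precomputed frequencies, rather than raw text, keeps the stopword filtering and counting under your own control instead of the library's defaults.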
The other image, the word-count table, I typed up manually in Google Docs.
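If you would rather not type the table by hand, a plain-text version can be generated from the counts. This is a minimal sketch that assumes a collections.Counter of words; the helper name format_table and the column layout are hypothetical.

```python
from collections import Counter


def format_table(counts: Counter, n: int = 10) -> str:
    """Render the n most common words as a two-column plain-text table."""
    rows = counts.most_common(n)
    # Make the word column wide enough for the header and the longest word.
    width = max([len("Word")] + [len(word) for word, _ in rows])
    lines = [f"{'Word':<{width}}  Count", f"{'-' * width}  -----"]
    lines += [f"{word:<{width}}  {count:>5}" for word, count in rows]
    return "\n".join(lines)


print(format_table(Counter({"lockpickinglawyer": 3, "lock": 2})))
```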
The source code for the complete project is available on GitHub.
The data was fetched on June 4, 2021.