Insights into the Twitterverse: Benchmarking and analysis of Twitter content


Published on March 17, 2014

Author: stephendann



The 2014 remix of the Twitter Content Classification framework, now featuring statistics, radar plots, Linguistic Inquiry and Word Count (LIWC), Leximancer, network plots and more opportunities to run maths, stats and graphs than ever before.

Exploring the Streams of Online Community Conversation: Insights into the Twitterverse
@stephendann, Australian National University

Usual House Rules
• @stephendann for questions
• #anzmac13 for commentary

A little context
The Past
• Dann (2010)
  – Six top-level Twitter categories
  – 23 sub-domains
• Dann (2011)
  – Six top-level categories
  – 28 sub-domains
The Present
• Dann (today)
  – Six top-level categories
  – No sub-domain analysis
• Secondary processing
  – Leximancer
  – Linguistic Inquiry and Word Count

Twitter Analysis 2.0.14: The Procedure

Acquire a Research Question
• Does Event X change the tweeting patterns of account @Y?
• Do responses to the #hashtag event change over time?
  – #EventTags in Time Period A will have more Status tweets than in Time Period D
  – Time Period D will have more Pass Along than Status tweets
• What were they thinking?
  – Dominant categories of tweets over time within a selected account
• Do comments change by platform for account @X?
  – Mobile versus web versus desktop
• Does @BrandX engage with the community?
  – Conversational tweets outnumber all other types over the capture time period

Acquire your data
• Personal timelines
  – Download from Twitter
• #Hashtag captures
  – Hootsuite
• Timeline captures
  – Choose your own adventure
  – Getting worse and harder, as Twitter's API is less available
• Try to avoid big data

Big Data
• If you are Axel Bruns, fine, continue
• For everyone else, what are you looking for?
  – What sample suits your research question?

Process your data
• Stand by for ugliness and manual coding*
  – Extract the data into Excel
• Excel allows for additional data inputs as you progress the analysis
  – Keep the tweet text visible
• Only keep a column visible if it fits your research question
  – e.g. date, time, @user, platform
  – Add columns for Tweet ID, category and cat_n
  – Add sub_category and sub_cat_n for the detailed version
*Automated coding? People are working on it. It's a terrible idea that'll happen anyway.
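In the talk, these steps happen in Excel. If you want to prepare the sheet programmatically first, the following is a minimal Python sketch. It assumes the capture has been exported to CSV. The input column names (`created_at`, `user`, `source`, `text`), the coding column names and both function names are hypothetical and are not taken from any particular export format.

```python
import csv
import io

# Hypothetical names for the manual-coding columns described on the slide.
CODING_COLUMNS = ["category", "cat_n", "sub_category", "sub_cat_n"]


def build_coding_sheet(raw_csv: str, keep: list[str]) -> list[dict]:
    """Keep only the columns the research question needs, number each tweet,
    and append blank coding columns to be filled in by hand."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    sheet = []
    for tweet_id, row in enumerate(reader, start=1):
        coded = {"tweet_id": tweet_id}
        coded.update({col: row.get(col, "") for col in keep})
        coded.update({col: "" for col in CODING_COLUMNS})
        sheet.append(coded)
    return sheet


def sheet_to_csv(sheet: list[dict], keep: list[str]) -> str:
    """Serialise the coding sheet so it can be opened in Excel."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["tweet_id", *keep, *CODING_COLUMNS])
    writer.writeheader()
    writer.writerows(sheet)
    return out.getvalue()
```

Passing `keep` explicitly follows the slide's rule: a column survives only if the research question needs it.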

Manual Coding
• Use the Dann (2010) or Dann (2011) top-level domains
  – Dann (201X) is under development
  – I broke something important earlier this year
• Manual coding is superior
  – Nuance and interpretation count

Pick a box
1 Conversational: Uses an @statement to address another user
2 News Events: Identifiable news content
3 Pass Along: Tweets of endorsement of content
4 Phatic: Content-independent connected presence
5 Status: Tweets which answer the prompts "What are you doing?" and "What's happening?" in terms of the account holder's experiences
6 Spam: Unsolicited content

Keep it on manual
Conversational: Uses an @statement to address another user
1.1 Action: Activities involving other Twitter users, or tweets which describe the presence of other Twitter users
1.2 Query: Any statement-style tweet that ends with a question mark, as it represents an active attempt to engage responses from the community
1.3 Referral: An @response which contains URLs or recommendations of other Twitter users (excludes RT @user)
1.4 Response: Tweets which commence with another user's name and which do not meet the requirements of the Referral category
1.5 Rhetoric: A question asked and answered within the same tweet (distinct from Conversational: Query), which may not require (but may elicit) an audience response

Upgrades
Pass Along: Tweets of endorsement of content
3.1 Automated Endorsement: Status announcements triggered by third-party applications which publish URLs
3.2 Endorsement: Links to web content not created by the sender
3.3 Retweet: Any statement reproducing another Twitter status using the "via @" or RT protocol
3.4 Secondary Social Media: Links to Facebook (or a similar social media platform)
3.5 User Generated Content: Links to the user's own content
3.6 Quote: Comment marked with " " to represent a direct quote, or a paraphrase of a statement without a source URL, including reference to an offline speaker or overheard (OH)
3.7 Cite: Any tweet which contains a reference in a recognised Harvard, Oxford or similar format
3.8 Modified ReTweet: Acknowledges use of the MT protocol to allow for an edited RT
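If the coding sheet stores only the numeric codes (cat_n and sub_cat_n), a lookup table keeps them readable and catches typos. Below is a minimal Python sketch of the codebook as listed on these slides. Only the Conversational and Pass Along sub-categories appear here, so sub-codes for the other categories are deliberately absent. The function name `describe` is hypothetical.

```python
# Top-level categories (cat_n) from the "Pick a box" slide.
TOP_LEVEL = {
    1: "Conversational",
    2: "News Events",
    3: "Pass Along",
    4: "Phatic",
    5: "Status",
    6: "Spam",
}

# Sub-categories (sub_cat_n) shown in this deck, keyed by their top-level code.
SUB_CATEGORIES = {
    1: {1: "Action", 2: "Query", 3: "Referral", 4: "Response", 5: "Rhetoric"},
    3: {1: "Automated Endorsement", 2: "Endorsement", 3: "Retweet",
        4: "Secondary Social Media", 5: "User Generated Content",
        6: "Quote", 7: "Cite", 8: "Modified ReTweet"},
}


def describe(code: str) -> str:
    """Turn a code such as '3' or '3.3' into readable labels.

    Unknown codes raise KeyError; non-numeric codes raise ValueError.
    """
    top, _, sub = code.partition(".")
    category = TOP_LEVEL[int(top)]
    if not sub:
        return category
    return f"{category}: {SUB_CATEGORIES[int(top)][int(sub)]}"
```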

Speed Hacking Excel
• Speed hacks exist
  – Alphabetical tweet sort: @, RT and MT tweets cluster together
  – "Find All" selection
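The alphabetical-sort hack works because @replies, retweets and modified retweets share a leading token. Below is a minimal Python sketch of the same idea, assuming tweets are plain strings. The bucket rule in the hypothetical `prefix_key` is an assumption. It only pre-sorts tweets for the human coder and does not assign categories.

```python
from itertools import groupby


def prefix_key(tweet: str) -> str:
    """Bucket a tweet by its leading token, mirroring what an A-Z sort clusters."""
    text = tweet.lstrip()
    if text.startswith("@"):
        return "@"
    for marker in ("RT", "MT"):
        # Require a following space or colon so words like "RTE" stay out.
        if text.startswith(marker + " ") or text.startswith(marker + ":"):
            return marker
    return "other"


def cluster_for_coding(tweets: list[str]) -> dict[str, list[str]]:
    """Sort tweets by bucket, then alphabetically, and group them for coding."""
    ordered = sorted(tweets, key=lambda t: (prefix_key(t), t.lower()))
    return {key: list(group) for key, group in groupby(ordered, key=prefix_key)}
```

A plain A-Z sort would interleave "MT ..." tweets with any other tweet starting with "M". The explicit bucket keeps each protocol together, so each group can be coded in one pass, much like a "Find All" selection.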

Coding Time!
• Cross-check the coding
  – Some variance is okay
  – Resolve it through the usual traditions
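The slides do not specify a cross-check method. One common option, assumed here rather than taken from the talk, is to report raw percentage agreement alongside Cohen's kappa. Kappa discounts the agreement two coders would reach by chance. Below is a minimal Python sketch in which each list holds one coder's category codes for the same tweets, in the same order.

```python
from collections import Counter


def percent_agreement(coder_a: list, coder_b: list) -> float:
    """Share of tweets both coders placed in the same category."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same, non-empty set of tweets")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


def cohens_kappa(coder_a: list, coder_b: list) -> float:
    """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    observed = percent_agreement(coder_a, coder_b)
    n = len(coder_a)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement from each coder's own category proportions.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used one single, identical category throughout
    return (observed - expected) / (1 - expected)
```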

Sample Data #qldquake


Analysis Table Block
Columns: Category | Tweet (TCat) | Tweet Ratio | Max Density | Actual Characters | Character Density | Density Ratio
Rows: Conversational, News, Pass Along, Phatic, Spam, Status, n (total)

Tweet Math Dude
• Tweet count
  – N per category
• Calculate the Tweet Ratio
  – The Tweet Ratio is a normalised rank order of tweet volume, where the most common category scores 1
• Calculating the Tweet Ratio
  – Highest number of tweets in a single category = TTMax
  – Tweets per category = TCat
  – Tweet Ratio = TCat / TTMax
I'm only mildly mocking statistical analysis here
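The Tweet Ratio arithmetic takes one line per category. Below is a minimal Python sketch using the #qldquake category counts reported in the Reporting the Data table of this deck. The function name is hypothetical.

```python
# Tweets per category (TCat) for the #qldquake sample, from the deck's table.
COUNTS = {"Conversational": 39, "News": 41, "Pass Along": 481,
          "Phatic": 21, "Spam": 1, "Status": 18}


def tweet_ratio(counts: dict[str, int]) -> dict[str, float]:
    """Tweet Ratio = TCat / TTMax, so the busiest category scores exactly 1."""
    tt_max = max(counts.values())
    return {category: t_cat / tt_max for category, t_cat in counts.items()}


for category, ratio in tweet_ratio(COUNTS).items():
    print(f"{category:<15} {ratio:.2f}")
```

Formatted to two decimals, this prints 0.09 for News and 0.04 for Status, while the table shows 0.08 and 0.03. The table's Tweet Ratio column appears to truncate rather than round.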

Maximum Character Density
• Max Density = 140 x TCat (the number of tweets in each category)
• The theoretical range for a tweet is between 1 and 140 characters
  – The maximum tweet is 140 characters
  – More characters used means more information density
• Calculate Character Density
  – Character Density = Actual Characters / Max Density
  – Divide each Character Density score by the highest Character Density to get the Density Ratio
  – This normalises the Character Density scores to a rank order
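The same steps in code, as a minimal Python sketch. It assumes the 140-character limit used in the talk and takes the counts and character totals from the #qldquake table in this deck. The function name is hypothetical.

```python
MAX_TWEET_LENGTH = 140  # the per-tweet character limit at the time of the talk

# TCat and actual characters per category for the #qldquake sample (deck's table).
COUNTS = {"Conversational": 39, "News": 41, "Pass Along": 481,
          "Phatic": 21, "Spam": 1, "Status": 18}
CHARS = {"Conversational": 3533, "News": 3778, "Pass Along": 53491,
         "Phatic": 2179, "Spam": 81, "Status": 1543}


def character_density(counts, chars):
    """Return {category: (max_density, char_density, density_ratio)}."""
    raw = {}
    for category, t_cat in counts.items():
        max_density = MAX_TWEET_LENGTH * t_cat  # Max Density = 140 x TCat
        raw[category] = (max_density, chars[category] / max_density)
    highest = max(density for _, density in raw.values())
    # Density Ratio: each Character Density divided by the highest one.
    return {category: (max_d, density, density / highest)
            for category, (max_d, density) in raw.items()}


for category, (max_d, density, ratio) in character_density(COUNTS, CHARS).items():
    print(f"{category:<15} {max_d:>6} {density:>4.0%} {ratio:.2f}")
```

With these inputs, the printed Max Density, Character Density (rounded percent) and Density Ratio values match the table.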

Reporting the Data
Category | Tweet (TCat) | Tweet Ratio | Max Density | Actual Characters | Character Density | Density Ratio
Conversational | 39 | 0.08 | 5460 | 3533 | 65% | 0.81
News | 41 | 0.08 | 5740 | 3778 | 66% | 0.83
Pass Along | 481 | 1 | 67340 | 53491 | 79% | 1.00
Phatic | 21 | 0.04 | 2940 | 2179 | 74% | 0.93
Spam | 1 | 0.00 | 140 | 81 | 58% | 0.73
Status | 18 | 0.03 | 2520 | 1543 | 61% | 0.77
n | 601 | | 84140 | 64605 | 77% |

Reporting the Data
[Chart: Tweet Ratio and Density Ratio by category (Conversational, News, Pass Along, Phatic, Spam, Status)]

Text Analysis Wave 1: Linguistic Inquiry and Word Count
So. Very. Fast.

LIWC
• Text analysis software
  – Calculates the degree to which people use different categories of words in texts
• 70 other language dimensions, e.g.
  – positive or negative emotions
  – self-references
  – causal words

A giant bucket of data
• 70 variables
  – So have a hypothesis and a purpose for the analysis
• Differences in tweet construction
  – Word counts
  – Unique words
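LIWC produces these counts itself, but a quick cross-check, or a first look before running LIWC, can be scripted. Below is a minimal Python sketch that assumes the coded tweets are already grouped by category. The whitespace tokenisation and the pooled unique-word percentage are simplifying assumptions, not LIWC's definitions. Both function names are hypothetical.

```python
def word_stats(tweets_by_category: dict[str, list[str]]) -> dict[str, tuple[float, float]]:
    """Return {category: (average words per tweet, % of distinct words)}.

    Uses a plain whitespace split and a pooled type/token percentage;
    this is a rough stand-in, not LIWC's own tokeniser or measures.
    Assumes every category holds at least one non-empty tweet.
    """
    stats = {}
    for category, tweets in tweets_by_category.items():
        words = [word.lower() for tweet in tweets for word in tweet.split()]
        average_wc = len(words) / len(tweets)
        unique_pct = 100 * len(set(words)) / len(words)
        stats[category] = (average_wc, unique_pct)
    return stats


def ratio_to_max(values: dict[str, float]) -> dict[str, float]:
    """Normalise scores so the highest category is 1 (as in AWC_Ratio)."""
    highest = max(values.values())
    return {category: value / highest for category, value in values.items()}
```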

Results: Average Word Count (AWC) and Unique Word Count (UWC)
Category | AWC | AWC_Ratio
Conversational | 12.82 | 0.78
News | 13.56 | 0.82
Pass Along | 16.35 | 1
Phatic | 15.42 | 0.94
Status | 12.94 | 0.79

Category | UWC | UWC_Ratio
Conversational | 93 | 0.97
News | 93 | 0.97
Pass Along | 92 | 0.96
Phatic | 93 | 0.97
Status | 96 | 1

Results
[Charts: Word Count ratio and Unique Word ratio by category (Conversational, News, Pass Along, Phatic, Status)]

Text Analysis Wave 2: Leximancer

Leximancer
• Import into Leximancer as an individual analysis (individual project)
  – Edit the pre-processing options: sentences per block = 1
  – Run to generate outputs
  – Generate the concept map
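Running one project per category is easier if each category's tweets sit in their own file. Below is a minimal Python sketch that assumes Leximancer is fed plain UTF-8 text files. The file naming, the (category, text) row format and the function name are hypothetical, and nothing here calls Leximancer itself.

```python
from pathlib import Path


def export_for_leximancer(coded_rows, out_dir):
    """Write each category's tweets to <category>.txt, one tweet per line.

    coded_rows: iterable of (category, tweet_text) pairs from the coding sheet.
    Returns the written paths, sorted by category name.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    by_category = {}
    for category, text in coded_rows:
        # Collapse internal line breaks so a tweet never spans two lines.
        by_category.setdefault(category, []).append(" ".join(text.split()))
    paths = []
    for category, tweets in sorted(by_category.items()):
        path = out / f"{category.lower().replace(' ', '_')}.txt"
        path.write_text("\n".join(tweets) + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```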

Map time!

Four sample maps
Entirely because quadrants fit on screens better than hexes. No other reason.
[Concept maps, four panels: Conversational, News, Pass Along, Phatic]

Tweet Network Density
• Calculate network density
  – Count the nodes (n)
  – Count the actual connections (e): edges, the paths between nodes
  – Network density = 2e / (n(n-1))
• Network density notes
  – Potential connections = n(n-1) / 2, so density is actual connections divided by potential connections
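The density formula as a minimal Python sketch. It assumes an undirected network with no self-loops, which is what 2e / (n(n-1)) measures. The function name is hypothetical.

```python
def network_density(nodes: int, edges: int) -> float:
    """Density of an undirected network: 2e / (n(n - 1)).

    n(n - 1) / 2 is the number of potential connections, so this is
    actual connections divided by potential connections.
    """
    if nodes < 2:
        raise ValueError("density needs at least two nodes")
    potential = nodes * (nodes - 1) / 2
    if edges > potential:
        raise ValueError("more edges than an undirected simple graph allows")
    return edges / potential


# Node and edge counts from the Pass Along concept map in this deck.
print(f"{network_density(15, 15):.2f}")  # prints 0.14
```

If the map is rebuilt as a graph in NetworkX, `networkx.density` returns the same value for an undirected graph.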

Pass Along Network
Nodes | Edges | Network Density
15 | 15 | 0.14

Network Density Results
Category | Nodes | Edges | Network Density
Conversational | 13 | 12 | 0.15
News | 18 | 17 | 0.11
Pass Along | 15 | 15 | 0.14
Phatic | 3 | 2 | 0.67
Status | 4 | 3 | 0.50
n | 19 | 17 | 0.10
[Chart: Network Density by category (Conversational, News, Pass Along, Phatic, Status)]

One Bucket of Data
• This is why a research question is important
  – You can map a range of information
  – None of it is useful without the RQ / hypothesis
  – It's pretty, but not valuable
Category | Tweet | Density | Network | Ave. WC | Unique Words
Conversational | 0.081081 | 0.819075 | 0.814598 | 0.830959 | 0.96875
News | 0.085239 | 0.83315 | 0.828595 | 0.878952 | 0.96875
Pass Along | 1 | 1.005496 | 1 | 1.059722 | 0.958333
Phatic | 0.043659 | 0.938173 | 0.933044 | 1 | 0.96875
Status | 0.037422 | 0.775065 | 0.770829 | 0.838992 | 1
[Chart: Tweet, Density, Network, Ave. WC and Unique Words ratios by category (Conversational, News, Pass Along, Phatic, Status)]

Questions?
• @stephendann
