A cautionary tale of social media statistics

Lies, damned lies and statistics
It’s important to understand the full context relating to social media statistics before you act on them.

The Stat

I came across this stat the other day:

91 per cent of mentions [on social media] come from people with fewer than 500 followers.

The implication in the source blog post and whitepaper was:

When it comes to your social media strategy, don’t discount the importance of brand mentions by Twitter users with low follower counts.

It’s complicated

Follower numbers shouldn’t be the be-all and end-all when it comes to defining your social media strategy. Agreed.

For a start, where influence is concerned, relevance, proximity, context and other factors are crucial. Follower count is a very simplistic metric and, depending on how someone uses social platforms, may have little in common with their real potential for influence.

Also, even if the mention itself doesn’t influence anyone, simply the knowledge that an individual has shown an interest in your brand in some way is potentially of value.

But while sympathising with the inference drawn, I think the statistic and its underlying data would benefit from some numerical context to better understand their implications.

N.B. I’ve focussed on Twitter in this analysis as that’s where the majority of the data in the particular research apparently came from.


Given the stat focuses on accounts with fewer than 500 followers, let’s split Twitter into two groups:

– Low Follower Group – fewer than 500 followers.
– High Follower Group – 500 or more followers.

And then let’s look at two relevant areas – Impressions and Retweets.


Who could have seen brand mentions by each of these groups and potentially been influenced by them?

To calculate this we need to know the following for each group:

– Average number of followers.
– Impression rate.

Average followers

I used this estimated distribution of follower numbers across Twitter users*, combined with Lissted’s data on nearly 2 million of the most influential accounts, to calculate a weighted average of the number of followers each group is likely to have.


– Low Follower Group – 100
– High Follower Group – 8,400

Impression rate

Every time you tweet, only a proportion of your followers will actually see it. For many users this proportion could be less than ten per cent. The “impression rate” represents the total number of impressions generated by your tweet, divided by your follower count.

It only includes impressions on specific Twitter platforms – web, iOS app and Android app. This means impressions in applications like Hootsuite and Tweetdeck don’t count.

The rate is also complicated by retweets. The rate calculated by Twitter Analytics includes impressions that were actually seen by followers of the retweeting account, who may not follow you.

I’ve tried to look at retweets separately below, so for the purpose of this analysis I’m looking for impression rates without the benefit of retweet amplification.

On this basis I’ve assumed an impression rate of ten per cent for the Low Follower Group and five per cent for the High Follower Group. These assumptions are based on various articles estimating impression rates in the range of 2-10%. For the sake of prudence I’ve used a lower rate for High Follower accounts on the assumption that they could have a higher proportion of inactive and spam followers.

We can now calculate the proportion of total impressions related to each group as shown in this table:

Brand mentions impressions analysis

Finding: only 19 per cent of impressions relate to the Low Follower Group.

Quite simply the difference in reach of the High Follower accounts (84x higher – 8,400 v 100) more than offsets the difference in volume of mentions by the Low Follower Group (only 10x higher – 910 v 90).

For the Low Follower Group to even represent 50 per cent of the total impressions we’d need to assume an impressions rate for this group that is over 8x higher than for the High Follower Group e.g. 42% v 5%.
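For anyone who wants to check the arithmetic, here’s how the 19 per cent figure (and the 42 per cent break-even rate) falls out of the assumptions above, sketched in Python per 1,000 brand mentions:

```python
# Impressions per 1,000 brand mentions, using the assumptions above.
mentions = {"low": 910, "high": 90}            # the 91% / 9% mention split
avg_followers = {"low": 100, "high": 8_400}    # weighted average followers
impression_rate = {"low": 0.10, "high": 0.05}  # assumed impression rates

impressions = {g: mentions[g] * avg_followers[g] * impression_rate[g]
               for g in mentions}
total = sum(impressions.values())

low_share = impressions["low"] / total
print(f"Low Follower Group share of impressions: {low_share:.0%}")  # ~19%

# Break-even: the rate the low group would need for 50% of impressions.
break_even = impressions["high"] / (mentions["low"] * avg_followers["low"])
print(f"Break-even low-group impression rate: {break_even:.0%}")    # ~42%
```

Change the assumed rates and the low group’s share barely moves: the 84x reach gap dominates.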

Though I suspect there may be a difference, is it really likely to be that much?


Next we need to consider if any of the brand mentions were retweets. If so were the original tweets more likely to be by accounts with high or low followers?

A lot of retweets by volume are by accounts with low followers. That’s just common sense because the vast majority of Twitter users have low follower numbers. But when we’re exposed to a retweet it’s the original tweet that we’re exposed to. This is the very reason why Twitter includes the resulting impressions in the Impression rate (I’m assuming automatic retweets, not manual ones).

To understand this better I analysed a sample of over six million tweets tracked by Lissted over the last two months that were retweeted at least once. The sample included tweets by 1.27 million different accounts and collectively these tweets received over 200 million retweets in total.

Of these six million tweets, just 0.6% (c.39,000) accounted for two thirds of the total retweets generated.

And 99 per cent of these “top tweets” were by users with 500+ followers.

Finding: a high proportion of retweets are of users with High Followers, even if many are by users with Low Followers.
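The concentration pattern is straightforward to test on any sample. Here’s a minimal sketch of the calculation – the retweet counts below are invented for illustration, not Lissted’s actual data:

```python
# What share of tweets accounts for two thirds of all retweets?
# Illustrative data: a few "hits" plus a long tail of small tweets.
retweet_counts = [5000, 3000, 1200] + [3] * 200 + [1] * 800

def share_of_tweets_for(retweets, target_share=2/3):
    """Fraction of tweets (most-retweeted first) needed to reach
    target_share of the total retweets."""
    ordered = sorted(retweets, reverse=True)
    total = sum(ordered)
    running, n = 0, 0
    for count in ordered:
        running += count
        n += 1
        if running >= target_share * total:
            break
    return n / len(ordered)

print(f"{share_of_tweets_for(retweet_counts):.1%} of tweets")
```

On real data the same cumulative-sum approach produced the 0.6% figure quoted above.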


Mentions by accounts with 500 or more followers appear more likely to:

– represent the majority of initial impressions; and
– generate the majority of any resulting retweets.

In other words it’s high follower accounts that are more likely to be the source of the majority of the brand mentions that people are exposed to on Twitter.


As I said at the start the purpose of this analysis is simply to give some proper context to an isolated statistic. Assessing the impact and actions you should take due to mentions of your brand requires consideration of a lot more factors than simply numerical exposure.

It could be the case that high follower tweets make up the vast majority of the mentions people are exposed to, but factors like trust, context, proximity and relevance could lead to mentions by low followers having more influence on business outcomes.

The key is to properly understand who is talking about you and why, and not base decisions on sweeping statistics.

*N.B. the follower distribution analysis is from Dec 2013, but as Twitter hasn’t grown a huge amount in the last year, it seems reasonable to assume it’s still valid. Happy to share my detailed workings with anyone who’s interested.

UK journalists say social media more important than ever (the real story of Cision’s study)

Social journalism headlines

A survey by Cision has found that time spent using social media for work by the UK journalists who responded has fallen. The focus this finding has received is unfortunate as this reduction may simply be due to increased productivity. Meanwhile, for the first time the same survey found that a majority of UK journalists now think social media use for work is both necessary and beneficial.

Cision have produced their annual survey of how journalists are using social media. The top finding is a fall in the proportion of UK journalists using social media for work for four hours or more per day. The level has reduced from 24 per cent in 2012 to 13 per cent in 2014. The inference drawn is that we’ve reached a point of “saturation”, or even decline, in the use of social media by UK journalists.

The thing is, time spent is only relevant if you can relate it to a set objective. In this case the reduction in time seems most likely to me to be due to improved productivity in the use of social media by journalists.

Here are a few potential reasons for this:

– According to the survey, Twitter is the No.1 tool used by UK journalists (75 per cent). Our analysis of when UK journalists joined Twitter suggests they have had between three and six years to become proficient at it.

– Productivity tools are likely to be widely used by now, particularly by those who use social media the most. The survey itself highlights this, with 25% of respondents saying they use Hootsuite.

– Knowledge from earlier adopters will have been shared with colleagues who joined later. Journalism.co.uk’s excellent newsrewired conferences are an example (the latest of which was yesterday).

Meanwhile the same survey also tells us:

– 54% of journalists who responded couldn’t carry out their work without social media (up from 43% in 2013 and 28% in 2012).

– 58% say social media has improved their productivity (up from 54% in 2013 and 39% in 2012).

If the survey is representative, this means a majority of UK journalists now think that the use of social media for work is both necessary and beneficial.

Isn’t this the real story?

Why Brandwatch bought Peer Index and the Future of Social Listening

In the week before Christmas, Brandwatch, the social media monitoring company, acquired influencer platform (and Lissted* competitor) Peer Index for a reported figure of £10m in cash and shares.

In the words of Giles Palmer, Brandwatch’s CEO, it was because

“As we (Giles and Azeem, Peer Index’s CEO) talked, I became acutely aware that PeerIndex were years ahead of us in their understanding and technology for influencer analytics and mapping.”

But why the need for a social media monitoring company to invest so heavily** to address influencer analytics and community mapping?

The answer lies in the exponential rate at which online activity has been growing, and the challenge this has created to find the people, content and conversations that really matter to PR and Marketing objectives.

*I’m the founder and architect of Lissted for anyone who doesn’t already know me.
**£10m looks to represent around 10-15% of Brandwatch’s value based on filings relating to its most recent finance raising in May 2014.

A world of ‘pretty noise’

The key social media monitoring platforms (including Brandwatch) were conceived and designed in the mid-to-late Noughties when we had a fraction of the online conversations we have today. Even by the summer of 2008, Facebook had only reached 100 million users (versus 1.35bn now), Twitter had a measly 10 million (versus 284 million now) and Instagram didn’t even exist.

In this relatively quiet online world the platforms didn’t need to do much to address what I think of as the ‘laws of social listening’. A few simple metrics – like number of Twitter followers – and keywords were often enough to find who, and what, mattered most.

As the scale of online conversation has grown in the last few years these platforms have invested in an arms race of engineering to try and keep pace with the demands of multiple sources, processing and storage. Meanwhile at their heart they mostly still treat ‘listening’ as a purely data driven exercise, continuing to use similar metrics of now questionable worth, combined with increasingly complex Boolean keyword strings, to desperately try and filter it.

This has resulted in what I call “pretty noise”. Beautifully designed front end applications with graphs, charts and word clouds that look wonderful, but often tell you very little of real value, or worse can be genuinely misleading.

Because real listening, and the insight that comes with it, requires an understanding of people and communities, not simply data mining.

Noise doesn’t equal influence

At the same time as social listening platforms have been struggling in a Canute-like fashion with this vast wave of conversation, we’ve also seen the rise of the ‘influencer’ – someone who is judged to have the potential to exert higher levels of influence over others.

Such people have always existed of course, but social media and the wider online world has increased the ways in which this potential can be earned, observed and utilised.

Again, a plethora of tools and platforms have been created to try and help users identify these influencers in relation to their brand, product or industry. The majority of them start with who produces online content around your chosen keywords, and then look at the reaction they generate before deciding who ranks highest.

Unfortunately, these tools generally suffer from the same weakness as the social listening platforms. The scale of conversation and online activity is so great that they often equate noise to influence.

And even when these tools are successful in identifying truly influential and relevant content creators, they’re still of limited use, as creating content isn’t the only way to be influential.

They may help me identify candidates for outreach purposes, but it certainly doesn’t follow that their answers will be relevant to other key Marketing and PR activities such as:

  • ranking higher in search;
  • organising an event with industry leading figures;
  • understanding how my competitors are behaving online;
  • reaching more relevant people with my own content; or
  • identifying who I should target with my advertising.

True, there may be some active content creators in a particular field who will indeed be relevant no matter what your objective. For instance, if you’re looking at a list of UK PR influencers that doesn’t have Stephen Waddington near (or at) the top (as I was the other day), then I would seriously question whatever approach is behind it.

But mostly, this combination of noise driven methodologies and varying objectives has created a situation where users are asking questions of these tools, and it’s often just dumb luck if they get back the answer they really need.

It’s all about Community

So how do we address these two problems?

  • Effectively listen to social media sources to find true insight in a real time world.
  • Identify the right people and organisations depending on our objective.

The solution lies with understanding communities and the contextual relevance they provide.

It’s about identifying, observing and listening to enough of the key members of a community, particularly the ones who are authoritative and knowledgeable.

Lissted approaches this challenge in a very different way to Peer Index, but we do agree that if you can understand the makeup of communities relevant to you, everything else starts to fall into place. This is because the very people, content and conversations they are paying attention to are the ones that are likely to matter most.

We call this “Superhuman” social listening. A host of people who really know their stuff helping you to filter the online world and discover who, and what, really matters to your PR and Marketing objectives:

Reputation management: they’ll highlight important stories and conversations about your brand before most people get to hear about them.

Outreach: they’ll tell you who the content creators are that drive influential conversations, not just noisy ones.

Amplifying your content: they’ll tell you who the curators are that identify and share influential conversations.

Improving your search ranking: they’ll tell you the domains that they trust, which are therefore likely to be the very ones that Google will trust too.

Targeting your advertising: they’ll help you identify the people most like them and who share their interests.

Event organising: they’ll tell you who are the most recognised people in their field.

Real time marketing: they’ll help you identify what’s really getting relevant people engaged.

And so on…

The future

The critical element is the ability to identify these communities accurately. To find the right people to listen to. This is why we’ve spent the last two years developing Lissted’s real world approach to identifying relevant communities and why I believe Brandwatch have invested heavily with this acquisition to try and achieve this too.

I expect we’ll see a lot more activity around this challenge in 2015 and beyond as others in the social listening industry recognise the need to address the elephant of noise in the room.

Beta invites

We’re currently running a private beta of Lissted’s latest community analysis tool. If you’d like to get involved then drop me an email, adam@lissted.com.

‘The Big Bang Theory’ of Content Marketing

Physics can teach us a thing or two about what matters in Content Marketing


Source: Moviepilot.com


Two things collided to create this post.

First, I’m a huge ‘The Big Bang Theory’ fan. For anyone who doesn’t watch the show, its central characters are two roommates – Sheldon Cooper and Leonard Hofstadter, both physicists at Caltech – and Penny, an aspiring actress and “temporary” waitress who moves in across the hall from them.

Penny lacks the guys’ intellect, often struggling to understand (or care about) what they’re talking about, but she knows a lot more about life and relating to people than the two geeky scientists.

Second, I showed a very clever real-life physicist, Stephen Baldwin*, a graph of retweet activity over time for one of the tweets identified by our Tweets Distilled experiment. Tweets Distilled seeks to identify interesting tweets early in their lifecycle.

Tweets distilled

Stephen suggested there could be parallels between what makes content successful and the physical properties of heat capacity and phase transition.

*Stephen specialises in acoustics and sound processing and is currently looking for a new challenge.

Our Theory

Here’s a summary of what we produced:

Content heat capacity

Content Heat Capacity equations

Content phase transition 

Content Phase Transition changes

Sheldon Cooper will explain the physics behind these properties later in the post. First, here’s the story of the tweet that got me and Stephen talking…

Giving it 110 per cent

On the day of the Scottish Independence Referendum, CNN’s graphics department had clearly been listening to too many footballers’ post-match analyses, as they put this graphic up on screen.

CNN graphic: 110% turnout in the Scottish independence vote (via Daily Mail Online)

A Twitter user called Brady, who goes by the screen name @BurningGoats, picked up on the error and tweeted:

This was just after 8pm. The graph below shows the average retweets per minute in the 8 hours after the tweet was posted.

Burning goats retweets

You can see that in the first hour and a half there was a limited amount of activity, and by 21:31 the tweet had been retweeted 62 times. This is exceptional for Brady – none of his tweets in the subsequent 2 months has had more than 2 retweets – but it’s nothing compared to what happened in the next few minutes.

At 21:31 the tweet was retweeted by @BetfairSports, an account that doesn’t follow @BurningGoats. @BetfairSports has over 80,000 followers, one of whom is the ex-Liverpool and Germany footballer Dietmar Hamann, who has over 600,000 followers. He retweeted the story at 21:33, and immediately following this combination we see a massive spike in retweet activity.

Burninggoats retweets after influencers

From then on the tweet never looked back, maintaining a rate of 40-80 retweets per minute for the next couple of hours.

Finally, at 02:26 the story was picked up by the mainstream media and appeared on the Daily Mail, whose article generated nearly 1,500 comments on Facebook.

The Science

So how can physics help to explain what happened? Over to the awesome Sheldon Cooper to explain.


Source: bigbangtheory.wikia.com/

Sheldon: Two physical properties are relevant here. Heat capacity and phase transition.

Heat capacity

Heat capacity is the amount of energy that is required to increase the temperature of a material. It can be expressed as an equation: C = Q / ΔT, where Q is the energy supplied and ΔT is the resulting rise in temperature. So the higher the heat capacity, the more energy it’s going to need to get hot. Different materials have different heat capacities.

Leonard explanation for Penny: It’s why they don’t make hair straightener plates out of rubber!

Phase transition

Sheldon: As the temperature of a material rises it will change state. These changes are called phase transitions, and the most well known are from solid to liquid and liquid to gas. There is also a fourth state that matter can take: when a gas is heated sufficiently it will ionise and form plasma (the most abundant material in the universe). Different materials go through phase changes at different temperatures.

Applying physics theory to Content Marketing

Content heat capacity

Sheldon: If we relate:

– heat capacity as a measure of the quality and likely potential interest a piece of content possesses within an online community (where high quality content is equivalent to material with a low heat capacity);

– energy as the tweets, retweets, favorites, likes, shares and other forms of engagement with the content it receives; and

– temperature as the level of interest it’s achieving within the relevant online community;

we get the content equivalent: content heat capacity = engagement ÷ rise in interest. This predicts that low quality content will need high engagement to raise the level of interest being shown in it.

Leonard to Penny:  “You might read an article on toilet brushes, but only if Gerard Butler tweeted it!”

Content phase transition

Sheldon: If we see the different states content can exist in as:

Solid phase = low quality content that even your followers find uninteresting, or decent content that receives very little engagement.

Liquid phase = great content that’s got some engagement and started reaching followers of your followers, or not so great content that’s lucky enough to get engagement from some influential sources.

Gas phase = content that’s reached the wider community, either because a) it’s awesome and quickly got the attention of people dotted throughout that community, b) it’s pretty great content that’s getting lots of engagement generally or c) decent content that’s had the full star power treatment to force it out into the wider community.

Ionised = Amazing content that’s so “hot” you’d actually talk to someone about it in the real world and/or it’s appearing in media outside of the online community concerned.

The key is that the more widespread the appeal of the content among the community the lower the temperature (interest level) it will have to reach to change its state.

So awesome content will take a lot less energy to go through these phases and start reaching the wider community.

Leonard to Penny: “It’s why the biggest secrets make the best gossip.”


Source: fanpop.com
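Purely for fun, the whole analogy can be wired up as a toy model – every threshold and number here is invented for illustration:

```python
# Toy model of the content "physics" analogy. All thresholds and
# numbers below are invented purely for illustration.
PHASES = ["solid", "liquid", "gas", "ionised"]

def interest(engagement, heat_capacity):
    # Rise in interest = energy (engagement) / content heat capacity,
    # so high-quality (low-capacity) content heats up on less engagement.
    return engagement / heat_capacity

def phase(interest_level, appeal):
    # Wider appeal lowers every transition threshold.
    thresholds = [t / appeal for t in (10, 50, 200)]
    crossed = sum(interest_level >= t for t in thresholds)
    return PHASES[crossed]

# Decent content, narrow appeal: decent engagement, still only liquid.
print(phase(interest(100, heat_capacity=10), appeal=1))  # liquid
# Great content, broad appeal: same engagement goes ionised.
print(phase(interest(100, heat_capacity=1), appeal=2))   # ionised
```

Same engagement in both cases; quality and appeal decide how far it travels.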

How the theory fits the CNN Story

Sheldon: The CNN graphic is great material. It’s funny, has immediate visual impact and it’s got numbers, and everyone loves numbers. It therefore has an inherently low heat capacity i.e. very high content quality rating.

The tweet by @burninggoats improves on this by bringing in the “giving it 110 per cent” phrase that is often used by sports people. This meant the tweet itself had an even lower heat capacity and so raised its inherent quality further.

Implication: The tweet only needed a relatively small amount of energy (engagement) to raise its temperature (interest level). 

At the same time the content also had the potential for widespread appeal. Whether you were interested in the Scots referendum, appreciated the sporting reference or simply wanted to have a laugh at CNN’s expense, many people were likely to find this content interesting.

Implication: The tweet only needed a relatively low increase in temperature to change state and reach the wider Twitter community.

Combine these two and you had content that needed only a relatively small amount of energy (engagement) to reach far and wide. This is what happened when it received the retweets from @BetfairSports and @DidiHamann, taking it from liquid to gas. And the content subsequently “ionised” when it was published by the Daily Mail.

Practical application of the Theory

To be successful you need to recognise three key implications:

1. Content creation and design is crucial

If your content isn’t all the things you know it should be – well designed, eye catching, exciting, thought-provoking, surprising, timely, in an appropriate format etc – then it’s going to need a lot of energy from the community to raise the interest level.

If you can’t generate this energy yourself (e.g. you’re not a brand with a huge organic following, like Apple) and you can’t buy it (celebrity endorsement, for example), then content like this is going to really struggle to reach beyond a small proportion of those who are closest to you.

2. Listen to understand what will have widespread appeal

If you want to reach the parts of a community you don’t already know then you need to understand what is likely to engage those people, as well as those close to you, and design your content accordingly.

This will mean that your content has the potential to “change state” at much lower levels of interest. Again this means the energy (engagement) requirements to achieve this are lower.

3. Influencer engagement (and potentially paid promotion) will often still be necessary for success

In almost every case the quality and appeal of the content will only get you so far. They will reduce the energy requirements, but they won’t eliminate them.

As we saw with the CNN example, the innate quality and widespread appeal of the tweet meant it turned into a liquid (reached the followers of Brady’s followers) quite quickly. Even still it took the input of energy from Betfair and Didi Hamann’s engagement to make it change state to a gas and start reaching the wider community.

This demonstrates one of the potential benefits of an influencer strategy. But be wary: it’s important that the influencers really do possess the potential for influence. Don’t get fooled by simple reach numbers. Make sure they are highly relevant, so that the wider community they help you reach is the one you were looking to target.

Finally in a corporate situation the use of paid promotion should be considered as an alternative to provide this additional energy when it doesn’t appear organically.

Sheldon Cooper signing off. 


The 4 (F)laws of Social Listening

4 laws of social listening

Social listening is big business. Having access to a platform that can interrogate social media data has become a must have for many, if not most, organisations.

The problem is the vast majority of these platforms suffer from major flaws which can often lead to them producing pretty graphs and analysis that at best tell you very little and at worst are hugely misleading.

For social listening to be effective in providing genuine insight, it needs to fully address the following: refine, contextualise and weight the data, and be wary of sentiment analysis.

1. Refine

This is where it starts: rubbish in, rubbish out. Within any social media platform the volume of spam (or robot-created) accounts, conversations and content is vast. Any social listening exercise or platform must be effective at filtering this crud out, otherwise the resulting analysis is going to be seriously flawed.

Think of it like refining oil. You could try running your car on raw crude, but I wouldn’t recommend it.

Separating the wheat from the chaff is not a simple exercise. Just saying “let’s look at data from accounts with more than X followers” or “at least X shares” or similar sliders that many vendors provide doesn’t cut it. In fact, it can often result in missing really important conversations involving key individuals who just happen to be less active or noisy.

You need filters that are sophisticated enough to eliminate the noise, whilst retaining the signal, something Ray Dolby knew a thing or two about.

2. Contextualise

The keyword. That wonderful item, beloved by social listening platform users the world over. Selecting the right keywords has almost become an art for some people. They produce complex Boolean searches in a desire to find the conversations they hope are contextually relevant.

The thing is, the more complex your keyword search criteria, the more likely you’re missing something you needed to hear. I want to identify conversations about Apple the brand not the fruit, so I search for a string of “Apple” AND “XXXX”, or “Apple” NOT “XXXX”.

More sophisticated solutions use pattern recognition and topic clustering to identify the context and save you thinking of every AND XXXX or NOT XXXX. But what about conversations that are relevant, but in a less direct way? What about competitors like Samsung or Microsoft? When I ask for Apple data do we include these conversations or ignore them? Is it only when they mention Apple? Where do we draw the line?

The other major weakness in these approaches is that they rely on semantic content in the first place. If keywords aren’t present in the social media post then the system can’t identify them. That’s a pretty major weakness when you consider the huge increase in visual social media – minimally tagged Instagram posts, or tweets where it’s the picture that tells the story.
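A toy example makes the brittleness concrete. This is a hypothetical filter of my own, not any vendor’s actual implementation:

```python
# A naive Boolean keyword filter of the kind described - and its blind spot.
def apple_brand_filter(post: str) -> bool:
    """Keep posts mentioning 'apple' unless an excluded word appears."""
    text = post.lower()
    return "apple" in text and not any(
        w in text for w in ("fruit", "pie", "crumble", "orchard"))

print(apple_brand_filter("New Apple iPhone launches today"))        # True
print(apple_brand_filter("Best apple pie recipe ever"))             # False
# No keyword at all, so a clearly relevant post slips straight through:
print(apple_brand_filter("Tim Cook just unveiled the new iPhone"))  # False
```

However many AND/NOT clauses you bolt on, the last case never comes back: there’s nothing semantic for the filter to match.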

The bottom line is neither of these approaches is a human way of listening. We don’t rely on keywords to tell us if something is insightful or of interest; we have the capacity to recognise it when it is. Social listening solutions need to be designed with this in mind.

After all we are listening to people, not simply crunching data.

3. Weight

When Barack Obama gets up to the White House podium, the world listens. When @abc45xxx (not real) with 200 spam followers auto posts a link to an article, no one does.

I’m not talking about influence measurement here – though it’s a related topic – I’m merely pointing out that social media may have democratised conversation, but that doesn’t mean that every social media post should be given equal weight. It also doesn’t mean that only posts from “influencers” are important either.

Social listening solutions must take account of the relative importance of what is said, by whom and how people reacted to it and then weight their analysis accordingly.
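As an illustration of the principle – the signals and formula here are my own assumptions, not a prescription – a weighting function might look something like this:

```python
import math

# Illustrative post weighting: author reach (dampened), reaction, spam check.
def post_weight(author_followers: int, reactions: int, is_spam: bool) -> float:
    if is_spam:
        return 0.0  # spam contributes nothing, whatever its reach
    # log10 dampens raw reach so huge accounts don't drown out everyone else,
    # while reactions reward posts people actually responded to.
    return math.log10(1 + author_followers) * (1 + reactions)

print(post_weight(60_000_000, 5_000, False))  # a head of state's statement
print(post_weight(200, 0, True))              # spam auto-poster -> 0.0
```

The exact formula matters less than the point: what was said, by whom, and how people reacted all feed the weight, and spam is excluded outright.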

4. Be wary of sentiment

This one is simple: in many cases, automated sentiment analysis is not much better than a coin toss.

If you’re happy to make business decisions on that basis then fine. Otherwise, don’t believe those seductive dashboard graphs. Accept that you’re going to have to take a more human approach to such analysis, which means you probably can’t do it for every last tweet, YouTube comment and blog post.


Compliance with the first three requirements is an absolute necessity if a social listening exercise is to have any potential for producing insight.

If you don’t filter out the noise your analysis is flawed from the start.

Filter the noise but fail to ensure you have the right context, and you’ll be listening to the wrong conversations, even if they come through crystal clear.

And fail to weight what was said and you risk missing the key voices that were driving the conversation.

Social listening solutions need to start addressing these (f)laws properly and fast, before users realise how irrelevant and misleading a lot of what they are telling them actually is.