How Twitter could help solve Facebook’s fake news problem

Twitter shares by influential individuals and organisations could be harnessed in an automated news content rating system.

This system could assist Facebook in identifying articles that have a high risk of being fake. The methodology is based on a journalistic verification model.

Examples the model would have rated as high risk:

– FINAL ELECTION 2016 NUMBERS: TRUMP WON BOTH POPULAR ( 62.9 M -62.2 M ) – about the election results. It was ranking top of Google for a search for “final election results” earlier this week and has had over 400,000 interactions on Facebook. It was identified as fake (obviously) by BuzzFeed.

– Pope Francis Shocks World, Endorses Donald Trump for President, Releases Statement. Shared nearly 1 million times on Facebook. Now taken down, having been reported as fake by The New York Times.

The rating system described below is subject to patent pending UK 1619460.7.

At the weekend Mark Zuckerberg described as “pretty crazy” the idea that sharing fake news on Facebook contributed to Donald Trump being elected President.

He went on to say in a Facebook post:

“Of all the content on Facebook, more than 99% of what people see is authentic. Only a very small amount is fake news and hoaxes. The hoaxes that do exist are not limited to one partisan view, or even to politics. Overall, this makes it extremely unlikely hoaxes changed the outcome of this election in one direction or the other.”

“That said, we don’t want any hoaxes on Facebook. Our goal is to show people the content they will find most meaningful, and people want accurate news. We have already launched work enabling our community to flag hoaxes and fake news, and there is more we can do here. We have made progress, and we will continue to work on this to improve further.”

Yesterday, Business Insider reported a group of students had hacked together a tool that might help.

I think part of the answer lies in another social network, Twitter.

An important aside

It’s important to note the topic of “fake” news is not black and white. For example, parody accounts and sites like The Onion are “fake news” that many people enjoy for the entertainment they provide.

There’s also the question of news that is biased, or only partially based in fact.

The idea proposed below is simply a model to identify content that is:

1. more likely to be fake; and

2. is generating a level of interaction on Facebook that increases the likelihood of it being influential.

Verification and any subsequent action would be decisions for human editors.

Using Twitter data to identify potentially fake news

In its piece on Zuckerberg’s comments, The New York Times highlighted this article, Pope Francis Shocks World, Endorses Donald Trump for President, Releases Statement (now removed), which had been shared nearly a million times on Facebook. It’s fake. This never happened.

If it had been true it would obviously have been a big story.

As such you’d expect influential Trump supporters, Republicans and other key right wing media, organisations and individuals to have been falling over themselves to highlight it.

They weren’t.

Lissted tracks the Twitter accounts of over 150,000 of the most influential people and organisations. This includes over 8,000 key influencers in relevant communities such as Republicans and US Politics, as well as potentially sympathetic ones such as UKIP and Vote Leave.

Of these 150,000+ accounts only 6 shared the article.

Extending the analysis

Lissted has indexed another 106 links from the same domain during the last 100 days.

The graph below shows analysis of these links based on how many unique influencer shares they received.

[Chart: distribution of the 107 links by number of unique influencer shares]

You can see that 74 of the 107 links (which include the Pope story) were shared by only a single member of the 150,000 influencers we track. Only 5 have been shared by 6 or more, and that includes the Pope story.

That’s just 196 influencer shares in total across the 107 links.

Yet, between them these URLs have been interacted with 12.1 million times on Facebook.

And of course these are the stories that have been shared by an influencer. There could be more that haven’t been shared at all by influential Twitter users.

Lissted’s data also tells us:

– 133 of the 150,000 influencers (less than 0.1%) have shared at least one of its articles; and

– the article published by the site that has proved most popular with influencers has received 10 shares.

How could this help identify high risk news?

You can’t identify fake news simply from levels of reaction, nor by analysing what the articles say. You need a journalistic filter. Twitter provides a potential basis for this because its data tells you WHO shared something.

For example, Storyful, the Irish social media and content licensing agency, has used Twitter validation by specific sources as a way of identifying content that is more likely to be genuine.

I don’t know why very few of the influencers Lissted has been tracking shared the piece. But my suspicion would be that as influential members of their communities they’re:

– capable of spotting most fake news for what it is, and/or

– generally less likely to share it, as even when it serves their purpose they know they could be called out for it (they’re more visible and they’ve got more to lose); and/or

– less likely to be exposed to it in the first place.

Obviously, not all content will be shared on Twitter by these 150,000 accounts. But you can bet your bottom dollar that any vaguely significant news story will be. The temptation to want to highlight a genuine story is just too great.

Comparison to example of genuine content

To give the Pope story numbers some context, the table below shows a comparison with this piece on the Donald Trump website – Volunteer to be a Trump Election Observer (NB: post-victory, the URL now redirects to the home page).

[Table: comparison of Facebook engagement and influencer metrics for the two URLs]

Both URLs have similar Facebook engagement, but there’s a huge difference in the influencer metrics for the article and the domain.

This is just one example though. If we build a model based on this validation methodology does it provide a sound basis for rating content in general?

NB: the model that follows focuses on content from websites. A similar approach could be applied to other content, e.g. Facebook posts, YouTube videos etc.

Proof of concept

To test the methodology I built a rating model and applied it to three sets of data:

1. The 107 links identified from endingthefed.com – data here.

2. Links that Newswhip reported as having 250,000+ Facebook interactions in the period 15/9/16 – 14/11/16 – data here.

3. A random sample of over 3,000 links that were shared by influencers from the specific communities above in the period 15/10/16 -14/11/16 – data here.

The rating model gives each link a score from 0 to 100, with 100 representing a link that has a very high risk of being fake and zero a very low risk.

To rate as 100 a link would need to have:

– received 1,000,000 Facebook interactions; and
– be on a site none of whose links (the link itself included) has ever been shared by any of the 150,000 influencers.
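The exact formula isn’t disclosed (it’s the subject of the patent application mentioned above), but one plausible shape, sketched in Python, combines a log-scaled engagement factor with a validation discount. The log-scaling and the precise discount are my assumptions, not the published model:

```python
import math

def risk_score(fb_interactions, link_influencer_shares, domain_influencer_shares,
               max_interactions=1_000_000):
    """Illustrative risk score from 0 (very low risk) to 100 (very high risk).

    Risk rises with Facebook engagement and falls with influencer
    validation of the link and its domain. This is a guess at the
    model's shape, not the patented formula itself.
    """
    # Engagement factor: log-scaled so 1,000,000 interactions maps to 1.0.
    reach = min(math.log10(max(fb_interactions, 1)) / math.log10(max_interactions), 1.0)
    # Validation factor: any influencer shares of the link or its domain cut risk.
    validation = 1.0 / (1 + link_influencer_shares + domain_influencer_shares)
    return round(100 * reach * validation, 1)

# A heavily shared link on a never-validated site scores the maximum.
print(risk_score(1_000_000, 0, 0))    # → 100.0
# The Pope story's 6 link shares and ~190 other domain shares would
# pull this toy score down sharply.
print(risk_score(1_000_000, 6, 190))  # → 0.5
```

The key design property, which any real implementation would share, is that engagement alone can never produce a high score: influencer validation always discounts it.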

The distribution of ratings for the random sample is as follows:

[Chart: distribution of articles by risk rating]

Mark Zuckerberg commented that less than 1 per cent of content on Facebook is fake. If we look at the distribution, we find that 1 per cent corresponds to a score of 30+.

The distribution also shows that no link in the sample scored more than 70.

Finally over 90 per cent of URLs rated at less than 10.

On this basis I’ve grouped links in the three data sets above into 4 risk bands:

Exceptional – 70+
High – 30–70
Medium – 10–30
Low – 0–10
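The banding is a simple threshold lookup. A minimal sketch, assuming each boundary value belongs to the band above it (the post doesn’t say which way boundaries fall):

```python
def risk_band(score):
    """Map a 0-100 risk score to the four bands used in the post.

    Boundary values (10, 30, 70) are assigned to the higher band;
    this is an assumption, as the post leaves it ambiguous.
    """
    if score >= 70:
        return "Exceptional"
    if score >= 30:
        return "High"
    if score >= 10:
        return "Medium"
    return "Low"

print(risk_band(85))  # → Exceptional
print(risk_band(45))  # → High
print(risk_band(3))   # → Low
```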

Applying these bands to the three sets gives:

[Chart: distribution of articles by risk rating across the three data sets]

Unsurprisingly a high proportion of the 250,000+ group are rated as Medium to Exceptional risk. This reflects the fact that there are so few of them – 182 – and the implicit risk of being influential due to their high engagement.

Verifying these would not be a huge drain on resources as that translates to just 2 or 3 links per day!

The graph also shows how high risk the endingthefed site is with over 95 per cent of its content rated as High or Medium.

HEALTH WARNINGS

1. Being ranked as Medium to Exceptional risk does NOT mean the content is fake. It is simply an indicator. Just because one article on a site is fake does not mean all its risky content is.

Also an article could be genuine viral content that’s come out of the blue from a new source.

The value in the model is its ability to identify the content that needs verifying the most. Such verification should then be done by professional journalists.

2. The rankings only reflect the 150,000 individuals and organisations that Lissted currently tracks. There could be communities that aren’t sufficiently represented within this population.

This isn’t a flaw in the methodology however, just the implementation. It could be addressed by expanding the tracking data set.

Example findings

The top 10 ranked articles in the 250,000+ group are as follows:

1. Mike Pence: ‘If We Humble Ourselves And Pray, God Will Heal Our Land’ (506k Facebook interactions, 0 influencer shares)

2. Just Before the Election Thousands Take Over Times Square With the Power of God (416k Facebook interactions, 0 influencer shares)

3. TRUMP BREAKS RECORD in Pennsylvania “MASSIVE CROWD FOR TRUMP! (VIDEO) – National Insider Politics (207k Facebook interactions, 0 influencer shares)

4. SUSAN SARANDON: CLINTON IS THE DANGER, NOT TRUMP – National Insider Politics (273k Facebook interactions, 0 influencer shares)

5. FINGERS CROSSED: These 11 Celebrities Promised To Leave America If Trump Wins (455k Facebook interactions, 1 influencer share)

6. Trump: No Salary For Me As President USA Newsflash (539k Facebook interactions, 0 influencer shares)

7. I am. (454k Facebook interactions, 1 influencer share)

8. A Secret Has Been Uncovered: Cancer Is Not A Disease But Business! – NewsRescue.com (336k Facebook interactions, 0 influencer shares)

9. The BIGGEST Star Comes Out for TRUMP!! Matthew McConaughey VOTES Trump! (294k Facebook interactions, 1 influencer share)

10. Chicago Cubs Ben Zobrist Shares Christian Faith: We All Need Christ (548k Facebook interactions, 1 influencer share)

My own basic verification suggests some of these stories are true. For instance Donald Trump did indeed say that he would not draw his Presidential salary.

However the Matthew McConaughey story is false, and by the article’s own admission the Pennsylvania rally image is from April not October; plus there are no details on what “records” have been broken.

From outside the top 10, this post, rated as high risk – FINAL ELECTION 2016 NUMBERS: TRUMP WON BOTH POPULAR ( 62.9 M -62.2 M ), about the election results – was ranking top of Google for a search for “final election results” earlier this week. It was identified as fake by BuzzFeed.

It would be great if any journalists reading this would go through the full list of articles rated as high risk and see if they can identify any more.

Equally if anyone spots URLs rated as low risk that are fake please let me know.

Further development

This exercise, and the mathematical model behind it, were just a rudimentary proof of concept for the methodology. An actual system could:

– utilise machine learning to improve its hit rate;

– flag sites over time which had the highest inherent risk of fake content;

– include other metrics such as domain/page authority from a source such as Moz.

Challenge to Facebook

A system like this wouldn’t be difficult to set up. If someone (Newswhip, BuzzSumo etc) is willing to provide us with a feed of articles getting high shares on Facebook, we could do this analysis right now and flag the high risk articles publicly.

Snopes already does good work identifying fake stories. I wonder if they’re using algorithms such as this to help? If not then perhaps they could.

Either way, this is something Zuckerberg and Dorsey could probably set up in days, hours perhaps!

@London2012: golden social media assets going to waste

Social media accounts with huge associated audiences are lying dormant. Examples like @London2012’s Twitter account could be repurposed to make the most of these assets.

So, the Olympics are over for another four years. Having gorged myself on the heroics of TeamGB, I’m personally suffering from withdrawal.

TeamGB’s social media team also did a sterling job over the two weeks, sharing content about our athletes’ magnificent performances.

Their two primary platforms based on fans and followers were Facebook and Twitter. Their Twitter account has an impressive 822,000 followers.

But there’s another relevant Twitter account with an even larger audience, and it’s dormant.

@London2012.

[Screenshot: the @London2012 Twitter profile]

This account has 1.32 million followers. It’s tweeted seven times since the end of the Paralympics in 2012, the last in July 2013.

Since then, nothing.

Will this account and its audience of 1.3 million potentially sport-mad followers just sit and fester forever?

And it isn’t just the number of followers that’s impressive, it’s the quality too.

Here are some examples of significant followers of @London2012 who don’t follow @TeamGB:

@coldplay, @WayneRooney, @idriselba, @astonmartin, @thetimes, @Harrods, @EvanHD, @cabinetofficeuk, @WomensRunning, @KathViner and @andyburnhammp.

The identity shouldn’t get in the way of using it. Behind every account is a unique TwitterId (it’s 19900778 in @London2012’s case if you’re interested). This means you can change your @username and still maintain your follower and following relationships. Here are the Twitter instructions to do this.

I don’t know who “owns” this asset, but surely whoever it is could think of a change of identity that would still be relevant to the majority of its followers. Perhaps it could have been used to support the Games’ legacy? @UK_Sport’s 91,400 followers rather pales in comparison.

And @London2012 isn’t the only account like this.

What are the BBC going to do with accounts relating to shows that are no more, like @BBCTheVoiceUK and its 521,000 followers, or the @ChrisMoylesShow with 518,000?

Nothing by the looks of it.

On a sombre note, there are accounts that become dormant because someone dies. Examples like @ebertchicago and @davidbowiereal demonstrate that even then there can be circumstances where it’s appropriate for the accounts to live on.

As of writing Lissted‘s data shows 28,401 accounts with 10,000+ followers who haven’t tweeted in the last 90 days.

Not all of these accounts will be dormant. Some like Ed Sheeran may be just “buggering off for a bit“. But many will.

Between them they have a combined untapped audience of 1.5 billion followers.

Now there’s a number worthy of a gold medal!

Unicorns, content and engagement flights of fancy

When you’re seeking influential content, engagement metrics such as Facebook likes and LinkedIn shares are too simplistic. You need to know more about who engaged with it and why.

Last week venture capitalist Bill Gurley published a post called On the Road to Recap.

For anyone who doesn’t know, a Unicorn in this context is a startup company with a valuation in excess of $1bn.

The post analysed in depth the current investment situation in relation to Unicorns and concluded:

“The reason we are all in this mess is because of the excessive amounts of capital that have poured into the VC-backed startup market. This glut of capital has led to (1) record high burn rates, likely 5-10x those of the 1999 timeframe, (2) most companies operating far, far away from profitability, (3) excessively intense competition driven by access to said capital, (4) delayed or non-existent liquidity for employees and investors, and (5) the aforementioned solicitous fundraising practices. More money will not solve any of these problems — it will only contribute to them. The healthiest thing that could possibly happen is a dramatic increase in the real cost of capital and a return to an appreciation for sound business execution.”

The post lit a fire in the VC and startup communities.

In fact Lissted ranks the post as the most significant piece of content on any investment related topic in the VC community in the last two months. 

So I thought I’d see how it compares to other recent posts about Unicorns.

Comparison with other “Unicorn” content

I searched across the last month for posts with the most shares on LinkedIn (URLs listed at the end). If you search across all platforms you end up with very different types of unicorn!

Having found the Top 10 articles on this basis, I then looked at the number of distinct members of Lissted‘s VC community on Twitter who shared each of the articles. The community tracks the tweets of over 1,500 of the most influential people and organisations in relation to venture capital and angel investment.

Finally for completeness I also looked at the number of distinct Lissted influencers from any community who tweeted a link to the piece.

In the graph, the engagement numbers have been rebased for comparison, with the top-ranking article for each measure set to 100.
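Rebasing of this kind is simple index normalisation: divide every value by the largest and scale to 100. A minimal sketch, using the VC community share counts quoted below (169 for Gurley’s post, 11 for the Powa piece):

```python
def rebase(values, top=100.0):
    """Rebase a metric so its highest value maps to `top` (100 here)."""
    peak = max(values)
    return [round(v / peak * top, 1) for v in values]

# VC community influencer shares: Gurley's post vs the Powa piece.
print(rebase([169, 11]))  # → [100.0, 6.5]
```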

[Chart: rebased engagement comparison]

The difference in reaction by the VC community and influential individuals in general is considerable.

15x more influential members of the VC community (169) shared ‘On the Road to Recap’ than the next highest article (11, for ‘Topless dancers, champagne, and David Bowie: Inside the crash of London’s $2.7 billion unicorn Powa’).

9x more influencers across all Lissted communities (419) shared the post (46 for the Powa piece).

VC Community reaction examples

Influential retweeters of Bill’s initial tweet included Chris Sacca, Om Malik and Jessica Verrill.

Several key community influencers also tweeted their own views, and people were still sharing the post days later.

Mythical measurement

So, the next time you set out to find influential content, don’t get too carried away with big engagement numbers. Focus on understanding where and who that engagement came from.

That way your conclusions will be legendary, not mythical.

If you’d like to get a daily digest of the influential content in the Venture Capital community, sign up for a free Lissted account here, then visit the Venture Capital page.

[Screenshot: the Lissted Venture Capital page]

Articles

1. Forget unicorns — Investors are looking for ‘cockroach’ startups now

2. What investors are really thinking when a unicorn startup implodes

3. On the Road to Recap: | Above the Crowd

4. Next Chapter: Cvent Acquired for $1.65 Billion

5. The fall of the unicorns brings a new dawn for water bears

6. Why Unicorns are struggling

7. Oracle just bought a 20-person company for $50 million

8. Silicon Valley startups are terrified by a new idea: profits

9. Topless dancers, champagne, and David Bowie: Inside the crash of London’s $2.7 billion unicorn Powa

10. 10 Startups That Could Beat a Possible Bubble Burst

Another “If I was Jack” post: Top 3 things Twitter needs to do to stay relevant

There’s been a lot of talk over the last week or so about what Twitter needs to do to turn around its fortunes. As someone who’s spent more time than is probably healthy looking at Twitter data over the last three years, I thought I’d throw in my two penneth.

Here are the three areas I think are crucial to address.

Note none of them relate to tweets or ads. True, changes to video, the ability to edit tweets, tweet length, ad options etc. might improve things in the short term. But I’m convinced that in the medium/long term they’re like rearranging the deckchairs on the Titanic.

Effective policing

Twitter’s public nature (protected accounts aside) is a major reason why it appeals only to a minority of people: those who accept, or are naïve about, the risks involved with such a platform.

Friday night saw an example of such naivety from a Twitter employee, of all people, in response to the #RIPTwitter hashtag. His experience was pretty mild though.

Frequent stories about people attacked by trolls, spammers and bullies can’t be helping user growth. Some investment has been made to address this, but it must be maintained.

Freedom of speech and expression is something to be valued. But just like society won’t tolerate all behaviour, nor should Twitter.

Update: While I’ve been drafting this post today, Twitter has announced the creation of a Trust and Safety Council.

Follow spam

Hands up who’s been followed multiple times by the same account? Here’s a screenshot of an account that followed our @Tweetsdistilled account ten times last month.

[Screenshot: the same account following @Tweetsdistilled multiple times]

Each time it’s unfollowed and tried again because @Tweetsdistilled didn’t follow it back. Such automated follow spam is a joke. If these are the kind of users Twitter thinks it needs to be serving then it really doesn’t have a future.

At the moment anyone can follow up to 5,000 accounts. Beyond that, you are limited to following only 10 per cent more accounts than the number that follow you. So to follow more than 5,000 accounts you currently need 4,545 followers.

I’d suggest changing this ratio to substantially less than 1.0x beyond 5,000 accounts. For example, if set at 0.25x, then to follow 6,000 accounts (1,000 more) you would need 8,545 followers (4,000 more).
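The proposed rule can be expressed as a small function. The 0.25x ratio and the 4,545-follower base are the figures from the text; each follow beyond 5,000 costs 1/0.25 = 4 extra followers:

```python
def required_followers(target_follows, base_follows=5000,
                       base_followers=4545, ratio=0.25):
    """Followers needed to follow `target_follows` accounts under the
    proposed rule: below 5,000 follows there is no requirement; beyond
    it, each extra follow requires 1/ratio extra followers.
    """
    if target_follows <= base_follows:
        return 0
    return base_followers + int((target_follows - base_follows) / ratio)

# Reproduces the worked example: following 6,000 needs 8,545 followers.
print(required_followers(6000))  # → 8545
```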

I’d also place stricter limits on the number of times you can follow the same account than appears to be the case at the moment. Twice in any 30 day period would be enough to allow for an accidental unfollow!

Combined, these changes would still allow people to grow their followers, but would mean they could only do so if they were interesting to an increasingly large group of users.

Why am I confident these constraints wouldn’t be an issue?

Because of the 2.57 million accounts that Lissted has identified as having any real influence potential on Twitter, 95 per cent (2.44 million) follow fewer than 5,000 accounts. Of the remaining 124,000 accounts, 24,000 would still be within the parameters I’ve suggested.

Here’s a table summarising the stats:

[Table: following analysis]

You can see the remaining 100,000 accounts have more follow relationships (2.619bn) than the other 2.47 million combined (2.449bn).

And these are just the accounts that Lissted has identified as having some degree of likelihood they are “genuine”. There are probably more that are pure spam that Lissted filters out.

So this tiny minority, less than 0.1 per cent of Twitter users, is creating this huge amount of irrelevance.

Communities

A key strength of Twitter is the groups of experts you can find related to pretty much every industry, profession and topic you can think of.

In my opinion Twitter focuses too much on promoting “celebrities” and not enough on these niche communities.

Twitter needs to provide new and existing users with simple and effective ways to “plug into” them.

Inside Twitter

This could be done within the existing feed mechanism. Over the last 12 months our niche Tweetsdistilled accounts, e.g. @PoliticsUKTD, @HealthUKTD and @EducationUKTD, have been demonstrating this. They’re like a cross between Twitter lists and ‘While you were away’. Once you’ve subscribed to a feed, it posts interesting tweets from the community into your timeline, and as with Twitter lists you don’t need to be following the specific accounts concerned.

They appear to be doing something right, as they’re followed by many key members of these communities. Even accounts you might assume would have this covered anyway.

Outside Twitter

I’d love to know the engagement stats for the Popular in your Network emails. Does anyone actually look at them? For new users they seem to focus heavily on celebrity tweets. My suspicion is that if you wanted to sign up for Stephen Fry’s or Kanye’s tweets, you’d have done it by now.

Instead, why not allow users to subscribe to a summary of what communities have been talking about? The content they’ve shared and the tweets they’ve reacted to.

Lissted can now deliver daily and weekly digests of the most interesting content and tweets from an array of communities. Here’s Sunday’s US Business community weekly digest for example.

[Screenshot: Sunday’s US Business community weekly digest]

To produce these digests Lissted actually combines the response of a Twitter community with the wider social reaction across Facebook, LinkedIn and Google+. But it still demonstrates Twitter has the ability to be seen as a powerful intelligence tool for new and existing users with minimum investment on their part.

If you have 7 minutes to spare, here’s a detailed story we produced last October about how this could also help Twitter in an onboarding context.

Over to you Jack

Twitter’s next quarterly results announcement is tomorrow (10th February). I wonder if any of these areas will be addressed….

A tiny fraction of real conversation is analysed by social media monitoring tools

Social media listening tools can provide powerful insights when they’re used to find answers to really good actionable questions.

But recently I’ve noticed a move to start making absolute statements based on such analysis. I highlighted one such area earlier this year in relation to the UK general election. Some people even suggested Twitter could predict the outcome. They were wrong.

The thing is, as much as social data can be powerful and seem vast in scope, you still need to keep a sense of perspective.

It’s been estimated that every day people speak an average of around 16,000 words. With this in mind I thought I’d try and make a quick estimate of the proportion of people’s conversation in North America and the UK that social media monitoring data represents.

Answer? 0.16 per cent* 

And that’s before we get into issues like spam accounts, bias towards power users’ output, questions about whether tweets and posts are truly an authentic reflection of what people think and feel, demographic bias and the online disenfranchised.

I based my estimate on Twitter and Facebook, as they represent the majority of conversation that such tools access. We could add Reddit, blog posts, comments on online articles and YouTube videos, forums etc, and if anyone fancies doing so, be my guest! But I don’t expect you’ll get to a much bigger number.

Particularly as on the other side of the equation we could add to what people say other forms of conversation that aren’t accessible to social listening: emails, messaging apps and collaboration tools like Slack to name a few.

So does this make social listening as an insight tool a waste of time?

No, of course not. I’ve spent enough time buried deep in social data to know that it can provide hugely valuable insights. But to achieve this you need to be extremely focussed.

Ask good questions

Structure questions that take into account the limitations of the data. “Who does Twitter conversation suggest is going to win the UK general election?” does not fall into this category. Also ensure the answer doesn’t lead to a “so what” moment, but provides a genuine basis to take more action.

Say no to pretty noise

Pretty dashboards that pluck results out of the ether aren’t the answer. Make sure you understand exactly who you’re listening to – who is behind the data. You need this audience perspective to be confident what you’re seeing is real insight and to address what I call the four (f)laws of social listening.

Be sceptical

Sometimes social media analysis gives you an answer you didn’t expect, one that differs from your existing world view. It’s crucial you don’t dismiss such answers as they could be the most valuable insights you’ll ever get. Equally, don’t naively just accept them at face value. Challenge. Try and triangulate the answer from another source. Try asking the question in a different way and compare the answers. Sometimes you can be surprised.

* You can see my back of an envelope calc here. The estimated variables are editable in the “Try your own” sheet (highlighted in blue) so you can have a play to work out your own figures. In simple terms we’re comparing:

– Talking: c. 422 million people across the US, Canada and UK using 16,000 words per day = 6.75 trillion words.
– Twitter: c. 137 million tweets per day (N. American and UK users assumed to be 27.5 per cent of active users, multiplied by 500 million tweets per day) at an assumed average of 25 words = 3.4 billion words.
– Facebook: c. 707 million posts per day (N. American and UK users assumed to be 16.4 per cent of users, multiplied by 4,320 million posts per day) at an assumed average of 50 words = 35 billion words, of which only 20 per cent are assumed to be accessible to social listening tools. I have no specific basis for the level of this last assumption, though clearly social listening tools can’t access all Facebook data – Datasift’s PYLON offering provides a potential solution to this privacy issue. Even if you assume all posts are accessible, the result only increases to 0.57 per cent.