Oct 18, 2018 6:21 PM

Twitter's Dated Data Dump Doesn’t Tell Us About Future Meddling

Twitter's release of more than 10 million tweets from Russia's Internet Research Agency and Iran sheds little light on those agencies' current tactics, researchers say.


Twitter dropped an almost unfathomably large archive of tweets connected to two alleged influence campaigns on Wednesday. The trove included over 9 million tweets associated with 3,841 accounts connected to Russia’s notorious Internet Research Agency, or IRA, as well as more than a million tweets attributed to a network of 770 Iranian propaganda-pushing accounts. Twitter has never before released an archive of this size. But researchers tell WIRED that it says more about the past than it does about present or future threats Twitter should be wary of with important midterm elections less than three weeks away.

The data provides an interesting historical account of the early tactics employed by the IRA and Iran, but it isn’t particularly useful for trying to understand what these groups, or others, are doing now to influence conversations and the upcoming election, says Clemson University professor Darren Linvill. He and professor Patrick Warren have been analyzing and tracking the IRA’s activity on Twitter for years; until Wednesday’s data dump, they were responsible for compiling one of the most comprehensive archives of IRA tweets. They’ve also been tracking ongoing IRA activity, which they describe as unlike anything they’ve ever seen before.

“With each passing year they [the IRA] get more and more sophisticated,” said Linvill. “And the work they’re doing right now is very sophisticated. They gain followers really, really quickly, and their messages spread better than they did in previous years.”

Shifting Tactics

Early on, Linvill says, the IRA mostly targeted Russians in Russian, relying on memes, links, and specific hashtag campaigns rather than more personality-driven posting. The group later expanded its focus to include politics around the world, and US issues in particular. As agents grew more advanced, their fake accounts turned to hot-button issues in American society. The IRA would devote time to creating elaborate fake news and even posted stories on CNN’s community pages, as Ben Nimmo, a senior fellow at the Atlantic Council’s Digital Forensic Research Lab, pointed out on Twitter. In response to Twitter’s 2017 crackdown on IRA accounts, the group turned to automation rather than continuing to post individually.

In June, Twitter said it had shut down more than 70 million accounts in May and June alone. But the company did not identify any specific accounts as being linked to the IRA, and it has been silent on the subject since. Linvill says it seems highly unlikely that all of the IRA operatives simply disappeared. If Twitter really wanted to help researchers and the public gain a better understanding of what groups like the IRA are up to now, he says, it would release the names of any IRA-linked accounts from recent months. "This would give us a much better idea of current IRA tactics." (Twitter declined a request for comment.)

Without more recent information on the IRA’s activities, Linvill says, it’s extremely difficult to discern what threats platforms like Twitter should be prepared for as the US gears up for midterm elections. “Two-year-old data is history,” said Linvill. “I think researchers and journalists would really like to know what's happening now, according to Twitter.”

Much of the data Twitter released Wednesday had previously been made public. In February, NBC News released 200,000 tweets tied to “malicious activity” in the 2016 US presidential election from more than 3,000 accounts associated with the IRA. Five months later, Linvill and Warren published 3 million more. Though Twitter hadn’t previously named all 770 Iranian accounts, around half of those had been identified by independent cybersecurity company FireEye in August. The bulk of the truly new data dated from before June 2015, as this information wasn’t previously accessible to those outside of Twitter due to GDPR-related restrictions.

“A dated big data dump doesn’t help us at all today,” said John Gray, CEO and cofounder of Mentionmapp Analytics, a social media information and network visualization organization that specializes in identifying manipulation on platforms like Twitter. Gray says that while making all the data publicly accessible certainly makes Twitter appear more interested in transparency than other platforms like Facebook, the company is still missing the larger point. More valuable than disclosing the tweets, Gray said, would be detailing how Twitter identifies malicious actors, so outside researchers can check and replicate the process.

Linvill and Warren agree. In an email to WIRED, the two professors wrote that they believe that some of Twitter’s processes for identifying IRA accounts weren’t thorough enough, which has led the company to misidentify users before. “We have some concern these issues may remain,” they wrote. In June 2018, Twitter released a list of accounts it suspected were associated with the IRA. When combing over this data, Linvill and Warren found 20 accounts that they felt were likely legitimate accounts run by real people with no ties to Russia and removed them from their data set. Linvill and Warren say they shared this information with Twitter on Tuesday, before the archive was released to the public, and were surprised to see that Twitter included the 20 accounts in its data set regardless. Twitter similarly included the accounts of real Americans on a list it submitted to Congress in November of suspected IRA-associated accounts. Twitter did include a contact form on its new elections integrity hub for those who feel they’ve been erroneously identified as compromised.

Disputed Impact

Researchers disagree on the global impact of these tweets. The Atlantic Council’s DFRLab, which was given early access to the trove of data by Twitter, found the 10 million tweets to have “low impact” in a thorough analysis posted shortly after the archive was made public. “Other than in the United States, the troll operations do not appear to have had significant influence on public debate,” wrote DFRLab. “There is no evidence to suggest that they triggered large-scale changes in political behavior, purely on the basis of their social media posts.”

Linvill, Warren, and Gray disagree. Linvill and Warren argue that measuring impact is tricky, as the goals of groups like the IRA can’t be reduced to something as simple as changing an individual’s political beliefs through a few tweets. Rather, they say the groups aim to sow division, exacerbate distrust in civil institutions, and manipulate public conversation. “What the IRA in particular is trying to do is pretty broad in scope. You can't really put a number on it, because it's fundamentally not about any particular election,” said Linvill.

Disinformation researcher Erin Gallagher says it reminds her of the influence operations favored by Mexico’s Institutional Revolutionary Party (PRI) since 2012. These social-media-based campaigns aimed to distract the public from certain news by pushing unrelated viral narratives on Twitter, Facebook, and WhatsApp. In Spanish, they literally call it cortina de humo, or a smoke screen, explained Gallagher. Whenever something embarrassing happened to the president or his allies, an army of sock-puppet-like accounts would find some other scandal or bit of gossip and help it trend, in order to push the offending content out of users’ minds.

The question of impact is complicated, says Gray of Mentionmapp. He believes impact can’t be derived by looking at single instances, like individual tweets sent by a particular group, but rather by inferring the cumulative effect of months of increasingly divisive rhetoric. “How can we dismiss the impact when outrage and doubt have been commodified?” Gray wonders. Gallagher sees it from a more transactional perspective: “On one hand I feel like it must work. They must have influence; if it didn't, no one would do it,” and certainly not for nearly a decade, said Gallagher. “But then, on the other hand, there’s no benchmark. We don't know how they are being measured internally, or if that is even accurate.”