The Data Aren't Worth Anything But We'll Keep Them Forever Anyways. You're Welcome.

Earlier this week, Instructure announced that they were being acquired by a private equity firm for nearly 2 billion dollars. 

Because Instructure offers a range of services, including a learning management system, this triggered the inevitable conversation: how much of the 2 billion price tag represented the value of the data?

The drone is private equity.

There are multiple good threads on Twitter that cover some of these details, so I won't rehash these conversations - the timelines of Laura Gibbs, Ian Linkletter, Matt Crosslin, and Kate Bowles all have some interesting commentary on the acquisition and its implications. I recommend reading their perspectives.

My one addition to the conversation is relevant both to Instructure and to educational data in general. Invariably, when people raise valid privacy concerns, defenders of what currently passes for acceptable data use respond that the people raising those concerns are overstating the value of the data, because the data aren't worth very much.

Before we go much further, we also need to understand what we mean when we say data in this context: data are the learning experiences of students and educators, the artifacts they have created through their effort that track and document a range of interactions and intellectual growth. "Data" in this context are personal, emotional, and intellectual effort -- and for everyone who had to use an Instructure product, that personal, emotional, and intellectual effort has become an asset that is about to be acquired by a private equity firm.

But, to return to the claims that the data have no real value: these claims about the lack of value of the underlying data are often accompanied by long descriptions of how companies function, and even longer descriptions about where the "real" value resides (hint: in these versions, it's never the data).

Here is precisely where these arguments fall apart: if the data aren't worth anything, why do companies refuse to delete them?

We can get a clear sense of the worth of the data that companies hold by looking at the lengths they go to, both to obscure how they use the data and to hold on to it. This post on the Instructure blog from July 2019 offers a clear example of what obfuscation looks like. It includes this lengthy non-answer about why Canvas doesn't support basic user agency in the form of an opt-out:

What can I say to people at my institution who are asking for an "opt-out" for use of their data?

When it comes to user-generated Canvas data, we talk about the fact that there are multiple data stewards who are accountable to their mission, their role, and those they serve. Students and faculty have a trust relationship with their educational institutions, and institutions rely on data in order to deliver on the promise of higher education. Similarly, Instructure is committed to being a good partner in the advancement of education, which means ensuring our client institutions are empowered to use data appropriately. Institutions who have access to data about individuals are responsible to not misuse, sell, or lose the data. As an agent of the institution, we hold ourselves to that same standard.

Related to this conversation, when we hear companies talking about developing artificial intelligence (AI) or machine learning (ML) to build or improve their product, they are describing a process that requires significant amounts of data to get started, and significant amounts of new data to keep developing the product. In other words, the data are the raw material the product is built from.

But to all the companies, and to the paid and unpaid defenders of these companies: you claim that the data have no value while simultaneously refusing to delete the data -- or even to give learners visibility into, or control over, how their data are used.

If -- as you claim -- the data have no value, then delete them.

Misinformation: Let's Study the Olds, Shall We?

The Stanford History Education Group (SHEG) recently released a study about Students' Civic Online Reasoning. The linked page includes a download link for the full study, and it's worth reading.

The Executive Summary somberly opens with a Serious Question (tm):

The next presidential election is in our sights. Many high school students will be eligible to vote in 2020. Are these first-time voters better prepared to go online and discern fact from fiction?

The conclusions shared in the Executive Summary are equally somber:

Nearly all students floundered. Ninety percent received no credit on four of six tasks.

I was able to capture secret video of Serious People reacting to this study. You're welcome.

FREAK OUT!!

All kidding aside, the "kids are bad at misinformation" cliche needs to be retired. It was never particularly good to begin with, and it hasn't gotten much better with age.

To be clear: if the adults in the building lack basic information literacy, it will be increasingly difficult for students to master these skills. Last I checked, high school sophomores didn't vote in the 2016 or 2018 elections. But their teachers, and their school administrators? They sure did.

Also, to be clear: if I had a nickel for every time I discovered a romance scam account only to see some of our educational and edtech "leaders" following the account, I could retire, right now. To date, I have refrained from naming names, but hoo boy my patience wears thin on some days.

But when we study misinformation, we need to stop doing it in a vacuum and in such a limited way. The recent SHEG study pulls in demographic information, including race, gender, and maternal education level. This is a good start, but it's still incomplete.

Additional data points that are readily and publicly available, and that could be assembled once and reused indefinitely, include:

  • Voter turnout percentages for the 2012, 2014, 2016, 2018, and -- eventually -- 2020 elections.
  • Election results (U.S. House, Senate, and Presidential) for the 2012, 2014, 2016, 2018, and -- eventually -- 2020 elections.

These data are available by postal code or FIPS code, and would provide an additional point of reference for interpreting results. Given that many studies of misinformation and youth are framed in terms of civic participation, we should probably have some measure of actual civic participation that is consistent across the entire country.
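As a minimal sketch of that join, assuming study results are reported per district along with a county FIPS code, and that turnout has been assembled once into a lookup keyed by FIPS code and election year. The names DistrictResult and attach_turnout are hypothetical, and every value in the example is a placeholder rather than real data.

```python
from dataclasses import dataclass


@dataclass
class DistrictResult:
    """Hypothetical per-district study result."""
    district: str
    county_fips: str   # 5-digit county FIPS code (2-digit state + 3-digit county)
    mean_score: float  # hypothetical mean student score on the study's tasks


def attach_turnout(results, turnout_by_fips):
    """Pair each district's score with turnout for its county, by election year.

    Districts with no turnout record get an empty dict, so gaps in the
    context data stay visible instead of being silently dropped.
    """
    return [
        {
            "district": r.district,
            "county_fips": r.county_fips,
            "mean_score": r.mean_score,
            "turnout": turnout_by_fips.get(r.county_fips, {}),
        }
        for r in results
    ]


# Placeholder values only; "00001" and "00002" are not real FIPS codes.
results = [DistrictResult("District A", "00001", 0.42),
           DistrictResult("District B", "00002", 0.37)]
turnout = {"00001": {2016: 61.0, 2018: 49.5}}
for row in attach_turnout(results, turnout):
    print(row)
```

Because the turnout table is keyed only by FIPS code and year, it can be built once and reused across studies, which is the point of using publicly available election data.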

While this addition would provide some useful context, it still doesn't tell us anything about the adults in the system and their skill levels. To that end, surveys should include as many adults within evaluated systems as possible: administrative and district staff; school board members; superintendents and assistant superintendents; curriculum staff and technical staff; building level principals and assistant principals; school librarians (ha, yeah, I know); and classroom teachers. The data should also record levels of participation across staff.

By including adults in the study, the relative skill level of the adults could be cross-referenced against that of the students they are responsible for, and against overall levels of participation in national elections. The adults' rate of participation in the study would itself be an interesting data point.
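A rough sketch of that cross-reference, assuming each district reports a mean adult score, a mean student score, and the share of adults who took the survey (all hypothetical fields). It uses a plain Pearson correlation and a hypothetical participation cutoff; a real study would need a proper statistical design, and a correlation on its own says nothing about cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / sqrt(var_x * var_y)


def adult_student_correlation(districts, min_participation=0.5):
    """Correlate adult and student scores across districts.

    Districts where too few adults took the survey (hypothetical cutoff,
    default 50%) are skipped, since their adult scores are less reliable.
    Returns (r, number_of_districts_used).
    """
    kept = [d for d in districts if d["adult_participation"] >= min_participation]
    adults = [d["adult_score"] for d in kept]
    students = [d["student_score"] for d in kept]
    return pearson(adults, students), len(kept)


# Placeholder districts; none of these numbers are real results.
districts = [
    {"adult_score": 0.55, "student_score": 0.40, "adult_participation": 0.80},
    {"adult_score": 0.35, "student_score": 0.30, "adult_participation": 0.65},
    {"adult_score": 0.70, "student_score": 0.45, "adult_participation": 0.90},
    {"adult_score": 0.20, "student_score": 0.50, "adult_participation": 0.10},
]
print(adult_student_correlation(districts))
```

Reporting the number of districts used alongside the coefficient keeps the participation levels visible, rather than hiding how many districts were dropped.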

This is a very different study from the one SHEG put out. Getting adult participation would make recruiting participant districts even more time-consuming -- but if we are going to move past where we are now, we need to do more than we're currently doing. All of us need to get better at addressing misinformation, and we're not going to get there by pointing fingers at youth or by taking too narrow a view of the problem. But we can't shy away from the reality that adults have played an outsized role in creating misinformation and perpetuating its success. To fix the problems caused by misinformation, we need to study ourselves as well.

Adtech, Tracking, and Misinformation: It's Still Messy

Introduction

Over the last several months, I have wasted countless hours reading through and collecting online posts related to several conversational spikes triggered by current events. These spikes contained multiple examples of outright misinformation and of artificial amplification of that misinformation.

I published three writeups describing this analysis: one on a series of four spikes related to Ilhan Omar, a second related to the suicide of Jeffrey Epstein, and a third related to trolls and sockpuppets active in the conversation related to Tulsi Gabbard. For these analyses, I looked at approximately 2.7 million tweets, including the domains and YouTube videos shared.

Throughout each of these spikes, right leaning and far right web sites that specialize in false or misleading information were shared far more extensively than mainstream news sources. As shown in the writeups, there was nothing remotely close to balance in the sites shared. Rightwing sites and sources dominated the conversation, both in number of shares, and in number of domains shared.

This imbalance led me to return to a question I looked at back in 2017: is there a business model or a sustainability model for publishing misinformation and/or hate? This is a question multiple other people have asked; as one example, Buzzfeed has been on this beat for years now.

To begin to answer this question, I scanned a subset of the sites used when spreading or amplifying misinformation, along with several mainstream media sites. This scan had two immediate goals:

  • get accurate information about the types of tracking and advertising technology used on each individual site; and 
  • observe overlaps in tracking technologies used across multiple sites.

Both mainstream news sites and misinformation sites rely on advertising to generate revenue.

The companies that sell ads collect information about people, the devices they use, and their geographic location (at minimum, inferred from IP addresses, but also captured via tracking scripts), as part of how they sell and deliver ads.

This scan will help us answer several questions:

  1. what companies help these web sites generate revenue?
  2. what do these adtech companies know about us?
  3. given what these companies know about us, how does that impact their potential complicity in spreading, supporting, or profiting from misinformation?

Methodology

Twenty-five sites were scanned; each site is listed below, followed by the number of third parties called on it. The sites selected for scanning meet one or more of the following criteria: they were used to amplify false or misleading narratives on social media; they have a track record of posting false or misleading content; they are recognized as mainstream news sites; or they are recognized as partisan but legitimate web sites.

Every site scan began by visiting the home page. From the home page, I followed a linked article. From the linked article, I followed a link to another article within the site, for a total of three pages in each site.

On each pageload, I allowed any banner ads to load, and then scrolled to the bottom of the page. A small number of the sites used "infinite scroll" - on these sites, I would scroll down the equivalent of approximately 3-4 screens before moving on to a new page in the site.

While visiting each site, I used OWASP ZAP (an intercepting proxy) to capture the web traffic and any third party calls. For each scan, I used a fresh browser with the browsing history, cookies, and offline files wiped clean.

Summary Results

The sites scanned are listed below, sorted by the number of third-party domains observed, from high to low.

The sites at the top of the list shared information about site visitors with more third-party domains. In general, each individual domain is a different company, although in some cases (like Google and Facebook) a single company controls multiple domains. This count is at the domain level, so if a site sent user information to subdomain1.foo.com and subdomain2.foo.com, the two distinct subdomains count as a single domain.

  • dailycaller (dot) com -- 189
  • thegatewaypundit (dot) com -- 160
  • thedailybeast (dot) com -- 154
  • mediaite (dot) com -- 153
  • dailymail.co.uk -- 151
  • zerohedge (dot) com -- 145
  • cnn (dot) com -- 143
  • westernjournal (dot) com -- 140
  • freebeacon (dot) com -- 137
  • huffpost (dot) com -- 131
  • breitbart (dot) com -- 107
  • foxnews (dot) com -- 101
  • twitchy (dot) com -- 92
  • thefederalist (dot) com -- 88
  • townhall (dot) com -- 83
  • washingtonpost (dot) com -- 82
  • dailywire (dot) com -- 71
  • pjmedia (dot) com -- 61
  • lauraloomer.us -- 52
  • nytimes (dot) com -- 42
  • infowars (dot) com -- 40
  • vdare (dot) com -- 21
  • prageru (dot) com -- 19
  • reddit (dot) com -- 18
  • actblue (dot) com -- 13
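The domain-level counting described above can be sketched as follows, assuming the captured traffic has been reduced to (site, request URL) pairs; that shape is a hypothetical simplification, not ZAP's export format. Subdomains are collapsed to a registrable domain using a deliberately tiny suffix table, so a real analysis should rely on the Public Suffix List instead.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Illustrative multi-label suffixes only; the Public Suffix List is far longer.
MULTI_LABEL_SUFFIXES = {"co.uk", "com.au", "co.jp"}


def registrable_domain(host):
    """Collapse a hostname to its registrable domain (simplified)."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def third_party_counts(requests):
    """Map each site to the number of distinct third-party domains it called.

    Requests to the site's own domain (including its subdomains) are skipped.
    Sites with no third-party calls do not appear in the result.
    """
    domains_by_site = defaultdict(set)
    for site, url in requests:
        host = urlsplit(url).hostname
        if host is None:
            continue
        domain = registrable_domain(host)
        if domain != registrable_domain(site):
            domains_by_site[site].add(domain)
    return {site: len(domains) for site, domains in domains_by_site.items()}


# Placeholder hostnames; two subdomains of one tracker count once.
requests = [
    ("site-a.example", "https://stats.tracker-one.example/pixel"),
    ("site-a.example", "https://cdn.tracker-one.example/tag.js"),
    ("site-a.example", "https://img.site-a.example/logo.png"),
    ("site-b.example", "https://ads.tracker-two.example/bid"),
]
print(third_party_counts(requests))  # -> {'site-a.example': 1, 'site-b.example': 1}
```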

The list below highlights the most commonly used third-party domains. For each domain, it shows the number of sites (out of 25) that called it and the company that owns it. As shown below, each of the top 24 third parties was called by 18 or more sites.

The top 24 third party sites getting data include some well known names in the general tech world, such as Google, Facebook, Amazon, Adobe, Twitter, and Oracle.

However, lesser known companies are also broadly used, and get access to user information as well. These less known companies collecting information about people's browsing habits include AppNexus, MediaMath, The Trade Desk, OpenX, Quantcast, RapLeaf, Rubicon Project, comScore, and Smart Ad Server.

Top third party domains called:

  • doubleclick.net - 25 - Google
  • googleapis.com - 24 - Google
  • facebook.com - 23 - Facebook
  • google.com - 23 - Google
  • google-analytics.com - 22 - Google
  • googletagservices.com - 22 - Google
  • gstatic.com - 22 - Google
  • adnxs.com - 21 - AppNexus
  • googlesyndication.com - 21 - Google
  • adsrvr.org - 20 - The Trade Desk
  • mathtag.com - 20 - MediaMath
  • twitter.com - 20 - Twitter
  • yahoo.com - 20 - Yahoo
  • amazon-adsystem.com - 19 - Amazon
  • bluekai.com - 19 - Oracle
  • facebook.net - 19 - Facebook
  • openx.net - 19 - OpenX
  • quantserve.com - 19 - Quantcast
  • rlcdn.com - 19 - RapLeaf
  • rubiconproject.com - 19 - Rubicon Project
  • scorecardresearch.com - 19 - comScore
  • ampproject.org - 18 - Google
  • everesttech.net - 18 - Adobe
  • smartadserver.com - 18 - Smart Ad Server (partners with Google and the Trade Desk)
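A minimal sketch of how a ranking like this could be produced, assuming a mapping from each scanned site to the set of third-party domains it called, built from the proxy logs. The owner table passed in is a hypothetical, partial lookup; ownership of each domain still has to be verified by hand.

```python
from collections import Counter


def sites_per_domain(domains_by_site):
    """Count, for each third-party domain, how many sites called it."""
    counts = Counter()
    for domains in domains_by_site.values():
        counts.update(set(domains))  # each domain counts at most once per site
    return counts


def top_domains(domains_by_site, min_sites, owners=None):
    """Return (domain, site_count, owner) for domains called on at least
    min_sites sites, most widely used first, ties broken alphabetically."""
    owners = owners or {}
    counts = sites_per_domain(domains_by_site)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(d, n, owners.get(d, "unknown")) for d, n in ranked if n >= min_sites]


# Placeholder sites and domains; only the doubleclick.net owner comes from the text.
scan = {
    "site-a.example": {"doubleclick.net", "tracker-one.example"},
    "site-b.example": {"doubleclick.net"},
    "site-c.example": {"doubleclick.net", "tracker-one.example", "tracker-two.example"},
}
for row in top_domains(scan, min_sites=2, owners={"doubleclick.net": "Google"}):
    print(row)
```

Counting each domain once per site, rather than once per request, matches the "called by 18 or more sites" framing; a per-request count would instead reward chatty scripts.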

The full list of domains, and the paired third-party calls, are available on GitHub.

As noted above, Doubleclick -- an adtech and analytics service owned by Google -- is used on every single site in this scan. We'll take a look at what that means in practical terms later in this post. But other domains are also used heavily across multiple sites.

amazon-adsystem.com -- controlled by Amazon -- was called on 19 sites in the scan, including Mediaite, CNN, Reddit, Huffington Post, the Washington Post, the NY Times, Western Journal, PJ Media, ZeroHedge, the Federalist, Breitbart, and the Daily Caller.

adsrvr.org, a domain that appears to be owned by The Trade Desk, was called on 20 sites in the scan, including Breitbart, PJMedia, ZeroHedge, The Federalist, CNN, Mediaite, Huffington Post, and the Washington Post.

Stripe, a popular payment platform, was called on sites ranging from right-wing outlets to outright hate sites. While I did not confirm that each payment gateway is active and functional, the chances are good that Stripe is used to process payments on some or all of the sites where it appears. Sites where calls to Stripe came up in the scan include VDare (a white nationalist site), Laura Loomer, Breitbart, and Gateway Pundit.

Stripe is primarily a payment processor, and is included here to show an additional business model -- selling merchandise -- used to generate revenue. However, multiple adtech and analytics providers are used indiscriminately on sites across the political spectrum. While some people might point to the ubiquity and reuse of adtech across the political spectrum -- and across the spectrum of news sites, from mainstream to highly partisan sites, to hate sites and misinformation sites -- as a sign of "neutrality", it is better understood as an amoral stance.

Adtech helps all of these sites generate revenue, and helps all of these sites understand what content "works" best to generate interaction and page views. When mainstream news sites use the same adtech as sites that peddle misinformation, the readers of mainstream sites have their reading and browsing habits stored and analyzed alongside the browsing habits of people who live on an information diet of misinformation. In this way, when mainstream news sites choose to have reader data exposed to third parties that also cater to misinformation sites, it potentially exposes these readers to advertising designed for misinformation platforms. In the targeted ad economy, one way to avoid being targeted is to be less visible in the data pool, and when mainstream news sites use the same adtech as misinformation sites, they sell us out and increase our visibility to targeted advertisers.

Note: Ad blockers are great. Scriptsafe, uBlock Origin, and/or Privacy Badger are all good options.

Looking at this from the perspective of an adtech or analytics vendor, they have the most to gain financially from selling to as many customers as possible, regardless of the quality or accuracy of the site. The more data they collect and retain, the more accurate (theoretically) their targeting will become. The ubiquity of adtech used across sites allows adtech vendors to skim profit off the top as they sell ads on web properties working in direct opposition to one another.

In short, while our information ecosystem slowly collapses under the weight of targeted misinformation, adtech profits from all sides, and collects more data from people being misled, thus allowing more accurate targeting of people most susceptible to misleading content over time. Understood this way, adtech has a front-row seat to the steady erosion of our information ecosystem, with a couple of notable caveats: first, with the dataset adtech companies have collected and continue to grow, they could identify the most problematic players. Second, adtech profits from lies just as much as from truth, so these companies have a financial incentive not to care.

But don't take my word for it. In January 2017, Randall Rothenberg, the head of the Interactive Advertising Bureau (IAB, the leading trade organization for online adtech), described this issue:

We have discovered that the same paths the curious can trek to satisfy their hunger for knowledge can also be littered deliberately with ripe falsehoods, ready to be plucked by – and to poison – the guileless.

In his 2017 speech, Rothenberg correctly observes that advertising has what he describes as a "civic responsibility":

Our objective isn’t to preserve marketing and advertising. When all information becomes suspect – when it’s not just an ad impression that may be fraudulent, but the data, news, and science that undergird society itself – then we must take civic responsibility for our effect on the world.

In the same 2017 speech, Rothenberg highlights the interdependence of adtech and the people affected by it, and the responsibilities that interdependence places on adtech companies:

First, let me dispense with the fantasy that your obligation to your company stops at the door of your company. For any enterprise that has both customers and suppliers – which is to say, every enterprise – is a part of a supply chain. And in any supply chain, especially one as complex as ours in the digital media industry, everything is interdependent – everything touches something else, which touches someone else, which eventually touches everyone else. No matter how technical your company, no matter how abstruse your particular position and the skill it takes to occupy it, you cannot divorce what you do from its effects on the human beings who lie, inevitably, at the end of this industry’s supply chain.

Based on what is clearly observable in this scan of 25 sites that featured heavily in misinformation campaigns, nearly three years after the head of the IAB called for improvements, actual improvements appear to be in very short supply.

Tracking Across the Web

To illustrate how tracking looks in practice, I did a sample scan across six web sites:

  • Gateway Pundit
  • Breitbart
  • PJ Media
  • Mediaite
  • The Daily Beast
  • The Federalist

While all of these sites use dozens of trackers, for reasons of time we will limit our review to two: Facebook and Google. Also, to be very clear: the proxy logs for this scan of six sites contain an enormous amount of information about what is collected, how it's shared, and the means by which data are collected and synced between companies. The discussion in this post barely scratches the surface, and this is an intentional choice. Going into more detail would have required a deeper dive into the technical implementation of tracking, and while this deeper dive would be fun, it's outside the scope of this post.

In the screenshots below, the URLs sent in the request headers, the User Agent information, and the full cookie ID are partially obfuscated for privacy reasons.

Facebook:

Facebook sets a cookie on the first site: Gateway Pundit. This cookie has a unique ID, which gets reused across multiple sites. The initial request sent to Facebook includes a timestamp, and basic information about the system used to access the site (details like operating system, browser, browser version, and screen height and width). The request also includes the time of day, and the referring URL.

Gateway Pundit and Facebook tracking ID

At this point, Facebook doesn't need much more to flesh out a device fingerprint that maps this ID to a specific device. And a superficial scan of multiple scripts loaded by domains affiliated with Facebook suggests that Facebook collects enough data to generate a device fingerprint, which would allow it to tie that more specific identifier to different cookie IDs over time.
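To make the idea of a device fingerprint concrete, here is a generic sketch. It is not Facebook's or Google's actual method; it assumes only the attributes mentioned above (operating system, browser, browser version, screen width and height). It hashes those attributes, then groups cookie IDs that share a fingerprint, which is the kind of linkage described in the previous paragraph. The attribute names and function names are hypothetical, and real fingerprinting uses many more signals, because identical devices with identical settings would collide here.

```python
import hashlib
from collections import defaultdict

# Assumed attribute names, read in a fixed order so dict ordering never matters.
FINGERPRINT_KEYS = ("os", "browser", "browser_version", "screen_width", "screen_height")


def device_fingerprint(attrs):
    """Hash normalized device attributes into a stable hex identifier."""
    parts = [f"{k}={str(attrs.get(k, '')).strip().lower()}" for k in FINGERPRINT_KEYS]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def link_cookie_ids(observations):
    """Group cookie IDs by fingerprint.

    observations: iterable of (cookie_id, attrs) pairs. Cookie IDs sharing a
    fingerprint are candidates for belonging to the same device, even if the
    cookie was cleared and a new ID was issued in between.
    """
    groups = defaultdict(set)
    for cookie_id, attrs in observations:
        groups[device_fingerprint(attrs)].add(cookie_id)
    return dict(groups)


# Placeholder observations: "old-id" and "new-id" come from the same device.
laptop = {"os": "Windows 10", "browser": "Firefox", "browser_version": "70.0",
          "screen_width": 1920, "screen_height": 1080}
phone = dict(laptop, os="Android", screen_width=412, screen_height=869)
for fp, ids in link_cookie_ids([("old-id", laptop), ("new-id", laptop), ("other", phone)]).items():
    print(fp[:12], sorted(ids))
```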

The cookie ID is consistently included in headers across multiple web sites. In the screenshot below, the cookie ID is included in a request on Breitbart:

Breitbart and Facebook tracking ID

And PJ Media:

PJ Media and Facebook tracking ID

And Mediaite:

Mediaite and Facebook tracking ID

And the Daily Beast:

Daily Beast and Facebook tracking ID

And the Federalist:

Federalist and Facebook tracking ID

Google:

Google (or more specifically, Doubleclick, which is owned by Google) works in much the same way as Facebook.

The initial Doubleclick cookie, with a unique value, gets set on the first site, Gateway Pundit. As with Facebook, this cookie is included in request headers on every site in this scan.

Gateway Pundit and Google tracking ID

Here, we see the same ID getting included in the header on PJ Media:

PJ Media and Google tracking ID

And on Breitbart:

Breitbart and Google cookie ID

As with Facebook, Google repeatedly gets browsing information, and information about the device doing the browsing. This information is tied to a common identifier across web sites, and this common identifier can be tied to a device fingerprint, which can be used to precisely identify individuals over time. The data collected by Facebook and Google in this scan include specific URLs accessed and patterns of activity across the different sites. Collectively, over time, this information provides a reasonably clear picture of a person's habits and interests. If this information is combined with other data sets, like search history from Google or group and interaction history from Facebook, we can begin to see how browsing patterns provide an additional facet that can be immensely revealing as part of a larger profile.
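As a rough sketch of why a shared identifier matters, the code below groups captured requests by tracker and cookie ID and orders them in time, assuming log entries have been reduced to (timestamp, tracker, cookie_id, referring_url) tuples. That tuple shape and the function names are hypothetical simplifications of the request headers shown in the screenshots, not any company's actual data format.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def browsing_trails(entries):
    """Group pages by (tracker, cookie_id), ordered by timestamp.

    The result is what a tracker receiving the same ID on every site could
    assemble: the time-ordered list of pages one browser visited.
    """
    trails = defaultdict(list)
    for timestamp, tracker, cookie_id, referrer in sorted(entries):
        trails[(tracker, cookie_id)].append((timestamp, referrer))
    return dict(trails)


def sites_seen(trail):
    """Distinct hostnames in a trail, in first-seen order."""
    seen = []
    for _, url in trail:
        host = urlsplit(url).hostname
        if host and host not in seen:
            seen.append(host)
    return seen


# Placeholder log entries; timestamps are seconds, cookie IDs are fake.
entries = [
    (30, "tracker-a", "id-123", "https://site-c.example/opinion"),
    (10, "tracker-a", "id-123", "https://site-a.example/"),
    (20, "tracker-a", "id-123", "https://site-b.example/story"),
    (15, "tracker-b", "id-999", "https://site-a.example/"),
]
for key, trail in browsing_trails(entries).items():
    print(key, sites_seen(trail))
```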

Conclusion, or Thoughts on Why this Matters

Political campaigns are becoming increasingly aggressive in how they track people and target them for outreach.

As has been demonstrated, it's not difficult to identify the location of specific individuals using even rudimentary adtech tools.

Given the opacity of the adtech industry, it can be difficult to detect and punish fraudulent behavior -- such as what happened with comScore, an adtech service used in 19 of the 25 sites scanned.

As social media platforms -- who are also adtech vendors and data brokers -- flail and fail to figure out their role, the ability both to amplify questionable content and to target people using existing adtech services provides powerful opportunities to influence people who might be prone to a nudge. This is the promise of advertising, both political and consumer, and the tools for one are readily adaptable for the other.

Adtech both profits from and extends information asymmetry. The companies that act as data brokers and adtech vendors know far more about us than we do about them. Web sites pushing misinformation, and the people behind these sites, can use this stacked deck to underwrite, and potentially profit from, misinformation.

Adtech in its current form should be understood as a parasite on the news industry. When mainstream news sites throw money and data into the hands of adtech companies that also support their clear enemies, mainstream sites are actively undermining their long term interests.

Conversely, though, the adtech companies that currently profit from the spread of misinformation, and the targeting of those who are most susceptible to it, are sitting on the dataset that could help counter misinformation. The same patterns that are used to target ads and analyze individuals susceptible to those ads could be put to use to better understand -- and dismantle -- the misinformation ecosystem. And the crazy thing, and a thing that could provide hope: all it would take is one reasonably sized company to take this on.

If one company decided that, finally, enough is enough, it could theoretically work with researchers to develop an ethical framework that would allow for a comprehensive analysis of the sites that are central to spreading specific types of misinformation. While companies like Google, Facebook, Amazon, AppNexus, MediaMath, the Trade Desk, comScore, or Twitter have shown no inclination to tackle this systematically, countless smaller companies would potentially have datasets that are more than complete enough to support detailed insights.

Misinformation campaigns are happening now, across multiple platforms, across multiple countries. The reasons driving these campaigns vary, but the tradecraft used in these campaigns has overlaps. While adtech currently supports people spreading misinformation, it doesn't need to be this way. The same data that are used to target individuals could be used to counter misinformation, and make it more difficult to profit from spreading lies.

Apple and Chromebooks in Education, and zzzzzz

I read this story with some interest. It's largely a press release in interview form, but at the end of the article the interview includes a question about Apple's position in the education market. To be clear, the closing paragraphs on education -- and an accompanying discussion on a listserv where I lurk -- are what motivated this post. This excerpt in particular stood out:

Yet Chromebooks don't do that. Chromebooks have gotten to the classroom because, frankly, they're cheap testing tools for required testing. If all you want to do is test kids, well, maybe a cheap notebook will do that. But they're not going to succeed.

My initial thoughts after reading this story haven't changed much. I'll share those thoughts in a little bit, but first I want to be clear about the limits of my perspective, which also provides insight into my bias. I have worked in schools, but am not currently employed by a school or district. When I was working in schools, I was responsible for several different laptop and 1:1 initiatives supporting teachers and students. But this was back in the late '90s and early aughts, so a while ago.

Currently, I have the good fortune to have both professional contacts and friends in schools and districts across the country, so over the years I've been able to learn directly from people who run and manage anywhere from several hundred to several thousand to tens of thousands to over 100K machines. These machines have varied from Chromebooks to iPads to Windows machines to (yes, for real) even some Linux boxes. The larger deployments, and even a lot of the smaller deployments, mix multiple types of devices.

But this gets back to my original reaction after reading the Apple exec's quotes in the article about Chromebooks and how their usefulness for testing helped drive adoption: he's not wrong, but the kernel of accuracy in his statement is completely irrelevant.

More importantly, the fact that he is able to repeat Apple's highly flawed marketing copy doesn't make Chromebooks any better or any worse.

The big edtech players are most loyal to their marketing copy, their talking points, and moving product. There are many excellent individuals working in these companies who care about education, but even these folks will acknowledge that the needs of the company will generally win out.

Apple is getting its lunch eaten by Google in education for a lot of reasons - and the fact that Chromebooks are good at supporting standardized testing is a small part of the conversation. But we should not kid ourselves either: the ease of managing Chromebooks compared to the relative complexity of managing different Apple devices is also a factor (See also: why Firefox struggles for a greater share of use in schools). Chromebooks are definitely easier for the adults, and that can translate into kids having more access to more devices more of the time.

But, of course, using a Chromebook means committing to multiple other parts of a larger ecosystem. Chromebooks shouldn't be understood just as a device; they need to be understood as a hardware product that is simultaneously an onboarding path into that ecosystem and a loss leader, and one that is easy for adults to administer and hand over to kids in schools.

So, yeah: when I read the article, my reaction was more than a little bit of facepalm. But not because he was wrong about education; that's pretty normal for just about any tech exec talking about education. My facepalm is rooted in the fact that we are still arguing over the merits of one device over another. To be blunt: at the end of the day, show me the evidence base -- real, actual peer reviewed and replicable science -- that shows that Chromebooks or Windows devices or iPads or MacBooks matter, or that they matter more than a school environment where kids aren't shamed for lunch debt or living in fear of being shot.

PS: Semi-related: if AirPods were spun off as a standalone company, they could become the 32nd largest company in the US. So let's talk about which companies are putting sub-$300 devices in schools.

Misinformation in Conversations about Tulsi Gabbard

On the weekend of 19 October 2019, I pulled just over 1.2 million tweets from Twitter that covered conversation about Tulsi Gabbard. These posts cover a time period from 10 October 2019 to 19 October 2019. To understate things, there is a lot of noise in this data, and multiple conflicting stories. This dataset contains accounts that average hundreds of posts a day and that almost certainly belong to real people. It also contains accounts that are highly active and look like rented or borrowed accounts from real people. Still other accounts appear to be trolls or sockpuppets engaged in artificial manipulation of the conversation. Telling these different accounts apart can be complicated: the behavior of a true believer can look a lot like the behavior of a bot or a sockpuppet, and a useful idiot can look a lot like a professional troll.

The narratives in the dataset also cover a lot of ground. The time period covers the Democratic debate, and the conversation around Hillary Clinton's remarks about Jill Stein, Russian assets, and Tulsi Gabbard's comments calling Hillary Clinton a "warmonger." Multiple conspiracy narratives get rehashed in this data, including how the 2016 campaign was "rigged" by the DNC so Hillary Clinton could win (as a side note, Hillary Clinton is truly conspiracy catnip; so many conspiracy theories -- from Seth Rich, to Pizzagate, to Epstein, to Benghazi, to the DNC server -- have been spread around Hillary Clinton that the mere mention of her name allows a host of conspiracy theorists to draw "connections" and to document those "connections" with links to multiple web sites and YouTube videos). The conversation also included Yang supporters, Sanders supporters, Marianne Williamson supporters, Trump supporters, and adherents of the various Q conspiracy narratives, to name a few.

In this post, for reasons of brevity and coherence, I will limit myself to inorganic accounts and the narratives (and potential networks) they represent. This constraint leaves a lot out; for example, a scan of the popular YouTube shares shows (as usual) a skew towards outright misinformation or heavily slanted sources.

Accounts highlighted in this post meet multiple criteria to merit inclusion, and they also represent narratives that are prevalent in the larger dataset. The other reason this writeup focuses on accounts is that the dataset contains multiple examples of what appears to be inorganic activity aimed at manipulating the conversation online.

Key Highlights:

  • The conversations about Tulsi Gabbard on Twitter contain thousands of accounts that exhibit traits of inorganic behavior.
  • Multiple different actors are represented in the dataset. Gabbard -- as a subject of conversation -- has been used in different ways to promote and amplify different narratives. This post contains several examples of representative accounts.
  • The divisiveness of Gabbard within the Democratic party makes her an attractive subject for actors looking to pollute conversations.

Summary:

At least 1,238,124 posts were shared by 342,894 accounts. 1858 accounts, or 0.54% of all accounts active in this spike, created 10% of the content. This imbalance of activity -- where under 1% of participating accounts create more than 10% of all content -- is comparable to, but somewhat more pronounced than, other spikes. This potentially indicates that the most active accounts in this dataset are more active than normal. While high posting rates can be an indicator of trolls or sockpuppets dominating the conversation, posting rates alone are not an adequate indicator of inorganic activity or artificial amplification of narratives.
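The concentration figure above (a small share of accounts producing 10% of posts) can be computed by sorting accounts by post count and walking down the list until the cumulative total reaches the target. This is a minimal sketch, assuming per-account post counts have already been tallied into a dict; the function name `top_share` and the toy counts are hypothetical illustrations, not the pipeline used for this analysis.

```python
from typing import Dict, Tuple


def top_share(post_counts: Dict[str, int], content_share: float = 0.10) -> Tuple[int, float]:
    """Return (accounts needed, fraction of all accounts) for the most active
    accounts to produce at least `content_share` of all posts."""
    if not post_counts:
        return 0, 0.0
    target = content_share * sum(post_counts.values())
    running = 0
    needed = 0
    # Walk accounts from most to least active until the target is reached.
    for count in sorted(post_counts.values(), reverse=True):
        if running >= target:
            break
        running += count
        needed += 1
    return needed, needed / len(post_counts)


# Hypothetical toy data: two very active accounts and 98 light posters.
counts = {"acct_a": 60, "acct_b": 40}
counts.update({f"user_{i}": 10 for i in range(98)})
needed, fraction = top_share(counts)
print(needed, f"{fraction:.2%}")  # 3 3.00%
```

For the Gabbard data, the same ratio is 1858 / 342,894, or about 0.54% of accounts.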

Out of all of the accounts that posted about Tulsi Gabbard between 10 October and 19 October, 33,354 (9.7% of all accounts) were created since January 1, 2019. Out of these fresh accounts, 2397 averaged 100 or more posts a day, and 643 averaged 200 or more posts a day. To put this in perspective, to share 100 posts a day, a person would need to average just over 8 posts an hour over 12 hours, every day. To average 200 posts a day, a person would need to average just under 17 posts an hour for 12 hours, every day.
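The posts-per-hour conversions used throughout this post are simple division over an assumed 12-hour posting day. The sketch below also shows one way an account's daily average could be derived from its lifetime post total and creation date; that derivation is an assumption (the post does not say exactly how averages were computed), and both function names are hypothetical.

```python
from datetime import date


def average_posts_per_day(total_posts: int, created: date, collected: date) -> float:
    """Average daily posts over an account's lifetime (assumed method)."""
    # Count the creation day itself, so an account created on the collection day has age 1.
    age_days = (collected - created).days + 1
    return total_posts / age_days


def posts_per_hour(posts_per_day: float, active_hours: float = 12) -> float:
    """Hourly rate one person would need to sustain a given daily volume."""
    return posts_per_day / active_hours


print(round(posts_per_hour(100), 2))  # 8.33
print(round(posts_per_hour(200), 2))  # 16.67
```

The same conversion gives the figures used for individual accounts later in the post: 288 / 12 = 24, 817 / 12 is about 68, and 158 / 12 is about 13.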

Within the new accounts posting about Tulsi Gabbard, profile descriptions contained hashtags from across the political spectrum. While the hashtags used in profiles leaned toward hashtags associated with President Trump, right wing politics, and the Qanon conspiracy theory, several Democratic campaigns were also represented within the dataset.

Looking at the top 15 most popular hashtags in profiles of new accounts, we get a rough breakdown across the ideological spectrum:

  • 8 Trump/Right/Q related hashtags - 3779 total hashtags
  • 3 Yang related hashtags - 1192 total hashtags
  • 1 Sanders related hashtag - 267 total hashtags
  • 3 Left/General Progressive related hashtags - 1158 total hashtags

The counts for the top 15 hashtags (hashtag, count, category) are listed below:

  • maga 1322 Right
  • kag 772 Right
  • resist 738 Progressive
  • yanggang 695 Yang
  • trump2020 544 Right
  • wwg1wga 401 Right
  • yang2020 278 Yang
  • 2a 272 Right
  • bernie2020 267 Sanders
  • resistance 225 Progressive
  • humanityfirst 219 Yang
  • fbr 195 Progressive
  • kag2020 174 Right
  • qanon 167 Right
  • trump 127 Right
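The category breakdown above is a straightforward tally of this top-15 list. A short sketch that recomputes it from the listed counts; the category labels are the ones used in this post, and the dict layout is just one convenient choice.

```python
from collections import Counter

# Top-15 profile hashtags among new accounts, as listed above:
# hashtag -> (count, category label used in this post).
TOP_HASHTAGS = {
    "maga": (1322, "Right"),
    "kag": (772, "Right"),
    "resist": (738, "Progressive"),
    "yanggang": (695, "Yang"),
    "trump2020": (544, "Right"),
    "wwg1wga": (401, "Right"),
    "yang2020": (278, "Yang"),
    "2a": (272, "Right"),
    "bernie2020": (267, "Sanders"),
    "resistance": (225, "Progressive"),
    "humanityfirst": (219, "Yang"),
    "fbr": (195, "Progressive"),
    "kag2020": (174, "Right"),
    "qanon": (167, "Right"),
    "trump": (127, "Right"),
}


def tally_by_category(hashtags):
    """Return {category: (number of distinct hashtags, total uses)}."""
    distinct = Counter()
    totals = Counter()
    for count, category in hashtags.values():
        distinct[category] += 1
        totals[category] += count
    return {cat: (distinct[cat], totals[cat]) for cat in totals}


for category, (n, total) in tally_by_category(TOP_HASHTAGS).items():
    print(f"{category}: {n} hashtags, {total} total")
```

Run against the listed counts, this reproduces the breakdown: 8 Right hashtags with 3779 uses, 3 Yang hashtags with 1192, 1 Sanders hashtag with 267, and 3 Progressive hashtags with 1158.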

Representative Accounts

In this writeup, I am highlighting multiple representative accounts. These accounts are examples of types of accounts that are present in large numbers in the data set.

For an account to merit inclusion in this writeup, the account needs to meet several or all of the following criteria:

  • new account, and/or
  • highly active in general, and/or
  • active on a developing news story, and/or
  • collecting large numbers of followers quickly, and/or
  • has a follower/following ratio close to 1, and/or
  • no personally identifying info available, and/or
  • profile picture or background is sourced from stock photos or is clearly not the account holder, and/or
  • the account posts in multiple languages and across multiple highly disparate topics, and/or
  • shares links to sources that are known to be unreliable, and/or
  • other images used as background or in timeline are misrepresentations, and/or have been used in past misinfo efforts.

While these traits are used specifically to evaluate accounts on Twitter, these criteria can be adapted for use on other platforms.
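Because the criteria are combined with "and/or", one way to operationalize them is to count how many red flags an account raises and review the accounts with the most. This is a minimal sketch under stated assumptions: the `Account` fields and `red_flags` helper are hypothetical; the January 1, 2019 cutoff and the 100-posts-a-day threshold echo figures used in this post; and the 0.9 to 1.1 ratio band is an arbitrary illustrative choice. Criteria that need human judgment (stolen photos, misrepresented images, unreliable links) are recorded as manual flags rather than computed.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Set, Tuple


@dataclass
class Account:
    created: date
    posts_per_day: float
    followers: int
    following: int
    # Judgment calls recorded by a human reviewer, e.g. "stock_photo".
    manual_flags: Set[str] = field(default_factory=set)


def red_flags(acct: Account,
              new_since: date = date(2019, 1, 1),
              active_threshold: float = 100.0,
              ratio_band: Tuple[float, float] = (0.9, 1.1)) -> Set[str]:
    """Return the names of the criteria this account meets."""
    flags = set(acct.manual_flags)
    if acct.created >= new_since:
        flags.add("new_account")
    if acct.posts_per_day >= active_threshold:
        flags.add("highly_active")
    # Follower/following ratio close to 1 (the band is a hypothetical choice).
    low, high = ratio_band
    if acct.following > 0 and low <= acct.followers / acct.following <= high:
        flags.add("ratio_near_1")
    return flags


# Figures from the progressive sockpuppet described later in this post.
example = Account(date(2019, 2, 15), 158, 9442, 9310, {"stock_photo"})
print(sorted(red_flags(example)))
```

Counting flags, rather than requiring all of them, matches the "and/or" framing: an account is surfaced for manual review when several traits line up, not when any single one appears.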

Account Type # 1: Yang Supporter

The representative account of a Yang supporter calls for Sanders to support Tulsi to back up Andrew Yang and Marianne Williamson. Multiple accounts attempted to create the connection between Sanders, Yang, Gabbard, and Williamson as people connected in their fight against a "rigged" system. This narrative works in several ways: it uses the larger visibility of Bernie Sanders to get visibility for less popular and visible candidates like Yang, Gabbard, and Williamson; and it also plays on the subtext of a "rigged system", and this subtext still resonates with a subset of Sanders supporters upset about the 2016 Democratic primary. These narratives are divisive, provide visibility for fringe candidates that could conceivably siphon votes away from a Democratic candidate in a general election, and use past history as a foundation for current narratives -- and the divisiveness of the narrative makes it a great tool for misinformation.

Chen/Gabbard

This account uses the name of James Chen, a "US Army Soldier." It was created in September 2019, and has been fairly inactive except during this spike in conversation.

Profile screengrab

However, this account uses the picture of Danny Chen, a soldier who died in Afghanistan in 2011.

News story

Account Type # 2: Right Wing/Q

This Q account opportunistically uses the conflict between Gabbard and Clinton to push multiple narratives and conspiracy theories. First, the account touches on a favored talking point held in common between Q and the far right, which claims that the Russian intervention in the 2016 election never happened, and that claims of Russian intervention are somehow part of a conspiracy designed to hurt President Trump. Second, this account shares links to ZeroHedge, a far right conspiracy site. Third, this account injects Q-related talking points into the conversation about Gabbard. All of these actions work to introduce the conspiracy theories and language of Q into the conversation about Gabbard.

Multiple Q posts

On its own, this might not seem especially significant - after all, the vast majority of people will be unaffected by these messages. However, misinformation or disinformation does not need to be universally successful to be effective. Success for some forms of misinformation involves normalizing fringe theories to make them more palatable to the mainstream, and eventually expanding the reach of a fringe message to people who might be receptive to it. This account has over 83,000 followers, which ensures that posts shared by this account (and similar accounts) are broadly distributed.

Account profile

This account uses a profile picture of Seth Rich, who has been at the center of conspiracy theories pushed by Sean Hannity on Fox, among others. The background picture uses images of Pepe, an image that was co-opted by right wing trolls. These images should be understood both as dog whistles to other conspiracy theorists, and potentially as propaganda for people not familiar with the conspiracy theories.

Also, this point needs to be emphasized: accounts highlighted here are representative. They are included not because they are singular examples; they are included because there are multiple accounts that fit this profile. For example, if we only look at new accounts (created after January 2019), over 100 have Q-related hashtags in their profile and average over 100 posts a day. The screencap below is from an account that was created in September 2019, and averages over 290 posts a day.

Second Q profile

As is suggested by the profile picture, the content of this feed is consistently racist, and amplifies conspiracy theories. These two Q-related accounts are two of many accounts active in the conversation about Tulsi Gabbard that fit this profile.

Account Type # 3: International Focus

The dataset also includes multiple accounts that have a more international focus. These accounts use profile pictures drawn from non-US sources, post about topics related to international politics, post in multiple languages, and amplify stories from state run or controlled media.

International Account 1:

In the screenshot below, we see an account sharing content in three languages (English, Turkish, and Urdu), sharing content from the Government of Punjab, and RT, a site controlled by the Russian government. The RT article is in English, and discusses Brexit.

Multiple posts

The account posted 17 times about Tulsi Gabbard, including the article shown below, which asks a rhetorical question about whether Indian nationalists are collaborating with Russia to interfere in US elections.

Gabbard retweet

The account sharing these posts was created on May 30. Since it has been live, the account has averaged 288 posts a day. To put this in context, a person would need to average 24 posts an hour for 12 hours a day, every day, to create 288 posts daily.

International Account 2:

The next account we will look at was created on October 8, 2019, and has averaged 321 posts a day since it was created. The account shared multiple posts about Gabbard, including these two that float the idea of Gabbard as a third party candidate, and that highlight the conflict between Clinton and Gabbard.

Gabbard-related posts

A scan of this account's timeline shows that it also shares content more broadly related to US politics.

US politics

However, the account has broad interests outside of US politics. For example, it shares content from the Pakistani government, and BBC Urdu.

multiple languages

The account also shares content, in Russian, from the Russian government.

Russian language content

As an added bonus, the account also promotes pornography.

Porn spam

It is unclear who controls this account, but despite the English language content it shares, the account does not seem to be run by a person fluent in English.

Posts in English

International Account 3:

The next representative account was created on September 28, 2019. Since creation, it has averaged 817 posts a day. To put this in human terms, a person would need to share 68 times an hour for 12 hours a day, every day, to keep up this posting volume. In this dataset, this account posted 24 times about Tulsi Gabbard, generally highlighting posts from known progressive accounts.

Sockpuppet retweets

A review of the images used for the account profile highlights some red flags.

First, the profile image is stolen from an image of a model.

Twitter Profile:

Sockpuppet background on Twitter

Original image source:

The background image is a modified version of an image stolen from a video game.

Original background

From a manual review of the timeline, the account shows interaction patterns that could suggest future attempts at source hacking. In the screenshot below, the account replies to Lindsey Graham.

sockpuppet interactions

Account Type # 4: Progressive Sockpuppet

The dataset of accounts posting about Tulsi Gabbard includes progressive accounts that show several traits of being sockpuppets, and/or being involved in inorganic amplification of content.

The account pictured below posted at least 13 times about Tulsi Gabbard in the dataset. The account was created on February 15, 2019, and since that time has averaged 158 posts a day. While this is far from the most active account in this dataset, in human terms this would require 13 posts an hour for 12 hours a day, every day.

Progressive sockpuppet profile

In recent conversations about Tulsi Gabbard, posts from this account use language that shows consistent support for Hillary Clinton. In general, posts from this account highlight standard centrist Democratic talking points.

posts from progressive sockpuppet

A look at the user profile shows some red flags.

Progressive sockpuppet profile

Despite the relative newness of the account, it has managed to amass over 9000 followers, and the follower to following ratio is pretty close to 1:1 (9442 followers, 9310 following when the data were collected). Additionally, the user profile picture of the account is pulled from a stock photo titled, "Portrait of confident blonde man with arms crossed."

Progressive stock photo

Conclusions

As noted in the beginning of this post, over 33,000 new accounts (created since 1 January 2019) participated in the conversation about Tulsi Gabbard between 10 October 2019 and 19 October 2019. Many of these accounts are legitimate new users, but a subset of these accounts -- marked by high posting rates, stolen profile images, and/or other traits of a suspect account -- shows signs of inorganic activity. In addition, numerous other accounts created before January 2019 were also highly active in the conversation, and a subset of these accounts shows multiple characteristics of inorganic amplification of various narratives.

One detail that stood out about the Gabbard dataset was the number of inorganic accounts, the methods used to create these account personas, and the variety of narratives spun by these inorganic accounts. Within the dataset, we had sockpuppets pushing libertarian ideals, mainstream Democratic positions, hardcore right wing positions, and conspiracy theories. We also had accounts with an international focus posting in multiple languages about political issues in Pakistan, India, Scotland, and Britain, using state media and web sites associated with past misinformation campaigns.

The methods used to create these accounts ranged from stolen identities, to profile images and background images that serve as dog whistles (such as Pepe the Frog or images of Seth Rich), to tactics used by romance scammers. The range of narratives, the recognized tactics used to create fake accounts in bulk, and the vast amount of conversation coming from inorganic accounts together suggest that some level of coordination between subsets of these accounts is likely. However, this analysis stops short of identifying specific actors in specific networks.

With all that said, Tulsi Gabbard clearly remains a useful prop for misinformation and disinformation. Her divisiveness within the Democratic party makes her very useful for right wing propaganda efforts. Her positions on Syria and her lack of criticism of Bashar al-Assad ensure that she can be used by fringe sites, and the accounts that promote them, to generate ongoing attention. Her past appearances on Russian state media and her willingness to appear on Tucker Carlson's show ensure that she will draw clicks from the right looking to criticize Democrats, from the left looking to criticize Gabbard, and from the anti-Hillary Clinton wing of the Democrats/Progressives who still cling to the belief that the DNC "stole" the nomination from Bernie Sanders. None of this implies or indicates that Gabbard or her campaign has any role in the inorganic amplification of the narratives about her. However, Gabbard does benefit from the visibility these campaigns bring.

The point of misinformation isn't to "win" an argument. Misinformation works by shaving off the margins; sometimes visibility is the end goal, because visibility is all that's required to pollute the conversation. The chatter about Gabbard, and the inorganic activity from fake accounts in the middle of this chatter, indicates that Gabbard's presence in the Democratic primary provides a useful tool that supports multiple narratives with different goals.

The Limits of Reverse Image Search, Demonstrated with Google and Alison Brie

2 min read

Reverse image search is a useful tool for detecting fake profile pictures or identifying reused images.

But, like any tool, effective use requires a clear understanding of the limits of the tool.

Have you ever seen a user profile on social media that has a slightly blurred image caused by some "artistic" filter (like many of the filters provided by Instagram, Snapchat, or other image-centric social media sites)? Maybe it's art, maybe it's obfuscation.

To demonstrate how reverse image search can provide mixed results, I'll start with a picture of a working actor, Alison Brie. This picture is from an interview on a late night show.

Alison Brie in an interview

This is a closeup pulled from the image via a screengrab.

Using an unaltered closeup, the image is detected in a small number of matching pages - but the original page from IMDB is not among the matching pages.

Limited results with original closeup

Then, I added a glow effect to the image -- this is a common visual effect used on profile images.

Image with slight glow

This image returned no matches at all.

No results with glow

Conclusion

When evaluating accounts on social media sites, reverse image search is a useful tool, but be aware that it's neither precise nor fully accurate. If a profile pic is a closeup and/or subtly blurred, that can be an indication that the profile image was lifted from another source and modified to avoid detection.

Two notes unrelated to misinformation work, but still worth mentioning:

First, this also highlights the extreme limitations of machine learning and AI, especially across large datasets. The machines are easily fooled.

Second, Google, what is up with your generic descriptions of images? The image in question shows a woman in her 30s. The descriptive text related to "girl" is cringeworthy. Do better.

From a Rumor to the President: Transferring Misinformation in Five Steps

4 min read

UPDATE, October 1, 2019: the Inspector General of the Intelligence Community has taken the unusual step of issuing a statement that says that its process has not been altered. In other words, the ICIG is saying that the misinformation repeated by right wing media, and the president, is not true. END UPDATE

Studying and documenting misinformation can often be abstract and indirect. However, the last week has provided one of the clearest examples of misinformation traveling from a rumor to -- unfortunately -- being repeated as fact by news organizations and government officials. The whole process only took seven days; in this post I break down how the lie was introduced, how it was whitewashed, and how it was amplified.

The short version of this story: a person on Twitter made a post that laid out a false claim on September 23rd. The false claim was amplified in another Twitter thread on September 27th. The content of these two threads was then repackaged into an article on a far right web site. From there, the misinformation spread through multiple other right wing misinformation sites, into right wing pundits, to Fox News, and from there to the President's Twitter feed.

This piece from Kevin Poulsen goes into detail about why the claim is false, and about some of the people who shared the false claim.

The lie that was spread and amplified via misinformation was that the requirements for whistleblowers were mysteriously changed.

Step 1

The misinformation about the "changed requirements" for whistleblowers first appeared in this Twitter thread:

First twitter thread

I'm not linking to the thread to avoid giving it additional visibility. Throughout this post, I will not link directly to any of the misinformation pieces referenced; all of them are available on archive.org, and I recommend viewing them there rather than sending traffic to the sites spreading misinformation.

This thread misrepresents the language of forms to suggest that the standards for whistleblower complaints were secretly changed. This is, of course, not true, but this lie serves two purposes: first, it is part of the larger multifaceted effort to discredit the whistleblower; and second, it is a dog whistle to the conspiracy theorists who have been pushing the lie that there is a "deep state" that is actively trying to combat Donald Trump.

Step 2

On September 27, a second thread amplified the misinformation of the first thread. This thread looks at the creation date of the PDFs and the revision date on another document, and then launches into a 20+ post thread that uses another standard technique of conspiracy theorists: stacking multiple tangentially related (and largely unsupported) claims in a mammoth thread. The desired effect of juxtaposing numerous unrelated spurious claims is to create a vague sense that "something is wrong," paired with the cognitive overload that comes with trying to sort out why any individual claim is relevant. This technique should be understood as a form of DDoS attack on critical thinking.

Second twitter thread

Again, for a review of why these claims about the whistleblower are unfounded, please see the article from Kevin Poulsen, re-linked here for convenience.

Step 3

The misinformation contained in the Twitter threads summarized above was largely repackaged into a story posted on The Federalist. The Federalist story doesn't contain much new information; it largely regurgitates the screenshots and claims originally shared on Twitter, and adds some of the talking points that the White House mistakenly emailed out earlier in the week.

Federalist

The Federalist story, however, attracted attention within the right.

Nunes tweet

Step 4

On Sunday, Fox News invited a right wing pundit on their Sunday Morning "news" program. In this segment, the right wing pundit repeats the lie that the whistleblower requirements were changed. Notably, this false claim was not disputed or checked; for viewers of Fox News, this piece of misinformation has now been presented as reality.

Mediaite - ugh

This piece was shared on Mediaite, a left leaning site that is also known for misinformation and partisan spin. Despite the source's unreliability and its left leaning bias, the president used it to congratulate the right wing pundit who amplified the lie about the requirements for whistleblowers.

DJT noticing Fox News clip

Step 5

And finally, earlier today (Monday, September 30th), the personal account of the president of the United States referenced this false claim on his Twitter feed.

Trump question that amplifies misinformation

Conclusion

It's very unusual to have such a clear and concrete example of how misinformation travels from an early rumor into channels that, ideally, would be more reliable. However, in this example, we can trace the migration of false information directly from a Twitter thread to the President.

Conspiracy Theories, Misinformation, Romance Scamming, White Nationalism, Twitter, and YouTube

12 min read

Trigger warning: this post discusses misinformation related to the suicide of a person implicated in sex trafficking of minors. While it does not go into detail on any of these topics, these topics are part of the subtext.

1. Summary

The dataset used in this analysis was pulled from two hashtags that trended on Twitter the morning Jeffrey Epstein's body was found in his cell. By virtue of Twitter surfacing these hashtags as trending, countless people were exposed to lies, misinformation, and conspiracy theories related to the suicide of Epstein.

  • Accounts that were looking to spread conspiracy theories about Epstein's death had a ready supply of web sites and video to support these theories.
  • Multiple videos spreading conspiracy theories were created within hours; these videos were shared hundreds of times shortly after the story broke, and were collectively viewed millions of times in the thirty days after Epstein's suicide.
  • YouTube provides a convenient platform for spreading misinformation in real time. 10 of the top 17 videos shared were created on August 10th.
  • Twitter remains an effective medium for amplifying misinformation and hate sites over factual reporting. Links to the Gateway Pundit were shared 720 times, compared to 417 shares to CNN.
  • Out of the 7000 most active accounts, over a thousand showed two or more signs of potential inorganic activity. Manual spot checks of these accounts highlight that accounts are using stolen images or stock images for their profiles, and in their feeds.

2. Introduction

The day that Jeffrey Epstein was found dead in his jail cell after his suicide, conspiracy theories and their related hashtags trended on Twitter. Two conspiracy theories were spread broadly: the first conspiracy theory claimed Epstein was murdered because he "knew too much;" and the second, related conspiracy theory claimed that the Clintons -- especially Hillary -- were directly involved. On August 10 - the day Epstein was found dead - these conspiracy theories were spread on Twitter so heavily that the hashtags "epsteinmurder" and "clintonbodycount" trended.

The posts used in this analysis were collected from Twitter in response to the search terms "epsteinmurder" or "clintonbodycount" on the day Epstein was found dead. Because the content was collected from terms that are directly related to conspiracy theories, we can expect to see more content from misinformation sites, and content from accounts that are actively engaged in spreading misinformation. This analysis examines domains shared to support the misinformation, YouTube videos shared, and selected accounts that stood out in this analysis.

3. Activity Overview

In the hours after Epstein's death, at least 312,623 posts were shared by 137,962 accounts. 947 accounts, or 0.69% of all accounts active in this spike, created 10% of the content. This imbalance of activity -- where under 1% of participating accounts create more than 10% of all content -- is in line with other recently observed spikes in activity.

Posts on "epsteinmurder" and "clintonbodycount"

4. Domain Shares

During this spike in conversation, the accounts that were active on these two hashtags associated with conspiracy theories shared links to approximately 400 domains a total of 6280 times.

Among the top 20 most shared domains, right-leaning to far right sites dominated, with 9 right wing domains shared 2515 times. In comparison, 7 different mainstream to left leaning to left wing sites were shared 928 times. For information on how sites are categorized, see this explanation.

All of the mainstream to left leaning to left wing sites were from news organizations. CNN was the most-shared mainstream site, with a total of 417 shares. In comparison, Gateway Pundit was shared 720 times, and links to the site of Ben Garrison, a cartoonist, were shared 1129 times.

Infowars and VDARE were each shared 98 times, putting them in a tie for the 8th most popular domain shared. Infowars has a long track record as a source of hate and misinformation, and VDARE is a white nationalist web site. In contrast, links to the New York Times were shared 41 times, making the Times the 24th most popular domain shared.

Expanding out to look at the top 50 most popular domains shared, right wing domains continue to dominate the conversation. Given the hashtags used to generate the dataset, this isn't surprising. However, the amount of content ready to be shared -- or created in response to the news of Epstein's death -- is worthy of note. Additionally, the ease with which this content spread on Twitter, and the way Twitter surfaced parallel bodies of content on other platforms, shows a depth to the far right misinformation universe that should not be overlooked.

In the top 50 most shared domains, 23 right leaning or far right domains were shared 2901 times. 12 mainstream to left leaning to far left domains were shared 1080 times. The full list of domains shared can be read below, and it includes multiple far right misinformation sites and hate sites.
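Domain counts like these come from reducing each shared link to its host name and tallying the results. This is a minimal sketch using only the Python standard library; it assumes links have already been expanded from Twitter's t.co shortener (expansion is not shown), and the helper names and sample URLs are placeholders.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Lower-cased host name with a leading 'www.' removed ('' if none)."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def count_domains(urls: Iterable[str]) -> Counter:
    """Count shares per domain, skipping links with no host."""
    return Counter(d for d in map(domain_of, urls) if d)


sample = [
    "https://www.cnn.com/some-story",
    "https://cnn.com/another-story",
    "https://www.example.org/page?id=1",
    "not a url",
]
for domain, shares in count_domains(sample).most_common():
    print(domain, shares)
```

Assigning each domain to a category (right wing, mainstream, left leaning, and so on) is a separate, manual step based on the categorization explanation referenced above; the code does not infer it.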

5. Videos and YouTube

Several video sharing sites -- Bitchute, Daily Motion, RedIce, and DLive -- all appeared in the top 50 domains shared. These sharing platforms were all used to distribute a small number of channels of far right content and/or conspiracy theories. While YouTube remains far and away the most popular platform used to share and disseminate hate speech and conspiracy theories, some people appear to be hedging their bets and using smaller platforms as a backup against the (highly unlikely) possibility that YouTube actually makes good on its years' worth of largely empty promises to address hate speech.

Links to YouTube videos were shared 808 times. 17 videos make up just under half (402) of all shares.
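Counting shares per video, rather than per domain, requires pulling the video ID out of the different URL forms YouTube uses. This sketch assumes links appear either as `youtube.com/watch?v=ID` or in the short `youtu.be/ID` form (other forms, such as embeds, are ignored); `youtube_id` and `top_n_share` are hypothetical helper names.

```python
from collections import Counter
from typing import Iterable, Optional
from urllib.parse import parse_qs, urlparse

WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def youtube_id(url: str) -> Optional[str]:
    """Video ID from a youtube.com/watch?v=... or youtu.be/... link, else None."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return parts.path.lstrip("/").split("/")[0] or None
    if host in WATCH_HOSTS and parts.path == "/watch":
        return parse_qs(parts.query).get("v", [None])[0]
    return None


def top_n_share(urls: Iterable[str], n: int) -> float:
    """Fraction of all video shares that point to the n most-shared videos."""
    counts = Counter(v for v in map(youtube_id, urls) if v)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(n)) / total
```

With the figures above, the top 17 videos account for 402 of 808 shares, about 49.8%, which is the "just under half" in the text.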

Only two of the top 17 videos were from mainstream to left leaning news sources (CBS and MSNBC). However, the content of both videos has been widely used in conspiracy theories. The CBS video is an outtake from a 2011 interview with Hillary Clinton when she was secretary of state; Clinton is on camera discussing the death of Libyan dictator Moammar Qaddafi. The MSNBC video is a more recent story showing photos of Trump and Epstein together at Mar-a-Lago in the early 90s.

10 of the top 17 videos were created on August 10th; as of late August, these new videos had 2,273,639 views. None of these videos were from mainstream news organizations. While YouTube continues to make promises about improving its platform, these efforts still allow questionable content to spread. Hate speech and misinformation thrive on YouTube, and given YouTube's track record, will likely continue to flourish for the foreseeable future.

6. Questionable Accounts Engaging in the Conversation

This section looks at questionable accounts that were active within the dataset using conspiracy-related hashtags. While this section examines account behavior that is often a trait of inorganic or artificial inputs that attempt to influence a conversation, the analysis does not examine whether the inorganic behavior on display within the dataset can be attributed to a network, or whether there is any level of coordination between accounts.

Additionally, while this section will highlight several cases that appear to be clear examples of inorganic activity, it is always healthy to remember that reality is complicated. While I am relatively comfortable that the examples I am highlighting show accounts engaging in inorganic behavior, precise attribution with 100% confidence is notoriously difficult. People are wonderful and strange; we do weird things all the time, so while an observation can appear likely when viewed within a dataset, reality can make a liar out of the best data.

Additionally, when looking at behavior within politically motivated and conspiracy-obsessed communities, the activities of a true believer can look a lot like the activity of a troll, bot, or sockpuppet. Because of this inherent uncertainty, I use screenshots of accounts, but I blank out their username and their account name.

All of this is a long way of saying that additional analysis beyond the scope of this writeup would be required to do attribution with a higher level of accuracy. The platforms themselves (in this case, Twitter) have the most complete raw data that would make a more thorough and more accurate analysis possible.

When looking at accounts within the dataset that engage in behavior that might be inorganic, I limited the analysis to the accounts that were in the 95th percentile or higher (as measured by post count) during the 12 hours of the spike.
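A minimal Python sketch of the 95th-percentile cut, assuming the spike has been reduced to a flat list with one account ID per post. The nearest-rank percentile definition used here is an assumption; the analysis above doesn't say which definition it used, and other definitions can move the cutoff slightly.

```python
import math
from collections import Counter


def top_percentile_accounts(tweet_authors, percentile=95):
    """Return accounts whose post count is at or above the given percentile.

    tweet_authors: one account ID per post collected during the spike.
    Uses the nearest-rank percentile definition (an assumption).
    """
    counts = Counter(tweet_authors)
    if not counts:
        return set()
    ranked = sorted(counts.values())
    # Nearest rank: the smallest post count such that `percentile`% of
    # accounts posted that many times or fewer.
    rank = math.ceil(percentile * len(ranked) / 100)
    threshold = ranked[max(rank - 1, 0)]
    return {account for account, n in counts.items() if n >= threshold}
```

With 20 toy accounts posting 1 through 20 times, the cutoff lands at 19 posts, so the two busiest accounts are returned.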

Among the 7455 most active accounts, 258 were created during June, July, or August 2019.

Among these 258 new accounts, many accounts post at an incredibly high rate, with some accounts averaging hundreds of posts a day, every day. To put this in perspective, for an account to post 50 times a day, that translates to roughly 6 posts an hour over an 8 hour day, or three posts an hour over 16 hours on social media, every day. Of the new accounts, 89 have been averaging 100 or more posts per day; 147 averaged 50 or more posts per day, and 57 have amassed 1000 or more followers in the short time (between 1 and 100 days) they have been active. While post volume alone is not an adequate measure of abnormal activity, accounts that meet the following criteria merit additional review:

  • new,
  • highly active in general,
  • highly active on a hashtag associated with a fringe conspiracy theory, and
  • collecting large numbers of followers quickly.
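The four criteria can be expressed as a simple screen. The sketch below is hypothetical throughout: the `Account` fields, the June 1, 2019 cutoff for "new", the 50-posts-a-day and 1000-follower thresholds (loosely taken from the figures above), and the per-hashtag threshold are all assumptions to tune. An account that passes is a candidate for manual review, not a verdict.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Account:
    handle: str
    created: date       # account creation date
    total_posts: int    # lifetime post count shown on the profile
    followers: int
    hashtag_posts: int  # posts in the spike that used the conspiracy hashtag(s)


# Hypothetical thresholds, loosely based on the figures in the text.
NEW_SINCE = date(2019, 6, 1)
MIN_POSTS_PER_DAY = 50
MIN_FOLLOWERS = 1000


def posts_per_day(acct, as_of):
    """Average posts per day since creation (at least one day, to avoid dividing by zero)."""
    days_active = max((as_of - acct.created).days, 1)
    return acct.total_posts / days_active


def merits_review(acct, as_of, hashtag_threshold):
    """True only if the account meets all four criteria listed above."""
    is_new = acct.created >= NEW_SINCE
    highly_active = posts_per_day(acct, as_of) >= MIN_POSTS_PER_DAY
    active_on_hashtag = acct.hashtag_posts >= hashtag_threshold
    many_followers = acct.followers >= MIN_FOLLOWERS
    return is_new and highly_active and active_on_hashtag and many_followers
```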

A spot check of accounts that meet these four criteria shows additional signs of illegitimacy, which suggests that at least a subset of these accounts are almost certainly illegitimate.

As an example, the account pictured below was created in mid-July.

Account page

Since its creation, the account has posted over 37,000 times (measured through September 10), averaging just over 600 posts a day. Throughout this time, the account has also maintained a close balance between the number of accounts following it and the number of accounts it follows, and it has just over 2800 followers.

The account's profile picture appears to be pulled from a profile for a user on the site "Rentmen" from 2015 and 2016.

Rentmen archive

Out of the 7455 most active accounts, 2140 averaged 50 or more posts a day, every day. 937 accounts averaged 100 or more posts a day. Out of the 2140 accounts that averaged 50 or more posts a day, 1102 accounts had roughly the same number of followers as accounts they followed, which is suggestive of follow-back schemes or of automated processes that balance the number of followers with the number of accounts followed. While a balanced follower ratio on its own can be explained by ordinary human behavior, when multiple indicators of inorganic behavior can be observed across a large group of accounts, additional review is warranted.
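The follower/following comparison can be sketched the same way. "Roughly the same number" isn't defined in the text, so the 10% tolerance below is a hypothetical choice. The functions take plain `(followers, following)` counts rather than calling any platform API.

```python
def balanced_follow_ratio(followers, following, tolerance=0.10):
    """True when the two counts differ by at most `tolerance` of the larger one."""
    larger = max(followers, following)
    if larger == 0:
        return False  # no connections at all: nothing meaningful to compare
    return abs(followers - following) / larger <= tolerance


def count_balanced(accounts, tolerance=0.10):
    """Count (followers, following) pairs that look balanced."""
    return sum(1 for followers, following in accounts
               if balanced_follow_ratio(followers, following, tolerance))
```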

Among the most active accounts, a quick spot check showed multiple accounts that, while not displaying the high post counts or close follower/following ratio described above, had other traits that could indicate that the accounts are not authentic.

Account page

For example, this account was actively discussing and amplifying conspiracy theories around Epstein's suicide, in addition to pushing content that supports other popular narratives among the far right.

Post attacking Ilhan Omar

However, the account's profile picture is a lightly photoshopped version of a photo of Maria Selena Nurcahya, a person from Indonesia who competed in the 2012 Miss Universe contest (http://beautypunkfashion.blogspot.com/2012/10/maria-selena-beauty-fashion-dress-miss.html).

MS site and image

Another example account that came up in a quick spot check is shown below.

Account page

This account promotes conservative books, and has amplified multiple hard right stories and conspiracy theories, including about Epstein's suicide. The account also makes statements about being in specific locations, and uses photos that suggest they are of the account holder.

Back in "Texas"

However, a quick search shows that the photo is commonly used by romance scammers - the picture is from a person who uses the performance name "Ann Angel."

Romance scammer photos

A second post states that the account holder just donated to the NRA.

Tweet claiming support for the NRA

This post reuses the Ann Angel picture highlighted above, and also incorporates a stock photo of a blond woman on a motorcycle.

Stock photo - woman on motorcycle

While there may be legitimate explanations for an account that posts conspiracy theories, has averaged nearly 70 posts a day for over 1000 days, amplifies multiple far right accounts and sites, and uses the same pictures as romance scammers, these behaviors are also strong signs that the account could be engaged in inorganic behavior. The use of tactics favored by romance scammers is especially telling.

The three manual spot checks described here show accounts that each have multiple signs of questionable behavior. Precise attribution, however, is very difficult; more importantly, the behavior of a fervent believer can look highly abnormal to a non-believer. But with all that said, taken collectively, the overall behavior and trend within this dataset is away from accuracy, and toward speculation that is founded in racism, paranoia, and misogyny.

7. Conclusion

Between the popularity of hate sites and misinformation sites relative to more accurate news sites, the complete dominance of far right content on YouTube and selected smaller video platforms, and questionable accounts amplifying these questionable sources, Twitter remains a central channel for spreading misinformation to a broad audience. As noted in the earlier writeup about conversational spikes about Ilhan Omar:

Over time, the right leaning to far right content creates an ever-growing foundation of sources that it can use to buttress arguments in future conversations. This ever-growing body of content provides a repository that reinforces a world view and a perspective. Conversations about specific issues become less about the individual issue, and more about proselytizing a world view and bringing people into the fold.

Social media platforms are all collecting the data that would help them address these problems more effectively. Twitter's follower recommendations are surprisingly accurate: when a person visits the account of a likely troll or romance scammer, the "other accounts" block is often filled with other trolls or scammers. YouTube's "recommended videos" list and Facebook's "related pages" blocks are equally (and surprisingly) accurate. Moreover, each of these services specializes in targeted ads based on accurate prediction of and insight into our interests. If we are to believe that their adtech isn't complete junk, then we should also expect that they could bring a comparable level of precision to addressing hate speech and misinformation on their platforms.

This analysis doesn't cover the sharing of misinformation on Facebook, Instagram, Reddit, or Snapchat. But even leaving those platforms out of the picture, given the relative ease with which Twitter can be used to get conspiracy theories trending, and the equal ease with which conspiracy theories can be distributed at no cost on YouTube, we are in for a long and ugly run-up to the 2020 election.

8. Domain sharing information

The top 50 domains shared on Twitter:

Researching Political Ads -- A Process, and an Example

13 min read

It might seem like the 2020 elections are a long way away (and in any sane democracy, they would be), but here in the US, we have a solid fourteen months of campaigning ahead of us. This means that we can look forward to fourteen months of attack ads, spurious claims, questionable information -- all of it amplified and spread via Facebook, YouTube, Instagram, Reddit, Snapchat, Telegram, Pinterest, and Twitter, to name a few.

In this post, I break down some steps that anyone can use to uncover how political ads or videos get created by looking at the organizations behind the ad.

The short version of this process:

  • Find organizations
  • Find people
  • Find addresses

Then, look for repetition and overlaps. Later in this post, I'll go into more detail about what these overlaps can potentially indicate.

Like all content I share on this blog, this post is released under a Creative Commons Attribution-NonCommercial-ShareAlike license. Please use and adapt the process described here; if you use it in any derivative work please link back to this page.

1. Steps

1. Find out who is running the ad. This can be found in multiple ways, including identifying information at the end of the ad or finding social media posts or a YouTube channel sharing the ad. If an ad or a piece of content cannot immediately be sourced, that's a sign that the content might not be reliable. It's worth highlighting, though, that the clear and obvious presence of a source doesn't mean the content is reliable; it just means that we know who created it.

2. Visit the web site of the organization or PAC running the ad, if they have one. Look for names of people involved in the project, and/or other partner orgs. While the lack of clear and obvious disclosure of the organization/people behind a site is a reason for concern, disclosure does not mean that the source is reliable or to be trusted. The organizational affiliation, or people behind the organization, should be understood as sources for additional research.

If, after steps 1 and 2, there is no clear sign of what people or organizations are behind the ad, that can indicate that the ad is pushing unreliable or false information.

3. Do a general search on the organization or PAC name. Note any distinct names and/or addresses that come up.

4. Do a search at the FEC web site for the PAC name. Note addresses, and names of key officials in the PAC.

5. Do a focused search on the exact addresses on the FEC web site. Be sure to include suite numbers.

6. Do a focused search on any key names on the FEC web site.

The point of these searches is to find repetition: shared staff across orgs, and a common address across orgs, can suggest coordination.

While follow-up research on organizations sharing a common address, or on staff shared across multiple orgs, would be needed to clarify the significance of any overlaps, this triage can help uncover signs of potential coordination between orgs that don't disclose their relationships.
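A minimal sketch of this triage step, assuming each org's addresses and named officials have been transcribed from the filings into plain Python dicts (a hypothetical format). It normalizes suite notation crudely and keeps only the addresses and people that show up for two or more orgs, which is the repetition this process looks for. Matching is string-based, so "St" vs "Street" or a misspelled name will slip through; treat the output as leads for follow-up research, not conclusions.

```python
import re
from collections import defaultdict


def normalize_address(addr):
    """Crude normalization so "Suite 111", "STE 111", and "# 111" compare equal."""
    a = addr.upper().replace(",", " ")
    a = re.sub(r"\b(?:SUITE|STE)\b\.?|#", " SUITE ", a)
    return " ".join(a.split())


def shared_attributes(filings):
    """Map each address or person to the orgs listing it, keeping only overlaps.

    filings: list of dicts such as
        {"org": "PAC A", "addresses": [...], "people": [...]}
    (hypothetical format, filled in by hand from the searches in steps 3 to 6).
    """
    orgs_by_key = defaultdict(set)
    for filing in filings:
        for addr in filing.get("addresses", []):
            orgs_by_key[("address", normalize_address(addr))].add(filing["org"])
        for person in filing.get("people", []):
            orgs_by_key[("person", " ".join(person.lower().split()))].add(filing["org"])
    # Repetition is the signal: keep only keys shared by two or more orgs.
    return {key: orgs for key, orgs in orgs_by_key.items() if len(orgs) >= 2}
```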

Searching Notes

a. When doing a general search on the web for a PAC name, start with DuckDuckGo and Startpage. Your initial search should put the organization name in quotes. If DuckDuckGo and Startpage don't get you results, then switch to Google. Because most organizations do white hat or black hat SEO with Google in mind, other search engines can often return better results for this kind of research.

b. When searching the FEC web site, you can often get good results without using the search interface of the FEC web site. To do this, use this structure for your search query:

  • "your precise search string" site:docquery.fec.gov or
  • "your precise search string" site:fec.gov.
  • when searching for an address, split the address and the suite number: "123 Main Street" AND 111 site:fec.gov. Using this syntax will return results at "123 Main Street, Suite 111" or "123 Main Street STE 111" or "123 Main Street # 111". 

I generally use docquery.fec.gov first, as that brings up results that are directly from the filings, but either will work.

Unlike general searches across the open web, searches of FEC filings are a case where Google will often return cleaner results than the FEC site's own search.
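The query patterns above are easy to generate consistently. A minimal sketch, assuming the resulting string is pasted into a search engine by hand. The DuckDuckGo `?q=` URL in `search_url` is an illustrative default, and both function names are hypothetical.

```python
from urllib.parse import quote_plus


def fec_query(phrase, suite=None, site="docquery.fec.gov"):
    """Build a site-restricted query in the format described above."""
    parts = [f'"{phrase}"']
    if suite is not None:
        # Keep the suite number outside the quotes so "Suite 111",
        # "STE 111", and "# 111" all match.
        parts += ["AND", str(suite)]
    parts.append(f"site:{site}")
    return " ".join(parts)


def search_url(query, base="https://duckduckgo.com/?q="):
    """URL-encode the query for a search engine (the base URL is an assumption)."""
    return base + quote_plus(query)


print(fec_query("123 Main Street", suite=111, site="fec.gov"))
# "123 Main Street" AND 111 site:fec.gov
```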

A note on names of companies and individuals

In this writeup, we will be discussing companies, political groups, politicians, and consultants. Generally, companies, political groups, and politicians will be named in the text of this writeup.

I have reviewed screenshots and obscured names, and email addresses that contained names, and in general individuals will not be named in this writeup. However, in some cases, the names of individuals will be accessible and visible via URLs shared in this document. This is a decision that I struggled with, and am still not 100% okay with, but it's hard to show the process without showing any potentially identifiable information.

I am not comfortable naming people, even when their names are readily available in the public record via multiple sources (and to be clear, all of the information described here is from publicly available documents). The fact that a person's name can be found via public records doesn't justify pulling a name from a public record and blasting it out. In this specific case, in this specific writeup, I made an intentional effort to not include the names in screenshots or in text. This provides some incremental protection (the names won't be visible in this piece via search, for example), while still providing some clear and comprehensible instructions so that anyone can do similar research on their own.

But for those of you doing this research on your own: do not be irresponsible with people's names and identities. Naming people can put a target on them, and that is just not right.

And if you use my work to target a person, you are engaging in reprehensible behavior. Stop. Conducting real research means that you will see real information. If you lack the moral and ethical character to use what you learn responsibly, you have no business being here.

2. Using the steps to analyze an ad

To show how to use this process, I will use the attack ad levelled at Representative Ocasio-Cortez that aired during a recent debate among Democratic presidential candidates. Representative Ocasio-Cortez responded to the ad on Twitter:

AOC on attack ad

To start, OpenSecrets has a breakdown of the major funders of the PAC behind the ad. The writeup here doesn't look at the funders; instead, it goes into more detail about the PAC and ways of researching it. If you are looking for information about specific funders, the post at OpenSecrets is for you.

Step 1: Who is running the ad

In this instance, the group behind the ad is pretty simple to find. A person connected to the group quote tweeted Representative Ocasio-Cortez:

Response to AOC

This leads us to the web site for New Faces GOP PAC at newfacespac.com.

Step 2: Visit the web site

The site features information about Elizabeth Heng, who lost the 2018 Congressional race for California District 16. It also features the ad that attacks Representative Ocasio-Cortez.

The site also includes a donation form, and at the bottom of the form we can see a small piece of text: "Secured by Anedot." This text gives us a bookmark.

Anedot embedded form

Many forms on political sites are embedded from third party services, and if we look at the original form we can often get useful information. To find the location of this form, we hit "ctrl-U" to view the page source, and then search the text (using ctrl-F) for "anedot".

This identifies the URL of the form.

Anedot URL

Strip away "?embed=true" from the end of the link, and you can go directly to the original form. In this case, the form gets us an address:

Anedot address

We'll note that for use later.
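The view-source step can also be scripted. This sketch assumes the page source has been saved as a string; it pulls every `src`/`href` URL that mentions the vendor and drops the `embed` query parameter. The regex is a rough heuristic rather than a real HTML parser, and the example URL in the usage check is hypothetical.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Rough heuristic for src="..." / href='...' attributes; not a full HTML parser.
ATTR_URL = re.compile(r"""(?:src|href)\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def find_form_urls(page_source, vendor="anedot"):
    """Scripted equivalent of ctrl-U, then ctrl-F for the vendor name."""
    return [url for url in ATTR_URL.findall(page_source) if vendor in url.lower()]


def strip_embed(url):
    """Drop the embed query parameter so the URL opens the standalone form."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "embed"]
    return urlunsplit(parts._replace(query=urlencode(query)))
```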

Step 3: Search for the PAC name

A search for "New Faces GOP" turns up a listing in Bizapedia.

New Faces GOP search

This listing provides three additional names, and two additional addresses: a physical address in Texas, and a PO Box in California.

Bizapedia main page

The Texas address (700 Lavaca St Ste 1401, Austin, TX 78701) is a commonly used listing address for multiple organizations, which is a sign of a registry service. 

Lavaca Bizapedia

The California address (PO Box 5434 Fresno, CA 93755) appears to be used less widely.

Step 4: Do a search at the FEC web site for the PAC name

A search of the FEC web site returns the main page at https://www.fec.gov/data/committee/C00700252/?tab=summary - this page provides a review of fundraising and spending.

Additional details are available from the "Filings" tab at https://www.fec.gov/data/committee/C00700252/?tab=filings

FEC docs list

The Statements of Organization provide an overview of key people in the organization, and of relevant addresses.

The most recent Statement of Organization (shown below) contains the same Fresno PO Box (PO Box 5434 Fresno, CA 93755) found in the Bizapedia listing. The filings also include the name of a treasurer. We will note this name for focused searches later.

FEC filing for New Faces GOP

At the end of Step 4, we have the following information:

  • multiple addresses to investigate;
  • multiple people connected to the PAC;
  • by virtue of having information pulled directly from FEC filings, some confirmation that our information is accurate.

Step 5: Do a focused search on the exact addresses on the FEC web site

For this search, we have three main addresses: the Fresno PO Box; the Austin, TX address; and the Washington DC address.

The Fresno PO Box links primarily to filings for New Faces GOP PAC, and for Elizabeth Heng's failed congressional bid.

FEC search PO Box

The search for the Texas address returns no direct results.

The search for the Washington DC address returns results for multiple different PACs, all connected to the Washington DC address.

FEC search on DC address

The FEC results also include the name of a consulting firm, "9Seven Consulting."

In the spending cycle for 2018, this firm received $156,000 in disclosed payments, per OpenSecrets.

Oddly, a web search for "9Seven Consulting" returns, as its top hit, a digital consulting firm named "Campaign Solutions" that also appears to be the employer of the person listed across multiple PACs connected to the Washington DC address at 499 South Capitol Street SW, Suite 405. These results are consistent across DuckDuckGo and Google.

9Seven search results

A search on that address returns yet another advocacy group.

Prime Advocacy search results

This group claims to specialize in setting up meetings with lawmakers.

By the end of Step 5, we have collected and/or confirmed the following information:

  • we have confirmed that many PACs list the Washington, DC address as their place of business;
  • we have confirmed that at least two political consulting firms list the same Washington, DC address as their place of business; and
  • we have confirmed that multiple PACs list a key employee who is also part of a digital consulting firm.

Step 6. Do a focused search on any key names on the FEC web site

For this search, we will focus on the name that appears across multiple filings. A Google search returns 135 results. Based on a quick scan of names, these PACs appear to be almost exclusively right leaning. Obviously, the results contain some repetition, but there are upwards of 25 unique PACs here. In the screenshot below, the same name appeared on all results; it is obscured for privacy reasons.

Search results with name obscured

Additionally, the same name appears on an IRS filing connected to George Papadopoulos. This filing also uses the same DC address.

Shared name

Based on the results of this search, it appears pretty clear that these PACs were supported by a common process or a common entity. The combination of shared staff on their filings and, in some cases, a shared address, could imply a degree of coordination. Clearly, the DC address is used as at least a mailing address for multiple organizations that have at least some general overlap in political goals.

What Does All This Mean

The information uncovered via this process helps us understand what this ad is, what this ad isn't, and how political content gets generated.

Clearly, the group behind the ad is connected to Republican and right wing political organizing. It is unclear whether or not the shared infrastructure and shared process used to create these PACs indicates any level of cooperation across PACs, or whether the PAC-generating infrastructure is akin to HR outsourcing companies that manage payroll and benefits for smaller companies - but given the overlaps described in this post, a degree of coordination would certainly be possible and straightforward to set up, if it doesn't already exist.

The infrastructure supporting the New Faces GOP PAC seems solid. Based on their FEC filings, the group was formed in March of 2019, and by the end of June had raised over $170,000.00. While this isn't a huge amount of money by the standards of national political campaigns, it's still significant, and this level of access to donors, paired with access to the organizational expertise to manage the PAC, suggests a level of support that would be abnormal for a true grassroots effort.

However, this research just scratches the surface; on the basis of what we've seen here, there are multiple other PACs, people, and addresses that could expand the loose network we are beginning to see here. Political funding and PACs are a rabbit hole, and this research has us at the very beginning, leaning over and peering into the abyss.

But understanding the ad in this context helps us see that it is one facet of what is likely a larger strategy that uses leaders like Representative Alexandria Ocasio-Cortez as foils to energize the Republican base. The hyperbolic rhetoric used in the ad normalizes overblown claims and irrational appeals in an effort to drown out conversations about policy. PACs can be used to fund a body of content that can help fuel future conversational spikes as needed, and to introduce narratives. Because PACs are so simple to form, especially when there are consultancies that appear to bundle PAC creation with a digital distribution plan, PACs can be thought of as a form of specialized link farm (https://en.wikipedia.org/wiki/Link_farm). Just like link farms, PACs provide a way of spamming the conversation with messages from orgs that can be discarded and subsequently reborn under a different name.

The message matters, but the message in this case becomes clearer when filtered through the ecosystem of PACs that helped create it.

One final note

The research that fueled this writeup isn't especially time consuming. It took me about 10 minutes of searching. The writeup took a while -- they always do -- but the process of doing a quick triage is very accessible. More importantly, every time you do it, you get better and faster. Also, it's not necessary to review every ad. Just do some - learn the process. By learning this research process, you can both see the forces that help shape (some highly misleading) political advertisements and get a clearer view into the process that allows money to shape politics. We are better able to disrupt and debunk what we understand.

Observable Patterns in Conversations about Ilhan Omar

26 min read

A. Summary

This data analysis looks at conversations on Twitter about Ilhan Omar that occurred in June, July, and August 2019. Specifically, this analysis examines 4 spikes in conversation on Twitter, and looks at the accounts participating in each spike, and the top domains and YouTube videos shared by participants in the conversation. Collectively, these 4 spikes in conversation make up 1.19 million tweets.

The data used for this analysis were collected from a Twitter search for Congressperson Ilhan Omar's name: "Ilhan Omar." The search did not use any other hashtags or terms.

The analysis shows several trends:

  • Shares to YouTube dwarf shares to other domains. In each spike, YouTube was the most popular external domain shared - collectively, in all four spikes, at least 10,756 YouTube links were shared.
  • Aside from YouTube, the right wing site Gateway Pundit is the most popular domain shared. In three out of the four spikes, Gateway Pundit was the most popular site shared. In the fourth spike, Gateway Pundit was the second most popular site shared.
  • 260 accounts were highly active in all 4 spikes (in the 95th percentile or greater as measured by post count). 246 of these accounts (94.6%) were right leaning to far right, compared to 10 accounts (3.8%) that were mainstream to left wing.
  • The most popular YouTube shares in each spike trended hard right. Out of the top 16 YouTube videos shared, only one was from a left leaning source (Now This News); the remaining fifteen were from right wing sources, including some sources known for sharing extremist content and misinformation. Additionally, YouTube's recommended videos reinforced right leaning to far right perspectives, so once a person landed on YouTube the video recommendations would keep them firmly rooted in a right wing perspective, or an extremist/white supremacist perspective.

The four spikes in conversation that occurred in the summer of 2019 show multiple ways in which the right and the far right dominated the conversation about Ilhan Omar on Twitter, and how that imbalance extended onto YouTube.

This analysis does not look at corresponding activity on Facebook, and this analysis does not look extensively at whether or not any accounts are engaging in coordinated misinformation efforts.

B. Introduction

In this analysis, we will look at 4 spikes in conversation that have occurred about Ilhan Omar in June, July, and August. This analysis uses Twitter as a starting point, and also examines YouTube shares. Each individual spike is described in more detail below.

This analysis focuses on three things for each spike:

  • levels of participation among the most active accounts;
  • domains shared within the data set, and
  • top YouTube videos shared.

These general, and distinct, indicators help provide an initial sense of the source material used to inform the conversation.

At the end of the analysis of the four spikes, I also examine the apparent ideological leanings of the accounts that were highly active (in the 95th percentile or greater) across all four spikes.
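Finding the accounts that were highly active in every spike is a set intersection. A minimal sketch, assuming one set of top-tier account IDs per spike (for example, from a 95th-percentile cut) and a hand-coded dict mapping each account to a leaning. Both inputs and the function names are hypothetical.

```python
from collections import Counter


def active_in_all_spikes(spike_top_sets):
    """Accounts present in the top tier of every spike."""
    sets = [set(s) for s in spike_top_sets]
    return set.intersection(*sets) if sets else set()


def leaning_percentages(accounts, leaning_of):
    """Share of accounts per hand-coded leaning, as percentages (1 decimal)."""
    if not accounts:
        return {}
    counts = Counter(leaning_of.get(a, "uncoded") for a in accounts)
    return {label: round(100 * n / len(accounts), 1) for label, n in counts.items()}
```

For example, `round(100 * 246 / 260, 1)` gives 94.6, the share of right leaning to far right accounts reported in the summary.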

In this analysis, I will generally not be identifying individual accounts for two main reasons:

  1. precise attribution is difficult; while some accounts within this dataset clearly appear to be inauthentic, I prefer to err on the side of caution. If/when an authentic account is incorrectly labelled as inauthentic, it can direct destructive attention toward that account. The short version: I'm personally not okay with doxing.
  2. issues related to misinformation go beyond individual accounts. Patterns are interesting, and individual accounts are rarely of interest in their own right, but they are of greater interest when they can be situated within a pattern.

In rare cases, if or when an individual account helps illustrate a larger point, I reserve the right to use an individual post, generally only when the account in question is verified, and/or belongs to a public figure, and/or has been active in spreading misinformation. In most cases, if or when I use an individual post as an example, it will be stripped of as many non-relevant details as possible.

C. Questions Asked and General Notes

This section provides context and some general notes on methodology used in the analysis. If you want to read this later and skip straight to the analysis of the spikes, head right this way!

C1. Who/what is creating the buzz?

To help get a rough sense of how participation in this conversation unfolds, I calculate what percentage of accounts participating in the conversation create 10% of all posts in the spike. This number is a rough proxy for how top-heavy a conversation might be: in a balanced conversation, 10% of participants would create 10% of the conversation.
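A minimal Python sketch of this proxy, assuming accounts are counted from the most active down (the text implies this but doesn't say it explicitly) and that the input is one account ID per post. The function name is hypothetical.

```python
from collections import Counter


def share_of_accounts_for_top_posts(tweet_authors, post_share=0.10):
    """Percentage of participating accounts needed to produce `post_share` of posts.

    Accounts are taken from most to least active, so a small result means a
    handful of very loud accounts produce a disproportionate share of posts.
    """
    counts = sorted(Counter(tweet_authors).values(), reverse=True)
    if not counts:
        return 0.0
    target = post_share * sum(counts)
    running = 0
    for n_accounts, count in enumerate(counts, start=1):
        running += count
        if running >= target:
            return 100 * n_accounts / len(counts)
    return 100.0
```

In a perfectly balanced toy conversation (10 accounts, 5 posts each) this returns 10.0; if one account posts 100 times and 99 others post once each, it returns 1.0.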

It cannot be emphasized enough that these numbers are a very rough proxy for how engaged the most engaged participants are, and these numbers are best understood as indicators of other things to look for, rather than as meaningful in their own right. On Twitter, as in life, conversations can be dominated by very loud or active participants. The gap between the percent of participants and percent of the overall conversation can be an interesting indicator. When the percentage of participants edges closer to 10%, it can suggest more balanced participation across accounts. When the percentage of participants is smaller, it can indicate a more frenzied conversation, higher participation by spambots (on or off topic), or other forms of artificial manipulation.

However, to re-emphasize this point: these numbers should only be understood as potential indicators. Additionally, the search terms and filters used to generate a data set can affect what these numbers look like, which makes it difficult to use these numbers to make apples to apples comparisons across data sets generated from different search terms. I am including these numbers here because they provide some context, but they should be considered rough indicators, at best.

C2. What domains are shared?

The domains used as sources within a conversation can provide a rough indication of the perspectives and ideological leanings of participants. Collecting the list of domains is the easy part. Coding those domains on a scale that measures (or approximates a measure of) ideological leaning is more difficult, and generally satisfies no one. However, it's a necessary element of the work, and I am attempting to be as clear and transparent as possible about how domains are coded.

For this analysis, I created two general groups using the spectrum of political right to political left. At the outset, I want to be clear that this definition is an oversimplification. However, for the purposes of this analysis, the oversimplification embedded in this coding is both a strength and a weakness - while there are going to be fringe cases that don't fit cleanly within this coding, the general structure is simple to the point where it is easy to use and easy to understand.

The two general categories are:

  • mainstream to left leaning to far left;
  • and right leaning to far right.

In determining where a publication stood on the spectrum from far right to right leaning to mainstream to left leaning to far left, publications like USA Today, the AP, and Reuters are considered mainstream. Sources like CNN, the NY Times, and the Washington Post, which are generally mainstream but, in aggregate, lean left, are included in the "mainstream to left leaning to far left" group. Sites like "Mediaite" and "Raw Story" have an editorial direction that is strongly to the left; these sites also share stories and headlines designed to be clickbait, and/or to misrepresent the facts of an issue to fit a political or ideological narrative. Publications like the New York Post and Wall Street Journal, which consistently swing right, are included as mainstream sources and are coded within the "mainstream to left leaning to far left" group.

In general, for a source to be considered right leaning or far right, it needed to be to the right of the Wall Street Journal or the New York Post. Fox News (discussed in more detail below) is coded within the "right leaning to far right" group, whereas Fox affiliates, which often have more balance and a degree of editorial independence, were coded within the mainstream group. Sites associated with known racists or far right activists were coded within right leaning and far right.

Advocacy sites were coded within the political affiliation that most closely aligned with their advocacy. I also used Media Bias Fact Check to check my coding. This writeup also contains the list of the top 50 domains shared in each spike, and that list includes my coding so it can be checked for accuracy and argued over indefinitely.
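The coding itself is a hand-maintained lookup table; tallying shares by group is then mechanical. In the sketch below, the few entries in `DOMAIN_CODING` only restate codings described above (AP, the New York Times, and the Wall Street Journal in the mainstream group; Fox News and Gateway Pundit in the right group) and are not the full list. Stripping only a leading `www.` is a simplification, so other subdomains fall through to "uncoded".

```python
from collections import Counter
from urllib.parse import urlsplit

MAINSTREAM_LEFT = "mainstream to left leaning to far left"
RIGHT = "right leaning to far right"

# Illustrative excerpt only; the full coding is in the top-50 domain lists.
DOMAIN_CODING = {
    "apnews.com": MAINSTREAM_LEFT,
    "nytimes.com": MAINSTREAM_LEFT,
    "wsj.com": MAINSTREAM_LEFT,
    "foxnews.com": RIGHT,
    "thegatewaypundit.com": RIGHT,
}


def domain_of(url):
    """Lower-cased host with a leading "www." removed."""
    host = urlsplit(url).hostname or ""
    return host[len("www."):] if host.startswith("www.") else host


def tally_by_group(urls, coding=DOMAIN_CODING):
    """Count shared URLs per coded group; unknown domains go to "uncoded"."""
    return Counter(coding.get(domain_of(url), "uncoded") for url in urls)
```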

The decision about where to code Fox News was surprisingly difficult. My initial inclination -- largely because of the presence of voices like Sean Hannity, Tucker Carlson, Lou Dobbs, Laura Ingraham, Jeanine Pirro, etc. -- was to include Fox within right leaning to far right. However, there are a small number of journalists in its news unit (looking at you, Shep Smith) who, while definitely leaning right, have committed acts of actual journalism.

Ultimately, the question was simplified by how Fox shares its content on YouTube. On YouTube, Fox shares videos of its opinion hosts -- many of whom share biased, racist, misogynistic content, and/or blatant conspiracy theories -- under the "Fox News" name.

Fox News opinion hosts

This clear connection of the news side and opinion side on their YouTube presence -- which has over 3 million subscribers, and millions of views on its videos -- simplifies the decision, and was the deciding factor in grouping Fox News into the "right leaning to far right" group.

The coding in this analysis should be understood as a rough grouping of political leaning, and this coding stops short of determining whether or not a site shares false, misleading, or inaccurate stories. In some cases, if a site has a clear track record of spreading misinformation, that is noted in the analysis.

C3. What does it mean when google.com shows up in a domain list?

The domain "google.com" shows up in the listing of top domains; this is generally an indication of a ham-fisted and amateurish setup of Google's Accelerated Mobile Pages (AMP) - more information available here.

The National Review provides a great example of this incompetence in action: https://www.google.com/amp/s/www.nationalreview.com/news/link-to-misinformation/amp. In this example (with the full URL changed so as not to provide more visibility to any stories), you can see that google (dot) com shows up as the primary domain. This is a common trait among both less reputable sites and reputable sites with sub-par technical implementation: because the main domain shows up as "google (dot) com," the link will generally appear "trustworthy" regardless of whether or not the underlying site is reliable. This is one of several ways that AMP is not good.
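One way to see which publisher sits behind a google.com link is to pull the publisher's host out of the AMP path. The sketch below is a minimal illustration, assuming links follow the `google.com/amp/s/<publisher-host>/...` shape seen in the National Review example; the helper name `publisher_domain` is hypothetical and is not part of the tooling used for this analysis.

```python
from urllib.parse import urlparse


def publisher_domain(url: str) -> str:
    """Return the domain to credit for a shared link (hypothetical helper).

    Google AMP viewer links such as https://www.google.com/amp/s/<host>/<path>
    are credited to <host> instead of google.com; all other links are credited
    to their own host. A leading "www." is dropped so counts merge cleanly.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host in ("google.com", "www.google.com") and parsed.path.startswith("/amp/"):
        # e.g. ['', 'amp', 's', 'www.nationalreview.com', 'news', ...]
        rest = parsed.path.split("/")[2:]
        if rest and rest[0] == "s":  # "s" marks an https publisher URL
            rest = rest[1:]
        if rest and rest[0]:
            host = rest[0].lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return host


print(publisher_domain(
    "https://www.google.com/amp/s/www.nationalreview.com/news/link-to-misinformation/amp"
))  # nationalreview.com
```

Links that are not AMP viewer links pass through unchanged, so google.com still counts as google.com for an ordinary search link.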

C4. What does it mean when twitter.com shows up in the domain list?

Links to twitter.com indicate that people are sharing and amplifying individual tweets, which can be a sign of echo chambers and/or of accounts being highlighted for others to swarm. Additional analysis of the accounts sharing links to other Twitter URLs is required to gauge whether or not there is any artificial or coordinated signal boosting among these accounts.
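As a first step toward that additional analysis, the accounts whose tweets are being linked can be counted directly from the URLs. This is a minimal sketch, assuming links use the `twitter.com/<handle>/status/<id>` form; the regular expression and the `amplified_handles` name are illustrative assumptions, and a count like this says nothing on its own about whether the sharing is coordinated.

```python
import re
from collections import Counter

# Matches links such as https://twitter.com/<handle>/status/<id>
# (assumed URL shape; handles are 1-15 letters, digits, or underscores).
STATUS_LINK = re.compile(
    r"^https?://(?:www\.|mobile\.)?twitter\.com/([A-Za-z0-9_]{1,15})/status(?:es)?/(\d+)",
    re.IGNORECASE,
)


def amplified_handles(urls):
    """Count how often each account's tweets are linked (hypothetical helper)."""
    counts = Counter()
    for url in urls:
        match = STATUS_LINK.match(url)
        if match:
            counts[match.group(1).lower()] += 1  # handles are case-insensitive
    return counts
```

Links to profiles, hashtags, or search pages do not match the pattern and are ignored.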

C5. Analysis of YouTube Shares

For each spike, I examine the top 4 YouTube videos shared. This analysis looks at:

  • the number of times the video was shared on Twitter;
  • the source of the video; and
  • the number of plays for the video.

For the first and fourth most popular videos in each spike, the analysis includes a breakdown of the recommended videos in the sidebar, up to a maximum of 9.
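Ranking the most shared videos depends on collapsing the different YouTube link formats into a single video ID. The sketch below assumes shares arrive as `youtube.com/watch?v=<id>` or `youtu.be/<id>` links; `youtube_video_id` and `top_videos` are hypothetical names. Play counts and sidebar recommendations come from the video pages themselves and are not covered here.

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url):
    """Return the video ID from a youtube.com/watch or youtu.be link, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "youtu.be":
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None
    if host in ("youtube.com", "m.youtube.com") and parsed.path == "/watch":
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


def top_videos(urls, n=4):
    """Rank videos by how many shared links point to them (ties keep first-seen order)."""
    counts = Counter(vid for vid in map(youtube_video_id, urls) if vid)
    return counts.most_common(n)
```

Normalizing to the ID means a `youtu.be` link and a `watch?v=` link to the same video are counted together rather than as two separate videos.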

D. Spike One: June 22nd to June 29th

This initial spike coincided with congressional delegations visiting border camps holding people seeking asylum in the US. Multiple congresspeople compared the dehumanizing conditions of these camps to concentration camps, and the story was gaining increased visibility via press coverage. Additionally, on June 20th, the NY Times put out a story profiling some of the people having racist reactions to Somali refugees resettling in Minnesota.

The subject of this spike was misinformation about Ilhan Omar's past.

Spike 1 posts

Between June 22nd and June 29th, 175,260 tweets came from 80,365 accounts.

The top 544 most active accounts - .68% of all active accounts in this spike - created 10% of all content in this spike.
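The "most active accounts created 10% of all content" figure can be reproduced by ranking accounts by post count and counting down until the running total reaches 10% of all posts. This is a minimal sketch, assuming per-account post counts for a spike are already available; `accounts_for_share` is a hypothetical helper. For this spike, 544 of 80,365 accounts works out to .68%.

```python
def accounts_for_share(post_counts, share=0.10):
    """Return the smallest number of most active accounts whose combined posts
    reach `share` of all posts, plus that number as a percent of all accounts.

    post_counts: hypothetical mapping of account handle -> posts in one spike.
    """
    if not post_counts:
        return 0, 0.0
    ordered = sorted(post_counts.values(), reverse=True)  # most active first
    target = share * sum(ordered)
    running = 0
    for rank, count in enumerate(ordered, start=1):
        running += count
        if running >= target:
            return rank, 100 * rank / len(ordered)
    return len(ordered), 100.0  # only reached if share > 1


print(accounts_for_share({"a": 50, "b": 30, "c": 10, "d": 5, "e": 5}))  # (1, 20.0)
```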

Across all accounts, approximately 1500 unique domains were shared a total of 21,314 times.

A scan through the top 20 domains shared over this time period shows a strong skewing toward right wing sites, with multiple sites of conspiracy theorists and far right figures appearing above mainstream sites. In the top 20 sites, 3828 shares point to 13 different right leaning or far right domains. 899 shares point to mainstream or left leaning content -- and all of those shares are from one source, the Star Tribune, the local paper in Ilhan Omar's district. Six of the domains in the top 20 were either link sharing services or links to other social media sites like Facebook or YouTube.

Out of the top 50 sites, 28 domains were right leaning or far right; these 28 domains were shared 4732 times. 8 domains were mainstream to left leaning to far left, and these domains were shared 1297 times. When we look at individual examples, fringe conspiracy sites and outright hate sites were shared in greater numbers than mainstream news sites. For example, links to Pam Geller's site were shared 184 times; Laura Loomer's site was shared 147 times; Infowars was shared 75 times; and the New York Times was shared 56 times.
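The top 50 tallies amount to summing shares per coded group, with link shorteners, social media, and uncoded domains kept in a separate bucket. This is a minimal sketch, assuming a mapping from domain to group like the coding described earlier; `tally_top_domains` is a hypothetical name, and the domain spellings in the usage lines are assumptions, while the counts are this spike's figures for YouTube, Gateway Pundit, the Star Tribune, and the New York Times.

```python
from collections import Counter

RIGHT = "right leaning to far right"
LEFT = "mainstream to left leaning to far left"


def tally_top_domains(domain_shares, coding, top_n=50):
    """Sum shares and count distinct domains per coded group among the top_n domains.

    domain_shares: Counter of domain -> number of shares.
    coding: dict of domain -> group; anything missing lands in "other".
    """
    shares = Counter()
    domains = Counter()
    for domain, count in domain_shares.most_common(top_n):
        group = coding.get(domain, "other")  # link shorteners, social media, uncoded
        shares[group] += count
        domains[group] += 1
    return shares, domains


sample = Counter({"youtube.com": 1776, "thegatewaypundit.com": 1328,
                  "startribune.com": 899, "nytimes.com": 56})
sample_coding = {"thegatewaypundit.com": RIGHT, "startribune.com": LEFT, "nytimes.com": LEFT}
by_shares, by_domains = tally_top_domains(sample, sample_coding)
print(by_shares[RIGHT], by_shares[LEFT], by_domains[LEFT])  # 1328 955 2
```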

Links to YouTube videos outnumber shares to any other domain, with 1776 shares to YouTube. The next most popular domain after YouTube is Gateway Pundit, with 1328 shares. The Star Tribune - a local paper in Minnesota which is considered both mainstream and reliable - was shared 899 times.

The full list of the top 50 domains is included below.

The top 4 YouTube shares are listed below. The top 3 videos - shared collectively 406 times during this spike - all point to right leaning to far right content. The 4th most shared video - shared 91 times during this spike - is from Now This News, a progressive organization.

A look at the YouTube pages for these videos, however, suggests that the sharing of the link from Twitter is just the beginning. The screenshot shared below of the Rebel Media video was taken on August 20th from a clean browser while not logged in to YouTube. The 9 recommended videos at the top of the list include:

Spike 1 - video 1

  • 4 links to Fox News
  • 1 link to CNN
  • 1 link to Piers Morgan
  • 1 link to Channel 4 News
  • 1 link to Star Parker
  • 1 link to a Vice "debate"

If a person comes to this video, the main options presented to them slant heavily to right leaning to far right perspectives.

Looking at the only progressive video in the top 4 most shared -- which was the 4th most popular video shared -- the breakdown of the 9 top recommended videos on the "Now This News" video includes:

Spike 1 - Video 4

  • 3 links to Fox News
  • 2 links to MSNBC
  • 1 link to a Bill Maher interview with Ben Shapiro
  • 1 link to CNN
  • 1 link to C-SPAN
  • 1 link to The Daily Show

The YouTube recommendations on these videos include a small number of mainstream to left leaning sources, but the majority of recommendations are to right wing sources.

E. Spike Two: July 9th to July 13th

This spike appears to be sparked by a Tucker Carlson segment where Carlson continued his pattern of using racist smears as a core element of his program.

Spike 2 posts

In this spike, 90,788 accounts posted 209,404 times over 5 days (July 9-13).

The top 703 most active accounts - .77% of all active accounts in this spike - created 10% of all content in this spike.

In this time period, approximately 1450 domains were shared 20,702 times.

Out of the top 20 domains shared, 3238 shares pointed to 11 different right leaning or far right domains. 790 shares pointed to 4 different mainstream or left leaning domains.

Out of the top 50 domains shared, 3803 shares pointed to 19 different right leaning to far right domains. 1604 shares pointed to 16 mainstream to left leaning to far left domains.

As with the first spike, links to Twitter and YouTube dominated shares, with 7079 and 1391 shares, respectively. Fox News, The Gateway Pundit, and Breitbart were the next most popular domains, collectively shared 2062 times. In comparison, the most popular mainstream to left leaning domains (Huffington Post, Mediaite, and Microsoft News) were shared a total of 625 times.

The Western Journal - a far right site run by a political activist who was responsible for the Willie Horton ad and who currently runs a PAC with Herman Cain - was shared 198 times. In comparison, the Washington Post was shared 165 times.

The most popular YouTube videos slant heavily toward right leaning and far right sources as well. The top four YouTube shares all point to videos that represent right wing perspectives.

The most shared video - from the Next News Network - has recommended videos that are almost exclusively right wing. The recommended videos include:

Spike 2 - Most shared on YouTube

  • 6 from Fox News
  • 1 from NBC News
  • 1 from "Valuetainment"
  • 1 from "Pure living for life"

The fourth most popular video - from an account named "Contemptor" - follows the same pattern. Recommended videos include:

Spike 2 - 4th on YouTube

  • 5 to Fox News
  • 1 to a Bill Maher interview with Ben Shapiro
  • 1 to a video of Ann Coulter calling feminists "angry man-hating lesbians"
  • 1 to CNN
  • 1 to CBS News

The second spike has very similar patterns to the first spike: the share of domains is heavily slanted to right leaning and far right content. Top YouTube shares are nearly exclusively to right leaning or far right content, and the recommended videos from the top shares are heavily weighted to right leaning or far right sources.

F. Spike Three: July 13th to July 18th

The third spike picks up where the second spike ends, and covers Trump telling Ilhan Omar and three other congressional representatives to go back to "the totally broken and crime infested places from which they came."

Trump comments

While both the second and the third spike include parts of the 13th, the second spike ends at 03:00 on the 13th, and the 3rd spike picks up at 04:00.

Spike 3 posts

In the time period between July 13th and July 18th, 232,293 accounts posted 622,855 tweets over 6 days.

The top 1605 most active accounts - .69% of all active accounts in this spike - created 10% of all content in this spike.

In this time period, approximately 3450 domains were shared 81,815 times.

Out of the top 20 domains shared, 8325 shares link to 7 right leaning or far right sources. 5401 shares link to 6 different mainstream or left leaning or far left domains.

Out of the top 50, 12,294 posts linked to 21 right leaning or far right domains. 7995 shares linked to 16 mainstream to left leaning to far left domains. In this third spike, shares to right wing sources still dominate shares to left wing or mainstream sources. The Gateway Pundit - a far right site that regularly spreads misinformation - was shared 4780 times; this is more than the combined total of the top 4 most shared mainstream to left leaning to far left sites (Huffington Post, the Star Tribune, Wall Street Journal, and The Guardian), which were shared a total of 4400 times.

Shares of YouTube videos continue the right leaning to far right domination seen in the first two spikes.

In the third spike, 5882 total posts share links to YouTube, and the top 4 videos shared all represent right wing viewpoints.

The recommendations from The Blaze link almost exclusively to right leaning or far right sources:

Spike 3 - The Blaze

  • 6 from Fox News
  • 1 from Vice
  • 1 from Black Pill
  • 1 from Glenn Beck

The ads and recommendations from The Next News Network video point primarily to right wing content. For this video, an ad cut one video out from the top screen, so we only have eight video recommendations.

Spike 3 Next News Network

  • 6 from Fox News
  • 1 from "enduringcharm"
  • 1 from Vice

G. Spike Four: August 15th to August 17th

This spike was triggered by Israel refusing entry to Ilhan Omar and Rashida Tlaib, and by President Trump's two tweets supporting a foreign nation over two elected congresspeople.

Spike 4 posts

In the time period between August 15th and August 17th, 93,844 accounts posted 188,656 tweets over 3 days.

The top 826 most active accounts - .88% of all active accounts in this spike - created 10% of all content in this spike.

In the fourth spike, approximately 2050 domains were shared 32,780 times.

Out of the top 20 domains shared, 3514 shares link to 6 right leaning or far right sources. 2370 shares link to 5 different mainstream or left leaning or far left domains.

Out of the top 50 domains shared, 4864 posts linked to 17 right leaning or far right domains. 4201 shares linked to 19 mainstream to left leaning to far left domains. In this fourth spike, the total count of shares to right wing sources still dominates shares to left wing or mainstream sources, even though the top 50 domains include 2 more mainstream to left leaning domains than right leaning ones.

The fourth spike follows the patterns of the first three spikes, with links to right wing domains publishing dubious or outright racist and/or extreme content being shared at a higher volume than links to mainstream or left leaning or far left content. The Gateway Pundit was shared 1077 times, more than twice the total of shares to the NY Times, the most shared mainstream to left leaning site, which was shared 517 times. The Western Journal was shared 190 times, and links to Laura Loomer's site were shared 154 times; links to the Washington Post were shared 151 times.

In the fourth spike, 1707 posts shared links to YouTube videos. As with the other spikes, the most popular videos all featured right wing content, including content from sources known to push misinformation.

Looking at the videos and links shared on the first screen with the top shared video from Black Pill, we have 6 options - two ads, and four recommended videos. The two ads are to Epoch Times and Judicial Watch. Epoch Times has recently been engaged in highly suspect and misleading behavior on Facebook, and Judicial Watch is a far right source of conspiracy theories.

Spike 4 - YouTube video 1

The other video recommendations include:

  • 2 for Fox News
  • 1 for Black Pill
  • 1 for PragerU

The 4th most shared video - also from Black Pill - includes 8 links on the top screen: 7 videos and one ad.

Spike 4 - YouTube video number 4

The ad is for the National Republican Congressional Committee.

The video recommendations include:

  • 4 for Fox News
  • 1 for Fox Business
  • 1 for the Daily Signal
  • 1 for Huckabee

As with the other spikes, the top shared videos are right leaning to far right, and the recommended videos from YouTube are nearly all right leaning to far right.

H. Who Shows Up?

As noted in the summary of each spike, a small percentage of accounts creates an outsized percentage of the content. This isn't necessarily abnormal, but over time, noting which accounts show up most frequently can also help illustrate patterns. For each spike, I collected the accounts that were in the 95th percentile or higher as measured by post count. Then, I looked at which accounts were in the 95th percentile of activity in all four spikes.
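The repeat-participant list is an intersection: find each spike's 95th percentile accounts, then keep only the accounts present in all four sets. This is a minimal sketch, assuming per-spike post counts per account and the nearest-rank definition of a percentile (the analysis does not say which percentile method was used); the function names are hypothetical. Accounts tied at the threshold are all kept, so a set can hold slightly more than 5% of a spike's accounts.

```python
def top_percentile_accounts(post_counts, pct=95):
    """Accounts at or above the pct-th percentile of post count (nearest-rank method).

    post_counts: hypothetical mapping of account handle -> posts in one spike.
    """
    if not post_counts:
        return set()
    counts = sorted(post_counts.values())
    rank = -(-pct * len(counts) // 100)  # integer ceil(pct/100 * N), avoids float error
    threshold = counts[max(rank, 1) - 1]
    # Accounts tied at the threshold are all kept.
    return {acct for acct, n in post_counts.items() if n >= threshold}


def repeat_heavy_posters(spikes, pct=95):
    """Accounts that are in the top percentile of activity in every spike."""
    sets = [top_percentile_accounts(spike, pct) for spike in spikes]
    return set.intersection(*sets) if sets else set()
```

Passing the four spikes' post counts as a list returns the accounts that were heavy posters every time, which is the group described below.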

260 accounts were in the 95th percentile of activity in all four spikes covered in this analysis. Out of these 260 accounts:

  • 246 accounts (94.6%) are right leaning to far right.
  • 10 accounts (3.8%) are mainstream to left leaning to far left.
  • 4 accounts (1.5%) are not clearly affiliated. These accounts were on a spectrum between overt gibberish and failed attempts at parody/joke accounts.

Coding of account leanings examined general traits of the accounts, including bios, recent posting histories, hashtags used, domains shared, and posts liked or retweeted. The following tweets provide samples from accounts that were coded as right leaning or far right:

Right wing Twitter example 2

Right wing Twitter example 1

Rightwing Twitter example 3

Two examples of mainstream to left leaning accounts that were active across all four spikes are Ilhan Omar and the news outlet The Hill.

Among the most active repeat participants, right and far right accounts vastly outnumbered mainstream and left leaning accounts. This analysis does not make any effort to determine whether or not these accounts are connected to real people, or whether or not these accounts are part of inorganic or inauthentic amplification as part of a larger network. While many of these accounts do show signs of being trolls and/or sockpuppets, more detailed analysis is required to determine potential authenticity or inauthenticity of individual accounts.

The overwhelming number of active participants from the right, relative to the much smaller number of participants from the mainstream and the left, indicates that on Twitter, right wing accounts show up more consistently. The fact that just under 95% of active repeat participants in these spikes are right leaning to far right, with just under 4% being mainstream to left leaning to far left, helps highlight that in the conversations about Ilhan Omar, the right wing and far right voices are significantly more consistent and active than left leaning voices. This imbalance calls out for additional research on these accounts to determine how many can be connected to actual people, and how many are potentially sockpuppets working within a network.

I. Conclusion

When looking at what domains get shared, and at the most popular shares of YouTube videos, two facts become clear about the recent conversations about Ilhan Omar:

  • Content from right leaning to far right domains is shared at a much higher volume than mainstream, left leaning or far-left domains.
  • On YouTube, right leaning to far right content is initially amplified by disproportionate sharing from Twitter, and visitors to YouTube are subsequently served more right leaning to far right content via YouTube's content recommendation algorithm.

Individually, either of these elements indicates that right wing perspectives are overwhelming the conversation about an elected official. Taken together, however, these two factors are mutually supportive - this is how a closed system on seemingly "open" platforms takes shape.

When the imbalance in domain shares, the imbalance in shares to YouTube videos, and the rabbit hole effect of YouTube's content recommendation algorithm are combined, we get a clearer sense of how social media platforms can potentially be gamed in parallel to fabricate consensus and to support the spread of increasingly radical and hateful content. The imbalance in one conversation (or spike) shifts the bounds of what's "normal," and the next conversation shifts those norms even further.

This imbalance is further multiplied by the repeated participation of right wing accounts. These accounts draw on older content generated from past flare-ups, creating a system that supports bias or misinformation in depth. Multiple content distribution strategies are at play here, and the whole is absolutely greater than the sum of its parts.

Over time, the right leaning to far right content creates an ever-growing foundation of sources that it can use to buttress arguments in future conversations. This ever-growing body of content provides a repository that reinforces a world view and a perspective. Conversations about specific issues become less about the individual issue, and more about proselytizing a world view and bringing people into the fold. To make a vast oversimplification, one of the possibilities suggested by this data set is that the left argues about specific points, while the right uses specific points to proselytize a world view.

While this analysis stays away from whether or not any of the activity is coordinated or inauthentic, it highlights that conservative complaints of "censorship" on social media are somewhere between flimsy and baseless. The data set used in this analysis was derived from a search on a person's name. Theoretically, the results should have been fairly balanced. If YouTube and Twitter are attempting to be biased against conservatives, they are very bad at it. Similarly, if they are attempting to check or curb the use of their platforms as a means of spreading misinformation and extreme speech, they're not doing great there either.

I have yet to see any platform provide concrete data around the numbers of FTEs (and I'm talking full, salaried employees, not contractors) with dedicated time and clearly defined authority to shut down hate speech and misinformation. I have also never seen comparisons of staffing levels between teams such as advertising, sales, or marketing and teams fighting misinformation and abuse. If and when platforms ever become transparent and show us this information, we could begin to get a more concrete sense of how they prioritize the health of their platform relative to other business interests.

On August 27th, YouTube released new guidelines and renewed promises to "RAISE UP authoritative voices" and "REDUCE the spread of content that brushes right up against our policy line." However, the visible results of the current efforts of Twitter and YouTube - as observed in this analysis - do not appear remotely effective.

J. Top 50 Domains Shared

Spike 1

Spike 2

Spike 3

Spike 4