Bill has worked in education as an English and history teacher, an administrator, and a technology director. Bill initially discovered the Internet in the mid-1990's at the insistence of a student who wouldn't stop talking about it.

Vice and Philip Morris Partner to Create Vaping "Documentaries" - and YouTube Provides the Assist

2 min read

Vice News partnered with Philip Morris to create pro-vaping content. More details are available in this thread, and via the Financial Times.

The "documentaries" are, of course, on YouTube. One sample documentary is called out in this thread.

Link to "documentary"

When we go to YouTube to watch the video, we can see the recommendation algorithm kick in. I navigated through these videos while using Tor, and not logged in to Google, so these recommendations would not be affected by any account history.

YouTube's recommendation algorithm leads viewers to increasingly pro-vaping content. If we follow autoplay three times (to see a total of 4 videos), by the time we hit the 4th video, one of the top recommendations is for "5 Things Every New Vaper Must Know" which is just below "Stoner Compilation 2019." I'm not linking to these videos because I don't want to give them any direct exposure, but the screenshots of the trail are included below.

Video One - the Philip Morris and Vice Media production.

First video

Video Two - top recommendation from Video One.

Second video

Video Three - top reccomentation from Video Two

Third video

Video Four - top reccomentation from Video Three

Fourth video

Video Five - the fourth recommendation from Video Four

Fifth video - vaping recommendations

Once a person clicks on the "5 Things Every New Vaper Must Know" we are firmly in the territory of vaping recommendations.

As noted above, Philip Morris and Vice teamed up to market vaping to youth. YouTube ensures that the "documentaries" produced by Vice lead people directly to more pro-vaping content.

Readings on Big Data Use and Implications

2 min read

This is a general and incomplete list. For additional recommendations, please let me know!

2006: AOL releases data on searches.

2012: Location tracking, and predicting where we will go:

2013: Likes are revealing. This study formed the basis of Cambridge Analytica's work.

2013: Location data is a highly accurate method of identifying individuals. 2 data points can identify 50% of individuals; 4 data points identifies 95% of individuals.

2013: Discrimination in online ads:

2014: NYC Taxi data, aka anonymization is hard.

2016: From ProPublica, the different data categories Facebook (and other data collection companies) collect about us.

2016: Big data, risk assessments, and sentencing:

2017: Cambridge Analytica and the 2016 election:

2017: Tracking a specific person, legally, with $1000 and adtech.

Four books that help frame uses and issues with Big Data: 

Four Things I Would Recommend for

2 min read is an open source voice assistant. It provides functionality that compares with Alexa, Google Home, Siri, etc, but with the potential to avoid the privacy issues of these proprietary systems. Because of the privacy advantages of using an open source system, Mycroft has an opportunity to distinguish itself in ways that would be meaningful, especially within educational settings.

If I was part of the team behind, these are four things I would recommend doing as soon as possible (and possibly this work is already in progress -- as I said, I'm not part of the team).

  1. Write a blog post (and/or update documentation) that describes exactly how data are used, paying particular attention to what stays on the device and what data (if any) need to be moved off the device.
  2. Develop curriculum for using in K12 STEAM classes, especially focusing on the Raspberry Pi and Linux versions.
  3. Build skills that focus on two areas: learning the technical details required to build skills for Mycroft devices; and a series of equity and social justice resources, perhaps in partnership with Facing History and/or Teaching Tolerance. As an added benefit, the process of building these skills could form the basis of the curriculum for point 2, above.
  4. Get foundation or grant funding to supply schools doing Mycroft development with Mycroft-compatible devices

Voice activated software can be done well without creating unnecessary privacy risks. Large tech companies have a spotty track record -- at best -- of creating consistent, transparent rules about how they protect and respect the privacy of the people using their systems. Many people -- even technologists -- aren't aware of the alternatives. That's both a risk and an opportunity for open source and open hardware initiatives like

Attending ISTE While My Country Puts Children in Prison

2 min read

In a few days, I'll be traveling to Chicago for ISTE 2018. I'm curious to listen to what people talk about, and what they don't talk about. 

Crying child separated from parents

Image credit: John Moore

If people are unwilling or unable to discuss the reality - the well documented, indisputable reality - that our government is putting children into makeshift jails, I will be curious to know why. We are living in a time when our government is attempting to call putting babies - some as young as 6 months old - into cages, alone - and calling it a "tender age" shelter. When we torment children, and then torture language to mask the torment we are causing, we have multiple problems, but we cannot and should not pretend this is normal.

If, at ISTE, people try and retreat behind the veneer of "politeness" I will observe that demands for politeness in the face of the obscenities of xenophobia, racism, anti-LGBTQ bigotry, misogyny, and/or white supremacy are not polite: demands for politeness are another form of erasure. 

I will be listening to how people attempt to position technology as neutral and apolitical in a time when the ability to retreat to the pillars of neutrality and apoliticism are clear definitions of privilege.

If we can't see and acknowledge that a government policy of hurting kids and destroying families is an educational issue, then we have problems.

And my recommendation here: if, in a conversation, you have a choice between polite or candid, choose candid. It might not go over well at the time, but long term, it's the kindest thing you can do.

Fordham CLIP Study on the Marketplace for Student Data: Thoughts and Reactions

9 min read

A new study was released today from the Fordham Center on Law and Information Policy (Fordham CLIP) on the marketplace for student data. It's a compelling read, and the opening sentence of the abstract provides a clear description of what is to follow:

Student lists are commercially available for purchase on the basis of ethnicity, affluence, religion, lifestyle, awkwardness, and even a perceived or predicted need for family planning services.

The study includes four recommendations that help frame the conversation. I'm including them here as points of reference.

  1. The commercial marketplace for student information should not be a subterranean market. Parents, students, and the general public should be able to reasonably know (i) the identities of student data brokers, (ii) what lists and selects they are selling, and (iii) where the data for student lists and selects derives. A model like the Fair Credit Reporting Act (FCRA) should apply to compilation, sale, and use of student data once outside of schools and FERPA protections. If data brokers are selling information on students based on stereotypes, this should be transparent and subject to parental and public scrutiny.
  2. Brokers of student data should be required to follow reasonable procedures to assure maximum possible accuracy of student data. Parents and emancipated students should be able to gain access to their student data and correct inaccuracies. Student data brokers should be obligated to notify purchasers and other downstream users when previously transferred data is proven inaccurate and these data recipients should be required to correct the inaccuracy.
  3. Parents and emancipated students should be able to opt out of uses of student data for commercial purposes unrelated to education or military recruitment.
  4. When surveys are administered to students through schools, data practices should be transparent, students and families should be informed as to any commercial purposes of surveys before they are administered, and there should be compliance with other obligations under the Protection of Pupil Rights Amendment (PPRA).

The study uses a conservative methodology to identify vendors selling student data, so in practical terms, they are almost certainly under-counting the number of vendors selling student data. One of the vendors selling student data identified in the survey clearly states that they have information on students between 2 and 13:

Our detailed and exhaustive set of student e-mail database has names of students between the ages of 2 and 13.

I am including a screenshot of the page to account for any changes that happen to this page into the future.

Students between 2 and 13

The study details multiple ways that data brokers actively (and in some cases, enthusiastically) exploit youth. One vendor had no qualms about selling a list of 14 and 15 year old girls for targeting around family planning services. The following quotation is from a sales representative responding to an inquiry from a researcher:

I know that your target audience was fourteen and fifteen year old girls for family planning services. I can definitely do the list you’re looking for -- I just have a couple more questions.

The study also highlights that, even for a motivated and informed research team, uncovering details about where data is collected from is often not possible. Companies have no legal obligation to disclose this information, and therefore, they don't. The observations of the research team dovetail with my firsthand experience researching similar issues. Unless there is a clear and undeniable legal reason for a company to disclose a specific piece of information, many companies will stonewall, obfuscate, or outright refuse to be transparent.

The study also emphasizes two of the elephants in the room regarding the privacy of students and youth: both FERPA and COPPA have enormous loopholes, and it's possible to be fully compliant with both laws and still do terrible things that erode privacy. The study covers some high level details, and as I've described in the past, FERPA directory information is valuable information.

The study also highlights the role of state level laws like SOPIPA. SOPIPA-style laws have been passed in multiple states nationwide, starting in California. This might actually feel like progress. However, when one stops and realizes that there have been a grand total of zero sanctions under SOPIPA, it's hard to escape the sense that some regulations are more privacy theater than privacy protection. While a strict count of sanctions under SOPIPA is a blunt measure of effectiveness, the lack of regulatory activity under SOPIPA since the law's passage either indicates that all the problems identified in SOPIPA have been fixed (hah!) or that the impact of the regulation is nonexistent. If a law passes and it's not enforced, what is the impact?

The report also notes that the data collected, shared, and/or sold goes far beyond simple contact information. The report details that one vendor collects information on a range of physical and mental health issues, family history regarding domestic abuse, and immigration status.

One bright spot in the report is that, among the small number of school districts that responded to the researcher's requests for information, none appeared to be selling or sharing student information to advertisers. However, even this bright area is undermined by the small number of districts surveyed, and the fact that some districts took over a year to respond, and with at least one district not responding at all.

The report details the different ways that school-age youth are profiled by data brokers, with their information sold to support targeted advertising. While the report doesn't emphasize this, we need to understand profiling and advertising as separate but unrelated issues. A targeted ad is an indication that profiling is occurring; profiling is an indication that data collection from or about students is occurring -- but we need to address the specific problems of each of these elements distinctly. Advertising, profiling (including combining data from multiple sources), and data collection without clearly obtained informed consent are each distinct problems that should be understood both individually and collectively.

If you work with youth (or, frankly, if you care about the future and want to add a layer of depth to how you understand information literacy) the report should be read multiple times, and shared and discussed with your colleagues. I strongly encourage this as required reading in both teacher training programs, and as back to school reading for educators in the fall of 2018.

But, taking a step back, the implications of this report shine a light on serious holes in how we understand "student" data. The report also demonstrates how the current requirement that a person be able to show a demonstrable harm from misuse of personal information is a sham. Moving forward, we need to refine and revise how we discuss misuse of information.

Many of the problems and abuses arise from systemic and entrenched lack of transparency. As demonstrated in the report:

It is difficult for parents and students to obtain specificity on data sources with an email, a phone call, or an internet search. From the perspective of parents and students, there is no data trail. Likewise, parents and students are generally unable to know how and why certain student lists were compiled or the basis for designating a student as associated with a particular attribute. Despite all of this, student lists are commercially available for purchase on the basis of ethnicity, affluence, religion, lifestyle, awkwardness, and even a perceived or predicted need for family planning services.

This is what information asymetry looks like, and it mirrors multiple other power imbalances that stack the deck against those with less power. As documented in multiple places in the survey, a team of skilled researchers with legal, educational, and technical expertise were not able to pierce the veil of opacity maintained by data brokers and advertisers. It is both unrealistic and unethical to expect a person to be able to demonstrate harm from the use of specific data elements when the companies in a position to do the harm have no requirement to explain anything about their practices, including what data they used and how they obtained it.

But taking an additional step back, the report calls into question what we consider "student" data. The marketplace for data on school age people looks a lot like the market for people who are past the traditional school age: a complete lack of transparency about how the data are gathered, sold, used, and retained. It feels worse with youth because adults are somehow supposed to know better, but this is a fallacy. When we turn 18, or 21, or 35, or 50, we aren't magically given a guidebook about how data brokers and profiling work. The information asymmetry documented in the Fordham report is the same for adults as it is for youth. Both adults and youth face comparable problems, but the injustice of the current systems are more obvious when kids are the target.

Companies collect data about people, and some of the people happen to be students. Possibly, some of these data might have been collected within an educational context. But, even if the edtech industry had airtight privacy and security, multiple other sources for data about youth exist. Between video games, health-related data breaches (which often contain data about youth and families in the breached records), Disney and comparable companies, Equifax, Experian, Transunion, Axciom,, Snapchat, Instagram, Facebook Messenger, parental oversharing on social media, and publicly available data sources, there is no shortage of readily available data about youth, their families, and their demographics. When we pair that with technology companies (both inside and outside edtech) going out of business and liquidating their data as part of the bankruptcy process, the ability to get information about youth and their families is clearly not an issue.

It's more accurate to say data that have been collected on people who are school age. To be very clear, data collected in a learning environment is incredibly sensitive, and deserves strong protections. But drawing a line between "educational" data and everything else misses the point. Non-educational data can be used to do the same types of redlining as educational data. If we claim to care about student privacy, then we need to do a better job with privacy in general.

This is what is at stake when we talk about the need to limit our ISPs from selling our web browsing history, our cellular providers from selling our usage information -- including precise information, in real time, about our location. What we consider student data is tied up in the data trails of their parents, friends, relatives, colleagues -- information about a younger sister is tied to that of her older siblings. Privacy isn't an individual trait. We are all in this together.

Read the study. Share the study. It's important work that helps quantify and clarify issues related to data privacy for adults and youth.

Privacy Postcard: Starbucks Mobile App

2 min read

For more information about Privacy Postcards, read this post.

General Information

App permissions

The Starbucks app has permissions to read your contacts, and to get network location and location from GPS.

Starbucks app permissions

Access contacts

The application permissions indicate that the app can access contacts, and this is reinforced in the privacy policy.


Law enforcement

Starbucks terms specify that they will share data if sharing the information is required by law, or if sharing information helps protect Starbuck's rights.

Starbucks law enforcement

Location information and Device IDs

Starbucks can use location as part of a broader user profile.

Starbucks collects location info

Data Combined from External Sources

The terms specify that Starbucks can collect, store, and use information about you from multiple sources, including other companies.

Starbucks data collection

Third Party Collection

The terms state that Starbucks can allow third parties to collect device and location information.

Third party

Social Sharing or Login

The terms state that Starbucks facilitates tracking across multiple services.

Social sharing

Summary of Risk

The Starbucks mobile app has several problematic areas. Individually, they would all be grounds for concern. Collectively, they show a clear lack of regard for the privacy of people who use the Starbucks app. The fact that the service harvests contacts, and harvests location information, and allows selected information to be used by third parties to profile people creates significant privacy risk.

People shouldn't have to sell out their contact list and share their physical location to get a cup of coffee. I love coffee as much as the next person, but avoid the app (and maybe go to a local coffee shop), pay cash, and tip the barista well.

Privacy Postcards, or Poison Pill Privacy

10 min read

NOTE: While this is obvious to most people, I am restating this here for additional emphasis: this is my personal blog, and only represents my personal opinions. In this space, I am only writing for myself. END NOTE.

I am going to begin this post with a shocking, outrageous, hyperbolic statement: privacy policies are difficult to read.

Shocking. I know. Take a moment to pull yourself up from the fainting couch. Even Facebook doesn't read all the necessary terms. Policies are dense, difficult to parse, and in many cases appear to be overwhelming by design.

When evaluating a piece of technology, "regular" people want an answer to one simple question: how will this app or service impact my privacy?

It's a reasonable question, and this process is designed to make it easier to get an answer to that question. When we evaluate the potential privacy risks of a service, good practice can often be undone by a single bad practice, so the art of assessing risk is often the art of searching for the poison pill.

To highight that this process is both not comprehensive and focused on surfacing risks, I'm calling this process Privacy Postcards, or Poison Pill Privacy - it is not designed to be comprehensive, at all. Instead, it is designed to highlight potential problem areas that impact privacy. It's also designed to be straightforward enough that anyone can do this. Various privacy concerns are broken down, and include keywords that can be used to find relevant text in the policies.

To see an example of what this looks like in action, check out this example. The rest of this post explains the rationale behind the process.

If anyone reading this works in K12 education and you want to use this with students as part of media literacy, please let me know. I'd love to support this process, or just hear how it went and how the process could be improved

1. The Process


Collect some general information about the service under evaluation.

  • Name of Service:
  • Android App
  • Privacy Policy url:
  • Policy Effective Date:

App permissions

Pull a screenshot of selected app permissions from the Google Play store. The iOS store from Apple does not support the transparency that is implemented in the Google Play store. If the service being evaluated does not have a mobile app, or only has an iOS version, skip this step.

The listing of app permissions is useful because it highlights some of the information that the service collects. The listing of app permissions is not a complete list of what the service collects, nor does it provide insight into how the information is used, shared, or sold. However, the breakdown of app permissions is a good tool to use to get a snapshot of how well or poorly the service limits data collection to just what is needed to deliver the service.

Access contacts

Accessing contacts from a phone or address book is one way that we can compromise our own privacy, and the privacy of our friends, family, and colleagues. This can be especially true for people who work in jobs where they have access to sensitive information or priviliged information. For example, if a therapist had contact information of patients stored in their phone and that information was harvested by an app, that could potentially compromise the privacy of the therapist's clients.

When looking at if or how contacts are accessed, it's useful to cross-reference what the app permissions tell us against what the privacy policy tells us. For example, if the app permissions state that the app can access contacts and the privacy policy says nothing about how contacts are protected, that's a sign that the privacy policy could have areas that are incomplete and/or inadequate.

Keywords: contact, friend, list, access

Law enforcement

Virtually every service in the US needs to comply with law enforcement requests, should they come in. However, the languaga that a service uses about how they comply with law enforcement requests can tell us a lot about how a service's posture around protecting user privacy.

Additionally, is a service has no language in their terms about how they respond to law enforcement or other legal requests, that can be an indicator that the terms have other areas where the terms are incomplete and/or inadequate.

Keywords: legal, law enforcement, comply

Location information and Device IDs

As individual data elements, both a physical location and a device ID are sensitive pieces of information. It's also worth noting that there are multiple ways to get location information, and different ways of identifying an individual device. The easiest way to get precise location information is via the GPS functionality in mobile devices. However, IP addresses can also be mapped to specific locations, and a string of IP addresses (ie, what someone would get if they connected to a wireless network at their house, a local coffee shop, and a library) can give a sense of someone's movement over time.

Device IDs are unique identifiers, and every phone or tablet has multiple IDs that are unique to the device. Additionally, browser fingerprinting can be used on its own or alongside other IDs to precisely identify an individual.

The combination of a device ID and location provides the grail for data brokers and other trackers, such as advertisers: the ability to tie online and offline behavior to a specific identity. Once a data broker knows that a person with a specific device goes to a set of specific locations, they can use that information to refine what they know about a person. In this way, data collectors build and maintain profiles over time.

Keywords: location, zip, postal, identifier, browser, device, ID, street, address

Data Combined from External Sources

As noted above, if a data broker can use a device ID and location information to tie a person to a location, they can then combine information from external sources to create a more thorough profile about a person, and that person's colleagues, friends, and families.

We can see examples of data recombination in how Experian sorts humans into classes: data recombination helps them identify and distinguish their "Picture Perfect Families" from the "Stock cars and State Parks" and the "Urban Survivors" and the "Small Towns Shallow Pockets".

And yes, the company combining this data and making these classifications is the same company that sold data to an identity thief and was responsible for a breach affecting 15 million people. Data recombination matters, and device identifiers within data sets allow companies to connect disparate data sources into a larger, more coherent profile.

Keywords: combine, enhance, augment, source

Third Party Collection

If a service allows third parties to collect data from users of the service, that creates an opportunity for each of these third parties to get information about people in the ways that we have described above. Third parties can access a range of information (such as device IDs, browser fingerprints, and browsing histories) about users on a service, and frequently, there is no practical way for people using a service to know what third parties are collecting information, or how these third parties will use it.

Additionally, third parties can also combine data from multiple sources.

Keywords: third, third party, external, partner, affiliate

Social Sharing or Login

Social Sharing or Login, when viewed through a privacy lens, should be seen as a specialized form of third party data collection. With social login, however, information about a person can be exchanged between the two services, or taken from one service.

Social login and social sharing features (like the Facebook "like" button, a "Pin it" link, or a "Share on Twitter" link) can send tracking information back to the home sites, even if the share never happens. Solutions like this option from Heise highlight how this privacy issue can be addressed.

Keywords: login, external, social, share, sharing

Education-specific Language

This category only makes sense on services that are used in educational contexts. For services that are only used in a consumer context, this section might be superfluous.

As noted below, I'm including COPPA in the list of keywords here even though COPPA is a consumer law. Because COPPA (in the US) is focused on children under 13, there are times when COPPA connects with educational settings.

Keywords: parent, teacher, student, school, , family, education, FERPA, child, COPPA


Because this list of concerns is incomplete, and there are other problematic areas, we need a place to highlight these concerns if and when they come up. When I use this structure, I will use this section to highlight interesting elements within the terms that don't fit into the other sections.

If, however, there are elements in the other sections that are especially problematic, I probably won't spend the time on this section.

Summary of Risk

This section is used to summarize the types of privacy risks associated with the service. As with this entire process, the goal here is not to be comprehensive. Rather, this section highlights potential risk, and whether those risks are in line with what a service does. IE, if a service collects location information, how is that information both protected from unwarranted use by third parties and used to benefit the user?

2. Closing Notes

At the risk of repeating myself unnecessarily, this process is not intended to be comprehensive.

The only goal here is to streamline the process of identify and describing poison pills buried in privacy policies. This method of evaluation is not thorough. It will not capture every detail. It will even miss problems. But, it will catch a lot of things as well. In a world where nothing is perfect, this process will hopefully prove useful.

The categories listed here all define different ways that data can be collected and used. One of the categories explicitly left out of the Privacy Postcard is data deletion. This is not an oversight; this is an intentional choice. Deletion is not well understood, and actual deletion is easier to do in theory than in practice. This is a longer conversation, but the main reason that I am leaving deletion out of the categories I include here is that data deletion generally doesn't touch any data collected by third party adtech allowed on a service. Because of this, assurances about data deletion can often create more confusion. The remedy to this, of course, is for a service to not use any third party adtech, and to have strict contractual requirements with any third party services (like analytics providers) that restrict data use. Many educational software providers already do this, and it would be great to see this adopted more broadly within the tech industry at large.

The ongoing voyage of MySpace data - sold to an adtech company in 2011, re-sold in 2016, and breached in 2016 - highlights that data that is collected and not deleted can have a long shelf life, completely outside the context in which it was originally collected.

For those who want to use this structure to create your own Privacy Postcards, I have created a skeleton structure on Github. Please, feel free to clone this, copy it, modify it, and make it your own.

Dark Patterns when Deleting an Account on Facebook

3 min read

By default, Facebook makes it more complicated than it needs to be to delete an account. Their default state is to have an account be deactivated, but not deleted.

However, both the deactivation and deletion process can be undone if a person logs back into Facebook.

To make matters worse, to fully delete an account, a person needs to make a separate request to Facebook to start the account deletion process. Facebook splits the important information across two separate pages, which further complicates the process of actually deleting an account. The main page for deleting an account has some pretty straightforward language.

However, this language is undercut by the information on the . page that describes the difference between deactivating and deleting an account.

Some key details from the second page that are omitted from the main page on deleting an account include this gem:

We delay deletion a few days after it's requested. A deletion request is cancelled if you log back into your Facebook account during this time.

This delay is critical, and the fact that it can be undone is also something that needs additional attention.

Facebook further clarifies what they consider "logging in" in a third, separate page, where they describe deactivating an account.

If you’d like to come back to Facebook after you’ve deactivated your account, you can reactivate your account at anytime by logging in with your email and password. Keep in mind, if you use your Facebook account to log into Facebook or somewhere else, your account will be reactivated.

While Facebook's instructions aren't remotely as clear as they should be, the language they use here implies that an account deletion request can be undone if a person logs in (or possibly just uses a service with an active Facebook login) at any point during the "few days" after a person has requested their account deletion. It's also unclear what this means if someone logs into Messenger. And, of course, the avcerage person will never know that their Facebook account hasn't been deleted because they won't be going back to Facebook to check.

My recommendations here for people looking to leave Facebook:

  • First, identify any third party services where you use Facebook login. If possible, migrate those accounts to a separate login unconnected from Facebook.
  • Second, delete the Facebook app from all mobile devices.
  • Third, using the web UI on a computer, request account deletion from within Facebook.
  • Fourth, install an ad blocker so Facebook has a harder time tracking you via social share icons.

Facebook, Cambridge Analytica, Privacy, and Informed Consent

4 min read

There has been a significant amount of coverage and commentary on the new revelations about Cambridge Analytica and Facebook, and how Facebook's default settings were exploited to allow personal information about 50 million people to be exfiltrated from Facebook.

There are a lot of details to this story - if I ever have the time (unlikely), I'd love to write about many of them in more detail. I discussed a few of them in this thread over on Twitter. But as we digest this story, we need to move past the focus on the Trump campaign and Brexit. This story has implications for privacy and our political systems moving forward, and we need to understand them in this broader context.

But for this post, I want to focus on two things that are easy to overlook in this story: informed consent, and how small design decisions that don't respect user privacy allow large numbers of people -- and the systems we rely on -- to be exploited en masse.

The following quote is from a NY Times article - the added emphasis is mine:

Dr. Kogan built his own app and in June 2014 began harvesting data for Cambridge Analytica. The business covered the costs — more than $800,000 — and allowed him to keep a copy for his own research, according to company emails and financial records.

All he divulged to Facebook, and to users in fine print, was that he was collecting information for academic purposes, the social network said. It did not verify his claim. Dr. Kogan declined to provide details of what happened, citing nondisclosure agreements with Facebook and Cambridge Analytica, though he maintained that his program was “a very standard vanilla Facebook app.”

He ultimately provided over 50 million raw profiles to the firm, Mr. Wylie said, a number confirmed by a company email and a former colleague. Of those, roughly 30 million — a number previously reported by The Intercept — contained enough information, including places of residence, that the company could match users to other records and build psychographic profiles. Only about 270,000 users — those who participated in the survey — had consented to having their data harvested.

The first highlighted quotation gets at what passes for informed consent. However, in this case, for people to make informed consent, they had to understand two things, neither of which are obvious or accessible: first, they had to read the terms of service for the app and understand how their information could be used and shared. But second -- and more importantly -- the people who took the quiz needed to understand that by taking the quiz, they were also sharing personal information of all their "friends" on Facebook, as permitted and described in Facebook's terms. This was a clearly documented feature available to app developers that wasn't modified until 2015. I wrote about this privacy flaw in 2009 (as did many other people over the years). But, this was definitely insider knowledge, and the expectation that a person getting paid three dollars to take an online quiz (for the Cambridge Analytica research) would read two sets of dense legalese as part of informed consent is unrealistic.

As reported in the NYT and quoted above, only 270,000 people took the quiz for Cambridge Analytica - yet these 270,000 people exposed 50,000,000 people via their "friends" settings. This is what happens when we fail to design for privacy protections. To state this another way, this is what happens when we design systems to support harvesting information for companies, as opposed to protecting information for users.

Facebook worked as designed here, and this design allowed the uninformed decisions of 270,000 people to create a dataset that potentially undermined our democracy.

Spectre, Meltdown, Adtech, Malware, and Encryption

2 min read

This week has seen a lot of attention paid to Spectre and Meltdown, and justifiably so. Get the technical details here:

These issues are potentially catastrophic for cloud providers (see the details in the articles linked above) but they can also affect regular users on the web. While there are updates available for browsers that mitigate the attack, and updates available for most major operating systems, updates only work when they are applied, which means that we will almost certainly see vulnerable systems into the foreseeable future.

I was very happy to see both Nicholas Weaver and Zeynep Tufekci addressing the connection between these vulnerabilities and adtech. 

Adtech leaves all internet users exposed to malware - it has for a while, and, in its current form, adtech exposes us to unneeded risk (as well as compromising our privacy). This risk is increased because many commonly used adtech providers do not support or require encryption.

To examine traffic over the web, use an open source tool like OWASP ZAP. If you are running a Linux machine, routing traffic through your computer and OWASP ZAP is pretty straightforward if you set your computer up to act as an access point

But, using these basic tools, it's simple to see how widespread the issue of unencrypted adtech actually is, in both web sites and mobile applications (on a related note, some mobile apps actually get their content via an unencrypted zip file. You read that correctly - the expected payload is an unencrypted zip file. That's a topic for a different post, and I'm not naming names, but the fact that this is accepted behavior within app stores in 2018 should raise some serious questions).

The unencrypted adtech includes javascript sent to the browser or the device. Because this javascript is sent unencrypted over the network, intercepting it and modifying it would be pretty straightforward, which exposes people to increased risk. 

The next time you are in a coffee shop and see a kid playing a game on their parent's device while the parent talks with a friend, ask yourself: is that kid playing an online game, or downloading malware, or both? Because so much adtech is sent unencrypted, anything is possible.