PVC Shelters

5 min read

As summer faded into fall, and with the pandemic still here, I was sorting out how to maintain healthy physical distance while hanging out outdoors with people. Because weather can get unpredictable, I wanted a way to keep people dry in case the weather turned, without embarking on a huge construction project.

Enter PVC! It's relatively inexpensive, and can be used to do many things, including building a portable rain shelter.

I can say from personal experience that it works reasonably well. If you have improvements, suggestions, or other thoughts, please hit me up on the bird site.

Required materials for a shelter that is 10 feet wide, 7+ feet high, and just over 5 feet deep (ie, a good shelter for 2-4 people who are in a pod) include:

  • PVC to build the frame;
  • a tarp to keep the rain out;
  • a means of attaching the PVC frame to the ground.

Optional materials include parachute cord, bungee cord and tent stakes.

With this overview in place, here are the precise materials I used for the structure.

PVC frame:

  • Walls: 6 pieces, 7 feet
  • Top crossbeams: 3 pieces, 64 1/2 inches
  • Top front and back brackets: 4 pieces, 5 feet
  • Arched roof: 10 pieces, 34 5/8 inches

These pieces can be cut from 11 10' pieces of PVC pipe. I used 1" diameter pipe, and all fittings in this writeup are also for 1" pipe. Make sure your fittings match the diameter of your pipe!
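As a cross-check on the shopping list, here is a rough Python sketch that packs the cut list above into 10 foot (120 inch) lengths with a simple first-fit-decreasing pass. The function name `pack_cuts` and the greedy strategy are illustrative assumptions, not the method used for the original build, and the sketch ignores the small amount of material lost to each saw cut.

```python
# Hypothetical cut-planning sketch: greedy first-fit-decreasing packing.
STOCK_INCHES = 120.0  # one 10 foot length of PVC

# (label, length in inches, quantity), taken from the frame list above
CUT_LIST = [
    ("wall", 84.0, 6),
    ("top crossbeam", 64.5, 3),
    ("front/back bracket", 60.0, 4),
    ("roof piece", 34.625, 10),
]


def pack_cuts(cut_list, stock=STOCK_INCHES):
    """Return a list of pipes; each pipe is a list of (label, length) cuts."""
    pieces = [
        (label, length)
        for label, length, quantity in cut_list
        for _ in range(quantity)
    ]
    pieces.sort(key=lambda piece: piece[1], reverse=True)  # longest first

    pipes = []
    for label, length in pieces:
        for pipe in pipes:
            # Put the piece on the first pipe that still has room for it.
            if sum(cut for _, cut in pipe) + length <= stock:
                pipe.append((label, length))
                break
        else:
            # No existing pipe has room, so start a new one.
            pipes.append([(label, length)])
    return pipes


plan = pack_cuts(CUT_LIST)
for number, pipe in enumerate(plan, start=1):
    used = sum(cut for _, cut in pipe)
    labels = ", ".join(label for label, _ in pipe)
    print(f"pipe {number}: {labels} ({STOCK_INCHES - used:.3f} in left over)")
print(f"{len(plan)} lengths of pipe")
```

Run on the cut list above, this greedy pass uses 12 lengths rather than 11: the two pipes that each yield two 5 foot pieces have no offcut left, so only nine of the ten roof pieces fit into leftovers. It may be worth budgeting for one spare length.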

Connecting pieces:

  • 4 3-way elbows for the corners
  • 2 T connectors for the center of the structure

Roof connectors:

  • 5 90 degree elbows
  • 10 slide Ts

Cover:

Either an 8x10 or a 10x12 foot tarp. The 10 foot width is important because it fits cleanly over the top of the PVC frame.

Foundation:

For the foundation, I used ready mix concrete and rebar. I found that 4 foot rebar provides a little more stability than 2 foot, but both work really well. One consideration for the foundation pieces is that they should all be roughly the same height. Mine are all roughly six inches high, which means that the height of the structure will be 7 and a half feet (7 foot PVC, plus 6 inches foundation).

Optional Material:

Because I wanted a structure I could assemble and disassemble, I didn't glue the pieces together. However, I did want some extra stability, so I use lengths of parachute cord connected to bungee cord to keep everything connected and provide extra resilience against gusting wind. Additionally, if you want to leave the structure up for several days, using tent stakes and parachute cord to stake the structure down provides still more support.

Assembly

To begin, lay out the foundation pieces. The six foundation pieces should form a rectangle roughly 10 feet by five and a half feet. Then, slide the 7 foot lengths of PVC over the rebar.

Once you have the foundation and 7 foot lengths in place, attach the upper connectors. If you have a step ladder, that will be helpful, but a stable chair is just as effective. The three 64 1/2 inch lengths connect along the side, while the four 5 foot lengths connect along the front and back of the shelter.

Before you put the five foot lengths in place, make sure that you have slid the slide Ts onto them. They are necessary for the roof.

Slide Ts in place on the 5 foot length of PVC

Once you have all "ceiling" pieces connected, put the roof in place. Use the 90 degree elbows to make 5 arched brackets from the 34 5/8 inch lengths of PVC.

roof supports

FAQ: why the weird lengths for these things? Great question! As you might have noticed, PVC pipe sells in 10 foot lengths. Let's say you wanted to reduce the number of cuts you needed to make and build a 10x10 structure. To make a roof for a 10x10 structure, you would connect two 64 1/2 inch lengths of PVC using the 90 degree angle connector. So, in the interest of reusing these pieces in different structures over time, I made the arbitrary call to make the depth of the smaller structure 64 1/2 inches, which in turn means that the roof brackets need to be 34 5/8 inches. The geometry of my measurements can definitely be fine tuned, but these measurements are what worked for me as I was building these structures.

Slide the angled roof supports into place, making sure to spread them evenly across the 10 foot length. Based on experimentation, I have found that 5 supports across a 10 foot length provide good support and protection even during heavy rain. If you want one support every two feet, you will need 6 brackets.

Now, the skeleton of the structure is in place. As noted earlier, these shelters are designed to be assembled, disassembled, and reused, so none of these pieces are glued together. If you want to add some extra stability without using glue, it's pretty straightforward using parachute cord, with or without bungee cords.

Two 12-13 foot lengths of parachute cord can be used along the front and back, and two 6-7 foot lengths can be used along the sides. Using bungee cords can make it easier to secure the parachute cord.

While using parachute cord to reinforce the connections isn't essential, it does add some additional stability which can come in handy on windy days.

Now that the roof is in place and the skeleton has been reinforced with parachute cord, one step remains: attaching the tarp to protect against rain or snow. Using an 8 by 10 or a 10 by 12 foot tarp simplifies this process because the 10 foot side fits cleanly over the width of the structure. Use bungee cords or parachute cord to attach the tarp, and you are ready to go.

Assembled PVC shelter

If the structure is going to stay up for a few days or longer, I recommend tying the corners down using parachute cord and some spare tent stakes (which you can get at a hardware store or an outdoor supply store).

All of the raw materials used to build this structure are available at many hardware stores, or via places like Home Depot.

Thinking Out Loud Here

1 min read

I'm curious how these thoughts will age, so I'll put them here as a timestamped point of reference.

I've been thinking about the frame of disinformation as a "demand problem" lately - as in the problem of disinformation and misinformation exists because there are people supplying the lies, and also because there are people who are eager to consume those lies.

There is some accuracy in this, of course - it feels approximately similar to "if a tree falls in the forest and no one hears it, does it make a sound?"

But I don't see this as a "demand problem" - people don't consume disinformation because they want to be lied to. I see this as more of a junk food problem - disinformation and misinformation are not good for us, but they are easy and they taste so good.

And it's great until Democracy dies of a heart attack and diabetes.

I'm certain that someone else has already articulated this idea - please feel free to point me to prior work here.

A Short Reading List on Requiring Cameras During Remote Learning

2 min read

This is a short reading list about why requiring video in classrooms harms students and erodes both the trust between students and teachers and the quality of the learning environment. This list also includes resources for building a fair and equitable learning environment, whether fully online, hybrid, or in person.

As an added bonus: in general, practices that respect learner agency and control contribute to a better learning environment, and are better for privacy.

  • Why video is more exhausting than in person discussion.
  • Suggestions for creating equitable and inclusive learning environments from Stanford. This doc isn't perfect, but it's a start, and while it's focused on online learning, many of the suggestions work equally well in person.
  • This piece summarizes how requiring cameras to be on for remote learners violates some of the basic things we know about trauma-aware teaching and universal design for learning.
  • The switch to remote learning -- a larger discussion that includes but is not limited to video -- often fails to meet basic accessibility needs. The requirement of camera use emphasizes sight, which embeds some ableist assumptions about what can actually be "seen" and how that "seeing" occurs.
  • I was interviewed for this piece on Good Morning America - it highlights some of the equity and privacy issues related to video in the classroom, and requiring videos to be on.
  • This infographic from Torrey Trust, Ph.D. shows some helpful do's and don'ts for using video with learners.

Additions that people shared after this post was initially published:


This list is short and incomplete! If there are other resources you think should be added, please share them with me on Twitter, and I'll add them in as time allows. Any added posts will include a credit to the person sharing it.

Making Sure Things Work

2 min read

Over the weekend, I pulled together some recommendations on how to protect privacy while working from home, and possibly sharing a computer with one or more people. However, after writing the recommendations, I was curious whether, and how, the differences could be quantified. To get a sense of this, I tested four different scenarios:

  • De-Tuned Firefox
  • Tuned Firefox, no uBlock Origin
  • Tuned Firefox with uBlock Origin
  • Chrome, set to defaults

To run the test, I needed four sites that are crawling with trackers. Unfortunately, the web has no shortage of sites that are overrun with trackers.

For this test, I chose:

  • Weather dot Com
  • WebMD
  • HuffingtonPost
  • Breitbart

This test was pretty simple -- I visited the home page of each site, and scrolled to the footer. Then, I went to the next site, and repeated until I had visited all four sites. The order was the same for each test.

Web traffic was intercepted and observed using an intercepting proxy.

The results showed some clear differences.

  • De-Tuned Firefox - 157 calls to different domains
  • Tuned Firefox, no uBlock Origin - 42 calls to different domains
  • Tuned Firefox with uBlock Origin - 22 calls to different domains
  • Chrome, set to defaults - 192 calls to different domains
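For anyone who wants to reproduce a count like this from their own proxy export, here is a minimal Python sketch. It assumes the proxy can export one request URL per entry and that "different domains" means distinct hostnames; the helper name `distinct_hosts` and the sample URLs are made up for illustration and are not taken from the published dataset.

```python
from urllib.parse import urlsplit


def distinct_hosts(urls):
    """Return the set of distinct hostnames seen in a list of request URLs."""
    hosts = set()
    for url in urls:
        host = urlsplit(url).hostname  # None when the URL has no host part
        if host:
            hosts.add(host.lower())
    return hosts


# Made-up sample data standing in for a proxy export.
sample_requests = [
    "https://www.example.com/",
    "https://cdn.example.com/app.js",
    "https://tracker.example.net/pixel?id=1",
    "https://tracker.example.net/pixel?id=2",
]

hosts = distinct_hosts(sample_requests)
print(f"{len(hosts)} distinct hosts: {sorted(hosts)}")
```

Counting hostnames treats `cdn.example.com` and `www.example.com` as different domains. Grouping by registrable domain instead would give lower numbers, so whichever rule you pick, apply it to every browser being compared.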

Not all of the third party domains called were explicitly about ad tracking, but it's worth noting that the sites were just as functional using Firefox with uBlock Origin -- which communicated with 22 different domains -- as they were when using default Chrome, which sent information to 192 different domains.

My takeaway from this: these limited, simple tests suggest that Chrome's defaults do little to nothing to minimize the number of calls to third party web sites, and protect users from tracking. Even a detuned version of Firefox -- where the defaults were adjusted to allow more trackers through -- was more effective than default Chrome. The steps outlined in my earlier post on browser hygiene -- and in particular, using uBlock Origin -- offer good protection from tracking.

The dataset generated from the tests is available on GitHub.

Browser Hygiene for Better Privacy - Think of it Like Washing Your Hands Online!

6 min read

This post covers some of the basics of keeping the online components of your work life (or your school life) separate from your personal life. This split was good practice before Covid19, but now that we are all spending more time online -- for school, work, social interactions, shopping, news, entertainment, etc -- keeping our personal lives separate from our school/work lives is an important part of protecting our privacy.

The steps in this post won't block all tracking, but they will minimize risk and minimize exposure. The advice and steps laid out in this post are all available free of charge. In one place, I recommend a password manager that has a subscription fee, but I also include a free option.

This post does not cover choosing a VPN. VPNs are a key component of both privacy and security. My one main piece of advice with regard to VPNs is to NEVER use a free VPN because they make money by exploiting their users. My second piece of advice regarding VPNs is to point people to https://thatoneprivacysite.net. The information on choosing a VPN helps provide context about things to consider when using a VPN.

The instructions in the post are split into three sections:

  • Setting up the Profile in Firefox
  • Configuring the Browser Settings
  • Additional Protections

Setting up the Profile in Firefox

Open Firefox. If you don't have Firefox installed on your computer, get it here.

1. Enter "about:profiles" in the address bar. Click "Create New Profile".

Start to create a new profile

2. Click "Next" to navigate past the informational dialog text.

Skip the chitchat

3. Give your new profile a distinctive name, and click "Finish".

Name your new profile

4. Find the profile in the list, and click "Launch profile in new browser"

Launch profile in new browser

Congratulations! You now have a clean and fresh profile! The next section covers how to set it up for increased privacy protection.
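If you would rather script the profile setup than click through "about:profiles", the steps above can be approximated from the command line. The Python sketch below only builds and (optionally) runs the commands; it assumes a `firefox` binary on your PATH and relies on Firefox's `-CreateProfile`, `-P`, and `-no-remote` command line options. The helper names and the profile name "work" are hypothetical.

```python
import subprocess


def profile_commands(profile_name, firefox_binary="firefox"):
    """Build the command lines that create and then launch a named profile."""
    create = [firefox_binary, "-CreateProfile", profile_name]
    # -no-remote starts a separate instance instead of reusing a running one.
    launch = [firefox_binary, "-P", profile_name, "-no-remote"]
    return create, launch


def create_and_launch(profile_name, firefox_binary="firefox"):
    """Create the profile, then open a browser window that uses it."""
    create, launch = profile_commands(profile_name, firefox_binary)
    subprocess.run(create, check=True)  # raises if Firefox reports an error
    return subprocess.Popen(launch)     # leave the browser running


# Example (not run automatically): create_and_launch("work")
create_cmd, launch_cmd = profile_commands("work")
print(" ".join(create_cmd))
print(" ".join(launch_cmd))
```

On some systems the binary has a different name or path, so treat `firefox_binary` as something to adjust.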

Configuring the Browser Settings

1. In the Address Bar, enter "about:preferences#privacy" - this will allow you to adjust some privacy settings. If a setting isn't mentioned, you can leave it at the default setting.

General privacy settings

2. At "Cookies and Site Data" - select "Delete cookies and site data when Firefox is closed". This will wipe out tracking cookies when you close the browser, but it will also wipe out your logins so you will need to login each time. This is less convenient, but on a shared computer it prevents someone who isn't you from accessing sites where you have logged in.

Cookies and site data

3. For "Logins and Passwords" - de-select "Ask to save logins and passwords for websites" and "Show alerts about passwords for breached websites".

Logins and Passwords

Later in this post, we'll cover getting a good password manager.

The alerts feature for breached websites is powered by a great website, Have I been pwned. I strongly recommend signing up for an account on this site at https://haveibeenpwned.com/

4. In the "History" settings, selecting "Never Remember History" can bring additional privacy benefits, but a lot of people like the benefit of having the browser remember their history. If you choose to have the browser remember your history, you should clear your browsing history weekly using the "Clear history" button.

Don't remember browser history. World history? Remember that.

5. In "Address Bar" - de-select all options. While these suggestions could all be processed locally, I recommend erring on the side of caution. In the future, I might test this option by monitoring network traffic but I haven't done that yet.

Suggestions

6. In "Permissions" - for "Location" click the "Settings" button, and then select the "Block all requests for location" checkbox. For the other options here, you can block or leave open at your discretion. Firefox generally does a decent job of alerting you when an app or site asks to access your camera or microphone (for example, when you want to join a web based videoconference, you will need to provide access to your camera and microphone).

Location - hard no.

7. Under "Firefox Data Collection and Use" - de-select all options.

Just say no to data collection. FFS. No.

8. In the address bar, enter "about:preferences#search". Choose "DuckDuckGo" as your default search engine.

default search engine

9. In "Search Suggestions" - deselect all options. This prevents keystrokes from being sent to any search engine when you enter your search terms in the address bar.

Search suggestions

10. In the address bar, enter "about:preferences#home". For "Homepage and new windows" and "New tabs" select "Blank page" as an option.

Homepage and new windows

Your profile is now set with some additional privacy protection.
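Some of the settings above can also be pinned in a `user.js` file placed in the profile folder, which Firefox reads at startup. The Python sketch below writes out such a file. The preference names are an assumed mapping from the settings screens to `about:config` entries, and names and behavior can change between Firefox versions, so check each one in `about:config` before relying on it. The cookie deletion and default search engine steps are deliberately left out because their preferences vary more between versions.

```python
import json

# Assumed mapping of the steps above to about:config preferences.
PREFS = {
    "signon.rememberSignons": False,                    # step 3: don't offer to save logins
    "browser.urlbar.suggest.history": False,            # step 5: address bar suggestions
    "browser.urlbar.suggest.bookmark": False,
    "browser.urlbar.suggest.openpage": False,
    "permissions.default.geo": 2,                       # step 6: 2 = block location requests
    "datareporting.healthreport.uploadEnabled": False,  # step 7: data collection
    "browser.search.suggest.enabled": False,            # step 9: search suggestions
    "browser.startup.homepage": "about:blank",          # step 10: blank home page
    "browser.newtabpage.enabled": False,                # step 10: blank new tabs
}


def render_user_js(prefs):
    """Render preferences as user_pref("name", value); lines for user.js."""
    lines = []
    for name, value in prefs.items():
        # json.dumps yields false/true, bare numbers, and quoted strings,
        # which matches the literal syntax user.js expects for these values.
        lines.append(f'user_pref("{name}", {json.dumps(value)});')
    return "\n".join(lines) + "\n"


user_js = render_user_js(PREFS)
print(user_js)
```

Save the output as `user.js` in the profile's root directory (listed on the "about:profiles" page) and restart Firefox.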

Additional Protections

1. Now that the browser is set up to run cleanly, we want to add an extra layer of protection against tracking. While there are a range of options that exist, uBlock Origin provides a good balance between protection and usability. You can get the Firefox Extension here: https://addons.mozilla.org/en-US/firefox/addon/ublock-origin/

If you have chosen the option to "Never Remember History" you will need to select the "Allow to run in Private Windows" option to complete the install.

Allow to run in Private Windows

You will know that the install is successful when you see the uBlock Origin icon in the top right of your browser window.

uBlock Origin logo - installed

2. The final step we will include here is getting a password manager. If you want a web-based password manager you can use across your computer and phone, 1Password is a good option. They offer individual and family plans, and their subscription rate is reasonable. https://1password.com

If you only need a password manager that works on your computer, KeePassXC is a great option. It's open source, free, mature, and stable. Get it here: https://keepassxc.org/

Conclusion

With these steps in place -- a distinct browser profile for work and school, some tuned settings in the browser to increase protection, and some ad blocking paired with a password manager -- you have made some real improvements in safeguarding your privacy. The first few days you use this setup, it might feel awkward. That's okay - it's a new way of working, and change generally feels awkward.

Stick with it. As the steps become familiar, this way of working will become second nature -- and that's a skill you will need after the pandemic is over. It's not like adtech and the other companies that track us are going away anytime soon.

Maybe It Isn't a Great Idea to Outsource Public Education to Private Companies

2 min read

As the rapid switch to online learning has made abundantly clear, K12 schools in the United States need learning management systems, student information systems, and videoconferencing to function. It was pretty obvious before, but school during Covid19 has brought even more focus on the infrastructure that makes school possible. Learning management systems are the mediator between students and the work they are assigned; video conferencing (used well) allows students to connect with one another, and with their teachers, when a concept is better explained as a group, or when some community bonding is needed to maintain cohesion within the course.

How many public schools in the US rely on proprietary (closed source) software supplied by private companies to run their student information system and learning management system?

How many schools use videoconferencing solutions provided by a for profit vendor?

Let's be clear about this: the glue holding the required infrastructure of our public education system together is owned by private companies. The learning management systems, the student information systems, the videoconferencing tools -- the most widely used systems are owned by private companies, and these private companies are paid with public dollars.

The privacy issues that plague K12 education exist for many reasons, but the central role played by for profit companies collecting data from K12 students as these students engage in their legally required public education is near the top of the list.

The observations in this post aren't new, but against the backdrop of a pandemic we should take careful note: the public education infrastructure is largely run by private companies with an obligation to shareholders first, school customers second, and students somewhere after that. It doesn't need to be this way; education shouldn't need to go hat in hand to private companies to have basic needs met, but here we are. Once we are through the worst of Covid19 (realistically, when we have a working vaccine that is widely accessible) we should re-evaluate a lot of assumptions that have shaped our educational system. Hopefully, our habit of outsourcing public education to private companies will be among the many items that get improved.

What Shows Up In Facebook's Ad Library Anyways?

6 min read

The Facebook Ad Library is part of Facebook's effort at increasing their transparency around political ads.

This post is going to ignore the myriad usability issues with the Ad Library, and focus on a more fundamental, but less visible question: what exactly can we see in the Ad Library anyways?

To start, we'll look at this overview page about the Ad Library. The second paragraph of this descriptive page contains this fairly specific description of what is covered in the Facebook Ads Archive:

The Ad Library contains all active ads running across our products. Transparency is a priority for us to help prevent interference in elections, so the Ad Library offers additional information about ads about social issues, elections or politics, including spend, reach and funding entities. These ads are visible whether they're active or inactive and will be stored in the Ad Library for seven years.

This description makes it clear that all active ads are in the Ad Library, and that "additional information" is available for ads "about social issues, elections or politics". The language in this description -- "These ads are visible whether they're active or inactive" -- is less than clear, primarily because of the unclear reference of "these."

The Facebook page describing the Ads Archive also makes it clear that keyword search only works on ads that have been categorized as being about social issues, elections, or politics.

Ads that aren't about social issues, elections or politics will only be discoverable through visiting a Page in the Ad Library and will not surface in keyword searches.

We will return to the subject of keyword searches later in this post.

Over the weekend, Rob Leathern -- a Director of Product at Facebook -- responded to questions from two journalists, Brandy Zadrozny and Shoshana Wodinsky. The conversation was originally about the overlaps between boosted posts and ads, and in the ensuing conversation, Leathern added some details about how the Ad Library works, and about some things that the Ad Library omits.

In response to several questions, Leathern provided a clarification that should be added to the About the Ad Library page. In this Twitter conversation, Leathern appears to be very clear that, while all active ads are present in the Ad Library, only ads that are explicitly tagged as about social issues, elections, or politics will be stored in the archive after the ads are no longer active.

Ad Library

This clarification, while informative, raises the possibility of some clear and obvious loopholes, which prompted me to ask for some additional clarification -- because based on Leathern's description, it seems incredibly simple to avoid the additional review that is directed at political ads.

clarification

At this point in the post, I want to take a step back and highlight that I am sincerely appreciative of Rob Leathern's willingness to engage at all. This conversation took place on a weekend, and he is under no obligation (that I know of) to engage with anyone on Twitter about anything, including political ads. I see his willingness to answer questions as an act of good faith, and I appreciate his time and openness.

With that said, the current functionality of the Ad Library ensures that bad actors can operate with relative freedom. Leathern describes this as a "'tree falls in the woods' variety: if nobody knows it is a political ad, obviously it can’t be labeled and archived right?"

However, anything can be labelled and archived. Bad actors engaging in disinformation are not looking to work within the system, and they won't be kind enough to willingly label their posts accurately. This is where even a basic feature like keyword search across all active ads would be helpful - but as noted above, keyword search only works on ads that are labelled as about social issues, elections or politics.

Because unlabelled ads disappear from the archive when they stop running, this means that political posts from bad actors disappear from public view almost immediately. Additionally, because unlabelled posts are invisible to keyword search, the process of finding them in real time is essentially blind luck: either a person is served an ad when they are logged in, or they happen to stumble over a page promoting political posts.

At this point, it's not clear (to me, anyways) what percentage of past ads are available in the Ad Library. However, based on these descriptions, it's highly likely that many successful misinformation or disinformation campaigns are completely hidden from public view because Facebook is making an intentional choice to drop ads from view immediately after they stop running. A bad actor could minimize scrutiny simply by running ads for short durations. For operations focused on vote suppression, small numbers of tightly focused ads (content, demographic makeup, and geographic region) running for brief periods could possibly be both devastatingly effective, and largely invisible in the Ad Library. It's not like the dates of the US Elections are secret; a nation state actor or a political operative would have no problems creating dummy pages years in advance to use when needed.

Facebook has internal teams dedicated to fighting misinformation, and these teams also do some work with outside experts, and what I am describing is almost certainly not news to anyone doing misinformation or disinformation work inside or outside Facebook. However, this work is largely invisible to the vast majority of people outside Facebook. Facebook could increase transparency, and improve the usefulness of the Ad Library by taking the following steps:

0. Continue to archive all political ads for 7 years.
1. Expand the archive to include all ads in the Ad Library for somewhere between 6-12 months after they have stopped running.
2. Extend keyword search to all ads in the Ad Library.
3. Allow retroactive tagging of ads (i.e., an ad can be flagged as a political ad even after it has run).
4. Publish a rough percentage of the number of political ads, social issue ads, and election ads relative to the overall number of ads.

There are myriad other usability issues with the Ad Library, but steps 0-4 listed above would at least provide consistent and comprehensible results for external researchers looking to understand misinformation and disinformation within Facebook, Instagram, and Messenger.
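To make the difference between the current behavior and steps 0-1 concrete, here is a small Python model of the retention rules as described in this post. It is a sketch of the described behavior, not Facebook's implementation: the `Ad` class, the function name, the 365 day figure for the proposed window (the upper end of the 6-12 month range), and the approximation of seven years as 7 x 365 days are all assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

POLITICAL_RETENTION = timedelta(days=7 * 365)     # "seven years", approximated
PROPOSED_GENERAL_RETENTION = timedelta(days=365)  # assumed upper end of 6-12 months


@dataclass
class Ad:
    labelled_political: bool
    stopped_on: Optional[date]  # None while the ad is still running


def visible_in_library(ad, today, general_retention=timedelta(0)):
    """Return True if the ad would still be viewable on the given day.

    general_retention is how long unlabelled ads are kept after they stop;
    the default of zero models the current behavior described in this post.
    """
    if ad.stopped_on is None:
        return True  # all active ads are in the library
    age = today - ad.stopped_on
    if ad.labelled_political:
        return age < POLITICAL_RETENTION
    return age < general_retention


today = date(2020, 1, 15)
short_unlabelled_ad = Ad(labelled_political=False, stopped_on=date(2020, 1, 10))

print(visible_in_library(short_unlabelled_ad, today))
print(visible_in_library(short_unlabelled_ad, today, PROPOSED_GENERAL_RETENTION))
```

In this model, an unlabelled ad that stopped running five days ago is gone under the current rules and still visible under the proposed window.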

Update on "Personal Email, School-Required Software, and Ad Tracking"

4 min read

I just re-ran the scan that, earlier this week, found what appeared to be advertising-related tracking in Canvas when a student logged in to Canvas after logging in to a personal GMail account.

The latest round of tests showed very different behavior: the tracking that was observed in the earlier tests is not present in the more recent tests. This change appears to have happened since I put out my original blog post approximately 36 hours ago. The technical details are in my original writeup (linked above), but the short version:

  • In the original scan, after logging into Canvas, there were two endpoints connected via redirects: "google.com/ads" and "stats.g.doubleclick.net". Calls to these endpoints appeared to map cookie IDs set for advertising to Canvas's Google Analytics ID.
  • In the original scan, after logging into Canvas, these endpoints were called multiple times (at least three times each over approximately 90 seconds of browsing).
  • In the most recent scan, after logging into Canvas, using an identical script to the original scan, these endpoints are not called at all, and the related cookie IDs do not appear.
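A comparison like the one above can be sketched in a few lines of Python, assuming each scan has been exported from the proxy as a list of request URLs. The function names and sample URLs below are made up for illustration, and matching "google.com/ads" against both `google.com` and `www.google.com` is an assumption about how those requests appear in the capture.

```python
from collections import Counter
from urllib.parse import urlsplit


def classify(url):
    """Return a label if the URL hits one of the two watched endpoints."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == "stats.g.doubleclick.net":
        return "stats.g.doubleclick.net"
    # Assumption: the ads endpoint may be served with or without "www.".
    if host in ("google.com", "www.google.com") and parts.path.startswith("/ads"):
        return "google.com/ads"
    return None


def count_watched(urls):
    """Count calls to the watched endpoints in one scan's request list."""
    return Counter(label for label in map(classify, urls) if label)


# Made-up URLs standing in for two proxy exports.
original_scan = [
    "https://school.example.edu/dashboard",
    "https://stats.g.doubleclick.net/example?id=1",
    "https://www.google.com/ads/example?id=1",
    "https://stats.g.doubleclick.net/example?id=2",
]
latest_scan = [
    "https://school.example.edu/dashboard",
    "https://www.google.com/search?q=homework",
]

print("original:", dict(count_watched(original_scan)))
print("latest:  ", dict(count_watched(latest_scan)))
```

An empty result for the later scan only shows that these two endpoints were not called during that run; it says nothing about why.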

Fixed?

Viewed through a privacy lens, the removal of the cookie mapping is a good thing. It's an interesting shift, and raises a few questions and possibilities. I will attempt to include as many of these as possible, even options that are fairly unlikely.

  1. the fix for the issue I flagged in my post was already in the development pipeline and was deployed yesterday right on schedule;
  2. the ID mapping was part of a larger strategic plan and was removed intentionally;
  3. the ID mapping was in place as a result of human error, and this was addressed;
  4. the issue was related to how Google deploys Analytics, and Google made a change on their end completely unrelated to anything I observed;
  5. my original tests reported a bug or some other aberration that was subsequently fixed;
  6. ???

In my opinion -- based both on past experience with issues like this, and just a gut feeling (which for all obvious reasons, doesn't mean much) -- the third option (human error) feels most likely.

Regardless of the reason, I would strongly advise Instructure to provide a clear, transparent, and complete breakdown of what exactly happened here. There is a range of plausible and reasonable explanations -- but students and families that have their information entrusted to Instructure deserve a clear, transparent, and complete explanation.

Taking a step back, this is an issue that goes beyond Instructure. While Instructure had the bad luck to be the vendor included in this scan, we need to look long and hard at the reliance the edtech industry places on Google Analytics.

Analytics data are tracking data, and can easily be repurposed to support profiling and advertising. Google Analytics is increasingly transparent about this, but we shouldn't pretend that analytics from other services can't be used in similar ways. Google describes the relationship very clearly:

When you link your Google Analytics account to your Google Ads account, you can:

  • view Google Ads click and cost data alongside your site engagement data in Google Analytics;
  • create remarketing lists in Analytics to use in Google Ads campaigns;
  • import Analytics goals and transactions into Google Ads as conversions; and
  • view Analytics site engagement data in Google Ads.

The distinctions made between educational data/student data and consumer data are often contrived, and the protections offered over "educational" data are fragile. Instead of thinking about "student data," we would be better off thinking about data that are collected in an educational setting -- and we would be even better off with real privacy protections that protected the rights of individuals regardless of where the data were collected.

Personal Email, School-Required Software, and Ad Tracking

18 min read

UPDATE December 21, 2019: After I put this post out, I re-ran the scan as part of routine follow up. The cookie mapping that was observed in the original scan and documented in this post is no longer present. It's not clear how or why this shift occurred, but at some point between the original scan, publishing this writeup, and a new scan completed after this writeup was published, the tracking behavior observed within Canvas has changed. More details are available here. END UPDATE.

Recently, a friend reached out to me with some questions about ad tracking, and the potential for ad tracking that may or may not occur when a learner is using a Learning Management System (or LMS) provided by a school. LMSs are often required by schools, colleges, and universities. LMSs hold a unique spot in student learning, effectively positioned between students, faculty, and the work both need to do to succeed and progress.

With the central placement of LMSs in mind, we wanted to look at a common use case for students required to use an LMS as part of their daily school experience. In particular, we wanted to look at the potential for third party tracking when students do a range of pretty normal things: check their personal email, search and find information, watch a video, and check an assignment for school. The tasks in this scan take a person about five to seven minutes to complete.

The account used for testing is from a real student above the age of 13 in a K12 setting in the United States. The LMS accessed in the test is Canvas from Instructure, and the LMS is required for use in the school setting. The full testing scenario, additional details on the testing process, and screenshots documenting the results, are all available below.

Summary and Overview

The scan described in this post focuses on one question: if a high school student has a personal GMail account and is required to use a school provided LMS with a school provided email, what ad tracking could they be exposed to via regular web browsing?

In this scan, we observed tracking cookies set on a person's browser almost immediately after logging into their consumer GMail account. These tracking cookies were used to track the person as they searched on Google and YouTube, and as they browsed a popular site focused on providing medical information. Because the GMail account used for the scan is a consumer GMail account, the observed tracking is not unexpected.

However, when the student logged into Canvas, the LMS provided by their school, using their school-provided email address which is not a GSuite account, we also observed the same ad tracking cookies getting synched to the LMS' Google Analytics tracking ID. This synchronization clearly occurred when the student was logged into the LMS.

This tracking activity raises several questions, but in this summary we will limit the scope to three:

  1. Why is a Google Analytics ID being mapped to tracking cookies that are tied to an individual identity and set in an ad tracking context?
  2. Why is the LMS -- in this example, Canvas -- using Analytics that potentially exposes learners to ad tracking?

These two questions lead into the third question, which will be the subject of follow up scans: given the large number of educational sites that also use Google Analytics, can similar mapping of Google Analytics IDs to adtech cookie IDs be observed on other educational sites?

The analysis of the scan is broken into multiple sections, and each step in the results ends with a "Breakpoint" that summarizes what was observed in that step.

  • Testing Scenario: The steps used in this scan to allow anyone to replicate this work.
  • Testing Process: The process used to set up for the scan.
  • Results: The full results of the scan.
  • Breakpoint 1: A summary of the process that sets the tracking cookies after a person logs in to a consumer GMail account.
  • Breakpoint 2: Search activity on Google.
  • Breakpoint 3: Ad tracking on the Mayo Clinic site.
  • Breakpoint 4: Search activity on YouTube.
  • Breakpoint 5: Mapping of Instructure's Google Analytics IDs to ad tracking IDs.
  • Additional Scans: Follow up work indicated by this scan.
  • Conclusions: Takeaways and observations from the scan.

Testing Scenario

The scan was run using a real GMail account, and a real school email account provisioned by a public K12 school district in the United States. The owner of both accounts is over the age of 13. The school email account was not a GSuite EDU account. The LMS used to run this test was Canvas from Instructure. The testing scan used these steps:

A. Consumer Google Account

  1. Log in at google.com
  2. Go to email
  3. Read an email
  4. Return to google.com
  5. Search for "runny nose"

B. Medical Information

  1. View the top hit from Mayo Clinic or WebMD

C. YouTube

  1. Go to YouTube.com
  2. Search for "runny nose"
  3. View the top hit for 90 seconds
  4. Watch one of the top recommended videos for 90 seconds.

D. School-supplied LMS in K12

  1. Go to Canvas login page and log in using a school-provided email address
  2. Navigate course materials (approximately 10 clicks to access assignments and notes)
  3. Return to student dashboard
  4. Log out of Canvas

Testing Process

The testing used a clean browser with all cookies, cache, browsing history, and offline content deleted prior to beginning the scan. The GMail account used had not modified or altered the default settings.

Web traffic was observed using OWASP ZAP, an intercepting proxy.

Results

In summarizing the results, we will focus on tracking that happens related to Google, and while logged in to Canvas. This analysis does not get into the tracking that Canvas does, or the tracking and data access permitted by Canvas via Canvas's APIs. For a good analysis of the tracking and access that Canvas allows via their APIs, read Kin Lane's breakdown of the data elements supported by Canvas's public APIs.

This post looks at one specific question: if a person is both browsing the web and using their school-provided LMS, what could tracking look like? The results described here provide a high level summary of the full scan; for reasons of focus and brevity, we only cover observed tracking from Google. Other entities that appear in this scan also get data, but Google gets data throughout the testing script.

In the scan, multiple services set multiple IDs. The analysis in this post highlights two IDs set by Google; these two IDs merit a higher level of attention because they are called across multiple sites, are mapped to one another, and are mapped to a separate Google Analytics ID connected to Canvas. In the scan, mapping Google Analytics IDs to IDs that appear to be connected to ad tech happens on both sites that use Google Analytics - the Mayo Clinic site, and the Canvas site.

To protect the privacy of the account used to run this scan, we obscure the IDs when we show the screenshots. The first ID will be marked by this screen:

Screen for Tracker 1

The second ID is marked by this screen:

Screen for Tracker 2

For privacy reasons, we also obscure the referrer URL and the user-agent string. The referrer URL shows the domain that was scanned, which in turn would expose the specific Canvas instance, which would compromise the privacy of the account used to run the scan. The user-agent string provides technical information about the computer running the scan, including details about the web browser, version, and operating system. This information is the foundation of a device fingerprint, which can be used to identify an individual.
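As a rough illustration of why the user-agent string matters, here is a minimal Python sketch of how a handful of browser attributes can be combined into a stable identifier. This is an assumption-heavy toy: the attribute names, the values, and the `fingerprint` helper are all made up for illustration, and real fingerprinting draws on many more signals than this.

```python
import hashlib


def fingerprint(attributes):
    """Hash a set of browser attributes into a short, stable identifier."""
    # Sort the keys so the same attributes always produce the same string.
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Made-up attribute values, for illustration only.
browser_a = {
    "user_agent": "ExampleBrowser/1.0 (ExampleOS 10)",
    "language": "en-US",
    "timezone": "UTC-8",
}
# Same machine profile, but a different browser version.
browser_b = dict(browser_a, user_agent="ExampleBrowser/1.1 (ExampleOS 10)")

print(fingerprint(browser_a))
print(fingerprint(browser_b))
```

The same attributes always hash to the same value and no cookie is involved, which is why we screen the user-agent string in the screenshots.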

Step A. Consumer Google Account

Our scan begins with a person logging in to a personal GMail account.

Almost immediately after logging into GMail, the two tracking cookies are set. These cookies are set sequentially, and are mapped to one another immediately.

A call to "adservice.google.com" sets the first cookie. This initial request both sets a cookie (indicated by the value screened by "Tracker 1") and redirects to a second subdomain (googleads.g.doubleclick.net) controlled by Google:

Initial GET request

Screenshot 1

And this is the response that sets the cookie:

Response and set cookie

Screenshot 2

In the response shown above, three things can be observed/noted:

  1. the initial request returns a 302 redirect that calls a new URL;
  2. the location of the URL is specified in the "Location" line, highlighted in yellow;
  3. the tracker value screened by "Tracker 1" is set via the "Set-Cookie" directive.
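To make these three observations concrete, here is a minimal Python sketch that pulls the same three pieces out of a raw response. The response text is a made-up stand-in shaped like Screenshot 2: the redirect path and the cookie value are placeholders, not values from the scan, and `summarize_response` is a hypothetical helper name.

```python
from http.cookies import SimpleCookie

# Hypothetical response shaped like Screenshot 2. The path and the cookie
# value are placeholders, not data from the scan.
RAW_RESPONSE = (
    "HTTP/1.1 302 Found\r\n"
    "Location: https://googleads.g.doubleclick.net/example-path\r\n"
    "Set-Cookie: ANID=TRACKER_1_PLACEHOLDER; Domain=.google.com; Path=/\r\n"
    "\r\n"
)


def summarize_response(raw):
    """Return the status code, redirect target, and cookies set by a response."""
    lines = raw.split("\r\n")
    status = int(lines[0].split()[1])  # "HTTP/1.1 302 Found" -> 302
    location = None
    cookies = {}
    for line in lines[1:]:
        if not line:  # a blank line ends the header block
            break
        name, _, value = line.partition(":")
        name, value = name.strip().lower(), value.strip()
        if name == "location":
            location = value
        elif name == "set-cookie":
            jar = SimpleCookie()
            jar.load(value)
            # Record each cookie name with the domain it is scoped to.
            for cookie_name, morsel in jar.items():
                cookies[cookie_name] = morsel["domain"]
    return {"status": status, "location": location, "cookies": cookies}


summary = summarize_response(RAW_RESPONSE)
print(summary)
```

The point of the sketch is that a single response can both set a cookie and tell the browser where to go next, which is what lets the two cookies be set back to back.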

The next event tracked in the scan is the GET request to the URL (in the googleads.g.doubleclick.net subdomain) indicated in Screenshot 2.

Get request for Doubleclick

Screenshot 3

The screenshot below shows the response, including the directive to set the second tracking cookie (marked at "Set-Cookie").

Set Doubleclick cookie

Screenshot 4

At this point in the scan, the two cookies (marked by the "Tracker 1" and "Tracker 2" screens) that will be called repeatedly across all sites visited have been set. As shown in the screenshots, these cookies are mapped to one another from the outset. These two cookies are set after a person logs into a GMail account, so they can be tied to a person's actual identity.

As we will observe in this scan, these cookies are accessed repeatedly across multiple web sites, and connected to a range of different activities and behaviors.


Breakpoint 1: Two tracking cookies have been set. The specific responses that set the cookies are shown in Screenshots 2 and 4. The cookies are named "ANID" (Tracker 1) and "IDE" (Tracker 2). It's important to note that the cookies are almost certainly synchronized with one another via the 302 redirect used to set both values sequentially. When the first cookie value is set, the response header specifies the exact call that sets the second cookie value. In practical terms, this means that Google and Doubleclick both "know" that Tracker 1 and Tracker 2 correspond to the same person. Moreover, because these cookies are set after a person logs into their personal GMail account, these values are directly tied to a person's identity.

Google provides some partial documentation on the cookies they set and access:

We also use one or more cookies for advertising we serve across the web. One of the main advertising cookies on non-Google sites is named ‘IDE’ and is stored in browsers under the domain doubleclick.net. Another is stored in google.com and is called ANID.

As shown above in Screenshot 2 the ANID value (marked by Tracker 1) is accessible from within .google.com. As shown above in Screenshot 4, the IDE value (marked by Tracker 2) is accessible from within .doubleclick.net.
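For readers who want to see what "accessible from within" a domain means in practice, here is a simplified Python sketch of cookie domain matching. It is loosely modeled on the domain-match rule in the HTTP cookie specification, but it ignores IP addresses and public suffix handling, and `domain_matches` is a hypothetical helper name.

```python
def domain_matches(request_host, cookie_domain):
    """Return True if a cookie scoped to cookie_domain is sent to request_host."""
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")
    # A cookie is sent to the domain itself and to any of its subdomains.
    return host == domain or host.endswith("." + domain)


# Hosts that appear in this scan, checked against the two cookie domains.
for host in ("adservice.google.com", "googleads.g.doubleclick.net", "stats.g.doubleclick.net"):
    print(host, domain_matches(host, ".google.com"), domain_matches(host, ".doubleclick.net"))
```

Under this rule, a cookie scoped to .doubleclick.net travels with requests to any doubleclick.net subdomain, regardless of which site the person is visiting when the request is made.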


Search on Google

After reading the email, we returned to google.com to do a search for "runny nose." After all, it is the season for colds.

One thing to note for any search functionality that returns suggestions while you type: this functionality doubles as a key logger. For example, when searching for "runny nose" we can observe every keystroke being sent to Google in real time.

Search autocomplete

Screenshot 5

As shown in the above screenshot, every keystroke entered while searching is tied to the first tracking cookie documented in our scan. The text entered in the search box is highlighted in yellow, and we can observe each new keystroke being sent to Google, with the GET request mapped to the cookie ID set in Screenshot 2.
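Here is a small Python sketch of what search-as-you-type looks like on the wire: one request per keystroke, each carrying everything typed so far. The endpoint and the `q` parameter are hypothetical stand-ins for illustration; they are not the URL observed in the scan.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only.
SUGGEST_ENDPOINT = "https://search.example/complete"


def autocomplete_requests(typed):
    """Return the URLs requested as each character of `typed` is entered."""
    urls = []
    for end in range(1, len(typed) + 1):
        # Each keystroke sends the full prefix typed so far.
        urls.append(f"{SUGGEST_ENDPOINT}?{urlencode({'q': typed[:end]})}")
    return urls


urls = autocomplete_requests("runny nose")
for url in urls:
    print(url)
```

Typing "runny nose" produces ten requests in this sketch, including one for the space. If each of those requests carries the same cookie, the full typing sequence can be tied to one ID.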


Breakpoint 2: Search activity on google.com is (obviously) managed by Google. The full search activity, including individual keystrokes, is tracked and tied to Tracker 1.


Step B. Medical Information

The search for information about a runny nose leads to a page on the Mayo Clinic web site. Visiting this page kicks off some additional tracking and advertising-related behavior.

First, we see the Google Analytics ID for the Mayo Clinic site mapped to the second tracking cookie ID. The Google Analytics ID for the Mayo Clinic site, along with the referrer URL, are both highlighted in yellow.

Mayo Clinic Analytics mapping

Screenshot 6

Then, we can observe what appears to be additional adtech and tracking-related behavior connected to this same tracking cookie ID:

Mayo Clinic ad tracking behavior

Screenshot 7

As we can see in the above screenshot, the referrer URL is from the specific page on the Mayo Clinic web site. As noted above, the cookie IDs are mapped to a specific identity known to Google. Thus, Google knows when the account used for this scan searched for a specific piece of medical information, and accessed a web site about it. Because these tracking cookies were set when a person logged into GMail, this activity can be directly tied to a specific person.


Breakpoint 3: When a person moves off a Google property, the tracking switches to Tracker 2, which can be read by Doubleclick. Screenshot 6 shows Tracker 2 being mapped to the Google Analytics ID of Mayo Clinic. Screenshot 7 shows additional ad-related behavior connected to Tracker 2. In this section, we can observe two additional subdomains: stats.g.doubleclick.net (often connected to Analytics) and ad.doubleclick.net (generally connected to ads). It is not clear why the Tracker 2 value, which was clearly set in an advertising/tracking context, needs to be mapped to a Google Analytics ID.


Step C. YouTube

After visiting the Mayo Clinic web site, the scan continued on YouTube. Here, we searched for a video about a "runny nose" and watched the video.

As noted above when searching using Google, YouTube search also functions as a key logger, and ties the results to a cookie ID that is directly connected to a person's real identity.

Mapping cookies in YouTube

Screenshot 9

Screenshot 9 shows the "ru" of the eventual search query "runny nose". As shown in Screenshot 5 related to searching on Google, a request is sent for every keystroke, including spaces and deletions.


Breakpoint 4: Search activity within YouTube is managed by Google. As with search on google.com, the full search activity, including individual keystrokes, is tracked and tied to Tracker 1.


Step D. School-supplied LMS

After searching for and watching a video about a runny nose, the scan proceeded to log in to a K12 instance of Canvas.

For this scan, the person logged into the LMS with a school-provided email account. The school-provided email account was not provisioned from a GSuite for EDU domain. The email address belonged to a student account on a domain connected to a K12 school district.

After the person logs into Canvas, both cookie IDs are mapped to Instructure's Google Analytics ID. The mapping occurs via 302 redirects, with the Analytics ID contained in URL calls that include the Cookie IDs in the request headers. The process is documented in the screenshots below, and is similar to the mapping that occurred while browsing the Mayo Clinic web site.

The referring URL is clearly a course within the LMS. The Google Analytics ID (UA-9138420) that belongs to Canvas/Instructure is highlighted in yellow.

The first call is to stats.g.doubleclick.net. As you can see in the screenshot below, the request includes the Google Analytics ID and the tracking cookie in the request header. The response returns a redirect that also includes the Google Analytics ID.

First call to map trackers in Canvas

Screenshot 10

As shown in Screenshot 10, the URL specified by the redirect points to google.com/ads. The redirect also contains the Google Analytics ID for Instructure.

Mapping trackers in Canvas

Screenshot 11

As described and shown in Screenshots 10 and 11, these two calls map both cookie IDs to Instructure's Google Analytics ID. To emphasize, both of the cookie IDs mapped to Instructure's Google Analytics ID are also directly connected to a personal GMail account that is tied to a person's identity.
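Anyone replicating this work could look for the same pattern in their own proxy logs with something like the Python sketch below. It flags any request whose URL contains a Google Analytics ID while the request's Cookie header carries one of the two cookies named earlier. The captured requests here are invented placeholders (including the `tid` parameter, the paths, the Analytics ID, and the cookie values), and `find_mappings` is a hypothetical helper, so treat this as a starting point rather than the method used for the scan.

```python
import re
from http.cookies import SimpleCookie
from urllib.parse import urlsplit

# Matches Analytics IDs of the form UA-1234567 or UA-1234567-1.
GA_ID_PATTERN = re.compile(r"UA-\d+(?:-\d+)?")
TRACKING_COOKIES = {"IDE", "ANID"}

# Invented requests, shaped like proxy log entries. None of these values
# come from the scan.
CAPTURED = [
    {"url": "https://stats.g.doubleclick.net/example?tid=UA-0000000-1",
     "cookie": "IDE=TRACKER_2_PLACEHOLDER"},
    {"url": "https://www.google.com/ads/example?tid=UA-0000000-1",
     "cookie": "ANID=TRACKER_1_PLACEHOLDER; OTHER=x"},
    {"url": "https://lms.example/courses/1", "cookie": "session=abc"},
]


def find_mappings(requests):
    """Return (host, analytics_id, cookie_names) for requests that pair the two."""
    mappings = []
    for request in requests:
        ga_ids = GA_ID_PATTERN.findall(request["url"])
        jar = SimpleCookie()
        jar.load(request.get("cookie", ""))
        names = sorted(TRACKING_COOKIES & set(jar.keys()))
        if ga_ids and names:
            host = urlsplit(request["url"]).hostname
            mappings.append((host, ga_ids[0], names))
    return mappings


mappings = find_mappings(CAPTURED)
for mapping in mappings:
    print(mapping)
```

A match only shows that both identifiers traveled in the same request. What happens to them on the receiving end is not visible from the browser side.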


Breakpoint 5: While logged into a school-provided (and required) LMS, both Tracker 1 and Tracker 2 are mapped to the Google Analytics ID of the LMS. This means that the same advertising IDs that are tied to a specific student's identity, tied to browsing history on a site with medical information, and tied to search history on Google and YouTube, are also tied to the Google Analytics ID of an EdTech vendor. In practical terms, this means that Google could theoretically incorporate general LMS usage data (time on site, time on page, pages visited, etc) into their profiles of learners and/or educators.


Visiting Subdomains

Visiting the subdomains called when the cookies were mapped to Instructure's Analytics ID returns web sites that appear to serve advertisers.

Attempting to visit google.com/ads redirects to a page that clearly appears to be connected to advertising:

Google Ads web page

Screenshot 12

Attempting to visit stats.g.doubleclick.net redirects to a page that offers services for analytics related to Google Marketing Platform.

Google Marketing

Screenshot 13

A look at the features overview page shows that there is a "native data onboarding integration" with Google Ads and Adsense, and "native remarketing integrations" with Google Ads.

Google Analytics integration

Screenshot 14

Additional Areas for Examination

This initial scan was limited in scope to test one specific -- yet common -- use case: what does ad tracking look like when a person has a consumer GMail account, and uses the same browser to access that personal account as their school-provided LMS? With this initial scan in place, several follow up tests would help create a more complete picture.

  • Use a school-provided Gmail account.
  • Visit other sites with ads and observe other ad-related interactions that are mapped to either of these cookies.
  • Test other LMSs that use Google Analytics to see if there is comparable mapping of Google Analytics IDs to cookie IDs.
  • Test other educational sites that use Google Analytics to see if there is comparable mapping of Google Analytics IDs to cookie IDs.

These scans would each provide additional information that would help create a more complete picture, and would build on and provide additional context to what was observed in this initial scan. If the mapping observed in this scan is replicated across the web on other educational sites that use Google Analytics within K12 or higher ed, then -- theoretically -- students could be profiled based on their interactions with sites they are required to use for school. The types of redlining, targeting, or "predictions" that would be possible from this type of profiling are clearly not in the best interests of learners.

Conclusions

This scan covers a pretty common use case: a person who checks their personal email and searches for other information, and then does some schoolwork. As documented in this writeup, this behavior results in a range of tracking behavior that includes:

  • a. tracking cookies are set shortly after a person logs into a Google account, and these cookies are directly tied to a person's specific identity;
  • b. via these cookies, Google gets specific information about searches on YouTube and Google, including keylogging of the search process;
  • c. via these cookies, Google gets specific information about the sites a person visits, and when they visit them;
  • d. on both sites in this scan that used Google Analytics, the domain's Google Analytics ID was synched with tracking cookies;
  • e. while logged in to an LMS as a high school student, the Google Analytics ID of the required LMS for a public high school student is mapped to cookie IDs that appear to be used for ad targeting, and are tied to a student's real identity.

It is not clear why Instructure's Google Analytics ID needs to be mapped to cookie IDs that are set in a consumer context and appear to be related to ad tracking.

To be very clear: the tracking cookies mapped to a person's actual identity occurred within the context of consumer use. When a person uses Gmail, or searches via Google, or browses a site for medical information, they are tracked, and they are tracked in ways that can be connected back to their real identity. This is how adtech works, and -- based on current privacy law in the United States -- this is completely legal.

As observed in this scan, the tracking cookies set in a consumer context are also accessed when a student is logged into their LMS, in a strictly educational context. In practical terms, the only way for a high school student to completely avoid the type of tracking documented in this scan would be to practice abnormally strong browser hygiene -- for example, they could set up a separate profile in Firefox that they only used while accessing the LMS. But realistically, the chances of that happening are slim to none, and "solutions" like this put the onus in the wrong place: a high school student should not be required to fix the excesses of the adtech industry, especially when they are accessing the required software that comes as a part of their legally required public education.

Dark Patterns and Passive Aggressive Disclaimers - It's CCPA Season!

4 min read

In today's notes on CCPA compliance, Dashlane gets the award for passive aggressive whinging paired with a dark pattern designed to obscure consent. I have managed to get my hands on secret video of Dashlane's team while they were planning how to structure their opt out page. This completely legitimate video is included below.

Hidden camera video of Dashlane team
Hidden camera video of the design process for Dashlane's opt out page.

In case you've never heard of Dashlane, they are a password manager. Three alternatives that are all less whingy are 1Password, LastPass, and KeePassXC -- and KeePassXC is an open source option.

Dashlane appears to be preparing for California's privacy law, CCPA, which is set to go into effect in 2020. 

The screenshot below is from Dashlane's splash page where, under CCPA, they are required to allow California residents to opt out of having their data sold. CCPA has a reasonably broad definition of what selling data means, and, predictably, some companies are upset at having any limits placed on their ability to use the data they have collected or accumulated.

Full page screenshot

Dashlane's disclaimer and opt out page provides a good example of how a company can comply, yet exhibit bad faith in the process.

First, let's look at their description of sales as defined by CCPA:

However, the California Consumer Privacy Act (“CCPA”), defines “sale” very broadly, and it likely includes transfers of information related to advertising cookies.

Two thoughts come to mind nearly simultaneously: this is cute, and stop whining. Companies have used a range of jargon to define commercial transfers of data for years - for example, "sharing" with "affiliates", or custom definitions of what constitutes PII, or shell games with cookies that are mapped between vendors and/or mapped to a browser or device profile. It's also worth noting that Dashlane is theoretically a company that helps people maintain better privacy and security practice via centralized password management. It's hard to imagine a better example of a company that should look to exceed the basic ground level requirements of privacy laws. Instead, Dashlane appears to be whinging about it.

However, Dashlane does more than just whine about CCPA. They take the extra step of burying their opt out in a multilayered dark pattern, complete with unclear "help" text and labels.

Dark pattern

As shown in the above screenshot, Dashlane's text instructs people to make a selection in "the box below". However, two obvious problems immediately become clear. First, there is no box, below or otherwise - the splash page contains a toggle and a submit button.

Second, assuming that the toggle is what they mean by "box", we have two options: "active" or "inactive." It's not clear what option turns cookies "off" - does the "active" setting mean that we have activated enhanced privacy protections, or does the "active" setting mean that ad tracking is activated? This is a pretty clear example of a dark pattern, or a design pattern that intentionally misleads or confuses end users.

Based on additional language on the splash page, it looks like the confusion that Dashlane has created is pretty meaningless because anything we set on this page appears pretty easy to wipe out, either intentionally or accidentally. So, even if the user makes the wrong choice because the language is intentionally confusing, this vague choice can get erased pretty easily.

Brittle settings

Based on this description, the ad tracking opt out sounds like it's cookie based, and therefore brittle to the point of being meaningless.
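To show why a cookie-based opt out is brittle, here is a toy Python model. It assumes the opt out is stored as a cookie in the browser, which is my reading of Dashlane's text and not something I have confirmed; the cookie name and the `Browser` class are made up for illustration.

```python
# Hypothetical cookie name, for illustration only.
OPT_OUT_COOKIE = "ad_tracking_opt_out"


class Browser:
    """A toy browser that only knows how to store and clear cookies."""

    def __init__(self):
        self.cookies = {}

    def opt_out(self):
        # The opt out lives in the browser, not in an account or on a server.
        self.cookies[OPT_OUT_COOKIE] = "1"

    def clear_cookies(self):
        self.cookies.clear()


def tracking_allowed(browser):
    """Tracking is blocked only while the opt-out cookie is present."""
    return browser.cookies.get(OPT_OUT_COOKIE) != "1"


browser = Browser()
browser.opt_out()
print(tracking_allowed(browser))  # opted out
browser.clear_cookies()
print(tracking_allowed(browser))  # the opt out is gone
```

In this model, clearing cookies, which is a standard privacy practice, silently undoes the opt out. A new browser or device would start with no opt out at all.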

While it remains to be seen how other companies will address their obligations under CCPA, I'd like to congratulate Dashlane on taking an early lead in the "toothless compliance" and "aggressive whinging" categories.