Working with DuckDuckGo’s Tracker Radar

The Tracker Radar from DuckDuckGo is an awesome project. For those not familiar with it, it’s a comprehensive list of organizations that maintain tracking tools used on websites. More importantly, it connects URLs to their corporate owners. I’ve done attribution work manually, in the dark ages before the Tracker Radar existed, and it was laborious and time-consuming. The Tracker Radar turns work that used to take hours into work that takes seconds. In addition to being used in work from DuckDuckGo itself, the Tracker Radar is incorporated into work like Blacklight from The Markup.

As is true of any data set of any size, it requires maintenance. In this post, I’ll document my understanding of how to stand up the required pieces of the Tracker Radar and the Tracker Radar Detector to make reasonably clean pull requests if and when I see information that needs updating. This post serves two purposes: first, I’m documenting this for my future self, so six months from now when I need to replicate this work I’ll know where to start; and second, if anyone else wants to know how to get up and running, this can help them.

Caveats and disclaimers: this post represents my best understanding of the process at this point in time. I’ll update this post if and when I learn about details that need to be added, or explained more clearly.

The steps laid out in this post incorporate and expand upon some feedback that the maintainers of the project were kind enough to provide. I add in some additional details for people reading this who might not be versed in the technical details, but who want a starting point to learn more.

This documentation was developed and tested on Linux, but it should work on macOS as well.

Overview

  • For brief details with less explanation, jump to the Cheat Sheet.

This documentation has six steps. Completing the setup in steps 1-4 took me about 10 minutes. The first four steps only need to be done once, on initial setup.

Steps 5 and 6 are where the actual work happens; making and pushing changes took around another 10 minutes.

1. Prerequisites

This work uses both git and Node. Setting up git and Node isn’t especially complex, but a full breakdown is outside the scope of this post. For details on setting up both, see the official installation documentation for each.

If you are not familiar with git or Node, this is a good way to start to get familiar with them.

2. Fork the Projects

To start, you will need to fork two projects: the Tracker Radar and the Tracker Radar Detector. To fork each project, navigate to its main page and click the “Fork” button in the top-right corner of the page.

The Tracker Radar landing page. Note the “Fork” button in the top-right.

This creates a copy of the repository in your account. You will be able to write your changes to this copy, and then submit those changes back to the original repository, where the maintainers will be able to review them and accept them, modify them, or reject them.

3. Clone the Projects Locally

Once you have forked both repositories, open up your terminal and navigate to the directory where you will be working. You will want to put both projects in the same directory, so create that directory:

mkdir ddg_tools

Move into this directory:

cd ddg_tools

Return to your web browser and go to the forked copies of the repositories you created earlier. You want to get the code for each project so you can clone it onto your local machine. To do that, click the “Code” button, and select the “SSH” option. Copy the address for the code, and return to your terminal.

Get the url to clone the repository locally.

Clone both repositories using “git clone”.

  • git clone git@github.com:YOUR_GITHUB_USERNAME/tracker-radar.git
  • git clone git@github.com:YOUR_GITHUB_USERNAME/tracker-radar-detector.git

This will create two top-level directories: “tracker-radar” and “tracker-radar-detector”.

We will make changes in the “tracker-radar” directory, and use a single utility in the “tracker-radar-detector” directory to process those changes. This probably sounds more complicated than it really is.

4. Set up Tracker Radar Detector

In the terminal, move into the Tracker Radar Detector:

cd tracker-radar-detector

Install dependencies:

npm install

(or the shorthand, npm i)

After the dependencies are installed, you need to make one change in the config.json file of the Tracker Radar Detector.

Location of the config.json file

In the file, set trackerDataLoc to the absolute path of the tracker-radar directory. In my setup, this value needed to be the absolute path, not a relative path, and I also recommend not using a trailing slash at the end of the path.
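If you would rather script this edit than make it by hand, here is a minimal sketch in Python. It assumes config.json is a plain JSON object with a top-level trackerDataLoc key, as described above. The function name set_tracker_data_loc is my own, not part of the Tracker Radar Detector. The demo runs against a throwaway file so nothing real is touched.

```python
import json
import os
import tempfile


def set_tracker_data_loc(config_path, tracker_radar_dir):
    """Point trackerDataLoc at an absolute path with no trailing slash."""
    # abspath resolves relative paths and also drops any trailing slash.
    absolute = os.path.abspath(os.path.expanduser(tracker_radar_dir))
    with open(config_path, encoding="utf-8") as handle:
        config = json.load(handle)
    config["trackerDataLoc"] = absolute  # all other keys are left as they were
    with open(config_path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=4)
        handle.write("\n")
    return absolute


if __name__ == "__main__":
    # Demo on a temporary config.json; the contents here are placeholders.
    with tempfile.TemporaryDirectory() as workdir:
        demo_config = os.path.join(workdir, "config.json")
        with open(demo_config, "w", encoding="utf-8") as handle:
            json.dump({"trackerDataLoc": ""}, handle)
        print(set_tracker_data_loc(demo_config, os.path.join(workdir, "tracker-radar/")))
```

Note that json.dump rewrites the whole file, so the indentation of config.json may change. For a one-line edit, a text editor is just as good.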

5. Making Changes to the Tracker Radar

Now that the Tracker Radar Detector has been set up, we are ready to work. The actual changes will all be in the Tracker Radar; once the changes have been made they will be processed using a command in the Tracker Radar Detector.

From your terminal, navigate back to the “tracker-radar” directory. Check the branch you are currently on. From a fresh clone, you should be on “main”, but it’s always better to verify.

git status

To keep your work organized, you should create a new branch for each related set of changes. I generally try to name my branches something that makes sense to a human. For example, if I’m working on a change related to Citrix, I’ll name the branch citrix.

To create and check out a new branch, enter this into your terminal:

git checkout -b BRANCHNAME

When making changes to the Tracker Radar, we will be working primarily with two files: entity_map.json and privacy_policies.json.

Modifying privacy_policies.json

To add a privacy policy for an organization, add the data to the privacy_policies.json file. This file is located in:

tracker-radar/build-data/static/privacy_policies.json

The structure is pretty straightforward. To the best of my knowledge, the one thing to verify is that the organization name in privacy_policies.json matches a corresponding organization in entity_map.json. I am not 100% certain of that, however.

The contents of a privacy policy record.
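The name-matching check described above can be sketched in a few lines of Python. This is only a sketch under two assumptions: that both files are JSON objects whose top-level keys are organization names, and that an exact string match is what matters. The helper names and the “Example Org” entry are made up for illustration.

```python
import json


def load_json(path):
    """Read a JSON file into a Python object."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def policies_without_entity(privacy_policies, entity_map):
    """Return organization names in privacy_policies with no entity_map record."""
    return sorted(name for name in privacy_policies if name not in entity_map)


# Inline sample data. Record contents are left empty because only the keys
# are compared; "Example Org" is a hypothetical name.
sample_entities = {"Gandi SAS": {}}
sample_policies = {"Gandi SAS": {}, "Example Org": {}}

print(policies_without_entity(sample_policies, sample_entities))  # ['Example Org']
```

To run it against the real data, pass the results of load_json for build-data/static/privacy_policies.json and build-data/generated/entity_map.json, with both paths relative to the tracker-radar directory.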

If all you are changing is the privacy_policies.json file, you can skip the entity_map.json steps below. Review, add, and commit your change as described at the end of this section, then move on to Step 6: Push Changes Back to GitHub.

If, however, you are changing information about ownership of a tracking company (for example, one company buys another, or a company changes its name), then you will need to modify the entity_map.json file.

Modifying the entity_map.json record

To modify information related to organization names and the urls they control, modify entity_map.json.

This file is located in tracker-radar/build-data/generated/entity_map.json.

The entity_map.json file contains thousands of records about organizations. Each record stores values for:

  • name;
  • aliases (often when a company has bought other organizations);
  • properties (the domains associated with an organization);
  • display name.

The structure for Gandi.net shows the data contained in these records:


    "Gandi SAS": {
        "aliases": [
            "Gandi SAS"
        ],
        "properties": [
            "gandi.net"
        ],
        "displayName": "Gandi"
    },
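One practical use of this structure is answering “who owns this domain?”. Here is a minimal sketch that builds a reverse index from the properties lists. It assumes the file has been loaded into a Python dict keyed by organization name, and build_domain_index is a name I made up for the example.

```python
def build_domain_index(entity_map):
    """Map each domain listed under "properties" to the entity that lists it."""
    index = {}
    for entity_name, record in entity_map.items():
        for domain in record.get("properties", []):
            index[domain] = entity_name
    return index


# The Gandi record from above, written as a Python dict.
entity_map = {
    "Gandi SAS": {
        "aliases": ["Gandi SAS"],
        "properties": ["gandi.net"],
        "displayName": "Gandi",
    },
}

index = build_domain_index(entity_map)
print(index.get("gandi.net"))    # Gandi SAS
print(index.get("example.com"))  # None
```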

Examining a more complicated record, such as the one for Pearson Education, shown below, demonstrates how an organization with multiple aliases and affiliated urls is stored:


    "Pearson Education, Inc.": {
        "aliases": [
            "Pearson",
            "Pearson Education, Inc.",
            "Pearson PLC"
        ],
        "properties": [
            "adobepress.com",
            "ciscopress.com",
            "informit.com",
            "ldoceonline.com",
            "peachpit.com",
            "pearson.com",
            "pearsonactivelearn.com",
            "pearsonclinical.com",
            "pearsoncmg.com",
            "pearsoned.com",
            "pearsonitcertification.com",
            "pearsonmylabandmastering.com"
        ],
        "displayName": "Pearson Education"
    },
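When one company buys another, my understanding is that the edit amounts to folding the acquired organization’s aliases and domains into the new owner’s record. The sketch below shows that idea on made-up data. The organization names and the merge_entity function are hypothetical. Dropping the acquired record entirely is my assumption, not something the maintainers have confirmed, so check how similar changes were handled in the repository history.

```python
def merge_entity(entity_map, acquired_name, owner_name):
    """Fold the acquired entity's aliases and properties into the owner's record.

    Returns a new dict; the mapping passed in is left untouched.
    """
    merged = {name: dict(record) for name, record in entity_map.items()}
    acquired = merged.pop(acquired_name)
    owner = merged[owner_name]
    # Keep the acquired name as an alias. Sorting and de-duplicating mirrors
    # the alphabetical ordering seen in the Pearson record above.
    owner["aliases"] = sorted(
        set(owner["aliases"]) | set(acquired["aliases"]) | {acquired_name}
    )
    owner["properties"] = sorted(set(owner["properties"]) | set(acquired["properties"]))
    return merged


# Hypothetical organizations, for illustration only.
entity_map = {
    "Example Parent, Inc.": {
        "aliases": ["Example Parent, Inc."],
        "properties": ["parent.example"],
        "displayName": "Example Parent",
    },
    "Example Startup LLC": {
        "aliases": ["Example Startup LLC"],
        "properties": ["startup.example"],
        "displayName": "Example Startup",
    },
}

updated = merge_entity(entity_map, "Example Startup LLC", "Example Parent, Inc.")
print(updated["Example Parent, Inc."]["properties"])  # ['parent.example', 'startup.example']
```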

Once you have made any needed changes in the entity_map.json file, save the file. In your terminal, review the changes:

git diff build-data/generated/entity_map.json
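Hand-editing a large JSON file makes it easy to leave behind a stray comma or a half-finished record. Before moving on, a quick sanity check like the sketch below can catch that. The three required keys are inferred from the two records shown above, and the rule that a domain belongs to only one organization is my assumption. Neither is a documented schema, and find_problems and check_file are names I invented.

```python
import json


def find_problems(entity_map):
    """Return human-readable problems found in an entity_map-style dict."""
    problems = []
    seen = {}  # domain -> first entity that listed it
    for name, record in entity_map.items():
        for key, expected in (("aliases", list), ("properties", list), ("displayName", str)):
            if not isinstance(record.get(key), expected):
                problems.append(f"{name}: missing or malformed {key}")
        props = record.get("properties")
        for domain in props if isinstance(props, list) else []:
            if domain in seen and seen[domain] != name:
                problems.append(f"{domain}: listed under both {seen[domain]} and {name}")
            seen.setdefault(domain, name)
    return problems


def check_file(path):
    """Parse the file (json.load raises on syntax errors), then run the checks."""
    with open(path, encoding="utf-8") as handle:
        return find_problems(json.load(handle))


# Hypothetical records: the second lacks displayName and repeats a domain.
sample = {
    "Example One": {"aliases": [], "properties": ["shared.example"], "displayName": "One"},
    "Example Two": {"aliases": [], "properties": ["shared.example"]},
}
for problem in find_problems(sample):
    print(problem)
```

An empty list from check_file means the file parsed and passed these checks. It does not mean the change itself is correct.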

If the changes look right, return to the tracker-radar-detector directory.

In the terminal, enter:

npm run apply-entity-changes

This script processes the updates you made in entity_map.json and propagates those changes to other parts of the Tracker Radar. To review and commit these changes, return to the tracker-radar directory, and view the files that have changed.

git status

This will show all files that have changed, and will verify that you are on the branch you created earlier.

git diff PATH_TO_FILE

This shows the changes in individual files.

If the changes look right, add and commit the changes.

git add PATH_TO_FILE

This is perhaps overly cautious, but adding files individually helps reduce the chance of human error in adding cruft to the repository.

Once all files have been added, commit the changes:

git commit

6. Push Changes Back to GitHub

With the changes made and committed locally, the final step pushes the changes back to the remote repository (GitHub, in this case). Once the changes are pushed to the remote repository, you can create the pull request.

To push the local changes to the remote repository:

git push origin BRANCHNAME

For example, when I was working in the “citrix” branch, I pushed that branch to the remote repository with git push origin citrix.

This command pushes the changes to the fork you created in Step 2. Once they have been pushed to the forked repository, you need to create the pull request on the original source repository.

The “Pull Requests” tab on GitHub.

To do this, return to the original repository and click the “Pull Requests” tab. You should see a message that flags your commit; use that to streamline creating your pull request, or click the green “New Pull Request” button on the right side of the screen.

Conclusion/Cheat Sheet

Reading this post will take nearly as long as doing the actual work. Once you’ve run through it once, it will be significantly faster. This cheat sheet summarizes the basic steps.

Fork the Projects

Done via the web browser.

Clone the Projects Locally

  • mkdir ddg_tools
  • cd ddg_tools
  • git clone git@github.com:YOUR_GITHUB_USERNAME/tracker-radar.git
  • git clone git@github.com:YOUR_GITHUB_USERNAME/tracker-radar-detector.git

Set up Tracker Radar Detector

  • cd tracker-radar-detector
  • npm install, or npm i

In a text editor, modify config.json by setting trackerDataLoc to the absolute path to the local Tracker Radar repository.

Making Changes to the Tracker Radar

Starting from main, create and check out a branch for your changes:

  • git checkout main
  • git checkout -b BRANCHNAME

Modify entity_map.json and/or privacy_policies.json as needed. If you modify entity_map.json, you will need to run npm run apply-entity-changes in the Tracker Radar Detector.

In the tracker-radar repository:

  • review your changes with git diff;
  • add changed files with git add PATH_TO_FILE;
  • commit files with git commit

Push Changes Back to GitHub

  • git push origin BRANCHNAME

In the web browser, return to the original repository and click the “Pull Requests” tab.