Making Text from the Facebook Papers More Accessible


I’ve been working on extracting text from the released PDFs of the Facebook Papers. The cleaned PDFs, the extracted text, and the code used to clean the text are all available on GitHub.

Original pdf on the left; processed pdf on the right

The script requires Python 3.6 or higher, and has only been tested on Linux. Enjoy!

The Details

Like many of us, I’ve been following the reporting on internal Facebook documents. These documents confirm and reinforce details that have been clear about Facebook for years, and they illustrate exactly how much Facebook knew about the problems it created, and how little it did to solve them.

Also like many of us, I’ve been dying to see the original docs, so when the team at Gizmodo started releasing the docs I was pretty darn excited.




Seriously, the team at Gizmodo (Shoshana Wodinsky, Dell Cameron, Andrew Couts) have been doing stellar work reporting on these docs, and getting the core docs released publicly.

Due to the provenance of these documents, the “pdfs” released were actually worse than your normal PDF – and that’s saying something, because on the best of days PDFs are where information goes to die. These files appear to be collections of photos taken of a computer screen and then stitched together into PDFs, so they contain pictures of text rather than text you can select or search.

But the information in these pdfs is incredibly valuable, and we are lucky to have it.

Fortunately, from an old side project, I had some dirty, ugly, functional code lying around that cleaned up PDFs. I grabbed some of the early docs released by Gizmodo, did a test run, and lo and behold, it worked. It was ugly, but it worked.

Last night, I reworked my original (dirty, ugly) script into something cleaner that generates better output. I generally don’t write code, except when I need to, so about the only thing I will ever say about code I write is that it solves a clearly defined problem for me at a point in time, which is a far cry from actually writing code that is good. In this improved version, I had some invaluable help from Smart People Who Know Things (I have asked permission to credit them here; I’ll update this post if/when I receive their consent).

The resulting code is now up on GitHub, along with the text files and the cleaned PDFs. I’m keeping my fingers crossed that I don’t bump into any repository size restrictions on GitHub anytime soon.
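
A quick aside for anyone who wants to do something with the extracted text files once they have a local copy: the sketch below is not part of the cleanup script. It is a minimal, standard-library-only example of searching a folder of .txt files for a phrase. The function name `search_text_files` and the idea of a single folder of text files are assumptions for illustration, not details taken from the repository.

```python
from pathlib import Path


def search_text_files(root, term):
    """Yield (path, line_number, line) for every line that contains term.

    The match is case-insensitive. `root` is a hypothetical local folder
    holding the extracted .txt files; subfolders are searched as well.
    """
    needle = term.lower()
    for path in sorted(Path(root).rglob("*.txt")):
        # errors="replace" keeps stray bytes in extracted text from
        # stopping the search partway through a file.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    yield path, number, line.strip()
```

Looping over `search_text_files("text", "some phrase")` and printing each result gives a rough grep-style listing of file, line number, and matching line. Here "text" stands in for wherever you saved the files.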

And: if there are any improvements you’d like to make or questions you have, let me know.

Moving Off AT&T

I’ve been meaning to move to a different mobile provider for years, but the story about how AT&T supports – and continues to support – a propaganda network that actively spreads disinformation finally broke through my inertia.

For others who want to move off AT&T and port their number, I want to share one hiccup that I experienced in the process. These notes assume that:

  • you are out of contract with AT&T;
  • you have unlocked your phone;
  • you have a SIM card for your new carrier;
  • you are porting your existing number to your new carrier.

The transfer documentation for many services states that you should swap in your new SIM card before starting the transfer. With AT&T, if you have SIM protection enabled (which you should; it might be enabled by default, which is a good thing), you will need to respond to a text message that asks you to confirm the number transfer.

And, if you have already swapped out your AT&T SIM card for your new SIM, you’ll never get the message.

So, if you’re moving from AT&T to another carrier, your sequence should look something like this:

  • verify that you are out of contract, and/or are okay with any financial penalties from switching mid-contract;
  • verify that you have an unlocked phone;
  • select a new provider;
  • get a SIM from the new provider, and leave it out of the phone for now;
  • initiate the number transfer;
  • respond to the text from AT&T confirming the transfer;
  • remove the AT&T SIM card and replace it with the SIM from your new carrier.

Then, do a happy dance because you are no longer supporting a phone carrier that supports propaganda and disinformation!

(And yeah, I know, AT&T owns, well, everything. But their mobile service is terrible and expensive, and every journey is made up of small steps.)

Image Credit:
“Phone, Telefon, Fernsprechapparat” by Dr. Mattias Ripp, released under a CC 2.0 Generic license.
