Mortgage Data, and Working with Large Datasets

Since The Markup reporters Lauren Kirchner and Emmanuel Martinez released their story on bias in mortgage algorithms, I’ve been digging into the data behind their reporting and looking for additional patterns. The story is worth a read, and a re-read. They also do a great job showing their work, which includes releasing the code and data they used for their analysis.

Their reporting is based on the 2019 data, but the Consumer Financial Protection Bureau has also released 2020 data, so I figured I’d grab that as well.

This is a sizeable dataset, and even though I have a decent workhorse of a machine, loading the datasets made my computer VERY unhappy.

To work around this, I did two things. First, I moved the code from the Jupyter notebooks into plain Python scripts, which reduced memory usage and CPU load a bit, at least in my setup. But that wasn’t enough to process the full dataset without crashing, so I temporarily added swap space by creating a swap file. I saved the steps as a bash script so I can run it whenever I need a temporary memory boost to prevent crashes.
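If you want to make the same notebook-to-script move, one option is Jupyter’s own nbconvert tool, which can export a notebook as a plain script. This is a sketch of that approach, not necessarily the route taken here: the function name notebooks_to_scripts is hypothetical, and it assumes the jupyter command (with nbconvert) is installed and on your PATH.

```bash
# Hypothetical helper: export every Jupyter notebook in a directory as a
# plain script. For a Python kernel, analysis.ipynb becomes analysis.py.
# Assumes the `jupyter` command (with nbconvert) is installed.
notebooks_to_scripts() {
  local dir="${1:-.}"   # directory to scan; defaults to the current one
  local nb
  for nb in "$dir"/*.ipynb; do
    # If the glob matched nothing, bash leaves the pattern as-is; skip it.
    [ -e "$nb" ] || continue
    jupyter nbconvert --to script "$nb"
  done
}
```

For example, notebooks_to_scripts ~/mortgage-notebooks (a hypothetical path) would write a .py file next to each notebook, which you can then run with python3 outside of Jupyter.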

I’ve worked with datasets of tens of millions of records in the past, and I have never needed to do this. Writing to swap can be very slow in its own right, so if there is a better way to prevent crashes when loading large datasets, I’d love to hear it. As I process the data, I delete dataframes when I no longer need them and use gc to free memory, but on my machine the crash happened while loading the datasets, before any of that cleanup could help. I would not recommend using this hack as a permanent solution, or on any machine other than your own local one.

The commands can be typed out individually, which eliminates the need for a script. But hey – why type out several commands when you can just type out one? This script was used on a Debian-flavored Linux system; YMMV if used in other setups.

In the script, you need to set two variables: the location of the swap file, and the size. Make sure that your hard drive has adequate room to support your swap file.
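To check for adequate room before creating the swap file, a minimal sketch like the one below compares the space available on a filesystem against the size you plan to use. The function name has_room_for is hypothetical, and it assumes GNU df from coreutils (the --output option is not available in every df).

```bash
# Hypothetical helper: succeed only if the filesystem holding DIR has at
# least BYTES bytes available. Assumes GNU df (coreutils).
has_room_for() {
  local bytes="${1:-}" dir="${2:-}" avail
  # -B1 reports sizes in bytes; --output=avail prints only the Avail column.
  # tail skips the header line; tr strips the padding spaces.
  avail="$(df -B1 --output=avail "$dir" | tail -n 1 | tr -d ' ')"
  # Bail out if df failed and we did not get a plain number back.
  [[ "$avail" =~ ^[0-9]+$ ]] || return 1
  (( avail >= bytes ))
}
```

For example, has_room_for $((16 * 1024 ** 3)) / checks for 16 GiB free on the root filesystem before you set SIZE=16G there (fallocate treats the G suffix as GiB).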

Before you run the script, run sudo free -h from the command line. This will show your default setup, with the amount of free memory on your system, and your default swap setup. After you run the shell script, re-run sudo free -h to see the changes.
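For a scripted version of that before-and-after comparison, the sketch below reads SwapTotal and SwapFree straight from /proc/meminfo, the kernel file that free itself reads, so it doesn’t need sudo. The function names are hypothetical, and it assumes a Linux system.

```bash
# Hypothetical helpers for comparing swap before and after the script runs.
# /proc/meminfo reports these values in kB (1 kB = 1024 bytes here).
swap_total_kb() {
  awk '/^SwapTotal:/ { print $2 }' /proc/meminfo
}

swap_free_kb() {
  awk '/^SwapFree:/ { print $2 }' /proc/meminfo
}

# Print total and free swap in MiB; treat a missing value as 0.
report_swap() {
  local total free
  total="$(swap_total_kb)"
  free="$(swap_free_kb)"
  echo "Swap total: $(( ${total:-0} / 1024 )) MiB, free: $(( ${free:-0} / 1024 )) MiB"
}
```

Run report_swap, then the swap script, then report_swap again; the total should grow by roughly the SIZE you set.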

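When you restart your computer, the swap file is no longer active and your system reverts to its default swap setup. The file itself stays on disk until you delete it.

#!/usr/bin/env bash
set -e   # stop at the first command that fails
SWAPDIR=/path/to/swapfile   # placeholder: location of the swap file
SIZE=8G                     # placeholder: size of the swap file
sudo fallocate -l "$SIZE" "$SWAPDIR"   # reserve the disk space
sudo chmod 600 "$SWAPDIR"              # only root can read or write it
sudo mkswap "$SWAPDIR"                 # format the file as swap (swapon fails without this)
sudo swapon "$SWAPDIR"                 # start using it until the next reboot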
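Rebooting isn’t the only way back to the default setup. A minimal sketch, assuming you pass the same swap file path used in the script, turns the file off with swapoff and only then deletes it to reclaim the disk space. The function name remove_temp_swap is hypothetical. Note that swapoff has to move anything stored in the swap file back into memory, so it can be slow and needs enough free RAM.

```bash
# Hypothetical helper: undo the temporary swap file without rebooting.
# Pass the same path that SWAPDIR pointed to when the file was created.
remove_temp_swap() {
  local swapfile="${1:-}"
  if [ -z "$swapfile" ] || [ ! -f "$swapfile" ]; then
    echo "usage: remove_temp_swap /path/to/swapfile (file must exist)" >&2
    return 1
  fi
  # Stop using the file as swap, and delete it only if that succeeded.
  sudo swapoff "$swapfile" && sudo rm -- "$swapfile"
}
```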

There Is No Such Thing as an “Online Proctoring System”


The act of proctoring an exam in person is pretty straightforward. I have proctored more than my share over the years.

For most exams, the rules and expectations about what is allowed are established before exam day. The proctor will review them, but they generally aren’t a surprise, and they largely center around the physical space, what additional material is allowed, and other basic, common-sense details.

If an in-person proctor imposed new rules, added new checks, and made technical demands that the student needed to meet in real time before taking the exam, that would be abnormal and intrusive.

If a proctor stared into a test taker’s eyes and tracked where the test taker looked, that would be invasive.

If the proctor demanded that the test taker show them the inside of their backpack at any point, that would be invasive.

If the proctor kept a running tally of every time the test taker looked away from the physical exam, that would be an absurd and meaningless statistic.

If the proctor demanded that the test taker shine a light on their face, sit by the window, turn on more lights in the room, so that the proctor could get a better picture of them, that would be creepy.

If the proctor let the test taker know that they would be sending a report on their behavior to their instructor, and that the instructor might accuse them of cheating, that would not be considered fair or reliable.

In-person proctors generally do not do any of these things.

Yet all of the above, and more, are common “features” of what we currently misname as “online proctoring systems.”

Moving forward, we need to call these systems what they are: surveillance tools used during tests.

There is no straight line between the behavior of in-person proctors and the surveillance of “online proctoring systems.” These are different systems, with different impacts, and that needs to be openly acknowledged.

FunnyMonkey gets a technical facelift

Keeping even a simple website up to date is work, and anything we can do to reduce the time required is a good thing. On this site, I’ve been carrying old posts going back to 2005, which is just plain silly.

In the interest of simplifying things, I made a couple of decisions:

  1. All the old posts are archived as flat HTML; and
  2. the site is now running on WordPress.

I’ve used WordPress for a range of things over the years, and it’s a solid foundation. I’d be lying if I said I loved it, but I don’t hate it, and it doesn’t fill me with revulsion. In an ideal world, I’d be running something built on flat files and Markdown, and I’ll probably move in that direction sooner rather than later, but until then, WordPress is a decent option.