Creating Accessible Data to Empower Asylum Seekers
An Overview of My Experience in Labs Creating a Database to Track Asylum Cases for a Nonprofit

Introduction to Human Rights First
Over the last month, I’ve had the opportunity to work on a project with the nonprofit Human Rights First (HRF). HRF fights for human rights across the United States through policy work and accountability, and protecting refugees and ensuring they can receive asylum is one of its core missions. The project I worked on was directly tied to that mission.
The laws surrounding asylum cases are often applied and interpreted differently by individual immigration judges in the United States. Our goal with this project was to give lawyers insight into how to frame asylum cases in the most compelling way, giving asylum seekers the highest chance of being approved. To achieve this, I worked with a team of data scientists and web developers to create a database that would allow lawyers across the country to access historical information on asylum cases.
When we started this project, we were building on the great work that teams of data scientists had completed before us. Our job was to improve the functionality of a scraper intended to pull data from scans of PDF case files. The idea was that lawyers could upload case files and, instead of having to type out every field for the database manually, the fields would be auto-populated.
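For context, the text-extraction step behind that scraper might look roughly like the sketch below. I’m assuming a pdf2image plus pytesseract OCR stack here purely for illustration; the stack the earlier teams actually used may differ.

```python
# Hypothetical OCR step: convert each scanned page to an image, then run
# Tesseract on it. The real project's extraction stack may differ.
import pytesseract
from pdf2image import convert_from_path


def extract_case_text(pdf_path):
    """Return the full OCR'd text of a scanned case-file PDF."""
    pages = convert_from_path(pdf_path)
    return "\n".join(pytesseract.image_to_string(page) for page in pages)
```

Once the raw text is available, every downstream field scraper works off that single string.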
Training an Algorithm to Think Like a Lawyer
This ended up being a more complicated task than I initially realized. The biggest challenge was simply understanding all of the data we needed to scrape. As someone without any background in law or asylum cases, it took a lot of reading and research to gain an understanding of what needed to be done.
Some of the fields we needed to pull were the date of the case, the gender of the applicant, whether or not the individual had experienced some type of violence, what made them qualify for asylum, whether or not the judge found the applicant credible, and more. A few of these fields were relatively easy to find automatically in the document, but many, like asylum qualifications, were much more context-dependent.
In case files, lawyers will cite precedents or laws but apply only a piece of them to their applicant’s case. Finding a way to make an algorithm understand that distinction was going to be our biggest challenge. As a team, we spent a significant amount of time just reading through different case files and finding patterns within the documents, taking copious notes on the phrases that surrounded the data we were trying to pull.
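As a simplified example of what those notes turned into, a context-dependent field like the judge’s credibility finding can be approximated by searching for the surrounding phrases we catalogued. The phrase lists below are hypothetical stand-ins for the ones the team actually collected, not the project’s exact code.

```python
import re

# Hypothetical surrounding phrases -- stand-ins for the ones catalogued
# while reading through case files.
NOT_CREDIBLE_PHRASES = [r"adverse credibility", r"not\s+credible"]
CREDIBLE_PHRASES = [r"(respondent|applicant)\s+(was\s+)?credible",
                    r"testimony\s+(was\s+)?credible"]


def credibility_finding(text):
    """Rough phrase-based guess at the judge's credibility determination."""
    lowered = text.lower()
    if any(re.search(p, lowered) for p in NOT_CREDIBLE_PHRASES):
        return "not credible"
    if any(re.search(p, lowered) for p in CREDIBLE_PHRASES):
        return "credible"
    return "unknown"
```

Checking the negative phrases first matters, since a sentence like “found the respondent not credible” also contains language that looks like a positive finding.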
Challenges and Triumphs
As we worked through the project, a few problems began to emerge. First, no distinction was being made between the different types of cases. Initial and appellate cases were being treated the same way, and we were trying to pull the same data from both. This doesn’t work: appellate cases have different judges and a different set of potential outcomes. The first major pivot we made in the project was to fully separate appellate and initial cases in our database. Instead of one table with all the cases, there would be two, separated by case type.
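A simple way to make that split is to classify each document before any field-level scraping runs. The sketch below uses a hypothetical keyword heuristic (appellate decisions come from the Board of Immigration Appeals), not the team’s exact logic.

```python
# Hypothetical routing step: decide which table a document belongs in
# before any field-level scraping runs.
APPELLATE_MARKERS = ("board of immigration appeals", "the appeal is")


def case_table(text):
    """Return which table a case belongs in: 'appellate' or 'initial'."""
    lowered = text.lower()
    if any(marker in lowered for marker in APPELLATE_MARKERS):
        return "appellate"
    return "initial"
```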
As a team, we worked to think through problems and solve them together. One of the problems I spent a fair amount of time on was how to pull the correct date from each case file. Depending on the format of the case, the date can appear in several different places, and most cases also include several dates. I first tried scraping the date by finding headings similar to “Date:”, pulling the string of text that followed, and checking whether it contained a date. This worked in most cases but not all, and I thought the algorithm could do better. The most accurate method I found was to use a package called spaCy and its Matcher functionality. Essentially, spaCy ran over the tokenized text in each case file and pulled out any text that matched the patterns I programmed it to look for.
Then I converted each of those strings to a timestamp using Python’s datetime module. The algorithm searched for the most recent timestamp found and returned it in a consistent format. I communicated with the web developers to make sure the data was being returned to them in a way that would work for their database.
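Here is a minimal sketch of that pipeline, assuming dates in the files are written out like “July 15, 2020”. The pattern name, month list, and output format are illustrative rather than the project’s exact code.

```python
from datetime import datetime

import spacy
from spacy.matcher import Matcher

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]

nlp = spacy.blank("en")          # tokenizer only; no trained model needed
matcher = Matcher(nlp.vocab)
matcher.add("LONG_DATE", [[      # matches e.g. "July 15, 2020"
    {"LOWER": {"IN": MONTHS}},
    {"IS_DIGIT": True},
    {"ORTH": ","},
    {"IS_DIGIT": True},
]])


def most_recent_date(text):
    """Return the latest date found in the text as YYYY-MM-DD, or None."""
    doc = nlp(text)
    found = []
    for _, start, end in matcher(doc):
        try:
            found.append(datetime.strptime(doc[start:end].text, "%B %d, %Y"))
        except ValueError:
            continue
    return max(found).strftime("%Y-%m-%d") if found else None
```

Converting every match to a datetime object makes it easy to compare the candidates and hand back only the most recent one, already formatted the way the database expects.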
Polishing Up & Handing Off Our Work
As my team’s four weeks in Labs came to a close, we needed to package up our work and leave it in a state the next group of students could build on. We were all really excited about the progress we had made in just a few weeks. The web developers had created a beautiful web application where cases could be uploaded and stored. The data scientists had successfully created algorithms to scrape seven different fields: city, state, applicant gender, date, country of origin, violence experienced by the applicant, and the statutes referenced within the document.
The next group of students will face the challenge of getting the application 100% ready to hand off to the client. There are still a few data fields the client wants pulled that our team didn’t have time to get to. A new team looking at the scraper might also find ways to make the runtime more efficient: currently it takes around 50 seconds for a document to be uploaded and scraped, so any improvement there would be a huge plus.
One of my favorite parts of this project was being able to work with my team. I had never worked directly with web developers before, so getting to see a piece of their world and what their job involves was awesome. It gave me a better understanding of how to approach my work as a data scientist in a way that makes it more production-ready. More than anything, it made me excited to work on more projects like this and continue to expand the work I am able to do.
Check Out The Code Here: https://github.com/JenFaith/human-rights-first-asylum-ds-a