Using Deep Learning to Identify Malaria In Cells

Creating A Convolutional Neural Network to Predict Whether or Not a Cell is Infected with Malaria

 

When you think of the world’s deadliest diseases, malaria might not make the top of the list. But for children, malaria is the third largest cause of mortality (UNICEF, 2018). In 2019, the World Health Organization (WHO) estimated that over 400,000 individuals died from malaria, and the majority of those were children under the age of 5 (Fact Sheet about Malaria, 2021). While the disease is almost nonexistent in the United States, it affects half of the world’s population (UNICEF, 2018).

In areas where malaria is widespread and medical access is limited, antimalarial drugs are often given whenever an individual has symptoms of malaria, even if there is no formal diagnosis. While this practice makes sense in areas with limited medical care, it has had costly side effects. Alarming drug resistance trends have emerged over the last few decades. UNICEF estimates that in the global malaria hotspots, close to 70% of those infected no longer respond to common antimalarial drug treatments (Fact Sheet about Malaria, 2021). One of the main causes is believed to be the overuse of antimalarial drugs like chloroquine (Hyde, 2007).

 
(Source - World Health Organization, “Global report on antimalarial drug efficacy and drug resistance: 2000–2010”, p. 29)


 

While an obvious remedy would be limiting the distribution of chloroquine to diagnosed cases only, formally confirming a malaria diagnosis is a herculean endeavor in many rural and underdeveloped areas. There are currently two options. The first requires a clinical specialist to analyze a blood smear under a microscope and manually count infected cells to determine a diagnosis (CDC, 2020). The second is a rapid diagnostic test, or RDT. These tests can be done within a few minutes in a clinic and don’t require examination under a microscope. The problem with this type of test is that it can identify only 1 out of 5 strains of malaria (Makanjuola & Taylor-Robinson, 2020).

There needs to be innovation in all areas of malaria prevention and treatment, but a faster route to an accurate diagnosis would be a good start. I decided to see if I could use deep learning to determine whether the cell in a picture was infected with malaria or not.

THE TASK

In this next section, I’m going to explain a bit about what CNNs are and how I developed one to accurately predict malaria infections. If the technical side of things is not interesting to you and you just want to see the results, feel free to skip down to the OUTCOME section :)

Convolutional Neural Networks (CNNs) are a class of deep learning algorithms most commonly used to detect patterns in images. Images are represented to the network as arrays (think structured groups of numbers). Each pixel, or tiny piece of the image, is given a number that depends on its color. These numbers often range from 0 to 255, because 255 is the largest number that can be stored in one byte of memory on your computer. If you’ve done any type of graphic design, you’ll be familiar with this because RGB colors range from 0 to 255.
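To make the idea of an image as an array concrete, here is a minimal sketch using NumPy. The tiny 2x2 image is made up for illustration and is not taken from the malaria dataset, and the rescaling step at the end is a common preprocessing convention rather than something confirmed from the project code.

```python
import numpy as np

# A hypothetical 2x2 RGB image: each pixel holds three values
# (red, green, blue), and each value fits in one byte (0 to 255).
image = np.array(
    [
        [[255, 0, 0], [0, 255, 0]],      # a red pixel, a green pixel
        [[0, 0, 255], [255, 255, 255]],  # a blue pixel, a white pixel
    ],
    dtype=np.uint8,
)

print(image.shape)  # (height, width, color channels)

# A common preprocessing step (an assumption here) is to rescale the
# values to the range 0.0 to 1.0 before feeding them to a network.
scaled = image.astype(np.float32) / 255.0
print(scaled.min(), scaled.max())
```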

From there, the arrays of numbers that represent an image are passed through different layers. These layers can come in many different arrangements and serve different functions, but in the most basic form, they contain mathematical directions on how to process the array and how to assign meaning to the numbers in it. The coolest part is that through a process called backpropagation, those mathematical directions are changed to give more accurate results as the arrays are fed through the network.
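The adjustment step can be illustrated with a toy example. The sketch below is not a neural network and not the project's code: it learns a single weight from one made-up training example using gradient descent. Backpropagation in a real network applies the same kind of update to every weight, using the chain rule to work out each weight's share of the error.

```python
# Toy example: learn one weight w so that w * x matches the target y.
x, y = 2.0, 6.0       # one made-up training example (the ideal w is 3.0)
w = 0.0               # starting guess for the weight
learning_rate = 0.1   # how large each adjustment is (hypothetical value)

for step in range(50):
    prediction = w * x
    error = prediction - y
    gradient = 2 * error * x       # derivative of error**2 with respect to w
    w -= learning_rate * gradient  # nudge w in the direction that lowers the error

print(w)  # very close to 3.0 after 50 steps
```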

The two most common layers in my model are convolution layers and pooling layers. Convolution layers walk through an image’s array one section at a time and put each section through a filter. A filter is itself a small array of numbers whose values are learned during training. It takes the dot product of each section of the image’s array with the filter’s array (multiplying the matching entries and adding up the results). This is how neural nets are able to pull lines, curves, and other defining features out of an image. Pooling layers help to pull out the strongest features in an image. These layers also walk through an image’s array one piece at a time, but instead of passing each piece through a filter, they pull out the largest number in the piece and build a new, smaller array from those largest numbers. After the arrays have been fed through every layer, the process repeats a number of times. Each time, the layers are altered slightly to try to capture the patterns more accurately.
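Both operations can be sketched in a few lines of NumPy. This is a simplified illustration, not the implementation a deep learning library uses: the image is a made-up 6x6 grid, and the filter values are hand-picked to detect a vertical edge, whereas a trained network learns its filter values.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image; at each position, multiply the
    matching entries and add them up."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            output[i, j] = np.sum(patch * kernel)
    return output

def max_pool2d(feature_map, size=2):
    """Keep only the largest value in each non-overlapping size x size block."""
    out_h = feature_map.shape[0] // size
    out_w = feature_map.shape[1] // size
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            block = feature_map[i * size:(i + 1) * size, j * size:(j + 1) * size]
            output[i, j] = block.max()
    return output

# Made-up grayscale image: dark on the left half, bright on the right half.
image = np.array([[0, 0, 0, 1, 1, 1]] * 6, dtype=float)

# Hand-picked filter that responds where brightness increases left to right.
vertical_edge = np.array([[-1, 0, 1]] * 3, dtype=float)

features = convolve2d(image, vertical_edge)  # 4x4 map, large values at the edge
pooled = max_pool2d(features)                # 2x2 map of the strongest responses
print(features)
print(pooled)
```

The feature map is zero where the image is flat and positive where the dark and bright halves meet, and pooling shrinks it while keeping those strong responses.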

In my model, there are 9 layers: 3 convolution, 3 pooling, 1 flattening, and 2 dense layers. I pulled ideas on how to structure my model from the VGG16 model developed by Karen Simonyan and Andrew Zisserman of the University of Oxford (Simonyan & Zisserman, 2015). I borrowed theory from their model on placing pooling layers after convolution layers, but made my network more lightweight since I only needed to train it on one specific type of image. (Theirs was designed to classify images from 1,000 different categories.)
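To show how an image shrinks as it moves through three convolution and pooling pairs before being flattened, here is a small sketch of the shape arithmetic. The 64x64 input size, the 3x3 filters without padding, the 2x2 pooling windows, and the filter counts of 32, 64, and 128 are all assumptions chosen for illustration; the actual values live in the project repository.

```python
def conv_output(size, kernel=3):
    """Side length after a convolution with no padding and stride 1."""
    return size - kernel + 1

def pool_output(size, window=2):
    """Side length after non-overlapping max pooling."""
    return size // window

side, channels = 64, 3               # hypothetical 64x64 RGB input
filters_per_block = [32, 64, 128]    # hypothetical filter counts

shapes = [(side, side, channels)]
for filters in filters_per_block:
    side = conv_output(side)
    channels = filters                      # each filter produces one output channel
    shapes.append((side, side, channels))   # shape after the convolution layer
    side = pool_output(side)
    shapes.append((side, side, channels))   # shape after the pooling layer

# The flattening layer unrolls the last 3D array into one long vector,
# which the dense layers then turn into a prediction.
flattened = side * side * channels

layer_names = ["input"] + ["conv", "pool"] * 3
for name, shape in zip(layer_names, shapes):
    print(name, shape)
print("flatten", flattened)
```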

The model was trained on about 17,000 images, validated on 4,000, and tested on almost 7,000 more. The numbers of infected and uninfected cell images were equal. Because of the structure of neural networks, you can get slightly different accuracies if you allow for random assignment of filter matrices (those mathematical directions I talked about earlier). I wanted to allow for some randomization in the model so that in the long run I could be more sure of the accuracy. I ran the model four times and received similar accuracies each time, giving me confidence that it could be replicated and rerun with successful results.
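A split like this, with the two classes kept equal in every subset, could be sketched as follows. This is not the project's code: the file names are placeholders, the helper `split_balanced` is hypothetical, and the fractions only roughly mirror the 17,000 / 4,000 / 7,000 split described above.

```python
import random

def split_balanced(samples_by_class, train_frac, val_frac, seed):
    """Shuffle each class separately, then slice it into
    train / validation / test so every subset stays balanced."""
    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    train, val, test = [], [], []
    for label, samples in samples_by_class.items():
        shuffled = samples[:]
        rng.shuffle(shuffled)
        n_train = round(len(shuffled) * train_frac)
        n_val = round(len(shuffled) * val_frac)
        train += [(s, label) for s in shuffled[:n_train]]
        val += [(s, label) for s in shuffled[n_train:n_train + n_val]]
        test += [(s, label) for s in shuffled[n_train + n_val:]]
    return train, val, test

# Placeholder file names standing in for the real cell images.
samples_by_class = {
    "infected": [f"infected_{i}.png" for i in range(100)],
    "uninfected": [f"uninfected_{i}.png" for i in range(100)],
}

train, val, test = split_balanced(
    samples_by_class, train_frac=0.6, val_frac=0.15, seed=0
)
print(len(train), len(val), len(test))
```

Changing `seed` gives a different shuffle, which is one simple way to check that results hold up across reruns.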

OUTCOME

Using the CNN model detailed above, the algorithm reached 95% accuracy on the validation set of images and 95% accuracy on the test set of images. That means that on images the model had never seen before, it made correct identifications 95% of the time.
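Accuracy here is simply the share of images that were labeled correctly. As a minimal sketch with made-up labels (these are not the model's real predictions):

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

# Made-up labels for ten cells: 1 = infected, 0 = uninfected.
true_labels      = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted_labels = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0]  # one infected cell missed

print(accuracy(true_labels, predicted_labels))  # 0.9
```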


Where this CNN is not useful:

This algorithm has no way of distinguishing between the different strains of malaria. If in the coming years malaria treatment expanded to treat individual types of malaria, this algorithm would also need to expand. Additionally, this algorithm was built to identify anomalies in cell images that have been stained beforehand to make the malaria parasites stand out. Staining is a common step when slides are prepared, but it’s worth noting that the algorithm would not be helpful if the slides were not stained.

Where this CNN could be useful:

I believe this idea could be expanded in a way that would eliminate much of the manual labor involved in making a malaria diagnosis. If a patient had many of the symptoms of malaria and you were looking to just confirm a diagnosis, all you would need to do is prepare a stained blood slide, take a photo under a microscope, feed the photo into a computer, and the computer would then in just a few seconds be able to show results. It could dramatically cut down on the time and labor required for a formal laboratory diagnosis. While there would still be a lot of work to make the idea production ready, the basic prediction model has been shown to work.


Want to check out the code or run the model yourself? Visit https://github.com/JenFaith/malaria-detection
Have feedback? I’d love to hear from you at jennifer.faith16@gmail.com


References:

  • UNICEF. (2018, April). Ten Things You Didn’t Know About Malaria. https://www.unicef.org/press-releases/ten-things-you-didnt-know-about-malaria

  • Fact sheet about Malaria. (2021, April 1). Who.Int. https://www.who.int/news-room/fact-sheets/detail/malaria

  • Hyde, J. E. (2007). Drug-resistant malaria − an insight. FEBS Journal, 274(18), 4688–4698. https://doi.org/10.1111/j.1742-4658.2007.05999.x

  • Makanjuola, R. O., & Taylor-Robinson, A. W. (2020). Improving Accuracy of Malaria Diagnosis in Underserved Rural and Remote Endemic Areas of Sub-Saharan Africa: A Call to Develop Multiplexing Rapid Diagnostic Tests. Scientifica, 2020, 1–7. https://doi.org/10.1155/2020/3901409

  • CDC. (2020, February 19). CDC - Malaria - Diagnostic Tools. Cdc.Gov. https://www.cdc.gov/malaria/diagnosis_treatment/diagnostic_tools.html

  • Simonyan, K., & Zisserman, A. (2015, October). Very Deep Convolutional Networks for Large-Scale Image Recognition. University of Oxford. https://arxiv.org/pdf/1409.1556.pdf


