Malaria Cell Detection using Convolution Neural Network
Dissertation submitted in part fulfilment of the requirements for the degree of
MSc in Data Analytics at Dublin Business School
Muhammad Haseeb
10514547
MSc in Data Analytics August 2020


DECLARATION

I declare that this applied project, which I have submitted to Dublin Business School for the award of MSc in Data Analytics, is the result of my own investigations, except where otherwise stated and clearly acknowledged by references. Furthermore, this work has not been submitted for any other degree.

Signed: Muhammad Haseeb
Student Number: 10514547
Date:

Table of Contents

  • DECLARATION
  • ACKNOWLEDGEMENT
  • Table of Contents
  • LIST OF FIGURES
  • LIST OF TABLES
  • Abstract
    1. INTRODUCTION
    • 1.1 Malaria Disease characterization
      • 1.1.1 Stages of parasites
      • 1.1.2 Species of parasites
    • 1.2 Malaria Diagnosis Characterization
    • 1.3 Machine learning Overview
    • 1.4 Neural Network Overview
    • 1.5 CNN Architecture
      • 1.5.1 Convolution layer
      • 1.5.2 Pooling layers
    • 1.6 VGG-19 Architecture
    • 1.7 Aim of the project
    • 1.8 Problem Hypotheses and Research Questions
    2. LITERATURE REVIEW
    3. DESIGN IMPLEMENTATION AND SETUP
    • 3.1 Custom layered CNN
    • 3.2 VGG-19 model
    • 3.3 Image processing techniques
      • 3.3.2 Feature extraction
      • 3.3.3 Feature Selection
      • 3.3.4 Classification
    • 3.4 Experimental Set Up
    4. METHODOLOGY
    • 4.1 About the data
    • 4.2 Customized CNN model
      • 4.2.1 Importing the data
      • 4.2.2 Data Exploration
      • 4.2.3 Data Preparation
      • 4.2.4 Image manipulation
        • 4.2.4.1 Feature Standardization
        • 4.2.4.2 ZCA whitening
        • 4.2.4.3 Saving images to disk
      • 4.2.5 Generating images from a directory
      • 4.2.6 Creating the model
      • 4.2.7 Compile the model
      • 4.2.8 Training the model
      • 4.2.9 Fitting the model
    • 4.3 VGG model
      • 4.3.1 Importing the libraries
      • 4.3.2 Data Preparation
      • 4.3.3 Uploading the framework
      • 4.3.4 Creating the model
      • 4.3.5 Compiling the model
      • 4.3.6 Preparing data for the model
      • 4.3.7 Training the model
      • 4.3.8 Fitting the model
    5. RESULTS
    • 5.1 Custom layered CNN
    • 5.2 VGG-19 model
    6. DISCUSSION
    7. RESEARCH LIMITATIONS
    8. CONCLUSION
  • References

LIST OF FIGURES

  • Figure 1: VGG-19 Architecture
  • Figure 2: Configuration of CNN models
  • Figure 3: Design flowchart of custom layered CNN model
  • Figure 4: Design flowchart of VGG-19 model
  • Figure 5: Parasitized cells of training images
  • Figure 6: Uninfected cells of training images
  • Figure 7: Joint plot of dimensions of training images
  • Figure 8: Model summary of CNN model
  • Figure 9: Model summary of VGG-19 model
  • Figure 10: Training and validation loss of CNN model
  • Figure 11: Training and validation accuracy of VGG-19 model
  • Figure 12: Training and validation loss of VGG-19 model

LIST OF TABLES

1. INTRODUCTION

Malaria is an infectious disease caused by protozoan parasites of the genus Plasmodium and spread by Anopheles mosquitoes. With around 240 million reported cases per year, it puts over 40 percent of the world's population at risk. Symptoms range from fever, nausea and headaches to severe complications such as coma, seizures and jaundice (yellowing of the skin), and the disease can be fatal. According to the WHO, almost 219 million cases of malaria were registered in 2017, leading to over 435,000 deaths that year. Malaria can be cured if it is detected early, but severe cases can cause seizures and death.

Many techniques, ranging from cheap to expensive, are used worldwide to diagnose malaria. Technicians often examine the blood cells of an infected person to detect the presence of parasites, but these techniques are time-consuming and sometimes costly. The gold standard is a microscopy-based method in which blood smears are magnified and examined under a microscope. Blood smear analysis is difficult because there are several types of malaria parasites to identify. It requires a high magnification factor, such as 1000x, which makes visual examination tiring for technicians and pathologists. The analysis also takes time, and a long delay can be fatal for patients with severe symptoms. Finally, the observation requires considerable skill and expertise, because parasites differ in appearance. These drawbacks and limitations of manually examining blood cells for parasites have motivated efforts to automate and speed up the diagnosis process.

1.1 Malaria Disease characterization

1.1.1 Stages of parasites

In the human body, the parasite passes through three growth stages that are present in the blood: the trophozoite stage, the schizont stage and the gametocyte stage. Parasites are most commonly seen in the trophozoite stage, in which they appear in thick blood smears and vary from small to large inside the host cells. When the trophozoite stage is complete and the parasite begins to form daughter cells, it passes into the schizont stage, in which the daughter cells, known as merozoites, develop inside the blood cells. The chromatin bodies grow during the schizont stage, and the merozoites then become ready to invade uninfected blood cells. In the gametocyte stage, the parasites take a round or banana-like shape and can be identified as male or female. The sex of the parasites can be identified in thin blood smears, but it is very difficult to identify in thick blood smears.

1.1.2 Species of parasites

There are generally four species of parasite that can infect humans: P. falciparum, P. vivax, P. ovale and P. malariae. The most common is P. falciparum, which infects most patients and can cause severe illness and death if it is not detected early and treated with medicines. P. vivax is most commonly found in the cooler parts of the tropics; it is also dangerous and can cause severe fever and death. P. ovale is the rarest of the four and is found mostly in Africa, particularly West Africa. P. malariae is also widespread, occurring in many of the same regions as P. falciparum, and can leave patients with chronic illness such as nephrotic syndrome. Although it is rarely fatal, it leaves a chronic infection in many patients.

1.2 Malaria Diagnosis Characterization

Malaria infection can be suspected from the patient's symptoms, travel history and a physical examination. To detect and diagnose the disease properly, tests must be conducted by trained technicians. For a long time, the most common method has been microscopic examination, which detects infected cells by preparing a blood smear, staining it with Giemsa stain and observing it under a microscope. This type of diagnosis cannot be

1.5 CNN Architecture

To overcome these problems, convolutional neural networks (CNNs) are used, as they have produced promising results in image classification and feature extraction. Automated malaria screening with deep learning techniques such as CNNs has proved to be an effective way to detect parasites in blood cells. Microscopic screening of blood cells is time-consuming and often inaccurate, especially in rural areas that lack expert pathologists and technicians, which can be considered one of the reasons malaria remains endemic there. A CNN is an image processing technique that uses layers of neurons to extract features from images of blood cells and classify them as parasitized or uninfected. This technique can also support advanced medical studies and research, where it can be developed further for more advanced object detection. CNNs are built mainly from two kinds of layers: convolutional layers and pooling layers. To people, images are collections of colorful objects, but to computers and algorithms they are arrays of numbers known as pixels, with a height, a width and a number of channels. An RGB image has 3 channels per pixel, while a grayscale image has a single channel. When a CNN is trained, the input images pass through the network's layers, and a final activation function converts the network's outputs into probabilities for the different output classes. Now let us discuss the convolution and pooling layers.
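To make the description above concrete, the following minimal NumPy sketch shows the array shapes of an RGB and a grayscale image, and how a softmax activation, one common choice for the final layer, turns raw output scores into class probabilities. The 4x4 image size and the scores are hypothetical values chosen for illustration; they are not taken from the dissertation's model.

```python
import numpy as np

# Hypothetical 4x4 images; real cell images are larger and vary in size.
rgb = np.zeros((4, 4, 3), dtype=np.uint8)   # height x width x 3 channels (R, G, B)
gray = np.zeros((4, 4, 1), dtype=np.uint8)  # height x width x 1 channel

def softmax(scores):
    """Convert a vector of raw class scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical final-layer scores for the two classes in this project.
classes = ["Parasitized", "Uninfected"]
probs = softmax(np.array([2.0, 0.5]))
print({c: round(float(p), 3) for c, p in zip(classes, probs)})
```

For a two-class problem such as parasitized versus uninfected, a single sigmoid output is a common, equivalent alternative: with two classes, the softmax probability of the first class equals the sigmoid of the difference between the two scores.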

1.5.1 Convolution layer

The convolution layer is the first layer and extracts features from the input images. It takes two inputs, the image matrix and a filter, and learns features through the filter's weights. A simple convolution is an operation that multiplies the input data by a set of weights, known as a filter or kernel; for images it is applied to two-dimensional input. The filter is smaller than the input data and slides across it, and at each position a dot product is taken between the filter and the patch of input it covers. A dot product is a scalar product: each position yields a single value, and together these values form the output feature map.
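The sliding dot product described above can be sketched in NumPy as follows. This is a minimal, unoptimized illustration for a single-channel image with stride 1 and no padding; the image and filter values are hypothetical. As in deep learning libraries, the kernel is not flipped, so strictly the operation is a cross-correlation, which is what CNN "convolution" layers compute.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D convolution as used in CNN layers (stride 1, no padding).

    The kernel is not flipped, matching deep learning libraries; a learned
    filter absorbs the difference from a textbook convolution.
    """
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            # Each output value is one scalar: the dot product of patch and filter.
            out[i, j] = np.sum(patch * kernel)
    return out

# Hypothetical 4x4 single-channel image and a 2x2 filter.
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])
feature_map = conv2d(image, kernel)
print(feature_map)
```

With this filter every output value is `image[i, j] - image[i + 1, j + 1]`, so the feature map is filled with -5. A 4x4 input and a 2x2 filter give a (4 - 2 + 1) x (4 - 2 + 1) = 3x3 output, which shows why the filter must be smaller than the input.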

1.5.2 Pooling layers

Pooling layers reduce the size of the feature maps, and therefore the number of parameters and computations in later layers, when the input image is large. Down-sampling the feature maps produced by the convolution layers also makes them less sensitive to the exact location of features in the input image. Pooling layers summarize the features in the feature maps. The two most common types are max pooling and average pooling. Average pooling summarizes the average presence of a feature in each patch of a feature map, whereas max pooling keeps the most strongly activated feature in each patch.
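A minimal NumPy sketch of max and average pooling, assuming non-overlapping 2x2 windows with stride 2 (a common default, not necessarily the setting used in this project). The feature map values are hypothetical.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling with a size x size window and a stride of size.

    Rows and columns that do not fill a whole window are dropped.
    """
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    # Group the map into (row block, row in block, col block, col in block).
    windows = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return windows.max(axis=(1, 3))   # strongest activation in each window
    if mode == "average":
        return windows.mean(axis=(1, 3))  # mean activation in each window
    raise ValueError(f"unknown pooling mode: {mode}")

# Hypothetical 4x4 feature map produced by a convolution layer.
fmap = np.array([[1.0, 3.0, 2.0, 0.0],
                 [5.0, 6.0, 1.0, 2.0],
                 [7.0, 2.0, 9.0, 4.0],
                 [0.0, 1.0, 3.0, 8.0]])
print(pool2d(fmap, mode="max"))
print(pool2d(fmap, mode="average"))
```

Both modes shrink the 4x4 map to 2x2, a four-fold reduction in the values passed to the next layer. Max pooling keeps [[6, 2], [7, 9]], the most activated value in each window, while average pooling keeps [[3.75, 1.25], [2.5, 6.0]].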

1.6 VGG-19 Architecture

VGG is a deep convolutional network proposed by Simonyan and Zisserman in 2014, and it is used here as a pre-trained model. A pre-trained model has already been trained on a large set of images, so its learned weights and biases can be reused when training on new images, saving time and computational power. VGG-19 was trained on the ImageNet ILSVRC dataset of about 1.2 million training images in 1,000 categories, a subset of the full ImageNet project, which contains over 14 million images in about 22,000 categories. Through transfer learning methods such as feature extraction and fine-tuning, the features learned on these images can be used to classify images outside of these categories. VGG is simple in design: it stacks 3x3 convolutional layers on top of each other and has 19 weight layers in total (16 convolutional and 3 fully connected). Max pooling between blocks of convolutional layers reduces the spatial size of the volume. The network ends with two fully connected layers of 4096 nodes and a final 1000-node fully connected layer, followed by a softmax function. Figure 1 illustrates the architecture.

Figure 2: Configuration of CNN models

The VGG-19 architecture is defined in column E of the configuration table.
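The layer counts in configuration E can be checked with a short Python sketch that walks the configuration and counts weight layers and parameters. It assumes the standard ImageNet setup of 224x224 RGB inputs, "same" padding for the 3x3 convolutions and a 1000-class output; the names `VGG19_CONFIG` and `summarize` are illustrative and not part of any library.

```python
# VGG-19 (configuration E): numbers are 3x3 conv output channels, "M" is 2x2 max pooling.
VGG19_CONFIG = [64, 64, "M",
                128, 128, "M",
                256, 256, 256, 256, "M",
                512, 512, 512, 512, "M",
                512, 512, 512, 512, "M"]
FC_SIZES = [4096, 4096, 1000]  # two 4096-node layers, then 1000 ImageNet classes

def summarize(config, fc_sizes, input_size=224, in_channels=3, kernel=3):
    """Return (conv layers, weight layers, final spatial size, parameter count)."""
    conv_layers, params = 0, 0
    size, channels = input_size, in_channels
    for item in config:
        if item == "M":
            size //= 2  # each max-pooling step halves height and width
        else:
            # 'same' padding keeps the spatial size; count weights plus biases.
            params += kernel * kernel * channels * item + item
            channels = item
            conv_layers += 1
    features = size * size * channels  # flattened input to the first FC layer
    for units in fc_sizes:
        params += features * units + units
        features = units
    return conv_layers, conv_layers + len(fc_sizes), size, params

conv, weight_layers, final_size, total_params = summarize(VGG19_CONFIG, FC_SIZES)
print(conv, weight_layers, final_size, f"{total_params:,}")
```

Under these assumptions the sketch reports 16 convolutional layers, 19 weight layers, a 7x7 final feature map and 143,667,240 parameters, of which about 124 million sit in the three fully connected layers. This is one reason transfer learning often keeps the convolutional base and replaces the fully connected head.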

1.7 Aim of the project

This project is designed to serve the following purposes:

a) To classify the images as parasitized or uninfected
b) To attempt to improve the model using VGG-
c) To study the effect of misclassification rates
d) To study the effect of different convolution layers
e) To perform a better analysis than a standard CNN model
f) To find an effective method for malaria detection

1.8 Problem Hypotheses and Research Questions

Previous studies have found convolution layers effective in medical image classification, and different approaches have been used to improve model accuracy. Misclassification is a serious concern in medical image classification in practice: a high false negative rate leaves people who have the disease undiagnosed, while a high false positive rate places an economic burden on uninfected people who are wrongly diagnosed. Adding layers and using pre-trained models are also claimed to improve performance and reduce misclassification. This leads to the following hypotheses:

  • Hypothesis 1: The model will achieve good accuracy with a minimal number of convolution layers.
  • Hypothesis 2: The model will achieve good accuracy with a pre-trained approach.
  • Hypothesis 3: The misclassification rate will be low enough to avoid problems when making new predictions.
  • Hypothesis 4: The overall performance of the model will be effective in solving medical problems.

The effectiveness of the model and the use of convolution layers in solving medical imaging problems raise several research questions. They are ordered from less abstract to more abstract as follows:

  1. How effective is a pre-trained model compared with a custom layered model? The pre-trained model has already been trained on ImageNet, using about 1.2 million training images in 1,000 categories from the ILSVRC subset. Through transfer learning, the features it has learned can be extracted and reused on new images it was not trained on. Pre-trained models are effective because they are already trained, which saves a lot of computational power and time. A custom layered model is trained from scratch, which can also be effective because it may learn the features of medical images more accurately.
  2. Why is VGG-19 used as the pre-trained model rather than other models?
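Hypothesis 3 above depends on how the misclassification rate is measured. The minimal sketch below assumes that "Parasitized" is treated as the positive class and computes the misclassification rate together with the false positive and false negative rates, which correspond to the two kinds of error that make misclassification a concern in medical imaging. The labels are hypothetical and are not results from this project.

```python
def error_rates(y_true, y_pred, positive="Parasitized"):
    """Misclassification, false positive and false negative rates for two classes."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)  # uninfected flagged as infected
    fn = sum(t == positive and p != positive for t, p in pairs)  # infected cells missed
    return {
        "misclassification_rate": (fp + fn) / len(pairs),
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }

# Hypothetical labels for six cell images.
y_true = ["Parasitized", "Parasitized", "Parasitized", "Parasitized", "Uninfected", "Uninfected"]
y_pred = ["Parasitized", "Parasitized", "Uninfected", "Parasitized", "Parasitized", "Uninfected"]
rates = error_rates(y_true, y_pred)
print(rates)
```

Here 2 of the 6 images are misclassified (a rate of 1/3), the false positive rate is 0.5 and the false negative rate is 0.25. Reporting the two rates separately matters because a missed infection and a false alarm have very different consequences for patients.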

2. LITERATURE REVIEW

Many previous studies have used image recognition software and neural networks to detect infected blood cells. Some of these studies were followed closely in this research. They use different approaches to blood cell detection and were collected to gain a basic understanding of how effective these methods are, and the evidence they provide encouraged us to pursue this research further. Poharkar and Ladhake, 2018 [1] used a Boltzmann machine for model pre-training with concatenated color and texture features. Other useful color feature extraction techniques include the color coherence vector and the color histogram. Texture features were computed from gray levels using statistical methods such as the gray level co-occurrence matrix, local binary patterns and the gray level run length matrix. The network was trained with the backpropagation algorithm using 4 hidden layers with 600 hidden nodes, and the study achieved a specificity of almost 95%. Sanjay Saxena and Paul, 2021 [2] proposed an algorithm for predicting malaria using an artificial neural network, with data collected from the patient's physical appearance or verbal account rather than from laboratory test results. Delgado-Ortet et al., 2020 [3] performed diagnosis with a CNN on blood smear samples from rural areas, specifying the resolution of the P. falciparum and P. vivax ratio. To detect infected blood cells, they used laser desorption mass spectrometry during the growth of red blood cells, applying an ultraviolet laser beam to clean the blood samples at a specified threshold. Fagbamigbe, 2019 [4] also used various machine learning and deep learning approaches for the medical diagnosis of malaria-infected blood cells.
He used techniques such as decision trees, linear programming, rule-based approaches, fuzzy logic, Mockler situation analysis, SVMs, neural networks and expert systems. Poostchi et al., 2018 [5] described an automated system that identifies infected samples from blood smears. An extraction method is applied to separate the red blood cells (RBCs). The first algorithm applied in the study is C-EPAC Geodesic Active Contour based segmentation, which was tested on malaria-infected human and mouse blood cells. They collected the data from clinical databases, extracted features such as RBC color and texture, and evaluated classification performance. Neural networks were used to classify the cells. A later study by Panchori et al., 2020 [6] analyzed the images using machine learning approaches, focusing more on cell classification and segmentation. Their research covered various image acquisition techniques for malaria, such as fluorescent, binocolor, multispectral, multimodal and phase imaging, cartridge scanners, image-based cytometers, electron scanning, serial block-face scanning and sightDX image scanning. Srivastava, 2020 [7] also applied image processing techniques to detect malaria. Parasites in infected cells are detected with the help of uninfected red blood cells, and the cells are segmented using local binary patterns. An SVM is then used to count the Plasmodium parasites. Amini and Rabbani, 2016 [8] applied ORB features and an SVM to medical image classification. They treated local features as small fractions of a medical image and applied KNN to classify the whole image. Warrier, 2018 [9] conducted a similar study to detect pneumonia, extracting features with the discrete wavelet transform and the wavelet frame transform and using the fuzzy C-means technique. Caicedo et al., 2017a [10] used an SVM with SIFT (scale-invariant feature transform) descriptors to classify images and achieved a precision of about 67%. Amami, 2017 [11] used ORB to achieve better performance than SIFT, again with an SVM classifier.
In a similar study, R.G, 2020 [12] designed a custom CNN with shallow convolutional layers to detect lung diseases, an approach that can be extended to other disease detection tasks. A CNN trained on chest X-ray images can achieve high accuracy and sensitivity, for example on datasets such as the Stanford Normal Radiology diagnosis dataset, which contains around 400,000 chest X-ray (CXR) images. Transfer learning is another option for training a model when limited training data are available.
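Local binary patterns, used for segmentation by Srivastava, 2020 [7], are a simple hand-crafted texture descriptor, in contrast to the features a CNN learns. The sketch below implements the basic 8-neighbour, radius-1 variant in NumPy as an illustration only; it is not necessarily the variant used in that study, and the 3x3 patch is hypothetical. A histogram of the codes over an image region is a common feature vector for classifiers such as the SVMs mentioned in several of the studies above.

```python
import numpy as np

def local_binary_pattern(gray):
    """Basic LBP: 8 neighbours at radius 1, computed for every interior pixel.

    Each neighbour whose value is >= the centre pixel sets one bit, so every
    code is an integer from 0 to 255 describing the local texture.
    """
    gray = np.asarray(gray, dtype=float)
    h, w = gray.shape
    center = gray[1:h - 1, 1:w - 1]
    # Neighbour offsets (row, col), clockwise from the top-left pixel; bit 0 first.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += (neighbour >= center).astype(np.int64) << bit
    return codes

# Hypothetical 3x3 grayscale patch (centre value 6).
patch = np.array([[5, 9, 1],
                  [4, 6, 7],
                  [2, 6, 3]])
codes = local_binary_pattern(patch)
hist = np.bincount(codes.ravel(), minlength=256)  # 256-bin texture descriptor
print(codes, hist.nonzero()[0])
```

In the example patch, the neighbours 9, 7 and 6 are at least as bright as the centre value 6, setting bits 1, 3 and 5, so the code is 2 + 8 + 32 = 42. scikit-image provides an optimized `skimage.feature.local_binary_pattern` with further variants, although its neighbour ordering may differ from this sketch.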