I recently completed the cs231n course and wanted to implement some of the things I learnt. So I turned to this report on predicting heart volumes from an MRI image dataset of 500 patients. This post will outline how I set up the model on a remote machine and interpreted the results.

Overview

The pipeline extracts features from an existing Caffe model and trains a CNN on these enriched features instead of on the raw images directly. The data is structured as follows: each patient has MRI scans taken from multiple views, and each view has 30 images. There are 500 such patients. The number of views varies across patients, but averages around 11. The labels provided contain the actual systole and diastole volumes for each patient, so there is one pair of labels for every set of roughly 11x30 images. The process of assigning these labels to images is explained in detail here, but the idea is to:

  1. Identify the systole and diastole image in each view by measuring image brightness (the brightest frame is taken as systole and the darkest as diastole). This yields 1 systole and 1 diastole image from each view.
  2. Bucket the views into 4 categories. From each bucket, randomly pick one of the images selected in step 1.
  3. Concatenate the picked images and assign the patient's label to each concatenated image.

The following block diagram illustrates the steps mentioned above:

CNN Architecture

Download the Dataset

First, download the dataset from kaggle. This is a little tricky to do on a remote server, since there is no browser available to handle the login. To circumvent this, you need to do the following:

  1. Download this chrome extension.
  2. Log in to kaggle and visit this page hosting the dataset.
  3. Get the cookies from the extension and save them on your remote machine at $PROJECT_PATH/train_data/cookies.txt
  4. Inside the $PROJECT_PATH/train_data directory, run nohup wget --load-cookies cookies.txt https://www.kaggle.com/c/second-annual-data-science-bowl/download/train.zip &
  5. Do the same for the labelled data as well.

Set up the ipython server

Next, you'll have to set up ipython so that you can view the images and visualize the data. This tutorial gives a detailed breakdown of the steps involved. To view the notebook, ssh into the machine as follows: ssh -X -L localhost:8888:localhost:<port defined in ipython config> ubuntu@<ip_address>. The -X flag allows you to view images from the remote machine on your local one. Now execute jupyter notebook --no-browser on the remote machine and open localhost:8888/ in the browser on your local machine.
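For reference, here is a minimal sketch of the relevant notebook config (typically ~/.jupyter/jupyter_notebook_config.py). The port value is an arbitrary choice of mine; it just has to match the one used in the ssh tunnel above:

```python
# ~/.jupyter/jupyter_notebook_config.py -- a minimal sketch; the port here is
# an assumed example and must match <port defined in ipython config> above.
c = get_config()

c.NotebookApp.ip = 'localhost'      # only listen locally; access via the tunnel
c.NotebookApp.port = 8889           # example port, not prescribed by the tutorial
c.NotebookApp.open_browser = False  # the remote machine has no browser
```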

View the images

To get a sense of the data, I first decided to view a few of the images and see whether the brightness heuristic gives a good approximation of the systolic and diastolic heart images within each view's set of 30. I used Otsu thresholding to compute this brightness. This notebook shows the images chosen as systolic and diastolic from the 30 images in a view. You will need ImageMagick installed on your remote machine for this.
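As a rough illustration, here is a minimal sketch of the brightness heuristic, assuming the frames of a view are loaded as 2-D numpy arrays. It scores each frame by the fraction of pixels above its Otsu threshold, which is my reading of the heuristic rather than the notebook's exact code:

```python
import numpy as np
from skimage.filters import threshold_otsu

def brightness(frame):
    """Fraction of pixels above the frame's Otsu threshold."""
    return (frame > threshold_otsu(frame)).mean()

def pick_systole_diastole(frames):
    """frames: the 30 2-D arrays of one view; returns (systole, diastole)."""
    scores = [brightness(f) for f in frames]
    return frames[int(np.argmax(scores))], frames[int(np.argmin(scores))]
```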

Image Selection

As discussed in the Overview, 4 images for systole and 4 for diastole have to be chosen for every patient. The first step, identifying systole and diastole, was covered above. Next come the view buckets: the ch views form 1 bucket, whereas the sax views form 3 buckets. For the latter, I sorted the sax views in ascending order and split them into 3 equal groups. I then picked one image at random from each of these buckets, arriving at 4 images each for systole and diastole (see the sketch below). This python script walks through these steps. It should be used as follows: python data_input.py <path_to_dataset> <path_to_label> <path_to_output> <mode>
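A minimal sketch of that bucketing logic, assuming view directories are named along the lines of 2ch_10 or sax_5; the naming scheme and helper functions are illustrative, not the script's exact code:

```python
import random
import numpy as np

def bucket_views(view_names):
    """Return 4 buckets: all ch views, plus 3 equal groups of sorted sax views."""
    ch = [v for v in view_names if 'ch' in v]
    sax = sorted((v for v in view_names if v.startswith('sax')),
                 key=lambda v: int(v.split('_')[-1]))   # ascending view number
    return [ch] + [list(g) for g in np.array_split(sax, 3)]

def pick_one_per_bucket(buckets, selected):
    """selected: dict of view name -> image chosen in the brightness step."""
    return [selected[random.choice(bucket)] for bucket in buckets]
```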

Extract features using caffe

Install Caffe and download model

First, install caffe and, specifically, get pycaffe to work. I used this tutorial. The premise of feature extraction is that you take an existing model with pre-trained weights and collect the activations for your images at a layer of your choosing. For example, take a model X with 20 layers, pass your training data through it, and collect the output at, say, the 15th layer. Now that you have enriched features instead of raw images, you only need to train a few layers from scratch. I decided to follow this report and used the VGG_ILSVRC_19_layers model. Steps on how to get the model are explained here. Just one catch: to enable download_model_binary.py to parse the readme.md, the following changes have to be made to it. Add --- before name: and after gist_id:; this delimiter tells the parser where to start and stop parsing. Next, add a sha1: field after gist_id:, since download_model_binary.py will throw an error if this field is missing.
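After those edits, the front matter at the top of readme.md should look roughly like this sketch; the gist_id and sha1 values are placeholders for the actual gist id and the checksum of the .caffemodel file:

```
---
name: VGG_ILSVRC_19_layers
caffemodel: VGG_ILSVRC_19_layers.caffemodel
caffemodel_url: http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_19_layers.caffemodel
gist_id: <gist id of this readme>
sha1: <sha1 checksum of the .caffemodel>
---
```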

View the extracted features

Before extracting features for all the images, it is instructive to view what these ‘features’ look like. I wasn’t able to draw too many conclusions from them, but it helped me get a feel for what I was really extracting from the caffe model. I slightly modified an existing python notebook that does the same for a different model (with more interpretable results); a link to a similar notebook for this project can be found here. It requires you to first obtain the mean of each of the 8 image sets, which I did using this python script.
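The core of that visualization is a helper along the lines of vis_square from Caffe's filter-visualization notebook, which tiles a stack of feature maps into one grid image. A lightly simplified sketch (the layer name in the usage comment is an assumption):

```python
import numpy as np
import matplotlib.pyplot as plt

def vis_square(data):
    """Tile a (n, h, w) stack of feature maps into a roughly square grid."""
    data = (data - data.min()) / (data.max() - data.min())  # normalize to [0, 1]
    n = int(np.ceil(np.sqrt(data.shape[0])))                # grid side length
    pad = ((0, n ** 2 - data.shape[0]), (0, 1), (0, 1))     # fill grid + borders
    data = np.pad(data, pad, mode='constant', constant_values=1)
    data = data.reshape((n, n) + data.shape[1:]).transpose(0, 2, 1, 3)
    data = data.reshape(n * data.shape[1], n * data.shape[3])
    plt.imshow(data, cmap='gray'); plt.axis('off'); plt.show()

# e.g. the first 64 feature maps of one image, at an assumed layer name:
# vis_square(net.blobs['conv5_4'].data[0, :64])
```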

Obtain and store extracted features for all images

Now that we have visualized the features, we can perform this forward pass for all images and store the outputs as inputs for the model we create. I stored the output in HDF5 format using the python library tables. I also concatenate the features extracted from the images in each of the 4 buckets, each of size (512x14x14), and store them as a single image of size (512x28x28). The code for doing this is here. You will have to change the batch size in VGG_ILSVRC_19_layers_deploy.prototxt to read 50 examples at a time. (I would have preferred more, but I ran out of memory at 100.)
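A minimal sketch of this step, assuming a pycaffe net already loaded from the deploy prototxt and an iterable of preprocessed batches. The blob name conv5_4, the batch layout (4 consecutive bucket images per example), and the file paths are all assumptions for illustration:

```python
import numpy as np
import tables

# extendable HDF5 array: one (512, 28, 28) example appended at a time
h5 = tables.open_file('extracted_feat/features.h5', mode='w')
storage = h5.create_earray(h5.root, 'features',
                           atom=tables.Float32Atom(),
                           shape=(0, 512, 28, 28))

for batch in batches:                         # batch: (50, 3, 224, 224) images
    net.blobs['data'].data[...] = batch
    net.forward()
    feats = net.blobs['conv5_4'].data         # assumed layer; (50, 512, 14, 14)
    # tile the 4 bucket features of each example into a 2x2 spatial grid
    for i in range(0, feats.shape[0], 4):
        top = np.concatenate([feats[i], feats[i + 1]], axis=2)    # (512, 14, 28)
        bot = np.concatenate([feats[i + 2], feats[i + 3]], axis=2)
        storage.append(np.concatenate([top, bot], axis=1)[None])  # (1, 512, 28, 28)

h5.close()
```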

Train your network

Store labels in a separate npy file

I used a separate python script to do this (a minimal sketch follows the list below). By this step, the root_input_path should contain the following directories:

  1. max_min_select
  2. random_select
  3. labelled_data
  4. mean_image
  5. extracted_feat
  6. labels
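
That script boils down to something like the following, assuming the Kaggle train.csv layout of Id, Systole, Diastole columns; the file paths are illustrative:

```python
import csv
import numpy as np

systole, diastole = [], []
with open('labelled_data/train.csv') as f:
    for row in csv.DictReader(f):             # columns: Id, Systole, Diastole
        systole.append(float(row['Systole']))
        diastole.append(float(row['Diastole']))

np.save('labels/systole.npy', np.asarray(systole, dtype=np.float32))
np.save('labels/diastole.npy', np.asarray(diastole, dtype=np.float32))
```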

Finally, let's train

Some of the code in the above section and almost all of the code in this section is from this project. I decided to iterate over 3 choices of models, where each new model adds an extra convolution layer. The script that does this is here. Model1 has 1 conv layer, model2 has 2, and model3 has 3; each conv layer is followed by dropout.
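A minimal sketch of that model family in Keras; the filter counts, dropout rate, optimizer, and loss below are my assumptions, not necessarily the script's exact choices:

```python
from keras.models import Sequential
from keras.layers import Convolution2D, Dropout, Flatten, Dense

def build_model(n_conv_layers):
    """model1/2/3 differ only in the number of conv (+ dropout) layers."""
    model = Sequential()
    model.add(Convolution2D(64, 3, 3, activation='relu',
                            input_shape=(512, 28, 28)))  # concatenated VGG features
    model.add(Dropout(0.5))
    for _ in range(n_conv_layers - 1):
        model.add(Convolution2D(64, 3, 3, activation='relu'))
        model.add(Dropout(0.5))
    model.add(Flatten())
    model.add(Dense(1))               # regress one volume (systole or diastole)
    model.compile(optimizer='adam', loss='mse')
    return model

model3 = build_model(3)
```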

Test your network

Run data_input.py followed by extract_features.py to obtain the necessary input for the model. Then run keras_model.py in test mode to obtain the actual-versus-predicted arrays.

Results

Using the output stored by the keras_model.py script, we can plot the following for each of the models (a plotting sketch follows the list):

  1. Train/Validation loss across iterations
  2. Actual versus predicted volume for validation data.
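
A sketch of both plots, assuming keras_model.py dumped the loss history and the validation predictions as .npy arrays; the file names here are hypothetical:

```python
import numpy as np
import matplotlib.pyplot as plt

train_loss = np.load('output/train_loss.npy')
val_loss = np.load('output/val_loss.npy')
actual = np.load('output/val_actual.npy')        # true volumes
predicted = np.load('output/val_predicted.npy')  # model outputs

plt.figure()
plt.plot(train_loss, label='train')
plt.plot(val_loss, label='validation')
plt.xlabel('iteration'); plt.ylabel('loss'); plt.legend()

plt.figure()
plt.scatter(actual, predicted)
lims = [actual.min(), actual.max()]
plt.plot(lims, lims)                             # y = x reference line
plt.xlabel('actual volume'); plt.ylabel('predicted volume')
plt.show()
```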

Model1:

  1. Training versus Validation Loss

  2. Actual versus Predicted Systole Validation Dataset

  3. Actual versus Predicted Diastole Validation Dataset

Model2:

  1. Training versus Validation Loss

  2. Actual versus Predicted Systole Validation Dataset

  3. Actual versus Predicted Diastole Validation Dataset

Model3:

  1. Training versus Validation Loss

  2. Actual versus Predicted Systole Validation Dataset

  3. Actual versus Predicted Diastole Validation Dataset

Observations

  1. In all 3 models, the validation loss for systole and diastole more or less plateaued after 20-25 iterations, although the train loss kept decreasing until around 150 iterations. I suspect the model was heavily overfitting, since training used only 400 examples (an 80/20 train/validation split of the 500 patients in total), each with (512x28x28) features.
  2. More layers didn't help. The graphs suggest that training for more iterations might have reduced the training loss of model3 further, but the validation loss showed no downward slope.
  3. Interestingly, diastole gave significantly worse loss than systole in each of the models.
  4. Model1 did a fairly good job of figuring out the average systole and diastole volumes in the validation dataset. It doesn't do so well with large outliers, but it gives a fairly decent approximation for variations around the average.