by Bochen Guan, Saide Wu and Cheng Xiang
METHOD
The proposed deep learning-based ACL diagnosis system consists of three deep networks. Since an MRI sequence consists of several image slices and not every slice contains the ACL, we use the first CNN to detect slices of interest (slices containing the ligament). The second CNN detects the ligament region in the selected slices and localizes the ligament to narrow the region to be recognized. The third CNN, a classifier, evaluates the ACL structure within the segmented images produced by the second CNN. These three networks are connected in a cascaded fashion to create a fully automated processing pipeline, outlined in Figure 1. The CNN processing pipeline is implemented in a hybrid computing environment involving Python (version 2.7, Python Software Foundation, Wilmington, DE). The CNNs are coded using the TensorFlow deep learning framework.
Figure 1 Our model consists of three CNNs. When an MRI sequence is fed to the network, we first detect the slices containing the ACL, then locate the ligament, and finally determine whether an ACL tear exists. The final diagnosis for a subject is determined by the output of the last CNN.
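The cascade described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not the authors' implementation: the three stage functions (`contains_acl`, `locate_ligament`, `tear_probability`) are hypothetical stand-ins for the trained CNNs, images are plain nested lists, and the box format is an assumption. How per-slice outputs are combined into a subject-level diagnosis is not specified in the text, so the sketch simply returns the per-slice scores.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Image = List[List[float]]        # a 2-D grayscale slice as nested lists
Box = Tuple[int, int, int, int]  # (row0, col0, row1, col1), end-exclusive (assumed format)


def crop(image: Image, box: Box) -> Image:
    """Cut the region given by box out of the image."""
    r0, c0, r1, c1 = box
    return [row[c0:c1] for row in image[r0:r1]]


def diagnose(
    slices: Sequence[Image],
    contains_acl: Callable[[Image], bool],               # stand-in for CNN 1
    locate_ligament: Callable[[Image], Optional[Box]],   # stand-in for CNN 2
    tear_probability: Callable[[Image], float],          # stand-in for CNN 3
) -> List[float]:
    """Run the three stages in order; return one score per slice that passes stages 1 and 2."""
    scores = []
    for image in slices:
        if not contains_acl(image):       # stage 1: keep only slices of interest
            continue
        box = locate_ligament(image)      # stage 2: find the ligament region
        if box is None:                   # no ligament found in this slice
            continue
        scores.append(tear_probability(crop(image, box)))  # stage 3: classify the crop
    return scores
```

Each stage only sees what the previous stage let through, so a slice rejected by the first network never reaches the detector or the classifier.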
The slice detection CNN, shown in Figure 2, is developed from LeNet-5, which is widely used to recognize visual patterns directly from pixel images with minimal preprocessing [2]. Our model has two sets of convolutional, activation, and pooling layers, followed by a fully connected layer, an activation, another fully connected layer, and finally a softmax classifier. The convolutional layers extract features from the images, while the fully connected layers predict the probability that a slice contains the ACL. A higher input resolution improves accuracy and makes fuller use of the information in the original slices of an MRI sequence [6]. Thus, the input resolution of our model is 256 by 256 rather than the 64 by 64 of LeNet-5. Instead of the 5 by 5 kernels used by LeNet-5, we use 7 by 7 kernels in the first convolutional layer and 5 by 5 kernels in the second, which enables the model to capture large-scale features.
Figure 2 Illustration of our CNN for detecting slices of interest. The CNN can be divided into two parts: two convolutional and max pooling layers that extract features, followed by two fully connected layers for classification.
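To make the effect of the 256 by 256 input and the 7 by 7 and 5 by 5 kernels concrete, the following sketch traces the spatial size of the feature maps through the two convolution and pooling stages. The text does not state strides, padding, or pooling window, so the sketch assumes unpadded stride-1 convolutions and 2 by 2 max pooling with stride 2; the function names are illustrative only.

```python
def conv_out(size, kernel, stride=1):
    """Output width of an unpadded convolution (assumed: no padding)."""
    return (size - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Output width of a pooling layer (assumed: 2x2 window, stride 2)."""
    return (size - window) // stride + 1


def feature_map_sizes(input_size, kernels):
    """Return (layer name, output width) for each conv + pool stage."""
    sizes = []
    size = input_size
    for k in kernels:
        size = conv_out(size, k)
        sizes.append(("conv%dx%d" % (k, k), size))
        size = pool_out(size)
        sizes.append(("pool2x2", size))
    return sizes


# Slice detection CNN: 256x256 input, 7x7 then 5x5 kernels.
for name, width in feature_map_sizes(256, [7, 5]):
    print(name, width)
# conv7x7 250
# pool2x2 125
# conv5x5 121
# pool2x2 60
```

Under these assumptions the fully connected layers receive 60 by 60 feature maps; with padded convolutions the sizes would differ.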
Our ligament detection CNN is developed from Tiny YOLO [6]. The network has 9 convolutional layers followed by 2 fully connected layers. The 9 convolutional layers extract features from the image, and the 2 fully connected layers predict the probability of a detected ligament and estimate its coordinates. The selected slices are first resized and fed to the network. YOLO treats detection as a regression problem: it divides the resized image into an S×S grid, and each grid cell predicts B bounding boxes, each with a confidence score, together with C class probabilities. Therefore, the prediction output is an S×S×(B×5+C) tensor. Our model then thresholds the confidence score and class probability of the final output and decides whether the detected region is a ligament. In addition, a linear activation function is used for the final layer of our model, and the leaky rectified linear activation function is used for all other layers of Tiny YOLO. Compared with other object detection algorithms, Tiny YOLO produces fewer background errors and has a simpler architecture, which allows it to be trained with fewer MRI images.
Figure 3 Illustration of our ligament detection CNN. Instead of the 24 convolutional layers of YOLO, Tiny YOLO has only 9 convolutional layers with max pooling, which leads to faster convergence and reduces the amount of data needed for training and validation.
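The size of the S×S×(B×5+C) output and the thresholding step can be illustrated with a short sketch. The values S = 7, B = 2, and C = 1 (a single ligament class) and the threshold are assumptions chosen for the example, since the text does not give them. Scoring a box by confidence times class probability and the leaky slope of 0.1 follow the original YOLO formulation; `pick_ligament` is a hypothetical helper, not part of any library.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), as predicted by YOLO


def yolo_output_shape(S: int, B: int, C: int) -> Tuple[int, int, int]:
    """Each of the S*S cells predicts B boxes (x, y, w, h, confidence) plus C class probabilities."""
    return (S, S, B * 5 + C)


def leaky_relu(x: float, alpha: float = 0.1) -> float:
    """Leaky rectified linear activation used in all but the final layer."""
    return x if x > 0 else alpha * x


def pick_ligament(
    candidates: List[Tuple[Box, float, float]], threshold: float
) -> Optional[Box]:
    """Return the box with the highest confidence * class probability, if it reaches the threshold."""
    best_box, best_score = None, threshold
    for box, confidence, class_prob in candidates:
        score = confidence * class_prob
        if score >= best_score:
            best_box, best_score = box, score
    return best_box


print(yolo_output_shape(7, 2, 1))  # (7, 7, 11) under the assumed S, B, C
```

With one class, each cell contributes 2×5+1 = 11 values, so the whole prediction is far smaller than in a multi-class detector, and a slice with no candidate above the threshold yields no ligament region.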
The recognition CNN is also developed from LeNet. However, it takes a 64×64 input image and uses 5×5 kernels in the first convolutional layer and 3×3 kernels in the second. Since an ACL tear is not a large-scale, obvious feature, smaller kernels are used to improve the CNN's sensitivity to detail and to ensure that the model can extract small-scale features.
Figure 4 Illustration of our ACL tear detection CNN. This CNN is also developed from LeNet-5 but uses smaller kernels.
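One way to quantify the kernel choice is the receptive field, that is, how many input pixels along one axis influence a single unit after the two convolution and pooling stages. The sketch below compares the 7×7 and 5×5 kernels of the slice detection CNN with the 5×5 and 3×3 kernels of the recognition CNN. It assumes stride-1 convolutions each followed by 2×2 pooling with stride 2, which the text does not state explicitly.

```python
def receptive_field(conv_kernels, pool_window=2, pool_stride=2):
    """Receptive field (in input pixels, one axis) after conv + pool stages.

    Assumes each convolution has stride 1 and is followed by one pooling layer.
    """
    rf = 1    # receptive field of a single input pixel
    jump = 1  # distance in input pixels between neighbouring units
    for k in conv_kernels:
        rf += (k - 1) * jump            # stride-1 convolution widens the field
        rf += (pool_window - 1) * jump  # pooling widens it slightly more
        jump *= pool_stride             # pooling spreads units further apart
    return rf


print(receptive_field([7, 5]))  # slice detection CNN: 18
print(receptive_field([5, 3]))  # recognition CNN: 12
```

Under these assumptions each unit of the recognition CNN responds to a 12-pixel-wide window rather than an 18-pixel-wide one, which is consistent with the stated aim of capturing finer detail.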