CNN Image Classification Using Deep Learning

Convolutional Neural Network

APPLICATION OF CONVOLUTIONAL NEURAL NETWORK IN IMAGE
CLASSIFICATION

Name

Course

Professor's Name

Institution

Location of Institution

Date


ABSTRACT

Thanks to its broad applications in fields as diverse as smart surveillance and tracking, health and
medicine, sports and entertainment, robots, drones, and self-driving cars, computer vision has
become increasingly popular and successful in recent years. The basic building blocks of each of
these applications are image-processing tasks such as image classification, localization, and
detection. Recent advances in Convolutional Neural Networks (CNNs) have produced excellent
results in these visual recognition tasks and systems. Consequently, CNNs are now at the heart of
deep learning algorithms for computer vision. This paper will be useful to anyone who wants to
learn the principles behind CNNs and gain hands-on experience with CNNs in image processing.
It gives a thorough overview of CNNs, beginning with the fundamental principles of neural
networks: training, regularization, and optimization in image processing. It also demonstrates the
effectiveness of CNNs compared with other image classification algorithms such as the support
vector machine.


CONTENTS

ABSTRACT
CHAPTER 1 INTRODUCTION
1.1 Background of the Study
CHAPTER 2 ARTIFICIAL NEURAL NETWORK
2.1 Artificial Neural Network
2.2 Artificial Neuron
2.3 Weight, Biases and activation functions
2.3.1 Weight and Structure of a Neuron
2.3.2 Bias
2.3.3 Activation function and the ReLu
2.4 Back Propagation
2.5 Loss Function
2.6 Gradient Descent
2.7 Learning Rate
CHAPTER 3 CONVOLUTIONAL NEURAL NETWORK
3.1 Convolutional Neural Network Architecture
3.2 Convolutional Layers
3.3 Pooling Layers
3.4 Fully Connected Layers
3.5 Models for Composing CNN in Image Classification
3.5.1 Classification and Localization
3.5.2 Semantic Segmentation
3.5.3 Object Detection
3.5.4 Instance Segmentation
CHAPTER 4 CONCEPTUAL FRAMEWORK AND LITERATURE REVIEW
4.1 Literature Overview
4.2 Case Study 1. Convolutional Neural Networks for Image Processing
4.2 Case Study 2. Deep Convolutional Neural Networks for Hyperspectral Image Classification.
4.3 Case Study 3. Convolutional neural networks: an overview and application in radiology
4.4 Case Study 4. Evaluating the performance of convolutional neural networks with direct acyclic
graph architectures in automatic segmentation of breast lesion in US images
4.5 Conclusion
CHAPTER 5 DESIGN
5.1 Methodology
5.2 Stages Of Development
5.2.1 Feasibility Study - Stage 0.
5.2.3 Requirements Specification - Stage 3.
5.2.4 Logical System Specification – Stage 4&5.
5.2.5 Physical Design – Stage 6.
5.3 Reasons for Choosing SSADM
5.4 Comparison of SSADM With Other Methodologies
a) Waterfall Model
b) Iterative Model
5.5 Research Methods
5.5.1 Techniques for data collection.
CHAPTER 6 IMPLEMENTATION
6.1 Hardware and Software Used
6.2 Definitions
6.2.1 Train, Validation, and Test
6.2.2 Overfitting and Underfitting
6.2.3 Batch Size
6.2.4 Epoch
6.2.5 Dropout
6.2.6 Batch Normalization
6.3 Modelling And Results
6.3.1 First Model
6.3.2 Second Model
6.3.3 Third Model
6.4 CNN and SVM comparison
CHAPTER 7 CONCLUSION
CHAPTER 8 RECOMMENDATION AND FURTHER WORK
8.1 Tune Parameters
8.2 Image Data Augmentation
8.3 Deeper Network Topology
8.4 Handle Overfitting and Overfitting Problem
APPENDICES

CHAPTER 1 INTRODUCTION

1.1 Background of the Study

Artificial intelligence (AI) has become increasingly common in recent years. Computer vision,
the ability of computers to see and interpret images, is one of the tasks that AI can help with.
Computers are used to process and analyze images to simulate human vision. Image recognition
is one of the most important tasks in computer vision. Image classification, for example, is the
task of sorting pictures of several items into "groups," such as "car," "plane," "ship," or "house."

Convolutional neural networks are a popular method for image classification. They are a form of
deep learning, which is implemented using neural networks. Deep learning is a subset of machine
learning, which is itself a subset of AI.

First, the University of Edinburgh's CINIC-10 dataset was used to show how to apply
convolutional neural networks to image classification and how to achieve more accurate results
by varying different factors. CIFAR-10 and ImageNet are two well-known image classification
datasets, and CINIC-10 is a mixture of the two.

Second, a dataset comprising the Salinas, University of Pavia, and Indian Pines scenes was used
to show the effectiveness of convolutional neural networks in image classification compared
with the support vector machine. The support vector machine was chosen because it has been
known for many years as a very effective algorithm for image classification (Gulli 2021).

CHAPTER 2 ARTIFICIAL NEURAL NETWORK


Since artificial neurons are greatly inspired by human neurons, it is important to understand how
human neurons work.

Figure 1: A diagram of the neuron showing the structure between the axon and dendrite.

When a neuron fires, normally in response to a stimulus, signals are sent down its axon to the
dendrites of another neuron through a synapse. The receiving neuron may then fire, causing
another neuron to fire and repeating the process throughout the system.

2.1 Artificial Neural Network

An artificial neural network (ANN) is a set of layers of neurons (referred to as units or nodes in
this context). Each unit in one layer is connected to each unit in the next layer.


Figure 2: The artificial neural network architecture

The network takes all the information it needs, in this case the images to identify, through an
input layer. Hidden layers sit between the input and output layers. Each hidden layer detects a
different set of features in an image, ranging from simple to complex. The first hidden layer, for
example, detects edges and lines, the second detects curves, and the third detects specific image
features, such as a face or a wheel.

The network makes predictions in the output layer. The predicted image categories are compared
to human-provided labels. If the predictions are wrong, the network corrects its learning using a
technique called backpropagation (discussed later in this chapter) so that it can make better
guesses in the next iteration. After enough training, a network can make classifications on its
own, without the need for human intervention.

2.2 Artificial Neuron

In an artificial neural network, an artificial neuron is a connection point (unit or node) that can
process input signals and generate output signals.

2.3 Weight, Biases and activation functions

2.3.1 Weight and Structure of a Neuron

In a neural network, the connections between the units are weighted: each weight shows how
much the input from a previous unit influences the output of the next unit. To compute the output
of an artificial neuron, sum the products of all the inputs (x1 to xn) and their corresponding
weights (w1 to wn), add a bias (b), then feed the resulting value into an activation function (f).

Figure 3: A diagram to show the work of a neuron: input x, weights w, bias b, activation function
f.
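
The computation described above can be sketched in a few lines of Python. This is a minimal
illustration, not the implementation used in this thesis: the function name neuron_output and
the input, weight, and bias values are hypothetical, and the activation shown is the ReLu
discussed in Section 2.3.3.

```python
def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)


def relu(z):
    # ReLu as defined in Section 2.3.3: max(0, z).
    return max(0.0, z)


# Illustrative values only (not taken from the thesis).
x = [1.0, 2.0, 3.0]      # inputs x1..x3
w = [0.5, -0.25, 0.25]   # weights w1..w3
b = 0.5                  # bias

out = neuron_output(x, w, b, relu)
print(out)  # 0.5 - 0.5 + 0.75 + 0.5 = 1.25
```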

2.3.2 Bias

A bias (b) is an additional input to a neuron, technically the number 1 multiplied by a weight.
The bias allows the activation function curve to be moved left or right on the coordinate graph,
allowing the neuron to produce the desired output value.

Figure 4: A bias value allows the activation function to shift to the left or right.


As Figure 4 illustrates, when the input (x) is 2, a bias value of 5 allows the Sigmoid activation
function to output 0.

2.3.3 Activation function and the ReLu

An activation function, by definition, determines whether or not a neuron should be activated
(“fired”). It makes a neuron's output nonlinear. Without activation functions, a neural network
is nothing more than a linear regression model. The ReLu, A(x) = max(0, x), is the most
common activation function for CNNs (13) and the one used in this thesis (14). When x is
positive, it outputs x; otherwise, it outputs 0.


Figure 5: The ReLu function.

Since the mathematical operation is simpler and the activation is sparser, ReLu is less
computationally costly than some other popular activation functions like tanh and Sigmoid.
Since the function returns 0 when x is less than zero, there's a good chance that a given unit
won't turn on at all. Sparsity also means less noise and overfitting, as well as more succinct
models with higher predictive capacity. Neurons in a sparse network are more likely to process
useful data. A neuron that can recognize human faces, for example, should not be triggered if the
picture is actually about a house.

Another advantage of ReLu over the others is that it converges more quickly. Linearity (when
x ≥ 0) means that the slope of the line does not change: as x rises, the function does not reach a
plateau. As a result, ReLu does not suffer from the vanishing gradient problem that affects other
activation functions such as Sigmoid or tanh.
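
To make the plateau argument concrete, the sketch below compares the derivative of ReLu with
the derivative of Sigmoid at a few positive inputs. It is only an illustration: the helper names
and the sample inputs are assumptions, and the ReLu derivative at exactly 0 is set to 0 by
convention.

```python
import math


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # Slope of max(0, x): 1 for positive x, 0 otherwise (0 at x == 0 by convention).
    return 1.0 if x > 0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # Derivative of the Sigmoid: s(x) * (1 - s(x)); it shrinks toward 0 as |x| grows.
    s = sigmoid(x)
    return s * (1.0 - s)


for x in (0.5, 5.0, 10.0):
    print(f"x={x}: relu'={relu_grad(x)}, sigmoid'={sigmoid_grad(x):.6f}")
```

The ReLu slope stays at 1 for every positive input, while the Sigmoid slope becomes very small
as x grows, which is the behaviour behind the vanishing gradient problem.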

The Softmax function is another common activation function in CNNs. It is frequently used in
the output layer, where multiclass classification is performed. However, the mathematical details
of this function are outside the scope of this thesis.

2.4 Back Propagation

Backpropagation is an algorithm that helps neural networks learn their parameters, primarily
from prediction errors. This chapter uses gradient descent to illustrate backpropagation.


2.5 Loss Function

A loss function is an error measure, a method of calculating the degree of inaccuracy in a model's
predictions. The goal of deep learning models is to minimize this loss function value, and this
process is known as optimization.
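
As a small illustration of an error measure, the sketch below computes the mean squared error
between target values and predictions. Mean squared error is only one common choice of loss
function, used here as an assumption for the example; the function name and the sample values
are hypothetical.

```python
def mean_squared_error(targets, predictions):
    """Average of the squared differences between targets and predictions."""
    n = len(targets)
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / n


# A perfect prediction has zero loss; worse predictions give larger values.
print(mean_squared_error([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))
print(mean_squared_error([1.0, 0.0, 0.0], [0.5, 0.5, 0.0]))
```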

2.6 Gradient Descent

Gradient descent is an optimization algorithm that adjusts the weights of the neural network to
minimize the loss function value. The algorithm tries to reduce the loss function value by
adjusting the weights after each iteration until further tweaks produce little to no change in the
value of the loss function, a state known as convergence.
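
The loop below is a minimal sketch of gradient descent on a single weight, assuming a toy loss
(w - 3)^2 whose gradient is 2(w - 3). The function name, the tolerance, and the step size are
illustrative assumptions rather than settings used in this thesis.

```python
def gradient_descent(grad, w0, learning_rate=0.1, tol=1e-8, max_iters=10_000):
    """Repeat w <- w - learning_rate * grad(w) until the update is smaller than tol."""
    w = w0
    for step in range(max_iters):
        new_w = w - learning_rate * grad(w)
        if abs(new_w - w) < tol:   # further tweaks barely change w: converged
            return new_w, step + 1
        w = new_w
    return w, max_iters


# Toy loss (w - 3)**2 has gradient 2 * (w - 3) and its minimum at w = 3.
w_star, steps = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_star, steps)
```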

2.7 Learning Rate

In gradient descent or other optimization algorithms, a learning rate is the step size of each
iteration. Convergence will take a long time if the learning rate is too low, but there may be no
convergence at all if the learning rate is too high.
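
The effect of the step size can be seen on a toy loss w^2 (gradient 2w). The three learning rates
below are hypothetical values chosen only to show the three regimes described above; the
function name run_descent is likewise an assumption.

```python
def run_descent(learning_rate, w0=1.0, steps=50):
    """Apply a fixed number of gradient descent updates to the loss w**2."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * 2.0 * w   # gradient of w**2 is 2*w
    return w


too_low = run_descent(0.001)   # barely moves away from the starting point
moderate = run_descent(0.1)    # gets close to the minimum at 0
too_high = run_descent(1.1)    # overshoots on every step, so |w| keeps growing

print(too_low, moderate, too_high)
```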


CHAPTER 3 CONVOLUTIONAL NEURAL NETWORK

3.1 Convolutional Neural Network Architecture

A convolutional neural network is a deep neural network used in image processing that takes
images as input and learns features from the data. Any color image is divided into three
channels: red, green, and blue, each of which is nothing more than a matrix of pixel values.
Mathematical operations such as convolutions and pooling are applied to the previous layer's
output to create new layers. Convolutions are used to extract features, and pooling is used to
reduce the network's complexity. For classification, the output matrix is flattened to one layer
and attached to a fully connected layer.


Figure 10: CNN Architecture

The connectivity pattern between neurons in convolutional networks was inspired by biological
processes, in that it resembles the organization of the animal visual cortex. Individual cortical
neurons respond to stimuli only in the receptive field, which is a small portion of the visual
field. The receptive fields of different neurons partly overlap, allowing them to cover the entire
visual field. Our vision is based on multiple cortical layers, each of which recognizes
increasingly structured information. Single pixels are seen first, followed by basic geometric
forms and then more complex elements such as shapes, faces, human beings, animals, and so on.

3.2 Convolutional Layers

The mathematical combination of two functions to form a third function is referred to as
"convolution." When this occurs, two sets of data are combined. In CNNs, a convolutional filter
(also known as a kernel) is applied to the input data to generate a feature map.


Figure 9: In a convolutional layer, the filter slides over the input and writes its output to the new
layer.

An element-wise multiplication is performed between a 3x3 filter matrix and a 3x3 region of the
input image's matrix. The output value (“destination pixel”) on the feature map is the sum of the
elements of the resulting matrix. The filter then slides over the input matrix and completes the
feature map by repeating this multiply-and-sum operation for each remaining 3x3 region.
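
The sliding-window operation can be sketched in plain Python as follows. This is a minimal
illustration that assumes no padding and a stride of 1, and, like most CNN libraries, it does not
flip the kernel. The function name, the 4x4 input, and the vertical-edge kernel are hypothetical
examples.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image; each output is the sum of element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)   # the "destination pixel" for this window
        feature_map.append(row)
    return feature_map


image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[-2, -2], [2, -2]]
```

A 4x4 input and a 3x3 filter give a 2x2 feature map, because the filter fits in only two positions
along each axis.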

3.3 Pooling Layers

Pooling layers reduce the dimensionality of feature maps, specifically the height and width,
while maintaining the depth. This is advantageous because it reduces the amount of computing
power needed to process the data while retaining the most important features in the feature maps.

Pooling layers are divided into two categories: maximum pooling and average pooling.


Figure 10: Types of Pooling.

Max pooling returns the maximum value of the elements in the portion of the image covered by
the filter, while average pooling returns their average value. Max pooling is more effective at
extracting dominant features and is therefore considered to perform better.
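
Both pooling types can be sketched with one small function. The example assumes
non-overlapping 2x2 windows (a stride equal to the window size); the function name pool2d and
the sample feature map are hypothetical.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Reduce height and width by taking the max or average of each size x size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


fm = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 4, 3, 8],
]

print(pool2d(fm, mode="max"))      # [[6, 4], [7, 9]]
print(pool2d(fm, mode="average"))  # [[3.75, 2.25], [3.5, 5.0]]
```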

3.4 Fully Connected Layers

Classification takes place in the fully connected layers. The input matrix is flattened into a
column vector and fed into a series of fully connected layers, similar to the fully connected ANN
architecture described previously. The output of each fully connected layer (called a Dense layer
in Keras) passes through an activation function (such as tanh or ReLu), while the output Dense
layer uses Softmax. Cross-entropy (categorical cross-entropy in Keras) is the loss function used
in Softmax multiclass classification. The Softmax function returns an N-dimensional vector,
where N is the number of classes from which the CNN must choose. Each number in this vector
is the probability that the image belongs to the corresponding class, so the numbers sum to 1.
For example, if the output vector is [0, 0.1, 0.1, 0.75, 0, 0, 0, 0, 0, 0.05], there is a 10% chance
that this image belongs to class 2, a 10% chance that it belongs to class 3, a 75% chance that it
belongs to class 4, and a 5% chance that it belongs to class 10.
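
For readers who want to see the arithmetic, the sketch below shows the standard Softmax
formula and the cross-entropy loss for a single example. It is a plain-Python illustration, not
the Keras implementation; the function names and the three raw scores are assumptions.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    m = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probs, true_class):
    """Loss for one example: negative log of the probability given to the true class."""
    return -math.log(probs[true_class])


probs = softmax([2.0, 1.0, 0.1])   # hypothetical raw scores for three classes
print(probs, sum(probs))
print(categorical_cross_entropy(probs, true_class=0))
```

The loss is small when the network assigns a high probability to the correct class and grows as
that probability approaches 0.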

3.5 Models for Composing CNN in Image Classification

To solve several complex tasks, the simple CNN architecture can be composed and expanded in
a variety of ways.


3.5.1 Classification and Localization

In the classification and localization task, you must report not only the type of object contained
in the image but also the coordinates of the bounding box where the object appears in the image.
This task assumes that an image contains only one instance of an object.

In a standard classification network, this can be accomplished by adding a "regression head" in
addition to the "classification head." Recall that in a classification network, the final output of
the convolution and pooling operations, called the feature map, is fed into a fully connected
network that generates a vector of class probabilities. The classification head is a fully connected
network that is trained using a categorical loss function (Lc) such as categorical cross-entropy
(Gulli 2021).

A regression head is a fully connected network that takes the feature map and generates a
vector (x, y, w, h) that represents the top-left x and y coordinates, as well as the bounding box's
width and height. A continuous loss function (Lr), such as mean squared error, is used to train it.
A linear combination of the two losses is used to train the entire network, i.e.

L = αLc + (1 - α)Lr

Here α is a hyperparameter that can take a value between 0 and 1. It can be set to 0.5 unless
some domain knowledge about the problem suggests another value. A typical classification and
localization network architecture is depicted in the diagram below. As you can see, the only
deviation from a standard CNN classification network is the additional regression head on the top
right:

Figure 4: Architecture for Classification and Localization
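
The linear combination of the classification loss Lc and the regression loss Lr can be written as
a one-line function. The function name combined_loss and the sample loss values are
hypothetical; the weighting factor alpha is assumed to lie between 0 and 1.

```python
def combined_loss(classification_loss, regression_loss, alpha=0.5):
    """L = alpha * Lc + (1 - alpha) * Lr, with alpha between 0 and 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return alpha * classification_loss + (1.0 - alpha) * regression_loss


# Hypothetical loss values: Lc = 2.0 (cross-entropy), Lr = 4.0 (mean squared error).
print(combined_loss(2.0, 4.0))              # equal weighting: 3.0
print(combined_loss(2.0, 4.0, alpha=1.0))   # classification loss only: 2.0
```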

3.5.2 Semantic Segmentation

The goal here is to assign a single class to each pixel of the image. A first approach may be to
create a classifier network for each pixel, with the input being a small neighborhood surrounding
that pixel. In practice, this method is inefficient, so a better alternative may be to run the image
through convolutions that increase the feature depth while keeping the image width and height
constant. Each pixel then has a feature vector that can be sent through a fully connected network
to predict the pixel's class. In practice, however, this is also very costly, and it is seldom used.

A third method is to use a CNN encoder-decoder network, in which the encoder reduces the
image's width and height while increasing its depth (number of features), while the decoder uses
transposed convolution operations to increase the image's size while decreasing its depth.

Transposed convolution (a form of upsampling) works in the opposite direction of a typical
convolution: it increases the spatial size of its input instead of reducing it. The picture is the
input to this network, and the segmentation map is the output (Gulli 2021).
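
A transposed convolution can be sketched by letting every input value "stamp" a scaled copy of
the kernel onto a larger output grid. The example below is a minimal single-channel illustration
with no padding; the function name, the stride of 2, and the all-ones kernel are assumptions
chosen so that the result is easy to check by hand.

```python
def transposed_conv2d(x, kernel, stride=2):
    """Upsample x: each input value adds a scaled copy of the kernel to the output."""
    k = len(kernel)   # assumes a square kernel
    in_h, in_w = len(x), len(x[0])
    out_h = (in_h - 1) * stride + k
    out_w = (in_w - 1) * stride + k
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(in_h):
        for j in range(in_w):
            for di in range(k):
                for dj in range(k):
                    out[i * stride + di][j * stride + dj] += x[i][j] * kernel[di][dj]
    return out


x = [[1.0, 2.0],
     [3.0, 4.0]]
ones = [[1.0, 1.0],
        [1.0, 1.0]]

up = transposed_conv2d(x, ones, stride=2)
for row in up:
    print(row)
```

With a 2x2 input, a 2x2 kernel, and a stride of 2, the output is 4x4: the width and height
double, which is the size-increasing behaviour the decoder relies on.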

A common implementation of this encoder-decoder architecture is the U-Net (a good
implementation is available at https://github.com/jakeret/tf_unet), which was originally designed
for biomedical image segmentation and has additional skip-connections between corresponding
layers of the encoder and decoder. The U-Net architecture is depicted in the diagram below:

Figure 10: Semantic Segmentation

3.5.3 Object Detection

The object detection task is similar to the classification and localization task. The main
difference is that there are now several objects in the image, and we must determine the class and
bounding box coordinates for each one. Furthermore, neither the number nor the size of the
objects is known ahead of time. As you would expect, this difficult problem has prompted a
significant amount of research. A first solution to the problem might be to make several random
crops of the input image and apply the classification and localization network discussed earlier
to each crop. However, such an approach wastes a lot of computing power and is unlikely to be
competitive. A more practical approach is to use a method such as Selective Search, which uses
conventional computer vision techniques to identify areas in the image that may contain objects.

Figure 10: Object Detection

These areas are known as "Region Proposals," and the network that used them was known as
the "Region-based CNN," or R-CNN. In the original R-CNN, the regions were resized and fed
into a network to produce image vectors. The bounding boxes suggested by the external tool
were corrected using a linear regression network over the image vectors, and the vectors were
then categorized using an SVM-based classifier. An R-CNN network can be conceptually
represented as follows:


Figure 10: R-CNN Network Architecture

The Fast R-CNN was the next version of the R-CNN network. Instead of feeding each region
proposal through the CNN, the Fast R-CNN feeds the entire picture through the CNN, and the
region proposals are projected onto the resulting feature map. Each region of interest is fed
through a Region of Interest (ROI) pooling layer and then into a fully connected network,
which generates an ROI feature vector.

ROI pooling is a common operation in convolutional neural network object detection tasks. The
ROI pooling layer uses max pooling to transform the features within any valid region of interest
into a small feature map with a fixed spatial extent of H × W (where H and W are two
hyperparameters).
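
A simplified version of ROI pooling on a single-channel feature map can be sketched as follows.
The way the region is split into bins (integer division of the ROI height and width) is an
assumption made for illustration, and the function name and sample values are hypothetical.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one region of interest down to a fixed out_h x out_w grid.

    roi is (row_start, col_start, row_end, col_end) with exclusive ends.
    Assumes the ROI has at least out_h rows and out_w columns.
    """
    r0, c0, r1, c1 = roi
    h, w = r1 - r0, c1 - c0
    pooled = []
    for i in range(out_h):
        rs = r0 + (i * h) // out_h          # first row of bin i
        re = r0 + ((i + 1) * h) // out_h    # one past the last row of bin i
        row = []
        for j in range(out_w):
            cs = c0 + (j * w) // out_w
            ce = c0 + ((j + 1) * w) // out_w
            row.append(max(feature_map[r][c]
                           for r in range(rs, re)
                           for c in range(cs, ce)))
        pooled.append(row)
    return pooled


fm = [
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11, 12],
    [13, 14, 15, 16, 17, 18],
    [19, 20, 21, 22, 23, 24],
]

# Two regions of different sizes both become 2 x 2 outputs (H = W = 2).
print(roi_max_pool(fm, (0, 1, 3, 6), 2, 2))  # [[3, 6], [15, 18]]
print(roi_max_pool(fm, (0, 0, 4, 6), 2, 2))  # [[9, 12], [21, 24]]
```

Whatever the size of the region, the output always has the same H × W shape, which is what
allows it to be fed into a fully connected network.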

The feature vector is then fed into two fully connected networks, one of which predicts the ROI
class and the other of which corrects the proposal's bounding box coordinates. This is
illustrated below:


Figure 10: Fast R-CNN Network Architecture

The Fast R-CNN is about 25 times faster than the R-CNN. The next upgrade, known as the
Faster R-CNN (an implementation can be found at), replaces the external region proposal
mechanism with a trainable component within the network called the Region Proposal Network
(RPN). As shown below, the output of this network is combined with the feature map and passed
through a pipeline similar to that of the Fast R-CNN network. The Faster R-CNN network is
approximately 10 times faster than the Fast R-CNN network, making it roughly 250 times faster
than an R-CNN network (Gulli 2021).

Figure 10: Faster R-CNN Network Architecture

Single Shot Detectors (SSD), such as You Only Look Once (YOLO), are a slightly different type
of object detection network. In these networks, each image is divided into a predetermined
number of sections using a grid. A 7x7 grid is used in the case of YOLO, resulting in 49
subimages. Each subimage receives a predetermined collection of crops with different aspect
ratios. Given B bounding boxes and C object classes, the output for each image is a vector of
size 7 * 7 * (5B + C). Each grid cell has prediction probabilities for the various objects detected
inside it, as well as a confidence score and coordinates (x, y, w, h) for each bounding box.
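
The size of this output vector follows directly from the formula. In the sketch below, the default
values B = 2 and C = 20 are illustrative assumptions (they match the configuration commonly
reported for the original YOLO on a 20-class dataset), and the function name is hypothetical.

```python
def yolo_output_size(grid=7, boxes_per_cell=2, num_classes=20):
    """Each cell predicts 5 numbers per box (x, y, w, h, confidence) plus C class scores."""
    return grid * grid * (5 * boxes_per_cell + num_classes)


print(yolo_output_size())  # 7 * 7 * (5 * 2 + 20) = 1470
```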

The YOLO network, which is itself a CNN, carries out this transformation. The results from this
vector are combined to find the final predictions and bounding boxes. In YOLO, the bounding
boxes and associated class probabilities are predicted by a single convolutional network. YOLO
is among the fastest solutions for object detection, but the algorithm can miss smaller objects.

3.5.4 Instance Segmentation

With a few key differences, instance segmentation is similar to semantic segmentation, the
process of associating each pixel of an image with a class label. First, it must differentiate
between different instances of the same class in a picture. Second, labeling every pixel in the
image is not necessary. In some ways, instance segmentation is similar to object detection, but
we are looking for a binary mask that covers each object instead of a bounding box. This second
view provides the intuition behind the Mask R-CNN network. The Mask R-CNN is a Faster
R-CNN with an additional CNN in front of its regression head that takes the ROI bounding box
coordinates as input and converts them to a binary mask.


Figure 11: Mask R-CNN Network Architecture

CHAPTER 4 CONCEPTUAL FRAMEWORK AND LITERATURE REVIEW

4.1 Literature Overview

Hand-crafted feature extraction techniques, such as texture analysis, followed by traditional
machine learning classifiers, such as random forests and support vector machines, have been
used in radiomics studies for many years. When it comes to image recognition, there are a few
distinctions to be made between these approaches and CNN. First, CNN does not require feature
extraction by hand. Second, CNN architectures do not necessarily require human experts to
segment tumors or organs. Third, since there are millions of learnable parameters to estimate,
CNN is much more data-hungry and computationally costly, necessitating the use of graphics
processing units (GPUs) for model training (Browne and Ghidary 2003).

4.2 Case Study 1. Convolutional Neural Networks for Image Processing

The term convolutional network (CNN) is used to describe an architecture for applying neural
networks to two-dimensional arrays (usually images), based on spatially localized neural input.
The sharing of weights across processing units in the CNN architecture decreases the number of
free parameters, improving the network's generalization performance. Weights are replicated
across the spatial array, resulting in inherent insensitivity to translations of the input, a useful
property for image classification. CNNs have a range of distinct advantages over fully connected
and unconstrained neural network architectures in the context of image processing.
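
The reduction in free parameters from weight sharing can be made concrete with a short count.
The sketch below compares a fully connected layer with a convolutional layer on a 32x32 RGB
image; the image size, the 64 units or filters, and the 3x3 kernel are illustrative assumptions,
and the function names are hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Fully connected layer: one weight per input per unit, plus one bias per unit."""
    return n_inputs * n_units + n_units


def conv_params(kernel_h, kernel_w, in_channels, n_filters):
    """Convolutional layer: each filter shares one small set of weights across the image."""
    return (kernel_h * kernel_w * in_channels + 1) * n_filters


dense = dense_params(32 * 32 * 3, 64)   # every pixel connects to every unit
conv = conv_params(3, 3, 3, 64)         # 64 filters of size 3x3 over 3 channels

print(dense, conv)  # 196672 1792
```

The convolutional layer's parameter count does not depend on the image size at all, which is
why the number of free parameters stays manageable for larger images.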


When providing input directly to the network, the number of free parameters in the network can
easily become unmanageable unless a specialized architecture is used. Traditional neural network
applications may be able to solve this problem by relying on comprehensive pre-processing of
images to put them in a usable format. However, this results in a hybrid two-stage architecture
in which the pre-processing stage does most of the "interesting" work and is, of course,
hard-wired and non-adaptive (Browne and Ghidary 2003).

Unstructured neural networks have no built-in invariance to translations or local distortions
of the inputs. Indeed, one shortcoming of fully connected architectures is that the input
topology is completely ignored. Images are strongly correlated and have a strong 2D local
structure. In general, we argue that when input data is organized temporally or spatially, a
CNN architecture is more suitable than a generic neural network (Browne and Ghidary 2003).

CNNs perform mappings between spatially or temporally distributed arrays in any number of
dimensions. They are well suited to time series, images, and video. CNNs have the following
characteristics:

• Translation invariance (neural weights are fixed with respect to spatial translation).

• Local connectivity (neural connections only exist between spatially local regions).

• An optional gradual reduction in spatial resolution (as the number of features is
gradually increased).
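
The effect of weight sharing and local connectivity on the number of free parameters can be
illustrated with a short calculation. This is only a sketch: the 32×32 RGB input, the 3×3
kernel, and the 32 output features are assumed values chosen for illustration, not figures
taken from this study.

```python
def dense_params(height, width, channels, units):
    """Weights plus biases when every input value connects to every unit."""
    return height * width * channels * units + units


def conv_params(kernel, channels, filters):
    """Weights plus biases when each filter's kernel is shared across all positions."""
    return kernel * kernel * channels * filters + filters


# Assumed example sizes (hypothetical, for illustration only).
H, W, C = 32, 32, 3
fully_connected = dense_params(H, W, C, units=32)
convolutional = conv_params(kernel=3, channels=C, filters=32)

print(fully_connected)  # 98336
print(convolutional)    # 896
```

Under these assumptions the convolutional layer needs roughly a hundred times fewer parameters
than a fully connected layer producing the same number of features.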


4.2 Case Study 2. Deep Convolutional Neural Networks for Hyperspectral Image
Classification.

Hu et al. (2015) found that, in comparison to other image classification algorithms, CNNs
generally need very little pre-processing. The network learns to optimize the filters (or
kernels) automatically, whereas conventional algorithms rely on hand-engineered filters. This
lack of reliance on prior expertise or human involvement in feature extraction is a
significant benefit.

Hyperspectral imagery is captured by remote sensors that record hundreds of observation
channels with high spectral resolution. This has inspired the application of many
classification algorithms, such as K-nearest neighbors, minimum distance, and logistic
regression. However, these algorithms have proved less effective than CNNs when applied to
remote sensing data. Neural networks such as the multilayer perceptron and radial basis
function networks have also been applied to this task. Algorithms like SVM are efficient
compared with a conventional CNN in terms of classification accuracy and computing cost, but
when a deep CNN architecture is employed, the CNN proves to be a more powerful classification
model than the other algorithms and very competitive with SVM. Beyond outperforming other
algorithms, deep CNNs have in recent years shown promising performance in many fields and have
played a vital role in solving visual problems (Hu et al. 2015).


More recently, CNNs have matched or even exceeded human performance on several
vision-oriented tasks, including image classification, object detection, scene mapping, digit
classification, and face recognition. In HSI classification, the CNN has gradually been shown
to be an effective and preferable way to learn visual representations. The figure below
represents hyperspectral data with hundreds of spectral channels. The curve for each class has
its own visual shape, and although some of these differences are hard to distinguish with the
human eye, a CNN can achieve better results than a human observer. As a result, the CNN has
proven to be one of the best techniques for HSI classification (Hu et al. 2015).

Figure 12: HSI classification


4.3 Case Study 3. Convolutional neural networks: an overview and application in radiology

Classification using deep learning in medical image analysis typically takes target lesions
depicted in medical images and assigns them to two or more classes. Deep learning, for
example, is commonly used to classify lung nodules on computed tomography (CT) images as
benign or malignant, as seen below. For effective classification using a CNN, a large amount
of training data with corresponding labels is needed. CT images of lung nodules and their
labels (i.e., benign or cancerous) are used as training data for lung nodule classification.
Two examples of training data for lung nodule classification are shown below, one for a benign
lung nodule and the other for primary lung cancer.

Figure 13: CNN in radiology


4.4 Case Study 4. Evaluating the performance of convolutional neural networks with direct
acyclic graph architectures in automatic segmentation of breast lesion in US images

In ultrasound (US) breast images, outlining lesion contours is a vital step in breast cancer
diagnosis. Malignant lesions infiltrate the surrounding tissue and produce irregular contours
with spiculated and angulated edges, while benign lesions produce smooth contours with an
elliptical shape. Costa et al. (2019) note that, in breast imaging, the majority of existing
publications focus on using Convolutional Neural Networks (CNNs) for segmentation and
classification of lesions in mammographic images. The main objective of their study, however,
is to assess the ability of CNNs to detect contour irregularities in breast lesions in US images.

4.5 Conclusion

The case studies show that convolutional neural networks achieve high accuracy in image
classification. Furthermore, using CNNs in fields such as radiology and hyperspectral image
classification has proven more beneficial than the other algorithms employed for image
classification. As such, it is worth concluding that the CNN is the best method to apply for
image classification.


CHAPTER 5 DESIGN

5.1 Methodology

The Structured Systems Analysis and Design Method (SSADM) was used to analyze and build the
system. According to Kendall (1988), the SSADM approach involves users during the most
important and intensive period of the development process: the first stages of development.
Apart from that, in terms of its development stages, it is close to the waterfall model. It
divides development into stages and modules. The data model is the first model it creates. The
following techniques were used:

Logical data modeling - logical data modeling is the method of defining, modeling, and
recording data. The information is then divided into entities and relationships.

Data Flow Modelling - involves following the flow of data in a computer system. Processes, data
stores, external entities, and data movement are all thoroughly examined.

Entity Behavior Modeling - involves defining and recording the events that affect each
entity, as well as the order in which they occur.

5.2 Stages Of Development

5.2.1 Feasibility Study - Stage 0.


Its aim is to determine whether the project's course and specifications are financially, technically,
and operationally feasible.

5.2.2 Requirement Analysis - Stage 1&2.

The first stage entails looking at the current situation and finding issues and areas that need
to be improved. The second stage entails creating a range of options that meet the specified
criteria and selecting the most appropriate alternative.

5.2.3 Requirements Specification - Stage 3.

The stage aims to identify the desired system data, functions, and events.

5.2.4 Logical System Specification – Stage 4&5.

This stage aims to evaluate the technical system's operations as well as the conceptual design.

5.2.5 Physical Design – Stage 6.

The physical environment in which the system will operate is taken into account.


Figure 14: SSADM Methodology.

5.3 Reasons for Choosing SSADM

Within a systems development cycle, SSADM incorporates three approaches, each of which
complements the others:

• Logical Data Modelling

• Data Flow Modelling

• Entity Event Modelling.

Its key advantages over other methodologies are as follows:

➢ Quality improvement

➢ Detailed documentation of the development stages

➢ Reusability for similar projects that follow.

Because of this thorough examination of the information system, this approach decreases the
likelihood of information misunderstandings during the project life cycle, which is why it was
chosen for this project.

5.4 Comparison of SSADM With Other Methodologies

Other software development methodologies that were considered but not adopted for this
project include:

a) Waterfall Model


This is a sequential design process in which progress is viewed as a waterfall that flows steadily
downward through the phases of:

• Conception

• Initiation

• Analysis

• Design

• Construction

• Testing

• Implementation and maintenance

All of these steps flow through one another, with progress appearing to flow slowly like a
waterfall.

Advantages of Waterfall

It is simple to handle since each stage is defined by rigid deliverables and a review process.

There is no overlapping since phases are processed and completed one at a time.

Disadvantages of waterfall

It is difficult to predict how long each phase of development will take and how much it will cost.

For dynamic and object-oriented projects, this is not a suitable model.

Not ideal for projects with variable specifications.

SSADM vs Waterfall

While the two may appear almost identical, a subtle difference makes SSADM preferable to
Waterfall. Unlike the traditional Waterfall model, SSADM allows previous stages to be reviewed
even after they have been completed, whereas the traditional Waterfall model is static and
does not allow a stage to be revisited once it has been completed.


b) Iterative Model

This is a version of the software development life cycle that starts from a simple initial
implementation and gradually increases its complexity and feature set until the final system is
complete. Following the initial planning process, a small number of steps are repeated, with
each cycle refining the software incrementally. These phases include:

• Planning and requirements

• Analysis and design

• Implementation (coding)

• Testing

• Evaluation

Advantages of iterative model

Easy adaptability to the ever-changing requirements of software applications.

To suit the needs of the project or organization, each stage can be broken down into smaller
chunks.

Disadvantages of iterative model

It places greater demands on user involvement.

Users see the changes in each iteration, so feature or requirement creep is a possibility.

5.5 Research Methods

This section explains the data collection techniques that will be used in the qualitative study of
the system. Farmers and some management are the system's most important stakeholders.

5.5.1 Techniques for data collection.


Many data collection methods are available. Interviews, observation, and review of documents
and records will be used to collect data. Existing data was the main source used during
development and testing.

5.5.1.1 Existing Data

This refers to reusing previously collected data to answer new research questions in addition
to those for which the data was originally gathered. It entails incorporating existing
measurements into a study or research project. Data sourced from an archive is an example.

Advantages of Existing Data

The level of precision is extremely high.

The data is readily available.

Disadvantages of Existing Data

Evaluation issues and comprehension difficulties.


CHAPTER 6 IMPLEMENTATION

6.1 Hardware and Software Used

This work used the free GPUs provided by Google Colab (Colaboratory). The deep learning framework
used is TensorFlow with the Keras API.

6.2 Definitions

6.2.1 Train, Validation, and Test

The model is trained using the training dataset. In the case of neural networks, this is where the model
learns its weights and biases.

After each epoch of training, the model is evaluated on the validation dataset. This supports the tuning of
the model's hyperparameters.

After the model has been fully trained, the test dataset is used to evaluate it.
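
As a minimal sketch of how such a three-way split can be produced, the function below shuffles a
dataset and cuts it into training, validation, and test parts. The function name and the 80/10/10
proportions are assumptions made for illustration; they are not the split used in this report.

```python
import random


def split_dataset(samples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle the samples and split them into train, validation, and test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical dataset of 1000 sample indices.
train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))  # 800 100 100
```

Fixing the seed keeps the split reproducible, so the test set stays untouched across experiments.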


6.2.2 Overfitting and Underfitting

When a model captures the noise in the data, it is said to overfit. Intuitively, it fits the data too well;
in other words, it is overly reliant on the training data.

Underfitting, on the other hand, happens when the model fails to capture the underlying pattern of the
data; intuitively, it does not fit the data well enough.

Overfitting and underfitting both result in poor predictions on new data.

6.2.3 Batch Size

In most cases, the whole dataset cannot be fed into the neural network at the same time. As a result, it
must be divided into parts or batches. The batch size specifies how many training samples are used in a
single batch.

6.2.4 Epoch

When the entire dataset (i.e. every training sample) is fed forward and backward through the neural
network only once, it is referred to as an epoch.
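
The relationship between batch size and epoch can be made concrete with a small calculation. The
training-set size of 50,000 below is an assumed value for illustration; the batch sizes of 32 and 128
are the ones used by the models in Section 6.3.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to pass every training sample through the network once."""
    return math.ceil(num_samples / batch_size)


# Assumed training-set size (hypothetical).
num_samples = 50_000
print(steps_per_epoch(num_samples, batch_size=32))   # 1563
print(steps_per_epoch(num_samples, batch_size=128))  # 391
```

A larger batch size therefore means fewer weight updates per epoch, with the last batch being smaller
when the dataset size is not a multiple of the batch size.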

6.2.5 Dropout

Dropout is a method for reducing overfitting. The word "dropout" refers to units and their links being
dropped out at random during training.
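
The idea can be sketched in a few lines of plain Python. The version below is the common "inverted
dropout" variant, in which the surviving activations are rescaled so that their expected value is
unchanged; it is an illustration under that assumption, not the implementation used by the models in
this report.

```python
import random


def dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
activations = [1.0] * 10
dropped = dropout(activations, rate=0.3, rng=rng)
print(dropped)  # surviving units are scaled by 1 / 0.7; dropped units are 0.0
```

Dropout is applied only during training; at prediction time all units are kept.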

6.2.6 Batch Normalization

Overfitting can also be reduced by using batch normalization. It normalizes the inputs to a layer by
re-centering and re-scaling the activations. The mathematics of batch normalization is outside the scope
of this thesis.


6.3 Modelling And Results

6.3.1 First Model

This model is based on TensorFlow’s Convolutional Neural Network (CNN) tutorial, with some tweaks.
There are three convolutional layers, each followed by a max-pooling layer, and two dropout layers with a
dropout rate of 0.3 to reduce overfitting. These are followed by two dense layers with 256 and 10 units
respectively (10 is the number of classes for classification). A dropout layer with a dropout rate of 0.2
sits between the two dense layers. The batch size is 32 and the number of epochs is 32. The optimizer is
Adam with a learning rate of 0.0001.

Below is the code and the model summary

All programs are implemented using Python language and Theano library.



Figure 15: Two layer model CNN code

Here are the results for accuracy and loss

Figure 16: Test loss: 1.25/ Test accuracy: 0.58

The training accuracy continues to improve, but the validation accuracy quickly reaches a
plateau. As a result, despite several dropout layers, the model overfits severely.


6.3.2 Second Model

This model is based on TensorFlow's Convolutional Neural Network (CNN) tutorial (33), with some
tweaks. Three convolutional layers are stacked, followed by a max-pooling layer and two dropout
layers with a dropout rate of 0.3 to reduce overfitting. These are followed by two dense layers with
256 and 10 units respectively (10 is the number of classes for classification). A dropout layer with a
dropout rate of 0.2 sits between the two dense layers. The number of epochs is 32, and the batch size
is 32. Adam is the optimizer, with a learning rate of 0.0001.



Figure 17: Three layer CNN model code

Here are the results for accuracy and loss



Figure 18: Test loss: 1.16 / Test accuracy: 0.58

The model overfits less, but the accuracy is still not high enough (just over 60%).

6.3.3 Third Model

The third model has the same structure as the second, but batch normalization is applied after
each convolutional layer. To shorten training time, the batch size has been increased to 128 and
the number of epochs has been reduced to 27. Adam is the optimizer again, but this time the
learning rate has been increased to 0.001 to reduce training time.




Figure 19: Batch Normalization CNN code

Figure 20: Test accuracy: 0.71 / Test loss: 0.83

The model now performs considerably better. It is not overfitting, and the training, validation,
and test accuracies are all reasonably high: 75%, 71%, and 71%, respectively.


6.4 CNN and SVM comparison

The Data Set: Three hyperspectral data sets, namely the Salinas scene, the University of Pavia
scene, and Indian Pines, are used to test the effectiveness of CNN in image classification
compared with the SVM algorithm. CNN is compared with SVM because SVM has been regarded as
the most effective algorithm for image classification. For each data set, 200 labeled pixels per
class are randomly selected as training data, while all remaining pixels are used as test
data.
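
The sampling scheme described above can be sketched as follows. The function name and the toy
label list are hypothetical; only the rule of 200 randomly chosen labeled pixels per class for
training, with the remainder used for testing, comes from the text.

```python
import random
from collections import defaultdict


def split_per_class(labels, per_class=200, seed=0):
    """Pick `per_class` random pixel indices per class for training; the rest are test."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        train.extend(indices[:per_class])
        test.extend(indices[per_class:])
    return train, test


# Hypothetical label map: 3 classes with 500, 300, and 250 labeled pixels.
labels = [0] * 500 + [1] * 300 + [2] * 250
train_idx, test_idx = split_per_class(labels)
print(len(train_idx), len(test_idx))  # 600 450
```

Sampling a fixed number of pixels per class keeps the training set balanced even when the classes
differ greatly in size.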


Table 1: Number of training and test samples used in the Indian Pines data set.

The second data set was provided by the University of Pavia.

Table 2: Number of training and test samples used in University of Pavia data set

Table 3: Number of training and test samples used in the Salinas data set.

Results and Comparison


The tables and figure below provide the comparison between SVM and CNN when employed in HSI
classification.

Table 4: Comparison of SVM and CNN in HSI classification

Result of comparison with different neural networks on the Indian Pines data set.

Table 5: Comparison of SVM and CNN in HSI classification

Figure 21: Comparison of SVM and CNN in HSI classification

CHAPTER 7 CONCLUSION

First, the models with four convolutional layers (the second and third models) outperform
the model with three convolutional layers (the first model) by a large margin, with slightly less
overfitting.

The results indicate that increasing the number of convolutional layers improves the model's
accuracy in image classification.

Secondly, the CNN algorithm is more effective than other image classification algorithms
such as SVM and KNN. As such, it is worth concluding that CNN, when employed effectively, is
the best algorithm for image classification.


CHAPTER 8 RECOMMENDATION AND FURTHER WORK

The research that has been performed for this report has highlighted several topics that suggest
further research and improvement.

8.1 Tune Parameters

Das (2021) found that CNN model performance can be improved by tuning parameters such as the
number of epochs and the learning rate. The number of epochs affects performance: accuracy
generally improves as the number of epochs grows. However, some experimentation is needed when
deciding on epochs and learning rates. After a certain number of epochs, there is no further
reduction in training loss and no further increase in training accuracy, and the number of
epochs can be set accordingly. A dropout layer can also be used in the CNN model. During
model compilation, an appropriate optimizer must be chosen based on the application. Various
optimizers, such as SGD, can be tried when fine-tuning the model. All of these factors have an
impact on the CNN's results.


8.2 Image Data Augmentation

“Deep learning is only useful when there is a lot of data.” This is not far from the truth. CNNs
learn features automatically from data, which is typically only possible when a large
amount of training data is available.

When less training data is available, image augmentation offers a solution.

Zoom, shear, rotation, a preprocessing function, and other image augmentation parameters are
commonly used to increase the number of data samples. When these parameters are used during the
training of a deep learning model, transformed images with these attributes are generated. The
number of data samples increased nearly three- to fourfold when image samples were produced
using image augmentation.

Another benefit of data augmentation is that, since a CNN is not rotation invariant, it can be
used to add rotated images to the dataset. This is likely to improve the system's
accuracy.
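
A minimal sketch of the idea, using plain Python lists as a stand-in for image arrays, is shown
below. The function names and the tiny example image are hypothetical; in practice a deep learning
library would typically apply randomized versions of such transforms on the fly during training.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the original image plus three transformed copies."""
    return [image, flip_horizontal(image), rotate_90(image), rotate_90(rotate_90(image))]


# A tiny 2x3 grayscale "image" (hypothetical pixel values).
image = [[1, 2, 3],
         [4, 5, 6]]
for variant in augment(image):
    print(variant)
```

Each input image yields four training samples here, which matches the order of magnitude of the
increase reported above.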

8.3 Deeper Network Topology

A very wide, shallow neural network can in principle be trained to memorize the output for
every possible input value. As a result, such networks excel at memorization but struggle with
generalization. In practical applications, moreover, we will not have every possible input
value available for training.


Deeper networks capture the natural "hierarchy" found in much real-world data. Consider a
convnet: it captures low-level features in the first layer, slightly more complex but still
low-level features in the second layer, and object parts and basic structures in higher layers.
Multiple layers have the advantage of being able to learn features at different levels of
abstraction.

That explains why a deep network may be preferable to a wide but shallow network.

However, why not a very deep, very wide network?

The answer is that, to achieve good results efficiently, we want our network to be as small as
possible. A wider network takes longer to train, and deep networks need a lot of computing
power to train. As a result, make them wide and deep enough to work, but no wider or deeper
(Gulli 2021).

8.4 Handle Overfitting and Underfitting Problems

To discuss overfitting and underfitting, let us start with a basic definition: the model. What
exactly is a model? It is a system that maps inputs to outputs. For example, we can create an
image classification model that takes an input image and predicts a class label for it.

We split the dataset into training and testing sets to build a model. On the training set, we train
our model with a classifier, such as a CNN. Then we can use the trained model to predict the
outputs for the test data.

Overfitting: A model that fits the training data too closely is said to overfit. What exactly
does that mean? Put simply, the model has a high level of accuracy on training data but a
low level of accuracy on test data. This means that an overfitting model has strong
memorization but poor generalization abilities: it does not generalize well from the training
data to unseen data.

Underfitting: Underfitting refers to a model that performs poorly on both training and test
data. The model does not fit even the training data well.

In technical terms, an overfitting model has a low bias and a high variance. A model that
underfits has a low variance and a high bias. There will always be a tradeoff between bias and
variance in every model, and we strive to strike the best balance when we design models.

What are bias and variance?

Bias is the error of the model on the training set. The variance of a model refers to how much
it changes in response to changes in the training data. High variance shows up as low accuracy
on test data.
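
The tradeoff between bias and variance can be made precise for squared-error loss. The
decomposition below is a standard textbook result rather than something derived in this report,
and the symbols are assumptions introduced here: f is the true function, \hat{f} is the model
fitted on a randomly drawn training set, y = f(x) + \varepsilon is the observed target, and
\sigma^2 is the variance of the noise \varepsilon.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```

An overfitting model makes the first term small and the second large, while an underfitting model
does the opposite. For classification accuracy the decomposition is less clean, but the same
intuition applies.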

How can overfitting and underfitting be prevented?

Suppose a model has 50% accuracy on training data and 80% accuracy on test data. Is this an
example of underfitting?

Underfitting is a serious problem. Why does it occur?

Underfitting happens when a model is too simplistic (based on too few features or too heavily
regularized), making it inflexible when learning from a dataset.

Solution


If there is underfitting, the recommendation is to concentrate on the model's depth. It may be
necessary to add layers to learn richer features. To avoid underfitting, the parameters must
also be tuned, as mentioned earlier.

Overfitting:

Overfitting is exemplified by the model's accuracy of 99 percent on train data and 60 percent on
test data.

In machine learning, overfitting is a common issue.

There are a few options for avoiding overfitting.

1. Train with more data

2. Early stopping

3. Cross-validation

Train with more data

Increase the amount of data you train with to improve model accuracy. Overfitting can be reduced
by using a large amount of training data. To increase the size of the training set for a CNN, we
can use data augmentation.

Early stopping


A model is trained over a series of iterations, and each iteration improves it. After a certain
number of iterations, however, the model begins to overfit the training data, and its capacity
to generalize may be harmed. Early stopping addresses this: it refers to stopping the training
phase before the learner reaches that point.
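
One common way to decide when to stop is to monitor the validation loss and halt once it has not
improved for a fixed number of epochs. The sketch below illustrates that rule; the function name,
the patience of 3 epochs, and the loss values are all hypothetical and are not results from this
report.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (epoch at which training stops, epoch with the best validation loss)."""
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# Hypothetical validation losses: improvement stalls after epoch 4.
val_losses = [1.20, 0.95, 0.88, 0.83, 0.85, 0.84, 0.86, 0.90]
stop_epoch, best_epoch = early_stopping_epoch(val_losses)
print(stop_epoch, best_epoch)  # 7 4
```

In this example training halts at epoch 7, and the weights from epoch 4 would be the ones to keep.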

Cross Validation

So what is cross-validation?

Let us start with k-fold cross-validation (where k is any positive integer).

Divide the original training data set into k subsets of equal size. Each subset is referred to as a
fold. Let's call the folds f1, f2,..., fk.

For i = 1 to i = k:

• Keep the fold fi as the validation set, and use the remaining k-1 folds as the
cross-validation training set.

• Train your machine learning algorithm on the cross-validation training set and
measure the accuracy of your model by comparing the predicted results against the
validation set.

• Averaging the accuracies obtained in all k cases of cross-validation gives an estimate of
the accuracy of your machine learning model.
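
The procedure above can be sketched as follows. The helper names are hypothetical, and
`dummy_score` is only a placeholder standing in for training a model on the training folds and
measuring its accuracy on the validation fold.

```python
def k_fold_indices(num_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(num_samples))
    fold_sizes = [num_samples // k + (1 if i < num_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


def cross_validate(num_samples, k, train_and_score):
    """Average the score returned by `train_and_score` over the k folds."""
    scores = [train_and_score(tr, va) for tr, va in k_fold_indices(num_samples, k)]
    return sum(scores) / len(scores)


# Placeholder scorer (hypothetical): a real one would train a model and return its accuracy.
def dummy_score(train_idx, val_idx):
    return len(train_idx) / (len(train_idx) + len(val_idx))


result = cross_validate(num_samples=8, k=4, train_and_score=dummy_score)
print(result)  # 0.75
```

In practice the data would usually be shuffled before the folds are formed, so that each fold is
representative of the whole dataset.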


APPENDICES


Figure 22: CNN Algorithm


Figure 23: CNN vs SVM Algorithm

List of References


Gulli, A., 2021. Using the CNN Architecture in Image Processing. [online] Medium. Available at:
<https://medium.com/@ODSC/using-the-cnn-architecture-in-image-processing-65b9eb032bdc>
[Accessed 22 April 2021].

Hu, W., Huang, Y., Wei, L., Zhang, F. and Li, H., 2015. Deep Convolutional Neural Networks for
Hyperspectral Image Classification. Journal of Sensors, [online] 2015, pp.1-12. Available at:
<https://doi.org/10.1155/2015/258619> [Accessed 22 April 2021].

Schwartzman, A., Kagan, M., Mackey, L., Nachman, B. and De Oliveira, L., 2016. Image
Processing, Computer Vision, and Deep Learning: new approaches to the analysis and physics
interpretation of LHC events. Journal of Physics: Conference Series, [online] 762, p.012035.
Available at: <https://doi.org/10.1088/1742-6596/762/1/012035> [Accessed 22 April 2021].

Costa, M., Campos, J., de Aquino e Aquino, G., de Albuquerque Pereira, W. and Costa Filho, C.,
2019. Evaluating the performance of convolutional neural networks with direct acyclic graph
architectures in automatic segmentation of breast lesion in US images. BMC Medical Imaging,
[online] 19(1). Available at: <https://doi.org/10.1186/s12880-019-0389-2> [Accessed 22 April
2021].

Browne, M. and Ghidary, S., 2003. Convolutional Neural Networks for Image Processing: An
Application in Robot Vision. Lecture Notes in Computer Science, [online] pp.641-652. Available
at: <https://doi.org/10.1007/978-3-540-24581-0_55> [Accessed 22 April 2021].


Das, A., 2021. Convolution Neural Network for Image Processing — Using Keras. [online]
Medium. Available at:
<https://towardsdatascience.com/convolution-neural-network-for-image-processing-using-keras-dc3429056306>
[Accessed 22 April 2021].
