Classification of Mammogram Images

Diplomica Verlag
1st edition, published September 2017, 52 pages
E-book, PDF without DRM
ISBN 978-3-96067-641-6
Breast cancer is the most common type of cancer in women and currently causes the most cancer deaths among them. Among all diagnostic methods currently available, mammography is the only reliable method for detecting breast cancer at an early stage. Breast cancer can occur in both men and women and is defined as an abnormal growth of cells in the breast that multiply uncontrollably. The main factors that cause breast cancer are either hormonal or genetic. Masses can be quite subtle and take many shapes, such as circumscribed, spiculated, or ill-defined. These tumors can be either benign or malignant.
Computer-aided methods are powerful tools that assist medical staff in hospitals and can lead to better and more accurate diagnoses. The main objective of this research is to develop a Computer Aided Diagnosis (CAD) system that finds tumors in mammographic images and classifies them as benign or malignant. The proposed CAD system consists of the following main phases: image pre-processing; extraction of features from the mammographic images using Gabor wavelets and the Discrete Wavelet Transform (DWT); dimensionality reduction using Principal Component Analysis (PCA); and classification using a Support Vector Machine (SVM) classifier.
Language: English
Hamburg, Germany
34 illustrations
File size: 3.36 MB
ISBN-10: 3960676417
Text Sample:


This chapter describes the techniques used to implement the system. The proposed system consists of image preprocessing, feature extraction, dimensionality reduction, and classification.
3.1 Matlab Environment:

MATLAB (short for MATrix LABoratory) is a mathematical software package used extensively in both academia and industry. It is an interactive tool for numerical computation and data visualization: a high-level technical computing language and environment for data visualization, data analysis, algorithm development, and numeric computation. Extensive libraries of specialized functions are available, and each can be used simply by calling it by name with the appropriate parameters. MATLAB has several advantages over other traditional means of numerical computing:
- It allows quick and easy coding in a very high-level language.
- An interactive interface allows easy debugging and rapid experimentation.
- High-quality visualization and graphics facilities are available.
- MATLAB M-files are portable across a wide range of platforms.
- Toolboxes can be added to extend the system.
- It provides an integrated problem-solving environment.
- It has sophisticated data structures, contains built-in debugging and profiling tools, and supports object-oriented programming.
Thus, MATLAB is a powerful tool for research and practical problem solving, and an excellent language for teaching.
3.2 The Proposed System:

The system is divided into three main stages. The first stage is image enhancement, which improves image quality. The second stage extracts features from the mammogram using Gabor wavelets and the DWT. The last stage is classification using a multiclass SVM classifier [.].
The digitized mammogram images are given as input. The experiments use the Mini Mammographic Database of the Mammographic Image Analysis Society (MIAS) [23] from the Royal Marsden Hospital in London. It contains 322 medio-lateral oblique (MLO) images representing 161 bilateral pairs. The database is divided into seven categories: circumscribed masses, microcalcifications, architectural distortion, spiculated lesions, ill-defined masses, asymmetric densities, and normal tissue.
The digitized input images are not clean; they may contain noise that must be removed before further processing. A 2-D median filter is used to remove this noise, and adaptive histogram equalization is then applied. After denoising, 140 × 140 pixel patches surrounding the abnormal region are extracted from the original 1024 × 1024 pixel images to discard irrelevant information such as the breast contour. These patches ensure that the abnormal region is captured, providing information about its shape. For normal images, the patches are extracted from random positions. To reduce the computational load, each patch is downsampled to a final size of 30 × 30 pixels. Finally, the Gabor-filtered image is generated.
The 2-D wavelet decomposition is performed by first applying a one-dimensional DWT along the rows of the image and then decomposing the results along the columns. The extracted features are stored in a single vector. Because this vector can require a large amount of storage and take a long time to process, a dimensionality reduction step is applied to shrink the feature set; Principal Component Analysis (PCA) is used for this purpose. The extracted regions of interest (ROIs) are then classified as benign or malignant using a Support Vector Machine (SVM).
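The Gabor filtering step mentioned above can be illustrated with a short sketch. The source gives neither the filter parameters nor its MATLAB code, so the following is written in Python and every parameter value (7 × 7 kernel, wavelength of 4 pixels, σ = 2, aspect ratio γ = 0.5, four orientations) is an assumption chosen for illustration; the function names are hypothetical. The kernel is the real part of the standard Gabor function: a Gaussian envelope multiplied by a cosine carrier oriented at angle θ.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor kernel sampled on a size x size grid (size odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates so the carrier wave varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength + psi))
        kernel.append(row)
    return kernel


def filter2d_same(image, kernel):
    """2-D correlation with zero padding; the output has the same size as the input.

    With psi = 0 the Gabor kernel is point-symmetric, so correlation and
    convolution give the same result.
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    half = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    r, c = i + u - half, j + v - half
                    if 0 <= r < h and 0 <= c < w:
                        acc += kernel[u][v] * image[r][c]
            out[i][j] = acc
    return out


def gabor_bank(image, size=7, wavelength=4.0, sigma=2.0, orientations=4):
    """Responses of the image to Gabor kernels at evenly spaced orientations."""
    responses = []
    for n in range(orientations):
        theta = n * math.pi / orientations
        kernel = gabor_kernel(size, wavelength, theta, sigma)
        responses.append(filter2d_same(image, kernel))
    return responses
```

For an image of vertical stripes with a period of 4 pixels, the θ = 0 response at an interior pixel is much larger than the θ = π/2 response. This orientation selectivity is what makes Gabor responses useful as texture features.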
3.3 Implementation of system:

In the proposed system, Gabor wavelet features are extracted from the mammogram images, which may contain normal tissue or benign or malignant tumors. Once the features are extracted, Principal Component Analysis (PCA) is employed to reduce the data dimensionality. Finally, an SVM classifies each tumor as benign or malignant. For comparison, features are also extracted with the 2-D Discrete Wavelet Transform and the results are analyzed. For each film, the database provides the following details:

1st column: MIAS database reference number.
2nd column: Character of background tissue (fatty, fatty-glandular, or dense-glandular).
3rd column: Class of abnormality (calcification, well-defined/circumscribed masses, spiculated masses, other/ill-defined masses, architectural distortion, asymmetry, or normal).
4th column: Severity of abnormality (benign or malignant).
5th and 6th columns: x, y image coordinates of the center of the abnormality.
7th column: Approximate radius (in pixels) of a circle enclosing the abnormality.
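A row of the database description can be turned into a record with a small parser. This Python sketch assumes a whitespace-separated text file with the seven columns listed above and the short codes commonly used in the MIAS distribution (F/G/D for tissue, CALC/CIRC/SPIC/MISC/ARCH/ASYM/NORM for the class, B/M for severity). These codes are not given in the text and should be checked against the actual file. Normal rows are assumed to stop after the third column, and rows without a localized center after the fourth. The function and class names are hypothetical, and the example row below is illustrative, not taken from the database.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed short codes; verify against the database description file.
TISSUE = {"F": "fatty", "G": "fatty-glandular", "D": "dense-glandular"}
ABNORMALITY = {
    "CALC": "calcification",
    "CIRC": "well-defined/circumscribed mass",
    "SPIC": "spiculated mass",
    "MISC": "other/ill-defined mass",
    "ARCH": "architectural distortion",
    "ASYM": "asymmetry",
    "NORM": "normal",
}
SEVERITY = {"B": "benign", "M": "malignant"}


@dataclass
class MiasRecord:
    ref: str
    tissue: str
    abnormality: str
    severity: Optional[str] = None
    x: Optional[int] = None
    y: Optional[int] = None
    radius: Optional[int] = None


def parse_mias_line(line: str) -> MiasRecord:
    """Parse one whitespace-separated row of the seven-column layout."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 columns, got {len(fields)}")
    record = MiasRecord(
        ref=fields[0],
        tissue=TISSUE[fields[1]],
        abnormality=ABNORMALITY[fields[2]],
    )
    if len(fields) >= 4:
        record.severity = SEVERITY[fields[3]]
    if len(fields) >= 7:
        # Center coordinates and radius exist only for localized abnormalities.
        record.x, record.y, record.radius = (int(v) for v in fields[4:7])
    return record
```

For example, `parse_mias_line("mdb000 F SPIC M 500 400 60")` yields a malignant spiculated mass centered at (500, 400) with a radius of 60 pixels.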
3.3.1 Image Preprocessing:

In the proposed system, a two-dimensional median filter is used for image preprocessing. The median filter is a nonlinear operation commonly used to reduce salt-and-pepper noise. MATLAB's medfilt2 pads the image with zeros at the edges, so the median values for points within [m n]/2 of the edges, where [m n] is the neighborhood size, may appear distorted. Internally, medfilt2 uses ordfilt2 to perform the filtering [.].
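The zero-padding behavior described above can be seen in a minimal pure-Python median filter. This is a sketch, not MATLAB's medfilt2 implementation (which is based on ordfilt2); the function name is hypothetical, and only odd neighborhood sizes, such as medfilt2's default 3 × 3, are supported.

```python
from statistics import median


def medfilt2_zero_pad(image, m=3, n=3):
    """Median filter over an m x n neighborhood; pixels outside the image count as 0.

    Only odd neighborhood sizes are handled here, so the window is centered.
    """
    if m % 2 == 0 or n % 2 == 0:
        raise ValueError("this sketch only supports odd neighborhood sizes")
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = []
            for di in range(-(m // 2), m // 2 + 1):
                for dj in range(-(n // 2), n // 2 + 1):
                    r, c = i + di, j + dj
                    # Zero padding: out-of-bounds neighbors contribute the value 0.
                    inside = 0 <= r < h and 0 <= c < w
                    window.append(image[r][c] if inside else 0)
            out[i][j] = median(window)
    return out
```

For a uniform 5 × 5 image of value 100 with one salt pixel of 255 at the center, the salt pixel is replaced by 100. Each corner pixel, however, becomes 0, because five of its nine neighbors are padding zeros. This is the edge distortion mentioned above.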
Adaptive histogram equalization is then applied using MATLAB's adapthisteq function, which enhances the contrast of the intensity image I by transforming its values. Unlike histeq, which works on the entire image, adapthisteq operates on small data regions (tiles). The contrast of each tile is enhanced so that the histogram of the output region approximately matches a specified histogram (uniform by default). To eliminate artificially induced boundaries, neighboring tiles are then combined using bilinear interpolation. The contrast, especially in homogeneous areas, can be limited to avoid amplifying any noise present in the image [.].
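The contrast-limiting idea can be sketched for a single tile: the tile's histogram is clipped, the clipped excess is redistributed over all gray levels, and pixels are mapped through the scaled cumulative histogram. This simplified Python sketch assumes integer gray levels and a uniform target distribution. It redistributes the excess in a single pass and omits the bilinear interpolation between neighboring tiles that adapthisteq performs. Its clip_limit is an absolute bin count, unlike adapthisteq's normalized ClipLimit parameter; all names are hypothetical.

```python
def clipped_equalization_lut(tile, levels=256, clip_limit=4):
    """Gray-level lookup table for one tile using a clipped histogram.

    tile: 2-D list of integer gray levels in [0, levels - 1].
    clip_limit: maximum count per histogram bin (an absolute count).
    """
    hist = [0] * levels
    for row in tile:
        for value in row:
            hist[value] += 1
    # Clip every bin at clip_limit and collect the excess counts.
    excess = 0
    for k in range(levels):
        if hist[k] > clip_limit:
            excess += hist[k] - clip_limit
            hist[k] = clip_limit
    # Redistribute the excess uniformly; the remainder goes to the lowest bins.
    # (Single pass: bins may end up slightly above clip_limit again.)
    share, remainder = divmod(excess, levels)
    for k in range(levels):
        hist[k] += share + (1 if k < remainder else 0)
    # Map each level through the scaled cumulative histogram.
    total = sum(hist)
    lut, cumulative = [], 0
    for k in range(levels):
        cumulative += hist[k]
        lut.append(round((levels - 1) * cumulative / total))
    return lut


def equalize_tile(tile, levels=256, clip_limit=4):
    """Apply the clipped-equalization lookup table to every pixel of the tile."""
    lut = clipped_equalization_lut(tile, levels, clip_limit)
    return [[lut[v] for v in row] for row in tile]
```

For a 4 × 4 tile with four pixels at each of the levels 100 to 103, a clip limit of 16 (no clipping) stretches the levels to 64, 128, 191, and 255. A clip limit of 1 confines them to the range 207 to 255, limiting the amplification of contrast.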
3.3.2 Feature Extraction by Gabor Wavelets:

For feature extraction, the database provides the coordinates of the center of each abnormality. The largest identified abnormality has a radius of 197 pixels, while the smallest has a radius of 3 pixels. In some cases, calcifications are widely distributed throughout the image rather than concentrated at a single site; for these cases, the center locations and radii are omitted. The location and approximate size of each abnormality make it possible to extract sub-images (patches) of appropriate size representing the tumor region. To discard unrelated background information such as the breast contour, 140 × 140 pixel patches containing the abnormal region are extracted from the original 1024 × 1024 pixel images [.].
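Patch extraction around a given center can be sketched as follows, together with the 30 × 30 downsampling described in Section 3.2. The sketch makes three assumptions the source does not settle: (cx, cy) are column and row indices in a row-major image with the origin at the top-left, windows near the border are shifted inward rather than padded, and resampling uses nearest neighbors. Check the coordinate convention of the database description before use. Function names are hypothetical.

```python
def patch_origin(cx, cy, size=140, width=1024, height=1024):
    """Top-left (x0, y0) of a size x size window centered on (cx, cy), kept inside the image."""
    half = size // 2
    x0 = min(max(cx - half, 0), width - size)
    y0 = min(max(cy - half, 0), height - size)
    return x0, y0


def extract_patch(image, cx, cy, size=140):
    """Crop a size x size patch from a row-major image (list of rows)."""
    height, width = len(image), len(image[0])
    x0, y0 = patch_origin(cx, cy, size, width, height)
    return [row[x0:x0 + size] for row in image[y0:y0 + size]]


def downsample_nearest(patch, out_size=30):
    """Nearest-neighbor resampling of a square patch to out_size x out_size."""
    n = len(patch)
    # Pick the source pixel that contains each output pixel's center.
    idx = [min(n - 1, int((k + 0.5) * n / out_size)) for k in range(out_size)]
    return [[patch[r][c] for c in idx] for r in idx]
```

For a center at (500, 400) in a 1024 × 1024 image, the patch starts at column 430 and row 330. A center at (10, 1020) is shifted inward so that the patch starts at column 0 and row 884.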
3.3.3 Feature Extraction by Discrete Wavelet Transform:

The 2-D wavelet decomposition is performed by first applying a one-dimensional DWT along the rows of the image and then decomposing the results along the columns. This operation yields four sub-band images, referred to as low-low (LL), low-high (LH), high-low (HL), and high-high (HH). Together they form the 2-D Discrete Wavelet Transform image [.].
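The row-then-column decomposition can be illustrated with a one-level Haar DWT, the simplest orthonormal wavelet. The source does not name the wavelet it uses, so Haar is an assumption, and the function names are hypothetical. Labeling conventions for the LH and HL sub-bands differ between texts; here the first letter refers to the filter applied along the rows and the second to the filter applied along the columns.

```python
import math


def haar_1d(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail) halves."""
    s = 1 / math.sqrt(2)
    pairs = range(len(signal) // 2)
    approx = [(signal[2 * k] + signal[2 * k + 1]) * s for k in pairs]
    detail = [(signal[2 * k] - signal[2 * k + 1]) * s for k in pairs]
    return approx, detail


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def split_columns(matrix):
    """Apply haar_1d down every column; return (low, high) as row-major matrices."""
    low_cols, high_cols = zip(*(haar_1d(col) for col in transpose(matrix)))
    return transpose(low_cols), transpose(high_cols)


def haar_dwt2(image):
    """One-level 2-D Haar DWT (rows first, then columns); image sides must be even.

    Sub-band names: first letter = filter along the rows, second = along the columns.
    """
    # Low-pass and high-pass every row, halving the width.
    low_rows, high_rows = zip(*(haar_1d(row) for row in image))
    # Then split each intermediate result along its columns, halving the height.
    ll, lh = split_columns(low_rows)
    hl, hh = split_columns(high_rows)
    return ll, lh, hl, hh
```

For a constant image, all three detail sub-bands are zero and LL holds twice the input value. Because the transform is orthonormal, the total energy of the four sub-bands equals that of the input.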
3.3.4 Dimensionality Reduction:

Dimensionality reduction is the process of eliminating data items that are closely correlated with other items in a set. It is a good way to improve the efficiency of the classifier, because it produces a smaller set of features while preserving most of the information in the original, larger data set. PCA generates a new set of variables, called principal components, each of which is a linear combination of the original variables; the components are ordered by the amount of variance in the data that they explain. PCA can therefore be used to reconstruct a simplified multivariate signal: by choosing the number of retained principal components, interesting simplified signals can be reconstructed.
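The construction of principal components can be illustrated with a small, dependency-free Python sketch that finds the top k eigenvectors of the sample covariance matrix by power iteration with deflation. This technique was chosen for brevity and is not the method used in the source. Library routines based on the singular value decomposition are preferable in practice, and power iteration converges slowly when two eigenvalues are nearly equal. The function name and parameters are hypothetical.

```python
import math
import random


def pca_power_iteration(data, k, iterations=500, seed=0):
    """Top-k principal components of data (one row per sample).

    Returns (components, scores, explained_variance_ratios).
    """
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    total_variance = sum(cov[i][i] for i in range(d))
    rng = random.Random(seed)
    components, variances = [], []
    for _ in range(k):
        # Random strictly positive start vector, normalized to unit length.
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # the remaining covariance is zero
            v = [x / norm for x in w]
        # Rayleigh quotient: variance of the data along the unit vector v.
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        variances.append(lam)
        # Deflation: remove this component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    # Project the centered samples onto the retained components.
    scores = [[sum(r[j] * c[j] for j in range(d)) for c in components]
              for r in centered]
    ratios = [lam / total_variance if total_variance else 0.0 for lam in variances]
    return components, scores, ratios
```

For the points (3, 0), (-3, 0), (0, 1), and (0, -1), the two components lie along the x and y axes and explain 90% and 10% of the variance, respectively. Retaining only the first column of the returned scores would reduce each sample to a single feature.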
3.3.5 Classification by Support Vector Machine:

Support Vector Machines were introduced as a machine learning method by Cortes and Vapnik (1995). Given a training set with two classes, an SVM projects the data points into a higher-dimensional space and determines a maximum-margin hyperplane separating the data points of the two classes. This hyperplane is optimal in the sense that it generalizes well to unseen data. The training input of an SVM consists of data points that are vectors of real numbers. The data set is projected into a higher-dimensional feature space using a kernel function, that is, a function that satisfies Mercer's condition.
Training an SVM does not require the feature space to be considered in explicit form, because only the inner products between the support vectors and the vectors of the feature space are required. This alleviates the problems arising from the high dimensionality of the feature space, because the computations can take place in the original input space of the problem. The use of kernel functions in this way is usually referred to as the "kernel trick". A kernel function is a function that corresponds to a dot product of two feature vectors in some expanded feature space, K(x, z) = φ(x) · φ(z), where φ maps an input vector into that space [.].
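The kernel trick can be checked numerically for the degree-2 polynomial kernel K(x, z) = (x · z)², whose explicit feature map for two-dimensional inputs is φ(x) = (x1², √2·x1·x2, x2²). This kernel is chosen only for illustration; the source does not state which kernel its SVM uses, and the function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def phi_quadratic(x):
    """Explicit degree-2 feature map for 2-D inputs: R^2 -> R^3."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel, evaluated in the input space."""
    return dot(x, z) ** 2
```

For x = (1, 2) and z = (3, -1), both `poly2_kernel(x, z)` and `dot(phi_quadratic(x), phi_quadratic(z))` give 1, up to rounding. The kernel needs only one 2-D dot product, whereas the explicit route first maps both vectors into R³. For kernels such as the Gaussian, whose feature space is infinite-dimensional, only the kernel route is possible.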
After projecting the data points into the higher-dimensional space, the SVM tries to identify the optimal hyperplane separating the two classes. As mentioned earlier, optimality refers to the generalization ability of the hyperplane. In general, more than one hyperplane can separate a given projection of a data set; the optimal one separates the data with the maximal margin. The data points closest to the optimal separating hyperplane are called support vectors, and their distance from the separating hyperplane is called the margin of the SVM classifier. A good separation is achieved by the hyperplane that has the largest distance to the nearest training data point of any class. During testing, the distance of each unseen data point from the separating hyperplane is calculated, and the sign of this distance determines whether the point is assigned to the positive or the negative class. This calculation requires only the support vectors identified during training.
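The test-time computation can be sketched with the standard SVM decision function f(x) = Σ αi·yi·K(si, x) + b, summed over the support vectors si with labels yi and dual coefficients αi. Dividing f(x) by the norm of the weight vector in feature space gives the signed distance from the hyperplane. The support vectors, coefficients, and bias in the usage example are hand-derived for a toy two-point, hard-margin problem. They are not taken from the trained classifier in the source, and the function names are hypothetical.

```python
import math


def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))


def decision_value(x, support_vectors, labels, alphas, bias, kernel=linear_kernel):
    """f(x) = sum_i alpha_i * y_i * K(s_i, x) + b over the support vectors only."""
    return sum(a * y * kernel(s, x)
               for s, y, a in zip(support_vectors, labels, alphas)) + bias


def weight_norm(support_vectors, labels, alphas, kernel=linear_kernel):
    """||w|| in feature space: ||w||^2 = sum_ij alpha_i alpha_j y_i y_j K(s_i, s_j)."""
    terms = list(zip(support_vectors, labels, alphas))
    squared = sum(ai * aj * yi * yj * kernel(si, sj)
                  for si, yi, ai in terms for sj, yj, aj in terms)
    return math.sqrt(squared)


def classify(x, support_vectors, labels, alphas, bias, kernel=linear_kernel):
    """Return (+1 or -1, signed distance of x from the hyperplane in feature space).

    A point exactly on the hyperplane (f = 0) is assigned to the positive class.
    """
    f = decision_value(x, support_vectors, labels, alphas, bias, kernel)
    distance = f / weight_norm(support_vectors, labels, alphas, kernel)
    return (1 if f >= 0 else -1), distance
```

With support vectors (1, 0) labeled +1 and (-1, 0) labeled -1, αi = 0.5, and b = 0, the point (2, 3) is classified as positive at distance 2. The point (-0.5, 7) is classified as negative at distance 0.5.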
