Academic Radiology, Vol 26, No 2, February 2019 MAMMOGRAPHIC RADIOMIC FEATURES
shown to be related to the molecular subtypes of breast cancer (5–8).
Radiomics refers to the extraction and analysis of large numbers of quantitative imaging features from medical images. It has been extensively studied in the recent literature, which has demonstrated predictive or prognostic associations between quantitative imaging features and medical outcomes (9,10). A few recent studies have reported differences in quantitative radiomic features extracted from breast dynamic contrast-enhanced (DCE) MRI among the four breast cancer subtypes (11–13). However, breast MRI is an expensive examination and is currently not widely available, especially in less-developed countries. Digital mammography is less expensive and widely used for breast cancer screening and diagnosis, but unlike DCE-MRI, it cannot characterize certain biological or physiological properties of breast tissue. Nevertheless, several studies have demonstrated that quantitative imaging features can distinguish malignant from benign lesions in digital mammograms (14–16). In this study, we used machine learning techniques to investigate whether quantitative radiomic features extracted from digital mammography are associated with breast cancer subtypes in a population of Chinese women.
MATERIALS AND METHODS
Study Cohort and Imaging Dataset
We conducted a retrospective study that was approved by our institutional review board; the informed consent requirement was waived. We collected a study cohort of 331 Chinese women who were diagnosed with invasive breast cancer (confirmed by pathology) between August 2015 and October 2015. All 331 cancers were detected on mammography, and 253 (76.4%) were palpable. The cohort included 29 triple-negative, 45 HER2-enriched, 36 luminal A, and 221 luminal B lesions. All mammograms were acquired with Hologic Lorad Selenia full-field digital mammography systems (HOLOGIC Gen-Probe, San Diego, USA). The full-field digital mammography images had 14-bit quantization and a pixel size of 70 × 70 μm. Both the craniocaudal (CC) view and the mediolateral oblique (MLO) view images were available for each patient. A total of 662 mammogram images were analyzed.
Radiomic Feature Extraction
Radiomic features were extracted from the lesion area on each image. The first step was to outline the lesion area on each image. An experienced radiologist specialized in breast imaging (25 years' experience) manually outlined the contours of the diagnosed breast tumor in each image of each patient. If a breast contained multiple lesions, the largest lesion was selected. These contours circumscribe the breast tumor region, and over this lesion region we applied existing automated computer programs to calculate a set of 39 quantitative radiomic features from the mammographic imaging data.
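To illustrate how a manually outlined contour can yield simple shape descriptors, the following minimal sketch treats the contour as a closed polygon of (x, y) pixel coordinates and computes its area, perimeter, and a roundness index. The function names and the roundness definition 4πA/P² are assumptions made for illustration; the source does not state which formulas or programs were used.

```python
import math

def polygon_area(points):
    """Area enclosed by a closed polygon, via the shoelace formula."""
    n = len(points)
    total = 0.0
    for k in range(n):
        x0, y0 = points[k]
        x1, y1 = points[(k + 1) % n]  # wrap around to close the contour
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

def polygon_perimeter(points):
    """Sum of edge lengths around the closed polygon."""
    n = len(points)
    return sum(math.dist(points[k], points[(k + 1) % n]) for k in range(n))

def roundness(points):
    """Assumed definition 4*pi*A / P**2: 1 for a circle, lower for irregular shapes."""
    perimeter = polygon_perimeter(points)
    return 4.0 * math.pi * polygon_area(points) / (perimeter * perimeter)

# Hypothetical contour: a 2 x 2 pixel square.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(polygon_area(square), polygon_perimeter(square), roundness(square))
```

Coordinates here are in pixels; physical units would follow from multiplying lengths by the detector pixel spacing and areas by its square.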
The 39 features include (1) morphologic features, such as shape, size, area, perimeter, roundness, concavity, and Fourier coefficient descriptors; (2) grayscale statistic features calculated from the histogram of tumor pixel intensities, such as mean, standard deviation, skewness, and kurtosis (17); and (3) Haralick texture features that quantify intratumor heterogeneity, calculated using the gray level co-occurrence matrix, such as contrast, correlation, energy, entropy, homogeneity, inertia, and inverse difference moment (18). All 39 radiomic features were computed using standard mathematical algorithms or formulas, as reported in the previous literature. These features provide a quantitative way to capture important phenotypic information about the segmented lesions. All features were normalized to a standard range before being used in a machine learning model for breast cancer subtype classification.
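As an illustration of the grayscale statistic and texture feature families, the sketch below computes first-order histogram statistics, a gray level co-occurrence matrix (GLCM) for a single offset, and a few Haralick-style descriptors, followed by a min-max rescaling. Several points are assumptions rather than details from the source: the lesion is represented as a rectangular patch already quantized to a small number of integer gray levels, only one non-negative offset is used, and "a standard range" is taken to mean [0, 1].

```python
import numpy as np

def first_order_stats(pixels):
    """Mean, standard deviation, skewness, and kurtosis of lesion intensities."""
    x = np.asarray(pixels, dtype=float)
    mean, std = x.mean(), x.std()
    z = (x - mean) / std
    return {"mean": mean, "std": std,
            "skewness": np.mean(z ** 3), "kurtosis": np.mean(z ** 4)}

def glcm(patch, levels, dx=1, dy=0):
    """Symmetric, normalized co-occurrence matrix for one offset (dx, dy >= 0)."""
    img = np.asarray(patch)
    rows, cols = img.shape
    # Pair every pixel with its neighbor dy rows down and dx columns right.
    a = img[:rows - dy, :cols - dx].ravel()
    b = img[dy:, dx:].ravel()
    m = np.zeros((levels, levels), dtype=float)
    np.add.at(m, (a, b), 1.0)   # count co-occurring gray-level pairs
    m += m.T                    # make the matrix symmetric
    return m / m.sum()          # convert counts to joint probabilities

def haralick_subset(p):
    """Contrast, energy, entropy, and homogeneity from a normalized GLCM."""
    i, j = np.indices(p.shape)
    nonzero = p[p > 0]
    return {"contrast": np.sum(p * (i - j) ** 2),
            "energy": np.sum(p ** 2),
            "entropy": -np.sum(nonzero * np.log2(nonzero)),
            "homogeneity": np.sum(p / (1.0 + (i - j) ** 2))}

def min_max_normalize(features):
    """Rescale each feature column to [0, 1]; constant columns map to 0."""
    f = np.asarray(features, dtype=float)
    low = f.min(axis=0)
    span = f.max(axis=0) - low
    return (f - low) / np.where(span == 0, 1.0, span)

# Hypothetical 3 x 3 patch quantized to three gray levels.
patch = np.array([[0, 0, 1], [0, 1, 1], [2, 2, 2]])
print(first_order_stats(patch.ravel()))
print(haralick_subset(glcm(patch, levels=3)))
```

In practice, texture descriptors are often averaged over several offsets and directions; a single horizontal offset is used here only to keep the sketch short.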
Molecular Subtype Classification
We performed three binary classification tasks for subtype prediction: (A) triple-negative vs non-triple-negative, (B) HER2-enriched vs non-HER2-enriched, and (C) luminal (A and B) vs nonluminal. The Naive Bayes machine learning scheme was employed for the classification (19). Considering that some of the 39 radiomic features may be correlated, we used the least absolute shrinkage and selection operator (LASSO) feature selection method (20) to preidentify the top-ranked or most predictive features before the classification experiments. LASSO is a regression analysis method that improves the prediction accuracy and interpretability of statistical models by altering the model fitting process so that only a subset of the provided variables is used in the final model, rather than all of them. LASSO combines variable selection and regularization to select the subset of variables that minimizes the prediction error of the outcome. The statistical significance of each feature was assessed by the Kruskal-Wallis test. In addition, because the sample numbers are unbalanced for certain subtypes (eg, 29 triple-negative vs 153 non-triple-negative images), we adopted the synthetic minority oversampling technique to balance the sample numbers. The synthetic minority oversampling technique has been used in many previous works to address similar data imbalance problems in image classification (21–23), and in our study we followed the routine procedures of this technique. We used 10-fold cross validation and repeated it 10 times to calculate an average classification performance. Classification performance was evaluated by the area under the receiver operating characteristic curve (AUC) and accuracy. In addition, to evaluate the respective effects of the MLO and CC view images on subtype classification, we conducted and compared the classification experiments on the MLO view images alone, the CC view images alone, and their combination.
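The classification workflow described above can be sketched as follows, under several stated assumptions. The sketch uses scikit-learn's `LassoCV`, `GaussianNB`, and `RepeatedStratifiedKFold`; the Gaussian variant of Naive Bayes, the 0.5 decision threshold, and the choice to run feature selection and oversampling inside each training fold are assumptions, since the source does not specify them. The `smote` function is a simplified stand-in for the synthetic minority oversampling technique, not the implementation used in the study, and all function names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.naive_bayes import GaussianNB

def smote(X_min, n_new, k=5, rng=None):
    """Simplified SMOTE: interpolate between a minority sample and a near neighbor."""
    rng = np.random.default_rng(rng)
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)            # a sample is not its own neighbor
    k = min(k, len(X_min) - 1)
    neighbors = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    partner = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))              # position along the connecting segment
    return X_min[base] + gap * (X_min[partner] - X_min[base])

def lasso_select(X, y, seed=0):
    """Indices of features with nonzero LASSO coefficients (all if none survive)."""
    coef = LassoCV(cv=5, random_state=seed).fit(X, y).coef_
    keep = np.flatnonzero(coef)
    return keep if keep.size else np.arange(X.shape[1])

def evaluate(X, y, n_splits=10, n_repeats=10, seed=0):
    """Mean AUC and accuracy over repeated stratified k-fold cross validation."""
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats,
                                 random_state=seed)
    aucs, accs = [], []
    for train, test in cv.split(X, y):
        X_tr, y_tr = X[train], y[train]
        keep = lasso_select(X_tr, y_tr, seed)
        X_tr = X_tr[:, keep]
        # Assumption: label 1 is the minority class; oversample the training fold only.
        n_new = int((y_tr == 0).sum() - (y_tr == 1).sum())
        if n_new > 0:
            X_tr = np.vstack([X_tr, smote(X_tr[y_tr == 1], n_new, rng=seed)])
            y_tr = np.concatenate([y_tr, np.ones(n_new, dtype=y_tr.dtype)])
        prob = GaussianNB().fit(X_tr, y_tr).predict_proba(X[test][:, keep])[:, 1]
        aucs.append(roc_auc_score(y[test], prob))
        accs.append(accuracy_score(y[test], (prob >= 0.5).astype(int)))
    return float(np.mean(aucs)), float(np.mean(accs))
```

Restricting oversampling to the training folds keeps synthetic samples out of the held-out data, which avoids optimistic bias in the reported AUC. Comparing the MLO, CC, and combined settings would amount to calling `evaluate` on the corresponding feature matrices.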