Review of deep learning-driven MRI brain tumor detection and segmentation methods

The application of deep learning in medical imaging has become increasingly widespread, greatly advancing Magnetic Resonance Imaging (MRI) brain tumor detection and segmentation techniques. This paper presents a comprehensive review of deep learning-based methods for MRI brain tumor detection and segmentation. It introduces the basic concepts of brain tumors and of MRI-based detection and segmentation, discusses the specific applications and representative methods of deep learning in these tasks, and analyzes and compares the performance, advantages, and disadvantages of different methods. In addition, the representative Brain Tumor Segmentation (BraTS) dataset and its evaluation metrics are introduced, and the performance of various deep learning-based segmentation methods on the BraTS 2019-2022 datasets is compared. Finally, the challenges and future development trends of deep learning-based MRI brain tumor detection and segmentation methods are summarized.


Introduction
Brain tumors are abnormal clusters of cells that grow within human brain tissue and can be classified as benign (non-cancerous) or malignant (cancerous) [1]. Cells in benign tumors, such as meningiomas and acoustic neuromas, generally grow slowly, lack invasiveness, and do not metastasize; such tumors can often be surgically removed without significantly affecting the patient's survival. Cells in malignant tumors, such as glioblastomas and medulloblastomas, proliferate rapidly and are invasive and metastatic. Without timely treatment, malignant brain tumors can greatly reduce a patient's survival rate. However, because brain tumors can develop in variable locations, including the cerebrum, cerebellum, and brainstem, their detection and diagnosis are challenging. Early screening and accurate diagnosis therefore play a significant role in the prevention and treatment of brain tumors and in improving survival rates.
Currently, various medical imaging methods, such as B-mode ultrasound, computed tomography (CT), and magnetic resonance imaging (MRI), can be used for the early detection of brain tumors. These imaging methods serve as auxiliary tools that help doctors accurately locate tumors [2][3][4]. Among them, MRI, a typical non-invasive imaging technique, provides safer and more precise information about the shape, size, and location of brain tumors, making it the primary imaging modality for this task [5]. Because MRI can generate multi-modal information for the same case, it allows for diverse information representation, which helps improve detection accuracy [6]. Figure 1 shows a sequence of multi-modal MRI scans, including T1-weighted imaging (T1), contrast-enhanced T1-weighted imaging (T1CE), T2-weighted imaging (T2), and fluid-attenuated inversion recovery imaging (FLAIR). Abdel-Gawad et al. [7] combined edge detection with an optimal thresholding algorithm to extract tumor edge information from MRI images, helping doctors locate the position and extent of a tumor. However, this method does not fully account for the diversity and complexity of brain tumors, such as variations in type, size, shape, and location, and may therefore perform poorly on certain unusual or difficult-to-segment tumors. To address this diversity and complexity, Rehman et al. [8] employed a 3D Convolutional Neural Network (CNN) to extract brain tumors, used a pre-trained CNN model for feature extraction, and then performed correlation-based feature selection. This method fully exploits the spatial information of volumetric images, improving the accuracy and robustness of detection. Building upon this, Jun et al. [9] designed a brain tumor detection model that combines a multi-path network with an attention mechanism to further accelerate detection. The attention mechanism optimizes information extraction, while the multi-path network reduces model complexity; as a result, the model improves detection accuracy while speeding up the detection process.
However, detection is only the first step: once a tumor is found, segmentation becomes crucial. Brain tumor segmentation is the process of accurately separating and labeling the tumor from the surrounding normal brain tissue. Segmentation yields information such as the volume, shape, and extent of spread of the tumor region, helping doctors better understand tumor characteristics and progression and develop personalized treatment plans [10]. Anithadevi et al. [11] proposed an image segmentation method that combines region growing and thresholding. This hybrid approach uses a center pixel or fixed seed point for region-growing segmentation and a single threshold for thresholding segmentation, which helps improve the region-growing results. However, such traditional segmentation methods are computationally demanding on complex images. With the development of computer technology, machine learning-based segmentation methods, aided by hardware acceleration such as GPUs, have significantly improved segmentation speed and efficiency compared to traditional methods.
Csaholczi et al. [12] proposed an automatic glioma recognition program based on a Random Forest (RF) classifier. The RF classifier used 80 computed features, including morphological, gradient, and Gabor wavelet features, in addition to four observed features, and achieved good segmentation accuracy on the BraTS (Brain Tumor Segmentation) dataset. Support Vector Machines (SVM) are another commonly used machine learning method, but SVMs often require manual design and selection of input features. Khan et al. [13] therefore proposed a new MRI brain tumor detection framework that integrates a U-Net architecture with an SVM classifier: an improved U-Net segments the tumor and extracts regions of interest, and the SVM then classifies images as normal or tumorous. This fusion effectively addresses the SVM's reliance on manual feature engineering, significantly improving segmentation speed and the accuracy of tumor detection. Machine learning methods have thus made significant progress over traditional methods in brain tumor segmentation. However, they still suffer from low accuracy and efficiency when faced with imbalanced training samples and large-scale datasets, so finding more accurate and faster segmentation methods remains a valuable research direction.
In recent years, many deep learning-based methods have been applied to the intelligent detection and segmentation of brain tumors, achieving remarkable results in practice [14][15]. The remainder of this paper is organized as follows. Section 1 reviews deep learning-based MRI brain tumor detection methods and analyzes their advantages and limitations. Section 2 introduces deep learning-based MRI brain tumor segmentation methods and compares the design principles and performance of different approaches. Section 3 describes the BraTS dataset and the evaluation metrics used for MRI brain tumor segmentation. Section 4 summarizes the advantages and challenges of current deep learning-driven detection and segmentation methods and discusses future development directions.

Deep learning-based methods for MRI brain tumor detection
Because brain tumor detection involves large volumes of MRI data and tumors can closely resemble normal tissue, manual reading is time-consuming for doctors and can delay timely treatment. Deep learning-based MRI brain tumor detection offers a new way to address this issue. Compared with traditional detection methods [16] and machine learning approaches [17], deep learning-based methods have significant advantages owing to their powerful learning capabilities and automatic feature extraction.
Siddique et al. [18] employed a deep convolutional neural network (DCNN) to detect the presence of brain tumors in MRI images. The experiment used a dataset of 253 brain MRI images, 155 of which contained tumors. The model automatically selected MRI images with tumors, achieving an overall accuracy of 96%, and can effectively assist clinical experts in verifying whether a patient has a brain tumor. Saeedi et al. [19] took a similar approach, designing a new 2D CNN and a convolutional autoencoder; their experiments showed that the 2D CNN classified brain tumors with high accuracy, providing effective assistance to radiologists in tumor detection.
Faster R-CNN [20] is a deep learning algorithm for object detection that speeds up detection by introducing a Region Proposal Network (RPN) and shared convolutional features. Building upon it, Yilmaz [21] proposed a multi-channel convolutional structure with good brain tumor detection accuracy. However, this method uses a complex 2D CNN with many convolutional and pooling layers, which can lead to overfitting and high computational cost. To address this, Chattopadhyay et al. [22] proposed a high-precision automatic detection method that combines traditional classifiers with deep learning, using an improved CNN model to help doctors accurately detect brain tumors in MRI images and significantly speeding up treatment. With automatic feature learning, the ability to handle large-scale data, and automated diagnostic assistance, CNNs have shown great potential in brain tumor detection [23][24] and have become a powerful tool for improving diagnostic accuracy and efficiency.
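The detection models discussed above differ in detail, but they share a common skeleton: stacked convolution and pooling layers followed by a classification head that decides whether a slice contains a tumor. The sketch below is a minimal, hypothetical illustration of that skeleton (not the exact architecture of any cited work); the class name, layer sizes, and input resolution are assumptions chosen for clarity.

```python
import torch
import torch.nn as nn

class TumorDetectorCNN(nn.Module):
    """Binary classifier: tumor present vs. absent in a 2D MRI slice."""
    def __init__(self, in_channels=1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 128 -> 64
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 64 -> 32
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),              # global pooling -> (64, 1, 1)
        )
        self.classifier = nn.Linear(64, 2)        # logits for {no tumor, tumor}

    def forward(self, x):
        f = self.features(x).flatten(1)
        return self.classifier(f)

model = TumorDetectorCNN()
logits = model(torch.randn(4, 1, 128, 128))  # a batch of 4 single-channel slices
print(logits.shape)  # torch.Size([4, 2])
```

In practice the published models are much deeper and are trained on labeled MRI datasets with standard cross-entropy loss; the global average pooling used here is one common way to keep the classifier independent of the exact input resolution.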
Inspired by CNNs, Mostafiz et al. [25] developed an intelligent system that fuses the histogram of oriented gradients of MRI images with CNN-based deep neural features for tumor identification. This fusion of machine learning and deep learning achieved good detection accuracy but requires a large amount of data and a long training time. Rai et al. [26] therefore proposed a less complex U-Net architecture with fewer layers (LeU-Net) to detect abnormalities in brain MRI images. The model required only 244.42 s and 252.36 s of processing on uncropped and cropped images, respectively, significantly reducing training time, and achieved 98% accuracy on cropped images and 94% on uncropped images. U-Net, a neural network architecture for semantic image segmentation, plays an important role in brain tumor detection, and several authors [27][28] have improved its structure to suit complex detection tasks. These improvements enhance the network's expressive power and receptive field, allowing it to better capture features at different scales and further improving detection performance.
In summary, CNN-based and improved U-Net-based methods each have unique advantages in brain tumor detection. However, detection is only the first step in brain tumor analysis; the more crucial step is accurate segmentation. Brain tumor segmentation precisely labels and delineates tumor regions in medical images, which is essential for evaluating tumor growth and formulating treatment plans.

Deep learning-based methods for MRI brain tumor segmentation
Deep learning-based MRI brain tumor segmentation methods use deep neural networks to learn and extract high-level features from large-scale data, yielding significant improvements in accuracy and efficiency over traditional and machine learning methods. This section introduces the current mainstream brain tumor segmentation methods based on three architectures: CNN, FCN, and U-Net.

CNN-based methods
CNN is a deep learning model well suited to image processing and computer vision tasks. The basic CNN structure was first proposed by LeCun et al. [29], who achieved breakthrough results in handwritten digit recognition. Building upon this, Krizhevsky et al. [30] introduced the AlexNet architecture, which won the ImageNet image classification competition by a significant margin and spurred the rapid development of CNNs.
As a widely used deep learning model, CNNs do not require manual feature selection and can adapt well to complex brain tumor data, improving segmentation accuracy and efficiency. From the perspective of the convolution operation, CNNs can be 2D or 3D. Since brain images are 3D data, some researchers apply 3D CNNs directly to brain tumor segmentation. Anand et al. [31] employed a 3D CNN to segment gliomas in multimodal MRI images and used traditional machine learning on texture features to predict patient survival. A challenging issue when implementing such 3D CNNs is obtaining a sufficient quantity and quality of training samples. Limited computational resources, including memory usage during training, can also be a problem: 3D volumes require far more memory per batch than 2D slices. To address these issues, 3D brain volumes are often treated as sequences of 2D slices, with both input and label data handled as 2D samples; this saves storage space and speeds up data transfer. Some approaches combine 2D and 3D convolutions, using 2D convolutions to extract intra-slice features and 3D convolutions to capture inter-slice features [32]. Other methods combine traditional techniques with CNNs, for example using traditional methods to adjust the CNN's learning rate and thereby improve classification and segmentation performance [33].
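The slicing strategy described above is straightforward to implement: a 3D volume and its label volume are split along one axis into paired 2D samples. The helper below is a minimal sketch of this preprocessing step (the function name and the BraTS-like 240x240x155 volume shape are assumptions for illustration).

```python
import numpy as np

def volume_to_slices(volume, labels, axis=2):
    """Split a 3D MRI volume (H, W, D) and its label volume into 2D samples."""
    volume = np.moveaxis(volume, axis, 0)   # bring the slicing axis first: (D, H, W)
    labels = np.moveaxis(labels, axis, 0)
    return list(zip(volume, labels))        # one (image, label) pair per slice

vol = np.zeros((240, 240, 155), dtype=np.float32)   # BraTS-sized volume
seg = np.zeros((240, 240, 155), dtype=np.uint8)
pairs = volume_to_slices(vol, seg)
print(len(pairs), pairs[0][0].shape)  # 155 (240, 240)
```

Each pair can then be fed to a 2D network exactly like an ordinary image/mask sample, which is why this trick reduces both memory use and data-transfer time at the cost of discarding inter-slice context.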
Overall, CNNs have served as a foundational deep learning architecture for brain tumor segmentation. By learning image features through stacked convolution and pooling operations, they perform well in both classification and segmentation tasks. Despite this success, limitations and challenges remain: improved CNN models require large amounts of annotated training data, demand significant computational resources to process 3D MRI volumes, and are prone to overfitting when data are limited. To address these limitations, researchers have begun exploring other architectures better suited to the task, such as FCN and U-Net. These architectures handle multi-scale images and capture global information more flexibly and effectively, bringing new advances to brain tumor segmentation. CNNs will continue to serve as a foundation of deep learning, but in specific application scenarios, combining them with more suitable architectures may achieve better results.

FCN-based methods
FCN is another commonly used architecture for brain tumor segmentation [34]. In a CNN, the fully connected layers flatten the 2D feature maps into 1D vectors for classification, discarding the spatial information in the image. The FCN therefore replaces the fully connected layers with convolutional layers, allowing inputs of arbitrary size and enabling end-to-end semantic segmentation. The head of an FCN is similar to AlexNet, but the fully connected layers of the classification network are removed and replaced with upsampling and convolutional layers, yielding a prediction for every pixel in the image, as shown in Figure 2. However, a plain FCN tends to lose the global semantic context of the image, resulting in blurry segmentations; adding skip connections that merge low-level and high-level features in the final layers, as in the VGG-16-based FCN, helps the model achieve effective results. Shen et al. [35] designed a boundary-aware fully convolutional network (BFCN) based on the FCN that automatically segments different sub-regions of brain tumors. It uses multimodal MRI images and their symmetric difference maps to extract multi-level contextual information and improves segmentation by incorporating boundary information directly into the loss function. However, the BFCN is purely 2D and ignores 3D information, which can cause discontinuity and inconsistency in the results. Sun et al. [36] therefore proposed a multi-path 3D FCN architecture for glioma segmentation in MRI images. Each pathway uses 3D dilated convolutions to extract feature maps with different receptive fields from multi-modal MRI images, and the pathways are then spatially fused via skip connections, helping the model better localize tumor region boundaries.
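The core FCN idea in the paragraph above, replacing the fully connected classifier with a 1x1 convolution and upsampling the scores back to the input resolution, can be sketched in a few lines. This is a deliberately tiny, hypothetical network (the class name and channel counts are assumptions), not the published FCN or BFCN architecture.

```python
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    """Fully convolutional segmentation head: a 1x1 conv produces per-pixel
    class scores, then bilinear upsampling restores the input resolution."""
    def __init__(self, num_classes=4):
        super().__init__()
        self.backbone = nn.Sequential(          # downsamples by a factor of 4
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.score = nn.Conv2d(64, num_classes, 1)   # replaces the FC layer

    def forward(self, x):
        h = self.score(self.backbone(x))
        # upsample the coarse score map for dense, pixel-wise prediction
        return nn.functional.interpolate(h, size=x.shape[2:], mode="bilinear",
                                         align_corners=False)

out = TinyFCN()(torch.randn(1, 1, 96, 96))
print(out.shape)  # torch.Size([1, 4, 96, 96])
```

Because every layer is convolutional, the same network accepts inputs of any spatial size, which is exactly the property that lets FCNs segment whole MRI slices end to end.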
In other tumor segmentation tasks, researchers have used FCN networks as the backbone for kidney tumor segmentation [37][38] with good results. A survey of the literature shows that few studies in the past two years have used FCN-based methods for brain tumor segmentation; instead, the FCN variant U-Net has gradually become the mainstream framework.

U-Net-based methods
The U-Net network, proposed by Ronneberger et al. [54], is one of the most successful networks based on the FCN structure and has become a fundamental architecture for medical image segmentation. Many algorithms have built on U-Net to achieve better segmentation results, such as Res-UNet [39], U-Net++ [40], and Attention U-Net [41].
To handle volumetric medical images, the U-Net model has been extended to 3D medical image segmentation, as in 3D U-Net [42] and V-Net [43]. U-Net follows the classic encoder-decoder design also used by SegNet [44] and addresses the loss of edge information caused by downsampling through skip connections. 3D U-Net extends U-Net to 3D image segmentation: it performs convolution and pooling in all three directions, enabling it to process volumetric data and learn richer feature representations, thereby improving segmentation accuracy, although this also limits the achievable depth and expressive power of the network. V-Net instead uses residual connections to increase depth and expressive power while avoiding vanishing gradients. Inspired by V-Net, Zhang et al. [45] proposed an improved U-Net in which residual modules are introduced in both the encoding and decoding stages to improve computation speed and accuracy, and attention modules are incorporated to better integrate high-level and low-level semantic information, further enhancing segmentation accuracy.
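The defining U-Net mechanism described above, concatenating encoder features onto the upsampled decoder path so that edge detail lost to downsampling is restored, fits in a one-level sketch. This is a hypothetical minimal illustration (names and channel widths are assumptions), not the full multi-level U-Net of Ronneberger et al.

```python
import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU())

class MiniUNet(nn.Module):
    """One-level encoder-decoder with a skip connection, the core U-Net idea."""
    def __init__(self, num_classes=4):
        super().__init__()
        self.enc = block(1, 16)
        self.down = nn.MaxPool2d(2)
        self.bottleneck = block(16, 32)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = block(32, 16)            # 32 = 16 (upsampled) + 16 (skip)
        self.head = nn.Conv2d(16, num_classes, 1)

    def forward(self, x):
        e = self.enc(x)
        b = self.bottleneck(self.down(e))
        # skip connection: concatenate encoder features onto the decoder path
        d = self.dec(torch.cat([self.up(b), e], dim=1))
        return self.head(d)

out = MiniUNet()(torch.randn(1, 1, 64, 64))
print(out.shape)  # torch.Size([1, 4, 64, 64])
```

Real brain tumor U-Nets stack four or five such levels and, in the 3D variants, swap `Conv2d` for `Conv3d`; the residual and attention modules mentioned above are inserted into the `block` and skip paths, respectively.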
Nodirov et al. [46] proposed a new 3D U-Net-based architecture that effectively reduces the number of parameters, improves computational capacity, and speeds up convergence, though it still demands considerable computational resources and time. To address this, Anaya-Isaza et al. [47] introduced three computationally efficient segmentation networks that employ a 4-level encoder-decoder structure with cross-attention modules and separable convolution layers, achieving high segmentation performance at reduced computational cost. Furthermore, some studies have combined U-Net with techniques such as data augmentation, multi-scale processing, and attention mechanisms to further enhance performance. Although these combinations have made significant progress, challenges remain: combined techniques can increase model complexity and computational cost, requiring more resources for training and inference, and careful parameter settings and model selection are needed to avoid overfitting and underfitting.
In summary, the integration of U-Net with other techniques has significantly improved the performance of brain tumor segmentation.With the continuous development of deep learning and medical image processing technologies, we can expect further innovations in the field of brain tumor segmentation, providing more reliable and efficient support for clinical diagnosis and treatment.

Datasets
With the continued development of deep learning in medical imaging, brain tumor segmentation methods have made significant progress. Training deep learning models, however, requires large annotated datasets, which are indispensable for brain tumor segmentation. Brain tumor datasets typically consist of multiple brain imaging scans, such as MRI or CT images, together with corresponding tumor segmentation annotations. These annotations must be produced by professional doctors or medical imaging experts so that the tumor regions are labeled accurately.
Several brain tumor datasets are publicly available to researchers and developers. The BraTS dataset [48] includes multi-modal MRI images (see Figure 1) and corresponding tumor segmentation annotations and is well suited to evaluating and comparing segmentation algorithms. The LGG-1p19qDeletion dataset [49] focuses on the segmentation of low-grade gliomas (LGG) and provides multi-modal MRI images for different cases, with annotations that include the 1p/19q chromosomal deletion status. The ISLES (Ischemic Stroke Lesion Segmentation) dataset [50] primarily targets stroke lesion segmentation but also includes some brain tumor images and segmentation annotations, along with MRI images at multiple time points and different lesion types.
In recent years, the BraTS dataset has become one of the mainstream benchmarks for evaluating brain tumor segmentation algorithms. To give a better picture of the other two datasets, Table 1 summarizes how researchers have used them. In addition to these public datasets, some research institutions and hospitals maintain private brain tumor datasets for specific research projects [51]. To illustrate the brain tumor segmentation annotations on different MRI modalities, a case from the BraTS 2021 dataset was visualized with the 3D Slicer medical image analysis software, as shown in Figure 3.

Evaluation Metrics
In the task of multimodal MRI brain tumor segmentation, the Dice Similarity Coefficient (DSC) and 95% Hausdorff Distance (HD) are commonly used as evaluation metrics to assess the performance of tumor segmentation.
The DSC is a commonly used evaluation metric that measures the similarity between the segmentation result S and the reference standard G:

DSC = 2|S ∩ G| / (|S| + |G|)

where |S ∩ G| is the number of pixels in the intersection of the segmentation result and the reference standard, |S| is the number of pixels in the segmentation result, and |G| is the number of pixels in the reference standard. The DSC ranges from 0 to 1; the closer the value is to 1, the more similar the segmentation result is to the reference standard.
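The DSC formula translates directly into a few lines of array code. The sketch below is a minimal implementation for binary masks (the function name and the both-empty convention of returning 1.0 are assumptions; evaluation toolkits differ on that edge case).

```python
import numpy as np

def dice(seg, gt):
    """Dice similarity coefficient between two binary masks."""
    seg, gt = seg.astype(bool), gt.astype(bool)
    inter = np.logical_and(seg, gt).sum()       # |S ∩ G|
    denom = seg.sum() + gt.sum()                # |S| + |G|
    return 2.0 * inter / denom if denom else 1.0  # both empty -> perfect match

a = np.zeros((8, 8), dtype=np.uint8); a[2:6, 2:6] = 1   # 16 pixels
b = np.zeros((8, 8), dtype=np.uint8); b[3:7, 3:7] = 1   # 16 pixels, 9 overlap
print(dice(a, b))  # 2*9 / (16+16) = 0.5625
```

For multi-class BraTS evaluation, the same function is applied separately to each binary sub-region mask (WT, TC, ET).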
HD is a metric that measures the shape difference between the segmentation result and the reference standard: it is the maximum distance from a point in one set to the nearest point in the other. HD compares each point in the segmentation result with the points in the reference standard, identifies the pair with the maximum nearest-point distance, and uses that distance as its value. To mitigate the influence of outliers, the 95% HD is commonly used instead, which sorts the nearest-point distances in ascending order and takes the 95th percentile rather than the maximum. A smaller HD indicates that the shape of the segmentation result is closer to that of the reference standard. The HD between the segmentation result S and the reference standard G is defined as

HD(S, G) = max{ d_SG, d_GS } = max{ max_{a∈S} min_{b∈G} d(a, b), max_{b∈G} min_{a∈S} d(b, a) }

where d(a, b) is the Euclidean distance between a point a in the segmentation result S and a point b in the reference standard G. For each point in S, the shortest distance to G is retained as min_{b∈G} d(a, b), and the largest of these shortest distances is denoted d_SG. Similarly, for each point in G, the shortest distance to S is retained as min_{a∈S} d(b, a), and the largest of these is denoted d_GS. The HD is the maximum of d_SG and d_GS.
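The symmetric definition above can be implemented directly on two sets of boundary coordinates. The sketch below is a naive O(NM) illustration (the function name and the convention of taking the maximum of the two directed percentiles for the 95% variant are assumptions; implementations differ on that detail, and production code extracts the boundary points from masks first).

```python
import numpy as np

def hausdorff(pts_s, pts_g, percentile=100):
    """Symmetric Hausdorff distance between two point sets ((N, 2) coordinate
    arrays); percentile=95 gives the outlier-robust 95% HD variant."""
    # pairwise Euclidean distances: d[i, j] = |pts_s[i] - pts_g[j]|
    d = np.linalg.norm(pts_s[:, None, :] - pts_g[None, :, :], axis=2)
    d_sg = d.min(axis=1)            # each point of S to its nearest G point
    d_gs = d.min(axis=0)            # each point of G to its nearest S point
    return max(np.percentile(d_sg, percentile), np.percentile(d_gs, percentile))

s = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
g = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 3.0]])
print(hausdorff(s, g))   # 3.0: point (2, 3) in G lies 3.0 from its nearest S point
```

With `percentile=95`, a handful of outlier boundary points no longer dominates the score, which is why BraTS reports the 95% HD.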

Conclusions and Outlook
This article has reviewed methods and applications of deep learning in MRI brain tumor detection and segmentation, introduced representative datasets and their evaluation metrics, and compared the performance of different algorithms on those datasets. Deep learning has several advantages in this field. First, by training on large-scale datasets, deep learning models can learn the complex features of brain tumors, enabling highly accurate detection and segmentation. Second, deep learning methods support automated processing: through end-to-end training, they extract features directly from raw MRI images for tumor detection and segmentation, eliminating tedious manual feature extraction and preprocessing. Finally, deep learning models are highly adaptable: trained on samples of different tumor types and grades, they can handle variations across tumor types and cases, demonstrating good generalization ability.
However, deep learning-driven MRI brain tumor detection and segmentation methods still face some challenges.
For rare types of brain tumors, sufficient annotated data are difficult to obtain because of their low incidence, limiting model training and performance. Few-shot learning is therefore a promising research direction, aiming to improve model performance from a small number of annotated samples. In addition, generative adversarial networks can be used to synthesize more rare-case data, increasing the diversity of the training set.
Existing deep learning methods for brain tumor detection and segmentation are mostly fully supervised, relying on large amounts of annotated data for training, yet acquiring sufficient and accurate labels is costly. Semi-supervised and self-supervised learning methods therefore hold great potential for future development: they require only a limited amount of labeled data together with large amounts of unlabeled data, reducing annotation cost while improving model generalization.
Because brain tumors involve complex anatomical structures and pathological information, presenting this information effectively for medical analysis and understanding remains a challenge. 3D visualization of brain tumors can meet this need, but data registration is still difficult in some cases. Developing more accurate and robust 3D image registration algorithms to align images from different time points or imaging modalities would help clinicians better analyze the growth and changes of brain tumors.
In conclusion, deep learning holds great potential in the field of MRI brain tumor detection and segmentation, but challenges related to data limitations, model applicability, and clinical practice need to be overcome.Future research will continue to explore and innovate methods to provide solutions to these issues, further advancing and applying deep learning in the field of brain tumor medical image analysis.

Figure 1: Multimodal information sequence of MRI scan

Figure 2: Structural differences between FCN and AlexNet

The BraTS 2021 dataset is a commonly used release in the BraTS series, consisting of MRI scans from 2000 patients: the training set contains 1251 cases, the validation set 219 cases, and the test set 530 cases. Each scan comprises 3D images in 4 modalities. The training set provides both the 3D images and the segmentation labels, while the validation and test sets do not include labels; the validation set feeds the public leaderboard, and the test set is withheld for the final ranking of participants. In the training set, each scan consists of 4 modal 3D images (t1, t1ce, t2, and flair) and one shared label volume with four class labels [0, 1, 2, 4]; Figure 3 shows what the different class labels represent. For Task 1 of the BraTS 2021 challenge, the goal is to identify three sub-regions: the enhancing tumor (ET) region (label 4 only), the tumor core (TC) region (labels 1 and 4), and the whole tumor (WT) region (labels 1, 2, and 4).

Figure 3: Multimodal MRI images and brain tumor labeling

Table 1: LGG-1p19qDeletion and ISLES datasets

The BraTS dataset provides multi-modal MRI images and corresponding brain tumor segmentation annotations covering tumors of different types and sizes. It is widely used for developing, training, and evaluating deep learning algorithms and for comparing the performance of different methods. Since 2012, the Medical Image Computing and Computer-Assisted Intervention Society (MICCAI) has organized an annual multi-modal brain tumor segmentation challenge and released the corresponding MRI brain tumor segmentation datasets (2012-2023). Table 2 lists the performance of recent MRI brain tumor segmentation methods on the BraTS 2019-2022 datasets.

Table 2: Performance comparison on different BraTS datasets (Dice scores)

Method                Dataset       WT     TC     ET
[61]                                0.93   0.87   0.87
Milesi et al. [62]                  0.92   0.87   0.85
Fidon et al. [63]     BraTS-2020    0.89   0.84   0.81
Henry et al. [64]                   0.89   0.84   0.79
Jun et al. [65]                     0.88   0.78   0.75
Liu et al. [66]       BraTS-2019    0.90   0.84   0.78
Sun et al. [36]                     0.89   0.78   0.76
Zhou et al. [67]                    0.87   0.87   0.79