A Watershed-Canny Based Approach for Building Footprint Extraction from Very High Resolution Optical Image

Advanced very high resolution (VHR) sensors are capable of achieving sub-meter resolution, which offers the opportunity for a fine level of analysis of man-made structures. In this paper, we present a method for the extraction of 2-D building footprints from VHR optical scenes. The data sets include a WorldView-2 (0.5 m) image and cover an urban area of San Francisco. Our main idea was to combine an edge-based detector, a region-based segmentation and non-building masks. In the first step, the Canny operator was used to extract edges from the optical image and morphological operations were used to remove small edges. In the second step, the Watershed transform was used to segment the optical image and morphological operations were used to remove small regions. In the third step, we computed two non-building masks: a vegetation mask and a shadow mask. These two masks were first applied to filter the Watershed segmentation result. In the fourth step, we translated the Canny contour image in both directions and computed the Correlation Coefficient between the Canny contour and the filtered Watershed contour for each translation value; the best translation parameters are those for which the Correlation Coefficient is maximal. In the last step, the matched Canny edge image was combined separately with: 1) the vegetation mask, 2) the shadow mask and 3) the filtered Watershed segmentation. The three results were then combined (logical "or") to obtain the final building footprint. The results show that our approach performs well and improves on the boundary extraction obtained with Watershed segmentation alone.


Introduction
In the last decade, the development of remote sensing technologies and the launch of high-resolution remote sensing satellites have increased the number of available metric-resolution images [1]. Very high resolution satellite images, which are an important source of information for decision makers, are used for many tasks such as change detection, image fusion [2], urban growth monitoring and damage assessment.
Extraction of land use information from remotely sensed data is a very important task in Geographic Information Systems (GIS). One of the important land types is the building region. The extracted building regions are useful for mapping and map updating, disaster monitoring, and other applications. Traditionally, building regions are extracted from image data [3].
With the migration of thousands of people from rural to urban areas, the land cover classes in urban and suburban areas are changing rapidly, and this trend will increase in the future with urbanization. Moreover, even without any human intervention, the Earth's surface is itself highly dynamic and changes continuously. Thus, continuous monitoring of the changing Earth is very important and extremely useful [4].
Building detection from remote sensing images is therefore one of the most challenging problems of target detection. Various algorithms have been developed for automated building extraction, but none of them provides an exact solution. The methods proposed in the literature address this problem with a great variety of approaches, which can be grouped into two families: supervised and unsupervised methods. The unsupervised approaches detect buildings using predefined rule-based models and unsupervised classifiers. One of the most popular approaches in this group is the use of shape-based features which represent the rectangular structure of the roofs [5]. Some algorithms employ techniques to discriminate and remove the irrelevant regions from the image, then focus on the regions which include buildings. For example, the authors in [6] formulated the urban-region and building detection problems using graphical models. In [7], shadow evidence, which models the spatial relationship between buildings and their shadows, was used to focus on building regions [8].

Research Frame Work and Theoretical Background
The increasing availability of very high resolution (VHR) images from satellites like GeoEye-1 and WorldView-1/4 provides a great opportunity for extracting buildings from satellite images. Edge detection and segmentation are fundamental tasks in image analysis. However, conventional algorithms often miss parts of the true boundaries. Because building areas and their surroundings exhibit various intensity values and complex features, segmenting building footprints from satellite images is a complex process. In general, region-based segmentation methods can be used to establish the boundaries of building structures. These methods divide the image into similar and homogeneous regions. However, when these approaches are used for building detection in cases where spectral heterogeneity exists, over-detection or under-detection is usually observed.
In this paper, a combined building footprint extraction approach based on Watershed segmentation and the Canny edge detector is proposed. In this approach, we use Canny to extract boundaries and the Watershed transform to segment the image. We also use vegetation and shadow masks to remove non-building areas, and we then combine the results to obtain the refined building footprint.

Discrete Canny Algorithm
The main steps of the discrete Canny algorithm are as follows [9]:
1) Smooth the original image in order to reduce the noise.
2) Convolve the image with an edge detection operator, such as Sobel, to obtain the intensity gradient of the image:
$$G = \sqrt{G_x^2 + G_y^2}, \qquad \theta = \arctan\left(\frac{G_y}{G_x}\right)$$
where $G_x$ and $G_y$ are the gradients along the x and y axes, $G$ is the gradient magnitude in the direction of maximum gradient and $\theta$ is the angle between the maximum gradient direction and the x axis.
3) Non-maximum suppression: the edges are thinned to a width of one pixel, so that a pixel belonging to an edge has neighboring edge pixels only along the edge direction.
4) Hysteresis thresholding with two thresholds $th_1$ and $th_2$, where $th_1 > th_2$: if the gradient value of a pixel is larger than $th_1$, the pixel is considered an edge; if it is smaller than $th_2$, it is not. A pixel whose value lies between $th_2$ and $th_1$ is considered an edge only if one of its adjacent pixels has a value larger than $th_1$.
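Steps 2 and 4 above can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this work: the function names `sobel_gradient` and `hysteresis` are hypothetical, smoothing and non-maximum suppression are omitted, and the hysteresis follows the common variant in which a weak pixel is also kept when it is linked to a strong pixel through a chain of other weak pixels.

```python
import math

# 3x3 Sobel kernels for the derivatives along x (columns) and y (rows).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_gradient(img):
    """Return (magnitude, angle) grids for a 2-D list of gray levels.

    Border pixels are left at zero to keep the sketch short.
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = sum(SOBEL_X[i][j] * img[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * img[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            mag[r][c] = math.hypot(gx, gy)   # G = sqrt(Gx^2 + Gy^2)
            ang[r][c] = math.atan2(gy, gx)   # theta, measured from the x axis
    return mag, ang


def hysteresis(mag, th1, th2):
    """Double threshold with th1 > th2.

    Pixels above th1 are edges; pixels at or above th2 become edges when they
    are 8-connected to an edge pixel, directly or through other weak pixels.
    """
    h, w = len(mag), len(mag[0])
    edge = [[mag[r][c] > th1 for c in range(w)] for r in range(h)]
    stack = [(r, c) for r in range(h) for c in range(w) if edge[r][c]]
    while stack:
        r, c = stack.pop()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < h and 0 <= cc < w
                if inside and not edge[rr][cc] and mag[rr][cc] >= th2:
                    edge[rr][cc] = True
                    stack.append((rr, cc))
    return edge
```

On a vertical step image, `sobel_gradient` gives a non-zero magnitude only in the two columns next to the step, with an angle of zero (gradient along the x axis).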

Watershed Transform
In order to divide the panchromatic image into spectrally homogeneous regions, image segmentation was performed using the Watershed segmentation algorithm. Given the objectives of our work, this algorithm is well suited and provides satisfactory results. In addition, the Watershed algorithm places boundaries at the significant edges and produces closed, connected regions, whereas edge-based techniques usually lead to disconnected boundaries. The Watershed segmentation technique is based on a simple heuristic that consists in analyzing the gray levels of the image pixels in ascending order and performing region growing [10]. Because the regions of an image are characterized by small variations in gray level, in practice the Watershed segmentation is often applied to the gradient of an image rather than to the image itself [11]. In addition to the gradient image, the Watershed segmentation technique uses a seed image calculated from the gradient image. By gathering into each region the pixels that are closest to the corresponding seed, the growing process determines the region associated with each seed, so that each region satisfies a certain homogeneity in gray level [10].
However, the obtained image is generally over-segmented. To overcome this shortcoming, we propose to combine the Watershed segmentation with the Canny edge detector and some morphological filters.
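The seeded region growing described above can be sketched as a simplified priority flood. This is an illustration under assumptions rather than the algorithm of [10]: the function name `seeded_watershed` is hypothetical, 4-connectivity is assumed, a pixel is assigned to the first region that reaches it, and no separate watershed-line pixels are produced.

```python
import heapq


def seeded_watershed(gradient, seeds):
    """Grow labelled seeds over a gradient image in ascending gray-level order.

    gradient: 2-D list of numbers.
    seeds:    2-D list of ints, where 0 means "not yet labelled".
    Returns a 2-D list of labels of the same size.
    """
    h, w = len(gradient), len(gradient[0])
    labels = [row[:] for row in seeds]
    heap = []
    order = 0  # tie-breaker: equal gray levels are processed first-in, first-out
    for r in range(h):
        for c in range(w):
            if labels[r][c]:
                heapq.heappush(heap, (gradient[r][c], order, r, c))
                order += 1
    while heap:
        _, _, r, c = heapq.heappop(heap)  # lowest gray level first
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and labels[rr][cc] == 0:
                labels[rr][cc] = labels[r][c]  # joins the region that reached it first
                heapq.heappush(heap, (gradient[rr][cc], order, rr, cc))
                order += 1
    return labels
```

With one seed per local minimum of the gradient, every minimum produces its own region, which is one way to see why the raw result tends to be over-segmented.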

Non-Building Mask
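We also produce two non-building masks: a vegetation mask image, obtained by thresholding the Normalized Difference Vegetation Index (NDVI) of the multispectral image, and a shadow mask image, obtained by thresholding the Shadow Index (SI). The NDVI and SI are given by [12]:
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}}$$
$$\mathrm{SI} = \frac{\mathrm{RED} + \mathrm{GREEN} + \mathrm{BLUE} + 3\,\mathrm{NIR}}{6} \quad (3)$$
where RED, GREEN, BLUE and NIR denote the red, green, blue and near-infrared bands of the multispectral image.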
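As a rough illustration of how the two masks could be computed, the sketch below applies the NDVI and SI per pixel and thresholds them. The threshold values `ndvi_min` and `si_max`, the function names, and the assumption that shadows correspond to low SI values are all hypothetical; the thresholds actually used are not stated in this section.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index of one pixel (0.0 if both bands are zero)."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def shadow_index(red, green, blue, nir):
    """Shadow Index of one pixel, as written in Eq. (3)."""
    return (red + green + blue + 3 * nir) / 6


def non_building_masks(bands, ndvi_min=0.3, si_max=50.0):
    """Return (vegetation_mask, shadow_mask) as 2-D lists of booleans.

    bands: dict of equally sized 2-D lists keyed 'red', 'green', 'blue', 'nir'.
    ndvi_min and si_max are placeholder thresholds, not values from the paper.
    Assumption: vegetation has NDVI above ndvi_min, shadow has SI below si_max.
    """
    h, w = len(bands["red"]), len(bands["red"][0])
    vegetation = [[False] * w for _ in range(h)]
    shadow = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            red, green = bands["red"][r][c], bands["green"][r][c]
            blue, nir = bands["blue"][r][c], bands["nir"][r][c]
            vegetation[r][c] = ndvi(nir, red) > ndvi_min
            shadow[r][c] = shadow_index(red, green, blue, nir) < si_max
    return vegetation, shadow
```

In practice the thresholds would have to be tuned to the radiometric range of the sensor, for example by inspecting the NDVI and SI histograms of the scene.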

Correlation Coefficient
The Correlation Coefficient (CC) is a number representing the similarity between two images based on their respective pixel intensities. Karl Pearson defined the Pearson product-moment correlation coefficient r. It was the first formal correlation measure and is widely used in statistical analysis, pattern recognition and image processing. Pearson's Correlation Coefficient is defined as:
$$r = \frac{\sum_i \sum_j (x_{ij} - x_m)(y_{ij} - y_m)}{\sqrt{\sum_i \sum_j (x_{ij} - x_m)^2 \, \sum_i \sum_j (y_{ij} - y_m)^2}}$$
Here, $x_{ij}$ and $y_{ij}$ are the intensities of the two images to compare at pixel location $(i, j)$, and $x_m$ and $y_m$ are the mean intensity values of the first and second image, respectively.
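The abstract describes translating the Canny contour image in both directions and keeping the translation that maximizes the CC with the filtered Watershed contour. A minimal sketch of that search is given below. The function names, the zero padding of uncovered pixels and the search range `max_shift` are assumptions made for illustration only.

```python
import math


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equally sized 2-D lists."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    xm = sum(xs) / len(xs)
    ym = sum(ys) / len(ys)
    num = sum((a - xm) * (b - ym) for a, b in zip(xs, ys))
    den = math.sqrt(sum((a - xm) ** 2 for a in xs) *
                    sum((b - ym) ** 2 for b in ys))
    return num / den if den else 0.0  # constant image: correlation undefined, return 0


def translate(img, dy, dx, fill=0):
    """Shift a 2-D list by dy rows and dx columns, padding with `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            rr, cc = r + dy, c + dx
            if 0 <= rr < h and 0 <= cc < w:
                out[rr][cc] = img[r][c]
    return out


def best_translation(moving, reference, max_shift=2):
    """Try every shift in [-max_shift, max_shift] on both axes.

    Returns ((dy, dx), cc) for the shift with the highest correlation.
    """
    best = ((0, 0), pearson_cc(moving, reference))
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            cc = pearson_cc(translate(moving, dy, dx), reference)
            if cc > best[1]:
                best = ((dy, dx), cc)
    return best
```

Here `moving` would play the role of the Canny contour image and `reference` that of the filtered Watershed contour. An exhaustive search is only practical for small shift ranges, since its cost grows with the square of `max_shift`.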

Simu
In this s consists in

Appl
We use filter was a features pr

Com
In this vegetation shadow ind The Figure   Figure

Edge
In  e
1) Qualitative (visual) evaluation: a visual evaluation was done by comparing the result of the proposed approach with that of the Watershed segmentation approach. Figure 7 shows that the proposed method eliminates the over-detection caused by Watershed segmentation in both datasets.
2) Quantitative evaluation: a quantitative evaluation was also done for the second dataset. Of the 23 buildings identified in the dataset, 17 were correctly and completely detected, 4 were correctly but only partially detected, and 2 were missed.
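The building counts reported for the second dataset can be turned into percentages with a few lines of arithmetic. The helper `detection_rates` is a hypothetical name, and the "detected" rate, which counts both complete and partial detections, is a derived figure rather than one stated in the text.

```python
def detection_rates(complete, partial, missed):
    """Return the share of each outcome, in percent, rounded to one decimal."""
    total = complete + partial + missed
    return {
        "complete": round(100 * complete / total, 1),
        "partial": round(100 * partial / total, 1),
        "missed": round(100 * missed / total, 1),
        # Assumption: a partially detected building still counts as detected.
        "detected": round(100 * (complete + partial) / total, 1),
    }


rates = detection_rates(complete=17, partial=4, missed=2)
# rates == {'complete': 73.9, 'partial': 17.4, 'missed': 8.7, 'detected': 91.3}
```

These figures are building-level counts only. They do not measure how closely the extracted footprint matches the true outline of each building.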

Conclusion
In this work, we described an approach for detecting buildings in very high spatial resolution optical images. After an initial over-segmentation based on Watershed segmentation, we used the Canny edge detector and two non-building masks, a vegetation mask and a shadow mask, to improve the detection results. We tested the proposed approach on two scenes with different building characteristics. The tests showed that the proposed approach is able to detect buildings with different colors and shapes. The evaluation was done visually, by comparing the results of our approach with the results of Watershed segmentation, and quantitatively, by comparing the number of detected buildings with the number of originally identified buildings. The evaluation confirmed that the proposed approach performs well and improves on building footprint extraction based on Watershed segmentation alone.