Vehicle Tracking and Detection

On Road Vehicle Breakdown Assistance Finder Project


A. FAST-HoG Detection Method

In this detection method, we integrated FAST corner detection with the HoG descriptor. FAST narrows down the Region of Interest (RoI) for HoG detection, which reduces the long processing time of the sliding-window search.

The FAST detector classifies a pixel p as a corner by performing a simple brightness test on a discretised circle of sixteen pixels around p. A corner is detected at p if there are twelve contiguous pixels in the circle whose intensities are all brighter than the centre pixel p by a threshold t, or all darker than p by t. A score function is then evaluated for each candidate corner so that non-maximal suppression can be performed for the final detection. The score is defined over two sets: Sbright, the subset of pixels in the circle that are brighter than p by the threshold t, and Sdark, the subset of pixels that are darker than p by t.

The HoG feature was originally developed for detecting humans. The idea of the HoG descriptor is that the shape of an object can often be characterised by the distribution of its edge directions, even without precise information about the edges themselves. However, a weakness of the HoG descriptor is that it is not rotationally invariant. To solve this problem, the proposed method uses each training sample in four different orientations (0°, 45°, 90°, and 135°). Each group of oriented training samples has its own classification model, and the final classification model is computed from all four oriented models.

The extraction of a HoG feature vector starts with colour and gamma normalisation. Edges are then detected by convolving the image patch with the simple mask [-1, 0, 1], both horizontally and vertically. The image patch is subdivided into rectangular regions called cells, and within each cell the gradient at each pixel is computed. In the next step, each pixel casts a vote for an orientation, weighted by its gradient magnitude. These votes are accumulated into orientation bins covering the range 0 to 180 degrees, so that the gradient angles of the cell are stored in a histogram. Local contrast normalisation is used to suppress the effects that changes in illumination and in contrast with the background have on the gradient magnitude. This step, which was found to be essential for good performance, is achieved by grouping cells into larger blocks and normalising within these blocks, ensuring that low-contrast regions are stretched.

The HoG feature vectors extracted from the regions of interest are passed to a binary classifier that determines whether a vehicle is present in the image patch. The method trains separate SVMs on sample vehicle images categorised into four angular offsets (0°, 45°, 90°, and 135°). These four SVM models are then integrated into a single classifier model that evaluates a rotationally invariant response for a single HoG feature vector. Support Vector Machines were chosen as the learning algorithm for classification because they demonstrated very high accuracy in previous vehicle detection research.
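The two low-level steps described in this subsection, the FAST brightness test and the magnitude-weighted orientation voting inside one HoG cell, can be sketched in a few lines of NumPy. This is only an illustrative sketch and not the authors' implementation: the function names, the choice of 9 orientation bins, the absence of vote interpolation between bins, and the requirement that the tested pixel lies at least 3 pixels from the image border are all assumptions made here.

```python
import numpy as np

# (dx, dy) offsets of the 16-pixel discretised circle of radius 3 around p.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def has_contiguous_run(flags, n):
    """Return True if the circular sequence `flags` holds n consecutive True values.

    Assumes n <= len(flags), which holds for n = 12 on the 16-pixel circle.
    """
    run = 0
    for flag in list(flags) + list(flags):  # doubled so a run may wrap around
        run = run + 1 if flag else 0
        if run >= n:
            return True
    return False


def is_fast_corner(img, x, y, t, n=12):
    """Segment test: n contiguous circle pixels all brighter or all darker than p by t.

    Assumption: (x, y) lies at least 3 pixels away from every image border.
    """
    p = int(img[y, x])
    ring = np.array([int(img[y + dy, x + dx]) for dx, dy in CIRCLE])
    return has_contiguous_run(ring > p + t, n) or has_contiguous_run(ring < p - t, n)


def cell_orientation_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient angles (0 to 180 degrees).

    The bin count (9) is an assumption; the text does not state it.
    """
    cell = cell.astype(float)
    gx = np.zeros_like(cell)
    gy = np.zeros_like(cell)
    gx[:, 1:-1] = cell[:, 2:] - cell[:, :-2]   # mask [-1, 0, 1] applied horizontally
    gy[1:-1, :] = cell[2:, :] - cell[:-2, :]   # mask [-1, 0, 1] applied vertically
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    index = np.minimum((angle / (180.0 / bins)).astype(int), bins - 1)
    # Each pixel votes for its orientation bin with a weight equal to its magnitude.
    return np.bincount(index.ravel(), weights=magnitude.ravel(), minlength=bins)
```

In a pipeline of the kind described above, a test such as is_fast_corner would select candidate regions, and histograms such as these would be computed per cell inside each region before block normalisation and SVM classification.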

B. HSV-GLCM Detection Method

The second detection method combines the HSV colour feature with the GLCM feature. The GLCM (grey-level co-occurrence matrix) is a tabulation of how often different combinations of pixel brightness values occur in an image. The idea of using the GLCM for detection is to compute GLCM values over a sliding window in the input image; these values serve as the descriptors of the GLCM feature. The GLCM texture measures can be classified into three groups: contrast (CON), homogeneity (HOM) and entropy (ENT).

Before the GLCM is computed, several of its parameters need to be set. First, the number of grey levels has to be chosen. A grey image has 256 grey levels, so the GLCM would contain 256 × 256 (65,536) combinations; analysing it would require substantial computing power and time. To save both, the number of grey levels can be reduced, usually to 16, 32, or 64. The greater the value, the better the result, but also the longer the time required. Following previous research [14], this value was set to 32 in our process.

Secondly, the offset was taken in four different orientations: 0°, 45°, 90°, and 135°. This is because the GLCM is not direction invariant. Vehicles approach from different directions in the video, so we define four main directions in which to detect vehicles, which addresses this lack of invariance.

Furthermore, the offset distance between the two pixels has to be set. Usually, a value of 1 is chosen. F. Zhou et al. [15] proposed that there are relationships between the distance and the calculated values (CON, HOM and ENT). In the view of the authors of [15], a Markov Random Field (MRF) can be used to show that a calculation is correct only when the distance is greater than the value of a GLCM feature. Conversely, when the distance is small, the results of the GLCM calculations appear random or fluctuate. A. Chaddad et al. [16] proposed the same idea; in their opinion, when the distance is small, that is, when the two chosen pixels are close together, the results of the GLCM calculations change rapidly with any increase in distance, but when the distance becomes large, the results are more stable. The conclusions of these two papers agree. It is therefore essential to find a suitable value for the distance. The offset distance was set to 3 in the proposed detection method.

Once the GLCM parameters were set, the GLCM values of the training samples were calculated and passed to the SVM classification model. Each GLCM descriptor contains six values: the mean and standard deviation of CON, HOM and ENT.
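The parameter choices described above (32 grey levels, four orientations, offset distance 3) and the three texture measures can be illustrated with a small NumPy sketch. It is not the authors' code, and several points are assumptions made for illustration: the function names, the 8-bit input range, the symmetric normalised form of the matrix, the commonly used textbook definitions of CON, HOM and ENT, and in particular the reading that the mean and standard deviation are taken across the four orientations.

```python
import numpy as np

# (row, column) step for each orientation, scaled by the offset distance.
DIRECTIONS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(gray, levels=32, distance=3, angle_deg=0):
    """Symmetric, normalised co-occurrence matrix of an 8-bit grey image.

    Assumption: the image is larger than `distance` in both dimensions.
    """
    q = (gray.astype(np.int64) * levels) // 256      # quantise 256 levels to `levels`
    dr, dc = (distance * step for step in DIRECTIONS[angle_deg])
    h, w = q.shape
    r0, r1 = max(0, -dr), min(h, h - dr)             # rows whose neighbour is in range
    c0, c1 = max(0, -dc), min(w, w - dc)             # columns whose neighbour is in range
    ref = q[r0:r1, c0:c1]
    nbr = q[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
    m = np.zeros((levels, levels), dtype=float)
    np.add.at(m, (ref.ravel(), nbr.ravel()), 1.0)    # count each (reference, neighbour) pair
    m = m + m.T                                      # symmetric form (an assumption)
    return m / m.sum()


def texture_features(p):
    """Contrast, homogeneity and entropy of a normalised GLCM `p`."""
    i, j = np.indices(p.shape)
    con = np.sum(p * (i - j) ** 2)
    hom = np.sum(p / (1.0 + (i - j) ** 2))
    nonzero = p[p > 0]
    ent = -np.sum(nonzero * np.log(nonzero))
    return con, hom, ent


def glcm_descriptor(gray, levels=32, distance=3):
    """Six values: mean and standard deviation of CON, HOM, ENT over four orientations."""
    feats = np.array([texture_features(glcm(gray, levels, distance, a))
                      for a in (0, 45, 90, 135)])
    return np.concatenate([feats.mean(axis=0), feats.std(axis=0)])
```

Applied to each sliding-window patch, glcm_descriptor would give the six-value vector that is passed to the SVM. Averaging over orientations is one way to reduce the direction dependence mentioned above, but the original text does not confirm that this is how the six values were obtained.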