STEM I

STEM at Mass Academy is taught by Dr. C. Over the course of six months, students are guided through the process of brainstorming, researching, and finalizing a STEM project that culminates in a school-wide fair in February.

Enhancing Bicycle Safety using Computer Vision for Real-Time Vehicle Detection and Risk Analysis

The overall aim of this project is to develop an early warning system capable of identifying approaching vehicles and warning cyclists of potential collisions using extracted information such as the vehicle's speed, distance, and position on the road.

Abstract

Over 45,000 bicycle accidents are reported annually, with 30% of these caused by collisions with motor vehicles. The overall aim of this project is to develop an early warning system capable of identifying approaching vehicles and warning cyclists of potential collisions using the vehicle's speed, distance, and position on the road. To accomplish this, a camera mounted on the rear of a bicycle collected image frames of the surrounding environment. A CNN capable of detecting vehicles was deployed on a Raspberry Pi connected to the camera. The output bounding box of the vehicle, along with metrics such as its speed and distance, was shown on a display mounted on the bicycle's handlebar. Results show that the model had an accuracy of 90% in detecting vehicles in an image frame while maintaining a false negative rate of less than 1%. In addition, the model's performance was tested on a variety of edge devices, which averaged a between-frame latency of less than 50 ms. The results indicate that the model is effective at detecting vehicles in ideal lighting conditions. However, accuracy suffers under inclement weather and low-light conditions that do not match the lighting levels present in the training data. This can be mitigated by introducing additional training samples captured in poor lighting, allowing the model to adapt to varying light levels. Alternatively, ultrasonic or LIDAR sensors may be used in conjunction with the camera.

Graphical abstract. Click here to access my supporting documents

Phrase 1

Thousands of bicycle accidents occur each year due to collisions with motor vehicles, and hundreds of these are fatal.

Phrase 2

The overall aim of this project is to develop an early warning system capable of identifying approaching vehicles and warning cyclists of potential collisions using extracted information such as the vehicle's speed, distance, and position on the road.

Background

Background infographic

Bicycle-motor vehicle collisions are becoming increasingly common as more individuals opt to use bicycles as a clean source of transportation and exercise. However, the risk of collision deters many from using this method of transportation. The goal of this project is to develop a system capable of automatically detecting cars and identifying risk factors, which can then be relayed to the cyclist.

In 2019, 843 cyclists lost their lives in crashes with motor vehicles, with 21% of these deaths occurring between 6 P.M. and 9 P.M. (Fatality Facts, 2019). In fact, 30% of all bicycle accidents occur as a result of collisions with motor vehicles. Collisions are caused by a variety of factors such as distracted drivers, speeding, and failure to abide by traffic laws. While only 2% of all bicycle accidents are fatal, they can still result in long-term head and neck injuries that lead to severe pain and loss of motor function. One of the most common types of bicycle accidents is the right hook, in which a car overtakes a cyclist and then makes a right turn across the cyclist's path (NHTSA, 2021). If the driver fails to leave sufficient space for the cyclist before turning, an accident may occur, and the turn is even harder for the cyclist to anticipate if the driver does not signal it. Another common type of collision is the left cross, which occurs when a driver makes a left turn in front of a cyclist traveling in the opposite direction, causing the cyclist to collide with the side of the motor vehicle. As such, being able to detect the possibility of an accident is crucial to improving the safety of cyclists on the road.

Procedure

Methods infographic

Image data on vehicles was required to train a custom object detection model. This data was obtained from the Stanford AI Lab Cars Dataset, which contains 16,185 images of vehicles spanning 196 classes, split into 8,144 training images and 8,041 testing images in JPG format. The dataset also includes bounding box annotations for each image describing the coordinates of the rectangle enclosing the vehicle.

For model training, a desktop PC with an Intel Core i7-8700 CPU, an NVIDIA RTX 2080 GPU, and 32 GB of RAM was used. To test inferencing on portable edge devices, a Raspberry Pi 4 with 4 GB of RAM powered by a 4,200 mAh battery pack was used. Additionally, the SSD_MobileNet_V2 model pretrained on the COCO dataset was imported from the TensorFlow Model Zoo and then retrained on the custom vehicle data described above, updating its existing weights. Python was the primary language used for software development. The NumPy, Pandas, and Pillow libraries were employed to import, resize, and format the image data into the appropriate dimensions for the model architecture. The Anaconda package manager was used to install and manage Python libraries, while Jupyter Notebooks were used to write software and visualize data.

The Cars Dataset was downloaded into the project directory. The MATLAB file describing the 8,144 training samples was loaded into a Pandas DataFrame using the mat4py library. This DataFrame consisted of eight columns describing each image: the filename, width, height, class of vehicle, and the four values defining the bounding box surrounding the vehicle. Using these bounding box coordinates, each vehicle was cropped out of the larger image and stored in a separate directory. Negative samples, or background samples, were generated by a web crawler that collected 1,000 images not containing a vehicle; these images teach the model what to ignore when detecting vehicles. Additional training samples were generated by overlaying the cropped vehicles onto the collected background images, and positive samples were resized and cropped using the opencv-python library.

Haar cascades are a machine-learning-based approach in which positive and negative samples are used to train a model capable of detecting objects in an image. Positive images contain the target object the model aims to detect, while negative images contain a variety of other objects and backgrounds, which allows the model to learn to detect the target object under different conditions. In particular, this model uses vertical and horizontal edge features to identify the parts of an image where pixel intensities change dramatically, marking the edges of objects. Figure 1 shows a visual representation of the matrices of 1s and 0s that make up these features.

After training, the OpenCV library generated an XML file containing the model. Using this file, software was written to load the model into a Python environment and perform bounding box predictions on vehicles in an image. The model was first evaluated on a laptop with an Intel Core i7-9750H CPU at 2.6 GHz to determine baseline performance. The model was then deployed to a Raspberry Pi 4 to evaluate frames-per-second and latency metrics on real-time dashcam footage.
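As a concrete illustration of the annotation-loading and cropping step described above, the following minimal sketch loads the training annotations with mat4py and extracts the positive samples. The file path and annotation field names are assumptions based on the dataset's published MATLAB annotation format, not taken from the project code.

import os

import pandas as pd
from mat4py import loadmat
from PIL import Image

# Load the MATLAB annotation file into a column-oriented dict of lists.
# Field names (bbox_x1, ..., fname) follow the dataset's published format.
annos = loadmat("cars_train_annos.mat")["annotations"]

df = pd.DataFrame({
    "fname": annos["fname"],
    "x1": annos["bbox_x1"], "y1": annos["bbox_y1"],
    "x2": annos["bbox_x2"], "y2": annos["bbox_y2"],
    "label": annos["class"],
})

# Crop each vehicle out of its frame and save it as a positive sample.
os.makedirs("positives", exist_ok=True)
for row in df.itertuples():
    img = Image.open(os.path.join("cars_train", row.fname))
    img.crop((row.x1, row.y1, row.x2, row.y2)).save(os.path.join("positives", row.fname))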
A statistical test was required to determine the degree to which changing environments and the number of vehicles in an image affect the variation in latencies between the laptop and the Raspberry Pi 4. Several tests were considered for evaluating the null hypothesis that the device used does not affect the variation in latencies, and the F-test was ultimately the most suitable for these results. A statistical F-test compares the variances of two datasets using a calculated F statistic (F-Test, n.d.). If the ratio of the two variances is close to 1, the null hypothesis cannot be rejected; the further the ratio deviates from 1, the stronger the evidence that the variances differ.
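A minimal sketch of this comparison is shown below, assuming the per-frame latencies for each device have been saved to CSV files (the filenames are placeholders) and using the F distribution from SciPy.

import numpy as np
from scipy.stats import f

# Per-frame latencies (milliseconds) recorded on each device.
i7_latencies = np.loadtxt("latency_i7.csv")
rpi_latencies = np.loadtxt("latency_rpi4.csv")

# F statistic: ratio of the two sample variances, larger variance in the numerator.
var_rpi = np.var(rpi_latencies, ddof=1)
var_i7 = np.var(i7_latencies, ddof=1)
f_stat = var_rpi / var_i7
dfn, dfd = len(rpi_latencies) - 1, len(i7_latencies) - 1

# One-tailed p-value: probability of observing an F statistic at least this
# large if the two variances were actually equal (the null hypothesis).
p_value = f.sf(f_stat, dfn, dfd)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")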


Figure 1: Vehicle detection latency for Haar-Cascade Model on i7 CPU


Figure 2: Vehicle detection latency for Haar-Cascade Model on Raspberry Pi


Figure 3: Vehicle detection latency for SSD_Mobilenet Model on i7 CPU and RTX 2080 GPU

Analysis

Results show that the model is effective at detecting vehicles in each image frame at low latency. Generally, an IoU score exceeding 0.5 is indicative of a strong predictive model in object detection (Sandeep, 2019). The mean IoU score calculated from the predicted bounding boxes on the testing images was 0.8336, indicating a highly accurate model. In addition, the minimum IoU score across the 400 testing images was 0.6680, which still exceeds the 0.5 threshold.

The vehicle detection latency on the Intel Core i7-9750H processor averaged 44.5 milliseconds, meaning that the cyclist would see 22.7 frames per second on average. Based on Figure 2, significant spikes in the data can be observed, indicating that a variety of factors could affect the latency, such as the number of vehicles in a particular frame or the speed at which they are moving. In addition, the dashcam footage used for real-world latency testing contained varying lighting levels, unlike the training dataset, which could account for some of the variance. Latency figures on the Raspberry Pi 4 were significantly greater than those of the Intel processor due to its limited compute resources and significantly lower power envelope. As shown in Figure 3, the model required upwards of 375 ms to perform inferencing on the first frame, likely due to initial delays in loading the model into system memory and starting the camera stream. Additionally, clear jumps in latency can be observed at the 1250- and 1750-frame points, which correspond to the camera stopping and resuming movement respectively, suggesting that the model performs inferencing more quickly under stationary conditions.

To evaluate how changing conditions in the image frame affect model performance on the two processing units, an F-test was performed. Comparing the variances of the two latency distributions yielded an F statistic of 1.246 with a corresponding p-value of 0.0413. Since this p-value falls below the 0.05 significance level, the null hypothesis of equal variances can be rejected, indicating that the Raspberry Pi's latency is more strongly affected by variation in the input image frames than the Core i7 processor's.
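For reference, the IoU metric cited above is the area of overlap between the predicted and ground-truth bounding boxes divided by the area of their union; a minimal sketch follows, with boxes represented as (x1, y1, x2, y2) tuples and an illustrative example.

def iou(box_a, box_b):
    """Intersection over Union for two (x1, y1, x2, y2) boxes."""
    # Coordinates of the intersection rectangle.
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    if inter == 0:
        return 0.0
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    # Union = both box areas minus the double-counted overlap.
    return inter / (area_a + area_b - inter)

# Example: a prediction that overlaps most of the ground-truth box.
print(iou((10, 10, 110, 110), (20, 20, 120, 120)))  # ~0.68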

Discussion/Conclusion

The developed Haar-cascade model can be deployed on bicycles, as well as other vehicles such as motorcycles, mopeds, and scooters, to detect approaching vehicles and warn the rider of a potential collision. The portable nature of the Raspberry Pi allows the device to be fitted to nearly any vehicle to provide the rider with additional information about their surroundings. Moreover, as hardware devices become increasingly powerful, this easy-to-implement model can be scaled to devices with greater compute resources, resulting in increased performance and lower latencies. There are several directions for future research that could improve the effectiveness of this solution. Primarily, the model can be trained on a wider variety of samples that vary in lighting conditions, number of vehicles per image, and weather. The model is currently most effective under well-lit daylight conditions, so adding varied training samples would allow it to perform inferences effectively under less ideal conditions. Another area of research is developing a model that relies solely on features such as a vehicle's headlights and taillights, to which the Raspberry Pi could switch whenever the average pixel intensity of an image drops below a certain threshold, indicating that lighting conditions are worsening.
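A minimal sketch of this proposed switching logic is shown below; the brightness threshold and the night-time cascade file are hypothetical placeholders rather than trained artifacts from this project.

import cv2

DARK_THRESHOLD = 60  # illustrative 0-255 grayscale cutoff, not a measured value

day_cascade = cv2.CascadeClassifier("cascade_vehicle.xml")   # trained daytime model
night_cascade = cv2.CascadeClassifier("cascade_lights.xml")  # hypothetical headlight/taillight model

def detect_vehicles(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # Fall back to the light-based model when the scene becomes too dark.
    model = night_cascade if gray.mean() < DARK_THRESHOLD else day_cascade
    return model.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)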

References

February Fair Poster