Phase III

Question and Submission Instructions

1. Vehicle speed estimation challenge

A. Participating teams were asked to submit individual vehicle speeds for a test set of about 10 CCTV videos. Performance was evaluated against ground truth obtained from a vehicle speed sensor. The challenge was scored on the detection rate of the control vehicles and the root mean square error (RMSE) of their predicted speeds.

*Participants were encouraged to use the dataset from the SUNY Albany UA-DETRAC benchmark suite (https://detrac-db.rit.albany.edu/) [1] if they needed to develop models for vehicle detection and tracking.

2. Task

A. Participants should estimate the speed of every vehicle on the main roadways in every frame of each given video.

3. Submission format

A. Submit a single text file in which each line describes one detected vehicle, using the following space-delimited format:

<test_video_name> <frame_no> <obj_id> <xmin> <ymin> <xmax> <ymax> <speed> <confidence>

i. <test_video_name> is the test video file name.

ii. <frame_no> represents the frame count for the current frame in the current video, starting with 1.

iii. <obj_id> is an integer object identifier.

iv. The axis-aligned rectangular bounding box of the detected vehicle is given by its pixel coordinates within the image canvas, <xmin> <ymin> <xmax> <ymax>, measured from the top-left corner of the image (similar to the VOC2012 challenge format). All coordinates are integers.

v. <speed> denotes the instantaneous speed of the vehicle in the given frame, measured in kilometers per hour (km/h); it is a non-negative real value.

vi. <confidence> denotes the confidence of the prediction, a real value between 0 and 1.
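The line format above can be produced and checked with a small helper. This is a minimal sketch, not an official tool: the class name `SpeedPrediction`, its methods, the decimal precision used when writing, and the extra sanity check xmin <= xmax, ymin <= ymax are illustrative assumptions rather than challenge requirements. It also assumes video file names contain no spaces, since fields are space-delimited.

```python
from dataclasses import dataclass


@dataclass
class SpeedPrediction:
    """One line of speed_result.txt (hypothetical helper, field order per the spec)."""
    video: str         # <test_video_name>
    frame_no: int      # <frame_no>, 1-based
    obj_id: int        # <obj_id>, integer identifier
    xmin: int
    ymin: int
    xmax: int
    ymax: int
    speed: float       # <speed> in km/h, non-negative
    confidence: float  # <confidence> in [0, 1]

    def validate(self) -> None:
        """Raise ValueError if a field violates the stated (or assumed) constraints."""
        if self.frame_no < 1:
            raise ValueError("frame_no starts at 1")
        # Assumed sanity check, not stated in the challenge text.
        if not (self.xmin <= self.xmax and self.ymin <= self.ymax):
            raise ValueError("expected xmin <= xmax and ymin <= ymax")
        if self.speed < 0:
            raise ValueError("speed must be non-negative")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")

    def to_line(self) -> str:
        """Serialize as one space-delimited line (precision chosen arbitrarily)."""
        self.validate()
        return (f"{self.video} {self.frame_no} {self.obj_id} "
                f"{self.xmin} {self.ymin} {self.xmax} {self.ymax} "
                f"{self.speed:.2f} {self.confidence:.3f}")

    @classmethod
    def from_line(cls, line: str) -> "SpeedPrediction":
        """Parse and validate one line of the result file."""
        f = line.split()
        if len(f) != 9:
            raise ValueError(f"expected 9 fields, got {len(f)}")
        pred = cls(f[0], int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   int(f[5]), int(f[6]), float(f[7]), float(f[8]))
        pred.validate()
        return pred
```

Writing each prediction through `to_line()` and re-reading the finished file with `from_line()` is a cheap way to catch malformed lines before upload.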

B. The confidence score is not currently used in the evaluation but may be in the future, so including confidence scores is encouraged.

i. The text file containing all predictions should be named “speed_result.txt”; it may be compressed with Zip (speed_result.zip) or tar+gz (speed_result.tar.gz) to reduce upload time.
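Packaging the result file can be done with Python's standard `zipfile` and `tarfile` modules. In this sketch the helper name `archive_results` is hypothetical, and storing `speed_result.txt` at the top level of the archive (no directory prefix) is an assumption about what the upload expects.

```python
import tarfile
import zipfile
from pathlib import Path


def archive_results(txt_path: str, fmt: str = "zip") -> Path:
    """Compress speed_result.txt into speed_result.zip or speed_result.tar.gz
    next to the original file (hypothetical helper)."""
    src = Path(txt_path)
    if fmt == "zip":
        out = src.with_suffix(".zip")  # speed_result.txt -> speed_result.zip
        with zipfile.ZipFile(str(out), "w", compression=zipfile.ZIP_DEFLATED) as zf:
            zf.write(str(src), arcname=src.name)  # no directory prefix inside
    elif fmt == "tar.gz":
        out = src.with_name(src.stem + ".tar.gz")  # speed_result.tar.gz
        with tarfile.open(str(out), "w:gz") as tf:
            tf.add(str(src), arcname=src.name)
    else:
        raise ValueError("fmt must be 'zip' or 'tar.gz'")
    return out
```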

C. Brief explanatory document

i. All participants should submit a brief explanatory document about what they have developed for vehicle speed estimation.

ii. The explanatory document should preferably fit on one page in double-column PDF format and should describe the developed AI model, the training data, and other relevant details.

D. Source code

i. All participants should submit the source code for the AI model they have developed. The source code should be written in Python using PyTorch and should include both training and testing code, along with all necessary libraries.

ii. It should also include comments within the code and a readme file explaining how to run the source code.

iii. The source code archive should be named “speed_source_code.zip” or “speed_source_code.tar.gz”.


4. Evaluation

A. Speed data has been collected via in-vehicle tracking for a subset of the cars in each video; we call these ground-truth vehicles. Results will be evaluated on the ability to localize these vehicles and predict their speeds. For each ground-truth vehicle, an independent party annotated the vehicle with a bounding box in every frame in which it appears. An interpolation function was used to assign a speed to each frame from the tracker speed data. The score is computed as $S = DR \times (1 - \mathrm{NRMSE})$, where $DR$ is the detection rate and $\mathrm{NRMSE}$ is the normalized root mean square error (RMSE). The score ranges between 0 and 1, and higher scores are better.
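As a minimal sketch, assuming the combined score takes the form S = DR × (1 − NRMSE) with both terms in [0, 1], one team's final score can be computed as follows. The function name `challenge_score` is hypothetical.

```python
def challenge_score(detection_rate: float, nrmse: float) -> float:
    """S = DR * (1 - NRMSE); both inputs are expected to lie in [0, 1]."""
    for name, value in (("detection_rate", detection_rate), ("nrmse", nrmse)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return detection_rate * (1.0 - nrmse)
```

Under this form, the team with the highest RMSE receives NRMSE = 1 after min-max normalization and therefore scores 0 regardless of its detection rate, while reaching 1 requires both a perfect detection rate and the lowest RMSE among all teams.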

B. $DR$ is computed as the ratio of the number of detected ground-truth vehicles to the total number of ground-truth vehicles. A vehicle is considered detected if it is localized in at least 30% of the frames in which it appears. A vehicle is localized in a frame if at least one predicted bounding box has an intersection-over-union (IOU) score of 0.5 or higher with the annotated bounding box for that vehicle.
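The detection-rate rule can be sketched as below. This illustrates the stated thresholds under assumptions and is not the official evaluation code: boxes are treated as continuous coordinates (no VOC-style +1 pixel offset in the area), frames are keyed by a hypothetical (video name, frame number) pair, and the names `iou` and `detection_rate` are illustrative.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
FrameKey = Tuple[str, int]               # hypothetical (video name, frame_no) key


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def detection_rate(
    gt_tracks: Dict[str, Dict[FrameKey, Box]],
    preds: Dict[FrameKey, List[Box]],
    iou_thr: float = 0.5,
    min_frac: float = 0.3,
) -> float:
    """Fraction of ground-truth vehicles localized in >= min_frac of their frames."""
    if not gt_tracks:
        return 0.0
    detected = 0
    for frames in gt_tracks.values():
        # A frame counts as localized if any predicted box reaches the IoU threshold.
        localized = sum(
            1
            for key, gt_box in frames.items()
            if any(iou(gt_box, p) >= iou_thr for p in preds.get(key, []))
        )
        if frames and localized / len(frames) >= min_frac:
            detected += 1
    return detected / len(gt_tracks)
```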

C. We compute the speed estimation error as the RMSE between the ground-truth and predicted speeds over all correctly localized ground-truth vehicles. If multiple bounding boxes with IOU ≥ 0.5 exist, only the speed estimate from the one with the highest confidence score is considered. NRMSE is the RMSE normalized across all teams via min-max normalization over all team submissions. Specifically, for team $i$, NRMSE is computed as:

$$\mathrm{NRMSE}_i = \frac{\mathrm{RMSE}_i - \mathrm{RMSE}_{\min}}{\mathrm{RMSE}_{\max} - \mathrm{RMSE}_{\min}}$$

where $\mathrm{RMSE}_{\min}$ and $\mathrm{RMSE}_{\max}$ are the minimum and maximum RMSE values among all teams, respectively.
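The speed-error and normalization steps can be sketched as follows. Several details are assumptions, not taken from the challenge text: squared errors are pooled over every frame in which a ground-truth vehicle is localized (the official tool might instead restrict this to detected vehicles), each prediction is a hypothetical (box, speed, confidence) tuple, and when all teams share the same RMSE the sketch assigns NRMSE = 0 to avoid division by zero. The names `speed_rmse` and `normalize_rmse` are illustrative.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
FrameKey = Tuple[str, int]               # hypothetical (video name, frame_no)
Pred = Tuple[Box, float, float]          # hypothetical (box, speed_kmh, confidence)
GtFrame = Tuple[FrameKey, Box, float]    # (frame key, annotated box, gt speed km/h)


def _iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def speed_rmse(gt_frames: List[GtFrame], preds: Dict[FrameKey, List[Pred]],
               iou_thr: float = 0.5) -> float:
    """RMSE over localized ground-truth frames, using the most confident matching box."""
    sq_errors = []
    for key, gt_box, gt_speed in gt_frames:
        matches = [p for p in preds.get(key, []) if _iou(gt_box, p[0]) >= iou_thr]
        if not matches:
            continue  # not localized in this frame: affects DR, not RMSE
        best = max(matches, key=lambda p: p[2])  # highest confidence wins
        sq_errors.append((best[1] - gt_speed) ** 2)
    if not sq_errors:
        raise ValueError("no localized ground-truth frames to score")
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def normalize_rmse(team_rmse: Dict[str, float]) -> Dict[str, float]:
    """Min-max normalization of RMSE across teams, giving NRMSE in [0, 1]."""
    lo, hi = min(team_rmse.values()), max(team_rmse.values())
    if hi == lo:
        return {team: 0.0 for team in team_rmse}  # assumption: ties map to 0
    return {team: (r - lo) / (hi - lo) for team, r in team_rmse.items()}
```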


5. Important dates


[1] L. Wen, D. Du, Z. Cai, Z. Lei, M. Chang, H. Qi, J. Lim, M. Yang, and S. Lyu. UA-DETRAC: A new benchmark and protocol for multi-object detection and tracking. arXiv CoRR, abs/1511.04136, 2015.

Link for Test Dataset: Test Dataset Google Folder