Knowledge of the root causes of delays in transit networks has obvious value: it can be used to direct resources toward mitigation efforts and to measure the effectiveness of those efforts. However, delays with indirect causes can be difficult to attribute and may be assigned to broad categories that indicate “overcrowding,” incorrectly naming heavy ridership, train congestion, or both as the cause. This paper describes a methodology to improve such incident assignments, using historical train movement and incident data to determine whether a root-cause incident is responsible for the delay. It is intended as a first step toward improved, data-driven delay recording to help time-strapped dispatchers investigate incident impacts. The methodology considers a train’s previous trip and when it arrived at the terminal to begin its next trip, as well as en route running times and dwell times. If the largest source of delay can be traced to a specific incident, that incident is suggested as the cause. For New York City Transit (NYCT), this methodology reassigns about 7% of the trains for which dispatchers had not originally identified a root cause. Its results are provided to NYCT’s Rail Control Center staff via automated daily reports, which, along with other improvements to delay recording procedures, have reduced these “overcrowding” categories from encompassing up to 38% of all delays in early 2018 to only 28% in 2019. The results confirm both that it is possible to improve delay cause diagnoses with algorithms and that there are delays for which both humans and algorithms find it difficult to determine a cause.
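The core decision rule described above (find the largest source of delay and suggest its incident, if one exists) can be illustrated with a minimal sketch. This is not the paper's implementation: the `DelaySource` type, its field names, the source labels, and the `suggest_root_cause` function are all hypothetical, and the sketch assumes that each delay source has already been measured against schedule and matched to a logged incident where possible.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DelaySource:
    """One candidate source of a train's delay (hypothetical structure)."""
    name: str                          # e.g. "late_terminal_arrival", "running_time", "dwell_time"
    excess_seconds: float              # time lost relative to schedule from this source
    incident_id: Optional[str] = None  # logged incident matched to this source, if any


def suggest_root_cause(sources: List[DelaySource]) -> Optional[str]:
    """Return the incident matched to the largest delay source, or None.

    Assumption: only the single largest source is considered; if it has
    no matched incident, no root cause is suggested.
    """
    delayed = [s for s in sources if s.excess_seconds > 0]
    if not delayed:
        return None
    largest = max(delayed, key=lambda s: s.excess_seconds)
    return largest.incident_id
```

Returning `None` when the largest source has no matched incident reflects the abstract's observation that some delays remain hard to attribute for both humans and algorithms.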