The Problem with GPU Benchmarks | Reality vs. Numbers, Animation Error Methodology White Paper

The video introduces animation error, a new benchmarking metric that better captures the perceived smoothness and stuttering in games by measuring the mismatch between frame creation and display timing, addressing limitations of traditional frame rate and frame time metrics. It highlights how animation error reveals performance issues masked by conventional benchmarks, encourages its adoption alongside existing metrics, and presents the open-source tool PresentMon for accessible analysis.

The video introduces a new benchmarking metric called animation error, which addresses limitations in traditional CPU and GPU game performance testing. Traditional metrics such as average frame rate and frame time intervals are useful, but they do not fully capture the player's experience, particularly perceived stuttering and smoothness. Animation error measures the mismatch between the pace at which frames are created and the pace at which they are displayed, exposing issues that frame time spikes alone cannot explain. The metric builds on work by Tom Petersen and others and has been implemented in the open-source tool PresentMon, so anyone can measure and analyze animation error in games.
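The core calculation can be sketched in a few lines. This is a minimal sketch, assuming that a frame's animation error is its change in display time minus its change in animation time relative to the previous frame, which follows the "creation pace versus display pace" description above. The function name and inputs are hypothetical, and PresentMon's own implementation may differ in details such as which timestamp stands in for animation time.

```python
from typing import List


def animation_error(anim_times_ms: List[float], display_times_ms: List[float]) -> List[float]:
    """Per-frame animation error in milliseconds (hypothetical helper).

    anim_times_ms[i]:    timestamp the game used to animate frame i
                         (assumption: approximated by the frame's CPU start time)
    display_times_ms[i]: timestamp at which frame i appeared on screen
    The first frame has no predecessor, so the result has len - 1 entries.
    """
    if len(anim_times_ms) != len(display_times_ms):
        raise ValueError("timestamp lists must be the same length")
    errors = []
    for i in range(1, len(display_times_ms)):
        display_delta = display_times_ms[i] - display_times_ms[i - 1]
        anim_delta = anim_times_ms[i] - anim_times_ms[i - 1]
        # Positive: the on-screen interval was longer than the animation step it shows.
        # Negative: the on-screen interval was shorter than the animation step.
        errors.append(display_delta - anim_delta)
    return errors


# Hypothetical trace: the animation advances in even 16 ms steps,
# but the third frame reaches the screen 6 ms late.
anim = [0.0, 16.0, 32.0, 48.0]
display = [50.0, 66.0, 88.0, 98.0]
print(animation_error(anim, display))  # [0.0, 6.0, -6.0]
```

Note that a single late frame produces a pair of errors with opposite signs: the interval before it is too long and the interval after it is too short.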

Animation error focuses on two key aspects of visual perception: smoothness and acceleration. While frame time pacing measures the consistency of frame display intervals, animation error quantifies how well the timing of frame display matches the animation timing within the frames themselves. This is important because even if frames are displayed evenly, the animation can still feel jerky if there is acceleration or deceleration in frame timing that does not align with the animation’s natural progression. The video demonstrates this with examples using single and dual GTX 1080 Ti GPUs, showing that despite similar frame rate metrics, the dual GPU setup exhibited significantly higher animation error, correlating with a worse perceived experience.
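Two hypothetical traces show how frame time consistency and animation error can disagree. The timestamps are invented for illustration and are not measurements from the video. The helper names are hypothetical, and animation error is again assumed to be the display-time delta minus the animation-time delta.

```python
from statistics import mean, pstdev
from typing import Dict, List


def deltas(ts: List[float]) -> List[float]:
    # Interval between each timestamp and the one before it.
    return [b - a for a, b in zip(ts, ts[1:])]


def summarize(anim_ms: List[float], display_ms: List[float]) -> Dict[str, float]:
    frame_times = deltas(display_ms)  # on-screen intervals (classic frame times)
    anim_steps = deltas(anim_ms)      # how far the animation advanced per frame
    errors = [f - a for f, a in zip(frame_times, anim_steps)]
    return {
        "avg_fps": 1000.0 / mean(frame_times),
        "frame_time_stdev_ms": pstdev(frame_times),
        "mean_abs_anim_error_ms": mean(abs(e) for e in errors),
    }


# Trace A (hypothetical): animation and display both advance every 20 ms.
a_anim = [0.0, 20.0, 40.0, 60.0, 80.0]
a_disp = [30.0, 50.0, 70.0, 90.0, 110.0]

# Trace B (hypothetical): display is perfectly even at 20 ms, but the frames
# were animated at uneven points in time (alternating 10 ms and 30 ms steps).
b_anim = [0.0, 10.0, 40.0, 50.0, 80.0]
b_disp = [30.0, 50.0, 70.0, 90.0, 110.0]

print(summarize(a_anim, a_disp))
print(summarize(b_anim, b_disp))
```

Both traces report 50 FPS and a frame time standard deviation of zero, yet trace B carries a mean absolute animation error of 10 ms. This is the kind of gap between conventional metrics and perceived smoothness that the single versus dual GPU comparison points to.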

The video further explains the technical background of animation error, including how it differs from latency and micro stuttering. Animation error is not about frames arriving late but about jitter in when frames are displayed relative to the animation timeline. The presenter uses analogies such as flip books and movies to illustrate how mismatches between frame creation and display timing produce visual artifacts that disrupt smooth motion. The video also discusses how animation error can reveal issues masked by traditional benchmarks, such as those caused by multi-GPU rendering, frame generation, or frame-pacing mechanisms like Nvidia's frame metering, which can reshuffle frame timings and induce animation error without producing obvious frame time spikes.
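The following sketch shows how pacing frames evenly can hide frame time spikes while creating animation error. The `meter` policy is a hypothetical simplification: it holds each frame until a fixed interval after the previous one, but never shows it before it is ready. It is not Nvidia's actual frame metering algorithm. The alternate-frame-rendering timestamps are likewise invented for illustration.

```python
from typing import List


def deltas(ts: List[float]) -> List[float]:
    # Interval between each timestamp and the one before it.
    return [b - a for a, b in zip(ts, ts[1:])]


def meter(ready_ms: List[float], target_interval_ms: float) -> List[float]:
    """Hypothetical frame-metering policy: show each frame no earlier than
    target_interval_ms after the previous one, and never before it is ready."""
    shown = [ready_ms[0]]
    for t in ready_ms[1:]:
        shown.append(max(t, shown[-1] + target_interval_ms))
    return shown


def anim_error(anim_ms: List[float], display_ms: List[float]) -> List[float]:
    # Display-time delta minus animation-time delta, per frame pair.
    return [d - a for d, a in zip(deltas(display_ms), deltas(anim_ms))]


# Hypothetical alternate-frame-rendering trace: frames are issued in pairs,
# so animation timestamps and GPU completion times arrive bunched together.
anim = [0.0, 3.0, 33.0, 36.0, 66.0, 69.0]
ready = [40.0, 43.0, 73.0, 76.0, 106.0, 109.0]

raw = ready                   # show each frame as soon as it finishes
metered = meter(ready, 16.5)  # pace at half the 33 ms pair period

print(deltas(raw), anim_error(anim, raw))
# [3.0, 30.0, 3.0, 30.0, 3.0] [0.0, 0.0, 0.0, 0.0, 0.0]
print(deltas(metered), anim_error(anim, metered))
# [16.5, 16.5, 16.5, 16.5, 16.5] [13.5, -13.5, 13.5, -13.5, 13.5]
```

Without pacing, frame times alternate between 3 ms and 30 ms (micro stutter), but each frame's display interval matches its animation step. With pacing, frame times look perfectly even, and the mismatch moves into animation error instead.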

To help visualize animation error, the video presents several charting methods, including scatter plots and bar charts, and explains their strengths and weaknesses. It emphasizes that while bar charts are easier to read, they can obscure important spikes and reintroduce frame rate as a confounding variable. The video also introduces a percentage-based metric that normalizes animation error against total frame time, allowing fairer comparisons across different hardware setups. Real-world examples from games such as Far Cry 5, Dragon's Dogma 2, and Borderlands 2 illustrate how animation error provides insight into game performance and player experience that was previously difficult to quantify.
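The summary statistics discussed here can be sketched as follows. This assumes the percentage metric is total absolute animation error divided by total frame time. The video's exact formula may differ, and the function name and sample runs are hypothetical.

```python
from typing import Dict, List


def anim_error_report(frame_times_ms: List[float], anim_errors_ms: List[float]) -> Dict[str, float]:
    """Summaries of per-frame animation error (hypothetical helper).

    Assumption: percent_of_frame_time = 100 * sum(|error|) / sum(frame time).
    """
    abs_err = [abs(e) for e in anim_errors_ms]
    return {
        "mean_abs_ms": sum(abs_err) / len(abs_err),  # roughly what one bar would show
        "max_abs_ms": max(abs_err),                  # the spike a scatter plot keeps
        "percent_of_frame_time": 100.0 * sum(abs_err) / sum(frame_times_ms),
    }


# Hypothetical run: 100 frames at 10 ms, 99 with 0.5 ms error and one 25 ms hitch.
slow = anim_error_report([10.0] * 100, [0.5] * 99 + [25.0])

# Hypothetical faster run: twice the frame rate, with every error halved.
fast = anim_error_report([5.0] * 100, [0.25] * 99 + [12.5])

print(slow)
print(fast)
```

In the slower run, the mean (0.745 ms) looks negligible next to the 25 ms hitch, which reflects the readability-versus-spikes trade-off described for bar charts. The percentage is 7.45% for both runs, while the mean in milliseconds halves for the faster run. This shows how normalizing by frame time keeps frame rate from dominating the comparison.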

Finally, the video encourages the gaming and review community to adopt animation error as a complementary metric alongside existing benchmarks. It stresses that animation error does not replace frame rate or frame time analysis. Instead, it helps explain why games sometimes feel stuttery even when traditional metrics look fine. Because PresentMon is open source and increasingly integrated into popular tools, reviewers and enthusiasts can readily experiment with the metric. The video concludes by acknowledging the foundational work of Tom Petersen and others and urging the industry to embrace animation error to improve the accuracy and relevance of game performance testing.