Wednesday, January 11, 2017

Tool errors: AMD, FCAT, and Ashes of the Singularity benchmarks



Yesterday, we reported on Ashes of the Singularity performance in DirectX 12 and how it gives AMD a considerable advantage over Nvidia. There’s a report making the rounds from Guru3D that shows AMD’s FCAT results compared with Nvidia’s. The resulting frame time plot makes AMD look terrible, but these results aren’t accurate. The output looks the way it does because there’s a mismatch between what FCAT expects and how AMD’s driver actually performs image compositing. This creates the distinct impression (seen below) of poor performance on AMD GPUs.

First, a few basics. FCAT is a tool Nvidia pioneered that can be used to record, play back, and analyze the output that a game sends to the display. It captures a game at a different point than FRAPS does, and it offers fine-grained analysis of the entire captured session. Guru3D argues that FCAT’s results are intrinsically correct because “where we measure with FCAT is definitive though, it’s what your eyes will see and observe.” Guru3D is wrong. FCAT records output data, but its analysis of that data is based on assumptions it makes about the output, and those assumptions don’t reflect what users experience in this case.

AMD’s driver follows Microsoft’s recommendations for DX12 and composites using the Desktop Window Manager (DWM) to increase smoothness and reduce tearing. FCAT, in contrast, assumes that the GPU is using DirectFlip.

According to Oxide, the problem is that FCAT assumes so-called intermediate frames make it into the data stream and depends on these frames for its data analysis. If V-Sync is implemented differently than FCAT expects, the FCAT tools cannot properly analyze the final output. The application’s accuracy is only as reliable as its assumptions, after all.

An Oxide representative told us that the only real negative from AMD’s switch to DWM compositing from DirectFlip “[I]s that it throws off FCAT.”

In this case, AMD is using Microsoft’s recommended compositing method, not the method that FCAT expects, and the result is an FCAT graph that makes AMD’s performance look terrible. It isn’t. From an end user’s perspective, compositing through DWM eliminates tearing in windowed mode and can reduce it in fullscreen mode as well when V-Sync is disabled.

When we approached Oxide about this problem, the company provided us with an Event Tracing for Windows (ETW) trace of Ashes of the Singularity running on an AMD Radeon R9 390X.

The top row of the yellow data line shows when data was presented to the back buffer. There’s some slight variation, to be sure, but not the crazy up-and-down pattern FCAT is showing. Oxide recommends using ETW for performance analysis of frame smoothness, since the times it provides are accurate to within 100 microseconds (0.1 ms).
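To make the idea of "slight variation" versus an "up-and-down pattern" concrete, here is a minimal Python sketch that turns a list of present timestamps into frame-to-frame intervals and summarizes how much they vary. It assumes the timestamps have already been exported from a trace as milliseconds; the function names and the sample numbers are hypothetical and are not taken from Oxide's trace or from any ETW API.

```python
from statistics import mean, pstdev


def frame_intervals(present_times_ms):
    """Return the gaps between consecutive present timestamps, in ms."""
    return [b - a for a, b in zip(present_times_ms, present_times_ms[1:])]


def smoothness_summary(present_times_ms):
    """Summarize how evenly spaced the presents are."""
    intervals = frame_intervals(present_times_ms)
    avg = mean(intervals)
    return {
        "mean_ms": avg,
        "stdev_ms": pstdev(intervals),
        # Largest distance of any single interval from the average.
        "worst_deviation_ms": max(abs(i - avg) for i in intervals),
    }


# Hypothetical data: one evenly paced run and one that alternates
# between short and long gaps (a sawtooth frame time plot).
steady = [0.0, 16.5, 33.4, 50.0, 66.8, 83.3]
sawtooth = [0.0, 5.0, 33.0, 38.0, 66.0, 71.0]

if __name__ == "__main__":
    print("steady:  ", smoothness_summary(steady))
    print("sawtooth:", smoothness_summary(sawtooth))
```

Both runs cover a similar span of time, so their average frame times are close. The standard deviation and worst-case deviation are what separate a smooth run from a sawtooth one.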

According to Oxide, Microsoft is making a big push in Windows 10 to make the operating system cooperative, with an emphasis on smooth image presentation (which is why the AMD driver composites using DWM as opposed to DirectFlip). DirectFlip also isn’t as power-efficient as DWM. All of these issues, however, make it more difficult to profile applications.

FCAT is an extremely useful and powerful tool, but it’s not perfect. In his initial coverage of FCAT several years ago, Scott Wasson, who pioneered the use of “inside the second” methods of analyzing GPU output, wrote the following:

There’s a pretty widespread assumption at other sites that FCAT data is “better” since it comes from later in the frame production process, and some folks like to say Fraps is less “accurate” as a result. I dispute those notions. Fraps and FCAT are both accurate for what they measure; they just measure different points in the frame production process.

It’s quite possible that Fraps data is a better indication of animation smoothness than FCAT data. For instance, a smooth line in an FCAT frame time distribution wouldn’t lead to smooth animation if the game engine’s internal simulation timing doesn’t match well with how frames are being delivered to the display. The simulation’s timing determines the *content* of the frames being produced, and you must match the sim timing to the display timing to produce optimally fluid animation. Even “perfect” delivery of the frames to the display will look awful if the visual information in those frames is out of sync.
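Wasson’s point about simulation timing can be illustrated with a small Python sketch. It models an object moving at constant speed, where each frame’s content is fixed by the simulation timestamp while the frame is shown at the display timestamp. The function name, the speed parameter, and the sample timestamps are all hypothetical and chosen only for illustration.

```python
def apparent_speeds(sim_times_ms, display_times_ms, speed_px_per_ms=1.0):
    """Speed the viewer perceives between consecutive displayed frames.

    The object's position in each frame comes from the simulation time,
    but the viewer sees that position at the display time.
    """
    positions = [t * speed_px_per_ms for t in sim_times_ms]
    speeds = []
    for i in range(1, len(positions)):
        moved = positions[i] - positions[i - 1]
        shown_for = display_times_ms[i] - display_times_ms[i - 1]
        speeds.append(moved / shown_for)
    return speeds


# Frames reach the display at perfectly even 16 ms intervals in both cases.
display = [0.0, 16.0, 32.0, 48.0]

matched_sim = [0.0, 16.0, 32.0, 48.0]  # sim steps match display steps
uneven_sim = [0.0, 8.0, 32.0, 40.0]    # sim steps alternate short and long

if __name__ == "__main__":
    print(apparent_speeds(matched_sim, display))  # constant apparent speed
    print(apparent_speeds(uneven_sim, display))   # speed lurches frame to frame
```

In the second case the display-side frame times are flawless, yet the object appears to slow down and speed up, because the content of each frame was simulated at uneven intervals.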
