- All Implemented Interfaces:
- Optimizer
- Enclosing class:
- CuDNNFunctionOptimizations
public static class CuDNNFunctionOptimizations.CudnnConv2dNCHWtoNHWCConversion
extends Object
implements Optimizer
For Tensor Cores, the NHWC layout is preferred. From Section 7.3.1 of the NVIDIA Deep Learning Performance Guide (https://docs.nvidia.com/deeplearning/sdk/dl-performance-guide/index.html#tensor-layout):
"Layout choice has an effect on performance, as convolutions implemented for Tensor Cores require NHWC layout and are fastest when input tensors are laid out in NHWC."
"To maximize performance, we recommend using NHWC tensor layout."
As for the weights format: the cuDNN docs are vague, but TF pairs NCHW activations with OIHW weights, or NHWC activations with OHWI weights.
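To make the layout change concrete, here is a minimal sketch of the index arithmetic involved, assuming tensors stored as flat row-major float arrays. The class and method names (LayoutPermuteSketch, channelsFirstToLast) are hypothetical and are not part of this optimizer or of cuDNN; the real conversion presumably operates on graph variables rather than raw buffers. Moving the second axis to the end is the same permutation for activations (NCHW to NHWC) and for weights (OIHW to OHWI), so one helper covers both.

```java
// Hypothetical illustration only; not the implementation used by this class.
final class LayoutPermuteSketch {

    /**
     * Reorders a flat row-major 4-d buffer from shape [d0, d1, d2, d3]
     * to shape [d0, d2, d3, d1], i.e. moves the second axis to the end.
     * NCHW -> NHWC: pass (N, C, H, W). OIHW -> OHWI: pass (O, I, H, W).
     */
    static float[] channelsFirstToLast(float[] src, int d0, int d1, int d2, int d3) {
        if (src.length != d0 * d1 * d2 * d3) {
            throw new IllegalArgumentException("buffer length does not match shape");
        }
        float[] dst = new float[src.length];
        for (int a = 0; a < d0; a++) {
            for (int b = 0; b < d1; b++) {
                for (int c = 0; c < d2; c++) {
                    for (int d = 0; d < d3; d++) {
                        // Source element [a][b][c][d] in the original layout.
                        int srcIdx = ((a * d1 + b) * d2 + c) * d3 + d;
                        // Same element at [a][c][d][b] in the permuted layout.
                        int dstIdx = ((a * d2 + c) * d3 + d) * d1 + b;
                        dst[dstIdx] = src[srcIdx];
                    }
                }
            }
        }
        return dst;
    }
}
```

For example, an activation buffer with N=1, C=2, H=1, W=2 would be converted with channelsFirstToLast(buf, 1, 2, 1, 2), and an OIHW weight buffer with the same call using (O, I, H, W) as the shape arguments.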