Class CuDNNFunctionOptimizations
- java.lang.Object
  - org.nd4j.autodiff.samediff.optimize.optimizations.BaseOptimizerSet
    - org.nd4j.autodiff.samediff.optimize.optimizations.CuDNNFunctionOptimizations

All Implemented Interfaces: OptimizerSet
public class CuDNNFunctionOptimizations extends BaseOptimizerSet
Nested Class Summary
Nested Classes
Modifier and Type: static class
Class: CuDNNFunctionOptimizations.CudnnConv2dNCHWtoNHWCConversion
Description: See https://docs.nvidia.com/deeplearning/sdk/dl-performance-guide/index.html#tensor-layout. For Tensor Cores, NHWC layout is preferred. Section 7.3.1: "Layout choice has an effect on performance, as convolutions implemented for Tensor Cores require NHWC layout and are fastest when input tensors are laid out in NHWC." "To maximize performance, we recommend using NHWC tensor layout." As for the weights format, the cuDNN docs are vague, but TensorFlow uses NCHW+OIHW or NHWC+OHWI.
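To make the layout change concrete, the following is a minimal, dependency-free sketch of what an NCHW to NHWC reordering does to a flat row-major buffer. It is not the code used by CudnnConv2dNCHWtoNHWCConversion; the class name LayoutSketch and the method channelsFirstToLast are hypothetical names chosen for illustration. The same axis permutation (0, 2, 3, 1) also maps OIHW weights to OHWI, so one helper covers both cases mentioned above.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of ND4J or SameDiff.
public class LayoutSketch {

    /**
     * Reorders a flat row-major buffer of shape [d0, c, h, w] into shape [d0, h, w, c].
     * For activations this is NCHW -> NHWC; for weights it is OIHW -> OHWI.
     */
    public static float[] channelsFirstToLast(float[] src, int d0, int c, int h, int w) {
        if (src.length != d0 * c * h * w) {
            throw new IllegalArgumentException("Buffer length does not match shape");
        }
        float[] dst = new float[src.length];
        for (int n = 0; n < d0; n++) {
            for (int ci = 0; ci < c; ci++) {
                for (int hi = 0; hi < h; hi++) {
                    for (int wi = 0; wi < w; wi++) {
                        // Row-major offset in the source layout [d0, c, h, w]
                        int from = ((n * c + ci) * h + hi) * w + wi;
                        // Row-major offset in the target layout [d0, h, w, c]
                        int to = ((n * h + hi) * w + wi) * c + ci;
                        dst[to] = src[from];
                    }
                }
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        // One image, two channels, 2x2 spatial size, stored as NCHW
        float[] nchw = {0, 1, 2, 3, 10, 11, 12, 13};
        float[] nhwc = channelsFirstToLast(nchw, 1, 2, 2, 2);
        // Channel values for each pixel are now adjacent in memory
        System.out.println(Arrays.toString(nhwc));
        // prints [0.0, 10.0, 1.0, 11.0, 2.0, 12.0, 3.0, 13.0]
    }
}
```

In practice a graph-level optimization would presumably rewrite the convolution's data format and permute the inputs through the array library rather than copying element by element; the loop here only shows which element ends up where.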
Field Summary
Fields
Modifier and Type: protected static boolean
Field: isCudaBackend
Constructor Summary
Constructors
Constructor: CuDNNFunctionOptimizations()