Package io.milvus.common.utils
Class Float16Utils
java.lang.Object
io.milvus.common.utils.Float16Utils
-
Constructor Summary
Constructors
Constructor       Description
Float16Utils()
Method Summary
Modifier and Type     Method                                      Description
static List<Float>    bf16BufferToVector(ByteBuffer buf)          Converts a ByteBuffer holding a bf16 vector to a float32 list, upcasting each value.
static float          bf16ToFloat(short input)                    Upcasts a bf16 value stored in a short into a float32 value.
static List<Short>    bufferToF16Vector(ByteBuffer buf)           Converts a ByteBuffer to a fp16/bf16 vector stored in a list of Short.
static ByteBuffer     f16VectorToBuffer(List<Short> vector)       Stores a fp16/bf16 vector into a ByteBuffer.
static ByteBuffer     f32VectorToBf16Buffer(List<Float> vector)   Rounds a float32 vector to bf16 values and stores them in a ByteBuffer.
static ByteBuffer     f32VectorToFp16Buffer(List<Float> vector)   Rounds a float32 vector to fp16 values and stores them in a ByteBuffer.
static short          floatToBf16(float input)                    Converts a float32 into bf16.
static short          floatToFp16(float input)                    Rounds a float32 value to a fp16 stored in a short.
static List<Float>    fp16BufferToVector(ByteBuffer buf)          Converts a ByteBuffer holding a fp16 vector to a float32 list, upcasting each value.
static float          fp16ToFloat(short input)                    Upcasts a fp16 value stored in a short to a float32 value.
-
Constructor Details
-
Float16Utils
public Float16Utils()
-
-
Method Details
-
floatToBf16
public static short floatToBf16(float input)
Converts a float32 into bf16. May not produce correct values for subnormal floats. This method is copied from Microsoft ONNX Runtime: https://github.com/microsoft/onnxruntime/blob/main/java/src/main/jvm/ai/onnxruntime/platform/Fp16Conversions.java
Parameters:
input - a standard float32 value which will be converted to a bfloat16 value
Returns:
a short value storing the bfloat16 value
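To illustrate what a float32-to-bf16 conversion involves, here is a minimal sketch. It is not the code of Float16Utils or ONNX Runtime; the class name Bf16RoundSketch is hypothetical. It assumes that bf16 keeps the upper 16 bits of the float32 bit pattern and that rounding is round-to-nearest-even.

```java
// Hypothetical sketch class, not part of the Milvus SDK.
final class Bf16RoundSketch {
    // Returns the 16 bf16 bits for a float32, rounding to nearest even (assumed).
    static short floatToBf16(float input) {
        if (Float.isNaN(input)) {
            return (short) 0x7FC0; // a quiet NaN pattern in bf16
        }
        int bits = Float.floatToIntBits(input);
        // Adding 0x7FFF plus the lowest kept bit makes exact ties round to even.
        int rounding = 0x7FFF + ((bits >>> 16) & 1);
        return (short) ((bits + rounding) >>> 16);
    }

    public static void main(String[] args) {
        // 1.0f is 0x3F800000 as float32, so its bf16 bits are 0x3F80.
        System.out.printf("0x%04X%n", floatToBf16(1.0f) & 0xFFFF); // prints 0x3F80
    }
}
```

Because bf16 keeps the full 8-bit float32 exponent, only mantissa precision is lost; the value range is unchanged.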
-
bf16ToFloat
public static float bf16ToFloat(short input)
Upcasts a bf16 value stored in a short into a float32 value. This method is copied from Microsoft ONNX Runtime: https://github.com/microsoft/onnxruntime/blob/main/java/src/main/jvm/ai/onnxruntime/platform/Fp16Conversions.java
Parameters:
input - a bfloat16 value which will be converted to a float32 value
Returns:
a float32 value converted from a bfloat16 value
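The upcast is lossless, since every bf16 value is exactly representable as a float32. A minimal sketch under that assumption follows; Bf16UpcastSketch is a hypothetical name and this is not the library's code.

```java
// Hypothetical sketch class, not part of the Milvus SDK.
final class Bf16UpcastSketch {
    // The 16 stored bits become the upper half of the float32 pattern;
    // the lower 16 mantissa bits are filled with zeros.
    static float bf16ToFloat(short input) {
        return Float.intBitsToFloat((input & 0xFFFF) << 16);
    }

    public static void main(String[] args) {
        System.out.println(bf16ToFloat((short) 0x3F80)); // prints 1.0
    }
}
```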
-
floatToFp16
public static short floatToFp16(float input)
Rounds a float32 value to a fp16 stored in a short. This method is copied from Microsoft ONNX Runtime: https://github.com/microsoft/onnxruntime/blob/main/java/src/main/jvm/ai/onnxruntime/platform/Fp16Conversions.java
Parameters:
input - a standard float32 value which will be converted to a float16 value
Returns:
a short value storing the float16 value
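Rounding to fp16 is more involved than the bf16 case because the exponent shrinks from 8 to 5 bits. The following sketch shows one way to do it; it is not the code of Float16Utils or ONNX Runtime. The class name Fp16RoundSketch is hypothetical, and the sketch assumes the IEEE 754 binary16 layout (1 sign, 5 exponent, 10 mantissa bits) with round-to-nearest-even.

```java
// Hypothetical sketch class, not part of the Milvus SDK.
final class Fp16RoundSketch {
    static short floatToFp16(float input) {
        int bits = Float.floatToIntBits(input);
        int sign = (bits >>> 16) & 0x8000;
        int abs = bits & 0x7FFFFFFF;

        if (abs > 0x7F800000) {
            return (short) (sign | 0x7E00); // NaN
        }
        if (abs >= 0x477FF000) {
            return (short) (sign | 0x7C00); // 65520 and above round to infinity
        }
        if (abs <= 0x33000000) {
            return (short) sign; // 2^-25 and below round to signed zero
        }

        int half;
        int remainder;
        int halfway;
        if (abs >= 0x38800000) {
            // Normal fp16 range: rebias the exponent from 127 to 15, drop 13 mantissa bits.
            int mantissa = abs & 0x7FFFFF;
            half = (((abs >>> 23) - 112) << 10) | (mantissa >>> 13);
            remainder = mantissa & 0x1FFF;
            halfway = 0x1000;
        } else {
            // Subnormal fp16 range: shift the mantissa (with its implicit leading 1)
            // so that one unit equals 2^-24.
            int mantissa = (abs & 0x7FFFFF) | 0x800000;
            int shift = 126 - (abs >>> 23);
            half = mantissa >>> shift;
            remainder = mantissa & ((1 << shift) - 1);
            halfway = 1 << (shift - 1);
        }
        // Round to nearest; exact ties go to the even result.
        if (remainder > halfway || (remainder == halfway && (half & 1) == 1)) {
            half++; // a carry out of the mantissa correctly bumps the exponent
        }
        return (short) (sign | half);
    }

    public static void main(String[] args) {
        System.out.printf("0x%04X%n", floatToFp16(1.0f) & 0xFFFF); // prints 0x3C00
    }
}
```

Values above the fp16 maximum of 65504 saturate to infinity, so inputs with a wide dynamic range may be better served by bf16.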
-
fp16ToFloat
public static float fp16ToFloat(short input)
Upcasts a fp16 value stored in a short to a float32 value. This method is copied from Microsoft ONNX Runtime: https://github.com/microsoft/onnxruntime/blob/main/java/src/main/jvm/ai/onnxruntime/platform/Fp16Conversions.java
Parameters:
input - a float16 value which will be converted to a float32 value
Returns:
a float32 value converted from a float16 value
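Decoding fp16 is exact, because every fp16 value fits in a float32. A minimal sketch of the decoding step follows, again assuming the IEEE 754 binary16 layout. Fp16UpcastSketch is a hypothetical name and this is not the library's code.

```java
// Hypothetical sketch class, not part of the Milvus SDK.
final class Fp16UpcastSketch {
    static float fp16ToFloat(short input) {
        int h = input & 0xFFFF;
        int sign = (h & 0x8000) << 16;
        int exponent = (h >>> 10) & 0x1F;
        int mantissa = h & 0x3FF;

        if (exponent == 0x1F) {
            // Infinity (mantissa == 0) or NaN (mantissa != 0).
            return Float.intBitsToFloat(sign | 0x7F800000 | (mantissa << 13));
        }
        if (exponent == 0) {
            // Zero or subnormal: the value is mantissa * 2^-24.
            float magnitude = mantissa * 0x1p-24f;
            return (sign != 0) ? -magnitude : magnitude;
        }
        // Normal value: rebias the exponent from 15 to 127 and widen the mantissa.
        return Float.intBitsToFloat(sign | ((exponent + 112) << 23) | (mantissa << 13));
    }

    public static void main(String[] args) {
        System.out.println(fp16ToFloat((short) 0x3C00)); // prints 1.0
    }
}
```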
-
f32VectorToBf16Buffer
public static ByteBuffer f32VectorToBf16Buffer(List<Float> vector)
Rounds a float32 vector to bf16 values and stores them in a ByteBuffer.
Parameters:
vector - a float32 vector
Returns:
ByteBuffer - the vector converted to bfloat16 values and stored in a ByteBuffer
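The packing step can be sketched as follows. This is an illustration only, not the library's code: Bf16PackSketch and its methods are hypothetical names, and both the little-endian byte order and the rewind before returning are assumptions of the sketch. Check the byte order your server expects before relying on it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch class, not part of the Milvus SDK.
final class Bf16PackSketch {
    // Round-to-nearest-even float32 to bf16 (NaN handling omitted for brevity).
    static short toBf16(float value) {
        int bits = Float.floatToIntBits(value);
        return (short) ((bits + 0x7FFF + ((bits >>> 16) & 1)) >>> 16);
    }

    // Writes two bytes per element; little-endian order is an assumption.
    static ByteBuffer pack(List<Float> vector) {
        ByteBuffer buf = ByteBuffer.allocate(2 * vector.size());
        buf.order(ByteOrder.LITTLE_ENDIAN);
        for (float value : vector) {
            buf.putShort(toBf16(value));
        }
        buf.rewind(); // reset the position so callers can read from the start
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer buf = pack(Arrays.asList(1.0f, -2.0f));
        System.out.println(buf.remaining()); // prints 4
    }
}
```

A vector of dimension d therefore occupies 2 * d bytes, half the size of its float32 form.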
-
fp16BufferToVector
public static List<Float> fp16BufferToVector(ByteBuffer buf)
Converts a ByteBuffer holding a fp16 vector to a float32 list, upcasting each value.
Parameters:
buf - a buffer storing a float16 vector
Returns:
List of Float - a float32 vector
-
f32VectorToFp16Buffer
public static ByteBuffer f32VectorToFp16Buffer(List<Float> vector)
Rounds a float32 vector to fp16 values and stores them in a ByteBuffer.
Parameters:
vector - a float32 vector
Returns:
ByteBuffer - the vector converted to float16 values and stored in a ByteBuffer
-
bf16BufferToVector
public static List<Float> bf16BufferToVector(ByteBuffer buf)
Converts a ByteBuffer holding a bf16 vector to a float32 list, upcasting each value.
Parameters:
buf - a buffer storing a bfloat16 vector
Returns:
List of Float - the vector converted to float32 values
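Reading a bf16 buffer back into float32 values can be sketched as below. Bf16UnpackSketch is a hypothetical name, the little-endian byte order is an assumption, and this is not the library's code. The sketch reads through a duplicate so that the caller's buffer position is left untouched.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch class, not part of the Milvus SDK.
final class Bf16UnpackSketch {
    static List<Float> unpack(ByteBuffer buf) {
        // duplicate() shares the bytes but has its own position and byte order.
        ByteBuffer view = buf.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        view.rewind();
        List<Float> out = new ArrayList<>(view.remaining() / 2);
        while (view.remaining() >= 2) {
            // Each 16-bit value is the upper half of a float32 bit pattern.
            out.add(Float.intBitsToFloat((view.getShort() & 0xFFFF) << 16));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0x80, 0x3F, 0x00, (byte) 0xC0};
        System.out.println(unpack(ByteBuffer.wrap(raw))); // prints [1.0, -2.0]
    }
}
```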
-
f16VectorToBuffer
public static ByteBuffer f16VectorToBuffer(List<Short> vector)
Stores a fp16/bf16 vector into a ByteBuffer.
Parameters:
vector - a float16 vector stored in a list of Short
Returns:
ByteBuffer - a buffer storing the float16 vector
-
bufferToF16Vector
public static List<Short> bufferToF16Vector(ByteBuffer buf)
Converts a ByteBuffer to a fp16/bf16 vector stored in a list of Short.
Parameters:
buf - a buffer storing a float16 vector
Returns:
List of Short - the vector converted to a list of Short; each Short holds one float16 value
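When the 16-bit values are already encoded, no numeric conversion is needed; the raw bits are simply copied in and out of the buffer. The round trip can be sketched as follows. F16RawBufferSketch and its method names are hypothetical, the little-endian byte order is an assumption, and this is not the library's code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch class, not part of the Milvus SDK.
final class F16RawBufferSketch {
    // Copies already-encoded fp16/bf16 bits into a buffer, two bytes each.
    static ByteBuffer toBuffer(List<Short> vector) {
        ByteBuffer buf = ByteBuffer.allocate(2 * vector.size());
        buf.order(ByteOrder.LITTLE_ENDIAN);
        for (short bits : vector) {
            buf.putShort(bits);
        }
        buf.rewind();
        return buf;
    }

    // Reads the raw 16-bit values back without interpreting them.
    static List<Short> toShorts(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        view.rewind();
        List<Short> out = new ArrayList<>(view.remaining() / 2);
        while (view.remaining() >= 2) {
            out.add(view.getShort());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Short> raw = Arrays.asList((short) 0x3C00, (short) 0xC000);
        System.out.println(toShorts(toBuffer(raw)).equals(raw)); // prints true
    }
}
```

The buffer itself does not record whether it holds fp16 or bf16 bits, so the caller has to track which encoding applies.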
-