| Interface | Description |
|---|---|
| GpuAssignmentStrategy | A strategy for finding the next available GPU ID. |

| Class | Description |
|---|---|
| Job | A class that represents an inference job. |
| ModelInfo | A class that represents a loaded model and its metadata. |
| ModelManager | A class in charge of managing models. |
| PermanentBatchAggregator | A batch aggregator that never terminates by itself. |
| RoundRobinGpuAssignmentStrategy | Assigns the next GPU ID using round robin. |
| TemporaryBatchAggregator | A batch aggregator that terminates after a maximum idle time. |
| WorkerIdGenerator | A class that generates a unique worker ID. |

| Enum | Description |
|---|---|
| WorkerState | An enum that represents the state of a worker. |

| Exception | Description |
|---|---|
| ScaleCapacityExceededException | Thrown when the worker capacity is reached during autoscaling. |
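
The round-robin assignment described for `RoundRobinGpuAssignmentStrategy` can be illustrated with a minimal sketch. The names `GpuPicker`, `RoundRobinGpuPicker`, and `nextGpuId` are hypothetical and chosen only for this example; they are not the package's actual interface, whose method signatures are not shown on this page. The sketch also assumes GPU IDs are the integers `0` to `gpuCount - 1`.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a GPU assignment strategy (illustrative only). */
interface GpuPicker {
    /** Returns the GPU ID to use for the next worker. */
    int nextGpuId();
}

/** Cycles through GPU IDs 0..gpuCount-1 in order, wrapping around at the end. */
final class RoundRobinGpuPicker implements GpuPicker {
    private final int gpuCount;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinGpuPicker(int gpuCount) {
        if (gpuCount <= 0) {
            throw new IllegalArgumentException("gpuCount must be positive");
        }
        this.gpuCount = gpuCount;
    }

    @Override
    public int nextGpuId() {
        // getAndIncrement is atomic, so concurrent callers get distinct counter values.
        // floorMod keeps the result non-negative even after the counter overflows.
        return Math.floorMod(counter.getAndIncrement(), gpuCount);
    }
}
```

With three GPUs, successive calls to `nextGpuId()` yield 0, 1, 2 and then wrap back to 0. An atomic counter is used here so the picker stays correct if several threads request a GPU at once; a plain `int` field would be enough for single-threaded use.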