String queryId
The unique identifier of the query to be canceled.
String status
The status of the cancellation.
String loadId
The ID of the load job to be deleted.
String status
The cancellation status.
String id
The unique identifier of the data-processing job.
String neptuneIamRoleArn
The ARN of an IAM role that provides Neptune access to SageMaker and Amazon S3 resources. This must be listed in your DB cluster parameter group or an error will occur.
Boolean clean
If set to TRUE, this flag specifies that all Neptune ML S3 artifacts should be deleted when the job
is stopped. The default is FALSE.
String status
The status of the cancellation request.
String id
The unique identifier of the model-training job to be canceled.
String neptuneIamRoleArn
The ARN of an IAM role that provides Neptune access to SageMaker and Amazon S3 resources. This must be listed in your DB cluster parameter group or an error will occur.
Boolean clean
If set to TRUE, this flag specifies that all Amazon S3 artifacts should be deleted when the job is
stopped. The default is FALSE.
String status
The status of the cancellation.
String id
The unique ID of the model transform job to be canceled.
String neptuneIamRoleArn
The ARN of an IAM role that provides Neptune access to SageMaker and Amazon S3 resources. This must be listed in your DB cluster parameter group or an error will occur.
Boolean clean
If this flag is set to TRUE, all Neptune ML S3 artifacts should be deleted when the job is stopped.
The default is FALSE.
String status
The status of the cancellation.
String id
A unique identifier for the new inference endpoint. The default is an autogenerated timestamped name.
String mlModelTrainingJobId
The job ID of the completed model-training job that created the model that the inference endpoint will point
to. You must supply either the mlModelTrainingJobId or the mlModelTransformJobId.
String mlModelTransformJobId
The job ID of the completed model-transform job. You must supply either the mlModelTrainingJobId or
the mlModelTransformJobId.
Boolean update
If set to true, update indicates that this is an update request. The default is
false. You must supply either the mlModelTrainingJobId or the
mlModelTransformJobId.
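The rule that an endpoint request must name a model-training job or a model-transform job can be checked on the client before a request is sent. The following is a minimal sketch; the class EndpointModelSource and its resolve method are hypothetical and are not part of the Neptune API or the SDK. Rejecting a request that supplies both IDs is an assumption of this sketch, since the description only says that one of them must be supplied.

```java
/**
 * Hypothetical client-side check for the endpoint-creation rule:
 * a request must reference either a model-training job or a model-transform job.
 * Rejecting requests that set BOTH IDs is an assumption of this sketch.
 */
final class EndpointModelSource {
    private EndpointModelSource() {}

    /** Returns the job ID the endpoint would be built from, or throws if the rule is violated. */
    static String resolve(String mlModelTrainingJobId, String mlModelTransformJobId) {
        boolean hasTraining = mlModelTrainingJobId != null && !mlModelTrainingJobId.isEmpty();
        boolean hasTransform = mlModelTransformJobId != null && !mlModelTransformJobId.isEmpty();
        if (hasTraining == hasTransform) {
            // Either neither ID or both IDs were supplied.
            throw new IllegalArgumentException(
                "Supply exactly one of mlModelTrainingJobId or mlModelTransformJobId");
        }
        return hasTraining ? mlModelTrainingJobId : mlModelTransformJobId;
    }
}
```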
String neptuneIamRoleArn
The ARN of an IAM role providing Neptune access to SageMaker and Amazon S3 resources. This must be listed in your DB cluster parameter group or an error will be thrown.
String modelName
Model type for training. By default the Neptune ML model is automatically based on the modelType
used in data processing, but you can specify a different model type here. The default is rgcn for
heterogeneous graphs and kge for knowledge graphs. The only valid value for heterogeneous graphs is
rgcn. Valid values for knowledge graphs are: kge, transe,
distmult, and rotate.
String instanceType
The type of Neptune ML instance to use for online servicing. The default is ml.m5.xlarge. Choosing
the ML instance for an inference endpoint depends on the task type, the graph size, and your budget.
Integer instanceCount
The minimum number of Amazon EC2 instances to deploy to an endpoint for prediction. The default is 1.
String volumeEncryptionKMSKey
The Amazon Key Management Service (Amazon KMS) key that SageMaker uses to encrypt data on the storage volume attached to the ML compute instances that run the training job. The default is None.
String sourceS3DirectoryPath
The path to the Amazon S3 location where the Python module implementing your model is located. This must point to
a valid existing Amazon S3 location that contains, at a minimum, a training script, a transform script, and a
model-hpo-configuration.json file.
String trainingEntryPointScript
The name of the entry-point script in your module that performs model training and takes hyperparameters as
command-line arguments, including fixed hyperparameters. The default is training.py.
String transformEntryPointScript
The name of the entry-point script in your module that should be run after the best model from the
hyperparameter search has been identified, to compute the model artifacts necessary for model deployment. It
should be able to run with no command-line arguments. The default is transform.py.
String sourceS3DirectoryPath
The path to the Amazon S3 location where the Python module implementing your model is located. This must point to
a valid existing Amazon S3 location that contains, at a minimum, a training script, a transform script, and a
model-hpo-configuration.json file.
String transformEntryPointScript
The name of the entry-point script in your module that should be run after the best model from the
hyperparameter search has been identified, to compute the model artifacts necessary for model deployment. It
should be able to run with no command-line arguments. The default is transform.py.
String id
The unique identifier of the inference endpoint.
String neptuneIamRoleArn
The ARN of an IAM role providing Neptune access to SageMaker and Amazon S3 resources. This must be listed in your DB cluster parameter group or an error will be thrown.
Boolean clean
If this flag is set to TRUE, all Neptune ML S3 artifacts should be deleted when the job is stopped.
The default is FALSE.
String status
The status of the cancellation.
Integer statusCode
The HTTP response code: 200 if the delete was successful, or 204 if there were no statistics to delete.
String status
The cancel status.
DeleteStatisticsValueMap payload
The deletion payload.
Integer statusCode
The HTTP response code: 200 if the delete was successful, or 204 if there were no statistics to delete.
String status
The cancel status.
DeleteStatisticsValueMap payload
The deletion payload.
String action
The fast reset action. One of the following values:
initiateDatabaseReset – This action generates a unique token needed to actually perform
the fast reset.
performDatabaseReset – This action uses the token generated by the
initiateDatabaseReset action to actually perform the fast reset.
String token
The fast-reset token to initiate the reset.
String status
The status is only returned for the performDatabaseReset action, and indicates whether
or not the fast reset request is accepted.
FastResetToken payload
The payload is only returned by the initiateDatabaseReset action, and contains the
unique token to use with the performDatabaseReset action to make the reset occur.
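The two-step fast-reset contract (initiateDatabaseReset returns a token, performDatabaseReset consumes it) can be illustrated with a small in-memory model. This is a sketch only: FastResetSimulator and its "accepted" and "rejected" return values are hypothetical, the class does not call Neptune, and treating a token as single-use is an assumption.

```java
import java.util.UUID;

/**
 * Hypothetical in-memory model of the two-step fast-reset handshake.
 * It does NOT call Neptune; it only mirrors the documented contract:
 * initiateDatabaseReset issues a token, and performDatabaseReset
 * succeeds only when given that token.
 */
final class FastResetSimulator {
    private String pendingToken; // token issued by the most recent initiate call

    /** Step 1: issue a fresh UUID token (the payload of initiateDatabaseReset). */
    String initiateDatabaseReset() {
        pendingToken = UUID.randomUUID().toString();
        return pendingToken;
    }

    /** Step 2: accept the reset only for the token issued in step 1. */
    String performDatabaseReset(String token) {
        if (pendingToken != null && pendingToken.equals(token)) {
            pendingToken = null; // assumption: a token can be consumed only once
            return "accepted";
        }
        return "rejected";
    }
}
```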
String gremlinQuery
The Gremlin explain query string.
ByteBuffer output
A text blob containing the Gremlin explain result, as described in Tuning Gremlin queries.
String gremlinQuery
The Gremlin query string to profile.
Boolean results
If this flag is set to TRUE, the query results are gathered and displayed as part of the profile
report. If FALSE, only the result count is displayed.
Integer chop
If non-zero, causes the results string to be truncated at that number of characters. If set to zero, the string contains all the results.
String serializer
If non-null, the gathered results are returned in a serialized response message in the format specified by this parameter. See Gremlin profile API in Neptune for more information.
Boolean indexOps
If this flag is set to TRUE, the results include a detailed report of all index operations that took
place during query execution and serialization.
ByteBuffer output
A text blob containing the Gremlin Profile result. See Gremlin profile API in Neptune for details.
String gremlinQuery
Using this API, you can run Gremlin queries in string format much as you can using the HTTP endpoint. The interface is compatible with whatever Gremlin version your DB cluster is using (see the TinkerPop client section to determine which Gremlin releases your engine version supports).
String serializer
If non-null, the query results are returned in a serialized response message in the format specified by this parameter. See the GraphSON section in the TinkerPop documentation for a list of the formats that are currently supported.
String requestId
The unique identifier of the Gremlin query.
GremlinQueryStatusAttributes status
The status of the Gremlin query.
ByteBuffer results
A text blob containing the openCypher explain results.
String token
A UUID generated by the database in the initiateDatabaseReset action, and then consumed by the
performDatabaseReset action to reset the database.
String status
Set to healthy if the instance is not experiencing problems. If the instance is recovering from a
crash or from being rebooted and there are active transactions running from the latest server shutdown, status is
set to recovery.
String startTime
Set to the UTC time at which the current server process started.
String dbEngineVersion
Set to the Neptune engine version running on your DB cluster. If this engine version has been manually patched
since it was released, the version number is prefixed by Patch-.
String role
Set to reader if the instance is a read-replica, or to writer if the instance is the
primary instance.
String dfeQueryEngine
Set to enabled if the DFE engine is fully enabled, or to viaQueryHint (the default) if
the DFE engine is only used with queries that have the useDFE query hint set to true.
QueryLanguageVersion gremlin
Contains information about the Gremlin query language available on your cluster. Specifically, it contains a version field that specifies the current TinkerPop version being used by the engine.
QueryLanguageVersion sparql
Contains information about the SPARQL query language available on your cluster. Specifically, it contains a version field that specifies the current SPARQL version being used by the engine.
QueryLanguageVersion opencypher
Contains information about the openCypher query language available on your cluster. Specifically, it contains a version field that specifies the current openCypher version being used by the engine.
Map<K,V> labMode
Contains Lab Mode settings being used by the engine.
Integer rollingBackTrxCount
If there are transactions being rolled back, this field is set to the number of such transactions. If there are none, the field doesn't appear at all.
String rollingBackTrxEarliestStartTime
Set to the start time of the earliest transaction being rolled back. If no transactions are being rolled back, the field doesn't appear at all.
Map<K,V> settings
Contains information about the current settings on your DB cluster. For example, contains the current cluster
query timeout setting (clusterQueryTimeoutInMs).
String queryId
The unique identifier of the Gremlin query.
String queryId
The ID of the query for which status is being returned.
String queryString
The Gremlin query string.
QueryEvalStats queryEvalStats
The evaluation status of the Gremlin query.
String status
Status of the data processing job.
String id
The unique identifier of this data-processing job.
MlResourceDefinition processingJob
Definition of the data processing job.
String status
The status of the inference endpoint.
String id
The unique identifier of the inference endpoint.
MlResourceDefinition endpoint
The endpoint definition.
MlConfigDefinition endpointConfig
The endpoint configuration.
String status
The status of the model training job.
String id
The unique identifier of this model-training job.
MlResourceDefinition processingJob
The data processing job.
MlResourceDefinition hpoJob
The HPO job.
MlResourceDefinition modelTransformJob
The model transform job.
List<E> mlModels
A list of the configurations of the ML models being used.
String status
The status of the model-transform job.
String id
The unique identifier of the model-transform job to be retrieved.
MlResourceDefinition baseProcessingJob
The base data processing job.
MlResourceDefinition remoteModelTransformJob
The remote model transform job.
List<E> models
A list of the configuration information for the models being used.
String queryId
The unique ID of the openCypher query for which to retrieve the query status.
String queryId
The unique ID of the query for which status is being returned.
String queryString
The openCypher query string.
QueryEvalStats queryEvalStats
The openCypher query evaluation status.
String status
The HTTP return code of the request. If the request succeeded, the code is 200. See Common error codes for DFE statistics request for a list of common errors.
Statistics payload
Statistics for property-graph data.
String mode
Mode can take one of two values: BASIC (the default) or DETAILED.
Integer statusCode
The HTTP return code of the request. If the request succeeded, the code is 200.
PropertygraphSummaryValueMap payload
Payload containing the property graph summary response.
String mode
Mode can take one of two values: BASIC (the default) or DETAILED.
Integer statusCode
The HTTP return code of the request. If the request succeeded, the code is 200.
RDFGraphSummaryValueMap payload
Payload for an RDF graph summary response.
String status
The HTTP return code of the request. If the request succeeded, the code is 200. See Common error codes for DFE statistics request for a list of common errors.
When invoking this operation in a Neptune cluster that has IAM authentication enabled, the IAM user or role making the request must have a policy attached that allows the neptune-db:GetStatisticsStatus IAM action in that cluster.
Statistics payload
Statistics for RDF data.
Long limit
Specifies the maximum number of records to return. There is also a size limit of 10 MB on the response that can't
be modified and that takes precedence over the number of records specified in the limit parameter.
The response does include a threshold-breaching record if the 10 MB limit was reached.
The range for limit is 1 to 100,000, with a default of 10.
String iteratorType
Can be one of:
AT_SEQUENCE_NUMBER – Indicates that reading should start from the event sequence number
specified jointly by the commitNum and opNum parameters.
AFTER_SEQUENCE_NUMBER – Indicates that reading should start right after the event sequence
number specified jointly by the commitNum and opNum parameters.
TRIM_HORIZON – Indicates that reading should start at the last untrimmed record in the system,
which is the oldest unexpired (not yet deleted) record in the change-log stream.
LATEST – Indicates that reading should start at the most recent record in the system, which is
the latest unexpired (not yet deleted) record in the change-log stream.
Long commitNum
The commit number of the starting record to read from the change-log stream. This parameter is required when
iteratorType is AT_SEQUENCE_NUMBER or AFTER_SEQUENCE_NUMBER, and ignored
when iteratorType is TRIM_HORIZON or LATEST.
Long opNum
The operation sequence number within the specified commit to start reading from in the change-log stream data.
The default is 1.
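The constraints on limit, iteratorType, commitNum, and opNum can be checked on the client before reading the change-log stream. The sketch below uses a hypothetical helper, StreamReadParams.build, that returns the parameters as a string map; it does not send a request, cannot enforce the server-side 10 MB response cap, and omitting opNum for TRIM_HORIZON and LATEST is a choice of this sketch rather than documented behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper that assembles change-log stream read parameters and
 * applies the documented rules client-side:
 *  - limit must be 1..100,000 (default 10);
 *  - commitNum is required for AT_SEQUENCE_NUMBER / AFTER_SEQUENCE_NUMBER
 *    and ignored for TRIM_HORIZON / LATEST;
 *  - opNum defaults to 1.
 */
final class StreamReadParams {
    private StreamReadParams() {}

    static Map<String, String> build(String iteratorType, Long commitNum, Long opNum, Long limit) {
        long effectiveLimit = (limit == null) ? 10L : limit;
        if (effectiveLimit < 1 || effectiveLimit > 100_000) {
            throw new IllegalArgumentException("limit must be between 1 and 100,000");
        }
        Map<String, String> params = new LinkedHashMap<>();
        params.put("iteratorType", iteratorType);
        params.put("limit", Long.toString(effectiveLimit));
        switch (iteratorType) {
            case "AT_SEQUENCE_NUMBER":
            case "AFTER_SEQUENCE_NUMBER":
                if (commitNum == null) {
                    throw new IllegalArgumentException("commitNum is required for " + iteratorType);
                }
                params.put("commitNum", Long.toString(commitNum));
                params.put("opNum", Long.toString(opNum == null ? 1L : opNum));
                break;
            case "TRIM_HORIZON":
            case "LATEST":
                // commitNum and opNum are ignored for these iterator types, so this sketch omits them.
                break;
            default:
                throw new IllegalArgumentException("Unknown iteratorType: " + iteratorType);
        }
        return params;
    }
}
```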
String encoding
If set to TRUE, Neptune compresses the response using gzip encoding.
Map<K,V> lastEventId
Sequence identifier of the last change in the stream response.
An event ID is composed of two fields: a commitNum, which identifies a transaction that changed the
graph, and an opNum, which identifies a specific operation within that transaction.
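Because an event ID is the pair (commitNum, opNum), a consumer can order events and keep a checkpoint of the last one it processed, then resume reading with iteratorType set to AFTER_SEQUENCE_NUMBER. A minimal sketch, using a hypothetical EventId class rather than any SDK type:

```java
/**
 * Hypothetical value type for a stream event ID: commitNum identifies the
 * transaction and opNum identifies the operation inside it. Ordering by
 * (commitNum, opNum) lets a consumer checkpoint the last processed event.
 */
final class EventId implements Comparable<EventId> {
    final long commitNum;
    final long opNum;

    EventId(long commitNum, long opNum) {
        this.commitNum = commitNum;
        this.opNum = opNum;
    }

    @Override
    public int compareTo(EventId other) {
        int byCommit = Long.compare(commitNum, other.commitNum);
        return byCommit != 0 ? byCommit : Long.compare(opNum, other.opNum);
    }

    /** Returns the later of two event IDs, useful when advancing a checkpoint. */
    static EventId max(EventId a, EventId b) {
        return a.compareTo(b) >= 0 ? a : b;
    }
}
```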
Long lastTrxTimestampInMillis
The time at which the commit for the transaction was requested, in milliseconds from the Unix epoch.
String format
Serialization format for the change records being returned. Currently, the only supported value is
NQUADS.
List<E> records
An array of serialized change-log stream records included in the response.
Integer totalRecords
The total number of records in the response.
String queryId
The ID of the Gremlin query.
String queryString
The query string of the Gremlin query.
QueryEvalStats queryEvalStats
The query statistics of the Gremlin query.
Boolean includeWaiting
If set to TRUE, the list returned includes waiting queries. The default is FALSE.
Integer limit
The number of load IDs to list. Must be a positive integer greater than zero and not more than 100
(which is the default).
Boolean includeQueuedLoads
An optional parameter that can be used to exclude the load IDs of queued load requests when requesting a list of
load IDs by setting the parameter to FALSE. The default value is TRUE.
String status
Returns the status of the job list request.
LoaderIdResult payload
The requested list of job IDs.
Integer maxItems
The maximum number of items to return (from 1 to 1024; the default is 10).
String neptuneIamRoleArn
The ARN of an IAM role that provides Neptune access to SageMaker and Amazon S3 resources. This must be listed in your DB cluster parameter group or an error will occur.
Integer maxItems
The maximum number of items to return (from 1 to 1024; the default is 10).
String neptuneIamRoleArn
The ARN of an IAM role that provides Neptune access to SageMaker and Amazon S3 resources. This must be listed in your DB cluster parameter group or an error will occur.
Integer maxItems
The maximum number of items to return (from 1 to 1024; the default is 10).
String neptuneIamRoleArn
The ARN of an IAM role that provides Neptune access to SageMaker and Amazon S3 resources. This must be listed in your DB cluster parameter group or an error will occur.
Integer maxItems
The maximum number of items to return (from 1 to 1024; the default is 10).
String neptuneIamRoleArn
The ARN of an IAM role that provides Neptune access to SageMaker and Amazon S3 resources. This must be listed in your DB cluster parameter group or an error will occur.
Boolean includeWaiting
When set to TRUE and other parameters are not present, causes status information to be returned for
waiting queries as well as for running queries.
Integer acceptedQueryCount
The number of queries that have been accepted but not yet completed, including queries in the queue.
Integer runningQueryCount
The number of currently running openCypher queries.
List<E> queries
A list of current openCypher queries.
String mode
The statistics generation mode. One of: DISABLE_AUTOCOMPUTE, ENABLE_AUTOCOMPUTE, or
REFRESH, the last of which manually triggers DFE statistics generation.
String status
The HTTP return code of the request. If the request succeeded, the code is 200.
RefreshStatisticsIdMap payload
This is only returned for refresh mode.
String mode
The statistics generation mode. One of: DISABLE_AUTOCOMPUTE, ENABLE_AUTOCOMPUTE, or
REFRESH, the last of which manually triggers DFE statistics generation.
String status
The HTTP return code of the request. If the request succeeded, the code is 200.
RefreshStatisticsIdMap payload
This is only returned for refresh mode.
String name
The resource name.
String arn
The resource ARN.
String status
The resource status.
String outputLocation
The output location.
String failureReason
The failure reason, in case of a failure.
String cloudwatchLogUrl
The CloudWatch log URL for the resource.
Long count
Number of nodes that have this specific structure.
List<E> nodeProperties
A list of the node properties present in this specific structure.
List<E> distinctOutgoingEdgeLabels
A list of distinct outgoing edge labels present in this specific structure.
Long numNodes
The number of nodes in the graph.
Long numEdges
The number of edges in the graph.
Long numNodeLabels
The number of distinct node labels in the graph.
Long numEdgeLabels
The number of distinct edge labels in the graph.
List<E> nodeLabels
A list of the distinct node labels in the graph.
List<E> edgeLabels
A list of the distinct edge labels in the graph.
Long numNodeProperties
The number of distinct node properties in the graph.
Long numEdgeProperties
The number of distinct edge properties in the graph.
List<E> nodeProperties
A list of the distinct node properties in the graph, along with the count of nodes where each property is used.
List<E> edgeProperties
A list of the distinct edge properties in the graph, along with the count of edges where each property is used.
Long totalNodePropertyValues
The total number of usages of all node properties.
Long totalEdgePropertyValues
The total number of usages of all edge properties.
List<E> nodeStructures
This field is only present when the requested mode is DETAILED. It contains a list of node
structures.
List<E> edgeStructures
This field is only present when the requested mode is DETAILED. It contains a list of edge
structures.
String version
The version of this graph summary response.
Date lastStatisticsComputationTime
The timestamp, in ISO 8601 format, of the time at which Neptune last computed statistics.
PropertygraphSummary graphSummary
The graph summary.
String version
The version of the query language.
Long numDistinctSubjects
The number of distinct subjects in the graph.
Long numDistinctPredicates
The number of distinct predicates in the graph.
Long numQuads
The number of quads in the graph.
Long numClasses
The number of classes in the graph.
List<E> classes
A list of the classes in the graph.
List<E> predicates
A list of predicates in the graph, along with the predicate counts.
List<E> subjectStructures
This field is only present when the request mode is DETAILED. It contains a list of subject
structures.
String version
The version of this graph summary response.
Date lastStatisticsComputationTime
The timestamp, in ISO 8601 format, of the time at which Neptune last computed statistics.
RDFGraphSummary graphSummary
The graph summary of an RDF graph. See Graph summary response for an RDF graph.
String statisticsId
The ID of the statistics generation run that is currently occurring.
Long commitTimestampInMillis
The time at which the commit for the transaction was requested, in milliseconds from the Unix epoch.
Map<K,V> eventId
The sequence identifier of the stream change record.
SparqlData data
The serialized SPARQL change record. The serialization formats of each record are described in more detail in Serialization Formats in Neptune Streams.
String op
The operation that created the change.
Boolean isLastOp
Only present if this operation is the last one in its transaction. If present, it is set to true. It is useful for ensuring that an entire transaction is consumed.
String source
The source parameter accepts an S3 URI that identifies a single file, multiple files, a folder, or
multiple folders. Neptune loads every data file in any folder that is specified.
The URI can be in any of the following formats.
s3://(bucket_name)/(object-key-name)
https://s3.amazonaws.com/(bucket_name)/(object-key-name)
https://s3.us-east-1.amazonaws.com/(bucket_name)/(object-key-name)
The object-key-name element of the URI is equivalent to the prefix parameter in an S3 ListObjects API call. It
identifies all the objects in the specified S3 bucket whose names begin with that prefix. That can be a single
file or folder, or multiple files and/or folders.
The specified folder or folders can contain multiple vertex files and multiple edge files.
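The three URI forms reduce to a bucket name plus an object-key prefix, and that prefix determines which files Neptune loads. The S3Source class below is a hypothetical sketch: it handles only the s3:// form and the path-style https:// forms listed above, and its prefix test with String.startsWith mirrors the ListObjects prefix semantics described here.

```java
import java.net.URI;

/**
 * Hypothetical parser for the source URI shapes listed above. It splits a
 * URI into a bucket and an object-key prefix; the prefix behaves like the
 * prefix parameter of an S3 ListObjects call.
 */
final class S3Source {
    final String bucket;
    final String keyPrefix;

    private S3Source(String bucket, String keyPrefix) {
        this.bucket = bucket;
        this.keyPrefix = keyPrefix;
    }

    static S3Source parse(String source) {
        URI uri = URI.create(source);
        String path = uri.getPath() == null ? "" : uri.getPath();
        if ("s3".equals(uri.getScheme())) {
            // s3://bucket/key: the bucket is the authority, the key is the path minus its leading '/'.
            return new S3Source(uri.getAuthority(), stripLeadingSlash(path));
        }
        String host = uri.getHost();
        if ("https".equals(uri.getScheme()) && host != null
                && host.startsWith("s3.") && host.endsWith(".amazonaws.com")) {
            // Path-style https://s3[.region].amazonaws.com/bucket/key
            String rest = stripLeadingSlash(path);
            int slash = rest.indexOf('/');
            String bucket = slash < 0 ? rest : rest.substring(0, slash);
            String key = slash < 0 ? "" : rest.substring(slash + 1);
            return new S3Source(bucket, key);
        }
        throw new IllegalArgumentException("Unsupported source URI: " + source);
    }

    /** True if an object with this key would be selected by the source prefix. */
    boolean matches(String objectKey) {
        return objectKey.startsWith(keyPrefix);
    }

    private static String stripLeadingSlash(String s) {
        return s.startsWith("/") ? s.substring(1) : s;
    }
}
```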
String format
The format of the data. For more information about data formats for the Neptune Loader command, see Load Data Formats.
Allowed values:
csv for the Gremlin CSV data format.
opencypher for the openCypher CSV data format.
ntriples for the N-Triples RDF data format.
nquads for the N-Quads RDF data format.
rdfxml for the RDF/XML RDF data format.
turtle for the Turtle RDF data format.
String s3BucketRegion
The Amazon Region of the S3 bucket. This must match the Amazon Region of the DB cluster.
String iamRoleArn
The Amazon Resource Name (ARN) for an IAM role to be assumed by the Neptune DB instance for access to the S3 bucket. The IAM role ARN provided here should be attached to the DB cluster (see Adding the IAM Role to an Amazon Neptune Cluster).
String mode
The load job mode.
Allowed values: RESUME, NEW, AUTO.
Default value: AUTO.
RESUME – In RESUME mode, the loader looks for a previous load from this source, and if it finds
one, resumes that load job. If no previous load job is found, the loader stops.
The loader avoids reloading files that were successfully loaded in a previous job. It only tries to process failed files. If you dropped previously loaded data from your Neptune cluster, that data is not reloaded in this mode. If a previous load job loaded all files from the same source successfully, nothing is reloaded, and the loader returns success.
NEW – In NEW mode, the loader creates a new load request regardless of any previous loads. You can use
this mode to reload all the data from a source after dropping previously loaded data from your Neptune cluster,
or to load new data available at the same source.
AUTO – In AUTO mode, the loader looks for a previous load job from the same source, and if it
finds one, resumes that job, just as in RESUME mode.
If the loader doesn't find a previous load job from the same source, it loads all data from the source, just as
in NEW mode.
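The mode semantics reduce to a small decision table. The sketch below is hypothetical (LoadModeDecision is not a Neptune or SDK type); previousLoadFound stands for whether the loader found an earlier load job from the same source, and a null mode falls back to the documented default, AUTO.

```java
/**
 * Hypothetical decision table for the RESUME / NEW / AUTO load modes.
 * previousLoadFound: whether an earlier load job from the same source exists.
 */
final class LoadModeDecision {
    enum Action { RESUME_PREVIOUS, START_NEW, STOP }

    private LoadModeDecision() {}

    static Action decide(String mode, boolean previousLoadFound) {
        switch (mode == null ? "AUTO" : mode) { // AUTO is the documented default
            case "RESUME":
                // Resume the earlier job, or stop if there is none.
                return previousLoadFound ? Action.RESUME_PREVIOUS : Action.STOP;
            case "NEW":
                // Always start a new load, regardless of previous loads.
                return Action.START_NEW;
            case "AUTO":
                // Behave like RESUME if a previous job exists, otherwise like NEW.
                return previousLoadFound ? Action.RESUME_PREVIOUS : Action.START_NEW;
            default:
                throw new IllegalArgumentException("mode must be RESUME, NEW, or AUTO");
        }
    }
}
```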
Boolean failOnError
failOnError – A flag to toggle a complete stop on an error.
Allowed values: "TRUE", "FALSE".
Default value: "TRUE".
When this parameter is set to "FALSE", the loader tries to load all the data in the location
specified, skipping any entries with errors.
When this parameter is set to "TRUE", the loader stops as soon as it encounters an error. Data
loaded up to that point persists.
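The effect of failOnError on a sequence of entries can be modeled in a few lines. This is an illustrative sketch, not the loader's implementation: FailOnErrorModel is hypothetical, entries are processed strictly in order, and "valid" is decided by a caller-supplied predicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical model of the failOnError flag. Entries are loaded in order.
 * failOnError = true: stop at the first bad entry; entries loaded before it persist.
 * failOnError = false: skip bad entries and keep going.
 */
final class FailOnErrorModel {
    private FailOnErrorModel() {}

    static List<String> load(List<String> entries, Predicate<String> isValid, boolean failOnError) {
        List<String> loaded = new ArrayList<>();
        for (String entry : entries) {
            if (isValid.test(entry)) {
                loaded.add(entry);
            } else if (failOnError) {
                break; // stop, but keep what was already loaded
            }
            // failOnError == false: the bad entry is skipped and loading continues
        }
        return loaded;
    }
}
```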
String parallelism
The optional parallelism parameter can be set to reduce the number of threads used by the bulk load
process.
Allowed values:
LOW – The number of threads used is the number of available vCPUs divided by 8.
MEDIUM – The number of threads used is the number of available vCPUs divided by 2.
HIGH – The number of threads used is the same as the number of available vCPUs.
OVERSUBSCRIBE – The number of threads used is the number of available vCPUs multiplied by 2. If
this value is used, the bulk loader takes up all available resources.
This does not mean, however, that the OVERSUBSCRIBE setting results in 100% CPU utilization. Because
the load operation is I/O bound, the highest CPU utilization to expect is in the 60% to 70% range.
Default value: HIGH
The parallelism setting can sometimes result in a deadlock between threads when loading openCypher
data. When this happens, Neptune returns the LOAD_DATA_DEADLOCK error. You can generally fix the
issue by setting parallelism to a lower setting and retrying the load command.
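The thread counts for each parallelism level follow directly from the vCPU count. The sketch below is hypothetical (LoaderParallelism is not a Neptune type); flooring LOW and MEDIUM at one thread on small instances is an assumption, because the description does not say how fractional results are handled.

```java
/**
 * Hypothetical calculation of bulk-loader thread counts from the documented
 * parallelism levels. The one-thread floor for LOW/MEDIUM is an assumption.
 */
final class LoaderParallelism {
    private LoaderParallelism() {}

    static int threads(String parallelism, int vCpus) {
        switch (parallelism == null ? "HIGH" : parallelism) { // HIGH is the default
            case "LOW":
                return Math.max(1, vCpus / 8);
            case "MEDIUM":
                return Math.max(1, vCpus / 2);
            case "HIGH":
                return vCpus;
            case "OVERSUBSCRIBE":
                return vCpus * 2;
            default:
                throw new IllegalArgumentException("Unknown parallelism: " + parallelism);
        }
    }
}
```

For example, on a 16-vCPU instance this yields 2, 8, 16, and 32 threads for LOW, MEDIUM, HIGH, and OVERSUBSCRIBE respectively.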
Map<K,V> parserConfiguration
parserConfiguration – An optional object with additional parser configuration values.
Each of the child parameters is also optional:
namedGraphUri – The default graph for all RDF formats when no graph is specified (for
non-quads formats and NQUAD entries with no graph).
The default is https://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph.
baseUri – The base URI for RDF/XML and Turtle formats.
The default is https://aws.amazon.com/neptune/default.
allowEmptyStrings – Gremlin users need to be able to pass empty string values ("") as
node and edge properties when loading CSV data. If allowEmptyStrings is set to false
(the default), such empty strings are treated as nulls and are not loaded.
If allowEmptyStrings is set to true, the loader treats empty strings as valid property
values and loads them accordingly.
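The allowEmptyStrings rule can be illustrated on a single, already-unquoted CSV field value. EmptyStringPolicy is a hypothetical class used only for illustration; an empty Optional stands for "no property is written".

```java
import java.util.Optional;

/**
 * Hypothetical illustration of allowEmptyStrings for Gremlin CSV values.
 * false (the default): an empty field is treated as null and not loaded.
 * true: an empty field is loaded as a real empty-string property value.
 */
final class EmptyStringPolicy {
    private EmptyStringPolicy() {}

    static Optional<String> propertyValue(String rawField, boolean allowEmptyStrings) {
        if (rawField == null) {
            return Optional.empty();
        }
        if (rawField.isEmpty() && !allowEmptyStrings) {
            return Optional.empty(); // treated as null: no property is written
        }
        return Optional.of(rawField);
    }
}
```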
Boolean updateSingleCardinalityProperties
updateSingleCardinalityProperties is an optional parameter that controls how the bulk loader treats
a new value for single-cardinality vertex or edge properties. This is not supported for loading openCypher data.
Allowed values: "TRUE", "FALSE".
Default value: "FALSE".
By default, or when updateSingleCardinalityProperties is explicitly set to "FALSE", the
loader treats a new value as an error, because it violates single cardinality.
When updateSingleCardinalityProperties is set to "TRUE", on the other hand, the bulk
loader replaces the existing value with the new one. If multiple edge or single-cardinality vertex property
values are provided in the source file(s) being loaded, the final value at the end of the bulk load could be any
one of those new values. The loader only guarantees that the existing value has been replaced by one of the new
ones.
Boolean queueRequest
This is an optional flag parameter that indicates whether the load request can be queued up or not.
You don't have to wait for one load job to complete before issuing the next one, because Neptune can queue up as
many as 64 jobs at a time, provided that their queueRequest parameters are all set to
"TRUE". The queue order of the jobs will be first-in-first-out (FIFO).
If the queueRequest parameter is omitted or set to "FALSE", the load request will fail
if another load job is already running.
Allowed values: "TRUE", "FALSE".
Default value: "FALSE".
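The admission behavior of queueRequest can be sketched as a FIFO queue with a capacity of 64. LoadQueue and its "STARTED", "QUEUED", and "REJECTED" results are hypothetical, and counting the 64-job limit against queued jobs only (not the running job) is an assumption of this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical model of load-request admission:
 *  - no job running: the request starts immediately;
 *  - a job is running: the request is queued only if queueRequest is TRUE
 *    and fewer than 64 jobs are queued, otherwise it fails.
 * Queued jobs start in FIFO order.
 */
final class LoadQueue {
    static final int MAX_QUEUED = 64;

    private String running;                          // ID of the running job, or null
    private final Deque<String> queue = new ArrayDeque<>();

    /** Returns "STARTED", "QUEUED", or "REJECTED" (hypothetical status names). */
    String submit(String loadId, boolean queueRequest) {
        if (running == null) {
            running = loadId;
            return "STARTED";
        }
        if (queueRequest && queue.size() < MAX_QUEUED) {
            queue.addLast(loadId);
            return "QUEUED";
        }
        return "REJECTED";
    }

    /** Marks the running job finished and starts the oldest queued job, if any. */
    String finishRunning() {
        running = queue.pollFirst();
        return running;
    }
}
```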
List<E> dependencies
This is an optional parameter that can make a queued load request contingent on the successful completion of one or more previous jobs in the queue.
Neptune can queue up as many as 64 load requests at a time, if their queueRequest parameters are set
to "TRUE". The dependencies parameter lets you make execution of such a queued request
dependent on the successful completion of one or more specified previous requests in the queue.
For example, if load Job-A and Job-B are independent of each other, but load
Job-C needs Job-A and Job-B to be finished before it begins, proceed as
follows:
Submit load-job-A and load-job-B one after another in any order, and save their
load-ids.
Submit load-job-C with the load-ids of the two jobs in its dependencies field.
Because of the dependencies parameter, the bulk loader will not start Job-C until
Job-A and Job-B have completed successfully. If either one of them fails, Job-C will
not be executed, and its status will be set to LOAD_FAILED_BECAUSE_DEPENDENCY_NOT_SATISFIED.
You can set up multiple levels of dependency in this way, so that the failure of one job will cause all requests that are directly or indirectly dependent on it to be cancelled.
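The cascading-failure rule for dependencies can be sketched by walking the queue in FIFO order. DependencyResolver is hypothetical; it assumes every job's dependencies appear earlier in the queue, takes each job's own success or failure as input, and uses LOAD_COMPLETED and LOAD_FAILED as illustrative status strings alongside the documented LOAD_FAILED_BECAUSE_DEPENDENCY_NOT_SATISFIED.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical resolver for the dependencies rule: a queued job runs only if
 * every job it depends on completed successfully; otherwise it is marked
 * LOAD_FAILED_BECAUSE_DEPENDENCY_NOT_SATISFIED, and the failure cascades to
 * jobs that depend on it directly or indirectly.
 * Assumes jobs are given in queue (FIFO) order, with dependencies first.
 */
final class DependencyResolver {
    private DependencyResolver() {}

    /**
     * @param queueOrder   job IDs in submission order
     * @param dependencies job ID to the IDs it depends on
     * @param ownOutcome   job ID to whether the job itself would succeed if run (default true)
     */
    static Map<String, String> resolve(List<String> queueOrder,
                                       Map<String, List<String>> dependencies,
                                       Map<String, Boolean> ownOutcome) {
        Map<String, String> status = new HashMap<>();
        for (String job : queueOrder) {
            boolean depsOk = true;
            for (String dep : dependencies.getOrDefault(job, List.of())) {
                if (!"LOAD_COMPLETED".equals(status.get(dep))) {
                    depsOk = false;
                    break;
                }
            }
            if (!depsOk) {
                status.put(job, "LOAD_FAILED_BECAUSE_DEPENDENCY_NOT_SATISFIED");
            } else {
                status.put(job, ownOutcome.getOrDefault(job, true) ? "LOAD_COMPLETED" : "LOAD_FAILED");
            }
        }
        return status;
    }
}
```

With the Job-A, Job-B, Job-C example above, a failure of Job-B marks Job-C as LOAD_FAILED_BECAUSE_DEPENDENCY_NOT_SATISFIED, and any job that depends on Job-C inherits the same status.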
Boolean userProvidedEdgeIds
This parameter is required only when loading openCypher data that contains relationship IDs. It must be included
and set to True when openCypher relationship IDs are explicitly provided in the load data
(recommended).
When userProvidedEdgeIds is absent or set to True, an :ID column must be
present in every relationship file in the load.
When userProvidedEdgeIds is present and set to False, relationship files in the load
must not contain an :ID column. Instead, the Neptune loader automatically generates an ID for
each relationship.
It's useful to provide relationship IDs explicitly so that the loader can resume loading after errors in the CSV data have been fixed, without having to reload any relationships that have already been loaded. If relationship IDs have not been explicitly assigned, the loader cannot resume a failed load if any relationship file has had to be corrected, and must instead reload all the relationships.
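The header requirement tied to userProvidedEdgeIds can be checked before submitting a load. RelationshipHeaderCheck is a hypothetical pre-flight check; its header parsing is deliberately simplified (a plain comma split with no quoting), and it treats any column ending in ":ID" as the relationship ID column, so :START_ID and :END_ID do not count.

```java
import java.util.Arrays;

/**
 * Hypothetical pre-flight check for openCypher relationship files:
 * userProvidedEdgeIds absent (null) or true: the header must contain an :ID column;
 * userProvidedEdgeIds false: the header must not contain one.
 * Parsing is simplified (comma split, no quoting) for illustration.
 */
final class RelationshipHeaderCheck {
    private RelationshipHeaderCheck() {}

    static boolean headerIsValid(String headerLine, Boolean userProvidedEdgeIds) {
        boolean hasIdColumn = Arrays.stream(headerLine.split(","))
                .map(String::trim)
                .anyMatch(col -> col.endsWith(":ID")); // :START_ID / :END_ID do not match
        boolean idsRequired = userProvidedEdgeIds == null || userProvidedEdgeIds;
        return hasIdColumn == idsRequired;
    }
}
```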
String id
A unique identifier for the new job. The default is an autogenerated UUID.
String previousDataProcessingJobId
The job ID of a completed data processing job run on an earlier version of the data.
String inputDataS3Location
The URI of the Amazon S3 location where you want SageMaker to download the data needed to run the data processing job.
String processedDataS3Location
The URI of the Amazon S3 location where you want SageMaker to save the results of a data processing job.
String sagemakerIamRoleArn
The ARN of an IAM role for SageMaker execution. This must be listed in your DB cluster parameter group or an error will occur.
String neptuneIamRoleArn
The Amazon Resource Name (ARN) of an IAM role that SageMaker can assume to perform tasks on your behalf. This must be listed in your DB cluster parameter group or an error will occur.
String processingInstanceType
The type of ML instance used during data processing. Its memory should be large enough to hold the processed dataset. The default is the smallest ml.r5 type whose memory is ten times larger than the size of the exported graph data on disk.
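The documented default for processingInstanceType can be approximated as "the smallest ml.r5 type with at least ten times the exported data size in memory". This is a sketch under stated assumptions: ProcessingInstanceDefault is hypothetical, "ten times larger" is read as greater than or equal to 10x, and the memory table uses the published EC2 r5 sizes for illustration, so check current SageMaker availability before relying on it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical reproduction of the processingInstanceType default:
 * the smallest ml.r5 type whose memory is at least 10x the exported
 * graph data size on disk. Memory sizes (GiB) are illustrative.
 */
final class ProcessingInstanceDefault {
    private static final Map<String, Integer> R5_MEMORY_GIB = new LinkedHashMap<>();
    static {
        // Insertion order runs from smallest to largest.
        R5_MEMORY_GIB.put("ml.r5.large", 16);
        R5_MEMORY_GIB.put("ml.r5.xlarge", 32);
        R5_MEMORY_GIB.put("ml.r5.2xlarge", 64);
        R5_MEMORY_GIB.put("ml.r5.4xlarge", 128);
        R5_MEMORY_GIB.put("ml.r5.8xlarge", 256);
        R5_MEMORY_GIB.put("ml.r5.12xlarge", 384);
        R5_MEMORY_GIB.put("ml.r5.16xlarge", 512);
        R5_MEMORY_GIB.put("ml.r5.24xlarge", 768);
    }

    private ProcessingInstanceDefault() {}

    static String pick(double exportedDataGiB) {
        double needed = 10.0 * exportedDataGiB;
        for (Map.Entry<String, Integer> e : R5_MEMORY_GIB.entrySet()) {
            if (e.getValue() >= needed) {
                return e.getKey();
            }
        }
        throw new IllegalArgumentException("No ml.r5 type has " + needed + " GiB of memory");
    }
}
```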
Integer processingInstanceVolumeSizeInGB
The disk volume size of the processing instance. Both input data and processed data are stored on disk, so the volume size must be large enough to hold both data sets. The default is 0. If not specified or 0, Neptune ML chooses the volume size automatically based on the data size.
Integer processingTimeOutInSeconds
Timeout in seconds for the data processing job. The default is 86,400 (1 day).
String modelType
One of the two model types that Neptune ML currently supports: heterogeneous graph models
(heterogeneous) and knowledge graph (kge). The default is none. If not specified,
Neptune ML chooses the model type automatically based on the data.
String configFileName
A data specification file that describes how to load the exported graph data for training. The file is
automatically generated by the Neptune export toolkit. The default is
training-data-configuration.json.
List<E> subnets
The IDs of the subnets in the Neptune VPC. The default is None.
List<E> securityGroupIds
The VPC security group IDs. The default is None.
String volumeEncryptionKMSKey
The Amazon Key Management Service (Amazon KMS) key that SageMaker uses to encrypt data on the storage volume attached to the ML compute instances that run the processing job. The default is None.
String s3OutputEncryptionKMSKey
The Amazon Key Management Service (Amazon KMS) key that SageMaker uses to encrypt the output of the processing job. The default is none.
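Because every parameter above has a default, a request only needs the values you want to override. The sketch below assembles a JSON body from a parameter map and simply omits null entries so the documented defaults apply. It assumes the body is sent to the Neptune ML HTTP endpoint (POST /ml/dataprocessing), which accepts the same parameter names; the class name and S3 URIs are hypothetical, and the hand-rolled JSON escaping covers only quotes and backslashes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that builds a data-processing request body (not the SDK's API).
final class DataProcessingRequestBody {

    /** Minimal JSON string quoting: escapes backslashes and double quotes only. */
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Serializes non-null entries in insertion order; omitted keys fall back to service defaults. */
    static String toJson(Map<String, Object> params) {
        StringJoiner body = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Object> e : params.entrySet()) {
            Object v = e.getValue();
            if (v == null) {
                continue; // leave unset so the documented default applies
            }
            String json = (v instanceof Number || v instanceof Boolean) ? v.toString() : quote(v.toString());
            body.add(quote(e.getKey()) + ":" + json);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> p = new LinkedHashMap<>();
        p.put("inputDataS3Location", "s3://example-bucket/export/");       // hypothetical bucket
        p.put("processedDataS3Location", "s3://example-bucket/processed/");
        p.put("processingTimeOutInSeconds", 86400);                          // same as the default
        p.put("modelType", null); // unset: Neptune ML picks the model type from the data
        System.out.println(toJson(p));
    }
}
```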
String id
A unique identifier for the new job. The default is an autogenerated UUID.
String previousModelTrainingJobId
The job ID of a completed model-training job that you want to update incrementally based on updated data.
String dataProcessingJobId
The job ID of the completed data-processing job that has created the data that the training will work with.
String trainModelS3Location
The location in Amazon S3 where the model artifacts are to be stored.
String sagemakerIamRoleArn
The ARN of an IAM role for SageMaker execution. This must be listed in your DB cluster parameter group or an error will occur.
String neptuneIamRoleArn
The ARN of an IAM role that provides Neptune access to SageMaker and Amazon S3 resources. This must be listed in your DB cluster parameter group or an error will occur.
String baseProcessingInstanceType
The type of ML instance used in preparing and managing training of ML models. This is a CPU instance chosen based on memory requirements for processing the training data and model.
String trainingInstanceType
The type of ML instance used for model training. All Neptune ML models support CPU, GPU, and multi-GPU training.
The default is ml.p3.2xlarge. Choosing the right instance type for training depends on the task
type, graph size, and your budget.
Integer trainingInstanceVolumeSizeInGB
The disk volume size of the training instance. Both input data and the output model are stored on disk, so the volume size must be large enough to hold both data sets. The default is 0. If not specified or 0, Neptune ML selects a disk volume size based on the recommendation generated in the data processing step.
Integer trainingTimeOutInSeconds
Timeout in seconds for the training job. The default is 86,400 (1 day).
Integer maxHPONumberOfTrainingJobs
Maximum total number of training jobs to start for the hyperparameter tuning job. The default is 2. Neptune ML
automatically tunes the hyperparameters of the machine learning model. To obtain a model that performs well, use
at least 10 jobs (in other words, set maxHPONumberOfTrainingJobs to 10). In general, the more tuning
runs, the better the results.
Integer maxHPOParallelTrainingJobs
Maximum number of parallel training jobs to start for the hyperparameter tuning job. The default is 2. The number of parallel jobs you can run is limited by the available resources on your training instance.
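The two HPO limits together bound how long tuning can take. With at most maxHPOParallelTrainingJobs running at once, maxHPONumberOfTrainingJobs jobs need at least ceil(total / parallel) sequential rounds. This is back-of-the-envelope arithmetic, not a description of the tuner's scheduler, and the class name is hypothetical.

```java
// Lower bound on sequential rounds of HPO training jobs (hypothetical helper).
final class HpoBudget {

    static int minRounds(int maxHPONumberOfTrainingJobs, int maxHPOParallelTrainingJobs) {
        if (maxHPONumberOfTrainingJobs < 1 || maxHPOParallelTrainingJobs < 1) {
            throw new IllegalArgumentException("both limits must be at least 1");
        }
        // Integer ceiling division: ceil(total / parallel).
        return (maxHPONumberOfTrainingJobs + maxHPOParallelTrainingJobs - 1) / maxHPOParallelTrainingJobs;
    }

    public static void main(String[] args) {
        System.out.println(minRounds(2, 2));  // 1: the defaults fit in a single round
        System.out.println(minRounds(10, 2)); // 5: the recommended 10 jobs at the default parallelism
    }
}
```

If individual training jobs take roughly the same time, raising the job count to the recommended 10 without raising parallelism multiplies wall-clock time by about five compared with the defaults.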
List<E> subnets
The IDs of the subnets in the Neptune VPC. The default is None.
List<E> securityGroupIds
The VPC security group IDs. The default is None.
String volumeEncryptionKMSKey
The Amazon Key Management Service (KMS) key that SageMaker uses to encrypt data on the storage volume attached to the ML compute instances that run the training job. The default is None.
String s3OutputEncryptionKMSKey
The Amazon Key Management Service (KMS) key that SageMaker uses to encrypt the output of the training job. The default is none.
Boolean enableManagedSpotTraining
Optimizes the cost of training machine-learning models by using Amazon Elastic Compute Cloud spot instances. The
default is False.
CustomModelTrainingParameters customModelTrainingParameters
The configuration for custom model training. This is a JSON object.
String id
A unique identifier for the new job. The default is an autogenerated UUID.
String dataProcessingJobId
The job ID of a completed data-processing job. You must include either both dataProcessingJobId and
mlModelTrainingJobId, or trainingJobName.
String mlModelTrainingJobId
The job ID of a completed model-training job. You must include either both dataProcessingJobId and
mlModelTrainingJobId, or trainingJobName.
String trainingJobName
The name of a completed SageMaker training job. You must include either both dataProcessingJobId and
mlModelTrainingJobId, or trainingJobName.
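The source rule for a model transform job can be checked before the request is sent. In this minimal sketch the class name is hypothetical, blank strings count as missing, and the two options are treated as mutually exclusive; that exclusivity is an assumption, since the text only says "either ... or".

```java
// Hypothetical client-side check of the transform job's source identifiers.
final class TransformJobSourceCheck {

    private static boolean present(String value) {
        return value != null && !value.trim().isEmpty();
    }

    static boolean isValidSource(String dataProcessingJobId,
                                 String mlModelTrainingJobId,
                                 String trainingJobName) {
        boolean bothIds = present(dataProcessingJobId) && present(mlModelTrainingJobId);
        boolean noIds = !present(dataProcessingJobId) && !present(mlModelTrainingJobId);
        boolean hasName = present(trainingJobName);
        // Option 1: both Neptune ML job IDs and no SageMaker job name.
        // Option 2: only the SageMaker training job name (ASSUMPTION: options are exclusive).
        return (bothIds && !hasName) || (noIds && hasName);
    }

    public static void main(String[] args) {
        System.out.println(isValidSource("dp-1", "train-1", null)); // true
        System.out.println(isValidSource(null, null, "my-sm-job")); // true
        System.out.println(isValidSource("dp-1", null, null));      // false: only half of the pair
    }
}
```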
String modelTransformOutputS3Location
The location in Amazon S3 where the model artifacts are to be stored.
String sagemakerIamRoleArn
The ARN of an IAM role for SageMaker execution. This must be listed in your DB cluster parameter group or an error will occur.
String neptuneIamRoleArn
The ARN of an IAM role that provides Neptune access to SageMaker and Amazon S3 resources. This must be listed in your DB cluster parameter group or an error will occur.
CustomModelTransformParameters customModelTransformParameters
Configuration information for a model transform using a custom model. The
customModelTransformParameters object contains the following fields, which must have values
compatible with the saved model parameters from the training job:
String baseProcessingInstanceType
The type of ML instance used in preparing and managing training of ML models. This is an ML compute instance chosen based on memory requirements for processing the training data and model.
Integer baseProcessingInstanceVolumeSizeInGB
The disk volume size of the training instance in gigabytes. The default is 0. Both input data and the output model are stored on disk, so the volume size must be large enough to hold both data sets. If not specified or 0, Neptune ML selects a disk volume size based on the recommendation generated in the data processing step.
List<E> subnets
The IDs of the subnets in the Neptune VPC. The default is None.
List<E> securityGroupIds
The VPC security group IDs. The default is None.
String volumeEncryptionKMSKey
The Amazon Key Management Service (KMS) key that SageMaker uses to encrypt data on the storage volume attached to the ML compute instances that run the training job. The default is None.
String s3OutputEncryptionKMSKey
The Amazon Key Management Service (KMS) key that SageMaker uses to encrypt the output of the processing job. The default is none.
Boolean autoCompute
Indicates whether or not automatic statistics generation is enabled.
Boolean active
Indicates whether or not DFE statistics generation is enabled at all.
String statisticsId
Reports the ID of the current statistics generation run. A value of -1 indicates that no statistics have been generated.
Date date
The UTC time at which DFE statistics have most recently been generated.
String note
A note about problems in the case where statistics are invalid.
StatisticsSummary signatureInfo
A StatisticsSummary structure that contains:
signatureCount - The total number of signatures across all characteristic sets.
instanceCount - The total number of characteristic-set instances.
predicateCount - The total number of unique predicates.
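A client reading the statistics status fields above might summarize them as follows. This is a hypothetical view class, not the SDK's own model type; the only documented rule it encodes is that a statisticsId of -1 means no statistics have been generated. Treating a null date as "no completed run" is an assumption.

```java
import java.time.Instant;

// Hypothetical client-side summary of the DFE statistics status fields.
final class StatisticsStatusView {
    final boolean autoCompute;
    final boolean active;
    final String statisticsId;
    final Instant date; // ASSUMPTION: null when no run has completed
    final String note;

    StatisticsStatusView(boolean autoCompute, boolean active, String statisticsId, Instant date, String note) {
        this.autoCompute = autoCompute;
        this.active = active;
        this.statisticsId = statisticsId;
        this.date = date;
        this.note = note;
    }

    /** Documented sentinel: a statisticsId of -1 means no statistics have been generated. */
    boolean hasStatistics() {
        return statisticsId != null && !"-1".equals(statisticsId);
    }

    String describe() {
        if (!active) {
            return "DFE statistics generation is disabled";
        }
        if (!hasStatistics()) {
            return "no statistics generated yet";
        }
        return "statistics run " + statisticsId + " generated at " + date + " (autoCompute=" + autoCompute + ")";
    }

    public static void main(String[] args) {
        System.out.println(new StatisticsStatusView(true, true, "-1", null, null).describe());
        // no statistics generated yet
    }
}
```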
Copyright © 2025. All rights reserved.