Class SemanticChunkingConfiguration
- java.lang.Object
  - software.amazon.awssdk.services.bedrockagent.model.SemanticChunkingConfiguration
- All Implemented Interfaces:
Serializable, SdkPojo, ToCopyableBuilder<SemanticChunkingConfiguration.Builder,SemanticChunkingConfiguration>
@Generated("software.amazon.awssdk:codegen") public final class SemanticChunkingConfiguration extends Object implements SdkPojo, Serializable, ToCopyableBuilder<SemanticChunkingConfiguration.Builder,SemanticChunkingConfiguration>
Settings for semantic document chunking for a data source. Semantic chunking splits a document into smaller documents based on groups of similar content derived from the text with natural language processing.
With semantic chunking, each sentence is compared to the next to determine how similar they are. You specify a threshold in the form of a percentile, where adjacent sentences that are less similar than that percentage of sentence pairs are divided into separate chunks. For example, if you set the threshold to 90, then the 10 percent of sentence pairs that are least similar are split. So if you have 101 sentences, 100 sentence pairs are compared, and the 10 with the least similarity are split, creating 11 chunks. These chunks are further split if they exceed the max token size.
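The arithmetic in the example above can be sketched in plain Java. This is only an illustration of the percentile rule as described here, not the service's implementation: the class name `PercentileSplitSketch`, its methods, and the rounding rule are assumptions, and the similarity scores are made-up inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of percentile-based splitting; not part of the AWS SDK.
public class PercentileSplitSketch {

    // Returns the indices i at which a split falls between sentence i and sentence i + 1.
    // pairSimilarity[i] is the similarity of sentence i to sentence i + 1.
    static List<Integer> splitPoints(double[] pairSimilarity, int percentileThreshold) {
        int pairs = pairSimilarity.length;
        // Assumption: the number of splits is the least-similar (100 - threshold) percent, rounded.
        int splits = (int) Math.round(pairs * (100 - percentileThreshold) / 100.0);

        // Order the pair indices from least similar to most similar.
        Integer[] order = new Integer[pairs];
        for (int i = 0; i < pairs; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(pairSimilarity[a], pairSimilarity[b]));

        // Keep the least-similar pairs and return them in document order.
        List<Integer> result = new ArrayList<>();
        for (int k = 0; k < splits; k++) {
            result.add(order[k]);
        }
        Collections.sort(result);
        return result;
    }

    // Number of chunks produced before any further splitting by maximum token size.
    static int chunkCount(double[] pairSimilarity, int percentileThreshold) {
        return splitPoints(pairSimilarity, percentileThreshold).size() + 1;
    }

    public static void main(String[] args) {
        // 101 sentences give 100 adjacent pairs; every tenth pair is made dissimilar.
        double[] similarity = new double[100];
        for (int i = 0; i < similarity.length; i++) {
            similarity[i] = (i % 10 == 9) ? 0.1 : 0.9;
        }
        System.out.println(chunkCount(similarity, 90)); // prints 11
    }
}
```

With a threshold of 90 and 100 sentence pairs, the sketch splits at the 10 least-similar pairs, which matches the 11 chunks in the description.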
You must also specify a buffer size, which determines whether sentences are compared in isolation, or within a moving context window that includes the previous and following sentence. For example, if you set the buffer size to 1, the embedding for sentence 10 is derived from sentences 9, 10, and 11 combined.
- See Also:
- Serialized Form
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static interface | SemanticChunkingConfiguration.Builder |
-
Method Summary
Modifier and Type | Method | Description
Integer | breakpointPercentileThreshold() | The dissimilarity threshold for splitting chunks.
Integer | bufferSize() | The buffer size.
static SemanticChunkingConfiguration.Builder | builder() |
boolean | equals(Object obj) |
boolean | equalsBySdkFields(Object obj) |
<T> Optional<T> | getValueForField(String fieldName, Class<T> clazz) |
int | hashCode() |
Integer | maxTokens() | The maximum number of tokens that a chunk can contain.
List<SdkField<?>> | sdkFields() |
static Class<? extends SemanticChunkingConfiguration.Builder> | serializableBuilderClass() |
SemanticChunkingConfiguration.Builder | toBuilder() |
String | toString() | Returns a string representation of this object.
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface software.amazon.awssdk.utils.builder.ToCopyableBuilder
copy
-
Method Detail
-
breakpointPercentileThreshold
public final Integer breakpointPercentileThreshold()
The dissimilarity threshold for splitting chunks.
- Returns:
- The dissimilarity threshold for splitting chunks.
-
bufferSize
public final Integer bufferSize()
The buffer size.
- Returns:
- The buffer size.
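As a rough illustration of the buffer size described in the class overview (buffer size 1 means sentence 10 is embedded together with sentences 9 and 11), the window can be sketched as below. The class name `BufferWindowSketch`, the method `windowText`, and the clamping at the start and end of the document are assumptions made for this sketch; the service's actual behavior at document edges is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the moving context window; not part of the AWS SDK.
public class BufferWindowSketch {

    // Returns the text that would be embedded for the sentence at the given
    // zero-based index: the sentence itself plus bufferSize neighbors on each side.
    static String windowText(List<String> sentences, int index, int bufferSize) {
        // Assumption: the window is clamped at the document boundaries.
        int from = Math.max(0, index - bufferSize);
        int to = Math.min(sentences.size() - 1, index + bufferSize);
        return String.join(" ", sentences.subList(from, to + 1));
    }

    public static void main(String[] args) {
        List<String> sentences = new ArrayList<>();
        for (int i = 1; i <= 12; i++) {
            sentences.add("s" + i);
        }
        // Sentence 10 is at zero-based index 9.
        System.out.println(windowText(sentences, 9, 1)); // prints "s9 s10 s11"
        System.out.println(windowText(sentences, 9, 0)); // prints "s10"
    }
}
```

A buffer size of 0 in this sketch corresponds to comparing sentences in isolation.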
-
maxTokens
public final Integer maxTokens()
The maximum number of tokens that a chunk can contain.
- Returns:
- The maximum number of tokens that a chunk can contain.
-
toBuilder
public SemanticChunkingConfiguration.Builder toBuilder()
- Specified by:
toBuilder in interface ToCopyableBuilder<SemanticChunkingConfiguration.Builder,SemanticChunkingConfiguration>
-
builder
public static SemanticChunkingConfiguration.Builder builder()
-
serializableBuilderClass
public static Class<? extends SemanticChunkingConfiguration.Builder> serializableBuilderClass()
-
equalsBySdkFields
public final boolean equalsBySdkFields(Object obj)
- Specified by:
equalsBySdkFields in interface SdkPojo
-
toString
public final String toString()
Returns a string representation of this object. This is useful for testing and debugging. Sensitive data will be redacted from this string using a placeholder value.