package text


Type Members

  1. class Dictionary extends Serializable

    Helps build a dictionary, either from tokenized text or from a saved dictionary.

  2. class LabeledSentence[T] extends Sentence[T]

    Represents a sentence together with its label.

  3. class LabeledSentenceToSample[T] extends Transformer[LabeledSentence[T], Sample[T]]

    If oneHot = true, transforms labeled sentences into one-hot format samples, e.g.

      sentence._data: [0, 2, 3]
      sentence._label: [2, 3, 1]
      vocabLength: 4
      => input: [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
         target: [3, 4, 2]

    Otherwise, the model will use LookupTable for word embedding:

      => input: [1, 2, 3]
         label: [2, 3, 4]

    The input is an iterator of LabeledSentence; the output is an iterator of Sample.
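    The one-hot case can be sketched in plain Scala. This is only an illustration of the arithmetic in the example, not the BigDL implementation: OneHotSketch, oneHot and targets are hypothetical names, and the shift of labels to 1-based targets is inferred from the example values.

```scala
object OneHotSketch {
  // Hypothetical helper, not part of BigDL: one row per word,
  // with a 1 at the word's 0-based index and 0 elsewhere.
  def oneHot(data: Seq[Int], vocabLength: Int): Seq[Seq[Int]] =
    data.map { idx =>
      require(idx >= 0 && idx < vocabLength, s"index $idx outside vocabulary")
      Seq.tabulate(vocabLength)(i => if (i == idx) 1 else 0)
    }

  // Assumption drawn from the example: targets are the labels shifted by one.
  def targets(label: Seq[Int]): Seq[Int] = label.map(_ + 1)
}
```

    Calling OneHotSketch.oneHot(Seq(0, 2, 3), 4) and OneHotSketch.targets(Seq(2, 3, 1)) reproduces the input and target of the one-hot example.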

  4. class SentenceBiPadding extends Transformer[String, String]

    x => ["start", x, "end"]
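    A minimal sketch of the padding idea in plain Scala. The marker strings come from the description above; joining them to the sentence with single spaces is an assumption, and BiPaddingSketch.pad is a hypothetical name rather than the library's API.

```scala
object BiPaddingSketch {
  // Hypothetical helper: wraps a sentence in start and end markers.
  // The space separator is an assumption, not taken from the source.
  def pad(sentence: String, start: String = "start", end: String = "end"): String =
    s"$start $sentence $end"
}
```

    Marking both ends lets a downstream language model learn where sentences begin and finish.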

  5. class SentenceSplitter extends Transformer[String, Array[String]]

    Takes a sequence of strings and cuts it into sentences. The sentenceDetector is an API from OpenNLP. If sentFile is None, the default sentence delimiter is a period.
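    The fallback behaviour (no sentFile, period as delimiter) might look roughly like the following plain-Scala sketch. It does not use OpenNLP, PeriodSplitSketch is a hypothetical name, and dropping the period and surrounding whitespace is an assumption about the output format.

```scala
object PeriodSplitSketch {
  // Hypothetical helper: splits on '.', trims whitespace,
  // and drops empty fragments such as the one after a trailing period.
  def split(text: String): Array[String] =
    text.split('.').map(_.trim).filter(_.nonEmpty)
}
```

    A plain period split breaks on abbreviations and decimal numbers, which is the kind of case a trained sentence detector is meant to handle.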

  6. class SentenceTokenizer extends Transformer[String, Array[String]]

    Transformer that tokenizes a Document (article) into a Seq[Seq[String]]

  7. class TextToLabeledSentence[T] extends Transformer[Array[String], LabeledSentence[T]]

    Transforms a tokenized sentence into a LabeledSentence, e.g. ["I", "love", "Intel"] => [0, 1, 2], which gives data: [0, 1] and label: [1, 2].

    The input Array[String] should be a tokenized sentence. e.g. I love Intel => ["I", "love", "Intel"]
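    The data/label split in the example can be sketched in plain Scala as follows. The names are hypothetical, and a plain Map stands in for the dictionary lookup, so this shows the indexing pattern only, not the BigDL classes.

```scala
object LabeledSentenceSketch {
  // Hypothetical stand-in for a dictionary lookup: word -> 0-based index.
  // Throws NoSuchElementException for a word missing from vocab.
  def toIndices(tokens: Array[String], vocab: Map[String, Int]): Array[Int] =
    tokens.map(vocab)

  // data drops the last index and label drops the first,
  // so label(i) is the word that follows data(i).
  def split(indices: Array[Int]): (Array[Int], Array[Int]) =
    (indices.dropRight(1), indices.drop(1))
}
```

    With vocab = Map("I" -> 0, "love" -> 1, "Intel" -> 2), the tokens ["I", "love", "Intel"] map to [0, 1, 2] and split into data [0, 1] and label [1, 2], matching the example.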

Value Members

  1. object Dictionary extends Serializable
  2. object LabeledSentenceToSample extends Serializable
  3. object SentenceBiPadding extends Serializable
  4. object SentenceSplitter extends Serializable
  5. object SentenceTokenizer extends Serializable
  6. object TextToLabeledSentence extends Serializable
