com.twitter
Class Extractor

java.lang.Object
  extended by com.twitter.Extractor

public class Extractor
extends Object

A class to extract usernames, lists, hashtags and URLs from Tweet text.


Nested Class Summary
static class Extractor.Entity
           
 
Field Summary
protected  boolean extractURLWithoutProtocol
           
 
Constructor Summary
Extractor()
          Create a new extractor.
 
Method Summary
 List<String> extractCashtags(String text)
          Extract $cashtag references from Tweet text.
 List<Extractor.Entity> extractCashtagsWithIndices(String text)
          Extract $cashtag references from Tweet text.
 List<Extractor.Entity> extractEntitiesWithIndices(String text)
          Extract URLs, @mentions, lists and #hashtags from the given Tweet text.
 List<String> extractHashtags(String text)
          Extract #hashtag references from Tweet text.
 List<Extractor.Entity> extractHashtagsWithIndices(String text)
          Extract #hashtag references from Tweet text.
 List<String> extractMentionedScreennames(String text)
          Extract @username references from Tweet text.
 List<Extractor.Entity> extractMentionedScreennamesWithIndices(String text)
          Extract @username references from Tweet text.
 List<Extractor.Entity> extractMentionsOrListsWithIndices(String text)
           
 String extractReplyScreenname(String text)
          Extract a @username reference from the beginning of Tweet text.
 List<String> extractURLs(String text)
          Extract URL references from Tweet text.
 List<Extractor.Entity> extractURLsWithIndices(String text)
          Extract URL references from Tweet text.
 boolean isExtractURLWithoutProtocol()
           
 void modifyIndicesFromUnicodeToUTF16(String text, List<Extractor.Entity> entities)
           
 void modifyIndicesFromUTF16ToToUnicode(String text, List<Extractor.Entity> entities)
           
 void setExtractURLWithoutProtocol(boolean extractURLWithoutProtocol)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

extractURLWithoutProtocol

protected boolean extractURLWithoutProtocol

Constructor Detail

Extractor

public Extractor()
Create a new extractor.

Method Detail

extractEntitiesWithIndices

public List<Extractor.Entity> extractEntitiesWithIndices(String text)
Extract URLs, @mentions, lists and #hashtags from the given Tweet text.

Parameters:
text - text of tweet
Returns:
list of extracted entities

extractMentionedScreennames

public List<String> extractMentionedScreennames(String text)
Extract @username references from Tweet text. A mention is an occurrence of @username anywhere in a Tweet.

Parameters:
text - of the tweet from which to extract usernames
Returns:
List of usernames referenced (without the leading @ sign)
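
The behavior described above can be illustrated with a minimal, standard-library-only sketch. It is not the library's implementation: the class name MentionSketch, the method mentionedScreennames, and the simplified pattern (1 to 15 ASCII letters, digits, or underscores, with the @ not directly following a word character) are assumptions made for illustration, and the real Extractor may accept or reject different inputs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MentionSketch {
    // Assumed, simplified pattern: '@' not preceded by a word character,
    // followed by 1-15 ASCII letters, digits, or underscores.
    private static final Pattern MENTION =
            Pattern.compile("(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})");

    // Returns the usernames found in the text, without the leading @ sign.
    public static List<String> mentionedScreennames(String text) {
        List<String> names = new ArrayList<>();
        if (text == null) {
            return names;
        }
        Matcher m = MENTION.matcher(text);
        while (m.find()) {
            names.add(m.group(1)); // group 1 excludes the '@'
        }
        return names;
    }

    public static void main(String[] args) {
        // The address "a@b.com" is skipped because '@' follows a letter.
        System.out.println(mentionedScreennames("hi @alice and @bob_99, mail me at a@b.com"));
    }
}
```

The negative lookbehind is there only so that the sketch does not treat the domain part of an e-mail address as a mention.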

extractMentionedScreennamesWithIndices

public List<Extractor.Entity> extractMentionedScreennamesWithIndices(String text)
Extract @username references from Tweet text. A mention is an occurrence of @username anywhere in a Tweet.

Parameters:
text - of the tweet from which to extract usernames
Returns:
List of entities for the usernames referenced (values without the leading @ sign)

extractMentionsOrListsWithIndices

public List<Extractor.Entity> extractMentionsOrListsWithIndices(String text)

extractReplyScreenname

public String extractReplyScreenname(String text)
Extract a @username reference from the beginning of Tweet text. A reply is an occurrence of @username at the beginning of a Tweet, preceded by 0 or more spaces.

Parameters:
text - of the tweet from which to extract the replied-to username
Returns:
username referenced, if any (without the leading @ sign). Returns null if this is not a reply.
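
The reply rule above (a @username at the start of the text, preceded by zero or more spaces, otherwise null) can be sketched with the standard library alone. The class ReplySketch, the method replyScreenname, and the simplified username pattern are hypothetical and only approximate what the library does.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplySketch {
    // Assumed, simplified pattern: start of text, zero or more spaces,
    // then '@' and 1-15 ASCII letters, digits, or underscores.
    private static final Pattern REPLY =
            Pattern.compile("^ *@([A-Za-z0-9_]{1,15})");

    // Returns the replied-to username without the '@', or null if the
    // text does not start with a mention.
    public static String replyScreenname(String text) {
        if (text == null) {
            return null;
        }
        Matcher m = REPLY.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(replyScreenname("  @alice thanks!")); // alice
        System.out.println(replyScreenname("thanks @alice"));    // null
    }
}
```

Because the method returns null for non-replies, callers should check the result before using it.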

extractURLs

public List<String> extractURLs(String text)
Extract URL references from Tweet text.

Parameters:
text - of the tweet from which to extract URLs
Returns:
List of URLs referenced.

extractURLsWithIndices

public List<Extractor.Entity> extractURLsWithIndices(String text)
Extract URL references from Tweet text.

Parameters:
text - of the tweet from which to extract URLs
Returns:
List of entities for the URLs referenced.

extractHashtags

public List<String> extractHashtags(String text)
Extract #hashtag references from Tweet text.

Parameters:
text - of the tweet from which to extract hashtags
Returns:
List of hashtags referenced (without the leading # sign)

extractHashtagsWithIndices

public List<Extractor.Entity> extractHashtagsWithIndices(String text)
Extract #hashtag references from Tweet text.

Parameters:
text - of the tweet from which to extract hashtags
Returns:
List of entities for the hashtags referenced (values without the leading # sign)
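
To show what a "with indices" result might carry, here is a small sketch that pairs each hashtag value with its character offsets. The Span class is a hypothetical stand-in for Extractor.Entity, and the sketch assumes that offsets cover the # sign while the value omits it; the hashtag pattern is deliberately simplified and is not the library's.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HashtagIndexSketch {
    // Hypothetical stand-in for an extracted entity:
    // the value plus half-open [start, end) UTF-16 offsets.
    public static final class Span {
        public final int start;
        public final int end;
        public final String value;

        Span(int start, int end, String value) {
            this.start = start;
            this.end = end;
            this.value = value;
        }

        @Override
        public String toString() {
            return value + "[" + start + "," + end + ")";
        }
    }

    // Assumed, simplified pattern: '#' followed by ASCII word characters.
    private static final Pattern HASHTAG = Pattern.compile("#([A-Za-z0-9_]+)");

    public static List<Span> hashtagsWithIndices(String text) {
        List<Span> spans = new ArrayList<>();
        if (text == null) {
            return spans;
        }
        Matcher m = HASHTAG.matcher(text);
        while (m.find()) {
            // Offsets include the '#'; the value (group 1) does not.
            spans.add(new Span(m.start(), m.end(), m.group(1)));
        }
        return spans;
    }

    public static void main(String[] args) {
        System.out.println(hashtagsWithIndices("Learning #java and #regex"));
        // prints [java[9,14), regex[19,25)]
    }
}
```

Offsets like these let a caller wrap the matched text in a link without searching the string again.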

extractCashtags

public List<String> extractCashtags(String text)
Extract $cashtag references from Tweet text.

Parameters:
text - of the tweet from which to extract cashtags
Returns:
List of cashtags referenced (without the leading $ sign)

extractCashtagsWithIndices

public List<Extractor.Entity> extractCashtagsWithIndices(String text)
Extract $cashtag references from Tweet text.

Parameters:
text - of the tweet from which to extract cashtags
Returns:
List of entities for the cashtags referenced (values without the leading $ sign)

setExtractURLWithoutProtocol

public void setExtractURLWithoutProtocol(boolean extractURLWithoutProtocol)

isExtractURLWithoutProtocol

public boolean isExtractURLWithoutProtocol()

modifyIndicesFromUnicodeToUTF16

public void modifyIndicesFromUnicodeToUTF16(String text,
                                            List<Extractor.Entity> entities)

modifyIndicesFromUTF16ToToUnicode

public void modifyIndicesFromUTF16ToToUnicode(String text,
                                              List<Extractor.Entity> entities)
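
The two index-conversion methods above carry no description. Assuming "Unicode" indices count code points and "UTF16" indices count Java char units, the underlying arithmetic can be sketched with standard String methods. The class IndexConversionSketch and its method names are hypothetical, and the sketch converts a single index rather than updating a list of entities in place.

```java
public class IndexConversionSketch {
    // Code-point index -> UTF-16 (char) index.
    public static int codePointToUtf16(String text, int codePointIndex) {
        return text.offsetByCodePoints(0, codePointIndex);
    }

    // UTF-16 (char) index -> code-point index.
    public static int utf16ToCodePoint(String text, int utf16Index) {
        return text.codePointCount(0, utf16Index);
    }

    public static void main(String[] args) {
        // U+1F600 is one code point but two UTF-16 chars (a surrogate pair).
        String text = "\uD83D\uDE00 #hi";
        int utf16 = text.indexOf('#');                  // 3
        int codePoint = utf16ToCodePoint(text, utf16);  // 2
        System.out.println(utf16 + " -> " + codePoint);
        System.out.println(codePoint + " -> " + codePointToUtf16(text, codePoint));
    }
}
```

The two index systems agree for text made only of characters in the Basic Multilingual Plane; they diverge after the first supplementary character such as an emoji.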


Copyright © 2014. All Rights Reserved.