Class TweetParser

java.lang.Object
TweetParser

public class TweetParser
extends java.lang.Object
TweetParser.csvFileToTrainingData() takes the path to a CSV file of tweets and iterates through the file one tweet at a time, removing the parts of each tweet that would be bad inputs to MarkovChain (for example, URLs). It then splits the tweets into sentences and returns those sentences as lists of cleaned-up words.

Note: TweetParser's public methods are csvFileToTrainingData() and getPunctuation(); these are the only methods that other classes should call. All other methods are helpers that build up the code you'll need to write those public methods. They have package (default, no modifier) visibility, which lets us write test cases for them as long as those test cases are in the same package.
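The cleanup described above can be sketched for a single tweet. This is a minimal sketch, not TweetParser's implementation. The class TweetCleanupSketch and its cleanTweet method are hypothetical. It assumes that URLs start with http:// or https://, that sentences end at '.', '!', '?' or ';', and that a word keeps only lowercase letters, digits and apostrophes.

```java
import java.util.ArrayList;
import java.util.List;

public class TweetCleanupSketch {
    // Hypothetical sentence-ending marks; TweetParser.getPunctuation() may differ.
    private static final String SENTENCE_BOUNDARY = "[.!?;]";

    // Turns one raw tweet into sentences, each a list of lowercase words.
    static List<List<String>> cleanTweet(String tweet) {
        // 1. Drop URLs (assumed to start with http:// or https:// and run to the next whitespace).
        String noUrls = tweet.replaceAll("https?://\\S*", " ");
        List<List<String>> sentences = new ArrayList<>();
        // 2. Split into sentences at the assumed punctuation marks.
        for (String sentence : noUrls.split(SENTENCE_BOUNDARY)) {
            List<String> words = new ArrayList<>();
            // 3. Lowercase and split on runs of whitespace.
            for (String token : sentence.toLowerCase().trim().split("\\s+")) {
                // 4. Strip everything except letters, digits and apostrophes.
                String word = token.replaceAll("[^a-z0-9']", "");
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
            // 5. Skip sentences that end up with no words.
            if (!words.isEmpty()) {
                sentences.add(words);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        System.out.println(cleanTweet("Loving this! See https://t.co/abc123 now. #Fun"));
    }
}
```

For the sample tweet in main, this prints [[loving, this], [see, now], [fun]]. URLs are removed before splitting so that the dots inside them are not mistaken for sentence boundaries.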
  • Constructor Summary

    Constructors 
    Constructor Description
    TweetParser()  
  • Method Summary

    Modifier and Type Method Description
    static java.util.List<java.util.List<java.lang.String>> csvFileToTrainingData(java.lang.String pathToCSVFile, int tweetColumn)
    Given a path to a CSV file and the column from which to extract the tweet data, computes a training set.
    static char[] getPunctuation()
    Returns an array of the punctuation marks used by the parser.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

  • Method Details

    • getPunctuation

      public static char[] getPunctuation()
      Returns:
      an array containing the punctuation marks used by the parser.
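One way to write a method like getPunctuation() is to keep the marks in a private static array and return a copy, so that callers cannot change the parser's own set. The sketch below assumes that design and a hypothetical set of marks ('.', '?', '!', ';'). The documentation does not say which marks TweetParser actually uses or whether it returns a copy, and the class PunctuationSketch and its isPunctuation helper are hypothetical.

```java
import java.util.Arrays;

public class PunctuationSketch {
    // Hypothetical set of marks; the real TweetParser array may contain others.
    private static final char[] PUNCTUATION = {'.', '?', '!', ';'};

    // Returns a copy so callers cannot modify the shared array.
    static char[] getPunctuation() {
        return PUNCTUATION.clone();
    }

    // Reports whether c is one of the punctuation marks.
    static boolean isPunctuation(char c) {
        for (char p : PUNCTUATION) {
            if (p == c) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        char[] marks = getPunctuation();
        marks[0] = 'x'; // changes only the caller's copy
        System.out.println(Arrays.toString(getPunctuation())); // [., ?, !, ;]
        System.out.println(isPunctuation('!')); // true
    }
}
```

Returning the static array directly would be cheaper, but any caller could then overwrite an entry and silently change how every later tweet is split.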
    • csvFileToTrainingData

      public static java.util.List<java.util.List<java.lang.String>> csvFileToTrainingData(java.lang.String pathToCSVFile, int tweetColumn)
      Given a path to a CSV file and the column from which to extract the tweet data, computes a training set. The training set is a list of sentences, each of which is a list of words. The sentences are cleaned up by removing URLs and non-word characters, converting all words to lowercase, and stripping out punctuation.
      Parameters:
      pathToCSVFile - a String representing a path to a CSV file containing tweets
      tweetColumn - the number of the column in the CSV file that contains the tweet
      Returns:
      a list of training data examples, where each example is a sentence represented as a list of cleaned-up words
      Throws:
      java.lang.IllegalArgumentException - if pathToCSVFile is null or if the file doesn't exist
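Before any cleanup, the method has to pull the tweet text out of each CSV row. The following is a minimal sketch of that step under several assumptions. The class CsvColumnSketch and its methods are hypothetical, tweetColumn is treated as zero-based, fields may be wrapped in double quotes (with "" as an escaped quote) but never span lines, and a header row, if present, is not skipped. It throws IllegalArgumentException for a null path or a missing file, as documented above; unlike the documented method, it declares IOException for read errors.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class CsvColumnSketch {
    // Splits one CSV line into fields, honoring double-quoted fields that may
    // contain commas; "" inside quotes is an escaped quote.
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    field.append('"');
                    i++; // skip the second quote of the escaped pair
                } else if (c == '"') {
                    inQuotes = false;
                } else {
                    field.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields;
    }

    // Reads the given (assumed zero-based) column from every line, throwing
    // IllegalArgumentException for a null path or a missing file.
    static List<String> readColumn(String pathToCSVFile, int tweetColumn) throws IOException {
        if (pathToCSVFile == null) {
            throw new IllegalArgumentException("pathToCSVFile is null");
        }
        Path path = Paths.get(pathToCSVFile);
        if (!Files.exists(path)) {
            throw new IllegalArgumentException("file does not exist: " + pathToCSVFile);
        }
        List<String> tweets = new ArrayList<>();
        for (String line : Files.readAllLines(path)) {
            List<String> fields = splitCsvLine(line);
            // Skip malformed rows that lack the requested column.
            if (tweetColumn >= 0 && tweetColumn < fields.size()) {
                tweets.add(fields.get(tweetColumn));
            }
        }
        return tweets;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("tweets", ".csv");
        Files.write(tmp, List.of("1,\"Hello, world!\"", "2,\"She said \"\"hi\"\"\""));
        System.out.println(readColumn(tmp.toString(), 1));
        Files.delete(tmp);
    }
}
```

Running main prints [Hello, world!, She said "hi"]. A plain line.split(",") would break the first row apart at the comma inside the quoted tweet, which is why the sketch tracks quote state character by character.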