Class TweetParser
java.lang.Object
TweetParser
public class TweetParser
extends java.lang.Object
TweetParser.csvFileToTrainingData() takes in a CSV file that contains tweets
and iterates through the file, one tweet at a time, removing parts of the
tweets that would be bad inputs to MarkovChain (for example, a URL). It then
parses tweets into sentences and returns those sentences as lists of
cleaned-up words.
Note: TweetParser's public methods are csvFileToTrainingData() and
getPunctuation(). These are the only methods that other classes should call.
All of the other methods are helpers that you will combine to implement
those public methods. They have package-private (default, no modifier)
visibility, which lets us write test cases for them as long as those test
cases are in the same package.
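As a rough illustration of the clean-up described above, the sketch below turns one raw tweet into sentences of lower-cased words. It is not TweetParser's actual code: the class name TweetCleanupSketch, the method cleanTweet, the URL pattern, the choice of '.', '!' and '?' as sentence terminators, and the rule of stripping every non-letter character are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TweetCleanupSketch {

    // Hypothetical helper (not part of TweetParser): cleans a single tweet.
    static List<List<String>> cleanTweet(String tweet) {
        // Assumption: a URL is "http://" or "https://" followed by non-whitespace.
        String noUrls = tweet.replaceAll("https?://\\S+", " ");

        List<List<String>> sentences = new ArrayList<>();
        // Assumption: '.', '!' and '?' end a sentence.
        for (String rawSentence : noUrls.split("[.!?]+")) {
            List<String> words = new ArrayList<>();
            for (String token : rawSentence.trim().split("\\s+")) {
                // Assumption: strip every non-letter character, then lower-case.
                String word = token.replaceAll("[^A-Za-z]", "").toLowerCase(Locale.ROOT);
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
            // Skip sentences that contain no usable words.
            if (!words.isEmpty()) {
                sentences.add(words);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // prints [[hello, world], [read, this, now], [bye]]
        System.out.println(cleanTweet("Hello, World! Read this http://example.com/a.b now. Bye"));
    }
}
```

The URL is removed before the text is split into sentences because the dots inside a URL would otherwise be mistaken for sentence boundaries.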
Constructor Summary
Constructor: TweetParser()
Description: (none)
Method Summary
Modifier and Type: static java.util.List<java.util.List<java.lang.String>>
Method: csvFileToTrainingData(java.lang.String pathToCSVFile, int tweetColumn)
Description: Given a path to a CSV file and the column from which to extract the tweet data, computes a training set.

Modifier and Type: static char[]
Method: getPunctuation()
Constructor Details
TweetParser
public TweetParser()
Method Details
getPunctuation
public static char[] getPunctuation()
Returns:
an array containing the punctuation marks used by the parser.
csvFileToTrainingData
public static java.util.List<java.util.List<java.lang.String>> csvFileToTrainingData(java.lang.String pathToCSVFile, int tweetColumn)
Given a path to a CSV file and the column from which to extract the tweet data, computes a training set. The training set is a list of sentences, each of which is a list of words. The sentences have been cleaned up by removing URLs and non-word characters, putting all words into lower case, and stripping out punctuation.
Parameters:
pathToCSVFile - a String representing a path to a CSV file containing tweets
tweetColumn - the number of the column in the CSV file that contains the tweet
Returns:
a list of training data examples
Throws:
java.lang.IllegalArgumentException - if pathToCSVFile is null or if the file doesn't exist
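
One way to satisfy the documented Throws contract is to validate the path before reading anything. The sketch below is only an assumption about how such a check might look; the class PathCheckSketch and the method requireExistingFile are hypothetical names and are not part of TweetParser.

```java
import java.io.File;

class PathCheckSketch {

    // Hypothetical guard mirroring the documented contract:
    // reject a null path or a path that does not exist.
    static void requireExistingFile(String pathToCSVFile) {
        if (pathToCSVFile == null) {
            throw new IllegalArgumentException("pathToCSVFile is null");
        }
        if (!new File(pathToCSVFile).exists()) {
            throw new IllegalArgumentException("file does not exist: " + pathToCSVFile);
        }
    }

    public static void main(String[] args) {
        try {
            requireExistingFile(null);
        } catch (IllegalArgumentException e) {
            // prints "rejected: pathToCSVFile is null"
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Checking up front keeps the failure mode consistent: callers see IllegalArgumentException for a bad path rather than a lower-level I/O exception raised partway through parsing.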