CIS 1200

TwitterBot FAQ

Setup and Preparation

How do I set up my file system in IntelliJ?

Take a look at our IntelliJ Set-Up Guide and feel free to come to Office Hours if you run into any issues!

General

How do I use an iterator with a list?

You can make an iterator by calling yourListName.iterator(). An example is below:

// Requires java.util.ArrayList, java.util.Iterator, and java.util.List.
List<String> tweets = new ArrayList<>();

Iterator<String> iter = tweets.iterator();

Now you can call iter.hasNext() to check whether there is another element left in the list, and iter.next() to get the next element in the list.

Take a look at the iterator interface to see all methods.
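As a rough illustration of the usual hasNext()/next() loop, here is a minimal sketch. The class name IteratorDemo and the helper drain are made up for this example and are not part of the assignment.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; not part of the assignment files.
class IteratorDemo {
    // Collects every element the iterator produces, in order.
    static List<String> drain(Iterator<String> iter) {
        List<String> seen = new ArrayList<>();
        while (iter.hasNext()) {      // true while elements remain
            seen.add(iter.next());    // returns the next element and advances
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> tweets = List.of("hello world", "java is fun");
        System.out.println(drain(tweets.iterator())); // prints [hello world, java is fun]
    }
}
```

Note that calling next() when hasNext() is false throws a NoSuchElementException, which is why the loop checks hasNext() first.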

Task 0: Understanding the Problem

What is a Markov Chain?

A Markov Chain is a specific type of model that describes a sequence or mapping of possible events. You can think of a Markov Chain as a bunch of nodes and arrows, where at any given node X, you have some probability of moving to another node Y. For a more in-depth explanation of how the Markov Chain algorithm works, check out this video made by one of your TAs (Bayley Tuch!!).

What is a CSV file?

A CSV (short for Comma-Separated Values) file is a common way of storing and representing data. Each line is one row, and the values in a row are separated by commas:

line1_value1, line1_value2, line1_valueN
line2_value1, line2_value2, line2_valueN
lineN_value1, lineN_value2, lineN_valueN

In other words, the data will look relatively tabular.  Think of it like an Excel spreadsheet or Google Sheets where instead of separating elements by cells, you’re separating them by commas and lines.  In this assignment, you’ll be working with CSV files and figuring out how to extract tweets and strings from each column.
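To make the "cells separated by commas" idea concrete, here is a small sketch that splits one CSV line into its columns. The class name CsvSplitDemo, the helper columns, and the sample line are all made up for illustration, and the sketch assumes values never contain commas themselves.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of the assignment files.
class CsvSplitDemo {
    // Splits one CSV line on commas. Assumes no quoted or escaped commas.
    // Note: String.split(",") drops trailing empty columns.
    static List<String> columns(String csvLine) {
        return Arrays.asList(csvLine.split(","));
    }

    public static void main(String[] args) {
        String line = "wrongColumn,wrong column,my tweet text";
        System.out.println(columns(line).get(2)); // prints my tweet text
    }
}
```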

Task 1: Testing

What exactly are we testing for each of the files?

Similar to previous assignments, you should have a thorough set of test cases.  

This includes (but is not limited to) normal cases, invalid inputs, longer cases, and other edge cases. A great way to determine what test cases are needed for each file/method is to read the comments above each file/method. They’ll usually give you a good hint as to what edge cases you’ll need to consider (e.g., null inputs, exceptions).

Take a look at our testing guide for a more detailed explanation and examples.

Do we have to test for an input of train being "" or " "?

You don’t need to handle those strings in any specific way.

What is the purpose of Collections.nCopies(100, 0) and indices.set(0, 1)?

This creates a list that looks like [1, 0, 0, 0, 0, 0, 0, 0, ...]. The leading 1 chooses the second start word alphabetically (index 1). Every 0 after it chooses index 0, which in this test is just the single word that follows the previous one. In more complicated cases, you will have more options than just index 0 when deciding which word should follow the current word in the sentence.
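Here is a short sketch of how such a list can be built. Collections.nCopies returns an immutable list, so this sketch assumes it is first copied into an ArrayList before set is called; the class name IndicesDemo and the helper buildIndices are made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; not part of the assignment files.
class IndicesDemo {
    static List<Integer> buildIndices() {
        // nCopies alone is immutable, so copy it into an ArrayList first.
        List<Integer> indices = new ArrayList<>(Collections.nCopies(100, 0));
        indices.set(0, 1); // replace the first 0 with a 1
        return indices;
    }

    public static void main(String[] args) {
        System.out.println(buildIndices().subList(0, 5)); // prints [1, 0, 0, 0, 0]
    }
}
```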

Task 2: FileLineIterator

What is a FileLineIterator? What are we supposed to implement here?

Recall the Iterator interface from lecture. An iterator allows us to process data (usually in the form of collections) step by step. In this file, we’re looking for you to implement an iterator that, given a BufferedReader, will process a file line by line.

As specified in the documentation, every Iterator needs implementations for next and hasNext, so you should start there.
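To show the general shape of a "read one line ahead" iterator, here is a rough sketch. It is not the assignment's FileLineIterator: the class name LineIteratorSketch and the helper advance are made up, the required constructors and JavaDoc behavior are not reproduced, and closing the reader is left out for brevity.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch; not the assignment's FileLineIterator.
class LineIteratorSketch implements Iterator<String> {
    private final BufferedReader reader;
    private String nextLine; // the line the next call to next() will return

    LineIteratorSketch(BufferedReader reader) {
        this.reader = reader;
        advance();
    }

    // Reads one line ahead; null means there is nothing left to return.
    private void advance() {
        try {
            nextLine = reader.readLine(); // null at end of file
        } catch (IOException e) {
            nextLine = null; // treat a read failure as "no more lines"
        }
    }

    @Override
    public boolean hasNext() {
        return nextLine != null;
    }

    @Override
    public String next() {
        if (nextLine == null) {
            throw new NoSuchElementException();
        }
        String current = nextLine;
        advance();
        return current;
    }

    public static void main(String[] args) {
        BufferedReader br = new BufferedReader(new StringReader("first\nsecond"));
        Iterator<String> it = new LineIteratorSketch(br);
        while (it.hasNext()) {
            System.out.println(it.next()); // prints first, then second
        }
    }
}
```

Reading one line ahead lets hasNext() answer without touching the reader, at the cost of one extra field.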

I’m getting an IOException when I close the reader?

This is normal; hasNext() should still return false afterwards.

What Exceptions should we throw?

The JavaDocs specify that if the file path leads to a nonexistent file or the BufferedReader cannot find the correct file, you should throw an IllegalArgumentException.
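One possible way to turn a bad file path into an IllegalArgumentException is sketched below. The class name ReaderFactorySketch and the helper open are made up for illustration; check the JavaDocs for the exact behavior your own methods need.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;

// Hypothetical sketch; not part of the assignment files.
class ReaderFactorySketch {
    static BufferedReader open(String filePath) {
        if (filePath == null) {
            throw new IllegalArgumentException("file path is null");
        }
        try {
            return new BufferedReader(new FileReader(filePath));
        } catch (FileNotFoundException e) {
            // Convert the checked exception into the one the caller expects.
            throw new IllegalArgumentException("file not found: " + filePath, e);
        }
    }

    public static void main(String[] args) {
        try {
            open("no/such/file.csv"); // assumed not to exist
        } catch (IllegalArgumentException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```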

How do we deal with an IOException when reading the file?

The JavaDocs specify that if there’s an IOException, you should not throw another exception. Just set the next line to null (i.e. hasNext() should return false).

When I reach the end of a file, will it throw an IOException?

No. At the end of the file, readLine() returns null rather than throwing an exception.

How should I handle tweets that have line breaks?

You can just ignore any text that occurs after a line break – just read the beginning of the tweet.

Task 3: TweetParser

How should I approach this task?

Follow the order of the methods!  In this file, each of the methods will progressively build (and reference) previously implemented methods.  First take a look at the given methods and their implementations.  Then, when approaching each of the next few methods, read through the JavaDoc comments to understand what exactly should be done for various edge cases.

Can there be commas within a tweet?

No - you can assume that a single tweet will not have commas in it and that commas are only used to separate tweets/lines.

Task 4: MarkovChain

Will I get an unexpected output in ListNumberGenerator?

If train() is called, nothing should happen besides setting up your iterator. If there’s an unnecessary call to reset() or pick() in train(), this will result in numbers being used up more quickly than expected.

How can I get a boolean value that tests whether or not a word is at the beginning of a new sentence?

You want to check if the word is in the collection of start words.

When should we return null vs. throw a NoSuchElementException in next()?

You should be able to tell if there is no more data in the file when you reach the end, without encountering an IOException. In those situations, you should throw a NoSuchElementException.

If you encounter an IOException, but you have a previous line prepared to return, you should return whatever you were going to return, then return null for the call immediately afterwards. Calls after this one won’t be tested.

Why do we allow start words that are not part of the chain in reset(String start)?

The user may not know (or care) whether the word is already in the training data, and this feature lets them try out the word and see whether it has any successors.  The walk / iteration won’t go anywhere if that start word doesn’t have any successors from the training data - it’ll terminate as soon as that start word is produced by next, producing a sentence of one word.

What does it mean to “Do nothing if the sentence is empty” in train()?

An empty sentence should be represented by an empty iterator – and not one with a single empty string as an element.
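The difference between the two cases can be seen in a small sketch. The class name EmptySentenceDemo and the helper count are made up for this example.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; not part of the assignment files.
class EmptySentenceDemo {
    // Counts how many elements the iterator produces.
    static int count(Iterator<String> it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Iterator<String> empty = Collections.emptyIterator(); // an empty sentence
        Iterator<String> notEmpty = List.of("").iterator();   // one word, the empty string
        System.out.println(empty.hasNext());    // prints false
        System.out.println(notEmpty.hasNext()); // prints true
    }
}
```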

Task 5: TwitterBot

ArrayIndexOutOfBoundsException when trying to run the main method in TwitterBot?

Do you correctly handle out-of-bounds column inputs in extractColumn?

An invalid upper bound would be any bound that is greater than or equal to the number of columns in the csvLine.

An invalid lower bound would be a negative bound.
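The two bounds checks can be sketched as follows. The helper columnOrNull is hypothetical and is not the assignment's extractColumn; returning null for an invalid column is an assumption made for this example, so follow the JavaDocs for the required behavior.

```java
// Hypothetical sketch; not the assignment's extractColumn.
class ColumnBoundsSketch {
    // Returns the value at the given column, or null if the column is out of range.
    static String columnOrNull(String csvLine, int column) {
        if (csvLine == null) {
            return null;
        }
        String[] parts = csvLine.split(",");
        // Invalid lower bound: negative. Invalid upper bound: >= number of columns.
        if (column < 0 || column >= parts.length) {
            return null;
        }
        return parts[column];
    }

    public static void main(String[] args) {
        System.out.println(columnOrNull("a,b,c", 2));  // prints c
        System.out.println(columnOrNull("a,b,c", 3));  // prints null
        System.out.println(columnOrNull("a,b,c", -1)); // prints null
    }
}
```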

OutOfMemoryError?

Make sure you call close() on all of your readers and writers after you’re done reading / writing to them.
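One way to make sure a reader is always closed is Java's try-with-resources statement, which calls close() automatically when the block ends. The sketch below is only one option; the class name CloseReaderDemo and the helper countLines are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical demo class; not part of the assignment files.
class CloseReaderDemo {
    // Counts the lines in the source, closing the reader when done.
    static int countLines(BufferedReader source) {
        int lines = 0;
        // The reader is closed automatically, even if readLine() throws.
        try (BufferedReader reader = source) {
            while (reader.readLine() != null) {
                lines++;
            }
        } catch (IOException e) {
            // Stop counting on a read error; the reader is still closed.
        }
        return lines;
    }

    public static void main(String[] args) {
        BufferedReader br = new BufferedReader(new StringReader("a\nb\nc"));
        System.out.println(countLines(br)); // prints 3
    }
}
```

An explicit reader.close() call inside its own try-catch works as well if you prefer to manage the reader by hand.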

NullPointerException?

You may be passing a null string to removeURLs when you call it in parseAndCleanTweet().

How does generateTweet handle edge cases for length?

Negative inputs should throw an IllegalArgumentException and lengths of 0 should return an empty string. All tweets should be punctuated at the end.

What should parseAndCleanTweet return if given an empty tweet?

It should return an empty list and not a list containing an empty list. The latter case is a list of length one whose single element is an empty list; it is not an empty list.

Make sure that you are not including unnecessary words and/or empty “sentences” in your accumulated List<List<String>>.
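The sketch below shows the difference between an empty list and a list holding one empty list, along with one possible way to leave empty sentences out. The class name EmptyListDemo and the helper dropEmpty are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of the assignment files.
class EmptyListDemo {
    // Returns a new list containing only the non-empty sentences.
    static List<List<String>> dropEmpty(List<List<String>> sentences) {
        List<List<String>> kept = new ArrayList<>();
        for (List<String> sentence : sentences) {
            if (!sentence.isEmpty()) {
                kept.add(sentence);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<String>> empty = new ArrayList<>();
        List<List<String>> oneEmptySentence = new ArrayList<>();
        oneEmptySentence.add(new ArrayList<>());

        System.out.println(empty.size());                       // prints 0
        System.out.println(oneEmptySentence.size());            // prints 1
        System.out.println(dropEmpty(oneEmptySentence).size()); // prints 0
    }
}
```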

Running/Submitting

When I try to run my TwitterBot I get a NullPointerException?

Remember that in TweetParser, you have methods like extractColumn and cleanWord that can return null. Remember to keep those null values out of your training data!

“compilation failed” error when submitting?

Make sure you’re not throwing any exceptions other than IllegalArgumentException anywhere in your code.

Also, make sure you surround every line that can throw an IOException (or one of its subclasses) with a try-catch.

“First failure is Invalid File Path/File Path Null” error when submitting?

This is saying you’re supposed to throw an IllegalArgumentException if the file path is null or doesn’t exist, but your code does not.