Class MarkovChain
java.lang.Object
MarkovChain
- All Implemented Interfaces:
java.util.Iterator<java.lang.String>
public class MarkovChain
extends java.lang.Object
implements java.util.Iterator<java.lang.String>
A Markov Chain is a data structure that tracks the frequency with which one
value follows another value in sequence. This project uses a MarkovChain to
model tweets by gathering the frequency information from a Twitter feed. We
can use the MarkovChain to generate "plausible" tweets by conducting a random
walk through the chain according to the frequencies. Please see the homework
instructions for more information on Markov Chains.
TRAINING:
An example: Suppose we train the MarkovChain on these two Strings that
represent tweets: "a table" and "A banana? A banana!"
We first "clean up" the tweets and parse them into individual sentences to
use as training data. This process will remove punctuation and put all words
into lower case, yielding these three sentences (written using OCaml list
notation):
[["a"; "table"]; ["a"; "banana"]; ["a"; "banana"]]
The MarkovChain that results from this training data maps each observed
string to a ProbabilityDistribution that is based on the recorded occurrences
of bigrams (adjacent words) in the data:
- "a" maps to "table":1, "banana":2
- "table" maps to null:1
- "banana" maps to null:2
"a" is followed by "table" one time and "banana" twice, "table" is the end of
a sentence once, and "banana" is the end of a sentence twice. NOTE: we use
null to mark the end of a sentence.
The MarkovChain also records a ProbabilityDistribution that contains the
frequencies with which words start any sentence. In this case, that
startWords data will just say that "a" started 3 sentences.
GENERATING A TWEET:
Once we have trained the Markov model, we can use it to generate a tweet.
Given a desired length of tweet (in characters), we repeatedly generate
sentences until the tweet is long enough.
To generate a sentence, we treat the MarkovChain as an iterator that
maintains state about the current word (i.e. the one that will be generated
by next()).
- the reset() method picks (at random) one of the startWords to be the
current word. We use reset() to start a new sentence.
- the next() method picks (at random) a successor of the current word
according to the current word's probability distribution. That successor will
be the new "current" word after the current one is returned by next().
In the example above, reset() sets the current word to "a" (the only choice
offered by startWord). Then: next(); // yields "a" (the start word) with
probability 3/3 next(); // yields "table" with probability 1/3 and "banana"
with probability "2/3" then the iterator is finished (the current word will
be null), since both "table" and "banana" appeared only at the end of
sentences.
The random choices are determined by a NumberGenerator.
-
Constructor Summary
Constructors Constructor Description MarkovChain()
No need to write any constructors.MarkovChain(NumberGenerator ng)
No need to write any constructors. -
Method Summary
Modifier and Type Method Description boolean
hasNext()
This method should check if there is another word to retrieve from the Markov Chain based on the current word of our walk.java.lang.String
next()
Returns either: (1) the most recent word passed to reset(word), or (2) a successor picked from the probability distribution associated with the word most recently returned by next()void
reset()
DO NOT EDIT THIS METHOD.void
reset(java.lang.String start)
Given a starting String, sets up the Iterator functionality such that: (1) the Markov Chain will begin a walk at start.void
train(java.util.Iterator<java.lang.String> sentence)
Adds a sentence's training data to the MarkovChain frequency information.
-
Constructor Details
-
MarkovChain
public MarkovChain()No need to write any constructors. They are provided for you. -
MarkovChain
No need to write any constructors. They are provided for you.- Parameters:
ng
- - A (non-null) NumberGenerator used to walk through the MarkovChain
-
-
Method Details
-
train
public void train(java.util.Iterator<java.lang.String> sentence)Adds a sentence's training data to the MarkovChain frequency information. This method is meant to be called multiple times. Each call to this method should provide this method with an Iterator that represents one sentence. If we were to train a Markov Chain with four unique sentences, we would convert each sentence into an iterator and call train() four times, once on each of the iterators. Look at the homework instructions for more details on bigrams. You should use the addBigram() method, provided below, in this method. Once you reach the last word of a sentence, add a bigram of that word and null. This will teach the Markov Chain that that word can be used to end a sentence. Do nothing if the sentence is empty.- Parameters:
sentence
- - an iterator representing one sentence of training data- Throws:
java.lang.IllegalArgumentException
- if the sentence Iterator is null
-
reset
public void reset(java.lang.String start)Given a starting String, sets up the Iterator functionality such that: (1) the Markov Chain will begin a walk at start. (2) the first call to next() made after calling reset(start) will return start. If start is null, then hasNext() should return false. start need not actually be a part of the chain (but it should have no successor).- Parameters:
start
- - the element that will be the first word in a walk on the Markov Chain.
-
reset
public void reset()DO NOT EDIT THIS METHOD. WE COMPLETED IT FOR YOU. Sets up the Iterator functionality with a random start word such that the MarkovChain will now move along a walk beginning with that start word. The first call to next() after calling reset() will return the random start word selected by this call to reset(). -
hasNext
public boolean hasNext()This method should check if there is another word to retrieve from the Markov Chain based on the current word of our walk. Your solution should be very short.- Specified by:
hasNext
in interfacejava.util.Iterator<java.lang.String>
- Returns:
- true if next() will return a String and false otherwise
-
next
public java.lang.String next()Returns either: (1) the most recent word passed to reset(word), or (2) a successor picked from the probability distribution associated with the word most recently returned by next()- Specified by:
next
in interfacejava.util.Iterator<java.lang.String>
- Returns:
- the next word in the MarkovChain (chosen at random via the number generator if it is a successor)
- Throws:
java.util.NoSuchElementException
- if there are no more words on the walk through the chain.
-