Package org.cis1200

Class CSV

java.lang.Object
org.cis1200.CSV

public class CSV extends Object
Operations for working with CSV data. For our purposes, CSV data is a series of text lines, each of which is considered to be a record consisting of fields separated by the comma ',' character. (Some variants of CSV allow for multi-line data representations, but we disregard that possibility here.) For example, the file files/illustrative_example.csv contains two CSV records, one on each line of the file:
 col0, col1, a table and a chair
 cola, colb, a banana! and a banana?
 
Each of the records in this example contains three fields, but there is no requirement that each record have the same number of fields.

There is one subtlety to parsing CSV records: to allow for the possibility of a field that contains the ',' character itself, CSV treats the double quote character '"' specially. A field enclosed in double quotes may contain commas; those commas are part of the field and do not separate fields.

For example, the following line has two fields, the first of which has a quoted comma:

 "this , is quoted",but there are none in this field
 
  • Constructor Details

    • CSV

      public CSV()
  • Method Details

    • parseRecord

      public static List<String> parseRecord(String csvLine)
      Parses one line of a CSV file as a record of fields separated by commas. Returns the sequence of fields as a list of Strings.

      The parser maintains a boolean state: whether it is in quotation mode.

      To process the csvLine, the parser scans through it character by character (the toCharArray() method might be useful here), accumulating the current field.

      We recommend using a StringBuilder to accumulate the String for each field. You can "empty" a StringBuilder by instantiating a new one with new StringBuilder().

      If the current character is not DOUBLE_QUOTES:

      • if the character is a COMMA and the parser isn't in quotation mode, then the COMMA ends the current field. The field is then added to the results list and the parser continues processing the next field (this COMMA is not part of any field).
      • otherwise, append the character to the current field, and continue.

      If the current character is DOUBLE_QUOTES, flip the quotation mode state: if the parser is in quotation mode, it leaves quotation mode, and vice versa. A DOUBLE_QUOTES character always marks either the start or the end of a quotation.

      Finally, no matter what state the parser is in, when it reaches the end of the line, it adds whatever partial field has been accumulated to the result list.
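
      The steps above can be sketched as follows. This is a minimal illustration rather than the reference solution: the class name ParseRecordSketch is hypothetical, the constants COMMA and DOUBLE_QUOTES are assumed to be chars, and the sketch assumes that quote characters themselves are not appended to the field and that whitespace is kept as-is.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name; COMMA and DOUBLE_QUOTES mirror the names used in the description.
class ParseRecordSketch {
    static final char COMMA = ',';
    static final char DOUBLE_QUOTES = '"';

    static List<String> parseRecord(String csvLine) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false; // the parser's quotation-mode state

        for (char c : csvLine.toCharArray()) {
            if (c == DOUBLE_QUOTES) {
                // Assumption: a quote only toggles the state and is not part of the field.
                inQuotes = !inQuotes;
            } else if (c == COMMA && !inQuotes) {
                // An unquoted comma ends the current field; the comma itself is dropped.
                fields.add(current.toString());
                current = new StringBuilder(); // "empty" the builder
            } else {
                // Ordinary character, or a comma inside a quotation.
                current.append(c);
            }
        }
        // End of line: whatever has been accumulated becomes the last field.
        fields.add(current.toString());
        return fields;
    }
}
```

      Under these assumptions, the quoted example line from the class description yields two fields, the first being "this , is quoted" with its comma intact, and an empty line yields a single empty field.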

      Kudos: Parsing quoted quotes

      Aside from quoting commas, it is also possible to quote quotes by writing a pair of quotes next to each other (somewhere inside quotes). (This is analogous to how, for Java, one must use a backslash to allow quotation marks inside a String literal as in: "\"".)

      For example, the following line has a quoted field that includes quoted quotes:

       "He said ""hello""!"
       

      For more examples, see the tests written for this kudos problem in CSVKudosTest and the file files/quotes_and_commas.csv.

      To parse a CSV line containing quotes in quotes, you will have to maintain an extra state: whether the last character seen was an unpaired DOUBLE_QUOTES character. Both states must be updated as each character is processed.
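
      One possible way to track the extra state is sketched below. It is an illustration under stated assumptions, not the expected solution: the class name QuotedQuotesSketch and the variable closingQuotePending are hypothetical, and the sketch assumes that a pair only counts when its first quote would otherwise have closed a quotation, so that "" on its own is an empty field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name; a sketch of parseRecord extended with quoted quotes.
class QuotedQuotesSketch {
    static final char COMMA = ',';
    static final char DOUBLE_QUOTES = '"';

    static List<String> parseRecord(String csvLine) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        // True when the previous character was a quote that (so far) closed a quotation.
        boolean closingQuotePending = false;

        for (char c : csvLine.toCharArray()) {
            if (c == DOUBLE_QUOTES) {
                if (closingQuotePending) {
                    // Second quote of a pair: emit one literal quote and stay in the quotation.
                    current.append(DOUBLE_QUOTES);
                    inQuotes = true;
                    closingQuotePending = false;
                } else if (inQuotes) {
                    // Tentatively close the quotation; the next character decides.
                    inQuotes = false;
                    closingQuotePending = true;
                } else {
                    // Opening quote.
                    inQuotes = true;
                }
            } else {
                // Any other character confirms that a pending quote really was a closing quote.
                closingQuotePending = false;
                if (c == COMMA && !inQuotes) {
                    fields.add(current.toString());
                    current = new StringBuilder();
                } else {
                    current.append(c);
                }
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

      With this sketch, the example line above parses to the single field He said "hello"!.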

      Parameters:
      csvLine - The line of text data to be treated as a CSV record
      Returns:
      The sequence of fields of the CSV record as a list of Strings.
    • extractColumn

      static String extractColumn(String csvLine, int csvColumn)
      Given a String that represents a CSV line and an int column index, returns the contents of that column. Columns are zero indexed.
      Parameters:
      csvLine - the String containing the CSV record
      csvColumn - the column index of the CSV field whose contents ought to be returned
      Returns:
      the field of csvLine corresponding to csvColumn
      Throws:
      IllegalArgumentException - if csvLine is null
      IndexOutOfBoundsException - if csvColumn is not a valid field index of the record
    • csvFieldsAtColumn

      static List<String> csvFieldsAtColumn(BufferedReader br, int csvColumn)
      Given a BufferedReader of CSV data and a column index, returns the list of all CSV fields appearing in that column.

      If a line has no field at the given index, it is skipped.

      If the line has a field at the given index, it should be returned as an element of the list.

      Parameters:
      br - a BufferedReader that represents tweets
      csvColumn - the index of the column in the CSV data
      Returns:
      a List of CSV fields (none of which is null)
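
      The two column helpers, extractColumn and csvFieldsAtColumn, can be sketched together on top of parseRecord. This is a rough illustration, not the reference implementation: the class name ColumnSketch is hypothetical, the parseRecord here is a simplified stand-in, and wrapping an IOException in an UncheckedIOException is an assumption, since the documentation does not say how read errors are handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name; sketches of extractColumn and csvFieldsAtColumn.
class ColumnSketch {
    // Simplified stand-in for CSV.parseRecord (basic quotation handling only).
    static List<String> parseRecord(String csvLine) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : csvLine.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current = new StringBuilder();
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    static String extractColumn(String csvLine, int csvColumn) {
        if (csvLine == null) {
            throw new IllegalArgumentException("csvLine must not be null");
        }
        // List.get throws IndexOutOfBoundsException for an invalid index.
        return parseRecord(csvLine).get(csvColumn);
    }

    static List<String> csvFieldsAtColumn(BufferedReader br, int csvColumn) {
        List<String> result = new ArrayList<>();
        try {
            String line;
            while ((line = br.readLine()) != null) {
                List<String> fields = parseRecord(line);
                // Skip lines that have no field at the requested index.
                if (csvColumn >= 0 && csvColumn < fields.size()) {
                    result.add(fields.get(csvColumn));
                }
            }
        } catch (IOException e) {
            // Assumption: surface read failures as an unchecked exception.
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

      For instance, reading the three lines a,b,c then d then e,f at column 1 gives the fields b and f; the middle line has no field at index 1 and is skipped.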