Class CSV
The file files/illustrative_example.csv contains two CSV records, one on each line of the file:
col0, col1, a table and a chair
cola, colb, a banana! and a banana?
Each of the records in this example contains three fields, but there is no requirement that each record have the same number of fields.
There is one subtlety to parsing CSV records: to allow for the possibility of a field that contains the ',' character itself, CSV treats the double quote character '"' specially. You can quote a field that contains commas.
For example, the following line has two fields, the first of which has a quoted comma:
"this , is quoted",but there are none in this field
Constructor Summary
Constructors
Method Summary
Modifier and Type, Method, Description:
- csvFieldsAtColumn(BufferedReader br, int csvColumn): Given a BufferedReader of CSV data and a column index, returns the list of all CSV fields appearing in that column.
- (package private) static String extractColumn(String csvLine, int csvColumn): Given a String that represents a CSV line and an int column index, returns the contents of that column.
- parseRecord(String csvLine): Parses one line of a CSV file as a record of fields separated by commas.
Constructor Details
CSV
public CSV()
Method Details
parseRecord
Parses one line of a CSV file as a record of fields separated by commas. Returns the sequence of fields as a list of Strings.
The parser maintains a boolean state: whether it is in quotation mode.
To process the csvLine, the parser scans through it character by character (the toCharArray() method might be useful here), accumulating the current field.
We recommend using a StringBuilder to accumulate the String for each field. You can "empty" a StringBuilder by instantiating a new one with new StringBuilder().
If the current character is not DOUBLE_QUOTES:
- if the character is a COMMA and the parser isn't in quotation mode, then the COMMA ends the current field. The field is then added to the results list and the parser continues processing the next field (this COMMA is not part of any field).
- otherwise, append the character to the current field, and continue.
If the current character is DOUBLE_QUOTES, flip the current quotation mode state: if the parser is in quotation mode, it leaves quotation mode, and vice versa. This is because a DOUBLE_QUOTES character either starts or ends a quotation.
Finally, no matter what state the parser is in, when it reaches the end of the line, it adds whatever partial field has been accumulated to the result list.
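The steps above can be sketched as follows. This is a minimal illustration, not the reference solution: the class name CsvRecordSketch is hypothetical, and the values of COMMA and DOUBLE_QUOTES are assumed to be ',' and '"'.

```java
import java.util.ArrayList;
import java.util.List;

class CsvRecordSketch {
    // Assumed values for the constants named in the description.
    static final char COMMA = ',';
    static final char DOUBLE_QUOTES = '"';

    static List<String> parseRecord(String csvLine) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false; // the single boolean state: quotation mode

        for (char c : csvLine.toCharArray()) {
            if (c == DOUBLE_QUOTES) {
                // A quote starts or ends a quotation; it is not part of the field.
                inQuotes = !inQuotes;
            } else if (c == COMMA && !inQuotes) {
                // An unquoted comma ends the current field.
                fields.add(current.toString());
                current = new StringBuilder(); // "empty" the builder
            } else {
                current.append(c);
            }
        }
        // End of line: add whatever has been accumulated, in any state.
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"this , is quoted\",but there are none in this field";
        for (String field : parseRecord(line)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

Note that a parser written this way keeps whitespace: for the line col0, col1, a table and a chair, the second field is " col1" with its leading space.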
Kudos: Parsing quoted quotes
Aside from quoting commas, it is also possible to quote quotes by writing a pair of quotes next to each other (somewhere inside quotes). (This is analogous to how, for Java, one must use a backslash to allow quotation marks inside a String literal, as in "\"".)
For example, the following line has a quoted field that includes quoted quotes:
"He said ""hello""!"
For more examples, see the tests written for this kudos problem in CSVKudosTest and the file files/quotes_and_commas.csv.
To parse a CSV line containing quotes in quotes, you will have to maintain an extra state: whether the last character seen was an unpaired DOUBLE_QUOTES character. You should also update the states according to the current character.
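One possible way to track the extra state is sketched below. It is an assumption-laden illustration rather than the expected solution: the class name CsvQuotedQuotesSketch and the flag name pendingQuote are hypothetical, the constant values are assumed, and the exact behavior required by CSVKudosTest may differ. Here a quote seen while in quotation mode is left unresolved until the next character shows whether it was the first half of a pair or the end of the quotation.

```java
import java.util.ArrayList;
import java.util.List;

class CsvQuotedQuotesSketch {
    // Assumed values for the constants named in the description.
    static final char COMMA = ',';
    static final char DOUBLE_QUOTES = '"';

    static List<String> parseRecord(String csvLine) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        // True when the last character was a quote seen inside a quotation
        // that has not yet been paired with a following quote.
        boolean pendingQuote = false;

        for (char c : csvLine.toCharArray()) {
            if (pendingQuote) {
                pendingQuote = false;
                if (c == DOUBLE_QUOTES) {
                    // Two quotes in a row inside quotes: one literal quote.
                    current.append(DOUBLE_QUOTES);
                    continue;
                }
                // The earlier quote really did end the quotation.
                inQuotes = false;
            }
            if (c == DOUBLE_QUOTES) {
                if (inQuotes) {
                    pendingQuote = true; // decide once the next character is seen
                } else {
                    inQuotes = true;
                }
            } else if (c == COMMA && !inQuotes) {
                fields.add(current.toString());
                current = new StringBuilder();
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Prints: He said "hello"!
        System.out.println(parseRecord("\"He said \"\"hello\"\"!\"").get(0));
    }
}
```

With this design an empty quoted field ("") still parses as an empty string, because the second quote is only treated as half of a pair when a third quote follows it.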
- Parameters:
csvLine - The line of text data to be treated as a CSV record
- Returns:
The sequence of fields of the CSV record as a list of Strings.
extractColumn
Given a String that represents a CSV line and an int column index, returns the contents of that column. Columns are zero indexed.
- Parameters:
csvLine - the String containing the CSV record
csvColumn - the column index of the CSV field whose contents ought to be returned
- Returns:
the field of csvLine corresponding to csvColumn
- Throws:
IllegalArgumentException - if csvLine is null
IndexOutOfBoundsException - if csvColumn is not a valid field index of the record
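The contract above might be met along these lines. This is a sketch under assumptions: the class name CsvExtractSketch is hypothetical, and the compact parseRecord helper stands in for the real parser so that the example is self-contained.

```java
import java.util.ArrayList;
import java.util.List;

class CsvExtractSketch {
    // Stand-in for parseRecord: splits on commas outside double quotes.
    static List<String> parseRecord(String csvLine) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : csvLine.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current = new StringBuilder();
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    static String extractColumn(String csvLine, int csvColumn) {
        if (csvLine == null) {
            throw new IllegalArgumentException("csvLine must not be null");
        }
        List<String> fields = parseRecord(csvLine);
        // Columns are zero indexed, so valid indices are 0 .. size - 1.
        if (csvColumn < 0 || csvColumn >= fields.size()) {
            throw new IndexOutOfBoundsException("no field at column " + csvColumn);
        }
        return fields.get(csvColumn);
    }

    public static void main(String[] args) {
        // Prints: colb
        System.out.println(extractColumn("cola,colb,a banana!", 1));
    }
}
```

The explicit range check is a design choice: List.get would also throw an IndexOutOfBoundsException, but checking first allows a message that names the offending column.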
csvFieldsAtColumn
Given a BufferedReader of CSV data and a column index, returns the list of all CSV fields appearing in that column.
If a line has no field at the given index, it is skipped.
If the line has a field at the given index, it should be returned as an element of the list.
- Parameters:
br - a BufferedReader that represents tweets
csvColumn - the index of the column in the CSV data
- Returns:
a List of CSV fields (none of which is null)
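A rough sketch of this behavior follows. The class name CsvColumnSketch is hypothetical, the compact parseRecord helper stands in for the real parser, and wrapping IOException in UncheckedIOException is an assumption, since the documentation above does not say how read errors are handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CsvColumnSketch {
    // Stand-in for parseRecord: splits on commas outside double quotes.
    static List<String> parseRecord(String csvLine) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : csvLine.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current = new StringBuilder();
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    static List<String> csvFieldsAtColumn(BufferedReader br, int csvColumn) {
        List<String> result = new ArrayList<>();
        try {
            String line;
            // readLine() returns null once the end of the data is reached.
            while ((line = br.readLine()) != null) {
                List<String> fields = parseRecord(line);
                // Skip lines that have no field at the requested index.
                if (csvColumn >= 0 && csvColumn < fields.size()) {
                    result.add(fields.get(csvColumn));
                }
            }
        } catch (IOException e) {
            // Assumption: surface read failures as an unchecked exception.
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String data = "a,b,c\nonly one field\nx,y";
        BufferedReader br = new BufferedReader(new StringReader(data));
        // Prints: [b, y]
        System.out.println(csvFieldsAtColumn(br, 1));
    }
}
```

Because skipped lines contribute nothing and parsed fields are never null, the returned list contains no null elements.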