public final class CsvFile extends Object
Represents a CSV file together with the ability to parse it from a CharSource.
The separator may be specified, allowing TSV files (tab-separated) and other similar formats to be parsed.
This class loads the entire CSV file into memory.
To process the CSV file row-by-row, use CsvIterator.
The CSV file format is a general-purpose comma-separated value format. The format is parsed line-by-line, with lines separated by CR, LF or CRLF. Each line can contain one or more fields. Each field is separated by a comma character (,) or tab. Any field may be quoted using a double quote at the start and end. A quoted field may additionally be prefixed by an equals sign. The content of a quoted field may include commas and additional double quotes. Two adjacent double quotes in a quoted field will be replaced by a single double quote. Quoted fields are not trimmed. Non-quoted fields are trimmed.
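The field rules above can be illustrated with a minimal, standard-library-only sketch. It is not the parser used by CsvFile: the class and method names are hypothetical, and two details are assumptions not stated above (whitespace before an opening quote is skipped, and any text between a closing quote and the next separator is ignored).

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the field rules described above.
// This is NOT the parser used by CsvFile; names and edge-case handling are assumptions.
final class CsvLineSketch {

  // Splits a single line into fields using the given separator.
  static List<String> parseLine(String line, char separator) {
    List<String> fields = new ArrayList<>();
    int len = line.length();
    int pos = 0;
    while (true) {
      // Look past leading whitespace to see whether the field is quoted (assumption).
      int i = pos;
      while (i < len && line.charAt(i) != separator && Character.isWhitespace(line.charAt(i))) {
        i++;
      }
      // A quoted field may be prefixed by an equals sign, as in ="001".
      if (i + 1 < len && line.charAt(i) == '=' && line.charAt(i + 1) == '"') {
        i++;
      }
      if (i < len && line.charAt(i) == '"') {
        StringBuilder content = new StringBuilder();
        i++; // skip the opening quote
        boolean closed = false;
        while (i < len) {
          char ch = line.charAt(i);
          if (ch != '"') {
            content.append(ch);
            i++;
          } else if (i + 1 < len && line.charAt(i + 1) == '"') {
            content.append('"'); // two adjacent quotes become one
            i += 2;
          } else {
            i++; // closing quote
            closed = true;
            break;
          }
        }
        if (!closed) {
          throw new IllegalArgumentException("Unterminated quoted field: " + line);
        }
        // Assumption: text between the closing quote and the next separator is ignored.
        while (i < len && line.charAt(i) != separator) {
          i++;
        }
        fields.add(content.toString()); // quoted content is kept untrimmed
      } else {
        int end = line.indexOf(separator, pos);
        if (end < 0) {
          end = len;
        }
        fields.add(line.substring(pos, end).trim()); // non-quoted fields are trimmed
        i = end;
      }
      if (i >= len) {
        return fields;
      }
      pos = i + 1; // step over the separator
    }
  }

  public static void main(String[] args) {
    System.out.println(parseLine("a, b ,=\"001\",\"say \"\"hi\"\"\"", ','));
  }
}
```

Passing '\t' as the separator applies the same rules to a tab-separated line.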
The first line may be treated as a header row. The header row is accessed separately from the data rows.
Blank lines are ignored. Lines may be commented with hash '#' or semicolon ';'.
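As a rough illustration of the line handling described above, the following standard-library sketch splits text on CR, LF or CRLF and drops blank and commented lines. The class and method names are hypothetical, and treating the comment marker as the first non-blank character of the line is an assumption; the library's exact rule is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: not the CsvFile implementation.
final class CsvContentLinesSketch {

  // Returns the lines that carry content, skipping blank and commented lines.
  static List<String> contentLines(String text) {
    List<String> result = new ArrayList<>();
    // Lines are separated by CRLF, CR or LF; CRLF is tried first so it counts as one break.
    for (String line : text.split("\r\n|\r|\n")) {
      String trimmed = line.trim();
      // Assumption: a comment marker is the first non-blank character of the line.
      if (trimmed.isEmpty() || trimmed.startsWith("#") || trimmed.startsWith(";")) {
        continue;
      }
      result.add(line);
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(contentLines("# comment\r\nh1,h2\n\n;note\ra,b\n"));
  }
}
```

When the file has a header row, the first remaining line would be the header and the rest the data rows.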
| Modifier and Type | Method and Description |
|---|---|
| boolean | containsHeader(Pattern headerPattern) - Checks if the header pattern is present in the file. |
| boolean | containsHeader(String header) - Checks if the header is present in the file. |
| boolean | containsHeaders(Collection<String> headers) - Checks if the headers are present in the file. |
| boolean | equals(Object obj) - Checks if this CSV file equals another. |
| static char | findSeparator(CharSource source) - Finds the separator used by the specified CSV file. |
| int | hashCode() - Returns a suitable hash code for the CSV file. |
| ImmutableList<String> | headers() - Gets the header row. |
| static CsvFile | of(CharSource source, boolean headerRow) - Parses the specified source as a CSV file, using a comma as the separator. |
| static CsvFile | of(CharSource source, boolean headerRow, char separator) - Parses the specified source as a CSV file where the separator is specified and might not be a comma. |
| static CsvFile | of(List<String> headers, List<? extends List<String>> rows) - Obtains an instance from a list of headers and rows. |
| static CsvFile | of(Reader reader, boolean headerRow) - Parses the specified reader as a CSV file, using a comma as the separator. |
| static CsvFile | of(Reader reader, boolean headerRow, char separator) - Parses the specified reader as a CSV file where the separator is specified and might not be a comma. |
| CsvRow | row(int index) - Gets a single row. |
| int | rowCount() - Gets the number of data rows. |
| ImmutableList<CsvRow> | rows() - Gets all data rows in the file. |
| String | toString() - Returns a string describing the CSV file. |
| CsvFile | withHeaders(List<String> headers) - Returns an instance with the specified headers. |
public static CsvFile of(CharSource source, boolean headerRow)
CSV files sometimes contain a Unicode Byte Order Mark.
Callers are responsible for handling this, such as by using UnicodeBom.
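One simple way a caller might handle a Byte Order Mark is sketched below using only the standard library. It assumes the content has already been decoded to a String and only removes a leading U+FEFF character; it does not detect the encoding from raw bytes. The class and method names are hypothetical, and the sketch does not call UnicodeBom, whose API is not shown here.

```java
// Illustrative sketch only: strips a decoded BOM character, nothing more.
final class BomSketch {

  // Removes a single leading U+FEFF character, if present.
  static String stripBom(String text) {
    if (!text.isEmpty() && text.charAt(0) == '\uFEFF') {
      return text.substring(1);
    }
    return text;
  }

  public static void main(String[] args) {
    System.out.println(stripBom("\uFEFFh1,h2"));
  }
}
```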
source - the CSV file resource
headerRow - whether the source has a header row, an empty source must still contain the header
UncheckedIOException - if an IO exception occurs
IllegalArgumentException - if the file cannot be parsed
public static CsvFile of(CharSource source, boolean headerRow, char separator)
This overload allows the separator to be controlled. For example, a tab-separated file is very similar to a CSV file; the only difference is the separator.
CSV files sometimes contain a Unicode Byte Order Mark.
Callers are responsible for handling this, such as by using UnicodeBom.
source - the file resource
headerRow - whether the source has a header row, an empty source must still contain the header
separator - the separator used to separate each field, typically a comma, but a tab is sometimes used
UncheckedIOException - if an IO exception occurs
IllegalArgumentException - if the file cannot be parsed
public static CsvFile of(Reader reader, boolean headerRow)
This factory method takes a Reader.
Callers are encouraged to use CharSource instead of Reader
as it allows the resource to be safely managed.
This factory method always uses a comma as the separator. To parse a file with a different separator, such as a tab-separated file, use the overload that accepts a separator.
CSV files sometimes contain a Unicode Byte Order Mark.
Callers are responsible for handling this, such as by using UnicodeBom.
reader - the file resource
headerRow - whether the source has a header row, an empty source must still contain the header
UncheckedIOException - if an IO exception occurs
IllegalArgumentException - if the file cannot be parsed
public static CsvFile of(Reader reader, boolean headerRow, char separator)
This factory method takes a Reader.
Callers are encouraged to use CharSource instead of Reader
as it allows the resource to be safely managed.
This factory method allows the separator to be controlled. For example, a tab-separated file is very similar to a CSV file; the only difference is the separator.
CSV files sometimes contain a Unicode Byte Order Mark.
Callers are responsible for handling this, such as by using UnicodeBom.
reader - the file resource
headerRow - whether the source has a header row, an empty source must still contain the header
separator - the separator used to separate each field, typically a comma, but a tab is sometimes used
UncheckedIOException - if an IO exception occurs
IllegalArgumentException - if the file cannot be parsed
public static char findSeparator(CharSource source)
The search includes comma, semicolon, colon, tab and pipe (in that order of priority).
The algorithm operates in a number of steps. Firstly, it looks for occurrences where a separator is followed by valid quoted text. If this matches, the separator is assumed to be correct. Secondly, it looks for lines that only consist of a separator. If this matches, the separator is assumed to be correct. Thirdly, it looks to see which separator is the most common on the line. If that separator is also the most common on the next line, and the number of columns matches, the separator is assumed to be correct. Otherwise another line is processed. Thus to match a separator, there must be two lines with the same number of columns. At most, 100 content lines are read from the file. The default is comma if the file is empty.
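The third step can be sketched as follows, using only the standard library. This is a simplified illustration, not the library's implementation: it skips the first two steps, counts separator characters naively without regard to quoting, and falls back to comma whenever no pair of lines agrees (the text above only states the default for an empty file). All names are hypothetical.

```java
import java.util.List;

// Simplified sketch of the third step only; quoting is ignored and names are hypothetical.
final class SeparatorGuessSketch {

  // Candidates in the priority order given above.
  private static final char[] CANDIDATES = {',', ';', ':', '\t', '|'};
  // At most this many content lines are examined.
  private static final int MAX_LINES = 100;

  // Counts occurrences of a character on a line.
  static int count(String line, char ch) {
    int n = 0;
    for (int i = 0; i < line.length(); i++) {
      if (line.charAt(i) == ch) {
        n++;
      }
    }
    return n;
  }

  // Finds the most common candidate on a line; ties go to the higher-priority candidate.
  static char mostCommon(String line) {
    char best = CANDIDATES[0];
    for (char candidate : CANDIDATES) {
      if (count(line, candidate) > count(line, best)) {
        best = candidate;
      }
    }
    return best;
  }

  // Returns the first separator that is most common on two consecutive lines
  // with the same number of occurrences, and therefore the same column count.
  static char guess(List<String> contentLines) {
    int limit = Math.min(contentLines.size(), MAX_LINES);
    for (int i = 0; i + 1 < limit; i++) {
      String current = contentLines.get(i);
      String next = contentLines.get(i + 1);
      char candidate = mostCommon(current);
      if (candidate == mostCommon(next) && count(current, candidate) == count(next, candidate)) {
        return candidate;
      }
    }
    return ','; // assumed fallback when the input is empty or nothing matches
  }

  public static void main(String[] args) {
    System.out.println(guess(List.of("a;b;c", "d;e;f")));
  }
}
```

Because a match needs two lines with the same column count, a single content line is never enough to pick a separator in this sketch.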
source - the source to read as CSV
UncheckedIOException - if an IO exception occurs
IllegalArgumentException - if the file cannot be parsed
public static CsvFile of(List<String> headers, List<? extends List<String>> rows)
The headers may be an empty list. All the rows must contain a list of the same size, matching the header if present.
headers - the headers, empty if no headers
rows - the data rows
IllegalArgumentException - if the rows do not match the headers
public ImmutableList<String> headers()
If there is no header row, an empty list is returned.
public ImmutableList<CsvRow> rows()
public int rowCount()
public CsvRow row(int index)
index - the row index, zero-based
public boolean containsHeader(String header)
Matching is case insensitive.
header - the column header to match
public boolean containsHeaders(Collection<String> headers)
Matching is case insensitive.
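Case-insensitive header matching could look roughly like the sketch below, written against plain lists rather than a CsvFile instance. The names are hypothetical, and reading "the headers are present" as "all of the requested headers are present" is an assumption.

```java
import java.util.Collection;
import java.util.List;

// Illustrative sketch only: mirrors the documented behaviour on plain lists.
final class HeaderMatchSketch {

  // True if any header equals the requested one, ignoring case.
  static boolean containsHeader(List<String> headers, String header) {
    return headers.stream().anyMatch(h -> h.equalsIgnoreCase(header));
  }

  // True if every requested header is present, ignoring case (assumed meaning).
  static boolean containsHeaders(List<String> headers, Collection<String> wanted) {
    return wanted.stream().allMatch(w -> containsHeader(headers, w));
  }

  public static void main(String[] args) {
    List<String> headers = List.of("Date", "Value");
    System.out.println(containsHeader(headers, "date"));
    System.out.println(containsHeaders(headers, List.of("VALUE", "Missing")));
  }
}
```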
headers - the column headers to match
public boolean containsHeader(Pattern headerPattern)
Matching is case insensitive.
headerPattern - the header pattern to match
public CsvFile withHeaders(List<String> headers)
headers - the new headers
public boolean equals(Object obj)
The comparison checks the content.
public int hashCode()
Copyright 2009-Present by OpenGamma Inc. and individual contributors
Apache v2 licensed
Additional documentation can be found at strata.opengamma.io.