public final class CsvIterator extends Object implements AutoCloseable, PeekingIterator<CsvRow>
Provides the ability to iterate over a CSV file together with the ability to parse it from a CharSource.
The separator may be specified, allowing TSV files (tab-separated) and other similar formats to be parsed.
See CsvFile for more details of the CSV format.
This class processes the CSV file row-by-row.
To load the entire CSV file into memory, use CsvFile.
This class must be used in a try-with-resources block to ensure that the underlying CSV file is closed:
try (CsvIterator csvIterator = CsvIterator.of(source, true)) {
  // use the CsvIterator
}
One way to use the iterator is with the for-each loop, using asIterable():
try (CsvIterator csvIterator = CsvIterator.of(source, true)) {
  for (CsvRow row : csvIterator.asIterable()) {
    // process the row
  }
}
This class also allows the headers to be obtained without reading the whole CSV file:
try (CsvIterator csvIterator = CsvIterator.of(source, true)) {
  ImmutableList<String> headers = csvIterator.headers();
}
| Modifier and Type | Method | Description |
|---|---|---|
| Iterable<CsvRow> | asIterable() | Returns an Iterable that wraps this iterator. |
| Stream<CsvRow> | asStream() | Returns a stream that wraps this iterator. |
| void | close() | Closes the underlying reader. |
| boolean | containsHeader(Pattern headerPattern) | Checks if the header pattern is present in the file. |
| boolean | containsHeader(String header) | Checks if the header is present in the file. |
| boolean | containsHeaders(Collection<String> headers) | Checks if the headers are present in the file. |
| boolean | hasNext() | Checks whether there is another row in the CSV file. |
| ImmutableList<String> | headers() | Gets the header row. |
| CsvRow | next() | Returns the next row from the CSV file. |
| List<CsvRow> | nextBatch(int count) | Returns the next batch of rows from the CSV file. |
| List<CsvRow> | nextBatch(Predicate<CsvRow> selector) | Returns the next batch of rows from the CSV file using a predicate to determine the rows. |
| static CsvIterator | of(CharSource source, boolean headerRow) | Parses the specified source as a CSV file, using a comma as the separator. |
| static CsvIterator | of(CharSource source, boolean headerRow, char separator) | Parses the specified source as a CSV file where the separator is specified and might not be a comma. |
| static CsvIterator | of(Reader reader, boolean headerRow) | Parses the specified reader as a CSV file, using a comma as the separator. |
| static CsvIterator | of(Reader reader, boolean headerRow, char separator) | Parses the specified reader as a CSV file where the separator is specified and might not be a comma. |
| CsvRow | peek() | Peeks the next row from the CSV file without changing the iteration position. |
| void | remove() | Throws an exception as remove is not supported. |
| String | toString() | Returns a string describing the CSV iterator. |
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface Iterator: forEachRemaining
public static CsvIterator of(CharSource source, boolean headerRow)
source - the source to read as CSV
headerRow - whether the source has a header row, an empty source must still contain the header
UncheckedIOException - if an IO exception occurs
IllegalArgumentException - if the file cannot be parsed
public static CsvIterator of(CharSource source, boolean headerRow, char separator)
This overload allows the separator to be controlled. For example, a tab-separated file is very similar to a CSV file; the only difference is the separator.
source - the source to read as CSV
headerRow - whether the source has a header row, an empty source must still contain the header
separator - the separator used to separate each field, typically a comma, but a tab is sometimes used
UncheckedIOException - if an IO exception occurs
IllegalArgumentException - if the file cannot be parsed
public static CsvIterator of(Reader reader, boolean headerRow)
The caller is responsible for closing the reader, for example by calling close() on the returned iterator, which closes the underlying reader.
reader - the file reader
headerRow - whether the source has a header row, an empty source must still contain the header
UncheckedIOException - if an IO exception occurs
IllegalArgumentException - if the file cannot be parsed
public static CsvIterator of(Reader reader, boolean headerRow, char separator)
This overload allows the separator to be controlled. For example, a tab-separated file is very similar to a CSV file; the only difference is the separator.
The caller is responsible for closing the reader, for example by calling close() on the returned iterator, which closes the underlying reader.
reader - the file reader
headerRow - whether the source has a header row, an empty source must still contain the header
separator - the separator used to separate each field, typically a comma, but a tab is sometimes used
UncheckedIOException - if an IO exception occurs
IllegalArgumentException - if the file cannot be parsed
public ImmutableList<String> headers()
If there is no header row, an empty list is returned.
public boolean containsHeader(String header)
Matching is case insensitive.
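As a rough illustration of what case-insensitive header matching means, the sketch below checks a header list while ignoring case. It uses only the JDK; the class name HeaderMatchSketch and its containsHeader helper are hypothetical stand-ins and are not taken from the library, whose actual implementation may differ.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a header lookup; not the library's implementation.
final class HeaderMatchSketch {

  // Returns true if any header equals the requested name, ignoring case.
  static boolean containsHeader(List<String> headers, String header) {
    return headers.stream().anyMatch(h -> h.equalsIgnoreCase(header));
  }

  public static void main(String[] args) {
    List<String> headers = Arrays.asList("ID", "Notional", "Currency");
    System.out.println(containsHeader(headers, "id"));        // prints true
    System.out.println(containsHeader(headers, "CURRENCY"));  // prints true
    System.out.println(containsHeader(headers, "Rate"));      // prints false
  }
}
```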
header - the column header to match
public boolean containsHeaders(Collection<String> headers)
Matching is case insensitive.
headers - the column headers to match
public boolean containsHeader(Pattern headerPattern)
Matching is case insensitive.
headerPattern - the header pattern to match
public Iterable<CsvRow> asIterable()
Returns an Iterable that wraps this iterator.
Unlike most Iterable implementations, the method Iterable.iterator()
can only be called once. This is intended for use with the for-each loop.
try (CsvIterator csvIterator = CsvIterator.of(source, true)) {
  for (CsvRow row : csvIterator.asIterable()) {
    // process the row
  }
}
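To see why an Iterable that wraps an iterator is effectively single-use, consider the JDK-only sketch below. The class SingleUseIterableSketch and its helpers are hypothetical and only model the idea; how the library itself reacts to a second call to iterator() is not stated here and may differ.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical model of an Iterable backed by one iterator; not the library source.
final class SingleUseIterableSketch {

  // Every call to iterator() hands back the same underlying iterator.
  static <T> Iterable<T> asIterable(Iterator<T> iterator) {
    return () -> iterator;
  }

  // Counts the rows seen by one for-each pass over the iterable.
  static int countRows(Iterable<?> iterable) {
    int count = 0;
    for (Object ignored : iterable) {
      count++;
    }
    return count;
  }

  public static void main(String[] args) {
    Iterable<String> rows = asIterable(Arrays.asList("row1", "row2").iterator());
    System.out.println(countRows(rows));  // prints 2
    System.out.println(countRows(rows));  // prints 0, the iterator is already exhausted
  }
}
```

In this model a second for-each loop silently sees no rows, which is why the wrapper is best created inline in a single loop.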
public Stream<CsvRow> asStream()
The stream will process any remaining rows in the CSV file. As such, it is recommended that callers use either this method or the iterator methods, not both.
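The interaction between a stream and the iterator it wraps can be sketched with the JDK alone. The class IteratorStreamSketch below is a hypothetical model built on Spliterators.spliteratorUnknownSize; it is not the library's implementation, but it shows how a stream created after some rows have been read only sees the remainder.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical model of a stream wrapping an iterator; not the library source.
final class IteratorStreamSketch {

  // Builds a sequential, ordered stream over whatever the iterator has left.
  static <T> Stream<T> asStream(Iterator<T> iterator) {
    Spliterator<T> spliterator =
        Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED);
    return StreamSupport.stream(spliterator, false);
  }

  // Reads one row through the iterator, then streams the rest.
  static List<String> remainingAfterFirst() {
    Iterator<String> it = Arrays.asList("row1", "row2", "row3").iterator();
    it.next();  // consumes "row1" before the stream is created
    return asStream(it).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(remainingAfterFirst());  // prints [row2, row3]
  }
}
```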
public boolean hasNext()
hasNext in interface Iterator<CsvRow>
UncheckedIOException - if an IO exception occurs
IllegalArgumentException - if the file cannot be parsed
public CsvRow peek()
peek in interface PeekingIterator<CsvRow>
UncheckedIOException - if an IO exception occurs
IllegalArgumentException - if the file cannot be parsed
NoSuchElementException - if the end of file has been reached
public CsvRow next()
next in interface PeekingIterator<CsvRow>
next in interface Iterator<CsvRow>
UncheckedIOException - if an IO exception occurs
IllegalArgumentException - if the file cannot be parsed
NoSuchElementException - if the end of file has been reached
public List<CsvRow> nextBatch(int count)
This will return up to the specified number of rows from the file at the current iteration point. An empty list is returned if there are no more rows.
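The count-based batching described above can be modelled with a plain java.util.Iterator. The class CountBatchSketch and its nextBatch helper are hypothetical and written only to illustrate the semantics (up to count rows, an empty list when nothing is left); they are not the library's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of count-based batching; not the library source.
final class CountBatchSketch {

  // Takes up to 'count' elements from the current position of the iterator.
  static <T> List<T> nextBatch(Iterator<T> iterator, int count) {
    List<T> batch = new ArrayList<>();
    // A zero or negative count never enters the loop, so the batch stays empty.
    for (int i = 0; i < count && iterator.hasNext(); i++) {
      batch.add(iterator.next());
    }
    return batch;
  }

  public static void main(String[] args) {
    Iterator<String> it = Arrays.asList("r1", "r2", "r3", "r4", "r5").iterator();
    System.out.println(nextBatch(it, 2));  // prints [r1, r2]
    System.out.println(nextBatch(it, 2));  // prints [r3, r4]
    System.out.println(nextBatch(it, 2));  // prints [r5]
    System.out.println(nextBatch(it, 2));  // prints []
  }
}
```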
count - the number of rows to try and get, negative returns an empty list
UncheckedIOException - if an IO exception occurs
IllegalArgumentException - if the file cannot be parsed
public List<CsvRow> nextBatch(Predicate<CsvRow> selector)
This is useful for CSV files where information is grouped with an identifier or key. For example, a variable notional trade file might have one row for the trade followed by multiple rows for the variable aspects, all grouped by a common trade identifier. In general, callers should peek or read the first row and use information within it to create the selector:
while (it.hasNext()) {
  CsvRow first = it.peek();
  String id = first.getValue("ID");
  List<CsvRow> batch = it.nextBatch(row -> row.getValue("ID").equals(id));
  // process batch
}
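The same peek-then-batch pattern can be tried without the library using the JDK-only sketch below. The class PredicateBatchSketch, its nested Peeking iterator, and the use of String[] rows of the form {id, value} are all assumptions made for illustration; they model the documented behavior rather than reproduce the library's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of predicate-based batching; not the library source.
final class PredicateBatchSketch {

  // Minimal peeking iterator with a predicate-driven batch method.
  static final class Peeking<T> {
    private final Iterator<T> delegate;
    private T peeked;
    private boolean hasPeeked;

    Peeking(Iterator<T> delegate) {
      this.delegate = delegate;
    }

    boolean hasNext() {
      return hasPeeked || delegate.hasNext();
    }

    // Returns the next element without advancing the iteration position.
    T peek() {
      if (!hasPeeked) {
        peeked = delegate.next();
        hasPeeked = true;
      }
      return peeked;
    }

    T next() {
      T result = peek();
      hasPeeked = false;
      peeked = null;
      return result;
    }

    // Collects rows while the selector accepts the row at the head of the iterator.
    List<T> nextBatch(Predicate<T> selector) {
      List<T> batch = new ArrayList<>();
      while (hasNext() && selector.test(peek())) {
        batch.add(next());
      }
      return batch;
    }
  }

  // Groups the values of consecutive rows that share the same identifier.
  static List<List<String>> groupById(List<String[]> rows) {
    Peeking<String[]> it = new Peeking<>(rows.iterator());
    List<List<String>> groups = new ArrayList<>();
    while (it.hasNext()) {
      String id = it.peek()[0];
      List<String> values = new ArrayList<>();
      for (String[] row : it.nextBatch(r -> r[0].equals(id))) {
        values.add(row[1]);
      }
      groups.add(values);
    }
    return groups;
  }

  public static void main(String[] args) {
    List<String[]> rows = Arrays.asList(
        new String[] {"T1", "trade"},
        new String[] {"T1", "step1"},
        new String[] {"T2", "trade"});
    System.out.println(groupById(rows));  // prints [[trade, step1], [trade]]
  }
}
```

Because the selector is built from the peeked row, it always accepts that first row, so each pass of the loop consumes at least one row and the loop terminates.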
This will return a batch of rows where the selector returns true for the row.
An empty list is returned if the selector returns false for the first row.
selector - selects whether a row is part of the batch or part of the next batch
UncheckedIOException - if an IO exception occurs
IllegalArgumentException - if the file cannot be parsed
public void remove()
remove in interface PeekingIterator<CsvRow>
remove in interface Iterator<CsvRow>
UnsupportedOperationException - always
public void close()
close in interface AutoCloseable
UncheckedIOException - if an IO exception occurs
Copyright 2009-Present by OpenGamma Inc. and individual contributors
Apache v2 licensed
Additional documentation can be found at strata.opengamma.io.