Class DataBaseConnector
- java.lang.Object
-
- de.julielab.costosys.dbconnection.DataBaseConnector
-
public class DataBaseConnector extends Object
This class creates a connection with a database and allows for convenient queries and commands.
Database layout and returned columns are specified by a configuration file. The class was developed for a PostgreSQL back-end; using another database server may require modifications.
Queries use up to 3 threads for higher performance, and a connection pool is used for higher performance if multiple instances are deployed simultaneously. Visit
http://commons.apache.org/dbcp/apidocs/org/apache/commons/dbcp/package-summary.html#package_description for more information about the connection pooling.
- Author:
- hellrich, faessler
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description)
- static class DataBaseConnector.StatusElement
-
Field Summary
Fields (Modifier and Type, Field, Description)
- static String DEFAULT_PIPELINE_STATE
- static int META_IN_ARRAY: Deprecated.
- static String POSTGRES_VERSION: The PostgreSQL version against which this version of CoStoSys is developed and tested.
- static LinkedHashMap<String,String> subsetColumns: This is the definition of subset tables except the primary key.
-
Constructor Summary
Constructors (Constructor, Description)
- DataBaseConnector(InputStream configStream): This class creates a connection with a database and allows for convenient queries and commands.
- DataBaseConnector(InputStream configStream, int queryBatchSize): This class creates a connection with a database and allows for convenient queries and commands.
- DataBaseConnector(String configPath)
- DataBaseConnector(String dbUrl, String user, String password): This class creates a connection with a database and allows for convenient queries and commands.
- DataBaseConnector(String dbUrl, String user, String password, String pgSchema, int queryBatchSize, InputStream configStream): This class creates a connection with a database and allows for convenient queries and commands.
- DataBaseConnector(String dbUrl, String user, String password, String pgSchema, InputStream fieldDefinition): This class creates a connection with a database and allows for convenient queries and commands.
- DataBaseConnector(String serverName, String dbName, String user, String password, String pgSchema, int queryBatchSize, InputStream configStream)
- DataBaseConnector(String serverName, String dbName, String user, String password, String pgSchema, InputStream fieldDefinition)
-
Method Summary
Methods (Modifier and Type, Method, Description)
- void addFieldConfiguration(FieldConfig config)
- FieldConfig addPKAdaptedFieldConfiguration(List<Map<String,String>> primaryKey, String fieldConfigurationForAdaption, String fieldConfigurationNameSuffix)
- FieldConfig addPKAdaptedFieldConfiguration(List<Map<String,String>> primaryKey, String fieldConfigurationForAdaption, String fieldConfigurationNameSuffix, List<Map<String,String>> additionalColumns)
- FieldConfig addXmiAnnotationFieldConfiguration(List<Map<String,String>> primaryKey, boolean doGzip): Deprecated. JeDIS does not store annotations in columns to the primary document table.
- FieldConfig addXmiDocumentFieldConfiguration(List<Map<String,String>> primaryKey, boolean doGzip): Adds an auto-generated field configuration that exhibits the given primary key and all the fields required to store complete XMI document data.
- FieldConfig addXmiTextFieldConfiguration(List<Map<String,String>> primaryKey, List<Map<String,String>> additionalColumns, boolean doGzip): Adds an auto-generated field configuration that exhibits the given primary key and all the fields required to store XMI base document data.
- void assureColumnsExist(String tableName, List<String> columnsNames, String columnDataType): Checks if the given columns exist with the given data type.
- void checkTableDefinition(String tableName): Checks whether the given table matches the active table schema.
- void checkTableDefinition(String tableName, String schemaName): Compares the actual table in the database with its definition in the XML configuration. Note: This method currently does not check columns other than primary key columns for tables that reference another table, even if those should actually be data tables.
- void checkTableHasSchemaColumns(String tableName, String schema): Checks if the given table has at least the columns defined in the given schema.
- void checkTableSchemaCompatibility(String... schemaNames)
- void checkTableSchemaCompatibility(String referenceSchema, String[] schemaNames)
- void close()
- int countRowsOfDataTable(String tableName, String whereCondition)
- int countRowsOfDataTable(String tableName, String whereCondition, String schemaName)
- int countUnprocessed(String subsetTableName)
- int countUnprocessed(String subsetTableName, String schemaName): Counts the unprocessed rows in a subset table.
- void createIndex(String table, String... columns): Creates an index for table table on the given columns.
- void createSchema(String schemaName): Creates the PostgreSQL schema schemaName in the active database.
- void createSubsetTable(String subsetTable, String supersetTable, Integer maxNumberRefHops, String comment): Does the same as createSubsetTable(String, String, Integer, String, String) with the exception that the assumed table schema is that of the active schema defined in the configuration file.
- void createSubsetTable(String subsetTable, String supersetTable, Integer posOfDataTable, String comment, String schemaName): Creates an empty table referencing the primary key of the data table given by superSetTable or, if this is a subset table itself, the data table referenced by that table.
- void createSubsetTable(String subsetTable, String supersetTable, String comment): Does the same as createSubsetTable(String, String, Integer, String, String) with the exception that the assumed table schema is that of the active schema defined in the configuration file and the first referenced data table is used as data table.
- void createTable(String tableName, String comment): Creates a new table according to the field schema definition corresponding to the active schema name determined in the configuration.
- void createTable(String tableName, String schemaName, String comment): Creates a new table according to the field schema definition corresponding to the name schemaName given in the configuration file.
- void createTable(String tableName, String referenceTableName, String schemaName, String comment): Creates a new table according to the field schema definition corresponding to the name schemaName and with foreign key references to the primary key of referenceTableName.
- void defineMirrorSubset(String subsetTable, String supersetTable, boolean performUpdate, Integer maxNumberRefHops, String comment): Convenience method for creating and initializing a subset in one step.
- void defineMirrorSubset(String subsetTable, String supersetTable, boolean performUpdate, String comment): Convenience method for creating and initializing a subset in one step.
- void defineMirrorSubset(String subsetTable, String supersetTable, boolean performUpdate, String comment, String schemaName): Convenience method for creating and initializing a subset in one step.
- void defineRandomSubset(int size, String subsetTable, String supersetTable, String comment): Convenience method for creating and initializing a subset in one step.
- void defineRandomSubset(int size, String subsetTable, String supersetTable, String comment, String schemaName): Convenience method for creating and initializing a subset in one step.
- void defineSubset(String subsetTable, String supersetTable, String comment): Convenience method for creating and initializing a subset in one step.
- void defineSubset(String subsetTable, String supersetTable, String comment, String schemaName): Convenience method for creating and initializing a subset in one step.
- void defineSubset(List<String> values, String subsetTable, String supersetTable, String columnToTest, String comment): Convenience method for creating and initializing a subset in one step.
- void defineSubset(List<String> values, String subsetTable, String supersetTable, String columnToTest, String comment, String schemaName): Convenience method for creating and initializing a subset in one step.
- void defineSubsetWithWhereClause(String subsetTable, String supersetTable, String conditionToCheck, String comment): Convenience method for creating and initializing a subset in one step.
- void defineSubsetWithWhereClause(String subsetTable, String supersetTable, String conditionToCheck, String comment, String schemaName): Convenience method for creating and initializing a subset in one step.
- void deleteFromTable(String table, List<Object[]> ids): Deletes entries from a table.
- <T> void deleteFromTableSimplePK(String table, List<T> ids): Deletes entries from a table where the primary key of this table must consist of exactly one column.
- int[] determineExistingSubsetRows(CoStoSysConnection conn, String subsetTableName, List<Object[]> pkValues, String schemaName)
- boolean dropSchema(String schema): Drops the empty Postgres schema with given name.
- boolean dropTable(String table): Drops the table with the given name.
- String getActiveDataPGSchema()
- String getActiveDataTable()
- String getActivePGSchema()
- FieldConfig getActiveTableFieldConfiguration()
- String getActiveTableSchema()
- ConfigReader getConfig()
- String getDbURL()
- byte[] getEffectiveConfiguration(): Returns the effective XML configuration as a byte[].
- FieldConfig getFieldConfiguration()
- FieldConfig getFieldConfiguration(String schemaName)
- int getMaxConnections()
- Map<String,Boolean> getMirrorSubsetNames(CoStoSysConnection conn, String tableName)
- String getNextDataTable(String referencingTable): Follows the foreign-key specifications of the given table to the referenced table.
- String getNextOrThisDataTable(String referencingTable): Determines the first data table on the reference path referencingTable -> table1 -> table2 -> ...
- org.apache.commons.lang3.tuple.Pair<Integer,List<Map<String,String>>> getNumColumnsAndFields(boolean joined, String[] schemaNames): Helper method to determine the columns that are returned in case of a joining operation.
- int getNumReservedConnections()
- int getNumReservedConnections(boolean excludeNonShared)
- long getNumRows(String tableName): Returns the row count of the requested table.
- List<Integer> getPrimaryKeyIndices(): Returns the indices of the primary keys, beginning with 0.
- List<Object[]> getProcessedPrimaryKeys(String subsetTable): Creates a query cursor to the given subset table and retrieves all those primary keys according to the active table schema that are marked as processed.
- List<Object[]> getProcessedPrimaryKeys(String subsetTable, String tableSchema): Creates a query cursor to the given subset table and retrieves all those primary keys according to tableSchema that are marked as processed.
- int getQueryBatchSize()
- String getReferencedTable(String referencingTable): Returns the name of a table referenced by an SQL-foreign-key.
- String getReferencedTable(String startTable, Integer posOfDataTable): Gets the - possibly indirectly - referenced table of startTable where posOfDataTable specifies the position of the desired table in the reference chain starting at startTable.
- String getScheme()
- List<Map<String,Object>> getTableColumnInformation(String qualifiedTable, String... fields): Returns information about the columns in a table.
- Stream<String> getTableColumnNames(String qualifiedTable)
- List<String> getTableDefinition(String tableName): Queries the MetaData for the columns of a table.
- List<String> getTables()
- boolean hasUnfetchedRows(String tableName)
- boolean hasUnfetchedRows(String tableName, String schemaName)
- void importFromRowIterator(Iterator<Map<String,Object>> it, String tableName)
- void importFromRowIterator(Iterator<Map<String,Object>> it, String tableName, boolean commit, String schemaName): Internal method to import into an existing table.
- void importFromRowIterator(Iterator<Map<String,Object>> it, String tableName, String tableSchema)
- void importFromXML(Iterable<byte[]> xmls, String identifier, String tableName)
- void importFromXML(Iterable<byte[]> xmls, String tableName, String identifier, String schemaName): Imports XMLs into a table.
- void importFromXMLFile(String fileStr, String tableName): Imports new MEDLINE XMLs into an existing table from an XML file or a directory of XML files.
- void importFromXMLFile(String fileStr, String tableName, String schemaName): Imports new MEDLINE XMLs into an existing table from an XML file or a directory of XML files.
- void initMirrorSubset(String subsetTable, String supersetTable, boolean performUpdate)
- void initMirrorSubset(String subsetTable, String supersetTable, boolean performUpdate, String schemaName): Defines a mirror subset populating a subset table with primary keys from another table.
- void initRandomSubset(int size, String subsetTable, String supersetTable)
- void initRandomSubset(int size, String subsetTable, String superSetTable, String schemaName): Selects size rows of the given super set table randomly and inserts them into the subset table.
- void initSubset(String subsetTable, String supersetTable): Initializes subsetTable by inserting one row for each entry in supersetTable.
- void initSubset(String subsetTable, String supersetTable, String schemaName): Defines a subset by populating a subset table with all primary keys from another table.
- void initSubset(List<String> values, String subsetTable, String supersetTable, String columnToTest): Defines a subset by populating a subset table with primary keys from another table.
- void initSubset(List<String> values, String subsetTable, String supersetTable, String columnToTest, String schemaName): Defines a subset by populating a subset table with primary keys from another table.
- void initSubsetWithWhereClause(String subsetTable, String supersetTable, String whereClause): Defines a subset by populating a subset table with primary keys from another table.
- void initSubsetWithWhereClause(String subsetTable, String supersetTable, String whereClause, String schemaName): Defines a subset by populating a subset table with primary keys from another table.
- boolean isDatabaseReachable()
- boolean isDataTable(String table)
- boolean isEmpty(String tableName): Tests if a table contains entries.
- boolean isEmpty(String tableName, String columnName)
- boolean isSubsetTable(String table): Checks if the given table is a subset table.
- int markAsProcessed(String table): Modifies a subset table, marking all its entries as processed.
- int markAsProcessed(String table, List<Object[]> ids): Modifies a subset table, marking entries as processed.
- int modifyTable(String sql, List<Object[]> ids): Executes a given SQL command (must end with "WHERE "!) and extends the WHERE-clause with the primary keys, set to the values in ids.
- int modifyTable(String sql, List<Object[]> ids, String schemaName): Executes a given SQL command (must end with "WHERE "!) and extends the WHERE-clause with the primary keys, set to the values in ids.
- CoStoSysConnection obtainConnection(): Returns the connection associated with the current thread object if it exists.
- CoStoSysConnection obtainOrReserveConnection(): This is just a convenience method for obtainOrReserveConnection(boolean) with the parameter set to true.
- CoStoSysConnection obtainOrReserveConnection(boolean shared): This is the preferred way to obtain a database connection.
- int[] performBatchUpdate(CoStoSysConnection conn, List<Object[]> pkValues, String sqlFormatString, String schemaName)
- void printConnectionPoolStatus()
- DBCIterator<Object[]> query(String table, List<String> fields): Returns the requested fields from the requested table.
- DBCIterator<Object[]> query(String table, List<String> fields, long limit): Returns the requested fields from the requested table.
- DBCIterator<Object[]> query(List<String[]> keys, String table): Returns the values of the column DEFAULT_FIELD in the given table.
- DBCIterator<Object[]> query(List<String[]> keys, String table, String schemaName): Returns the values of the column DEFAULT_FIELD in the given table.
- DBCIterator<Object[]> queryAll(List<String> fields, String table): Returns an iterator over the column field in the table table.
- DBCIterator<byte[][]> queryDataTable(String tableName, String whereCondition): Returns all column data from the data table tableName which is marked as 'to be retrieved' in the table scheme specified by the active table scheme.
- DBCIterator<byte[][]> queryDataTable(String tableName, String whereCondition, String[] tablesToJoin, String schemaName)
- DBCIterator<byte[][]> queryDataTable(String tableName, String whereCondition, String[] tablesToJoin, String[] schemaNames): Returns all column data from the data table tableName which is marked as 'to be retrieved' in the table scheme specified by schemaName.
- DBCIterator<byte[][]> querySubset(String tableName, long limitParam)
- DBCIterator<byte[][]> querySubset(String tableName, String whereClause, long limitParam, Integer numberRefHops, String schemaName): Retrieves XML field values in the data table referenced by the subset table tableName or tableName itself if it is a data table.
- DBCIterator<byte[][]> queryWithTime(List<Object[]> ids, String table, String timestamp)
- DBCIterator<byte[][]> queryWithTime(List<Object[]> ids, String table, String timestamp, String schemaName): Returns an iterator over all rows in the table with matching id and a timestamp newer (>) than timestamp.
- void releaseConnections(): Releases all connections associated with the current thread back to the connection pool.
- boolean removeTableFromMirrorSubsetList(CoStoSysConnection conn, String mirrorSubsetName): Removes an entry from the table listing mirror subsets, located at Constants.MIRROR_COLLECTION_NAME.
- CoStoSysConnection reserveConnection(): Just delegates to reserveConnection(true).
- CoStoSysConnection reserveConnection(boolean shared): Only use when you are sure you need this method.
- int[] resetSubset(CoStoSysConnection conn, String subsetTableName, List<Object[]> pkValues)
- int[] resetSubset(CoStoSysConnection conn, String subsetTableName, List<Object[]> pkValues, String schemaName): Sets the values in the is_processed and is_in_process rows of a subset to FALSE.
- void resetSubset(String subsetTableName): Sets the values in the is_processed, is_in_process, has_errors and log columns of a subset to FALSE.
- void resetSubset(String subsetTableName, boolean whereNotProcessed, boolean whereNoErrors, String lastComponent): Sets the values in the is_processed, is_in_process, has_errors and log columns of a subset to FALSE where the corresponding rows are is_in_process or is_processed.
- void resetSubset(String subsetTableName, List<Object[]> pkValues)
- List<Object[]> retrieveAndMark(String subsetTableName, String readerComponent, String hostName, String pid): Retrieves from a subset-table limit primary keys whose rows are not marked to be in process or finished being processed and sets the rows of the retrieved primary keys as being "in process".
- List<Object[]> retrieveAndMark(String subsetTableName, String readerComponent, String hostName, String pid, int limit, String order): Retrieves primary keys from a subset table and marks them as being "in process".
- List<Object[]> retrieveAndMark(String subsetTableName, String schemaName, String readerComponent, String hostName, String pid, int limit, String order): Retrieves from a subset-table limit primary keys whose rows are not marked to be in process or finished being processed and sets the rows of the retrieved primary keys as being "in process".
- DBCIterator<byte[][]> retrieveColumnsByTableSchema(List<Object[]> ids, String table): Retrieves row values of table from the database.
- DBCIterator<byte[][]> retrieveColumnsByTableSchema(List<Object[]> ids, String[] tables, String[] schemaNames): Retrieves data from the database over multiple tables.
- DBCIterator<byte[][]> retrieveColumnsByTableSchema(List<Object[]> ids, String table, String schemaName): Retrieves row values of table from the database.
- boolean schemaExists(String schemaName): Tests if a schema exists.
- void setActivePGSchema(String pgSchema)
- void setActiveTableSchema(String schemaName)
- void setDbURL(String uri)
- void setException(String subsetTableName, ArrayList<byte[][]> primaryKeyList, HashMap<byte[][],String> logException): Sets the value of has_errors to TRUE and adds a description in log for exceptions which occurred during the processing of a collection of documents according to the given primary keys.
- void setHost(String host)
- void setMaxConnections(int num)
- void setPassword(String password)
- void setPort(Integer port)
- void setPort(String port)
- void setProcessed(String subsetTableName, List<byte[][]> primaryKeyList): Sets the values of is_processed to TRUE and of is_in_process to FALSE for a collection of documents according to the given primary keys.
- void setQueryBatchSize(int queryBatchSize)
- void setUser(String user)
- SubsetStatus status(String subsetTableName, Set<DataBaseConnector.StatusElement> statusElementsToReturn): Returns a map with information about how many rows are marked as is_in_process, is_processed and how many rows there are in total. The respective values are stored under the keys Constants.IN_PROCESS, Constants.PROCESSED and Constants.TOTAL.
- boolean tableExists(CoStoSysConnection conn, String tableName): Tests if a table exists.
- boolean tableExists(String tableName): Tests if a table exists.
- void updateFromRowIterator(Iterator<Map<String,Object>> it, String tableName, boolean resetUpdatedDocumentsInMirrorSubsets): Updates a table with the entries yielded by the iterator.
- void updateFromRowIterator(Iterator<Map<String,Object>> it, String tableName, boolean commit, boolean resetUpdatedDocumentsInMirrorSubsets, String schemaName): Updates a table with the entries yielded by the iterator.
- void updateFromXML(String fileStr, String tableName, boolean resetUpdatedDocumentsInMirrorSubsets)
- void updateFromXML(String fileStr, String tableName, boolean resetUpdatedDocumentsInMirrorSubsets, String schemaName): Updates an existing database.
- void withConnectionExecute(DbcExecution command)
- Object withConnectionQuery(DbcQuery<?> command)
- boolean withConnectionQueryBoolean(DbcQuery<Boolean> command)
- double withConnectionQueryDouble(DbcQuery<Double> command)
- int withConnectionQueryInteger(DbcQuery<Integer> command)
- String withConnectionQueryString(DbcQuery<String> query)
-
-
-
Field Detail
-
POSTGRES_VERSION
public static final String POSTGRES_VERSION
The PostgreSQL version against which this version of CoStoSys is developed and tested.
- See Also:
- Constant Field Values
-
DEFAULT_PIPELINE_STATE
public static final String DEFAULT_PIPELINE_STATE
- See Also:
- Constant Field Values
-
META_IN_ARRAY
@Deprecated public static final int META_IN_ARRAY
Deprecated. Used as a hack for the not-yet-published EMNLP-Paper. In the meantime, a more sophisticated system has been implemented (EF, 18.01.2012)
- See Also:
- Constant Field Values
-
subsetColumns
public static final LinkedHashMap<String,String> subsetColumns
This is the definition of subset tables except the primary key.
-
-
Constructor Detail
-
DataBaseConnector
public DataBaseConnector(String configPath) throws FileNotFoundException
- Throws:
FileNotFoundException
-
DataBaseConnector
public DataBaseConnector(InputStream configStream)
This class creates a connection with a database and allows for convenient queries and commands.
- Parameters:
configStream- used to read the configuration for this connector instance
-
DataBaseConnector
public DataBaseConnector(InputStream configStream, int queryBatchSize)
This class creates a connection with a database and allows for convenient queries and commands.
- Parameters:
configStream- used to read the configuration for this connector instance
queryBatchSize- background threads are utilized to speed up queries; this parameter determines the number of pre-fetched entries
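The pre-fetching role of queryBatchSize can be pictured with a bounded buffer that a background thread fills while the caller consumes. The following is only a minimal sketch under that assumption; the class PrefetchSketch and its method readAll are hypothetical and are not part of CoStoSys, whose actual threading is not described on this page.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical illustration of batch-wise pre-fetching; not CoStoSys code. */
final class PrefetchSketch {
    // Sentinel object that signals the end of the source data.
    private static final Object END = new Object();

    /**
     * Drains the source on a background thread while the caller consumes.
     * The buffer holds at most queryBatchSize entries at any time.
     * The source must not yield null elements.
     */
    static <T> List<T> readAll(Iterator<T> source, int queryBatchSize) {
        BlockingQueue<Object> buffer = new ArrayBlockingQueue<>(queryBatchSize);
        Thread producer = new Thread(() -> {
            try {
                while (source.hasNext()) {
                    buffer.put(source.next()); // blocks while the buffer is full
                }
                buffer.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();

        List<T> result = new ArrayList<>();
        try {
            for (Object next = buffer.take(); next != END; next = buffer.take()) {
                @SuppressWarnings("unchecked")
                T item = (T) next;
                result.add(item);
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while reading", e);
        }
        return result;
    }
}
```

A larger buffer lets the producer run further ahead of the consumer at the cost of more memory, which is the usual trade-off behind a batch size parameter of this kind.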
-
DataBaseConnector
public DataBaseConnector(String dbUrl, String user, String password, String pgSchema, InputStream fieldDefinition)
This class creates a connection with a database and allows for convenient queries and commands.
- Parameters:
dbUrl- the url of the database
user- the username for the db
password- the password for the username
fieldDefinition- InputStream containing data of a configuration file
-
DataBaseConnector
public DataBaseConnector(String serverName, String dbName, String user, String password, String pgSchema, InputStream fieldDefinition)
-
DataBaseConnector
public DataBaseConnector(String dbUrl, String user, String password, String pgSchema, int queryBatchSize, InputStream configStream)
This class creates a connection with a database and allows for convenient queries and commands.
- Parameters:
dbUrl- the url of the database
user- the username for the db
password- the password for the username
queryBatchSize- background threads are utilized to speed up queries; this parameter determines the number of pre-fetched entries
configStream- used to read the configuration for this connector instance
-
DataBaseConnector
public DataBaseConnector(String serverName, String dbName, String user, String password, String pgSchema, int queryBatchSize, InputStream configStream)
-
-
Method Detail
-
getConfig
public ConfigReader getConfig()
-
setHost
public void setHost(String host)
-
setPort
public void setPort(String port)
-
setPort
public void setPort(Integer port)
-
setUser
public void setUser(String user)
-
setPassword
public void setPassword(String password)
-
getMaxConnections
public int getMaxConnections()
-
setMaxConnections
public void setMaxConnections(int num)
-
printConnectionPoolStatus
public void printConnectionPoolStatus()
-
getActiveDataTable
public String getActiveDataTable()
- Returns:
- the activeDataTable
-
getEffectiveConfiguration
public byte[] getEffectiveConfiguration()
Returns the effective XML configuration as a byte[].
The effective configuration consists of the default configuration and the given user configuration as well (merged by the ConfigReader in the constructor).
- Returns:
- the effectiveConfiguration
-
getActiveDataPGSchema
public String getActiveDataPGSchema()
-
getActivePGSchema
public String getActivePGSchema()
-
setActivePGSchema
public void setActivePGSchema(String pgSchema)
-
getActiveTableSchema
public String getActiveTableSchema()
-
setActiveTableSchema
public void setActiveTableSchema(String schemaName)
-
getActiveTableFieldConfiguration
public FieldConfig getActiveTableFieldConfiguration()
-
retrieveAndMark
public List<Object[]> retrieveAndMark(String subsetTableName, String readerComponent, String hostName, String pid) throws TableSchemaMismatchException, TableNotFoundException
Retrieves from a subset-table limit primary keys whose rows are not marked to be in process or finished being processed and sets the rows of the retrieved primary keys as being "in process".
The table is locked during this transaction. Locking and marking ensure that every primary key will be returned exactly once. Remember to remove the marks if you want to use the subset again ;)
- Parameters:
subsetTableName- name of a table, conforming to the subset standard
hostName- will be saved in the subset table
pid- will be saved in the subset table
- Returns:
- An ArrayList of pmids which have not yet been processed
- Throws:
TableSchemaMismatchException
TableNotFoundException
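The "exactly once" guarantee described for this method comes from doing the selection and the marking inside one locked step. The sketch below imitates that idea in memory with a synchronized method. SubsetClaimSketch and its members are hypothetical stand-ins for a subset table and do not reflect how CoStoSys talks to PostgreSQL.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a subset table; not CoStoSys code. */
final class SubsetClaimSketch {
    // Maps each primary key to its "in process" mark, in insertion order.
    private final Map<String, Boolean> inProcess = new LinkedHashMap<>();

    SubsetClaimSketch(List<String> primaryKeys) {
        for (String key : primaryKeys) {
            inProcess.put(key, Boolean.FALSE);
        }
    }

    /**
     * Returns up to limit unmarked keys and marks them within the same
     * critical section, so concurrent callers never receive the same key.
     */
    synchronized List<String> retrieveAndMark(int limit) {
        List<String> claimed = new ArrayList<>();
        for (Map.Entry<String, Boolean> row : inProcess.entrySet()) {
            if (claimed.size() >= limit) {
                break;
            }
            if (!row.getValue()) {
                row.setValue(Boolean.TRUE);
                claimed.add(row.getKey());
            }
        }
        return claimed;
    }

    /** Removes all marks so that the subset can be processed again. */
    synchronized void reset() {
        inProcess.replaceAll((key, mark) -> Boolean.FALSE);
    }
}
```

Once every key is marked, further calls return an empty list until reset() is called, which mirrors the reminder above to remove the marks before reusing a subset.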
-
retrieveAndMark
public List<Object[]> retrieveAndMark(String subsetTableName, String readerComponent, String hostName, String pid, int limit, String order) throws TableSchemaMismatchException, TableNotFoundException
Retrieves primary keys from a subset table and marks them as being "in process". The table schema - and thus the form of the primary keys - is assumed to match the active table schema determined in the configuration file.
The table is locked during this transaction. Locking and marking ensure that every primary key will be returned exactly once. Remember to remove the marks if you want to use the subset again ;)
- Parameters:
subsetTableName- name of a table, conforming to the subset standard
hostName- will be saved in the subset table
pid- will be saved in the subset table
limit- batchsize for marking/retrieving
order- determines an ordering. Default order (which may change over time) when this parameter is null or empty.
- Returns:
- An ArrayList of primary keys which have not yet been processed.
- Throws:
TableSchemaMismatchException
TableNotFoundException
- See Also:
retrieveAndMark(String, String, String, String, int, String)
-
retrieveAndMark
public List<Object[]> retrieveAndMark(String subsetTableName, String schemaName, String readerComponent, String hostName, String pid, int limit, String order) throws TableSchemaMismatchException, TableNotFoundException
Retrieves from a subset-table limit primary keys whose rows are not marked to be in process or finished being processed and sets the rows of the retrieved primary keys as being "in process".
The following parameters may be set:
- limit- sets the maximum number of primary keys retrieved
- order- determines whether to retrieve the primary keys in a particular order. Note that the default order of rows is undefined. If you need the same order in every run, you should specify some ordering as an SQL 'ORDER BY' statement. When order is not prefixed with 'ORDER BY' (case ignored), it will be inserted.
The table is locked during this transaction. Locking and marking ensure that every primary key will be returned exactly once. Remember to remove the marks if you want to use the subset again ;)
- Parameters:
subsetTableName- name of a table, conforming to the subset standard
hostName- will be saved in the subset table
pid- will be saved in the subset table
limit- batchsize for marking/retrieving
order- determines an ordering. Default order (which may change over time) when this parameter is null or empty.
- Returns:
- An ArrayList of primary keys which have not yet been processed.
- Throws:
TableSchemaMismatchException
TableNotFoundException
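The handling of the order parameter described above (prefix inserted when missing, case ignored, default order when null or empty) can be sketched as a small normalization step. OrderClauseSketch is a hypothetical helper written for illustration; the actual CoStoSys implementation may differ in details such as whitespace handling.

```java
import java.util.Locale;

/** Hypothetical illustration of the documented order handling; not CoStoSys code. */
final class OrderClauseSketch {
    /** Returns an ORDER BY clause, or an empty string when no order is given. */
    static String normalize(String order) {
        if (order == null || order.trim().isEmpty()) {
            return ""; // leave the ordering to the database default
        }
        String trimmed = order.trim();
        // Case-insensitive check for an existing prefix.
        if (trimmed.toUpperCase(Locale.ROOT).startsWith("ORDER BY")) {
            return trimmed;
        }
        return "ORDER BY " + trimmed;
    }
}
```

With this sketch, both "pmid DESC" and "order by pmid DESC" yield a clause that can be appended to a query, while null or blank input adds nothing.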
-
countUnprocessed
public int countUnprocessed(String subsetTableName)
- Parameters:
subsetTableName-
- Returns:
- See Also:
countUnprocessed(String)
-
countUnprocessed
public int countUnprocessed(String subsetTableName, String schemaName)
Counts the unprocessed rows in a subset table.
- Parameters:
subsetTableName- name of the subset table
- Returns:
- number of rows
-
countRowsOfDataTable
public int countRowsOfDataTable(String tableName, String whereCondition, String schemaName)
-
hasUnfetchedRows
public boolean hasUnfetchedRows(String tableName)
-
hasUnfetchedRows
public boolean hasUnfetchedRows(String tableName, String schemaName)
-
deleteFromTable
public void deleteFromTable(String table, List<Object[]> ids)
Deletes entries from a table.
- Parameters:
table- name of the table
ids- primary key arrays defining the entries to delete
- See Also:
deleteFromTableSimplePK(String, List)
-
deleteFromTableSimplePK
public <T> void deleteFromTableSimplePK(String table, List<T> ids)
Deletes entries from a table where the primary key of this table must consist of exactly one column. For deletion from tables which contain a multi-column-primary-key see deleteFromTable(String, List).
- Parameters:
table- name of the table
ids- primary key arrays defining the entries to delete
- See Also:
deleteFromTable(String, List)
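The difference between the two deletion methods lies in the shape of the keys: deleteFromTable takes one Object[] per row, while deleteFromTableSimplePK takes plain values. A minimal sketch of the conversion between the two shapes follows. SimplePkSketch and toKeyArrays are hypothetical names, and whether CoStoSys performs such a conversion internally is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of key shapes; not CoStoSys code. */
final class SimplePkSketch {
    /** Wraps each single-column key value into a one-element primary key array. */
    static <T> List<Object[]> toKeyArrays(List<T> ids) {
        List<Object[]> keys = new ArrayList<>(ids.size());
        for (T id : ids) {
            keys.add(new Object[] {id});
        }
        return keys;
    }
}
```

A list such as ["123", "456"] thus becomes two arrays of length one, which is the form expected for a primary key that consists of exactly one column.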
-
markAsProcessed
public int markAsProcessed(String table, List<Object[]> ids)
Modifies a subset table, marking entries as processed.
- Parameters:
table- name of the subset table
ids- primary key arrays defining the entries to mark as processed
- Returns:
- The number of successfully modified table rows.
-
markAsProcessed
public int markAsProcessed(String table)
Modifies a subset table, marking all its entries as processed.
- Parameters:
table- name of the subset table
- Returns:
- The number of successfully modified table rows.
-
modifyTable
public int modifyTable(String sql, List<Object[]> ids)
Executes a given SQL command (must end with "WHERE "!) and extends the WHERE-clause with the primary keys, set to the values in ids.
Assumes that the form of the primary keys matches the definition given in the active table schema in the configuration.
- Parameters:
sql- a valid SQL command, ending with "WHERE "
ids- list of primary key arrays
- Returns:
- The number of successfully modified table rows.
- See Also:
modifyTable(String, List, String)
-
modifyTable
public int modifyTable(String sql, List<Object[]> ids, String schemaName)
Executes a given SQL command (must end with "WHERE "!) and extends the WHERE-clause with the primary keys, set to the values in ids.
- Parameters:
sql- a valid SQL command, ending with "WHERE "ids- list of primary key arraysschemaName- name of the schema which defines the primary keys- Returns:
- The number of successfully modified table rows.
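The "must end with WHERE" contract can be sketched as follows. This is only an illustration under assumptions: the class name, method name and the explicit list of primary key columns are hypothetical (the library takes the key definition from the table schema in the configuration), and the use of "?" placeholders is an assumed detail.

```java
import java.util.List;
import java.util.StringJoiner;

public class WhereClauseSketch {

    /** Appends "pk1 = ? AND pk2 = ?" to a command that ends with "WHERE " (hypothetical helper). */
    public static String extendWhere(String sql, List<String> pkColumns) {
        if (!sql.endsWith("WHERE ")) {
            throw new IllegalArgumentException("SQL command must end with \"WHERE \"");
        }
        StringJoiner condition = new StringJoiner(" AND ");
        for (String column : pkColumns) {
            // One placeholder per primary key column; values are bound later.
            condition.add(column + " = ?");
        }
        return sql + condition;
    }

    public static void main(String[] args) {
        System.out.println(extendWhere(
                "UPDATE my_subset SET is_processed = TRUE WHERE ", List.of("pmid")));
    }
}
```

Each primary key array in ids would then supply the values for one execution of the resulting statement.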
-
getReferencedTable
public String getReferencedTable(String referencingTable)
Returns the name of a table referenced by an SQL-foreign-key.- Parameters:
referencingTable- the name of the table for which the foreign keys shall be checked- Returns:
- the name of the first referenced table or
null if there is no referenced table (i.e. the passed table name denotes a data table). - Throws:
IllegalArgumentException- When referencingTable is null.
-
createSchema
public void createSchema(String schemaName)
Creates the PostgreSQL schema schemaName in the active database.- Parameters:
schemaName- The name of the PostgreSQL schema to create.
-
createTable
public void createTable(String tableName, String comment) throws SQLException
Creates a new table according to the field schema definition corresponding to the active schema name determined in the configuration.- Parameters:
tableName- the name of the new table- Throws:
SQLException
-
createTable
public void createTable(String tableName, String schemaName, String comment)
Creates a new table according to the field schema definition corresponding to the name schemaName given in the configuration file.- Parameters:
tableName- the name of the new table- Throws:
SQLException
-
createTable
public void createTable(String tableName, String referenceTableName, String schemaName, String comment)
Creates a new table according to the field schema definition corresponding to the name
schemaName and with foreign key references to the primary key of referenceTableName. The primary keys of the tables tableName and referenceTableName must be equal. The foreign key constraint is configured for ON DELETE CASCADE, which means that when rows are deleted in the referenced table, they are also deleted in the table created by this method call.
- Parameters:
tableName- The name of the new table.referenceTableName- The table to be referenced by this table.schemaName- The table schema determining the structure (especially the primary key) of the new table.comment- A comment for the new table.- Throws:
SQLException
-
assureColumnsExist
public void assureColumnsExist(String tableName, List<String> columnsNames, String columnDataType)
Checks if the given columns exist with the given data type. If not, the missing columns are appended to the table.
- Parameters:
tableName-columnsNames-columnDataType-
-
getTableColumnInformation
public List<Map<String,Object>> getTableColumnInformation(String qualifiedTable, String... fields)
Returns information about the columns in a table. The most simple usage of this method would be to retrieve the names of all columns of a table, for example.
Possible column information fields:- table_catalog
- table_schema
- table_name
- column_name
- ordinal_position
- column_default
- is_nullable
- data_type
- character_maximum_length
- character_octet_length
- numeric_precision
- numeric_precision_radix
- numeric_scale
- datetime_precision
- interval_type
- interval_precision
- character_set_catalog
- character_set_schema
- character_set_name
- collation_catalog
- collation_schema
- collation_name
- domain_catalog
- domain_schema
- domain_name
- udt_catalog
- udt_schema
- udt_name
- scope_catalog
- scope_schema
- scope_name
- maximum_cardinality
- dtd_identifier
- is_self_referencing
- is_identity
- identity_generation
- identity_start
- identity_increment
- identity_maximum
- identity_minimum
- identity_cycle
- is_generated
- generation_expression
- is_updatable
- Parameters:
qualifiedTable-fields- The column meta information fields to return. Will be all if this parameter is omitted.- Returns:
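The field names listed above match the columns of PostgreSQL's information_schema.columns view, so a lookup of selected fields might be sketched as below. That the method queries this view, the fallback to the "public" schema for unqualified names, and all class and method names are assumptions made for illustration only.

```java
public class ColumnInfoQuerySketch {

    /**
     * Builds a parameterized query (hypothetical helper). The two "?" parameters
     * are the schema name and the table name. Field names are inserted verbatim,
     * so they should only come from the fixed list of known fields.
     */
    public static String buildQuery(String... fields) {
        String selected = fields.length == 0 ? "*" : String.join(", ", fields);
        return "SELECT " + selected + " FROM information_schema.columns"
                + " WHERE table_schema = ? AND table_name = ?";
    }

    /** Splits "schema.table"; unqualified names are assumed to live in "public". */
    public static String[] splitQualifiedName(String qualifiedTable) {
        int dot = qualifiedTable.indexOf('.');
        if (dot < 0) {
            return new String[] { "public", qualifiedTable };
        }
        return new String[] { qualifiedTable.substring(0, dot), qualifiedTable.substring(dot + 1) };
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("column_name", "data_type"));
        String[] parts = splitQualifiedName("myschema.documents");
        System.out.println(parts[0] + " / " + parts[1]);
    }
}
```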
-
createSubsetTable
public void createSubsetTable(String subsetTable, String supersetTable, Integer maxNumberRefHops, String comment) throws SQLException
Does the same as
createSubsetTable(String, String, Integer, String, String) with the exception that the assumed table schema is that of the active schema defined in the configuration file.- Parameters:
subsetTable- name of the subset tablesupersetTable- name of the referenced tablemaxNumberRefHops- the maximum number of times a foreign key reference to a data table may be followedcomment- will be added to the table in the database, used to make tables reproducible- Throws:
SQLException
-
createSubsetTable
public void createSubsetTable(String subsetTable, String supersetTable, String comment) throws SQLException
Does the same as
createSubsetTable(String, String, Integer, String, String) with the exception that the assumed table schema is that of the active schema defined in the configuration file and the first referenced data table is used as data table.- Parameters:
subsetTable- name of the subset tablesupersetTable- name of the referenced tablecomment- will be added to the table in the database, used to make tables reproducible- Throws:
SQLException
-
createSubsetTable
public void createSubsetTable(String subsetTable, String supersetTable, Integer posOfDataTable, String comment, String schemaName) throws SQLException
Creates an empty table referencing the primary key of the data table given by
supersetTable or, if this is a subset table itself, the data table referenced by that table.
To fill the empty subset table with data, use one of the
init[...] methods offered by this class.
Subset tables have a particular table scheme. They define a foreign key to the primary key of the referenced data table. There are the following additional columns:
Name                  Type
is_in_process         boolean
is_processed          boolean
last_component        text
log                   text
has_errors            boolean
pid                   character varying(10)
host_name             character varying(100)
processing_timestamp  timestamp without time zone
The subset table can be used for processing, e.g. by UIMA CollectionReaders, which store information about the processing in it.
The actual data is located in the referenced table.
- Parameters:
subsetTable- name of the subset tablesupersetTable- name of the referenced tableposOfDataTable- the position of the data table that should be referenced; the 1st would be the nearest data table, i.e. perhaps supersetTable itself. The 2nd would be the data table referenced by the first data table on the reference path.schemaName- name of the table schema to work with (determined in the configuration file)comment- will be added to the table in the database, used to make tables reproducible- Throws:
SQLException
-
createIndex
public void createIndex(String table, String... columns) throws SQLException
Creates an index for table table on the given columns. The name of the index will be <table>_idx. It is currently not possible to create a second index since the names would collide. This would require an extension of this method for different names.- Parameters:
table- The table for which an index should be created.columns- The columns the index should cover.- Throws:
SQLException- In case something goes wrong.
-
getReferencedTable
public String getReferencedTable(String startTable, Integer posOfDataTable) throws SQLException
Gets the - possibly indirectly - referenced table of startTable where posOfDataTable specifies the position of the desired table in the reference chain starting at startTable.- Parameters:
startTable-posOfDataTable-- Returns:
- Throws:
SQLException
-
getNextDataTable
public String getNextDataTable(String referencingTable)
Follows the foreign-key specifications of the given table to the referenced table. This process is repeated until a non-subset table (a table for which isSubsetTable(String) returns false) is encountered or a table without a foreign key is found. If referencingTable has no foreign key itself, null is returned since the referenced table does not exist.- Parameters:
referencingTable- The table to get the next referenced data table for, possibly across other subsets if referencingTable denotes a subset table.- Returns:
- The found data table or
null, if referencingTable is a data table itself. - Throws:
CoStoSysSQLRuntimeException- If table meta data checking fails.
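The traversal described above can be sketched without a database by modelling foreign keys as a map from referencing table to referenced table. The class name, the map and the set of subset tables are hypothetical stand-ins for the metadata lookups the library performs; the sketch also assumes the reference chain contains no cycles.

```java
import java.util.Map;
import java.util.Set;

public class ReferenceChainSketch {

    /**
     * Follows references until a non-subset table or a table without a
     * foreign key is reached. Returns null if the start table references nothing.
     */
    public static String nextDataTable(String referencingTable,
                                       Map<String, String> references,
                                       Set<String> subsetTables) {
        String current = references.get(referencingTable);
        while (current != null && subsetTables.contains(current)) {
            String next = references.get(current);
            if (next == null) {
                break; // a table without a foreign key ends the walk
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, String> references = Map.of("subset_b", "subset_a", "subset_a", "documents");
        Set<String> subsets = Set.of("subset_a", "subset_b");
        System.out.println(nextDataTable("subset_b", references, subsets));
    }
}
```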
-
getNextOrThisDataTable
public String getNextOrThisDataTable(String referencingTable)
Determines the first data table on the reference path referencingTable -> table1 -> table2 -> ... -> lastTable -> null referenced from referencingTable. This means that referencingTable is returned itself if it is a data table.- Parameters:
referencingTable- The start point table for the path for which the first data table is to be returned.- Returns:
- The first data table on the foreign-key path beginning with
referencingTable itself. - Throws:
SQLException- If a database operation fails.
-
isSubsetTable
public boolean isSubsetTable(String table)
Checks if the given table is a subset table.
A database table is identified to be a subset table if it exhibits all the column names that subsets have. Those are defined in
subsetColumns.- Parameters:
table- The table to check for being a subset table.- Returns:
- True if and only if
table denotes a subset table, false otherwise. The latter case includes the table parameter being null. - Throws:
SQLException- If table meta data checking fails.
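The column-name test can be sketched as a simple set inclusion. The column names used here are the additional subset columns documented for createSubsetTable(String, String, Integer, String, String); the class name and the way the table's columns are passed in are hypothetical, since the library reads them from database metadata and from subsetColumns.

```java
import java.util.Set;

public class SubsetCheckSketch {

    /** Column names every subset table is expected to exhibit (taken from the documentation). */
    static final Set<String> SUBSET_COLUMNS = Set.of(
            "is_in_process", "is_processed", "last_component", "log",
            "has_errors", "pid", "host_name", "processing_timestamp");

    /** Returns true only if all subset columns are present; null input yields false. */
    public static boolean looksLikeSubsetTable(Set<String> tableColumns) {
        return tableColumns != null && tableColumns.containsAll(SUBSET_COLUMNS);
    }

    public static void main(String[] args) {
        System.out.println(looksLikeSubsetTable(Set.of("pmid", "xml")));
    }
}
```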
-
isDataTable
public boolean isDataTable(String table)
-
dropTable
public boolean dropTable(String table) throws SQLException
Drops the table with the given name. The name must be schema-qualified unless it resides in the public schema.
This automatically removes this table from the list of mirror tables, if it was one.
- Parameters:
table- The schema-qualified table name to drop.- Returns:
- Whether the drop was successful.
- Throws:
SQLException- If an error occurs.
-
dropSchema
public boolean dropSchema(String schema) throws SQLException
Drops the empty Postgres schema with given name.- Parameters:
schema- The schema to be dropped.- Throws:
SQLException
-
removeTableFromMirrorSubsetList
public boolean removeTableFromMirrorSubsetList(CoStoSysConnection conn, String mirrorSubsetName) throws SQLException
Removes an entry from the table listing mirror subsets, located at
Constants.MIRROR_COLLECTION_NAME.- Parameters:
conn- A database connection.mirrorSubsetName- The name of the mirror subset table to be removed.- Returns:
- Whether the deletion was successful.
- Throws:
SQLException- If an error occurs.
-
tableExists
public boolean tableExists(CoStoSysConnection conn, String tableName)
Tests if a table exists.- Parameters:
tableName- name of the table to test- Returns:
- true if the table exists, false otherwise
-
tableExists
public boolean tableExists(String tableName)
Tests if a table exists.- Parameters:
tableName- name of the table to test- Returns:
- true if the table exists, false otherwise
-
schemaExists
public boolean schemaExists(String schemaName)
Tests if a schema exists.- Parameters:
schemaName- name of the schema to test- Returns:
- true if the schema exists, false otherwise
-
isEmpty
public boolean isEmpty(String tableName)
Tests if a table contains entries.- Parameters:
tableName- name of the table to test- Returns:
- true if the table has entries, false otherwise
-
defineRandomSubset
public void defineRandomSubset(int size, String subsetTable, String supersetTable, String comment) throws SQLException
Convenience method for creating and initializing a subset in one step. See method references below for more information.
- Parameters:
size-subsetTable-supersetTable-comment-- Throws:
SQLException- See Also:
initRandomSubset(int, String, String)
-
defineRandomSubset
public void defineRandomSubset(int size, String subsetTable, String supersetTable, String comment, String schemaName) throws SQLException
Convenience method for creating and initializing a subset in one step. See method references below for more information.
- Parameters:
size-subsetTable-supersetTable-comment-schemaName-- Throws:
SQLException- See Also:
initRandomSubset(int, String, String, String)
-
defineSubset
public void defineSubset(List<String> values, String subsetTable, String supersetTable, String columnToTest, String comment) throws SQLException
Convenience method for creating and initializing a subset in one step. See method references below for more information.
- Parameters:
values-subsetTable-supersetTable-columnToTest-comment-- Throws:
SQLException- See Also:
initSubset(List, String, String, String)
-
defineSubset
public void defineSubset(List<String> values, String subsetTable, String supersetTable, String columnToTest, String comment, String schemaName) throws SQLException
Convenience method for creating and initializing a subset in one step. See method references below for more information.
- Parameters:
values-subsetTable-supersetTable-columnToTest-comment-schemaName-- Throws:
SQLException- See Also:
initSubset(List, String, String, String, String)
-
defineSubset
public void defineSubset(String subsetTable, String supersetTable, String comment) throws SQLException
Convenience method for creating and initializing a subset in one step. See method references below for more information.
- Parameters:
subsetTable-supersetTable-comment-- Throws:
SQLException- See Also:
initSubset(String, String)
-
defineSubset
public void defineSubset(String subsetTable, String supersetTable, String comment, String schemaName) throws SQLException
Convenience method for creating and initializing a subset in one step. See method references below for more information.
- Parameters:
subsetTable-supersetTable-comment-schemaName-- Throws:
SQLException- See Also:
initSubset(String, String, String)
-
defineSubsetWithWhereClause
public void defineSubsetWithWhereClause(String subsetTable, String supersetTable, String conditionToCheck, String comment) throws SQLException
Convenience method for creating and initializing a subset in one step. See method references below for more information.
- Parameters:
subsetTable-supersetTable-conditionToCheck-comment-- Throws:
SQLException- See Also:
initSubsetWithWhereClause(String, String, String)
-
defineSubsetWithWhereClause
public void defineSubsetWithWhereClause(String subsetTable, String supersetTable, String conditionToCheck, String comment, String schemaName) throws SQLException
Convenience method for creating and initializing a subset in one step. See method references below for more information.
- Parameters:
subsetTable-supersetTable-conditionToCheck-comment-schemaName-- Throws:
SQLException- See Also:
initSubsetWithWhereClause(String, String, String, String)
-
defineMirrorSubset
public void defineMirrorSubset(String subsetTable, String supersetTable, boolean performUpdate, String comment) throws SQLException
Convenience method for creating and initializing a subset in one step. See method references below for more information.
- Parameters:
subsetTable-supersetTable-comment-- Throws:
SQLException
-
defineMirrorSubset
public void defineMirrorSubset(String subsetTable, String supersetTable, boolean performUpdate, Integer maxNumberRefHops, String comment) throws SQLException
Convenience method for creating and initializing a subset in one step. See method references below for more information.
- Parameters:
subsetTable-supersetTable-maxNumberRefHops- the maximum number of times a foreign key reference to a data table may be followedcomment-- Throws:
SQLException- See Also:
createSubsetTable(String, String, Integer, String)
-
defineMirrorSubset
public void defineMirrorSubset(String subsetTable, String supersetTable, boolean performUpdate, String comment, String schemaName) throws SQLException
Convenience method for creating and initializing a subset in one step. See method references below for more information.
- Parameters:
subsetTable-supersetTable-comment-schemaName-- Throws:
SQLException
-
initRandomSubset
public void initRandomSubset(int size, String subsetTable, String superSetTable, String schemaName)
Selects size rows of the given superset table randomly and inserts them into the subset table.- Parameters:
size- size of the subset to createsubsetTable- name of subset table to insert the chosen rows intosuperSetTable- name of the table to choose fromschemaName- name of the schema to use
-
initSubset
public void initSubset(List<String> values, String subsetTable, String supersetTable, String columnToTest)
Defines a subset by populating a subset table with primary keys from another table. A WHERE clause is used to control which entries are copied, checking if columnToTest has the desired value.- Parameters:
values- Desired values for the columnToTestsubsetTable- name of the subset tablesupersetTable- name of table to referencecolumnToTest- column to check for value
-
initSubset
public void initSubset(List<String> values, String subsetTable, String supersetTable, String columnToTest, String schemaName)
Defines a subset by populating a subset table with primary keys from another table. A WHERE clause is used to control which entries are copied, checking if columnToTest has the desired value.- Parameters:
values- Desired values for the columnToTestsubsetTable- name of the subset tablesupersetTable- name of table to referenceschemaName- schema to usecolumnToTest- column to check for value
-
initSubset
public void initSubset(String subsetTable, String supersetTable)
Initializes subsetTable by inserting one row for each entry in supersetTable.- Parameters:
subsetTable-supersetTable-- See Also:
initSubset(String, String, String)
-
initSubset
public void initSubset(String subsetTable, String supersetTable, String schemaName)
Defines a subset by populating a subset table with all primary keys from another table.- Parameters:
subsetTable- name of the subset tablesupersetTable- name of table to referenceschemaName- name of the schema used to determine the primary keys
-
initSubsetWithWhereClause
public void initSubsetWithWhereClause(String subsetTable, String supersetTable, String whereClause)
Defines a subset by populating a subset table with primary keys from another table. All entries for which whereClause evaluates to true are selected.- Parameters:
subsetTable- name of the subset tablesupersetTable- name of table to referencewhereClause- condition to check by a SQL WHERE clause, e.g. 'foo > 10'- See Also:
initSubsetWithWhereClause(String, String, String, String)
-
initSubsetWithWhereClause
public void initSubsetWithWhereClause(String subsetTable, String supersetTable, String whereClause, String schemaName)
Defines a subset by populating a subset table with primary keys from another table. All entries for which whereClause evaluates to true are selected.- Parameters:
subsetTable- name of the subset tablesupersetTable- name of table to referenceschemaName- name of the schema used to determine the primary keyswhereClause- condition to check by a SQL WHERE clause, e.g. 'foo > 10'
-
initMirrorSubset
public void initMirrorSubset(String subsetTable, String supersetTable, boolean performUpdate) throws SQLException
- Throws:
SQLException
-
initMirrorSubset
public void initMirrorSubset(String subsetTable, String supersetTable, boolean performUpdate, String schemaName) throws SQLException
Defines a mirror subset populating a subset table with primary keys from another table.
Its name is saved into a special meta data table to enable automatic syncing (changes to the superset are propagated to the mirror subset).- Parameters:
subsetTable- name of the subset tablesupersetTable- name of table to reference- Throws:
SQLException
-
getMirrorSubsetNames
public Map<String,Boolean> getMirrorSubsetNames(CoStoSysConnection conn, String tableName)
- Parameters:
tableName- table to gather mirror subsets for- Returns:
- schema-qualified names of all mirror subsets for this table in a
LinkedHashMap. The values indicate whether the mirror subset is in "perform update" mode.
-
resetSubset
public void resetSubset(String subsetTableName)
Sets the values in the is_processed, is_in_process, has_errors and log columns of a subset to FALSE.- Parameters:
subsetTableName- name of the subset to reset
-
resetSubset
public void resetSubset(String subsetTableName, boolean whereNotProcessed, boolean whereNoErrors, String lastComponent)
Sets the values in the is_processed, is_in_process, has_errors and log columns of a subset to FALSE where the corresponding rows are is_in_process or is_processed. The boolean parameter
whereNotProcessed is used for the use case where only those rows should be reset that are is_in_process but not is_processed, which may happen when a pipeline crashed, a document has errors or a pipeline is just canceled. In a similar fashion,
whereNoErrors resets those rows that have no errors. Both boolean parameters may be combined, in which case only non-processed rows without errors will be reset.
- Parameters:
subsetTableName- name of the table to reset unprocessed rows
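How the two boolean parameters narrow down the rows to reset can be sketched as a row condition. This is an interpretation of the description above, not the library's actual SQL: the class name, method name and the exact condition strings are assumptions, and the lastComponent parameter is left out because its effect is not documented here.

```java
import java.util.ArrayList;
import java.util.List;

public class ResetConditionSketch {

    /** Builds a WHERE condition selecting the rows to reset (hypothetical helper). */
    public static String buildCondition(boolean whereNotProcessed, boolean whereNoErrors) {
        List<String> parts = new ArrayList<>();
        // Base case: every row that is in process or already processed.
        parts.add("(is_in_process = TRUE OR is_processed = TRUE)");
        if (whereNotProcessed) {
            parts.add("is_processed = FALSE"); // keep finished rows untouched
        }
        if (whereNoErrors) {
            parts.add("has_errors = FALSE"); // keep rows with errors untouched
        }
        return String.join(" AND ", parts);
    }

    public static void main(String[] args) {
        System.out.println(buildCondition(true, true));
    }
}
```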
-
resetSubset
public int[] resetSubset(CoStoSysConnection conn, String subsetTableName, List<Object[]> pkValues)
- Parameters:
subsetTableName-pkValues-- Returns:
-
performBatchUpdate
public int[] performBatchUpdate(CoStoSysConnection conn, List<Object[]> pkValues, String sqlFormatString, String schemaName)
-
resetSubset
public int[] resetSubset(CoStoSysConnection conn, String subsetTableName, List<Object[]> pkValues, String schemaName)
Sets the values in the is_processed and is_in_process columns of a subset to FALSE. Only resets the subset table rows where the primary key equals one of the entries in pkValues.- Parameters:
subsetTableName- - name of the table to resetpkValues- - list of primary keys- Returns:
-
determineExistingSubsetRows
public int[] determineExistingSubsetRows(CoStoSysConnection conn, String subsetTableName, List<Object[]> pkValues, String schemaName)
-
importFromXML
public void importFromXML(Iterable<byte[]> xmls, String identifier, String tableName)
- Parameters:
xmls-tableName-identifier-- See Also:
importFromXML(Iterable, String, String, String)
-
importFromXML
public void importFromXML(Iterable<byte[]> xmls, String tableName, String identifier, String schemaName)
Imports XMLs into a table.- Parameters:
xmls- - an Iterator over XMLs as byte[]tableName- - name of the table to importidentifier- - used for error messages
-
importFromXMLFile
public void importFromXMLFile(String fileStr, String tableName)
Imports new MEDLINE XMLs into an existing table from an XML file or a directory of XML files. The XML must be in MEDLINE XML format and can additionally be (G)Zipped.- Parameters:
fileStr- - path to file or directory of (G)Zipped MEDLINE XML file(s)tableName- - name of the target table- See Also:
importFromXMLFile(String, String, String)
-
importFromXMLFile
public void importFromXMLFile(String fileStr, String tableName, String schemaName)
Imports new MEDLINE XMLs into an existing table from an XML file or a directory of XML files. The XML must be in MEDLINE XML format and can additionally be (G)Zipped.- Parameters:
fileStr- - path to file or directory of (G)Zipped MEDLINE XML file(s)tableName- - name of the target tableschemaName- the table schema to use for the import
-
updateFromXML
public void updateFromXML(String fileStr, String tableName, boolean resetUpdatedDocumentsInMirrorSubsets)
- Parameters:
fileStr-tableName-resetUpdatedDocumentsInMirrorSubsets-- See Also:
updateFromXML(String, String, boolean, String)
-
updateFromXML
public void updateFromXML(String fileStr, String tableName, boolean resetUpdatedDocumentsInMirrorSubsets, String schemaName)
Updates an existing database. If the file contains new entries those are inserted, otherwise the table is updated to the version in the file.- Parameters:
fileStr- - file containing new or updated entriestableName- - table to updateresetUpdatedDocumentsInMirrorSubsets- If the rows of mirror subsets which correspond to the updated document table rows should be reset, i.e. is_processed and is_in_process both set to FALSE.schemaName- The name of the table schema that the updated table adheres to.
-
importFromRowIterator
public void importFromRowIterator(Iterator<Map<String,Object>> it, String tableName)
- Parameters:
it-tableName-
-
importFromRowIterator
public void importFromRowIterator(Iterator<Map<String,Object>> it, String tableName, String tableSchema)
- Parameters:
it-tableName-
-
importFromRowIterator
public void importFromRowIterator(Iterator<Map<String,Object>> it, String tableName, boolean commit, String schemaName)
Internal method to import into an existing table- Parameters:
it- - an Iterator, yielding rows to insert into the databasetableName- - the updated tablecommit- - if true, the inserted data will be committed in batches within this method; no commits will happen otherwise.schemaName- the name of the table schema corresponding to the data table
-
updateFromRowIterator
public void updateFromRowIterator(Iterator<Map<String,Object>> it, String tableName, boolean resetUpdatedDocumentsInMirrorSubsets)
Updates a table with the entries yielded by the iterator. If an entry is not yet in the table, it will be inserted instead.
The input rows are expected to fit the active table schema.
- Parameters:
it- - an Iterator, yielding new or updated entries.tableName- - the updated table
-
updateFromRowIterator
public void updateFromRowIterator(Iterator<Map<String,Object>> it, String tableName, boolean commit, boolean resetUpdatedDocumentsInMirrorSubsets, String schemaName)
Updates a table with the entries yielded by the iterator. If an entry is not yet in the table, it will be inserted instead.
The input rows are expected to fit the table schema
schemaName.- Parameters:
it- - an Iterator, yielding new or updated entries.tableName- - the updated tablecommit- - if true, the updated data will be committed in batches within this method; nothing will be committed otherwise.resetUpdatedDocumentsInMirrorSubsets-schemaName- the name of the table schema corresponding to the updated data
-
queryWithTime
public DBCIterator<byte[][]> queryWithTime(List<Object[]> ids, String table, String timestamp)
- Parameters:
ids-table-timestamp-- Returns:
- See Also:
queryWithTime(List, String, String, String)
-
queryWithTime
public DBCIterator<byte[][]> queryWithTime(List<Object[]> ids, String table, String timestamp, String schemaName)
Returns an iterator over all rows in the table with matching id and a timestamp newer (>) than timestamp. The Iterator will use threads, memory and a connection until all matches are returned.- Parameters:
ids- - List with primary keystable- - table to querytimestamp- - timestamp (only rows with newer timestamp are returned)- Returns:
- - pmid and xml as an Iterator
-
queryAll
public DBCIterator<Object[]> queryAll(List<String> fields, String table)
Returns an iterator over the columns fields in the table table. NOTE: The Iterator will use threads, memory and a connection until the iterator is empty, i.e. hasNext() returns false!- Parameters:
fields- - fields to returntable- - table to query- Returns:
- - results as an Iterator
-
query
public DBCIterator<Object[]> query(String table, List<String> fields)
Returns the requested fields from the requested table. The iterator must be fully consumed or dangling threads and connections will remain, possibly causing the application to wait forever for an open connection.- Parameters:
table- The table to query.fields- The names of the columns to retrieve values from.- Returns:
- An iterator over the requested columns values.
-
query
public DBCIterator<Object[]> query(String table, List<String> fields, long limit)
Returns the requested fields from the requested table. The iterator must be fully consumed or dangling threads and connections will remain, possibly causing the application to wait forever for an open connection.- Parameters:
table- The table to query.fields- The names of the columns to retrieve values from.limit- A limit of documents to retrieve.- Returns:
- An iterator over the requested columns values.
-
query
public DBCIterator<Object[]> query(List<String[]> keys, String table)
Returns the values of the column DEFAULT_FIELD in the given table. The Iterator will use threads, memory and a connection until all matches have been returned.- Parameters:
keys-table-- Returns:
- See Also:
query(List, String, String)
-
query
public DBCIterator<Object[]> query(List<String[]> keys, String table, String schemaName)
Returns the values of the column DEFAULT_FIELD in the given table. The Iterator will use threads, memory and a connection until all matches have been returned.- Parameters:
keys- - list of String[] containing the parts of the primary keytable- - table to query- Returns:
- - results as an Iterator
-
retrieveColumnsByTableSchema
public DBCIterator<byte[][]> retrieveColumnsByTableSchema(List<Object[]> ids, String table)
Retrieves row values of table from the database. The returned columns are those that are configured to be retrieved in the active table schema.- Parameters:
ids-table-- Returns:
- See Also:
retrieveColumnsByTableSchema(List, String, String)
-
retrieveColumnsByTableSchema
public DBCIterator<byte[][]> retrieveColumnsByTableSchema(List<Object[]> ids, String table, String schemaName)
Retrieves row values of table from the database. The returned columns are those that are configured to be retrieved in the table schema with name schemaName.- Parameters:
ids-table-schemaName-- Returns:
-
retrieveColumnsByTableSchema
public DBCIterator<byte[][]> retrieveColumnsByTableSchema(List<Object[]> ids, String[] tables, String[] schemaNames)
Retrieves data from the database over multiple tables. All tables will be joined on the given IDs. The columns to be retrieved for each table are determined by its table schema. For this purpose, the tables and schemaNames arrays are required to be parallel.- Parameters:
ids- A list of primary keys identifying the items to retrieve.tables- The tables from which the items should be retrieved that are identified by ids.schemaNames- A parallel array to tables that specifies the table schema name of each table.- Returns:
- The joined data from the requested tables.
-
queryDataTable
public DBCIterator<byte[][]> queryDataTable(String tableName, String whereCondition)
Returns all column data from the data table
tableName which is marked as 'to be retrieved' in the active table scheme. For more specific information, please refer to
queryDataTable(String, String, String[], String).- Parameters:
tableName- Name of a data table.whereCondition- Optional additional specifications for the SQL "SELECT" statement.- See Also:
queryDataTable(String, String, String[], String)
-
queryDataTable
public DBCIterator<byte[][]> queryDataTable(String tableName, String whereCondition, String[] tablesToJoin, String schemaName)
-
queryDataTable
public DBCIterator<byte[][]> queryDataTable(String tableName, String whereCondition, String[] tablesToJoin, String[] schemaNames)
Returns all column data from the data table
tableName which is marked as 'to be retrieved' in the table scheme specified by schemaName. This method offers direct access to the table data by using an SQL
ResultSet in cursor mode, allowing for queries leading to large results. An optional where clause (actually everything behind the "FROM" in the SQL select statement) may be passed to restrict the columns being returned. All specifications are allowed which do not alter the number of columns returned (like "GROUP BY").
- Parameters:
tableName- Name of a data table.whereCondition- Optional additional specifications for the SQL "SELECT" statement.schemaNames- The table schema names to determine which columns should be retrieved.- Returns:
- An iterator over
byte[][]. Each returned byte array contains one nested byte array for each retrieved column, holding the column's data in a sequence of bytes.
-
querySubset
public DBCIterator<byte[][]> querySubset(String tableName, long limitParam) throws SQLException
- Parameters:
tableName -
limitParam -
- Returns:
- Throws:
SQLException
-
getQueryBatchSize
public int getQueryBatchSize()
-
setQueryBatchSize
public void setQueryBatchSize(int queryBatchSize)
-
querySubset
public DBCIterator<byte[][]> querySubset(String tableName, String whereClause, long limitParam, Integer numberRefHops, String schemaName)
Retrieves XML field values in the data table referenced by the subset table tableName, or in tableName itself if it is a data table.
The method always first retrieves a batch of primary keys from the subset table and then gets the actual documents from the data table (necessary for the data table - subset paradigm). As this is unnecessary when querying directly from a data table, for that kind of query this method calls queryDataTable(String, String, String[], String).
The number of returned documents is restricted by limitParam. All documents are returned if limitParam is negative.
Note: whereClause could already contain an SQL 'LIMIT' specification. However, it won't work as expected since this limit expression would be applied to each batch of subset IDs that is used to query the data table. Using the limitParam parameter assures that you get at most as many documents from the iterator as specified. If tableName denotes a data table and whereClause does not already contain a 'LIMIT' expression, limitParam will be added to whereClause for the subsequent call to queryDataTable.
- Parameters:
tableName - Subset table determining which documents to retrieve from the data table; may also be a data table itself.
whereClause - An SQL where clause restricting the returned columns of each queried subset-ID batch. This clause must not change the rows returned (e.g. by 'GROUP BY').
limitParam - Maximum number of documents to return; a negative value returns all documents.
numberRefHops -
schemaName - The name of the table schema of the referenced data table.
- Returns:
- An iterator returning documents referenced from or in the table tableName.
- Throws:
SQLException
- See Also:
queryDataTable(String, String, String[], String)
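The difference between an SQL 'LIMIT' inside whereClause and the limitParam parameter can be made concrete with a toy model. The sketch below does not touch a database or CoStoSys; it only counts how many documents would come back if a limit were applied to every ID batch separately versus once to the overall result. The class name, method names and batch size are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of batched retrieval; not CoStoSys code. */
final class BatchLimitDemo {
    private BatchLimitDemo() {}

    /** Splits ids into consecutive batches of at most batchSize elements. */
    static List<List<Integer>> batches(List<Integer> ids, int batchSize) {
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            result.add(ids.subList(i, Math.min(i + batchSize, ids.size())));
        }
        return result;
    }

    /** Models a 'LIMIT n' inside whereClause: it is applied to every batch separately. */
    static long countWithPerBatchLimit(List<Integer> ids, int batchSize, int sqlLimit) {
        long returned = 0;
        for (List<Integer> batch : batches(ids, batchSize)) {
            returned += Math.min(batch.size(), sqlLimit);
        }
        return returned;
    }

    /** Models limitParam: one cap on the total; a negative value means no limit. */
    static long countWithLimitParam(List<Integer> ids, int batchSize, long limitParam) {
        long returned = 0;
        for (List<Integer> batch : batches(ids, batchSize)) {
            for (int i = 0; i < batch.size(); i++) {
                if (limitParam >= 0 && returned >= limitParam) {
                    return returned;
                }
                returned++;
            }
        }
        return returned;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            ids.add(i);
        }
        // 10 IDs in batches of 4, 4 and 2.
        System.out.println(countWithPerBatchLimit(ids, 4, 3)); // 3 + 3 + 2 = 8
        System.out.println(countWithLimitParam(ids, 4, 3));    // 3
        System.out.println(countWithLimitParam(ids, 4, -1));   // 10
    }
}
```

In this model a per-batch limit of 3 still yields 8 documents, while a limitParam of 3 yields exactly 3, which is the behaviour the note above describes.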
-
getNumColumnsAndFields
public org.apache.commons.lang3.tuple.Pair<Integer,List<Map<String,String>>> getNumColumnsAndFields(boolean joined, String[] schemaNames)
Helper method to determine the columns that are returned in case of a joining operation. Returns the number of returned fields and the corresponding field definitions. If joined is set to false, only the first table and the first schema are taken into account.
- Parameters:
joined - Whether the data is joined.
schemaNames - The names of the table schemas of the tables that are read. From the respective table schemas, the columns that are marked to be retrieved are extracted.
- Returns:
- A pair holding the number of retrieved columns and those columns themselves.
-
getNumRows
public long getNumRows(String tableName)
Returns the row count of the requested table.
- Parameters:
tableName - The table to count the rows of.
- Returns:
- The table row count.
-
status
public SubsetStatus status(String subsetTableName, Set<DataBaseConnector.StatusElement> statusElementsToReturn) throws TableNotFoundException
Returns information about how many rows are marked as is_in_process, is_processed and how many rows there are in total.
The respective values are stored under the keys Constants.IN_PROCESS, Constants.PROCESSED and Constants.TOTAL.
- Parameters:
subsetTableName - Name of the subset table to gain status information for.
- Returns:
- A SubsetStatus instance containing status information about the subset table subsetTableName
- Throws:
TableNotFoundException - If subsetTableName does not point to a database table.
-
getTableDefinition
public List<String> getTableDefinition(String tableName)
Queries the metadata for the columns of a table.
- Parameters:
tableName - The table.
- Returns:
- List of String containing name and type of each column
-
getScheme
public String getScheme()
- Returns:
- The active Postgres schema
-
getFieldConfiguration
public FieldConfig getFieldConfiguration()
- Returns:
- the active field configuration
-
addFieldConfiguration
public void addFieldConfiguration(FieldConfig config)
-
getFieldConfiguration
public FieldConfig getFieldConfiguration(String schemaName)
- Parameters:
schemaName - The name of the schema for which the FieldConfig should be returned.
- Returns:
- The field configuration for schemaName.
-
checkTableDefinition
public void checkTableDefinition(String tableName) throws TableSchemaMismatchException, TableNotFoundException
Checks whether the given table matches the active table schema.
- Parameters:
tableName - The table to check.
- Throws:
TableSchemaMismatchException
TableNotFoundException
- See Also:
checkTableDefinition(String, String)
-
checkTableDefinition
public void checkTableDefinition(String tableName, String schemaName) throws TableSchemaMismatchException, TableNotFoundException
Compares the actual table in the database with its definition in the XML configuration.
Note: This method currently does not check columns other than the primary key columns for tables that reference another table, even if those should actually be data tables.
This method makes use of the obtainOrReserveConnection(boolean) method to obtain a connection in case the current thread has not already obtained one.
- Parameters:
tableName - The table to check.
- Throws:
TableSchemaMismatchException
TableNotFoundException
-
setProcessed
public void setProcessed(String subsetTableName, List<byte[][]> primaryKeyList)
Sets the values of is_processed to TRUE and of is_in_process to FALSE for a collection of documents according to the given primary keys.
- Parameters:
subsetTableName - Name of the subset.
primaryKeyList - The list of primary keys, each of which can consist of several primary key elements.
-
setException
public void setException(String subsetTableName, ArrayList<byte[][]> primaryKeyList, HashMap<byte[][],String> logException)
Sets the value of has_errors to TRUE and adds a description in log for exceptions which occurred during the processing of a collection of documents according to the given primary keys.
- Parameters:
subsetTableName - Name of the subset.
primaryKeyList - The list of primary keys, each of which can consist of several primary key elements.
logException - Maps primary keys of unsuccessfully processed documents to the exceptions that occurred during the processing.
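One detail of the HashMap<byte[][],String> parameter is a plain Java fact worth keeping in mind: arrays do not override equals and hashCode, so a HashMap treats two arrays with the same content as different keys. How setException looks up its entries internally is not stated here; a cautious approach is to put the very same array instances into primaryKeyList and logException. The sketch below shows only the Java behaviour. The key helper and the UTF-8 encoding of the key element are assumptions made for the illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;

/** Illustration only; not part of CoStoSys. */
final class ExceptionLogDemo {
    private ExceptionLogDemo() {}

    /** Builds a single-element primary key. Assumption: key elements are UTF-8 encoded strings. */
    static byte[][] key(String id) {
        return new byte[][] { id.getBytes(StandardCharsets.UTF_8) };
    }

    public static void main(String[] args) {
        byte[][] failedDoc = key("doc1");

        // The same array instance goes into the list and into the map.
        ArrayList<byte[][]> primaryKeyList = new ArrayList<>();
        primaryKeyList.add(failedDoc);
        HashMap<byte[][], String> logException = new HashMap<>();
        logException.put(failedDoc, "java.io.IOException: could not parse document");

        // Lookup by the identical instance finds the message ...
        System.out.println(logException.get(primaryKeyList.get(0)));
        // ... but an equal-content copy is a different key and yields null.
        System.out.println(logException.get(key("doc1")));
    }
}
```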
-
getPrimaryKeyIndices
public List<Integer> getPrimaryKeyIndices()
Returns the indices of the primary keys, beginning with 0.
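A possible use of such zero-based indices is to pick the primary key elements out of a retrieved row. The sketch below assumes that the indices refer to column positions in a byte[][] row that contains all columns in schema order; this is an assumption, not something this documentation confirms. PrimaryKeyExtractor and extractPrimaryKey are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of CoStoSys. */
final class PrimaryKeyExtractor {
    private PrimaryKeyExtractor() {}

    /**
     * Collects the columns at the given zero-based positions, in the given order.
     * Assumption: pkIndices addresses positions within row.
     */
    static byte[][] extractPrimaryKey(byte[][] row, List<Integer> pkIndices) {
        byte[][] primaryKey = new byte[pkIndices.size()][];
        for (int i = 0; i < pkIndices.size(); i++) {
            primaryKey[i] = row[pkIndices.get(i)];
        }
        return primaryKey;
    }

    public static void main(String[] args) {
        // A made-up row of three columns where columns 0 and 2 form the primary key.
        byte[][] row = { {1}, {2}, {3} };
        byte[][] primaryKey = extractPrimaryKey(row, Arrays.asList(0, 2));
        System.out.println(Arrays.deepToString(primaryKey));
    }
}
```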
-
checkTableHasSchemaColumns
public void checkTableHasSchemaColumns(String tableName, String schema) throws TableSchemaMismatchException, TableNotFoundException
Checks if the given table has at least the columns defined in the given schema. An exception is raised if this is not the case.
- Parameters:
tableName - The table to check.
schema - The table schema to check against.
- Throws:
TableSchemaMismatchException - If the table misses at least one column defined in the given table schema.
TableNotFoundException
-
checkTableSchemaCompatibility
public void checkTableSchemaCompatibility(String referenceSchema, String[] schemaNames) throws TableSchemaMismatchException
- Throws:
TableSchemaMismatchException
-
checkTableSchemaCompatibility
public void checkTableSchemaCompatibility(String... schemaNames) throws TableSchemaMismatchException
- Throws:
TableSchemaMismatchException
-
getDbURL
public String getDbURL()
-
setDbURL
public void setDbURL(String uri)
-
close
public void close()
-
isDatabaseReachable
public boolean isDatabaseReachable()
-
addXmiDocumentFieldConfiguration
public FieldConfig addXmiDocumentFieldConfiguration(List<Map<String,String>> primaryKey, boolean doGzip)
Adds an auto-generated field configuration that exhibits the given primary key and all the fields required to store complete XMI document data (i.e. not segmented XMI parts but the whole serialized CAS) in a database table. The field configuration will have the given primary key and an additional field named 'xmi'. This method is used by the Jena Document Information System (JeDIS) components jcore-xmi-db-reader and jcore-xmi-db-consumer.
- Parameters:
primaryKey - The document primary key for which a document CAS XMI table schema should be created.
doGzip - Whether the XMI data should be gzipped in the table.
- Returns:
- The created field configuration.
-
addPKAdaptedFieldConfiguration
public FieldConfig addPKAdaptedFieldConfiguration(List<Map<String,String>> primaryKey, String fieldConfigurationForAdaption, String fieldConfigurationNameSuffix)
-
addPKAdaptedFieldConfiguration
public FieldConfig addPKAdaptedFieldConfiguration(List<Map<String,String>> primaryKey, String fieldConfigurationForAdaption, String fieldConfigurationNameSuffix, List<Map<String,String>> additionalColumns)
-
addXmiTextFieldConfiguration
public FieldConfig addXmiTextFieldConfiguration(List<Map<String,String>> primaryKey, List<Map<String,String>> additionalColumns, boolean doGzip)
Adds an auto-generated field configuration that exhibits the given primary key and all the fields required to store XMI base document data (i.e. the document text but not its annotations) in a database table. The additional fields are
- xmi
- max_xmi_id
- sofa_mapping
See also addXmiAnnotationFieldConfiguration(List, boolean). This method is used by the Jena Document Information System (JeDIS) components jcore-xmi-db-reader and jcore-xmi-db-consumer.
- Parameters:
primaryKey - The document primary key for which a base document XMI segmentation table schema should be created.
doGzip - Whether the XMI data should be gzipped in the table.
- Returns:
- The created field configuration.
-
addXmiAnnotationFieldConfiguration
public FieldConfig addXmiAnnotationFieldConfiguration(List<Map<String,String>> primaryKey, boolean doGzip)
Deprecated. JeDIS does not store annotations in columns to the primary document table.
Adds an auto-generated field configuration that exhibits the given primary key and all the fields required to store XMI annotation data (not base documents) in database tables. The only field besides the primary key is xmi and will store the actual XMI annotation data. This table schema is used for the storage of XMI annotation graph segments. Those segments will then correspond to UIMA annotation types that are stored in tables of their own. A table schema to store the base document is created by addXmiTextFieldConfiguration(List, List, boolean). This method is used by the Jena Document Information System (JeDIS) components jcore-xmi-db-reader and jcore-xmi-db-consumer.
- Parameters:
primaryKey - The document primary key for which a base document XMI segmentation table schema should be created.
doGzip - Whether the XMI data should be gzipped in the table.
- Returns:
- The created field configuration.
-
obtainConnection
public CoStoSysConnection obtainConnection()
Returns the connection associated with the current thread object if it exists.
- Returns:
- A connection associated with the current thread. Can be null if no shared connection for this thread is available.
- Throws:
IllegalStateException - If there are no reserved connections for the current thread.
- See Also:
obtainOrReserveConnection(boolean), releaseConnections(), reserveConnection(boolean)
-
obtainOrReserveConnection
public CoStoSysConnection obtainOrReserveConnection()
This is just a convenience method for obtainOrReserveConnection(boolean) with the parameter set to true.
- Returns:
- A database connection to the database as configured in the configuration.
-
obtainOrReserveConnection
public CoStoSysConnection obtainOrReserveConnection(boolean shared)
This is the preferred way to obtain a database connection. It will reuse an existing connection or get a new one if required.
A reserved connection is required by many internal methods that need a database connection. They will acquire it by calling obtainConnection(). This helps in reusing the same connection for multiple tasks within a single thread. This also helps to avoid deadlocks where a single thread requests multiple connections from the connection pool in method subcalls, blocking itself.
Guaranteed to return either an already reserved connection or a newly reserved one. The newlyReserved property of the returned object indicates whether the returned connection was newly reserved or not (true/false, respectively). To conveniently release the connection only when it was newly reserved, just close the CoStoSysConnection.
- Parameters:
shared - Whether or not the returned connection can be shared.
- Returns:
- The connection together with the information whether it was newly reserved or not.
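The reuse and release behaviour described for this method can be pictured with a small, self-contained model. The sketch below is not CoStoSys code: Registry and Handle are made-up stand-ins that only imitate the documented semantics, namely that a nested call on the same thread reuses the existing reservation and that closing a handle releases the reservation only if that handle newly reserved it. If CoStoSysConnection can be used in a try-with-resources statement, as the "just close" remark suggests, real code could follow the same nesting pattern.

```java
import java.util.Arrays;

/** Toy model of per-thread connection reservation; not CoStoSys code. */
final class ConnectionReuseDemo {
    private ConnectionReuseDemo() {}

    /** Stand-in for the connector: tracks whether the current thread holds a reservation. */
    static final class Registry {
        private final ThreadLocal<Boolean> reserved = ThreadLocal.withInitial(() -> false);

        Handle obtainOrReserve() {
            if (reserved.get()) {
                return new Handle(this, false); // reuse the existing reservation
            }
            reserved.set(true);
            return new Handle(this, true);      // newly reserved by this call
        }

        void release() {
            reserved.set(false);
        }

        int numReserved() {
            return reserved.get() ? 1 : 0;
        }
    }

    /** Stand-in for a connection object carrying a newlyReserved flag. */
    static final class Handle implements AutoCloseable {
        final boolean newlyReserved;
        private final Registry registry;

        Handle(Registry registry, boolean newlyReserved) {
            this.registry = registry;
            this.newlyReserved = newlyReserved;
        }

        @Override
        public void close() {
            // Only the handle that performed the reservation releases it.
            if (newlyReserved) {
                registry.release();
            }
        }
    }

    /** Nested task: obtains a handle while the caller already holds the reservation. */
    static int innerTask(Registry registry) {
        try (Handle ignored = registry.obtainOrReserve()) {
            return registry.numReserved();
        }
    }

    /** Returns the reservation count during the inner task, after it, and after the outer close. */
    static int[] outerTask(Registry registry) {
        int duringInner;
        int afterInner;
        try (Handle ignored = registry.obtainOrReserve()) {
            duringInner = innerTask(registry);
            afterInner = registry.numReserved();
        }
        return new int[] { duringInner, afterInner, registry.numReserved() };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(outerTask(new Registry()))); // [1, 1, 0]
    }
}
```

The inner close leaves the reservation in place because the inner handle was not newly reserved; only the outer close returns the count to zero.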
-
getNumReservedConnections
public int getNumReservedConnections()
-
getNumReservedConnections
public int getNumReservedConnections(boolean excludeNonShared)
-
reserveConnection
public CoStoSysConnection reserveConnection()
Just delegates to reserveConnection(true).
- Returns:
- A newly created, sharable connection.
-
reserveConnection
public CoStoSysConnection reserveConnection(boolean shared)
Only use when you are sure you need this method. Otherwise, use obtainOrReserveConnection(boolean).
Reserves a connection for the current thread. A reserved connection is required by many internal methods that need a database connection. They will acquire it by calling obtainConnection(). This helps in reusing the same connection for multiple tasks within a single thread. This also helps to avoid deadlocks where a single thread requests multiple connections from the connection pool in method subcalls, blocking itself.
Note that it is possible to reserve multiple connections but that this does not have any positive effect as of now. You should always reserve only one connection per thread. After the connection is no longer required, call releaseConnections() to free the connection.
- Parameters:
shared -
- Returns:
- The newly reserved connection.
- See Also:
obtainConnection(), releaseConnections()
-
releaseConnections
public void releaseConnections()
Releases all connections associated with the current thread back to the connection pool. After this call, the current thread will not have any reserved connections left.
-
withConnectionExecute
public void withConnectionExecute(DbcExecution command)
-
getProcessedPrimaryKeys
public List<Object[]> getProcessedPrimaryKeys(String subsetTable) throws CoStoSysException
Creates a query cursor to the given subset table and retrieves all those primary keys according to the active table schema that are marked as processed.
- Parameters:
subsetTable - The subset table to retrieve the processed primary key values from.
- Returns:
- The primary keys that are marked as processed in subsetTable.
- Throws:
CoStoSysException - If the given table is not a subset table.
-
getProcessedPrimaryKeys
public List<Object[]> getProcessedPrimaryKeys(String subsetTable, String tableSchema) throws CoStoSysException
Creates a query cursor to the given subset table and retrieves all those primary keys according to tableSchema that are marked as processed.
- Parameters:
subsetTable - The subset table to retrieve the processed primary key values from.
tableSchema - The schema of the data table referenced by the subset. Only the primary key columns are important.
- Returns:
- The primary keys that are marked as processed in subsetTable.
- Throws:
CoStoSysException - If the given table is not a subset table.
-
-