This package provides implementations of the bigdata services (the metadata service, the data service, and the transaction manager service).
A bigdata federation comprises the following essential services:
bigdata provides the following optional services. These are not considered essential since (a) they are not required by clients that make direct use of bigdata as a distributed database; and (b) they are themselves applications of the services briefly described above.
In order to deploy a bigdata federation, the following preconditions must be met (a standalone deployment can be realized by meeting these preconditions on a single host). Deployment will go more smoothly if Jini is running before you start the various bigdata services, since they will then be able to register themselves immediately on startup. By default, service discovery uses the local network and the "public" group. If you want to run more than one bigdata federation on the same network, then you MUST edit the configuration file so that each federation uses a distinct group.
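As a hedged illustration, the discovery group might be overridden in the Jini configuration file along these lines (the component name and entry name below are assumptions for illustration, not the actual bigdata configuration schema):

```java
/* Hypothetical Jini configuration fragment -- component and entry names
 * are invented for this example. */
com.example.bigdata {
    // Use a federation-specific group instead of the default "public"
    // so that two federations on the same network do not discover each
    // other's services.
    groups = new String[] { "myFederation" };
}
```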
Hosts:
In addition, there must be at least one host on which the following preconditions are true (a different host could be used for each of these services, and more than one instance of each service may be running on the network). These services should be started during the boot procedure so that they are available whenever the host is up and running.
Clients:
The serviceID identifies the data service. If the service dies and then restarts on the same host with the same data, then it is still the same service instance. Likewise, if the service dies and is recreated on another host with the same data, then it is still the same instance. What matters is (a) that the service has the same data (journals, index segments, and serviceID); and (b) that the data for that service has not become stale (or inconsistent).
The easiest way in which the data can become stale is for the service to fail. When failover is operating, clients will be redirected to one or more other services having consistent data for the same index partitions. If clients then write on _any_ of those index partitions and commit, then the old service now has stale (aka inconsistent) data. A service with inconsistent data MUST NOT be used.
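The staleness rule above can be sketched as a toy model in Java. This is not bigdata's actual failover protocol; the class, method, and commit-counter comparison are assumptions made purely for illustration:

```java
// Hypothetical illustration: a service's data is stale if any replica of
// the same index partitions has committed writes this service never saw.
public class StalenessSketch {

    static boolean isStale(long serviceLastCommit, long partitionLatestCommit) {
        // A service that is behind the latest commit for its partitions
        // holds inconsistent data and MUST NOT be used.
        return serviceLastCommit < partitionLatestCommit;
    }

    public static void main(String[] args) {
        System.out.println(isStale(41L, 42L)); // a replica committed past us
        System.out.println(isStale(42L, 42L)); // caught up: still consistent
    }
}
```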
In general, the majority of a service's state will live in its index segments. There can be hundreds or thousands of times more data in the index segments than in any single journal, which can make it worthwhile to make the service consistent again rather than discard it.
While bigdata makes use of Jini for service registration and discovery, it does not use downloaded code for executing its basic services (the necessary interfaces and implementation classes are already bundled in bigdata.jar). However, clients MAY use downloaded code in order to have data servers execute remote procedures or extension batch operations. In this scenario, the client creates and registers a Jini service. The service should perform all of its operations locally. The data server will look up the service using Jini and then execute it locally.
In order for downloaded code to work correctly, clients must ensure that their classes are exposed by an HTTP server accessible to the bigdata servers and declared to the JVM in which the client runs using:
-Djava.rmi.server.codebase=...
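The pattern might look roughly like the following pure-JDK sketch. The interface and class names here are invented for this example, and the actual Jini registration and lookup calls are omitted; in a real deployment the class bytes of the procedure would be served from the HTTP server named by `-Djava.rmi.server.codebase`:

```java
import java.io.Serializable;

// Hypothetical sketch of the downloaded-code pattern: the client ships a
// Serializable procedure; the data server deserializes it (loading the
// class from the client's codebase) and runs it locally against its data.
interface IRemoteProcedure extends Serializable {
    Object apply(Object localState); // executed on the data server
}

public class CodebaseSketch {

    // A client-defined procedure. Its class bytes would be downloaded by
    // the data server from the client's codebase HTTP server.
    static class CountingProcedure implements IRemoteProcedure {
        private static final long serialVersionUID = 1L;
        public Object apply(Object localState) {
            // Operate purely on server-local state; no callbacks to the
            // client, so the operation completes locally on the server.
            return ((String) localState).length();
        }
    }

    public static void main(String[] args) {
        // Simulated server-side execution of the downloaded procedure.
        IRemoteProcedure proc = new CountingProcedure();
        System.out.println(proc.apply("local-index-data"));
    }
}
```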
@todo Consider this use case further. Are there other/better ways to execute remote code on the data server? At a minimum, we should supply a base class and Ant script support for creating such remote services.