public class StreamTransferManager extends Object
The data is split into chunks and uploaded using the multipart upload API. The uploading is done on separate threads, the number of which is configured by the user.
After creating an instance with details of the upload, use getMultiPartOutputStreams() to get a list of MultiPartOutputStreams. As you write data to these streams, call MultiPartOutputStream.checkSize() regularly. When you finish, call MultiPartOutputStream.close(). Parts will be uploaded to S3 as you write. Once all streams have been closed, call complete(). Alternatively, you can call abort() at any point if needed.
Here is an example. Much of the code just sets up threads that generate data and is unrelated to the library itself; the essential parts are commented.
```java
AmazonS3Client client = new AmazonS3Client(awsCreds);
int numStreams = 2;
int numUploadThreads = 2;
int queueCapacity = 2;
int partSize = 5;
// Setting up
final StreamTransferManager manager = new StreamTransferManager(bucket, key, client, numStreams,
        numUploadThreads, queueCapacity, partSize);
final List<MultiPartOutputStream> streams = manager.getMultiPartOutputStreams();
ExecutorService pool = Executors.newFixedThreadPool(numStreams);
for (int i = 0; i < numStreams; i++) {
    final int streamIndex = i;
    pool.submit(new Runnable() {
        public void run() {
            try {
                MultiPartOutputStream outputStream = streams.get(streamIndex);
                for (int lineNum = 0; lineNum < 1000000; lineNum++) {
                    String line = generateData(streamIndex, lineNum);
                    // Writing data and potentially sending off a part
                    outputStream.write(line.getBytes());
                    try {
                        outputStream.checkSize();
                    } catch (InterruptedException e) {
                        throw new RuntimeException(e);
                    }
                }
                // The stream must be closed once all the data has been written
                outputStream.close();
            } catch (Exception e) {
                // Aborts all uploads
                manager.abort(e);
            }
        }
    });
}
pool.shutdown();
pool.awaitTermination(5, TimeUnit.SECONDS);
// Finishing off
manager.complete();
```
The final file on S3 will then usually be the result of concatenating all the data written to each stream,
in the order that the streams appear in the list obtained from getMultiPartOutputStreams(). However, this
may not be true if multiple streams are used and some of them produce less than 5 MB of data. This is because the multipart
upload API does not allow the uploading of more than one part smaller than 5 MB, which leads to fundamental limits
on what this class can accomplish. If the order of the data is important to you, either use only one stream or ensure
that you write at least 5 MB to every stream.
While performing the multipart upload this class will create instances of InitiateMultipartUploadRequest,
UploadPartRequest, and CompleteMultipartUploadRequest, fill in the essential details, and send them
off. If you need to add additional details then override the appropriate customise*Request methods and
set the required properties within.
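As an illustration, a subclass overriding customiseInitiateRequest might set server-side encryption and a content type on the initiation request. The subclass name and metadata values below are hypothetical, not part of the library:

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
import com.amazonaws.services.s3.model.ObjectMetadata;

// Hypothetical subclass: attaches metadata to the InitiateMultipartUploadRequest
// before StreamTransferManager sends it off.
public class EncryptedStreamTransferManager extends StreamTransferManager {

    public EncryptedStreamTransferManager(String bucketName, String putKey,
                                          AmazonS3 s3Client, int numStreams,
                                          int numUploadThreads, int queueCapacity,
                                          int partSize) {
        super(bucketName, putKey, s3Client, numStreams, numUploadThreads,
                queueCapacity, partSize);
    }

    @Override
    public void customiseInitiateRequest(InitiateMultipartUploadRequest request) {
        ObjectMetadata metadata = new ObjectMetadata();
        // Request SSE-S3 (AES-256) encryption for the final object
        metadata.setSSEAlgorithm(ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION);
        metadata.setContentType("text/plain");
        request.setObjectMetadata(metadata);
    }
}
```

Note that because the constructor initiates the multipart upload, the override runs during superclass construction, so it should not rely on fields of the subclass being initialized.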
This class does not perform retries when uploading. If an exception is thrown at any stage the upload will be aborted and the
exception rethrown, wrapped in a RuntimeException.
| Modifier and Type | Field and Description |
|---|---|
| `protected String` | `bucketName` |
| `protected String` | `putKey` |
| `protected com.amazonaws.services.s3.AmazonS3` | `s3Client` |
| `protected String` | `uploadId` |
| Constructor and Description |
|---|
| `StreamTransferManager(String bucketName, String putKey, com.amazonaws.services.s3.AmazonS3 s3Client, int numStreams, int numUploadThreads, int queueCapacity, int partSize)` Initiates a multipart upload to S3 using the first three parameters. |
| Modifier and Type | Method and Description |
|---|---|
| `void` | `abort()` Aborts the upload. |
| `void` | `abort(Throwable throwable)` Aborts the upload and logs a message including the stack trace of the given throwable. |
| `void` | `complete()` Blocks while waiting for the threads uploading the contents of the streams returned by `getMultiPartOutputStreams()` to finish, then sends a request to S3 to complete the upload. |
| `void` | `customiseCompleteRequest(com.amazonaws.services.s3.model.CompleteMultipartUploadRequest request)` |
| `void` | `customiseInitiateRequest(com.amazonaws.services.s3.model.InitiateMultipartUploadRequest request)` |
| `void` | `customiseUploadPartRequest(com.amazonaws.services.s3.model.UploadPartRequest request)` |
| `List<MultiPartOutputStream>` | `getMultiPartOutputStreams()` |
| `String` | `toString()` |
protected final String bucketName
protected final String putKey
protected final com.amazonaws.services.s3.AmazonS3 s3Client
protected final String uploadId
public StreamTransferManager(String bucketName, String putKey, com.amazonaws.services.s3.AmazonS3 s3Client, int numStreams, int numUploadThreads, int queueCapacity, int partSize)
Initiates a multipart upload to S3 using the first three parameters, and creates several MultiPartOutputStreams and threads to upload the parts they produce in parallel.
Parts that have been produced sit in a queue of specified capacity while they wait for a thread to upload them.
The worst case memory usage is therefore (numStreams + numUploadThreads + queueCapacity) * partSize,
while higher values for numStreams, numUploadThreads, and queueCapacity may lead to better resource usage and throughput.
S3 allows at most 10,000 parts to be uploaded. This means that if you are uploading very large files, the part size must be big enough to compensate. Moreover, the part numbers are distributed equally among streams, so keep this in mind if you might write much more data to some streams than others.
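To make the sizing concrete, here is a small sketch (not part of the library; the helper names are made up) computing the worst-case buffer usage and, from the even split of part numbers, a lower bound on the capacity of each stream for the example configuration above. Since partSize is only a minimum, actual parts and hence capacity can be larger.

```java
public class SizingSketch {
    // Worst case: every stream, upload thread and queue slot can hold
    // one part of partSize MB simultaneously.
    static int worstCaseMemoryMB(int numStreams, int numUploadThreads,
                                 int queueCapacity, int partSizeMB) {
        return (numStreams + numUploadThreads + queueCapacity) * partSizeMB;
    }

    // S3 allows at most 10,000 parts per upload, and part numbers are
    // distributed equally among streams, so with minimum-sized parts each
    // stream can upload at least this much data.
    static long minCapacityPerStreamMB(int partSizeMB, int numStreams) {
        return (long) partSizeMB * (10_000 / numStreams);
    }

    public static void main(String[] args) {
        // Example configuration: 2 streams, 2 upload threads, queue capacity 2, 5 MB parts
        System.out.println(worstCaseMemoryMB(2, 2, 2, 5));   // (2+2+2)*5 = 30 MB
        System.out.println(minCapacityPerStreamMB(5, 2));    // 5*5000 = 25000 MB
    }
}
```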
Parameters:
- `numStreams` - the number of MultiPartOutputStreams that will be created for you to write to.
- `numUploadThreads` - the number of threads that will upload parts as they are produced.
- `queueCapacity` - the capacity of the queue that holds parts yet to be uploaded.
- `partSize` - the minimum size of each part in MB before it gets uploaded. The minimum is 5 due to limitations of S3; more than 500 is not useful in most cases, as this corresponds to the 5 TB limit on the total size of any upload.

public List<MultiPartOutputStream> getMultiPartOutputStreams()
public void complete()
Blocks while waiting for the threads uploading the contents of the streams returned by
getMultiPartOutputStreams() to finish, then sends a request to S3 to complete
the upload. For the former to finish, it's essential that every stream is closed, otherwise the upload
threads will block forever waiting for more data.

public void abort(Throwable throwable)
Aborts the upload and logs a message including the stack trace of the given throwable.
public void abort()
Aborts the upload.
public void customiseInitiateRequest(com.amazonaws.services.s3.model.InitiateMultipartUploadRequest request)
public void customiseUploadPartRequest(com.amazonaws.services.s3.model.UploadPartRequest request)
public void customiseCompleteRequest(com.amazonaws.services.s3.model.CompleteMultipartUploadRequest request)
Copyright © 2015. All rights reserved.