| Package | Description |
|---|---|
| com.azure.storage.file.datalake | Package containing the class required for DataLakeStorageClient. |
| com.azure.storage.file.datalake.models | Package containing classes for DataLakeStorageClient. |
| com.azure.storage.file.datalake.options | Package containing options model classes used by Azure Storage File Datalake. |
| com.azure.storage.file.datalake.sas | Package containing sas related classes for DataLakeStorageClient. |
| com.azure.storage.file.datalake.specialized | Package containing specialized lease clients for Azure Storage File Data Lake. |
Azure Data Lake Storage is Microsoft's optimized storage solution for big data analytics workloads. A fundamental part of Data Lake Storage Gen2 is the addition of a hierarchical namespace to Blob storage. The hierarchical namespace organizes objects/files into a hierarchy of directories for efficient data access.
Source code | API reference documentation | REST API documentation | Product documentation | Samples
Add a dependency on Azure Storage File Datalake
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-storage-file-datalake</artifactId>
<version>12.4.0</version>
</dependency>
To create a Storage Account you can use the Azure Portal or Azure CLI. Note: To use data lake, your account must have hierarchical namespace enabled.
# Install the extension “Storage-Preview”
az extension add --name storage-preview
# Create the storage account
az storage account create -n my-storage-account-name -g my-resource-group --sku Standard_LRS --kind StorageV2 --hierarchical-namespace true
Your storage account URL, subsequently identified as <your-storage-account-url>, has the form https://<your-storage-account-name>.dfs.core.windows.net
In order to interact with the Storage Service you'll need to create an instance of the Service Client class. To make this possible you'll need the Account SAS (shared access signature) string of the Storage Account. Learn more at SAS Token
a. Use the Azure CLI snippet below to get the SAS token from the Storage Account.
az storage blob generate-sas \
--account-name {Storage Account name} \
--container-name {container name} \
--name {blob name} \
--permissions {permissions to grant} \
--expiry {datetime to expire the SAS token} \
--services {storage services the SAS allows} \
--resource-types {resource types the SAS allows}
Example:
CONNECTION_STRING=<connection-string>
az storage blob generate-sas \
--account-name MyStorageAccount \
--container-name MyContainer \
--name MyBlob \
--permissions racdw \
--expiry 2020-06-15
b. Alternatively, get the Account SAS Token from the Azure Portal.
- Select Shared access signature from the menu on the left.
- Click Generate SAS and connection string (after setup).

a. Use Account name and Account key. Account name is your Storage Account name.
- Select Access keys from the menu on the left.
- Under key1/key2, copy the contents of the Key field.

or
b. Use the connection string.
- Select Access keys from the menu on the left.
- Under key1/key2, copy the contents of the Connection string field.

DataLake Storage Gen2 was designed to:
- Service multiple petabytes of information while sustaining hundreds of gigabits of throughput
- Allow you to easily manage massive amounts of data

Key features of DataLake Storage Gen2 include:
- Hadoop compatible access
- A superset of POSIX permissions
- Cost effective in terms of low-cost storage capacity and transactions
- Optimized driver for big data analytics
In the past, cloud-based analytics had to compromise in areas of performance, management, and security. Data Lake Storage Gen2 addresses each of these aspects in the following ways:
- Performance is optimized because you do not need to copy or transform data as a prerequisite for analysis. The hierarchical namespace greatly improves the performance of directory management operations, which improves overall job performance.
- Management is easier because you can organize and manipulate files through directories and subdirectories.
- Security is enforceable because you can define POSIX permissions on directories or individual files.
- Cost effectiveness is made possible as Data Lake Storage Gen2 is built on top of the low-cost Azure Blob storage. The additional features further lower the total cost of ownership for running big data analytics on Azure.
Data Lake Storage Gen2 offers two types of resources:
- The filesystem, used via 'DataLakeFileSystemClient'
- The path, used via 'DataLakeFileClient' or 'DataLakeDirectoryClient'

| ADLS Gen2 | Blob |
|---|---|
| Filesystem | Container |
| Path (File or Directory) | Blob |
Note: This client library does not support hierarchical namespace (HNS) disabled storage accounts.
Paths are addressable using the following URL format. For example, the following URL addresses a file:
https://${myaccount}.dfs.core.windows.net/${myfilesystem}/${myfile}
For the storage account, the base URI for datalake operations includes the name of the account only:
https://${myaccount}.dfs.core.windows.net
For a file system, the base URI includes the name of the account and the name of the file system:
https://${myaccount}.dfs.core.windows.net/${myfilesystem}
For a file/directory, the base URI includes the name of the account, the name of the file system and the name of the path:
https://${myaccount}.dfs.core.windows.net/${myfilesystem}/${mypath}
Note that the above URIs may not hold for more advanced scenarios such as custom domain names.
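The three URL shapes described above can be assembled with plain string handling. The following is a minimal sketch, assuming the default dfs.core.windows.net endpoint, names that need no URL encoding, and no custom domain; the class DataLakeUrls and its methods are hypothetical helpers written for illustration and are not part of the client library.

```java
/** Hypothetical helper for illustration only; not part of the Azure SDK. */
final class DataLakeUrls {
    // Assumes the default public endpoint suffix; custom domains are not handled.
    private static final String DFS_SUFFIX = ".dfs.core.windows.net";

    private DataLakeUrls() { }

    /** Base URL of the storage account: account name only. */
    static String accountUrl(String account) {
        requireNonEmpty(account, "account");
        return "https://" + account + DFS_SUFFIX;
    }

    /** URL of a file system: account name plus file system name. */
    static String fileSystemUrl(String account, String fileSystem) {
        requireNonEmpty(fileSystem, "fileSystem");
        return accountUrl(account) + "/" + fileSystem;
    }

    /** URL of a file or directory: account, file system, and path. */
    static String pathUrl(String account, String fileSystem, String path) {
        requireNonEmpty(path, "path");
        // Drop a single leading slash so the result never contains "//".
        String relative = path.startsWith("/") ? path.substring(1) : path;
        requireNonEmpty(relative, "path");
        return fileSystemUrl(account, fileSystem) + "/" + relative;
    }

    private static void requireNonEmpty(String value, String name) {
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException(name + " must not be empty");
        }
    }

    public static void main(String[] args) {
        System.out.println(accountUrl("myaccount"));
        System.out.println(fileSystemUrl("myaccount", "myfilesystem"));
        System.out.println(pathUrl("myaccount", "myfilesystem", "mydir/myfile"));
    }
}
```

A string produced this way could be passed to a builder's endpoint(...) call in place of a hand-written URL.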
The following sections provide several code snippets covering some of the most common Azure Data Lake Storage tasks, including creating the following clients:
- DataLakeServiceClient
- DataLakeFileSystemClient
- DataLakeFileClient
- DataLakeDirectoryClient

Create a DataLakeServiceClient using the sasToken generated above.
DataLakeServiceClient dataLakeServiceClient = new DataLakeServiceClientBuilder()
.endpoint("<your-storage-account-url>")
.sasToken("<your-sasToken>")
.buildClient();
or
// Only one "?" is needed here. If the SAS token already starts with "?", remove that leading "?" first.
DataLakeServiceClient dataLakeServiceClient = new DataLakeServiceClientBuilder()
.endpoint("<your-storage-account-url>" + "?" + "<your-sasToken>")
.buildClient();
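The rule that the combined endpoint must contain exactly one "?" can be enforced with a small helper instead of manual string editing. This is a minimal sketch: the class SasEndpoints and the method withSasToken are hypothetical names introduced here, and the token values in main are dummy placeholders rather than a real SAS.

```java
/** Hypothetical helper for illustration only; not part of the Azure SDK. */
final class SasEndpoints {
    private SasEndpoints() { }

    /**
     * Joins an endpoint URL and a SAS token with exactly one "?",
     * whether or not the token was copied with a leading "?".
     */
    static String withSasToken(String endpoint, String sasToken) {
        if (endpoint == null || endpoint.isEmpty()) {
            throw new IllegalArgumentException("endpoint must not be empty");
        }
        if (sasToken == null) {
            throw new IllegalArgumentException("sasToken must not be null");
        }
        // Strip one leading "?" if present, then add a single separator back.
        String token = sasToken.startsWith("?") ? sasToken.substring(1) : sasToken;
        if (token.isEmpty()) {
            throw new IllegalArgumentException("sasToken must not be empty");
        }
        return endpoint + "?" + token;
    }

    public static void main(String[] args) {
        String endpoint = "https://myaccount.dfs.core.windows.net";
        // Both calls print the same URL; "sv=dummy&sig=dummy" is a placeholder token.
        System.out.println(withSasToken(endpoint, "?sv=dummy&sig=dummy"));
        System.out.println(withSasToken(endpoint, "sv=dummy&sig=dummy"));
    }
}
```

The same helper applies to the file system, file, and directory endpoints, since only the part before the "?" differs.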
Create a DataLakeFileSystemClient using a DataLakeServiceClient.
DataLakeFileSystemClient dataLakeFileSystemClient = dataLakeServiceClient.getFileSystemClient("myfilesystem");
or
Create a DataLakeFileSystemClient from the builder, using the sasToken generated above.
DataLakeFileSystemClient dataLakeFileSystemClient = new DataLakeFileSystemClientBuilder()
.endpoint("<your-storage-account-url>")
.sasToken("<your-sasToken>")
.fileSystemName("myfilesystem")
.buildClient();
or
// Only one "?" is needed here. If the SAS token already starts with "?", remove that leading "?" first.
DataLakeFileSystemClient dataLakeFileSystemClient = new DataLakeFileSystemClientBuilder()
.endpoint("<your-storage-account-url>" + "/" + "myfilesystem" + "?" + "<your-sasToken>")
.buildClient();
Create a DataLakeFileClient using a DataLakeFileSystemClient.
DataLakeFileClient fileClient = dataLakeFileSystemClient.getFileClient("myfile");
or
Create a DataLakeFileClient from the builder, using the sasToken generated above.
DataLakeFileClient fileClient = new DataLakePathClientBuilder()
.endpoint("<your-storage-account-url>")
.sasToken("<your-sasToken>")
.fileSystemName("myfilesystem")
.pathName("myfile")
.buildFileClient();
or
// Only one "?" is needed here. If the SAS token already starts with "?", remove that leading "?" first.
DataLakeFileClient fileClient = new DataLakePathClientBuilder()
.endpoint("<your-storage-account-url>" + "/" + "myfilesystem" + "/" + "myfile" + "?" + "<your-sasToken>")
.buildFileClient();
Get a DataLakeDirectoryClient using a DataLakeFileSystemClient.
DataLakeDirectoryClient directoryClient = dataLakeFileSystemClient.getDirectoryClient("mydir");
or
Create a DataLakeDirectoryClient from the builder, using the sasToken generated above.
DataLakeDirectoryClient directoryClient = new DataLakePathClientBuilder()
.endpoint("<your-storage-account-url>")
.sasToken("<your-sasToken>")
.fileSystemName("myfilesystem")
.pathName("mydir")
.buildDirectoryClient();
or
// Only one "?" is needed here. If the SAS token already starts with "?", remove that leading "?" first.
DataLakeDirectoryClient directoryClient = new DataLakePathClientBuilder()
.endpoint("<your-storage-account-url>" + "/" + "myfilesystem" + "/" + "mydir" + "?" + "<your-sasToken>")
.buildDirectoryClient();
Create a file system using a DataLakeServiceClient.
dataLakeServiceClient.createFileSystem("myfilesystem");
or
Create a file system using a DataLakeFileSystemClient.
dataLakeFileSystemClient.create();
Enumerate all paths using a DataLakeFileSystemClient.
for (PathItem pathItem : dataLakeFileSystemClient.listPaths()) {
    System.out.println("This is the path name: " + pathItem.getName());
}
Rename a file using a DataLakeFileClient.
// This operation requires authenticating with Azure Identity and assigning the "Storage Blob Data Contributor" role.
DataLakeFileClient fileClient = dataLakeFileSystemClient.getFileClient("myfile");
fileClient.create();
fileClient.rename("new-file-system-name", "new-file-name");
Rename a directory using a DataLakeDirectoryClient.
// This operation requires authenticating with Azure Identity and assigning the "Storage Blob Data Contributor" role.
DataLakeDirectoryClient directoryClient = dataLakeFileSystemClient.getDirectoryClient("mydir");
directoryClient.create();
directoryClient.rename("new-file-system-name", "new-directory-name");
Get properties from a file using a DataLakeFileClient.
DataLakeFileClient fileClient = dataLakeFileSystemClient.getFileClient("myfile");
fileClient.create();
PathProperties properties = fileClient.getProperties();
Get properties from a directory using a DataLakeDirectoryClient.
DataLakeDirectoryClient directoryClient = dataLakeFileSystemClient.getDirectoryClient("mydir");
directoryClient.create();
PathProperties properties = directoryClient.getProperties();
The Azure Identity library provides Azure Active Directory support for authenticating with Azure Storage.
DataLakeServiceClient storageClient = new DataLakeServiceClientBuilder()
.endpoint("<your-storage-account-url>")
.credential(new DefaultAzureCredentialBuilder().build())
.buildClient();
When interacting with data lake using this Java client library, errors returned by the service correspond to the same HTTP
status codes returned for REST API requests. For example, if you try to retrieve a file system or path that
doesn't exist in your Storage Account, a 404 error is returned, indicating Not Found.
All client libraries by default use the Netty HTTP client. Adding the above dependency will automatically configure the client library to use the Netty HTTP client. Configuring or changing the HTTP client is detailed in the HTTP clients wiki.
All client libraries, by default, use the Tomcat-native Boring SSL library to enable native-level performance for SSL operations. The Boring SSL library is an uber jar containing native libraries for Linux / macOS / Windows, and provides better performance compared to the default SSL implementation within the JDK. For more information, including how to reduce the dependency size, refer to the performance tuning section of the wiki.
Several Storage Data Lake Java SDK samples are available to you in the SDK's GitHub repository.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Copyright © 2021 Microsoft Corporation. All rights reserved.