In this guide, we’ll link a private Google Cloud Platform bucket to a Valohai project.
Requirements
- A Google Cloud Platform project that you can administer
- A Valohai project to link the Google Cloud Storage bucket to
Create the bucket
Using an existing Google Cloud Storage bucket
You can skip this part and go directly to the next section if you’re using an existing Cloud Storage bucket.
Create a bucket through the Google Cloud Platform web console.
Recommended configuration for the bucket:
- Name: can be anything valid for GCP; here we are using sample-valohai-bucket as an example.
- Region: pick the region that hosts the majority of the workers you’ll be using, to minimize data transfer.
- Storage Class: use Standard if you have no further preference
- Access control: Uniform (allow only bucket-level permissions)
- Encryption: Google-managed key
- Retention Policy: none
- Labels: none
Keep pressing Continue until you’ve created the bucket.
Now you have an empty bucket that you can use for your data, e.g. training datasets and models.
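If you prefer the command line, the console settings above can be approximated with gsutil. This is a minimal sketch and not part of the console walkthrough: the region europe-west1 is only an example value, and run is a hypothetical helper that prints the command unless you set DRY_RUN=0. Encryption, retention policy and labels are left at their defaults.

```shell
# Example values; replace them with your own.
BUCKET="sample-valohai-bucket"
REGION="europe-west1"   # assumption: pick the region closest to your workers

# Safety switch: by default the command is only printed.
# Set DRY_RUN=0 to actually create the bucket.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# -l sets the location, -c the storage class,
# -b on enables uniform bucket-level access.
run gsutil mb -l "$REGION" -c standard -b on "gs://$BUCKET"
```

Review the printed command first, then rerun with DRY_RUN=0 in an environment where gsutil is authenticated against your GCP project.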
Create a service account
Next, we’ll create a new service account using the GCP console. The service account is “an account” that Valohai workers use to access this particular GCP bucket.
Navigate to IAM & admin > Service accounts > Create service account
Name your service account so that you can later remember what it’s meant for (here we are using my-valohai-bucket-admin) and press Create.
On the next screen, you don’t need to add any roles as we will configure more limited access rights later. Just press Continue.
Press the Create Key button and select the JSON format. This automatically downloads a JSON file that we’ll use later.
The resulting JSON file will look something like this:
{
"type": "...",
"project_id": "...",
"private_key_id": "...",
"private_key": "...",
"client_email": "my-valohai-bucket-admin@chubby.iam.gserviceaccount.com",
"client_id": "...",
"auth_uri": "...",
"token_uri": "...",
"auth_provider_x509_cert_url": "...",
"client_x509_cert_url": "..."
}
Also, take note of the client_email value; we’ll use it later.
You can also find the service account email later in the Service Accounts listing.
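If you would rather read the email straight from the downloaded key file, a small shell helper can extract it. This is a sketch under assumptions: extract_client_email is a hypothetical function name, service-account.json is a placeholder path, and the sed pattern expects the "client_email": "..." layout shown above.

```shell
# Print the client_email value from a service account key file.
# Assumes the value appears as "client_email": "..." in the file.
extract_client_email() {
  sed -n 's/.*"client_email"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# KEY_FILE is a placeholder path; point it at the JSON file you downloaded.
KEY_FILE="${KEY_FILE:-service-account.json}"

if [ -f "$KEY_FILE" ]; then
  extract_client_email "$KEY_FILE"
else
  echo "Key file not found: $KEY_FILE" >&2
fi
```

Treat the key file as a secret: it contains the private key, so avoid committing it to version control or pasting it into shared logs.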
Allow access for the new service account
Next, we permit the new service account to access files in the bucket.
Navigate to Storage > Browse > “your-bucket” > Permissions > Add member
- New members: Copy and paste the service account email from the previous section into the field. The console will validate it.
- Role: Select Storage Object Admin. This allows downloading and uploading files.
- Press the Save button.
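The same permission can likely be granted from the command line with gsutil iam ch. The following is a sketch rather than the documented procedure: the bucket name and service account email are the example values used in this guide, objectAdmin is the gsutil shorthand for the Storage Object Admin role, and run is a hypothetical helper that prints the command unless DRY_RUN=0 is set.

```shell
# Example values from this guide; replace them with your own.
BUCKET="sample-valohai-bucket"
SA_EMAIL="my-valohai-bucket-admin@chubby.iam.gserviceaccount.com"

# Safety switch: by default the command is only printed.
# Set DRY_RUN=0 to actually change the bucket's IAM policy.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Grant the service account the Storage Object Admin role on this bucket only.
run gsutil iam ch "serviceAccount:${SA_EMAIL}:objectAdmin" "gs://${BUCKET}"
```

Granting the role on the bucket rather than on the whole project keeps the service account limited to this one bucket, which matches the console steps above.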
Set CORS settings for your bucket
Click on “Activate Google Cloud Shell” in the upper right corner, then run the following steps in the shell.
- Create a new CORS configuration file:
  echo '[{"origin": ["*"],"responseHeader": ["Content-Type", "x-ms-*"],"method": ["GET", "HEAD", "OPTIONS"],"maxAgeSeconds": 3600}, {"origin": ["https://app.valohai.com"],"responseHeader": ["Content-Type", "x-ms-*"],"method": ["POST", "PUT"],"maxAgeSeconds": 3600}]' > cors-config.json
- Update the CORS settings for your bucket:
  gsutil cors set cors-config.json gs://<your-bucket-name>
- Check the CORS settings:
  gsutil cors get gs://<your-bucket-name>
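The one-line echo command is hard to read, so here is the same configuration written with a heredoc, as an optional alternative for creating cors-config.json. The values are copied from the command above; only the layout differs. The first rule allows read-style requests (GET, HEAD, OPTIONS) from any origin, and the second allows uploads (POST, PUT) only from https://app.valohai.com.

```shell
# Write the CORS configuration in a readable, multi-line form.
# The quoted 'EOF' prevents the shell from expanding anything inside.
cat > cors-config.json <<'EOF'
[
  {
    "origin": ["*"],
    "responseHeader": ["Content-Type", "x-ms-*"],
    "method": ["GET", "HEAD", "OPTIONS"],
    "maxAgeSeconds": 3600
  },
  {
    "origin": ["https://app.valohai.com"],
    "responseHeader": ["Content-Type", "x-ms-*"],
    "method": ["POST", "PUT"],
    "maxAgeSeconds": 3600
  }
]
EOF
```

The resulting file can be applied with the same gsutil cors set command as before.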
Link the store to Valohai
You can connect this data store either to a single project or create it at the organization level.
Link to a Valohai organization
- Log in at https://app.valohai.com
- Navigate to Hi, <name> (the top-right menu) > Manage <organization>.
- Open the Data Stores tab and add your store's details.
The data store can be shared with everyone in the organization, or you can expose it only to certain teams.
Link the store to a Valohai project
Navigate to Project > Settings > Data Stores > Add Google Storage
- Name: it usually makes sense to use the same name as the bucket.
- Bucket: the bucket name; sample-valohai-bucket in this example.
- Service Account JSON: copy and paste the contents of the JSON file we downloaded earlier.
When you create the store, the provided credentials are checked by creating a small test file in the bucket. If the creation succeeds, you are good to go.
Once the data store is linked, you can set it as your project’s default upload store under Settings > General > Default upload store. This ensures that uploaded outputs will be stored in this particular GCP bucket by default.