The Valohai Ecosystem database connectors allow you to easily run database queries on AWS Redshift, GCP BigQuery and Snowflake. You can provide the information required for authenticating the connection as environment variables in Valohai. For Redshift and BigQuery it is also possible to use machine identity (IAM roles) to authenticate the database connection.
You can define the query in the Valohai UI, and it will be saved with the other details of the execution. Moreover, the query results are saved as execution outputs and automatically uploaded to your data store, so you can use them in other jobs you run in Valohai. Like any other execution output, the result file gets a datum URL, which allows you to attach a datum alias to it.
Below, you can find instructions on how to connect to your database on AWS Redshift.
Prerequisites:
- A Redshift cluster on your AWS account with imported data.
- A cluster security group that allows connections from the valohai-sg-workers security group.
- A way to authenticate with the database, either by:
  - giving ValohaiWorkerRole (or some other role that is attached to the workers; contact Valohai support for help) access to Redshift, or
  - using a username/password combination that can be used to run the query.
To give ValohaiWorkerRole access to the cluster, you will need to attach the following policy to it. You can limit the access, for example, to certain users or certain databases if needed.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "redshift:GetClusterCredentials",
      "Resource": "*"
    }
  ]
}
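If you prefer to script the setup, the policy above could be attached to the worker role with boto3, roughly as sketched below. The inline policy name is an assumption, and ValohaiWorkerRole may be named differently in your account; adjust both to your setup.

```python
import json


# The same policy document as above, expressed as a Python dict.
REDSHIFT_CREDENTIALS_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "redshift:GetClusterCredentials",
            "Resource": "*",
        }
    ],
}


def attach_policy(role_name: str = "ValohaiWorkerRole") -> None:
    """Attach the policy as an inline policy on the worker role."""
    import boto3  # AWS SDK for Python; requires AWS credentials to run

    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="redshift-get-cluster-credentials",  # assumed policy name
        PolicyDocument=json.dumps(REDSHIFT_CREDENTIALS_POLICY),
    )
```

Calling `attach_policy()` performs the actual API call, so it needs IAM permissions of its own; you can equally well paste the JSON into the IAM console.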
Add environment variables
You will need to define the following environment variables for the execution:
- RSCLUSTERIDENTIFIER, your Redshift cluster identifier.
- RSDATABASE, the name of your Redshift database.
- RSHOST, the Redshift cluster endpoint: <cluster-identifier>.abcdefgh123a.<region>.redshift.amazonaws.com
- RSIAM, 1 by default, set to 0 to use username/password login instead of the ValohaiWorkerRole.
- RSPORT, 5439 by default.
- RSREGION, region of the Redshift cluster.
If using password login (i.e. RSIAM is set to 0), also define:
- RSUSER, Redshift database username.
- RSPASSWORD, Redshift database password.
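To illustrate how these variables fit together, a connector along these lines could translate them into a database connection. This is only a sketch assuming the redshift_connector Python driver; the actual ecosystem step may differ, and the db_user default shown is hypothetical.

```python
import os


def build_connection_kwargs(env: dict) -> dict:
    """Translate the RS* environment variables into redshift_connector kwargs."""
    use_iam = env.get("RSIAM", "1") != "0"  # IAM login is the default (RSIAM=1)
    kwargs = {
        "host": env["RSHOST"],
        "port": int(env.get("RSPORT", "5439")),
        "database": env["RSDATABASE"],
        "iam": use_iam,
    }
    if use_iam:
        # Credentials come from the worker's IAM role via GetClusterCredentials.
        kwargs.update(
            cluster_identifier=env["RSCLUSTERIDENTIFIER"],
            region=env["RSREGION"],
            db_user=env.get("RSUSER", "valohai"),  # hypothetical default user
        )
    else:
        # Plain username/password login.
        kwargs.update(user=env["RSUSER"], password=env["RSPASSWORD"])
    return kwargs


def run_query(sql: str):
    """Open a connection from the environment and run one query."""
    import redshift_connector  # assumed driver: pip install redshift_connector

    with redshift_connector.connect(**build_connection_kwargs(dict(os.environ))) as conn:
        with conn.cursor() as cursor:
            cursor.execute(sql)
            return cursor.fetchall()
```

Note how RSIAM only switches which credential fields are passed; the host, port, and database are needed in both modes.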
It is possible to add the environment variables separately to each execution while creating it in the UI, but we recommend saving them either under the project Settings, in the Env Variables tab, or as an organization-level environment variable group (ask your organization admin for help with this). If needed, you can edit the environment variables in the UI before starting the execution.
Create a Valohai execution with the AWS Redshift connector
To create an execution with the Redshift connector, follow the instructions below.
- Open your project at app.valohai.com.
- Click on the Create Execution button under the Executions tab.
- Expand the step library by clicking on the plus sign next to valohai-ecosystem in the left side menu.
- Choose the redshift-query step under Connectors.
- (Optional) Change the settings, such as the environment or Docker image, under the Runtime section.
- Write your SQL query into the field under the Parameters section.
- By default, the query results will be saved as results.csv, but you can also define another path for the output.
- (Optional) You can give the output file a datum alias, e.g. redshift-query, to easily refer to it in your executions with datum://redshift-query.
- If you did not save the environment variables under the project Settings or as an organization-level environment variable group, add them under the Environment Variables section.
- You can edit and/or add new environment variables if needed.
- Click on Create Execution.
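For reference, saving the query results as an execution output boils down to writing a file under /valohai/outputs, which Valohai uploads to your data store when the execution finishes. The helper below is an illustrative sketch, not the connector's actual code; the column names in the usage are made up.

```python
import csv
import os


def save_results(rows, columns, output_path="/valohai/outputs/results.csv"):
    """Write query results as CSV into the Valohai outputs directory.

    Anything written under /valohai/outputs is uploaded to the project's
    data store after the execution and receives its own datum URL.
    """
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    with open(output_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns)  # header row
        writer.writerows(rows)    # data rows
    return output_path
```

A call such as `save_results([(1, "a")], ["id", "name"])` would produce the default results.csv; passing a different output_path corresponds to changing the output path in the step's parameters.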