You can access BigQuery from your Valohai executions to run queries.
Create a Service Account
The easiest way to authenticate your Valohai jobs with BigQuery is using a GCP Service Account.
- Go to the GCP Project that hosts your Valohai resources
- Navigate to IAM & Admin -> Service Accounts
- Create a new service account
- Grant the service account the BigQuery User and BigQuery Data Viewer roles
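The steps above can also be sketched with the gcloud CLI. The project id and service account name below are placeholders; substitute your own values:

```shell
# Create the service account (names are placeholders; adjust to your project).
gcloud iam service-accounts create valohai-bigquery \
    --project=my-valohai-project \
    --display-name="Valohai BigQuery access"

# Grant it the BigQuery roles it needs to run queries and read data.
gcloud projects add-iam-policy-binding my-valohai-project \
    --member="serviceAccount:valohai-bigquery@my-valohai-project.iam.gserviceaccount.com" \
    --role="roles/bigquery.user"

gcloud projects add-iam-policy-binding my-valohai-project \
    --member="serviceAccount:valohai-bigquery@my-valohai-project.iam.gserviceaccount.com" \
    --role="roles/bigquery.dataViewer"
```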
Share the email address of the service account with Valohai, along with the environments you’d like to attach it to. Each Valohai environment can be configured to use a different service account.
If your BigQuery data is in a different GCP Project than your Valohai resources, you’ll need to go to that project and give the newly created service account BigQuery User permissions there. In this case, your service account doesn’t need the BigQuery User and BigQuery Data Viewer permissions in the project that hosts only the Valohai resources and no BigQuery data.
Connect to BigQuery
In your code, you can use the Python Client for Google BigQuery to connect to BigQuery directly. When you launch your Valohai executions, choose the environment that has the service account attached, and your job will be automatically authenticated with the service account credentials.
from google.cloud import bigquery

bqclient = bigquery.Client(project='myproject')

# Download query results.
query_string = """
SELECT
  CONCAT(
    'https://stackoverflow.com/questions/',
    CAST(id as STRING)) as url,
  view_count
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE tags like '%google-bigquery%'
ORDER BY view_count DESC
"""

df = (
    bqclient.query(query=query_string)
    .result()
    .to_dataframe()
)
print(df.head())
df.to_csv("/valohai/outputs/dump.csv")
You’ll need to have the google-cloud-bigquery[bqstorage,pandas] package installed to run the example above.
pip install --upgrade 'google-cloud-bigquery[bqstorage,pandas]'
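If the query needs runtime inputs (for example, values coming from a Valohai parameter), the client also supports query parameters, which avoids building SQL strings by hand. A minimal sketch assuming the same public dataset and a placeholder project id:

```python
from google.cloud import bigquery

# 'myproject' is a placeholder; use your own GCP project id.
bqclient = bigquery.Client(project='myproject')

# @tag is bound via a query parameter instead of string formatting.
query_string = """
SELECT view_count
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE tags LIKE @tag
ORDER BY view_count DESC
LIMIT 10
"""

job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("tag", "STRING", "%google-bigquery%"),
    ]
)

df = (
    bqclient.query(query_string, job_config=job_config)
    .result()
    .to_dataframe()
)
print(df.head())
```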