
Creating a connection to Google BigQuery as a target

Set up a connection between Data Integration and Google BigQuery as a target by configuring your Google Cloud Platform (GCP) account, setting up the necessary resources, and creating a connection in Data Integration.

You can either use Data Integration’s managed service account and Google Cloud Storage bucket or configure your own by enabling a Custom File Zone; both options are described below. For details on how column data type changes are handled, see Data type widening (Schema Drift) later in this topic.

Prerequisites

Ensure you have signed up for Google Cloud Platform (GCP) and have a console Admin user who can manage the necessary resources:

  • BigQuery API: enabled in your GCP project.
  • Google Cloud Storage JSON API: enabled in your GCP project.
  • BigQuery dataset: an existing dataset, or create a new one.
  • Google Cloud Storage bucket: required for the Custom File Zone setup.
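If you want to confirm these prerequisites from code, the following is a minimal sketch using the Google Cloud Python client libraries (google-cloud-bigquery and google-cloud-storage). The project, dataset, and bucket names are placeholders:

```python
# Minimal prerequisite check, assuming application-default credentials
# (for example, via `gcloud auth application-default login`).
from google.cloud import bigquery, storage

PROJECT_ID = "my-gcp-project"      # placeholder: your GCP Project ID
DATASET_ID = "my_dataset"          # placeholder: your BigQuery dataset
BUCKET_NAME = "my-staging-bucket"  # placeholder: only needed for a Custom File Zone

# Raises NotFound if the dataset is missing; succeeding also confirms
# the BigQuery API is enabled and reachable.
bq = bigquery.Client(project=PROJECT_ID)
dataset = bq.get_dataset(f"{PROJECT_ID}.{DATASET_ID}")
print("Dataset location:", dataset.location)

# Confirms the Cloud Storage JSON API is enabled and the bucket exists.
gcs = storage.Client(project=PROJECT_ID)
print("Bucket exists:", gcs.bucket(BUCKET_NAME).exists())
```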

Step 1: Create a Google BigQuery connection in Data Integration

  1. In the Data Integration console, navigate to Connections > New Connection and select Google BigQuery (Target).
  2. Obtain your Google Cloud Platform Project ID and Project Number from the GCP console home page.
  3. Enter these values on the Data Integration connection page.
  4. Select the Region where your BigQuery dataset is located.

Step 2: Choose service account management (file zone setup)

You can use Data Integration’s default service account and storage bucket, or manage your own via the Custom File Zone toggle:

  • Default (Custom File Zone OFF): Data Integration automatically provisions and manages a service account and a Google Cloud Storage bucket for you.
  • Custom File Zone (ON): Data Integration stages data in a Google Cloud Storage bucket that you manage within your own GCP project before inserting it into BigQuery.

Using Data Integration’s default file zone

Data Integration automatically creates and manages a dedicated service account and Google Cloud Storage bucket. However, you must still grant Data Integration’s service account access to your GCP project.

Grant access to Data Integration’s service account

Procedure

  1. Sign in to Google Cloud Platform console.
  2. Navigate to IAM & Admin > IAM.
  3. Click +GRANT ACCESS.
  4. Add the Data Integration service account as the principal.
  5. Assign the BigQuery Admin role and click Save.
  6. Go to APIs & Services > Library.
  7. Search for the BigQuery API and the Google Cloud Storage JSON API, and click Enable for each.
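If you prefer to script the grant in steps 3–5, here is a hedged sketch using the Resource Manager client library (google-cloud-resource-manager). The service account email is a placeholder for the one Data Integration provides:

```python
# Sketch: grant the BigQuery Admin role at the project level.
# Run as a user with IAM admin rights on the project.
from google.cloud import resourcemanager_v3

PROJECT_ID = "my-gcp-project"  # placeholder
MEMBER = "serviceAccount:data-integration@example.iam.gserviceaccount.com"  # placeholder

client = resourcemanager_v3.ProjectsClient()
resource = f"projects/{PROJECT_ID}"

# Read the current policy, append a binding, and write it back.
policy = client.get_iam_policy(request={"resource": resource})
binding = policy.bindings.add()
binding.role = "roles/bigquery.admin"
binding.members.append(MEMBER)
client.set_iam_policy(request={"resource": resource, "policy": policy})
```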

Using custom file zone

When you use the Custom File Zone option, Data Integration stages data in a Google Cloud Storage bucket that you manage within your GCP project.
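Conceptually, this is a stage-then-load pattern. The sketch below illustrates it with the Google Cloud client libraries; it uses placeholder names and is not Data Integration’s internal code:

```python
# Illustration of the stage-then-load pattern behind a Custom File Zone.
from google.cloud import bigquery, storage

PROJECT_ID = "my-gcp-project"                   # placeholder
BUCKET_NAME = "my-staging-bucket"               # placeholder: your file zone bucket
TABLE_ID = f"{PROJECT_ID}.my_dataset.my_table"  # placeholder

# 1. Stage the data in the bucket you manage.
blob = storage.Client(project=PROJECT_ID).bucket(BUCKET_NAME).blob("staging/data.csv")
blob.upload_from_string("id,name\n1,alice\n2,bob\n", content_type="text/csv")

# 2. Load the staged file into BigQuery.
bq = bigquery.Client(project=PROJECT_ID)
job = bq.load_table_from_uri(
    f"gs://{BUCKET_NAME}/staging/data.csv",
    TABLE_ID,
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
    ),
)
job.result()  # wait for the load job to finish
```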

Create a new service account in the GCP Console

Ensure your GCP user has the Service Account Admin role.

Procedure

  1. Go to IAM & Admin > Service Accounts.
  2. Click CREATE SERVICE ACCOUNT.
  3. Set the service account name (for example, Data_Integration User).
  4. Assign BigQuery Admin and Storage Admin roles.
  5. Copy the Service Account ID / Email.
  6. Select Manage Keys > ADD KEY > Create new key, and choose the JSON key type. The key file is downloaded to your machine.
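Before uploading the key in the next procedure, you can sanity-check the downloaded file; a small sketch (the file name is a placeholder):

```python
# Sketch: verify the downloaded file is a valid service-account key.
import json

with open("my-key.json") as f:  # placeholder: path to the downloaded key
    key = json.load(f)

print("Type:   ", key["type"])          # should be "service_account"
print("Project:", key["project_id"])
print("Email:  ", key["client_email"])  # the Service Account ID you copied
```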

Add the service account to the connection

Procedure

  1. Open your connection by navigating to Connections > New Connection, and choose Google BigQuery (Target).
  2. Upload the JSON key to the Data Integration connection.
  3. Enter your Google Cloud Storage bucket name in the Default Bucket field. If Data Integration cannot populate the bucket list automatically, enter the bucket name manually.
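To confirm that the key actually grants access to the bucket you entered, a minimal sketch (file and bucket names are placeholders):

```python
# Sketch: check that the service-account key can see the Default Bucket.
from google.cloud import storage
from google.oauth2 import service_account

creds = service_account.Credentials.from_service_account_file("my-key.json")
client = storage.Client(project=creds.project_id, credentials=creds)
print("Bucket reachable:", client.bucket("my-staging-bucket").exists())
```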

Additional configuration options

Data type widening (Schema Drift)

When the data type of a column changes in the source table, Data Integration adjusts the target table to accommodate a "wider" data type. For example:

  • If a column changes from an Integer to a String, the target data type becomes a String, which can represent both the original integer values and the new string values.
  • If a column changes from a String to a Date, it remains a String in the target.

You can turn this behavior on or off using the toggle in the Additional Options section:

  • If ON, the River adjusts data types automatically.
  • If OFF, the River fails if it encounters data type mismatches.
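As an illustration of the concept only (not Data Integration’s actual implementation), the widening rule can be pictured as a small lookup that falls back to String whenever the types disagree:

```python
# Conceptual sketch of data type widening; not Data Integration's code.
def widen(current_type: str, incoming_type: str) -> str:
    """Return the target type after a source-side type change."""
    if current_type == incoming_type:
        return current_type
    # Any mismatch widens to STRING, which can still represent the
    # earlier values (e.g. INTEGER 42 becomes the string "42").
    return "STRING"

assert widen("INTEGER", "STRING") == "STRING"
assert widen("STRING", "DATE") == "STRING"  # stays a String, as noted above
```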

Notifications

To receive warnings about data type changes, set the On Warning toggle to True in the settings.

Known issues

  • Storage admin permissions: The default "Storage Admin" role may lack the storage.buckets.get permission. If this occurs:

    • Duplicate the "Storage Admin" role and add the missing permission.
    • Assign the custom role to your service account.
  • Location consistency: When using BigQuery as a target, ensure that the connection and the BigQuery dataset are in the same location (region). Mismatched locations cause errors such as: Error: Cannot read and write in different locations. (A pre-flight check for both issues is sketched below.)
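A hedged pre-flight sketch for both issues, run with the service account’s key (all names are placeholders):

```python
# Sketch: pre-flight checks for the two known issues above.
from google.cloud import bigquery, storage
from google.oauth2 import service_account

PROJECT_ID = "my-gcp-project"      # placeholder
BUCKET_NAME = "my-staging-bucket"  # placeholder
DATASET_ID = "my_dataset"          # placeholder
CONNECTION_REGION = "US"           # placeholder: the Region set on the connection

creds = service_account.Credentials.from_service_account_file("my-key.json")

# 1. Does the service account actually hold storage.buckets.get?
bucket = storage.Client(project=PROJECT_ID, credentials=creds).bucket(BUCKET_NAME)
granted = bucket.test_iam_permissions(["storage.buckets.get"])
print("storage.buckets.get granted:", "storage.buckets.get" in granted)

# 2. Do the connection region and the dataset location match?
bq = bigquery.Client(project=PROJECT_ID, credentials=creds)
dataset = bq.get_dataset(f"{PROJECT_ID}.{DATASET_ID}")
if dataset.location.upper() != CONNECTION_REGION.upper():
    raise SystemExit(f"Location mismatch: dataset is in {dataset.location}")
print("Locations match:", dataset.location)
```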
