BigQuery Data Type Update
The workflow Env - BigQuery Data Type Update in the functions repo is used to update the data type of a specific root-level field in BigQuery tables (the `_latest` and `_changelog` tables) in the `bqDataLake` dataset. The workflow is triggered manually via the `workflow_dispatch` event with user-provided inputs from GitHub Actions.
📋 Workflow Overview
Trigger
Manually triggered using `workflow_dispatch`.
Input Parameters

| Input Name | Description | Required | Default |
|---|---|---|---|
| table_name | Name of the base table | ✅ Yes | "" |
| field_name | Name of the field to be updated | ✅ Yes | "" |
| data_type | Target BigQuery data type | ✅ Yes | "" |
🧠 Function Logic (updateDataTypeOnRootLevel.ts)
The script performs the following actions:
- Reads environment variables: `PROJECT_ENV`, `TABLE_NAME`, `FIELD_NAME`, `DATA_TYPE`
- Prepares table names:
  - Appends `_latest` and `_changelog` to the base table name
- Generates and executes ALTER TABLE queries:
  - Updates the column's data type in both tables using BigQuery SQL
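The first two steps can be sketched as follows. This is an illustrative sketch, not the script's exact code: the `prepareTables` helper and its error message are assumptions, but the env-var names and `_latest`/`_changelog` suffixes come from this doc.

```typescript
type Env = Record<string, string | undefined>;

// Illustrative sketch: validate the required env vars and derive the two
// table names the script operates on. Helper name and error text are
// assumptions; the env-var names and suffixes match this doc.
function prepareTables(env: Env): { projectEnv: string; tables: string[] } {
  const required = ['PROJECT_ENV', 'TABLE_NAME', 'FIELD_NAME', 'DATA_TYPE'];
  for (const key of required) {
    if (!env[key]) {
      throw new Error(`Missing required environment variable: ${key}`);
    }
  }
  const tableName = env.TABLE_NAME as string;
  // Both the _latest and _changelog variants of the base table are updated.
  return {
    projectEnv: env.PROJECT_ENV as string,
    tables: [`${tableName}_latest`, `${tableName}_changelog`],
  };
}

const { tables } = prepareTables({
  PROJECT_ENV: 'my-project-dev', // hypothetical project id
  TABLE_NAME: 'accounts_serviceUnit_enquiry_invoice',
  FIELD_NAME: 'grandTotal.value',
  DATA_TYPE: 'STRING',
});
console.log(tables);
```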
TypeScript Code Summary

```ts
const tables = [`${tableName}_latest`, `${tableName}_changelog`];
for (const table of tables) {
  const sqlQuery = `
    ALTER TABLE ${projectEnv}.bqDataLake.${table}
    ALTER COLUMN ${field} SET DATA TYPE ${dataType};`;
  await bigQuery.query({ query: sqlQuery });
  console.log(`Updated column data type in table ${table}`);
}
```
Error Handling

If any required environment variable is missing or the query fails, an error message is logged:

```ts
console.log('Error:', error?.errors?.[0]?.message ?? 'Data type update failed.');
```
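The optional chaining above guards against errors that lack the BigQuery API's `errors` array. A minimal sketch of that fallback logic, assuming the error shape shown in the snippet (the `BigQueryLikeError` interface and `formatError` helper are illustrative, not part of the script):

```typescript
// Illustrative error shape matching the optional chain in the log line:
// error?.errors?.[0]?.message
interface BigQueryLikeError {
  errors?: { message: string }[];
}

// Returns the first API error message if present, else the doc's fallback text.
function formatError(error: BigQueryLikeError | undefined): string {
  return error?.errors?.[0]?.message ?? 'Data type update failed.';
}

console.log('Error:', formatError({ errors: [{ message: 'Column not found' }] }));
console.log('Error:', formatError(undefined));
```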
✅ Example Usage

To trigger this workflow, go to the Actions tab in GitHub, select Env - BigQuery Data Type Update, and provide the required inputs:
- Table Name: `accounts_serviceUnit_enquiry_invoice`
- Field Name: `grandTotal.value`
- Data Type: `STRING`
📌 Notes

- Ensure the service account has BigQuery Data Editor permissions.
- The script will alter both the `_latest` and `_changelog` tables.
- Always validate schema compatibility before changing types in production.
- This works only for fields at the root level. Nested fields must be handled manually using queries.
- See the BigQuery conversion rules: https://cloud.google.com/bigquery/docs/reference/standard-sql/conversion_rules
Updating Nested Fields
📄 Casted Fields Query Example

The following is an example query that casts nested fields in `accounts_serviceUnit_enquiry_invoice_changelog`:
```sql
CREATE OR REPLACE TABLE bqDataLake.accounts_serviceUnit_enquiry_invoice_changelog
PARTITION BY DATE(document_timestamp) AS
SELECT
  * EXCEPT(jobs),
  ARRAY(
    SELECT AS STRUCT
      job.* EXCEPT(priceLine),
      (SELECT AS STRUCT
        job.priceLine.* EXCEPT(parts, labour),
        (SELECT AS STRUCT
          job.priceLine.parts.* EXCEPT(partsAndFluids),
          ARRAY(
            SELECT AS STRUCT
              paf.* EXCEPT(unitPrice, quantity),
              (SELECT AS STRUCT
                paf.unitPrice.* EXCEPT(raw),
                CAST(paf.unitPrice.raw AS FLOAT64) AS raw
              ) AS unitPrice,
              (SELECT AS STRUCT
                paf.quantity.* EXCEPT(raw),
                CAST(paf.quantity.raw AS FLOAT64) AS raw
              ) AS quantity
            FROM UNNEST(job.priceLine.parts.partsAndFluids) AS paf
          ) AS partsAndFluids
        ) AS parts,
        (SELECT AS STRUCT
          job.priceLine.labour.* EXCEPT(manufacturerGoodwill),
          (SELECT AS STRUCT
            job.priceLine.labour.manufacturerGoodwill.* EXCEPT(raw),
            CAST(job.priceLine.labour.manufacturerGoodwill.raw AS FLOAT64) AS raw
          ) AS manufacturerGoodwill
        ) AS labour
      ) AS priceLine
    FROM UNNEST(jobs) AS job
  ) AS jobs
FROM bqDataLake.accounts_serviceUnit_enquiry_invoice_changelog;
```
| Field Path | Original Type | New Type |
|---|---|---|
| job.priceLine.parts.partsAndFluids.unitPrice.raw | INTEGER | FLOAT64 |
| job.priceLine.parts.partsAndFluids.quantity.raw | INTEGER | FLOAT64 |
| job.priceLine.labour.manufacturerGoodwill.raw | INTEGER | FLOAT64 |
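The query above repeats one pattern per leaf field: rebuild the enclosing struct with `EXCEPT(field)` and re-add the field with a `CAST`. When writing such queries by hand, a small generator can keep the fragments consistent. This is a hypothetical helper (not part of the workflow) that emits that innermost fragment for a given struct path:

```typescript
// Hypothetical helper (not part of the workflow): builds the
// "re-cast one leaf field of a struct" SQL fragment used repeatedly
// in the query above, e.g. for paf.unitPrice.raw -> FLOAT64.
function castStructField(structPath: string, field: string, newType: string): string {
  const alias = structPath.split('.').pop(); // struct is re-emitted under its own name
  return [
    `(SELECT AS STRUCT`,
    `  ${structPath}.* EXCEPT(${field}),`,
    `  CAST(${structPath}.${field} AS ${newType}) AS ${field}`,
    `) AS ${alias}`,
  ].join('\n');
}

console.log(castStructField('paf.unitPrice', 'raw', 'FLOAT64'));
```

For `paf.unitPrice` this reproduces the `unitPrice` fragment in the query above; the outer `EXCEPT`/`ARRAY` wrapping for each nesting level still has to be written by hand.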