There are various ways to get data out of BigQuery using the API, including streaming the results into memory and creating a file for download. However, if you need to download a large amount of data, it must first be exported to Google Cloud Storage, and the resulting files downloaded from there.

In my case the data was to be loaded into a SQL database, so I exported it as CSV, but it can also be exported as JSON or Avro depending on the structure of your results.

The way to export a BigQuery table is to first run a query against it and then export the query results to Cloud Storage. The complete code for this post is here.

RunBigQueryAsync

public static async Task<BigQueryResults> RunBigQueryAsync(BigQueryClient bigQueryClient, string query)
{
	// Create the query job; this query takes no parameters.
	BigQueryJob bigQueryJob = await bigQueryClient.CreateQueryJobAsync(
		sql: query,
		parameters: null
		);

	// Wait for the query to finish before fetching the results.
	await bigQueryJob.PollUntilCompletedAsync();

	return await bigQueryClient.GetQueryResultsAsync(bigQueryJob.Reference);
}
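
The parameters argument is null above because the query takes none. If you do need to filter the query, BigQueryParameter values can be passed instead of null. A minimal sketch — the table name, column and value below are placeholders, not from the post:

	// Sketch: a parameterized query instead of parameters: null.
	// Table, column and value are hypothetical placeholders.
	string parameterizedQuery =
		"select * from `my-project.my_dataset.my_table` where country = @country";

	BigQueryJob parameterizedJob = await bigQueryClient.CreateQueryJobAsync(
		sql: parameterizedQuery,
		parameters: new[]
		{
			new BigQueryParameter("country", BigQueryDbType.String, "GB")
		});

	await parameterizedJob.PollUntilCompletedAsync();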

ExportBigQueryTableToStorageAsync

public static async Task ExportBigQueryTableToStorageAsync(BigQueryClient bigQueryClient, string destinationUri, BigQueryResults results)
{
	// Export the results as gzip-compressed CSV.
	CreateExtractJobOptions jobOptions = new CreateExtractJobOptions()
	{
		DestinationFormat = FileFormat.Csv,
		Compression = CompressionType.Gzip
	};

	// The query results live in a temporary table, referenced via results.TableReference.
	BigQueryJob job = await bigQueryClient.CreateExtractJobAsync(
		projectId: results.TableReference.ProjectId,
		datasetId: results.TableReference.DatasetId,
		tableId: results.TableReference.TableId,
		destinationUri: destinationUri,
		options: jobOptions
	);

	// Wait for the extract job to finish before returning.
	await job.PollUntilCompletedAsync();
}
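
One thing worth noting about destinationUri: BigQuery can only export up to 1 GB of table data to a single file. For anything larger, the URI must contain a wildcard (*) so the output can be sharded across multiple files. The bucket and paths below are placeholders:

	// Single file (tables up to 1 GB):
	// gs://my-bucket/exports/my_table_20201027.csv.gz

	// Sharded output (required for larger tables);
	// BigQuery replaces * with a sequential file number:
	// gs://my-bucket/exports/my_table_20201027_*.csv.gz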

Usage

GoogleCredential googleCredential = GoogleCredential.FromFile(_googleCredentialPath);
BigQueryClient bigQueryClient = await BigQueryClient.CreateAsync(_bigQueryProjectId, googleCredential);
string query = $"select * from `{_bigQueryProjectId}.{_bigQueryDataSetId}.{_bigQueryTableId}`";

BigQueryResults results = await RunBigQueryAsync(bigQueryClient, query);

await ExportBigQueryTableToStorageAsync(bigQueryClient, _cloudStorageDestinationUri, results);
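
To complete the round trip, the exported file can then be downloaded from Cloud Storage. A minimal sketch, assuming the Google.Cloud.Storage.V1 package; the bucket, object name and local path are hypothetical placeholders:

	// Sketch: downloading the exported file from Cloud Storage.
	// Bucket, object name and local path are placeholders, not from the post.
	StorageClient storageClient = await StorageClient.CreateAsync(googleCredential);

	using (FileStream fileStream = File.Create("results.csv.gz"))
	{
		await storageClient.DownloadObjectAsync(
			bucket: "my-bucket",
			objectName: "exports/my_table_20201027.csv.gz",
			destination: fileStream);
	}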

2 Comments

Jason · 26th October 2020 at 5:24 pm

What does your destination URI look like? I’m using the same extract job options but my URI is gs://path/test-*.csv. This doesn’t compress the file and if I try to view it the data is corrupted. If I change the file extension to .gz or .gzip then it’s compressed, but when I extract it, it extracts a “file” without an extension. The only thing that works is setting the compression to None and leaving the file extension in the URI as .csv. Any suggestions?

    Shinigami · 27th October 2020 at 10:05 am

    Hi, yes I seem to remember getting confused by this as well. My URIs are of the format gs://lm_output/lm-external/sand/ftv_model_scoring__customer_attributes_20201027095711.csv. If you set the destination to csv and the compression to gzip then it should hopefully work as expected.

    CreateExtractJobOptions jobOptions = new CreateExtractJobOptions()
    {
        DestinationFormat = FileFormat.Csv,
        Compression = CompressionType.Gzip
    };
