


Exporting Data From BigQuery

Once you've loaded your data into BigQuery, you can export the data in several formats. BigQuery can export up to 1 GB of data per file, and supports exporting to multiple files.

To export data, create a job and populate the configuration.extract object.


Access control

Exporting data requires the following access levels.

  • BigQuery: dataset-level READER access. For more information, see access control.

  • Google Cloud Storage: WRITE access for the Google Cloud Storage buckets. For more information, see Access Control - Google Cloud Storage.


Quota policy

The following limits apply when exporting data from BigQuery.

  • Daily Limit: 1,000 exports per day, up to 10 TB
  • Multiple Wildcard URI Limit: 500 URIs per export


Configuration options

You can configure two aspects of the exported data: the format and the compression type.

  • destinationFormat controls the format. BigQuery supports CSV, JSON, and Avro formats. The Avro format can't be used in combination with GZIP compression.

  • compression controls the compression type. BigQuery supports GZIP compression or NONE.
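For example, a job configuration that combines the two options to export GZIP-compressed CSV might look like the following sketch; the bucket, file, and table identifiers are placeholders, and the structure mirrors the extract configuration example later on this page.

    # A sketch only: export the source table as GZIP-compressed CSV.
    # <bucket> and <file> are placeholders for your Cloud Storage destination.
    jobData = {
        'projectId': projectId,
        'configuration': {
          'extract': {
            'sourceTable': {
               'projectId': projectId,
               'datasetId': datasetId,
               'tableId': tableId
             },
            'destinationUris': ['gs://<bucket>/<file>.csv.gz'],
            'destinationFormat': 'CSV',
            'compression': 'GZIP'
           }
         }
       }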
Avro format

BigQuery expresses Avro formatted data in the following ways:


Extract configuration example

The following code example shows the configuration of a job that exports data to a single newline-delimited JSON file.

    jobData = {
        'projectId': projectId,
        'configuration': {
          'extract': {
            'sourceTable': {
               'projectId': projectId,
               'datasetId': datasetId,
               'tableId': tableId
             },
            'destinationUris': ['gs://<bucket>/<file>'],
            'destinationFormat': 'NEWLINE_DELIMITED_JSON'
           }
         }
       }

The destinationUris property indicates the location(s) and file name(s) where BigQuery should export your files to. Usually, you'll pass a single value so that BigQuery exports to a single file, but you can alternatively pass one or more wildcard URIs. For more information on what to specify for the destinationUris property, see exporting data into one or more files.


Exporting data into one or more files

The destinationUris property indicates the location(s) and file name(s) where BigQuery should export your files to.

BigQuery supports a single wildcard operator (*) in each URI. Using the wildcard operator instructs BigQuery to create multiple sharded files based on the supplied pattern. The wildcard operator is replaced with a file number (starting at 0), left-padded to 12 digits. For example, 000000000001 for the second file.
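As an illustration of that naming scheme only (not an API call), the following sketch expands a hypothetical wildcard URI the way BigQuery numbers its shards:

    # Illustration only: expand a wildcard URI the way BigQuery numbers shards,
    # replacing the * with a file number left-padded to 12 digits.
    pattern = 'gs://my-bucket/file-name-*.json'   # hypothetical pattern
    for i in range(3):
        print pattern.replace('*', '%012d' % i)
    # gs://my-bucket/file-name-000000000000.json
    # gs://my-bucket/file-name-000000000001.json
    # gs://my-bucket/file-name-000000000002.json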

The following options are available for the destinationUris property:

Single URI

Use a single URI if you want BigQuery to export your data to a single file. This option is the most common use case, as exported data is generally less than BigQuery's 1 GB per file maximum value.

    Property definition:

    ['gs://my-bucket/file-name.json']

    Creates:

    gs://my-bucket/file-name.json
Single wildcard URI

Use a single wildcard URI if you think your exported data will be larger than BigQuery's 1 GB per file maximum value. BigQuery shards your data into multiple files based on the provided pattern.

    Property definition:

    ['gs://my-bucket/file-name-*.json']

    Creates:

    gs://my-bucket/file-name-000000000000.json
    gs://my-bucket/file-name-000000000001.json
    gs://my-bucket/file-name-000000000002.json
    ...
Multiple wildcard URIs

Use multiple wildcard URIs if you want to partition the export output. You would use this option if you're running a parallel processing job with a service like Hadoop on Google Cloud Platform. Determine how many workers are available to process the job, and create one URI per worker. BigQuery treats each URI location as a partition, and uses parallel processing to shard your data into multiple files in each location. You can use whatever pattern you'd like in your file name, provided there is a single wildcard operator in each URI, each URI is unique, and the number of URIs does not exceed the quota policy.

When you pass more than one wildcard URI, BigQuery creates a special zero record file at the end of each partition that indicates the "final" file in the set. This file name indicates how many shards BigQuery created (see the sketch after the example listing below).

For example, if your wildcard URI is gs://my-bucket/file-name-<worker number>-*.json, and BigQuery creates 80 sharded files, the zero record file name is gs://my-bucket/file-name-<worker number>-000000000080.json. You can use this file name to determine that BigQuery created 80 sharded files (named 000000000000 through 000000000079).

Note that a zero record file might contain more than 0 bytes depending on the data format, such as when exporting data in CSV format with a column header.

    String pattern:

    gs://my-bucket/file-name-<worker number>-*.json

    Property definition:

    ['gs://my-bucket/file-name-1-*.json',
    'gs://my-bucket/file-name-2-*.json', 
    'gs://my-bucket/file-name-3-*.json']

    Creates:

    This example assumes that BigQuery creates 80 sharded data files in each partition; the last file listed for each partition (000000000080) is the zero record file.

    gs://my-bucket/file-name-1-000000000000.json
    gs://my-bucket/file-name-1-000000000001.json
    ...
    gs://my-bucket/file-name-1-000000000080.json
    gs://my-bucket/file-name-2-000000000000.json
    gs://my-bucket/file-name-2-000000000001.json
    ...
    gs://my-bucket/file-name-2-000000000080.json
    gs://my-bucket/file-name-3-000000000000.json
    gs://my-bucket/file-name-3-000000000001.json
    ...
    gs://my-bucket/file-name-3-000000000080.json
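As a rough sketch (the helper name below is hypothetical and simply assumes the naming scheme described above), you could recover the shard count for a partition from its zero record file name:

    import re

    # Hypothetical helper: given the zero record ("final") file name for a
    # partition, return how many sharded data files BigQuery created before it.
    def shard_count_from_final_file(final_file_name):
        match = re.search(r'-(\d{12})\.json$', final_file_name)
        return int(match.group(1))

    print shard_count_from_final_file('gs://my-bucket/file-name-1-000000000080.json')
    # 80, meaning data files 000000000000 through 000000000079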


Python code example

The following Python code example exports a table to Google Cloud Storage in CSV format (the default when destinationFormat is not specified).

    import sys
    from apiclient.discovery import build
    from oauth2client.file import Storage
    from oauth2client.client import AccessTokenRefreshError
    from oauth2client.client import OAuth2WebServerFlow
    from oauth2client.tools import run
    from apiclient.errors import HttpError
    import httplib2
    import time
    
    FLOW = OAuth2WebServerFlow(
        client_id='xxxxxxx.apps.googleusercontent.com',
        client_secret='shhhhhhhhhhhh',
        scope='https://www.googleapis.com/auth/bigquery',
        user_agent='my-program-name/1.0')
    
    def exportTable(http, service):
      projectId = raw_input("Choose your project ID: ")
      datasetId = raw_input("Choose a dataset ID: ")
      tableId = raw_input("Choose a table name to export: ")
    
      jobCollection = service.jobs()
      jobData = {
        'projectId': projectId,
        'configuration': {
          'extract': {
            'sourceTable': {
               'projectId': projectId,
               'datasetId': datasetId,
               'tableId': tableId
             },
            'destinationUris': ['gs://<bucket>/<file>'],
           }
         }
       }
      insertJob = jobCollection.insert(projectId=projectId, body=jobData).execute()

      # Poll the job until the extract operation completes.
      while True:
        status = jobCollection.get(projectId=projectId,
                                   jobId=insertJob['jobReference']['jobId']).execute()
        print status
        if 'DONE' == status['status']['state']:
          print "Done exporting!"
          return
        print 'Waiting for export to complete...'
        time.sleep(10)
    
    def main(argv):
      # If the credentials don't exist or are invalid, run the native client
      # auth flow. The Storage object will ensure that if successful the good
      # credentials will get written back to a file.
      storage = Storage('bigquery2.dat') # Choose a file name to store the credentials.
      credentials = storage.get()
      if credentials is None or credentials.invalid:
        credentials = run(FLOW, storage)
    
      # Create an httplib2.Http object to handle our HTTP requests and authorize it
      # with our good credentials.
      http = httplib2.Http()
      http = credentials.authorize(http)
    
      service = build('bigquery', 'v2', http=http)
      exportTable(http, service)
    
    if __name__ == '__main__':
      main(sys.argv)

