


Exporting Data From BigQuery

Once you've loaded your data into BigQuery, you can export the data in several formats. BigQuery can export up to 1 GB of data per file, and supports exporting to multiple files.

To export data, create a job and populate the configuration.extract object.


Access control

Exporting data requires the following access levels.

  • BigQuery: dataset-level READER access. For more information, see access control.

  • Google Cloud Storage: WRITE access for the Google Cloud Storage buckets. For more information, see Access Control - Google Cloud Storage.


Quota policy

The following limits apply when exporting data from BigQuery.

  • Daily Limit: 1,000 exports per day, up to 10 TB
  • Multiple Wildcard URI Limit: 500 URIs per export


Configuration options

You can configure two aspects of the exported data: the format and the compression type.

  • destinationFormat controls the format. BigQuery supports CSV, JSON, and Avro formats. The Avro format can't be used in combination with GZIP compression.

  • compression controls the compression type. BigQuery supports GZIP compression or NONE.
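For example, a job configuration that combines the two options to export GZIP-compressed CSV might look like the following sketch; the bucket, file, and table identifiers are placeholders, and the structure mirrors the extract configuration example later on this page.

    # A sketch only: export the source table as GZIP-compressed CSV.
    # <bucket> and <file> are placeholders for your Cloud Storage destination.
    jobData = {
        'projectId': projectId,
        'configuration': {
          'extract': {
            'sourceTable': {
               'projectId': projectId,
               'datasetId': datasetId,
               'tableId': tableId
             },
            'destinationUris': ['gs://<bucket>/<file>.csv.gz'],
            'destinationFormat': 'CSV',
            'compression': 'GZIP'
           }
         }
       }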
Avro format

BigQuery expresses Avro formatted data in the following ways:


Extract configuration example

The following code example shows the configuration of a job that exports data to a single newline-delimited JSON file.

    jobData = {
        'projectId': projectId,
        'configuration': {
          'extract': {
            'sourceTable': {
               'projectId': projectId,
               'datasetId': datasetId,
               'tableId': tableId
             },
            'destinationUris': ['gs://<bucket>/<file>'],
            'destinationFormat': 'NEWLINE_DELIMITED_JSON'
           }
         }
       }

The destinationUris property indicates the location(s) and file name(s) where BigQuery should export your files to. Usually, you'll pass a single value so that BigQuery exports to a single file, but you can alternatively pass one or more wildcard URIs. For more information on what to specify for the destinationUris property, see exporting data into one or more files.


Exporting data into one or more files

The destinationUris property indicates the location(s) and file name(s) where BigQuery should export your files to.

BigQuery supports a single wildcard operator (*) in each URI. Using the wildcard operator instructs BigQuery to create multiple sharded files based on the supplied pattern. The wildcard operator is replaced with a file number (starting at 0), left-padded to 12 digits. For example, 000000000001 for the second file.
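As an illustration of that naming scheme only (not an API call), the following sketch expands a hypothetical wildcard URI the way BigQuery numbers its shards:

    # Illustration only: expand a wildcard URI the way BigQuery numbers shards,
    # replacing the * with a file number left-padded to 12 digits.
    pattern = 'gs://my-bucket/file-name-*.json'   # hypothetical pattern
    for i in range(3):
        print pattern.replace('*', '%012d' % i)
    # gs://my-bucket/file-name-000000000000.json
    # gs://my-bucket/file-name-000000000001.json
    # gs://my-bucket/file-name-000000000002.json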

The following options are available for the destinationUris property:

Single URI

Use a single URI if you want BigQuery to export your data to a single file. This option is the most common use case, as exported data is generally less than BigQuery's 1 GB per file maximum value.

    Property definition:

    ['gs://my-bucket/file-name.json']

    Creates:

    gs://my-bucket/file-name.json
Single wildcard URI

Use a single wildcard URI if you think your exported data will be larger than BigQuery's 1 GB per file maximum value. BigQuery shards your data into multiple files based on the provided pattern.

    Property definition:

    ['gs://my-bucket/file-name-*.json']

    Creates:

    gs://my-bucket/file-name-000000000000.json
    gs://my-bucket/file-name-000000000001.json
    gs://my-bucket/file-name-000000000002.json
    ...
Multiple wildcard URIs

Use multiple wildcard URIs if you want to partition the export output. You would use this option if you're running a parallel processing job with a service like Hadoop on Google Cloud Platform. Determine how many workers are available to process the job, and create one URI per worker. BigQuery treats each URI location as a partition, and uses parallel processing to shard your data into multiple files in each location. You can use whatever pattern you'd like in your file name, provided there is a single wildcard operator in each URI, each URI is unique, and the number of URIs does not exceed the quota policy.

When you pass more than one wildcard URI, BigQuery creates a special zero record file at the end of each partition that indicates the "final" file in the set. This file name indicates how many shards BigQuery created (see the sketch after the example listing below).

For example, if your wildcard URI is gs://my-bucket/file-name-<worker number>-*.json, and BigQuery creates 80 sharded files, the zero record file name is gs://my-bucket/file-name-<worker number>-000000000080.json. You can use this file name to determine that BigQuery created 80 sharded files (named 000000000000 through 000000000079).

Note that a zero record file might contain more than 0 bytes depending on the data format, such as when exporting data in CSV format with a column header.

    String pattern:

    gs://my-bucket/file-name-<worker number>-*.json

    Property definition:

    ['gs://my-bucket/file-name-1-*.json',
    'gs://my-bucket/file-name-2-*.json', 
    'gs://my-bucket/file-name-3-*.json']

    Creates:

    This example assumes that BigQuery creates 80 sharded data files in each partition; the last file listed for each partition (000000000080) is the zero record file.

    gs://my-bucket/file-name-1-000000000000.json
    gs://my-bucket/file-name-1-000000000001.json
    ...
    gs://my-bucket/file-name-1-000000000080.json
    gs://my-bucket/file-name-2-000000000000.json
    gs://my-bucket/file-name-2-000000000001.json
    ...
    gs://my-bucket/file-name-2-000000000080.json
    gs://my-bucket/file-name-3-000000000000.json
    gs://my-bucket/file-name-3-000000000001.json
    ...
    gs://my-bucket/file-name-3-000000000080.json
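As a rough sketch (the helper name below is hypothetical and simply assumes the naming scheme described above), you could recover the shard count for a partition from its zero record file name:

    import re

    # Hypothetical helper: given the zero record ("final") file name for a
    # partition, return how many sharded data files BigQuery created before it.
    def shard_count_from_final_file(final_file_name):
        match = re.search(r'-(\d{12})\.json$', final_file_name)
        return int(match.group(1))

    print shard_count_from_final_file('gs://my-bucket/file-name-1-000000000080.json')
    # 80, meaning data files 000000000000 through 000000000079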


Python code example

The following Python code example exports a table to Google Cloud Storage in CSV format (the default when destinationFormat is not specified).

    import sys
    from apiclient.discovery import build
    from oauth2client.file import Storage
    from oauth2client.client import AccessTokenRefreshError
    from oauth2client.client import OAuth2WebServerFlow
    from oauth2client.tools import run
    from apiclient.errors import HttpError
    import httplib2
    import time
    
    FLOW = OAuth2WebServerFlow(
        client_id='xxxxxxx.apps.googleusercontent.com',
        client_secret='shhhhhhhhhhhh',
        scope='https://www.googleapis.com/auth/bigquery',
        user_agent='my-program-name/1.0')
    
    def exportTable(http, service):
      projectId = raw_input("Choose your project ID: ")
      datasetId = raw_input("Choose a dataset ID: ")
      tableId = raw_input("Choose a table name to export: ")
    
      jobCollection = service.jobs()
      jobData = {
        'projectId': projectId,
        'configuration': {
          'extract': {
            'sourceTable': {
               'projectId': projectId,
               'datasetId': datasetId,
               'tableId': tableId
             },
            'destinationUris': ['gs://<bucket>/<file>'],
           }
         }
       }
      insertJob = jobCollection.insert(projectId=projectId, body=jobData).execute()

      # Poll the job until the extract operation completes.
      while True:
        status = jobCollection.get(projectId=projectId,
                                   jobId=insertJob['jobReference']['jobId']).execute()
        print status
        if 'DONE' == status['status']['state']:
          print "Done exporting!"
          return
        print 'Waiting for export to complete...'
        time.sleep(10)
    
    def main(argv):
      # If the credentials don't exist or are invalid, run the native client
      # auth flow. The Storage object will ensure that if successful the good
      # credentials will get written back to a file.
      storage = Storage('bigquery2.dat') # Choose a file name to store the credentials.
      credentials = storage.get()
      if credentials is None or credentials.invalid:
        credentials = run(FLOW, storage)
    
      # Create an httplib2.Http object to handle our HTTP requests and authorize it
      # with our good credentials.
      http = httplib2.Http()
      http = credentials.authorize(http)
    
      service = build('bigquery', 'v2', http=http)
      exportTable(http, service)
    
    if __name__ == '__main__':
      main(sys.argv)

