This section describes concepts and techniques you must know to use Google Cloud Storage effectively. The information it contains is intended for developers. It assumes that you are familiar with web programming and you are comfortable creating applications that consume web services through HTTP requests.
Contents
- Concepts and Terminology
- Consistency
- Request Endpoints
- Resumable Uploads
- Resumable Uploads of Unknown Size
- Specifying Bucket Locations
- Streaming Transfers
- Best Practices and Security Considerations
Concepts and Terminology
To use Google Cloud Storage effectively you need to understand some of the concepts on which it is built. These concepts define how your data is stored in Google Cloud Storage.
- Projects
All data in Google Cloud Storage belongs inside a project. A project consists of a set of users, a set of APIs, and billing, authentication, and monitoring settings for those APIs. You can have one project or multiple projects.
- Buckets
Buckets are the basic containers that hold your data. Everything that you store in Google Cloud Storage must be contained in a bucket. You can use buckets to organize your data and control access to your data, but unlike directories and folders, you cannot nest buckets. There is a per-project rate limit on bucket create/delete operations; however, there is no rate limit on object create/delete operations. Therefore, we recommend that you design your storage applications to favor intensive object operations and relatively few bucket operations.
- Objects
Objects are the individual pieces of data that you store in Google Cloud Storage. A single object can be up to 5 TB in size. Objects have two components: object data and object metadata. The object data component is usually a file that you want to store in Google Cloud Storage. The object metadata component is a collection of name-value pairs that describe various object qualities.
- Data opacity
An object's data component is completely opaque to Google Cloud Storage. It is just a chunk of data to Google Cloud Storage.
- Object immutability
Objects are immutable, which means that an uploaded object cannot change throughout its storage lifetime. An object's storage lifetime is the time between successful object creation (upload) and successful object deletion. In practice, this means that you cannot make incremental changes to objects, such as append operations or truncate operations. However, it is possible to overwrite objects that are stored in Google Cloud Storage because an overwrite operation is in effect a delete object operation followed immediately by an upload object operation. So a single overwrite operation simply marks the end of one immutable object's lifetime and the beginning of a new immutable object's lifetime.
- Hierarchy
Google Cloud Storage uses a flat namespace to store objects. However, as a convenience, some tools (for example, the Google Developers Console and gsutil) can work with objects as if they were stored in a virtual hierarchy.
- Namespace
There is only one Google Cloud Storage namespace, which means every bucket must have a unique name across the entire Google Cloud Storage namespace. Object names must be unique only within a given bucket.
- Object names
An object name is just metadata to Google Cloud Storage. Object names can contain any combination of Unicode characters (UTF-8 encoded) and must be less than 1024 bytes in length. A common character to include in object names is a slash (/). By using slashes in an object name, you can make objects appear as though they're stored in a hierarchical structure. For example, you could name one object /europe/france/paris.jpg and another object /europe/france/cannes.jpg. When you list these objects they appear to be in a hierarchical directory structure based on location; however, Google Cloud Storage sees the objects as independent objects with no hierarchical relationship whatsoever.
- Bucket names
Bucket names have more restrictions than object names because every bucket resides in a single Google Cloud Storage namespace. Also, bucket names can be used with a CNAME redirect, which means they need to conform to DNS naming conventions. For more information, see Bucket and Object Naming Guidelines.
Consistency
From an availability standpoint, upload operations to Google Cloud Storage are atomic. When you upload an object, the object is not available until it is completely uploaded. By extension, uploaded objects are never available for download in a corrupted state or as partial objects. Objects are either available or not available.
From a consistency standpoint, Google Cloud Storage provides strong global consistency for all read-after-write, read-after-update, and read-after-delete operations, including both data and metadata. When you upload a file (PUT) to Google Cloud Storage, and you receive a success response, the object is immediately available for download (GET) operations, from any location in Google's global network. This is true if you upload a new file or upload and overwrite an existing file. And, when you receive a success response for an upload operation, your data is already replicated in multiple data centers.
Strong global consistency means you won't receive a 404 Not Found response or stale data for a read-after-write or read-after-update operation. The latency for writing to a globally consistent replicated store may be slightly higher than to a non-replicated or non-committed store because a success response is returned only when multiple writes complete, not just one. For read operations of publicly cacheable objects, this latency is mitigated through our global edge cache.
Strong global consistency also extends to deletion (DELETE) operations on objects and update (PUT) operations that change existing object and bucket ACLs. If you delete an object and you receive a success response, an immediate attempt to download (GET) the object will result in a 404 Not Found status code. You get the 404 error because the object no longer exists after the delete operation completes successfully. Strong consistency means that it is not possible to delete an object, then perform a GET on the object, and receive a pre-deletion state of the object. Likewise, if you change ACLs on an object or bucket and you receive a success response, the newly applied object or bucket ACLs are immediately available.
List operations are eventually consistent. This eventual consistency applies to operations that return a list of objects or buckets. For example, if you upload an object to a bucket, you can immediately download that object and get its ACLs because object uploads are strongly consistent. However, if you upload an object to a bucket and then immediately perform a list objects operation on the bucket in which the object is stored, the uploaded object might not immediately appear in the returned list of objects. This is also true if you create a bucket and then immediately perform a list buckets operation. The newly created bucket is immediately available for use, but it might not immediately appear in the returned list of buckets.
DELETE Bucket operations can also be affected by the eventual consistency of list operations. DELETE Bucket operations are affected only when you delete all of the objects in a bucket and then immediately try to delete the bucket. In this case, the list of objects in the bucket might not immediately reflect the fact that the objects have been deleted and so the delete bucket operation fails. Delete operations on buckets that are already empty are strongly consistent: that is, if you delete an empty bucket and get a success response, any subsequent attempt to access the bucket will fail.
Because we allow you to specify cache control for publicly-readable objects, cached objects that are publicly readable might not exhibit strong consistency. If you cache an object, and the object is in the cache when it is updated or deleted, the cached object will not be updated or deleted until its cache lifetime expires. The cache lifetime is defined by the Cache-Control request header, which you can specify when you upload an object. So ultimately, you can control the degree to which cached objects are consistent.
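For example, the following is a minimal boto sketch (not one of the examples in this document) that uploads an object with a Cache-Control header. The bucket and object names are placeholders, and the headers parameter is assumed to pass the header through to the upload request.

import boto

# Hypothetical bucket and object names; adjust to your own.
# Cache-Control: no-cache tells caches not to serve this object
# without revalidating it first.
dst_uri = boto.storage_uri('my_app_bucket/paris.jpg', 'gs')
dst_uri.new_key().set_contents_from_filename(
    'paris.jpg',
    headers={'Cache-Control': 'no-cache'})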
Request Endpoints
You can access Google Cloud Storage through three request endpoints (URIs). Which one you use depends on the operation you are performing.
Note: The Google Cloud Storage URIs described in this section are subject to change.
Typical API Requests
For most operations you can use any of the following URLs to access an object:
- storage.googleapis.com/<bucket>/<object>
- <bucket>.storage.googleapis.com/<object>
- www.googleapis.com/storage/v1beta2/b/<bucket>/o/<object>?alt=media (JSON API)
All of these forms support secure sockets layer (SSL) encryption, which means you can use either HTTP or HTTPS. If you authenticate to the Google Cloud Storage API using OAuth 2.0, you should use HTTPS.
CNAME Redirects
A CNAME redirect is a special DNS record that lets you use a URL from your own domain to access a resource (bucket and object) in Google Cloud Storage without revealing the Google Cloud Storage URI. To do this, you must use the following URI in the host name portion of your CNAME record:
c.storage.googleapis.com
For example, let's assume your domain is example.com and you want to make travel maps available to your customers. You could create a bucket in Google Cloud Storage called travel-maps.example.com, and then create a CNAME record in DNS that redirects requests from travel-maps.example.com to the Google Cloud Storage URI. To do this, you publish the following CNAME record in DNS:
travel-maps.example.com CNAME c.storage.googleapis.com.
By doing this, your customers can use the following URL to access a map of Paris:
http://travel-maps.example.com/paris.jpg
Note: You can use a CNAME redirect only with HTTP, not with HTTPS.
Authenticated Browser Downloads
The Google Cloud Storage authentication and authorization models support authenticated browser downloads, which lets a user download data through their browser if they are logged in to their Google account and they have been granted permission to read the data. Authenticated browser downloads use cookie-based Google account authentication in conjunction with Google account-based ACLs. To download an object using cookie-based authentication you must use the following URL:
https://storage.cloud.google.com/<bucket>/<object>
For more information about authenticated browser downloads, see Cookie-based Authentication later in this document.
Resumable Uploads
The Google Cloud Storage API provides a resumable data transfer feature that lets you resume upload operations after a communication failure has interrupted the flow of data. Resumable uploads are useful if you are transferring large files because the likelihood of a network interruption or some other transmission failure is high. Also, by using the resumable upload feature you can reduce your bandwidth usage (and therefore your bandwidth cost) because you do not have to restart large file uploads from the beginning. This section shows you how to implement the resumable upload feature using the RESTful API. You can also perform a resumable upload by using the gsutil tool.
Implementing Resumable Uploads with the XML API
The Google Cloud Storage XML API provides two standard HTTP methods for uploading data: POST Object and PUT Object. To implement a resumable upload, you use both of these methods in conjunction with various headers and query string parameters. The following procedure shows you how to do this:
Step 1—Initiate the resumable upload
To begin a resumable upload you send a POST Object request to Google Cloud Storage. The POST Object request does not contain the file you are uploading, rather, it contains a few headers that inform the Google Cloud Storage system that you want to perform a resumable upload. Specifically, the POST Object request must have the following:
- An empty entity body.
- A Content-Type request header, which can be set to the content type of the file you are uploading.
- A Content-Length request header, which must be set to 0.
- An x-goog-resumable header, which must be set to start.
You can include a Content-Type request header if you want to specify a content type for the file you are uploading. If you do not specify a content type, the Google Cloud Storage system sets the content type to binary/octet-stream when it serves the object you are uploading.
The x-goog-resumable header is a Google Cloud Storage extension (custom) header. The header notifies the Google Cloud Storage system that you want to initiate a resumable upload. The header can be used only with a POST Object request and can be used only for resumable uploads.
In addition, you must use the standard Google Cloud Storage host name in the request (storage.googleapis.com), and you must authenticate the POST Object request just as you would any authenticated request. For more information, see Request URIs and Authentication.
The following example shows how to initiate a resumable upload for a file named music.mp3 that's being uploaded into a bucket named example.
POST /music.mp3 HTTP/1.1
Host: example.storage.googleapis.com
Date: Fri, 01 Oct 2010 21:56:18 GMT
Content-Length: 0
Content-Type: audio/mpeg
x-goog-resumable: start
Authorization: Bearer {your_auth_token}
Note: gsutil supports resumable uploads and downloads, with resumable downloads using HTTP range GETs.
Step 2—Process the response
After you initiate the resumable upload with a POST Object request, Google Cloud Storage responds with a 201 Created status message. The status message includes a Location header, which defines an upload ID for the resumable upload. You must save the upload ID because you will use it in all further requests during your upload operation.
The following example shows the response to the POST Object request that was shown in Step 1.
HTTP/1.1 201 Created
Location: https://example.storage.googleapis.com/music.mp3?upload_id=tvA0ExBntDa...gAAEnB2Uowrot
Date: Fri, 01 Oct 2010 21:56:18 GMT
Content-Length: 0
Content-Type: audio/mpeg
Step 3—Upload the file
Next, you implement a PUT Object request that sends the file blocks to Google Cloud Storage. The PUT Object request includes an upload_id query string parameter, which specifies the upload ID that you obtained in Step 2. The request also includes a Content-Length header, which you must use to specify the size of the file you are uploading. The sizes of all the blocks written, except the final block, must be a multiple of 256K bytes (that is, 262144 bytes).
As with the POST Object request in Step 1, you must use the standard Google Cloud Storage host name in the request (storage.googleapis.com), and you must authenticate the PUT Object request just as you would any authenticated request.
The following example shows how to upload the music.mp3 file that was initiated in Step 1:
PUT /music.mp3?upload_id=tvA0ExBntDa...gAAEnB2Uowrot HTTP/1.1
Host: example.storage.googleapis.com
Date: Fri, 01 Oct 2010 21:56:18 GMT
Content-Length: 7351375
Authorization: Bearer {your_auth_token}
If the PUT Object request is not interrupted and the file is successfully uploaded, Google Cloud Storage responds with a 200 OK status code. If the upload is interrupted, you can resume the upload by performing Steps 4, 5, and 6.
Step 4—Query Google Cloud Storage for the upload status
If the upload operation is interrupted, or if it receives an HTTP 503 or 500 response, you should query Google Cloud Storage for the number of bytes it has received by sending another PUT Object request. The PUT Object request must have the following:
- An empty entity body.
- A Content-Length request header, which must be set to 0.
- A Content-Range request header, which specifies the byte range you are seeking status for.
- An upload_id query string parameter, which specifies the upload ID for the resumable upload.
The value of the Content-Range request header must be in the following format:
Content-Range: bytes */<content-length>
where <content-length> is the value of the Content-Length header that you specified in the original PUT Object request (Step 3).
In addition, you must use the standard Google Cloud Storage host name in the request (storage.googleapis.com), and you must authenticate the PUT Object request just as you would any authenticated request.
The following example shows how to query the Google Cloud Storage system after a resumable upload is interrupted:
PUT /music.mp3?upload_id=tvA0ExBntDa...gAAEnB2Uowrot HTTP/1.1
Host: example.storage.googleapis.com
Date: Fri, 01 Oct 2010 22:25:53 GMT
Content-Range: bytes */7351375
Content-Length: 0
Authorization: Bearer {your_auth_token}
Step 5—Process the status response
After you query the Google Cloud Storage system for the status of the interrupted upload, the Google Cloud Storage system responds with a 308 Resume Incomplete status code. The response contains a Range header, which tells you the range of bytes that the Google Cloud Storage system has received. You must use the value of the Range header to determine which bytes were not successfully uploaded. You will use this value in Step 6.
The following example shows the response to the PUT Object request that was shown in Step 4:
HTTP/1.1 308 Resume Incomplete
Range: bytes=0-2359295
Date: Fri, 01 Oct 2010 22:25:53 GMT
Content-Length: 0
Content-Type: audio/mpeg
The example indicates that Google Cloud Storage received the first 2359296 bytes of the music.mp3 file.
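As a convenience, you can compute the next byte to send directly from the Range header. The following Python sketch is illustrative only; the helper name and the assumption that a missing Range header means no bytes were received are ours, not part of the API.

def resume_offset(range_header):
    # range_header looks like "bytes=0-2359295"; the next byte to send is 2359296.
    if not range_header:
        return 0  # assume nothing was received; restart from the first byte
    last_byte_received = int(range_header.rsplit('-', 1)[-1])
    return last_byte_received + 1

# For the response above, resume_offset('bytes=0-2359295') returns 2359296.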
Step 6—Resume the upload
Finally, you can resume the upload operation by implementing a PUT Object request. The PUT Object request must have the following:
- An entity body containing the range of bytes that still need to be uploaded. You can determine this range by subtracting the Range value (which you obtained in Step 5) from the Content-Length (which you specified in Step 3).
- A Content-Length request header, which specifies the number of bytes you are uploading in the current request.
- A Content-Range request header, which specifies the byte range you are uploading in the request.
- An upload_id query string parameter, which specifies the upload ID for the resumable upload.
You must use the standard Google Cloud Storage host name in the request (storage.googleapis.com), and you must authenticate the PUT Object request just as you would any authenticated request.
The following example shows a PUT Object request that resumes the upload of the music.mp3 file into the example bucket.
PUT /music.mp3?upload_id=tvA0ExBntDa...gAAEnB2Uowrot HTTP/1.1
Host: example.storage.googleapis.com
Date: Fri, 01 Oct 2010 22:25:53 GMT
Content-Range: bytes 2359296-7351374/7351375
Content-Length: 4992079
Authorization: Bearer {your_auth_token}
You can perform steps 4, 5, and 6 as many times as necessary, but when retrying requests, use randomized binary exponential backoff: wait a random period between [0..1] seconds and retry; if that fails, wait a random period between [0..2] seconds and retry; if that fails, wait a random period between [0..4] seconds and retry, and so on. For more information, see binary exponential backoff. As an example, you can also see the boto implementation of this logic.
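The following Python sketch shows one way to apply this backoff pattern. It is a minimal illustration, not the boto implementation; the make_request callable (returning None on failure) is a placeholder for your own status-query and resume logic.

import random
import time

def retry_with_backoff(make_request, max_attempts=5):
    # make_request is assumed to return a response on success and None on failure.
    for attempt in range(max_attempts):
        response = make_request()
        if response is not None:
            return response
        if attempt + 1 < max_attempts:
            # Wait a random period in [0, 2**attempt) seconds before retrying:
            # [0..1], then [0..2], then [0..4], and so on.
            time.sleep(random.uniform(0, 2 ** attempt))
    raise RuntimeError('Upload still failing after %d attempts' % max_attempts)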
Resumable Uploads of Unknown Size
The resumable upload mechanism supports transfers where the file size is not known in advance. This can be useful for cases like compressing an object on-the-fly while uploading, since it's difficult to predict the exact file size for the compressed file at the start of a transfer. The mechanism is useful either if you want to stream a transfer that can be resumed after being interrupted, or if chunked transfer encoding does not work for your application.
Step 1—Initiate the resumable upload
To begin a resumable upload you send a POST Object request to Google Cloud Storage. The POST Object request does not contain the file you are uploading, rather, it contains a few headers that inform the Google Cloud Storage system that you want to perform a resumable upload. The following example shows how to initiate a resumable upload for a file named myFile.zip.
POST /myFile.zip HTTP/1.1
Host: example.storage.googleapis.com
Date: Fri, 22 Jun 2012 21:56:18 GMT
Content-Type: application/octet-stream
x-goog-resumable: start
Authorization: Bearer {your_auth_token}
Step 2—Process the response
After you initiate the resumable upload with a POST Object request, Google Cloud Storage responds with a 201 Created status message. The status message includes a Location header, which defines an upload ID for the resumable upload. You must save the upload ID because you will use it in all further requests during your upload operation.
The following example shows the response to the POST Object request.
HTTP/1.1 201 Created
Location: https://example.storage.googleapis.com/myFile.zip?upload_id=tvA0ExBntDa...gAAEnB2Uowrot
Date: Fri, 22 Jun 2012 21:56:18 GMT
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Step 3—Upload the file blocks
Next, you implement a PUT Object request that sends the file blocks to Google Cloud Storage. The PUT Object request includes an upload_id query string parameter, which specifies the upload ID that you obtained in Step 2.
The sizes of all the blocks written, except the final block, must be a multiple of 256K bytes (that is, 262144 bytes).
For each block except the last, perform a PUT request and set the Content-Range header to the value bytes X-Y/*, where X is the first byte and Y is the last byte of the block. For example, if the size of the first block to transfer is 512K, then X = 0 and Y = 524287 (that is, 524288 - 1).
The following code shows how to perform the related request.
PUT /myFile.zip?upload_id=tvA0ExBntDa...gAAEnB2Uowrot HTTP/1.1
Host: example.storage.googleapis.com
Date: Fri, 22 Jun 2012 21:56:18 GMT
Content-Length: 524288
Content-Range: bytes 0-524287/*
Authorization: Bearer {your_auth_token}
After each upload, Google Cloud Storage responds with a 308 Resume Incomplete status code. For more information, see Resumable Uploads .
For the final block, perform a PUT request and set the Content-Range header to the value bytes X-Y/Z, where X and Y are as defined before and Z is the total byte count. To keep things simple, assume that the size of the file to transfer is 588288 bytes. After the first transfer of 524288 bytes, shown in the previous example, 64000 bytes (that is, 588288 - 524288) remain. Therefore X = 524288 (the first byte in the block), Y = 588287 (the last byte in the block, that is, 588288 - 1), and Z = 588288 (the file size).
The following code shows how to perform the related request.
PUT /myFile.zip?upload_id=tvA0ExBntDa...gAAEnB2Uowrot HTTP/1.1
Host: example.storage.googleapis.com
Date: Fri, 22 Jun 2012 21:56:18 GMT
Content-Length: 64000
Content-Range: bytes 524288-588287/588288
Authorization: Bearer {your_auth_token}
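Putting these steps together, the following Python sketch streams data of unknown size to the upload URL returned in Step 2. It is a simplified illustration under stated assumptions: it uses the third-party requests library, the function and variable names are ours, it assumes stream.read() returns full chunks until the end of the stream, and production code should also apply the retry logic described under Recommended Practices below.

import requests  # assumed third-party HTTP library

CHUNK_SIZE = 256 * 1024  # every block except the last must be a multiple of 256K bytes

def upload_unknown_size(session_url, stream, auth_token):
    # session_url is the URL from the Location header returned in Step 2.
    offset = 0
    block = stream.read(CHUNK_SIZE)
    while block:
        next_block = stream.read(CHUNK_SIZE)
        is_last = not next_block
        total = str(offset + len(block)) if is_last else '*'
        headers = {
            'Authorization': 'Bearer ' + auth_token,
            'Content-Range': 'bytes %d-%d/%s' % (offset, offset + len(block) - 1, total),
        }
        response = requests.put(session_url, data=block, headers=headers)
        # Google Cloud Storage answers 308 Resume Incomplete for intermediate
        # blocks and 200 OK (or 201 Created) once the final block is stored.
        if response.status_code not in (200, 201, 308):
            raise RuntimeError('Unexpected status: %d' % response.status_code)
        offset += len(block)
        block = next_block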
Recommended Practices
An upload ID expires after one week. We recommend that you start a resumable upload as soon as you obtain the upload ID, and that you resume an interrupted upload shortly after the interruption occurred.
If you use an expired upload ID in a request, you will receive a 400 Bad Request status code. In this case, you have to initiate a new resumable upload, obtain a new upload ID, and start the upload from the beginning using the new upload ID.
Also, you should retry any requests that return the following status codes:
- 408 Request Timeout
- 500 Internal Server Error
- 502 Bad Gateway
- 503 Service Unavailable
- 504 Gateway Timeout
When performing retry requests, use randomized binary exponential backoff: wait a random period between [0..1] seconds and retry; if that fails, wait a random period between [0..2] seconds and retry; if that fails, wait a random period between [0..4] seconds and retry, and so on. For more information, see binary exponential backoff . As an example, you can also see the boto implementation of this logic.
In addition, we recommend that you request an integrity check of the final uploaded object to be sure that it matches the source file. You can do this by calculating the MD5 digest of the source file and adding it to the Content-MD5 request header. Checking the integrity of the uploaded file is particularly important if you are uploading a large file over a long period of time because there is an increased likelihood of the source file being modified over the course of the upload operation.
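For example, the following Python sketch (our illustration, not code from this document) computes the value to place in the Content-MD5 request header, which is the base64 encoding of the binary MD5 digest of the source file:

import base64
import hashlib

def content_md5(path):
    digest = hashlib.md5()
    with open(path, 'rb') as source_file:
        # Read in small blocks so large files do not need to fit in memory.
        for block in iter(lambda: source_file.read(8192), b''):
            digest.update(block)
    # Content-MD5 carries the base64-encoded binary digest, not the hex string.
    return base64.b64encode(digest.digest())

# Send the returned value in the Content-MD5 request header of the upload.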
Specifying Bucket Locations
For various reasons, you may want to store your Google Cloud Storage data in a specific geographic location. You can do this by providing a location constraint when you create a bucket. The location constraint tells Google Cloud Storage to store your bucket and its contents on a server in the specified location. For example, if you specify a location constraint of "EU" on bucket A, then bucket A and any objects in bucket A will be stored in the European Union.
Currently, it is possible to store your buckets on servers in the following locations:
- ASIA - Asia
- EU - European Union
- US - United States
- ASIA-EAST1 - Eastern Asia-Pacific
- US-CENTRAL1 - Central United States
- US-CENTRAL2 - Central United States
- US-EAST1 - Eastern United States
- US-EAST2 - Eastern United States
- US-EAST3 - Eastern United States
- US-WEST1 - Western United States
If you do not specify a location constraint, your buckets will be stored on servers in the US spanning one or more regions. For more detailed information about where your data is being stored, see the Terms of Service.
Specifying a location constraint on a bucket ensures that data will be stored in a specific geographical region as specified in our Terms of Service. To improve performance, some of this data may be temporarily cached by Google systems when data is being written to or read from the location-constrained bucket. You can prevent such delivery-based caching by proxying data transfers through a Google App Engine application located in the same region as your region-constrained Google Cloud Storage bucket, and additionally employing appropriate Cache-Control headers for your Google Cloud Storage objects to specify that they not be cached in edge caches. For example, the response header Cache-Control: no-cache for an object specifies that the object must not be used to satisfy a subsequent request. For more information about cache control directives, see RFC 2616: Cache-Control.
You can specify or look up a location constraint in different ways depending on what tool you are using:
Using XML API
To specify a location constraint using the XML API, include a <CreateBucketConfiguration></CreateBucketConfiguration> element in the XML payload. For example, the following request creates a bucket named helloworld and sets its location constraint to the EU:
PUT / HTTP/1.1
Host: helloworld.storage.googleapis.com
Accept-Encoding: identity
Date: Fri, 01 Apr 2011 21:52:39 GMT
Content-Length: 92
x-goog-project-id: 123456789123
Authorization: Bearer {your_auth_token}

<?xml version="1.0" encoding="UTF-8"?>
<CreateBucketConfiguration>
  <LocationConstraint>EU</LocationConstraint>
</CreateBucketConfiguration>
To look up a location constraint, perform a GET request using the location query string parameter, as shown in the following example:
GET /?location HTTP/1.1
Host: helloworld.storage.googleapis.com
Accept-Encoding: identity
Date: Mon, 04 Apr 2011 21:59:48 GMT
Content-Length: 0
Authorization: Bearer {your_auth_token}
HTTP/1.1 200 OK
Content-Type: application/xml; charset=UTF-8
Content-Length: 81

<?xml version="1.0" encoding="UTF-8"?><LocationConstraint>EU</LocationConstraint>
Using JSON API
In the JSON API, you specify the location when you create the bucket, as described in Bucket: insert.
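As an illustration only (the authoritative request body fields are described in the Bucket: insert reference), a bucket with an EU location constraint could be created through the JSON API with a Python sketch like the following; the project ID, token, and use of the third-party requests library are assumptions:

import json
import requests  # assumed third-party HTTP library

def create_eu_bucket(project_id, auth_token, bucket_name):
    # POST to the JSON API buckets collection, passing the project as a query
    # parameter and the bucket name and location in the JSON request body.
    response = requests.post(
        'https://www.googleapis.com/storage/v1beta2/b',
        params={'project': project_id},
        headers={'Authorization': 'Bearer ' + auth_token,
                 'Content-Type': 'application/json'},
        data=json.dumps({'name': bucket_name, 'location': 'EU'}))
    response.raise_for_status()
    return response.json()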
Using gsutil
To specify a location constraint using gsutil, include the -l "<location constraint>" flag when creating your bucket, as shown in the following example:
gsutil mb -l "EU" gs://helloworld
To look up a location constraint, use the ls command with the -L flag. For example, the following command looks up location constraint information for the helloworld bucket:
gsutil ls -L -b gs://helloworld
Google Cloud Storage provides the following response:
gs://helloworld/ :
    Storage class:         STANDARD
    Location constraint:   EU
    ...
Using Boto
To specify a location constraint using Boto, run the following commands:
from boto import storage_uri
storage_uri('<bucket_name>').create_bucket(location='<location constraint>')
For example, the following commands create a bucket named helloworld with a location constraint in the EU:
from boto import storage_uri
storage_uri('gs://helloworld').create_bucket(location='EU')
To look up a location constraint using Boto, run the following commands:
from boto import storage_uri
location = storage_uri('<bucket_name>').get_location()
location
For example, the following commands look up the location constraint for the helloworld bucket:
>>> from boto import storage_uri
>>> location = storage_uri('gs://helloworld').get_location()
>>> location
u'EU'
Streaming Transfers
Google Cloud Storage supports streaming transfers with the gsutil tool or boto library, based on HTTP chunked transfer encoding. Streaming data lets you stream data to and from your Google Cloud Storage account as soon as it becomes available without requiring that the data be first saved to a separate file. Streaming transfers are useful if you have a process that generates data and you do not want to buffer it locally before uploading it or if you want to send the result from a computational pipeline directly into Google Cloud Storage.
For more information on chunked transfer encoding, see RFC 2616 §3.6.1 .
To stream uploads and downloads using gsutil:
To use gsutil to perform a streaming upload, pipe your data to a gsutil cp command and replace the file to be copied with a dash. The following example shows a process called collect_measurements whose output is being transferred to a Google Cloud Storage object named data_measurements:
collect_measurements | gsutil cp - gs://my_app_bucket/data_measurements
Similarly, to perform a streaming download using gsutil, use the gsutil cp command with a dash in place of the destination file and pipe the output to your process:
gsutil cp gs://bucket/object - | <process data>
The following example shows the object named data_measurements being streamed and sorted:
gsutil cp gs://my_app_bucket/data_measurements - | sort
To stream uploads and downloads using boto:
To use boto to perform a streaming upload, run the following commands:
import boto

dst_uri = boto.storage_uri(<bucket> + '/' + <object>, 'gs')
dst_uri.new_key().set_contents_from_stream(<stream object>)
For example, the following command performs a streaming upload of a file named data_file to an object with the same name:
import boto

filename = 'data_file'
MY_BUCKET = 'my_app_bucket'
my_data = open(filename, 'rb')
dst_uri = boto.storage_uri(MY_BUCKET + '/' + filename, 'gs')
dst_uri.new_key().set_contents_from_stream(my_data)
Note: Although this example demonstrates opening a file to get an input stream, you could use any stream object in place of my_data above.
To perform a streaming download using boto, run the following commands:
import boto
import sys

src_uri = boto.storage_uri(<bucket> + '/' + <object>, 'gs')
src_uri.get_key().get_file(sys.stdout)
For example, the following command performs a streaming download of an object named data_file:
import boto

downloaded_file = 'saved_data_file'
MY_BUCKET = 'my_app_bucket'
object_name = 'data_file'
src_uri = boto.storage_uri(MY_BUCKET + '/' + object_name, 'gs')
src_uri.get_key().get_file(open(downloaded_file, 'wb'))
Caution: Google Cloud Storage does not compare the Content-MD5 from the start of a streaming upload to the Content-MD5 of the completed upload. This is because Google Cloud Storage does not know anything about the content of the upload until the upload has been completed and, therefore, cannot create a Content-MD5 at the beginning of the upload. As a result, you should make sure to perform an integrity check on your streaming uploads after they have been completed.
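One way to make that check is to compute a digest while you stream. The following sketch is our illustration (the wrapper class is not part of boto): it wraps the source stream so that, after the transfer completes, you can compare the digest of what was sent with the MD5 reported for the uploaded object.

import hashlib

class HashingStream(object):
    """Wraps a readable stream and accumulates an MD5 digest of everything read."""
    def __init__(self, stream):
        self._stream = stream
        self.md5 = hashlib.md5()

    def read(self, size=-1):
        data = self._stream.read(size)
        self.md5.update(data)
        return data

# Usage sketch: pass HashingStream(my_data) to set_contents_from_stream(), then
# compare the wrapper's md5.hexdigest() with the uploaded object's reported MD5.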
Best Practices and Security Considerations
Google Cloud Storage provides easy and secure ways to protect data in the Cloud by offering OAuth 2.0 authentication, ACLs, and project permissions. You should also implement some best practices and be aware of security considerations so you can prevent unwanted access and enhance your Google Cloud Storage experience.
Bucket and object ACLs are independent of each other
Bucket and object ACLs are independent of each other, which means that the ACLs on a bucket do not affect the ACLs on the objects inside that bucket. It is possible for a user without permissions on a bucket to have permissions on an object inside the bucket. For example, you can create a bucket such that only GroupA is granted permission to list the objects in the bucket, but then upload an object into that bucket that grants GroupB READ access to the object. GroupB will be able to read the object, but will not be able to view the contents of the bucket or perform bucket-related tasks. When you create a bucket or upload an object, default bucket and object ACLs are applied. Default ACLs are described in Access Control.
Third parties can determine the existence of buckets and objects
By its nature, Google Cloud Storage requests reference buckets and objects by using bucket and object names. This implies that even though ACLs will prevent unauthorized third parties from operating on buckets or objects, a third party can attempt requests with bucket or object names and determine their existence by observing the error responses. It can then be possible for information in bucket or object names to be leaked. If you are concerned about the privacy of your bucket or object names, you should take the appropriate precautions, such as:
- Choosing bucket and object names that are difficult to guess. For example, a bucket named mybucket-GTbyTuL3 is random enough that unauthorized third parties cannot feasibly guess it or enumerate other bucket names from it. Sufficiently unpredictable names deter third parties from learning and confirming the existence of your bucket solely by making requests to the Google Cloud Storage API. (A sketch for generating such names follows this list.)
- Avoiding use of sensitive information as part of bucket or object names. For example, instead of naming your bucket mysecretproject-prodbucket, name it somemeaninglesscodename-prod. In some applications, you may want to keep sensitive metadata in custom Google Cloud Storage headers such as x-goog-meta, rather than encode it in object names.
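As a sketch of the first point, the following Python snippet (our illustration; the function name and suffix length are arbitrary) appends an unpredictable suffix to a bucket name:

import binascii
import os

def unguessable_bucket_name(prefix):
    # 8 random bytes become 16 lowercase hex characters, which are valid
    # bucket-name characters and infeasible for a third party to guess.
    suffix = binascii.hexlify(os.urandom(8)).decode('ascii')
    return '%s-%s' % (prefix, suffix)

# unguessable_bucket_name('mybucket') returns something like 'mybucket-3f9c0a7d1e24b6aa'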
Data and credentials should be handled securely
There are several things that you can do to practice safe credential and data handling, as discussed below:
- Use TLS (HTTPS)
Always use TLS (HTTPS) to transport your data when you can. This ensures that your credentials, as well as your data, are protected as you transport data over the network. For example, to access the Google Cloud Storage API, you should use https://storage.googleapis.com.
- Use an HTTPS library that validates server certificates
Make sure that you use an HTTPS library that validates server certificates. A lack of server certificate validation makes your application vulnerable to man-in-the-middle attacks and other attacks. Be aware that HTTPS libraries shipped with certain commonly used implementation languages do not, by default, verify server certificates. For example, Python before version 3.2 has no built-in or complete support for server certificate validation, and you need to use third-party wrapper libraries to ensure your application validates server certificates. Boto includes code that validates server certificates by default.
- Revoke access when it is no longer necessary
When applications no longer need access to your data, you should revoke their authentication credentials. For Google services and APIs, you can do this by logging in to your Google Account and clicking Authorizing applications and sites. On the next page, you can revoke access for applications by clicking Revoke Access next to the application.
- Sanitize credentials if you are posting HTTP protocol details
When you print out HTTP protocol details, your authentication credentials, such as OAuth 2.0 tokens, are visible in the headers. If you need to post protocol details to a message board or need to supply HTTP protocol details for troubleshooting, make sure that you sanitize or revoke any credentials that appear as part of the output.
- Securely store your credentials
Make sure that you securely store your credentials. This can be done differently depending on your environment and where you store your credentials. For example, if you store your credentials in a configuration file, make sure that you set appropriate permissions on that file to prevent unwanted access. If you are using Google App Engine, consider using StorageByKeyName to store your credentials.
Server-Side Encryption
Google Cloud Storage automatically encrypts all data before it is written to disk, at no additional charge. There is no setup or configuration required, no need to modify the way you access the service and no visible performance impact. The data is automatically and transparently decrypted when read by an authorized user.
With server-side encryption, Google manages the cryptographic keys on your behalf using the same hardened key management systems that we use for our own encrypted data, including strict key access controls and auditing. Each Cloud Storage object's data and metadata is encrypted under the 128-bit Advanced Encryption Standard, and each encryption key is itself encrypted with a regularly rotated set of master keys.
Server-side encryption can be used in combination with client-side encryption. In client-side encryption, you manage your own encryption keys and encrypt data before writing it to Google Cloud Storage. In this case, your data is encrypted twice, once with your keys and once with Google's keys.
To protect your data as it travels over the Internet during read and write operations, use Transport Layer Security (HTTPS).