This document describes how to download and review access logs and storage data for your Google Cloud Storage buckets, and how to analyze the logs using Google BigQuery.
Contents
- Introduction
- Setting up log delivery
- Checking logging status
- Downloading logs
- Analyzing logs in BigQuery
- Disabling logging
- Access and storage log format
Introduction
Google Cloud Storage offers access logs and storage data in the form of CSV files that you can download and view. Access logs provide information for all of the requests made on a specified bucket and are created hourly, while the daily storage logs provide information about the storage consumption of that bucket for the last day. The access logs and storage data files are automatically created as new objects in a bucket that you specify. Access and storage logs are currently only available in CSV format.
When you configure a Google Cloud Storage bucket to simulate the behavior of a static website, you might want to log how resources in the website are being used. Note that you can also configure bucket access logs and storage data files for any Google Cloud Storage bucket.
Setting up log delivery
The following steps describe how to set up log delivery for a specific bucket using the gsutil tool, the XML API, and the JSON API. If you don't have the gsutil tool, download and install it.
gsutil
- Create a bucket to store your logs.
Create a bucket to store your logs using the following command:
gsutil mb gs://my_logs
- Set permissions to allow Google Cloud Storage WRITE permission to the bucket.
Google Cloud Storage must have WRITE permission to create and store your logs as new objects. To grant Google Cloud Storage WRITE access to your bucket, grant the "[email protected]" group write access with the following command:
gsutil acl ch -g [email protected]:W gs://my_logs
- Enable logging for your bucket.
You can enable logging for your bucket using the logging command:
gsutil logging set on -b gs://my_logs [-o log_object_prefix] gs://bucket_name
Optionally, you can set the log_object_prefix object prefix for your log objects. The object prefix forms the beginning of the log object name. It can be at most 900 characters and must be a valid object name. By default, the object prefix is the name of the bucket for which the logs are enabled.
Log objects have the default object ACL of the log bucket. You can set the default object ACL of the log bucket using gsutil. For example, to set the default object ACL to project-private:
gsutil defacl set project-private gs://my_logs
XML API
- Create a bucket to store your logs.
Create a bucket to store your logs using the following request:
PUT /my_logs HTTP/1.1
Host: storage.googleapis.com
- Set permissions to allow Google Cloud Storage WRITE permission to the bucket.
Google Cloud Storage must have WRITE permission to create and store your logs as new objects. To grant Google Cloud Storage WRITE access to your bucket, add an ACL entry for the bucket that grants the "[email protected]" group write access. Be sure to include all existing ACLs for the bucket, in addition to the new ACL, in the request.
PUT /my_logs?acl HTTP/1.1
Host: storage.googleapis.com

<AccessControlList>
  <Entries>
    <Entry>
      <Scope type="GroupByEmail">
        <EmailAddress>[email protected]</EmailAddress>
      </Scope>
      <Permission>WRITE</Permission>
    </Entry>
    <!-- include other existing ACL entries here-->
  </Entries>
</AccessControlList>
- Enable logging for your bucket.
You can enable logging for your bucket using the logging query parameter:
PUT /bucket_name?logging HTTP/1.1
Host: storage.googleapis.com

<Logging>
  <LogBucket>my_logs</LogBucket>
  <LogObjectPrefix>log_object_prefix</LogObjectPrefix>
</Logging>
JSON API
- Create a bucket to store your logs.
Create a bucket to store your logs using the following request:
POST /storage/v1beta2/b?project=project-id
Host: www.googleapis.com

{ "name": "my_logs" }
- Set permissions to allow Google Cloud Storage WRITE permission to the bucket.
Google Cloud Storage must have WRITE permission to create and store your logs as new objects. To grant Google Cloud Storage WRITE access to your bucket, add an ACL entry for the bucket that grants the "[email protected]" group write access. You can do this with the following request to the BucketAccessControls resource for the logging bucket:
POST /storage/v1beta2/b/my_logs/acl
Host: www.googleapis.com

{ "entity": "[email protected]", "role": "WRITER" }
- Enable logging for your bucket.
You can enable logging for your bucket using the following request:
PATCH /storage/v1beta2/b/bucket_name
Host: www.googleapis.com

{ "logging": { "logBucket": "my_logs", "logObjectPrefix": "log_object_prefix" } }
Checking logging status
gsutil
Using gsutil, you can check logging by using the logging get command:
gsutil logging get gs://bucket_name
You can also save the logging configurations to a file:
gsutil logging get gs://bucket_name > your_logging_configuration_file
If logging is enabled, the server returns the logging configuration in the response:
{"logObjectPrefix": "log_object_prefix", "logBucket": "my_logs"}
If logging is not enabled, the following is returned:
gs://bucket_name/ has no logging configuration.
XML API
Using the Google Cloud Storage XML API, you can send a GET request for the bucket's logging configuration as shown in the following example.
GET /bucket_name?logging HTTP/1.1
Host: storage.googleapis.com
If logging is enabled, the server sends the configuration in the response. A response might look similar to the following:
<?xml version="1.0" ?>
<Logging>
  <LogBucket>my_logs</LogBucket>
  <LogObjectPrefix>log_object_prefix</LogObjectPrefix>
</Logging>
If logging is not enabled, an empty configuration is returned:
<?xml version="1.0" ?>
<Logging/>
JSON API
Using the Google Cloud Storage JSON API, you can send a GET request for the bucket's logging configuration as shown in the following example.
GET /storage/v1beta2/b/bucket_name?fields=logging
Host: www.googleapis.com
If logging is enabled, the server sends the configuration in the response. A response might look similar to the following:
{ "logging": { "logBucket": "my_logs", "logObjectPrefix": "log_object_prefix" } }
If logging is not enabled, an empty configuration is returned:
{ }
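The two JSON responses above can be told apart programmatically. The following is a minimal sketch, assuming the response body has already been fetched as a string; the function name logging_enabled is hypothetical and is not part of any Google client library.

```python
import json


def logging_enabled(response_body):
    """Return True if a bucket response carries a logging configuration.

    Assumes response_body is the JSON text returned for a request that
    used ?fields=logging, as in the examples above.
    """
    config = json.loads(response_body)
    # An empty object ({ }) means no logging configuration is set.
    return bool(config.get("logging"))


enabled_body = (
    '{ "logging": { "logBucket": "my_logs", '
    '"logObjectPrefix": "log_object_prefix" } }'
)
disabled_body = "{ }"

print(logging_enabled(enabled_body))
print(logging_enabled(disabled_body))
```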
Downloading logs
Storage logs are generated once a day and contain the storage usage for the previous day. They are typically created before 10:00 am PST.
Usage logs are generated hourly when there is activity to report in the monitored bucket. Usage logs are typically created 15 minutes after the end of the hour. Here are several things to keep in mind when working with usage logs:
- Any log processing of usage logs should take into account the possibility that they may be delivered later than 15 minutes after the end of an hour.
- Usually, hourly usage log object(s) contain records for all usage that occurred during that hour. Occasionally, an hourly usage log object contains records for an earlier hour, but never for a later hour.
- Google Cloud Storage may write multiple log objects for the same hour.
- Occasionally, a single record may appear twice in the usage logs. While we make our best effort to remove duplicate records, your log processing should be able to remove them if it is critical to your log analysis. You can use the s_request_id field to detect duplicates.
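Duplicate removal could be sketched as follows, assuming the usage logs have been downloaded as CSV text with a header row. The function name dedupe_usage_rows and the sample rows are hypothetical; only the field names come from the access log format. The sample uses a subset of the columns for brevity.

```python
import csv
import io


def dedupe_usage_rows(csv_text):
    """Keep the first record seen for each s_request_id value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    unique = []
    for row in reader:
        request_id = row["s_request_id"]
        if request_id in seen:
            continue  # same request already recorded; skip the duplicate
        seen.add(request_id)
        unique.append(row)
    return unique


# Hypothetical sample data: the second and third rows are the same request.
SAMPLE_USAGE = (
    '"time_micros","cs_method","s_request_id"\n'
    '"1371564000000000","GET","req-a"\n'
    '"1371564000500000","PUT","req-b"\n'
    '"1371564000500000","PUT","req-b"\n'
)

rows = dedupe_usage_rows(SAMPLE_USAGE)
print([row["s_request_id"] for row in rows])
```

Because Google Cloud Storage may write multiple log objects for the same hour, a real pipeline would probably need to track seen identifiers across files rather than within a single file.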
Access to your logs is controlled by the ACL on the log objects. Log objects have the default object ACL of the log bucket.
The easiest way to download your logs and storage data is either through the Google Developers Console or using the gsutil tool. Your access logs are in CSV format and have the following naming convention:
gs://<bucket_name>/<object_prefix>_usage_<timestamp>_<id>_v0
For example, the following is an access logs object for a bucket named gs://finance-data, created on June 18, 2013 at 14:00 UTC and stored in the bucket gs://my_logs:
gs://my_logs/finance-data_usage_2013_06_18_14_00_00_1702e6_v0
Storage data logs are named using the following convention:
gs://<bucket_name>/<object_prefix>_storage_<timestamp>_<id>_v0
For example, the following is a storage data logs object for the same bucket on June 18, 2013:
gs://my_logs/finance-data_storage_2013_06_18_07_00_00_1702e6_v0
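The naming conventions above can be split into their parts with a regular expression. This is a sketch under stated assumptions: the pattern is inferred from the examples in this section, the timestamp is treated as UTC, and the identifier is assumed to be alphanumeric. The name parse_log_object_name is hypothetical.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from <object_prefix>_<usage|storage>_<timestamp>_<id>_v0.
LOG_NAME = re.compile(
    r"^(?P<prefix>.+)_(?P<kind>usage|storage)_"
    r"(?P<timestamp>\d{4}_\d{2}_\d{2}_\d{2}_\d{2}_\d{2})_"
    r"(?P<id>[0-9A-Za-z]+)_v0$"
)


def parse_log_object_name(uri):
    """Split a log object URI or name into prefix, kind, timestamp and id."""
    name = uri.rsplit("/", 1)[-1]  # drop gs://<bucket_name>/ if present
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError("not a recognized log object name: " + name)
    stamp = datetime.strptime(match.group("timestamp"), "%Y_%m_%d_%H_%M_%S")
    return {
        "prefix": match.group("prefix"),
        "kind": match.group("kind"),
        # Assumption: timestamps in object names are UTC.
        "timestamp": stamp.replace(tzinfo=timezone.utc),
        "id": match.group("id"),
    }


info = parse_log_object_name(
    "gs://my_logs/finance-data_usage_2013_06_18_14_00_00_1702e6_v0"
)
print(info["prefix"], info["kind"], info["timestamp"].isoformat(), info["id"])
```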
gsutil
To download logs using gsutil, run the following command:
gsutil cp <logs_object> <destination_uri>
Google Developers Console
To download logs using Google Developers Console:
- Log in to the Google Developers Console.
- Select the project that contains the logs.
- Click on the Google Cloud Storage service.
- Select your log bucket.
- Download or view your logs by clicking on the appropriate log object.
Analyzing logs in BigQuery
To query your Google Cloud Storage usage and storage logs, you can use Google BigQuery, which enables fast, SQL-like queries against append-only tables. The BigQuery Command-Line Tool (bq) is a Python-based tool that allows you to access BigQuery from the command line. For information about downloading and using bq, see the bq Command-Line Tool reference page.
Loading logs into BigQuery
- Select a default project.
For details about selecting a project, see Working With Projects.
- Create a new dataset.
$ bq mk storageanalysis
Dataset 'storageanalysis' successfully created.
List the datasets in the project:
$ bq ls
datasetId
-----------------
storageanalysis
- Save the usage and storage schemas to your local computer for use in the load command.
You can find the schemas to use at these locations: cloud_storage_usage_schema_v0 and cloud_storage_storage_schema_v0. The schemas are also described in the section Access and storage log format.
- Load the access logs into the dataset.
$ bq load --skip_leading_rows=1 storageanalysis.usage \
    gs://my_logs/bucket_usage_2014_01_15_14_00_00_1702e6_v0 \
    ./cloud_storage_usage_schema_v0.json
$ bq load --skip_leading_rows=1 storageanalysis.storage \
    gs://my_logs/bucket_storage_2014_01_05_14_00_00_091c5f_v0 \
    ./cloud_storage_storage_schema_v0.json
These commands do the following:
- Load usage and storage logs from the bucket my_logs.
- Create tables usage and storage in the dataset storageanalysis.
- Read schema data (.json file) from the same directory where the bq command runs.
- Skip the first row of each log file because it contains column descriptions.
Because this was the first time you ran the load command in the example here, the tables usage and storage were created. You can continue to append to these tables with subsequent load commands, using different access log file names or wildcards. For example, the following command appends data from all logs that start with "bucket_usage_2014" to the usage table:
$ bq load --skip_leading_rows=1 storageanalysis.usage \
    gs://my_logs/bucket_usage_2014* \
    ./cloud_storage_usage_schema_v0.json
When using wildcards, you might want to move logs already uploaded to BigQuery to another directory (e.g., gs://my_logs/processed) to avoid uploading data from a log more than once.
BigQuery functionality can also be accessed through the BigQuery Browser Tool. With the browser tool, you can load data through the create table process.
For additional information about loading data from Google Cloud Storage, including programmatically loading data, see Loading data from Google Cloud Storage.
Modifying the access log schema
In some scenarios, you may find it useful to pre-process access logs before loading into BigQuery. For example, you can add additional information to the access logs to make your query analysis easier in BigQuery. In this section, we'll show how you can add the file name of each storage access log to the log. This requires modifying the existing schema and each log file.
- Modify the existing schema, cloud_storage_storage_schema_v0, to add file name as shown below. Give the new schema a new name, for example, cloud_storage_storage_schema_custom.json, to distinguish it from the original.
[
  {"name": "bucket", "type": "string", "mode": "REQUIRED"},
  {"name": "storage_byte_hours", "type": "integer", "mode": "REQUIRED"},
  {"name": "filename", "type": "string", "mode": "REQUIRED"}
]
- Pre-process storage access log files based on the new schema, before loading them into BigQuery.
For example, the following commands can be used in a Linux/Mac OS X or Windows (Cygwin) environment:
gsutil cp gs://my_logs/bucket_storage* .
for f in bucket_storage*; do sed -i -e "1s/$/,\"filename\"/" -e "2s/$/,\""$f"\"/" $f; done
The gsutil command copies the files into your working directory. The second command loops through the log files and adds "filename" to the description row (first row) and the actual file name to the data row (second row). Here's an example of a modified log file:
"bucket","storage_byte_hours","filename"
"example-bucket","5532482018","bucket_storage_2014_01_05_08_00_00_021fd_v0"
- When you load the storage access logs into BigQuery, load your locally modified logs and use the customized schema.
for f in bucket_storage*; do
  bq load --skip_leading_rows=1 storageanalysis.storage "$f" ./cloud_storage_storage_schema_custom.json
done
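The pre-processing step in this section can also be written in Python for environments where sed is not convenient. This is a minimal sketch, assuming each storage log has exactly one header row followed by one data row, as in the example log file in this section; the function name add_filename_column is hypothetical.

```python
def add_filename_column(log_text, filename):
    """Append a "filename" column to a two-row storage log.

    Mirrors the sed approach: the header row gets the column name and
    the data row gets the quoted file name.
    """
    lines = log_text.splitlines()
    if len(lines) < 2:
        raise ValueError("expected a header row and a data row")
    lines[0] += ',"filename"'
    lines[1] += ',"{}"'.format(filename)
    return "\n".join(lines) + "\n"


original = '"bucket","storage_byte_hours"\n"example-bucket","5532482018"\n'
modified = add_filename_column(
    original, "bucket_storage_2014_01_05_08_00_00_021fd_v0"
)
print(modified, end="")
```

To process downloaded files, you could read each file's text, pass it through the function with the file's own name, and write the result back before running the load command.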
Querying logs in BigQuery
Once your logs are loaded into BigQuery, you can query your access logs to return information about your logged bucket(s). The following example shows you how to use the bq tool in a scenario where you have access logs for a bucket over several days and you have loaded the logs as shown in Loading logs into BigQuery. You can also execute the queries below using the BigQuery Browser Tool.
- In the bq tool, enter the interactive mode.
$ bq shell
- Run a query against the storage log table.
For example, the following query shows how the storage of a logged bucket changes over time. It assumes that you modified the storage access logs as described in Modifying the access log schema and that the log files are named "log_storage_*".
project-name> SELECT SUBSTRING(filename, 13, 10) as day, storage_byte_hours/24 as size
  FROM [storageanalysis.storage]
  ORDER BY filename LIMIT 100
Example output from the query:
Waiting on bqjob_r36fbf5c164a966e8_0000014379bc199c_1 ... (0s) Current status: DONE
+------------+----------------------+
| day        | size                 |
+------------+----------------------+
| 2014_01_05 | 2.3052008408333334E8 |
| 2014_01_06 | 2.3012297245833334E8 |
| 2014_01_07 | 3.3477797120833334E8 |
| 2014_01_08 | 4.4183686058333334E8 |
+------------+----------------------+
If you did not modify the schema and are using the default schema, you can run the following query:
project-name> SELECT storage_byte_hours FROM [storageanalysis.storage] LIMIT 100
- Run a query against the usage log table.
For example, the following query shows how to summarize the request methods that clients use to access resources in the logged bucket.
project-name> SELECT cs_method, COUNT(*) AS count
  FROM [storageanalysis.usage]
  GROUP BY cs_method
Example output from the query:
Waiting on bqjob_r1a6b4596bd9c29fb_000001437d6f8a52_1 ... (0s) Current status: DONE
+-----------+-------+
| cs_method | count |
+-----------+-------+
| PUT       |  8002 |
| GET       | 12631 |
| POST      |  2737 |
| HEAD      |  2173 |
| DELETE    |  7290 |
+-----------+-------+
- Quit the interactive shell of the bq tool.
project-name> quit
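For a quick check on a single downloaded usage log, the same per-method summary can be computed locally without BigQuery. The following sketch assumes the log is available as CSV text with a header row; the function name count_methods and the sample rows are hypothetical, and only the cs_method field name comes from the access log format.

```python
import csv
import io
from collections import Counter


def count_methods(csv_text):
    """Count requests per HTTP method, like GROUP BY cs_method."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["cs_method"] for row in reader)


# Hypothetical sample with a subset of the access log columns.
SAMPLE_LOG = (
    '"time_micros","cs_method","s_request_id"\n'
    '"1371564000000000","GET","req-1"\n'
    '"1371564001000000","GET","req-2"\n'
    '"1371564002000000","PUT","req-3"\n'
)

counts = count_methods(SAMPLE_LOG)
for method, count in counts.most_common():
    print(method, count)
```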
Disabling logging
gsutil
Using gsutil, disable logging with the logging set off command:
gsutil logging set off gs://bucket_name
To check that logging was successfully disabled, perform a logging get request:
gsutil logging get gs://bucket_name
If logging is disabled, the following is returned:
gs://bucket_name/ has no logging configuration.
XML API
Using the Google Cloud Storage XML API, disable logging by sending a PUT request to the bucket's logging configuration as shown in the following example:
PUT /bucket_name?logging HTTP/1.1
Host: storage.googleapis.com

<Logging/>
JSON API
Using the Google Cloud Storage JSON API, disable logging by sending a PATCH request to the bucket resource that sets its logging configuration to null, as shown in the following example:
PATCH /storage/v1beta2/b/bucket_name
Host: www.googleapis.com

{ "logging": null }
Access and storage log format
The access logs and storage data files can provide an overwhelming amount of information. You can use the following tables to help you identify all the information provided in these logs.
Access log fields:
Field | Type | Description |
---|---|---|
time_micros | integer | The time that the request was completed, in microseconds since the Unix epoch. |
c_ip | string | The IP address from which the request was made. The "c" prefix indicates that this is information about the client. |
c_ip_type | integer | The type of IP in the c_ip field. |
c_ip_region | string | Reserved for future use. |
cs_method | string | The HTTP method of this request. The "cs" prefix indicates that this information was sent from the client to the server. |
cs_uri | string | The URI of the request. |
sc_status | integer | The HTTP status code the server sent in response. The "sc" prefix indicates that this information was sent from the server to the client. |
cs_bytes | integer | The number of bytes sent in the request. |
sc_bytes | integer | The number of bytes sent in the response. |
time_taken_micros | integer | The time it took to serve the request in microseconds. |
cs_host | string | The host in the original request. |
cs_referer | string | The HTTP referrer for the request. |
cs_user_agent | string | The User-Agent of the request. The value is GCS Lifecycle Management for requests made by lifecycle management. |
s_request_id | string | The request identifier. |
cs_operation | string | The Google Cloud Storage operation, e.g. GET_Object. |
cs_bucket | string | The bucket specified in the request. If this is a list buckets request, this can be null. |
cs_object | string | The object specified in this request. This can be null. |
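The time_micros field is easier to read once converted to a calendar time. A minimal sketch, assuming the value has been read from the CSV as an integer or a numeric string; the function name micros_to_datetime is hypothetical. Integer arithmetic with timedelta is used to avoid the rounding that floating-point division can introduce.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_datetime(time_micros):
    """Convert microseconds since the Unix epoch to a UTC datetime."""
    return UNIX_EPOCH + timedelta(microseconds=int(time_micros))


# 1371564000000000 microseconds corresponds to June 18, 2013 at 14:00 UTC.
print(micros_to_datetime("1371564000000000").isoformat())
```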
Storage data fields:
Field | Type | Description |
---|---|---|
bucket | string | The name of the bucket. |
storage_byte_hours | integer | Average size in byte-hours over a 24 hour period of the bucket. To get the total size of the bucket, divide byte-hours by 24. |
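The division described for storage_byte_hours can be illustrated with the value from the modified log file example earlier in this document. This is a small sketch; the function name average_bucket_bytes is hypothetical.

```python
def average_bucket_bytes(storage_byte_hours):
    """Convert a daily byte-hours figure to an average size in bytes."""
    return int(storage_byte_hours) / 24


# Value taken from the example storage log row in this document.
size = average_bucket_bytes("5532482018")
print(size)
```

The result, roughly 230.5 million bytes, matches the first row of the example storage query output (2.3052008408333334E8).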