Access Logs & Storage Data

This document discusses how to download and review access logs and storage data for your Google Cloud Storage buckets, and how to analyze the logs using Google BigQuery.

Introduction

Google Cloud Storage offers access logs and storage data in the form of CSV files that you can download and view. Access logs provide information for all of the requests made on a specified bucket and are created hourly, while the daily storage logs provide information about the storage consumption of that bucket for the last day. The access logs and storage data files are automatically created as new objects in a bucket that you specify. Access and storage logs are currently only available in CSV format.

When you configure a Google Cloud Storage bucket to simulate the behavior of a static website, you might want to log how resources in the website are being used. Note that you can also configure bucket access logs and storage data files for any Google Cloud Storage bucket.

Setting up log delivery

The following steps describe how to set up log delivery for a specific bucket using the gsutil tool, the XML API, and the JSON API. If you don't have the gsutil tool, download and install it.

gsutil

Create a bucket to store your logs.
Create a bucket to store your logs using the following command:
```
gsutil mb gs://my_logs
```
Grant Google Cloud Storage WRITE permission on the bucket.
Google Cloud Storage must have WRITE permission to create and store your logs as new objects. To grant Google Cloud Storage WRITE access to your bucket, give the cloud-storage-analytics@google.com group write access with the following command:
```
gsutil acl ch -g cloud-storage-analytics@google.com:W gs://my_logs
```

Log objects have the default object ACL of the log bucket. You can set the default object ACL of the log bucket using gsutil. For example, to set the default object ACL to project-private:
```
gsutil defacl set project-private gs://my_logs
```

Enable logging for your bucket.
You can enable logging for your bucket using the logging command:
```
gsutil logging set on -b gs://my_logs [-o log_object_prefix] gs://bucket_name
```
Optionally, use the -o option to set log_object_prefix, which forms the beginning of each log object name. The prefix can be at most 900 characters and must be a valid object name. By default, the object prefix is the name of the bucket for which logging is enabled.

XML API

Create a bucket to store your logs.
Create a bucket to store your logs using the following request:
```
PUT /my_logs HTTP/1.1
Host: storage.googleapis.com
```
Grant Google Cloud Storage WRITE permission on the bucket.
Google Cloud Storage must have WRITE permission to create and store your logs as new objects. To grant Google Cloud Storage WRITE access to your bucket, add an ACL entry for the bucket that grants the cloud-storage-analytics@google.com group write access. Because this request replaces the bucket's entire ACL, be sure to include all existing ACL entries for the bucket in addition to the new entry.
```
PUT /my_logs?acl HTTP/1.1
Host: storage.googleapis.com

<AccessControlList>
  <Entries>
    <Entry>
      <Scope type="GroupByEmail">
        <EmailAddress>cloud-storage-analytics@google.com</EmailAddress>
      </Scope>
      <Permission>WRITE</Permission>
    </Entry>
    <!-- Include the bucket's existing Entry elements here. -->
  </Entries>
</AccessControlList>
```
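If you script this step, one approach is to fetch the current ACL with `GET /my_logs?acl`, append the new entry, and send the merged document back in the PUT. The following Python sketch shows only the merge step, using the standard library's `xml.etree.ElementTree`. The helper name `add_group_write` is hypothetical, and the sketch assumes the ACL document has the `<AccessControlList><Entries>` shape shown above, with no XML namespace.

```python
import xml.etree.ElementTree as ET

# Group that needs WRITE access for log delivery, as described above.
ANALYTICS_GROUP = "cloud-storage-analytics@google.com"


def add_group_write(acl_xml: str, group_email: str = ANALYTICS_GROUP) -> str:
    """Return acl_xml with a GroupByEmail WRITE entry for group_email added.

    Hypothetical helper: existing entries are kept because the PUT request
    replaces the bucket's entire ACL.
    """
    root = ET.fromstring(acl_xml)
    entries = root.find("Entries")
    if entries is None:
        entries = ET.SubElement(root, "Entries")
    # Leave the document unchanged if the entry is already present.
    for entry in entries.findall("Entry"):
        if (entry.findtext("Scope/EmailAddress") == group_email
                and entry.findtext("Permission") == "WRITE"):
            return ET.tostring(root, encoding="unicode")
    entry = ET.SubElement(entries, "Entry")
    scope = ET.SubElement(entry, "Scope", type="GroupByEmail")
    ET.SubElement(scope, "EmailAddress").text = group_email
    ET.SubElement(entry, "Permission").text = "WRITE"
    return ET.tostring(root, encoding="unicode")
```

Because the function returns the input unchanged when the entry already exists, running it twice does not add a duplicate entry.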

Enable logging for your bucket.

You can enable logging for your bucket using the logging query parameter:
```
PUT /bucket_name?logging HTTP/1.1
Host: storage.googleapis.com

<Logging>
    <LogBucket>my_logs</LogBucket>
    <LogObjectPrefix>log_object_prefix</LogObjectPrefix>
</Logging>
```
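The request body can also be generated rather than written by hand. This minimal Python sketch (the helper name `logging_config_xml` is hypothetical) builds the `<Logging>` document shown above and enforces the 900-character prefix limit described in the gsutil steps; it does not check the other object-name rules. Calling it without a log bucket yields an empty `Logging` element, which is the body used to disable logging.

```python
import xml.etree.ElementTree as ET
from typing import Optional

MAX_PREFIX_CHARS = 900  # limit stated in the gsutil steps above


def logging_config_xml(log_bucket: Optional[str] = None,
                       log_object_prefix: Optional[str] = None) -> str:
    """Build a body for PUT /bucket_name?logging (hypothetical helper).

    With no log bucket, the result is an empty Logging element.
    """
    if log_object_prefix is not None and len(log_object_prefix) > MAX_PREFIX_CHARS:
        raise ValueError("log object prefix exceeds 900 characters")
    root = ET.Element("Logging")
    if log_bucket is not None:
        ET.SubElement(root, "LogBucket").text = log_bucket
        if log_object_prefix is not None:
            ET.SubElement(root, "LogObjectPrefix").text = log_object_prefix
    return ET.tostring(root, encoding="unicode")
```

ElementTree writes the empty element as `<Logging />`, which is equivalent XML to `<Logging/>`.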

JSON API

Create a bucket to store your logs.
Create a bucket to store your logs using the following request:
```
POST /storage/v1beta2/b?project=project-id
Host: www.googleapis.com

{
  "name": "my_logs"
}
```
Grant Google Cloud Storage WRITE permission on the bucket.
Google Cloud Storage must have WRITE permission to create and store your logs as new objects. To grant Google Cloud Storage WRITE access to your bucket, add an ACL entry for the bucket that grants the cloud-storage-analytics@google.com group write access. You can do this with the following request to the BucketAccessControls resource for the logging bucket:
```
POST /storage/v1beta2/b/my_logs/acl
Host: www.googleapis.com

{
 "entity": "group-cloud-storage-analytics@google.com",
 "role": "WRITER"
}
```

Enable logging for your bucket.

You can enable logging for your bucket using the following request:
```
PATCH /storage/v1beta2/b/bucket_name
Host: www.googleapis.com

{
 "logging": {
  "logBucket": "my_logs",
  "logObjectPrefix": "log_object_prefix"
 }
}
```
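A corresponding sketch for the JSON API body uses Python's `json` module; `logging_patch_body` is a hypothetical helper name. Passing `None` as the log bucket produces `{"logging": null}`, the body this page uses to disable logging through the JSON API.

```python
import json
from typing import Optional


def logging_patch_body(log_bucket: Optional[str],
                       log_object_prefix: Optional[str] = None) -> str:
    """Build a JSON API PATCH body for a bucket's logging property (hypothetical helper)."""
    if log_bucket is None:
        # A null value clears the configuration (see "Disabling logging").
        return json.dumps({"logging": None})
    config = {"logBucket": log_bucket}
    if log_object_prefix is not None:
        config["logObjectPrefix"] = log_object_prefix
    return json.dumps({"logging": config})
```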

Checking logging status

gsutil

Using gsutil, you can check the logging configuration with the logging get command:
```
gsutil logging get gs://bucket_name
```
You can also save the logging configuration to a file:
```
gsutil logging get gs://bucket_name > your_logging_configuration_file
```
If logging is enabled, gsutil prints the logging configuration:
```
{"logObjectPrefix": "log_object_prefix", "logBucket": "my_logs"}
```
If logging is not enabled, the following is returned:
```
gs://bucket_name/ has no logging configuration.
```

XML API

Using the Google Cloud Storage XML API, you can send a GET request for the bucket's logging configuration as shown in the following example.
```
GET /bucket_name?logging HTTP/1.1
Host: storage.googleapis.com
```
If logging is enabled, the server sends the configuration in the response. A response might look similar to the following:
```
<?xml version="1.0" ?>
<Logging>
    <LogBucket>
        my_logs
    </LogBucket>
    <LogObjectPrefix>
        log_object_prefix
    </LogObjectPrefix>
</Logging>
```
If logging is not enabled, an empty configuration is returned:
```
<?xml version="1.0" ?>
<Logging/>
```
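Client code that checks the status needs to tell these two responses apart. The Python sketch below (the function name `parse_logging_xml` is hypothetical) returns the log bucket and prefix, or `None` for an empty configuration, and strips the whitespace that can surround the element text, as in the example response above.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def parse_logging_xml(body: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (log_bucket, log_object_prefix), or None if logging is not enabled."""
    root = ET.fromstring(body)
    bucket = root.findtext("LogBucket")
    if bucket is None or not bucket.strip():
        return None  # empty <Logging/> configuration
    prefix = root.findtext("LogObjectPrefix")
    # Element text may be padded with newlines and spaces, so strip it.
    return bucket.strip(), (prefix.strip() if prefix is not None else None)
```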

JSON API

Using the Google Cloud Storage JSON API, you can send a GET request for the bucket's logging configuration as shown in the following example.
```
GET /storage/v1beta2/b/bucket_name?fields=logging
Host: www.googleapis.com
```
If logging is enabled, the server sends the configuration in the response. A response might look similar to the following:
```
{
 "logging": {
  "logBucket": "my_logs",
  "logObjectPrefix": "log_object_prefix"
 }
}
```
If logging is not enabled, an empty configuration is returned:
```
{
}
```

Downloading logs

Storage logs are generated once a day and contain the storage usage for the previous day. They are typically created before 10:00 am PST.

Usage logs are generated hourly when there is activity to report in the monitored bucket. Usage logs are typically created 15 minutes after the end of the hour. Here are several things to keep in mind when working with usage logs:

- Any processing of usage logs should allow for the possibility that they are delivered more than 15 minutes after the end of an hour.
- Usually, an hourly usage log object contains records for all usage that occurred during that hour. Occasionally, an hourly usage log object contains records for an earlier hour, but never for a later hour.
- Google Cloud Storage may write multiple log objects for the same hour.
- Occasionally, a single record may appear twice in the usage logs. While we make our best effort to remove duplicate records, your log processing should remove them if duplicates matter to your analysis. You can use the s_request_id field to detect duplicates.
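The following Python sketch applies these points to usage logs you have already downloaded (for example, with `gsutil cp`): it merges any number of log objects, drops records whose `s_request_id` was already seen, and sorts by `time_micros`, so that records for an earlier hour that arrive in a later object end up in time order. The function name `dedupe_usage_rows` is hypothetical, and the sketch assumes each object starts with a header row that names the columns as listed in the access log fields table at the end of this page.

```python
import csv
import io
from typing import Dict, Iterable, List


def dedupe_usage_rows(log_texts: Iterable[str]) -> List[Dict[str, str]]:
    """Merge usage-log CSV texts, keeping one record per s_request_id."""
    seen = set()
    rows: List[Dict[str, str]] = []
    for text in log_texts:
        for row in csv.DictReader(io.StringIO(text)):
            request_id = row["s_request_id"]
            if request_id in seen:
                continue  # duplicate delivered in this or another log object
            seen.add(request_id)
            rows.append(row)
    # Records for an earlier hour may arrive in a later object; restore time order.
    rows.sort(key=lambda r: int(r["time_micros"]))
    return rows
```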

Note: Starting March 6, 2013 00:00 UTC, storage and usage objects are written with the suffix `_YYYY_MM_DD_HH_MM_SS_<id>_v0`, where `<id>` matches the regular expression `[a-z0-9]+`. This replaces the previous suffixes `_YYYY_MM_DD_v0` for storage objects and `_YYYY_MM_DD_HH_v0` for usage objects.

Access to your logs is controlled by the ACL on the log objects. Log objects have the default object ACL of the log bucket.

The easiest way to download your logs and storage data is through the Google Developers Console or with the gsutil tool. Your access logs are in CSV format and have the following naming convention:

gs://<bucket_name>/<object_prefix>_usage_<timestamp>_<id>_v0

For example, the following is an access log object for a bucket named gs://finance-data, created on June 18, 2013 at 14:00 UTC and stored in the bucket gs://my_logs:

gs://my_logs/finance-data_usage_2013_06_18_14_00_00_1702e6_v0

Storage data logs are named using the following convention:

gs://<bucket_name>/<object_prefix>_storage_<timestamp>_<id>_v0

For example, the following is a storage data log object for the same bucket on June 18, 2013:

gs://my_logs/finance-data_storage_2013_06_18_07_00_00_1702e6_v0
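When you process many log objects, it can help to parse these names. This Python sketch matches only the current `_YYYY_MM_DD_HH_MM_SS_<id>_v0` suffix (names in the pre-March-2013 format return `None`) and treats the timestamp as UTC, as in the examples above. The `LogObject` type and `parse_log_name` function are hypothetical.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple, Optional

# Matches <object_prefix>_<usage|storage>_<YYYY_MM_DD_HH_MM_SS>_<id>_v0
LOG_NAME = re.compile(
    r"^(?P<prefix>.+)_(?P<kind>usage|storage)_"
    r"(?P<ts>\d{4}_\d{2}_\d{2}_\d{2}_\d{2}_\d{2})_(?P<id>[a-z0-9]+)_v0$"
)


class LogObject(NamedTuple):
    prefix: str
    kind: str            # "usage" or "storage"
    timestamp: datetime  # assumed to be UTC
    id: str


def parse_log_name(name: str) -> Optional[LogObject]:
    """Parse a log object name or gs:// URI; return None if it does not match."""
    if name.startswith("gs://"):
        name = name.split("/", 3)[-1]  # drop the "gs://<bucket_name>/" part
    m = LOG_NAME.match(name)
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%Y_%m_%d_%H_%M_%S").replace(tzinfo=timezone.utc)
    return LogObject(m["prefix"], m["kind"], ts, m["id"])
```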

gsutil

To download logs using gsutil, run the following command:
```
gsutil cp <logs_object> <destination_uri>
```

Google Developers Console

To download logs using the Google Developers Console:

- Log in to the Google Developers Console.
- Select the project that contains the logs.
- Click the Google Cloud Storage service.
- Select your log bucket.
- Download or view your logs by clicking the appropriate log object.

Analyzing logs in BigQuery

To query your Google Cloud Storage usage and storage logs, you can use Google BigQuery, which enables fast, SQL-like queries against append-only tables. The BigQuery Command-Line Tool (bq) is a Python-based tool that allows you to access BigQuery from the command line. For information about downloading and using bq, see the bq Command-Line Tool reference page.

Loading logs into BigQuery

Select a default project.
For details about selecting a project, see Working With Projects.

Create a new dataset.
```
$ bq mk storageanalysis
Dataset 'storageanalysis' successfully created.
```
List the datasets in the project:
```
$ bq ls

  datasetId
-----------------
 storageanalysis
```

Save the usage and storage schemas to your local computer for use in the load command.
You can find the schemas to use at these locations: cloud_storage_usage_schema_v0 and cloud_storage_storage_schema_v0. The schemas are also described in the section Access and Storage Logs Format.
Load the usage and storage logs into the dataset.
```
$ bq load --skip_leading_rows=1 storageanalysis.usage \
          gs://my_logs/bucket_usage_2014_01_15_14_00_00_1702e6_v0 \
          ./cloud_storage_usage_schema_v0.json
$ bq load --skip_leading_rows=1 storageanalysis.storage \
          gs://my_logs/bucket_storage_2014_01_05_14_00_00_091c5f_v0 \
          ./cloud_storage_storage_schema_v0.json
```
These commands do the following:
- Load usage and storage logs from the bucket my_logs.
- Create tables usage and storage in the dataset storageanalysis.
- Read the schemas (.json files) from the directory where the bq command runs.
- Skip the first row of each log file because it contains column descriptions.
Because this was the first time you ran the load command in this example, the tables usage and storage were created. You can continue to append to these tables with subsequent load commands that use different log file names or wildcards. For example, the following command appends data from all logs that start with "bucket_usage_2014" to the usage table:
```
$ bq load --skip_leading_rows=1 storageanalysis.usage \
          gs://my_logs/bucket_usage_2014* \
          ./cloud_storage_usage_schema_v0.json
```
When using wildcards, you might want to move logs already loaded into BigQuery to another location (for example, gs://my_logs/processed/) to avoid loading data from a log more than once.

BigQuery functionality can also be accessed through the BigQuery Browser Tool. With the browser tool, you can load data through the create table process.

For additional information about loading data from Google Cloud Storage, including programmatically loading data, see Loading data from Google Cloud Storage.

Modifying the access log schema

In some scenarios, you may find it useful to pre-process logs before loading them into BigQuery. For example, you can add information to the logs to make your query analysis easier in BigQuery. In this section, we'll show how you can add the file name of each storage log to the log itself. This requires modifying the existing schema and each log file.

Modify the existing schema, cloud_storage_storage_schema_v0, to add a filename field as shown below. Give the new schema a new name, for example, cloud_storage_storage_schema_custom.json, to distinguish it from the original.
```
[
  {"name": "bucket", "type": "string", "mode": "REQUIRED"},
  {"name": "storage_byte_hours", "type": "integer", "mode": "REQUIRED"},
  {"name": "filename", "type": "string", "mode": "REQUIRED"}
]
```
Pre-process storage access log files based on the new schema, before loading them into BigQuery.
For example, the following commands can be used in a Linux/Mac OS X or Windows (Cygwin) environment:
```
gsutil cp 'gs://my_logs/bucket_storage*' .
# -i.bak (no space) is accepted by both GNU sed and the BSD sed on Mac OS X.
for f in bucket_storage*; do sed -i.bak -e '1s/$/,"filename"/' -e "2s/\$/,\"$f\"/" "$f"; done
rm -f bucket_storage*.bak
```
The gsutil command copies the files into your working directory. The second command loops through the log files and adds "filename" to the description row (first row) and the actual file name to the data row (second row). Here's an example of a modified log file:
```
"bucket","storage_byte_hours","filename"
"example-bucket","5532482018","bucket_storage_2014_01_05_08_00_00_021fd_v0"
```
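If sed is unavailable, or you want to avoid differences between sed implementations, the same preprocessing can be sketched in Python with the `csv` module. The function name `add_filename_column` is hypothetical. Unlike the sed command, which edits only the second row, it appends the file name to every data row; for a storage log with a single data row, such as the example above, the result is the same.

```python
import csv
import os


def add_filename_column(path: str) -> None:
    """Append a "filename" column to a downloaded storage log, in place."""
    name = os.path.basename(path)
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    rows[0].append("filename")  # header (column description) row
    for row in rows[1:]:
        row.append(name)  # data rows get the log's own file name
    with open(path, "w", newline="") as f:
        # Quote every field, as in the example above, and use "\n" line endings.
        csv.writer(f, quoting=csv.QUOTE_ALL, lineterminator="\n").writerows(rows)
```

To process every downloaded file, call it in a loop over `glob.glob("bucket_storage*")`.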

When you load the storage logs into BigQuery, load your locally modified logs and use the customized schema.
```
for f in bucket_storage*; do
  bq load --skip_leading_rows=1 storageanalysis.storage "$f" ./cloud_storage_storage_schema_custom.json
done
```

Querying logs in BigQuery

Once your logs are loaded into BigQuery, you can query them to return information about your logged bucket(s). The following example shows you how to use the bq tool in a scenario where you have access logs for a bucket over several days and you have loaded the logs as shown in Loading logs into BigQuery. You can also execute the queries below using the BigQuery Browser Tool.

In the bq tool, enter the interactive mode.
```
$ bq shell
```

Run a query against the storage log table.

For example, the following query shows how the storage of a logged bucket changes in time. It assumes that you modified the storage access logs as described in Modifying the Access Log Schema and that the log files are named "log_storage_*".

```
project-name>SELECT SUBSTRING(filename, 13, 10) as day, storage_byte_hours/24 as size FROM [storageanalysis.storage] ORDER BY filename LIMIT 100
```

Example output from the query:

```
Waiting on bqjob_r36fbf5c164a966e8_0000014379bc199c_1 ... (0s) Current status: DONE
+------------+----------------------+
|    day     |         size         |
+------------+----------------------+
| 2014_01_05 | 2.3052008408333334E8 |
| 2014_01_06 | 2.3012297245833334E8 |
| 2014_01_07 | 3.3477797120833334E8 |
| 2014_01_08 | 4.4183686058333334E8 |
+------------+----------------------+
```

If you did not modify the schema and are using the default schema, you can run the following query:

```
project-name>SELECT storage_byte_hours FROM [storageanalysis.storage] LIMIT 100
```

Run a query against the usage log table.

For example, the following query shows how to summarize the request methods that clients use to access resources in the logged bucket.

```
project-name>SELECT cs_method, COUNT(*) AS count FROM [storageanalysis.usage] GROUP BY cs_method
```

Example output from the query:

```
Waiting on bqjob_r1a6b4596bd9c29fb_000001437d6f8a52_1 ... (0s) Current status: DONE
+-----------+-------+
| cs_method | count |
+-----------+-------+
| PUT       |  8002 |
| GET       | 12631 |
| POST      |  2737 |
| HEAD      |  2173 |
| DELETE    |  7290 |
+-----------+-------+
```
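For a quick check on a few downloaded usage logs, the same per-method summary can be sketched in Python without BigQuery. It counts every row it reads, so run it on deduplicated data if exact counts matter; `count_methods` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter
from typing import Iterable


def count_methods(log_texts: Iterable[str]) -> Counter:
    """Count usage-log records per cs_method, like GROUP BY cs_method."""
    counts: Counter = Counter()
    for text in log_texts:
        for row in csv.DictReader(io.StringIO(text)):
            counts[row["cs_method"]] += 1
    return counts
```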

Quit the interactive shell of the bq tool.
```
project-name> quit
```

Disabling logging

gsutil

Using gsutil, disable logging with the logging set off command:
```
gsutil logging set off gs://bucket_name
```
To check that logging was successfully disabled, run the logging get command:
```
gsutil logging get gs://bucket_name
```
If logging is disabled, the following is returned:
```
gs://bucket_name/ has no logging configuration.
```

XML API

Using the Google Cloud Storage XML API, disable logging by sending a PUT request to the bucket's logging configuration as shown in the following example:
```
PUT /bucket_name?logging HTTP/1.1
Host: storage.googleapis.com

<Logging/>
```

JSON API

Using the Google Cloud Storage JSON API, disable logging by sending a PATCH request that sets the bucket's logging property to null, as shown in the following example:
```
PATCH /storage/v1beta2/b/bucket_name
Host: www.googleapis.com

{
 "logging": null
}
```

Access and storage log format

The access logs and storage data files can provide an overwhelming amount of information. You can use the following tables to help you identify all the information provided in these logs.

Access log fields:

Field	Type	Description
`time_micros`	integer	The time that the request was completed, in microseconds since the Unix epoch.
`c_ip`	string	The IP address from which the request was made. The "c" prefix indicates that this is information about the client.
`c_ip_type`	integer	The type of IP in the c_ip field: A value of 1 indicates an IPv4 address. A value of 2 indicates an IPv6 address.
`c_ip_region`	string	Reserved for future use.
`cs_method`	string	The HTTP method of this request. The "cs" prefix indicates that this information was sent from the client to the server.
`cs_uri`	string	The URI of the request.
`sc_status`	integer	The HTTP status code the server sent in response. The "sc" prefix indicates that this information was sent from the server to the client.
`cs_bytes`	integer	The number of bytes sent in the request.
`sc_bytes`	integer	The number of bytes sent in the response.
`time_taken_micros`	integer	The time it took to serve the request in microseconds.
`cs_host`	string	The host in the original request.
`cs_referer`	string	The HTTP referrer for the request.
`cs_user_agent`	string	The User-Agent of the request. The value is `GCS Lifecycle Management` for requests made by lifecycle management.
`s_request_id`	string	The request identifier.
`cs_operation`	string	The Google Cloud Storage operation, for example, `GET_Object`.
`cs_bucket`	string	The bucket specified in the request. If this is a list buckets request, this can be null.
`cs_object`	string	The object specified in this request. This can be null.
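The following Python sketch reads a usage log with the `csv` module, converts the integer fields from this table, turns `time_micros` into a UTC `datetime`, and maps `c_ip_type` to an IP version. The function name `parse_usage_log` is hypothetical, and treating an empty CSV field as null is an assumption, not something this table specifies.

```python
import csv
import io
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List

# Integer-typed columns according to the access log fields table.
INTEGER_FIELDS = {"time_micros", "c_ip_type", "sc_status",
                  "cs_bytes", "sc_bytes", "time_taken_micros"}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_usage_log(text: str) -> List[Dict[str, Any]]:
    """Parse usage-log CSV text into typed records (hypothetical helper)."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        rec: Dict[str, Any] = {}
        for key, value in row.items():
            if key in INTEGER_FIELDS:
                rec[key] = int(value) if value else None
            else:
                # Assumption: an empty field means null (e.g. cs_bucket, cs_object).
                rec[key] = value if value else None
        if rec.get("time_micros") is not None:
            # Integer microseconds give an exact timestamp, with no float rounding.
            rec["time"] = EPOCH + timedelta(microseconds=rec["time_micros"])
        if rec.get("c_ip_type") is not None:
            rec["ip_version"] = {1: 4, 2: 6}.get(rec["c_ip_type"])
        records.append(rec)
    return records
```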

Storage data fields:

Field	Type	Description
`bucket`	string	The name of the bucket.
`storage_byte_hours`	integer	Average size of the bucket over a 24-hour period, in byte-hours. To get the total size of the bucket, divide byte-hours by 24.
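As a worked example, the modified storage log shown earlier reports 5532482018 byte-hours for example-bucket. Dividing by 24 gives the size that the example storage query reports for 2014_01_05:

```latex
\text{size} = \frac{\texttt{storage\_byte\_hours}}{24} = \frac{5532482018}{24} \approx 2.3052 \times 10^{8}\ \text{bytes}
```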

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 3.0 License, and code samples are licensed under the Apache 2.0 License. For details, see our Site Policies.
