Google Cloud Storage encourages users to validate the data they transfer to and from their buckets using either CRC32C or MD5 checksums. This section describes best practices for performing these validations.
Using Hashes for Integrity Checking
There are a variety of ways that data can be corrupted while uploading to or downloading from the Cloud:
- Noisy network links
- Memory errors on client or server computers, or routers along the path
- Software bugs (e.g., in a library that customers use)
To protect against data corruption, Google Cloud Storage supports two types of hashes: CRC32C and MD5 (described below). Google recommends that customers use CRC32C for all cases, as described in the Validation section below. Customers who prefer MD5 can use it instead, but MD5 is not supported for composite objects or range GETs.
CRC32C
All GCS objects have a CRC32C hash. CRC32C is a 32-bit Cyclic Redundancy Check (CRC) based on the Castagnoli polynomial. This CRC is described by the IETF specification for SCTP. Libraries for computing CRC32C include Boost for C++, crcmod for Python, and digest-crc for Ruby. Java users can find an implementation of the algorithm in the GoogleCloudPlatform crc32c Java project.
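To make the checksum concrete, the following is a minimal, dependency-free Python sketch of a table-driven CRC32C. It is for illustration only and is not the implementation used by any of the libraries listed above; a compiled library such as crcmod will be much faster in practice. The function names `crc32c` and `crc32c_base64` are hypothetical. The base64 helper assumes the checksum is compared in the form Cloud Storage reports in object metadata, that is, the 32-bit value in big-endian byte order, base64-encoded.

```python
import base64
import struct

# Castagnoli polynomial in reflected (LSB-first) form.
_POLY = 0x82F63B78


def _make_table():
    """Precompute the CRC of every possible byte value."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ _POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data, crc=0):
    """Return the CRC32C of `data` as an unsigned 32-bit integer.

    Pass the previous return value as `crc` to checksum a stream
    chunk by chunk.
    """
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc32c_base64(data):
    """Encode the CRC32C as base64 of its big-endian bytes (assumed
    to match the form shown in object metadata)."""
    return base64.b64encode(struct.pack(">I", crc32c(data))).decode("ascii")


if __name__ == "__main__":
    # Standard CRC check input; the published CRC32C check value is 0xE3069283.
    print(hex(crc32c(b"123456789")))
    print(crc32c_base64(b"123456789"))
```

To validate a transfer with this sketch, compute the value over the local bytes and compare it with the CRC32C that the service reports for the object; a mismatch indicates that the data was corrupted somewhere along the path.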