The click-to-deploy process for Cassandra on Google Compute Engine is intended to help you get a development or test environment running quickly. This document provides the details that you might need regarding software installation and tuning in order to use your cluster. In addition, you can use this deployment approach as a starting point if you choose to launch and maintain your own production Cassandra cluster. You will learn the steps that were performed and a few best practices for bringing up any cluster on Google Compute Engine.
Objectives
- Learn about the click-to-deploy process and the optimizations that were made, such as running instances with the least privileges to ensure stability and security.
- Extend this knowledge to other applications and situations.
Deployment processing
The deployment of the Cassandra cluster is managed by Google Cloud Deployment Manager. Deployment Manager uses templates to help you deploy your configuration. These templates can include actions, which specify shell scripts that contain specific deployment instructions. In the click-to-deploy templates, these scripts are specific to Cassandra and define the types of Compute Engine instances to create and how to install Cassandra on the instances.
The click-to-deploy templates define two modules:
- Deployment coordinator
- Cassandra server
The Compute Engine instances that are associated with each of the modules start concurrently and coordinate the installation and configuration of the Cassandra cluster. Each module is described here and the deployment flow is represented visually below.
Deployment coordinator module
A single, short-lived deployment coordinator instance is created during the deployment. This instance exists to ensure that the long-running Cassandra server instances can run under the principle of least privilege.
Automatically creating and attaching persistent disks to Compute Engine instances requires authorization of the https://www.googleapis.com/auth/compute OAuth 2.0 scope. Service account scopes are granted to a Compute Engine instance when it is created and exist for the lifespan of that instance.
Rather than granting this scope to the long-running Cassandra server instances, the scope is granted to the deployment coordinator instance. The deployment coordinator creates and attaches a data disk for each Cassandra server instance.
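For illustration, a coordinator holding that scope could create and attach a data disk with commands along these lines. This sketch uses the current gcloud CLI rather than the tooling the templates were written against, and the disk name, instance name, zone, and size are placeholder assumptions:

```bash
# Create a persistent data disk for one Cassandra server (placeholder values).
gcloud compute disks create cassandra-data-0 \
    --size=200GB \
    --zone=us-central1-a

# Attach the disk to the corresponding Cassandra server instance.
gcloud compute instances attach-disk cassandra-server-0 \
    --disk=cassandra-data-0 \
    --zone=us-central1-a
```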
Cassandra server module
When the Cassandra server instances are created, they complete the following steps:
- Download the Cassandra software and source code.
- Install the Cassandra software and copy the source code to /usr/src.
- Wait for the deployment coordinator to attach the data disk.
- Format and mount the data disk (a condensed sketch of the disk and configuration steps in this list appears after the list).
- Move the Cassandra data, commitlog, and saved_caches directories to the data disk.
- Update the tuning parameters:
  - cassandra.yaml:
    - max_hints_delivery_threads: 8
    - concurrent_writes: 8 * Cores
    - commitlog_total_space_in_mb: 2048
    - memtable_flush_writers: 2
    - trickle_fsync: true
    - trickle_fsync_interval_in_kb: 4096
    - rpc_server_type: hsha
    - concurrent_compactors: 4
    - multithreaded_compaction: true
    - compaction_throughput_mb_per_sec: 0
    - endpoint_snitch: GoogleCloudSnitch
  - cassandra-env.sh:
    - MAX_HEAP_SIZE: approximately 80% of the instance's memory
    - HEAP_NEWSIZE: 600M
    - Stack size: -Xss256
    - SurvivorRatio: 4
    - CMSInitiatingOccupancyFraction: 70
    - Additional Java options: TargetSurvivorRatio=50, +AggressiveOpts, MaxDirectMemorySize=5g, +UseLargePages
- Set the Cassandra cluster name in the configuration.
- Update the seed values with all of the active nodes.
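To make these steps concrete, the following is a minimal sketch of what a server-side startup script could do for the disk and configuration steps above. It is not the actual click-to-deploy script: the device path, mount point, file locations, cluster name, seed addresses, and the use of sed to edit cassandra.yaml are all illustrative assumptions.

```bash
#!/bin/bash
# Illustrative sketch only; paths, names, and addresses are assumptions.
set -euo pipefail

DISK_DEV="/dev/disk/by-id/google-cassandra-data"  # attached by the coordinator
MOUNT_POINT="/mnt/cassandra"
CASSANDRA_YAML="/etc/cassandra/cassandra.yaml"
CLUSTER_NAME="cassandra-cluster"
SEEDS="10.240.0.2,10.240.0.3"                     # placeholder seed node IPs

# Format and mount the data disk.
mkfs.ext4 -F "${DISK_DEV}"
mkdir -p "${MOUNT_POINT}"
mount "${DISK_DEV}" "${MOUNT_POINT}"

# Create the data, commitlog, and saved_caches directories on the data disk
# (a freshly deployed node has no existing data to migrate).
service cassandra stop || true
for dir in data commitlog saved_caches; do
  mkdir -p "${MOUNT_POINT}/${dir}"
done
chown -R cassandra:cassandra "${MOUNT_POINT}"

# Point the default data directories at the mount point (assumes the default
# /var/lib/cassandra paths are present in cassandra.yaml).
sed -i "s|/var/lib/cassandra|${MOUNT_POINT}|g" "${CASSANDRA_YAML}"

# Set the cluster name and the seed list.
sed -i "s|^cluster_name:.*|cluster_name: '${CLUSTER_NAME}'|" "${CASSANDRA_YAML}"
sed -i "s|- seeds:.*|- seeds: \"${SEEDS}\"|" "${CASSANDRA_YAML}"

service cassandra start
```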
On deployment, the Cassandra instances will automatically join the cluster given the cluster name and the list of seed nodes set up above.
Deployment work flow
The following graphic shows the work flow for each of the servers. Most of the steps occur independently. The only step that needs to be coordinated is that the individual Cassandra servers wait for the deployment coordinator to create and attach the data disk.
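As one illustration of that coordination point, a server's startup script could simply poll until the disk attached by the coordinator becomes visible as a block device. This is a sketch under assumed names; the device path and timeout are not values taken from the actual templates.

```bash
#!/bin/bash
# Illustrative only: block until the data disk attached by the deployment
# coordinator appears. The device path used here is an assumption.
DISK_DEV="/dev/disk/by-id/google-cassandra-data"
TIMEOUT_SECS=600
elapsed=0

while [[ ! -b "${DISK_DEV}" ]]; do
  if (( elapsed >= TIMEOUT_SECS )); then
    echo "Timed out waiting for data disk ${DISK_DEV}" >&2
    exit 1
  fi
  sleep 5
  (( elapsed += 5 ))
done

echo "Data disk ${DISK_DEV} is attached; continuing with installation."
```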
Next steps
- Configure your virtual machine instances. Use gcutil ssh to connect to the external IP addresses for your machines and perform any system configuration that you might need on each server. You can look up your IP addresses by using the gcutil listinstances command, or you can find them in the Developers Console.
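For example, you might list your instances and then connect to one of the Cassandra servers; the instance name below is a placeholder:

```bash
# List the instances in your project to find names and external IP addresses.
gcutil listinstances

# Open an SSH session to one of the Cassandra server instances.
gcutil ssh cassandra-server-0
```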