Cassandra on Mesos with Docker

[Editor's Note] Distributed systems are difficult to understand, design, build, and manage. Apache Mesos can run a variety of distributed systems on a cluster of machines and share resources among them efficiently. Marathon is a Mesos framework for running long-lived services. This article describes how to run a Dockerized Cassandra cluster on Mesos and is recommended reading.

Aims

We wanted to run Docker containers on a Mesos cluster and deploy a Datastax Cassandra cluster as a long-running Marathon job. This is not just a pile of buzzwords; the rest of this article describes exactly how we did it.

Over the past few months, we have adopted Mesos as our cloud resource manager and have successfully deployed Mesos clusters. These clusters are integrated with Jenkins, the Mesos Jenkins scheduler, and our development environment to support CI/CD, and they use Marathon for long-running jobs. We also chose Docker as our container tool. We have now successfully deployed Cassandra, and we recommend Datastax Cassandra for high-throughput projects.

Scenario

We used a single Cassandra node with our Akka cluster and are now ready to move to a Cassandra cluster. Using Cassandra in a Mesos-based test environment is simple, because it only requires a single Cassandra node. Moving to the pre-production environment is more complicated, because a Cassandra cluster is required there. What we describe here is the transition from a single Cassandra node to a Cassandra cluster. The project used in this exercise is an open source fork of our own project, Lift. This article describes the details of the project's architecture.

Cassandra background

We started using and promoting the Cassandra database in 2011, as some of the first users of the Apache Cassandra NoSQL database, on version 0.8.x. We ran Cassandra on the AWS cloud from the start. Since then we have also successfully upgraded to newer versions. In 2015, we began to combine Mesosphere with Spotify's Cassandra Docker image.

Solution

Single-node Cassandra

Mesosphere provides good tooling, and the Spotify community contributed a Docker image of the official Datastax Cassandra release, so it is easy to use a single Cassandra node in development or simulation environments.

Here is a brief introduction to our system:

We use Puppet as our configuration management tool. An earlier article describes how we use Puppet.

From the Mesosphere tooling, we chose the Marathon framework to run the Cassandra node. Marathon supports long-running applications on Mesos. We are using the latest version; the details are recorded in this blog post.

Our setup is based on a Mesos cluster running the Marathon framework, with Mesos slaves that can run Docker containers. The code and configuration for building a single Cassandra node follow.

Puppet code that makes sure any Marathon job can be run:

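  # Declare one marathon_job resource per entry in the $jobs hash.
  # $defaults (defined elsewhere) supplies default parameters for each job.
  class profile::marathon_jobs ($jobs = {}) {
    create_resources('profile::resources::marathon_job', $jobs, $defaults)
  }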

We create a profile and a resource that writes the Marathon job hash to a JSON file and sends it to the Marathon framework. The simplified Puppet resource is as follows:

  define profile::resources::marathon_job (...) {
    # Write the job definition to a JSON file, then post that file to Marathon.
    file { ".../${title}.json": ... } ->
    exec { "post ${title}.json": ... }
  }
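
As a rough illustration of what this resource does, the following Python sketch writes a job hash to a JSON file and builds a POST request for it. It is not the Puppet implementation: the Marathon address `marathon.example:8080` and the function names are assumptions made for the example, while `/v2/apps` is Marathon's REST endpoint for creating applications.

```python
import json
import tempfile
import urllib.request
from pathlib import Path

# Hypothetical Marathon address, used only for illustration.
MARATHON_URL = "http://marathon.example:8080"


def write_job_file(directory, title, job):
    """Write the job hash to <directory>/<title>.json and return the path."""
    path = Path(directory) / f"{title}.json"
    path.write_text(json.dumps(job, indent=2, sort_keys=True))
    return path


def build_post_request(path, marathon_url=MARATHON_URL):
    """Build (but do not send) a POST of the JSON file to Marathon's /v2/apps."""
    return urllib.request.Request(
        url=f"{marathon_url}/v2/apps",
        data=path.read_bytes(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    job = {"id": "lift-cassandra", "instances": 1}
    with tempfile.TemporaryDirectory() as tmp:
        request = build_post_request(write_job_file(tmp, "lift_cassandra", job))
        print(request.get_method(), request.full_url)
        # urllib.request.urlopen(request) would actually submit the job.
```

Building the request separately from sending it keeps the sketch runnable without a Marathon instance.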

The single-node Cassandra job is then configured in Hiera as follows:

  profile::marathon_jobs::jobs:
    lift_cassandra:
      docker:
        image: 'spotify/cassandra'
        privileged: true
      id: 'lift-cassandra'
      instances: '1'
      cpus: '1'
      mem: '1000'
      constraints:
        -
          - "rack-id"
          - "CLUSTER"
          - "cassandra-single-rack-1"

We use Marathon job constraints to ensure that the single-node Cassandra container runs on Mesos slaves with the desired attribute: ["rack-id", "CLUSTER", "cassandra-single-rack-1"]. This is basic usage: the attribute acts as a topology label for the resource, which is useful because Mesos and Cassandra each have their own "cluster" concept.
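
For reference, here is a minimal Python sketch of the Marathon app definition that this Hiera job could translate into. The mapping from Hiera keys to Marathon JSON fields is an assumption, since the Puppet template is not shown; the field names themselves (`container.docker.image`, `privileged`, `constraints`) come from Marathon's app JSON format.

```python
import json


def single_node_app():
    """Marathon app definition mirroring the single-node Hiera job."""
    return {
        "id": "lift-cassandra",
        "instances": 1,
        "cpus": 1,
        "mem": 1000,
        "container": {
            "type": "DOCKER",
            "docker": {"image": "spotify/cassandra", "privileged": True},
        },
        # Each constraint is [attribute, operator, value]. CLUSTER places all
        # tasks on slaves whose "rack-id" attribute equals the given value.
        "constraints": [["rack-id", "CLUSTER", "cassandra-single-rack-1"]],
    }


if __name__ == "__main__":
    print(json.dumps(single_node_app(), indent=2))
```

For the constraint to match, the Mesos slave has to be started with a matching `rack-id` attribute.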

We also want the Mesos slave that runs the single Cassandra node to have a well-known name, so that applications can find it. To this end we register it in Route 53 with the following Hiera configuration:

  profile::aws_manager::route53_register::records:
    cassandra: {}

That is all the configuration needed; the Datastax Cassandra node is now available to the application layer.

Extending to a cluster

So far everything has been simple, but this is not enough to deploy a Cassandra cluster on Mesos.

To build the required four-node cluster, we make sure that Mesos has the necessary resources available to Marathon. This requires four Docker-capable Mesos slaves registered under well-known names (for example, s1.cassandra.eigengo.io).
In this exercise we use the Spotify cassandra:cluster image with virtual nodes, so the tokens are generated automatically.

The Hiera configuration for the Marathon job is as follows:

  profile::marathon_jobs::jobs:
    lift_cassandra_cluster:
      docker:
        image: 'cakesolutions/cassandra:cluster'
        privileged: true
        parameters:
          - key: env
            value: CASSANDRA_CLUSTERNAME=eigengo
          - key: env
            value: CASSANDRA_SEEDS=s1.cassandra.eigengo.io,s2.cassandra.eigengo.io,s3.cassandra.eigengo.io,s4.cassandra.eigengo.io
      id: 'lift-cassandra'
      instances: '4'
      cpus: '4'
      mem: '8000'
      volumes:
        - hostPath: "%{hiera('cassandra::raid_volume')}"
          containerPath: "/var/lib/cassandra"
          mode: "RW"
      constraints:
        -
          - "rack-id"
          - "CLUSTER"
          - "cassandra-single-rack-1"
        -
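
The cluster job above can be sketched as a Marathon app definition in Python. This is an illustration under assumptions: the mapping from Hiera keys to Marathon JSON is inferred, the host path `/mnt/raid0` is a hypothetical stand-in for the value of the `cassandra::raid_volume` Hiera lookup, and the seed names simply follow the `sN.cassandra.eigengo.io` pattern used in the configuration.

```python
import json


def seed_hosts(count, domain="cassandra.eigengo.io"):
    """Return the well-known seed names s1..s<count> under the given domain."""
    return [f"s{i}.{domain}" for i in range(1, count + 1)]


def cluster_app(instances=4, raid_volume="/mnt/raid0"):
    """Marathon app definition mirroring the cluster Hiera job.

    raid_volume is a hypothetical host path standing in for the Hiera lookup.
    """
    seeds = ",".join(seed_hosts(instances))
    return {
        "id": "lift-cassandra",
        "instances": instances,
        "cpus": 4,
        "mem": 8000,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": "cakesolutions/cassandra:cluster",
                "privileged": True,
                # Each entry is passed to Docker as an --env option.
                "parameters": [
                    {"key": "env", "value": "CASSANDRA_CLUSTERNAME=eigengo"},
                    {"key": "env", "value": f"CASSANDRA_SEEDS={seeds}"},
                ],
            },
            # Persist Cassandra data on the host volume.
            "volumes": [
                {
                    "hostPath": raid_volume,
                    "containerPath": "/var/lib/cassandra",
                    "mode": "RW",
                }
            ],
        },
        "constraints": [["rack-id", "CLUSTER", "cassandra-single-rack-1"]],
    }


if __name__ == "__main__":
    print(json.dumps(cluster_app(), indent=2))
```

Generating the seed list from the instance count keeps the environment variable and the number of registered slaves in step.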

The key configuration item is privileged mode. Enabling privileged mode allows the ports used by the Cassandra process (the Thrift client, CQL native transport, and internode communication ports) to be bound on the Docker host that runs the Cassandra container. An alternative to privileged mode is host networking (HOST mode). Some may object that this limits running multiple Cassandra nodes on one Docker host. The limitation is only apparent, because a Cassandra cluster in a pre-production environment already has to meet certain resource requirements, storage requirements in particular. We believe that running multiple Cassandra Docker containers on the same Docker host and sharing the machine's resources is a good idea, but it has to be checked against Cassandra's production best practices.
We also forked the Spotify docker-cassandra repository and reverted one change to the default Cassandra configuration. In short, we re-enabled virtual nodes, so we rely on Cassandra to generate the tokens itself. The alternative, the spotify/cassandra:cluster image, works with explicit tokens. That would require one Marathon job per Cassandra node, each with a predefined token assigned to it, which seems like overkill.
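
The host-networking alternative mentioned above can be sketched as a small transformation of a Marathon app definition. This is an assumption-laden illustration, not the configuration we run: the helper name is hypothetical, and it relies on the `network` field of Marathon's Docker container settings accepting the value `HOST`.

```python
import copy


def with_host_network(app):
    """Return a copy of a Marathon app that uses host networking
    instead of privileged mode. The input app is left unchanged."""
    updated = copy.deepcopy(app)
    docker = updated["container"]["docker"]
    docker["network"] = "HOST"      # share the host's network stack
    docker.pop("privileged", None)  # drop the privileged flag if present
    return updated


if __name__ == "__main__":
    app = {
        "id": "lift-cassandra",
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": "cakesolutions/cassandra:cluster",
                "privileged": True,
            },
        },
    }
    print(with_host_network(app)["container"]["docker"])
```

With host networking, only one container per host can bind a given Cassandra port, which matches the one-node-per-slave layout used here.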

Following the recommended settings for Cassandra in production, we assume that such a Cassandra container runs on a host with a prepared RAID 0 volume and sufficient memory. We consider the m3.xlarge AWS instance type for production purposes, but you can also try a smaller cluster to validate your idea.

Cluster diagram

[Figure: Cassandra_Cluster_Mesos_Cluster_-_New_Page.png]
The figure above illustrates the cluster creation process: the Marathon Mesos scheduler acts as the trigger, Mesos acts as the resource manager, and Docker is the containerization option for Mesos resources.

Other solutions

Mesosphere supports running a dedicated framework called Cassandra Mesos; there is a good hands-on article on how to build an elastic setup with it.
The Cassandra Mesos framework is currently being rewritten, so it is worth keeping an eye on.
There is also an article by tuplejump.com that describes a distinctive way to manage resources for a Cassandra ring, Spark, and Akka on a Mesos cluster.

Conclusions and considerations

Security considerations

The security of the Cassandra cluster is also worth considering. When modeling Cassandra on a Mesos cluster, take security into account from the start so that security issues are detected early. Depending on the requirements, multiple levels of security can be implemented.

Advantages of this approach for Cassandra

You can add an additional data center at the push of a button by submitting a Marathon job configured for the new data center. A concrete example is adding a peered VPC in another region to the existing infrastructure. Adding a Cassandra data center in the new VPC requires the following steps:

  • Deploy the Mesos infrastructure.
  • Adjust the Marathon job constraints of the running Cassandra cluster, and adjust the Docker environment variables to refer to the new region (possibly a peered VPC).
  • Prepare a new Marathon job and deploy a new Cassandra cluster in the new VPC.
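
The last two steps can be sketched in Python as deriving a new Marathon job from an existing one. This is only an illustration: the helper name, the new job id, the rack id, and the seed host names in the example are all hypothetical, and which seeds a new data center should list depends on your Cassandra topology.

```python
import copy


def derive_datacenter_job(app, app_id, rack_id, seeds):
    """Copy an existing Marathon app and retarget it at a new data center.

    Changes the id, the rack-id constraint, and the CASSANDRA_SEEDS value.
    The input app is left unchanged.
    """
    new_app = copy.deepcopy(app)
    new_app["id"] = app_id
    new_app["constraints"] = [["rack-id", "CLUSTER", rack_id]]
    for param in new_app["container"]["docker"]["parameters"]:
        if param["key"] == "env" and param["value"].startswith("CASSANDRA_SEEDS="):
            param["value"] = "CASSANDRA_SEEDS=" + ",".join(seeds)
    return new_app


if __name__ == "__main__":
    existing = {
        "id": "lift-cassandra",
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": "cakesolutions/cassandra:cluster",
                "parameters": [
                    {"key": "env", "value": "CASSANDRA_CLUSTERNAME=eigengo"},
                    {"key": "env", "value": "CASSANDRA_SEEDS=s1.cassandra.eigengo.io"},
                ],
            },
        },
        "constraints": [["rack-id", "CLUSTER", "cassandra-single-rack-1"]],
    }
    # All values below are hypothetical examples for a second data center.
    new_job = derive_datacenter_job(
        existing,
        app_id="lift-cassandra-dc2",
        rack_id="cassandra-dc2-rack-1",
        seeds=["s1.dc2.cassandra.example", "s2.dc2.cassandra.example"],
    )
    print(new_job["id"], new_job["constraints"])
```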

Reference materials

* http://www.datastax.com/blog
* http://www.datastax.com/docume … .html
* https://github.com/spotify/docker-cassandra
* https://www.youtube.com/watch?v=efYIRKs63T4
* https://github.com/mesosphere/cassandra-mesos
* https://github.com/cakesolutions/docker-cassandra

Original link: Cassandra on Mesos with Docker (translation: Cui Jingwen; proofreading: Li Yingjie)
===========================
About the translator: Cui Jingwen is a senior software engineer at VMware, responsible for quality assurance of desktop virtualization products. Cui previously worked for years on IBM WebSphere Business Process Management software and has a strong interest in virtualization and middleware technology.
