How to Set Up Chaos Engineering in your Continuous Delivery pipeline with Gremlin and Jenkins

Many operations teams today leverage Continuous Deployment (CD) pipelines to provide a repeatable automated sequence of steps in building and testing new software. This enables a consistent ability to stand up an environment, perform validations, and optionally tear down the environment to revert to a clean slate in a repeatable way. Teams will often add automated testing tools to perform functional tests, load tests, integration tests, and other types of tests to validate the quality of the product before and after pushing to production.

With Chaos Engineering, we can add reliability testing to our suite of automated tests. Running chaos experiments in our CI/CD pipeline helps verify that code changes are reliable before they reach customers. By using "automated chaos" to test for reliability during the deployment process, we can detect operational issues early and reduce the risk of outages in production.

In this tutorial, we'll create stages in a Jenkins pipeline to inject a controlled amount of failure into a test system using Gremlin. You'll learn how to deploy a Jenkins instance using Docker, create API keys in Gremlin, and use the Gremlin API to start an attack.

Prerequisites

Before you begin this tutorial, you’ll need the following:

  • Docker, for easily deploying Jenkins from a container image.
  • Gremlin deployed on the host where you want to run your chaos experiment. This can be the same host as Jenkins, but ideally it should be a host that you deploy your application to for testing.
  • A Gremlin account.

Step 1 - Get Jenkins Up and Running

In this step, you’ll stand up an instance of Jenkins using the official Docker image. If you already have a Jenkins environment, skip to Step 2 - Retrieve and Add a Gremlin API Key to Jenkins.

At the command line, enter the following to initialize a Jenkins instance using Docker.

bash
docker run --publish 8080:8080 --publish 50000:50000 --name jenkins jenkins/jenkins:lts-alpine

Navigate to http://localhost:8080 in your browser to confirm Jenkins is working. If this is your first time setting up Jenkins, the setup wizard will ask for the initial admin password (Jenkins prints it in the container's console output) and your choice of plugins. For this tutorial, the defaults will work fine. Then, create an admin user and log in with that account.
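If you would rather keep Jenkins running in the background and keep its configuration between container restarts, the following is a minimal sketch of one way to do it. The function names and the `jenkins_home` volume name are assumptions made for illustration; the image, ports, and container name come from the command above.

```bash
#!/bin/sh
# Hypothetical helpers (not part of the original tutorial).
# The container name defaults to "jenkins", matching the command above.

# Start Jenkins detached, with a named volume so that plugins, credentials,
# and jobs survive removal of the container.
start_jenkins() {
    docker run --detach \
        --publish 8080:8080 --publish 50000:50000 \
        --volume jenkins_home:/var/jenkins_home \
        --name "${JENKINS_CONTAINER:-jenkins}" \
        jenkins/jenkins:lts-alpine
}

# Print the one-time password that the Jenkins setup wizard asks for.
show_initial_admin_password() {
    docker exec "${JENKINS_CONTAINER:-jenkins}" \
        cat /var/jenkins_home/secrets/initialAdminPassword
}
```

Call `start_jenkins` once, wait for http://localhost:8080 to respond, then call `show_initial_admin_password` and paste the result into the setup wizard.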

Step 2 - Retrieve and Add a Gremlin API Key to Jenkins

In this step, you’ll enter your Gremlin API key and team ID into the Jenkins instance. Your Gremlin API key is tied to your Gremlin user account, and allows Jenkins to authenticate with Gremlin without requiring your username or password. Your team ID is associated with your Gremlin team and allows Jenkins to run attacks, target hosts, and perform other actions within your Gremlin team.

To get your team ID, log into the Gremlin web app. Click on the user icon in the top-right, then click Team Settings. Click the Configuration tab to see your Team ID.

Copy your team ID or keep this window open, as you'll need it in the next step.

Next, we'll create an API key. Click on the user icon in the top-right, then click Account Settings. Click on the API Keys tab, then click New API Key. Enter a name for the key (e.g. "Jenkins") and optionally a description, then click Save. Copy the key from the modal window that appears (you can still access the key after closing the modal window).

Now that we have our team ID and API key, let's enter them into Jenkins. We'll add these to Jenkins as credentials. Open the following in your browser:

http://localhost:8080/credentials/store/system/domain/_/newCredentials

Alternatively, open the Jenkins dashboard and navigate to Manage Jenkins > Manage Credentials > (global). Click Add Credentials. Set the Kind to Secret text and the Scope to Global. Paste your Gremlin API key in the Secret field, and enter gremlin-api-key as the ID. Click OK to save.

Repeat this step for your team ID. Select Secret text, paste your ID into the Secret field, then enter gremlin-team-id into the ID field. Click OK to save. Your global credentials list should now show both entries: gremlin-api-key and gremlin-team-id.

Step 3 - Create your Chaos Deployment Pipeline

In this step, we'll create a Jenkins pipeline. This pipeline will run a CPU attack, which consumes CPU capacity on our target host for a set amount of time. The target of the attack is the host where we installed Gremlin before starting the tutorial.

In a typical CI/CD pipeline, our pipeline code might contain steps for provisioning a test environment, deploying an application, deploying the Gremlin agent to that environment, then running the attack. For the purposes of this tutorial, we'll skip the first three steps and just show how to run the attack using the Gremlin API.

On the Jenkins home screen, click New Item. Enter a name such as "Chaos Pipeline", select Pipeline, then click OK. Scroll down to the Pipeline section, then enter the following code:

groovy
pipeline {
    agent none
    environment {
        ATTACK_ID = ''
        // Jenkins credentials created in Step 2
        GREMLIN_API_KEY = credentials('gremlin-api-key')
        GREMLIN_TEAM_ID = credentials('gremlin-team-id')
    }
    parameters {
        string(name: 'TARGET_IDENTIFIER', defaultValue: 'gremlin-demo-lab-host', description: 'Host to target')
        string(name: 'CPU_LENGTH', defaultValue: '30', description: 'Duration of CPU attack')
        string(name: 'CPU_CORE', defaultValue: '1', description: 'Number of cores to impact')
        string(name: 'CPU_CAPACITY', defaultValue: '100', description: 'The percentage of total CPU capacity to consume')
    }
    stages {
        stage('Initialize test environment') {
            steps {
                echo "[Add commands to create a test environment.]"
            }
        }
        stage('Install application to test environment') {
            steps {
                echo "[Add commands to deploy your application to your test environment.]"
            }
        }
        stage('Run chaos experiment') {
            agent any
            steps {
                script {
                    // Call the Gremlin API and capture its response (the attack ID on success)
                    ATTACK_ID = sh (
                        script: "curl -s -H 'Content-Type: application/json;charset=utf-8' -H 'Authorization: Key ${GREMLIN_API_KEY}' https://api.gremlin.com/v1/attacks/new?teamId=${GREMLIN_TEAM_ID} --data '{ \"command\": { \"type\": \"cpu\", \"args\": [\"-c\", \"$CPU_CORE\", \"-l\", \"$CPU_LENGTH\", \"-p\", \"$CPU_CAPACITY\"] },\"target\": { \"type\": \"Exact\", \"hosts\" : { \"ids\": [\"$TARGET_IDENTIFIER\"] } } }' --compressed",
                        returnStdout: true
                    ).trim()
                    echo "View your experiment at https://app.gremlin.com/attacks/${ATTACK_ID}"
                }
            }
        }
    }
}

Let's take a closer look at this script.

First, in the environment section, we retrieve our credentials (our Gremlin API key and team ID). Under parameters, we define the parameters of the attack. TARGET_IDENTIFIER is the name of the host we want to target as it appears in Gremlin (for example, here we use gremlin-demo-lab-host). You can find your list of hosts in the Gremlin web app by clicking on Clients > Hosts.

Next is the stages section. The first two stages are where we would add steps to provision and set up our test environment. The third stage, "Run chaos experiment," is where we call the Gremlin API to start the attack. Note the script field, which contains the complete call to the Gremlin API. You can replace this field with any Gremlin API call of your choice, whether it's calling a different type of attack, running a Scenario, attacking a Kubernetes resource, or attacking a Service. You can learn more about creating API calls in our getting started tutorial.

For now, replace the default value of TARGET_IDENTIFIER with the name of the host you want to run the attack on. Optionally, change the parameters of the CPU attack by changing the CPU_LENGTH, CPU_CORE, and CPU_CAPACITY parameters. CPU_LENGTH is how long the attack will run (in seconds), CPU_CORE is the number of CPU cores impacted, and CPU_CAPACITY is the percentage of total CPU capacity to consume.
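To make the mapping from parameters to API request easier to see, here is a hedged sketch of the same call as a standalone shell script. The function names `build_cpu_attack_payload` and `start_cpu_attack` are hypothetical; the URL, headers, and JSON structure are copied from the pipeline above. It reads the API key and team ID from environment variables inside the shell rather than interpolating them into the command string.

```bash
#!/bin/sh
# Hypothetical helpers mirroring the curl call in the pipeline above.

# Build the JSON body for a CPU attack.
# Arguments: cores, length in seconds, capacity in percent, target host name.
# Note: values are inserted as-is, so they must not contain quotes or backslashes.
build_cpu_attack_payload() {
    printf '{"command":{"type":"cpu","args":["-c","%s","-l","%s","-p","%s"]},"target":{"type":"Exact","hosts":{"ids":["%s"]}}}' \
        "$1" "$2" "$3" "$4"
}

# Send the request. Takes the same four arguments as the payload builder and
# requires GREMLIN_API_KEY and GREMLIN_TEAM_ID in the environment; the script
# aborts with a message if either one is unset or empty.
start_cpu_attack() {
    : "${GREMLIN_API_KEY:?GREMLIN_API_KEY is not set}"
    : "${GREMLIN_TEAM_ID:?GREMLIN_TEAM_ID is not set}"
    curl -s \
        -H 'Content-Type: application/json;charset=utf-8' \
        -H "Authorization: Key ${GREMLIN_API_KEY}" \
        "https://api.gremlin.com/v1/attacks/new?teamId=${GREMLIN_TEAM_ID}" \
        --data "$(build_cpu_attack_payload "$@")" \
        --compressed
}
```

With the default parameters, the equivalent call would be `start_cpu_attack 1 30 100 gremlin-demo-lab-host`. Because Jenkins exposes the bound credentials to the shell as environment variables, a script like this can be invoked from an `sh` step written as a single-quoted Groovy string, which keeps the secret out of Groovy string interpolation.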

Next, run the demo script by selecting “Build with Parameters”, then “Build”. Jenkins will quickly run through the first two stages, then call the Gremlin API and start the attack. The Stage View will then show each of the three stages.

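If we open the console output by clicking on the build number and selecting Console Output, we'll see output similar to the following. Note that in this sample run the Gremlin API responded with an error message (User requires privilege for target team: TEAM_DEFAULT) rather than an attack ID, and Jenkins still reported SUCCESS because the pipeline does not check the response. A successful run prints the attack ID at the end of the "View your experiment" URL instead.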

Started by user Admin
Running in Durability level: MAX_SURVIVABILITY
[Pipeline] Start of Pipeline
[Pipeline] withCredentials
Masking supported pattern matches of $GREMLIN_API_KEY
[Pipeline] {
[Pipeline] withEnv
[Pipeline] {
[Pipeline] stage
[Pipeline] { (Initialize test environment)
[Pipeline] echo
[Add commands to create a test environment.]
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Install application to test environment)
[Pipeline] echo
[Add commands to deploy your application to your test environment.]
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Run chaos experiment)
[Pipeline] node
Running on Jenkins in /var/jenkins_home/workspace/Chaos Pipeline
[Pipeline] {
[Pipeline] script
[Pipeline] {
[Pipeline] sh
Warning: A secret was passed to "sh" using Groovy String interpolation, which is insecure.
 Affected argument(s) used the following variable(s): [GREMLIN_API_KEY]
 See https://jenkins.io/redirect/groovy-string-interpolation for details.
+ curl -s -H 'Content-Type: application/json' -H 'Authorization: Key ****' https://api.gremlin.com/v1/attacks/new --data '{ "command": { "type": "cpu", "args": ["-c", "1", "-l", "30", "-p", "100"] },"target": { "type": "Exact", "hosts" : { "ids": ["gremlin-demo-lab-host"] } } }' --compressed
[Pipeline] echo
View your experiment at https://app.gremlin.com/attacks/User requires privilege for target team: TEAM_DEFAULT
[Pipeline] }
[Pipeline] // script
[Pipeline] }
[Pipeline] // node
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // withCredentials
[Pipeline] End of Pipeline
Finished: SUCCESS

Congratulations! You've now integrated chaos experiments into your CI/CD pipeline!
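One thing the pipeline does not do is check what the API returned: in the sample console output, the text after "attacks/" is an error message rather than an ID, yet the build finishes with SUCCESS. The sketch below shows one possible guard. It assumes that attack IDs have the usual 8-4-4-4-12 hexadecimal UUID shape, which you should confirm against an ID from your own account; both function names are hypothetical.

```bash
#!/bin/sh
# Hypothetical response check. ASSUMPTION: attack IDs are UUID-shaped.

# Succeed only if the argument looks like a UUID.
is_attack_id() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Print the experiment URL for a valid ID. Otherwise, print the API response
# to stderr and return non-zero so that the calling build step fails.
require_attack_id() {
    if is_attack_id "$1"; then
        printf 'View your experiment at https://app.gremlin.com/attacks/%s\n' "$1"
    else
        printf 'Gremlin API did not return an attack ID: %s\n' "$1" >&2
        return 1
    fi
}
```

Passing the captured API response to `require_attack_id` in an `sh` step would make the stage fail when the response is an error such as a missing team privilege, instead of reporting success.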

Conclusion

This tutorial is just the first step to effectively using Chaos Engineering in your CI/CD pipeline. Expand your practice further by running a Scenario instead of an attack, adding a check to verify the completion of the experiment, or using Status Checks to automatically halt an experiment if your systems become unstable. If you have automated load, integration, or functional tests, run them alongside your chaos experiment to make sure your systems can operate reliably under stress. You can apply these same principles to other automated build and deployment tools such as Spinnaker, GitLab, or CircleCI.
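As a rough illustration of running tests alongside an experiment, the following generic shell sketch starts one command in the background while another runs in the foreground. The function name `run_alongside` and both command strings are placeholders, not Gremlin or Jenkins features.

```bash
#!/bin/sh
# Hypothetical helper: run a chaos command and a test command at the same time.
# Arguments: the chaos command and the test command, each as a single string.
# Returns the exit status of the test command; the chaos command's own exit
# status is ignored.
run_alongside() {
    chaos_cmd=$1
    test_cmd=$2

    # Start the chaos experiment in the background.
    sh -c "$chaos_cmd" &
    chaos_pid=$!

    # Run the tests while the experiment is in progress and record the result.
    sh -c "$test_cmd" && test_status=0 || test_status=$?

    # Wait for the background command so that no process is left behind.
    wait "$chaos_pid" || true

    return "$test_status"
}
```

For example, `run_alongside './start-attack.sh' './run-load-test.sh'` (both script names are placeholders) would fail the build step only if the load test fails while the attack is running.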

For more on Gremlin and CI/CD, check out our webinar: Automating Chaos Engineering in your CI/CD Environments.
