Mirror of https://github.com/didi/KnowStreaming.git (synced 2025-12-24 20:22:12 +08:00)

Commit: Add km module kafka
tests/.gitignore (vendored, new file, 7 lines)
@@ -0,0 +1,7 @@
Vagrantfile.local
.idea/
*.pyc
*.ipynb
.DS_Store
.ducktape
results/
tests/MANIFEST.in (new file, 16 lines)
@@ -0,0 +1,16 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

recursive-include kafkatest */templates/*
tests/README.md (new file, 548 lines)
@@ -0,0 +1,548 @@
|
||||
System Integration & Performance Testing
|
||||
========================================
|
||||
|
||||
This directory contains Kafka system integration and performance tests.
|
||||
[ducktape](https://github.com/confluentinc/ducktape) is used to run the tests.
|
||||
(ducktape is a distributed testing framework which provides a test runner,
a result reporter, and utilities to pull up and tear down services.)
|
||||
|
||||
Running tests using docker
|
||||
--------------------------
|
||||
Docker containers can be used for running kafka system tests locally.
|
||||
* Requirements
|
||||
- Docker 1.12.3 (or higher) is installed and running on the machine.
|
||||
- Tests require that Kafka, including the system test libs, is built. This can be done by running `./gradlew clean systemTestLibs`
|
||||
* Run all tests
|
||||
```
|
||||
bash tests/docker/run_tests.sh
|
||||
```
|
||||
* Run all tests with debug on (warning: this will produce a lot of logs)
|
||||
```
|
||||
_DUCKTAPE_OPTIONS="--debug" bash tests/docker/run_tests.sh | tee debug_logs.txt
|
||||
```
|
||||
* Run a subset of tests
|
||||
```
|
||||
TC_PATHS="tests/kafkatest/tests/streams tests/kafkatest/tests/tools" bash tests/docker/run_tests.sh
|
||||
```
|
||||
* Run a specific tests file
|
||||
```
|
||||
TC_PATHS="tests/kafkatest/tests/client/pluggable_test.py" bash tests/docker/run_tests.sh
|
||||
```
|
||||
* Run a specific test class
|
||||
```
|
||||
TC_PATHS="tests/kafkatest/tests/client/pluggable_test.py::PluggableConsumerTest" bash tests/docker/run_tests.sh
|
||||
```
|
||||
* Run a specific test method
|
||||
```
|
||||
TC_PATHS="tests/kafkatest/tests/client/pluggable_test.py::PluggableConsumerTest.test_start_stop" bash tests/docker/run_tests.sh
|
||||
```
|
||||
* Run tests with a different JVM
|
||||
```
|
||||
bash tests/docker/ducker-ak up -j 'openjdk:11'; tests/docker/run_tests.sh
|
||||
```
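* The options above can be combined; for example, an illustrative run of a single test file with debug logging, using the same environment variables shown above
```
_DUCKTAPE_OPTIONS="--debug" TC_PATHS="tests/kafkatest/tests/client/pluggable_test.py" bash tests/docker/run_tests.sh | tee debug_logs.txt
```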
|
||||
|
||||
* Notes
|
||||
- The script that runs the tests creates and destroys a docker network named *knw*.
|
||||
This network can't be used for any other purpose.
|
||||
- The docker containers are named knode01, knode02 etc.
|
||||
These nodes can't be used for any other purpose.
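    - To see what a test run has created on your machine, the standard docker commands work; for example (an illustrative sketch using the names above):
```
docker network ls | grep knw
docker ps --filter name=knode
```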
|
||||
|
||||
* Exposing ports using --expose-ports option of `ducker-ak up` command
|
||||
|
||||
If `--expose-ports` is specified then we will expose those ports to random ephemeral ports
|
||||
on the host. The argument can be a single port (like 5005), a port range (like 5005-5009)
|
||||
or a combination of port/port-range separated by comma (like 2181,9092 or 2181,5005-5008).
|
||||
By default no port is exposed.
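  For example, a run exposing the ports from the last combination above would be started with:

  > $ bash tests/docker/ducker-ak up --expose-ports 2181,5005-5008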
|
||||
|
||||
The exposed port mapping can be seen by executing the `docker ps` command. The PORTS column
of the output shows the mapping like this (maps port 33891 on the host to port 2182 in the container):
|
||||
|
||||
0.0.0.0:33891->2182/tcp
|
||||
|
||||
Behind the scenes, Docker sets up a DNAT rule for the mapping, which is visible in
the DOCKER section of the iptables output (`sudo iptables -t nat -L -n`), something like:
|
||||
|
||||
<pre>DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:33882 to:172.22.0.2:9092</pre>
|
||||
|
||||
The exposed port(s) are useful for attaching a remote debugger to the process running
in the docker image. For example, if port 5005 was exposed and is mapped to an ephemeral
port (say 33891), then a debugger attaching to port 33891 on the host will connect to
a debug session started at port 5005 in the docker image. As an example, for the above port
numbers, run the following commands in the docker image (say by ssh'ing in using `./docker/ducker-ak ssh ducker02`):
|
||||
|
||||
> $ export KAFKA_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005"
|
||||
|
||||
> $ /opt/kafka-dev/bin/kafka-topics.sh --bootstrap-server ducker03:9095 --topic __consumer_offsets --describe
|
||||
|
||||
This will run the TopicCommand to describe the __consumer_offsets topic. The Java process
will stop and wait for a debugger to attach, since the `suspend=y` option was specified. Now starting
a debugger on the host against `localhost` with the following parameter as the JVM setting:
|
||||
|
||||
`-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=33891`
|
||||
|
||||
will attach it to the TopicCommand process running in the docker image.
|
||||
|
||||
Examining CI run
|
||||
----------------
|
||||
* Set BUILD_ID to the Travis CI build id. E.g. the build id is 169519874 for the following build:
|
||||
```bash
|
||||
https://travis-ci.org/apache/kafka/builds/169519874
|
||||
```
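  For this example, that means setting in the shell:
```bash
BUILD_ID=169519874
```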
|
||||
|
||||
* Getting the number of tests that were actually run
|
||||
```bash
|
||||
for id in $(curl -sSL https://api.travis-ci.org/builds/$BUILD_ID | jq '.matrix|map(.id)|.[]'); do curl -sSL "https://api.travis-ci.org/jobs/$id/log.txt?deansi=true" ; done | grep -cE 'RunnerClient: Loading test'
|
||||
```
|
||||
|
||||
* Getting the number of tests that passed
|
||||
```bash
|
||||
for id in $(curl -sSL https://api.travis-ci.org/builds/$BUILD_ID | jq '.matrix|map(.id)|.[]'); do curl -sSL "https://api.travis-ci.org/jobs/$id/log.txt?deansi=true" ; done | grep -cE 'RunnerClient.*PASS'
|
||||
```
|
||||
* Getting all the logs produced from a run
|
||||
```bash
|
||||
for id in $(curl -sSL https://api.travis-ci.org/builds/$BUILD_ID | jq '.matrix|map(.id)|.[]'); do curl -sSL "https://api.travis-ci.org/jobs/$id/log.txt?deansi=true" ; done
|
||||
```
|
||||
* Explanation of curl calls to travis-ci & jq commands
|
||||
- We get the JSON information about the build using the following command
|
||||
```bash
|
||||
curl -sSL https://api.travis-ci.org/apache/kafka/builds/169519874
|
||||
```
|
||||
This produces JSON about the build, which looks like:
|
||||
```json
|
||||
{
|
||||
"id": 169519874,
|
||||
"repository_id": 6097916,
|
||||
"number": "19",
|
||||
"config": {
|
||||
"sudo": "required",
|
||||
"dist": "trusty",
|
||||
"language": "java",
|
||||
"env": [
|
||||
"TC_PATHS=\"tests/kafkatest/tests/client\"",
|
||||
"TC_PATHS=\"tests/kafkatest/tests/connect tests/kafkatest/tests/streams tests/kafkatest/tests/tools\"",
|
||||
"TC_PATHS=\"tests/kafkatest/tests/mirror_maker\"",
|
||||
"TC_PATHS=\"tests/kafkatest/tests/replication\"",
|
||||
"TC_PATHS=\"tests/kafkatest/tests/upgrade\"",
|
||||
"TC_PATHS=\"tests/kafkatest/tests/security\"",
|
||||
"TC_PATHS=\"tests/kafkatest/tests/core\""
|
||||
],
|
||||
"jdk": [
|
||||
"oraclejdk8"
|
||||
],
|
||||
"before_install": null,
|
||||
"script": [
|
||||
"./gradlew systemTestLibs && /bin/bash ./tests/travis/run_tests.sh"
|
||||
],
|
||||
"services": [
|
||||
"docker"
|
||||
],
|
||||
"before_cache": [
|
||||
"rm -f $HOME/.gradle/caches/modules-2/modules-2.lock",
|
||||
"rm -fr $HOME/.gradle/caches/*/plugin-resolution/"
|
||||
],
|
||||
"cache": {
|
||||
"directories": [
|
||||
"$HOME/.m2/repository",
|
||||
"$HOME/.gradle/caches/",
|
||||
"$HOME/.gradle/wrapper/"
|
||||
]
|
||||
},
|
||||
".result": "configured",
|
||||
"group": "stable"
|
||||
},
|
||||
"state": "finished",
|
||||
"result": null,
|
||||
"status": null,
|
||||
"started_at": "2016-10-21T13:35:43Z",
|
||||
"finished_at": "2016-10-21T14:46:03Z",
|
||||
"duration": 16514,
|
||||
"commit": "7e583d9ea08c70dbbe35a3adde72ed203a797f64",
|
||||
"branch": "trunk",
|
||||
"message": "respect _DUCK_OPTIONS",
|
||||
"committed_at": "2016-10-21T00:12:36Z",
|
||||
"author_name": "Raghav Kumar Gautam",
|
||||
"author_email": "raghav@apache.org",
|
||||
"committer_name": "Raghav Kumar Gautam",
|
||||
"committer_email": "raghav@apache.org",
|
||||
"compare_url": "https://github.com/raghavgautam/kafka/compare/cc788ac99ca7...7e583d9ea08c",
|
||||
"event_type": "push",
|
||||
"matrix": [
|
||||
{
|
||||
"id": 169519875,
|
||||
"repository_id": 6097916,
|
||||
"number": "19.1",
|
||||
"config": {
|
||||
"sudo": "required",
|
||||
"dist": "trusty",
|
||||
"language": "java",
|
||||
"env": "TC_PATHS=\"tests/kafkatest/tests/client\"",
|
||||
"jdk": "oraclejdk8",
|
||||
"before_install": null,
|
||||
"script": [
|
||||
"./gradlew systemTestLibs && /bin/bash ./tests/travis/run_tests.sh"
|
||||
],
|
||||
"services": [
|
||||
"docker"
|
||||
],
|
||||
"before_cache": [
|
||||
"rm -f $HOME/.gradle/caches/modules-2/modules-2.lock",
|
||||
"rm -fr $HOME/.gradle/caches/*/plugin-resolution/"
|
||||
],
|
||||
"cache": {
|
||||
"directories": [
|
||||
"$HOME/.m2/repository",
|
||||
"$HOME/.gradle/caches/",
|
||||
"$HOME/.gradle/wrapper/"
|
||||
]
|
||||
},
|
||||
".result": "configured",
|
||||
"group": "stable",
|
||||
"os": "linux"
|
||||
},
|
||||
"result": null,
|
||||
"started_at": "2016-10-21T13:35:43Z",
|
||||
"finished_at": "2016-10-21T14:24:50Z",
|
||||
"allow_failure": false
|
||||
},
|
||||
{
|
||||
"id": 169519876,
|
||||
"repository_id": 6097916,
|
||||
"number": "19.2",
|
||||
"config": {
|
||||
"sudo": "required",
|
||||
"dist": "trusty",
|
||||
"language": "java",
|
||||
"env": "TC_PATHS=\"tests/kafkatest/tests/connect tests/kafkatest/tests/streams tests/kafkatest/tests/tools\"",
|
||||
"jdk": "oraclejdk8",
|
||||
"before_install": null,
|
||||
"script": [
|
||||
"./gradlew systemTestLibs && /bin/bash ./tests/travis/run_tests.sh"
|
||||
],
|
||||
"services": [
|
||||
"docker"
|
||||
],
|
||||
"before_cache": [
|
||||
"rm -f $HOME/.gradle/caches/modules-2/modules-2.lock",
|
||||
"rm -fr $HOME/.gradle/caches/*/plugin-resolution/"
|
||||
],
|
||||
"cache": {
|
||||
"directories": [
|
||||
"$HOME/.m2/repository",
|
||||
"$HOME/.gradle/caches/",
|
||||
"$HOME/.gradle/wrapper/"
|
||||
]
|
||||
},
|
||||
".result": "configured",
|
||||
"group": "stable",
|
||||
"os": "linux"
|
||||
},
|
||||
"result": 1,
|
||||
"started_at": "2016-10-21T13:35:46Z",
|
||||
"finished_at": "2016-10-21T14:22:05Z",
|
||||
"allow_failure": false
|
||||
},
|
||||
|
||||
...
|
||||
]
|
||||
}
|
||||
|
||||
```
|
||||
- By passing this through the jq filter `.matrix`, we extract the matrix part of the JSON
|
||||
```bash
|
||||
curl -sSL https://api.travis-ci.org/apache/kafka/builds/169519874 | jq '.matrix'
|
||||
```
|
||||
The resulting json looks like:
|
||||
```json
|
||||
[
|
||||
{
|
||||
"id": 169519875,
|
||||
"repository_id": 6097916,
|
||||
"number": "19.1",
|
||||
"config": {
|
||||
"sudo": "required",
|
||||
"dist": "trusty",
|
||||
"language": "java",
|
||||
"env": "TC_PATHS=\"tests/kafkatest/tests/client\"",
|
||||
"jdk": "oraclejdk8",
|
||||
"before_install": null,
|
||||
"script": [
|
||||
"./gradlew systemTestLibs && /bin/bash ./tests/travis/run_tests.sh"
|
||||
],
|
||||
"services": [
|
||||
"docker"
|
||||
],
|
||||
"before_cache": [
|
||||
"rm -f $HOME/.gradle/caches/modules-2/modules-2.lock",
|
||||
"rm -fr $HOME/.gradle/caches/*/plugin-resolution/"
|
||||
],
|
||||
"cache": {
|
||||
"directories": [
|
||||
"$HOME/.m2/repository",
|
||||
"$HOME/.gradle/caches/",
|
||||
"$HOME/.gradle/wrapper/"
|
||||
]
|
||||
},
|
||||
".result": "configured",
|
||||
"group": "stable",
|
||||
"os": "linux"
|
||||
},
|
||||
"result": null,
|
||||
"started_at": "2016-10-21T13:35:43Z",
|
||||
"finished_at": "2016-10-21T14:24:50Z",
|
||||
"allow_failure": false
|
||||
},
|
||||
{
|
||||
"id": 169519876,
|
||||
"repository_id": 6097916,
|
||||
"number": "19.2",
|
||||
"config": {
|
||||
"sudo": "required",
|
||||
"dist": "trusty",
|
||||
"language": "java",
|
||||
"env": "TC_PATHS=\"tests/kafkatest/tests/connect tests/kafkatest/tests/streams tests/kafkatest/tests/tools\"",
|
||||
"jdk": "oraclejdk8",
|
||||
"before_install": null,
|
||||
"script": [
|
||||
"./gradlew systemTestLibs && /bin/bash ./tests/travis/run_tests.sh"
|
||||
],
|
||||
"services": [
|
||||
"docker"
|
||||
],
|
||||
"before_cache": [
|
||||
"rm -f $HOME/.gradle/caches/modules-2/modules-2.lock",
|
||||
"rm -fr $HOME/.gradle/caches/*/plugin-resolution/"
|
||||
],
|
||||
"cache": {
|
||||
"directories": [
|
||||
"$HOME/.m2/repository",
|
||||
"$HOME/.gradle/caches/",
|
||||
"$HOME/.gradle/wrapper/"
|
||||
]
|
||||
},
|
||||
".result": "configured",
|
||||
"group": "stable",
|
||||
"os": "linux"
|
||||
},
|
||||
"result": 1,
|
||||
"started_at": "2016-10-21T13:35:46Z",
|
||||
"finished_at": "2016-10-21T14:22:05Z",
|
||||
"allow_failure": false
|
||||
},
|
||||
|
||||
...
|
||||
]
|
||||
|
||||
```
|
||||
- By further passing this through the jq filter `map(.id)`, we extract the ids of
the builds for each of the splits
|
||||
```bash
|
||||
curl -sSL https://api.travis-ci.org/apache/kafka/builds/169519874 | jq '.matrix|map(.id)'
|
||||
```
|
||||
The resulting json looks like:
|
||||
```json
|
||||
[
|
||||
169519875,
|
||||
169519876,
|
||||
169519877,
|
||||
169519878,
|
||||
169519879,
|
||||
169519880,
|
||||
169519881
|
||||
]
|
||||
```
|
||||
- To use these ids in a for loop, we want to get rid of the `[]`, which is done by
passing the result through the `.[]` filter
|
||||
```bash
|
||||
curl -sSL https://api.travis-ci.org/apache/kafka/builds/169519874 | jq '.matrix|map(.id)|.[]'
|
||||
```
|
||||
And we get
|
||||
```text
|
||||
169519875
|
||||
169519876
|
||||
169519877
|
||||
169519878
|
||||
169519879
|
||||
169519880
|
||||
169519881
|
||||
```
|
||||
- In the for loop, we make calls like the following to fetch the logs
|
||||
```bash
|
||||
curl -sSL "https://api.travis-ci.org/jobs/169519875/log.txt?deansi=true" | tail
|
||||
```
|
||||
which gives us
|
||||
```text
|
||||
[INFO:2016-10-21 14:21:12,538]: SerialTestRunner: kafkatest.tests.client.consumer_test.OffsetValidationTest.test_consumer_bounce.clean_shutdown=False.bounce_mode=rolling: test 16 of 28
|
||||
[INFO:2016-10-21 14:21:12,538]: SerialTestRunner: kafkatest.tests.client.consumer_test.OffsetValidationTest.test_consumer_bounce.clean_shutdown=False.bounce_mode=rolling: setting up
|
||||
[INFO:2016-10-21 14:21:30,810]: SerialTestRunner: kafkatest.tests.client.consumer_test.OffsetValidationTest.test_consumer_bounce.clean_shutdown=False.bounce_mode=rolling: running
|
||||
[INFO:2016-10-21 14:24:35,519]: SerialTestRunner: kafkatest.tests.client.consumer_test.OffsetValidationTest.test_consumer_bounce.clean_shutdown=False.bounce_mode=rolling: PASS
|
||||
[INFO:2016-10-21 14:24:35,519]: SerialTestRunner: kafkatest.tests.client.consumer_test.OffsetValidationTest.test_consumer_bounce.clean_shutdown=False.bounce_mode=rolling: tearing down
|
||||
|
||||
|
||||
The job exceeded the maximum time limit for jobs, and has been terminated.
|
||||
|
||||
```
|
||||
* Links
|
||||
- [Travis-CI REST api documentation](https://docs.travis-ci.com/api)
|
||||
- [jq Manual](https://stedolan.github.io/jq/manual/)
|
||||
|
||||
Local Quickstart
|
||||
----------------
|
||||
This quickstart will help you run the Kafka system tests on your local machine. Note this requires bringing up a cluster of virtual machines on your local computer, which is memory intensive; it currently requires around 10G RAM.
|
||||
For a tutorial on how to set up and run the Kafka system tests, see
|
||||
https://cwiki.apache.org/confluence/display/KAFKA/tutorial+-+set+up+and+run+Kafka+system+tests+with+ducktape
|
||||
|
||||
* Install VirtualBox from [https://www.virtualbox.org/](https://www.virtualbox.org/) (run `$ vboxmanage --version` to check if it's installed).
|
||||
* Install Vagrant >= 1.6.4 from [https://www.vagrantup.com/](https://www.vagrantup.com/) (run `vagrant --version` to check if it's installed).
|
||||
* Install system test dependencies, including ducktape, a command-line tool and library for testing distributed systems. We recommend using a virtualenv for system test development:
|
||||
|
||||
$ cd kafka/tests
|
||||
$ virtualenv venv
|
||||
$ . ./venv/bin/activate
|
||||
$ python setup.py develop
|
||||
$ cd .. # back to base kafka directory
|
||||
|
||||
* Run the bootstrap script to set up Vagrant for testing
|
||||
|
||||
$ tests/bootstrap-test-env.sh
|
||||
|
||||
* Bring up the test cluster
|
||||
|
||||
$ vagrant/vagrant-up.sh
|
||||
$ # When using Virtualbox, it also works to run: vagrant up
|
||||
|
||||
* Build the desired branch of Kafka
|
||||
|
||||
$ git checkout $BRANCH
|
||||
$ gradle # (only if necessary)
|
||||
$ ./gradlew systemTestLibs
|
||||
|
||||
* Run the system tests using ducktape:
|
||||
|
||||
$ ducktape tests/kafkatest/tests
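    $ # The path::Class.method selectors shown for TC_PATHS above also work here; for example (illustrative):
    $ ducktape tests/kafkatest/tests/client/pluggable_test.py::PluggableConsumerTest.test_start_stop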
|
||||
|
||||
EC2 Quickstart
|
||||
--------------
|
||||
This quickstart will help you run the Kafka system tests on EC2. In this setup, all logic is run
|
||||
on EC2 and none on your local machine.
|
||||
|
||||
There are a lot of steps here, but the basic goals are to create one distinguished EC2 instance that
|
||||
will be our "test driver", and to set up the security groups and IAM role so that the test driver
|
||||
can create, destroy, and run ssh commands on any number of "workers".
|
||||
|
||||
As a convention, we'll use "kafkatest" in most names, but you can use whatever name you want.
|
||||
|
||||
Preparation
|
||||
-----------
|
||||
In these steps, we will create an IAM role which has permission to create and destroy EC2 instances,
|
||||
set up a keypair used for ssh access to the test driver and worker machines, and create a security group to allow the test driver and workers to all communicate via TCP.
|
||||
|
||||
* [Create an IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user.html). We'll give this role the ability to launch or kill additional EC2 machines.
|
||||
- Create role "kafkatest-master"
|
||||
- Role type: Amazon EC2
|
||||
- Attach policy: AmazonEC2FullAccess (this will allow our test-driver to create and destroy EC2 instances)
|
||||
|
||||
* If you haven't already, [set up a keypair to use for SSH access](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html). For the purpose
|
||||
of this quickstart, let's say the keypair name is kafkatest, and you've saved the private key in kafkatest.pem
|
||||
|
||||
* Next, create an EC2 security group called "kafkatest".
|
||||
- After creating the group, inbound rules: allow SSH on port 22 from anywhere; also, allow access on all ports (0-65535) from other machines in the kafkatest group.
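    - If you prefer the AWS CLI over the console, a rough sketch of equivalent commands (the group name, ports, and CIDR here are assumptions; adjust to your setup):

        $ aws ec2 create-security-group --group-name kafkatest --description "Kafka system tests"
        $ aws ec2 authorize-security-group-ingress --group-name kafkatest --protocol tcp --port 22 --cidr 0.0.0.0/0
        $ aws ec2 authorize-security-group-ingress --group-name kafkatest --protocol tcp --port 0-65535 --source-group kafkatest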
|
||||
|
||||
Create the Test Driver
|
||||
----------------------
|
||||
* Launch a new test driver machine
|
||||
- OS: Ubuntu server is recommended
|
||||
- Instance type: t2.medium is easily sufficient since this machine is just a driver
|
||||
- Instance details: Most defaults are fine.
|
||||
- IAM role -> kafkatest-master
|
||||
- Tagging the instance with a useful name is recommended.
|
||||
- Security group -> 'kafkatest'
|
||||
|
||||
|
||||
* Once the machine is started, upload the SSH key to your test driver:
|
||||
|
||||
$ scp -i /path/to/kafkatest.pem \
|
||||
/path/to/kafkatest.pem ubuntu@public.hostname.amazonaws.com:kafkatest.pem
|
||||
|
||||
* Grab the public hostname/IP (available for example by navigating to your EC2 dashboard and viewing running instances) of your test driver and SSH into it:
|
||||
|
||||
$ ssh -i /path/to/kafkatest.pem ubuntu@public.hostname.amazonaws.com
|
||||
|
||||
Set Up the Test Driver
|
||||
----------------------
|
||||
The following steps assume you have ssh'd into
|
||||
the test driver machine.
|
||||
|
||||
* Start by making sure you're up to date, and install git and ducktape:
|
||||
|
||||
$ sudo apt-get update && sudo apt-get -y upgrade && sudo apt-get install -y python-pip git
|
||||
$ pip install ducktape
|
||||
|
||||
* Get Kafka:
|
||||
|
||||
$ git clone https://git-wip-us.apache.org/repos/asf/kafka.git kafka
|
||||
|
||||
* Update your AWS credentials:
|
||||
|
||||
export AWS_IAM_ROLE=$(curl -s http://169.254.169.254/latest/meta-data/iam/info | grep InstanceProfileArn | cut -d '"' -f 4 | cut -d '/' -f 2)
|
||||
export AWS_ACCESS_KEY=$(curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/$AWS_IAM_ROLE | grep AccessKeyId | awk -F\" '{ print $4 }')
|
||||
export AWS_SECRET_KEY=$(curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/$AWS_IAM_ROLE | grep SecretAccessKey | awk -F\" '{ print $4 }')
|
||||
export AWS_SESSION_TOKEN=$(curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/$AWS_IAM_ROLE | grep Token | awk -F\" '{ print $4 }')
|
||||
|
||||
* Install some dependencies:
|
||||
|
||||
$ cd kafka
|
||||
$ ./vagrant/aws/aws-init.sh
|
||||
$ . ~/.bashrc
|
||||
|
||||
* An example Vagrantfile.local has been created by aws-init.sh which looks something like:
|
||||
|
||||
# Vagrantfile.local
|
||||
ec2_instance_type = "..." # Pick something appropriate for your
|
||||
# test. Note that the default m3.medium has
|
||||
# a small disk.
|
||||
ec2_spot_max_price = "0.123" # On-demand price for instance type
|
||||
enable_hostmanager = false
|
||||
num_zookeepers = 0
|
||||
num_kafka = 0
|
||||
num_workers = 9
|
||||
ec2_keypair_name = 'kafkatest'
|
||||
ec2_keypair_file = '/home/ubuntu/kafkatest.pem'
|
||||
ec2_security_groups = ['kafkatest']
|
||||
ec2_region = 'us-west-2'
|
||||
ec2_ami = "ami-29ebb519"
|
||||
|
||||
* Start up the instances:
|
||||
|
||||
# This will bring up worker machines in small parallel batches
|
||||
$ vagrant/vagrant-up.sh --aws
|
||||
|
||||
* Now you should be able to run tests:
|
||||
|
||||
$ cd kafka/tests
|
||||
$ ducktape kafkatest/tests
|
||||
|
||||
* Update Worker VM
|
||||
|
||||
If you change code in a branch on your driver VM, you need to update your worker VM to pick up this change:
|
||||
|
||||
$ ./gradlew systemTestLibs
|
||||
$ vagrant rsync
|
||||
|
||||
* To halt your workers without destroying persistent state, run `vagrant halt`. Run `vagrant destroy -f` to destroy all traces of your workers.
|
||||
|
||||
Unit Tests
|
||||
----------
|
||||
The system tests have unit tests! The various services in the python `kafkatest` module are reasonably complex, and intended to be reusable. Hence we have unit tests
|
||||
for the system service classes.
|
||||
|
||||
Where are the unit tests?
|
||||
* The kafkatest unit tests are located under kafka/tests/unit
|
||||
|
||||
How do I run the unit tests?
|
||||
* cd kafka/tests # The base system test directory
|
||||
* python setup.py test
|
||||
|
||||
How can I add a unit test?
|
||||
* Follow the naming conventions - module name starts with "check", class name begins with "Check", test method name begins with "check"
|
||||
* These naming conventions are defined in "setup.cfg". We use "check" to distinguish unit tests from system tests, which use "test" in the various names.
|
||||
|
||||
tests/bin/external_trogdor_command_example.py (new executable file, 41 lines)
@@ -0,0 +1,41 @@
|
||||
#!/usr/bin/env python
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import json
|
||||
import sys
|
||||
import time
|
||||
|
||||
#
|
||||
# This is an example of an external script which can be run through Trogdor's
|
||||
# ExternalCommandWorker. It sleeps for the given amount of time expressed by the delayMs field in the ExternalCommandSpec
|
||||
#
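# A sample start message on stdin might look like the following (field names inferred
# from the code below; the authoritative format is defined by Trogdor's ExternalCommandSpec):
#   {"id": "task_0", "workload": {"delayMs": 1000}}
#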
|
||||
|
||||
if __name__ == '__main__':
|
||||
# Read the ExternalCommandWorker start message.
|
||||
line = sys.stdin.readline()
|
||||
start_message = json.loads(line)
|
||||
workload = start_message["workload"]
|
||||
print("Starting external_trogdor_command_example with task id %s, workload %s"
|
||||
% (start_message["id"], workload))
|
||||
sys.stdout.flush()
|
||||
|
||||
# pretend to start some workload
|
||||
print(json.dumps({"status": "running"}))
|
||||
sys.stdout.flush()
|
||||
time.sleep(0.001 * workload["delayMs"])
|
||||
|
||||
print(json.dumps({"status": "exiting after %s delayMs" % workload["delayMs"]}))
|
||||
sys.stdout.flush()
|
||||
tests/bin/flatten_html.sh (new executable file, 78 lines)
@@ -0,0 +1,78 @@
|
||||
#!/bin/bash
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
usage() {
|
||||
cat <<EOF
|
||||
flatten_html.sh: This script "flattens" an HTML file by inlining all
|
||||
files included via "#include virtual". This is useful when making
|
||||
changes to the Kafka documentation files.
|
||||
|
||||
Typical usage:
|
||||
./gradlew docsJar
|
||||
./tests/bin/flatten_html.sh -f ./docs/protocol.html > /tmp/my-protocol.html
|
||||
firefox /tmp/my-protocol.html &
|
||||
|
||||
usage:
|
||||
$0 [flags]
|
||||
|
||||
flags:
|
||||
-f [filename] The HTML file to process.
|
||||
-h Print this help message.
|
||||
EOF
|
||||
}
|
||||
|
||||
die() {
|
||||
echo $@
|
||||
exit 1
|
||||
}
|
||||
|
||||
realpath() {
|
||||
[[ $1 = /* ]] && echo "$1" || echo "$PWD/${1#./}"
|
||||
}
|
||||
|
||||
process_file() {
|
||||
local CUR_FILE="${1}"
|
||||
[[ -f "${CUR_FILE}" ]] || die "Unable to open input file ${CUR_FILE}"
|
||||
while IFS= read -r LINE; do
|
||||
if [[ $LINE =~ \#include\ virtual=\"(.*)\" ]]; then
|
||||
local INCLUDED_FILE="${BASH_REMATCH[1]}"
|
||||
if [[ $INCLUDED_FILE =~ ../includes/ ]]; then
|
||||
: # ignore ../includes
|
||||
else
|
||||
pushd "$(dirname "${CUR_FILE}")" &> /dev/null \
|
||||
|| die "failed to change directory to directory of ${CUR_FILE}"
|
||||
process_file "${INCLUDED_FILE}"
|
||||
popd &> /dev/null
|
||||
fi
|
||||
else
|
||||
echo "${LINE}"
|
||||
fi
|
||||
done < "${CUR_FILE}"
|
||||
}
|
||||
|
||||
FILE=""
|
||||
while getopts "f:h" arg; do
|
||||
case $arg in
|
||||
f) FILE=$OPTARG;;
|
||||
h) usage; exit 0;;
|
||||
*) echo "Error parsing command-line arguments."
|
||||
usage
|
||||
exit 1;;
|
||||
esac
|
||||
done
|
||||
|
||||
[[ -z "${FILE}" ]] && die "You must specify which file to process. -h for help."
|
||||
process_file "${FILE}"
|
||||
tests/bootstrap-test-env.sh (new executable file, 87 lines)
@@ -0,0 +1,87 @@
|
||||
#!/usr/bin/env bash
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# This script automates the process of setting up a local machine for running Kafka system tests
|
||||
export GREP_OPTIONS='--color=never'
|
||||
|
||||
# Helper function which prints version numbers so they can be compared lexically or numerically
|
||||
function version { echo "$@" | awk -F. '{ printf("%03d%03d%03d%03d\n", $1,$2,$3,$4); }'; }
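# For example, `version 1.6.4` prints 001006004000, so version strings can be compared numerically with -lt/-gt below.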
|
||||
|
||||
base_dir=`dirname $0`/..
|
||||
cd $base_dir
|
||||
|
||||
echo "Checking Virtual Box installation..."
|
||||
bad_vb=false
|
||||
if [ -z `vboxmanage --version` ]; then
|
||||
echo "It appears that Virtual Box is not installed. Please install and try again (see https://www.virtualbox.org/ for details)"
|
||||
bad_vb=true
|
||||
else
|
||||
echo "Virtual Box looks good."
|
||||
fi
|
||||
|
||||
echo "Checking Vagrant installation..."
|
||||
vagrant_version=`vagrant --version | egrep -o "[0-9]+\.[0-9]+\.[0-9]+"`
|
||||
bad_vagrant=false
|
||||
if [ "$(version $vagrant_version)" -lt "$(version 1.6.4)" ]; then
|
||||
echo "Found Vagrant version $vagrant_version. Please upgrade to 1.6.4 or higher (see https://www.vagrantup.com for details)"
|
||||
bad_vagrant=true
|
||||
else
|
||||
echo "Vagrant installation looks good."
|
||||
fi
|
||||
|
||||
if [ "x$bad_vagrant" == "xtrue" -o "x$bad_vb" == "xtrue" ]; then
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "Checking for necessary Vagrant plugins..."
|
||||
hostmanager_version=`vagrant plugin list | grep vagrant-hostmanager | egrep -o "[0-9]+\.[0-9]+\.[0-9]+"`
|
||||
if [ -z "$hostmanager_version" ]; then
|
||||
vagrant plugin install vagrant-hostmanager
|
||||
fi
|
||||
|
||||
echo "Creating and packaging a reusable base box for Vagrant..."
|
||||
vagrant/package-base-box.sh
|
||||
|
||||
# Set up Vagrantfile.local if necessary
|
||||
if [ ! -e Vagrantfile.local ]; then
|
||||
echo "Creating Vagrantfile.local..."
|
||||
cp vagrant/system-test-Vagrantfile.local Vagrantfile.local
|
||||
else
|
||||
echo "Found an existing Vagrantfile.local. Keeping without overwriting..."
|
||||
fi
|
||||
|
||||
# Sanity check contents of Vagrantfile.local
|
||||
echo "Checking Vagrantfile.local..."
|
||||
vagrantfile_ok=true
|
||||
num_brokers=`egrep -o "num_brokers\s*=\s*[0-9]+" Vagrantfile.local | cut -d '=' -f 2 | xargs`
|
||||
num_zookeepers=`egrep -o "num_zookeepers\s*=\s*[0-9]+" Vagrantfile.local | cut -d '=' -f 2 | xargs`
|
||||
num_workers=`egrep -o "num_workers\s*=\s*[0-9]+" Vagrantfile.local | cut -d '=' -f 2 | xargs`
|
||||
if [ "x$num_brokers" == "x" -o "$num_brokers" != 0 ]; then
|
||||
echo "Vagrantfile.local: bad num_brokers. Update to: num_brokers = 0"
|
||||
vagrantfile_ok=false
|
||||
fi
|
||||
if [ "x$num_zookeepers" == "x" -o "$num_zookeepers" != 0 ]; then
|
||||
echo "Vagrantfile.local: bad num_zookeepers. Update to: num_zookeepers = 0"
|
||||
vagrantfile_ok=false
|
||||
fi
|
||||
if [ "x$num_workers" == "x" -o "$num_workers" == 0 ]; then
|
||||
echo "Vagrantfile.local: bad num_workers (size of test cluster). Set num_workers high enough to run your tests."
|
||||
vagrantfile_ok=false
|
||||
fi
|
||||
|
||||
if [ "$vagrantfile_ok" == "true" ]; then
|
||||
echo "Vagrantfile.local looks good."
|
||||
fi
|
||||
tests/docker/Dockerfile (new file, 89 lines)
@@ -0,0 +1,89 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
ARG jdk_version=openjdk:8
|
||||
FROM $jdk_version
|
||||
|
||||
MAINTAINER Apache Kafka dev@kafka.apache.org
|
||||
VOLUME ["/opt/kafka-dev"]
|
||||
|
||||
# Set the timezone.
|
||||
ENV TZ="/usr/share/zoneinfo/America/Los_Angeles"
|
||||
|
||||
# Do not ask for confirmations when running apt-get, etc.
|
||||
ENV DEBIAN_FRONTEND noninteractive
|
||||
|
||||
# Set the ducker.creator label so that we know that this is a ducker image. This will make it
|
||||
# visible to 'ducker purge'. The ducker.creator label also lets us know what UNIX user built this
|
||||
# image.
|
||||
ARG ducker_creator=default
|
||||
LABEL ducker.creator=$ducker_creator
|
||||
|
||||
# Update Linux and install necessary utilities.
|
||||
RUN apt update && apt install -y sudo netcat iptables rsync unzip wget curl jq coreutils openssh-server net-tools vim python-pip python-dev libffi-dev libssl-dev cmake pkg-config libfuse-dev iperf traceroute && apt-get -y clean
|
||||
RUN python -m pip install -U pip==9.0.3;
|
||||
RUN pip install --upgrade cffi virtualenv pyasn1 boto3 pycrypto pywinrm ipaddress enum34 && pip install --upgrade ducktape==0.7.9
|
||||
|
||||
# Set up ssh
|
||||
COPY ./ssh-config /root/.ssh/config
|
||||
# NOTE: The paramiko library supports the PEM-format private key, but does not support the RFC4716 format.
|
||||
RUN ssh-keygen -m PEM -q -t rsa -N '' -f /root/.ssh/id_rsa && cp -f /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
|
||||
RUN echo 'PermitUserEnvironment yes' >> /etc/ssh/sshd_config
|
||||
|
||||
# Install binary test dependencies.
|
||||
# we use the same versions as in vagrant/base.sh
|
||||
ARG KAFKA_MIRROR="https://s3-us-west-2.amazonaws.com/kafka-packages"
|
||||
RUN mkdir -p "/opt/kafka-0.8.2.2" && chmod a+rw /opt/kafka-0.8.2.2 && curl -s "$KAFKA_MIRROR/kafka_2.11-0.8.2.2.tgz" | tar xz --strip-components=1 -C "/opt/kafka-0.8.2.2"
|
||||
RUN mkdir -p "/opt/kafka-0.9.0.1" && chmod a+rw /opt/kafka-0.9.0.1 && curl -s "$KAFKA_MIRROR/kafka_2.11-0.9.0.1.tgz" | tar xz --strip-components=1 -C "/opt/kafka-0.9.0.1"
|
||||
RUN mkdir -p "/opt/kafka-0.10.0.1" && chmod a+rw /opt/kafka-0.10.0.1 && curl -s "$KAFKA_MIRROR/kafka_2.11-0.10.0.1.tgz" | tar xz --strip-components=1 -C "/opt/kafka-0.10.0.1"
|
||||
RUN mkdir -p "/opt/kafka-0.10.1.1" && chmod a+rw /opt/kafka-0.10.1.1 && curl -s "$KAFKA_MIRROR/kafka_2.11-0.10.1.1.tgz" | tar xz --strip-components=1 -C "/opt/kafka-0.10.1.1"
|
||||
RUN mkdir -p "/opt/kafka-0.10.2.2" && chmod a+rw /opt/kafka-0.10.2.2 && curl -s "$KAFKA_MIRROR/kafka_2.11-0.10.2.2.tgz" | tar xz --strip-components=1 -C "/opt/kafka-0.10.2.2"
|
||||
RUN mkdir -p "/opt/kafka-0.11.0.3" && chmod a+rw /opt/kafka-0.11.0.3 && curl -s "$KAFKA_MIRROR/kafka_2.11-0.11.0.3.tgz" | tar xz --strip-components=1 -C "/opt/kafka-0.11.0.3"
|
||||
RUN mkdir -p "/opt/kafka-1.0.2" && chmod a+rw /opt/kafka-1.0.2 && curl -s "$KAFKA_MIRROR/kafka_2.11-1.0.2.tgz" | tar xz --strip-components=1 -C "/opt/kafka-1.0.2"
|
||||
RUN mkdir -p "/opt/kafka-1.1.1" && chmod a+rw /opt/kafka-1.1.1 && curl -s "$KAFKA_MIRROR/kafka_2.11-1.1.1.tgz" | tar xz --strip-components=1 -C "/opt/kafka-1.1.1"
|
||||
RUN mkdir -p "/opt/kafka-2.0.1" && chmod a+rw /opt/kafka-2.0.1 && curl -s "$KAFKA_MIRROR/kafka_2.12-2.0.1.tgz" | tar xz --strip-components=1 -C "/opt/kafka-2.0.1"
|
||||
RUN mkdir -p "/opt/kafka-2.1.1" && chmod a+rw /opt/kafka-2.1.1 && curl -s "$KAFKA_MIRROR/kafka_2.12-2.1.1.tgz" | tar xz --strip-components=1 -C "/opt/kafka-2.1.1"
|
||||
RUN mkdir -p "/opt/kafka-2.2.2" && chmod a+rw /opt/kafka-2.2.2 && curl -s "$KAFKA_MIRROR/kafka_2.12-2.2.2.tgz" | tar xz --strip-components=1 -C "/opt/kafka-2.2.2"
|
||||
RUN mkdir -p "/opt/kafka-2.3.1" && chmod a+rw /opt/kafka-2.3.1 && curl -s "$KAFKA_MIRROR/kafka_2.12-2.3.1.tgz" | tar xz --strip-components=1 -C "/opt/kafka-2.3.1"
|
||||
RUN mkdir -p "/opt/kafka-2.4.0" && chmod a+rw /opt/kafka-2.4.0 && curl -s "$KAFKA_MIRROR/kafka_2.12-2.4.0.tgz" | tar xz --strip-components=1 -C "/opt/kafka-2.4.0"
|
||||
RUN mkdir -p "/opt/kafka-2.5.0" && chmod a+rw /opt/kafka-2.5.0 && curl -s "$KAFKA_MIRROR/kafka_2.12-2.5.0.tgz" | tar xz --strip-components=1 -C "/opt/kafka-2.5.0"
|
||||
|
||||
# Streams test dependencies
|
||||
RUN curl -s "$KAFKA_MIRROR/kafka-streams-0.10.0.1-test.jar" -o /opt/kafka-0.10.0.1/libs/kafka-streams-0.10.0.1-test.jar
|
||||
RUN curl -s "$KAFKA_MIRROR/kafka-streams-0.10.1.1-test.jar" -o /opt/kafka-0.10.1.1/libs/kafka-streams-0.10.1.1-test.jar
|
||||
RUN curl -s "$KAFKA_MIRROR/kafka-streams-0.10.2.2-test.jar" -o /opt/kafka-0.10.2.2/libs/kafka-streams-0.10.2.2-test.jar
|
||||
RUN curl -s "$KAFKA_MIRROR/kafka-streams-0.11.0.3-test.jar" -o /opt/kafka-0.11.0.3/libs/kafka-streams-0.11.0.3-test.jar
|
||||
RUN curl -s "$KAFKA_MIRROR/kafka-streams-1.0.2-test.jar" -o /opt/kafka-1.0.2/libs/kafka-streams-1.0.2-test.jar
|
||||
RUN curl -s "$KAFKA_MIRROR/kafka-streams-1.1.1-test.jar" -o /opt/kafka-1.1.1/libs/kafka-streams-1.1.1-test.jar
|
||||
RUN curl -s "$KAFKA_MIRROR/kafka-streams-2.0.1-test.jar" -o /opt/kafka-2.0.1/libs/kafka-streams-2.0.1-test.jar
|
||||
RUN curl -s "$KAFKA_MIRROR/kafka-streams-2.1.1-test.jar" -o /opt/kafka-2.1.1/libs/kafka-streams-2.1.1-test.jar
|
||||
RUN curl -s "$KAFKA_MIRROR/kafka-streams-2.2.2-test.jar" -o /opt/kafka-2.2.2/libs/kafka-streams-2.2.2-test.jar
|
||||
RUN curl -s "$KAFKA_MIRROR/kafka-streams-2.3.1-test.jar" -o /opt/kafka-2.3.1/libs/kafka-streams-2.3.1-test.jar
|
||||
RUN curl -s "$KAFKA_MIRROR/kafka-streams-2.4.0-test.jar" -o /opt/kafka-2.4.0/libs/kafka-streams-2.4.0-test.jar
|
||||
RUN curl -s "$KAFKA_MIRROR/kafka-streams-2.5.0-test.jar" -o /opt/kafka-2.5.0/libs/kafka-streams-2.5.0-test.jar
|
||||
|
||||
# The version of Kibosh to use for testing.
|
||||
# If you update this, also update vagrant/base.sh
|
||||
ARG KIBOSH_VERSION="8841dd392e6fbf02986e2fb1f1ebf04df344b65a"
|
||||
|
||||
# Install Kibosh
|
||||
RUN apt-get install fuse
|
||||
RUN cd /opt && git clone -q https://github.com/confluentinc/kibosh.git && cd "/opt/kibosh" && git reset --hard $KIBOSH_VERSION && mkdir "/opt/kibosh/build" && cd "/opt/kibosh/build" && ../configure && make -j 2
|
||||
|
||||
# Set up the ducker user.
|
||||
RUN useradd -ms /bin/bash ducker && mkdir -p /home/ducker/ && rsync -aiq /root/.ssh/ /home/ducker/.ssh && chown -R ducker /home/ducker/ /mnt/ /var/log/ && echo "PATH=$(runuser -l ducker -c 'echo $PATH'):$JAVA_HOME/bin" >> /home/ducker/.ssh/environment && echo 'PATH=$PATH:'"$JAVA_HOME/bin" >> /home/ducker/.profile && echo 'ducker ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers
|
||||
USER ducker
|
||||
|
||||
CMD sudo service ssh start && tail -f /dev/null
|
||||
tests/docker/ducker-ak (new executable file, 580 lines)
@@ -0,0 +1,580 @@
|
||||
#!/usr/bin/env bash
|
||||
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
#
|
||||
# Ducker-AK: a tool for running Apache Kafka system tests inside Docker images.
|
||||
#
|
||||
# Note: this should be compatible with the version of bash that ships on most
|
||||
# Macs, bash 3.2.57.
|
||||
#
|
||||
|
||||
script_path="${0}"
|
||||
|
||||
# The absolute path to the directory which this script is in. This will also be the directory
|
||||
# which we run docker build from.
|
||||
ducker_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
|
||||
|
||||
# The absolute path to the root Kafka directory
|
||||
kafka_dir="$( cd "${ducker_dir}/../.." && pwd )"
|
||||
|
||||
# The memory consumption to allow during the docker build.
|
||||
# This does not include swap.
|
||||
docker_build_memory_limit="3200m"
|
||||
|
||||
# The maximum memory consumption to allow in containers.
|
||||
docker_run_memory_limit="2000m"
|
||||
|
||||
# The default number of cluster nodes to bring up if a number is not specified.
|
||||
default_num_nodes=14
|
||||
|
||||
# The default OpenJDK base image.
|
||||
default_jdk="openjdk:8"
|
||||
|
||||
# The default ducker-ak image name.
|
||||
default_image_name="ducker-ak"
|
||||
|
||||
# Display a usage message on the terminal and exit.
|
||||
#
|
||||
# $1: The exit status to use
|
||||
usage() {
|
||||
local exit_status="${1}"
|
||||
cat <<EOF
|
||||
ducker-ak: a tool for running Apache Kafka tests inside Docker images.
|
||||
|
||||
Usage: ${script_path} [command] [options]
|
||||
|
||||
help|-h|--help
|
||||
Display this help message
|
||||
|
||||
up [-n|--num-nodes NUM_NODES] [-f|--force] [docker-image]
|
||||
[-C|--custom-ducktape DIR] [-e|--expose-ports ports]
|
||||
Bring up a cluster with the specified number of nodes (defaults to ${default_num_nodes}).
|
||||
The docker image name defaults to ${default_image_name}. If --force is specified, we will
|
||||
attempt to bring up an image even if some parameters are not valid.
|
||||
|
||||
If --custom-ducktape is specified, we will install the provided custom
|
||||
ducktape source code directory before bringing up the nodes. The provided
|
||||
directory should be the ducktape git repo, not the ducktape installed module directory.
|
||||
|
||||
if --expose-ports is specified then we will expose those ports to random ephemeral ports
|
||||
on the host. The argument can be a single port (like 5005), a port range like (5005-5009)
|
||||
or a combination of port/port-range separated by comma (like 2181,9092 or 2181,5005-5008).
|
||||
By default no port is exposed. See README.md for more detail on this option.
|
||||
|
||||
test [test-name(s)]
|
||||
Run a test or set of tests inside the currently active Ducker nodes.
|
||||
For example, to run the system test produce_bench_test, you would run:
|
||||
./tests/docker/ducker-ak test ./tests/kafkatest/test/core/produce_bench_test.py
|
||||
|
||||
ssh [node-name|user-name@node-name] [command]
|
||||
Log in to a running ducker container. If node-name is not given, it prints
|
||||
the names of all running nodes. If node-name is 'all', we will run the
|
||||
command on every node. If user-name is given, we will try to log in as
|
||||
that user. Otherwise, we will log in as the 'ducker' user. If a command
|
||||
is specified, we will run that command. Otherwise, we will provide a login
|
||||
shell.
|
||||
|
||||
down [-q|--quiet] [-f|--force]
|
||||
Tear down all the currently active ducker-ak nodes. If --quiet is specified,
|
||||
only error messages are printed. If --force or -f is specified, "docker rm -f"
|
||||
will be used to remove the nodes, which kills any currently running ducker-ak tests.
|
||||
|
||||
purge [--f|--force]
|
||||
Purge Docker images created by ducker-ak. This will free disk space.
|
||||
If --force is set, we run 'docker rmi -f'.
|
||||
EOF
|
||||
exit "${exit_status}"
|
||||
}
|
||||
|
||||
# Exit with an error message.
|
||||
die() {
|
||||
echo $@
|
||||
exit 1
|
||||
}
|
||||
|
||||
# Check for the presence of certain commands.
|
||||
#
|
||||
# $@: The commands to check for. This function will die if any of these commands are not found by
|
||||
# the 'which' command.
|
||||
require_commands() {
|
||||
local cmds="${@}"
|
||||
for cmd in ${cmds}; do
|
||||
which -- "${cmd}" &> /dev/null || die "You must install ${cmd} to run this script."
|
||||
done
|
||||
}
|
||||
|
||||
# Set a global variable to a value.
|
||||
#
|
||||
# $1: The variable name to set. This function will die if the variable already has a value. The
|
||||
# variable will be made readonly to prevent any future modifications.
|
||||
# $2: The value to set the variable to. This function will die if the value is empty or starts
|
||||
# with a dash.
|
||||
# $3: A human-readable description of the variable.
|
||||
set_once() {
|
||||
local key="${1}"
|
||||
local value="${2}"
|
||||
local what="${3}"
|
||||
[[ -n "${!key}" ]] && die "Error: more than one value specified for ${what}."
|
||||
verify_command_line_argument "${value}" "${what}"
|
||||
# It would be better to use declare -g, but older bash versions don't support it.
|
||||
export ${key}="${value}"
|
||||
}
|
||||
|
||||
# Verify that a command-line argument is present and does not start with a dash.
|
||||
#
|
||||
# $1: The command-line argument to verify.
|
||||
# $2: A human-readable description of the variable.
|
||||
verify_command_line_argument() {
|
||||
local value="${1}"
|
||||
local what="${2}"
|
||||
[[ -n "${value}" ]] || die "Error: no value specified for ${what}"
|
||||
[[ ${value} == -* ]] && die "Error: invalid value ${value} specified for ${what}"
|
||||
}
|
||||
|
||||
# Echo a message if a flag is set.
|
||||
#
|
||||
# $1: If this is 1, the message will be echoed.
|
||||
# $@: The message
|
||||
maybe_echo() {
|
||||
local verbose="${1}"
|
||||
shift
|
||||
[[ "${verbose}" -eq 1 ]] && echo "${@}"
|
||||
}
|
||||
|
||||
# Counts the number of elements passed to this subroutine.
|
||||
count() {
|
||||
echo $#
|
||||
}
|
||||
|
||||
# Push a new directory on to the bash directory stack, or exit with a failure message.
|
||||
#
|
||||
# $1: The directory to push on to the directory stack.
|
||||
must_pushd() {
|
||||
local target_dir="${1}"
|
||||
pushd -- "${target_dir}" &> /dev/null || die "failed to change directory to ${target_dir}"
|
||||
}
|
||||
|
||||
# Pop a directory from the bash directory stack, or exit with a failure message.
|
||||
must_popd() {
|
||||
popd &> /dev/null || die "failed to popd"
|
||||
}
|
||||
|
||||
# Run a command and die if it fails.
|
||||
#
|
||||
# Optional flags:
|
||||
# -v: print the command before running it.
|
||||
# -o: display the command output.
|
||||
# $@: The command to run.
|
||||
must_do() {
|
||||
local verbose=0
|
||||
local output="/dev/null"
|
||||
while true; do
|
||||
case ${1} in
|
||||
-v) verbose=1; shift;;
|
||||
-o) output="/dev/stdout"; shift;;
|
||||
*) break;;
|
||||
esac
|
||||
done
|
||||
local cmd="${@}"
|
||||
[[ "${verbose}" -eq 1 ]] && echo "${cmd}"
|
||||
${cmd} >${output} || die "${1} failed"
|
||||
}
|
||||
|
||||
# Ask the user a yes/no question.
|
||||
#
|
||||
# $1: The prompt to use
|
||||
# $_return: 0 if the user answered no; 1 if the user answered yes.
|
||||
ask_yes_no() {
|
||||
local prompt="${1}"
|
||||
while true; do
|
||||
read -r -p "${prompt} " response
|
||||
case "${response}" in
|
||||
[yY]|[yY][eE][sS]) _return=1; return;;
|
||||
[nN]|[nN][oO]) _return=0; return;;
|
||||
*);;
|
||||
esac
|
||||
echo "Please respond 'yes' or 'no'."
|
||||
echo
|
||||
done
|
||||
}
|
||||
|
||||
# Build a docker image.
|
||||
#
|
||||
# $1: The name of the image to build.
|
||||
ducker_build() {
|
||||
local image_name="${1}"
|
||||
|
||||
# Use SECONDS, a builtin bash variable that gets incremented each second, to measure the docker
|
||||
# build duration.
|
||||
SECONDS=0
|
||||
|
||||
must_pushd "${ducker_dir}"
|
||||
# Tip: if you are scratching your head over dependency problems that refer to an old code version
# (for example java.lang.NoClassDefFoundError), adding the --no-cache flag to the build will give you a clean start.
|
||||
must_do -v -o docker build --memory="${docker_build_memory_limit}" \
|
||||
--build-arg "ducker_creator=${user_name}" --build-arg "jdk_version=${jdk_version}" -t "${image_name}" \
|
||||
-f "${ducker_dir}/Dockerfile" ${docker_args} -- .
|
||||
docker_status=$?
|
||||
must_popd
|
||||
duration="${SECONDS}"
|
||||
if [[ ${docker_status} -ne 0 ]]; then
|
||||
die "** ERROR: Failed to build ${what} image after $((${duration} / 60))m \
|
||||
$((${duration} % 60))s. See ${build_log} for details."
|
||||
fi
|
||||
echo "** Successfully built ${what} image in $((${duration} / 60))m \
|
||||
$((${duration} % 60))s. See ${build_log} for details."
|
||||
}
|
||||
|
||||
docker_run() {
|
||||
local node=${1}
|
||||
local image_name=${2}
|
||||
local ports_option=${3}
|
||||
|
||||
local expose_ports=""
|
||||
if [[ -n ${ports_option} ]]; then
|
||||
expose_ports="-P"
|
||||
for expose_port in ${ports_option//,/ }; do
|
||||
expose_ports="${expose_ports} --expose ${expose_port}"
|
||||
done
|
||||
fi
|
||||
|
||||
# Invoke docker run. We need privileged mode to be able to run iptables
# and mount FUSE filesystems inside the container.
|
||||
must_do -v docker run --privileged \
|
||||
-d -t -h "${node}" --network ducknet "${expose_ports}" \
|
||||
--memory=${docker_run_memory_limit} --memory-swappiness=1 \
|
||||
-v "${kafka_dir}:/opt/kafka-dev" --name "${node}" -- "${image_name}"
|
||||
}
|
||||
|
||||
setup_custom_ducktape() {
|
||||
local custom_ducktape="${1}"
|
||||
local image_name="${2}"
|
||||
|
||||
[[ -f "${custom_ducktape}/ducktape/__init__.py" ]] || \
|
||||
die "You must supply a valid ducktape directory to --custom-ducktape"
|
||||
docker_run ducker01 "${image_name}"
|
||||
local running_container="$(docker ps -f=network=ducknet -q)"
|
||||
must_do -v -o docker cp "${custom_ducktape}" "${running_container}:/opt/ducktape"
|
||||
docker exec --user=root ducker01 bash -c 'set -x && cd /opt/kafka-dev/tests && sudo python ./setup.py develop install && cd /opt/ducktape && sudo python ./setup.py develop install'
|
||||
[[ $? -ne 0 ]] && die "failed to install the new ducktape."
|
||||
must_do -v -o docker commit ducker01 "${image_name}"
|
||||
must_do -v docker kill "${running_container}"
|
||||
must_do -v docker rm ducker01
|
||||
}
|
||||
|
||||
ducker_up() {
|
||||
require_commands docker
|
||||
while [[ $# -ge 1 ]]; do
|
||||
case "${1}" in
|
||||
-C|--custom-ducktape) set_once custom_ducktape "${2}" "the custom ducktape directory"; shift 2;;
|
||||
-f|--force) force=1; shift;;
|
||||
-n|--num-nodes) set_once num_nodes "${2}" "number of nodes"; shift 2;;
|
||||
-j|--jdk) set_once jdk_version "${2}" "the OpenJDK base image"; shift 2;;
|
||||
-e|--expose-ports) set_once expose_ports "${2}" "the ports to expose"; shift 2;;
|
||||
*) set_once image_name "${1}" "docker image name"; shift;;
|
||||
esac
|
||||
done
|
||||
[[ -n "${num_nodes}" ]] || num_nodes="${default_num_nodes}"
|
||||
[[ -n "${jdk_version}" ]] || jdk_version="${default_jdk}"
|
||||
[[ -n "${image_name}" ]] || image_name="${default_image_name}-${jdk_version/:/-}"
|
||||
[[ "${num_nodes}" =~ ^-?[0-9]+$ ]] || \
|
||||
die "ducker_up: the number of nodes must be an integer."
|
||||
[[ "${num_nodes}" -gt 0 ]] || die "ducker_up: the number of nodes must be greater than 0."
|
||||
if [[ "${num_nodes}" -lt 2 ]]; then
|
||||
if [[ "${force}" -ne 1 ]]; then
|
||||
echo "ducker_up: It is recommended to run at least 2 nodes, since ducker01 is only \
|
||||
used to run ducktape itself. If you want to do it anyway, you can use --force to attempt to \
|
||||
use only ${num_nodes}."
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
|
||||
docker ps >/dev/null || die "ducker_up: failed to run docker. Please check that the daemon is started."
|
||||
|
||||
ducker_build "${image_name}"
|
||||
|
||||
docker inspect --format='{{.Config.Labels}}' --type=image "${image_name}" | grep -q 'ducker.type'
|
||||
local docker_status=${PIPESTATUS[0]}
|
||||
local grep_status=${PIPESTATUS[1]}
|
||||
[[ "${docker_status}" -eq 0 ]] || die "ducker_up: failed to inspect image ${image_name}. \
|
||||
Please check that it exists."
|
||||
if [[ "${grep_status}" -ne 0 ]]; then
|
||||
if [[ "${force}" -ne 1 ]]; then
|
||||
echo "ducker_up: ${image_name} does not appear to be a ducker image. It lacks the \
|
||||
ducker.type label. If you think this is a mistake, you can use --force to attempt to bring \
|
||||
it up anyway."
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
local running_containers="$(docker ps -f=network=ducknet -q)"
|
||||
local num_running_containers=$(count ${running_containers})
|
||||
if [[ ${num_running_containers} -gt 0 ]]; then
|
||||
die "ducker_up: there are ${num_running_containers} ducker containers \
|
||||
running already. Use ducker down to bring down these containers before \
|
||||
attempting to start new ones."
|
||||
fi
|
||||
|
||||
echo "ducker_up: Bringing up ${image_name} with ${num_nodes} nodes..."
|
||||
if docker network inspect ducknet &>/dev/null; then
|
||||
must_do -v docker network rm ducknet
|
||||
fi
|
||||
must_do -v docker network create ducknet
|
||||
if [[ -n "${custom_ducktape}" ]]; then
|
||||
setup_custom_ducktape "${custom_ducktape}" "${image_name}"
|
||||
fi
|
||||
for n in $(seq -f %02g 1 ${num_nodes}); do
|
||||
local node="ducker${n}"
|
||||
docker_run "${node}" "${image_name}" "${expose_ports}"
|
||||
done
|
||||
mkdir -p "${ducker_dir}/build"
|
||||
exec 3<> "${ducker_dir}/build/node_hosts"
|
||||
for n in $(seq -f %02g 1 ${num_nodes}); do
|
||||
local node="ducker${n}"
|
||||
docker exec --user=root "${node}" grep "${node}" /etc/hosts >&3
|
||||
[[ $? -ne 0 ]] && die "failed to find the /etc/hosts entry for ${node}"
|
||||
done
|
||||
exec 3>&-
|
||||
for n in $(seq -f %02g 1 ${num_nodes}); do
|
||||
local node="ducker${n}"
|
||||
docker exec --user=root "${node}" \
|
||||
bash -c "grep -v ${node} /opt/kafka-dev/tests/docker/build/node_hosts >> /etc/hosts"
|
||||
[[ $? -ne 0 ]] && die "failed to append to the /etc/hosts file on ${node}"
|
||||
done
|
||||
|
||||
echo "ducker_up: added the latest entries to /etc/hosts on each node."
|
||||
generate_cluster_json_file "${num_nodes}" "${ducker_dir}/build/cluster.json"
|
||||
echo "ducker_up: successfully wrote ${ducker_dir}/build/cluster.json"
|
||||
echo "** ducker_up: successfully brought up ${num_nodes} nodes."
|
||||
}
|
||||
|
||||
# Generate the cluster.json file used by ducktape to identify cluster nodes.
|
||||
#
|
||||
# $1: The number of cluster nodes.
|
||||
# $2: The path to write the cluster.json file to.
|
||||
generate_cluster_json_file() {
|
||||
local num_nodes="${1}"
|
||||
local path="${2}"
|
||||
exec 3<> "${path}"
|
||||
cat<<EOF >&3
|
||||
{
|
||||
"_comment": [
|
||||
"Licensed to the Apache Software Foundation (ASF) under one or more",
|
||||
"contributor license agreements. See the NOTICE file distributed with",
|
||||
"this work for additional information regarding copyright ownership.",
|
||||
"The ASF licenses this file to You under the Apache License, Version 2.0",
|
||||
"(the \"License\"); you may not use this file except in compliance with",
|
||||
"the License. You may obtain a copy of the License at",
|
||||
"",
|
||||
"http://www.apache.org/licenses/LICENSE-2.0",
|
||||
"",
|
||||
"Unless required by applicable law or agreed to in writing, software",
|
||||
"distributed under the License is distributed on an \"AS IS\" BASIS,",
|
||||
"WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.",
|
||||
"See the License for the specific language governing permissions and",
|
||||
"limitations under the License."
|
||||
],
|
||||
"nodes": [
|
||||
EOF
|
||||
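# Start at node 2: ducker01 is used to drive ducktape itself (see ducker_test), so only ducker02..duckerNN are listed as cluster nodes.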
for n in $(seq 2 ${num_nodes}); do
|
||||
if [[ ${n} -eq ${num_nodes} ]]; then
|
||||
suffix=""
|
||||
else
|
||||
suffix=","
|
||||
fi
|
||||
local node=$(printf ducker%02d ${n})
|
||||
cat<<EOF >&3
|
||||
{
|
||||
"externally_routable_ip": "${node}",
|
||||
"ssh_config": {
|
||||
"host": "${node}",
|
||||
"hostname": "${node}",
|
||||
"identityfile": "/home/ducker/.ssh/id_rsa",
|
||||
"password": "",
|
||||
"port": 22,
|
||||
"user": "ducker"
|
||||
}
|
||||
}${suffix}
|
||||
EOF
|
||||
done
|
||||
cat<<EOF >&3
|
||||
]
|
||||
}
|
||||
EOF
|
||||
exec 3>&-
|
||||
}
|
||||
|
||||
ducker_test() {
|
||||
require_commands docker
|
||||
docker inspect ducker01 &>/dev/null || \
|
||||
die "ducker_test: the ducker01 instance appears to be down. Did you run 'ducker up'?"
|
||||
[[ $# -lt 1 ]] && \
|
||||
die "ducker_test: you must supply at least one system test to run. Type --help for help."
|
||||
local args=""
|
||||
local kafka_test=0
|
||||
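# Rewrite any host-side paths containing /kafkatest/ so they resolve under ./tests/kafkatest inside the container.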
for arg in "${@}"; do
|
||||
local regex=".*\/kafkatest\/(.*)"
|
||||
if [[ $arg =~ $regex ]]; then
|
||||
local kpath=${BASH_REMATCH[1]}
|
||||
args="${args} ./tests/kafkatest/${kpath}"
|
||||
else
|
||||
args="${args} ${arg}"
|
||||
fi
|
||||
done
|
||||
must_pushd "${kafka_dir}"
|
||||
(test -f ./gradlew || gradle) && ./gradlew systemTestLibs
|
||||
must_popd
|
||||
cmd="cd /opt/kafka-dev && ducktape --cluster-file /opt/kafka-dev/tests/docker/build/cluster.json $args"
|
||||
echo "docker exec ducker01 bash -c \"${cmd}\""
|
||||
exec docker exec --user=ducker ducker01 bash -c "${cmd}"
|
||||
}
|
||||
|
||||
ducker_ssh() {
|
||||
require_commands docker
|
||||
[[ $# -eq 0 ]] && die "ducker_ssh: Please specify a container name to log into. \
|
||||
Currently active containers: $(echo_running_container_names)"
|
||||
local node_info="${1}"
|
||||
shift
|
||||
local guest_command="$*"
|
||||
local user_name="ducker"
|
||||
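# Accept either "node" or "user@node"; fall back to the ducker user when no user is given.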
if [[ "${node_info}" =~ @ ]]; then
|
||||
user_name="${node_info%%@*}"
|
||||
local node_name="${node_info##*@}"
|
||||
else
|
||||
local node_name="${node_info}"
|
||||
fi
|
||||
local docker_flags=""
|
||||
if [[ -z "${guest_command}" ]]; then
|
||||
local docker_flags="${docker_flags} -t"
|
||||
local guest_command_prefix=""
|
||||
guest_command=bash
|
||||
else
|
||||
local guest_command_prefix="bash -c"
|
||||
fi
|
||||
if [[ "${node_name}" == "all" ]]; then
|
||||
local nodes=$(echo_running_container_names)
|
||||
[[ "${nodes}" == "(none)" ]] && die "ducker_ssh: can't locate any running ducker nodes."
|
||||
for node in ${nodes}; do
|
||||
docker exec --user=${user_name} -i ${docker_flags} "${node}" \
|
||||
${guest_command_prefix} "${guest_command}" || die "docker exec ${node} failed"
|
||||
done
|
||||
else
|
||||
docker inspect --type=container -- "${node_name}" &>/dev/null || \
|
||||
die "ducker_ssh: can't locate node ${node_name}. Currently running nodes: \
|
||||
$(echo_running_container_names)"
|
||||
exec docker exec --user=${user_name} -i ${docker_flags} "${node_name}" \
|
||||
${guest_command_prefix} "${guest_command}"
|
||||
fi
|
||||
}
|
||||
|
||||
# Echo all the running Ducker container names, or (none) if there are no running Ducker containers.
|
||||
echo_running_container_names() {
|
||||
node_names="$(docker ps -f=network=ducknet -q --format '{{.Names}}' | sort)"
|
||||
if [[ -z "${node_names}" ]]; then
|
||||
echo "(none)"
|
||||
else
|
||||
echo ${node_names//$'\n'/ }
|
||||
fi
|
||||
}
|
||||
|
||||
ducker_down() {
|
||||
require_commands docker
|
||||
local verbose=1
|
||||
local force_str=""
|
||||
while [[ $# -ge 1 ]]; do
|
||||
case "${1}" in
|
||||
-q|--quiet) verbose=0; shift;;
|
||||
-f|--force) force_str="-f"; shift;;
|
||||
*) die "ducker_down: unexpected command-line argument ${1}";;
|
||||
esac
|
||||
done
|
||||
local running_containers
|
||||
running_containers="$(docker ps -f=network=ducknet -q)"
|
||||
[[ $? -eq 0 ]] || die "ducker_down: docker command failed. Is the docker daemon running?"
|
||||
running_containers=${running_containers//$'\n'/ }
|
||||
local all_containers="$(docker ps -a -f=network=ducknet -q)"
|
||||
all_containers=${all_containers//$'\n'/ }
|
||||
if [[ -z "${all_containers}" ]]; then
|
||||
maybe_echo "${verbose}" "No ducker containers found."
|
||||
return
|
||||
fi
|
||||
verbose_flag=""
|
||||
if [[ ${verbose} == 1 ]]; then
|
||||
verbose_flag="-v"
|
||||
fi
|
||||
if [[ -n "${running_containers}" ]]; then
|
||||
must_do ${verbose_flag} docker kill "${running_containers}"
|
||||
fi
|
||||
must_do ${verbose_flag} docker rm ${force_str} "${all_containers}"
|
||||
must_do ${verbose_flag} -o rm -f -- "${ducker_dir}/build/node_hosts" "${ducker_dir}/build/cluster.json"
|
||||
if docker network inspect ducknet &>/dev/null; then
|
||||
must_do -v docker network rm ducknet
|
||||
fi
|
||||
maybe_echo "${verbose}" "ducker_down: removed $(count ${all_containers}) containers."
|
||||
}
|
||||
|
||||
ducker_purge() {
|
||||
require_commands docker
|
||||
local force_str=""
|
||||
while [[ $# -ge 1 ]]; do
|
||||
case "${1}" in
|
||||
-f|--force) force_str="-f"; shift;;
|
||||
*) die "ducker_purge: unknown argument ${1}";;
|
||||
esac
|
||||
done
|
||||
echo "** ducker_purge: attempting to locate ducker images to purge"
|
||||
local images
|
||||
images=$(docker images -q -a -f label=ducker.creator)
|
||||
[[ $? -ne 0 ]] && die "docker images command failed"
|
||||
images=${images//$'\n'/ }
|
||||
declare -a purge_images=()
|
||||
if [[ -z "${images}" ]]; then
|
||||
echo "** ducker_purge: no images found to purge."
|
||||
exit 0
|
||||
fi
|
||||
echo "** ducker_purge: images to delete:"
|
||||
for image in ${images}; do
|
||||
echo -n "${image} "
|
||||
docker inspect --format='{{.Config.Labels}} {{.Created}}' --type=image "${image}"
|
||||
[[ $? -ne 0 ]] && die "docker inspect ${image} failed"
|
||||
done
|
||||
ask_yes_no "Delete these docker images? [y/n]"
|
||||
[[ "${_return}" -eq 0 ]] && exit 0
|
||||
must_do -v -o docker rmi ${force_str} ${images}
|
||||
}
|
||||
|
||||
# Parse command-line arguments
|
||||
[[ $# -lt 1 ]] && usage 0
|
||||
# Display the help text if -h or --help appears in the command line
|
||||
for arg in ${@}; do
|
||||
case "${arg}" in
|
||||
-h|--help) usage 0;;
|
||||
--) break;;
|
||||
*);;
|
||||
esac
|
||||
done
|
||||
action="${1}"
|
||||
shift
|
||||
case "${action}" in
|
||||
help) usage 0;;
|
||||
|
||||
up|test|ssh|down|purge)
|
||||
ducker_${action} "${@}"; exit 0;;
|
||||
|
||||
*) echo "Unknown command '${action}'. Type '${script_path} --help' for usage information."
|
||||
exit 1;;
|
||||
esac
|
||||
30
tests/docker/run_tests.sh
Executable file
@@ -0,0 +1,30 @@
|
||||
#!/usr/bin/env bash
|
||||
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
|
||||
KAFKA_NUM_CONTAINERS=${KAFKA_NUM_CONTAINERS:-14}
|
||||
TC_PATHS=${TC_PATHS:-./kafkatest/}
|
||||
|
||||
die() {
|
||||
echo "$@"
|
||||
exit 1
|
||||
}
|
||||
|
||||
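# Bring up the ducker nodes only if none are currently running.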
if ${SCRIPT_DIR}/ducker-ak ssh | grep -q '(none)'; then
|
||||
${SCRIPT_DIR}/ducker-ak up -n "${KAFKA_NUM_CONTAINERS}" || die "ducker-ak up failed"
|
||||
fi
|
||||
${SCRIPT_DIR}/ducker-ak test ${TC_PATHS} ${_DUCKTAPE_OPTIONS} || die "ducker-ak test failed"
|
||||
21
tests/docker/ssh-config
Normal file
@@ -0,0 +1,21 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
Host *
|
||||
ControlMaster auto
|
||||
ControlPath ~/.ssh/master-%r@%h:%p
|
||||
StrictHostKeyChecking no
|
||||
ConnectTimeout=10
|
||||
IdentityFile ~/.ssh/id_rsa
|
||||
15
tests/docker/ssh/authorized_keys
Normal file
@@ -0,0 +1,15 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC0qDT9kEPWc8JQ53b4KnT/ZJOLwb+3c//jpLW/2ofjDyIsPW4FohLpicfouch/zsRpN4G38lua+2BsGls9sMIZc6PXY2L+NIGCkqEMdCoU1Ym8SMtyJklfzp3m/0PeK9s2dLlR3PFRYvyFA4btQK5hkbYDNZPzf4airvzdRzLkrFf81+RemaMI2EtONwJRcbLViPaTXVKJdbFwJTJ1u7yu9wDYWHKBMA92mHTQeP6bhVYCqxJn3to/RfZYd+sHw6mfxVg5OrAlUOYpSV4pDNCAsIHdtZ56V8NQlJL6NJ2vzzSSYUwLMqe88fhrC8yYHoxC07QPy1EdkSTHdohAicyT root@knode01.knw
|
||||
21
tests/docker/ssh/config
Normal file
@@ -0,0 +1,21 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
Host *
|
||||
ControlMaster auto
|
||||
ControlPath ~/.ssh/master-%r@%h:%p
|
||||
StrictHostKeyChecking no
|
||||
ConnectTimeout=10
|
||||
IdentityFile ~/.ssh/id_rsa
|
||||
27
tests/docker/ssh/id_rsa
Normal file
@@ -0,0 +1,27 @@
|
||||
-----BEGIN RSA PRIVATE KEY-----
|
||||
MIIEpQIBAAKCAQEAtKg0/ZBD1nPCUOd2+Cp0/2STi8G/t3P/46S1v9qH4w8iLD1u
|
||||
BaIS6YnH6LnIf87EaTeBt/JbmvtgbBpbPbDCGXOj12Ni/jSBgpKhDHQqFNWJvEjL
|
||||
ciZJX86d5v9D3ivbNnS5UdzxUWL8hQOG7UCuYZG2AzWT83+Goq783Ucy5KxX/Nfk
|
||||
XpmjCNhLTjcCUXGy1Yj2k11SiXWxcCUydbu8rvcA2FhygTAPdph00Hj+m4VWAqsS
|
||||
Z97aP0X2WHfrB8Opn8VYOTqwJVDmKUleKQzQgLCB3bWeelfDUJSS+jSdr880kmFM
|
||||
CzKnvPH4awvMmB6MQtO0D8tRHZEkx3aIQInMkwIDAQABAoIBAQCz6EMFNNLp0NP1
|
||||
X9yRXS6wW4e4CRWUazesiw3YZpcmnp6IchCMGZA99FEZyVILPW1J3tYWyotBdw7Z
|
||||
+RFeCRXy5L+IMtiVkNJcpwss7M4ve0w0LkY0gj5V49xJ+3Gp4gDnZSxcguvrAem5
|
||||
yP5obR572fDpl0SknB4HCr6U2l+rauzrLyevy5eeDT/vmXbuM1cdHpNIXmmElz4L
|
||||
t31n+exQRn6tP1h516iXbcYbopxDgdv2qKGAqzWKE6TyWpzF5x7kjOEYt0bZ5QO3
|
||||
Lwh7AAqE/3mwxlYwng1L4WAT7RtcP19W+9JDIc7ENInMGxq6q46p1S3IPZsf1cj/
|
||||
aAJ9q3LBAoGBAOVJr0+WkR786n3BuswpGQWBgVxfai4y9Lf90vuGKawdQUzXv0/c
|
||||
EB/CFqP/dIsquukA8PfzjNMyTNmEHXi4Sf16H8Rg4EGhIYMEqIQojx1t/yLLm0aU
|
||||
YPEvW/02Umtlg3pJw9fQAAzFVqCasw2E2lUdAUkydGRwDUJZmv2/b3NzAoGBAMm0
|
||||
Jo7Et7ochH8Vku6uA+hG+RdwlKFm5JA7/Ci3DOdQ1zmJNrvBBFQLo7AjA4iSCoBd
|
||||
s9+y0nrSPcF4pM3l6ghLheaqbnIi2HqIMH9mjDbrOZiWvbnjvjpOketgNX8vV3Ye
|
||||
GUkSjoNcmvRmdsICmUjeML8bGOmq4zF9W/GIfTphAoGBAKGRo8R8f/SLGh3VtvCI
|
||||
gUY89NAHuEWnyIQii1qMNq8+yjYAzaHTm1UVqmiT6SbrzFvGOwcuCu0Dw91+2Fmp
|
||||
2xGPzfTOoxf8GCY/0ROXlQmS6jc1rEw24Hzz92ldrwRYuyYf9q4Ltw1IvXtcp5F+
|
||||
LW/OiYpv0E66Gs3HYI0wKbP7AoGBAJMZWeFW37LQJ2TTJAQDToAwemq4xPxsoJX7
|
||||
2SsMTFHKKBwi0JLe8jwk/OxwrJwF/bieHZcvv8ao2zbkuDQcz6/a/D074C5G8V9z
|
||||
QQM4k1td8vQwQw91Yv782/gvgvRNX1iaHNCowtxURgGlVEirQoTc3eoRZfrLkMM/
|
||||
7DTa2JEhAoGACEu3zHJ1sgyeOEgLArUJXlQM30A/ulMrnCd4MEyIE+ReyWAUevUQ
|
||||
0lYdVNva0/W4C5e2lUOJL41jjIPLqI7tcFR2PZE6n0xTTkxNH5W2u1WpFeKjx+O3
|
||||
czv7Bt6wYyLHIMy1JEqAQ7pw1mtJ5s76UDvXUhciF+DU2pWYc6APKR0=
|
||||
-----END RSA PRIVATE KEY-----
|
||||
1
tests/docker/ssh/id_rsa.pub
Normal file
@@ -0,0 +1 @@
|
||||
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC0qDT9kEPWc8JQ53b4KnT/ZJOLwb+3c//jpLW/2ofjDyIsPW4FohLpicfouch/zsRpN4G38lua+2BsGls9sMIZc6PXY2L+NIGCkqEMdCoU1Ym8SMtyJklfzp3m/0PeK9s2dLlR3PFRYvyFA4btQK5hkbYDNZPzf4airvzdRzLkrFf81+RemaMI2EtONwJRcbLViPaTXVKJdbFwJTJ1u7yu9wDYWHKBMA92mHTQeP6bhVYCqxJn3to/RfZYd+sHw6mfxVg5OrAlUOYpSV4pDNCAsIHdtZ56V8NQlJL6NJ2vzzSSYUwLMqe88fhrC8yYHoxC07QPy1EdkSTHdohAicyT root@knode01.knw
|
||||
25
tests/kafkatest/__init__.py
Normal file
@@ -0,0 +1,25 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# This determines the version of kafkatest that can be published to PyPi and installed with pip
|
||||
#
|
||||
# Note that in development, this version name can't follow Kafka's convention of having a trailing "-SNAPSHOT"
|
||||
# due to python version naming restrictions, which are enforced by python packaging tools
|
||||
# (see https://www.python.org/dev/peps/pep-0440/)
|
||||
#
|
||||
# Instead, in development branches, the version should have a suffix of the form ".devN"
|
||||
#
|
||||
# For example, when Kafka is at version 1.0.0-SNAPSHOT, this should be something like "1.0.0.dev0"
|
||||
__version__ = '2.5.2.dev0'
|
||||
14
tests/kafkatest/benchmarks/__init__.py
Normal file
@@ -0,0 +1,14 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
14
tests/kafkatest/benchmarks/core/__init__.py
Normal file
@@ -0,0 +1,14 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
279
tests/kafkatest/benchmarks/core/benchmark_test.py
Normal file
@@ -0,0 +1,279 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from ducktape.mark import matrix
|
||||
from ducktape.mark import parametrize
|
||||
from ducktape.mark.resource import cluster
|
||||
from ducktape.services.service import Service
|
||||
from ducktape.tests.test import Test
|
||||
|
||||
from kafkatest.services.kafka import KafkaService
|
||||
from kafkatest.services.performance import ProducerPerformanceService, EndToEndLatencyService, ConsumerPerformanceService, throughput, latency, compute_aggregate_throughput
|
||||
from kafkatest.services.zookeeper import ZookeeperService
|
||||
from kafkatest.version import DEV_BRANCH, KafkaVersion
|
||||
|
||||
TOPIC_REP_ONE = "topic-replication-factor-one"
|
||||
TOPIC_REP_THREE = "topic-replication-factor-three"
|
||||
DEFAULT_RECORD_SIZE = 100 # bytes
|
||||
|
||||
|
||||
class Benchmark(Test):
|
||||
"""A benchmark of Kafka producer/consumer performance. This replicates the test
|
||||
run here:
|
||||
https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
|
||||
"""
|
||||
def __init__(self, test_context):
|
||||
super(Benchmark, self).__init__(test_context)
|
||||
self.num_zk = 1
|
||||
self.num_brokers = 3
|
||||
self.topics = {
|
||||
TOPIC_REP_ONE: {'partitions': 6, 'replication-factor': 1},
|
||||
TOPIC_REP_THREE: {'partitions': 6, 'replication-factor': 3}
|
||||
}
|
||||
|
||||
self.zk = ZookeeperService(test_context, self.num_zk)
|
||||
|
||||
self.msgs_large = 10000000
|
||||
self.batch_size = 8*1024
|
||||
self.buffer_memory = 64*1024*1024
|
||||
self.msg_sizes = [10, 100, 1000, 10000, 100000]
|
||||
self.target_data_size = 128*1024*1024
|
||||
self.target_data_size_gb = self.target_data_size/float(1024*1024*1024)
|
||||
|
||||
def setUp(self):
|
||||
self.zk.start()
|
||||
|
||||
def start_kafka(self, security_protocol, interbroker_security_protocol, version):
|
||||
self.kafka = KafkaService(
|
||||
self.test_context, self.num_brokers,
|
||||
self.zk, security_protocol=security_protocol,
|
||||
interbroker_security_protocol=interbroker_security_protocol, topics=self.topics,
|
||||
version=version)
|
||||
self.kafka.log_level = "INFO"  # We don't need DEBUG logging here
|
||||
self.kafka.start()
|
||||
|
||||
@cluster(num_nodes=5)
|
||||
@parametrize(acks=1, topic=TOPIC_REP_ONE)
|
||||
@parametrize(acks=1, topic=TOPIC_REP_THREE)
|
||||
@parametrize(acks=-1, topic=TOPIC_REP_THREE)
|
||||
@matrix(acks=[1], topic=[TOPIC_REP_THREE], message_size=[10, 100, 1000, 10000, 100000], compression_type=["none", "snappy"], security_protocol=['PLAINTEXT', 'SSL'])
|
||||
@cluster(num_nodes=7)
|
||||
@parametrize(acks=1, topic=TOPIC_REP_THREE, num_producers=3)
|
||||
def test_producer_throughput(self, acks, topic, num_producers=1, message_size=DEFAULT_RECORD_SIZE,
|
||||
compression_type="none", security_protocol='PLAINTEXT', client_version=str(DEV_BRANCH),
|
||||
broker_version=str(DEV_BRANCH)):
|
||||
"""
|
||||
Setup: 1 node zk + 3 node kafka cluster
|
||||
Produce ~128MB worth of messages to a topic with 6 partitions. Required acks, topic replication factor,
|
||||
security protocol and message size are varied depending on arguments injected into this test.
|
||||
|
||||
Collect and return aggregate throughput statistics after all messages have been acknowledged.
|
||||
(This runs ProducerPerformance.java under the hood)
|
||||
"""
|
||||
client_version = KafkaVersion(client_version)
|
||||
broker_version = KafkaVersion(broker_version)
|
||||
self.validate_versions(client_version, broker_version)
|
||||
self.start_kafka(security_protocol, security_protocol, broker_version)
|
||||
# Always generate the same total amount of data
|
||||
nrecords = int(self.target_data_size / message_size)
|
||||
|
||||
self.producer = ProducerPerformanceService(
|
||||
self.test_context, num_producers, self.kafka, topic=topic,
|
||||
num_records=nrecords, record_size=message_size, throughput=-1, version=client_version,
|
||||
settings={
|
||||
'acks': acks,
|
||||
'compression.type': compression_type,
|
||||
'batch.size': self.batch_size,
|
||||
'buffer.memory': self.buffer_memory})
|
||||
self.producer.run()
|
||||
return compute_aggregate_throughput(self.producer)
|
||||
|
||||
@cluster(num_nodes=5)
|
||||
@parametrize(security_protocol='SSL', interbroker_security_protocol='PLAINTEXT')
|
||||
@matrix(security_protocol=['PLAINTEXT', 'SSL'], compression_type=["none", "snappy"])
|
||||
def test_long_term_producer_throughput(self, compression_type="none", security_protocol='PLAINTEXT',
|
||||
interbroker_security_protocol=None, client_version=str(DEV_BRANCH),
|
||||
broker_version=str(DEV_BRANCH)):
|
||||
"""
|
||||
Setup: 1 node zk + 3 node kafka cluster
|
||||
Produce 10e6 100 byte messages to a topic with 6 partitions, replication-factor 3, and acks=1.
|
||||
|
||||
Collect and return aggregate throughput statistics after all messages have been acknowledged.
|
||||
|
||||
(This runs ProducerPerformance.java under the hood)
|
||||
"""
|
||||
client_version = KafkaVersion(client_version)
|
||||
broker_version = KafkaVersion(broker_version)
|
||||
self.validate_versions(client_version, broker_version)
|
||||
if interbroker_security_protocol is None:
|
||||
interbroker_security_protocol = security_protocol
|
||||
self.start_kafka(security_protocol, interbroker_security_protocol, broker_version)
|
||||
self.producer = ProducerPerformanceService(
|
||||
self.test_context, 1, self.kafka,
|
||||
topic=TOPIC_REP_THREE, num_records=self.msgs_large, record_size=DEFAULT_RECORD_SIZE,
|
||||
throughput=-1, version=client_version, settings={
|
||||
'acks': 1,
|
||||
'compression.type': compression_type,
|
||||
'batch.size': self.batch_size,
|
||||
'buffer.memory': self.buffer_memory
|
||||
},
|
||||
intermediate_stats=True
|
||||
)
|
||||
self.producer.run()
|
||||
|
||||
summary = ["Throughput over long run, data > memory:"]
|
||||
data = {}
|
||||
# FIXME we should be generating a graph too
|
||||
# Try to break it into 5 blocks, but fall back to a smaller number if
|
||||
# there aren't even 5 elements
|
||||
block_size = max(len(self.producer.stats[0]) / 5, 1)
|
||||
nblocks = len(self.producer.stats[0]) / block_size
|
||||
|
||||
for i in range(nblocks):
|
||||
subset = self.producer.stats[0][i*block_size:min((i+1)*block_size, len(self.producer.stats[0]))]
|
||||
if len(subset) == 0:
|
||||
summary.append(" Time block %d: (empty)" % i)
|
||||
data[i] = None
|
||||
else:
|
||||
records_per_sec = sum([stat['records_per_sec'] for stat in subset])/float(len(subset))
|
||||
mb_per_sec = sum([stat['mbps'] for stat in subset])/float(len(subset))
|
||||
|
||||
summary.append(" Time block %d: %f rec/sec (%f MB/s)" % (i, records_per_sec, mb_per_sec))
|
||||
data[i] = throughput(records_per_sec, mb_per_sec)
|
||||
|
||||
self.logger.info("\n".join(summary))
|
||||
return data
|
||||
|
||||
@cluster(num_nodes=5)
|
||||
@parametrize(security_protocol='SSL', interbroker_security_protocol='PLAINTEXT')
|
||||
@matrix(security_protocol=['PLAINTEXT', 'SSL'], compression_type=["none", "snappy"])
|
||||
@cluster(num_nodes=6)
|
||||
@matrix(security_protocol=['SASL_PLAINTEXT', 'SASL_SSL'], compression_type=["none", "snappy"])
|
||||
def test_end_to_end_latency(self, compression_type="none", security_protocol="PLAINTEXT",
|
||||
interbroker_security_protocol=None, client_version=str(DEV_BRANCH),
|
||||
broker_version=str(DEV_BRANCH)):
|
||||
"""
|
||||
Setup: 1 node zk + 3 node kafka cluster
|
||||
Produce (acks = 1) and consume 10e3 messages to a topic with 6 partitions and replication-factor 3,
|
||||
measuring the latency between production and consumption of each message.
|
||||
|
||||
Return aggregate latency statistics.
|
||||
|
||||
(Under the hood, this simply runs EndToEndLatency.scala)
|
||||
"""
|
||||
client_version = KafkaVersion(client_version)
|
||||
broker_version = KafkaVersion(broker_version)
|
||||
self.validate_versions(client_version, broker_version)
|
||||
if interbroker_security_protocol is None:
|
||||
interbroker_security_protocol = security_protocol
|
||||
self.start_kafka(security_protocol, interbroker_security_protocol, broker_version)
|
||||
self.logger.info("BENCHMARK: End to end latency")
|
||||
self.perf = EndToEndLatencyService(
|
||||
self.test_context, 1, self.kafka,
|
||||
topic=TOPIC_REP_THREE, num_records=10000,
|
||||
compression_type=compression_type, version=client_version
|
||||
)
|
||||
self.perf.run()
|
||||
return latency(self.perf.results[0]['latency_50th_ms'], self.perf.results[0]['latency_99th_ms'], self.perf.results[0]['latency_999th_ms'])
|
||||
|
||||
@cluster(num_nodes=6)
|
||||
@parametrize(security_protocol='SSL', interbroker_security_protocol='PLAINTEXT')
|
||||
@matrix(security_protocol=['PLAINTEXT', 'SSL'], compression_type=["none", "snappy"])
|
||||
def test_producer_and_consumer(self, compression_type="none", security_protocol="PLAINTEXT",
|
||||
interbroker_security_protocol=None,
|
||||
client_version=str(DEV_BRANCH), broker_version=str(DEV_BRANCH)):
|
||||
"""
|
||||
Setup: 1 node zk + 3 node kafka cluster
|
||||
Concurrently produce and consume 10e6 messages with a single producer and a single consumer.
|
||||
|
||||
Return aggregate throughput statistics for both producer and consumer.
|
||||
|
||||
(Under the hood, this runs ProducerPerformance.java, and ConsumerPerformance.scala)
|
||||
"""
|
||||
client_version = KafkaVersion(client_version)
|
||||
broker_version = KafkaVersion(broker_version)
|
||||
self.validate_versions(client_version, broker_version)
|
||||
if interbroker_security_protocol is None:
|
||||
interbroker_security_protocol = security_protocol
|
||||
self.start_kafka(security_protocol, interbroker_security_protocol, broker_version)
|
||||
num_records = 10 * 1000 * 1000 # 10e6
|
||||
|
||||
self.producer = ProducerPerformanceService(
|
||||
self.test_context, 1, self.kafka,
|
||||
topic=TOPIC_REP_THREE,
|
||||
num_records=num_records, record_size=DEFAULT_RECORD_SIZE, throughput=-1, version=client_version,
|
||||
settings={
|
||||
'acks': 1,
|
||||
'compression.type': compression_type,
|
||||
'batch.size': self.batch_size,
|
||||
'buffer.memory': self.buffer_memory
|
||||
}
|
||||
)
|
||||
self.consumer = ConsumerPerformanceService(
|
||||
self.test_context, 1, self.kafka, topic=TOPIC_REP_THREE, messages=num_records)
|
||||
Service.run_parallel(self.producer, self.consumer)
|
||||
|
||||
data = {
|
||||
"producer": compute_aggregate_throughput(self.producer),
|
||||
"consumer": compute_aggregate_throughput(self.consumer)
|
||||
}
|
||||
summary = [
|
||||
"Producer + consumer:",
|
||||
str(data)]
|
||||
self.logger.info("\n".join(summary))
|
||||
return data
|
||||
|
||||
@cluster(num_nodes=6)
|
||||
@parametrize(security_protocol='SSL', interbroker_security_protocol='PLAINTEXT')
|
||||
@matrix(security_protocol=['PLAINTEXT', 'SSL'], compression_type=["none", "snappy"])
|
||||
def test_consumer_throughput(self, compression_type="none", security_protocol="PLAINTEXT",
|
||||
interbroker_security_protocol=None, num_consumers=1,
|
||||
client_version=str(DEV_BRANCH), broker_version=str(DEV_BRANCH)):
|
||||
"""
|
||||
Consume 10e6 100-byte messages with 1 or more consumers from a topic with 6 partitions
|
||||
and report throughput.
|
||||
"""
|
||||
client_version = KafkaVersion(client_version)
|
||||
broker_version = KafkaVersion(broker_version)
|
||||
self.validate_versions(client_version, broker_version)
|
||||
if interbroker_security_protocol is None:
|
||||
interbroker_security_protocol = security_protocol
|
||||
self.start_kafka(security_protocol, interbroker_security_protocol, broker_version)
|
||||
num_records = 10 * 1000 * 1000 # 10e6
|
||||
|
||||
# seed kafka w/messages
|
||||
self.producer = ProducerPerformanceService(
|
||||
self.test_context, 1, self.kafka,
|
||||
topic=TOPIC_REP_THREE,
|
||||
num_records=num_records, record_size=DEFAULT_RECORD_SIZE, throughput=-1, version=client_version,
|
||||
settings={
|
||||
'acks': 1,
|
||||
'compression.type': compression_type,
|
||||
'batch.size': self.batch_size,
|
||||
'buffer.memory': self.buffer_memory
|
||||
}
|
||||
)
|
||||
self.producer.run()
|
||||
|
||||
# consume
|
||||
self.consumer = ConsumerPerformanceService(
|
||||
self.test_context, num_consumers, self.kafka,
|
||||
topic=TOPIC_REP_THREE, messages=num_records)
|
||||
self.consumer.group = "test-consumer-group"
|
||||
self.consumer.run()
|
||||
return compute_aggregate_throughput(self.consumer)
|
||||
|
||||
def validate_versions(self, client_version, broker_version):
|
||||
assert client_version <= broker_version, "Client version %s should be <= broker version %s" % (client_version, broker_version)
|
||||
14
tests/kafkatest/benchmarks/streams/__init__.py
Normal file
@@ -0,0 +1,14 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
@@ -0,0 +1,164 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from ducktape.tests.test import Test
|
||||
from ducktape.mark.resource import cluster
|
||||
from ducktape.mark import parametrize, matrix
|
||||
from kafkatest.tests.kafka_test import KafkaTest
|
||||
|
||||
from kafkatest.services.performance.streams_performance import StreamsSimpleBenchmarkService
|
||||
from kafkatest.services.zookeeper import ZookeeperService
|
||||
from kafkatest.services.kafka import KafkaService
|
||||
from kafkatest.version import DEV_BRANCH
|
||||
|
||||
STREAMS_SIMPLE_TESTS = ["streamprocess", "streamprocesswithsink", "streamprocesswithstatestore", "streamprocesswithwindowstore"]
|
||||
STREAMS_COUNT_TESTS = ["streamcount", "streamcountwindowed"]
|
||||
STREAMS_JOIN_TESTS = ["streamtablejoin", "streamstreamjoin", "tabletablejoin"]
|
||||
NON_STREAMS_TESTS = ["consume", "consumeproduce"]
|
||||
|
||||
ALL_TEST = "all"
|
||||
STREAMS_SIMPLE_TEST = "streams-simple"
|
||||
STREAMS_COUNT_TEST = "streams-count"
|
||||
STREAMS_JOIN_TEST = "streams-join"
|
||||
|
||||
|
||||
class StreamsSimpleBenchmarkTest(Test):
|
||||
"""
|
||||
Simple benchmark of Kafka Streams.
|
||||
"""
|
||||
|
||||
def __init__(self, test_context):
|
||||
super(StreamsSimpleBenchmarkTest, self).__init__(test_context)
|
||||
|
||||
# these values could be updated in ad-hoc benchmarks
|
||||
self.key_skew = 0
|
||||
self.value_size = 1024
|
||||
self.num_records = 10000000L
|
||||
self.num_threads = 1
|
||||
|
||||
self.replication = 1
|
||||
|
||||
@cluster(num_nodes=12)
|
||||
@matrix(test=["consume", "consumeproduce",
|
||||
"streamprocess", "streamprocesswithsink", "streamprocesswithstatestore", "streamprocesswithwindowstore",
|
||||
"streamcount", "streamcountwindowed",
|
||||
"streamtablejoin", "streamstreamjoin", "tabletablejoin"],
|
||||
scale=[1])
|
||||
def test_simple_benchmark(self, test, scale):
|
||||
"""
|
||||
Run simple Kafka Streams benchmark
|
||||
"""
|
||||
self.driver = [None] * (scale + 1)
|
||||
|
||||
self.final = {}
|
||||
|
||||
#############
|
||||
# SETUP PHASE
|
||||
#############
|
||||
self.zk = ZookeeperService(self.test_context, num_nodes=1)
|
||||
self.zk.start()
|
||||
self.kafka = KafkaService(self.test_context, num_nodes=scale, zk=self.zk, version=DEV_BRANCH, topics={
|
||||
'simpleBenchmarkSourceTopic1' : { 'partitions': scale, 'replication-factor': self.replication },
|
||||
'simpleBenchmarkSourceTopic2' : { 'partitions': scale, 'replication-factor': self.replication },
|
||||
'simpleBenchmarkSinkTopic' : { 'partitions': scale, 'replication-factor': self.replication },
|
||||
'yahooCampaigns' : { 'partitions': 20, 'replication-factor': self.replication },
|
||||
'yahooEvents' : { 'partitions': 20, 'replication-factor': self.replication }
|
||||
})
|
||||
self.kafka.log_level = "INFO"
|
||||
self.kafka.start()
|
||||
|
||||
|
||||
load_test = ""
|
||||
if test == ALL_TEST:
|
||||
load_test = "load-two"
|
||||
if test in STREAMS_JOIN_TESTS or test == STREAMS_JOIN_TEST:
|
||||
load_test = "load-two"
|
||||
if test in STREAMS_COUNT_TESTS or test == STREAMS_COUNT_TEST:
|
||||
load_test = "load-one"
|
||||
if test in STREAMS_SIMPLE_TESTS or test == STREAMS_SIMPLE_TEST:
|
||||
load_test = "load-one"
|
||||
if test in NON_STREAMS_TESTS:
|
||||
load_test = "load-one"
|
||||
|
||||
|
||||
|
||||
################
|
||||
# LOAD PHASE
|
||||
################
|
||||
self.load_driver = StreamsSimpleBenchmarkService(self.test_context,
|
||||
self.kafka,
|
||||
load_test,
|
||||
self.num_threads,
|
||||
self.num_records,
|
||||
self.key_skew,
|
||||
self.value_size)
|
||||
|
||||
self.load_driver.start()
|
||||
self.load_driver.wait(3600) # wait at most 60 minutes
|
||||
self.load_driver.stop()
|
||||
|
||||
if test == ALL_TEST:
|
||||
for single_test in STREAMS_SIMPLE_TESTS + STREAMS_COUNT_TESTS + STREAMS_JOIN_TESTS:
|
||||
self.execute(single_test, scale)
|
||||
elif test == STREAMS_SIMPLE_TEST:
|
||||
for single_test in STREAMS_SIMPLE_TESTS:
|
||||
self.execute(single_test, scale)
|
||||
elif test == STREAMS_COUNT_TEST:
|
||||
for single_test in STREAMS_COUNT_TESTS:
|
||||
self.execute(single_test, scale)
|
||||
elif test == STREAMS_JOIN_TEST:
|
||||
for single_test in STREAMS_JOIN_TESTS:
|
||||
self.execute(single_test, scale)
|
||||
else:
|
||||
self.execute(test, scale)
|
||||
|
||||
return self.final
|
||||
|
||||
def execute(self, test, scale):
|
||||
|
||||
################
|
||||
# RUN PHASE
|
||||
################
|
||||
for num in range(0, scale):
|
||||
self.driver[num] = StreamsSimpleBenchmarkService(self.test_context,
|
||||
self.kafka,
|
||||
test,
|
||||
self.num_threads,
|
||||
self.num_records,
|
||||
self.key_skew,
|
||||
self.value_size)
|
||||
self.driver[num].start()
|
||||
|
||||
#######################
|
||||
# STOP + COLLECT PHASE
|
||||
#######################
|
||||
data = [None] * (scale)
|
||||
|
||||
for num in range(0, scale):
|
||||
self.driver[num].wait()
|
||||
self.driver[num].stop()
|
||||
self.driver[num].node.account.ssh("grep Performance %s" % self.driver[num].STDOUT_FILE, allow_fail=False)
|
||||
data[num] = self.driver[num].collect_data(self.driver[num].node, "")
|
||||
self.driver[num].read_jmx_output_all_nodes()
|
||||
|
||||
for num in range(0, scale):
|
||||
for key in data[num]:
|
||||
self.final[key + "-" + str(num)] = data[num][key]
|
||||
|
||||
for key in sorted(self.driver[num].jmx_stats[0]):
|
||||
self.logger.info("%s: %s" % (key, self.driver[num].jmx_stats[0][key]))
|
||||
|
||||
self.final[test + "-jmx-avg-" + str(num)] = self.driver[num].average_jmx_value
|
||||
self.final[test + "-jmx-max-" + str(num)] = self.driver[num].maximum_jmx_value
|
||||
14
tests/kafkatest/directory_layout/__init__.py
Normal file
@@ -0,0 +1,14 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
137
tests/kafkatest/directory_layout/kafka_path.py
Normal file
@@ -0,0 +1,137 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import importlib
|
||||
import os
|
||||
|
||||
from kafkatest.version import get_version, KafkaVersion, DEV_BRANCH
|
||||
|
||||
|
||||
"""This module serves a few purposes:
|
||||
|
||||
First, it gathers information about path layout in a single place, and second, it
|
||||
makes the layout of the Kafka installation pluggable, so that users are not forced
|
||||
to use the layout assumed in the KafkaPathResolver class.
|
||||
|
||||
To run system tests using your own path resolver, use for example:
|
||||
|
||||
ducktape <TEST_PATH> --globals '{"kafka-path-resolver": "my.path.resolver.CustomResolverClass"}'
|
||||
"""
|
||||
|
||||
SCRATCH_ROOT = "/mnt"
|
||||
KAFKA_INSTALL_ROOT = "/opt"
|
||||
KAFKA_PATH_RESOLVER_KEY = "kafka-path-resolver"
|
||||
KAFKA_PATH_RESOLVER = "kafkatest.directory_layout.kafka_path.KafkaSystemTestPathResolver"
|
||||
|
||||
# Variables for jar path resolution
|
||||
CORE_JAR_NAME = "core"
|
||||
CORE_LIBS_JAR_NAME = "core-libs"
|
||||
CORE_DEPENDANT_TEST_LIBS_JAR_NAME = "core-dependant-testlibs"
|
||||
TOOLS_JAR_NAME = "tools"
|
||||
TOOLS_DEPENDANT_TEST_LIBS_JAR_NAME = "tools-dependant-libs"
|
||||
|
||||
JARS = {
|
||||
"dev": {
|
||||
CORE_JAR_NAME: "core/build/*/*.jar",
|
||||
CORE_LIBS_JAR_NAME: "core/build/libs/*.jar",
|
||||
CORE_DEPENDANT_TEST_LIBS_JAR_NAME: "core/build/dependant-testlibs/*.jar",
|
||||
TOOLS_JAR_NAME: "tools/build/libs/kafka-tools*.jar",
|
||||
TOOLS_DEPENDANT_TEST_LIBS_JAR_NAME: "tools/build/dependant-libs*/*.jar"
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
def create_path_resolver(context, project="kafka"):
|
||||
"""Factory for generating a path resolver class
|
||||
|
||||
This will first check for a fully qualified path resolver classname in context.globals.
|
||||
|
||||
If present, construct a new instance, else default to KafkaSystemTestPathResolver
|
||||
"""
|
||||
assert project is not None
|
||||
|
||||
if KAFKA_PATH_RESOLVER_KEY in context.globals:
|
||||
resolver_fully_qualified_classname = context.globals[KAFKA_PATH_RESOLVER_KEY]
|
||||
else:
|
||||
resolver_fully_qualified_classname = KAFKA_PATH_RESOLVER
|
||||
|
||||
# Using the fully qualified classname, import the resolver class
|
||||
(module_name, resolver_class_name) = resolver_fully_qualified_classname.rsplit('.', 1)
|
||||
cluster_mod = importlib.import_module(module_name)
|
||||
path_resolver_class = getattr(cluster_mod, resolver_class_name)
|
||||
path_resolver = path_resolver_class(context, project)
|
||||
|
||||
return path_resolver
|
||||
|
||||
|
||||
class KafkaPathResolverMixin(object):
|
||||
"""Mixin to automatically provide pluggable path resolution functionality to any class using it.
|
||||
|
||||
Keep life simple, and don't add a constructor to this class:
|
||||
Since use of a mixin entails multiple inheritance, it is *much* simpler to reason about the interaction of this
|
||||
class with subclasses if we don't have to worry about method resolution order, constructor signatures etc.
|
||||
"""
|
||||
|
||||
@property
|
||||
def path(self):
|
||||
if not hasattr(self, "_path"):
|
||||
setattr(self, "_path", create_path_resolver(self.context, "kafka"))
|
||||
if hasattr(self.context, "logger") and self.context.logger is not None:
|
||||
self.context.logger.debug("Using path resolver %s" % self._path.__class__.__name__)
|
||||
|
||||
return self._path
|
||||
|
||||
|
||||
class KafkaSystemTestPathResolver(object):
|
||||
"""Path resolver for Kafka system tests which assumes the following layout:
|
||||
|
||||
/opt/kafka-dev # Current version of kafka under test
|
||||
/opt/kafka-0.9.0.1 # Example of an older version of kafka installed from tarball
|
||||
/opt/kafka-<version> # Other previous versions of kafka
|
||||
...
|
||||
"""
|
||||
def __init__(self, context, project="kafka"):
|
||||
self.context = context
|
||||
self.project = project
|
||||
|
||||
def home(self, node_or_version=DEV_BRANCH, project=None):
|
||||
version = self._version(node_or_version)
|
||||
home_dir = project or self.project
|
||||
if version is not None:
|
||||
home_dir += "-%s" % str(version)
|
||||
|
||||
return os.path.join(KAFKA_INSTALL_ROOT, home_dir)
|
||||
|
||||
def bin(self, node_or_version=DEV_BRANCH, project=None):
|
||||
version = self._version(node_or_version)
|
||||
return os.path.join(self.home(version, project=project), "bin")
|
||||
|
||||
def script(self, script_name, node_or_version=DEV_BRANCH, project=None):
|
||||
version = self._version(node_or_version)
|
||||
return os.path.join(self.bin(version, project=project), script_name)
|
||||
|
||||
def jar(self, jar_name, node_or_version=DEV_BRANCH, project=None):
|
||||
version = self._version(node_or_version)
|
||||
return os.path.join(self.home(version, project=project), JARS[str(version)][jar_name])
|
||||
|
||||
def scratch_space(self, service_instance):
|
||||
return os.path.join(SCRATCH_ROOT, service_instance.service_id)
|
||||
|
||||
def _version(self, node_or_version):
|
||||
if isinstance(node_or_version, KafkaVersion):
|
||||
return node_or_version
|
||||
else:
|
||||
return get_version(node_or_version)
|
||||
|
||||
14
tests/kafkatest/sanity_checks/__init__.py
Normal file
@@ -0,0 +1,14 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
99
tests/kafkatest/sanity_checks/test_console_consumer.py
Normal file
@@ -0,0 +1,99 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import time
|
||||
|
||||
from ducktape.mark import matrix
|
||||
from ducktape.mark import parametrize
|
||||
from ducktape.mark.resource import cluster
|
||||
from ducktape.tests.test import Test
|
||||
from ducktape.utils.util import wait_until
|
||||
|
||||
from kafkatest.services.console_consumer import ConsoleConsumer
|
||||
from kafkatest.services.kafka import KafkaService
|
||||
from kafkatest.services.verifiable_producer import VerifiableProducer
|
||||
from kafkatest.services.zookeeper import ZookeeperService
|
||||
from kafkatest.utils.remote_account import line_count, file_exists
|
||||
from kafkatest.version import LATEST_0_8_2
|
||||
|
||||
|
||||
class ConsoleConsumerTest(Test):
|
||||
"""Sanity checks on console consumer service class."""
|
||||
def __init__(self, test_context):
|
||||
super(ConsoleConsumerTest, self).__init__(test_context)
|
||||
|
||||
self.topic = "topic"
|
||||
self.zk = ZookeeperService(test_context, num_nodes=1)
|
||||
self.kafka = KafkaService(self.test_context, num_nodes=1, zk=self.zk, zk_chroot="/kafka",
|
||||
topics={self.topic: {"partitions": 1, "replication-factor": 1}})
|
||||
self.consumer = ConsoleConsumer(self.test_context, num_nodes=1, kafka=self.kafka, topic=self.topic)
|
||||
|
||||
def setUp(self):
|
||||
self.zk.start()
|
||||
|
||||
@cluster(num_nodes=3)
|
||||
@matrix(security_protocol=['PLAINTEXT', 'SSL'])
|
||||
@cluster(num_nodes=4)
|
||||
@matrix(security_protocol=['SASL_SSL'], sasl_mechanism=['PLAIN', 'SCRAM-SHA-256', 'SCRAM-SHA-512'])
|
||||
@matrix(security_protocol=['SASL_PLAINTEXT', 'SASL_SSL'])
|
||||
def test_lifecycle(self, security_protocol, sasl_mechanism='GSSAPI'):
|
||||
"""Check that console consumer starts/stops properly, and that we are capturing log output."""
|
||||
|
||||
self.kafka.security_protocol = security_protocol
|
||||
self.kafka.client_sasl_mechanism = sasl_mechanism
|
||||
self.kafka.interbroker_sasl_mechanism = sasl_mechanism
|
||||
self.kafka.start()
|
||||
|
||||
self.consumer.security_protocol = security_protocol
|
||||
|
||||
t0 = time.time()
|
||||
self.consumer.start()
|
||||
node = self.consumer.nodes[0]
|
||||
|
||||
wait_until(lambda: self.consumer.alive(node),
|
||||
timeout_sec=20, backoff_sec=.2, err_msg="Consumer was too slow to start")
|
||||
self.logger.info("consumer started in %s seconds " % str(time.time() - t0))
|
||||
|
||||
# Verify that log output is happening
|
||||
wait_until(lambda: file_exists(node, ConsoleConsumer.LOG_FILE), timeout_sec=10,
|
||||
err_msg="Timed out waiting for consumer log file to exist.")
|
||||
wait_until(lambda: line_count(node, ConsoleConsumer.LOG_FILE) > 0, timeout_sec=1,
|
||||
backoff_sec=.25, err_msg="Timed out waiting for log entries to start.")
|
||||
|
||||
# Verify no consumed messages
|
||||
assert line_count(node, ConsoleConsumer.STDOUT_CAPTURE) == 0
|
||||
|
||||
self.consumer.stop_node(node)
|
||||
|
||||
@cluster(num_nodes=4)
|
||||
def test_version(self):
|
||||
"""Check that console consumer v0.8.2.X successfully starts and consumes messages."""
|
||||
self.kafka.start()
|
||||
|
||||
num_messages = 1000
|
||||
self.producer = VerifiableProducer(self.test_context, num_nodes=1, kafka=self.kafka, topic=self.topic,
|
||||
max_messages=num_messages, throughput=1000)
|
||||
self.producer.start()
|
||||
self.producer.wait()
|
||||
|
||||
self.consumer.nodes[0].version = LATEST_0_8_2
|
||||
self.consumer.new_consumer = False
|
||||
self.consumer.consumer_timeout_ms = 1000
|
||||
self.consumer.start()
|
||||
self.consumer.wait()
|
||||
|
||||
num_consumed = len(self.consumer.messages_consumed[1])
|
||||
num_produced = self.producer.num_acked
|
||||
assert num_produced == num_consumed, "num_produced: %d, num_consumed: %d" % (num_produced, num_consumed)
|
||||
58
tests/kafkatest/sanity_checks/test_kafka_version.py
Normal file
@@ -0,0 +1,58 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from ducktape.tests.test import Test
|
||||
from ducktape.mark.resource import cluster
|
||||
|
||||
from kafkatest.services.kafka import KafkaService, config_property
|
||||
from kafkatest.services.zookeeper import ZookeeperService
|
||||
from kafkatest.utils import is_version
|
||||
from kafkatest.version import LATEST_0_8_2, DEV_BRANCH
|
||||
|
||||
|
||||
class KafkaVersionTest(Test):
|
||||
"""Sanity checks on kafka versioning."""
|
||||
def __init__(self, test_context):
|
||||
super(KafkaVersionTest, self).__init__(test_context)
|
||||
|
||||
self.topic = "topic"
|
||||
self.zk = ZookeeperService(test_context, num_nodes=1)
|
||||
|
||||
def setUp(self):
|
||||
self.zk.start()
|
||||
|
||||
@cluster(num_nodes=2)
|
||||
def test_0_8_2(self):
|
||||
"""Test kafka service node-versioning api - verify that we can bring up a single-node 0.8.2.X cluster."""
|
||||
self.kafka = KafkaService(self.test_context, num_nodes=1, zk=self.zk,
|
||||
topics={self.topic: {"partitions": 1, "replication-factor": 1}})
|
||||
node = self.kafka.nodes[0]
|
||||
node.version = LATEST_0_8_2
|
||||
self.kafka.start()
|
||||
|
||||
assert is_version(node, [LATEST_0_8_2], logger=self.logger)
|
||||
|
||||
@cluster(num_nodes=3)
|
||||
def test_multi_version(self):
|
||||
"""Test kafka service node-versioning api - ensure we can bring up a 2-node cluster, one on version 0.8.2.X,
|
||||
the other on the current development branch."""
|
||||
self.kafka = KafkaService(self.test_context, num_nodes=2, zk=self.zk,
|
||||
topics={self.topic: {"partitions": 1, "replication-factor": 2}})
|
||||
self.kafka.nodes[1].version = LATEST_0_8_2
|
||||
self.kafka.nodes[1].config[config_property.INTER_BROKER_PROTOCOL_VERSION] = "0.8.2.X"
|
||||
self.kafka.start()
|
||||
|
||||
assert is_version(self.kafka.nodes[0], [DEV_BRANCH.vstring], logger=self.logger)
|
||||
assert is_version(self.kafka.nodes[1], [LATEST_0_8_2], logger=self.logger)
|
||||
90
tests/kafkatest/sanity_checks/test_performance_services.py
Normal file
@@ -0,0 +1,90 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from ducktape.mark import parametrize
|
||||
from ducktape.mark.resource import cluster
|
||||
from ducktape.tests.test import Test
|
||||
|
||||
from kafkatest.services.kafka import KafkaService
|
||||
from kafkatest.services.performance import ProducerPerformanceService, ConsumerPerformanceService, EndToEndLatencyService
|
||||
from kafkatest.services.performance import latency, compute_aggregate_throughput
|
||||
from kafkatest.services.zookeeper import ZookeeperService
|
||||
from kafkatest.version import DEV_BRANCH, LATEST_0_8_2, LATEST_0_9, LATEST_1_1, KafkaVersion
|
||||
|
||||
|
||||
class PerformanceServiceTest(Test):
|
||||
def __init__(self, test_context):
|
||||
super(PerformanceServiceTest, self).__init__(test_context)
|
||||
self.record_size = 100
|
||||
self.num_records = 10000
|
||||
self.topic = "topic"
|
||||
|
||||
self.zk = ZookeeperService(test_context, 1)
|
||||
|
||||
def setUp(self):
|
||||
self.zk.start()
|
||||
|
||||
@cluster(num_nodes=5)
|
||||
# We are keeping 0.8.2 here so that we don't inadvertently break support for it. Since this is just a sanity check,
|
||||
# the overhead should be manageable.
|
||||
@parametrize(version=str(LATEST_0_8_2), new_consumer=False)
|
||||
@parametrize(version=str(LATEST_0_9), new_consumer=False)
|
||||
@parametrize(version=str(LATEST_0_9))
|
||||
@parametrize(version=str(LATEST_1_1), new_consumer=False)
|
||||
@parametrize(version=str(DEV_BRANCH))
|
||||
def test_version(self, version=str(LATEST_0_9), new_consumer=True):
|
||||
"""
|
||||
Sanity check our producer performance service - verify that we can run the service with a small
|
||||
number of messages. The actual stats here are pretty meaningless since the number of messages is quite small.
|
||||
"""
|
||||
version = KafkaVersion(version)
|
||||
self.kafka = KafkaService(
|
||||
self.test_context, 1,
|
||||
self.zk, topics={self.topic: {'partitions': 1, 'replication-factor': 1}}, version=version)
|
||||
self.kafka.start()
|
||||
|
||||
# check basic run of producer performance
|
||||
self.producer_perf = ProducerPerformanceService(
|
||||
self.test_context, 1, self.kafka, topic=self.topic,
|
||||
num_records=self.num_records, record_size=self.record_size,
|
||||
throughput=1000000000, # Set impossibly high so there is no throttling, for equivalent behavior between 0.8.X and 0.9.X
|
||||
version=version,
|
||||
settings={
|
||||
'acks': 1,
|
||||
'batch.size': 8*1024,
|
||||
'buffer.memory': 64*1024*1024})
|
||||
self.producer_perf.run()
|
||||
producer_perf_data = compute_aggregate_throughput(self.producer_perf)
|
||||
|
||||
# check basic run of end to end latency
|
||||
self.end_to_end = EndToEndLatencyService(
|
||||
self.test_context, 1, self.kafka,
|
||||
topic=self.topic, num_records=self.num_records, version=version)
|
||||
self.end_to_end.run()
|
||||
end_to_end_data = latency(self.end_to_end.results[0]['latency_50th_ms'], self.end_to_end.results[0]['latency_99th_ms'], self.end_to_end.results[0]['latency_999th_ms'])
|
||||
|
||||
# check basic run of consumer performance service
|
||||
self.consumer_perf = ConsumerPerformanceService(
|
||||
self.test_context, 1, self.kafka, new_consumer=new_consumer,
|
||||
topic=self.topic, version=version, messages=self.num_records)
|
||||
self.consumer_perf.group = "test-consumer-group"
|
||||
self.consumer_perf.run()
|
||||
consumer_perf_data = compute_aggregate_throughput(self.consumer_perf)
|
||||
|
||||
return {
|
||||
"producer_performance": producer_perf_data,
|
||||
"end_to_end_latency": end_to_end_data,
|
||||
"consumer_performance": consumer_perf_data
|
||||
}
|
||||
84
tests/kafkatest/sanity_checks/test_verifiable_producer.py
Normal file
@@ -0,0 +1,84 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
from ducktape.mark import parametrize
|
||||
from ducktape.mark.resource import cluster
|
||||
from ducktape.tests.test import Test
|
||||
from ducktape.utils.util import wait_until
|
||||
|
||||
from kafkatest.services.kafka import KafkaService
|
||||
from kafkatest.services.verifiable_producer import VerifiableProducer
|
||||
from kafkatest.services.zookeeper import ZookeeperService
|
||||
from kafkatest.utils import is_version
|
||||
from kafkatest.version import LATEST_0_8_2, LATEST_0_9, LATEST_0_10_0, LATEST_0_10_1, DEV_BRANCH, KafkaVersion
|
||||
|
||||
|
||||
class TestVerifiableProducer(Test):
|
||||
"""Sanity checks on verifiable producer service class."""
|
||||
def __init__(self, test_context):
|
||||
super(TestVerifiableProducer, self).__init__(test_context)
|
||||
|
||||
self.topic = "topic"
|
||||
self.zk = ZookeeperService(test_context, num_nodes=1)
|
||||
self.kafka = KafkaService(test_context, num_nodes=1, zk=self.zk,
|
||||
topics={self.topic: {"partitions": 1, "replication-factor": 1}})
|
||||
|
||||
self.num_messages = 1000
|
||||
# This will produce to source kafka cluster
|
||||
self.producer = VerifiableProducer(test_context, num_nodes=1, kafka=self.kafka, topic=self.topic,
|
||||
max_messages=self.num_messages, throughput=self.num_messages/5)
|
||||
|
||||
def setUp(self):
|
||||
self.zk.start()
|
||||
self.kafka.start()
|
||||
|
||||
@cluster(num_nodes=3)
|
||||
@parametrize(producer_version=str(LATEST_0_8_2))
|
||||
@parametrize(producer_version=str(LATEST_0_9))
|
||||
@parametrize(producer_version=str(LATEST_0_10_0))
|
||||
@parametrize(producer_version=str(LATEST_0_10_1))
|
||||
@parametrize(producer_version=str(DEV_BRANCH))
|
||||
def test_simple_run(self, producer_version=DEV_BRANCH):
|
||||
"""
|
||||
Test that we can start VerifiableProducer on the current branch snapshot version or against the 0.8.2 jar, and
|
||||
verify that we can produce a small number of messages.
|
||||
"""
|
||||
node = self.producer.nodes[0]
|
||||
node.version = KafkaVersion(producer_version)
|
||||
self.producer.start()
|
||||
wait_until(lambda: self.producer.num_acked > 5, timeout_sec=5,
|
||||
err_msg="Producer failed to start in a reasonable amount of time.")
|
||||
|
||||
# using version.vstring (distutils.version.LooseVersion) is a tricky way of ensuring
|
||||
# that this check works with DEV_BRANCH
|
||||
# When running VerifiableProducer 0.8.X, both the current branch version and 0.8.X should show up because of the
|
||||
# way verifiable producer pulls in some development directories into its classpath
|
||||
#
|
||||
# If the test fails here because 'ps .. | grep' couldn't find the process it means
|
||||
# the login and grep that is_version() performs take longer than
|
||||
# the time it takes the producer to produce its messages.
|
||||
# An easy fix is to decrease throughput= above; the better fix is to make the producer
|
||||
# not terminate until explicitly killed in this case.
|
||||
if node.version <= LATEST_0_8_2:
|
||||
assert is_version(node, [node.version.vstring, DEV_BRANCH.vstring], logger=self.logger)
|
||||
else:
|
||||
assert is_version(node, [node.version.vstring], logger=self.logger)
|
||||
|
||||
self.producer.wait()
|
||||
num_produced = self.producer.num_acked
|
||||
assert num_produced == self.num_messages, "num_produced: %d, num_messages: %d" % (num_produced, self.num_messages)
|
||||
|
||||
|
||||
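The timing margin that the comments in test_simple_run rely on can be made explicit. A quick back-of-the-envelope check using only the values configured in this test class (illustrative arithmetic, not part of the test itself):

```
# Illustrative only: how long the producer is expected to keep running, which is the
# window the is_version() check in test_simple_run has to complete within.
num_messages = 1000
throughput = num_messages / 5            # 200 messages per second, as configured above
expected_runtime_sec = num_messages / float(throughput)
print(expected_runtime_sec)              # 5.0; lowering throughput widens the window
```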
14
tests/kafkatest/services/__init__.py
Normal file
@@ -0,0 +1,14 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
519
tests/kafkatest/services/connect.py
Normal file
@@ -0,0 +1,519 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import json
|
||||
import os.path
|
||||
import random
|
||||
import signal
|
||||
import time
|
||||
|
||||
import requests
|
||||
from ducktape.errors import DucktapeError
|
||||
from ducktape.services.service import Service
|
||||
from ducktape.utils.util import wait_until
|
||||
|
||||
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
|
||||
|
||||
|
||||
class ConnectServiceBase(KafkaPathResolverMixin, Service):
|
||||
"""Base class for Kafka Connect services providing some common settings and functionality"""
|
||||
|
||||
PERSISTENT_ROOT = "/mnt/connect"
|
||||
CONFIG_FILE = os.path.join(PERSISTENT_ROOT, "connect.properties")
|
||||
# The log file contains normal log4j logs written using a file appender. stdout and stderr are handled separately
|
||||
# so they can be used for other output, e.g. verifiable source & sink.
|
||||
LOG_FILE = os.path.join(PERSISTENT_ROOT, "connect.log")
|
||||
STDOUT_FILE = os.path.join(PERSISTENT_ROOT, "connect.stdout")
|
||||
STDERR_FILE = os.path.join(PERSISTENT_ROOT, "connect.stderr")
|
||||
LOG4J_CONFIG_FILE = os.path.join(PERSISTENT_ROOT, "connect-log4j.properties")
|
||||
PID_FILE = os.path.join(PERSISTENT_ROOT, "connect.pid")
|
||||
EXTERNAL_CONFIGS_FILE = os.path.join(PERSISTENT_ROOT, "connect-external-configs.properties")
|
||||
CONNECT_REST_PORT = 8083
|
||||
HEAP_DUMP_FILE = os.path.join(PERSISTENT_ROOT, "connect_heap_dump.bin")
|
||||
|
||||
# The Connect worker currently supports four startup wait modes:
|
||||
STARTUP_MODE_INSTANT = 'INSTANT'
|
||||
"""STARTUP_MODE_INSTANT: Start Connect worker and return immediately"""
|
||||
STARTUP_MODE_LOAD = 'LOAD'
|
||||
"""STARTUP_MODE_LOAD: Start Connect worker and return after discovering and loading plugins"""
|
||||
STARTUP_MODE_LISTEN = 'LISTEN'
|
||||
"""STARTUP_MODE_LISTEN: Start Connect worker and return after opening the REST port."""
|
||||
STARTUP_MODE_JOIN = 'JOIN'
|
||||
"""STARTUP_MODE_JOIN: Start Connect worker and return after joining the group."""
|
||||
|
||||
logs = {
|
||||
"connect_log": {
|
||||
"path": LOG_FILE,
|
||||
"collect_default": True},
|
||||
"connect_stdout": {
|
||||
"path": STDOUT_FILE,
|
||||
"collect_default": False},
|
||||
"connect_stderr": {
|
||||
"path": STDERR_FILE,
|
||||
"collect_default": True},
|
||||
"connect_heap_dump_file": {
|
||||
"path": HEAP_DUMP_FILE,
|
||||
"collect_default": True}
|
||||
}
|
||||
|
||||
def __init__(self, context, num_nodes, kafka, files, startup_timeout_sec = 60):
|
||||
super(ConnectServiceBase, self).__init__(context, num_nodes)
|
||||
self.kafka = kafka
|
||||
self.security_config = kafka.security_config.client_config()
|
||||
self.files = files
|
||||
self.startup_mode = self.STARTUP_MODE_LISTEN
|
||||
self.startup_timeout_sec = startup_timeout_sec
|
||||
self.environment = {}
|
||||
self.external_config_template_func = None
|
||||
|
||||
def pids(self, node):
|
||||
"""Return process ids for Kafka Connect processes."""
|
||||
try:
|
||||
return [pid for pid in node.account.ssh_capture("cat " + self.PID_FILE, callback=int)]
|
||||
except:
|
||||
return []
|
||||
|
||||
def set_configs(self, config_template_func, connector_config_templates=None):
|
||||
"""
|
||||
Set configurations for the worker and the connector to run on
|
||||
it. These are not provided in the constructor because the worker
|
||||
config generally needs access to ZK/Kafka services to
|
||||
create the configuration.
|
||||
"""
|
||||
self.config_template_func = config_template_func
|
||||
self.connector_config_templates = connector_config_templates
|
||||
|
||||
def set_external_configs(self, external_config_template_func):
|
||||
"""
|
||||
Set the properties that will be written in the external file properties
|
||||
as used by the org.apache.kafka.common.config.provider.FileConfigProvider.
|
||||
When this is used, the worker configuration must also enable the FileConfigProvider.
|
||||
This is not provided in the constructor because the worker
|
||||
config generally needs access to ZK/Kafka services to
|
||||
create the configuration.
|
||||
"""
|
||||
self.external_config_template_func = external_config_template_func
|
||||
|
||||
def listening(self, node):
|
||||
try:
|
||||
self.list_connectors(node)
|
||||
self.logger.debug("Connect worker started serving REST at: '%s:%s')", node.account.hostname,
|
||||
self.CONNECT_REST_PORT)
|
||||
return True
|
||||
except requests.exceptions.ConnectionError:
|
||||
self.logger.debug("REST resources are not loaded yet")
|
||||
return False
|
||||
|
||||
def start(self, mode=None):
|
||||
if mode:
|
||||
self.startup_mode = mode
|
||||
super(ConnectServiceBase, self).start()
|
||||
|
||||
def start_and_return_immediately(self, node, worker_type, remote_connector_configs):
|
||||
cmd = self.start_cmd(node, remote_connector_configs)
|
||||
self.logger.debug("Connect %s command: %s", worker_type, cmd)
|
||||
node.account.ssh(cmd)
|
||||
|
||||
def start_and_wait_to_load_plugins(self, node, worker_type, remote_connector_configs):
|
||||
with node.account.monitor_log(self.LOG_FILE) as monitor:
|
||||
self.start_and_return_immediately(node, worker_type, remote_connector_configs)
|
||||
monitor.wait_until('Kafka version', timeout_sec=self.startup_timeout_sec,
|
||||
err_msg="Never saw message indicating Kafka Connect finished startup on node: " +
|
||||
"%s in condition mode: %s" % (str(node.account), self.startup_mode))
|
||||
|
||||
def start_and_wait_to_start_listening(self, node, worker_type, remote_connector_configs):
|
||||
self.start_and_return_immediately(node, worker_type, remote_connector_configs)
|
||||
wait_until(lambda: self.listening(node), timeout_sec=self.startup_timeout_sec,
|
||||
err_msg="Kafka Connect failed to start on node: %s in condition mode: %s" %
|
||||
(str(node.account), self.startup_mode))
|
||||
|
||||
def start_and_wait_to_join_group(self, node, worker_type, remote_connector_configs):
|
||||
if worker_type != 'distributed':
|
||||
raise RuntimeError("Cannot wait for joined group message for %s" % worker_type)
|
||||
with node.account.monitor_log(self.LOG_FILE) as monitor:
|
||||
self.start_and_return_immediately(node, worker_type, remote_connector_configs)
|
||||
monitor.wait_until('Joined group', timeout_sec=self.startup_timeout_sec,
|
||||
err_msg="Never saw message indicating Kafka Connect joined group on node: " +
|
||||
"%s in condition mode: %s" % (str(node.account), self.startup_mode))
|
||||
|
||||
def stop_node(self, node, clean_shutdown=True):
|
||||
self.logger.info((clean_shutdown and "Cleanly" or "Forcibly") + " stopping Kafka Connect on " + str(node.account))
|
||||
pids = self.pids(node)
|
||||
sig = signal.SIGTERM if clean_shutdown else signal.SIGKILL
|
||||
|
||||
for pid in pids:
|
||||
node.account.signal(pid, sig, allow_fail=True)
|
||||
if clean_shutdown:
|
||||
for pid in pids:
|
||||
wait_until(lambda: not node.account.alive(pid), timeout_sec=self.startup_timeout_sec, err_msg="Kafka Connect process on " + str(
|
||||
node.account) + " took too long to exit")
|
||||
|
||||
node.account.ssh("rm -f " + self.PID_FILE, allow_fail=False)
|
||||
|
||||
def restart(self, clean_shutdown=True):
|
||||
# We don't want to do any clean up here, just restart the process.
|
||||
for node in self.nodes:
|
||||
self.logger.info("Restarting Kafka Connect on " + str(node.account))
|
||||
self.restart_node(node, clean_shutdown)
|
||||
|
||||
def restart_node(self, node, clean_shutdown=True):
|
||||
self.stop_node(node, clean_shutdown)
|
||||
self.start_node(node)
|
||||
|
||||
def clean_node(self, node):
|
||||
node.account.kill_process("connect", clean_shutdown=False, allow_fail=True)
|
||||
self.security_config.clean_node(node)
|
||||
other_files = " ".join(self.config_filenames() + self.files)
|
||||
node.account.ssh("rm -rf -- %s %s" % (ConnectServiceBase.PERSISTENT_ROOT, other_files), allow_fail=False)
|
||||
|
||||
def config_filenames(self):
|
||||
return [os.path.join(self.PERSISTENT_ROOT, "connect-connector-" + str(idx) + ".properties") for idx, template in enumerate(self.connector_config_templates or [])]
|
||||
|
||||
def list_connectors(self, node=None, **kwargs):
|
||||
return self._rest_with_retry('/connectors', node=node, **kwargs)
|
||||
|
||||
def create_connector(self, config, node=None, **kwargs):
|
||||
create_request = {
|
||||
'name': config['name'],
|
||||
'config': config
|
||||
}
|
||||
return self._rest_with_retry('/connectors', create_request, node=node, method="POST", **kwargs)
|
||||
|
||||
def get_connector(self, name, node=None, **kwargs):
|
||||
return self._rest_with_retry('/connectors/' + name, node=node, **kwargs)
|
||||
|
||||
def get_connector_config(self, name, node=None, **kwargs):
|
||||
return self._rest_with_retry('/connectors/' + name + '/config', node=node, **kwargs)
|
||||
|
||||
def set_connector_config(self, name, config, node=None, **kwargs):
|
||||
# Unlike many other calls, a 409 when setting a connector config is expected if the connector already exists.
|
||||
# However, we also might see 409s for other reasons (e.g. rebalancing). So we still perform retries at the cost
|
||||
# of tests possibly taking longer to ultimately fail. Tests that care about this can explicitly override the
|
||||
# number of retries.
|
||||
return self._rest_with_retry('/connectors/' + name + '/config', config, node=node, method="PUT", **kwargs)
|
||||
|
||||
def get_connector_tasks(self, name, node=None, **kwargs):
|
||||
return self._rest_with_retry('/connectors/' + name + '/tasks', node=node, **kwargs)
|
||||
|
||||
def delete_connector(self, name, node=None, **kwargs):
|
||||
return self._rest_with_retry('/connectors/' + name, node=node, method="DELETE", **kwargs)
|
||||
|
||||
def get_connector_status(self, name, node=None):
|
||||
return self._rest('/connectors/' + name + '/status', node=node)
|
||||
|
||||
def restart_connector(self, name, node=None, **kwargs):
|
||||
return self._rest_with_retry('/connectors/' + name + '/restart', node=node, method="POST", **kwargs)
|
||||
|
||||
def restart_task(self, connector_name, task_id, node=None):
|
||||
return self._rest('/connectors/' + connector_name + '/tasks/' + str(task_id) + '/restart', node=node, method="POST")
|
||||
|
||||
def pause_connector(self, name, node=None):
|
||||
return self._rest('/connectors/' + name + '/pause', node=node, method="PUT")
|
||||
|
||||
def resume_connector(self, name, node=None):
|
||||
return self._rest('/connectors/' + name + '/resume', node=node, method="PUT")
|
||||
|
||||
def list_connector_plugins(self, node=None):
|
||||
return self._rest('/connector-plugins/', node=node)
|
||||
|
||||
def validate_config(self, connector_type, validate_request, node=None):
|
||||
return self._rest('/connector-plugins/' + connector_type + '/config/validate', validate_request, node=node, method="PUT")
|
||||
|
||||
def _rest(self, path, body=None, node=None, method="GET"):
|
||||
if node is None:
|
||||
node = random.choice(self.nodes)
|
||||
|
||||
meth = getattr(requests, method.lower())
|
||||
url = self._base_url(node) + path
|
||||
self.logger.debug("Kafka Connect REST request: %s %s %s %s", node.account.hostname, url, method, body)
|
||||
resp = meth(url, json=body)
|
||||
self.logger.debug("%s %s response: %d", url, method, resp.status_code)
|
||||
if resp.status_code > 400:
|
||||
self.logger.debug("Connect REST API error for %s: %d %s", resp.url, resp.status_code, resp.text)
|
||||
raise ConnectRestError(resp.status_code, resp.text, resp.url)
|
||||
if resp.status_code == 204 or resp.status_code == 202:
|
||||
return None
|
||||
else:
|
||||
return resp.json()
|
||||
|
||||
def _rest_with_retry(self, path, body=None, node=None, method="GET", retries=40, retry_backoff=.25):
|
||||
"""
|
||||
Invokes a REST API with retries for errors that may occur during normal operation (notably 409 CONFLICT
|
||||
responses that can occur due to rebalancing or 404 when the connect resources are not initialized yet).
|
||||
"""
|
||||
exception_to_throw = None
|
||||
for i in range(0, retries + 1):
|
||||
try:
|
||||
return self._rest(path, body, node, method)
|
||||
except ConnectRestError as e:
|
||||
exception_to_throw = e
|
||||
if e.status != 409 and e.status != 404:
|
||||
break
|
||||
time.sleep(retry_backoff)
|
||||
raise exception_to_throw
|
||||
|
||||
def _base_url(self, node):
|
||||
return 'http://' + node.account.externally_routable_ip + ':' + str(self.CONNECT_REST_PORT)
|
||||
|
||||
def append_to_environment_variable(self, envvar, value):
|
||||
env_opts = self.environment[envvar]
|
||||
if env_opts is None:
|
||||
env_opts = "\"%s\"" % value
|
||||
else:
|
||||
env_opts = "\"%s %s\"" % (env_opts.strip('\"'), value)
|
||||
self.environment[envvar] = env_opts
|
||||
|
||||
|
||||
class ConnectStandaloneService(ConnectServiceBase):
|
||||
"""Runs Kafka Connect in standalone mode."""
|
||||
|
||||
def __init__(self, context, kafka, files, startup_timeout_sec = 60):
|
||||
super(ConnectStandaloneService, self).__init__(context, 1, kafka, files, startup_timeout_sec)
|
||||
|
||||
# For convenience since this service only makes sense with a single node
|
||||
@property
|
||||
def node(self):
|
||||
return self.nodes[0]
|
||||
|
||||
def start_cmd(self, node, connector_configs):
|
||||
cmd = "( export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%s\"; " % self.LOG4J_CONFIG_FILE
|
||||
heap_kafka_opts = "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=%s" % \
|
||||
self.logs["connect_heap_dump_file"]["path"]
|
||||
other_kafka_opts = self.security_config.kafka_opts.strip('\"')
|
||||
cmd += "export KAFKA_OPTS=\"%s %s\"; " % (heap_kafka_opts, other_kafka_opts)
|
||||
for envvar in self.environment:
|
||||
cmd += "export %s=%s; " % (envvar, str(self.environment[envvar]))
|
||||
cmd += "%s %s " % (self.path.script("connect-standalone.sh", node), self.CONFIG_FILE)
|
||||
cmd += " ".join(connector_configs)
|
||||
cmd += " & echo $! >&3 ) 1>> %s 2>> %s 3> %s" % (self.STDOUT_FILE, self.STDERR_FILE, self.PID_FILE)
|
||||
return cmd
|
||||
|
||||
def start_node(self, node):
|
||||
node.account.ssh("mkdir -p %s" % self.PERSISTENT_ROOT, allow_fail=False)
|
||||
|
||||
self.security_config.setup_node(node)
|
||||
if self.external_config_template_func:
|
||||
node.account.create_file(self.EXTERNAL_CONFIGS_FILE, self.external_config_template_func(node))
|
||||
node.account.create_file(self.CONFIG_FILE, self.config_template_func(node))
|
||||
node.account.create_file(self.LOG4J_CONFIG_FILE, self.render('connect_log4j.properties', log_file=self.LOG_FILE))
|
||||
remote_connector_configs = []
|
||||
for idx, template in enumerate(self.connector_config_templates):
|
||||
target_file = os.path.join(self.PERSISTENT_ROOT, "connect-connector-" + str(idx) + ".properties")
|
||||
node.account.create_file(target_file, template)
|
||||
remote_connector_configs.append(target_file)
|
||||
|
||||
self.logger.info("Starting Kafka Connect standalone process on " + str(node.account))
|
||||
if self.startup_mode == self.STARTUP_MODE_LOAD:
|
||||
self.start_and_wait_to_load_plugins(node, 'standalone', remote_connector_configs)
|
||||
elif self.startup_mode == self.STARTUP_MODE_INSTANT:
|
||||
self.start_and_return_immediately(node, 'standalone', remote_connector_configs)
|
||||
elif self.startup_mode == self.STARTUP_MODE_JOIN:
|
||||
self.start_and_wait_to_join_group(node, 'standalone', remote_connector_configs)
|
||||
else:
|
||||
# The default mode is to wait until the complete startup of the worker
|
||||
self.start_and_wait_to_start_listening(node, 'standalone', remote_connector_configs)
|
||||
|
||||
if len(self.pids(node)) == 0:
|
||||
raise RuntimeError("No process ids recorded")
|
||||
|
||||
|
||||
class ConnectDistributedService(ConnectServiceBase):
|
||||
"""Runs Kafka Connect in distributed mode."""
|
||||
|
||||
def __init__(self, context, num_nodes, kafka, files, offsets_topic="connect-offsets",
|
||||
configs_topic="connect-configs", status_topic="connect-status", startup_timeout_sec = 60):
|
||||
super(ConnectDistributedService, self).__init__(context, num_nodes, kafka, files, startup_timeout_sec)
|
||||
self.startup_mode = self.STARTUP_MODE_JOIN
|
||||
self.offsets_topic = offsets_topic
|
||||
self.configs_topic = configs_topic
|
||||
self.status_topic = status_topic
|
||||
|
||||
# connector_configs argument is intentionally ignored in distributed service.
|
||||
def start_cmd(self, node, connector_configs):
|
||||
cmd = "( export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%s\"; " % self.LOG4J_CONFIG_FILE
|
||||
heap_kafka_opts = "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=%s" % \
|
||||
self.logs["connect_heap_dump_file"]["path"]
|
||||
other_kafka_opts = self.security_config.kafka_opts.strip('\"')
|
||||
cmd += "export KAFKA_OPTS=\"%s %s\"; " % (heap_kafka_opts, other_kafka_opts)
|
||||
for envvar in self.environment:
|
||||
cmd += "export %s=%s; " % (envvar, str(self.environment[envvar]))
|
||||
cmd += "%s %s " % (self.path.script("connect-distributed.sh", node), self.CONFIG_FILE)
|
||||
cmd += " & echo $! >&3 ) 1>> %s 2>> %s 3> %s" % (self.STDOUT_FILE, self.STDERR_FILE, self.PID_FILE)
|
||||
return cmd
|
||||
|
||||
def start_node(self, node):
|
||||
node.account.ssh("mkdir -p %s" % self.PERSISTENT_ROOT, allow_fail=False)
|
||||
|
||||
self.security_config.setup_node(node)
|
||||
if self.external_config_template_func:
|
||||
node.account.create_file(self.EXTERNAL_CONFIGS_FILE, self.external_config_template_func(node))
|
||||
node.account.create_file(self.CONFIG_FILE, self.config_template_func(node))
|
||||
node.account.create_file(self.LOG4J_CONFIG_FILE, self.render('connect_log4j.properties', log_file=self.LOG_FILE))
|
||||
if self.connector_config_templates:
|
||||
raise DucktapeError("Config files are not valid in distributed mode, submit connectors via the REST API")
|
||||
|
||||
self.logger.info("Starting Kafka Connect distributed process on " + str(node.account))
|
||||
if self.startup_mode == self.STARTUP_MODE_LOAD:
|
||||
self.start_and_wait_to_load_plugins(node, 'distributed', '')
|
||||
elif self.startup_mode == self.STARTUP_MODE_INSTANT:
|
||||
self.start_and_return_immediately(node, 'distributed', '')
|
||||
elif self.startup_mode == self.STARTUP_MODE_LISTEN:
|
||||
self.start_and_wait_to_start_listening(node, 'distributed', '')
|
||||
else:
|
||||
# The default mode is to wait until the complete startup of the worker
|
||||
self.start_and_wait_to_join_group(node, 'distributed', '')
|
||||
|
||||
if len(self.pids(node)) == 0:
|
||||
raise RuntimeError("No process ids recorded")
|
||||
|
||||
|
||||
class ErrorTolerance(object):
|
||||
ALL = "all"
|
||||
NONE = "none"
|
||||
|
||||
|
||||
class ConnectRestError(RuntimeError):
|
||||
def __init__(self, status, msg, url):
|
||||
self.status = status
|
||||
self.message = msg
|
||||
self.url = url
|
||||
|
||||
def __unicode__(self):
|
||||
return "Kafka Connect REST call failed: returned " + self.status + " for " + self.url + ". Response: " + self.message
|
||||
|
||||
|
||||
class VerifiableConnector(object):
|
||||
def messages(self):
|
||||
"""
|
||||
Collect and parse the logs from Kafka Connect nodes. Return a list containing all parsed JSON messages generated by
|
||||
this connector.
|
||||
"""
|
||||
self.logger.info("Collecting messages from log of %s %s", type(self).__name__, self.name)
|
||||
records = []
|
||||
for node in self.cc.nodes:
|
||||
for line in node.account.ssh_capture('cat ' + self.cc.STDOUT_FILE):
|
||||
try:
|
||||
data = json.loads(line)
|
||||
except ValueError:
|
||||
self.logger.debug("Ignoring unparseable line: %s", line)
|
||||
continue
|
||||
# Filter to only ones matching our name to support multiple verifiable producers
|
||||
if data['name'] != self.name:
|
||||
continue
|
||||
data['node'] = node
|
||||
records.append(data)
|
||||
return records
|
||||
|
||||
def stop(self):
|
||||
self.logger.info("Destroying connector %s %s", type(self).__name__, self.name)
|
||||
self.cc.delete_connector(self.name)
|
||||
|
||||
|
||||
class VerifiableSource(VerifiableConnector):
|
||||
"""
|
||||
Helper class for running a verifiable source connector on a Kafka Connect cluster and analyzing the output.
|
||||
"""
|
||||
|
||||
def __init__(self, cc, name="verifiable-source", tasks=1, topic="verifiable", throughput=1000):
|
||||
self.cc = cc
|
||||
self.logger = self.cc.logger
|
||||
self.name = name
|
||||
self.tasks = tasks
|
||||
self.topic = topic
|
||||
self.throughput = throughput
|
||||
|
||||
def committed_messages(self):
|
||||
return filter(lambda m: 'committed' in m and m['committed'], self.messages())
|
||||
|
||||
def sent_messages(self):
|
||||
return filter(lambda m: 'committed' not in m or not m['committed'], self.messages())
|
||||
|
||||
def start(self):
|
||||
self.logger.info("Creating connector VerifiableSourceConnector %s", self.name)
|
||||
self.cc.create_connector({
|
||||
'name': self.name,
|
||||
'connector.class': 'org.apache.kafka.connect.tools.VerifiableSourceConnector',
|
||||
'tasks.max': self.tasks,
|
||||
'topic': self.topic,
|
||||
'throughput': self.throughput
|
||||
})
|
||||
|
||||
|
||||
class VerifiableSink(VerifiableConnector):
|
||||
"""
|
||||
Helper class for running a verifiable sink connector on a Kafka Connect cluster and analyzing the output.
|
||||
"""
|
||||
|
||||
def __init__(self, cc, name="verifiable-sink", tasks=1, topics=["verifiable"]):
|
||||
self.cc = cc
|
||||
self.logger = self.cc.logger
|
||||
self.name = name
|
||||
self.tasks = tasks
|
||||
self.topics = topics
|
||||
|
||||
def flushed_messages(self):
|
||||
return filter(lambda m: 'flushed' in m and m['flushed'], self.messages())
|
||||
|
||||
def received_messages(self):
|
||||
return filter(lambda m: 'flushed' not in m or not m['flushed'], self.messages())
|
||||
|
||||
def start(self):
|
||||
self.logger.info("Creating connector VerifiableSinkConnector %s", self.name)
|
||||
self.cc.create_connector({
|
||||
'name': self.name,
|
||||
'connector.class': 'org.apache.kafka.connect.tools.VerifiableSinkConnector',
|
||||
'tasks.max': self.tasks,
|
||||
'topics': ",".join(self.topics)
|
||||
})
|
||||
|
||||
class MockSink(object):
|
||||
|
||||
def __init__(self, cc, topics, mode=None, delay_sec=10, name="mock-sink"):
|
||||
self.cc = cc
|
||||
self.logger = self.cc.logger
|
||||
self.name = name
|
||||
self.mode = mode
|
||||
self.delay_sec = delay_sec
|
||||
self.topics = topics
|
||||
|
||||
def start(self):
|
||||
self.logger.info("Creating connector MockSinkConnector %s", self.name)
|
||||
self.cc.create_connector({
|
||||
'name': self.name,
|
||||
'connector.class': 'org.apache.kafka.connect.tools.MockSinkConnector',
|
||||
'tasks.max': 1,
|
||||
'topics': ",".join(self.topics),
|
||||
'mock_mode': self.mode,
|
||||
'delay_ms': self.delay_sec * 1000
|
||||
})
|
||||
|
||||
class MockSource(object):
|
||||
|
||||
def __init__(self, cc, mode=None, delay_sec=10, name="mock-source"):
|
||||
self.cc = cc
|
||||
self.logger = self.cc.logger
|
||||
self.name = name
|
||||
self.mode = mode
|
||||
self.delay_sec = delay_sec
|
||||
|
||||
def start(self):
|
||||
self.logger.info("Creating connector MockSourceConnector %s", self.name)
|
||||
self.cc.create_connector({
|
||||
'name': self.name,
|
||||
'connector.class': 'org.apache.kafka.connect.tools.MockSourceConnector',
|
||||
'tasks.max': 1,
|
||||
'mock_mode': self.mode,
|
||||
'delay_ms': self.delay_sec * 1000
|
||||
})
|
||||
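A hedged sketch of how the classes in this file are typically wired together from a ducktape test. The worker template name ('connect-standalone.properties') and the helper function are assumptions for illustration, not part of this service; everything else uses only the APIs defined above.

```
# Hypothetical sketch: run a standalone Connect worker and a VerifiableSource against it.
from ducktape.utils.util import wait_until

from kafkatest.services.connect import ConnectStandaloneService, VerifiableSource


def run_verifiable_source(test_context, kafka):
    cc = ConnectStandaloneService(test_context, kafka, files=[])
    # The worker config template name here is an assumption; tests normally render their own.
    cc.set_configs(lambda node: cc.render('connect-standalone.properties', node=node),
                   connector_config_templates=[])

    # Wait for the REST port before submitting the connector (STARTUP_MODE_LISTEN above).
    cc.start(mode=ConnectStandaloneService.STARTUP_MODE_LISTEN)

    source = VerifiableSource(cc, topic="verifiable", throughput=100)
    source.start()

    # The source logs each record to the worker's stdout; wait until some are committed.
    wait_until(lambda: len(list(source.committed_messages())) > 0, timeout_sec=60,
               err_msg="VerifiableSource never committed any records")

    source.stop()   # deletes the connector via the REST API
    cc.stop()       # stops the Connect worker
```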
315
tests/kafkatest/services/console_consumer.py
Normal file
@@ -0,0 +1,315 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import itertools
|
||||
import os
|
||||
|
||||
from ducktape.cluster.remoteaccount import RemoteCommandError
|
||||
from ducktape.services.background_thread import BackgroundThreadService
|
||||
from ducktape.utils.util import wait_until
|
||||
|
||||
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
|
||||
from kafkatest.services.monitor.jmx import JmxMixin
|
||||
from kafkatest.version import DEV_BRANCH, LATEST_0_8_2, LATEST_0_9, LATEST_0_10_0, V_0_9_0_0, V_0_10_0_0, V_0_11_0_0, V_2_0_0
|
||||
|
||||
"""
|
||||
The console consumer is a tool that reads data from Kafka and outputs it to standard output.
|
||||
"""
|
||||
|
||||
|
||||
class ConsoleConsumer(KafkaPathResolverMixin, JmxMixin, BackgroundThreadService):
|
||||
# Root directory for persistent output
|
||||
PERSISTENT_ROOT = "/mnt/console_consumer"
|
||||
STDOUT_CAPTURE = os.path.join(PERSISTENT_ROOT, "console_consumer.stdout")
|
||||
STDERR_CAPTURE = os.path.join(PERSISTENT_ROOT, "console_consumer.stderr")
|
||||
LOG_DIR = os.path.join(PERSISTENT_ROOT, "logs")
|
||||
LOG_FILE = os.path.join(LOG_DIR, "console_consumer.log")
|
||||
LOG4J_CONFIG = os.path.join(PERSISTENT_ROOT, "tools-log4j.properties")
|
||||
CONFIG_FILE = os.path.join(PERSISTENT_ROOT, "console_consumer.properties")
|
||||
JMX_TOOL_LOG = os.path.join(PERSISTENT_ROOT, "jmx_tool.log")
|
||||
JMX_TOOL_ERROR_LOG = os.path.join(PERSISTENT_ROOT, "jmx_tool.err.log")
|
||||
|
||||
logs = {
|
||||
"consumer_stdout": {
|
||||
"path": STDOUT_CAPTURE,
|
||||
"collect_default": False},
|
||||
"consumer_stderr": {
|
||||
"path": STDERR_CAPTURE,
|
||||
"collect_default": False},
|
||||
"consumer_log": {
|
||||
"path": LOG_FILE,
|
||||
"collect_default": True},
|
||||
"jmx_log": {
|
||||
"path" : JMX_TOOL_LOG,
|
||||
"collect_default": False},
|
||||
"jmx_err_log": {
|
||||
"path": JMX_TOOL_ERROR_LOG,
|
||||
"collect_default": False}
|
||||
}
|
||||
|
||||
def __init__(self, context, num_nodes, kafka, topic, group_id="test-consumer-group", new_consumer=True,
|
||||
message_validator=None, from_beginning=True, consumer_timeout_ms=None, version=DEV_BRANCH,
|
||||
client_id="console-consumer", print_key=False, jmx_object_names=None, jmx_attributes=None,
|
||||
enable_systest_events=False, stop_timeout_sec=35, print_timestamp=False, print_partition=False,
|
||||
isolation_level="read_uncommitted", jaas_override_variables=None,
|
||||
kafka_opts_override="", client_prop_file_override="", consumer_properties={}):
|
||||
"""
|
||||
Args:
|
||||
context: standard context
|
||||
num_nodes: number of nodes to use (this should be 1)
|
||||
kafka: kafka service
|
||||
topic: consume from this topic
|
||||
new_consumer: use new Kafka consumer if True
|
||||
message_validator: function which returns message or None
|
||||
from_beginning: consume from beginning if True, else from the end
|
||||
consumer_timeout_ms: corresponds to consumer.timeout.ms. consumer process ends if time between
|
||||
successively consumed messages exceeds this timeout. Setting this and
|
||||
waiting for the consumer to stop is a pretty good way to consume all messages
|
||||
in a topic.
|
||||
print_timestamp if True, print each message's timestamp as well
|
||||
print_key if True, print each message's key as well
|
||||
print_partition if True, print each message's partition as well
|
||||
enable_systest_events if True, console consumer will print additional lifecycle-related information
|
||||
only available in 0.10.0 and later.
|
||||
stop_timeout_sec After stopping a node, wait up to stop_timeout_sec for the node to stop,
|
||||
and the corresponding background thread to finish successfully.
|
||||
isolation_level How to handle transactional messages.
|
||||
jaas_override_variables A dict of variables to be used in the jaas.conf template file
|
||||
kafka_opts_override Override parameters of the KAFKA_OPTS environment variable
|
||||
client_prop_file_override Override client.properties file used by the consumer
|
||||
consumer_properties A dict of values to pass in as --consumer-property key=value
|
||||
"""
|
||||
JmxMixin.__init__(self, num_nodes=num_nodes, jmx_object_names=jmx_object_names, jmx_attributes=(jmx_attributes or []),
|
||||
root=ConsoleConsumer.PERSISTENT_ROOT)
|
||||
BackgroundThreadService.__init__(self, context, num_nodes)
|
||||
self.kafka = kafka
|
||||
self.new_consumer = new_consumer
|
||||
self.group_id = group_id
|
||||
self.args = {
|
||||
'topic': topic,
|
||||
}
|
||||
|
||||
self.consumer_timeout_ms = consumer_timeout_ms
|
||||
for node in self.nodes:
|
||||
node.version = version
|
||||
|
||||
self.from_beginning = from_beginning
|
||||
self.message_validator = message_validator
|
||||
self.messages_consumed = {idx: [] for idx in range(1, num_nodes + 1)}
|
||||
self.clean_shutdown_nodes = set()
|
||||
self.client_id = client_id
|
||||
self.print_key = print_key
|
||||
self.print_partition = print_partition
|
||||
self.log_level = "TRACE"
|
||||
self.stop_timeout_sec = stop_timeout_sec
|
||||
|
||||
self.isolation_level = isolation_level
|
||||
self.enable_systest_events = enable_systest_events
|
||||
if self.enable_systest_events:
|
||||
# Only available in 0.10.0 and up
|
||||
assert version >= V_0_10_0_0
|
||||
|
||||
self.print_timestamp = print_timestamp
|
||||
self.jaas_override_variables = jaas_override_variables or {}
|
||||
self.kafka_opts_override = kafka_opts_override
|
||||
self.client_prop_file_override = client_prop_file_override
|
||||
self.consumer_properties = consumer_properties
|
||||
|
||||
|
||||
def prop_file(self, node):
|
||||
"""Return a string which can be used to create a configuration file appropriate for the given node."""
|
||||
# Process client configuration
|
||||
prop_file = self.render('console_consumer.properties')
|
||||
if hasattr(node, "version") and node.version <= LATEST_0_8_2:
|
||||
# in 0.8.2.X and earlier, console consumer does not have --timeout-ms option
|
||||
# instead, we have to pass it through the config file
|
||||
prop_file += "\nconsumer.timeout.ms=%s\n" % str(self.consumer_timeout_ms)
|
||||
|
||||
# Add security properties to the config. If security protocol is not specified,
|
||||
# use the default in the template properties.
|
||||
self.security_config = self.kafka.security_config.client_config(prop_file, node, self.jaas_override_variables)
|
||||
self.security_config.setup_node(node)
|
||||
|
||||
prop_file += str(self.security_config)
|
||||
return prop_file
|
||||
|
||||
|
||||
def start_cmd(self, node):
|
||||
"""Return the start command appropriate for the given node."""
|
||||
args = self.args.copy()
|
||||
args['zk_connect'] = self.kafka.zk_connect_setting()
|
||||
args['stdout'] = ConsoleConsumer.STDOUT_CAPTURE
|
||||
args['stderr'] = ConsoleConsumer.STDERR_CAPTURE
|
||||
args['log_dir'] = ConsoleConsumer.LOG_DIR
|
||||
args['log4j_config'] = ConsoleConsumer.LOG4J_CONFIG
|
||||
args['config_file'] = ConsoleConsumer.CONFIG_FILE
|
||||
args['stdout'] = ConsoleConsumer.STDOUT_CAPTURE
|
||||
args['jmx_port'] = self.jmx_port
|
||||
args['console_consumer'] = self.path.script("kafka-console-consumer.sh", node)
|
||||
args['broker_list'] = self.kafka.bootstrap_servers(self.security_config.security_protocol)
|
||||
|
||||
if self.kafka_opts_override:
|
||||
args['kafka_opts'] = "\"%s\"" % self.kafka_opts_override
|
||||
else:
|
||||
args['kafka_opts'] = self.security_config.kafka_opts
|
||||
|
||||
cmd = "export JMX_PORT=%(jmx_port)s; " \
|
||||
"export LOG_DIR=%(log_dir)s; " \
|
||||
"export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%(log4j_config)s\"; " \
|
||||
"export KAFKA_OPTS=%(kafka_opts)s; " \
|
||||
"%(console_consumer)s " \
|
||||
"--topic %(topic)s " \
|
||||
"--consumer.config %(config_file)s " % args
|
||||
|
||||
if self.new_consumer:
|
||||
assert node.version >= V_0_9_0_0, \
|
||||
"new_consumer is only supported if version >= 0.9.0.0, version %s" % str(node.version)
|
||||
if node.version <= LATEST_0_10_0:
|
||||
cmd += " --new-consumer"
|
||||
cmd += " --bootstrap-server %(broker_list)s" % args
|
||||
if node.version >= V_0_11_0_0:
|
||||
cmd += " --isolation-level %s" % self.isolation_level
|
||||
else:
|
||||
assert node.version < V_2_0_0, \
|
||||
"new_consumer==false is only supported if version < 2.0.0, version %s" % str(node.version)
|
||||
cmd += " --zookeeper %(zk_connect)s" % args
|
||||
|
||||
if self.from_beginning:
|
||||
cmd += " --from-beginning"
|
||||
|
||||
if self.consumer_timeout_ms is not None:
|
||||
# version 0.8.X and below do not support --timeout-ms option
|
||||
# This will be added in the properties file instead
|
||||
if node.version > LATEST_0_8_2:
|
||||
cmd += " --timeout-ms %s" % self.consumer_timeout_ms
|
||||
|
||||
if self.print_timestamp:
|
||||
cmd += " --property print.timestamp=true"
|
||||
|
||||
if self.print_key:
|
||||
cmd += " --property print.key=true"
|
||||
|
||||
if self.print_partition:
|
||||
cmd += " --property print.partition=true"
|
||||
|
||||
# LoggingMessageFormatter was introduced after 0.9
|
||||
if node.version > LATEST_0_9:
|
||||
cmd += " --formatter kafka.tools.LoggingMessageFormatter"
|
||||
|
||||
if self.enable_systest_events:
|
||||
# enable systest events is only available in 0.10.0 and later
|
||||
# check the assertion here as well, in case node.version has been modified
|
||||
assert node.version >= V_0_10_0_0
|
||||
cmd += " --enable-systest-events"
|
||||
|
||||
if self.consumer_properties is not None:
|
||||
for k, v in self.consumer_properties.items():
|
||||
cmd += " --consumer-property %s=%s" % (k, v)
|
||||
|
||||
cmd += " 2>> %(stderr)s | tee -a %(stdout)s &" % args
|
||||
return cmd
|
||||
|
||||
def pids(self, node):
|
||||
return node.account.java_pids(self.java_class_name())
|
||||
|
||||
def alive(self, node):
|
||||
return len(self.pids(node)) > 0
|
||||
|
||||
def _worker(self, idx, node):
|
||||
node.account.ssh("mkdir -p %s" % ConsoleConsumer.PERSISTENT_ROOT, allow_fail=False)
|
||||
|
||||
# Create and upload config file
|
||||
self.logger.info("console_consumer.properties:")
|
||||
|
||||
self.security_config = self.kafka.security_config.client_config(node=node,
|
||||
jaas_override_variables=self.jaas_override_variables)
|
||||
self.security_config.setup_node(node)
|
||||
|
||||
if self.client_prop_file_override:
|
||||
prop_file = self.client_prop_file_override
|
||||
else:
|
||||
prop_file = self.prop_file(node)
|
||||
|
||||
self.logger.info(prop_file)
|
||||
node.account.create_file(ConsoleConsumer.CONFIG_FILE, prop_file)
|
||||
|
||||
# Create and upload log properties
|
||||
log_config = self.render('tools_log4j.properties', log_file=ConsoleConsumer.LOG_FILE)
|
||||
node.account.create_file(ConsoleConsumer.LOG4J_CONFIG, log_config)
|
||||
|
||||
# Run and capture output
|
||||
cmd = self.start_cmd(node)
|
||||
self.logger.debug("Console consumer %d command: %s", idx, cmd)
|
||||
|
||||
consumer_output = node.account.ssh_capture(cmd, allow_fail=False)
|
||||
|
||||
with self.lock:
|
||||
self.logger.debug("collecting following jmx objects: %s", self.jmx_object_names)
|
||||
self.start_jmx_tool(idx, node)
|
||||
|
||||
for line in consumer_output:
|
||||
msg = line.strip()
|
||||
if msg == "shutdown_complete":
|
||||
# Note that we can only rely on the shutdown_complete message if running 0.10.0 or greater
|
||||
if node in self.clean_shutdown_nodes:
|
||||
raise Exception("Unexpected shutdown event from consumer, already shutdown. Consumer index: %d" % idx)
|
||||
self.clean_shutdown_nodes.add(node)
|
||||
else:
|
||||
if self.message_validator is not None:
|
||||
msg = self.message_validator(msg)
|
||||
if msg is not None:
|
||||
self.messages_consumed[idx].append(msg)
|
||||
|
||||
with self.lock:
|
||||
self.read_jmx_output(idx, node)
|
||||
|
||||
def start_node(self, node):
|
||||
BackgroundThreadService.start_node(self, node)
|
||||
|
||||
def stop_node(self, node):
|
||||
self.logger.info("%s Stopping node %s" % (self.__class__.__name__, str(node.account)))
|
||||
node.account.kill_java_processes(self.java_class_name(),
|
||||
clean_shutdown=True, allow_fail=True)
|
||||
|
||||
stopped = self.wait_node(node, timeout_sec=self.stop_timeout_sec)
|
||||
assert stopped, "Node %s: did not stop within the specified timeout of %s seconds" % \
|
||||
(str(node.account), str(self.stop_timeout_sec))
|
||||
|
||||
def clean_node(self, node):
|
||||
if self.alive(node):
|
||||
self.logger.warn("%s %s was still alive at cleanup time. Killing forcefully..." %
|
||||
(self.__class__.__name__, node.account))
|
||||
JmxMixin.clean_node(self, node)
|
||||
node.account.kill_java_processes(self.java_class_name(), clean_shutdown=False, allow_fail=True)
|
||||
node.account.ssh("rm -rf %s" % ConsoleConsumer.PERSISTENT_ROOT, allow_fail=False)
|
||||
self.security_config.clean_node(node)
|
||||
|
||||
def java_class_name(self):
|
||||
return "ConsoleConsumer"
|
||||
|
||||
def has_log_message(self, node, message):
|
||||
try:
|
||||
node.account.ssh("grep '%s' %s" % (message, ConsoleConsumer.LOG_FILE))
|
||||
except RemoteCommandError:
|
||||
return False
|
||||
return True
|
||||
|
||||
def wait_for_offset_reset(self, node, topic, num_partitions):
|
||||
for partition in range(num_partitions):
|
||||
message = "Resetting offset for partition %s-%d" % (topic, partition)
|
||||
wait_until(lambda: self.has_log_message(node, message),
|
||||
timeout_sec=60,
|
||||
err_msg="Offset not reset for partition %s-%d" % (topic, partition))
|
||||
|
||||
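A hedged usage sketch for this service, mirroring how other kafkatest tests typically drive it; it assumes a ducktape test_context and a running KafkaService, and the topic name and counts are illustrative:

```
# Hypothetical sketch: consume a topic until the expected number of messages has arrived.
from ducktape.utils.util import wait_until

from kafkatest.services.console_consumer import ConsoleConsumer


def consume_all(test_context, kafka, topic, expected_messages):
    # consumer_timeout_ms makes the consumer exit once the topic goes quiet for 10s,
    # so reading "everything currently in the topic" terminates on its own.
    consumer = ConsoleConsumer(test_context, num_nodes=1, kafka=kafka, topic=topic,
                               consumer_timeout_ms=10000)
    consumer.start()
    # messages_consumed is keyed by 1-based node index (see __init__ above).
    wait_until(lambda: len(consumer.messages_consumed[1]) >= expected_messages,
               timeout_sec=120,
               err_msg="Timed out waiting to consume the expected number of messages")
    consumer.stop()
    return consumer.messages_consumed[1]
```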
21
tests/kafkatest/services/consumer_property.py
Normal file
@@ -0,0 +1,21 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
"""
|
||||
Define Consumer configuration property names here.
|
||||
"""
|
||||
|
||||
GROUP_INSTANCE_ID = "group.instance.id"
|
||||
SESSION_TIMEOUT_MS = "session.timeout.ms"
|
||||
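These constants exist so tests can pass consumer overrides by name instead of hard-coding strings. A small hedged sketch (the values and the static group instance id are illustrative), combining them with the consumer_properties hook of ConsoleConsumer defined earlier in this patch:

```
# Hypothetical sketch: each entry ends up on the command line as --consumer-property key=value.
from kafkatest.services import consumer_property
from kafkatest.services.console_consumer import ConsoleConsumer


def static_member_consumer(test_context, kafka, topic):
    props = {
        consumer_property.GROUP_INSTANCE_ID: "console-consumer-instance-1",
        consumer_property.SESSION_TIMEOUT_MS: 60000,
    }
    return ConsoleConsumer(test_context, num_nodes=1, kafka=kafka, topic=topic,
                           consumer_properties=props)
```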
102
tests/kafkatest/services/delegation_tokens.py
Normal file
@@ -0,0 +1,102 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import os.path
|
||||
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
|
||||
|
||||
"""
|
||||
DelegationTokens is a helper class for managing the lifecycle of delegation tokens.
|
||||
All commands are executed on a secured Kafka node reusing its generated jaas.conf and krb5.conf.
|
||||
"""
|
||||
|
||||
class DelegationTokens(KafkaPathResolverMixin):
|
||||
def __init__(self, kafka, context):
|
||||
self.client_properties_content = """
|
||||
security.protocol=SASL_PLAINTEXT
|
||||
sasl.kerberos.service.name=kafka
|
||||
"""
|
||||
self.context = context
|
||||
self.command_path = self.path.script("kafka-delegation-tokens.sh")
|
||||
self.kafka_opts = "KAFKA_OPTS=\"-Djava.security.auth.login.config=/mnt/security/jaas.conf " \
|
||||
"-Djava.security.krb5.conf=/mnt/security/krb5.conf\" "
|
||||
self.kafka = kafka
|
||||
self.bootstrap_server = " --bootstrap-server " + self.kafka.bootstrap_servers('SASL_PLAINTEXT')
|
||||
self.base_cmd = self.kafka_opts + self.command_path + self.bootstrap_server
|
||||
self.client_prop_path = os.path.join(self.kafka.PERSISTENT_ROOT, "client.properties")
|
||||
self.jaas_deleg_conf_path = os.path.join(self.kafka.PERSISTENT_ROOT, "jaas_deleg.conf")
|
||||
self.token_hmac_path = os.path.join(self.kafka.PERSISTENT_ROOT, "deleg_token_hmac.out")
|
||||
self.delegation_token_out = os.path.join(self.kafka.PERSISTENT_ROOT, "delegation_token.out")
|
||||
self.expire_delegation_token_out = os.path.join(self.kafka.PERSISTENT_ROOT, "expire_delegation_token.out")
|
||||
self.renew_delegation_token_out = os.path.join(self.kafka.PERSISTENT_ROOT, "renew_delegation_token.out")
|
||||
|
||||
self.node = self.kafka.nodes[0]
|
||||
|
||||
def generate_delegation_token(self, maxlifetimeperiod=-1):
|
||||
self.node.account.create_file(self.client_prop_path, self.client_properties_content)
|
||||
|
||||
cmd = self.base_cmd + " --create" \
|
||||
" --max-life-time-period %s" \
|
||||
" --command-config %s > %s" % (maxlifetimeperiod, self.client_prop_path, self.delegation_token_out)
|
||||
self.node.account.ssh(cmd, allow_fail=False)
|
||||
|
||||
def expire_delegation_token(self, hmac):
|
||||
cmd = self.base_cmd + " --expire" \
|
||||
" --expiry-time-period -1" \
|
||||
" --hmac %s" \
|
||||
" --command-config %s > %s" % (hmac, self.client_prop_path, self.expire_delegation_token_out)
|
||||
self.node.account.ssh(cmd, allow_fail=False)
|
||||
|
||||
def renew_delegation_token(self, hmac, renew_time_period=-1):
|
||||
cmd = self.base_cmd + " --renew" \
|
||||
" --renew-time-period %s" \
|
||||
" --hmac %s" \
|
||||
" --command-config %s > %s" \
|
||||
% (renew_time_period, hmac, self.client_prop_path, self.renew_delegation_token_out)
|
||||
return self.node.account.ssh_capture(cmd, allow_fail=False)
|
||||
|
||||
def create_jaas_conf_with_delegation_token(self):
|
||||
dt = self.parse_delegation_token_out()
|
||||
jaas_deleg_content = """
|
||||
KafkaClient {
|
||||
org.apache.kafka.common.security.scram.ScramLoginModule required
|
||||
username="%s"
|
||||
password="%s"
|
||||
tokenauth=true;
|
||||
};
|
||||
""" % (dt["tokenid"], dt["hmac"])
|
||||
self.node.account.create_file(self.jaas_deleg_conf_path, jaas_deleg_content)
|
||||
|
||||
return jaas_deleg_content
|
||||
|
||||
def token_hmac(self):
|
||||
dt = self.parse_delegation_token_out()
|
||||
return dt["hmac"]
|
||||
|
||||
def parse_delegation_token_out(self):
|
||||
cmd = "tail -1 %s" % self.delegation_token_out
|
||||
|
||||
output_iter = self.node.account.ssh_capture(cmd, allow_fail=False)
|
||||
output = ""
|
||||
for line in output_iter:
|
||||
output += line
|
||||
|
||||
tokenid, hmac, owner, renewers, issuedate, expirydate, maxdate = output.split()
|
||||
return {"tokenid" : tokenid,
|
||||
"hmac" : hmac,
|
||||
"owner" : owner,
|
||||
"renewers" : renewers,
|
||||
"issuedate" : issuedate,
|
||||
"expirydate" :expirydate,
|
||||
"maxdate" : maxdate}
|
||||
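A hedged sketch of the full token lifecycle this helper drives, assuming a KafkaService already configured for SASL so that the delegation token API is enabled; the lifetime values are illustrative:

```
# Hypothetical sketch: create, use, renew and finally expire a delegation token.
from kafkatest.services.delegation_tokens import DelegationTokens


def exercise_delegation_tokens(test_context, kafka):
    tokens = DelegationTokens(kafka, test_context)

    # Create a token and write a jaas config that authenticates with it.
    tokens.generate_delegation_token(maxlifetimeperiod=-1)
    jaas_deleg_conf = tokens.create_jaas_conf_with_delegation_token()

    hmac = tokens.token_hmac()
    tokens.renew_delegation_token(hmac, renew_time_period=-1)

    # Expire the token once the test no longer needs it.
    tokens.expire_delegation_token(hmac)
    return jaas_deleg_conf
```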
18
tests/kafkatest/services/kafka/__init__.py
Normal file
@@ -0,0 +1,18 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from kafka import KafkaService
|
||||
from util import TopicPartition
|
||||
from config import KafkaConfig
|
||||
48
tests/kafkatest/services/kafka/config.py
Normal file
@@ -0,0 +1,48 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import config_property
|
||||
|
||||
|
||||
class KafkaConfig(dict):
|
||||
"""A dictionary-like container class which allows for definition of overridable default values,
|
||||
which is also capable of "rendering" itself as a usable server.properties file.
|
||||
"""
|
||||
|
||||
DEFAULTS = {
|
||||
config_property.PORT: 9092,
|
||||
config_property.SOCKET_RECEIVE_BUFFER_BYTES: 65536,
|
||||
config_property.LOG_DIRS: "/mnt/kafka/kafka-data-logs-1,/mnt/kafka/kafka-data-logs-2",
|
||||
config_property.ZOOKEEPER_CONNECTION_TIMEOUT_MS: 2000
|
||||
}
|
||||
|
||||
def __init__(self, **kwargs):
|
||||
super(KafkaConfig, self).__init__(**kwargs)
|
||||
|
||||
# Set defaults
|
||||
for key, val in self.DEFAULTS.items():
|
||||
if key not in self:
|
||||
self[key] = val
|
||||
|
||||
def render(self):
|
||||
"""Render self as a series of lines key=val\n, and do so in a consistent order. """
|
||||
keys = [k for k in self.keys()]
|
||||
keys.sort()
|
||||
|
||||
s = ""
|
||||
for k in keys:
|
||||
s += "%s=%s\n" % (k, str(self[k]))
|
||||
return s
|
||||
|
||||
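A minimal usage sketch of the KafkaConfig class above (illustrative; assumes config.py and config_property.py are importable from the working directory, as they are within this package):

import config_property
from config import KafkaConfig

# Any key not supplied explicitly is filled in from KafkaConfig.DEFAULTS.
cfg = KafkaConfig(**{config_property.BROKER_ID: 1,
                     config_property.ZOOKEEPER_CONNECT: "zk1:2181"})

# render() emits sorted key=value lines, ready to be written out as a
# server.properties file:
print(cfg.render())
# broker.id=1
# log.dirs=/mnt/kafka/kafka-data-logs-1,/mnt/kafka/kafka-data-logs-2
# port=9092
# socket.receive.buffer.bytes=65536
# zookeeper.connect=zk1:2181
# zookeeper.connection.timeout.ms=2000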
192
tests/kafkatest/services/kafka/config_property.py
Normal file
@@ -0,0 +1,192 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
"""
|
||||
Define Kafka configuration property names here.
|
||||
"""
|
||||
|
||||
BROKER_ID = "broker.id"
|
||||
PORT = "port"
|
||||
ADVERTISED_HOSTNAME = "advertised.host.name"
|
||||
|
||||
NUM_NETWORK_THREADS = "num.network.threads"
|
||||
NUM_IO_THREADS = "num.io.threads"
|
||||
SOCKET_SEND_BUFFER_BYTES = "socket.send.buffer.bytes"
|
||||
SOCKET_RECEIVE_BUFFER_BYTES = "socket.receive.buffer.bytes"
|
||||
SOCKET_REQUEST_MAX_BYTES = "socket.request.max.bytes"
|
||||
LOG_DIRS = "log.dirs"
|
||||
NUM_PARTITIONS = "num.partitions"
|
||||
NUM_RECOVERY_THREADS_PER_DATA_DIR = "num.recovery.threads.per.data.dir"
|
||||
|
||||
LOG_RETENTION_HOURS = "log.retention.hours"
|
||||
LOG_SEGMENT_BYTES = "log.segment.bytes"
|
||||
LOG_RETENTION_CHECK_INTERVAL_MS = "log.retention.check.interval.ms"
|
||||
LOG_RETENTION_MS = "log.retention.ms"
|
||||
LOG_CLEANER_ENABLE = "log.cleaner.enable"
|
||||
|
||||
AUTO_CREATE_TOPICS_ENABLE = "auto.create.topics.enable"
|
||||
|
||||
ZOOKEEPER_CONNECT = "zookeeper.connect"
|
||||
ZOOKEEPER_SSL_CLIENT_ENABLE = "zookeeper.ssl.client.enable"
|
||||
ZOOKEEPER_CLIENT_CNXN_SOCKET = "zookeeper.clientCnxnSocket"
|
||||
ZOOKEEPER_CONNECTION_TIMEOUT_MS = "zookeeper.connection.timeout.ms"
|
||||
INTER_BROKER_PROTOCOL_VERSION = "inter.broker.protocol.version"
|
||||
MESSAGE_FORMAT_VERSION = "log.message.format.version"
|
||||
MESSAGE_TIMESTAMP_TYPE = "message.timestamp.type"
|
||||
THROTTLING_REPLICATION_RATE_LIMIT = "replication.quota.throttled.rate"
|
||||
|
||||
LOG_FLUSH_INTERVAL_MESSAGE = "log.flush.interval.messages"
|
||||
REPLICA_HIGHWATERMARK_CHECKPOINT_INTERVAL_MS = "replica.high.watermark.checkpoint.interval.ms"
|
||||
LOG_ROLL_TIME_MS = "log.roll.ms"
|
||||
OFFSETS_TOPIC_NUM_PARTITIONS = "offsets.topic.num.partitions"
|
||||
|
||||
DELEGATION_TOKEN_MAX_LIFETIME_MS="delegation.token.max.lifetime.ms"
|
||||
DELEGATION_TOKEN_EXPIRY_TIME_MS="delegation.token.expiry.time.ms"
|
||||
DELEGATION_TOKEN_MASTER_KEY="delegation.token.master.key"
|
||||
SASL_ENABLED_MECHANISMS="sasl.enabled.mechanisms"
|
||||
|
||||
|
||||
"""
|
||||
From KafkaConfig.scala
|
||||
|
||||
/** ********* General Configuration ***********/
|
||||
val MaxReservedBrokerIdProp = "reserved.broker.max.id"
|
||||
val MessageMaxBytesProp = "message.max.bytes"
|
||||
val NumIoThreadsProp = "num.io.threads"
|
||||
val BackgroundThreadsProp = "background.threads"
|
||||
val QueuedMaxRequestsProp = "queued.max.requests"
|
||||
/** ********* Socket Server Configuration ***********/
|
||||
val PortProp = "port"
|
||||
val HostNameProp = "host.name"
|
||||
val ListenersProp = "listeners"
|
||||
val AdvertisedPortProp = "advertised.port"
|
||||
val AdvertisedListenersProp = "advertised.listeners"
|
||||
val SocketSendBufferBytesProp = "socket.send.buffer.bytes"
|
||||
val SocketReceiveBufferBytesProp = "socket.receive.buffer.bytes"
|
||||
val SocketRequestMaxBytesProp = "socket.request.max.bytes"
|
||||
val MaxConnectionsPerIpProp = "max.connections.per.ip"
|
||||
val MaxConnectionsPerIpOverridesProp = "max.connections.per.ip.overrides"
|
||||
val ConnectionsMaxIdleMsProp = "connections.max.idle.ms"
|
||||
/** ********* Log Configuration ***********/
|
||||
val NumPartitionsProp = "num.partitions"
|
||||
val LogDirsProp = "log.dirs"
|
||||
val LogDirProp = "log.dir"
|
||||
val LogSegmentBytesProp = "log.segment.bytes"
|
||||
|
||||
val LogRollTimeMillisProp = "log.roll.ms"
|
||||
val LogRollTimeHoursProp = "log.roll.hours"
|
||||
|
||||
val LogRollTimeJitterMillisProp = "log.roll.jitter.ms"
|
||||
val LogRollTimeJitterHoursProp = "log.roll.jitter.hours"
|
||||
|
||||
val LogRetentionTimeMillisProp = "log.retention.ms"
|
||||
val LogRetentionTimeMinutesProp = "log.retention.minutes"
|
||||
val LogRetentionTimeHoursProp = "log.retention.hours"
|
||||
|
||||
val LogRetentionBytesProp = "log.retention.bytes"
|
||||
val LogCleanupIntervalMsProp = "log.retention.check.interval.ms"
|
||||
val LogCleanupPolicyProp = "log.cleanup.policy"
|
||||
val LogCleanerThreadsProp = "log.cleaner.threads"
|
||||
val LogCleanerIoMaxBytesPerSecondProp = "log.cleaner.io.max.bytes.per.second"
|
||||
val LogCleanerDedupeBufferSizeProp = "log.cleaner.dedupe.buffer.size"
|
||||
val LogCleanerIoBufferSizeProp = "log.cleaner.io.buffer.size"
|
||||
val LogCleanerDedupeBufferLoadFactorProp = "log.cleaner.io.buffer.load.factor"
|
||||
val LogCleanerBackoffMsProp = "log.cleaner.backoff.ms"
|
||||
val LogCleanerMinCleanRatioProp = "log.cleaner.min.cleanable.ratio"
|
||||
val LogCleanerEnableProp = "log.cleaner.enable"
|
||||
val LogCleanerDeleteRetentionMsProp = "log.cleaner.delete.retention.ms"
|
||||
val LogIndexSizeMaxBytesProp = "log.index.size.max.bytes"
|
||||
val LogIndexIntervalBytesProp = "log.index.interval.bytes"
|
||||
val LogFlushIntervalMessagesProp = "log.flush.interval.messages"
|
||||
val LogDeleteDelayMsProp = "log.segment.delete.delay.ms"
|
||||
val LogFlushSchedulerIntervalMsProp = "log.flush.scheduler.interval.ms"
|
||||
val LogFlushIntervalMsProp = "log.flush.interval.ms"
|
||||
val LogFlushOffsetCheckpointIntervalMsProp = "log.flush.offset.checkpoint.interval.ms"
|
||||
val LogPreAllocateProp = "log.preallocate"
|
||||
val NumRecoveryThreadsPerDataDirProp = "num.recovery.threads.per.data.dir"
|
||||
val MinInSyncReplicasProp = "min.insync.replicas"
|
||||
/** ********* Replication configuration ***********/
|
||||
val ControllerSocketTimeoutMsProp = "controller.socket.timeout.ms"
|
||||
val DefaultReplicationFactorProp = "default.replication.factor"
|
||||
val ReplicaLagTimeMaxMsProp = "replica.lag.time.max.ms"
|
||||
val ReplicaSocketTimeoutMsProp = "replica.socket.timeout.ms"
|
||||
val ReplicaSocketReceiveBufferBytesProp = "replica.socket.receive.buffer.bytes"
|
||||
val ReplicaFetchMaxBytesProp = "replica.fetch.max.bytes"
|
||||
val ReplicaFetchWaitMaxMsProp = "replica.fetch.wait.max.ms"
|
||||
val ReplicaFetchMinBytesProp = "replica.fetch.min.bytes"
|
||||
val ReplicaFetchBackoffMsProp = "replica.fetch.backoff.ms"
|
||||
val NumReplicaFetchersProp = "num.replica.fetchers"
|
||||
val ReplicaHighWatermarkCheckpointIntervalMsProp = "replica.high.watermark.checkpoint.interval.ms"
|
||||
val FetchPurgatoryPurgeIntervalRequestsProp = "fetch.purgatory.purge.interval.requests"
|
||||
val ProducerPurgatoryPurgeIntervalRequestsProp = "producer.purgatory.purge.interval.requests"
|
||||
val AutoLeaderRebalanceEnableProp = "auto.leader.rebalance.enable"
|
||||
val LeaderImbalancePerBrokerPercentageProp = "leader.imbalance.per.broker.percentage"
|
||||
val LeaderImbalanceCheckIntervalSecondsProp = "leader.imbalance.check.interval.seconds"
|
||||
val UncleanLeaderElectionEnableProp = "unclean.leader.election.enable"
|
||||
val InterBrokerSecurityProtocolProp = "security.inter.broker.protocol"
|
||||
val InterBrokerProtocolVersionProp = "inter.broker.protocol.version"
|
||||
/** ********* Controlled shutdown configuration ***********/
|
||||
val ControlledShutdownMaxRetriesProp = "controlled.shutdown.max.retries"
|
||||
val ControlledShutdownRetryBackoffMsProp = "controlled.shutdown.retry.backoff.ms"
|
||||
val ControlledShutdownEnableProp = "controlled.shutdown.enable"
|
||||
/** ********* Consumer coordinator configuration ***********/
|
||||
val ConsumerMinSessionTimeoutMsProp = "consumer.min.session.timeout.ms"
|
||||
val ConsumerMaxSessionTimeoutMsProp = "consumer.max.session.timeout.ms"
|
||||
/** ********* Offset management configuration ***********/
|
||||
val OffsetMetadataMaxSizeProp = "offset.metadata.max.bytes"
|
||||
val OffsetsLoadBufferSizeProp = "offsets.load.buffer.size"
|
||||
val OffsetsTopicReplicationFactorProp = "offsets.topic.replication.factor"
|
||||
val OffsetsTopicPartitionsProp = "offsets.topic.num.partitions"
|
||||
val OffsetsTopicSegmentBytesProp = "offsets.topic.segment.bytes"
|
||||
val OffsetsTopicCompressionCodecProp = "offsets.topic.compression.codec"
|
||||
val OffsetsRetentionMinutesProp = "offsets.retention.minutes"
|
||||
val OffsetsRetentionCheckIntervalMsProp = "offsets.retention.check.interval.ms"
|
||||
val OffsetCommitTimeoutMsProp = "offsets.commit.timeout.ms"
|
||||
val OffsetCommitRequiredAcksProp = "offsets.commit.required.acks"
|
||||
/** ********* Quota Configuration ***********/
|
||||
val ProducerQuotaBytesPerSecondDefaultProp = "quota.producer.default"
|
||||
val ConsumerQuotaBytesPerSecondDefaultProp = "quota.consumer.default"
|
||||
val NumQuotaSamplesProp = "quota.window.num"
|
||||
val QuotaWindowSizeSecondsProp = "quota.window.size.seconds"
|
||||
|
||||
val DeleteTopicEnableProp = "delete.topic.enable"
|
||||
val CompressionTypeProp = "compression.type"
|
||||
|
||||
/** ********* Kafka Metrics Configuration ***********/
|
||||
val MetricSampleWindowMsProp = CommonClientConfigs.METRICS_SAMPLE_WINDOW_MS_CONFIG
|
||||
val MetricNumSamplesProp: String = CommonClientConfigs.METRICS_NUM_SAMPLES_CONFIG
|
||||
val MetricReporterClassesProp: String = CommonClientConfigs.METRIC_REPORTER_CLASSES_CONFIG
|
||||
|
||||
/** ********* SSL Configuration ****************/
|
||||
val PrincipalBuilderClassProp = SSLConfigs.PRINCIPAL_BUILDER_CLASS_CONFIG
|
||||
val SSLProtocolProp = SSLConfigs.SSL_PROTOCOL_CONFIG
|
||||
val SSLProviderProp = SSLConfigs.SSL_PROVIDER_CONFIG
|
||||
val SSLCipherSuitesProp = SSLConfigs.SSL_CIPHER_SUITES_CONFIG
|
||||
val SSLEnabledProtocolsProp = SSLConfigs.SSL_ENABLED_PROTOCOLS_CONFIG
|
||||
val SSLKeystoreTypeProp = SSLConfigs.SSL_KEYSTORE_TYPE_CONFIG
|
||||
val SSLKeystoreLocationProp = SSLConfigs.SSL_KEYSTORE_LOCATION_CONFIG
|
||||
val SSLKeystorePasswordProp = SSLConfigs.SSL_KEYSTORE_PASSWORD_CONFIG
|
||||
val SSLKeyPasswordProp = SSLConfigs.SSL_KEY_PASSWORD_CONFIG
|
||||
val SSLTruststoreTypeProp = SSLConfigs.SSL_TRUSTSTORE_TYPE_CONFIG
|
||||
val SSLTruststoreLocationProp = SSLConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG
|
||||
val SSLTruststorePasswordProp = SSLConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG
|
||||
val SSLKeyManagerAlgorithmProp = SSLConfigs.SSL_KEYMANAGER_ALGORITHM_CONFIG
|
||||
val SSLTrustManagerAlgorithmProp = SSLConfigs.SSL_TRUSTMANAGER_ALGORITHM_CONFIG
|
||||
val SSLEndpointIdentificationAlgorithmProp = SSLConfigs.SSL_ENDPOINT_IDENTIFICATION_ALGORITHM_CONFIG
|
||||
val SSLSecureRandomImplementationProp = SSLConfigs.SSL_SECURE_RANDOM_IMPLEMENTATION_CONFIG
|
||||
val SSLClientAuthProp = SSLConfigs.SSL_CLIENT_AUTH_CONFIG
|
||||
"""
|
||||
|
||||
|
||||
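The constants above are typically used to build per-test broker overrides for KafkaService (defined in kafka.py below). A hedged sketch of how such overrides are usually assembled:

from kafkatest.services.kafka import config_property

# KafkaService consumes server_prop_overides as a list of [property, value]
# pairs, applied on top of the rendered kafka.properties template.
server_prop_overides = [
    [config_property.LOG_RETENTION_MS, 60000],
    [config_property.AUTO_CREATE_TOPICS_ENABLE, "false"],
]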
897
tests/kafkatest/services/kafka/kafka.py
Normal file
@@ -0,0 +1,897 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import collections
|
||||
import json
|
||||
import os.path
|
||||
import re
|
||||
import signal
|
||||
import time
|
||||
|
||||
from ducktape.services.service import Service
|
||||
from ducktape.utils.util import wait_until
|
||||
from ducktape.cluster.remoteaccount import RemoteCommandError
|
||||
|
||||
from config import KafkaConfig
|
||||
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
|
||||
from kafkatest.services.kafka import config_property
|
||||
from kafkatest.services.monitor.jmx import JmxMixin
|
||||
from kafkatest.services.security.minikdc import MiniKdc
|
||||
from kafkatest.services.security.listener_security_config import ListenerSecurityConfig
|
||||
from kafkatest.services.security.security_config import SecurityConfig
|
||||
from kafkatest.version import DEV_BRANCH, LATEST_0_10_0
|
||||
|
||||
|
||||
class KafkaListener:
|
||||
|
||||
def __init__(self, name, port_number, security_protocol, open=False):
|
||||
self.name = name
|
||||
self.port_number = port_number
|
||||
self.security_protocol = security_protocol
|
||||
self.open = open
|
||||
|
||||
def listener(self):
|
||||
return "%s://:%s" % (self.name, str(self.port_number))
|
||||
|
||||
def advertised_listener(self, node):
|
||||
return "%s://%s:%s" % (self.name, node.account.hostname, str(self.port_number))
|
||||
|
||||
def listener_security_protocol(self):
|
||||
return "%s:%s" % (self.name, self.security_protocol)
|
||||
|
||||
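# Illustrative example: for a node whose hostname is "worker1", the PLAINTEXT
# listener configured in KafkaService below (KafkaListener('PLAINTEXT', 9092,
# 'PLAINTEXT')) renders as:
#   listener()                   -> "PLAINTEXT://:9092"
#   advertised_listener(node)    -> "PLAINTEXT://worker1:9092"
#   listener_security_protocol() -> "PLAINTEXT:PLAINTEXT"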
class KafkaService(KafkaPathResolverMixin, JmxMixin, Service):
|
||||
PERSISTENT_ROOT = "/mnt/kafka"
|
||||
STDOUT_STDERR_CAPTURE = os.path.join(PERSISTENT_ROOT, "server-start-stdout-stderr.log")
|
||||
LOG4J_CONFIG = os.path.join(PERSISTENT_ROOT, "kafka-log4j.properties")
|
||||
# Logs such as controller.log, server.log, etc all go here
|
||||
OPERATIONAL_LOG_DIR = os.path.join(PERSISTENT_ROOT, "kafka-operational-logs")
|
||||
OPERATIONAL_LOG_INFO_DIR = os.path.join(OPERATIONAL_LOG_DIR, "info")
|
||||
OPERATIONAL_LOG_DEBUG_DIR = os.path.join(OPERATIONAL_LOG_DIR, "debug")
|
||||
# Kafka log segments etc go here
|
||||
DATA_LOG_DIR_PREFIX = os.path.join(PERSISTENT_ROOT, "kafka-data-logs")
|
||||
DATA_LOG_DIR_1 = "%s-1" % (DATA_LOG_DIR_PREFIX)
|
||||
DATA_LOG_DIR_2 = "%s-2" % (DATA_LOG_DIR_PREFIX)
|
||||
CONFIG_FILE = os.path.join(PERSISTENT_ROOT, "kafka.properties")
|
||||
# Kafka Authorizer
|
||||
ACL_AUTHORIZER = "kafka.security.authorizer.AclAuthorizer"
|
||||
# Old Kafka Authorizer. This is deprecated but still supported.
|
||||
SIMPLE_AUTHORIZER = "kafka.security.auth.SimpleAclAuthorizer"
|
||||
HEAP_DUMP_FILE = os.path.join(PERSISTENT_ROOT, "kafka_heap_dump.bin")
|
||||
INTERBROKER_LISTENER_NAME = 'INTERNAL'
|
||||
JAAS_CONF_PROPERTY = "java.security.auth.login.config=/mnt/security/jaas.conf"
|
||||
KRB5_CONF = "java.security.krb5.conf=/mnt/security/krb5.conf"
|
||||
|
||||
logs = {
|
||||
"kafka_server_start_stdout_stderr": {
|
||||
"path": STDOUT_STDERR_CAPTURE,
|
||||
"collect_default": True},
|
||||
"kafka_operational_logs_info": {
|
||||
"path": OPERATIONAL_LOG_INFO_DIR,
|
||||
"collect_default": True},
|
||||
"kafka_operational_logs_debug": {
|
||||
"path": OPERATIONAL_LOG_DEBUG_DIR,
|
||||
"collect_default": False},
|
||||
"kafka_data_1": {
|
||||
"path": DATA_LOG_DIR_1,
|
||||
"collect_default": False},
|
||||
"kafka_data_2": {
|
||||
"path": DATA_LOG_DIR_2,
|
||||
"collect_default": False},
|
||||
"kafka_heap_dump_file": {
|
||||
"path": HEAP_DUMP_FILE,
|
||||
"collect_default": True}
|
||||
}
|
||||
|
||||
def __init__(self, context, num_nodes, zk, security_protocol=SecurityConfig.PLAINTEXT, interbroker_security_protocol=SecurityConfig.PLAINTEXT,
|
||||
client_sasl_mechanism=SecurityConfig.SASL_MECHANISM_GSSAPI, interbroker_sasl_mechanism=SecurityConfig.SASL_MECHANISM_GSSAPI,
|
||||
authorizer_class_name=None, topics=None, version=DEV_BRANCH, jmx_object_names=None,
|
||||
jmx_attributes=None, zk_connect_timeout=5000, zk_session_timeout=6000, server_prop_overides=None, zk_chroot=None,
|
||||
zk_client_secure=False,
|
||||
listener_security_config=ListenerSecurityConfig(), per_node_server_prop_overrides=None, extra_kafka_opts=""):
|
||||
"""
|
||||
:param context: test context
|
||||
:param ZookeeperService zk:
|
||||
:param dict topics: which topics to create automatically
|
||||
:param str security_protocol: security protocol for clients to use
|
||||
:param str interbroker_security_protocol: security protocol to use for broker-to-broker communication
|
||||
:param str client_sasl_mechanism: sasl mechanism for clients to use
|
||||
:param str interbroker_sasl_mechanism: sasl mechanism to use for broker-to-broker communication
|
||||
:param str authorizer_class_name: which authorizer class to use
|
||||
:param str version: which kafka version to use. Defaults to "dev" branch
|
||||
:param jmx_object_names:
|
||||
:param jmx_attributes:
|
||||
:param int zk_connect_timeout:
|
||||
:param int zk_session_timeout:
|
||||
:param dict server_prop_overides: overrides for kafka.properties file
|
||||
:param zk_chroot:
|
||||
:param bool zk_client_secure: connect to Zookeeper over secure client port (TLS) when True
|
||||
:param ListenerSecurityConfig listener_security_config: listener config to use
|
||||
:param dict per_node_server_prop_overrides:
|
||||
:param str extra_kafka_opts: jvm args to add to KAFKA_OPTS variable
|
||||
"""
|
||||
Service.__init__(self, context, num_nodes)
|
||||
JmxMixin.__init__(self, num_nodes=num_nodes, jmx_object_names=jmx_object_names, jmx_attributes=(jmx_attributes or []),
|
||||
root=KafkaService.PERSISTENT_ROOT)
|
||||
|
||||
self.zk = zk
|
||||
|
||||
self.security_protocol = security_protocol
|
||||
self.client_sasl_mechanism = client_sasl_mechanism
|
||||
self.topics = topics
|
||||
self.minikdc = None
|
||||
self.authorizer_class_name = authorizer_class_name
|
||||
self.zk_set_acl = False
|
||||
if server_prop_overides is None:
|
||||
self.server_prop_overides = []
|
||||
else:
|
||||
self.server_prop_overides = server_prop_overides
|
||||
if per_node_server_prop_overrides is None:
|
||||
self.per_node_server_prop_overrides = {}
|
||||
else:
|
||||
self.per_node_server_prop_overrides = per_node_server_prop_overrides
|
||||
self.log_level = "DEBUG"
|
||||
self.zk_chroot = zk_chroot
|
||||
self.zk_client_secure = zk_client_secure
|
||||
self.listener_security_config = listener_security_config
|
||||
self.extra_kafka_opts = extra_kafka_opts
|
||||
|
||||
#
|
||||
# In a heavily loaded and not very fast machine, it is
|
||||
# sometimes necessary to give more time for the zk client
|
||||
# to have its session established, especially if the client
|
||||
# is authenticating and waiting for the SaslAuthenticated
|
||||
# in addition to the SyncConnected event.
|
||||
#
|
||||
# The default value for zookeeper.connect.timeout.ms is
|
||||
# 2 seconds and here we increase it to 5 seconds, but
|
||||
# it can be overridden by setting the corresponding parameter
|
||||
# for this constructor.
|
||||
self.zk_connect_timeout = zk_connect_timeout
|
||||
|
||||
# Also allow the session timeout to be provided explicitly,
|
||||
# primarily so that test cases can depend on it when waiting
|
||||
# e.g. brokers to deregister after a hard kill.
|
||||
self.zk_session_timeout = zk_session_timeout
|
||||
|
||||
self.port_mappings = {
|
||||
'PLAINTEXT': KafkaListener('PLAINTEXT', 9092, 'PLAINTEXT', False),
|
||||
'SSL': KafkaListener('SSL', 9093, 'SSL', False),
|
||||
'SASL_PLAINTEXT': KafkaListener('SASL_PLAINTEXT', 9094, 'SASL_PLAINTEXT', False),
|
||||
'SASL_SSL': KafkaListener('SASL_SSL', 9095, 'SASL_SSL', False),
|
||||
KafkaService.INTERBROKER_LISTENER_NAME:
|
||||
KafkaListener(KafkaService.INTERBROKER_LISTENER_NAME, 9099, None, False)
|
||||
}
|
||||
|
||||
self.interbroker_listener = None
|
||||
self.setup_interbroker_listener(interbroker_security_protocol, self.listener_security_config.use_separate_interbroker_listener)
|
||||
self.interbroker_sasl_mechanism = interbroker_sasl_mechanism
|
||||
|
||||
for node in self.nodes:
|
||||
node.version = version
|
||||
node.config = KafkaConfig(**{config_property.BROKER_ID: self.idx(node)})
|
||||
|
||||
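    # Illustrative usage sketch (hedged; assumes a ducktape test with a running
    # ZookeeperService available as `zk` and a test context as `test_context`):
    #
    #   kafka = KafkaService(test_context, num_nodes=3, zk=zk,
    #                        security_protocol=SecurityConfig.PLAINTEXT,
    #                        topics={"test-topic": {"partitions": 2,
    #                                               "replication-factor": 2}})
    #   kafka.start()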
def set_version(self, version):
|
||||
for node in self.nodes:
|
||||
node.version = version
|
||||
|
||||
@property
|
||||
def interbroker_security_protocol(self):
|
||||
return self.interbroker_listener.security_protocol
|
||||
|
||||
# this is required for backwards compatibility - there are a lot of tests that set this property explicitly
|
||||
# meaning 'use one of the existing listeners that match given security protocol, do not use custom listener'
|
||||
@interbroker_security_protocol.setter
|
||||
def interbroker_security_protocol(self, security_protocol):
|
||||
self.setup_interbroker_listener(security_protocol, use_separate_listener=False)
|
||||
|
||||
def setup_interbroker_listener(self, security_protocol, use_separate_listener=False):
|
||||
self.listener_security_config.use_separate_interbroker_listener = use_separate_listener
|
||||
|
||||
if self.listener_security_config.use_separate_interbroker_listener:
|
||||
# do not close existing port here since it is not used exclusively for interbroker communication
|
||||
self.interbroker_listener = self.port_mappings[KafkaService.INTERBROKER_LISTENER_NAME]
|
||||
self.interbroker_listener.security_protocol = security_protocol
|
||||
else:
|
||||
# close dedicated interbroker port, so it's not dangling in 'listeners' and 'advertised.listeners'
|
||||
self.close_port(KafkaService.INTERBROKER_LISTENER_NAME)
|
||||
self.interbroker_listener = self.port_mappings[security_protocol]
|
||||
|
||||
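    # Example (illustrative): with use_separate_listener=False and
    # security_protocol='SSL', inter-broker traffic shares the regular SSL
    # listener on port 9093; with use_separate_listener=True it instead uses
    # the dedicated INTERNAL listener on port 9099 with the chosen protocol.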
@property
|
||||
def security_config(self):
|
||||
config = SecurityConfig(self.context, self.security_protocol, self.interbroker_listener.security_protocol,
|
||||
zk_sasl=self.zk.zk_sasl, zk_tls=self.zk_client_secure,
|
||||
client_sasl_mechanism=self.client_sasl_mechanism,
|
||||
interbroker_sasl_mechanism=self.interbroker_sasl_mechanism,
|
||||
listener_security_config=self.listener_security_config)
|
||||
for port in self.port_mappings.values():
|
||||
if port.open:
|
||||
config.enable_security_protocol(port.security_protocol)
|
||||
return config
|
||||
|
||||
def open_port(self, listener_name):
|
||||
self.port_mappings[listener_name].open = True
|
||||
|
||||
def close_port(self, listener_name):
|
||||
self.port_mappings[listener_name].open = False
|
||||
|
||||
def start_minikdc_if_necessary(self, add_principals=""):
|
||||
if self.security_config.has_sasl:
|
||||
if self.minikdc is None:
|
||||
self.minikdc = MiniKdc(self.context, self.nodes, extra_principals = add_principals)
|
||||
self.minikdc.start()
|
||||
else:
|
||||
self.minikdc = None
|
||||
|
||||
def alive(self, node):
|
||||
return len(self.pids(node)) > 0
|
||||
|
||||
def start(self, add_principals="", use_zk_to_create_topic=True):
|
||||
if self.zk_client_secure and not self.zk.zk_client_secure_port:
|
||||
raise Exception("Unable to start Kafka: TLS to Zookeeper requested but Zookeeper secure port not enabled")
|
||||
self.open_port(self.security_protocol)
|
||||
self.interbroker_listener.open = True
|
||||
|
||||
self.start_minikdc_if_necessary(add_principals)
|
||||
self._ensure_zk_chroot()
|
||||
|
||||
Service.start(self)
|
||||
|
||||
self.logger.info("Waiting for brokers to register at ZK")
|
||||
|
||||
expected_broker_ids = set(self.nodes)
wait_until(lambda: {node for node in self.nodes if self.is_registered(node)} == expected_broker_ids,
           timeout_sec=30, backoff_sec=1,
           err_msg="Kafka servers didn't register at ZK within 30 seconds")
|
||||
|
||||
# Create topics if necessary
|
||||
if self.topics is not None:
|
||||
for topic, topic_cfg in self.topics.items():
|
||||
if topic_cfg is None:
|
||||
topic_cfg = {}
|
||||
|
||||
topic_cfg["topic"] = topic
|
||||
self.create_topic(topic_cfg, use_zk_to_create_topic=use_zk_to_create_topic)
|
||||
|
||||
def _ensure_zk_chroot(self):
|
||||
self.logger.info("Ensuring zk_chroot %s exists", self.zk_chroot)
|
||||
if self.zk_chroot:
|
||||
if not self.zk_chroot.startswith('/'):
|
||||
raise Exception("Zookeeper chroot must start with '/' but found " + self.zk_chroot)
|
||||
|
||||
parts = self.zk_chroot.split('/')[1:]
|
||||
for i in range(len(parts)):
|
||||
self.zk.create('/' + '/'.join(parts[:i+1]))
|
||||
|
||||
def set_protocol_and_port(self, node):
|
||||
listeners = []
|
||||
advertised_listeners = []
|
||||
protocol_map = []
|
||||
|
||||
for port in self.port_mappings.values():
|
||||
if port.open:
|
||||
listeners.append(port.listener())
|
||||
advertised_listeners.append(port.advertised_listener(node))
|
||||
protocol_map.append(port.listener_security_protocol())
|
||||
|
||||
self.listeners = ','.join(listeners)
|
||||
self.advertised_listeners = ','.join(advertised_listeners)
|
||||
self.listener_security_protocol_map = ','.join(protocol_map)
|
||||
self.interbroker_bootstrap_servers = self.__bootstrap_servers(self.interbroker_listener, True)
|
||||
|
||||
def prop_file(self, node):
|
||||
self.set_protocol_and_port(node)
|
||||
|
||||
#load template configs as dictionary
|
||||
config_template = self.render('kafka.properties', node=node, broker_id=self.idx(node),
|
||||
security_config=self.security_config, num_nodes=self.num_nodes,
|
||||
listener_security_config=self.listener_security_config)
|
||||
|
||||
configs = dict( l.rstrip().split('=', 1) for l in config_template.split('\n')
|
||||
if not l.startswith("#") and "=" in l )
|
||||
|
||||
#load specific test override configs
|
||||
override_configs = KafkaConfig(**node.config)
|
||||
override_configs[config_property.ADVERTISED_HOSTNAME] = node.account.hostname
|
||||
override_configs[config_property.ZOOKEEPER_CONNECT] = self.zk_connect_setting()
|
||||
if self.zk_client_secure:
|
||||
override_configs[config_property.ZOOKEEPER_SSL_CLIENT_ENABLE] = 'true'
|
||||
override_configs[config_property.ZOOKEEPER_CLIENT_CNXN_SOCKET] = 'org.apache.zookeeper.ClientCnxnSocketNetty'
|
||||
else:
|
||||
override_configs[config_property.ZOOKEEPER_SSL_CLIENT_ENABLE] = 'false'
|
||||
|
||||
for prop in self.server_prop_overides:
|
||||
override_configs[prop[0]] = prop[1]
|
||||
|
||||
for prop in self.per_node_server_prop_overrides.get(self.idx(node), []):
|
||||
override_configs[prop[0]] = prop[1]
|
||||
|
||||
#update template configs with test override configs
|
||||
configs.update(override_configs)
|
||||
|
||||
prop_file = self.render_configs(configs)
|
||||
return prop_file
|
||||
|
||||
def render_configs(self, configs):
|
||||
"""Render self as a series of lines key=val\n, and do so in a consistent order. """
|
||||
keys = [k for k in configs.keys()]
|
||||
keys.sort()
|
||||
|
||||
s = ""
|
||||
for k in keys:
|
||||
s += "%s=%s\n" % (k, str(configs[k]))
|
||||
return s
|
||||
|
||||
def start_cmd(self, node):
|
||||
cmd = "export JMX_PORT=%d; " % self.jmx_port
|
||||
cmd += "export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%s\"; " % self.LOG4J_CONFIG
|
||||
heap_kafka_opts = "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=%s" % \
|
||||
self.logs["kafka_heap_dump_file"]["path"]
|
||||
security_kafka_opts = self.security_config.kafka_opts.strip('\"')
|
||||
cmd += "export KAFKA_OPTS=\"%s %s %s\"; " % (heap_kafka_opts, security_kafka_opts, self.extra_kafka_opts)
|
||||
cmd += "%s %s 1>> %s 2>> %s &" % \
|
||||
(self.path.script("kafka-server-start.sh", node),
|
||||
KafkaService.CONFIG_FILE,
|
||||
KafkaService.STDOUT_STDERR_CAPTURE,
|
||||
KafkaService.STDOUT_STDERR_CAPTURE)
|
||||
return cmd
|
||||
|
||||
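    # Illustrative shape of the assembled start command (hedged; the JMX port
    # comes from JmxMixin and the exact KAFKA_OPTS depend on the security
    # configuration of the test):
    #
    #   export JMX_PORT=<jmx_port>; \
    #   export KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:/mnt/kafka/kafka-log4j.properties"; \
    #   export KAFKA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/mnt/kafka/kafka_heap_dump.bin ..."; \
    #   <dist>/bin/kafka-server-start.sh /mnt/kafka/kafka.properties \
    #       1>> /mnt/kafka/server-start-stdout-stderr.log 2>> /mnt/kafka/server-start-stdout-stderr.log &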
def start_node(self, node, timeout_sec=60):
|
||||
node.account.mkdirs(KafkaService.PERSISTENT_ROOT)
|
||||
prop_file = self.prop_file(node)
|
||||
self.logger.info("kafka.properties:")
|
||||
self.logger.info(prop_file)
|
||||
node.account.create_file(KafkaService.CONFIG_FILE, prop_file)
|
||||
node.account.create_file(self.LOG4J_CONFIG, self.render('log4j.properties', log_dir=KafkaService.OPERATIONAL_LOG_DIR))
|
||||
|
||||
self.security_config.setup_node(node)
|
||||
self.security_config.setup_credentials(node, self.path, self.zk_connect_setting(), broker=True)
|
||||
|
||||
cmd = self.start_cmd(node)
|
||||
self.logger.debug("Attempting to start KafkaService on %s with command: %s" % (str(node.account), cmd))
|
||||
with node.account.monitor_log(KafkaService.STDOUT_STDERR_CAPTURE) as monitor:
|
||||
node.account.ssh(cmd)
|
||||
# Kafka 1.0.0 and higher don't have a space between "Kafka" and "Server"
|
||||
monitor.wait_until("Kafka\s*Server.*started", timeout_sec=timeout_sec, backoff_sec=.25,
|
||||
err_msg="Kafka server didn't finish startup in %d seconds" % timeout_sec)
|
||||
|
||||
# Credentials for inter-broker communication are created before starting Kafka.
|
||||
# Client credentials are created after starting Kafka so that both loading of
|
||||
# existing credentials from ZK and dynamic update of credentials in Kafka are tested.
|
||||
self.security_config.setup_credentials(node, self.path, self.zk_connect_setting(), broker=False)
|
||||
|
||||
self.start_jmx_tool(self.idx(node), node)
|
||||
if len(self.pids(node)) == 0:
|
||||
raise Exception("No process ids recorded on node %s" % node.account.hostname)
|
||||
|
||||
def pids(self, node):
|
||||
"""Return process ids associated with running processes on the given node."""
|
||||
try:
|
||||
cmd = "jcmd | grep -e %s | awk '{print $1}'" % self.java_class_name()
|
||||
pid_arr = [pid for pid in node.account.ssh_capture(cmd, allow_fail=True, callback=int)]
|
||||
return pid_arr
|
||||
except (RemoteCommandError, ValueError) as e:
|
||||
return []
|
||||
|
||||
def signal_node(self, node, sig=signal.SIGTERM):
|
||||
pids = self.pids(node)
|
||||
for pid in pids:
|
||||
node.account.signal(pid, sig)
|
||||
|
||||
def signal_leader(self, topic, partition=0, sig=signal.SIGTERM):
|
||||
leader = self.leader(topic, partition)
|
||||
self.signal_node(leader, sig)
|
||||
|
||||
def stop_node(self, node, clean_shutdown=True, timeout_sec=60):
|
||||
pids = self.pids(node)
|
||||
sig = signal.SIGTERM if clean_shutdown else signal.SIGKILL
|
||||
|
||||
for pid in pids:
|
||||
node.account.signal(pid, sig, allow_fail=False)
|
||||
|
||||
try:
|
||||
wait_until(lambda: len(self.pids(node)) == 0, timeout_sec=timeout_sec,
|
||||
err_msg="Kafka node failed to stop in %d seconds" % timeout_sec)
|
||||
except Exception:
|
||||
self.thread_dump(node)
|
||||
raise
|
||||
|
||||
def thread_dump(self, node):
|
||||
for pid in self.pids(node):
|
||||
try:
|
||||
node.account.signal(pid, signal.SIGQUIT, allow_fail=True)
|
||||
except:
|
||||
self.logger.warn("Could not dump threads on node")
|
||||
|
||||
def clean_node(self, node):
|
||||
JmxMixin.clean_node(self, node)
|
||||
self.security_config.clean_node(node)
|
||||
node.account.kill_java_processes(self.java_class_name(),
|
||||
clean_shutdown=False, allow_fail=True)
|
||||
node.account.ssh("sudo rm -rf -- %s" % KafkaService.PERSISTENT_ROOT, allow_fail=False)
|
||||
|
||||
def _kafka_topics_cmd(self, node, use_zk_connection=True):
|
||||
"""
|
||||
Returns kafka-topics.sh command path with jaas configuration and krb5 environment variable
|
||||
set. If Admin client is not going to be used, don't set the environment variable.
|
||||
"""
|
||||
kafka_topic_script = self.path.script("kafka-topics.sh", node)
|
||||
skip_security_settings = use_zk_connection or not node.version.topic_command_supports_bootstrap_server()
|
||||
return kafka_topic_script if skip_security_settings else \
|
||||
"KAFKA_OPTS='-D%s -D%s' %s" % (KafkaService.JAAS_CONF_PROPERTY, KafkaService.KRB5_CONF, kafka_topic_script)
|
||||
|
||||
def _kafka_topics_cmd_config(self, node, use_zk_connection=True):
|
||||
"""
|
||||
Return --command-config parameter to the kafka-topics.sh command. The config parameter specifies
|
||||
the security settings that AdminClient uses to connect to a secure kafka server.
|
||||
"""
|
||||
skip_command_config = use_zk_connection or not node.version.topic_command_supports_bootstrap_server()
|
||||
return "" if skip_command_config else " --command-config <(echo '%s')" % (self.security_config.client_config())
|
||||
|
||||
def create_topic(self, topic_cfg, node=None, use_zk_to_create_topic=True):
|
||||
"""Run the admin tool create topic command.
|
||||
Specifying node is optional, and may be done if different kafka nodes have different versions,
and we care where the command gets run.
|
||||
|
||||
If the node is not specified, run the command from self.nodes[0]
|
||||
"""
|
||||
if node is None:
|
||||
node = self.nodes[0]
|
||||
self.logger.info("Creating topic %s with settings %s",
|
||||
topic_cfg["topic"], topic_cfg)
|
||||
|
||||
use_zk_connection = topic_cfg.get('if-not-exists', False) or use_zk_to_create_topic
|
||||
|
||||
cmd = "%(kafka_topics_cmd)s %(connection_string)s --create --topic %(topic)s " % {
|
||||
'kafka_topics_cmd': self._kafka_topics_cmd(node, use_zk_connection),
|
||||
'connection_string': self._connect_setting(node, use_zk_connection),
|
||||
'topic': topic_cfg.get("topic"),
|
||||
}
|
||||
if 'replica-assignment' in topic_cfg:
|
||||
cmd += " --replica-assignment %(replica-assignment)s" % {
|
||||
'replica-assignment': topic_cfg.get('replica-assignment')
|
||||
}
|
||||
else:
|
||||
cmd += " --partitions %(partitions)d --replication-factor %(replication-factor)d" % {
|
||||
'partitions': topic_cfg.get('partitions', 1),
|
||||
'replication-factor': topic_cfg.get('replication-factor', 1)
|
||||
}
|
||||
|
||||
if topic_cfg.get('if-not-exists', False):
|
||||
cmd += ' --if-not-exists'
|
||||
|
||||
if "configs" in topic_cfg.keys() and topic_cfg["configs"] is not None:
|
||||
for config_name, config_value in topic_cfg["configs"].items():
|
||||
cmd += " --config %s=%s" % (config_name, str(config_value))
|
||||
|
||||
cmd += self._kafka_topics_cmd_config(node, use_zk_connection)
|
||||
|
||||
self.logger.info("Running topic creation command...\n%s" % cmd)
|
||||
node.account.ssh(cmd)
|
||||
|
||||
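    # Illustrative example (hedged): with the default use_zk_to_create_topic=True
    # the generated command looks roughly like
    #   kafka-topics.sh --zookeeper <zk_connect> --create --topic foo \
    #       --partitions 2 --replication-factor 2
    # while with use_zk_to_create_topic=False on a broker version that supports
    # it, the connection switches to --bootstrap-server <brokers> plus an
    # optional --command-config file carrying the client security settings.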
def delete_topic(self, topic, node=None):
|
||||
"""
|
||||
Delete a topic with the topics command
|
||||
:param topic:
|
||||
:param node:
|
||||
:return:
|
||||
"""
|
||||
if node is None:
|
||||
node = self.nodes[0]
|
||||
self.logger.info("Deleting topic %s" % topic)
|
||||
kafka_topic_script = self.path.script("kafka-topics.sh", node)
|
||||
|
||||
cmd = kafka_topic_script + " "
|
||||
cmd += "--bootstrap-server %(bootstrap_servers)s --delete --topic %(topic)s " % {
|
||||
'bootstrap_servers': self.bootstrap_servers(self.security_protocol),
|
||||
'topic': topic
|
||||
}
|
||||
self.logger.info("Running topic delete command...\n%s" % cmd)
|
||||
node.account.ssh(cmd)
|
||||
|
||||
def describe_topic(self, topic, node=None, use_zk_to_describe_topic=True):
|
||||
if node is None:
|
||||
node = self.nodes[0]
|
||||
cmd = "%s %s --topic %s --describe %s" % \
|
||||
(self._kafka_topics_cmd(node=node, use_zk_connection=use_zk_to_describe_topic),
|
||||
self._connect_setting(node=node, use_zk_connection=use_zk_to_describe_topic),
|
||||
topic, self._kafka_topics_cmd_config(node=node, use_zk_connection=use_zk_to_describe_topic))
|
||||
|
||||
self.logger.info("Running topic describe command...\n%s" % cmd)
|
||||
output = ""
|
||||
for line in node.account.ssh_capture(cmd):
|
||||
output += line
|
||||
return output
|
||||
|
||||
def list_topics(self, node=None, use_zk_to_list_topic=True):
|
||||
if node is None:
|
||||
node = self.nodes[0]
|
||||
cmd = "%s %s --list %s" % (self._kafka_topics_cmd(node, use_zk_to_list_topic),
|
||||
self._connect_setting(node, use_zk_to_list_topic),
|
||||
self._kafka_topics_cmd_config(node, use_zk_to_list_topic))
|
||||
for line in node.account.ssh_capture(cmd):
|
||||
if not line.startswith("SLF4J"):
|
||||
yield line.rstrip()
|
||||
|
||||
def alter_message_format(self, topic, msg_format_version, node=None):
|
||||
if node is None:
|
||||
node = self.nodes[0]
|
||||
self.logger.info("Altering message format version for topic %s with format %s", topic, msg_format_version)
|
||||
cmd = "%s --zookeeper %s %s --entity-name %s --entity-type topics --alter --add-config message.format.version=%s" % \
|
||||
(self.path.script("kafka-configs.sh", node), self.zk_connect_setting(), self.zk.zkTlsConfigFileOption(), topic, msg_format_version)
|
||||
self.logger.info("Running alter message format command...\n%s" % cmd)
|
||||
node.account.ssh(cmd)
|
||||
|
||||
def set_unclean_leader_election(self, topic, value=True, node=None):
|
||||
if node is None:
|
||||
node = self.nodes[0]
|
||||
if value is True:
|
||||
self.logger.info("Enabling unclean leader election for topic %s", topic)
|
||||
else:
|
||||
self.logger.info("Disabling unclean leader election for topic %s", topic)
|
||||
cmd = "%s --zookeeper %s %s --entity-name %s --entity-type topics --alter --add-config unclean.leader.election.enable=%s" % \
|
||||
(self.path.script("kafka-configs.sh", node), self.zk_connect_setting(), self.zk.zkTlsConfigFileOption(), topic, str(value).lower())
|
||||
self.logger.info("Running alter unclean leader command...\n%s" % cmd)
|
||||
node.account.ssh(cmd)
|
||||
|
||||
def parse_describe_topic(self, topic_description):
|
||||
"""Parse output of kafka-topics.sh --describe (or describe_topic() method above), which is a string of form
|
||||
PartitionCount:2\tReplicationFactor:2\tConfigs:
|
||||
Topic: test_topic\tPartition: 0\tLeader: 3\tReplicas: 3,1\tIsr: 3,1
|
||||
Topic: test_topic\tPartition: 1\tLeader: 1\tReplicas: 1,2\tIsr: 1,2
|
||||
into a dictionary structure appropriate for use with reassign-partitions tool:
|
||||
{
|
||||
"partitions": [
|
||||
{"topic": "test_topic", "partition": 0, "replicas": [3, 1]},
|
||||
{"topic": "test_topic", "partition": 1, "replicas": [1, 2]}
|
||||
]
|
||||
}
|
||||
"""
|
||||
lines = map(lambda x: x.strip(), topic_description.split("\n"))
|
||||
partitions = []
|
||||
for line in lines:
|
||||
m = re.match(".*Leader:.*", line)
|
||||
if m is None:
|
||||
continue
|
||||
|
||||
fields = line.split("\t")
|
||||
# ["Partition: 4", "Leader: 0"] -> ["4", "0"]
|
||||
fields = map(lambda x: x.split(" ")[1], fields)
|
||||
partitions.append(
|
||||
{"topic": fields[0],
|
||||
"partition": int(fields[1]),
|
||||
"replicas": map(int, fields[3].split(','))})
|
||||
return {"partitions": partitions}
|
||||
|
||||
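    # Illustrative round trip (hedged sketch): the structure returned above can
    # be fed straight into the reassignment helpers defined below, e.g.
    #
    #   plan = kafka.parse_describe_topic(kafka.describe_topic("test_topic"))
    #   kafka.execute_reassign_partitions(plan)
    #   wait_until(lambda: kafka.verify_reassign_partitions(plan),
    #              timeout_sec=60, backoff_sec=1)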
def verify_reassign_partitions(self, reassignment, node=None):
|
||||
"""Run the reassign partitions admin tool in "verify" mode
|
||||
"""
|
||||
if node is None:
|
||||
node = self.nodes[0]
|
||||
|
||||
json_file = "/tmp/%s_reassign.json" % str(time.time())
|
||||
|
||||
# reassignment to json
|
||||
json_str = json.dumps(reassignment)
|
||||
json_str = json.dumps(json_str)
|
||||
|
||||
# create command
|
||||
cmd = "echo %s > %s && " % (json_str, json_file)
|
||||
cmd += "%s " % self.path.script("kafka-reassign-partitions.sh", node)
|
||||
cmd += "--zookeeper %s " % self.zk_connect_setting()
|
||||
cmd += "--reassignment-json-file %s " % json_file
|
||||
cmd += "--verify "
|
||||
cmd += "&& sleep 1 && rm -f %s" % json_file
|
||||
|
||||
# send command
|
||||
self.logger.info("Verifying partition reassignment...")
|
||||
self.logger.debug(cmd)
|
||||
output = ""
|
||||
for line in node.account.ssh_capture(cmd):
|
||||
output += line
|
||||
|
||||
self.logger.debug(output)
|
||||
|
||||
if re.match(".*Reassignment of partition.*failed.*",
|
||||
output.replace('\n', '')) is not None:
|
||||
return False
|
||||
|
||||
if re.match(".*is still in progress.*",
|
||||
output.replace('\n', '')) is not None:
|
||||
return False
|
||||
|
||||
return True
|
||||
|
||||
def execute_reassign_partitions(self, reassignment, node=None,
|
||||
throttle=None):
|
||||
"""Run the reassign partitions admin tool in "verify" mode
|
||||
"""
|
||||
if node is None:
|
||||
node = self.nodes[0]
|
||||
json_file = "/tmp/%s_reassign.json" % str(time.time())
|
||||
|
||||
# reassignment to json
|
||||
json_str = json.dumps(reassignment)
|
||||
json_str = json.dumps(json_str)
|
||||
|
||||
# create command
|
||||
cmd = "echo %s > %s && " % (json_str, json_file)
|
||||
cmd += "%s " % self.path.script( "kafka-reassign-partitions.sh", node)
|
||||
cmd += "--zookeeper %s " % self.zk_connect_setting()
|
||||
cmd += "--reassignment-json-file %s " % json_file
|
||||
cmd += "--execute"
|
||||
if throttle is not None:
|
||||
cmd += " --throttle %d" % throttle
|
||||
cmd += " && sleep 1 && rm -f %s" % json_file
|
||||
|
||||
# send command
|
||||
self.logger.info("Executing parition reassignment...")
|
||||
self.logger.debug(cmd)
|
||||
output = ""
|
||||
for line in node.account.ssh_capture(cmd):
|
||||
output += line
|
||||
|
||||
self.logger.debug("Verify partition reassignment:")
|
||||
self.logger.debug(output)
|
||||
|
||||
def search_data_files(self, topic, messages):
|
||||
"""Check if a set of messages made it into the Kakfa data files. Note that
|
||||
this method takes no account of replication. It simply looks for the
|
||||
payload in all the partition files of the specified topic. 'messages' should be
|
||||
an array of numbers. The list of missing messages is returned.
|
||||
"""
|
||||
payload_match = "payload: " + "$|payload: ".join(str(x) for x in messages) + "$"
|
||||
found = set([])
|
||||
self.logger.debug("number of unique missing messages we will search for: %d",
|
||||
len(messages))
|
||||
for node in self.nodes:
|
||||
# Grab all .log files in directories prefixed with this topic
|
||||
files = node.account.ssh_capture("find %s* -regex '.*/%s-.*/[^/]*.log'" % (KafkaService.DATA_LOG_DIR_PREFIX, topic))
|
||||
|
||||
# Check each data file to see if it contains the messages we want
|
||||
for log in files:
|
||||
cmd = "%s kafka.tools.DumpLogSegments --print-data-log --files %s | grep -E \"%s\"" % \
|
||||
(self.path.script("kafka-run-class.sh", node), log.strip(), payload_match)
|
||||
|
||||
for line in node.account.ssh_capture(cmd, allow_fail=True):
|
||||
for val in messages:
|
||||
if line.strip().endswith("payload: "+str(val)):
|
||||
self.logger.debug("Found %s in data-file [%s] in line: [%s]" % (val, log.strip(), line.strip()))
|
||||
found.add(val)
|
||||
|
||||
self.logger.debug("Number of unique messages found in the log: %d",
|
||||
len(found))
|
||||
missing = list(set(messages) - found)
|
||||
|
||||
if len(missing) > 0:
|
||||
self.logger.warn("The following values were not found in the data files: " + str(missing))
|
||||
|
||||
return missing
|
||||
|
||||
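    # Illustrative example (hedged): search the data files of "test-topic" for
    # integer payloads 0..99; whatever never reached a log segment is returned.
    #
    #   missing = kafka.search_data_files("test-topic", list(range(100)))
    #   assert not missing, "Payloads not found in data files: %s" % missing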
def restart_cluster(self, clean_shutdown=True, timeout_sec=60, after_each_broker_restart=None, *args):
|
||||
for node in self.nodes:
|
||||
self.restart_node(node, clean_shutdown=clean_shutdown, timeout_sec=timeout_sec)
|
||||
if after_each_broker_restart is not None:
|
||||
after_each_broker_restart(*args)
|
||||
|
||||
def restart_node(self, node, clean_shutdown=True, timeout_sec=60):
|
||||
"""Restart the given node."""
|
||||
self.stop_node(node, clean_shutdown, timeout_sec)
|
||||
self.start_node(node, timeout_sec)
|
||||
|
||||
def isr_idx_list(self, topic, partition=0):
|
||||
""" Get in-sync replica list the given topic and partition.
|
||||
"""
|
||||
self.logger.debug("Querying zookeeper to find in-sync replicas for topic %s and partition %d" % (topic, partition))
|
||||
zk_path = "/brokers/topics/%s/partitions/%d/state" % (topic, partition)
|
||||
partition_state = self.zk.query(zk_path, chroot=self.zk_chroot)
|
||||
|
||||
if partition_state is None:
|
||||
raise Exception("Error finding partition state for topic %s and partition %d." % (topic, partition))
|
||||
|
||||
partition_state = json.loads(partition_state)
|
||||
self.logger.info(partition_state)
|
||||
|
||||
isr_idx_list = partition_state["isr"]
|
||||
self.logger.info("Isr for topic %s and partition %d is now: %s" % (topic, partition, isr_idx_list))
|
||||
return isr_idx_list
|
||||
|
||||
def replicas(self, topic, partition=0):
|
||||
""" Get the assigned replicas for the given topic and partition.
|
||||
"""
|
||||
self.logger.debug("Querying zookeeper to find assigned replicas for topic %s and partition %d" % (topic, partition))
|
||||
zk_path = "/brokers/topics/%s" % (topic)
|
||||
assignment = self.zk.query(zk_path, chroot=self.zk_chroot)
|
||||
|
||||
if assignment is None:
|
||||
raise Exception("Error finding partition state for topic %s and partition %d." % (topic, partition))
|
||||
|
||||
assignment = json.loads(assignment)
|
||||
self.logger.info(assignment)
|
||||
|
||||
replicas = assignment["partitions"][str(partition)]
|
||||
|
||||
self.logger.info("Assigned replicas for topic %s and partition %d is now: %s" % (topic, partition, replicas))
|
||||
return [self.get_node(replica) for replica in replicas]
|
||||
|
||||
def leader(self, topic, partition=0):
|
||||
""" Get the leader replica for the given topic and partition.
|
||||
"""
|
||||
self.logger.debug("Querying zookeeper to find leader replica for topic %s and partition %d" % (topic, partition))
|
||||
zk_path = "/brokers/topics/%s/partitions/%d/state" % (topic, partition)
|
||||
partition_state = self.zk.query(zk_path, chroot=self.zk_chroot)
|
||||
|
||||
if partition_state is None:
|
||||
raise Exception("Error finding partition state for topic %s and partition %d." % (topic, partition))
|
||||
|
||||
partition_state = json.loads(partition_state)
|
||||
self.logger.info(partition_state)
|
||||
|
||||
leader_idx = int(partition_state["leader"])
|
||||
self.logger.info("Leader for topic %s and partition %d is now: %d" % (topic, partition, leader_idx))
|
||||
return self.get_node(leader_idx)
|
||||
|
||||
def cluster_id(self):
|
||||
""" Get the current cluster id
|
||||
"""
|
||||
self.logger.debug("Querying ZooKeeper to retrieve cluster id")
|
||||
cluster = self.zk.query("/cluster/id", chroot=self.zk_chroot)
|
||||
|
||||
try:
|
||||
return json.loads(cluster)['id'] if cluster else None
|
||||
except:
|
||||
self.logger.debug("Data in /cluster/id znode could not be parsed. Data = %s" % cluster)
|
||||
raise
|
||||
|
||||
def check_protocol_errors(self, node):
|
||||
""" Checks for common protocol exceptions due to invalid inter broker protocol handling.
|
||||
While such errors can and should be checked in other ways, checking the logs is a worthwhile failsafe.
|
||||
"""
|
||||
for node in self.nodes:
|
||||
exit_code = node.account.ssh("grep -e 'java.lang.IllegalArgumentException: Invalid version' -e SchemaException %s/*"
|
||||
% KafkaService.OPERATIONAL_LOG_DEBUG_DIR, allow_fail=True)
|
||||
if exit_code != 1:
|
||||
return False
|
||||
return True
|
||||
|
||||
def list_consumer_groups(self, node=None, command_config=None):
|
||||
""" Get list of consumer groups.
|
||||
"""
|
||||
if node is None:
|
||||
node = self.nodes[0]
|
||||
consumer_group_script = self.path.script("kafka-consumer-groups.sh", node)
|
||||
|
||||
if command_config is None:
|
||||
command_config = ""
|
||||
else:
|
||||
command_config = "--command-config " + command_config
|
||||
|
||||
cmd = "%s --bootstrap-server %s %s --list" % \
|
||||
(consumer_group_script,
|
||||
self.bootstrap_servers(self.security_protocol),
|
||||
command_config)
|
||||
output = ""
|
||||
self.logger.debug(cmd)
|
||||
for line in node.account.ssh_capture(cmd):
|
||||
if not line.startswith("SLF4J"):
|
||||
output += line
|
||||
self.logger.debug(output)
|
||||
return output
|
||||
|
||||
def describe_consumer_group(self, group, node=None, command_config=None):
|
||||
""" Describe a consumer group.
|
||||
"""
|
||||
if node is None:
|
||||
node = self.nodes[0]
|
||||
consumer_group_script = self.path.script("kafka-consumer-groups.sh", node)
|
||||
|
||||
if command_config is None:
|
||||
command_config = ""
|
||||
else:
|
||||
command_config = "--command-config " + command_config
|
||||
|
||||
cmd = "%s --bootstrap-server %s %s --group %s --describe" % \
|
||||
(consumer_group_script,
|
||||
self.bootstrap_servers(self.security_protocol),
|
||||
command_config, group)
|
||||
|
||||
output = ""
|
||||
self.logger.debug(cmd)
|
||||
for line in node.account.ssh_capture(cmd):
|
||||
if not (line.startswith("SLF4J") or line.startswith("TOPIC") or line.startswith("Could not fetch offset")):
|
||||
output += line
|
||||
self.logger.debug(output)
|
||||
return output
|
||||
|
||||
def zk_connect_setting(self):
|
||||
return self.zk.connect_setting(self.zk_chroot, self.zk_client_secure)
|
||||
|
||||
def _connect_setting(self, node, use_zk_connection=True):
|
||||
"""
|
||||
Checks if --bootstrap-server config is supported, if yes then returns a string with
|
||||
bootstrap server, otherwise returns zookeeper connection string.
|
||||
"""
|
||||
if node.version.topic_command_supports_bootstrap_server() and not use_zk_connection:
|
||||
connection_setting = "--bootstrap-server %s" % (self.bootstrap_servers(self.security_protocol))
|
||||
else:
|
||||
connection_setting = "--zookeeper %s" % (self.zk_connect_setting())
|
||||
|
||||
return connection_setting
|
||||
|
||||
def __bootstrap_servers(self, port, validate=True, offline_nodes=[]):
|
||||
if validate and not port.open:
|
||||
raise ValueError("We are retrieving bootstrap servers for the port: %s which is not currently open. - " %
|
||||
str(port.port_number))
|
||||
|
||||
return ','.join([node.account.hostname + ":" + str(port.port_number)
|
||||
for node in self.nodes
|
||||
if node not in offline_nodes])
|
||||
|
||||
def bootstrap_servers(self, protocol='PLAINTEXT', validate=True, offline_nodes=[]):
|
||||
"""Return comma-delimited list of brokers in this cluster formatted as HOSTNAME1:PORT1,HOSTNAME:PORT2,...
|
||||
|
||||
This is the format expected by many config files.
|
||||
"""
|
||||
port_mapping = self.port_mappings[protocol]
|
||||
self.logger.info("Bootstrap client port is: " + str(port_mapping.port_number))
|
||||
return self.__bootstrap_servers(port_mapping, validate, offline_nodes)
|
||||
|
||||
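    # Illustrative example: for a three-node cluster on hosts worker1..worker3
    # with the PLAINTEXT listener open, bootstrap_servers('PLAINTEXT') returns
    #   "worker1:9092,worker2:9092,worker3:9092"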
def controller(self):
|
||||
""" Get the controller node
|
||||
"""
|
||||
self.logger.debug("Querying zookeeper to find controller broker")
|
||||
controller_info = self.zk.query("/controller", chroot=self.zk_chroot)
|
||||
|
||||
if controller_info is None:
|
||||
raise Exception("Error finding controller info")
|
||||
|
||||
controller_info = json.loads(controller_info)
|
||||
self.logger.debug(controller_info)
|
||||
|
||||
controller_idx = int(controller_info["brokerid"])
|
||||
self.logger.info("Controller's ID: %d" % (controller_idx))
|
||||
return self.get_node(controller_idx)
|
||||
|
||||
def is_registered(self, node):
|
||||
"""
|
||||
Check whether a broker is registered in Zookeeper
|
||||
"""
|
||||
self.logger.debug("Querying zookeeper to see if broker %s is registered", str(node))
|
||||
broker_info = self.zk.query("/brokers/ids/%s" % self.idx(node), chroot=self.zk_chroot)
|
||||
self.logger.debug("Broker info: %s", broker_info)
|
||||
return broker_info is not None
|
||||
|
||||
def get_offset_shell(self, topic, partitions, max_wait_ms, offsets, time):
|
||||
node = self.nodes[0]
|
||||
|
||||
cmd = self.path.script("kafka-run-class.sh", node)
|
||||
cmd += " kafka.tools.GetOffsetShell"
|
||||
cmd += " --topic %s --broker-list %s --max-wait-ms %s --offsets %s --time %s" % (topic, self.bootstrap_servers(self.security_protocol), max_wait_ms, offsets, time)
|
||||
|
||||
if partitions:
|
||||
cmd += ' --partitions %s' % partitions
|
||||
|
||||
cmd += " 2>> %s/get_offset_shell.log" % KafkaService.PERSISTENT_ROOT
|
||||
cmd += " | tee -a %s/get_offset_shell.log &" % KafkaService.PERSISTENT_ROOT
|
||||
output = ""
|
||||
self.logger.debug(cmd)
|
||||
for line in node.account.ssh_capture(cmd):
|
||||
output += line
|
||||
self.logger.debug(output)
|
||||
return output
|
||||
|
||||
def java_class_name(self):
|
||||
return "kafka.Kafka"
|
||||
91
tests/kafkatest/services/kafka/templates/kafka.properties
Normal file
@@ -0,0 +1,91 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# see kafka.server.KafkaConfig for additional details and defaults
|
||||
advertised.host.name={{ node.account.hostname }}


listeners={{ listeners }}
advertised.listeners={{ advertised_listeners }}
listener.security.protocol.map={{ listener_security_protocol_map }}

{% if node.version.supports_named_listeners() %}
inter.broker.listener.name={{ interbroker_listener.name }}
{% else %}
security.inter.broker.protocol={{ interbroker_listener.security_protocol }}
{% endif %}

{% for k, v in listener_security_config.client_listener_overrides.iteritems() %}
{% if listener_security_config.requires_sasl_mechanism_prefix(k) %}
listener.name.{{ security_protocol.lower() }}.{{ security_config.client_sasl_mechanism.lower() }}.{{ k }}={{ v }}
{% else %}
listener.name.{{ security_protocol.lower() }}.{{ k }}={{ v }}
{% endif %}
{% endfor %}

{% if interbroker_listener.name != security_protocol %}
{% for k, v in listener_security_config.interbroker_listener_overrides.iteritems() %}
{% if listener_security_config.requires_sasl_mechanism_prefix(k) %}
listener.name.{{ interbroker_listener.name.lower() }}.{{ security_config.interbroker_sasl_mechanism.lower() }}.{{ k }}={{ v }}
{% else %}
listener.name.{{ interbroker_listener.name.lower() }}.{{ k }}={{ v }}
{% endif %}
{% endfor %}
{% endif %}

ssl.keystore.location=/mnt/security/test.keystore.jks
ssl.keystore.password=test-ks-passwd
ssl.key.password=test-ks-passwd
ssl.keystore.type=JKS
ssl.truststore.location=/mnt/security/test.truststore.jks
ssl.truststore.password=test-ts-passwd
ssl.truststore.type=JKS
ssl.endpoint.identification.algorithm=HTTPS
# Zookeeper TLS settings
#
# Note that zookeeper.ssl.client.enable will be set to true or false elsewhere, as appropriate.
# If it is false then these ZK keystore/truststore settings will have no effect. If it is true then
# zookeeper.clientCnxnSocket will also be set elsewhere (to org.apache.zookeeper.ClientCnxnSocketNetty)
{% if not zk.zk_tls_encrypt_only %}
zookeeper.ssl.keystore.location=/mnt/security/test.keystore.jks
zookeeper.ssl.keystore.password=test-ks-passwd
{% endif %}
zookeeper.ssl.truststore.location=/mnt/security/test.truststore.jks
zookeeper.ssl.truststore.password=test-ts-passwd
#
sasl.mechanism.inter.broker.protocol={{ security_config.interbroker_sasl_mechanism }}
sasl.enabled.mechanisms={{ ",".join(security_config.enabled_sasl_mechanisms) }}
sasl.kerberos.service.name=kafka
{% if authorizer_class_name is not none %}
ssl.client.auth=required
authorizer.class.name={{ authorizer_class_name }}
{% endif %}

zookeeper.set.acl={{"true" if zk_set_acl else "false"}}

zookeeper.connection.timeout.ms={{ zk_connect_timeout }}
zookeeper.session.timeout.ms={{ zk_session_timeout }}

{% if replica_lag is defined %}
replica.lag.time.max.ms={{replica_lag}}
{% endif %}

{% if auto_create_topics_enable is defined and auto_create_topics_enable is not none %}
auto.create.topics.enable={{ auto_create_topics_enable }}
{% endif %}
offsets.topic.num.partitions={{ num_nodes }}
offsets.topic.replication.factor={{ 3 if num_nodes > 3 else num_nodes }}
# Set to a low, but non-zero value to exercise this path without making tests much slower
group.initial.rebalance.delay.ms=100
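For reference, ducktape renders this template with Jinja2 (via the service's `render()` call). Below is a minimal sketch, not part of the patch, showing how a stripped-down excerpt of the listener section could be rendered standalone to inspect the generated broker config; the context values and the `supports_named_listeners` flag are simplified stand-ins for what KafkaService supplies per node.

```
# Not part of the patch: standalone rendering of a slice of the template above.
# Context values are simplified stand-ins for what KafkaService passes per node.
from jinja2 import Template

TEMPLATE = """\
listeners={{ listeners }}
advertised.listeners={{ advertised_listeners }}
{% if supports_named_listeners %}
inter.broker.listener.name={{ interbroker_listener_name }}
{% else %}
security.inter.broker.protocol={{ interbroker_security_protocol }}
{% endif %}
"""

print(Template(TEMPLATE).render(
    listeners="PLAINTEXT://:9092",
    advertised_listeners="PLAINTEXT://worker1:9092",
    supports_named_listeners=True,
    interbroker_listener_name="PLAINTEXT",
    interbroker_security_protocol="PLAINTEXT"))
```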
136
tests/kafkatest/services/kafka/templates/log4j.properties
Normal file
@@ -0,0 +1,136 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
log4j.rootLogger={{ log_level|default("DEBUG") }}, stdout
|
||||
|
||||
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
|
||||
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
|
||||
log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n
|
||||
|
||||
# INFO level appenders
|
||||
log4j.appender.kafkaInfoAppender=org.apache.log4j.DailyRollingFileAppender
|
||||
log4j.appender.kafkaInfoAppender.DatePattern='.'yyyy-MM-dd-HH
|
||||
log4j.appender.kafkaInfoAppender.File={{ log_dir }}/info/server.log
|
||||
log4j.appender.kafkaInfoAppender.layout=org.apache.log4j.PatternLayout
|
||||
log4j.appender.kafkaInfoAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
|
||||
log4j.appender.kafkaInfoAppender.Threshold=INFO
|
||||
|
||||
log4j.appender.stateChangeInfoAppender=org.apache.log4j.DailyRollingFileAppender
|
||||
log4j.appender.stateChangeInfoAppender.DatePattern='.'yyyy-MM-dd-HH
|
||||
log4j.appender.stateChangeInfoAppender.File={{ log_dir }}/info/state-change.log
|
||||
log4j.appender.stateChangeInfoAppender.layout=org.apache.log4j.PatternLayout
|
||||
log4j.appender.stateChangeInfoAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
|
||||
log4j.appender.stateChangeInfoAppender.Threshold=INFO
|
||||
|
||||
log4j.appender.requestInfoAppender=org.apache.log4j.DailyRollingFileAppender
|
||||
log4j.appender.requestInfoAppender.DatePattern='.'yyyy-MM-dd-HH
|
||||
log4j.appender.requestInfoAppender.File={{ log_dir }}/info/kafka-request.log
|
||||
log4j.appender.requestInfoAppender.layout=org.apache.log4j.PatternLayout
|
||||
log4j.appender.requestInfoAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
|
||||
log4j.appender.requestInfoAppender.Threshold=INFO
|
||||
|
||||
log4j.appender.cleanerInfoAppender=org.apache.log4j.DailyRollingFileAppender
|
||||
log4j.appender.cleanerInfoAppender.DatePattern='.'yyyy-MM-dd-HH
|
||||
log4j.appender.cleanerInfoAppender.File={{ log_dir }}/info/log-cleaner.log
|
||||
log4j.appender.cleanerInfoAppender.layout=org.apache.log4j.PatternLayout
|
||||
log4j.appender.cleanerInfoAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
|
||||
log4j.appender.cleanerInfoAppender.Threshold=INFO
|
||||
|
||||
log4j.appender.controllerInfoAppender=org.apache.log4j.DailyRollingFileAppender
|
||||
log4j.appender.controllerInfoAppender.DatePattern='.'yyyy-MM-dd-HH
|
||||
log4j.appender.controllerInfoAppender.File={{ log_dir }}/info/controller.log
|
||||
log4j.appender.controllerInfoAppender.layout=org.apache.log4j.PatternLayout
|
||||
log4j.appender.controllerInfoAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
|
||||
log4j.appender.controllerInfoAppender.Threshold=INFO
|
||||
|
||||
log4j.appender.authorizerInfoAppender=org.apache.log4j.DailyRollingFileAppender
|
||||
log4j.appender.authorizerInfoAppender.DatePattern='.'yyyy-MM-dd-HH
|
||||
log4j.appender.authorizerInfoAppender.File={{ log_dir }}/info/kafka-authorizer.log
|
||||
log4j.appender.authorizerInfoAppender.layout=org.apache.log4j.PatternLayout
|
||||
log4j.appender.authorizerInfoAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
|
||||
log4j.appender.authorizerInfoAppender.Threshold=INFO
|
||||
|
||||
# DEBUG level appenders
|
||||
log4j.appender.kafkaDebugAppender=org.apache.log4j.DailyRollingFileAppender
|
||||
log4j.appender.kafkaDebugAppender.DatePattern='.'yyyy-MM-dd-HH
|
||||
log4j.appender.kafkaDebugAppender.File={{ log_dir }}/debug/server.log
|
||||
log4j.appender.kafkaDebugAppender.layout=org.apache.log4j.PatternLayout
|
||||
log4j.appender.kafkaDebugAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
|
||||
log4j.appender.kafkaDebugAppender.Threshold=DEBUG
|
||||
|
||||
log4j.appender.stateChangeDebugAppender=org.apache.log4j.DailyRollingFileAppender
|
||||
log4j.appender.stateChangeDebugAppender.DatePattern='.'yyyy-MM-dd-HH
|
||||
log4j.appender.stateChangeDebugAppender.File={{ log_dir }}/debug/state-change.log
|
||||
log4j.appender.stateChangeDebugAppender.layout=org.apache.log4j.PatternLayout
|
||||
log4j.appender.stateChangeDebugAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
|
||||
log4j.appender.stateChangeDebugAppender.Threshold=DEBUG
|
||||
|
||||
log4j.appender.requestDebugAppender=org.apache.log4j.DailyRollingFileAppender
|
||||
log4j.appender.requestDebugAppender.DatePattern='.'yyyy-MM-dd-HH
|
||||
log4j.appender.requestDebugAppender.File={{ log_dir }}/debug/kafka-request.log
|
||||
log4j.appender.requestDebugAppender.layout=org.apache.log4j.PatternLayout
|
||||
log4j.appender.requestDebugAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
|
||||
log4j.appender.requestDebugAppender.Threshold=DEBUG
|
||||
|
||||
log4j.appender.cleanerDebugAppender=org.apache.log4j.DailyRollingFileAppender
|
||||
log4j.appender.cleanerDebugAppender.DatePattern='.'yyyy-MM-dd-HH
|
||||
log4j.appender.cleanerDebugAppender.File={{ log_dir }}/debug/log-cleaner.log
|
||||
log4j.appender.cleanerDebugAppender.layout=org.apache.log4j.PatternLayout
|
||||
log4j.appender.cleanerDebugAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
|
||||
log4j.appender.cleanerDebugAppender.Threshold=DEBUG
|
||||
|
||||
log4j.appender.controllerDebugAppender=org.apache.log4j.DailyRollingFileAppender
|
||||
log4j.appender.controllerDebugAppender.DatePattern='.'yyyy-MM-dd-HH
|
||||
log4j.appender.controllerDebugAppender.File={{ log_dir }}/debug/controller.log
|
||||
log4j.appender.controllerDebugAppender.layout=org.apache.log4j.PatternLayout
|
||||
log4j.appender.controllerDebugAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
|
||||
log4j.appender.controllerDebugAppender.Threshold=DEBUG
|
||||
|
||||
log4j.appender.authorizerDebugAppender=org.apache.log4j.DailyRollingFileAppender
|
||||
log4j.appender.authorizerDebugAppender.DatePattern='.'yyyy-MM-dd-HH
|
||||
log4j.appender.authorizerDebugAppender.File={{ log_dir }}/debug/kafka-authorizer.log
|
||||
log4j.appender.authorizerDebugAppender.layout=org.apache.log4j.PatternLayout
|
||||
log4j.appender.authorizerDebugAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
|
||||
log4j.appender.authorizerDebugAppender.Threshold=DEBUG
|
||||
|
||||
# Turn on all our debugging info
|
||||
log4j.logger.kafka.producer.async.DefaultEventHandler={{ log_level|default("DEBUG") }}, kafkaInfoAppender, kafkaDebugAppender
|
||||
log4j.logger.kafka.client.ClientUtils={{ log_level|default("DEBUG") }}, kafkaInfoAppender, kafkaDebugAppender
|
||||
log4j.logger.kafka.perf={{ log_level|default("DEBUG") }}, kafkaInfoAppender, kafkaDebugAppender
|
||||
log4j.logger.kafka.perf.ProducerPerformance$ProducerThread={{ log_level|default("DEBUG") }}, kafkaInfoAppender, kafkaDebugAppender
|
||||
log4j.logger.kafka={{ log_level|default("DEBUG") }}, kafkaInfoAppender, kafkaDebugAppender
|
||||
|
||||
log4j.logger.kafka.network.RequestChannel$={{ log_level|default("DEBUG") }}, requestInfoAppender, requestDebugAppender
|
||||
log4j.additivity.kafka.network.RequestChannel$=false
|
||||
|
||||
log4j.logger.kafka.network.Processor={{ log_level|default("DEBUG") }}, requestInfoAppender, requestDebugAppender
|
||||
log4j.logger.kafka.server.KafkaApis={{ log_level|default("DEBUG") }}, requestInfoAppender, requestDebugAppender
|
||||
log4j.additivity.kafka.server.KafkaApis=false
|
||||
log4j.logger.kafka.request.logger={{ log_level|default("DEBUG") }}, requestInfoAppender, requestDebugAppender
|
||||
log4j.additivity.kafka.request.logger=false
|
||||
|
||||
log4j.logger.kafka.controller={{ log_level|default("DEBUG") }}, controllerInfoAppender, controllerDebugAppender
|
||||
log4j.additivity.kafka.controller=false
|
||||
|
||||
log4j.logger.kafka.log.LogCleaner={{ log_level|default("DEBUG") }}, cleanerInfoAppender, cleanerDebugAppender
|
||||
log4j.additivity.kafka.log.LogCleaner=false
|
||||
|
||||
log4j.logger.state.change.logger={{ log_level|default("DEBUG") }}, stateChangeInfoAppender, stateChangeDebugAppender
|
||||
log4j.additivity.state.change.logger=false
|
||||
|
||||
#Change this to debug to get the actual audit log for authorizer.
|
||||
log4j.logger.kafka.authorizer.logger={{ log_level|default("DEBUG") }}, authorizerInfoAppender, authorizerDebugAppender
|
||||
log4j.additivity.kafka.authorizer.logger=false
|
||||
|
||||
18
tests/kafkatest/services/kafka/util.py
Normal file
@@ -0,0 +1,18 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from collections import namedtuple

TopicPartition = namedtuple('TopicPartition', ['topic', 'partition'])
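A minimal usage sketch (not part of the patch) for the `TopicPartition` helper above: because it is a namedtuple it is hashable, so tests can key per-partition state such as offset maps by topic and partition.

```
# Not part of the patch: TopicPartition is hashable, so it can key per-partition
# state such as committed-offset maps in tests.
from kafkatest.services.kafka.util import TopicPartition

tp = TopicPartition(topic="test_topic", partition=0)
offsets = {tp: 42}
assert offsets[TopicPartition("test_topic", 0)] == 42
```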
83
tests/kafkatest/services/kafka_log4j_appender.py
Normal file
@@ -0,0 +1,83 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from ducktape.services.background_thread import BackgroundThreadService
|
||||
|
||||
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
|
||||
from kafkatest.services.security.security_config import SecurityConfig
|
||||
|
||||
|
||||
class KafkaLog4jAppender(KafkaPathResolverMixin, BackgroundThreadService):
|
||||
|
||||
logs = {
|
||||
"producer_log": {
|
||||
"path": "/mnt/kafka_log4j_appender.log",
|
||||
"collect_default": False}
|
||||
}
|
||||
|
||||
def __init__(self, context, num_nodes, kafka, topic, max_messages=-1, security_protocol="PLAINTEXT"):
|
||||
super(KafkaLog4jAppender, self).__init__(context, num_nodes)
|
||||
|
||||
self.kafka = kafka
|
||||
self.topic = topic
|
||||
self.max_messages = max_messages
|
||||
self.security_protocol = security_protocol
|
||||
self.security_config = SecurityConfig(self.context, security_protocol)
|
||||
self.stop_timeout_sec = 30
|
||||
|
||||
def _worker(self, idx, node):
|
||||
cmd = self.start_cmd(node)
|
||||
self.logger.debug("VerifiableLog4jAppender %d command: %s" % (idx, cmd))
|
||||
self.security_config.setup_node(node)
|
||||
node.account.ssh(cmd)
|
||||
|
||||
def start_cmd(self, node):
|
||||
cmd = self.path.script("kafka-run-class.sh", node)
|
||||
cmd += " "
|
||||
cmd += self.java_class_name()
|
||||
cmd += " --topic %s --broker-list %s" % (self.topic, self.kafka.bootstrap_servers(self.security_protocol))
|
||||
|
||||
if self.max_messages > 0:
|
||||
cmd += " --max-messages %s" % str(self.max_messages)
|
||||
if self.security_protocol != SecurityConfig.PLAINTEXT:
|
||||
cmd += " --security-protocol %s" % str(self.security_protocol)
|
||||
if self.security_protocol == SecurityConfig.SSL or self.security_protocol == SecurityConfig.SASL_SSL:
|
||||
cmd += " --ssl-truststore-location %s" % str(SecurityConfig.TRUSTSTORE_PATH)
|
||||
cmd += " --ssl-truststore-password %s" % str(SecurityConfig.ssl_stores.truststore_passwd)
|
||||
if self.security_protocol == SecurityConfig.SASL_PLAINTEXT or \
|
||||
self.security_protocol == SecurityConfig.SASL_SSL or \
|
||||
self.security_protocol == SecurityConfig.SASL_MECHANISM_GSSAPI or \
|
||||
self.security_protocol == SecurityConfig.SASL_MECHANISM_PLAIN:
|
||||
cmd += " --sasl-kerberos-service-name %s" % str('kafka')
|
||||
cmd += " --client-jaas-conf-path %s" % str(SecurityConfig.JAAS_CONF_PATH)
|
||||
cmd += " --kerb5-conf-path %s" % str(SecurityConfig.KRB5CONF_PATH)
|
||||
|
||||
cmd += " 2>> /mnt/kafka_log4j_appender.log | tee -a /mnt/kafka_log4j_appender.log &"
|
||||
return cmd
|
||||
|
||||
def stop_node(self, node):
|
||||
node.account.kill_java_processes(self.java_class_name(), allow_fail=False)
|
||||
|
||||
stopped = self.wait_node(node, timeout_sec=self.stop_timeout_sec)
|
||||
assert stopped, "Node %s: did not stop within the specified timeout of %s seconds" % \
|
||||
(str(node.account), str(self.stop_timeout_sec))
|
||||
|
||||
def clean_node(self, node):
|
||||
node.account.kill_java_processes(self.java_class_name(), clean_shutdown=False,
|
||||
allow_fail=False)
|
||||
node.account.ssh("rm -rf /mnt/kafka_log4j_appender.log", allow_fail=False)
|
||||
|
||||
def java_class_name(self):
|
||||
return "org.apache.kafka.tools.VerifiableLog4jAppender"
|
||||
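A hypothetical ducktape test sketch, not part of the patch, showing how `KafkaLog4jAppender` is typically wired up against a one-node cluster: the appender produces log4j events into a topic, and the test waits for it to finish. The class name, topic name, and message count are made up for illustration.

```
# Hypothetical test sketch, not part of the patch. Names are illustrative.
from ducktape.tests.test import Test

from kafkatest.services.zookeeper import ZookeeperService
from kafkatest.services.kafka import KafkaService
from kafkatest.services.kafka_log4j_appender import KafkaLog4jAppender


class Log4jAppenderSketch(Test):
    TOPIC = "topic-log4j-appender"

    def __init__(self, test_context):
        super(Log4jAppenderSketch, self).__init__(test_context)
        self.zk = ZookeeperService(test_context, num_nodes=1)
        self.kafka = KafkaService(test_context, num_nodes=1, zk=self.zk,
                                  topics={self.TOPIC: {'partitions': 1, 'replication-factor': 1}})

    def test_appender_produces(self):
        self.zk.start()
        self.kafka.start()
        # Produce 100 log4j events into the topic via the appender service.
        appender = KafkaLog4jAppender(self.test_context, num_nodes=1,
                                      kafka=self.kafka, topic=self.TOPIC,
                                      max_messages=100)
        appender.start()
        appender.wait()
```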
88
tests/kafkatest/services/log_compaction_tester.py
Normal file
@@ -0,0 +1,88 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import os
|
||||
|
||||
from ducktape.services.background_thread import BackgroundThreadService
|
||||
|
||||
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin, CORE_LIBS_JAR_NAME, CORE_DEPENDANT_TEST_LIBS_JAR_NAME
|
||||
from kafkatest.services.security.security_config import SecurityConfig
|
||||
from kafkatest.version import DEV_BRANCH
|
||||
|
||||
class LogCompactionTester(KafkaPathResolverMixin, BackgroundThreadService):
|
||||
|
||||
OUTPUT_DIR = "/mnt/logcompaction_tester"
|
||||
LOG_PATH = os.path.join(OUTPUT_DIR, "logcompaction_tester_stdout.log")
|
||||
VERIFICATION_STRING = "Data verification is completed"
|
||||
|
||||
logs = {
|
||||
"tool_logs": {
|
||||
"path": LOG_PATH,
|
||||
"collect_default": True}
|
||||
}
|
||||
|
||||
def __init__(self, context, kafka, security_protocol="PLAINTEXT", stop_timeout_sec=30):
|
||||
super(LogCompactionTester, self).__init__(context, 1)
|
||||
|
||||
self.kafka = kafka
|
||||
self.security_protocol = security_protocol
|
||||
self.security_config = SecurityConfig(self.context, security_protocol)
|
||||
self.stop_timeout_sec = stop_timeout_sec
|
||||
self.log_compaction_completed = False
|
||||
|
||||
def _worker(self, idx, node):
|
||||
node.account.ssh("mkdir -p %s" % LogCompactionTester.OUTPUT_DIR)
|
||||
cmd = self.start_cmd(node)
|
||||
self.logger.info("LogCompactionTester %d command: %s" % (idx, cmd))
|
||||
self.security_config.setup_node(node)
|
||||
for line in node.account.ssh_capture(cmd):
|
||||
self.logger.debug("Checking line:{}".format(line))
|
||||
|
||||
if line.startswith(LogCompactionTester.VERIFICATION_STRING):
|
||||
self.log_compaction_completed = True
|
||||
|
||||
def start_cmd(self, node):
|
||||
core_libs_jar = self.path.jar(CORE_LIBS_JAR_NAME, DEV_BRANCH)
|
||||
core_dependant_test_libs_jar = self.path.jar(CORE_DEPENDANT_TEST_LIBS_JAR_NAME, DEV_BRANCH)
|
||||
|
||||
cmd = "for file in %s; do CLASSPATH=$CLASSPATH:$file; done;" % core_libs_jar
|
||||
cmd += " for file in %s; do CLASSPATH=$CLASSPATH:$file; done;" % core_dependant_test_libs_jar
|
||||
cmd += " export CLASSPATH;"
|
||||
cmd += self.path.script("kafka-run-class.sh", node)
|
||||
cmd += " %s" % self.java_class_name()
|
||||
cmd += " --bootstrap-server %s --messages 1000000 --sleep 20 --duplicates 10 --percent-deletes 10" % (self.kafka.bootstrap_servers(self.security_protocol))
|
||||
|
||||
cmd += " 2>> %s | tee -a %s &" % (self.logs["tool_logs"]["path"], self.logs["tool_logs"]["path"])
|
||||
return cmd
|
||||
|
||||
def stop_node(self, node):
|
||||
node.account.kill_java_processes(self.java_class_name(), clean_shutdown=True,
|
||||
allow_fail=True)
|
||||
|
||||
stopped = self.wait_node(node, timeout_sec=self.stop_timeout_sec)
|
||||
assert stopped, "Node %s: did not stop within the specified timeout of %s seconds" % \
|
||||
(str(node.account), str(self.stop_timeout_sec))
|
||||
|
||||
def clean_node(self, node):
|
||||
node.account.kill_java_processes(self.java_class_name(), clean_shutdown=False,
|
||||
allow_fail=True)
|
||||
node.account.ssh("rm -rf %s" % LogCompactionTester.OUTPUT_DIR, allow_fail=False)
|
||||
|
||||
def java_class_name(self):
|
||||
return "kafka.tools.LogCompactionTester"
|
||||
|
||||
@property
|
||||
def is_done(self):
|
||||
return self.log_compaction_completed
|
||||
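A hypothetical usage sketch (not part of the patch): start `LogCompactionTester` against an already-running `KafkaService` and block until the "Data verification is completed" marker flips `is_done`. The timeout value is illustrative.

```
# Hypothetical usage, not part of the patch. Timeout is illustrative.
from ducktape.utils.util import wait_until

from kafkatest.services.log_compaction_tester import LogCompactionTester


def run_log_compaction_check(test_context, kafka):
    # kafka is assumed to be an already-started KafkaService instance.
    tester = LogCompactionTester(test_context, kafka)
    tester.start()
    wait_until(lambda: tester.is_done, timeout_sec=180,
               err_msg="Timed out waiting for log compaction verification")
```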
164
tests/kafkatest/services/mirror_maker.py
Normal file
@@ -0,0 +1,164 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import os
|
||||
|
||||
from ducktape.services.service import Service
|
||||
from ducktape.utils.util import wait_until
|
||||
|
||||
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
|
||||
|
||||
"""
|
||||
MirrorMaker is a tool for mirroring data between two Kafka clusters.
|
||||
"""
|
||||
|
||||
class MirrorMaker(KafkaPathResolverMixin, Service):
|
||||
|
||||
# Root directory for persistent output
|
||||
PERSISTENT_ROOT = "/mnt/mirror_maker"
|
||||
LOG_DIR = os.path.join(PERSISTENT_ROOT, "logs")
|
||||
LOG_FILE = os.path.join(LOG_DIR, "mirror_maker.log")
|
||||
LOG4J_CONFIG = os.path.join(PERSISTENT_ROOT, "tools-log4j.properties")
|
||||
PRODUCER_CONFIG = os.path.join(PERSISTENT_ROOT, "producer.properties")
|
||||
CONSUMER_CONFIG = os.path.join(PERSISTENT_ROOT, "consumer.properties")
|
||||
|
||||
logs = {
|
||||
"mirror_maker_log": {
|
||||
"path": LOG_FILE,
|
||||
"collect_default": True}
|
||||
}
|
||||
|
||||
def __init__(self, context, num_nodes, source, target, whitelist=None, num_streams=1,
|
||||
consumer_timeout_ms=None, offsets_storage="kafka",
|
||||
offset_commit_interval_ms=60000, log_level="DEBUG", producer_interceptor_classes=None):
|
||||
"""
|
||||
MirrorMaker mirrors messages from one or more source clusters to a single destination cluster.
|
||||
|
||||
Args:
|
||||
context: standard context
|
||||
source: source Kafka cluster
|
||||
target: target Kafka cluster to which data will be mirrored
|
||||
whitelist: whitelist regex for topics to mirror
|
||||
blacklist: blacklist regex for topics not to mirror
|
||||
num_streams: number of consumer threads to create; can be a single int, or a list with
|
||||
one value per node, allowing num_streams to be the same for each node,
|
||||
or configured independently per-node
|
||||
consumer_timeout_ms: consumer stops if t > consumer_timeout_ms elapses between consecutive messages
|
||||
offsets_storage: used for consumer offsets.storage property
|
||||
offset_commit_interval_ms: how frequently the mirror maker consumer commits offsets
|
||||
"""
|
||||
super(MirrorMaker, self).__init__(context, num_nodes=num_nodes)
|
||||
self.log_level = log_level
|
||||
self.consumer_timeout_ms = consumer_timeout_ms
|
||||
self.num_streams = num_streams
|
||||
if not isinstance(num_streams, int):
|
||||
# if not an integer, num_streams should be configured per-node
|
||||
assert len(num_streams) == num_nodes
|
||||
self.whitelist = whitelist
|
||||
self.source = source
|
||||
self.target = target
|
||||
|
||||
self.offsets_storage = offsets_storage.lower()
|
||||
if not (self.offsets_storage in ["kafka", "zookeeper"]):
|
||||
raise Exception("offsets_storage should be 'kafka' or 'zookeeper'. Instead found %s" % self.offsets_storage)
|
||||
|
||||
self.offset_commit_interval_ms = offset_commit_interval_ms
|
||||
self.producer_interceptor_classes = producer_interceptor_classes
|
||||
self.external_jars = None
|
||||
|
||||
# These properties are potentially used by third-party tests.
|
||||
self.source_auto_offset_reset = None
|
||||
self.partition_assignment_strategy = None
|
||||
|
||||
def start_cmd(self, node):
|
||||
cmd = "export LOG_DIR=%s;" % MirrorMaker.LOG_DIR
|
||||
cmd += " export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%s\";" % MirrorMaker.LOG4J_CONFIG
|
||||
cmd += " export KAFKA_OPTS=%s;" % self.security_config.kafka_opts
|
||||
# add external dependencies, for instance for interceptors
|
||||
if self.external_jars is not None:
|
||||
cmd += "for file in %s; do CLASSPATH=$CLASSPATH:$file; done; " % self.external_jars
|
||||
cmd += "export CLASSPATH; "
|
||||
cmd += " %s %s" % (self.path.script("kafka-run-class.sh", node),
|
||||
self.java_class_name())
|
||||
cmd += " --consumer.config %s" % MirrorMaker.CONSUMER_CONFIG
|
||||
cmd += " --producer.config %s" % MirrorMaker.PRODUCER_CONFIG
|
||||
cmd += " --offset.commit.interval.ms %s" % str(self.offset_commit_interval_ms)
|
||||
if isinstance(self.num_streams, int):
|
||||
cmd += " --num.streams %d" % self.num_streams
|
||||
else:
|
||||
# configure num_streams separately on each node
|
||||
cmd += " --num.streams %d" % self.num_streams[self.idx(node) - 1]
|
||||
if self.whitelist is not None:
|
||||
cmd += " --whitelist=\"%s\"" % self.whitelist
|
||||
|
||||
cmd += " 1>> %s 2>> %s &" % (MirrorMaker.LOG_FILE, MirrorMaker.LOG_FILE)
|
||||
return cmd
|
||||
|
||||
def pids(self, node):
|
||||
return node.account.java_pids(self.java_class_name())
|
||||
|
||||
def alive(self, node):
|
||||
return len(self.pids(node)) > 0
|
||||
|
||||
def start_node(self, node):
|
||||
node.account.ssh("mkdir -p %s" % MirrorMaker.PERSISTENT_ROOT, allow_fail=False)
|
||||
node.account.ssh("mkdir -p %s" % MirrorMaker.LOG_DIR, allow_fail=False)
|
||||
|
||||
self.security_config = self.source.security_config.client_config()
|
||||
self.security_config.setup_node(node)
|
||||
|
||||
# Create, upload one consumer config file for source cluster
|
||||
consumer_props = self.render("mirror_maker_consumer.properties")
|
||||
consumer_props += str(self.security_config)
|
||||
|
||||
node.account.create_file(MirrorMaker.CONSUMER_CONFIG, consumer_props)
|
||||
self.logger.info("Mirrormaker consumer props:\n" + consumer_props)
|
||||
|
||||
# Create, upload producer properties file for target cluster
|
||||
producer_props = self.render('mirror_maker_producer.properties')
|
||||
producer_props += str(self.security_config)
|
||||
self.logger.info("Mirrormaker producer props:\n" + producer_props)
|
||||
node.account.create_file(MirrorMaker.PRODUCER_CONFIG, producer_props)
|
||||
|
||||
|
||||
# Create and upload log properties
|
||||
log_config = self.render('tools_log4j.properties', log_file=MirrorMaker.LOG_FILE)
|
||||
node.account.create_file(MirrorMaker.LOG4J_CONFIG, log_config)
|
||||
|
||||
# Run mirror maker
|
||||
cmd = self.start_cmd(node)
|
||||
self.logger.debug("Mirror maker command: %s", cmd)
|
||||
node.account.ssh(cmd, allow_fail=False)
|
||||
wait_until(lambda: self.alive(node), timeout_sec=30, backoff_sec=.5,
|
||||
err_msg="Mirror maker took too long to start.")
|
||||
self.logger.debug("Mirror maker is alive")
|
||||
|
||||
def stop_node(self, node, clean_shutdown=True):
|
||||
node.account.kill_java_processes(self.java_class_name(), allow_fail=True,
|
||||
clean_shutdown=clean_shutdown)
|
||||
wait_until(lambda: not self.alive(node), timeout_sec=30, backoff_sec=.5,
|
||||
err_msg="Mirror maker took too long to stop.")
|
||||
|
||||
def clean_node(self, node):
|
||||
if self.alive(node):
|
||||
self.logger.warn("%s %s was still alive at cleanup time. Killing forcefully..." %
|
||||
(self.__class__.__name__, node.account))
|
||||
node.account.kill_java_processes(self.java_class_name(), clean_shutdown=False,
|
||||
allow_fail=True)
|
||||
node.account.ssh("rm -rf %s" % MirrorMaker.PERSISTENT_ROOT, allow_fail=False)
|
||||
self.security_config.clean_node(node)
|
||||
|
||||
def java_class_name(self):
|
||||
return "kafka.tools.MirrorMaker"
|
||||
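A hypothetical wiring sketch, not part of the patch, showing `MirrorMaker` mirroring whitelist-matched topics from a source cluster to a target cluster. Both clusters are assumed to be already-started `KafkaService` instances; the whitelist and commit interval are illustrative.

```
# Hypothetical wiring, not part of the patch. Whitelist/interval are illustrative.
from kafkatest.services.mirror_maker import MirrorMaker


def start_mirroring(test_context, source_kafka, target_kafka):
    # source_kafka / target_kafka are assumed to be started KafkaService instances.
    mm = MirrorMaker(test_context, num_nodes=1,
                     source=source_kafka, target=target_kafka,
                     whitelist=".*topic.*", offset_commit_interval_ms=1000)
    mm.start()
    return mm
```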
14
tests/kafkatest/services/monitor/__init__.py
Normal file
@@ -0,0 +1,14 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
228
tests/kafkatest/services/monitor/http.py
Normal file
@@ -0,0 +1,228 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
|
||||
from collections import defaultdict, namedtuple
|
||||
import json
|
||||
from threading import Thread
|
||||
from select import select
|
||||
import socket
|
||||
|
||||
MetricKey = namedtuple('MetricKey', ['host', 'client_id', 'name', 'group', 'tags'])
|
||||
MetricValue = namedtuple('MetricValue', ['time', 'value'])
|
||||
|
||||
# Python's logging library doesn't define anything more detailed than DEBUG, but we'd like a finer-grained setting
|
||||
# for highly detailed messages, e.g. logging every single incoming request.
|
||||
TRACE = 5
|
||||
|
||||
|
||||
class HttpMetricsCollector(object):
|
||||
"""
|
||||
HttpMetricsCollector enables collection of metrics from various Kafka clients instrumented with the
|
||||
PushHttpMetricsReporter. It starts a web server locally and provides the necessary configuration for clients
|
||||
to automatically report metrics data to this server. It also provides basic functionality for querying the
|
||||
recorded metrics. This class can be used either as a mixin or standalone object.
|
||||
"""
|
||||
|
||||
# The port to listen on at the worker node, which will be forwarded to the port this driver node is listening on
|
||||
REMOTE_PORT = 6789
|
||||
|
||||
def __init__(self, **kwargs):
|
||||
"""
|
||||
Create a new HttpMetricsCollector
|
||||
:param period: the period, in seconds, between updates that the metrics reporter configuration should define;
|
||||
defaults to reporting once per second
|
||||
:param args:
|
||||
:param kwargs:
|
||||
"""
|
||||
self._http_metrics_period = kwargs.pop('period', 1)
|
||||
|
||||
super(HttpMetricsCollector, self).__init__(**kwargs)
|
||||
|
||||
# TODO: currently we maintain just a simple map from all key info -> value. However, some key fields are far
|
||||
# more common to filter on, so we'd want to index by them, e.g. host, client.id, metric name.
|
||||
self._http_metrics = defaultdict(list)
|
||||
|
||||
self._httpd = HTTPServer(('', 0), _MetricsReceiver)
|
||||
self._httpd.parent = self
|
||||
self._httpd.metrics = self._http_metrics
|
||||
|
||||
self._http_metrics_thread = Thread(target=self._run_http_metrics_httpd,
|
||||
name='http-metrics-thread[%s]' % str(self))
|
||||
self._http_metrics_thread.start()
|
||||
|
||||
self._forwarders = {}
|
||||
|
||||
@property
|
||||
def http_metrics_url(self):
|
||||
"""
|
||||
:return: the URL to use when reporting metrics
|
||||
"""
|
||||
return "http://%s:%d" % ("localhost", self.REMOTE_PORT)
|
||||
|
||||
@property
|
||||
def http_metrics_client_configs(self):
|
||||
"""
|
||||
Get client configurations that can be used to report data to this collector. Put these in a properties file for
|
||||
clients (e.g. console producer or consumer) to have them push metrics to this driver. Note that in some cases
|
||||
(e.g. streams, connect) these settings may need to be prefixed.
|
||||
:return: a dictionary of client configurations that will direct a client to report metrics to this collector
|
||||
"""
|
||||
return {
|
||||
"metric.reporters": "org.apache.kafka.tools.PushHttpMetricsReporter",
|
||||
"metrics.url": self.http_metrics_url,
|
||||
"metrics.period": self._http_metrics_period,
|
||||
}
|
||||
|
||||
def start_node(self, node):
|
||||
local_port = self._httpd.socket.getsockname()[1]
|
||||
self.logger.debug('HttpMetricsCollector listening on %s', local_port)
|
||||
self._forwarders[self.idx(node)] = _ReverseForwarder(self.logger, node, self.REMOTE_PORT, local_port)
|
||||
|
||||
super(HttpMetricsCollector, self).start_node(node)
|
||||
|
||||
def stop(self):
|
||||
super(HttpMetricsCollector, self).stop()
|
||||
|
||||
if self._http_metrics_thread:
|
||||
self.logger.debug("Shutting down metrics httpd")
|
||||
self._httpd.shutdown()
|
||||
self._http_metrics_thread.join()
|
||||
self.logger.debug("Finished shutting down metrics httpd")
|
||||
|
||||
def stop_node(self, node):
|
||||
super(HttpMetricsCollector, self).stop_node(node)
|
||||
|
||||
idx = self.idx(node)
|
||||
self._forwarders[idx].stop()
|
||||
del self._forwarders[idx]
|
||||
|
||||
def metrics(self, host=None, client_id=None, name=None, group=None, tags=None):
|
||||
"""
|
||||
Get any collected metrics that match the specified parameters, yielding each as a tuple of
|
||||
(key, [<timestamp, value>, ...]) values.
|
||||
"""
|
||||
for k, values in self._http_metrics.iteritems():
|
||||
if ((host is None or host == k.host) and
|
||||
(client_id is None or client_id == k.client_id) and
|
||||
(name is None or name == k.name) and
|
||||
(group is None or group == k.group) and
|
||||
(tags is None or tags == k.tags)):
|
||||
yield (k, values)
|
||||
|
||||
def _run_http_metrics_httpd(self):
|
||||
self._httpd.serve_forever()
|
||||
|
||||
|
||||
class _MetricsReceiver(BaseHTTPRequestHandler):
|
||||
"""
|
||||
HTTP request handler that accepts requests from the PushHttpMetricsReporter and stores them back into the parent
|
||||
HttpMetricsCollector
|
||||
"""
|
||||
|
||||
def log_message(self, format, *args, **kwargs):
|
||||
# Don't do any logging here so we get rid of the mostly useless per-request Apache log-style info that spams
|
||||
# the debug log
|
||||
pass
|
||||
|
||||
def do_POST(self):
|
||||
data = self.rfile.read(int(self.headers['Content-Length']))
|
||||
data = json.loads(data)
|
||||
self.server.parent.logger.log(TRACE, "POST %s\n\n%s\n%s", self.path, self.headers,
|
||||
json.dumps(data, indent=4, separators=(',', ': ')))
|
||||
self.send_response(204)
|
||||
self.end_headers()
|
||||
|
||||
client = data['client']
|
||||
host = client['host']
|
||||
client_id = client['client_id']
|
||||
ts = client['time']
|
||||
metrics = data['metrics']
|
||||
for raw_metric in metrics:
|
||||
name = raw_metric['name']
|
||||
group = raw_metric['group']
|
||||
# Convert to tuple of pairs because dicts & lists are unhashable
|
||||
tags = tuple([(k, v) for k, v in raw_metric['tags'].iteritems()])
|
||||
value = raw_metric['value']
|
||||
|
||||
key = MetricKey(host=host, client_id=client_id, name=name, group=group, tags=tags)
|
||||
metric_value = MetricValue(time=ts, value=value)
|
||||
|
||||
self.server.metrics[key].append(metric_value)
|
||||
|
||||
|
||||
class _ReverseForwarder(object):
|
||||
"""
|
||||
Runs reverse forwarding of a port on a node to a local port. This allows you to set up a server on the test driver
that relies only on the basic SSH access that ducktape guarantees is available for worker nodes.
|
||||
"""
|
||||
|
||||
def __init__(self, logger, node, remote_port, local_port):
|
||||
self.logger = logger
|
||||
self._node = node
|
||||
self._local_port = local_port
|
||||
self._remote_port = remote_port
|
||||
|
||||
self.logger.debug('Forwarding %s port %d to driver port %d', node, remote_port, local_port)
|
||||
|
||||
self._stopping = False
|
||||
|
||||
self._transport = node.account.ssh_client.get_transport()
|
||||
self._transport.request_port_forward('', remote_port)
|
||||
|
||||
self._accept_thread = Thread(target=self._accept)
|
||||
self._accept_thread.start()
|
||||
|
||||
def stop(self):
|
||||
self._stopping = True
|
||||
self._accept_thread.join(30)
|
||||
if self._accept_thread.isAlive():
|
||||
raise RuntimeError("Failed to stop reverse forwarder on %s", self._node)
|
||||
self._transport.cancel_port_forward('', self._remote_port)
|
||||
|
||||
def _accept(self):
|
||||
while not self._stopping:
|
||||
chan = self._transport.accept(1)
|
||||
if chan is None:
|
||||
continue
|
||||
thr = Thread(target=self._handler, args=(chan,))
|
||||
thr.setDaemon(True)
|
||||
thr.start()
|
||||
|
||||
def _handler(self, chan):
|
||||
sock = socket.socket()
|
||||
try:
|
||||
sock.connect(("localhost", self._local_port))
|
||||
except Exception as e:
|
||||
self.logger.error('Forwarding request to port %d failed: %r', self._local_port, e)
|
||||
return
|
||||
|
||||
self.logger.log(TRACE, 'Connected! Tunnel open %r -> %r -> %d', chan.origin_addr, chan.getpeername(),
|
||||
self._local_port)
|
||||
while True:
|
||||
r, w, x = select([sock, chan], [], [])
|
||||
if sock in r:
|
||||
data = sock.recv(1024)
|
||||
if len(data) == 0:
|
||||
break
|
||||
chan.send(data)
|
||||
if chan in r:
|
||||
data = chan.recv(1024)
|
||||
if len(data) == 0:
|
||||
break
|
||||
sock.send(data)
|
||||
chan.close()
|
||||
sock.close()
|
||||
self.logger.log(TRACE, 'Tunnel closed from %r', chan.origin_addr)
|
||||
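A hypothetical sketch (not part of the patch) of reading back data gathered by a service that mixes in `HttpMetricsCollector`: clients are pointed at the collector via `http_metrics_client_configs`, and the test later filters the recorded samples with `metrics()`. The metric group and name below are illustrative producer metrics, not something this module defines.

```
# Hypothetical reader, not part of the patch. The metric group/name are
# illustrative; `service` is assumed to be a started service that mixes in
# HttpMetricsCollector and whose clients were configured with
# service.http_metrics_client_configs.
def last_outgoing_byte_rate(service):
    total = 0.0
    # metrics() yields (MetricKey, [MetricValue(time, value), ...]) pairs.
    for _key, values in service.metrics(group="producer-metrics",
                                        name="outgoing-byte-rate"):
        if values:
            total += values[-1].value
    return total
```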
141
tests/kafkatest/services/monitor/jmx.py
Normal file
@@ -0,0 +1,141 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import os
|
||||
|
||||
from ducktape.cluster.remoteaccount import RemoteCommandError
|
||||
from ducktape.utils.util import wait_until
|
||||
from kafkatest.version import get_version, V_0_11_0_0, DEV_BRANCH
|
||||
|
||||
class JmxMixin(object):
|
||||
"""This mixin helps existing service subclasses start JmxTool on their worker nodes and collect jmx stats.
|
||||
|
||||
A couple of things worth noting:
|
||||
- this is not a service in its own right.
|
||||
- we assume the service using JmxMixin also uses KafkaPathResolverMixin
|
||||
- this uses the --wait option for JmxTool, so the list of object names must be explicit; no patterns are permitted
|
||||
"""
|
||||
def __init__(self, num_nodes, jmx_object_names=None, jmx_attributes=None, jmx_poll_ms=1000, root="/mnt"):
|
||||
self.jmx_object_names = jmx_object_names
|
||||
self.jmx_attributes = jmx_attributes or []
|
||||
self.jmx_poll_ms = jmx_poll_ms
|
||||
self.jmx_port = 9192
|
||||
|
||||
self.started = [False] * num_nodes
|
||||
self.jmx_stats = [{} for x in range(num_nodes)]
|
||||
self.maximum_jmx_value = {} # map from object_attribute_name to maximum value observed over time
|
||||
self.average_jmx_value = {} # map from object_attribute_name to average value observed over time
|
||||
|
||||
self.jmx_tool_log = os.path.join(root, "jmx_tool.log")
|
||||
self.jmx_tool_err_log = os.path.join(root, "jmx_tool.err.log")
|
||||
|
||||
def clean_node(self, node):
|
||||
node.account.kill_java_processes(self.jmx_class_name(), clean_shutdown=False,
|
||||
allow_fail=True)
|
||||
idx = self.idx(node)
|
||||
self.started[idx-1] = False
|
||||
node.account.ssh("rm -f -- %s %s" % (self.jmx_tool_log, self.jmx_tool_err_log), allow_fail=False)
|
||||
|
||||
def start_jmx_tool(self, idx, node):
|
||||
if self.jmx_object_names is None:
|
||||
self.logger.debug("%s: Not starting jmx tool because no jmx objects are defined" % node.account)
|
||||
return
|
||||
|
||||
if self.started[idx-1]:
|
||||
self.logger.debug("%s: jmx tool has been started already on this node" % node.account)
|
||||
return
|
||||
|
||||
# JmxTool is not particularly robust to slow-starting processes. In order to ensure JmxTool doesn't fail if the
|
||||
# process we're trying to monitor takes a while before listening on the JMX port, wait until we can see that port
|
||||
# listening before even launching JmxTool
|
||||
def check_jmx_port_listening():
|
||||
return 0 == node.account.ssh("nc -z 127.0.0.1 %d" % self.jmx_port, allow_fail=True)
|
||||
|
||||
wait_until(check_jmx_port_listening, timeout_sec=30, backoff_sec=.1,
|
||||
err_msg="%s: Never saw JMX port for %s start listening" % (node.account, self))
|
||||
|
||||
# To correctly wait for requested JMX metrics to be added we need the --wait option for JmxTool. This option was
|
||||
# not added until 0.11.0.1, so any earlier versions need to use JmxTool from a newer version.
|
||||
use_jmxtool_version = get_version(node)
|
||||
if use_jmxtool_version <= V_0_11_0_0:
|
||||
use_jmxtool_version = DEV_BRANCH
|
||||
cmd = "%s %s " % (self.path.script("kafka-run-class.sh", use_jmxtool_version), self.jmx_class_name())
|
||||
cmd += "--reporting-interval %d --jmx-url service:jmx:rmi:///jndi/rmi://127.0.0.1:%d/jmxrmi" % (self.jmx_poll_ms, self.jmx_port)
|
||||
cmd += " --wait"
|
||||
for jmx_object_name in self.jmx_object_names:
|
||||
cmd += " --object-name %s" % jmx_object_name
|
||||
cmd += " --attributes "
|
||||
for jmx_attribute in self.jmx_attributes:
|
||||
cmd += "%s," % jmx_attribute
|
||||
cmd += " 1>> %s" % self.jmx_tool_log
|
||||
cmd += " 2>> %s &" % self.jmx_tool_err_log
|
||||
|
||||
self.logger.debug("%s: Start JmxTool %d command: %s" % (node.account, idx, cmd))
|
||||
node.account.ssh(cmd, allow_fail=False)
|
||||
wait_until(lambda: self._jmx_has_output(node), timeout_sec=30, backoff_sec=.5, err_msg="%s: Jmx tool took too long to start" % node.account)
|
||||
self.started[idx-1] = True
|
||||
|
||||
def _jmx_has_output(self, node):
|
||||
"""Helper used as a proxy to determine whether jmx is running by that jmx_tool_log contains output."""
|
||||
try:
|
||||
node.account.ssh("test -s %s" % self.jmx_tool_log, allow_fail=False)
|
||||
return True
|
||||
except RemoteCommandError:
|
||||
return False
|
||||
|
||||
def read_jmx_output(self, idx, node):
|
||||
if not self.started[idx-1]:
|
||||
return
|
||||
|
||||
object_attribute_names = []
|
||||
|
||||
cmd = "cat %s" % self.jmx_tool_log
|
||||
self.logger.debug("Read jmx output %d command: %s", idx, cmd)
|
||||
lines = [line for line in node.account.ssh_capture(cmd, allow_fail=False)]
|
||||
assert len(lines) > 1, "There don't appear to be any samples in the jmx tool log: %s" % lines
|
||||
|
||||
for line in lines:
|
||||
if "time" in line:
|
||||
object_attribute_names = line.strip()[1:-1].split("\",\"")[1:]
|
||||
continue
|
||||
stats = [float(field) for field in line.split(',')]
|
||||
time_sec = int(stats[0]/1000)
|
||||
self.jmx_stats[idx-1][time_sec] = {name: stats[i+1] for i, name in enumerate(object_attribute_names)}
|
||||
|
||||
# do not calculate average and maximum of jmx stats until we have read output from all nodes
|
||||
# If the service is multithreaded, this means that the results will be aggregated only when the last
|
||||
# service finishes
|
||||
if any(len(time_to_stats) == 0 for time_to_stats in self.jmx_stats):
|
||||
return
|
||||
|
||||
start_time_sec = min([min(time_to_stats.keys()) for time_to_stats in self.jmx_stats])
|
||||
end_time_sec = max([max(time_to_stats.keys()) for time_to_stats in self.jmx_stats])
|
||||
|
||||
for name in object_attribute_names:
|
||||
aggregates_per_time = []
|
||||
for time_sec in xrange(start_time_sec, end_time_sec + 1):
|
||||
# assume that value is 0 if it is not read by jmx tool at the given time. This is appropriate for metrics such as bandwidth
|
||||
values_per_node = [time_to_stats.get(time_sec, {}).get(name, 0) for time_to_stats in self.jmx_stats]
|
||||
# assume that value is aggregated across nodes by sum. This is appropriate for metrics such as bandwidth
|
||||
aggregates_per_time.append(sum(values_per_node))
|
||||
self.average_jmx_value[name] = sum(aggregates_per_time) / len(aggregates_per_time)
|
||||
self.maximum_jmx_value[name] = max(aggregates_per_time)
|
||||
|
||||
def read_jmx_output_all_nodes(self):
|
||||
for node in self.nodes:
|
||||
self.read_jmx_output(self.idx(node), node)
|
||||
|
||||
def jmx_class_name(self):
|
||||
return "kafka.tools.JmxTool"
|
||||
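A hypothetical sketch, not part of the patch, of collecting a broker metric through `JmxMixin`. It assumes `KafkaService` mixes in `JmxMixin` and forwards `jmx_object_names`/`jmx_attributes` as the upstream kafkatest service does, and the exact `maximum_jmx_value` key format (objectName:attribute) is inferred from `read_jmx_output` above.

```
# Hypothetical sketch, not part of the patch. Assumes KafkaService forwards
# jmx_object_names / jmx_attributes to JmxMixin; the maximum_jmx_value key
# format is inferred from read_jmx_output (objectName:attribute).
from kafkatest.services.kafka import KafkaService

MSGS_IN = 'kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec'


def broker_peak_message_rate(test_context, zk):
    kafka = KafkaService(test_context, num_nodes=1, zk=zk,
                         jmx_object_names=[MSGS_IN],
                         jmx_attributes=['OneMinuteRate'])
    kafka.start()
    # ... run a workload against the cluster here ...
    kafka.read_jmx_output_all_nodes()
    return kafka.maximum_jmx_value[MSGS_IN + ':OneMinuteRate']
```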
19
tests/kafkatest/services/performance/__init__.py
Normal file
@@ -0,0 +1,19 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from performance import PerformanceService, throughput, latency, compute_aggregate_throughput
from end_to_end_latency import EndToEndLatencyService
from producer_performance import ProducerPerformanceService
from consumer_performance import ConsumerPerformanceService
187
tests/kafkatest/services/performance/consumer_performance.py
Normal file
@@ -0,0 +1,187 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
import os
|
||||
|
||||
from kafkatest.services.performance import PerformanceService
|
||||
from kafkatest.services.security.security_config import SecurityConfig
|
||||
from kafkatest.version import DEV_BRANCH, V_0_9_0_0, V_2_0_0, LATEST_0_10_0
|
||||
|
||||
|
||||
class ConsumerPerformanceService(PerformanceService):
|
||||
"""
|
||||
See ConsumerPerformance.scala as the source of truth on these settings, but for reference:
|
||||
|
||||
"zookeeper" "The connection string for the zookeeper connection in the form host:port. Multiple URLS can
|
||||
be given to allow fail-over. This option is only used with the old consumer."
|
||||
|
||||
"broker-list", "A broker list to use for connecting if using the new consumer."
|
||||
|
||||
"topic", "REQUIRED: The topic to consume from."
|
||||
|
||||
"group", "The group id to consume on."
|
||||
|
||||
"fetch-size", "The amount of data to fetch in a single request."
|
||||
|
||||
"from-latest", "If the consumer does not already have an establishedoffset to consume from,
|
||||
start with the latest message present in the log rather than the earliest message."
|
||||
|
||||
"socket-buffer-size", "The size of the tcp RECV size."
|
||||
|
||||
"threads", "Number of processing threads."
|
||||
|
||||
"num-fetch-threads", "Number of fetcher threads. Defaults to 1"
|
||||
|
||||
"new-consumer", "Use the new consumer implementation."
|
||||
"consumer.config", "Consumer config properties file."
|
||||
"""
|
||||
|
||||
# Root directory for persistent output
|
||||
PERSISTENT_ROOT = "/mnt/consumer_performance"
|
||||
LOG_DIR = os.path.join(PERSISTENT_ROOT, "logs")
|
||||
STDOUT_CAPTURE = os.path.join(PERSISTENT_ROOT, "consumer_performance.stdout")
|
||||
STDERR_CAPTURE = os.path.join(PERSISTENT_ROOT, "consumer_performance.stderr")
|
||||
LOG_FILE = os.path.join(LOG_DIR, "consumer_performance.log")
|
||||
LOG4J_CONFIG = os.path.join(PERSISTENT_ROOT, "tools-log4j.properties")
|
||||
CONFIG_FILE = os.path.join(PERSISTENT_ROOT, "consumer.properties")
|
||||
|
||||
logs = {
|
||||
"consumer_performance_output": {
|
||||
"path": STDOUT_CAPTURE,
|
||||
"collect_default": True},
|
||||
"consumer_performance_stderr": {
|
||||
"path": STDERR_CAPTURE,
|
||||
"collect_default": True},
|
||||
"consumer_performance_log": {
|
||||
"path": LOG_FILE,
|
||||
"collect_default": True}
|
||||
}
|
||||
|
||||
def __init__(self, context, num_nodes, kafka, topic, messages, version=DEV_BRANCH, new_consumer=True, settings={}):
|
||||
super(ConsumerPerformanceService, self).__init__(context, num_nodes)
|
||||
self.kafka = kafka
|
||||
self.security_config = kafka.security_config.client_config()
|
||||
self.topic = topic
|
||||
self.messages = messages
|
||||
self.new_consumer = new_consumer
|
||||
self.settings = settings
|
||||
|
||||
assert version >= V_0_9_0_0 or (not new_consumer), \
|
||||
"new_consumer is only supported if version >= 0.9.0.0, version %s" % str(version)
|
||||
|
||||
assert version < V_2_0_0 or new_consumer, \
|
||||
"new_consumer==false is only supported if version < 2.0.0, version %s" % str(version)
|
||||
|
||||
security_protocol = self.security_config.security_protocol
|
||||
assert version >= V_0_9_0_0 or security_protocol == SecurityConfig.PLAINTEXT, \
|
||||
"Security protocol %s is only supported if version >= 0.9.0.0, version %s" % (self.security_config, str(version))
|
||||
|
||||
# These less-frequently used settings can be updated manually after instantiation
|
||||
self.fetch_size = None
|
||||
self.socket_buffer_size = None
|
||||
self.threads = None
|
||||
self.num_fetch_threads = None
|
||||
self.group = None
|
||||
self.from_latest = None
|
||||
|
||||
for node in self.nodes:
|
||||
node.version = version
|
||||
|
||||
def args(self, version):
|
||||
"""Dictionary of arguments used to start the Consumer Performance script."""
|
||||
args = {
|
||||
'topic': self.topic,
|
||||
'messages': self.messages,
|
||||
}
|
||||
|
||||
if self.new_consumer:
|
||||
if version <= LATEST_0_10_0:
|
||||
args['new-consumer'] = ""
|
||||
args['broker-list'] = self.kafka.bootstrap_servers(self.security_config.security_protocol)
|
||||
else:
|
||||
args['zookeeper'] = self.kafka.zk_connect_setting()
|
||||
|
||||
if self.fetch_size is not None:
|
||||
args['fetch-size'] = self.fetch_size
|
||||
|
||||
if self.socket_buffer_size is not None:
|
||||
args['socket-buffer-size'] = self.socket_buffer_size
|
||||
|
||||
if self.threads is not None:
|
||||
args['threads'] = self.threads
|
||||
|
||||
if self.num_fetch_threads is not None:
|
||||
args['num-fetch-threads'] = self.num_fetch_threads
|
||||
|
||||
if self.group is not None:
|
||||
args['group'] = self.group
|
||||
|
||||
if self.from_latest:
|
||||
args['from-latest'] = ""
|
||||
|
||||
return args
|
||||
|
||||
def start_cmd(self, node):
|
||||
cmd = "export LOG_DIR=%s;" % ConsumerPerformanceService.LOG_DIR
|
||||
cmd += " export KAFKA_OPTS=%s;" % self.security_config.kafka_opts
|
||||
cmd += " export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%s\";" % ConsumerPerformanceService.LOG4J_CONFIG
|
||||
cmd += " %s" % self.path.script("kafka-consumer-perf-test.sh", node)
|
||||
for key, value in self.args(node.version).items():
|
||||
cmd += " --%s %s" % (key, value)
|
||||
|
||||
if node.version >= V_0_9_0_0:
|
||||
# This is only used for security settings
|
||||
cmd += " --consumer.config %s" % ConsumerPerformanceService.CONFIG_FILE
|
||||
|
||||
for key, value in self.settings.items():
|
||||
cmd += " %s=%s" % (str(key), str(value))
|
||||
|
||||
cmd += " 2>> %(stderr)s | tee -a %(stdout)s" % {'stdout': ConsumerPerformanceService.STDOUT_CAPTURE,
|
||||
'stderr': ConsumerPerformanceService.STDERR_CAPTURE}
|
||||
return cmd
|
||||
|
||||
def parse_results(self, line, version):
|
||||
parts = line.split(',')
|
||||
if version >= V_0_9_0_0:
|
||||
result = {
|
||||
'total_mb': float(parts[2]),
|
||||
'mbps': float(parts[3]),
|
||||
'records_per_sec': float(parts[5]),
|
||||
}
|
||||
else:
|
||||
result = {
|
||||
'total_mb': float(parts[3]),
|
||||
'mbps': float(parts[4]),
|
||||
'records_per_sec': float(parts[6]),
|
||||
}
|
||||
return result
|
||||
|
||||
def _worker(self, idx, node):
|
||||
node.account.ssh("mkdir -p %s" % ConsumerPerformanceService.PERSISTENT_ROOT, allow_fail=False)
|
||||
|
||||
log_config = self.render('tools_log4j.properties', log_file=ConsumerPerformanceService.LOG_FILE)
|
||||
node.account.create_file(ConsumerPerformanceService.LOG4J_CONFIG, log_config)
|
||||
node.account.create_file(ConsumerPerformanceService.CONFIG_FILE, str(self.security_config))
|
||||
self.security_config.setup_node(node)
|
||||
|
||||
cmd = self.start_cmd(node)
|
||||
self.logger.debug("Consumer performance %d command: %s", idx, cmd)
|
||||
last = None
|
||||
for line in node.account.ssh_capture(cmd):
|
||||
last = line
|
||||
|
||||
# Parse and save the last line's information
|
||||
self.results[idx-1] = self.parse_results(last, node.version)
|
||||
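A hypothetical sketch (not part of the patch): run `ConsumerPerformanceService` against a topic that already holds at least `messages` records and aggregate the parsed per-node results with `compute_aggregate_throughput` from the performance package above. Node count and group name are illustrative.

```
# Hypothetical sketch, not part of the patch. Node count and group are
# illustrative; the topic is assumed to already contain enough records.
from kafkatest.services.performance import ConsumerPerformanceService, compute_aggregate_throughput


def measure_consumer_throughput(test_context, kafka, topic, messages=500000):
    perf = ConsumerPerformanceService(test_context, num_nodes=1,
                                      kafka=kafka, topic=topic, messages=messages)
    perf.group = "test-consumer-group"
    perf.run()
    return compute_aggregate_throughput(perf)
```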
124
tests/kafkatest/services/performance/end_to_end_latency.py
Normal file
@@ -0,0 +1,124 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import os
|
||||
|
||||
from kafkatest.services.performance import PerformanceService
|
||||
from kafkatest.services.security.security_config import SecurityConfig
|
||||
from kafkatest.version import DEV_BRANCH, V_0_9_0_0
|
||||
|
||||
|
||||
|
||||
class EndToEndLatencyService(PerformanceService):
|
||||
MESSAGE_BYTES = 21 # 0.8.X messages are fixed at 21 bytes, so we'll match that for other versions
|
||||
|
||||
# Root directory for persistent output
|
||||
PERSISTENT_ROOT = "/mnt/end_to_end_latency"
|
||||
LOG_DIR = os.path.join(PERSISTENT_ROOT, "logs")
|
||||
STDOUT_CAPTURE = os.path.join(PERSISTENT_ROOT, "end_to_end_latency.stdout")
|
||||
STDERR_CAPTURE = os.path.join(PERSISTENT_ROOT, "end_to_end_latency.stderr")
|
||||
LOG_FILE = os.path.join(LOG_DIR, "end_to_end_latency.log")
|
||||
LOG4J_CONFIG = os.path.join(PERSISTENT_ROOT, "tools-log4j.properties")
|
||||
CONFIG_FILE = os.path.join(PERSISTENT_ROOT, "client.properties")
|
||||
|
||||
logs = {
|
||||
"end_to_end_latency_output": {
|
||||
"path": STDOUT_CAPTURE,
|
||||
"collect_default": True},
|
||||
"end_to_end_latency_stderr": {
|
||||
"path": STDERR_CAPTURE,
|
||||
"collect_default": True},
|
||||
"end_to_end_latency_log": {
|
||||
"path": LOG_FILE,
|
||||
"collect_default": True}
|
||||
}
|
||||
|
||||
def __init__(self, context, num_nodes, kafka, topic, num_records, compression_type="none", version=DEV_BRANCH, acks=1):
|
||||
super(EndToEndLatencyService, self).__init__(context, num_nodes,
|
||||
root=EndToEndLatencyService.PERSISTENT_ROOT)
|
||||
self.kafka = kafka
|
||||
self.security_config = kafka.security_config.client_config()
|
||||
|
||||
security_protocol = self.security_config.security_protocol
|
||||
|
||||
if version < V_0_9_0_0:
|
||||
assert security_protocol == SecurityConfig.PLAINTEXT, \
|
||||
"Security protocol %s is only supported if version >= 0.9.0.0, version %s" % (self.security_config, str(version))
|
||||
assert compression_type == "none", \
|
||||
"Compression type %s is only supported if version >= 0.9.0.0, version %s" % (compression_type, str(version))
|
||||
|
||||
self.args = {
|
||||
'topic': topic,
|
||||
'num_records': num_records,
|
||||
'acks': acks,
|
||||
'compression_type': compression_type,
|
||||
'kafka_opts': self.security_config.kafka_opts,
|
||||
'message_bytes': EndToEndLatencyService.MESSAGE_BYTES
|
||||
}
|
||||
|
||||
for node in self.nodes:
|
||||
node.version = version
|
||||
|
||||
def start_cmd(self, node):
|
||||
args = self.args.copy()
|
||||
args.update({
|
||||
'zk_connect': self.kafka.zk_connect_setting(),
|
||||
'bootstrap_servers': self.kafka.bootstrap_servers(self.security_config.security_protocol),
|
||||
'config_file': EndToEndLatencyService.CONFIG_FILE,
|
||||
'kafka_run_class': self.path.script("kafka-run-class.sh", node),
|
||||
'java_class_name': self.java_class_name()
|
||||
})
|
||||
|
||||
cmd = "export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%s\"; " % EndToEndLatencyService.LOG4J_CONFIG
|
||||
if node.version >= V_0_9_0_0:
|
||||
cmd += "KAFKA_OPTS=%(kafka_opts)s %(kafka_run_class)s %(java_class_name)s " % args
|
||||
cmd += "%(bootstrap_servers)s %(topic)s %(num_records)d %(acks)d %(message_bytes)d %(config_file)s" % args
|
||||
else:
|
||||
# Set fetch max wait to 0 to match behavior in later versions
|
||||
cmd += "KAFKA_OPTS=%(kafka_opts)s %(kafka_run_class)s kafka.tools.TestEndToEndLatency " % args
|
||||
cmd += "%(bootstrap_servers)s %(zk_connect)s %(topic)s %(num_records)d 0 %(acks)d" % args
|
||||
|
||||
cmd += " 2>> %(stderr)s | tee -a %(stdout)s" % {'stdout': EndToEndLatencyService.STDOUT_CAPTURE,
|
||||
'stderr': EndToEndLatencyService.STDERR_CAPTURE}
|
||||
|
||||
return cmd
|
||||
|
||||
def _worker(self, idx, node):
|
||||
node.account.ssh("mkdir -p %s" % EndToEndLatencyService.PERSISTENT_ROOT, allow_fail=False)
|
||||
|
||||
log_config = self.render('tools_log4j.properties', log_file=EndToEndLatencyService.LOG_FILE)
|
||||
|
||||
node.account.create_file(EndToEndLatencyService.LOG4J_CONFIG, log_config)
|
||||
client_config = str(self.security_config)
|
||||
if node.version >= V_0_9_0_0:
|
||||
client_config += "compression_type=%(compression_type)s" % self.args
|
||||
node.account.create_file(EndToEndLatencyService.CONFIG_FILE, client_config)
|
||||
|
||||
self.security_config.setup_node(node)
|
||||
|
||||
cmd = self.start_cmd(node)
|
||||
self.logger.debug("End-to-end latency %d command: %s", idx, cmd)
|
||||
results = {}
|
||||
for line in node.account.ssh_capture(cmd):
|
||||
if line.startswith("Avg latency:"):
|
||||
results['latency_avg_ms'] = float(line.split()[2])
|
||||
if line.startswith("Percentiles"):
|
||||
results['latency_50th_ms'] = float(line.split()[3][:-1])
|
||||
results['latency_99th_ms'] = float(line.split()[6][:-1])
|
||||
results['latency_999th_ms'] = float(line.split()[9])
|
||||
self.results[idx-1] = results
|
||||
|
||||
def java_class_name(self):
|
||||
return "kafka.tools.EndToEndLatency"
|
||||
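The worker above keys its results off two stdout lines from the EndToEndLatency tool; a standalone sketch of that parsing is shown below. The sample lines are hypothetical and chosen only to match the token positions the indices assume.

```
def parse_latency_output(lines):
    # Mirrors the stdout parsing in EndToEndLatencyService._worker above.
    results = {}
    for line in lines:
        if line.startswith("Avg latency:"):
            results['latency_avg_ms'] = float(line.split()[2])
        if line.startswith("Percentiles"):
            tokens = line.split()
            results['latency_50th_ms'] = float(tokens[3][:-1])   # strip trailing comma
            results['latency_99th_ms'] = float(tokens[6][:-1])   # strip trailing comma
            results['latency_999th_ms'] = float(tokens[9])
    return results

sample = [
    "Avg latency: 2.7365 ms",                        # hypothetical
    "Percentiles: 50th = 2, 99th = 5, 99.9th = 12",  # hypothetical
]
print(parse_latency_output(sample))
```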
72
tests/kafkatest/services/performance/performance.py
Normal file
@@ -0,0 +1,72 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from ducktape.services.background_thread import BackgroundThreadService
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin


class PerformanceService(KafkaPathResolverMixin, BackgroundThreadService):

    def __init__(self, context=None, num_nodes=0, root="/mnt/*", stop_timeout_sec=30):
        super(PerformanceService, self).__init__(context, num_nodes)
        self.results = [None] * self.num_nodes
        self.stats = [[] for x in range(self.num_nodes)]
        self.stop_timeout_sec = stop_timeout_sec
        self.root = root

    def java_class_name(self):
        """
        Returns the name of the Java class which this service creates. Subclasses should override
        this method, so that we know the name of the java process to stop. If it is not
        overridden, we will kill all java processes in PerformanceService#stop_node (for backwards
        compatibility).
        """
        return ""

    def stop_node(self, node):
        node.account.kill_java_processes(self.java_class_name(), clean_shutdown=True, allow_fail=True)

        stopped = self.wait_node(node, timeout_sec=self.stop_timeout_sec)
        assert stopped, "Node %s: did not stop within the specified timeout of %s seconds" % \
            (str(node.account), str(self.stop_timeout_sec))

    def clean_node(self, node):
        node.account.kill_java_processes(self.java_class_name(), clean_shutdown=False, allow_fail=True)
        node.account.ssh("rm -rf -- %s" % self.root, allow_fail=False)


def throughput(records_per_sec, mb_per_sec):
    """Helper method to ensure uniform representation of throughput data"""
    return {
        "records_per_sec": records_per_sec,
        "mb_per_sec": mb_per_sec
    }


def latency(latency_50th_ms, latency_99th_ms, latency_999th_ms):
    """Helper method to ensure uniform representation of latency data"""
    return {
        "latency_50th_ms": latency_50th_ms,
        "latency_99th_ms": latency_99th_ms,
        "latency_999th_ms": latency_999th_ms
    }


def compute_aggregate_throughput(perf):
    """Helper method for computing throughput after running a performance service."""
    aggregate_rate = sum([r['records_per_sec'] for r in perf.results])
    aggregate_mbps = sum([r['mbps'] for r in perf.results])

    return throughput(aggregate_rate, aggregate_mbps)
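A short sketch of how a test might feed per-node results into the helpers above. FakePerf stands in for a finished PerformanceService, the numbers are made up, and the import assumes the package __init__ re-exports compute_aggregate_throughput the same way PerformanceService is imported elsewhere in this change; if not, import it from kafkatest.services.performance.performance.

```
from collections import namedtuple
from kafkatest.services.performance import compute_aggregate_throughput

# Stand-in for a PerformanceService whose workers already filled in results.
FakePerf = namedtuple('FakePerf', ['results'])
perf = FakePerf(results=[
    {'records_per_sec': 120000.0, 'mbps': 11.4},
    {'records_per_sec': 118500.0, 'mbps': 11.3},
])

print(compute_aggregate_throughput(perf))
# roughly {'records_per_sec': 238500.0, 'mb_per_sec': 22.7}
```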
174
tests/kafkatest/services/performance/producer_performance.py
Normal file
@@ -0,0 +1,174 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import os
|
||||
import time
|
||||
from ducktape.utils.util import wait_until
|
||||
from ducktape.cluster.remoteaccount import RemoteCommandError
|
||||
|
||||
from kafkatest.directory_layout.kafka_path import TOOLS_JAR_NAME, TOOLS_DEPENDANT_TEST_LIBS_JAR_NAME
|
||||
from kafkatest.services.monitor.http import HttpMetricsCollector
|
||||
from kafkatest.services.performance import PerformanceService
|
||||
from kafkatest.services.security.security_config import SecurityConfig
|
||||
from kafkatest.version import DEV_BRANCH, V_0_9_0_0
|
||||
|
||||
|
||||
class ProducerPerformanceService(HttpMetricsCollector, PerformanceService):
|
||||
|
||||
PERSISTENT_ROOT = "/mnt/producer_performance"
|
||||
STDOUT_CAPTURE = os.path.join(PERSISTENT_ROOT, "producer_performance.stdout")
|
||||
STDERR_CAPTURE = os.path.join(PERSISTENT_ROOT, "producer_performance.stderr")
|
||||
LOG_DIR = os.path.join(PERSISTENT_ROOT, "logs")
|
||||
LOG_FILE = os.path.join(LOG_DIR, "producer_performance.log")
|
||||
LOG4J_CONFIG = os.path.join(PERSISTENT_ROOT, "tools-log4j.properties")
|
||||
|
||||
def __init__(self, context, num_nodes, kafka, topic, num_records, record_size, throughput, version=DEV_BRANCH, settings=None,
|
||||
intermediate_stats=False, client_id="producer-performance"):
|
||||
|
||||
super(ProducerPerformanceService, self).__init__(context=context, num_nodes=num_nodes)
|
||||
|
||||
self.logs = {
|
||||
"producer_performance_stdout": {
|
||||
"path": ProducerPerformanceService.STDOUT_CAPTURE,
|
||||
"collect_default": True},
|
||||
"producer_performance_stderr": {
|
||||
"path": ProducerPerformanceService.STDERR_CAPTURE,
|
||||
"collect_default": True},
|
||||
"producer_performance_log": {
|
||||
"path": ProducerPerformanceService.LOG_FILE,
|
||||
"collect_default": True}
|
||||
}
|
||||
|
||||
self.kafka = kafka
|
||||
self.security_config = kafka.security_config.client_config()
|
||||
|
||||
security_protocol = self.security_config.security_protocol
|
||||
assert version >= V_0_9_0_0 or security_protocol == SecurityConfig.PLAINTEXT, \
|
||||
"Security protocol %s is only supported if version >= 0.9.0.0, version %s" % (self.security_config, str(version))
|
||||
|
||||
self.args = {
|
||||
'topic': topic,
|
||||
'kafka_opts': self.security_config.kafka_opts,
|
||||
'num_records': num_records,
|
||||
'record_size': record_size,
|
||||
'throughput': throughput
|
||||
}
|
||||
self.settings = settings or {}
|
||||
self.intermediate_stats = intermediate_stats
|
||||
self.client_id = client_id
|
||||
|
||||
for node in self.nodes:
|
||||
node.version = version
|
||||
|
||||
def start_cmd(self, node):
|
||||
args = self.args.copy()
|
||||
args.update({
|
||||
'bootstrap_servers': self.kafka.bootstrap_servers(self.security_config.security_protocol),
|
||||
'client_id': self.client_id,
|
||||
'kafka_run_class': self.path.script("kafka-run-class.sh", node),
|
||||
'metrics_props': ' '.join(["%s=%s" % (k, v) for k, v in self.http_metrics_client_configs.iteritems()])
|
||||
})
|
||||
|
||||
cmd = ""
|
||||
|
||||
if node.version < DEV_BRANCH:
|
||||
# In order to ensure more consistent configuration between versions, always use the ProducerPerformance
|
||||
# tool from the development branch
|
||||
tools_jar = self.path.jar(TOOLS_JAR_NAME, DEV_BRANCH)
|
||||
tools_dependant_libs_jar = self.path.jar(TOOLS_DEPENDANT_TEST_LIBS_JAR_NAME, DEV_BRANCH)
|
||||
|
||||
for jar in (tools_jar, tools_dependant_libs_jar):
|
||||
cmd += "for file in %s; do CLASSPATH=$CLASSPATH:$file; done; " % jar
|
||||
cmd += "export CLASSPATH; "
|
||||
|
||||
cmd += " export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%s\"; " % ProducerPerformanceService.LOG4J_CONFIG
|
||||
cmd += "KAFKA_OPTS=%(kafka_opts)s KAFKA_HEAP_OPTS=\"-XX:+HeapDumpOnOutOfMemoryError\" %(kafka_run_class)s org.apache.kafka.tools.ProducerPerformance " \
|
||||
"--topic %(topic)s --num-records %(num_records)d --record-size %(record_size)d --throughput %(throughput)d --producer-props bootstrap.servers=%(bootstrap_servers)s client.id=%(client_id)s %(metrics_props)s" % args
|
||||
|
||||
self.security_config.setup_node(node)
|
||||
if self.security_config.security_protocol != SecurityConfig.PLAINTEXT:
|
||||
self.settings.update(self.security_config.properties)
|
||||
|
||||
for key, value in self.settings.items():
|
||||
cmd += " %s=%s" % (str(key), str(value))
|
||||
|
||||
cmd += " 2>>%s | tee %s" % (ProducerPerformanceService.STDERR_CAPTURE, ProducerPerformanceService.STDOUT_CAPTURE)
|
||||
return cmd
|
||||
|
||||
def pids(self, node):
|
||||
try:
|
||||
cmd = "jps | grep -i ProducerPerformance | awk '{print $1}'"
|
||||
pid_arr = [pid for pid in node.account.ssh_capture(cmd, allow_fail=True, callback=int)]
|
||||
return pid_arr
|
||||
except (RemoteCommandError, ValueError) as e:
|
||||
return []
|
||||
|
||||
def alive(self, node):
|
||||
return len(self.pids(node)) > 0
|
||||
|
||||
def _worker(self, idx, node):
|
||||
node.account.ssh("mkdir -p %s" % ProducerPerformanceService.PERSISTENT_ROOT, allow_fail=False)
|
||||
|
||||
# Create and upload log properties
|
||||
log_config = self.render('tools_log4j.properties', log_file=ProducerPerformanceService.LOG_FILE)
|
||||
node.account.create_file(ProducerPerformanceService.LOG4J_CONFIG, log_config)
|
||||
|
||||
cmd = self.start_cmd(node)
|
||||
self.logger.debug("Producer performance %d command: %s", idx, cmd)
|
||||
|
||||
# start ProducerPerformance process
|
||||
start = time.time()
|
||||
producer_output = node.account.ssh_capture(cmd)
|
||||
wait_until(lambda: self.alive(node), timeout_sec=20, err_msg="ProducerPerformance failed to start")
|
||||
# block until there is at least one line of output
|
||||
first_line = next(producer_output, None)
|
||||
if first_line is None:
|
||||
raise Exception("No output from ProducerPerformance")
|
||||
|
||||
wait_until(lambda: not self.alive(node), timeout_sec=1200, backoff_sec=2, err_msg="ProducerPerformance failed to finish")
|
||||
elapsed = time.time() - start
|
||||
self.logger.debug("ProducerPerformance process ran for %s seconds" % elapsed)
|
||||
|
||||
# parse producer output from file
|
||||
last = None
|
||||
producer_output = node.account.ssh_capture("cat %s" % ProducerPerformanceService.STDOUT_CAPTURE)
|
||||
for line in producer_output:
|
||||
if self.intermediate_stats:
|
||||
try:
|
||||
self.stats[idx-1].append(self.parse_stats(line))
|
||||
except:
|
||||
# Sometimes there are extraneous log messages
|
||||
pass
|
||||
|
||||
last = line
|
||||
try:
|
||||
self.results[idx-1] = self.parse_stats(last)
|
||||
except:
|
||||
raise Exception("Unable to parse aggregate performance statistics on node %d: %s" % (idx, last))
|
||||
|
||||
def parse_stats(self, line):
|
||||
|
||||
parts = line.split(',')
|
||||
return {
|
||||
'records': int(parts[0].split()[0]),
|
||||
'records_per_sec': float(parts[1].split()[0]),
|
||||
'mbps': float(parts[1].split('(')[1].split()[0]),
|
||||
'latency_avg_ms': float(parts[2].split()[0]),
|
||||
'latency_max_ms': float(parts[3].split()[0]),
|
||||
'latency_50th_ms': float(parts[4].split()[0]),
|
||||
'latency_95th_ms': float(parts[5].split()[0]),
|
||||
'latency_99th_ms': float(parts[6].split()[0]),
|
||||
'latency_999th_ms': float(parts[7].split()[0]),
|
||||
}
|
||||
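parse_stats above slices the tool's one-line summary purely by comma position and token index. The standalone sketch below applies the same slicing to an illustrative line with that field order; the exact text ProducerPerformance prints can differ between versions.

```
def parse_stats(line):
    # Same comma/token slicing as ProducerPerformanceService.parse_stats above.
    parts = line.split(',')
    return {
        'records': int(parts[0].split()[0]),
        'records_per_sec': float(parts[1].split()[0]),
        'mbps': float(parts[1].split('(')[1].split()[0]),
        'latency_avg_ms': float(parts[2].split()[0]),
        'latency_max_ms': float(parts[3].split()[0]),
        'latency_50th_ms': float(parts[4].split()[0]),
        'latency_95th_ms': float(parts[5].split()[0]),
        'latency_99th_ms': float(parts[6].split()[0]),
        'latency_999th_ms': float(parts[7].split()[0]),
    }

# Illustrative summary line with the expected field order.
sample = ("100000 records sent, 34965.03 records/sec (3.33 MB/sec), "
          "7.33 ms avg latency, 364.00 ms max latency, "
          "5 ms 50th, 6 ms 95th, 13 ms 99th, 22 ms 99.9th.")
print(parse_stats(sample)['mbps'])  # 3.33
```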
108
tests/kafkatest/services/performance/streams_performance.py
Normal file
@@ -0,0 +1,108 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from kafkatest.services.monitor.jmx import JmxMixin
|
||||
from kafkatest.services.streams import StreamsTestBaseService
|
||||
from kafkatest.services.kafka import KafkaConfig
|
||||
from kafkatest.services import streams_property
|
||||
|
||||
#
|
||||
# Class used to start the simple Kafka Streams benchmark
|
||||
#
|
||||
|
||||
class StreamsSimpleBenchmarkService(StreamsTestBaseService):
|
||||
"""Base class for simple Kafka Streams benchmark"""
|
||||
|
||||
def __init__(self, test_context, kafka, test_name, num_threads, num_recs_or_wait_ms, key_skew, value_size):
|
||||
super(StreamsSimpleBenchmarkService, self).__init__(test_context,
|
||||
kafka,
|
||||
"org.apache.kafka.streams.perf.SimpleBenchmark",
|
||||
test_name,
|
||||
num_recs_or_wait_ms,
|
||||
key_skew,
|
||||
value_size)
|
||||
|
||||
self.jmx_option = ""
|
||||
if test_name.startswith('stream') or test_name.startswith('table'):
|
||||
self.jmx_option = "stream-jmx"
|
||||
JmxMixin.__init__(self,
|
||||
num_nodes=1,
|
||||
jmx_object_names=['kafka.streams:type=stream-thread-metrics,thread-id=simple-benchmark-StreamThread-%d' %(i+1) for i in range(num_threads)],
|
||||
jmx_attributes=['process-latency-avg',
|
||||
'process-rate',
|
||||
'commit-latency-avg',
|
||||
'commit-rate',
|
||||
'poll-latency-avg',
|
||||
'poll-rate'],
|
||||
root=StreamsTestBaseService.PERSISTENT_ROOT)
|
||||
|
||||
if test_name.startswith('consume'):
|
||||
self.jmx_option = "consumer-jmx"
|
||||
JmxMixin.__init__(self,
|
||||
num_nodes=1,
|
||||
jmx_object_names=['kafka.consumer:type=consumer-fetch-manager-metrics,client-id=simple-benchmark-consumer'],
|
||||
jmx_attributes=['records-consumed-rate'],
|
||||
root=StreamsTestBaseService.PERSISTENT_ROOT)
|
||||
|
||||
self.num_threads = num_threads
|
||||
|
||||
def prop_file(self):
|
||||
cfg = KafkaConfig(**{streams_property.STATE_DIR: self.PERSISTENT_ROOT,
|
||||
streams_property.KAFKA_SERVERS: self.kafka.bootstrap_servers(),
|
||||
streams_property.NUM_THREADS: self.num_threads})
|
||||
return cfg.render()
|
||||
|
||||
|
||||
def start_cmd(self, node):
|
||||
if self.jmx_option != "":
|
||||
args = self.args.copy()
|
||||
args['jmx_port'] = self.jmx_port
|
||||
args['config_file'] = self.CONFIG_FILE
|
||||
args['stdout'] = self.STDOUT_FILE
|
||||
args['stderr'] = self.STDERR_FILE
|
||||
args['pidfile'] = self.PID_FILE
|
||||
args['log4j'] = self.LOG4J_CONFIG_FILE
|
||||
args['kafka_run_class'] = self.path.script("kafka-run-class.sh", node)
|
||||
|
||||
cmd = "( export JMX_PORT=%(jmx_port)s; export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%(log4j)s\"; " \
|
||||
"INCLUDE_TEST_JARS=true %(kafka_run_class)s %(streams_class_name)s " \
|
||||
" %(config_file)s %(user_test_args1)s %(user_test_args2)s %(user_test_args3)s" \
|
||||
" %(user_test_args4)s & echo $! >&3 ) 1>> %(stdout)s 2>> %(stderr)s 3> %(pidfile)s" % args
|
||||
|
||||
else:
|
||||
cmd = super(StreamsSimpleBenchmarkService, self).start_cmd(node)
|
||||
|
||||
return cmd
|
||||
|
||||
def start_node(self, node):
|
||||
super(StreamsSimpleBenchmarkService, self).start_node(node)
|
||||
|
||||
if self.jmx_option != "":
|
||||
self.start_jmx_tool(1, node)
|
||||
|
||||
def clean_node(self, node):
|
||||
if self.jmx_option != "":
|
||||
JmxMixin.clean_node(self, node)
|
||||
|
||||
super(StreamsSimpleBenchmarkService, self).clean_node(node)
|
||||
|
||||
def collect_data(self, node, tag = None):
|
||||
# Collect the data and return it to the framework
|
||||
output = node.account.ssh_capture("grep Performance %s" % self.STDOUT_FILE)
|
||||
data = {}
|
||||
for line in output:
|
||||
parts = line.split(':')
|
||||
data[tag + parts[0]] = parts[1]
|
||||
return data
|
||||
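collect_data above greps the benchmark's stdout for lines containing "Performance" and splits each on ':' into a key/value pair, prefixing the key with the caller-supplied tag. A standalone sketch against hypothetical grep output follows; the line text is an assumption, only the split logic mirrors the code.

```
def collect(lines, tag=""):
    # Same split-on-':' bookkeeping as collect_data above.
    data = {}
    for line in lines:
        parts = line.split(':')
        data[tag + parts[0]] = parts[1]
    return data

# Hypothetical "grep Performance" output.
sample = [
    "Performance records/sec: 185000",
    "Performance MB/sec: 42.7",
]
print(collect(sample, tag="run1-"))
# {'run1-Performance records/sec': ' 185000', 'run1-Performance MB/sec': ' 42.7'}
```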
@@ -0,0 +1,25 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Define the root logger with appender file
log4j.rootLogger = {{ log_level|default("INFO") }}, FILE

log4j.appender.FILE=org.apache.log4j.FileAppender
log4j.appender.FILE.File={{ log_file }}
log4j.appender.FILE.ImmediateFlush=true
# Set Append to false so the log file is overwritten on each run
log4j.appender.FILE.Append=false
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.conversionPattern=[%d] %p %m (%c)%n
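The {{ ... }} placeholders above are resolved by ducktape's render(), which is backed by Jinja2 templates. A minimal standalone sketch of the same substitution for the two variables this template uses; the log path is one of the LOG_FILE constants defined earlier in this change.

```
from jinja2 import Template

template = Template(
    'log4j.rootLogger = {{ log_level|default("INFO") }}, FILE\n'
    'log4j.appender.FILE.File={{ log_file }}\n')

# log_level is left unset, so the default("INFO") filter kicks in.
print(template.render(log_file="/mnt/producer_performance/logs/producer_performance.log"))
# log4j.rootLogger = INFO, FILE
# log4j.appender.FILE.File=/mnt/producer_performance/logs/producer_performance.log
```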
93
tests/kafkatest/services/replica_verification_tool.py
Normal file
@@ -0,0 +1,93 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from ducktape.services.background_thread import BackgroundThreadService
|
||||
|
||||
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
|
||||
from kafkatest.services.security.security_config import SecurityConfig
|
||||
|
||||
import re
|
||||
|
||||
|
||||
class ReplicaVerificationTool(KafkaPathResolverMixin, BackgroundThreadService):
|
||||
|
||||
logs = {
|
||||
"producer_log": {
|
||||
"path": "/mnt/replica_verification_tool.log",
|
||||
"collect_default": False}
|
||||
}
|
||||
|
||||
def __init__(self, context, num_nodes, kafka, topic, report_interval_ms, security_protocol="PLAINTEXT", stop_timeout_sec=30):
|
||||
super(ReplicaVerificationTool, self).__init__(context, num_nodes)
|
||||
|
||||
self.kafka = kafka
|
||||
self.topic = topic
|
||||
self.report_interval_ms = report_interval_ms
|
||||
self.security_protocol = security_protocol
|
||||
self.security_config = SecurityConfig(self.context, security_protocol)
|
||||
self.partition_lag = {}
|
||||
self.stop_timeout_sec = stop_timeout_sec
|
||||
|
||||
def _worker(self, idx, node):
|
||||
cmd = self.start_cmd(node)
|
||||
self.logger.debug("ReplicaVerificationTool %d command: %s" % (idx, cmd))
|
||||
self.security_config.setup_node(node)
|
||||
for line in node.account.ssh_capture(cmd):
|
||||
self.logger.debug("Parsing line:{}".format(line))
|
||||
|
||||
parsed = re.search('.*max lag is (.+?) for partition ([a-zA-Z0-9._-]+-[0-9]+) at', line)
|
||||
if parsed:
|
||||
lag = int(parsed.group(1))
|
||||
topic_partition = parsed.group(2)
|
||||
self.logger.debug("Setting max lag for {} as {}".format(topic_partition, lag))
|
||||
self.partition_lag[topic_partition] = lag
|
||||
|
||||
def get_lag_for_partition(self, topic, partition):
|
||||
"""
|
||||
Get latest lag for given topic-partition
|
||||
|
||||
Args:
|
||||
topic: a topic
|
||||
partition: a partition of the topic
|
||||
"""
|
||||
topic_partition = topic + '-' + str(partition)
|
||||
lag = self.partition_lag.get(topic_partition, -1)
|
||||
self.logger.debug("Returning lag for {} as {}".format(topic_partition, lag))
|
||||
|
||||
return lag
|
||||
|
||||
def start_cmd(self, node):
|
||||
cmd = self.path.script("kafka-run-class.sh", node)
|
||||
cmd += " %s" % self.java_class_name()
|
||||
cmd += " --broker-list %s --topic-white-list %s --time -2 --report-interval-ms %s" % (self.kafka.bootstrap_servers(self.security_protocol), self.topic, self.report_interval_ms)
|
||||
|
||||
cmd += " 2>> /mnt/replica_verification_tool.log | tee -a /mnt/replica_verification_tool.log &"
|
||||
return cmd
|
||||
|
||||
def stop_node(self, node):
|
||||
node.account.kill_java_processes(self.java_class_name(), clean_shutdown=True,
|
||||
allow_fail=True)
|
||||
|
||||
stopped = self.wait_node(node, timeout_sec=self.stop_timeout_sec)
|
||||
assert stopped, "Node %s: did not stop within the specified timeout of %s seconds" % \
|
||||
(str(node.account), str(self.stop_timeout_sec))
|
||||
|
||||
def clean_node(self, node):
|
||||
node.account.kill_java_processes(self.java_class_name(), clean_shutdown=False,
|
||||
allow_fail=True)
|
||||
node.account.ssh("rm -rf /mnt/replica_verification_tool.log", allow_fail=False)
|
||||
|
||||
def java_class_name(self):
|
||||
return "kafka.tools.ReplicaVerificationTool"
|
||||
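The worker above extracts lag with a single regular expression over each captured log line. A standalone check of that regex against an illustrative line; the surrounding text is made up, only the "max lag is N for partition <topic>-<n> at" fragment matters.

```
import re

line = ("2015-06-01 10:00:00,000: max lag is 7 for partition test-topic-0 "
        "at offset 41200 among 3 partitions")  # illustrative
parsed = re.search('.*max lag is (.+?) for partition ([a-zA-Z0-9._-]+-[0-9]+) at', line)
if parsed:
    print("%s %d" % (parsed.group(2), int(parsed.group(1))))  # test-topic-0 7
```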
15
tests/kafkatest/services/security/__init__.py
Normal file
@@ -0,0 +1,15 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

75
tests/kafkatest/services/security/kafka_acls.py
Normal file
@@ -0,0 +1,75 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
|
||||
|
||||
|
||||
class ACLs(KafkaPathResolverMixin):
|
||||
def __init__(self, context):
|
||||
self.context = context
|
||||
|
||||
def set_acls(self, protocol, kafka, topic, group):
|
||||
node = kafka.nodes[0]
|
||||
setting = kafka.zk_connect_setting()
|
||||
|
||||
# Set server ACLs
|
||||
kafka_principal = "User:CN=systemtest" if protocol == "SSL" else "User:kafka"
|
||||
self.acls_command(node, ACLs.add_cluster_acl(setting, kafka_principal))
|
||||
self.acls_command(node, ACLs.broker_read_acl(setting, "*", kafka_principal))
|
||||
|
||||
# Set client ACLs
|
||||
client_principal = "User:CN=systemtest" if protocol == "SSL" else "User:client"
|
||||
self.acls_command(node, ACLs.produce_acl(setting, topic, client_principal))
|
||||
self.acls_command(node, ACLs.consume_acl(setting, topic, group, client_principal))
|
||||
|
||||
def acls_command(self, node, properties):
|
||||
cmd = "%s %s" % (self.path.script("kafka-acls.sh", node), properties)
|
||||
node.account.ssh(cmd)
|
||||
|
||||
@staticmethod
|
||||
def add_cluster_acl(zk_connect, principal="User:kafka"):
|
||||
return "--authorizer-properties zookeeper.connect=%(zk_connect)s --add --cluster " \
|
||||
"--operation=ClusterAction --allow-principal=%(principal)s " % {
|
||||
'zk_connect': zk_connect,
|
||||
'principal': principal
|
||||
}
|
||||
|
||||
@staticmethod
|
||||
def broker_read_acl(zk_connect, topic, principal="User:kafka"):
|
||||
return "--authorizer-properties zookeeper.connect=%(zk_connect)s --add --topic=%(topic)s " \
|
||||
"--operation=Read --allow-principal=%(principal)s " % {
|
||||
'zk_connect': zk_connect,
|
||||
'topic': topic,
|
||||
'principal': principal
|
||||
}
|
||||
|
||||
@staticmethod
|
||||
def produce_acl(zk_connect, topic, principal="User:client"):
|
||||
return "--authorizer-properties zookeeper.connect=%(zk_connect)s --add --topic=%(topic)s " \
|
||||
"--producer --allow-principal=%(principal)s " % {
|
||||
'zk_connect': zk_connect,
|
||||
'topic': topic,
|
||||
'principal': principal
|
||||
}
|
||||
|
||||
@staticmethod
|
||||
def consume_acl(zk_connect, topic, group, principal="User:client"):
|
||||
return "--authorizer-properties zookeeper.connect=%(zk_connect)s --add --topic=%(topic)s " \
|
||||
"--group=%(group)s --consumer --allow-principal=%(principal)s " % {
|
||||
'zk_connect': zk_connect,
|
||||
'topic': topic,
|
||||
'group': group,
|
||||
'principal': principal
|
||||
}
|
||||
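The ACL helpers above are plain string builders, so the easiest way to see what gets handed to kafka-acls.sh is to call one directly; this assumes the kafkatest package is importable, and the ZooKeeper address and topic below are made up.

```
from kafkatest.services.security.kafka_acls import ACLs

print(ACLs.produce_acl("zk1:2181", "test-topic"))
# --authorizer-properties zookeeper.connect=zk1:2181 --add --topic=test-topic --producer --allow-principal=User:client
```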
@@ -0,0 +1,43 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
class ListenerSecurityConfig:
|
||||
|
||||
SASL_MECHANISM_PREFIXED_CONFIGS = ["connections.max.reauth.ms", "sasl.jaas.config",
|
||||
"sasl.login.callback.handler.class", "sasl.login.class",
|
||||
"sasl.server.callback.handler.class"]
|
||||
|
||||
def __init__(self, use_separate_interbroker_listener=False,
|
||||
client_listener_overrides={}, interbroker_listener_overrides={}):
|
||||
"""
|
||||
:param bool use_separate_interbroker_listener - if set, will use a separate interbroker listener,
|
||||
with security protocol set to interbroker_security_protocol value. If set, requires
|
||||
interbroker_security_protocol to be provided.
|
||||
Normally port name is the same as its security protocol, so setting security_protocol and
|
||||
interbroker_security_protocol to the same value will lead to a single port being open and both client
|
||||
and broker-to-broker communication will go over that port. This parameter allows
|
||||
you to add an interbroker listener with the same security protocol as a client listener, but running on a
|
||||
separate port.
|
||||
:param dict client_listener_overrides - non-prefixed listener config overrides for named client listener
|
||||
(for example 'sasl.jaas.config', 'ssl.keystore.location', 'sasl.login.callback.handler.class', etc).
|
||||
:param dict interbroker_listener_overrides - non-prefixed listener config overrides for named interbroker
|
||||
listener (for example 'sasl.jaas.config', 'ssl.keystore.location', 'sasl.login.callback.handler.class', etc).
|
||||
"""
|
||||
self.use_separate_interbroker_listener = use_separate_interbroker_listener
|
||||
self.client_listener_overrides = client_listener_overrides
|
||||
self.interbroker_listener_overrides = interbroker_listener_overrides
|
||||
|
||||
def requires_sasl_mechanism_prefix(self, config):
|
||||
return config in ListenerSecurityConfig.SASL_MECHANISM_PREFIXED_CONFIGS
|
||||
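A short usage sketch for the class above, assuming the kafkatest package is importable; the JAAS override value is a placeholder, not a working JAAS entry.

```
from kafkatest.services.security.listener_security_config import ListenerSecurityConfig

listener_config = ListenerSecurityConfig(
    use_separate_interbroker_listener=True,
    interbroker_listener_overrides={'sasl.jaas.config': '<placeholder jaas entry>'})

# Overrides listed in SASL_MECHANISM_PREFIXED_CONFIGS are the ones the caller
# must emit under a listener- and mechanism-scoped prefix.
print(listener_config.requires_sasl_mechanism_prefix('sasl.jaas.config'))        # True
print(listener_config.requires_sasl_mechanism_prefix('ssl.keystore.location'))   # False
```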
136
tests/kafkatest/services/security/minikdc.py
Normal file
@@ -0,0 +1,136 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import os
|
||||
import random
|
||||
import uuid
|
||||
from io import open
|
||||
from os import remove, close
|
||||
from shutil import move
|
||||
from tempfile import mkstemp
|
||||
|
||||
from ducktape.services.service import Service
|
||||
|
||||
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin, CORE_LIBS_JAR_NAME, CORE_DEPENDANT_TEST_LIBS_JAR_NAME
|
||||
from kafkatest.version import DEV_BRANCH
|
||||
|
||||
|
||||
class MiniKdc(KafkaPathResolverMixin, Service):
|
||||
|
||||
logs = {
|
||||
"minikdc_log": {
|
||||
"path": "/mnt/minikdc/minikdc.log",
|
||||
"collect_default": True}
|
||||
}
|
||||
|
||||
WORK_DIR = "/mnt/minikdc"
|
||||
PROPS_FILE = "/mnt/minikdc/minikdc.properties"
|
||||
KEYTAB_FILE = "/mnt/minikdc/keytab"
|
||||
KRB5CONF_FILE = "/mnt/minikdc/krb5.conf"
|
||||
LOG_FILE = "/mnt/minikdc/minikdc.log"
|
||||
|
||||
LOCAL_KEYTAB_FILE = None
|
||||
LOCAL_KRB5CONF_FILE = None
|
||||
|
||||
@staticmethod
|
||||
def _set_local_keytab_file(local_scratch_dir):
|
||||
"""Set MiniKdc.LOCAL_KEYTAB_FILE exactly once per test.
|
||||
|
||||
LOCAL_KEYTAB_FILE is currently used like a global variable to provide a mechanism to share the
|
||||
location of the local keytab file among all services which might need it.
|
||||
|
||||
Since individual ducktape tests are each run in a subprocess forked from the ducktape main process,
|
||||
class variables set at class load time are duplicated between test processes. This leads to collisions
|
||||
if test subprocesses are run in parallel, so we defer setting these class variables until after the test itself
|
||||
begins to run.
|
||||
"""
|
||||
if MiniKdc.LOCAL_KEYTAB_FILE is None:
|
||||
MiniKdc.LOCAL_KEYTAB_FILE = os.path.join(local_scratch_dir, "keytab")
|
||||
return MiniKdc.LOCAL_KEYTAB_FILE
|
||||
|
||||
@staticmethod
|
||||
def _set_local_krb5conf_file(local_scratch_dir):
|
||||
"""Set MiniKdc.LOCAL_KRB5CONF_FILE exactly once per test.
|
||||
|
||||
See _set_local_keytab_file for details why we do this.
|
||||
"""
|
||||
|
||||
if MiniKdc.LOCAL_KRB5CONF_FILE is None:
|
||||
MiniKdc.LOCAL_KRB5CONF_FILE = os.path.join(local_scratch_dir, "krb5conf")
|
||||
return MiniKdc.LOCAL_KRB5CONF_FILE
|
||||
|
||||
def __init__(self, context, kafka_nodes, extra_principals=""):
|
||||
super(MiniKdc, self).__init__(context, 1)
|
||||
self.kafka_nodes = kafka_nodes
|
||||
self.extra_principals = extra_principals
|
||||
|
||||
# context.local_scratch_dir uses a ducktape feature:
|
||||
# each test_context object has a unique local scratch directory which is available for the duration of the test
|
||||
# which is automatically garbage collected after the test finishes
|
||||
MiniKdc._set_local_keytab_file(context.local_scratch_dir)
|
||||
MiniKdc._set_local_krb5conf_file(context.local_scratch_dir)
|
||||
|
||||
def replace_in_file(self, file_path, pattern, subst):
|
||||
fh, abs_path = mkstemp()
|
||||
with open(abs_path, 'w') as new_file:
|
||||
with open(file_path) as old_file:
|
||||
for line in old_file:
|
||||
new_file.write(line.replace(pattern, subst))
|
||||
close(fh)
|
||||
remove(file_path)
|
||||
move(abs_path, file_path)
|
||||
|
||||
def start_node(self, node):
|
||||
node.account.ssh("mkdir -p %s" % MiniKdc.WORK_DIR, allow_fail=False)
|
||||
props_file = self.render('minikdc.properties', node=node)
|
||||
node.account.create_file(MiniKdc.PROPS_FILE, props_file)
|
||||
self.logger.info("minikdc.properties")
|
||||
self.logger.info(props_file)
|
||||
|
||||
kafka_principals = ' '.join(['kafka/' + kafka_node.account.hostname for kafka_node in self.kafka_nodes])
|
||||
principals = 'client ' + kafka_principals + ' ' + self.extra_principals
|
||||
self.logger.info("Starting MiniKdc with principals " + principals)
|
||||
|
||||
core_libs_jar = self.path.jar(CORE_LIBS_JAR_NAME, DEV_BRANCH)
|
||||
core_dependant_test_libs_jar = self.path.jar(CORE_DEPENDANT_TEST_LIBS_JAR_NAME, DEV_BRANCH)
|
||||
|
||||
cmd = "for file in %s; do CLASSPATH=$CLASSPATH:$file; done;" % core_libs_jar
|
||||
cmd += " for file in %s; do CLASSPATH=$CLASSPATH:$file; done;" % core_dependant_test_libs_jar
|
||||
cmd += " export CLASSPATH;"
|
||||
cmd += " %s kafka.security.minikdc.MiniKdc %s %s %s %s 1>> %s 2>> %s &" % (self.path.script("kafka-run-class.sh", node), MiniKdc.WORK_DIR, MiniKdc.PROPS_FILE, MiniKdc.KEYTAB_FILE, principals, MiniKdc.LOG_FILE, MiniKdc.LOG_FILE)
|
||||
self.logger.debug("Attempting to start MiniKdc on %s with command: %s" % (str(node.account), cmd))
|
||||
with node.account.monitor_log(MiniKdc.LOG_FILE) as monitor:
|
||||
node.account.ssh(cmd)
|
||||
monitor.wait_until("MiniKdc Running", timeout_sec=60, backoff_sec=1, err_msg="MiniKdc didn't finish startup")
|
||||
|
||||
node.account.copy_from(MiniKdc.KEYTAB_FILE, MiniKdc.LOCAL_KEYTAB_FILE)
|
||||
node.account.copy_from(MiniKdc.KRB5CONF_FILE, MiniKdc.LOCAL_KRB5CONF_FILE)
|
||||
|
||||
# KDC is set to bind openly (via 0.0.0.0). Change krb5.conf to hold the specific KDC address
|
||||
self.replace_in_file(MiniKdc.LOCAL_KRB5CONF_FILE, '0.0.0.0', node.account.hostname)
|
||||
|
||||
def stop_node(self, node):
|
||||
self.logger.info("Stopping %s on %s" % (type(self).__name__, node.account.hostname))
|
||||
node.account.kill_java_processes("MiniKdc", clean_shutdown=True, allow_fail=False)
|
||||
|
||||
def clean_node(self, node):
|
||||
node.account.kill_java_processes("MiniKdc", clean_shutdown=False, allow_fail=True)
|
||||
node.account.ssh("rm -rf " + MiniKdc.WORK_DIR, allow_fail=False)
|
||||
if os.path.exists(MiniKdc.LOCAL_KEYTAB_FILE):
|
||||
os.remove(MiniKdc.LOCAL_KEYTAB_FILE)
|
||||
if os.path.exists(MiniKdc.LOCAL_KRB5CONF_FILE):
|
||||
os.remove(MiniKdc.LOCAL_KRB5CONF_FILE)
|
||||
|
||||
|
||||
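replace_in_file above is what rewrites the downloaded krb5.conf so that clients point at the KDC host instead of the 0.0.0.0 bind address. A self-contained sketch of that rewrite using only the standard library; the file contents and hostname are made up.

```
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "krb5.conf")
with open(path, "w") as f:
    f.write("kdc = 0.0.0.0:88\n")        # illustrative generated contents

with open(path) as f:
    contents = f.read()
with open(path, "w") as f:
    f.write(contents.replace("0.0.0.0", "kdc-host.example.com"))

with open(path) as f:
    print(f.read())                      # kdc = kdc-host.example.com:88
```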
352
tests/kafkatest/services/security/security_config.py
Normal file
@@ -0,0 +1,352 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import logging
|
||||
import os
|
||||
import subprocess
|
||||
from tempfile import mkdtemp
|
||||
from shutil import rmtree
|
||||
from ducktape.template import TemplateRenderer
|
||||
from kafkatest.services.security.minikdc import MiniKdc
|
||||
from kafkatest.services.security.listener_security_config import ListenerSecurityConfig
|
||||
import itertools
|
||||
|
||||
|
||||
class SslStores(object):
|
||||
def __init__(self, local_scratch_dir, logger=None):
|
||||
self.logger = logger
|
||||
self.ca_crt_path = os.path.join(local_scratch_dir, "test.ca.crt")
|
||||
self.ca_jks_path = os.path.join(local_scratch_dir, "test.ca.jks")
|
||||
self.ca_passwd = "test-ca-passwd"
|
||||
|
||||
self.truststore_path = os.path.join(local_scratch_dir, "test.truststore.jks")
|
||||
self.truststore_passwd = "test-ts-passwd"
|
||||
self.keystore_passwd = "test-ks-passwd"
|
||||
# Zookeeper TLS (as of v3.5.6) does not support a key password different than the keystore password
|
||||
self.key_passwd = self.keystore_passwd
|
||||
# Allow up to one hour of clock skew between host and VMs
|
||||
self.startdate = "-1H"
|
||||
|
||||
for file in [self.ca_crt_path, self.ca_jks_path, self.truststore_path]:
|
||||
if os.path.exists(file):
|
||||
os.remove(file)
|
||||
|
||||
def generate_ca(self):
|
||||
"""
|
||||
Generate CA private key and certificate.
|
||||
"""
|
||||
|
||||
self.runcmd("keytool -genkeypair -alias ca -keyalg RSA -keysize 2048 -keystore %s -storetype JKS -storepass %s -keypass %s -dname CN=SystemTestCA -startdate %s --ext bc=ca:true" % (self.ca_jks_path, self.ca_passwd, self.ca_passwd, self.startdate))
|
||||
self.runcmd("keytool -export -alias ca -keystore %s -storepass %s -storetype JKS -rfc -file %s" % (self.ca_jks_path, self.ca_passwd, self.ca_crt_path))
|
||||
|
||||
def generate_truststore(self):
|
||||
"""
|
||||
Generate JKS truststore containing CA certificate.
|
||||
"""
|
||||
|
||||
self.runcmd("keytool -importcert -alias ca -file %s -keystore %s -storepass %s -storetype JKS -noprompt" % (self.ca_crt_path, self.truststore_path, self.truststore_passwd))
|
||||
|
||||
def generate_and_copy_keystore(self, node):
|
||||
"""
|
||||
Generate JKS keystore with certificate signed by the test CA.
|
||||
The generated certificate has the node's hostname as a DNS SubjectAlternativeName.
|
||||
"""
|
||||
|
||||
ks_dir = mkdtemp(dir="/tmp")
|
||||
ks_path = os.path.join(ks_dir, "test.keystore.jks")
|
||||
csr_path = os.path.join(ks_dir, "test.kafka.csr")
|
||||
crt_path = os.path.join(ks_dir, "test.kafka.crt")
|
||||
|
||||
self.runcmd("keytool -genkeypair -alias kafka -keyalg RSA -keysize 2048 -keystore %s -storepass %s -storetype JKS -keypass %s -dname CN=systemtest -ext SAN=DNS:%s -startdate %s" % (ks_path, self.keystore_passwd, self.key_passwd, self.hostname(node), self.startdate))
|
||||
self.runcmd("keytool -certreq -keystore %s -storepass %s -storetype JKS -keypass %s -alias kafka -file %s" % (ks_path, self.keystore_passwd, self.key_passwd, csr_path))
|
||||
self.runcmd("keytool -gencert -keystore %s -storepass %s -storetype JKS -alias ca -infile %s -outfile %s -dname CN=systemtest -ext SAN=DNS:%s -startdate %s" % (self.ca_jks_path, self.ca_passwd, csr_path, crt_path, self.hostname(node), self.startdate))
|
||||
self.runcmd("keytool -importcert -keystore %s -storepass %s -storetype JKS -alias ca -file %s -noprompt" % (ks_path, self.keystore_passwd, self.ca_crt_path))
|
||||
self.runcmd("keytool -importcert -keystore %s -storepass %s -storetype JKS -keypass %s -alias kafka -file %s -noprompt" % (ks_path, self.keystore_passwd, self.key_passwd, crt_path))
|
||||
node.account.copy_to(ks_path, SecurityConfig.KEYSTORE_PATH)
|
||||
|
||||
# generate ZooKeeper client TLS config file for encryption-only (no client cert) use case
|
||||
str = """zookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
|
||||
zookeeper.ssl.client.enable=true
|
||||
zookeeper.ssl.truststore.location=%s
|
||||
zookeeper.ssl.truststore.password=%s
|
||||
""" % (SecurityConfig.TRUSTSTORE_PATH, self.truststore_passwd)
|
||||
node.account.create_file(SecurityConfig.ZK_CLIENT_TLS_ENCRYPT_ONLY_CONFIG_PATH, str)
|
||||
|
||||
# also generate ZooKeeper client TLS config file for mutual authentication use case
|
||||
str = """zookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
|
||||
zookeeper.ssl.client.enable=true
|
||||
zookeeper.ssl.truststore.location=%s
|
||||
zookeeper.ssl.truststore.password=%s
|
||||
zookeeper.ssl.keystore.location=%s
|
||||
zookeeper.ssl.keystore.password=%s
|
||||
""" % (SecurityConfig.TRUSTSTORE_PATH, self.truststore_passwd, SecurityConfig.KEYSTORE_PATH, self.keystore_passwd)
|
||||
node.account.create_file(SecurityConfig.ZK_CLIENT_MUTUAL_AUTH_CONFIG_PATH, str)
|
||||
|
||||
rmtree(ks_dir)
|
||||
|
||||
def hostname(self, node):
|
||||
""" Hostname which may be overridden for testing validation failures
|
||||
"""
|
||||
return node.account.hostname
|
||||
|
||||
def runcmd(self, cmd):
|
||||
if self.logger:
|
||||
self.logger.log(logging.DEBUG, cmd)
|
||||
proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
|
||||
stdout, stderr = proc.communicate()
|
||||
|
||||
if proc.returncode != 0:
|
||||
raise RuntimeError("Command '%s' returned non-zero exit status %d: %s" % (cmd, proc.returncode, stdout))
|
||||
|
||||
|
||||
class SecurityConfig(TemplateRenderer):
|
||||
|
||||
PLAINTEXT = 'PLAINTEXT'
|
||||
SSL = 'SSL'
|
||||
SASL_PLAINTEXT = 'SASL_PLAINTEXT'
|
||||
SASL_SSL = 'SASL_SSL'
|
||||
SASL_MECHANISM_GSSAPI = 'GSSAPI'
|
||||
SASL_MECHANISM_PLAIN = 'PLAIN'
|
||||
SASL_MECHANISM_SCRAM_SHA_256 = 'SCRAM-SHA-256'
|
||||
SASL_MECHANISM_SCRAM_SHA_512 = 'SCRAM-SHA-512'
|
||||
SCRAM_CLIENT_USER = "kafka-client"
|
||||
SCRAM_CLIENT_PASSWORD = "client-secret"
|
||||
SCRAM_BROKER_USER = "kafka-broker"
|
||||
SCRAM_BROKER_PASSWORD = "broker-secret"
|
||||
CONFIG_DIR = "/mnt/security"
|
||||
KEYSTORE_PATH = "/mnt/security/test.keystore.jks"
|
||||
TRUSTSTORE_PATH = "/mnt/security/test.truststore.jks"
|
||||
ZK_CLIENT_TLS_ENCRYPT_ONLY_CONFIG_PATH = "/mnt/security/zk_client_tls_encrypt_only_config.properties"
|
||||
ZK_CLIENT_MUTUAL_AUTH_CONFIG_PATH = "/mnt/security/zk_client_mutual_auth_config.properties"
|
||||
JAAS_CONF_PATH = "/mnt/security/jaas.conf"
|
||||
KRB5CONF_PATH = "/mnt/security/krb5.conf"
|
||||
KEYTAB_PATH = "/mnt/security/keytab"
|
||||
|
||||
# This is initialized only when the first instance of SecurityConfig is created
|
||||
ssl_stores = None
|
||||
|
||||
def __init__(self, context, security_protocol=None, interbroker_security_protocol=None,
|
||||
client_sasl_mechanism=SASL_MECHANISM_GSSAPI, interbroker_sasl_mechanism=SASL_MECHANISM_GSSAPI,
|
||||
zk_sasl=False, zk_tls=False, template_props="", static_jaas_conf=True, jaas_override_variables=None,
|
||||
listener_security_config=ListenerSecurityConfig()):
|
||||
"""
|
||||
Initialize the security properties for the node and copy
|
||||
keystore and truststore to the remote node if the transport protocol
|
||||
is SSL. If security_protocol is None, the protocol specified in the
|
||||
template properties file is used. If no protocol is specified in the
|
||||
template properties either, PLAINTEXT is used as default.
|
||||
"""
|
||||
|
||||
self.context = context
|
||||
if not SecurityConfig.ssl_stores:
|
||||
# This generates keystore/truststore files in a local scratch directory which gets
|
||||
# automatically destroyed after the test is run
|
||||
# Creating within the scratch directory allows us to run tests in parallel without fear of collision
|
||||
SecurityConfig.ssl_stores = SslStores(context.local_scratch_dir, context.logger)
|
||||
SecurityConfig.ssl_stores.generate_ca()
|
||||
SecurityConfig.ssl_stores.generate_truststore()
|
||||
|
||||
if security_protocol is None:
|
||||
security_protocol = self.get_property('security.protocol', template_props)
|
||||
if security_protocol is None:
|
||||
security_protocol = SecurityConfig.PLAINTEXT
|
||||
elif security_protocol not in [SecurityConfig.PLAINTEXT, SecurityConfig.SSL, SecurityConfig.SASL_PLAINTEXT, SecurityConfig.SASL_SSL]:
|
||||
raise Exception("Invalid security.protocol in template properties: " + security_protocol)
|
||||
|
||||
if interbroker_security_protocol is None:
|
||||
interbroker_security_protocol = security_protocol
|
||||
self.interbroker_security_protocol = interbroker_security_protocol
|
||||
self.has_sasl = self.is_sasl(security_protocol) or self.is_sasl(interbroker_security_protocol) or zk_sasl
|
||||
self.has_ssl = self.is_ssl(security_protocol) or self.is_ssl(interbroker_security_protocol) or zk_tls
|
||||
self.zk_sasl = zk_sasl
|
||||
self.zk_tls = zk_tls
|
||||
self.static_jaas_conf = static_jaas_conf
|
||||
self.listener_security_config = listener_security_config
|
||||
self.properties = {
|
||||
'security.protocol' : security_protocol,
|
||||
'ssl.keystore.location' : SecurityConfig.KEYSTORE_PATH,
|
||||
'ssl.keystore.password' : SecurityConfig.ssl_stores.keystore_passwd,
|
||||
'ssl.key.password' : SecurityConfig.ssl_stores.key_passwd,
|
||||
'ssl.truststore.location' : SecurityConfig.TRUSTSTORE_PATH,
|
||||
'ssl.truststore.password' : SecurityConfig.ssl_stores.truststore_passwd,
|
||||
'ssl.endpoint.identification.algorithm' : 'HTTPS',
|
||||
'sasl.mechanism' : client_sasl_mechanism,
|
||||
'sasl.mechanism.inter.broker.protocol' : interbroker_sasl_mechanism,
|
||||
'sasl.kerberos.service.name' : 'kafka'
|
||||
}
|
||||
self.properties.update(self.listener_security_config.client_listener_overrides)
|
||||
self.jaas_override_variables = jaas_override_variables or {}
|
||||
|
||||
def client_config(self, template_props="", node=None, jaas_override_variables=None):
|
||||
# If node is not specified, use static jaas config which will be created later.
|
||||
# Otherwise use static JAAS configuration files with SASL_SSL and sasl.jaas.config
|
||||
# property with SASL_PLAINTEXT so that both code paths are tested by existing tests.
|
||||
# Note that this is an artibtrary choice and it is possible to run all tests with
|
||||
# either static or dynamic jaas config files if required.
|
||||
static_jaas_conf = node is None or (self.has_sasl and self.has_ssl)
|
||||
return SecurityConfig(self.context, self.security_protocol,
|
||||
client_sasl_mechanism=self.client_sasl_mechanism,
|
||||
template_props=template_props,
|
||||
static_jaas_conf=static_jaas_conf,
|
||||
jaas_override_variables=jaas_override_variables,
|
||||
listener_security_config=self.listener_security_config)
|
||||
|
||||
def enable_security_protocol(self, security_protocol):
|
||||
self.has_sasl = self.has_sasl or self.is_sasl(security_protocol)
|
||||
self.has_ssl = self.has_ssl or self.is_ssl(security_protocol)
|
||||
|
||||
def setup_ssl(self, node):
|
||||
node.account.ssh("mkdir -p %s" % SecurityConfig.CONFIG_DIR, allow_fail=False)
|
||||
node.account.copy_to(SecurityConfig.ssl_stores.truststore_path, SecurityConfig.TRUSTSTORE_PATH)
|
||||
SecurityConfig.ssl_stores.generate_and_copy_keystore(node)
|
||||
|
||||
def setup_sasl(self, node):
|
||||
node.account.ssh("mkdir -p %s" % SecurityConfig.CONFIG_DIR, allow_fail=False)
|
||||
jaas_conf_file = "jaas.conf"
|
||||
java_version = node.account.ssh_capture("java -version")
|
||||
|
||||
jaas_conf = None
|
||||
if 'sasl.jaas.config' not in self.properties:
|
||||
jaas_conf = self.render_jaas_config(
|
||||
jaas_conf_file,
|
||||
{
|
||||
'node': node,
|
||||
'is_ibm_jdk': any('IBM' in line for line in java_version),
|
||||
'SecurityConfig': SecurityConfig,
|
||||
'client_sasl_mechanism': self.client_sasl_mechanism,
|
||||
'enabled_sasl_mechanisms': self.enabled_sasl_mechanisms
|
||||
}
|
||||
)
|
||||
else:
|
||||
jaas_conf = self.properties['sasl.jaas.config']
|
||||
|
||||
if self.static_jaas_conf:
|
||||
node.account.create_file(SecurityConfig.JAAS_CONF_PATH, jaas_conf)
|
||||
elif 'sasl.jaas.config' not in self.properties:
|
||||
self.properties['sasl.jaas.config'] = jaas_conf.replace("\n", " \\\n")
|
||||
if self.has_sasl_kerberos:
|
||||
node.account.copy_to(MiniKdc.LOCAL_KEYTAB_FILE, SecurityConfig.KEYTAB_PATH)
|
||||
node.account.copy_to(MiniKdc.LOCAL_KRB5CONF_FILE, SecurityConfig.KRB5CONF_PATH)
|
||||
|
||||
def render_jaas_config(self, jaas_conf_file, config_variables):
|
||||
"""
|
||||
Renders the JAAS config file contents
|
||||
|
||||
:param jaas_conf_file: name of the JAAS config template file
|
||||
:param config_variables: dict of variables used in the template
|
||||
:return: the rendered template string
|
||||
"""
|
||||
variables = config_variables.copy()
|
||||
variables.update(self.jaas_override_variables) # override variables
|
||||
return self.render(jaas_conf_file, **variables)
|
||||
|
||||
def setup_node(self, node):
|
||||
if self.has_ssl:
|
||||
self.setup_ssl(node)
|
||||
|
||||
if self.has_sasl:
|
||||
self.setup_sasl(node)
|
||||
|
||||
def setup_credentials(self, node, path, zk_connect, broker):
|
||||
if broker:
|
||||
self.maybe_create_scram_credentials(node, zk_connect, path, self.interbroker_sasl_mechanism,
|
||||
SecurityConfig.SCRAM_BROKER_USER, SecurityConfig.SCRAM_BROKER_PASSWORD)
|
||||
else:
|
||||
self.maybe_create_scram_credentials(node, zk_connect, path, self.client_sasl_mechanism,
|
||||
SecurityConfig.SCRAM_CLIENT_USER, SecurityConfig.SCRAM_CLIENT_PASSWORD)
|
||||
|
||||
def maybe_create_scram_credentials(self, node, zk_connect, path, mechanism, user_name, password):
|
||||
if self.has_sasl and self.is_sasl_scram(mechanism):
|
||||
cmd = "%s --zookeeper %s --entity-name %s --entity-type users --alter --add-config %s=[password=%s]" % \
|
||||
(path.script("kafka-configs.sh", node), zk_connect,
|
||||
user_name, mechanism, password)
|
||||
node.account.ssh(cmd)
|
||||
|
||||
def clean_node(self, node):
|
||||
if self.security_protocol != SecurityConfig.PLAINTEXT:
|
||||
node.account.ssh("rm -rf %s" % SecurityConfig.CONFIG_DIR, allow_fail=False)
|
||||
|
||||
def get_property(self, prop_name, template_props=""):
|
||||
"""
|
||||
Get property value from the string representation of
|
||||
a properties file.
|
||||
"""
|
||||
value = None
|
||||
for line in template_props.split("\n"):
|
||||
items = line.split("=")
|
||||
if len(items) == 2 and items[0].strip() == prop_name:
|
||||
value = str(items[1].strip())
|
||||
return value
|
||||
|
||||
def is_ssl(self, security_protocol):
|
||||
return security_protocol == SecurityConfig.SSL or security_protocol == SecurityConfig.SASL_SSL
|
||||
|
||||
def is_sasl(self, security_protocol):
|
||||
return security_protocol == SecurityConfig.SASL_PLAINTEXT or security_protocol == SecurityConfig.SASL_SSL
|
||||
|
||||
def is_sasl_scram(self, sasl_mechanism):
|
||||
return sasl_mechanism == SecurityConfig.SASL_MECHANISM_SCRAM_SHA_256 or sasl_mechanism == SecurityConfig.SASL_MECHANISM_SCRAM_SHA_512
|
||||
|
||||
@property
|
||||
def security_protocol(self):
|
||||
return self.properties['security.protocol']
|
||||
|
||||
@property
|
||||
def client_sasl_mechanism(self):
|
||||
return self.properties['sasl.mechanism']
|
||||
|
||||
@property
|
||||
def interbroker_sasl_mechanism(self):
|
||||
return self.properties['sasl.mechanism.inter.broker.protocol']
|
||||
|
||||
@property
|
||||
def enabled_sasl_mechanisms(self):
|
||||
return set([self.client_sasl_mechanism, self.interbroker_sasl_mechanism])
|
||||
|
||||
@property
|
||||
def has_sasl_kerberos(self):
|
||||
return self.has_sasl and (SecurityConfig.SASL_MECHANISM_GSSAPI in self.enabled_sasl_mechanisms)
|
||||
|
||||
@property
|
||||
def kafka_opts(self):
|
||||
if self.has_sasl:
|
||||
if self.static_jaas_conf:
|
||||
return "\"-Djava.security.auth.login.config=%s -Djava.security.krb5.conf=%s\"" % (SecurityConfig.JAAS_CONF_PATH, SecurityConfig.KRB5CONF_PATH)
|
||||
else:
|
||||
return "\"-Djava.security.krb5.conf=%s\"" % SecurityConfig.KRB5CONF_PATH
|
||||
else:
|
||||
return ""
|
||||
|
||||
def props(self, prefix=''):
|
||||
"""
|
||||
Return properties as string with line separators, optionally with a prefix.
|
||||
This is used to append security config properties to
|
||||
a properties file.
|
||||
:param prefix: prefix to add to each property
|
||||
:return: a string containing line-separated properties
|
||||
"""
|
||||
if self.security_protocol == SecurityConfig.PLAINTEXT:
|
||||
return ""
|
||||
if self.has_sasl and not self.static_jaas_conf and 'sasl.jaas.config' not in self.properties:
|
||||
raise Exception("JAAS configuration property has not yet been initialized")
|
||||
config_lines = (prefix + key + "=" + value for key, value in self.properties.items())
|
||||
# Extra blank lines ensure this can be appended/prepended safely
|
||||
return "\n".join(itertools.chain([""], config_lines, [""]))
|
||||
|
||||
def __str__(self):
|
||||
"""
|
||||
Return properties as a string with line separators.
|
||||
"""
|
||||
return self.props()
|
||||
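
The `get_property` and `props` helpers above only parse and emit flat `key=value` text. Below is a minimal standalone sketch of that round trip; it mirrors the behaviour of those methods but is illustration code, not part of the kafkatest package.

```
# Standalone sketch of the key=value handling used by get_property()/props().

def get_property(prop_name, template_props=""):
    """Return the last value assigned to prop_name in a properties string."""
    value = None
    for line in template_props.split("\n"):
        items = line.split("=")
        if len(items) == 2 and items[0].strip() == prop_name:
            value = items[1].strip()
    return value

def render_props(properties, prefix=""):
    """Render a dict as line-separated key=value pairs, padded with blank lines."""
    lines = [""] + [prefix + key + "=" + str(value) for key, value in properties.items()] + [""]
    return "\n".join(lines)

if __name__ == "__main__":
    text = render_props({"security.protocol": "SASL_SSL", "sasl.mechanism": "GSSAPI"})
    assert get_property("sasl.mechanism", text) == "GSSAPI"
```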
108
tests/kafkatest/services/security/templates/jaas.conf
Normal file
108
tests/kafkatest/services/security/templates/jaas.conf
Normal file
@@ -0,0 +1,108 @@
|
||||
/**
|
||||
* Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE
|
||||
* file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file
|
||||
* to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the
|
||||
* License. You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
* an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
* specific language governing permissions and limitations under the License.
|
||||
*/
|
||||
|
||||
|
||||
{% if static_jaas_conf %}
|
||||
KafkaClient {
|
||||
{% endif %}
|
||||
{% if "GSSAPI" in client_sasl_mechanism %}
|
||||
{% if is_ibm_jdk %}
|
||||
com.ibm.security.auth.module.Krb5LoginModule required debug=false
|
||||
credsType=both
|
||||
useKeytab="file:/mnt/security/keytab"
|
||||
principal="client@EXAMPLE.COM";
|
||||
{% else %}
|
||||
com.sun.security.auth.module.Krb5LoginModule required debug=false
|
||||
doNotPrompt=true
|
||||
useKeyTab=true
|
||||
storeKey=true
|
||||
keyTab="/mnt/security/keytab"
|
||||
principal="client@EXAMPLE.COM";
|
||||
{% endif %}
|
||||
{% elif client_sasl_mechanism == "PLAIN" %}
|
||||
org.apache.kafka.common.security.plain.PlainLoginModule required
|
||||
username="client"
|
||||
password="client-secret";
|
||||
{% elif "SCRAM-SHA-256" in client_sasl_mechanism or "SCRAM-SHA-512" in client_sasl_mechanism %}
|
||||
org.apache.kafka.common.security.scram.ScramLoginModule required
|
||||
username="{{ SecurityConfig.SCRAM_CLIENT_USER }}"
|
||||
password="{{ SecurityConfig.SCRAM_CLIENT_PASSWORD }}";
|
||||
{% endif %}
|
||||
|
||||
{% if static_jaas_conf %}
|
||||
};
|
||||
|
||||
KafkaServer {
|
||||
{% if "GSSAPI" in enabled_sasl_mechanisms %}
|
||||
{% if is_ibm_jdk %}
|
||||
com.ibm.security.auth.module.Krb5LoginModule required debug=false
|
||||
credsType=both
|
||||
useKeytab="file:/mnt/security/keytab"
|
||||
principal="kafka/{{ node.account.hostname }}@EXAMPLE.COM";
|
||||
{% else %}
|
||||
com.sun.security.auth.module.Krb5LoginModule required debug=false
|
||||
doNotPrompt=true
|
||||
useKeyTab=true
|
||||
storeKey=true
|
||||
keyTab="/mnt/security/keytab"
|
||||
principal="kafka/{{ node.account.hostname }}@EXAMPLE.COM";
|
||||
{% endif %}
|
||||
{% endif %}
|
||||
{% if "PLAIN" in enabled_sasl_mechanisms %}
|
||||
org.apache.kafka.common.security.plain.PlainLoginModule required
|
||||
username="kafka"
|
||||
password="kafka-secret"
|
||||
user_client="client-secret"
|
||||
user_kafka="kafka-secret";
|
||||
{% endif %}
|
||||
{% if "SCRAM-SHA-256" in client_sasl_mechanism or "SCRAM-SHA-512" in client_sasl_mechanism %}
|
||||
org.apache.kafka.common.security.scram.ScramLoginModule required
|
||||
username="{{ SecurityConfig.SCRAM_BROKER_USER }}"
|
||||
password="{{ SecurityConfig.SCRAM_BROKER_PASSWORD }}";
|
||||
{% endif %}
|
||||
};
|
||||
|
||||
{% if zk_sasl %}
|
||||
Client {
|
||||
{% if is_ibm_jdk %}
|
||||
com.ibm.security.auth.module.Krb5LoginModule required debug=false
|
||||
credsType=both
|
||||
useKeytab="file:/mnt/security/keytab"
|
||||
principal="zkclient@EXAMPLE.COM";
|
||||
{% else %}
|
||||
com.sun.security.auth.module.Krb5LoginModule required
|
||||
useKeyTab=true
|
||||
keyTab="/mnt/security/keytab"
|
||||
storeKey=true
|
||||
useTicketCache=false
|
||||
principal="zkclient@EXAMPLE.COM";
|
||||
{% endif %}
|
||||
};
|
||||
|
||||
Server {
|
||||
{% if is_ibm_jdk %}
|
||||
com.ibm.security.auth.module.Krb5LoginModule required debug=false
|
||||
credsType=both
|
||||
useKeytab="file:/mnt/security/keytab"
|
||||
principal="zookeeper/{{ node.account.hostname }}@EXAMPLE.COM";
|
||||
{% else %}
|
||||
com.sun.security.auth.module.Krb5LoginModule required
|
||||
useKeyTab=true
|
||||
keyTab="/mnt/security/keytab"
|
||||
storeKey=true
|
||||
useTicketCache=false
|
||||
principal="zookeeper/{{ node.account.hostname }}@EXAMPLE.COM";
|
||||
{% endif %}
|
||||
};
|
||||
{% endif %}
|
||||
{% endif %}
|
||||
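
The template above branches on `client_sasl_mechanism`, `enabled_sasl_mechanisms`, `is_ibm_jdk` and the SCRAM credentials before the service writes the result to `jaas.conf`. As a hedged sketch, here is how a simplified stand-in for one of those branches renders with Jinja2 (the engine ducktape's `render()` is based on); the snippet and variable values are illustrative, not the full template.

```
# Hedged sketch: rendering a JAAS-style Jinja2 snippet similar to the template above.
from jinja2 import Template

SNIPPET = """
{% if "SCRAM-SHA-256" in client_sasl_mechanism or "SCRAM-SHA-512" in client_sasl_mechanism %}
org.apache.kafka.common.security.scram.ScramLoginModule required
    username="{{ scram_user }}"
    password="{{ scram_password }}";
{% endif %}
"""

print(Template(SNIPPET).render(client_sasl_mechanism="SCRAM-SHA-256",
                               scram_user="kafka-client",
                               scram_password="client-secret"))
```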
@@ -0,0 +1,17 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
kdc.bind.address=0.0.0.0
|
||||
|
||||
701
tests/kafkatest/services/streams.py
Normal file
701
tests/kafkatest/services/streams.py
Normal file
@@ -0,0 +1,701 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import os.path
|
||||
import signal
|
||||
import streams_property
|
||||
import consumer_property
|
||||
from ducktape.services.service import Service
|
||||
from ducktape.utils.util import wait_until
|
||||
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
|
||||
from kafkatest.services.kafka import KafkaConfig
|
||||
from kafkatest.services.monitor.jmx import JmxMixin
|
||||
from kafkatest.version import LATEST_0_10_0, LATEST_0_10_1
|
||||
|
||||
STATE_DIR = "state.dir"
|
||||
|
||||
class StreamsTestBaseService(KafkaPathResolverMixin, JmxMixin, Service):
|
||||
"""Base class for Streams Test services providing some common settings and functionality"""
|
||||
|
||||
PERSISTENT_ROOT = "/mnt/streams"
|
||||
|
||||
# The log file contains normal log4j logs written using a file appender. stdout and stderr are handled separately
|
||||
CONFIG_FILE = os.path.join(PERSISTENT_ROOT, "streams.properties")
|
||||
LOG_FILE = os.path.join(PERSISTENT_ROOT, "streams.log")
|
||||
STDOUT_FILE = os.path.join(PERSISTENT_ROOT, "streams.stdout")
|
||||
STDERR_FILE = os.path.join(PERSISTENT_ROOT, "streams.stderr")
|
||||
JMX_LOG_FILE = os.path.join(PERSISTENT_ROOT, "jmx_tool.log")
|
||||
JMX_ERR_FILE = os.path.join(PERSISTENT_ROOT, "jmx_tool.err.log")
|
||||
LOG4J_CONFIG_FILE = os.path.join(PERSISTENT_ROOT, "tools-log4j.properties")
|
||||
PID_FILE = os.path.join(PERSISTENT_ROOT, "streams.pid")
|
||||
|
||||
CLEAN_NODE_ENABLED = True
|
||||
|
||||
logs = {
|
||||
"streams_config": {
|
||||
"path": CONFIG_FILE,
|
||||
"collect_default": True},
|
||||
"streams_config.1": {
|
||||
"path": CONFIG_FILE + ".1",
|
||||
"collect_default": True},
|
||||
"streams_config.0-1": {
|
||||
"path": CONFIG_FILE + ".0-1",
|
||||
"collect_default": True},
|
||||
"streams_config.1-1": {
|
||||
"path": CONFIG_FILE + ".1-1",
|
||||
"collect_default": True},
|
||||
"streams_log": {
|
||||
"path": LOG_FILE,
|
||||
"collect_default": True},
|
||||
"streams_stdout": {
|
||||
"path": STDOUT_FILE,
|
||||
"collect_default": True},
|
||||
"streams_stderr": {
|
||||
"path": STDERR_FILE,
|
||||
"collect_default": True},
|
||||
"streams_log.1": {
|
||||
"path": LOG_FILE + ".1",
|
||||
"collect_default": True},
|
||||
"streams_stdout.1": {
|
||||
"path": STDOUT_FILE + ".1",
|
||||
"collect_default": True},
|
||||
"streams_stderr.1": {
|
||||
"path": STDERR_FILE + ".1",
|
||||
"collect_default": True},
|
||||
"streams_log.2": {
|
||||
"path": LOG_FILE + ".2",
|
||||
"collect_default": True},
|
||||
"streams_stdout.2": {
|
||||
"path": STDOUT_FILE + ".2",
|
||||
"collect_default": True},
|
||||
"streams_stderr.2": {
|
||||
"path": STDERR_FILE + ".2",
|
||||
"collect_default": True},
|
||||
"streams_log.3": {
|
||||
"path": LOG_FILE + ".3",
|
||||
"collect_default": True},
|
||||
"streams_stdout.3": {
|
||||
"path": STDOUT_FILE + ".3",
|
||||
"collect_default": True},
|
||||
"streams_stderr.3": {
|
||||
"path": STDERR_FILE + ".3",
|
||||
"collect_default": True},
|
||||
"streams_log.0-1": {
|
||||
"path": LOG_FILE + ".0-1",
|
||||
"collect_default": True},
|
||||
"streams_stdout.0-1": {
|
||||
"path": STDOUT_FILE + ".0-1",
|
||||
"collect_default": True},
|
||||
"streams_stderr.0-1": {
|
||||
"path": STDERR_FILE + ".0-1",
|
||||
"collect_default": True},
|
||||
"streams_log.0-2": {
|
||||
"path": LOG_FILE + ".0-2",
|
||||
"collect_default": True},
|
||||
"streams_stdout.0-2": {
|
||||
"path": STDOUT_FILE + ".0-2",
|
||||
"collect_default": True},
|
||||
"streams_stderr.0-2": {
|
||||
"path": STDERR_FILE + ".0-2",
|
||||
"collect_default": True},
|
||||
"streams_log.0-3": {
|
||||
"path": LOG_FILE + ".0-3",
|
||||
"collect_default": True},
|
||||
"streams_stdout.0-3": {
|
||||
"path": STDOUT_FILE + ".0-3",
|
||||
"collect_default": True},
|
||||
"streams_stderr.0-3": {
|
||||
"path": STDERR_FILE + ".0-3",
|
||||
"collect_default": True},
|
||||
"streams_log.0-4": {
|
||||
"path": LOG_FILE + ".0-4",
|
||||
"collect_default": True},
|
||||
"streams_stdout.0-4": {
|
||||
"path": STDOUT_FILE + ".0-4",
|
||||
"collect_default": True},
|
||||
"streams_stderr.0-4": {
|
||||
"path": STDERR_FILE + ".0-4",
|
||||
"collect_default": True},
|
||||
"streams_log.0-5": {
|
||||
"path": LOG_FILE + ".0-5",
|
||||
"collect_default": True},
|
||||
"streams_stdout.0-5": {
|
||||
"path": STDOUT_FILE + ".0-5",
|
||||
"collect_default": True},
|
||||
"streams_stderr.0-5": {
|
||||
"path": STDERR_FILE + ".0-5",
|
||||
"collect_default": True},
|
||||
"streams_log.0-6": {
|
||||
"path": LOG_FILE + ".0-6",
|
||||
"collect_default": True},
|
||||
"streams_stdout.0-6": {
|
||||
"path": STDOUT_FILE + ".0-6",
|
||||
"collect_default": True},
|
||||
"streams_stderr.0-6": {
|
||||
"path": STDERR_FILE + ".0-6",
|
||||
"collect_default": True},
|
||||
"streams_log.1-1": {
|
||||
"path": LOG_FILE + ".1-1",
|
||||
"collect_default": True},
|
||||
"streams_stdout.1-1": {
|
||||
"path": STDOUT_FILE + ".1-1",
|
||||
"collect_default": True},
|
||||
"streams_stderr.1-1": {
|
||||
"path": STDERR_FILE + ".1-1",
|
||||
"collect_default": True},
|
||||
"streams_log.1-2": {
|
||||
"path": LOG_FILE + ".1-2",
|
||||
"collect_default": True},
|
||||
"streams_stdout.1-2": {
|
||||
"path": STDOUT_FILE + ".1-2",
|
||||
"collect_default": True},
|
||||
"streams_stderr.1-2": {
|
||||
"path": STDERR_FILE + ".1-2",
|
||||
"collect_default": True},
|
||||
"streams_log.1-3": {
|
||||
"path": LOG_FILE + ".1-3",
|
||||
"collect_default": True},
|
||||
"streams_stdout.1-3": {
|
||||
"path": STDOUT_FILE + ".1-3",
|
||||
"collect_default": True},
|
||||
"streams_stderr.1-3": {
|
||||
"path": STDERR_FILE + ".1-3",
|
||||
"collect_default": True},
|
||||
"streams_log.1-4": {
|
||||
"path": LOG_FILE + ".1-4",
|
||||
"collect_default": True},
|
||||
"streams_stdout.1-4": {
|
||||
"path": STDOUT_FILE + ".1-4",
|
||||
"collect_default": True},
|
||||
"streams_stderr.1-4": {
|
||||
"path": STDERR_FILE + ".1-4",
|
||||
"collect_default": True},
|
||||
"streams_log.1-5": {
|
||||
"path": LOG_FILE + ".1-5",
|
||||
"collect_default": True},
|
||||
"streams_stdout.1-5": {
|
||||
"path": STDOUT_FILE + ".1-5",
|
||||
"collect_default": True},
|
||||
"streams_stderr.1-5": {
|
||||
"path": STDERR_FILE + ".1-5",
|
||||
"collect_default": True},
|
||||
"streams_log.1-6": {
|
||||
"path": LOG_FILE + ".1-6",
|
||||
"collect_default": True},
|
||||
"streams_stdout.1-6": {
|
||||
"path": STDOUT_FILE + ".1-6",
|
||||
"collect_default": True},
|
||||
"streams_stderr.1-6": {
|
||||
"path": STDERR_FILE + ".1-6",
|
||||
"collect_default": True},
|
||||
"jmx_log": {
|
||||
"path": JMX_LOG_FILE,
|
||||
"collect_default": True},
|
||||
"jmx_err": {
|
||||
"path": JMX_ERR_FILE,
|
||||
"collect_default": True},
|
||||
}
|
||||
|
||||
def __init__(self, test_context, kafka, streams_class_name, user_test_args1, user_test_args2=None, user_test_args3=None, user_test_args4=None):
|
||||
Service.__init__(self, test_context, num_nodes=1)
|
||||
self.kafka = kafka
|
||||
self.args = {'streams_class_name': streams_class_name,
|
||||
'user_test_args1': user_test_args1,
|
||||
'user_test_args2': user_test_args2,
|
||||
'user_test_args3': user_test_args3,
|
||||
'user_test_args4': user_test_args4}
|
||||
self.log_level = "DEBUG"
|
||||
|
||||
@property
|
||||
def node(self):
|
||||
return self.nodes[0]
|
||||
|
||||
def pids(self, node):
|
||||
try:
|
||||
pids = [pid for pid in node.account.ssh_capture("cat " + self.PID_FILE, callback=str)]
|
||||
return [int(pid) for pid in pids]
|
||||
except Exception as exception:
|
||||
self.logger.debug(str(exception))
|
||||
return []
|
||||
|
||||
def stop_nodes(self, clean_shutdown=True):
|
||||
for node in self.nodes:
|
||||
self.stop_node(node, clean_shutdown)
|
||||
|
||||
def stop_node(self, node, clean_shutdown=True):
|
||||
self.logger.info((clean_shutdown and "Cleanly" or "Forcibly") + " stopping Streams Test on " + str(node.account))
|
||||
pids = self.pids(node)
|
||||
sig = signal.SIGTERM if clean_shutdown else signal.SIGKILL
|
||||
|
||||
for pid in pids:
|
||||
node.account.signal(pid, sig, allow_fail=True)
|
||||
if clean_shutdown:
|
||||
for pid in pids:
|
||||
wait_until(lambda: not node.account.alive(pid), timeout_sec=120, err_msg="Streams Test process on " + str(node.account) + " took too long to exit")
|
||||
|
||||
node.account.ssh("rm -f " + self.PID_FILE, allow_fail=False)
|
||||
|
||||
def restart(self):
|
||||
# We don't want to do any clean up here, just restart the process.
|
||||
for node in self.nodes:
|
||||
self.logger.info("Restarting Kafka Streams on " + str(node.account))
|
||||
self.stop_node(node)
|
||||
self.start_node(node)
|
||||
|
||||
|
||||
def abortThenRestart(self):
|
||||
# We don't want to do any clean up here, just abort then restart the process. The running service is killed immediately.
|
||||
for node in self.nodes:
|
||||
self.logger.info("Aborting Kafka Streams on " + str(node.account))
|
||||
self.stop_node(node, False)
|
||||
self.logger.info("Restarting Kafka Streams on " + str(node.account))
|
||||
self.start_node(node)
|
||||
|
||||
def wait(self, timeout_sec=1440):
|
||||
for node in self.nodes:
|
||||
self.wait_node(node, timeout_sec)
|
||||
|
||||
def wait_node(self, node, timeout_sec=None):
|
||||
for pid in self.pids(node):
|
||||
wait_until(lambda: not node.account.alive(pid), timeout_sec=timeout_sec, err_msg="Streams Test process on " + str(node.account) + " took too long to exit")
|
||||
|
||||
def clean_node(self, node):
|
||||
node.account.kill_process("streams", clean_shutdown=False, allow_fail=True)
|
||||
if self.CLEAN_NODE_ENABLED:
|
||||
node.account.ssh("rm -rf " + self.PERSISTENT_ROOT, allow_fail=False)
|
||||
|
||||
def start_cmd(self, node):
|
||||
args = self.args.copy()
|
||||
args['config_file'] = self.CONFIG_FILE
|
||||
args['stdout'] = self.STDOUT_FILE
|
||||
args['stderr'] = self.STDERR_FILE
|
||||
args['pidfile'] = self.PID_FILE
|
||||
args['log4j'] = self.LOG4J_CONFIG_FILE
|
||||
args['kafka_run_class'] = self.path.script("kafka-run-class.sh", node)
|
||||
|
||||
cmd = "( export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%(log4j)s\"; " \
|
||||
"INCLUDE_TEST_JARS=true %(kafka_run_class)s %(streams_class_name)s " \
|
||||
" %(config_file)s %(user_test_args1)s %(user_test_args2)s %(user_test_args3)s" \
|
||||
" %(user_test_args4)s & echo $! >&3 ) 1>> %(stdout)s 2>> %(stderr)s 3> %(pidfile)s" % args
|
||||
|
||||
self.logger.info("Executing streams cmd: " + cmd)
|
||||
|
||||
return cmd
|
||||
|
||||
def prop_file(self):
|
||||
cfg = KafkaConfig(**{streams_property.STATE_DIR: self.PERSISTENT_ROOT, streams_property.KAFKA_SERVERS: self.kafka.bootstrap_servers()})
|
||||
return cfg.render()
|
||||
|
||||
def start_node(self, node):
|
||||
node.account.mkdirs(self.PERSISTENT_ROOT)
|
||||
prop_file = self.prop_file()
|
||||
node.account.create_file(self.CONFIG_FILE, prop_file)
|
||||
node.account.create_file(self.LOG4J_CONFIG_FILE, self.render('tools_log4j.properties', log_file=self.LOG_FILE))
|
||||
|
||||
self.logger.info("Starting StreamsTest process on " + str(node.account))
|
||||
with node.account.monitor_log(self.STDOUT_FILE) as monitor:
|
||||
node.account.ssh(self.start_cmd(node))
|
||||
monitor.wait_until('StreamsTest instance started', timeout_sec=60, err_msg="Never saw message indicating StreamsTest finished startup on " + str(node.account))
|
||||
|
||||
if len(self.pids(node)) == 0:
|
||||
raise RuntimeError("No process ids recorded")
|
||||
|
||||
|
||||
class StreamsSmokeTestBaseService(StreamsTestBaseService):
|
||||
"""Base class for Streams Smoke Test services providing some common settings and functionality"""
|
||||
|
||||
def __init__(self, test_context, kafka, command, processing_guarantee = 'at_least_once', num_threads = 3, replication_factor = 3):
|
||||
super(StreamsSmokeTestBaseService, self).__init__(test_context,
|
||||
kafka,
|
||||
"org.apache.kafka.streams.tests.StreamsSmokeTest",
|
||||
command)
|
||||
self.NUM_THREADS = num_threads
|
||||
self.PROCESSING_GUARANTEE = processing_guarantee
|
||||
self.KAFKA_STREAMS_VERSION = ""
|
||||
self.UPGRADE_FROM = None
|
||||
self.REPLICATION_FACTOR = replication_factor
|
||||
|
||||
def set_version(self, kafka_streams_version):
|
||||
self.KAFKA_STREAMS_VERSION = kafka_streams_version
|
||||
|
||||
def set_upgrade_from(self, upgrade_from):
|
||||
self.UPGRADE_FROM = upgrade_from
|
||||
|
||||
def prop_file(self):
|
||||
properties = {streams_property.STATE_DIR: self.PERSISTENT_ROOT,
|
||||
streams_property.KAFKA_SERVERS: self.kafka.bootstrap_servers(),
|
||||
"processing.guarantee": self.PROCESSING_GUARANTEE,
|
||||
streams_property.NUM_THREADS: self.NUM_THREADS,
|
||||
"replication.factor": self.REPLICATION_FACTOR,
|
||||
"num.standby.replicas": 2,
|
||||
"buffered.records.per.partition": 100,
|
||||
"commit.interval.ms": 1000,
|
||||
"auto.offset.reset": "earliest",
|
||||
"acks": "all"}
|
||||
|
||||
if self.UPGRADE_FROM is not None:
|
||||
properties['upgrade.from'] = self.UPGRADE_FROM
|
||||
|
||||
cfg = KafkaConfig(**properties)
|
||||
return cfg.render()
|
||||
|
||||
def start_cmd(self, node):
|
||||
args = self.args.copy()
|
||||
args['config_file'] = self.CONFIG_FILE
|
||||
args['stdout'] = self.STDOUT_FILE
|
||||
args['stderr'] = self.STDERR_FILE
|
||||
args['pidfile'] = self.PID_FILE
|
||||
args['log4j'] = self.LOG4J_CONFIG_FILE
|
||||
args['version'] = self.KAFKA_STREAMS_VERSION
|
||||
args['kafka_run_class'] = self.path.script("kafka-run-class.sh", node)
|
||||
|
||||
cmd = "( export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%(log4j)s\";" \
|
||||
" INCLUDE_TEST_JARS=true UPGRADE_KAFKA_STREAMS_TEST_VERSION=%(version)s" \
|
||||
" bash -x %(kafka_run_class)s %(streams_class_name)s" \
|
||||
" %(config_file)s %(user_test_args1)s" \
|
||||
" & echo $! >&3 ) " \
|
||||
"1>> %(stdout)s 2>> %(stderr)s 3> %(pidfile)s" % args
|
||||
|
||||
self.logger.info("Executing streams cmd: " + cmd)
|
||||
|
||||
return cmd
|
||||
|
||||
class StreamsEosTestBaseService(StreamsTestBaseService):
|
||||
"""Base class for Streams EOS Test services providing some common settings and functionality"""
|
||||
|
||||
clean_node_enabled = True
|
||||
|
||||
def __init__(self, test_context, kafka, command):
|
||||
super(StreamsEosTestBaseService, self).__init__(test_context,
|
||||
kafka,
|
||||
"org.apache.kafka.streams.tests.StreamsEosTest",
|
||||
command)
|
||||
|
||||
def clean_node(self, node):
|
||||
if self.clean_node_enabled:
|
||||
super(StreamsEosTestBaseService, self).clean_node(node)
|
||||
|
||||
|
||||
class StreamsSmokeTestDriverService(StreamsSmokeTestBaseService):
|
||||
def __init__(self, test_context, kafka):
|
||||
super(StreamsSmokeTestDriverService, self).__init__(test_context, kafka, "run")
|
||||
self.DISABLE_AUTO_TERMINATE = ""
|
||||
|
||||
def disable_auto_terminate(self):
|
||||
self.DISABLE_AUTO_TERMINATE = "disableAutoTerminate"
|
||||
|
||||
def start_cmd(self, node):
|
||||
args = self.args.copy()
|
||||
args['config_file'] = self.CONFIG_FILE
|
||||
args['stdout'] = self.STDOUT_FILE
|
||||
args['stderr'] = self.STDERR_FILE
|
||||
args['pidfile'] = self.PID_FILE
|
||||
args['log4j'] = self.LOG4J_CONFIG_FILE
|
||||
args['disable_auto_terminate'] = self.DISABLE_AUTO_TERMINATE
|
||||
args['kafka_run_class'] = self.path.script("kafka-run-class.sh", node)
|
||||
|
||||
cmd = "( export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%(log4j)s\"; " \
|
||||
"INCLUDE_TEST_JARS=true %(kafka_run_class)s %(streams_class_name)s " \
|
||||
" %(config_file)s %(user_test_args1)s %(disable_auto_terminate)s" \
|
||||
" & echo $! >&3 ) 1>> %(stdout)s 2>> %(stderr)s 3> %(pidfile)s" % args
|
||||
|
||||
return cmd
|
||||
|
||||
class StreamsSmokeTestJobRunnerService(StreamsSmokeTestBaseService):
|
||||
def __init__(self, test_context, kafka, processing_guarantee = 'at_least_once', num_threads = 3, replication_factor = 3):
|
||||
super(StreamsSmokeTestJobRunnerService, self).__init__(test_context, kafka, "process", processing_guarantee = processing_guarantee, num_threads = num_threads, replication_factor = replication_factor)
|
||||
|
||||
class StreamsSmokeTestEOSJobRunnerService(StreamsSmokeTestBaseService):
|
||||
def __init__(self, test_context, kafka):
|
||||
super(StreamsSmokeTestEOSJobRunnerService, self).__init__(test_context, kafka, "process-eos")
|
||||
|
||||
|
||||
class StreamsEosTestDriverService(StreamsEosTestBaseService):
|
||||
def __init__(self, test_context, kafka):
|
||||
super(StreamsEosTestDriverService, self).__init__(test_context, kafka, "run")
|
||||
|
||||
|
||||
class StreamsEosTestJobRunnerService(StreamsEosTestBaseService):
|
||||
def __init__(self, test_context, kafka):
|
||||
super(StreamsEosTestJobRunnerService, self).__init__(test_context, kafka, "process")
|
||||
|
||||
class StreamsComplexEosTestJobRunnerService(StreamsEosTestBaseService):
|
||||
def __init__(self, test_context, kafka):
|
||||
super(StreamsComplexEosTestJobRunnerService, self).__init__(test_context, kafka, "process-complex")
|
||||
|
||||
class StreamsEosTestVerifyRunnerService(StreamsEosTestBaseService):
|
||||
def __init__(self, test_context, kafka):
|
||||
super(StreamsEosTestVerifyRunnerService, self).__init__(test_context, kafka, "verify")
|
||||
|
||||
|
||||
class StreamsComplexEosTestVerifyRunnerService(StreamsEosTestBaseService):
|
||||
def __init__(self, test_context, kafka):
|
||||
super(StreamsComplexEosTestVerifyRunnerService, self).__init__(test_context, kafka, "verify-complex")
|
||||
|
||||
|
||||
class StreamsSmokeTestShutdownDeadlockService(StreamsSmokeTestBaseService):
|
||||
def __init__(self, test_context, kafka):
|
||||
super(StreamsSmokeTestShutdownDeadlockService, self).__init__(test_context, kafka, "close-deadlock-test")
|
||||
|
||||
|
||||
class StreamsBrokerCompatibilityService(StreamsTestBaseService):
|
||||
def __init__(self, test_context, kafka, eosEnabled):
|
||||
super(StreamsBrokerCompatibilityService, self).__init__(test_context,
|
||||
kafka,
|
||||
"org.apache.kafka.streams.tests.BrokerCompatibilityTest",
|
||||
eosEnabled)
|
||||
|
||||
|
||||
class StreamsBrokerDownResilienceService(StreamsTestBaseService):
|
||||
def __init__(self, test_context, kafka, configs):
|
||||
super(StreamsBrokerDownResilienceService, self).__init__(test_context,
|
||||
kafka,
|
||||
"org.apache.kafka.streams.tests.StreamsBrokerDownResilienceTest",
|
||||
configs)
|
||||
|
||||
def start_cmd(self, node):
|
||||
args = self.args.copy()
|
||||
args['config_file'] = self.CONFIG_FILE
|
||||
args['stdout'] = self.STDOUT_FILE
|
||||
args['stderr'] = self.STDERR_FILE
|
||||
args['pidfile'] = self.PID_FILE
|
||||
args['log4j'] = self.LOG4J_CONFIG_FILE
|
||||
args['kafka_run_class'] = self.path.script("kafka-run-class.sh", node)
|
||||
|
||||
cmd = "( export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%(log4j)s\"; " \
|
||||
"INCLUDE_TEST_JARS=true %(kafka_run_class)s %(streams_class_name)s " \
|
||||
" %(config_file)s %(user_test_args1)s %(user_test_args2)s %(user_test_args3)s" \
|
||||
" %(user_test_args4)s & echo $! >&3 ) 1>> %(stdout)s 2>> %(stderr)s 3> %(pidfile)s" % args
|
||||
|
||||
self.logger.info("Executing: " + cmd)
|
||||
|
||||
return cmd
|
||||
|
||||
|
||||
class StreamsStandbyTaskService(StreamsTestBaseService):
|
||||
def __init__(self, test_context, kafka, configs):
|
||||
super(StreamsStandbyTaskService, self).__init__(test_context,
|
||||
kafka,
|
||||
"org.apache.kafka.streams.tests.StreamsStandByReplicaTest",
|
||||
configs)
|
||||
|
||||
|
||||
class StreamsOptimizedUpgradeTestService(StreamsTestBaseService):
|
||||
def __init__(self, test_context, kafka):
|
||||
super(StreamsOptimizedUpgradeTestService, self).__init__(test_context,
|
||||
kafka,
|
||||
"org.apache.kafka.streams.tests.StreamsOptimizedTest",
|
||||
"")
|
||||
self.OPTIMIZED_CONFIG = 'none'
|
||||
self.INPUT_TOPIC = None
|
||||
self.AGGREGATION_TOPIC = None
|
||||
self.REDUCE_TOPIC = None
|
||||
self.JOIN_TOPIC = None
|
||||
|
||||
def prop_file(self):
|
||||
properties = {streams_property.STATE_DIR: self.PERSISTENT_ROOT,
|
||||
streams_property.KAFKA_SERVERS: self.kafka.bootstrap_servers()}
|
||||
|
||||
properties['topology.optimization'] = self.OPTIMIZED_CONFIG
|
||||
properties['input.topic'] = self.INPUT_TOPIC
|
||||
properties['aggregation.topic'] = self.AGGREGATION_TOPIC
|
||||
properties['reduce.topic'] = self.REDUCE_TOPIC
|
||||
properties['join.topic'] = self.JOIN_TOPIC
|
||||
|
||||
cfg = KafkaConfig(**properties)
|
||||
return cfg.render()
|
||||
|
||||
|
||||
class StreamsUpgradeTestJobRunnerService(StreamsTestBaseService):
|
||||
def __init__(self, test_context, kafka):
|
||||
super(StreamsUpgradeTestJobRunnerService, self).__init__(test_context,
|
||||
kafka,
|
||||
"org.apache.kafka.streams.tests.StreamsUpgradeTest",
|
||||
"")
|
||||
self.UPGRADE_FROM = None
|
||||
self.UPGRADE_TO = None
|
||||
self.extra_properties = {}
|
||||
|
||||
def set_config(self, key, value):
|
||||
self.extra_properties[key] = value
|
||||
|
||||
def set_version(self, kafka_streams_version):
|
||||
self.KAFKA_STREAMS_VERSION = kafka_streams_version
|
||||
|
||||
def set_upgrade_from(self, upgrade_from):
|
||||
self.UPGRADE_FROM = upgrade_from
|
||||
|
||||
def set_upgrade_to(self, upgrade_to):
|
||||
self.UPGRADE_TO = upgrade_to
|
||||
|
||||
def prop_file(self):
|
||||
properties = self.extra_properties.copy()
|
||||
properties[streams_property.STATE_DIR] = self.PERSISTENT_ROOT
|
||||
properties[streams_property.KAFKA_SERVERS] = self.kafka.bootstrap_servers()
|
||||
|
||||
if self.UPGRADE_FROM is not None:
|
||||
properties['upgrade.from'] = self.UPGRADE_FROM
|
||||
if self.UPGRADE_TO == "future_version":
|
||||
properties['test.future.metadata'] = "any_value"
|
||||
|
||||
cfg = KafkaConfig(**properties)
|
||||
return cfg.render()
|
||||
|
||||
def start_cmd(self, node):
|
||||
args = self.args.copy()
|
||||
|
||||
if self.KAFKA_STREAMS_VERSION == str(LATEST_0_10_0) or self.KAFKA_STREAMS_VERSION == str(LATEST_0_10_1):
|
||||
args['zk'] = self.kafka.zk.connect_setting()
|
||||
else:
|
||||
args['zk'] = ""
|
||||
args['config_file'] = self.CONFIG_FILE
|
||||
args['stdout'] = self.STDOUT_FILE
|
||||
args['stderr'] = self.STDERR_FILE
|
||||
args['pidfile'] = self.PID_FILE
|
||||
args['log4j'] = self.LOG4J_CONFIG_FILE
|
||||
args['version'] = self.KAFKA_STREAMS_VERSION
|
||||
args['kafka_run_class'] = self.path.script("kafka-run-class.sh", node)
|
||||
|
||||
cmd = "( export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%(log4j)s\"; " \
|
||||
"INCLUDE_TEST_JARS=true UPGRADE_KAFKA_STREAMS_TEST_VERSION=%(version)s " \
|
||||
" %(kafka_run_class)s %(streams_class_name)s %(zk)s %(config_file)s " \
|
||||
" & echo $! >&3 ) 1>> %(stdout)s 2>> %(stderr)s 3> %(pidfile)s" % args
|
||||
|
||||
self.logger.info("Executing: " + cmd)
|
||||
|
||||
return cmd
|
||||
|
||||
|
||||
class StreamsNamedRepartitionTopicService(StreamsTestBaseService):
|
||||
def __init__(self, test_context, kafka):
|
||||
super(StreamsNamedRepartitionTopicService, self).__init__(test_context,
|
||||
kafka,
|
||||
"org.apache.kafka.streams.tests.StreamsNamedRepartitionTest",
|
||||
"")
|
||||
self.ADD_ADDITIONAL_OPS = 'false'
|
||||
self.INPUT_TOPIC = None
|
||||
self.AGGREGATION_TOPIC = None
|
||||
|
||||
def prop_file(self):
|
||||
properties = {streams_property.STATE_DIR: self.PERSISTENT_ROOT,
|
||||
streams_property.KAFKA_SERVERS: self.kafka.bootstrap_servers()}
|
||||
|
||||
properties['input.topic'] = self.INPUT_TOPIC
|
||||
properties['aggregation.topic'] = self.AGGREGATION_TOPIC
|
||||
properties['add.operations'] = self.ADD_ADDITIONAL_OPS
|
||||
|
||||
cfg = KafkaConfig(**properties)
|
||||
return cfg.render()
|
||||
|
||||
|
||||
class StaticMemberTestService(StreamsTestBaseService):
|
||||
def __init__(self, test_context, kafka, group_instance_id, num_threads):
|
||||
super(StaticMemberTestService, self).__init__(test_context,
|
||||
kafka,
|
||||
"org.apache.kafka.streams.tests.StaticMemberTestClient",
|
||||
"")
|
||||
self.INPUT_TOPIC = None
|
||||
self.GROUP_INSTANCE_ID = group_instance_id
|
||||
self.NUM_THREADS = num_threads
|
||||
def prop_file(self):
|
||||
properties = {streams_property.STATE_DIR: self.PERSISTENT_ROOT,
|
||||
streams_property.KAFKA_SERVERS: self.kafka.bootstrap_servers(),
|
||||
streams_property.NUM_THREADS: self.NUM_THREADS,
|
||||
consumer_property.GROUP_INSTANCE_ID: self.GROUP_INSTANCE_ID,
|
||||
consumer_property.SESSION_TIMEOUT_MS: 60000}
|
||||
|
||||
properties['input.topic'] = self.INPUT_TOPIC
|
||||
|
||||
cfg = KafkaConfig(**properties)
|
||||
return cfg.render()
|
||||
|
||||
|
||||
class CooperativeRebalanceUpgradeService(StreamsTestBaseService):
|
||||
def __init__(self, test_context, kafka):
|
||||
super(CooperativeRebalanceUpgradeService, self).__init__(test_context,
|
||||
kafka,
|
||||
"org.apache.kafka.streams.tests.StreamsUpgradeToCooperativeRebalanceTest",
|
||||
"")
|
||||
self.UPGRADE_FROM = None
|
||||
# these properties will be overridden in test
|
||||
self.SOURCE_TOPIC = None
|
||||
self.SINK_TOPIC = None
|
||||
self.TASK_DELIMITER = "#"
|
||||
self.REPORT_INTERVAL = None
|
||||
|
||||
self.standby_tasks = None
|
||||
self.active_tasks = None
|
||||
self.upgrade_phase = None
|
||||
|
||||
def set_tasks(self, task_string):
|
||||
label = "TASK-ASSIGNMENTS:"
|
||||
task_string_substr = task_string[len(label):]
|
||||
all_tasks = task_string_substr.split(self.TASK_DELIMITER)
|
||||
self.active_tasks = set(all_tasks[0].split(","))
|
||||
if len(all_tasks) > 1:
|
||||
self.standby_tasks = set(all_tasks[1].split(","))
|
||||
|
||||
def set_version(self, kafka_streams_version):
|
||||
self.KAFKA_STREAMS_VERSION = kafka_streams_version
|
||||
|
||||
def set_upgrade_phase(self, upgrade_phase):
|
||||
self.upgrade_phase = upgrade_phase
|
||||
|
||||
def start_cmd(self, node):
|
||||
args = self.args.copy()
|
||||
|
||||
if self.KAFKA_STREAMS_VERSION == str(LATEST_0_10_0) or self.KAFKA_STREAMS_VERSION == str(LATEST_0_10_1):
|
||||
args['zk'] = self.kafka.zk.connect_setting()
|
||||
else:
|
||||
args['zk'] = ""
|
||||
args['config_file'] = self.CONFIG_FILE
|
||||
args['stdout'] = self.STDOUT_FILE
|
||||
args['stderr'] = self.STDERR_FILE
|
||||
args['pidfile'] = self.PID_FILE
|
||||
args['log4j'] = self.LOG4J_CONFIG_FILE
|
||||
args['version'] = self.KAFKA_STREAMS_VERSION
|
||||
args['kafka_run_class'] = self.path.script("kafka-run-class.sh", node)
|
||||
|
||||
cmd = "( export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%(log4j)s\"; " \
|
||||
"INCLUDE_TEST_JARS=true UPGRADE_KAFKA_STREAMS_TEST_VERSION=%(version)s " \
|
||||
" %(kafka_run_class)s %(streams_class_name)s %(zk)s %(config_file)s " \
|
||||
" & echo $! >&3 ) 1>> %(stdout)s 2>> %(stderr)s 3> %(pidfile)s" % args
|
||||
|
||||
self.logger.info("Executing: " + cmd)
|
||||
|
||||
return cmd
|
||||
|
||||
def prop_file(self):
|
||||
properties = {streams_property.STATE_DIR: self.PERSISTENT_ROOT,
|
||||
streams_property.KAFKA_SERVERS: self.kafka.bootstrap_servers()}
|
||||
|
||||
if self.UPGRADE_FROM is not None:
|
||||
properties['upgrade.from'] = self.UPGRADE_FROM
|
||||
else:
|
||||
try:
|
||||
del properties['upgrade.from']
|
||||
except KeyError:
|
||||
self.logger.info("Key 'upgrade.from' not there, better safe than sorry")
|
||||
|
||||
if self.upgrade_phase is not None:
|
||||
properties['upgrade.phase'] = self.upgrade_phase
|
||||
|
||||
properties['source.topic'] = self.SOURCE_TOPIC
|
||||
properties['sink.topic'] = self.SINK_TOPIC
|
||||
properties['task.delimiter'] = self.TASK_DELIMITER
|
||||
properties['report.interval'] = self.REPORT_INTERVAL
|
||||
|
||||
cfg = KafkaConfig(**properties)
|
||||
return cfg.render()
|
||||
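
The services in `streams.py` are meant to be driven from ducktape tests. The skeleton below is illustrative only: it assumes a working kafkatest environment and uses the smoke-test driver and job-runner services defined above; the test class name is hypothetical.

```
# Illustrative skeleton of a ducktape test wiring the Streams smoke-test services.
from ducktape.tests.test import Test
from kafkatest.services.zookeeper import ZookeeperService
from kafkatest.services.kafka import KafkaService
from kafkatest.services.streams import (StreamsSmokeTestDriverService,
                                        StreamsSmokeTestJobRunnerService)

class StreamsSmokeSketchTest(Test):
    def __init__(self, test_context):
        super(StreamsSmokeSketchTest, self).__init__(test_context)
        self.zk = ZookeeperService(test_context, num_nodes=1)
        self.kafka = KafkaService(test_context, num_nodes=3, zk=self.zk)

    def test_smoke(self):
        self.zk.start()
        self.kafka.start()
        driver = StreamsSmokeTestDriverService(self.test_context, self.kafka)
        processor = StreamsSmokeTestJobRunnerService(self.test_context, self.kafka)
        driver.start()      # produces the smoke-test input data
        processor.start()   # runs the Streams topology under test
        driver.wait()       # blocks until the driver process exits
        processor.stop()
```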
22
tests/kafkatest/services/streams_property.py
Normal file
22
tests/kafkatest/services/streams_property.py
Normal file
@@ -0,0 +1,22 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
"""
|
||||
Define Streams configuration property names here.
|
||||
"""
|
||||
|
||||
STATE_DIR = "state.dir"
|
||||
KAFKA_SERVERS = "bootstrap.servers"
|
||||
NUM_THREADS = "num.stream.threads"
|
||||
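
These constants name the Streams configuration keys that the services above stitch into a properties file via `KafkaConfig`. A tiny standalone sketch (constants copied inline rather than imported) of what the rendered output looks like:

```
# Standalone sketch: rendering the Streams property names into key=value lines.
STATE_DIR = "state.dir"
KAFKA_SERVERS = "bootstrap.servers"
NUM_THREADS = "num.stream.threads"

props = {STATE_DIR: "/mnt/streams",
         KAFKA_SERVERS: "worker1:9092,worker2:9092",
         NUM_THREADS: 3}

print("\n".join("%s=%s" % (key, value) for key, value in props.items()))
# state.dir=/mnt/streams
# bootstrap.servers=worker1:9092,worker2:9092
# num.stream.threads=3
```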
29
tests/kafkatest/services/templates/connect_log4j.properties
Normal file
29
tests/kafkatest/services/templates/connect_log4j.properties
Normal file
@@ -0,0 +1,29 @@
|
||||
##
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
##
|
||||
|
||||
# Define the root logger with appender file
|
||||
log4j.rootLogger = {{ log_level|default("INFO") }}, FILE
|
||||
|
||||
log4j.appender.FILE=org.apache.log4j.FileAppender
|
||||
log4j.appender.FILE.File={{ log_file }}
|
||||
log4j.appender.FILE.ImmediateFlush=true
|
||||
log4j.appender.FILE.Append=true
|
||||
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
|
||||
log4j.appender.FILE.layout.conversionPattern=[%d] %p %m (%c)%n
|
||||
|
||||
log4j.logger.org.apache.zookeeper=ERROR
|
||||
log4j.logger.org.reflections=ERROR
|
||||
@@ -0,0 +1,24 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
group.id={{ group_id|default('test-consumer-group') }}
|
||||
|
||||
{% if client_id is defined and client_id is not none %}
|
||||
client.id={{ client_id }}
|
||||
{% endif %}
|
||||
|
||||
{% if consumer_metadata_max_age_ms is defined and consumer_metadata_max_age_ms is not none %}
|
||||
metadata.max.age.ms={{ consumer_metadata_max_age_ms }}
|
||||
{% endif %}
|
||||
@@ -0,0 +1,27 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# see kafka.consumer.ConsumerConfig for more details
|
||||
|
||||
bootstrap.servers={{ source.bootstrap_servers(security_config.security_protocol) }}
|
||||
|
||||
{% if source_auto_offset_reset is defined and source_auto_offset_reset is not none %}
|
||||
auto.offset.reset={{ source_auto_offset_reset|default('latest') }}
|
||||
{% endif %}
|
||||
|
||||
group.id={{ group_id|default('test-consumer-group') }}
|
||||
|
||||
{% if partition_assignment_strategy is defined and partition_assignment_strategy is not none %}
|
||||
partition.assignment.strategy={{ partition_assignment_strategy }}
|
||||
{% endif %}
|
||||
@@ -0,0 +1,20 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
bootstrap.servers = {{ target.bootstrap_servers(security_config.security_protocol) }}
|
||||
|
||||
{% if producer_interceptor_classes is defined and producer_interceptor_classes is not none %}
|
||||
interceptor.classes={{ producer_interceptor_classes }}
|
||||
{% endif %}
|
||||
17
tests/kafkatest/services/templates/producer.properties
Normal file
17
tests/kafkatest/services/templates/producer.properties
Normal file
@@ -0,0 +1,17 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# see kafka.producer.ProducerConfig for more details
|
||||
|
||||
request.timeout.ms={{ request_timeout_ms }}
|
||||
31
tests/kafkatest/services/templates/tools_log4j.properties
Normal file
31
tests/kafkatest/services/templates/tools_log4j.properties
Normal file
@@ -0,0 +1,31 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# Define the root logger with appender file
|
||||
log4j.rootLogger = {{ log_level|default("INFO") }}, FILE
|
||||
|
||||
{% if loggers is defined %}
|
||||
{% for logger, log_level in loggers.items() %}
|
||||
log4j.logger.{{ logger }}={{ log_level }}
|
||||
{% endfor %}
|
||||
{% endif %}
|
||||
|
||||
log4j.appender.FILE=org.apache.log4j.FileAppender
|
||||
log4j.appender.FILE.File={{ log_file }}
|
||||
log4j.appender.FILE.ImmediateFlush=true
|
||||
# Set the append to true
|
||||
log4j.appender.FILE.Append=true
|
||||
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
|
||||
log4j.appender.FILE.layout.conversionPattern=[%d] %p %m (%c)%n
|
||||
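
The optional `loggers` block in this template lets a service raise the level of individual loggers when it renders `tools_log4j.properties`. A hedged Jinja2 sketch of how that loop expands (simplified snippet, illustrative values):

```
# Hedged sketch: expanding the optional `loggers` loop from the template above.
from jinja2 import Template

SNIPPET = """log4j.rootLogger = {{ log_level|default("INFO") }}, FILE
{% if loggers is defined %}{% for logger, log_level in loggers.items() %}
log4j.logger.{{ logger }}={{ log_level }}
{% endfor %}{% endif %}
log4j.appender.FILE.File={{ log_file }}"""

print(Template(SNIPPET).render(
    log_file="/mnt/streams/streams.log",
    loggers={"org.apache.kafka.clients.producer": "TRACE"}))
```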
40
tests/kafkatest/services/templates/zookeeper.properties
Normal file
40
tests/kafkatest/services/templates/zookeeper.properties
Normal file
@@ -0,0 +1,40 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
dataDir=/mnt/zookeeper/data
|
||||
{% if zk_client_port %}
|
||||
clientPort=2181
|
||||
{% endif %}
|
||||
{% if zk_client_secure_port %}
|
||||
secureClientPort=2182
|
||||
serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
|
||||
authProvider.x509=org.apache.zookeeper.server.auth.X509AuthenticationProvider
|
||||
ssl.keyStore.location=/mnt/security/test.keystore.jks
|
||||
ssl.keyStore.password=test-ks-passwd
|
||||
ssl.keyStore.type=JKS
|
||||
ssl.trustStore.location=/mnt/security/test.truststore.jks
|
||||
ssl.trustStore.password=test-ts-passwd
|
||||
ssl.trustStore.type=JKS
|
||||
{% if zk_tls_encrypt_only %}
|
||||
ssl.clientAuth=none
|
||||
{% endif %}
|
||||
{% endif %}
|
||||
maxClientCnxns=0
|
||||
initLimit=5
|
||||
syncLimit=2
|
||||
quorumListenOnAllIPs=true
|
||||
{% for node in nodes %}
|
||||
server.{{ loop.index }}={{ node.account.hostname }}:2888:3888
|
||||
{% endfor %}
|
||||
204
tests/kafkatest/services/transactional_message_copier.py
Normal file
204
tests/kafkatest/services/transactional_message_copier.py
Normal file
@@ -0,0 +1,204 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import os
|
||||
import json
|
||||
import signal
|
||||
|
||||
from ducktape.utils.util import wait_until
|
||||
from ducktape.services.background_thread import BackgroundThreadService
|
||||
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
|
||||
from ducktape.cluster.remoteaccount import RemoteCommandError
|
||||
|
||||
class TransactionalMessageCopier(KafkaPathResolverMixin, BackgroundThreadService):
|
||||
"""This service wraps org.apache.kafka.tools.TransactionalMessageCopier for
|
||||
use in system testing.
|
||||
"""
|
||||
PERSISTENT_ROOT = "/mnt/transactional_message_copier"
|
||||
STDOUT_CAPTURE = os.path.join(PERSISTENT_ROOT, "transactional_message_copier.stdout")
|
||||
STDERR_CAPTURE = os.path.join(PERSISTENT_ROOT, "transactional_message_copier.stderr")
|
||||
LOG_DIR = os.path.join(PERSISTENT_ROOT, "logs")
|
||||
LOG_FILE = os.path.join(LOG_DIR, "transactional_message_copier.log")
|
||||
LOG4J_CONFIG = os.path.join(PERSISTENT_ROOT, "tools-log4j.properties")
|
||||
|
||||
logs = {
|
||||
"transactional_message_copier_stdout": {
|
||||
"path": STDOUT_CAPTURE,
|
||||
"collect_default": True},
|
||||
"transactional_message_copier_stderr": {
|
||||
"path": STDERR_CAPTURE,
|
||||
"collect_default": True},
|
||||
"transactional_message_copier_log": {
|
||||
"path": LOG_FILE,
|
||||
"collect_default": True}
|
||||
}
|
||||
|
||||
def __init__(self, context, num_nodes, kafka, transactional_id, consumer_group,
|
||||
input_topic, input_partition, output_topic, max_messages=-1,
|
||||
transaction_size=1000, transaction_timeout=None, enable_random_aborts=True,
|
||||
use_group_metadata=False, group_mode=False):
|
||||
super(TransactionalMessageCopier, self).__init__(context, num_nodes)
|
||||
self.kafka = kafka
|
||||
self.transactional_id = transactional_id
|
||||
self.consumer_group = consumer_group
|
||||
self.transaction_size = transaction_size
|
||||
self.transaction_timeout = transaction_timeout
|
||||
self.input_topic = input_topic
|
||||
self.input_partition = input_partition
|
||||
self.output_topic = output_topic
|
||||
self.max_messages = max_messages
|
||||
self.message_copy_finished = False
|
||||
self.consumed = -1
|
||||
self.remaining = -1
|
||||
self.stop_timeout_sec = 60
|
||||
self.enable_random_aborts = enable_random_aborts
|
||||
self.use_group_metadata = use_group_metadata
|
||||
self.group_mode = group_mode
|
||||
self.loggers = {
|
||||
"org.apache.kafka.clients.producer": "TRACE",
|
||||
"org.apache.kafka.clients.consumer": "TRACE"
|
||||
}
|
||||
|
||||
def _worker(self, idx, node):
|
||||
node.account.ssh("mkdir -p %s" % TransactionalMessageCopier.PERSISTENT_ROOT,
|
||||
allow_fail=False)
|
||||
# Create and upload log properties
|
||||
log_config = self.render('tools_log4j.properties',
|
||||
log_file=TransactionalMessageCopier.LOG_FILE)
|
||||
node.account.create_file(TransactionalMessageCopier.LOG4J_CONFIG, log_config)
|
||||
# Configure security
|
||||
self.security_config = self.kafka.security_config.client_config(node=node)
|
||||
self.security_config.setup_node(node)
|
||||
cmd = self.start_cmd(node, idx)
|
||||
self.logger.debug("TransactionalMessageCopier %d command: %s" % (idx, cmd))
|
||||
try:
|
||||
for line in node.account.ssh_capture(cmd):
|
||||
line = line.strip()
|
||||
data = self.try_parse_json(line)
|
||||
if data is not None:
|
||||
with self.lock:
|
||||
self.remaining = int(data["remaining"])
|
||||
self.consumed = int(data["consumed"])
|
||||
self.logger.info("%s: consumed %d, remaining %d" %
|
||||
(self.transactional_id, self.consumed, self.remaining))
|
||||
if "shutdown_complete" in data:
|
||||
if self.remaining == 0:
|
||||
# We are only finished if the number of remaining
|
||||
# messages at the time of shutdown is 0.
|
||||
#
|
||||
# Otherwise a clean shutdown would still print
|
||||
# a 'shutdown complete' message even though
|
||||
# there are unprocessed messages, causing
|
||||
# tests to fail.
|
||||
self.logger.info("%s : Finished message copy" % self.transactional_id)
|
||||
self.message_copy_finished = True
|
||||
else:
|
||||
self.logger.info("%s : Shut down without finishing message copy." %\
|
||||
self.transactional_id)
|
||||
except RemoteCommandError as e:
|
||||
self.logger.debug("Got exception while reading output from copier, \
|
||||
probably because it was SIGKILL'd (exit code 137): %s" % str(e))
|
||||
|
||||
def start_cmd(self, node, idx):
|
||||
cmd = "export LOG_DIR=%s;" % TransactionalMessageCopier.LOG_DIR
|
||||
cmd += " export KAFKA_OPTS=%s;" % self.security_config.kafka_opts
|
||||
cmd += " export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%s\"; " % TransactionalMessageCopier.LOG4J_CONFIG
|
||||
cmd += self.path.script("kafka-run-class.sh", node) + " org.apache.kafka.tools." + "TransactionalMessageCopier"
|
||||
cmd += " --broker-list %s" % self.kafka.bootstrap_servers(self.security_config.security_protocol)
|
||||
cmd += " --transactional-id %s" % self.transactional_id
|
||||
cmd += " --consumer-group %s" % self.consumer_group
|
||||
cmd += " --input-topic %s" % self.input_topic
|
||||
cmd += " --output-topic %s" % self.output_topic
|
||||
cmd += " --input-partition %s" % str(self.input_partition)
|
||||
cmd += " --transaction-size %s" % str(self.transaction_size)
|
||||
|
||||
if self.transaction_timeout is not None:
|
||||
cmd += " --transaction-timeout %s" % str(self.transaction_timeout)
|
||||
|
||||
if self.enable_random_aborts:
|
||||
cmd += " --enable-random-aborts"
|
||||
|
||||
if self.use_group_metadata:
|
||||
cmd += " --use-group-metadata"
|
||||
|
||||
if self.group_mode:
|
||||
cmd += " --group-mode"
|
||||
|
||||
if self.max_messages > 0:
|
||||
cmd += " --max-messages %s" % str(self.max_messages)
|
||||
cmd += " 2>> %s | tee -a %s &" % (TransactionalMessageCopier.STDERR_CAPTURE, TransactionalMessageCopier.STDOUT_CAPTURE)
|
||||
|
||||
return cmd
|
||||
|
||||
def clean_node(self, node):
|
||||
self.kill_node(node, clean_shutdown=False)
|
||||
node.account.ssh("rm -rf " + self.PERSISTENT_ROOT, allow_fail=False)
|
||||
self.security_config.clean_node(node)
|
||||
|
||||
def pids(self, node):
|
||||
try:
|
||||
cmd = "jps | grep -i TransactionalMessageCopier | awk '{print $1}'"
|
||||
pid_arr = [pid for pid in node.account.ssh_capture(cmd, allow_fail=True, callback=int)]
|
||||
return pid_arr
|
||||
except (RemoteCommandError, ValueError) as e:
|
||||
self.logger.error("Could not list pids: %s" % str(e))
|
||||
return []
|
||||
|
||||
def alive(self, node):
|
||||
return len(self.pids(node)) > 0
|
||||
|
||||
def kill_node(self, node, clean_shutdown=True):
|
||||
pids = self.pids(node)
|
||||
sig = signal.SIGTERM if clean_shutdown else signal.SIGKILL
|
||||
for pid in pids:
|
||||
node.account.signal(pid, sig)
|
||||
wait_until(lambda: len(self.pids(node)) == 0, timeout_sec=60, err_msg="Message Copier failed to stop")
|
||||
|
||||
def stop_node(self, node, clean_shutdown=True):
|
||||
self.kill_node(node, clean_shutdown)
|
||||
stopped = self.wait_node(node, timeout_sec=self.stop_timeout_sec)
|
||||
assert stopped, "Node %s: did not stop within the specified timeout of %s seconds" % \
|
||||
(str(node.account), str(self.stop_timeout_sec))
|
||||
|
||||
def restart(self, clean_shutdown):
|
||||
if self.is_done:
|
||||
return
|
||||
node = self.nodes[0]
|
||||
with self.lock:
|
||||
self.consumed = -1
|
||||
self.remaining = -1
|
||||
self.stop_node(node, clean_shutdown)
|
||||
self.start_node(node)
|
||||
|
||||
def try_parse_json(self, string):
|
||||
"""Try to parse a string as json. Return None if not parseable."""
|
||||
try:
|
||||
record = json.loads(string)
|
||||
return record
|
||||
except ValueError:
|
||||
self.logger.debug("Could not parse as json: %s" % str(string))
|
||||
return None
|
||||
|
||||
@property
|
||||
def is_done(self):
|
||||
return self.message_copy_finished
|
||||
|
||||
def progress_percent(self):
|
||||
with self.lock:
|
||||
if self.remaining < 0:
|
||||
return 0
|
||||
if self.consumed + self.remaining == 0:
|
||||
return 100
|
||||
return (float(self.consumed)/float(self.consumed + self.remaining)) * 100
|
||||
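
A test typically starts the copier as a background service and polls `progress_percent()` / `is_done` while the worker thread parses the copier's JSON status lines. The sketch below is illustrative only: `test_context` and `kafka` are assumed to come from the calling ducktape test, and the topic names are placeholders.

```
# Illustrative only: driving TransactionalMessageCopier from a test.
import time
from kafkatest.services.transactional_message_copier import TransactionalMessageCopier

def run_copier(test_context, kafka):
    copier = TransactionalMessageCopier(
        test_context, num_nodes=1, kafka=kafka,
        transactional_id="copier-0", consumer_group="copier-group",
        input_topic="input-topic", input_partition=0, output_topic="output-topic",
        max_messages=100000, transaction_size=500, enable_random_aborts=True)
    copier.start()
    # progress_percent() and is_done are updated by the background worker as the
    # copier prints JSON status lines on stdout.
    while not copier.is_done:
        copier.logger.info("copy progress: %.1f%%" % copier.progress_percent())
        time.sleep(5)
    copier.stop()
```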
14
tests/kafkatest/services/trogdor/__init__.py
Normal file
14
tests/kafkatest/services/trogdor/__init__.py
Normal file
@@ -0,0 +1,14 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
56
tests/kafkatest/services/trogdor/consume_bench_workload.py
Normal file
@@ -0,0 +1,56 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
from ducktape.services.service import Service
|
||||
from kafkatest.services.trogdor.task_spec import TaskSpec
|
||||
|
||||
|
||||
class ConsumeBenchWorkloadSpec(TaskSpec):
|
||||
def __init__(self, start_ms, duration_ms, consumer_node, bootstrap_servers,
|
||||
target_messages_per_sec, max_messages, active_topics,
|
||||
consumer_conf, common_client_conf, admin_client_conf, consumer_group=None, threads_per_worker=1):
|
||||
super(ConsumeBenchWorkloadSpec, self).__init__(start_ms, duration_ms)
|
||||
self.message["class"] = "org.apache.kafka.trogdor.workload.ConsumeBenchSpec"
|
||||
self.message["consumerNode"] = consumer_node
|
||||
self.message["bootstrapServers"] = bootstrap_servers
|
||||
self.message["targetMessagesPerSec"] = target_messages_per_sec
|
||||
self.message["maxMessages"] = max_messages
|
||||
self.message["consumerConf"] = consumer_conf
|
||||
self.message["adminClientConf"] = admin_client_conf
|
||||
self.message["commonClientConf"] = common_client_conf
|
||||
self.message["activeTopics"] = active_topics
|
||||
self.message["threadsPerWorker"] = threads_per_worker
|
||||
if consumer_group is not None:
|
||||
self.message["consumerGroup"] = consumer_group
|
||||
|
||||
|
||||
class ConsumeBenchWorkloadService(Service):
|
||||
def __init__(self, context, kafka):
|
||||
Service.__init__(self, context, num_nodes=1)
|
||||
self.bootstrap_servers = kafka.bootstrap_servers(validate=False)
|
||||
self.consumer_node = self.nodes[0].account.hostname
|
||||
|
||||
def free(self):
|
||||
Service.free(self)
|
||||
|
||||
def wait_node(self, node, timeout_sec=None):
|
||||
pass
|
||||
|
||||
def stop_node(self, node):
|
||||
pass
|
||||
|
||||
def clean_node(self, node):
|
||||
pass
|
||||
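ConsumeBenchWorkloadSpec and ConsumeBenchWorkloadService are normally driven from a ducktape test together with a TrogdorService. A minimal sketch, assuming the test already has `self.kafka` (a running KafkaService), `self.trogdor` (a started TrogdorService) and a hypothetical topic name:

```python
# Sketch only: submit a consume benchmark to Trogdor from a ducktape test.
# self.kafka, self.trogdor and the topic name are assumed fixtures.
from kafkatest.services.trogdor.consume_bench_workload import ConsumeBenchWorkloadService, ConsumeBenchWorkloadSpec
from kafkatest.services.trogdor.task_spec import TaskSpec

workload = ConsumeBenchWorkloadService(self.test_context, self.kafka)
workload.start()
spec = ConsumeBenchWorkloadSpec(0, TaskSpec.MAX_DURATION_MS,
                                workload.consumer_node,
                                workload.bootstrap_servers,
                                target_messages_per_sec=1000,
                                max_messages=10000,
                                active_topics=["consume_bench_topic[0-5]"],
                                consumer_conf={}, common_client_conf={},
                                admin_client_conf={})
task = self.trogdor.create_task("consume_bench", spec)
task.wait_for_done(timeout_sec=360)
```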
48
tests/kafkatest/services/trogdor/degraded_network_fault_spec.py
Normal file
@@ -0,0 +1,48 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from kafkatest.services.trogdor.task_spec import TaskSpec
|
||||
|
||||
|
||||
class DegradedNetworkFaultSpec(TaskSpec):
|
||||
"""
|
||||
The specification for a network degradation fault.
|
||||
|
||||
Degrades the network so that traffic on a subset of nodes has higher latency
|
||||
"""
|
||||
|
||||
def __init__(self, start_ms, duration_ms):
|
||||
"""
|
||||
Create a new DegradedNetworkFaultSpec.
|
||||
|
||||
:param start_ms: The start time, as described in task_spec.py
|
||||
:param duration_ms: The duration in milliseconds.
|
||||
"""
|
||||
super(DegradedNetworkFaultSpec, self).__init__(start_ms, duration_ms)
|
||||
self.message["class"] = "org.apache.kafka.trogdor.fault.DegradedNetworkFaultSpec"
|
||||
self.message["nodeSpecs"] = {}
|
||||
|
||||
def add_node_spec(self, node, networkDevice, latencyMs=0, rateLimitKbit=0):
|
||||
"""
|
||||
Add a node spec to this fault spec
|
||||
:param node: The node name which is to be degraded
|
||||
:param networkDevice: The network device name (e.g., eth0) to apply the degradation to
|
||||
:param latencyMs: Optional. How much latency to add to each packet
|
||||
:param rateLimitKbit: Optional. Maximum throughput in kilobits per second to allow
|
||||
:return:
|
||||
"""
|
||||
self.message["nodeSpecs"][node] = {
|
||||
"rateLimitKbit": rateLimitKbit, "latencyMs": latencyMs, "networkDevice": networkDevice
|
||||
}
|
||||
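For example, a hedged sketch of degrading the brokers' network interface (the device name and the `self.kafka`/`self.trogdor` fixtures are assumptions):

```python
# Sketch only: add 50 ms of latency and a 10 Mbit/s cap on eth0 of every
# broker for one minute, then let the fault expire on its own.
spec = DegradedNetworkFaultSpec(0, 60000)
for node in self.kafka.nodes:
    spec.add_node_spec(node.name, "eth0", latencyMs=50, rateLimitKbit=10000)
self.trogdor.create_task("degrade-brokers", spec).wait_for_done()
```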
46
tests/kafkatest/services/trogdor/files_unreadable_fault_spec.py
Normal file
@@ -0,0 +1,46 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from kafkatest.services.trogdor.task_spec import TaskSpec
|
||||
|
||||
|
||||
class FilesUnreadableFaultSpec(TaskSpec):
|
||||
"""
|
||||
The specification for a fault which makes files unreadable.
|
||||
"""
|
||||
|
||||
def __init__(self, start_ms, duration_ms, node_names, mount_path,
|
||||
prefix, error_code):
|
||||
"""
|
||||
Create a new FilesUnreadableFaultSpec.
|
||||
|
||||
:param start_ms: The start time, as described in task_spec.py
|
||||
:param duration_ms: The duration in milliseconds.
|
||||
:param node_names: The names of the node(s) to create the fault on.
|
||||
:param mount_path: The mount path.
|
||||
:param prefix: The prefix within the mount point to make unreadable.
|
||||
:param error_code: The error code to use.
|
||||
"""
|
||||
super(FilesUnreadableFaultSpec, self).__init__(start_ms, duration_ms)
|
||||
self.message["class"] = "org.apache.kafka.trogdor.fault.FilesUnreadableFaultSpec"
|
||||
self.message["nodeNames"] = node_names
|
||||
self.message["mountPath"] = mount_path
|
||||
self.message["prefix"] = prefix
|
||||
self.message["errorCode"] = error_code
|
||||
|
||||
self.kibosh_message = {}
|
||||
self.kibosh_message["type"] = "unreadable"
|
||||
self.kibosh_message["prefix"] = prefix
|
||||
self.kibosh_message["code"] = error_code
|
||||
156
tests/kafkatest/services/trogdor/kibosh.py
Normal file
@@ -0,0 +1,156 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import json
|
||||
import os.path
|
||||
|
||||
from ducktape.services.service import Service
|
||||
from ducktape.utils import util
|
||||
|
||||
|
||||
class KiboshService(Service):
|
||||
"""
|
||||
Kibosh is a fault-injecting FUSE filesystem.
|
||||
|
||||
Attributes:
|
||||
INSTALL_ROOT The path of where Kibosh is installed.
|
||||
BINARY_NAME The Kibosh binary name.
|
||||
BINARY_PATH The path to the kibosh binary.
|
||||
"""
|
||||
INSTALL_ROOT = "/opt/kibosh/build"
|
||||
BINARY_NAME = "kibosh"
|
||||
BINARY_PATH = os.path.join(INSTALL_ROOT, BINARY_NAME)
|
||||
|
||||
def __init__(self, context, nodes, target, mirror, persist="/mnt/kibosh"):
|
||||
"""
|
||||
Create a Kibosh service.
|
||||
|
||||
:param context: The TestContext object.
|
||||
:param nodes: The nodes to put the Kibosh FS on. Kibosh allocates no
|
||||
nodes of its own.
|
||||
:param target: The target directory, which Kibosh exports a view of.
|
||||
:param mirror: The mirror directory, where Kibosh injects faults.
|
||||
:param persist: Where the log files and pid files will be created.
|
||||
"""
|
||||
Service.__init__(self, context, num_nodes=0)
|
||||
if (len(nodes) == 0):
|
||||
raise RuntimeError("You must supply at least one node to run the service on.")
|
||||
for node in nodes:
|
||||
self.nodes.append(node)
|
||||
|
||||
self.target = target
|
||||
self.mirror = mirror
|
||||
self.persist = persist
|
||||
|
||||
self.control_path = os.path.join(self.mirror, "kibosh_control")
|
||||
self.pidfile_path = os.path.join(self.persist, "pidfile")
|
||||
self.stdout_stderr_path = os.path.join(self.persist, "kibosh-stdout-stderr.log")
|
||||
self.log_path = os.path.join(self.persist, "kibosh.log")
|
||||
self.logs = {
|
||||
"kibosh-stdout-stderr.log": {
|
||||
"path": self.stdout_stderr_path,
|
||||
"collect_default": True},
|
||||
"kibosh.log": {
|
||||
"path": self.log_path,
|
||||
"collect_default": True}
|
||||
}
|
||||
|
||||
def free(self):
|
||||
"""Clear the nodes list."""
|
||||
# Because the filesystem runs on nodes which have been allocated by other services, those nodes
|
||||
# are not deallocated here.
|
||||
self.nodes = []
|
||||
Service.free(self)
|
||||
|
||||
def kibosh_running(self, node):
|
||||
return 0 == node.account.ssh("test -e '%s'" % self.control_path, allow_fail=True)
|
||||
|
||||
def start_node(self, node):
|
||||
node.account.mkdirs(self.persist)
|
||||
cmd = "sudo -E "
|
||||
cmd += " %s" % KiboshService.BINARY_PATH
|
||||
cmd += " --target %s" % self.target
|
||||
cmd += " --pidfile %s" % self.pidfile_path
|
||||
cmd += " --log %s" % self.log_path
|
||||
cmd += " --control-mode 666"
|
||||
cmd += " --verbose"
|
||||
cmd += " %s" % self.mirror
|
||||
cmd += " &> %s" % self.stdout_stderr_path
|
||||
node.account.ssh(cmd)
|
||||
util.wait_until(lambda: self.kibosh_running(node), 20, backoff_sec=.1,
|
||||
err_msg="Timed out waiting for kibosh to start on %s" % node.account.hostname)
|
||||
|
||||
def pids(self, node):
|
||||
return [pid for pid in node.account.ssh_capture("test -e '%s' && test -e /proc/$(cat '%s')" %
|
||||
(self.pidfile_path, self.pidfile_path), allow_fail=True)]
|
||||
|
||||
def wait_node(self, node, timeout_sec=None):
|
||||
return len(self.pids(node)) == 0
|
||||
|
||||
def kibosh_process_running(self, node):
|
||||
pids = self.pids(node)
|
||||
if len(pids) == 0:
|
||||
return True
|
||||
return False
|
||||
|
||||
def stop_node(self, node):
|
||||
"""Halt kibosh process(es) on this node."""
|
||||
node.account.logger.debug("stop_node(%s): unmounting %s" % (node.name, self.mirror))
|
||||
node.account.ssh("sudo fusermount -u %s" % self.mirror, allow_fail=True)
|
||||
# Wait for the kibosh process to terminate.
|
||||
try:
|
||||
util.wait_until(lambda: self.kibosh_process_running(node), 20, backoff_sec=.1,
|
||||
err_msg="Timed out waiting for kibosh to stop on %s" % node.account.hostname)
|
||||
except TimeoutError:
|
||||
# If the process won't terminate, use kill -9 to shut it down.
|
||||
node.account.logger.debug("stop_node(%s): killing the kibosh process managing %s" % (node.name, self.mirror))
|
||||
node.account.ssh("sudo kill -9 %s" % (" ".join(self.pids(node))), allow_fail=True)
|
||||
node.account.ssh("sudo fusermount -u %s" % self.mirror)
|
||||
util.wait_until(lambda: self.kibosh_process_running(node), 20, backoff_sec=.1,
|
||||
err_msg="Timed out waiting for kibosh to stop on %s" % node.account.hostname)
|
||||
|
||||
def clean_node(self, node):
|
||||
"""Clean up persistent state on this node - e.g. service logs, configuration files etc."""
|
||||
self.stop_node(node)
|
||||
node.account.ssh("rm -rf -- %s" % self.persist)
|
||||
|
||||
def set_faults(self, node, specs):
|
||||
"""
|
||||
Set the currently active faults.
|
||||
|
||||
:param node: The node.
|
||||
:param specs: An array of FaultSpec objects describing the faults.
|
||||
"""
|
||||
if len(specs) == 0:
|
||||
obj_json = "{}"
|
||||
else:
|
||||
fault_array = [spec.kibosh_message for spec in specs]
|
||||
obj = { 'faults': fault_array }
|
||||
obj_json = json.dumps(obj)
|
||||
node.account.create_file(self.control_path, obj_json)
|
||||
|
||||
def get_fault_json(self, node):
|
||||
"""
|
||||
Return a JSON string which contains the currently active faults.
|
||||
|
||||
:param node: The node.
|
||||
|
||||
:returns: The fault JSON describing the faults.
|
||||
"""
|
||||
iter = node.account.ssh_capture("cat '%s'" % self.control_path)
|
||||
text = ""
|
||||
for line in iter:
|
||||
text = "%s%s" % (text, line.rstrip("\r\n"))
|
||||
return text
|
||||
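KiboshService and FilesUnreadableFaultSpec are designed to be used together: the spec carries both the Trogdor message and the `kibosh_message` that `set_faults` writes to the control file. A hedged sketch, with the mount paths and the `self.kafka` fixture assumed:

```python
# Sketch only: expose the broker data directory through Kibosh and make
# everything under the "/logs" prefix fail reads with EIO (error code 5).
kibosh = KiboshService(self.test_context, self.kafka.nodes,
                       target="/mnt/kafka-data", mirror="/mnt/kibosh-mirror")
kibosh.start()
fault = FilesUnreadableFaultSpec(0, TaskSpec.MAX_DURATION_MS,
                                 node_names=[n.name for n in self.kafka.nodes],
                                 mount_path="/mnt/kibosh-mirror",
                                 prefix="/logs", error_code=5)
for node in kibosh.nodes:
    kibosh.set_faults(node, [fault])
kibosh.get_fault_json(kibosh.nodes[0])  # returns the JSON written above
```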
39
tests/kafkatest/services/trogdor/network_partition_fault_spec.py
Normal file
@@ -0,0 +1,39 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from kafkatest.services.trogdor.task_spec import TaskSpec
|
||||
|
||||
|
||||
class NetworkPartitionFaultSpec(TaskSpec):
|
||||
"""
|
||||
The specification for a network partition fault.
|
||||
|
||||
Network partition faults fracture the network into different partitions
|
||||
that cannot communicate with each other.
|
||||
"""
|
||||
|
||||
def __init__(self, start_ms, duration_ms, partitions):
|
||||
"""
|
||||
Create a new NetworkPartitionFaultSpec.
|
||||
|
||||
:param start_ms: The start time, as described in task_spec.py
|
||||
:param duration_ms: The duration in milliseconds.
|
||||
:param partitions: An array of arrays describing the partitions.
|
||||
The inner arrays may contain either node names,
|
||||
or ClusterNode objects.
|
||||
"""
|
||||
super(NetworkPartitionFaultSpec, self).__init__(start_ms, duration_ms)
|
||||
self.message["class"] = "org.apache.kafka.trogdor.fault.NetworkPartitionFaultSpec"
|
||||
self.message["partitions"] = [TaskSpec.to_node_names(p) for p in partitions]
|
||||
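A hedged sketch of partitioning one broker away from the rest of the cluster (the `self.kafka` and `self.trogdor` fixtures are assumptions):

```python
# Sketch only: isolate the first broker from the other brokers for 30 seconds.
partitions = [self.kafka.nodes[:1], self.kafka.nodes[1:]]
spec = NetworkPartitionFaultSpec(0, 30000, partitions)
self.trogdor.create_task("isolate-first-broker", spec).wait_for_done()
```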
35
tests/kafkatest/services/trogdor/no_op_task_spec.py
Normal file
@@ -0,0 +1,35 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from kafkatest.services.trogdor.task_spec import TaskSpec
|
||||
|
||||
|
||||
class NoOpTaskSpec(TaskSpec):
|
||||
"""
|
||||
The specification for a no-op task.
|
||||
|
||||
No-op faults are used to test Trogdor. They don't do anything,
|
||||
but must be propagated to all Trogdor agents.
|
||||
"""
|
||||
|
||||
def __init__(self, start_ms, duration_ms):
|
||||
"""
|
||||
Create a new NoOpFault.
|
||||
|
||||
:param start_ms: The start time, as described in task_spec.py
|
||||
:param duration_ms: The duration in milliseconds.
|
||||
"""
|
||||
super(NoOpTaskSpec, self).__init__(start_ms, duration_ms)
|
||||
self.message["class"] = "org.apache.kafka.trogdor.task.NoOpTaskSpec";
|
||||
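Because the spec does nothing, it is a cheap way to verify that the coordinator and every agent are reachable before scheduling real faults; a sketch, assuming a started `self.trogdor`:

```python
# Sketch only: a quick end-to-end check of the Trogdor daemons.
spec = NoOpTaskSpec(0, 1000)
self.trogdor.create_task("trogdor-smoke-test", spec).wait_for_done(timeout_sec=30)
```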
38
tests/kafkatest/services/trogdor/process_stop_fault_spec.py
Normal file
@@ -0,0 +1,38 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from kafkatest.services.trogdor.task_spec import TaskSpec
|
||||
|
||||
|
||||
class ProcessStopFaultSpec(TaskSpec):
|
||||
"""
|
||||
The specification for a process stop fault.
|
||||
"""
|
||||
|
||||
def __init__(self, start_ms, duration_ms, nodes, java_process_name):
|
||||
"""
|
||||
Create a new ProcessStopFaultSpec.
|
||||
|
||||
:param start_ms: The start time, as described in task_spec.py
|
||||
:param duration_ms: The duration in milliseconds.
|
||||
:param nodes: An array describing the nodes to stop processes on. The array
|
||||
may contain either node names, or ClusterNode objects.
|
||||
:param java_process_name: The name of the java process to stop. This is the name which
|
||||
is reported by jps, etc., not the OS-level process name.
|
||||
"""
|
||||
super(ProcessStopFaultSpec, self).__init__(start_ms, duration_ms)
|
||||
self.message["class"] = "org.apache.kafka.trogdor.fault.ProcessStopFaultSpec"
|
||||
self.message["nodeNames"] = TaskSpec.to_node_names(nodes)
|
||||
self.message["javaProcessName"] = java_process_name
|
||||
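A hedged sketch of pausing the broker JVMs (the jps name "Kafka" and the test fixtures are assumptions):

```python
# Sketch only: SIGSTOP the broker JVMs for 20 seconds; Trogdor resumes
# the processes once the fault duration expires.
spec = ProcessStopFaultSpec(0, 20000, self.kafka.nodes, "Kafka")
self.trogdor.create_task("pause-brokers", spec).wait_for_done()
```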
56
tests/kafkatest/services/trogdor/produce_bench_workload.py
Normal file
@@ -0,0 +1,56 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
from ducktape.services.service import Service
|
||||
from kafkatest.services.trogdor.task_spec import TaskSpec
|
||||
|
||||
|
||||
class ProduceBenchWorkloadSpec(TaskSpec):
|
||||
def __init__(self, start_ms, duration_ms, producer_node, bootstrap_servers,
|
||||
target_messages_per_sec, max_messages, producer_conf, admin_client_conf,
|
||||
common_client_conf, inactive_topics, active_topics,
|
||||
transaction_generator=None):
|
||||
super(ProduceBenchWorkloadSpec, self).__init__(start_ms, duration_ms)
|
||||
self.message["class"] = "org.apache.kafka.trogdor.workload.ProduceBenchSpec"
|
||||
self.message["producerNode"] = producer_node
|
||||
self.message["bootstrapServers"] = bootstrap_servers
|
||||
self.message["targetMessagesPerSec"] = target_messages_per_sec
|
||||
self.message["maxMessages"] = max_messages
|
||||
self.message["producerConf"] = producer_conf
|
||||
self.message["transactionGenerator"] = transaction_generator
|
||||
self.message["adminClientConf"] = admin_client_conf
|
||||
self.message["commonClientConf"] = common_client_conf
|
||||
self.message["inactiveTopics"] = inactive_topics
|
||||
self.message["activeTopics"] = active_topics
|
||||
|
||||
|
||||
class ProduceBenchWorkloadService(Service):
|
||||
def __init__(self, context, kafka):
|
||||
Service.__init__(self, context, num_nodes=1)
|
||||
self.bootstrap_servers = kafka.bootstrap_servers(validate=False)
|
||||
self.producer_node = self.nodes[0].account.hostname
|
||||
|
||||
def free(self):
|
||||
Service.free(self)
|
||||
|
||||
def wait_node(self, node, timeout_sec=None):
|
||||
pass
|
||||
|
||||
def stop_node(self, node):
|
||||
pass
|
||||
|
||||
def clean_node(self, node):
|
||||
pass
|
||||
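This is used analogously to the consume benchmark above; a sketch with hypothetical topic ranges and an assumed `self.trogdor` fixture:

```python
# Sketch only: produce into eight hypothetical active topics, leaving two
# topics inactive so they are created but not written to.
workload = ProduceBenchWorkloadService(self.test_context, self.kafka)
workload.start()
topic_conf = {"numPartitions": 1, "replicationFactor": 3}
spec = ProduceBenchWorkloadSpec(0, TaskSpec.MAX_DURATION_MS,
                                workload.producer_node,
                                workload.bootstrap_servers,
                                target_messages_per_sec=1000,
                                max_messages=50000,
                                producer_conf={}, admin_client_conf={},
                                common_client_conf={},
                                inactive_topics={"produce_bench_topic[0-1]": topic_conf},
                                active_topics={"produce_bench_topic[2-9]": topic_conf})
self.trogdor.create_task("produce_bench", spec).wait_for_done()
```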
49
tests/kafkatest/services/trogdor/round_trip_workload.py
Normal file
@@ -0,0 +1,49 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
from ducktape.services.service import Service
|
||||
from kafkatest.services.trogdor.task_spec import TaskSpec
|
||||
|
||||
|
||||
class RoundTripWorkloadSpec(TaskSpec):
|
||||
def __init__(self, start_ms, duration_ms, client_node, bootstrap_servers,
|
||||
target_messages_per_sec, max_messages, active_topics):
|
||||
super(RoundTripWorkloadSpec, self).__init__(start_ms, duration_ms)
|
||||
self.message["class"] = "org.apache.kafka.trogdor.workload.RoundTripWorkloadSpec"
|
||||
self.message["clientNode"] = client_node
|
||||
self.message["bootstrapServers"] = bootstrap_servers
|
||||
self.message["targetMessagesPerSec"] = target_messages_per_sec
|
||||
self.message["maxMessages"] = max_messages
|
||||
self.message["activeTopics"] = active_topics
|
||||
|
||||
|
||||
class RoundTripWorkloadService(Service):
|
||||
def __init__(self, context, kafka):
|
||||
Service.__init__(self, context, num_nodes=1)
|
||||
self.bootstrap_servers = kafka.bootstrap_servers(validate=False)
|
||||
self.client_node = self.nodes[0].account.hostname
|
||||
|
||||
def free(self):
|
||||
Service.free(self)
|
||||
|
||||
def wait_node(self, node, timeout_sec=None):
|
||||
pass
|
||||
|
||||
def stop_node(self, node):
|
||||
pass
|
||||
|
||||
def clean_node(self, node):
|
||||
pass
|
||||
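A sketch of a round-trip run, where every produced message is verified to be consumable again; the topic layout and partition assignment below are assumptions:

```python
# Sketch only: produce max_messages and read them all back on the same node.
workload = RoundTripWorkloadService(self.test_context, self.kafka)
workload.start()
spec = RoundTripWorkloadSpec(0, TaskSpec.MAX_DURATION_MS,
                             workload.client_node, workload.bootstrap_servers,
                             target_messages_per_sec=500, max_messages=5000,
                             active_topics={"round_trip_topic": {
                                 "partitionAssignments": {"0": [1, 2, 3]}}})
self.trogdor.create_task("round_trip", spec).wait_for_done()
```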
54
tests/kafkatest/services/trogdor/task_spec.py
Normal file
@@ -0,0 +1,54 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import json
|
||||
|
||||
|
||||
class TaskSpec(object):
|
||||
"""
|
||||
The base class for a task specification.
|
||||
|
||||
MAX_DURATION_MS The longest duration we should use for a task specification.
|
||||
"""
|
||||
|
||||
MAX_DURATION_MS=10000000
|
||||
|
||||
def __init__(self, start_ms, duration_ms):
|
||||
"""
|
||||
Create a new task specification.
|
||||
|
||||
:param start_ms: The target start time in milliseconds since the epoch.
|
||||
:param duration_ms: The duration in milliseconds.
|
||||
"""
|
||||
self.message = {
|
||||
'startMs': start_ms,
|
||||
'durationMs': duration_ms
|
||||
}
|
||||
|
||||
@staticmethod
|
||||
def to_node_names(nodes):
|
||||
"""
|
||||
Convert an array of nodes or node names to an array of node names.
|
||||
"""
|
||||
node_names = []
|
||||
for obj in nodes:
|
||||
if isinstance(obj, basestring):
|
||||
node_names.append(obj)
|
||||
else:
|
||||
node_names.append(obj.name)
|
||||
return node_names
|
||||
|
||||
def __str__(self):
|
||||
return json.dumps(self.message)
|
||||
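New spec wrappers all follow the same pattern: subclass TaskSpec, point "class" at the matching server-side Java spec, and add any extra fields to `self.message`. The class and Java name below are purely hypothetical, shown only to illustrate the shape:

```python
# Sketch only: the shape of a new spec wrapper (the Java class name is made up).
class ExampleFaultSpec(TaskSpec):
    def __init__(self, start_ms, duration_ms, nodes):
        super(ExampleFaultSpec, self).__init__(start_ms, duration_ms)
        self.message["class"] = "org.apache.kafka.trogdor.example.ExampleFaultSpec"
        self.message["nodeNames"] = TaskSpec.to_node_names(nodes)
```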
23
tests/kafkatest/services/trogdor/templates/log4j.properties
Normal file
@@ -0,0 +1,23 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
log4j.rootLogger=DEBUG, mylogger
|
||||
log4j.logger.kafka=DEBUG
|
||||
log4j.logger.org.apache.kafka=DEBUG
|
||||
log4j.logger.org.eclipse=INFO
|
||||
log4j.appender.mylogger=org.apache.log4j.FileAppender
|
||||
log4j.appender.mylogger.File={{ log_path }}
|
||||
log4j.appender.mylogger.layout=org.apache.log4j.PatternLayout
|
||||
log4j.appender.mylogger.layout.ConversionPattern=[%d] %p %m (%c)%n
|
||||
354
tests/kafkatest/services/trogdor/trogdor.py
Normal file
@@ -0,0 +1,354 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import json
|
||||
import os.path
|
||||
import requests
|
||||
from requests.adapters import HTTPAdapter
|
||||
from requests.packages.urllib3 import Retry
|
||||
|
||||
from ducktape.services.service import Service
|
||||
from ducktape.utils.util import wait_until
|
||||
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
|
||||
|
||||
|
||||
class TrogdorService(KafkaPathResolverMixin, Service):
|
||||
"""
|
||||
A ducktape service for running the trogdor fault injection daemons.
|
||||
|
||||
Attributes:
|
||||
PERSISTENT_ROOT The root filesystem path to store service files under.
|
||||
COORDINATOR_STDOUT_STDERR The path where we store the coordinator's stdout/stderr output.
|
||||
AGENT_STDOUT_STDERR The path where we store the agent's stdout/stderr output.
|
||||
COORDINATOR_LOG The path where we store the coordinator's log4j output.
|
||||
AGENT_LOG The path where we store the agent's log4j output.
|
||||
AGENT_LOG4J_PROPERTIES The path to the agent log4j.properties file for log config.
|
||||
COORDINATOR_LOG4J_PROPERTIES The path to the coordinator log4j.properties file for log config.
|
||||
CONFIG_PATH The path to the trogdor configuration file.
|
||||
DEFAULT_AGENT_PORT The default port to use for trogdor_agent daemons.
|
||||
DEFAULT_COORDINATOR_PORT The default port to use for trogdor_coordinator daemons.
|
||||
REQUEST_TIMEOUT The request timeout in seconds to use for REST requests.
|
||||
REQUEST_HEADERS The request headers to use when communicating with trogdor.
|
||||
"""
|
||||
|
||||
PERSISTENT_ROOT="/mnt/trogdor"
|
||||
COORDINATOR_STDOUT_STDERR = os.path.join(PERSISTENT_ROOT, "trogdor-coordinator-stdout-stderr.log")
|
||||
AGENT_STDOUT_STDERR = os.path.join(PERSISTENT_ROOT, "trogdor-agent-stdout-stderr.log")
|
||||
COORDINATOR_LOG = os.path.join(PERSISTENT_ROOT, "trogdor-coordinator.log")
|
||||
AGENT_LOG = os.path.join(PERSISTENT_ROOT, "trogdor-agent.log")
|
||||
COORDINATOR_LOG4J_PROPERTIES = os.path.join(PERSISTENT_ROOT, "trogdor-coordinator-log4j.properties")
|
||||
AGENT_LOG4J_PROPERTIES = os.path.join(PERSISTENT_ROOT, "trogdor-agent-log4j.properties")
|
||||
CONFIG_PATH = os.path.join(PERSISTENT_ROOT, "trogdor.conf")
|
||||
DEFAULT_AGENT_PORT=8888
|
||||
DEFAULT_COORDINATOR_PORT=8889
|
||||
REQUEST_TIMEOUT=5
|
||||
REQUEST_HEADERS = {"Content-type": "application/json"}
|
||||
|
||||
logs = {
|
||||
"trogdor_coordinator_stdout_stderr": {
|
||||
"path": COORDINATOR_STDOUT_STDERR,
|
||||
"collect_default": True},
|
||||
"trogdor_agent_stdout_stderr": {
|
||||
"path": AGENT_STDOUT_STDERR,
|
||||
"collect_default": True},
|
||||
"trogdor_coordinator_log": {
|
||||
"path": COORDINATOR_LOG,
|
||||
"collect_default": True},
|
||||
"trogdor_agent_log": {
|
||||
"path": AGENT_LOG,
|
||||
"collect_default": True},
|
||||
}
|
||||
|
||||
|
||||
def __init__(self, context, agent_nodes=None, client_services=None,
|
||||
agent_port=DEFAULT_AGENT_PORT, coordinator_port=DEFAULT_COORDINATOR_PORT):
|
||||
"""
|
||||
Create a Trogdor service.
|
||||
|
||||
:param context: The test context.
|
||||
:param agent_nodes: The nodes to run the agents on.
|
||||
:param client_services: Services whose nodes we should run agents on.
|
||||
:param agent_port: The port to use for the trogdor_agent daemons.
|
||||
:param coordinator_port: The port to use for the trogdor_coordinator daemons.
|
||||
"""
|
||||
Service.__init__(self, context, num_nodes=1)
|
||||
self.coordinator_node = self.nodes[0]
|
||||
if client_services is not None:
|
||||
for client_service in client_services:
|
||||
for node in client_service.nodes:
|
||||
self.nodes.append(node)
|
||||
if agent_nodes is not None:
|
||||
for agent_node in agent_nodes:
|
||||
self.nodes.append(agent_node)
|
||||
if (len(self.nodes) == 1):
|
||||
raise RuntimeError("You must supply at least one agent node to run the service on.")
|
||||
self.agent_port = agent_port
|
||||
self.coordinator_port = coordinator_port
|
||||
|
||||
def free(self):
|
||||
# We only want to deallocate the coordinator node, not the agent nodes. So we
|
||||
# change self.nodes to include only the coordinator node, and then invoke
|
||||
# the base class' free method.
|
||||
if self.coordinator_node is not None:
|
||||
self.nodes = [self.coordinator_node]
|
||||
self.coordinator_node = None
|
||||
Service.free(self)
|
||||
|
||||
def _create_config_dict(self):
|
||||
"""
|
||||
Create a dictionary with the Trogdor configuration.
|
||||
|
||||
:return: The configuration dictionary.
|
||||
"""
|
||||
dict_nodes = {}
|
||||
for node in self.nodes:
|
||||
dict_nodes[node.name] = {
|
||||
"hostname": node.account.ssh_hostname,
|
||||
}
|
||||
if node.name == self.coordinator_node.name:
|
||||
dict_nodes[node.name]["trogdor.coordinator.port"] = self.coordinator_port
|
||||
else:
|
||||
dict_nodes[node.name]["trogdor.agent.port"] = self.agent_port
|
||||
|
||||
return {
|
||||
"platform": "org.apache.kafka.trogdor.basic.BasicPlatform",
|
||||
"nodes": dict_nodes,
|
||||
}
|
||||
|
||||
def start_node(self, node):
|
||||
node.account.mkdirs(TrogdorService.PERSISTENT_ROOT)
|
||||
|
||||
# Create the configuration file on the node.
|
||||
config_json = json.dumps(self._create_config_dict(), indent=2)
|
||||
self.logger.info("Creating configuration file %s with %s" % (TrogdorService.CONFIG_PATH, str))
|
||||
node.account.create_file(TrogdorService.CONFIG_PATH, config_json)
|
||||
|
||||
if self.is_coordinator(node):
|
||||
self._start_coordinator_node(node)
|
||||
else:
|
||||
self._start_agent_node(node)
|
||||
|
||||
def _start_coordinator_node(self, node):
|
||||
node.account.create_file(TrogdorService.COORDINATOR_LOG4J_PROPERTIES,
|
||||
self.render('log4j.properties',
|
||||
log_path=TrogdorService.COORDINATOR_LOG))
|
||||
self._start_trogdor_daemon("coordinator", TrogdorService.COORDINATOR_STDOUT_STDERR,
|
||||
TrogdorService.COORDINATOR_LOG4J_PROPERTIES,
|
||||
TrogdorService.COORDINATOR_LOG, node)
|
||||
self.logger.info("Started trogdor coordinator on %s." % node.name)
|
||||
|
||||
def _start_agent_node(self, node):
|
||||
node.account.create_file(TrogdorService.AGENT_LOG4J_PROPERTIES,
|
||||
self.render('log4j.properties',
|
||||
log_path=TrogdorService.AGENT_LOG))
|
||||
self._start_trogdor_daemon("agent", TrogdorService.AGENT_STDOUT_STDERR,
|
||||
TrogdorService.AGENT_LOG4J_PROPERTIES,
|
||||
TrogdorService.AGENT_LOG, node)
|
||||
self.logger.info("Started trogdor agent on %s." % node.name)
|
||||
|
||||
def _start_trogdor_daemon(self, daemon_name, stdout_stderr_capture_path,
|
||||
log4j_properties_path, log_path, node):
|
||||
cmd = "export KAFKA_LOG4J_OPTS='-Dlog4j.configuration=file:%s'; " % log4j_properties_path
|
||||
cmd += "%s %s --%s.config %s --node-name %s 1>> %s 2>> %s &" % \
|
||||
(self.path.script("trogdor.sh", node),
|
||||
daemon_name,
|
||||
daemon_name,
|
||||
TrogdorService.CONFIG_PATH,
|
||||
node.name,
|
||||
stdout_stderr_capture_path,
|
||||
stdout_stderr_capture_path)
|
||||
node.account.ssh(cmd)
|
||||
with node.account.monitor_log(log_path) as monitor:
|
||||
monitor.wait_until("Starting %s process." % daemon_name, timeout_sec=60, backoff_sec=.10,
|
||||
err_msg=("%s on %s didn't finish startup" % (daemon_name, node.name)))
|
||||
|
||||
def wait_node(self, node, timeout_sec=None):
|
||||
if self.is_coordinator(node):
|
||||
return len(node.account.java_pids(self.coordinator_class_name())) == 0
|
||||
else:
|
||||
return len(node.account.java_pids(self.agent_class_name())) == 0
|
||||
|
||||
def stop_node(self, node):
|
||||
"""Halt trogdor processes on this node."""
|
||||
if self.is_coordinator(node):
|
||||
node.account.kill_java_processes(self.coordinator_class_name())
|
||||
else:
|
||||
node.account.kill_java_processes(self.agent_class_name())
|
||||
|
||||
def clean_node(self, node):
|
||||
"""Clean up persistent state on this node - e.g. service logs, configuration files etc."""
|
||||
self.stop_node(node)
|
||||
node.account.ssh("rm -rf -- %s" % TrogdorService.PERSISTENT_ROOT)
|
||||
|
||||
def _coordinator_url(self, path):
|
||||
return "http://%s:%d/coordinator/%s" % \
|
||||
(self.coordinator_node.account.ssh_hostname, self.coordinator_port, path)
|
||||
|
||||
def request_session(self):
|
||||
"""
|
||||
Creates a new request session which will retry for a while.
|
||||
"""
|
||||
session = requests.Session()
|
||||
session.mount('http://',
|
||||
HTTPAdapter(max_retries=Retry(total=5, backoff_factor=0.3)))
|
||||
return session
|
||||
|
||||
def _coordinator_post(self, path, message):
|
||||
"""
|
||||
Make a POST request to the Trogdor coordinator.
|
||||
|
||||
:param path: The URL path to use.
|
||||
:param message: The message object to send.
|
||||
:return: The response as an object.
|
||||
"""
|
||||
url = self._coordinator_url(path)
|
||||
self.logger.info("POST %s %s" % (url, message))
|
||||
response = self.request_session().post(url, json=message,
|
||||
timeout=TrogdorService.REQUEST_TIMEOUT,
|
||||
headers=TrogdorService.REQUEST_HEADERS)
|
||||
response.raise_for_status()
|
||||
return response.json()
|
||||
|
||||
def _coordinator_put(self, path, message):
|
||||
"""
|
||||
Make a PUT request to the Trogdor coordinator.
|
||||
|
||||
:param path: The URL path to use.
|
||||
:param message: The message object to send.
|
||||
:return: The response as an object.
|
||||
"""
|
||||
url = self._coordinator_url(path)
|
||||
self.logger.info("PUT %s %s" % (url, message))
|
||||
response = self.request_session().put(url, json=message,
|
||||
timeout=TrogdorService.REQUEST_TIMEOUT,
|
||||
headers=TrogdorService.REQUEST_HEADERS)
|
||||
response.raise_for_status()
|
||||
return response.json()
|
||||
|
||||
def _coordinator_get(self, path, message):
|
||||
"""
|
||||
Make a GET request to the Trogdor coordinator.
|
||||
|
||||
:param path: The URL path to use.
|
||||
:param message: The message object to send.
|
||||
:return: The response as an object.
|
||||
"""
|
||||
url = self._coordinator_url(path)
|
||||
self.logger.info("GET %s %s" % (url, message))
|
||||
response = self.request_session().get(url, json=message,
|
||||
timeout=TrogdorService.REQUEST_TIMEOUT,
|
||||
headers=TrogdorService.REQUEST_HEADERS)
|
||||
response.raise_for_status()
|
||||
return response.json()
|
||||
|
||||
def create_task(self, id, spec):
|
||||
"""
|
||||
Create a new task.
|
||||
|
||||
:param id: The task id.
|
||||
:param spec: The task spec.
|
||||
"""
|
||||
self._coordinator_post("task/create", { "id": id, "spec": spec.message})
|
||||
return TrogdorTask(id, self)
|
||||
|
||||
def stop_task(self, id):
|
||||
"""
|
||||
Stop a task.
|
||||
|
||||
:param id: The task id.
|
||||
"""
|
||||
self._coordinator_put("task/stop", { "id": id })
|
||||
|
||||
def tasks(self):
|
||||
"""
|
||||
Get the tasks which are on the coordinator.
|
||||
|
||||
:returns: A map of task id strings to task state objects.
|
||||
Task state objects contain a 'spec' field with the spec
|
||||
and a 'state' field with the state.
|
||||
"""
|
||||
return self._coordinator_get("tasks", {})
|
||||
|
||||
def is_coordinator(self, node):
|
||||
return node == self.coordinator_node
|
||||
|
||||
def agent_class_name(self):
|
||||
return "org.apache.kafka.trogdor.agent.Agent"
|
||||
|
||||
def coordinator_class_name(self):
|
||||
return "org.apache.kafka.trogdor.coordinator.Coordinator"
|
||||
|
||||
class TrogdorTask(object):
|
||||
PENDING_STATE = "PENDING"
|
||||
RUNNING_STATE = "RUNNING"
|
||||
STOPPING_STATE = "STOPPING"
|
||||
DONE_STATE = "DONE"
|
||||
|
||||
def __init__(self, id, trogdor):
|
||||
self.id = id
|
||||
self.trogdor = trogdor
|
||||
|
||||
def task_state_or_error(self):
|
||||
task_state = self.trogdor.tasks()["tasks"].get(self.id)
|
||||
if task_state is None:
|
||||
raise RuntimeError("Coordinator did not know about %s." % self.id)
|
||||
error = task_state.get("error")
|
||||
if error is None or error == "":
|
||||
return task_state["state"], None
|
||||
else:
|
||||
return None, error
|
||||
|
||||
def done(self):
|
||||
"""
|
||||
Check if this task is done.
|
||||
|
||||
:raises RuntimeError: If the task encountered an error.
|
||||
:returns: True if the task is in DONE_STATE;
|
||||
False if it is in a different state.
|
||||
"""
|
||||
(task_state, error) = self.task_state_or_error()
|
||||
if task_state is not None:
|
||||
return task_state == TrogdorTask.DONE_STATE
|
||||
else:
|
||||
raise RuntimeError("Failed to gracefully stop %s: got task error: %s" % (self.id, error))
|
||||
|
||||
def running(self):
|
||||
"""
|
||||
Check if this task is running.
|
||||
|
||||
:raises RuntimeError: If the task encountered an error.
|
||||
:returns: True if the task is in RUNNING_STATE;
|
||||
False if it is in a different state.
|
||||
"""
|
||||
(task_state, error) = self.task_state_or_error()
|
||||
if task_state is not None:
|
||||
return task_state == TrogdorTask.RUNNING_STATE
|
||||
else:
|
||||
raise RuntimeError("Failed to start %s: got task error: %s" % (self.id, error))
|
||||
|
||||
def stop(self):
|
||||
"""
|
||||
Stop this task.
|
||||
|
||||
:raises RuntimeError: If the task encountered an error.
|
||||
"""
|
||||
if self.done():
|
||||
return
|
||||
self.trogdor.stop_task(self.id)
|
||||
|
||||
def wait_for_done(self, timeout_sec=360):
|
||||
wait_until(lambda: self.done(),
|
||||
timeout_sec=timeout_sec,
|
||||
err_msg="%s failed to finish in the expected amount of time." % self.id)
|
||||
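Putting the pieces together, a test typically starts Trogdor with agents co-located on the service nodes it wants to disturb and then submits specs through `create_task`. A hedged sketch, assuming `self.kafka` is a running KafkaService:

```python
# Sketch only: wire Trogdor agents onto the broker nodes and run one task.
from ducktape.utils.util import wait_until
from kafkatest.services.trogdor.no_op_task_spec import NoOpTaskSpec
from kafkatest.services.trogdor.trogdor import TrogdorService

trogdor = TrogdorService(self.test_context, client_services=[self.kafka])
trogdor.start()
task = trogdor.create_task("no-op", NoOpTaskSpec(0, 5000))
wait_until(lambda: task.running(), timeout_sec=30,
           err_msg="no-op task never reached RUNNING")
task.stop()
task.wait_for_done()
trogdor.stop()
```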
330
tests/kafkatest/services/verifiable_client.py
Normal file
@@ -0,0 +1,330 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from kafkatest.directory_layout.kafka_path import TOOLS_JAR_NAME, TOOLS_DEPENDANT_TEST_LIBS_JAR_NAME
|
||||
from kafkatest.version import DEV_BRANCH, LATEST_0_8_2
|
||||
from ducktape.cluster.remoteaccount import RemoteCommandError
|
||||
|
||||
import importlib
|
||||
import os
|
||||
import subprocess
|
||||
import signal
|
||||
|
||||
|
||||
"""This module abstracts the implementation of a verifiable client, allowing
|
||||
client developers to plug in their own client for all kafkatests that make
|
||||
use of either the VerifiableConsumer or VerifiableProducer classes.
|
||||
|
||||
A verifiable client class must implement exec_cmd() and pids().
|
||||
|
||||
This file provides:
|
||||
* VerifiableClientMixin class: to be used for creating new verifiable client classes
|
||||
* VerifiableClientJava class: the default Java verifiable clients
|
||||
* VerifiableClientApp class: uses global configuration to specify
|
||||
the command to execute and optional "pids" command, deploy script, etc.
|
||||
Config syntax (pass as --global <json_or_jsonfile>):
|
||||
{"Verifiable(Producer|Consumer|Client)": {
|
||||
"class": "kafkatest.services.verifiable_client.VerifiableClientApp",
|
||||
"exec_cmd": "/vagrant/x/myclient --some --standard --args",
|
||||
"pids": "pgrep -f ...", // optional
|
||||
"deploy": "/vagrant/x/mydeploy.sh", // optional
|
||||
"kill_signal": 2 // optional clean_shutdown kill signal (SIGINT in this case)
|
||||
}}
|
||||
* VerifiableClientDummy class: testing dummy
|
||||
|
||||
|
||||
|
||||
==============================
|
||||
Verifiable client requirements
|
||||
==============================
|
||||
|
||||
There are currently two verifiable client specifications:
|
||||
* VerifiableConsumer
|
||||
* VerifiableProducer
|
||||
|
||||
Common requirements for both:
|
||||
* One-way communication (client -> tests) through new-line delimited
|
||||
JSON objects on stdout (details below).
|
||||
* Log/debug to stderr
|
||||
|
||||
Common communication for both:
|
||||
* `{ "name": "startup_complete" }` - Client succesfully started
|
||||
* `{ "name": "shutdown_complete" }` - Client succesfully terminated (after receiving SIGINT/SIGTERM)
|
||||
|
||||
|
||||
==================
|
||||
VerifiableConsumer
|
||||
==================
|
||||
|
||||
Command line arguments:
|
||||
* `--group-id <group-id>`
|
||||
* `--topic <topic>`
|
||||
* `--broker-list <brokers>`
|
||||
* `--session-timeout <n>`
|
||||
* `--enable-autocommit`
|
||||
* `--max-messages <n>`
|
||||
* `--assignment-strategy <s>`
|
||||
* `--consumer.config <config-file>` - consumer config properties (typically empty)
|
||||
|
||||
Environment variables:
|
||||
* `LOG_DIR` - log output directory. Typically not needed if logs are written to stderr.
|
||||
* `KAFKA_OPTS` - Security config properties (Java client syntax)
|
||||
* `KAFKA_LOG4J_OPTS` - Java log4j options (can be ignored)
|
||||
|
||||
Client communication:
|
||||
* `{ "name": "offsets_committed", "success": bool, "error": "<errstr>", "offsets": [ { "topic": "<t>", "partition": <p>, "offset": <o> } ] }` - offset commit results, should be emitted for each committed offset. Emit prior to partitions_revoked.
|
||||
* `{ "name": "records_consumed", "partitions": [ { "topic": "<t>", "partition": <p>, "minOffset": <o>, "maxOffset": <o> } ], "count": <total_consumed> }` - per-partition delta stats from last records_consumed. Emit every 1000 messages, or 1s. Emit prior to partitions_assigned, partitions_revoked and offsets_committed.
|
||||
* `{ "name": "partitions_revoked", "partitions": [ { "topic": "<t>", "partition": <p> } ] }` - rebalance: revoked partitions
|
||||
* `{ "name": "partitions_assigned", "partitions": [ { "topic": "<t>", "partition": <p> } ] }` - rebalance: assigned partitions
|
||||
|
||||
|
||||
==================
|
||||
VerifiableProducer
|
||||
==================
|
||||
|
||||
Command line arguments:
|
||||
* `--topic <topic>`
|
||||
* `--broker-list <brokers>`
|
||||
* `--max-messages <n>`
|
||||
* `--throughput <msgs/s>`
|
||||
* `--producer.config <config-file>` - producer config properties (typically empty)
|
||||
|
||||
Environment variables:
|
||||
* `LOG_DIR` - log output directory. Typically not needed if logs are written to stderr.
|
||||
* `KAFKA_OPTS` - Security config properties (Java client syntax)
|
||||
* `KAFKA_LOG4J_OPTS` - Java log4j options (can be ignored)
|
||||
|
||||
Client communication:
|
||||
* `{ "name": "producer_send_error", "message": "<error msg>", "topic": "<t>", "key": "<msg key>", "value": "<msg value>" }` - emit on produce error.
|
||||
* `{ "name": "producer_send_success", "topic": "<t>", "partition": <p>, "offset": <o>, "key": "<msg key>", "value": "<msg value>" }` - emit on produce success.
|
||||
|
||||
|
||||
|
||||
===========
|
||||
Development
|
||||
===========
|
||||
|
||||
**Logs:**
|
||||
During development of kafkatest clients it is generally a good idea to
|
||||
enable collection of the client's stdout and stderr logs for troubleshooting.
|
||||
Do this by setting "collect_default" to True for verifiable_consumer_stdout
|
||||
and .._stderr in verifiable_consumer.py and verifiable_producer.py
|
||||
|
||||
|
||||
**Deployment:**
|
||||
There's currently no automatic way of deploying 3rd party kafkatest clients
|
||||
on the VM instance so this needs to be done (at least partially) manually for
|
||||
now.
|
||||
|
||||
One way to do this is logging in to a worker (`vagrant ssh worker1`), downloading
|
||||
and building the kafkatest client under /vagrant (which maps to the kafka root
|
||||
directory on the host and is shared with all VM instances).
|
||||
Also make sure to install any system-level dependencies on each instance.
|
||||
|
||||
Then use /vagrant/..../yourkafkatestclient as your run-time path since it will
|
||||
now be available on all instances.
|
||||
|
||||
The VerifiableClientApp automates the per-worker deployment with the optional
|
||||
"deploy": "/vagrant/../deploy_script.sh" globals configuration property, this
|
||||
script will be called on the VM just prior to executing the client.
|
||||
"""
|
||||
|
||||
def create_verifiable_client_implementation(context, parent):
|
||||
"""Factory for generating a verifiable client implementation class instance
|
||||
|
||||
:param parent: parent class instance, either VerifiableConsumer or VerifiableProducer
|
||||
|
||||
This will first check for a fully qualified client implementation class name
|
||||
in context.globals as "Verifiable<type>" where <type> is "Producer" or "Consumer",
|
||||
followed by "VerifiableClient" (which should implement both).
|
||||
The global object layout is: {"class": "<full class name>", "..anything..": ..}.
|
||||
|
||||
If present, construct a new instance, else defaults to VerifiableClientJava
|
||||
"""
|
||||
|
||||
# Default class
|
||||
obj = {"class": "kafkatest.services.verifiable_client.VerifiableClientJava"}
|
||||
|
||||
parent_name = parent.__class__.__name__.rsplit('.', 1)[-1]
|
||||
for k in [parent_name, "VerifiableClient"]:
|
||||
if k in context.globals:
|
||||
obj = context.globals[k]
|
||||
break
|
||||
|
||||
if "class" not in obj:
|
||||
raise SyntaxError('%s (or VerifiableClient) expected object format: {"class": "full.class.path", ..}' % parent_name)
|
||||
|
||||
clname = obj["class"]
|
||||
# Using the fully qualified classname, import the implementation class
|
||||
if clname.find('.') == -1:
|
||||
raise SyntaxError("%s (or VerifiableClient) must specify full class path (including module)" % parent_name)
|
||||
|
||||
(module_name, clname) = clname.rsplit('.', 1)
|
||||
cluster_mod = importlib.import_module(module_name)
|
||||
impl_class = getattr(cluster_mod, clname)
|
||||
return impl_class(parent, obj)
|
||||
|
||||
|
||||
|
||||
class VerifiableClientMixin (object):
|
||||
"""
|
||||
Verifiable client mixin class
|
||||
"""
|
||||
@property
|
||||
def impl (self):
|
||||
"""
|
||||
:return: Return (and create if necessary) the Verifiable client implementation object.
|
||||
"""
|
||||
# Add _impl attribute to parent Verifiable(Consumer|Producer) object.
|
||||
if not hasattr(self, "_impl"):
|
||||
setattr(self, "_impl", create_verifiable_client_implementation(self.context, self))
|
||||
if hasattr(self.context, "logger") and self.context.logger is not None:
|
||||
self.context.logger.debug("Using client implementation %s for %s" % (self._impl.__class__.__name__, self.__class__.__name__))
|
||||
return self._impl
|
||||
|
||||
|
||||
def exec_cmd (self, node):
|
||||
"""
|
||||
:return: command string to execute client.
|
||||
Environment variables will be prepended and command line arguments
|
||||
appended to this string later by start_cmd().
|
||||
|
||||
This method should also take care of deploying the client on the instance, if necessary.
|
||||
"""
|
||||
raise NotImplementedError()
|
||||
|
||||
def pids (self, node):
|
||||
""" :return: list of pids for this client instance on node """
|
||||
raise NotImplementedError()
|
||||
|
||||
def kill_signal (self, clean_shutdown=True):
|
||||
""" :return: the kill signal to terminate the application. """
|
||||
if not clean_shutdown:
|
||||
return signal.SIGKILL
|
||||
|
||||
return self.conf.get("kill_signal", signal.SIGTERM)
|
||||
|
||||
|
||||
class VerifiableClientJava (VerifiableClientMixin):
|
||||
"""
|
||||
Verifiable Consumer and Producer using the official Java client.
|
||||
"""
|
||||
def __init__(self, parent, conf=None):
|
||||
"""
|
||||
:param parent: The parent instance, either VerifiableConsumer or VerifiableProducer
|
||||
:param conf: Optional conf object (the --globals VerifiableX object)
|
||||
"""
|
||||
super(VerifiableClientJava, self).__init__()
|
||||
self.parent = parent
|
||||
self.java_class_name = parent.java_class_name()
|
||||
self.conf = conf
|
||||
|
||||
def exec_cmd (self, node):
|
||||
""" :return: command to execute to start instance
|
||||
Translates Verifiable* to the corresponding Java client class name """
|
||||
cmd = ""
|
||||
if self.java_class_name == 'VerifiableProducer' and node.version <= LATEST_0_8_2:
|
||||
# 0.8.2.X releases do not have VerifiableProducer.java, so cheat and add
|
||||
# the tools jar from trunk to the classpath
|
||||
tools_jar = self.parent.path.jar(TOOLS_JAR_NAME, DEV_BRANCH)
|
||||
tools_dependant_libs_jar = self.parent.path.jar(TOOLS_DEPENDANT_TEST_LIBS_JAR_NAME, DEV_BRANCH)
|
||||
cmd += "for file in %s; do CLASSPATH=$CLASSPATH:$file; done; " % tools_jar
|
||||
cmd += "for file in %s; do CLASSPATH=$CLASSPATH:$file; done; " % tools_dependant_libs_jar
|
||||
cmd += "export CLASSPATH; "
|
||||
cmd += self.parent.path.script("kafka-run-class.sh", node) + " org.apache.kafka.tools." + self.java_class_name
|
||||
return cmd
|
||||
|
||||
def pids (self, node):
|
||||
""" :return: pid(s) for this client intstance on node """
|
||||
try:
|
||||
cmd = "jps | grep -i " + self.java_class_name + " | awk '{print $1}'"
|
||||
pid_arr = [pid for pid in node.account.ssh_capture(cmd, allow_fail=True, callback=int)]
|
||||
return pid_arr
|
||||
except (RemoteCommandError, ValueError) as e:
|
||||
return []
|
||||
|
||||
|
||||
class VerifiableClientDummy (VerifiableClientMixin):
|
||||
"""
|
||||
Dummy class for testing the pluggable framework
|
||||
"""
|
||||
def __init__(self, parent, conf=None):
|
||||
"""
|
||||
:param parent: The parent instance, either VerifiableConsumer or VerifiableProducer
|
||||
:param conf: Optional conf object (the --globals VerifiableX object)
|
||||
"""
|
||||
super(VerifiableClientDummy, self).__init__()
|
||||
self.parent = parent
|
||||
self.conf = conf
|
||||
|
||||
def exec_cmd (self, node):
|
||||
""" :return: command to execute to start instance """
|
||||
return 'echo -e \'{"name": "shutdown_complete" }\n\' ; echo ARGS:'
|
||||
|
||||
def pids (self, node):
|
||||
""" :return: pid(s) for this client intstance on node """
|
||||
return []
|
||||
|
||||
|
||||
class VerifiableClientApp (VerifiableClientMixin):
|
||||
"""
|
||||
VerifiableClient using --global settings for exec_cmd, pids and deploy.
|
||||
By using this a verifiable client application can be used through simple
|
||||
--globals configuration rather than implementing a Python class.
|
||||
"""
|
||||
|
||||
def __init__(self, parent, conf):
|
||||
"""
|
||||
:param parent: The parent instance, either VerifiableConsumer or VerifiableProducer
|
||||
:param conf: Optional conf object (the --globals VerifiableX object)
|
||||
"""
|
||||
super(VerifiableClientApp, self).__init__()
|
||||
self.parent = parent
|
||||
# "VerifiableConsumer" or "VerifiableProducer"
|
||||
self.name = self.parent.__class__.__name__
|
||||
self.conf = conf
|
||||
|
||||
if "exec_cmd" not in self.conf:
|
||||
raise SyntaxError("%s requires \"exec_cmd\": .. to be set in --globals %s object" % \
|
||||
(self.__class__.__name__, self.name))
|
||||
|
||||
def exec_cmd (self, node):
|
||||
""" :return: command to execute to start instance """
|
||||
self.deploy(node)
|
||||
return self.conf["exec_cmd"]
|
||||
|
||||
def pids (self, node):
|
||||
""" :return: pid(s) for this client intstance on node """
|
||||
|
||||
cmd = self.conf.get("pids", "pgrep -f '" + self.conf["exec_cmd"] + "'")
|
||||
try:
|
||||
pid_arr = [pid for pid in node.account.ssh_capture(cmd, allow_fail=True, callback=int)]
|
||||
self.parent.context.logger.info("%s pids are: %s" % (str(node.account), pid_arr))
|
||||
return pid_arr
|
||||
except (subprocess.CalledProcessError, ValueError) as e:
|
||||
return []
|
||||
|
||||
def deploy (self, node):
|
||||
""" Call deploy script specified by "deploy" --global key
|
||||
This optional script is run on the VM instance just prior to
|
||||
executing `exec_cmd` to deploy the kafkatest client.
|
||||
The script path must be as seen by the VM instance, e.g. /vagrant/.... """
|
||||
|
||||
if "deploy" not in self.conf:
|
||||
return
|
||||
|
||||
script_cmd = self.conf["deploy"]
|
||||
self.parent.context.logger.debug("Deploying %s: %s" % (self, script_cmd))
|
||||
r = node.account.ssh(script_cmd)
|
||||
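For illustration, a `--globals` object consumed by VerifiableClientApp might look like the sketch below. The top-level key matches the parent class name, only `exec_cmd` is required, and every path shown here is a hypothetical placeholder, not a value taken from this patch:

```
# Hypothetical --globals payload for a ducktape run; all paths are placeholders.
import json

conf = {
    "VerifiableConsumer": {                                      # key = parent class name
        "exec_cmd": "/vagrant/myclient/verifiable_consumer.sh",  # required
        "pids": "pgrep -f verifiable_consumer.sh",               # optional, default: pgrep -f '<exec_cmd>'
        "deploy": "/vagrant/myclient/deploy.sh"                  # optional, run on the node before exec_cmd
    }
}
print(json.dumps(conf))  # value to pass via ducktape's --globals option
```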
418  tests/kafkatest/services/verifiable_consumer.py  Normal file
@@ -0,0 +1,418 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import json
|
||||
import os
|
||||
|
||||
from ducktape.services.background_thread import BackgroundThreadService
|
||||
|
||||
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
|
||||
from kafkatest.services.kafka import TopicPartition
|
||||
from kafkatest.services.verifiable_client import VerifiableClientMixin
|
||||
from kafkatest.version import DEV_BRANCH, V_2_3_0, V_2_3_1, V_0_10_0_0
|
||||
|
||||
|
||||
class ConsumerState:
|
||||
Started = 1
|
||||
Dead = 2
|
||||
Rebalancing = 3
|
||||
Joined = 4
|
||||
|
||||
|
||||
class ConsumerEventHandler(object):
|
||||
|
||||
def __init__(self, node, verify_offsets, idx):
|
||||
self.node = node
|
||||
self.idx = idx
|
||||
self.state = ConsumerState.Dead
|
||||
self.revoked_count = 0
|
||||
self.assigned_count = 0
|
||||
self.assignment = []
|
||||
self.position = {}
|
||||
self.committed = {}
|
||||
self.total_consumed = 0
|
||||
self.verify_offsets = verify_offsets
|
||||
|
||||
def handle_shutdown_complete(self):
|
||||
self.state = ConsumerState.Dead
|
||||
self.assignment = []
|
||||
self.position = {}
|
||||
|
||||
def handle_startup_complete(self):
|
||||
self.state = ConsumerState.Started
|
||||
|
||||
def handle_offsets_committed(self, event, node, logger):
|
||||
if event["success"]:
|
||||
for offset_commit in event["offsets"]:
|
||||
if offset_commit.get("error", "") != "":
|
||||
logger.debug("%s: Offset commit failed for: %s" % (str(node.account), offset_commit))
|
||||
continue
|
||||
|
||||
topic = offset_commit["topic"]
|
||||
partition = offset_commit["partition"]
|
||||
tp = TopicPartition(topic, partition)
|
||||
offset = offset_commit["offset"]
|
||||
assert tp in self.assignment, \
|
||||
"Committed offsets for partition %s not assigned (current assignment: %s)" % \
|
||||
(str(tp), str(self.assignment))
|
||||
assert tp in self.position, "No previous position for %s: %s" % (str(tp), event)
|
||||
assert self.position[tp] >= offset, \
|
||||
"The committed offset %d was greater than the current position %d for partition %s" % \
|
||||
(offset, self.position[tp], str(tp))
|
||||
self.committed[tp] = offset
|
||||
|
||||
def handle_records_consumed(self, event, logger):
|
||||
assert self.state == ConsumerState.Joined, \
|
||||
"Consumed records should only be received when joined (current state: %s)" % str(self.state)
|
||||
|
||||
for record_batch in event["partitions"]:
|
||||
tp = TopicPartition(topic=record_batch["topic"],
|
||||
partition=record_batch["partition"])
|
||||
min_offset = record_batch["minOffset"]
|
||||
max_offset = record_batch["maxOffset"]
|
||||
|
||||
assert tp in self.assignment, \
|
||||
"Consumed records for partition %s which is not assigned (current assignment: %s)" % \
|
||||
(str(tp), str(self.assignment))
|
||||
if tp not in self.position or self.position[tp] == min_offset:
|
||||
self.position[tp] = max_offset + 1
|
||||
else:
|
||||
msg = "Consumed from an unexpected offset (%d, %d) for partition %s" % \
|
||||
(self.position.get(tp), min_offset, str(tp))
|
||||
if self.verify_offsets:
|
||||
raise AssertionError(msg)
|
||||
else:
|
||||
if tp in self.position:
|
||||
self.position[tp] = max_offset + 1
|
||||
logger.warn(msg)
|
||||
self.total_consumed += event["count"]
|
||||
|
||||
def handle_partitions_revoked(self, event):
|
||||
self.revoked_count += 1
|
||||
self.state = ConsumerState.Rebalancing
|
||||
self.position = {}
|
||||
|
||||
def handle_partitions_assigned(self, event):
|
||||
self.assigned_count += 1
|
||||
self.state = ConsumerState.Joined
|
||||
assignment = []
|
||||
for topic_partition in event["partitions"]:
|
||||
topic = topic_partition["topic"]
|
||||
partition = topic_partition["partition"]
|
||||
assignment.append(TopicPartition(topic, partition))
|
||||
self.assignment = assignment
|
||||
|
||||
def handle_kill_process(self, clean_shutdown):
|
||||
# if the shutdown was clean, then we expect the explicit
|
||||
# shutdown event from the consumer
|
||||
if not clean_shutdown:
|
||||
self.handle_shutdown_complete()
|
||||
|
||||
def current_assignment(self):
|
||||
return list(self.assignment)
|
||||
|
||||
def current_position(self, tp):
|
||||
if tp in self.position:
|
||||
return self.position[tp]
|
||||
else:
|
||||
return None
|
||||
|
||||
def last_commit(self, tp):
|
||||
if tp in self.committed:
|
||||
return self.committed[tp]
|
||||
else:
|
||||
return None
|
||||
|
||||
|
||||
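To make the event flow concrete, here is a rough sketch (run in this module's context, so TopicPartition is already imported) of how ConsumerEventHandler reacts to one "records_consumed" event. The event dict is only a plausible shape inferred from the fields the handler reads; real tool output may carry additional fields:

```
import logging

handler = ConsumerEventHandler(node=None, verify_offsets=True, idx=1)
handler.handle_startup_complete()
handler.handle_partitions_assigned({"partitions": [{"topic": "test_topic", "partition": 0}]})

event = {
    "name": "records_consumed",
    "count": 3,
    "partitions": [{"topic": "test_topic", "partition": 0, "minOffset": 0, "maxOffset": 2}]
}
handler.handle_records_consumed(event, logging.getLogger(__name__))

assert handler.total_consumed == 3
assert handler.current_position(TopicPartition("test_topic", 0)) == 3  # maxOffset + 1
```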
class VerifiableConsumer(KafkaPathResolverMixin, VerifiableClientMixin, BackgroundThreadService):
|
||||
"""This service wraps org.apache.kafka.tools.VerifiableConsumer for use in
|
||||
system testing.
|
||||
|
||||
NOTE: this class should be treated as a PUBLIC API. Downstream users use
|
||||
this service both directly and through class extension, so care must be
|
||||
taken to ensure compatibility.
|
||||
"""
|
||||
|
||||
PERSISTENT_ROOT = "/mnt/verifiable_consumer"
|
||||
STDOUT_CAPTURE = os.path.join(PERSISTENT_ROOT, "verifiable_consumer.stdout")
|
||||
STDERR_CAPTURE = os.path.join(PERSISTENT_ROOT, "verifiable_consumer.stderr")
|
||||
LOG_DIR = os.path.join(PERSISTENT_ROOT, "logs")
|
||||
LOG_FILE = os.path.join(LOG_DIR, "verifiable_consumer.log")
|
||||
LOG4J_CONFIG = os.path.join(PERSISTENT_ROOT, "tools-log4j.properties")
|
||||
CONFIG_FILE = os.path.join(PERSISTENT_ROOT, "verifiable_consumer.properties")
|
||||
|
||||
logs = {
|
||||
"verifiable_consumer_stdout": {
|
||||
"path": STDOUT_CAPTURE,
|
||||
"collect_default": False},
|
||||
"verifiable_consumer_stderr": {
|
||||
"path": STDERR_CAPTURE,
|
||||
"collect_default": False},
|
||||
"verifiable_consumer_log": {
|
||||
"path": LOG_FILE,
|
||||
"collect_default": True}
|
||||
}
|
||||
|
||||
def __init__(self, context, num_nodes, kafka, topic, group_id,
|
||||
static_membership=False, max_messages=-1, session_timeout_sec=30, enable_autocommit=False,
|
||||
assignment_strategy=None,
|
||||
version=DEV_BRANCH, stop_timeout_sec=30, log_level="INFO", jaas_override_variables=None,
|
||||
on_record_consumed=None, reset_policy="earliest", verify_offsets=True):
|
||||
"""
|
||||
:param jaas_override_variables: A dict of variables to be used in the jaas.conf template file
|
||||
"""
|
||||
super(VerifiableConsumer, self).__init__(context, num_nodes)
|
||||
self.log_level = log_level
|
||||
self.kafka = kafka
|
||||
self.topic = topic
|
||||
self.group_id = group_id
|
||||
self.reset_policy = reset_policy
|
||||
self.static_membership = static_membership
|
||||
self.max_messages = max_messages
|
||||
self.session_timeout_sec = session_timeout_sec
|
||||
self.enable_autocommit = enable_autocommit
|
||||
self.assignment_strategy = assignment_strategy
|
||||
self.prop_file = ""
|
||||
self.stop_timeout_sec = stop_timeout_sec
|
||||
self.on_record_consumed = on_record_consumed
|
||||
self.verify_offsets = verify_offsets
|
||||
|
||||
self.event_handlers = {}
|
||||
self.global_position = {}
|
||||
self.global_committed = {}
|
||||
self.jaas_override_variables = jaas_override_variables or {}
|
||||
|
||||
for node in self.nodes:
|
||||
node.version = version
|
||||
|
||||
def java_class_name(self):
|
||||
return "VerifiableConsumer"
|
||||
|
||||
def _worker(self, idx, node):
|
||||
with self.lock:
|
||||
if node not in self.event_handlers:
|
||||
self.event_handlers[node] = ConsumerEventHandler(node, self.verify_offsets, idx)
|
||||
handler = self.event_handlers[node]
|
||||
|
||||
node.account.ssh("mkdir -p %s" % VerifiableConsumer.PERSISTENT_ROOT, allow_fail=False)
|
||||
|
||||
# Create and upload log properties
|
||||
log_config = self.render('tools_log4j.properties', log_file=VerifiableConsumer.LOG_FILE)
|
||||
node.account.create_file(VerifiableConsumer.LOG4J_CONFIG, log_config)
|
||||
|
||||
# Create and upload config file
|
||||
self.security_config = self.kafka.security_config.client_config(self.prop_file, node,
|
||||
self.jaas_override_variables)
|
||||
self.security_config.setup_node(node)
|
||||
self.prop_file += str(self.security_config)
|
||||
self.logger.info("verifiable_consumer.properties:")
|
||||
self.logger.info(self.prop_file)
|
||||
node.account.create_file(VerifiableConsumer.CONFIG_FILE, self.prop_file)
|
||||
self.security_config.setup_node(node)
|
||||
# apply group.instance.id to the node for static membership validation
|
||||
node.group_instance_id = None
|
||||
if self.static_membership:
|
||||
assert node.version >= V_2_3_0, \
|
||||
"Version %s does not support static membership (must be 2.3 or higher)" % str(node.version)
|
||||
node.group_instance_id = self.group_id + "-instance-" + str(idx)
|
||||
|
||||
if self.assignment_strategy:
|
||||
assert node.version >= V_0_10_0_0, \
|
||||
"Version %s does not setting an assignment strategy (must be 0.10.0 or higher)" % str(node.version)
|
||||
|
||||
cmd = self.start_cmd(node)
|
||||
self.logger.debug("VerifiableConsumer %d command: %s" % (idx, cmd))
|
||||
|
||||
for line in node.account.ssh_capture(cmd):
|
||||
event = self.try_parse_json(node, line.strip())
|
||||
if event is not None:
|
||||
with self.lock:
|
||||
name = event["name"]
|
||||
if name == "shutdown_complete":
|
||||
handler.handle_shutdown_complete()
|
||||
elif name == "startup_complete":
|
||||
handler.handle_startup_complete()
|
||||
elif name == "offsets_committed":
|
||||
handler.handle_offsets_committed(event, node, self.logger)
|
||||
self._update_global_committed(event)
|
||||
elif name == "records_consumed":
|
||||
handler.handle_records_consumed(event, self.logger)
|
||||
self._update_global_position(event, node)
|
||||
elif name == "record_data" and self.on_record_consumed:
|
||||
self.on_record_consumed(event, node)
|
||||
elif name == "partitions_revoked":
|
||||
handler.handle_partitions_revoked(event)
|
||||
elif name == "partitions_assigned":
|
||||
handler.handle_partitions_assigned(event)
|
||||
else:
|
||||
self.logger.debug("%s: ignoring unknown event: %s" % (str(node.account), event))
|
||||
|
||||
def _update_global_position(self, consumed_event, node):
|
||||
for consumed_partition in consumed_event["partitions"]:
|
||||
tp = TopicPartition(consumed_partition["topic"], consumed_partition["partition"])
|
||||
if tp in self.global_committed:
|
||||
# verify that the position never gets behind the current commit.
|
||||
if self.global_committed[tp] > consumed_partition["minOffset"]:
|
||||
msg = "Consumed position %d is behind the current committed offset %d for partition %s" % \
|
||||
(consumed_partition["minOffset"], self.global_committed[tp], str(tp))
|
||||
if self.verify_offsets:
|
||||
raise AssertionError(msg)
|
||||
else:
|
||||
self.logger.warn(msg)
|
||||
|
||||
# the consumer cannot generally guarantee that the position increases monotonically
|
||||
# without gaps in the face of hard failures, so we only log a warning when this happens
|
||||
if tp in self.global_position and self.global_position[tp] != consumed_partition["minOffset"]:
|
||||
self.logger.warn("%s: Expected next consumed offset of %d for partition %s, but instead saw %d" %
|
||||
(str(node.account), self.global_position[tp], str(tp), consumed_partition["minOffset"]))
|
||||
|
||||
self.global_position[tp] = consumed_partition["maxOffset"] + 1
|
||||
|
||||
def _update_global_committed(self, commit_event):
|
||||
if commit_event["success"]:
|
||||
for offset_commit in commit_event["offsets"]:
|
||||
tp = TopicPartition(offset_commit["topic"], offset_commit["partition"])
|
||||
offset = offset_commit["offset"]
|
||||
assert self.global_position[tp] >= offset, \
|
||||
"Committed offset %d for partition %s is ahead of the current position %d" % \
|
||||
(offset, str(tp), self.global_position[tp])
|
||||
self.global_committed[tp] = offset
|
||||
|
||||
def start_cmd(self, node):
|
||||
cmd = ""
|
||||
cmd += "export LOG_DIR=%s;" % VerifiableConsumer.LOG_DIR
|
||||
cmd += " export KAFKA_OPTS=%s;" % self.security_config.kafka_opts
|
||||
cmd += " export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%s\"; " % VerifiableConsumer.LOG4J_CONFIG
|
||||
cmd += self.impl.exec_cmd(node)
|
||||
if self.on_record_consumed:
|
||||
cmd += " --verbose"
|
||||
|
||||
if node.group_instance_id:
|
||||
cmd += " --group-instance-id %s" % node.group_instance_id
|
||||
elif node.version == V_2_3_0 or node.version == V_2_3_1:
|
||||
# In 2.3, --group-instance-id was required, but would be left empty
|
||||
# if `None` is passed as the argument value
|
||||
cmd += " --group-instance-id None"
|
||||
|
||||
if self.assignment_strategy:
|
||||
cmd += " --assignment-strategy %s" % self.assignment_strategy
|
||||
|
||||
if self.enable_autocommit:
|
||||
cmd += " --enable-autocommit "
|
||||
|
||||
cmd += " --reset-policy %s --group-id %s --topic %s --broker-list %s --session-timeout %s" % \
|
||||
(self.reset_policy, self.group_id, self.topic,
|
||||
self.kafka.bootstrap_servers(self.security_config.security_protocol),
|
||||
self.session_timeout_sec*1000)
|
||||
|
||||
if self.max_messages > 0:
|
||||
cmd += " --max-messages %s" % str(self.max_messages)
|
||||
|
||||
cmd += " --consumer.config %s" % VerifiableConsumer.CONFIG_FILE
|
||||
cmd += " 2>> %s | tee -a %s &" % (VerifiableConsumer.STDOUT_CAPTURE, VerifiableConsumer.STDOUT_CAPTURE)
|
||||
return cmd
|
||||
|
||||
def pids(self, node):
|
||||
return self.impl.pids(node)
|
||||
|
||||
def try_parse_json(self, node, string):
|
||||
"""Try to parse a string as json. Return None if not parseable."""
|
||||
try:
|
||||
return json.loads(string)
|
||||
except ValueError:
|
||||
self.logger.debug("%s: Could not parse as json: %s" % (str(node.account), str(string)))
|
||||
return None
|
||||
|
||||
def stop_all(self):
|
||||
for node in self.nodes:
|
||||
self.stop_node(node)
|
||||
|
||||
def kill_node(self, node, clean_shutdown=True, allow_fail=False):
|
||||
sig = self.impl.kill_signal(clean_shutdown)
|
||||
for pid in self.pids(node):
|
||||
node.account.signal(pid, sig, allow_fail)
|
||||
|
||||
with self.lock:
|
||||
self.event_handlers[node].handle_kill_process(clean_shutdown)
|
||||
|
||||
def stop_node(self, node, clean_shutdown=True):
|
||||
self.kill_node(node, clean_shutdown=clean_shutdown)
|
||||
|
||||
stopped = self.wait_node(node, timeout_sec=self.stop_timeout_sec)
|
||||
assert stopped, "Node %s: did not stop within the specified timeout of %s seconds" % \
|
||||
(str(node.account), str(self.stop_timeout_sec))
|
||||
|
||||
def clean_node(self, node):
|
||||
self.kill_node(node, clean_shutdown=False)
|
||||
node.account.ssh("rm -rf " + self.PERSISTENT_ROOT, allow_fail=False)
|
||||
self.security_config.clean_node(node)
|
||||
|
||||
def current_assignment(self):
|
||||
with self.lock:
|
||||
return { handler.node: handler.current_assignment() for handler in self.event_handlers.itervalues() }
|
||||
|
||||
def current_position(self, tp):
|
||||
with self.lock:
|
||||
if tp in self.global_position:
|
||||
return self.global_position[tp]
|
||||
else:
|
||||
return None
|
||||
|
||||
def owner(self, tp):
|
||||
with self.lock:
|
||||
for handler in self.event_handlers.itervalues():
|
||||
if tp in handler.current_assignment():
|
||||
return handler.node
|
||||
return None
|
||||
|
||||
def last_commit(self, tp):
|
||||
with self.lock:
|
||||
if tp in self.global_committed:
|
||||
return self.global_committed[tp]
|
||||
else:
|
||||
return None
|
||||
|
||||
def total_consumed(self):
|
||||
with self.lock:
|
||||
return sum(handler.total_consumed for handler in self.event_handlers.itervalues())
|
||||
|
||||
def num_rebalances(self):
|
||||
with self.lock:
|
||||
return max(handler.assigned_count for handler in self.event_handlers.itervalues())
|
||||
|
||||
def num_revokes_for_alive(self, keep_alive=1):
|
||||
with self.lock:
|
||||
return max([handler.revoked_count for handler in self.event_handlers.itervalues()
|
||||
if handler.idx <= keep_alive])
|
||||
|
||||
def joined_nodes(self):
|
||||
with self.lock:
|
||||
return [handler.node for handler in self.event_handlers.itervalues()
|
||||
if handler.state == ConsumerState.Joined]
|
||||
|
||||
def rebalancing_nodes(self):
|
||||
with self.lock:
|
||||
return [handler.node for handler in self.event_handlers.itervalues()
|
||||
if handler.state == ConsumerState.Rebalancing]
|
||||
|
||||
def dead_nodes(self):
|
||||
with self.lock:
|
||||
return [handler.node for handler in self.event_handlers.itervalues()
|
||||
if handler.state == ConsumerState.Dead]
|
||||
|
||||
def alive_nodes(self):
|
||||
with self.lock:
|
||||
return [handler.node for handler in self.event_handlers.itervalues()
|
||||
if handler.state != ConsumerState.Dead]
|
||||
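As a usage sketch (not part of this patch), a ducktape test with a started KafkaService could drive this service roughly as follows; the topic and group names are placeholders and wait_until comes from ducktape.utils.util:

```
from ducktape.utils.util import wait_until

consumer = VerifiableConsumer(self.test_context, num_nodes=1, kafka=self.kafka,
                              topic="test_topic", group_id="test_group_id",
                              max_messages=1000)
consumer.start()  # BackgroundThreadService: launches the tool and parses its JSON events

wait_until(lambda: consumer.total_consumed() >= 1000, timeout_sec=60,
           err_msg="Timed out waiting for the consumer to consume 1000 messages")

assert len(consumer.joined_nodes()) == 1
consumer.stop_all()
```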
315  tests/kafkatest/services/verifiable_producer.py  Normal file
@@ -0,0 +1,315 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import json
|
||||
import os
|
||||
|
||||
import time
|
||||
from ducktape.cluster.remoteaccount import RemoteCommandError
|
||||
from ducktape.services.background_thread import BackgroundThreadService
|
||||
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
|
||||
from kafkatest.services.kafka import TopicPartition
|
||||
from kafkatest.services.verifiable_client import VerifiableClientMixin
|
||||
from kafkatest.utils import is_int, is_int_with_prefix
|
||||
from kafkatest.version import DEV_BRANCH
|
||||
|
||||
|
||||
class VerifiableProducer(KafkaPathResolverMixin, VerifiableClientMixin, BackgroundThreadService):
|
||||
"""This service wraps org.apache.kafka.tools.VerifiableProducer for use in
|
||||
system testing.
|
||||
|
||||
NOTE: this class should be treated as a PUBLIC API. Downstream users use
|
||||
this service both directly and through class extension, so care must be
|
||||
taken to ensure compatibility.
|
||||
"""
|
||||
|
||||
PERSISTENT_ROOT = "/mnt/verifiable_producer"
|
||||
STDOUT_CAPTURE = os.path.join(PERSISTENT_ROOT, "verifiable_producer.stdout")
|
||||
STDERR_CAPTURE = os.path.join(PERSISTENT_ROOT, "verifiable_producer.stderr")
|
||||
LOG_DIR = os.path.join(PERSISTENT_ROOT, "logs")
|
||||
LOG_FILE = os.path.join(LOG_DIR, "verifiable_producer.log")
|
||||
LOG4J_CONFIG = os.path.join(PERSISTENT_ROOT, "tools-log4j.properties")
|
||||
CONFIG_FILE = os.path.join(PERSISTENT_ROOT, "verifiable_producer.properties")
|
||||
|
||||
logs = {
|
||||
"verifiable_producer_stdout": {
|
||||
"path": STDOUT_CAPTURE,
|
||||
"collect_default": False},
|
||||
"verifiable_producer_stderr": {
|
||||
"path": STDERR_CAPTURE,
|
||||
"collect_default": False},
|
||||
"verifiable_producer_log": {
|
||||
"path": LOG_FILE,
|
||||
"collect_default": True}
|
||||
}
|
||||
|
||||
def __init__(self, context, num_nodes, kafka, topic, max_messages=-1, throughput=100000,
|
||||
message_validator=is_int, compression_types=None, version=DEV_BRANCH, acks=None,
|
||||
stop_timeout_sec=150, request_timeout_sec=30, log_level="INFO",
|
||||
enable_idempotence=False, offline_nodes=[], create_time=-1, repeating_keys=None,
|
||||
jaas_override_variables=None, kafka_opts_override="", client_prop_file_override="",
|
||||
retries=None):
|
||||
"""
|
||||
Args:
|
||||
:param max_messages number of messages to be produced per producer
|
||||
:param message_validator checks for an expected format of messages produced. There are
|
||||
currently two:
|
||||
* is_int is an integer format; this is the default and expected to be used if
|
||||
num_nodes = 1
|
||||
* is_int_with_prefix recommended if num_nodes > 1, because otherwise each producer
|
||||
will produce exactly the same messages, and validation may fail to detect missing messages.
|
||||
:param compression_types If None, all producers will not use compression; or a list of compression types,
|
||||
one per producer (could be "none").
|
||||
:param jaas_override_variables A dict of variables to be used in the jaas.conf template file
|
||||
:param kafka_opts_override Override parameters of the KAFKA_OPTS environment variable
|
||||
:param client_prop_file_override Override the client.properties file used by the producer
|
||||
"""
|
||||
super(VerifiableProducer, self).__init__(context, num_nodes)
|
||||
self.log_level = log_level
|
||||
|
||||
self.kafka = kafka
|
||||
self.topic = topic
|
||||
self.max_messages = max_messages
|
||||
self.throughput = throughput
|
||||
self.message_validator = message_validator
|
||||
self.compression_types = compression_types
|
||||
if self.compression_types is not None:
|
||||
assert len(self.compression_types) == num_nodes, "Specify one compression type per node"
|
||||
|
||||
for node in self.nodes:
|
||||
node.version = version
|
||||
self.acked_values = []
|
||||
self.acked_values_by_partition = {}
|
||||
self._last_acked_offsets = {}
|
||||
self.not_acked_values = []
|
||||
self.produced_count = {}
|
||||
self.clean_shutdown_nodes = set()
|
||||
self.acks = acks
|
||||
self.stop_timeout_sec = stop_timeout_sec
|
||||
self.request_timeout_sec = request_timeout_sec
|
||||
self.enable_idempotence = enable_idempotence
|
||||
self.offline_nodes = offline_nodes
|
||||
self.create_time = create_time
|
||||
self.repeating_keys = repeating_keys
|
||||
self.jaas_override_variables = jaas_override_variables or {}
|
||||
self.kafka_opts_override = kafka_opts_override
|
||||
self.client_prop_file_override = client_prop_file_override
|
||||
self.retries = retries
|
||||
|
||||
def java_class_name(self):
|
||||
return "VerifiableProducer"
|
||||
|
||||
def prop_file(self, node):
|
||||
idx = self.idx(node)
|
||||
prop_file = self.render('producer.properties', request_timeout_ms=(self.request_timeout_sec * 1000))
|
||||
prop_file += "\n{}".format(str(self.security_config))
|
||||
if self.compression_types is not None:
|
||||
compression_index = idx - 1
|
||||
self.logger.info("VerifiableProducer (index = %d) will use compression type = %s", idx,
|
||||
self.compression_types[compression_index])
|
||||
prop_file += "\ncompression.type=%s\n" % self.compression_types[compression_index]
|
||||
return prop_file
|
||||
|
||||
def _worker(self, idx, node):
|
||||
node.account.ssh("mkdir -p %s" % VerifiableProducer.PERSISTENT_ROOT, allow_fail=False)
|
||||
|
||||
# Create and upload log properties
|
||||
log_config = self.render('tools_log4j.properties', log_file=VerifiableProducer.LOG_FILE)
|
||||
node.account.create_file(VerifiableProducer.LOG4J_CONFIG, log_config)
|
||||
|
||||
# Configure security
|
||||
self.security_config = self.kafka.security_config.client_config(node=node,
|
||||
jaas_override_variables=self.jaas_override_variables)
|
||||
self.security_config.setup_node(node)
|
||||
|
||||
# Create and upload config file
|
||||
if self.client_prop_file_override:
|
||||
producer_prop_file = self.client_prop_file_override
|
||||
else:
|
||||
producer_prop_file = self.prop_file(node)
|
||||
|
||||
if self.acks is not None:
|
||||
self.logger.info("VerifiableProducer (index = %d) will use acks = %s", idx, self.acks)
|
||||
producer_prop_file += "\nacks=%s\n" % self.acks
|
||||
|
||||
if self.enable_idempotence:
|
||||
self.logger.info("Setting up an idempotent producer")
|
||||
producer_prop_file += "\nmax.in.flight.requests.per.connection=5\n"
|
||||
producer_prop_file += "\nretries=1000000\n"
|
||||
producer_prop_file += "\nenable.idempotence=true\n"
|
||||
elif self.retries is not None:
|
||||
self.logger.info("VerifiableProducer (index = %d) will use retries = %s", idx, self.retries)
|
||||
producer_prop_file += "\nretries=%s\n" % self.retries
|
||||
producer_prop_file += "\ndelivery.timeout.ms=%s\n" % (self.request_timeout_sec * 1000 * self.retries)
|
||||
|
||||
self.logger.info("verifiable_producer.properties:")
|
||||
self.logger.info(producer_prop_file)
|
||||
node.account.create_file(VerifiableProducer.CONFIG_FILE, producer_prop_file)
|
||||
|
||||
cmd = self.start_cmd(node, idx)
|
||||
self.logger.debug("VerifiableProducer %d command: %s" % (idx, cmd))
|
||||
|
||||
self.produced_count[idx] = 0
|
||||
last_produced_time = time.time()
|
||||
prev_msg = None
|
||||
|
||||
for line in node.account.ssh_capture(cmd):
|
||||
line = line.strip()
|
||||
|
||||
data = self.try_parse_json(line)
|
||||
if data is not None:
|
||||
|
||||
with self.lock:
|
||||
if data["name"] == "producer_send_error":
|
||||
data["node"] = idx
|
||||
self.not_acked_values.append(self.message_validator(data["value"]))
|
||||
self.produced_count[idx] += 1
|
||||
|
||||
elif data["name"] == "producer_send_success":
|
||||
partition = TopicPartition(data["topic"], data["partition"])
|
||||
value = self.message_validator(data["value"])
|
||||
self.acked_values.append(value)
|
||||
|
||||
if partition not in self.acked_values_by_partition:
|
||||
self.acked_values_by_partition[partition] = []
|
||||
self.acked_values_by_partition[partition].append(value)
|
||||
|
||||
self._last_acked_offsets[partition] = data["offset"]
|
||||
self.produced_count[idx] += 1
|
||||
|
||||
# Log information if there is a large gap between successively acknowledged messages
|
||||
t = time.time()
|
||||
time_delta_sec = t - last_produced_time
|
||||
if time_delta_sec > 2 and prev_msg is not None:
|
||||
self.logger.debug(
|
||||
"Time delta between successively acked messages is large: " +
|
||||
"delta_t_sec: %s, prev_message: %s, current_message: %s" % (str(time_delta_sec), str(prev_msg), str(data)))
|
||||
|
||||
last_produced_time = t
|
||||
prev_msg = data
|
||||
|
||||
elif data["name"] == "shutdown_complete":
|
||||
if node in self.clean_shutdown_nodes:
|
||||
raise Exception("Unexpected shutdown event from producer, already shutdown. Producer index: %d" % idx)
|
||||
self.clean_shutdown_nodes.add(node)
|
||||
|
||||
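The stdout lines parsed above are JSON events emitted by the Java tool; their rough shape, inferred only from the fields this worker reads (real output may include more fields), is:

```
# {"name": "producer_send_success", "topic": "test_topic", "partition": 0, "value": "12", "offset": 41}
# {"name": "producer_send_error", "value": "13", ...}
# {"name": "shutdown_complete"}
```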
def _has_output(self, node):
|
||||
"""Helper used as a proxy to determine whether jmx is running by that jmx_tool_log contains output."""
|
||||
try:
|
||||
node.account.ssh("test -z \"$(cat %s)\"" % VerifiableProducer.STDOUT_CAPTURE, allow_fail=False)
|
||||
return False
|
||||
except RemoteCommandError:
|
||||
return True
|
||||
|
||||
def start_cmd(self, node, idx):
|
||||
cmd = "export LOG_DIR=%s;" % VerifiableProducer.LOG_DIR
|
||||
if self.kafka_opts_override:
|
||||
cmd += " export KAFKA_OPTS=\"%s\";" % self.kafka_opts_override
|
||||
else:
|
||||
cmd += " export KAFKA_OPTS=%s;" % self.security_config.kafka_opts
|
||||
|
||||
cmd += " export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%s\"; " % VerifiableProducer.LOG4J_CONFIG
|
||||
cmd += self.impl.exec_cmd(node)
|
||||
cmd += " --topic %s --broker-list %s" % (self.topic, self.kafka.bootstrap_servers(self.security_config.security_protocol, True, self.offline_nodes))
|
||||
if self.max_messages > 0:
|
||||
cmd += " --max-messages %s" % str(self.max_messages)
|
||||
if self.throughput > 0:
|
||||
cmd += " --throughput %s" % str(self.throughput)
|
||||
if self.message_validator == is_int_with_prefix:
|
||||
cmd += " --value-prefix %s" % str(idx)
|
||||
if self.acks is not None:
|
||||
cmd += " --acks %s " % str(self.acks)
|
||||
if self.create_time > -1:
|
||||
cmd += " --message-create-time %s " % str(self.create_time)
|
||||
if self.repeating_keys is not None:
|
||||
cmd += " --repeating-keys %s " % str(self.repeating_keys)
|
||||
|
||||
cmd += " --producer.config %s" % VerifiableProducer.CONFIG_FILE
|
||||
|
||||
cmd += " 2>> %s | tee -a %s &" % (VerifiableProducer.STDOUT_CAPTURE, VerifiableProducer.STDOUT_CAPTURE)
|
||||
return cmd
|
||||
|
||||
def kill_node(self, node, clean_shutdown=True, allow_fail=False):
|
||||
sig = self.impl.kill_signal(clean_shutdown)
|
||||
for pid in self.pids(node):
|
||||
node.account.signal(pid, sig, allow_fail)
|
||||
|
||||
def pids(self, node):
|
||||
return self.impl.pids(node)
|
||||
|
||||
def alive(self, node):
|
||||
return len(self.pids(node)) > 0
|
||||
|
||||
@property
|
||||
def last_acked_offsets(self):
|
||||
with self.lock:
|
||||
return self._last_acked_offsets
|
||||
|
||||
@property
|
||||
def acked(self):
|
||||
with self.lock:
|
||||
return self.acked_values
|
||||
|
||||
@property
|
||||
def acked_by_partition(self):
|
||||
with self.lock:
|
||||
return self.acked_values_by_partition
|
||||
|
||||
@property
|
||||
def not_acked(self):
|
||||
with self.lock:
|
||||
return self.not_acked_values
|
||||
|
||||
@property
|
||||
def num_acked(self):
|
||||
with self.lock:
|
||||
return len(self.acked_values)
|
||||
|
||||
@property
|
||||
def num_not_acked(self):
|
||||
with self.lock:
|
||||
return len(self.not_acked_values)
|
||||
|
||||
def each_produced_at_least(self, count):
|
||||
with self.lock:
|
||||
for idx in range(1, self.num_nodes + 1):
|
||||
if self.produced_count.get(idx) is None or self.produced_count[idx] < count:
|
||||
return False
|
||||
return True
|
||||
|
||||
def stop_node(self, node):
|
||||
# There is a race condition on shutdown if using `max_messages` since the
|
||||
# VerifiableProducer will shutdown automatically when all messages have been
|
||||
# written. In this case, the process will be gone and the signal will fail.
|
||||
allow_fail = self.max_messages > 0
|
||||
self.kill_node(node, clean_shutdown=True, allow_fail=allow_fail)
|
||||
|
||||
stopped = self.wait_node(node, timeout_sec=self.stop_timeout_sec)
|
||||
assert stopped, "Node %s: did not stop within the specified timeout of %s seconds" % \
|
||||
(str(node.account), str(self.stop_timeout_sec))
|
||||
|
||||
def clean_node(self, node):
|
||||
self.kill_node(node, clean_shutdown=False, allow_fail=False)
|
||||
node.account.ssh("rm -rf " + self.PERSISTENT_ROOT, allow_fail=False)
|
||||
self.security_config.clean_node(node)
|
||||
|
||||
def try_parse_json(self, string):
|
||||
"""Try to parse a string as json. Return None if not parseable."""
|
||||
try:
|
||||
record = json.loads(string)
|
||||
return record
|
||||
except ValueError:
|
||||
self.logger.debug("Could not parse as json: %s" % str(string))
|
||||
return None
|
||||
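A corresponding usage sketch for this producer service, again with placeholder names and assuming a started KafkaService inside a ducktape test:

```
from ducktape.utils.util import wait_until

producer = VerifiableProducer(self.test_context, num_nodes=1, kafka=self.kafka,
                              topic="test_topic", max_messages=1000, throughput=100)
producer.start()

wait_until(lambda: producer.each_produced_at_least(1000), timeout_sec=120,
           err_msg="Producer did not produce all messages in a reasonable amount of time")

producer.stop()
assert producer.num_acked > 0    # values confirmed by the brokers
# producer.not_acked lists any values whose send failed
```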
251  tests/kafkatest/services/zookeeper.py  Normal file
@@ -0,0 +1,251 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
import os
|
||||
import re
|
||||
import time
|
||||
|
||||
from ducktape.services.service import Service
|
||||
from ducktape.utils.util import wait_until
|
||||
from ducktape.cluster.remoteaccount import RemoteCommandError
|
||||
|
||||
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
|
||||
from kafkatest.services.security.security_config import SecurityConfig
|
||||
from kafkatest.version import DEV_BRANCH
|
||||
|
||||
|
||||
class ZookeeperService(KafkaPathResolverMixin, Service):
|
||||
ROOT = "/mnt/zookeeper"
|
||||
DATA = os.path.join(ROOT, "data")
|
||||
HEAP_DUMP_FILE = os.path.join(ROOT, "zk_heap_dump.bin")
|
||||
|
||||
logs = {
|
||||
"zk_log": {
|
||||
"path": "%s/zk.log" % ROOT,
|
||||
"collect_default": True},
|
||||
"zk_data": {
|
||||
"path": DATA,
|
||||
"collect_default": False},
|
||||
"zk_heap_dump_file": {
|
||||
"path": HEAP_DUMP_FILE,
|
||||
"collect_default": True}
|
||||
}
|
||||
|
||||
def __init__(self, context, num_nodes, zk_sasl = False, zk_client_port = True, zk_client_secure_port = False,
|
||||
zk_tls_encrypt_only = False):
|
||||
"""
|
||||
:type context
|
||||
"""
|
||||
self.kafka_opts = ""
|
||||
self.zk_sasl = zk_sasl
|
||||
if not zk_client_port and not zk_client_secure_port:
|
||||
raise Exception("Cannot disable both ZK clientPort and clientSecurePort")
|
||||
self.zk_client_port = zk_client_port
|
||||
self.zk_client_secure_port = zk_client_secure_port
|
||||
self.zk_tls_encrypt_only = zk_tls_encrypt_only
|
||||
super(ZookeeperService, self).__init__(context, num_nodes)
|
||||
|
||||
@property
|
||||
def security_config(self):
|
||||
return SecurityConfig(self.context, zk_sasl=self.zk_sasl, zk_tls=self.zk_client_secure_port)
|
||||
|
||||
@property
|
||||
def security_system_properties(self):
|
||||
return "-Dzookeeper.authProvider.sasl=org.apache.zookeeper.server.auth.SASLAuthenticationProvider " \
|
||||
"-DjaasLoginRenew=3600000 " \
|
||||
"-Djava.security.auth.login.config=%s " \
|
||||
"-Djava.security.krb5.conf=%s " % (self.security_config.JAAS_CONF_PATH, self.security_config.KRB5CONF_PATH)
|
||||
|
||||
@property
|
||||
def zk_principals(self):
|
||||
return " zkclient " + ' '.join(['zookeeper/' + zk_node.account.hostname for zk_node in self.nodes])
|
||||
|
||||
def restart_cluster(self):
|
||||
for node in self.nodes:
|
||||
self.restart_node(node)
|
||||
|
||||
def restart_node(self, node):
|
||||
"""Restart the given node."""
|
||||
self.stop_node(node)
|
||||
self.start_node(node)
|
||||
|
||||
def start_node(self, node):
|
||||
idx = self.idx(node)
|
||||
self.logger.info("Starting ZK node %d on %s", idx, node.account.hostname)
|
||||
|
||||
node.account.ssh("mkdir -p %s" % ZookeeperService.DATA)
|
||||
node.account.ssh("echo %d > %s/myid" % (idx, ZookeeperService.DATA))
|
||||
|
||||
self.security_config.setup_node(node)
|
||||
config_file = self.render('zookeeper.properties')
|
||||
self.logger.info("zookeeper.properties:")
|
||||
self.logger.info(config_file)
|
||||
node.account.create_file("%s/zookeeper.properties" % ZookeeperService.ROOT, config_file)
|
||||
|
||||
heap_kafka_opts = "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=%s" % self.logs["zk_heap_dump_file"]["path"]
|
||||
other_kafka_opts = self.kafka_opts + ' ' + self.security_system_properties \
|
||||
if self.security_config.zk_sasl else self.kafka_opts
|
||||
start_cmd = "export KAFKA_OPTS=\"%s %s\";" % (heap_kafka_opts, other_kafka_opts)
|
||||
start_cmd += "%s " % self.path.script("zookeeper-server-start.sh", node)
|
||||
start_cmd += "%s/zookeeper.properties &>> %s &" % (ZookeeperService.ROOT, self.logs["zk_log"]["path"])
|
||||
node.account.ssh(start_cmd)
|
||||
|
||||
wait_until(lambda: self.listening(node), timeout_sec=30, err_msg="Zookeeper node failed to start")
|
||||
|
||||
def listening(self, node):
|
||||
try:
|
||||
port = 2181 if self.zk_client_port else 2182
|
||||
cmd = "nc -z %s %s" % (node.account.hostname, port)
|
||||
node.account.ssh_output(cmd, allow_fail=False)
|
||||
self.logger.debug("Zookeeper started accepting connections at: '%s:%s')", node.account.hostname, port)
|
||||
return True
|
||||
except (RemoteCommandError, ValueError) as e:
|
||||
return False
|
||||
|
||||
def pids(self, node):
|
||||
return node.account.java_pids(self.java_class_name())
|
||||
|
||||
def alive(self, node):
|
||||
return len(self.pids(node)) > 0
|
||||
|
||||
def stop_node(self, node):
|
||||
idx = self.idx(node)
|
||||
self.logger.info("Stopping %s node %d on %s" % (type(self).__name__, idx, node.account.hostname))
|
||||
node.account.kill_java_processes(self.java_class_name(), allow_fail=False)
|
||||
node.account.kill_java_processes(self.java_cli_class_name(), allow_fail=False)
|
||||
wait_until(lambda: not self.alive(node), timeout_sec=5, err_msg="Timed out waiting for zookeeper to stop.")
|
||||
|
||||
def clean_node(self, node):
|
||||
self.logger.info("Cleaning ZK node %d on %s", self.idx(node), node.account.hostname)
|
||||
if self.alive(node):
|
||||
self.logger.warn("%s %s was still alive at cleanup time. Killing forcefully..." %
|
||||
(self.__class__.__name__, node.account))
|
||||
node.account.kill_java_processes(self.java_class_name(),
|
||||
clean_shutdown=False, allow_fail=True)
|
||||
node.account.kill_java_processes(self.java_cli_class_name(),
|
||||
clean_shutdown=False, allow_fail=False)
|
||||
node.account.ssh("rm -rf -- %s" % ZookeeperService.ROOT, allow_fail=False)
|
||||
|
||||
|
||||
# force_tls is a necessary option for the case where we define both encrypted and non-encrypted ports
|
||||
def connect_setting(self, chroot=None, force_tls=False):
|
||||
if chroot and not chroot.startswith("/"):
|
||||
raise Exception("ZK chroot must start with '/', invalid chroot: %s" % chroot)
|
||||
|
||||
chroot = '' if chroot is None else chroot
|
||||
return ','.join([node.account.hostname + (':2182' if not self.zk_client_port or force_tls else ':2181') + chroot
|
||||
for node in self.nodes])
|
||||
|
||||
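For reference, the connection string built by connect_setting() has the following shape (hostnames here are hypothetical; port 2182 is used when the plaintext client port is disabled or force_tls is set):

```
# zk.connect_setting()                -> "zk1:2181,zk2:2181,zk3:2181"
# zk.connect_setting(chroot="/kafka") -> "zk1:2181/kafka,zk2:2181/kafka,zk3:2181/kafka"
# zk.connect_setting(force_tls=True)  -> "zk1:2182,zk2:2182,zk3:2182"
```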
def zkTlsConfigFileOption(self, forZooKeeperMain=False):
|
||||
if not self.zk_client_secure_port:
|
||||
return ""
|
||||
return ("-zk-tls-config-file " if forZooKeeperMain else "--zk-tls-config-file ") + \
|
||||
(SecurityConfig.ZK_CLIENT_TLS_ENCRYPT_ONLY_CONFIG_PATH if self.zk_tls_encrypt_only else SecurityConfig.ZK_CLIENT_MUTUAL_AUTH_CONFIG_PATH)
|
||||
|
||||
#
|
||||
# This call is used to simulate a rolling upgrade to enable/disable
|
||||
# the use of ZooKeeper ACLs.
|
||||
#
|
||||
def zookeeper_migration(self, node, zk_acl):
|
||||
la_migra_cmd = "export KAFKA_OPTS=\"%s\";" % \
|
||||
self.security_system_properties if self.security_config.zk_sasl else ""
|
||||
la_migra_cmd += "%s --zookeeper.acl=%s --zookeeper.connect=%s %s" % \
|
||||
(self.path.script("zookeeper-security-migration.sh", node), zk_acl,
|
||||
self.connect_setting(force_tls=self.zk_client_secure_port),
|
||||
self.zkTlsConfigFileOption())
|
||||
node.account.ssh(la_migra_cmd)
|
||||
|
||||
def _check_chroot(self, chroot):
|
||||
if chroot and not chroot.startswith("/"):
|
||||
raise Exception("ZK chroot must start with '/', invalid chroot: %s" % chroot)
|
||||
|
||||
def query(self, path, chroot=None):
|
||||
"""
|
||||
Queries zookeeper for data associated with 'path' and returns all fields in the schema
|
||||
"""
|
||||
self._check_chroot(chroot)
|
||||
|
||||
chroot_path = ('' if chroot is None else chroot) + path
|
||||
|
||||
kafka_run_class = self.path.script("kafka-run-class.sh", DEV_BRANCH)
|
||||
cmd = "%s %s -server %s %s get %s" % \
|
||||
(kafka_run_class, self.java_cli_class_name(), self.connect_setting(force_tls=self.zk_client_secure_port),
|
||||
self.zkTlsConfigFileOption(True),
|
||||
chroot_path)
|
||||
self.logger.debug(cmd)
|
||||
|
||||
node = self.nodes[0]
|
||||
result = None
|
||||
for line in node.account.ssh_capture(cmd, allow_fail=True):
|
||||
# loop through all lines in the output, but only hold on to the first match
|
||||
if result is None:
|
||||
match = re.match("^({.+})$", line)
|
||||
if match is not None:
|
||||
result = match.groups()[0]
|
||||
return result
|
||||
|
||||
def create(self, path, chroot=None, value=""):
|
||||
"""
|
||||
Create a znode at the given path
|
||||
"""
|
||||
self._check_chroot(chroot)
|
||||
|
||||
chroot_path = ('' if chroot is None else chroot) + path
|
||||
|
||||
kafka_run_class = self.path.script("kafka-run-class.sh", DEV_BRANCH)
|
||||
cmd = "%s %s -server %s %s create %s '%s'" % \
|
||||
(kafka_run_class, self.java_cli_class_name(), self.connect_setting(force_tls=self.zk_client_secure_port),
|
||||
self.zkTlsConfigFileOption(True),
|
||||
chroot_path, value)
|
||||
self.logger.debug(cmd)
|
||||
output = self.nodes[0].account.ssh_output(cmd)
|
||||
self.logger.debug(output)
|
||||
|
||||
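A brief usage sketch of the znode helpers above, with placeholder paths and values, assuming zk is a started ZookeeperService:

```
zk.create("/my_config", value='{"enabled": true}')   # store a small JSON blob in a znode
data = zk.query("/my_config")                        # -> '{"enabled": true}' (first "{...}" line), or None
```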
def describe(self, topic):
|
||||
"""
|
||||
Describe the given topic using the ConfigCommand CLI
|
||||
"""
|
||||
|
||||
kafka_run_class = self.path.script("kafka-run-class.sh", DEV_BRANCH)
|
||||
cmd = "%s kafka.admin.ConfigCommand --zookeeper %s %s --describe --topic %s" % \
|
||||
(kafka_run_class, self.connect_setting(force_tls=self.zk_client_secure_port),
|
||||
self.zkTlsConfigFileOption(),
|
||||
topic)
|
||||
self.logger.debug(cmd)
|
||||
output = self.nodes[0].account.ssh_output(cmd)
|
||||
self.logger.debug(output)
|
||||
|
||||
def list_acls(self, topic):
|
||||
"""
|
||||
List ACLs for the given topic using the AclCommand CLI
|
||||
"""
|
||||
|
||||
kafka_run_class = self.path.script("kafka-run-class.sh", DEV_BRANCH)
|
||||
cmd = "%s kafka.admin.AclCommand --authorizer-properties zookeeper.connect=%s %s --list --topic %s" % \
|
||||
(kafka_run_class, self.connect_setting(force_tls=self.zk_client_secure_port),
|
||||
self.zkTlsConfigFileOption(),
|
||||
topic)
|
||||
self.logger.debug(cmd)
|
||||
output = self.nodes[0].account.ssh_output(cmd)
|
||||
self.logger.debug(output)
|
||||
|
||||
def java_class_name(self):
|
||||
""" The class name of the Zookeeper quorum peers. """
|
||||
return "org.apache.zookeeper.server.quorum.QuorumPeerMain"
|
||||
|
||||
def java_cli_class_name(self):
|
||||
""" The class name of the Zookeeper tool within Kafka. """
|
||||
return "org.apache.zookeeper.ZooKeeperMainWithTlsSupportForKafka"
|
||||
14  tests/kafkatest/tests/__init__.py  Normal file
@@ -0,0 +1,14 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
14  tests/kafkatest/tests/client/__init__.py  Normal file
@@ -0,0 +1,14 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
@@ -0,0 +1,123 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import errno
import os
|
||||
import time
|
||||
from random import randint
|
||||
|
||||
from ducktape.mark import parametrize
|
||||
from ducktape.tests.test import TestContext
|
||||
|
||||
from kafkatest.services.zookeeper import ZookeeperService
|
||||
from kafkatest.services.kafka import KafkaService
|
||||
from ducktape.tests.test import Test
|
||||
from kafkatest.version import DEV_BRANCH, LATEST_0_10_0, LATEST_0_10_1, LATEST_0_10_2, LATEST_0_11_0, LATEST_1_0, LATEST_1_1, LATEST_2_0, LATEST_2_1, LATEST_2_2, LATEST_2_3, LATEST_2_4, V_0_11_0_0, V_0_10_1_0, KafkaVersion
|
||||
|
||||
def get_broker_features(broker_version):
|
||||
features = {}
|
||||
if broker_version < V_0_10_1_0:
|
||||
features["create-topics-supported"] = False
|
||||
features["offsets-for-times-supported"] = False
|
||||
features["cluster-id-supported"] = False
|
||||
features["expect-record-too-large-exception"] = True
|
||||
else:
|
||||
features["create-topics-supported"] = True
|
||||
features["offsets-for-times-supported"] = True
|
||||
features["cluster-id-supported"] = True
|
||||
features["expect-record-too-large-exception"] = False
|
||||
if broker_version < V_0_11_0_0:
|
||||
features["describe-acls-supported"] = False
|
||||
else:
|
||||
features["describe-acls-supported"] = True
|
||||
return features
|
||||
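For example, for a broker at 0.10.1 (at or above V_0_10_1_0 but below V_0_11_0_0) the function yields the map below, which invoke_compatibility_program() later turns into `--<feature> <value>` arguments for ClientCompatibilityTest:

```
features = get_broker_features(V_0_10_1_0)
# features == {
#     "create-topics-supported": True,
#     "offsets-for-times-supported": True,
#     "cluster-id-supported": True,
#     "expect-record-too-large-exception": False,
#     "describe-acls-supported": False,
# }
```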
|
||||
def run_command(node, cmd, ssh_log_file):
|
||||
with open(ssh_log_file, 'w') as f:
|
||||
f.write("Running %s\n" % cmd)
|
||||
try:
|
||||
for line in node.account.ssh_capture(cmd):
|
||||
f.write(line)
|
||||
except Exception as e:
|
||||
f.write("** Command failed!")
|
||||
print e
|
||||
raise
|
||||
|
||||
|
||||
class ClientCompatibilityFeaturesTest(Test):
|
||||
"""
|
||||
Tests clients for the presence or absence of specific features when communicating with brokers of various
|
||||
versions. Relies on ClientCompatibilityTest.java for much of the functionality.
|
||||
"""
|
||||
|
||||
def __init__(self, test_context):
|
||||
""":type test_context: ducktape.tests.test.TestContext"""
|
||||
super(ClientCompatibilityFeaturesTest, self).__init__(test_context=test_context)
|
||||
|
||||
self.zk = ZookeeperService(test_context, num_nodes=3)
|
||||
|
||||
# Generate a unique topic name
|
||||
topic_name = "client_compat_features_topic_%d%d" % (int(time.time()), randint(0, 2147483647))
|
||||
self.topics = { topic_name: {
|
||||
"partitions": 1, # Use only one partition to avoid worrying about ordering
|
||||
"replication-factor": 3
|
||||
}}
|
||||
self.kafka = KafkaService(test_context, num_nodes=3, zk=self.zk, topics=self.topics)
|
||||
|
||||
def invoke_compatibility_program(self, features):
|
||||
# Run the compatibility test on the first Kafka node.
|
||||
node = self.zk.nodes[0]
|
||||
cmd = ("%s org.apache.kafka.tools.ClientCompatibilityTest "
|
||||
"--bootstrap-server %s "
|
||||
"--num-cluster-nodes %d "
|
||||
"--topic %s " % (self.zk.path.script("kafka-run-class.sh", node),
|
||||
self.kafka.bootstrap_servers(),
|
||||
len(self.kafka.nodes),
|
||||
self.topics.keys()[0]))
|
||||
for k, v in features.iteritems():
|
||||
cmd = cmd + ("--%s %s " % (k, v))
|
||||
results_dir = TestContext.results_dir(self.test_context, 0)
|
||||
try:
|
||||
os.makedirs(results_dir)
|
||||
except OSError as e:
|
||||
if e.errno == errno.EEXIST and os.path.isdir(results_dir):
|
||||
pass
|
||||
else:
|
||||
raise
|
||||
ssh_log_file = "%s/%s" % (results_dir, "client_compatibility_test_output.txt")
|
||||
try:
|
||||
self.logger.info("Running %s" % cmd)
|
||||
run_command(node, cmd, ssh_log_file)
|
||||
except Exception as e:
|
||||
self.logger.info("** Command failed. See %s for log messages." % ssh_log_file)
|
||||
raise
|
||||
|
||||
@parametrize(broker_version=str(DEV_BRANCH))
|
||||
@parametrize(broker_version=str(LATEST_0_10_0))
|
||||
@parametrize(broker_version=str(LATEST_0_10_1))
|
||||
@parametrize(broker_version=str(LATEST_0_10_2))
|
||||
@parametrize(broker_version=str(LATEST_0_11_0))
|
||||
@parametrize(broker_version=str(LATEST_1_0))
|
||||
@parametrize(broker_version=str(LATEST_1_1))
|
||||
@parametrize(broker_version=str(LATEST_2_0))
|
||||
@parametrize(broker_version=str(LATEST_2_1))
|
||||
@parametrize(broker_version=str(LATEST_2_2))
|
||||
@parametrize(broker_version=str(LATEST_2_3))
|
||||
@parametrize(broker_version=str(LATEST_2_4))
|
||||
def run_compatibility_test(self, broker_version):
|
||||
self.zk.start()
|
||||
self.kafka.set_version(KafkaVersion(broker_version))
|
||||
self.kafka.start()
|
||||
features = get_broker_features(broker_version)
|
||||
self.invoke_compatibility_program(features)
|
||||
@@ -0,0 +1,84 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from ducktape.mark import parametrize
|
||||
from ducktape.utils.util import wait_until
|
||||
|
||||
from kafkatest.services.zookeeper import ZookeeperService
|
||||
from kafkatest.services.kafka import KafkaService
|
||||
from kafkatest.services.verifiable_producer import VerifiableProducer
|
||||
from kafkatest.services.console_consumer import ConsoleConsumer
|
||||
from kafkatest.tests.produce_consume_validate import ProduceConsumeValidateTest
|
||||
from kafkatest.utils import is_int_with_prefix
|
||||
from kafkatest.version import DEV_BRANCH, LATEST_0_10_0, LATEST_0_10_1, LATEST_0_10_2, LATEST_0_11_0, LATEST_1_0, LATEST_1_1, LATEST_2_0, LATEST_2_1, LATEST_2_2, LATEST_2_3, LATEST_2_4, KafkaVersion
|
||||
|
||||
class ClientCompatibilityProduceConsumeTest(ProduceConsumeValidateTest):
|
||||
"""
|
||||
These tests validate that we can use a new client to produce and consume from older brokers.
|
||||
"""
|
||||
|
||||
def __init__(self, test_context):
|
||||
""":type test_context: ducktape.tests.test.TestContext"""
|
||||
super(ClientCompatibilityProduceConsumeTest, self).__init__(test_context=test_context)
|
||||
|
||||
self.topic = "test_topic"
|
||||
self.zk = ZookeeperService(test_context, num_nodes=3)
|
||||
self.kafka = KafkaService(test_context, num_nodes=3, zk=self.zk, topics={self.topic:{
|
||||
"partitions": 10,
|
||||
"replication-factor": 2}})
|
||||
self.num_partitions = 10
|
||||
self.timeout_sec = 60
|
||||
self.producer_throughput = 1000
|
||||
self.num_producers = 2
|
||||
self.messages_per_producer = 1000
|
||||
self.num_consumers = 1
|
||||
|
||||
def setUp(self):
|
||||
self.zk.start()
|
||||
|
||||
def min_cluster_size(self):
|
||||
# Override this since we're adding services outside of the constructor
|
||||
return super(ClientCompatibilityProduceConsumeTest, self).min_cluster_size() + self.num_producers + self.num_consumers
|
||||
|
||||
@parametrize(broker_version=str(DEV_BRANCH))
|
||||
@parametrize(broker_version=str(LATEST_0_10_0))
|
||||
@parametrize(broker_version=str(LATEST_0_10_1))
|
||||
@parametrize(broker_version=str(LATEST_0_10_2))
|
||||
@parametrize(broker_version=str(LATEST_0_11_0))
|
||||
@parametrize(broker_version=str(LATEST_1_0))
|
||||
@parametrize(broker_version=str(LATEST_1_1))
|
||||
@parametrize(broker_version=str(LATEST_2_0))
|
||||
@parametrize(broker_version=str(LATEST_2_1))
|
||||
@parametrize(broker_version=str(LATEST_2_2))
|
||||
@parametrize(broker_version=str(LATEST_2_3))
|
||||
@parametrize(broker_version=str(LATEST_2_4))
|
||||
def test_produce_consume(self, broker_version):
|
||||
print("running producer_consumer_compat with broker_version = %s" % broker_version)
|
||||
self.kafka.set_version(KafkaVersion(broker_version))
|
||||
self.kafka.security_protocol = "PLAINTEXT"
|
||||
self.kafka.interbroker_security_protocol = self.kafka.security_protocol
|
||||
self.producer = VerifiableProducer(self.test_context, self.num_producers, self.kafka,
|
||||
self.topic, throughput=self.producer_throughput,
|
||||
message_validator=is_int_with_prefix)
|
||||
self.consumer = ConsoleConsumer(self.test_context, self.num_consumers, self.kafka, self.topic,
|
||||
consumer_timeout_ms=60000,
|
||||
message_validator=is_int_with_prefix)
|
||||
self.kafka.start()
|
||||
|
||||
self.run_produce_consume_validate(lambda: wait_until(
|
||||
lambda: self.producer.each_produced_at_least(self.messages_per_producer) == True,
|
||||
timeout_sec=120, backoff_sec=1,
|
||||
err_msg="Producer did not produce all messages in reasonable amount of time"))
|
||||
|
||||
87  tests/kafkatest/tests/client/compression_test.py  Normal file
@@ -0,0 +1,87 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from ducktape.mark import parametrize
|
||||
from ducktape.utils.util import wait_until
|
||||
from ducktape.mark.resource import cluster
|
||||
|
||||
from kafkatest.services.zookeeper import ZookeeperService
|
||||
from kafkatest.services.kafka import KafkaService
|
||||
from kafkatest.services.verifiable_producer import VerifiableProducer
|
||||
from kafkatest.services.console_consumer import ConsoleConsumer
|
||||
from kafkatest.tests.produce_consume_validate import ProduceConsumeValidateTest
|
||||
from kafkatest.utils import is_int_with_prefix
|
||||
|
||||
|
||||
class CompressionTest(ProduceConsumeValidateTest):
|
||||
"""
|
||||
These tests validate produce / consume for compressed topics.
|
||||
"""
|
||||
COMPRESSION_TYPES = ["snappy", "gzip", "lz4", "zstd", "none"]
|
||||
|
||||
def __init__(self, test_context):
|
||||
""":type test_context: ducktape.tests.test.TestContext"""
|
||||
super(CompressionTest, self).__init__(test_context=test_context)
|
||||
|
||||
self.topic = "test_topic"
|
||||
self.zk = ZookeeperService(test_context, num_nodes=1)
|
||||
self.kafka = KafkaService(test_context, num_nodes=1, zk=self.zk, topics={self.topic: {
|
||||
"partitions": 10,
|
||||
"replication-factor": 1}})
|
||||
self.num_partitions = 10
|
||||
self.timeout_sec = 60
|
||||
self.producer_throughput = 1000
|
||||
self.num_producers = len(self.COMPRESSION_TYPES)
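# one producer per compression type: each producer in the VerifiableProducer group uses the compression type at its index in compression_types (see test_compressed_topic below)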
|
||||
self.messages_per_producer = 1000
|
||||
self.num_consumers = 1
|
||||
|
||||
def setUp(self):
|
||||
self.zk.start()
|
||||
|
||||
def min_cluster_size(self):
|
||||
# Override this since we're adding services outside of the constructor
|
||||
return super(CompressionTest, self).min_cluster_size() + self.num_producers + self.num_consumers
|
||||
|
||||
@cluster(num_nodes=8)
|
||||
@parametrize(compression_types=COMPRESSION_TYPES)
|
||||
def test_compressed_topic(self, compression_types):
|
||||
"""Test produce => consume => validate for compressed topics
|
||||
Setup: 1 zk, 1 kafka node, 1 topic with partitions=10, replication-factor=1
|
||||
|
||||
compression_types parameter gives a list of compression types (or no compression if
|
||||
"none"). Each producer in a VerifiableProducer group (num_producers = number of compression
|
||||
types) will use a compression type from the list based on producer's index in the group.
|
||||
|
||||
- Produce messages in the background
|
||||
- Consume messages in the background
|
||||
- Stop producing, and finish consuming
|
||||
- Validate that every acked message was consumed
|
||||
"""
|
||||
|
||||
self.kafka.security_protocol = "PLAINTEXT"
|
||||
self.kafka.interbroker_security_protocol = self.kafka.security_protocol
|
||||
self.producer = VerifiableProducer(self.test_context, self.num_producers, self.kafka,
|
||||
self.topic, throughput=self.producer_throughput,
|
||||
message_validator=is_int_with_prefix,
|
||||
compression_types=compression_types)
|
||||
self.consumer = ConsoleConsumer(self.test_context, self.num_consumers, self.kafka, self.topic,
|
||||
consumer_timeout_ms=60000, message_validator=is_int_with_prefix)
|
||||
self.kafka.start()
|
||||
|
||||
self.run_produce_consume_validate(lambda: wait_until(
|
||||
lambda: self.producer.each_produced_at_least(self.messages_per_producer) == True,
|
||||
timeout_sec=120, backoff_sec=1,
|
||||
err_msg="Producer did not produce all messages in reasonable amount of time"))
|
||||
|
||||
@@ -0,0 +1,86 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from ducktape.mark.resource import cluster
|
||||
|
||||
|
||||
from kafkatest.tests.verifiable_consumer_test import VerifiableConsumerTest
|
||||
from kafkatest.services.kafka import TopicPartition
|
||||
|
||||
class ConsumerRollingUpgradeTest(VerifiableConsumerTest):
|
||||
TOPIC = "test_topic"
|
||||
NUM_PARTITIONS = 4
|
||||
RANGE = "org.apache.kafka.clients.consumer.RangeAssignor"
|
||||
ROUND_ROBIN = "org.apache.kafka.clients.consumer.RoundRobinAssignor"
|
||||
|
||||
def __init__(self, test_context):
|
||||
super(ConsumerRollingUpgradeTest, self).__init__(test_context, num_consumers=2, num_producers=0,
|
||||
num_zk=1, num_brokers=1, topics={
|
||||
self.TOPIC : { 'partitions': self.NUM_PARTITIONS, 'replication-factor': 1 }
|
||||
})
|
||||
|
||||
def _verify_range_assignment(self, consumer):
|
||||
# range assignment should give us two partition sets: (0, 1) and (2, 3)
|
||||
assignment = set([frozenset(partitions) for partitions in consumer.current_assignment().values()])
|
||||
assert assignment == set([
|
||||
frozenset([TopicPartition(self.TOPIC, 0), TopicPartition(self.TOPIC, 1)]),
|
||||
frozenset([TopicPartition(self.TOPIC, 2), TopicPartition(self.TOPIC, 3)])]), \
|
||||
"Mismatched assignment: %s" % assignment
|
||||
|
||||
def _verify_roundrobin_assignment(self, consumer):
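# round-robin assignment should interleave the four partitions across the two consumers: (0, 2) and (1, 3)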
|
||||
assignment = set([frozenset(x) for x in consumer.current_assignment().values()])
|
||||
assert assignment == set([
|
||||
frozenset([TopicPartition(self.TOPIC, 0), TopicPartition(self.TOPIC, 2)]),
|
||||
frozenset([TopicPartition(self.TOPIC, 1), TopicPartition(self.TOPIC, 3)])]), \
|
||||
"Mismatched assignment: %s" % assignment
|
||||
|
||||
@cluster(num_nodes=4)
|
||||
def rolling_update_test(self):
|
||||
"""
|
||||
Verify that rolling updates of the partition assignment strategy work correctly. In this
test, we use a rolling restart to change the group's assignment strategy from "range"
to "roundrobin." We verify after every restart that all members are still in the group
and that the correct assignment strategy was used.
"""
|
||||
|
||||
# initialize the consumer using range assignment
|
||||
consumer = self.setup_consumer(self.TOPIC, assignment_strategy=self.RANGE)
|
||||
|
||||
consumer.start()
|
||||
self.await_all_members(consumer)
|
||||
self._verify_range_assignment(consumer)
|
||||
|
||||
# change consumer configuration to prefer round-robin assignment, but still support range assignment
|
||||
consumer.assignment_strategy = self.ROUND_ROBIN + "," + self.RANGE
|
||||
|
||||
# restart one of the nodes and verify that we are still using range assignment
|
||||
consumer.stop_node(consumer.nodes[0])
|
||||
consumer.start_node(consumer.nodes[0])
|
||||
self.await_all_members(consumer)
|
||||
self._verify_range_assignment(consumer)
|
||||
|
||||
# now restart the other node and verify that we have switched to round-robin
|
||||
consumer.stop_node(consumer.nodes[1])
|
||||
consumer.start_node(consumer.nodes[1])
|
||||
self.await_all_members(consumer)
|
||||
self._verify_roundrobin_assignment(consumer)
|
||||
|
||||
# if we want, we can now drop support for range assignment
|
||||
consumer.assignment_strategy = self.ROUND_ROBIN
|
||||
for node in consumer.nodes:
|
||||
consumer.stop_node(node)
|
||||
consumer.start_node(node)
|
||||
self.await_all_members(consumer)
|
||||
self._verify_roundrobin_assignment(consumer)
|
||||
430
tests/kafkatest/tests/client/consumer_test.py
Normal file
@@ -0,0 +1,430 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from ducktape.mark import matrix
|
||||
from ducktape.utils.util import wait_until
|
||||
from ducktape.mark.resource import cluster
|
||||
|
||||
from kafkatest.tests.verifiable_consumer_test import VerifiableConsumerTest
|
||||
from kafkatest.services.kafka import TopicPartition
|
||||
|
||||
import signal
|
||||
|
||||
|
||||
class OffsetValidationTest(VerifiableConsumerTest):
|
||||
TOPIC = "test_topic"
|
||||
NUM_PARTITIONS = 1
|
||||
|
||||
def __init__(self, test_context):
|
||||
super(OffsetValidationTest, self).__init__(test_context, num_consumers=3, num_producers=1,
|
||||
num_zk=1, num_brokers=2, topics={
|
||||
self.TOPIC : { 'partitions': self.NUM_PARTITIONS, 'replication-factor': 2 }
|
||||
})
|
||||
|
||||
def rolling_bounce_consumers(self, consumer, keep_alive=0, num_bounces=5, clean_shutdown=True):
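# Bounce the consumers one at a time (leaving the first `keep_alive` nodes untouched), waiting for each node to shut down before restarting it, then let the group re-stabilize and make progress.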
|
||||
for _ in range(num_bounces):
|
||||
for node in consumer.nodes[keep_alive:]:
|
||||
consumer.stop_node(node, clean_shutdown)
|
||||
|
||||
wait_until(lambda: len(consumer.dead_nodes()) == 1,
|
||||
timeout_sec=self.session_timeout_sec+5,
|
||||
err_msg="Timed out waiting for the consumer to shutdown")
|
||||
|
||||
consumer.start_node(node)
|
||||
|
||||
self.await_all_members(consumer)
|
||||
self.await_consumed_messages(consumer)
|
||||
|
||||
def bounce_all_consumers(self, consumer, keep_alive=0, num_bounces=5, clean_shutdown=True):
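# Bounce the consumers together: stop every node except the first `keep_alive`, wait for all of them to shut down, then restart them and wait for the group to re-stabilize and make progress.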
|
||||
for _ in range(num_bounces):
|
||||
for node in consumer.nodes[keep_alive:]:
|
||||
consumer.stop_node(node, clean_shutdown)
|
||||
|
||||
wait_until(lambda: len(consumer.dead_nodes()) == self.num_consumers - keep_alive, timeout_sec=10,
|
||||
err_msg="Timed out waiting for the consumers to shutdown")
|
||||
|
||||
for node in consumer.nodes[keep_alive:]:
|
||||
consumer.start_node(node)
|
||||
|
||||
self.await_all_members(consumer)
|
||||
self.await_consumed_messages(consumer)
|
||||
|
||||
def rolling_bounce_brokers(self, consumer, num_bounces=5, clean_shutdown=True):
|
||||
for _ in range(num_bounces):
|
||||
for node in self.kafka.nodes:
|
||||
self.kafka.restart_node(node, clean_shutdown=True)
|
||||
self.await_all_members(consumer)
|
||||
self.await_consumed_messages(consumer)
|
||||
|
||||
def setup_consumer(self, topic, **kwargs):
|
||||
# collect verifiable consumer events since this makes debugging much easier
|
||||
consumer = super(OffsetValidationTest, self).setup_consumer(topic, **kwargs)
|
||||
self.mark_for_collect(consumer, 'verifiable_consumer_stdout')
|
||||
return consumer
|
||||
|
||||
@cluster(num_nodes=7)
|
||||
def test_broker_rolling_bounce(self):
|
||||
"""
|
||||
Verify correct consumer behavior when the brokers are consecutively restarted.
|
||||
|
||||
Setup: single Kafka cluster with one producer writing messages to a single topic with one
|
||||
partition, and a set of consumers in the same group reading from the same topic.
|
||||
|
||||
- Start a producer which continues producing new messages throughout the test.
|
||||
- Start up the consumers and wait until they've joined the group.
|
||||
- In a loop, restart each broker consecutively, waiting for the group to stabilize between
|
||||
each broker restart.
|
||||
- Verify delivery semantics according to the failure type and that the broker bounces
|
||||
did not cause unexpected group rebalances.
|
||||
"""
|
||||
partition = TopicPartition(self.TOPIC, 0)
|
||||
|
||||
producer = self.setup_producer(self.TOPIC)
|
||||
consumer = self.setup_consumer(self.TOPIC)
|
||||
|
||||
producer.start()
|
||||
self.await_produced_messages(producer)
|
||||
|
||||
consumer.start()
|
||||
self.await_all_members(consumer)
|
||||
|
||||
num_rebalances = consumer.num_rebalances()
|
||||
# TODO: make this test work with hard shutdowns, which probably requires
|
||||
# pausing before the node is restarted to ensure that any ephemeral
|
||||
# nodes have time to expire
|
||||
self.rolling_bounce_brokers(consumer, clean_shutdown=True)
|
||||
|
||||
unexpected_rebalances = consumer.num_rebalances() - num_rebalances
|
||||
assert unexpected_rebalances == 0, \
|
||||
"Broker rolling bounce caused %d unexpected group rebalances" % unexpected_rebalances
|
||||
|
||||
consumer.stop_all()
|
||||
|
||||
assert consumer.current_position(partition) == consumer.total_consumed(), \
|
||||
"Total consumed records %d did not match consumed position %d" % \
|
||||
(consumer.total_consumed(), consumer.current_position(partition))
|
||||
|
||||
@cluster(num_nodes=7)
|
||||
@matrix(clean_shutdown=[True], bounce_mode=["all", "rolling"])
|
||||
def test_consumer_bounce(self, clean_shutdown, bounce_mode):
|
||||
"""
|
||||
Verify correct consumer behavior when the consumers in the group are consecutively restarted.
|
||||
|
||||
Setup: single Kafka cluster with one producer and a set of consumers in one group.
|
||||
|
||||
- Start a producer which continues producing new messages throughout the test.
|
||||
- Start up the consumers and wait until they've joined the group.
|
||||
- In a loop, restart each consumer, waiting for each one to rejoin the group before
|
||||
restarting the rest.
|
||||
- Verify delivery semantics according to the failure type.
|
||||
"""
|
||||
partition = TopicPartition(self.TOPIC, 0)
|
||||
|
||||
producer = self.setup_producer(self.TOPIC)
|
||||
consumer = self.setup_consumer(self.TOPIC)
|
||||
|
||||
producer.start()
|
||||
self.await_produced_messages(producer)
|
||||
|
||||
consumer.start()
|
||||
self.await_all_members(consumer)
|
||||
|
||||
if bounce_mode == "all":
|
||||
self.bounce_all_consumers(consumer, clean_shutdown=clean_shutdown)
|
||||
else:
|
||||
self.rolling_bounce_consumers(consumer, clean_shutdown=clean_shutdown)
|
||||
|
||||
consumer.stop_all()
|
||||
if clean_shutdown:
|
||||
# if the total records consumed matches the current position, we haven't seen any duplicates
|
||||
# this can only be guaranteed with a clean shutdown
|
||||
assert consumer.current_position(partition) == consumer.total_consumed(), \
|
||||
"Total consumed records %d did not match consumed position %d" % \
|
||||
(consumer.total_consumed(), consumer.current_position(partition))
|
||||
else:
|
||||
# we may have duplicates in a hard failure
|
||||
assert consumer.current_position(partition) <= consumer.total_consumed(), \
|
||||
"Current position %d greater than the total number of consumed records %d" % \
|
||||
(consumer.current_position(partition), consumer.total_consumed())
|
||||
|
||||
@cluster(num_nodes=7)
|
||||
@matrix(clean_shutdown=[True], static_membership=[True, False], bounce_mode=["all", "rolling"], num_bounces=[5])
|
||||
def test_static_consumer_bounce(self, clean_shutdown, static_membership, bounce_mode, num_bounces):
|
||||
"""
|
||||
Verify correct static consumer behavior when the consumers in the group are restarted. To make
sure that static members behave differently from dynamic ones, this test covers both static and dynamic
membership.
|
||||
|
||||
Setup: single Kafka cluster with one producer and a set of consumers in one group.
|
||||
|
||||
- Start a producer which continues producing new messages throughout the test.
|
||||
- Start up the consumers as static/dynamic members and wait until they've joined the group.
|
||||
- In a loop, restart each consumer except the first member (note: may not be the leader), and expect no rebalance triggered
|
||||
during this process if the group is in static membership.
|
||||
"""
|
||||
partition = TopicPartition(self.TOPIC, 0)
|
||||
|
||||
producer = self.setup_producer(self.TOPIC)
|
||||
|
||||
producer.start()
|
||||
self.await_produced_messages(producer)
|
||||
|
||||
self.session_timeout_sec = 60
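# a longer session timeout here is presumably intended to keep bounced static members in the group until they rejoin with their group.instance.id (assumption based on the bounce loop below)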
|
||||
consumer = self.setup_consumer(self.TOPIC, static_membership=static_membership)
|
||||
|
||||
consumer.start()
|
||||
self.await_all_members(consumer)
|
||||
|
||||
num_revokes_before_bounce = consumer.num_revokes_for_alive()
|
||||
|
||||
num_keep_alive = 1
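# keep the first member alive across all bounces; num_revokes_for_alive() is used below to check how many revocations the surviving member observed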
|
||||
|
||||
if bounce_mode == "all":
|
||||
self.bounce_all_consumers(consumer, keep_alive=num_keep_alive, num_bounces=num_bounces)
|
||||
else:
|
||||
self.rolling_bounce_consumers(consumer, keep_alive=num_keep_alive, num_bounces=num_bounces)
|
||||
|
||||
num_revokes_after_bounce = consumer.num_revokes_for_alive() - num_revokes_before_bounce
|
||||
|
||||
check_condition = num_revokes_after_bounce != 0
|
||||
# under static membership, the live consumer shall not revoke any current running partitions,
|
||||
# since there is no global rebalance being triggered.
|
||||
if static_membership:
|
||||
check_condition = num_revokes_after_bounce == 0
|
||||
|
||||
assert check_condition, \
"Unexpected revoke count %d: expected 0 revokes with static membership and a non-zero count otherwise" % \
num_revokes_after_bounce
|
||||
|
||||
consumer.stop_all()
|
||||
if clean_shutdown:
|
||||
# if the total records consumed matches the current position, we haven't seen any duplicates
|
||||
# this can only be guaranteed with a clean shutdown
|
||||
assert consumer.current_position(partition) == consumer.total_consumed(), \
|
||||
"Total consumed records %d did not match consumed position %d" % \
|
||||
(consumer.total_consumed(), consumer.current_position(partition))
|
||||
else:
|
||||
# we may have duplicates in a hard failure
|
||||
assert consumer.current_position(partition) <= consumer.total_consumed(), \
|
||||
"Current position %d greater than the total number of consumed records %d" % \
|
||||
(consumer.current_position(partition), consumer.total_consumed())
|
||||
|
||||
@cluster(num_nodes=10)
|
||||
@matrix(num_conflict_consumers=[1, 2], fencing_stage=["stable", "all"])
|
||||
def test_fencing_static_consumer(self, num_conflict_consumers, fencing_stage):
|
||||
"""
|
||||
Verify correct static consumer behavior when there are conflicting consumers with the same group.instance.id.
|
||||
|
||||
- Start a producer which continues producing new messages throughout the test.
|
||||
- Start up the consumers as static members and wait until they've joined the group. Some conflicting consumers will be
configured with the same group.instance.id.
|
||||
- Let normal consumers and fencing consumers start at the same time, and expect only unique consumers left.
|
||||
"""
|
||||
partition = TopicPartition(self.TOPIC, 0)
|
||||
|
||||
producer = self.setup_producer(self.TOPIC)
|
||||
|
||||
producer.start()
|
||||
self.await_produced_messages(producer)
|
||||
|
||||
self.session_timeout_sec = 60
|
||||
consumer = self.setup_consumer(self.TOPIC, static_membership=True)
|
||||
|
||||
self.num_consumers = num_conflict_consumers
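# shrink num_consumers so the next setup_consumer call only provisions the conflicting members, which (per the docstring above) are configured with the same group.instance.id values as the original static members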
|
||||
conflict_consumer = self.setup_consumer(self.TOPIC, static_membership=True)
|
||||
|
||||
# wait for the original set of consumers to reach a stable state before starting the conflicting members.
|
||||
if fencing_stage == "stable":
|
||||
consumer.start()
|
||||
self.await_members(consumer, len(consumer.nodes))
|
||||
|
||||
conflict_consumer.start()
|
||||
self.await_members(conflict_consumer, num_conflict_consumers)
|
||||
self.await_members(consumer, len(consumer.nodes) - num_conflict_consumers)
|
||||
|
||||
assert len(consumer.dead_nodes()) == num_conflict_consumers
|
||||
else:
|
||||
consumer.start()
|
||||
conflict_consumer.start()
|
||||
|
||||
wait_until(lambda: len(consumer.joined_nodes()) + len(conflict_consumer.joined_nodes()) == len(consumer.nodes),
|
||||
timeout_sec=self.session_timeout_sec,
|
||||
err_msg="Timed out waiting for consumers to join, expected total %d joined, but only saw %d joined from "
"the normal consumer group and %d from the conflict consumer group" % \
|
||||
(len(consumer.nodes), len(consumer.joined_nodes()), len(conflict_consumer.joined_nodes()))
|
||||
)
|
||||
wait_until(lambda: len(consumer.dead_nodes()) + len(conflict_consumer.dead_nodes()) == len(conflict_consumer.nodes),
|
||||
timeout_sec=self.session_timeout_sec,
|
||||
err_msg="Timed out waiting for fenced consumers to die, expected total %d dead, but only saw %d dead in "
"the normal consumer group and %d dead in the conflict consumer group" % \
|
||||
(len(conflict_consumer.nodes), len(consumer.dead_nodes()), len(conflict_consumer.dead_nodes()))
|
||||
)
|
||||
|
||||
@cluster(num_nodes=7)
|
||||
@matrix(clean_shutdown=[True], enable_autocommit=[True, False])
|
||||
def test_consumer_failure(self, clean_shutdown, enable_autocommit):
|
||||
partition = TopicPartition(self.TOPIC, 0)
|
||||
|
||||
consumer = self.setup_consumer(self.TOPIC, enable_autocommit=enable_autocommit)
|
||||
producer = self.setup_producer(self.TOPIC)
|
||||
|
||||
consumer.start()
|
||||
self.await_all_members(consumer)
|
||||
|
||||
partition_owner = consumer.owner(partition)
|
||||
assert partition_owner is not None
|
||||
|
||||
# startup the producer and ensure that some records have been written
|
||||
producer.start()
|
||||
self.await_produced_messages(producer)
|
||||
|
||||
# stop the partition owner and await its shutdown
|
||||
consumer.kill_node(partition_owner, clean_shutdown=clean_shutdown)
|
||||
wait_until(lambda: len(consumer.joined_nodes()) == (self.num_consumers - 1) and consumer.owner(partition) != None,
|
||||
timeout_sec=self.session_timeout_sec*2+5,
|
||||
err_msg="Timed out waiting for consumer to close")
|
||||
|
||||
# ensure that the remaining consumer does some work after rebalancing
|
||||
self.await_consumed_messages(consumer, min_messages=1000)
|
||||
|
||||
consumer.stop_all()
|
||||
|
||||
if clean_shutdown:
|
||||
# if the total records consumed matches the current position, we haven't seen any duplicates
|
||||
# this can only be guaranteed with a clean shutdown
|
||||
assert consumer.current_position(partition) == consumer.total_consumed(), \
|
||||
"Total consumed records %d did not match consumed position %d" % \
|
||||
(consumer.total_consumed(), consumer.current_position(partition))
|
||||
else:
|
||||
# we may have duplicates in a hard failure
|
||||
assert consumer.current_position(partition) <= consumer.total_consumed(), \
|
||||
"Current position %d greater than the total number of consumed records %d" % \
|
||||
(consumer.current_position(partition), consumer.total_consumed())
|
||||
|
||||
# if autocommit is not turned on, we can also verify the last committed offset
|
||||
if not enable_autocommit:
|
||||
assert consumer.last_commit(partition) == consumer.current_position(partition), \
|
||||
"Last committed offset %d did not match last consumed position %d" % \
|
||||
(consumer.last_commit(partition), consumer.current_position(partition))
|
||||
|
||||
@cluster(num_nodes=7)
|
||||
@matrix(clean_shutdown=[True, False], enable_autocommit=[True, False])
|
||||
def test_broker_failure(self, clean_shutdown, enable_autocommit):
|
||||
partition = TopicPartition(self.TOPIC, 0)
|
||||
|
||||
consumer = self.setup_consumer(self.TOPIC, enable_autocommit=enable_autocommit)
|
||||
producer = self.setup_producer(self.TOPIC)
|
||||
|
||||
producer.start()
|
||||
consumer.start()
|
||||
self.await_all_members(consumer)
|
||||
|
||||
num_rebalances = consumer.num_rebalances()
|
||||
|
||||
# shutdown one of the brokers
|
||||
# TODO: we need a way to target the coordinator instead of picking arbitrarily
|
||||
self.kafka.signal_node(self.kafka.nodes[0], signal.SIGTERM if clean_shutdown else signal.SIGKILL)
|
||||
|
||||
# ensure that the consumers do some work after the broker failure
|
||||
self.await_consumed_messages(consumer, min_messages=1000)
|
||||
|
||||
# verify that there were no rebalances on failover
|
||||
assert num_rebalances == consumer.num_rebalances(), "Broker failure should not cause a rebalance"
|
||||
|
||||
consumer.stop_all()
|
||||
|
||||
# if the total records consumed matches the current position, we haven't seen any duplicates
|
||||
assert consumer.current_position(partition) == consumer.total_consumed(), \
|
||||
"Total consumed records %d did not match consumed position %d" % \
|
||||
(consumer.total_consumed(), consumer.current_position(partition))
|
||||
|
||||
# if autocommit is not turned on, we can also verify the last committed offset
|
||||
if not enable_autocommit:
|
||||
assert consumer.last_commit(partition) == consumer.current_position(partition), \
|
||||
"Last committed offset %d did not match last consumed position %d" % \
|
||||
(consumer.last_commit(partition), consumer.current_position(partition))
|
||||
|
||||
@cluster(num_nodes=7)
|
||||
def test_group_consumption(self):
|
||||
"""
|
||||
Verifies correct group rebalance behavior as consumers are started and stopped.
|
||||
In particular, this test verifies that the partition is readable after every
|
||||
expected rebalance.
|
||||
|
||||
Setup: single Kafka cluster with a group of consumers reading from one topic
|
||||
with one partition while the verifiable producer writes to it.
|
||||
|
||||
- Start the consumers one by one, verifying consumption after each rebalance
|
||||
- Shutdown the consumers one by one, verifying consumption after each rebalance
|
||||
"""
|
||||
consumer = self.setup_consumer(self.TOPIC)
|
||||
producer = self.setup_producer(self.TOPIC)
|
||||
|
||||
partition = TopicPartition(self.TOPIC, 0)
|
||||
|
||||
producer.start()
|
||||
|
||||
for num_started, node in enumerate(consumer.nodes, 1):
|
||||
consumer.start_node(node)
|
||||
self.await_members(consumer, num_started)
|
||||
self.await_consumed_messages(consumer)
|
||||
|
||||
for num_stopped, node in enumerate(consumer.nodes, 1):
|
||||
consumer.stop_node(node)
|
||||
|
||||
if num_stopped < self.num_consumers:
|
||||
self.await_members(consumer, self.num_consumers - num_stopped)
|
||||
self.await_consumed_messages(consumer)
|
||||
|
||||
assert consumer.current_position(partition) == consumer.total_consumed(), \
|
||||
"Total consumed records %d did not match consumed position %d" % \
|
||||
(consumer.total_consumed(), consumer.current_position(partition))
|
||||
|
||||
assert consumer.last_commit(partition) == consumer.current_position(partition), \
|
||||
"Last committed offset %d did not match last consumed position %d" % \
|
||||
(consumer.last_commit(partition), consumer.current_position(partition))
|
||||
|
||||
class AssignmentValidationTest(VerifiableConsumerTest):
|
||||
TOPIC = "test_topic"
|
||||
NUM_PARTITIONS = 6
|
||||
|
||||
def __init__(self, test_context):
|
||||
super(AssignmentValidationTest, self).__init__(test_context, num_consumers=3, num_producers=0,
|
||||
num_zk=1, num_brokers=2, topics={
|
||||
self.TOPIC : { 'partitions': self.NUM_PARTITIONS, 'replication-factor': 1 },
|
||||
})
|
||||
|
||||
@cluster(num_nodes=6)
|
||||
@matrix(assignment_strategy=["org.apache.kafka.clients.consumer.RangeAssignor",
|
||||
"org.apache.kafka.clients.consumer.RoundRobinAssignor",
|
||||
"org.apache.kafka.clients.consumer.StickyAssignor"])
|
||||
def test_valid_assignment(self, assignment_strategy):
|
||||
"""
|
||||
Verify assignment strategy correctness: each partition is assigned to exactly
|
||||
one consumer instance.
|
||||
|
||||
Setup: single Kafka cluster with a set of consumers in the same group.
|
||||
|
||||
- Start the consumers one by one
|
||||
- Validate assignment after every expected rebalance
|
||||
"""
|
||||
consumer = self.setup_consumer(self.TOPIC, assignment_strategy=assignment_strategy)
|
||||
for num_started, node in enumerate(consumer.nodes, 1):
|
||||
consumer.start_node(node)
|
||||
self.await_members(consumer, num_started)
|
||||
assert self.valid_assignment(self.TOPIC, self.NUM_PARTITIONS, consumer.current_assignment()), \
|
||||
"expected valid assignments of %d partitions when num_started %d: %s" % \
|
||||
(self.NUM_PARTITIONS, num_started, \
|
||||
[(str(node.account), a) for node, a in consumer.current_assignment().items()])
|
||||
104
tests/kafkatest/tests/client/message_format_change_test.py
Normal file
@@ -0,0 +1,104 @@
|
||||
# Copyright 2015 Confluent Inc.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from ducktape.mark import parametrize
|
||||
from ducktape.utils.util import wait_until
|
||||
from ducktape.mark.resource import cluster
|
||||
|
||||
from kafkatest.services.console_consumer import ConsoleConsumer
|
||||
from kafkatest.services.kafka import KafkaService
|
||||
from kafkatest.services.verifiable_producer import VerifiableProducer
|
||||
from kafkatest.services.zookeeper import ZookeeperService
|
||||
from kafkatest.tests.produce_consume_validate import ProduceConsumeValidateTest
|
||||
from kafkatest.utils import is_int
|
||||
from kafkatest.version import LATEST_0_9, LATEST_0_10, LATEST_0_11, DEV_BRANCH, KafkaVersion
|
||||
|
||||
|
||||
class MessageFormatChangeTest(ProduceConsumeValidateTest):
|
||||
|
||||
def __init__(self, test_context):
|
||||
super(MessageFormatChangeTest, self).__init__(test_context=test_context)
|
||||
|
||||
def setUp(self):
|
||||
self.topic = "test_topic"
|
||||
self.zk = ZookeeperService(self.test_context, num_nodes=1)
|
||||
|
||||
self.zk.start()
|
||||
|
||||
# Producer and consumer
|
||||
self.producer_throughput = 10000
|
||||
self.num_producers = 1
|
||||
self.num_consumers = 1
|
||||
self.messages_per_producer = 100
|
||||
|
||||
def produce_and_consume(self, producer_version, consumer_version, group):
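# Run one produce -> consume -> validate cycle with the given client versions, using the caller-supplied consumer group for this cycle (the test below uses a fresh group after each message format change).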
|
||||
self.producer = VerifiableProducer(self.test_context, self.num_producers, self.kafka,
|
||||
self.topic,
|
||||
throughput=self.producer_throughput,
|
||||
message_validator=is_int,
|
||||
version=KafkaVersion(producer_version))
|
||||
self.consumer = ConsoleConsumer(self.test_context, self.num_consumers, self.kafka,
|
||||
self.topic, consumer_timeout_ms=30000,
|
||||
message_validator=is_int, version=KafkaVersion(consumer_version))
|
||||
self.consumer.group_id = group
|
||||
self.run_produce_consume_validate(lambda: wait_until(
|
||||
lambda: self.producer.each_produced_at_least(self.messages_per_producer) == True,
|
||||
timeout_sec=120, backoff_sec=1,
|
||||
err_msg="Producer did not produce all messages in reasonable amount of time"))
|
||||
|
||||
@cluster(num_nodes=12)
|
||||
@parametrize(producer_version=str(DEV_BRANCH), consumer_version=str(DEV_BRANCH))
|
||||
@parametrize(producer_version=str(LATEST_0_10), consumer_version=str(LATEST_0_10))
|
||||
@parametrize(producer_version=str(LATEST_0_9), consumer_version=str(LATEST_0_9))
|
||||
def test_compatibility(self, producer_version, consumer_version):
""" This test performs the following checks:
|
||||
The workload is a mix of 0.9.x, 0.10.x and 0.11.x producers and consumers
|
||||
that produce to and consume from a DEV_BRANCH cluster
|
||||
1. initially the topic is using message format 0.9.0
|
||||
2. change the message format version for topic to 0.10.0 on the fly.
|
||||
3. change the message format version for topic to 0.11.0 on the fly.
|
||||
4. change the message format version for topic back to 0.10.0 on the fly (only if the client version is 0.11.0 or newer)
|
||||
- The producers and consumers should not have any issue.
|
||||
|
||||
Note regarding step number 4. Downgrading the message format version is generally unsupported as it breaks
|
||||
older clients. More concretely, if we downgrade a topic from 0.11.0 to 0.10.0 after it contains messages with
|
||||
version 0.11.0, we will return the 0.11.0 messages without down conversion due to an optimisation in the
|
||||
handling of fetch requests. This will break any consumer that doesn't support 0.11.0. So, in practice, step 4
|
||||
is similar to step 2 and it didn't seem worth it to increase the cluster size in order to add a step 5 that
|
||||
would change the message format version for the topic back to 0.9.0.0.
|
||||
"""
|
||||
self.kafka = KafkaService(self.test_context, num_nodes=3, zk=self.zk, version=DEV_BRANCH, topics={self.topic: {
|
||||
"partitions": 3,
|
||||
"replication-factor": 3,
|
||||
'configs': {"min.insync.replicas": 2}}})
|
||||
|
||||
self.kafka.start()
|
||||
self.logger.info("First format change to 0.9.0")
|
||||
self.kafka.alter_message_format(self.topic, str(LATEST_0_9))
|
||||
self.produce_and_consume(producer_version, consumer_version, "group1")
|
||||
|
||||
self.logger.info("Second format change to 0.10.0")
|
||||
self.kafka.alter_message_format(self.topic, str(LATEST_0_10))
|
||||
self.produce_and_consume(producer_version, consumer_version, "group2")
|
||||
|
||||
self.logger.info("Third format change to 0.11.0")
|
||||
self.kafka.alter_message_format(self.topic, str(LATEST_0_11))
|
||||
self.produce_and_consume(producer_version, consumer_version, "group3")
|
||||
|
||||
if producer_version == str(DEV_BRANCH) and consumer_version == str(DEV_BRANCH):
|
||||
self.logger.info("Fourth format change back to 0.10.0")
|
||||
self.kafka.alter_message_format(self.topic, str(LATEST_0_10))
|
||||
self.produce_and_consume(producer_version, consumer_version, "group4")
|
||||
|
||||
|
||||
51
tests/kafkatest/tests/client/pluggable_test.py
Normal file
@@ -0,0 +1,51 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from ducktape.utils.util import wait_until
|
||||
|
||||
from kafkatest.tests.verifiable_consumer_test import VerifiableConsumerTest
|
||||
|
||||
class PluggableConsumerTest(VerifiableConsumerTest):
|
||||
""" Verify that the pluggable client framework works. """
|
||||
|
||||
TOPIC = "test_topic"
|
||||
NUM_PARTITIONS = 1
|
||||
|
||||
def __init__(self, test_context):
|
||||
super(PluggableConsumerTest, self).__init__(test_context, num_consumers=1, num_producers=0,
|
||||
num_zk=1, num_brokers=1, topics={
|
||||
self.TOPIC : { 'partitions': self.NUM_PARTITIONS, 'replication-factor': 1 },
|
||||
})
|
||||
|
||||
def test_start_stop(self):
|
||||
"""
|
||||
Test that a pluggable VerifiableConsumer module loads and works
|
||||
"""
|
||||
consumer = self.setup_consumer(self.TOPIC)
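# setup_consumer returns whichever consumer implementation the pluggable client framework is configured to load; this test only checks that it starts and stops cleanly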
|
||||
|
||||
for num_started, node in enumerate(consumer.nodes, 1):
|
||||
consumer.start_node(node)
|
||||
|
||||
self.logger.debug("Waiting for %d nodes to start" % len(consumer.nodes))
|
||||
wait_until(lambda: len(consumer.alive_nodes()) == len(consumer.nodes),
|
||||
timeout_sec=60,
|
||||
err_msg="Timed out waiting for consumers to start")
|
||||
self.logger.debug("Started: %s" % str(consumer.alive_nodes()))
|
||||
consumer.stop_all()
|
||||
|
||||
self.logger.debug("Waiting for %d nodes to stop" % len(consumer.nodes))
|
||||
wait_until(lambda: len(consumer.dead_nodes()) == len(consumer.nodes),
|
||||
timeout_sec=self.session_timeout_sec+5,
|
||||
err_msg="Timed out waiting for consumers to shutdown")
|
||||
236
tests/kafkatest/tests/client/quota_test.py
Normal file
@@ -0,0 +1,236 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from ducktape.tests.test import Test
|
||||
from ducktape.mark import matrix, parametrize
|
||||
from ducktape.mark.resource import cluster
|
||||
|
||||
from kafkatest.services.zookeeper import ZookeeperService
|
||||
from kafkatest.services.kafka import KafkaService
|
||||
from kafkatest.services.performance import ProducerPerformanceService
|
||||
from kafkatest.services.console_consumer import ConsoleConsumer
|
||||
from kafkatest.version import DEV_BRANCH, LATEST_1_1
|
||||
|
||||
class QuotaConfig(object):
|
||||
CLIENT_ID = 'client-id'
|
||||
USER = 'user'
|
||||
USER_CLIENT = '(user, client-id)'
|
||||
|
||||
LARGE_QUOTA = 1000 * 1000 * 1000
|
||||
USER_PRINCIPAL = 'CN=systemtest'
|
||||
|
||||
def __init__(self, quota_type, override_quota, kafka):
|
||||
if quota_type == QuotaConfig.CLIENT_ID:
|
||||
if override_quota:
|
||||
self.client_id = 'overridden_id'
|
||||
self.producer_quota = 3750000
|
||||
self.consumer_quota = 3000000
|
||||
self.configure_quota(kafka, self.producer_quota, self.consumer_quota, ['clients', self.client_id])
|
||||
self.configure_quota(kafka, QuotaConfig.LARGE_QUOTA, QuotaConfig.LARGE_QUOTA, ['clients', None])
|
||||
else:
|
||||
self.client_id = 'default_id'
|
||||
self.producer_quota = 2500000
|
||||
self.consumer_quota = 2000000
|
||||
self.configure_quota(kafka, self.producer_quota, self.consumer_quota, ['clients', None])
|
||||
self.configure_quota(kafka, QuotaConfig.LARGE_QUOTA, QuotaConfig.LARGE_QUOTA, ['clients', 'overridden_id'])
|
||||
elif quota_type == QuotaConfig.USER:
|
||||
if override_quota:
|
||||
self.client_id = 'some_id'
|
||||
self.producer_quota = 3750000
|
||||
self.consumer_quota = 3000000
|
||||
self.configure_quota(kafka, self.producer_quota, self.consumer_quota, ['users', QuotaConfig.USER_PRINCIPAL])
|
||||
self.configure_quota(kafka, QuotaConfig.LARGE_QUOTA, QuotaConfig.LARGE_QUOTA, ['users', None])
|
||||
self.configure_quota(kafka, QuotaConfig.LARGE_QUOTA, QuotaConfig.LARGE_QUOTA, ['clients', self.client_id])
|
||||
else:
|
||||
self.client_id = 'some_id'
|
||||
self.producer_quota = 2500000
|
||||
self.consumer_quota = 2000000
|
||||
self.configure_quota(kafka, self.producer_quota, self.consumer_quota, ['users', None])
|
||||
self.configure_quota(kafka, QuotaConfig.LARGE_QUOTA, QuotaConfig.LARGE_QUOTA, ['clients', None])
|
||||
elif quota_type == QuotaConfig.USER_CLIENT:
|
||||
if override_quota:
|
||||
self.client_id = 'overridden_id'
|
||||
self.producer_quota = 3750000
|
||||
self.consumer_quota = 3000000
|
||||
self.configure_quota(kafka, self.producer_quota, self.consumer_quota, ['users', QuotaConfig.USER_PRINCIPAL, 'clients', self.client_id])
|
||||
self.configure_quota(kafka, QuotaConfig.LARGE_QUOTA, QuotaConfig.LARGE_QUOTA, ['users', QuotaConfig.USER_PRINCIPAL, 'clients', None])
|
||||
self.configure_quota(kafka, QuotaConfig.LARGE_QUOTA, QuotaConfig.LARGE_QUOTA, ['users', None])
|
||||
self.configure_quota(kafka, QuotaConfig.LARGE_QUOTA, QuotaConfig.LARGE_QUOTA, ['clients', self.client_id])
|
||||
else:
|
||||
self.client_id = 'default_id'
|
||||
self.producer_quota = 2500000
|
||||
self.consumer_quota = 2000000
|
||||
self.configure_quota(kafka, self.producer_quota, self.consumer_quota, ['users', None, 'clients', None])
|
||||
self.configure_quota(kafka, QuotaConfig.LARGE_QUOTA, QuotaConfig.LARGE_QUOTA, ['users', None])
|
||||
self.configure_quota(kafka, QuotaConfig.LARGE_QUOTA, QuotaConfig.LARGE_QUOTA, ['clients', None])
|
||||
|
||||
def configure_quota(self, kafka, producer_byte_rate, consumer_byte_rate, entity_args):
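# Build and run a kafka-configs.sh command that sets the produce/consume quotas for the given
# entity. entity_args is a flat list of (entity-type, entity-name) pairs, where a name of None
# maps to --entity-default. An illustrative invocation (paths and values vary per test run):
#   kafka-configs.sh --zookeeper <zk> --alter --add-config producer_byte_rate=2500000,consumer_byte_rate=2000000 --entity-type clients --entity-default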
|
||||
node = kafka.nodes[0]
|
||||
cmd = "%s --zookeeper %s --alter --add-config producer_byte_rate=%d,consumer_byte_rate=%d" % \
|
||||
(kafka.path.script("kafka-configs.sh", node), kafka.zk_connect_setting(), producer_byte_rate, consumer_byte_rate)
|
||||
cmd += " --entity-type " + entity_args[0] + self.entity_name_opt(entity_args[1])
|
||||
if len(entity_args) > 2:
|
||||
cmd += " --entity-type " + entity_args[2] + self.entity_name_opt(entity_args[3])
|
||||
node.account.ssh(cmd)
|
||||
|
||||
def entity_name_opt(self, name):
|
||||
return " --entity-default" if name is None else " --entity-name " + name
|
||||
|
||||
class QuotaTest(Test):
|
||||
"""
|
||||
These tests verify that quotas provide the expected functionality -- they run a
producer, broker, and consumer with different client-id and quota configurations and
check that the observed throughput stays close to the configured quota.
|
||||
"""
|
||||
|
||||
def __init__(self, test_context):
|
||||
""":type test_context: ducktape.tests.test.TestContext"""
|
||||
super(QuotaTest, self).__init__(test_context=test_context)
|
||||
|
||||
self.topic = 'test_topic'
|
||||
self.logger.info('use topic ' + self.topic)
|
||||
|
||||
self.maximum_client_deviation_percentage = 100.0
|
||||
self.maximum_broker_deviation_percentage = 5.0
|
||||
self.num_records = 50000
|
||||
self.record_size = 3000
|
||||
|
||||
self.zk = ZookeeperService(test_context, num_nodes=1)
|
||||
self.kafka = KafkaService(test_context, num_nodes=1, zk=self.zk,
|
||||
security_protocol='SSL', authorizer_class_name='',
|
||||
interbroker_security_protocol='SSL',
|
||||
topics={self.topic: {'partitions': 6, 'replication-factor': 1, 'configs': {'min.insync.replicas': 1}}},
|
||||
jmx_object_names=['kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec',
|
||||
'kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec'],
|
||||
jmx_attributes=['OneMinuteRate'])
|
||||
self.num_producers = 1
|
||||
self.num_consumers = 2
|
||||
|
||||
def setUp(self):
|
||||
self.zk.start()
|
||||
|
||||
def min_cluster_size(self):
|
||||
"""Override this since we're adding services outside of the constructor"""
|
||||
return super(QuotaTest, self).min_cluster_size() + self.num_producers + self.num_consumers
|
||||
|
||||
@cluster(num_nodes=5)
|
||||
@matrix(quota_type=[QuotaConfig.CLIENT_ID, QuotaConfig.USER, QuotaConfig.USER_CLIENT], override_quota=[True, False])
|
||||
@parametrize(quota_type=QuotaConfig.CLIENT_ID, consumer_num=2)
|
||||
@parametrize(quota_type=QuotaConfig.CLIENT_ID, old_broker_throttling_behavior=True)
|
||||
@parametrize(quota_type=QuotaConfig.CLIENT_ID, old_client_throttling_behavior=True)
|
||||
def test_quota(self, quota_type, override_quota=True, producer_num=1, consumer_num=1,
|
||||
old_broker_throttling_behavior=False, old_client_throttling_behavior=False):
|
||||
# Old (pre-2.0) throttling behavior for broker throttles before sending a response to the client.
|
||||
if old_broker_throttling_behavior:
|
||||
self.kafka.set_version(LATEST_1_1)
|
||||
self.kafka.start()
|
||||
|
||||
self.quota_config = QuotaConfig(quota_type, override_quota, self.kafka)
|
||||
producer_client_id = self.quota_config.client_id
|
||||
consumer_client_id = self.quota_config.client_id
|
||||
|
||||
# Old (pre-2.0) throttling behavior for client does not throttle upon receiving a response with a non-zero throttle time.
|
||||
if old_client_throttling_behavior:
|
||||
client_version = LATEST_1_1
|
||||
else:
|
||||
client_version = DEV_BRANCH
|
||||
|
||||
# Produce all messages
|
||||
producer = ProducerPerformanceService(
|
||||
self.test_context, producer_num, self.kafka,
|
||||
topic=self.topic, num_records=self.num_records, record_size=self.record_size, throughput=-1,
|
||||
client_id=producer_client_id, version=client_version)
|
||||
|
||||
producer.run()
|
||||
|
||||
# Consume all messages
|
||||
consumer = ConsoleConsumer(self.test_context, consumer_num, self.kafka, self.topic,
|
||||
consumer_timeout_ms=60000, client_id=consumer_client_id,
|
||||
jmx_object_names=['kafka.consumer:type=consumer-fetch-manager-metrics,client-id=%s' % consumer_client_id],
|
||||
jmx_attributes=['bytes-consumed-rate'], version=client_version)
|
||||
consumer.run()
|
||||
|
||||
for idx, messages in consumer.messages_consumed.iteritems():
|
||||
assert len(messages) > 0, "consumer %d didn't consume any message before timeout" % idx
|
||||
|
||||
success, msg = self.validate(self.kafka, producer, consumer)
|
||||
assert success, msg
|
||||
|
||||
def validate(self, broker, producer, consumer):
|
||||
"""
|
||||
For each client_id we validate that:
|
||||
1) number of consumed messages equals number of produced messages
|
||||
2) maximum_producer_throughput <= producer_quota * (1 + maximum_client_deviation_percentage/100)
|
||||
3) maximum_broker_byte_in_rate <= producer_quota * (1 + maximum_broker_deviation_percentage/100)
|
||||
4) maximum_consumer_throughput <= consumer_quota * (1 + maximum_client_deviation_percentage/100)
|
||||
5) maximum_broker_byte_out_rate <= consumer_quota * (1 + maximum_broker_deviation_percentage/100)
|
||||
"""
|
||||
success = True
|
||||
msg = ''
|
||||
|
||||
self.kafka.read_jmx_output_all_nodes()
|
||||
|
||||
# validate that number of consumed messages equals number of produced messages
|
||||
produced_num = sum([value['records'] for value in producer.results])
|
||||
consumed_num = sum([len(value) for value in consumer.messages_consumed.values()])
|
||||
self.logger.info('producer produced %d messages' % produced_num)
|
||||
self.logger.info('consumer consumed %d messages' % consumed_num)
|
||||
if produced_num != consumed_num:
|
||||
success = False
|
||||
msg += "number of produced messages %d doesn't equal number of consumed messages %d" % (produced_num, consumed_num)
|
||||
|
||||
# validate that maximum_producer_throughput <= producer_quota * (1 + maximum_client_deviation_percentage/100)
|
||||
producer_maximum_bps = max(
|
||||
metric.value for k, metrics in producer.metrics(group='producer-metrics', name='outgoing-byte-rate', client_id=producer.client_id) for metric in metrics
|
||||
)
|
||||
producer_quota_bps = self.quota_config.producer_quota
|
||||
self.logger.info('producer has maximum throughput %.2f bps with producer quota %.2f bps' % (producer_maximum_bps, producer_quota_bps))
|
||||
if producer_maximum_bps > producer_quota_bps*(self.maximum_client_deviation_percentage/100+1):
|
||||
success = False
|
||||
msg += 'maximum producer throughput %.2f bps exceeded producer quota %.2f bps by more than %.1f%%' % \
|
||||
(producer_maximum_bps, producer_quota_bps, self.maximum_client_deviation_percentage)
|
||||
|
||||
# validate that maximum_broker_byte_in_rate <= producer_quota * (1 + maximum_broker_deviation_percentage/100)
|
||||
broker_byte_in_attribute_name = 'kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec:OneMinuteRate'
|
||||
broker_maximum_byte_in_bps = broker.maximum_jmx_value[broker_byte_in_attribute_name]
|
||||
self.logger.info('broker has maximum byte-in rate %.2f bps with producer quota %.2f bps' %
|
||||
(broker_maximum_byte_in_bps, producer_quota_bps))
|
||||
if broker_maximum_byte_in_bps > producer_quota_bps*(self.maximum_broker_deviation_percentage/100+1):
|
||||
success = False
|
||||
msg += 'maximum broker byte-in rate %.2f bps exceeded producer quota %.2f bps by more than %.1f%%' % \
|
||||
(broker_maximum_byte_in_bps, producer_quota_bps, self.maximum_broker_deviation_percentage)
|
||||
|
||||
# validate that maximum_consumer_throughput <= consumer_quota * (1 + maximum_client_deviation_percentage/100)
|
||||
consumer_attribute_name = 'kafka.consumer:type=consumer-fetch-manager-metrics,client-id=%s:bytes-consumed-rate' % consumer.client_id
|
||||
consumer_maximum_bps = consumer.maximum_jmx_value[consumer_attribute_name]
|
||||
consumer_quota_bps = self.quota_config.consumer_quota
|
||||
self.logger.info('consumer has maximum throughput %.2f bps with consumer quota %.2f bps' % (consumer_maximum_bps, consumer_quota_bps))
|
||||
if consumer_maximum_bps > consumer_quota_bps*(self.maximum_client_deviation_percentage/100+1):
|
||||
success = False
|
||||
msg += 'maximum consumer throughput %.2f bps exceeded consumer quota %.2f bps by more than %.1f%%' % \
|
||||
(consumer_maximum_bps, consumer_quota_bps, self.maximum_client_deviation_percentage)
|
||||
|
||||
# validate that maximum_broker_byte_out_rate <= consumer_quota * (1 + maximum_broker_deviation_percentage/100)
|
||||
broker_byte_out_attribute_name = 'kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec:OneMinuteRate'
|
||||
broker_maximum_byte_out_bps = broker.maximum_jmx_value[broker_byte_out_attribute_name]
|
||||
self.logger.info('broker has maximum byte-out rate %.2f bps with consumer quota %.2f bps' %
|
||||
(broker_maximum_byte_out_bps, consumer_quota_bps))
|
||||
if broker_maximum_byte_out_bps > consumer_quota_bps*(self.maximum_broker_deviation_percentage/100+1):
|
||||
success = False
|
||||
msg += 'maximum broker byte-out rate %.2f bps exceeded consumer quota %.2f bps by more than %.1f%%' % \
|
||||
(broker_maximum_byte_out_bps, consumer_quota_bps, self.maximum_broker_deviation_percentage)
|
||||
|
||||
return success, msg
|
||||
|
||||
149
tests/kafkatest/tests/client/truncation_test.py
Normal file
@@ -0,0 +1,149 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from ducktape.mark.resource import cluster
|
||||
from ducktape.utils.util import wait_until
|
||||
|
||||
from kafkatest.tests.verifiable_consumer_test import VerifiableConsumerTest
|
||||
from kafkatest.services.kafka import TopicPartition
|
||||
from kafkatest.services.verifiable_consumer import VerifiableConsumer
|
||||
|
||||
|
||||
class TruncationTest(VerifiableConsumerTest):
|
||||
TOPIC = "test_topic"
|
||||
NUM_PARTITIONS = 1
|
||||
TOPICS = {
|
||||
TOPIC: {
|
||||
'partitions': NUM_PARTITIONS,
|
||||
'replication-factor': 2
|
||||
}
|
||||
}
|
||||
GROUP_ID = "truncation-test"
|
||||
|
||||
def __init__(self, test_context):
|
||||
super(TruncationTest, self).__init__(test_context, num_consumers=1, num_producers=1,
|
||||
num_zk=1, num_brokers=3, topics=self.TOPICS)
|
||||
self.last_total = 0
|
||||
self.all_offsets_consumed = []
|
||||
self.all_values_consumed = []
|
||||
|
||||
def setup_consumer(self, topic, **kwargs):
|
||||
consumer = super(TruncationTest, self).setup_consumer(topic, **kwargs)
|
||||
self.mark_for_collect(consumer, 'verifiable_consumer_stdout')
|
||||
|
||||
def print_record(event, node):
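# record every consumed offset and value so the test can later check for duplicates and compare how many records survive the truncation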
|
||||
self.all_offsets_consumed.append(event['offset'])
|
||||
self.all_values_consumed.append(event['value'])
|
||||
consumer.on_record_consumed = print_record
|
||||
|
||||
return consumer
|
||||
|
||||
@cluster(num_nodes=7)
|
||||
def test_offset_truncate(self):
"""
Verify correct consumer behavior when the partition log is truncated after an unclean
leader election.

Setup: single Kafka cluster with one producer writing messages to a single topic with one
partition, and one consumer group reading from that topic.

- Start a producer which continues producing new messages throughout the test.
- Start up the consumer and wait until it has joined the group.
- Shrink the ISR to a single replica, let the consumer make some progress, then kill the
last ISR member as well.
- Bring the out-of-sync replica back and enable unclean leader election so that the log is truncated.
- Verify that the consumer does not reset to the beginning of the log and that a second
consumer reading from the beginning sees fewer records than the first, reflecting the truncation.
"""
|
||||
tp = TopicPartition(self.TOPIC, 0)
|
||||
|
||||
producer = self.setup_producer(self.TOPIC, throughput=10)
|
||||
producer.start()
|
||||
self.await_produced_messages(producer, min_messages=10)
|
||||
|
||||
consumer = self.setup_consumer(self.TOPIC, reset_policy="earliest", verify_offsets=False)
|
||||
consumer.start()
|
||||
self.await_all_members(consumer)
|
||||
|
||||
# Reduce ISR to one node
|
||||
isr = self.kafka.isr_idx_list(self.TOPIC, 0)
|
||||
node1 = self.kafka.get_node(isr[0])
|
||||
self.kafka.stop_node(node1)
|
||||
self.logger.info("Reduced ISR to one node, consumer is at %s", consumer.current_position(tp))
|
||||
|
||||
# Ensure remaining ISR member has a little bit of data
|
||||
current_total = consumer.total_consumed()
|
||||
wait_until(lambda: consumer.total_consumed() > current_total + 10,
|
||||
timeout_sec=30,
|
||||
err_msg="Timed out waiting for consumer to move ahead by 10 messages")
|
||||
|
||||
# Kill last ISR member
|
||||
node2 = self.kafka.get_node(isr[1])
|
||||
self.kafka.stop_node(node2)
|
||||
self.logger.info("No members in ISR, consumer is at %s", consumer.current_position(tp))
|
||||
|
||||
# Keep consuming until we've caught up to HW
|
||||
def none_consumed(this, consumer):
|
||||
new_total = consumer.total_consumed()
|
||||
if new_total == this.last_total:
|
||||
return True
|
||||
else:
|
||||
this.last_total = new_total
|
||||
return False
|
||||
|
||||
self.last_total = consumer.total_consumed()
|
||||
wait_until(lambda: none_consumed(self, consumer),
|
||||
timeout_sec=30,
|
||||
err_msg="Timed out waiting for the consumer to catch up")
|
||||
|
||||
self.kafka.start_node(node1)
|
||||
self.logger.info("Out of sync replica is online, but not electable. Consumer is at %s", consumer.current_position(tp))
|
||||
|
||||
pre_truncation_pos = consumer.current_position(tp)
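# remember the position before the unclean leader election: after truncation is detected, the consumer should resume at or beyond this offset rather than resetting to the beginning of the log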
|
||||
|
||||
self.kafka.set_unclean_leader_election(self.TOPIC)
|
||||
self.logger.info("New unclean leader, consumer is at %s", consumer.current_position(tp))
|
||||
|
||||
# Wait for truncation to be detected
|
||||
self.kafka.start_node(node2)
|
||||
wait_until(lambda: consumer.current_position(tp) >= pre_truncation_pos,
|
||||
timeout_sec=30,
|
||||
err_msg="Timed out waiting for truncation")
|
||||
|
||||
# Make sure we didn't reset to beginning of log
|
||||
total_records_consumed = len(self.all_values_consumed)
|
||||
assert total_records_consumed == len(set(self.all_values_consumed)), "Received duplicate records"
|
||||
|
||||
consumer.stop()
|
||||
producer.stop()
|
||||
|
||||
# Re-consume all the records
|
||||
consumer2 = VerifiableConsumer(self.test_context, 1, self.kafka, self.TOPIC, group_id="group2",
|
||||
reset_policy="earliest", verify_offsets=True)
|
||||
|
||||
consumer2.start()
|
||||
self.await_all_members(consumer2)
|
||||
|
||||
wait_until(lambda: consumer2.total_consumed() > 0,
|
||||
timeout_sec=30,
|
||||
err_msg="Timed out waiting for the second consumer to consume any messages")
|
||||
|
||||
self.last_total = consumer2.total_consumed()
|
||||
wait_until(lambda: none_consumed(self, consumer2),
|
||||
timeout_sec=30,
|
||||
err_msg="Timed out waiting for the consumer to fully consume data")
|
||||
|
||||
second_total_consumed = consumer2.total_consumed()
|
||||
assert second_total_consumed < total_records_consumed, "Expected fewer records with new consumer since we truncated"
|
||||
self.logger.info("Second consumer saw only %s, meaning %s were truncated",
|
||||
second_total_consumed, total_records_consumed - second_total_consumed)
|
||||
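The inline `none_consumed` helper above detects that consumption has stalled by comparing totals across successive `wait_until` polls. A minimal reusable sketch of the same idea (illustrative only, not part of kafkatest; it assumes an object that exposes `total_consumed()`):

```
from ducktape.utils.util import wait_until

def wait_until_stalled(consumer, timeout_sec=30, err_msg="Consumption never stalled"):
    # Succeeds once two consecutive polls report the same consumed total.
    state = {'last_total': consumer.total_consumed()}

    def stalled():
        new_total = consumer.total_consumed()
        if new_total == state['last_total']:
            return True
        state['last_total'] = new_total
        return False

    wait_until(stalled, timeout_sec=timeout_sec, err_msg=err_msg)
```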
14
tests/kafkatest/tests/connect/__init__.py
Normal file
14
tests/kafkatest/tests/connect/__init__.py
Normal file
@@ -0,0 +1,14 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
599
tests/kafkatest/tests/connect/connect_distributed_test.py
Normal file
599
tests/kafkatest/tests/connect/connect_distributed_test.py
Normal file
@@ -0,0 +1,599 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from ducktape.tests.test import Test
from ducktape.mark.resource import cluster
from ducktape.utils.util import wait_until
from ducktape.mark import matrix, parametrize
from ducktape.cluster.remoteaccount import RemoteCommandError

from kafkatest.services.zookeeper import ZookeeperService
from kafkatest.services.kafka import KafkaService, config_property
from kafkatest.services.connect import ConnectDistributedService, VerifiableSource, VerifiableSink, ConnectRestError, MockSink, MockSource
from kafkatest.services.console_consumer import ConsoleConsumer
from kafkatest.services.security.security_config import SecurityConfig
from kafkatest.version import DEV_BRANCH, LATEST_2_3, LATEST_2_2, LATEST_2_1, LATEST_2_0, LATEST_1_1, LATEST_1_0, LATEST_0_11_0, LATEST_0_10_2, LATEST_0_10_1, LATEST_0_10_0, LATEST_0_9, LATEST_0_8_2, KafkaVersion

from collections import Counter, namedtuple
import itertools
import json
import operator
import time

class ConnectDistributedTest(Test):
    """
    Simple test of Kafka Connect in distributed mode, producing data from files on one cluster and consuming it on
    another, validating the total output is identical to the input.
    """

    FILE_SOURCE_CONNECTOR = 'org.apache.kafka.connect.file.FileStreamSourceConnector'
    FILE_SINK_CONNECTOR = 'org.apache.kafka.connect.file.FileStreamSinkConnector'

    INPUT_FILE = "/mnt/connect.input"
    OUTPUT_FILE = "/mnt/connect.output"

    TOPIC = "test"
    OFFSETS_TOPIC = "connect-offsets"
    OFFSETS_REPLICATION_FACTOR = "1"
    OFFSETS_PARTITIONS = "1"
    CONFIG_TOPIC = "connect-configs"
    CONFIG_REPLICATION_FACTOR = "1"
    STATUS_TOPIC = "connect-status"
    STATUS_REPLICATION_FACTOR = "1"
    STATUS_PARTITIONS = "1"
    SCHEDULED_REBALANCE_MAX_DELAY_MS = "60000"
    CONNECT_PROTOCOL = "sessioned"

    # Since tasks can be assigned to any node and we're testing with files, we need to make sure the content is the same
    # across all nodes.
    FIRST_INPUT_LIST = ["foo", "bar", "baz"]
    FIRST_INPUTS = "\n".join(FIRST_INPUT_LIST) + "\n"
    SECOND_INPUT_LIST = ["razz", "ma", "tazz"]
    SECOND_INPUTS = "\n".join(SECOND_INPUT_LIST) + "\n"

    SCHEMA = { "type": "string", "optional": False }

    def __init__(self, test_context):
        super(ConnectDistributedTest, self).__init__(test_context)
        self.num_zk = 1
        self.num_brokers = 1
        self.topics = {
            self.TOPIC: {'partitions': 1, 'replication-factor': 1}
        }

        self.zk = ZookeeperService(test_context, self.num_zk)

        self.key_converter = "org.apache.kafka.connect.json.JsonConverter"
        self.value_converter = "org.apache.kafka.connect.json.JsonConverter"
        self.schemas = True

    def setup_services(self, security_protocol=SecurityConfig.PLAINTEXT, timestamp_type=None, broker_version=DEV_BRANCH, auto_create_topics=False):
        self.kafka = KafkaService(self.test_context, self.num_brokers, self.zk,
                                  security_protocol=security_protocol, interbroker_security_protocol=security_protocol,
                                  topics=self.topics, version=broker_version,
                                  server_prop_overides=[["auto.create.topics.enable", str(auto_create_topics)]])
        if timestamp_type is not None:
            for node in self.kafka.nodes:
                node.config[config_property.MESSAGE_TIMESTAMP_TYPE] = timestamp_type

        self.cc = ConnectDistributedService(self.test_context, 3, self.kafka, [self.INPUT_FILE, self.OUTPUT_FILE])
        self.cc.log_level = "DEBUG"

        self.zk.start()
        self.kafka.start()

    def _start_connector(self, config_file):
        connector_props = self.render(config_file)
        connector_config = dict([line.strip().split('=', 1) for line in connector_props.split('\n') if line.strip() and not line.strip().startswith('#')])
        self.cc.create_connector(connector_config)

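`_start_connector` renders a properties template and flattens it into the key/value dict handed to the Connect REST API. A standalone sketch of that parsing step, using illustrative connector properties rather than the real templates:

```
props = """
# comments and blank lines are skipped
name=local-file-source
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
file=/mnt/connect.input
topic=test
"""

connector_config = dict(line.strip().split('=', 1)
                        for line in props.split('\n')
                        if line.strip() and not line.strip().startswith('#'))

assert connector_config['tasks.max'] == '1'  # values remain strings at this point
```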
    def _connector_status(self, connector, node=None):
        try:
            return self.cc.get_connector_status(connector, node)
        except ConnectRestError:
            return None

    def _connector_has_state(self, status, state):
        return status is not None and status['connector']['state'] == state

    def _task_has_state(self, task_id, status, state):
        if not status:
            return False

        tasks = status['tasks']
        if not tasks:
            return False

        for task in tasks:
            if task['id'] == task_id:
                return task['state'] == state

        return False

    def _all_tasks_have_state(self, status, task_count, state):
        if status is None:
            return False

        tasks = status['tasks']
        if len(tasks) != task_count:
            return False

        return reduce(operator.and_, [task['state'] == state for task in tasks], True)

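`_all_tasks_have_state` folds the per-task comparison with `reduce(operator.and_, ...)`, a Python 2-era idiom; on Python 3 `reduce` would need to be imported from `functools`. An equivalent formulation with `all()`, shown here only as a sketch rather than a change to the test itself:

```
def all_tasks_have_state(status, task_count, state):
    if status is None:
        return False
    tasks = status.get('tasks', [])
    if len(tasks) != task_count:
        return False
    return all(task['state'] == state for task in tasks)
```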
    def is_running(self, connector, node=None):
        status = self._connector_status(connector.name, node)
        return self._connector_has_state(status, 'RUNNING') and self._all_tasks_have_state(status, connector.tasks, 'RUNNING')

    def is_paused(self, connector, node=None):
        status = self._connector_status(connector.name, node)
        return self._connector_has_state(status, 'PAUSED') and self._all_tasks_have_state(status, connector.tasks, 'PAUSED')

    def connector_is_running(self, connector, node=None):
        status = self._connector_status(connector.name, node)
        return self._connector_has_state(status, 'RUNNING')

    def connector_is_failed(self, connector, node=None):
        status = self._connector_status(connector.name, node)
        return self._connector_has_state(status, 'FAILED')

    def task_is_failed(self, connector, task_id, node=None):
        status = self._connector_status(connector.name, node)
        return self._task_has_state(task_id, status, 'FAILED')

    def task_is_running(self, connector, task_id, node=None):
        status = self._connector_status(connector.name, node)
        return self._task_has_state(task_id, status, 'RUNNING')

    @cluster(num_nodes=5)
    @matrix(connect_protocol=['sessioned', 'compatible', 'eager'])
    def test_restart_failed_connector(self, connect_protocol):
        self.CONNECT_PROTOCOL = connect_protocol
        self.setup_services()
        self.cc.set_configs(lambda node: self.render("connect-distributed.properties", node=node))
        self.cc.start()

        self.sink = MockSink(self.cc, self.topics.keys(), mode='connector-failure', delay_sec=5)
        self.sink.start()

        wait_until(lambda: self.connector_is_failed(self.sink), timeout_sec=15,
                   err_msg="Failed to see connector transition to the FAILED state")

        self.cc.restart_connector(self.sink.name)

        wait_until(lambda: self.connector_is_running(self.sink), timeout_sec=10,
                   err_msg="Failed to see connector transition to the RUNNING state")

    @cluster(num_nodes=5)
    @matrix(connector_type=['source', 'sink'], connect_protocol=['sessioned', 'compatible', 'eager'])
    def test_restart_failed_task(self, connector_type, connect_protocol):
        self.CONNECT_PROTOCOL = connect_protocol
        self.setup_services()
        self.cc.set_configs(lambda node: self.render("connect-distributed.properties", node=node))
        self.cc.start()

        connector = None
        if connector_type == "sink":
            connector = MockSink(self.cc, self.topics.keys(), mode='task-failure', delay_sec=5)
        else:
            connector = MockSource(self.cc, mode='task-failure', delay_sec=5)

        connector.start()

        task_id = 0
        wait_until(lambda: self.task_is_failed(connector, task_id), timeout_sec=20,
                   err_msg="Failed to see task transition to the FAILED state")

        self.cc.restart_task(connector.name, task_id)

        wait_until(lambda: self.task_is_running(connector, task_id), timeout_sec=10,
                   err_msg="Failed to see task transition to the RUNNING state")

    @cluster(num_nodes=5)
    @matrix(connect_protocol=['sessioned', 'compatible', 'eager'])
    def test_pause_and_resume_source(self, connect_protocol):
        """
        Verify that source connectors stop producing records when paused and begin again after
        being resumed.
        """

        self.CONNECT_PROTOCOL = connect_protocol
        self.setup_services()
        self.cc.set_configs(lambda node: self.render("connect-distributed.properties", node=node))
        self.cc.start()

        self.source = VerifiableSource(self.cc, topic=self.TOPIC)
        self.source.start()

        wait_until(lambda: self.is_running(self.source), timeout_sec=30,
                   err_msg="Failed to see connector transition to the RUNNING state")

        self.cc.pause_connector(self.source.name)

        # wait until all nodes report the paused transition
        for node in self.cc.nodes:
            wait_until(lambda: self.is_paused(self.source, node), timeout_sec=30,
                       err_msg="Failed to see connector transition to the PAUSED state")

        # verify that we do not produce new messages while paused
        num_messages = len(self.source.sent_messages())
        time.sleep(10)
        assert num_messages == len(self.source.sent_messages()), "Paused source connector should not produce any messages"

        self.cc.resume_connector(self.source.name)

        for node in self.cc.nodes:
            wait_until(lambda: self.is_running(self.source, node), timeout_sec=30,
                       err_msg="Failed to see connector transition to the RUNNING state")

        # after resuming, we should see records produced again
        wait_until(lambda: len(self.source.sent_messages()) > num_messages, timeout_sec=30,
                   err_msg="Failed to produce messages after resuming source connector")

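The pause tests verify "no progress while paused" by sampling a count, sleeping through a quiet period, and re-sampling. The same check as a small standalone helper (an illustrative sketch; `get_count` stands for any zero-argument callable such as `lambda: len(self.source.sent_messages())`):

```
import time

def assert_no_progress(get_count, quiet_period_sec=10, msg="Unexpected progress while paused"):
    before = get_count()
    time.sleep(quiet_period_sec)
    assert get_count() == before, msg
```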
    @cluster(num_nodes=5)
    @matrix(connect_protocol=['sessioned', 'compatible', 'eager'])
    def test_pause_and_resume_sink(self, connect_protocol):
        """
        Verify that sink connectors stop consuming records when paused and begin again after
        being resumed.
        """

        self.CONNECT_PROTOCOL = connect_protocol
        self.setup_services()
        self.cc.set_configs(lambda node: self.render("connect-distributed.properties", node=node))
        self.cc.start()

        # use the verifiable source to produce a steady stream of messages
        self.source = VerifiableSource(self.cc, topic=self.TOPIC)
        self.source.start()

        wait_until(lambda: len(self.source.committed_messages()) > 0, timeout_sec=30,
                   err_msg="Timeout expired waiting for source task to produce a message")

        self.sink = VerifiableSink(self.cc, topics=[self.TOPIC])
        self.sink.start()

        wait_until(lambda: self.is_running(self.sink), timeout_sec=30,
                   err_msg="Failed to see connector transition to the RUNNING state")

        self.cc.pause_connector(self.sink.name)

        # wait until all nodes report the paused transition
        for node in self.cc.nodes:
            wait_until(lambda: self.is_paused(self.sink, node), timeout_sec=30,
                       err_msg="Failed to see connector transition to the PAUSED state")

        # verify that we do not consume new messages while paused
        num_messages = len(self.sink.received_messages())
        time.sleep(10)
        assert num_messages == len(self.sink.received_messages()), "Paused sink connector should not consume any messages"

        self.cc.resume_connector(self.sink.name)

        for node in self.cc.nodes:
            wait_until(lambda: self.is_running(self.sink, node), timeout_sec=30,
                       err_msg="Failed to see connector transition to the RUNNING state")

        # after resuming, we should see records consumed again
        wait_until(lambda: len(self.sink.received_messages()) > num_messages, timeout_sec=30,
                   err_msg="Failed to consume messages after resuming sink connector")

    @cluster(num_nodes=5)
    @matrix(connect_protocol=['sessioned', 'compatible', 'eager'])
    def test_pause_state_persistent(self, connect_protocol):
        """
        Verify that paused state is preserved after a cluster restart.
        """

        self.CONNECT_PROTOCOL = connect_protocol
        self.setup_services()
        self.cc.set_configs(lambda node: self.render("connect-distributed.properties", node=node))
        self.cc.start()

        self.source = VerifiableSource(self.cc, topic=self.TOPIC)
        self.source.start()

        wait_until(lambda: self.is_running(self.source), timeout_sec=30,
                   err_msg="Failed to see connector transition to the RUNNING state")

        self.cc.pause_connector(self.source.name)

        self.cc.restart()

        # we should still be paused after restarting
        for node in self.cc.nodes:
            wait_until(lambda: self.is_paused(self.source, node), timeout_sec=120,
                       err_msg="Failed to see connector startup in PAUSED state")

    @cluster(num_nodes=6)
    @matrix(security_protocol=[SecurityConfig.PLAINTEXT, SecurityConfig.SASL_SSL], connect_protocol=['sessioned', 'compatible', 'eager'])
    def test_file_source_and_sink(self, security_protocol, connect_protocol):
        """
        Tests that a basic file connector works across clean rolling bounces. This validates that the connector is
        correctly created, tasks instantiated, and as nodes restart the work is rebalanced across nodes.
        """

        self.CONNECT_PROTOCOL = connect_protocol
        self.setup_services(security_protocol=security_protocol)
        self.cc.set_configs(lambda node: self.render("connect-distributed.properties", node=node))

        self.cc.start()

        self.logger.info("Creating connectors")
        self._start_connector("connect-file-source.properties")
        self._start_connector("connect-file-sink.properties")

        # Generating data on the source node should generate new records and create new output on the sink node. Timeouts
        # here need to be more generous than they are for standalone mode because a) it takes longer to write configs,
        # do rebalancing of the group, etc, and b) without explicit leave group support, rebalancing takes awhile
        for node in self.cc.nodes:
            node.account.ssh("echo -e -n " + repr(self.FIRST_INPUTS) + " >> " + self.INPUT_FILE)
        wait_until(lambda: self._validate_file_output(self.FIRST_INPUT_LIST), timeout_sec=70, err_msg="Data added to input file was not seen in the output file in a reasonable amount of time.")

        # Restarting both should result in them picking up where they left off,
        # only processing new data.
        self.cc.restart()

        for node in self.cc.nodes:
            node.account.ssh("echo -e -n " + repr(self.SECOND_INPUTS) + " >> " + self.INPUT_FILE)
        wait_until(lambda: self._validate_file_output(self.FIRST_INPUT_LIST + self.SECOND_INPUT_LIST), timeout_sec=150, err_msg="Sink output file never converged to the same state as the input file")

    @cluster(num_nodes=6)
    @matrix(clean=[True, False], connect_protocol=['sessioned', 'compatible', 'eager'])
    def test_bounce(self, clean, connect_protocol):
        """
        Validates that source and sink tasks that run continuously and produce a predictable sequence of messages
        run correctly and deliver messages exactly once when Kafka Connect workers undergo clean rolling bounces.
        """
        num_tasks = 3

        self.CONNECT_PROTOCOL = connect_protocol
        self.setup_services()
        self.cc.set_configs(lambda node: self.render("connect-distributed.properties", node=node))
        self.cc.start()

        self.source = VerifiableSource(self.cc, topic=self.TOPIC, tasks=num_tasks, throughput=100)
        self.source.start()
        self.sink = VerifiableSink(self.cc, tasks=num_tasks, topics=[self.TOPIC])
        self.sink.start()

        for _ in range(3):
            for node in self.cc.nodes:
                started = time.time()
                self.logger.info("%s bouncing Kafka Connect on %s", clean and "Clean" or "Hard", str(node.account))
                self.cc.stop_node(node, clean_shutdown=clean)
                with node.account.monitor_log(self.cc.LOG_FILE) as monitor:
                    self.cc.start_node(node)
                    monitor.wait_until("Starting connectors and tasks using config offset", timeout_sec=90,
                                       err_msg="Kafka Connect worker didn't successfully join group and start work")
                self.logger.info("Bounced Kafka Connect on %s and rejoined in %f seconds", node.account, time.time() - started)

                # Give additional time for the consumer groups to recover. Even if it is not a hard bounce, there are
                # some cases where a restart can cause a rebalance to take the full length of the session timeout
                # (e.g. if the client shuts down before it has received the memberId from its initial JoinGroup).
                # If we don't give enough time for the group to stabilize, the next bounce may cause consumers to
                # be shut down before they have any time to process data and we can end up with zero data making it
                # through the test.
                time.sleep(15)

        # Wait at least scheduled.rebalance.max.delay.ms to expire and rebalance
        time.sleep(60)

        # Allow the connectors to startup, recover, and exit cleanly before
        # ending the test. It's possible for the source connector to make
        # uncommitted progress, and for the sink connector to read messages that
        # have not been committed yet, and fail a later assertion.
        wait_until(lambda: self.is_running(self.source), timeout_sec=30,
                   err_msg="Failed to see connector transition to the RUNNING state")
        time.sleep(15)
        self.source.stop()
        # Ensure that the sink connector has an opportunity to read all
        # committed messages from the source connector.
        wait_until(lambda: self.is_running(self.sink), timeout_sec=30,
                   err_msg="Failed to see connector transition to the RUNNING state")
        time.sleep(15)
        self.sink.stop()
        self.cc.stop()

        # Validate at least once delivery of everything that was reported as written since we should have flushed and
        # cleanly exited. Currently this only tests at least once delivery because the sink task may not have consumed
        # all the messages generated by the source task. This needs to be done per-task since seqnos are not unique across
        # tasks.
        success = True
        errors = []
        allow_dups = not clean
        src_messages = self.source.committed_messages()
        sink_messages = self.sink.flushed_messages()
        for task in range(num_tasks):
            # Validate source messages
            src_seqnos = [msg['seqno'] for msg in src_messages if msg['task'] == task]
            # Every seqno up to the largest one we ever saw should appear. Each seqno should only appear once because clean
            # bouncing should commit on rebalance.
            src_seqno_max = max(src_seqnos)
            self.logger.debug("Max source seqno: %d", src_seqno_max)
            src_seqno_counts = Counter(src_seqnos)
            missing_src_seqnos = sorted(set(range(src_seqno_max)).difference(set(src_seqnos)))
            duplicate_src_seqnos = sorted([seqno for seqno,count in src_seqno_counts.iteritems() if count > 1])

            if missing_src_seqnos:
                self.logger.error("Missing source sequence numbers for task " + str(task))
                errors.append("Found missing source sequence numbers for task %d: %s" % (task, missing_src_seqnos))
                success = False
            if not allow_dups and duplicate_src_seqnos:
                self.logger.error("Duplicate source sequence numbers for task " + str(task))
                errors.append("Found duplicate source sequence numbers for task %d: %s" % (task, duplicate_src_seqnos))
                success = False


            # Validate sink messages
            sink_seqnos = [msg['seqno'] for msg in sink_messages if msg['task'] == task]
            # Every seqno up to the largest one we ever saw should appear. Each seqno should only appear once because
            # clean bouncing should commit on rebalance.
            sink_seqno_max = max(sink_seqnos)
            self.logger.debug("Max sink seqno: %d", sink_seqno_max)
            sink_seqno_counts = Counter(sink_seqnos)
            missing_sink_seqnos = sorted(set(range(sink_seqno_max)).difference(set(sink_seqnos)))
            duplicate_sink_seqnos = sorted([seqno for seqno,count in sink_seqno_counts.iteritems() if count > 1])

            if missing_sink_seqnos:
                self.logger.error("Missing sink sequence numbers for task " + str(task))
                errors.append("Found missing sink sequence numbers for task %d: %s" % (task, missing_sink_seqnos))
                success = False
            if not allow_dups and duplicate_sink_seqnos:
                self.logger.error("Duplicate sink sequence numbers for task " + str(task))
                errors.append("Found duplicate sink sequence numbers for task %d: %s" % (task, duplicate_sink_seqnos))
                success = False

            # Validate source and sink match
            if sink_seqno_max > src_seqno_max:
                self.logger.error("Found sink sequence number greater than any generated sink sequence number for task %d: %d > %d", task, sink_seqno_max, src_seqno_max)
                errors.append("Found sink sequence number greater than any generated sink sequence number for task %d: %d > %d" % (task, sink_seqno_max, src_seqno_max))
                success = False
            if src_seqno_max < 1000 or sink_seqno_max < 1000:
                errors.append("Not enough messages were processed: source:%d sink:%d" % (src_seqno_max, sink_seqno_max))
                success = False

        if not success:
            self.mark_for_collect(self.cc)
            # Also collect the data in the topic to aid in debugging
            consumer_validator = ConsoleConsumer(self.test_context, 1, self.kafka, self.source.topic, consumer_timeout_ms=1000, print_key=True)
            consumer_validator.run()
            self.mark_for_collect(consumer_validator, "consumer_stdout")

        assert success, "Found validation errors:\n" + "\n ".join(errors)
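The per-task validation above finds gaps and duplicates in the reported sequence numbers with a `Counter` (note that `iteritems()` is the Python 2 spelling; `items()` is the Python 3 equivalent). A toy, self-contained illustration of the same bookkeeping:

```
from collections import Counter

seqnos = [0, 1, 1, 3, 4]                                    # e.g. seqnos flushed by one task
seqno_max = max(seqnos)
counts = Counter(seqnos)

missing = sorted(set(range(seqno_max)) - set(seqnos))       # -> [2]
duplicates = sorted(s for s, c in counts.items() if c > 1)  # -> [1]

assert missing == [2] and duplicates == [1]
```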

    @cluster(num_nodes=6)
    @matrix(connect_protocol=['sessioned', 'compatible', 'eager'])
    def test_transformations(self, connect_protocol):
        self.CONNECT_PROTOCOL = connect_protocol
        self.setup_services(timestamp_type='CreateTime')
        self.cc.set_configs(lambda node: self.render("connect-distributed.properties", node=node))
        self.cc.start()

        ts_fieldname = 'the_timestamp'

        NamedConnector = namedtuple('Connector', ['name'])

        source_connector = NamedConnector(name='file-src')

        self.cc.create_connector({
            'name': source_connector.name,
            'connector.class': 'org.apache.kafka.connect.file.FileStreamSourceConnector',
            'tasks.max': 1,
            'file': self.INPUT_FILE,
            'topic': self.TOPIC,
            'transforms': 'hoistToStruct,insertTimestampField',
            'transforms.hoistToStruct.type': 'org.apache.kafka.connect.transforms.HoistField$Value',
            'transforms.hoistToStruct.field': 'content',
            'transforms.insertTimestampField.type': 'org.apache.kafka.connect.transforms.InsertField$Value',
            'transforms.insertTimestampField.timestamp.field': ts_fieldname,
        })

        wait_until(lambda: self.connector_is_running(source_connector), timeout_sec=30, err_msg='Failed to see connector transition to the RUNNING state')

        for node in self.cc.nodes:
            node.account.ssh("echo -e -n " + repr(self.FIRST_INPUTS) + " >> " + self.INPUT_FILE)

        consumer = ConsoleConsumer(self.test_context, 1, self.kafka, self.TOPIC, consumer_timeout_ms=15000, print_timestamp=True)
        consumer.run()

        assert len(consumer.messages_consumed[1]) == len(self.FIRST_INPUT_LIST)

        expected_schema = {
            'type': 'struct',
            'fields': [
                {'field': 'content', 'type': 'string', 'optional': False},
                {'field': ts_fieldname, 'name': 'org.apache.kafka.connect.data.Timestamp', 'type': 'int64', 'version': 1, 'optional': True},
            ],
            'optional': False
        }

        for msg in consumer.messages_consumed[1]:
            (ts_info, value) = msg.split('\t')

            assert ts_info.startswith('CreateTime:')
            ts = int(ts_info[len('CreateTime:'):])

            obj = json.loads(value)
            assert obj['schema'] == expected_schema
            assert obj['payload']['content'] in self.FIRST_INPUT_LIST
            assert obj['payload'][ts_fieldname] == ts

    @cluster(num_nodes=5)
    @parametrize(broker_version=str(DEV_BRANCH), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='sessioned')
    @parametrize(broker_version=str(LATEST_0_11_0), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='sessioned')
    @parametrize(broker_version=str(LATEST_0_10_2), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='sessioned')
    @parametrize(broker_version=str(LATEST_0_10_1), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='sessioned')
    @parametrize(broker_version=str(LATEST_0_10_0), auto_create_topics=True, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='sessioned')
    @parametrize(broker_version=str(DEV_BRANCH), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='compatible')
    @parametrize(broker_version=str(LATEST_2_3), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='compatible')
    @parametrize(broker_version=str(LATEST_2_2), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='compatible')
    @parametrize(broker_version=str(LATEST_2_1), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='compatible')
    @parametrize(broker_version=str(LATEST_2_0), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='compatible')
    @parametrize(broker_version=str(LATEST_1_1), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='compatible')
    @parametrize(broker_version=str(LATEST_1_0), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='compatible')
    @parametrize(broker_version=str(LATEST_0_11_0), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='compatible')
    @parametrize(broker_version=str(LATEST_0_10_2), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='compatible')
    @parametrize(broker_version=str(LATEST_0_10_1), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='compatible')
    @parametrize(broker_version=str(LATEST_0_10_0), auto_create_topics=True, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='compatible')
    @parametrize(broker_version=str(DEV_BRANCH), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='eager')
    @parametrize(broker_version=str(LATEST_2_3), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='eager')
    @parametrize(broker_version=str(LATEST_2_2), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='eager')
    @parametrize(broker_version=str(LATEST_2_1), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='eager')
    @parametrize(broker_version=str(LATEST_2_0), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='eager')
    @parametrize(broker_version=str(LATEST_1_1), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='eager')
    @parametrize(broker_version=str(LATEST_1_0), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='eager')
    @parametrize(broker_version=str(LATEST_0_11_0), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='eager')
    @parametrize(broker_version=str(LATEST_0_10_2), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='eager')
    @parametrize(broker_version=str(LATEST_0_10_1), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='eager')
    @parametrize(broker_version=str(LATEST_0_10_0), auto_create_topics=True, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='eager')
    def test_broker_compatibility(self, broker_version, auto_create_topics, security_protocol, connect_protocol):
        """
        Verify that Connect will start up with various broker versions with various configurations.
        When Connect distributed starts up, it either creates internal topics (v0.10.1.0 and after)
        or relies upon the broker to auto-create the topics (v0.10.0.x and before).
        """
        self.CONNECT_PROTOCOL = connect_protocol
        self.setup_services(broker_version=KafkaVersion(broker_version), auto_create_topics=auto_create_topics, security_protocol=security_protocol)
        self.cc.set_configs(lambda node: self.render("connect-distributed.properties", node=node))

        self.cc.start()

        self.logger.info("Creating connectors")
        self._start_connector("connect-file-source.properties")
        self._start_connector("connect-file-sink.properties")

        # Generating data on the source node should generate new records and create new output on the sink node. Timeouts
        # here need to be more generous than they are for standalone mode because a) it takes longer to write configs,
        # do rebalancing of the group, etc, and b) without explicit leave group support, rebalancing takes awhile
        for node in self.cc.nodes:
            node.account.ssh("echo -e -n " + repr(self.FIRST_INPUTS) + " >> " + self.INPUT_FILE)
        wait_until(lambda: self._validate_file_output(self.FIRST_INPUT_LIST), timeout_sec=70, err_msg="Data added to input file was not seen in the output file in a reasonable amount of time.")

    def _validate_file_output(self, input):
        input_set = set(input)
        # Output needs to be collected from all nodes because we can't be sure where the tasks will be scheduled.
        # Between the first and second rounds, we might even end up with half the data on each node.
        output_set = set(itertools.chain(*[
            [line.strip() for line in self._file_contents(node, self.OUTPUT_FILE)] for node in self.cc.nodes
        ]))
        return input_set == output_set

    def _file_contents(self, node, file):
        try:
            # Convert to a list here or the RemoteCommandError may be returned during a call to the generator instead of
            # immediately
            return list(node.account.ssh_capture("cat " + file))
        except RemoteCommandError:
            return []
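`_validate_file_output` unions the sink output collected from every worker before comparing it to the input, since file sink tasks may be scheduled on any node and the data can end up split across them. A toy illustration of why the set comparison still converges (not part of the test):

```
import itertools

node_outputs = [["foo", "bar"], ["baz"]]   # output file contents split across two workers
expected = {"foo", "bar", "baz"}

collected = set(itertools.chain(*node_outputs))
assert collected == expected
```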
Some files were not shown because too many files have changed in this diff.