Add km module kafka gateway

leewei
2023-02-14 11:10:58 +08:00
parent 229140f067
commit 7008677947
4398 changed files with 977288 additions and 46204 deletions

7
tests/.gitignore vendored Normal file

@@ -0,0 +1,7 @@
Vagrantfile.local
.idea/
*.pyc
*.ipynb
.DS_Store
.ducktape
results/

16
tests/MANIFEST.in Normal file

@@ -0,0 +1,16 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
recursive-include kafkatest */templates/*

548
tests/README.md Normal file

@@ -0,0 +1,548 @@
System Integration & Performance Testing
========================================
This directory contains Kafka system integration and performance tests.
[ducktape](https://github.com/confluentinc/ducktape) is used to run the tests.
(ducktape is a distributed testing framework that provides a test runner,
a result reporter, and utilities to bring up and tear down services.)
Running tests using docker
--------------------------
Docker containers can be used for running kafka system tests locally.
* Requirements
- Docker 1.12.3 (or higher) is installed and running on the machine.
- Tests require that Kafka, including the system test libs, is built. This can be done by running `./gradlew clean systemTestLibs`
* Run all tests
```
bash tests/docker/run_tests.sh
```
* Run all tests with debug on (warning: this will produce a lot of logs)
```
_DUCKTAPE_OPTIONS="--debug" bash tests/docker/run_tests.sh | tee debug_logs.txt
```
* Run a subset of tests
```
TC_PATHS="tests/kafkatest/tests/streams tests/kafkatest/tests/tools" bash tests/docker/run_tests.sh
```
* Run a specific test file
```
TC_PATHS="tests/kafkatest/tests/client/pluggable_test.py" bash tests/docker/run_tests.sh
```
* Run a specific test class
```
TC_PATHS="tests/kafkatest/tests/client/pluggable_test.py::PluggableConsumerTest" bash tests/docker/run_tests.sh
```
* Run a specific test method
```
TC_PATHS="tests/kafkatest/tests/client/pluggable_test.py::PluggableConsumerTest.test_start_stop" bash tests/docker/run_tests.sh
```
* Run tests with a different JVM
```
bash tests/docker/ducker-ak up -j 'openjdk:11'; tests/docker/run_tests.sh
```
* Notes
- The script that runs the tests creates and destroys a docker network named *knw*.
This network can't be used for any other purpose.
- The docker containers are named knode01, knode02 etc.
These nodes can't be used for any other purpose.
* Exposing ports using the --expose-ports option of the `ducker-ak up` command
If `--expose-ports` is specified, we will expose those ports on random ephemeral ports
on the host. The argument can be a single port (like 5005), a port range (like 5005-5009),
or a comma-separated combination of ports and port ranges (like 2181,9092 or 2181,5005-5008).
By default, no ports are exposed.
The exposed port mappings can be seen by executing the `docker ps` command. The PORTS column
of the output shows each mapping like this (here port 33891 on the host maps to port 2182 in the container):
0.0.0.0:33891->2182/tcp
Behind the scenes, Docker sets up a DNAT rule for each mapping, which is visible in
the DOCKER section of the iptables output (`sudo iptables -t nat -L -n`), something like:
<pre>DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:33882 to:172.22.0.2:9092</pre>
The exposed port(s) are useful for attaching a remote debugger to a process running
in the docker image. For example, if port 5005 was exposed and is mapped to an ephemeral
port (say 33891), then a debugger attaching to port 33891 on the host will connect to
the debug session started at port 5005 in the docker image. As an example, for the above port
numbers, run the following commands in the docker image (say by ssh using `./docker/ducker-ak ssh ducker02`):
> $ export KAFKA_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005"
> $ /opt/kafka-dev/bin/kafka-topics.sh --bootstrap-server ducker03:9095 --topic __consumer_offsets --describe
This will run the TopicCommand to describe the __consumer_offsets topic. The java process
will stop and wait for a debugger to attach, since the `suspend=y` option was specified. Now starting
a debugger on the host against `localhost` with the following JVM setting:
`-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=33891`
will attach it to the TopicCommand process running in the docker image. A condensed sketch of this
workflow is shown below.
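Putting the steps above together, a minimal sketch of the whole workflow might look like this (the node name and port numbers are just the illustrative values used above, and the `docker ps`/`jdb` invocations are assumptions about your local tooling):
```bash
# bring up the cluster with the debug port exposed on every node
bash tests/docker/ducker-ak up --expose-ports 5005
# find the ephemeral host port that 5005 was mapped to on the node of interest
docker ps --format '{{.Names}}: {{.Ports}}' | grep ducker02
# attach a debugger on the host to the mapped port, e.g. with jdb
jdb -attach localhost:33891
```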
Examining CI run
----------------
* Set BUILD_ID to the Travis CI build id. E.g. the build id is 169519874 for the following build
```bash
https://travis-ci.org/apache/kafka/builds/169519874
```
* Getting the number of tests that were actually run
```bash
for id in $(curl -sSL https://api.travis-ci.org/builds/$BUILD_ID | jq '.matrix|map(.id)|.[]'); do curl -sSL "https://api.travis-ci.org/jobs/$id/log.txt?deansi=true" ; done | grep -cE 'RunnerClient: Loading test'
```
* Getting the number of tests that passed
```bash
for id in $(curl -sSL https://api.travis-ci.org/builds/$BUILD_ID | jq '.matrix|map(.id)|.[]'); do curl -sSL "https://api.travis-ci.org/jobs/$id/log.txt?deansi=true" ; done | grep -cE 'RunnerClient.*PASS'
```
* Getting all the logs produced from a run
```bash
for id in $(curl -sSL https://api.travis-ci.org/builds/$BUILD_ID | jq '.matrix|map(.id)|.[]'); do curl -sSL "https://api.travis-ci.org/jobs/$id/log.txt?deansi=true" ; done
```
* Explanation of the curl calls to travis-ci and the jq commands
- We get JSON information about the build using the following command
```bash
curl -sSL https://api.travis-ci.org/apache/kafka/builds/169519874
```
This produces JSON describing the build, which looks like:
```json
{
"id": 169519874,
"repository_id": 6097916,
"number": "19",
"config": {
"sudo": "required",
"dist": "trusty",
"language": "java",
"env": [
"TC_PATHS=\"tests/kafkatest/tests/client\"",
"TC_PATHS=\"tests/kafkatest/tests/connect tests/kafkatest/tests/streams tests/kafkatest/tests/tools\"",
"TC_PATHS=\"tests/kafkatest/tests/mirror_maker\"",
"TC_PATHS=\"tests/kafkatest/tests/replication\"",
"TC_PATHS=\"tests/kafkatest/tests/upgrade\"",
"TC_PATHS=\"tests/kafkatest/tests/security\"",
"TC_PATHS=\"tests/kafkatest/tests/core\""
],
"jdk": [
"oraclejdk8"
],
"before_install": null,
"script": [
"./gradlew systemTestLibs && /bin/bash ./tests/travis/run_tests.sh"
],
"services": [
"docker"
],
"before_cache": [
"rm -f $HOME/.gradle/caches/modules-2/modules-2.lock",
"rm -fr $HOME/.gradle/caches/*/plugin-resolution/"
],
"cache": {
"directories": [
"$HOME/.m2/repository",
"$HOME/.gradle/caches/",
"$HOME/.gradle/wrapper/"
]
},
".result": "configured",
"group": "stable"
},
"state": "finished",
"result": null,
"status": null,
"started_at": "2016-10-21T13:35:43Z",
"finished_at": "2016-10-21T14:46:03Z",
"duration": 16514,
"commit": "7e583d9ea08c70dbbe35a3adde72ed203a797f64",
"branch": "trunk",
"message": "respect _DUCK_OPTIONS",
"committed_at": "2016-10-21T00:12:36Z",
"author_name": "Raghav Kumar Gautam",
"author_email": "raghav@apache.org",
"committer_name": "Raghav Kumar Gautam",
"committer_email": "raghav@apache.org",
"compare_url": "https://github.com/raghavgautam/kafka/compare/cc788ac99ca7...7e583d9ea08c",
"event_type": "push",
"matrix": [
{
"id": 169519875,
"repository_id": 6097916,
"number": "19.1",
"config": {
"sudo": "required",
"dist": "trusty",
"language": "java",
"env": "TC_PATHS=\"tests/kafkatest/tests/client\"",
"jdk": "oraclejdk8",
"before_install": null,
"script": [
"./gradlew systemTestLibs && /bin/bash ./tests/travis/run_tests.sh"
],
"services": [
"docker"
],
"before_cache": [
"rm -f $HOME/.gradle/caches/modules-2/modules-2.lock",
"rm -fr $HOME/.gradle/caches/*/plugin-resolution/"
],
"cache": {
"directories": [
"$HOME/.m2/repository",
"$HOME/.gradle/caches/",
"$HOME/.gradle/wrapper/"
]
},
".result": "configured",
"group": "stable",
"os": "linux"
},
"result": null,
"started_at": "2016-10-21T13:35:43Z",
"finished_at": "2016-10-21T14:24:50Z",
"allow_failure": false
},
{
"id": 169519876,
"repository_id": 6097916,
"number": "19.2",
"config": {
"sudo": "required",
"dist": "trusty",
"language": "java",
"env": "TC_PATHS=\"tests/kafkatest/tests/connect tests/kafkatest/tests/streams tests/kafkatest/tests/tools\"",
"jdk": "oraclejdk8",
"before_install": null,
"script": [
"./gradlew systemTestLibs && /bin/bash ./tests/travis/run_tests.sh"
],
"services": [
"docker"
],
"before_cache": [
"rm -f $HOME/.gradle/caches/modules-2/modules-2.lock",
"rm -fr $HOME/.gradle/caches/*/plugin-resolution/"
],
"cache": {
"directories": [
"$HOME/.m2/repository",
"$HOME/.gradle/caches/",
"$HOME/.gradle/wrapper/"
]
},
".result": "configured",
"group": "stable",
"os": "linux"
},
"result": 1,
"started_at": "2016-10-21T13:35:46Z",
"finished_at": "2016-10-21T14:22:05Z",
"allow_failure": false
},
...
]
}
```
- By passing this through the jq filter `.matrix` we extract the matrix part of the JSON
```bash
curl -sSL https://api.travis-ci.org/apache/kafka/builds/169519874 | jq '.matrix'
```
The resulting JSON looks like:
```json
[
{
"id": 169519875,
"repository_id": 6097916,
"number": "19.1",
"config": {
"sudo": "required",
"dist": "trusty",
"language": "java",
"env": "TC_PATHS=\"tests/kafkatest/tests/client\"",
"jdk": "oraclejdk8",
"before_install": null,
"script": [
"./gradlew systemTestLibs && /bin/bash ./tests/travis/run_tests.sh"
],
"services": [
"docker"
],
"before_cache": [
"rm -f $HOME/.gradle/caches/modules-2/modules-2.lock",
"rm -fr $HOME/.gradle/caches/*/plugin-resolution/"
],
"cache": {
"directories": [
"$HOME/.m2/repository",
"$HOME/.gradle/caches/",
"$HOME/.gradle/wrapper/"
]
},
".result": "configured",
"group": "stable",
"os": "linux"
},
"result": null,
"started_at": "2016-10-21T13:35:43Z",
"finished_at": "2016-10-21T14:24:50Z",
"allow_failure": false
},
{
"id": 169519876,
"repository_id": 6097916,
"number": "19.2",
"config": {
"sudo": "required",
"dist": "trusty",
"language": "java",
"env": "TC_PATHS=\"tests/kafkatest/tests/connect tests/kafkatest/tests/streams tests/kafkatest/tests/tools\"",
"jdk": "oraclejdk8",
"before_install": null,
"script": [
"./gradlew systemTestLibs && /bin/bash ./tests/travis/run_tests.sh"
],
"services": [
"docker"
],
"before_cache": [
"rm -f $HOME/.gradle/caches/modules-2/modules-2.lock",
"rm -fr $HOME/.gradle/caches/*/plugin-resolution/"
],
"cache": {
"directories": [
"$HOME/.m2/repository",
"$HOME/.gradle/caches/",
"$HOME/.gradle/wrapper/"
]
},
".result": "configured",
"group": "stable",
"os": "linux"
},
"result": 1,
"started_at": "2016-10-21T13:35:46Z",
"finished_at": "2016-10-21T14:22:05Z",
"allow_failure": false
},
...
]
```
- By further passing this through the jq filter `map(.id)` we extract the ids of
the builds for each of the splits
```bash
curl -sSL https://api.travis-ci.org/apache/kafka/builds/169519874 | jq '.matrix|map(.id)'
```
The resulting JSON looks like:
```json
[
169519875,
169519876,
169519877,
169519878,
169519879,
169519880,
169519881
]
```
- To use these ids in a for loop, we want to get rid of the `[]`, which is done by
passing the result through the `.[]` filter
```bash
curl -sSL https://api.travis-ci.org/apache/kafka/builds/169519874 | jq '.matrix|map(.id)|.[]'
```
And we get
```text
169519875
169519876
169519877
169519878
169519879
169519880
169519881
```
- In the for loop, we make calls like the following to fetch the logs
```bash
curl -sSL "https://api.travis-ci.org/jobs/169519875/log.txt?deansi=true" | tail
```
which gives us
```text
[INFO:2016-10-21 14:21:12,538]: SerialTestRunner: kafkatest.tests.client.consumer_test.OffsetValidationTest.test_consumer_bounce.clean_shutdown=False.bounce_mode=rolling: test 16 of 28
[INFO:2016-10-21 14:21:12,538]: SerialTestRunner: kafkatest.tests.client.consumer_test.OffsetValidationTest.test_consumer_bounce.clean_shutdown=False.bounce_mode=rolling: setting up
[INFO:2016-10-21 14:21:30,810]: SerialTestRunner: kafkatest.tests.client.consumer_test.OffsetValidationTest.test_consumer_bounce.clean_shutdown=False.bounce_mode=rolling: running
[INFO:2016-10-21 14:24:35,519]: SerialTestRunner: kafkatest.tests.client.consumer_test.OffsetValidationTest.test_consumer_bounce.clean_shutdown=False.bounce_mode=rolling: PASS
[INFO:2016-10-21 14:24:35,519]: SerialTestRunner: kafkatest.tests.client.consumer_test.OffsetValidationTest.test_consumer_bounce.clean_shutdown=False.bounce_mode=rolling: tearing down
The job exceeded the maximum time limit for jobs, and has been terminated.
```
* Links
- [Travis-CI REST api documentation](https://docs.travis-ci.com/api)
- [jq Manual](https://stedolan.github.io/jq/manual/)
Local Quickstart
----------------
This quickstart will help you run the Kafka system tests on your local machine. Note this requires bringing up a cluster of virtual machines on your local computer, which is memory intensive; it currently requires around 10G RAM.
For a tutorial on how to setup and run the Kafka system tests, see
https://cwiki.apache.org/confluence/display/KAFKA/tutorial+-+set+up+and+run+Kafka+system+tests+with+ducktape
* Install Virtual Box from [https://www.virtualbox.org/](https://www.virtualbox.org/) (run `$ vboxmanage --version` to check if it's installed).
* Install Vagrant >= 1.6.4 from [https://www.vagrantup.com/](https://www.vagrantup.com/) (run `vagrant --version` to check if it's installed).
* Install system test dependencies, including ducktape, a command-line tool and library for testing distributed systems. We recommend using a virtual environment for system test development
$ cd kafka/tests
$ virtualenv venv
$ . ./venv/bin/activate
$ python setup.py develop
$ cd .. # back to base kafka directory
* Run the bootstrap script to set up Vagrant for testing
$ tests/bootstrap-test-env.sh
* Bring up the test cluster
$ vagrant/vagrant-up.sh
$ # When using Virtualbox, it also works to run: vagrant up
* Build the desired branch of Kafka
$ git checkout $BRANCH
$ gradle # (only if necessary)
$ ./gradlew systemTestLibs
* Run the system tests using ducktape:
$ ducktape tests/kafkatest/tests
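* To run a narrower selection, ducktape accepts the same file/class/method selectors shown in the docker examples above; for instance (an illustration only, using a test that exists in this tree):
$ ducktape tests/kafkatest/tests/client/pluggable_test.py::PluggableConsumerTest.test_start_stop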
EC2 Quickstart
--------------
This quickstart will help you run the Kafka system tests on EC2. In this setup, all logic is run
on EC2 and none on your local machine.
There are a lot of steps here, but the basic goals are to create one distinguished EC2 instance that
will be our "test driver", and to set up the security groups and iam role so that the test driver
can create, destroy, and run ssh commands on any number of "workers".
As a convention, we'll use "kafkatest" in most names, but you can use whatever name you want.
Preparation
-----------
In these steps, we will create an IAM role which has permission to create and destroy EC2 instances,
set up a keypair used for ssh access to the test driver and worker machines, and create a security group to allow the test driver and workers to all communicate via TCP.
* [Create an IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user.html). We'll give this role the ability to launch or kill additional EC2 machines.
- Create role "kafkatest-master"
- Role type: Amazon EC2
- Attach policy: AmazonEC2FullAccess (this will allow our test-driver to create and destroy EC2 instances)
* If you haven't already, [set up a keypair to use for SSH access](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html). For the purpose
of this quickstart, let's say the keypair name is kafkatest, and you've saved the private key in kafkatest.pem
* Next, create a EC2 security group called "kafkatest".
- After creating the group, inbound rules: allow SSH on port 22 from anywhere; also, allow access on all ports (0-65535) from other machines in the kafkatest group.
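If you prefer the AWS CLI over the console for the security group step, a rough sketch follows (the group name mirrors the convention above; this assumes a default VPC, and you may need to adjust region, VPC, and CIDR for your setup):
$ aws ec2 create-security-group --group-name kafkatest --description "Kafka system test nodes"
$ aws ec2 authorize-security-group-ingress --group-name kafkatest --protocol tcp --port 22 --cidr 0.0.0.0/0
$ aws ec2 authorize-security-group-ingress --group-name kafkatest --protocol tcp --port 0-65535 --source-group kafkatest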
Create the Test Driver
----------------------
* Launch a new test driver machine
- OS: Ubuntu server is recommended
- Instance type: t2.medium is more than enough, since this machine is just a driver
- Instance details: Most defaults are fine.
- IAM role -> kafkatest-master
- Tagging the instance with a useful name is recommended.
- Security group -> 'kafkatest'
* Once the machine is started, upload the SSH key to your test driver:
$ scp -i /path/to/kafkatest.pem \
/path/to/kafkatest.pem ubuntu@public.hostname.amazonaws.com:kafkatest.pem
* Grab the public hostname/IP (available for example by navigating to your EC2 dashboard and viewing running instances) of your test driver and SSH into it:
$ ssh -i /path/to/kafkatest.pem ubuntu@public.hostname.amazonaws.com
Set Up the Test Driver
----------------------
The following steps assume you have ssh'd into
the test driver machine.
* Start by making sure you're up to date, and install git and ducktape:
$ sudo apt-get update && sudo apt-get -y upgrade && sudo apt-get install -y python-pip git
$ pip install ducktape
* Get Kafka:
$ git clone https://git-wip-us.apache.org/repos/asf/kafka.git kafka
* Update your AWS credentials:
export AWS_IAM_ROLE=$(curl -s http://169.254.169.254/latest/meta-data/iam/info | grep InstanceProfileArn | cut -d '"' -f 4 | cut -d '/' -f 2)
export AWS_ACCESS_KEY=$(curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/$AWS_IAM_ROLE | grep AccessKeyId | awk -F\" '{ print $4 }')
export AWS_SECRET_KEY=$(curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/$AWS_IAM_ROLE | grep SecretAccessKey | awk -F\" '{ print $4 }')
export AWS_SESSION_TOKEN=$(curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/$AWS_IAM_ROLE | grep Token | awk -F\" '{ print $4 }')
* Install some dependencies:
$ cd kafka
$ ./vagrant/aws/aws-init.sh
$ . ~/.bashrc
* An example Vagrantfile.local has been created by aws-init.sh which looks something like:
# Vagrantfile.local
ec2_instance_type = "..." # Pick something appropriate for your
# test. Note that the default m3.medium has
# a small disk.
ec2_spot_max_price = "0.123" # On-demand price for instance type
enable_hostmanager = false
num_zookeepers = 0
num_kafka = 0
num_workers = 9
ec2_keypair_name = 'kafkatest'
ec2_keypair_file = '/home/ubuntu/kafkatest.pem'
ec2_security_groups = ['kafkatest']
ec2_region = 'us-west-2'
ec2_ami = "ami-29ebb519"
* Start up the instances:
# This will bring up worker machines in small parallel batches
$ vagrant/vagrant-up.sh --aws
* Now you should be able to run tests:
$ cd kafka/tests
$ ducktape kafkatest/tests
* Update Worker VM
If you change code in a branch on your driver VM, you need to update your worker VM to pick up this change:
$ ./gradlew systemTestLibs
$ vagrant rsync
* To halt your workers without destroying persistent state, run `vagrant halt`. Run `vagrant destroy -f` to destroy all traces of your workers.
Unit Tests
----------
The system tests have unit tests! The various services in the python `kafkatest` module are reasonably complex, and intended to be reusable. Hence we have unit tests
for the system service classes.
Where are the unit tests?
* The kafkatest unit tests are located under kafka/tests/unit
How do I run the unit tests?
* cd kafka/tests # The base system test directory
* python setup.py test
How can I add a unit test?
* Follow the naming conventions - module name starts with "check", class name begins with "Check", test method name begins with "check"
* These naming conventions are defined in "setup.cfg". We use "check" to distinguish unit tests from system tests, which use "test" in the various names.


@@ -0,0 +1,41 @@
#!/usr/bin/env python
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import json
import sys
import time
#
# This is an example of an external script which can be run through Trogdor's
# ExternalCommandWorker. It sleeps for the given amount of time expressed by the delayMs field in the ExternalCommandSpec
#
if __name__ == '__main__':
# Read the ExternalCommandWorker start message.
line = sys.stdin.readline()
start_message = json.loads(line)
workload = start_message["workload"]
print("Starting external_trogdor_command_example with task id %s, workload %s"
% (start_message["id"], workload))
sys.stdout.flush()
# pretend to start some workload
print(json.dumps({"status": "running"}))
sys.stdout.flush()
time.sleep(0.001 * workload["delayMs"])
print(json.dumps({"status": "exiting after %s delayMs" % workload["delayMs"]}))
sys.stdout.flush()

78
tests/bin/flatten_html.sh Executable file

@@ -0,0 +1,78 @@
#!/bin/bash
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
usage() {
cat <<EOF
flatten_html.sh: This script "flattens" an HTML file by inlining all
files included via "#include virtual". This is useful when making
changes to the Kafka documentation files.
Typical usage:
./gradlew docsJar
./tests/bin/flatten_html.sh -f ./docs/protocol.html > /tmp/my-protocol.html
firefox /tmp/my-protocol.html &
usage:
$0 [flags]
flags:
-f [filename] The HTML file to process.
-h Print this help message.
EOF
}
die() {
echo $@
exit 1
}
realpath() {
[[ $1 = /* ]] && echo "$1" || echo "$PWD/${1#./}"
}
process_file() {
local CUR_FILE="${1}"
[[ -f "${CUR_FILE}" ]] || die "Unable to open input file ${CUR_FILE}"
while IFS= read -r LINE; do
if [[ $LINE =~ \#include\ virtual=\"(.*)\" ]]; then
local INCLUDED_FILE="${BASH_REMATCH[1]}"
if [[ $INCLUDED_FILE =~ ../includes/ ]]; then
: # ignore ../includes
else
pushd "$(dirname "${CUR_FILE}")" &> /dev/null \
|| die "failed to change directory to directory of ${CUR_FILE}"
process_file "${INCLUDED_FILE}"
popd &> /dev/null
fi
else
echo "${LINE}"
fi
done < "${CUR_FILE}"
}
FILE=""
while getopts "f:h" arg; do
case $arg in
f) FILE=$OPTARG;;
h) usage; exit 0;;
*) echo "Error parsing command-line arguments."
usage
exit 1;;
esac
done
[[ -z "${FILE}" ]] && die "You must specify which file to process. -h for help."
process_file "${FILE}"

87
tests/bootstrap-test-env.sh Executable file

@@ -0,0 +1,87 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This script automates the process of setting up a local machine for running Kafka system tests
export GREP_OPTIONS='--color=never'
# Helper function which prints version numbers so they can be compared lexically or numerically
function version { echo "$@" | awk -F. '{ printf("%03d%03d%03d%03d\n", $1,$2,$3,$4); }'; }
base_dir=`dirname $0`/..
cd $base_dir
echo "Checking Virtual Box installation..."
bad_vb=false
if [ -z `vboxmanage --version` ]; then
echo "It appears that Virtual Box is not installed. Please install and try again (see https://www.virtualbox.org/ for details)"
bad_vb=true
else
echo "Virtual Box looks good."
fi
echo "Checking Vagrant installation..."
vagrant_version=`vagrant --version | egrep -o "[0-9]+\.[0-9]+\.[0-9]+"`
bad_vagrant=false
if [ "$(version $vagrant_version)" -lt "$(version 1.6.4)" ]; then
echo "Found Vagrant version $vagrant_version. Please upgrade to 1.6.4 or higher (see https://www.vagrantup.com for details)"
bad_vagrant=true
else
echo "Vagrant installation looks good."
fi
if [ "x$bad_vagrant" == "xtrue" -o "x$bad_vb" == "xtrue" ]; then
exit 1
fi
echo "Checking for necessary Vagrant plugins..."
hostmanager_version=`vagrant plugin list | grep vagrant-hostmanager | egrep -o "[0-9]+\.[0-9]+\.[0-9]+"`
if [ -z "$hostmanager_version" ]; then
vagrant plugin install vagrant-hostmanager
fi
echo "Creating and packaging a reusable base box for Vagrant..."
vagrant/package-base-box.sh
# Set up Vagrantfile.local if necessary
if [ ! -e Vagrantfile.local ]; then
echo "Creating Vagrantfile.local..."
cp vagrant/system-test-Vagrantfile.local Vagrantfile.local
else
echo "Found an existing Vagrantfile.local. Keeping without overwriting..."
fi
# Sanity check contents of Vagrantfile.local
echo "Checking Vagrantfile.local..."
vagrantfile_ok=true
num_brokers=`egrep -o "num_brokers\s*=\s*[0-9]+" Vagrantfile.local | cut -d '=' -f 2 | xargs`
num_zookeepers=`egrep -o "num_zookeepers\s*=\s*[0-9]+" Vagrantfile.local | cut -d '=' -f 2 | xargs`
num_workers=`egrep -o "num_workers\s*=\s*[0-9]+" Vagrantfile.local | cut -d '=' -f 2 | xargs`
if [ "x$num_brokers" == "x" -o "$num_brokers" != 0 ]; then
echo "Vagrantfile.local: bad num_brokers. Update to: num_brokers = 0"
vagrantfile_ok=false
fi
if [ "x$num_zookeepers" == "x" -o "$num_zookeepers" != 0 ]; then
echo "Vagrantfile.local: bad num_zookeepers. Update to: num_zookeepers = 0"
vagrantfile_ok=false
fi
if [ "x$num_workers" == "x" -o "$num_workers" == 0 ]; then
echo "Vagrantfile.local: bad num_workers (size of test cluster). Set num_workers high enough to run your tests."
vagrantfile_ok=false
fi
if [ "$vagrantfile_ok" == "true" ]; then
echo "Vagrantfile.local looks good."
fi

87
tests/docker/Dockerfile Normal file

@@ -0,0 +1,87 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
ARG jdk_version=openjdk:8
FROM $jdk_version
MAINTAINER Apache Kafka dev@kafka.apache.org
VOLUME ["/opt/kafka-dev"]
# Set the timezone.
ENV TZ="/usr/share/zoneinfo/America/Los_Angeles"
# Do not ask for confirmations when running apt-get, etc.
ENV DEBIAN_FRONTEND noninteractive
# Set the ducker.creator label so that we know that this is a ducker image. This will make it
# visible to 'ducker purge'. The ducker.creator label also lets us know what UNIX user built this
# image.
ARG ducker_creator=default
LABEL ducker.creator=$ducker_creator
# Update Linux and install necessary utilities.
RUN apt update && apt install -y sudo netcat iptables rsync unzip wget curl jq coreutils openssh-server net-tools vim python-pip python-dev libffi-dev libssl-dev cmake pkg-config libfuse-dev iperf traceroute && apt-get -y clean
RUN python -m pip install -U pip==9.0.3;
RUN pip install --upgrade cffi virtualenv pyasn1 boto3 pycrypto pywinrm ipaddress enum34 && pip install --upgrade ducktape==0.7.6
# Set up ssh
COPY ./ssh-config /root/.ssh/config
# NOTE: The paramiko library supports the PEM-format private key, but does not support the RFC4716 format.
RUN ssh-keygen -m PEM -q -t rsa -N '' -f /root/.ssh/id_rsa && cp -f /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
RUN echo 'PermitUserEnvironment yes' >> /etc/ssh/sshd_config
# Install binary test dependencies.
# we use the same versions as in vagrant/base.sh
ARG KAFKA_MIRROR="https://s3-us-west-2.amazonaws.com/kafka-packages"
RUN mkdir -p "/opt/kafka-0.8.2.2" && chmod a+rw /opt/kafka-0.8.2.2 && curl -s "$KAFKA_MIRROR/kafka_2.11-0.8.2.2.tgz" | tar xz --strip-components=1 -C "/opt/kafka-0.8.2.2"
RUN mkdir -p "/opt/kafka-0.9.0.1" && chmod a+rw /opt/kafka-0.9.0.1 && curl -s "$KAFKA_MIRROR/kafka_2.11-0.9.0.1.tgz" | tar xz --strip-components=1 -C "/opt/kafka-0.9.0.1"
RUN mkdir -p "/opt/kafka-0.10.0.1" && chmod a+rw /opt/kafka-0.10.0.1 && curl -s "$KAFKA_MIRROR/kafka_2.11-0.10.0.1.tgz" | tar xz --strip-components=1 -C "/opt/kafka-0.10.0.1"
RUN mkdir -p "/opt/kafka-0.10.1.1" && chmod a+rw /opt/kafka-0.10.1.1 && curl -s "$KAFKA_MIRROR/kafka_2.11-0.10.1.1.tgz" | tar xz --strip-components=1 -C "/opt/kafka-0.10.1.1"
RUN mkdir -p "/opt/kafka-0.10.2.2" && chmod a+rw /opt/kafka-0.10.2.2 && curl -s "$KAFKA_MIRROR/kafka_2.11-0.10.2.2.tgz" | tar xz --strip-components=1 -C "/opt/kafka-0.10.2.2"
RUN mkdir -p "/opt/kafka-0.11.0.3" && chmod a+rw /opt/kafka-0.11.0.3 && curl -s "$KAFKA_MIRROR/kafka_2.11-0.11.0.3.tgz" | tar xz --strip-components=1 -C "/opt/kafka-0.11.0.3"
RUN mkdir -p "/opt/kafka-1.0.2" && chmod a+rw /opt/kafka-1.0.2 && curl -s "$KAFKA_MIRROR/kafka_2.11-1.0.2.tgz" | tar xz --strip-components=1 -C "/opt/kafka-1.0.2"
RUN mkdir -p "/opt/kafka-1.1.1" && chmod a+rw /opt/kafka-1.1.1 && curl -s "$KAFKA_MIRROR/kafka_2.11-1.1.1.tgz" | tar xz --strip-components=1 -C "/opt/kafka-1.1.1"
RUN mkdir -p "/opt/kafka-2.0.1" && chmod a+rw /opt/kafka-2.0.1 && curl -s "$KAFKA_MIRROR/kafka_2.12-2.0.1.tgz" | tar xz --strip-components=1 -C "/opt/kafka-2.0.1"
RUN mkdir -p "/opt/kafka-2.1.1" && chmod a+rw /opt/kafka-2.1.1 && curl -s "$KAFKA_MIRROR/kafka_2.12-2.1.1.tgz" | tar xz --strip-components=1 -C "/opt/kafka-2.1.1"
RUN mkdir -p "/opt/kafka-2.2.2" && chmod a+rw /opt/kafka-2.2.2 && curl -s "$KAFKA_MIRROR/kafka_2.12-2.2.2.tgz" | tar xz --strip-components=1 -C "/opt/kafka-2.2.2"
RUN mkdir -p "/opt/kafka-2.3.1" && chmod a+rw /opt/kafka-2.3.1 && curl -s "$KAFKA_MIRROR/kafka_2.12-2.3.1.tgz" | tar xz --strip-components=1 -C "/opt/kafka-2.3.1"
RUN mkdir -p "/opt/kafka-2.4.0" && chmod a+rw /opt/kafka-2.4.0 && curl -s "$KAFKA_MIRROR/kafka_2.12-2.4.0.tgz" | tar xz --strip-components=1 -C "/opt/kafka-2.4.0"
# Streams test dependencies
RUN curl -s "$KAFKA_MIRROR/kafka-streams-0.10.0.1-test.jar" -o /opt/kafka-0.10.0.1/libs/kafka-streams-0.10.0.1-test.jar
RUN curl -s "$KAFKA_MIRROR/kafka-streams-0.10.1.1-test.jar" -o /opt/kafka-0.10.1.1/libs/kafka-streams-0.10.1.1-test.jar
RUN curl -s "$KAFKA_MIRROR/kafka-streams-0.10.2.2-test.jar" -o /opt/kafka-0.10.2.2/libs/kafka-streams-0.10.2.2-test.jar
RUN curl -s "$KAFKA_MIRROR/kafka-streams-0.11.0.3-test.jar" -o /opt/kafka-0.11.0.3/libs/kafka-streams-0.11.0.3-test.jar
RUN curl -s "$KAFKA_MIRROR/kafka-streams-1.0.2-test.jar" -o /opt/kafka-1.0.2/libs/kafka-streams-1.0.2-test.jar
RUN curl -s "$KAFKA_MIRROR/kafka-streams-1.1.1-test.jar" -o /opt/kafka-1.1.1/libs/kafka-streams-1.1.1-test.jar
RUN curl -s "$KAFKA_MIRROR/kafka-streams-2.0.1-test.jar" -o /opt/kafka-2.0.1/libs/kafka-streams-2.0.1-test.jar
RUN curl -s "$KAFKA_MIRROR/kafka-streams-2.1.1-test.jar" -o /opt/kafka-2.1.1/libs/kafka-streams-2.1.1-test.jar
RUN curl -s "$KAFKA_MIRROR/kafka-streams-2.2.2-test.jar" -o /opt/kafka-2.2.2/libs/kafka-streams-2.2.2-test.jar
RUN curl -s "$KAFKA_MIRROR/kafka-streams-2.3.1-test.jar" -o /opt/kafka-2.3.1/libs/kafka-streams-2.3.1-test.jar
RUN curl -s "$KAFKA_MIRROR/kafka-streams-2.4.0-test.jar" -o /opt/kafka-2.4.0/libs/kafka-streams-2.4.0-test.jar
# The version of Kibosh to use for testing.
# If you update this, also update vagrant/base.sh
ARG KIBOSH_VERSION="8841dd392e6fbf02986e2fb1f1ebf04df344b65a"
# Install Kibosh
RUN apt-get install fuse
RUN cd /opt && git clone -q https://github.com/confluentinc/kibosh.git && cd "/opt/kibosh" && git reset --hard $KIBOSH_VERSION && mkdir "/opt/kibosh/build" && cd "/opt/kibosh/build" && ../configure && make -j 2
# Set up the ducker user.
RUN useradd -ms /bin/bash ducker && mkdir -p /home/ducker/ && rsync -aiq /root/.ssh/ /home/ducker/.ssh && chown -R ducker /home/ducker/ /mnt/ /var/log/ && echo "PATH=$(runuser -l ducker -c 'echo $PATH'):$JAVA_HOME/bin" >> /home/ducker/.ssh/environment && echo 'PATH=$PATH:'"$JAVA_HOME/bin" >> /home/ducker/.profile && echo 'ducker ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers
USER ducker
CMD sudo service ssh start && tail -f /dev/null

580
tests/docker/ducker-ak Executable file

@@ -0,0 +1,580 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Ducker-AK: a tool for running Apache Kafka system tests inside Docker images.
#
# Note: this should be compatible with the version of bash that ships on most
# Macs, bash 3.2.57.
#
script_path="${0}"
# The absolute path to the directory which this script is in. This will also be the directory
# which we run docker build from.
ducker_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
# The absolute path to the root Kafka directory
kafka_dir="$( cd "${ducker_dir}/../.." && pwd )"
# The memory consumption to allow during the docker build.
# This does not include swap.
docker_build_memory_limit="3200m"
# The maximum memory consumption to allow in containers.
docker_run_memory_limit="2000m"
# The default number of cluster nodes to bring up if a number is not specified.
default_num_nodes=14
# The default OpenJDK base image.
default_jdk="openjdk:8"
# The default ducker-ak image name.
default_image_name="ducker-ak"
# Display a usage message on the terminal and exit.
#
# $1: The exit status to use
usage() {
local exit_status="${1}"
cat <<EOF
ducker-ak: a tool for running Apache Kafka tests inside Docker images.
Usage: ${script_path} [command] [options]
help|-h|--help
Display this help message
up [-n|--num-nodes NUM_NODES] [-f|--force] [docker-image] [-j|--jdk JDK_VERSION]
[-C|--custom-ducktape DIR] [-e|--expose-ports ports]
Bring up a cluster with the specified amount of nodes (defaults to ${default_num_nodes}).
The docker image name defaults to ${default_image_name}. If --force is specified, we will
attempt to bring up an image even if some parameters are not valid.
If --custom-ducktape is specified, we will install the provided custom
ducktape source code directory before bringing up the nodes. The provided
directory should be the ducktape git repo, not the ducktape installed module directory.
If --expose-ports is specified, then we will expose those ports to random ephemeral ports
on the host. The argument can be a single port (like 5005), a port range like (5005-5009)
or a combination of port/port-range separated by comma (like 2181,9092 or 2181,5005-5008).
By default no port is exposed. See README.md for more detail on this option.
test [test-name(s)]
Run a test or set of tests inside the currently active Ducker nodes.
For example, to run the system test produce_bench_test, you would run:
./tests/docker/ducker-ak test ./tests/kafkatest/tests/core/produce_bench_test.py
ssh [node-name|user-name@node-name] [command]
Log in to a running ducker container. If node-name is not given, it prints
the names of all running nodes. If node-name is 'all', we will run the
command on every node. If user-name is given, we will try to log in as
that user. Otherwise, we will log in as the 'ducker' user. If a command
is specified, we will run that command. Otherwise, we will provide a login
shell.
down [-q|--quiet] [-f|--force]
Tear down all the currently active ducker-ak nodes. If --quiet is specified,
only error messages are printed. If --force or -f is specified, "docker rm -f"
will be used to remove the nodes, which kills any currently running ducker-ak tests.
purge [-f|--force]
Purge Docker images created by ducker-ak. This will free disk space.
If --force is set, we run 'docker rmi -f'.
EOF
exit "${exit_status}"
}
# Exit with an error message.
die() {
echo $@
exit 1
}
# Check for the presence of certain commands.
#
# $@: The commands to check for. This function will die if any of these commands are not found by
# the 'which' command.
require_commands() {
local cmds="${@}"
for cmd in ${cmds}; do
which -- "${cmd}" &> /dev/null || die "You must install ${cmd} to run this script."
done
}
# Set a global variable to a value.
#
# $1: The variable name to set. This function will die if the variable already has a value. The
# variable will be made readonly to prevent any future modifications.
# $2: The value to set the variable to. This function will die if the value is empty or starts
# with a dash.
# $3: A human-readable description of the variable.
set_once() {
local key="${1}"
local value="${2}"
local what="${3}"
[[ -n "${!key}" ]] && die "Error: more than one value specified for ${what}."
verify_command_line_argument "${value}" "${what}"
# It would be better to use declare -g, but older bash versions don't support it.
export ${key}="${value}"
}
# Verify that a command-line argument is present and does not start with a dash.
#
# $1: The command-line argument to verify.
# $2: A human-readable description of the variable.
verify_command_line_argument() {
local value="${1}"
local what="${2}"
[[ -n "${value}" ]] || die "Error: no value specified for ${what}"
[[ ${value} == -* ]] && die "Error: invalid value ${value} specified for ${what}"
}
# Echo a message if a flag is set.
#
# $1: If this is 1, the message will be echoed.
# $@: The message
maybe_echo() {
local verbose="${1}"
shift
[[ "${verbose}" -eq 1 ]] && echo "${@}"
}
# Counts the number of elements passed to this subroutine.
count() {
echo $#
}
# Push a new directory on to the bash directory stack, or exit with a failure message.
#
# $1: The directory to push on to the directory stack.
must_pushd() {
local target_dir="${1}"
pushd -- "${target_dir}" &> /dev/null || die "failed to change directory to ${target_dir}"
}
# Pop a directory from the bash directory stack, or exit with a failure message.
must_popd() {
popd &> /dev/null || die "failed to popd"
}
# Run a command and die if it fails.
#
# Optional flags:
# -v: print the command before running it.
# -o: display the command output.
# $@: The command to run.
must_do() {
local verbose=0
local output="/dev/null"
while true; do
case ${1} in
-v) verbose=1; shift;;
-o) output="/dev/stdout"; shift;;
*) break;;
esac
done
local cmd="${@}"
[[ "${verbose}" -eq 1 ]] && echo "${cmd}"
${cmd} >${output} || die "${1} failed"
}
# Ask the user a yes/no question.
#
# $1: The prompt to use
# $_return: 0 if the user answered no; 1 if the user answered yes.
ask_yes_no() {
local prompt="${1}"
while true; do
read -r -p "${prompt} " response
case "${response}" in
[yY]|[yY][eE][sS]) _return=1; return;;
[nN]|[nN][oO]) _return=0; return;;
*);;
esac
echo "Please respond 'yes' or 'no'."
echo
done
}
# Build a docker image.
#
# $1: The name of the image to build.
ducker_build() {
local image_name="${1}"
# Use SECONDS, a builtin bash variable that gets incremented each second, to measure the docker
# build duration.
SECONDS=0
must_pushd "${ducker_dir}"
# Tip: if you are scratching your head over dependency problems that refer to an old code version
# (for example java.lang.NoClassDefFoundError), adding the --no-cache flag to the build will give you a clean start.
must_do -v -o docker build --memory="${docker_build_memory_limit}" \
--build-arg "ducker_creator=${user_name}" --build-arg "jdk_version=${jdk_version}" -t "${image_name}" \
-f "${ducker_dir}/Dockerfile" ${docker_args} -- .
docker_status=$?
must_popd
duration="${SECONDS}"
if [[ ${docker_status} -ne 0 ]]; then
die "** ERROR: Failed to build ${what} image after $((${duration} / 60))m \
$((${duration} % 60))s. See ${build_log} for details."
fi
echo "** Successfully built ${what} image in $((${duration} / 60))m \
$((${duration} % 60))s. See ${build_log} for details."
}
docker_run() {
local node=${1}
local image_name=${2}
local ports_option=${3}
local expose_ports=""
if [[ -n ${ports_option} ]]; then
expose_ports="-P"
for expose_port in ${ports_option//,/ }; do
expose_ports="${expose_ports} --expose ${expose_port}"
done
fi
# Invoke docker run. We need privileged mode to be able to run iptables
# and mount FUSE filesystems inside the container.
must_do -v docker run --privileged \
-d -t -h "${node}" --network ducknet "${expose_ports}" \
--memory=${docker_run_memory_limit} --memory-swappiness=1 \
-v "${kafka_dir}:/opt/kafka-dev" --name "${node}" -- "${image_name}"
}
setup_custom_ducktape() {
local custom_ducktape="${1}"
local image_name="${2}"
[[ -f "${custom_ducktape}/ducktape/__init__.py" ]] || \
die "You must supply a valid ducktape directory to --custom-ducktape"
docker_run ducker01 "${image_name}"
local running_container="$(docker ps -f=network=ducknet -q)"
must_do -v -o docker cp "${custom_ducktape}" "${running_container}:/opt/ducktape"
docker exec --user=root ducker01 bash -c 'set -x && cd /opt/kafka-dev/tests && sudo python ./setup.py develop install && cd /opt/ducktape && sudo python ./setup.py develop install'
[[ $? -ne 0 ]] && die "failed to install the new ducktape."
must_do -v -o docker commit ducker01 "${image_name}"
must_do -v docker kill "${running_container}"
must_do -v docker rm ducker01
}
ducker_up() {
require_commands docker
while [[ $# -ge 1 ]]; do
case "${1}" in
-C|--custom-ducktape) set_once custom_ducktape "${2}" "the custom ducktape directory"; shift 2;;
-f|--force) force=1; shift;;
-n|--num-nodes) set_once num_nodes "${2}" "number of nodes"; shift 2;;
-j|--jdk) set_once jdk_version "${2}" "the OpenJDK base image"; shift 2;;
-e|--expose-ports) set_once expose_ports "${2}" "the ports to expose"; shift 2;;
*) set_once image_name "${1}" "docker image name"; shift;;
esac
done
[[ -n "${num_nodes}" ]] || num_nodes="${default_num_nodes}"
[[ -n "${jdk_version}" ]] || jdk_version="${default_jdk}"
[[ -n "${image_name}" ]] || image_name="${default_image_name}-${jdk_version/:/-}"
[[ "${num_nodes}" =~ ^-?[0-9]+$ ]] || \
die "ducker_up: the number of nodes must be an integer."
[[ "${num_nodes}" -gt 0 ]] || die "ducker_up: the number of nodes must be greater than 0."
if [[ "${num_nodes}" -lt 2 ]]; then
if [[ "${force}" -ne 1 ]]; then
echo "ducker_up: It is recommended to run at least 2 nodes, since ducker01 is only \
used to run ducktape itself. If you want to do it anyway, you can use --force to attempt to \
use only ${num_nodes}."
exit 1
fi
fi
docker ps >/dev/null || die "ducker_up: failed to run docker. Please check that the daemon is started."
ducker_build "${image_name}"
docker inspect --format='{{.Config.Labels}}' --type=image "${image_name}" | grep -q 'ducker.type'
local docker_status=${PIPESTATUS[0]}
local grep_status=${PIPESTATUS[1]}
[[ "${docker_status}" -eq 0 ]] || die "ducker_up: failed to inspect image ${image_name}. \
Please check that it exists."
if [[ "${grep_status}" -ne 0 ]]; then
if [[ "${force}" -ne 1 ]]; then
echo "ducker_up: ${image_name} does not appear to be a ducker image. It lacks the \
ducker.type label. If you think this is a mistake, you can use --force to attempt to bring \
it up anyway."
exit 1
fi
fi
local running_containers="$(docker ps -f=network=ducknet -q)"
local num_running_containers=$(count ${running_containers})
if [[ ${num_running_containers} -gt 0 ]]; then
die "ducker_up: there are ${num_running_containers} ducker containers \
running already. Use ducker down to bring down these containers before \
attempting to start new ones."
fi
echo "ducker_up: Bringing up ${image_name} with ${num_nodes} nodes..."
if docker network inspect ducknet &>/dev/null; then
must_do -v docker network rm ducknet
fi
must_do -v docker network create ducknet
if [[ -n "${custom_ducktape}" ]]; then
setup_custom_ducktape "${custom_ducktape}" "${image_name}"
fi
for n in $(seq -f %02g 1 ${num_nodes}); do
local node="ducker${n}"
docker_run "${node}" "${image_name}" "${expose_ports}"
done
mkdir -p "${ducker_dir}/build"
exec 3<> "${ducker_dir}/build/node_hosts"
for n in $(seq -f %02g 1 ${num_nodes}); do
local node="ducker${n}"
docker exec --user=root "${node}" grep "${node}" /etc/hosts >&3
[[ $? -ne 0 ]] && die "failed to find the /etc/hosts entry for ${node}"
done
exec 3>&-
for n in $(seq -f %02g 1 ${num_nodes}); do
local node="ducker${n}"
docker exec --user=root "${node}" \
bash -c "grep -v ${node} /opt/kafka-dev/tests/docker/build/node_hosts >> /etc/hosts"
[[ $? -ne 0 ]] && die "failed to append to the /etc/hosts file on ${node}"
done
echo "ducker_up: added the latest entries to /etc/hosts on each node."
generate_cluster_json_file "${num_nodes}" "${ducker_dir}/build/cluster.json"
echo "ducker_up: successfully wrote ${ducker_dir}/build/cluster.json"
echo "** ducker_up: successfully brought up ${num_nodes} nodes."
}
# Generate the cluster.json file used by ducktape to identify cluster nodes.
#
# $1: The number of cluster nodes.
# $2: The path to write the cluster.json file to.
generate_cluster_json_file() {
local num_nodes="${1}"
local path="${2}"
exec 3<> "${path}"
cat<<EOF >&3
{
"_comment": [
"Licensed to the Apache Software Foundation (ASF) under one or more",
"contributor license agreements. See the NOTICE file distributed with",
"this work for additional information regarding copyright ownership.",
"The ASF licenses this file to You under the Apache License, Version 2.0",
"(the \"License\"); you may not use this file except in compliance with",
"the License. You may obtain a copy of the License at",
"",
"http://www.apache.org/licenses/LICENSE-2.0",
"",
"Unless required by applicable law or agreed to in writing, software",
"distributed under the License is distributed on an \"AS IS\" BASIS,",
"WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.",
"See the License for the specific language governing permissions and",
"limitations under the License."
],
"nodes": [
EOF
for n in $(seq 2 ${num_nodes}); do
if [[ ${n} -eq ${num_nodes} ]]; then
suffix=""
else
suffix=","
fi
local node=$(printf ducker%02d ${n})
cat<<EOF >&3
{
"externally_routable_ip": "${node}",
"ssh_config": {
"host": "${node}",
"hostname": "${node}",
"identityfile": "/home/ducker/.ssh/id_rsa",
"password": "",
"port": 22,
"user": "ducker"
}
}${suffix}
EOF
done
cat<<EOF >&3
]
}
EOF
exec 3>&-
}
ducker_test() {
require_commands docker
docker inspect ducker01 &>/dev/null || \
die "ducker_test: the ducker01 instance appears to be down. Did you run 'ducker up'?"
[[ $# -lt 1 ]] && \
die "ducker_test: you must supply at least one system test to run. Type --help for help."
local args=""
local kafka_test=0
for arg in "${@}"; do
local regex=".*\/kafkatest\/(.*)"
if [[ $arg =~ $regex ]]; then
local kpath=${BASH_REMATCH[1]}
args="${args} ./tests/kafkatest/${kpath}"
else
args="${args} ${arg}"
fi
done
must_pushd "${kafka_dir}"
(test -f ./gradlew || gradle) && ./gradlew systemTestLibs
must_popd
cmd="cd /opt/kafka-dev && ducktape --cluster-file /opt/kafka-dev/tests/docker/build/cluster.json $args"
echo "docker exec ducker01 bash -c \"${cmd}\""
exec docker exec --user=ducker ducker01 bash -c "${cmd}"
}
ducker_ssh() {
require_commands docker
[[ $# -eq 0 ]] && die "ducker_ssh: Please specify a container name to log into. \
Currently active containers: $(echo_running_container_names)"
local node_info="${1}"
shift
local guest_command="$*"
local user_name="ducker"
if [[ "${node_info}" =~ @ ]]; then
user_name="${node_info%%@*}"
local node_name="${node_info##*@}"
else
local node_name="${node_info}"
fi
local docker_flags=""
if [[ -z "${guest_command}" ]]; then
local docker_flags="${docker_flags} -t"
local guest_command_prefix=""
guest_command=bash
else
local guest_command_prefix="bash -c"
fi
if [[ "${node_name}" == "all" ]]; then
local nodes=$(echo_running_container_names)
[[ "${nodes}" == "(none)" ]] && die "ducker_ssh: can't locate any running ducker nodes."
for node in ${nodes}; do
docker exec --user=${user_name} -i ${docker_flags} "${node}" \
${guest_command_prefix} "${guest_command}" || die "docker exec ${node} failed"
done
else
docker inspect --type=container -- "${node_name}" &>/dev/null || \
die "ducker_ssh: can't locate node ${node_name}. Currently running nodes: \
$(echo_running_container_names)"
exec docker exec --user=${user_name} -i ${docker_flags} "${node_name}" \
${guest_command_prefix} "${guest_command}"
fi
}
# Echo all the running Ducker container names, or (none) if there are no running Ducker containers.
echo_running_container_names() {
node_names="$(docker ps -f=network=ducknet -q --format '{{.Names}}' | sort)"
if [[ -z "${node_names}" ]]; then
echo "(none)"
else
echo ${node_names//$'\n'/ }
fi
}
ducker_down() {
require_commands docker
local verbose=1
local force_str=""
while [[ $# -ge 1 ]]; do
case "${1}" in
-q|--quiet) verbose=0; shift;;
-f|--force) force_str="-f"; shift;;
*) die "ducker_down: unexpected command-line argument ${1}";;
esac
done
local running_containers
running_containers="$(docker ps -f=network=ducknet -q)"
[[ $? -eq 0 ]] || die "ducker_down: docker command failed. Is the docker daemon running?"
running_containers=${running_containers//$'\n'/ }
local all_containers="$(docker ps -a -f=network=ducknet -q)"
all_containers=${all_containers//$'\n'/ }
if [[ -z "${all_containers}" ]]; then
maybe_echo "${verbose}" "No ducker containers found."
return
fi
verbose_flag=""
if [[ ${verbose} == 1 ]]; then
verbose_flag="-v"
fi
if [[ -n "${running_containers}" ]]; then
must_do ${verbose_flag} docker kill "${running_containers}"
fi
must_do ${verbose_flag} docker rm ${force_str} "${all_containers}"
must_do ${verbose_flag} -o rm -f -- "${ducker_dir}/build/node_hosts" "${ducker_dir}/build/cluster.json"
if docker network inspect ducknet &>/dev/null; then
must_do -v docker network rm ducknet
fi
maybe_echo "${verbose}" "ducker_down: removed $(count ${all_containers}) containers."
}
ducker_purge() {
require_commands docker
local force_str=""
while [[ $# -ge 1 ]]; do
case "${1}" in
-f|--force) force_str="-f"; shift;;
*) die "ducker_purge: unknown argument ${1}";;
esac
done
echo "** ducker_purge: attempting to locate ducker images to purge"
local images
images=$(docker images -q -a -f label=ducker.creator)
[[ $? -ne 0 ]] && die "docker images command failed"
images=${images//$'\n'/ }
declare -a purge_images=()
if [[ -z "${images}" ]]; then
echo "** ducker_purge: no images found to purge."
exit 0
fi
echo "** ducker_purge: images to delete:"
for image in ${images}; do
echo -n "${image} "
docker inspect --format='{{.Config.Labels}} {{.Created}}' --type=image "${image}"
[[ $? -ne 0 ]] && die "docker inspect ${image} failed"
done
ask_yes_no "Delete these docker images? [y/n]"
[[ "${_return}" -eq 0 ]] && exit 0
must_do -v -o docker rmi ${force_str} ${images}
}
# Parse command-line arguments
[[ $# -lt 1 ]] && usage 0
# Display the help text if -h or --help appears in the command line
for arg in ${@}; do
case "${arg}" in
-h|--help) usage 0;;
--) break;;
*);;
esac
done
action="${1}"
shift
case "${action}" in
help) usage 0;;
up|test|ssh|down|purge)
ducker_${action} "${@}"; exit 0;;
*) echo "Unknown command '${action}'. Type '${script_path} --help' for usage information."
exit 1;;
esac

30
tests/docker/run_tests.sh Executable file

@@ -0,0 +1,30 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
KAFKA_NUM_CONTAINERS=${KAFKA_NUM_CONTAINERS:-14}
TC_PATHS=${TC_PATHS:-./kafkatest/}
die() {
echo $@
exit 1
}
if ${SCRIPT_DIR}/ducker-ak ssh | grep -q '(none)'; then
${SCRIPT_DIR}/ducker-ak up -n "${KAFKA_NUM_CONTAINERS}" || die "ducker-ak up failed"
fi
${SCRIPT_DIR}/ducker-ak test ${TC_PATHS} ${_DUCKTAPE_OPTIONS} || die "ducker-ak test failed"

21
tests/docker/ssh-config Normal file

@@ -0,0 +1,21 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Host *
ControlMaster auto
ControlPath ~/.ssh/master-%r@%h:%p
StrictHostKeyChecking no
ConnectTimeout=10
IdentityFile ~/.ssh/id_rsa


@@ -0,0 +1,15 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC0qDT9kEPWc8JQ53b4KnT/ZJOLwb+3c//jpLW/2ofjDyIsPW4FohLpicfouch/zsRpN4G38lua+2BsGls9sMIZc6PXY2L+NIGCkqEMdCoU1Ym8SMtyJklfzp3m/0PeK9s2dLlR3PFRYvyFA4btQK5hkbYDNZPzf4airvzdRzLkrFf81+RemaMI2EtONwJRcbLViPaTXVKJdbFwJTJ1u7yu9wDYWHKBMA92mHTQeP6bhVYCqxJn3to/RfZYd+sHw6mfxVg5OrAlUOYpSV4pDNCAsIHdtZ56V8NQlJL6NJ2vzzSSYUwLMqe88fhrC8yYHoxC07QPy1EdkSTHdohAicyT root@knode01.knw

21
tests/docker/ssh/config Normal file

@@ -0,0 +1,21 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Host *
ControlMaster auto
ControlPath ~/.ssh/master-%r@%h:%p
StrictHostKeyChecking no
ConnectTimeout=10
IdentityFile ~/.ssh/id_rsa

27
tests/docker/ssh/id_rsa Normal file
View File

@@ -0,0 +1,27 @@
-----BEGIN RSA PRIVATE KEY-----
MIIEpQIBAAKCAQEAtKg0/ZBD1nPCUOd2+Cp0/2STi8G/t3P/46S1v9qH4w8iLD1u
BaIS6YnH6LnIf87EaTeBt/JbmvtgbBpbPbDCGXOj12Ni/jSBgpKhDHQqFNWJvEjL
ciZJX86d5v9D3ivbNnS5UdzxUWL8hQOG7UCuYZG2AzWT83+Goq783Ucy5KxX/Nfk
XpmjCNhLTjcCUXGy1Yj2k11SiXWxcCUydbu8rvcA2FhygTAPdph00Hj+m4VWAqsS
Z97aP0X2WHfrB8Opn8VYOTqwJVDmKUleKQzQgLCB3bWeelfDUJSS+jSdr880kmFM
CzKnvPH4awvMmB6MQtO0D8tRHZEkx3aIQInMkwIDAQABAoIBAQCz6EMFNNLp0NP1
X9yRXS6wW4e4CRWUazesiw3YZpcmnp6IchCMGZA99FEZyVILPW1J3tYWyotBdw7Z
+RFeCRXy5L+IMtiVkNJcpwss7M4ve0w0LkY0gj5V49xJ+3Gp4gDnZSxcguvrAem5
yP5obR572fDpl0SknB4HCr6U2l+rauzrLyevy5eeDT/vmXbuM1cdHpNIXmmElz4L
t31n+exQRn6tP1h516iXbcYbopxDgdv2qKGAqzWKE6TyWpzF5x7kjOEYt0bZ5QO3
Lwh7AAqE/3mwxlYwng1L4WAT7RtcP19W+9JDIc7ENInMGxq6q46p1S3IPZsf1cj/
aAJ9q3LBAoGBAOVJr0+WkR786n3BuswpGQWBgVxfai4y9Lf90vuGKawdQUzXv0/c
EB/CFqP/dIsquukA8PfzjNMyTNmEHXi4Sf16H8Rg4EGhIYMEqIQojx1t/yLLm0aU
YPEvW/02Umtlg3pJw9fQAAzFVqCasw2E2lUdAUkydGRwDUJZmv2/b3NzAoGBAMm0
Jo7Et7ochH8Vku6uA+hG+RdwlKFm5JA7/Ci3DOdQ1zmJNrvBBFQLo7AjA4iSCoBd
s9+y0nrSPcF4pM3l6ghLheaqbnIi2HqIMH9mjDbrOZiWvbnjvjpOketgNX8vV3Ye
GUkSjoNcmvRmdsICmUjeML8bGOmq4zF9W/GIfTphAoGBAKGRo8R8f/SLGh3VtvCI
gUY89NAHuEWnyIQii1qMNq8+yjYAzaHTm1UVqmiT6SbrzFvGOwcuCu0Dw91+2Fmp
2xGPzfTOoxf8GCY/0ROXlQmS6jc1rEw24Hzz92ldrwRYuyYf9q4Ltw1IvXtcp5F+
LW/OiYpv0E66Gs3HYI0wKbP7AoGBAJMZWeFW37LQJ2TTJAQDToAwemq4xPxsoJX7
2SsMTFHKKBwi0JLe8jwk/OxwrJwF/bieHZcvv8ao2zbkuDQcz6/a/D074C5G8V9z
QQM4k1td8vQwQw91Yv782/gvgvRNX1iaHNCowtxURgGlVEirQoTc3eoRZfrLkMM/
7DTa2JEhAoGACEu3zHJ1sgyeOEgLArUJXlQM30A/ulMrnCd4MEyIE+ReyWAUevUQ
0lYdVNva0/W4C5e2lUOJL41jjIPLqI7tcFR2PZE6n0xTTkxNH5W2u1WpFeKjx+O3
czv7Bt6wYyLHIMy1JEqAQ7pw1mtJ5s76UDvXUhciF+DU2pWYc6APKR0=
-----END RSA PRIVATE KEY-----

View File

@@ -0,0 +1 @@
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC0qDT9kEPWc8JQ53b4KnT/ZJOLwb+3c//jpLW/2ofjDyIsPW4FohLpicfouch/zsRpN4G38lua+2BsGls9sMIZc6PXY2L+NIGCkqEMdCoU1Ym8SMtyJklfzp3m/0PeK9s2dLlR3PFRYvyFA4btQK5hkbYDNZPzf4airvzdRzLkrFf81+RemaMI2EtONwJRcbLViPaTXVKJdbFwJTJ1u7yu9wDYWHKBMA92mHTQeP6bhVYCqxJn3to/RfZYd+sHw6mfxVg5OrAlUOYpSV4pDNCAsIHdtZ56V8NQlJL6NJ2vzzSSYUwLMqe88fhrC8yYHoxC07QPy1EdkSTHdohAicyT root@knode01.knw

View File

@@ -0,0 +1,25 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This determines the version of kafkatest that can be published to PyPi and installed with pip
#
# Note that in development, this version name can't follow Kafka's convention of having a trailing "-SNAPSHOT"
# due to python version naming restrictions, which are enforced by python packaging tools
# (see https://www.python.org/dev/peps/pep-0440/)
#
# Instead, in development branches, the version should have a suffix of the form ".devN"
#
# For example, when Kafka is at version 1.0.0-SNAPSHOT, this should be something like "1.0.0.dev0"
__version__ = '2.5.0.dev0'
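# Illustrative note (not part of the packaging itself): a PEP 440 ".devN" release
# sorts before the corresponding final release, which is the property the
# "-SNAPSHOT" convention relies on. A quick sanity check, assuming setuptools is
# available on the machine driving the tests:
#
#   from pkg_resources import parse_version
#   assert parse_version('2.5.0.dev0') < parse_version('2.5.0')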

View File

@@ -0,0 +1,14 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

View File

@@ -0,0 +1,14 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

View File

@@ -0,0 +1,279 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ducktape.mark import matrix
from ducktape.mark import parametrize
from ducktape.mark.resource import cluster
from ducktape.services.service import Service
from ducktape.tests.test import Test
from kafkatest.services.kafka import KafkaService
from kafkatest.services.performance import ProducerPerformanceService, EndToEndLatencyService, ConsumerPerformanceService, throughput, latency, compute_aggregate_throughput
from kafkatest.services.zookeeper import ZookeeperService
from kafkatest.version import DEV_BRANCH, KafkaVersion
TOPIC_REP_ONE = "topic-replication-factor-one"
TOPIC_REP_THREE = "topic-replication-factor-three"
DEFAULT_RECORD_SIZE = 100 # bytes
class Benchmark(Test):
"""A benchmark of Kafka producer/consumer performance. This replicates the test
run here:
https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
"""
def __init__(self, test_context):
super(Benchmark, self).__init__(test_context)
self.num_zk = 1
self.num_brokers = 3
self.topics = {
TOPIC_REP_ONE: {'partitions': 6, 'replication-factor': 1},
TOPIC_REP_THREE: {'partitions': 6, 'replication-factor': 3}
}
self.zk = ZookeeperService(test_context, self.num_zk)
self.msgs_large = 10000000
self.batch_size = 8*1024
self.buffer_memory = 64*1024*1024
self.msg_sizes = [10, 100, 1000, 10000, 100000]
self.target_data_size = 128*1024*1024
self.target_data_size_gb = self.target_data_size/float(1024*1024*1024)
def setUp(self):
self.zk.start()
def start_kafka(self, security_protocol, interbroker_security_protocol, version):
self.kafka = KafkaService(
self.test_context, self.num_brokers,
self.zk, security_protocol=security_protocol,
interbroker_security_protocol=interbroker_security_protocol, topics=self.topics,
version=version)
self.kafka.log_level = "INFO" # We don't need DEBUG logging here
self.kafka.start()
@cluster(num_nodes=5)
@parametrize(acks=1, topic=TOPIC_REP_ONE)
@parametrize(acks=1, topic=TOPIC_REP_THREE)
@parametrize(acks=-1, topic=TOPIC_REP_THREE)
@matrix(acks=[1], topic=[TOPIC_REP_THREE], message_size=[10, 100, 1000, 10000, 100000], compression_type=["none", "snappy"], security_protocol=['PLAINTEXT', 'SSL'])
@cluster(num_nodes=7)
@parametrize(acks=1, topic=TOPIC_REP_THREE, num_producers=3)
def test_producer_throughput(self, acks, topic, num_producers=1, message_size=DEFAULT_RECORD_SIZE,
compression_type="none", security_protocol='PLAINTEXT', client_version=str(DEV_BRANCH),
broker_version=str(DEV_BRANCH)):
"""
Setup: 1 node zk + 3 node kafka cluster
Produce ~128MB worth of messages to a topic with 6 partitions. Required acks, topic replication factor,
security protocol and message size are varied depending on arguments injected into this test.
Collect and return aggregate throughput statistics after all messages have been acknowledged.
(This runs ProducerPerformance.java under the hood)
"""
client_version = KafkaVersion(client_version)
broker_version = KafkaVersion(broker_version)
self.validate_versions(client_version, broker_version)
self.start_kafka(security_protocol, security_protocol, broker_version)
# Always generate the same total amount of data
nrecords = int(self.target_data_size / message_size)
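# e.g. with the default 100-byte records this is roughly 1.34 million records (128 MB / 100 B)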
self.producer = ProducerPerformanceService(
self.test_context, num_producers, self.kafka, topic=topic,
num_records=nrecords, record_size=message_size, throughput=-1, version=client_version,
settings={
'acks': acks,
'compression.type': compression_type,
'batch.size': self.batch_size,
'buffer.memory': self.buffer_memory})
self.producer.run()
return compute_aggregate_throughput(self.producer)
@cluster(num_nodes=5)
@parametrize(security_protocol='SSL', interbroker_security_protocol='PLAINTEXT')
@matrix(security_protocol=['PLAINTEXT', 'SSL'], compression_type=["none", "snappy"])
def test_long_term_producer_throughput(self, compression_type="none", security_protocol='PLAINTEXT',
interbroker_security_protocol=None, client_version=str(DEV_BRANCH),
broker_version=str(DEV_BRANCH)):
"""
Setup: 1 node zk + 3 node kafka cluster
Produce 10e6 100 byte messages to a topic with 6 partitions, replication-factor 3, and acks=1.
Collect and return aggregate throughput statistics after all messages have been acknowledged.
(This runs ProducerPerformance.java under the hood)
"""
client_version = KafkaVersion(client_version)
broker_version = KafkaVersion(broker_version)
self.validate_versions(client_version, broker_version)
if interbroker_security_protocol is None:
interbroker_security_protocol = security_protocol
self.start_kafka(security_protocol, interbroker_security_protocol, broker_version)
self.producer = ProducerPerformanceService(
self.test_context, 1, self.kafka,
topic=TOPIC_REP_THREE, num_records=self.msgs_large, record_size=DEFAULT_RECORD_SIZE,
throughput=-1, version=client_version, settings={
'acks': 1,
'compression.type': compression_type,
'batch.size': self.batch_size,
'buffer.memory': self.buffer_memory
},
intermediate_stats=True
)
self.producer.run()
summary = ["Throughput over long run, data > memory:"]
data = {}
# FIXME we should be generating a graph too
# Try to break it into 5 blocks, but fall back to a smaller number if
# there aren't even 5 elements
block_size = max(len(self.producer.stats[0]) // 5, 1)
nblocks = len(self.producer.stats[0]) // block_size
for i in range(nblocks):
subset = self.producer.stats[0][i*block_size:min((i+1)*block_size, len(self.producer.stats[0]))]
if len(subset) == 0:
summary.append(" Time block %d: (empty)" % i)
data[i] = None
else:
records_per_sec = sum([stat['records_per_sec'] for stat in subset])/float(len(subset))
mb_per_sec = sum([stat['mbps'] for stat in subset])/float(len(subset))
summary.append(" Time block %d: %f rec/sec (%f MB/s)" % (i, records_per_sec, mb_per_sec))
data[i] = throughput(records_per_sec, mb_per_sec)
self.logger.info("\n".join(summary))
return data
@cluster(num_nodes=5)
@parametrize(security_protocol='SSL', interbroker_security_protocol='PLAINTEXT')
@matrix(security_protocol=['PLAINTEXT', 'SSL'], compression_type=["none", "snappy"])
@cluster(num_nodes=6)
@matrix(security_protocol=['SASL_PLAINTEXT', 'SASL_SSL'], compression_type=["none", "snappy"])
def test_end_to_end_latency(self, compression_type="none", security_protocol="PLAINTEXT",
interbroker_security_protocol=None, client_version=str(DEV_BRANCH),
broker_version=str(DEV_BRANCH)):
"""
Setup: 1 node zk + 3 node kafka cluster
Produce (acks = 1) and consume 10e3 messages to a topic with 6 partitions and replication-factor 3,
measuring the latency between production and consumption of each message.
Return aggregate latency statistics.
(Under the hood, this simply runs EndToEndLatency.scala)
"""
client_version = KafkaVersion(client_version)
broker_version = KafkaVersion(broker_version)
self.validate_versions(client_version, broker_version)
if interbroker_security_protocol is None:
interbroker_security_protocol = security_protocol
self.start_kafka(security_protocol, interbroker_security_protocol, broker_version)
self.logger.info("BENCHMARK: End to end latency")
self.perf = EndToEndLatencyService(
self.test_context, 1, self.kafka,
topic=TOPIC_REP_THREE, num_records=10000,
compression_type=compression_type, version=client_version
)
self.perf.run()
return latency(self.perf.results[0]['latency_50th_ms'], self.perf.results[0]['latency_99th_ms'], self.perf.results[0]['latency_999th_ms'])
@cluster(num_nodes=6)
@parametrize(security_protocol='SSL', interbroker_security_protocol='PLAINTEXT')
@matrix(security_protocol=['PLAINTEXT', 'SSL'], compression_type=["none", "snappy"])
def test_producer_and_consumer(self, compression_type="none", security_protocol="PLAINTEXT",
interbroker_security_protocol=None,
client_version=str(DEV_BRANCH), broker_version=str(DEV_BRANCH)):
"""
Setup: 1 node zk + 3 node kafka cluster
Concurrently produce and consume 10e6 messages with a single producer and a single consumer.
Return aggregate throughput statistics for both producer and consumer.
(Under the hood, this runs ProducerPerformance.java, and ConsumerPerformance.scala)
"""
client_version = KafkaVersion(client_version)
broker_version = KafkaVersion(broker_version)
self.validate_versions(client_version, broker_version)
if interbroker_security_protocol is None:
interbroker_security_protocol = security_protocol
self.start_kafka(security_protocol, interbroker_security_protocol, broker_version)
num_records = 10 * 1000 * 1000 # 10e6
self.producer = ProducerPerformanceService(
self.test_context, 1, self.kafka,
topic=TOPIC_REP_THREE,
num_records=num_records, record_size=DEFAULT_RECORD_SIZE, throughput=-1, version=client_version,
settings={
'acks': 1,
'compression.type': compression_type,
'batch.size': self.batch_size,
'buffer.memory': self.buffer_memory
}
)
self.consumer = ConsumerPerformanceService(
self.test_context, 1, self.kafka, topic=TOPIC_REP_THREE, messages=num_records)
Service.run_parallel(self.producer, self.consumer)
data = {
"producer": compute_aggregate_throughput(self.producer),
"consumer": compute_aggregate_throughput(self.consumer)
}
summary = [
"Producer + consumer:",
str(data)]
self.logger.info("\n".join(summary))
return data
@cluster(num_nodes=6)
@parametrize(security_protocol='SSL', interbroker_security_protocol='PLAINTEXT')
@matrix(security_protocol=['PLAINTEXT', 'SSL'], compression_type=["none", "snappy"])
def test_consumer_throughput(self, compression_type="none", security_protocol="PLAINTEXT",
interbroker_security_protocol=None, num_consumers=1,
client_version=str(DEV_BRANCH), broker_version=str(DEV_BRANCH)):
"""
Consume 10e6 100-byte messages with 1 or more consumers from a topic with 6 partitions
and report throughput.
"""
client_version = KafkaVersion(client_version)
broker_version = KafkaVersion(broker_version)
self.validate_versions(client_version, broker_version)
if interbroker_security_protocol is None:
interbroker_security_protocol = security_protocol
self.start_kafka(security_protocol, interbroker_security_protocol, broker_version)
num_records = 10 * 1000 * 1000 # 10e6
# seed kafka w/messages
self.producer = ProducerPerformanceService(
self.test_context, 1, self.kafka,
topic=TOPIC_REP_THREE,
num_records=num_records, record_size=DEFAULT_RECORD_SIZE, throughput=-1, version=client_version,
settings={
'acks': 1,
'compression.type': compression_type,
'batch.size': self.batch_size,
'buffer.memory': self.buffer_memory
}
)
self.producer.run()
# consume
self.consumer = ConsumerPerformanceService(
self.test_context, num_consumers, self.kafka,
topic=TOPIC_REP_THREE, messages=num_records)
self.consumer.group = "test-consumer-group"
self.consumer.run()
return compute_aggregate_throughput(self.consumer)
def validate_versions(self, client_version, broker_version):
assert client_version <= broker_version, "Client version %s should be <= broker version %s" % (client_version, broker_version)

View File

@@ -0,0 +1,14 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

View File

@@ -0,0 +1,164 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ducktape.tests.test import Test
from ducktape.mark.resource import cluster
from ducktape.mark import parametrize, matrix
from kafkatest.tests.kafka_test import KafkaTest
from kafkatest.services.performance.streams_performance import StreamsSimpleBenchmarkService
from kafkatest.services.zookeeper import ZookeeperService
from kafkatest.services.kafka import KafkaService
from kafkatest.version import DEV_BRANCH
STREAMS_SIMPLE_TESTS = ["streamprocess", "streamprocesswithsink", "streamprocesswithstatestore", "streamprocesswithwindowstore"]
STREAMS_COUNT_TESTS = ["streamcount", "streamcountwindowed"]
STREAMS_JOIN_TESTS = ["streamtablejoin", "streamstreamjoin", "tabletablejoin"]
NON_STREAMS_TESTS = ["consume", "consumeproduce"]
ALL_TEST = "all"
STREAMS_SIMPLE_TEST = "streams-simple"
STREAMS_COUNT_TEST = "streams-count"
STREAMS_JOIN_TEST = "streams-join"
class StreamsSimpleBenchmarkTest(Test):
"""
Simple benchmark of Kafka Streams.
"""
def __init__(self, test_context):
super(StreamsSimpleBenchmarkTest, self).__init__(test_context)
# these values could be updated in ad-hoc benchmarks
self.key_skew = 0
self.value_size = 1024
self.num_records = 10000000L
self.num_threads = 1
self.replication = 1
@cluster(num_nodes=12)
@matrix(test=["consume", "consumeproduce",
"streamprocess", "streamprocesswithsink", "streamprocesswithstatestore", "streamprocesswithwindowstore",
"streamcount", "streamcountwindowed",
"streamtablejoin", "streamstreamjoin", "tabletablejoin"],
scale=[1])
def test_simple_benchmark(self, test, scale):
"""
Run simple Kafka Streams benchmark
"""
self.driver = [None] * (scale + 1)
self.final = {}
#############
# SETUP PHASE
#############
self.zk = ZookeeperService(self.test_context, num_nodes=1)
self.zk.start()
self.kafka = KafkaService(self.test_context, num_nodes=scale, zk=self.zk, version=DEV_BRANCH, topics={
'simpleBenchmarkSourceTopic1' : { 'partitions': scale, 'replication-factor': self.replication },
'simpleBenchmarkSourceTopic2' : { 'partitions': scale, 'replication-factor': self.replication },
'simpleBenchmarkSinkTopic' : { 'partitions': scale, 'replication-factor': self.replication },
'yahooCampaigns' : { 'partitions': 20, 'replication-factor': self.replication },
'yahooEvents' : { 'partitions': 20, 'replication-factor': self.replication }
})
self.kafka.log_level = "INFO"
self.kafka.start()
load_test = ""
if test == ALL_TEST:
load_test = "load-two"
if test in STREAMS_JOIN_TESTS or test == STREAMS_JOIN_TEST:
load_test = "load-two"
if test in STREAMS_COUNT_TESTS or test == STREAMS_COUNT_TEST:
load_test = "load-one"
if test in STREAMS_SIMPLE_TESTS or test == STREAMS_SIMPLE_TEST:
load_test = "load-one"
if test in NON_STREAMS_TESTS:
load_test = "load-one"
################
# LOAD PHASE
################
self.load_driver = StreamsSimpleBenchmarkService(self.test_context,
self.kafka,
load_test,
self.num_threads,
self.num_records,
self.key_skew,
self.value_size)
self.load_driver.start()
self.load_driver.wait(3600) # wait at most 60 minutes
self.load_driver.stop()
if test == ALL_TEST:
for single_test in STREAMS_SIMPLE_TESTS + STREAMS_COUNT_TESTS + STREAMS_JOIN_TESTS:
self.execute(single_test, scale)
elif test == STREAMS_SIMPLE_TEST:
for single_test in STREAMS_SIMPLE_TESTS:
self.execute(single_test, scale)
elif test == STREAMS_COUNT_TEST:
for single_test in STREAMS_COUNT_TESTS:
self.execute(single_test, scale)
elif test == STREAMS_JOIN_TEST:
for single_test in STREAMS_JOIN_TESTS:
self.execute(single_test, scale)
else:
self.execute(test, scale)
return self.final
def execute(self, test, scale):
################
# RUN PHASE
################
for num in range(0, scale):
self.driver[num] = StreamsSimpleBenchmarkService(self.test_context,
self.kafka,
test,
self.num_threads,
self.num_records,
self.key_skew,
self.value_size)
self.driver[num].start()
#######################
# STOP + COLLECT PHASE
#######################
data = [None] * (scale)
for num in range(0, scale):
self.driver[num].wait()
self.driver[num].stop()
self.driver[num].node.account.ssh("grep Performance %s" % self.driver[num].STDOUT_FILE, allow_fail=False)
data[num] = self.driver[num].collect_data(self.driver[num].node, "")
self.driver[num].read_jmx_output_all_nodes()
for num in range(0, scale):
for key in data[num]:
self.final[key + "-" + str(num)] = data[num][key]
for key in sorted(self.driver[num].jmx_stats[0]):
self.logger.info("%s: %s" % (key, self.driver[num].jmx_stats[0][key]))
self.final[test + "-jmx-avg-" + str(num)] = self.driver[num].average_jmx_value
self.final[test + "-jmx-max-" + str(num)] = self.driver[num].maximum_jmx_value

View File

@@ -0,0 +1,14 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

View File

@@ -0,0 +1,137 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import importlib
import os
from kafkatest.version import get_version, KafkaVersion, DEV_BRANCH
"""This module serves a few purposes:
First, it gathers information about path layout in a single place, and second, it
makes the layout of the Kafka installation pluggable, so that users are not forced
to use the layout assumed in the KafkaPathResolver class.
To run system tests using your own path resolver, use for example:
ducktape <TEST_PATH> --globals '{"kafka-path-resolver": "my.path.resolver.CustomResolverClass"}'
"""
SCRATCH_ROOT = "/mnt"
KAFKA_INSTALL_ROOT = "/opt"
KAFKA_PATH_RESOLVER_KEY = "kafka-path-resolver"
KAFKA_PATH_RESOLVER = "kafkatest.directory_layout.kafka_path.KafkaSystemTestPathResolver"
# Variables for jar path resolution
CORE_JAR_NAME = "core"
CORE_LIBS_JAR_NAME = "core-libs"
CORE_DEPENDANT_TEST_LIBS_JAR_NAME = "core-dependant-testlibs"
TOOLS_JAR_NAME = "tools"
TOOLS_DEPENDANT_TEST_LIBS_JAR_NAME = "tools-dependant-libs"
JARS = {
"dev": {
CORE_JAR_NAME: "core/build/*/*.jar",
CORE_LIBS_JAR_NAME: "core/build/libs/*.jar",
CORE_DEPENDANT_TEST_LIBS_JAR_NAME: "core/build/dependant-testlibs/*.jar",
TOOLS_JAR_NAME: "tools/build/libs/kafka-tools*.jar",
TOOLS_DEPENDANT_TEST_LIBS_JAR_NAME: "tools/build/dependant-libs*/*.jar"
}
}
def create_path_resolver(context, project="kafka"):
"""Factory for generating a path resolver class
This will first check for a fully qualified path resolver classname in context.globals.
If present, construct a new instance, else default to KafkaSystemTestPathResolver
"""
assert project is not None
if KAFKA_PATH_RESOLVER_KEY in context.globals:
resolver_fully_qualified_classname = context.globals[KAFKA_PATH_RESOLVER_KEY]
else:
resolver_fully_qualified_classname = KAFKA_PATH_RESOLVER
# Using the fully qualified classname, import the resolver class
(module_name, resolver_class_name) = resolver_fully_qualified_classname.rsplit('.', 1)
cluster_mod = importlib.import_module(module_name)
path_resolver_class = getattr(cluster_mod, resolver_class_name)
path_resolver = path_resolver_class(context, project)
return path_resolver
class KafkaPathResolverMixin(object):
"""Mixin to automatically provide pluggable path resolution functionality to any class using it.
Keep life simple, and don't add a constructor to this class:
Since use of a mixin entails multiple inheritance, it is *much* simpler to reason about the interaction of this
class with subclasses if we don't have to worry about method resolution order, constructor signatures etc.
"""
@property
def path(self):
if not hasattr(self, "_path"):
setattr(self, "_path", create_path_resolver(self.context, "kafka"))
if hasattr(self.context, "logger") and self.context.logger is not None:
self.context.logger.debug("Using path resolver %s" % self._path.__class__.__name__)
return self._path
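# Usage sketch (illustrative only; MyKafkaTool and its base class are hypothetical):
# any class that already carries a ducktape `context` attribute can mix this in and
# use self.path directly, e.g.
#
#   class MyKafkaTool(KafkaPathResolverMixin, Service):
#       def script_path(self, node):
#           return self.path.script("kafka-topics.sh", node)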
class KafkaSystemTestPathResolver(object):
"""Path resolver for Kafka system tests which assumes the following layout:
/opt/kafka-dev # Current version of kafka under test
/opt/kafka-0.9.0.1 # Example of an older version of kafka installed from tarball
/opt/kafka-<version> # Other previous versions of kafka
...
"""
def __init__(self, context, project="kafka"):
self.context = context
self.project = project
def home(self, node_or_version=DEV_BRANCH, project=None):
version = self._version(node_or_version)
home_dir = project or self.project
if version is not None:
home_dir += "-%s" % str(version)
return os.path.join(KAFKA_INSTALL_ROOT, home_dir)
def bin(self, node_or_version=DEV_BRANCH, project=None):
version = self._version(node_or_version)
return os.path.join(self.home(version, project=project), "bin")
def script(self, script_name, node_or_version=DEV_BRANCH, project=None):
version = self._version(node_or_version)
return os.path.join(self.bin(version, project=project), script_name)
def jar(self, jar_name, node_or_version=DEV_BRANCH, project=None):
version = self._version(node_or_version)
return os.path.join(self.home(version, project=project), JARS[str(version)][jar_name])
def scratch_space(self, service_instance):
return os.path.join(SCRATCH_ROOT, service_instance.service_id)
def _version(self, node_or_version):
if isinstance(node_or_version, KafkaVersion):
return node_or_version
else:
return get_version(node_or_version)
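# The class below is an illustrative sketch only, not part of kafkatest: it shows the
# minimal surface a custom resolver needs (a constructor taking (context, project) plus
# the home/bin/script/scratch_space methods used by the services). A real resolver would
# also implement jar(). It could be selected at runtime with:
#   ducktape <TEST_PATH> --globals '{"kafka-path-resolver": "my.module.SingleInstallPathResolver"}'
class SingleInstallPathResolver(object):
    """Hypothetical resolver that ignores versions and always uses a single install dir."""
    def __init__(self, context, project="kafka"):
        self.context = context
        self.project = project
    def home(self, node_or_version=DEV_BRANCH, project=None):
        # Every version maps to the same single install directory
        return os.path.join(KAFKA_INSTALL_ROOT, project or self.project)
    def bin(self, node_or_version=DEV_BRANCH, project=None):
        return os.path.join(self.home(node_or_version, project=project), "bin")
    def script(self, script_name, node_or_version=DEV_BRANCH, project=None):
        return os.path.join(self.bin(node_or_version, project=project), script_name)
    def scratch_space(self, service_instance):
        return os.path.join(SCRATCH_ROOT, service_instance.service_id)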

View File

@@ -0,0 +1,14 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

View File

@@ -0,0 +1,99 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
from ducktape.mark import matrix
from ducktape.mark import parametrize
from ducktape.mark.resource import cluster
from ducktape.tests.test import Test
from ducktape.utils.util import wait_until
from kafkatest.services.console_consumer import ConsoleConsumer
from kafkatest.services.kafka import KafkaService
from kafkatest.services.verifiable_producer import VerifiableProducer
from kafkatest.services.zookeeper import ZookeeperService
from kafkatest.utils.remote_account import line_count, file_exists
from kafkatest.version import LATEST_0_8_2
class ConsoleConsumerTest(Test):
"""Sanity checks on console consumer service class."""
def __init__(self, test_context):
super(ConsoleConsumerTest, self).__init__(test_context)
self.topic = "topic"
self.zk = ZookeeperService(test_context, num_nodes=1)
self.kafka = KafkaService(self.test_context, num_nodes=1, zk=self.zk, zk_chroot="/kafka",
topics={self.topic: {"partitions": 1, "replication-factor": 1}})
self.consumer = ConsoleConsumer(self.test_context, num_nodes=1, kafka=self.kafka, topic=self.topic)
def setUp(self):
self.zk.start()
@cluster(num_nodes=3)
@matrix(security_protocol=['PLAINTEXT', 'SSL'])
@cluster(num_nodes=4)
@matrix(security_protocol=['SASL_SSL'], sasl_mechanism=['PLAIN', 'SCRAM-SHA-256', 'SCRAM-SHA-512'])
@matrix(security_protocol=['SASL_PLAINTEXT', 'SASL_SSL'])
def test_lifecycle(self, security_protocol, sasl_mechanism='GSSAPI'):
"""Check that console consumer starts/stops properly, and that we are capturing log output."""
self.kafka.security_protocol = security_protocol
self.kafka.client_sasl_mechanism = sasl_mechanism
self.kafka.interbroker_sasl_mechanism = sasl_mechanism
self.kafka.start()
self.consumer.security_protocol = security_protocol
t0 = time.time()
self.consumer.start()
node = self.consumer.nodes[0]
wait_until(lambda: self.consumer.alive(node),
timeout_sec=20, backoff_sec=.2, err_msg="Consumer was too slow to start")
self.logger.info("consumer started in %s seconds " % str(time.time() - t0))
# Verify that log output is happening
wait_until(lambda: file_exists(node, ConsoleConsumer.LOG_FILE), timeout_sec=10,
err_msg="Timed out waiting for consumer log file to exist.")
wait_until(lambda: line_count(node, ConsoleConsumer.LOG_FILE) > 0, timeout_sec=1,
backoff_sec=.25, err_msg="Timed out waiting for log entries to start.")
# Verify no consumed messages
assert line_count(node, ConsoleConsumer.STDOUT_CAPTURE) == 0
self.consumer.stop_node(node)
@cluster(num_nodes=4)
def test_version(self):
"""Check that console consumer v0.8.2.X successfully starts and consumes messages."""
self.kafka.start()
num_messages = 1000
self.producer = VerifiableProducer(self.test_context, num_nodes=1, kafka=self.kafka, topic=self.topic,
max_messages=num_messages, throughput=1000)
self.producer.start()
self.producer.wait()
self.consumer.nodes[0].version = LATEST_0_8_2
self.consumer.new_consumer = False
self.consumer.consumer_timeout_ms = 1000
self.consumer.start()
self.consumer.wait()
num_consumed = len(self.consumer.messages_consumed[1])
num_produced = self.producer.num_acked
assert num_produced == num_consumed, "num_produced: %d, num_consumed: %d" % (num_produced, num_consumed)

View File

@@ -0,0 +1,58 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ducktape.tests.test import Test
from ducktape.mark.resource import cluster
from kafkatest.services.kafka import KafkaService, config_property
from kafkatest.services.zookeeper import ZookeeperService
from kafkatest.utils import is_version
from kafkatest.version import LATEST_0_8_2, DEV_BRANCH
class KafkaVersionTest(Test):
"""Sanity checks on kafka versioning."""
def __init__(self, test_context):
super(KafkaVersionTest, self).__init__(test_context)
self.topic = "topic"
self.zk = ZookeeperService(test_context, num_nodes=1)
def setUp(self):
self.zk.start()
@cluster(num_nodes=2)
def test_0_8_2(self):
"""Test kafka service node-versioning api - verify that we can bring up a single-node 0.8.2.X cluster."""
self.kafka = KafkaService(self.test_context, num_nodes=1, zk=self.zk,
topics={self.topic: {"partitions": 1, "replication-factor": 1}})
node = self.kafka.nodes[0]
node.version = LATEST_0_8_2
self.kafka.start()
assert is_version(node, [LATEST_0_8_2], logger=self.logger)
@cluster(num_nodes=3)
def test_multi_version(self):
"""Test kafka service node-versioning api - ensure we can bring up a 2-node cluster, one on version 0.8.2.X,
the other on the current development branch."""
self.kafka = KafkaService(self.test_context, num_nodes=2, zk=self.zk,
topics={self.topic: {"partitions": 1, "replication-factor": 2}})
self.kafka.nodes[1].version = LATEST_0_8_2
self.kafka.nodes[1].config[config_property.INTER_BROKER_PROTOCOL_VERSION] = "0.8.2.X"
self.kafka.start()
assert is_version(self.kafka.nodes[0], [DEV_BRANCH.vstring], logger=self.logger)
assert is_version(self.kafka.nodes[1], [LATEST_0_8_2], logger=self.logger)

View File

@@ -0,0 +1,90 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ducktape.mark import parametrize
from ducktape.mark.resource import cluster
from ducktape.tests.test import Test
from kafkatest.services.kafka import KafkaService
from kafkatest.services.performance import ProducerPerformanceService, ConsumerPerformanceService, EndToEndLatencyService
from kafkatest.services.performance import latency, compute_aggregate_throughput
from kafkatest.services.zookeeper import ZookeeperService
from kafkatest.version import DEV_BRANCH, LATEST_0_8_2, LATEST_0_9, LATEST_1_1, KafkaVersion
class PerformanceServiceTest(Test):
def __init__(self, test_context):
super(PerformanceServiceTest, self).__init__(test_context)
self.record_size = 100
self.num_records = 10000
self.topic = "topic"
self.zk = ZookeeperService(test_context, 1)
def setUp(self):
self.zk.start()
@cluster(num_nodes=5)
# We are keeping 0.8.2 here so that we don't inadvertently break support for it. Since this is just a sanity check,
# the overhead should be manageable.
@parametrize(version=str(LATEST_0_8_2), new_consumer=False)
@parametrize(version=str(LATEST_0_9), new_consumer=False)
@parametrize(version=str(LATEST_0_9))
@parametrize(version=str(LATEST_1_1), new_consumer=False)
@parametrize(version=str(DEV_BRANCH))
def test_version(self, version=str(LATEST_0_9), new_consumer=True):
"""
Sanity check our producer performance service - verify that we can run the service with a small
number of messages. The actual stats here are pretty meaningless since the number of messages is quite small.
"""
version = KafkaVersion(version)
self.kafka = KafkaService(
self.test_context, 1,
self.zk, topics={self.topic: {'partitions': 1, 'replication-factor': 1}}, version=version)
self.kafka.start()
# check basic run of producer performance
self.producer_perf = ProducerPerformanceService(
self.test_context, 1, self.kafka, topic=self.topic,
num_records=self.num_records, record_size=self.record_size,
throughput=1000000000, # Set impossibly high so there is no throttling, for equivalent behavior between 0.8.X and 0.9.X
version=version,
settings={
'acks': 1,
'batch.size': 8*1024,
'buffer.memory': 64*1024*1024})
self.producer_perf.run()
producer_perf_data = compute_aggregate_throughput(self.producer_perf)
# check basic run of end to end latency
self.end_to_end = EndToEndLatencyService(
self.test_context, 1, self.kafka,
topic=self.topic, num_records=self.num_records, version=version)
self.end_to_end.run()
end_to_end_data = latency(self.end_to_end.results[0]['latency_50th_ms'], self.end_to_end.results[0]['latency_99th_ms'], self.end_to_end.results[0]['latency_999th_ms'])
# check basic run of consumer performance service
self.consumer_perf = ConsumerPerformanceService(
self.test_context, 1, self.kafka, new_consumer=new_consumer,
topic=self.topic, version=version, messages=self.num_records)
self.consumer_perf.group = "test-consumer-group"
self.consumer_perf.run()
consumer_perf_data = compute_aggregate_throughput(self.consumer_perf)
return {
"producer_performance": producer_perf_data,
"end_to_end_latency": end_to_end_data,
"consumer_performance": consumer_perf_data
}

View File

@@ -0,0 +1,84 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ducktape.mark import parametrize
from ducktape.mark.resource import cluster
from ducktape.tests.test import Test
from ducktape.utils.util import wait_until
from kafkatest.services.kafka import KafkaService
from kafkatest.services.verifiable_producer import VerifiableProducer
from kafkatest.services.zookeeper import ZookeeperService
from kafkatest.utils import is_version
from kafkatest.version import LATEST_0_8_2, LATEST_0_9, LATEST_0_10_0, LATEST_0_10_1, DEV_BRANCH, KafkaVersion
class TestVerifiableProducer(Test):
"""Sanity checks on verifiable producer service class."""
def __init__(self, test_context):
super(TestVerifiableProducer, self).__init__(test_context)
self.topic = "topic"
self.zk = ZookeeperService(test_context, num_nodes=1)
self.kafka = KafkaService(test_context, num_nodes=1, zk=self.zk,
topics={self.topic: {"partitions": 1, "replication-factor": 1}})
self.num_messages = 1000
# This will produce to source kafka cluster
self.producer = VerifiableProducer(test_context, num_nodes=1, kafka=self.kafka, topic=self.topic,
max_messages=self.num_messages, throughput=self.num_messages/5)
def setUp(self):
self.zk.start()
self.kafka.start()
@cluster(num_nodes=3)
@parametrize(producer_version=str(LATEST_0_8_2))
@parametrize(producer_version=str(LATEST_0_9))
@parametrize(producer_version=str(LATEST_0_10_0))
@parametrize(producer_version=str(LATEST_0_10_1))
@parametrize(producer_version=str(DEV_BRANCH))
def test_simple_run(self, producer_version=DEV_BRANCH):
"""
Test that we can start VerifiableProducer on the current branch snapshot version or against the 0.8.2 jar, and
verify that we can produce a small number of messages.
"""
node = self.producer.nodes[0]
node.version = KafkaVersion(producer_version)
self.producer.start()
wait_until(lambda: self.producer.num_acked > 5, timeout_sec=5,
err_msg="Producer failed to start in a reasonable amount of time.")
# using version.vstring (distutils.version.LooseVersion) is a tricky way of ensuring
# that this check works with DEV_BRANCH
# When running VerifiableProducer 0.8.X, both the current branch version and 0.8.X should show up because of the
# way verifiable producer pulls in some development directories into its classpath
#
# If the test fails here because 'ps .. | grep' couldn't find the process it means
# the login and grep that is_version() performs is slower than
# the time it takes the producer to produce its messages.
# Easy fix is to decrease throughput= above, the good fix is to make the producer
# not terminate until explicitly killed in this case.
if node.version <= LATEST_0_8_2:
assert is_version(node, [node.version.vstring, DEV_BRANCH.vstring], logger=self.logger)
else:
assert is_version(node, [node.version.vstring], logger=self.logger)
self.producer.wait()
num_produced = self.producer.num_acked
assert num_produced == self.num_messages, "num_produced: %d, num_messages: %d" % (num_produced, self.num_messages)

View File

@@ -0,0 +1,14 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

View File

@@ -0,0 +1,502 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import json
import os.path
import random
import signal
import time
import requests
from ducktape.errors import DucktapeError
from ducktape.services.service import Service
from ducktape.utils.util import wait_until
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
class ConnectServiceBase(KafkaPathResolverMixin, Service):
"""Base class for Kafka Connect services providing some common settings and functionality"""
PERSISTENT_ROOT = "/mnt/connect"
CONFIG_FILE = os.path.join(PERSISTENT_ROOT, "connect.properties")
# The log file contains normal log4j logs written using a file appender. stdout and stderr are handled separately
# so they can be used for other output, e.g. verifiable source & sink.
LOG_FILE = os.path.join(PERSISTENT_ROOT, "connect.log")
STDOUT_FILE = os.path.join(PERSISTENT_ROOT, "connect.stdout")
STDERR_FILE = os.path.join(PERSISTENT_ROOT, "connect.stderr")
LOG4J_CONFIG_FILE = os.path.join(PERSISTENT_ROOT, "connect-log4j.properties")
PID_FILE = os.path.join(PERSISTENT_ROOT, "connect.pid")
EXTERNAL_CONFIGS_FILE = os.path.join(PERSISTENT_ROOT, "connect-external-configs.properties")
CONNECT_REST_PORT = 8083
HEAP_DUMP_FILE = os.path.join(PERSISTENT_ROOT, "connect_heap_dump.bin")
# The Connect worker currently supports three startup modes, which control how long start() waits:
STARTUP_MODE_INSTANT = 'INSTANT'
"""STARTUP_MODE_INSTANT: Start Connect worker and return immediately"""
STARTUP_MODE_LOAD = 'LOAD'
"""STARTUP_MODE_LOAD: Start Connect worker and return after discovering and loading plugins"""
STARTUP_MODE_LISTEN = 'LISTEN'
"""STARTUP_MODE_LISTEN: Start Connect worker and return after opening the REST port."""
logs = {
"connect_log": {
"path": LOG_FILE,
"collect_default": True},
"connect_stdout": {
"path": STDOUT_FILE,
"collect_default": False},
"connect_stderr": {
"path": STDERR_FILE,
"collect_default": True},
"connect_heap_dump_file": {
"path": HEAP_DUMP_FILE,
"collect_default": True}
}
def __init__(self, context, num_nodes, kafka, files, startup_timeout_sec = 60):
super(ConnectServiceBase, self).__init__(context, num_nodes)
self.kafka = kafka
self.security_config = kafka.security_config.client_config()
self.files = files
self.startup_mode = self.STARTUP_MODE_LISTEN
self.startup_timeout_sec = startup_timeout_sec
self.environment = {}
self.external_config_template_func = None
def pids(self, node):
"""Return process ids for Kafka Connect processes."""
try:
return [pid for pid in node.account.ssh_capture("cat " + self.PID_FILE, callback=int)]
except Exception:
return []
def set_configs(self, config_template_func, connector_config_templates=None):
"""
Set configurations for the worker and the connector to run on
it. These are not provided in the constructor because the worker
config generally needs access to ZK/Kafka services to
create the configuration.
"""
self.config_template_func = config_template_func
self.connector_config_templates = connector_config_templates
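# Usage sketch (template names below are hypothetical): the worker config is a
# callable evaluated per node, while connector configs are plain rendered strings:
#
#   cc.set_configs(lambda node: self.render("connect-standalone.properties", node=node),
#                  [self.render("connect-file-source.properties")])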
def set_external_configs(self, external_config_template_func):
"""
Set the properties that will be written in the external file properties
as used by the org.apache.kafka.common.config.provider.FileConfigProvider.
When this is used, the worker configuration must also enable the FileConfigProvider.
This is not provided in the constructor because the worker
config generally needs access to ZK/Kafka services to
create the configuration.
"""
self.external_config_template_func = external_config_template_func
def listening(self, node):
try:
self.list_connectors(node)
self.logger.debug("Connect worker started serving REST at: '%s:%s')", node.account.hostname,
self.CONNECT_REST_PORT)
return True
except requests.exceptions.ConnectionError:
self.logger.debug("REST resources are not loaded yet")
return False
def start(self, mode=STARTUP_MODE_LISTEN):
self.startup_mode = mode
super(ConnectServiceBase, self).start()
def start_and_return_immediately(self, node, worker_type, remote_connector_configs):
cmd = self.start_cmd(node, remote_connector_configs)
self.logger.debug("Connect %s command: %s", worker_type, cmd)
node.account.ssh(cmd)
def start_and_wait_to_load_plugins(self, node, worker_type, remote_connector_configs):
with node.account.monitor_log(self.LOG_FILE) as monitor:
self.start_and_return_immediately(node, worker_type, remote_connector_configs)
monitor.wait_until('Kafka version', timeout_sec=self.startup_timeout_sec,
err_msg="Never saw message indicating Kafka Connect finished startup on node: " +
"%s in condition mode: %s" % (str(node.account), self.startup_mode))
def start_and_wait_to_start_listening(self, node, worker_type, remote_connector_configs):
self.start_and_return_immediately(node, worker_type, remote_connector_configs)
wait_until(lambda: self.listening(node), timeout_sec=self.startup_timeout_sec,
err_msg="Kafka Connect failed to start on node: %s in condition mode: %s" %
(str(node.account), self.startup_mode))
def stop_node(self, node, clean_shutdown=True):
self.logger.info((clean_shutdown and "Cleanly" or "Forcibly") + " stopping Kafka Connect on " + str(node.account))
pids = self.pids(node)
sig = signal.SIGTERM if clean_shutdown else signal.SIGKILL
for pid in pids:
node.account.signal(pid, sig, allow_fail=True)
if clean_shutdown:
for pid in pids:
wait_until(lambda: not node.account.alive(pid), timeout_sec=self.startup_timeout_sec, err_msg="Kafka Connect process on " + str(
node.account) + " took too long to exit")
node.account.ssh("rm -f " + self.PID_FILE, allow_fail=False)
def restart(self, clean_shutdown=True):
# We don't want to do any clean up here, just restart the process.
for node in self.nodes:
self.logger.info("Restarting Kafka Connect on " + str(node.account))
self.restart_node(node, clean_shutdown)
def restart_node(self, node, clean_shutdown=True):
self.stop_node(node, clean_shutdown)
self.start_node(node)
def clean_node(self, node):
node.account.kill_process("connect", clean_shutdown=False, allow_fail=True)
self.security_config.clean_node(node)
other_files = " ".join(self.config_filenames() + self.files)
node.account.ssh("rm -rf -- %s %s" % (ConnectServiceBase.PERSISTENT_ROOT, other_files), allow_fail=False)
def config_filenames(self):
return [os.path.join(self.PERSISTENT_ROOT, "connect-connector-" + str(idx) + ".properties") for idx, template in enumerate(self.connector_config_templates or [])]
def list_connectors(self, node=None, **kwargs):
return self._rest_with_retry('/connectors', node=node, **kwargs)
def create_connector(self, config, node=None, **kwargs):
create_request = {
'name': config['name'],
'config': config
}
return self._rest_with_retry('/connectors', create_request, node=node, method="POST", **kwargs)
def get_connector(self, name, node=None, **kwargs):
return self._rest_with_retry('/connectors/' + name, node=node, **kwargs)
def get_connector_config(self, name, node=None, **kwargs):
return self._rest_with_retry('/connectors/' + name + '/config', node=node, **kwargs)
def set_connector_config(self, name, config, node=None, **kwargs):
# Unlike many other calls, a 409 when setting a connector config is expected if the connector already exists.
# However, we also might see 409s for other reasons (e.g. rebalancing). So we still perform retries at the cost
# of tests possibly taking longer to ultimately fail. Tests that care about this can explicitly override the
# number of retries.
return self._rest_with_retry('/connectors/' + name + '/config', config, node=node, method="PUT", **kwargs)
def get_connector_tasks(self, name, node=None, **kwargs):
return self._rest_with_retry('/connectors/' + name + '/tasks', node=node, **kwargs)
def delete_connector(self, name, node=None, **kwargs):
return self._rest_with_retry('/connectors/' + name, node=node, method="DELETE", **kwargs)
def get_connector_status(self, name, node=None):
return self._rest('/connectors/' + name + '/status', node=node)
def restart_connector(self, name, node=None, **kwargs):
return self._rest_with_retry('/connectors/' + name + '/restart', node=node, method="POST", **kwargs)
def restart_task(self, connector_name, task_id, node=None):
return self._rest('/connectors/' + connector_name + '/tasks/' + str(task_id) + '/restart', node=node, method="POST")
def pause_connector(self, name, node=None):
return self._rest('/connectors/' + name + '/pause', node=node, method="PUT")
def resume_connector(self, name, node=None):
return self._rest('/connectors/' + name + '/resume', node=node, method="PUT")
def list_connector_plugins(self, node=None):
return self._rest('/connector-plugins/', node=node)
def validate_config(self, connector_type, validate_request, node=None):
return self._rest('/connector-plugins/' + connector_type + '/config/validate', validate_request, node=node, method="PUT")
def _rest(self, path, body=None, node=None, method="GET"):
if node is None:
node = random.choice(self.nodes)
meth = getattr(requests, method.lower())
url = self._base_url(node) + path
self.logger.debug("Kafka Connect REST request: %s %s %s %s", node.account.hostname, url, method, body)
resp = meth(url, json=body)
self.logger.debug("%s %s response: %d", url, method, resp.status_code)
if resp.status_code >= 400:
self.logger.debug("Connect REST API error for %s: %d %s", resp.url, resp.status_code, resp.text)
raise ConnectRestError(resp.status_code, resp.text, resp.url)
if resp.status_code == 204 or resp.status_code == 202:
return None
else:
return resp.json()
def _rest_with_retry(self, path, body=None, node=None, method="GET", retries=40, retry_backoff=.25):
"""
Invokes a REST API with retries for errors that may occur during normal operation (notably 409 CONFLICT
responses that can occur due to rebalancing or 404 when the connect resources are not initialized yet).
"""
exception_to_throw = None
for i in range(0, retries + 1):
try:
return self._rest(path, body, node, method)
except ConnectRestError as e:
exception_to_throw = e
if e.status != 409 and e.status != 404:
break
time.sleep(retry_backoff)
raise exception_to_throw
def _base_url(self, node):
return 'http://' + node.account.externally_routable_ip + ':' + str(self.CONNECT_REST_PORT)
def append_to_environment_variable(self, envvar, value):
env_opts = self.environment.get(envvar)
if env_opts is None:
env_opts = "\"%s\"" % value
else:
env_opts = "\"%s %s\"" % (env_opts.strip('\"'), value)
self.environment[envvar] = env_opts
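# e.g. (sketch; the JVM flags are arbitrary): self.append_to_environment_variable("KAFKA_OPTS", "-Xms256M -Xmx2G")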
class ConnectStandaloneService(ConnectServiceBase):
"""Runs Kafka Connect in standalone mode."""
def __init__(self, context, kafka, files, startup_timeout_sec = 60):
super(ConnectStandaloneService, self).__init__(context, 1, kafka, files, startup_timeout_sec)
# For convenience since this service only makes sense with a single node
@property
def node(self):
return self.nodes[0]
def start_cmd(self, node, connector_configs):
cmd = "( export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%s\"; " % self.LOG4J_CONFIG_FILE
heap_kafka_opts = "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=%s" % \
self.logs["connect_heap_dump_file"]["path"]
other_kafka_opts = self.security_config.kafka_opts.strip('\"')
cmd += "export KAFKA_OPTS=\"%s %s\"; " % (heap_kafka_opts, other_kafka_opts)
for envvar in self.environment:
cmd += "export %s=%s; " % (envvar, str(self.environment[envvar]))
cmd += "%s %s " % (self.path.script("connect-standalone.sh", node), self.CONFIG_FILE)
cmd += " ".join(connector_configs)
cmd += " & echo $! >&3 ) 1>> %s 2>> %s 3> %s" % (self.STDOUT_FILE, self.STDERR_FILE, self.PID_FILE)
return cmd
def start_node(self, node):
node.account.ssh("mkdir -p %s" % self.PERSISTENT_ROOT, allow_fail=False)
self.security_config.setup_node(node)
if self.external_config_template_func:
node.account.create_file(self.EXTERNAL_CONFIGS_FILE, self.external_config_template_func(node))
node.account.create_file(self.CONFIG_FILE, self.config_template_func(node))
node.account.create_file(self.LOG4J_CONFIG_FILE, self.render('connect_log4j.properties', log_file=self.LOG_FILE))
remote_connector_configs = []
for idx, template in enumerate(self.connector_config_templates):
target_file = os.path.join(self.PERSISTENT_ROOT, "connect-connector-" + str(idx) + ".properties")
node.account.create_file(target_file, template)
remote_connector_configs.append(target_file)
self.logger.info("Starting Kafka Connect standalone process on " + str(node.account))
if self.startup_mode == self.STARTUP_MODE_LOAD:
self.start_and_wait_to_load_plugins(node, 'standalone', remote_connector_configs)
elif self.startup_mode == self.STARTUP_MODE_INSTANT:
self.start_and_return_immediately(node, 'standalone', remote_connector_configs)
else:
# The default mode is to wait until the complete startup of the worker
self.start_and_wait_to_start_listening(node, 'standalone', remote_connector_configs)
if len(self.pids(node)) == 0:
raise RuntimeError("No process ids recorded")
class ConnectDistributedService(ConnectServiceBase):
"""Runs Kafka Connect in distributed mode."""
def __init__(self, context, num_nodes, kafka, files, offsets_topic="connect-offsets",
configs_topic="connect-configs", status_topic="connect-status", startup_timeout_sec = 60):
super(ConnectDistributedService, self).__init__(context, num_nodes, kafka, files, startup_timeout_sec)
self.offsets_topic = offsets_topic
self.configs_topic = configs_topic
self.status_topic = status_topic
# connector_configs argument is intentionally ignored in distributed service.
def start_cmd(self, node, connector_configs):
cmd = "( export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%s\"; " % self.LOG4J_CONFIG_FILE
heap_kafka_opts = "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=%s" % \
self.logs["connect_heap_dump_file"]["path"]
other_kafka_opts = self.security_config.kafka_opts.strip('\"')
cmd += "export KAFKA_OPTS=\"%s %s\"; " % (heap_kafka_opts, other_kafka_opts)
for envvar in self.environment:
cmd += "export %s=%s; " % (envvar, str(self.environment[envvar]))
cmd += "%s %s " % (self.path.script("connect-distributed.sh", node), self.CONFIG_FILE)
cmd += " & echo $! >&3 ) 1>> %s 2>> %s 3> %s" % (self.STDOUT_FILE, self.STDERR_FILE, self.PID_FILE)
return cmd
def start_node(self, node):
node.account.ssh("mkdir -p %s" % self.PERSISTENT_ROOT, allow_fail=False)
self.security_config.setup_node(node)
if self.external_config_template_func:
node.account.create_file(self.EXTERNAL_CONFIGS_FILE, self.external_config_template_func(node))
node.account.create_file(self.CONFIG_FILE, self.config_template_func(node))
node.account.create_file(self.LOG4J_CONFIG_FILE, self.render('connect_log4j.properties', log_file=self.LOG_FILE))
if self.connector_config_templates:
raise DucktapeError("Config files are not valid in distributed mode, submit connectors via the REST API")
self.logger.info("Starting Kafka Connect distributed process on " + str(node.account))
if self.startup_mode == self.STARTUP_MODE_LOAD:
self.start_and_wait_to_load_plugins(node, 'distributed', '')
elif self.startup_mode == self.STARTUP_MODE_INSTANT:
self.start_and_return_immediately(node, 'distributed', '')
else:
# The default mode is to wait until the complete startup of the worker
self.start_and_wait_to_start_listening(node, 'distributed', '')
if len(self.pids(node)) == 0:
raise RuntimeError("No process ids recorded")
class ErrorTolerance(object):
ALL = "all"
NONE = "none"
class ConnectRestError(RuntimeError):
def __init__(self, status, msg, url):
self.status = status
self.message = msg
self.url = url
def __unicode__(self):
return "Kafka Connect REST call failed: returned " + self.status + " for " + self.url + ". Response: " + self.message
class VerifiableConnector(object):
def messages(self):
"""
Collect and parse the logs from Kafka Connect nodes. Return a list containing all parsed JSON messages generated by
this source.
"""
self.logger.info("Collecting messages from log of %s %s", type(self).__name__, self.name)
records = []
for node in self.cc.nodes:
for line in node.account.ssh_capture('cat ' + self.cc.STDOUT_FILE):
try:
data = json.loads(line)
except ValueError:
self.logger.debug("Ignoring unparseable line: %s", line)
continue
# Filter to only ones matching our name to support multiple verifiable producers
if data['name'] != self.name:
continue
data['node'] = node
records.append(data)
return records
def stop(self):
self.logger.info("Destroying connector %s %s", type(self).__name__, self.name)
self.cc.delete_connector(self.name)
class VerifiableSource(VerifiableConnector):
"""
Helper class for running a verifiable source connector on a Kafka Connect cluster and analyzing the output.
"""
def __init__(self, cc, name="verifiable-source", tasks=1, topic="verifiable", throughput=1000):
self.cc = cc
self.logger = self.cc.logger
self.name = name
self.tasks = tasks
self.topic = topic
self.throughput = throughput
def committed_messages(self):
return filter(lambda m: 'committed' in m and m['committed'], self.messages())
def sent_messages(self):
return filter(lambda m: 'committed' not in m or not m['committed'], self.messages())
def start(self):
self.logger.info("Creating connector VerifiableSourceConnector %s", self.name)
self.cc.create_connector({
'name': self.name,
'connector.class': 'org.apache.kafka.connect.tools.VerifiableSourceConnector',
'tasks.max': self.tasks,
'topic': self.topic,
'throughput': self.throughput
})
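# Example (sketch): a test can compare what the source reported as sent vs. committed. The record
# keys used here ('task', 'seqno') are assumptions about the verifiable connector's JSON output:
#   source = VerifiableSource(cc, topic="verifiable", throughput=100)
#   source.start()
#   ...
#   sent = set((m['task'], m['seqno']) for m in source.sent_messages())
#   committed = set((m['task'], m['seqno']) for m in source.committed_messages())
#   assert committed.issubset(sent)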
class VerifiableSink(VerifiableConnector):
"""
Helper class for running a verifiable sink connector on a Kafka Connect cluster and analyzing the output.
"""
def __init__(self, cc, name="verifiable-sink", tasks=1, topics=["verifiable"]):
self.cc = cc
self.logger = self.cc.logger
self.name = name
self.tasks = tasks
self.topics = topics
def flushed_messages(self):
return filter(lambda m: 'flushed' in m and m['flushed'], self.messages())
def received_messages(self):
return filter(lambda m: 'flushed' not in m or not m['flushed'], self.messages())
def start(self):
self.logger.info("Creating connector VerifiableSinkConnector %s", self.name)
self.cc.create_connector({
'name': self.name,
'connector.class': 'org.apache.kafka.connect.tools.VerifiableSinkConnector',
'tasks.max': self.tasks,
'topics': ",".join(self.topics)
})
class MockSink(object):
def __init__(self, cc, topics, mode=None, delay_sec=10, name="mock-sink"):
self.cc = cc
self.logger = self.cc.logger
self.name = name
self.mode = mode
self.delay_sec = delay_sec
self.topics = topics
def start(self):
self.logger.info("Creating connector MockSinkConnector %s", self.name)
self.cc.create_connector({
'name': self.name,
'connector.class': 'org.apache.kafka.connect.tools.MockSinkConnector',
'tasks.max': 1,
'topics': ",".join(self.topics),
'mock_mode': self.mode,
'delay_ms': self.delay_sec * 1000
})
class MockSource(object):
def __init__(self, cc, mode=None, delay_sec=10, name="mock-source"):
self.cc = cc
self.logger = self.cc.logger
self.name = name
self.mode = mode
self.delay_sec = delay_sec
def start(self):
self.logger.info("Creating connector MockSourceConnector %s", self.name)
self.cc.create_connector({
'name': self.name,
'connector.class': 'org.apache.kafka.connect.tools.MockSourceConnector',
'tasks.max': 1,
'mock_mode': self.mode,
'delay_ms': self.delay_sec * 1000
})

View File

@@ -0,0 +1,315 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import itertools
import os
from ducktape.cluster.remoteaccount import RemoteCommandError
from ducktape.services.background_thread import BackgroundThreadService
from ducktape.utils.util import wait_until
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
from kafkatest.services.monitor.jmx import JmxMixin
from kafkatest.version import DEV_BRANCH, LATEST_0_8_2, LATEST_0_9, LATEST_0_10_0, V_0_9_0_0, V_0_10_0_0, V_0_11_0_0, V_2_0_0
"""
The console consumer is a tool that reads data from Kafka and outputs it to standard output.
"""
class ConsoleConsumer(KafkaPathResolverMixin, JmxMixin, BackgroundThreadService):
# Root directory for persistent output
PERSISTENT_ROOT = "/mnt/console_consumer"
STDOUT_CAPTURE = os.path.join(PERSISTENT_ROOT, "console_consumer.stdout")
STDERR_CAPTURE = os.path.join(PERSISTENT_ROOT, "console_consumer.stderr")
LOG_DIR = os.path.join(PERSISTENT_ROOT, "logs")
LOG_FILE = os.path.join(LOG_DIR, "console_consumer.log")
LOG4J_CONFIG = os.path.join(PERSISTENT_ROOT, "tools-log4j.properties")
CONFIG_FILE = os.path.join(PERSISTENT_ROOT, "console_consumer.properties")
JMX_TOOL_LOG = os.path.join(PERSISTENT_ROOT, "jmx_tool.log")
JMX_TOOL_ERROR_LOG = os.path.join(PERSISTENT_ROOT, "jmx_tool.err.log")
logs = {
"consumer_stdout": {
"path": STDOUT_CAPTURE,
"collect_default": False},
"consumer_stderr": {
"path": STDERR_CAPTURE,
"collect_default": False},
"consumer_log": {
"path": LOG_FILE,
"collect_default": True},
"jmx_log": {
"path" : JMX_TOOL_LOG,
"collect_default": False},
"jmx_err_log": {
"path": JMX_TOOL_ERROR_LOG,
"collect_default": False}
}
def __init__(self, context, num_nodes, kafka, topic, group_id="test-consumer-group", new_consumer=True,
message_validator=None, from_beginning=True, consumer_timeout_ms=None, version=DEV_BRANCH,
client_id="console-consumer", print_key=False, jmx_object_names=None, jmx_attributes=None,
enable_systest_events=False, stop_timeout_sec=35, print_timestamp=False, print_partition=False,
isolation_level="read_uncommitted", jaas_override_variables=None,
kafka_opts_override="", client_prop_file_override="", consumer_properties={}):
"""
Args:
context: standard context
num_nodes: number of nodes to use (this should be 1)
kafka: kafka service
topic: consume from this topic
new_consumer: use new Kafka consumer if True
message_validator: function which returns message or None
from_beginning: consume from beginning if True, else from the end
consumer_timeout_ms: corresponds to consumer.timeout.ms. consumer process ends if time between
successively consumed messages exceeds this timeout. Setting this and
waiting for the consumer to stop is a pretty good way to consume all messages
in a topic.
print_timestamp: if True, print each message's timestamp as well
print_key: if True, print each message's key as well
print_partition: if True, print each message's partition as well
enable_systest_events: if True, console consumer will print additional lifecycle-related information
only available in 0.10.0 and later.
stop_timeout_sec: After stopping a node, wait up to stop_timeout_sec for the node to stop,
and the corresponding background thread to finish successfully.
isolation_level: How to handle transactional messages.
jaas_override_variables: A dict of variables to be used in the jaas.conf template file
kafka_opts_override: Override parameters of the KAFKA_OPTS environment variable
client_prop_file_override: Override client.properties file used by the consumer
consumer_properties: A dict of values to pass in as --consumer-property key=value
"""
JmxMixin.__init__(self, num_nodes=num_nodes, jmx_object_names=jmx_object_names, jmx_attributes=(jmx_attributes or []),
root=ConsoleConsumer.PERSISTENT_ROOT)
BackgroundThreadService.__init__(self, context, num_nodes)
self.kafka = kafka
self.new_consumer = new_consumer
self.group_id = group_id
self.args = {
'topic': topic,
}
self.consumer_timeout_ms = consumer_timeout_ms
for node in self.nodes:
node.version = version
self.from_beginning = from_beginning
self.message_validator = message_validator
self.messages_consumed = {idx: [] for idx in range(1, num_nodes + 1)}
self.clean_shutdown_nodes = set()
self.client_id = client_id
self.print_key = print_key
self.print_partition = print_partition
self.log_level = "TRACE"
self.stop_timeout_sec = stop_timeout_sec
self.isolation_level = isolation_level
self.enable_systest_events = enable_systest_events
if self.enable_systest_events:
# Only available in 0.10.0 and up
assert version >= V_0_10_0_0
self.print_timestamp = print_timestamp
self.jaas_override_variables = jaas_override_variables or {}
self.kafka_opts_override = kafka_opts_override
self.client_prop_file_override = client_prop_file_override
self.consumer_properties = consumer_properties
def prop_file(self, node):
"""Return a string which can be used to create a configuration file appropriate for the given node."""
# Process client configuration
prop_file = self.render('console_consumer.properties')
if hasattr(node, "version") and node.version <= LATEST_0_8_2:
# in 0.8.2.X and earlier, console consumer does not have --timeout-ms option
# instead, we have to pass it through the config file
prop_file += "\nconsumer.timeout.ms=%s\n" % str(self.consumer_timeout_ms)
# Add security properties to the config. If security protocol is not specified,
# use the default in the template properties.
self.security_config = self.kafka.security_config.client_config(prop_file, node, self.jaas_override_variables)
self.security_config.setup_node(node)
prop_file += str(self.security_config)
return prop_file
def start_cmd(self, node):
"""Return the start command appropriate for the given node."""
args = self.args.copy()
args['zk_connect'] = self.kafka.zk_connect_setting()
args['stdout'] = ConsoleConsumer.STDOUT_CAPTURE
args['stderr'] = ConsoleConsumer.STDERR_CAPTURE
args['log_dir'] = ConsoleConsumer.LOG_DIR
args['log4j_config'] = ConsoleConsumer.LOG4J_CONFIG
args['config_file'] = ConsoleConsumer.CONFIG_FILE
args['jmx_port'] = self.jmx_port
args['console_consumer'] = self.path.script("kafka-console-consumer.sh", node)
args['broker_list'] = self.kafka.bootstrap_servers(self.security_config.security_protocol)
if self.kafka_opts_override:
args['kafka_opts'] = "\"%s\"" % self.kafka_opts_override
else:
args['kafka_opts'] = self.security_config.kafka_opts
cmd = "export JMX_PORT=%(jmx_port)s; " \
"export LOG_DIR=%(log_dir)s; " \
"export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%(log4j_config)s\"; " \
"export KAFKA_OPTS=%(kafka_opts)s; " \
"%(console_consumer)s " \
"--topic %(topic)s " \
"--consumer.config %(config_file)s " % args
if self.new_consumer:
assert node.version >= V_0_9_0_0, \
"new_consumer is only supported if version >= 0.9.0.0, version %s" % str(node.version)
if node.version <= LATEST_0_10_0:
cmd += " --new-consumer"
cmd += " --bootstrap-server %(broker_list)s" % args
if node.version >= V_0_11_0_0:
cmd += " --isolation-level %s" % self.isolation_level
else:
assert node.version < V_2_0_0, \
"new_consumer==false is only supported if version < 2.0.0, version %s" % str(node.version)
cmd += " --zookeeper %(zk_connect)s" % args
if self.from_beginning:
cmd += " --from-beginning"
if self.consumer_timeout_ms is not None:
# version 0.8.X and below do not support --timeout-ms option
# This will be added in the properties file instead
if node.version > LATEST_0_8_2:
cmd += " --timeout-ms %s" % self.consumer_timeout_ms
if self.print_timestamp:
cmd += " --property print.timestamp=true"
if self.print_key:
cmd += " --property print.key=true"
if self.print_partition:
cmd += " --property print.partition=true"
# LoggingMessageFormatter was introduced after 0.9
if node.version > LATEST_0_9:
cmd += " --formatter kafka.tools.LoggingMessageFormatter"
if self.enable_systest_events:
# enable systest events is only available in 0.10.0 and later
# check the assertion here as well, in case node.version has been modified
assert node.version >= V_0_10_0_0
cmd += " --enable-systest-events"
if self.consumer_properties is not None:
for k, v in self.consumer_properties.items():
cmd += " --consumer-property %s=%s" % (k, v)
cmd += " 2>> %(stderr)s | tee -a %(stdout)s &" % args
return cmd
def pids(self, node):
return node.account.java_pids(self.java_class_name())
def alive(self, node):
return len(self.pids(node)) > 0
def _worker(self, idx, node):
node.account.ssh("mkdir -p %s" % ConsoleConsumer.PERSISTENT_ROOT, allow_fail=False)
# Create and upload config file
self.logger.info("console_consumer.properties:")
self.security_config = self.kafka.security_config.client_config(node=node,
jaas_override_variables=self.jaas_override_variables)
self.security_config.setup_node(node)
if self.client_prop_file_override:
prop_file = self.client_prop_file_override
else:
prop_file = self.prop_file(node)
self.logger.info(prop_file)
node.account.create_file(ConsoleConsumer.CONFIG_FILE, prop_file)
# Create and upload log properties
log_config = self.render('tools_log4j.properties', log_file=ConsoleConsumer.LOG_FILE)
node.account.create_file(ConsoleConsumer.LOG4J_CONFIG, log_config)
# Run and capture output
cmd = self.start_cmd(node)
self.logger.debug("Console consumer %d command: %s", idx, cmd)
consumer_output = node.account.ssh_capture(cmd, allow_fail=False)
with self.lock:
self.logger.debug("collecting following jmx objects: %s", self.jmx_object_names)
self.start_jmx_tool(idx, node)
for line in consumer_output:
msg = line.strip()
if msg == "shutdown_complete":
# Note that we can only rely on shutdown_complete message if running 0.10.0 or greater
if node in self.clean_shutdown_nodes:
raise Exception("Unexpected shutdown event from consumer, already shutdown. Consumer index: %d" % idx)
self.clean_shutdown_nodes.add(node)
else:
if self.message_validator is not None:
msg = self.message_validator(msg)
if msg is not None:
self.messages_consumed[idx].append(msg)
with self.lock:
self.read_jmx_output(idx, node)
def start_node(self, node):
BackgroundThreadService.start_node(self, node)
def stop_node(self, node):
self.logger.info("%s Stopping node %s" % (self.__class__.__name__, str(node.account)))
node.account.kill_java_processes(self.java_class_name(),
clean_shutdown=True, allow_fail=True)
stopped = self.wait_node(node, timeout_sec=self.stop_timeout_sec)
assert stopped, "Node %s: did not stop within the specified timeout of %s seconds" % \
(str(node.account), str(self.stop_timeout_sec))
def clean_node(self, node):
if self.alive(node):
self.logger.warn("%s %s was still alive at cleanup time. Killing forcefully..." %
(self.__class__.__name__, node.account))
JmxMixin.clean_node(self, node)
node.account.kill_java_processes(self.java_class_name(), clean_shutdown=False, allow_fail=True)
node.account.ssh("rm -rf %s" % ConsoleConsumer.PERSISTENT_ROOT, allow_fail=False)
self.security_config.clean_node(node)
def java_class_name(self):
return "ConsoleConsumer"
def has_log_message(self, node, message):
try:
node.account.ssh("grep '%s' %s" % (message, ConsoleConsumer.LOG_FILE))
except RemoteCommandError:
return False
return True
def wait_for_offset_reset(self, node, topic, num_partitions):
for partition in range(num_partitions):
message = "Resetting offset for partition %s-%d" % (topic, partition)
wait_until(lambda: self.has_log_message(node, message),
timeout_sec=60,
err_msg="Offset not reset for partition %s-%d" % (topic, partition))

View File

@@ -0,0 +1,21 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Define Consumer configuration property names here.
"""
GROUP_INSTANCE_ID = "group.instance.id"
SESSION_TIMEOUT_MS = "session.timeout.ms"

View File

@@ -0,0 +1,102 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os.path
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
"""
DelegationTokens is a helper for managing the lifecycle of delegation tokens.
All commands are executed on a secured Kafka node reusing its generated jaas.conf and krb5.conf.
"""
class DelegationTokens(KafkaPathResolverMixin):
def __init__(self, kafka, context):
self.client_properties_content = """
security.protocol=SASL_PLAINTEXT
sasl.kerberos.service.name=kafka
"""
self.context = context
self.command_path = self.path.script("kafka-delegation-tokens.sh")
self.kafka_opts = "KAFKA_OPTS=\"-Djava.security.auth.login.config=/mnt/security/jaas.conf " \
"-Djava.security.krb5.conf=/mnt/security/krb5.conf\" "
self.kafka = kafka
self.bootstrap_server = " --bootstrap-server " + self.kafka.bootstrap_servers('SASL_PLAINTEXT')
self.base_cmd = self.kafka_opts + self.command_path + self.bootstrap_server
self.client_prop_path = os.path.join(self.kafka.PERSISTENT_ROOT, "client.properties")
self.jaas_deleg_conf_path = os.path.join(self.kafka.PERSISTENT_ROOT, "jaas_deleg.conf")
self.token_hmac_path = os.path.join(self.kafka.PERSISTENT_ROOT, "deleg_token_hmac.out")
self.delegation_token_out = os.path.join(self.kafka.PERSISTENT_ROOT, "delegation_token.out")
self.expire_delegation_token_out = os.path.join(self.kafka.PERSISTENT_ROOT, "expire_delegation_token.out")
self.renew_delegation_token_out = os.path.join(self.kafka.PERSISTENT_ROOT, "renew_delegation_token.out")
self.node = self.kafka.nodes[0]
def generate_delegation_token(self, maxlifetimeperiod=-1):
self.node.account.create_file(self.client_prop_path, self.client_properties_content)
cmd = self.base_cmd + " --create" \
" --max-life-time-period %s" \
" --command-config %s > %s" % (maxlifetimeperiod, self.client_prop_path, self.delegation_token_out)
self.node.account.ssh(cmd, allow_fail=False)
def expire_delegation_token(self, hmac):
cmd = self.base_cmd + " --expire" \
" --expiry-time-period -1" \
" --hmac %s" \
" --command-config %s > %s" % (hmac, self.client_prop_path, self.expire_delegation_token_out)
self.node.account.ssh(cmd, allow_fail=False)
def renew_delegation_token(self, hmac, renew_time_period=-1):
cmd = self.base_cmd + " --renew" \
" --renew-time-period %s" \
" --hmac %s" \
" --command-config %s > %s" \
% (renew_time_period, hmac, self.client_prop_path, self.renew_delegation_token_out)
return self.node.account.ssh_capture(cmd, allow_fail=False)
def create_jaas_conf_with_delegation_token(self):
dt = self.parse_delegation_token_out()
jaas_deleg_content = """
KafkaClient {
org.apache.kafka.common.security.scram.ScramLoginModule required
username="%s"
password="%s"
tokenauth=true;
};
""" % (dt["tokenid"], dt["hmac"])
self.node.account.create_file(self.jaas_deleg_conf_path, jaas_deleg_content)
return jaas_deleg_content
def token_hmac(self):
dt = self.parse_delegation_token_out()
return dt["hmac"]
def parse_delegation_token_out(self):
cmd = "tail -1 %s" % self.delegation_token_out
output_iter = self.node.account.ssh_capture(cmd, allow_fail=False)
output = ""
for line in output_iter:
output += line
tokenid, hmac, owner, renewers, issuedate, expirydate, maxdate = output.split()
return {"tokenid" : tokenid,
"hmac" : hmac,
"owner" : owner,
"renewers" : renewers,
"issuedate" : issuedate,
"expirydate" :expirydate,
"maxdate" : maxdate}

View File

@@ -0,0 +1,18 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from kafka import KafkaService
from util import TopicPartition
from config import KafkaConfig

View File

@@ -0,0 +1,48 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import config_property
class KafkaConfig(dict):
"""A dictionary-like container class which allows for definition of overridable default values,
which is also capable of "rendering" itself as a useable server.properties file.
"""
DEFAULTS = {
config_property.PORT: 9092,
config_property.SOCKET_RECEIVE_BUFFER_BYTES: 65536,
config_property.LOG_DIRS: "/mnt/kafka/kafka-data-logs-1,/mnt/kafka/kafka-data-logs-2",
config_property.ZOOKEEPER_CONNECTION_TIMEOUT_MS: 2000
}
def __init__(self, **kwargs):
super(KafkaConfig, self).__init__(**kwargs)
# Set defaults
for key, val in self.DEFAULTS.items():
if key not in self:
self[key] = val
def render(self):
"""Render self as a series of lines key=val\n, and do so in a consistent order. """
keys = [k for k in self.keys()]
keys.sort()
s = ""
for k in keys:
s += "%s=%s\n" % (k, str(self[k]))
return s
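# Example (sketch): defaults are filled in for any property not supplied explicitly, and render()
# returns a deterministic, sorted server.properties snippet:
#   cfg = KafkaConfig(**{config_property.BROKER_ID: 1})
#   cfg.render()
#   # broker.id=1
#   # log.dirs=/mnt/kafka/kafka-data-logs-1,/mnt/kafka/kafka-data-logs-2
#   # port=9092
#   # socket.receive.buffer.bytes=65536
#   # zookeeper.connection.timeout.ms=2000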

View File

@@ -0,0 +1,192 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Define Kafka configuration property names here.
"""
BROKER_ID = "broker.id"
PORT = "port"
ADVERTISED_HOSTNAME = "advertised.host.name"
NUM_NETWORK_THREADS = "num.network.threads"
NUM_IO_THREADS = "num.io.threads"
SOCKET_SEND_BUFFER_BYTES = "socket.send.buffer.bytes"
SOCKET_RECEIVE_BUFFER_BYTES = "socket.receive.buffer.bytes"
SOCKET_REQUEST_MAX_BYTES = "socket.request.max.bytes"
LOG_DIRS = "log.dirs"
NUM_PARTITIONS = "num.partitions"
NUM_RECOVERY_THREADS_PER_DATA_DIR = "num.recovery.threads.per.data.dir"
LOG_RETENTION_HOURS = "log.retention.hours"
LOG_SEGMENT_BYTES = "log.segment.bytes"
LOG_RETENTION_CHECK_INTERVAL_MS = "log.retention.check.interval.ms"
LOG_RETENTION_MS = "log.retention.ms"
LOG_CLEANER_ENABLE = "log.cleaner.enable"
AUTO_CREATE_TOPICS_ENABLE = "auto.create.topics.enable"
ZOOKEEPER_CONNECT = "zookeeper.connect"
ZOOKEEPER_SSL_CLIENT_ENABLE = "zookeeper.ssl.client.enable"
ZOOKEEPER_CLIENT_CNXN_SOCKET = "zookeeper.clientCnxnSocket"
ZOOKEEPER_CONNECTION_TIMEOUT_MS = "zookeeper.connection.timeout.ms"
INTER_BROKER_PROTOCOL_VERSION = "inter.broker.protocol.version"
MESSAGE_FORMAT_VERSION = "log.message.format.version"
MESSAGE_TIMESTAMP_TYPE = "message.timestamp.type"
THROTTLING_REPLICATION_RATE_LIMIT = "replication.quota.throttled.rate"
LOG_FLUSH_INTERVAL_MESSAGE = "log.flush.interval.messages"
REPLICA_HIGHWATERMARK_CHECKPOINT_INTERVAL_MS = "replica.high.watermark.checkpoint.interval.ms"
LOG_ROLL_TIME_MS = "log.roll.ms"
OFFSETS_TOPIC_NUM_PARTITIONS = "offsets.topic.num.partitions"
DELEGATION_TOKEN_MAX_LIFETIME_MS="delegation.token.max.lifetime.ms"
DELEGATION_TOKEN_EXPIRY_TIME_MS="delegation.token.expiry.time.ms"
DELEGATION_TOKEN_MASTER_KEY="delegation.token.master.key"
SASL_ENABLED_MECHANISMS="sasl.enabled.mechanisms"
"""
From KafkaConfig.scala
/** ********* General Configuration ***********/
val MaxReservedBrokerIdProp = "reserved.broker.max.id"
val MessageMaxBytesProp = "message.max.bytes"
val NumIoThreadsProp = "num.io.threads"
val BackgroundThreadsProp = "background.threads"
val QueuedMaxRequestsProp = "queued.max.requests"
/** ********* Socket Server Configuration ***********/
val PortProp = "port"
val HostNameProp = "host.name"
val ListenersProp = "listeners"
val AdvertisedPortProp = "advertised.port"
val AdvertisedListenersProp = "advertised.listeners"
val SocketSendBufferBytesProp = "socket.send.buffer.bytes"
val SocketReceiveBufferBytesProp = "socket.receive.buffer.bytes"
val SocketRequestMaxBytesProp = "socket.request.max.bytes"
val MaxConnectionsPerIpProp = "max.connections.per.ip"
val MaxConnectionsPerIpOverridesProp = "max.connections.per.ip.overrides"
val ConnectionsMaxIdleMsProp = "connections.max.idle.ms"
/** ********* Log Configuration ***********/
val NumPartitionsProp = "num.partitions"
val LogDirsProp = "log.dirs"
val LogDirProp = "log.dir"
val LogSegmentBytesProp = "log.segment.bytes"
val LogRollTimeMillisProp = "log.roll.ms"
val LogRollTimeHoursProp = "log.roll.hours"
val LogRollTimeJitterMillisProp = "log.roll.jitter.ms"
val LogRollTimeJitterHoursProp = "log.roll.jitter.hours"
val LogRetentionTimeMillisProp = "log.retention.ms"
val LogRetentionTimeMinutesProp = "log.retention.minutes"
val LogRetentionTimeHoursProp = "log.retention.hours"
val LogRetentionBytesProp = "log.retention.bytes"
val LogCleanupIntervalMsProp = "log.retention.check.interval.ms"
val LogCleanupPolicyProp = "log.cleanup.policy"
val LogCleanerThreadsProp = "log.cleaner.threads"
val LogCleanerIoMaxBytesPerSecondProp = "log.cleaner.io.max.bytes.per.second"
val LogCleanerDedupeBufferSizeProp = "log.cleaner.dedupe.buffer.size"
val LogCleanerIoBufferSizeProp = "log.cleaner.io.buffer.size"
val LogCleanerDedupeBufferLoadFactorProp = "log.cleaner.io.buffer.load.factor"
val LogCleanerBackoffMsProp = "log.cleaner.backoff.ms"
val LogCleanerMinCleanRatioProp = "log.cleaner.min.cleanable.ratio"
val LogCleanerEnableProp = "log.cleaner.enable"
val LogCleanerDeleteRetentionMsProp = "log.cleaner.delete.retention.ms"
val LogIndexSizeMaxBytesProp = "log.index.size.max.bytes"
val LogIndexIntervalBytesProp = "log.index.interval.bytes"
val LogFlushIntervalMessagesProp = "log.flush.interval.messages"
val LogDeleteDelayMsProp = "log.segment.delete.delay.ms"
val LogFlushSchedulerIntervalMsProp = "log.flush.scheduler.interval.ms"
val LogFlushIntervalMsProp = "log.flush.interval.ms"
val LogFlushOffsetCheckpointIntervalMsProp = "log.flush.offset.checkpoint.interval.ms"
val LogPreAllocateProp = "log.preallocate"
val NumRecoveryThreadsPerDataDirProp = "num.recovery.threads.per.data.dir"
val MinInSyncReplicasProp = "min.insync.replicas"
/** ********* Replication configuration ***********/
val ControllerSocketTimeoutMsProp = "controller.socket.timeout.ms"
val DefaultReplicationFactorProp = "default.replication.factor"
val ReplicaLagTimeMaxMsProp = "replica.lag.time.max.ms"
val ReplicaSocketTimeoutMsProp = "replica.socket.timeout.ms"
val ReplicaSocketReceiveBufferBytesProp = "replica.socket.receive.buffer.bytes"
val ReplicaFetchMaxBytesProp = "replica.fetch.max.bytes"
val ReplicaFetchWaitMaxMsProp = "replica.fetch.wait.max.ms"
val ReplicaFetchMinBytesProp = "replica.fetch.min.bytes"
val ReplicaFetchBackoffMsProp = "replica.fetch.backoff.ms"
val NumReplicaFetchersProp = "num.replica.fetchers"
val ReplicaHighWatermarkCheckpointIntervalMsProp = "replica.high.watermark.checkpoint.interval.ms"
val FetchPurgatoryPurgeIntervalRequestsProp = "fetch.purgatory.purge.interval.requests"
val ProducerPurgatoryPurgeIntervalRequestsProp = "producer.purgatory.purge.interval.requests"
val AutoLeaderRebalanceEnableProp = "auto.leader.rebalance.enable"
val LeaderImbalancePerBrokerPercentageProp = "leader.imbalance.per.broker.percentage"
val LeaderImbalanceCheckIntervalSecondsProp = "leader.imbalance.check.interval.seconds"
val UncleanLeaderElectionEnableProp = "unclean.leader.election.enable"
val InterBrokerSecurityProtocolProp = "security.inter.broker.protocol"
val InterBrokerProtocolVersionProp = "inter.broker.protocol.version"
/** ********* Controlled shutdown configuration ***********/
val ControlledShutdownMaxRetriesProp = "controlled.shutdown.max.retries"
val ControlledShutdownRetryBackoffMsProp = "controlled.shutdown.retry.backoff.ms"
val ControlledShutdownEnableProp = "controlled.shutdown.enable"
/** ********* Consumer coordinator configuration ***********/
val ConsumerMinSessionTimeoutMsProp = "consumer.min.session.timeout.ms"
val ConsumerMaxSessionTimeoutMsProp = "consumer.max.session.timeout.ms"
/** ********* Offset management configuration ***********/
val OffsetMetadataMaxSizeProp = "offset.metadata.max.bytes"
val OffsetsLoadBufferSizeProp = "offsets.load.buffer.size"
val OffsetsTopicReplicationFactorProp = "offsets.topic.replication.factor"
val OffsetsTopicPartitionsProp = "offsets.topic.num.partitions"
val OffsetsTopicSegmentBytesProp = "offsets.topic.segment.bytes"
val OffsetsTopicCompressionCodecProp = "offsets.topic.compression.codec"
val OffsetsRetentionMinutesProp = "offsets.retention.minutes"
val OffsetsRetentionCheckIntervalMsProp = "offsets.retention.check.interval.ms"
val OffsetCommitTimeoutMsProp = "offsets.commit.timeout.ms"
val OffsetCommitRequiredAcksProp = "offsets.commit.required.acks"
/** ********* Quota Configuration ***********/
val ProducerQuotaBytesPerSecondDefaultProp = "quota.producer.default"
val ConsumerQuotaBytesPerSecondDefaultProp = "quota.consumer.default"
val NumQuotaSamplesProp = "quota.window.num"
val QuotaWindowSizeSecondsProp = "quota.window.size.seconds"
val DeleteTopicEnableProp = "delete.topic.enable"
val CompressionTypeProp = "compression.type"
/** ********* Kafka Metrics Configuration ***********/
val MetricSampleWindowMsProp = CommonClientConfigs.METRICS_SAMPLE_WINDOW_MS_CONFIG
val MetricNumSamplesProp: String = CommonClientConfigs.METRICS_NUM_SAMPLES_CONFIG
val MetricReporterClassesProp: String = CommonClientConfigs.METRIC_REPORTER_CLASSES_CONFIG
/** ********* SSL Configuration ****************/
val PrincipalBuilderClassProp = SSLConfigs.PRINCIPAL_BUILDER_CLASS_CONFIG
val SSLProtocolProp = SSLConfigs.SSL_PROTOCOL_CONFIG
val SSLProviderProp = SSLConfigs.SSL_PROVIDER_CONFIG
val SSLCipherSuitesProp = SSLConfigs.SSL_CIPHER_SUITES_CONFIG
val SSLEnabledProtocolsProp = SSLConfigs.SSL_ENABLED_PROTOCOLS_CONFIG
val SSLKeystoreTypeProp = SSLConfigs.SSL_KEYSTORE_TYPE_CONFIG
val SSLKeystoreLocationProp = SSLConfigs.SSL_KEYSTORE_LOCATION_CONFIG
val SSLKeystorePasswordProp = SSLConfigs.SSL_KEYSTORE_PASSWORD_CONFIG
val SSLKeyPasswordProp = SSLConfigs.SSL_KEY_PASSWORD_CONFIG
val SSLTruststoreTypeProp = SSLConfigs.SSL_TRUSTSTORE_TYPE_CONFIG
val SSLTruststoreLocationProp = SSLConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG
val SSLTruststorePasswordProp = SSLConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG
val SSLKeyManagerAlgorithmProp = SSLConfigs.SSL_KEYMANAGER_ALGORITHM_CONFIG
val SSLTrustManagerAlgorithmProp = SSLConfigs.SSL_TRUSTMANAGER_ALGORITHM_CONFIG
val SSLEndpointIdentificationAlgorithmProp = SSLConfigs.SSL_ENDPOINT_IDENTIFICATION_ALGORITHM_CONFIG
val SSLSecureRandomImplementationProp = SSLConfigs.SSL_SECURE_RANDOM_IMPLEMENTATION_CONFIG
val SSLClientAuthProp = SSLConfigs.SSL_CLIENT_AUTH_CONFIG
"""

View File

@@ -0,0 +1,897 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import collections
import json
import os.path
import re
import signal
import time
from ducktape.services.service import Service
from ducktape.utils.util import wait_until
from ducktape.cluster.remoteaccount import RemoteCommandError
from config import KafkaConfig
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
from kafkatest.services.kafka import config_property
from kafkatest.services.monitor.jmx import JmxMixin
from kafkatest.services.security.minikdc import MiniKdc
from kafkatest.services.security.listener_security_config import ListenerSecurityConfig
from kafkatest.services.security.security_config import SecurityConfig
from kafkatest.version import DEV_BRANCH, LATEST_0_10_0
class KafkaListener:
def __init__(self, name, port_number, security_protocol, open=False):
self.name = name
self.port_number = port_number
self.security_protocol = security_protocol
self.open = open
def listener(self):
return "%s://:%s" % (self.name, str(self.port_number))
def advertised_listener(self, node):
return "%s://%s:%s" % (self.name, node.account.hostname, str(self.port_number))
def listener_security_protocol(self):
return "%s:%s" % (self.name, self.security_protocol)
class KafkaService(KafkaPathResolverMixin, JmxMixin, Service):
PERSISTENT_ROOT = "/mnt/kafka"
STDOUT_STDERR_CAPTURE = os.path.join(PERSISTENT_ROOT, "server-start-stdout-stderr.log")
LOG4J_CONFIG = os.path.join(PERSISTENT_ROOT, "kafka-log4j.properties")
# Logs such as controller.log, server.log, etc all go here
OPERATIONAL_LOG_DIR = os.path.join(PERSISTENT_ROOT, "kafka-operational-logs")
OPERATIONAL_LOG_INFO_DIR = os.path.join(OPERATIONAL_LOG_DIR, "info")
OPERATIONAL_LOG_DEBUG_DIR = os.path.join(OPERATIONAL_LOG_DIR, "debug")
# Kafka log segments etc go here
DATA_LOG_DIR_PREFIX = os.path.join(PERSISTENT_ROOT, "kafka-data-logs")
DATA_LOG_DIR_1 = "%s-1" % (DATA_LOG_DIR_PREFIX)
DATA_LOG_DIR_2 = "%s-2" % (DATA_LOG_DIR_PREFIX)
CONFIG_FILE = os.path.join(PERSISTENT_ROOT, "kafka.properties")
# Kafka Authorizer
ACL_AUTHORIZER = "kafka.security.authorizer.AclAuthorizer"
# Old Kafka Authorizer. This is deprecated but still supported.
SIMPLE_AUTHORIZER = "kafka.security.auth.SimpleAclAuthorizer"
HEAP_DUMP_FILE = os.path.join(PERSISTENT_ROOT, "kafka_heap_dump.bin")
INTERBROKER_LISTENER_NAME = 'INTERNAL'
JAAS_CONF_PROPERTY = "java.security.auth.login.config=/mnt/security/jaas.conf"
KRB5_CONF = "java.security.krb5.conf=/mnt/security/krb5.conf"
logs = {
"kafka_server_start_stdout_stderr": {
"path": STDOUT_STDERR_CAPTURE,
"collect_default": True},
"kafka_operational_logs_info": {
"path": OPERATIONAL_LOG_INFO_DIR,
"collect_default": True},
"kafka_operational_logs_debug": {
"path": OPERATIONAL_LOG_DEBUG_DIR,
"collect_default": False},
"kafka_data_1": {
"path": DATA_LOG_DIR_1,
"collect_default": False},
"kafka_data_2": {
"path": DATA_LOG_DIR_2,
"collect_default": False},
"kafka_heap_dump_file": {
"path": HEAP_DUMP_FILE,
"collect_default": True}
}
def __init__(self, context, num_nodes, zk, security_protocol=SecurityConfig.PLAINTEXT, interbroker_security_protocol=SecurityConfig.PLAINTEXT,
client_sasl_mechanism=SecurityConfig.SASL_MECHANISM_GSSAPI, interbroker_sasl_mechanism=SecurityConfig.SASL_MECHANISM_GSSAPI,
authorizer_class_name=None, topics=None, version=DEV_BRANCH, jmx_object_names=None,
jmx_attributes=None, zk_connect_timeout=5000, zk_session_timeout=6000, server_prop_overides=None, zk_chroot=None,
zk_client_secure=False,
listener_security_config=ListenerSecurityConfig(), per_node_server_prop_overrides=None, extra_kafka_opts=""):
"""
:param context: test context
:param ZookeeperService zk:
:param dict topics: which topics to create automatically
:param str security_protocol: security protocol for clients to use
:param str interbroker_security_protocol: security protocol to use for broker-to-broker communication
:param str client_sasl_mechanism: sasl mechanism for clients to use
:param str interbroker_sasl_mechanism: sasl mechanism to use for broker-to-broker communication
:param str authorizer_class_name: which authorizer class to use
:param str version: which kafka version to use. Defaults to "dev" branch
:param jmx_object_names:
:param jmx_attributes:
:param int zk_connect_timeout:
:param int zk_session_timeout:
:param dict server_prop_overides: overrides for kafka.properties file
:param zk_chroot:
:param bool zk_client_secure: connect to Zookeeper over secure client port (TLS) when True
:param ListenerSecurityConfig listener_security_config: listener config to use
:param dict per_node_server_prop_overrides:
:param str extra_kafka_opts: jvm args to add to KAFKA_OPTS variable
"""
Service.__init__(self, context, num_nodes)
JmxMixin.__init__(self, num_nodes=num_nodes, jmx_object_names=jmx_object_names, jmx_attributes=(jmx_attributes or []),
root=KafkaService.PERSISTENT_ROOT)
self.zk = zk
self.security_protocol = security_protocol
self.client_sasl_mechanism = client_sasl_mechanism
self.topics = topics
self.minikdc = None
self.authorizer_class_name = authorizer_class_name
self.zk_set_acl = False
if server_prop_overides is None:
self.server_prop_overides = []
else:
self.server_prop_overides = server_prop_overides
if per_node_server_prop_overrides is None:
self.per_node_server_prop_overrides = {}
else:
self.per_node_server_prop_overrides = per_node_server_prop_overrides
self.log_level = "DEBUG"
self.zk_chroot = zk_chroot
self.zk_client_secure = zk_client_secure
self.listener_security_config = listener_security_config
self.extra_kafka_opts = extra_kafka_opts
#
# In a heavily loaded and not very fast machine, it is
# sometimes necessary to give more time for the zk client
# to have its session established, especially if the client
# is authenticating and waiting for the SaslAuthenticated
# in addition to the SyncConnected event.
#
# The default value for zookeeper.connect.timeout.ms is
# 2 seconds and here we increase it to 5 seconds, but
# it can be overridden by setting the corresponding parameter
# for this constructor.
self.zk_connect_timeout = zk_connect_timeout
# Also allow the session timeout to be provided explicitly,
# primarily so that test cases can depend on it when waiting
# e.g. brokers to deregister after a hard kill.
self.zk_session_timeout = zk_session_timeout
self.port_mappings = {
'PLAINTEXT': KafkaListener('PLAINTEXT', 9092, 'PLAINTEXT', False),
'SSL': KafkaListener('SSL', 9093, 'SSL', False),
'SASL_PLAINTEXT': KafkaListener('SASL_PLAINTEXT', 9094, 'SASL_PLAINTEXT', False),
'SASL_SSL': KafkaListener('SASL_SSL', 9095, 'SASL_SSL', False),
KafkaService.INTERBROKER_LISTENER_NAME:
KafkaListener(KafkaService.INTERBROKER_LISTENER_NAME, 9099, None, False)
}
self.interbroker_listener = None
self.setup_interbroker_listener(interbroker_security_protocol, self.listener_security_config.use_separate_interbroker_listener)
self.interbroker_sasl_mechanism = interbroker_sasl_mechanism
for node in self.nodes:
node.version = version
node.config = KafkaConfig(**{config_property.BROKER_ID: self.idx(node)})
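# Usage sketch (illustrative; assumes a ducktape test context and a running ZookeeperService zk):
#   kafka = KafkaService(test_context, num_nodes=3, zk=zk,
#                        topics={"test-topic": {"partitions": 2, "replication-factor": 2}},
#                        server_prop_overides=[["auto.create.topics.enable", "false"]])
#   kafka.start()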
def set_version(self, version):
for node in self.nodes:
node.version = version
@property
def interbroker_security_protocol(self):
return self.interbroker_listener.security_protocol
# this is required for backwards compatibility - there are a lot of tests that set this property explicitly
# meaning 'use one of the existing listeners that match given security protocol, do not use custom listener'
@interbroker_security_protocol.setter
def interbroker_security_protocol(self, security_protocol):
self.setup_interbroker_listener(security_protocol, use_separate_listener=False)
def setup_interbroker_listener(self, security_protocol, use_separate_listener=False):
self.listener_security_config.use_separate_interbroker_listener = use_separate_listener
if self.listener_security_config.use_separate_interbroker_listener:
# do not close existing port here since it is not used exclusively for interbroker communication
self.interbroker_listener = self.port_mappings[KafkaService.INTERBROKER_LISTENER_NAME]
self.interbroker_listener.security_protocol = security_protocol
else:
# close dedicated interbroker port, so it's not dangling in 'listeners' and 'advertised.listeners'
self.close_port(KafkaService.INTERBROKER_LISTENER_NAME)
self.interbroker_listener = self.port_mappings[security_protocol]
@property
def security_config(self):
config = SecurityConfig(self.context, self.security_protocol, self.interbroker_listener.security_protocol,
zk_sasl=self.zk.zk_sasl, zk_tls=self.zk_client_secure,
client_sasl_mechanism=self.client_sasl_mechanism,
interbroker_sasl_mechanism=self.interbroker_sasl_mechanism,
listener_security_config=self.listener_security_config)
for port in self.port_mappings.values():
if port.open:
config.enable_security_protocol(port.security_protocol)
return config
def open_port(self, listener_name):
self.port_mappings[listener_name].open = True
def close_port(self, listener_name):
self.port_mappings[listener_name].open = False
def start_minikdc_if_necessary(self, add_principals=""):
if self.security_config.has_sasl:
if self.minikdc is None:
self.minikdc = MiniKdc(self.context, self.nodes, extra_principals = add_principals)
self.minikdc.start()
else:
self.minikdc = None
def alive(self, node):
return len(self.pids(node)) > 0
def start(self, add_principals="", use_zk_to_create_topic=True):
if self.zk_client_secure and not self.zk.zk_client_secure_port:
raise Exception("Unable to start Kafka: TLS to Zookeeper requested but Zookeeper secure port not enabled")
self.open_port(self.security_protocol)
self.interbroker_listener.open = True
self.start_minikdc_if_necessary(add_principals)
self._ensure_zk_chroot()
Service.start(self)
self.logger.info("Waiting for brokers to register at ZK")
expected_broker_ids = set(self.nodes)
wait_until(lambda: {node for node in self.nodes if self.is_registered(node)} == expected_broker_ids, timeout_sec=30, backoff_sec=1, err_msg="Kafka servers didn't register at ZK within 30 seconds")
# Create topics if necessary
if self.topics is not None:
for topic, topic_cfg in self.topics.items():
if topic_cfg is None:
topic_cfg = {}
topic_cfg["topic"] = topic
self.create_topic(topic_cfg, use_zk_to_create_topic=use_zk_to_create_topic)
def _ensure_zk_chroot(self):
self.logger.info("Ensuring zk_chroot %s exists", self.zk_chroot)
if self.zk_chroot:
if not self.zk_chroot.startswith('/'):
raise Exception("Zookeeper chroot must start with '/' but found " + self.zk_chroot)
parts = self.zk_chroot.split('/')[1:]
for i in range(len(parts)):
self.zk.create('/' + '/'.join(parts[:i+1]))
def set_protocol_and_port(self, node):
listeners = []
advertised_listeners = []
protocol_map = []
for port in self.port_mappings.values():
if port.open:
listeners.append(port.listener())
advertised_listeners.append(port.advertised_listener(node))
protocol_map.append(port.listener_security_protocol())
self.listeners = ','.join(listeners)
self.advertised_listeners = ','.join(advertised_listeners)
self.listener_security_protocol_map = ','.join(protocol_map)
self.interbroker_bootstrap_servers = self.__bootstrap_servers(self.interbroker_listener, True)
def prop_file(self, node):
self.set_protocol_and_port(node)
#load template configs as dictionary
config_template = self.render('kafka.properties', node=node, broker_id=self.idx(node),
security_config=self.security_config, num_nodes=self.num_nodes,
listener_security_config=self.listener_security_config)
configs = dict( l.rstrip().split('=', 1) for l in config_template.split('\n')
if not l.startswith("#") and "=" in l )
#load specific test override configs
override_configs = KafkaConfig(**node.config)
override_configs[config_property.ADVERTISED_HOSTNAME] = node.account.hostname
override_configs[config_property.ZOOKEEPER_CONNECT] = self.zk_connect_setting()
if self.zk_client_secure:
override_configs[config_property.ZOOKEEPER_SSL_CLIENT_ENABLE] = 'true'
override_configs[config_property.ZOOKEEPER_CLIENT_CNXN_SOCKET] = 'org.apache.zookeeper.ClientCnxnSocketNetty'
else:
override_configs[config_property.ZOOKEEPER_SSL_CLIENT_ENABLE] = 'false'
for prop in self.server_prop_overides:
override_configs[prop[0]] = prop[1]
for prop in self.per_node_server_prop_overrides.get(self.idx(node), []):
override_configs[prop[0]] = prop[1]
#update template configs with test override configs
configs.update(override_configs)
prop_file = self.render_configs(configs)
return prop_file
def render_configs(self, configs):
"""Render self as a series of lines key=val\n, and do so in a consistent order. """
keys = [k for k in configs.keys()]
keys.sort()
s = ""
for k in keys:
s += "%s=%s\n" % (k, str(configs[k]))
return s
def start_cmd(self, node):
cmd = "export JMX_PORT=%d; " % self.jmx_port
cmd += "export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%s\"; " % self.LOG4J_CONFIG
heap_kafka_opts = "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=%s" % \
self.logs["kafka_heap_dump_file"]["path"]
security_kafka_opts = self.security_config.kafka_opts.strip('\"')
cmd += "export KAFKA_OPTS=\"%s %s %s\"; " % (heap_kafka_opts, security_kafka_opts, self.extra_kafka_opts)
cmd += "%s %s 1>> %s 2>> %s &" % \
(self.path.script("kafka-server-start.sh", node),
KafkaService.CONFIG_FILE,
KafkaService.STDOUT_STDERR_CAPTURE,
KafkaService.STDOUT_STDERR_CAPTURE)
return cmd
def start_node(self, node, timeout_sec=60):
node.account.mkdirs(KafkaService.PERSISTENT_ROOT)
prop_file = self.prop_file(node)
self.logger.info("kafka.properties:")
self.logger.info(prop_file)
node.account.create_file(KafkaService.CONFIG_FILE, prop_file)
node.account.create_file(self.LOG4J_CONFIG, self.render('log4j.properties', log_dir=KafkaService.OPERATIONAL_LOG_DIR))
self.security_config.setup_node(node)
self.security_config.setup_credentials(node, self.path, self.zk_connect_setting(), broker=True)
cmd = self.start_cmd(node)
self.logger.debug("Attempting to start KafkaService on %s with command: %s" % (str(node.account), cmd))
with node.account.monitor_log(KafkaService.STDOUT_STDERR_CAPTURE) as monitor:
node.account.ssh(cmd)
# Kafka 1.0.0 and higher don't have a space between "Kafka" and "Server"
monitor.wait_until("Kafka\s*Server.*started", timeout_sec=timeout_sec, backoff_sec=.25,
err_msg="Kafka server didn't finish startup in %d seconds" % timeout_sec)
# Credentials for inter-broker communication are created before starting Kafka.
# Client credentials are created after starting Kafka so that both loading of
# existing credentials from ZK and dynamic update of credentials in Kafka are tested.
self.security_config.setup_credentials(node, self.path, self.zk_connect_setting(), broker=False)
self.start_jmx_tool(self.idx(node), node)
if len(self.pids(node)) == 0:
raise Exception("No process ids recorded on node %s" % node.account.hostname)
def pids(self, node):
"""Return process ids associated with running processes on the given node."""
try:
cmd = "jcmd | grep -e %s | awk '{print $1}'" % self.java_class_name()
pid_arr = [pid for pid in node.account.ssh_capture(cmd, allow_fail=True, callback=int)]
return pid_arr
except (RemoteCommandError, ValueError) as e:
return []
def signal_node(self, node, sig=signal.SIGTERM):
pids = self.pids(node)
for pid in pids:
node.account.signal(pid, sig)
def signal_leader(self, topic, partition=0, sig=signal.SIGTERM):
leader = self.leader(topic, partition)
self.signal_node(leader, sig)
def stop_node(self, node, clean_shutdown=True, timeout_sec=60):
pids = self.pids(node)
sig = signal.SIGTERM if clean_shutdown else signal.SIGKILL
for pid in pids:
node.account.signal(pid, sig, allow_fail=False)
try:
wait_until(lambda: len(self.pids(node)) == 0, timeout_sec=timeout_sec,
err_msg="Kafka node failed to stop in %d seconds" % timeout_sec)
except Exception:
self.thread_dump(node)
raise
def thread_dump(self, node):
for pid in self.pids(node):
try:
node.account.signal(pid, signal.SIGQUIT, allow_fail=True)
except:
self.logger.warn("Could not dump threads on node")
def clean_node(self, node):
JmxMixin.clean_node(self, node)
self.security_config.clean_node(node)
node.account.kill_java_processes(self.java_class_name(),
clean_shutdown=False, allow_fail=True)
node.account.ssh("sudo rm -rf -- %s" % KafkaService.PERSISTENT_ROOT, allow_fail=False)
def _kafka_topics_cmd(self, node, use_zk_connection=True):
"""
Returns kafka-topics.sh command path with jaas configuration and krb5 environment variable
set. If Admin client is not going to be used, don't set the environment variable.
"""
kafka_topic_script = self.path.script("kafka-topics.sh", node)
skip_security_settings = use_zk_connection or not node.version.topic_command_supports_bootstrap_server()
return kafka_topic_script if skip_security_settings else \
"KAFKA_OPTS='-D%s -D%s' %s" % (KafkaService.JAAS_CONF_PROPERTY, KafkaService.KRB5_CONF, kafka_topic_script)
def _kafka_topics_cmd_config(self, node, use_zk_connection=True):
"""
Return --command-config parameter to the kafka-topics.sh command. The config parameter specifies
the security settings that AdminClient uses to connect to a secure kafka server.
"""
skip_command_config = use_zk_connection or not node.version.topic_command_supports_bootstrap_server()
return "" if skip_command_config else " --command-config <(echo '%s')" % (self.security_config.client_config())
def create_topic(self, topic_cfg, node=None, use_zk_to_create_topic=True):
"""Run the admin tool create topic command.
Specifying a node is optional, and may be done if different kafka nodes have different versions
and we care where the command gets run.
If the node is not specified, run the command from self.nodes[0]
"""
if node is None:
node = self.nodes[0]
self.logger.info("Creating topic %s with settings %s",
topic_cfg["topic"], topic_cfg)
use_zk_connection = topic_cfg.get('if-not-exists', False) or use_zk_to_create_topic
cmd = "%(kafka_topics_cmd)s %(connection_string)s --create --topic %(topic)s " % {
'kafka_topics_cmd': self._kafka_topics_cmd(node, use_zk_connection),
'connection_string': self._connect_setting(node, use_zk_connection),
'topic': topic_cfg.get("topic"),
}
if 'replica-assignment' in topic_cfg:
cmd += " --replica-assignment %(replica-assignment)s" % {
'replica-assignment': topic_cfg.get('replica-assignment')
}
else:
cmd += " --partitions %(partitions)d --replication-factor %(replication-factor)d" % {
'partitions': topic_cfg.get('partitions', 1),
'replication-factor': topic_cfg.get('replication-factor', 1)
}
if topic_cfg.get('if-not-exists', False):
cmd += ' --if-not-exists'
if "configs" in topic_cfg.keys() and topic_cfg["configs"] is not None:
for config_name, config_value in topic_cfg["configs"].items():
cmd += " --config %s=%s" % (config_name, str(config_value))
cmd += self._kafka_topics_cmd_config(node, use_zk_connection)
self.logger.info("Running topic creation command...\n%s" % cmd)
node.account.ssh(cmd)
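# Illustrative usage sketch (not part of the original file), assuming an already-running
# KafkaService instance named `kafka`; the topic name and settings below are arbitrary:
#
#   kafka.create_topic({
#       "topic": "test_topic",
#       "partitions": 2,
#       "replication-factor": 2,
#       "configs": {"min.insync.replicas": "2"}
#   })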
def delete_topic(self, topic, node=None):
"""
Delete a topic with the topics command
:param topic:
:param node:
:return:
"""
if node is None:
node = self.nodes[0]
self.logger.info("Deleting topic %s" % topic)
kafka_topic_script = self.path.script("kafka-topics.sh", node)
cmd = kafka_topic_script + " "
cmd += "--bootstrap-server %(bootstrap_servers)s --delete --topic %(topic)s " % {
'bootstrap_servers': self.bootstrap_servers(self.security_protocol),
'topic': topic
}
self.logger.info("Running topic delete command...\n%s" % cmd)
node.account.ssh(cmd)
def describe_topic(self, topic, node=None, use_zk_to_describe_topic=True):
if node is None:
node = self.nodes[0]
cmd = "%s %s --topic %s --describe %s" % \
(self._kafka_topics_cmd(node=node, use_zk_connection=use_zk_to_describe_topic),
self._connect_setting(node=node, use_zk_connection=use_zk_to_describe_topic),
topic, self._kafka_topics_cmd_config(node=node, use_zk_connection=use_zk_to_describe_topic))
self.logger.info("Running topic describe command...\n%s" % cmd)
output = ""
for line in node.account.ssh_capture(cmd):
output += line
return output
def list_topics(self, node=None, use_zk_to_list_topic=True):
if node is None:
node = self.nodes[0]
cmd = "%s %s --list %s" % (self._kafka_topics_cmd(node, use_zk_to_list_topic),
self._connect_setting(node, use_zk_to_list_topic),
self._kafka_topics_cmd_config(node, use_zk_to_list_topic))
for line in node.account.ssh_capture(cmd):
if not line.startswith("SLF4J"):
yield line.rstrip()
def alter_message_format(self, topic, msg_format_version, node=None):
if node is None:
node = self.nodes[0]
self.logger.info("Altering message format version for topic %s with format %s", topic, msg_format_version)
cmd = "%s --zookeeper %s %s --entity-name %s --entity-type topics --alter --add-config message.format.version=%s" % \
(self.path.script("kafka-configs.sh", node), self.zk_connect_setting(), self.zk.zkTlsConfigFileOption(), topic, msg_format_version)
self.logger.info("Running alter message format command...\n%s" % cmd)
node.account.ssh(cmd)
def set_unclean_leader_election(self, topic, value=True, node=None):
if node is None:
node = self.nodes[0]
if value is True:
self.logger.info("Enabling unclean leader election for topic %s", topic)
else:
self.logger.info("Disabling unclean leader election for topic %s", topic)
cmd = "%s --zookeeper %s %s --entity-name %s --entity-type topics --alter --add-config unclean.leader.election.enable=%s" % \
(self.path.script("kafka-configs.sh", node), self.zk_connect_setting(), self.zk.zkTlsConfigFileOption(), topic, str(value).lower())
self.logger.info("Running alter unclean leader command...\n%s" % cmd)
node.account.ssh(cmd)
def parse_describe_topic(self, topic_description):
"""Parse output of kafka-topics.sh --describe (or describe_topic() method above), which is a string of form
PartitionCount:2\tReplicationFactor:2\tConfigs:
Topic: test_topic\tPartition: 0\tLeader: 3\tReplicas: 3,1\tIsr: 3,1
Topic: test_topic\tPartition: 1\tLeader: 1\tReplicas: 1,2\tIsr: 1,2
into a dictionary structure appropriate for use with reassign-partitions tool:
{
"partitions": [
{"topic": "test_topic", "partition": 0, "replicas": [3, 1]},
{"topic": "test_topic", "partition": 1, "replicas": [1, 2]}
]
}
"""
lines = map(lambda x: x.strip(), topic_description.split("\n"))
partitions = []
for line in lines:
m = re.match(".*Leader:.*", line)
if m is None:
continue
fields = line.split("\t")
# ["Partition: 4", "Leader: 0"] -> ["4", "0"]
fields = map(lambda x: x.split(" ")[1], fields)
partitions.append(
{"topic": fields[0],
"partition": int(fields[1]),
"replicas": map(int, fields[3].split(','))})
return {"partitions": partitions}
def verify_reassign_partitions(self, reassignment, node=None):
"""Run the reassign partitions admin tool in "verify" mode
"""
if node is None:
node = self.nodes[0]
json_file = "/tmp/%s_reassign.json" % str(time.time())
# reassignment to json
json_str = json.dumps(reassignment)
json_str = json.dumps(json_str)
# create command
cmd = "echo %s > %s && " % (json_str, json_file)
cmd += "%s " % self.path.script("kafka-reassign-partitions.sh", node)
cmd += "--zookeeper %s " % self.zk_connect_setting()
cmd += "--reassignment-json-file %s " % json_file
cmd += "--verify "
cmd += "&& sleep 1 && rm -f %s" % json_file
# send command
self.logger.info("Verifying partition reassignment...")
self.logger.debug(cmd)
output = ""
for line in node.account.ssh_capture(cmd):
output += line
self.logger.debug(output)
if re.match(".*Reassignment of partition.*failed.*",
output.replace('\n', '')) is not None:
return False
if re.match(".*is still in progress.*",
output.replace('\n', '')) is not None:
return False
return True
def execute_reassign_partitions(self, reassignment, node=None,
throttle=None):
"""Run the reassign partitions admin tool in "verify" mode
"""
if node is None:
node = self.nodes[0]
json_file = "/tmp/%s_reassign.json" % str(time.time())
# reassignment to json
json_str = json.dumps(reassignment)
json_str = json.dumps(json_str)
# create command
cmd = "echo %s > %s && " % (json_str, json_file)
cmd += "%s " % self.path.script( "kafka-reassign-partitions.sh", node)
cmd += "--zookeeper %s " % self.zk_connect_setting()
cmd += "--reassignment-json-file %s " % json_file
cmd += "--execute"
if throttle is not None:
cmd += " --throttle %d" % throttle
cmd += " && sleep 1 && rm -f %s" % json_file
# send command
self.logger.info("Executing parition reassignment...")
self.logger.debug(cmd)
output = ""
for line in node.account.ssh_capture(cmd):
output += line
self.logger.debug("Verify partition reassignment:")
self.logger.debug(output)
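# Illustrative sketch (not part of the original file): a typical reassign-then-verify loop,
# assuming `kafka` is a running KafkaService and `plan` is a dict in the format produced by
# parse_describe_topic() above:
#
#   kafka.execute_reassign_partitions(plan, throttle=1000000)
#   wait_until(lambda: kafka.verify_reassign_partitions(plan), timeout_sec=60,
#              err_msg="Partition reassignment did not complete")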
def search_data_files(self, topic, messages):
"""Check if a set of messages made it into the Kakfa data files. Note that
this method takes no account of replication. It simply looks for the
payload in all the partition files of the specified topic. 'messages' should be
an array of numbers. The list of missing messages is returned.
"""
payload_match = "payload: " + "$|payload: ".join(str(x) for x in messages) + "$"
found = set([])
self.logger.debug("number of unique missing messages we will search for: %d",
len(messages))
for node in self.nodes:
# Grab all .log files in directories prefixed with this topic
files = node.account.ssh_capture("find %s* -regex '.*/%s-.*/[^/]*.log'" % (KafkaService.DATA_LOG_DIR_PREFIX, topic))
# Check each data file to see if it contains the messages we want
for log in files:
cmd = "%s kafka.tools.DumpLogSegments --print-data-log --files %s | grep -E \"%s\"" % \
(self.path.script("kafka-run-class.sh", node), log.strip(), payload_match)
for line in node.account.ssh_capture(cmd, allow_fail=True):
for val in messages:
if line.strip().endswith("payload: "+str(val)):
self.logger.debug("Found %s in data-file [%s] in line: [%s]" % (val, log.strip(), line.strip()))
found.add(val)
self.logger.debug("Number of unique messages found in the log: %d",
len(found))
missing = list(set(messages) - found)
if len(missing) > 0:
self.logger.warn("The following values were not found in the data files: " + str(missing))
return missing
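# Illustrative usage sketch (not part of the original file): verify that acked messages
# reached the broker data files, assuming `acked` is a list of integer payloads produced
# by a verifiable producer:
#
#   missing = kafka.search_data_files("test_topic", acked)
#   assert len(missing) == 0, "Some acked messages never reached the data files: %s" % missing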
def restart_cluster(self, clean_shutdown=True, timeout_sec=60, after_each_broker_restart=None, *args):
for node in self.nodes:
self.restart_node(node, clean_shutdown=clean_shutdown, timeout_sec=timeout_sec)
if after_each_broker_restart is not None:
after_each_broker_restart(*args)
def restart_node(self, node, clean_shutdown=True, timeout_sec=60):
"""Restart the given node."""
self.stop_node(node, clean_shutdown, timeout_sec)
self.start_node(node, timeout_sec)
def isr_idx_list(self, topic, partition=0):
""" Get in-sync replica list the given topic and partition.
"""
self.logger.debug("Querying zookeeper to find in-sync replicas for topic %s and partition %d" % (topic, partition))
zk_path = "/brokers/topics/%s/partitions/%d/state" % (topic, partition)
partition_state = self.zk.query(zk_path, chroot=self.zk_chroot)
if partition_state is None:
raise Exception("Error finding partition state for topic %s and partition %d." % (topic, partition))
partition_state = json.loads(partition_state)
self.logger.info(partition_state)
isr_idx_list = partition_state["isr"]
self.logger.info("Isr for topic %s and partition %d is now: %s" % (topic, partition, isr_idx_list))
return isr_idx_list
def replicas(self, topic, partition=0):
""" Get the assigned replicas for the given topic and partition.
"""
self.logger.debug("Querying zookeeper to find assigned replicas for topic %s and partition %d" % (topic, partition))
zk_path = "/brokers/topics/%s" % (topic)
assignment = self.zk.query(zk_path, chroot=self.zk_chroot)
if assignment is None:
raise Exception("Error finding partition state for topic %s and partition %d." % (topic, partition))
assignment = json.loads(assignment)
self.logger.info(assignment)
replicas = assignment["partitions"][str(partition)]
self.logger.info("Assigned replicas for topic %s and partition %d is now: %s" % (topic, partition, replicas))
return [self.get_node(replica) for replica in replicas]
def leader(self, topic, partition=0):
""" Get the leader replica for the given topic and partition.
"""
self.logger.debug("Querying zookeeper to find leader replica for topic %s and partition %d" % (topic, partition))
zk_path = "/brokers/topics/%s/partitions/%d/state" % (topic, partition)
partition_state = self.zk.query(zk_path, chroot=self.zk_chroot)
if partition_state is None:
raise Exception("Error finding partition state for topic %s and partition %d." % (topic, partition))
partition_state = json.loads(partition_state)
self.logger.info(partition_state)
leader_idx = int(partition_state["leader"])
self.logger.info("Leader for topic %s and partition %d is now: %d" % (topic, partition, leader_idx))
return self.get_node(leader_idx)
def cluster_id(self):
""" Get the current cluster id
"""
self.logger.debug("Querying ZooKeeper to retrieve cluster id")
cluster = self.zk.query("/cluster/id", chroot=self.zk_chroot)
try:
return json.loads(cluster)['id'] if cluster else None
except:
self.logger.debug("Data in /cluster/id znode could not be parsed. Data = %s" % cluster)
raise
def check_protocol_errors(self, node):
""" Checks for common protocol exceptions due to invalid inter broker protocol handling.
While such errors can and should be checked in other ways, checking the logs is a worthwhile failsafe.
"""
for node in self.nodes:
exit_code = node.account.ssh("grep -e 'java.lang.IllegalArgumentException: Invalid version' -e SchemaException %s/*"
% KafkaService.OPERATIONAL_LOG_DEBUG_DIR, allow_fail=True)
if exit_code != 1:
return False
return True
def list_consumer_groups(self, node=None, command_config=None):
""" Get list of consumer groups.
"""
if node is None:
node = self.nodes[0]
consumer_group_script = self.path.script("kafka-consumer-groups.sh", node)
if command_config is None:
command_config = ""
else:
command_config = "--command-config " + command_config
cmd = "%s --bootstrap-server %s %s --list" % \
(consumer_group_script,
self.bootstrap_servers(self.security_protocol),
command_config)
output = ""
self.logger.debug(cmd)
for line in node.account.ssh_capture(cmd):
if not line.startswith("SLF4J"):
output += line
self.logger.debug(output)
return output
def describe_consumer_group(self, group, node=None, command_config=None):
""" Describe a consumer group.
"""
if node is None:
node = self.nodes[0]
consumer_group_script = self.path.script("kafka-consumer-groups.sh", node)
if command_config is None:
command_config = ""
else:
command_config = "--command-config " + command_config
cmd = "%s --bootstrap-server %s %s --group %s --describe" % \
(consumer_group_script,
self.bootstrap_servers(self.security_protocol),
command_config, group)
output = ""
self.logger.debug(cmd)
for line in node.account.ssh_capture(cmd):
if not (line.startswith("SLF4J") or line.startswith("TOPIC") or line.startswith("Could not fetch offset")):
output += line
self.logger.debug(output)
return output
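# Illustrative sketch (not part of the original file): listing groups and describing one of
# them; the group name below is arbitrary:
#
#   groups = kafka.list_consumer_groups()
#   detail = kafka.describe_consumer_group("my-group")
#   kafka.logger.info("group offsets:\n%s" % detail)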
def zk_connect_setting(self):
return self.zk.connect_setting(self.zk_chroot, self.zk_client_secure)
def _connect_setting(self, node, use_zk_connection=True):
"""
Checks whether the --bootstrap-server option is supported; if so, returns the bootstrap server
connection setting, otherwise returns the zookeeper connection setting.
"""
if node.version.topic_command_supports_bootstrap_server() and not use_zk_connection:
connection_setting = "--bootstrap-server %s" % (self.bootstrap_servers(self.security_protocol))
else:
connection_setting = "--zookeeper %s" % (self.zk_connect_setting())
return connection_setting
def __bootstrap_servers(self, port, validate=True, offline_nodes=[]):
if validate and not port.open:
raise ValueError("We are retrieving bootstrap servers for the port: %s which is not currently open. - " %
str(port.port_number))
return ','.join([node.account.hostname + ":" + str(port.port_number)
for node in self.nodes
if node not in offline_nodes])
def bootstrap_servers(self, protocol='PLAINTEXT', validate=True, offline_nodes=[]):
"""Return comma-delimited list of brokers in this cluster formatted as HOSTNAME1:PORT1,HOSTNAME:PORT2,...
This is the format expected by many config files.
"""
port_mapping = self.port_mappings[protocol]
self.logger.info("Bootstrap client port is: " + str(port_mapping.port_number))
return self.__bootstrap_servers(port_mapping, validate, offline_nodes)
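# Example of the returned format (illustrative, not part of the original file; hostnames assumed):
#
#   kafka.bootstrap_servers("PLAINTEXT")   # e.g. "worker1:9092,worker2:9092,worker3:9092"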
def controller(self):
""" Get the controller node
"""
self.logger.debug("Querying zookeeper to find controller broker")
controller_info = self.zk.query("/controller", chroot=self.zk_chroot)
if controller_info is None:
raise Exception("Error finding controller info")
controller_info = json.loads(controller_info)
self.logger.debug(controller_info)
controller_idx = int(controller_info["brokerid"])
self.logger.info("Controller's ID: %d" % (controller_idx))
return self.get_node(controller_idx)
def is_registered(self, node):
"""
Check whether a broker is registered in Zookeeper
"""
self.logger.debug("Querying zookeeper to see if broker %s is registered", str(node))
broker_info = self.zk.query("/brokers/ids/%s" % self.idx(node), chroot=self.zk_chroot)
self.logger.debug("Broker info: %s", broker_info)
return broker_info is not None
def get_offset_shell(self, topic, partitions, max_wait_ms, offsets, time):
node = self.nodes[0]
cmd = self.path.script("kafka-run-class.sh", node)
cmd += " kafka.tools.GetOffsetShell"
cmd += " --topic %s --broker-list %s --max-wait-ms %s --offsets %s --time %s" % (topic, self.bootstrap_servers(self.security_protocol), max_wait_ms, offsets, time)
if partitions:
cmd += ' --partitions %s' % partitions
cmd += " 2>> %s/get_offset_shell.log" % KafkaService.PERSISTENT_ROOT
cmd += " | tee -a %s/get_offset_shell.log &" % KafkaService.PERSISTENT_ROOT
output = ""
self.logger.debug(cmd)
for line in node.account.ssh_capture(cmd):
output += line
self.logger.debug(output)
return output
def java_class_name(self):
return "kafka.Kafka"

View File

@@ -0,0 +1,91 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see kafka.server.KafkaConfig for additional details and defaults
advertised.host.name={{ node.account.hostname }}
listeners={{ listeners }}
advertised.listeners={{ advertised_listeners }}
listener.security.protocol.map={{ listener_security_protocol_map }}
{% if node.version.supports_named_listeners() %}
inter.broker.listener.name={{ interbroker_listener.name }}
{% else %}
security.inter.broker.protocol={{ interbroker_listener.security_protocol }}
{% endif %}
{% for k, v in listener_security_config.client_listener_overrides.iteritems() %}
{% if listener_security_config.requires_sasl_mechanism_prefix(k) %}
listener.name.{{ security_protocol.lower() }}.{{ security_config.client_sasl_mechanism.lower() }}.{{ k }}={{ v }}
{% else %}
listener.name.{{ security_protocol.lower() }}.{{ k }}={{ v }}
{% endif %}
{% endfor %}
{% if interbroker_listener.name != security_protocol %}
{% for k, v in listener_security_config.interbroker_listener_overrides.iteritems() %}
{% if listener_security_config.requires_sasl_mechanism_prefix(k) %}
listener.name.{{ interbroker_listener.name.lower() }}.{{ security_config.interbroker_sasl_mechanism.lower() }}.{{ k }}={{ v }}
{% else %}
listener.name.{{ interbroker_listener.name.lower() }}.{{ k }}={{ v }}
{% endif %}
{% endfor %}
{% endif %}
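# Illustrative example (not part of the original template): for a SASL_SSL client listener using
# the PLAIN mechanism, a client_listener_overrides key such as sasl.jaas.config (assuming
# requires_sasl_mechanism_prefix() returns true for it) would be rendered by the loop above roughly as
#   listener.name.sasl_ssl.plain.sasl.jaas.config=<value>
# while a key that does not need the mechanism prefix would render as
#   listener.name.sasl_ssl.<key>=<value>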
ssl.keystore.location=/mnt/security/test.keystore.jks
ssl.keystore.password=test-ks-passwd
ssl.key.password=test-ks-passwd
ssl.keystore.type=JKS
ssl.truststore.location=/mnt/security/test.truststore.jks
ssl.truststore.password=test-ts-passwd
ssl.truststore.type=JKS
ssl.endpoint.identification.algorithm=HTTPS
# Zookeeper TLS settings
#
# Note that zookeeper.ssl.client.enable will be set to true or false elsewhere, as appropriate.
# If it is false then these ZK keystore/truststore settings will have no effect. If it is true then
# zookeeper.clientCnxnSocket will also be set elsewhere (to org.apache.zookeeper.ClientCnxnSocketNetty)
{% if not zk.zk_tls_encrypt_only %}
zookeeper.ssl.keystore.location=/mnt/security/test.keystore.jks
zookeeper.ssl.keystore.password=test-ks-passwd
{% endif %}
zookeeper.ssl.truststore.location=/mnt/security/test.truststore.jks
zookeeper.ssl.truststore.password=test-ts-passwd
#
sasl.mechanism.inter.broker.protocol={{ security_config.interbroker_sasl_mechanism }}
sasl.enabled.mechanisms={{ ",".join(security_config.enabled_sasl_mechanisms) }}
sasl.kerberos.service.name=kafka
{% if authorizer_class_name is not none %}
ssl.client.auth=required
authorizer.class.name={{ authorizer_class_name }}
{% endif %}
zookeeper.set.acl={{"true" if zk_set_acl else "false"}}
zookeeper.connection.timeout.ms={{ zk_connect_timeout }}
zookeeper.session.timeout.ms={{ zk_session_timeout }}
{% if replica_lag is defined %}
replica.lag.time.max.ms={{replica_lag}}
{% endif %}
{% if auto_create_topics_enable is defined and auto_create_topics_enable is not none %}
auto.create.topics.enable={{ auto_create_topics_enable }}
{% endif %}
offsets.topic.num.partitions={{ num_nodes }}
offsets.topic.replication.factor={{ 3 if num_nodes > 3 else num_nodes }}
# Set to a low, but non-zero value to exercise this path without making tests much slower
group.initial.rebalance.delay.ms=100
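# Worked example (illustrative, not part of the original template): with num_nodes=5 the two
# offsets.topic settings above render as offsets.topic.num.partitions=5 and
# offsets.topic.replication.factor=3; with num_nodes=2 both render as 2.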

View File

@@ -0,0 +1,136 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
log4j.rootLogger={{ log_level|default("DEBUG") }}, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n
# INFO level appenders
log4j.appender.kafkaInfoAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.kafkaInfoAppender.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.kafkaInfoAppender.File={{ log_dir }}/info/server.log
log4j.appender.kafkaInfoAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.kafkaInfoAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.appender.kafkaInfoAppender.Threshold=INFO
log4j.appender.stateChangeInfoAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.stateChangeInfoAppender.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.stateChangeInfoAppender.File={{ log_dir }}/info/state-change.log
log4j.appender.stateChangeInfoAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.stateChangeInfoAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.appender.stateChangeInfoAppender.Threshold=INFO
log4j.appender.requestInfoAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.requestInfoAppender.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.requestInfoAppender.File={{ log_dir }}/info/kafka-request.log
log4j.appender.requestInfoAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.requestInfoAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.appender.requestInfoAppender.Threshold=INFO
log4j.appender.cleanerInfoAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.cleanerInfoAppender.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.cleanerInfoAppender.File={{ log_dir }}/info/log-cleaner.log
log4j.appender.cleanerInfoAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.cleanerInfoAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.appender.cleanerInfoAppender.Threshold=INFO
log4j.appender.controllerInfoAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.controllerInfoAppender.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.controllerInfoAppender.File={{ log_dir }}/info/controller.log
log4j.appender.controllerInfoAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.controllerInfoAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.appender.controllerInfoAppender.Threshold=INFO
log4j.appender.authorizerInfoAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.authorizerInfoAppender.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.authorizerInfoAppender.File={{ log_dir }}/info/kafka-authorizer.log
log4j.appender.authorizerInfoAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.authorizerInfoAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.appender.authorizerInfoAppender.Threshold=INFO
# DEBUG level appenders
log4j.appender.kafkaDebugAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.kafkaDebugAppender.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.kafkaDebugAppender.File={{ log_dir }}/debug/server.log
log4j.appender.kafkaDebugAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.kafkaDebugAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.appender.kafkaDebugAppender.Threshold=DEBUG
log4j.appender.stateChangeDebugAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.stateChangeDebugAppender.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.stateChangeDebugAppender.File={{ log_dir }}/debug/state-change.log
log4j.appender.stateChangeDebugAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.stateChangeDebugAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.appender.stateChangeDebugAppender.Threshold=DEBUG
log4j.appender.requestDebugAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.requestDebugAppender.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.requestDebugAppender.File={{ log_dir }}/debug/kafka-request.log
log4j.appender.requestDebugAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.requestDebugAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.appender.requestDebugAppender.Threshold=DEBUG
log4j.appender.cleanerDebugAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.cleanerDebugAppender.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.cleanerDebugAppender.File={{ log_dir }}/debug/log-cleaner.log
log4j.appender.cleanerDebugAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.cleanerDebugAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.appender.cleanerDebugAppender.Threshold=DEBUG
log4j.appender.controllerDebugAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.controllerDebugAppender.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.controllerDebugAppender.File={{ log_dir }}/debug/controller.log
log4j.appender.controllerDebugAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.controllerDebugAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.appender.controllerDebugAppender.Threshold=DEBUG
log4j.appender.authorizerDebugAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.authorizerDebugAppender.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.authorizerDebugAppender.File={{ log_dir }}/debug/kafka-authorizer.log
log4j.appender.authorizerDebugAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.authorizerDebugAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.appender.authorizerDebugAppender.Threshold=DEBUG
# Turn on all our debugging info
log4j.logger.kafka.producer.async.DefaultEventHandler={{ log_level|default("DEBUG") }}, kafkaInfoAppender, kafkaDebugAppender
log4j.logger.kafka.client.ClientUtils={{ log_level|default("DEBUG") }}, kafkaInfoAppender, kafkaDebugAppender
log4j.logger.kafka.perf={{ log_level|default("DEBUG") }}, kafkaInfoAppender, kafkaDebugAppender
log4j.logger.kafka.perf.ProducerPerformance$ProducerThread={{ log_level|default("DEBUG") }}, kafkaInfoAppender, kafkaDebugAppender
log4j.logger.kafka={{ log_level|default("DEBUG") }}, kafkaInfoAppender, kafkaDebugAppender
log4j.logger.kafka.network.RequestChannel$={{ log_level|default("DEBUG") }}, requestInfoAppender, requestDebugAppender
log4j.additivity.kafka.network.RequestChannel$=false
log4j.logger.kafka.network.Processor={{ log_level|default("DEBUG") }}, requestInfoAppender, requestDebugAppender
log4j.logger.kafka.server.KafkaApis={{ log_level|default("DEBUG") }}, requestInfoAppender, requestDebugAppender
log4j.additivity.kafka.server.KafkaApis=false
log4j.logger.kafka.request.logger={{ log_level|default("DEBUG") }}, requestInfoAppender, requestDebugAppender
log4j.additivity.kafka.request.logger=false
log4j.logger.kafka.controller={{ log_level|default("DEBUG") }}, controllerInfoAppender, controllerDebugAppender
log4j.additivity.kafka.controller=false
log4j.logger.kafka.log.LogCleaner={{ log_level|default("DEBUG") }}, cleanerInfoAppender, cleanerDebugAppender
log4j.additivity.kafka.log.LogCleaner=false
log4j.logger.state.change.logger={{ log_level|default("DEBUG") }}, stateChangeInfoAppender, stateChangeDebugAppender
log4j.additivity.state.change.logger=false
#Change this to debug to get the actual audit log for authorizer.
log4j.logger.kafka.authorizer.logger={{ log_level|default("DEBUG") }}, authorizerInfoAppender, authorizerDebugAppender
log4j.additivity.kafka.authorizer.logger=false

View File

@@ -0,0 +1,18 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from collections import namedtuple
TopicPartition = namedtuple('TopicPartition', ['topic', 'partition'])
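# Example usage (illustrative, not part of the original file):
#   tp = TopicPartition(topic="test_topic", partition=0)
#   tp.topic, tp.partition   # -> ("test_topic", 0)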

View File

@@ -0,0 +1,83 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ducktape.services.background_thread import BackgroundThreadService
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
from kafkatest.services.security.security_config import SecurityConfig
class KafkaLog4jAppender(KafkaPathResolverMixin, BackgroundThreadService):
logs = {
"producer_log": {
"path": "/mnt/kafka_log4j_appender.log",
"collect_default": False}
}
def __init__(self, context, num_nodes, kafka, topic, max_messages=-1, security_protocol="PLAINTEXT"):
super(KafkaLog4jAppender, self).__init__(context, num_nodes)
self.kafka = kafka
self.topic = topic
self.max_messages = max_messages
self.security_protocol = security_protocol
self.security_config = SecurityConfig(self.context, security_protocol)
self.stop_timeout_sec = 30
def _worker(self, idx, node):
cmd = self.start_cmd(node)
self.logger.debug("VerifiableLog4jAppender %d command: %s" % (idx, cmd))
self.security_config.setup_node(node)
node.account.ssh(cmd)
def start_cmd(self, node):
cmd = self.path.script("kafka-run-class.sh", node)
cmd += " "
cmd += self.java_class_name()
cmd += " --topic %s --broker-list %s" % (self.topic, self.kafka.bootstrap_servers(self.security_protocol))
if self.max_messages > 0:
cmd += " --max-messages %s" % str(self.max_messages)
if self.security_protocol != SecurityConfig.PLAINTEXT:
cmd += " --security-protocol %s" % str(self.security_protocol)
if self.security_protocol == SecurityConfig.SSL or self.security_protocol == SecurityConfig.SASL_SSL:
cmd += " --ssl-truststore-location %s" % str(SecurityConfig.TRUSTSTORE_PATH)
cmd += " --ssl-truststore-password %s" % str(SecurityConfig.ssl_stores.truststore_passwd)
if self.security_protocol == SecurityConfig.SASL_PLAINTEXT or \
self.security_protocol == SecurityConfig.SASL_SSL or \
self.security_protocol == SecurityConfig.SASL_MECHANISM_GSSAPI or \
self.security_protocol == SecurityConfig.SASL_MECHANISM_PLAIN:
cmd += " --sasl-kerberos-service-name %s" % str('kafka')
cmd += " --client-jaas-conf-path %s" % str(SecurityConfig.JAAS_CONF_PATH)
cmd += " --kerb5-conf-path %s" % str(SecurityConfig.KRB5CONF_PATH)
cmd += " 2>> /mnt/kafka_log4j_appender.log | tee -a /mnt/kafka_log4j_appender.log &"
return cmd
def stop_node(self, node):
node.account.kill_java_processes(self.java_class_name(), allow_fail=False)
stopped = self.wait_node(node, timeout_sec=self.stop_timeout_sec)
assert stopped, "Node %s: did not stop within the specified timeout of %s seconds" % \
(str(node.account), str(self.stop_timeout_sec))
def clean_node(self, node):
node.account.kill_java_processes(self.java_class_name(), clean_shutdown=False,
allow_fail=False)
node.account.ssh("rm -rf /mnt/kafka_log4j_appender.log", allow_fail=False)
def java_class_name(self):
return "org.apache.kafka.tools.VerifiableLog4jAppender"

View File

@@ -0,0 +1,88 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from ducktape.services.background_thread import BackgroundThreadService
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin, CORE_LIBS_JAR_NAME, CORE_DEPENDANT_TEST_LIBS_JAR_NAME
from kafkatest.services.security.security_config import SecurityConfig
from kafkatest.version import DEV_BRANCH
class LogCompactionTester(KafkaPathResolverMixin, BackgroundThreadService):
OUTPUT_DIR = "/mnt/logcompaction_tester"
LOG_PATH = os.path.join(OUTPUT_DIR, "logcompaction_tester_stdout.log")
VERIFICATION_STRING = "Data verification is completed"
logs = {
"tool_logs": {
"path": LOG_PATH,
"collect_default": True}
}
def __init__(self, context, kafka, security_protocol="PLAINTEXT", stop_timeout_sec=30):
super(LogCompactionTester, self).__init__(context, 1)
self.kafka = kafka
self.security_protocol = security_protocol
self.security_config = SecurityConfig(self.context, security_protocol)
self.stop_timeout_sec = stop_timeout_sec
self.log_compaction_completed = False
def _worker(self, idx, node):
node.account.ssh("mkdir -p %s" % LogCompactionTester.OUTPUT_DIR)
cmd = self.start_cmd(node)
self.logger.info("LogCompactionTester %d command: %s" % (idx, cmd))
self.security_config.setup_node(node)
for line in node.account.ssh_capture(cmd):
self.logger.debug("Checking line:{}".format(line))
if line.startswith(LogCompactionTester.VERIFICATION_STRING):
self.log_compaction_completed = True
def start_cmd(self, node):
core_libs_jar = self.path.jar(CORE_LIBS_JAR_NAME, DEV_BRANCH)
core_dependant_test_libs_jar = self.path.jar(CORE_DEPENDANT_TEST_LIBS_JAR_NAME, DEV_BRANCH)
cmd = "for file in %s; do CLASSPATH=$CLASSPATH:$file; done;" % core_libs_jar
cmd += " for file in %s; do CLASSPATH=$CLASSPATH:$file; done;" % core_dependant_test_libs_jar
cmd += " export CLASSPATH;"
cmd += self.path.script("kafka-run-class.sh", node)
cmd += " %s" % self.java_class_name()
cmd += " --bootstrap-server %s --messages 1000000 --sleep 20 --duplicates 10 --percent-deletes 10" % (self.kafka.bootstrap_servers(self.security_protocol))
cmd += " 2>> %s | tee -a %s &" % (self.logs["tool_logs"]["path"], self.logs["tool_logs"]["path"])
return cmd
def stop_node(self, node):
node.account.kill_java_processes(self.java_class_name(), clean_shutdown=True,
allow_fail=True)
stopped = self.wait_node(node, timeout_sec=self.stop_timeout_sec)
assert stopped, "Node %s: did not stop within the specified timeout of %s seconds" % \
(str(node.account), str(self.stop_timeout_sec))
def clean_node(self, node):
node.account.kill_java_processes(self.java_class_name(), clean_shutdown=False,
allow_fail=True)
node.account.ssh("rm -rf %s" % LogCompactionTester.OUTPUT_DIR, allow_fail=False)
def java_class_name(self):
return "kafka.tools.LogCompactionTester"
@property
def is_done(self):
return self.log_compaction_completed
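# Illustrative usage sketch (not part of the original file), assuming `test_context` and a running
# KafkaService `kafka`, with wait_until imported from ducktape.utils.util:
#
#   tester = LogCompactionTester(test_context, kafka)
#   tester.start()
#   wait_until(lambda: tester.is_done, timeout_sec=180,
#              err_msg="Timed out waiting for log compaction verification")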

View File

@@ -0,0 +1,164 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from ducktape.services.service import Service
from ducktape.utils.util import wait_until
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
"""
MirrorMaker is a tool for mirroring data between two Kafka clusters.
"""
class MirrorMaker(KafkaPathResolverMixin, Service):
# Root directory for persistent output
PERSISTENT_ROOT = "/mnt/mirror_maker"
LOG_DIR = os.path.join(PERSISTENT_ROOT, "logs")
LOG_FILE = os.path.join(LOG_DIR, "mirror_maker.log")
LOG4J_CONFIG = os.path.join(PERSISTENT_ROOT, "tools-log4j.properties")
PRODUCER_CONFIG = os.path.join(PERSISTENT_ROOT, "producer.properties")
CONSUMER_CONFIG = os.path.join(PERSISTENT_ROOT, "consumer.properties")
logs = {
"mirror_maker_log": {
"path": LOG_FILE,
"collect_default": True}
}
def __init__(self, context, num_nodes, source, target, whitelist=None, num_streams=1,
consumer_timeout_ms=None, offsets_storage="kafka",
offset_commit_interval_ms=60000, log_level="DEBUG", producer_interceptor_classes=None):
"""
MirrorMaker mirrors messages from one or more source clusters to a single destination cluster.
Args:
context: standard context
source: source Kafka cluster
target: target Kafka cluster to which data will be mirrored
whitelist: whitelist regex for topics to mirror
num_streams: number of consumer threads to create; can be a single int, or a list with
one value per node, allowing num_streams to be the same for each node,
or configured independently per-node
consumer_timeout_ms: consumer stops if more than consumer_timeout_ms elapses between consecutive messages
offsets_storage: used for consumer offsets.storage property
offset_commit_interval_ms: how frequently the mirror maker consumer commits offsets
"""
super(MirrorMaker, self).__init__(context, num_nodes=num_nodes)
self.log_level = log_level
self.consumer_timeout_ms = consumer_timeout_ms
self.num_streams = num_streams
if not isinstance(num_streams, int):
# if not an integer, num_streams should be configured per-node
assert len(num_streams) == num_nodes
self.whitelist = whitelist
self.source = source
self.target = target
self.offsets_storage = offsets_storage.lower()
if not (self.offsets_storage in ["kafka", "zookeeper"]):
raise Exception("offsets_storage should be 'kafka' or 'zookeeper'. Instead found %s" % self.offsets_storage)
self.offset_commit_interval_ms = offset_commit_interval_ms
self.producer_interceptor_classes = producer_interceptor_classes
self.external_jars = None
# These properties are potentially used by third-party tests.
self.source_auto_offset_reset = None
self.partition_assignment_strategy = None
def start_cmd(self, node):
cmd = "export LOG_DIR=%s;" % MirrorMaker.LOG_DIR
cmd += " export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%s\";" % MirrorMaker.LOG4J_CONFIG
cmd += " export KAFKA_OPTS=%s;" % self.security_config.kafka_opts
# add external dependencies, for instance for interceptors
if self.external_jars is not None:
cmd += "for file in %s; do CLASSPATH=$CLASSPATH:$file; done; " % self.external_jars
cmd += "export CLASSPATH; "
cmd += " %s %s" % (self.path.script("kafka-run-class.sh", node),
self.java_class_name())
cmd += " --consumer.config %s" % MirrorMaker.CONSUMER_CONFIG
cmd += " --producer.config %s" % MirrorMaker.PRODUCER_CONFIG
cmd += " --offset.commit.interval.ms %s" % str(self.offset_commit_interval_ms)
if isinstance(self.num_streams, int):
cmd += " --num.streams %d" % self.num_streams
else:
# config num_streams separately on each node
cmd += " --num.streams %d" % self.num_streams[self.idx(node) - 1]
if self.whitelist is not None:
cmd += " --whitelist=\"%s\"" % self.whitelist
cmd += " 1>> %s 2>> %s &" % (MirrorMaker.LOG_FILE, MirrorMaker.LOG_FILE)
return cmd
def pids(self, node):
return node.account.java_pids(self.java_class_name())
def alive(self, node):
return len(self.pids(node)) > 0
def start_node(self, node):
node.account.ssh("mkdir -p %s" % MirrorMaker.PERSISTENT_ROOT, allow_fail=False)
node.account.ssh("mkdir -p %s" % MirrorMaker.LOG_DIR, allow_fail=False)
self.security_config = self.source.security_config.client_config()
self.security_config.setup_node(node)
# Create, upload one consumer config file for source cluster
consumer_props = self.render("mirror_maker_consumer.properties")
consumer_props += str(self.security_config)
node.account.create_file(MirrorMaker.CONSUMER_CONFIG, consumer_props)
self.logger.info("Mirrormaker consumer props:\n" + consumer_props)
# Create, upload producer properties file for target cluster
producer_props = self.render('mirror_maker_producer.properties')
producer_props += str(self.security_config)
self.logger.info("Mirrormaker producer props:\n" + producer_props)
node.account.create_file(MirrorMaker.PRODUCER_CONFIG, producer_props)
# Create and upload log properties
log_config = self.render('tools_log4j.properties', log_file=MirrorMaker.LOG_FILE)
node.account.create_file(MirrorMaker.LOG4J_CONFIG, log_config)
# Run mirror maker
cmd = self.start_cmd(node)
self.logger.debug("Mirror maker command: %s", cmd)
node.account.ssh(cmd, allow_fail=False)
wait_until(lambda: self.alive(node), timeout_sec=30, backoff_sec=.5,
err_msg="Mirror maker took to long to start.")
self.logger.debug("Mirror maker is alive")
def stop_node(self, node, clean_shutdown=True):
node.account.kill_java_processes(self.java_class_name(), allow_fail=True,
clean_shutdown=clean_shutdown)
wait_until(lambda: not self.alive(node), timeout_sec=30, backoff_sec=.5,
err_msg="Mirror maker took to long to stop.")
def clean_node(self, node):
if self.alive(node):
self.logger.warn("%s %s was still alive at cleanup time. Killing forcefully..." %
(self.__class__.__name__, node.account))
node.account.kill_java_processes(self.java_class_name(), clean_shutdown=False,
allow_fail=True)
node.account.ssh("rm -rf %s" % MirrorMaker.PERSISTENT_ROOT, allow_fail=False)
self.security_config.clean_node(node)
def java_class_name(self):
return "kafka.tools.MirrorMaker"

View File

@@ -0,0 +1,14 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

View File

@@ -0,0 +1,228 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
from collections import defaultdict, namedtuple
import json
from threading import Thread
from select import select
import socket
MetricKey = namedtuple('MetricKey', ['host', 'client_id', 'name', 'group', 'tags'])
MetricValue = namedtuple('MetricValue', ['time', 'value'])
# Python's logging library doesn't define anything more detailed than DEBUG, but we'd like a finer-grained setting
# for highly detailed messages, e.g. logging every single incoming request.
TRACE = 5
class HttpMetricsCollector(object):
"""
HttpMetricsCollector enables collection of metrics from various Kafka clients instrumented with the
PushHttpMetricsReporter. It starts a web server locally and provides the necessary configuration for clients
to automatically report metrics data to this server. It also provides basic functionality for querying the
recorded metrics. This class can be used either as a mixin or standalone object.
"""
# The port to listen on on the worker node, which will be forwarded to the port listening on this driver node
REMOTE_PORT = 6789
def __init__(self, **kwargs):
"""
Create a new HttpMetricsCollector
:param period the period, in seconds, between updates that the metrics reporter configuration should define.
defaults to reporting once per second
:param args:
:param kwargs:
"""
self._http_metrics_period = kwargs.pop('period', 1)
super(HttpMetricsCollector, self).__init__(**kwargs)
# TODO: currently we maintain just a simple map from all key info -> value. However, some key fields are far
# more common to filter on, so we'd want to index by them, e.g. host, client.id, metric name.
self._http_metrics = defaultdict(list)
self._httpd = HTTPServer(('', 0), _MetricsReceiver)
self._httpd.parent = self
self._httpd.metrics = self._http_metrics
self._http_metrics_thread = Thread(target=self._run_http_metrics_httpd,
name='http-metrics-thread[%s]' % str(self))
self._http_metrics_thread.start()
self._forwarders = {}
@property
def http_metrics_url(self):
"""
:return: the URL to use when reporting metrics
"""
return "http://%s:%d" % ("localhost", self.REMOTE_PORT)
@property
def http_metrics_client_configs(self):
"""
Get client configurations that can be used to report data to this collector. Put these in a properties file for
clients (e.g. console producer or consumer) to have them push metrics to this driver. Note that in some cases
(e.g. streams, connect) these settings may need to be prefixed.
:return: a dictionary of client configurations that will direct a client to report metrics to this collector
"""
return {
"metric.reporters": "org.apache.kafka.tools.PushHttpMetricsReporter",
"metrics.url": self.http_metrics_url,
"metrics.period": self._http_metrics_period,
}
def start_node(self, node):
local_port = self._httpd.socket.getsockname()[1]
self.logger.debug('HttpMetricsCollector listening on %s', local_port)
self._forwarders[self.idx(node)] = _ReverseForwarder(self.logger, node, self.REMOTE_PORT, local_port)
super(HttpMetricsCollector, self).start_node(node)
def stop(self):
super(HttpMetricsCollector, self).stop()
if self._http_metrics_thread:
self.logger.debug("Shutting down metrics httpd")
self._httpd.shutdown()
self._http_metrics_thread.join()
self.logger.debug("Finished shutting down metrics httpd")
def stop_node(self, node):
super(HttpMetricsCollector, self).stop_node(node)
idx = self.idx(node)
self._forwarders[idx].stop()
del self._forwarders[idx]
def metrics(self, host=None, client_id=None, name=None, group=None, tags=None):
"""
Get any collected metrics that match the specified parameters, yielding each as a tuple of
(key, [<timestamp, value>, ...]) values.
"""
for k, values in self._http_metrics.iteritems():
if ((host is None or host == k.host) and
(client_id is None or client_id == k.client_id) and
(name is None or name == k.name) and
(group is None or group == k.group) and
(tags is None or tags == k.tags)):
yield (k, values)
def _run_http_metrics_httpd(self):
self._httpd.serve_forever()
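# Illustrative sketch (not part of the original file): the collector is typically mixed into a
# client service, its client configs are passed to the client so it reports back to this driver,
# and the recorded metrics can then be queried. The class and parameter names below are assumed:
#
#   class MetricsConsoleConsumer(HttpMetricsCollector, ConsoleConsumer):
#       pass
#
#   consumer = MetricsConsoleConsumer(...)   # configured with consumer.http_metrics_client_configs
#   ...
#   for key, values in consumer.metrics(group="consumer-metrics"):
#       latest = values[-1].value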
class _MetricsReceiver(BaseHTTPRequestHandler):
"""
HTTP request handler that accepts requests from the PushHttpMetricsReporter and stores them back into the parent
HttpMetricsCollector
"""
def log_message(self, format, *args, **kwargs):
# Don't do any logging here so we get rid of the mostly useless per-request Apache log-style info that spams
# the debug log
pass
def do_POST(self):
data = self.rfile.read(int(self.headers['Content-Length']))
data = json.loads(data)
self.server.parent.logger.log(TRACE, "POST %s\n\n%s\n%s", self.path, self.headers,
json.dumps(data, indent=4, separators=(',', ': ')))
self.send_response(204)
self.end_headers()
client = data['client']
host = client['host']
client_id = client['client_id']
ts = client['time']
metrics = data['metrics']
for raw_metric in metrics:
name = raw_metric['name']
group = raw_metric['group']
# Convert to tuple of pairs because dicts & lists are unhashable
tags = tuple([(k, v) for k, v in raw_metric['tags'].iteritems()])
value = raw_metric['value']
key = MetricKey(host=host, client_id=client_id, name=name, group=group, tags=tags)
metric_value = MetricValue(time=ts, value=value)
self.server.metrics[key].append(metric_value)
class _ReverseForwarder(object):
"""
Runs reverse forwarding of a port on a node to a local port. This allows you to set up a server on the test driver
while assuming only the basic SSH access that ducktape guarantees is available for worker nodes.
"""
def __init__(self, logger, node, remote_port, local_port):
self.logger = logger
self._node = node
self._local_port = local_port
self._remote_port = remote_port
self.logger.debug('Forwarding %s port %d to driver port %d', node, remote_port, local_port)
self._stopping = False
self._transport = node.account.ssh_client.get_transport()
self._transport.request_port_forward('', remote_port)
self._accept_thread = Thread(target=self._accept)
self._accept_thread.start()
def stop(self):
self._stopping = True
self._accept_thread.join(30)
if self._accept_thread.isAlive():
raise RuntimeError("Failed to stop reverse forwarder on %s", self._node)
self._transport.cancel_port_forward('', self._remote_port)
def _accept(self):
while not self._stopping:
chan = self._transport.accept(1)
if chan is None:
continue
thr = Thread(target=self._handler, args=(chan,))
thr.setDaemon(True)
thr.start()
def _handler(self, chan):
sock = socket.socket()
try:
sock.connect(("localhost", self._local_port))
except Exception as e:
self.logger.error('Forwarding request to port %d failed: %r', self._local_port, e)
return
self.logger.log(TRACE, 'Connected! Tunnel open %r -> %r -> %d', chan.origin_addr, chan.getpeername(),
self._local_port)
while True:
r, w, x = select([sock, chan], [], [])
if sock in r:
data = sock.recv(1024)
if len(data) == 0:
break
chan.send(data)
if chan in r:
data = chan.recv(1024)
if len(data) == 0:
break
sock.send(data)
chan.close()
sock.close()
self.logger.log(TRACE, 'Tunnel closed from %r', chan.origin_addr)

View File

@@ -0,0 +1,141 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from ducktape.cluster.remoteaccount import RemoteCommandError
from ducktape.utils.util import wait_until
from kafkatest.version import get_version, V_0_11_0_0, DEV_BRANCH
class JmxMixin(object):
"""This mixin helps existing service subclasses start JmxTool on their worker nodes and collect jmx stats.
A couple things worth noting:
- this is not a service in its own right.
- we assume the service using JmxMixin also uses KafkaPathResolverMixin
- this uses the --wait option for JmxTool, so the list of object names must be explicit; no patterns are permitted
"""
def __init__(self, num_nodes, jmx_object_names=None, jmx_attributes=None, jmx_poll_ms=1000, root="/mnt"):
self.jmx_object_names = jmx_object_names
self.jmx_attributes = jmx_attributes or []
self.jmx_poll_ms = jmx_poll_ms
self.jmx_port = 9192
self.started = [False] * num_nodes
self.jmx_stats = [{} for x in range(num_nodes)]
self.maximum_jmx_value = {} # map from object_attribute_name to maximum value observed over time
self.average_jmx_value = {} # map from object_attribute_name to average value observed over time
self.jmx_tool_log = os.path.join(root, "jmx_tool.log")
self.jmx_tool_err_log = os.path.join(root, "jmx_tool.err.log")
def clean_node(self, node):
node.account.kill_java_processes(self.jmx_class_name(), clean_shutdown=False,
allow_fail=True)
idx = self.idx(node)
self.started[idx-1] = False
node.account.ssh("rm -f -- %s %s" % (self.jmx_tool_log, self.jmx_tool_err_log), allow_fail=False)
def start_jmx_tool(self, idx, node):
if self.jmx_object_names is None:
self.logger.debug("%s: Not starting jmx tool because no jmx objects are defined" % node.account)
return
if self.started[idx-1]:
self.logger.debug("%s: jmx tool has been started already on this node" % node.account)
return
# JmxTool is not particularly robust to slow-starting processes. In order to ensure JmxTool doesn't fail if the
# process we're trying to monitor takes a while before listening on the JMX port, wait until we can see that port
# listening before even launching JmxTool
def check_jmx_port_listening():
return 0 == node.account.ssh("nc -z 127.0.0.1 %d" % self.jmx_port, allow_fail=True)
wait_until(check_jmx_port_listening, timeout_sec=30, backoff_sec=.1,
err_msg="%s: Never saw JMX port for %s start listening" % (node.account, self))
# To correctly wait for requested JMX metrics to be added we need the --wait option for JmxTool. This option was
# not added until 0.11.0.1, so any earlier versions need to use JmxTool from a newer version.
use_jmxtool_version = get_version(node)
if use_jmxtool_version <= V_0_11_0_0:
use_jmxtool_version = DEV_BRANCH
cmd = "%s %s " % (self.path.script("kafka-run-class.sh", use_jmxtool_version), self.jmx_class_name())
cmd += "--reporting-interval %d --jmx-url service:jmx:rmi:///jndi/rmi://127.0.0.1:%d/jmxrmi" % (self.jmx_poll_ms, self.jmx_port)
cmd += " --wait"
for jmx_object_name in self.jmx_object_names:
cmd += " --object-name %s" % jmx_object_name
cmd += " --attributes "
for jmx_attribute in self.jmx_attributes:
cmd += "%s," % jmx_attribute
cmd += " 1>> %s" % self.jmx_tool_log
cmd += " 2>> %s &" % self.jmx_tool_err_log
self.logger.debug("%s: Start JmxTool %d command: %s" % (node.account, idx, cmd))
node.account.ssh(cmd, allow_fail=False)
wait_until(lambda: self._jmx_has_output(node), timeout_sec=30, backoff_sec=.5, err_msg="%s: Jmx tool took too long to start" % node.account)
self.started[idx-1] = True
def _jmx_has_output(self, node):
"""Helper used as a proxy to determine whether jmx is running by that jmx_tool_log contains output."""
try:
node.account.ssh("test -s %s" % self.jmx_tool_log, allow_fail=False)
return True
except RemoteCommandError:
return False
def read_jmx_output(self, idx, node):
if not self.started[idx-1]:
return
object_attribute_names = []
cmd = "cat %s" % self.jmx_tool_log
self.logger.debug("Read jmx output %d command: %s", idx, cmd)
lines = [line for line in node.account.ssh_capture(cmd, allow_fail=False)]
assert len(lines) > 1, "There don't appear to be any samples in the jmx tool log: %s" % lines
for line in lines:
if "time" in line:
object_attribute_names = line.strip()[1:-1].split("\",\"")[1:]
continue
stats = [float(field) for field in line.split(',')]
time_sec = int(stats[0]/1000)
self.jmx_stats[idx-1][time_sec] = {name: stats[i+1] for i, name in enumerate(object_attribute_names)}
# do not calculate average and maximum of jmx stats until we have read output from all nodes
# If the service is multithreaded, this means that the results will be aggregated only when the last
# service finishes
if any(len(time_to_stats) == 0 for time_to_stats in self.jmx_stats):
return
start_time_sec = min([min(time_to_stats.keys()) for time_to_stats in self.jmx_stats])
end_time_sec = max([max(time_to_stats.keys()) for time_to_stats in self.jmx_stats])
for name in object_attribute_names:
aggregates_per_time = []
for time_sec in xrange(start_time_sec, end_time_sec + 1):
# assume that value is 0 if it is not read by jmx tool at the given time. This is appropriate for metrics such as bandwidth
values_per_node = [time_to_stats.get(time_sec, {}).get(name, 0) for time_to_stats in self.jmx_stats]
# assume that value is aggregated across nodes by sum. This is appropriate for metrics such as bandwidth
aggregates_per_time.append(sum(values_per_node))
self.average_jmx_value[name] = sum(aggregates_per_time) / len(aggregates_per_time)
self.maximum_jmx_value[name] = max(aggregates_per_time)
def read_jmx_output_all_nodes(self):
for node in self.nodes:
self.read_jmx_output(self.idx(node), node)
def jmx_class_name(self):
return "kafka.tools.JmxTool"

View File

@@ -0,0 +1,19 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from performance import PerformanceService, throughput, latency, compute_aggregate_throughput
from end_to_end_latency import EndToEndLatencyService
from producer_performance import ProducerPerformanceService
from consumer_performance import ConsumerPerformanceService

View File

@@ -0,0 +1,187 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from kafkatest.services.performance import PerformanceService
from kafkatest.services.security.security_config import SecurityConfig
from kafkatest.version import DEV_BRANCH, V_0_9_0_0, V_2_0_0, LATEST_0_10_0
class ConsumerPerformanceService(PerformanceService):
"""
See ConsumerPerformance.scala as the source of truth on these settings, but for reference:
"zookeeper" "The connection string for the zookeeper connection in the form host:port. Multiple URLS can
be given to allow fail-over. This option is only used with the old consumer."
"broker-list", "A broker list to use for connecting if using the new consumer."
"topic", "REQUIRED: The topic to consume from."
"group", "The group id to consume on."
"fetch-size", "The amount of data to fetch in a single request."
"from-latest", "If the consumer does not already have an establishedoffset to consume from,
start with the latest message present in the log rather than the earliest message."
"socket-buffer-size", "The size of the tcp RECV size."
"threads", "Number of processing threads."
"num-fetch-threads", "Number of fetcher threads. Defaults to 1"
"new-consumer", "Use the new consumer implementation."
"consumer.config", "Consumer config properties file."
"""
# Root directory for persistent output
PERSISTENT_ROOT = "/mnt/consumer_performance"
LOG_DIR = os.path.join(PERSISTENT_ROOT, "logs")
STDOUT_CAPTURE = os.path.join(PERSISTENT_ROOT, "consumer_performance.stdout")
STDERR_CAPTURE = os.path.join(PERSISTENT_ROOT, "consumer_performance.stderr")
LOG_FILE = os.path.join(LOG_DIR, "consumer_performance.log")
LOG4J_CONFIG = os.path.join(PERSISTENT_ROOT, "tools-log4j.properties")
CONFIG_FILE = os.path.join(PERSISTENT_ROOT, "consumer.properties")
logs = {
"consumer_performance_output": {
"path": STDOUT_CAPTURE,
"collect_default": True},
"consumer_performance_stderr": {
"path": STDERR_CAPTURE,
"collect_default": True},
"consumer_performance_log": {
"path": LOG_FILE,
"collect_default": True}
}
def __init__(self, context, num_nodes, kafka, topic, messages, version=DEV_BRANCH, new_consumer=True, settings={}):
super(ConsumerPerformanceService, self).__init__(context, num_nodes)
self.kafka = kafka
self.security_config = kafka.security_config.client_config()
self.topic = topic
self.messages = messages
self.new_consumer = new_consumer
self.settings = settings
assert version >= V_0_9_0_0 or (not new_consumer), \
"new_consumer is only supported if version >= 0.9.0.0, version %s" % str(version)
assert version < V_2_0_0 or new_consumer, \
"new_consumer==false is only supported if version < 2.0.0, version %s" % str(version)
security_protocol = self.security_config.security_protocol
assert version >= V_0_9_0_0 or security_protocol == SecurityConfig.PLAINTEXT, \
"Security protocol %s is only supported if version >= 0.9.0.0, version %s" % (self.security_config, str(version))
# These less-frequently used settings can be updated manually after instantiation
self.fetch_size = None
self.socket_buffer_size = None
self.threads = None
self.num_fetch_threads = None
self.group = None
self.from_latest = None
for node in self.nodes:
node.version = version
def args(self, version):
"""Dictionary of arguments used to start the Consumer Performance script."""
args = {
'topic': self.topic,
'messages': self.messages,
}
if self.new_consumer:
if version <= LATEST_0_10_0:
args['new-consumer'] = ""
args['broker-list'] = self.kafka.bootstrap_servers(self.security_config.security_protocol)
else:
args['zookeeper'] = self.kafka.zk_connect_setting()
if self.fetch_size is not None:
args['fetch-size'] = self.fetch_size
if self.socket_buffer_size is not None:
args['socket-buffer-size'] = self.socket_buffer_size
if self.threads is not None:
args['threads'] = self.threads
if self.num_fetch_threads is not None:
args['num-fetch-threads'] = self.num_fetch_threads
if self.group is not None:
args['group'] = self.group
if self.from_latest:
args['from-latest'] = ""
return args
def start_cmd(self, node):
cmd = "export LOG_DIR=%s;" % ConsumerPerformanceService.LOG_DIR
cmd += " export KAFKA_OPTS=%s;" % self.security_config.kafka_opts
cmd += " export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%s\";" % ConsumerPerformanceService.LOG4J_CONFIG
cmd += " %s" % self.path.script("kafka-consumer-perf-test.sh", node)
for key, value in self.args(node.version).items():
cmd += " --%s %s" % (key, value)
if node.version >= V_0_9_0_0:
# This is only used for security settings
cmd += " --consumer.config %s" % ConsumerPerformanceService.CONFIG_FILE
for key, value in self.settings.items():
cmd += " %s=%s" % (str(key), str(value))
cmd += " 2>> %(stderr)s | tee -a %(stdout)s" % {'stdout': ConsumerPerformanceService.STDOUT_CAPTURE,
'stderr': ConsumerPerformanceService.STDERR_CAPTURE}
return cmd
def parse_results(self, line, version):
parts = line.split(',')
if version >= V_0_9_0_0:
result = {
'total_mb': float(parts[2]),
'mbps': float(parts[3]),
'records_per_sec': float(parts[5]),
}
else:
result = {
'total_mb': float(parts[3]),
'mbps': float(parts[4]),
'records_per_sec': float(parts[6]),
}
return result
def _worker(self, idx, node):
node.account.ssh("mkdir -p %s" % ConsumerPerformanceService.PERSISTENT_ROOT, allow_fail=False)
log_config = self.render('tools_log4j.properties', log_file=ConsumerPerformanceService.LOG_FILE)
node.account.create_file(ConsumerPerformanceService.LOG4J_CONFIG, log_config)
node.account.create_file(ConsumerPerformanceService.CONFIG_FILE, str(self.security_config))
self.security_config.setup_node(node)
cmd = self.start_cmd(node)
self.logger.debug("Consumer performance %d command: %s", idx, cmd)
last = None
for line in node.account.ssh_capture(cmd):
last = line
# Parse and save the last line's information
self.results[idx-1] = self.parse_results(last, node.version)
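# A minimal, self-contained sketch of the CSV parsing done in parse_results above for post-0.9
# output. The helper name and sample line are hypothetical and only illustrate the comma-separated
# layout (start time, end time, total MB, MB/s, total records, records/s, ...).
def _example_parse_new_consumer_line(line):
    parts = line.split(',')
    return {'total_mb': float(parts[2]), 'mbps': float(parts[3]), 'records_per_sec': float(parts[5])}

# _example_parse_new_consumer_line(
#     "2023-02-14 11:10:00:000, 2023-02-14 11:10:10:000, 95.37, 9.54, 100000, 10000.0")
# -> {'total_mb': 95.37, 'mbps': 9.54, 'records_per_sec': 10000.0}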

View File

@@ -0,0 +1,124 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from kafkatest.services.performance import PerformanceService
from kafkatest.services.security.security_config import SecurityConfig
from kafkatest.version import DEV_BRANCH, V_0_9_0_0
class EndToEndLatencyService(PerformanceService):
MESSAGE_BYTES = 21 # 0.8.X messages are fixed at 21 bytes, so we'll match that for other versions
# Root directory for persistent output
PERSISTENT_ROOT = "/mnt/end_to_end_latency"
LOG_DIR = os.path.join(PERSISTENT_ROOT, "logs")
STDOUT_CAPTURE = os.path.join(PERSISTENT_ROOT, "end_to_end_latency.stdout")
STDERR_CAPTURE = os.path.join(PERSISTENT_ROOT, "end_to_end_latency.stderr")
LOG_FILE = os.path.join(LOG_DIR, "end_to_end_latency.log")
LOG4J_CONFIG = os.path.join(PERSISTENT_ROOT, "tools-log4j.properties")
CONFIG_FILE = os.path.join(PERSISTENT_ROOT, "client.properties")
logs = {
"end_to_end_latency_output": {
"path": STDOUT_CAPTURE,
"collect_default": True},
"end_to_end_latency_stderr": {
"path": STDERR_CAPTURE,
"collect_default": True},
"end_to_end_latency_log": {
"path": LOG_FILE,
"collect_default": True}
}
def __init__(self, context, num_nodes, kafka, topic, num_records, compression_type="none", version=DEV_BRANCH, acks=1):
super(EndToEndLatencyService, self).__init__(context, num_nodes,
root=EndToEndLatencyService.PERSISTENT_ROOT)
self.kafka = kafka
self.security_config = kafka.security_config.client_config()
security_protocol = self.security_config.security_protocol
if version < V_0_9_0_0:
assert security_protocol == SecurityConfig.PLAINTEXT, \
"Security protocol %s is only supported if version >= 0.9.0.0, version %s" % (self.security_config, str(version))
assert compression_type == "none", \
"Compression type %s is only supported if version >= 0.9.0.0, version %s" % (compression_type, str(version))
self.args = {
'topic': topic,
'num_records': num_records,
'acks': acks,
'compression_type': compression_type,
'kafka_opts': self.security_config.kafka_opts,
'message_bytes': EndToEndLatencyService.MESSAGE_BYTES
}
for node in self.nodes:
node.version = version
def start_cmd(self, node):
args = self.args.copy()
args.update({
'zk_connect': self.kafka.zk_connect_setting(),
'bootstrap_servers': self.kafka.bootstrap_servers(self.security_config.security_protocol),
'config_file': EndToEndLatencyService.CONFIG_FILE,
'kafka_run_class': self.path.script("kafka-run-class.sh", node),
'java_class_name': self.java_class_name()
})
cmd = "export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%s\"; " % EndToEndLatencyService.LOG4J_CONFIG
if node.version >= V_0_9_0_0:
cmd += "KAFKA_OPTS=%(kafka_opts)s %(kafka_run_class)s %(java_class_name)s " % args
cmd += "%(bootstrap_servers)s %(topic)s %(num_records)d %(acks)d %(message_bytes)d %(config_file)s" % args
else:
# Set fetch max wait to 0 to match behavior in later versions
cmd += "KAFKA_OPTS=%(kafka_opts)s %(kafka_run_class)s kafka.tools.TestEndToEndLatency " % args
cmd += "%(bootstrap_servers)s %(zk_connect)s %(topic)s %(num_records)d 0 %(acks)d" % args
cmd += " 2>> %(stderr)s | tee -a %(stdout)s" % {'stdout': EndToEndLatencyService.STDOUT_CAPTURE,
'stderr': EndToEndLatencyService.STDERR_CAPTURE}
return cmd
def _worker(self, idx, node):
node.account.ssh("mkdir -p %s" % EndToEndLatencyService.PERSISTENT_ROOT, allow_fail=False)
log_config = self.render('tools_log4j.properties', log_file=EndToEndLatencyService.LOG_FILE)
node.account.create_file(EndToEndLatencyService.LOG4J_CONFIG, log_config)
client_config = str(self.security_config)
if node.version >= V_0_9_0_0:
client_config += "compression_type=%(compression_type)s" % self.args
node.account.create_file(EndToEndLatencyService.CONFIG_FILE, client_config)
self.security_config.setup_node(node)
cmd = self.start_cmd(node)
self.logger.debug("End-to-end latency %d command: %s", idx, cmd)
results = {}
for line in node.account.ssh_capture(cmd):
if line.startswith("Avg latency:"):
results['latency_avg_ms'] = float(line.split()[2])
if line.startswith("Percentiles"):
results['latency_50th_ms'] = float(line.split()[3][:-1])
results['latency_99th_ms'] = float(line.split()[6][:-1])
results['latency_999th_ms'] = float(line.split()[9])
self.results[idx-1] = results
def java_class_name(self):
return "kafka.tools.EndToEndLatency"

View File

@@ -0,0 +1,72 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ducktape.services.background_thread import BackgroundThreadService
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
class PerformanceService(KafkaPathResolverMixin, BackgroundThreadService):
def __init__(self, context=None, num_nodes=0, root="/mnt/*", stop_timeout_sec=30):
super(PerformanceService, self).__init__(context, num_nodes)
self.results = [None] * self.num_nodes
self.stats = [[] for x in range(self.num_nodes)]
self.stop_timeout_sec = stop_timeout_sec
self.root = root
def java_class_name(self):
"""
Returns the name of the Java class which this service creates. Subclasses should override
this method so that we know the name of the java process to stop. If it is not
overridden, we will kill all java processes in PerformanceService#stop_node (for backwards
compatibility).
"""
return ""
def stop_node(self, node):
node.account.kill_java_processes(self.java_class_name(), clean_shutdown=True, allow_fail=True)
stopped = self.wait_node(node, timeout_sec=self.stop_timeout_sec)
assert stopped, "Node %s: did not stop within the specified timeout of %s seconds" % \
(str(node.account), str(self.stop_timeout_sec))
def clean_node(self, node):
node.account.kill_java_processes(self.java_class_name(), clean_shutdown=False, allow_fail=True)
node.account.ssh("rm -rf -- %s" % self.root, allow_fail=False)
def throughput(records_per_sec, mb_per_sec):
"""Helper method to ensure uniform representation of throughput data"""
return {
"records_per_sec": records_per_sec,
"mb_per_sec": mb_per_sec
}
def latency(latency_50th_ms, latency_99th_ms, latency_999th_ms):
"""Helper method to ensure uniform representation of latency data"""
return {
"latency_50th_ms": latency_50th_ms,
"latency_99th_ms": latency_99th_ms,
"latency_999th_ms": latency_999th_ms
}
def compute_aggregate_throughput(perf):
"""Helper method for computing throughput after running a performance service."""
aggregate_rate = sum([r['records_per_sec'] for r in perf.results])
aggregate_mbps = sum([r['mbps'] for r in perf.results])
return throughput(aggregate_rate, aggregate_mbps)
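# A small illustrative usage of the helpers above. _FakePerf is a hypothetical stand-in for a
# finished PerformanceService whose per-node results carry 'records_per_sec' and 'mbps'.
class _FakePerf(object):
    """Illustrative only."""
    def __init__(self, results):
        self.results = results

# compute_aggregate_throughput(_FakePerf([
#     {'records_per_sec': 5000.0, 'mbps': 4.5},
#     {'records_per_sec': 4500.0, 'mbps': 4.25}]))
# -> {'records_per_sec': 9500.0, 'mb_per_sec': 8.75}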

View File

@@ -0,0 +1,174 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import time
from ducktape.utils.util import wait_until
from ducktape.cluster.remoteaccount import RemoteCommandError
from kafkatest.directory_layout.kafka_path import TOOLS_JAR_NAME, TOOLS_DEPENDANT_TEST_LIBS_JAR_NAME
from kafkatest.services.monitor.http import HttpMetricsCollector
from kafkatest.services.performance import PerformanceService
from kafkatest.services.security.security_config import SecurityConfig
from kafkatest.version import DEV_BRANCH, V_0_9_0_0
class ProducerPerformanceService(HttpMetricsCollector, PerformanceService):
PERSISTENT_ROOT = "/mnt/producer_performance"
STDOUT_CAPTURE = os.path.join(PERSISTENT_ROOT, "producer_performance.stdout")
STDERR_CAPTURE = os.path.join(PERSISTENT_ROOT, "producer_performance.stderr")
LOG_DIR = os.path.join(PERSISTENT_ROOT, "logs")
LOG_FILE = os.path.join(LOG_DIR, "producer_performance.log")
LOG4J_CONFIG = os.path.join(PERSISTENT_ROOT, "tools-log4j.properties")
def __init__(self, context, num_nodes, kafka, topic, num_records, record_size, throughput, version=DEV_BRANCH, settings=None,
intermediate_stats=False, client_id="producer-performance"):
super(ProducerPerformanceService, self).__init__(context=context, num_nodes=num_nodes)
self.logs = {
"producer_performance_stdout": {
"path": ProducerPerformanceService.STDOUT_CAPTURE,
"collect_default": True},
"producer_performance_stderr": {
"path": ProducerPerformanceService.STDERR_CAPTURE,
"collect_default": True},
"producer_performance_log": {
"path": ProducerPerformanceService.LOG_FILE,
"collect_default": True}
}
self.kafka = kafka
self.security_config = kafka.security_config.client_config()
security_protocol = self.security_config.security_protocol
assert version >= V_0_9_0_0 or security_protocol == SecurityConfig.PLAINTEXT, \
"Security protocol %s is only supported if version >= 0.9.0.0, version %s" % (self.security_config, str(version))
self.args = {
'topic': topic,
'kafka_opts': self.security_config.kafka_opts,
'num_records': num_records,
'record_size': record_size,
'throughput': throughput
}
self.settings = settings or {}
self.intermediate_stats = intermediate_stats
self.client_id = client_id
for node in self.nodes:
node.version = version
def start_cmd(self, node):
args = self.args.copy()
args.update({
'bootstrap_servers': self.kafka.bootstrap_servers(self.security_config.security_protocol),
'client_id': self.client_id,
'kafka_run_class': self.path.script("kafka-run-class.sh", node),
'metrics_props': ' '.join(["%s=%s" % (k, v) for k, v in self.http_metrics_client_configs.iteritems()])
})
cmd = ""
if node.version < DEV_BRANCH:
# In order to ensure more consistent configuration between versions, always use the ProducerPerformance
# tool from the development branch
tools_jar = self.path.jar(TOOLS_JAR_NAME, DEV_BRANCH)
tools_dependant_libs_jar = self.path.jar(TOOLS_DEPENDANT_TEST_LIBS_JAR_NAME, DEV_BRANCH)
for jar in (tools_jar, tools_dependant_libs_jar):
cmd += "for file in %s; do CLASSPATH=$CLASSPATH:$file; done; " % jar
cmd += "export CLASSPATH; "
cmd += " export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%s\"; " % ProducerPerformanceService.LOG4J_CONFIG
cmd += "KAFKA_OPTS=%(kafka_opts)s KAFKA_HEAP_OPTS=\"-XX:+HeapDumpOnOutOfMemoryError\" %(kafka_run_class)s org.apache.kafka.tools.ProducerPerformance " \
"--topic %(topic)s --num-records %(num_records)d --record-size %(record_size)d --throughput %(throughput)d --producer-props bootstrap.servers=%(bootstrap_servers)s client.id=%(client_id)s %(metrics_props)s" % args
self.security_config.setup_node(node)
if self.security_config.security_protocol != SecurityConfig.PLAINTEXT:
self.settings.update(self.security_config.properties)
for key, value in self.settings.items():
cmd += " %s=%s" % (str(key), str(value))
cmd += " 2>>%s | tee %s" % (ProducerPerformanceService.STDERR_CAPTURE, ProducerPerformanceService.STDOUT_CAPTURE)
return cmd
def pids(self, node):
try:
cmd = "jps | grep -i ProducerPerformance | awk '{print $1}'"
pid_arr = [pid for pid in node.account.ssh_capture(cmd, allow_fail=True, callback=int)]
return pid_arr
except (RemoteCommandError, ValueError) as e:
return []
def alive(self, node):
return len(self.pids(node)) > 0
def _worker(self, idx, node):
node.account.ssh("mkdir -p %s" % ProducerPerformanceService.PERSISTENT_ROOT, allow_fail=False)
# Create and upload log properties
log_config = self.render('tools_log4j.properties', log_file=ProducerPerformanceService.LOG_FILE)
node.account.create_file(ProducerPerformanceService.LOG4J_CONFIG, log_config)
cmd = self.start_cmd(node)
self.logger.debug("Producer performance %d command: %s", idx, cmd)
# start ProducerPerformance process
start = time.time()
producer_output = node.account.ssh_capture(cmd)
wait_until(lambda: self.alive(node), timeout_sec=20, err_msg="ProducerPerformance failed to start")
# block until there is at least one line of output
first_line = next(producer_output, None)
if first_line is None:
raise Exception("No output from ProducerPerformance")
wait_until(lambda: not self.alive(node), timeout_sec=1200, backoff_sec=2, err_msg="ProducerPerformance failed to finish")
elapsed = time.time() - start
self.logger.debug("ProducerPerformance process ran for %s seconds" % elapsed)
# parse producer output from file
last = None
producer_output = node.account.ssh_capture("cat %s" % ProducerPerformanceService.STDOUT_CAPTURE)
for line in producer_output:
if self.intermediate_stats:
try:
self.stats[idx-1].append(self.parse_stats(line))
except:
# Sometimes there are extraneous log messages
pass
last = line
try:
self.results[idx-1] = self.parse_stats(last)
except:
raise Exception("Unable to parse aggregate performance statistics on node %d: %s" % (idx, last))
def parse_stats(self, line):
parts = line.split(',')
return {
'records': int(parts[0].split()[0]),
'records_per_sec': float(parts[1].split()[0]),
'mbps': float(parts[1].split('(')[1].split()[0]),
'latency_avg_ms': float(parts[2].split()[0]),
'latency_max_ms': float(parts[3].split()[0]),
'latency_50th_ms': float(parts[4].split()[0]),
'latency_95th_ms': float(parts[5].split()[0]),
'latency_99th_ms': float(parts[6].split()[0]),
'latency_999th_ms': float(parts[7].split()[0]),
}
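# Illustrative only: a hypothetical summary line in the comma-separated shape parse_stats expects,
# and the fields it would pull out. The numbers are made up.
_EXAMPLE_SUMMARY_LINE = ("100000 records sent, 20000.0 records/sec (19.07 MB/sec), "
                         "1.50 ms avg latency, 30.00 ms max latency, "
                         "1 ms 50th, 2 ms 95th, 3 ms 99th, 5 ms 99.9th.")
# parse_stats(_EXAMPLE_SUMMARY_LINE) would yield records=100000, records_per_sec=20000.0,
# mbps=19.07, latency_avg_ms=1.5, latency_max_ms=30.0 and the 50th/95th/99th/99.9th percentiles.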

View File

@@ -0,0 +1,108 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from kafkatest.services.monitor.jmx import JmxMixin
from kafkatest.services.streams import StreamsTestBaseService
from kafkatest.services.kafka import KafkaConfig
from kafkatest.services import streams_property
#
# Class used to start the simple Kafka Streams benchmark
#
class StreamsSimpleBenchmarkService(StreamsTestBaseService):
"""Base class for simple Kafka Streams benchmark"""
def __init__(self, test_context, kafka, test_name, num_threads, num_recs_or_wait_ms, key_skew, value_size):
super(StreamsSimpleBenchmarkService, self).__init__(test_context,
kafka,
"org.apache.kafka.streams.perf.SimpleBenchmark",
test_name,
num_recs_or_wait_ms,
key_skew,
value_size)
self.jmx_option = ""
if test_name.startswith('stream') or test_name.startswith('table'):
self.jmx_option = "stream-jmx"
JmxMixin.__init__(self,
num_nodes=1,
jmx_object_names=['kafka.streams:type=stream-thread-metrics,thread-id=simple-benchmark-StreamThread-%d' %(i+1) for i in range(num_threads)],
jmx_attributes=['process-latency-avg',
'process-rate',
'commit-latency-avg',
'commit-rate',
'poll-latency-avg',
'poll-rate'],
root=StreamsTestBaseService.PERSISTENT_ROOT)
if test_name.startswith('consume'):
self.jmx_option = "consumer-jmx"
JmxMixin.__init__(self,
num_nodes=1,
jmx_object_names=['kafka.consumer:type=consumer-fetch-manager-metrics,client-id=simple-benchmark-consumer'],
jmx_attributes=['records-consumed-rate'],
root=StreamsTestBaseService.PERSISTENT_ROOT)
self.num_threads = num_threads
def prop_file(self):
cfg = KafkaConfig(**{streams_property.STATE_DIR: self.PERSISTENT_ROOT,
streams_property.KAFKA_SERVERS: self.kafka.bootstrap_servers(),
streams_property.NUM_THREADS: self.num_threads})
return cfg.render()
def start_cmd(self, node):
if self.jmx_option != "":
args = self.args.copy()
args['jmx_port'] = self.jmx_port
args['config_file'] = self.CONFIG_FILE
args['stdout'] = self.STDOUT_FILE
args['stderr'] = self.STDERR_FILE
args['pidfile'] = self.PID_FILE
args['log4j'] = self.LOG4J_CONFIG_FILE
args['kafka_run_class'] = self.path.script("kafka-run-class.sh", node)
cmd = "( export JMX_PORT=%(jmx_port)s; export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%(log4j)s\"; " \
"INCLUDE_TEST_JARS=true %(kafka_run_class)s %(streams_class_name)s " \
" %(config_file)s %(user_test_args1)s %(user_test_args2)s %(user_test_args3)s" \
" %(user_test_args4)s & echo $! >&3 ) 1>> %(stdout)s 2>> %(stderr)s 3> %(pidfile)s" % args
else:
cmd = super(StreamsSimpleBenchmarkService, self).start_cmd(node)
return cmd
def start_node(self, node):
super(StreamsSimpleBenchmarkService, self).start_node(node)
if self.jmx_option != "":
self.start_jmx_tool(1, node)
def clean_node(self, node):
if self.jmx_option != "":
JmxMixin.clean_node(self, node)
super(StreamsSimpleBenchmarkService, self).clean_node(node)
def collect_data(self, node, tag = None):
# Collect the data and return it to the framework
output = node.account.ssh_capture("grep Performance %s" % self.STDOUT_FILE)
data = {}
for line in output:
parts = line.split(':')
data[tag + parts[0]] = parts[1]
return data
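# A minimal sketch of the line splitting done by collect_data above; the helper name and sample
# line are hypothetical and only mirror the "<name>: <value>" shape grepped out of STDOUT_FILE.
def _example_collect(lines, tag=""):
    data = {}
    for line in lines:
        parts = line.split(':')
        data[tag + parts[0]] = parts[1]
    return data

# _example_collect(["Performance (records/sec): 12345.6"], tag="run1-")
# -> {'run1-Performance (records/sec)': ' 12345.6'}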

View File

@@ -0,0 +1,25 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Define the root logger with appender file
log4j.rootLogger = {{ log_level|default("INFO") }}, FILE
log4j.appender.FILE=org.apache.log4j.FileAppender
log4j.appender.FILE.File={{ log_file }}
log4j.appender.FILE.ImmediateFlush=true
# Set the append to false, overwrite
log4j.appender.FILE.Append=false
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.conversionPattern=[%d] %p %m (%c)%n
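# For reference, a hypothetical rendering of this template with the default log level and
# log_file=/mnt/consumer_performance/logs/consumer_performance.log would begin with:
#   log4j.rootLogger = INFO, FILE
#   log4j.appender.FILE.File=/mnt/consumer_performance/logs/consumer_performance.log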

View File

@@ -0,0 +1,93 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ducktape.services.background_thread import BackgroundThreadService
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
from kafkatest.services.security.security_config import SecurityConfig
import re
class ReplicaVerificationTool(KafkaPathResolverMixin, BackgroundThreadService):
logs = {
"producer_log": {
"path": "/mnt/replica_verification_tool.log",
"collect_default": False}
}
def __init__(self, context, num_nodes, kafka, topic, report_interval_ms, security_protocol="PLAINTEXT", stop_timeout_sec=30):
super(ReplicaVerificationTool, self).__init__(context, num_nodes)
self.kafka = kafka
self.topic = topic
self.report_interval_ms = report_interval_ms
self.security_protocol = security_protocol
self.security_config = SecurityConfig(self.context, security_protocol)
self.partition_lag = {}
self.stop_timeout_sec = stop_timeout_sec
def _worker(self, idx, node):
cmd = self.start_cmd(node)
self.logger.debug("ReplicaVerificationTool %d command: %s" % (idx, cmd))
self.security_config.setup_node(node)
for line in node.account.ssh_capture(cmd):
self.logger.debug("Parsing line:{}".format(line))
parsed = re.search('.*max lag is (.+?) for partition ([a-zA-Z0-9._-]+-[0-9]+) at', line)
if parsed:
lag = int(parsed.group(1))
topic_partition = parsed.group(2)
self.logger.debug("Setting max lag for {} as {}".format(topic_partition, lag))
self.partition_lag[topic_partition] = lag
def get_lag_for_partition(self, topic, partition):
"""
Get latest lag for given topic-partition
Args:
topic: a topic
partition: a partition of the topic
"""
topic_partition = topic + '-' + str(partition)
lag = self.partition_lag.get(topic_partition, -1)
self.logger.debug("Returning lag for {} as {}".format(topic_partition, lag))
return lag
def start_cmd(self, node):
cmd = self.path.script("kafka-run-class.sh", node)
cmd += " %s" % self.java_class_name()
cmd += " --broker-list %s --topic-white-list %s --time -2 --report-interval-ms %s" % (self.kafka.bootstrap_servers(self.security_protocol), self.topic, self.report_interval_ms)
cmd += " 2>> /mnt/replica_verification_tool.log | tee -a /mnt/replica_verification_tool.log &"
return cmd
def stop_node(self, node):
node.account.kill_java_processes(self.java_class_name(), clean_shutdown=True,
allow_fail=True)
stopped = self.wait_node(node, timeout_sec=self.stop_timeout_sec)
assert stopped, "Node %s: did not stop within the specified timeout of %s seconds" % \
(str(node.account), str(self.stop_timeout_sec))
def clean_node(self, node):
node.account.kill_java_processes(self.java_class_name(), clean_shutdown=False,
allow_fail=True)
node.account.ssh("rm -rf /mnt/replica_verification_tool.log", allow_fail=False)
def java_class_name(self):
return "kafka.tools.ReplicaVerificationTool"

View File

@@ -0,0 +1,15 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

View File

@@ -0,0 +1,75 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
class ACLs(KafkaPathResolverMixin):
def __init__(self, context):
self.context = context
def set_acls(self, protocol, kafka, topic, group):
node = kafka.nodes[0]
setting = kafka.zk_connect_setting()
# Set server ACLs
kafka_principal = "User:CN=systemtest" if protocol == "SSL" else "User:kafka"
self.acls_command(node, ACLs.add_cluster_acl(setting, kafka_principal))
self.acls_command(node, ACLs.broker_read_acl(setting, "*", kafka_principal))
# Set client ACLs
client_principal = "User:CN=systemtest" if protocol == "SSL" else "User:client"
self.acls_command(node, ACLs.produce_acl(setting, topic, client_principal))
self.acls_command(node, ACLs.consume_acl(setting, topic, group, client_principal))
def acls_command(self, node, properties):
cmd = "%s %s" % (self.path.script("kafka-acls.sh", node), properties)
node.account.ssh(cmd)
@staticmethod
def add_cluster_acl(zk_connect, principal="User:kafka"):
return "--authorizer-properties zookeeper.connect=%(zk_connect)s --add --cluster " \
"--operation=ClusterAction --allow-principal=%(principal)s " % {
'zk_connect': zk_connect,
'principal': principal
}
@staticmethod
def broker_read_acl(zk_connect, topic, principal="User:kafka"):
return "--authorizer-properties zookeeper.connect=%(zk_connect)s --add --topic=%(topic)s " \
"--operation=Read --allow-principal=%(principal)s " % {
'zk_connect': zk_connect,
'topic': topic,
'principal': principal
}
@staticmethod
def produce_acl(zk_connect, topic, principal="User:client"):
return "--authorizer-properties zookeeper.connect=%(zk_connect)s --add --topic=%(topic)s " \
"--producer --allow-principal=%(principal)s " % {
'zk_connect': zk_connect,
'topic': topic,
'principal': principal
}
@staticmethod
def consume_acl(zk_connect, topic, group, principal="User:client"):
return "--authorizer-properties zookeeper.connect=%(zk_connect)s --add --topic=%(topic)s " \
"--group=%(group)s --consumer --allow-principal=%(principal)s " % {
'zk_connect': zk_connect,
'topic': topic,
'group': group,
'principal': principal
}
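# For illustration only: the client-side ACL argument strings rendered for a hypothetical
# ZooKeeper connect string, topic and group; acls_command above prepends kafka-acls.sh to these.
def _example_client_acl_args(zk_connect="zk1:2181", topic="test-topic", group="test-group"):
    return [ACLs.produce_acl(zk_connect, topic, principal="User:client"),
            ACLs.consume_acl(zk_connect, topic, group, principal="User:client")]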

View File

@@ -0,0 +1,43 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
class ListenerSecurityConfig:
SASL_MECHANISM_PREFIXED_CONFIGS = ["connections.max.reauth.ms", "sasl.jaas.config",
"sasl.login.callback.handler.class", "sasl.login.class",
"sasl.server.callback.handler.class"]
def __init__(self, use_separate_interbroker_listener=False,
client_listener_overrides={}, interbroker_listener_overrides={}):
"""
:param bool use_separate_interbroker_listener - if set, a separate interbroker listener is used,
with its security protocol set to the interbroker_security_protocol value. If set, requires
interbroker_security_protocol to be provided.
Normally a listener's name is the same as its security protocol, so setting security_protocol and
interbroker_security_protocol to the same value leads to a single port being open, with both client
and broker-to-broker communication going over that port. This parameter allows
you to add an interbroker listener with the same security protocol as a client listener, but running on a
separate port.
:param dict client_listener_overrides - non-prefixed listener config overrides for named client listener
(for example 'sasl.jaas.config', 'ssl.keystore.location', 'sasl.login.callback.handler.class', etc).
:param dict interbroker_listener_overrides - non-prefixed listener config overrides for named interbroker
listener (for example 'sasl.jaas.config', 'ssl.keystore.location', 'sasl.login.callback.handler.class', etc).
"""
self.use_separate_interbroker_listener = use_separate_interbroker_listener
self.client_listener_overrides = client_listener_overrides
self.interbroker_listener_overrides = interbroker_listener_overrides
def requires_sasl_mechanism_prefix(self, config):
return config in ListenerSecurityConfig.SASL_MECHANISM_PREFIXED_CONFIGS
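# A hypothetical usage sketch: enable a separate interbroker listener and supply one client
# listener override. requires_sasl_mechanism_prefix reports whether that override must be
# written with a SASL-mechanism-prefixed key in the broker properties.
def _example_listener_security_config():
    config = ListenerSecurityConfig(
        use_separate_interbroker_listener=True,
        client_listener_overrides={"connections.max.reauth.ms": 3600000})
    return config.requires_sasl_mechanism_prefix("connections.max.reauth.ms")  # True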

View File

@@ -0,0 +1,136 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import random
import uuid
from io import open
from os import remove, close
from shutil import move
from tempfile import mkstemp
from ducktape.services.service import Service
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin, CORE_LIBS_JAR_NAME, CORE_DEPENDANT_TEST_LIBS_JAR_NAME
from kafkatest.version import DEV_BRANCH
class MiniKdc(KafkaPathResolverMixin, Service):
logs = {
"minikdc_log": {
"path": "/mnt/minikdc/minikdc.log",
"collect_default": True}
}
WORK_DIR = "/mnt/minikdc"
PROPS_FILE = "/mnt/minikdc/minikdc.properties"
KEYTAB_FILE = "/mnt/minikdc/keytab"
KRB5CONF_FILE = "/mnt/minikdc/krb5.conf"
LOG_FILE = "/mnt/minikdc/minikdc.log"
LOCAL_KEYTAB_FILE = None
LOCAL_KRB5CONF_FILE = None
@staticmethod
def _set_local_keytab_file(local_scratch_dir):
"""Set MiniKdc.LOCAL_KEYTAB_FILE exactly once per test.
LOCAL_KEYTAB_FILE is currently used like a global variable to provide a mechanism to share the
location of the local keytab file among all services which might need it.
Since individual ducktape tests are each run in a subprocess forked from the ducktape main process,
class variables set at class load time are duplicated between test processes. This leads to collisions
if test subprocesses are run in parallel, so we defer setting these class variables until after the test itself
begins to run.
"""
if MiniKdc.LOCAL_KEYTAB_FILE is None:
MiniKdc.LOCAL_KEYTAB_FILE = os.path.join(local_scratch_dir, "keytab")
return MiniKdc.LOCAL_KEYTAB_FILE
@staticmethod
def _set_local_krb5conf_file(local_scratch_dir):
"""Set MiniKdc.LOCAL_KRB5CONF_FILE exactly once per test.
See _set_local_keytab_file for details why we do this.
"""
if MiniKdc.LOCAL_KRB5CONF_FILE is None:
MiniKdc.LOCAL_KRB5CONF_FILE = os.path.join(local_scratch_dir, "krb5conf")
return MiniKdc.LOCAL_KRB5CONF_FILE
def __init__(self, context, kafka_nodes, extra_principals=""):
super(MiniKdc, self).__init__(context, 1)
self.kafka_nodes = kafka_nodes
self.extra_principals = extra_principals
# context.local_scratch_dir uses a ducktape feature:
# each test_context object has a unique local scratch directory which is available for the duration of the test
# which is automatically garbage collected after the test finishes
MiniKdc._set_local_keytab_file(context.local_scratch_dir)
MiniKdc._set_local_krb5conf_file(context.local_scratch_dir)
def replace_in_file(self, file_path, pattern, subst):
fh, abs_path = mkstemp()
with open(abs_path, 'w') as new_file:
with open(file_path) as old_file:
for line in old_file:
new_file.write(line.replace(pattern, subst))
close(fh)
remove(file_path)
move(abs_path, file_path)
def start_node(self, node):
node.account.ssh("mkdir -p %s" % MiniKdc.WORK_DIR, allow_fail=False)
props_file = self.render('minikdc.properties', node=node)
node.account.create_file(MiniKdc.PROPS_FILE, props_file)
self.logger.info("minikdc.properties")
self.logger.info(props_file)
kafka_principals = ' '.join(['kafka/' + kafka_node.account.hostname for kafka_node in self.kafka_nodes])
principals = 'client ' + kafka_principals + ' ' + self.extra_principals
self.logger.info("Starting MiniKdc with principals " + principals)
core_libs_jar = self.path.jar(CORE_LIBS_JAR_NAME, DEV_BRANCH)
core_dependant_test_libs_jar = self.path.jar(CORE_DEPENDANT_TEST_LIBS_JAR_NAME, DEV_BRANCH)
cmd = "for file in %s; do CLASSPATH=$CLASSPATH:$file; done;" % core_libs_jar
cmd += " for file in %s; do CLASSPATH=$CLASSPATH:$file; done;" % core_dependant_test_libs_jar
cmd += " export CLASSPATH;"
cmd += " %s kafka.security.minikdc.MiniKdc %s %s %s %s 1>> %s 2>> %s &" % (self.path.script("kafka-run-class.sh", node), MiniKdc.WORK_DIR, MiniKdc.PROPS_FILE, MiniKdc.KEYTAB_FILE, principals, MiniKdc.LOG_FILE, MiniKdc.LOG_FILE)
self.logger.debug("Attempting to start MiniKdc on %s with command: %s" % (str(node.account), cmd))
with node.account.monitor_log(MiniKdc.LOG_FILE) as monitor:
node.account.ssh(cmd)
monitor.wait_until("MiniKdc Running", timeout_sec=60, backoff_sec=1, err_msg="MiniKdc didn't finish startup")
node.account.copy_from(MiniKdc.KEYTAB_FILE, MiniKdc.LOCAL_KEYTAB_FILE)
node.account.copy_from(MiniKdc.KRB5CONF_FILE, MiniKdc.LOCAL_KRB5CONF_FILE)
# KDC is set to bind openly (via 0.0.0.0). Change krb5.conf to hold the specific KDC address
self.replace_in_file(MiniKdc.LOCAL_KRB5CONF_FILE, '0.0.0.0', node.account.hostname)
def stop_node(self, node):
self.logger.info("Stopping %s on %s" % (type(self).__name__, node.account.hostname))
node.account.kill_java_processes("MiniKdc", clean_shutdown=True, allow_fail=False)
def clean_node(self, node):
node.account.kill_java_processes("MiniKdc", clean_shutdown=False, allow_fail=True)
node.account.ssh("rm -rf " + MiniKdc.WORK_DIR, allow_fail=False)
if os.path.exists(MiniKdc.LOCAL_KEYTAB_FILE):
os.remove(MiniKdc.LOCAL_KEYTAB_FILE)
if os.path.exists(MiniKdc.LOCAL_KRB5CONF_FILE):
os.remove(MiniKdc.LOCAL_KRB5CONF_FILE)
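# Illustrative only: with a hypothetical per-test scratch directory, the helpers above pin the
# shared keytab/krb5.conf locations exactly once per test process; a second call in the same
# process returns the path that was already set, unchanged.
def _example_local_kdc_paths(local_scratch_dir="/tmp/scratch-1234"):
    """Illustrative only; returns ('/tmp/scratch-1234/keytab', '/tmp/scratch-1234/krb5conf')."""
    return (MiniKdc._set_local_keytab_file(local_scratch_dir),
            MiniKdc._set_local_krb5conf_file(local_scratch_dir))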

View File

@@ -0,0 +1,352 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import os
import subprocess
from tempfile import mkdtemp
from shutil import rmtree
from ducktape.template import TemplateRenderer
from kafkatest.services.security.minikdc import MiniKdc
from kafkatest.services.security.listener_security_config import ListenerSecurityConfig
import itertools
class SslStores(object):
def __init__(self, local_scratch_dir, logger=None):
self.logger = logger
self.ca_crt_path = os.path.join(local_scratch_dir, "test.ca.crt")
self.ca_jks_path = os.path.join(local_scratch_dir, "test.ca.jks")
self.ca_passwd = "test-ca-passwd"
self.truststore_path = os.path.join(local_scratch_dir, "test.truststore.jks")
self.truststore_passwd = "test-ts-passwd"
self.keystore_passwd = "test-ks-passwd"
# Zookeeper TLS (as of v3.5.6) does not support a key password different than the keystore password
self.key_passwd = self.keystore_passwd
# Allow up to one hour of clock skew between the host and VMs
self.startdate = "-1H"
for file in [self.ca_crt_path, self.ca_jks_path, self.truststore_path]:
if os.path.exists(file):
os.remove(file)
def generate_ca(self):
"""
Generate CA private key and certificate.
"""
self.runcmd("keytool -genkeypair -alias ca -keyalg RSA -keysize 2048 -keystore %s -storetype JKS -storepass %s -keypass %s -dname CN=SystemTestCA -startdate %s --ext bc=ca:true" % (self.ca_jks_path, self.ca_passwd, self.ca_passwd, self.startdate))
self.runcmd("keytool -export -alias ca -keystore %s -storepass %s -storetype JKS -rfc -file %s" % (self.ca_jks_path, self.ca_passwd, self.ca_crt_path))
def generate_truststore(self):
"""
Generate JKS truststore containing CA certificate.
"""
self.runcmd("keytool -importcert -alias ca -file %s -keystore %s -storepass %s -storetype JKS -noprompt" % (self.ca_crt_path, self.truststore_path, self.truststore_passwd))
def generate_and_copy_keystore(self, node):
"""
Generate JKS keystore with certificate signed by the test CA.
The generated certificate has the node's hostname as a DNS SubjectAlternativeName.
"""
ks_dir = mkdtemp(dir="/tmp")
ks_path = os.path.join(ks_dir, "test.keystore.jks")
csr_path = os.path.join(ks_dir, "test.kafka.csr")
crt_path = os.path.join(ks_dir, "test.kafka.crt")
self.runcmd("keytool -genkeypair -alias kafka -keyalg RSA -keysize 2048 -keystore %s -storepass %s -storetype JKS -keypass %s -dname CN=systemtest -ext SAN=DNS:%s -startdate %s" % (ks_path, self.keystore_passwd, self.key_passwd, self.hostname(node), self.startdate))
self.runcmd("keytool -certreq -keystore %s -storepass %s -storetype JKS -keypass %s -alias kafka -file %s" % (ks_path, self.keystore_passwd, self.key_passwd, csr_path))
self.runcmd("keytool -gencert -keystore %s -storepass %s -storetype JKS -alias ca -infile %s -outfile %s -dname CN=systemtest -ext SAN=DNS:%s -startdate %s" % (self.ca_jks_path, self.ca_passwd, csr_path, crt_path, self.hostname(node), self.startdate))
self.runcmd("keytool -importcert -keystore %s -storepass %s -storetype JKS -alias ca -file %s -noprompt" % (ks_path, self.keystore_passwd, self.ca_crt_path))
self.runcmd("keytool -importcert -keystore %s -storepass %s -storetype JKS -keypass %s -alias kafka -file %s -noprompt" % (ks_path, self.keystore_passwd, self.key_passwd, crt_path))
node.account.copy_to(ks_path, SecurityConfig.KEYSTORE_PATH)
# generate ZooKeeper client TLS config file for encryption-only (no client cert) use case
str = """zookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
zookeeper.ssl.client.enable=true
zookeeper.ssl.truststore.location=%s
zookeeper.ssl.truststore.password=%s
""" % (SecurityConfig.TRUSTSTORE_PATH, self.truststore_passwd)
node.account.create_file(SecurityConfig.ZK_CLIENT_TLS_ENCRYPT_ONLY_CONFIG_PATH, str)
# also generate ZooKeeper client TLS config file for mutual authentication use case
str = """zookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
zookeeper.ssl.client.enable=true
zookeeper.ssl.truststore.location=%s
zookeeper.ssl.truststore.password=%s
zookeeper.ssl.keystore.location=%s
zookeeper.ssl.keystore.password=%s
""" % (SecurityConfig.TRUSTSTORE_PATH, self.truststore_passwd, SecurityConfig.KEYSTORE_PATH, self.keystore_passwd)
node.account.create_file(SecurityConfig.ZK_CLIENT_MUTUAL_AUTH_CONFIG_PATH, str)
rmtree(ks_dir)
def hostname(self, node):
""" Hostname which may be overridden for testing validation failures
"""
return node.account.hostname
def runcmd(self, cmd):
if self.logger:
self.logger.log(logging.DEBUG, cmd)
proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
stdout, stderr = proc.communicate()
if proc.returncode != 0:
raise RuntimeError("Command '%s' returned non-zero exit status %d: %s" % (cmd, proc.returncode, stdout))
class SecurityConfig(TemplateRenderer):
PLAINTEXT = 'PLAINTEXT'
SSL = 'SSL'
SASL_PLAINTEXT = 'SASL_PLAINTEXT'
SASL_SSL = 'SASL_SSL'
SASL_MECHANISM_GSSAPI = 'GSSAPI'
SASL_MECHANISM_PLAIN = 'PLAIN'
SASL_MECHANISM_SCRAM_SHA_256 = 'SCRAM-SHA-256'
SASL_MECHANISM_SCRAM_SHA_512 = 'SCRAM-SHA-512'
SCRAM_CLIENT_USER = "kafka-client"
SCRAM_CLIENT_PASSWORD = "client-secret"
SCRAM_BROKER_USER = "kafka-broker"
SCRAM_BROKER_PASSWORD = "broker-secret"
CONFIG_DIR = "/mnt/security"
KEYSTORE_PATH = "/mnt/security/test.keystore.jks"
TRUSTSTORE_PATH = "/mnt/security/test.truststore.jks"
ZK_CLIENT_TLS_ENCRYPT_ONLY_CONFIG_PATH = "/mnt/security/zk_client_tls_encrypt_only_config.properties"
ZK_CLIENT_MUTUAL_AUTH_CONFIG_PATH = "/mnt/security/zk_client_mutual_auth_config.properties"
JAAS_CONF_PATH = "/mnt/security/jaas.conf"
KRB5CONF_PATH = "/mnt/security/krb5.conf"
KEYTAB_PATH = "/mnt/security/keytab"
# This is initialized only when the first instance of SecurityConfig is created
ssl_stores = None
def __init__(self, context, security_protocol=None, interbroker_security_protocol=None,
client_sasl_mechanism=SASL_MECHANISM_GSSAPI, interbroker_sasl_mechanism=SASL_MECHANISM_GSSAPI,
zk_sasl=False, zk_tls=False, template_props="", static_jaas_conf=True, jaas_override_variables=None,
listener_security_config=ListenerSecurityConfig()):
"""
Initialize the security properties for the node and copy
keystore and truststore to the remote node if the transport protocol
is SSL. If security_protocol is None, the protocol specified in the
template properties file is used. If no protocol is specified in the
template properties either, PLAINTEXT is used as default.
"""
self.context = context
if not SecurityConfig.ssl_stores:
# This generates keystore/truststore files in a local scratch directory which gets
# automatically destroyed after the test is run
# Creating within the scratch directory allows us to run tests in parallel without fear of collision
SecurityConfig.ssl_stores = SslStores(context.local_scratch_dir, context.logger)
SecurityConfig.ssl_stores.generate_ca()
SecurityConfig.ssl_stores.generate_truststore()
if security_protocol is None:
security_protocol = self.get_property('security.protocol', template_props)
if security_protocol is None:
security_protocol = SecurityConfig.PLAINTEXT
elif security_protocol not in [SecurityConfig.PLAINTEXT, SecurityConfig.SSL, SecurityConfig.SASL_PLAINTEXT, SecurityConfig.SASL_SSL]:
raise Exception("Invalid security.protocol in template properties: " + security_protocol)
if interbroker_security_protocol is None:
interbroker_security_protocol = security_protocol
self.interbroker_security_protocol = interbroker_security_protocol
self.has_sasl = self.is_sasl(security_protocol) or self.is_sasl(interbroker_security_protocol) or zk_sasl
self.has_ssl = self.is_ssl(security_protocol) or self.is_ssl(interbroker_security_protocol) or zk_tls
self.zk_sasl = zk_sasl
self.zk_tls = zk_tls
self.static_jaas_conf = static_jaas_conf
self.listener_security_config = listener_security_config
self.properties = {
'security.protocol' : security_protocol,
'ssl.keystore.location' : SecurityConfig.KEYSTORE_PATH,
'ssl.keystore.password' : SecurityConfig.ssl_stores.keystore_passwd,
'ssl.key.password' : SecurityConfig.ssl_stores.key_passwd,
'ssl.truststore.location' : SecurityConfig.TRUSTSTORE_PATH,
'ssl.truststore.password' : SecurityConfig.ssl_stores.truststore_passwd,
'ssl.endpoint.identification.algorithm' : 'HTTPS',
'sasl.mechanism' : client_sasl_mechanism,
'sasl.mechanism.inter.broker.protocol' : interbroker_sasl_mechanism,
'sasl.kerberos.service.name' : 'kafka'
}
self.properties.update(self.listener_security_config.client_listener_overrides)
self.jaas_override_variables = jaas_override_variables or {}
def client_config(self, template_props="", node=None, jaas_override_variables=None):
# If node is not specified, use static jaas config which will be created later.
# Otherwise use static JAAS configuration files with SASL_SSL and the sasl.jaas.config
# property with SASL_PLAINTEXT so that both code paths are exercised by existing tests.
# Note that this is an arbitrary choice and it is possible to run all tests with
# either static or dynamic jaas config files if required.
static_jaas_conf = node is None or (self.has_sasl and self.has_ssl)
return SecurityConfig(self.context, self.security_protocol,
client_sasl_mechanism=self.client_sasl_mechanism,
template_props=template_props,
static_jaas_conf=static_jaas_conf,
jaas_override_variables=jaas_override_variables,
listener_security_config=self.listener_security_config)
def enable_security_protocol(self, security_protocol):
self.has_sasl = self.has_sasl or self.is_sasl(security_protocol)
self.has_ssl = self.has_ssl or self.is_ssl(security_protocol)
def setup_ssl(self, node):
node.account.ssh("mkdir -p %s" % SecurityConfig.CONFIG_DIR, allow_fail=False)
node.account.copy_to(SecurityConfig.ssl_stores.truststore_path, SecurityConfig.TRUSTSTORE_PATH)
SecurityConfig.ssl_stores.generate_and_copy_keystore(node)
def setup_sasl(self, node):
node.account.ssh("mkdir -p %s" % SecurityConfig.CONFIG_DIR, allow_fail=False)
jaas_conf_file = "jaas.conf"
java_version = node.account.ssh_capture("java -version")
jaas_conf = None
if 'sasl.jaas.config' not in self.properties:
jaas_conf = self.render_jaas_config(
jaas_conf_file,
{
'node': node,
'is_ibm_jdk': any('IBM' in line for line in java_version),
'SecurityConfig': SecurityConfig,
'client_sasl_mechanism': self.client_sasl_mechanism,
'enabled_sasl_mechanisms': self.enabled_sasl_mechanisms
}
)
else:
jaas_conf = self.properties['sasl.jaas.config']
if self.static_jaas_conf:
node.account.create_file(SecurityConfig.JAAS_CONF_PATH, jaas_conf)
elif 'sasl.jaas.config' not in self.properties:
self.properties['sasl.jaas.config'] = jaas_conf.replace("\n", " \\\n")
if self.has_sasl_kerberos:
node.account.copy_to(MiniKdc.LOCAL_KEYTAB_FILE, SecurityConfig.KEYTAB_PATH)
node.account.copy_to(MiniKdc.LOCAL_KRB5CONF_FILE, SecurityConfig.KRB5CONF_PATH)
def render_jaas_config(self, jaas_conf_file, config_variables):
"""
Renders the JAAS config file contents
:param jaas_conf_file: name of the JAAS config template file
:param config_variables: dict of variables used in the template
:return: the rendered template string
"""
variables = config_variables.copy()
variables.update(self.jaas_override_variables) # override variables
return self.render(jaas_conf_file, **variables)
def setup_node(self, node):
if self.has_ssl:
self.setup_ssl(node)
if self.has_sasl:
self.setup_sasl(node)
def setup_credentials(self, node, path, zk_connect, broker):
if broker:
self.maybe_create_scram_credentials(node, zk_connect, path, self.interbroker_sasl_mechanism,
SecurityConfig.SCRAM_BROKER_USER, SecurityConfig.SCRAM_BROKER_PASSWORD)
else:
self.maybe_create_scram_credentials(node, zk_connect, path, self.client_sasl_mechanism,
SecurityConfig.SCRAM_CLIENT_USER, SecurityConfig.SCRAM_CLIENT_PASSWORD)
def maybe_create_scram_credentials(self, node, zk_connect, path, mechanism, user_name, password):
if self.has_sasl and self.is_sasl_scram(mechanism):
cmd = "%s --zookeeper %s --entity-name %s --entity-type users --alter --add-config %s=[password=%s]" % \
(path.script("kafka-configs.sh", node), zk_connect,
user_name, mechanism, password)
node.account.ssh(cmd)
def clean_node(self, node):
if self.security_protocol != SecurityConfig.PLAINTEXT:
node.account.ssh("rm -rf %s" % SecurityConfig.CONFIG_DIR, allow_fail=False)
def get_property(self, prop_name, template_props=""):
"""
Get property value from the string representation of
a properties file.
"""
value = None
for line in template_props.split("\n"):
items = line.split("=")
if len(items) == 2 and items[0].strip() == prop_name:
value = str(items[1].strip())
return value
def is_ssl(self, security_protocol):
return security_protocol == SecurityConfig.SSL or security_protocol == SecurityConfig.SASL_SSL
def is_sasl(self, security_protocol):
return security_protocol == SecurityConfig.SASL_PLAINTEXT or security_protocol == SecurityConfig.SASL_SSL
def is_sasl_scram(self, sasl_mechanism):
return sasl_mechanism == SecurityConfig.SASL_MECHANISM_SCRAM_SHA_256 or sasl_mechanism == SecurityConfig.SASL_MECHANISM_SCRAM_SHA_512
@property
def security_protocol(self):
return self.properties['security.protocol']
@property
def client_sasl_mechanism(self):
return self.properties['sasl.mechanism']
@property
def interbroker_sasl_mechanism(self):
return self.properties['sasl.mechanism.inter.broker.protocol']
@property
def enabled_sasl_mechanisms(self):
return set([self.client_sasl_mechanism, self.interbroker_sasl_mechanism])
@property
def has_sasl_kerberos(self):
return self.has_sasl and (SecurityConfig.SASL_MECHANISM_GSSAPI in self.enabled_sasl_mechanisms)
@property
def kafka_opts(self):
if self.has_sasl:
if self.static_jaas_conf:
return "\"-Djava.security.auth.login.config=%s -Djava.security.krb5.conf=%s\"" % (SecurityConfig.JAAS_CONF_PATH, SecurityConfig.KRB5CONF_PATH)
else:
return "\"-Djava.security.krb5.conf=%s\"" % SecurityConfig.KRB5CONF_PATH
else:
return ""
def props(self, prefix=''):
"""
Return properties as a string with line separators, optionally with a prefix.
This is used to append security config properties to
a properties file.
:param prefix: prefix to add to each property
:return: a string containing line-separated properties
"""
if self.security_protocol == SecurityConfig.PLAINTEXT:
return ""
if self.has_sasl and not self.static_jaas_conf and 'sasl.jaas.config' not in self.properties:
raise Exception("JAAS configuration property has not yet been initialized")
config_lines = (prefix + key + "=" + value for key, value in self.properties.iteritems())
# Extra blank lines ensure this can be appended/prepended safely
return "\n".join(itertools.chain([""], config_lines, [""]))
def __str__(self):
"""
Return properties as a string with line separators.
"""
return self.props()
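
To make the property handling above concrete, here is a minimal, self-contained sketch of the same key=value rendering and lookup logic, written with plain dicts and Python 3 idioms; the sample keys and the listener prefix are illustrative values, not taken from a real SecurityConfig instance.

```python
# Hedged sketch only: mirrors the props()/get_property() logic above with a
# plain dict; the keys and the "listener.name.sasl_ssl." prefix are examples.
import itertools

properties = {
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "GSSAPI",
}

def props(prefix=""):
    config_lines = (prefix + key + "=" + value for key, value in properties.items())
    # Surrounding blank lines let the block be appended or prepended safely.
    return "\n".join(itertools.chain([""], config_lines, [""]))

def get_property(prop_name, template_props=""):
    value = None
    for line in template_props.split("\n"):
        items = line.split("=")
        if len(items) == 2 and items[0].strip() == prop_name:
            value = str(items[1].strip())
    return value

rendered = props(prefix="listener.name.sasl_ssl.")
assert get_property("listener.name.sasl_ssl.sasl.mechanism", rendered) == "GSSAPI"
```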

View File

@@ -0,0 +1,108 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE
* file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file
* to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the
* License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
* an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
* specific language governing permissions and limitations under the License.
*/
{% if static_jaas_conf %}
KafkaClient {
{% endif %}
{% if "GSSAPI" in client_sasl_mechanism %}
{% if is_ibm_jdk %}
com.ibm.security.auth.module.Krb5LoginModule required debug=false
credsType=both
useKeytab="file:/mnt/security/keytab"
principal="client@EXAMPLE.COM";
{% else %}
com.sun.security.auth.module.Krb5LoginModule required debug=false
doNotPrompt=true
useKeyTab=true
storeKey=true
keyTab="/mnt/security/keytab"
principal="client@EXAMPLE.COM";
{% endif %}
{% elif client_sasl_mechanism == "PLAIN" %}
org.apache.kafka.common.security.plain.PlainLoginModule required
username="client"
password="client-secret";
{% elif "SCRAM-SHA-256" in client_sasl_mechanism or "SCRAM-SHA-512" in client_sasl_mechanism %}
org.apache.kafka.common.security.scram.ScramLoginModule required
username="{{ SecurityConfig.SCRAM_CLIENT_USER }}"
password="{{ SecurityConfig.SCRAM_CLIENT_PASSWORD }}";
{% endif %}
{% if static_jaas_conf %}
};
KafkaServer {
{% if "GSSAPI" in enabled_sasl_mechanisms %}
{% if is_ibm_jdk %}
com.ibm.security.auth.module.Krb5LoginModule required debug=false
credsType=both
useKeytab="file:/mnt/security/keytab"
principal="kafka/{{ node.account.hostname }}@EXAMPLE.COM";
{% else %}
com.sun.security.auth.module.Krb5LoginModule required debug=false
doNotPrompt=true
useKeyTab=true
storeKey=true
keyTab="/mnt/security/keytab"
principal="kafka/{{ node.account.hostname }}@EXAMPLE.COM";
{% endif %}
{% endif %}
{% if "PLAIN" in enabled_sasl_mechanisms %}
org.apache.kafka.common.security.plain.PlainLoginModule required
username="kafka"
password="kafka-secret"
user_client="client-secret"
user_kafka="kafka-secret";
{% endif %}
{% if "SCRAM-SHA-256" in client_sasl_mechanism or "SCRAM-SHA-512" in client_sasl_mechanism %}
org.apache.kafka.common.security.scram.ScramLoginModule required
username="{{ SecurityConfig.SCRAM_BROKER_USER }}"
password="{{ SecurityConfig.SCRAM_BROKER_PASSWORD }}";
{% endif %}
};
{% if zk_sasl %}
Client {
{% if is_ibm_jdk %}
com.ibm.security.auth.module.Krb5LoginModule required debug=false
credsType=both
useKeytab="file:/mnt/security/keytab"
principal="zkclient@EXAMPLE.COM";
{% else %}
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="/mnt/security/keytab"
storeKey=true
useTicketCache=false
principal="zkclient@EXAMPLE.COM";
{% endif %}
};
Server {
{% if is_ibm_jdk %}
com.ibm.security.auth.module.Krb5LoginModule required debug=false
credsType=both
useKeyTab="file:/mnt/security/keytab"
principal="zookeeper/{{ node.account.hostname }}@EXAMPLE.COM";
{% else %}
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="/mnt/security/keytab"
storeKey=true
useTicketCache=false
principal="zookeeper/{{ node.account.hostname }}@EXAMPLE.COM";
{% endif %}
};
{% endif %}
{% endif %}
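
As a rough illustration of how render_jaas_config() fills in a template like the one above, the snippet below renders a cut-down SCRAM client entry with jinja2 directly; the template string and the mechanism/user/password values are simplified placeholders, not the production template or real credentials.

```python
# Hedged sketch: render a trimmed-down JAAS client entry with jinja2 directly.
# The real code goes through self.render() and the full template above; the
# values passed here are placeholders.
from jinja2 import Template

template = Template("""
{% if "SCRAM-SHA-256" in client_sasl_mechanism %}
org.apache.kafka.common.security.scram.ScramLoginModule required
    username="{{ scram_user }}"
    password="{{ scram_password }}";
{% endif %}
""")

print(template.render(client_sasl_mechanism="SCRAM-SHA-256",
                      scram_user="kafka-client",
                      scram_password="client-secret"))
```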

View File

@@ -0,0 +1,17 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
kdc.bind.address=0.0.0.0

View File

@@ -0,0 +1,640 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os.path
import signal
import streams_property
import consumer_property
from ducktape.services.service import Service
from ducktape.utils.util import wait_until
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
from kafkatest.services.kafka import KafkaConfig
from kafkatest.services.monitor.jmx import JmxMixin
from kafkatest.version import LATEST_0_10_0, LATEST_0_10_1
STATE_DIR = "state.dir"
class StreamsTestBaseService(KafkaPathResolverMixin, JmxMixin, Service):
"""Base class for Streams Test services providing some common settings and functionality"""
PERSISTENT_ROOT = "/mnt/streams"
# The log file contains normal log4j logs written using a file appender. stdout and stderr are handled separately
CONFIG_FILE = os.path.join(PERSISTENT_ROOT, "streams.properties")
LOG_FILE = os.path.join(PERSISTENT_ROOT, "streams.log")
STDOUT_FILE = os.path.join(PERSISTENT_ROOT, "streams.stdout")
STDERR_FILE = os.path.join(PERSISTENT_ROOT, "streams.stderr")
JMX_LOG_FILE = os.path.join(PERSISTENT_ROOT, "jmx_tool.log")
JMX_ERR_FILE = os.path.join(PERSISTENT_ROOT, "jmx_tool.err.log")
LOG4J_CONFIG_FILE = os.path.join(PERSISTENT_ROOT, "tools-log4j.properties")
PID_FILE = os.path.join(PERSISTENT_ROOT, "streams.pid")
CLEAN_NODE_ENABLED = True
logs = {
"streams_log": {
"path": LOG_FILE,
"collect_default": True},
"streams_stdout": {
"path": STDOUT_FILE,
"collect_default": True},
"streams_stderr": {
"path": STDERR_FILE,
"collect_default": True},
"streams_log.1": {
"path": LOG_FILE + ".1",
"collect_default": True},
"streams_stdout.1": {
"path": STDOUT_FILE + ".1",
"collect_default": True},
"streams_stderr.1": {
"path": STDERR_FILE + ".1",
"collect_default": True},
"streams_log.2": {
"path": LOG_FILE + ".2",
"collect_default": True},
"streams_stdout.2": {
"path": STDOUT_FILE + ".2",
"collect_default": True},
"streams_stderr.2": {
"path": STDERR_FILE + ".2",
"collect_default": True},
"streams_log.3": {
"path": LOG_FILE + ".3",
"collect_default": True},
"streams_stdout.3": {
"path": STDOUT_FILE + ".3",
"collect_default": True},
"streams_stderr.3": {
"path": STDERR_FILE + ".3",
"collect_default": True},
"streams_log.0-1": {
"path": LOG_FILE + ".0-1",
"collect_default": True},
"streams_stdout.0-1": {
"path": STDOUT_FILE + ".0-1",
"collect_default": True},
"streams_stderr.0-1": {
"path": STDERR_FILE + ".0-1",
"collect_default": True},
"streams_log.0-2": {
"path": LOG_FILE + ".0-2",
"collect_default": True},
"streams_stdout.0-2": {
"path": STDOUT_FILE + ".0-2",
"collect_default": True},
"streams_stderr.0-2": {
"path": STDERR_FILE + ".0-2",
"collect_default": True},
"streams_log.0-3": {
"path": LOG_FILE + ".0-3",
"collect_default": True},
"streams_stdout.0-3": {
"path": STDOUT_FILE + ".0-3",
"collect_default": True},
"streams_stderr.0-3": {
"path": STDERR_FILE + ".0-3",
"collect_default": True},
"streams_log.0-4": {
"path": LOG_FILE + ".0-4",
"collect_default": True},
"streams_stdout.0-4": {
"path": STDOUT_FILE + ".0-4",
"collect_default": True},
"streams_stderr.0-4": {
"path": STDERR_FILE + ".0-4",
"collect_default": True},
"streams_log.0-5": {
"path": LOG_FILE + ".0-5",
"collect_default": True},
"streams_stdout.0-5": {
"path": STDOUT_FILE + ".0-5",
"collect_default": True},
"streams_stderr.0-5": {
"path": STDERR_FILE + ".0-5",
"collect_default": True},
"streams_log.0-6": {
"path": LOG_FILE + ".0-6",
"collect_default": True},
"streams_stdout.0-6": {
"path": STDOUT_FILE + ".0-6",
"collect_default": True},
"streams_stderr.0-6": {
"path": STDERR_FILE + ".0-6",
"collect_default": True},
"streams_log.1-1": {
"path": LOG_FILE + ".1-1",
"collect_default": True},
"streams_stdout.1-1": {
"path": STDOUT_FILE + ".1-1",
"collect_default": True},
"streams_stderr.1-1": {
"path": STDERR_FILE + ".1-1",
"collect_default": True},
"streams_log.1-2": {
"path": LOG_FILE + ".1-2",
"collect_default": True},
"streams_stdout.1-2": {
"path": STDOUT_FILE + ".1-2",
"collect_default": True},
"streams_stderr.1-2": {
"path": STDERR_FILE + ".1-2",
"collect_default": True},
"streams_log.1-3": {
"path": LOG_FILE + ".1-3",
"collect_default": True},
"streams_stdout.1-3": {
"path": STDOUT_FILE + ".1-3",
"collect_default": True},
"streams_stderr.1-3": {
"path": STDERR_FILE + ".1-3",
"collect_default": True},
"streams_log.1-4": {
"path": LOG_FILE + ".1-4",
"collect_default": True},
"streams_stdout.1-4": {
"path": STDOUT_FILE + ".1-4",
"collect_default": True},
"streams_stderr.1-4": {
"path": STDERR_FILE + ".1-4",
"collect_default": True},
"streams_log.1-5": {
"path": LOG_FILE + ".1-5",
"collect_default": True},
"streams_stdout.1-5": {
"path": STDOUT_FILE + ".1-5",
"collect_default": True},
"streams_stderr.1-5": {
"path": STDERR_FILE + ".1-5",
"collect_default": True},
"streams_log.1-6": {
"path": LOG_FILE + ".1-6",
"collect_default": True},
"streams_stdout.1-6": {
"path": STDOUT_FILE + ".1-6",
"collect_default": True},
"streams_stderr.1-6": {
"path": STDERR_FILE + ".1-6",
"collect_default": True},
"jmx_log": {
"path": JMX_LOG_FILE,
"collect_default": True},
"jmx_err": {
"path": JMX_ERR_FILE,
"collect_default": True},
}
def __init__(self, test_context, kafka, streams_class_name, user_test_args1, user_test_args2=None, user_test_args3=None, user_test_args4=None):
Service.__init__(self, test_context, num_nodes=1)
self.kafka = kafka
self.args = {'streams_class_name': streams_class_name,
'user_test_args1': user_test_args1,
'user_test_args2': user_test_args2,
'user_test_args3': user_test_args3,
'user_test_args4': user_test_args4}
self.log_level = "DEBUG"
@property
def node(self):
return self.nodes[0]
def pids(self, node):
try:
return [pid for pid in node.account.ssh_capture("cat " + self.PID_FILE, callback=int)]
except:
return []
def stop_nodes(self, clean_shutdown=True):
for node in self.nodes:
self.stop_node(node, clean_shutdown)
def stop_node(self, node, clean_shutdown=True):
self.logger.info((clean_shutdown and "Cleanly" or "Forcibly") + " stopping Streams Test on " + str(node.account))
pids = self.pids(node)
sig = signal.SIGTERM if clean_shutdown else signal.SIGKILL
for pid in pids:
node.account.signal(pid, sig, allow_fail=True)
if clean_shutdown:
for pid in pids:
wait_until(lambda: not node.account.alive(pid), timeout_sec=120, err_msg="Streams Test process on " + str(node.account) + " took too long to exit")
node.account.ssh("rm -f " + self.PID_FILE, allow_fail=False)
def restart(self):
# We don't want to do any clean up here, just restart the process.
for node in self.nodes:
self.logger.info("Restarting Kafka Streams on " + str(node.account))
self.stop_node(node)
self.start_node(node)
def abortThenRestart(self):
# We don't want to do any clean up here, just abort then restart the process. The running service is killed immediately.
for node in self.nodes:
self.logger.info("Aborting Kafka Streams on " + str(node.account))
self.stop_node(node, False)
self.logger.info("Restarting Kafka Streams on " + str(node.account))
self.start_node(node)
def wait(self, timeout_sec=1440):
for node in self.nodes:
self.wait_node(node, timeout_sec)
def wait_node(self, node, timeout_sec=None):
for pid in self.pids(node):
wait_until(lambda: not node.account.alive(pid), timeout_sec=timeout_sec, err_msg="Streams Test process on " + str(node.account) + " took too long to exit")
def clean_node(self, node):
node.account.kill_process("streams", clean_shutdown=False, allow_fail=True)
if self.CLEAN_NODE_ENABLED:
node.account.ssh("rm -rf " + self.PERSISTENT_ROOT, allow_fail=False)
def start_cmd(self, node):
args = self.args.copy()
args['config_file'] = self.CONFIG_FILE
args['stdout'] = self.STDOUT_FILE
args['stderr'] = self.STDERR_FILE
args['pidfile'] = self.PID_FILE
args['log4j'] = self.LOG4J_CONFIG_FILE
args['kafka_run_class'] = self.path.script("kafka-run-class.sh", node)
cmd = "( export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%(log4j)s\"; " \
"INCLUDE_TEST_JARS=true %(kafka_run_class)s %(streams_class_name)s " \
" %(config_file)s %(user_test_args1)s %(user_test_args2)s %(user_test_args3)s" \
" %(user_test_args4)s & echo $! >&3 ) 1>> %(stdout)s 2>> %(stderr)s 3> %(pidfile)s" % args
self.logger.info("Executing streams cmd: " + cmd)
return cmd
def prop_file(self):
cfg = KafkaConfig(**{streams_property.STATE_DIR: self.PERSISTENT_ROOT, streams_property.KAFKA_SERVERS: self.kafka.bootstrap_servers()})
return cfg.render()
def start_node(self, node):
node.account.mkdirs(self.PERSISTENT_ROOT)
prop_file = self.prop_file()
node.account.create_file(self.CONFIG_FILE, prop_file)
node.account.create_file(self.LOG4J_CONFIG_FILE, self.render('tools_log4j.properties', log_file=self.LOG_FILE))
self.logger.info("Starting StreamsTest process on " + str(node.account))
with node.account.monitor_log(self.STDOUT_FILE) as monitor:
node.account.ssh(self.start_cmd(node))
monitor.wait_until('StreamsTest instance started', timeout_sec=60, err_msg="Never saw message indicating StreamsTest finished startup on " + str(node.account))
if len(self.pids(node)) == 0:
raise RuntimeError("No process ids recorded")
class StreamsSmokeTestBaseService(StreamsTestBaseService):
"""Base class for Streams Smoke Test services providing some common settings and functionality"""
def __init__(self, test_context, kafka, command, num_threads = 3):
super(StreamsSmokeTestBaseService, self).__init__(test_context,
kafka,
"org.apache.kafka.streams.tests.StreamsSmokeTest",
command)
self.NUM_THREADS = num_threads
def prop_file(self):
properties = {streams_property.STATE_DIR: self.PERSISTENT_ROOT,
streams_property.KAFKA_SERVERS: self.kafka.bootstrap_servers(),
streams_property.NUM_THREADS: self.NUM_THREADS}
cfg = KafkaConfig(**properties)
return cfg.render()
class StreamsEosTestBaseService(StreamsTestBaseService):
"""Base class for Streams EOS Test services providing some common settings and functionality"""
clean_node_enabled = True
def __init__(self, test_context, kafka, command):
super(StreamsEosTestBaseService, self).__init__(test_context,
kafka,
"org.apache.kafka.streams.tests.StreamsEosTest",
command)
def clean_node(self, node):
if self.clean_node_enabled:
super(StreamsEosTestBaseService, self).clean_node(node)
class StreamsSmokeTestDriverService(StreamsSmokeTestBaseService):
def __init__(self, test_context, kafka):
super(StreamsSmokeTestDriverService, self).__init__(test_context, kafka, "run")
self.DISABLE_AUTO_TERMINATE = ""
def disable_auto_terminate(self):
self.DISABLE_AUTO_TERMINATE = "disableAutoTerminate"
def start_cmd(self, node):
args = self.args.copy()
args['config_file'] = self.CONFIG_FILE
args['stdout'] = self.STDOUT_FILE
args['stderr'] = self.STDERR_FILE
args['pidfile'] = self.PID_FILE
args['log4j'] = self.LOG4J_CONFIG_FILE
args['disable_auto_terminate'] = self.DISABLE_AUTO_TERMINATE
args['kafka_run_class'] = self.path.script("kafka-run-class.sh", node)
cmd = "( export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%(log4j)s\"; " \
"INCLUDE_TEST_JARS=true %(kafka_run_class)s %(streams_class_name)s " \
" %(config_file)s %(user_test_args1)s %(disable_auto_terminate)s" \
" & echo $! >&3 ) 1>> %(stdout)s 2>> %(stderr)s 3> %(pidfile)s" % args
return cmd
class StreamsSmokeTestJobRunnerService(StreamsSmokeTestBaseService):
def __init__(self, test_context, kafka, num_threads = 3):
super(StreamsSmokeTestJobRunnerService, self).__init__(test_context, kafka, "process", num_threads)
class StreamsSmokeTestEOSJobRunnerService(StreamsSmokeTestBaseService):
def __init__(self, test_context, kafka):
super(StreamsSmokeTestEOSJobRunnerService, self).__init__(test_context, kafka, "process-eos")
class StreamsEosTestDriverService(StreamsEosTestBaseService):
def __init__(self, test_context, kafka):
super(StreamsEosTestDriverService, self).__init__(test_context, kafka, "run")
class StreamsEosTestJobRunnerService(StreamsEosTestBaseService):
def __init__(self, test_context, kafka):
super(StreamsEosTestJobRunnerService, self).__init__(test_context, kafka, "process")
class StreamsComplexEosTestJobRunnerService(StreamsEosTestBaseService):
def __init__(self, test_context, kafka):
super(StreamsComplexEosTestJobRunnerService, self).__init__(test_context, kafka, "process-complex")
class StreamsEosTestVerifyRunnerService(StreamsEosTestBaseService):
def __init__(self, test_context, kafka):
super(StreamsEosTestVerifyRunnerService, self).__init__(test_context, kafka, "verify")
class StreamsComplexEosTestVerifyRunnerService(StreamsEosTestBaseService):
def __init__(self, test_context, kafka):
super(StreamsComplexEosTestVerifyRunnerService, self).__init__(test_context, kafka, "verify-complex")
class StreamsSmokeTestShutdownDeadlockService(StreamsSmokeTestBaseService):
def __init__(self, test_context, kafka):
super(StreamsSmokeTestShutdownDeadlockService, self).__init__(test_context, kafka, "close-deadlock-test")
class StreamsBrokerCompatibilityService(StreamsTestBaseService):
def __init__(self, test_context, kafka, eosEnabled):
super(StreamsBrokerCompatibilityService, self).__init__(test_context,
kafka,
"org.apache.kafka.streams.tests.BrokerCompatibilityTest",
eosEnabled)
class StreamsBrokerDownResilienceService(StreamsTestBaseService):
def __init__(self, test_context, kafka, configs):
super(StreamsBrokerDownResilienceService, self).__init__(test_context,
kafka,
"org.apache.kafka.streams.tests.StreamsBrokerDownResilienceTest",
configs)
def start_cmd(self, node):
args = self.args.copy()
args['config_file'] = self.CONFIG_FILE
args['stdout'] = self.STDOUT_FILE
args['stderr'] = self.STDERR_FILE
args['pidfile'] = self.PID_FILE
args['log4j'] = self.LOG4J_CONFIG_FILE
args['kafka_run_class'] = self.path.script("kafka-run-class.sh", node)
cmd = "( export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%(log4j)s\"; " \
"INCLUDE_TEST_JARS=true %(kafka_run_class)s %(streams_class_name)s " \
" %(config_file)s %(user_test_args1)s %(user_test_args2)s %(user_test_args3)s" \
" %(user_test_args4)s & echo $! >&3 ) 1>> %(stdout)s 2>> %(stderr)s 3> %(pidfile)s" % args
self.logger.info("Executing: " + cmd)
return cmd
class StreamsStandbyTaskService(StreamsTestBaseService):
def __init__(self, test_context, kafka, configs):
super(StreamsStandbyTaskService, self).__init__(test_context,
kafka,
"org.apache.kafka.streams.tests.StreamsStandByReplicaTest",
configs)
class StreamsOptimizedUpgradeTestService(StreamsTestBaseService):
def __init__(self, test_context, kafka):
super(StreamsOptimizedUpgradeTestService, self).__init__(test_context,
kafka,
"org.apache.kafka.streams.tests.StreamsOptimizedTest",
"")
self.OPTIMIZED_CONFIG = 'none'
self.INPUT_TOPIC = None
self.AGGREGATION_TOPIC = None
self.REDUCE_TOPIC = None
self.JOIN_TOPIC = None
def prop_file(self):
properties = {streams_property.STATE_DIR: self.PERSISTENT_ROOT,
streams_property.KAFKA_SERVERS: self.kafka.bootstrap_servers()}
properties['topology.optimization'] = self.OPTIMIZED_CONFIG
properties['input.topic'] = self.INPUT_TOPIC
properties['aggregation.topic'] = self.AGGREGATION_TOPIC
properties['reduce.topic'] = self.REDUCE_TOPIC
properties['join.topic'] = self.JOIN_TOPIC
cfg = KafkaConfig(**properties)
return cfg.render()
class StreamsUpgradeTestJobRunnerService(StreamsTestBaseService):
def __init__(self, test_context, kafka):
super(StreamsUpgradeTestJobRunnerService, self).__init__(test_context,
kafka,
"org.apache.kafka.streams.tests.StreamsUpgradeTest",
"")
self.UPGRADE_FROM = None
self.UPGRADE_TO = None
def set_version(self, kafka_streams_version):
self.KAFKA_STREAMS_VERSION = kafka_streams_version
def set_upgrade_from(self, upgrade_from):
self.UPGRADE_FROM = upgrade_from
def set_upgrade_to(self, upgrade_to):
self.UPGRADE_TO = upgrade_to
def prop_file(self):
properties = {streams_property.STATE_DIR: self.PERSISTENT_ROOT,
streams_property.KAFKA_SERVERS: self.kafka.bootstrap_servers()}
if self.UPGRADE_FROM is not None:
properties['upgrade.from'] = self.UPGRADE_FROM
if self.UPGRADE_TO == "future_version":
properties['test.future.metadata'] = "any_value"
cfg = KafkaConfig(**properties)
return cfg.render()
def start_cmd(self, node):
args = self.args.copy()
if self.KAFKA_STREAMS_VERSION == str(LATEST_0_10_0) or self.KAFKA_STREAMS_VERSION == str(LATEST_0_10_1):
args['zk'] = self.kafka.zk.connect_setting()
else:
args['zk'] = ""
args['config_file'] = self.CONFIG_FILE
args['stdout'] = self.STDOUT_FILE
args['stderr'] = self.STDERR_FILE
args['pidfile'] = self.PID_FILE
args['log4j'] = self.LOG4J_CONFIG_FILE
args['version'] = self.KAFKA_STREAMS_VERSION
args['kafka_run_class'] = self.path.script("kafka-run-class.sh", node)
cmd = "( export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%(log4j)s\"; " \
"INCLUDE_TEST_JARS=true UPGRADE_KAFKA_STREAMS_TEST_VERSION=%(version)s " \
" %(kafka_run_class)s %(streams_class_name)s %(zk)s %(config_file)s " \
" & echo $! >&3 ) 1>> %(stdout)s 2>> %(stderr)s 3> %(pidfile)s" % args
self.logger.info("Executing: " + cmd)
return cmd
class StreamsNamedRepartitionTopicService(StreamsTestBaseService):
def __init__(self, test_context, kafka):
super(StreamsNamedRepartitionTopicService, self).__init__(test_context,
kafka,
"org.apache.kafka.streams.tests.StreamsNamedRepartitionTest",
"")
self.ADD_ADDITIONAL_OPS = 'false'
self.INPUT_TOPIC = None
self.AGGREGATION_TOPIC = None
def prop_file(self):
properties = {streams_property.STATE_DIR: self.PERSISTENT_ROOT,
streams_property.KAFKA_SERVERS: self.kafka.bootstrap_servers()}
properties['input.topic'] = self.INPUT_TOPIC
properties['aggregation.topic'] = self.AGGREGATION_TOPIC
properties['add.operations'] = self.ADD_ADDITIONAL_OPS
cfg = KafkaConfig(**properties)
return cfg.render()
class StaticMemberTestService(StreamsTestBaseService):
def __init__(self, test_context, kafka, group_instance_id, num_threads):
super(StaticMemberTestService, self).__init__(test_context,
kafka,
"org.apache.kafka.streams.tests.StaticMemberTestClient",
"")
self.INPUT_TOPIC = None
self.GROUP_INSTANCE_ID = group_instance_id
self.NUM_THREADS = num_threads
def prop_file(self):
properties = {streams_property.STATE_DIR: self.PERSISTENT_ROOT,
streams_property.KAFKA_SERVERS: self.kafka.bootstrap_servers(),
streams_property.NUM_THREADS: self.NUM_THREADS,
consumer_property.GROUP_INSTANCE_ID: self.GROUP_INSTANCE_ID,
consumer_property.SESSION_TIMEOUT_MS: 60000}
properties['input.topic'] = self.INPUT_TOPIC
cfg = KafkaConfig(**properties)
return cfg.render()
class CooperativeRebalanceUpgradeService(StreamsTestBaseService):
def __init__(self, test_context, kafka):
super(CooperativeRebalanceUpgradeService, self).__init__(test_context,
kafka,
"org.apache.kafka.streams.tests.StreamsUpgradeToCooperativeRebalanceTest",
"")
self.UPGRADE_FROM = None
# these properties will be overridden in test
self.SOURCE_TOPIC = None
self.SINK_TOPIC = None
self.TASK_DELIMITER = "#"
self.REPORT_INTERVAL = None
self.standby_tasks = None
self.active_tasks = None
self.upgrade_phase = None
def set_tasks(self, task_string):
label = "TASK-ASSIGNMENTS:"
task_string_substr = task_string[len(label):]
all_tasks = task_string_substr.split(self.TASK_DELIMITER)
self.active_tasks = set(all_tasks[0].split(","))
if len(all_tasks) > 1:
self.standby_tasks = set(all_tasks[1].split(","))
def set_version(self, kafka_streams_version):
self.KAFKA_STREAMS_VERSION = kafka_streams_version
def set_upgrade_phase(self, upgrade_phase):
self.upgrade_phase = upgrade_phase
def start_cmd(self, node):
args = self.args.copy()
if self.KAFKA_STREAMS_VERSION == str(LATEST_0_10_0) or self.KAFKA_STREAMS_VERSION == str(LATEST_0_10_1):
args['zk'] = self.kafka.zk.connect_setting()
else:
args['zk'] = ""
args['config_file'] = self.CONFIG_FILE
args['stdout'] = self.STDOUT_FILE
args['stderr'] = self.STDERR_FILE
args['pidfile'] = self.PID_FILE
args['log4j'] = self.LOG4J_CONFIG_FILE
args['version'] = self.KAFKA_STREAMS_VERSION
args['kafka_run_class'] = self.path.script("kafka-run-class.sh", node)
cmd = "( export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%(log4j)s\"; " \
"INCLUDE_TEST_JARS=true UPGRADE_KAFKA_STREAMS_TEST_VERSION=%(version)s " \
" %(kafka_run_class)s %(streams_class_name)s %(zk)s %(config_file)s " \
" & echo $! >&3 ) 1>> %(stdout)s 2>> %(stderr)s 3> %(pidfile)s" % args
self.logger.info("Executing: " + cmd)
return cmd
def prop_file(self):
properties = {streams_property.STATE_DIR: self.PERSISTENT_ROOT,
streams_property.KAFKA_SERVERS: self.kafka.bootstrap_servers()}
if self.UPGRADE_FROM is not None:
properties['upgrade.from'] = self.UPGRADE_FROM
else:
try:
del properties['upgrade.from']
except KeyError:
self.logger.info("Key 'upgrade.from' not there, better safe than sorry")
if self.upgrade_phase is not None:
properties['upgrade.phase'] = self.upgrade_phase
properties['source.topic'] = self.SOURCE_TOPIC
properties['sink.topic'] = self.SINK_TOPIC
properties['task.delimiter'] = self.TASK_DELIMITER
properties['report.interval'] = self.REPORT_INTERVAL
cfg = KafkaConfig(**properties)
return cfg.render()
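
For orientation, here is a hedged sketch of how the smoke-test services defined above are typically composed inside a ducktape test method; the test_context/kafka fixtures and the ordering of the wait/stop calls are assumptions for illustration, not code lifted from an actual test in this commit.

```python
# Hedged usage sketch: wire a smoke-test driver and a job runner together.
# 'test_context' and 'kafka' are assumed to come from the ducktape test class
# and an already-started KafkaService respectively.
from kafkatest.services.streams import (StreamsSmokeTestDriverService,
                                        StreamsSmokeTestJobRunnerService)

def run_smoke(test_context, kafka):
    driver = StreamsSmokeTestDriverService(test_context, kafka)
    processor = StreamsSmokeTestJobRunnerService(test_context, kafka, num_threads=3)

    driver.start()      # produces the smoke-test input data
    processor.start()   # runs the Streams topology under test

    driver.wait()       # wait until the driver finishes producing and verifying
    processor.stop()    # then shut the processor down cleanly
```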

View File

@@ -0,0 +1,22 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Define Streams configuration property names here.
"""
STATE_DIR = "state.dir"
KAFKA_SERVERS = "bootstrap.servers"
NUM_THREADS = "num.stream.threads"

View File

@@ -0,0 +1,29 @@
##
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##
# Define the root logger with appender file
log4j.rootLogger = {{ log_level|default("INFO") }}, FILE
log4j.appender.FILE=org.apache.log4j.FileAppender
log4j.appender.FILE.File={{ log_file }}
log4j.appender.FILE.ImmediateFlush=true
log4j.appender.FILE.Append=true
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.conversionPattern=[%d] %p %m (%c)%n
log4j.logger.org.apache.zookeeper=ERROR
log4j.logger.org.reflections=ERROR

View File

@@ -0,0 +1,24 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
group.id={{ group_id|default('test-consumer-group') }}
{% if client_id is defined and client_id is not none %}
client.id={{ client_id }}
{% endif %}
{% if consumer_metadata_max_age_ms is defined and consumer_metadata_max_age_ms is not none %}
metadata.max.age.ms={{ consumer_metadata_max_age_ms }}
{% endif %}

View File

@@ -0,0 +1,27 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see kafka.consumer.ConsumerConfig for more details
bootstrap.servers={{ source.bootstrap_servers(security_config.security_protocol) }}
{% if source_auto_offset_reset is defined and source_auto_offset_reset is not none %}
auto.offset.reset={{ source_auto_offset_reset|default('latest') }}
{% endif %}
group.id={{ group_id|default('test-consumer-group') }}
{% if partition_assignment_strategy is defined and partition_assignment_strategy is not none %}
partition.assignment.strategy={{ partition_assignment_strategy }}
{% endif %}

View File

@@ -0,0 +1,20 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
bootstrap.servers = {{ target.bootstrap_servers(security_config.security_protocol) }}
{% if producer_interceptor_classes is defined and producer_interceptor_classes is not none %}
interceptor.classes={{ producer_interceptor_classes }}
{% endif %}

View File

@@ -0,0 +1,17 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see kafka.producer.ProducerConfig for more details
request.timeout.ms={{ request_timeout_ms }}

View File

@@ -0,0 +1,31 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Define the root logger with appender file
log4j.rootLogger = {{ log_level|default("INFO") }}, FILE
{% if loggers is defined %}
{% for logger, log_level in loggers.iteritems() %}
log4j.logger.{{ logger }}={{ log_level }}
{% endfor %}
{% endif %}
log4j.appender.FILE=org.apache.log4j.FileAppender
log4j.appender.FILE.File={{ log_file }}
log4j.appender.FILE.ImmediateFlush=true
# Append to the log file instead of overwriting it
log4j.appender.FILE.Append=true
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.conversionPattern=[%d] %p %m (%c)%n

View File

@@ -0,0 +1,40 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
dataDir=/mnt/zookeeper/data
{% if zk_client_port %}
clientPort=2181
{% endif %}
{% if zk_client_secure_port %}
secureClientPort=2182
serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
authProvider.x509=org.apache.zookeeper.server.auth.X509AuthenticationProvider
ssl.keyStore.location=/mnt/security/test.keystore.jks
ssl.keyStore.password=test-ks-passwd
ssl.keyStore.type=JKS
ssl.trustStore.location=/mnt/security/test.truststore.jks
ssl.trustStore.password=test-ts-passwd
ssl.trustStore.type=JKS
{% if zk_tls_encrypt_only %}
ssl.clientAuth=none
{% endif %}
{% endif %}
maxClientCnxns=0
initLimit=5
syncLimit=2
quorumListenOnAllIPs=true
{% for node in nodes %}
server.{{ loop.index }}={{ node.account.hostname }}:2888:3888
{% endfor %}

View File

@@ -0,0 +1,203 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import json
import signal
from ducktape.utils.util import wait_until
from ducktape.services.background_thread import BackgroundThreadService
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
from ducktape.cluster.remoteaccount import RemoteCommandError
class TransactionalMessageCopier(KafkaPathResolverMixin, BackgroundThreadService):
"""This service wraps org.apache.kafka.tools.TransactionalMessageCopier for
use in system testing.
"""
PERSISTENT_ROOT = "/mnt/transactional_message_copier"
STDOUT_CAPTURE = os.path.join(PERSISTENT_ROOT, "transactional_message_copier.stdout")
STDERR_CAPTURE = os.path.join(PERSISTENT_ROOT, "transactional_message_copier.stderr")
LOG_DIR = os.path.join(PERSISTENT_ROOT, "logs")
LOG_FILE = os.path.join(LOG_DIR, "transactional_message_copier.log")
LOG4J_CONFIG = os.path.join(PERSISTENT_ROOT, "tools-log4j.properties")
logs = {
"transactional_message_copier_stdout": {
"path": STDOUT_CAPTURE,
"collect_default": True},
"transactional_message_copier_stderr": {
"path": STDERR_CAPTURE,
"collect_default": True},
"transactional_message_copier_log": {
"path": LOG_FILE,
"collect_default": True}
}
def __init__(self, context, num_nodes, kafka, transactional_id, consumer_group,
input_topic, input_partition, output_topic, max_messages=-1,
transaction_size=1000, transaction_timeout=None, enable_random_aborts=True, use_group_metadata=False, group_mode=False):
super(TransactionalMessageCopier, self).__init__(context, num_nodes)
self.kafka = kafka
self.transactional_id = transactional_id
self.consumer_group = consumer_group
self.transaction_size = transaction_size
self.transaction_timeout = transaction_timeout
self.input_topic = input_topic
self.input_partition = input_partition
self.output_topic = output_topic
self.max_messages = max_messages
self.message_copy_finished = False
self.consumed = -1
self.remaining = -1
self.stop_timeout_sec = 60
self.enable_random_aborts = enable_random_aborts
self.use_group_metadata = use_group_metadata
self.group_mode = group_mode
self.loggers = {
"org.apache.kafka.clients.producer": "TRACE",
"org.apache.kafka.clients.consumer": "TRACE"
}
def _worker(self, idx, node):
node.account.ssh("mkdir -p %s" % TransactionalMessageCopier.PERSISTENT_ROOT,
allow_fail=False)
# Create and upload log properties
log_config = self.render('tools_log4j.properties',
log_file=TransactionalMessageCopier.LOG_FILE)
node.account.create_file(TransactionalMessageCopier.LOG4J_CONFIG, log_config)
# Configure security
self.security_config = self.kafka.security_config.client_config(node=node)
self.security_config.setup_node(node)
cmd = self.start_cmd(node, idx)
self.logger.debug("TransactionalMessageCopier %d command: %s" % (idx, cmd))
try:
for line in node.account.ssh_capture(cmd):
line = line.strip()
data = self.try_parse_json(line)
if data is not None:
with self.lock:
self.remaining = int(data["remaining"])
self.consumed = int(data["consumed"])
self.logger.info("%s: consumed %d, remaining %d" %
(self.transactional_id, self.consumed, self.remaining))
if "shutdown_complete" in data:
if self.remaining == 0:
# We are only finished if the number of remaining
# messages at the time of shutdown is 0.
#
# Otherwise a clean shutdown would still print
# a 'shutdown complete' message even though
# there are unprocessed messages, causing
# tests to fail.
self.logger.info("%s : Finished message copy" % self.transactional_id)
self.message_copy_finished = True
else:
self.logger.info("%s : Shut down without finishing message copy." %\
self.transactional_id)
except RemoteCommandError as e:
self.logger.debug("Got exception while reading output from copier, \
probably because it was SIGKILL'd (exit code 137): %s" % str(e))
def start_cmd(self, node, idx):
cmd = "export LOG_DIR=%s;" % TransactionalMessageCopier.LOG_DIR
cmd += " export KAFKA_OPTS=%s;" % self.security_config.kafka_opts
cmd += " export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%s\"; " % TransactionalMessageCopier.LOG4J_CONFIG
cmd += self.path.script("kafka-run-class.sh", node) + " org.apache.kafka.tools." + "TransactionalMessageCopier"
cmd += " --broker-list %s" % self.kafka.bootstrap_servers(self.security_config.security_protocol)
cmd += " --transactional-id %s" % self.transactional_id
cmd += " --consumer-group %s" % self.consumer_group
cmd += " --input-topic %s" % self.input_topic
cmd += " --output-topic %s" % self.output_topic
cmd += " --input-partition %s" % str(self.input_partition)
cmd += " --transaction-size %s" % str(self.transaction_size)
if self.transaction_timeout is not None:
cmd += " --transaction-timeout %s" % str(self.transaction_timeout)
if self.enable_random_aborts:
cmd += " --enable-random-aborts"
if self.use_group_metadata:
cmd += " --use-group-metadata"
if self.group_mode:
cmd += " --group-mode"
if self.max_messages > 0:
cmd += " --max-messages %s" % str(self.max_messages)
cmd += " 2>> %s | tee -a %s &" % (TransactionalMessageCopier.STDERR_CAPTURE, TransactionalMessageCopier.STDOUT_CAPTURE)
return cmd
def clean_node(self, node):
self.kill_node(node, clean_shutdown=False)
node.account.ssh("rm -rf " + self.PERSISTENT_ROOT, allow_fail=False)
self.security_config.clean_node(node)
def pids(self, node):
try:
cmd = "jps | grep -i TransactionalMessageCopier | awk '{print $1}'"
pid_arr = [pid for pid in node.account.ssh_capture(cmd, allow_fail=True, callback=int)]
return pid_arr
except (RemoteCommandError, ValueError) as e:
self.logger.error("Could not list pids: %s" % str(e))
return []
def alive(self, node):
return len(self.pids(node)) > 0
def kill_node(self, node, clean_shutdown=True):
pids = self.pids(node)
sig = signal.SIGTERM if clean_shutdown else signal.SIGKILL
for pid in pids:
node.account.signal(pid, sig)
wait_until(lambda: len(self.pids(node)) == 0, timeout_sec=60, err_msg="Message Copier failed to stop")
def stop_node(self, node, clean_shutdown=True):
self.kill_node(node, clean_shutdown)
stopped = self.wait_node(node, timeout_sec=self.stop_timeout_sec)
assert stopped, "Node %s: did not stop within the specified timeout of %s seconds" % \
(str(node.account), str(self.stop_timeout_sec))
def restart(self, clean_shutdown):
if self.is_done:
return
node = self.nodes[0]
with self.lock:
self.consumed = -1
self.remaining = -1
self.stop_node(node, clean_shutdown)
self.start_node(node)
def try_parse_json(self, string):
"""Try to parse a string as json. Return None if not parseable."""
try:
record = json.loads(string)
return record
except ValueError:
self.logger.debug("Could not parse as json: %s" % str(string))
return None
@property
def is_done(self):
return self.message_copy_finished
def progress_percent(self):
with self.lock:
if self.remaining < 0:
return 0
if self.consumed + self.remaining == 0:
return 100
return (float(self.consumed)/float(self.consumed + self.remaining)) * 100
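
A hedged sketch of driving the copier service above from a test follows; the topic names, transaction settings, the 'kafka' fixture, and the completion condition are illustrative assumptions.

```python
# Hedged usage sketch: copy one input partition transactionally and wait for
# the copier to report completion. Topic names and 'kafka' are illustrative.
from ducktape.utils.util import wait_until
from kafkatest.services.transactional_message_copier import TransactionalMessageCopier

def copy_partition(test_context, kafka):
    copier = TransactionalMessageCopier(
        test_context, num_nodes=1, kafka=kafka,
        transactional_id="copier-0", consumer_group="copier-group",
        input_topic="input-topic", input_partition=0,
        output_topic="output-topic", transaction_size=500,
        enable_random_aborts=False)
    copier.start()
    wait_until(lambda: copier.progress_percent() >= 100.0,
               timeout_sec=120,
               err_msg="copier did not finish copying the partition")
    copier.stop()
```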

View File

@@ -0,0 +1,14 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

View File

@@ -0,0 +1,56 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ducktape.services.service import Service
from kafkatest.services.trogdor.task_spec import TaskSpec
class ConsumeBenchWorkloadSpec(TaskSpec):
def __init__(self, start_ms, duration_ms, consumer_node, bootstrap_servers,
target_messages_per_sec, max_messages, active_topics,
consumer_conf, common_client_conf, admin_client_conf, consumer_group=None, threads_per_worker=1):
super(ConsumeBenchWorkloadSpec, self).__init__(start_ms, duration_ms)
self.message["class"] = "org.apache.kafka.trogdor.workload.ConsumeBenchSpec"
self.message["consumerNode"] = consumer_node
self.message["bootstrapServers"] = bootstrap_servers
self.message["targetMessagesPerSec"] = target_messages_per_sec
self.message["maxMessages"] = max_messages
self.message["consumerConf"] = consumer_conf
self.message["adminClientConf"] = admin_client_conf
self.message["commonClientConf"] = common_client_conf
self.message["activeTopics"] = active_topics
self.message["threadsPerWorker"] = threads_per_worker
if consumer_group is not None:
self.message["consumerGroup"] = consumer_group
class ConsumeBenchWorkloadService(Service):
def __init__(self, context, kafka):
Service.__init__(self, context, num_nodes=1)
self.bootstrap_servers = kafka.bootstrap_servers(validate=False)
self.consumer_node = self.nodes[0].account.hostname
def free(self):
Service.free(self)
def wait_node(self, node, timeout_sec=None):
pass
def stop_node(self, node):
pass
def clean_node(self, node):
pass
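
As a hedged illustration, the fragment below builds a ConsumeBenchWorkloadSpec alongside its companion service; the topic pattern, rate, and message counts are example values, and submitting the spec to a Trogdor coordinator is only noted in a comment because that client lives in a separate module.

```python
# Hedged sketch: build a one-minute consume-bench spec. The topic pattern and
# rate/message figures are examples; submitting the spec to Trogdor is omitted.
from kafkatest.services.trogdor.consume_bench_workload import (
    ConsumeBenchWorkloadService, ConsumeBenchWorkloadSpec)

def build_consume_bench(test_context, kafka):
    workload_service = ConsumeBenchWorkloadService(test_context, kafka)
    spec = ConsumeBenchWorkloadSpec(
        start_ms=0, duration_ms=60000,
        consumer_node=workload_service.consumer_node,
        bootstrap_servers=workload_service.bootstrap_servers,
        target_messages_per_sec=1000, max_messages=10000,
        active_topics=["consume_bench_topic[0-5]"],
        consumer_conf={}, common_client_conf={}, admin_client_conf={},
        consumer_group="consume-bench-group", threads_per_worker=1)
    return spec  # would then be passed to the Trogdor coordinator client
```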

View File

@@ -0,0 +1,48 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from kafkatest.services.trogdor.task_spec import TaskSpec
class DegradedNetworkFaultSpec(TaskSpec):
"""
The specification for a network degradation fault.
Degrades the network so that traffic on a subset of nodes sees higher latency and/or reduced bandwidth
"""
def __init__(self, start_ms, duration_ms):
"""
Create a new DegradedNetworkFaultSpec.
:param start_ms: The start time, as described in task_spec.py
:param duration_ms: The duration in milliseconds.
"""
super(DegradedNetworkFaultSpec, self).__init__(start_ms, duration_ms)
self.message["class"] = "org.apache.kafka.trogdor.fault.DegradedNetworkFaultSpec"
self.message["nodeSpecs"] = {}
def add_node_spec(self, node, networkDevice, latencyMs=0, rateLimitKbit=0):
"""
Add a node spec to this fault spec
:param node: The node name which is to be degraded
:param networkDevice: The network device name (e.g., eth0) to apply the degradation to
:param latencyMs: Optional. How much latency, in milliseconds, to add to each packet
:param rateLimitKbit: Optional. Maximum throughput in kilobits per second to allow
:return:
"""
self.message["nodeSpecs"][node] = {
"rateLimitKbit": rateLimitKbit, "latencyMs": latencyMs, "networkDevice": networkDevice
}
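
A short hedged example of composing the fault above; the node name, network device, latency/rate figures, and the import path (assumed from this commit's file layout) are placeholders rather than values from a real test.

```python
# Hedged sketch: add 50 ms of latency and a 1 Mbit/s cap on eth0 of one node
# for 30 seconds. Node/device names and numbers are placeholders.
from kafkatest.services.trogdor.degraded_network_fault import DegradedNetworkFaultSpec

spec = DegradedNetworkFaultSpec(start_ms=0, duration_ms=30000)
spec.add_node_spec("knode01", "eth0", latencyMs=50, rateLimitKbit=1000)
```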

View File

@@ -0,0 +1,46 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from kafkatest.services.trogdor.task_spec import TaskSpec
class FilesUnreadableFaultSpec(TaskSpec):
"""
The specification for a fault which makes files unreadable.
"""
def __init__(self, start_ms, duration_ms, node_names, mount_path,
prefix, error_code):
"""
Create a new FilesUnreadableFaultSpec.
:param start_ms: The start time, as described in task_spec.py
:param duration_ms: The duration in milliseconds.
:param node_names: The names of the node(s) to create the fault on.
:param mount_path: The mount path.
:param prefix: The prefix within the mount point to make unreadable.
:param error_code: The error code to use.
"""
super(FilesUnreadableFaultSpec, self).__init__(start_ms, duration_ms)
self.message["class"] = "org.apache.kafka.trogdor.fault.FilesUnreadableFaultSpec"
self.message["nodeNames"] = node_names
self.message["mountPath"] = mount_path
self.message["prefix"] = prefix
self.message["errorCode"] = error_code
self.kibosh_message = {}
self.kibosh_message["type"] = "unreadable"
self.kibosh_message["prefix"] = prefix
self.kibosh_message["code"] = error_code

View File

@@ -0,0 +1,156 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import json
import os.path
from ducktape.services.service import Service
from ducktape.utils import util
class KiboshService(Service):
"""
Kibosh is a fault-injecting FUSE filesystem.
Attributes:
INSTALL_ROOT The path of where Kibosh is installed.
BINARY_NAME The Kibosh binary name.
BINARY_PATH The path to the kibosh binary.
"""
INSTALL_ROOT = "/opt/kibosh/build"
BINARY_NAME = "kibosh"
BINARY_PATH = os.path.join(INSTALL_ROOT, BINARY_NAME)
def __init__(self, context, nodes, target, mirror, persist="/mnt/kibosh"):
"""
Create a Kibosh service.
:param context: The TestContext object.
:param nodes: The nodes to put the Kibosh FS on. Kibosh allocates no
nodes of its own.
:param target: The target directory, which Kibosh exports a view of.
:param mirror: The mirror directory, where Kibosh injects faults.
:param persist: Where the log files and pid files will be created.
"""
Service.__init__(self, context, num_nodes=0)
if (len(nodes) == 0):
raise RuntimeError("You must supply at least one node to run the service on.")
for node in nodes:
self.nodes.append(node)
self.target = target
self.mirror = mirror
self.persist = persist
self.control_path = os.path.join(self.mirror, "kibosh_control")
self.pidfile_path = os.path.join(self.persist, "pidfile")
self.stdout_stderr_path = os.path.join(self.persist, "kibosh-stdout-stderr.log")
self.log_path = os.path.join(self.persist, "kibosh.log")
self.logs = {
"kibosh-stdout-stderr.log": {
"path": self.stdout_stderr_path,
"collect_default": True},
"kibosh.log": {
"path": self.log_path,
"collect_default": True}
}
def free(self):
"""Clear the nodes list."""
# Because the filesystem runs on nodes which have been allocated by other services, those nodes
# are not deallocated here.
self.nodes = []
Service.free(self)
def kibosh_running(self, node):
return 0 == node.account.ssh("test -e '%s'" % self.control_path, allow_fail=True)
def start_node(self, node):
node.account.mkdirs(self.persist)
cmd = "sudo -E "
cmd += " %s" % KiboshService.BINARY_PATH
cmd += " --target %s" % self.target
cmd += " --pidfile %s" % self.pidfile_path
cmd += " --log %s" % self.log_path
cmd += " --control-mode 666"
cmd += " --verbose"
cmd += " %s" % self.mirror
cmd += " &> %s" % self.stdout_stderr_path
node.account.ssh(cmd)
util.wait_until(lambda: self.kibosh_running(node), 20, backoff_sec=.1,
err_msg="Timed out waiting for kibosh to start on %s" % node.account.hostname)
def pids(self, node):
return [pid for pid in node.account.ssh_capture("test -e '%s' && test -e /proc/$(cat '%s')" %
(self.pidfile_path, self.pidfile_path), allow_fail=True)]
def wait_node(self, node, timeout_sec=None):
return len(self.pids(node)) == 0
def kibosh_process_running(self, node):
# Returns True once no kibosh pids remain, i.e. the process has exited;
# used below as the wait condition for "kibosh has stopped".
return len(self.pids(node)) == 0
def stop_node(self, node):
"""Halt kibosh process(es) on this node."""
node.account.logger.debug("stop_node(%s): unmounting %s" % (node.name, self.mirror))
node.account.ssh("sudo fusermount -u %s" % self.mirror, allow_fail=True)
# Wait for the kibosh process to terminate.
try:
util.wait_until(lambda: self.kibosh_process_running(node), 20, backoff_sec=.1,
err_msg="Timed out waiting for kibosh to stop on %s" % node.account.hostname)
except TimeoutError:
# If the process won't terminate, use kill -9 to shut it down.
node.account.logger.debug("stop_node(%s): killing the kibosh process managing %s" % (node.name, self.mirror))
node.account.ssh("sudo kill -9 %s" % (" ".join(self.pids(node))), allow_fail=True)
node.account.ssh("sudo fusermount -u %s" % self.mirror)
util.wait_until(lambda: self.kibosh_process_running(node), 20, backoff_sec=.1,
err_msg="Timed out waiting for kibosh to stop on %s" % node.account.hostname)
def clean_node(self, node):
"""Clean up persistent state on this node - e.g. service logs, configuration files etc."""
self.stop_node(node)
node.account.ssh("rm -rf -- %s" % self.persist)
def set_faults(self, node, specs):
"""
Set the currently active faults.
:param node: The node.
:param specs: An array of FaultSpec objects describing the faults.
"""
if len(specs) == 0:
obj_json = "{}"
else:
fault_array = [spec.kibosh_message for spec in specs]
obj = { 'faults': fault_array }
obj_json = json.dumps(obj)
node.account.create_file(self.control_path, obj_json)
def get_fault_json(self, node):
"""
Return a JSON string which contains the currently active faults.
:param node: The node.
:returns: The fault JSON describing the faults.
"""
iter = node.account.ssh_capture("cat '%s'" % self.control_path)
text = ""
for line in iter:
text = "%s%s" % (text, line.rstrip("\r\n"))
return text
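The control-file handshake used by `set_faults` and `get_fault_json` above reduces to a single JSON object with a `faults` array. Below is a minimal sketch of that payload; `FakeKiboshFaultSpec` and its fields are invented for illustration, since `set_faults` only reads each spec's `kibosh_message` attribute:

```
import json

class FakeKiboshFaultSpec(object):
    """Hypothetical stand-in for a Kibosh fault spec; only kibosh_message is used here."""
    def __init__(self, message):
        self.kibosh_message = message

specs = [FakeKiboshFaultSpec({"type": "unreadable", "prefix": "/mnt/kibosh/mirror"})]

# This is the object that KiboshService.set_faults serializes into the kibosh_control file,
# and that get_fault_json reads back verbatim.
payload = json.dumps({"faults": [spec.kibosh_message for spec in specs]})
print(payload)
```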

View File

@@ -0,0 +1,39 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from kafkatest.services.trogdor.task_spec import TaskSpec
class NetworkPartitionFaultSpec(TaskSpec):
"""
The specification for a network partition fault.
Network partition faults fracture the network into different partitions
that cannot communicate with each other.
"""
def __init__(self, start_ms, duration_ms, partitions):
"""
Create a new NetworkPartitionFaultSpec.
:param start_ms: The start time, as described in task_spec.py
:param duration_ms: The duration in milliseconds.
:param partitions: An array of arrays describing the partitions.
The inner arrays may contain either node names,
or ClusterNode objects.
"""
super(NetworkPartitionFaultSpec, self).__init__(start_ms, duration_ms)
self.message["class"] = "org.apache.kafka.trogdor.fault.NetworkPartitionFaultSpec"
self.message["partitions"] = [TaskSpec.to_node_names(p) for p in partitions]

View File

@@ -0,0 +1,35 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from kafkatest.services.trogdor.task_spec import TaskSpec
class NoOpTaskSpec(TaskSpec):
"""
    The specification for a no-op task.
No-op faults are used to test Trogdor. They don't do anything,
but must be propagated to all Trogdor agents.
"""
def __init__(self, start_ms, duration_ms):
"""
        Create a new NoOpTaskSpec.
:param start_ms: The start time, as described in task_spec.py
:param duration_ms: The duration in milliseconds.
"""
super(NoOpTaskSpec, self).__init__(start_ms, duration_ms)
self.message["class"] = "org.apache.kafka.trogdor.task.NoOpTaskSpec";

View File

@@ -0,0 +1,38 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from kafkatest.services.trogdor.task_spec import TaskSpec
class ProcessStopFaultSpec(TaskSpec):
"""
The specification for a process stop fault.
"""
def __init__(self, start_ms, duration_ms, nodes, java_process_name):
"""
Create a new ProcessStopFaultSpec.
:param start_ms: The start time, as described in task_spec.py
:param duration_ms: The duration in milliseconds.
        :param nodes: An array describing the nodes to stop processes on. The array
may contain either node names, or ClusterNode objects.
:param java_process_name: The name of the java process to stop. This is the name which
is reported by jps, etc., not the OS-level process name.
"""
super(ProcessStopFaultSpec, self).__init__(start_ms, duration_ms)
self.message["class"] = "org.apache.kafka.trogdor.fault.ProcessStopFaultSpec"
self.message["nodeNames"] = TaskSpec.to_node_names(nodes)
self.message["javaProcessName"] = java_process_name

View File

@@ -0,0 +1,56 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ducktape.services.service import Service
from kafkatest.services.trogdor.task_spec import TaskSpec
class ProduceBenchWorkloadSpec(TaskSpec):
def __init__(self, start_ms, duration_ms, producer_node, bootstrap_servers,
target_messages_per_sec, max_messages, producer_conf, admin_client_conf,
common_client_conf, inactive_topics, active_topics,
transaction_generator=None):
super(ProduceBenchWorkloadSpec, self).__init__(start_ms, duration_ms)
self.message["class"] = "org.apache.kafka.trogdor.workload.ProduceBenchSpec"
self.message["producerNode"] = producer_node
self.message["bootstrapServers"] = bootstrap_servers
self.message["targetMessagesPerSec"] = target_messages_per_sec
self.message["maxMessages"] = max_messages
self.message["producerConf"] = producer_conf
self.message["transactionGenerator"] = transaction_generator
self.message["adminClientConf"] = admin_client_conf
self.message["commonClientConf"] = common_client_conf
self.message["inactiveTopics"] = inactive_topics
self.message["activeTopics"] = active_topics
class ProduceBenchWorkloadService(Service):
def __init__(self, context, kafka):
Service.__init__(self, context, num_nodes=1)
self.bootstrap_servers = kafka.bootstrap_servers(validate=False)
self.producer_node = self.nodes[0].account.hostname
def free(self):
Service.free(self)
def wait_node(self, node, timeout_sec=None):
pass
def stop_node(self, node):
pass
def clean_node(self, node):
pass

View File

@@ -0,0 +1,49 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ducktape.services.service import Service
from kafkatest.services.trogdor.task_spec import TaskSpec
class RoundTripWorkloadSpec(TaskSpec):
def __init__(self, start_ms, duration_ms, client_node, bootstrap_servers,
target_messages_per_sec, max_messages, active_topics):
super(RoundTripWorkloadSpec, self).__init__(start_ms, duration_ms)
self.message["class"] = "org.apache.kafka.trogdor.workload.RoundTripWorkloadSpec"
self.message["clientNode"] = client_node
self.message["bootstrapServers"] = bootstrap_servers
self.message["targetMessagesPerSec"] = target_messages_per_sec
self.message["maxMessages"] = max_messages
self.message["activeTopics"] = active_topics
class RoundTripWorkloadService(Service):
def __init__(self, context, kafka):
Service.__init__(self, context, num_nodes=1)
self.bootstrap_servers = kafka.bootstrap_servers(validate=False)
self.client_node = self.nodes[0].account.hostname
def free(self):
Service.free(self)
def wait_node(self, node, timeout_sec=None):
pass
def stop_node(self, node):
pass
def clean_node(self, node):
pass

View File

@@ -0,0 +1,54 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import json
class TaskSpec(object):
"""
The base class for a task specification.
MAX_DURATION_MS The longest duration we should use for a task specification.
"""
MAX_DURATION_MS=10000000
def __init__(self, start_ms, duration_ms):
"""
Create a new task specification.
:param start_ms: The target start time in milliseconds since the epoch.
:param duration_ms: The duration in milliseconds.
"""
self.message = {
'startMs': start_ms,
'durationMs': duration_ms
}
@staticmethod
def to_node_names(nodes):
"""
Convert an array of nodes or node names to an array of node names.
"""
node_names = []
for obj in nodes:
if isinstance(obj, basestring):
node_names.append(obj)
else:
node_names.append(obj.name)
return node_names
def __str__(self):
return json.dumps(self.message)
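A brief sketch of how this base class behaves on its own, runnable under the Python 2 interpreter these tests target (`to_node_names` uses `basestring`). `FakeNode` is a stand-in for a ducktape ClusterNode, since `to_node_names` only reads the `name` attribute:

```
from kafkatest.services.trogdor.task_spec import TaskSpec

class FakeNode(object):
    """Stand-in for a ducktape ClusterNode; to_node_names only needs .name."""
    def __init__(self, name):
        self.name = name

print(TaskSpec.to_node_names(["worker1", FakeNode("worker2")]))  # ['worker1', 'worker2']

spec = TaskSpec(start_ms=0, duration_ms=TaskSpec.MAX_DURATION_MS)
print(spec)  # a JSON object with startMs and durationMs, via __str__
```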

View File

@@ -0,0 +1,23 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
log4j.rootLogger=DEBUG, mylogger
log4j.logger.kafka=DEBUG
log4j.logger.org.apache.kafka=DEBUG
log4j.logger.org.eclipse=INFO
log4j.appender.mylogger=org.apache.log4j.FileAppender
log4j.appender.mylogger.File={{ log_path }}
log4j.appender.mylogger.layout=org.apache.log4j.PatternLayout
log4j.appender.mylogger.layout.ConversionPattern=[%d] %p %m (%c)%n

View File

@@ -0,0 +1,354 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import json
import os.path
import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3 import Retry
from ducktape.services.service import Service
from ducktape.utils.util import wait_until
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
class TrogdorService(KafkaPathResolverMixin, Service):
"""
A ducktape service for running the trogdor fault injection daemons.
Attributes:
PERSISTENT_ROOT The root filesystem path to store service files under.
COORDINATOR_STDOUT_STDERR The path where we store the coordinator's stdout/stderr output.
    AGENT_STDOUT_STDERR The path where we store the agent's stdout/stderr output.
COORDINATOR_LOG The path where we store the coordinator's log4j output.
AGENT_LOG The path where we store the agent's log4j output.
AGENT_LOG4J_PROPERTIES The path to the agent log4j.properties file for log config.
COORDINATOR_LOG4J_PROPERTIES The path to the coordinator log4j.properties file for log config.
CONFIG_PATH The path to the trogdor configuration file.
DEFAULT_AGENT_PORT The default port to use for trogdor_agent daemons.
DEFAULT_COORDINATOR_PORT The default port to use for trogdor_coordinator daemons.
REQUEST_TIMEOUT The request timeout in seconds to use for REST requests.
REQUEST_HEADERS The request headers to use when communicating with trogdor.
"""
PERSISTENT_ROOT="/mnt/trogdor"
COORDINATOR_STDOUT_STDERR = os.path.join(PERSISTENT_ROOT, "trogdor-coordinator-stdout-stderr.log")
AGENT_STDOUT_STDERR = os.path.join(PERSISTENT_ROOT, "trogdor-agent-stdout-stderr.log")
COORDINATOR_LOG = os.path.join(PERSISTENT_ROOT, "trogdor-coordinator.log")
AGENT_LOG = os.path.join(PERSISTENT_ROOT, "trogdor-agent.log")
COORDINATOR_LOG4J_PROPERTIES = os.path.join(PERSISTENT_ROOT, "trogdor-coordinator-log4j.properties")
AGENT_LOG4J_PROPERTIES = os.path.join(PERSISTENT_ROOT, "trogdor-agent-log4j.properties")
CONFIG_PATH = os.path.join(PERSISTENT_ROOT, "trogdor.conf")
DEFAULT_AGENT_PORT=8888
DEFAULT_COORDINATOR_PORT=8889
REQUEST_TIMEOUT=5
REQUEST_HEADERS = {"Content-type": "application/json"}
logs = {
"trogdor_coordinator_stdout_stderr": {
"path": COORDINATOR_STDOUT_STDERR,
"collect_default": True},
"trogdor_agent_stdout_stderr": {
"path": AGENT_STDOUT_STDERR,
"collect_default": True},
"trogdor_coordinator_log": {
"path": COORDINATOR_LOG,
"collect_default": True},
"trogdor_agent_log": {
"path": AGENT_LOG,
"collect_default": True},
}
def __init__(self, context, agent_nodes=None, client_services=None,
agent_port=DEFAULT_AGENT_PORT, coordinator_port=DEFAULT_COORDINATOR_PORT):
"""
Create a Trogdor service.
:param context: The test context.
:param agent_nodes: The nodes to run the agents on.
:param client_services: Services whose nodes we should run agents on.
:param agent_port: The port to use for the trogdor_agent daemons.
:param coordinator_port: The port to use for the trogdor_coordinator daemons.
"""
Service.__init__(self, context, num_nodes=1)
self.coordinator_node = self.nodes[0]
if client_services is not None:
for client_service in client_services:
for node in client_service.nodes:
self.nodes.append(node)
if agent_nodes is not None:
for agent_node in agent_nodes:
self.nodes.append(agent_node)
if (len(self.nodes) == 1):
raise RuntimeError("You must supply at least one agent node to run the service on.")
self.agent_port = agent_port
self.coordinator_port = coordinator_port
def free(self):
# We only want to deallocate the coordinator node, not the agent nodes. So we
# change self.nodes to include only the coordinator node, and then invoke
# the base class' free method.
if self.coordinator_node is not None:
self.nodes = [self.coordinator_node]
self.coordinator_node = None
Service.free(self)
def _create_config_dict(self):
"""
Create a dictionary with the Trogdor configuration.
:return: The configuration dictionary.
"""
dict_nodes = {}
for node in self.nodes:
dict_nodes[node.name] = {
"hostname": node.account.ssh_hostname,
}
if node.name == self.coordinator_node.name:
dict_nodes[node.name]["trogdor.coordinator.port"] = self.coordinator_port
else:
dict_nodes[node.name]["trogdor.agent.port"] = self.agent_port
return {
"platform": "org.apache.kafka.trogdor.basic.BasicPlatform",
"nodes": dict_nodes,
}
def start_node(self, node):
node.account.mkdirs(TrogdorService.PERSISTENT_ROOT)
# Create the configuration file on the node.
        config_str = json.dumps(self._create_config_dict(), indent=2)
        self.logger.info("Creating configuration file %s with %s" % (TrogdorService.CONFIG_PATH, config_str))
        node.account.create_file(TrogdorService.CONFIG_PATH, config_str)
if self.is_coordinator(node):
self._start_coordinator_node(node)
else:
self._start_agent_node(node)
def _start_coordinator_node(self, node):
node.account.create_file(TrogdorService.COORDINATOR_LOG4J_PROPERTIES,
self.render('log4j.properties',
log_path=TrogdorService.COORDINATOR_LOG))
self._start_trogdor_daemon("coordinator", TrogdorService.COORDINATOR_STDOUT_STDERR,
TrogdorService.COORDINATOR_LOG4J_PROPERTIES,
TrogdorService.COORDINATOR_LOG, node)
self.logger.info("Started trogdor coordinator on %s." % node.name)
def _start_agent_node(self, node):
node.account.create_file(TrogdorService.AGENT_LOG4J_PROPERTIES,
self.render('log4j.properties',
log_path=TrogdorService.AGENT_LOG))
self._start_trogdor_daemon("agent", TrogdorService.AGENT_STDOUT_STDERR,
TrogdorService.AGENT_LOG4J_PROPERTIES,
TrogdorService.AGENT_LOG, node)
self.logger.info("Started trogdor agent on %s." % node.name)
def _start_trogdor_daemon(self, daemon_name, stdout_stderr_capture_path,
log4j_properties_path, log_path, node):
cmd = "export KAFKA_LOG4J_OPTS='-Dlog4j.configuration=file:%s'; " % log4j_properties_path
cmd += "%s %s --%s.config %s --node-name %s 1>> %s 2>> %s &" % \
(self.path.script("trogdor.sh", node),
daemon_name,
daemon_name,
TrogdorService.CONFIG_PATH,
node.name,
stdout_stderr_capture_path,
stdout_stderr_capture_path)
node.account.ssh(cmd)
with node.account.monitor_log(log_path) as monitor:
monitor.wait_until("Starting %s process." % daemon_name, timeout_sec=60, backoff_sec=.10,
err_msg=("%s on %s didn't finish startup" % (daemon_name, node.name)))
def wait_node(self, node, timeout_sec=None):
if self.is_coordinator(node):
return len(node.account.java_pids(self.coordinator_class_name())) == 0
else:
return len(node.account.java_pids(self.agent_class_name())) == 0
def stop_node(self, node):
"""Halt trogdor processes on this node."""
if self.is_coordinator(node):
node.account.kill_java_processes(self.coordinator_class_name())
else:
node.account.kill_java_processes(self.agent_class_name())
def clean_node(self, node):
"""Clean up persistent state on this node - e.g. service logs, configuration files etc."""
self.stop_node(node)
node.account.ssh("rm -rf -- %s" % TrogdorService.PERSISTENT_ROOT)
def _coordinator_url(self, path):
return "http://%s:%d/coordinator/%s" % \
(self.coordinator_node.account.ssh_hostname, self.coordinator_port, path)
def request_session(self):
"""
Creates a new request session which will retry for a while.
"""
session = requests.Session()
session.mount('http://',
HTTPAdapter(max_retries=Retry(total=5, backoff_factor=0.3)))
return session
def _coordinator_post(self, path, message):
"""
Make a POST request to the Trogdor coordinator.
:param path: The URL path to use.
:param message: The message object to send.
:return: The response as an object.
"""
url = self._coordinator_url(path)
self.logger.info("POST %s %s" % (url, message))
response = self.request_session().post(url, json=message,
timeout=TrogdorService.REQUEST_TIMEOUT,
headers=TrogdorService.REQUEST_HEADERS)
response.raise_for_status()
return response.json()
def _coordinator_put(self, path, message):
"""
Make a PUT request to the Trogdor coordinator.
:param path: The URL path to use.
:param message: The message object to send.
:return: The response as an object.
"""
url = self._coordinator_url(path)
self.logger.info("PUT %s %s" % (url, message))
response = self.request_session().put(url, json=message,
timeout=TrogdorService.REQUEST_TIMEOUT,
headers=TrogdorService.REQUEST_HEADERS)
response.raise_for_status()
return response.json()
def _coordinator_get(self, path, message):
"""
Make a GET request to the Trogdor coordinator.
:param path: The URL path to use.
:param message: The message object to send.
:return: The response as an object.
"""
url = self._coordinator_url(path)
self.logger.info("GET %s %s" % (url, message))
response = self.request_session().get(url, json=message,
timeout=TrogdorService.REQUEST_TIMEOUT,
headers=TrogdorService.REQUEST_HEADERS)
response.raise_for_status()
return response.json()
def create_task(self, id, spec):
"""
Create a new task.
:param id: The task id.
:param spec: The task spec.
"""
self._coordinator_post("task/create", { "id": id, "spec": spec.message})
return TrogdorTask(id, self)
def stop_task(self, id):
"""
Stop a task.
:param id: The task id.
"""
self._coordinator_put("task/stop", { "id": id })
def tasks(self):
"""
Get the tasks which are on the coordinator.
:returns: A map of task id strings to task state objects.
Task state objects contain a 'spec' field with the spec
and a 'state' field with the state.
"""
return self._coordinator_get("tasks", {})
def is_coordinator(self, node):
return node == self.coordinator_node
def agent_class_name(self):
return "org.apache.kafka.trogdor.agent.Agent"
def coordinator_class_name(self):
return "org.apache.kafka.trogdor.coordinator.Coordinator"
class TrogdorTask(object):
PENDING_STATE = "PENDING"
RUNNING_STATE = "RUNNING"
STOPPING_STATE = "STOPPING"
DONE_STATE = "DONE"
def __init__(self, id, trogdor):
self.id = id
self.trogdor = trogdor
def task_state_or_error(self):
task_state = self.trogdor.tasks()["tasks"][self.id]
if task_state is None:
raise RuntimeError("Coordinator did not know about %s." % self.id)
error = task_state.get("error")
if error is None or error == "":
return task_state["state"], None
else:
return None, error
def done(self):
"""
Check if this task is done.
:raises RuntimeError: If the task encountered an error.
:returns: True if the task is in DONE_STATE;
False if it is in a different state.
"""
(task_state, error) = self.task_state_or_error()
if task_state is not None:
return task_state == TrogdorTask.DONE_STATE
else:
raise RuntimeError("Failed to gracefully stop %s: got task error: %s" % (self.id, error))
def running(self):
"""
Check if this task is running.
:raises RuntimeError: If the task encountered an error.
:returns: True if the task is in RUNNING_STATE;
False if it is in a different state.
"""
(task_state, error) = self.task_state_or_error()
if task_state is not None:
return task_state == TrogdorTask.RUNNING_STATE
else:
raise RuntimeError("Failed to start %s: got task error: %s" % (self.id, error))
def stop(self):
"""
Stop this task.
:raises RuntimeError: If the task encountered an error.
"""
if self.done():
return
self.trogdor.stop_task(self.id)
def wait_for_done(self, timeout_sec=360):
wait_until(lambda: self.done(),
timeout_sec=timeout_sec,
err_msg="%s failed to finish in the expected amount of time." % self.id)

View File

@@ -0,0 +1,330 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from kafkatest.directory_layout.kafka_path import TOOLS_JAR_NAME, TOOLS_DEPENDANT_TEST_LIBS_JAR_NAME
from kafkatest.version import DEV_BRANCH, LATEST_0_8_2
from ducktape.cluster.remoteaccount import RemoteCommandError
import importlib
import os
import subprocess
import signal
"""This module abstracts the implementation of a verifiable client, allowing
client developers to plug in their own client for all kafkatests that make
use of either the VerifiableConsumer or VerifiableProducer classes.
A verifiable client class must implement exec_cmd() and pids().
This file provides:
* VerifiableClientMixin class: to be used for creating new verifiable client classes
* VerifiableClientJava class: the default Java verifiable clients
* VerifiableClientApp class: uses global configuration to specify
the command to execute and optional "pids" command, deploy script, etc.
Config syntax (pass as --global <json_or_jsonfile>):
{"Verifiable(Producer|Consumer|Client)": {
"class": "kafkatest.services.verifiable_client.VerifiableClientApp",
"exec_cmd": "/vagrant/x/myclient --some --standard --args",
"pids": "pgrep -f ...", // optional
"deploy": "/vagrant/x/mydeploy.sh", // optional
"kill_signal": 2 // optional clean_shutdown kill signal (SIGINT in this case)
}}
* VerifiableClientDummy class: testing dummy
==============================
Verifiable client requirements
==============================
There are currently two verifiable client specifications:
* VerifiableConsumer
* VerifiableProducer
Common requirements for both:
* One-way communication (client -> tests) through new-line delimited
JSON objects on stdout (details below).
* Log/debug to stderr
Common communication for both:
* `{ "name": "startup_complete" }` - Client succesfully started
* `{ "name": "shutdown_complete" }` - Client succesfully terminated (after receiving SIGINT/SIGTERM)
==================
VerifiableConsumer
==================
Command line arguments:
* `--group-id <group-id>`
* `--topic <topic>`
* `--broker-list <brokers>`
* `--session-timeout <n>`
* `--enable-autocommit`
* `--max-messages <n>`
* `--assignment-strategy <s>`
* `--consumer.config <config-file>` - consumer config properties (typically empty)
Environment variables:
* `LOG_DIR` - log output directory. Typically not needed if logs are written to stderr.
* `KAFKA_OPTS` - Security config properties (Java client syntax)
* `KAFKA_LOG4J_OPTS` - Java log4j options (can be ignored)
Client communication:
* `{ "name": "offsets_committed", "success": bool, "error": "<errstr>", "offsets": [ { "topic": "<t>", "partition": <p>, "offset": <o> } ] }` - offset commit results, should be emitted for each committed offset. Emit prior to partitions_revoked.
* `{ "name": "records_consumed", "partitions": [ { "topic": "<t>", "partition": <p>, "minOffset": <o>, "maxOffset": <o> } ], "count": <total_consumed> }` - per-partition delta stats from last records_consumed. Emit every 1000 messages, or 1s. Emit prior to partitions_assigned, partitions_revoked and offsets_committed.
* `{ "name": "partitions_revoked", "partitions": [ { "topic": "<t>", "partition": <p> } ] }` - rebalance: revoked partitions
* `{ "name": "partitions_assigned", "partitions": [ { "topic": "<t>", "partition": <p> } ] }` - rebalance: assigned partitions
==================
VerifiableProducer
==================
Command line arguments:
* `--topic <topic>`
* `--broker-list <brokers>`
* `--max-messages <n>`
* `--throughput <msgs/s>`
* `--producer.config <config-file>` - producer config properties (typically empty)
Environment variables:
* `LOG_DIR` - log output directory. Typically not needed if logs are written to stderr.
* `KAFKA_OPTS` - Security config properties (Java client syntax)
* `KAFKA_LOG4J_OPTS` - Java log4j options (can be ignored)
Client communication:
* `{ "name": "producer_send_error", "message": "<error msg>", "topic": "<t>", "key": "<msg key>", "value": "<msg value>" }` - emit on produce error.
* `{ "name": "producer_send_success", "topic": "<t>", "partition": <p>, "offset": <o>, "key": "<msg key>", "value": "<msg value>" }` - emit on produce success.
===========
Development
===========
**Logs:**
During development of kafkatest clients it is generally a good idea to
enable collection of the client's stdout and stderr logs for troubleshooting.
Do this by setting "collect_default" to True for verifiable_consumer_stdout
and .._stderr in verifiable_consumer.py and verifiable_producer.py
**Deployment:**
There's currently no automatic way of deploying 3rd party kafkatest clients
on the VM instance so this needs to be done (at least partially) manually for
now.
One way to do this is logging in to a worker (`vagrant ssh worker1`), downloading
and building the kafkatest client under /vagrant (which maps to the kafka root
directory on the host and is shared with all VM instances).
Also make sure to install any system-level dependencies on each instance.
Then use /vagrant/..../yourkafkatestclient as your run-time path since it will
now be available on all instances.
The VerifiableClientApp automates the per-worker deployment with the optional
"deploy": "/vagrant/../deploy_script.sh" globals configuration property, this
script will be called on the VM just prior to executing the client.
"""
def create_verifiable_client_implementation(context, parent):
"""Factory for generating a verifiable client implementation class instance
:param parent: parent class instance, either VerifiableConsumer or VerifiableProducer
This will first check for a fully qualified client implementation class name
in context.globals as "Verifiable<type>" where <type> is "Producer" or "Consumer",
followed by "VerifiableClient" (which should implement both).
The global object layout is: {"class": "<full class name>", "..anything..": ..}.
If present, construct a new instance, else defaults to VerifiableClientJava
"""
# Default class
obj = {"class": "kafkatest.services.verifiable_client.VerifiableClientJava"}
parent_name = parent.__class__.__name__.rsplit('.', 1)[-1]
for k in [parent_name, "VerifiableClient"]:
if k in context.globals:
obj = context.globals[k]
break
if "class" not in obj:
raise SyntaxError('%s (or VerifiableClient) expected object format: {"class": "full.class.path", ..}' % parent_name)
clname = obj["class"]
# Using the fully qualified classname, import the implementation class
if clname.find('.') == -1:
raise SyntaxError("%s (or VerifiableClient) must specify full class path (including module)" % parent_name)
(module_name, clname) = clname.rsplit('.', 1)
cluster_mod = importlib.import_module(module_name)
impl_class = getattr(cluster_mod, clname)
return impl_class(parent, obj)
class VerifiableClientMixin (object):
"""
Verifiable client mixin class
"""
@property
def impl (self):
"""
:return: Return (and create if necessary) the Verifiable client implementation object.
"""
# Add _impl attribute to parent Verifiable(Consumer|Producer) object.
if not hasattr(self, "_impl"):
setattr(self, "_impl", create_verifiable_client_implementation(self.context, self))
if hasattr(self.context, "logger") and self.context.logger is not None:
self.context.logger.debug("Using client implementation %s for %s" % (self._impl.__class__.__name__, self.__class__.__name__))
return self._impl
def exec_cmd (self, node):
"""
:return: command string to execute client.
Environment variables will be prepended and command line arguments
appended to this string later by start_cmd().
This method should also take care of deploying the client on the instance, if necessary.
"""
raise NotImplementedError()
def pids (self, node):
""" :return: list of pids for this client instance on node """
raise NotImplementedError()
def kill_signal (self, clean_shutdown=True):
""" :return: the kill signal to terminate the application. """
if not clean_shutdown:
return signal.SIGKILL
return self.conf.get("kill_signal", signal.SIGTERM)
class VerifiableClientJava (VerifiableClientMixin):
"""
Verifiable Consumer and Producer using the official Java client.
"""
def __init__(self, parent, conf=None):
"""
:param parent: The parent instance, either VerifiableConsumer or VerifiableProducer
:param conf: Optional conf object (the --globals VerifiableX object)
"""
super(VerifiableClientJava, self).__init__()
self.parent = parent
self.java_class_name = parent.java_class_name()
self.conf = conf
def exec_cmd (self, node):
""" :return: command to execute to start instance
Translates Verifiable* to the corresponding Java client class name """
cmd = ""
if self.java_class_name == 'VerifiableProducer' and node.version <= LATEST_0_8_2:
# 0.8.2.X releases do not have VerifiableProducer.java, so cheat and add
# the tools jar from trunk to the classpath
tools_jar = self.parent.path.jar(TOOLS_JAR_NAME, DEV_BRANCH)
tools_dependant_libs_jar = self.parent.path.jar(TOOLS_DEPENDANT_TEST_LIBS_JAR_NAME, DEV_BRANCH)
cmd += "for file in %s; do CLASSPATH=$CLASSPATH:$file; done; " % tools_jar
cmd += "for file in %s; do CLASSPATH=$CLASSPATH:$file; done; " % tools_dependant_libs_jar
cmd += "export CLASSPATH; "
cmd += self.parent.path.script("kafka-run-class.sh", node) + " org.apache.kafka.tools." + self.java_class_name
return cmd
def pids (self, node):
""" :return: pid(s) for this client intstance on node """
try:
cmd = "jps | grep -i " + self.java_class_name + " | awk '{print $1}'"
pid_arr = [pid for pid in node.account.ssh_capture(cmd, allow_fail=True, callback=int)]
return pid_arr
except (RemoteCommandError, ValueError) as e:
return []
class VerifiableClientDummy (VerifiableClientMixin):
"""
Dummy class for testing the pluggable framework
"""
def __init__(self, parent, conf=None):
"""
:param parent: The parent instance, either VerifiableConsumer or VerifiableProducer
:param conf: Optional conf object (the --globals VerifiableX object)
"""
super(VerifiableClientDummy, self).__init__()
self.parent = parent
self.conf = conf
def exec_cmd (self, node):
""" :return: command to execute to start instance """
return 'echo -e \'{"name": "shutdown_complete" }\n\' ; echo ARGS:'
def pids (self, node):
""" :return: pid(s) for this client intstance on node """
return []
class VerifiableClientApp (VerifiableClientMixin):
"""
VerifiableClient using --global settings for exec_cmd, pids and deploy.
By using this a verifiable client application can be used through simple
--globals configuration rather than implementing a Python class.
"""
def __init__(self, parent, conf):
"""
:param parent: The parent instance, either VerifiableConsumer or VerifiableProducer
:param conf: Optional conf object (the --globals VerifiableX object)
"""
super(VerifiableClientApp, self).__init__()
self.parent = parent
# "VerifiableConsumer" or "VerifiableProducer"
self.name = self.parent.__class__.__name__
self.conf = conf
if "exec_cmd" not in self.conf:
raise SyntaxError("%s requires \"exec_cmd\": .. to be set in --globals %s object" % \
(self.__class__.__name__, self.name))
def exec_cmd (self, node):
""" :return: command to execute to start instance """
self.deploy(node)
return self.conf["exec_cmd"]
def pids (self, node):
""" :return: pid(s) for this client intstance on node """
cmd = self.conf.get("pids", "pgrep -f '" + self.conf["exec_cmd"] + "'")
try:
pid_arr = [pid for pid in node.account.ssh_capture(cmd, allow_fail=True, callback=int)]
self.parent.context.logger.info("%s pids are: %s" % (str(node.account), pid_arr))
return pid_arr
except (subprocess.CalledProcessError, ValueError) as e:
return []
def deploy (self, node):
""" Call deploy script specified by "deploy" --global key
This optional script is run on the VM instance just prior to
executing `exec_cmd` to deploy the kafkatest client.
The script path must be as seen by the VM instance, e.g. /vagrant/.... """
if "deploy" not in self.conf:
return
script_cmd = self.conf["deploy"]
self.parent.context.logger.debug("Deploying %s: %s" % (self, script_cmd))
r = node.account.ssh(script_cmd)
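To make the stdout protocol described in the module docstring concrete, here is a minimal sketch of a pluggable producer that a VerifiableClientApp `exec_cmd` global could point at. It fakes its sends instead of talking to a broker, so it only exercises the plumbing; the topic name and message count are arbitrary:

```
#!/usr/bin/env python
import json
import signal
import sys
import time

def emit(event):
    """Write one newline-delimited JSON event to stdout, as the protocol requires."""
    sys.stdout.write(json.dumps(event) + "\n")
    sys.stdout.flush()

def main():
    running = {"value": True}
    # A clean shutdown arrives as SIGTERM (or the configured kill_signal).
    signal.signal(signal.SIGTERM, lambda signum, frame: running.update(value=False))
    signal.signal(signal.SIGINT, lambda signum, frame: running.update(value=False))

    emit({"name": "startup_complete"})
    offset = 0
    while running["value"] and offset < 10:
        # A real client would produce to Kafka here; we just pretend the send succeeded.
        emit({"name": "producer_send_success", "topic": "test_topic",
              "partition": 0, "offset": offset, "key": None, "value": str(offset)})
        offset += 1
        time.sleep(0.1)
    emit({"name": "shutdown_complete"})

if __name__ == "__main__":
    main()
```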

View File

@@ -0,0 +1,418 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import json
import os
from ducktape.services.background_thread import BackgroundThreadService
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
from kafkatest.services.kafka import TopicPartition
from kafkatest.services.verifiable_client import VerifiableClientMixin
from kafkatest.version import DEV_BRANCH, V_2_3_0, V_2_3_1, V_0_10_0_0
class ConsumerState:
Started = 1
Dead = 2
Rebalancing = 3
Joined = 4
class ConsumerEventHandler(object):
def __init__(self, node, verify_offsets, idx):
self.node = node
self.idx = idx
self.state = ConsumerState.Dead
self.revoked_count = 0
self.assigned_count = 0
self.assignment = []
self.position = {}
self.committed = {}
self.total_consumed = 0
self.verify_offsets = verify_offsets
def handle_shutdown_complete(self):
self.state = ConsumerState.Dead
self.assignment = []
self.position = {}
def handle_startup_complete(self):
self.state = ConsumerState.Started
def handle_offsets_committed(self, event, node, logger):
if event["success"]:
for offset_commit in event["offsets"]:
if offset_commit.get("error", "") != "":
logger.debug("%s: Offset commit failed for: %s" % (str(node.account), offset_commit))
continue
topic = offset_commit["topic"]
partition = offset_commit["partition"]
tp = TopicPartition(topic, partition)
offset = offset_commit["offset"]
assert tp in self.assignment, \
"Committed offsets for partition %s not assigned (current assignment: %s)" % \
(str(tp), str(self.assignment))
assert tp in self.position, "No previous position for %s: %s" % (str(tp), event)
assert self.position[tp] >= offset, \
"The committed offset %d was greater than the current position %d for partition %s" % \
(offset, self.position[tp], str(tp))
self.committed[tp] = offset
def handle_records_consumed(self, event, logger):
assert self.state == ConsumerState.Joined, \
"Consumed records should only be received when joined (current state: %s)" % str(self.state)
for record_batch in event["partitions"]:
tp = TopicPartition(topic=record_batch["topic"],
partition=record_batch["partition"])
min_offset = record_batch["minOffset"]
max_offset = record_batch["maxOffset"]
assert tp in self.assignment, \
"Consumed records for partition %s which is not assigned (current assignment: %s)" % \
(str(tp), str(self.assignment))
if tp not in self.position or self.position[tp] == min_offset:
self.position[tp] = max_offset + 1
else:
msg = "Consumed from an unexpected offset (%d, %d) for partition %s" % \
(self.position.get(tp), min_offset, str(tp))
if self.verify_offsets:
raise AssertionError(msg)
else:
if tp in self.position:
self.position[tp] = max_offset + 1
logger.warn(msg)
self.total_consumed += event["count"]
def handle_partitions_revoked(self, event):
self.revoked_count += 1
self.state = ConsumerState.Rebalancing
self.position = {}
def handle_partitions_assigned(self, event):
self.assigned_count += 1
self.state = ConsumerState.Joined
assignment = []
for topic_partition in event["partitions"]:
topic = topic_partition["topic"]
partition = topic_partition["partition"]
assignment.append(TopicPartition(topic, partition))
self.assignment = assignment
def handle_kill_process(self, clean_shutdown):
# if the shutdown was clean, then we expect the explicit
# shutdown event from the consumer
if not clean_shutdown:
self.handle_shutdown_complete()
def current_assignment(self):
return list(self.assignment)
def current_position(self, tp):
if tp in self.position:
return self.position[tp]
else:
return None
def last_commit(self, tp):
if tp in self.committed:
return self.committed[tp]
else:
return None
class VerifiableConsumer(KafkaPathResolverMixin, VerifiableClientMixin, BackgroundThreadService):
"""This service wraps org.apache.kafka.tools.VerifiableConsumer for use in
system testing.
NOTE: this class should be treated as a PUBLIC API. Downstream users use
this service both directly and through class extension, so care must be
taken to ensure compatibility.
"""
PERSISTENT_ROOT = "/mnt/verifiable_consumer"
STDOUT_CAPTURE = os.path.join(PERSISTENT_ROOT, "verifiable_consumer.stdout")
STDERR_CAPTURE = os.path.join(PERSISTENT_ROOT, "verifiable_consumer.stderr")
LOG_DIR = os.path.join(PERSISTENT_ROOT, "logs")
LOG_FILE = os.path.join(LOG_DIR, "verifiable_consumer.log")
LOG4J_CONFIG = os.path.join(PERSISTENT_ROOT, "tools-log4j.properties")
CONFIG_FILE = os.path.join(PERSISTENT_ROOT, "verifiable_consumer.properties")
logs = {
"verifiable_consumer_stdout": {
"path": STDOUT_CAPTURE,
"collect_default": False},
"verifiable_consumer_stderr": {
"path": STDERR_CAPTURE,
"collect_default": False},
"verifiable_consumer_log": {
"path": LOG_FILE,
"collect_default": True}
}
def __init__(self, context, num_nodes, kafka, topic, group_id,
static_membership=False, max_messages=-1, session_timeout_sec=30, enable_autocommit=False,
assignment_strategy=None,
version=DEV_BRANCH, stop_timeout_sec=30, log_level="INFO", jaas_override_variables=None,
on_record_consumed=None, reset_policy="earliest", verify_offsets=True):
"""
:param jaas_override_variables: A dict of variables to be used in the jaas.conf template file
"""
super(VerifiableConsumer, self).__init__(context, num_nodes)
self.log_level = log_level
self.kafka = kafka
self.topic = topic
self.group_id = group_id
self.reset_policy = reset_policy
self.static_membership = static_membership
self.max_messages = max_messages
self.session_timeout_sec = session_timeout_sec
self.enable_autocommit = enable_autocommit
self.assignment_strategy = assignment_strategy
self.prop_file = ""
self.stop_timeout_sec = stop_timeout_sec
self.on_record_consumed = on_record_consumed
self.verify_offsets = verify_offsets
self.event_handlers = {}
self.global_position = {}
self.global_committed = {}
self.jaas_override_variables = jaas_override_variables or {}
for node in self.nodes:
node.version = version
def java_class_name(self):
return "VerifiableConsumer"
def _worker(self, idx, node):
with self.lock:
if node not in self.event_handlers:
self.event_handlers[node] = ConsumerEventHandler(node, self.verify_offsets, idx)
handler = self.event_handlers[node]
node.account.ssh("mkdir -p %s" % VerifiableConsumer.PERSISTENT_ROOT, allow_fail=False)
# Create and upload log properties
log_config = self.render('tools_log4j.properties', log_file=VerifiableConsumer.LOG_FILE)
node.account.create_file(VerifiableConsumer.LOG4J_CONFIG, log_config)
# Create and upload config file
self.security_config = self.kafka.security_config.client_config(self.prop_file, node,
self.jaas_override_variables)
self.security_config.setup_node(node)
self.prop_file += str(self.security_config)
self.logger.info("verifiable_consumer.properties:")
self.logger.info(self.prop_file)
node.account.create_file(VerifiableConsumer.CONFIG_FILE, self.prop_file)
self.security_config.setup_node(node)
# apply group.instance.id to the node for static membership validation
node.group_instance_id = None
if self.static_membership:
assert node.version >= V_2_3_0, \
"Version %s does not support static membership (must be 2.3 or higher)" % str(node.version)
node.group_instance_id = self.group_id + "-instance-" + str(idx)
if self.assignment_strategy:
assert node.version >= V_0_10_0_0, \
"Version %s does not setting an assignment strategy (must be 0.10.0 or higher)" % str(node.version)
cmd = self.start_cmd(node)
self.logger.debug("VerifiableConsumer %d command: %s" % (idx, cmd))
for line in node.account.ssh_capture(cmd):
event = self.try_parse_json(node, line.strip())
if event is not None:
with self.lock:
name = event["name"]
if name == "shutdown_complete":
handler.handle_shutdown_complete()
elif name == "startup_complete":
handler.handle_startup_complete()
elif name == "offsets_committed":
handler.handle_offsets_committed(event, node, self.logger)
self._update_global_committed(event)
elif name == "records_consumed":
handler.handle_records_consumed(event, self.logger)
self._update_global_position(event, node)
elif name == "record_data" and self.on_record_consumed:
self.on_record_consumed(event, node)
elif name == "partitions_revoked":
handler.handle_partitions_revoked(event)
elif name == "partitions_assigned":
handler.handle_partitions_assigned(event)
else:
self.logger.debug("%s: ignoring unknown event: %s" % (str(node.account), event))
def _update_global_position(self, consumed_event, node):
for consumed_partition in consumed_event["partitions"]:
tp = TopicPartition(consumed_partition["topic"], consumed_partition["partition"])
if tp in self.global_committed:
# verify that the position never gets behind the current commit.
if self.global_committed[tp] > consumed_partition["minOffset"]:
msg = "Consumed position %d is behind the current committed offset %d for partition %s" % \
(consumed_partition["minOffset"], self.global_committed[tp], str(tp))
if self.verify_offsets:
raise AssertionError(msg)
else:
self.logger.warn(msg)
# the consumer cannot generally guarantee that the position increases monotonically
# without gaps in the face of hard failures, so we only log a warning when this happens
if tp in self.global_position and self.global_position[tp] != consumed_partition["minOffset"]:
self.logger.warn("%s: Expected next consumed offset of %d for partition %s, but instead saw %d" %
(str(node.account), self.global_position[tp], str(tp), consumed_partition["minOffset"]))
self.global_position[tp] = consumed_partition["maxOffset"] + 1
def _update_global_committed(self, commit_event):
if commit_event["success"]:
for offset_commit in commit_event["offsets"]:
tp = TopicPartition(offset_commit["topic"], offset_commit["partition"])
offset = offset_commit["offset"]
assert self.global_position[tp] >= offset, \
"Committed offset %d for partition %s is ahead of the current position %d" % \
(offset, str(tp), self.global_position[tp])
self.global_committed[tp] = offset
def start_cmd(self, node):
cmd = ""
cmd += "export LOG_DIR=%s;" % VerifiableConsumer.LOG_DIR
cmd += " export KAFKA_OPTS=%s;" % self.security_config.kafka_opts
cmd += " export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%s\"; " % VerifiableConsumer.LOG4J_CONFIG
cmd += self.impl.exec_cmd(node)
if self.on_record_consumed:
cmd += " --verbose"
if node.group_instance_id:
cmd += " --group-instance-id %s" % node.group_instance_id
elif node.version == V_2_3_0 or node.version == V_2_3_1:
# In 2.3, --group-instance-id was required, but would be left empty
# if `None` is passed as the argument value
cmd += " --group-instance-id None"
if self.assignment_strategy:
cmd += " --assignment-strategy %s" % self.assignment_strategy
if self.enable_autocommit:
cmd += " --enable-autocommit "
cmd += " --reset-policy %s --group-id %s --topic %s --broker-list %s --session-timeout %s" % \
(self.reset_policy, self.group_id, self.topic,
self.kafka.bootstrap_servers(self.security_config.security_protocol),
self.session_timeout_sec*1000)
if self.max_messages > 0:
cmd += " --max-messages %s" % str(self.max_messages)
cmd += " --consumer.config %s" % VerifiableConsumer.CONFIG_FILE
cmd += " 2>> %s | tee -a %s &" % (VerifiableConsumer.STDOUT_CAPTURE, VerifiableConsumer.STDOUT_CAPTURE)
return cmd
def pids(self, node):
return self.impl.pids(node)
def try_parse_json(self, node, string):
"""Try to parse a string as json. Return None if not parseable."""
try:
return json.loads(string)
except ValueError:
self.logger.debug("%s: Could not parse as json: %s" % (str(node.account), str(string)))
return None
def stop_all(self):
for node in self.nodes:
self.stop_node(node)
def kill_node(self, node, clean_shutdown=True, allow_fail=False):
sig = self.impl.kill_signal(clean_shutdown)
for pid in self.pids(node):
node.account.signal(pid, sig, allow_fail)
with self.lock:
self.event_handlers[node].handle_kill_process(clean_shutdown)
def stop_node(self, node, clean_shutdown=True):
self.kill_node(node, clean_shutdown=clean_shutdown)
stopped = self.wait_node(node, timeout_sec=self.stop_timeout_sec)
assert stopped, "Node %s: did not stop within the specified timeout of %s seconds" % \
(str(node.account), str(self.stop_timeout_sec))
def clean_node(self, node):
self.kill_node(node, clean_shutdown=False)
node.account.ssh("rm -rf " + self.PERSISTENT_ROOT, allow_fail=False)
self.security_config.clean_node(node)
def current_assignment(self):
with self.lock:
return { handler.node: handler.current_assignment() for handler in self.event_handlers.itervalues() }
def current_position(self, tp):
with self.lock:
if tp in self.global_position:
return self.global_position[tp]
else:
return None
def owner(self, tp):
with self.lock:
for handler in self.event_handlers.itervalues():
if tp in handler.current_assignment():
return handler.node
return None
def last_commit(self, tp):
with self.lock:
if tp in self.global_committed:
return self.global_committed[tp]
else:
return None
def total_consumed(self):
with self.lock:
return sum(handler.total_consumed for handler in self.event_handlers.itervalues())
def num_rebalances(self):
with self.lock:
return max(handler.assigned_count for handler in self.event_handlers.itervalues())
def num_revokes_for_alive(self, keep_alive=1):
with self.lock:
return max([handler.revoked_count for handler in self.event_handlers.itervalues()
if handler.idx <= keep_alive])
def joined_nodes(self):
with self.lock:
return [handler.node for handler in self.event_handlers.itervalues()
if handler.state == ConsumerState.Joined]
def rebalancing_nodes(self):
with self.lock:
return [handler.node for handler in self.event_handlers.itervalues()
if handler.state == ConsumerState.Rebalancing]
def dead_nodes(self):
with self.lock:
return [handler.node for handler in self.event_handlers.itervalues()
if handler.state == ConsumerState.Dead]
def alive_nodes(self):
with self.lock:
return [handler.node for handler in self.event_handlers.itervalues()
if handler.state != ConsumerState.Dead]

View File

@@ -0,0 +1,315 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import json
import os
import time
from ducktape.cluster.remoteaccount import RemoteCommandError
from ducktape.services.background_thread import BackgroundThreadService
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
from kafkatest.services.kafka import TopicPartition
from kafkatest.services.verifiable_client import VerifiableClientMixin
from kafkatest.utils import is_int, is_int_with_prefix
from kafkatest.version import DEV_BRANCH
class VerifiableProducer(KafkaPathResolverMixin, VerifiableClientMixin, BackgroundThreadService):
"""This service wraps org.apache.kafka.tools.VerifiableProducer for use in
system testing.
NOTE: this class should be treated as a PUBLIC API. Downstream users use
this service both directly and through class extension, so care must be
taken to ensure compatibility.
"""
PERSISTENT_ROOT = "/mnt/verifiable_producer"
STDOUT_CAPTURE = os.path.join(PERSISTENT_ROOT, "verifiable_producer.stdout")
STDERR_CAPTURE = os.path.join(PERSISTENT_ROOT, "verifiable_producer.stderr")
LOG_DIR = os.path.join(PERSISTENT_ROOT, "logs")
LOG_FILE = os.path.join(LOG_DIR, "verifiable_producer.log")
LOG4J_CONFIG = os.path.join(PERSISTENT_ROOT, "tools-log4j.properties")
CONFIG_FILE = os.path.join(PERSISTENT_ROOT, "verifiable_producer.properties")
logs = {
"verifiable_producer_stdout": {
"path": STDOUT_CAPTURE,
"collect_default": False},
"verifiable_producer_stderr": {
"path": STDERR_CAPTURE,
"collect_default": False},
"verifiable_producer_log": {
"path": LOG_FILE,
"collect_default": True}
}
def __init__(self, context, num_nodes, kafka, topic, max_messages=-1, throughput=100000,
message_validator=is_int, compression_types=None, version=DEV_BRANCH, acks=None,
stop_timeout_sec=150, request_timeout_sec=30, log_level="INFO",
enable_idempotence=False, offline_nodes=[], create_time=-1, repeating_keys=None,
jaas_override_variables=None, kafka_opts_override="", client_prop_file_override="",
retries=None):
"""
Args:
:param max_messages number of messages to be produced per producer
:param message_validator checks for an expected format of messages produced. There are
currently two:
* is_int is an integer format; this is default and expected to be used if
num_nodes = 1
                               * is_int_with_prefix recommended if num_nodes > 1, because otherwise each producer
                                 will produce exactly the same messages, and validation may not detect missing messages.
:param compression_types If None, all producers will not use compression; or a list of compression types,
one per producer (could be "none").
:param jaas_override_variables A dict of variables to be used in the jaas.conf template file
:param kafka_opts_override Override parameters of the KAFKA_OPTS environment variable
        :param client_prop_file_override Override client.properties file used by the producer
"""
super(VerifiableProducer, self).__init__(context, num_nodes)
self.log_level = log_level
self.kafka = kafka
self.topic = topic
self.max_messages = max_messages
self.throughput = throughput
self.message_validator = message_validator
self.compression_types = compression_types
if self.compression_types is not None:
assert len(self.compression_types) == num_nodes, "Specify one compression type per node"
for node in self.nodes:
node.version = version
self.acked_values = []
self.acked_values_by_partition = {}
self._last_acked_offsets = {}
self.not_acked_values = []
self.produced_count = {}
self.clean_shutdown_nodes = set()
self.acks = acks
self.stop_timeout_sec = stop_timeout_sec
self.request_timeout_sec = request_timeout_sec
self.enable_idempotence = enable_idempotence
self.offline_nodes = offline_nodes
self.create_time = create_time
self.repeating_keys = repeating_keys
self.jaas_override_variables = jaas_override_variables or {}
self.kafka_opts_override = kafka_opts_override
self.client_prop_file_override = client_prop_file_override
self.retries = retries
def java_class_name(self):
return "VerifiableProducer"
def prop_file(self, node):
idx = self.idx(node)
prop_file = self.render('producer.properties', request_timeout_ms=(self.request_timeout_sec * 1000))
prop_file += "\n{}".format(str(self.security_config))
if self.compression_types is not None:
compression_index = idx - 1
self.logger.info("VerifiableProducer (index = %d) will use compression type = %s", idx,
self.compression_types[compression_index])
prop_file += "\ncompression.type=%s\n" % self.compression_types[compression_index]
return prop_file
def _worker(self, idx, node):
node.account.ssh("mkdir -p %s" % VerifiableProducer.PERSISTENT_ROOT, allow_fail=False)
# Create and upload log properties
log_config = self.render('tools_log4j.properties', log_file=VerifiableProducer.LOG_FILE)
node.account.create_file(VerifiableProducer.LOG4J_CONFIG, log_config)
# Configure security
self.security_config = self.kafka.security_config.client_config(node=node,
jaas_override_variables=self.jaas_override_variables)
self.security_config.setup_node(node)
# Create and upload config file
if self.client_prop_file_override:
producer_prop_file = self.client_prop_file_override
else:
producer_prop_file = self.prop_file(node)
if self.acks is not None:
self.logger.info("VerifiableProducer (index = %d) will use acks = %s", idx, self.acks)
producer_prop_file += "\nacks=%s\n" % self.acks
if self.enable_idempotence:
self.logger.info("Setting up an idempotent producer")
producer_prop_file += "\nmax.in.flight.requests.per.connection=5\n"
producer_prop_file += "\nretries=1000000\n"
producer_prop_file += "\nenable.idempotence=true\n"
elif self.retries is not None:
self.logger.info("VerifiableProducer (index = %d) will use retries = %s", idx, self.retries)
producer_prop_file += "\nretries=%s\n" % self.retries
producer_prop_file += "\ndelivery.timeout.ms=%s\n" % (self.request_timeout_sec * 1000 * self.retries)
self.logger.info("verifiable_producer.properties:")
self.logger.info(producer_prop_file)
node.account.create_file(VerifiableProducer.CONFIG_FILE, producer_prop_file)
cmd = self.start_cmd(node, idx)
self.logger.debug("VerifiableProducer %d command: %s" % (idx, cmd))
self.produced_count[idx] = 0
last_produced_time = time.time()
prev_msg = None
for line in node.account.ssh_capture(cmd):
line = line.strip()
data = self.try_parse_json(line)
if data is not None:
with self.lock:
if data["name"] == "producer_send_error":
data["node"] = idx
self.not_acked_values.append(self.message_validator(data["value"]))
self.produced_count[idx] += 1
elif data["name"] == "producer_send_success":
partition = TopicPartition(data["topic"], data["partition"])
value = self.message_validator(data["value"])
self.acked_values.append(value)
if partition not in self.acked_values_by_partition:
self.acked_values_by_partition[partition] = []
self.acked_values_by_partition[partition].append(value)
self._last_acked_offsets[partition] = data["offset"]
self.produced_count[idx] += 1
# Log information if there is a large gap between successively acknowledged messages
t = time.time()
time_delta_sec = t - last_produced_time
if time_delta_sec > 2 and prev_msg is not None:
self.logger.debug(
"Time delta between successively acked messages is large: " +
"delta_t_sec: %s, prev_message: %s, current_message: %s" % (str(time_delta_sec), str(prev_msg), str(data)))
last_produced_time = t
prev_msg = data
elif data["name"] == "shutdown_complete":
if node in self.clean_shutdown_nodes:
raise Exception("Unexpected shutdown event from producer, already shutdown. Producer index: %d" % idx)
self.clean_shutdown_nodes.add(node)
def _has_output(self, node):
"""Helper used as a proxy to determine whether jmx is running by that jmx_tool_log contains output."""
try:
node.account.ssh("test -z \"$(cat %s)\"" % VerifiableProducer.STDOUT_CAPTURE, allow_fail=False)
return False
except RemoteCommandError:
return True
def start_cmd(self, node, idx):
cmd = "export LOG_DIR=%s;" % VerifiableProducer.LOG_DIR
if self.kafka_opts_override:
cmd += " export KAFKA_OPTS=\"%s\";" % self.kafka_opts_override
else:
cmd += " export KAFKA_OPTS=%s;" % self.security_config.kafka_opts
cmd += " export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%s\"; " % VerifiableProducer.LOG4J_CONFIG
cmd += self.impl.exec_cmd(node)
cmd += " --topic %s --broker-list %s" % (self.topic, self.kafka.bootstrap_servers(self.security_config.security_protocol, True, self.offline_nodes))
if self.max_messages > 0:
cmd += " --max-messages %s" % str(self.max_messages)
if self.throughput > 0:
cmd += " --throughput %s" % str(self.throughput)
if self.message_validator == is_int_with_prefix:
cmd += " --value-prefix %s" % str(idx)
if self.acks is not None:
cmd += " --acks %s " % str(self.acks)
if self.create_time > -1:
cmd += " --message-create-time %s " % str(self.create_time)
if self.repeating_keys is not None:
cmd += " --repeating-keys %s " % str(self.repeating_keys)
cmd += " --producer.config %s" % VerifiableProducer.CONFIG_FILE
cmd += " 2>> %s | tee -a %s &" % (VerifiableProducer.STDOUT_CAPTURE, VerifiableProducer.STDOUT_CAPTURE)
return cmd
def kill_node(self, node, clean_shutdown=True, allow_fail=False):
sig = self.impl.kill_signal(clean_shutdown)
for pid in self.pids(node):
node.account.signal(pid, sig, allow_fail)
def pids(self, node):
return self.impl.pids(node)
def alive(self, node):
return len(self.pids(node)) > 0
@property
def last_acked_offsets(self):
with self.lock:
return self._last_acked_offsets
@property
def acked(self):
with self.lock:
return self.acked_values
@property
def acked_by_partition(self):
with self.lock:
return self.acked_values_by_partition
@property
def not_acked(self):
with self.lock:
return self.not_acked_values
@property
def num_acked(self):
with self.lock:
return len(self.acked_values)
@property
def num_not_acked(self):
with self.lock:
return len(self.not_acked_values)
def each_produced_at_least(self, count):
with self.lock:
for idx in range(1, self.num_nodes + 1):
if self.produced_count.get(idx) is None or self.produced_count[idx] < count:
return False
return True
def stop_node(self, node):
# There is a race condition on shutdown if using `max_messages` since the
# VerifiableProducer will shutdown automatically when all messages have been
# written. In this case, the process will be gone and the signal will fail.
allow_fail = self.max_messages > 0
self.kill_node(node, clean_shutdown=True, allow_fail=allow_fail)
stopped = self.wait_node(node, timeout_sec=self.stop_timeout_sec)
assert stopped, "Node %s: did not stop within the specified timeout of %s seconds" % \
(str(node.account), str(self.stop_timeout_sec))
def clean_node(self, node):
self.kill_node(node, clean_shutdown=False, allow_fail=False)
node.account.ssh("rm -rf " + self.PERSISTENT_ROOT, allow_fail=False)
self.security_config.clean_node(node)
def try_parse_json(self, string):
"""Try to parse a string as json. Return None if not parseable."""
try:
record = json.loads(string)
return record
except ValueError:
self.logger.debug("Could not parse as json: %s" % str(string))
return None
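
The `_worker` loop above drives all of the producer bookkeeping from the JSON events printed by the VerifiableProducer tool. Below is a minimal standalone sketch of that event handling, assuming stdout lines shaped like the `producer_send_success` / `producer_send_error` events parsed above (the sample values are illustrative, not real output):

```
import json

def tally_events(lines):
    """Tally acked/not-acked values and last acked offsets from JSON event lines."""
    acked, not_acked, last_offsets = [], [], {}
    for line in lines:
        try:
            data = json.loads(line.strip())
        except ValueError:
            continue  # non-JSON output is ignored, mirroring try_parse_json
        if data["name"] == "producer_send_success":
            acked.append(data["value"])
            last_offsets[(data["topic"], data["partition"])] = data["offset"]
        elif data["name"] == "producer_send_error":
            not_acked.append(data["value"])
    return acked, not_acked, last_offsets

events = ['{"name": "producer_send_success", "topic": "t", "partition": 0, "offset": 5, "value": "5"}']
print(tally_events(events))
```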

View File

@@ -0,0 +1,251 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import re
import time
from ducktape.services.service import Service
from ducktape.utils.util import wait_until
from ducktape.cluster.remoteaccount import RemoteCommandError
from kafkatest.directory_layout.kafka_path import KafkaPathResolverMixin
from kafkatest.services.security.security_config import SecurityConfig
from kafkatest.version import DEV_BRANCH
class ZookeeperService(KafkaPathResolverMixin, Service):
ROOT = "/mnt/zookeeper"
DATA = os.path.join(ROOT, "data")
HEAP_DUMP_FILE = os.path.join(ROOT, "zk_heap_dump.bin")
logs = {
"zk_log": {
"path": "%s/zk.log" % ROOT,
"collect_default": True},
"zk_data": {
"path": DATA,
"collect_default": False},
"zk_heap_dump_file": {
"path": HEAP_DUMP_FILE,
"collect_default": True}
}
def __init__(self, context, num_nodes, zk_sasl = False, zk_client_port = True, zk_client_secure_port = False,
zk_tls_encrypt_only = False):
"""
:type context: ducktape.tests.test.TestContext
"""
self.kafka_opts = ""
self.zk_sasl = zk_sasl
if not zk_client_port and not zk_client_secure_port:
raise Exception("Cannot disable both ZK clientPort and clientSecurePort")
self.zk_client_port = zk_client_port
self.zk_client_secure_port = zk_client_secure_port
self.zk_tls_encrypt_only = zk_tls_encrypt_only
super(ZookeeperService, self).__init__(context, num_nodes)
@property
def security_config(self):
return SecurityConfig(self.context, zk_sasl=self.zk_sasl, zk_tls=self.zk_client_secure_port)
@property
def security_system_properties(self):
return "-Dzookeeper.authProvider.sasl=org.apache.zookeeper.server.auth.SASLAuthenticationProvider " \
"-DjaasLoginRenew=3600000 " \
"-Djava.security.auth.login.config=%s " \
"-Djava.security.krb5.conf=%s " % (self.security_config.JAAS_CONF_PATH, self.security_config.KRB5CONF_PATH)
@property
def zk_principals(self):
return " zkclient " + ' '.join(['zookeeper/' + zk_node.account.hostname for zk_node in self.nodes])
def restart_cluster(self):
for node in self.nodes:
self.restart_node(node)
def restart_node(self, node):
"""Restart the given node."""
self.stop_node(node)
self.start_node(node)
def start_node(self, node):
idx = self.idx(node)
self.logger.info("Starting ZK node %d on %s", idx, node.account.hostname)
node.account.ssh("mkdir -p %s" % ZookeeperService.DATA)
node.account.ssh("echo %d > %s/myid" % (idx, ZookeeperService.DATA))
self.security_config.setup_node(node)
config_file = self.render('zookeeper.properties')
self.logger.info("zookeeper.properties:")
self.logger.info(config_file)
node.account.create_file("%s/zookeeper.properties" % ZookeeperService.ROOT, config_file)
heap_kafka_opts = "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=%s" % self.logs["zk_heap_dump_file"]["path"]
other_kafka_opts = self.kafka_opts + ' ' + self.security_system_properties \
if self.security_config.zk_sasl else self.kafka_opts
start_cmd = "export KAFKA_OPTS=\"%s %s\";" % (heap_kafka_opts, other_kafka_opts)
start_cmd += "%s " % self.path.script("zookeeper-server-start.sh", node)
start_cmd += "%s/zookeeper.properties &>> %s &" % (ZookeeperService.ROOT, self.logs["zk_log"]["path"])
node.account.ssh(start_cmd)
wait_until(lambda: self.listening(node), timeout_sec=30, err_msg="Zookeeper node failed to start")
def listening(self, node):
try:
port = 2181 if self.zk_client_port else 2182
cmd = "nc -z %s %s" % (node.account.hostname, port)
node.account.ssh_output(cmd, allow_fail=False)
self.logger.debug("Zookeeper started accepting connections at: '%s:%s')", node.account.hostname, port)
return True
except (RemoteCommandError, ValueError) as e:
return False
def pids(self, node):
return node.account.java_pids(self.java_class_name())
def alive(self, node):
return len(self.pids(node)) > 0
def stop_node(self, node):
idx = self.idx(node)
self.logger.info("Stopping %s node %d on %s" % (type(self).__name__, idx, node.account.hostname))
node.account.kill_java_processes(self.java_class_name(), allow_fail=False)
node.account.kill_java_processes(self.java_cli_class_name(), allow_fail=False)
wait_until(lambda: not self.alive(node), timeout_sec=5, err_msg="Timed out waiting for zookeeper to stop.")
def clean_node(self, node):
self.logger.info("Cleaning ZK node %d on %s", self.idx(node), node.account.hostname)
if self.alive(node):
self.logger.warn("%s %s was still alive at cleanup time. Killing forcefully..." %
(self.__class__.__name__, node.account))
node.account.kill_java_processes(self.java_class_name(),
clean_shutdown=False, allow_fail=True)
node.account.kill_java_processes(self.java_cli_class_name(),
clean_shutdown=False, allow_fail=False)
node.account.ssh("rm -rf -- %s" % ZookeeperService.ROOT, allow_fail=False)
# force_tls is a necessary option for the case where we define both encrypted and non-encrypted ports
def connect_setting(self, chroot=None, force_tls=False):
if chroot and not chroot.startswith("/"):
raise Exception("ZK chroot must start with '/', invalid chroot: %s" % chroot)
chroot = '' if chroot is None else chroot
return ','.join([node.account.hostname + (':2182' if not self.zk_client_port or force_tls else ':2181') + chroot
for node in self.nodes])
def zkTlsConfigFileOption(self, forZooKeeperMain=False):
if not self.zk_client_secure_port:
return ""
return ("-zk-tls-config-file " if forZooKeeperMain else "--zk-tls-config-file ") + \
(SecurityConfig.ZK_CLIENT_TLS_ENCRYPT_ONLY_CONFIG_PATH if self.zk_tls_encrypt_only else SecurityConfig.ZK_CLIENT_MUTUAL_AUTH_CONFIG_PATH)
#
# This call is used to simulate a rolling upgrade to enable/disable
# the use of ZooKeeper ACLs.
#
def zookeeper_migration(self, node, zk_acl):
la_migra_cmd = "export KAFKA_OPTS=\"%s\";" % \
self.security_system_properties if self.security_config.zk_sasl else ""
la_migra_cmd += "%s --zookeeper.acl=%s --zookeeper.connect=%s %s" % \
(self.path.script("zookeeper-security-migration.sh", node), zk_acl,
self.connect_setting(force_tls=self.zk_client_secure_port),
self.zkTlsConfigFileOption())
node.account.ssh(la_migra_cmd)
def _check_chroot(self, chroot):
if chroot and not chroot.startswith("/"):
raise Exception("ZK chroot must start with '/', invalid chroot: %s" % chroot)
def query(self, path, chroot=None):
"""
Queries zookeeper for data associated with 'path' and returns all fields in the schema
"""
self._check_chroot(chroot)
chroot_path = ('' if chroot is None else chroot) + path
kafka_run_class = self.path.script("kafka-run-class.sh", DEV_BRANCH)
cmd = "%s %s -server %s %s get %s" % \
(kafka_run_class, self.java_cli_class_name(), self.connect_setting(force_tls=self.zk_client_secure_port),
self.zkTlsConfigFileOption(True),
chroot_path)
self.logger.debug(cmd)
node = self.nodes[0]
result = None
for line in node.account.ssh_capture(cmd, allow_fail=True):
# loop through all lines in the output, but only hold on to the first match
if result is None:
match = re.match("^({.+})$", line)
if match is not None:
result = match.groups()[0]
return result
def create(self, path, chroot=None, value=""):
"""
Create a znode at the given path
"""
self._check_chroot(chroot)
chroot_path = ('' if chroot is None else chroot) + path
kafka_run_class = self.path.script("kafka-run-class.sh", DEV_BRANCH)
cmd = "%s %s -server %s %s create %s '%s'" % \
(kafka_run_class, self.java_cli_class_name(), self.connect_setting(force_tls=self.zk_client_secure_port),
self.zkTlsConfigFileOption(True),
chroot_path, value)
self.logger.debug(cmd)
output = self.nodes[0].account.ssh_output(cmd)
self.logger.debug(output)
def describe(self, topic):
"""
Describe the given topic using the ConfigCommand CLI
"""
kafka_run_class = self.path.script("kafka-run-class.sh", DEV_BRANCH)
cmd = "%s kafka.admin.ConfigCommand --zookeeper %s %s --describe --topic %s" % \
(kafka_run_class, self.connect_setting(force_tls=self.zk_client_secure_port),
self.zkTlsConfigFileOption(),
topic)
self.logger.debug(cmd)
output = self.nodes[0].account.ssh_output(cmd)
self.logger.debug(output)
def list_acls(self, topic):
"""
List ACLs for the given topic using the AclCommand CLI
"""
kafka_run_class = self.path.script("kafka-run-class.sh", DEV_BRANCH)
cmd = "%s kafka.admin.AclCommand --authorizer-properties zookeeper.connect=%s %s --list --topic %s" % \
(kafka_run_class, self.connect_setting(force_tls=self.zk_client_secure_port),
self.zkTlsConfigFileOption(),
topic)
self.logger.debug(cmd)
output = self.nodes[0].account.ssh_output(cmd)
self.logger.debug(output)
def java_class_name(self):
""" The class name of the Zookeeper quorum peers. """
return "org.apache.zookeeper.server.quorum.QuorumPeerMain"
def java_cli_class_name(self):
""" The class name of the Zookeeper tool within Kafka. """
return "org.apache.zookeeper.ZooKeeperMainWithTlsSupportForKafka"

View File

@@ -0,0 +1,14 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

View File

@@ -0,0 +1,14 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

View File

@@ -0,0 +1,123 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import errno
import os
import time
from random import randint
from ducktape.mark import parametrize
from ducktape.tests.test import TestContext
from kafkatest.services.zookeeper import ZookeeperService
from kafkatest.services.kafka import KafkaService
from ducktape.tests.test import Test
from kafkatest.version import DEV_BRANCH, LATEST_0_10_0, LATEST_0_10_1, LATEST_0_10_2, LATEST_0_11_0, LATEST_1_0, LATEST_1_1, LATEST_2_0, LATEST_2_1, LATEST_2_2, LATEST_2_3, LATEST_2_4, V_0_11_0_0, V_0_10_1_0, KafkaVersion
def get_broker_features(broker_version):
features = {}
if broker_version < V_0_10_1_0:
features["create-topics-supported"] = False
features["offsets-for-times-supported"] = False
features["cluster-id-supported"] = False
features["expect-record-too-large-exception"] = True
else:
features["create-topics-supported"] = True
features["offsets-for-times-supported"] = True
features["cluster-id-supported"] = True
features["expect-record-too-large-exception"] = False
if broker_version < V_0_11_0_0:
features["describe-acls-supported"] = False
else:
features["describe-acls-supported"] = True
return features
def run_command(node, cmd, ssh_log_file):
with open(ssh_log_file, 'w') as f:
f.write("Running %s\n" % cmd)
try:
for line in node.account.ssh_capture(cmd):
f.write(line)
except Exception as e:
f.write("** Command failed!")
print(e)
raise
class ClientCompatibilityFeaturesTest(Test):
"""
Tests clients for the presence or absence of specific features when communicating with brokers with various
versions. Relies on ClientCompatibilityTest.java for much of the functionality.
"""
def __init__(self, test_context):
""":type test_context: ducktape.tests.test.TestContext"""
super(ClientCompatibilityFeaturesTest, self).__init__(test_context=test_context)
self.zk = ZookeeperService(test_context, num_nodes=3)
# Generate a unique topic name
topic_name = "client_compat_features_topic_%d%d" % (int(time.time()), randint(0, 2147483647))
self.topics = { topic_name: {
"partitions": 1, # Use only one partition to avoid worrying about ordering
"replication-factor": 3
}}
self.kafka = KafkaService(test_context, num_nodes=3, zk=self.zk, topics=self.topics)
def invoke_compatibility_program(self, features):
# Run the compatibility test on the first Kafka node.
node = self.zk.nodes[0]
cmd = ("%s org.apache.kafka.tools.ClientCompatibilityTest "
"--bootstrap-server %s "
"--num-cluster-nodes %d "
"--topic %s " % (self.zk.path.script("kafka-run-class.sh", node),
self.kafka.bootstrap_servers(),
len(self.kafka.nodes),
list(self.topics.keys())[0]))
for k, v in features.items():
cmd = cmd + ("--%s %s " % (k, v))
results_dir = TestContext.results_dir(self.test_context, 0)
try:
os.makedirs(results_dir)
except OSError as e:
if e.errno == errno.EEXIST and os.path.isdir(results_dir):
pass
else:
raise
ssh_log_file = "%s/%s" % (results_dir, "client_compatibility_test_output.txt")
try:
self.logger.info("Running %s" % cmd)
run_command(node, cmd, ssh_log_file)
except Exception as e:
self.logger.info("** Command failed. See %s for log messages." % ssh_log_file)
raise
@parametrize(broker_version=str(DEV_BRANCH))
@parametrize(broker_version=str(LATEST_0_10_0))
@parametrize(broker_version=str(LATEST_0_10_1))
@parametrize(broker_version=str(LATEST_0_10_2))
@parametrize(broker_version=str(LATEST_0_11_0))
@parametrize(broker_version=str(LATEST_1_0))
@parametrize(broker_version=str(LATEST_1_1))
@parametrize(broker_version=str(LATEST_2_0))
@parametrize(broker_version=str(LATEST_2_1))
@parametrize(broker_version=str(LATEST_2_2))
@parametrize(broker_version=str(LATEST_2_3))
@parametrize(broker_version=str(LATEST_2_4))
def run_compatibility_test(self, broker_version):
self.zk.start()
self.kafka.set_version(KafkaVersion(broker_version))
self.kafka.start()
features = get_broker_features(broker_version)
self.invoke_compatibility_program(features)
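
The feature map returned by `get_broker_features` is expanded into one `--<feature> <value>` flag pair per entry, as in the loop in `invoke_compatibility_program`. A sketch of that expansion (the command prefix and bootstrap address here are illustrative):

```
features = {
    "create-topics-supported": True,
    "offsets-for-times-supported": True,
    "cluster-id-supported": True,
    "expect-record-too-large-exception": False,
    "describe-acls-supported": True,
}
cmd = "org.apache.kafka.tools.ClientCompatibilityTest --bootstrap-server localhost:9092 "
for k, v in features.items():
    cmd += "--%s %s " % (k, v)
print(cmd)
```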

View File

@@ -0,0 +1,84 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ducktape.mark import parametrize
from ducktape.utils.util import wait_until
from kafkatest.services.zookeeper import ZookeeperService
from kafkatest.services.kafka import KafkaService
from kafkatest.services.verifiable_producer import VerifiableProducer
from kafkatest.services.console_consumer import ConsoleConsumer
from kafkatest.tests.produce_consume_validate import ProduceConsumeValidateTest
from kafkatest.utils import is_int_with_prefix
from kafkatest.version import DEV_BRANCH, LATEST_0_10_0, LATEST_0_10_1, LATEST_0_10_2, LATEST_0_11_0, LATEST_1_0, LATEST_1_1, LATEST_2_0, LATEST_2_1, LATEST_2_2, LATEST_2_3, LATEST_2_4, KafkaVersion
class ClientCompatibilityProduceConsumeTest(ProduceConsumeValidateTest):
"""
These tests validate that we can use a new client to produce and consume from older brokers.
"""
def __init__(self, test_context):
""":type test_context: ducktape.tests.test.TestContext"""
super(ClientCompatibilityProduceConsumeTest, self).__init__(test_context=test_context)
self.topic = "test_topic"
self.zk = ZookeeperService(test_context, num_nodes=3)
self.kafka = KafkaService(test_context, num_nodes=3, zk=self.zk, topics={self.topic:{
"partitions": 10,
"replication-factor": 2}})
self.num_partitions = 10
self.timeout_sec = 60
self.producer_throughput = 1000
self.num_producers = 2
self.messages_per_producer = 1000
self.num_consumers = 1
def setUp(self):
self.zk.start()
def min_cluster_size(self):
# Override this since we're adding services outside of the constructor
return super(ClientCompatibilityProduceConsumeTest, self).min_cluster_size() + self.num_producers + self.num_consumers
@parametrize(broker_version=str(DEV_BRANCH))
@parametrize(broker_version=str(LATEST_0_10_0))
@parametrize(broker_version=str(LATEST_0_10_1))
@parametrize(broker_version=str(LATEST_0_10_2))
@parametrize(broker_version=str(LATEST_0_11_0))
@parametrize(broker_version=str(LATEST_1_0))
@parametrize(broker_version=str(LATEST_1_1))
@parametrize(broker_version=str(LATEST_2_0))
@parametrize(broker_version=str(LATEST_2_1))
@parametrize(broker_version=str(LATEST_2_2))
@parametrize(broker_version=str(LATEST_2_3))
@parametrize(broker_version=str(LATEST_2_4))
def test_produce_consume(self, broker_version):
print("running producer_consumer_compat with broker_version = %s" % broker_version)
self.kafka.set_version(KafkaVersion(broker_version))
self.kafka.security_protocol = "PLAINTEXT"
self.kafka.interbroker_security_protocol = self.kafka.security_protocol
self.producer = VerifiableProducer(self.test_context, self.num_producers, self.kafka,
self.topic, throughput=self.producer_throughput,
message_validator=is_int_with_prefix)
self.consumer = ConsoleConsumer(self.test_context, self.num_consumers, self.kafka, self.topic,
consumer_timeout_ms=60000,
message_validator=is_int_with_prefix)
self.kafka.start()
self.run_produce_consume_validate(lambda: wait_until(
lambda: self.producer.each_produced_at_least(self.messages_per_producer) == True,
timeout_sec=120, backoff_sec=1,
err_msg="Producer did not produce all messages in reasonable amount of time"))

View File

@@ -0,0 +1,87 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ducktape.mark import parametrize
from ducktape.utils.util import wait_until
from ducktape.mark.resource import cluster
from kafkatest.services.zookeeper import ZookeeperService
from kafkatest.services.kafka import KafkaService
from kafkatest.services.verifiable_producer import VerifiableProducer
from kafkatest.services.console_consumer import ConsoleConsumer
from kafkatest.tests.produce_consume_validate import ProduceConsumeValidateTest
from kafkatest.utils import is_int_with_prefix
class CompressionTest(ProduceConsumeValidateTest):
"""
These tests validate produce / consume for compressed topics.
"""
COMPRESSION_TYPES = ["snappy", "gzip", "lz4", "zstd", "none"]
def __init__(self, test_context):
""":type test_context: ducktape.tests.test.TestContext"""
super(CompressionTest, self).__init__(test_context=test_context)
self.topic = "test_topic"
self.zk = ZookeeperService(test_context, num_nodes=1)
self.kafka = KafkaService(test_context, num_nodes=1, zk=self.zk, topics={self.topic: {
"partitions": 10,
"replication-factor": 1}})
self.num_partitions = 10
self.timeout_sec = 60
self.producer_throughput = 1000
self.num_producers = len(self.COMPRESSION_TYPES)
self.messages_per_producer = 1000
self.num_consumers = 1
def setUp(self):
self.zk.start()
def min_cluster_size(self):
# Override this since we're adding services outside of the constructor
return super(CompressionTest, self).min_cluster_size() + self.num_producers + self.num_consumers
@cluster(num_nodes=8)
@parametrize(compression_types=COMPRESSION_TYPES)
def test_compressed_topic(self, compression_types):
"""Test produce => consume => validate for compressed topics
Setup: 1 zk, 1 kafka node, 1 topic with partitions=10, replication-factor=1
compression_types parameter gives a list of compression types (or no compression if
"none"). Each producer in a VerifiableProducer group (num_producers = number of compression
types) will use a compression type from the list based on producer's index in the group.
- Produce messages in the background
- Consume messages in the background
- Stop producing, and finish consuming
- Validate that every acked message was consumed
"""
self.kafka.security_protocol = "PLAINTEXT"
self.kafka.interbroker_security_protocol = self.kafka.security_protocol
self.producer = VerifiableProducer(self.test_context, self.num_producers, self.kafka,
self.topic, throughput=self.producer_throughput,
message_validator=is_int_with_prefix,
compression_types=compression_types)
self.consumer = ConsoleConsumer(self.test_context, self.num_consumers, self.kafka, self.topic,
consumer_timeout_ms=60000, message_validator=is_int_with_prefix)
self.kafka.start()
self.run_produce_consume_validate(lambda: wait_until(
lambda: self.producer.each_produced_at_least(self.messages_per_producer) == True,
timeout_sec=120, backoff_sec=1,
err_msg="Producer did not produce all messages in reasonable amount of time"))

View File

@@ -0,0 +1,86 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ducktape.mark.resource import cluster
from kafkatest.tests.verifiable_consumer_test import VerifiableConsumerTest
from kafkatest.services.kafka import TopicPartition
class ConsumerRollingUpgradeTest(VerifiableConsumerTest):
TOPIC = "test_topic"
NUM_PARTITIONS = 4
RANGE = "org.apache.kafka.clients.consumer.RangeAssignor"
ROUND_ROBIN = "org.apache.kafka.clients.consumer.RoundRobinAssignor"
def __init__(self, test_context):
super(ConsumerRollingUpgradeTest, self).__init__(test_context, num_consumers=2, num_producers=0,
num_zk=1, num_brokers=1, topics={
self.TOPIC : { 'partitions': self.NUM_PARTITIONS, 'replication-factor': 1 }
})
def _verify_range_assignment(self, consumer):
# range assignment should give us two partition sets: (0, 1) and (2, 3)
assignment = set([frozenset(partitions) for partitions in consumer.current_assignment().values()])
assert assignment == set([
frozenset([TopicPartition(self.TOPIC, 0), TopicPartition(self.TOPIC, 1)]),
frozenset([TopicPartition(self.TOPIC, 2), TopicPartition(self.TOPIC, 3)])]), \
"Mismatched assignment: %s" % assignment
def _verify_roundrobin_assignment(self, consumer):
assignment = set([frozenset(x) for x in consumer.current_assignment().values()])
assert assignment == set([
frozenset([TopicPartition(self.TOPIC, 0), TopicPartition(self.TOPIC, 2)]),
frozenset([TopicPartition(self.TOPIC, 1), TopicPartition(self.TOPIC, 3)])]), \
"Mismatched assignment: %s" % assignment
@cluster(num_nodes=4)
def rolling_update_test(self):
"""
Verify rolling updates of partition assignment strategies works correctly. In this
test, we use a rolling restart to change the group's assignment strategy from "range"
to "roundrobin." We verify after every restart that all members are still in the group
and that the correct assignment strategy was used.
"""
# initialize the consumer using range assignment
consumer = self.setup_consumer(self.TOPIC, assignment_strategy=self.RANGE)
consumer.start()
self.await_all_members(consumer)
self._verify_range_assignment(consumer)
# change consumer configuration to prefer round-robin assignment, but still support range assignment
consumer.assignment_strategy = self.ROUND_ROBIN + "," + self.RANGE
# restart one of the nodes and verify that we are still using range assignment
consumer.stop_node(consumer.nodes[0])
consumer.start_node(consumer.nodes[0])
self.await_all_members(consumer)
self._verify_range_assignment(consumer)
# now restart the other node and verify that we have switched to round-robin
consumer.stop_node(consumer.nodes[1])
consumer.start_node(consumer.nodes[1])
self.await_all_members(consumer)
self._verify_roundrobin_assignment(consumer)
# if we want, we can now drop support for range assignment
consumer.assignment_strategy = self.ROUND_ROBIN
for node in consumer.nodes:
consumer.stop_node(node)
consumer.start_node(node)
self.await_all_members(consumer)
self._verify_roundrobin_assignment(consumer)
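
The two `_verify_*_assignment` helpers compare the group's assignment as a set of frozensets, so the ordering of consumers and partitions is irrelevant. A minimal sketch of the range check, using a hypothetical consumer-to-partitions mapping in place of `consumer.current_assignment()`:

```
TOPIC = "test_topic"
# Hypothetical mapping standing in for consumer.current_assignment()
current_assignment = {
    "consumer-1": [(TOPIC, 0), (TOPIC, 1)],
    "consumer-2": [(TOPIC, 2), (TOPIC, 3)],
}
assignment = set(frozenset(partitions) for partitions in current_assignment.values())
expected = set([frozenset([(TOPIC, 0), (TOPIC, 1)]),
                frozenset([(TOPIC, 2), (TOPIC, 3)])])
assert assignment == expected, "Mismatched assignment: %s" % assignment
```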

View File

@@ -0,0 +1,430 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ducktape.mark import matrix
from ducktape.utils.util import wait_until
from ducktape.mark.resource import cluster
from kafkatest.tests.verifiable_consumer_test import VerifiableConsumerTest
from kafkatest.services.kafka import TopicPartition
import signal
class OffsetValidationTest(VerifiableConsumerTest):
TOPIC = "test_topic"
NUM_PARTITIONS = 1
def __init__(self, test_context):
super(OffsetValidationTest, self).__init__(test_context, num_consumers=3, num_producers=1,
num_zk=1, num_brokers=2, topics={
self.TOPIC : { 'partitions': self.NUM_PARTITIONS, 'replication-factor': 2 }
})
def rolling_bounce_consumers(self, consumer, keep_alive=0, num_bounces=5, clean_shutdown=True):
for _ in range(num_bounces):
for node in consumer.nodes[keep_alive:]:
consumer.stop_node(node, clean_shutdown)
wait_until(lambda: len(consumer.dead_nodes()) == 1,
timeout_sec=self.session_timeout_sec+5,
err_msg="Timed out waiting for the consumer to shutdown")
consumer.start_node(node)
self.await_all_members(consumer)
self.await_consumed_messages(consumer)
def bounce_all_consumers(self, consumer, keep_alive=0, num_bounces=5, clean_shutdown=True):
for _ in range(num_bounces):
for node in consumer.nodes[keep_alive:]:
consumer.stop_node(node, clean_shutdown)
wait_until(lambda: len(consumer.dead_nodes()) == self.num_consumers - keep_alive, timeout_sec=10,
err_msg="Timed out waiting for the consumers to shutdown")
for node in consumer.nodes[keep_alive:]:
consumer.start_node(node)
self.await_all_members(consumer)
self.await_consumed_messages(consumer)
def rolling_bounce_brokers(self, consumer, num_bounces=5, clean_shutdown=True):
for _ in range(num_bounces):
for node in self.kafka.nodes:
self.kafka.restart_node(node, clean_shutdown=True)
self.await_all_members(consumer)
self.await_consumed_messages(consumer)
def setup_consumer(self, topic, **kwargs):
# collect verifiable consumer events since this makes debugging much easier
consumer = super(OffsetValidationTest, self).setup_consumer(topic, **kwargs)
self.mark_for_collect(consumer, 'verifiable_consumer_stdout')
return consumer
@cluster(num_nodes=7)
def test_broker_rolling_bounce(self):
"""
Verify correct consumer behavior when the brokers are consecutively restarted.
Setup: single Kafka cluster with one producer writing messages to a single topic with one
partition, and a set of consumers in the same group reading from the same topic.
- Start a producer which continues producing new messages throughout the test.
- Start up the consumers and wait until they've joined the group.
- In a loop, restart each broker consecutively, waiting for the group to stabilize between
each broker restart.
- Verify delivery semantics according to the failure type and that the broker bounces
did not cause unexpected group rebalances.
"""
partition = TopicPartition(self.TOPIC, 0)
producer = self.setup_producer(self.TOPIC)
consumer = self.setup_consumer(self.TOPIC)
producer.start()
self.await_produced_messages(producer)
consumer.start()
self.await_all_members(consumer)
num_rebalances = consumer.num_rebalances()
# TODO: make this test work with hard shutdowns, which probably requires
# pausing before the node is restarted to ensure that any ephemeral
# nodes have time to expire
self.rolling_bounce_brokers(consumer, clean_shutdown=True)
unexpected_rebalances = consumer.num_rebalances() - num_rebalances
assert unexpected_rebalances == 0, \
"Broker rolling bounce caused %d unexpected group rebalances" % unexpected_rebalances
consumer.stop_all()
assert consumer.current_position(partition) == consumer.total_consumed(), \
"Total consumed records %d did not match consumed position %d" % \
(consumer.total_consumed(), consumer.current_position(partition))
@cluster(num_nodes=7)
@matrix(clean_shutdown=[True], bounce_mode=["all", "rolling"])
def test_consumer_bounce(self, clean_shutdown, bounce_mode):
"""
Verify correct consumer behavior when the consumers in the group are consecutively restarted.
Setup: single Kafka cluster with one producer and a set of consumers in one group.
- Start a producer which continues producing new messages throughout the test.
- Start up the consumers and wait until they've joined the group.
- In a loop, restart each consumer, waiting for each one to rejoin the group before
restarting the rest.
- Verify delivery semantics according to the failure type.
"""
partition = TopicPartition(self.TOPIC, 0)
producer = self.setup_producer(self.TOPIC)
consumer = self.setup_consumer(self.TOPIC)
producer.start()
self.await_produced_messages(producer)
consumer.start()
self.await_all_members(consumer)
if bounce_mode == "all":
self.bounce_all_consumers(consumer, clean_shutdown=clean_shutdown)
else:
self.rolling_bounce_consumers(consumer, clean_shutdown=clean_shutdown)
consumer.stop_all()
if clean_shutdown:
# if the total records consumed matches the current position, we haven't seen any duplicates
# this can only be guaranteed with a clean shutdown
assert consumer.current_position(partition) == consumer.total_consumed(), \
"Total consumed records %d did not match consumed position %d" % \
(consumer.total_consumed(), consumer.current_position(partition))
else:
# we may have duplicates in a hard failure
assert consumer.current_position(partition) <= consumer.total_consumed(), \
"Current position %d greater than the total number of consumed records %d" % \
(consumer.current_position(partition), consumer.total_consumed())
@cluster(num_nodes=7)
@matrix(clean_shutdown=[True], static_membership=[True, False], bounce_mode=["all", "rolling"], num_bounces=[5])
def test_static_consumer_bounce(self, clean_shutdown, static_membership, bounce_mode, num_bounces):
"""
Verify correct static consumer behavior when the consumers in the group are restarted. In order to make
sure the behavior of static members are different from dynamic ones, we take both static and dynamic
membership into this test suite.
Setup: single Kafka cluster with one producer and a set of consumers in one group.
- Start a producer which continues producing new messages throughout the test.
- Start up the consumers as static/dynamic members and wait until they've joined the group.
- In a loop, restart each consumer except the first member (note: may not be the leader), and expect no rebalance triggered
during this process if the group is in static membership.
"""
partition = TopicPartition(self.TOPIC, 0)
producer = self.setup_producer(self.TOPIC)
producer.start()
self.await_produced_messages(producer)
self.session_timeout_sec = 60
consumer = self.setup_consumer(self.TOPIC, static_membership=static_membership)
consumer.start()
self.await_all_members(consumer)
num_revokes_before_bounce = consumer.num_revokes_for_alive()
num_keep_alive = 1
if bounce_mode == "all":
self.bounce_all_consumers(consumer, keep_alive=num_keep_alive, num_bounces=num_bounces)
else:
self.rolling_bounce_consumers(consumer, keep_alive=num_keep_alive, num_bounces=num_bounces)
num_revokes_after_bounce = consumer.num_revokes_for_alive() - num_revokes_before_bounce
check_condition = num_revokes_after_bounce != 0
# under static membership, the live consumer shall not revoke any current running partitions,
# since there is no global rebalance being triggered.
if static_membership:
check_condition = num_revokes_after_bounce == 0
assert check_condition, \
"Total revoked count %d does not match the expectation of having 0 revokes as %d" % \
(num_revokes_after_bounce, check_condition)
consumer.stop_all()
if clean_shutdown:
# if the total records consumed matches the current position, we haven't seen any duplicates
# this can only be guaranteed with a clean shutdown
assert consumer.current_position(partition) == consumer.total_consumed(), \
"Total consumed records %d did not match consumed position %d" % \
(consumer.total_consumed(), consumer.current_position(partition))
else:
# we may have duplicates in a hard failure
assert consumer.current_position(partition) <= consumer.total_consumed(), \
"Current position %d greater than the total number of consumed records %d" % \
(consumer.current_position(partition), consumer.total_consumed())
@cluster(num_nodes=10)
@matrix(num_conflict_consumers=[1, 2], fencing_stage=["stable", "all"])
def test_fencing_static_consumer(self, num_conflict_consumers, fencing_stage):
"""
Verify correct static consumer behavior when there are conflicting consumers with same group.instance.id.
- Start a producer which continues producing new messages throughout the test.
- Start up the consumers as static members and wait until they've joined the group. Some conflicting consumers will be
configured with the same group.instance.id.
- Let normal consumers and fencing consumers start at the same time, and expect only unique consumers left.
"""
partition = TopicPartition(self.TOPIC, 0)
producer = self.setup_producer(self.TOPIC)
producer.start()
self.await_produced_messages(producer)
self.session_timeout_sec = 60
consumer = self.setup_consumer(self.TOPIC, static_membership=True)
self.num_consumers = num_conflict_consumers
conflict_consumer = self.setup_consumer(self.TOPIC, static_membership=True)
# wait original set of consumer to stable stage before starting conflict members.
if fencing_stage == "stable":
consumer.start()
self.await_members(consumer, len(consumer.nodes))
conflict_consumer.start()
self.await_members(conflict_consumer, num_conflict_consumers)
self.await_members(consumer, len(consumer.nodes) - num_conflict_consumers)
assert len(consumer.dead_nodes()) == num_conflict_consumers
else:
consumer.start()
conflict_consumer.start()
wait_until(lambda: len(consumer.joined_nodes()) + len(conflict_consumer.joined_nodes()) == len(consumer.nodes),
timeout_sec=self.session_timeout_sec,
err_msg="Timed out waiting for consumers to join, expected total %d joined, but only see %d joined from"
"normal consumer group and %d from conflict consumer group" % \
(len(consumer.nodes), len(consumer.joined_nodes()), len(conflict_consumer.joined_nodes()))
)
wait_until(lambda: len(consumer.dead_nodes()) + len(conflict_consumer.dead_nodes()) == len(conflict_consumer.nodes),
timeout_sec=self.session_timeout_sec,
err_msg="Timed out waiting for fenced consumers to die, expected total %d dead, but only see %d dead in"
"normal consumer group and %d dead in conflict consumer group" % \
(len(conflict_consumer.nodes), len(consumer.dead_nodes()), len(conflict_consumer.dead_nodes()))
)
@cluster(num_nodes=7)
@matrix(clean_shutdown=[True], enable_autocommit=[True, False])
def test_consumer_failure(self, clean_shutdown, enable_autocommit):
partition = TopicPartition(self.TOPIC, 0)
consumer = self.setup_consumer(self.TOPIC, enable_autocommit=enable_autocommit)
producer = self.setup_producer(self.TOPIC)
consumer.start()
self.await_all_members(consumer)
partition_owner = consumer.owner(partition)
assert partition_owner is not None
# startup the producer and ensure that some records have been written
producer.start()
self.await_produced_messages(producer)
# stop the partition owner and await its shutdown
consumer.kill_node(partition_owner, clean_shutdown=clean_shutdown)
wait_until(lambda: len(consumer.joined_nodes()) == (self.num_consumers - 1) and consumer.owner(partition) != None,
timeout_sec=self.session_timeout_sec*2+5,
err_msg="Timed out waiting for consumer to close")
# ensure that the remaining consumer does some work after rebalancing
self.await_consumed_messages(consumer, min_messages=1000)
consumer.stop_all()
if clean_shutdown:
# if the total records consumed matches the current position, we haven't seen any duplicates
# this can only be guaranteed with a clean shutdown
assert consumer.current_position(partition) == consumer.total_consumed(), \
"Total consumed records %d did not match consumed position %d" % \
(consumer.total_consumed(), consumer.current_position(partition))
else:
# we may have duplicates in a hard failure
assert consumer.current_position(partition) <= consumer.total_consumed(), \
"Current position %d greater than the total number of consumed records %d" % \
(consumer.current_position(partition), consumer.total_consumed())
# if autocommit is not turned on, we can also verify the last committed offset
if not enable_autocommit:
assert consumer.last_commit(partition) == consumer.current_position(partition), \
"Last committed offset %d did not match last consumed position %d" % \
(consumer.last_commit(partition), consumer.current_position(partition))
@cluster(num_nodes=7)
@matrix(clean_shutdown=[True, False], enable_autocommit=[True, False])
def test_broker_failure(self, clean_shutdown, enable_autocommit):
partition = TopicPartition(self.TOPIC, 0)
consumer = self.setup_consumer(self.TOPIC, enable_autocommit=enable_autocommit)
producer = self.setup_producer(self.TOPIC)
producer.start()
consumer.start()
self.await_all_members(consumer)
num_rebalances = consumer.num_rebalances()
# shutdown one of the brokers
# TODO: we need a way to target the coordinator instead of picking arbitrarily
self.kafka.signal_node(self.kafka.nodes[0], signal.SIGTERM if clean_shutdown else signal.SIGKILL)
# ensure that the consumers do some work after the broker failure
self.await_consumed_messages(consumer, min_messages=1000)
# verify that there were no rebalances on failover
assert num_rebalances == consumer.num_rebalances(), "Broker failure should not cause a rebalance"
consumer.stop_all()
# if the total records consumed matches the current position, we haven't seen any duplicates
assert consumer.current_position(partition) == consumer.total_consumed(), \
"Total consumed records %d did not match consumed position %d" % \
(consumer.total_consumed(), consumer.current_position(partition))
# if autocommit is not turned on, we can also verify the last committed offset
if not enable_autocommit:
assert consumer.last_commit(partition) == consumer.current_position(partition), \
"Last committed offset %d did not match last consumed position %d" % \
(consumer.last_commit(partition), consumer.current_position(partition))
@cluster(num_nodes=7)
def test_group_consumption(self):
"""
Verifies correct group rebalance behavior as consumers are started and stopped.
In particular, this test verifies that the partition is readable after every
expected rebalance.
Setup: single Kafka cluster with a group of consumers reading from one topic
with one partition while the verifiable producer writes to it.
- Start the consumers one by one, verifying consumption after each rebalance
- Shutdown the consumers one by one, verifying consumption after each rebalance
"""
consumer = self.setup_consumer(self.TOPIC)
producer = self.setup_producer(self.TOPIC)
partition = TopicPartition(self.TOPIC, 0)
producer.start()
for num_started, node in enumerate(consumer.nodes, 1):
consumer.start_node(node)
self.await_members(consumer, num_started)
self.await_consumed_messages(consumer)
for num_stopped, node in enumerate(consumer.nodes, 1):
consumer.stop_node(node)
if num_stopped < self.num_consumers:
self.await_members(consumer, self.num_consumers - num_stopped)
self.await_consumed_messages(consumer)
assert consumer.current_position(partition) == consumer.total_consumed(), \
"Total consumed records %d did not match consumed position %d" % \
(consumer.total_consumed(), consumer.current_position(partition))
assert consumer.last_commit(partition) == consumer.current_position(partition), \
"Last committed offset %d did not match last consumed position %d" % \
(consumer.last_commit(partition), consumer.current_position(partition))
class AssignmentValidationTest(VerifiableConsumerTest):
TOPIC = "test_topic"
NUM_PARTITIONS = 6
def __init__(self, test_context):
super(AssignmentValidationTest, self).__init__(test_context, num_consumers=3, num_producers=0,
num_zk=1, num_brokers=2, topics={
self.TOPIC : { 'partitions': self.NUM_PARTITIONS, 'replication-factor': 1 },
})
@cluster(num_nodes=6)
@matrix(assignment_strategy=["org.apache.kafka.clients.consumer.RangeAssignor",
"org.apache.kafka.clients.consumer.RoundRobinAssignor",
"org.apache.kafka.clients.consumer.StickyAssignor"])
def test_valid_assignment(self, assignment_strategy):
"""
Verify assignment strategy correctness: each partition is assigned to exactly
one consumer instance.
Setup: single Kafka cluster with a set of consumers in the same group.
- Start the consumers one by one
- Validate assignment after every expected rebalance
"""
consumer = self.setup_consumer(self.TOPIC, assignment_strategy=assignment_strategy)
for num_started, node in enumerate(consumer.nodes, 1):
consumer.start_node(node)
self.await_members(consumer, num_started)
assert self.valid_assignment(self.TOPIC, self.NUM_PARTITIONS, consumer.current_assignment()), \
"expected valid assignments of %d partitions when num_started %d: %s" % \
(self.NUM_PARTITIONS, num_started, \
[(str(node.account), a) for node, a in consumer.current_assignment().items()])
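
Several of the tests above end with the same delivery-semantics check: after a clean shutdown the total consumed count must equal the final consumed position (no duplicates), while after a hard failure it may exceed it. A compact sketch of that check with hypothetical counters:

```
def check_delivery(current_position, total_consumed, clean_shutdown):
    if clean_shutdown:
        # a clean shutdown commits in order, so no duplicates are expected
        assert current_position == total_consumed
    else:
        # a hard failure may replay records, so consumed can exceed the position
        assert current_position <= total_consumed

check_delivery(current_position=1000, total_consumed=1000, clean_shutdown=True)
check_delivery(current_position=1000, total_consumed=1010, clean_shutdown=False)
```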

View File

@@ -0,0 +1,104 @@
# Copyright 2015 Confluent Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ducktape.mark import parametrize
from ducktape.utils.util import wait_until
from ducktape.mark.resource import cluster
from kafkatest.services.console_consumer import ConsoleConsumer
from kafkatest.services.kafka import KafkaService
from kafkatest.services.verifiable_producer import VerifiableProducer
from kafkatest.services.zookeeper import ZookeeperService
from kafkatest.tests.produce_consume_validate import ProduceConsumeValidateTest
from kafkatest.utils import is_int
from kafkatest.version import LATEST_0_9, LATEST_0_10, LATEST_0_11, DEV_BRANCH, KafkaVersion
class MessageFormatChangeTest(ProduceConsumeValidateTest):
def __init__(self, test_context):
super(MessageFormatChangeTest, self).__init__(test_context=test_context)
def setUp(self):
self.topic = "test_topic"
self.zk = ZookeeperService(self.test_context, num_nodes=1)
self.zk.start()
# Producer and consumer
self.producer_throughput = 10000
self.num_producers = 1
self.num_consumers = 1
self.messages_per_producer = 100
def produce_and_consume(self, producer_version, consumer_version, group):
self.producer = VerifiableProducer(self.test_context, self.num_producers, self.kafka,
self.topic,
throughput=self.producer_throughput,
message_validator=is_int,
version=KafkaVersion(producer_version))
self.consumer = ConsoleConsumer(self.test_context, self.num_consumers, self.kafka,
self.topic, consumer_timeout_ms=30000,
message_validator=is_int, version=KafkaVersion(consumer_version))
self.consumer.group_id = group
self.run_produce_consume_validate(lambda: wait_until(
lambda: self.producer.each_produced_at_least(self.messages_per_producer) == True,
timeout_sec=120, backoff_sec=1,
err_msg="Producer did not produce all messages in reasonable amount of time"))
@cluster(num_nodes=12)
@parametrize(producer_version=str(DEV_BRANCH), consumer_version=str(DEV_BRANCH))
@parametrize(producer_version=str(LATEST_0_10), consumer_version=str(LATEST_0_10))
@parametrize(producer_version=str(LATEST_0_9), consumer_version=str(LATEST_0_9))
def test_compatibility(self, producer_version, consumer_version):
""" This tests performs the following checks:
The workload is a mix of 0.9.x, 0.10.x and 0.11.x producers and consumers
that produce to and consume from a DEV_BRANCH cluster
1. initially the topic is using message format 0.9.0
2. change the message format version for topic to 0.10.0 on the fly.
3. change the message format version for topic to 0.11.0 on the fly.
4. change the message format version for topic back to 0.10.0 on the fly (only if the client version is 0.11.0 or newer)
- The producers and consumers should not have any issue.
Note regarding step number 4. Downgrading the message format version is generally unsupported as it breaks
older clients. More concretely, if we downgrade a topic from 0.11.0 to 0.10.0 after it contains messages with
version 0.11.0, we will return the 0.11.0 messages without down conversion due to an optimisation in the
handling of fetch requests. This will break any consumer that doesn't support 0.11.0. So, in practice, step 4
is similar to step 2 and it didn't seem worth it to increase the cluster size in order to add a step 5 that
would change the message format version for the topic back to 0.9.0.0.
"""
self.kafka = KafkaService(self.test_context, num_nodes=3, zk=self.zk, version=DEV_BRANCH, topics={self.topic: {
"partitions": 3,
"replication-factor": 3,
'configs': {"min.insync.replicas": 2}}})
self.kafka.start()
self.logger.info("First format change to 0.9.0")
self.kafka.alter_message_format(self.topic, str(LATEST_0_9))
self.produce_and_consume(producer_version, consumer_version, "group1")
self.logger.info("Second format change to 0.10.0")
self.kafka.alter_message_format(self.topic, str(LATEST_0_10))
self.produce_and_consume(producer_version, consumer_version, "group2")
self.logger.info("Third format change to 0.11.0")
self.kafka.alter_message_format(self.topic, str(LATEST_0_11))
self.produce_and_consume(producer_version, consumer_version, "group3")
if producer_version == str(DEV_BRANCH) and consumer_version == str(DEV_BRANCH):
self.logger.info("Fourth format change back to 0.10.0")
self.kafka.alter_message_format(self.topic, str(LATEST_0_10))
self.produce_and_consume(producer_version, consumer_version, "group4")

View File

@@ -0,0 +1,51 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ducktape.utils.util import wait_until
from kafkatest.tests.verifiable_consumer_test import VerifiableConsumerTest
class PluggableConsumerTest(VerifiableConsumerTest):
""" Verify that the pluggable client framework works. """
TOPIC = "test_topic"
NUM_PARTITIONS = 1
def __init__(self, test_context):
super(PluggableConsumerTest, self).__init__(test_context, num_consumers=1, num_producers=0,
num_zk=1, num_brokers=1, topics={
self.TOPIC : { 'partitions': self.NUM_PARTITIONS, 'replication-factor': 1 },
})
def test_start_stop(self):
"""
Test that a pluggable VerifiableConsumer module load works
"""
consumer = self.setup_consumer(self.TOPIC)
for num_started, node in enumerate(consumer.nodes, 1):
consumer.start_node(node)
self.logger.debug("Waiting for %d nodes to start" % len(consumer.nodes))
wait_until(lambda: len(consumer.alive_nodes()) == len(consumer.nodes),
timeout_sec=60,
err_msg="Timed out waiting for consumers to start")
self.logger.debug("Started: %s" % str(consumer.alive_nodes()))
consumer.stop_all()
self.logger.debug("Waiting for %d nodes to stop" % len(consumer.nodes))
wait_until(lambda: len(consumer.dead_nodes()) == len(consumer.nodes),
timeout_sec=self.session_timeout_sec+5,
err_msg="Timed out waiting for consumers to shutdown")

View File

@@ -0,0 +1,236 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ducktape.tests.test import Test
from ducktape.mark import matrix, parametrize
from ducktape.mark.resource import cluster
from kafkatest.services.zookeeper import ZookeeperService
from kafkatest.services.kafka import KafkaService
from kafkatest.services.performance import ProducerPerformanceService
from kafkatest.services.console_consumer import ConsoleConsumer
from kafkatest.version import DEV_BRANCH, LATEST_1_1
class QuotaConfig(object):
CLIENT_ID = 'client-id'
USER = 'user'
USER_CLIENT = '(user, client-id)'
LARGE_QUOTA = 1000 * 1000 * 1000
USER_PRINCIPAL = 'CN=systemtest'
def __init__(self, quota_type, override_quota, kafka):
if quota_type == QuotaConfig.CLIENT_ID:
if override_quota:
self.client_id = 'overridden_id'
self.producer_quota = 3750000
self.consumer_quota = 3000000
self.configure_quota(kafka, self.producer_quota, self.consumer_quota, ['clients', self.client_id])
self.configure_quota(kafka, QuotaConfig.LARGE_QUOTA, QuotaConfig.LARGE_QUOTA, ['clients', None])
else:
self.client_id = 'default_id'
self.producer_quota = 2500000
self.consumer_quota = 2000000
self.configure_quota(kafka, self.producer_quota, self.consumer_quota, ['clients', None])
self.configure_quota(kafka, QuotaConfig.LARGE_QUOTA, QuotaConfig.LARGE_QUOTA, ['clients', 'overridden_id'])
elif quota_type == QuotaConfig.USER:
if override_quota:
self.client_id = 'some_id'
self.producer_quota = 3750000
self.consumer_quota = 3000000
self.configure_quota(kafka, self.producer_quota, self.consumer_quota, ['users', QuotaConfig.USER_PRINCIPAL])
self.configure_quota(kafka, QuotaConfig.LARGE_QUOTA, QuotaConfig.LARGE_QUOTA, ['users', None])
self.configure_quota(kafka, QuotaConfig.LARGE_QUOTA, QuotaConfig.LARGE_QUOTA, ['clients', self.client_id])
else:
self.client_id = 'some_id'
self.producer_quota = 2500000
self.consumer_quota = 2000000
self.configure_quota(kafka, self.producer_quota, self.consumer_quota, ['users', None])
self.configure_quota(kafka, QuotaConfig.LARGE_QUOTA, QuotaConfig.LARGE_QUOTA, ['clients', None])
elif quota_type == QuotaConfig.USER_CLIENT:
if override_quota:
self.client_id = 'overridden_id'
self.producer_quota = 3750000
self.consumer_quota = 3000000
self.configure_quota(kafka, self.producer_quota, self.consumer_quota, ['users', QuotaConfig.USER_PRINCIPAL, 'clients', self.client_id])
self.configure_quota(kafka, QuotaConfig.LARGE_QUOTA, QuotaConfig.LARGE_QUOTA, ['users', QuotaConfig.USER_PRINCIPAL, 'clients', None])
self.configure_quota(kafka, QuotaConfig.LARGE_QUOTA, QuotaConfig.LARGE_QUOTA, ['users', None])
self.configure_quota(kafka, QuotaConfig.LARGE_QUOTA, QuotaConfig.LARGE_QUOTA, ['clients', self.client_id])
else:
self.client_id = 'default_id'
self.producer_quota = 2500000
self.consumer_quota = 2000000
self.configure_quota(kafka, self.producer_quota, self.consumer_quota, ['users', None, 'clients', None])
self.configure_quota(kafka, QuotaConfig.LARGE_QUOTA, QuotaConfig.LARGE_QUOTA, ['users', None])
self.configure_quota(kafka, QuotaConfig.LARGE_QUOTA, QuotaConfig.LARGE_QUOTA, ['clients', None])
def configure_quota(self, kafka, producer_byte_rate, consumer_byte_rate, entity_args):
node = kafka.nodes[0]
cmd = "%s --zookeeper %s --alter --add-config producer_byte_rate=%d,consumer_byte_rate=%d" % \
(kafka.path.script("kafka-configs.sh", node), kafka.zk_connect_setting(), producer_byte_rate, consumer_byte_rate)
cmd += " --entity-type " + entity_args[0] + self.entity_name_opt(entity_args[1])
if len(entity_args) > 2:
cmd += " --entity-type " + entity_args[2] + self.entity_name_opt(entity_args[3])
node.account.ssh(cmd)
def entity_name_opt(self, name):
return " --entity-default" if name is None else " --entity-name " + name
class QuotaTest(Test):
"""
These tests verify that quotas provide the expected functionality -- they run a
producer, broker, and consumer with different client-id and quota configurations and
check that the observed throughput is close to the value we expect.
"""
def __init__(self, test_context):
""":type test_context: ducktape.tests.test.TestContext"""
super(QuotaTest, self).__init__(test_context=test_context)
self.topic = 'test_topic'
self.logger.info('use topic ' + self.topic)
self.maximum_client_deviation_percentage = 100.0
self.maximum_broker_deviation_percentage = 5.0
self.num_records = 50000
self.record_size = 3000
self.zk = ZookeeperService(test_context, num_nodes=1)
self.kafka = KafkaService(test_context, num_nodes=1, zk=self.zk,
security_protocol='SSL', authorizer_class_name='',
interbroker_security_protocol='SSL',
topics={self.topic: {'partitions': 6, 'replication-factor': 1, 'configs': {'min.insync.replicas': 1}}},
jmx_object_names=['kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec',
'kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec'],
jmx_attributes=['OneMinuteRate'])
self.num_producers = 1
self.num_consumers = 2
def setUp(self):
self.zk.start()
def min_cluster_size(self):
"""Override this since we're adding services outside of the constructor"""
return super(QuotaTest, self).min_cluster_size() + self.num_producers + self.num_consumers
@cluster(num_nodes=5)
@matrix(quota_type=[QuotaConfig.CLIENT_ID, QuotaConfig.USER, QuotaConfig.USER_CLIENT], override_quota=[True, False])
@parametrize(quota_type=QuotaConfig.CLIENT_ID, consumer_num=2)
@parametrize(quota_type=QuotaConfig.CLIENT_ID, old_broker_throttling_behavior=True)
@parametrize(quota_type=QuotaConfig.CLIENT_ID, old_client_throttling_behavior=True)
def test_quota(self, quota_type, override_quota=True, producer_num=1, consumer_num=1,
old_broker_throttling_behavior=False, old_client_throttling_behavior=False):
# Old (pre-2.0) broker throttling behavior: the broker throttles before sending the response to the client.
if old_broker_throttling_behavior:
self.kafka.set_version(LATEST_1_1)
self.kafka.start()
self.quota_config = QuotaConfig(quota_type, override_quota, self.kafka)
producer_client_id = self.quota_config.client_id
consumer_client_id = self.quota_config.client_id
# Old (pre-2.0) client throttling behavior: the client does not throttle itself upon receiving a response with a non-zero throttle time.
if old_client_throttling_behavior:
client_version = LATEST_1_1
else:
client_version = DEV_BRANCH
# Produce all messages
producer = ProducerPerformanceService(
self.test_context, producer_num, self.kafka,
topic=self.topic, num_records=self.num_records, record_size=self.record_size, throughput=-1,
client_id=producer_client_id, version=client_version)
producer.run()
# Consume all messages
consumer = ConsoleConsumer(self.test_context, consumer_num, self.kafka, self.topic,
consumer_timeout_ms=60000, client_id=consumer_client_id,
jmx_object_names=['kafka.consumer:type=consumer-fetch-manager-metrics,client-id=%s' % consumer_client_id],
jmx_attributes=['bytes-consumed-rate'], version=client_version)
consumer.run()
for idx, messages in consumer.messages_consumed.items():
assert len(messages) > 0, "consumer %d didn't consume any message before timeout" % idx
success, msg = self.validate(self.kafka, producer, consumer)
assert success, msg
def validate(self, broker, producer, consumer):
"""
For each client_id we validate that:
1) number of consumed messages equals number of produced messages
2) maximum_producer_throughput <= producer_quota * (1 + maximum_client_deviation_percentage/100)
3) maximum_broker_byte_in_rate <= producer_quota * (1 + maximum_broker_deviation_percentage/100)
4) maximum_consumer_throughput <= consumer_quota * (1 + maximum_client_deviation_percentage/100)
5) maximum_broker_byte_out_rate <= consumer_quota * (1 + maximum_broker_deviation_percentage/100)
"""
success = True
msg = ''
self.kafka.read_jmx_output_all_nodes()
# validate that number of consumed messages equals number of produced messages
produced_num = sum([value['records'] for value in producer.results])
consumed_num = sum([len(value) for value in consumer.messages_consumed.values()])
self.logger.info('producer produced %d messages' % produced_num)
self.logger.info('consumer consumed %d messages' % consumed_num)
if produced_num != consumed_num:
success = False
msg += "number of produced messages %d doesn't equal number of consumed messages %d" % (produced_num, consumed_num)
# validate that maximum_producer_throughput <= producer_quota * (1 + maximum_client_deviation_percentage/100)
producer_maximum_bps = max(
metric.value for k, metrics in producer.metrics(group='producer-metrics', name='outgoing-byte-rate', client_id=producer.client_id) for metric in metrics
)
producer_quota_bps = self.quota_config.producer_quota
self.logger.info('producer has maximum throughput %.2f bps with producer quota %.2f bps' % (producer_maximum_bps, producer_quota_bps))
if producer_maximum_bps > producer_quota_bps*(self.maximum_client_deviation_percentage/100+1):
success = False
msg += 'maximum producer throughput %.2f bps exceeded producer quota %.2f bps by more than %.1f%%' % \
(producer_maximum_bps, producer_quota_bps, self.maximum_client_deviation_percentage)
# validate that maximum_broker_byte_in_rate <= producer_quota * (1 + maximum_broker_deviation_percentage/100)
broker_byte_in_attribute_name = 'kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec:OneMinuteRate'
broker_maximum_byte_in_bps = broker.maximum_jmx_value[broker_byte_in_attribute_name]
self.logger.info('broker has maximum byte-in rate %.2f bps with producer quota %.2f bps' %
(broker_maximum_byte_in_bps, producer_quota_bps))
if broker_maximum_byte_in_bps > producer_quota_bps*(self.maximum_broker_deviation_percentage/100+1):
success = False
msg += 'maximum broker byte-in rate %.2f bps exceeded producer quota %.2f bps by more than %.1f%%' % \
(broker_maximum_byte_in_bps, producer_quota_bps, self.maximum_broker_deviation_percentage)
# validate that maximum_consumer_throughput <= consumer_quota * (1 + maximum_client_deviation_percentage/100)
consumer_attribute_name = 'kafka.consumer:type=consumer-fetch-manager-metrics,client-id=%s:bytes-consumed-rate' % consumer.client_id
consumer_maximum_bps = consumer.maximum_jmx_value[consumer_attribute_name]
consumer_quota_bps = self.quota_config.consumer_quota
self.logger.info('consumer has maximum throughput %.2f bps with consumer quota %.2f bps' % (consumer_maximum_bps, consumer_quota_bps))
if consumer_maximum_bps > consumer_quota_bps*(self.maximum_client_deviation_percentage/100+1):
success = False
msg += 'maximum consumer throughput %.2f bps exceeded consumer quota %.2f bps by more than %.1f%%' % \
(consumer_maximum_bps, consumer_quota_bps, self.maximum_client_deviation_percentage)
# validate that maximum_broker_byte_out_rate <= consumer_quota * (1 + maximum_broker_deviation_percentage/100)
broker_byte_out_attribute_name = 'kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec:OneMinuteRate'
broker_maximum_byte_out_bps = broker.maximum_jmx_value[broker_byte_out_attribute_name]
self.logger.info('broker has maximum byte-out rate %.2f bps with consumer quota %.2f bps' %
(broker_maximum_byte_out_bps, consumer_quota_bps))
if broker_maximum_byte_out_bps > consumer_quota_bps*(self.maximum_broker_deviation_percentage/100+1):
success = False
msg += 'maximum broker byte-out rate %.2f bps exceeded consumer quota %.2f bps by more than %.1f%%' % \
(broker_maximum_byte_out_bps, consumer_quota_bps, self.maximum_broker_deviation_percentage)
return success, msg
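
To make the deviation bounds above concrete: with the default (non-override) client-id producer quota of 2,500,000 B/s, the client is allowed to peak at twice the quota while the broker-side rate may only exceed it by 5%. A quick sketch of the arithmetic:

```
producer_quota_bps = 2500000.0
maximum_client_deviation_percentage = 100.0
maximum_broker_deviation_percentage = 5.0

# Upper bounds used by the checks in validate()
client_bound = producer_quota_bps * (maximum_client_deviation_percentage / 100 + 1)  # 5,000,000 B/s
broker_bound = producer_quota_bps * (maximum_broker_deviation_percentage / 100 + 1)  # 2,625,000 B/s
print(client_bound, broker_bound)
```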

View File

@@ -0,0 +1,149 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ducktape.mark.resource import cluster
from ducktape.utils.util import wait_until
from kafkatest.tests.verifiable_consumer_test import VerifiableConsumerTest
from kafkatest.services.kafka import TopicPartition
from kafkatest.services.verifiable_consumer import VerifiableConsumer
class TruncationTest(VerifiableConsumerTest):
TOPIC = "test_topic"
NUM_PARTITIONS = 1
TOPICS = {
TOPIC: {
'partitions': NUM_PARTITIONS,
'replication-factor': 2
}
}
GROUP_ID = "truncation-test"
def __init__(self, test_context):
super(TruncationTest, self).__init__(test_context, num_consumers=1, num_producers=1,
num_zk=1, num_brokers=3, topics=self.TOPICS)
self.last_total = 0
self.all_offsets_consumed = []
self.all_values_consumed = []
def setup_consumer(self, topic, **kwargs):
consumer = super(TruncationTest, self).setup_consumer(topic, **kwargs)
self.mark_for_collect(consumer, 'verifiable_consumer_stdout')
def print_record(event, node):
self.all_offsets_consumed.append(event['offset'])
self.all_values_consumed.append(event['value'])
consumer.on_record_consumed = print_record
return consumer
@cluster(num_nodes=7)
def test_offset_truncate(self):
"""
Verify correct consumer behavior when the log is truncated after an unclean leader election.
Setup: single Kafka cluster with one producer writing messages to a single topic with one
partition, and a consumer group reading from that topic.
- Start a producer which continues producing new messages throughout the test.
- Start up the consumer and wait until it has joined the group.
- Stop the in-sync replicas one at a time and wait for the consumer to catch up to the high watermark.
- Bring an out-of-sync replica back online and enable unclean leader election, which truncates the log.
- Verify that the consumer detects the truncation without resetting to the beginning of the log, and
that a fresh consumer reading from the earliest offset sees fewer records than the first consumer did.
"""
tp = TopicPartition(self.TOPIC, 0)
producer = self.setup_producer(self.TOPIC, throughput=10)
producer.start()
self.await_produced_messages(producer, min_messages=10)
consumer = self.setup_consumer(self.TOPIC, reset_policy="earliest", verify_offsets=False)
consumer.start()
self.await_all_members(consumer)
# Reduce ISR to one node
isr = self.kafka.isr_idx_list(self.TOPIC, 0)
node1 = self.kafka.get_node(isr[0])
self.kafka.stop_node(node1)
self.logger.info("Reduced ISR to one node, consumer is at %s", consumer.current_position(tp))
# Ensure remaining ISR member has a little bit of data
current_total = consumer.total_consumed()
wait_until(lambda: consumer.total_consumed() > current_total + 10,
timeout_sec=30,
err_msg="Timed out waiting for consumer to move ahead by 10 messages")
# Kill last ISR member
node2 = self.kafka.get_node(isr[1])
self.kafka.stop_node(node2)
self.logger.info("No members in ISR, consumer is at %s", consumer.current_position(tp))
# Keep consuming until we've caught up to HW
def none_consumed(this, consumer):
new_total = consumer.total_consumed()
if new_total == this.last_total:
return True
else:
this.last_total = new_total
return False
self.last_total = consumer.total_consumed()
wait_until(lambda: none_consumed(self, consumer),
timeout_sec=30,
err_msg="Timed out waiting for the consumer to catch up")
self.kafka.start_node(node1)
self.logger.info("Out of sync replica is online, but not electable. Consumer is at %s", consumer.current_position(tp))
pre_truncation_pos = consumer.current_position(tp)
self.kafka.set_unclean_leader_election(self.TOPIC)
self.logger.info("New unclean leader, consumer is at %s", consumer.current_position(tp))
# Wait for truncation to be detected
self.kafka.start_node(node2)
wait_until(lambda: consumer.current_position(tp) >= pre_truncation_pos,
timeout_sec=30,
err_msg="Timed out waiting for truncation")
# Make sure we didn't reset to beginning of log
total_records_consumed = len(self.all_values_consumed)
assert total_records_consumed == len(set(self.all_values_consumed)), "Received duplicate records"
consumer.stop()
producer.stop()
# Re-consume all the records
consumer2 = VerifiableConsumer(self.test_context, 1, self.kafka, self.TOPIC, group_id="group2",
reset_policy="earliest", verify_offsets=True)
consumer2.start()
self.await_all_members(consumer2)
wait_until(lambda: consumer2.total_consumed() > 0,
timeout_sec=30,
err_msg="Timed out waiting for consumer to consume at least 10 messages")
self.last_total = consumer2.total_consumed()
wait_until(lambda: none_consumed(self, consumer2),
timeout_sec=30,
err_msg="Timed out waiting for the consumer to fully consume data")
second_total_consumed = consumer2.total_consumed()
assert second_total_consumed < total_records_consumed, "Expected fewer records with new consumer since we truncated"
self.logger.info("Second consumer saw only %s, meaning %s were truncated",
second_total_consumed, total_records_consumed - second_total_consumed)
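
The "no progress between polls" idiom used by none_consumed above can be factored into a small reusable predicate. A sketch, assuming a ducktape-style wait_until loop that calls the predicate repeatedly:

```
def make_progress_stalled_check(get_total):
    # Return a predicate that becomes True once get_total() stops increasing between calls.
    state = {"last": get_total()}
    def stalled():
        current = get_total()
        if current == state["last"]:
            return True
        state["last"] = current
        return False
    return stalled

# illustrative usage with the consumer above:
# wait_until(make_progress_stalled_check(consumer.total_consumed), timeout_sec=30,
#            err_msg="Timed out waiting for the consumer to catch up")
```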

View File

@@ -0,0 +1,14 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

View File

@@ -0,0 +1,585 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ducktape.tests.test import Test
from ducktape.mark.resource import cluster
from ducktape.utils.util import wait_until
from ducktape.mark import matrix, parametrize
from ducktape.cluster.remoteaccount import RemoteCommandError
from kafkatest.services.zookeeper import ZookeeperService
from kafkatest.services.kafka import KafkaService, config_property
from kafkatest.services.connect import ConnectDistributedService, VerifiableSource, VerifiableSink, ConnectRestError, MockSink, MockSource
from kafkatest.services.console_consumer import ConsoleConsumer
from kafkatest.services.security.security_config import SecurityConfig
from kafkatest.version import DEV_BRANCH, LATEST_2_3, LATEST_2_2, LATEST_2_1, LATEST_2_0, LATEST_1_1, LATEST_1_0, LATEST_0_11_0, LATEST_0_10_2, LATEST_0_10_1, LATEST_0_10_0, LATEST_0_9, LATEST_0_8_2, KafkaVersion
from collections import Counter, namedtuple
import itertools
import json
import operator
import time
class ConnectDistributedTest(Test):
"""
Simple test of Kafka Connect in distributed mode, producing data from input files via a source connector and
writing it back to output files via a sink connector, validating that the total output is identical to the input.
"""
FILE_SOURCE_CONNECTOR = 'org.apache.kafka.connect.file.FileStreamSourceConnector'
FILE_SINK_CONNECTOR = 'org.apache.kafka.connect.file.FileStreamSinkConnector'
INPUT_FILE = "/mnt/connect.input"
OUTPUT_FILE = "/mnt/connect.output"
TOPIC = "test"
OFFSETS_TOPIC = "connect-offsets"
OFFSETS_REPLICATION_FACTOR = "1"
OFFSETS_PARTITIONS = "1"
CONFIG_TOPIC = "connect-configs"
CONFIG_REPLICATION_FACTOR = "1"
STATUS_TOPIC = "connect-status"
STATUS_REPLICATION_FACTOR = "1"
STATUS_PARTITIONS = "1"
SCHEDULED_REBALANCE_MAX_DELAY_MS = "60000"
CONNECT_PROTOCOL="sessioned"
# Since tasks can be assigned to any node and we're testing with files, we need to make sure the content is the same
# across all nodes.
FIRST_INPUT_LIST = ["foo", "bar", "baz"]
FIRST_INPUTS = "\n".join(FIRST_INPUT_LIST) + "\n"
SECOND_INPUT_LIST = ["razz", "ma", "tazz"]
SECOND_INPUTS = "\n".join(SECOND_INPUT_LIST) + "\n"
SCHEMA = { "type": "string", "optional": False }
def __init__(self, test_context):
super(ConnectDistributedTest, self).__init__(test_context)
self.num_zk = 1
self.num_brokers = 1
self.topics = {
self.TOPIC: {'partitions': 1, 'replication-factor': 1}
}
self.zk = ZookeeperService(test_context, self.num_zk)
self.key_converter = "org.apache.kafka.connect.json.JsonConverter"
self.value_converter = "org.apache.kafka.connect.json.JsonConverter"
self.schemas = True
def setup_services(self, security_protocol=SecurityConfig.PLAINTEXT, timestamp_type=None, broker_version=DEV_BRANCH, auto_create_topics=False):
self.kafka = KafkaService(self.test_context, self.num_brokers, self.zk,
security_protocol=security_protocol, interbroker_security_protocol=security_protocol,
topics=self.topics, version=broker_version,
server_prop_overides=[["auto.create.topics.enable", str(auto_create_topics)]])
if timestamp_type is not None:
for node in self.kafka.nodes:
node.config[config_property.MESSAGE_TIMESTAMP_TYPE] = timestamp_type
self.cc = ConnectDistributedService(self.test_context, 3, self.kafka, [self.INPUT_FILE, self.OUTPUT_FILE])
self.cc.log_level = "DEBUG"
self.zk.start()
self.kafka.start()
def _start_connector(self, config_file):
connector_props = self.render(config_file)
connector_config = dict([line.strip().split('=', 1) for line in connector_props.split('\n') if line.strip() and not line.strip().startswith('#')])
self.cc.create_connector(connector_config)
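
The one-liner in _start_connector turns a rendered properties template into a flat config dict. A toy illustration with made-up FileStreamSource values:

```
connector_props = """
# comments and blank lines are skipped
name=local-file-source
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
file=/mnt/connect.input
topic=test
"""
connector_config = dict(
    line.strip().split('=', 1)
    for line in connector_props.split('\n')
    if line.strip() and not line.strip().startswith('#')
)
# {'name': 'local-file-source', 'connector.class': 'org.apache.kafka.connect.file.FileStreamSourceConnector', ...}
```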
def _connector_status(self, connector, node=None):
try:
return self.cc.get_connector_status(connector, node)
except ConnectRestError:
return None
def _connector_has_state(self, status, state):
return status is not None and status['connector']['state'] == state
def _task_has_state(self, task_id, status, state):
if not status:
return False
tasks = status['tasks']
if not tasks:
return False
for task in tasks:
if task['id'] == task_id:
return task['state'] == state
return False
def _all_tasks_have_state(self, status, task_count, state):
if status is None:
return False
tasks = status['tasks']
if len(tasks) != task_count:
return False
return all(task['state'] == state for task in tasks)
def is_running(self, connector, node=None):
status = self._connector_status(connector.name, node)
return self._connector_has_state(status, 'RUNNING') and self._all_tasks_have_state(status, connector.tasks, 'RUNNING')
def is_paused(self, connector, node=None):
status = self._connector_status(connector.name, node)
return self._connector_has_state(status, 'PAUSED') and self._all_tasks_have_state(status, connector.tasks, 'PAUSED')
def connector_is_running(self, connector, node=None):
status = self._connector_status(connector.name, node)
return self._connector_has_state(status, 'RUNNING')
def connector_is_failed(self, connector, node=None):
status = self._connector_status(connector.name, node)
return self._connector_has_state(status, 'FAILED')
def task_is_failed(self, connector, task_id, node=None):
status = self._connector_status(connector.name, node)
return self._task_has_state(task_id, status, 'FAILED')
def task_is_running(self, connector, task_id, node=None):
status = self._connector_status(connector.name, node)
return self._task_has_state(task_id, status, 'RUNNING')
@cluster(num_nodes=5)
@matrix(connect_protocol=['sessioned', 'compatible', 'eager'])
def test_restart_failed_connector(self, connect_protocol):
self.CONNECT_PROTOCOL = connect_protocol
self.setup_services()
self.cc.set_configs(lambda node: self.render("connect-distributed.properties", node=node))
self.cc.start()
self.sink = MockSink(self.cc, self.topics.keys(), mode='connector-failure', delay_sec=5)
self.sink.start()
wait_until(lambda: self.connector_is_failed(self.sink), timeout_sec=15,
err_msg="Failed to see connector transition to the FAILED state")
self.cc.restart_connector(self.sink.name)
wait_until(lambda: self.connector_is_running(self.sink), timeout_sec=10,
err_msg="Failed to see connector transition to the RUNNING state")
@cluster(num_nodes=5)
@matrix(connector_type=['source', 'sink'], connect_protocol=['sessioned', 'compatible', 'eager'])
def test_restart_failed_task(self, connector_type, connect_protocol):
self.CONNECT_PROTOCOL = connect_protocol
self.setup_services()
self.cc.set_configs(lambda node: self.render("connect-distributed.properties", node=node))
self.cc.start()
connector = None
if connector_type == "sink":
connector = MockSink(self.cc, self.topics.keys(), mode='task-failure', delay_sec=5)
else:
connector = MockSource(self.cc, mode='task-failure', delay_sec=5)
connector.start()
task_id = 0
wait_until(lambda: self.task_is_failed(connector, task_id), timeout_sec=20,
err_msg="Failed to see task transition to the FAILED state")
self.cc.restart_task(connector.name, task_id)
wait_until(lambda: self.task_is_running(connector, task_id), timeout_sec=10,
err_msg="Failed to see task transition to the RUNNING state")
@cluster(num_nodes=5)
@matrix(connect_protocol=['sessioned', 'compatible', 'eager'])
def test_pause_and_resume_source(self, connect_protocol):
"""
Verify that source connectors stop producing records when paused and begin again after
being resumed.
"""
self.CONNECT_PROTOCOL = connect_protocol
self.setup_services()
self.cc.set_configs(lambda node: self.render("connect-distributed.properties", node=node))
self.cc.start()
self.source = VerifiableSource(self.cc, topic=self.TOPIC)
self.source.start()
wait_until(lambda: self.is_running(self.source), timeout_sec=30,
err_msg="Failed to see connector transition to the RUNNING state")
self.cc.pause_connector(self.source.name)
# wait until all nodes report the paused transition
for node in self.cc.nodes:
wait_until(lambda: self.is_paused(self.source, node), timeout_sec=30,
err_msg="Failed to see connector transition to the PAUSED state")
# verify that we do not produce new messages while paused
num_messages = len(self.source.sent_messages())
time.sleep(10)
assert num_messages == len(self.source.sent_messages()), "Paused source connector should not produce any messages"
self.cc.resume_connector(self.source.name)
for node in self.cc.nodes:
wait_until(lambda: self.is_running(self.source, node), timeout_sec=30,
err_msg="Failed to see connector transition to the RUNNING state")
# after resuming, we should see records produced again
wait_until(lambda: len(self.source.sent_messages()) > num_messages, timeout_sec=30,
err_msg="Failed to produce messages after resuming source connector")
@cluster(num_nodes=5)
@matrix(connect_protocol=['sessioned', 'compatible', 'eager'])
def test_pause_and_resume_sink(self, connect_protocol):
"""
Verify that sink connectors stop consuming records when paused and begin again after
being resumed.
"""
self.CONNECT_PROTOCOL = connect_protocol
self.setup_services()
self.cc.set_configs(lambda node: self.render("connect-distributed.properties", node=node))
self.cc.start()
# use the verifiable source to produce a steady stream of messages
self.source = VerifiableSource(self.cc, topic=self.TOPIC)
self.source.start()
wait_until(lambda: len(self.source.committed_messages()) > 0, timeout_sec=30,
err_msg="Timeout expired waiting for source task to produce a message")
self.sink = VerifiableSink(self.cc, topics=[self.TOPIC])
self.sink.start()
wait_until(lambda: self.is_running(self.sink), timeout_sec=30,
err_msg="Failed to see connector transition to the RUNNING state")
self.cc.pause_connector(self.sink.name)
# wait until all nodes report the paused transition
for node in self.cc.nodes:
wait_until(lambda: self.is_paused(self.sink, node), timeout_sec=30,
err_msg="Failed to see connector transition to the PAUSED state")
# verify that we do not consume new messages while paused
num_messages = len(self.sink.received_messages())
time.sleep(10)
assert num_messages == len(self.sink.received_messages()), "Paused sink connector should not consume any messages"
self.cc.resume_connector(self.sink.name)
for node in self.cc.nodes:
wait_until(lambda: self.is_running(self.sink, node), timeout_sec=30,
err_msg="Failed to see connector transition to the RUNNING state")
# after resuming, we should see records consumed again
wait_until(lambda: len(self.sink.received_messages()) > num_messages, timeout_sec=30,
err_msg="Failed to consume messages after resuming sink connector")
@cluster(num_nodes=5)
@matrix(connect_protocol=['sessioned', 'compatible', 'eager'])
def test_pause_state_persistent(self, connect_protocol):
"""
Verify that paused state is preserved after a cluster restart.
"""
self.CONNECT_PROTOCOL = connect_protocol
self.setup_services()
self.cc.set_configs(lambda node: self.render("connect-distributed.properties", node=node))
self.cc.start()
self.source = VerifiableSource(self.cc, topic=self.TOPIC)
self.source.start()
wait_until(lambda: self.is_running(self.source), timeout_sec=30,
err_msg="Failed to see connector transition to the RUNNING state")
self.cc.pause_connector(self.source.name)
self.cc.restart()
# we should still be paused after restarting
for node in self.cc.nodes:
wait_until(lambda: self.is_paused(self.source, node), timeout_sec=120,
err_msg="Failed to see connector startup in PAUSED state")
@cluster(num_nodes=6)
@matrix(security_protocol=[SecurityConfig.PLAINTEXT, SecurityConfig.SASL_SSL], connect_protocol=['sessioned', 'compatible', 'eager'])
def test_file_source_and_sink(self, security_protocol, connect_protocol):
"""
Tests that a basic file connector works across clean rolling bounces. This validates that the connector is
correctly created, tasks instantiated, and as nodes restart the work is rebalanced across nodes.
"""
self.CONNECT_PROTOCOL = connect_protocol
self.setup_services(security_protocol=security_protocol)
self.cc.set_configs(lambda node: self.render("connect-distributed.properties", node=node))
self.cc.start()
self.logger.info("Creating connectors")
self._start_connector("connect-file-source.properties")
self._start_connector("connect-file-sink.properties")
# Generating data on the source node should generate new records and create new output on the sink node. Timeouts
# here need to be more generous than they are for standalone mode because a) it takes longer to write configs,
# do rebalancing of the group, etc., and b) without explicit leave-group support, rebalancing takes a while
for node in self.cc.nodes:
node.account.ssh("echo -e -n " + repr(self.FIRST_INPUTS) + " >> " + self.INPUT_FILE)
wait_until(lambda: self._validate_file_output(self.FIRST_INPUT_LIST), timeout_sec=70, err_msg="Data added to input file was not seen in the output file in a reasonable amount of time.")
# Restarting both should result in them picking up where they left off,
# only processing new data.
self.cc.restart()
for node in self.cc.nodes:
node.account.ssh("echo -e -n " + repr(self.SECOND_INPUTS) + " >> " + self.INPUT_FILE)
wait_until(lambda: self._validate_file_output(self.FIRST_INPUT_LIST + self.SECOND_INPUT_LIST), timeout_sec=150, err_msg="Sink output file never converged to the same state as the input file")
@cluster(num_nodes=6)
@matrix(clean=[True, False], connect_protocol=['sessioned', 'compatible', 'eager'])
def test_bounce(self, clean, connect_protocol):
"""
Validates that source and sink tasks that run continuously and produce a predictable sequence of messages
run correctly and deliver messages exactly once when Kafka Connect workers undergo clean rolling bounces, and at
least once when the bounces are unclean.
"""
num_tasks = 3
self.CONNECT_PROTOCOL = connect_protocol
self.setup_services()
self.cc.set_configs(lambda node: self.render("connect-distributed.properties", node=node))
self.cc.start()
self.source = VerifiableSource(self.cc, topic=self.TOPIC, tasks=num_tasks, throughput=100)
self.source.start()
self.sink = VerifiableSink(self.cc, tasks=num_tasks, topics=[self.TOPIC])
self.sink.start()
for _ in range(3):
for node in self.cc.nodes:
started = time.time()
self.logger.info("%s bouncing Kafka Connect on %s", clean and "Clean" or "Hard", str(node.account))
self.cc.stop_node(node, clean_shutdown=clean)
with node.account.monitor_log(self.cc.LOG_FILE) as monitor:
self.cc.start_node(node)
monitor.wait_until("Starting connectors and tasks using config offset", timeout_sec=90,
err_msg="Kafka Connect worker didn't successfully join group and start work")
self.logger.info("Bounced Kafka Connect on %s and rejoined in %f seconds", node.account, time.time() - started)
# Give additional time for the consumer groups to recover. Even if it is not a hard bounce, there are
# some cases where a restart can cause a rebalance to take the full length of the session timeout
# (e.g. if the client shuts down before it has received the memberId from its initial JoinGroup).
# If we don't give enough time for the group to stabilize, the next bounce may cause consumers to
# be shut down before they have any time to process data and we can end up with zero data making it
# through the test.
time.sleep(15)
self.source.stop()
self.sink.stop()
self.cc.stop()
# Validate at least once delivery of everything that was reported as written since we should have flushed and
# cleanly exited. Currently this only tests at least once delivery because the sink task may not have consumed
# all the messages generated by the source task. This needs to be done per-task since seqnos are not unique across
# tasks.
success = True
errors = []
allow_dups = not clean
src_messages = self.source.committed_messages()
sink_messages = self.sink.flushed_messages()
for task in range(num_tasks):
# Validate source messages
src_seqnos = [msg['seqno'] for msg in src_messages if msg['task'] == task]
# Every seqno up to the largest one we ever saw should appear. Each seqno should only appear once because clean
# bouncing should commit on rebalance.
src_seqno_max = max(src_seqnos)
self.logger.debug("Max source seqno: %d", src_seqno_max)
src_seqno_counts = Counter(src_seqnos)
missing_src_seqnos = sorted(set(range(src_seqno_max)).difference(set(src_seqnos)))
duplicate_src_seqnos = sorted(seqno for seqno, count in src_seqno_counts.items() if count > 1)
if missing_src_seqnos:
self.logger.error("Missing source sequence numbers for task " + str(task))
errors.append("Found missing source sequence numbers for task %d: %s" % (task, missing_src_seqnos))
success = False
if not allow_dups and duplicate_src_seqnos:
self.logger.error("Duplicate source sequence numbers for task " + str(task))
errors.append("Found duplicate source sequence numbers for task %d: %s" % (task, duplicate_src_seqnos))
success = False
# Validate sink messages
sink_seqnos = [msg['seqno'] for msg in sink_messages if msg['task'] == task]
# Every seqno up to the largest one we ever saw should appear. Each seqno should only appear once because
# clean bouncing should commit on rebalance.
sink_seqno_max = max(sink_seqnos)
self.logger.debug("Max sink seqno: %d", sink_seqno_max)
sink_seqno_counts = Counter(sink_seqnos)
missing_sink_seqnos = sorted(set(range(sink_seqno_max)).difference(set(sink_seqnos)))
duplicate_sink_seqnos = sorted(seqno for seqno, count in sink_seqno_counts.items() if count > 1)
if missing_sink_seqnos:
self.logger.error("Missing sink sequence numbers for task " + str(task))
errors.append("Found missing sink sequence numbers for task %d: %s" % (task, missing_sink_seqnos))
success = False
if not allow_dups and duplicate_sink_seqnos:
self.logger.error("Duplicate sink sequence numbers for task " + str(task))
errors.append("Found duplicate sink sequence numbers for task %d: %s" % (task, duplicate_sink_seqnos))
success = False
# Validate source and sink match
if sink_seqno_max > src_seqno_max:
self.logger.error("Found sink sequence number greater than any generated sink sequence number for task %d: %d > %d", task, sink_seqno_max, src_seqno_max)
errors.append("Found sink sequence number greater than any generated sink sequence number for task %d: %d > %d" % (task, sink_seqno_max, src_seqno_max))
success = False
if src_seqno_max < 1000 or sink_seqno_max < 1000:
errors.append("Not enough messages were processed: source:%d sink:%d" % (src_seqno_max, sink_seqno_max))
success = False
if not success:
self.mark_for_collect(self.cc)
# Also collect the data in the topic to aid in debugging
consumer_validator = ConsoleConsumer(self.test_context, 1, self.kafka, self.source.topic, consumer_timeout_ms=1000, print_key=True)
consumer_validator.run()
self.mark_for_collect(consumer_validator, "consumer_stdout")
assert success, "Found validation errors:\n" + "\n ".join(errors)
@cluster(num_nodes=6)
@matrix(connect_protocol=['sessioned', 'compatible', 'eager'])
def test_transformations(self, connect_protocol):
self.CONNECT_PROTOCOL = connect_protocol
self.setup_services(timestamp_type='CreateTime')
self.cc.set_configs(lambda node: self.render("connect-distributed.properties", node=node))
self.cc.start()
ts_fieldname = 'the_timestamp'
NamedConnector = namedtuple('Connector', ['name'])
source_connector = NamedConnector(name='file-src')
self.cc.create_connector({
'name': source_connector.name,
'connector.class': 'org.apache.kafka.connect.file.FileStreamSourceConnector',
'tasks.max': 1,
'file': self.INPUT_FILE,
'topic': self.TOPIC,
'transforms': 'hoistToStruct,insertTimestampField',
'transforms.hoistToStruct.type': 'org.apache.kafka.connect.transforms.HoistField$Value',
'transforms.hoistToStruct.field': 'content',
'transforms.insertTimestampField.type': 'org.apache.kafka.connect.transforms.InsertField$Value',
'transforms.insertTimestampField.timestamp.field': ts_fieldname,
})
wait_until(lambda: self.connector_is_running(source_connector), timeout_sec=30, err_msg='Failed to see connector transition to the RUNNING state')
for node in self.cc.nodes:
node.account.ssh("echo -e -n " + repr(self.FIRST_INPUTS) + " >> " + self.INPUT_FILE)
consumer = ConsoleConsumer(self.test_context, 1, self.kafka, self.TOPIC, consumer_timeout_ms=15000, print_timestamp=True)
consumer.run()
assert len(consumer.messages_consumed[1]) == len(self.FIRST_INPUT_LIST)
expected_schema = {
'type': 'struct',
'fields': [
{'field': 'content', 'type': 'string', 'optional': False},
{'field': ts_fieldname, 'name': 'org.apache.kafka.connect.data.Timestamp', 'type': 'int64', 'version': 1, 'optional': True},
],
'optional': False
}
for msg in consumer.messages_consumed[1]:
(ts_info, value) = msg.split('\t')
assert ts_info.startswith('CreateTime:')
ts = int(ts_info[len('CreateTime:'):])
obj = json.loads(value)
assert obj['schema'] == expected_schema
assert obj['payload']['content'] in self.FIRST_INPUT_LIST
assert obj['payload'][ts_fieldname] == ts
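
To make the transform chain concrete: a source line such as "foo" is hoisted into a struct by HoistField and given a timestamp by InsertField, so the JsonConverter output consumed above looks roughly like this (the timestamp value is made up):

```
transformed = {
    "schema": {
        "type": "struct",
        "fields": [
            {"field": "content", "type": "string", "optional": False},
            {"field": "the_timestamp", "name": "org.apache.kafka.connect.data.Timestamp",
             "type": "int64", "version": 1, "optional": True},
        ],
        "optional": False,
    },
    "payload": {"content": "foo", "the_timestamp": 1581648658000},
}
```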
@cluster(num_nodes=5)
@parametrize(broker_version=str(DEV_BRANCH), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='sessioned')
@parametrize(broker_version=str(LATEST_0_11_0), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='sessioned')
@parametrize(broker_version=str(LATEST_0_10_2), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='sessioned')
@parametrize(broker_version=str(LATEST_0_10_1), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='sessioned')
@parametrize(broker_version=str(LATEST_0_10_0), auto_create_topics=True, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='sessioned')
@parametrize(broker_version=str(DEV_BRANCH), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='compatible')
@parametrize(broker_version=str(LATEST_2_3), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='compatible')
@parametrize(broker_version=str(LATEST_2_2), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='compatible')
@parametrize(broker_version=str(LATEST_2_1), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='compatible')
@parametrize(broker_version=str(LATEST_2_0), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='compatible')
@parametrize(broker_version=str(LATEST_1_1), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='compatible')
@parametrize(broker_version=str(LATEST_1_0), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='compatible')
@parametrize(broker_version=str(LATEST_0_11_0), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='compatible')
@parametrize(broker_version=str(LATEST_0_10_2), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='compatible')
@parametrize(broker_version=str(LATEST_0_10_1), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='compatible')
@parametrize(broker_version=str(LATEST_0_10_0), auto_create_topics=True, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='compatible')
@parametrize(broker_version=str(DEV_BRANCH), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='eager')
@parametrize(broker_version=str(LATEST_2_3), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='eager')
@parametrize(broker_version=str(LATEST_2_2), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='eager')
@parametrize(broker_version=str(LATEST_2_1), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='eager')
@parametrize(broker_version=str(LATEST_2_0), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='eager')
@parametrize(broker_version=str(LATEST_1_1), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='eager')
@parametrize(broker_version=str(LATEST_1_0), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='eager')
@parametrize(broker_version=str(LATEST_0_11_0), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='eager')
@parametrize(broker_version=str(LATEST_0_10_2), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='eager')
@parametrize(broker_version=str(LATEST_0_10_1), auto_create_topics=False, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='eager')
@parametrize(broker_version=str(LATEST_0_10_0), auto_create_topics=True, security_protocol=SecurityConfig.PLAINTEXT, connect_protocol='eager')
def test_broker_compatibility(self, broker_version, auto_create_topics, security_protocol, connect_protocol):
"""
Verify that Connect will start up with various broker versions with various configurations.
When Connect distributed starts up, it either creates internal topics (v0.10.1.0 and after)
or relies upon the broker to auto-create the topics (v0.10.0.x and before).
"""
self.CONNECT_PROTOCOL = connect_protocol
self.setup_services(broker_version=KafkaVersion(broker_version), auto_create_topics=auto_create_topics, security_protocol=security_protocol)
self.cc.set_configs(lambda node: self.render("connect-distributed.properties", node=node))
self.cc.start()
self.logger.info("Creating connectors")
self._start_connector("connect-file-source.properties")
self._start_connector("connect-file-sink.properties")
# Generating data on the source node should generate new records and create new output on the sink node. Timeouts
# here need to be more generous than they are for standalone mode because a) it takes longer to write configs,
# do rebalancing of the group, etc., and b) without explicit leave-group support, rebalancing takes a while
for node in self.cc.nodes:
node.account.ssh("echo -e -n " + repr(self.FIRST_INPUTS) + " >> " + self.INPUT_FILE)
wait_until(lambda: self._validate_file_output(self.FIRST_INPUT_LIST), timeout_sec=70, err_msg="Data added to input file was not seen in the output file in a reasonable amount of time.")
def _validate_file_output(self, input):
input_set = set(input)
# Output needs to be collected from all nodes because we can't be sure where the tasks will be scheduled.
# Between the first and second rounds, we might even end up with half the data on each node.
output_set = set(itertools.chain(*[
[line.strip() for line in self._file_contents(node, self.OUTPUT_FILE)] for node in self.cc.nodes
]))
return input_set == output_set
def _file_contents(self, node, file):
try:
# Convert to a list here or the RemoteCommandError may be returned during a call to the generator instead of
# immediately
return list(node.account.ssh_capture("cat " + file))
except RemoteCommandError:
return []

Some files were not shown because too many files have changed in this diff.