Add km module kafka

leewei
2023-02-14 16:27:47 +08:00
parent 229140f067
commit 0b8160a714
4039 changed files with 718112 additions and 46204 deletions

vagrant/README.md Normal file

@@ -0,0 +1,134 @@
# Apache Kafka #
Using Vagrant to get up and running.
1) Install Virtual Box [https://www.virtualbox.org/](https://www.virtualbox.org/)
2) Install Vagrant >= 1.6.4 [https://www.vagrantup.com/](https://www.vagrantup.com/)
3) Install Vagrant Plugins:
    $ vagrant plugin install vagrant-hostmanager
    # Optional
    $ vagrant plugin install vagrant-cachier # Caches & shares package downloads across VMs
In the main Kafka folder, do a normal Kafka build:
    $ gradle
    $ ./gradlew jar
You can override default settings in `Vagrantfile.local`, which is a Ruby file
that is ignored by git and imported into the Vagrantfile.
One setting you likely want to enable
in `Vagrantfile.local` is `enable_dns = true` to put hostnames in the host's
/etc/hosts file. You probably want this to avoid having to use IP addresses when
addressing the cluster from outside the VMs, e.g. if you run a client on the
host. It's disabled by default since it requires `sudo` access, mucks with your
system state, and breaks with naming conflicts if you try to run multiple
clusters concurrently.
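For example, a minimal `Vagrantfile.local` enabling this could be created from the shell (an illustrative one-liner):

    $ echo 'enable_dns = true' >> Vagrantfile.local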
Now bring up the cluster:
    $ vagrant/vagrant-up.sh
    $ # If on aws, run: vagrant/vagrant-up.sh --aws
(This essentially runs `vagrant up --no-provision && vagrant hostmanager && vagrant provision`.)
We separate out the steps (bringing up the base VMs, mapping hostnames, and configuring the VMs)
due to current limitations in ZooKeeper (ZOOKEEPER-1506) that require us to
collect IPs for all nodes before starting ZooKeeper nodes. Breaking into multiple steps
also allows us to bring machines up in parallel on AWS.
Once this completes:
* Zookeeper will be running on 192.168.50.11 (and `zk1` if you used enable_dns)
* Broker 1 on 192.168.50.51 (and `broker1` if you used enable_dns)
* Broker 2 on 192.168.50.52 (and `broker2` if you used enable_dns)
* Broker 3 on 192.168.50.53 (and `broker3` if you used enable_dns)
To log into one of the machines:
    $ vagrant ssh <machineName>
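For example, with the hostnames above:

    $ vagrant ssh broker1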
You can access the brokers and ZooKeeper nodes by their IP or hostname, e.g.

    # Specify ZooKeeper node 1 by its IP: 192.168.50.11
    bin/kafka-topics.sh --create --zookeeper 192.168.50.11:2181 --replication-factor 3 --partitions 1 --topic sandbox

    # Specify brokers by their hostnames: broker1, broker2, broker3
    bin/kafka-console-producer.sh --broker-list broker1:9092,broker2:9092,broker3:9092 --topic sandbox

    # Specify brokers by their IP: 192.168.50.51, 192.168.50.52, 192.168.50.53
    bin/kafka-console-consumer.sh --bootstrap-server 192.168.50.51:9092,192.168.50.52:9092,192.168.50.53:9092 --topic sandbox --from-beginning
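To verify the topic exists and is fully replicated, you could for example describe it (an illustrative check, reusing the ZooKeeper address above):

    bin/kafka-topics.sh --describe --zookeeper 192.168.50.11:2181 --topic sandbox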
If you need to update the running cluster, you can re-run the provisioner (the
step that installs software and configures services):
    $ vagrant provision
Note that this doesn't currently ensure a fresh start -- old cluster state will
still remain intact after everything restarts. This can be useful for updating
the cluster to your most recent development version.
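Vagrant can also provision a single machine, which may be handy when iterating on one node (an illustrative example):

    $ vagrant provision broker1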
Finally, you can clean up the cluster by destroying all the VMs:
    $ vagrant destroy -f
## Configuration ##
You can override some default settings by specifying the values in
`Vagrantfile.local`. It is interpreted as a Ruby file, although you'll probably
only ever need to change a few simple configuration variables. Some values you
might want to override:
* `enable_hostmanager` - true by default; override to false if on AWS to allow parallel cluster bringup.
* `enable_dns` - Register each VM with a hostname in /etc/hosts on the
host. Hostnames are always set in /etc/hosts in the VMs, so this is only
necessary if you want to address them conveniently from the host for tasks
that aren't provided by Vagrant.
* `enable_jmx` - Whether to enable JMX ports on 800x for ZooKeeper nodes and 900x for brokers, where `x` is the node number. For example, the zk1 machine would have JMX exposed on 8001, zk2 on 8002, etc.
* `num_workers` - Generic workers that get the code (from this project), but don't start any services (no brokers, no zookeepers, etc). Useful for starting clients. Each worker will have an IP address of `192.168.50.10x` where `x` starts at `1` and increments for each worker.
* `num_zookeepers` - Size of zookeeper cluster
* `num_brokers` - Number of broker instances to run
* `ram_megabytes` - The size of each virtual machine's RAM; defaults to `1200MB`
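As a sketch, a `Vagrantfile.local` combining several of these settings (illustrative values) might look like:

    $ cat > Vagrantfile.local <<'EOF'
    enable_dns = true
    num_zookeepers = 1
    num_brokers = 3
    ram_megabytes = 2048
    EOF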
## Using Other Providers ##
### EC2 ###
Install the `vagrant-aws` plugin to provide EC2 support:
    $ vagrant plugin install vagrant-aws
Next, configure parameters in `Vagrantfile.local`. A few are *required*:
`enable_hostmanager`, `enable_dns`, `ec2_access_key`, `ec2_secret_key`, `ec2_keypair_name`, `ec2_keypair_file`, and
`ec2_security_groups`. A couple of important notes:
1. You definitely want to use `enable_dns` if you plan to run clients outside of
the cluster (e.g. from your local host). If you don't, you'll need to look up
the connection details with `vagrant ssh-config`.
2. You'll have to set up a reasonable security group yourself. You'll need to
open ports for Zookeeper (2888 & 3888 between ZK nodes, 2181 for clients) and
Kafka (9092). Beware that opening these ports to all sources (e.g. so you can
run producers/consumers locally) will allow anyone to access your Kafka
cluster. All other settings have reasonable defaults for setting up an
Ubuntu-based cluster, but you may want to customize instance type, region,
AMI, etc.
3. `ec2_access_key` and `ec2_secret_key` will use the environment variables
`AWS_ACCESS_KEY` and `AWS_SECRET_KEY` respectively if they are set and not
overridden in `Vagrantfile.local`.
4. If you're launching into a VPC, you must specify `ec2_subnet_id` (the subnet
in which to launch the nodes) and `ec2_security_groups` must be a list of
security group IDs instead of names, e.g. `sg-34fd3551` instead of
`kafka-test-cluster`.
Now start things up, but specify the aws provider:
    $ vagrant/vagrant-up.sh --aws
Your instances should get tagged with a name including your hostname to make
them identifiable and make it easier to track instances in the AWS management
console.

vagrant/aws/aws-access-keys-commands Normal file

@@ -0,0 +1,25 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
export AWS_IAM_ROLE=$(curl -s http://169.254.169.254/latest/meta-data/iam/info | grep InstanceProfileArn | cut -d '"' -f 4 | cut -d '/' -f 2)
export AWS_ACCESS_KEY=$(curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/$AWS_IAM_ROLE | grep AccessKeyId | awk -F\" '{ print $4 }')
export AWS_SECRET_KEY=$(curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/$AWS_IAM_ROLE | grep SecretAccessKey | awk -F\" '{ print $4 }')
export AWS_SESSION_TOKEN=$(curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/$AWS_IAM_ROLE | grep Token | awk -F\" '{ print $4 }')
if [ -z "$AWS_ACCESS_KEY" ]; then
echo "Failed to populate environment variables AWS_ACCESS_KEY, AWS_SECRET_KEY, and AWS_SESSION_TOKEN."
echo "AWS_IAM is currently $AWS_IAM. Double-check that this is correct. If not set, add this command to your .bashrc file:"
echo "export AWS_IAM=<my_aws_iam> # put this into your ~/.bashrc"
fi
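# Example usage (illustrative): source this file, then confirm the variables are populated:
#   . vagrant/aws/aws-access-keys-commands
#   env | grep '^AWS_'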

vagrant/aws/aws-example-Vagrantfile.local Normal file

@@ -0,0 +1,29 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Use this template Vagrantfile.local for running system tests on aws
# To use it, move it to the base kafka directory and rename
# it to Vagrantfile.local, and adjust variables as needed.
ec2_instance_type = "m3.xlarge"
ec2_spot_max_price = "0.266" # On-demand price for instance type
enable_hostmanager = false
num_zookeepers = 0
num_brokers = 0
num_workers = 9
ec2_keypair_name = 'kafkatest'
ec2_keypair_file = '../kafkatest.pem'
ec2_security_groups = ['kafkatest']
ec2_region = 'us-west-2'
ec2_ami = "ami-29ebb519"

vagrant/aws/aws-init.sh Executable file

@@ -0,0 +1,81 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This script can be used to set up a driver machine on aws from which you will run tests
# or bring up your mini Kafka cluster.
# Install dependencies
sudo apt-get install -y \
maven \
openjdk-8-jdk-headless \
build-essential \
ruby-dev \
zlib1g-dev \
realpath \
python-setuptools \
iperf \
traceroute
base_dir=`dirname $0`/../..
if [ -z `which vagrant` ]; then
echo "Installing vagrant..."
wget https://releases.hashicorp.com/vagrant/2.1.5/vagrant_2.1.5_x86_64.deb
sudo dpkg -i vagrant_2.1.5_x86_64.deb
rm -f vagrant_2.1.5_x86_64.deb
fi
# Install necessary vagrant plugins
# Note: Do NOT install vagrant-cachier since it doesn't work on AWS and only
# adds log noise
# Custom vagrant-aws with spot instance support. See https://github.com/mitchellh/vagrant-aws/issues/32
wget -nv https://s3-us-west-2.amazonaws.com/confluent-packaging-tools/vagrant-aws-0.7.2.spot.gem -P /tmp
vagrant_plugins="/tmp/vagrant-aws-0.7.2.spot.gem vagrant-hostmanager"
existing=`vagrant plugin list`
for plugin in $vagrant_plugins; do
echo $existing | grep $plugin > /dev/null
if [ $? != 0 ]; then
vagrant plugin install $plugin
fi
done
# Create Vagrantfile.local as a convenience
if [ ! -e "$base_dir/Vagrantfile.local" ]; then
cp $base_dir/vagrant/aws/aws-example-Vagrantfile.local $base_dir/Vagrantfile.local
fi
gradle="gradle-2.2.1"
if [ -z `which gradle` ] && [ ! -d $base_dir/$gradle ]; then
if [ ! -e $gradle-bin.zip ]; then
wget https://services.gradle.org/distributions/$gradle-bin.zip
fi
unzip $gradle-bin.zip
rm -rf $gradle-bin.zip
mv $gradle $base_dir/$gradle
fi
# Ensure aws access keys are in the environment when we use an EC2 driver machine
LOCAL_HOSTNAME=$(hostname -d)
if [[ ${LOCAL_HOSTNAME} =~ .*\.compute\.internal ]]; then
grep "AWS ACCESS KEYS" ~/.bashrc > /dev/null
if [ $? != 0 ]; then
echo "# --- AWS ACCESS KEYS ---" >> ~/.bashrc
echo ". `realpath $base_dir/aws/aws-access-keys-commands`" >> ~/.bashrc
echo "# -----------------------" >> ~/.bashrc
source ~/.bashrc
fi
fi
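# Example usage (illustrative): on a fresh Ubuntu EC2 driver machine, from the kafka base directory:
#   vagrant/aws/aws-init.sh
#   source ~/.bashrc   # pick up the AWS access key environment variables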

vagrant/base.sh Executable file

@@ -0,0 +1,167 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
set -ex
# The version of Kibosh to use for testing.
# If you update this, also update tests/docker/Dockerfile
export KIBOSH_VERSION=8841dd392e6fbf02986e2fb1f1ebf04df344b65a
path_to_jdk_cache() {
jdk_version=$1
echo "/tmp/jdk-${jdk_version}.tar.gz"
}
fetch_jdk_tgz() {
jdk_version=$1
path=$(path_to_jdk_cache $jdk_version)
if [ ! -e $path ]; then
mkdir -p $(dirname $path)
curl -s -L "https://s3-us-west-2.amazonaws.com/kafka-packages/jdk-${jdk_version}.tar.gz" -o $path
fi
}
JDK_MAJOR="${JDK_MAJOR:-8}"
JDK_FULL="${JDK_FULL:-8u202-linux-x64}"
if [ -z `which javac` ]; then
apt-get -y update
apt-get install -y software-properties-common python-software-properties binutils java-common
echo "===> Installing JDK..."
mkdir -p /opt/jdk
cd /opt/jdk
rm -rf $JDK_MAJOR
mkdir -p $JDK_MAJOR
cd $JDK_MAJOR
fetch_jdk_tgz $JDK_FULL
tar x --strip-components=1 -zf $(path_to_jdk_cache $JDK_FULL)
for bin in /opt/jdk/$JDK_MAJOR/bin/* ; do
name=$(basename $bin)
update-alternatives --install /usr/bin/$name $name $bin 1081 && update-alternatives --set $name $bin
done
echo -e "export JAVA_HOME=/opt/jdk/$JDK_MAJOR\nexport PATH=\$PATH:\$JAVA_HOME/bin" > /etc/profile.d/jdk.sh
echo "JDK installed: $(javac -version 2>&1)"
fi
chmod a+rw /opt
if [ -h /opt/kafka-dev ]; then
# reset symlink
rm /opt/kafka-dev
fi
ln -s /vagrant /opt/kafka-dev
get_kafka() {
version=$1
scala_version=$2
kafka_dir=/opt/kafka-$version
url=https://s3-us-west-2.amazonaws.com/kafka-packages/kafka_$scala_version-$version.tgz
# the .tgz above does not include the streams test jar hence we need to get it separately
url_streams_test=https://s3-us-west-2.amazonaws.com/kafka-packages/kafka-streams-$version-test.jar
if [ ! -d /opt/kafka-$version ]; then
pushd /tmp
curl -O $url
curl -O $url_streams_test || true
file_tgz=`basename $url`
file_streams_jar=`basename $url_streams_test` || true
tar -xzf $file_tgz
rm -rf $file_tgz
file=`basename $file_tgz .tgz`
mv $file $kafka_dir
mv $file_streams_jar $kafka_dir/libs || true
popd
fi
}
# Install Kibosh
apt-get update -y && apt-get install -y git cmake pkg-config libfuse-dev
pushd /opt
rm -rf /opt/kibosh
git clone -q https://github.com/confluentinc/kibosh.git
pushd "/opt/kibosh"
git reset --hard $KIBOSH_VERSION
mkdir "/opt/kibosh/build"
pushd "/opt/kibosh/build"
../configure && make -j 2
popd
popd
popd
# Install iperf
apt-get install -y iperf traceroute
# Test multiple Kafka versions
# We want to use the latest Scala version per Kafka version
# Previously we could not pull in Scala 2.12 builds, because Scala 2.12 requires Java 8 and we were running the system
# tests with Java 7. We have since switched to Java 8, so 2.0.0 and later use Scala 2.12.
get_kafka 0.8.2.2 2.11
chmod a+rw /opt/kafka-0.8.2.2
get_kafka 0.9.0.1 2.11
chmod a+rw /opt/kafka-0.9.0.1
get_kafka 0.10.0.1 2.11
chmod a+rw /opt/kafka-0.10.0.1
get_kafka 0.10.1.1 2.11
chmod a+rw /opt/kafka-0.10.1.1
get_kafka 0.10.2.2 2.11
chmod a+rw /opt/kafka-0.10.2.2
get_kafka 0.11.0.3 2.11
chmod a+rw /opt/kafka-0.11.0.3
get_kafka 1.0.2 2.11
chmod a+rw /opt/kafka-1.0.2
get_kafka 1.1.1 2.11
chmod a+rw /opt/kafka-1.1.1
get_kafka 2.0.1 2.12
chmod a+rw /opt/kafka-2.0.1
get_kafka 2.1.1 2.12
chmod a+rw /opt/kafka-2.1.1
get_kafka 2.2.2 2.12
chmod a+rw /opt/kafka-2.2.2
get_kafka 2.3.1 2.12
chmod a+rw /opt/kafka-2.3.1
get_kafka 2.4.0 2.12
chmod a+rw /opt/kafka-2.4.0
get_kafka 2.5.0 2.12
chmod a+rw /opt/kafka-2.5.0
# For EC2 nodes, we want to use /mnt, which should have the local disk. On local
# VMs, we can just create it if it doesn't exist and use it like we'd use
# /tmp. Eventually, we'd like to also support more directories, e.g. when EC2
# instances have multiple local disks.
if [ ! -e /mnt ]; then
mkdir /mnt
fi
chmod a+rwx /mnt
# Run ntpdate once to sync to ntp servers
# use -u option to avoid port collision in case ntp daemon is already running
ntpdate -u pool.ntp.org
# Install ntp daemon - it will automatically start on boot
apt-get -y install ntp
# Increase the ulimit
mkdir -p /etc/security/limits.d
echo "* soft nofile 128000" >> /etc/security/limits.d/nofile.conf
echo "* hard nofile 128000" >> /etc/security/limits.d/nofile.conf
ulimit -Hn 128000
ulimit -Sn 128000
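# This script is normally run by the Vagrant provisioner. To re-run it by hand on a
# VM (illustrative; assumes /vagrant is the synced kafka source tree):
#   vagrant ssh broker1 -c "sudo /vagrant/vagrant/base.sh"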

vagrant/broker.sh Executable file

@@ -0,0 +1,43 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Usage: broker.sh <broker ID> <public hostname or IP> <comma-separated list of ZooKeeper public hostname/IP:port> [jmx_port]
set -e
BROKER_ID=$1
PUBLIC_ADDRESS=$2
PUBLIC_ZOOKEEPER_ADDRESSES=$3
JMX_PORT=$4
kafka_dir=/opt/kafka-dev
cd $kafka_dir
sed \
-e 's/broker.id=0/'broker.id=$BROKER_ID'/' \
-e 's/#advertised.host.name=<hostname routable by clients>/'advertised.host.name=$PUBLIC_ADDRESS'/' \
-e 's/zookeeper.connect=localhost:2181/'zookeeper.connect=$PUBLIC_ZOOKEEPER_ADDRESSES'/' \
$kafka_dir/config/server.properties > $kafka_dir/config/server-$BROKER_ID.properties
echo "Killing server"
bin/kafka-server-stop.sh || true
sleep 5 # Because kafka-server-stop.sh doesn't actually wait
echo "Starting server"
if [[ -n $JMX_PORT ]]; then
export JMX_PORT=$JMX_PORT
export KAFKA_JMX_OPTS="-Djava.rmi.server.hostname=$PUBLIC_ADDRESS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false "
fi
bin/kafka-server-start.sh $kafka_dir/config/server-$BROKER_ID.properties 1>> /tmp/broker.log 2>> /tmp/broker.log &
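# Example invocation (illustrative values; normally invoked by the Vagrantfile provisioner):
#   vagrant/broker.sh 1 192.168.50.51 192.168.50.11:2181 9091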

vagrant/package-base-box.sh Executable file

@@ -0,0 +1,75 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This script automates the process of creating and packaging
# a new vagrant base_box. For use locally (not aws).
base_dir=`dirname $0`/..
cd $base_dir
backup_vagrantfile=backup_Vagrantfile.local
local_vagrantfile=Vagrantfile.local
# Restore original Vagrantfile.local, if it exists
function revert_vagrantfile {
rm -f $local_vagrantfile
if [ -e $backup_vagrantfile ]; then
mv $backup_vagrantfile $local_vagrantfile
fi
}
function clean_up {
echo "Cleaning up..."
vagrant destroy -f
rm -f package.box
revert_vagrantfile
}
# Name of the new base box
base_box="kafkatest-worker"
# vagrant VM name
worker_name="worker1"
echo "Destroying vagrant machines..."
vagrant destroy -f
echo "Removing $base_box from vagrant..."
vagrant box remove $base_box
echo "Bringing up a single vagrant machine from scratch..."
if [ -e $local_vagrantfile ]; then
mv $local_vagrantfile $backup_vagrantfile
fi
echo "num_workers = 1" > $local_vagrantfile
echo "num_brokers = 0" >> $local_vagrantfile
echo "num_zookeepers = 0" >> $local_vagrantfile
vagrant up
up_status=$?
if [ $up_status != 0 ]; then
echo "Failed to bring up a template vm, please try running again."
clean_up
exit $up_status
fi
echo "Packaging $worker_name..."
vagrant package $worker_name
echo "Adding new base box $base_box to vagrant..."
vagrant box add $base_box package.box
clean_up
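# Example usage (illustrative): package the base box, then point system tests at it:
#   vagrant/package-base-box.sh
#   echo 'base_box = "kafkatest-worker"' >> Vagrantfile.local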

vagrant/system-test-Vagrantfile.local Normal file

@@ -0,0 +1,26 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Use this example Vagrantfile.local for running system tests
# To use it, move it to the base kafka directory and rename
# it to Vagrantfile.local
num_zookeepers = 0
num_brokers = 0
num_workers = 9
base_box = "kafkatest-worker"
# System tests use hostnames for each worker that need to be defined in /etc/hosts on the host running ducktape
enable_dns = true
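# Example usage (illustrative, assuming this file lives at vagrant/system-test-Vagrantfile.local):
#   cp vagrant/system-test-Vagrantfile.local Vagrantfile.local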

vagrant/vagrant-up.sh Executable file

@@ -0,0 +1,266 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
set -o nounset
set -o errexit # exit script if any command exits with nonzero value
readonly PROG_NAME=$(basename $0)
readonly PROG_DIR=$(dirname $(realpath $0))
readonly INVOKE_DIR=$(pwd)
readonly ARGS="$@"
# overrideable defaults
AWS=false
PARALLEL=true
MAX_PARALLEL=5
DEBUG=false
readonly USAGE="Usage: $PROG_NAME [-h | --help] [--aws [--no-parallel] [--max-parallel MAX]]"
readonly HELP="$(cat <<EOF
Tool to bring up a vagrant cluster on local machine or aws.
-h | --help Show this help message
--aws Use if you are running in aws
--no-parallel Bring up machines serially instead of in parallel. Only applicable on aws
--max-parallel MAX Maximum number of machines to bring up in parallel. Note: only applicable on test worker machines on aws. default: $MAX_PARALLEL
--debug Enable debug information for vagrant
Approximately speaking, this wrapper script essentially wraps 2 commands:
vagrant up
vagrant hostmanager
The situation on aws is complicated by the fact that aws imposes a maximum request rate,
which effectively caps the number of machines we are able to bring up in parallel. Therefore, on aws,
this wrapper script attempts to bring up machines in small batches.
If you are seeing rate limit exceeded errors, you may need to use a reduced --max-parallel setting.
EOF
)"
function help {
echo "$USAGE"
echo "$HELP"
exit 0
}
while [[ $# -gt 0 ]]; do
key="$1"
case $key in
-h | --help)
help
;;
--aws)
AWS=true
;;
--no-parallel)
PARALLEL=false
;;
--max-parallel)
MAX_PARALLEL="$2"
shift
;;
--debug)
DEBUG=true
;;
*)
# unknown option
echo "Unknown option $1"
exit 1
;;
esac
shift # past argument or value
done
# Get a list of vagrant machines (in any state)
function read_vagrant_machines {
local ignore_state="ignore"
local reading_state="reading"
local tmp_file="tmp-$RANDOM"
local state="$ignore_state"
local machines=""
while read -r line; do
# Lines before the first empty line are ignored
# The first empty line triggers change from ignore state to reading state
# When in reading state, we parse in machine names until we hit the next empty line,
# which signals that we're done parsing
if [[ -z "$line" ]]; then
if [[ "$state" == "$ignore_state" ]]; then
state="$reading_state"
else
# all done
echo "$machines"
return
fi
continue
fi
# Parse machine name while in reading state
if [[ "$state" == "$reading_state" ]]; then
line=$(echo "$line" | cut -d ' ' -f 1)
if [[ -z "$machines" ]]; then
machines="$line"
else
machines="${machines} ${line}"
fi
fi
done < <(vagrant status)
}
# Filter "list", returning a list of strings containing pattern as a substring
function filter {
local list="$1"
local pattern="$2"
local result=""
for item in $list; do
if [[ ! -z "$(echo $item | grep "$pattern")" ]]; then
result="$result $item"
fi
done
echo "$result"
}
# Given a list of machine names, return only test worker machines
function worker {
local machines="$1"
local workers=$(filter "$machines" "worker")
workers=$(echo "$workers" | xargs) # trim leading/trailing whitespace
echo "$workers"
}
# Given a list of machine names, return only zookeeper and broker machines
function zk_broker {
local machines="$1"
local zk_broker_list=$(filter "$machines" "zk")
zk_broker_list="$zk_broker_list $(filter "$machines" "broker")"
zk_broker_list=$(echo "$zk_broker_list" | xargs) # trim leading/trailing whitespace
echo "$zk_broker_list"
}
# Run a vagrant command on batches of machines of size $group_size
# This is annoying but necessary on aws to avoid errors due to AWS request rate
# throttling
#
# Example
# $ vagrant_batch_command "vagrant up" "m1 m2 m3 m4 m5" "2"
#
# This is equivalent to running "vagrant up" on groups of machines of size 2 or less, i.e.:
# $ vagrant up m1 m2
# $ vagrant up m3 m4
# $ vagrant up m5
function vagrant_batch_command {
local vagrant_cmd="$1"
local machines="$2"
local group_size="$3"
local count=1
local m_group=""
# Using --provision flag makes this command usable both when bringing up a cluster from scratch,
# and when bringing up a halted cluster. Permissions on certain directories set during provisioning
# seem to revert when machines are halted, so --provision ensures permissions are set correctly in all cases
for machine in $machines; do
m_group="$m_group $machine"
if [[ $(expr $count % $group_size) == 0 ]]; then
# We've reached a full group
# Bring up this part of the cluster
$vagrant_cmd $m_group
m_group=""
fi
((count++))
done
# Take care of any leftover partially complete group
if [[ ! -z "$m_group" ]]; then
$vagrant_cmd $m_group
fi
}
# We assume vagrant-hostmanager is installed, but may or may not be disabled during vagrant up
# In this fashion, we ensure we run hostmanager after machines are up, and before provisioning.
# This sequence of commands is necessary for example for bringing up a multi-node zookeeper cluster
function bring_up_local {
vagrant up --no-provision
vagrant hostmanager
vagrant provision
}
function bring_up_aws {
local parallel="$1"
local max_parallel="$2"
local machines="$(read_vagrant_machines)"
case "$3" in
true)
local debug="--debug"
;;
false)
local debug=""
;;
esac
zk_broker_machines=$(zk_broker "$machines")
worker_machines=$(worker "$machines")
if [[ "$parallel" == "true" ]]; then
if [[ ! -z "$zk_broker_machines" ]]; then
# We still have to bring up zookeeper/broker nodes serially
echo "Bringing up zookeeper/broker machines serially"
vagrant up --provider=aws --no-parallel --no-provision $zk_broker_machines $debug
vagrant hostmanager --provider=aws
vagrant provision
fi
if [[ ! -z "$worker_machines" ]]; then
echo "Bringing up test worker machines in parallel"
# Try to isolate this job in its own /tmp space. See note
# below about vagrant issue
local vagrant_rsync_temp_dir=$(mktemp -d);
TMPDIR=$vagrant_rsync_temp_dir vagrant_batch_command "vagrant up $debug --provider=aws" "$worker_machines" "$max_parallel"
rm -rf $vagrant_rsync_temp_dir
vagrant hostmanager --provider=aws
fi
else
vagrant up --provider=aws --no-parallel --no-provision $debug
vagrant hostmanager --provider=aws
vagrant provision
fi
# Currently it seems that the AWS provider will always run rsync
# as part of vagrant up. However,
# https://github.com/mitchellh/vagrant/issues/7531 means it is not
# safe to do so. Since the bug doesn't seem to cause any direct
# errors, just missing data on some nodes, follow up with serial
# rsyncing to ensure we're in a clean state. Use custom TMPDIR
# values to ensure we're isolated from any other instances of this
# script that are running/ran recently and may cause different
# instances to sync to the wrong nodes
for worker in $worker_machines; do
local vagrant_rsync_temp_dir=$(mktemp -d);
TMPDIR=$vagrant_rsync_temp_dir vagrant rsync $worker;
rm -rf $vagrant_rsync_temp_dir
done
}
function main {
if [[ "$AWS" == "true" ]]; then
bring_up_aws "$PARALLEL" "$MAX_PARALLEL" "$DEBUG"
else
bring_up_local
fi
}
main
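# Example invocations (illustrative):
#   vagrant/vagrant-up.sh                          # bring up a local cluster
#   vagrant/vagrant-up.sh --aws --max-parallel 3   # on aws, smaller batches to stay under the request rate limit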

vagrant/zk.sh Executable file

@@ -0,0 +1,47 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Usage: zk.sh <zkid> <num_zk> [jmx_port]
set -e
ZKID=$1
NUM_ZK=$2
JMX_PORT=$3
kafka_dir=/opt/kafka-dev
cd $kafka_dir
cp $kafka_dir/config/zookeeper.properties $kafka_dir/config/zookeeper-$ZKID.properties
echo "initLimit=5" >> $kafka_dir/config/zookeeper-$ZKID.properties
echo "syncLimit=2" >> $kafka_dir/config/zookeeper-$ZKID.properties
echo "quorumListenOnAllIPs=true" >> $kafka_dir/config/zookeeper-$ZKID.properties
for i in `seq 1 $NUM_ZK`; do
echo "server.${i}=zk${i}:2888:3888" >> $kafka_dir/config/zookeeper-$ZKID.properties
done
mkdir -p /tmp/zookeeper
echo "$ZKID" > /tmp/zookeeper/myid
echo "Killing ZooKeeper"
bin/zookeeper-server-stop.sh || true
sleep 5 # Because zookeeper-server-stop.sh doesn't actually wait
echo "Starting ZooKeeper"
if [[ -n $JMX_PORT ]]; then
export JMX_PORT=$JMX_PORT
export KAFKA_JMX_OPTS="-Djava.rmi.server.hostname=zk$ZKID -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false "
fi
bin/zookeeper-server-start.sh config/zookeeper-$ZKID.properties 1>> /tmp/zk.log 2>> /tmp/zk.log &
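# Example invocation (illustrative values; normally run by the Vagrantfile provisioner):
#   vagrant/zk.sh 1 3 8001   # node zk1 of a 3-node ensemble, JMX on 8001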