Add km module kafka

This commit is contained in:
leewei
2023-02-14 14:57:39 +08:00
parent 229140f067
commit 469baad65b
4310 changed files with 736354 additions and 46204 deletions


@@ -0,0 +1,196 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script><!--#include virtual="../../js/templateData.js" --></script>
<script id="content-template" type="text/x-handlebars-template">
<!-- h1>Developer Guide for Kafka Streams</h1 -->
<div class="sub-nav-sticky">
<div class="sticky-top">
<!-- div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
</div -->
</div>
</div>
<div class="section" id="application-reset-tool">
<span id="streams-developer-guide-app-reset"></span><h1>Application Reset Tool<a class="headerlink" href="#application-reset-tool" title="Permalink to this headline"></a></h1>
<p>You can reset an application and force it to reprocess its data from scratch by using the application reset tool.
This can be useful for development and testing, or when fixing bugs.</p>
<p>The application reset tool handles the Kafka Streams <a class="reference internal" href="manage-topics.html#streams-developer-guide-topics-user"><span class="std std-ref">user topics</span></a> (input,
output, and intermediate topics) and <a class="reference internal" href="manage-topics.html#streams-developer-guide-topics-internal"><span class="std std-ref">internal topics</span></a> differently
when resetting the application.</p>
<p>Here&#8217;s what the application reset tool does for each topic type:</p>
<ul class="simple">
<li>Input topics: Reset offsets to the specified position (by default, to the beginning of the topic).</li>
<li>Intermediate topics: Skip to the end of the topic, i.e., set the application&#8217;s committed consumer offsets for all partitions to each partition&#8217;s <code class="docutils literal"><span class="pre">logSize</span></code> (for consumer group <code class="docutils literal"><span class="pre">application.id</span></code>).</li>
<li>Internal topics: Delete the internal topic (this automatically deletes any committed offsets).</li>
</ul>
<p>The application reset tool does not:</p>
<ul class="simple">
<li>Reset output topics of an application. If any output (or intermediate) topics are consumed by downstream
applications, it is your responsibility to adjust those downstream applications as appropriate when you reset the
upstream application.</li>
<li>Reset the local environment of your application instances. It is your responsibility to delete the local
state on any machine on which an application instance was run. See the instructions in section
<a class="reference internal" href="#streams-developer-guide-reset-local-environment"><span class="std std-ref">Step 2: Reset the local environments of your application instances</span></a> on how to do this.</li>
</ul>
<dl class="docutils">
<dt>Prerequisites</dt>
<dd><ul class="first last">
<li><p class="first">All instances of your application must be stopped. Otherwise, the application may enter an invalid state, crash, or produce incorrect results. You can verify whether the consumer group with ID <code class="docutils literal"><span class="pre">application.id</span></code> is still active by using <code class="docutils literal"><span class="pre">bin/kafka-consumer-groups</span></code>.</p>
</li>
<li><p class="first">Use this tool with care and double-check its parameters: If you provide wrong parameter values (e.g., typos in <code class="docutils literal"><span class="pre">application.id</span></code>) or specify parameters inconsistently (e.g., specify the wrong input topics for the application), this tool might invalidate the application&#8217;s state or even impact other applications, consumer groups, or your Kafka topics.</p>
</li>
<li><p class="first">You should manually delete and re-create any intermediate topics before running the application reset tool. This will free up disk space in Kafka brokers.</p>
</li>
<li><p class="first">You should delete and recreate intermediate topics before running the application reset tool, unless the following applies:</p>
<blockquote>
<div><ul class="simple">
<li>You have external downstream consumers for the application&#8217;s intermediate topics.</li>
<li>You are in a development environment where manually deleting and re-creating intermediate topics is unnecessary.</li>
</ul>
</div></blockquote>
</li>
</ul>
</dd>
</dl>
<div class="section" id="step-1-run-the-application-reset-tool">
<h2>Step 1: Run the application reset tool<a class="headerlink" href="#step-1-run-the-application-reset-tool" title="Permalink to this headline"></a></h2>
<p>Invoke the application reset tool from the command line:</p>
<div class="highlight-bash"><div class="highlight"><pre><span></span>&lt;path-to-kafka&gt;/bin/kafka-streams-application-reset
</pre></div>
</div>
<p>The tool accepts the following parameters:</p>
<div class="highlight-bash"><div class="highlight"><pre><span></span>Option <span class="o">(</span>* <span class="o">=</span> required<span class="o">)</span> Description
--------------------- -----------
* --application-id &lt;String: id&gt; The Kafka Streams application ID
<span class="o">(</span>application.id<span class="o">)</span>.
--bootstrap-servers &lt;String: urls&gt; Comma-separated list of broker urls with
format: HOST1:PORT1,HOST2:PORT2
<span class="o">(</span>default: localhost:9092<span class="o">)</span>
--by-duration &lt;String: urls&gt; Reset offsets to offset by duration from
current timestamp. Format: '<span>PnDTnHnMnS</span>'
--config-file &lt;String: file name&gt; Property file containing configs to be
passed to admin clients and embedded
consumer.
--dry-run Display the actions that would be
performed without executing the reset
commands.
--from-file &lt;String: urls&gt; Reset offsets to values defined in CSV
file.
--input-topics &lt;String: list&gt; Comma-separated list of user input
topics. For these topics, the tool will
reset the offset to the earliest
available offset.
--intermediate-topics &lt;String: list&gt; Comma-separated list of intermediate user
topics <span class="o">(</span>topics used in the through<span class="o">()</span>
method<span class="o">)</span>. For these topics, the tool
will skip to the end.
--shift-by &lt;Long: number-of-offsets&gt; Reset offsets shifting current offset by
'n', where 'n' can be positive or
negative
--to-datetime &lt;String&gt; Reset offsets to offset from datetime.
Format: 'YYYY-MM-DDTHH:mm:SS.sss'
--to-earliest Reset offsets to earliest offset.
--to-latest Reset offsets to latest offset.
--to-offset &lt;Long&gt; Reset offsets to a specific offset.
--zookeeper Zookeeper option is deprecated by
bootstrap.servers, as the reset tool
would no longer access Zookeeper
directly.
</pre></div>
</div>
<p>The following options control how offsets are reset for <code>input-topics</code>:</p>
<ul>
<li> by-duration</li>
<li> from-file</li>
<li> shift-by</li>
<li> to-datetime</li>
<li> to-earliest</li>
<li> to-latest</li>
<li> to-offset</li>
</ul>
<p>Only one of these options can be specified. If none is given, <code>to-earliest</code> is used by default.</p>
<p>All the other parameters can be combined as needed. For example, if you want to restart an application from an
empty internal state, but not reprocess previous data, simply omit the parameters <code class="docutils literal"><span class="pre">--input-topics</span></code> and
<code class="docutils literal"><span class="pre">--intermediate-topics</span></code>.</p>
</div>
<div class="section" id="step-2-reset-the-local-environments-of-your-application-instances">
<span id="streams-developer-guide-reset-local-environment"></span><h2>Step 2: Reset the local environments of your application instances<a class="headerlink" href="#step-2-reset-the-local-environments-of-your-application-instances" title="Permalink to this headline"></a></h2>
<p>For a complete application reset, you must delete the application&#8217;s local state directory on any machines where the
application instance was run. You must do this before restarting an application instance on the same machine. You can
use either of these methods:</p>
<ul class="simple">
<li>The API method <code class="docutils literal"><span class="pre">KafkaStreams#cleanUp()</span></code> in your application code (see the sketch after this list).</li>
<li>Manually delete the corresponding local state directory (default location: <code class="docutils literal"><span class="pre">/tmp/kafka-streams/&lt;application.id&gt;</span></code>). For more information, see <a href="/{{version}}/javadoc/org/apache/kafka/streams/StreamsConfig.html#STATE_DIR_CONFIG">Streams</a> javadocs.</li>
</ul>
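<p>For example, the first option, calling <code class="docutils literal"><span class="pre">KafkaStreams#cleanUp()</span></code>, could look like the following minimal sketch. The application ID, broker address, and topology are placeholders for your own application code; <code class="docutils literal"><span class="pre">cleanUp()</span></code> may only be called while the instance is not running:</p>
<pre class="brush: java;">
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

Properties settings = new Properties();
settings.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-first-streams-application");
settings.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker1:9092");

StreamsBuilder builder = new StreamsBuilder();
// ... build your processing topology here ...

KafkaStreams streams = new KafkaStreams(builder.build(), settings);
// Delete this instance's local state stores; call this only while the
// instance is not running, i.e. before start() (or after close()).
streams.cleanUp();
streams.start();
</pre>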
</div>
</div>
</div>
</div>
<div class="pagination">
<a href="/{{version}}/documentation/streams/developer-guide/security" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="/{{version}}/documentation/streams/upgrade-guide" class="pagination__btn pagination__btn__next">Next</a>
</div>
</script>
<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
//sticky secondary nav
var $navbar = $(".sub-nav-sticky"),
y_pos = $navbar.offset().top,
height = $navbar.height();
$(window).scroll(function() {
var scrollTop = $(window).scrollTop();
if (scrollTop > y_pos - height) {
$navbar.addClass("navbar-fixed")
} else if (scrollTop <= y_pos) {
$navbar.removeClass("navbar-fixed")
}
});
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>


@@ -0,0 +1,858 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script><!--#include virtual="../../js/templateData.js" --></script>
<script id="content-template" type="text/x-handlebars-template">
<!-- h1>Developer Guide for Kafka Streams</h1 -->
<div class="sub-nav-sticky">
<div class="sticky-top">
<!-- div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
</div -->
</div>
</div>
<div class="section" id="configuring-a-streams-application">
<span id="streams-developer-guide-configuration"></span><h1>Configuring a Streams Application<a class="headerlink" href="#configuring-a-streams-application" title="Permalink to this headline"></a></h1>
<p>Kafka and Kafka Streams configuration parameters must be set before using Streams. You configure Kafka Streams by specifying parameters in a <code class="docutils literal"><span class="pre">java.util.Properties</span></code> instance, which is then passed to the <code class="docutils literal"><span class="pre">KafkaStreams</span></code> constructor (see the sketch after the steps below).</p>
<ol class="arabic">
<li><p class="first">Create a <code class="docutils literal"><span class="pre">java.util.Properties</span></code> instance.</p>
</li>
<li><p class="first">Set the <a class="reference internal" href="#streams-developer-guide-required-configs"><span class="std std-ref">parameters</span></a>. For example:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Properties</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.StreamsConfig</span><span class="o">;</span>
<span class="n">Properties</span> <span class="n">settings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="c1">// Set a few key parameters</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">APPLICATION_ID_CONFIG</span><span class="o">,</span> <span class="s">&quot;my-first-streams-application&quot;</span><span class="o">);</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">BOOTSTRAP_SERVERS_CONFIG</span><span class="o">,</span> <span class="s">&quot;kafka-broker1:9092&quot;</span><span class="o">);</span>
<span class="c1">// Any further settings</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(...</span> <span class="o">,</span> <span class="o">...);</span>
</pre></div>
</div>
</li>
</ol>
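<p>Once the <code class="docutils literal"><span class="pre">Properties</span></code> instance is populated, you pass it to the <code class="docutils literal"><span class="pre">KafkaStreams</span></code> constructor together with your topology. A minimal sketch is shown below; the topology built here is only a placeholder for your own processing logic:</p>
<pre class="brush: java;">
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;

StreamsBuilder builder = new StreamsBuilder();
// ... define your processing topology here, e.g. builder.stream("input-topic") ...
Topology topology = builder.build();

// The parameters set above are applied when the client is created.
KafkaStreams streams = new KafkaStreams(topology, settings);
streams.start();
</pre>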
<div class="section" id="configuration-parameter-reference">
<span id="streams-developer-guide-required-configs"></span><h2>Configuration parameter reference<a class="headerlink" href="#configuration-parameter-reference" title="Permalink to this headline"></a></h2>
<p>This section contains the most common Streams configuration parameters. For a full reference, see the <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/StreamsConfig.html">Streams</a> Javadocs.</p>
<div class="contents local topic" id="contents">
<ul class="simple">
<li><a class="reference internal" href="#required-configuration-parameters" id="id3">Required configuration parameters</a><ul>
<li><a class="reference internal" href="#application-id" id="id4">application.id</a></li>
<li><a class="reference internal" href="#bootstrap-servers" id="id5">bootstrap.servers</a></li>
</ul>
</li>
<li><a class="reference internal" href="#optional-configuration-parameters" id="id6">Optional configuration parameters</a><ul>
<li><a class="reference internal" href="#default-deserialization-exception-handler" id="id7">default.deserialization.exception.handler</a></li>
<li><a class="reference internal" href="#default-production-exception-handler" id="id24">default.production.exception.handler</a></li>
<li><a class="reference internal" href="#default-key-serde" id="id8">default.key.serde</a></li>
<li><a class="reference internal" href="#default-value-serde" id="id9">default.value.serde</a></li>
<li><a class="reference internal" href="#num-standby-replicas" id="id10">num.standby.replicas</a></li>
<li><a class="reference internal" href="#num-stream-threads" id="id11">num.stream.threads</a></li>
<li><a class="reference internal" href="#partition-grouper" id="id12">partition.grouper</a></li>
<li><a class="reference internal" href="#processing-guarantee" id="id25">processing.guarantee</a></li>
<li><a class="reference internal" href="#replication-factor" id="id13">replication.factor</a></li>
<li><a class="reference internal" href="#state-dir" id="id14">state.dir</a></li>
<li><a class="reference internal" href="#timestamp-extractor" id="id15">timestamp.extractor</a></li>
</ul>
</li>
<li><a class="reference internal" href="#kafka-consumers-and-producer-configuration-parameters" id="id16">Kafka consumers and producer configuration parameters</a><ul>
<li><a class="reference internal" href="#naming" id="id17">Naming</a></li>
<li><a class="reference internal" href="#default-values" id="id18">Default Values</a></li>
<li><a class="reference internal" href="#enable-auto-commit" id="id19">enable.auto.commit</a></li>
<li><a class="reference internal" href="#rocksdb-config-setter" id="id20">rocksdb.config.setter</a></li>
</ul>
</li>
<li><a class="reference internal" href="#recommended-configuration-parameters-for-resiliency" id="id21">Recommended configuration parameters for resiliency</a><ul>
<li><a class="reference internal" href="#acks" id="id22">acks</a></li>
<li><a class="reference internal" href="#id2" id="id23">replication.factor</a></li>
</ul>
</li>
</ul>
</div>
<div class="section" id="required-configuration-parameters">
<h3><a class="toc-backref" href="#id3">Required configuration parameters</a><a class="headerlink" href="#required-configuration-parameters" title="Permalink to this headline"></a></h3>
<p>Here are the required Streams configuration parameters.</p>
<table border="1" class="non-scrolling-table docutils">
<thead valign="bottom">
<tr class="row-odd"><th class="head">Parameter Name</th>
<th class="head">Importance</th>
<th class="head" colspan="2">Description</th>
<th class="head">Default Value</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td>application.id</td>
<td>Required</td>
<td colspan="2">An identifier for the stream processing application. Must be unique within the Kafka cluster.</td>
<td>None</td>
</tr>
<tr class="row-odd"><td>bootstrap.servers</td>
<td>Required</td>
<td colspan="2">A list of host/port pairs to use for establishing the initial connection to the Kafka cluster.</td>
<td>None</td>
</tr>
</tbody>
</table>
<div class="section" id="application-id">
<h4><a class="toc-backref" href="#id4">application.id</a><a class="headerlink" href="#application-id" title="Permalink to this headline"></a></h4>
<blockquote>
<div><p>(Required) The application ID. Each stream processing application must have a unique ID. The same ID must be given to
all instances of the application. It is recommended to use only alphanumeric characters, <code class="docutils literal"><span class="pre">.</span></code> (dot), <code class="docutils literal"><span class="pre">-</span></code> (hyphen), and <code class="docutils literal"><span class="pre">_</span></code> (underscore). Examples: <code class="docutils literal"><span class="pre">&quot;hello_world&quot;</span></code>, <code class="docutils literal"><span class="pre">&quot;hello_world-v1.0.0&quot;</span></code></p>
<p>This ID is used in the following places to isolate resources used by the application from others:</p>
<ul class="simple">
<li>As the default Kafka consumer and producer <code class="docutils literal"><span class="pre">client.id</span></code> prefix</li>
<li>As the Kafka consumer <code class="docutils literal"><span class="pre">group.id</span></code> for coordination</li>
<li>As the name of the subdirectory in the state directory (cf. <code class="docutils literal"><span class="pre">state.dir</span></code>)</li>
<li>As the prefix of internal Kafka topic names</li>
</ul>
<dl class="docutils">
<dt>Tip:</dt>
<dd>When an application is updated, the <code class="docutils literal"><span class="pre">application.id</span></code> should be changed unless you want to reuse the existing data in internal topics and state stores.
For example, you could embed the version information within <code class="docutils literal"><span class="pre">application.id</span></code>, as <code class="docutils literal"><span class="pre">my-app-v1.0.0</span></code> and <code class="docutils literal"><span class="pre">my-app-v1.0.2</span></code>.</dd>
</dl>
</div></blockquote>
</div>
<div class="section" id="bootstrap-servers">
<h4><a class="toc-backref" href="#id5">bootstrap.servers</a><a class="headerlink" href="#bootstrap-servers" title="Permalink to this headline"></a></h4>
<blockquote>
<div><p>(Required) The Kafka bootstrap servers. This is the same <a class="reference external" href="http://kafka.apache.org/documentation.html#producerconfigs">setting</a> that is used by the underlying producer and consumer clients to connect to the Kafka cluster.
Example: <code class="docutils literal"><span class="pre">&quot;kafka-broker1:9092,kafka-broker2:9092&quot;</span></code>.</p>
<dl class="docutils">
<dt>Tip:</dt>
<dd>Kafka Streams applications can only communicate with a single Kafka cluster specified by this config value.
Future versions of Kafka Streams will support connecting to different Kafka clusters for reading input
streams and writing output streams.</dd>
</dl>
</div></blockquote>
</div>
</div>
<div class="section" id="optional-configuration-parameters">
<span id="streams-developer-guide-optional-configs"></span><h3><a class="toc-backref" href="#id6">Optional configuration parameters</a><a class="headerlink" href="#optional-configuration-parameters" title="Permalink to this headline"></a></h3>
<p>Here are the optional <a href="/{{version}}/javadoc/org/apache/kafka/streams/StreamsConfig.html">Streams</a> configuration parameters, sorted by level of importance:</p>
<blockquote>
<div><ul class="simple">
<li>High: These parameters can have a significant impact on performance. Take care when deciding the values of these parameters.</li>
<li>Medium: These parameters can have some impact on performance. Your specific environment will determine how much tuning effort should be focused on these parameters.</li>
<li>Low: These parameters have a less general or less significant impact on performance.</li>
</ul>
</div></blockquote>
<table border="1" class="non-scrolling-table docutils">
<thead valign="bottom">
<tr class="row-odd"><th class="head">Parameter Name</th>
<th class="head">Importance</th>
<th class="head" colspan="2">Description</th>
<th class="head">Default Value</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td>application.server</td>
<td>Low</td>
<td colspan="2">A host:port pair pointing to an embedded user defined endpoint that can be used for discovering the locations of
state stores within a single Kafka Streams application. The value of this must be different for each instance
of the application.</td>
<td>the empty string</td>
</tr>
<tr class="row-odd"><td>buffered.records.per.partition</td>
<td>Low</td>
<td colspan="2">The maximum number of records to buffer per partition.</td>
<td>1000</td>
</tr>
<tr class="row-even"><td>cache.max.bytes.buffering</td>
<td>Medium</td>
<td colspan="2">Maximum number of memory bytes to be used for record caches across all threads.</td>
<td>10485760 bytes</td>
</tr>
<tr class="row-odd"><td>client.id</td>
<td>Medium</td>
<td colspan="2">An ID string to pass to the server when making requests.
(This setting is passed to the consumer/producer clients used internally by Kafka Streams.)</td>
<td>the empty string</td>
</tr>
<tr class="row-even"><td>commit.interval.ms</td>
<td>Low</td>
<td colspan="2">The frequency with which to save the position (offsets in source topics) of tasks.</td>
<td>30000 milliseconds</td>
</tr>
<tr class="row-odd"><td>default.deserialization.exception.handler</td>
<td>Medium</td>
<td colspan="2">Exception handling class that implements the <code class="docutils literal"><span class="pre">DeserializationExceptionHandler</span></code> interface.</td>
<td><code class="docutils literal"><span class="pre">LogAndContinueExceptionHandler</span></code></td>
</tr>
<tr class="row-even"><td>default.production.exception.handler</td>
<td>Medium</td>
<td colspan="2">Exception handling class that implements the <code class="docutils literal"><span class="pre">ProductionExceptionHandler</span></code> interface.</td>
<td><code class="docutils literal"><span class="pre">DefaultProductionExceptionHandler</span></code></td>
</tr>
<tr class="row-odd"><td>key.serde</td>
<td>Medium</td>
<td colspan="2">Default serializer/deserializer class for record keys, implements the <code class="docutils literal"><span class="pre">Serde</span></code> interface (see also value.serde).</td>
<td><code class="docutils literal"><span class="pre">Serdes.ByteArray().getClass().getName()</span></code></td>
</tr>
<tr class="row-even"><td>metric.reporters</td>
<td>Low</td>
<td colspan="2">A list of classes to use as metrics reporters.</td>
<td>the empty list</td>
</tr>
<tr class="row-odd"><td>metrics.num.samples</td>
<td>Low</td>
<td colspan="2">The number of samples maintained to compute metrics.</td>
<td>2</td>
</tr>
<tr class="row-even"><td>metrics.recording.level</td>
<td>Low</td>
<td colspan="2">The highest recording level for metrics.</td>
<td><code class="docutils literal"><span class="pre">INFO</span></code></td>
</tr>
<tr class="row-odd"><td>metrics.sample.window.ms</td>
<td>Low</td>
<td colspan="2">The window of time a metrics sample is computed over.</td>
<td>30000 milliseconds</td>
</tr>
<tr class="row-even"><td>num.standby.replicas</td>
<td>Medium</td>
<td colspan="2">The number of standby replicas for each task.</td>
<td>0</td>
</tr>
<tr class="row-odd"><td>num.stream.threads</td>
<td>Medium</td>
<td colspan="2">The number of threads to execute stream processing.</td>
<td>1</td>
</tr>
<tr class="row-even"><td>partition.grouper</td>
<td>Low</td>
<td colspan="2">Partition grouper class that implements the <code class="docutils literal"><span class="pre">PartitionGrouper</span></code> interface.</td>
<td>See <a class="reference internal" href="#streams-developer-guide-partition-grouper"><span class="std std-ref">Partition Grouper</span></a></td>
</tr>
<tr class="row-even"><td>processing.guarantee</td>
<td>Low</td>
<td colspan="2">The processing mode. Can be either <code class="docutils literal"><span class="pre">"at_least_once"</span></code> (default) or <code class="docutils literal"><span class="pre">"exactly_once"</span></code>.
<td>See <a class="reference internal" href="#streams-developer-guide-processing-guarantedd"><span class="std std-ref">Processing Guarantee</span></a></td>
</tr>
<tr class="row-odd"><td>poll.ms</td>
<td>Low</td>
<td colspan="2">The amount of time in milliseconds to block waiting for input.</td>
<td>100 milliseconds</td>
</tr>
<tr class="row-even"><td>replication.factor</td>
<td>High</td>
<td colspan="2">The replication factor for changelog topics and repartition topics created by the application.</td>
<td>1</td>
</tr>
<tr class="row-even"><td>retries</td>
<td>Medium</td>
<td colspan="2">The number of retries for broker requests that return a retryable error. </td>
<td>0</td>
</tr>
<tr class="row-even"><td>retry.backoff.ms</td>
<td>Medium</td>
<td colspan="2">The amount of time in milliseconds, before a request is retried. This applies if the <code class="docutils literal"><span class="pre">retries</span></code> parameter is configured to be greater than 0. </td>
<td>100</td>
</tr>
<tr class="row-odd"><td>state.cleanup.delay.ms</td>
<td>Low</td>
<td colspan="2">The amount of time in milliseconds to wait before deleting state when a partition has migrated.</td>
<td>600000 milliseconds</td>
</tr>
<tr class="row-even"><td>state.dir</td>
<td>High</td>
<td colspan="2">Directory location for state stores.</td>
<td><code class="docutils literal"><span class="pre">/tmp/kafka-streams</span></code></td>
</tr>
<tr class="row-odd"><td>timestamp.extractor</td>
<td>Medium</td>
<td colspan="2">Timestamp extractor class that implements the <code class="docutils literal"><span class="pre">TimestampExtractor</span></code> interface.</td>
<td>See <a class="reference internal" href="#streams-developer-guide-timestamp-extractor"><span class="std std-ref">Timestamp Extractor</span></a></td>
</tr>
<tr class="row-even"><td>upgrade.from</td>
<td>Medium</td>
<td colspan="2">The version you are upgrading from during a rolling upgrade.</td>
<td>See <a class="reference internal" href="#streams-developer-guide-upgrade-from"><span class="std std-ref">Upgrade From</span></a></td>
</tr>
<tr class="row-odd"><td>value.serde</td>
<td>Medium</td>
<td colspan="2">Default serializer/deserializer class for record values, implements the <code class="docutils literal"><span class="pre">Serde</span></code> interface (see also key.serde).</td>
<td><code class="docutils literal"><span class="pre">Serdes.ByteArray().getClass().getName()</span></code></td>
</tr>
<tr class="row-even"><td>windowstore.changelog.additional.retention.ms</td>
<td>Low</td>
<td colspan="2">Added to a windows maintainMs to ensure data is not deleted from the log prematurely. Allows for clock drift.</td>
<td>86400000 milliseconds = 1 day</td>
</tr>
</tbody>
</table>
<div class="section" id="default-deserialization-exception-handler">
<span id="streams-developer-guide-deh"></span><h4><a class="toc-backref" href="#id7">default.deserialization.exception.handler</a><a class="headerlink" href="#default-deserialization-exception-handler" title="Permalink to this headline"></a></h4>
<blockquote>
<div><p>The default deserialization exception handler allows you to manage records that fail to deserialize. This
can be caused by corrupt data, incorrect serialization logic, or unhandled record types. The implemented exception
handler needs to return a <code>FAIL</code> or <code>CONTINUE</code> depending on the record and the exception thrown. Returning
<code>FAIL</code> will signal that Streams should shut down and <code>CONTINUE</code> will signal that Streams should ignore the issue
and continue processing. The following library built-in exception handlers are available:</p>
<ul class="simple">
<li><a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/errors/LogAndContinueExceptionHandler.html">LogAndContinueExceptionHandler</a>:
This handler logs the deserialization exception and then signals the processing pipeline to continue processing more records.
This log-and-skip strategy allows Kafka Streams to make progress instead of failing if there are records that fail
to deserialize.</li>
<li><a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/errors/LogAndFailExceptionHandler.html">LogAndFailExceptionHandler</a>.
This handler logs the deserialization exception and then signals the processing pipeline to stop processing more records.</li>
</ul>
<p>You can also provide your own customized exception handler besides the library provided ones to meet your needs. For example, you can choose to forward corrupt
records into a quarantine topic (think: a "dead letter queue") for further processing. To do this, use the Producer API to write a corrupted record directly to
the quarantine topic. To be more concrete, you can create a separate <code>KafkaProducer</code> object outside the Streams client, and pass in this object
as well as the dead letter queue topic name into the <code>Properties</code> map, which then can be retrieved from the <code>configure</code> function call.
The drawback of this approach is that "manual" writes are side effects that are invisible to the Kafka Streams runtime library,
so they do not benefit from the end-to-end processing guarantees of the Streams API:</p>
<pre class="brush: java;">
public class SendToDeadLetterQueueExceptionHandler implements DeserializationExceptionHandler {
KafkaProducer&lt;byte[], byte[]&gt; dlqProducer;
String dlqTopic;
@Override
public DeserializationHandlerResponse handle(final ProcessorContext context,
final ConsumerRecord&lt;byte[], byte[]&gt; record,
final Exception exception) {
log.warn("Exception caught during Deserialization, sending to the dead queue topic; " +
"taskId: {}, topic: {}, partition: {}, offset: {}",
context.taskId(), record.topic(), record.partition(), record.offset(),
exception);
dlqProducer.send(new ProducerRecord&lt;&gt;(dlqTopic, record.timestamp(), record.key(), record.value(), record.headers())).get();
return DeserializationHandlerResponse.CONTINUE;
}
@Override
public void configure(final Map&lt;String, ?&gt; configs) {
dlqProducer = .. // get a producer from the configs map
dlqTopic = .. // get the topic name from the configs map
}
}
</pre>
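<p>To use such a handler, register it in the Streams configuration. The configuration key used below to pass the dead letter queue topic name (<code>dead.letter.queue.topic</code>) is a made-up example; any custom key you add to the configuration is handed to the handler's <code>configure()</code> method:</p>
<pre class="brush: java;">
Properties settings = new Properties();
// other various Kafka Streams settings, e.g. bootstrap servers, application ID, etc.
settings.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
             SendToDeadLetterQueueExceptionHandler.class);
// Hypothetical custom key; it is passed through to the handler's configure() method.
settings.put("dead.letter.queue.topic", "my-app-dlq");
</pre>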
</div></blockquote>
</div>
<div class="section" id="default-production-exception-handler">
<span id="streams-developer-guide-peh"></span><h4><a class="toc-backref" href="#id24">default.production.exception.handler</a><a class="headerlink" href="#default-production-exception-handler" title="Permalink to this headline"></a></h4>
<blockquote>
<div><p>The default production exception handler allows you to manage exceptions triggered when trying to interact with a broker
such as attempting to produce a record that is too large. By default, Kafka provides and uses the <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/errors/DefaultProductionExceptionHandler.html">DefaultProductionExceptionHandler</a>
that always fails when these exceptions occur.</p>
<p>Each exception handler can return a <code>FAIL</code> or <code>CONTINUE</code> depending on the record and the exception thrown. Returning <code>FAIL</code> will signal that Streams should shut down and <code>CONTINUE</code> will signal that Streams
should ignore the issue and continue processing. If you want to provide an exception handler that always ignores records that are too large, you could implement something like the following:</p>
<pre class="brush: java;">
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.common.errors.RecordTooLargeException;
import org.apache.kafka.streams.errors.ProductionExceptionHandler;
import org.apache.kafka.streams.errors.ProductionExceptionHandler.ProductionExceptionHandlerResponse;
public class IgnoreRecordTooLargeHandler implements ProductionExceptionHandler {
public void configure(Map&lt;String, Object&gt; config) {}
public ProductionExceptionHandlerResponse handle(final ProducerRecord&lt;byte[], byte[]&gt; record,
final Exception exception) {
if (exception instanceof RecordTooLargeException) {
return ProductionExceptionHandlerResponse.CONTINUE;
} else {
return ProductionExceptionHandlerResponse.FAIL;
}
}
}
Properties settings = new Properties();
// other various kafka streams settings, e.g. bootstrap servers, application id, etc
settings.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG,
IgnoreRecordTooLargeHandler.class);</pre></div>
</blockquote>
</div>
<div class="section" id="default-key-serde">
<h4><a class="toc-backref" href="#id8">default.key.serde</a><a class="headerlink" href="#default-key-serde" title="Permalink to this headline"></a></h4>
<blockquote>
<div><p>The default Serializer/Deserializer class for record keys. Serialization and deserialization in Kafka Streams happens
whenever data needs to be materialized, for example:</p>
<blockquote>
<div><ul class="simple">
<li>Whenever data is read from or written to a <em>Kafka topic</em> (e.g., via the <code class="docutils literal"><span class="pre">StreamsBuilder#stream()</span></code> and <code class="docutils literal"><span class="pre">KStream#to()</span></code> methods).</li>
<li>Whenever data is read from or written to a <em>state store</em>.</li>
</ul>
<p>This is discussed in more detail in <a class="reference internal" href="datatypes.html#streams-developer-guide-serdes"><span class="std std-ref">Data types and serialization</span></a>.</p>
</div></blockquote>
</div></blockquote>
</div>
<div class="section" id="default-value-serde">
<h4><a class="toc-backref" href="#id9">default.value.serde</a><a class="headerlink" href="#default-value-serde" title="Permalink to this headline"></a></h4>
<blockquote>
<div><p>The default Serializer/Deserializer class for record values. Serialization and deserialization in Kafka Streams
happens whenever data needs to be materialized, for example:</p>
<ul class="simple">
<li>Whenever data is read from or written to a <em>Kafka topic</em> (e.g., via the <code class="docutils literal"><span class="pre">StreamsBuilder#stream()</span></code> and <code class="docutils literal"><span class="pre">KStream#to()</span></code> methods).</li>
<li>Whenever data is read from or written to a <em>state store</em>.</li>
</ul>
<p>This is discussed in more detail in <a class="reference internal" href="datatypes.html#streams-developer-guide-serdes"><span class="std std-ref">Data types and serialization</span></a>.</p>
</div></blockquote>
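<p>For example, an application whose record keys and values are both strings could set the default serdes as follows (a minimal sketch; adjust the serde classes to your actual data types):</p>
<pre class="brush: java;">
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsConfig;

Properties settings = new Properties();
// Default serdes for record keys and values; can be overridden per operation.
settings.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
settings.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
</pre>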
</div>
<div class="section" id="num-standby-replicas">
<span id="streams-developer-guide-standby-replicas"></span><h4><a class="toc-backref" href="#id10">num.standby.replicas</a><a class="headerlink" href="#num-standby-replicas" title="Permalink to this headline"></a></h4>
<blockquote>
<div>The number of standby replicas. Standby replicas are shadow copies of local state stores. Kafka Streams attempts to create the
specified number of replicas per store and keep them up to date as long as there are enough instances running.
Standby replicas are used to minimize the latency of task failover. A task that was previously running on a failed instance is
preferentially restarted on an instance that hosts its standby replicas, so that the local state store restoration from the
changelog is minimized. Details about how Kafka Streams makes use of the standby replicas to minimize the cost of
resuming tasks on failover can be found in the <a class="reference internal" href="../architecture.html#streams_architecture_state"><span class="std std-ref">State</span></a> section.</div></blockquote>
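<p>For example, to keep one standby replica per state store (which, per the note below, implies running at least two application instances), you could set:</p>
<pre class="brush: java;">
Properties settings = new Properties();
// One standby replica per task; requires n+1 = 2 running KafkaStreams instances to take effect.
settings.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
</pre>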
</div>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">If you enable <cite>n</cite> standby tasks, you need to provision <cite>n+1</cite> <code class="docutils literal"><span class="pre">KafkaStreams</span></code>
instances.</p>
</div>
<div class="section" id="num-stream-threads">
<h4><a class="toc-backref" href="#id11">num.stream.threads</a><a class="headerlink" href="#num-stream-threads" title="Permalink to this headline"></a></h4>
<blockquote>
<div>This specifies the number of stream threads in an instance of the Kafka Streams application. The stream processing code runs in these threads.
For more information about the Kafka Streams threading model, see <a class="reference internal" href="../architecture.html#streams_architecture_threads"><span class="std std-ref">Threading Model</span></a>.</div></blockquote>
</div>
<div class="section" id="partition-grouper">
<span id="streams-developer-guide-partition-grouper"></span><h4><a class="toc-backref" href="#id12">partition.grouper</a><a class="headerlink" href="#partition-grouper" title="Permalink to this headline"></a></h4>
<blockquote>
<div>A partition grouper creates a list of stream tasks from the partitions of source topics, where each created task is assigned with a group of source topic partitions.
The default implementation provided by Kafka Streams is <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/processor/DefaultPartitionGrouper.html">DefaultPartitionGrouper</a>.
It assigns each task with one partition for each of the source topic partitions. The generated number of tasks equals the largest
number of partitions among the input topics. Usually an application does not need to customize the partition grouper.</div></blockquote>
</div>
<div class="section" id="processing-guarantee">
<span id="streams-developer-guide-processing-guarantee"></span><h4><a class="toc-backref" href="#id25">processing.guarantee</a><a class="headerlink" href="#processing-guarantee" title="Permalink to this headline"></a></h4>
<blockquote>
<div>The processing guarantee that should be used. Possible values are <code class="docutils literal"><span class="pre">"at_least_once"</span></code> (default) and <code class="docutils literal"><span class="pre">"exactly_once"</span></code>.
Note that if exactly-once processing is enabled, the default for parameter <code class="docutils literal"><span class="pre">commit.interval.ms</span></code> changes to 100ms.
Additionally, consumers are configured with <code class="docutils literal"><span class="pre">isolation.level="read_committed"</span></code>
and producers are configured with <code class="docutils literal"><span class="pre">retries=Integer.MAX_VALUE</span></code>, <code class="docutils literal"><span class="pre">enable.idempotence=true</span></code>,
and <code class="docutils literal"><span class="pre">max.in.flight.requests.per.connection=1</span></code> per default.
Note that by default exactly-once processing requires a cluster of at least three brokers, which is the recommended setting for production.
For development you can change this by adjusting the broker settings <code class="docutils literal"><span class="pre">transaction.state.log.replication.factor</span></code> and <code class="docutils literal"><span class="pre">transaction.state.log.min.isr</span></code> to the number of brokers you want to use.
For more details see <a href="../core-concepts#streams_processing_guarantee">Processing Guarantees</a>.
</div></blockquote>
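<p>For example, to enable exactly-once processing you could set the following (a sketch; keep in mind the broker-side requirements described above for production clusters):</p>
<pre class="brush: java;">
Properties settings = new Properties();
// Switch from the default "at_least_once" to "exactly_once".
// This also changes the default commit.interval.ms to 100 ms.
settings.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
</pre>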
</div>
<div class="section" id="replication-factor">
<span id="replication-factor-parm"></span><h4><a class="toc-backref" href="#id13">replication.factor</a><a class="headerlink" href="#replication-factor" title="Permalink to this headline"></a></h4>
<blockquote>
<div><p>This specifies the replication factor of internal topics that Kafka Streams creates when local states are used or a stream is
repartitioned for aggregation. Replication is important for fault tolerance. Without replication even a single broker failure
may prevent progress of the stream processing application. It is recommended to use a similar replication factor as source topics.</p>
<dl class="docutils">
<dt>Recommendation:</dt>
<dd>Increase the replication factor to 3 to ensure that the internal Kafka Streams topic can tolerate up to 2 broker failures.
Note that you will require more storage space as well (3 times more with the replication factor of 3).</dd>
</dl>
</div></blockquote>
</div>
<div class="section" id="state-dir">
<h4><a class="toc-backref" href="#id14">state.dir</a><a class="headerlink" href="#state-dir" title="Permalink to this headline"></a></h4>
<blockquote>
<div>The state directory. Kafka Streams persists local states under the state directory. Each application has a subdirectory on its hosting
machine that is located under the state directory. The name of the subdirectory is the application ID. The state stores associated
with the application are created under this subdirectory. When running multiple instances of the same application on a single machine,
this path must be unique for each such instance.</div>
</blockquote>
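<p>For example, when running two instances of the same application on one machine, each instance could be given its own state directory (the path below is a made-up example):</p>
<pre class="brush: java;">
Properties settings = new Properties();
// Hypothetical per-instance path; each instance on the same machine needs a distinct directory.
settings.put(StreamsConfig.STATE_DIR_CONFIG, "/var/lib/kafka-streams/instance-1");
</pre>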
</div>
<div class="section" id="timestamp-extractor">
<span id="streams-developer-guide-timestamp-extractor"></span><h4><a class="toc-backref" href="#id15">timestamp.extractor</a><a class="headerlink" href="#timestamp-extractor" title="Permalink to this headline"></a></h4>
<blockquote>
<div><p>A timestamp extractor pulls a timestamp from an instance of <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/clients/consumer/ConsumerRecord.html">ConsumerRecord</a>.
Timestamps are used to control the progress of streams.</p>
<p>The default extractor is
<a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/processor/FailOnInvalidTimestamp.html">FailOnInvalidTimestamp</a>.
This extractor retrieves built-in timestamps that are automatically embedded into Kafka messages by the Kafka producer
client since
<a class="reference external" href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message">Kafka version 0.10</a>.
Depending on the setting of Kafka&#8217;s server-side <code class="docutils literal"><span class="pre">log.message.timestamp.type</span></code> broker parameter and <code class="docutils literal"><span class="pre">message.timestamp.type</span></code> topic parameter,
this extractor provides you with:</p>
<ul class="simple">
<li><strong>event-time</strong> processing semantics if <code class="docutils literal"><span class="pre">log.message.timestamp.type</span></code> is set to <code class="docutils literal"><span class="pre">CreateTime</span></code> aka &#8220;producer time&#8221;
(which is the default). This represents the time when a Kafka producer sent the original message. If you use Kafka&#8217;s
official producer client, the timestamp represents milliseconds since the epoch.</li>
<li><strong>ingestion-time</strong> processing semantics if <code class="docutils literal"><span class="pre">log.message.timestamp.type</span></code> is set to <code class="docutils literal"><span class="pre">LogAppendTime</span></code> aka &#8220;broker
time&#8221;. This represents the time when the Kafka broker received the original message, in milliseconds since the epoch.</li>
</ul>
<p>The <code class="docutils literal"><span class="pre">FailOnInvalidTimestamp</span></code> extractor throws an exception if a record contains an invalid (i.e. negative) built-in
timestamp: Kafka Streams cannot process such a record and would otherwise silently drop it. Invalid built-in timestamps can
occur for various reasons: for example, if you consume a topic that is written to by pre-0.10 Kafka producer clients
or by third-party producer clients that don&#8217;t support the new Kafka 0.10 message format yet; another situation where
this may happen is after upgrading your Kafka cluster from <code class="docutils literal"><span class="pre">0.9</span></code> to <code class="docutils literal"><span class="pre">0.10</span></code>, where all the data that was generated
with <code class="docutils literal"><span class="pre">0.9</span></code> does not include the <code class="docutils literal"><span class="pre">0.10</span></code> message timestamps.</p>
<p>If you have data with invalid timestamps and want to process it, then there are two alternative extractors available.
Both work on built-in timestamps, but handle invalid timestamps differently.</p>
<ul class="simple">
<li><a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/processor/LogAndSkipOnInvalidTimestamp.html">LogAndSkipOnInvalidTimestamp</a>:
This extractor logs a warn message and returns the invalid timestamp to Kafka Streams, which will not process but
silently drop the record.
This log-and-skip strategy allows Kafka Streams to make progress instead of failing if there are records with an
invalid built-in timestamp in your input data.</li>
<li><a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/processor/UsePartitionTimeOnInvalidTimestamp.html">UsePartitionTimeOnInvalidTimestamp</a>.
This extractor returns the record&#8217;s built-in timestamp if it is valid (i.e. not negative). If the record does not
have a valid built-in timestamp, the extractor returns the previously extracted valid timestamp from a record of the
same topic partition as the current record as a timestamp estimate. If no timestamp can be estimated, it
throws an exception.</li>
</ul>
<p>Another built-in extractor is
<a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/processor/WallclockTimestampExtractor.html">WallclockTimestampExtractor</a>.
This extractor does not actually &#8220;extract&#8221; a timestamp from the consumed record but rather returns the current time in
milliseconds from the system clock (think: <code class="docutils literal"><span class="pre">System.currentTimeMillis()</span></code>), which effectively means Streams will operate
on the basis of the so-called <strong>processing-time</strong> of events.</p>
<p>You can also provide your own timestamp extractors, for instance to retrieve timestamps embedded in the payload of
messages. If you cannot extract a valid timestamp, you can either throw an exception, return a negative timestamp, or
estimate a timestamp. Returning a negative timestamp will result in data loss &#8211; the corresponding record will not be
processed but silently dropped. If you want to estimate a new timestamp, you can use the value provided via
<code class="docutils literal"><span class="pre">previousTimestamp</span></code> (i.e., a Kafka Streams timestamp estimation). Here is an example of a custom
<code class="docutils literal"><span class="pre">TimestampExtractor</span></code> implementation:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.clients.consumer.ConsumerRecord</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.processor.TimestampExtractor</span><span class="o">;</span>
<span class="c1">// Extracts the embedded timestamp of a record (giving you &quot;event-time&quot; semantics).</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">MyEventTimeExtractor</span> <span class="kd">implements</span> <span class="n">TimestampExtractor</span> <span class="o">{</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">long</span> <span class="nf">extract</span><span class="o">(</span><span class="kd">final</span> <span class="n">ConsumerRecord</span><span class="o">&lt;</span><span class="n">Object</span><span class="o">,</span> <span class="n">Object</span><span class="o">&gt;</span> <span class="n">record</span><span class="o">,</span> <span class="kd">final</span> <span class="kt">long</span> <span class="n">previousTimestamp</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// `Foo` is your own custom class, which we assume has a method that returns</span>
<span class="c1">// the embedded timestamp (milliseconds since midnight, January 1, 1970 UTC).</span>
<span class="kt">long</span> <span class="n">timestamp</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="o">;</span>
<span class="kd">final</span> <span class="n">Foo</span> <span class="n">myPojo</span> <span class="o">=</span> <span class="o">(</span><span class="n">Foo</span><span class="o">)</span> <span class="n">record</span><span class="o">.</span><span class="na">value</span><span class="o">();</span>
<span class="k">if</span> <span class="o">(</span><span class="n">myPojo</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span>
<span class="n">timestamp</span> <span class="o">=</span> <span class="n">myPojo</span><span class="o">.</span><span class="na">getTimestampInMillis</span><span class="o">();</span>
<span class="o">}</span>
<span class="k">if</span> <span class="o">(</span><span class="n">timestamp</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// Invalid timestamp! Attempt to estimate a new timestamp,</span>
<span class="c1">// otherwise fall back to wall-clock time (processing-time).</span>
<span class="k">if</span> <span class="o">(</span><span class="n">previousTimestamp</span> <span class="o">&gt;=</span> <span class="mi">0</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">previousTimestamp</span><span class="o">;</span>
<span class="o">}</span> <span class="k">else</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">System</span><span class="o">.</span><span class="na">currentTimeMillis</span><span class="o">();</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
</pre></div>
</div>
<p>You would then define the custom timestamp extractor in your Streams configuration as follows:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Properties</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.StreamsConfig</span><span class="o">;</span>
<span class="n">Properties</span> <span class="n">streamsConfiguration</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="n">streamsConfiguration</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG</span><span class="o">,</span> <span class="n">MyEventTimeExtractor</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
</pre></div>
</div>
</div></blockquote>
</div>
<div class="section" id="upgrade-from">
<h4><a class="toc-backref" href="#id14">upgrade.from</a><a class="headerlink" href="#upgrade-from" title="Permalink to this headline"></a></h4>
<blockquote>
<div>
The version you are upgrading from. It is important to set this config when performing a rolling upgrade to certain versions, as described in the upgrade guide.
You should set this config to the appropriate version before bouncing your instances and upgrading them to the newer version. Once everyone is on the
newer version, you should remove this config and do a second rolling bounce. It is only necessary to set this config and follow the two-bounce upgrade path
when upgrading from below version 2.0, or when upgrading to 2.4+ from any version lower than 2.4.
</div>
</blockquote>
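<p>For example, when rolling from version 1.1 to a newer release, you could set the following for the first rolling bounce and remove it again before the second bounce (the version string is only an example):</p>
<pre class="brush: java;">
Properties settings = new Properties();
// First rolling bounce: tell the new binaries which version the running group is on.
// Remove this setting and bounce again once all instances run the new version.
settings.put(StreamsConfig.UPGRADE_FROM_CONFIG, "1.1");
</pre>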
</div>
</div>
<div class="section" id="kafka-consumers-and-producer-configuration-parameters">
<h3><a class="toc-backref" href="#id16">Kafka consumers, producer and admin client configuration parameters</a><a class="headerlink" href="#kafka-consumers-and-producer-configuration-parameters" title="Permalink to this headline"></a></h3>
<p>You can specify parameters for the Kafka <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/clients/consumer/package-summary.html">consumers</a>, <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/clients/producer/package-summary.html">producers</a>,
and <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/kafka/clients/admin/package-summary.html">admin client</a> that are used internally.
The consumer, producer and admin client settings are defined by specifying parameters in a <code class="docutils literal"><span class="pre">StreamsConfig</span></code> instance.</p>
<p>In this example, the Kafka <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/clients/consumer/ConsumerConfig.html#SESSION_TIMEOUT_MS_CONFIG">consumer session timeout</a> is configured to be 60000 milliseconds in the Streams settings:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">streamsSettings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="c1">// Example of a &quot;normal&quot; setting for Kafka Streams</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">BOOTSTRAP_SERVERS_CONFIG</span><span class="o">,</span> <span class="s">&quot;kafka-broker-01:9092&quot;</span><span class="o">);</span>
<span class="c1">// Customize the Kafka consumer settings of your Streams application</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">ConsumerConfig</span><span class="o">.</span><span class="na">SESSION_TIMEOUT_MS_CONFIG</span><span class="o">,</span> <span class="mi">60000</span><span class="o">);</span>
</pre></div>
</div>
<div class="section" id="naming">
<h4><a class="toc-backref" href="#id17">Naming</a><a class="headerlink" href="#naming" title="Permalink to this headline"></a></h4>
<p>Some consumer, producer, and admin client configuration parameters use the same parameter name, and the Kafka Streams library itself also uses some parameters that share a name with those of its embedded clients. For example, <code class="docutils literal"><span class="pre">send.buffer.bytes</span></code> and
<code class="docutils literal"><span class="pre">receive.buffer.bytes</span></code> are used to configure TCP buffers; <code class="docutils literal"><span class="pre">request.timeout.ms</span></code> and <code class="docutils literal"><span class="pre">retry.backoff.ms</span></code> control retries for client requests;
and <code class="docutils literal"><span class="pre">retries</span></code> configures how many retries are allowed when handling retriable errors from broker request responses.
You can avoid these name collisions by prefixing parameter names with <code class="docutils literal"><span class="pre">consumer.</span></code>, <code class="docutils literal"><span class="pre">producer.</span></code>, or <code class="docutils literal"><span class="pre">admin.</span></code> (e.g., <code class="docutils literal"><span class="pre">consumer.send.buffer.bytes</span></code> and <code class="docutils literal"><span class="pre">producer.send.buffer.bytes</span></code>).</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">streamsSettings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="c1">// same value for consumer, producer, and admin client</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;PARAMETER_NAME&quot;</span><span class="o">,</span> <span class="s">&quot;value&quot;</span><span class="o">);</span>
<span class="c1">// different values for consumer and producer</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;consumer.PARAMETER_NAME&quot;</span><span class="o">,</span> <span class="s">&quot;consumer-value&quot;</span><span class="o">);</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;producer.PARAMETER_NAME&quot;</span><span class="o">,</span> <span class="s">&quot;producer-value&quot;</span><span class="o">);</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;admin.PARAMETER_NAME&quot;</span><span class="o">,</span> <span class="s">&quot;admin-value&quot;</span><span class="o">);</span>
<span class="c1">// alternatively, you can use</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">consumerPrefix</span><span class="o">(</span><span class="s">&quot;PARAMETER_NAME&quot;</span><span class="o">),</span> <span class="s">&quot;consumer-value&quot;</span><span class="o">);</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">producerPrefix</span><span class="o">(</span><span class="s">&quot;PARAMETER_NAME&quot;</span><span class="o">),</span> <span class="s">&quot;producer-value&quot;</span><span class="o">);</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">adminClientPrefix</span><span class="o">(</span><span class="s">&quot;PARAMETER_NAME&quot;</span><span class="o">),</span> <span class="s">&quot;admin-value&quot;</span><span class="o">);</span>
</pre></div>
<p>You could further separate consumer configuration by adding different prefixes:</p>
<ul class="simple">
<li><code class="docutils literal"><span class="pre">main.consumer.</span></code> for main consumer which is the default consumer of stream source.</li>
<li><code class="docutils literal"><span class="pre">restore.consumer.</span></code> for restore consumer which is in charge of state store recovery.</li>
<li><code class="docutils literal"><span class="pre">global.consumer.</span></code> for global consumer which is used in global KTable construction.</li>
</ul>
<p>For example, if you only want to set the restore consumer's config without touching the other consumers' settings, you can simply use the <code class="docutils literal"><span class="pre">restore.consumer.</span></code> prefix to set the config.</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">streamsSettings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="c1">// same config value for all consumer types</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;consumer.PARAMETER_NAME&quot;</span><span class="o">,</span> <span class="s">&quot;general-consumer-value&quot;</span><span class="o">);</span>
<span class="c1">// set a different restore consumer config. This would make restore consumer take restore-consumer-value,</span>
<span class="c1">// while main consumer and global consumer stay with general-consumer-value</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;restore.consumer.PARAMETER_NAME&quot;</span><span class="o">,</span> <span class="s">&quot;restore-consumer-value&quot;</span><span class="o">);</span>
<span class="c1">// alternatively, you can use</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">restoreConsumerPrefix</span><span class="o">(</span><span class="s">&quot;PARAMETER_NAME&quot;</span><span class="o">),</span> <span class="s">&quot;restore-consumer-value&quot;</span><span class="o">);</span>
</pre></div>
</div>
<p> The same applies to <code class="docutils literal"><span class="pre">main.consumer.</span></code> and <code class="docutils literal"><span class="pre">global.consumer.</span></code>, if you only want to specify configs for just one type of consumer.</p>
<p> Additionally, to configure the internal repartition/changelog topics, you could use the <code class="docutils literal"><span class="pre">topic.</span></code> prefix, followed by any of the standard topic configs.</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">streamsSettings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="c1">// Override default for both changelog and repartition topics</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;topic.PARAMETER_NAME&quot;</span><span class="o">,</span> <span class="s">&quot;topic-value&quot;</span><span class="o">);</span>
<span class="c1">// alternatively, you can use</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">topicPrefix</span><span class="o">(</span><span class="s">&quot;PARAMETER_NAME&quot;</span><span class="o">),</span> <span class="s">&quot;topic-value&quot;</span><span class="o">);</span>
</pre></div>
</div>
</div>
</div>
<div class="section" id="default-values">
<h4><a class="toc-backref" href="#id18">Default Values</a><a class="headerlink" href="#default-values" title="Permalink to this headline"></a></h4>
<p>Kafka Streams uses different default values for some of the underlying client configs, which are summarized below. For detailed descriptions
of these configs, see <a class="reference external" href="http://kafka.apache.org/0100/documentation.html#producerconfigs">Producer Configs</a>
and <a class="reference external" href="http://kafka.apache.org/0100/documentation.html#newconsumerconfigs">Consumer Configs</a>.</p>
<table border="1" class="non-scrolling-table docutils">
<thead valign="bottom">
<tr class="row-odd"><th class="head">Parameter Name</th>
<th class="head">Corresponding Client</th>
<th class="head">Streams Default</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td>auto.offset.reset</td>
<td>Global Consumer</td>
<td>none (cannot be changed)</td>
</tr>
<tr class="row-even"><td>auto.offset.reset</td>
<td>Restore Consumer</td>
<td>none (cannot be changed)</td>
</tr>
<tr class="row-even"><td>auto.offset.reset</td>
<td>Consumer</td>
<td>earliest</td>
</tr>
<tr class="row-odd"><td>enable.auto.commit</td>
<td>Consumer</td>
<td>false</td>
</tr>
<tr class="row-even"><td>linger.ms</td>
<td>Producer</td>
<td>100</td>
</tr>
<tr class="row-odd"><td>max.poll.interval.ms</td>
<td>Consumer</td>
<td>Integer.MAX_VALUE</td>
</tr>
<tr class="row-even"><td>max.poll.records</td>
<td>Consumer</td>
<td>1000</td>
</tr>
<tr class="row-odd"><td>rocksdb.config.setter</td>
<td>Streams</td>
<td>&nbsp;</td>
</tr>
</tbody>
</table>
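<p>A Streams default can be overridden like any other client parameter, e.g., via the prefixes described above. The following sketch lowers <code class="docutils literal"><span class="pre">max.poll.records</span></code> from the Streams default of 1000 for the application's consumers (the value 100 is only illustrative):</p>
<div class="highlight-java"><div class="highlight"><pre>
Properties streamsSettings = new Properties();
// Override the Streams default of 1000 records per poll for the embedded consumers.
streamsSettings.put(StreamsConfig.consumerPrefix(ConsumerConfig.MAX_POLL_RECORDS_CONFIG), 100);
</pre></div>
</div>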
</div>
<div class="section" id="enable-auto-commit">
<span id="streams-developer-guide-consumer-auto-commit"></span><h4><a class="toc-backref" href="#id19">enable.auto.commit</a><a class="headerlink" href="#enable-auto-commit" title="Permalink to this headline"></a></h4>
<blockquote>
<div>Whether the consumer commits offsets automatically. To guarantee at-least-once processing semantics and to turn off auto commits, Kafka Streams overrides this consumer config
value to <code class="docutils literal"><span class="pre">false</span></code>. Consumers commit only explicitly via <em>commitSync</em> calls when the Kafka Streams library or a user decides
to commit the current processing state.</div></blockquote>
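      <p>Instead of relying on the consumer's auto-commit mechanism, you control how often Kafka Streams commits through the Streams-level <code class="docutils literal"><span class="pre">commit.interval.ms</span></code> parameter. A small sketch (30 seconds is only an illustrative value):</p>
      <div class="highlight-java"><div class="highlight"><pre>
Properties streamsSettings = new Properties();
// Commit the current processing state roughly every 30 seconds.
streamsSettings.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 30 * 1000);
</pre></div>
      </div>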
</div>
<div class="section" id="rocksdb-config-setter">
<span id="streams-developer-guide-rocksdb-config"></span><h4><a class="toc-backref" href="#id20">rocksdb.config.setter</a><a class="headerlink" href="#rocksdb-config-setter" title="Permalink to this headline"></a></h4>
<blockquote>
<div><p>The RocksDB configuration. Kafka Streams uses RocksDB as the default storage engine for persistent stores. To change the default
configuration for RocksDB, implement <code class="docutils literal"><span class="pre">RocksDBConfigSetter</span></code> and provide your custom class via <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/state/RocksDBConfigSetter.html">rocksdb.config.setter</a>.</p>
<p>Here is an example that adjusts the memory size consumed by RocksDB.</p>
<div class="highlight-java"><div class="highlight"><pre><span></span> <span class="kd">public</span> <span class="kd">static</span> <span class="kd">class</span> <span class="nc">CustomRocksDBConfig</span> <span class="kd">implements</span> <span class="n">RocksDBConfigSetter</span> <span class="o">{</span>
<span class="c1">// This object should be a member variable so it can be closed in RocksDBConfigSetter#close.</span>
<span class="kd">private</span> <span class="n">org.rocksdb.Cache</span> <span class="n">cache</span> <span class="o">=</span> <span class="k">new</span> <span class="n">org</span><span class="o">.</span><span class="na">rocksdb</span><span class="o">.</span><span class="na">LRUCache</span><span class="o">(</span><span class="mi">16</span> <span class="o">*</span> <span class="mi">1024L</span> <span class="o">*</span> <span class="mi">1024L</span><span class="o">);</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">setConfig</span><span class="o">(</span><span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Options</span> <span class="n">options</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Object</span><span class="o">&gt;</span> <span class="n">configs</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// See #1 below.</span>
<span class="n">BlockBasedTableConfig</span> <span class="n">tableConfig</span> <span class="o">=</span> <span class="k">(BlockBasedTableConfig)</span> <span class="n">options</span><span><span class="o">.</span><span class="na">tableFormatConfig</span><span class="o">();</span>
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setBlockCache</span><span class="o">(</span><span class="mi">cache</span></span><span class="o">);</span>
<span class="c1">// See #2 below.</span>
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setBlockSize</span><span class="o">(</span><span class="mi">16</span> <span class="o">*</span> <span class="mi">1024L</span><span class="o">);</span>
<span class="c1">// See #3 below.</span>
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setCacheIndexAndFilterBlocks</span><span class="o">(</span><span class="kc">true</span><span class="o">);</span>
<span class="n">options</span><span class="o">.</span><span class="na">setTableFormatConfig</span><span class="o">(</span><span class="n">tableConfig</span><span class="o">);</span>
<span class="c1">// See #4 below.</span>
<span class="n">options</span><span class="o">.</span><span class="na">setMaxWriteBufferNumber</span><span class="o">(</span><span class="mi">2</span><span class="o">);</span>
<span class="o">}</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">close</span><span class="o">(</span><span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Options</span> <span class="n">options</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// See #5 below.</span>
<span class="n">cache</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="n">Properties</span> <span class="n">streamsSettings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="n">streamsConfig</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">ROCKSDB_CONFIG_SETTER_CLASS_CONFIG</span><span class="o">,</span> <span class="n">CustomRocksDBConfig</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
</pre></div>
</div>
<dl class="docutils">
<dt>Notes for example:</dt>
<dd><ol class="first last arabic simple">
<li><code class="docutils literal"><span class="pre">BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();</span></code> Get a reference to the existing table config rather than create a new one, so you don't accidentally overwrite defaults such as the <code class="docutils literal"><span class="pre">BloomFilter</span></code>, which is an important optimization.
<li><code class="docutils literal"><span class="pre">tableConfig.setBlockSize(16</span> <span class="pre">*</span> <span class="pre">1024L);</span></code> Modify the default <a class="reference external" href="https://github.com/apache/kafka/blob/2.3/streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBStore.java#L79">block size</a> per these instructions from the <a class="reference external" href="https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB#indexes-and-filter-blocks">RocksDB GitHub</a>.</li>
<li><code class="docutils literal"><span class="pre">tableConfig.setCacheIndexAndFilterBlocks(true);</span></code> Do not let the index and filter blocks grow unbounded. For more information, see the <a class="reference external" href="https://github.com/facebook/rocksdb/wiki/Block-Cache#caching-index-and-filter-blocks">RocksDB GitHub</a>.</li>
<li><code class="docutils literal"><span class="pre">options.setMaxWriteBufferNumber(2);</span></code> See the advanced options in the <a class="reference external" href="https://github.com/facebook/rocksdb/blob/8dee8cad9ee6b70fd6e1a5989a8156650a70c04f/include/rocksdb/advanced_options.h#L103">RocksDB GitHub</a>.</li>
<li><code class="docutils literal"><span class="pre">cache.close();</span></code> To avoid memory leaks, you must close any objects you constructed that extend org.rocksdb.RocksObject. See <a class="reference external" href="https://github.com/facebook/rocksdb/wiki/RocksJava-Basics#memory-management">RocksJava docs</a> for more details.</li>
</ol>
</dd>
</dl>
</div></blockquote>
</div>
</div>
<div class="section" id="recommended-configuration-parameters-for-resiliency">
<h3><a class="toc-backref" href="#id21">Recommended configuration parameters for resiliency</a><a class="headerlink" href="#recommended-configuration-parameters-for-resiliency" title="Permalink to this headline"></a></h3>
<p>There are several Kafka and Kafka Streams configuration options that need to be configured explicitly for resiliency in the face of broker failures:</p>
<table border="1" class="non-scrolling-table docutils">
<thead valign="bottom">
<tr class="row-odd"><th class="head">Parameter Name</th>
<th class="head">Corresponding Client</th>
<th class="head">Default value</th>
<th class="head">Consider setting to</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td>acks</td>
<td>Producer</td>
<td><code class="docutils literal"><span class="pre">acks=1</span></code></td>
<td><code class="docutils literal"><span class="pre">acks=all</span></code></td>
</tr>
<tr class="row-odd"><td>replication.factor</td>
<td>Streams</td>
<td><code class="docutils literal"><span class="pre">1</span></code></td>
<td><code class="docutils literal"><span class="pre">3</span></code></td>
</tr>
<tr class="row-even"><td>min.insync.replicas</td>
<td>Broker</td>
<td><code class="docutils literal"><span class="pre">1</span></code></td>
<td><code class="docutils literal"><span class="pre">2</span></code></td>
</tr>
</tbody>
</table>
<p>Increasing the replication factor to 3 ensures that the internal Kafka Streams topics can tolerate up to 2 broker failures. Changing the acks setting to &#8220;all&#8221;
guarantees that a record will not be lost as long as at least one replica is alive. The tradeoff of moving from the default values to the recommended ones is
that some performance is sacrificed and more storage space is used (3x with a replication factor of 3) in exchange for greater resiliency.</p>
<div class="section" id="acks">
<h4><a class="toc-backref" href="#id22">acks</a><a class="headerlink" href="#acks" title="Permalink to this headline"></a></h4>
<blockquote>
<div><p>The number of acknowledgments that the leader must have received before considering a request complete. This controls
the durability of records that are sent. The possible values are:</p>
<ul class="simple">
<li><code class="docutils literal"><span class="pre">acks=0</span></code> The producer does not wait for acknowledgment from the server and the record is immediately added to the socket buffer and considered sent. No guarantee can be made that the server has received the record in this case, and the <code class="docutils literal"><span class="pre">retries</span></code> configuration will not take effect (as the client won&#8217;t generally know of any failures). The offset returned for each record will always be set to <code class="docutils literal"><span class="pre">-1</span></code>.</li>
<li><code class="docutils literal"><span class="pre">acks=1</span></code> The leader writes the record to its local log and responds without waiting for full acknowledgement from all followers. If the leader immediately fails after acknowledging the record, but before the followers have replicated it, then the record will be lost.</li>
<li><code class="docutils literal"><span class="pre">acks=all</span></code> The leader waits for the full set of in-sync replicas to acknowledge the record. This guarantees that the record will not be lost if there is at least one in-sync replica alive. This is the strongest available guarantee.</li>
</ul>
<p>For more information, see the <a class="reference external" href="https://kafka.apache.org/documentation/#producerconfigs">Kafka Producer documentation</a>.</p>
</div></blockquote>
</div>
<div class="section" id="id2">
<h4><a class="toc-backref" href="#id23">replication.factor</a><a class="headerlink" href="#id2" title="Permalink to this headline"></a></h4>
<blockquote>
<div>See the <a class="reference internal" href="#replication-factor-parm"><span class="std std-ref">description here</span></a>.</div></blockquote>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">streamsSettings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">REPLICATION_FACTOR_CONFIG</span><span class="o">,</span> <span class="mi">3</span><span class="o">);</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">topicPrefix</span><span class="o">(</span><span class="n">TopicConfig</span><span class="o">.</span><span class="na">MIN_IN_SYNC_REPLICAS_CONFIG</span><span class="o">),</span> <span class="mi">2</span><span class="o">);</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">producerPrefix</span><span class="o">(</span><span class="n">ProducerConfig</span><span class="o">.</span><span class="na">ACKS_CONFIG</span><span class="o">),</span> <span class="s">&quot;all&quot;</span><span class="o">);</span>
</pre></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="pagination">
<a href="/{{version}}/documentation/streams/developer-guide/write-streams" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="/{{version}}/documentation/streams/developer-guide/dsl-api" class="pagination__btn pagination__btn__next">Next</a>
</div>
</script>
<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
//sticky secondary nav
var $navbar = $(".sub-nav-sticky"),
y_pos = $navbar.offset().top,
height = $navbar.height();
$(window).scroll(function() {
var scrollTop = $(window).scrollTop();
if (scrollTop > y_pos - height) {
$navbar.addClass("navbar-fixed")
} else if (scrollTop <= y_pos) {
$navbar.removeClass("navbar-fixed")
}
});
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>

View File

@@ -0,0 +1,239 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script><!--#include virtual="../../js/templateData.js" --></script>
<script id="content-template" type="text/x-handlebars-template">
<!-- h1>Developer Guide for Kafka Streams</h1 -->
<div class="sub-nav-sticky">
<div class="sticky-top">
<!-- div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
</div -->
</div>
</div>
<div class="section" id="data-types-and-serialization">
<span id="streams-developer-guide-serdes"></span><h1>Data Types and Serialization<a class="headerlink" href="#data-types-and-serialization" title="Permalink to this headline"></a></h1>
<p>Every Kafka Streams application must provide SerDes (Serializer/Deserializer) for the data types of record keys and record values (e.g. <code class="docutils literal"><span class="pre">java.lang.String</span></code>) to materialize the data when necessary. Operations that require such SerDes information include: <code class="docutils literal"><span class="pre">stream()</span></code>, <code class="docutils literal"><span class="pre">table()</span></code>, <code class="docutils literal"><span class="pre">to()</span></code>, <code class="docutils literal"><span class="pre">through()</span></code>, <code class="docutils literal"><span class="pre">groupByKey()</span></code>, <code class="docutils literal"><span class="pre">groupBy()</span></code>.</p>
<p>You can provide SerDes by using either of these methods:</p>
<ul class="simple">
<li>By setting default SerDes in the <code class="docutils literal"><span class="pre">java.util.Properties</span></code> config instance.</li>
<li>By specifying explicit SerDes when calling the appropriate API methods, thus overriding the defaults.</li>
</ul>
<p class="topic-title first"><b>Table of Contents</b></p>
<ul class="simple">
<li><a class="reference internal" href="#configuring-serdes" id="id1">Configuring SerDes</a></li>
<li><a class="reference internal" href="#overriding-default-serdes" id="id2">Overriding default SerDes</a></li>
<li><a class="reference internal" href="#available-serdes" id="id3">Available SerDes</a></li>
<ul>
<li><a class="reference internal" href="#primitive-and-basic-types" id="id4">Primitive and basic types</a></li>
<li><a class="reference internal" href="#json" id="id6">JSON</a></li>
<li><a class="reference internal" href="#implementing-custom-serdes" id="id5">Implementing custom serdes</a></li>
</ul>
<li><a class="reference internal" href="#scala-dsl-serdes" id="id8">Kafka Streams DSL for Scala Implicit SerDes</a></li>
</ul>
<div class="section" id="configuring-serdes">
<h2>Configuring SerDes<a class="headerlink" href="#configuring-serdes" title="Permalink to this headline"></a></h2>
<p>SerDes specified in the Streams configuration are used as the default in your Kafka Streams application.</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.common.serialization.Serdes</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.StreamsConfig</span><span class="o">;</span>
<span class="n">Properties</span> <span class="n">settings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="c1">// Default serde for keys of data records (here: built-in serde for String type)</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">DEFAULT_KEY_SERDE_CLASS_CONFIG</span><span class="o">,</span> <span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">().</span><span class="na">getClass</span><span class="o">().</span><span class="na">getName</span><span class="o">());</span>
<span class="c1">// Default serde for values of data records (here: built-in serde for Long type)</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">DEFAULT_VALUE_SERDE_CLASS_CONFIG</span><span class="o">,</span> <span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">().</span><span class="na">getClass</span><span class="o">().</span><span class="na">getName</span><span class="o">());</span>
</pre></div>
</div>
</div>
<div class="section" id="overriding-default-serdes">
<h2>Overriding default SerDes<a class="headerlink" href="#overriding-default-serdes" title="Permalink to this headline"></a></h2>
<p>You can also specify SerDes explicitly by passing them to the appropriate API methods, which overrides the default serde settings:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.common.serialization.Serde</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.common.serialization.Serdes</span><span class="o">;</span>
<span class="kd">final</span> <span class="n">Serde</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">stringSerde</span> <span class="o">=</span> <span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">();</span>
<span class="kd">final</span> <span class="n">Serde</span><span class="o">&lt;</span><span class="n">Long</span><span class="o">&gt;</span> <span class="n">longSerde</span> <span class="o">=</span> <span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">();</span>
<span class="c1">// The stream userCountByRegion has type `String` for record keys (for region)</span>
<span class="c1">// and type `Long` for record values (for user counts).</span>
<span class="n">KStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">userCountByRegion</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">userCountByRegion</span><span class="o">.</span><span class="na">to</span><span class="o">(</span><span class="s">&quot;RegionCountsTopic&quot;</span><span class="o">,</span> <span class="n">Produced</span><span class="o">.</span><span class="na">with</span><span class="o">(</span><span class="n">stringSerde</span><span class="o">,</span> <span class="n">longSerde</span><span class="o">));</span>
</pre></div>
</div>
<p>If you want to override serdes selectively, i.e., keep the defaults for some fields, then don&#8217;t specify the serde whenever you want to leverage the default settings:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.common.serialization.Serde</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.common.serialization.Serdes</span><span class="o">;</span>
<span class="c1">// Use the default serializer for record keys (here: region as String) by not specifying the key serde,</span>
<span class="c1">// but override the default serializer for record values (here: userCount as Long).</span>
<span class="kd">final</span> <span class="n">Serde</span><span class="o">&lt;</span><span class="n">Long</span><span class="o">&gt;</span> <span class="n">longSerde</span> <span class="o">=</span> <span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">();</span>
<span class="n">KStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">userCountByRegion</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">userCountByRegion</span><span class="o">.</span><span class="na">to</span><span class="o">(</span><span class="s">&quot;RegionCountsTopic&quot;</span><span class="o">,</span> <span class="n">Produced</span><span class="o">.</span><span class="na">valueSerde</span><span class="o">(</span><span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">()));</span>
</pre></div>
</div>
    <p>If some of your incoming records are corrupted or ill-formatted, they will cause the deserializer class to report an error.
	Since 1.0.x, Kafka Streams provides a <code>DeserializationExceptionHandler</code> interface that allows
	you to customize how such records are handled. A custom implementation of the interface can be specified via <code>StreamsConfig</code>.
	For more details, see the <a href="config-streams.html#default-deserialization-exception-handler">Configuring a Streams Application</a> section.
    </p>
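    <p>For instance, here is a minimal sketch that configures the built-in <code>LogAndContinueExceptionHandler</code>, which logs corrupted records and continues processing instead of failing:</p>
    <div class="highlight-java"><div class="highlight"><pre>
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.LogAndContinueExceptionHandler;

Properties settings = new Properties();
settings.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
             LogAndContinueExceptionHandler.class);
</pre></div>
    </div>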
</div>
<div class="section" id="available-serdes">
<span id="streams-developer-guide-serdes-available"></span><h2>Available SerDes<a class="headerlink" href="#available-serdes" title="Permalink to this headline"></a></h2>
<div class="section" id="primitive-and-basic-types">
<h3>Primitive and basic types<a class="headerlink" href="#primitive-and-basic-types" title="Permalink to this headline"></a></h3>
<p>Apache Kafka includes several built-in serde implementations for Java primitives and basic types such as <code class="docutils literal"><span class="pre">byte[]</span></code> in
its <code class="docutils literal"><span class="pre">kafka-clients</span></code> Maven artifact:</p>
<div class="highlight-xml"><div class="highlight"><pre><span></span><span class="nt">&lt;dependency&gt;</span>
<span class="nt">&lt;groupId&gt;</span>org.apache.kafka<span class="nt">&lt;/groupId&gt;</span>
<span class="nt">&lt;artifactId&gt;</span>kafka-clients<span class="nt">&lt;/artifactId&gt;</span>
<span class="nt">&lt;version&gt;</span>{{fullDotVersion}}<span class="nt">&lt;/version&gt;</span>
<span class="nt">&lt;/dependency&gt;</span>
</pre></div>
</div>
<p>This artifact provides the following serde implementations under the package <a class="reference external" href="https://github.com/apache/kafka/blob/{{dotVersion}}/clients/src/main/java/org/apache/kafka/common/serialization">org.apache.kafka.common.serialization</a>, which you can leverage when e.g., defining default serializers in your Streams configuration.</p>
<table border="1" class="docutils">
<colgroup>
<col width="17%" />
<col width="83%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Data type</th>
<th class="head">Serde</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td>byte[]</td>
<td><code class="docutils literal"><span class="pre">Serdes.ByteArray()</span></code>, <code class="docutils literal"><span class="pre">Serdes.Bytes()</span></code> (see tip below)</td>
</tr>
<tr class="row-odd"><td>ByteBuffer</td>
<td><code class="docutils literal"><span class="pre">Serdes.ByteBuffer()</span></code></td>
</tr>
<tr class="row-even"><td>Double</td>
<td><code class="docutils literal"><span class="pre">Serdes.Double()</span></code></td>
</tr>
<tr class="row-odd"><td>Integer</td>
<td><code class="docutils literal"><span class="pre">Serdes.Integer()</span></code></td>
</tr>
<tr class="row-even"><td>Long</td>
<td><code class="docutils literal"><span class="pre">Serdes.Long()</span></code></td>
</tr>
<tr class="row-odd"><td>String</td>
<td><code class="docutils literal"><span class="pre">Serdes.String()</span></code></td>
</tr>
<tr class="row-even"><td>UUID</td>
<td><code class="docutils literal"><span class="pre">Serdes.UUID()</span></code></td>
</tr>
<tr class="row-odd"><td>Void</td>
<td><code class="docutils literal"><span class="pre">Serdes.Void()</span></code></td>
</tr>
</tbody>
</table>
<div class="admonition tip">
<p><b>Tip</b></p>
<p class="last"><a class="reference external" href="https://github.com/apache/kafka/blob/{{dotVersion}}/clients/src/main/java/org/apache/kafka/common/utils/Bytes.java">Bytes</a> is a wrapper for Java&#8217;s <code class="docutils literal"><span class="pre">byte[]</span></code> (byte array) that supports proper equality and ordering semantics. You may want to consider using <code class="docutils literal"><span class="pre">Bytes</span></code> instead of <code class="docutils literal"><span class="pre">byte[]</span></code> in your applications.</p>
</div>
</div>
<div class="section" id="json">
<h3>JSON<a class="headerlink" href="#json" title="Permalink to this headline"></a></h3>
<p>The Kafka Streams code examples also include a basic serde implementation for JSON:</p>
<ul class="simple">
<li><a class="reference external" href="https://github.com/apache/kafka/blob/{{dotVersion}}/streams/examples/src/main/java/org/apache/kafka/streams/examples/pageview/PageViewTypedDemo.java#L83">PageViewTypedDemo</a></li>
</ul>
    <p>As shown in the example, you can use the demo's inner serializer and deserializer classes together with <code class="docutils literal"><span class="pre">Serdes.serdeFrom(&lt;serializerInstance&gt;, &lt;deserializerInstance&gt;)</span></code> to construct JSON-compatible serializers and deserializers.
    </p>
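    <p>For example, here is a sketch of building a JSON serde for a POJO via <code class="docutils literal"><span class="pre">Serdes.serdeFrom()</span></code>, assuming Jackson-backed serializer/deserializer classes like the inner classes of the demo; the <code>PageView</code> type and the <code>JsonPOJOClass</code> config key follow the demo and are not part of the Kafka API:</p>
    <div class="highlight-java"><div class="highlight"><pre>
// Assumed: PageView POJO plus JsonPOJOSerializer/JsonPOJODeserializer classes as in PageViewTypedDemo.
Map&lt;String, Object&gt; serdeProps = new HashMap&lt;&gt;();
serdeProps.put("JsonPOJOClass", PageView.class);

final Serializer&lt;PageView&gt; pageViewSerializer = new JsonPOJOSerializer&lt;&gt;();
pageViewSerializer.configure(serdeProps, false);

final Deserializer&lt;PageView&gt; pageViewDeserializer = new JsonPOJODeserializer&lt;&gt;();
pageViewDeserializer.configure(serdeProps, false);

// Combine the two into a Serde usable in the Streams DSL, e.g. Consumed.with(..., pageViewSerde).
final Serde&lt;PageView&gt; pageViewSerde = Serdes.serdeFrom(pageViewSerializer, pageViewDeserializer);
</pre></div>
    </div>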
</div>
<div class="section" id="implementing-custom-serdes">
<span id="streams-developer-guide-serdes-custom"></span><h2>Implementing custom SerDes<a class="headerlink" href="#implementing-custom-serdes" title="Permalink to this headline"></a></h2>
<p>If you need to implement custom SerDes, your best starting point is to take a look at the source code references of
existing SerDes (see previous section). Typically, your workflow will be similar to:</p>
<ol class="arabic simple">
<li>Write a <em>serializer</em> for your data type <code class="docutils literal"><span class="pre">T</span></code> by implementing
<a class="reference external" href="https://github.com/apache/kafka/blob/{{dotVersion}}/clients/src/main/java/org/apache/kafka/common/serialization/Serializer.java">org.apache.kafka.common.serialization.Serializer</a>.</li>
<li>Write a <em>deserializer</em> for <code class="docutils literal"><span class="pre">T</span></code> by implementing
<a class="reference external" href="https://github.com/apache/kafka/blob/{{dotVersion}}/clients/src/main/java/org/apache/kafka/common/serialization/Deserializer.java">org.apache.kafka.common.serialization.Deserializer</a>.</li>
<li>Write a <em>serde</em> for <code class="docutils literal"><span class="pre">T</span></code> by implementing
<a class="reference external" href="https://github.com/apache/kafka/blob/{{dotVersion}}/clients/src/main/java/org/apache/kafka/common/serialization/Serde.java">org.apache.kafka.common.serialization.Serde</a>,
which you either do manually (see existing SerDes in the previous section) or by leveraging helper functions in
<a class="reference external" href="https://github.com/apache/kafka/blob/{{dotVersion}}/clients/src/main/java/org/apache/kafka/common/serialization/Serdes.java">Serdes</a>
such as <code class="docutils literal"><span class="pre">Serdes.serdeFrom(Serializer&lt;T&gt;, Deserializer&lt;T&gt;)</span></code>.
Note that you will need to implement your own class (that has no generic types) if you want to use your custom serde in the configuration provided to <code class="docutils literal"><span class="pre">KafkaStreams</span></code>.
If your serde class has generic types or you use <code class="docutils literal"><span class="pre">Serdes.serdeFrom(Serializer&lt;T&gt;, Deserializer&lt;T&gt;)</span></code>, you can pass your serde only
via method calls (for example <code class="docutils literal"><span class="pre">builder.stream("topicName", Consumed.with(...))</span></code>).</li>
</ol>
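    <p>Putting these steps together, here is a minimal sketch of a custom serde for a hypothetical <code>OrderId</code> type; the type, class names, and UTF-8 string encoding are illustrative assumptions, not part of the Kafka API:</p>
    <div class="highlight-java"><div class="highlight"><pre>
import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.Serializer;

// Hypothetical domain type used only for illustration.
class OrderId {
    final String value;
    OrderId(final String value) { this.value = value; }
}

// Step 1: a serializer for OrderId (configure()/close() have default no-op implementations).
class OrderIdSerializer implements Serializer&lt;OrderId&gt; {
    @Override
    public byte[] serialize(final String topic, final OrderId data) {
        return data == null ? null : data.value.getBytes(StandardCharsets.UTF_8);
    }
}

// Step 2: the matching deserializer.
class OrderIdDeserializer implements Deserializer&lt;OrderId&gt; {
    @Override
    public OrderId deserialize(final String topic, final byte[] bytes) {
        return bytes == null ? null : new OrderId(new String(bytes, StandardCharsets.UTF_8));
    }
}

// Step 3a: a serde built with the helper; pass it via method calls such as Consumed.with(...).
final Serde&lt;OrderId&gt; orderIdSerde = Serdes.serdeFrom(new OrderIdSerializer(), new OrderIdDeserializer());

// Step 3b: a serde class without generic types, usable as a default serde in the configuration.
class OrderIdSerde extends Serdes.WrapperSerde&lt;OrderId&gt; {
    public OrderIdSerde() {
        super(new OrderIdSerializer(), new OrderIdDeserializer());
    }
}
</pre></div>
    </div>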
</div>
</div>
<div class="section" id="scala-dsl-serdes">
    <h2>Kafka Streams DSL for Scala Implicit SerDes<a class="headerlink" href="#scala-dsl-serdes" title="Permalink to this headline"></a></h2>
    <p>When using the <a href="dsl-api.html#scala-dsl">Kafka Streams DSL for Scala</a> you're not required to configure default SerDes. In fact, it's not supported. SerDes are instead provided implicitly by default implementations for common primitive datatypes. See the <a href="dsl-api.html#scala-dsl-implicit-serdes">Implicit SerDes</a> and <a href="dsl-api.html#scala-dsl-user-defined-serdes">User-Defined SerDes</a> sections in the DSL API documentation for details.</p>
</div>
</div>
</div>
<div class="pagination">
<a href="/{{version}}/documentation/streams/developer-guide/processor-api" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="/{{version}}/documentation/streams/developer-guide/testing" class="pagination__btn pagination__btn__next">Next</a>
</div>
</script>
<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
//sticky secondary nav
var $navbar = $(".sub-nav-sticky"),
y_pos = $navbar.offset().top,
height = $navbar.height();
$(window).scroll(function() {
var scrollTop = $(window).scrollTop();
if (scrollTop > y_pos - height) {
$navbar.addClass("navbar-fixed")
} else if (scrollTop <= y_pos) {
$navbar.removeClass("navbar-fixed")
}
});
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>

File diff suppressed because it is too large

View File

@@ -0,0 +1,370 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script><!--#include virtual="../../js/templateData.js" --></script>
<script id="content-template" type="text/x-handlebars-template">
<!-- h1>Developer Guide for Kafka Streams</h1 -->
<div class="sub-nav-sticky">
<div class="sticky-top">
<!-- div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
</div -->
</div>
</div>
<div class="section" id="naming">
<span id="streams-developer-guide-dsl-topology-naming"></span>
<h1>Naming Operators in a Kafka Streams DSL Application<a class="headerlink" href="#naming" title="Permalink to this headline"></a></h1>
<p>
        You can now give names to processors when using the Kafka Streams DSL.
        In the Processor API (PAPI) there are <code>Processors</code> and <code>State Stores</code>, and
        you are required to explicitly name each one.
</p>
<p>
        At the DSL layer, there are operators. A single DSL operator may
        compile down to multiple <code>Processors</code> and <code>State Stores</code>, and,
        if required, <code>repartition topics</code>. But with the Kafka Streams
        DSL, all these names are generated for you. There is a relationship between
        the generated processor names, state store names (and hence changelog topic names), and repartition
        topic names. Note that the names of state stores and changelog/repartition topics
        are "stateful", while processor names are "stateless".
</p>
<p>
This distinction
of stateful vs. stateless names has important implications when updating your topology.
While the internal naming makes creating
a topology with the DSL much more straightforward,
there are a couple of trade-offs. The first trade-off is what we could
consider a readability issue. The other
more severe trade-off is the shifting of names due to the relationship between the
DSL operator and the generated <code>Processors</code>, <code>State Stores</code> changelog
topics and repartition topics.
</p>
<h2>Readability Issues</h2>
<p>
By saying there is a readability trade-off, we are referring to viewing a description of the topology.
        When you render the string description of your topology via the <code>Topology#describe()</code>
method, you can see what the processor is, but you don't have any context for its business purpose.
For example, consider the following simple topology:
<br/>
<pre>
KStream&lt;String,String&gt; stream = builder.stream("input");
stream.filter((k,v) -> !v.equals("invalid_txn"))
                  .mapValues((v) -> v.substring(0,6))
.to("output")
</pre>
</p>
<p>
Running <code>Topology#describe()</code> yields this string:
<pre>
Topologies:
Sub-topology: 0
Source: KSTREAM-SOURCE-0000000000 (topics: [input])
--> KSTREAM-FILTER-0000000001
Processor: KSTREAM-FILTER-0000000001 (stores: [])
--> KSTREAM-MAPVALUES-0000000002
<-- KSTREAM-SOURCE-0000000000
Processor: KSTREAM-MAPVALUES-0000000002 (stores: [])
--> KSTREAM-SINK-0000000003
<-- KSTREAM-FILTER-0000000001
Sink: KSTREAM-SINK-0000000003 (topic: output)
<-- KSTREAM-MAPVALUES-0000000002
</pre>
From this report, you can see what the different operators are, but what is the broader context here?
        For example, consider <code>KSTREAM-FILTER-0000000001</code>: you can see that it's a
        filter operation, which means that records that don't match the given predicate are dropped. But what is
the meaning of the predicate? Additionally, you can see the topic names of the source and sink nodes,
but what if the topics aren't named in a meaningful way? Then you're left to guess the
business purpose behind these topics.
</p>
<p>
Also notice the numbering here: the source node is suffixed with <code>0000000000</code>
indicating it's the first processor in the topology.
The filter is suffixed with <code>0000000001</code>, indicating it's the second processor in
the topology. In Kafka Streams, there are now overloaded methods for
both <code>KStream</code> and <code>KTable</code> that accept
a new parameter <code>Named</code>. By using the <code>Named</code> class DSL users can
provide meaningful names to the processors in their topology.
</p>
<p>
Now let's take a look at your topology with all the processors named:
<pre>
KStream&lt;String,String&gt; stream =
builder.stream("input", Consumed.as("Customer_transactions_input_topic"));
stream.filter((k,v) -> !v.equals("invalid_txn"), Named.as("filter_out_invalid_txns"))
                  .mapValues((v) -> v.substring(0,6), Named.as("Map_values_to_first_6_characters"))
.to("output", Produced.as("Mapped_transactions_output_topic"));
</pre>
<pre>
Topologies:
Sub-topology: 0
Source: Customer_transactions_input_topic (topics: [input])
--> filter_out_invalid_txns
Processor: filter_out_invalid_txns (stores: [])
--> Map_values_to_first_6_characters
<-- Customer_transactions_input_topic
Processor: Map_values_to_first_6_characters (stores: [])
--> Mapped_transactions_output_topic
<-- filter_out_invalid_txns
Sink: Mapped_transactions_output_topic (topic: output)
<-- Map_values_to_first_6_characters
</pre>
Now you can look at the topology description and easily understand what role each processor
        plays in the topology. But there's another reason for naming your processor nodes: it matters when you
        have stateful parts that remain between restarts of your Kafka Streams application, namely
        state stores, changelog topics, and repartition topics.
</p>
<h2>Changing Names</h2>
<p>
        Generated names are numbered in the order the operators are added to the topology.
        The name generation strategy is
        <code>KSTREAM|KTABLE-&lt;operator name&gt;-&lt;number suffix&gt;</code>. The number is a
globally incrementing number that represents the operator's order in the topology.
The generated number is prefixed with a varying number of "0"s to create a
string that is consistently 10 characters long.
This means that if you add/remove or shift the order of operations, the position of the
processor shifts, which shifts the name of the processor. Since <strong>most</strong> processors exist
in memory only, this name shifting presents no issue for many topologies. But the name
shifting does have implications for topologies with stateful operators or repartition topics.
Here's a different topology with some state:
<pre>
KStream&lt;String,String&gt; stream = builder.stream("input");
stream.groupByKey()
.count()
.toStream()
.to("output");
</pre>
This topology description yields the following:
<pre>
Topologies:
Sub-topology: 0
Source: KSTREAM-SOURCE-0000000000 (topics: [input])
--> KSTREAM-AGGREGATE-0000000002
Processor: KSTREAM-AGGREGATE-0000000002 (stores: [KSTREAM-AGGREGATE-STATE-STORE-0000000001])
--> KTABLE-TOSTREAM-0000000003
<-- KSTREAM-SOURCE-0000000000
Processor: KTABLE-TOSTREAM-0000000003 (stores: [])
--> KSTREAM-SINK-0000000004
<-- KSTREAM-AGGREGATE-0000000002
Sink: KSTREAM-SINK-0000000004 (topic: output)
<-- KTABLE-TOSTREAM-0000000003
</pre>
</p>
<p>
You can see from the topology description above that the state store is named
        <code>KSTREAM-AGGREGATE-STATE-STORE-0000000001</code>. Here's what happens when you
add a filter to keep some of the records out of the aggregation:
<pre>
KStream&lt;String,String&gt; stream = builder.stream("input");
            stream.filter((k, v) -> v != null && v.length() >= 6)
.groupByKey()
.count()
.toStream()
.to("output");
</pre>
And the corresponding topology:
<pre>
Topologies:
Sub-topology: 0
Source: KSTREAM-SOURCE-0000000000 (topics: [input])
--> KSTREAM-FILTER-0000000001
Processor: KSTREAM-FILTER-0000000001 (stores: [])
--> KSTREAM-AGGREGATE-0000000003
<-- KSTREAM-SOURCE-0000000000
Processor: KSTREAM-AGGREGATE-0000000003 (stores: [KSTREAM-AGGREGATE-STATE-STORE-0000000002])
--> KTABLE-TOSTREAM-0000000004
<-- KSTREAM-FILTER-0000000001
Processor: KTABLE-TOSTREAM-0000000004 (stores: [])
--> KSTREAM-SINK-0000000005
<-- KSTREAM-AGGREGATE-0000000003
Sink: KSTREAM-SINK-0000000005 (topic: output)
<-- KTABLE-TOSTREAM-0000000004
</pre>
</p>
<p>
Notice that since you've added an operation <em>before</em> the <code>count</code> operation, the state
store (and the changelog topic) names have changed. This name change means you can't
do a rolling re-deployment of your updated topology. Also, you must use the
<a href="/{{version}}/documentation/streams/developer-guide/app-reset-tool">Streams Reset Tool</a>
to re-calculate the aggregations, because the changelog topic has changed on start-up and the
new changelog topic contains no data.
Fortunately, there's an easy solution to remedy this situation. Give the
state store a user-defined name instead of relying on the generated one,
so you don't have to worry about topology changes shifting the name of the state store.
You've had the ability to name repartition topics with the <code>Joined</code>,
        <code>StreamJoined</code>, and <code>Grouped</code> classes, and
        to name state stores and changelog topics with <code>Materialized</code>.
        But it's worth reiterating the importance of naming these DSL topology operations.
Here's how your DSL code looks now giving a specific name to your state store:
<pre>
KStream&lt;String,String&gt; stream = builder.stream("input");
stream.filter((k, v) -> v != null && v.length() >= 6)
.groupByKey()
.count(Materialized.as("Purchase_count_store"))
.toStream()
.to("output");
</pre>
And here's the topology
<pre>
Topologies:
Sub-topology: 0
Source: KSTREAM-SOURCE-0000000000 (topics: [input])
--> KSTREAM-FILTER-0000000001
Processor: KSTREAM-FILTER-0000000001 (stores: [])
--> KSTREAM-AGGREGATE-0000000002
<-- KSTREAM-SOURCE-0000000000
Processor: KSTREAM-AGGREGATE-0000000002 (stores: [Purchase_count_store])
--> KTABLE-TOSTREAM-0000000003
<-- KSTREAM-FILTER-0000000001
Processor: KTABLE-TOSTREAM-0000000003 (stores: [])
--> KSTREAM-SINK-0000000004
<-- KSTREAM-AGGREGATE-0000000002
Sink: KSTREAM-SINK-0000000004 (topic: output)
<-- KTABLE-TOSTREAM-0000000003
</pre>
</p>
<p>
Now, even though you've added processors before your state store, the store name and its changelog
topic names don't change. This makes your topology more robust and resilient to changes made by
adding or removing processors.
</p>
<h2>Conclusion</h2>
It's a good practice to name your processing nodes when using the DSL, and it's even
    more important to do this when you have "stateful" processors in
    your application, such as repartition
    topics and state stores (and the accompanying changelog topics).
<p>
Here are a couple of points to remember when naming your DSL topology:
<ol>
<li>
If you have an <em>existing topology</em> and you <em>haven't</em> named your
            state stores (and changelog topics) and repartition topics, we recommend that you
do so. But this will be a topology breaking change, so you'll need to shut down all
application instances, make the changes, and run the
<a href="/{{version}}/documentation/streams/developer-guide/app-reset-tool">Streams Reset Tool</a>.
Although this may be inconvenient at first, it's worth the effort to protect your application from
unexpected errors due to topology changes.
</li>
<li>
If you have a <em>new topology</em>, make sure you name the persistent parts of your topology:
state stores (changelog topics) and repartition topics. This way, when you deploy your
application, you're protected from topology changes that otherwise would break your Kafka Streams application.
If you don't want to add names to stateless processors at first, that's fine as you can
always go back and add the names later.
</li>
</ol>
Here's a quick reference on naming the critical parts of
your Kafka Streams application to prevent topology name changes from breaking your application:
<table>
<tr>
<th>Operation</th><th>Naming Class</th>
</tr>
<tr>
<td>Aggregation repartition topics</td><td>Grouped</td>
</tr>
<tr>
<td>KStream-KStream Join repartition topics</td><td>StreamJoined</td>
</tr>
<tr>
<td>KStream-KTable Join repartition topic</td><td>Joined</td>
</tr>
<tr>
<td>KStream-KStream Join state stores</td><td>StreamJoined</td>
</tr>
<tr>
<td>State Stores (for aggregations and KTable-KTable joins)</td><td>Materialized</td>
</tr>
<tr>
<td>Stream/Table non-stateful operations</td><td>Named</td>
</tr>
</table>
</p>
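    <p>
        As a quick sketch tying the table above back to code, here is the earlier aggregation with the repartition topic (if one is required) named via <code>Grouped</code> and the state store named via <code>Materialized</code>; the names themselves are illustrative:
        <pre>
            KStream&lt;String,String&gt; stream = builder.stream("input");
            stream.filter((k, v) -> v != null && v.length() >= 6)
                  .groupByKey(Grouped.as("Purchase_count_repartition"))
                  .count(Materialized.as("Purchase_count_store"))
                  .toStream()
                  .to("output");
        </pre>
    </p>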
</div>
</script>
<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function () {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
//sticky secondary nav
var $navbar = $(".sub-nav-sticky"),
y_pos = $navbar.offset().top,
height = $navbar.height();
$(window).scroll(function () {
var scrollTop = $(window).scrollTop();
if (scrollTop > y_pos - height) {
$navbar.addClass("navbar-fixed")
} else if (scrollTop <= y_pos) {
$navbar.removeClass("navbar-fixed")
}
});
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>

View File

@@ -0,0 +1,106 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script><!--#include virtual="../../js/templateData.js" --></script>
<script id="content-template" type="text/x-handlebars-template">
<h1>Developer Guide for Kafka Streams</h1>
<div class="sub-nav-sticky">
<div class="sticky-top">
<div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/architecture">Architecture</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide/">Developer Guide</a>
<a href="/{{version}}/documentation/streams/upgrade-guide">Upgrade</a>
</div>
</div>
</div>
<div class="section" id="developer-guide">
<!-- span id="streams-developer-guide"></span><h1>Developer Guide<a class="headerlink" href="#developer-guide" title="Permalink to this headline"></a></h1 -->
<p>This developer guide describes how to write, configure, and execute a Kafka Streams application.</p>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="write-streams.html">Writing a Streams Application</a></li>
<li class="toctree-l1"><a class="reference internal" href="config-streams.html">Configuring a Streams Application</a></li>
<li class="toctree-l1"><a class="reference internal" href="dsl-api.html">Streams DSL</a></li>
<li class="toctree-l1"><a class="reference internal" href="processor-api.html">Processor API</a></li>
<li class="toctree-l1"><a class="reference internal" href="dsl-topology-naming.html">Naming Operators in a Streams DSL application</a></li>
<li class="toctree-l1"><a class="reference internal" href="datatypes.html">Data Types and Serialization</a></li>
<li class="toctree-l1"><a class="reference internal" href="testing.html">Testing a Streams Application</a></li>
<li class="toctree-l1"><a class="reference internal" href="interactive-queries.html">Interactive Queries</a></li>
<li class="toctree-l1"><a class="reference internal" href="memory-mgmt.html">Memory Management</a></li>
<li class="toctree-l1"><a class="reference internal" href="running-app.html">Running Streams Applications</a></li>
<li class="toctree-l1"><a class="reference internal" href="manage-topics.html">Managing Streams Application Topics</a></li>
<li class="toctree-l1"><a class="reference internal" href="security.html">Streams Security</a></li>
<li class="toctree-l1"><a class="reference internal" href="app-reset-tool.html">Application Reset Tool</a></li>
</ul>
</div>
</div>
</div>
</div>
<div class="pagination">
<a href="/{{version}}/documentation/streams/architecture" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="/{{version}}/documentation/streams/developer-guide/write-streams" class="pagination__btn pagination__btn__next">Next</a>
</div>
</script>
<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
//sticky secondary nav
var $navbar = $(".sub-nav-sticky"),
y_pos = $navbar.offset().top,
height = $navbar.height();
$(window).scroll(function() {
var scrollTop = $(window).scrollTop();
if (scrollTop > y_pos - height) {
$navbar.addClass("navbar-fixed")
} else if (scrollTop <= y_pos) {
$navbar.removeClass("navbar-fixed")
}
});
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>

View File

@@ -0,0 +1,529 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script><!--#include virtual="../../js/templateData.js" --></script>
<script id="content-template" type="text/x-handlebars-template">
<!-- h1>Developer Guide for Kafka Streams</h1 -->
<div class="sub-nav-sticky">
<div class="sticky-top">
<!-- div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
</div -->
</div>
</div>
<div class="section" id="interactive-queries">
<span id="streams-developer-guide-interactive-queries"></span><h1>Interactive Queries<a class="headerlink" href="#interactive-queries" title="Permalink to this headline"></a></h1>
        <p>Interactive queries allow you to leverage the state of your application from outside your application. The Kafka Streams API enables your applications to be queryable.</p>
<div class="contents local topic" id="table-of-contents">
<p class="topic-title first"><b>Table of Contents</b></p>
<ul class="simple">
<li><a class="reference internal" href="#querying-local-state-stores-for-an-app-instance" id="id3">Querying local state stores for an app instance</a><ul>
<li><a class="reference internal" href="#querying-local-key-value-stores" id="id4">Querying local key-value stores</a></li>
<li><a class="reference internal" href="#querying-local-window-stores" id="id5">Querying local window stores</a></li>
<li><a class="reference internal" href="#querying-local-custom-state-stores" id="id6">Querying local custom state stores</a></li>
</ul>
</li>
<li><a class="reference internal" href="#querying-remote-state-stores-for-the-entire-app" id="id7">Querying remote state stores for the entire app</a><ul>
<li><a class="reference internal" href="#adding-an-rpc-layer-to-your-application" id="id8">Adding an RPC layer to your application</a></li>
<li><a class="reference internal" href="#exposing-the-rpc-endpoints-of-your-application" id="id9">Exposing the RPC endpoints of your application</a></li>
<li><a class="reference internal" href="#discovering-and-accessing-application-instances-and-their-local-state-stores" id="id10">Discovering and accessing application instances and their local state stores</a></li>
</ul>
</li>
<li><a class="reference internal" href="#demo-applications" id="id11">Demo applications</a></li>
</ul>
</div>
<p>The full state of your application is typically <a class="reference internal" href="../architecture.html#streams_architecture_state"><span class="std std-ref">split across many distributed instances of your application</span></a>, and across many state stores that are managed locally by these application instances.</p>
<div class="figure align-center">
<img class="centered" src="/{{version}}/images/streams-interactive-queries-03.png">
</div>
<p>There are local and remote components to interactively querying the state of your application.</p>
<dl class="docutils">
<dt>Local state</dt>
<dd>An application instance can query the locally managed portion of the state and directly query its own local state stores. You can use the corresponding local data in other parts of your application code, as long as it doesn&#8217;t require calling the Kafka Streams API. Querying state stores is always read-only to guarantee that the underlying state stores will never be mutated out-of-band (e.g., you cannot add new entries). State stores should only be mutated by the corresponding processor topology and the input data it operates on. For more information, see <a class="reference internal" href="#streams-developer-guide-interactive-queries-local-stores"><span class="std std-ref">Querying local state stores for an app instance</span></a>.</dd>
<dt>Remote state</dt>
<dd><p class="first">To query the full state of your application, you must connect the various fragments of the state, including:</p>
<ul class="simple">
<li>query local state stores</li>
<li>discover all running instances of your application in the network and their state stores</li>
<li>communicate with these instances over the network (e.g., an RPC layer)</li>
</ul>
<p class="last">Connecting these fragments enables communication between instances of the same app and communication from other applications for interactive queries. For more information, see <a class="reference internal" href="#streams-developer-guide-interactive-queries-discovery"><span class="std std-ref">Querying remote state stores for the entire app</span></a>.</p>
</dd>
</dl>
        <p>Kafka Streams natively provides all of the required functionality for interactively querying the state of your application, with one exception: if you want to expose the full state of your application via interactive queries, you must add a Remote Procedure Call (RPC) layer to your application (e.g., a REST API) so that application instances can communicate over the network.</p>
<p>This table shows the Kafka Streams native communication support for various procedures.</p>
<table border="1" class="docutils">
<colgroup>
<col width="42%" />
<col width="27%" />
<col width="31%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Procedure</th>
<th class="head">Application instance</th>
<th class="head">Entire application</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td>Query local state stores of an app instance</td>
<td>Supported</td>
<td>Supported</td>
</tr>
<tr class="row-odd"><td>Make an app instance discoverable to others</td>
<td>Supported</td>
<td>Supported</td>
</tr>
<tr class="row-even"><td>Discover all running app instances and their state stores</td>
<td>Supported</td>
<td>Supported</td>
</tr>
<tr class="row-odd"><td>Communicate with app instances over the network (RPC)</td>
<td>Supported</td>
                <td>Not supported (you must provide this yourself, e.g., via an RPC layer)</td>
</tr>
</tbody>
</table>
<div class="section" id="querying-local-state-stores-for-an-app-instance">
<span id="streams-developer-guide-interactive-queries-local-stores"></span><h2><a class="toc-backref" href="#id3">Querying local state stores for an app instance</a><a class="headerlink" href="#querying-local-state-stores-for-an-app-instance" title="Permalink to this headline"></a></h2>
<p>A Kafka Streams application typically runs on multiple instances. The state that is locally available on any given instance is only a subset of the <a class="reference internal" href="../architecture.html#streams-architecture-state"><span class="std std-ref">application&#8217;s entire state</span></a>. Querying the local stores on an instance will only return data locally available on that particular instance.</p>
<p>The method <code class="docutils literal"><span class="pre">KafkaStreams#store(...)</span></code> finds an application instance&#8217;s local state stores by name and type.</p>
<div class="figure align-center" id="id1">
<img class="centered" src="/{{version}}/images/streams-interactive-queries-api-01.png">
<p class="caption"><span class="caption-text">Every application instance can directly query any of its local state stores.</span></p>
</div>
<p>The <em>name</em> of a state store is defined when you create the store. You can create the store explicitly by using the Processor API or implicitly by using stateful operations in the DSL.</p>
<p>The <em>type</em> of a state store is defined by <code class="docutils literal"><span class="pre">QueryableStoreType</span></code>. You can access the built-in types via the class <code class="docutils literal"><span class="pre">QueryableStoreTypes</span></code>.
Kafka Streams currently has two built-in types:</p>
<ul class="simple">
<li>A key-value store <code class="docutils literal"><span class="pre">QueryableStoreTypes#keyValueStore()</span></code>, see <a class="reference internal" href="#streams-developer-guide-interactive-queries-local-key-value-stores"><span class="std std-ref">Querying local key-value stores</span></a>.</li>
<li>A window store <code class="docutils literal"><span class="pre">QueryableStoreTypes#windowStore()</span></code>, see <a class="reference internal" href="#streams-developer-guide-interactive-queries-local-window-stores"><span class="std std-ref">Querying local window stores</span></a>.</li>
</ul>
<p>You can also <a class="reference internal" href="#streams-developer-guide-interactive-queries-custom-stores"><span class="std std-ref">implement your own QueryableStoreType</span></a> as described in section <a class="reference internal" href="#streams-developer-guide-interactive-queries-custom-stores"><span class="std std-ref">Querying local custom state stores</span></a>.</p>
<div class="admonition note">
<p><b>Note</b></p>
<p class="last">Kafka Streams materializes one state store per stream partition. This means your application will potentially manage
many underlying state stores. The API enables you to query all of the underlying stores without having to know which
partition the data is in.</p>
</div>
<div class="section" id="querying-local-key-value-stores">
<span id="streams-developer-guide-interactive-queries-local-key-value-stores"></span><h3><a class="toc-backref" href="#id4">Querying local key-value stores</a><a class="headerlink" href="#querying-local-key-value-stores" title="Permalink to this headline"></a></h3>
<p>To query a local key-value store, you must first create a topology with a key-value store. This example creates a key-value
store named &#8220;CountsKeyValueStore&#8221;. This store will hold the latest count for any word that is found on the topic &#8220;word-count-input&#8221;.</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties </span> <span class="n">props</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">StreamsBuilder</span> <span class="n">builder</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">KStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">textLines</span> <span class="o">=</span> <span class="o">...;</span>
<span class="c1">// Define the processing topology (here: WordCount)</span>
<span class="n">KGroupedStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">groupedByWord</span> <span class="o">=</span> <span class="n">textLines</span>
<span class="o">.</span><span class="na">flatMapValues</span><span class="o">(</span><span class="n">value</span> <span class="o">-&gt;</span> <span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="n">value</span><span class="o">.</span><span class="na">toLowerCase</span><span class="o">().</span><span class="na">split</span><span class="o">(</span><span class="s">&quot;\\W+&quot;</span><span class="o">)))</span>
<span class="o">.</span><span class="na">groupBy</span><span class="o">((</span><span class="n">key</span><span class="o">,</span> <span class="n">word</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="n">word</span><span class="o">,</span> <span class="n">Grouped</span><span class="o">.</span><span class="na">with</span><span class="o">(</span><span class="n">stringSerde</span><span class="o">,</span> <span class="n">stringSerde</span><span class="o">));</span>
<span class="c1">// Create a key-value store named &quot;CountsKeyValueStore&quot; for the all-time word counts</span>
<span class="n">groupedByWord</span><span class="o">.</span><span class="na">count</span><span class="o">(</span><span class="n">Materialized</span><span class="o">.&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">,</span> <span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">Bytes</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]&gt;</span><span class="n">as</span><span class="o">(</span><span class="s">&quot;CountsKeyValueStore&quot;</span><span class="o">));</span>
<span class="c1">// Start an instance of the topology</span>
<span class="n">KafkaStreams</span> <span class="n">streams</span> <span class="o">=</span> <span class="k">new</span> <span class="n">KafkaStreams</span><span class="o">(</span><span class="n">builder</span><span class="o">,</span> <span class="n">props</span><span class="o">);</span>
<span class="n">streams</span><span class="o">.</span><span class="na">start</span><span class="o">();</span>
</pre></div>
</div>
<p>After the application has started, you can get access to &#8220;CountsKeyValueStore&#8221; and then query it via the <a class="reference external" href="https://github.com/apache/kafka/blob/1.0/streams/src/main/java/org/apache/kafka/streams/state/ReadOnlyKeyValueStore.java">ReadOnlyKeyValueStore</a> API:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Get the key-value store CountsKeyValueStore</span>
<span class="n">ReadOnlyKeyValueStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">keyValueStore</span> <span class="o">=</span>
<span class="n">streams</span><span class="o">.</span><span class="na">store</span><span class="o">(</span><span class="s">&quot;CountsKeyValueStore&quot;</span><span class="o">,</span> <span class="n">QueryableStoreTypes</span><span class="o">.</span><span class="na">keyValueStore</span><span class="o">());</span>
<span class="c1">// Get value by key</span>
<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;count for hello:&quot;</span> <span class="o">+</span> <span class="n">keyValueStore</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="s">&quot;hello&quot;</span><span class="o">));</span>
<span class="c1">// Get the values for a range of keys available in this application instance</span>
<span class="n">KeyValueIterator</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">range</span> <span class="o">=</span> <span class="n">keyValueStore</span><span class="o">.</span><span class="na">range</span><span class="o">(</span><span class="s">&quot;all&quot;</span><span class="o">,</span> <span class="s">&quot;streams&quot;</span><span class="o">);</span>
<span class="k">while</span> <span class="o">(</span><span class="n">range</span><span class="o">.</span><span class="na">hasNext</span><span class="o">())</span> <span class="o">{</span>
<span class="n">KeyValue</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">next</span> <span class="o">=</span> <span class="n">range</span><span class="o">.</span><span class="na">next</span><span class="o">();</span>
<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;count for &quot;</span> <span class="o">+</span> <span class="n">next</span><span class="o">.</span><span class="na">key</span> <span class="o">+</span> <span class="s">&quot;: &quot;</span> <span class="o">+</span> <span class="n">next</span><span class="o">.</span><span class="na">value</span><span class="o">);</span>
<span class="o">}</span>
<span class="c1">// Get the values for all of the keys available in this application instance</span>
<span class="n">KeyValueIterator</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">range</span> <span class="o">=</span> <span class="n">keyValueStore</span><span class="o">.</span><span class="na">all</span><span class="o">();</span>
<span class="k">while</span> <span class="o">(</span><span class="n">range</span><span class="o">.</span><span class="na">hasNext</span><span class="o">())</span> <span class="o">{</span>
<span class="n">KeyValue</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">next</span> <span class="o">=</span> <span class="n">range</span><span class="o">.</span><span class="na">next</span><span class="o">();</span>
<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;count for &quot;</span> <span class="o">+</span> <span class="n">next</span><span class="o">.</span><span class="na">key</span> <span class="o">+</span> <span class="s">&quot;: &quot;</span> <span class="o">+</span> <span class="n">next</span><span class="o">.</span><span class="na">value</span><span class="o">);</span>
<span class="o">}</span>
</pre></div>
</div>
<p>You can also materialize the results of stateless operators by using the overloaded methods that take a <code class="docutils literal"><span class="pre">queryableStoreName</span></code>
as shown in the example below:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span>
<span class="n">StreamsBuilder</span> <span class="n">builder</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">KTable</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="n">regionCounts</span> <span class="o">=</span> <span class="o">...;</span>
<span class="c1">// materialize the result of filtering corresponding to odd numbers</span>
<span class="c1">// the &quot;queryableStoreName&quot; can be subsequently queried.</span>
<span class="n">KTable</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="n">oddCounts</span> <span class="o">=</span> <span class="n">numberLines</span><span class="o">.</span><span class="na">filter</span><span class="o">((</span><span class="n">region</span><span class="o">,</span> <span class="n">count</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="o">(</span><span class="n">count</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">!=</span> <span class="mi">0</span><span class="o">),</span>
<span class="n">Materialized</span><span class="o">.&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">,</span> <span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">Bytes</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]&gt;</span><span class="n">as</span><span class="o">(</span><span class="s">&quot;queryableStoreName&quot;</span><span class="o">));</span>
<span class="c1">// do not materialize the result of filtering corresponding to even numbers</span>
<span class="c1">// this means that these results will not be materialized and cannot be queried.</span>
<span class="n">KTable</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="n">oddCounts</span> <span class="o">=</span> <span class="n">numberLines</span><span class="o">.</span><span class="na">filter</span><span class="o">((</span><span class="n">region</span><span class="o">,</span> <span class="n">count</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="o">(</span><span class="n">count</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span><span class="o">));</span>
</pre></div>
</div>
</div>
<div class="section" id="querying-local-window-stores">
<span id="streams-developer-guide-interactive-queries-local-window-stores"></span><h3><a class="toc-backref" href="#id5">Querying local window stores</a><a class="headerlink" href="#querying-local-window-stores" title="Permalink to this headline"></a></h3>
<p>A window store will potentially have many results for any given key because the key can be present in multiple windows.
However, there is only one result per window for a given key.</p>
<p>To query a local window store, you must first create a topology with a window store. This example creates a window store
named &#8220;CountsWindowStore&#8221; that contains the counts for words in 1-minute windows.</p>
<div class="highlight-java"><div class="highlight"><pre><span></span>
<span class="n">StreamsBuilder</span> <span class="n">builder</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">KStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">textLines</span> <span class="o">=</span> <span class="o">...;</span>
<span class="c1">// Define the processing topology (here: WordCount)</span>
<span class="n">KGroupedStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">groupedByWord</span> <span class="o">=</span> <span class="n">textLines</span>
<span class="o">.</span><span class="na">flatMapValues</span><span class="o">(</span><span class="n">value</span> <span class="o">-&gt;</span> <span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="n">value</span><span class="o">.</span><span class="na">toLowerCase</span><span class="o">().</span><span class="na">split</span><span class="o">(</span><span class="s">&quot;\\W+&quot;</span><span class="o">)))</span>
<span class="o">.</span><span class="na">groupBy</span><span class="o">((</span><span class="n">key</span><span class="o">,</span> <span class="n">word</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="n">word</span><span class="o">,</span> <span class="n">Grouped</span><span class="o">.</span><span class="na">with</span><span class="o">(</span><span class="n">stringSerde</span><span class="o">,</span> <span class="n">stringSerde</span><span class="o">));</span>
<span class="c1">// Create a window state store named &quot;CountsWindowStore&quot; that contains the word counts for every minute</span>
<span class="n">groupedByWord</span><span class="o">.</span><span class="na">windowedBy</span><span class="o">(</span><span class="n">TimeWindows</span><span class="o">.</span><span class="na">of</span><span class="o">(<span class="n">Duration</span><span class="o">.</span><span class="na">ofSeconds</span><span class="o">(</span><span class="mi">60</span><span class="o">)))</span>
<span class="o">.</span><span class="na">count</span><span class="o">(</span><span class="n">Materialized</span><span class="o">.&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">,</span> <span class="n">WindowStore</span><span class="o">&lt;</span><span class="n">Bytes</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]&gt;</span><span class="n">as</span><span class="o">(</span><span class="s">&quot;CountsWindowStore&quot;</span><span class="o">));</span>
</pre></div>
</div>
<p>After the application has started, you can get access to &#8220;CountsWindowStore&#8221; and then query it via the <a class="reference external" href="https://github.com/apache/kafka/blob/1.0/streams/src/main/java/org/apache/kafka/streams/state/ReadOnlyWindowStore.java">ReadOnlyWindowStore</a> API:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Get the window store named &quot;CountsWindowStore&quot;</span>
<span class="n">ReadOnlyWindowStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">windowStore</span> <span class="o">=</span>
<span class="n">streams</span><span class="o">.</span><span class="na">store</span><span class="o">(</span><span class="s">&quot;CountsWindowStore&quot;</span><span class="o">,</span> <span class="n">QueryableStoreTypes</span><span class="o">.</span><span class="na">windowStore</span><span class="o">());</span>
<span class="c1">// Fetch values for the key &quot;world&quot; for all of the windows available in this application instance.</span>
<span class="c1">// To get *all* available windows we fetch windows from the beginning of time until now.</span>
<span class="kt">Instant</span> <span class="n">timeFrom</span> <span class="o">=</span> <span class="na">Instant</span><span class="o">.</span><span class="na">ofEpochMilli<span class="o">(</span><span class="mi">0</span><span class="o">);</span> <span class="c1">// beginning of time = oldest available</span>
<span class="kt">Instant</span> <span class="n">timeTo</span> <span class="o">=</span> <span class="n">Instant</span><span class="o">.</span><span class="na">now</span><span class="o">();</span> <span class="c1">// now (in processing-time)</span>
<span class="n">WindowStoreIterator</span><span class="o">&lt;</span><span class="n">Long</span><span class="o">&gt;</span> <span class="n">iterator</span> <span class="o">=</span> <span class="n">windowStore</span><span class="o">.</span><span class="na">fetch</span><span class="o">(</span><span class="s">&quot;world&quot;</span><span class="o">,</span> <span class="n">timeFrom</span><span class="o">,</span> <span class="n">timeTo</span><span class="o">);</span>
<span class="k">while</span> <span class="o">(</span><span class="n">iterator</span><span class="o">.</span><span class="na">hasNext</span><span class="o">())</span> <span class="o">{</span>
<span class="n">KeyValue</span><span class="o">&lt;</span><span class="n">Long</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">next</span> <span class="o">=</span> <span class="n">iterator</span><span class="o">.</span><span class="na">next</span><span class="o">();</span>
<span class="kt">long</span> <span class="n">windowTimestamp</span> <span class="o">=</span> <span class="n">next</span><span class="o">.</span><span class="na">key</span><span class="o">;</span>
<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;Count of &#39;world&#39; @ time &quot;</span> <span class="o">+</span> <span class="n">windowTimestamp</span> <span class="o">+</span> <span class="s">&quot; is &quot;</span> <span class="o">+</span> <span class="n">next</span><span class="o">.</span><span class="na">value</span><span class="o">);</span>
<span class="o">}</span>
</pre></div>
</div>
</div>
<div class="section" id="querying-local-custom-state-stores">
<span id="streams-developer-guide-interactive-queries-custom-stores"></span><h3><a class="toc-backref" href="#id6">Querying local custom state stores</a><a class="headerlink" href="#querying-local-custom-state-stores" title="Permalink to this headline"></a></h3>
<div class="admonition note">
<p><b>Note</b></p>
<p class="last">Only the <a class="reference internal" href="processor-api.html#streams-developer-guide-processor-api"><span class="std std-ref">Processor API</span></a> supports custom state stores.</p>
</div>
        <p>Before you can query your custom state stores, you must take the following steps:</p>
<ul class="simple">
<li>Your custom state store must implement <code class="docutils literal"><span class="pre">StateStore</span></code>.</li>
<li>You must have an interface to represent the operations available on the store.</li>
<li>You must provide an implementation of <code class="docutils literal"><span class="pre">StoreBuilder</span></code> for creating instances of your store.</li>
<li>It is recommended that you provide an interface that restricts access to read-only operations. This prevents users of this API from mutating the state of your running Kafka Streams application out-of-band.</li>
</ul>
<p>The class/interface hierarchy for your custom store might look something like:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kd">public</span> <span class="kd">class</span> <span class="nc">MyCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="kd">implements</span> <span class="n">StateStore</span><span class="o">,</span> <span class="n">MyWriteableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="o">{</span>
<span class="c1">// implementation of the actual store</span>
<span class="o">}</span>
<span class="c1">// Read-write interface for MyCustomStore</span>
<span class="kd">public</span> <span class="kd">interface</span> <span class="nc">MyWriteableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="kd">extends</span> <span class="n">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="o">{</span>
<span class="kt">void</span> <span class="nf">write</span><span class="o">(</span><span class="n">K</span> <span class="n">Key</span><span class="o">,</span> <span class="n">V</span> <span class="n">value</span><span class="o">);</span>
<span class="o">}</span>
<span class="c1">// Read-only interface for MyCustomStore</span>
<span class="kd">public</span> <span class="kd">interface</span> <span class="nc">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="o">{</span>
<span class="n">V</span> <span class="nf">read</span><span class="o">(</span><span class="n">K</span> <span class="n">key</span><span class="o">);</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">MyCustomStoreBuilder</span> <span class="kd">implements</span> <span class="n">StoreBuilder</span> <span class="o">{</span>
<span class="c1">// implementation of the supplier for MyCustomStore</span>
<span class="o">}</span>
</pre></div>
</div>
<p>To make this store queryable you must:</p>
<ul class="simple">
<li>Provide an implementation of <a class="reference external" href="https://github.com/apache/kafka/blob/1.0/streams/src/main/java/org/apache/kafka/streams/state/QueryableStoreType.java">QueryableStoreType</a>.</li>
<li>Provide a wrapper class that has access to all of the underlying instances of the store and is used for querying.</li>
</ul>
<p>Here is how to implement <code class="docutils literal"><span class="pre">QueryableStoreType</span></code>:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kd">public</span> <span class="kd">class</span> <span class="nc">MyCustomStoreType</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="kd">implements</span> <span class="n">QueryableStoreType</span><span class="o">&lt;</span><span class="n">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;&gt;</span> <span class="o">{</span>
<span class="c1">// Only accept StateStores that are of type MyCustomStore</span>
<span class="kd">public</span> <span class="kt">boolean</span> <span class="nf">accepts</span><span class="o">(</span><span class="kd">final</span> <span class="n">StateStore</span> <span class="n">stateStore</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">stateStore</span> <span class="n">instanceOf</span> <span class="n">MyCustomStore</span><span class="o">;</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="n">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="nf">create</span><span class="o">(</span><span class="kd">final</span> <span class="n">StateStoreProvider</span> <span class="n">storeProvider</span><span class="o">,</span> <span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="k">new</span> <span class="n">MyCustomStoreTypeWrapper</span><span class="o">(</span><span class="n">storeProvider</span><span class="o">,</span> <span class="n">storeName</span><span class="o">,</span> <span class="k">this</span><span class="o">);</span>
<span class="o">}</span>
<span class="o">}</span>
</pre></div>
</div>
<p>A wrapper class is required because each instance of a Kafka Streams application may run multiple stream tasks and manage
multiple local instances of a particular state store. The wrapper class hides this complexity and lets you query a &#8220;logical&#8221;
state store by name without having to know about all of the underlying local instances of that state store.</p>
<p>When implementing your wrapper class you must use the
<a class="reference external" href="https://github.com/apache/kafka/blob/1.0/streams/src/main/java/org/apache/kafka/streams/state/internals/StateStoreProvider.java">StateStoreProvider</a>
interface to get access to the underlying instances of your store.
<code class="docutils literal"><span class="pre">StateStoreProvider#stores(String</span> <span class="pre">storeName,</span> <span class="pre">QueryableStoreType&lt;T&gt;</span> <span class="pre">queryableStoreType)</span></code> returns a <code class="docutils literal"><span class="pre">List</span></code> of state
stores with the given storeName and of the type as defined by <code class="docutils literal"><span class="pre">queryableStoreType</span></code>.</p>
        <p>Here is an example implementation of the wrapper (Java 8+):</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// We strongly recommended implementing a read-only interface</span>
<span class="c1">// to restrict usage of the store to safe read operations!</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">MyCustomStoreTypeWrapper</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="kd">implements</span> <span class="n">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="o">{</span>
<span class="kd">private</span> <span class="kd">final</span> <span class="n">QueryableStoreType</span><span class="o">&lt;</span><span class="n">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span> <span class="n">V</span><span class="o">&gt;&gt;</span> <span class="n">customStoreType</span><span class="o">;</span>
<span class="kd">private</span> <span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">;</span>
<span class="kd">private</span> <span class="kd">final</span> <span class="n">StateStoreProvider</span> <span class="n">provider</span><span class="o">;</span>
<span class="kd">public</span> <span class="nf">CustomStoreTypeWrapper</span><span class="o">(</span><span class="kd">final</span> <span class="n">StateStoreProvider</span> <span class="n">provider</span><span class="o">,</span>
<span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">,</span>
<span class="kd">final</span> <span class="n">QueryableStoreType</span><span class="o">&lt;</span><span class="n">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span> <span class="n">V</span><span class="o">&gt;&gt;</span> <span class="n">customStoreType</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// ... assign fields ...</span>
<span class="o">}</span>
<span class="c1">// Implement a safe read method</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="n">V</span> <span class="nf">read</span><span class="o">(</span><span class="kd">final</span> <span class="n">K</span> <span class="n">key</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// Get all the stores with storeName and of customStoreType</span>
<span class="kd">final</span> <span class="n">List</span><span class="o">&lt;</span><span class="n">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span> <span class="n">V</span><span class="o">&gt;&gt;</span> <span class="n">stores</span> <span class="o">=</span> <span class="n">provider</span><span class="o">.</span><span class="na">getStores</span><span class="o">(</span><span class="n">storeName</span><span class="o">,</span> <span class="n">customStoreType</span><span class="o">);</span>
<span class="c1">// Try and find the value for the given key</span>
<span class="kd">final</span> <span class="n">Optional</span><span class="o">&lt;</span><span class="n">V</span><span class="o">&gt;</span> <span class="n">value</span> <span class="o">=</span> <span class="n">stores</span><span class="o">.</span><span class="na">stream</span><span class="o">().</span><span class="na">filter</span><span class="o">(</span><span class="n">store</span> <span class="o">-&gt;</span> <span class="n">store</span><span class="o">.</span><span class="na">read</span><span class="o">(</span><span class="n">key</span><span class="o">)</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">).</span><span class="na">findFirst</span><span class="o">();</span>
<span class="c1">// Return the value if it exists</span>
<span class="k">return</span> <span class="n">value</span><span class="o">.</span><span class="na">orElse</span><span class="o">(</span><span class="kc">null</span><span class="o">);</span>
<span class="o">}</span>
<span class="o">}</span>
</pre></div>
</div>
<p>You can now find and query your custom store:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span>
<span class="n">Topology</span> <span class="n">topology</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">ProcessorSupplier</span> <span class="n">processorSuppler</span> <span class="o">=</span> <span class="o">...;</span>
<span class="c1">// Create CustomStoreSupplier for store name the-custom-store</span>
<span class="n">MyCustomStoreBuilder</span> <span class="n">customStoreBuilder</span> <span class="o">=</span> <span class="k">new</span> <span class="n">MyCustomStoreBuilder</span><span class="o">(</span><span class="s">&quot;the-custom-store&quot;</span><span class="o">)</span> <span class="c1">//...;</span>
<span class="c1">// Add the source topic</span>
<span class="n">topology</span><span class="o">.</span><span class="na">addSource</span><span class="o">(</span><span class="s">&quot;input&quot;</span><span class="o">,</span> <span class="s">&quot;inputTopic&quot;</span><span class="o">);</span>
<span class="c1">// Add a custom processor that reads from the source topic</span>
<span class="n">topology</span><span class="o">.</span><span class="na">addProcessor</span><span class="o">(</span><span class="s">&quot;the-processor&quot;</span><span class="o">,</span> <span class="n">processorSupplier</span><span class="o">,</span> <span class="s">&quot;input&quot;</span><span class="o">);</span>
<span class="c1">// Connect your custom state store to the custom processor above</span>
<span class="n">topology</span><span class="o">.</span><span class="na">addStateStore</span><span class="o">(</span><span class="n">customStoreBuilder</span><span class="o">,</span> <span class="s">&quot;the-processor&quot;</span><span class="o">);</span>
<span class="n">KafkaStreams</span> <span class="n">streams</span> <span class="o">=</span> <span class="k">new</span> <span class="n">KafkaStreams</span><span class="o">(</span><span class="n">topology</span><span class="o">,</span> <span class="n">config</span><span class="o">);</span>
<span class="n">streams</span><span class="o">.</span><span class="na">start</span><span class="o">();</span>
<span class="c1">// Get access to the custom store</span>
<span class="n">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">store</span> <span class="o">=</span> <span class="n">streams</span><span class="o">.</span><span class="na">store</span><span class="o">(</span><span class="s">&quot;the-custom-store&quot;</span><span class="o">,</span> <span class="k">new</span> <span class="n">MyCustomStoreType</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span><span class="n">String</span><span class="o">&gt;());</span>
<span class="c1">// Query the store</span>
<span class="n">String</span> <span class="n">value</span> <span class="o">=</span> <span class="n">store</span><span class="o">.</span><span class="na">read</span><span class="o">(</span><span class="s">&quot;key&quot;</span><span class="o">);</span>
</pre></div>
</div>
</div>
</div>
<div class="section" id="querying-remote-state-stores-for-the-entire-app">
<span id="streams-developer-guide-interactive-queries-discovery"></span><h2><a class="toc-backref" href="#id7">Querying remote state stores for the entire app</a><a class="headerlink" href="#querying-remote-state-stores-for-the-entire-app" title="Permalink to this headline"></a></h2>
<p>To query remote states for the entire app, you must expose the application&#8217;s full state to other applications, including
applications that are running on different machines.</p>
<p>For example, you have a Kafka Streams application that processes user events in a multi-player video game, and you want to retrieve the latest status of each user directly and display it in a mobile app. Here are the required steps to make the full state of your application queryable:</p>
<ol class="arabic simple">
<li><a class="reference internal" href="#streams-developer-guide-interactive-queries-rpc-layer"><span class="std std-ref">Add an RPC layer to your application</span></a> so that
the instances of your application can be interacted with via the network (e.g., a REST API, Thrift, a custom protocol,
and so on). The instances must respond to interactive queries. You can follow the reference examples provided to get
started.</li>
<li><a class="reference internal" href="#streams-developer-guide-interactive-queries-expose-rpc"><span class="std std-ref">Expose the RPC endpoints</span></a> of
your application&#8217;s instances via the <code class="docutils literal"><span class="pre">application.server</span></code> configuration setting of Kafka Streams. Because RPC
endpoints must be unique within a network, each instance has its own value for this configuration setting.
This makes an application instance discoverable by other instances.</li>
<li>In the RPC layer, <a class="reference internal" href="#streams-developer-guide-interactive-queries-discover-app-instances-and-stores"><span class="std std-ref">discover remote application instances</span></a> and their state stores and <a class="reference internal" href="#streams-developer-guide-interactive-queries-local-stores"><span class="std std-ref">query locally available state stores</span></a> to make the full state of your application queryable. The remote application instances can forward queries to other app instances if a particular instance lacks the local data to respond to a query. The locally available state stores can directly respond to queries.</li>
</ol>
<div class="figure align-center" id="id2">
<img class="centered" src="/{{version}}/images/streams-interactive-queries-api-02.png">
<p class="caption"><span class="caption-text">Discover any running instances of the same application as well as the respective RPC endpoints they expose for
interactive queries</span></p>
</div>
<div class="section" id="adding-an-rpc-layer-to-your-application">
<span id="streams-developer-guide-interactive-queries-rpc-layer"></span><h3><a class="toc-backref" href="#id8">Adding an RPC layer to your application</a><a class="headerlink" href="#adding-an-rpc-layer-to-your-application" title="Permalink to this headline"></a></h3>
<p>There are many ways to add an RPC layer. The only requirements are that the RPC layer is embedded within the Kafka Streams
application and that it exposes an endpoint that other application instances and applications can connect to.</p>
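            <p>For illustration only, here is a minimal sketch of such an RPC layer that uses the JDK&#8217;s built-in
                <code class="docutils literal"><span class="pre">com.sun.net.httpserver.HttpServer</span></code>. It serves the locally available
                data of a key-value store over HTTP; the store name &quot;word-count&quot;, the URL path, the port, and the class
                name are assumptions for this sketch, and a real application would add error handling and forward
                queries for keys that are hosted on other instances.</p>
            <div class="highlight-java"><div class="highlight"><pre>
import com.sun.net.httpserver.HttpServer;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Minimal, illustrative REST endpoint embedded in a Kafka Streams application.
public class InteractiveQueriesRestSketch {

  public static void startRestEndpoint(final KafkaStreams streams, final int port) throws Exception {
    final HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
    // GET /count/&lt;word&gt; returns the locally available count for &lt;word&gt;, if any
    server.createContext("/count/", exchange -&gt; {
      final String word = exchange.getRequestURI().getPath().substring("/count/".length());
      final ReadOnlyKeyValueStore&lt;String, Long&gt; store =
          streams.store("word-count", QueryableStoreTypes.keyValueStore());
      final Long count = store.get(word);
      final byte[] body = String.valueOf(count).getBytes(StandardCharsets.UTF_8);
      exchange.sendResponseHeaders(count == null ? 404 : 200, body.length);
      exchange.getResponseBody().write(body);
      exchange.close();
    });
    server.start();
  }
}
</pre></div>
            </div>
            <p>Other application instances can then query this endpoint over plain HTTP, and the same pattern applies to
                any other RPC technology (e.g., a full REST framework, Thrift, or a custom protocol).</p>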
</div>
<div class="section" id="exposing-the-rpc-endpoints-of-your-application">
<span id="streams-developer-guide-interactive-queries-expose-rpc"></span><h3><a class="toc-backref" href="#id9">Exposing the RPC endpoints of your application</a><a class="headerlink" href="#exposing-the-rpc-endpoints-of-your-application" title="Permalink to this headline"></a></h3>
<p>To enable remote state store discovery in a distributed Kafka Streams application, you must set the <a class="reference internal" href="config-streams.html#streams-developer-guide-required-configs"><span class="std std-ref">configuration property</span></a> in the config properties.
The <code class="docutils literal"><span class="pre">application.server</span></code> property defines a unique <code class="docutils literal"><span class="pre">host:port</span></code> pair that points to the RPC endpoint of the respective instance of a Kafka Streams application.
The value of this configuration property will vary across the instances of your application.
When this property is set, Kafka Streams will keep track of the RPC endpoint information for every instance of an application, its state stores, and assigned stream partitions through instances of <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/state/StreamsMetadata.html">StreamsMetadata</a>.</p>
<div class="admonition tip">
<p><b>Tip</b></p>
<p class="last">Consider leveraging the exposed RPC endpoints of your application for further functionality, such as
piggybacking additional inter-application communication that goes beyond interactive queries.</p>
</div>
<p>This example shows how to configure and run a Kafka Streams application that supports the discovery of its state stores.</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">props</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="c1">// Set the unique RPC endpoint of this application instance through which it</span>
<span class="c1">// can be interactively queried. In a real application, the value would most</span>
<span class="c1">// probably not be hardcoded but derived dynamically.</span>
<span class="n">String</span> <span class="n">rpcEndpoint</span> <span class="o">=</span> <span class="s">&quot;host1:4460&quot;</span><span class="o">;</span>
<span class="n">props</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">APPLICATION_SERVER_CONFIG</span><span class="o">,</span> <span class="n">rpcEndpoint</span><span class="o">);</span>
<span class="c1">// ... further settings may follow here ...</span>
<span class="n">StreamsBuilder</span> <span class="n">builder</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StreamsBuilder</span><span class="o">();</span>
<span class="n">KStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">textLines</span> <span class="o">=</span> <span class="n">builder</span><span class="o">.</span><span class="na">stream</span><span class="o">(</span><span class="n">stringSerde</span><span class="o">,</span> <span class="n">stringSerde</span><span class="o">,</span> <span class="s">&quot;word-count-input&quot;</span><span class="o">);</span>
<span class="kd">final</span> <span class="n">KGroupedStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">groupedByWord</span> <span class="o">=</span> <span class="n">textLines</span>
<span class="o">.</span><span class="na">flatMapValues</span><span class="o">(</span><span class="n">value</span> <span class="o">-&gt;</span> <span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="n">value</span><span class="o">.</span><span class="na">toLowerCase</span><span class="o">().</span><span class="na">split</span><span class="o">(</span><span class="s">&quot;\\W+&quot;</span><span class="o">)))</span>
<span class="o">.</span><span class="na">groupBy</span><span class="o">((</span><span class="n">key</span><span class="o">,</span> <span class="n">word</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="n">word</span><span class="o">,</span> <span class="n">Grouped</span><span class="o">.</span><span class="na">with</span><span class="o">(</span><span class="n">stringSerde</span><span class="o">,</span> <span class="n">stringSerde</span><span class="o">));</span>
<span class="c1">// This call to `count()` creates a state store named &quot;word-count&quot;.</span>
<span class="c1">// The state store is discoverable and can be queried interactively.</span>
<span class="n">groupedByWord</span><span class="o">.</span><span class="na">count</span><span class="o">(</span><span class="n">Materialized</span><span class="o">.&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">,</span> <span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">Bytes</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]&gt;</span><span class="n">as</span><span class="o">(</span><span class="s">&quot;word-count&quot;</span><span class="o">));</span>
<span class="c1">// Start an instance of the topology</span>
<span class="n">KafkaStreams</span> <span class="n">streams</span> <span class="o">=</span> <span class="k">new</span> <span class="n">KafkaStreams</span><span class="o">(</span><span class="n">builder</span><span class="o">,</span> <span class="n">props</span><span class="o">);</span>
<span class="n">streams</span><span class="o">.</span><span class="na">start</span><span class="o">();</span>
<span class="c1">// Then, create and start the actual RPC service for remote access to this</span>
<span class="c1">// application instance&#39;s local state stores.</span>
<span class="c1">//</span>
<span class="c1">// This service should be started on the same host and port as defined above by</span>
<span class="c1">// the property `StreamsConfig.APPLICATION_SERVER_CONFIG`. The example below is</span>
<span class="c1">// fictitious, but we provide end-to-end demo applications (such as KafkaMusicExample)</span>
<span class="c1">// that showcase how to implement such a service to get you started.</span>
<span class="n">MyRPCService</span> <span class="n">rpcService</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">rpcService</span><span class="o">.</span><span class="na">listenAt</span><span class="o">(</span><span class="n">rpcEndpoint</span><span class="o">);</span>
</pre></div>
</div>
</div>
<div class="section" id="discovering-and-accessing-application-instances-and-their-local-state-stores">
<span id="streams-developer-guide-interactive-queries-discover-app-instances-and-stores"></span><h3><a class="toc-backref" href="#id10">Discovering and accessing application instances and their local state stores</a><a class="headerlink" href="#discovering-and-accessing-application-instances-and-their-local-state-stores" title="Permalink to this headline"></a></h3>
<p>The following methods return <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/state/StreamsMetadata.html">StreamsMetadata</a> objects, which provide meta-information about application instances such as their RPC endpoint and locally available state stores.</p>
<ul class="simple">
<li><code class="docutils literal"><span class="pre">KafkaStreams#allMetadata()</span></code>: find all instances of this application</li>
<li><code class="docutils literal"><span class="pre">KafkaStreams#allMetadataForStore(String</span> <span class="pre">storeName)</span></code>: find those applications instances that manage local instances of the state store &#8220;storeName&#8221;</li>
<li><code class="docutils literal"><span class="pre">KafkaStreams#metadataForKey(String</span> <span class="pre">storeName,</span> <span class="pre">K</span> <span class="pre">key,</span> <span class="pre">Serializer&lt;K&gt;</span> <span class="pre">keySerializer)</span></code>: using the default stream partitioning strategy, find the one application instance that holds the data for the given key in the given state store</li>
<li><code class="docutils literal"><span class="pre">KafkaStreams#metadataForKey(String</span> <span class="pre">storeName,</span> <span class="pre">K</span> <span class="pre">key,</span> <span class="pre">StreamPartitioner&lt;K,</span> <span class="pre">?&gt;</span> <span class="pre">partitioner)</span></code>: using <code class="docutils literal"><span class="pre">partitioner</span></code>, find the one application instance that holds the data for the given key in the given state store</li>
</ul>
<div class="admonition attention">
<p class="first admonition-title">Attention</p>
<p class="last">If <code class="docutils literal"><span class="pre">application.server</span></code> is not configured for an application instance, then the above methods will not find any <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/state/StreamsMetadata.html">StreamsMetadata</a> for it.</p>
</div>
<p>For example, we can now find the <code class="docutils literal"><span class="pre">StreamsMetadata</span></code> for the state store named &#8220;word-count&#8221; that we defined in the
code example shown in the previous section:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">KafkaStreams</span> <span class="n">streams</span> <span class="o">=</span> <span class="o">...;</span>
<span class="c1">// Find all the locations of local instances of the state store named &quot;word-count&quot;</span>
<span class="n">Collection</span><span class="o">&lt;</span><span class="n">StreamsMetadata</span><span class="o">&gt;</span> <span class="n">wordCountHosts</span> <span class="o">=</span> <span class="n">streams</span><span class="o">.</span><span class="na">allMetadataForStore</span><span class="o">(</span><span class="s">&quot;word-count&quot;</span><span class="o">);</span>
<span class="c1">// For illustrative purposes, we assume using an HTTP client to talk to remote app instances.</span>
<span class="n">HttpClient</span> <span class="n">http</span> <span class="o">=</span> <span class="o">...;</span>
<span class="c1">// Get the word count for word (aka key) &#39;alice&#39;: Approach 1</span>
<span class="c1">//</span>
<span class="c1">// We first find the one app instance that manages the count for &#39;alice&#39; in its local state stores.</span>
<span class="n">StreamsMetadata</span> <span class="n">metadata</span> <span class="o">=</span> <span class="n">streams</span><span class="o">.</span><span class="na">metadataForKey</span><span class="o">(</span><span class="s">&quot;word-count&quot;</span><span class="o">,</span> <span class="s">&quot;alice&quot;</span><span class="o">,</span> <span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">().</span><span class="na">serializer</span><span class="o">());</span>
<span class="c1">// Then, we query only that single app instance for the latest count of &#39;alice&#39;.</span>
<span class="c1">// Note: The RPC URL shown below is fictitious and only serves to illustrate the idea. Ultimately,</span>
<span class="c1">// the URL (or, in general, the method of communication) will depend on the RPC layer you opted to</span>
<span class="c1">// implement. Again, we provide end-to-end demo applications (such as KafkaMusicExample) that showcase</span>
<span class="c1">// how to implement such an RPC layer.</span>
<span class="n">Long</span> <span class="n">result</span> <span class="o">=</span> <span class="n">http</span><span class="o">.</span><span class="na">getLong</span><span class="o">(</span><span class="s">&quot;http://&quot;</span> <span class="o">+</span> <span class="n">metadata</span><span class="o">.</span><span class="na">host</span><span class="o">()</span> <span class="o">+</span> <span class="s">&quot;:&quot;</span> <span class="o">+</span> <span class="n">metadata</span><span class="o">.</span><span class="na">port</span><span class="o">()</span> <span class="o">+</span> <span class="s">&quot;/word-count/alice&quot;</span><span class="o">);</span>
<span class="c1">// Get the word count for word (aka key) &#39;alice&#39;: Approach 2</span>
<span class="c1">//</span>
<span class="c1">// Alternatively, we could also choose (say) a brute-force approach where we query every app instance</span>
<span class="c1">// until we find the one that happens to know about &#39;alice&#39;.</span>
<span class="n">Optional</span><span class="o">&lt;</span><span class="n">Long</span><span class="o">&gt;</span> <span class="n">result</span> <span class="o">=</span> <span class="n">streams</span><span class="o">.</span><span class="na">allMetadataForStore</span><span class="o">(</span><span class="s">&quot;word-count&quot;</span><span class="o">)</span>
<span class="o">.</span><span class="na">stream</span><span class="o">()</span>
<span class="o">.</span><span class="na">map</span><span class="o">(</span><span class="n">streamsMetadata</span> <span class="o">-&gt;</span> <span class="o">{</span>
<span class="c1">// Construct the (fictituous) full endpoint URL to query the current remote application instance</span>
<span class="n">String</span> <span class="n">url</span> <span class="o">=</span> <span class="s">&quot;http://&quot;</span> <span class="o">+</span> <span class="n">streamsMetadata</span><span class="o">.</span><span class="na">host</span><span class="o">()</span> <span class="o">+</span> <span class="s">&quot;:&quot;</span> <span class="o">+</span> <span class="n">streamsMetadata</span><span class="o">.</span><span class="na">port</span><span class="o">()</span> <span class="o">+</span> <span class="s">&quot;/word-count/alice&quot;</span><span class="o">;</span>
<span class="c1">// Read and return the count for &#39;alice&#39;, if any.</span>
<span class="k">return</span> <span class="n">http</span><span class="o">.</span><span class="na">getLong</span><span class="o">(</span><span class="n">url</span><span class="o">);</span>
<span class="o">})</span>
<span class="o">.</span><span class="na">filter</span><span class="o">(</span><span class="n">s</span> <span class="o">-&gt;</span> <span class="n">s</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span>
<span class="o">.</span><span class="na">findFirst</span><span class="o">();</span>
</pre></div>
</div>
<p>At this point the full state of the application is interactively queryable:</p>
<ul class="simple">
<li>You can discover the running instances of the application and the state stores they manage locally.</li>
<li>Through the RPC layer that was added to the application, you can communicate with these application instances over the
network and query them for locally available state.</li>
<li>The application instances are able to serve such queries because they can directly query their own local state stores
and respond via the RPC layer.</li>
<li>Collectively, this allows us to query the full state of the entire application.</li>
</ul>
<p>To see an end-to-end application with interactive queries, review the
<a class="reference internal" href="#streams-developer-guide-interactive-queries-demos"><span class="std std-ref">demo applications</span></a>.</p>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="pagination">
<a href="/{{version}}/documentation/streams/developer-guide/testing" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="/{{version}}/documentation/streams/developer-guide/memory-mgmt" class="pagination__btn pagination__btn__next">Next</a>
</div>
</script>
<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
//sticky secondary nav
var $navbar = $(".sub-nav-sticky"),
y_pos = $navbar.offset().top,
height = $navbar.height();
$(window).scroll(function() {
var scrollTop = $(window).scrollTop();
if (scrollTop > y_pos - height) {
$navbar.addClass("navbar-fixed")
} else if (scrollTop <= y_pos) {
$navbar.removeClass("navbar-fixed")
}
});
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>

View File

@@ -0,0 +1,129 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script><!--#include virtual="../../js/templateData.js" --></script>
<script id="content-template" type="text/x-handlebars-template">
<!-- h1>Developer Guide for Kafka Streams</h1 -->
<div class="sub-nav-sticky">
<div class="sticky-top">
<!-- div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
</div -->
</div>
</div>
<div class="section" id="managing-streams-application-topics">
<span id="streams-developer-guide-topics"></span><h1>Managing Streams Application Topics<a class="headerlink" href="#managing-streams-application-topics" title="Permalink to this headline"></a></h1>
<p>A Kafka Streams application continuously reads from Kafka topics, processes the read data, and then
writes the processing results back into Kafka topics. The application may also auto-create other Kafka topics in the
Kafka brokers, for example state store changelog topics. This section describes the differences between these topic types and
how to manage the topics and your applications.</p>
<p>Kafka Streams distinguishes between <a class="reference internal" href="#streams-developer-guide-topics-user"><span class="std std-ref">user topics</span></a> and
<a class="reference internal" href="#streams-developer-guide-topics-internal"><span class="std std-ref">internal topics</span></a>.</p>
<div class="section" id="user-topics">
<span id="streams-developer-guide-topics-user"></span><h2>User topics<a class="headerlink" href="#user-topics" title="Permalink to this headline"></a></h2>
<p>User topics exist externally to an application and are read from or written to by the application, including:</p>
<dl class="docutils">
<dt>Input topics</dt>
<dd>Topics that are specified via source processors in the application&#8217;s topology; e.g. via <code class="docutils literal"><span class="pre">StreamsBuilder#stream()</span></code>, <code class="docutils literal"><span class="pre">StreamsBuilder#table()</span></code> and <code class="docutils literal"><span class="pre">Topology#addSource()</span></code>.</dd>
<dt>Output topics</dt>
<dd>Topics that are specified via sink processors in the application&#8217;s topology; e.g. via
<code class="docutils literal"><span class="pre">KStream#to()</span></code>, <code class="docutils literal"><span class="pre">KTable.to()</span></code> and <code class="docutils literal"><span class="pre">Topology#addSink()</span></code>.</dd>
<dt>Intermediate topics</dt>
<dd>Topics that are both input and output topics of the application&#8217;s topology; e.g. via
<code class="docutils literal"><span class="pre">KStream#through()</span></code>.</dd>
</dl>
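<p>For illustration, the following minimal sketch shows where each topic type typically appears in a topology. The topic names are hypothetical and not part of any example application:</p>
<div class="highlight-java"><div class="highlight"><pre>StreamsBuilder builder = new StreamsBuilder();

// Input topic: read via a source processor.
KStream&lt;String, String&gt; input = builder.stream("orders-input");

// Intermediate topic: written to and read back by this same application.
KStream&lt;String, String&gt; repartitioned = input.through("orders-repartition");

// Output topic: written via a sink processor.
repartitioned.to("orders-output");
</pre></div>
</div>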
<p>User topics must be created and manually managed ahead of time (e.g., via the
<a class="reference internal" href="../../kafka/post-deployment.html#kafka-operations-admin"><span class="std std-ref">topic tools</span></a>). If user topics are shared among multiple applications for reading and
writing, the application users must coordinate topic management. If user topics are centrally managed, then application
users would not need to manage topics themselves but simply obtain access to them.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p>You should not use the auto-create topic feature on the brokers to create user topics, because:</p>
<ul class="last simple">
<li>Auto-creation of topics may be disabled in your Kafka cluster (controlled by <code class="docutils literal"><span class="pre">auto.create.topics.enable</span></code> in the <a class="reference external" href="http://kafka.apache.org/0100/documentation.html#brokerconfigs">Kafka broker configuration</a>).</li>
<li>Auto-creation automatically applies the default topic settings such as the replication factor. These default settings might not be what you want for certain output topics.</li>
</ul>
</div>
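<p>As a sketch of manual topic management, user topics can instead be created programmatically with the Kafka Admin client. The broker address, partition count, and replication factor below are illustrative assumptions only:</p>
<div class="highlight-java"><div class="highlight"><pre>// Uses org.apache.kafka.clients.admin.{AdminClient, AdminClientConfig, NewTopic}.
Properties adminProps = new Properties();
adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

try (AdminClient admin = AdminClient.create(adminProps)) {
    // Create the input topic with an explicitly chosen partition count and replication
    // factor instead of inheriting the broker's default topic settings.
    NewTopic inputTopic = new NewTopic("word-count-input", 6, (short) 3);
    admin.createTopics(Collections.singletonList(inputTopic)).all().get();
}
</pre></div>
</div>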
</div>
<div class="section" id="internal-topics">
<span id="streams-developer-guide-topics-internal"></span><h2>Internal topics<a class="headerlink" href="#internal-topics" title="Permalink to this headline"></a></h2>
<p>Internal topics are used internally by the Kafka Streams application while executing, for example the
changelog topics for state stores. These topics are created by the application and are only used by that stream application.</p>
<p>If security is enabled on the Kafka brokers, you must grant the underlying clients admin permissions so that they can
create internal topics. For more information, see <a class="reference internal" href="security.html#streams-developer-guide-security"><span class="std std-ref">Streams Security</span></a>.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">The internal topics follow the naming convention <code class="docutils literal"><span class="pre">&lt;application.id&gt;-&lt;operatorName&gt;-&lt;suffix&gt;</span></code>, but this convention
is not guaranteed for future releases.</p>
</div>
</div>
</div>
</div>
</div>
<div class="pagination">
<a href="/{{version}}/documentation/streams/developer-guide/running-app" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="/{{version}}/documentation/streams/developer-guide/security" class="pagination__btn pagination__btn__next">Next</a>
</div>
</script>
<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
//sticky secondary nav
var $navbar = $(".sub-nav-sticky"),
y_pos = $navbar.offset().top,
height = $navbar.height();
$(window).scroll(function() {
var scrollTop = $(window).scrollTop();
if (scrollTop > y_pos - height) {
$navbar.addClass("navbar-fixed")
} else if (scrollTop <= y_pos) {
$navbar.removeClass("navbar-fixed")
}
});
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>

View File

@@ -0,0 +1,289 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script><!--#include virtual="../../js/templateData.js" --></script>
<script id="content-template" type="text/x-handlebars-template">
<!-- h1>Developer Guide for Kafka Streams</h1 -->
<div class="sub-nav-sticky">
<div class="sticky-top">
<!-- div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
</div -->
</div>
</div>
<div class="section" id="memory-management">
<span id="streams-developer-guide-memory-management"></span><h1>Memory Management<a class="headerlink" href="#memory-management" title="Permalink to this headline"></a></h1>
<p>You can specify the total memory (RAM) size used for internal caching and compacting of records. This caching happens
before the records are written to state stores or forwarded downstream to other nodes.</p>
<p>The record caches are implemented slightly differently in the DSL and the Processor API.</p>
<div class="contents local topic" id="table-of-contents">
<p class="topic-title first"><b>Table of Contents</b></p>
<ul class="simple">
<li><a class="reference internal" href="#record-caches-in-the-dsl" id="id1">Record caches in the DSL</a></li>
<li><a class="reference internal" href="#record-caches-in-the-processor-api" id="id2">Record caches in the Processor API</a></li>
<li><a class="reference internal" href="#rocksdb" id="id3">RocksDB</a></li>
<li><a class="reference internal" href="#other-memory-usage" id="id4">Other memory usage</a></li>
</ul>
</div>
<div class="section" id="record-caches-in-the-dsl">
<span id="streams-developer-guide-memory-management-record-cache"></span><h2><a class="toc-backref" href="#id1">Record caches in the DSL</a><a class="headerlink" href="#record-caches-in-the-dsl" title="Permalink to this headline"></a></h2>
<p>You can specify the total memory (RAM) size of the record cache for an instance of the processing topology. It is leveraged
by the following <code class="docutils literal"><span class="pre">KTable</span></code> instances:</p>
<ul class="simple">
<li>Source <code class="docutils literal"><span class="pre">KTable</span></code>: <code class="docutils literal"><span class="pre">KTable</span></code> instances that are created via <code class="docutils literal"><span class="pre">StreamsBuilder#table()</span></code> or <code class="docutils literal"><span class="pre">StreamsBuilder#globalTable()</span></code>.</li>
<li>Aggregation <code class="docutils literal"><span class="pre">KTable</span></code>: instances of <code class="docutils literal"><span class="pre">KTable</span></code> that are created as a result of <a class="reference internal" href="dsl-api.html#streams-developer-guide-dsl-aggregating"><span class="std std-ref">aggregations</span></a>.</li>
</ul>
<p>For such <code class="docutils literal"><span class="pre">KTable</span></code> instances, the record cache is used for:</p>
<ul class="simple">
<li>Internal caching and compacting of output records before they are written by the underlying stateful
<a class="reference internal" href="../core-concepts#streams_processor_node"><span class="std std-ref">processor node</span></a> to its internal state stores.</li>
<li>Internal caching and compacting of output records before they are forwarded from the underlying stateful
<a class="reference internal" href="../core-concepts#streams_processor_node"><span class="std std-ref">processor node</span></a> to any of its downstream processor nodes.</li>
</ul>
<p>Use the following example to understand the behaviors with and without record caching. In this example, the input is a
<code class="docutils literal"><span class="pre">KStream&lt;String,</span> <span class="pre">Integer&gt;</span></code> with the records <code class="docutils literal"><span class="pre">&lt;K,V&gt;:</span> <span class="pre">&lt;A,</span> <span class="pre">1&gt;,</span> <span class="pre">&lt;D,</span> <span class="pre">5&gt;,</span> <span class="pre">&lt;A,</span> <span class="pre">20&gt;,</span> <span class="pre">&lt;A,</span> <span class="pre">300&gt;</span></code>. The focus in this example is
on the records with key == <code class="docutils literal"><span class="pre">A</span></code>.</p>
<ul>
<li><p class="first">An <a class="reference internal" href="dsl-api.html#streams-developer-guide-dsl-aggregating"><span class="std std-ref">aggregation</span></a> computes the sum of record values, grouped by key, for
the input and returns a <code class="docutils literal"><span class="pre">KTable&lt;String,</span> <span class="pre">Integer&gt;</span></code>.</p>
<blockquote>
<div><ul class="simple">
<li><strong>Without caching</strong>: a sequence of output records is emitted for key <code class="docutils literal"><span class="pre">A</span></code> that represent changes in the
resulting aggregation table. The parentheses (<code class="docutils literal"><span class="pre">()</span></code>) denote changes, the left number is the new aggregate value
and the right number is the old aggregate value: <code class="docutils literal"><span class="pre">&lt;A,</span> <span class="pre">(1,</span> <span class="pre">null)&gt;,</span> <span class="pre">&lt;A,</span> <span class="pre">(21,</span> <span class="pre">1)&gt;,</span> <span class="pre">&lt;A,</span> <span class="pre">(321,</span> <span class="pre">21)&gt;</span></code>.</li>
<li><strong>With caching</strong>: a single output record is emitted for key <code class="docutils literal"><span class="pre">A</span></code> that would likely be compacted in the cache,
leading to a single output record of <code class="docutils literal"><span class="pre">&lt;A,</span> <span class="pre">(321,</span> <span class="pre">null)&gt;</span></code>. This record is written to the aggregation&#8217;s internal state
store and forwarded to any downstream operations.</li>
</ul>
</div></blockquote>
</li>
</ul>
<p>The cache size is specified through the <code class="docutils literal"><span class="pre">cache.max.bytes.buffering</span></code> parameter, which is a global setting per
processing topology:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Enable record cache of size 10 MB.</span>
<span class="n">Properties</span> <span class="n">props</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="n">props</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">CACHE_MAX_BYTES_BUFFERING_CONFIG</span><span class="o">,</span> <span class="mi">10</span> <span class="o">*</span> <span class="mi">1024</span> <span class="o">*</span> <span class="mi">1024L</span><span class="o">);</span>
</pre></div>
</div>
<p>This parameter controls the number of bytes allocated for caching. Specifically, for a processor topology instance with
<code class="docutils literal"><span class="pre">T</span></code> threads and <code class="docutils literal"><span class="pre">C</span></code> bytes allocated for caching, each thread will have an even <code class="docutils literal"><span class="pre">C/T</span></code> bytes to construct its own
cache and use as it sees fit among its tasks. This means that there are as many caches as there are threads, but no sharing of
caches across threads happens.</p>
<p>The basic API for the cache is made of <code class="docutils literal"><span class="pre">put()</span></code> and <code class="docutils literal"><span class="pre">get()</span></code> calls. Records are
evicted using a simple LRU scheme after the cache size is reached. The first time a keyed record <code class="docutils literal"><span class="pre">R1</span> <span class="pre">=</span> <span class="pre">&lt;K1,</span> <span class="pre">V1&gt;</span></code>
finishes processing at a node, it is marked as dirty in the cache. Any other keyed record <code class="docutils literal"><span class="pre">R2</span> <span class="pre">=</span> <span class="pre">&lt;K1,</span> <span class="pre">V2&gt;</span></code> with the
same key <code class="docutils literal"><span class="pre">K1</span></code> that is processed on that node during that time will overwrite <code class="docutils literal"><span class="pre">&lt;K1,</span> <span class="pre">V1&gt;</span></code>, which is referred to as
&#8220;being compacted&#8221;. This has the same effect as
<a class="reference external" href="https://kafka.apache.org/documentation.html#compaction">Kafka&#8217;s log compaction</a>, but happens earlier, while the
records are still in memory, and within your client-side application, rather than on the server-side (i.e. the Kafka
broker). After flushing, <code class="docutils literal"><span class="pre">R2</span></code> is forwarded to the next processing node and then written to the local state store.</p>
<p>The semantics of caching is that data is flushed to the state store and forwarded to the next downstream processor node
whenever the earliest of <code class="docutils literal"><span class="pre">commit.interval.ms</span></code> or <code class="docutils literal"><span class="pre">cache.max.bytes.buffering</span></code> (cache pressure) hits. Both
<code class="docutils literal"><span class="pre">commit.interval.ms</span></code> and <code class="docutils literal"><span class="pre">cache.max.bytes.buffering</span></code> are global parameters. As such, it is not possible to specify
different parameters for individual nodes.</p>
<p>Here are example settings for both parameters based on desired scenarios.</p>
<ul>
<li><p class="first">To turn off caching the cache size can be set to zero:</p>
<blockquote>
<div><div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Disable record cache</span>
<span class="n">Properties</span> <span class="n">props</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="n">props</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">CACHE_MAX_BYTES_BUFFERING_CONFIG</span><span class="o">,</span> <span class="mi">0</span><span class="o">);</span>
</pre></div>
</div>
<p>Turning off caching might result in high write traffic for the underlying RocksDB store.
With default settings caching is enabled within Kafka Streams but RocksDB caching is disabled.
Thus, to avoid high write traffic it is recommended to enable RocksDB caching if Kafka Streams caching is turned off.</p>
<p>For example, the RocksDB block cache could be set to 100 MB and the write buffer size to 32 MB. For more information, see
the <a class="reference internal" href="config-streams.html#streams-developer-guide-rocksdb-config"><span class="std std-ref">RocksDB config</span></a>.</p>
</div></blockquote>
</li>
<li><p class="first">To enable caching but still have an upper bound on how long records will be cached, you can set the commit interval. In this example, it is set to 1000 milliseconds:</p>
<blockquote>
<div><div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">props</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="c1">// Enable record cache of size 10 MB.</span>
<span class="n">props</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">CACHE_MAX_BYTES_BUFFERING_CONFIG</span><span class="o">,</span> <span class="mi">10</span> <span class="o">*</span> <span class="mi">1024</span> <span class="o">*</span> <span class="mi">1024L</span><span class="o">);</span>
<span class="c1">// Set commit interval to 1 second.</span>
<span class="n">props</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">COMMIT_INTERVAL_MS_CONFIG</span><span class="o">,</span> <span class="mi">1000</span><span class="o">);</span>
</pre></div>
</div>
</div></blockquote>
</li>
</ul>
<p>The effect of these two configurations is described in the figure below. The records are shown using 4 keys: blue, red, yellow, and green. Assume the cache has space for only 3 keys.</p>
<ul>
<li><p class="first">When the cache is disabled (a), all of the input records will be output.</p>
</li>
<li><p class="first">When the cache is enabled (b):</p>
<blockquote>
<div><ul class="simple">
<li>Most records are output at the end of commit intervals (e.g., at <code class="docutils literal"><span class="pre">t1</span></code> a single blue record is output, which is the final over-write of the blue key up to that time).</li>
<li>Some records are output because of cache pressure (i.e. before the end of a commit interval). For example, see the red record before <code class="docutils literal"><span class="pre">t2</span></code>. With smaller cache sizes we expect cache pressure to be the primary factor that dictates when records are output. With large cache sizes, the commit interval will be the primary factor.</li>
<li>The total number of records output has been reduced from 15 to 8.</li>
</ul>
</div></blockquote>
</li>
</ul>
<div class="figure align-center">
<img class="centered" src="/{{version}}/images/streams-cache-and-commit-interval.png">
</div>
</div>
<div class="section" id="record-caches-in-the-processor-api">
<span id="streams-developer-guide-memory-management-state-store-cache"></span><h2><a class="toc-backref" href="#id2">Record caches in the Processor API</a><a class="headerlink" href="#record-caches-in-the-processor-api" title="Permalink to this headline"></a></h2>
<p>You can specify the total memory (RAM) size of the record cache for an instance of the processing topology. It is used
for internal caching and compacting of output records before they are written from a stateful processor node to its
state stores.</p>
<p>The record cache in the Processor API does not cache or compact any output records that are being forwarded downstream.
This means that all downstream processor nodes can see all records, whereas the state stores see a reduced number of records.
This does not impact correctness of the system, but is a performance optimization for the state stores. For example, with the
Processor API you can store a record in a state store while forwarding a different value downstream.</p>
<p>Following from the example first shown in section <a class="reference internal" href="processor-api.html#streams-developer-guide-state-store"><span class="std std-ref">State Stores</span></a>, to disable caching, you can
add the <code class="docutils literal"><span class="pre">withCachingDisabled</span></code> call (note that caches are enabled by default; there is also an explicit <code class="docutils literal"><span class="pre">withCachingEnabled</span></code>
call).</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">StoreBuilder</span> <span class="n">countStoreBuilder</span> <span class="o">=</span>
<span class="n">Stores</span><span class="o">.</span><span class="na">keyValueStoreBuilder</span><span class="o">(</span>
<span class="n">Stores</span><span class="o">.</span><span class="na">persistentKeyValueStore</span><span class="o">(</span><span class="s">&quot;Counts&quot;</span><span class="o">),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">(),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">())</span>
<span class="o">.</span><span class="na">withCachingEnabled</span><span class="o">()</span>
</pre></div>
</div>
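<p>To illustrate the point above, the following hypothetical processor sketch keeps a running count in the &quot;Counts&quot; store defined above while forwarding a different, derived value downstream:</p>
<div class="highlight-java"><div class="highlight"><pre>// Assumes the "Counts" store built above has been added to the topology and
// connected to this processor (pre-2.8 Processor API).
public class CountAndForwardProcessor implements Processor&lt;String, String&gt; {

    private ProcessorContext context;
    private KeyValueStore&lt;String, Long&gt; counts;

    @Override
    @SuppressWarnings("unchecked")
    public void init(final ProcessorContext context) {
        this.context = context;
        this.counts = (KeyValueStore&lt;String, Long&gt;) context.getStateStore("Counts");
    }

    @Override
    public void process(final String key, final String value) {
        // Store the raw running count in the state store ...
        final Long oldCount = counts.get(key);
        final long newCount = (oldCount == null ? 0L : oldCount) + 1;
        counts.put(key, newCount);

        // ... but forward a different, derived value downstream.
        context.forward(key, key + " seen " + newCount + " times");
    }

    @Override
    public void close() {
        // Nothing to clean up; the store's lifecycle is managed by Kafka Streams.
    }
}
</pre></div>
</div>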
</div>
<div class="section" id="rocksdb">
<h2><a class="toc-backref" href="#id3">RocksDB</a><a class="headerlink" href="#rocksdb" title="Permalink to this headline"></a></h2>
<p> Each instance of RocksDB allocates off-heap memory for a block cache, index and filter blocks, and memtable (write buffer). Critical configs (for RocksDB version 4.1.0) include
<code class="docutils literal"><span class="pre">block_cache_size</span></code>, <code class="docutils literal"><span class="pre">write_buffer_size</span></code> and <code class="docutils literal"><span class="pre">max_write_buffer_number</span></code>. These can be specified through the
<code class="docutils literal"><span class="pre">rocksdb.config.setter</span></code> configuration.</li>
<p> As of 2.3.0 the memory usage across all instances can be bounded, limiting the total off-heap memory of your Kafka Streams application. To do so you must configure RocksDB to cache the index and filter blocks in the block cache, limit the memtable memory through a shared <a class="reference external" href="https://github.com/facebook/rocksdb/wiki/Write-Buffer-Manager">WriteBufferManager</a> and count its memory against the block cache, and then pass the same Cache object to each instance. See <a class="reference external" href="https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB">RocksDB Memory Usage</a> for details. An example RocksDBConfigSetter implementing this is shown below:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span> <span class="kd">public</span> <span class="kd">static</span> <span class="kd">class</span> <span class="nc">BoundedMemoryRocksDBConfig</span> <span class="kd">implements</span> <span class="n">RocksDBConfigSetter</span> <span class="o">{</span>
<span class="kd">private</span> <span class="kt">static</span> <span class="n">org.rocksdb.Cache</span> <span class="n">cache</span> <span class="o">=</span> <span class="k">new</span> <span class="n">org</span><span class="o">.</span><span class="na">rocksdb</span><span class="o">.</span><span class="na">LRUCache</span><span class="o">(</span><span class="mi">TOTAL_OFF_HEAP_MEMORY</span><span class="o">,</span> <span class="n">-1</span><span class="o">,</span> <span class="n">false</span><span class="o">,</span> <span class="n">INDEX_FILTER_BLOCK_RATIO</span><span class="o">);</span><sup><a href="#fn1" id="ref1">1</a></sup>
<span class="kd">private</span> <span class="kt">static</span> <span class="n">org.rocksdb.WriteBufferManager</span> <span class="n">writeBufferManager</span> <span class="o">=</span> <span class="k">new</span> <span class="n">org</span><span class="o">.</span><span class="na">rocksdb</span><span class="o">.</span><span class="na">WriteBufferManager</span><span class="o">(</span><span class="mi">TOTAL_MEMTABLE_MEMORY</span><span class="o">,</span> cache<span class="o">);</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">setConfig</span><span class="o">(</span><span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Options</span> <span class="n">options</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Object</span><span class="o">&gt;</span> <span class="n">configs</span><span class="o">)</span> <span class="o">{</span>
<span class="n">BlockBasedTableConfig</span> <span class="n">tableConfig</span> <span class="o">=</span> <span class="k">(BlockBasedTableConfig)</span> <span class="n">options</span><span><span class="o">.</span><span class="na">tableFormatConfig</span><span class="o">();</span>
<span class="c1"> // These three options in combination will limit the memory used by RocksDB to the size passed to the block cache (TOTAL_OFF_HEAP_MEMORY)</span>
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setBlockCache</span><span class="o">(</span><span class="mi">cache</span><span class="o">);</span>
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setCacheIndexAndFilterBlocks</span><span class="o">(</span><span class="kc">true</span><span class="o">);</span>
<span class="n">options</span><span class="o">.</span><span class="na">setWriteBufferManager</span><span class="o">(</span><span class="mi">writeBufferManager</span><span class="o">);</span>
<span class="c1"> // These options are recommended to be set when bounding the total memory</span>
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setCacheIndexAndFilterBlocksWithHighPriority</span><span class="o">(</span><span class="mi">true</span><span class="o">);</span><sup><a href="#fn2" id="ref2">2</a></sup>
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setPinTopLevelIndexAndFilter</span><span class="o">(</span><span class="mi">true</span><span class="o">);</span>
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setBlockSize</span><span class="o">(</span><span class="mi">BLOCK_SIZE</span><span class="o">);</span><sup><a href="#fn3" id="ref3">3</a></sup>
<span class="n">options</span><span class="o">.</span><span class="na">setMaxWriteBufferNumber</span><span class="o">(</span><span class="mi">N_MEMTABLES</span><span class="o">);</span>
<span class="n">options</span><span class="o">.</span><span class="na">setWriteBufferSize</span><span class="o">(</span><span class="mi">MEMTABLE_SIZE</span><span class="o">);</span>
<span class="n">options</span><span class="o">.</span><span class="na">setTableFormatConfig</span><span class="o">(</span><span class="n">tableConfig</span><span class="o">);</span>
<span class="o">}</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">close</span><span class="o">(</span><span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Options</span> <span class="n">options</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// Cache and WriteBufferManager should not be closed here, as the same objects are shared by every store instance.</span>
<span class="o">}</span>
<span class="o">}</span>
</pre></div>
</div>
<sup id="fn1">1. INDEX_FILTER_BLOCK_RATIO can be used to set a fraction of the block cache to set aside for "high priority" (aka index and filter) blocks, preventing them from being evicted by data blocks. See the full signature of the <a class="reference external" href="https://github.com/facebook/rocksdb/blob/master/java/src/main/java/org/rocksdb/LRUCache.java#L72">LRUCache constructor</a>. </sup>
<br>
<sup id="fn2">2. This must be set in order for INDEX_FILTER_BLOCK_RATIO to take effect (see footnote 1) as described in the <a class="reference external" href="https://github.com/facebook/rocksdb/wiki/Block-Cache#caching-index-and-filter-blocks">RocksDB docs</a></sup>
<br>
<sup id="fn3">3. You may want to modify the default <a class="reference external" href="https://github.com/apache/kafka/blob/2.3/streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBStore.java#L79">block size</a> per these instructions from the <a class="reference external" href="https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB#indexes-and-filter-blocks">RocksDB GitHub</a>. A larger block size means index blocks will be smaller, but the cached data blocks may contain more cold data that would otherwise be evicted.
<br>
<dl class="docutils">
<dt>Note:</dt>
<dd>While we recommend setting at least the above configs, the specific options that yield the best performance are workload dependent and you should consider experimenting with these to determine the best choices for your specific use case. Keep in mind that the optimal configs for one app may not apply to one with a different topology or input topic.
In addition to the recommended configs above, you may want to consider using partitioned index filters as described by the <a class="reference external" href="https://github.com/facebook/rocksdb/wiki/Partitioned-Index-Filters">RocksDB docs</a>.</dd>
</dl>
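<p>A custom <code class="docutils literal"><span class="pre">RocksDBConfigSetter</span></code> such as the one above can then be registered through the <code class="docutils literal"><span class="pre">rocksdb.config.setter</span></code> configuration, for example:</p>
<div class="highlight-java"><div class="highlight"><pre>Properties props = new Properties();
// Apply the custom config setter to all RocksDB state stores of this application.
props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, BoundedMemoryRocksDBConfig.class);
</pre></div>
</div>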
</div>
<div class="section" id="other-memory-usage">
<h2><a class="toc-backref" href="#id4">Other memory usage</a><a class="headerlink" href="#other-memory-usage" title="Permalink to this headline"></a></h2>
<p>There are other modules inside Apache Kafka that allocate memory during runtime. They include the following:</p>
<ul class="simple">
<li>Producer buffering, managed by the producer config <code class="docutils literal"><span class="pre">buffer.memory</span></code>.</li>
<li>Consumer buffering, currently not strictly managed, but can be indirectly controlled by fetch size, i.e.,
<code class="docutils literal"><span class="pre">fetch.max.bytes</span></code> and <code class="docutils literal"><span class="pre">fetch.max.wait.ms</span></code>.</li>
<li>Both producer and consumer also have separate TCP send / receive buffers that are not counted as the buffering memory.
These are controlled by the <code class="docutils literal"><span class="pre">send.buffer.bytes</span></code> / <code class="docutils literal"><span class="pre">receive.buffer.bytes</span></code> configs.</li>
<li>Deserialized objects buffering: after <code class="docutils literal"><span class="pre">consumer.poll()</span></code> returns records, they will be deserialized to extract
the timestamp and buffered within the Streams application. Currently this is only indirectly controlled by
<code class="docutils literal"><span class="pre">buffered.records.per.partition</span></code>.</li>
</ul>
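<p>These client-level settings can be passed through the Streams configuration using the corresponding producer and consumer prefixes; the concrete sizes below are illustrative assumptions only:</p>
<div class="highlight-java"><div class="highlight"><pre>Properties props = new Properties();
// Producer record buffering, in bytes.
props.put(StreamsConfig.producerPrefix(ProducerConfig.BUFFER_MEMORY_CONFIG), 32 * 1024 * 1024L);
// Consumer fetch size, which indirectly bounds consumer-side buffering.
props.put(StreamsConfig.consumerPrefix(ConsumerConfig.FETCH_MAX_BYTES_CONFIG), 50 * 1024 * 1024);
// Maximum number of deserialized records buffered per partition inside Kafka Streams.
props.put(StreamsConfig.BUFFERED_RECORDS_PER_PARTITION_CONFIG, 1000);
</pre></div>
</div>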
<div class="admonition tip">
<p><b>Tip</b></p>
<p><strong>Iterators should be closed explicitly to release resources:</strong> Store iterators (e.g., <code class="docutils literal"><span class="pre">KeyValueIterator</span></code> and <code class="docutils literal"><span class="pre">WindowStoreIterator</span></code>) must be closed explicitly once you are done with them, in order to release resources such as open file handles and in-memory read buffers. Alternatively, use a try-with-resources statement (available since JDK 7), as these iterator classes implement <code class="docutils literal"><span class="pre">Closeable</span></code>.</p>
<p class="last">Otherwise, the stream application&#8217;s memory usage keeps growing at runtime until it eventually fails with an OutOfMemoryError.</p>
</div>
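<p>For example, a minimal sketch that closes a store iterator via try-with-resources (assuming a <code class="docutils literal"><span class="pre">ReadOnlyKeyValueStore</span></code> obtained via <code class="docutils literal"><span class="pre">KafkaStreams#store()</span></code>) could look as follows:</p>
<div class="highlight-java"><div class="highlight"><pre>ReadOnlyKeyValueStore&lt;String, Long&gt; store = ...; // e.g. obtained via KafkaStreams#store()

// The iterator is closed automatically when the try block exits, releasing
// open file handles and in-memory read buffers held by the underlying store.
try (KeyValueIterator&lt;String, Long&gt; iterator = store.all()) {
    while (iterator.hasNext()) {
        KeyValue&lt;String, Long&gt; entry = iterator.next();
        System.out.println(entry.key + " -&gt; " + entry.value);
    }
}
</pre></div>
</div>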
</div>
</div>
</div>
<div class="pagination">
<a href="/{{version}}/documentation/streams/developer-guide/interactive-queries" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="/{{version}}/documentation/streams/developer-guide/running-app" class="pagination__btn pagination__btn__next">Next</a>
</div>
</script>
<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
//sticky secondary nav
var $navbar = $(".sub-nav-sticky"),
y_pos = $navbar.offset().top,
height = $navbar.height();
$(window).scroll(function() {
var scrollTop = $(window).scrollTop();
if (scrollTop > y_pos - height) {
$navbar.addClass("navbar-fixed")
} else if (scrollTop <= y_pos) {
$navbar.removeClass("navbar-fixed")
}
});
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>

View File

@@ -0,0 +1,474 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script><!--#include virtual="../../js/templateData.js" --></script>
<script id="content-template" type="text/x-handlebars-template">
<!-- h1>Developer Guide for Kafka Streams</h1 -->
<div class="sub-nav-sticky">
<div class="sticky-top">
<!-- div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
</div -->
</div>
</div>
<div class="section" id="processor-api">
<span id="streams-developer-guide-processor-api"></span><h1>Processor API<a class="headerlink" href="#processor-api" title="Permalink to this headline"></a></h1>
<p>The Processor API allows developers to define and connect custom processors and to interact with state stores. With the
Processor API, you can define arbitrary stream processors that process one received record at a time, and connect these
processors with their associated state stores to compose the processor topology that represents a customized processing
logic.</p>
<div class="contents local topic" id="table-of-contents">
<p class="topic-title first"><b>Table of Contents</b></p>
<ul class="simple">
<li><a class="reference internal" href="#overview" id="id1">Overview</a></li>
<li><a class="reference internal" href="#defining-a-stream-processor" id="id2">Defining a Stream
Processor</a></li>
<li><a class="reference internal" href="#unit-testing-processors" id="id9">Unit Testing Processors</a></li>
<li><a class="reference internal" href="#state-stores" id="id3">State Stores</a>
<ul>
<li><a class="reference internal" href="#defining-and-creating-a-state-store" id="id4">Defining and creating a State Store</a></li>
<li><a class="reference internal" href="#fault-tolerant-state-stores" id="id5">Fault-tolerant State Stores</a></li>
<li><a class="reference internal" href="#enable-or-disable-fault-tolerance-of-state-stores-store-changelogs" id="id6">Enable or Disable Fault Tolerance of State Stores (Store Changelogs)</a></li>
<li><a class="reference internal" href="#implementing-custom-state-stores" id="id7">Implementing Custom State Stores</a></li>
</ul>
</li>
<li><a class="reference internal" href="#connecting-processors-and-state-stores" id="id8">Connecting Processors and State Stores</a></li>
<li><a class="reference internal" href="#accessing-processor-context" id="id10">Accessing Processor Context</a></li>
</ul>
</div>
<div class="section" id="overview">
<h2><a class="toc-backref" href="#id1">Overview</a><a class="headerlink" href="#overview" title="Permalink to this headline"></a></h2>
<p>The Processor API can be used to implement both <strong>stateless</strong> as well as <strong>stateful</strong> operations, where the latter is
achieved through the use of <a class="reference internal" href="#streams-developer-guide-state-store"><span class="std std-ref">state stores</span></a>.</p>
<div class="admonition tip">
<p><b>Tip</b></p>
<p class="last"><strong>Combining the DSL and the Processor API:</strong>
You can combine the convenience of the DSL with the power and flexibility of the Processor API as described in the
section <a class="reference internal" href="dsl-api.html#streams-developer-guide-dsl-process"><span class="std std-ref">Applying processors and transformers (Processor API integration)</span></a>.</p>
</div>
<p>For a complete list of available API functionality, see the <a href="/{{version}}/javadoc/org/apache/kafka/streams/package-summary.html">Streams</a> API docs.</p>
</div>
<div class="section" id="defining-a-stream-processor">
<span id="streams-developer-guide-stream-processor"></span><h2><a class="toc-backref" href="#id2">Defining a Stream Processor</a><a class="headerlink" href="#defining-a-stream-processor" title="Permalink to this headline"></a></h2>
<p>A <a class="reference internal" href="../core-concepts.html#streams_processor_node"><span class="std std-ref">stream processor</span></a> is a node in the processor topology that represents a single processing step.
With the Processor API, you can define arbitrary stream processors that process one received record at a time, and connect
these processors with their associated state stores to compose the processor topology.</p>
<p>You can define a customized stream processor by implementing the <code class="docutils literal"><span class="pre">Processor</span></code> interface, which provides the <code class="docutils literal"><span class="pre">process()</span></code> API method.
The <code class="docutils literal"><span class="pre">process()</span></code> method is called on each of the received records.</p>
<p>The <code class="docutils literal"><span class="pre">Processor</span></code> interface also has an <code class="docutils literal"><span class="pre">init()</span></code> method, which is called by the Kafka Streams library during task construction
phase. Processor instances should perform any required initialization in this method. The <code class="docutils literal"><span class="pre">init()</span></code> method is passed a <code class="docutils literal"><span class="pre">ProcessorContext</span></code>
instance, which provides access to the metadata of the currently processed record, including its source Kafka topic and partition,
its corresponding message offset, and further such information. You can also use this context instance to schedule a punctuation
function (via <code class="docutils literal"><span class="pre">ProcessorContext#schedule()</span></code>), to forward a new record as a key-value pair to the downstream processors (via <code class="docutils literal"><span class="pre">ProcessorContext#forward()</span></code>),
and to commit the current processing progress (via <code class="docutils literal"><span class="pre">ProcessorContext#commit()</span></code>).
Any resources you set up in <code class="docutils literal"><span class="pre">init()</span></code> can be cleaned up in the
<code class="docutils literal"><span class="pre">close()</span></code> method. Note that Kafka Streams may re-use a single
<code class="docutils literal"><span class="pre">Processor</span></code> object by calling
<code class="docutils literal"><span class="pre">init()</span></code> on it again after <code class="docutils literal"><span class="pre">close()</span></code>.</p>
<p>When records are forwarded to downstream processors, they also get a timestamp assigned. There are two different default behaviors:
(1) If <code class="docutils literal"><span class="pre">#forward()</span></code> is called within <code class="docutils literal"><span class="pre">#process()</span></code>, the output record inherits the input record timestamp.
(2) If <code class="docutils literal"><span class="pre">#forward()</span></code> is called within <code class="docutils literal"><span class="pre">punctuate()</span></code>, the output record inherits the current punctuation timestamp (either the current &#39;stream time&#39; or the system wall-clock time).
Note that <code class="docutils literal"><span class="pre">#forward()</span></code> also allows you to change the default behavior by passing a custom timestamp for the output record.</p>
<p>Specifically, <code class="docutils literal"><span class="pre">ProcessorContext#schedule()</span></code> accepts a user <code class="docutils literal"><span class="pre">Punctuator</span></code> callback interface, which triggers its <code class="docutils literal"><span class="pre">punctuate()</span></code>
API method periodically based on the <code class="docutils literal"><span class="pre">PunctuationType</span></code>. The <code class="docutils literal"><span class="pre">PunctuationType</span></code> determines what notion of time is used
for the punctuation scheduling: either <a class="reference internal" href="../core-concepts.html#streams_time"><span class="std std-ref">stream-time</span></a> or wall-clock-time (by default, stream-time
is configured to represent event-time via <code class="docutils literal"><span class="pre">TimestampExtractor</span></code>). When stream-time is used, <code class="docutils literal"><span class="pre">punctuate()</span></code> is triggered purely
by data because stream-time is determined (and advanced forward) by the timestamps derived from the input data. When there
is no new input data arriving, stream-time is not advanced and thus <code class="docutils literal"><span class="pre">punctuate()</span></code> is not called.</p>
<p>For example, if you schedule a <code class="docutils literal"><span class="pre">Punctuator</span></code> function every 10 seconds based on <code class="docutils literal"><span class="pre">PunctuationType.STREAM_TIME</span></code> and if you
process a stream of 60 records with consecutive timestamps from 1 (first record) to 60 seconds (last record),
then <code class="docutils literal"><span class="pre">punctuate()</span></code> would be called 6 times. This happens regardless of the time required to actually process those records. <code class="docutils literal"><span class="pre">punctuate()</span></code>
would be called 6 times regardless of whether processing these 60 records takes a second, a minute, or an hour.</p>
<p>When wall-clock-time (i.e. <code class="docutils literal"><span class="pre">PunctuationType.WALL_CLOCK_TIME</span></code>) is used, <code class="docutils literal"><span class="pre">punctuate()</span></code> is triggered purely by the wall-clock time.
Reusing the example above, if the <code class="docutils literal"><span class="pre">Punctuator</span></code> function is scheduled based on <code class="docutils literal"><span class="pre">PunctuationType.WALL_CLOCK_TIME</span></code>, and if these
60 records were processed within 20 seconds, <code class="docutils literal"><span class="pre">punctuate()</span></code> is called 2 times (one time every 10 seconds). If these 60 records
were processed within 5 seconds, then no <code class="docutils literal"><span class="pre">punctuate()</span></code> is called at all. Note that you can schedule multiple <code class="docutils literal"><span class="pre">Punctuator</span></code>
callbacks with different <code class="docutils literal"><span class="pre">PunctuationType</span></code> types within the same processor by calling <code class="docutils literal"><span class="pre">ProcessorContext#schedule()</span></code> multiple
    times inside the <code class="docutils literal"><span class="pre">init()</span></code> method.</p>
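    <p>As a sketch, a processor that wants both a data-driven and a wall-clock-driven callback could register two punctuators in <code class="docutils literal"><span class="pre">init()</span></code> (the intervals are illustrative, and <code class="docutils literal"><span class="pre">flushAggregates()</span></code> / <code class="docutils literal"><span class="pre">emitHeartbeat()</span></code> are hypothetical helpers):</p>
    <div class="highlight-java"><div class="highlight"><pre>@Override
public void init(final ProcessorContext context) {
    this.context = context;

    // data-driven: fires as stream-time advances past each 10-second boundary
    context.schedule(Duration.ofSeconds(10), PunctuationType.STREAM_TIME,
        timestamp -&gt; flushAggregates(timestamp));

    // wall-clock driven: fires every 30 seconds of system time, even without new input
    context.schedule(Duration.ofSeconds(30), PunctuationType.WALL_CLOCK_TIME,
        timestamp -&gt; emitHeartbeat(timestamp));
}
</pre></div>
    </div>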
<div class="admonition attention">
<p class="first admonition-title"><b>Attention</b></p>
<p class="last">Stream-time is only advanced if all input partitions over all input topics have new data (with newer timestamps) available.
If at least one partition does not have any new data available, stream-time will not be advanced and thus <code class="docutils literal"><span class="pre">punctuate()</span></code> will not be triggered if <code class="docutils literal"><span class="pre">PunctuationType.STREAM_TIME</span></code> was specified.
This behavior is independent of the configured timestamp extractor, i.e., using <code class="docutils literal"><span class="pre">WallclockTimestampExtractor</span></code> does not enable wall-clock triggering of <code class="docutils literal"><span class="pre">punctuate()</span></code>.</p>
</div>
<p><b>Example</b></p>
<p>The following example <code class="docutils literal"><span class="pre">Processor</span></code> defines a simple word-count algorithm and the following actions are performed:</p>
<ul class="simple">
<li>In the <code class="docutils literal"><span class="pre">init()</span></code> method, schedule the punctuation every 1000 time units (the time unit is normally milliseconds, which in this example would translate to punctuation every 1 second) and retrieve the local state store by its name &#8220;Counts&#8221;.</li>
<li>In the <code class="docutils literal"><span class="pre">process()</span></code> method, upon each received record, split the value string into words, and update their counts into the state store (we will talk about this later in this section).</li>
<li>In the <code class="docutils literal"><span class="pre">Punctuator</span></code> callback scheduled from <code class="docutils literal"><span class="pre">init()</span></code>, iterate the local state store and send the aggregated counts to the downstream processor (we will talk about downstream processors later in this section), and commit the current stream state.</li>
</ul>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kd">public</span> <span class="kd">class</span> <span class="nc">WordCountProcessor</span> <span class="kd">implements</span> <span class="n">Processor</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="o">{</span>
<span class="kd">private</span> <span class="n">ProcessorContext</span> <span class="n">context</span><span class="o">;</span>
<span class="kd">private</span> <span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">kvStore</span><span class="o">;</span>
<span class="nd">@Override</span>
<span class="nd">@SuppressWarnings</span><span class="o">(</span><span class="s">&quot;unchecked&quot;</span><span class="o">)</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">init</span><span class="o">(</span><span class="n">ProcessorContext</span> <span class="n">context</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// keep the processor context locally because we need it in punctuate() and commit()</span>
<span class="k">this</span><span class="o">.</span><span class="na">context</span> <span class="o">=</span> <span class="n">context</span><span class="o">;</span>
<span class="c1">// retrieve the key-value store named &quot;Counts&quot;</span>
<span class="n">kvStore</span> <span class="o">=</span> <span class="o">(</span><span class="n">KeyValueStore</span><span class="o">)</span> <span class="n">context</span><span class="o">.</span><span class="na">getStateStore</span><span class="o">(</span><span class="s">&quot;Counts&quot;</span><span class="o">);</span>
<span class="c1">// schedule a punctuate() method every second based on stream-time</span>
<span class="k">this</span><span class="o">.</span><span class="na">context</span><span class="o">.</span><span class="na">schedule</span><span class="o">(</span><span class="na">Duration</span><span class="o">.</span><span class="na">ofSeconds</span><span class="o">(</span><span class="mi">1000</span><span class="o">),</span> <span class="n">PunctuationType</span><span class="o">.</span><span class="na">STREAM_TIME</span><span class="o">,</span> <span class="o">(</span><span class="n">timestamp</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="o">{</span>
<span class="n">KeyValueIterator</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">iter</span> <span class="o">=</span> <span class="k">this</span><span class="o">.</span><span class="na">kvStore</span><span class="o">.</span><span class="na">all</span><span class="o">();</span>
<span class="k">while</span> <span class="o">(</span><span class="n">iter</span><span class="o">.</span><span class="na">hasNext</span><span class="o">())</span> <span class="o">{</span>
<span class="n">KeyValue</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">entry</span> <span class="o">=</span> <span class="n">iter</span><span class="o">.</span><span class="na">next</span><span class="o">();</span>
<span class="n">context</span><span class="o">.</span><span class="na">forward</span><span class="o">(</span><span class="n">entry</span><span class="o">.</span><span class="na">key</span><span class="o">,</span> <span class="n">entry</span><span class="o">.</span><span class="na">value</span><span class="o">.</span><span class="na">toString</span><span class="o">());</span>
<span class="o">}</span>
<span class="n">iter</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
<span class="c1">// commit the current processing progress</span>
<span class="n">context</span><span class="o">.</span><span class="na">commit</span><span class="o">();</span>
<span class="o">});</span>
<span class="o">}</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">punctuate</span><span class="o">(</span><span class="kt">long</span> <span class="n">timestamp</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// this method is deprecated and should not be used anymore</span>
<span class="o">}</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">close</span><span class="o">()</span> <span class="o">{</span>
<span class="c1">// close any resources managed by this processor</span>
<span class="c1">// Note: Do not close any StateStores as these are managed by the library</span>
<span class="o">}</span>
<span class="o">}</span>
</pre></div>
</div>
<div class="admonition note">
<p><b>Note</b></p>
<p class="last"><strong>Stateful processing with state stores:</strong>
The <code class="docutils literal"><span class="pre">WordCountProcessor</span></code> defined above can access the currently received record in its <code class="docutils literal"><span class="pre">process()</span></code> method, and it can
leverage <a class="reference internal" href="#streams-developer-guide-state-store"><span class="std std-ref">state stores</span></a> to maintain processing states to, for example, remember recently
arrived records for stateful processing needs like aggregations and joins. For more information, see the <a class="reference internal" href="#streams-developer-guide-state-store"><span class="std std-ref">state stores</span></a> documentation.</p>
</div>
</div>
<div class="section" id="unit-testing-processors">
<h2>
<a class="toc-backref" href="#id9">Unit Testing Processors</a>
<a class="headerlink" href="#unit-testing-processors" title="Permalink to this headline"></a>
</h2>
      <p>
        Kafka Streams comes with a <code>test-utils</code> module to help you write unit tests for your
        processors. For more information, see the <a href="testing.html#unit-testing-processors">testing documentation</a>.
      </p>
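      <p>
        For instance, a minimal sketch of such a unit test using the <code>MockProcessorContext</code> from
        <code>test-utils</code> might look as follows (<code>MyProcessor</code> is a hypothetical stateless processor under test):
      </p>
      <div class="highlight-java"><div class="highlight"><pre>import org.apache.kafka.streams.processor.MockProcessorContext;

// drive the processor directly, without a running Kafka cluster
MockProcessorContext context = new MockProcessorContext();
MyProcessor processor = new MyProcessor();   // hypothetical stateless processor under test
processor.init(context);
processor.process("some-key", "some value");

// inspect what the processor forwarded downstream and assert on it with your test framework
context.forwarded().forEach(forwarded -&gt; System.out.println(forwarded.keyValue()));
</pre></div>
      </div>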
</div>
<div class="section" id="state-stores">
<span id="streams-developer-guide-state-store"></span><h2><a class="toc-backref" href="#id3">State Stores</a><a class="headerlink" href="#state-stores" title="Permalink to this headline"></a></h2>
<p>To implement a <strong>stateful</strong> <code class="docutils literal"><span class="pre">Processor</span></code> or <code class="docutils literal"><span class="pre">Transformer</span></code>, you must provide one or more state stores to the processor
or transformer (<em>stateless</em> processors or transformers do not need state stores). State stores can be used to remember
recently received input records, to track rolling aggregates, to de-duplicate input records, and more.
Another feature of state stores is that they can be
<a class="reference internal" href="interactive-queries.html#streams-developer-guide-interactive-queries"><span class="std std-ref">interactively queried</span></a> from other applications, such as a
NodeJS-based dashboard or a microservice implemented in Scala or Go.</p>
<p>The
<a class="reference internal" href="#streams-developer-guide-state-store-defining"><span class="std std-ref">available state store types</span></a> in Kafka Streams have
<a class="reference internal" href="#streams-developer-guide-state-store-fault-tolerance"><span class="std std-ref">fault tolerance</span></a> enabled by default.</p>
<div class="section" id="defining-and-creating-a-state-store">
<span id="streams-developer-guide-state-store-defining"></span><h3><a class="toc-backref" href="#id4">Defining and creating a State Store</a><a class="headerlink" href="#defining-and-creating-a-state-store" title="Permalink to this headline"></a></h3>
<p>You can either use one of the available store types or
<a class="reference internal" href="#streams-developer-guide-state-store-custom"><span class="std std-ref">implement your own custom store type</span></a>.
It&#8217;s common practice to leverage an existing store type via the <code class="docutils literal"><span class="pre">Stores</span></code> factory.</p>
<p>Note that, when using Kafka Streams, you normally don&#8217;t create or instantiate state stores directly in your code.
Rather, you define state stores indirectly by creating a so-called <code class="docutils literal"><span class="pre">StoreBuilder</span></code>. This builder is used by
Kafka Streams as a factory to instantiate the actual state stores locally in application instances when and where
needed.</p>
<p>The following store types are available out of the box.</p>
<table border="1" class="non-scrolling-table width-100-percent docutils">
<colgroup>
<col width="19%" />
<col width="11%" />
<col width="18%" />
<col width="51%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Store Type</th>
<th class="head">Storage Engine</th>
<th class="head">Fault-tolerant?</th>
<th class="head">Description</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td>Persistent
<code class="docutils literal"><span class="pre">KeyValueStore&lt;K,</span> <span class="pre">V&gt;</span></code></td>
<td>RocksDB</td>
<td>Yes (enabled by default)</td>
<td><ul class="first simple">
<li><strong>The recommended store type for most use cases.</strong></li>
<li>Stores its data on local disk.</li>
<li>Storage capacity:
managed local state can be larger than the memory (heap space) of an
application instance, but must fit into the available local disk
space.</li>
<li>RocksDB settings can be fine-tuned, see
<a class="reference internal" href="config-streams.html#streams-developer-guide-rocksdb-config"><span class="std std-ref">RocksDB configuration</span></a>.</li>
<li>Available <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/state/Stores.html#persistentKeyValueStore-java.lang.String-">store variants</a>:
time window key-value store, session window key-value store.</li>
</ul>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Creating a persistent key-value store:</span>
<span class="c1">// here, we create a `KeyValueStore&lt;String, Long&gt;` named &quot;persistent-counts&quot;.</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.StoreBuilder</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.Stores</span><span class="o">;</span>
<span class="c1">// Using a `KeyValueStoreBuilder` to build a `KeyValueStore`.</span>
<span class="n">StoreBuilder</span><span class="o">&lt;</span><span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;&gt;</span> <span class="n">countStoreSupplier</span> <span class="o">=</span>
<span class="n">Stores</span><span class="o">.</span><span class="na">keyValueStoreBuilder</span><span class="o">(</span>
<span class="n">Stores</span><span class="o">.</span><span class="na">persistentKeyValueStore</span><span class="o">(</span><span class="s">&quot;persistent-counts&quot;</span><span class="o">),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">(),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">());</span>
<span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">countStore</span> <span class="o">=</span> <span class="n">countStoreSupplier</span><span class="o">.</span><span class="na">build</span><span class="o">();</span>
</pre></div>
</div>
</td>
</tr>
<tr class="row-odd"><td>In-memory
<code class="docutils literal"><span class="pre">KeyValueStore&lt;K,</span> <span class="pre">V&gt;</span></code></td>
<td>-</td>
<td>Yes (enabled by default)</td>
<td><ul class="first simple">
<li>Stores its data in memory.</li>
<li>Storage capacity:
managed local state must fit into memory (heap space) of an
application instance.</li>
<li>Useful when application instances run in an environment where local
disk space is either not available or local disk space is wiped
in-between app instance restarts.</li>
<li>Available <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/state/Stores.html#inMemoryKeyValueStore-java.lang.String-">store variants</a>:
time window key-value store, session window key-value store.</li>
</ul>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Creating an in-memory key-value store:</span>
<span class="c1">// here, we create a `KeyValueStore&lt;String, Long&gt;` named &quot;inmemory-counts&quot;.</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.StoreBuilder</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.Stores</span><span class="o">;</span>
<span class="c1">// Using a `KeyValueStoreBuilder` to build a `KeyValueStore`.</span>
<span class="n">StoreBuilder</span><span class="o">&lt;</span><span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;&gt;</span> <span class="n">countStoreSupplier</span> <span class="o">=</span>
<span class="n">Stores</span><span class="o">.</span><span class="na">keyValueStoreBuilder</span><span class="o">(</span>
<span class="n">Stores</span><span class="o">.</span><span class="na">inMemoryKeyValueStore</span><span class="o">(</span><span class="s">&quot;inmemory-counts&quot;</span><span class="o">),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">(),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">());</span>
<span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">countStore</span> <span class="o">=</span> <span class="n">countStoreSupplier</span><span class="o">.</span><span class="na">build</span><span class="o">();</span>
</pre></div>
</div>
</td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="fault-tolerant-state-stores">
<span id="streams-developer-guide-state-store-fault-tolerance"></span><h3><a class="toc-backref" href="#id5">Fault-tolerant State Stores</a><a class="headerlink" href="#fault-tolerant-state-stores" title="Permalink to this headline"></a></h3>
<p>To make state stores fault-tolerant and to allow for state store migration without data loss, a state store can be
      continuously backed up to a Kafka topic behind the scenes, for example to migrate a stateful stream task from one
      machine to another when <a class="reference internal" href="running-app.html#streams-developer-guide-execution-scaling"><span class="std std-ref">elastically adding or removing capacity from your application</span></a>.
This topic is sometimes referred to as the state store&#8217;s associated <em>changelog topic</em>, or its <em>changelog</em>. For example, if
you experience machine failure, the state store and the application&#8217;s state can be fully restored from its changelog. You can
<a class="reference internal" href="#streams-developer-guide-state-store-enable-disable-fault-tolerance"><span class="std std-ref">enable or disable this backup feature</span></a> for a
state store.</p>
<p>Fault-tolerant state stores are backed by a
<a class="reference external" href="https://kafka.apache.org/documentation.html#compaction">compacted</a> changelog topic. The purpose of compacting this
topic is to prevent the topic from growing indefinitely, to reduce the storage consumed in the associated Kafka cluster,
and to minimize recovery time if a state store needs to be restored from its changelog topic.</p>
<p>Fault-tolerant windowed state stores are backed by a topic that uses both compaction and
deletion. Because of the structure of the message keys that are being sent to the changelog topics, this combination of
deletion and compaction is required for the changelog topics of window stores. For window stores, the message keys are
composite keys that include the &#8220;normal&#8221; key and window timestamps. For these types of composite keys it would not
be sufficient to only enable compaction to prevent a changelog topic from growing out of bounds. With deletion
enabled, old windows that have expired will be cleaned up by Kafka&#8217;s log cleaner as the log segments expire. The
default retention setting is <code class="docutils literal"><span class="pre">Windows#maintainMs()</span></code> + 1 day. You can override this setting by specifying
<code class="docutils literal"><span class="pre">StreamsConfig.WINDOW_STORE_CHANGE_LOG_ADDITIONAL_RETENTION_MS_CONFIG</span></code> in the <code class="docutils literal"><span class="pre">StreamsConfig</span></code>.</p>
<p>When you open an <code class="docutils literal"><span class="pre">Iterator</span></code> from a state store you must call <code class="docutils literal"><span class="pre">close()</span></code> on the iterator when you are done working with
it to reclaim resources; or you can use the iterator from within a try-with-resources statement. If you do not close an iterator,
you may encounter an OOM error.</p>
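      <p>For example, a minimal sketch using try-with-resources (assuming the <code class="docutils literal"><span class="pre">kvStore</span></code> shown earlier in this section):</p>
      <div class="highlight-java"><div class="highlight"><pre>// KeyValueIterator implements Closeable, so try-with-resources closes it even if an exception is thrown
try (KeyValueIterator&lt;String, Long&gt; iter = kvStore.all()) {
    while (iter.hasNext()) {
        KeyValue&lt;String, Long&gt; entry = iter.next();
        // use entry.key and entry.value here
    }
}
</pre></div>
      </div>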
</div>
<div class="section" id="enable-or-disable-fault-tolerance-of-state-stores-store-changelogs">
<span id="streams-developer-guide-state-store-enable-disable-fault-tolerance"></span><h3><a class="toc-backref" href="#id6">Enable or Disable Fault Tolerance of State Stores (Store Changelogs)</a><a class="headerlink" href="#enable-or-disable-fault-tolerance-of-state-stores-store-changelogs" title="Permalink to this headline"></a></h3>
<p>You can enable or disable fault tolerance for a state store by enabling or disabling the change logging
of the store through <code class="docutils literal"><span class="pre">enableLogging()</span></code> and <code class="docutils literal"><span class="pre">disableLogging()</span></code>.
You can also fine-tune the associated topic&#8217;s configuration if needed.</p>
<p>Example for disabling fault-tolerance:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.StoreBuilder</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.Stores</span><span class="o">;</span>
<span class="n">StoreBuilder</span><span class="o">&lt;</span><span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;&gt;</span> <span class="n">countStoreSupplier</span> <span class="o">=</span> <span class="n">Stores</span><span class="o">.</span><span class="na">keyValueStoreBuilder</span><span class="o">(</span>
<span class="n">Stores</span><span class="o">.</span><span class="na">persistentKeyValueStore</span><span class="o">(</span><span class="s">&quot;Counts&quot;</span><span class="o">),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">(),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">())</span>
<span class="o">.</span><span class="na">withLoggingDisabled</span><span class="o">();</span> <span class="c1">// disable backing up the store to a changelog topic</span>
</pre></div>
</div>
<div class="admonition attention">
<p class="first admonition-title">Attention</p>
<p class="last">If the changelog is disabled then the attached state store is no longer fault tolerant and it can&#8217;t have any <a class="reference internal" href="config-streams.html#streams-developer-guide-standby-replicas"><span class="std std-ref">standby replicas</span></a>.</p>
</div>
<p>Here is an example for enabling fault tolerance, with additional changelog-topic configuration:
You can add any log config from <a class="reference external" href="https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LogConfig.scala">kafka.log.LogConfig</a>.
Unrecognized configs will be ignored.</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.StoreBuilder</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.Stores</span><span class="o">;</span>
<span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">changelogConfig</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashMap</span><span class="o">();</span>
<span class="c1">// override min.insync.replicas</span>
<span class="n">changelogConfig</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">TopicConfig</span><span class="o">.</span><span class="na">MIN_IN_SYNC_REPLICAS_CONFIG</span><span class="o">,</span> <span class="s">&quot;1&quot;</span><span class="o">)</span>
<span class="n">StoreBuilder</span><span class="o">&lt;</span><span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;&gt;</span> <span class="n">countStoreSupplier</span> <span class="o">=</span> <span class="n">Stores</span><span class="o">.</span><span class="na">keyValueStoreBuilder</span><span class="o">(</span>
<span class="n">Stores</span><span class="o">.</span><span class="na">persistentKeyValueStore</span><span class="o">(</span><span class="s">&quot;Counts&quot;</span><span class="o">),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">(),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">())</span>
<span class="o">.</span><span class="na">withLoggingEnabled</span><span class="o">(</span><span class="n">changlogConfig</span><span class="o">);</span> <span class="c1">// enable changelogging, with custom changelog settings</span>
</pre></div>
</div>
</div>
<div class="section" id="implementing-custom-state-stores">
<span id="streams-developer-guide-state-store-custom"></span><h3><a class="toc-backref" href="#id7">Implementing Custom State Stores</a><a class="headerlink" href="#implementing-custom-state-stores" title="Permalink to this headline"></a></h3>
<p>You can use the <a class="reference internal" href="#streams-developer-guide-state-store-defining"><span class="std std-ref">built-in state store types</span></a> or implement your own.
The primary interface to implement for the store is
<code class="docutils literal"><span class="pre">org.apache.kafka.streams.processor.StateStore</span></code>. Kafka Streams also has a few extended interfaces such
as <code class="docutils literal"><span class="pre">KeyValueStore</span></code>.</p>
<p>Note that your customized <code class="docutils literal"><span class="pre">org.apache.kafka.streams.processor.StateStore</span></code> implementation also needs to provide the logic on how to restore the state
via the <code class="docutils literal"><span class="pre">org.apache.kafka.streams.processor.StateRestoreCallback</span></code> or <code class="docutils literal"><span class="pre">org.apache.kafka.streams.processor.BatchingStateRestoreCallback</span></code> interface.
Details on how to instantiate these interfaces can be found in the <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/processor/StateStore.html">javadocs</a>.</p>
<p>You also need to provide a &#8220;builder&#8221; for the store by implementing the
<code class="docutils literal"><span class="pre">org.apache.kafka.streams.state.StoreBuilder</span></code> interface, which Kafka Streams uses to create instances of
your store.</p>
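    <p>As a rough sketch under these interfaces (the class name is illustrative; the overridden methods are those of the <code class="docutils literal"><span class="pre">StateStore</span></code> interface), a custom store could start like this; a matching <code class="docutils literal"><span class="pre">StoreBuilder</span></code> would then return new instances of it from <code class="docutils literal"><span class="pre">build()</span></code>:</p>
    <div class="highlight-java"><div class="highlight"><pre>public class MyCustomStore implements StateStore {
    private final String name;
    private boolean open = false;

    public MyCustomStore(final String name) {
        this.name = name;
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public void init(final ProcessorContext context, final StateStore root) {
        // open the underlying storage, then register a restore callback for the changelog
        context.register(root, (key, value) -&gt; {
            // restore a single key-value pair (both are byte arrays) into the store
        });
        open = true;
    }

    @Override
    public void flush() {
        // flush any buffered writes to the underlying storage
    }

    @Override
    public void close() {
        open = false;
    }

    @Override
    public boolean persistent() {
        return false;  // return true if the store survives restarts on local disk
    }

    @Override
    public boolean isOpen() {
        return open;
    }
}
</pre></div>
    </div>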
</div>
</div>
<div class="section" id="accessing-processor-context">
<h2><a class="toc-backref" href="#id10">Accessing Processor Context</a><a class="headerlink" href="#accessing-processor-context" title="Permalink to this headline"></a></h2>
      <p>As we have mentioned in the <a href=#defining-a-stream-processor>Defining a Stream Processor</a> section, a <code>ProcessorContext</code> controls the processing workflow, for example by scheduling a punctuation function and committing the current processing progress.</p>
      <p>This object can also be used to access metadata related to the application, such as
          <code class="docutils literal"><span class="pre">applicationId</span></code>, <code class="docutils literal"><span class="pre">taskId</span></code>,
          and <code class="docutils literal"><span class="pre">stateDir</span></code>, as well as record-related metadata such as <code class="docutils literal"><span class="pre">topic</span></code>,
          <code class="docutils literal"><span class="pre">partition</span></code>, <code class="docutils literal"><span class="pre">offset</span></code>, <code class="docutils literal"><span class="pre">timestamp</span></code> and
          <code class="docutils literal"><span class="pre">headers</span></code>.</p>
<p>Here is an example implementation of how to add a new header to the record:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">public void process(String key, String value) {</span>
<span class="c1">// add a header to the elements</span>
<span class="n">context()</span><span class="o">.</span><span class="na">headers</span><span class="o">()</span><span class="o">.</span><span class="na">add</span><span class="o">.</span><span class="o">(</span><span class="s">&quot;key&quot;</span><span class="o">,</span> <span class="s">&quot;key&quot;</span>
<span class="o">}</span>
</pre></div>
</div>
<div class="section" id="connecting-processors-and-state-stores">
<h2><a class="toc-backref" href="#id8">Connecting Processors and State Stores</a><a class="headerlink" href="#connecting-processors-and-state-stores" title="Permalink to this headline"></a></h2>
<p>Now that a <a class="reference internal" href="#streams-developer-guide-stream-processor"><span class="std std-ref">processor</span></a> (WordCountProcessor) and the
state stores have been defined, you can construct the processor topology by connecting these processors and state stores together by
using the <code class="docutils literal"><span class="pre">Topology</span></code> instance. In addition, you can add source processors with the specified Kafka topics
to generate input data streams into the topology, and sink processors with the specified Kafka topics to generate
output data streams out of the topology.</p>
<p>Here is an example implementation:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Topology</span> <span class="n">builder</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Topology</span><span class="o">();</span>
<span class="c1">// add the source processor node that takes Kafka topic &quot;source-topic&quot; as input</span>
<span class="n">builder</span><span class="o">.</span><span class="na">addSource</span><span class="o">(</span><span class="s">&quot;Source&quot;</span><span class="o">,</span> <span class="s">&quot;source-topic&quot;</span><span class="o">)</span>
<span class="c1">// add the WordCountProcessor node which takes the source processor as its upstream processor</span>
<span class="o">.</span><span class="na">addProcessor</span><span class="o">(</span><span class="s">&quot;Process&quot;</span><span class="o">,</span> <span class="o">()</span> <span class="o">-&gt;</span> <span class="k">new</span> <span class="n">WordCountProcessor</span><span class="o">(),</span> <span class="s">&quot;Source&quot;</span><span class="o">)</span>
<span class="c1">// add the count store associated with the WordCountProcessor processor</span>
<span class="o">.</span><span class="na">addStateStore</span><span class="o">(</span><span class="n">countStoreBuilder</span><span class="o">,</span> <span class="s">&quot;Process&quot;</span><span class="o">)</span>
<span class="c1">// add the sink processor node that takes Kafka topic &quot;sink-topic&quot; as output</span>
<span class="c1">// and the WordCountProcessor node as its upstream processor</span>
<span class="o">.</span><span class="na">addSink</span><span class="o">(</span><span class="s">&quot;Sink&quot;</span><span class="o">,</span> <span class="s">&quot;sink-topic&quot;</span><span class="o">,</span> <span class="s">&quot;Process&quot;</span><span class="o">);</span>
</pre></div>
</div>
<p>Here is a quick explanation of this example:</p>
<ul class="simple">
<li>A source processor node named <code class="docutils literal"><span class="pre">&quot;Source&quot;</span></code> is added to the topology using the <code class="docutils literal"><span class="pre">addSource</span></code> method, with one Kafka topic
<code class="docutils literal"><span class="pre">&quot;source-topic&quot;</span></code> fed to it.</li>
<li>A processor node named <code class="docutils literal"><span class="pre">&quot;Process&quot;</span></code> with the pre-defined <code class="docutils literal"><span class="pre">WordCountProcessor</span></code> logic is then added as the downstream
processor of the <code class="docutils literal"><span class="pre">&quot;Source&quot;</span></code> node using the <code class="docutils literal"><span class="pre">addProcessor</span></code> method.</li>
<li>A predefined persistent key-value state store is created and associated with the <code class="docutils literal"><span class="pre">&quot;Process&quot;</span></code> node, using
<code class="docutils literal"><span class="pre">countStoreBuilder</span></code>.</li>
<li>A sink processor node is then added to complete the topology using the <code class="docutils literal"><span class="pre">addSink</span></code> method, taking the <code class="docutils literal"><span class="pre">&quot;Process&quot;</span></code> node
    as its upstream processor and writing to a separate <code class="docutils literal"><span class="pre">&quot;sink-topic&quot;</span></code> Kafka topic (note that users can also use another overloaded variant of <code class="docutils literal"><span class="pre">addSink</span></code>
    to dynamically determine the Kafka topic to write to for each received record from the upstream processor; see the sketch after this list).</li>
</ul>
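    <p>For example, the dynamic variant mentioned in the last bullet could route records by their key as in the following sketch (the routing rule and topic names are illustrative):</p>
    <div class="highlight-java"><div class="highlight"><pre>// route records whose key starts with "error" to a separate topic, everything else to "sink-topic"
builder.addSink("Sink",
    (String key, String value, RecordContext recordContext) -&gt;
        key.startsWith("error") ? "error-topic" : "sink-topic",
    "Process");
</pre></div>
    </div>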
<p>In this topology, the <code class="docutils literal"><span class="pre">&quot;Process&quot;</span></code> stream processor node is considered a downstream processor of the <code class="docutils literal"><span class="pre">&quot;Source&quot;</span></code> node, and an
upstream processor of the <code class="docutils literal"><span class="pre">&quot;Sink&quot;</span></code> node. As a result, whenever the <code class="docutils literal"><span class="pre">&quot;Source&quot;</span></code> node forwards a newly fetched record from
Kafka to its downstream <code class="docutils literal"><span class="pre">&quot;Process&quot;</span></code> node, the <code class="docutils literal"><span class="pre">WordCountProcessor#process()</span></code> method is triggered to process the record and
update the associated state store. Whenever <code class="docutils literal"><span class="pre">context#forward()</span></code> is called in the
<code class="docutils literal"><span class="pre">WordCountProcessor#punctuate()</span></code> method, the aggregate key-value pair will be sent via the <code class="docutils literal"><span class="pre">&quot;Sink&quot;</span></code> processor node to
the Kafka topic <code class="docutils literal"><span class="pre">&quot;sink-topic&quot;</span></code>. Note that in the <code class="docutils literal"><span class="pre">WordCountProcessor</span></code> implementation, you must refer to the
same store name <code class="docutils literal"><span class="pre">&quot;Counts&quot;</span></code> when accessing the key-value store, otherwise an exception will be thrown at runtime,
indicating that the state store cannot be found. If the state store is not associated with the processor
in the <code class="docutils literal"><span class="pre">Topology</span></code> code, accessing it in the processor&#8217;s <code class="docutils literal"><span class="pre">init()</span></code> method will also throw an exception at
runtime, indicating the state store is not accessible from this processor.</p>
<p>Now that you have fully defined your processor topology in your application, you can proceed to
<a class="reference internal" href="running-app.html#streams-developer-guide-execution"><span class="std std-ref">running the Kafka Streams application</span></a>.</p>
</div>
</div>
</div>
</div>
<div class="pagination">
<a href="/{{version}}/documentation/streams/developer-guide/dsl-api" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="/{{version}}/documentation/streams/developer-guide/datatypes" class="pagination__btn pagination__btn__next">Next</a>
</div>
</script>
<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
//sticky secondary nav
var $navbar = $(".sub-nav-sticky"),
y_pos = $navbar.offset().top,
height = $navbar.height();
$(window).scroll(function() {
var scrollTop = $(window).scrollTop();
if (scrollTop > y_pos - height) {
$navbar.addClass("navbar-fixed")
} else if (scrollTop <= y_pos) {
$navbar.removeClass("navbar-fixed")
}
});
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>

View File

@@ -0,0 +1,195 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script><!--#include virtual="../../js/templateData.js" --></script>
<script id="content-template" type="text/x-handlebars-template">
<!-- h1>Developer Guide for Kafka Streams</h1 -->
<div class="sub-nav-sticky">
<div class="sticky-top">
<!-- div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
</div -->
</div>
</div>
<div class="section" id="running-streams-applications">
<span id="streams-developer-guide-execution"></span><h1>Running Streams Applications<a class="headerlink" href="#running-streams-applications" title="Permalink to this headline"></a></h1>
<p>You can run Java applications that use the Kafka Streams library without any additional configuration or requirements.</p>
<div class="contents local topic" id="table-of-contents">
<p class="topic-title first"><b>Table of Contents</b></p>
<ul class="simple">
<li><a class="reference internal" href="#starting-a-kafka-streams-application" id="id3">Starting a Kafka Streams application</a></li>
<li><a class="reference internal" href="#elastic-scaling-of-your-application" id="id4">Elastic scaling of your application</a><ul>
<li><a class="reference internal" href="#adding-capacity-to-your-application" id="id5">Adding capacity to your application</a></li>
<li><a class="reference internal" href="#removing-capacity-from-your-application" id="id6">Removing capacity from your application</a></li>
<li><a class="reference internal" href="#state-restoration-during-workload-rebalance" id="id7">State restoration during workload rebalance</a></li>
<li><a class="reference internal" href="#determining-how-many-application-instances-to-run" id="id8">Determining how many application instances to run</a></li>
</ul>
</li>
</ul>
</div>
<div class="section" id="running-streams-applications">
<span id="streams-developer-guide-execution"></span><h1>Running Streams Applications<a class="headerlink" href="#running-streams-applications" title="Permalink to this headline"></a></h1>
<p>You can run Java applications that use the Kafka Streams library without any additional configuration or requirements. Kafka Streams
also provides the ability to receive notification of the various states of the application. The ability to monitor the runtime
status is discussed in <a class="reference internal" href="../monitoring.html#streams-monitoring"><span class="std std-ref">the monitoring guide</span></a>.</p>
<div class="contents local topic" id="table-of-contents">
<p class="topic-title first"><b>Table of Contents</b></p>
<ul class="simple">
<li><a class="reference internal" href="#starting-a-kafka-streams-application" id="id3">Starting a Kafka Streams application</a></li>
<li><a class="reference internal" href="#elastic-scaling-of-your-application" id="id4">Elastic scaling of your application</a><ul>
<li><a class="reference internal" href="#adding-capacity-to-your-application" id="id5">Adding capacity to your application</a></li>
<li><a class="reference internal" href="#removing-capacity-from-your-application" id="id6">Removing capacity from your application</a></li>
<li><a class="reference internal" href="#state-restoration-during-workload-rebalance" id="id7">State restoration during workload rebalance</a></li>
<li><a class="reference internal" href="#determining-how-many-application-instances-to-run" id="id8">Determining how many application instances to run</a></li>
</ul>
</li>
</ul>
</div>
<div class="section" id="starting-a-kafka-streams-application">
<span id="streams-developer-guide-execution-starting"></span><h2><a class="toc-backref" href="#id3">Starting a Kafka Streams application</a><a class="headerlink" href="#starting-a-kafka-streams-application" title="Permalink to this headline"></a></h2>
<p>You can package your Java application as a fat JAR file and then start the application like this:</p>
<div class="highlight-bash"><div class="highlight"><pre><span></span><span class="c1"># Start the application in class `com.example.MyStreamsApp`</span>
<span class="c1"># from the fat JAR named `path-to-app-fatjar.jar`.</span>
$ java -cp path-to-app-fatjar.jar com.example.MyStreamsApp
</pre></div>
</div>
<p>When you start your application you are launching a Kafka Streams instance of your application. You can run multiple
instances of your application. A common scenario is that there are multiple instances of your application running in
parallel. For more information, see <a class="reference internal" href="../architecture.html#streams_architecture_tasks"><span class="std std-ref">Parallelism Model</span></a>.</p>
<p>When the application instance starts running, the defined processor topology will be initialized as one or more stream tasks.
If the processor topology defines any state stores, these are also constructed during the initialization period. For
    more information, see the <a class="reference internal" href="#streams-developer-guide-execution-scaling-state-restoration"><span class="std std-ref">State restoration during workload rebalance</span></a> section.</p>
</div>
<div class="section" id="elastic-scaling-of-your-application">
<span id="streams-developer-guide-execution-scaling"></span><h2><a class="toc-backref" href="#id4">Elastic scaling of your application</a><a class="headerlink" href="#elastic-scaling-of-your-application" title="Permalink to this headline"></a></h2>
<p>Kafka Streams makes your stream processing applications elastic and scalable. You can add and remove processing capacity
dynamically during application runtime without any downtime or data loss. This makes your applications
      resilient in the face of failures and allows you to perform maintenance as needed (e.g. rolling upgrades).</p>
<p>For more information about this elasticity, see the <a class="reference internal" href="../architecture.html#streams_architecture_tasks"><span class="std std-ref">Parallelism Model</span></a> section. Kafka Streams
leverages the Kafka group management functionality, which is built right into the <a class="reference external" href="https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol">Kafka wire protocol</a>. It is the foundation that enables the
elasticity of Kafka Streams applications: members of a group coordinate and collaborate jointly on the consumption and
processing of data in Kafka. Additionally, Kafka Streams provides stateful processing and allows for fault-tolerant
state in environments where application instances may come and go at any time.</p>
<div class="section" id="adding-capacity-to-your-application">
<h3><a class="toc-backref" href="#id5">Adding capacity to your application</a><a class="headerlink" href="#adding-capacity-to-your-application" title="Permalink to this headline"></a></h3>
<p>If you need more processing capacity for your stream processing application, you can simply start another instance of your stream processing application, e.g. on another machine, in order to scale out. The instances of your application will become aware of each other and automatically begin to share the processing work. More specifically, what will be handed over from the existing instances to the new instances is (some of) the stream tasks that have been run by the existing instances. Moving stream tasks from one instance to another results in moving the processing work plus any internal state of these stream tasks (the state of a stream task will be re-created in the target instance by restoring the state from its corresponding changelog topic).</p>
<p>The various instances of your application each run in their own JVM process, which means that each instance can leverage all the processing capacity that is available to their respective JVM process (minus the capacity that any non-Kafka-Streams part of your application may be using). This explains why running additional instances will grant your application additional processing capacity. The exact capacity you will be adding by running a new instance depends of course on the environment in which the new instance runs: available CPU cores, available main memory and Java heap space, local storage, network bandwidth, and so on. Similarly, if you stop any of the running instances of your application, then you are removing and freeing up the respective processing capacity.</p>
<div class="figure align-center" id="id1">
<img class="centered" src="/{{version}}/images/streams-elastic-scaling-1.png">
<p class="caption"><span class="caption-text">Before adding capacity: only a single instance of your Kafka Streams application is running. At this point the corresponding Kafka consumer group of your application contains only a single member (this instance). All data is being read and processed by this single instance.</span></p>
</div>
<div class="figure align-center" id="id2">
<img class="centered" src="/{{version}}/images/streams-elastic-scaling-2.png">
<p class="caption"><span class="caption-text">After adding capacity: now two additional instances of your Kafka Streams application are running, and they have automatically joined the application&#8217;s Kafka consumer group for a total of three current members. These three instances are automatically splitting the processing work between each other. The splitting is based on the Kafka topic partitions from which data is being read.</span></p>
</div>
</div>
<div class="section" id="removing-capacity-from-your-application">
<h3><a class="toc-backref" href="#id6">Removing capacity from your application</a><a class="headerlink" href="#removing-capacity-from-your-application" title="Permalink to this headline"></a></h3>
      <p>To remove processing capacity, you can stop running stream processing application instances (e.g., shut down two of
        the four instances). The stopped instances automatically leave the application&#8217;s consumer group, and the remaining instances of
        your application will automatically take over the processing work. The remaining instances take over the stream tasks that
were run by the stopped instances. Moving stream tasks from one instance to another results in moving the processing
work plus any internal state of these stream tasks. The state of a stream task is recreated in the target instance
from its changelog topic.</p>
<div class="figure align-center">
<img class="centered" src="/{{version}}/images/streams-elastic-scaling-3.png">
</div>
</div>
<div class="section" id="state-restoration-during-workload-rebalance">
<span id="streams-developer-guide-execution-scaling-state-restoration"></span><h3><a class="toc-backref" href="#id7">State restoration during workload rebalance</a><a class="headerlink" href="#state-restoration-during-workload-rebalance" title="Permalink to this headline"></a></h3>
<p>When a task is migrated, the task processing state is fully restored before the application instance resumes
processing. This guarantees the correct processing results. In Kafka Streams, state restoration is usually done by
replaying the corresponding changelog topic to reconstruct the state store. To minimize changelog-based restoration
latency by using replicated local state stores, you can specify <code class="docutils literal"><span class="pre">num.standby.replicas</span></code>. When a stream task is
initialized or re-initialized on the application instance, its state store is restored like this:</p>
<ul class="simple">
<li>If no local state store exists, the changelog is replayed from the earliest to the current offset. This reconstructs the local state store to the most recent snapshot.</li>
<li>If a local state store exists, the changelog is replayed from the previously checkpointed offset. The changes are applied and the state is restored to the most recent snapshot. This method takes less time because it is applying a smaller portion of the changelog.</li>
</ul>
<p>For more information, see <a class="reference internal" href="config-streams.html#num-standby-replicas"><span class="std std-ref">Standby Replicas</span></a>.</p>
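      <p>For example, a sketch of enabling one standby replica per stateful task in the application configuration:</p>
      <div class="highlight-java"><div class="highlight"><pre>import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
// keep one warm copy of each task's state on another instance to shorten failover and rebalance times
props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
</pre></div>
      </div>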
</div>
<div class="section" id="determining-how-many-application-instances-to-run">
<h3><a class="toc-backref" href="#id8">Determining how many application instances to run</a><a class="headerlink" href="#determining-how-many-application-instances-to-run" title="Permalink to this headline"></a></h3>
<p>The parallelism of a Kafka Streams application is primarily determined by how many partitions the input topics have. For
example, if your application reads from a single topic that has ten partitions, then you can run up to ten instances
      of your application. You can run further instances, but these will be idle.</p>
<p>The number of topic partitions is the upper limit for the parallelism of your Kafka Streams application and for the
number of running instances of your application.</p>
      <p>To achieve balanced workload processing across application instances and to prevent processing hotspots, you should
distribute data and processing workloads:</p>
<ul class="simple">
<li>Data should be equally distributed across topic partitions. For example, if two topic partitions each have 1 million messages, this is better than one partition holding 2 million messages and the other none.</li>
<li>Processing workload should be equally distributed across topic partitions. For example, if the time to process messages varies widely, then it is better to spread the processing-intensive messages across partitions rather than storing these messages within the same partition.</li>
</ul>
</div>
</div>
</div>
<div class="pagination">
<a href="/{{version}}/documentation/streams/developer-guide/memory-mgmt" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="/{{version}}/documentation/streams/developer-guide/manage-topics" class="pagination__btn pagination__btn__next">Next</a>
</div>
</script>
<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
//sticky secondary nav
var $navbar = $(".sub-nav-sticky"),
y_pos = $navbar.offset().top,
height = $navbar.height();
$(window).scroll(function() {
var scrollTop = $(window).scrollTop();
if (scrollTop > y_pos - height) {
$navbar.addClass("navbar-fixed")
} else if (scrollTop <= y_pos) {
$navbar.removeClass("navbar-fixed")
}
});
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>

View File

@@ -0,0 +1,185 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script><!--#include virtual="../../js/templateData.js" --></script>
<script id="content-template" type="text/x-handlebars-template">
<!-- h1>Developer Guide for Kafka Streams</h1 -->
<div class="sub-nav-sticky">
<div class="sticky-top">
<!-- div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
</div -->
</div>
</div>
<div class="section" id="streams-security">
<span id="streams-developer-guide-security"></span><h1>Streams Security<a class="headerlink" href="#streams-security" title="Permalink to this headline"></a></h1>
<div class="contents local topic" id="table-of-contents">
<p class="topic-title first"><b>Table of Contents</b></p>
<ul class="simple">
<li><a class="reference internal" href="#required-acl-setting-for-secure-kafka-clusters" id="id1">Required ACL setting for secure Kafka clusters</a></li>
<li><a class="reference internal" href="#security-example" id="id2">Security example</a></li>
</ul>
</div>
<p>Kafka Streams natively integrates with <a class="reference internal" href="../../documentation.html#security"><span class="std std-ref">Kafka&#8217;s security features</span></a> and supports all of the
client-side security features in Kafka. Streams leverages the <a class="reference internal" href="../../clients/index.html#kafka-clients"><span class="std std-ref">Java Producer and Consumer API</span></a>.</p>
<p>To secure your Stream processing applications, configure the security settings in the corresponding Kafka producer
and consumer clients, and then specify the corresponding configuration settings in your Kafka Streams application.</p>
<p>Kafka supports cluster encryption and authentication, including a mix of authenticated and unauthenticated,
and encrypted and non-encrypted clients. Using security is optional.</p>
<p>Here are a few relevant client-side security features:</p>
<dl class="docutils">
<dt>Encrypt data-in-transit between your applications and Kafka brokers</dt>
<dd>You can enable the encryption of the client-server communication between your applications and the Kafka brokers.
For example, you can configure your applications to always use encryption when reading and writing data to and from
Kafka. This is critical when reading and writing data across security domains such as internal network, public
internet, and partner networks.</dd>
<dt>Client authentication</dt>
<dd>You can enable client authentication for connections from your application to Kafka brokers. For example, you can
define that only specific applications are allowed to connect to your Kafka cluster.</dd>
<dt>Client authorization</dt>
<dd>You can enable client authorization of read and write operations by your applications. For example, you can define
that only specific applications are allowed to read from a Kafka topic. You can also restrict write access to Kafka
topics to prevent data pollution or fraudulent activities.</dd>
</dl>
<p>For more information about the security features in Apache Kafka, see <a class="reference internal" href="../../documentation.html#security"><span class="std std-ref">Kafka Security</span></a>.</p>
<div class="section" id="required-acl-setting-for-secure-kafka-clusters">
<span id="streams-developer-guide-security-acls"></span><h2><a class="toc-backref" href="#id1">Required ACL setting for secure Kafka clusters</a><a class="headerlink" href="#required-acl-setting-for-secure-kafka-clusters" title="Permalink to this headline"></a></h2>
<p>Kafka clusters can use ACLs to control access to resources (like the ability to create topics), and for such clusters each client,
including Kafka Streams, is required to authenticate as a particular user in order to be authorized with appropriate access.
In particular, when Streams applications are run against a secured Kafka cluster, the principal running the application must have
ACLs set so that the application has permission to create, read, and write
<a class="reference internal" href="manage-topics.html#streams-developer-guide-topics-internal"><span class="std std-ref">internal topics</span></a>.</p>
<p>Since all internal topics as well as the embedded consumer group name are prefixed with the <a class="reference internal" href="/{{version}}/documentation/streams/developer-guide/config-streams.html#required-configuration-parameters"><span class="std std-ref">application id</span></a>,
it is recommended to use ACLs on a prefixed resource pattern
so that the client is allowed to manage all topics and consumer groups that start with this prefix,
e.g., <code class="docutils literal"><span class="pre">--resource-pattern-type prefixed --topic your.application.id --operation All</span></code>
(see <a class="reference external" href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-277+-+Fine+Grained+ACL+for+CreateTopics+API">KIP-277</a>
and <a class="reference external" href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-290%3A+Support+for+Prefixed+ACLs">KIP-290</a> for details).
</p>
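<p>For illustration, such prefixed ACLs can also be created programmatically with the Admin client. The snippet below is a sketch: the principal <code class="docutils literal"><span class="pre">User:my-streams-user</span></code>, the bootstrap server, and the application id are placeholders, and the principal used by the Admin client must itself be authorized to alter ACLs.</p>
<pre class="brush: java;">
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

public class CreateStreamsAcls {
    public static void main(String[] args) throws Exception {
        Properties adminProps = new Properties();
        adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka.example.com:9093");

        try (Admin admin = Admin.create(adminProps)) {
            // Allow the Streams principal to perform all operations on every topic and
            // consumer group whose name starts with the application id.
            AccessControlEntry allowAll =
                new AccessControlEntry("User:my-streams-user", "*", AclOperation.ALL, AclPermissionType.ALLOW);
            admin.createAcls(Arrays.asList(
                new AclBinding(new ResourcePattern(ResourceType.TOPIC, "your.application.id", PatternType.PREFIXED), allowAll),
                new AclBinding(new ResourcePattern(ResourceType.GROUP, "your.application.id", PatternType.PREFIXED), allowAll)
            )).all().get();
        }
    }
}
</pre>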
</div>
<div class="section" id="security-example">
<span id="streams-developer-guide-security-example"></span><h2><a class="toc-backref" href="#id2">Security example</a><a class="headerlink" href="#security-example" title="Permalink to this headline"></a></h2>
<p>This example shows how to configure a Kafka Streams application to enable client authentication and encrypt data-in-transit when
communicating with its Kafka cluster.</p>
<p>This example assumes that the Kafka brokers in the cluster are already set up for security and that the necessary SSL
certificates are available to the application at the expected local filesystem locations. For example, if you are using Docker,
then you must also include these SSL certificates in the correct locations within the Docker image.</p>
<p>The snippet below shows the settings to enable client authentication and SSL encryption for data-in-transit between your
Kafka Streams application and the Kafka cluster it is reading and writing from:</p>
<div class="highlight-bash"><div class="highlight"><pre><span></span><span class="c1"># Essential security settings to enable client authentication and SSL encryption</span>
bootstrap.servers<span class="o">=</span>kafka.example.com:9093
security.protocol<span class="o">=</span>SSL
ssl.truststore.location<span class="o">=</span>/etc/security/tls/kafka.client.truststore.jks
ssl.truststore.password<span class="o">=</span>test1234
ssl.keystore.location<span class="o">=</span>/etc/security/tls/kafka.client.keystore.jks
ssl.keystore.password<span class="o">=</span>test1234
ssl.key.password<span class="o">=</span>test1234
</pre></div>
</div>
<p>Configure these settings in your application&#8217;s <code class="docutils literal"><span class="pre">Properties</span></code> instance. These settings will encrypt any
data-in-transit that is being read from or written to Kafka, and your application will authenticate itself against the
Kafka brokers that it is communicating with. Note that this example does not cover client authorization.</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Code of your Java application that uses the Kafka Streams library</span>
<span class="n">Properties</span> <span class="n">settings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">APPLICATION_ID_CONFIG</span><span class="o">,</span> <span class="s">&quot;secure-kafka-streams-app&quot;</span><span class="o">);</span>
<span class="c1">// Where to find secure Kafka brokers. Here, it&#39;s on port 9093.</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">BOOTSTRAP_SERVERS_CONFIG</span><span class="o">,</span> <span class="s">&quot;kafka.example.com:9093&quot;</span><span class="o">);</span>
<span class="c1">//</span>
<span class="c1">// ...further non-security related settings may follow here...</span>
<span class="c1">//</span>
<span class="c1">// Security settings.</span>
<span class="c1">// 1. These settings must match the security settings of the secure Kafka cluster.</span>
<span class="c1">// 2. The SSL trust store and key store files must be locally accessible to the application.</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">CommonClientConfigs</span><span class="o">.</span><span class="na">SECURITY_PROTOCOL_CONFIG</span><span class="o">,</span> <span class="s">&quot;SSL&quot;</span><span class="o">);</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">SslConfigs</span><span class="o">.</span><span class="na">SSL_TRUSTSTORE_LOCATION_CONFIG</span><span class="o">,</span> <span class="s">&quot;/etc/security/tls/kafka.client.truststore.jks&quot;</span><span class="o">);</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">SslConfigs</span><span class="o">.</span><span class="na">SSL_TRUSTSTORE_PASSWORD_CONFIG</span><span class="o">,</span> <span class="s">&quot;test1234&quot;</span><span class="o">);</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">SslConfigs</span><span class="o">.</span><span class="na">SSL_KEYSTORE_LOCATION_CONFIG</span><span class="o">,</span> <span class="s">&quot;/etc/security/tls/kafka.client.keystore.jks&quot;</span><span class="o">);</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">SslConfigs</span><span class="o">.</span><span class="na">SSL_KEYSTORE_PASSWORD_CONFIG</span><span class="o">,</span> <span class="s">&quot;test1234&quot;</span><span class="o">);</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">SslConfigs</span><span class="o">.</span><span class="na">SSL_KEY_PASSWORD_CONFIG</span><span class="o">,</span> <span class="s">&quot;test1234&quot;</span><span class="o">);</span>
</pre></div>
</div>
<p>If you incorrectly configure a security setting in your application, it will fail at runtime, typically right after you
start it. For example, if you enter an incorrect password for the <code class="docutils literal"><span class="pre">ssl.keystore.password</span></code> setting, an error message
similar to this would be logged and then the application would terminate:</p>
<div class="highlight-bash"><div class="highlight"><pre><span></span><span class="c1"># Misconfigured ssl.keystore.password</span>
Exception in thread <span class="s2">&quot;main&quot;</span> org.apache.kafka.common.KafkaException: Failed to construct kafka producer
<span class="o">[</span>...snip...<span class="o">]</span>
Caused by: org.apache.kafka.common.KafkaException: org.apache.kafka.common.KafkaException:
java.io.IOException: Keystore was tampered with, or password was incorrect
<span class="o">[</span>...snip...<span class="o">]</span>
Caused by: java.security.UnrecoverableKeyException: Password verification failed
</pre></div>
</div>
<p>Monitor your Kafka Streams application log files for such error messages to spot any misconfigured applications quickly.</p>
</div>
</div>
</div>
</div>
<div class="pagination">
<a href="/{{version}}/documentation/streams/developer-guide/manage-topics" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="/{{version}}/documentation/streams/developer-guide/app-reset-tool" class="pagination__btn pagination__btn__next">Next</a>
</div>
</script>
<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
//sticky secondary nav
var $navbar = $(".sub-nav-sticky"),
y_pos = $navbar.offset().top,
height = $navbar.height();
$(window).scroll(function() {
var scrollTop = $(window).scrollTop();
if (scrollTop > y_pos - height) {
$navbar.addClass("navbar-fixed")
} else if (scrollTop <= y_pos) {
$navbar.removeClass("navbar-fixed")
}
});
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>

View File

@@ -0,0 +1,439 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script><!--#include virtual="../../js/templateData.js" --></script>
<script id="content-template" type="text/x-handlebars-template">
<!-- h1>Developer Guide for Kafka Streams</h1 -->
<div class="sub-nav-sticky">
<div class="sticky-top">
<!-- div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
</div -->
</div>
</div>
<div class="section" id="testing">
<span id="streams-developer-guide-testing"></span>
<h1>Testing Kafka Streams<a class="headerlink" href="#testing" title="Permalink to this headline"></a></h1>
<div class="contents local topic" id="table-of-contents">
<p class="topic-title first"><b>Table of Contents</b></p>
<ul class="simple">
<li><a class="reference internal" href="#test-utils-artifact">Importing the test utilities</a></li>
<li><a class="reference internal" href="#testing-topologytestdriver">Testing Streams applications</a>
</li>
<li><a class="reference internal" href="#unit-testing-processors">Unit testing Processors</a>
</li>
</ul>
</div>
<div class="section" id="test-utils-artifact">
<h2><a class="toc-backref" href="#test-utils-artifact" title="Permalink to this headline">Importing the test
utilities</a></h2>
<p>
To test a Kafka Streams application, Kafka provides a test-utils artifact that can be added as a regular
dependency to your test code base. Example <code>pom.xml</code> snippet when using Maven:
</p>
<pre>
&lt;dependency&gt;
&lt;groupId&gt;org.apache.kafka&lt;/groupId&gt;
&lt;artifactId&gt;kafka-streams-test-utils&lt;/artifactId&gt;
&lt;version&gt;{{fullDotVersion}}&lt;/version&gt;
&lt;scope&gt;test&lt;/scope&gt;
&lt;/dependency&gt;
</pre>
</div>
<div class="section" id="testing-topologytestdriver">
<h2><a class="toc-backref" href="#testing-topologytestdriver" title="Permalink to this headline">Testing a
Streams application</a></h2>
<p>
The test-utils package provides a <code>TopologyTestDriver</code> that can be used to pipe data through a
<code>Topology</code> that is assembled either manually
using the Processor API or via the DSL using <code>StreamsBuilder</code>.
The test driver simulates the library runtime, which continuously fetches records from input topics and
processes them by traversing the topology.
You can use the test driver to verify that your specified processor topology computes the correct result
for the manually piped-in data records.
The test driver captures the result records and allows you to query its embedded state stores.
</p>
<pre>
// Processor API
Topology topology = new Topology();
topology.addSource("sourceProcessor", "input-topic");
topology.addProcessor("processor", ..., "sourceProcessor");
topology.addSink("sinkProcessor", "output-topic", "processor");
// or
// using DSL
StreamsBuilder builder = new StreamsBuilder();
builder.stream("input-topic").filter(...).to("output-topic");
Topology topology = builder.build();
// setup test driver
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "test");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");
TopologyTestDriver testDriver = new TopologyTestDriver(topology, props);
</pre>
<p>
With the test driver you can create a <code>TestInputTopic</code> by providing the topic name and the corresponding serializers.
<code>TestInputTopic</code> provides various methods to pipe new message values, keys and values, or lists of <code>KeyValue</code> objects.
</p>
<pre>
TestInputTopic&lt;String, Long&gt; inputTopic = testDriver.createInputTopic("input-topic", stringSerde.serializer(), longSerde.serializer());
inputTopic.pipeInput("key", 42L);
</pre>
<p>
To verify the output, you can use <code>TestOutputTopic</code>
where you configure the topic and the corresponding deserializers during initialization.
It offers helper methods to read only certain parts of the result records or the collection of records.
For example, you can validate the returned <code>KeyValue</code> with standard assertions
if you only care about the key and value, but not the timestamp, of the result record.
</p>
<pre>
TestOutputTopic&lt;String, Long&gt; outputTopic = testDriver.createOutputTopic("result-topic", stringSerde.deserializer(), longSerde.deserializer());
assertThat(outputTopic.readKeyValue(), equalTo(new KeyValue&lt;&gt;("key", 42L)));
</pre>
<p>
<code>TopologyTestDriver</code> supports punctuations, too.
Event-time punctuations are triggered automatically based on the processed records' timestamps.
Wall-clock-time punctuations can also be triggered by advancing the test driver's wall-clock-time (the
driver mocks wall-clock-time internally to give users control over it).
</p>
<pre>
testDriver.advanceWallClockTime(Duration.ofSeconds(20));
</pre>
<p>
Additionally, you can access state stores via the test driver before or after a test.
Accessing stores before a test is useful to pre-populate a store with some initial values.
After data has been processed, expected updates to the store can be verified.
</p>
<pre>
KeyValueStore store = testDriver.getKeyValueStore("store-name");
</pre>
<p>
Note that you should always close the test driver at the end to make sure all resources are released
properly.
</p>
<pre>
testDriver.close();
</pre>
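<p>
One common pattern (a sketch, reusing the <code>topology</code> and <code>props</code> from the setup snippet above; <code>TopologyTestDriver</code> implements <code>Closeable</code>) is to close the driver with try-with-resources, or in an <code>@After</code>/<code>@AfterEach</code> method as shown in the example below, so it is released even when an assertion fails:
</p>
<pre>
try (TopologyTestDriver driver = new TopologyTestDriver(topology, props)) {
    TestInputTopic&lt;String, Long&gt; input =
        driver.createInputTopic("input-topic", stringSerde.serializer(), longSerde.serializer());
    input.pipeInput("key", 42L);
    // ... assertions against output topics and state stores ...
} // driver.close() runs automatically here
</pre>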
<h3>Example</h3>
<p>
The following example demonstrates how to use the test driver and helper classes.
The example creates a topology that computes the maximum value per key using a key-value-store.
While processing, no output is generated directly; only the store is updated.
Output is sent downstream only on event-time and wall-clock punctuations.
</p>
<pre>
private TopologyTestDriver testDriver;
private TestInputTopic&lt;String, Long&gt; inputTopic;
private TestOutputTopic&lt;String, Long&gt; outputTopic;
private KeyValueStore&lt;String, Long&gt; store;
private Serde&lt;String&gt; stringSerde = new Serdes.StringSerde();
private Serde&lt;Long&gt; longSerde = new Serdes.LongSerde();
@Before
public void setup() {
Topology topology = new Topology();
topology.addSource("sourceProcessor", "input-topic");
topology.addProcessor("aggregator", new CustomMaxAggregatorSupplier(), "sourceProcessor");
topology.addStateStore(
Stores.keyValueStoreBuilder(
Stores.inMemoryKeyValueStore("aggStore"),
Serdes.String(),
Serdes.Long()).withLoggingDisabled(), // need to disable logging to allow store pre-populating
"aggregator");
topology.addSink("sinkProcessor", "result-topic", "aggregator");
// setup test driver
Properties props = new Properties();
props.setProperty(StreamsConfig.APPLICATION_ID_CONFIG, "maxAggregation");
props.setProperty(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");
props.setProperty(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
props.setProperty(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Long().getClass().getName());
testDriver = new TopologyTestDriver(topology, props);
// setup test topics
inputTopic = testDriver.createInputTopic("input-topic", stringSerde.serializer(), longSerde.serializer());
outputTopic = testDriver.createOutputTopic("result-topic", stringSerde.deserializer(), longSerde.deserializer());
// pre-populate store
store = testDriver.getKeyValueStore("aggStore");
store.put("a", 21L);
}
@After
public void tearDown() {
testDriver.close();
}
@Test
public void shouldFlushStoreForFirstInput() {
inputTopic.pipeInput("a", 1L);
assertThat(outputTopic.readKeyValue(), equalTo(new KeyValue&lt;&gt;("a", 21L)));
assertThat(outputTopic.isEmpty(), is(true));
}
@Test
public void shouldNotUpdateStoreForSmallerValue() {
inputTopic.pipeInput("a", 1L);
assertThat(store.get("a"), equalTo(21L));
assertThat(outputTopic.readKeyValue(), equalTo(new KeyValue&lt;&gt;("a", 21L)));
assertThat(outputTopic.isEmpty(), is(true));
}
@Test
public void shouldUpdateStoreForLargerValue() {
inputTopic.pipeInput("a", 42L);
assertThat(store.get("a"), equalTo(42L));
assertThat(outputTopic.readKeyValue(), equalTo(new KeyValue&lt;&gt;("a", 42L)));
assertThat(outputTopic.isEmpty(), is(true));
}
@Test
public void shouldUpdateStoreForNewKey() {
inputTopic.pipeInput("b", 21L);
assertThat(store.get("b"), equalTo(21L));
assertThat(outputTopic.readKeyValue(), equalTo(new KeyValue&lt;&gt;("a", 21L)));
assertThat(outputTopic.readKeyValue(), equalTo(new KeyValue&lt;&gt;("b", 21L)));
assertThat(outputTopic.isEmpty(), is(true));
}
@Test
public void shouldPunctuateIfEventTimeAdvances() {
final Instant recordTime = Instant.now();
inputTopic.pipeInput("a", 1L, recordTime);
assertThat(outputTopic.readKeyValue(), equalTo(new KeyValue&lt;&gt;("a", 21L)));
inputTopic.pipeInput("a", 1L, recordTime);
assertThat(outputTopic.isEmpty(), is(true));
inputTopic.pipeInput("a", 1L, recordTime.plusSeconds(10L));
assertThat(outputTopic.readKeyValue(), equalTo(new KeyValue&lt;&gt;("a", 21L)));
assertThat(outputTopic.isEmpty(), is(true));
}
@Test
public void shouldPunctuateIfWallClockTimeAdvances() {
testDriver.advanceWallClockTime(Duration.ofSeconds(60));
assertThat(outputTopic.readKeyValue(), equalTo(new KeyValue&lt;&gt;("a", 21L)));
assertThat(outputTopic.isEmpty(), is(true));
}
public class CustomMaxAggregatorSupplier implements ProcessorSupplier&lt;String, Long&gt; {
@Override
public Processor&lt;String, Long&gt; get() {
return new CustomMaxAggregator();
}
}
public class CustomMaxAggregator implements Processor&lt;String, Long&gt; {
ProcessorContext context;
private KeyValueStore&lt;String, Long&gt; store;
@SuppressWarnings("unchecked")
@Override
public void init(ProcessorContext context) {
this.context = context;
context.schedule(Duration.ofSeconds(60), PunctuationType.WALL_CLOCK_TIME, time -&gt; flushStore());
context.schedule(Duration.ofSeconds(10), PunctuationType.STREAM_TIME, time -&gt; flushStore());
store = (KeyValueStore&lt;String, Long&gt;) context.getStateStore("aggStore");
}
@Override
public void process(String key, Long value) {
Long oldValue = store.get(key);
if (oldValue == null || value &gt; oldValue) {
store.put(key, value);
}
}
private void flushStore() {
KeyValueIterator&lt;String, Long&gt; it = store.all();
while (it.hasNext()) {
KeyValue&lt;String, Long&gt; next = it.next();
context.forward(next.key, next.value);
}
}
@Override
public void close() {}
}
</pre>
</div>
<div class="section" id="unit-testing-processors">
<h2>
<a class="headerlink" href="#unit-testing-processors"
title="Permalink to this headline">Unit Testing Processors</a>
</h2>
<p>
If you <a href="processor-api.html">write a Processor</a>, you will want to test it.
</p>
<p>
Because the <code>Processor</code> forwards its results to the context rather than returning them,
unit testing requires a mocked context capable of capturing forwarded data for inspection.
For this reason, we provide a <code>MockProcessorContext</code> in <a href="#test-utils-artifact"><code>test-utils</code></a>.
</p>
<b>Construction</b>
<p>
To begin with, instantiate your processor and initialize it with the mock context:
<pre>
final Processor processorUnderTest = ...;
final MockProcessorContext context = new MockProcessorContext();
processorUnderTest.init(context);
</pre>
If you need to pass configuration to your processor or set the default serdes, you can create the mock with
config:
<pre>
final Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "unit-test");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Long().getClass());
props.put("some.other.config", "some config value");
final MockProcessorContext context = new MockProcessorContext(props);
</pre>
</p>
<b>Captured data</b>
<p>
The mock will capture any values that your processor forwards. You can make assertions on them:
<pre>
processorUnderTest.process("key", "value");
final Iterator&lt;CapturedForward&gt; forwarded = context.forwarded().iterator();
assertEquals(forwarded.next().keyValue(), new KeyValue&lt;&gt;(..., ...));
assertFalse(forwarded.hasNext());
// you can reset forwards to clear the captured data. This may be helpful in constructing longer scenarios.
context.resetForwards();
assertEquals(context.forwarded().size(), 0);
</pre>
If your processor forwards to specific child processors, you can query the context for captured data by
child name:
<pre>
final List&lt;CapturedForward&gt; captures = context.forwarded("childProcessorName");
</pre>
The mock also captures whether your processor has called <code>commit()</code> on the context:
<pre>
assertTrue(context.committed());
// commit captures can also be reset.
context.resetCommit();
assertFalse(context.committed());
</pre>
</p>
<b>Setting record metadata</b>
<p>
In case your processor logic depends on the record metadata (topic, partition, offset, or timestamp),
you can set them on the context, either all together or individually:
<pre>
context.setRecordMetadata("topicName", /*partition*/ 0, /*offset*/ 0L, /*timestamp*/ 0L);
context.setTopic("topicName");
context.setPartition(0);
context.setOffset(0L);
context.setTimestamp(0L);
</pre>
Once these are set, the context will continue returning the same values, until you set new ones.
</p>
<b>State stores</b>
<p>
In case your punctuator is stateful, the mock context allows you to register state stores.
You're encouraged to use a simple in-memory store of the appropriate type (KeyValue, Windowed, or
Session), since the mock context does <i>not</i> manage changelogs, state directories, etc.
</p>
<pre>
final KeyValueStore&lt;String, Integer&gt; store =
Stores.keyValueStoreBuilder(
Stores.inMemoryKeyValueStore("myStore"),
Serdes.String(),
Serdes.Integer()
)
.withLoggingDisabled() // Changelog is not supported by MockProcessorContext.
.build();
store.init(context, store);
context.register(store, /*deprecated parameter*/ false, /*parameter unused in mock*/ null);
</pre>
<b>Verifying punctuators</b>
<p>
Processors can schedule punctuators to handle periodic tasks.
The mock context does <i>not</i> automatically execute punctuators, but it does capture them to
allow you to unit test them as well:
<pre>
final MockProcessorContext.CapturedPunctuator capturedPunctuator = context.scheduledPunctuators().get(0);
final long interval = capturedPunctuator.getIntervalMs();
final PunctuationType type = capturedPunctuator.getType();
final boolean cancelled = capturedPunctuator.cancelled();
final Punctuator punctuator = capturedPunctuator.getPunctuator();
punctuator.punctuate(/*timestamp*/ 0L);
</pre>
If you need to write tests involving automatic firing of scheduled punctuators, we recommend creating a
simple topology with your processor and using the <a href="testing.html#testing-topologytestdriver"><code>TopologyTestDriver</code></a>.
</p>
</div>
</div>
<div class="pagination">
<a href="/{{version}}/documentation/streams/developer-guide/datatypes" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="/{{version}}/documentation/streams/developer-guide/interactive-queries" class="pagination__btn pagination__btn__next">Next</a>
</div>
</script>
<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function () {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
//sticky secondary nav
var $navbar = $(".sub-nav-sticky"),
y_pos = $navbar.offset().top,
height = $navbar.height();
$(window).scroll(function () {
var scrollTop = $(window).scrollTop();
if (scrollTop > y_pos - height) {
$navbar.addClass("navbar-fixed")
} else if (scrollTop <= y_pos) {
$navbar.removeClass("navbar-fixed")
}
});
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>

View File

@@ -0,0 +1,263 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script><!--#include virtual="../../js/templateData.js" --></script>
<script id="content-template" type="text/x-handlebars-template">
<!-- h1>Developer Guide for Kafka Streams</h1 -->
<div class="sub-nav-sticky">
<!-- div class="sticky-top">
<div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
</div>
</div -->
</div>
<div class="section" id="writing-a-streams-application">
<span id="streams-write-app"></span><h1>Writing a Streams Application<a class="headerlink" href="#writing-a-streams-application" title="Permalink to this headline"></a></h1>
<p class="topic-title first"><b>Table of Contents</b></p>
<ul class="simple">
<li><a class="reference internal" href="#libraries-and-maven-artifacts" id="id1">Libraries and Maven artifacts</a></li>
<li><a class="reference internal" href="#using-kafka-streams-within-your-application-code" id="id2">Using Kafka Streams within your application code</a></li>
<li><a class="reference internal" href="#testing-a-streams-app" id="id3">Testing a Streams application</a></li>
</ul>
<p>Any Java or Scala application that makes use of the Kafka Streams library is considered a Kafka Streams application.
The computational logic of a Kafka Streams application is defined as a <a class="reference internal" href="../core-concepts#streams_topology"><span class="std std-ref">processor topology</span></a>,
which is a graph of stream processors (nodes) and streams (edges).</p>
<p>You can define the processor topology with the Kafka Streams APIs:</p>
<dl class="docutils">
<dt><a class="reference internal" href="dsl-api.html#streams-developer-guide-dsl"><span class="std std-ref">Kafka Streams DSL</span></a></dt>
<dd>A high-level API that provides the most common data transformation operations such as <code class="docutils literal"><span class="pre">map</span></code>, <code class="docutils literal"><span class="pre">filter</span></code>, <code class="docutils literal"><span class="pre">join</span></code>, and <code class="docutils literal"><span class="pre">aggregations</span></code> out of the box. The DSL is the recommended starting point for developers new to Kafka Streams, and should cover many use cases and stream processing needs. If you're writing a Scala application then you can use the <a href="dsl-api.html#scala-dsl"><span class="std std-ref">Kafka Streams DSL for Scala</span></a> library, which removes much of the Java/Scala interoperability boilerplate compared to working directly with the Java DSL. (A brief DSL sketch follows this list.)</dd>
<dt><a class="reference internal" href="processor-api.html#streams-developer-guide-processor-api"><span class="std std-ref">Processor API</span></a></dt>
<dd>A low-level API that lets you add and connect processors as well as interact directly with state stores. The Processor API provides you with even more flexibility than the DSL but at the expense of requiring more manual work on the side of the application developer (e.g., more lines of code).</dd>
</dl>
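<p>To make the distinction concrete, here is a minimal DSL sketch (the topic names and the transformation are illustrative only) that reads a topic, filters and transforms its records, and writes the result to an output topic:</p>
<pre class="brush: java;">
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

StreamsBuilder builder = new StreamsBuilder();
KStream&lt;String, String&gt; lines =
    builder.stream("text-input", Consumed.with(Serdes.String(), Serdes.String()));
lines.filter((key, value) -&gt; value != null &amp;&amp; !value.isEmpty())  // drop empty records
     .mapValues(value -&gt; value.toUpperCase())                       // simple per-record transformation
     .to("text-output", Produced.with(Serdes.String(), Serdes.String()));
Topology topology = builder.build();  // hand this to new KafkaStreams(topology, props)
</pre>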
<div class="section" id="libraries-and-maven-artifacts">
<span id="streams-developer-guide-maven"></span><h2>Libraries and Maven artifacts</h2>
<p>This section lists the Kafka Streams related libraries that are available for writing your Kafka Streams applications.</p>
<p>You can define dependencies on the following libraries for your Kafka Streams applications.</p>
<table border="1" class="datatable">
<colgroup>
<col width="14%" />
<col width="19%" />
<col width="12%" />
<col width="55%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Group ID</th>
<th class="head">Artifact ID</th>
<th class="head">Version</th>
<th class="head">Description</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td><code class="docutils literal"><span class="pre">org.apache.kafka</span></code></td>
<td><code class="docutils literal"><span class="pre">kafka-streams</span></code></td>
<td><code class="docutils literal"><span class="pre">{{fullDotVersion}}</span></code></td>
<td>(Required) Base library for Kafka Streams.</td>
</tr>
<tr class="row-odd"><td><code class="docutils literal"><span class="pre">org.apache.kafka</span></code></td>
<td><code class="docutils literal"><span class="pre">kafka-clients</span></code></td>
<td><code class="docutils literal"><span class="pre">{{fullDotVersion}}</span></code></td>
<td>(Required) Kafka client library. Contains built-in serializers/deserializers.</td>
</tr>
<tr class="row-even"><td><code class="docutils literal"><span class="pre">org.apache.kafka</span></code></td>
<td><code class="docutils literal"><span class="pre">kafka-streams-scala</span></code></td>
<td><code class="docutils literal"><span class="pre">{{fullDotVersion}}</span></code></td>
<td>(Optional) Kafka Streams DSL for Scala library to write Scala Kafka Streams applications. When not using SBT you will need to suffix the artifact ID with the correct version of Scala your application is using (<code class="docutils literal"><span class="pre">_2.12</span></code>, <code class="docutils literal"><span class="pre">_2.13</span></code>).</td>
</tr>
</tbody>
</table>
<div class="admonition tip">
<p><b>Tip</b></p>
<p class="last">See the section <a class="reference internal" href="datatypes.html#streams-developer-guide-serdes"><span class="std std-ref">Data Types and Serialization</span></a> for more information about Serializers/Deserializers.</p>
</div>
<p>Example <code class="docutils literal"><span class="pre">pom.xml</span></code> snippet when using Maven:</p>
<pre class="brush: xml;">
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-streams</artifactId>
<version>{{fullDotVersion}}</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>{{fullDotVersion}}</version>
</dependency>
<!-- Optionally include Kafka Streams DSL for Scala for Scala {{scalaVersion}} -->
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-streams-scala_{{scalaVersion}}</artifactId>
<version>{{fullDotVersion}}</version>
</dependency>
</pre>
</div>
<div class="section" id="using-kafka-streams-within-your-application-code">
<h2>Using Kafka Streams within your application code<a class="headerlink" href="#using-kafka-streams-within-your-application-code" title="Permalink to this headline"></a></h2>
<p>You can call Kafka Streams from anywhere in your application code, but usually these calls are made within the <code class="docutils literal"><span class="pre">main()</span></code> method of
your application, or some variant thereof. The basic elements of defining a processing topology within your application
are described below.</p>
<p>First, you must create an instance of <code class="docutils literal"><span class="pre">KafkaStreams</span></code>.</p>
<ul class="simple">
<li>The first argument of the <code class="docutils literal"><span class="pre">KafkaStreams</span></code> constructor takes a topology (either <code class="docutils literal"><span class="pre">StreamsBuilder#build()</span></code> for the
<a class="reference internal" href="dsl-api.html#streams-developer-guide-dsl"><span class="std std-ref">DSL</span></a> or <code class="docutils literal"><span class="pre">Topology</span></code> for the
<a class="reference internal" href="processor-api.html#streams-developer-guide-processor-api"><span class="std std-ref">Processor API</span></a>) that is used to define a topology.</li>
<li>The second argument is an instance of <code class="docutils literal"><span class="pre">java.util.Properties</span></code>, which defines the configuration for this specific topology.</li>
</ul>
<p>Code example:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.streams.KafkaStreams</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.kstream.StreamsBuilder</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.processor.Topology</span><span class="o">;</span>
<span class="c1">// Use the builders to define the actual processing topology, e.g. to specify</span>
<span class="c1">// from which input topics to read, which stream operations (filter, map, etc.)</span>
<span class="c1">// should be called, and so on. We will cover this in detail in the subsequent</span>
<span class="c1">// sections of this Developer Guide.</span>
<span class="n">StreamsBuilder</span> <span class="n">builder</span> <span class="o">=</span> <span class="o">...;</span> <span class="c1">// when using the DSL</span>
<span class="n">Topology</span> <span class="n">topology</span> <span class="o">=</span> <span class="n">builder</span><span class="o">.</span><span class="na">build</span><span class="o">();</span>
<span class="c1">//</span>
<span class="c1">// OR</span>
<span class="c1">//</span>
<span class="n">Topology</span> <span class="n">topology</span> <span class="o">=</span> <span class="o">...;</span> <span class="c1">// when using the Processor API</span>
<span class="c1">// Use the configuration to tell your application where the Kafka cluster is,</span>
<span class="c1">// which Serializers/Deserializers to use by default, to specify security settings,</span>
<span class="c1">// and so on.</span>
<span class="n">Properties</span> <span class="n">props</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">KafkaStreams</span> <span class="n">streams</span> <span class="o">=</span> <span class="k">new</span> <span class="n">KafkaStreams</span><span class="o">(</span><span class="n">topology</span><span class="o">,</span> <span class="n">props</span><span class="o">);</span>
</pre></div>
</div>
<p>At this point, internal structures are initialized, but the processing is not started yet.
You have to explicitly start the Kafka Streams thread by calling the <code class="docutils literal"><span class="pre">KafkaStreams#start()</span></code> method:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Start the Kafka Streams threads</span>
<span class="n">streams</span><span class="o">.</span><span class="na">start</span><span class="o">();</span>
</pre></div>
</div>
<p>If there are other instances of this stream processing application running elsewhere (e.g., on another machine), Kafka
Streams transparently re-assigns tasks from the existing instances to the new instance that you just started.
For more information, see <a class="reference internal" href="../architecture.html#streams_architecture_tasks"><span class="std std-ref">Stream Partitions and Tasks</span></a> and <a class="reference internal" href="../architecture.html#streams-architecture-threads"><span class="std std-ref">Threading Model</span></a>.</p>
<p>To catch any unexpected exceptions, you can set a <code class="docutils literal"><span class="pre">java.lang.Thread.UncaughtExceptionHandler</span></code> before you start the
application. This handler is called whenever a stream thread is terminated by an unexpected exception:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Java 8+, using lambda expressions</span>
<span class="n">streams</span><span class="o">.</span><span class="na">setUncaughtExceptionHandler</span><span class="o">((</span><span class="n">Thread</span> <span class="n">thread</span><span class="o">,</span> <span class="n">Throwable</span> <span class="n">throwable</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="o">{</span>
<span class="c1">// here you should examine the throwable/exception and perform an appropriate action!</span>
<span class="o">});</span>
<span class="c1">// Java 7</span>
<span class="n">streams</span><span class="o">.</span><span class="na">setUncaughtExceptionHandler</span><span class="o">(</span><span class="k">new</span> <span class="n">Thread</span><span class="o">.</span><span class="na">UncaughtExceptionHandler</span><span class="o">()</span> <span class="o">{</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">uncaughtException</span><span class="o">(</span><span class="n">Thread</span> <span class="n">thread</span><span class="o">,</span> <span class="n">Throwable</span> <span class="n">throwable</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// here you should examine the throwable/exception and perform an appropriate action!</span>
<span class="o">}</span>
<span class="o">});</span>
</pre></div>
</div>
<p>To stop the application instance, call the <code class="docutils literal"><span class="pre">KafkaStreams#close()</span></code> method:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Stop the Kafka Streams threads</span>
<span class="n">streams</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
</pre></div>
</div>
<p>To allow your application to gracefully shut down in response to SIGTERM, it is recommended that you add a shutdown hook
and call <code class="docutils literal"><span class="pre">KafkaStreams#close</span></code>.</p>
<ul>
<li><p class="first">Here is a shutdown hook example in Java 8+:</p>
<blockquote>
<div><div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Add shutdown hook to stop the Kafka Streams threads.</span>
<span class="c1">// You can optionally provide a timeout to `close`.</span>
<span class="n">Runtime</span><span class="o">.</span><span class="na">getRuntime</span><span class="o">().</span><span class="na">addShutdownHook</span><span class="o">(</span><span class="k">new</span> <span class="n">Thread</span><span class="o">(</span><span class="n">streams</span><span class="o">::</span><span class="n">close</span><span class="o">));</span>
</pre></div>
</div>
</div></blockquote>
</li>
<li><p class="first">Here is a shutdown hook example in Java 7:</p>
<blockquote>
<div><div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Add shutdown hook to stop the Kafka Streams threads.</span>
<span class="c1">// You can optionally provide a timeout to `close`.</span>
<span class="n">Runtime</span><span class="o">.</span><span class="na">getRuntime</span><span class="o">().</span><span class="na">addShutdownHook</span><span class="o">(</span><span class="k">new</span> <span class="n">Thread</span><span class="o">(</span><span class="k">new</span> <span class="n">Runnable</span><span class="o">()</span> <span class="o">{</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">run</span><span class="o">()</span> <span class="o">{</span>
<span class="n">streams</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
<span class="o">}</span>
<span class="o">}));</span>
</pre></div>
</div>
</div></blockquote>
</li>
</ul>
<p>After an application is stopped, Kafka Streams will migrate any tasks that had been running in this instance to available remaining
instances.</p>
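<p>Putting these pieces together, a minimal <code class="docutils literal"><span class="pre">main()</span></code> sketch (the application id, bootstrap servers, and topic names are placeholders) that builds a topology, starts the application, and closes it cleanly on SIGTERM could look like this:</p>
<pre class="brush: java;">
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;

public class MyStreamsApp {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");  // trivial pass-through topology
        Topology topology = builder.build();

        KafkaStreams streams = new KafkaStreams(topology, props);
        CountDownLatch latch = new CountDownLatch(1);

        // Close the Streams instance and release the latch when the JVM receives SIGTERM.
        Runtime.getRuntime().addShutdownHook(new Thread(() -&gt; {
            streams.close();
            latch.countDown();
        }));

        streams.start();  // start processing
        latch.await();    // block the main thread until shutdown
    }
}
</pre>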
</div>
<div class="section" id="testing-a-streams-app">
<a class="headerlink" href="#testing-a-streams-app" title="Permalink to this headline"><h2>Testing a Streams application</a></h2>
Kafka Streams comes with a <code>test-utils</code> module to help you test your application <a href="testing.html">here</a>.
</div>
</div>
</div>
</div>
<div class="pagination">
<a href="/{{version}}/documentation/streams/developer-guide/" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="/{{version}}/documentation/streams/developer-guide/config-streams" class="pagination__btn pagination__btn__next">Next</a>
</div>
</script>
<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
//sticky secondary nav
var $navbar = $(".sub-nav-sticky"),
y_pos = $navbar.offset().top,
height = $navbar.height();
$(window).scroll(function() {
var scrollTop = $(window).scrollTop();
if (scrollTop > y_pos - height) {
$navbar.addClass("navbar-fixed")
} else if (scrollTop <= y_pos) {
$navbar.removeClass("navbar-fixed")
}
});
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>