Add km module kafka
181
docs/streams/architecture.html
Normal file
@@ -0,0 +1,181 @@
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
contributor license agreements. See the NOTICE file distributed with
|
||||
this work for additional information regarding copyright ownership.
|
||||
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
(the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
<script><!--#include virtual="../js/templateData.js" --></script>
|
||||
|
||||
<script id="content-template" type="text/x-handlebars-template">
|
||||
<h1>Architecture</h1>
|
||||
<div class="sub-nav-sticky">
|
||||
<div class="sticky-top">
|
||||
<div style="height:35px">
|
||||
<a href="/{{version}}/documentation/streams/">Introduction</a>
|
||||
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
|
||||
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
|
||||
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
|
||||
<a class="active-menu-item" href="/{{version}}/documentation/streams/architecture">Architecture</a>
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/">Developer Guide</a>
|
||||
<a href="/{{version}}/documentation/streams/upgrade-guide">Upgrade</a>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
Kafka Streams simplifies application development by building on the Kafka producer and consumer libraries and leveraging the native capabilities of
|
||||
Kafka to offer data parallelism, distributed coordination, fault tolerance, and operational simplicity. In this section, we describe how Kafka Streams works underneath the covers.
|
||||
|
||||
<p>
|
||||
The picture below shows the anatomy of an application that uses the Kafka Streams library. Let's walk through some details.
|
||||
</p>
|
||||
<img class="centered" src="/{{version}}/images/streams-architecture-overview.jpg" style="width:750px">
|
||||
|
||||
<h3><a id="streams_architecture_tasks" href="#streams_architecture_tasks">Stream Partitions and Tasks</a></h3>
|
||||
|
||||
<p>
|
||||
The messaging layer of Kafka partitions data for storing and transporting it. Kafka Streams partitions data for processing it.
|
||||
In both cases, this partitioning is what enables data locality, elasticity, scalability, high performance, and fault tolerance.
|
||||
Kafka Streams uses the concepts of <b>partitions</b> and <b>tasks</b> as logical units of its parallelism model based on Kafka topic partitions.
|
||||
There are close links between Kafka Streams and Kafka in the context of parallelism:
|
||||
</p>
|
||||
|
||||
<ul>
|
||||
<li>Each <b>stream partition</b> is a totally ordered sequence of data records and maps to a Kafka <b>topic partition</b>.</li>
|
||||
<li>A <b>data record</b> in the stream maps to a Kafka <b>message</b> from that topic.</li>
|
||||
<li>The <b>keys</b> of data records determine the partitioning of data in both Kafka and Kafka Streams, i.e., how data is routed to specific partitions within topics.</li>
|
||||
</ul>
|
||||
|
||||
<p>
|
||||
An application's processor topology is scaled by breaking it into multiple tasks.
|
||||
More specifically, Kafka Streams creates a fixed number of tasks based on the input stream partitions for the application,
|
||||
with each task assigned a list of partitions from the input streams (i.e., Kafka topics). The assignment of partitions to tasks
|
||||
never changes so that each task is a fixed unit of parallelism of the application. Tasks can then instantiate their own processor topology
|
||||
based on the assigned partitions; they also maintain a buffer for each of their assigned partitions and process messages one-at-a-time from
these record buffers. As a result, stream tasks can be processed independently and in parallel without manual intervention.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Slightly simplified, the maximum parallelism at which your application may run is bounded by the maximum number of stream tasks, which itself is determined by
the maximum number of partitions of the input topic(s) the application is reading from. For example, if your input topic has 5 partitions, then you can run up to 5
application instances. These instances will collaboratively process the topic’s data. If you run a larger number of app instances than partitions of the input
topic, the “excess” app instances will launch but remain idle; however, if one of the busy instances goes down, one of the idle instances will resume the former’s
work.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
It is important to understand that Kafka Streams is not a resource manager, but a library that "runs" anywhere its stream processing application runs.
|
||||
Multiple instances of the application are executed either on the same machine, or spread across multiple machines, and tasks can be distributed automatically
by the library to those running application instances. The assignment of partitions to tasks never changes; if an application instance fails, all its assigned
|
||||
tasks will be automatically restarted on other instances and continue to consume from the same stream partitions.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The following diagram shows two tasks each assigned with one partition of the input streams.
|
||||
</p>
|
||||
<img class="centered" src="/{{version}}/images/streams-architecture-tasks.jpg" style="width:400px">
|
||||
<br>
|
||||
|
||||
<h3><a id="streams_architecture_threads" href="#streams_architecture_threads">Threading Model</a></h3>
|
||||
|
||||
<p>
|
||||
Kafka Streams allows the user to configure the number of <b>threads</b> that the library can use to parallelize processing within an application instance.
|
||||
Each thread can execute one or more tasks with their processor topologies independently. For example, the following diagram shows one stream thread running two stream tasks.
|
||||
</p>
|
||||
<img class="centered" src="/{{version}}/images/streams-architecture-threads.jpg" style="width:400px">
|
||||
|
||||
<p>
|
||||
Starting more stream threads or more instances of the application merely amounts to replicating the topology and having it process a different subset of Kafka partitions, effectively parallelizing processing.
|
||||
It is worth noting that there is no shared state amongst the threads, so no inter-thread coordination is necessary. This makes it very simple to run topologies in parallel across the application instances and threads.
|
||||
The assignment of Kafka topic partitions amongst the various stream threads is transparently handled by Kafka Streams leveraging <a href="https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Client-side+Assignment+Proposal">Kafka's coordination</a> functionality.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
As we described above, scaling your stream processing application with Kafka Streams is easy: you merely need to start additional instances of your application,
|
||||
and Kafka Streams takes care of distributing partitions amongst tasks that run in the application instances. You can start as many threads of the application
|
||||
as there are input Kafka topic partitions so that, across all running instances of an application, every thread (or rather, the tasks it runs) has at least one input partition to process.
|
||||
</p>
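    <p>
        The following is a minimal configuration sketch (the application id, broker address, and thread count are placeholders, not values mandated by Kafka Streams) showing how the number of stream threads per instance is set via <code>num.stream.threads</code>:
    </p>
    <pre class="brush: java;">
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");   // placeholder application id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");  // placeholder broker address
// Run two stream threads in this instance; stream tasks are balanced across
// all threads of all running instances of the application.
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 2);
    </pre>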
|
||||
<br>
|
||||
|
||||
<h3><a id="streams_architecture_state" href="#streams_architecture_state">Local State Stores</a></h3>
|
||||
|
||||
<p>
|
||||
Kafka Streams provides so-called <b>state stores</b>, which can be used by stream processing applications to store and query data,
|
||||
which is an important capability when implementing stateful operations. The <a href="/{{version}}/documentation/streams/developer-guide#streams_dsl">Kafka Streams DSL</a>, for example, automatically creates
|
||||
and manages such state stores when you are calling stateful operators such as <code>join()</code> or <code>aggregate()</code>, or when you are windowing a stream.
|
||||
</p>
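    <p>
        As an illustrative sketch (the topic and store names are placeholders, and default serdes are assumed), the following DSL snippet performs a stateful <code>count()</code>; Kafka Streams automatically creates and manages a local, fault-tolerant state store for it, here explicitly named "counts-store":
    </p>
    <pre class="brush: java;">
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

StreamsBuilder builder = new StreamsBuilder();
KStream&lt;String, String&gt; words = builder.stream("word-counts-input");  // placeholder topic
// count() is stateful: the DSL creates and manages the backing state store.
KTable&lt;String, Long&gt; counts = words
    .groupByKey()
    .count(Materialized.&lt;String, Long, KeyValueStore&lt;Bytes, byte[]&gt;&gt;as("counts-store"));
    </pre>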
|
||||
|
||||
<p>
|
||||
Every stream task in a Kafka Streams application may embed one or more local state stores that can be accessed via APIs to store and query data required for processing.
|
||||
Kafka Streams offers fault-tolerance and automatic recovery for such local state stores.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The following diagram shows two stream tasks with their dedicated local state stores.
|
||||
</p>
|
||||
<img class="centered" src="/{{version}}/images/streams-architecture-states.jpg" style="width:400px">
|
||||
<br>
|
||||
|
||||
<h3><a id="streams_architecture_recovery" href="#streams_architecture_recovery">Fault Tolerance</a></h3>
|
||||
|
||||
<p>
|
||||
Kafka Streams builds on fault-tolerance capabilities integrated natively within Kafka. Kafka partitions are highly available and replicated; so when stream data is persisted to Kafka it is available
|
||||
even if the application fails and needs to re-process it. Tasks in Kafka Streams leverage the fault-tolerance capability
|
||||
offered by the Kafka consumer client to handle failures.
|
||||
If a task runs on a machine that fails, Kafka Streams automatically restarts the task in one of the remaining running instances of the application.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
In addition, Kafka Streams makes sure that the local state stores are robust to failures, too. For each state store, it maintains a replicated changelog Kafka topic in which it tracks any state updates.
|
||||
These changelog topics are partitioned as well so that each local state store instance, and hence the task accessing the store, has its own dedicated changelog topic partition.
|
||||
<a href="/{{version}}/documentation/#compaction">Log compaction</a> is enabled on the changelog topics so that old data can be purged safely to prevent the topics from growing indefinitely.
|
||||
If tasks run on a machine that fails and are restarted on another machine, Kafka Streams guarantees to restore their associated state stores to the content before the failure by
|
||||
replaying the corresponding changelog topics prior to resuming the processing on the newly started tasks. As a result, failure handling is completely transparent to the end user.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Note that the cost of task (re)initialization typically depends primarily on the time for restoring the state by replaying the state stores' associated changelog topics.
|
||||
To minimize this restoration time, users can configure their applications to have <b>standby replicas</b> of local states (i.e. fully replicated copies of the state).
|
||||
When a task migration happens, Kafka Streams then attempts to assign a task to an application instance where such a standby replica already exists in order to minimize
|
||||
the task (re)initialization cost. See <code>num.standby.replicas</code> in the <a href="/{{version}}/documentation/#streamsconfigs"><b>Kafka Streams Configs</b></a> section.
|
||||
</p>
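    <p>
        As a minimal sketch, standby replicas are enabled through a single configuration parameter (the value of 1 below is only an example):
    </p>
    <pre class="brush: java;">
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
// Maintain one warm standby copy of each task's local state on another
// instance to shorten state restoration after a failover (default is 0).
props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
    </pre>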
|
||||
|
||||
<div class="pagination">
|
||||
<a href="/{{version}}/documentation/streams/core-concepts" class="pagination__btn pagination__btn__prev">Previous</a>
|
||||
<a href="/{{version}}/documentation/streams/developer-guide" class="pagination__btn pagination__btn__next">Next</a>
|
||||
</div>
|
||||
</script>
|
||||
|
||||
<!--#include virtual="../../includes/_header.htm" -->
|
||||
<!--#include virtual="../../includes/_top.htm" -->
|
||||
<div class="content documentation documentation--current">
|
||||
<!--#include virtual="../../includes/_nav.htm" -->
|
||||
<div class="right">
|
||||
<!--#include virtual="../../includes/_docs_banner.htm" -->
|
||||
<ul class="breadcrumbs">
|
||||
<li><a href="/documentation">Documentation</a></li>
|
||||
<li><a href="/documentation/streams">Kafka Streams</a></li>
|
||||
</ul>
|
||||
<div class="p-content"></div>
|
||||
</div>
|
||||
</div>
|
||||
<!--#include virtual="../../includes/_footer.htm" -->
|
||||
<script>
|
||||
$(function() {
|
||||
// Show selected style on nav item
|
||||
$('.b-nav__streams').addClass('selected');
|
||||
|
||||
// Display docs subnav items
|
||||
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
|
||||
});
|
||||
</script>
|
||||
297
docs/streams/core-concepts.html
Normal file
@@ -0,0 +1,297 @@
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
contributor license agreements. See the NOTICE file distributed with
|
||||
this work for additional information regarding copyright ownership.
|
||||
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
(the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
<script><!--#include virtual="../js/templateData.js" --></script>
|
||||
|
||||
<script id="content-template" type="text/x-handlebars-template">
|
||||
<h1>Core Concepts</h1>
|
||||
<div class="sub-nav-sticky">
|
||||
<div class="sticky-top">
|
||||
<div style="height:35px">
|
||||
<a href="/{{version}}/documentation/streams/">Introduction</a>
|
||||
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
|
||||
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
|
||||
<a class="active-menu-item" href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
|
||||
<a href="/{{version}}/documentation/streams/architecture">Architecture</a>
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/">Developer Guide</a>
|
||||
<a href="/{{version}}/documentation/streams/upgrade-guide">Upgrade</a>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<p>
|
||||
Kafka Streams is a client library for processing and analyzing data stored in Kafka.
|
||||
It builds upon important stream processing concepts such as properly distinguishing between event time and processing time, windowing support, and simple yet efficient management and real-time querying of application state.
|
||||
</p>
|
||||
<p>
|
||||
Kafka Streams has a <b>low barrier to entry</b>: You can quickly write and run a small-scale proof-of-concept on a single machine; and you only need to run additional instances of your application on multiple machines to scale up to high-volume production workloads.
|
||||
Kafka Streams transparently handles the load balancing of multiple instances of the same application by leveraging Kafka's parallelism model.
|
||||
</p>
|
||||
<p>
|
||||
Some highlights of Kafka Streams:
|
||||
</p>
|
||||
|
||||
<ul>
|
||||
<li>Designed as a <b>simple and lightweight client library</b>, which can be easily embedded in any Java application and integrated with any existing packaging, deployment and operational tools that users have for their streaming applications.</li>
|
||||
<li>Has <b>no external dependencies on systems other than Apache Kafka itself</b> as the internal messaging layer; notably, it uses Kafka's partitioning model to horizontally scale processing while maintaining strong ordering guarantees.</li>
|
||||
<li>Supports <b>fault-tolerant local state</b>, which enables very fast and efficient stateful operations like windowed joins and aggregations.</li>
|
||||
<li>Supports <b>exactly-once</b> processing semantics to guarantee that each record will be processed once and only once even when there is a failure on either Streams clients or Kafka brokers in the middle of processing.</li>
|
||||
<li>Employs <b>one-record-at-a-time processing</b> to achieve millisecond processing latency, and supports <b>event-time based windowing operations</b> with out-of-order arrival of records.</li>
|
||||
<li>Offers necessary stream processing primitives, along with a <b>high-level Streams DSL</b> and a <b>low-level Processor API</b>.</li>
|
||||
|
||||
</ul>
|
||||
|
||||
<p>
|
||||
We first summarize the key concepts of Kafka Streams.
|
||||
</p>
|
||||
|
||||
<h3><a id="streams_topology" href="#streams_topology">Stream Processing Topology</a></h3>
|
||||
|
||||
<ul>
|
||||
<li>A <b>stream</b> is the most important abstraction provided by Kafka Streams: it represents an unbounded, continuously updating data set. A stream is an ordered, replayable, and fault-tolerant sequence of immutable data records, where a <b>data record</b> is defined as a key-value pair.</li>
|
||||
<li>A <b>stream processing application</b> is any program that makes use of the Kafka Streams library. It defines its computational logic through one or more <b>processor topologies</b>, where a processor topology is a graph of stream processors (nodes) that are connected by streams (edges).</li>
|
||||
<li>A <a id="defining-a-stream-processor" href="/{{version}}/documentation/streams/developer-guide/processor-api#defining-a-stream-processor"><b>stream processor</b></a> is a node in the processor topology; it represents a processing step to transform data in streams by receiving one input record at a time from its upstream processors in the topology, applying its operation to it, and may subsequently produce one or more output records to its downstream processors. </li>
|
||||
</ul>
|
||||
|
||||
There are two special processors in the topology:
|
||||
|
||||
<ul>
|
||||
<li><b>Source Processor</b>: A source processor is a special type of stream processor that does not have any upstream processors. It produces an input stream to its topology from one or multiple Kafka topics by consuming records from these topics and forwarding them to its down-stream processors.</li>
|
||||
<li><b>Sink Processor</b>: A sink processor is a special type of stream processor that does not have down-stream processors. It sends any received records from its up-stream processors to a specified Kafka topic.</li>
|
||||
</ul>
|
||||
|
||||
    Note that in normal processor nodes, other remote systems can also be accessed while processing the current record. Therefore, the processed results can either be streamed back into Kafka or written to an external system.
|
||||
|
||||
<img class="centered" src="/{{version}}/images/streams-architecture-topology.jpg" style="width:400px">
|
||||
|
||||
<p>
|
||||
Kafka Streams offers two ways to define the stream processing topology: the <a href="/{{version}}/documentation/streams/developer-guide/dsl-api.html"><b>Kafka Streams DSL</b></a> provides
|
||||
the most common data transformation operations such as <code>map</code>, <code>filter</code>, <code>join</code> and <code>aggregations</code> out of the box; the lower-level <a href="/{{version}}/documentation/streams/developer-guide/processor-api.html"><b>Processor API</b></a> allows
|
||||
    developers to define and connect custom processors as well as to interact with <a href="#streams_state">state stores</a>.
|
||||
</p>
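    <p>
        As a small sketch of the DSL (topic names, key/value types, and the filter condition are illustrative only, and default serdes are assumed), the snippet below wires a source processor, one stream processor, and a sink processor into a topology:
    </p>
    <pre class="brush: java;">
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;

StreamsBuilder builder = new StreamsBuilder();
KStream&lt;String, Long&gt; orders = builder.stream("orders");    // source processor
orders.filter((orderId, amount) -&gt; amount &gt; 1000L)           // stream processor
      .to("large-orders");                                    // sink processor
Topology topology = builder.build();
    </pre>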
|
||||
|
||||
<p>
|
||||
A processor topology is merely a logical abstraction for your stream processing code.
|
||||
At runtime, the logical topology is instantiated and replicated inside the application for parallel processing (see <a href="/{{version}}/documentation/streams/architecture#streams_architecture_tasks"><b>Stream Partitions and Tasks</b></a> for details).
|
||||
</p>
|
||||
|
||||
<h3><a id="streams_time" href="#streams_time">Time</a></h3>
|
||||
|
||||
<p>
|
||||
A critical aspect in stream processing is the notion of <b>time</b>, and how it is modeled and integrated.
|
||||
For example, some operations such as <b>windowing</b> are defined based on time boundaries.
|
||||
</p>
|
||||
<p>
|
||||
Common notions of time in streams are:
|
||||
</p>
|
||||
|
||||
<ul>
|
||||
<li><b>Event time</b> - The point in time when an event or data record occurred, i.e. was originally created "at the source". <b>Example:</b> If the event is a geo-location change reported by a GPS sensor in a car, then the associated event-time would be the time when the GPS sensor captured the location change.</li>
|
||||
<li><b>Processing time</b> - The point in time when the event or data record happens to be processed by the stream processing application, i.e. when the record is being consumed. The processing time may be milliseconds, hours, or days etc. later than the original event time. <b>Example:</b> Imagine an analytics application that reads and processes the geo-location data reported from car sensors to present it to a fleet management dashboard. Here, processing-time in the analytics application might be milliseconds or seconds (e.g. for real-time pipelines based on Apache Kafka and Kafka Streams) or hours (e.g. for batch pipelines based on Apache Hadoop or Apache Spark) after event-time.</li>
|
||||
<li><b>Ingestion time</b> - The point in time when an event or data record is stored in a topic partition by a Kafka broker. The difference to event time is that this ingestion timestamp is generated when the record is appended to the target topic by the Kafka broker, not when the record is created "at the source". The difference to processing time is that processing time is when the stream processing application processes the record. <b>For example,</b> if a record is never processed, there is no notion of processing time for it, but it still has an ingestion time.</li>
|
||||
</ul>
|
||||
<p>
|
||||
The choice between event-time and ingestion-time is actually done through the configuration of Kafka (not Kafka Streams): From Kafka 0.10.x onwards, timestamps are automatically embedded into Kafka messages. Depending on Kafka's configuration these timestamps represent event-time or ingestion-time. The respective Kafka configuration setting can be specified on the broker level or per topic. The default timestamp extractor in Kafka Streams will retrieve these embedded timestamps as-is. Hence, the effective time semantics of your application depend on the effective Kafka configuration for these embedded timestamps.
|
||||
</p>
|
||||
<p>
|
||||
Kafka Streams assigns a <b>timestamp</b> to every data record via the <code>TimestampExtractor</code> interface.
|
||||
These per-record timestamps describe the progress of a stream with regards to time and are leveraged by time-dependent operations such as window operations.
|
||||
As a result, this time will only advance when a new record arrives at the processor.
|
||||
We call this data-driven time the <b>stream time</b> of the application to differentiate with the <b>wall-clock time</b> when this application is actually executing.
|
||||
Concrete implementations of the <code>TimestampExtractor</code> interface will then provide different semantics to the stream time definition.
|
||||
    For example, retrieving or computing timestamps based on the actual contents of data records, such as an embedded timestamp field, provides event-time semantics,
    while returning the current wall-clock time yields processing-time semantics.
|
||||
Developers can thus enforce different notions of time depending on their business needs.
|
||||
</p>
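    <p>
        The following is a minimal sketch of a custom <code>TimestampExtractor</code>: it uses the timestamp embedded in the Kafka record when it is valid (event-time or ingestion-time semantics, depending on the broker/topic configuration) and otherwise falls back to wall-clock time. The class name is illustrative; such an extractor would be registered via the <code>default.timestamp.extractor</code> config.
    </p>
    <pre class="brush: java;">
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

public class EmbeddedOrWallclockTimestampExtractor implements TimestampExtractor {
    @Override
    public long extract(final ConsumerRecord&lt;Object, Object&gt; record, final long previousTimestamp) {
        final long embedded = record.timestamp();
        // Negative values indicate that no valid timestamp is embedded in the record.
        return embedded &gt;= 0 ? embedded : System.currentTimeMillis();
    }
}
    </pre>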
|
||||
|
||||
<p>
|
||||
Finally, whenever a Kafka Streams application writes records to Kafka, then it will also assign timestamps to these new records. The way the timestamps are assigned depends on the context:
|
||||
</p>
|
||||
|
||||
<ul>
|
||||
<li> When new output records are generated via processing some input record, for example, <code>context.forward()</code> triggered in the <code>process()</code> function call, output record timestamps are inherited from input record timestamps directly.</li>
|
||||
<li> When new output records are generated via periodic functions such as <code>Punctuator#punctuate()</code>, the output record timestamp is defined as the current internal time (obtained through <code>context.timestamp()</code>) of the stream task.</li>
|
||||
<li> For aggregations, the timestamp of a result update record will be the maximum timestamp of all input records contributing to the result.</li>
|
||||
</ul>
|
||||
|
||||
<p>
|
||||
    Note that the described default behavior can be changed in the Processor API by assigning timestamps to output records explicitly when calling <code>#forward()</code>.
|
||||
</p>
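    <p>
        As a sketch of that override (assuming a Streams version that supports timestamp manipulation via <code>To</code>; the processor and the chosen timestamp are illustrative only):
    </p>
    <pre class="brush: java;">
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.To;

public class ExplicitTimestampProcessor extends AbstractProcessor&lt;String, String&gt; {
    @Override
    public void process(final String key, final String value) {
        // Forward with an explicitly chosen output timestamp instead of
        // inheriting the input record's timestamp.
        final long explicitTimestamp = System.currentTimeMillis();
        context().forward(key, value, To.all().withTimestamp(explicitTimestamp));
    }
}
    </pre>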
|
||||
|
||||
<h3><a id="streams_concepts_aggregations" href="#streams_concepts_aggregations">Aggregations</a></h3>
|
||||
<p>
|
||||
        An <strong>aggregation</strong> operation takes one input stream or table, and yields a new table by combining multiple input records into a single output record. Examples of aggregations are computing counts or sums.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
In the <code>Kafka Streams DSL</code>, an input stream of an <code>aggregation</code> can be a KStream or a KTable, but the output stream will always be a KTable. This allows Kafka Streams to update an aggregate value upon the out-of-order arrival of further records after the value was produced and emitted. When such out-of-order arrival happens, the aggregating KStream or KTable emits a new aggregate value. Because the output is a KTable, the new value is considered to overwrite the old value with the same key in subsequent processing steps.
|
||||
</p>
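    <p>
        As a brief sketch (the topic name and types are illustrative, and default serdes are assumed), the following aggregation sums values per key; because its output is a KTable, an out-of-order record simply produces an updated aggregate value for its key:
    </p>
    <pre class="brush: java;">
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

StreamsBuilder builder = new StreamsBuilder();
KStream&lt;String, Long&gt; views = builder.stream("page-views");  // placeholder topic of (userId, viewCount)
// Each update to the sum overwrites the previous value for that key.
KTable&lt;String, Long&gt; totalViewsPerUser = views
    .groupByKey()
    .reduce(Long::sum);
    </pre>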
|
||||
|
||||
<h3> <a id="streams_concepts_windowing" href="#streams_concepts_windowing">Windowing</a></h3>
|
||||
<p>
|
||||
Windowing lets you control how to <em>group records that have the same key</em> for stateful operations such as <code>aggregations</code> or <code>joins</code> into so-called <em>windows</em>. Windows are tracked per record key.
|
||||
</p>
|
||||
<p>
|
||||
<code>Windowing operations</code> are available in the <code>Kafka Streams DSL</code>. When working with windows, you can specify a <strong>grace period</strong> for the window. This grace period controls how long Kafka Streams will wait for <strong>out-of-order</strong> data records for a given window. If a record arrives after the grace period of a window has passed, the record is discarded and will not be processed in that window. Specifically, a record is discarded if its timestamp dictates it belongs to a window, but the current stream time is greater than the end of the window plus the grace period.
|
||||
</p>
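    <p>
        The following sketch (topic name, window size, and grace period are illustrative; the <code>grace()</code> API assumes a sufficiently recent Streams version) shows a windowed count that keeps accepting out-of-order records for one extra minute after each window ends:
    </p>
    <pre class="brush: java;">
import java.time.Duration;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

StreamsBuilder builder = new StreamsBuilder();
KStream&lt;String, String&gt; clicks = builder.stream("page-clicks");  // placeholder topic
// Records that arrive after window end + grace period are dropped from that window.
KTable&lt;Windowed&lt;String&gt;, Long&gt; clicksPerWindow = clicks
    .groupByKey()
    .windowedBy(TimeWindows.of(Duration.ofMinutes(5)).grace(Duration.ofMinutes(1)))
    .count();
    </pre>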
|
||||
<p>
|
||||
        Out-of-order records are always possible in the real world and should be properly accounted for in your applications. How out-of-order records are handled depends on the effective <code>time semantics</code>. In the case of processing-time, the semantics are "when the record is being processed", which means that the notion of out-of-order records is not applicable as, by definition, no record can be out-of-order. Hence, out-of-order records can only be considered as such for event-time. In both cases, Kafka Streams is able to properly handle out-of-order records.
|
||||
</p>
|
||||
|
||||
<h3><a id="streams_concepts_duality" href="#streams-concepts-duality">Duality of Streams and Tables</a></h3>
|
||||
<p>
|
||||
When implementing stream processing use cases in practice, you typically need both <strong>streams</strong> and also <strong>databases</strong>.
|
||||
An example use case that is very common in practice is an e-commerce application that enriches an incoming <em>stream</em> of customer
|
||||
transactions with the latest customer information from a <em>database table</em>. In other words, streams are everywhere, but databases are everywhere, too.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Any stream processing technology must therefore provide <strong>first-class support for streams and tables</strong>.
|
||||
Kafka's Streams API provides such functionality through its core abstractions for
|
||||
<a id="streams_concepts_kstream" href="/{{version}}/documentation/streams/developer-guide/dsl-api#streams_concepts_kstream">streams</a>
|
||||
and <a id="streams_concepts_ktable" href="/{{version}}/documentation/streams/developer-guide/dsl-api#streams_concepts_ktable">tables</a>,
|
||||
which we will talk about in a minute. Now, an interesting observation is that there is actually a <strong>close relationship between streams and tables</strong>,
|
||||
the so-called stream-table duality. And Kafka exploits this duality in many ways: for example, to make your applications
|
||||
<a id="streams-developer-guide-execution-scaling" href="/{{version}}/documentation/streams/developer-guide/running-app#elastic-scaling-of-your-application">elastic</a>,
|
||||
to support <a id="streams_architecture_recovery" href="/{{version}}/documentation/streams/architecture#streams_architecture_recovery">fault-tolerant stateful processing</a>,
|
||||
or to run <a id="streams-developer-guide-interactive-queries" href="/{{version}}/documentation/streams/developer-guide/interactive-queries#interactive-queries">interactive queries</a>
|
||||
against your application's latest processing results. And, beyond its internal usage, the Kafka Streams API
|
||||
also allows developers to exploit this duality in their own applications.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Before we discuss concepts such as <a id="streams-developer-guide-dsl-aggregating" href="/{{version}}/documentation/streams/developer-guide/dsl-api#aggregating">aggregations</a>
|
||||
in Kafka Streams, we must first introduce <strong>tables</strong> in more detail, and talk about the aforementioned stream-table duality.
|
||||
Essentially, this duality means that a stream can be viewed as a table, and a table can be viewed as a stream.
|
||||
</p>
|
||||
|
||||
<h3><a id="streams_state" href="#streams_state">States</a></h3>
|
||||
|
||||
<p>
|
||||
Some stream processing applications don't require state, which means the processing of a message is independent from
|
||||
the processing of all other messages.
|
||||
However, being able to maintain state opens up many possibilities for sophisticated stream processing applications: you
|
||||
can join input streams, or group and aggregate data records. Many such stateful operators are provided by the <a href="/{{version}}/documentation/streams/developer-guide/dsl-api.html"><b>Kafka Streams DSL</b></a>.
|
||||
</p>
|
||||
<p>
|
||||
Kafka Streams provides so-called <b>state stores</b>, which can be used by stream processing applications to store and query data.
|
||||
This is an important capability when implementing stateful operations.
|
||||
Every task in Kafka Streams embeds one or more state stores that can be accessed via APIs to store and query data required for processing.
|
||||
These state stores can either be a persistent key-value store, an in-memory hashmap, or another convenient data structure.
|
||||
Kafka Streams offers fault-tolerance and automatic recovery for local state stores.
|
||||
</p>
|
||||
<p>
|
||||
Kafka Streams allows direct read-only queries of the state stores by methods, threads, processes or applications external to the stream processing application that created the state stores. This is provided through a feature called <b>Interactive Queries</b>. All stores are named and Interactive Queries exposes only the read operations of the underlying implementation.
|
||||
</p>
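    <p>
        As a minimal sketch of such a read-only query ("counts-store" is a hypothetical store name assigned when the topology was built; the two-argument <code>store()</code> overload shown here is the pre-2.5 API):
    </p>
    <pre class="brush: java;">
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public static Long lookupCount(final KafkaStreams streams, final String key) {
    final ReadOnlyKeyValueStore&lt;String, Long&gt; store =
        streams.store("counts-store", QueryableStoreTypes.keyValueStore());
    return store.get(key);  // read-only point lookup against the local state store
}
    </pre>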
|
||||
<br>
|
||||
|
||||
<h2><a id="streams_processing_guarantee" href="#streams_processing_guarantee">Processing Guarantees</a></h2>
|
||||
|
||||
<p>
|
||||
        In stream processing, one of the most frequently asked questions is "does my stream processing system guarantee that each record is processed once and only once, even if some failures are encountered in the middle of processing?"
|
||||
Failing to guarantee exactly-once stream processing is a deal-breaker for many applications that cannot tolerate any data-loss or data duplicates, and in that case a batch-oriented framework is usually used in addition
|
||||
to the stream processing pipeline, known as the <a href="http://lambda-architecture.net/">Lambda Architecture</a>.
|
||||
        Prior to 0.11.0.0, Kafka only provided at-least-once delivery guarantees, and hence any stream processing system that leveraged it as the backend storage could not guarantee end-to-end exactly-once semantics.
|
||||
In fact, even for those stream processing systems that claim to support exactly-once processing, as long as they are reading from / writing to Kafka as the source / sink, their applications cannot actually guarantee that
|
||||
no duplicates will be generated throughout the pipeline.
|
||||
|
||||
Since the 0.11.0.0 release, Kafka has added support to allow its producers to send messages to different topic partitions in a <a href="https://kafka.apache.org/documentation/#semantics">transactional and idempotent manner</a>,
|
||||
and Kafka Streams has hence added the end-to-end exactly-once processing semantics by leveraging these features.
|
||||
More specifically, it guarantees that for any record read from the source Kafka topics, its processing results will be reflected exactly once in the output Kafka topic as well as in the state stores for stateful operations.
|
||||
        Note that the key difference between Kafka Streams' end-to-end exactly-once guarantee and other stream processing frameworks' claimed guarantees is that Kafka Streams tightly integrates with the underlying Kafka storage system and ensures that
        commits on the input topic offsets, updates on the state stores, and writes to the output topics will be completed atomically instead of treating Kafka as an external system that may have side-effects.
|
||||
To read more details on how this is done inside Kafka Streams, readers are recommended to read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics">KIP-129</a>.
|
||||
|
||||
In order to achieve exactly-once semantics when running Kafka Streams applications, users can simply set the <code>processing.guarantee</code> config value to <b>exactly_once</b> (default value is <b>at_least_once</b>).
|
||||
More details can be found in the <a href="/{{version}}/documentation#streamsconfigs"><b>Kafka Streams Configs</b></a> section.
|
||||
</p>
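    <p>
        As a minimal sketch, enabling exactly-once processing is a one-line configuration change:
    </p>
    <pre class="brush: java;">
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
// Switch from the default "at_least_once" to end-to-end exactly-once processing.
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
    </pre>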
|
||||
|
||||
<h3><a id="streams_out_of_ordering" href="#streams_out_of_ordering">Out-of-Order Handling</a></h3>
|
||||
|
||||
<p>
|
||||
        Besides the guarantee that each record will be processed exactly-once, another issue that many stream processing applications face is how to
        handle <a href="https://dl.acm.org/citation.cfm?id=3242155">out-of-order data</a> that may impact their business logic. In Kafka Streams, there are two causes that could potentially
|
||||
result in out-of-order data arrivals with respect to their timestamps:
|
||||
</p>
|
||||
|
||||
<ul>
|
||||
        <li> Within a topic-partition, records' timestamps may not be monotonically increasing along with their offsets. Since Kafka Streams always tries to process records within a topic-partition in offset order,
            it can cause records with larger timestamps (but smaller offsets) to be processed earlier than records with smaller timestamps (but larger offsets) in the same topic-partition.
|
||||
</li>
|
||||
        <li> Within a <a href="/{{version}}/documentation/streams/architecture#streams_architecture_tasks">stream task</a> that may be processing multiple topic-partitions, if users configure the application to not wait for all partitions to contain some buffered data and
            pick from the partition with the smallest timestamp to process the next record, then later on, when some records are fetched for other topic-partitions, their timestamps may be smaller than those of records already processed from another topic-partition.
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>
|
||||
For stateless operations, out-of-order data will not impact processing logic since only one record is considered at a time, without looking into the history of past processed records;
|
||||
for stateful operations such as aggregations and joins, however, out-of-order data could cause the processing logic to be incorrect. If users want to handle such out-of-order data, generally they need to allow their applications
|
||||
        to wait for a longer time while bookkeeping their state during the wait time, i.e., making trade-off decisions between latency, cost, and correctness.
        In Kafka Streams specifically, users can configure their window operators for windowed aggregations to achieve such trade-offs (details can be found in the <a href="/{{version}}/documentation/streams/developer-guide"><b>Developer Guide</b></a>).
        As for joins, users have to be aware that some out-of-order data cannot yet be handled by trading off increased latency and cost in Streams:
|
||||
</p>
|
||||
|
||||
<ul>
|
||||
        <li> For Stream-Stream joins, all three types (inner, outer, left) handle out-of-order records correctly, but the resulting stream may contain unnecessary leftRecord-null results for left joins, and leftRecord-null or null-rightRecord results for outer joins. </li>
|
||||
<li> For Stream-Table joins, out-of-order records are not handled (i.e., Streams applications don't check for out-of-order records and just process all records in offset order), and hence it may produce unpredictable results. </li>
|
||||
<li> For Table-Table joins, out-of-order records are not handled (i.e., Streams applications don't check for out-of-order records and just process all records in offset order). However, the join result is a changelog stream and hence will be eventually consistent. </li>
|
||||
</ul>
|
||||
|
||||
<div class="pagination">
|
||||
<a href="/{{version}}/documentation/streams/tutorial" class="pagination__btn pagination__btn__prev">Previous</a>
|
||||
<a href="/{{version}}/documentation/streams/architecture" class="pagination__btn pagination__btn__next">Next</a>
|
||||
</div>
|
||||
</script>
|
||||
|
||||
<!--#include virtual="../../includes/_header.htm" -->
|
||||
<!--#include virtual="../../includes/_top.htm" -->
|
||||
<div class="content documentation documentation--current">
|
||||
<!--#include virtual="../../includes/_nav.htm" -->
|
||||
<div class="right">
|
||||
<!--#include virtual="../../includes/_docs_banner.htm" -->
|
||||
<ul class="breadcrumbs">
|
||||
<li><a href="/documentation">Documentation</a></li>
|
||||
<li><a href="/documentation/streams">Kafka Streams</a></li>
|
||||
</ul>
|
||||
<div class="p-content"></div>
|
||||
</div>
|
||||
</div>
|
||||
<!--#include virtual="../../includes/_footer.htm" -->
|
||||
<script>
|
||||
$(function() {
|
||||
// Show selected style on nav item
|
||||
$('.b-nav__streams').addClass('selected');
|
||||
|
||||
|
||||
//sticky secondary nav
|
||||
var $navbar = $(".sub-nav-sticky"),
|
||||
y_pos = $navbar.offset().top,
|
||||
height = $navbar.height();
|
||||
|
||||
$(window).scroll(function() {
|
||||
var scrollTop = $(window).scrollTop();
|
||||
|
||||
if (scrollTop > y_pos - height) {
|
||||
$navbar.addClass("navbar-fixed")
|
||||
} else if (scrollTop <= y_pos) {
|
||||
$navbar.removeClass("navbar-fixed")
|
||||
}
|
||||
});
|
||||
|
||||
// Display docs subnav items
|
||||
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
|
||||
});
|
||||
</script>
|
||||
196
docs/streams/developer-guide/app-reset-tool.html
Normal file
@@ -0,0 +1,196 @@
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
contributor license agreements. See the NOTICE file distributed with
|
||||
this work for additional information regarding copyright ownership.
|
||||
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
(the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
<script><!--#include virtual="../../js/templateData.js" --></script>
|
||||
|
||||
<script id="content-template" type="text/x-handlebars-template">
|
||||
<!-- h1>Developer Guide for Kafka Streams</h1 -->
|
||||
<div class="sub-nav-sticky">
|
||||
<div class="sticky-top">
|
||||
<!-- div style="height:35px">
|
||||
<a href="/{{version}}/documentation/streams/">Introduction</a>
|
||||
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
|
||||
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
|
||||
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
|
||||
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
|
||||
</div -->
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="section" id="application-reset-tool">
|
||||
<span id="streams-developer-guide-app-reset"></span><h1>Application Reset Tool<a class="headerlink" href="#application-reset-tool" title="Permalink to this headline"></a></h1>
|
||||
<p>You can reset an application and force it to reprocess its data from scratch by using the application reset tool.
|
||||
This can be useful for development and testing, or when fixing bugs.</p>
|
||||
<p>The application reset tool handles the Kafka Streams <a class="reference internal" href="manage-topics.html#streams-developer-guide-topics-user"><span class="std std-ref">user topics</span></a> (input,
|
||||
output, and intermediate topics) and <a class="reference internal" href="manage-topics.html#streams-developer-guide-topics-internal"><span class="std std-ref">internal topics</span></a> differently
|
||||
when resetting the application.</p>
|
||||
<p>Here’s what the application reset tool does for each topic type:</p>
|
||||
<ul class="simple">
|
||||
<li>Input topics: Reset offsets to specified position (by default to the beginning of the topic).</li>
|
||||
<li>Intermediate topics: Skip to the end of the topic, i.e., set the application’s committed consumer offsets for all partitions to each partition’s <code class="docutils literal"><span class="pre">logSize</span></code> (for consumer group <code class="docutils literal"><span class="pre">application.id</span></code>).</li>
|
||||
<li>Internal topics: Delete the internal topic (this automatically deletes any committed offsets).</li>
|
||||
</ul>
|
||||
<p>The application reset tool does not:</p>
|
||||
<ul class="simple">
|
||||
<li>Reset output topics of an application. If any output (or intermediate) topics are consumed by downstream
|
||||
applications, it is your responsibility to adjust those downstream applications as appropriate when you reset the
|
||||
upstream application.</li>
|
||||
<li>Reset the local environment of your application instances. It is your responsibility to delete the local
|
||||
state on any machine on which an application instance was run. See the instructions in section
|
||||
<a class="reference internal" href="#streams-developer-guide-reset-local-environment"><span class="std std-ref">Step 2: Reset the local environments of your application instances</span></a> on how to do this.</li>
|
||||
</ul>
|
||||
<dl class="docutils">
|
||||
<dt>Prerequisites</dt>
|
||||
<dd><ul class="first last">
|
||||
<li><p class="first">All instances of your application must be stopped. Otherwise, the application may enter an invalid state, crash, or produce incorrect results. You can verify whether the consumer group with ID <code class="docutils literal"><span class="pre">application.id</span></code> is still active by using <code class="docutils literal"><span class="pre">bin/kafka-consumer-groups</span></code>.</p>
|
||||
</li>
|
||||
<li><p class="first">Use this tool with care and double-check its parameters: If you provide wrong parameter values (e.g., typos in <code class="docutils literal"><span class="pre">application.id</span></code>) or specify parameters inconsistently (e.g., specify the wrong input topics for the application), this tool might invalidate the application’s state or even impact other applications, consumer groups, or your Kafka topics.</p>
|
||||
</li>
|
||||
<li><p class="first">You should manually delete and re-create any intermediate topics before running the application reset tool. This will free up disk space in Kafka brokers.</p>
|
||||
</li>
|
||||
<li><p class="first">You should delete and recreate intermediate topics before running the application reset tool, unless the following applies:</p>
|
||||
<blockquote>
|
||||
<div><ul class="simple">
|
||||
<li>You have external downstream consumers for the application’s intermediate topics.</li>
|
||||
<li>You are in a development environment where manually deleting and re-creating intermediate topics is unnecessary.</li>
|
||||
</ul>
|
||||
</div></blockquote>
|
||||
</li>
|
||||
</ul>
|
||||
</dd>
|
||||
</dl>
|
||||
<div class="section" id="step-1-run-the-application-reset-tool">
|
||||
<h2>Step 1: Run the application reset tool<a class="headerlink" href="#step-1-run-the-application-reset-tool" title="Permalink to this headline"></a></h2>
|
||||
              <p>Invoke the application reset tool from the command line:</p>
|
||||
<div class="highlight-bash"><div class="highlight"><pre><span></span><path-to-kafka>/bin/kafka-streams-application-reset
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>The tool accepts the following parameters:</p>
|
||||
<div class="highlight-bash"><div class="highlight"><pre><span></span>Option <span class="o">(</span>* <span class="o">=</span> required<span class="o">)</span> Description
|
||||
--------------------- -----------
|
||||
* --application-id <String: id> The Kafka Streams application ID
|
||||
<span class="o">(</span>application.id<span class="o">)</span>.
|
||||
--bootstrap-servers <String: urls> Comma-separated list of broker urls with
|
||||
format: HOST1:PORT1,HOST2:PORT2
|
||||
<span class="o">(</span>default: localhost:9092<span class="o">)</span>
|
||||
--by-duration <String: urls> Reset offsets to offset by duration from
|
||||
current timestamp. Format: '<span>PnDTnHnMnS</span>'
|
||||
--config-file <String: file name> Property file containing configs to be
|
||||
passed to admin clients and embedded
|
||||
consumer.
|
||||
--dry-run Display the actions that would be
|
||||
performed without executing the reset
|
||||
commands.
|
||||
--from-file <String: urls> Reset offsets to values defined in CSV
|
||||
file.
|
||||
--input-topics <String: list> Comma-separated list of user input
|
||||
topics. For these topics, the tool will
|
||||
reset the offset to the earliest
|
||||
available offset.
|
||||
--intermediate-topics <String: list> Comma-separated list of intermediate user
|
||||
topics <span class="o">(</span>topics used in the through<span class="o">()</span>
|
||||
method<span class="o">)</span>. For these topics, the tool
|
||||
will skip to the end.
|
||||
--shift-by <Long: number-of-offsets> Reset offsets shifting current offset by
|
||||
'n', where 'n' can be positive or
|
||||
negative
|
||||
--to-datetime <String> Reset offsets to offset from datetime.
|
||||
Format: 'YYYY-MM-DDTHH:mm:SS.sss'
|
||||
--to-earliest Reset offsets to earliest offset.
|
||||
--to-latest Reset offsets to latest offset.
|
||||
--to-offset <Long> Reset offsets to a specific offset.
|
||||
--zookeeper Zookeeper option is deprecated by
|
||||
bootstrap.servers, as the reset tool
|
||||
would no longer access Zookeeper
|
||||
directly.
|
||||
</pre></div>
|
||||
</div>
|
||||
              <p>The following parameters determine the reset-offset scenario for <code>input-topics</code>:</p>
|
||||
<ul>
|
||||
<li> by-duration</li>
|
||||
<li> from-file</li>
|
||||
<li> shift-by</li>
|
||||
<li> to-datetime</li>
|
||||
<li> to-earliest</li>
|
||||
<li> to-latest</li>
|
||||
<li> to-offset</li>
|
||||
</ul>
|
||||
              <p>Only one of these scenarios can be defined. If none is specified, <code>to-earliest</code> will be executed by default.</p>
|
||||
<p>All the other parameters can be combined as needed. For example, if you want to restart an application from an
|
||||
empty internal state, but not reprocess previous data, simply omit the parameters <code class="docutils literal"><span class="pre">--input-topics</span></code> and
|
||||
<code class="docutils literal"><span class="pre">--intermediate-topics</span></code>.</p>
|
||||
</div>
|
||||
<div class="section" id="step-2-reset-the-local-environments-of-your-application-instances">
|
||||
<span id="streams-developer-guide-reset-local-environment"></span><h2>Step 2: Reset the local environments of your application instances<a class="headerlink" href="#step-2-reset-the-local-environments-of-your-application-instances" title="Permalink to this headline"></a></h2>
|
||||
<p>For a complete application reset, you must delete the application’s local state directory on any machines where the
|
||||
application instance was run. You must do this before restarting an application instance on the same machine. You can
|
||||
use either of these methods:</p>
|
||||
<ul class="simple">
|
||||
                <li>The API method <code class="docutils literal"><span class="pre">KafkaStreams#cleanUp()</span></code> in your application code (see the sketch after this list).</li>
|
||||
<li>Manually delete the corresponding local state directory (default location: <code class="docutils literal"><span class="pre">/tmp/kafka-streams/<application.id></span></code>). For more information, see <a href="/{{version}}/javadoc/org/apache/kafka/streams/StreamsConfig.html#STATE_DIR_CONFIG">Streams</a> javadocs.</li>
|
||||
</ul>
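              <p>The following is a minimal sketch of the first option (the method name is illustrative; <code class="docutils literal"><span class="pre">cleanUp()</span></code> must only be called while the application instance is not running, e.g. right before <code class="docutils literal"><span class="pre">start()</span></code>):</p>
              <div class="highlight-java"><div class="highlight"><pre>
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.Topology;

public static KafkaStreams startAfterReset(final Topology topology, final Properties props) {
    final KafkaStreams streams = new KafkaStreams(topology, props);
    streams.cleanUp();   // deletes this instance's local state directory
    streams.start();
    return streams;
}
</pre></div>
              </div>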
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<div class="pagination">
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/security" class="pagination__btn pagination__btn__prev">Previous</a>
|
||||
<a href="/{{version}}/documentation/streams/upgrade-guide" class="pagination__btn pagination__btn__next">Next</a>
|
||||
</div>
|
||||
</script>
|
||||
|
||||
<!--#include virtual="../../../includes/_header.htm" -->
|
||||
<!--#include virtual="../../../includes/_top.htm" -->
|
||||
<div class="content documentation documentation--current">
|
||||
<!--#include virtual="../../../includes/_nav.htm" -->
|
||||
<div class="right">
|
||||
<!--#include virtual="../../../includes/_docs_banner.htm" -->
|
||||
<ul class="breadcrumbs">
|
||||
<li><a href="/documentation">Documentation</a></li>
|
||||
<li><a href="/documentation/streams">Kafka Streams</a></li>
|
||||
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
|
||||
</ul>
|
||||
<div class="p-content"></div>
|
||||
</div>
|
||||
</div>
|
||||
<!--#include virtual="../../../includes/_footer.htm" -->
|
||||
<script>
|
||||
$(function() {
|
||||
// Show selected style on nav item
|
||||
$('.b-nav__streams').addClass('selected');
|
||||
|
||||
//sticky secondary nav
|
||||
var $navbar = $(".sub-nav-sticky"),
|
||||
y_pos = $navbar.offset().top,
|
||||
height = $navbar.height();
|
||||
|
||||
$(window).scroll(function() {
|
||||
var scrollTop = $(window).scrollTop();
|
||||
|
||||
if (scrollTop > y_pos - height) {
|
||||
$navbar.addClass("navbar-fixed")
|
||||
} else if (scrollTop <= y_pos) {
|
||||
$navbar.removeClass("navbar-fixed")
|
||||
}
|
||||
});
|
||||
|
||||
// Display docs subnav items
|
||||
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
|
||||
});
|
||||
</script>
|
||||
858
docs/streams/developer-guide/config-streams.html
Normal file
@@ -0,0 +1,858 @@
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
contributor license agreements. See the NOTICE file distributed with
|
||||
this work for additional information regarding copyright ownership.
|
||||
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
(the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
<script><!--#include virtual="../../js/templateData.js" --></script>
|
||||
|
||||
<script id="content-template" type="text/x-handlebars-template">
|
||||
<!-- h1>Developer Guide for Kafka Streams</h1 -->
|
||||
<div class="sub-nav-sticky">
|
||||
<div class="sticky-top">
|
||||
<!-- div style="height:35px">
|
||||
<a href="/{{version}}/documentation/streams/">Introduction</a>
|
||||
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
|
||||
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
|
||||
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
|
||||
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
|
||||
</div -->
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
<div class="section" id="configuring-a-streams-application">
|
||||
<span id="streams-developer-guide-configuration"></span><h1>Configuring a Streams Application<a class="headerlink" href="#configuring-a-streams-application" title="Permalink to this headline"></a></h1>
|
||||
<p>Kafka and Kafka Streams configuration options must be configured before using Streams. You can configure Kafka Streams by specifying parameters in a <code class="docutils literal"><span class="pre">java.util.Properties</span></code> instance.</p>
|
||||
<ol class="arabic">
|
||||
<li><p class="first">Create a <code class="docutils literal"><span class="pre">java.util.Properties</span></code> instance.</p>
|
||||
</li>
|
||||
<li><p class="first">Set the <a class="reference internal" href="#streams-developer-guide-required-configs"><span class="std std-ref">parameters</span></a>. For example:</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Properties</span><span class="o">;</span>
|
||||
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.StreamsConfig</span><span class="o">;</span>
|
||||
|
||||
<span class="n">Properties</span> <span class="n">settings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
|
||||
<span class="c1">// Set a few key parameters</span>
|
||||
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">APPLICATION_ID_CONFIG</span><span class="o">,</span> <span class="s">"my-first-streams-application"</span><span class="o">);</span>
|
||||
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">BOOTSTRAP_SERVERS_CONFIG</span><span class="o">,</span> <span class="s">"kafka-broker1:9092"</span><span class="o">);</span>
|
||||
<span class="c1">// Any further settings</span>
|
||||
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(...</span> <span class="o">,</span> <span class="o">...);</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</li>
|
||||
</ol>
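  <p>Once populated, the <code class="docutils literal"><span class="pre">Properties</span></code> instance is typically passed to the <code class="docutils literal"><span class="pre">KafkaStreams</span></code> constructor together with a topology. The following is a minimal sketch; the topology itself is left empty here and would be defined by your application:</p>
  <div class="highlight-java"><div class="highlight"><pre>
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;

StreamsBuilder builder = new StreamsBuilder();
// ... define sources, processors, and sinks on "builder" ...
Topology topology = builder.build();

// "settings" is the Properties instance configured in the steps above.
KafkaStreams streams = new KafkaStreams(topology, settings);
</pre></div>
  </div>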
|
||||
<div class="section" id="configuration-parameter-reference">
|
||||
<span id="streams-developer-guide-required-configs"></span><h2>Configuration parameter reference<a class="headerlink" href="#configuration-parameter-reference" title="Permalink to this headline"></a></h2>
|
||||
<p>This section contains the most common Streams configuration parameters. For a full reference, see the <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/StreamsConfig.html">Streams</a> Javadocs.</p>
|
||||
<div class="contents local topic" id="contents">
|
||||
<ul class="simple">
|
||||
<li><a class="reference internal" href="#required-configuration-parameters" id="id3">Required configuration parameters</a><ul>
|
||||
<li><a class="reference internal" href="#application-id" id="id4">application.id</a></li>
|
||||
<li><a class="reference internal" href="#bootstrap-servers" id="id5">bootstrap.servers</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a class="reference internal" href="#optional-configuration-parameters" id="id6">Optional configuration parameters</a><ul>
|
||||
<li><a class="reference internal" href="#default-deserialization-exception-handler" id="id7">default.deserialization.exception.handler</a></li>
|
||||
<li><a class="reference internal" href="#default-production-exception-handler" id="id24">default.production.exception.handler</a></li>
|
||||
<li><a class="reference internal" href="#default-key-serde" id="id8">default.key.serde</a></li>
|
||||
<li><a class="reference internal" href="#default-value-serde" id="id9">default.value.serde</a></li>
|
||||
<li><a class="reference internal" href="#num-standby-replicas" id="id10">num.standby.replicas</a></li>
|
||||
<li><a class="reference internal" href="#num-stream-threads" id="id11">num.stream.threads</a></li>
|
||||
<li><a class="reference internal" href="#partition-grouper" id="id12">partition.grouper</a></li>
|
||||
<li><a class="reference internal" href="#processing-guarantee" id="id25">processing.guarantee</a></li>
|
||||
<li><a class="reference internal" href="#replication-factor" id="id13">replication.factor</a></li>
|
||||
<li><a class="reference internal" href="#state-dir" id="id14">state.dir</a></li>
|
||||
<li><a class="reference internal" href="#timestamp-extractor" id="id15">timestamp.extractor</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a class="reference internal" href="#kafka-consumers-and-producer-configuration-parameters" id="id16">Kafka consumers and producer configuration parameters</a><ul>
|
||||
<li><a class="reference internal" href="#naming" id="id17">Naming</a></li>
|
||||
<li><a class="reference internal" href="#default-values" id="id18">Default Values</a></li>
|
||||
<li><a class="reference internal" href="#enable-auto-commit" id="id19">enable.auto.commit</a></li>
|
||||
<li><a class="reference internal" href="#rocksdb-config-setter" id="id20">rocksdb.config.setter</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a class="reference internal" href="#recommended-configuration-parameters-for-resiliency" id="id21">Recommended configuration parameters for resiliency</a><ul>
|
||||
<li><a class="reference internal" href="#acks" id="id22">acks</a></li>
|
||||
<li><a class="reference internal" href="#id2" id="id23">replication.factor</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="section" id="required-configuration-parameters">
|
||||
<h3><a class="toc-backref" href="#id3">Required configuration parameters</a><a class="headerlink" href="#required-configuration-parameters" title="Permalink to this headline"></a></h3>
|
||||
<p>Here are the required Streams configuration parameters.</p>
|
||||
<table border="1" class="non-scrolling-table docutils">
|
||||
<thead valign="bottom">
|
||||
<tr class="row-odd"><th class="head">Parameter Name</th>
|
||||
<th class="head">Importance</th>
|
||||
<th class="head" colspan="2">Description</th>
|
||||
<th class="head">Default Value</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody valign="top">
|
||||
<tr class="row-even"><td>application.id</td>
|
||||
<td>Required</td>
|
||||
<td colspan="2">An identifier for the stream processing application. Must be unique within the Kafka cluster.</td>
|
||||
<td>None</td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td>bootstrap.servers</td>
|
||||
<td>Required</td>
|
||||
<td colspan="2">A list of host/port pairs to use for establishing the initial connection to the Kafka cluster.</td>
|
||||
<td>None</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<div class="section" id="application-id">
|
||||
<h4><a class="toc-backref" href="#id4">application.id</a><a class="headerlink" href="#application-id" title="Permalink to this headline"></a></h4>
|
||||
<blockquote>
|
||||
<div><p>(Required) The application ID. Each stream processing application must have a unique ID. The same ID must be given to
|
||||
all instances of the application. It is recommended to use only alphanumeric characters, <code class="docutils literal"><span class="pre">.</span></code> (dot), <code class="docutils literal"><span class="pre">-</span></code> (hyphen), and <code class="docutils literal"><span class="pre">_</span></code> (underscore). Examples: <code class="docutils literal"><span class="pre">"hello_world"</span></code>, <code class="docutils literal"><span class="pre">"hello_world-v1.0.0"</span></code></p>
|
||||
<p>This ID is used in the following places to isolate resources used by the application from others:</p>
|
||||
<ul class="simple">
|
||||
<li>As the default Kafka consumer and producer <code class="docutils literal"><span class="pre">client.id</span></code> prefix</li>
|
||||
<li>As the Kafka consumer <code class="docutils literal"><span class="pre">group.id</span></code> for coordination</li>
|
||||
<li>As the name of the subdirectory in the state directory (cf. <code class="docutils literal"><span class="pre">state.dir</span></code>)</li>
|
||||
<li>As the prefix of internal Kafka topic names</li>
|
||||
</ul>
|
||||
<dl class="docutils">
|
||||
<dt>Tip:</dt>
|
||||
<dd>When an application is updated, the <code class="docutils literal"><span class="pre">application.id</span></code> should be changed unless you want to reuse the existing data in internal topics and state stores.
|
||||
For example, you could embed the version information within <code class="docutils literal"><span class="pre">application.id</span></code>, as <code class="docutils literal"><span class="pre">my-app-v1.0.0</span></code> and <code class="docutils literal"><span class="pre">my-app-v1.0.2</span></code>.</dd>
|
||||
</dl>
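<p>For example, a minimal sketch of setting the application ID with an embedded version, following the tip above:</p>
<pre class="brush: java;">
Properties settings = new Properties();
// The application ID also becomes the consumer group.id and the prefix of internal topic names
settings.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app-v1.0.0");
</pre>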
|
||||
</div></blockquote>
|
||||
</div>
|
||||
<div class="section" id="bootstrap-servers">
|
||||
<h4><a class="toc-backref" href="#id5">bootstrap.servers</a><a class="headerlink" href="#bootstrap-servers" title="Permalink to this headline"></a></h4>
|
||||
<blockquote>
|
||||
<div><p>(Required) The Kafka bootstrap servers. This is the same <a class="reference external" href="http://kafka.apache.org/documentation.html#producerconfigs">setting</a> that is used by the underlying producer and consumer clients to connect to the Kafka cluster.
|
||||
Example: <code class="docutils literal"><span class="pre">"kafka-broker1:9092,kafka-broker2:9092"</span></code>.</p>
|
||||
<dl class="docutils">
|
||||
<dt>Tip:</dt>
|
||||
<dd>Kafka Streams applications can only communicate with a single Kafka cluster specified by this config value.
|
||||
Future versions of Kafka Streams will support connecting to different Kafka clusters for reading input
|
||||
streams and writing output streams.</dd>
|
||||
</dl>
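<p>For example, a minimal sketch that points the application at two brokers of the same cluster (host names are placeholders):</p>
<pre class="brush: java;">
Properties settings = new Properties();
settings.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker1:9092,kafka-broker2:9092");
</pre>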
|
||||
</div></blockquote>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="optional-configuration-parameters">
|
||||
<span id="streams-developer-guide-optional-configs"></span><h3><a class="toc-backref" href="#id6">Optional configuration parameters</a><a class="headerlink" href="#optional-configuration-parameters" title="Permalink to this headline"></a></h3>
|
||||
<p>Here are the optional <a href="/{{version}}/javadoc/org/apache/kafka/streams/StreamsConfig.html">Streams</a> configuration parameters, sorted by level of importance:</p>
|
||||
<blockquote>
|
||||
<div><ul class="simple">
|
||||
<li>High: These parameters can have a significant impact on performance. Take care when deciding the values of these parameters.</li>
|
||||
<li>Medium: These parameters can have some impact on performance. Your specific environment will determine how much tuning effort should be focused on these parameters.</li>
|
||||
<li>Low: These parameters have a less general or less significant impact on performance.</li>
|
||||
</ul>
|
||||
</div></blockquote>
|
||||
<table border="1" class="non-scrolling-table docutils">
|
||||
<thead valign="bottom">
|
||||
<tr class="row-odd"><th class="head">Parameter Name</th>
|
||||
<th class="head">Importance</th>
|
||||
<th class="head" colspan="2">Description</th>
|
||||
<th class="head">Default Value</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody valign="top">
|
||||
<tr class="row-even"><td>application.server</td>
|
||||
<td>Low</td>
|
||||
<td colspan="2">A host:port pair pointing to an embedded user defined endpoint that can be used for discovering the locations of
|
||||
state stores within a single Kafka Streams application. The value of this must be different for each instance
|
||||
of the application.</td>
|
||||
<td>the empty string</td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td>buffered.records.per.partition</td>
|
||||
<td>Low</td>
|
||||
<td colspan="2">The maximum number of records to buffer per partition.</td>
|
||||
<td>1000</td>
|
||||
</tr>
|
||||
<tr class="row-even"><td>cache.max.bytes.buffering</td>
|
||||
<td>Medium</td>
|
||||
<td colspan="2">Maximum number of memory bytes to be used for record caches across all threads.</td>
|
||||
<td>10485760 bytes</td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td>client.id</td>
|
||||
<td>Medium</td>
|
||||
<td colspan="2">An ID string to pass to the server when making requests.
|
||||
(This setting is passed to the consumer/producer clients used internally by Kafka Streams.)</td>
|
||||
<td>the empty string</td>
|
||||
</tr>
|
||||
<tr class="row-even"><td>commit.interval.ms</td>
|
||||
<td>Low</td>
|
||||
<td colspan="2">The frequency with which to save the position (offsets in source topics) of tasks.</td>
|
||||
<td>30000 milliseconds</td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td>default.deserialization.exception.handler</td>
|
||||
<td>Medium</td>
|
||||
<td colspan="2">Exception handling class that implements the <code class="docutils literal"><span class="pre">DeserializationExceptionHandler</span></code> interface.</td>
|
||||
<td><code class="docutils literal"><span class="pre">LogAndContinueExceptionHandler</span></code></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td>default.production.exception.handler</td>
|
||||
<td>Medium</td>
|
||||
<td colspan="2">Exception handling class that implements the <code class="docutils literal"><span class="pre">ProductionExceptionHandler</span></code> interface.</td>
|
||||
<td><code class="docutils literal"><span class="pre">DefaultProductionExceptionHandler</span></code></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td>key.serde</td>
|
||||
<td>Medium</td>
|
||||
<td colspan="2">Default serializer/deserializer class for record keys, implements the <code class="docutils literal"><span class="pre">Serde</span></code> interface (see also value.serde).</td>
|
||||
<td><code class="docutils literal"><span class="pre">Serdes.ByteArray().getClass().getName()</span></code></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td>metric.reporters</td>
|
||||
<td>Low</td>
|
||||
<td colspan="2">A list of classes to use as metrics reporters.</td>
|
||||
<td>the empty list</td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td>metrics.num.samples</td>
|
||||
<td>Low</td>
|
||||
<td colspan="2">The number of samples maintained to compute metrics.</td>
|
||||
<td>2</td>
|
||||
</tr>
|
||||
<tr class="row-even"><td>metrics.recording.level</td>
|
||||
<td>Low</td>
|
||||
<td colspan="2">The highest recording level for metrics.</td>
|
||||
<td><code class="docutils literal"><span class="pre">INFO</span></code></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td>metrics.sample.window.ms</td>
|
||||
<td>Low</td>
|
||||
<td colspan="2">The window of time a metrics sample is computed over.</td>
|
||||
<td>30000 milliseconds</td>
|
||||
</tr>
|
||||
<tr class="row-even"><td>num.standby.replicas</td>
|
||||
<td>Medium</td>
|
||||
<td colspan="2">The number of standby replicas for each task.</td>
|
||||
<td>0</td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td>num.stream.threads</td>
|
||||
<td>Medium</td>
|
||||
<td colspan="2">The number of threads to execute stream processing.</td>
|
||||
<td>1</td>
|
||||
</tr>
|
||||
<tr class="row-even"><td>partition.grouper</td>
|
||||
<td>Low</td>
|
||||
<td colspan="2">Partition grouper class that implements the <code class="docutils literal"><span class="pre">PartitionGrouper</span></code> interface.</td>
|
||||
<td>See <a class="reference internal" href="#streams-developer-guide-partition-grouper"><span class="std std-ref">Partition Grouper</span></a></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td>processing.guarantee</td>
|
||||
<td>Low</td>
|
||||
<td colspan="2">The processing mode. Can be either <code class="docutils literal"><span class="pre">"at_least_once"</span></code> (default) or <code class="docutils literal"><span class="pre">"exactly_once"</span></code>.
|
||||
<td>See <a class="reference internal" href="#streams-developer-guide-processing-guarantedd"><span class="std std-ref">Processing Guarantee</span></a></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td>poll.ms</td>
|
||||
<td>Low</td>
|
||||
<td colspan="2">The amount of time in milliseconds to block waiting for input.</td>
|
||||
<td>100 milliseconds</td>
|
||||
</tr>
|
||||
<tr class="row-even"><td>replication.factor</td>
|
||||
<td>High</td>
|
||||
<td colspan="2">The replication factor for changelog topics and repartition topics created by the application.</td>
|
||||
<td>1</td>
|
||||
</tr>
|
||||
<tr class="row-even"><td>retries</td>
|
||||
<td>Medium</td>
|
||||
<td colspan="2">The number of retries for broker requests that return a retryable error. </td>
|
||||
<td>0</td>
|
||||
</tr>
|
||||
<tr class="row-even"><td>retry.backoff.ms</td>
|
||||
<td>Medium</td>
|
||||
<td colspan="2">The amount of time in milliseconds, before a request is retried. This applies if the <code class="docutils literal"><span class="pre">retries</span></code> parameter is configured to be greater than 0. </td>
|
||||
<td>100</td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td>state.cleanup.delay.ms</td>
|
||||
<td>Low</td>
|
||||
<td colspan="2">The amount of time in milliseconds to wait before deleting state when a partition has migrated.</td>
|
||||
<td>600000 milliseconds</td>
|
||||
</tr>
|
||||
<tr class="row-even"><td>state.dir</td>
|
||||
<td>High</td>
|
||||
<td colspan="2">Directory location for state stores.</td>
|
||||
<td><code class="docutils literal"><span class="pre">/tmp/kafka-streams</span></code></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td>timestamp.extractor</td>
|
||||
<td>Medium</td>
|
||||
<td colspan="2">Timestamp extractor class that implements the <code class="docutils literal"><span class="pre">TimestampExtractor</span></code> interface.</td>
|
||||
<td>See <a class="reference internal" href="#streams-developer-guide-timestamp-extractor"><span class="std std-ref">Timestamp Extractor</span></a></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td>upgrade.from</td>
|
||||
<td>Medium</td>
|
||||
<td colspan="2">The version you are upgrading from during a rolling upgrade.</td>
|
||||
<td>See <a class="reference internal" href="#streams-developer-guide-upgrade-from"><span class="std std-ref">Upgrade From</span></a></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td>value.serde</td>
|
||||
<td>Medium</td>
|
||||
<td colspan="2">Default serializer/deserializer class for record values, implements the <code class="docutils literal"><span class="pre">Serde</span></code> interface (see also key.serde).</td>
|
||||
<td><code class="docutils literal"><span class="pre">Serdes.ByteArray().getClass().getName()</span></code></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td>windowstore.changelog.additional.retention.ms</td>
|
||||
<td>Low</td>
|
||||
<td colspan="2">Added to a windows maintainMs to ensure data is not deleted from the log prematurely. Allows for clock drift.</td>
|
||||
<td>86400000 milliseconds = 1 day</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<div class="section" id="default-deserialization-exception-handler">
|
||||
<span id="streams-developer-guide-deh"></span><h4><a class="toc-backref" href="#id7">default.deserialization.exception.handler</a><a class="headerlink" href="#default-deserialization-exception-handler" title="Permalink to this headline"></a></h4>
|
||||
<blockquote>
|
||||
<div><p>The default deserialization exception handler allows you to manage records that fail to deserialize. This
can be caused by corrupt data, incorrect serialization logic, or unhandled record types. The implemented exception
|
||||
handler needs to return a <code>FAIL</code> or <code>CONTINUE</code> depending on the record and the exception thrown. Returning
|
||||
<code>FAIL</code> will signal that Streams should shut down and <code>CONTINUE</code> will signal that Streams should ignore the issue
|
||||
and continue processing. The following library built-in exception handlers are available:</p>
|
||||
<ul class="simple">
|
||||
<li><a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/errors/LogAndContinueExceptionHandler.html">LogAndContinueExceptionHandler</a>:
|
||||
This handler logs the deserialization exception and then signals the processing pipeline to continue processing more records.
|
||||
This log-and-skip strategy allows Kafka Streams to make progress instead of failing if there are records that fail
|
||||
to deserialize.</li>
|
||||
<li><a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/errors/LogAndFailExceptionHandler.html">LogAndFailExceptionHandler</a>.
|
||||
This handler logs the deserialization exception and then signals the processing pipeline to stop processing more records.</li>
|
||||
</ul>
|
||||
|
||||
<p>You can also provide your own customized exception handler besides the library provided ones to meet your needs. For example, you can choose to forward corrupt
|
||||
records into a quarantine topic (think: a "dead letter queue") for further processing. To do this, use the Producer API to write a corrupted record directly to
|
||||
the quarantine topic. To be more concrete, you can create a separate <code>KafkaProducer</code> object outside the Streams client, and pass this object
as well as the dead letter queue topic name into the <code>Properties</code> map, from which they can then be retrieved in the handler's <code>configure</code> call.
|
||||
The drawback of this approach is that "manual" writes are side effects that are invisible to the Kafka Streams runtime library,
|
||||
so they do not benefit from the end-to-end processing guarantees of the Streams API:</p>
|
||||
|
||||
<pre class="brush: java;">
|
||||
public class SendToDeadLetterQueueExceptionHandler implements DeserializationExceptionHandler {
|
||||
KafkaProducer&lt;byte[], byte[]&gt; dlqProducer;
|
||||
String dlqTopic;
|
||||
|
||||
@Override
|
||||
public DeserializationHandlerResponse handle(final ProcessorContext context,
|
||||
final ConsumerRecord&lt;byte[], byte[]&gt; record,
|
||||
final Exception exception) {
|
||||
|
||||
log.warn("Exception caught during Deserialization, sending to the dead queue topic; " +
|
||||
"taskId: {}, topic: {}, partition: {}, offset: {}",
|
||||
context.taskId(), record.topic(), record.partition(), record.offset(),
|
||||
exception);
|
||||
|
||||
dlqProducer.send(new ProducerRecord&lt;&gt;(dlqTopic, record.timestamp(), record.key(), record.value(), record.headers())).get();
|
||||
|
||||
return DeserializationHandlerResponse.CONTINUE;
|
||||
}
|
||||
|
||||
@Override
|
||||
public void configure(final Map&lt;String, ?&gt; configs) {
|
||||
dlqProducer = .. // get a producer from the configs map
|
||||
dlqTopic = .. // get the topic name from the configs map
|
||||
}
|
||||
}
|
||||
</pre>
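<p>To use the custom handler sketched above, you would register it via the corresponding config; how the producer and topic name end up in the configs map is up to your application:</p>
<pre class="brush: java;">
Properties settings = new Properties();
settings.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
             SendToDeadLetterQueueExceptionHandler.class);
</pre>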
|
||||
|
||||
</div></blockquote>
|
||||
</div>
|
||||
<div class="section" id="default-production-exception-handler">
|
||||
<span id="streams-developer-guide-peh"></span><h4><a class="toc-backref" href="#id24">default.production.exception.handler</a><a class="headerlink" href="#default-production-exception-handler" title="Permalink to this headline"></a></h4>
|
||||
<blockquote>
|
||||
<div><p>The default production exception handler allows you to manage exceptions triggered when trying to interact with a broker
|
||||
such as attempting to produce a record that is too large. By default, Kafka provides and uses the <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/errors/DefaultProductionExceptionHandler.html">DefaultProductionExceptionHandler</a>
|
||||
that always fails when these exceptions occur.</p>
|
||||
|
||||
<p>Each exception handler can return a <code>FAIL</code> or <code>CONTINUE</code> depending on the record and the exception thrown. Returning <code>FAIL</code> will signal that Streams should shut down and <code>CONTINUE</code> will signal that Streams
|
||||
should ignore the issue and continue processing. If you want to provide an exception handler that always ignores records that are too large, you could implement something like the following:</p>
|
||||
|
||||
<pre class="brush: java;">
|
||||
import java.util.Properties;
|
||||
import org.apache.kafka.streams.StreamsConfig;
|
||||
import org.apache.kafka.common.errors.RecordTooLargeException;
|
||||
import org.apache.kafka.streams.errors.ProductionExceptionHandler;
|
||||
import org.apache.kafka.streams.errors.ProductionExceptionHandler.ProductionExceptionHandlerResponse;
|
||||
|
||||
public class IgnoreRecordTooLargeHandler implements ProductionExceptionHandler {
|
||||
public void configure(Map&lt;String, Object&gt; config) {}
|
||||
|
||||
public ProductionExceptionHandlerResponse handle(final ProducerRecord&lt;byte[], byte[]&gt; record,
|
||||
final Exception exception) {
|
||||
if (exception instanceof RecordTooLargeException) {
|
||||
return ProductionExceptionHandlerResponse.CONTINUE;
|
||||
} else {
|
||||
return ProductionExceptionHandlerResponse.FAIL;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
Properties settings = new Properties();
|
||||
|
||||
// other various kafka streams settings, e.g. bootstrap servers, application id, etc
|
||||
|
||||
settings.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG,
|
||||
IgnoreRecordTooLargeHandler.class);</pre></div>
|
||||
</blockquote>
|
||||
</div>
|
||||
<div class="section" id="default-key-serde">
|
||||
<h4><a class="toc-backref" href="#id8">default.key.serde</a><a class="headerlink" href="#default-key-serde" title="Permalink to this headline"></a></h4>
|
||||
<blockquote>
|
||||
<div><p>The default Serializer/Deserializer class for record keys. Serialization and deserialization in Kafka Streams happens
|
||||
whenever data needs to be materialized, for example:</p>
|
||||
<blockquote>
|
||||
<div><ul class="simple">
|
||||
<li>Whenever data is read from or written to a <em>Kafka topic</em> (e.g., via the <code class="docutils literal"><span class="pre">StreamsBuilder#stream()</span></code> and <code class="docutils literal"><span class="pre">KStream#to()</span></code> methods).</li>
|
||||
<li>Whenever data is read from or written to a <em>state store</em>.</li>
|
||||
</ul>
|
||||
<p>This is discussed in more detail in <a class="reference internal" href="datatypes.html#streams-developer-guide-serdes"><span class="std std-ref">Data types and serialization</span></a>.</p>
|
||||
</div></blockquote>
|
||||
</div></blockquote>
|
||||
</div>
|
||||
<div class="section" id="default-value-serde">
|
||||
<h4><a class="toc-backref" href="#id9">default.value.serde</a><a class="headerlink" href="#default-value-serde" title="Permalink to this headline"></a></h4>
|
||||
<blockquote>
|
||||
<div><p>The default Serializer/Deserializer class for record values. Serialization and deserialization in Kafka Streams
|
||||
happens whenever data needs to be materialized, for example:</p>
|
||||
<ul class="simple">
|
||||
<li>Whenever data is read from or written to a <em>Kafka topic</em> (e.g., via the <code class="docutils literal"><span class="pre">StreamsBuilder#stream()</span></code> and <code class="docutils literal"><span class="pre">KStream#to()</span></code> methods).</li>
|
||||
<li>Whenever data is read from or written to a <em>state store</em>.</li>
|
||||
</ul>
|
||||
<p>This is discussed in more detail in <a class="reference internal" href="datatypes.html#streams-developer-guide-serdes"><span class="std std-ref">Data types and serialization</span></a>.</p>
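<p>For example, a minimal sketch that sets string serdes as the default for both record keys and values:</p>
<pre class="brush: java;">
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsConfig;

Properties settings = new Properties();
settings.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
settings.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
</pre>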
|
||||
</div></blockquote>
|
||||
</div>
|
||||
<div class="section" id="num-standby-replicas">
|
||||
<span id="streams-developer-guide-standby-replicas"></span><h4><a class="toc-backref" href="#id10">num.standby.replicas</a><a class="headerlink" href="#num-standby-replicas" title="Permalink to this headline"></a></h4>
|
||||
<blockquote>
|
||||
<div>The number of standby replicas. Standby replicas are shadow copies of local state stores. Kafka Streams attempts to create the
|
||||
specified number of replicas per store and keep them up to date as long as there are enough instances running.
|
||||
Standby replicas are used to minimize the latency of task failover. A task that was previously running on a failed instance is
preferably restarted on an instance that has standby replicas, so that the local state store restoration process from its
|
||||
changelog can be minimized. Details about how Kafka Streams makes use of the standby replicas to minimize the cost of
|
||||
resuming tasks on failover can be found in the <a class="reference internal" href="../architecture.html#streams_architecture_state"><span class="std std-ref">State</span></a> section.</div></blockquote>
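<p>For example, a minimal sketch that enables one standby replica per task (remember to run at least two application instances in that case):</p>
<pre class="brush: java;">
Properties settings = new Properties();
settings.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
</pre>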
|
||||
</div>
|
||||
<div class="admonition note">
|
||||
<p class="first admonition-title">Note</p>
|
||||
<p class="last">If you enable <cite>n</cite> standby tasks, you need to provision <cite>n+1</cite> <code class="docutils literal"><span class="pre">KafkaStreams</span></code>
|
||||
instances.</p>
|
||||
</div>
|
||||
<div class="section" id="num-stream-threads">
|
||||
<h4><a class="toc-backref" href="#id11">num.stream.threads</a><a class="headerlink" href="#num-stream-threads" title="Permalink to this headline"></a></h4>
|
||||
<blockquote>
|
||||
<div>This specifies the number of stream threads in an instance of the Kafka Streams application. The stream processing code runs in these threads.
For more information about the Kafka Streams threading model, see <a class="reference internal" href="../architecture.html#streams_architecture_threads"><span class="std std-ref">Threading Model</span></a>.</div></blockquote>
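<p>For example, a minimal sketch that runs two stream threads per application instance:</p>
<pre class="brush: java;">
Properties settings = new Properties();
settings.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 2);
</pre>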
|
||||
</div>
|
||||
<div class="section" id="partition-grouper">
|
||||
<span id="streams-developer-guide-partition-grouper"></span><h4><a class="toc-backref" href="#id12">partition.grouper</a><a class="headerlink" href="#partition-grouper" title="Permalink to this headline"></a></h4>
|
||||
<blockquote>
|
||||
<div>A partition grouper creates a list of stream tasks from the partitions of source topics, where each created task is assigned with a group of source topic partitions.
|
||||
The default implementation provided by Kafka Streams is <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/processor/DefaultPartitionGrouper.html">DefaultPartitionGrouper</a>.
|
||||
It assigns each task one partition from each of the source topics. The generated number of tasks equals the largest
|
||||
number of partitions among the input topics. Usually an application does not need to customize the partition grouper.</div></blockquote>
|
||||
</div>
|
||||
<div class="section" id="processing-guarantee">
|
||||
<span id="streams-developer-guide-processing-guarantee"></span><h4><a class="toc-backref" href="#id25">processing.guarantee</a><a class="headerlink" href="#processing-guarantee" title="Permalink to this headline"></a></h4>
|
||||
<blockquote>
|
||||
<div>The processing guarantee that should be used. Possible values are <code class="docutils literal"><span class="pre">"at_least_once"</span></code> (default) and <code class="docutils literal"><span class="pre">"exactly_once"</span></code>.
|
||||
Note that if exactly-once processing is enabled, the default for parameter <code class="docutils literal"><span class="pre">commit.interval.ms</span></code> changes to 100ms.
|
||||
Additionally, consumers are configured with <code class="docutils literal"><span class="pre">isolation.level="read_committed"</span></code>
|
||||
and producers are configured with <code class="docutils literal"><span class="pre">retries=Integer.MAX_VALUE</span></code>, <code class="docutils literal"><span class="pre">enable.idempotence=true</span></code>,
|
||||
and <code class="docutils literal"><span class="pre">max.in.flight.requests.per.connection=1</span></code> per default.
|
||||
Note that by default exactly-once processing requires a cluster of at least three brokers, which is the recommended setting for production.
For development you can change this by adjusting the broker settings <code class="docutils literal"><span class="pre">transaction.state.log.replication.factor</span></code> and <code class="docutils literal"><span class="pre">transaction.state.log.min.isr</span></code> to the number of brokers you want to use.
|
||||
For more details see <a href="../core-concepts#streams_processing_guarantee">Processing Guarantees</a>.
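<p>For example, a minimal sketch that switches a suitably provisioned application to exactly-once processing:</p>
<pre class="brush: java;">
Properties settings = new Properties();
settings.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
</pre>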
|
||||
</div></blockquote>
|
||||
</div>
|
||||
<div class="section" id="replication-factor">
|
||||
<span id="replication-factor-parm"></span><h4><a class="toc-backref" href="#id13">replication.factor</a><a class="headerlink" href="#replication-factor" title="Permalink to this headline"></a></h4>
|
||||
<blockquote>
|
||||
<div><p>This specifies the replication factor of internal topics that Kafka Streams creates when local states are used or a stream is
|
||||
repartitioned for aggregation. Replication is important for fault tolerance. Without replication even a single broker failure
|
||||
may prevent progress of the stream processing application. It is recommended to use a similar replication factor as source topics.</p>
|
||||
<dl class="docutils">
|
||||
<dt>Recommendation:</dt>
|
||||
<dd>Increase the replication factor to 3 to ensure that the internal Kafka Streams topic can tolerate up to 2 broker failures.
|
||||
Note that you will require more storage space as well (3 times more with the replication factor of 3).</dd>
|
||||
</dl>
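<p>For example, a minimal sketch that follows the recommendation above, assuming the cluster has at least three brokers:</p>
<pre class="brush: java;">
Properties settings = new Properties();
settings.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 3);
</pre>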
|
||||
</div></blockquote>
|
||||
</div>
|
||||
<div class="section" id="state-dir">
|
||||
<h4><a class="toc-backref" href="#id14">state.dir</a><a class="headerlink" href="#state-dir" title="Permalink to this headline"></a></h4>
|
||||
<blockquote>
|
||||
<div>The state directory. Kafka Streams persists local states under the state directory. Each application has a subdirectory on its hosting
|
||||
machine that is located under the state directory. The name of the subdirectory is the application ID. The state stores associated
|
||||
with the application are created under this subdirectory. When running multiple instances of the same application on a single machine,
|
||||
this path must be unique for each such instance.</div>
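<p>For example, a minimal sketch that gives each instance on the same machine its own state directory (the path is a placeholder):</p>
<pre class="brush: java;">
Properties settings = new Properties();
settings.put(StreamsConfig.STATE_DIR_CONFIG, "/var/lib/kafka-streams/instance-1");
</pre>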
|
||||
</blockquote>
|
||||
</div>
|
||||
<div class="section" id="timestamp-extractor">
|
||||
<span id="streams-developer-guide-timestamp-extractor"></span><h4><a class="toc-backref" href="#id15">timestamp.extractor</a><a class="headerlink" href="#timestamp-extractor" title="Permalink to this headline"></a></h4>
|
||||
<blockquote>
|
||||
<div><p>A timestamp extractor pulls a timestamp from an instance of <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/clients/consumer/ConsumerRecord.html">ConsumerRecord</a>.
|
||||
Timestamps are used to control the progress of streams.</p>
|
||||
<p>The default extractor is
|
||||
<a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/processor/FailOnInvalidTimestamp.html">FailOnInvalidTimestamp</a>.
|
||||
This extractor retrieves built-in timestamps that are automatically embedded into Kafka messages by the Kafka producer
|
||||
client since
|
||||
<a class="reference external" href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message">Kafka version 0.10</a>.
|
||||
Depending on the setting of Kafka’s server-side <code class="docutils literal"><span class="pre">log.message.timestamp.type</span></code> broker and <code class="docutils literal"><span class="pre">message.timestamp.type</span></code> topic parameters,
|
||||
this extractor provides you with:</p>
|
||||
<ul class="simple">
|
||||
<li><strong>event-time</strong> processing semantics if <code class="docutils literal"><span class="pre">log.message.timestamp.type</span></code> is set to <code class="docutils literal"><span class="pre">CreateTime</span></code> aka “producer time”
|
||||
(which is the default). This represents the time when a Kafka producer sent the original message. If you use Kafka’s
|
||||
official producer client, the timestamp represents milliseconds since the epoch.</li>
|
||||
<li><strong>ingestion-time</strong> processing semantics if <code class="docutils literal"><span class="pre">log.message.timestamp.type</span></code> is set to <code class="docutils literal"><span class="pre">LogAppendTime</span></code> aka “broker
|
||||
time”. This represents the time when the Kafka broker received the original message, in milliseconds since the epoch.</li>
|
||||
</ul>
|
||||
<p>The <code class="docutils literal"><span class="pre">FailOnInvalidTimestamp</span></code> extractor throws an exception if a record contains an invalid (i.e. negative) built-in
|
||||
timestamp, because Kafka Streams would not process this record but silently drop it. Invalid built-in timestamps can
|
||||
occur for various reasons: for example, if you consume a topic that is written to by pre-0.10 Kafka producer clients
|
||||
or by third-party producer clients that don’t support the new Kafka 0.10 message format yet; another situation where
|
||||
this may happen is after upgrading your Kafka cluster from <code class="docutils literal"><span class="pre">0.9</span></code> to <code class="docutils literal"><span class="pre">0.10</span></code>, where all the data that was generated
|
||||
with <code class="docutils literal"><span class="pre">0.9</span></code> does not include the <code class="docutils literal"><span class="pre">0.10</span></code> message timestamps.</p>
|
||||
<p>If you have data with invalid timestamps and want to process it, then there are two alternative extractors available.
|
||||
Both work on built-in timestamps, but handle invalid timestamps differently.</p>
|
||||
<ul class="simple">
|
||||
<li><a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/processor/LogAndSkipOnInvalidTimestamp.html">LogAndSkipOnInvalidTimestamp</a>:
|
||||
This extractor logs a warn message and returns the invalid timestamp to Kafka Streams, which will not process but
|
||||
silently drop the record.
|
||||
This log-and-skip strategy allows Kafka Streams to make progress instead of failing if there are records with an
|
||||
invalid built-in timestamp in your input data.</li>
|
||||
<li><a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/processor/UsePartitionTimeOnInvalidTimestamp.html">UsePartitionTimeOnInvalidTimestamp</a>.
|
||||
This extractor returns the record’s built-in timestamp if it is valid (i.e. not negative). If the record does not
|
||||
have a valid built-in timestamp, the extractor returns the previously extracted valid timestamp from a record of the
|
||||
same topic partition as the current record as a timestamp estimation. In case that no timestamp can be estimated, it
|
||||
throws an exception.</li>
|
||||
</ul>
|
||||
<p>Another built-in extractor is
|
||||
<a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/processor/WallclockTimestampExtractor.html">WallclockTimestampExtractor</a>.
|
||||
This extractor does not actually “extract” a timestamp from the consumed record but rather returns the current time in
|
||||
milliseconds from the system clock (think: <code class="docutils literal"><span class="pre">System.currentTimeMillis()</span></code>), which effectively means Streams will operate
|
||||
on the basis of the so-called <strong>processing-time</strong> of events.</p>
|
||||
<p>You can also provide your own timestamp extractors, for instance to retrieve timestamps embedded in the payload of
|
||||
messages. If you cannot extract a valid timestamp, you can either throw an exception, return a negative timestamp, or
|
||||
estimate a timestamp. Returning a negative timestamp will result in data loss – the corresponding record will not be
|
||||
processed but silently dropped. If you want to estimate a new timestamp, you can use the value provided via
|
||||
<code class="docutils literal"><span class="pre">previousTimestamp</span></code> (i.e., a Kafka Streams timestamp estimation). Here is an example of a custom
|
||||
<code class="docutils literal"><span class="pre">TimestampExtractor</span></code> implementation:</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.clients.consumer.ConsumerRecord</span><span class="o">;</span>
|
||||
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.processor.TimestampExtractor</span><span class="o">;</span>
|
||||
|
||||
<span class="c1">// Extracts the embedded timestamp of a record (giving you "event-time" semantics).</span>
|
||||
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">MyEventTimeExtractor</span> <span class="kd">implements</span> <span class="n">TimestampExtractor</span> <span class="o">{</span>
|
||||
|
||||
<span class="nd">@Override</span>
|
||||
<span class="kd">public</span> <span class="kt">long</span> <span class="nf">extract</span><span class="o">(</span><span class="kd">final</span> <span class="n">ConsumerRecord</span><span class="o"><</span><span class="n">Object</span><span class="o">,</span> <span class="n">Object</span><span class="o">></span> <span class="n">record</span><span class="o">,</span> <span class="kd">final</span> <span class="kt">long</span> <span class="n">previousTimestamp</span><span class="o">)</span> <span class="o">{</span>
|
||||
<span class="c1">// `Foo` is your own custom class, which we assume has a method that returns</span>
|
||||
<span class="c1">// the embedded timestamp (milliseconds since midnight, January 1, 1970 UTC).</span>
|
||||
<span class="kt">long</span> <span class="n">timestamp</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="o">;</span>
|
||||
<span class="kd">final</span> <span class="n">Foo</span> <span class="n">myPojo</span> <span class="o">=</span> <span class="o">(</span><span class="n">Foo</span><span class="o">)</span> <span class="n">record</span><span class="o">.</span><span class="na">value</span><span class="o">();</span>
|
||||
<span class="k">if</span> <span class="o">(</span><span class="n">myPojo</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span>
|
||||
<span class="n">timestamp</span> <span class="o">=</span> <span class="n">myPojo</span><span class="o">.</span><span class="na">getTimestampInMillis</span><span class="o">();</span>
|
||||
<span class="o">}</span>
|
||||
<span class="k">if</span> <span class="o">(</span><span class="n">timestamp</span> <span class="o"><</span> <span class="mi">0</span><span class="o">)</span> <span class="o">{</span>
|
||||
<span class="c1">// Invalid timestamp! Attempt to estimate a new timestamp,</span>
|
||||
<span class="c1">// otherwise fall back to wall-clock time (processing-time).</span>
|
||||
<span class="k">if</span> <span class="o">(</span><span class="n">previousTimestamp</span> <span class="o">>=</span> <span class="mi">0</span><span class="o">)</span> <span class="o">{</span>
|
||||
<span class="k">return</span> <span class="n">previousTimestamp</span><span class="o">;</span>
|
||||
<span class="o">}</span> <span class="k">else</span> <span class="o">{</span>
|
||||
<span class="k">return</span> <span class="n">System</span><span class="o">.</span><span class="na">currentTimeMillis</span><span class="o">();</span>
|
||||
<span class="o">}</span>
|
||||
<span class="o">}</span>
|
||||
<span class="o">}</span>
|
||||
|
||||
<span class="o">}</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>You would then define the custom timestamp extractor in your Streams configuration as follows:</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Properties</span><span class="o">;</span>
|
||||
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.StreamsConfig</span><span class="o">;</span>
|
||||
|
||||
<span class="n">Properties</span> <span class="n">streamsConfiguration</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
|
||||
<span class="n">streamsConfiguration</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG</span><span class="o">,</span> <span class="n">MyEventTimeExtractor</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div></blockquote>
|
||||
</div>
|
||||
<div class="section" id="upgrade-from">
|
||||
<h4><a class="toc-backref" href="#id14">upgrade.from</a><a class="headerlink" href="#upgrade-from" title="Permalink to this headline"></a></h4>
|
||||
<blockquote>
|
||||
<div>
|
||||
The version you are upgrading from. It is important to set this config when performing a rolling upgrade to certain versions, as described in the upgrade guide.
|
||||
You should set this config to the appropriate version before bouncing your instances and upgrading them to the newer version. Once everyone is on the
|
||||
newer version, you should remove this config and do a second rolling bounce. It is only necessary to set this config and follow the two-bounce upgrade path
|
||||
when upgrading from below version 2.0, or when upgrading to 2.4+ from any version lower than 2.4.
|
||||
</div>
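<p>For example, a minimal sketch for the first rolling bounce of an upgrade; the version string is a placeholder and must match the version you are actually upgrading from:</p>
<pre class="brush: java;">
Properties settings = new Properties();
settings.put(StreamsConfig.UPGRADE_FROM_CONFIG, "2.3");
// After all instances run the new version, remove this setting and do a second rolling bounce.
</pre>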
|
||||
</blockquote>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="kafka-consumers-and-producer-configuration-parameters">
|
||||
<h3><a class="toc-backref" href="#id16">Kafka consumers, producer and admin client configuration parameters</a><a class="headerlink" href="#kafka-consumers-and-producer-configuration-parameters" title="Permalink to this headline"></a></h3>
|
||||
<p>You can specify parameters for the Kafka <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/clients/consumer/package-summary.html">consumers</a>, <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/clients/producer/package-summary.html">producers</a>,
|
||||
and <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/kafka/clients/admin/package-summary.html">admin client</a> that are used internally.
|
||||
The consumer, producer and admin client settings are defined by specifying parameters in a <code class="docutils literal"><span class="pre">StreamsConfig</span></code> instance.</p>
|
||||
<p>In this example, the Kafka <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/clients/consumer/ConsumerConfig.html#SESSION_TIMEOUT_MS_CONFIG">consumer session timeout</a> is configured to be 60000 milliseconds in the Streams settings:</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">streamsSettings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
|
||||
<span class="c1">// Example of a "normal" setting for Kafka Streams</span>
|
||||
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">BOOTSTRAP_SERVERS_CONFIG</span><span class="o">,</span> <span class="s">"kafka-broker-01:9092"</span><span class="o">);</span>
|
||||
<span class="c1">// Customize the Kafka consumer settings of your Streams application</span>
|
||||
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">ConsumerConfig</span><span class="o">.</span><span class="na">SESSION_TIMEOUT_MS_CONFIG</span><span class="o">,</span> <span class="mi">60000</span><span class="o">);</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="section" id="naming">
|
||||
<h4><a class="toc-backref" href="#id17">Naming</a><a class="headerlink" href="#naming" title="Permalink to this headline"></a></h4>
|
||||
<p>Some consumer, producer, and admin client configuration parameters use the same parameter name, and the Kafka Streams library itself also uses some parameters that share the same name with its embedded clients. For example, <code class="docutils literal"><span class="pre">send.buffer.bytes</span></code> and
<code class="docutils literal"><span class="pre">receive.buffer.bytes</span></code> are used to configure TCP buffers; <code class="docutils literal"><span class="pre">request.timeout.ms</span></code> and <code class="docutils literal"><span class="pre">retry.backoff.ms</span></code> control retries for client requests;
<code class="docutils literal"><span class="pre">retries</span></code> is used to configure how many retries are allowed when handling retriable errors from broker request responses.
You can avoid duplicate names by prefixing parameter names with <code class="docutils literal"><span class="pre">consumer.</span></code>, <code class="docutils literal"><span class="pre">producer.</span></code>, or <code class="docutils literal"><span class="pre">admin.</span></code> (e.g., <code class="docutils literal"><span class="pre">consumer.send.buffer.bytes</span></code> and <code class="docutils literal"><span class="pre">producer.send.buffer.bytes</span></code>).</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">streamsSettings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
|
||||
<span class="c1">// same value for consumer, producer, and admin client</span>
|
||||
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">"PARAMETER_NAME"</span><span class="o">,</span> <span class="s">"value"</span><span class="o">);</span>
|
||||
<span class="c1">// different values for consumer and producer</span>
|
||||
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">"consumer.PARAMETER_NAME"</span><span class="o">,</span> <span class="s">"consumer-value"</span><span class="o">);</span>
|
||||
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">"producer.PARAMETER_NAME"</span><span class="o">,</span> <span class="s">"producer-value"</span><span class="o">);</span>
|
||||
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">"admin.PARAMETER_NAME"</span><span class="o">,</span> <span class="s">"admin-value"</span><span class="o">);</span>
|
||||
<span class="c1">// alternatively, you can use</span>
|
||||
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">consumerPrefix</span><span class="o">(</span><span class="s">"PARAMETER_NAME"</span><span class="o">),</span> <span class="s">"consumer-value"</span><span class="o">);</span>
|
||||
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">producerPrefix</span><span class="o">(</span><span class="s">"PARAMETER_NAME"</span><span class="o">),</span> <span class="s">"producer-value"</span><span class="o">);</span>
|
||||
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">adminClientPrefix</span><span class="o">(</span><span class="s">"PARAMETER_NAME"</span><span class="o">),</span> <span class="s">"admin-value"</span><span class="o">);</span>
|
||||
</pre></div>
|
||||
<p>You could further separate consumer configuration by adding different prefixes:</p>
|
||||
<ul class="simple">
|
||||
<li><code class="docutils literal"><span class="pre">main.consumer.</span></code> for main consumer which is the default consumer of stream source.</li>
|
||||
<li><code class="docutils literal"><span class="pre">restore.consumer.</span></code> for restore consumer which is in charge of state store recovery.</li>
|
||||
<li><code class="docutils literal"><span class="pre">global.consumer.</span></code> for global consumer which is used in global KTable construction.</li>
|
||||
</ul>
|
||||
<p>For example, if you only want to set restore consumer config without touching other consumers' settings, you could simply use <code class="docutils literal"><span class="pre">restore.consumer.</span></code> to set the config.</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">streamsSettings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
|
||||
<span class="c1">// same config value for all consumer types</span>
|
||||
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">"consumer.PARAMETER_NAME"</span><span class="o">,</span> <span class="s">"general-consumer-value"</span><span class="o">);</span>
|
||||
<span class="c1">// set a different restore consumer config. This would make restore consumer take restore-consumer-value,</span>
|
||||
<span>// while main consumer and global consumer stay with general-consumer-value</span>
|
||||
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">"restore.consumer.PARAMETER_NAME"</span><span class="o">,</span> <span class="s">"restore-consumer-value"</span><span class="o">);</span>
|
||||
<span class="c1">// alternatively, you can use</span>
|
||||
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">restoreConsumerPrefix</span><span class="o">(</span><span class="s">"PARAMETER_NAME"</span><span class="o">),</span> <span class="s">"restore-consumer-value"</span><span class="o">);</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p> The same applies to <code class="docutils literal"><span class="pre">main.consumer.</span></code> and <code class="docutils literal"><span class="pre">global.consumer.</span></code>, if you only want to specify one consumer type's config.</p>
|
||||
<p> Additionally, to configure the internal repartition/changelog topics, you could use the <code class="docutils literal"><span class="pre">topic.</span></code> prefix, followed by any of the standard topic configs.</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">streamsSettings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
|
||||
<span class="c1">// Override default for both changelog and repartition topics</span>
|
||||
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">"topic.PARAMETER_NAME"</span><span class="o">,</span> <span class="s">"topic-value"</span><span class="o">);</span>
|
||||
<span class="c1">// alternatively, you can use</span>
|
||||
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">topicPrefix</span><span class="o">(</span><span class="s">"PARAMETER_NAME"</span><span class="o">),</span> <span class="s">"topic-value"</span><span class="o">);</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="default-values">
|
||||
<h4><a class="toc-backref" href="#id18">Default Values</a><a class="headerlink" href="#default-values" title="Permalink to this headline"></a></h4>
|
||||
<p>Kafka Streams uses different default values for some of the underlying client configs, which are summarized below. For detailed descriptions
|
||||
of these configs, see <a class="reference external" href="http://kafka.apache.org/0100/documentation.html#producerconfigs">Producer Configs</a>
|
||||
and <a class="reference external" href="http://kafka.apache.org/0100/documentation.html#newconsumerconfigs">Consumer Configs</a>.</p>
|
||||
<table border="1" class="non-scrolling-table docutils">
|
||||
<thead valign="bottom">
|
||||
<tr class="row-odd"><th class="head">Parameter Name</th>
|
||||
<th class="head">Corresponding Client</th>
|
||||
<th class="head">Streams Default</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody valign="top">
|
||||
<tr class="row-even"><td>auto.offset.reset</td>
|
||||
<td>Global Consumer</td>
|
||||
<td>none (cannot be changed)</td>
|
||||
</tr>
|
||||
<tr class="row-even"><td>auto.offset.reset</td>
|
||||
<td>Restore Consumer</td>
|
||||
<td>none (cannot be changed)</td>
|
||||
</tr>
|
||||
<tr class="row-even"><td>auto.offset.reset</td>
|
||||
<td>Consumer</td>
|
||||
<td>earliest</td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td>enable.auto.commit</td>
|
||||
<td>Consumer</td>
|
||||
<td>false</td>
|
||||
</tr>
|
||||
<tr class="row-even"><td>linger.ms</td>
|
||||
<td>Producer</td>
|
||||
<td>100</td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td>max.poll.interval.ms</td>
|
||||
<td>Consumer</td>
|
||||
<td>Integer.MAX_VALUE</td>
|
||||
</tr>
|
||||
<tr class="row-even"><td>max.poll.records</td>
|
||||
<td>Consumer</td>
|
||||
<td>1000</td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td>rocksdb.config.setter</td>
|
||||
<td>Consumer</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
<div class="section" id="enable-auto-commit">
|
||||
<span id="streams-developer-guide-consumer-auto-commit"></span><h4><a class="toc-backref" href="#id19">enable.auto.commit</a><a class="headerlink" href="#enable-auto-commit" title="Permalink to this headline"></a></h4>
|
||||
<blockquote>
|
||||
<div>The consumer auto commit. To guarantee at-least-once processing semantics and turn off auto commits, Kafka Streams overrides this consumer config
|
||||
value to <code class="docutils literal"><span class="pre">false</span></code>. Consumers will only commit explicitly via <em>commitSync</em> calls when the Kafka Streams library or a user decides
|
||||
to commit the current processing state.</div></blockquote>
|
||||
</div>
|
||||
<div class="section" id="rocksdb-config-setter">
|
||||
<span id="streams-developer-guide-rocksdb-config"></span><h4><a class="toc-backref" href="#id20">rocksdb.config.setter</a><a class="headerlink" href="#rocksdb-config-setter" title="Permalink to this headline"></a></h4>
|
||||
<blockquote>
|
||||
<div><p>The RocksDB configuration. Kafka Streams uses RocksDB as the default storage engine for persistent stores. To change the default
|
||||
configuration for RocksDB, implement <code class="docutils literal"><span class="pre">RocksDBConfigSetter</span></code> and provide your custom class via <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/state/RocksDBConfigSetter.html">rocksdb.config.setter</a>.</p>
|
||||
<p>Here is an example that adjusts the memory size consumed by RocksDB.</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span> <span class="kd">public</span> <span class="kd">static</span> <span class="kd">class</span> <span class="nc">CustomRocksDBConfig</span> <span class="kd">implements</span> <span class="n">RocksDBConfigSetter</span> <span class="o">{</span>
|
||||
|
||||
<span class="c1">// This object should be a member variable so it can be closed in RocksDBConfigSetter#close.</span>
|
||||
<span class="kd">private</span> <span class="n">org.rocksdb.Cache</span> <span class="n">cache</span> <span class="o">=</span> <span class="k">new</span> <span class="n">org</span><span class="o">.</span><span class="na">rocksdb</span><span class="o">.</span><span class="na">LRUCache</span><span class="o">(</span><span class="mi">16</span> <span class="o">*</span> <span class="mi">1024L</span> <span class="o">*</span> <span class="mi">1024L</span><span class="o">);</span>
|
||||
|
||||
<span class="nd">@Override</span>
|
||||
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">setConfig</span><span class="o">(</span><span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Options</span> <span class="n">options</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Map</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Object</span><span class="o">></span> <span class="n">configs</span><span class="o">)</span> <span class="o">{</span>
|
||||
<span class="c1">// See #1 below.</span>
|
||||
<span class="n">BlockBasedTableConfig</span> <span class="n">tableConfig</span> <span class="o">=</span> <span class="k">(BlockBasedTableConfig)</span> <span class="n">options</span><span><span class="o">.</span><span class="na">tableFormatConfig</span><span class="o">();</span>
|
||||
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setBlockCache</span><span class="o">(</span><span class="mi">cache</span></span><span class="o">);</span>
|
||||
<span class="c1">// See #2 below.</span>
|
||||
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setBlockSize</span><span class="o">(</span><span class="mi">16</span> <span class="o">*</span> <span class="mi">1024L</span><span class="o">);</span>
|
||||
<span class="c1">// See #3 below.</span>
|
||||
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setCacheIndexAndFilterBlocks</span><span class="o">(</span><span class="kc">true</span><span class="o">);</span>
|
||||
<span class="n">options</span><span class="o">.</span><span class="na">setTableFormatConfig</span><span class="o">(</span><span class="n">tableConfig</span><span class="o">);</span>
|
||||
<span class="c1">// See #4 below.</span>
|
||||
<span class="n">options</span><span class="o">.</span><span class="na">setMaxWriteBufferNumber</span><span class="o">(</span><span class="mi">2</span><span class="o">);</span>
|
||||
<span class="o">}</span>
|
||||
|
||||
<span class="nd">@Override</span>
|
||||
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">close</span><span class="o">(</span><span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Options</span> <span class="n">options</span><span class="o">)</span> <span class="o">{</span>
|
||||
<span class="c1">// See #5 below.</span>
|
||||
<span class="n">cache</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
|
||||
<span class="o">}</span>
|
||||
<span class="o">}</span>
|
||||
|
||||
<span class="n">Properties</span> <span class="n">streamsSettings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
|
||||
<span class="n">streamsConfig</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">ROCKSDB_CONFIG_SETTER_CLASS_CONFIG</span><span class="o">,</span> <span class="n">CustomRocksDBConfig</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<dl class="docutils">
|
||||
<dt>Notes for example:</dt>
|
||||
<dd><ol class="first last arabic simple">
|
||||
<li><code class="docutils literal"><span class="pre">BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();</span></code> Get a reference to the existing table config rather than creating a new one, so you don't accidentally overwrite defaults such as the <code class="docutils literal"><span class="pre">BloomFilter</span></code>, which is an important optimization.</li>
|
||||
<li><code class="docutils literal"><span class="pre">tableConfig.setBlockSize(16</span> <span class="pre">*</span> <span class="pre">1024L);</span></code> Modify the default <a class="reference external" href="https://github.com/apache/kafka/blob/2.3/streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBStore.java#L79">block size</a> per these instructions from the <a class="reference external" href="https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB#indexes-and-filter-blocks">RocksDB GitHub</a>.</li>
|
||||
<li><code class="docutils literal"><span class="pre">tableConfig.setCacheIndexAndFilterBlocks(true);</span></code> Do not let the index and filter blocks grow unbounded. For more information, see the <a class="reference external" href="https://github.com/facebook/rocksdb/wiki/Block-Cache#caching-index-and-filter-blocks">RocksDB GitHub</a>.</li>
|
||||
<li><code class="docutils literal"><span class="pre">options.setMaxWriteBufferNumber(2);</span></code> See the advanced options in the <a class="reference external" href="https://github.com/facebook/rocksdb/blob/8dee8cad9ee6b70fd6e1a5989a8156650a70c04f/include/rocksdb/advanced_options.h#L103">RocksDB GitHub</a>.</li>
|
||||
<li><code class="docutils literal"><span class="pre">cache.close();</span></code> To avoid memory leaks, you must close any objects you constructed that extend org.rocksdb.RocksObject. See <a class="reference external" href="https://github.com/facebook/rocksdb/wiki/RocksJava-Basics#memory-management">RocksJava docs</a> for more details.</li>
|
||||
</ol>
|
||||
</dd>
|
||||
</dl>
|
||||
</div></blockquote>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="recommended-configuration-parameters-for-resiliency">
|
||||
<h3><a class="toc-backref" href="#id21">Recommended configuration parameters for resiliency</a><a class="headerlink" href="#recommended-configuration-parameters-for-resiliency" title="Permalink to this headline"></a></h3>
|
||||
<p>There are several Kafka and Kafka Streams configuration options that need to be configured explicitly for resiliency in face of broker failures:</p>
|
||||
<table border="1" class="non-scrolling-table docutils">
|
||||
<thead valign="bottom">
|
||||
<tr class="row-odd"><th class="head">Parameter Name</th>
|
||||
<th class="head">Corresponding Client</th>
|
||||
<th class="head">Default value</th>
|
||||
<th class="head">Consider setting to</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody valign="top">
|
||||
<tr class="row-even"><td>acks</td>
|
||||
<td>Producer</td>
|
||||
<td><code class="docutils literal"><span class="pre">acks=1</span></code></td>
|
||||
<td><code class="docutils literal"><span class="pre">acks=all</span></code></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td>replication.factor</td>
|
||||
<td>Streams</td>
|
||||
<td><code class="docutils literal"><span class="pre">1</span></code></td>
|
||||
<td><code class="docutils literal"><span class="pre">3</span></code></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td>min.insync.replicas</td>
|
||||
<td>Broker</td>
|
||||
<td><code class="docutils literal"><span class="pre">1</span></code></td>
|
||||
<td><code class="docutils literal"><span class="pre">2</span></code></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<p>Increasing the replication factor to 3 ensures that the internal Kafka Streams topics can tolerate up to 2 broker failures. Changing the acks setting to “all”
|
||||
guarantees that a record will not be lost as long as one replica is alive. The tradeoff of moving from the default values to the recommended ones is
|
||||
that some performance is sacrificed and more storage space is used (3x with a replication factor of 3) in exchange for more resiliency.</p>
|
||||
<div class="section" id="acks">
|
||||
<h4><a class="toc-backref" href="#id22">acks</a><a class="headerlink" href="#acks" title="Permalink to this headline"></a></h4>
|
||||
<blockquote>
|
||||
<div><p>The number of acknowledgments that the leader must have received before considering a request complete. This controls
|
||||
the durability of records that are sent. The possible values are:</p>
|
||||
<ul class="simple">
|
||||
<li><code class="docutils literal"><span class="pre">acks=0</span></code> The producer does not wait for acknowledgment from the server and the record is immediately added to the socket buffer and considered sent. No guarantee can be made that the server has received the record in this case, and the <code class="docutils literal"><span class="pre">retries</span></code> configuration will not take effect (as the client won’t generally know of any failures). The offset returned for each record will always be set to <code class="docutils literal"><span class="pre">-1</span></code>.</li>
|
||||
<li><code class="docutils literal"><span class="pre">acks=1</span></code> The leader writes the record to its local log and responds without waiting for full acknowledgement from all followers. If the leader immediately fails after acknowledging the record, but before the followers have replicated it, then the record will be lost.</li>
|
||||
<li><code class="docutils literal"><span class="pre">acks=all</span></code> The leader waits for the full set of in-sync replicas to acknowledge the record. This guarantees that the record will not be lost if there is at least one in-sync replica alive. This is the strongest available guarantee.</li>
|
||||
</ul>
|
||||
<p>For more information, see the <a class="reference external" href="https://kafka.apache.org/documentation/#producerconfigs">Kafka Producer documentation</a>.</p>
|
||||
</div></blockquote>
|
||||
</div>
|
||||
<div class="section" id="id2">
|
||||
<h4><a class="toc-backref" href="#id23">replication.factor</a><a class="headerlink" href="#id2" title="Permalink to this headline"></a></h4>
|
||||
<blockquote>
|
||||
<div>See the <a class="reference internal" href="#replication-factor-parm"><span class="std std-ref">description here</span></a>.</div></blockquote>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">streamsSettings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
|
||||
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">REPLICATION_FACTOR_CONFIG</span><span class="o">,</span> <span class="mi">3</span><span class="o">);</span>
|
||||
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">topicPrefix</span><span class="o">(</span><span class="n">TopicConfig</span><span class="o">.</span><span class="na">MIN_IN_SYNC_REPLICAS_CONFIG</span><span class="o">),</span> <span class="mi">2</span><span class="o">);</span>
|
||||
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">producerPrefix</span><span class="o">(</span><span class="n">ProducerConfig</span><span class="o">.</span><span class="na">ACKS_CONFIG</span><span class="o">),</span> <span class="s">"all"</span><span class="o">);</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<div class="pagination">
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/write-streams" class="pagination__btn pagination__btn__prev">Previous</a>
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/dsl-api" class="pagination__btn pagination__btn__next">Next</a>
|
||||
</div>
|
||||
</script>
|
||||
|
||||
<!--#include virtual="../../../includes/_header.htm" -->
|
||||
<!--#include virtual="../../../includes/_top.htm" -->
|
||||
<div class="content documentation documentation--current">
|
||||
<!--#include virtual="../../../includes/_nav.htm" -->
|
||||
<div class="right">
|
||||
<!--#include virtual="../../../includes/_docs_banner.htm" -->
|
||||
<ul class="breadcrumbs">
|
||||
<li><a href="/documentation">Documentation</a></li>
|
||||
<li><a href="/documentation/streams">Kafka Streams</a></li>
|
||||
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
|
||||
</ul>
|
||||
<div class="p-content"></div>
|
||||
</div>
|
||||
</div>
|
||||
<!--#include virtual="../../../includes/_footer.htm" -->
|
||||
<script>
|
||||
$(function() {
|
||||
// Show selected style on nav item
|
||||
$('.b-nav__streams').addClass('selected');
|
||||
|
||||
//sticky secondary nav
|
||||
var $navbar = $(".sub-nav-sticky"),
|
||||
y_pos = $navbar.offset().top,
|
||||
height = $navbar.height();
|
||||
|
||||
$(window).scroll(function() {
|
||||
var scrollTop = $(window).scrollTop();
|
||||
|
||||
if (scrollTop > y_pos - height) {
|
||||
$navbar.addClass("navbar-fixed")
|
||||
} else if (scrollTop <= y_pos) {
|
||||
$navbar.removeClass("navbar-fixed")
|
||||
}
|
||||
});
|
||||
|
||||
// Display docs subnav items
|
||||
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
|
||||
});
|
||||
</script>
|
||||
239
docs/streams/developer-guide/datatypes.html
Normal file
239
docs/streams/developer-guide/datatypes.html
Normal file
@@ -0,0 +1,239 @@
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
contributor license agreements. See the NOTICE file distributed with
|
||||
this work for additional information regarding copyright ownership.
|
||||
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
(the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
<script><!--#include virtual="../../js/templateData.js" --></script>
|
||||
|
||||
<script id="content-template" type="text/x-handlebars-template">
|
||||
<!-- h1>Developer Guide for Kafka Streams</h1 -->
|
||||
<div class="sub-nav-sticky">
|
||||
<div class="sticky-top">
|
||||
<!-- div style="height:35px">
|
||||
<a href="/{{version}}/documentation/streams/">Introduction</a>
|
||||
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
|
||||
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
|
||||
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
|
||||
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
|
||||
</div -->
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="section" id="data-types-and-serialization">
|
||||
<span id="streams-developer-guide-serdes"></span><h1>Data Types and Serialization<a class="headerlink" href="#data-types-and-serialization" title="Permalink to this headline"></a></h1>
|
||||
<p>Every Kafka Streams application must provide SerDes (Serializer/Deserializer) for the data types of record keys and record values (e.g. <code class="docutils literal"><span class="pre">java.lang.String</span></code>) to materialize the data when necessary. Operations that require such SerDes information include: <code class="docutils literal"><span class="pre">stream()</span></code>, <code class="docutils literal"><span class="pre">table()</span></code>, <code class="docutils literal"><span class="pre">to()</span></code>, <code class="docutils literal"><span class="pre">through()</span></code>, <code class="docutils literal"><span class="pre">groupByKey()</span></code>, <code class="docutils literal"><span class="pre">groupBy()</span></code>.</p>
|
||||
<p>You can provide SerDes by using either of these methods:</p>
|
||||
<ul class="simple">
|
||||
<li>By setting default SerDes in the <code class="docutils literal"><span class="pre">java.util.Properties</span></code> config instance.</li>
|
||||
<li>By specifying explicit SerDes when calling the appropriate API methods, thus overriding the defaults.</li>
|
||||
</ul>
|
||||
|
||||
<p class="topic-title first"><b>Table of Contents</b></p>
|
||||
<ul class="simple">
|
||||
<li><a class="reference internal" href="#configuring-serdes" id="id1">Configuring SerDes</a></li>
|
||||
<li><a class="reference internal" href="#overriding-default-serdes" id="id2">Overriding default SerDes</a></li>
|
||||
<li><a class="reference internal" href="#available-serdes" id="id3">Available SerDes</a></li>
|
||||
<ul>
|
||||
<li><a class="reference internal" href="#primitive-and-basic-types" id="id4">Primitive and basic types</a></li>
|
||||
<li><a class="reference internal" href="#json" id="id6">JSON</a></li>
|
||||
<li><a class="reference internal" href="#implementing-custom-serdes" id="id5">Implementing custom serdes</a></li>
|
||||
</ul>
|
||||
<li><a class="reference internal" href="#scala-dsl-serdes" id="id8">Kafka Streams DSL for Scala Implicit SerDes</a></li>
|
||||
</ul>
|
||||
<div class="section" id="configuring-serdes">
|
||||
<h2>Configuring SerDes<a class="headerlink" href="#configuring-serdes" title="Permalink to this headline"></a></h2>
|
||||
<p>SerDes specified in the Streams configuration are used as the default in your Kafka Streams application.</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.common.serialization.Serdes</span><span class="o">;</span>
|
||||
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.StreamsConfig</span><span class="o">;</span>
|
||||
|
||||
<span class="n">Properties</span> <span class="n">settings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
|
||||
<span class="c1">// Default serde for keys of data records (here: built-in serde for String type)</span>
|
||||
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">DEFAULT_KEY_SERDE_CLASS_CONFIG</span><span class="o">,</span> <span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">().</span><span class="na">getClass</span><span class="o">().</span><span class="na">getName</span><span class="o">());</span>
|
||||
<span class="c1">// Default serde for values of data records (here: built-in serde for Long type)</span>
|
||||
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">DEFAULT_VALUE_SERDE_CLASS_CONFIG</span><span class="o">,</span> <span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">().</span><span class="na">getClass</span><span class="o">().</span><span class="na">getName</span><span class="o">());</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="overriding-default-serdes">
|
||||
<h2>Overriding default SerDes<a class="headerlink" href="#overriding-default-serdes" title="Permalink to this headline"></a></h2>
|
||||
<p>You can also specify SerDes explicitly by passing them to the appropriate API methods, which overrides the default serde settings:</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.common.serialization.Serde</span><span class="o">;</span>
|
||||
<span class="kn">import</span> <span class="nn">org.apache.kafka.common.serialization.Serdes</span><span class="o">;</span>
|
||||
|
||||
<span class="kd">final</span> <span class="n">Serde</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">stringSerde</span> <span class="o">=</span> <span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">();</span>
|
||||
<span class="kd">final</span> <span class="n">Serde</span><span class="o"><</span><span class="n">Long</span><span class="o">></span> <span class="n">longSerde</span> <span class="o">=</span> <span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">();</span>
|
||||
|
||||
<span class="c1">// The stream userCountByRegion has type `String` for record keys (for region)</span>
|
||||
<span class="c1">// and type `Long` for record values (for user counts).</span>
|
||||
<span class="n">KStream</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">></span> <span class="n">userCountByRegion</span> <span class="o">=</span> <span class="o">...;</span>
|
||||
<span class="n">userCountByRegion</span><span class="o">.</span><span class="na">to</span><span class="o">(</span><span class="s">"RegionCountsTopic"</span><span class="o">,</span> <span class="n">Produced</span><span class="o">.</span><span class="na">with</span><span class="o">(</span><span class="n">stringSerde</span><span class="o">,</span> <span class="n">longSerde</span><span class="o">));</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>If you want to override serdes selectively, i.e., keep the defaults for some fields, then don’t specify the serde whenever you want to leverage the default settings:</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.common.serialization.Serde</span><span class="o">;</span>
|
||||
<span class="kn">import</span> <span class="nn">org.apache.kafka.common.serialization.Serdes</span><span class="o">;</span>
|
||||
|
||||
<span class="c1">// Use the default serializer for record keys (here: region as String) by not specifying the key serde,</span>
|
||||
<span class="c1">// but override the default serializer for record values (here: userCount as Long).</span>
|
||||
<span class="kd">final</span> <span class="n">Serde</span><span class="o"><</span><span class="n">Long</span><span class="o">></span> <span class="n">longSerde</span> <span class="o">=</span> <span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">();</span>
|
||||
<span class="n">KStream</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">></span> <span class="n">userCountByRegion</span> <span class="o">=</span> <span class="o">...;</span>
|
||||
<span class="n">userCountByRegion</span><span class="o">.</span><span class="na">to</span><span class="o">(</span><span class="s">"RegionCountsTopic"</span><span class="o">,</span> <span class="n">Produced</span><span class="o">.</span><span class="na">valueSerde</span><span class="o">(</span><span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">()));</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>If some of your incoming records are corrupted or ill-formatted, they will cause the deserializer class to report an error.
|
||||
Since 1.0.x we have introduced a <code>DeserializationExceptionHandler</code> interface, which allows
|
||||
you to customize how to handle such records. The customized implementation of the interface can be specified via the <code>StreamsConfig</code>.
|
||||
For more details, please feel free to read the <a href="config-streams.html#default-deserialization-exception-handler">Configuring a Streams Application</a> section.
|
||||
</p>
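<p>For example, a minimal sketch of configuring the built-in <code>LogAndContinueExceptionHandler</code>, which logs and skips records that cannot be deserialized instead of failing the application:</p>
<pre>
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.LogAndContinueExceptionHandler;

Properties settings = new Properties();
// Skip (and log) corrupted records rather than stopping processing.
settings.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
             LogAndContinueExceptionHandler.class);
</pre>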
|
||||
</div>
|
||||
<div class="section" id="available-serdes">
|
||||
<span id="streams-developer-guide-serdes-available"></span><h2>Available SerDes<a class="headerlink" href="#available-serdes" title="Permalink to this headline"></a></h2>
|
||||
<div class="section" id="primitive-and-basic-types">
|
||||
<h3>Primitive and basic types<a class="headerlink" href="#primitive-and-basic-types" title="Permalink to this headline"></a></h3>
|
||||
<p>Apache Kafka includes several built-in serde implementations for Java primitives and basic types such as <code class="docutils literal"><span class="pre">byte[]</span></code> in
|
||||
its <code class="docutils literal"><span class="pre">kafka-clients</span></code> Maven artifact:</p>
|
||||
<div class="highlight-xml"><div class="highlight"><pre><span></span><span class="nt"><dependency></span>
|
||||
<span class="nt"><groupId></span>org.apache.kafka<span class="nt"></groupId></span>
|
||||
<span class="nt"><artifactId></span>kafka-clients<span class="nt"></artifactId></span>
|
||||
<span class="nt"><version></span>{{fullDotVersion}}<span class="nt"></version></span>
|
||||
<span class="nt"></dependency></span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>This artifact provides the following serde implementations under the package <a class="reference external" href="https://github.com/apache/kafka/blob/{{dotVersion}}/clients/src/main/java/org/apache/kafka/common/serialization">org.apache.kafka.common.serialization</a>, which you can leverage when e.g., defining default serializers in your Streams configuration.</p>
|
||||
<table border="1" class="docutils">
|
||||
<colgroup>
|
||||
<col width="17%" />
|
||||
<col width="83%" />
|
||||
</colgroup>
|
||||
<thead valign="bottom">
|
||||
<tr class="row-odd"><th class="head">Data type</th>
|
||||
<th class="head">Serde</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody valign="top">
|
||||
<tr class="row-even"><td>byte[]</td>
|
||||
<td><code class="docutils literal"><span class="pre">Serdes.ByteArray()</span></code>, <code class="docutils literal"><span class="pre">Serdes.Bytes()</span></code> (see tip below)</td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td>ByteBuffer</td>
|
||||
<td><code class="docutils literal"><span class="pre">Serdes.ByteBuffer()</span></code></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td>Double</td>
|
||||
<td><code class="docutils literal"><span class="pre">Serdes.Double()</span></code></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td>Integer</td>
|
||||
<td><code class="docutils literal"><span class="pre">Serdes.Integer()</span></code></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td>Long</td>
|
||||
<td><code class="docutils literal"><span class="pre">Serdes.Long()</span></code></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td>String</td>
|
||||
<td><code class="docutils literal"><span class="pre">Serdes.String()</span></code></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td>UUID</td>
|
||||
<td><code class="docutils literal"><span class="pre">Serdes.UUID()</span></code></td>
|
||||
</tr>
|
||||
|
||||
<tr class="row-odd"><td>Void</td>
|
||||
<td><code class="docutils literal"><span class="pre">Serdes.Void()</span></code></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<div class="admonition tip">
|
||||
<p><b>Tip</b></p>
|
||||
<p class="last"><a class="reference external" href="https://github.com/apache/kafka/blob/{{dotVersion}}/clients/src/main/java/org/apache/kafka/common/utils/Bytes.java">Bytes</a> is a wrapper for Java’s <code class="docutils literal"><span class="pre">byte[]</span></code> (byte array) that supports proper equality and ordering semantics. You may want to consider using <code class="docutils literal"><span class="pre">Bytes</span></code> instead of <code class="docutils literal"><span class="pre">byte[]</span></code> in your applications.</p>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="json">
|
||||
<h3>JSON<a class="headerlink" href="#json" title="Permalink to this headline"></a></h3>
|
||||
<p>The Kafka Streams code examples also include a basic serde implementation for JSON:</p>
|
||||
<ul class="simple">
|
||||
<li><a class="reference external" href="https://github.com/apache/kafka/blob/{{dotVersion}}/streams/examples/src/main/java/org/apache/kafka/streams/examples/pageview/PageViewTypedDemo.java#L83">PageViewTypedDemo</a></li>
|
||||
</ul>
|
||||
<p>As shown in the example, you can combine the serializer and deserializer inner classes with <code class="docutils literal"><span class="pre">Serdes.serdeFrom(&lt;serializerInstance&gt;, &lt;deserializerInstance&gt;)</span></code> to construct JSON-compatible serializers and deserializers.
|
||||
</p>
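<p>A minimal sketch (here <code>PageView</code> is a hypothetical POJO, and the serializer/deserializer classes stand in for the JSON helper classes shown in the linked example):</p>
<pre>
// Wrap an existing serializer/deserializer pair into a Serde for use with the DSL.
Serializer<PageView> serializer = new JsonPOJOSerializer<>();       // configured with the target class in the example
Deserializer<PageView> deserializer = new JsonPOJODeserializer<>(); // configured with the target class in the example
Serde<PageView> pageViewSerde = Serdes.serdeFrom(serializer, deserializer);
</pre>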
|
||||
</div>
|
||||
<div class="section" id="implementing-custom-serdes">
|
||||
<span id="streams-developer-guide-serdes-custom"></span><h2>Implementing custom SerDes<a class="headerlink" href="#implementing-custom-serdes" title="Permalink to this headline"></a></h2>
|
||||
<p>If you need to implement custom SerDes, your best starting point is to take a look at the source code references of
|
||||
existing SerDes (see previous section). Typically, your workflow will be similar to:</p>
|
||||
<ol class="arabic simple">
|
||||
<li>Write a <em>serializer</em> for your data type <code class="docutils literal"><span class="pre">T</span></code> by implementing
|
||||
<a class="reference external" href="https://github.com/apache/kafka/blob/{{dotVersion}}/clients/src/main/java/org/apache/kafka/common/serialization/Serializer.java">org.apache.kafka.common.serialization.Serializer</a>.</li>
|
||||
<li>Write a <em>deserializer</em> for <code class="docutils literal"><span class="pre">T</span></code> by implementing
|
||||
<a class="reference external" href="https://github.com/apache/kafka/blob/{{dotVersion}}/clients/src/main/java/org/apache/kafka/common/serialization/Deserializer.java">org.apache.kafka.common.serialization.Deserializer</a>.</li>
|
||||
<li>Write a <em>serde</em> for <code class="docutils literal"><span class="pre">T</span></code> by implementing
|
||||
<a class="reference external" href="https://github.com/apache/kafka/blob/{{dotVersion}}/clients/src/main/java/org/apache/kafka/common/serialization/Serde.java">org.apache.kafka.common.serialization.Serde</a>,
|
||||
which you either do manually (see existing SerDes in the previous section) or by leveraging helper functions in
|
||||
<a class="reference external" href="https://github.com/apache/kafka/blob/{{dotVersion}}/clients/src/main/java/org/apache/kafka/common/serialization/Serdes.java">Serdes</a>
|
||||
such as <code class="docutils literal"><span class="pre">Serdes.serdeFrom(Serializer<T>, Deserializer<T>)</span></code>.
|
||||
Note that you will need to implement your own class (that has no generic types) if you want to use your custom serde in the configuration provided to <code class="docutils literal"><span class="pre">KafkaStreams</span></code>.
|
||||
If your serde class has generic types or you use <code class="docutils literal"><span class="pre">Serdes.serdeFrom(Serializer<T>, Deserializer<T>)</span></code>, you can pass your serde only
|
||||
via method calls (for example <code class="docutils literal"><span class="pre">builder.stream("topicName", Consumed.with(...))</span></code>), as illustrated in the sketch below.</li>
|
||||
</ol>
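<p>A minimal sketch of the workflow above, assuming a hypothetical <code>Price</code> data type (the conversion logic is a placeholder; on older client versions you may also need to override <code>configure()</code> and <code>close()</code>):</p>
<pre>
// Step 1: a serializer for Price.
public class PriceSerializer implements Serializer<Price> {
    @Override
    public byte[] serialize(final String topic, final Price data) {
        return data == null ? null : data.toBytes();        // placeholder conversion
    }
}

// Step 2: a deserializer for Price.
public class PriceDeserializer implements Deserializer<Price> {
    @Override
    public Price deserialize(final String topic, final byte[] bytes) {
        return bytes == null ? null : Price.fromBytes(bytes); // placeholder conversion
    }
}

// Step 3a: wrap the pair via the helper (usable only via method calls, not in the configuration) ...
Serde<Price> priceSerde = Serdes.serdeFrom(new PriceSerializer(), new PriceDeserializer());

// Step 3b: ... or define a dedicated serde class without generic types, usable as a default serde in the configuration.
public class PriceSerde implements Serde<Price> {
    @Override
    public Serializer<Price> serializer() { return new PriceSerializer(); }
    @Override
    public Deserializer<Price> deserializer() { return new PriceDeserializer(); }
}
</pre>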
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="scala-dsl-serdes">
|
||||
<h2>Kafka Streams DSL for Scala Implicit SerDes<a class="headerlink" href="#scala-dsl-serdes" title="Permalink to this headline"></a></h2>
|
||||
<p>When using the <a href="dsl-api.html#scala-dsl">Kafka Streams DSL for Scala</a>, you're not required to configure default SerDes. In fact, it's not supported. SerDes are instead provided implicitly by default implementations for common primitive data types. See the <a href="dsl-api.html#scala-dsl-implicit-serdes">Implicit SerDes</a> and <a href="dsl-api.html#scala-dsl-user-defined-serdes">User-Defined SerDes</a> sections in the DSL API documentation for details.</p>
|
||||
</div>
|
||||
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<div class="pagination">
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/processor-api" class="pagination__btn pagination__btn__prev">Previous</a>
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/testing" class="pagination__btn pagination__btn__next">Next</a>
|
||||
</div>
|
||||
</script>
|
||||
|
||||
<!--#include virtual="../../../includes/_header.htm" -->
|
||||
<!--#include virtual="../../../includes/_top.htm" -->
|
||||
<div class="content documentation documentation--current">
|
||||
<!--#include virtual="../../../includes/_nav.htm" -->
|
||||
<div class="right">
|
||||
<!--#include virtual="../../../includes/_docs_banner.htm" -->
|
||||
<ul class="breadcrumbs">
|
||||
<li><a href="/documentation">Documentation</a></li>
|
||||
<li><a href="/documentation/streams">Kafka Streams</a></li>
|
||||
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
|
||||
</ul>
|
||||
<div class="p-content"></div>
|
||||
</div>
|
||||
</div>
|
||||
<!--#include virtual="../../../includes/_footer.htm" -->
|
||||
<script>
|
||||
$(function() {
|
||||
// Show selected style on nav item
|
||||
$('.b-nav__streams').addClass('selected');
|
||||
|
||||
//sticky secondary nav
|
||||
var $navbar = $(".sub-nav-sticky"),
|
||||
y_pos = $navbar.offset().top,
|
||||
height = $navbar.height();
|
||||
|
||||
$(window).scroll(function() {
|
||||
var scrollTop = $(window).scrollTop();
|
||||
|
||||
if (scrollTop > y_pos - height) {
|
||||
$navbar.addClass("navbar-fixed")
|
||||
} else if (scrollTop <= y_pos) {
|
||||
$navbar.removeClass("navbar-fixed")
|
||||
}
|
||||
});
|
||||
|
||||
// Display docs subnav items
|
||||
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
|
||||
});
|
||||
</script>
|
||||
3962
docs/streams/developer-guide/dsl-api.html
Normal file
3962
docs/streams/developer-guide/dsl-api.html
Normal file
File diff suppressed because it is too large
Load Diff
370
docs/streams/developer-guide/dsl-topology-naming.html
Normal file
370
docs/streams/developer-guide/dsl-topology-naming.html
Normal file
@@ -0,0 +1,370 @@
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
contributor license agreements. See the NOTICE file distributed with
|
||||
this work for additional information regarding copyright ownership.
|
||||
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
(the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
<script><!--#include virtual="../../js/templateData.js" --></script>
|
||||
|
||||
<script id="content-template" type="text/x-handlebars-template">
|
||||
<!-- h1>Developer Guide for Kafka Streams</h1 -->
|
||||
<div class="sub-nav-sticky">
|
||||
<div class="sticky-top">
|
||||
<!-- div style="height:35px">
|
||||
<a href="/{{version}}/documentation/streams/">Introduction</a>
|
||||
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
|
||||
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
|
||||
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
|
||||
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
|
||||
</div -->
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="section" id="naming">
|
||||
<span id="streams-developer-guide-dsl-topology-naming"></span>
|
||||
<h1>Naming Operators in a Kafka Streams DSL Application<a class="headerlink" href="#naming" title="Permalink to this headline"></a></h1>
|
||||
|
||||
<p>
|
||||
You can now give names to processors when using the Kafka Streams DSL.
|
||||
In the Processor API (PAPI), there are <code>Processors</code> and <code>State Stores</code>, and
|
||||
you are required to explicitly name each one.
|
||||
</p>
|
||||
<p>
|
||||
At the DSL layer, there are operators. A single DSL operator may
|
||||
compile down to multiple <code>Processors</code> and <code>State Stores</code>, and
|
||||
if required, <code>repartition topics</code>. But with the Kafka Streams
|
||||
DSL, all these names are generated for you. There is a relationship between
|
||||
the generated processor names, state store names (and hence changelog topic names), and repartition
|
||||
topic names. Note that the names of state stores and changelog/repartition topics
|
||||
are "stateful" while processor names are "stateless".
|
||||
</p>
|
||||
<p>
|
||||
This distinction
|
||||
of stateful vs. stateless names has important implications when updating your topology.
|
||||
While the internal naming makes creating
|
||||
a topology with the DSL much more straightforward,
|
||||
there are a couple of trade-offs. The first trade-off is what we could
|
||||
consider a readability issue. The other
|
||||
more severe trade-off is the shifting of names due to the relationship between the
|
||||
DSL operator and the generated <code>Processors</code>, <code>State Stores</code> changelog
|
||||
topics and repartition topics.
|
||||
</p>
|
||||
|
||||
|
||||
<h2>Readability Issues</h2>
|
||||
|
||||
<p>
|
||||
By saying there is a readability trade-off, we are referring to viewing a description of the topology.
|
||||
When you render the string description of your topology via the <code>Topology#describe()</code>
|
||||
method, you can see what the processor is, but you don't have any context for its business purpose.
|
||||
For example, consider the following simple topology:
|
||||
|
||||
<br/>
|
||||
<pre>
|
||||
KStream<String,String> stream = builder.stream("input");
|
||||
stream.filter((k,v) -> !v.equals("invalid_txn"))
|
||||
.mapValues((v) -> v.substring(0,5))
|
||||
.to("output")
|
||||
</pre>
|
||||
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Running <code>Topology#describe()</code> yields this string:
|
||||
|
||||
<pre>
|
||||
Topologies:
|
||||
Sub-topology: 0
|
||||
Source: KSTREAM-SOURCE-0000000000 (topics: [input])
|
||||
--> KSTREAM-FILTER-0000000001
|
||||
Processor: KSTREAM-FILTER-0000000001 (stores: [])
|
||||
--> KSTREAM-MAPVALUES-0000000002
|
||||
<-- KSTREAM-SOURCE-0000000000
|
||||
Processor: KSTREAM-MAPVALUES-0000000002 (stores: [])
|
||||
--> KSTREAM-SINK-0000000003
|
||||
<-- KSTREAM-FILTER-0000000001
|
||||
Sink: KSTREAM-SINK-0000000003 (topic: output)
|
||||
<-- KSTREAM-MAPVALUES-0000000002
|
||||
</pre>
|
||||
|
||||
From this report, you can see what the different operators are, but what is the broader context here?
|
||||
For example, consider <code>KSTREAM-FILTER-0000000001</code>: we can see that it's a
|
||||
filter operation, which means that records that don't match the given predicate are dropped. But what is
|
||||
the meaning of the predicate? Additionally, you can see the topic names of the source and sink nodes,
|
||||
but what if the topics aren't named in a meaningful way? Then you're left to guess the
|
||||
business purpose behind these topics.
|
||||
</p>
|
||||
<p>
|
||||
Also notice the numbering here: the source node is suffixed with <code>0000000000</code>
|
||||
indicating it's the first processor in the topology.
|
||||
The filter is suffixed with <code>0000000001</code>, indicating it's the second processor in
|
||||
the topology. In Kafka Streams, there are now overloaded methods for
|
||||
both <code>KStream</code> and <code>KTable</code> that accept
|
||||
a new parameter <code>Named</code>. By using the <code>Named</code> class DSL users can
|
||||
provide meaningful names to the processors in their topology.
|
||||
</p>
|
||||
<p>
|
||||
Now let's take a look at your topology with all the processors named:
|
||||
<pre>
|
||||
KStream<String,String> stream =
|
||||
builder.stream("input", Consumed.as("Customer_transactions_input_topic"));
|
||||
stream.filter((k,v) -> !v.equals("invalid_txn"), Named.as("filter_out_invalid_txns"))
|
||||
.mapValues((v) -> v.substring(0,5), Named.as("Map_values_to_first_6_characters"))
|
||||
.to("output", Produced.as("Mapped_transactions_output_topic"));
|
||||
</pre>
|
||||
|
||||
<pre>
|
||||
Topologies:
|
||||
Sub-topology: 0
|
||||
Source: Customer_transactions_input_topic (topics: [input])
|
||||
--> filter_out_invalid_txns
|
||||
Processor: filter_out_invalid_txns (stores: [])
|
||||
--> Map_values_to_first_6_characters
|
||||
<-- Customer_transactions_input_topic
|
||||
Processor: Map_values_to_first_6_characters (stores: [])
|
||||
--> Mapped_transactions_output_topic
|
||||
<-- filter_out_invalid_txns
|
||||
Sink: Mapped_transactions_output_topic (topic: output)
|
||||
<-- Map_values_to_first_6_characters
|
||||
</pre>
|
||||
|
||||
Now you can look at the topology description and easily understand what role each processor
|
||||
plays in the topology. But there's another reason for naming your processor nodes when you
|
||||
have stateful operators that remain between restarts of your Kafka Streams applications:
|
||||
state stores, changelog topics, and repartition topics.
|
||||
</p>
|
||||
|
||||
<h2>Changing Names</h2>
|
||||
<p>
|
||||
Generated names are numbered where they are built in the topology.
|
||||
The name generation strategy is
|
||||
<code>&lt;KSTREAM|KTABLE&gt;-&lt;operator name&gt;-&lt;number suffix&gt;</code>. The number is a
|
||||
globally incrementing number that represents the operator's order in the topology.
|
||||
The generated number is prefixed with a varying number of "0"s to create a
|
||||
string that is consistently 10 characters long.
|
||||
This means that if you add/remove or shift the order of operations, the position of the
|
||||
processor shifts, which shifts the name of the processor. Since <strong>most</strong> processors exist
|
||||
in memory only, this name shifting presents no issue for many topologies. But the name
|
||||
shifting does have implications for topologies with stateful operators or repartition topics.
|
||||
|
||||
Here's a different topology with some state:
|
||||
<pre>
|
||||
KStream<String,String> stream = builder.stream("input");
|
||||
stream.groupByKey()
|
||||
.count()
|
||||
.toStream()
|
||||
.to("output");
|
||||
</pre>
|
||||
This topology description yields the following:
|
||||
<pre>
|
||||
Topologies:
|
||||
Sub-topology: 0
|
||||
Source: KSTREAM-SOURCE-0000000000 (topics: [input])
|
||||
--> KSTREAM-AGGREGATE-0000000002
|
||||
Processor: KSTREAM-AGGREGATE-0000000002 (stores: [KSTREAM-AGGREGATE-STATE-STORE-0000000001])
|
||||
--> KTABLE-TOSTREAM-0000000003
|
||||
<-- KSTREAM-SOURCE-0000000000
|
||||
Processor: KTABLE-TOSTREAM-0000000003 (stores: [])
|
||||
--> KSTREAM-SINK-0000000004
|
||||
<-- KSTREAM-AGGREGATE-0000000002
|
||||
Sink: KSTREAM-SINK-0000000004 (topic: output)
|
||||
<-- KTABLE-TOSTREAM-0000000003
|
||||
</pre>
|
||||
</p>
|
||||
<p>
|
||||
You can see from the topology description above that the state store is named
|
||||
<code>KSTREAM-AGGREGATE-STATE-STORE-0000000001</code>. Here's what happens when you
|
||||
add a filter to keep some of the records out of the aggregation:
|
||||
<pre>
|
||||
KStream<String,String> stream = builder.stream("input");
|
||||
stream.filter((k, v) -> v != null && v.length() >= 6)
|
||||
.groupByKey()
|
||||
.count()
|
||||
.toStream()
|
||||
.to("output");
|
||||
</pre>
|
||||
|
||||
And the corresponding topology:
|
||||
<pre>
|
||||
Topologies:
|
||||
Sub-topology: 0
|
||||
Source: KSTREAM-SOURCE-0000000000 (topics: [input])
|
||||
--> KSTREAM-FILTER-0000000001
|
||||
Processor: KSTREAM-FILTER-0000000001 (stores: [])
|
||||
--> KSTREAM-AGGREGATE-0000000003
|
||||
<-- KSTREAM-SOURCE-0000000000
|
||||
Processor: KSTREAM-AGGREGATE-0000000003 (stores: [KSTREAM-AGGREGATE-STATE-STORE-0000000002])
|
||||
--> KTABLE-TOSTREAM-0000000004
|
||||
<-- KSTREAM-FILTER-0000000001
|
||||
Processor: KTABLE-TOSTREAM-0000000004 (stores: [])
|
||||
--> KSTREAM-SINK-0000000005
|
||||
<-- KSTREAM-AGGREGATE-0000000003
|
||||
Sink: KSTREAM-SINK-0000000005 (topic: output)
|
||||
<-- KTABLE-TOSTREAM-0000000004
|
||||
</pre>
|
||||
</p>
|
||||
<p>
|
||||
Notice that since you've added an operation <em>before</em> the <code>count</code> operation, the state
|
||||
store (and the changelog topic) names have changed. This name change means you can't
|
||||
do a rolling re-deployment of your updated topology. Also, you must use the
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/app-reset-tool">Streams Reset Tool</a>
|
||||
to re-calculate the aggregations, because the changelog topic name has changed and, on start-up, the
|
||||
new changelog topic contains no data.
|
||||
|
||||
Fortunately, there's an easy solution to remedy this situation. Give the
|
||||
state store a user-defined name instead of relying on the generated one,
|
||||
so you don't have to worry about topology changes shifting the name of the state store.
|
||||
|
||||
You've had the ability to name repartition topics with the <code>Joined</code>,
|
||||
<code>StreamJoined</code>, and <code>Grouped</code> classes, and
|
||||
name state stores and changelog topics with <code>Materialized</code>.
|
||||
But it's worth reiterating the importance of naming these DSL topology operations.
|
||||
|
||||
Here's how your DSL code looks when you give a specific name to your state store:
|
||||
<pre>
|
||||
KStream<String,String> stream = builder.stream("input");
|
||||
stream.filter((k, v) -> v != null && v.length() >= 6)
|
||||
.groupByKey()
|
||||
.count(Materialized.as("Purchase_count_store"))
|
||||
.toStream()
|
||||
.to("output");
|
||||
</pre>
|
||||
|
||||
And here's the topology:
|
||||
|
||||
<pre>
|
||||
Topologies:
|
||||
Sub-topology: 0
|
||||
Source: KSTREAM-SOURCE-0000000000 (topics: [input])
|
||||
--> KSTREAM-FILTER-0000000001
|
||||
Processor: KSTREAM-FILTER-0000000001 (stores: [])
|
||||
--> KSTREAM-AGGREGATE-0000000002
|
||||
<-- KSTREAM-SOURCE-0000000000
|
||||
Processor: KSTREAM-AGGREGATE-0000000002 (stores: [Purchase_count_store])
|
||||
--> KTABLE-TOSTREAM-0000000003
|
||||
<-- KSTREAM-FILTER-0000000001
|
||||
Processor: KTABLE-TOSTREAM-0000000003 (stores: [])
|
||||
--> KSTREAM-SINK-0000000004
|
||||
<-- KSTREAM-AGGREGATE-0000000002
|
||||
Sink: KSTREAM-SINK-0000000004 (topic: output)
|
||||
<-- KTABLE-TOSTREAM-0000000003
|
||||
</pre>
|
||||
</p>
|
||||
<p>
|
||||
Now, even though you've added processors before your state store, the store name and its changelog
|
||||
topic names don't change. This makes your topology more robust and resilient to changes made by
|
||||
adding or removing processors.
|
||||
</p>
|
||||
|
||||
<h2>Conclusion</h2>
|
||||
|
||||
It's a good practice to name your processing nodes when using the DSL, and it's even
|
||||
more important to do this when you have "stateful" processors
|
||||
in your application, such as repartition
|
||||
topics and state stores (and the accompanying changelog topics).
|
||||
<p>
|
||||
Here are a couple of points to remember when naming your DSL topology:
|
||||
<ol>
|
||||
<li>
|
||||
If you have an <em>existing topology</em> and you <em>haven't</em> named your
|
||||
state stores (and changelog topics) and repartition topics, we recommend that you
|
||||
do so. But this will be a topology breaking change, so you'll need to shut down all
|
||||
application instances, make the changes, and run the
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/app-reset-tool">Streams Reset Tool</a>.
|
||||
Although this may be inconvenient at first, it's worth the effort to protect your application from
|
||||
unexpected errors due to topology changes.
|
||||
</li>
|
||||
<li>
|
||||
If you have a <em>new topology</em>, make sure you name the persistent parts of your topology:
|
||||
state stores (changelog topics) and repartition topics. This way, when you deploy your
|
||||
application, you're protected from topology changes that otherwise would break your Kafka Streams application.
|
||||
If you don't want to add names to stateless processors at first, that's fine as you can
|
||||
always go back and add the names later.
|
||||
</li>
|
||||
</ol>
|
||||
|
||||
Here's a quick reference on naming the critical parts of
|
||||
your Kafka Streams application to prevent topology name changes from breaking your application:
|
||||
|
||||
<table>
|
||||
<tr>
|
||||
<th>Operation</th><th>Naming Class</th>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Aggregation repartition topics</td><td>Grouped</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>KStream-KStream Join repartition topics</td><td>StreamJoined</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>KStream-KTable Join repartition topic</td><td>Joined</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>KStream-KStream Join state stores</td><td>StreamJoined</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>State Stores (for aggregations and KTable-KTable joins)</td><td>Materialized</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Stream/Table non-stateful operations</td><td>Named</td>
|
||||
</tr>
|
||||
</table>
|
||||
</p>
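<p>As a compact, illustrative sketch that combines these naming classes (the topic names, store names, and the <code>Purchase</code> type are placeholders):</p>
<pre>
KStream<String, Purchase> purchases =
    builder.stream("purchases", Consumed.as("Purchase_source"));          // names the source node

purchases.groupByKey(Grouped.as("Purchase_grouping"))                     // names the aggregation repartition topic
         .count(Materialized.as("Purchase_count_store"))                  // names the state store and its changelog topic
         .toStream(Named.as("Purchase_counts_to_stream"))                 // names a stateless operator
         .to("purchase-counts", Produced.as("Purchase_counts_sink"));     // names the sink node
</pre>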
|
||||
</div>
|
||||
|
||||
</script>
|
||||
|
||||
<!--#include virtual="../../../includes/_header.htm" -->
|
||||
<!--#include virtual="../../../includes/_top.htm" -->
|
||||
<div class="content documentation documentation--current">
|
||||
<!--#include virtual="../../../includes/_nav.htm" -->
|
||||
<div class="right">
|
||||
<!--#include virtual="../../../includes/_docs_banner.htm" -->
|
||||
<ul class="breadcrumbs">
|
||||
<li><a href="/documentation">Documentation</a></li>
|
||||
<li><a href="/documentation/streams">Kafka Streams</a></li>
|
||||
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
|
||||
</ul>
|
||||
<div class="p-content"></div>
|
||||
</div>
|
||||
</div>
|
||||
<!--#include virtual="../../../includes/_footer.htm" -->
|
||||
<script>
|
||||
$(function () {
|
||||
// Show selected style on nav item
|
||||
$('.b-nav__streams').addClass('selected');
|
||||
|
||||
//sticky secondary nav
|
||||
var $navbar = $(".sub-nav-sticky"),
|
||||
y_pos = $navbar.offset().top,
|
||||
height = $navbar.height();
|
||||
|
||||
$(window).scroll(function () {
|
||||
var scrollTop = $(window).scrollTop();
|
||||
|
||||
if (scrollTop > y_pos - height) {
|
||||
$navbar.addClass("navbar-fixed")
|
||||
} else if (scrollTop <= y_pos) {
|
||||
$navbar.removeClass("navbar-fixed")
|
||||
}
|
||||
});
|
||||
|
||||
// Display docs subnav items
|
||||
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
|
||||
});
|
||||
</script>
|
||||
|
||||
|
||||
|
||||
|
||||
106
docs/streams/developer-guide/index.html
Normal file
106
docs/streams/developer-guide/index.html
Normal file
@@ -0,0 +1,106 @@
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
contributor license agreements. See the NOTICE file distributed with
|
||||
this work for additional information regarding copyright ownership.
|
||||
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
(the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
<script><!--#include virtual="../../js/templateData.js" --></script>
|
||||
|
||||
<script id="content-template" type="text/x-handlebars-template">
|
||||
<h1>Developer Guide for Kafka Streams</h1>
|
||||
<div class="sub-nav-sticky">
|
||||
<div class="sticky-top">
|
||||
<div style="height:35px">
|
||||
<a href="/{{version}}/documentation/streams/">Introduction</a>
|
||||
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
|
||||
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
|
||||
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
|
||||
<a href="/{{version}}/documentation/streams/architecture">Architecture</a>
|
||||
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide/">Developer Guide</a>
|
||||
<a href="/{{version}}/documentation/streams/upgrade-guide">Upgrade</a>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
<div class="section" id="developer-guide">
|
||||
<!-- span id="streams-developer-guide"></span><h1>Developer Guide<a class="headerlink" href="#developer-guide" title="Permalink to this headline"></a></h1 -->
|
||||
<p>This developer guide describes how to write, configure, and execute a Kafka Streams application.</p>
|
||||
<div class="toctree-wrapper compound">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="write-streams.html">Writing a Streams Application</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="config-streams.html">Configuring a Streams Application</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="dsl-api.html">Streams DSL</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="processor-api.html">Processor API</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="dsl-topology-naming.html">Naming Operators in a Streams DSL application</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="datatypes.html">Data Types and Serialization</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="testing.html">Testing a Streams Application</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="interactive-queries.html">Interactive Queries</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="memory-mgmt.html">Memory Management</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="running-app.html">Running Streams Applications</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="manage-topics.html">Managing Streams Application Topics</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="security.html">Streams Security</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="app-reset-tool.html">Application Reset Tool</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="pagination">
<a href="/{{version}}/documentation/streams/architecture" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="/{{version}}/documentation/streams/developer-guide/write-streams" class="pagination__btn pagination__btn__next">Next</a>
</div>
</script>

<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function() {
  // Show selected style on nav item
  $('.b-nav__streams').addClass('selected');

  // Sticky secondary nav
  var $navbar = $(".sub-nav-sticky"),
      y_pos = $navbar.offset().top,
      height = $navbar.height();

  $(window).scroll(function() {
    var scrollTop = $(window).scrollTop();

    if (scrollTop > y_pos - height) {
      $navbar.addClass("navbar-fixed");
    } else if (scrollTop <= y_pos) {
      $navbar.removeClass("navbar-fixed");
    }
  });

  // Display docs subnav items
  $('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>
529
docs/streams/developer-guide/interactive-queries.html
Normal file
@@ -0,0 +1,529 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

<script><!--#include virtual="../../js/templateData.js" --></script>

<script id="content-template" type="text/x-handlebars-template">
<!-- h1>Developer Guide for Kafka Streams</h1 -->
<div class="sub-nav-sticky">
<div class="sticky-top">
<!-- div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
</div -->
</div>
</div>
<div class="section" id="interactive-queries">
<span id="streams-developer-guide-interactive-queries"></span><h1>Interactive Queries<a class="headerlink" href="#interactive-queries" title="Permalink to this headline"></a></h1>
<p>Interactive queries allow you to leverage the state of your application from outside your application. Kafka Streams enables your applications to be queryable.</p>
<div class="contents local topic" id="table-of-contents">
<p class="topic-title first"><b>Table of Contents</b></p>
<ul class="simple">
<li><a class="reference internal" href="#querying-local-state-stores-for-an-app-instance" id="id3">Querying local state stores for an app instance</a><ul>
<li><a class="reference internal" href="#querying-local-key-value-stores" id="id4">Querying local key-value stores</a></li>
<li><a class="reference internal" href="#querying-local-window-stores" id="id5">Querying local window stores</a></li>
<li><a class="reference internal" href="#querying-local-custom-state-stores" id="id6">Querying local custom state stores</a></li>
</ul>
</li>
<li><a class="reference internal" href="#querying-remote-state-stores-for-the-entire-app" id="id7">Querying remote state stores for the entire app</a><ul>
<li><a class="reference internal" href="#adding-an-rpc-layer-to-your-application" id="id8">Adding an RPC layer to your application</a></li>
<li><a class="reference internal" href="#exposing-the-rpc-endpoints-of-your-application" id="id9">Exposing the RPC endpoints of your application</a></li>
<li><a class="reference internal" href="#discovering-and-accessing-application-instances-and-their-local-state-stores" id="id10">Discovering and accessing application instances and their local state stores</a></li>
</ul>
</li>
<li><a class="reference internal" href="#demo-applications" id="id11">Demo applications</a></li>
</ul>
</div>
<p>The full state of your application is typically <a class="reference internal" href="../architecture.html#streams_architecture_state"><span class="std std-ref">split across many distributed instances of your application</span></a>, and across many state stores that are managed locally by these application instances.</p>
|
||||
<div class="figure align-center">
|
||||
<img class="centered" src="/{{version}}/images/streams-interactive-queries-03.png">
|
||||
</div>
|
||||
<p>There are local and remote components to interactively querying the state of your application.</p>
|
||||
<dl class="docutils">
|
||||
<dt>Local state</dt>
|
||||
<dd>An application instance can query the locally managed portion of the state and directly query its own local state stores. You can use the corresponding local data in other parts of your application code, as long as it doesn’t require calling the Kafka Streams API. Querying state stores is always read-only to guarantee that the underlying state stores will never be mutated out-of-band (e.g., you cannot add new entries). State stores should only be mutated by the corresponding processor topology and the input data it operates on. For more information, see <a class="reference internal" href="#streams-developer-guide-interactive-queries-local-stores"><span class="std std-ref">Querying local state stores for an app instance</span></a>.</dd>
|
||||
<dt>Remote state</dt>
|
||||
<dd><p class="first">To query the full state of your application, you must connect the various fragments of the state, including:</p>
|
||||
<ul class="simple">
|
||||
<li>query local state stores</li>
|
||||
<li>discover all running instances of your application in the network and their state stores</li>
|
||||
<li>communicate with these instances over the network (e.g., an RPC layer)</li>
|
||||
</ul>
|
||||
<p class="last">Connecting these fragments enables communication between instances of the same app and communication from other applications for interactive queries. For more information, see <a class="reference internal" href="#streams-developer-guide-interactive-queries-discovery"><span class="std std-ref">Querying remote state stores for the entire app</span></a>.</p>
|
||||
</dd>
|
||||
</dl>
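<p>To make the “Local state” case above concrete before moving on to the remote case, here is a minimal sketch. It is an illustration only: it assumes a running <code class="docutils literal"><span class="pre">KafkaStreams</span></code> instance named <code class="docutils literal"><span class="pre">streams</span></code> and a key-value store named “CountsKeyValueStore”, as in the examples later in this section.</p>
<div class="highlight-java"><div class="highlight"><pre>
// Assumption: "streams" is a running KafkaStreams instance that materializes
// a key-value store named "CountsKeyValueStore" (see the examples below).
ReadOnlyKeyValueStore&lt;String, Long&gt; localStore =
    streams.store("CountsKeyValueStore", QueryableStoreTypes.keyValueStore());

// Read-only access: only the data hosted on *this* instance is visible here.
Long localCount = localStore.get("hello");
</pre></div></div>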
|
||||
<p>Kafka Streams natively provides all of the required functionality for interactively querying your application&rsquo;s state, except for the network layer needed to expose that state across instances: to allow application instances to communicate over the network, you must add a Remote Procedure Call (RPC) layer to your application (e.g., a REST API).</p>
|
||||
<p>This table shows the Kafka Streams native communication support for various procedures.</p>
|
||||
<table border="1" class="docutils">
|
||||
<colgroup>
|
||||
<col width="42%" />
|
||||
<col width="27%" />
|
||||
<col width="31%" />
|
||||
</colgroup>
|
||||
<thead valign="bottom">
|
||||
<tr class="row-odd"><th class="head">Procedure</th>
|
||||
<th class="head">Application instance</th>
|
||||
<th class="head">Entire application</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody valign="top">
|
||||
<tr class="row-even"><td>Query local state stores of an app instance</td>
|
||||
<td>Supported</td>
|
||||
<td>Supported</td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td>Make an app instance discoverable to others</td>
|
||||
<td>Supported</td>
|
||||
<td>Supported</td>
|
||||
</tr>
|
||||
<tr class="row-even"><td>Discover all running app instances and their state stores</td>
|
||||
<td>Supported</td>
|
||||
<td>Supported</td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td>Communicate with app instances over the network (RPC)</td>
|
||||
<td>Supported</td>
|
||||
<td>Not supported (you must provide an RPC layer yourself)</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<div class="section" id="querying-local-state-stores-for-an-app-instance">
|
||||
<span id="streams-developer-guide-interactive-queries-local-stores"></span><h2><a class="toc-backref" href="#id3">Querying local state stores for an app instance</a><a class="headerlink" href="#querying-local-state-stores-for-an-app-instance" title="Permalink to this headline"></a></h2>
|
||||
<p>A Kafka Streams application typically runs on multiple instances. The state that is locally available on any given instance is only a subset of the <a class="reference internal" href="../architecture.html#streams_architecture_state"><span class="std std-ref">application’s entire state</span></a>. Querying the local stores on an instance will only return data locally available on that particular instance.</p>
|
||||
<p>The method <code class="docutils literal"><span class="pre">KafkaStreams#store(...)</span></code> finds an application instance’s local state stores by name and type.</p>
|
||||
<div class="figure align-center" id="id1">
|
||||
<img class="centered" src="/{{version}}/images/streams-interactive-queries-api-01.png">
|
||||
<p class="caption"><span class="caption-text">Every application instance can directly query any of its local state stores.</span></p>
|
||||
</div>
|
||||
<p>The <em>name</em> of a state store is defined when you create the store. You can create the store explicitly by using the Processor API or implicitly by using stateful operations in the DSL.</p>
|
||||
<p>The <em>type</em> of a state store is defined by <code class="docutils literal"><span class="pre">QueryableStoreType</span></code>. You can access the built-in types via the class <code class="docutils literal"><span class="pre">QueryableStoreTypes</span></code>.
|
||||
Kafka Streams currently has two built-in types:</p>
|
||||
<ul class="simple">
|
||||
<li>A key-value store <code class="docutils literal"><span class="pre">QueryableStoreTypes#keyValueStore()</span></code>, see <a class="reference internal" href="#streams-developer-guide-interactive-queries-local-key-value-stores"><span class="std std-ref">Querying local key-value stores</span></a>.</li>
|
||||
<li>A window store <code class="docutils literal"><span class="pre">QueryableStoreTypes#windowStore()</span></code>, see <a class="reference internal" href="#streams-developer-guide-interactive-queries-local-window-stores"><span class="std std-ref">Querying local window stores</span></a>.</li>
|
||||
</ul>
|
||||
<p>You can also <a class="reference internal" href="#streams-developer-guide-interactive-queries-custom-stores"><span class="std std-ref">implement your own QueryableStoreType</span></a> as described in section <a class="reference internal" href="#streams-developer-guide-interactive-queries-custom-stores"><span class="std std-ref">Querying local custom state stores</span></a>.</p>
|
||||
<div class="admonition note">
|
||||
<p><b>Note</b></p>
|
||||
<p class="last">Kafka Streams materializes one state store per stream partition. This means your application will potentially manage
|
||||
many underlying state stores. The API enables you to query all of the underlying stores without having to know which
|
||||
partition the data is in.</p>
|
||||
</div>
|
||||
<div class="section" id="querying-local-key-value-stores">
|
||||
<span id="streams-developer-guide-interactive-queries-local-key-value-stores"></span><h3><a class="toc-backref" href="#id4">Querying local key-value stores</a><a class="headerlink" href="#querying-local-key-value-stores" title="Permalink to this headline"></a></h3>
|
||||
<p>To query a local key-value store, you must first create a topology with a key-value store. This example creates a key-value
|
||||
store named “CountsKeyValueStore”. This store will hold the latest count for any word that is found on the topic “word-count-input”.</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties </span> <span class="n">props</span> <span class="o">=</span> <span class="o">...;</span>
|
||||
<span class="n">StreamsBuilder</span> <span class="n">builder</span> <span class="o">=</span> <span class="o">...;</span>
|
||||
<span class="n">KStream</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">></span> <span class="n">textLines</span> <span class="o">=</span> <span class="o">...;</span>
|
||||
|
||||
<span class="c1">// Define the processing topology (here: WordCount)</span>
|
||||
<span class="n">KGroupedStream</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">></span> <span class="n">groupedByWord</span> <span class="o">=</span> <span class="n">textLines</span>
|
||||
<span class="o">.</span><span class="na">flatMapValues</span><span class="o">(</span><span class="n">value</span> <span class="o">-></span> <span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="n">value</span><span class="o">.</span><span class="na">toLowerCase</span><span class="o">().</span><span class="na">split</span><span class="o">(</span><span class="s">"\\W+"</span><span class="o">)))</span>
|
||||
<span class="o">.</span><span class="na">groupBy</span><span class="o">((</span><span class="n">key</span><span class="o">,</span> <span class="n">word</span><span class="o">)</span> <span class="o">-></span> <span class="n">word</span><span class="o">,</span> <span class="n">Grouped</span><span class="o">.</span><span class="na">with</span><span class="o">(</span><span class="n">stringSerde</span><span class="o">,</span> <span class="n">stringSerde</span><span class="o">));</span>
|
||||
|
||||
<span class="c1">// Create a key-value store named "CountsKeyValueStore" for the all-time word counts</span>
|
||||
<span class="n">groupedByWord</span><span class="o">.</span><span class="na">count</span><span class="o">(</span><span class="n">Materialized</span><span class="o">.<</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">,</span> <span class="n">KeyValueStore</span><span class="o"><</span><span class="n">Bytes</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]></span><span class="n">as</span><span class="o">(</span><span class="s">"CountsKeyValueStore"</span><span class="o">));</span>
|
||||
|
||||
<span class="c1">// Start an instance of the topology</span>
|
||||
<span class="n">KafkaStreams</span> <span class="n">streams</span> <span class="o">=</span> <span class="k">new</span> <span class="n">KafkaStreams</span><span class="o">(</span><span class="n">builder</span><span class="o">,</span> <span class="n">props</span><span class="o">);</span>
|
||||
<span class="n">streams</span><span class="o">.</span><span class="na">start</span><span class="o">();</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>After the application has started, you can get access to “CountsKeyValueStore” and then query it via the <a class="reference external" href="https://github.com/apache/kafka/blob/1.0/streams/src/main/java/org/apache/kafka/streams/state/ReadOnlyKeyValueStore.java">ReadOnlyKeyValueStore</a> API:</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Get the key-value store CountsKeyValueStore</span>
|
||||
<span class="n">ReadOnlyKeyValueStore</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">></span> <span class="n">keyValueStore</span> <span class="o">=</span>
|
||||
<span class="n">streams</span><span class="o">.</span><span class="na">store</span><span class="o">(</span><span class="s">"CountsKeyValueStore"</span><span class="o">,</span> <span class="n">QueryableStoreTypes</span><span class="o">.</span><span class="na">keyValueStore</span><span class="o">());</span>
|
||||
|
||||
<span class="c1">// Get value by key</span>
|
||||
<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">"count for hello:"</span> <span class="o">+</span> <span class="n">keyValueStore</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="s">"hello"</span><span class="o">));</span>
|
||||
|
||||
<span class="c1">// Get the values for a range of keys available in this application instance</span>
|
||||
<span class="n">KeyValueIterator</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">></span> <span class="n">range</span> <span class="o">=</span> <span class="n">keyValueStore</span><span class="o">.</span><span class="na">range</span><span class="o">(</span><span class="s">"all"</span><span class="o">,</span> <span class="s">"streams"</span><span class="o">);</span>
|
||||
<span class="k">while</span> <span class="o">(</span><span class="n">range</span><span class="o">.</span><span class="na">hasNext</span><span class="o">())</span> <span class="o">{</span>
|
||||
<span class="n">KeyValue</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">></span> <span class="n">next</span> <span class="o">=</span> <span class="n">range</span><span class="o">.</span><span class="na">next</span><span class="o">();</span>
|
||||
<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">"count for "</span> <span class="o">+</span> <span class="n">next</span><span class="o">.</span><span class="na">key</span> <span class="o">+</span> <span class="s">": "</span> <span class="o">+</span> <span class="n">next</span><span class="o">.</span><span class="na">value</span><span class="o">);</span>
|
||||
<span class="o">}</span>
|
||||
|
||||
<span class="c1">// Get the values for all of the keys available in this application instance</span>
|
||||
<span class="n">KeyValueIterator</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">></span> <span class="n">range</span> <span class="o">=</span> <span class="n">keyValueStore</span><span class="o">.</span><span class="na">all</span><span class="o">();</span>
|
||||
<span class="k">while</span> <span class="o">(</span><span class="n">range</span><span class="o">.</span><span class="na">hasNext</span><span class="o">())</span> <span class="o">{</span>
|
||||
<span class="n">KeyValue</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">></span> <span class="n">next</span> <span class="o">=</span> <span class="n">range</span><span class="o">.</span><span class="na">next</span><span class="o">();</span>
|
||||
<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">"count for "</span> <span class="o">+</span> <span class="n">next</span><span class="o">.</span><span class="na">key</span> <span class="o">+</span> <span class="s">": "</span> <span class="o">+</span> <span class="n">next</span><span class="o">.</span><span class="na">value</span><span class="o">);</span>
|
||||
<span class="o">}</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>You can also materialize the results of stateless operators by using the overloaded methods that take a <code class="docutils literal"><span class="pre">queryableStoreName</span></code>
|
||||
as shown in the example below:</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span>
|
||||
<span class="n">StreamsBuilder</span> <span class="n">builder</span> <span class="o">=</span> <span class="o">...;</span>
|
||||
<span class="n">KTable</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">></span> <span class="n">regionCounts</span> <span class="o">=</span> <span class="o">...;</span>
|
||||
|
||||
<span class="c1">// materialize the result of filtering corresponding to odd numbers</span>
|
||||
<span class="c1">// the "queryableStoreName" can be subsequently queried.</span>
|
||||
<span class="n">KTable</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">></span> <span class="n">oddCounts</span> <span class="o">=</span> <span class="n">numberLines</span><span class="o">.</span><span class="na">filter</span><span class="o">((</span><span class="n">region</span><span class="o">,</span> <span class="n">count</span><span class="o">)</span> <span class="o">-></span> <span class="o">(</span><span class="n">count</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">!=</span> <span class="mi">0</span><span class="o">),</span>
|
||||
<span class="n">Materialized</span><span class="o">.<</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">,</span> <span class="n">KeyValueStore</span><span class="o"><</span><span class="n">Bytes</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]></span><span class="n">as</span><span class="o">(</span><span class="s">"queryableStoreName"</span><span class="o">));</span>
|
||||
|
||||
<span class="c1">// do not materialize the result of filtering corresponding to even numbers</span>
|
||||
<span class="c1">// this means that these results will not be materialized and cannot be queried.</span>
|
||||
<span class="n">KTable</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">></span> <span class="n">oddCounts</span> <span class="o">=</span> <span class="n">numberLines</span><span class="o">.</span><span class="na">filter</span><span class="o">((</span><span class="n">region</span><span class="o">,</span> <span class="n">count</span><span class="o">)</span> <span class="o">-></span> <span class="o">(</span><span class="n">count</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span><span class="o">));</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="querying-local-window-stores">
|
||||
<span id="streams-developer-guide-interactive-queries-local-window-stores"></span><h3><a class="toc-backref" href="#id5">Querying local window stores</a><a class="headerlink" href="#querying-local-window-stores" title="Permalink to this headline"></a></h3>
|
||||
<p>A window store will potentially have many results for any given key because the key can be present in multiple windows.
|
||||
However, there is only one result per window for a given key.</p>
|
||||
<p>To query a local window store, you must first create a topology with a window store. This example creates a window store
|
||||
named “CountsWindowStore” that contains the counts for words in 1-minute windows.</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span>
|
||||
<span class="n">StreamsBuilder</span> <span class="n">builder</span> <span class="o">=</span> <span class="o">...;</span>
|
||||
<span class="n">KStream</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">></span> <span class="n">textLines</span> <span class="o">=</span> <span class="o">...;</span>
|
||||
|
||||
<span class="c1">// Define the processing topology (here: WordCount)</span>
|
||||
<span class="n">KGroupedStream</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">></span> <span class="n">groupedByWord</span> <span class="o">=</span> <span class="n">textLines</span>
|
||||
<span class="o">.</span><span class="na">flatMapValues</span><span class="o">(</span><span class="n">value</span> <span class="o">-></span> <span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="n">value</span><span class="o">.</span><span class="na">toLowerCase</span><span class="o">().</span><span class="na">split</span><span class="o">(</span><span class="s">"\\W+"</span><span class="o">)))</span>
|
||||
<span class="o">.</span><span class="na">groupBy</span><span class="o">((</span><span class="n">key</span><span class="o">,</span> <span class="n">word</span><span class="o">)</span> <span class="o">-></span> <span class="n">word</span><span class="o">,</span> <span class="n">Grouped</span><span class="o">.</span><span class="na">with</span><span class="o">(</span><span class="n">stringSerde</span><span class="o">,</span> <span class="n">stringSerde</span><span class="o">));</span>
|
||||
|
||||
<span class="c1">// Create a window state store named "CountsWindowStore" that contains the word counts for every minute</span>
|
||||
<span class="n">groupedByWord</span><span class="o">.</span><span class="na">windowedBy</span><span class="o">(</span><span class="n">TimeWindows</span><span class="o">.</span><span class="na">of</span><span class="o">(<span class="n">Duration</span><span class="o">.</span><span class="na">ofSeconds</span><span class="o">(</span><span class="mi">60</span><span class="o">)))</span>
|
||||
<span class="o">.</span><span class="na">count</span><span class="o">(</span><span class="n">Materialized</span><span class="o">.<</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">,</span> <span class="n">WindowStore</span><span class="o"><</span><span class="n">Bytes</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]></span><span class="n">as</span><span class="o">(</span><span class="s">"CountsWindowStore"</span><span class="o">));</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>After the application has started, you can get access to “CountsWindowStore” and then query it via the <a class="reference external" href="https://github.com/apache/kafka/blob/1.0/streams/src/main/java/org/apache/kafka/streams/state/ReadOnlyWindowStore.java">ReadOnlyWindowStore</a> API:</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Get the window store named "CountsWindowStore"</span>
|
||||
<span class="n">ReadOnlyWindowStore</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">></span> <span class="n">windowStore</span> <span class="o">=</span>
|
||||
<span class="n">streams</span><span class="o">.</span><span class="na">store</span><span class="o">(</span><span class="s">"CountsWindowStore"</span><span class="o">,</span> <span class="n">QueryableStoreTypes</span><span class="o">.</span><span class="na">windowStore</span><span class="o">());</span>
|
||||
|
||||
<span class="c1">// Fetch values for the key "world" for all of the windows available in this application instance.</span>
|
||||
<span class="c1">// To get *all* available windows we fetch windows from the beginning of time until now.</span>
|
||||
<span class="kt">Instant</span> <span class="n">timeFrom</span> <span class="o">=</span> <span class="na">Instant</span><span class="o">.</span><span class="na">ofEpochMilli<span class="o">(</span><span class="mi">0</span><span class="o">);</span> <span class="c1">// beginning of time = oldest available</span>
|
||||
<span class="kt">Instant</span> <span class="n">timeTo</span> <span class="o">=</span> <span class="n">Instant</span><span class="o">.</span><span class="na">now</span><span class="o">();</span> <span class="c1">// now (in processing-time)</span>
|
||||
<span class="n">WindowStoreIterator</span><span class="o"><</span><span class="n">Long</span><span class="o">></span> <span class="n">iterator</span> <span class="o">=</span> <span class="n">windowStore</span><span class="o">.</span><span class="na">fetch</span><span class="o">(</span><span class="s">"world"</span><span class="o">,</span> <span class="n">timeFrom</span><span class="o">,</span> <span class="n">timeTo</span><span class="o">);</span>
|
||||
<span class="k">while</span> <span class="o">(</span><span class="n">iterator</span><span class="o">.</span><span class="na">hasNext</span><span class="o">())</span> <span class="o">{</span>
|
||||
<span class="n">KeyValue</span><span class="o"><</span><span class="n">Long</span><span class="o">,</span> <span class="n">Long</span><span class="o">></span> <span class="n">next</span> <span class="o">=</span> <span class="n">iterator</span><span class="o">.</span><span class="na">next</span><span class="o">();</span>
|
||||
<span class="kt">long</span> <span class="n">windowTimestamp</span> <span class="o">=</span> <span class="n">next</span><span class="o">.</span><span class="na">key</span><span class="o">;</span>
|
||||
<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">"Count of 'world' @ time "</span> <span class="o">+</span> <span class="n">windowTimestamp</span> <span class="o">+</span> <span class="s">" is "</span> <span class="o">+</span> <span class="n">next</span><span class="o">.</span><span class="na">value</span><span class="o">);</span>
|
||||
<span class="o">}</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="querying-local-custom-state-stores">
|
||||
<span id="streams-developer-guide-interactive-queries-custom-stores"></span><h3><a class="toc-backref" href="#id6">Querying local custom state stores</a><a class="headerlink" href="#querying-local-custom-state-stores" title="Permalink to this headline"></a></h3>
|
||||
<div class="admonition note">
|
||||
<p><b>Note</b></p>
|
||||
<p class="last">Only the <a class="reference internal" href="processor-api.html#streams-developer-guide-processor-api"><span class="std std-ref">Processor API</span></a> supports custom state stores.</p>
|
||||
</div>
|
||||
<p>Before you can query your custom state stores, you must implement these interfaces:</p>
|
||||
<ul class="simple">
|
||||
<li>Your custom state store must implement <code class="docutils literal"><span class="pre">StateStore</span></code>.</li>
|
||||
<li>You must have an interface to represent the operations available on the store.</li>
|
||||
<li>You must provide an implementation of <code class="docutils literal"><span class="pre">StoreBuilder</span></code> for creating instances of your store.</li>
|
||||
<li>It is recommended that you provide an interface that restricts access to read-only operations. This prevents users of this API from mutating the state of your running Kafka Streams application out-of-band.</li>
|
||||
</ul>
|
||||
<p>The class/interface hierarchy for your custom store might look something like:</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kd">public</span> <span class="kd">class</span> <span class="nc">MyCustomStore</span><span class="o"><</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">></span> <span class="kd">implements</span> <span class="n">StateStore</span><span class="o">,</span> <span class="n">MyWriteableCustomStore</span><span class="o"><</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">></span> <span class="o">{</span>
|
||||
<span class="c1">// implementation of the actual store</span>
|
||||
<span class="o">}</span>
|
||||
|
||||
<span class="c1">// Read-write interface for MyCustomStore</span>
|
||||
<span class="kd">public</span> <span class="kd">interface</span> <span class="nc">MyWriteableCustomStore</span><span class="o"><</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">></span> <span class="kd">extends</span> <span class="n">MyReadableCustomStore</span><span class="o"><</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">></span> <span class="o">{</span>
|
||||
<span class="kt">void</span> <span class="nf">write</span><span class="o">(</span><span class="n">K</span> <span class="n">Key</span><span class="o">,</span> <span class="n">V</span> <span class="n">value</span><span class="o">);</span>
|
||||
<span class="o">}</span>
|
||||
|
||||
<span class="c1">// Read-only interface for MyCustomStore</span>
|
||||
<span class="kd">public</span> <span class="kd">interface</span> <span class="nc">MyReadableCustomStore</span><span class="o"><</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">></span> <span class="o">{</span>
|
||||
<span class="n">V</span> <span class="nf">read</span><span class="o">(</span><span class="n">K</span> <span class="n">key</span><span class="o">);</span>
|
||||
<span class="o">}</span>
|
||||
|
||||
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">MyCustomStoreBuilder</span> <span class="kd">implements</span> <span class="n">StoreBuilder</span> <span class="o">{</span>
|
||||
<span class="c1">// implementation of the supplier for MyCustomStore</span>
|
||||
<span class="o">}</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>To make this store queryable you must:</p>
|
||||
<ul class="simple">
|
||||
<li>Provide an implementation of <a class="reference external" href="https://github.com/apache/kafka/blob/1.0/streams/src/main/java/org/apache/kafka/streams/state/QueryableStoreType.java">QueryableStoreType</a>.</li>
|
||||
<li>Provide a wrapper class that has access to all of the underlying instances of the store and is used for querying.</li>
|
||||
</ul>
|
||||
<p>Here is how to implement <code class="docutils literal"><span class="pre">QueryableStoreType</span></code>:</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kd">public</span> <span class="kd">class</span> <span class="nc">MyCustomStoreType</span><span class="o"><</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">></span> <span class="kd">implements</span> <span class="n">QueryableStoreType</span><span class="o"><</span><span class="n">MyReadableCustomStore</span><span class="o"><</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">>></span> <span class="o">{</span>
|
||||
|
||||
<span class="c1">// Only accept StateStores that are of type MyCustomStore</span>
|
||||
<span class="kd">public</span> <span class="kt">boolean</span> <span class="nf">accepts</span><span class="o">(</span><span class="kd">final</span> <span class="n">StateStore</span> <span class="n">stateStore</span><span class="o">)</span> <span class="o">{</span>
|
||||
<span class="k">return</span> <span class="n">stateStore</span> <span class="n">instanceOf</span> <span class="n">MyCustomStore</span><span class="o">;</span>
|
||||
<span class="o">}</span>
|
||||
|
||||
<span class="kd">public</span> <span class="n">MyReadableCustomStore</span><span class="o"><</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">></span> <span class="nf">create</span><span class="o">(</span><span class="kd">final</span> <span class="n">StateStoreProvider</span> <span class="n">storeProvider</span><span class="o">,</span> <span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">)</span> <span class="o">{</span>
|
||||
<span class="k">return</span> <span class="k">new</span> <span class="n">MyCustomStoreTypeWrapper</span><span class="o">(</span><span class="n">storeProvider</span><span class="o">,</span> <span class="n">storeName</span><span class="o">,</span> <span class="k">this</span><span class="o">);</span>
|
||||
<span class="o">}</span>
|
||||
|
||||
<span class="o">}</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>A wrapper class is required because each instance of a Kafka Streams application may run multiple stream tasks and manage
|
||||
multiple local instances of a particular state store. The wrapper class hides this complexity and lets you query a “logical”
|
||||
state store by name without having to know about all of the underlying local instances of that state store.</p>
|
||||
<p>When implementing your wrapper class you must use the
|
||||
<a class="reference external" href="https://github.com/apache/kafka/blob/1.0/streams/src/main/java/org/apache/kafka/streams/state/internals/StateStoreProvider.java">StateStoreProvider</a>
|
||||
interface to get access to the underlying instances of your store.
|
||||
<code class="docutils literal"><span class="pre">StateStoreProvider#stores(String</span> <span class="pre">storeName,</span> <span class="pre">QueryableStoreType<T></span> <span class="pre">queryableStoreType)</span></code> returns a <code class="docutils literal"><span class="pre">List</span></code> of state
|
||||
stores with the given storeName and of the type as defined by <code class="docutils literal"><span class="pre">queryableStoreType</span></code>.</p>
|
||||
<p>An example implementation of the wrapper follows (Java 8+):</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// We strongly recommended implementing a read-only interface</span>
|
||||
<span class="c1">// to restrict usage of the store to safe read operations!</span>
|
||||
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">MyCustomStoreTypeWrapper</span><span class="o"><</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">></span> <span class="kd">implements</span> <span class="n">MyReadableCustomStore</span><span class="o"><</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">></span> <span class="o">{</span>
|
||||
|
||||
<span class="kd">private</span> <span class="kd">final</span> <span class="n">QueryableStoreType</span><span class="o"><</span><span class="n">MyReadableCustomStore</span><span class="o"><</span><span class="n">K</span><span class="o">,</span> <span class="n">V</span><span class="o">>></span> <span class="n">customStoreType</span><span class="o">;</span>
|
||||
<span class="kd">private</span> <span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">;</span>
|
||||
<span class="kd">private</span> <span class="kd">final</span> <span class="n">StateStoreProvider</span> <span class="n">provider</span><span class="o">;</span>
|
||||
|
||||
<span class="kd">public</span> <span class="nf">CustomStoreTypeWrapper</span><span class="o">(</span><span class="kd">final</span> <span class="n">StateStoreProvider</span> <span class="n">provider</span><span class="o">,</span>
|
||||
<span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">,</span>
|
||||
<span class="kd">final</span> <span class="n">QueryableStoreType</span><span class="o"><</span><span class="n">MyReadableCustomStore</span><span class="o"><</span><span class="n">K</span><span class="o">,</span> <span class="n">V</span><span class="o">>></span> <span class="n">customStoreType</span><span class="o">)</span> <span class="o">{</span>
|
||||
|
||||
<span class="c1">// ... assign fields ...</span>
|
||||
<span class="o">}</span>
|
||||
|
||||
<span class="c1">// Implement a safe read method</span>
|
||||
<span class="nd">@Override</span>
|
||||
<span class="kd">public</span> <span class="n">V</span> <span class="nf">read</span><span class="o">(</span><span class="kd">final</span> <span class="n">K</span> <span class="n">key</span><span class="o">)</span> <span class="o">{</span>
|
||||
<span class="c1">// Get all the stores with storeName and of customStoreType</span>
|
||||
<span class="kd">final</span> <span class="n">List</span><span class="o"><</span><span class="n">MyReadableCustomStore</span><span class="o"><</span><span class="n">K</span><span class="o">,</span> <span class="n">V</span><span class="o">>></span> <span class="n">stores</span> <span class="o">=</span> <span class="n">provider</span><span class="o">.</span><span class="na">getStores</span><span class="o">(</span><span class="n">storeName</span><span class="o">,</span> <span class="n">customStoreType</span><span class="o">);</span>
|
||||
<span class="c1">// Try and find the value for the given key</span>
|
||||
<span class="kd">final</span> <span class="n">Optional</span><span class="o"><</span><span class="n">V</span><span class="o">></span> <span class="n">value</span> <span class="o">=</span> <span class="n">stores</span><span class="o">.</span><span class="na">stream</span><span class="o">().</span><span class="na">filter</span><span class="o">(</span><span class="n">store</span> <span class="o">-></span> <span class="n">store</span><span class="o">.</span><span class="na">read</span><span class="o">(</span><span class="n">key</span><span class="o">)</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">).</span><span class="na">findFirst</span><span class="o">();</span>
|
||||
<span class="c1">// Return the value if it exists</span>
|
||||
<span class="k">return</span> <span class="n">value</span><span class="o">.</span><span class="na">orElse</span><span class="o">(</span><span class="kc">null</span><span class="o">);</span>
|
||||
<span class="o">}</span>
|
||||
|
||||
<span class="o">}</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>You can now find and query your custom store:</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span>
|
||||
<span class="n">Topology</span> <span class="n">topology</span> <span class="o">=</span> <span class="o">...;</span>
|
||||
<span class="n">ProcessorSupplier</span> <span class="n">processorSuppler</span> <span class="o">=</span> <span class="o">...;</span>
|
||||
|
||||
<span class="c1">// Create CustomStoreSupplier for store name the-custom-store</span>
|
||||
<span class="n">MyCustomStoreBuilder</span> <span class="n">customStoreBuilder</span> <span class="o">=</span> <span class="k">new</span> <span class="n">MyCustomStoreBuilder</span><span class="o">(</span><span class="s">"the-custom-store"</span><span class="o">)</span> <span class="c1">//...;</span>
|
||||
<span class="c1">// Add the source topic</span>
|
||||
<span class="n">topology</span><span class="o">.</span><span class="na">addSource</span><span class="o">(</span><span class="s">"input"</span><span class="o">,</span> <span class="s">"inputTopic"</span><span class="o">);</span>
|
||||
<span class="c1">// Add a custom processor that reads from the source topic</span>
|
||||
<span class="n">topology</span><span class="o">.</span><span class="na">addProcessor</span><span class="o">(</span><span class="s">"the-processor"</span><span class="o">,</span> <span class="n">processorSupplier</span><span class="o">,</span> <span class="s">"input"</span><span class="o">);</span>
|
||||
<span class="c1">// Connect your custom state store to the custom processor above</span>
|
||||
<span class="n">topology</span><span class="o">.</span><span class="na">addStateStore</span><span class="o">(</span><span class="n">customStoreBuilder</span><span class="o">,</span> <span class="s">"the-processor"</span><span class="o">);</span>
|
||||
|
||||
<span class="n">KafkaStreams</span> <span class="n">streams</span> <span class="o">=</span> <span class="k">new</span> <span class="n">KafkaStreams</span><span class="o">(</span><span class="n">topology</span><span class="o">,</span> <span class="n">config</span><span class="o">);</span>
|
||||
<span class="n">streams</span><span class="o">.</span><span class="na">start</span><span class="o">();</span>
|
||||
|
||||
<span class="c1">// Get access to the custom store</span>
|
||||
<span class="n">MyReadableCustomStore</span><span class="o"><</span><span class="n">String</span><span class="o">,</span><span class="n">String</span><span class="o">></span> <span class="n">store</span> <span class="o">=</span> <span class="n">streams</span><span class="o">.</span><span class="na">store</span><span class="o">(</span><span class="s">"the-custom-store"</span><span class="o">,</span> <span class="k">new</span> <span class="n">MyCustomStoreType</span><span class="o"><</span><span class="n">String</span><span class="o">,</span><span class="n">String</span><span class="o">>());</span>
|
||||
<span class="c1">// Query the store</span>
|
||||
<span class="n">String</span> <span class="n">value</span> <span class="o">=</span> <span class="n">store</span><span class="o">.</span><span class="na">read</span><span class="o">(</span><span class="s">"key"</span><span class="o">);</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="querying-remote-state-stores-for-the-entire-app">
|
||||
<span id="streams-developer-guide-interactive-queries-discovery"></span><h2><a class="toc-backref" href="#id7">Querying remote state stores for the entire app</a><a class="headerlink" href="#querying-remote-state-stores-for-the-entire-app" title="Permalink to this headline"></a></h2>
|
||||
<p>To query remote states for the entire app, you must expose the application’s full state to other applications, including
|
||||
applications that are running on different machines.</p>
|
||||
<p>For example, suppose you have a Kafka Streams application that processes user events in a multi-player video game, and you want to retrieve the latest status of each user directly and display it in a mobile app. Here are the required steps to make the full state of your application queryable:</p>
|
||||
<ol class="arabic simple">
|
||||
<li><a class="reference internal" href="#streams-developer-guide-interactive-queries-rpc-layer"><span class="std std-ref">Add an RPC layer to your application</span></a> so that
|
||||
the instances of your application can be interacted with via the network (e.g., a REST API, Thrift, a custom protocol,
|
||||
and so on). The instances must respond to interactive queries. You can follow the reference examples provided to get
|
||||
started.</li>
|
||||
<li><a class="reference internal" href="#streams-developer-guide-interactive-queries-expose-rpc"><span class="std std-ref">Expose the RPC endpoints</span></a> of
|
||||
your application’s instances via the <code class="docutils literal"><span class="pre">application.server</span></code> configuration setting of Kafka Streams. Because RPC
|
||||
endpoints must be unique within a network, each instance has its own value for this configuration setting.
|
||||
This makes an application instance discoverable by other instances.</li>
|
||||
<li>In the RPC layer, <a class="reference internal" href="#streams-developer-guide-interactive-queries-discover-app-instances-and-stores"><span class="std std-ref">discover remote application instances</span></a> and their state stores and <a class="reference internal" href="#streams-developer-guide-interactive-queries-local-stores"><span class="std std-ref">query locally available state stores</span></a> to make the full state of your application queryable. The remote application instances can forward queries to other app instances if a particular instance lacks the local data to respond to a query. The locally available state stores can directly respond to queries.</li>
|
||||
</ol>
|
||||
<div class="figure align-center" id="id2">
|
||||
<img class="centered" src="/{{version}}/images/streams-interactive-queries-api-02.png">
|
||||
<p class="caption"><span class="caption-text">Discover any running instances of the same application as well as the respective RPC endpoints they expose for
|
||||
interactive queries</span></p>
|
||||
</div>
|
||||
<div class="section" id="adding-an-rpc-layer-to-your-application">
|
||||
<span id="streams-developer-guide-interactive-queries-rpc-layer"></span><h3><a class="toc-backref" href="#id8">Adding an RPC layer to your application</a><a class="headerlink" href="#adding-an-rpc-layer-to-your-application" title="Permalink to this headline"></a></h3>
|
||||
<p>There are many ways to add an RPC layer. The only requirements are that the RPC layer is embedded within the Kafka Streams
|
||||
application and that it exposes an endpoint that other application instances and applications can connect to.</p>
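<p>As one hypothetical illustration (not part of the Kafka Streams API), the sketch below embeds a tiny REST-style endpoint using the JDK&rsquo;s built-in HTTP server; the store name “word-count”, the port, and the <code class="docutils literal"><span class="pre">/count/{word}</span></code> path layout are assumptions chosen for this example only:</p>
<div class="highlight-java"><div class="highlight"><pre>
import com.sun.net.httpserver.HttpServer;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class WordCountRestEndpoint {

  // Expose GET /count/{word} on top of the locally available "word-count" store.
  // Assumptions: "streams" is a running KafkaStreams instance that materializes a
  // key-value store named "word-count", and the chosen port matches application.server.
  public static void start(final KafkaStreams streams, final int port) throws Exception {
    final HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
    server.createContext("/count/", exchange -> {
      final String word = exchange.getRequestURI().getPath().substring("/count/".length());
      final ReadOnlyKeyValueStore&lt;String, Long&gt; store =
          streams.store("word-count", QueryableStoreTypes.keyValueStore());
      final Long count = store.get(word);
      final byte[] body = String.valueOf(count).getBytes(StandardCharsets.UTF_8);
      exchange.sendResponseHeaders(count == null ? 404 : 200, body.length);
      exchange.getResponseBody().write(body);
      exchange.close();
    });
    server.start();
  }
}
</pre></div></div>
<p>Other instances of the application, or the discovery logic described in the following sections, can then issue plain HTTP GET requests against this endpoint to read the locally hosted fragment of the state.</p>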
|
||||
</div>
|
||||
<div class="section" id="exposing-the-rpc-endpoints-of-your-application">
|
||||
<span id="streams-developer-guide-interactive-queries-expose-rpc"></span><h3><a class="toc-backref" href="#id9">Exposing the RPC endpoints of your application</a><a class="headerlink" href="#exposing-the-rpc-endpoints-of-your-application" title="Permalink to this headline"></a></h3>
|
||||
<p>To enable remote state store discovery in a distributed Kafka Streams application, you must set the <a class="reference internal" href="config-streams.html#streams-developer-guide-required-configs"><span class="std std-ref">configuration property</span></a> in the config properties.
|
||||
The <code class="docutils literal"><span class="pre">application.server</span></code> property defines a unique <code class="docutils literal"><span class="pre">host:port</span></code> pair that points to the RPC endpoint of the respective instance of a Kafka Streams application.
|
||||
The value of this configuration property will vary across the instances of your application.
|
||||
When this property is set, Kafka Streams will keep track of the RPC endpoint information for every instance of an application, its state stores, and assigned stream partitions through instances of <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/state/StreamsMetadata.html">StreamsMetadata</a>.</p>
|
||||
<div class="admonition tip">
|
||||
<p><b>Tip</b></p>
|
||||
<p class="last">Consider leveraging the exposed RPC endpoints of your application for further functionality, such as
|
||||
piggybacking additional inter-application communication that goes beyond interactive queries.</p>
|
||||
</div>
|
||||
<p>This example shows how to configure and run a Kafka Streams application that supports the discovery of its state stores.</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">props</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
|
||||
<span class="c1">// Set the unique RPC endpoint of this application instance through which it</span>
|
||||
<span class="c1">// can be interactively queried. In a real application, the value would most</span>
|
||||
<span class="c1">// probably not be hardcoded but derived dynamically.</span>
|
||||
<span class="n">String</span> <span class="n">rpcEndpoint</span> <span class="o">=</span> <span class="s">"host1:4460"</span><span class="o">;</span>
|
||||
<span class="n">props</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">APPLICATION_SERVER_CONFIG</span><span class="o">,</span> <span class="n">rpcEndpoint</span><span class="o">);</span>
|
||||
<span class="c1">// ... further settings may follow here ...</span>
|
||||
|
||||
<span class="n">StreamsBuilder</span> <span class="n">builder</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StreamsBuilder</span><span class="o">();</span>
|
||||
|
||||
<span class="n">KStream</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">></span> <span class="n">textLines</span> <span class="o">=</span> <span class="n">builder</span><span class="o">.</span><span class="na">stream</span><span class="o">(</span><span class="n">stringSerde</span><span class="o">,</span> <span class="n">stringSerde</span><span class="o">,</span> <span class="s">"word-count-input"</span><span class="o">);</span>
|
||||
|
||||
<span class="kd">final</span> <span class="n">KGroupedStream</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">></span> <span class="n">groupedByWord</span> <span class="o">=</span> <span class="n">textLines</span>
|
||||
<span class="o">.</span><span class="na">flatMapValues</span><span class="o">(</span><span class="n">value</span> <span class="o">-></span> <span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="n">value</span><span class="o">.</span><span class="na">toLowerCase</span><span class="o">().</span><span class="na">split</span><span class="o">(</span><span class="s">"\\W+"</span><span class="o">)))</span>
|
||||
<span class="o">.</span><span class="na">groupBy</span><span class="o">((</span><span class="n">key</span><span class="o">,</span> <span class="n">word</span><span class="o">)</span> <span class="o">-></span> <span class="n">word</span><span class="o">,</span> <span class="n">Grouped</span><span class="o">.</span><span class="na">with</span><span class="o">(</span><span class="n">stringSerde</span><span class="o">,</span> <span class="n">stringSerde</span><span class="o">));</span>
|
||||
|
||||
<span class="c1">// This call to `count()` creates a state store named "word-count".</span>
|
||||
<span class="c1">// The state store is discoverable and can be queried interactively.</span>
|
||||
<span class="n">groupedByWord</span><span class="o">.</span><span class="na">count</span><span class="o">(</span><span class="n">Materialized</span><span class="o">.<</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">,</span> <span class="n">KeyValueStore</span><span class="o"><</span><span class="n">Bytes</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]></span><span class="n">as</span><span class="o">(</span><span class="s">"word-count"</span><span class="o">));</span>
|
||||
|
||||
<span class="c1">// Start an instance of the topology</span>
|
||||
<span class="n">KafkaStreams</span> <span class="n">streams</span> <span class="o">=</span> <span class="k">new</span> <span class="n">KafkaStreams</span><span class="o">(</span><span class="n">builder</span><span class="o">,</span> <span class="n">props</span><span class="o">);</span>
|
||||
<span class="n">streams</span><span class="o">.</span><span class="na">start</span><span class="o">();</span>
|
||||
|
||||
<span class="c1">// Then, create and start the actual RPC service for remote access to this</span>
|
||||
<span class="c1">// application instance's local state stores.</span>
|
||||
<span class="c1">//</span>
|
||||
<span class="c1">// This service should be started on the same host and port as defined above by</span>
|
||||
<span class="c1">// the property `StreamsConfig.APPLICATION_SERVER_CONFIG`. The example below is</span>
|
||||
<span class="c1">// fictitious, but we provide end-to-end demo applications (such as KafkaMusicExample)</span>
|
||||
<span class="c1">// that showcase how to implement such a service to get you started.</span>
|
||||
<span class="n">MyRPCService</span> <span class="n">rpcService</span> <span class="o">=</span> <span class="o">...;</span>
|
||||
<span class="n">rpcService</span><span class="o">.</span><span class="na">listenAt</span><span class="o">(</span><span class="n">rpcEndpoint</span><span class="o">);</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="discovering-and-accessing-application-instances-and-their-local-state-stores">
|
||||
<span id="streams-developer-guide-interactive-queries-discover-app-instances-and-stores"></span><h3><a class="toc-backref" href="#id10">Discovering and accessing application instances and their local state stores</a><a class="headerlink" href="#discovering-and-accessing-application-instances-and-their-local-state-stores" title="Permalink to this headline"></a></h3>
|
||||
<p>The following methods return <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/state/StreamsMetadata.html">StreamsMetadata</a> objects, which provide meta-information about application instances such as their RPC endpoint and locally available state stores.</p>
|
||||
<ul class="simple">
|
||||
<li><code class="docutils literal"><span class="pre">KafkaStreams#allMetadata()</span></code>: find all instances of this application</li>
|
||||
<li><code class="docutils literal"><span class="pre">KafkaStreams#allMetadataForStore(String</span> <span class="pre">storeName)</span></code>: find those applications instances that manage local instances of the state store “storeName”</li>
|
||||
<li><code class="docutils literal"><span class="pre">KafkaStreams#metadataForKey(String</span> <span class="pre">storeName,</span> <span class="pre">K</span> <span class="pre">key,</span> <span class="pre">Serializer<K></span> <span class="pre">keySerializer)</span></code>: using the default stream partitioning strategy, find the one application instance that holds the data for the given key in the given state store</li>
|
||||
<li><code class="docutils literal"><span class="pre">KafkaStreams#metadataForKey(String</span> <span class="pre">storeName,</span> <span class="pre">K</span> <span class="pre">key,</span> <span class="pre">StreamPartitioner<K,</span> <span class="pre">?></span> <span class="pre">partitioner)</span></code>: using <code class="docutils literal"><span class="pre">partitioner</span></code>, find the one application instance that holds the data for the given key in the given state store</li>
|
||||
</ul>
|
||||
<div class="admonition attention">
|
||||
<p class="first admonition-title">Attention</p>
|
||||
<p class="last">If <code class="docutils literal"><span class="pre">application.server</span></code> is not configured for an application instance, then the above methods will not find any <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/state/StreamsMetadata.html">StreamsMetadata</a> for it.</p>
|
||||
</div>
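<p>For reference, here is a minimal sketch of how <code class="docutils literal"><span class="pre">application.server</span></code> could be set when configuring the application (the application id, bootstrap servers, and host/port values below are placeholders; the endpoint must match wherever this instance's RPC service actually listens):</p>
<div class="highlight-java"><div class="highlight"><pre>
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "word-count-app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
// Advertise the RPC endpoint of this instance so that other instances can
// discover it via StreamsMetadata and route interactive queries to it.
props.put(StreamsConfig.APPLICATION_SERVER_CONFIG, "host1.example.com:7070");
</pre></div>
</div>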
|
||||
<p>For example, we can now find the <code class="docutils literal"><span class="pre">StreamsMetadata</span></code> for the state store named “word-count” that we defined in the
|
||||
code example shown in the previous section:</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">KafkaStreams</span> <span class="n">streams</span> <span class="o">=</span> <span class="o">...;</span>
|
||||
<span class="c1">// Find all the locations of local instances of the state store named "word-count"</span>
|
||||
<span class="n">Collection</span><span class="o"><</span><span class="n">StreamsMetadata</span><span class="o">></span> <span class="n">wordCountHosts</span> <span class="o">=</span> <span class="n">streams</span><span class="o">.</span><span class="na">allMetadataForStore</span><span class="o">(</span><span class="s">"word-count"</span><span class="o">);</span>
|
||||
|
||||
<span class="c1">// For illustrative purposes, we assume using an HTTP client to talk to remote app instances.</span>
|
||||
<span class="n">HttpClient</span> <span class="n">http</span> <span class="o">=</span> <span class="o">...;</span>
|
||||
|
||||
<span class="c1">// Get the word count for word (aka key) 'alice': Approach 1</span>
|
||||
<span class="c1">//</span>
|
||||
<span class="c1">// We first find the one app instance that manages the count for 'alice' in its local state stores.</span>
|
||||
<span class="n">StreamsMetadata</span> <span class="n">metadata</span> <span class="o">=</span> <span class="n">streams</span><span class="o">.</span><span class="na">metadataForKey</span><span class="o">(</span><span class="s">"word-count"</span><span class="o">,</span> <span class="s">"alice"</span><span class="o">,</span> <span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">().</span><span class="na">serializer</span><span class="o">());</span>
|
||||
<span class="c1">// Then, we query only that single app instance for the latest count of 'alice'.</span>
|
||||
<span class="c1">// Note: The RPC URL shown below is fictitious and only serves to illustrate the idea. Ultimately,</span>
|
||||
<span class="c1">// the URL (or, in general, the method of communication) will depend on the RPC layer you opted to</span>
|
||||
<span class="c1">// implement. Again, we provide end-to-end demo applications (such as KafkaMusicExample) that showcase</span>
|
||||
<span class="c1">// how to implement such an RPC layer.</span>
|
||||
<span class="n">Long</span> <span class="n">result</span> <span class="o">=</span> <span class="n">http</span><span class="o">.</span><span class="na">getLong</span><span class="o">(</span><span class="s">"http://"</span> <span class="o">+</span> <span class="n">metadata</span><span class="o">.</span><span class="na">host</span><span class="o">()</span> <span class="o">+</span> <span class="s">":"</span> <span class="o">+</span> <span class="n">metadata</span><span class="o">.</span><span class="na">port</span><span class="o">()</span> <span class="o">+</span> <span class="s">"/word-count/alice"</span><span class="o">);</span>
|
||||
|
||||
<span class="c1">// Get the word count for word (aka key) 'alice': Approach 2</span>
|
||||
<span class="c1">//</span>
|
||||
<span class="c1">// Alternatively, we could also choose (say) a brute-force approach where we query every app instance</span>
|
||||
<span class="c1">// until we find the one that happens to know about 'alice'.</span>
|
||||
<span class="n">Optional</span><span class="o"><</span><span class="n">Long</span><span class="o">></span> <span class="n">result</span> <span class="o">=</span> <span class="n">streams</span><span class="o">.</span><span class="na">allMetadataForStore</span><span class="o">(</span><span class="s">"word-count"</span><span class="o">)</span>
|
||||
<span class="o">.</span><span class="na">stream</span><span class="o">()</span>
|
||||
<span class="o">.</span><span class="na">map</span><span class="o">(</span><span class="n">streamsMetadata</span> <span class="o">-></span> <span class="o">{</span>
|
||||
<span class="c1">// Construct the (fictituous) full endpoint URL to query the current remote application instance</span>
|
||||
<span class="n">String</span> <span class="n">url</span> <span class="o">=</span> <span class="s">"http://"</span> <span class="o">+</span> <span class="n">streamsMetadata</span><span class="o">.</span><span class="na">host</span><span class="o">()</span> <span class="o">+</span> <span class="s">":"</span> <span class="o">+</span> <span class="n">streamsMetadata</span><span class="o">.</span><span class="na">port</span><span class="o">()</span> <span class="o">+</span> <span class="s">"/word-count/alice"</span><span class="o">;</span>
|
||||
<span class="c1">// Read and return the count for 'alice', if any.</span>
|
||||
<span class="k">return</span> <span class="n">http</span><span class="o">.</span><span class="na">getLong</span><span class="o">(</span><span class="n">url</span><span class="o">);</span>
|
||||
<span class="o">})</span>
|
||||
<span class="o">.</span><span class="na">filter</span><span class="o">(</span><span class="n">s</span> <span class="o">-></span> <span class="n">s</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span>
|
||||
<span class="o">.</span><span class="na">findFirst</span><span class="o">();</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>At this point the full state of the application is interactively queryable:</p>
|
||||
<ul class="simple">
|
||||
<li>You can discover the running instances of the application and the state stores they manage locally.</li>
|
||||
<li>Through the RPC layer that was added to the application, you can communicate with these application instances over the
|
||||
network and query them for locally available state.</li>
|
||||
<li>The application instances are able to serve such queries because they can directly query their own local state stores
|
||||
and respond via the RPC layer.</li>
|
||||
<li>Collectively, this allows us to query the full state of the entire application.</li>
|
||||
</ul>
|
||||
<p>To see an end-to-end application with interactive queries, review the
|
||||
<a class="reference internal" href="#streams-developer-guide-interactive-queries-demos"><span class="std std-ref">demo applications</span></a>.</p>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<div class="pagination">
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/testing" class="pagination__btn pagination__btn__prev">Previous</a>
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/memory-mgmt" class="pagination__btn pagination__btn__next">Next</a>
|
||||
</div>
|
||||
</script>
|
||||
|
||||
<!--#include virtual="../../../includes/_header.htm" -->
|
||||
<!--#include virtual="../../../includes/_top.htm" -->
|
||||
<div class="content documentation documentation--current">
|
||||
<!--#include virtual="../../../includes/_nav.htm" -->
|
||||
<div class="right">
|
||||
<!--#include virtual="../../../includes/_docs_banner.htm" -->
|
||||
<ul class="breadcrumbs">
|
||||
<li><a href="/documentation">Documentation</a></li>
|
||||
<li><a href="/documentation/streams">Kafka Streams</a></li>
|
||||
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
|
||||
</ul>
|
||||
<div class="p-content"></div>
|
||||
</div>
|
||||
</div>
|
||||
<!--#include virtual="../../../includes/_footer.htm" -->
|
||||
<script>
|
||||
$(function() {
|
||||
// Show selected style on nav item
|
||||
$('.b-nav__streams').addClass('selected');
|
||||
|
||||
//sticky secondary nav
|
||||
var $navbar = $(".sub-nav-sticky"),
|
||||
y_pos = $navbar.offset().top,
|
||||
height = $navbar.height();
|
||||
|
||||
$(window).scroll(function() {
|
||||
var scrollTop = $(window).scrollTop();
|
||||
|
||||
if (scrollTop > y_pos - height) {
|
||||
$navbar.addClass("navbar-fixed")
|
||||
} else if (scrollTop <= y_pos) {
|
||||
$navbar.removeClass("navbar-fixed")
|
||||
}
|
||||
});
|
||||
|
||||
// Display docs subnav items
|
||||
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
|
||||
});
|
||||
</script>
|
||||
129
docs/streams/developer-guide/manage-topics.html
Normal file
129
docs/streams/developer-guide/manage-topics.html
Normal file
@@ -0,0 +1,129 @@
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
contributor license agreements. See the NOTICE file distributed with
|
||||
this work for additional information regarding copyright ownership.
|
||||
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
(the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
<script><!--#include virtual="../../js/templateData.js" --></script>
|
||||
|
||||
<script id="content-template" type="text/x-handlebars-template">
|
||||
<!-- h1>Developer Guide for Kafka Streams</h1 -->
|
||||
<div class="sub-nav-sticky">
|
||||
<div class="sticky-top">
|
||||
<!-- div style="height:35px">
|
||||
<a href="/{{version}}/documentation/streams/">Introduction</a>
|
||||
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
|
||||
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
|
||||
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
|
||||
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
|
||||
</div -->
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="section" id="managing-streams-application-topics">
|
||||
<span id="streams-developer-guide-topics"></span><h1>Managing Streams Application Topics<a class="headerlink" href="#managing-streams-application-topics" title="Permalink to this headline"></a></h1>
|
||||
<p>A Kafka Streams application continuously reads from Kafka topics, processes the read data, and then
|
||||
writes the processing results back into Kafka topics. The application may also auto-create other Kafka topics in the
|
||||
Kafka brokers, for example state store changelog topics. This section describes the differences between these topic types and
|
||||
how to manage the topics and your applications.</p>
|
||||
<p>Kafka Streams distinguishes between <a class="reference internal" href="#streams-developer-guide-topics-user"><span class="std std-ref">user topics</span></a> and
|
||||
<a class="reference internal" href="#streams-developer-guide-topics-internal"><span class="std std-ref">internal topics</span></a>.</p>
|
||||
<div class="section" id="user-topics">
|
||||
<span id="streams-developer-guide-topics-user"></span><h2>User topics<a class="headerlink" href="#user-topics" title="Permalink to this headline"></a></h2>
|
||||
<p>User topics exist externally to an application and are read from or written to by the application, including:</p>
|
||||
<dl class="docutils">
|
||||
<dt>Input topics</dt>
|
||||
<dd>Topics that are specified via source processors in the application’s topology; e.g. via <code class="docutils literal"><span class="pre">StreamsBuilder#stream()</span></code>, <code class="docutils literal"><span class="pre">StreamsBuilder#table()</span></code> and <code class="docutils literal"><span class="pre">Topology#addSource()</span></code>.</dd>
|
||||
<dt>Output topics</dt>
|
||||
<dd>Topics that are specified via sink processors in the application’s topology; e.g. via
|
||||
<code class="docutils literal"><span class="pre">KStream#to()</span></code>, <code class="docutils literal"><span class="pre">KTable.to()</span></code> and <code class="docutils literal"><span class="pre">Topology#addSink()</span></code>.</dd>
|
||||
<dt>Intermediate topics</dt>
|
||||
<dd>Topics that are both input and output topics of the application’s topology; e.g. via
|
||||
<code class="docutils literal"><span class="pre">KStream#through()</span></code>.</dd>
|
||||
</dl>
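<p>To make the distinction concrete, the following sketch wires up one topic of each kind listed above; the topic names and the trivial transformation are illustrative only:</p>
<div class="highlight-java"><div class="highlight"><pre>
StreamsBuilder builder = new StreamsBuilder();

// Input topic: read via a source processor
KStream&lt;String, String&gt; textLines = builder.stream("TextLinesTopic");

// Intermediate topic: written to and read back by the same topology
KStream&lt;String, String&gt; repartitioned = textLines.through("RekeyedTextLinesTopic");

// Output topic: written via a sink processor
repartitioned.mapValues(v -&gt; v.toUpperCase()).to("UppercasedTextLinesTopic");
</pre></div>
</div>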
|
||||
<p>User topics must be created and manually managed ahead of time (e.g., via the
|
||||
<a class="reference internal" href="../../kafka/post-deployment.html#kafka-operations-admin"><span class="std std-ref">topic tools</span></a>). If user topics are shared among multiple applications for reading and
|
||||
writing, the application users must coordinate topic management. If user topics are centrally managed, then application
|
||||
users would not need to manage topics themselves but simply obtain access to them.</p>
|
||||
<div class="admonition note">
|
||||
<p class="first admonition-title">Note</p>
|
||||
<p>You should not use the auto-create topic feature on the brokers to create user topics, because:</p>
|
||||
<ul class="last simple">
|
||||
<li>Auto-creation of topics may be disabled in your Kafka cluster.</li>
|
||||
<li>Auto-creation automatically applies the default topic settings such as the replication factor. These default settings might not be what you want for certain output topics (e.g., <code class="docutils literal"><span class="pre">auto.create.topics.enable=true</span></code> in the <a class="reference external" href="http://kafka.apache.org/0100/documentation.html#brokerconfigs">Kafka broker configuration</a>).</li>
|
||||
</ul>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="internal-topics">
|
||||
<span id="streams-developer-guide-topics-internal"></span><h2>Internal topics<a class="headerlink" href="#internal-topics" title="Permalink to this headline"></a></h2>
|
||||
<p>Internal topics are used internally by the Kafka Streams application while executing, for example the
|
||||
changelog topics for state stores. These topics are created by the application and are only used by that stream application.</p>
|
||||
<p>If security is enabled on the Kafka brokers, you must grant the underlying clients admin permissions so that they can
|
||||
create internal topics. For more information, see <a class="reference internal" href="security.html#streams-developer-guide-security"><span class="std std-ref">Streams Security</span></a>.</p>
|
||||
<div class="admonition note">
|
||||
<p class="first admonition-title">Note</p>
|
||||
<p class="last">The internal topics follow the naming convention <code class="docutils literal"><span class="pre"><application.id>-<operatorName>-<suffix></span></code>, but this convention
|
||||
is not guaranteed for future releases.</p>
|
||||
</div>
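<p>For example, assuming <code class="docutils literal"><span class="pre">application.id</span></code> is set to <code class="docutils literal"><span class="pre">word-count-app</span></code>, the changelog topic backing a state store named <code class="docutils literal"><span class="pre">word-count</span></code> would typically be named <code class="docutils literal"><span class="pre">word-count-app-word-count-changelog</span></code>; as noted above, do not rely on this exact pattern across releases.</p>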
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<div class="pagination">
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/running-app" class="pagination__btn pagination__btn__prev">Previous</a>
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/security" class="pagination__btn pagination__btn__next">Next</a>
|
||||
</div>
|
||||
</script>
|
||||
|
||||
<!--#include virtual="../../../includes/_header.htm" -->
|
||||
<!--#include virtual="../../../includes/_top.htm" -->
|
||||
<div class="content documentation documentation--current">
|
||||
<!--#include virtual="../../../includes/_nav.htm" -->
|
||||
<div class="right">
|
||||
<!--#include virtual="../../../includes/_docs_banner.htm" -->
|
||||
<ul class="breadcrumbs">
|
||||
<li><a href="/documentation">Documentation</a></li>
|
||||
<li><a href="/documentation/streams">Kafka Streams</a></li>
|
||||
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
|
||||
</ul>
|
||||
<div class="p-content"></div>
|
||||
</div>
|
||||
</div>
|
||||
<!--#include virtual="../../../includes/_footer.htm" -->
|
||||
<script>
|
||||
$(function() {
|
||||
// Show selected style on nav item
|
||||
$('.b-nav__streams').addClass('selected');
|
||||
|
||||
//sticky secondary nav
|
||||
var $navbar = $(".sub-nav-sticky"),
|
||||
y_pos = $navbar.offset().top,
|
||||
height = $navbar.height();
|
||||
|
||||
$(window).scroll(function() {
|
||||
var scrollTop = $(window).scrollTop();
|
||||
|
||||
if (scrollTop > y_pos - height) {
|
||||
$navbar.addClass("navbar-fixed")
|
||||
} else if (scrollTop <= y_pos) {
|
||||
$navbar.removeClass("navbar-fixed")
|
||||
}
|
||||
});
|
||||
|
||||
// Display docs subnav items
|
||||
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
|
||||
});
|
||||
</script>
|
||||
289
docs/streams/developer-guide/memory-mgmt.html
Normal file
289
docs/streams/developer-guide/memory-mgmt.html
Normal file
@@ -0,0 +1,289 @@
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
contributor license agreements. See the NOTICE file distributed with
|
||||
this work for additional information regarding copyright ownership.
|
||||
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
(the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
<script><!--#include virtual="../../js/templateData.js" --></script>
|
||||
|
||||
<script id="content-template" type="text/x-handlebars-template">
|
||||
<!-- h1>Developer Guide for Kafka Streams</h1 -->
|
||||
<div class="sub-nav-sticky">
|
||||
<div class="sticky-top">
|
||||
<!-- div style="height:35px">
|
||||
<a href="/{{version}}/documentation/streams/">Introduction</a>
|
||||
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
|
||||
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
|
||||
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
|
||||
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
|
||||
</div -->
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="section" id="memory-management">
|
||||
<span id="streams-developer-guide-memory-management"></span><h1>Memory Management<a class="headerlink" href="#memory-management" title="Permalink to this headline"></a></h1>
|
||||
<p>You can specify the total memory (RAM) size used for internal caching and compacting of records. This caching happens
|
||||
before the records are written to state stores or forwarded downstream to other nodes.</p>
|
||||
<p>The record caches are implemented slightly differently in the DSL and the Processor API.</p>
|
||||
<div class="contents local topic" id="table-of-contents">
|
||||
<p class="topic-title first"><b>Table of Contents</b></p>
|
||||
<ul class="simple">
|
||||
<li><a class="reference internal" href="#record-caches-in-the-dsl" id="id1">Record caches in the DSL</a></li>
|
||||
<li><a class="reference internal" href="#record-caches-in-the-processor-api" id="id2">Record caches in the Processor API</a></li>
|
||||
<li><a class="reference internal" href="#rocksdb" id="id3">RocksDB</a></li>
|
||||
<li><a class="reference internal" href="#other-memory-usage" id="id4">Other memory usage</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="section" id="record-caches-in-the-dsl">
|
||||
<span id="streams-developer-guide-memory-management-record-cache"></span><h2><a class="toc-backref" href="#id1">Record caches in the DSL</a><a class="headerlink" href="#record-caches-in-the-dsl" title="Permalink to this headline"></a></h2>
|
||||
<p>You can specify the total memory (RAM) size of the record cache for an instance of the processing topology. It is leveraged
|
||||
by the following <code class="docutils literal"><span class="pre">KTable</span></code> instances:</p>
|
||||
<ul class="simple">
|
||||
<li>Source <code class="docutils literal"><span class="pre">KTable</span></code>: <code class="docutils literal"><span class="pre">KTable</span></code> instances that are created via <code class="docutils literal"><span class="pre">StreamsBuilder#table()</span></code> or <code class="docutils literal"><span class="pre">StreamsBuilder#globalTable()</span></code>.</li>
|
||||
<li>Aggregation <code class="docutils literal"><span class="pre">KTable</span></code>: instances of <code class="docutils literal"><span class="pre">KTable</span></code> that are created as a result of <a class="reference internal" href="dsl-api.html#streams-developer-guide-dsl-aggregating"><span class="std std-ref">aggregations</span></a>.</li>
|
||||
</ul>
|
||||
<p>For such <code class="docutils literal"><span class="pre">KTable</span></code> instances, the record cache is used for:</p>
|
||||
<ul class="simple">
|
||||
<li>Internal caching and compacting of output records before they are written by the underlying stateful
|
||||
<a class="reference internal" href="../core-concepts#streams_processor_node"><span class="std std-ref">processor node</span></a> to its internal state stores.</li>
|
||||
<li>Internal caching and compacting of output records before they are forwarded from the underlying stateful
|
||||
<a class="reference internal" href="../core-concepts#streams_processor_node"><span class="std std-ref">processor node</span></a> to any of its downstream processor nodes.</li>
|
||||
</ul>
|
||||
<p>Use the following example to understand the behaviors with and without record caching. In this example, the input is a
|
||||
<code class="docutils literal"><span class="pre">KStream<String,</span> <span class="pre">Integer></span></code> with the records <code class="docutils literal"><span class="pre"><K,V>:</span> <span class="pre"><A,</span> <span class="pre">1>,</span> <span class="pre"><D,</span> <span class="pre">5>,</span> <span class="pre"><A,</span> <span class="pre">20>,</span> <span class="pre"><A,</span> <span class="pre">300></span></code>. The focus in this example is
|
||||
on the records with key == <code class="docutils literal"><span class="pre">A</span></code>.</p>
|
||||
<ul>
|
||||
<li><p class="first">An <a class="reference internal" href="dsl-api.html#streams-developer-guide-dsl-aggregating"><span class="std std-ref">aggregation</span></a> computes the sum of record values, grouped by key, for
|
||||
the input and returns a <code class="docutils literal"><span class="pre">KTable<String,</span> <span class="pre">Integer></span></code>.</p>
|
||||
<blockquote>
|
||||
<div><ul class="simple">
|
||||
<li><strong>Without caching</strong>: a sequence of output records is emitted for key <code class="docutils literal"><span class="pre">A</span></code> that represent changes in the
|
||||
resulting aggregation table. The parentheses (<code class="docutils literal"><span class="pre">()</span></code>) denote changes; the left number is the new aggregate value
|
||||
and the right number is the old aggregate value: <code class="docutils literal"><span class="pre"><A,</span> <span class="pre">(1,</span> <span class="pre">null)>,</span> <span class="pre"><A,</span> <span class="pre">(21,</span> <span class="pre">1)>,</span> <span class="pre"><A,</span> <span class="pre">(321,</span> <span class="pre">21)></span></code>.</li>
|
||||
<li><strong>With caching</strong>: a single output record is emitted for key <code class="docutils literal"><span class="pre">A</span></code> that would likely be compacted in the cache,
|
||||
leading to a single output record of <code class="docutils literal"><span class="pre"><A,</span> <span class="pre">(321,</span> <span class="pre">null)></span></code>. This record is written to the aggregation’s internal state
|
||||
store and forwarded to any downstream operations.</li>
|
||||
</ul>
|
||||
</div></blockquote>
|
||||
</li>
|
||||
</ul>
|
||||
<p>The cache size is specified through the <code class="docutils literal"><span class="pre">cache.max.bytes.buffering</span></code> parameter, which is a global setting per
|
||||
processing topology:</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Enable record cache of size 10 MB.</span>
|
||||
<span class="n">Properties</span> <span class="n">props</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
|
||||
<span class="n">props</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">CACHE_MAX_BYTES_BUFFERING_CONFIG</span><span class="o">,</span> <span class="mi">10</span> <span class="o">*</span> <span class="mi">1024</span> <span class="o">*</span> <span class="mi">1024L</span><span class="o">);</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>This parameter controls the number of bytes allocated for caching. Specifically, for a processor topology instance with
|
||||
<code class="docutils literal"><span class="pre">T</span></code> threads and <code class="docutils literal"><span class="pre">C</span></code> bytes allocated for caching, each thread will have an even <code class="docutils literal"><span class="pre">C/T</span></code> bytes to construct its own
|
||||
cache and use as it sees fit among its tasks. This means that there are as many caches as there are threads, but no sharing of
|
||||
caches across threads happens.</p>
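<p>As a quick worked example: with <code class="docutils literal"><span class="pre">cache.max.bytes.buffering</span></code> set to 10485760 bytes (10 MB) and <code class="docutils literal"><span class="pre">num.stream.threads</span></code> set to 4, each stream thread would get 10485760 / 4 = 2621440 bytes (2.5 MB) of cache to distribute among its tasks.</p>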
|
||||
<p>The basic API for the cache is made of <code class="docutils literal"><span class="pre">put()</span></code> and <code class="docutils literal"><span class="pre">get()</span></code> calls. Records are
|
||||
evicted using a simple LRU scheme after the cache size is reached. The first time a keyed record <code class="docutils literal"><span class="pre">R1</span> <span class="pre">=</span> <span class="pre"><K1,</span> <span class="pre">V1></span></code>
|
||||
finishes processing at a node, it is marked as dirty in the cache. Any other keyed record <code class="docutils literal"><span class="pre">R2</span> <span class="pre">=</span> <span class="pre"><K1,</span> <span class="pre">V2></span></code> with the
|
||||
same key <code class="docutils literal"><span class="pre">K1</span></code> that is processed on that node during that time will overwrite <code class="docutils literal"><span class="pre"><K1,</span> <span class="pre">V1></span></code>, this is referred to as
|
||||
“being compacted”. This has the same effect as
|
||||
<a class="reference external" href="https://kafka.apache.org/documentation.html#compaction">Kafka’s log compaction</a>, but happens earlier, while the
|
||||
records are still in memory, and within your client-side application, rather than on the server-side (i.e. the Kafka
|
||||
broker). After flushing, <code class="docutils literal"><span class="pre">R2</span></code> is forwarded to the next processing node and then written to the local state store.</p>
|
||||
<p>The semantics of caching is that data is flushed to the state store and forwarded to the next downstream processor node
|
||||
whenever the earliest of <code class="docutils literal"><span class="pre">commit.interval.ms</span></code> or <code class="docutils literal"><span class="pre">cache.max.bytes.buffering</span></code> (cache pressure) hits. Both
|
||||
<code class="docutils literal"><span class="pre">commit.interval.ms</span></code> and <code class="docutils literal"><span class="pre">cache.max.bytes.buffering</span></code> are global parameters. As such, it is not possible to specify
|
||||
different parameters for individual nodes.</p>
|
||||
<p>Here are example settings for both parameters based on desired scenarios.</p>
|
||||
<ul>
|
||||
<li><p class="first">To turn off caching the cache size can be set to zero:</p>
|
||||
<blockquote>
|
||||
<div><div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Disable record cache</span>
|
||||
<span class="n">Properties</span> <span class="n">props</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
|
||||
<span class="n">props</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">CACHE_MAX_BYTES_BUFFERING_CONFIG</span><span class="o">,</span> <span class="mi">0</span><span class="o">);</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>Turning off caching might result in high write traffic for the underlying RocksDB store.
With default settings, caching is enabled within Kafka Streams but RocksDB caching is disabled.
Thus, to avoid high write traffic, it is recommended to enable RocksDB caching if Kafka Streams caching is turned off.</p>
|
||||
<p>For example, the RocksDB Block Cache could be set to 100 MB and the Write Buffer size to 32 MB. For more information, see
|
||||
the <a class="reference internal" href="config-streams.html#streams-developer-guide-rocksdb-config"><span class="std std-ref">RocksDB config</span></a>.</p>
|
||||
</div></blockquote>
|
||||
</li>
|
||||
<li><p class="first">To enable caching but still have an upper bound on how long records will be cached, you can set the commit interval. In this example, it is set to 1000 milliseconds:</p>
|
||||
<blockquote>
|
||||
<div><div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">props</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
|
||||
<span class="c1">// Enable record cache of size 10 MB.</span>
|
||||
<span class="n">props</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">CACHE_MAX_BYTES_BUFFERING_CONFIG</span><span class="o">,</span> <span class="mi">10</span> <span class="o">*</span> <span class="mi">1024</span> <span class="o">*</span> <span class="mi">1024L</span><span class="o">);</span>
|
||||
<span class="c1">// Set commit interval to 1 second.</span>
|
||||
<span class="n">props</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">COMMIT_INTERVAL_MS_CONFIG</span><span class="o">,</span> <span class="mi">1000</span><span class="o">);</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div></blockquote>
|
||||
</li>
|
||||
</ul>
|
||||
<p>The effect of these two configurations is described in the figure below. The records are shown using 4 keys: blue, red, yellow, and green. Assume the cache has space for only 3 keys.</p>
|
||||
<ul>
|
||||
<li><p class="first">When the cache is disabled (a), all of the input records will be output.</p>
|
||||
</li>
|
||||
<li><p class="first">When the cache is enabled (b):</p>
|
||||
<blockquote>
|
||||
<div><ul class="simple">
|
||||
<li>Most records are output at the end of commit intervals (e.g., at <code class="docutils literal"><span class="pre">t1</span></code> a single blue record is output, which is the final over-write of the blue key up to that time).</li>
|
||||
<li>Some records are output because of cache pressure (i.e. before the end of a commit interval). For example, see the red record before <code class="docutils literal"><span class="pre">t2</span></code>. With smaller cache sizes we expect cache pressure to be the primary factor that dictates when records are output. With large cache sizes, the commit interval will be the primary factor.</li>
|
||||
<li>The total number of records output has been reduced from 15 to 8.</li>
|
||||
</ul>
|
||||
</div></blockquote>
|
||||
</li>
|
||||
</ul>
|
||||
<div class="figure align-center">
|
||||
<img class="centered" src="/{{version}}/images/streams-cache-and-commit-interval.png">
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="record-caches-in-the-processor-api">
|
||||
<span id="streams-developer-guide-memory-management-state-store-cache"></span><h2><a class="toc-backref" href="#id2">Record caches in the Processor API</a><a class="headerlink" href="#record-caches-in-the-processor-api" title="Permalink to this headline"></a></h2>
|
||||
<p>You can specify the total memory (RAM) size of the record cache for an instance of the processing topology. It is used
|
||||
for internal caching and compacting of output records before they are written from a stateful processor node to its
|
||||
state stores.</p>
|
||||
<p>The record cache in the Processor API does not cache or compact any output records that are being forwarded downstream.
|
||||
This means that all downstream processor nodes can see all records, whereas the state stores see a reduced number of records.
|
||||
This does not impact correctness of the system, but is a performance optimization for the state stores. For example, with the
|
||||
Processor API you can store a record in a state store while forwarding a different value downstream.</p>
|
||||
<p>Following from the example first shown in section <a class="reference internal" href="processor-api.html#streams-developer-guide-state-store"><span class="std std-ref">State Stores</span></a>, to disable caching you can
add the <code class="docutils literal"><span class="pre">withCachingDisabled</span></code> call (note that caches are enabled by default; the snippet below uses the explicit <code class="docutils literal"><span class="pre">withCachingEnabled</span></code>
call).</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">StoreBuilder</span> <span class="n">countStoreBuilder</span> <span class="o">=</span>
|
||||
<span class="n">Stores</span><span class="o">.</span><span class="na">keyValueStoreBuilder</span><span class="o">(</span>
|
||||
<span class="n">Stores</span><span class="o">.</span><span class="na">persistentKeyValueStore</span><span class="o">(</span><span class="s">"Counts"</span><span class="o">),</span>
|
||||
<span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">(),</span>
|
||||
<span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">())</span>
|
||||
<span class="o">.</span><span class="na">withCachingEnabled</span><span class="o">()</span>
|
||||
</pre></div>
|
||||
</div>
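<p>For completeness, here is a sketch of the same store builder with caching explicitly disabled (the store name and serdes mirror the example above):</p>
<div class="highlight-java"><div class="highlight"><pre>
StoreBuilder countStoreBuilder =
  Stores.keyValueStoreBuilder(
    Stores.persistentKeyValueStore("Counts"),
    Serdes.String(),
    Serdes.Long())
  // Opt this store out of the record cache entirely
  .withCachingDisabled();
</pre></div>
</div>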
|
||||
</div>
|
||||
<div class="section" id="rocksdb">
|
||||
<h2><a class="toc-backref" href="#id3">RocksDB</a><a class="headerlink" href="#rocksdb" title="Permalink to this headline"></a></h2>
|
||||
<p> Each instance of RocksDB allocates off-heap memory for a block cache, index and filter blocks, and memtable (write buffer). Critical configs (for RocksDB version 4.1.0) include
|
||||
<code class="docutils literal"><span class="pre">block_cache_size</span></code>, <code class="docutils literal"><span class="pre">write_buffer_size</span></code> and <code class="docutils literal"><span class="pre">max_write_buffer_number</span></code>. These can be specified through the
|
||||
<code class="docutils literal"><span class="pre">rocksdb.config.setter</span></code> configuration.</li>
|
||||
<p> As of 2.3.0 the memory usage across all instances can be bounded, limiting the total off-heap memory of your Kafka Streams application. To do so you must configure RocksDB to cache the index and filter blocks in the block cache, limit the memtable memory through a shared <a class="reference external" href="https://github.com/facebook/rocksdb/wiki/Write-Buffer-Manager">WriteBufferManager</a> and count its memory against the block cache, and then pass the same Cache object to each instance. See <a class="reference external" href="https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB">RocksDB Memory Usage</a> for details. An example RocksDBConfigSetter implementing this is shown below:</p>
|
||||
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span> <span class="kd">public</span> <span class="kd">static</span> <span class="kd">class</span> <span class="nc">BoundedMemoryRocksDBConfig</span> <span class="kd">implements</span> <span class="n">RocksDBConfigSetter</span> <span class="o">{</span>
|
||||
|
||||
<span class="kd">private</span> <span class="kt">static</span> <span class="n">org.rocksdb.Cache</span> <span class="n">cache</span> <span class="o">=</span> <span class="k">new</span> <span class="n">org</span><span class="o">.</span><span class="na">rocksdb</span><span class="o">.</span><span class="na">LRUCache</span><span class="o">(</span><span class="mi">TOTAL_OFF_HEAP_MEMORY</span><span class="o">,</span> <span class="n">-1</span><span class="o">,</span> <span class="n">false</span><span class="o">,</span> <span class="n">INDEX_FILTER_BLOCK_RATIO</span><span class="o">);</span><sup><a href="#fn1" id="ref1">1</a></sup>
|
||||
<span class="kd">private</span> <span class="kt">static</span> <span class="n">org.rocksdb.WriteBufferManager</span> <span class="n">writeBufferManager</span> <span class="o">=</span> <span class="k">new</span> <span class="n">org</span><span class="o">.</span><span class="na">rocksdb</span><span class="o">.</span><span class="na">WriteBufferManager</span><span class="o">(</span><span class="mi">TOTAL_MEMTABLE_MEMORY</span><span class="o">,</span> cache<span class="o">);</span>
|
||||
|
||||
<span class="nd">@Override</span>
|
||||
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">setConfig</span><span class="o">(</span><span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Options</span> <span class="n">options</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Map</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Object</span><span class="o">></span> <span class="n">configs</span><span class="o">)</span> <span class="o">{</span>
|
||||
|
||||
<span class="n">BlockBasedTableConfig</span> <span class="n">tableConfig</span> <span class="o">=</span> <span class="k">(BlockBasedTableConfig)</span> <span class="n">options</span><span><span class="o">.</span><span class="na">tableFormatConfig</span><span class="o">();</span>
|
||||
|
||||
<span class="c1"> // These three options in combination will limit the memory used by RocksDB to the size passed to the block cache (TOTAL_OFF_HEAP_MEMORY)</span>
|
||||
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setBlockCache</span><span class="o">(</span><span class="mi">cache</span><span class="o">);</span>
|
||||
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setCacheIndexAndFilterBlocks</span><span class="o">(</span><span class="kc">true</span><span class="o">);</span>
|
||||
<span class="n">options</span><span class="o">.</span><span class="na">setWriteBufferManager</span><span class="o">(</span><span class="mi">writeBufferManager</span><span class="o">);</span>
|
||||
|
||||
<span class="c1"> // These options are recommended to be set when bounding the total memory</span>
|
||||
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setCacheIndexAndFilterBlocksWithHighPriority</span><span class="o">(</span><span class="mi">true</span><span class="o">);</span><sup><a href="#fn2" id="ref2">2</a></sup>
|
||||
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setPinTopLevelIndexAndFilter</span><span class="o">(</span><span class="mi">true</span><span class="o">);</span>
|
||||
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setBlockSize</span><span class="o">(</span><span class="mi">BLOCK_SIZE</span><span class="o">);</span><sup><a href="#fn3" id="ref3">3</a></sup>
|
||||
<span class="n">options</span><span class="o">.</span><span class="na">setMaxWriteBufferNumber</span><span class="o">(</span><span class="mi">N_MEMTABLES</span><span class="o">);</span>
|
||||
<span class="n">options</span><span class="o">.</span><span class="na">setWriteBufferSize</span><span class="o">(</span><span class="mi">MEMTABLE_SIZE</span><span class="o">);</span>
|
||||
|
||||
<span class="n">options</span><span class="o">.</span><span class="na">setTableFormatConfig</span><span class="o">(</span><span class="n">tableConfig</span><span class="o">);</span>
|
||||
<span class="o">}</span>
|
||||
|
||||
<span class="nd">@Override</span>
|
||||
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">close</span><span class="o">(</span><span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Options</span> <span class="n">options</span><span class="o">)</span> <span class="o">{</span>
|
||||
<span class="c1">// Cache and WriteBufferManager should not be closed here, as the same objects are shared by every store instance.</span>
|
||||
<span class="o">}</span>
|
||||
<span class="o">}</span>
|
||||
</pre></div>
</div>
|
||||
<sup id="fn1">1. INDEX_FILTER_BLOCK_RATIO can be used to set a fraction of the block cache to set aside for "high priority" (aka index and filter) blocks, preventing them from being evicted by data blocks. See the full signature of the <a class="reference external" href="https://github.com/facebook/rocksdb/blob/master/java/src/main/java/org/rocksdb/LRUCache.java#L72">LRUCache constructor</a>. </sup>
|
||||
<br>
|
||||
<sup id="fn2">2. This must be set in order for INDEX_FILTER_BLOCK_RATIO to take effect (see footnote 1) as described in the <a class="reference external" href="https://github.com/facebook/rocksdb/wiki/Block-Cache#caching-index-and-filter-blocks">RocksDB docs</a></sup>
|
||||
<br>
|
||||
<sup id="fn3">3. You may want to modify the default <a class="reference external" href="https://github.com/apache/kafka/blob/2.3/streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBStore.java#L79">block size</a> per these instructions from the <a class="reference external" href="https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB#indexes-and-filter-blocks">RocksDB GitHub</a>. A larger block size means index blocks will be smaller, but the cached data blocks may contain more cold data that would otherwise be evicted.
|
||||
<br>
|
||||
<dl class="docutils">
|
||||
<dt>Note:</dt>
|
||||
While we recommend setting at least the above configs, the specific options that yield the best performance are workload dependent and you should consider experimenting with these to determine the best choices for your specific use case. Keep in mind that the optimal configs for one app may not apply to one with a different topology or input topic.
|
||||
In addition to the recommended configs above, you may want to consider using partitioned index filters as described by the <a class="reference external" href="https://github.com/facebook/rocksdb/wiki/Partitioned-Index-Filters">RocksDB docs</a>.
|
||||
|
||||
</dl>
|
||||
</div>
|
||||
<div class="section" id="other-memory-usage">
|
||||
<h2><a class="toc-backref" href="#id4">Other memory usage</a><a class="headerlink" href="#other-memory-usage" title="Permalink to this headline"></a></h2>
|
||||
<p>There are other modules inside Apache Kafka that allocate memory during runtime. They include the following:</p>
|
||||
<ul class="simple">
|
||||
<li>Producer buffering, managed by the producer config <code class="docutils literal"><span class="pre">buffer.memory</span></code>.</li>
|
||||
<li>Consumer buffering, currently not strictly managed, but can be indirectly controlled by fetch size, i.e.,
|
||||
<code class="docutils literal"><span class="pre">fetch.max.bytes</span></code> and <code class="docutils literal"><span class="pre">fetch.max.wait.ms</span></code>.</li>
|
||||
<li>Both producer and consumer also have separate TCP send / receive buffers that are not counted as the buffering memory.
|
||||
These are controlled by the <code class="docutils literal"><span class="pre">send.buffer.bytes</span></code> / <code class="docutils literal"><span class="pre">receive.buffer.bytes</span></code> configs.</li>
|
||||
<li>Deserialized objects buffering: after <code class="docutils literal"><span class="pre">consumer.poll()</span></code> returns records, they will be deserialized to extract
|
||||
the timestamp and buffered in the Streams layer. Currently, this is only indirectly controlled by
|
||||
<code class="docutils literal"><span class="pre">buffered.records.per.partition</span></code>.</li>
|
||||
</ul>
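<p>As an illustrative sketch only (the values below are placeholders, not recommendations), the client-side buffers listed above can be tuned from the Streams configuration by prefixing the corresponding producer and consumer settings:</p>
<div class="highlight-java"><div class="highlight"><pre>
Properties props = new Properties();
// Producer buffering (bytes)
props.put(StreamsConfig.producerPrefix(ProducerConfig.BUFFER_MEMORY_CONFIG), 32 * 1024 * 1024L);
// Consumer fetch sizes, which indirectly bound consumer buffering
props.put(StreamsConfig.consumerPrefix(ConsumerConfig.FETCH_MAX_BYTES_CONFIG), 50 * 1024 * 1024);
props.put(StreamsConfig.consumerPrefix(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG), 500);
// Deserialized records buffered per partition after poll(), before processing
props.put(StreamsConfig.BUFFERED_RECORDS_PER_PARTITION_CONFIG, 1000);
</pre></div>
</div>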
|
||||
<div class="admonition tip">
|
||||
<p><b>Tip</b></p>
|
||||
<p><strong>Iterators should be closed explicitly to release resources:</strong> Store iterators (e.g., <code class="docutils literal"><span class="pre">KeyValueIterator</span></code> and <code class="docutils literal"><span class="pre">WindowStoreIterator</span></code>) must be closed explicitly once you are done with them to release resources such as open file handles and in-memory read buffers; alternatively, use a try-with-resources statement (available since JDK 7), since these iterators implement <code class="docutils literal"><span class="pre">Closeable</span></code>.</p>
|
||||
<p class="last">Otherwise, stream application’s memory usage keeps increasing when running until it hits an OOM.</p>
|
||||
</div>
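<p>A minimal sketch of the recommended pattern, assuming a <code class="docutils literal"><span class="pre">ReadOnlyKeyValueStore</span></code> obtained elsewhere:</p>
<div class="highlight-java"><div class="highlight"><pre>
ReadOnlyKeyValueStore&lt;String, Long&gt; store = ...;

// try-with-resources closes the iterator and releases its underlying resources
try (KeyValueIterator&lt;String, Long&gt; iterator = store.all()) {
  while (iterator.hasNext()) {
    KeyValue&lt;String, Long&gt; entry = iterator.next();
    // process entry.key / entry.value here
  }
}
</pre></div>
</div>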
|
||||
</div>
|
||||
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<div class="pagination">
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/interactive-queries" class="pagination__btn pagination__btn__prev">Previous</a>
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/running-app" class="pagination__btn pagination__btn__next">Next</a>
|
||||
</div>
|
||||
</script>
|
||||
|
||||
<!--#include virtual="../../../includes/_header.htm" -->
|
||||
<!--#include virtual="../../../includes/_top.htm" -->
|
||||
<div class="content documentation documentation--current">
|
||||
<!--#include virtual="../../../includes/_nav.htm" -->
|
||||
<div class="right">
|
||||
<!--#include virtual="../../../includes/_docs_banner.htm" -->
|
||||
<ul class="breadcrumbs">
|
||||
<li><a href="/documentation">Documentation</a></li>
|
||||
<li><a href="/documentation/streams">Kafka Streams</a></li>
|
||||
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
|
||||
</ul>
|
||||
<div class="p-content"></div>
|
||||
</div>
|
||||
</div>
|
||||
<!--#include virtual="../../../includes/_footer.htm" -->
|
||||
<script>
|
||||
$(function() {
|
||||
// Show selected style on nav item
|
||||
$('.b-nav__streams').addClass('selected');
|
||||
|
||||
//sticky secondary nav
|
||||
var $navbar = $(".sub-nav-sticky"),
|
||||
y_pos = $navbar.offset().top,
|
||||
height = $navbar.height();
|
||||
|
||||
$(window).scroll(function() {
|
||||
var scrollTop = $(window).scrollTop();
|
||||
|
||||
if (scrollTop > y_pos - height) {
|
||||
$navbar.addClass("navbar-fixed")
|
||||
} else if (scrollTop <= y_pos) {
|
||||
$navbar.removeClass("navbar-fixed")
|
||||
}
|
||||
});
|
||||
|
||||
// Display docs subnav items
|
||||
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
|
||||
});
|
||||
</script>
|
||||
474
docs/streams/developer-guide/processor-api.html
Normal file
474
docs/streams/developer-guide/processor-api.html
Normal file
@@ -0,0 +1,474 @@
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
contributor license agreements. See the NOTICE file distributed with
|
||||
this work for additional information regarding copyright ownership.
|
||||
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
(the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
<script><!--#include virtual="../../js/templateData.js" --></script>
|
||||
|
||||
<script id="content-template" type="text/x-handlebars-template">
|
||||
<!-- h1>Developer Guide for Kafka Streams</h1 -->
|
||||
<div class="sub-nav-sticky">
|
||||
<div class="sticky-top">
|
||||
<!-- div style="height:35px">
|
||||
<a href="/{{version}}/documentation/streams/">Introduction</a>
|
||||
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
|
||||
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
|
||||
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
|
||||
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
|
||||
</div -->
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="section" id="processor-api">
|
||||
<span id="streams-developer-guide-processor-api"></span><h1>Processor API<a class="headerlink" href="#processor-api" title="Permalink to this headline"></a></h1>
|
||||
<p>The Processor API allows developers to define and connect custom processors and to interact with state stores. With the
|
||||
Processor API, you can define arbitrary stream processors that process one received record at a time, and connect these
|
||||
processors with their associated state stores to compose the processor topology that represents a customized processing
|
||||
logic.</p>
|
||||
<div class="contents local topic" id="table-of-contents">
|
||||
<p class="topic-title first"><b>Table of Contents</b></p>
|
||||
<ul class="simple">
|
||||
<li><a class="reference internal" href="#overview" id="id1">Overview</a></li>
|
||||
<li><a class="reference internal" href="#defining-a-stream-processor" id="id2">Defining a Stream
|
||||
Processor</a></li>
|
||||
<li><a class="reference internal" href="#unit-testing-processors" id="id9">Unit Testing Processors</a></li>
|
||||
<li><a class="reference internal" href="#state-stores" id="id3">State Stores</a>
|
||||
<ul>
|
||||
<li><a class="reference internal" href="#defining-and-creating-a-state-store" id="id4">Defining and creating a State Store</a></li>
|
||||
<li><a class="reference internal" href="#fault-tolerant-state-stores" id="id5">Fault-tolerant State Stores</a></li>
|
||||
<li><a class="reference internal" href="#enable-or-disable-fault-tolerance-of-state-stores-store-changelogs" id="id6">Enable or Disable Fault Tolerance of State Stores (Store Changelogs)</a></li>
|
||||
<li><a class="reference internal" href="#implementing-custom-state-stores" id="id7">Implementing Custom State Stores</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a class="reference internal" href="#connecting-processors-and-state-stores" id="id8">Connecting Processors and State Stores</a></li>
|
||||
<li><a class="reference internal" href="#accessing-processor-context" id="id10">Accessing Processor Context</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="section" id="overview">
|
||||
<h2><a class="toc-backref" href="#id1">Overview</a><a class="headerlink" href="#overview" title="Permalink to this headline"></a></h2>
|
||||
<p>The Processor API can be used to implement both <strong>stateless</strong> and <strong>stateful</strong> operations, where the latter is
|
||||
achieved through the use of <a class="reference internal" href="#streams-developer-guide-state-store"><span class="std std-ref">state stores</span></a>.</p>
|
||||
<div class="admonition tip">
|
||||
<p><b>Tip</b></p>
|
||||
<p class="last"><strong>Combining the DSL and the Processor API:</strong>
|
||||
You can combine the convenience of the DSL with the power and flexibility of the Processor API as described in the
|
||||
section <a class="reference internal" href="dsl-api.html#streams-developer-guide-dsl-process"><span class="std std-ref">Applying processors and transformers (Processor API integration)</span></a>.</p>
|
||||
</div>
|
||||
<p>For a complete list of available API functionality, see the <a href="/{{version}}/javadoc/org/apache/kafka/streams/package-summary.html">Streams</a> API docs.</p>
|
||||
</div>
|
||||
<div class="section" id="defining-a-stream-processor">
|
||||
<span id="streams-developer-guide-stream-processor"></span><h2><a class="toc-backref" href="#id2">Defining a Stream Processor</a><a class="headerlink" href="#defining-a-stream-processor" title="Permalink to this headline"></a></h2>
|
||||
<p>A <a class="reference internal" href="../core-concepts.html#streams_processor_node"><span class="std std-ref">stream processor</span></a> is a node in the processor topology that represents a single processing step.
|
||||
With the Processor API, you can define arbitrary stream processors that process one received record at a time, and connect
|
||||
these processors with their associated state stores to compose the processor topology.</p>
|
||||
<p>You can define a customized stream processor by implementing the <code class="docutils literal"><span class="pre">Processor</span></code> interface, which provides the <code class="docutils literal"><span class="pre">process()</span></code> API method.
|
||||
The <code class="docutils literal"><span class="pre">process()</span></code> method is called on each of the received records.</p>
|
||||
<p>The <code class="docutils literal"><span class="pre">Processor</span></code> interface also has an <code class="docutils literal"><span class="pre">init()</span></code> method, which is called by the Kafka Streams library during task construction
|
||||
phase. Processor instances should perform any required initialization in this method. The <code class="docutils literal"><span class="pre">init()</span></code> method passes in a <code class="docutils literal"><span class="pre">ProcessorContext</span></code>
|
||||
instance, which provides access to the metadata of the currently processed record, including its source Kafka topic and partition,
|
||||
its corresponding message offset, and further such information. You can also use this context instance to schedule a punctuation
|
||||
function (via <code class="docutils literal"><span class="pre">ProcessorContext#schedule()</span></code>), to forward a new record as a key-value pair to the downstream processors (via <code class="docutils literal"><span class="pre">ProcessorContext#forward()</span></code>),
|
||||
and to commit the current processing progress (via <code class="docutils literal"><span class="pre">ProcessorContext#commit()</span></code>).
|
||||
Any resources you set up in <code class="docutils literal"><span class="pre">init()</span></code> can be cleaned up in the
|
||||
<code class="docutils literal"><span class="pre">close()</span></code> method. Note that Kafka Streams may re-use a single
|
||||
<code class="docutils literal"><span class="pre">Processor</span></code> object by calling
|
||||
<code class="docutils literal"><span class="pre">init()</span></code> on it again after <code class="docutils literal"><span class="pre">close()</span></code>.</p>
|
||||
<p>When records are forwarded via downstream processors they also get a timestamp assigned. There are two different default behaviors:
|
||||
(1) If <code class="docutils literal"><span class="pre">#forward()</span></code> is called within <code class="docutils literal"><span class="pre">#process()</span></code> the output record inherits the input record timestamp.
|
||||
(2) If <code class="docutils literal"><span class="pre">#forward()</span></code> is called within <code class="docutils literal"><span class="pre">punctuate()</span></code>, the output record inherits the current punctuation timestamp (either the current 'stream time' or the system wall-clock time).
|
||||
Note that <code class="docutils literal"><span class="pre">#forward()</span></code> also allows you to change the default behavior by passing a custom timestamp for the output record.</p>
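<p>For illustration, here is a minimal sketch of overriding the default output timestamp from within <code class="docutils literal"><span class="pre">process()</span></code>; it assumes the <code class="docutils literal"><span class="pre">ProcessorContext#forward(key, value, To)</span></code> overload and a <code class="docutils literal"><span class="pre">context</span></code> reference saved in <code class="docutils literal"><span class="pre">init()</span></code> (the one-second shift is illustrative only):</p>
<div class="highlight-java"><div class="highlight"><pre><span></span>import org.apache.kafka.streams.processor.To;

// default: the forwarded record inherits the timestamp of the record being processed
context.forward(key, value);

// override: forward with an explicit, custom timestamp instead
long customTimestamp = context.timestamp() + 1000L; // e.g., shift by one second
context.forward(key, value, To.all().withTimestamp(customTimestamp));
</pre></div>
</div>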
|
||||
<p>Specifically, <code class="docutils literal"><span class="pre">ProcessorContext#schedule()</span></code> accepts a user <code class="docutils literal"><span class="pre">Punctuator</span></code> callback interface, which triggers its <code class="docutils literal"><span class="pre">punctuate()</span></code>
|
||||
API method periodically based on the <code class="docutils literal"><span class="pre">PunctuationType</span></code>. The <code class="docutils literal"><span class="pre">PunctuationType</span></code> determines what notion of time is used
|
||||
for the punctuation scheduling: either <a class="reference internal" href="../core-concepts.html#streams_time"><span class="std std-ref">stream-time</span></a> or wall-clock-time (by default, stream-time
|
||||
is configured to represent event-time via <code class="docutils literal"><span class="pre">TimestampExtractor</span></code>). When stream-time is used, <code class="docutils literal"><span class="pre">punctuate()</span></code> is triggered purely
|
||||
by data because stream-time is determined (and advanced forward) by the timestamps derived from the input data. When there
|
||||
is no new input data arriving, stream-time is not advanced and thus <code class="docutils literal"><span class="pre">punctuate()</span></code> is not called.</p>
|
||||
<p>For example, if you schedule a <code class="docutils literal"><span class="pre">Punctuator</span></code> function every 10 seconds based on <code class="docutils literal"><span class="pre">PunctuationType.STREAM_TIME</span></code> and if you
|
||||
process a stream of 60 records with consecutive timestamps from 1 (first record) to 60 seconds (last record),
|
||||
then <code class="docutils literal"><span class="pre">punctuate()</span></code> would be called 6 times. This happens regardless of the time required to actually process those records. <code class="docutils literal"><span class="pre">punctuate()</span></code>
|
||||
would be called 6 times regardless of whether processing these 60 records takes a second, a minute, or an hour.</p>
|
||||
<p>When wall-clock-time (i.e. <code class="docutils literal"><span class="pre">PunctuationType.WALL_CLOCK_TIME</span></code>) is used, <code class="docutils literal"><span class="pre">punctuate()</span></code> is triggered purely by the wall-clock time.
|
||||
Reusing the example above, if the <code class="docutils literal"><span class="pre">Punctuator</span></code> function is scheduled based on <code class="docutils literal"><span class="pre">PunctuationType.WALL_CLOCK_TIME</span></code>, and if these
|
||||
60 records were processed within 20 seconds, <code class="docutils literal"><span class="pre">punctuate()</span></code> is called 2 times (one time every 10 seconds). If these 60 records
|
||||
were processed within 5 seconds, then no <code class="docutils literal"><span class="pre">punctuate()</span></code> is called at all. Note that you can schedule multiple <code class="docutils literal"><span class="pre">Punctuator</span></code>
|
||||
callbacks with different <code class="docutils literal"><span class="pre">PunctuationType</span></code> types within the same processor by calling <code class="docutils literal"><span class="pre">ProcessorContext#schedule()</span></code> multiple
|
||||
times inside <code class="docutils literal"><span class="pre">init()</span></code> method.</p>
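<p>For illustration, a minimal sketch of an <code class="docutils literal"><span class="pre">init()</span></code> method that registers two punctuators with different <code class="docutils literal"><span class="pre">PunctuationType</span></code> types (the 10-second interval and the log output are illustrative only):</p>
<div class="highlight-java"><div class="highlight"><pre><span></span>@Override
public void init(final ProcessorContext context) {
    // fires based on stream-time, i.e. only as new input data advances the record timestamps
    context.schedule(Duration.ofSeconds(10), PunctuationType.STREAM_TIME,
        timestamp -&gt; System.out.println("stream-time punctuation at " + timestamp));

    // fires based on wall-clock time, independent of whether new data arrives
    context.schedule(Duration.ofSeconds(10), PunctuationType.WALL_CLOCK_TIME,
        timestamp -&gt; System.out.println("wall-clock punctuation at " + timestamp));
}
</pre></div>
</div>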
|
||||
<div class="admonition attention">
|
||||
<p class="first admonition-title"><b>Attention</b></p>
|
||||
<p class="last">Stream-time is only advanced if all input partitions over all input topics have new data (with newer timestamps) available.
|
||||
If at least one partition does not have any new data available, stream-time will not be advanced and thus <code class="docutils literal"><span class="pre">punctuate()</span></code> will not be triggered if <code class="docutils literal"><span class="pre">PunctuationType.STREAM_TIME</span></code> was specified.
|
||||
This behavior is independent of the configured timestamp extractor, i.e., using <code class="docutils literal"><span class="pre">WallclockTimestampExtractor</span></code> does not enable wall-clock triggering of <code class="docutils literal"><span class="pre">punctuate()</span></code>.</p>
|
||||
</div>
|
||||
<p><b>Example</b></p>
|
||||
<p>The following example <code class="docutils literal"><span class="pre">Processor</span></code> defines a simple word-count algorithm and the following actions are performed:</p>
|
||||
<ul class="simple">
|
||||
<li>In the <code class="docutils literal"><span class="pre">init()</span></code> method, schedule the punctuation once every second (via <code class="docutils literal"><span class="pre">Duration.ofSeconds(1)</span></code>) and retrieve the local state store by its name “Counts”.</li>
|
||||
<li>In the <code class="docutils literal"><span class="pre">process()</span></code> method, upon each received record, split the value string into words, and update their counts into the state store (we will talk about this later in this section).</li>
|
||||
<li>In the <code class="docutils literal"><span class="pre">punctuate()</span></code> method, iterate the local state store and send the aggregated counts to the downstream processor (we will talk about downstream processors later in this section), and commit the current stream state.</li>
|
||||
</ul>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kd">public</span> <span class="kd">class</span> <span class="nc">WordCountProcessor</span> <span class="kd">implements</span> <span class="n">Processor</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">></span> <span class="o">{</span>
|
||||
|
||||
<span class="kd">private</span> <span class="n">ProcessorContext</span> <span class="n">context</span><span class="o">;</span>
|
||||
<span class="kd">private</span> <span class="n">KeyValueStore</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">></span> <span class="n">kvStore</span><span class="o">;</span>
|
||||
|
||||
<span class="nd">@Override</span>
|
||||
<span class="nd">@SuppressWarnings</span><span class="o">(</span><span class="s">"unchecked"</span><span class="o">)</span>
|
||||
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">init</span><span class="o">(</span><span class="n">ProcessorContext</span> <span class="n">context</span><span class="o">)</span> <span class="o">{</span>
|
||||
<span class="c1">// keep the processor context locally because we need it in punctuate() and commit()</span>
|
||||
<span class="k">this</span><span class="o">.</span><span class="na">context</span> <span class="o">=</span> <span class="n">context</span><span class="o">;</span>
|
||||
|
||||
<span class="c1">// retrieve the key-value store named "Counts"</span>
|
||||
<span class="n">kvStore</span> <span class="o">=</span> <span class="o">(</span><span class="n">KeyValueStore</span><span class="o">)</span> <span class="n">context</span><span class="o">.</span><span class="na">getStateStore</span><span class="o">(</span><span class="s">"Counts"</span><span class="o">);</span>
|
||||
|
||||
<span class="c1">// schedule a punctuate() method every second based on stream-time</span>
|
||||
<span class="k">this</span><span class="o">.</span><span class="na">context</span><span class="o">.</span><span class="na">schedule</span><span class="o">(</span><span class="na">Duration</span><span class="o">.</span><span class="na">ofSeconds</span><span class="o">(</span><span class="mi">1000</span><span class="o">),</span> <span class="n">PunctuationType</span><span class="o">.</span><span class="na">STREAM_TIME</span><span class="o">,</span> <span class="o">(</span><span class="n">timestamp</span><span class="o">)</span> <span class="o">-></span> <span class="o">{</span>
|
||||
<span class="n">KeyValueIterator</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">></span> <span class="n">iter</span> <span class="o">=</span> <span class="k">this</span><span class="o">.</span><span class="na">kvStore</span><span class="o">.</span><span class="na">all</span><span class="o">();</span>
|
||||
<span class="k">while</span> <span class="o">(</span><span class="n">iter</span><span class="o">.</span><span class="na">hasNext</span><span class="o">())</span> <span class="o">{</span>
|
||||
<span class="n">KeyValue</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">></span> <span class="n">entry</span> <span class="o">=</span> <span class="n">iter</span><span class="o">.</span><span class="na">next</span><span class="o">();</span>
|
||||
<span class="n">context</span><span class="o">.</span><span class="na">forward</span><span class="o">(</span><span class="n">entry</span><span class="o">.</span><span class="na">key</span><span class="o">,</span> <span class="n">entry</span><span class="o">.</span><span class="na">value</span><span class="o">.</span><span class="na">toString</span><span class="o">());</span>
|
||||
<span class="o">}</span>
|
||||
<span class="n">iter</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
|
||||
|
||||
<span class="c1">// commit the current processing progress</span>
|
||||
<span class="n">context</span><span class="o">.</span><span class="na">commit</span><span class="o">();</span>
|
||||
<span class="o">});</span>
|
||||
<span class="o">}</span>
|
||||
|
||||
<span class="nd">@Override</span>
|
||||
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">punctuate</span><span class="o">(</span><span class="kt">long</span> <span class="n">timestamp</span><span class="o">)</span> <span class="o">{</span>
|
||||
<span class="c1">// this method is deprecated and should not be used anymore</span>
|
||||
<span class="o">}</span>
|
||||
|
||||
<span class="nd">@Override</span>
|
||||
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">close</span><span class="o">()</span> <span class="o">{</span>
|
||||
<span class="c1">// close any resources managed by this processor</span>
|
||||
<span class="c1">// Note: Do not close any StateStores as these are managed by the library</span>
|
||||
<span class="o">}</span>
|
||||
|
||||
<span class="o">}</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="admonition note">
|
||||
<p><b>Note</b></p>
|
||||
<p class="last"><strong>Stateful processing with state stores:</strong>
|
||||
The <code class="docutils literal"><span class="pre">WordCountProcessor</span></code> defined above can access the currently received record in its <code class="docutils literal"><span class="pre">process()</span></code> method, and it can
|
||||
leverage <a class="reference internal" href="#streams-developer-guide-state-store"><span class="std std-ref">state stores</span></a> to maintain processing states to, for example, remember recently
|
||||
arrived records for stateful processing needs like aggregations and joins. For more information, see the <a class="reference internal" href="#streams-developer-guide-state-store"><span class="std std-ref">state stores</span></a> documentation.</p>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="unit-testing-processors">
|
||||
<h2>
|
||||
<a class="toc-backref" href="#id9">Unit Testing Processors</a>
|
||||
<a class="headerlink" href="#unit-testing-processors" title="Permalink to this headline"></a>
|
||||
</h2>
|
||||
<p>
|
||||
Kafka Streams comes with a <code>test-utils</code> module to help you write unit tests for your
|
||||
processors. See the <a href="testing.html#unit-testing-processors">testing documentation</a> for details.
|
||||
</p>
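<p>For illustration, a minimal sketch of such a test using <code>MockProcessorContext</code> from <code>kafka-streams-test-utils</code>; it assumes JUnit and the <code>WordCountProcessor</code> defined above (the asserted count matches the word-count logic sketched in its <code>process()</code> method):</p>
<div class="highlight-java"><div class="highlight"><pre><span></span>import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.processor.MockProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;

import org.junit.Assert;
import org.junit.Test;

public class WordCountProcessorTest {

    @Test
    public void shouldCountWords() {
        final MockProcessorContext context = new MockProcessorContext();

        // build the "Counts" store the processor expects and register it with the mock context
        final KeyValueStore&lt;String, Long&gt; store = Stores.keyValueStoreBuilder(
                Stores.inMemoryKeyValueStore("Counts"), Serdes.String(), Serdes.Long())
            .withLoggingDisabled()  // changelogs are not supported by MockProcessorContext
            .build();
        store.init(context, store);
        context.register(store, null);

        final WordCountProcessor processor = new WordCountProcessor();
        processor.init(context);
        processor.process("key", "alpha beta alpha");

        Assert.assertEquals(Long.valueOf(2L), store.get("alpha"));
    }
}
</pre></div>
</div>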
|
||||
</div>
|
||||
<div class="section" id="state-stores">
|
||||
<span id="streams-developer-guide-state-store"></span><h2><a class="toc-backref" href="#id3">State Stores</a><a class="headerlink" href="#state-stores" title="Permalink to this headline"></a></h2>
|
||||
<p>To implement a <strong>stateful</strong> <code class="docutils literal"><span class="pre">Processor</span></code> or <code class="docutils literal"><span class="pre">Transformer</span></code>, you must provide one or more state stores to the processor
|
||||
or transformer (<em>stateless</em> processors or transformers do not need state stores). State stores can be used to remember
|
||||
recently received input records, to track rolling aggregates, to de-duplicate input records, and more.
|
||||
Another feature of state stores is that they can be
|
||||
<a class="reference internal" href="interactive-queries.html#streams-developer-guide-interactive-queries"><span class="std std-ref">interactively queried</span></a> from other applications, such as a
|
||||
NodeJS-based dashboard or a microservice implemented in Scala or Go.</p>
|
||||
<p>The
|
||||
<a class="reference internal" href="#streams-developer-guide-state-store-defining"><span class="std std-ref">available state store types</span></a> in Kafka Streams have
|
||||
<a class="reference internal" href="#streams-developer-guide-state-store-fault-tolerance"><span class="std std-ref">fault tolerance</span></a> enabled by default.</p>
|
||||
<div class="section" id="defining-and-creating-a-state-store">
|
||||
<span id="streams-developer-guide-state-store-defining"></span><h3><a class="toc-backref" href="#id4">Defining and creating a State Store</a><a class="headerlink" href="#defining-and-creating-a-state-store" title="Permalink to this headline"></a></h3>
|
||||
<p>You can either use one of the available store types or
|
||||
<a class="reference internal" href="#streams-developer-guide-state-store-custom"><span class="std std-ref">implement your own custom store type</span></a>.
|
||||
It’s common practice to leverage an existing store type via the <code class="docutils literal"><span class="pre">Stores</span></code> factory.</p>
|
||||
<p>Note that, when using Kafka Streams, you normally don’t create or instantiate state stores directly in your code.
|
||||
Rather, you define state stores indirectly by creating a so-called <code class="docutils literal"><span class="pre">StoreBuilder</span></code>. This builder is used by
|
||||
Kafka Streams as a factory to instantiate the actual state stores locally in application instances when and where
|
||||
needed.</p>
|
||||
<p>The following store types are available out of the box.</p>
|
||||
<table border="1" class="non-scrolling-table width-100-percent docutils">
|
||||
<colgroup>
|
||||
<col width="19%" />
|
||||
<col width="11%" />
|
||||
<col width="18%" />
|
||||
<col width="51%" />
|
||||
</colgroup>
|
||||
<thead valign="bottom">
|
||||
<tr class="row-odd"><th class="head">Store Type</th>
|
||||
<th class="head">Storage Engine</th>
|
||||
<th class="head">Fault-tolerant?</th>
|
||||
<th class="head">Description</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody valign="top">
|
||||
<tr class="row-even"><td>Persistent
|
||||
<code class="docutils literal"><span class="pre">KeyValueStore<K,</span> <span class="pre">V></span></code></td>
|
||||
<td>RocksDB</td>
|
||||
<td>Yes (enabled by default)</td>
|
||||
<td><ul class="first simple">
|
||||
<li><strong>The recommended store type for most use cases.</strong></li>
|
||||
<li>Stores its data on local disk.</li>
|
||||
<li>Storage capacity:
|
||||
managed local state can be larger than the memory (heap space) of an
|
||||
application instance, but must fit into the available local disk
|
||||
space.</li>
|
||||
<li>RocksDB settings can be fine-tuned, see
|
||||
<a class="reference internal" href="config-streams.html#streams-developer-guide-rocksdb-config"><span class="std std-ref">RocksDB configuration</span></a>.</li>
|
||||
<li>Available <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/state/Stores.html#persistentKeyValueStore-java.lang.String-">store variants</a>:
|
||||
time window key-value store, session window key-value store.</li>
|
||||
</ul>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Creating a persistent key-value store:</span>
|
||||
<span class="c1">// here, we create a `KeyValueStore<String, Long>` named "persistent-counts".</span>
|
||||
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.StoreBuilder</span><span class="o">;</span>
|
||||
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.Stores</span><span class="o">;</span>
|
||||
|
||||
<span class="c1">// Using a `KeyValueStoreBuilder` to build a `KeyValueStore`.</span>
|
||||
<span class="n">StoreBuilder</span><span class="o"><</span><span class="n">KeyValueStore</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">>></span> <span class="n">countStoreSupplier</span> <span class="o">=</span>
|
||||
<span class="n">Stores</span><span class="o">.</span><span class="na">keyValueStoreBuilder</span><span class="o">(</span>
|
||||
<span class="n">Stores</span><span class="o">.</span><span class="na">persistentKeyValueStore</span><span class="o">(</span><span class="s">"persistent-counts"</span><span class="o">),</span>
|
||||
<span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">(),</span>
|
||||
<span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">());</span>
|
||||
<span class="n">KeyValueStore</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">></span> <span class="n">countStore</span> <span class="o">=</span> <span class="n">countStoreSupplier</span><span class="o">.</span><span class="na">build</span><span class="o">();</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td>In-memory
|
||||
<code class="docutils literal"><span class="pre">KeyValueStore<K,</span> <span class="pre">V></span></code></td>
|
||||
<td>-</td>
|
||||
<td>Yes (enabled by default)</td>
|
||||
<td><ul class="first simple">
|
||||
<li>Stores its data in memory.</li>
|
||||
<li>Storage capacity:
|
||||
managed local state must fit into memory (heap space) of an
|
||||
application instance.</li>
|
||||
<li>Useful when application instances run in an environment where local
|
||||
disk space is either not available or local disk space is wiped
|
||||
in-between app instance restarts.</li>
|
||||
<li>Available <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/state/Stores.html#inMemoryKeyValueStore-java.lang.String-">store variants</a>:
|
||||
time window key-value store, session window key-value store.</li>
|
||||
</ul>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Creating an in-memory key-value store:</span>
|
||||
<span class="c1">// here, we create a `KeyValueStore<String, Long>` named "inmemory-counts".</span>
|
||||
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.StoreBuilder</span><span class="o">;</span>
|
||||
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.Stores</span><span class="o">;</span>
|
||||
|
||||
<span class="c1">// Using a `KeyValueStoreBuilder` to build a `KeyValueStore`.</span>
|
||||
<span class="n">StoreBuilder</span><span class="o"><</span><span class="n">KeyValueStore</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">>></span> <span class="n">countStoreSupplier</span> <span class="o">=</span>
|
||||
<span class="n">Stores</span><span class="o">.</span><span class="na">keyValueStoreBuilder</span><span class="o">(</span>
|
||||
<span class="n">Stores</span><span class="o">.</span><span class="na">inMemoryKeyValueStore</span><span class="o">(</span><span class="s">"inmemory-counts"</span><span class="o">),</span>
|
||||
<span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">(),</span>
|
||||
<span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">());</span>
|
||||
<span class="n">KeyValueStore</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">></span> <span class="n">countStore</span> <span class="o">=</span> <span class="n">countStoreSupplier</span><span class="o">.</span><span class="na">build</span><span class="o">();</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
<div class="section" id="fault-tolerant-state-stores">
|
||||
<span id="streams-developer-guide-state-store-fault-tolerance"></span><h3><a class="toc-backref" href="#id5">Fault-tolerant State Stores</a><a class="headerlink" href="#fault-tolerant-state-stores" title="Permalink to this headline"></a></h3>
|
||||
<p>To make state stores fault-tolerant and to allow for state store migration without data loss, a state store can be
|
||||
continuously backed up to a Kafka topic behind the scenes. For example, to migrate a stateful stream task from one
|
||||
machine to another when <a class="reference internal" href="running-app.html#streams-developer-guide-execution-scaling"><span class="std std-ref">elastically adding or removing capacity from your application</span></a>.
|
||||
This topic is sometimes referred to as the state store’s associated <em>changelog topic</em>, or its <em>changelog</em>. For example, if
|
||||
you experience machine failure, the state store and the application’s state can be fully restored from its changelog. You can
|
||||
<a class="reference internal" href="#streams-developer-guide-state-store-enable-disable-fault-tolerance"><span class="std std-ref">enable or disable this backup feature</span></a> for a
|
||||
state store.</p>
|
||||
<p>Fault-tolerant state stores are backed by a
|
||||
<a class="reference external" href="https://kafka.apache.org/documentation.html#compaction">compacted</a> changelog topic. The purpose of compacting this
|
||||
topic is to prevent the topic from growing indefinitely, to reduce the storage consumed in the associated Kafka cluster,
|
||||
and to minimize recovery time if a state store needs to be restored from its changelog topic.</p>
|
||||
<p>Fault-tolerant windowed state stores are backed by a topic that uses both compaction and
|
||||
deletion. Because of the structure of the message keys that are being sent to the changelog topics, this combination of
|
||||
deletion and compaction is required for the changelog topics of window stores. For window stores, the message keys are
|
||||
composite keys that include the “normal” key and window timestamps. For these types of composite keys it would not
|
||||
be sufficient to only enable compaction to prevent a changelog topic from growing out of bounds. With deletion
|
||||
enabled, old windows that have expired will be cleaned up by Kafka’s log cleaner as the log segments expire. The
|
||||
default retention setting is <code class="docutils literal"><span class="pre">Windows#maintainMs()</span></code> + 1 day. You can override this setting by specifying
|
||||
<code class="docutils literal"><span class="pre">StreamsConfig.WINDOW_STORE_CHANGE_LOG_ADDITIONAL_RETENTION_MS_CONFIG</span></code> in the <code class="docutils literal"><span class="pre">StreamsConfig</span></code>.</p>
|
||||
<p>When you open an <code class="docutils literal"><span class="pre">Iterator</span></code> from a state store you must call <code class="docutils literal"><span class="pre">close()</span></code> on the iterator when you are done working with
|
||||
it to reclaim resources; or you can use the iterator from within a try-with-resources statement. If you do not close an iterator,
|
||||
you may encounter an OOM error.</p>
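<p>For illustration, a minimal sketch of the try-with-resources variant, assuming a <code class="docutils literal"><span class="pre">KeyValueStore&lt;String, Long&gt;</span></code> named <code class="docutils literal"><span class="pre">kvStore</span></code> as in the word-count example above:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span>// the iterator is closed automatically when the try block exits, even on exceptions
try (KeyValueIterator&lt;String, Long&gt; iter = kvStore.all()) {
    while (iter.hasNext()) {
        KeyValue&lt;String, Long&gt; entry = iter.next();
        System.out.println(entry.key + " -&gt; " + entry.value);
    }
}
</pre></div>
</div>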
|
||||
</div>
|
||||
<div class="section" id="enable-or-disable-fault-tolerance-of-state-stores-store-changelogs">
|
||||
<span id="streams-developer-guide-state-store-enable-disable-fault-tolerance"></span><h3><a class="toc-backref" href="#id6">Enable or Disable Fault Tolerance of State Stores (Store Changelogs)</a><a class="headerlink" href="#enable-or-disable-fault-tolerance-of-state-stores-store-changelogs" title="Permalink to this headline"></a></h3>
|
||||
<p>You can enable or disable fault tolerance for a state store by enabling or disabling the change logging
|
||||
of the store through <code class="docutils literal"><span class="pre">enableLogging()</span></code> and <code class="docutils literal"><span class="pre">disableLogging()</span></code>.
|
||||
You can also fine-tune the associated topic’s configuration if needed.</p>
|
||||
<p>Example for disabling fault-tolerance:</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.StoreBuilder</span><span class="o">;</span>
|
||||
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.Stores</span><span class="o">;</span>
|
||||
|
||||
<span class="n">StoreBuilder</span><span class="o"><</span><span class="n">KeyValueStore</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">>></span> <span class="n">countStoreSupplier</span> <span class="o">=</span> <span class="n">Stores</span><span class="o">.</span><span class="na">keyValueStoreBuilder</span><span class="o">(</span>
|
||||
<span class="n">Stores</span><span class="o">.</span><span class="na">persistentKeyValueStore</span><span class="o">(</span><span class="s">"Counts"</span><span class="o">),</span>
|
||||
<span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">(),</span>
|
||||
<span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">())</span>
|
||||
<span class="o">.</span><span class="na">withLoggingDisabled</span><span class="o">();</span> <span class="c1">// disable backing up the store to a changelog topic</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="admonition attention">
|
||||
<p class="first admonition-title">Attention</p>
|
||||
<p class="last">If the changelog is disabled then the attached state store is no longer fault tolerant and it can’t have any <a class="reference internal" href="config-streams.html#streams-developer-guide-standby-replicas"><span class="std std-ref">standby replicas</span></a>.</p>
|
||||
</div>
|
||||
<p>Here is an example for enabling fault tolerance, with additional changelog-topic configuration:
|
||||
You can add any log config from <a class="reference external" href="https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LogConfig.scala">kafka.log.LogConfig</a>.
|
||||
Unrecognized configs will be ignored.</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.StoreBuilder</span><span class="o">;</span>
|
||||
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.Stores</span><span class="o">;</span>
|
||||
|
||||
<span class="n">Map</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">></span> <span class="n">changelogConfig</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashMap</span><span class="o">();</span>
|
||||
<span class="c1">// override min.insync.replicas</span>
|
||||
<span class="n">changelogConfig</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">TopicConfig</span><span class="o">.</span><span class="na">MIN_IN_SYNC_REPLICAS_CONFIG</span><span class="o">,</span> <span class="s">"1"</span><span class="o">)</span>
|
||||
|
||||
<span class="n">StoreBuilder</span><span class="o"><</span><span class="n">KeyValueStore</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">>></span> <span class="n">countStoreSupplier</span> <span class="o">=</span> <span class="n">Stores</span><span class="o">.</span><span class="na">keyValueStoreBuilder</span><span class="o">(</span>
|
||||
<span class="n">Stores</span><span class="o">.</span><span class="na">persistentKeyValueStore</span><span class="o">(</span><span class="s">"Counts"</span><span class="o">),</span>
|
||||
<span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">(),</span>
|
||||
<span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">())</span>
|
||||
<span class="o">.</span><span class="na">withLoggingEnabled</span><span class="o">(</span><span class="n">changlogConfig</span><span class="o">);</span> <span class="c1">// enable changelogging, with custom changelog settings</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="implementing-custom-state-stores">
|
||||
<span id="streams-developer-guide-state-store-custom"></span><h3><a class="toc-backref" href="#id7">Implementing Custom State Stores</a><a class="headerlink" href="#implementing-custom-state-stores" title="Permalink to this headline"></a></h3>
|
||||
<p>You can use the <a class="reference internal" href="#streams-developer-guide-state-store-defining"><span class="std std-ref">built-in state store types</span></a> or implement your own.
|
||||
The primary interface to implement for the store is
|
||||
<code class="docutils literal"><span class="pre">org.apache.kafka.streams.processor.StateStore</span></code>. Kafka Streams also has a few extended interfaces such
|
||||
as <code class="docutils literal"><span class="pre">KeyValueStore</span></code>.</p>
|
||||
<p>Note that your customized <code class="docutils literal"><span class="pre">org.apache.kafka.streams.processor.StateStore</span></code> implementation also needs to provide the logic on how to restore the state
|
||||
via the <code class="docutils literal"><span class="pre">org.apache.kafka.streams.processor.StateRestoreCallback</span></code> or <code class="docutils literal"><span class="pre">org.apache.kafka.streams.processor.BatchingStateRestoreCallback</span></code> interface.
|
||||
Details on how to instantiate these interfaces can be found in the <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/processor/StateStore.html">javadocs</a>.</p>
|
||||
<p>You also need to provide a “builder” for the store by implementing the
|
||||
<code class="docutils literal"><span class="pre">org.apache.kafka.streams.state.StoreBuilder</span></code> interface, which Kafka Streams uses to create instances of
|
||||
your store.</p>
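<p>For orientation, here is a minimal, non-production sketch of the shape of such a custom store; the class name and the mostly empty method bodies are illustrative only:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span>import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.StateStore;

public class MyCustomStore implements StateStore {

    private final String name;
    private volatile boolean open = false;

    public MyCustomStore(final String name) {
        this.name = name;
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public void init(final ProcessorContext context, final StateStore root) {
        // register the store and provide the restore logic for its changelog records
        context.register(root, (key, value) -&gt; {
            // apply one changelog record (raw key/value bytes) to the store
        });
        open = true;
    }

    @Override
    public void flush() {
        // write any buffered data to the underlying storage
    }

    @Override
    public void close() {
        open = false;
    }

    @Override
    public boolean persistent() {
        return false; // return true if the store keeps its data on local disk across restarts
    }

    @Override
    public boolean isOpen() {
        return open;
    }
}
</pre></div>
</div>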
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="accessing-processor-context">
|
||||
<h2><a class="toc-backref" href="#id10">Accessing Processor Context</a><a class="headerlink" href="#accessing-processor-context" title="Permalink to this headline"></a></h2>
|
||||
<p>As mentioned in the <a href="#defining-a-stream-processor">Defining a Stream Processor</a> section, a <code>ProcessorContext</code> controls the processing workflow, such as scheduling a punctuation function and committing the current processing progress.</p>
|
||||
<p>This object can also be used to access application-related metadata such as <code class="docutils literal"><span class="pre">applicationId</span></code>, <code class="docutils literal"><span class="pre">taskId</span></code>, and <code class="docutils literal"><span class="pre">stateDir</span></code>, as well as record-related metadata such as <code class="docutils literal"><span class="pre">topic</span></code>, <code class="docutils literal"><span class="pre">partition</span></code>, <code class="docutils literal"><span class="pre">offset</span></code>, <code class="docutils literal"><span class="pre">timestamp</span></code>, and <code class="docutils literal"><span class="pre">headers</span></code>.</p>
|
||||
<p>Here is an example implementation of how to add a new header to the record:</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">public void process(String key, String value) {</span>
|
||||
|
||||
<span class="c1">// add a header to the elements</span>
|
||||
<span class="n">context()</span><span class="o">.</span><span class="na">headers</span><span class="o">()</span><span class="o">.</span><span class="na">add</span><span class="o">.</span><span class="o">(</span><span class="s">"key"</span><span class="o">,</span> <span class="s">"key"</span>
|
||||
<span class="o">}</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="section" id="connecting-processors-and-state-stores">
|
||||
<h2><a class="toc-backref" href="#id8">Connecting Processors and State Stores</a><a class="headerlink" href="#connecting-processors-and-state-stores" title="Permalink to this headline"></a></h2>
|
||||
<p>Now that a <a class="reference internal" href="#streams-developer-guide-stream-processor"><span class="std std-ref">processor</span></a> (WordCountProcessor) and the
|
||||
state stores have been defined, you can construct the processor topology by connecting these processors and state stores together by
|
||||
using the <code class="docutils literal"><span class="pre">Topology</span></code> instance. In addition, you can add source processors with the specified Kafka topics
|
||||
to generate input data streams into the topology, and sink processors with the specified Kafka topics to generate
|
||||
output data streams out of the topology.</p>
|
||||
<p>Here is an example implementation:</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Topology</span> <span class="n">builder</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Topology</span><span class="o">();</span>
|
||||
|
||||
<span class="c1">// add the source processor node that takes Kafka topic "source-topic" as input</span>
|
||||
<span class="n">builder</span><span class="o">.</span><span class="na">addSource</span><span class="o">(</span><span class="s">"Source"</span><span class="o">,</span> <span class="s">"source-topic"</span><span class="o">)</span>
|
||||
|
||||
<span class="c1">// add the WordCountProcessor node which takes the source processor as its upstream processor</span>
|
||||
<span class="o">.</span><span class="na">addProcessor</span><span class="o">(</span><span class="s">"Process"</span><span class="o">,</span> <span class="o">()</span> <span class="o">-></span> <span class="k">new</span> <span class="n">WordCountProcessor</span><span class="o">(),</span> <span class="s">"Source"</span><span class="o">)</span>
|
||||
|
||||
<span class="c1">// add the count store associated with the WordCountProcessor processor</span>
|
||||
<span class="o">.</span><span class="na">addStateStore</span><span class="o">(</span><span class="n">countStoreBuilder</span><span class="o">,</span> <span class="s">"Process"</span><span class="o">)</span>
|
||||
|
||||
<span class="c1">// add the sink processor node that takes Kafka topic "sink-topic" as output</span>
|
||||
<span class="c1">// and the WordCountProcessor node as its upstream processor</span>
|
||||
<span class="o">.</span><span class="na">addSink</span><span class="o">(</span><span class="s">"Sink"</span><span class="o">,</span> <span class="s">"sink-topic"</span><span class="o">,</span> <span class="s">"Process"</span><span class="o">);</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>Here is a quick explanation of this example:</p>
|
||||
<ul class="simple">
|
||||
<li>A source processor node named <code class="docutils literal"><span class="pre">"Source"</span></code> is added to the topology using the <code class="docutils literal"><span class="pre">addSource</span></code> method, with one Kafka topic
|
||||
<code class="docutils literal"><span class="pre">"source-topic"</span></code> fed to it.</li>
|
||||
<li>A processor node named <code class="docutils literal"><span class="pre">"Process"</span></code> with the pre-defined <code class="docutils literal"><span class="pre">WordCountProcessor</span></code> logic is then added as the downstream
|
||||
processor of the <code class="docutils literal"><span class="pre">"Source"</span></code> node using the <code class="docutils literal"><span class="pre">addProcessor</span></code> method.</li>
|
||||
<li>A predefined persistent key-value state store is created and associated with the <code class="docutils literal"><span class="pre">"Process"</span></code> node, using
|
||||
<code class="docutils literal"><span class="pre">countStoreBuilder</span></code>.</li>
|
||||
<li>A sink processor node is then added to complete the topology using the <code class="docutils literal"><span class="pre">addSink</span></code> method, taking the <code class="docutils literal"><span class="pre">"Process"</span></code> node
|
||||
as its upstream processor and writing to a separate <code class="docutils literal"><span class="pre">"sink-topic"</span></code> Kafka topic (note that users can also use another overloaded variant of <code class="docutils literal"><span class="pre">addSink</span></code>
|
||||
to dynamically determine the Kafka topic to write to for each received record from the upstream processor).</li>
|
||||
</ul>
|
||||
<p>In this topology, the <code class="docutils literal"><span class="pre">"Process"</span></code> stream processor node is considered a downstream processor of the <code class="docutils literal"><span class="pre">"Source"</span></code> node, and an
|
||||
upstream processor of the <code class="docutils literal"><span class="pre">"Sink"</span></code> node. As a result, whenever the <code class="docutils literal"><span class="pre">"Source"</span></code> node forwards a newly fetched record from
|
||||
Kafka to its downstream <code class="docutils literal"><span class="pre">"Process"</span></code> node, the <code class="docutils literal"><span class="pre">WordCountProcessor#process()</span></code> method is triggered to process the record and
|
||||
update the associated state store. Whenever <code class="docutils literal"><span class="pre">context#forward()</span></code> is called in the
|
||||
<code class="docutils literal"><span class="pre">WordCountProcessor#punctuate()</span></code> method, the aggregate key-value pair will be sent via the <code class="docutils literal"><span class="pre">"Sink"</span></code> processor node to
|
||||
the Kafka topic <code class="docutils literal"><span class="pre">"sink-topic"</span></code>. Note that in the <code class="docutils literal"><span class="pre">WordCountProcessor</span></code> implementation, you must refer to the
|
||||
same store name <code class="docutils literal"><span class="pre">"Counts"</span></code> when accessing the key-value store, otherwise an exception will be thrown at runtime,
|
||||
indicating that the state store cannot be found. If the state store is not associated with the processor
|
||||
in the <code class="docutils literal"><span class="pre">Topology</span></code> code, accessing it in the processor’s <code class="docutils literal"><span class="pre">init()</span></code> method will also throw an exception at
|
||||
runtime, indicating the state store is not accessible from this processor.</p>
|
||||
<p>Now that you have fully defined your processor topology in your application, you can proceed to
|
||||
<a class="reference internal" href="running-app.html#streams-developer-guide-execution"><span class="std std-ref">running the Kafka Streams application</span></a>.</p>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<div class="pagination">
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/dsl-api" class="pagination__btn pagination__btn__prev">Previous</a>
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/datatypes" class="pagination__btn pagination__btn__next">Next</a>
|
||||
</div>
|
||||
</script>
|
||||
|
||||
<!--#include virtual="../../../includes/_header.htm" -->
|
||||
<!--#include virtual="../../../includes/_top.htm" -->
|
||||
<div class="content documentation documentation--current">
|
||||
<!--#include virtual="../../../includes/_nav.htm" -->
|
||||
<div class="right">
|
||||
<!--#include virtual="../../../includes/_docs_banner.htm" -->
|
||||
<ul class="breadcrumbs">
|
||||
<li><a href="/documentation">Documentation</a></li>
|
||||
<li><a href="/documentation/streams">Kafka Streams</a></li>
|
||||
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
|
||||
</ul>
|
||||
<div class="p-content"></div>
|
||||
</div>
|
||||
</div>
|
||||
<!--#include virtual="../../../includes/_footer.htm" -->
|
||||
<script>
|
||||
$(function() {
|
||||
// Show selected style on nav item
|
||||
$('.b-nav__streams').addClass('selected');
|
||||
|
||||
//sticky secondary nav
|
||||
var $navbar = $(".sub-nav-sticky"),
|
||||
y_pos = $navbar.offset().top,
|
||||
height = $navbar.height();
|
||||
|
||||
$(window).scroll(function() {
|
||||
var scrollTop = $(window).scrollTop();
|
||||
|
||||
if (scrollTop > y_pos - height) {
|
||||
$navbar.addClass("navbar-fixed")
|
||||
} else if (scrollTop <= y_pos) {
|
||||
$navbar.removeClass("navbar-fixed")
|
||||
}
|
||||
});
|
||||
|
||||
// Display docs subnav items
|
||||
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
|
||||
});
|
||||
</script>
|
||||
195
docs/streams/developer-guide/running-app.html
Normal file
195
docs/streams/developer-guide/running-app.html
Normal file
@@ -0,0 +1,195 @@
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
contributor license agreements. See the NOTICE file distributed with
|
||||
this work for additional information regarding copyright ownership.
|
||||
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
(the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
<script><!--#include virtual="../../js/templateData.js" --></script>
|
||||
|
||||
<script id="content-template" type="text/x-handlebars-template">
|
||||
<!-- h1>Developer Guide for Kafka Streams</h1 -->
|
||||
<div class="sub-nav-sticky">
|
||||
<div class="sticky-top">
|
||||
<!-- div style="height:35px">
|
||||
<a href="/{{version}}/documentation/streams/">Introduction</a>
|
||||
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
|
||||
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
|
||||
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
|
||||
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
|
||||
</div -->
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="section" id="running-streams-applications">
|
||||
<span id="streams-developer-guide-execution"></span><h1>Running Streams Applications<a class="headerlink" href="#running-streams-applications" title="Permalink to this headline"></a></h1>
|
||||
<p>You can run Java applications that use the Kafka Streams library without any additional configuration or requirements.</p>
|
||||
<div class="contents local topic" id="table-of-contents">
|
||||
<p class="topic-title first"><b>Table of Contents</b></p>
|
||||
<ul class="simple">
|
||||
<li><a class="reference internal" href="#starting-a-kafka-streams-application" id="id3">Starting a Kafka Streams application</a></li>
|
||||
<li><a class="reference internal" href="#elastic-scaling-of-your-application" id="id4">Elastic scaling of your application</a><ul>
|
||||
<li><a class="reference internal" href="#adding-capacity-to-your-application" id="id5">Adding capacity to your application</a></li>
|
||||
<li><a class="reference internal" href="#removing-capacity-from-your-application" id="id6">Removing capacity from your application</a></li>
|
||||
<li><a class="reference internal" href="#state-restoration-during-workload-rebalance" id="id7">State restoration during workload rebalance</a></li>
|
||||
<li><a class="reference internal" href="#determining-how-many-application-instances-to-run" id="id8">Determining how many application instances to run</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="section" id="running-streams-applications">
|
||||
<span id="streams-developer-guide-execution"></span><h1>Running Streams Applications<a class="headerlink" href="#running-streams-applications" title="Permalink to this headline"></a></h1>
|
||||
<p>You can run Java applications that use the Kafka Streams library without any additional configuration or requirements. Kafka Streams
|
||||
also provides the ability to receive notification of the various states of the application. The ability to monitor the runtime
|
||||
status is discussed in <a class="reference internal" href="../monitoring.html#streams-monitoring"><span class="std std-ref">the monitoring guide</span></a>.</p>
|
||||
<div class="contents local topic" id="table-of-contents">
|
||||
<p class="topic-title first"><b>Table of Contents</b></p>
|
||||
<ul class="simple">
|
||||
<li><a class="reference internal" href="#starting-a-kafka-streams-application" id="id3">Starting a Kafka Streams application</a></li>
|
||||
<li><a class="reference internal" href="#elastic-scaling-of-your-application" id="id4">Elastic scaling of your application</a><ul>
|
||||
<li><a class="reference internal" href="#adding-capacity-to-your-application" id="id5">Adding capacity to your application</a></li>
|
||||
<li><a class="reference internal" href="#removing-capacity-from-your-application" id="id6">Removing capacity from your application</a></li>
|
||||
<li><a class="reference internal" href="#state-restoration-during-workload-rebalance" id="id7">State restoration during workload rebalance</a></li>
|
||||
<li><a class="reference internal" href="#determining-how-many-application-instances-to-run" id="id8">Determining how many application instances to run</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="section" id="starting-a-kafka-streams-application">
|
||||
<span id="streams-developer-guide-execution-starting"></span><h2><a class="toc-backref" href="#id3">Starting a Kafka Streams application</a><a class="headerlink" href="#starting-a-kafka-streams-application" title="Permalink to this headline"></a></h2>
|
||||
<p>You can package your Java application as a fat JAR file and then start the application like this:</p>
|
||||
<div class="highlight-bash"><div class="highlight"><pre><span></span><span class="c1"># Start the application in class `com.example.MyStreamsApp`</span>
|
||||
<span class="c1"># from the fat JAR named `path-to-app-fatjar.jar`.</span>
|
||||
$ java -cp path-to-app-fatjar.jar com.example.MyStreamsApp
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>When you start your application you are launching a Kafka Streams instance of your application. You can run multiple
|
||||
instances of your application. A common scenario is that there are multiple instances of your application running in
|
||||
parallel. For more information, see <a class="reference internal" href="../architecture.html#streams_architecture_tasks"><span class="std std-ref">Parallelism Model</span></a>.</p>
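<p>For illustration, a minimal sketch of what an application instance typically does at startup; the application id, bootstrap servers, and the <code>buildTopology()</code> helper are placeholders:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span>import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");    // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

Topology topology = buildTopology(); // assumed helper returning your processor topology

KafkaStreams streams = new KafkaStreams(topology, props);
streams.start();

// close the instance cleanly when the JVM shuts down
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
</pre></div>
</div>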
|
||||
<p>When the application instance starts running, the defined processor topology will be initialized as one or more stream tasks.
|
||||
If the processor topology defines any state stores, these are also constructed during the initialization period. For
|
||||
more information, see the <a class="reference internal" href="#streams-developer-guide-execution-scaling-state-restoration"><span class="std std-ref">State restoration during workload rebalance</span></a> section.</p>
|
||||
</div>
|
||||
<div class="section" id="elastic-scaling-of-your-application">
|
||||
<span id="streams-developer-guide-execution-scaling"></span><h2><a class="toc-backref" href="#id4">Elastic scaling of your application</a><a class="headerlink" href="#elastic-scaling-of-your-application" title="Permalink to this headline"></a></h2>
|
||||
<p>Kafka Streams makes your stream processing applications elastic and scalable. You can add and remove processing capacity
|
||||
dynamically during application runtime without any downtime or data loss. This makes your applications
|
||||
resilient in the face of failures and allows you to perform maintenance as needed (e.g., rolling upgrades).</p>
|
||||
<p>For more information about this elasticity, see the <a class="reference internal" href="../architecture.html#streams_architecture_tasks"><span class="std std-ref">Parallelism Model</span></a> section. Kafka Streams
|
||||
leverages the Kafka group management functionality, which is built right into the <a class="reference external" href="https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol">Kafka wire protocol</a>. It is the foundation that enables the
|
||||
elasticity of Kafka Streams applications: members of a group coordinate and collaborate jointly on the consumption and
|
||||
processing of data in Kafka. Additionally, Kafka Streams provides stateful processing and allows for fault-tolerant
|
||||
state in environments where application instances may come and go at any time.</p>
|
||||
<div class="section" id="adding-capacity-to-your-application">
|
||||
<h3><a class="toc-backref" href="#id5">Adding capacity to your application</a><a class="headerlink" href="#adding-capacity-to-your-application" title="Permalink to this headline"></a></h3>
|
||||
<p>If you need more processing capacity for your stream processing application, you can simply start another instance of your stream processing application, e.g. on another machine, in order to scale out. The instances of your application will become aware of each other and automatically begin to share the processing work. More specifically, what will be handed over from the existing instances to the new instances is (some of) the stream tasks that have been run by the existing instances. Moving stream tasks from one instance to another results in moving the processing work plus any internal state of these stream tasks (the state of a stream task will be re-created in the target instance by restoring the state from its corresponding changelog topic).</p>
|
||||
<p>The various instances of your application each run in their own JVM process, which means that each instance can leverage all the processing capacity that is available to their respective JVM process (minus the capacity that any non-Kafka-Streams part of your application may be using). This explains why running additional instances will grant your application additional processing capacity. The exact capacity you will be adding by running a new instance depends of course on the environment in which the new instance runs: available CPU cores, available main memory and Java heap space, local storage, network bandwidth, and so on. Similarly, if you stop any of the running instances of your application, then you are removing and freeing up the respective processing capacity.</p>
|
||||
<div class="figure align-center" id="id1">
|
||||
<img class="centered" src="/{{version}}/images/streams-elastic-scaling-1.png">
|
||||
<p class="caption"><span class="caption-text">Before adding capacity: only a single instance of your Kafka Streams application is running. At this point the corresponding Kafka consumer group of your application contains only a single member (this instance). All data is being read and processed by this single instance.</span></p>
|
||||
</div>
|
||||
<div class="figure align-center" id="id2">
|
||||
<img class="centered" src="/{{version}}/images/streams-elastic-scaling-2.png">
|
||||
<p class="caption"><span class="caption-text">After adding capacity: now two additional instances of your Kafka Streams application are running, and they have automatically joined the application’s Kafka consumer group for a total of three current members. These three instances are automatically splitting the processing work between each other. The splitting is based on the Kafka topic partitions from which data is being read.</span></p>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="removing-capacity-from-your-application">
|
||||
<h3><a class="toc-backref" href="#id6">Removing capacity from your application</a><a class="headerlink" href="#removing-capacity-from-your-application" title="Permalink to this headline"></a></h3>
|
||||
<p>To remove processing capacity, you can stop running stream processing application instances (e.g., shut down two of
|
||||
the four instances). The stopped instances will automatically leave the application’s consumer group, and the remaining instances of
|
||||
your application will automatically take over the processing work. The remaining instances take over the stream tasks that
|
||||
were run by the stopped instances. Moving stream tasks from one instance to another results in moving the processing
|
||||
work plus any internal state of these stream tasks. The state of a stream task is recreated in the target instance
|
||||
from its changelog topic.</p>
|
||||
<div class="figure align-center">
|
||||
<img class="centered" src="/{{version}}/images/streams-elastic-scaling-3.png">
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="state-restoration-during-workload-rebalance">
|
||||
<span id="streams-developer-guide-execution-scaling-state-restoration"></span><h3><a class="toc-backref" href="#id7">State restoration during workload rebalance</a><a class="headerlink" href="#state-restoration-during-workload-rebalance" title="Permalink to this headline"></a></h3>
|
||||
<p>When a task is migrated, the task processing state is fully restored before the application instance resumes
|
||||
processing. This guarantees the correct processing results. In Kafka Streams, state restoration is usually done by
|
||||
replaying the corresponding changelog topic to reconstruct the state store. To minimize changelog-based restoration
|
||||
latency, you can use replicated local state stores (standby replicas) by specifying <code class="docutils literal"><span class="pre">num.standby.replicas</span></code>. When a stream task is
|
||||
initialized or re-initialized on the application instance, its state store is restored like this:</p>
|
||||
<ul class="simple">
|
||||
<li>If no local state store exists, the changelog is replayed from the earliest to the current offset. This reconstructs the local state store to the most recent snapshot.</li>
|
||||
<li>If a local state store exists, the changelog is replayed from the previously checkpointed offset. The changes are applied and the state is restored to the most recent snapshot. This method takes less time because it is applying a smaller portion of the changelog.</li>
|
||||
</ul>
|
||||
<p>For more information, see <a class="reference internal" href="config-streams.html#num-standby-replicas"><span class="std std-ref">Standby Replicas</span></a>.</p>
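<p>As a minimal sketch (the value of one standby replica is an assumption for illustration, not a recommendation of this guide), standby replicas are enabled with a single Streams configuration parameter:</p>
<pre>
// Keep one warm copy of each task's state on another instance, so that after a rebalance
// only the tail of the changelog has to be replayed instead of the full history.
Properties props = new Properties();
props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
</pre>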
|
||||
</div>
|
||||
<div class="section" id="determining-how-many-application-instances-to-run">
|
||||
<h3><a class="toc-backref" href="#id8">Determining how many application instances to run</a><a class="headerlink" href="#determining-how-many-application-instances-to-run" title="Permalink to this headline"></a></h3>
|
||||
<p>The parallelism of a Kafka Streams application is primarily determined by how many partitions the input topics have. For
example, if your application reads from a single topic that has ten partitions, then you can run up to ten instances
of your application. You can start more instances, but these will be idle.</p>
|
||||
<p>The number of topic partitions is the upper limit for the parallelism of your Kafka Streams application and for the
|
||||
number of running instances of your application.</p>
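<p>If you want to look up this upper limit programmatically, one possible sketch (topic name and broker address are placeholders) is to ask the admin client for the partition count of the input topic:</p>
<pre>
// Sketch: the partition count of "input-topic" bounds the number of useful instances.
// Call this from a method that declares or handles the checked exceptions of Future.get().
Properties adminProps = new Properties();
adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka.example.com:9092");
try (Admin admin = Admin.create(adminProps)) {
    TopicDescription description = admin.describeTopics(Collections.singletonList("input-topic"))
                                        .allTopicNames().get()   // .all() on older client versions
                                        .get("input-topic");
    System.out.println("Maximum useful instances: " + description.partitions().size());
}
</pre>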
|
||||
<p>To achieve balanced workload processing across application instances and to prevent processing hotspots, you should
distribute data and processing workloads evenly:</p>
|
||||
<ul class="simple">
|
||||
<li>Data should be equally distributed across topic partitions. For example, two topic partitions with 1 million messages each are better than one partition holding 2 million messages while the other holds none.</li>
|
||||
<li>Processing workload should be equally distributed across topic partitions. For example, if the time to process messages varies widely, then it is better to spread the processing-intensive messages across partitions rather than storing these messages within the same partition.</li>
|
||||
</ul>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<div class="pagination">
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/memory-mgmt" class="pagination__btn pagination__btn__prev">Previous</a>
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/manage-topics" class="pagination__btn pagination__btn__next">Next</a>
|
||||
</div>
|
||||
</script>
|
||||
|
||||
<!--#include virtual="../../../includes/_header.htm" -->
|
||||
<!--#include virtual="../../../includes/_top.htm" -->
|
||||
<div class="content documentation documentation--current">
|
||||
<!--#include virtual="../../../includes/_nav.htm" -->
|
||||
<div class="right">
|
||||
<!--#include virtual="../../../includes/_docs_banner.htm" -->
|
||||
<ul class="breadcrumbs">
|
||||
<li><a href="/documentation">Documentation</a></li>
|
||||
<li><a href="/documentation/streams">Kafka Streams</a></li>
|
||||
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
|
||||
</ul>
|
||||
<div class="p-content"></div>
|
||||
</div>
|
||||
</div>
|
||||
<!--#include virtual="../../../includes/_footer.htm" -->
|
||||
<script>
|
||||
$(function() {
|
||||
// Show selected style on nav item
|
||||
$('.b-nav__streams').addClass('selected');
|
||||
|
||||
//sticky secondary nav
|
||||
var $navbar = $(".sub-nav-sticky"),
|
||||
y_pos = $navbar.offset().top,
|
||||
height = $navbar.height();
|
||||
|
||||
$(window).scroll(function() {
|
||||
var scrollTop = $(window).scrollTop();
|
||||
|
||||
if (scrollTop > y_pos - height) {
|
||||
$navbar.addClass("navbar-fixed")
|
||||
} else if (scrollTop <= y_pos) {
|
||||
$navbar.removeClass("navbar-fixed")
|
||||
}
|
||||
});
|
||||
|
||||
// Display docs subnav items
|
||||
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
|
||||
});
|
||||
</script>
|
||||
185
docs/streams/developer-guide/security.html
Normal file
185
docs/streams/developer-guide/security.html
Normal file
@@ -0,0 +1,185 @@
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
contributor license agreements. See the NOTICE file distributed with
|
||||
this work for additional information regarding copyright ownership.
|
||||
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
(the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
<script><!--#include virtual="../../js/templateData.js" --></script>
|
||||
|
||||
<script id="content-template" type="text/x-handlebars-template">
|
||||
<!-- h1>Developer Guide for Kafka Streams</h1 -->
|
||||
<div class="sub-nav-sticky">
|
||||
<div class="sticky-top">
|
||||
<!-- div style="height:35px">
|
||||
<a href="/{{version}}/documentation/streams/">Introduction</a>
|
||||
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
|
||||
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
|
||||
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
|
||||
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
|
||||
</div -->
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="section" id="streams-security">
|
||||
<span id="streams-developer-guide-security"></span><h1>Streams Security<a class="headerlink" href="#streams-security" title="Permalink to this headline"></a></h1>
|
||||
<div class="contents local topic" id="table-of-contents">
|
||||
<p class="topic-title first"><b>Table of Contents</b></p>
|
||||
<ul class="simple">
|
||||
<li><a class="reference internal" href="#required-acl-setting-for-secure-kafka-clusters" id="id1">Required ACL setting for secure Kafka clusters</a></li>
|
||||
<li><a class="reference internal" href="#security-example" id="id2">Security example</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
<p>Kafka Streams natively integrates with <a class="reference internal" href="../../documentation.html#security"><span class="std std-ref">Kafka&#8217;s security features</span></a> and supports all of the
client-side security features in Kafka. Streams leverages the <a class="reference internal" href="../../clients/index.html#kafka-clients"><span class="std std-ref">Java Producer and Consumer API</span></a>.</p>
|
||||
<p>To secure your Stream processing applications, configure the security settings in the corresponding Kafka producer
|
||||
and consumer clients, and then specify the corresponding configuration settings in your Kafka Streams application.</p>
|
||||
<p>Kafka supports cluster encryption and authentication, including a mix of authenticated and unauthenticated,
|
||||
and encrypted and non-encrypted clients. Using security is optional.</p>
|
||||
<p>Here are a few relevant client-side security features:</p>
|
||||
<dl class="docutils">
|
||||
<dt>Encrypt data-in-transit between your applications and Kafka brokers</dt>
|
||||
<dd>You can enable the encryption of the client-server communication between your applications and the Kafka brokers.
|
||||
For example, you can configure your applications to always use encryption when reading and writing data to and from
|
||||
Kafka. This is critical when reading and writing data across security domains such as internal network, public
|
||||
internet, and partner networks.</dd>
|
||||
<dt>Client authentication</dt>
|
||||
<dd>You can enable client authentication for connections from your application to Kafka brokers. For example, you can
|
||||
define that only specific applications are allowed to connect to your Kafka cluster.</dd>
|
||||
<dt>Client authorization</dt>
|
||||
<dd>You can enable client authorization of read and write operations by your applications. For example, you can define
|
||||
that only specific applications are allowed to read from a Kafka topic. You can also restrict write access to Kafka
|
||||
topics to prevent data pollution or fraudulent activities.</dd>
|
||||
</dl>
|
||||
<p>For more information about the security features in Apache Kafka, see <a class="reference internal" href="../../documentation.html#security"><span class="std std-ref">Kafka Security</span></a>.</p>
|
||||
<div class="section" id="required-acl-setting-for-secure-kafka-clusters">
|
||||
<span id="streams-developer-guide-security-acls"></span><h2><a class="toc-backref" href="#id1">Required ACL setting for secure Kafka clusters</a><a class="headerlink" href="#required-acl-setting-for-secure-kafka-clusters" title="Permalink to this headline"></a></h2>
|
||||
<p>Kafka clusters can use ACLs to control access to resources (like the ability to create topics), and for such clusters each client,
|
||||
including Kafka Streams, is required to authenticate as a particular user in order to be authorized with appropriate access.
|
||||
In particular, when Streams applications are run against a secured Kafka cluster, the principal running the application must have
|
||||
the ACL set so that the application has the permissions to create, read and write
|
||||
<a class="reference internal" href="manage-topics.html#streams-developer-guide-topics-internal"><span class="std std-ref">internal topics</span></a>.</p>
|
||||
|
||||
<p>Since all internal topics as well as the embedded consumer group name are prefixed with the <a class="reference internal" href="/{{version}}/documentation/streams/developer-guide/config-streams.html#required-configuration-parameters"><span class="std std-ref">application id</span></a>,
it is recommended to use ACLs on a prefixed resource pattern so that the client is allowed to manage all topics and consumer groups
that start with this prefix, e.g.
<code class="docutils literal"><span class="pre">--resource-pattern-type prefixed --topic your.application.id --operation All</span></code>
(see <a class="reference external" href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-277+-+Fine+Grained+ACL+for+CreateTopics+API">KIP-277</a>
and <a class="reference external" href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-290%3A+Support+for+Prefixed+ACLs">KIP-290</a> for details).
</p>
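<p>As one possible sketch of creating such prefixed ACLs programmatically (the principal, prefix, and broker address are placeholders; many clusters manage ACLs with the <code class="docutils literal"><span class="pre">kafka-acls</span></code> tool instead), an admin client can register an allow entry for every topic and consumer group that starts with the application id:</p>
<pre>
// Grant the application's principal all operations on topics and groups whose names
// start with the application id, using PREFIXED resource patterns.
// Run this from a method that handles the checked exceptions of Future.get().
Properties adminProps = new Properties();
adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka.example.com:9093");
// ...plus the same security settings that the Streams application itself uses...

try (Admin admin = Admin.create(adminProps)) {
    AccessControlEntry allowAll =
        new AccessControlEntry("User:my-streams-app", "*", AclOperation.ALL, AclPermissionType.ALLOW);
    AclBinding topicAcl = new AclBinding(
        new ResourcePattern(ResourceType.TOPIC, "your.application.id", PatternType.PREFIXED), allowAll);
    AclBinding groupAcl = new AclBinding(
        new ResourcePattern(ResourceType.GROUP, "your.application.id", PatternType.PREFIXED), allowAll);
    admin.createAcls(Arrays.asList(topicAcl, groupAcl)).all().get();
}
</pre>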
|
||||
</div>
|
||||
<div class="section" id="security-example">
|
||||
<span id="streams-developer-guide-security-example"></span><h2><a class="toc-backref" href="#id2">Security example</a><a class="headerlink" href="#security-example" title="Permalink to this headline"></a></h2>
|
||||
<p>The goal of this example is to configure a Kafka Streams application to enable client authentication and encrypt data-in-transit when
communicating with its Kafka cluster.</p>
|
||||
<p>This example assumes that the Kafka brokers in the cluster already have their security set up and that the necessary SSL
certificates are available to the application at the expected local filesystem locations. For example, if you are using Docker,
then you must also include these SSL certificates in the correct locations within the Docker image.</p>
|
||||
<p>The snippet below shows the settings to enable client authentication and SSL encryption for data-in-transit between your
|
||||
Kafka Streams application and the Kafka cluster it is reading and writing from:</p>
|
||||
<div class="highlight-bash"><div class="highlight"><pre><span></span><span class="c1"># Essential security settings to enable client authentication and SSL encryption</span>
|
||||
bootstrap.servers<span class="o">=</span>kafka.example.com:9093
|
||||
security.protocol<span class="o">=</span>SSL
|
||||
ssl.truststore.location<span class="o">=</span>/etc/security/tls/kafka.client.truststore.jks
|
||||
ssl.truststore.password<span class="o">=</span>test1234
|
||||
ssl.keystore.location<span class="o">=</span>/etc/security/tls/kafka.client.keystore.jks
|
||||
ssl.keystore.password<span class="o">=</span>test1234
|
||||
ssl.key.password<span class="o">=</span>test1234
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>Configure these settings in the application for your <code class="docutils literal"><span class="pre">Properties</span></code> instance. These settings will encrypt any
|
||||
data-in-transit that is being read from or written to Kafka, and your application will authenticate itself against the
|
||||
Kafka brokers that it is communicating with. Note that this example does not cover client authorization.</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Code of your Java application that uses the Kafka Streams library</span>
|
||||
<span class="n">Properties</span> <span class="n">settings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
|
||||
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">APPLICATION_ID_CONFIG</span><span class="o">,</span> <span class="s">"secure-kafka-streams-app"</span><span class="o">);</span>
|
||||
<span class="c1">// Where to find secure Kafka brokers. Here, it's on port 9093.</span>
|
||||
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">BOOTSTRAP_SERVERS_CONFIG</span><span class="o">,</span> <span class="s">"kafka.example.com:9093"</span><span class="o">);</span>
|
||||
<span class="c1">//</span>
|
||||
<span class="c1">// ...further non-security related settings may follow here...</span>
|
||||
<span class="c1">//</span>
|
||||
<span class="c1">// Security settings.</span>
|
||||
<span class="c1">// 1. These settings must match the security settings of the secure Kafka cluster.</span>
|
||||
<span class="c1">// 2. The SSL trust store and key store files must be locally accessible to the application.</span>
|
||||
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">CommonClientConfigs</span><span class="o">.</span><span class="na">SECURITY_PROTOCOL_CONFIG</span><span class="o">,</span> <span class="s">"SSL"</span><span class="o">);</span>
|
||||
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">SslConfigs</span><span class="o">.</span><span class="na">SSL_TRUSTSTORE_LOCATION_CONFIG</span><span class="o">,</span> <span class="s">"/etc/security/tls/kafka.client.truststore.jks"</span><span class="o">);</span>
|
||||
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">SslConfigs</span><span class="o">.</span><span class="na">SSL_TRUSTSTORE_PASSWORD_CONFIG</span><span class="o">,</span> <span class="s">"test1234"</span><span class="o">);</span>
|
||||
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">SslConfigs</span><span class="o">.</span><span class="na">SSL_KEYSTORE_LOCATION_CONFIG</span><span class="o">,</span> <span class="s">"/etc/security/tls/kafka.client.keystore.jks"</span><span class="o">);</span>
|
||||
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">SslConfigs</span><span class="o">.</span><span class="na">SSL_KEYSTORE_PASSWORD_CONFIG</span><span class="o">,</span> <span class="s">"test1234"</span><span class="o">);</span>
|
||||
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">SslConfigs</span><span class="o">.</span><span class="na">SSL_KEY_PASSWORD_CONFIG</span><span class="o">,</span> <span class="s">"test1234"</span><span class="o">);</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>If you incorrectly configure a security setting in your application, it will fail at runtime, typically right after you
|
||||
start it. For example, if you enter an incorrect password for the <code class="docutils literal"><span class="pre">ssl.keystore.password</span></code> setting, an error message
|
||||
similar to this would be logged and then the application would terminate:</p>
|
||||
<div class="highlight-bash"><div class="highlight"><pre><span></span><span class="c1"># Misconfigured ssl.keystore.password</span>
|
||||
Exception in thread <span class="s2">"main"</span> org.apache.kafka.common.KafkaException: Failed to construct kafka producer
|
||||
<span class="o">[</span>...snip...<span class="o">]</span>
|
||||
Caused by: org.apache.kafka.common.KafkaException: org.apache.kafka.common.KafkaException:
|
||||
java.io.IOException: Keystore was tampered with, or password was incorrect
|
||||
<span class="o">[</span>...snip...<span class="o">]</span>
|
||||
Caused by: java.security.UnrecoverableKeyException: Password verification failed
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>Monitor your Kafka Streams application log files for such error messages to spot any misconfigured applications quickly.</p>
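<p>Besides watching the logs, one illustrative option (not part of the original example) is to register a state listener and react when the application transitions into the <code class="docutils literal"><span class="pre">ERROR</span></code> state, for example to fail a container health check:</p>
<pre>
// Sketch: assumes an existing KafkaStreams instance named "streams"; the listener must be
// registered before streams.start() is called.
streams.setStateListener((newState, oldState) -> {
    if (newState == KafkaStreams.State.ERROR) {
        System.err.println("Streams application entered ERROR state (was " + oldState + ")");
        // e.g. alert the on-call engineer or mark the instance unhealthy
    }
});
streams.start();
</pre>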
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<div class="pagination">
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/manage-topics" class="pagination__btn pagination__btn__prev">Previous</a>
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/app-reset-tool" class="pagination__btn pagination__btn__next">Next</a>
|
||||
</div>
|
||||
</script>
|
||||
|
||||
<!--#include virtual="../../../includes/_header.htm" -->
|
||||
<!--#include virtual="../../../includes/_top.htm" -->
|
||||
<div class="content documentation documentation--current">
|
||||
<!--#include virtual="../../../includes/_nav.htm" -->
|
||||
<div class="right">
|
||||
<!--#include virtual="../../../includes/_docs_banner.htm" -->
|
||||
<ul class="breadcrumbs">
|
||||
<li><a href="/documentation">Documentation</a></li>
|
||||
<li><a href="/documentation/streams">Kafka Streams</a></li>
|
||||
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
|
||||
</ul>
|
||||
<div class="p-content"></div>
|
||||
</div>
|
||||
</div>
|
||||
<!--#include virtual="../../../includes/_footer.htm" -->
|
||||
<script>
|
||||
$(function() {
|
||||
// Show selected style on nav item
|
||||
$('.b-nav__streams').addClass('selected');
|
||||
|
||||
//sticky secondary nav
|
||||
var $navbar = $(".sub-nav-sticky"),
|
||||
y_pos = $navbar.offset().top,
|
||||
height = $navbar.height();
|
||||
|
||||
$(window).scroll(function() {
|
||||
var scrollTop = $(window).scrollTop();
|
||||
|
||||
if (scrollTop > y_pos - height) {
|
||||
$navbar.addClass("navbar-fixed")
|
||||
} else if (scrollTop <= y_pos) {
|
||||
$navbar.removeClass("navbar-fixed")
|
||||
}
|
||||
});
|
||||
|
||||
// Display docs subnav items
|
||||
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
|
||||
});
|
||||
</script>
|
||||
439
docs/streams/developer-guide/testing.html
Normal file
439
docs/streams/developer-guide/testing.html
Normal file
@@ -0,0 +1,439 @@
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
contributor license agreements. See the NOTICE file distributed with
|
||||
this work for additional information regarding copyright ownership.
|
||||
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
(the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
<script><!--#include virtual="../../js/templateData.js" --></script>
|
||||
|
||||
<script id="content-template" type="text/x-handlebars-template">
|
||||
<!-- h1>Developer Guide for Kafka Streams</h1 -->
|
||||
<div class="sub-nav-sticky">
|
||||
<div class="sticky-top">
|
||||
<!-- div style="height:35px">
|
||||
<a href="/{{version}}/documentation/streams/">Introduction</a>
|
||||
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
|
||||
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
|
||||
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
|
||||
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
|
||||
</div -->
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="section" id="testing">
|
||||
<span id="streams-developer-guide-testing"></span>
|
||||
<h1>Testing Kafka Streams<a class="headerlink" href="#testing" title="Permalink to this headline"></a></h1>
|
||||
<div class="contents local topic" id="table-of-contents">
|
||||
<p class="topic-title first"><b>Table of Contents</b></p>
|
||||
<ul class="simple">
|
||||
<li><a class="reference internal" href="#test-utils-artifact">Importing the test utilities</a></li>
|
||||
<li><a class="reference internal" href="#testing-topologytestdriver">Testing Streams applications</a>
|
||||
</li>
|
||||
<li><a class="reference internal" href="#unit-testing-processors">Unit testing Processors</a>
|
||||
</li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="section" id="test-utils-artifact">
|
||||
<h2><a class="toc-backref" href="#test-utils-artifact" title="Permalink to this headline">Importing the test
|
||||
utilities</a></h2>
|
||||
<p>
|
||||
To test a Kafka Streams application, Kafka provides a test-utils artifact that can be added as a regular
dependency to your test code base. Example <code>pom.xml</code> snippet when using Maven:
|
||||
</p>
|
||||
<pre>
|
||||
<dependency>
|
||||
<groupId>org.apache.kafka</groupId>
|
||||
<artifactId>kafka-streams-test-utils</artifactId>
|
||||
<version>{{fullDotVersion}}</version>
|
||||
<scope>test</scope>
|
||||
</dependency>
|
||||
</pre>
|
||||
</div>
|
||||
<div class="section" id="testing-topologytestdriver">
|
||||
<h2><a class="toc-backref" href="#testing-topologytestdriver" title="Permalink to this headline">Testing a
|
||||
Streams application</a></h2>
|
||||
|
||||
<p>
|
||||
The test-utils package provides a <code>TopologyTestDriver</code> that can be used to pipe data through a
<code>Topology</code> that is either assembled manually
using the Processor API or via the DSL using <code>StreamsBuilder</code>.
The test driver simulates the library runtime, which continuously fetches records from input topics and
processes them by traversing the topology.
You can use the test driver to verify that your specified processor topology computes the correct result
for the manually piped-in data records.
The test driver captures the resulting records and allows you to query its embedded state stores.
|
||||
<pre>
|
||||
// Processor API
|
||||
Topology topology = new Topology();
|
||||
topology.addSource("sourceProcessor", "input-topic");
|
||||
topology.addProcessor("processor", ..., "sourceProcessor");
|
||||
topology.addSink("sinkProcessor", "output-topic", "processor");
|
||||
// or
|
||||
// using DSL
|
||||
StreamsBuilder builder = new StreamsBuilder();
|
||||
builder.stream("input-topic").filter(...).to("output-topic");
|
||||
Topology topology = builder.build();
|
||||
|
||||
// setup test driver
|
||||
Properties props = new Properties();
|
||||
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "test");
|
||||
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");
|
||||
TopologyTestDriver testDriver = new TopologyTestDriver(topology, props);
|
||||
</pre>
|
||||
<p>
|
||||
With the test driver you can create a <code>TestInputTopic</code> by giving the topic name and the corresponding serializers.
<code>TestInputTopic</code> provides various methods to pipe new message values, keys and values, or lists of KeyValue objects.
|
||||
</p>
|
||||
<pre>
|
||||
TestInputTopic<String, Long> inputTopic = testDriver.createInputTopic("input-topic", stringSerde.serializer(), longSerde.serializer());
|
||||
inputTopic.pipeInput("key", 42L);
|
||||
</pre>
|
||||
<p>
|
||||
To verify the output, you can use <code>TestOutputTopic</code>
|
||||
where you configure the topic and the corresponding deserializers during initialization.
|
||||
It offers helper methods to read only certain parts of the result records or the collection of records.
|
||||
For example, you can validate returned <code>KeyValue</code> with standard assertions
|
||||
if you only care about the key and value, but not the timestamp of the result record.
|
||||
</p>
|
||||
<pre>
|
||||
TestOutputTopic<String, Long> outputTopic = testDriver.createOutputTopic("result-topic", stringSerde.deserializer(), longSerde.deserializer());
|
||||
assertThat(outputTopic.readKeyValue(), equalTo(new KeyValue<>("key", 42L)));
|
||||
</pre>
|
||||
<p>
|
||||
<code>TopologyTestDriver</code> supports punctuations, too.
|
||||
Event-time punctuations are triggered automatically based on the processed records' timestamps.
|
||||
Wall-clock-time punctuations can also be triggered by advancing the test driver's wall-clock-time (the
|
||||
driver mocks wall-clock-time internally to give users control over it).
|
||||
</p>
|
||||
<pre>
|
||||
testDriver.advanceWallClockTime(Duration.ofSeconds(20));
|
||||
</pre>
|
||||
<p>
|
||||
Additionally, you can access state stores via the test driver before or after a test.
|
||||
Accessing stores before a test is useful to pre-populate a store with some initial values.
|
||||
After data has been processed, the expected updates to the store can be verified.
|
||||
</p>
|
||||
<pre>
|
||||
KeyValueStore store = testDriver.getKeyValueStore("store-name");
|
||||
</pre>
|
||||
<p>
|
||||
Note that you should always close the test driver at the end to make sure all resources are released
properly.
|
||||
</p>
|
||||
<pre>
|
||||
testDriver.close();
|
||||
</pre>
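<p>Because <code>TopologyTestDriver</code> implements <code>Closeable</code>, a try-with-resources block is one way (shown as a sketch below, reusing the topology and properties from above) to guarantee the driver is closed even when an assertion fails:</p>
<pre>
try (TopologyTestDriver driver = new TopologyTestDriver(topology, props)) {
    TestInputTopic<String, Long> input =
        driver.createInputTopic("input-topic", stringSerde.serializer(), longSerde.serializer());
    input.pipeInput("key", 42L);
    // ...assertions against output topics or state stores...
}   // driver.close() is called automatically here
</pre>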
|
||||
|
||||
<h3>Example</h3>
|
||||
<p>
|
||||
The following example demonstrates how to use the test driver and helper classes.
|
||||
The example creates a topology that computes the maximum value per key using a key-value store.
During regular processing, no output is generated; only the store is updated.
Output is sent downstream only when event-time or wall-clock-time punctuations fire.
|
||||
</p>
|
||||
<pre>
|
||||
private TopologyTestDriver testDriver;
|
||||
private TestInputTopic<String, Long> inputTopic;
|
||||
private TestOutputTopic<String, Long> outputTopic;
|
||||
private KeyValueStore<String, Long> store;
|
||||
|
||||
private Serde<String> stringSerde = new Serdes.StringSerde();
|
||||
private Serde<Long> longSerde = new Serdes.LongSerde();
|
||||
|
||||
@Before
|
||||
public void setup() {
|
||||
Topology topology = new Topology();
|
||||
topology.addSource("sourceProcessor", "input-topic");
|
||||
topology.addProcessor("aggregator", new CustomMaxAggregatorSupplier(), "sourceProcessor");
|
||||
topology.addStateStore(
|
||||
Stores.keyValueStoreBuilder(
|
||||
Stores.inMemoryKeyValueStore("aggStore"),
|
||||
Serdes.String(),
|
||||
Serdes.Long()).withLoggingDisabled(), // need to disable logging to allow store pre-populating
|
||||
"aggregator");
|
||||
topology.addSink("sinkProcessor", "result-topic", "aggregator");
|
||||
|
||||
// setup test driver
|
||||
Properties props = new Properties();
|
||||
props.setProperty(StreamsConfig.APPLICATION_ID_CONFIG, "maxAggregation");
|
||||
props.setProperty(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");
|
||||
props.setProperty(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
|
||||
props.setProperty(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Long().getClass().getName());
|
||||
testDriver = new TopologyTestDriver(topology, props);
|
||||
|
||||
// setup test topics
|
||||
inputTopic = testDriver.createInputTopic("input-topic", stringSerde.serializer(), longSerde.serializer());
|
||||
outputTopic = testDriver.createOutputTopic("result-topic", stringSerde.deserializer(), longSerde.deserializer());
|
||||
|
||||
// pre-populate store
|
||||
store = testDriver.getKeyValueStore("aggStore");
|
||||
store.put("a", 21L);
|
||||
}
|
||||
|
||||
@After
|
||||
public void tearDown() {
|
||||
testDriver.close();
|
||||
}
|
||||
|
||||
@Test
|
||||
public void shouldFlushStoreForFirstInput() {
|
||||
inputTopic.pipeInput("a", 1L);
|
||||
assertThat(outputTopic.readKeyValue(), equalTo(new KeyValue<>("a", 21L)));
|
||||
assertThat(outputTopic.isEmpty(), is(true));
|
||||
}
|
||||
|
||||
@Test
|
||||
public void shouldNotUpdateStoreForSmallerValue() {
|
||||
inputTopic.pipeInput("a", 1L);
|
||||
assertThat(store.get("a"), equalTo(21L));
|
||||
assertThat(outputTopic.readKeyValue(), equalTo(new KeyValue<>("a", 21L)));
|
||||
assertThat(outputTopic.isEmpty(), is(true));
|
||||
}
|
||||
|
||||
@Test
|
||||
public void shouldUpdateStoreForLargerValue() {
|
||||
inputTopic.pipeInput("a", 42L);
|
||||
assertThat(store.get("a"), equalTo(42L));
|
||||
assertThat(outputTopic.readKeyValue(), equalTo(new KeyValue<>("a", 42L)));
|
||||
assertThat(outputTopic.isEmpty(), is(true));
|
||||
}
|
||||
|
||||
@Test
|
||||
public void shouldUpdateStoreForNewKey() {
|
||||
inputTopic.pipeInput("b", 21L);
|
||||
assertThat(store.get("b"), equalTo(21L));
|
||||
assertThat(outputTopic.readKeyValue(), equalTo(new KeyValue<>("a", 21L)));
|
||||
assertThat(outputTopic.readKeyValue(), equalTo(new KeyValue<>("b", 21L)));
|
||||
assertThat(outputTopic.isEmpty(), is(true));
|
||||
}
|
||||
|
||||
@Test
|
||||
public void shouldPunctuateIfEventTimeAdvances() {
|
||||
final Instant recordTime = Instant.now();
|
||||
inputTopic.pipeInput("a", 1L, recordTime);
|
||||
assertThat(outputTopic.readKeyValue(), equalTo(new KeyValue<>("a", 21L)));
|
||||
|
||||
inputTopic.pipeInput("a", 1L, recordTime);
|
||||
assertThat(outputTopic.isEmpty(), is(true));
|
||||
|
||||
inputTopic.pipeInput("a", 1L, recordTime.plusSeconds(10L));
|
||||
assertThat(outputTopic.readKeyValue(), equalTo(new KeyValue<>("a", 21L)));
|
||||
assertThat(outputTopic.isEmpty(), is(true));
|
||||
}
|
||||
|
||||
@Test
|
||||
public void shouldPunctuateIfWallClockTimeAdvances() {
|
||||
testDriver.advanceWallClockTime(Duration.ofSeconds(60));
|
||||
assertThat(outputTopic.readKeyValue(), equalTo(new KeyValue<>("a", 21L)));
|
||||
assertThat(outputTopic.isEmpty(), is(true));
|
||||
}
|
||||
|
||||
public class CustomMaxAggregatorSupplier implements ProcessorSupplier<String, Long> {
|
||||
@Override
|
||||
public Processor<String, Long> get() {
|
||||
return new CustomMaxAggregator();
|
||||
}
|
||||
}
|
||||
|
||||
public class CustomMaxAggregator implements Processor<String, Long> {
|
||||
ProcessorContext context;
|
||||
private KeyValueStore<String, Long> store;
|
||||
|
||||
@SuppressWarnings("unchecked")
|
||||
@Override
|
||||
public void init(ProcessorContext context) {
|
||||
this.context = context;
|
||||
context.schedule(Duration.ofSeconds(60), PunctuationType.WALL_CLOCK_TIME, time -> flushStore());
|
||||
context.schedule(Duration.ofSeconds(10), PunctuationType.STREAM_TIME, time -> flushStore());
|
||||
store = (KeyValueStore<String, Long>) context.getStateStore("aggStore");
|
||||
}
|
||||
|
||||
@Override
|
||||
public void process(String key, Long value) {
|
||||
Long oldValue = store.get(key);
|
||||
if (oldValue == null || value > oldValue) {
|
||||
store.put(key, value);
|
||||
}
|
||||
}
|
||||
|
||||
private void flushStore() {
|
||||
KeyValueIterator<String, Long> it = store.all();
|
||||
while (it.hasNext()) {
|
||||
KeyValue<String, Long> next = it.next();
|
||||
context.forward(next.key, next.value);
|
||||
}
|
||||
}
|
||||
|
||||
@Override
|
||||
public void close() {}
|
||||
}
|
||||
</pre>
|
||||
</div>
|
||||
<div class="section" id="unit-testing-processors">
|
||||
<h2>
|
||||
<a class="headerlink" href="#unit-testing-processors"
|
||||
title="Permalink to this headline">Unit Testing Processors</a>
|
||||
</h2>
|
||||
<p>
|
||||
If you <a href="processor-api.html">write a Processor</a>, you will want to test it.
|
||||
</p>
|
||||
<p>
|
||||
Because the <code>Processor</code> forwards its results to the context rather than returning them,
|
||||
unit testing requires a mocked context capable of capturing forwarded data for inspection.
|
||||
For this reason, we provide a <code>MockProcessorContext</code> in <a href="#test-utils-artifact"><code>test-utils</code></a>.
|
||||
</p>
|
||||
<b>Construction</b>
|
||||
<p>
|
||||
To begin with, instantiate your processor and initialize it with the mock context:
|
||||
<pre>
|
||||
final Processor processorUnderTest = ...;
|
||||
final MockProcessorContext context = new MockProcessorContext();
|
||||
processorUnderTest.init(context);
|
||||
</pre>
|
||||
If you need to pass configuration to your processor or set the default serdes, you can create the mock with
|
||||
config:
|
||||
<pre>
|
||||
final Properties props = new Properties();
|
||||
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "unit-test");
|
||||
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "");
|
||||
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
|
||||
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Long().getClass());
|
||||
props.put("some.other.config", "some config value");
|
||||
final MockProcessorContext context = new MockProcessorContext(props);
|
||||
</pre>
|
||||
</p>
|
||||
<b>Captured data</b>
|
||||
<p>
|
||||
The mock will capture any values that your processor forwards. You can make assertions on them:
|
||||
<pre>
|
||||
processorUnderTest.process("key", "value");
|
||||
|
||||
final Iterator<CapturedForward> forwarded = context.forwarded().iterator();
|
||||
assertEquals(forwarded.next().keyValue(), new KeyValue<>(..., ...));
|
||||
assertFalse(forwarded.hasNext());
|
||||
|
||||
// you can reset forwards to clear the captured data. This may be helpful in constructing longer scenarios.
|
||||
context.resetForwards();
|
||||
|
||||
assertEquals(context.forwarded().size(), 0);
|
||||
</pre>
|
||||
If your processor forwards to specific child processors, you can query the context for captured data by
|
||||
child name:
|
||||
<pre>
|
||||
final List<CapturedForward> captures = context.forwarded("childProcessorName");
|
||||
</pre>
|
||||
The mock also captures whether your processor has called <code>commit()</code> on the context:
|
||||
<pre>
|
||||
assertTrue(context.committed());
|
||||
|
||||
// commit captures can also be reset.
|
||||
context.resetCommit();
|
||||
|
||||
assertFalse(context.committed());
|
||||
</pre>
|
||||
</p>
|
||||
<b>Setting record metadata</b>
|
||||
<p>
|
||||
In case your processor logic depends on the record metadata (topic, partition, offset, or timestamp),
|
||||
you can set them on the context, either all together or individually:
|
||||
<pre>
|
||||
context.setRecordMetadata("topicName", /*partition*/ 0, /*offset*/ 0L, /*timestamp*/ 0L);
|
||||
context.setTopic("topicName");
|
||||
context.setPartition(0);
|
||||
context.setOffset(0L);
|
||||
context.setTimestamp(0L);
|
||||
</pre>
|
||||
Once these are set, the context will continue returning the same values, until you set new ones.
|
||||
</p>
|
||||
<b>State stores</b>
|
||||
<p>
|
||||
In case your punctuator is stateful, the mock context allows you to register state stores.
|
||||
You're encouraged to use a simple in-memory store of the appropriate type (KeyValue, Windowed, or
|
||||
Session), since the mock context does <i>not</i> manage changelogs, state directories, etc.
|
||||
</p>
|
||||
<pre>
|
||||
final KeyValueStore<String, Integer> store =
|
||||
Stores.keyValueStoreBuilder(
|
||||
Stores.inMemoryKeyValueStore("myStore"),
|
||||
Serdes.String(),
|
||||
Serdes.Integer()
|
||||
)
|
||||
.withLoggingDisabled() // Changelog is not supported by MockProcessorContext.
|
||||
.build();
|
||||
store.init(context, store);
|
||||
context.register(store, /*deprecated parameter*/ false, /*parameter unused in mock*/ null);
|
||||
</pre>
|
||||
<b>Verifying punctuators</b>
|
||||
<p>
|
||||
Processors can schedule punctuators to handle periodic tasks.
|
||||
The mock context does <i>not</i> automatically execute punctuators, but it does capture them to
|
||||
allow you to unit test them as well:
|
||||
<pre>
|
||||
final MockProcessorContext.CapturedPunctuator capturedPunctuator = context.scheduledPunctuators().get(0);
|
||||
final long interval = capturedPunctuator.getIntervalMs();
|
||||
final PunctuationType type = capturedPunctuator.getType();
|
||||
final boolean cancelled = capturedPunctuator.cancelled();
|
||||
final Punctuator punctuator = capturedPunctuator.getPunctuator();
|
||||
punctuator.punctuate(/*timestamp*/ 0L);
|
||||
</pre>
|
||||
If you need to write tests involving automatic firing of scheduled punctuators, we recommend creating a
|
||||
simple topology with your processor and using the <a href="testing.html#testing-topologytestdriver"><code>TopologyTestDriver</code></a>.
|
||||
</p>
|
||||
</div>
|
||||
</div>
|
||||
<div class="pagination">
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/datatypes" class="pagination__btn pagination__btn__prev">Previous</a>
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/interactive-queries" class="pagination__btn pagination__btn__next">Next</a>
|
||||
</div>
|
||||
</script>
|
||||
|
||||
<!--#include virtual="../../../includes/_header.htm" -->
|
||||
<!--#include virtual="../../../includes/_top.htm" -->
|
||||
<div class="content documentation documentation--current">
|
||||
<!--#include virtual="../../../includes/_nav.htm" -->
|
||||
<div class="right">
|
||||
<!--#include virtual="../../../includes/_docs_banner.htm" -->
|
||||
<ul class="breadcrumbs">
|
||||
<li><a href="/documentation">Documentation</a></li>
|
||||
<li><a href="/documentation/streams">Kafka Streams</a></li>
|
||||
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
|
||||
</ul>
|
||||
<div class="p-content"></div>
|
||||
</div>
|
||||
</div>
|
||||
<!--#include virtual="../../../includes/_footer.htm" -->
|
||||
<script>
|
||||
$(function () {
|
||||
// Show selected style on nav item
|
||||
$('.b-nav__streams').addClass('selected');
|
||||
|
||||
//sticky secondary nav
|
||||
var $navbar = $(".sub-nav-sticky"),
|
||||
y_pos = $navbar.offset().top,
|
||||
height = $navbar.height();
|
||||
|
||||
$(window).scroll(function () {
|
||||
var scrollTop = $(window).scrollTop();
|
||||
|
||||
if (scrollTop > y_pos - height) {
|
||||
$navbar.addClass("navbar-fixed")
|
||||
} else if (scrollTop <= y_pos) {
|
||||
$navbar.removeClass("navbar-fixed")
|
||||
}
|
||||
});
|
||||
|
||||
// Display docs subnav items
|
||||
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
|
||||
});
|
||||
</script>
|
||||
263
docs/streams/developer-guide/write-streams.html
Normal file
263
docs/streams/developer-guide/write-streams.html
Normal file
@@ -0,0 +1,263 @@
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
contributor license agreements. See the NOTICE file distributed with
|
||||
this work for additional information regarding copyright ownership.
|
||||
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
(the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
<script><!--#include virtual="../../js/templateData.js" --></script>
|
||||
|
||||
<script id="content-template" type="text/x-handlebars-template">
|
||||
<!-- h1>Developer Guide for Kafka Streams</h1 -->
|
||||
<div class="sub-nav-sticky">
|
||||
<!-- div class="sticky-top">
|
||||
<div style="height:35px">
|
||||
<a href="/{{version}}/documentation/streams/">Introduction</a>
|
||||
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
|
||||
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
|
||||
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
|
||||
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
|
||||
</div>
|
||||
</div -->
|
||||
</div>
|
||||
|
||||
<div class="section" id="writing-a-streams-application">
|
||||
<span id="streams-write-app"></span><h1>Writing a Streams Application<a class="headerlink" href="#writing-a-streams-application" title="Permalink to this headline"></a></h1>
|
||||
<p class="topic-title first"><b>Table of Contents</b></p>
|
||||
<ul class="simple">
|
||||
<li><a class="reference internal" href="#libraries-and-maven-artifacts" id="id1">Libraries and Maven artifacts</a></li>
|
||||
<li><a class="reference internal" href="#using-kafka-streams-within-your-application-code" id="id2">Using Kafka Streams within your application code</a></li>
|
||||
<li><a class="reference internal" href="#testing-a-streams-app" id="id3">Testing a Streams application</a></li>
|
||||
</ul>
|
||||
<p>Any Java or Scala application that makes use of the Kafka Streams library is considered a Kafka Streams application.
|
||||
The computational logic of a Kafka Streams application is defined as a <a class="reference internal" href="../core-concepts#streams_topology"><span class="std std-ref">processor topology</span></a>,
|
||||
which is a graph of stream processors (nodes) and streams (edges).</p>
|
||||
<p>You can define the processor topology with the Kafka Streams APIs:</p>
|
||||
<dl class="docutils">
|
||||
<dt><a class="reference internal" href="dsl-api.html#streams-developer-guide-dsl"><span class="std std-ref">Kafka Streams DSL</span></a></dt>
|
||||
<dd>A high-level API that provides the most common data transformation operations such as <code class="docutils literal"><span class="pre">map</span></code>, <code class="docutils literal"><span class="pre">filter</span></code>, <code class="docutils literal"><span class="pre">join</span></code>, and <code class="docutils literal"><span class="pre">aggregations</span></code> out of the box. The DSL is the recommended starting point for developers new to Kafka Streams, and should cover many use cases and stream processing needs. If you're writing a Scala application then you can use the <a href="dsl-api.html#scala-dsl"><span class="std std-ref">Kafka Streams DSL for Scala</span></a> library, which removes much of the Java/Scala interoperability boilerplate compared to working directly with the Java DSL.</dd>
|
||||
<dt><a class="reference internal" href="processor-api.html#streams-developer-guide-processor-api"><span class="std std-ref">Processor API</span></a></dt>
|
||||
<dd>A low-level API that lets you add and connect processors as well as interact directly with state stores. The Processor API provides you with even more flexibility than the DSL but at the expense of requiring more manual work on the side of the application developer (e.g., more lines of code).</dd>
|
||||
</dl>
|
||||
<div class="section" id="libraries-and-maven-artifacts">
|
||||
<span id="streams-developer-guide-maven"></span><h2>Libraries and Maven artifacts</h2>
|
||||
<p>This section lists the Kafka Streams related libraries that are available for writing your Kafka Streams applications.</p>
|
||||
<p>You can define dependencies on the following libraries for your Kafka Streams applications.</p>
|
||||
<table border="1" class="datatable">
|
||||
<colgroup>
|
||||
<col width="14%" />
|
||||
<col width="19%" />
|
||||
<col width="12%" />
|
||||
<col width="55%" />
|
||||
</colgroup>
|
||||
<thead valign="bottom">
|
||||
<tr class="row-odd"><th class="head">Group ID</th>
|
||||
<th class="head">Artifact ID</th>
|
||||
<th class="head">Version</th>
|
||||
<th class="head">Description</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody valign="top">
|
||||
<tr class="row-even"><td><code class="docutils literal"><span class="pre">org.apache.kafka</span></code></td>
|
||||
<td><code class="docutils literal"><span class="pre">kafka-streams</span></code></td>
|
||||
<td><code class="docutils literal"><span class="pre">{{fullDotVersion}}</span></code></td>
|
||||
<td>(Required) Base library for Kafka Streams.</td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><code class="docutils literal"><span class="pre">org.apache.kafka</span></code></td>
|
||||
<td><code class="docutils literal"><span class="pre">kafka-clients</span></code></td>
|
||||
<td><code class="docutils literal"><span class="pre">{{fullDotVersion}}</span></code></td>
|
||||
<td>(Required) Kafka client library. Contains built-in serializers/deserializers.</td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><code class="docutils literal"><span class="pre">org.apache.kafka</span></code></td>
|
||||
<td><code class="docutils literal"><span class="pre">kafka-streams-scala</span></code></td>
|
||||
<td><code class="docutils literal"><span class="pre">{{fullDotVersion}}</span></code></td>
|
||||
<td>(Optional) Kafka Streams DSL for Scala library to write Scala Kafka Streams applications. When not using SBT you will need to suffix the artifact ID with the correct version of Scala your application is using (<code class="docutils literal"><span class="pre">_2.12</span></code>, <code class="docutils literal"><span class="pre">_2.13</span></code>).</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<div class="admonition tip">
|
||||
<p><b>Tip</b></p>
|
||||
<p class="last">See the section <a class="reference internal" href="datatypes.html#streams-developer-guide-serdes"><span class="std std-ref">Data Types and Serialization</span></a> for more information about Serializers/Deserializers.</p>
|
||||
</div>
|
||||
<p>Example <code class="docutils literal"><span class="pre">pom.xml</span></code> snippet when using Maven:</p>
|
||||
<pre class="brush: xml;">
|
||||
<dependency>
|
||||
<groupId>org.apache.kafka</groupId>
|
||||
<artifactId>kafka-streams</artifactId>
|
||||
<version>{{fullDotVersion}}</version>
|
||||
</dependency>
|
||||
<dependency>
|
||||
<groupId>org.apache.kafka</groupId>
|
||||
<artifactId>kafka-clients</artifactId>
|
||||
<version>{{fullDotVersion}}</version>
|
||||
</dependency>
|
||||
<!-- Optionally include Kafka Streams DSL for Scala for Scala {{scalaVersion}} -->
|
||||
<dependency>
|
||||
<groupId>org.apache.kafka</groupId>
|
||||
<artifactId>kafka-streams-scala_{{scalaVersion}}</artifactId>
|
||||
<version>{{fullDotVersion}}</version>
|
||||
</dependency>
|
||||
</pre>
|
||||
</div>
|
||||
<div class="section" id="using-kafka-streams-within-your-application-code">
|
||||
<h2>Using Kafka Streams within your application code<a class="headerlink" href="#using-kafka-streams-within-your-application-code" title="Permalink to this headline"></a></h2>
|
||||
<p>You can call Kafka Streams from anywhere in your application code, but usually these calls are made within the <code class="docutils literal"><span class="pre">main()</span></code> method of
|
||||
your application, or some variant thereof. The basic elements of defining a processing topology within your application
|
||||
are described below.</p>
|
||||
<p>First, you must create an instance of <code class="docutils literal"><span class="pre">KafkaStreams</span></code>.</p>
|
||||
<ul class="simple">
|
||||
<li>The first argument of the <code class="docutils literal"><span class="pre">KafkaStreams</span></code> constructor takes a topology (either <code class="docutils literal"><span class="pre">StreamsBuilder#build()</span></code> for the
|
||||
<a class="reference internal" href="dsl-api.html#streams-developer-guide-dsl"><span class="std std-ref">DSL</span></a> or <code class="docutils literal"><span class="pre">Topology</span></code> for the
|
||||
<a class="reference internal" href="processor-api.html#streams-developer-guide-processor-api"><span class="std std-ref">Processor API</span></a>) that is used to define a topology.</li>
|
||||
<li>The second argument is an instance of <code class="docutils literal"><span class="pre">java.util.Properties</span></code>, which defines the configuration for this specific topology.</li>
|
||||
</ul>
|
||||
<p>Code example:</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.streams.KafkaStreams</span><span class="o">;</span>
|
||||
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.kstream.StreamsBuilder</span><span class="o">;</span>
|
||||
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.processor.Topology</span><span class="o">;</span>
|
||||
|
||||
<span class="c1">// Use the builders to define the actual processing topology, e.g. to specify</span>
|
||||
<span class="c1">// from which input topics to read, which stream operations (filter, map, etc.)</span>
|
||||
<span class="c1">// should be called, and so on. We will cover this in detail in the subsequent</span>
|
||||
<span class="c1">// sections of this Developer Guide.</span>
|
||||
|
||||
<span class="n">StreamsBuilder</span> <span class="n">builder</span> <span class="o">=</span> <span class="o">...;</span> <span class="c1">// when using the DSL</span>
|
||||
<span class="n">Topology</span> <span class="n">topology</span> <span class="o">=</span> <span class="n">builder</span><span class="o">.</span><span class="na">build</span><span class="o">();</span>
|
||||
<span class="c1">//</span>
|
||||
<span class="c1">// OR</span>
|
||||
<span class="c1">//</span>
|
||||
<span class="n">Topology</span> <span class="n">topology</span> <span class="o">=</span> <span class="o">...;</span> <span class="c1">// when using the Processor API</span>
|
||||
|
||||
<span class="c1">// Use the configuration to tell your application where the Kafka cluster is,</span>
|
||||
<span class="c1">// which Serializers/Deserializers to use by default, to specify security settings,</span>
|
||||
<span class="c1">// and so on.</span>
|
||||
<span class="n">Properties</span> <span class="n">props</span> <span class="o">=</span> <span class="o">...;</span>
|
||||
|
||||
<span class="n">KafkaStreams</span> <span class="n">streams</span> <span class="o">=</span> <span class="k">new</span> <span class="n">KafkaStreams</span><span class="o">(</span><span class="n">topology</span><span class="o">,</span> <span class="n">props</span><span class="o">);</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>At this point, internal structures are initialized, but the processing is not started yet.
|
||||
You have to explicitly start the Kafka Streams thread by calling the <code class="docutils literal"><span class="pre">KafkaStreams#start()</span></code> method:</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Start the Kafka Streams threads</span>
|
||||
<span class="n">streams</span><span class="o">.</span><span class="na">start</span><span class="o">();</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>If there are other instances of this stream processing application running elsewhere (e.g., on another machine), Kafka
|
||||
Streams transparently re-assigns tasks from the existing instances to the new instance that you just started.
|
||||
For more information, see <a class="reference internal" href="../architecture.html#streams_architecture_tasks"><span class="std std-ref">Stream Partitions and Tasks</span></a> and <a class="reference internal" href="../architecture.html#streams-architecture-threads"><span class="std std-ref">Threading Model</span></a>.</p>
|
||||
<p>To catch any unexpected exceptions, you can set a <code class="docutils literal"><span class="pre">java.lang.Thread.UncaughtExceptionHandler</span></code> before you start the
|
||||
application. This handler is called whenever a stream thread is terminated by an unexpected exception:</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Java 8+, using lambda expressions</span>
|
||||
<span class="n">streams</span><span class="o">.</span><span class="na">setUncaughtExceptionHandler</span><span class="o">((</span><span class="n">Thread</span> <span class="n">thread</span><span class="o">,</span> <span class="n">Throwable</span> <span class="n">throwable</span><span class="o">)</span> <span class="o">-></span> <span class="o">{</span>
|
||||
<span class="c1">// here you should examine the throwable/exception and perform an appropriate action!</span>
|
||||
<span class="o">});</span>
|
||||
|
||||
|
||||
<span class="c1">// Java 7</span>
|
||||
<span class="n">streams</span><span class="o">.</span><span class="na">setUncaughtExceptionHandler</span><span class="o">(</span><span class="k">new</span> <span class="n">Thread</span><span class="o">.</span><span class="na">UncaughtExceptionHandler</span><span class="o">()</span> <span class="o">{</span>
|
||||
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">uncaughtException</span><span class="o">(</span><span class="n">Thread</span> <span class="n">thread</span><span class="o">,</span> <span class="n">Throwable</span> <span class="n">throwable</span><span class="o">)</span> <span class="o">{</span>
|
||||
<span class="c1">// here you should examine the throwable/exception and perform an appropriate action!</span>
|
||||
<span class="o">}</span>
|
||||
<span class="o">});</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>To stop the application instance, call the <code class="docutils literal"><span class="pre">KafkaStreams#close()</span></code> method:</p>
|
||||
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Stop the Kafka Streams threads</span>
|
||||
<span class="n">streams</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>To allow your application to shut down gracefully in response to SIGTERM, it is recommended that you add a shutdown hook
that calls <code class="docutils literal"><span class="pre">KafkaStreams#close</span></code>.</p>
|
||||
<ul>
|
||||
<li><p class="first">Here is a shutdown hook example in Java 8+:</p>
|
||||
<blockquote>
|
||||
<div><div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Add shutdown hook to stop the Kafka Streams threads.</span>
|
||||
<span class="c1">// You can optionally provide a timeout to `close`.</span>
|
||||
<span class="n">Runtime</span><span class="o">.</span><span class="na">getRuntime</span><span class="o">().</span><span class="na">addShutdownHook</span><span class="o">(</span><span class="k">new</span> <span class="n">Thread</span><span class="o">(</span><span class="n">streams</span><span class="o">::</span><span class="n">close</span><span class="o">));</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div></blockquote>
|
||||
</li>
|
||||
<li><p class="first">Here is a shutdown hook example in Java 7:</p>
|
||||
<blockquote>
|
||||
<div><div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Add shutdown hook to stop the Kafka Streams threads.</span>
|
||||
<span class="c1">// You can optionally provide a timeout to `close`.</span>
|
||||
<span class="n">Runtime</span><span class="o">.</span><span class="na">getRuntime</span><span class="o">().</span><span class="na">addShutdownHook</span><span class="o">(</span><span class="k">new</span> <span class="n">Thread</span><span class="o">(</span><span class="k">new</span> <span class="n">Runnable</span><span class="o">()</span> <span class="o">{</span>
|
||||
<span class="nd">@Override</span>
|
||||
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">run</span><span class="o">()</span> <span class="o">{</span>
|
||||
<span class="n">streams</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
|
||||
<span class="o">}</span>
|
||||
<span class="o">}));</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div></blockquote>
|
||||
</li>
|
||||
</ul>
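<p>If you also want to bound how long shutdown may take, older Kafka Streams releases provide a
<code class="docutils literal"><span class="pre">close(long, TimeUnit)</span></code> overload (newer releases use
<code class="docutils literal"><span class="pre">close(Duration)</span></code>). The following is only a sketch, reusing the
<code class="docutils literal"><span class="pre">streams</span></code> instance from the examples above:</p>
<pre class="brush: java;">
// Sketch: stop the Kafka Streams threads, waiting at most 10 seconds for them to finish.
// Requires java.util.concurrent.TimeUnit; close(long, TimeUnit) returns whether all threads stopped in time.
Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
    @Override
    public void run() {
        streams.close(10, TimeUnit.SECONDS);
    }
}));
</pre>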
|
||||
<p>After an application is stopped, Kafka Streams will migrate any tasks that had been running in this instance to available remaining
|
||||
instances.</p>
|
||||
</div>
|
||||
|
||||
<div class="section" id="testing-a-streams-app">
|
||||
<a class="headerlink" href="#testing-a-streams-app" title="Permalink to this headline"><h2>Testing a Streams application</a></h2>
|
||||
Kafka Streams comes with a <code>test-utils</code> module to help you test your application; see <a href="testing.html">Testing a Streams application</a> for details.
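<p>As a rough illustration of what that module offers, the sketch below drives a topology with <code>TopologyTestDriver</code> without starting any brokers. It assumes a <code>Topology</code> and its <code>Properties</code> (here called <code>topology</code> and <code>props</code>) as constructed earlier; the topic names are just examples, and exact class names depend on your Kafka version:</p>
<pre class="brush: java;">
// Sketch: TopologyTestDriver and ConsumerRecordFactory ship with the kafka-streams-test-utils artifact.
TopologyTestDriver testDriver = new TopologyTestDriver(topology, props);

// Pipe a single input record into the (hypothetical) "input-topic" ...
ConsumerRecordFactory<String, String> factory =
    new ConsumerRecordFactory<>(new StringSerializer(), new StringSerializer());
testDriver.pipeInput(factory.create("input-topic", "some-key", "some-value"));

// ... and read back whatever the topology wrote to the (hypothetical) "output-topic".
ProducerRecord<String, String> outputRecord =
    testDriver.readOutput("output-topic", new StringDeserializer(), new StringDeserializer());

testDriver.close();
</pre>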
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<div class="pagination">
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/" class="pagination__btn pagination__btn__prev">Previous</a>
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/config-streams" class="pagination__btn pagination__btn__next">Next</a>
|
||||
</div>
|
||||
</script>
|
||||
|
||||
<!--#include virtual="../../../includes/_header.htm" -->
|
||||
<!--#include virtual="../../../includes/_top.htm" -->
|
||||
<div class="content documentation documentation--current">
|
||||
<!--#include virtual="../../../includes/_nav.htm" -->
|
||||
<div class="right">
|
||||
<!--#include virtual="../../../includes/_docs_banner.htm" -->
|
||||
<ul class="breadcrumbs">
|
||||
<li><a href="/documentation">Documentation</a></li>
|
||||
<li><a href="/documentation/streams">Kafka Streams</a></li>
|
||||
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
|
||||
</ul>
|
||||
<div class="p-content"></div>
|
||||
</div>
|
||||
</div>
|
||||
<!--#include virtual="../../../includes/_footer.htm" -->
|
||||
<script>
|
||||
$(function() {
|
||||
// Show selected style on nav item
|
||||
$('.b-nav__streams').addClass('selected');
|
||||
|
||||
//sticky secondary nav
|
||||
var $navbar = $(".sub-nav-sticky"),
|
||||
y_pos = $navbar.offset().top,
|
||||
height = $navbar.height();
|
||||
|
||||
$(window).scroll(function() {
|
||||
var scrollTop = $(window).scrollTop();
|
||||
|
||||
if (scrollTop > y_pos - height) {
|
||||
$navbar.addClass("navbar-fixed")
|
||||
} else if (scrollTop <= y_pos) {
|
||||
$navbar.removeClass("navbar-fixed")
|
||||
}
|
||||
});
|
||||
|
||||
// Display docs subnav items
|
||||
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
|
||||
});
|
||||
</script>
|
||||
370
docs/streams/index.html
Normal file
@@ -0,0 +1,370 @@
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
contributor license agreements. See the NOTICE file distributed with
|
||||
this work for additional information regarding copyright ownership.
|
||||
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
(the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
<script>
|
||||
<!--#include virtual="../js/templateData.js" -->
|
||||
</script>
|
||||
<style>
|
||||
.video__item{cursor:pointer;}
|
||||
</style>
|
||||
<script id="streams-template" type="text/x-handlebars-template">
|
||||
<h1>Kafka Streams</h1>
|
||||
<div class="sub-nav-sticky">
|
||||
<div class="sticky-top">
|
||||
<div style="height:35px">
|
||||
<a class="active-menu-item" href="/{{version}}/documentation/streams/">Introduction</a>
|
||||
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
|
||||
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
|
||||
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
|
||||
<a href="/{{version}}/documentation/streams/architecture">Architecture</a>
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/">Developer Guide</a>
|
||||
<a href="/{{version}}/documentation/streams/upgrade-guide">Upgrade</a>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<h3 class="streams_intro">The easiest way to write mission-critical real-time applications and microservices</h3>
|
||||
<p class="streams__description">Kafka Streams is a client library for building applications and microservices, where the input and output data are stored in Kafka clusters. It combines the simplicity of writing and deploying standard Java and Scala applications on the client side with the benefits of Kafka's server-side cluster technology.</p>
|
||||
<div class="video__series__grid">
|
||||
<div class="yt__video__block">
|
||||
<div class="yt__video__inner__block">
|
||||
<iframe class="yt_series video_1 active" style="display:block" src="https://www.youtube.com/embed/Z3JKCLG3VP4?rel=0&showinfo=0&end=602" frameborder="0" allowfullscreen></iframe>
|
||||
<iframe class="yt_series video_2" src="https://www.youtube.com/embed/LxxeXI1mPKo?rel=0&showinfo=0&end=622" frameborder="0" allowfullscreen></iframe>
|
||||
<iframe class="yt_series video_3" src="https://www.youtube.com/embed/7JYEEx7SBuE?rel=0&showinfo=0end=557" frameborder="0" allowfullscreen></iframe>
|
||||
<iframe class="yt_series video_4" src="https://www.youtube.com/embed/3kJgYIkAeHs?rel=0&showinfo=0&end=564" frameborder="0" allowfullscreen></iframe>
|
||||
</div>
|
||||
</div>
|
||||
<div class="video__block">
|
||||
<h3>TOUR OF THE STREAMS API</h3>
|
||||
<div class="video__list">
|
||||
<p class="video__item video_list_1 active" onclick="$('.video__item').removeClass('active'); $(this).addClass('active');$('.yt_series').hide();$('.video_1').show();">
|
||||
<span class="number">1</span><span class="video__text">Intro to Streams</span>
|
||||
</p>
|
||||
<p class="video__item video_list_2" onclick="$('.video__item').removeClass('active'); $(this).addClass('active');$('.yt_series').hide();$('.video_2').show();">
|
||||
<span class="number">2</span><span class="video__text">Creating a Streams Application</span>
|
||||
</p>
|
||||
<p class="video__item video_list_3" onclick="$('.video__item').removeClass('active'); $(this).addClass('active');$('.yt_series').hide();$('.video_3').show();">
|
||||
<span class="number">3</span><span class="video__text">Transforming Data Pt. 1</span>
|
||||
</p>
|
||||
<p class="video__item video_list_4" onclick="$('.video__item').removeClass('active'); $(this).addClass('active');$('.yt_series').hide();$('.video_4').show();">
|
||||
<span class="number">4</span><span class="video__text">Transforming Data Pt. 11</span>
|
||||
</p>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<hr class="separator">
|
||||
<div class="use-item-section">
|
||||
<div class="use__list__sec">
|
||||
<h3>Why you'll love using Kafka Streams!</h3>
|
||||
<ul class="use-feature-list">
|
||||
<li>Elastic, highly scalable, fault-tolerant</li>
|
||||
<li>Deploy to containers, VMs, bare metal, cloud</li>
|
||||
<li>Equally viable for small, medium, & large use cases</li>
|
||||
<li>Fully integrated with Kafka security</li>
|
||||
<li>Write standard Java and Scala applications</li>
|
||||
<li>Exactly-once processing semantics</li>
|
||||
<li>No separate processing cluster required</li>
|
||||
<li>Develop on Mac, Linux, Windows</li>
|
||||
|
||||
</ul>
|
||||
</div>
|
||||
<div class="first__app__cta">
|
||||
<a href="/{{version}}/documentation/streams/tutorial" class="first__app__btn">Write your first app</a>
|
||||
</div>
|
||||
</div>
|
||||
<hr class="separator" id="streams-use-cases">
|
||||
<h3 class="stream__text">Kafka Streams use cases</h3>
|
||||
<div class="customers__grid">
|
||||
<div class="customer__grid">
|
||||
<div class="customer__item streams_logo_grid streams__ny__grid">
|
||||
<a href="https://open.nytimes.com/publishing-with-apache-kafka-at-the-new-york-times-7f0e3b7d2077" target="_blank" class="grid__logo__link">
|
||||
<span class="grid__item__logo" style="background-image: url('/images/powered-by/NYT.jpg');"></span>
|
||||
</a>
|
||||
<p class="grid__item__customer__description extra__space">
|
||||
<a href="https://open.nytimes.com/publishing-with-apache-kafka-at-the-new-york-times-7f0e3b7d2077" target="_blank">The New York Times uses Apache Kafka </a>and the Kafka Streams to store and distribute, in real-time, published content to the various applications and systems that make it available to the readers.
|
||||
</p>
|
||||
</div>
|
||||
</div>
|
||||
<div class="customer__grid">
|
||||
<div class="customer__item streams_logo_grid streams__zalando__grid">
|
||||
<a href="https://www.confluent.io/blog/ranking-websites-real-time-apache-kafkas-streams-api/" target="_blank" class="grid__logo__link">
|
||||
<span class="grid__item__logo" style="background-image: url('/images/powered-by/zalando.jpg');"></span>
|
||||
</a>
|
||||
<p class="grid__item__customer__description extra__space">As the leading online fashion retailer in Europe, Zalando uses Kafka as an ESB (Enterprise Service Bus), which helps us in transitioning from a monolithic to a micro services architecture. Using Kafka for processing
|
||||
<a href="https://www.confluent.io/blog/ranking-websites-real-time-apache-kafkas-streams-api/" target='blank'> event streams</a> enables our technical team to do near-real time business intelligence.
|
||||
</p>
|
||||
</div>
|
||||
</div>
|
||||
<div class="customer__grid">
|
||||
<div class="customer__item streams_logo_grid streams__line__grid">
|
||||
<a href="https://engineering.linecorp.com/en/blog/detail/80" target="_blank" class="grid__logo__link">
|
||||
<span class="grid__item__logo" style="background-image: url('/images/powered-by/line.svg');width:9rem"></span>
|
||||
</a>
|
||||
<p class="grid__item__customer__description extra__space"><a href="https://engineering.linecorp.com/en/blog/detail/80" target="_blank">LINE uses Apache Kafka</a> as a central datahub for our services to communicate to one another. Hundreds of billions of messages are produced daily and are used to execute various business logic, threat detection, search indexing and data analysis. LINE leverages Kafka Streams to reliably transform and filter topics enabling sub topics consumers can efficiently consume, meanwhile retaining easy maintainability thanks to its sophisticated yet minimal code base.</p>
|
||||
</div>
|
||||
</div>
|
||||
<div class="customer__grid">
|
||||
<div class="customer__item streams_logo_grid streams__ny__grid">
|
||||
<a href="https://medium.com/@Pinterest_Engineering/using-kafka-streams-api-for-predictive-budgeting-9f58d206c996" target="_blank" class="grid__logo__link">
|
||||
<span class="grid__item__logo" style="background-image: url('/images/powered-by/pinterest.png');"></span>
|
||||
</a>
|
||||
<p class="grid__item__customer__description">
|
||||
<a href="https://medium.com/@Pinterest_Engineering/using-kafka-streams-api-for-predictive-budgeting-9f58d206c996" target="_blank">Pinterest uses Apache Kafka and the Kafka Streams</a> at large scale to power the real-time, predictive budgeting system of their advertising infrastructure. With Kafka Streams, spend predictions are more accurate than ever.
|
||||
</p>
|
||||
</div>
|
||||
</div>
|
||||
<div class="customer__grid">
|
||||
<div class="customer__item streams_logo_grid streams__rabobank__grid">
|
||||
<a href="https://www.confluent.io/blog/real-time-financial-alerts-rabobank-apache-kafkas-streams-api/" target="_blank" class="grid__logo__link">
|
||||
<span class="grid__item__logo" style="background-image: url('/images/powered-by/rabobank.jpg');"></span>
|
||||
</a>
|
||||
<p class="grid__item__customer__description">Rabobank is one of the 3 largest banks in the Netherlands. Its digital nervous system, the Business Event Bus, is powered by Apache Kafka. It is used by an increasing amount of financial processes and services, one of which is Rabo Alerts. This service alerts customers in real-time upon financial events and is <a href="https://www.confluent.io/blog/real-time-financial-alerts-rabobank-apache-kafkas-streams-api/" target="_blank">built using Kafka Streams.</a></p>
|
||||
</div>
|
||||
</div>
|
||||
<div class="customer__grid">
|
||||
<div class="customer__item streams_logo_grid streams__ny__grid">
|
||||
<a href="https://speakerdeck.com/xenji/kafka-and-debezium-at-trivago-code-dot-talks-2017-edition" target="_blank" class="grid__logo__link">
|
||||
<span class="grid__item__logo" style="background-image: url('/images/powered-by/trivago.png');"></span>
|
||||
</a>
|
||||
<p class="grid__item__customer__description">
|
||||
Trivago is a global hotel search platform. We are focused on reshaping the way travelers search for and compare hotels, while enabling hotel advertisers to grow their businesses by providing access to a broad audience of travelers via our websites and apps. As of 2017, we offer access to approximately 1.8 million hotels and other accommodations in over 190 countries. We use Kafka, Kafka Connect, and Kafka Streams to <a href="https://speakerdeck.com/xenji/kafka-and-debezium-at-trivago-code-dot-talks-2017-edition" target="_blank">enable our developers</a> to access data freely in the company. Kafka Streams powers parts of our analytics pipeline and delivers endless options to explore and operate on the data sources we have at hand.
|
||||
</p>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
<h3 style="margin-top: 5.3rem;">Hello Kafka Streams</h3>
|
||||
<p>The code example below implements a WordCount application that is elastic, highly scalable, fault-tolerant, stateful, and ready to run in production at large scale.</p>
|
||||
|
||||
<div class="code-example">
|
||||
<div class="btn-group">
|
||||
<a class="selected b-java-8" data-section="java-8">Java 8+</a>
|
||||
<a class="b-java-7" data-section="java-7">Java 7</a>
|
||||
<a class="b-scala" data-section="scala">Scala</a>
|
||||
</div>
|
||||
|
||||
<div class="code-example__snippet b-java-8 selected">
|
||||
<pre class="brush: java;">
|
||||
import org.apache.kafka.common.serialization.Serdes;
|
||||
import org.apache.kafka.common.utils.Bytes;
|
||||
import org.apache.kafka.streams.KafkaStreams;
|
||||
import org.apache.kafka.streams.StreamsBuilder;
|
||||
import org.apache.kafka.streams.StreamsConfig;
|
||||
import org.apache.kafka.streams.kstream.KStream;
|
||||
import org.apache.kafka.streams.kstream.KTable;
|
||||
import org.apache.kafka.streams.kstream.Materialized;
|
||||
import org.apache.kafka.streams.kstream.Produced;
|
||||
import org.apache.kafka.streams.state.KeyValueStore;
|
||||
|
||||
import java.util.Arrays;
|
||||
import java.util.Properties;
|
||||
|
||||
public class WordCountApplication {
|
||||
|
||||
public static void main(final String[] args) throws Exception {
|
||||
Properties props = new Properties();
|
||||
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-application");
|
||||
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker1:9092");
|
||||
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
|
||||
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
|
||||
|
||||
StreamsBuilder builder = new StreamsBuilder();
|
||||
KStream<String, String> textLines = builder.stream("TextLinesTopic");
|
||||
KTable<String, Long> wordCounts = textLines
|
||||
.flatMapValues(textLine -> Arrays.asList(textLine.toLowerCase().split("\\W+")))
|
||||
.groupBy((key, word) -> word)
|
||||
.count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store"));
|
||||
wordCounts.toStream().to("WordsWithCountsTopic", Produced.with(Serdes.String(), Serdes.Long()));
|
||||
|
||||
KafkaStreams streams = new KafkaStreams(builder.build(), props);
|
||||
streams.start();
|
||||
}
|
||||
|
||||
}
|
||||
</pre>
|
||||
</div>
|
||||
|
||||
<div class="code-example__snippet b-java-7">
|
||||
<pre class="brush: java;">
|
||||
import org.apache.kafka.common.serialization.Serdes;
|
||||
import org.apache.kafka.common.utils.Bytes;
|
||||
import org.apache.kafka.streams.KafkaStreams;
|
||||
import org.apache.kafka.streams.StreamsBuilder;
|
||||
import org.apache.kafka.streams.StreamsConfig;
|
||||
import org.apache.kafka.streams.kstream.KStream;
|
||||
import org.apache.kafka.streams.kstream.KTable;
|
||||
import org.apache.kafka.streams.kstream.ValueMapper;
|
||||
import org.apache.kafka.streams.kstream.KeyValueMapper;
|
||||
import org.apache.kafka.streams.kstream.Materialized;
|
||||
import org.apache.kafka.streams.kstream.Produced;
|
||||
import org.apache.kafka.streams.state.KeyValueStore;
|
||||
|
||||
import java.util.Arrays;
|
||||
import java.util.Properties;
|
||||
|
||||
public class WordCountApplication {
|
||||
|
||||
public static void main(final String[] args) throws Exception {
|
||||
Properties props = new Properties();
|
||||
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-application");
|
||||
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker1:9092");
|
||||
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
|
||||
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
|
||||
|
||||
StreamsBuilder builder = new StreamsBuilder();
|
||||
KStream<String, String> textLines = builder.stream("TextLinesTopic");
|
||||
KTable<String, Long> wordCounts = textLines
|
||||
.flatMapValues(new ValueMapper<String, Iterable<String>>() {
|
||||
@Override
|
||||
public Iterable<String> apply(String textLine) {
|
||||
return Arrays.asList(textLine.toLowerCase().split("\\W+"));
|
||||
}
|
||||
})
|
||||
.groupBy(new KeyValueMapper<String, String, String>() {
|
||||
@Override
|
||||
public String apply(String key, String word) {
|
||||
return word;
|
||||
}
|
||||
})
|
||||
.count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store"));
|
||||
|
||||
|
||||
wordCounts.toStream().to("WordsWithCountsTopic", Produced.with(Serdes.String(), Serdes.Long()));
|
||||
|
||||
KafkaStreams streams = new KafkaStreams(builder.build(), props);
|
||||
streams.start();
|
||||
}
|
||||
|
||||
}
|
||||
</pre>
|
||||
</div>
|
||||
|
||||
<div class="code-example__snippet b-scala">
|
||||
<pre class="brush: scala;">
|
||||
import java.util.Properties
|
||||
import java.util.concurrent.TimeUnit
|
||||
|
||||
import org.apache.kafka.streams.kstream.Materialized
|
||||
import org.apache.kafka.streams.scala.ImplicitConversions._
|
||||
import org.apache.kafka.streams.scala._
|
||||
import org.apache.kafka.streams.scala.kstream._
|
||||
import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}
|
||||
|
||||
object WordCountApplication extends App {
|
||||
import Serdes._
|
||||
|
||||
val props: Properties = {
|
||||
val p = new Properties()
|
||||
p.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-application")
|
||||
p.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker1:9092")
|
||||
p
|
||||
}
|
||||
|
||||
val builder: StreamsBuilder = new StreamsBuilder
|
||||
val textLines: KStream[String, String] = builder.stream[String, String]("TextLinesTopic")
|
||||
val wordCounts: KTable[String, Long] = textLines
|
||||
.flatMapValues(textLine => textLine.toLowerCase.split("\\W+"))
|
||||
.groupBy((_, word) => word)
|
||||
.count()(Materialized.as("counts-store"))
|
||||
wordCounts.toStream.to("WordsWithCountsTopic")
|
||||
|
||||
val streams: KafkaStreams = new KafkaStreams(builder.build(), props)
|
||||
streams.start()
|
||||
|
||||
sys.ShutdownHookThread {
|
||||
streams.close(10, TimeUnit.SECONDS)
|
||||
}
|
||||
}
|
||||
</pre>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="pagination">
|
||||
<a href="/{{version}}/documentation" class="pagination__btn pagination__btn__prev">Previous</a>
|
||||
<a href="/{{version}}/documentation/streams/quickstart" class="pagination__btn pagination__btn__next">Next</a>
|
||||
</div>
|
||||
|
||||
</script>
|
||||
<!--#include virtual="../../includes/_header.htm" -->
|
||||
<!--#include virtual="../../includes/_top.htm" -->
|
||||
<div class="content documentation documentation--current">
|
||||
<!--#include virtual="../../includes/_nav.htm" -->
|
||||
<div class="right">
|
||||
<!--#include virtual="../../includes/_docs_banner.htm" -->
|
||||
<ul class="breadcrumbs">
|
||||
<li><a href="/documentation">Documentation</a>
|
||||
<li><a href="/documentation/streams">Kafka Streams</a></li>
|
||||
</li>
|
||||
</ul>
|
||||
<div class="p-streams"></div>
|
||||
</div>
|
||||
</div>
|
||||
<!--#include virtual="../../includes/_footer.htm" -->
|
||||
<script>
|
||||
$(function() {
|
||||
|
||||
// Show selected style on nav item
|
||||
$('.b-nav__streams').addClass('selected');
|
||||
|
||||
$('.video_list_1').click(function(){
|
||||
$('.video_2').attr('src', $('.video_2').attr('src'));
|
||||
$('.video_3').attr('src', $('.video_3').attr('src'));
|
||||
$('.video_4').attr('src', $('.video_4').attr('src'));
|
||||
|
||||
});
|
||||
|
||||
$('.video_list_2').click(function(){
|
||||
$('.video_1').attr('src', $('.video_1').attr('src'));
|
||||
$('.video_3').attr('src', $('.video_3').attr('src'));
|
||||
$('.video_4').attr('src', $('.video_4').attr('src'));
|
||||
|
||||
});
|
||||
|
||||
$('.video_list_3').click(function(){
|
||||
$('.video_1').attr('src', $('.video_1').attr('src'));
|
||||
$('.video_2').attr('src', $('.video_2').attr('src'));
|
||||
$('.video_4').attr('src', $('.video_4').attr('src'));
|
||||
});
|
||||
|
||||
$('.video_list_4').click(function(){
|
||||
$('.video_1').attr('src', $('.video_1').attr('src'));
|
||||
$('.video_2').attr('src', $('.video_2').attr('src'));
|
||||
$('.video_3').attr('src', $('.video_3').attr('src'));
|
||||
});
|
||||
|
||||
|
||||
//sticky secondary nav
|
||||
var $navbar = $(".sub-nav-sticky"),
|
||||
y_pos = $navbar.offset().top,
|
||||
height = $navbar.height();
|
||||
|
||||
$(window).scroll(function() {
|
||||
var scrollTop = $(window).scrollTop();
|
||||
|
||||
if (scrollTop > y_pos - height) {
|
||||
$navbar.addClass("navbar-fixed")
|
||||
} else if (scrollTop <= y_pos) {
|
||||
$navbar.removeClass("navbar-fixed")
|
||||
}
|
||||
});
|
||||
|
||||
// Display docs subnav items
|
||||
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
|
||||
// Show selected code example
|
||||
$('.btn-group a').click(function(){
|
||||
var targetClass = '.b-' + $(this).data().section;
|
||||
$('.code-example__snippet, .btn-group a').removeClass('selected');
|
||||
$(targetClass).addClass('selected');
|
||||
});
|
||||
});
|
||||
</script>
|
||||
391
docs/streams/quickstart.html
Normal file
@@ -0,0 +1,391 @@
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
contributor license agreements. See the NOTICE file distributed with
|
||||
this work for additional information regarding copyright ownership.
|
||||
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
(the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
<script><!--#include virtual="../js/templateData.js" --></script>
|
||||
|
||||
<script id="content-template" type="text/x-handlebars-template">
|
||||
|
||||
<h1>Run Kafka Streams Demo Application</h1>
|
||||
<div class="sub-nav-sticky">
|
||||
<div class="sticky-top">
|
||||
<div style="height:35px">
|
||||
<a href="/{{version}}/documentation/streams/">Introduction</a>
|
||||
<a class="active-menu-item" href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
|
||||
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
|
||||
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
|
||||
<a href="/{{version}}/documentation/streams/architecture">Architecture</a>
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/">Developer Guide</a>
|
||||
<a href="/{{version}}/documentation/streams/upgrade-guide">Upgrade</a>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<p>
|
||||
This tutorial assumes you are starting fresh and have no existing Kafka or ZooKeeper data. However, if you have already started Kafka and
|
||||
ZooKeeper, feel free to skip the first two steps.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Kafka Streams is a client library for building mission-critical real-time applications and microservices,
|
||||
where the input and/or output data is stored in Kafka clusters. Kafka Streams combines the simplicity of
|
||||
writing and deploying standard Java and Scala applications on the client side with the benefits of Kafka's
|
||||
server-side cluster technology to make these applications highly scalable, elastic, fault-tolerant, distributed,
|
||||
and much more.
|
||||
</p>
|
||||
<p>
|
||||
This quickstart example will demonstrate how to run a streaming application coded in this library. Here is the gist
|
||||
of the <code><a href="https://github.com/apache/kafka/blob/{{dotVersion}}/streams/examples/src/main/java/org/apache/kafka/streams/examples/wordcount/WordCountDemo.java">WordCountDemo</a></code> example code (converted to use Java 8 lambda expressions for easy reading).
|
||||
</p>
|
||||
<pre class="brush: java;">
|
||||
// Serializers/deserializers (serde) for String and Long types
|
||||
final Serde<String> stringSerde = Serdes.String();
|
||||
final Serde<Long> longSerde = Serdes.Long();
|
||||
|
||||
// Construct a `KStream` from the input topic "streams-plaintext-input", where message values
|
||||
// represent lines of text (for the sake of this example, we ignore whatever may be stored
|
||||
// in the message keys).
|
||||
KStream<String, String> textLines = builder.stream(
|
||||
"streams-plaintext-input",
|
||||
Consumed.with(stringSerde, stringSerde)
|
||||
);
|
||||
|
||||
KTable<String, Long> wordCounts = textLines
|
||||
// Split each text line, by whitespace, into words.
|
||||
.flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
|
||||
|
||||
// Group the text words as message keys
|
||||
.groupBy((key, value) -> value)
|
||||
|
||||
// Count the occurrences of each word (message key).
|
||||
.count();
|
||||
|
||||
// Store the running counts as a changelog stream to the output topic.
|
||||
wordCounts.toStream().to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
It implements the WordCount
|
||||
algorithm, which computes a word occurrence histogram from the input text. However, unlike other WordCount examples
|
||||
you might have seen before that operate on bounded data, the WordCount demo application behaves slightly differently because it is
|
||||
designed to operate on an <b>infinite, unbounded stream</b> of data. Similar to the bounded variant, it is a stateful algorithm that
|
||||
tracks and updates the counts of words. However, since it must assume potentially
|
||||
unbounded input data, it will periodically output its current state and results while continuing to process more data
|
||||
because it cannot know when it has processed "all" the input data.
|
||||
</p>
|
||||
<p>
|
||||
As the first step, we will start Kafka (unless you already have it started) and then we will
|
||||
prepare input data to a Kafka topic, which will subsequently be processed by a Kafka Streams application.
|
||||
</p>
|
||||
|
||||
<h4><a id="quickstart_streams_download" href="#quickstart_streams_download">Step 1: Download the code</a></h4>
|
||||
|
||||
<a href="https://www.apache.org/dyn/closer.cgi?path=/kafka/{{fullDotVersion}}/kafka_{{scalaVersion}}-{{fullDotVersion}}.tgz" title="Kafka downloads">Download</a> the {{fullDotVersion}} release and un-tar it.
|
||||
Note that there are multiple downloadable Scala versions and we choose to use the recommended version ({{scalaVersion}}) here:
|
||||
|
||||
<pre class="brush: bash;">
|
||||
> tar -xzf kafka_{{scalaVersion}}-{{fullDotVersion}}.tgz
|
||||
> cd kafka_{{scalaVersion}}-{{fullDotVersion}}
|
||||
</pre>
|
||||
|
||||
<h4><a id="quickstart_streams_startserver" href="#quickstart_streams_startserver">Step 2: Start the Kafka server</a></h4>
|
||||
|
||||
<p>
|
||||
Kafka uses <a href="https://zookeeper.apache.org/">ZooKeeper</a> so you need to first start a ZooKeeper server if you don't already have one. You can use the convenience script packaged with kafka to get a quick-and-dirty single-node ZooKeeper instance.
|
||||
</p>
|
||||
|
||||
<pre class="brush: bash;">
|
||||
> bin/zookeeper-server-start.sh config/zookeeper.properties
|
||||
[2013-04-22 15:01:37,495] INFO Reading configuration from: config/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
|
||||
...
|
||||
</pre>
|
||||
|
||||
<p>Now start the Kafka server:</p>
|
||||
<pre class="brush: bash;">
|
||||
> bin/kafka-server-start.sh config/server.properties
|
||||
[2013-04-22 15:01:47,028] INFO Verifying properties (kafka.utils.VerifiableProperties)
|
||||
[2013-04-22 15:01:47,051] INFO Property socket.send.buffer.bytes is overridden to 1048576 (kafka.utils.VerifiableProperties)
|
||||
...
|
||||
</pre>
|
||||
|
||||
|
||||
<h4><a id="quickstart_streams_prepare" href="#quickstart_streams_prepare">Step 3: Prepare input topic and start Kafka producer</a></h4>
|
||||
|
||||
<!--
|
||||
|
||||
<pre class="brush: bash;">
|
||||
> echo -e "all streams lead to kafka\nhello kafka streams\njoin kafka summit" > file-input.txt
|
||||
</pre>
|
||||
Or on Windows:
|
||||
<pre class="brush: bash;">
|
||||
> echo all streams lead to kafka> file-input.txt
|
||||
> echo hello kafka streams>> file-input.txt
|
||||
> echo|set /p=join kafka summit>> file-input.txt
|
||||
</pre>
|
||||
|
||||
-->
|
||||
|
||||
Next, we create the input topic named <b>streams-plaintext-input</b> and the output topic named <b>streams-wordcount-output</b>:
|
||||
|
||||
<pre class="brush: bash;">
|
||||
> bin/kafka-topics.sh --create \
|
||||
--bootstrap-server localhost:9092 \
|
||||
--replication-factor 1 \
|
||||
--partitions 1 \
|
||||
--topic streams-plaintext-input
|
||||
Created topic "streams-plaintext-input".
|
||||
</pre>
|
||||
|
||||
Note: we create the output topic with compaction enabled because the output stream is a changelog stream
|
||||
(cf. <a href="#anchor-changelog-output">explanation of application output</a> below).
|
||||
|
||||
<pre class="brush: bash;">
|
||||
> bin/kafka-topics.sh --create \
|
||||
--bootstrap-server localhost:9092 \
|
||||
--replication-factor 1 \
|
||||
--partitions 1 \
|
||||
--topic streams-wordcount-output \
|
||||
--config cleanup.policy=compact
|
||||
Created topic "streams-wordcount-output".
|
||||
</pre>
|
||||
|
||||
The created topics can be described with the same <b>kafka-topics</b> tool:
|
||||
|
||||
<pre class="brush: bash;">
|
||||
> bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe
|
||||
|
||||
Topic:streams-wordcount-output PartitionCount:1 ReplicationFactor:1 Configs:cleanup.policy=compact,segment.bytes=1073741824
|
||||
Topic: streams-wordcount-output Partition: 0 Leader: 0 Replicas: 0 Isr: 0
|
||||
Topic:streams-plaintext-input PartitionCount:1 ReplicationFactor:1 Configs:segment.bytes=1073741824
|
||||
Topic: streams-plaintext-input Partition: 0 Leader: 0 Replicas: 0 Isr: 0
|
||||
</pre>
|
||||
|
||||
<h4><a id="quickstart_streams_start" href="#quickstart_streams_start">Step 4: Start the Wordcount Application</a></h4>
|
||||
|
||||
The following command starts the WordCount demo application:
|
||||
|
||||
<pre class="brush: bash;">
|
||||
> bin/kafka-run-class.sh org.apache.kafka.streams.examples.wordcount.WordCountDemo
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
The demo application will read from the input topic <b>streams-plaintext-input</b>, perform the computations of the WordCount algorithm on each of the read messages,
|
||||
and continuously write its current results to the output topic <b>streams-wordcount-output</b>.
|
||||
Hence there won't be any STDOUT output except log entries as the results are written back into Kafka.
|
||||
</p>
|
||||
|
||||
Now we can start the console producer in a separate terminal to write some input data to this topic:
|
||||
|
||||
<pre class="brush: bash;">
|
||||
> bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic streams-plaintext-input
|
||||
</pre>
|
||||
|
||||
and inspect the output of the WordCount demo application by reading from its output topic with the console consumer in a separate terminal:
|
||||
|
||||
<pre class="brush: bash;">
|
||||
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
|
||||
--topic streams-wordcount-output \
|
||||
--from-beginning \
|
||||
--formatter kafka.tools.DefaultMessageFormatter \
|
||||
--property print.key=true \
|
||||
--property print.value=true \
|
||||
--property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
|
||||
--property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer
|
||||
</pre>
|
||||
|
||||
|
||||
<h4><a id="quickstart_streams_process" href="#quickstart_streams_process">Step 5: Process some data</a></h4>
|
||||
|
||||
Now let's write some messages with the console producer into the input topic <b>streams-plaintext-input</b> by entering a single line of text and then hitting <RETURN>.
|
||||
This will send a new message to the input topic, where the message key is null and the message value is the string encoded text line that you just entered
|
||||
(in practice, input data for applications will typically be streaming continuously into Kafka, rather than being manually entered as we do in this quickstart):
|
||||
|
||||
<pre class="brush: bash;">
|
||||
> bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic streams-plaintext-input
|
||||
all streams lead to kafka
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
This message will be processed by the Wordcount application and the following output data will be written to the <b>streams-wordcount-output</b> topic and printed by the console consumer:
|
||||
</p>
|
||||
|
||||
<pre class="brush: bash;">
|
||||
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
|
||||
--topic streams-wordcount-output \
|
||||
--from-beginning \
|
||||
--formatter kafka.tools.DefaultMessageFormatter \
|
||||
--property print.key=true \
|
||||
--property print.value=true \
|
||||
--property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
|
||||
--property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer
|
||||
|
||||
all 1
|
||||
streams 1
|
||||
lead 1
|
||||
to 1
|
||||
kafka 1
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
Here, the first column is the Kafka message key in <code>java.lang.String</code> format and represents a word that is being counted, and the second column is the message value in <code>java.lang.Long</code> format, representing the word's latest count.
|
||||
</p>
|
||||
|
||||
Now let's continue writing one more message with the console producer into the input topic <b>streams-plaintext-input</b>.
|
||||
Enter the text line "hello kafka streams" and hit <RETURN>.
|
||||
Your terminal should look as follows:
|
||||
|
||||
<pre class="brush: bash;">
|
||||
> bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic streams-plaintext-input
|
||||
all streams lead to kafka
|
||||
hello kafka streams
|
||||
</pre>
|
||||
|
||||
In your other terminal in which the console consumer is running, you will observe that the WordCount application wrote new output data:
|
||||
|
||||
<pre class="brush: bash;">
|
||||
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
|
||||
--topic streams-wordcount-output \
|
||||
--from-beginning \
|
||||
--formatter kafka.tools.DefaultMessageFormatter \
|
||||
--property print.key=true \
|
||||
--property print.value=true \
|
||||
--property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
|
||||
--property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer
|
||||
|
||||
all 1
|
||||
streams 1
|
||||
lead 1
|
||||
to 1
|
||||
kafka 1
|
||||
hello 1
|
||||
kafka 2
|
||||
streams 2
|
||||
</pre>
|
||||
|
||||
Here the last printed lines <b>kafka 2</b> and <b>streams 2</b> indicate updates to the keys <b>kafka</b> and <b>streams</b> whose counts have been incremented from <b>1</b> to <b>2</b>.
|
||||
Whenever you write further input messages to the input topic, you will observe new messages being added to the <b>streams-wordcount-output</b> topic,
|
||||
representing the most recent word counts as computed by the WordCount application.
|
||||
Let's enter one final input text line "join kafka summit" and hit <RETURN> in the console producer to the input topic <b>streams-plaintext-input</b> before we wrap up this quickstart:
|
||||
|
||||
<pre class="brush: bash;">
|
||||
> bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic streams-plaintext-input
|
||||
all streams lead to kafka
|
||||
hello kafka streams
|
||||
join kafka summit
|
||||
</pre>
|
||||
|
||||
<a name="anchor-changelog-output"></a>
|
||||
The <b>streams-wordcount-output</b> topic will subsequently show the corresponding updated word counts (see last three lines):
|
||||
|
||||
<pre class="brush: bash;">
|
||||
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
|
||||
--topic streams-wordcount-output \
|
||||
--from-beginning \
|
||||
--formatter kafka.tools.DefaultMessageFormatter \
|
||||
--property print.key=true \
|
||||
--property print.value=true \
|
||||
--property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
|
||||
--property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer
|
||||
|
||||
all 1
|
||||
streams 1
|
||||
lead 1
|
||||
to 1
|
||||
kafka 1
|
||||
hello 1
|
||||
kafka 2
|
||||
streams 2
|
||||
join 1
|
||||
kafka 3
|
||||
summit 1
|
||||
</pre>
|
||||
|
||||
As one can see, the output of the WordCount application is actually a continuous stream of updates, where each output record (i.e. each line in the original output above) is
an updated count of a single word (i.e. a record key such as "kafka"). For multiple records with the same key, each later record is an update of the previous one.
|
||||
|
||||
<p>
|
||||
The two diagrams below illustrate what is essentially happening behind the scenes.
|
||||
The first column shows the evolution of the current state of the <code>KTable<String, Long></code> that is counting word occurrences via <code>count()</code>.
|
||||
The second column shows the change records that result from state updates to the KTable and that are being sent to the output Kafka topic <b>streams-wordcount-output</b>.
|
||||
</p>
|
||||
|
||||
<img src="/{{version}}/images/streams-table-updates-02.png" style="float: right; width: 25%;">
|
||||
<img src="/{{version}}/images/streams-table-updates-01.png" style="float: right; width: 25%;">
|
||||
|
||||
<p>
|
||||
First the text line "all streams lead to kafka" is being processed.
|
||||
The <code>KTable</code> is being built up as each new word results in a new table entry (highlighted with a green background), and a corresponding change record is sent to the downstream <code>KStream</code>.
|
||||
</p>
|
||||
<p>
|
||||
When the second text line "hello kafka streams" is processed, we observe, for the first time, that existing entries in the <code>KTable</code> are being updated (here: for the words "kafka" and for "streams"). And again, change records are being sent to the output topic.
|
||||
</p>
|
||||
<p>
|
||||
And so on (we skip the illustration of how the third line is being processed). This explains why the output topic has the contents we showed above, because it contains the full record of changes.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Looking beyond the scope of this concrete example, what Kafka Streams is doing here is to leverage the duality between a table and a changelog stream (here: table = the KTable, changelog stream = the downstream KStream): you can publish every change of the table to a stream, and if you consume the entire changelog stream from beginning to end, you can reconstruct the contents of the table.
|
||||
</p>
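<p>As a rough sketch of that duality in code (reusing the <code>builder</code> and <code>wordCounts</code> from the gist above; the explicit serdes are an assumption and could instead come from the default configuration):</p>
<pre class="brush: java;">
// Every update to the KTable is already published as a change record to the output topic:
wordCounts.toStream().to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));

// Reading that changelog topic back as a table reconstructs the current word counts:
KTable<String, Long> rebuiltCounts =
    builder.table("streams-wordcount-output", Consumed.with(Serdes.String(), Serdes.Long()));
</pre>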
|
||||
|
||||
<h4><a id="quickstart_streams_stop" href="#quickstart_streams_stop">Step 6: Teardown the application</a></h4>
|
||||
|
||||
<p>You can now stop the console consumer, the console producer, the Wordcount application, the Kafka broker and the ZooKeeper server in order via <b>Ctrl-C</b>.</p>
|
||||
|
||||
<div class="pagination">
|
||||
<a href="/{{version}}/documentation/streams" class="pagination__btn pagination__btn__prev">Previous</a>
|
||||
<a href="/{{version}}/documentation/streams/tutorial" class="pagination__btn pagination__btn__next">Next</a>
|
||||
</div>
|
||||
</script>
|
||||
|
||||
<div class="p-quickstart-streams"></div>
|
||||
|
||||
<!--#include virtual="../../includes/_header.htm" -->
|
||||
<!--#include virtual="../../includes/_top.htm" -->
|
||||
<div class="content documentation documentation--current">
|
||||
<!--#include virtual="../../includes/_nav.htm" -->
|
||||
<div class="right">
|
||||
<!--#include virtual="../../includes/_docs_banner.htm" -->
|
||||
<ul class="breadcrumbs">
|
||||
<li><a href="/documentation">Documentation</a></li>
|
||||
<li><a href="/documentation/streams">Kafka Streams</a></li>
|
||||
</ul>
|
||||
<div class="p-content"></div>
|
||||
</div>
|
||||
</div>
|
||||
<!--#include virtual="../../includes/_footer.htm" -->
|
||||
<script>
|
||||
$(function() {
|
||||
// Show selected style on nav item
|
||||
$('.b-nav__streams').addClass('selected');
|
||||
|
||||
|
||||
//sticky secondary nav
|
||||
var $navbar = $(".sub-nav-sticky"),
|
||||
y_pos = $navbar.offset().top,
|
||||
height = $navbar.height();
|
||||
|
||||
$(window).scroll(function() {
|
||||
var scrollTop = $(window).scrollTop();
|
||||
|
||||
if (scrollTop > y_pos - height) {
|
||||
$navbar.addClass("navbar-fixed")
|
||||
} else if (scrollTop <= y_pos) {
|
||||
$navbar.removeClass("navbar-fixed")
|
||||
}
|
||||
});
|
||||
|
||||
// Display docs subnav items
|
||||
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
|
||||
});
|
||||
</script>
|
||||
667
docs/streams/tutorial.html
Normal file
@@ -0,0 +1,667 @@
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
contributor license agreements. See the NOTICE file distributed with
|
||||
this work for additional information regarding copyright ownership.
|
||||
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
(the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
<script><!--#include virtual="../js/templateData.js" --></script>
|
||||
|
||||
<script id="content-template" type="text/x-handlebars-template">
|
||||
<h1>Tutorial: Write a Kafka Streams Application</h1>
|
||||
<div class="sub-nav-sticky">
|
||||
<div class="sticky-top">
|
||||
<div style="height:35px">
|
||||
<a href="/{{version}}/documentation/streams/">Introduction</a>
|
||||
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
|
||||
<a class="active-menu-item" href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
|
||||
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
|
||||
<a href="/{{version}}/documentation/streams/architecture">Architecture</a>
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/">Developer Guide</a>
|
||||
<a href="/{{version}}/documentation/streams/upgrade-guide">Upgrade</a>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<p>
|
||||
In this guide we will start from scratch on setting up your own project to write a stream processing application using Kafka Streams.
|
||||
It is highly recommended to read the <a href="/{{version}}/documentation/streams/quickstart">quickstart</a> first on how to run a Streams application written in Kafka Streams if you have not done so.
|
||||
</p>
|
||||
|
||||
<h4><a id="tutorial_maven_setup" href="#tutorial_maven_setup">Setting up a Maven Project</a></h4>
|
||||
|
||||
<p>
|
||||
We are going to use a Kafka Streams Maven Archetype for creating a Streams project structure with the following commands:
|
||||
</p>
|
||||
|
||||
<pre class="brush: bash;">
|
||||
mvn archetype:generate \
|
||||
-DarchetypeGroupId=org.apache.kafka \
|
||||
-DarchetypeArtifactId=streams-quickstart-java \
|
||||
-DarchetypeVersion={{fullDotVersion}} \
|
||||
-DgroupId=streams.examples \
|
||||
-DartifactId=streams.examples \
|
||||
-Dversion=0.1 \
|
||||
-Dpackage=myapps
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
You can use a different value for <code>groupId</code>, <code>artifactId</code> and <code>package</code> parameters if you like.
|
||||
Assuming the above parameter values are used, this command will create a project structure that looks like this:
|
||||
</p>
|
||||
|
||||
<pre class="brush: bash;">
|
||||
> tree streams.examples
|
||||
streams-quickstart
|
||||
|-- pom.xml
|
||||
|-- src
|
||||
|-- main
|
||||
|-- java
|
||||
| |-- myapps
|
||||
| |-- LineSplit.java
|
||||
| |-- Pipe.java
|
||||
| |-- WordCount.java
|
||||
|-- resources
|
||||
|-- log4j.properties
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
The <code>pom.xml</code> file included in the project already has the Streams dependency defined.
|
||||
Note that the generated <code>pom.xml</code> targets Java 8 and does not work with higher Java versions.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
There are already several example programs written with the Streams library under <code>src/main/java</code>.
|
||||
Since we are going to start writing such programs from scratch, we can now delete these examples:
|
||||
</p>
|
||||
|
||||
<pre class="brush: bash;">
|
||||
> cd streams-quickstart
|
||||
> rm src/main/java/myapps/*.java
|
||||
</pre>
|
||||
|
||||
<h4><a id="tutorial_code_pipe" href="#tutorial_code_pipe">Writing a first Streams application: Pipe</a></h4>
|
||||
|
||||
It's coding time now! Feel free to open your favorite IDE and import this Maven project, or simply open a text editor and create a Java file under <code>src/main/java/myapps</code>.
|
||||
Let's name it <code>Pipe.java</code>:
|
||||
|
||||
<pre class="brush: java;">
|
||||
package myapps;
|
||||
|
||||
public class Pipe {
|
||||
|
||||
public static void main(String[] args) throws Exception {
|
||||
|
||||
}
|
||||
}
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
We are going to fill in the <code>main</code> function to write this pipe program. Note that we will not list the import statements as we go, since IDEs can usually add them automatically.
However, if you are using a text editor you need to add the imports manually; at the end of this section we'll show the complete code snippet with import statements for you.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The first step to write a Streams application is to create a <code>java.util.Properties</code> map to specify different Streams execution configuration values as defined in <code>StreamsConfig</code>.
|
||||
A couple of important configuration values you need to set are: <code>StreamsConfig.BOOTSTRAP_SERVERS_CONFIG</code>, which specifies a list of host/port pairs to use for establishing the initial connection to the Kafka cluster,
|
||||
and <code>StreamsConfig.APPLICATION_ID_CONFIG</code>, which gives the unique identifier of your Streams application to distinguish it from other applications talking to the same Kafka cluster:
|
||||
</p>
|
||||
|
||||
<pre class="brush: java;">
|
||||
Properties props = new Properties();
|
||||
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-pipe");
|
||||
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assuming that the Kafka broker this application is talking to runs on local machine with port 9092
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
In addition, you can customize other configurations in the same map, for example, default serialization and deserialization libraries for the record key-value pairs:
|
||||
</p>
|
||||
|
||||
<pre class="brush: java;">
|
||||
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
|
||||
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
For a full list of configurations of Kafka Streams please refer to this <a href="/{{version}}/documentation/#streamsconfigs">table</a>.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Next we will define the computational logic of our Streams application.
|
||||
In Kafka Streams this computational logic is defined as a <code>topology</code> of connected processor nodes.
|
||||
We can use a topology builder to construct such a topology,
|
||||
</p>
|
||||
|
||||
<pre class="brush: java;">
|
||||
final StreamsBuilder builder = new StreamsBuilder();
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
And then create a source stream from a Kafka topic named <code>streams-plaintext-input</code> using this topology builder:
|
||||
</p>
|
||||
|
||||
<pre class="brush: java;">
|
||||
KStream<String, String> source = builder.stream("streams-plaintext-input");
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
Now we get a <code>KStream</code> that is continuously generating records from its source Kafka topic <code>streams-plaintext-input</code>.
|
||||
The records are organized as <code>String</code> typed key-value pairs.
|
||||
The simplest thing we can do with this stream is to write it into another Kafka topic, say it's named <code>streams-pipe-output</code>:
|
||||
</p>
|
||||
|
||||
<pre class="brush: java;">
|
||||
source.to("streams-pipe-output");
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
Note that we can also concatenate the above two lines into a single line as:
|
||||
</p>
|
||||
|
||||
<pre class="brush: java;">
|
||||
builder.stream("streams-plaintext-input").to("streams-pipe-output");
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
We can inspect what kind of <code>topology</code> is created from this builder by doing the following:
|
||||
</p>
|
||||
|
||||
<pre class="brush: java;">
|
||||
final Topology topology = builder.build();
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
And print its description to standard output as:
|
||||
</p>
|
||||
|
||||
<pre class="brush: java;">
|
||||
System.out.println(topology.describe());
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
If we just stop here, compile and run the program, it will output the following information:
|
||||
</p>
|
||||
|
||||
<pre class="brush: bash;">
|
||||
> mvn clean package
|
||||
> mvn exec:java -Dexec.mainClass=myapps.Pipe
|
||||
Sub-topologies:
|
||||
Sub-topology: 0
|
||||
Source: KSTREAM-SOURCE-0000000000(topics: streams-plaintext-input) --> KSTREAM-SINK-0000000001
|
||||
Sink: KSTREAM-SINK-0000000001(topic: streams-pipe-output) <-- KSTREAM-SOURCE-0000000000
|
||||
Global Stores:
|
||||
none
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
As shown above, it illustrates that the constructed topology has two processor nodes, a source node <code>KSTREAM-SOURCE-0000000000</code> and a sink node <code>KSTREAM-SINK-0000000001</code>.
|
||||
<code>KSTREAM-SOURCE-0000000000</code> continuously reads records from the Kafka topic <code>streams-plaintext-input</code> and pipes them to its downstream node <code>KSTREAM-SINK-0000000001</code>;
<code>KSTREAM-SINK-0000000001</code> will write each of its received records in order to another Kafka topic <code>streams-pipe-output</code>
(the <code>--></code> and <code><--</code> arrows indicate the downstream and upstream processor nodes of this node, i.e. "children" and "parents" within the topology graph).
|
||||
It also illustrates that this simple topology has no global state stores associated with it (we will talk about state stores more in the following sections).
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Note that we can always describe the topology as we did above at any given point while we are building it in the code, so as a user you can interactively "try and taste" your computational logic defined in the topology until you are happy with it.
|
||||
Assuming we are already done with this simple topology that just pipes data from one Kafka topic to another in an endless streaming manner,
|
||||
we can now construct the Streams client with the two components we have just constructed above: the configuration map specified in a <code>java.util.Properties</code> instance and the <code>Topology</code> object.
|
||||
</p>
|
||||
|
||||
<pre class="brush: java;">
|
||||
final KafkaStreams streams = new KafkaStreams(topology, props);
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
By calling its <code>start()</code> function we can trigger the execution of this client.
|
||||
The execution won't stop until <code>close()</code> is called on this client.
|
||||
We can, for example, add a shutdown hook with a countdown latch to capture a user interrupt and close the client upon terminating this program:
|
||||
</p>
|
||||
|
||||
<pre class="brush: java;">
|
||||
final CountDownLatch latch = new CountDownLatch(1);
|
||||
|
||||
// attach shutdown handler to catch control-c
|
||||
Runtime.getRuntime().addShutdownHook(new Thread("streams-shutdown-hook") {
|
||||
@Override
|
||||
public void run() {
|
||||
streams.close();
|
||||
latch.countDown();
|
||||
}
|
||||
});
|
||||
|
||||
try {
|
||||
streams.start();
|
||||
latch.await();
|
||||
} catch (Throwable e) {
|
||||
System.exit(1);
|
||||
}
|
||||
System.exit(0);
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
The complete code so far looks like this:
|
||||
</p>
|
||||
|
||||
<pre class="brush: java;">
|
||||
package myapps;
|
||||
|
||||
import org.apache.kafka.common.serialization.Serdes;
|
||||
import org.apache.kafka.streams.KafkaStreams;
|
||||
import org.apache.kafka.streams.StreamsBuilder;
|
||||
import org.apache.kafka.streams.StreamsConfig;
|
||||
import org.apache.kafka.streams.Topology;
|
||||
|
||||
import java.util.Properties;
|
||||
import java.util.concurrent.CountDownLatch;
|
||||
|
||||
public class Pipe {
|
||||
|
||||
public static void main(String[] args) throws Exception {
|
||||
Properties props = new Properties();
|
||||
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-pipe");
|
||||
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
|
||||
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
|
||||
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
|
||||
|
||||
final StreamsBuilder builder = new StreamsBuilder();
|
||||
|
||||
builder.stream("streams-plaintext-input").to("streams-pipe-output");
|
||||
|
||||
final Topology topology = builder.build();
|
||||
|
||||
final KafkaStreams streams = new KafkaStreams(topology, props);
|
||||
final CountDownLatch latch = new CountDownLatch(1);
|
||||
|
||||
// attach shutdown handler to catch control-c
|
||||
Runtime.getRuntime().addShutdownHook(new Thread("streams-shutdown-hook") {
|
||||
@Override
|
||||
public void run() {
|
||||
streams.close();
|
||||
latch.countDown();
|
||||
}
|
||||
});
|
||||
|
||||
try {
|
||||
streams.start();
|
||||
latch.await();
|
||||
} catch (Throwable e) {
|
||||
System.exit(1);
|
||||
}
|
||||
System.exit(0);
|
||||
}
|
||||
}
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
If you already have the Kafka broker up and running at <code>localhost:9092</code>,
|
||||
and the topics <code>streams-plaintext-input</code> and <code>streams-pipe-output</code> created on that broker,
|
||||
you can run this code in your IDE or on the command line, using Maven:
|
||||
</p>
|
||||
|
||||
<pre class="brush: bash;">
|
||||
> mvn clean package
|
||||
> mvn exec:java -Dexec.mainClass=myapps.Pipe
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
For detailed instructions on how to run a Streams application and observe its computing results,
|
||||
please read the <a href="/{{version}}/documentation/streams/quickstart">Play with a Streams Application</a> section.
|
||||
We will not talk about this in the rest of this section.
|
||||
</p>
|
||||
|
||||
<h4><a id="tutorial_code_linesplit" href="#tutorial_code_linesplit">Writing a second Streams application: Line Split</a></h4>
|
||||
|
||||
<p>
|
||||
We have learned how to construct a Streams client with its two key components: the <code>StreamsConfig</code> and <code>Topology</code>.
|
||||
Now let's move on to add some real processing logic by augmenting the current topology.
|
||||
We can create another program by copying the existing <code>Pipe.java</code> class:
|
||||
</p>
|
||||
|
||||
<pre class="brush: bash;">
|
||||
> cp src/main/java/myapps/Pipe.java src/main/java/myapps/LineSplit.java
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
Then change its class name as well as the application id config to distinguish it from the original program:
|
||||
</p>
|
||||
|
||||
<pre class="brush: java;">
|
||||
public class LineSplit {
|
||||
|
||||
public static void main(String[] args) throws Exception {
|
||||
Properties props = new Properties();
|
||||
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-linesplit");
|
||||
// ...
|
||||
}
|
||||
}
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
Since each record of the source stream is a <code>String</code>-typed key-value pair,
|
||||
let's treat the value string as a text line and split it into words with a <code>FlatMapValues</code> operator:
|
||||
</p>
|
||||
|
||||
<pre class="brush: java;">
|
||||
KStream<String, String> source = builder.stream("streams-plaintext-input");
|
||||
KStream<String, String> words = source.flatMapValues(new ValueMapper<String, Iterable<String>>() {
|
||||
@Override
|
||||
public Iterable<String> apply(String value) {
|
||||
return Arrays.asList(value.split("\\W+"));
|
||||
}
|
||||
});
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
The operator will take the <code>source</code> stream as its input and generate a new stream named <code>words</code>
by processing each record from its source stream in order, breaking its value string into a list of words, and producing
each word as a new record to the output <code>words</code> stream.
This is a stateless operator that does not need to keep track of any previously received records or processed results.
Note that if you are using JDK 8 you can use a lambda expression and simplify the above code as:
|
||||
</p>
|
||||
|
||||
<pre class="brush: java;">
|
||||
KStream<String, String> source = builder.stream("streams-plaintext-input");
|
||||
KStream<String, String> words = source.flatMapValues(value -> Arrays.asList(value.split("\\W+")));
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
And finally we can write the word stream back into another Kafka topic, say <code>streams-linesplit-output</code>.
|
||||
Again, these two steps can be concatenated as the following (assuming lambda expression is used):
|
||||
</p>
|
||||
|
||||
<pre class="brush: java;">
|
||||
KStream<String, String> source = builder.stream("streams-plaintext-input");
|
||||
source.flatMapValues(value -> Arrays.asList(value.split("\\W+")))
|
||||
.to("streams-linesplit-output");
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
If we now describe this augmented topology via <code>System.out.println(topology.describe())</code>, we will get the following:
|
||||
</p>
|
||||
|
||||
<pre class="brush: bash;">
|
||||
> mvn clean package
|
||||
> mvn exec:java -Dexec.mainClass=myapps.LineSplit
|
||||
Sub-topologies:
|
||||
Sub-topology: 0
|
||||
Source: KSTREAM-SOURCE-0000000000(topics: streams-plaintext-input) --> KSTREAM-FLATMAPVALUES-0000000001
|
||||
Processor: KSTREAM-FLATMAPVALUES-0000000001(stores: []) --> KSTREAM-SINK-0000000002 <-- KSTREAM-SOURCE-0000000000
|
||||
Sink: KSTREAM-SINK-0000000002(topic: streams-linesplit-output) <-- KSTREAM-FLATMAPVALUES-0000000001
|
||||
Global Stores:
|
||||
none
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
As we can see above, a new processor node <code>KSTREAM-FLATMAPVALUES-0000000001</code> is injected into the topology between the original source and sink nodes.
|
||||
It takes the source node as its parent and the sink node as its child.
|
||||
In other words, each record fetched by the source node will first traverse to the newly added <code>KSTREAM-FLATMAPVALUES-0000000001</code> node to be processed,
|
||||
and one or more new records will be generated as a result. They will then continue to traverse down to the sink node to be written back to Kafka.
|
||||
Note this processor node is "stateless" as it is not associated with any stores (i.e. <code>(stores: [])</code>).
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The complete code looks like this (assuming lambda expression is used):
|
||||
</p>
|
||||
|
||||
<pre class="brush: java;">
|
||||
package myapps;
|
||||
|
||||
import org.apache.kafka.common.serialization.Serdes;
|
||||
import org.apache.kafka.streams.KafkaStreams;
|
||||
import org.apache.kafka.streams.StreamsBuilder;
|
||||
import org.apache.kafka.streams.StreamsConfig;
|
||||
import org.apache.kafka.streams.Topology;
|
||||
import org.apache.kafka.streams.kstream.KStream;
|
||||
|
||||
import java.util.Arrays;
|
||||
import java.util.Properties;
|
||||
import java.util.concurrent.CountDownLatch;
|
||||
|
||||
public class LineSplit {
|
||||
|
||||
public static void main(String[] args) throws Exception {
|
||||
Properties props = new Properties();
|
||||
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-linesplit");
|
||||
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
|
||||
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
|
||||
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
|
||||
|
||||
final StreamsBuilder builder = new StreamsBuilder();
|
||||
|
||||
KStream<String, String> source = builder.stream("streams-plaintext-input");
|
||||
source.flatMapValues(value -> Arrays.asList(value.split("\\W+")))
|
||||
.to("streams-linesplit-output");
|
||||
|
||||
final Topology topology = builder.build();
|
||||
final KafkaStreams streams = new KafkaStreams(topology, props);
|
||||
final CountDownLatch latch = new CountDownLatch(1);
|
||||
|
||||
// ... same as Pipe.java above
|
||||
}
|
||||
}
|
||||
</pre>
|
||||
|
||||
<h4><a id="tutorial_code_wordcount" href="#tutorial_code_wordcount">Writing a third Streams application: Wordcount</a></h4>
|
||||
|
||||
<p>
|
||||
Let's now take a step further to add some "stateful" computations to the topology by counting the occurrence of the words split from the source text stream.
|
||||
Following similar steps let's create another program based on the <code>LineSplit.java</code> class:
|
||||
</p>
|
||||
|
||||
<pre class="brush: java;">
|
||||
public class WordCount {
|
||||
|
||||
public static void main(String[] args) throws Exception {
|
||||
Properties props = new Properties();
|
||||
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-wordcount");
|
||||
// ...
|
||||
}
|
||||
}
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
In order to count the words we can first modify the <code>flatMapValues</code> operator to treat all of them as lower case (shown here with an anonymous <code>ValueMapper</code> class; a lambda version follows further below):
|
||||
</p>
|
||||
|
||||
<pre class="brush: java;">
|
||||
source.flatMapValues(new ValueMapper<String, Iterable<String>>() {
|
||||
@Override
|
||||
public Iterable<String> apply(String value) {
|
||||
return Arrays.asList(value.toLowerCase(Locale.getDefault()).split("\\W+"));
|
||||
}
|
||||
});
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
In order to do the counting aggregation we have to first specify that we want to key the stream on the value string, i.e. the lower-cased word, with a <code>groupBy</code> operator.
This operator generates a new grouped stream, which can then be aggregated by a <code>count</code> operator that generates a running count on each of the grouped keys:
|
||||
</p>
|
||||
|
||||
<pre class="brush: java;">
|
||||
KTable<String, Long> counts =
|
||||
source.flatMapValues(new ValueMapper<String, Iterable<String>>() {
|
||||
@Override
|
||||
public Iterable<String> apply(String value) {
|
||||
return Arrays.asList(value.toLowerCase(Locale.getDefault()).split("\\W+"));
|
||||
}
|
||||
})
|
||||
.groupBy(new KeyValueMapper<String, String, String>() {
|
||||
@Override
|
||||
public String apply(String key, String value) {
|
||||
return value;
|
||||
}
|
||||
})
|
||||
// Materialize the result into a KeyValueStore named "counts-store".
|
||||
// The Materialized store is always of type <Bytes, byte[]> as this is the format of the innermost store.
|
||||
.count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>> as("counts-store"));
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
Note that the <code>count</code> operator has a <code>Materialized</code> parameter that specifies that the
running count should be stored in a state store named <code>counts-store</code>.
This <code>counts-store</code> store can be queried in real-time, with details described in the <a href="/{{version}}/documentation/streams/developer-guide#streams_interactive_queries">Developer Manual</a>.
|
||||
</p>
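<p>
    As a quick illustration (and not part of the tutorial's WordCount code), a minimal sketch of such a query could look like the following; it assumes the <code>streams</code> client built below has been started and has finished restoring its state, and that <code>ReadOnlyKeyValueStore</code> and <code>QueryableStoreTypes</code> from <code>org.apache.kafka.streams.state</code> are imported:
</p>

<pre class="brush: java;">
    // look up the running count for a single word from the "counts-store" state store
    ReadOnlyKeyValueStore<String, Long> keyValueStore =
        streams.store("counts-store", QueryableStoreTypes.keyValueStore());
    System.out.println("count for 'hello': " + keyValueStore.get("hello"));
</pre>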
|
||||
|
||||
<p>
|
||||
We can also write the <code>counts</code> KTable's changelog stream back into another Kafka topic, say <code>streams-wordcount-output</code>.
|
||||
Because the result is a changelog stream, the output topic <code>streams-wordcount-output</code> should be configured with log compaction enabled.
|
||||
Note that this time the value type is no longer <code>String</code> but <code>Long</code>, so the default serialization classes are not viable for writing it to Kafka anymore.
|
||||
We need to provide overridden serialization methods for <code>Long</code> types, otherwise a runtime exception will be thrown:
|
||||
</p>
|
||||
|
||||
<pre class="brush: java;">
|
||||
counts.toStream().to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
Note that in order to read the changelog stream from topic <code>streams-wordcount-output</code>,
one needs to set the value deserializer to <code>org.apache.kafka.common.serialization.LongDeserializer</code>.
Details of this can be found in the <a href="/{{version}}/documentation/streams/quickstart">Play with a Streams Application</a> section.
Assuming lambda expressions from JDK 8 can be used, the above code can be simplified as:
|
||||
</p>
|
||||
|
||||
<pre class="brush: java;">
|
||||
KStream<String, String> source = builder.stream("streams-plaintext-input");
|
||||
source.flatMapValues(value -> Arrays.asList(value.toLowerCase(Locale.getDefault()).split("\\W+")))
|
||||
.groupBy((key, value) -> value)
|
||||
.count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store"))
|
||||
.toStream()
|
||||
.to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));
|
||||
</pre>
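<p>
    To inspect the counts written to <code>streams-wordcount-output</code> outside of the console consumer, a plain Java consumer configured with a <code>LongDeserializer</code> for the values can be used as well.
    Below is a minimal sketch; the group id and poll timeout are arbitrary example values, and the usual <code>org.apache.kafka.clients.consumer</code>, <code>java.time.Duration</code>, and <code>java.util.Collections</code> imports are assumed:
</p>

<pre class="brush: java;">
    Properties consumerProps = new Properties();
    consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "wordcount-inspector");
    consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
    consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.LongDeserializer");

    try (KafkaConsumer<String, Long> consumer = new KafkaConsumer<>(consumerProps)) {
        consumer.subscribe(Collections.singletonList("streams-wordcount-output"));
        // poll once and print whatever counts have been produced so far
        for (ConsumerRecord<String, Long> record : consumer.poll(Duration.ofSeconds(5))) {
            System.out.println(record.key() + " : " + record.value());
        }
    }
</pre>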
|
||||
|
||||
<p>
|
||||
If we again describe this augmented topology via <code>System.out.println(topology.describe())</code>, we will get the following:
|
||||
</p>
|
||||
|
||||
<pre class="brush: bash;">
|
||||
> mvn clean package
|
||||
> mvn exec:java -Dexec.mainClass=myapps.WordCount
|
||||
Sub-topologies:
|
||||
Sub-topology: 0
|
||||
Source: KSTREAM-SOURCE-0000000000(topics: streams-plaintext-input) --> KSTREAM-FLATMAPVALUES-0000000001
|
||||
Processor: KSTREAM-FLATMAPVALUES-0000000001(stores: []) --> KSTREAM-KEY-SELECT-0000000002 <-- KSTREAM-SOURCE-0000000000
|
||||
Processor: KSTREAM-KEY-SELECT-0000000002(stores: []) --> KSTREAM-FILTER-0000000005 <-- KSTREAM-FLATMAPVALUES-0000000001
|
||||
Processor: KSTREAM-FILTER-0000000005(stores: []) --> KSTREAM-SINK-0000000004 <-- KSTREAM-KEY-SELECT-0000000002
|
||||
Sink: KSTREAM-SINK-0000000004(topic: counts-store-repartition) <-- KSTREAM-FILTER-0000000005
|
||||
Sub-topology: 1
|
||||
Source: KSTREAM-SOURCE-0000000006(topics: counts-store-repartition) --> KSTREAM-AGGREGATE-0000000003
|
||||
Processor: KSTREAM-AGGREGATE-0000000003(stores: [counts-store]) --> KTABLE-TOSTREAM-0000000007 <-- KSTREAM-SOURCE-0000000006
|
||||
Processor: KTABLE-TOSTREAM-0000000007(stores: []) --> KSTREAM-SINK-0000000008 <-- KSTREAM-AGGREGATE-0000000003
|
||||
Sink: KSTREAM-SINK-0000000008(topic: streams-wordcount-output) <-- KTABLE-TOSTREAM-0000000007
|
||||
Global Stores:
|
||||
none
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
As we can see above, the topology now contains two disconnected sub-topologies.
|
||||
The first sub-topology's sink node <code>KSTREAM-SINK-0000000004</code> will write to a repartition topic <code>counts-store-repartition</code>,
|
||||
which will be read by the second sub-topology's source node <code>KSTREAM-SOURCE-0000000006</code>.
|
||||
The repartition topic is used to "shuffle" the source stream by its aggregation key, which is in this case the value string.
|
||||
In addition, inside the first sub-topology a stateless <code>KSTREAM-FILTER-0000000005</code> node is injected between the grouping <code>KSTREAM-KEY-SELECT-0000000002</code> node and the sink node to filter out any intermediate record whose aggregate key is empty.
|
||||
</p>
|
||||
<p>
|
||||
In the second sub-topology, the aggregation node <code>KSTREAM-AGGREGATE-0000000003</code> is associated with a state store named <code>counts-store</code> (the name is specified by the user in the <code>count</code> operator).
Upon receiving each record from its upstream source node, the aggregation processor will first query its associated <code>counts-store</code> store to get the current count for that key, augment it by one, and then write the new count back to the store.
Each updated count for the key will also be piped downstream to the <code>KTABLE-TOSTREAM-0000000007</code> node, which interprets this update stream as a record stream before further piping it to the sink node <code>KSTREAM-SINK-0000000008</code> for writing back to Kafka.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The complete code looks like this (assuming lambda expression is used):
|
||||
</p>
|
||||
|
||||
<pre class="brush: java;">
|
||||
package myapps;
|
||||
|
||||
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

import java.util.Arrays;
import java.util.Locale;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
|
||||
|
||||
public class WordCount {
|
||||
|
||||
public static void main(String[] args) throws Exception {
|
||||
Properties props = new Properties();
|
||||
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-wordcount");
|
||||
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
|
||||
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
|
||||
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
|
||||
|
||||
final StreamsBuilder builder = new StreamsBuilder();
|
||||
|
||||
KStream<String, String> source = builder.stream("streams-plaintext-input");
|
||||
source.flatMapValues(value -> Arrays.asList(value.toLowerCase(Locale.getDefault()).split("\\W+")))
|
||||
.groupBy((key, value) -> value)
|
||||
.count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store"))
|
||||
.toStream()
|
||||
.to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));
|
||||
|
||||
final Topology topology = builder.build();
|
||||
final KafkaStreams streams = new KafkaStreams(topology, props);
|
||||
final CountDownLatch latch = new CountDownLatch(1);
|
||||
|
||||
// ... same as Pipe.java above
|
||||
}
|
||||
}
|
||||
</pre>
|
||||
|
||||
<div class="pagination">
|
||||
<a href="/{{version}}/documentation/streams/quickstart" class="pagination__btn pagination__btn__prev">Previous</a>
|
||||
<a href="/{{version}}/documentation/streams/core-concepts" class="pagination__btn pagination__btn__next">Next</a>
|
||||
</div>
|
||||
</script>
|
||||
|
||||
<div class="p-quickstart-streams"></div>
|
||||
|
||||
<!--#include virtual="../../includes/_header.htm" -->
|
||||
<!--#include virtual="../../includes/_top.htm" -->
|
||||
<div class="content documentation documentation--current">
|
||||
<!--#include virtual="../../includes/_nav.htm" -->
|
||||
<div class="right">
|
||||
<!--#include virtual="../../includes/_docs_banner.htm" -->
|
||||
<ul class="breadcrumbs">
|
||||
<li><a href="/documentation">Documentation</a></li>
|
||||
<li><a href="/documentation/streams">Kafka Streams</a></li>
|
||||
</ul>
|
||||
<div class="p-content"></div>
|
||||
</div>
|
||||
</div>
|
||||
<!--#include virtual="../../includes/_footer.htm" -->
|
||||
<script>
|
||||
$(function() {
|
||||
// Show selected style on nav item
|
||||
$('.b-nav__streams').addClass('selected');
|
||||
|
||||
//sticky secondary nav
|
||||
var $navbar = $(".sub-nav-sticky"),
|
||||
y_pos = $navbar.offset().top,
|
||||
height = $navbar.height();
|
||||
|
||||
$(window).scroll(function() {
|
||||
var scrollTop = $(window).scrollTop();
|
||||
|
||||
if (scrollTop > y_pos - height) {
|
||||
$navbar.addClass("navbar-fixed")
|
||||
} else if (scrollTop <= y_pos) {
|
||||
$navbar.removeClass("navbar-fixed")
|
||||
}
|
||||
});
|
||||
|
||||
// Display docs subnav items
|
||||
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
|
||||
});
|
||||
</script>
|
||||
921
docs/streams/upgrade-guide.html
Normal file
921
docs/streams/upgrade-guide.html
Normal file
@@ -0,0 +1,921 @@
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
contributor license agreements. See the NOTICE file distributed with
|
||||
this work for additional information regarding copyright ownership.
|
||||
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
(the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
<script><!--#include virtual="../js/templateData.js" --></script>
|
||||
|
||||
<script id="content-template" type="text/x-handlebars-template">
|
||||
<h1>Upgrade Guide and API Changes</h1>
|
||||
<div class="sub-nav-sticky">
|
||||
<div class="sticky-top">
|
||||
<div style="height:35px">
|
||||
<a href="/{{version}}/documentation/streams/">Introduction</a>
|
||||
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
|
||||
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
|
||||
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
|
||||
<a href="/{{version}}/documentation/streams/architecture">Architecture</a>
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/">Developer Guide</a>
|
||||
<a class="active-menu-item" href="/{{version}}/documentation/streams/upgrade-guide">Upgrade</a>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<p>
|
||||
Upgrading from any older version to {{fullDotVersion}} is possible: you will need to do two rolling bounces, where during the first rolling bounce phase you set the config <code>upgrade.from="older version"</code>
|
||||
(possible values are <code>"0.10.0" - "2.3"</code>) and during the second you remove it. This is required to safely upgrade to the new cooperative rebalancing protocol of the embedded consumer. Note that you will remain using the old eager
|
||||
rebalancing protocol if you skip or delay the second rolling bounce, but you can safely switch over to cooperative at any time once the entire group is on 2.4+ by removing the config value and bouncing. For more details please refer to
|
||||
<a href="https://cwiki.apache.org/confluence/x/vAclBg">KIP-429</a>:
|
||||
</p>
|
||||
<ul>
|
||||
<li> prepare your application instances for a rolling bounce and make sure that config <code>upgrade.from</code> is set to the version from which it is being upgraded (see the configuration sketch after this list).</li>
|
||||
<li> bounce each instance of your application once </li>
|
||||
<li> prepare your newly deployed {{fullDotVersion}} application instances for a second round of rolling bounces; make sure to remove the value for config <code>upgrade.from</code> </li>
|
||||
<li> bounce each instance of your application once more to complete the upgrade </li>
|
||||
</ul>
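<p>
    As a minimal sketch, the first rolling bounce only differs from a normal deployment by the extra <code>upgrade.from</code> setting; the version string <code>"2.3"</code> below is just an example placeholder for whatever version you are upgrading from:
</p>
<pre class="brush: java;">
    Properties props = new Properties();
    // first rolling bounce: tell the new binaries which version they are upgrading from
    props.put(StreamsConfig.UPGRADE_FROM_CONFIG, "2.3");
    // ... application.id, bootstrap.servers, and your other configs ...

    // second rolling bounce: remove the upgrade.from setting again and redeploy
</pre>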
|
||||
<p> As an alternative, an offline upgrade is also possible. Upgrading from any version as old as 0.10.0.x to {{fullDotVersion}} in offline mode requires the following steps: </p>
|
||||
<ul>
|
||||
<li> stop all old (e.g., 0.10.0.x) application instances </li>
|
||||
<li> update your code and swap old code and jar file with new code and new jar file </li>
|
||||
<li> restart all new ({{fullDotVersion}}) application instances </li>
|
||||
</ul>
|
||||
|
||||
<p>
|
||||
To run a Kafka Streams application version 2.2.1, 2.3.0, or higher a broker version 0.11.0 or higher is required
|
||||
and the on-disk message format must be 0.11 or higher.
|
||||
Brokers must be on version 0.10.1 or higher to run a Kafka Streams application version 0.10.1 to 2.2.0.
|
||||
Additionally, on-disk message format must be 0.10 or higher to run a Kafka Streams application version 1.0 to 2.2.0.
|
||||
For Kafka Streams 0.10.0, broker version 0.10.0 or higher is required.
|
||||
</p>
|
||||
|
||||
<p>Since the 2.2.0 release, Kafka Streams depends on a RocksDB version that requires macOS 10.13 or higher.</p>
|
||||
|
||||
<p>
|
||||
Another important thing to keep in mind: in the deprecated <code>KStreamBuilder</code> class, when a <code>KTable</code> is created from a source topic via <code>KStreamBuilder.table()</code>, its materialized state store
|
||||
will reuse the source topic as its changelog topic for restoring, and will disable logging to avoid appending new updates to the source topic; in the <code>StreamsBuilder</code> class introduced in 1.0, this behavior was changed
|
||||
accidentally: we still reuse the source topic as the changelog topic for restoring, but will also create a separate changelog topic to append the update records from source topic to. In the 2.0 release, we have fixed this issue and now users
|
||||
can choose whether or not to reuse the source topic based on the <code>StreamsConfig#TOPOLOGY_OPTIMIZATION</code>: if you are upgrading from the old <code>KStreamBuilder</code> class and hence you need to change your code to use
|
||||
the new <code>StreamsBuilder</code>, you should set this config value to <code>StreamsConfig#OPTIMIZE</code> to continue reusing the source topic; if you are upgrading from 1.0 or 1.1 where you are already using <code>StreamsBuilder</code> and hence have already
|
||||
created a separate changelog topic, you should set this config value to <code>StreamsConfig#NO_OPTIMIZATION</code> when upgrading to {{fullDotVersion}} in order to use that changelog topic for restoring the state store.
|
||||
More details about the new config <code>StreamsConfig#TOPOLOGY_OPTIMIZATION</code> can be found in <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-295%3A+Add+Streams+Configuration+Allowing+for+Optional+Topology+Optimization">KIP-295</a>.
|
||||
</p>
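<p>
    As a sketch, opting into source-topic reuse when migrating from <code>KStreamBuilder</code> boils down to setting the optimization config and passing the same properties to <code>StreamsBuilder#build(Properties)</code> (use <code>NO_OPTIMIZATION</code> instead if you want to keep the separate changelog topic):
</p>
<pre class="brush: java;">
    Properties props = new Properties();
    // reuse the source topic as the changelog topic for restoring the KTable state store
    props.put(StreamsConfig.TOPOLOGY_OPTIMIZATION, StreamsConfig.OPTIMIZE);
    // ... other configs ...

    StreamsBuilder builder = new StreamsBuilder();
    // ... define your topology ...
    KafkaStreams streams = new KafkaStreams(builder.build(props), props);
</pre>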
|
||||
|
||||
<h3><a id="streams_api_changes_250" href="#streams_api_changes_250">Streams API changes in 2.5.0</a></h3>
|
||||
<p>
|
||||
We add a new <code>cogroup()</code> operator (via <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-150+-+Kafka-Streams+Cogroup">KIP-150</a>)
that allows aggregating multiple streams in a single operation.
|
||||
Cogrouped streams can also be windowed before they are aggregated.
|
||||
We refer to the <a href="/{{version}}/documentation/streams/developer-guide/dsl-api.html">developer guide</a> for more details.
|
||||
</p>
|
||||
<p>
|
||||
We added a new <code>KStream.toTable()</code> API to translate an input event stream into a changelog stream as per
|
||||
<a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-523%3A+Add+KStream%23toTable+to+the+Streams+DSL">KIP-523</a>.
|
||||
</p>
|
||||
<p>
|
||||
We added a new Serde type <code>Void</code> in <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-527%3A+Add+VoidSerde+to+Serdes">KIP-527</a> to represent
|
||||
null keys or null values from input topic.
|
||||
</p>
|
||||
<p>
|
||||
Deprecated <code>UsePreviousTimeOnInvalidTimestamp</code> and replaced it with <code>UsePartitionTimeOnInvalidTimestamp</code> as per
|
||||
<a href="https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028807">KIP-530</a>.
|
||||
</p>
|
||||
<p>
|
||||
Deprecated <code>KafkaStreams.store(String, QueryableStoreType)</code> and replaced it with <code>KafkaStreams.store(StoreQueryParameters)</code> to allow querying
|
||||
for a store with a variety of parameters, including querying a specific task and stale stores, as per
|
||||
<a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-562%3A+Allow+fetching+a+key+from+a+single+partition+rather+than+iterating+over+all+the+stores+on+an+instance">KIP-562</a> and
|
||||
<a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-535%3A+Allow+state+stores+to+serve+stale+reads+during+rebalance">KIP-535</a> respectively.
|
||||
</p>
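<p>
    A minimal sketch of the new call; the store name and key/value types are example values:
</p>
<pre class="brush: java;">
    ReadOnlyKeyValueStore<String, Long> store = streams.store(
        StoreQueryParameters.fromNameAndType("counts-store", QueryableStoreTypes.<String, Long>keyValueStore())
            .enableStaleStores());  // optionally allow serving stale data during a rebalance (KIP-535)
    Long count = store.get("hello");
</pre>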
|
||||
|
||||
<h3><a id="streams_api_changes_240" href="#streams_api_changes_240">Streams API changes in 2.4.0</a></h3>
|
||||
<p>
|
||||
As of 2.4.0 Kafka Streams offers a KTable-KTable foreign-key join (as per <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-213+Support+non-key+joining+in+KTable">KIP-213</a>).
|
||||
This joiner allows for records to be joined between two KTables with different keys.
|
||||
Both <a href="/{{version}}/documentation/streams/developer-guide/dsl-api.html#ktable-ktable-fk-join">INNER and LEFT foreign-key joins</a>
|
||||
are supported.
|
||||
</p>
|
||||
<p>
|
||||
In the 2.4 release, you now can name all operators in a Kafka Streams DSL topology via
|
||||
<a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-307%3A+Allow+to+define+custom+processor+names+with+KStreams+DSL">KIP-307</a>.
|
||||
Giving your operators meaningful names makes it easier to understand the topology
|
||||
description (<code>Topology#describe()#toString()</code>) and
|
||||
understand the full context of what your Kafka Streams application is doing.
|
||||
<br />
|
||||
There are new overloads on most <code>KStream</code> and <code>KTable</code> methods
|
||||
that accept a <code>Named</code> object. Typically you'll provide a name for the DSL operation by
|
||||
using <code>Named.as("my operator name")</code>. Naming of repartition topics for aggregation
|
||||
operations will still use <code>Grouped</code> and join operations will use
|
||||
either <code>Joined</code> or the new <code>StreamJoined</code> object.
|
||||
|
||||
</p>
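<p>
    A small sketch of naming DSL operators with <code>Named</code>; the operator and topic names below are arbitrary examples:
</p>
<pre class="brush: java;">
    KStream<String, String> input = builder.stream("input-topic");
    input.filter((key, value) -> value != null, Named.as("drop-null-values"))
         .mapValues(value -> value.toUpperCase(), Named.as("upper-case-values"))
         .to("output-topic");
</pre>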
|
||||
<p>
|
||||
Before the 2.4.0 version of Kafka Streams, users of the DSL could not name the state stores involved in a stream-stream join.
|
||||
If users changed their topology and added an operator before the
|
||||
join, the internal names of the state stores would shift, requiring an application reset when redeploying.
|
||||
In the 2.4.0 release, Kafka Streams adds the <code>StreamJoined</code>
|
||||
class, which gives users the ability to name the join processor, repartition topic(s) (if a repartition is required),
|
||||
and the state stores involved in the join. Also, by naming the state stores, the changelog topics
|
||||
backing the state stores are named as well. It's important to note that naming the stores
|
||||
<strong>will not</strong> make them queryable via Interactive Queries.
|
||||
<br/>
|
||||
Another feature delivered by <code>StreamJoined</code> is that you can now configure the type of state store used in the join.
|
||||
You can elect to use in-memory stores or custom state stores for a stream-stream join. Note that the provided stores
|
||||
will not be available for querying via Interactive Queries. With the addition
|
||||
of <code>StreamJoined</code>, stream-stream join operations
|
||||
using <code>Joined</code> have been deprecated. Please switch over to stream-stream join methods using the
|
||||
new overloaded methods. You can get more details from
|
||||
<a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-479%3A+Add+StreamJoined+config+object+to+Join">KIP-479</a>.
|
||||
</p>
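<p>
    A minimal sketch of a named stream-stream join using <code>StreamJoined</code>; all topic, store, and serde choices below are example values:
</p>
<pre class="brush: java;">
    KStream<String, String> left = builder.stream("left-topic");
    KStream<String, String> right = builder.stream("right-topic");

    left.join(right,
              (leftValue, rightValue) -> leftValue + "/" + rightValue,
              JoinWindows.of(Duration.ofMinutes(5)),
              StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String())
                          .withStoreName("left-right-join-store"));
</pre>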
|
||||
<p>
|
||||
With the introduction of incremental cooperative rebalancing, Streams no longer requires all tasks be revoked at the beginning of a rebalance. Instead, at the completion of the rebalance only those tasks which are to be migrated to another consumer
|
||||
for overall load balance will need to be closed and revoked. This changes the semantics of the <code>StateListener</code> a bit, as it will not necessarily transition to <code>REBALANCING</code> at the beginning of a rebalance anymore. Note that
|
||||
this means IQ will now be available at all times except during state restoration, including while a rebalance is in progress. If restoration is occurring when a rebalance begins, we will continue to actively restore the state stores and/or process
|
||||
standby tasks during a cooperative rebalance. Note that with this new rebalancing protocol, you may sometimes see a rebalance be followed by a second short rebalance that ensures all tasks are safely distributed. For details please see
|
||||
<a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-429%3A+Kafka+Consumer+Incremental+Rebalance+Protocol">KIP-429</a>.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The 2.4.0 release contains newly added and reworked metrics.
|
||||
<a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-444%3A+Augment+metrics+for+Kafka+Streams">KIP-444</a>
|
||||
adds new <em>client level</em> (i.e., <code>KafkaStreams</code> instance level) metrics to the existing
|
||||
thread-level, task-level, and processor-/state-store-level metrics.
|
||||
For a full list of available client level metrics, see the
|
||||
<a href="/{{version}}/documentation/#kafka_streams_client_monitoring">KafkaStreams monitoring</a>
|
||||
section in the operations guide.
|
||||
<br />
|
||||
Furthermore, RocksDB metrics are exposed via
|
||||
<a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-471%3A+Expose+RocksDB+Metrics+in+Kafka+Streams">KIP-471</a>.
|
||||
For a full list of available RocksDB metrics, see the
|
||||
<a href="/{{version}}/documentation/#kafka_streams_rocksdb_monitoring">RocksDB monitoring</a>
|
||||
section in the operations guide.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Kafka Streams <code>test-utils</code> got improved via
|
||||
<a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-470%3A+TopologyTestDriver+test+input+and+output+usability+improvements">KIP-470</a>
|
||||
to simplify the process of using <code>TopologyTestDriver</code> to test your application code.
|
||||
We deprecated <code>ConsumerRecordFactory</code>, <code>TopologyTestDriver#pipeInput()</code>,
|
||||
<code>OutputVerifier</code>, as well as <code>TopologyTestDriver#readOutput()</code> and replace them with
|
||||
<code>TestInputTopic</code> and <code>TestOutputTopic</code>, respectively.
|
||||
We also introduced a new class <code>TestRecord</code> that simplifies assertion code.
|
||||
For full details see the
|
||||
<a href="/{{version}}/documentation/streams/developer-guide/testing.html">Testing section</a> in the developer guide.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
In 2.4.0, we deprecated <code>WindowStore#put(K key, V value)</code> that should never be used.
|
||||
Instead the existing <code>WindowStore#put(K key, V value, long windowStartTimestamp)</code> should be used
|
||||
(<a href="https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=115526545">KIP-474</a>).
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Furthermore, the <code>PartitionGrouper</code> interface and its corresponding configuration parameter
|
||||
<code>partition.grouper</code> were deprecated
|
||||
(<a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-528%3A+Deprecate+PartitionGrouper+configuration+and+interface">KIP-528</a>)
|
||||
and will be removed in the next major release (<a href="https://issues.apache.org/jira/browse/KAFKA-7785">KAFKA-7785</a>).
Hence, this feature won't be supported in the future any longer and you need to update your code accordingly.
If you use a custom <code>PartitionGrouper</code> and stop using it, the created tasks might change.
Hence, you will need to reset your application to upgrade it.
</p>
|
||||
|
||||
|
||||
<h3><a id="streams_api_changes_230" href="#streams_api_changes_230">Streams API changes in 2.3.0</a></h3>
|
||||
|
||||
<p>Version 2.3.0 adds the Suppress operator to the <code>kafka-streams-scala</code> KTable API.</p>
|
||||
|
||||
<p>
|
||||
As of 2.3.0 Streams now offers an in-memory version of the window (<a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-428%3A+Add+in-memory+window+store">KIP-428</a>)
|
||||
and the session (<a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-445%3A+In-memory+Session+Store">KIP-445</a>) store, in addition to the persistent ones based on RocksDB.
|
||||
The new public interfaces <code>inMemoryWindowStore()</code> and <code>inMemorySessionStore()</code> are added to <code>Stores</code> and provide the built-in in-memory window or session store.
|
||||
</p>
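<p>
    A small sketch of building an in-memory window store via the new interface; the store name, retention period, and window size are example values:
</p>
<pre class="brush: java;">
    StoreBuilder<WindowStore<String, Long>> storeBuilder = Stores.windowStoreBuilder(
        Stores.inMemoryWindowStore("clicks-per-window", Duration.ofHours(1), Duration.ofMinutes(5), false),
        Serdes.String(),
        Serdes.Long());
</pre>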
|
||||
|
||||
<p>
|
||||
As of 2.3.0 we've updated how to turn on optimizations. Now to enable optimizations, you need to do two things.
|
||||
First add this line to your properties <code>properties.setProperty(StreamsConfig.TOPOLOGY_OPTIMIZATION, StreamsConfig.OPTIMIZE);</code>, as you have done before.
|
||||
Second, when constructing your <code>KafkaStreams</code> instance, you'll need to pass your configuration properties when building your
|
||||
topology by using the overloaded <code>StreamsBuilder.build(Properties)</code> method.
|
||||
For example <code>KafkaStreams myStream = new KafkaStreams(streamsBuilder.build(properties), properties)</code>.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
In 2.3.0 we have added default implementation to <code>close()</code> and <code>configure()</code> for <code>Serializer</code>,
|
||||
<code>Deserializer</code> and <code>Serde</code> so that they can be implemented by lambda expression.
|
||||
For more details please read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-331+Add+default+implementation+to+close%28%29+and+configure%28%29+for+Serializer%2C+Deserializer+and+Serde">KIP-331</a>.
|
||||
</p>
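<p>
    For illustration, a sketch of lambda-based serialization; an <code>Integer</code> serde is used here purely as an example:
</p>
<pre class="brush: java;">
    // null handling omitted for brevity
    Serializer<Integer> serializer = (topic, data) -> ByteBuffer.allocate(4).putInt(data).array();
    Deserializer<Integer> deserializer = (topic, data) -> ByteBuffer.wrap(data).getInt();
    Serde<Integer> intSerde = Serdes.serdeFrom(serializer, deserializer);
</pre>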
|
||||
|
||||
<p>
|
||||
To improve operator semantics, new store types are added that allow storing an additional timestamp per key-value pair or window.
|
||||
Some DSL operators (for example KTables) are using those new stores.
|
||||
Hence, you can now retrieve the last update timestamp via Interactive Queries if you specify
|
||||
<code>TimestampedKeyValueStoreType</code> or <code>TimestampedWindowStoreType</code> as your <code>QueryableStoreType</code>.
|
||||
While this change is mainly transparent, there are some corner cases that may require code changes:
|
||||
<strong>Caution: If you receive an untyped store and use a cast, you might need to update your code to cast to the correct type.
|
||||
Otherwise, you might get an exception similar to
|
||||
<code>java.lang.ClassCastException: class org.apache.kafka.streams.state.ValueAndTimestamp cannot be cast to class YOUR-VALUE-TYPE</code>
|
||||
upon getting a value from the store.</strong>
|
||||
Additionally, <code>TopologyTestDriver#getStateStore()</code> only returns non-built-in stores and throws an exception if a built-in store is accessed.
|
||||
For more details please read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-258%3A+Allow+to+Store+Record+Timestamps+in+RocksDB">KIP-258</a>.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
To improve type safety, a new operator <code>KStream#flatTransformValues</code> is added.
|
||||
For more details please read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-313%3A+Add+KStream.flatTransform+and+KStream.flatTransformValues">KIP-313</a>.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Kafka Streams used to set the configuration parameter <code>max.poll.interval.ms</code> to <code>Integer.MAX_VALUE</code>.
|
||||
This default value is removed and Kafka Streams uses the consumer default value now.
|
||||
For more details please read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-442%3A+Return+to+default+max+poll+interval+in+Streams">KIP-442</a>.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Default configuration for repartition topic was changed:
|
||||
The segment size for index files (<code>segment.index.bytes</code>) is no longer 50MB, but uses the cluster default.
|
||||
Similarly, the configuration <code>segment.ms</code> is no longer 10 minutes, but uses the cluster default configuration.
|
||||
Lastly, the retention period (<code>retention.ms</code>) is changed from <code>Long.MAX_VALUE</code> to <code>-1</code> (infinite).
|
||||
For more details please read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-443%3A+Return+to+default+segment.ms+and+segment.index.bytes+in+Streams+repartition+topics">KIP-443</a>.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
To avoid memory leaks, <code>RocksDBConfigSetter</code> has a new <code>close()</code> method that is called on shutdown.
|
||||
Users should implement this method to release any memory used by RocksDB config objects, by closing those objects.
|
||||
For more details please read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-453%3A+Add+close%28%29+method+to+RocksDBConfigSetter">KIP-453</a>.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
RocksDB dependency was updated to version <code>5.18.3</code>.
|
||||
The new version allows specifying more RocksDB configurations, including <code>WriteBufferManager</code> which helps to limit RocksDB off-heap memory usage.
|
||||
For more details please read <a href="https://issues.apache.org/jira/browse/KAFKA-8215">KAFKA-8215</a>.
|
||||
</p>
|
||||
|
||||
<h3><a id="streams_notable_changes_221" href="#streams_notable_changes_221">Notable changes in Kafka Streams 2.2.1</a></h3>
|
||||
<p>
|
||||
As of Kafka Streams 2.2.1 a message format 0.11 or higher is required;
|
||||
this implies that brokers must be on version 0.11.0 or higher.
|
||||
</p>
|
||||
|
||||
<h3><a id="streams_api_changes_220" href="#streams_api_changes_220">Streams API changes in 2.2.0</a></h3>
|
||||
<p>
|
||||
We've simplified the <code>KafkaStreams#state</code> transition diagram during the starting up phase a bit in 2.2.0: in older versions the state will transit from <code>CREATED</code> to <code>RUNNING</code>, and then to <code>REBALANCING</code> to get the first
|
||||
stream task assignment, and then back to <code>RUNNING</code>; starting in 2.2.0 it will transit from <code>CREATED</code> directly to <code>REBALANCING</code> and then to <code>RUNNING</code>.
|
||||
If you have registered a <code>StateListener</code> that captures state transition events, you may need to adjust your listener implementation accordingly for this simplification (in practice, your listener logic should be very unlikely to be affected at all).
|
||||
</p>
|
||||
|
||||
<p>
|
||||
In <code>WindowedSerdes</code>, we've added a new static constructor to return a <code>TimeWindowSerde</code> with configurable window size. This is to help users to construct time window serdes to read directly from a time-windowed store's changelog.
|
||||
More details can be found in <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-393%3A+Time+windowed+serde+to+properly+deserialize+changelog+input+topic">KIP-393</a>.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
In 2.2.0 we have extended a few public interfaces including <code>KafkaStreams</code> to extend <code>AutoCloseable</code> so that they can be
|
||||
used in a try-with-resource statement. For a full list of public interfaces that get impacted please read <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-376%3A+Implement+AutoClosable+on+appropriate+classes+that+want+to+be+used+in+a+try-with-resource+statement">KIP-376</a>.
|
||||
</p>
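<p>
    A minimal sketch of the try-with-resource usage, assuming <code>topology</code> and <code>props</code> are defined as usual:
</p>
<pre class="brush: java;">
    try (KafkaStreams streams = new KafkaStreams(topology, props)) {
        streams.start();
        // ... run until it is time to shut down, e.g. await a latch ...
    }   // close() is invoked automatically when leaving the block
</pre>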
|
||||
|
||||
<h3><a id="streams_api_changes_210" href="#streams_api_changes_210">Streams API changes in 2.1.0</a></h3>
|
||||
<p>
|
||||
We updated <code>TopologyDescription</code> API to allow for better runtime checking.
|
||||
Users are encouraged to use <code>#topicSet()</code> and <code>#topicPattern()</code> accordingly on <code>TopologyDescription.Source</code> nodes,
|
||||
instead of using <code>#topics()</code>, which has since been deprecated. Similarly, use <code>#topic()</code> and <code>#topicNameExtractor()</code>
|
||||
to get descriptions of <code>TopologyDescription.Sink</code> nodes. For more details, see
|
||||
<a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-321%3A+Update+TopologyDescription+to+better+represent+Source+and+Sink+Nodes">KIP-321</a>.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
We've added a new class <code>Grouped</code> and deprecated <code>Serialized</code>. The intent of adding <code>Grouped</code> is the ability to
|
||||
name repartition topics created when performing aggregation operations. Users can name the potential repartition topic using the
|
||||
<code>Grouped#as()</code> method which takes a <code>String</code> and is used as part of the repartition topic name. The resulting repartition
|
||||
topic name will still follow the pattern of <code>${application-id}-<name>-repartition</code>. The <code>Grouped</code> class is now favored over
|
||||
<code>Serialized</code> in <code>KStream#groupByKey()</code>, <code>KStream#groupBy()</code>, and <code>KTable#groupBy()</code>.
|
||||
Note that Kafka Streams does not automatically create repartition topics for aggregation operations.
|
||||
|
||||
Additionally, we've updated the <code>Joined</code> class with a new method <code>Joined#withName</code>
|
||||
enabling users to name any repartition topics required for performing Stream/Stream or Stream/Table join. For more details repartition
|
||||
topic naming, see <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-372%3A+Naming+Repartition+Topics+for+Joins+and+Grouping">KIP-372</a>.
|
||||
|
||||
As a result we've updated the Kafka Streams Scala API and removed the <code>Serialized</code> class in favor of adding <code>Grouped</code>.
|
||||
If you just rely on the implicit <code>Serialized</code>, you just need to recompile; if you pass in <code>Serialized</code> explicitly, sorry you'll have to make code changes.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
We've added a new config named <code>max.task.idle.ms</code> to allow users specify how to handle out-of-order data within a task that may be processing multiple
|
||||
topic-partitions (see <a href="/{{version}}/documentation/streams/core-concepts.html#streams_out_of_ordering">Out-of-Order Handling</a> section for more details).
|
||||
The default value is set to <code>0</code>, to favor minimized latency over synchronization between multiple input streams from topic-partitions.
|
||||
If users would like to wait for longer time when some of the topic-partitions do not have data available to process and hence cannot determine its corresponding stream time,
|
||||
they can override this config to a larger value.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
We've added the missing <code>SessionBytesStoreSupplier#retentionPeriod()</code> to be consistent with the <code>WindowBytesStoreSupplier</code> which allows users to get the specified retention period for session-windowed stores.
|
||||
We've also added the missing <code>StoreBuilder#withCachingDisabled()</code> to allow users to turn off caching for their customized stores.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
We added a new serde for UUIDs (<code>Serdes.UUIDSerde</code>) that you can use via <code>Serdes.UUID()</code>
|
||||
(cf. <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-206%3A+Add+support+for+UUID+serialization+and+deserialization">KIP-206</a>).
|
||||
</p>
|
||||
|
||||
<p>
|
||||
We updated a list of methods that take <code>long</code> arguments as either timestamp (fixed point) or duration (time period)
and replaced them with <code>Instant</code> and <code>Duration</code> parameters for improved semantics.
Some old methods based on <code>long</code> are deprecated and users are encouraged to update their code.
|
||||
<br />
|
||||
In particular, aggregation windows (hopping/tumbling/unlimited time windows and session windows) as well as join windows now take <code>Duration</code>
|
||||
arguments to specify window size, hop, and gap parameters.
|
||||
Also, window sizes and retention times are now specified as <code>Duration</code> type in <code>Stores</code> class.
|
||||
The <code>Window</code> class has new methods <code>#startTime()</code> and <code>#endTime()</code> that return window start/end timestamp as <code>Instant</code>.
|
||||
For interactive queries, there are new <code>#fetch(...)</code> overloads taking <code>Instant</code> arguments.
|
||||
Additionally, punctuations are now registered via <code>ProcessorContext#schedule(Duration interval, ...)</code>.
|
||||
For more details, see <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-358%3A+Migrate+Streams+API+to+Duration+instead+of+long+ms+times">KIP-358</a>.
|
||||
</p>
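<p>
    A few sketches of the <code>Duration</code>-based calls; the window sizes and intervals are example values, and <code>context</code> is assumed to be a <code>ProcessorContext</code>:
</p>
<pre class="brush: java;">
    TimeWindows windows = TimeWindows.of(Duration.ofMinutes(5)).advanceBy(Duration.ofMinutes(1));
    JoinWindows joinWindows = JoinWindows.of(Duration.ofSeconds(30));

    // punctuation scheduling now takes a Duration as well
    context.schedule(Duration.ofSeconds(10), PunctuationType.WALL_CLOCK_TIME,
                     timestamp -> System.out.println("punctuate at " + timestamp));
</pre>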
|
||||
|
||||
<p>
|
||||
We deprecated <code>KafkaStreams#close(...)</code> and replaced it with <code>KafkaStreams#close(Duration)</code>, which accepts a single timeout argument.
Note: the new <code>#close(Duration)</code> method has improved (but slightly different) semantics.
|
||||
For more details, see <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-358%3A+Migrate+Streams+API+to+Duration+instead+of+long+ms+times">KIP-358</a>.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The newly exposed <code>AdminClient</code> metrics are now available when calling the <code>KafkaStreams#metrics()</code> method.
For more details on exposing <code>AdminClient</code> metrics
see <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-324%3A+Add+method+to+get+metrics%28%29+in+AdminClient">KIP-324</a>.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
We deprecated the notion of segments in window stores as those are intended to be an implementation detail.
Thus, the method <code>Windows#segments()</code> and variable <code>Windows#segments</code> were deprecated.
If you implement custom windows, you should update your code accordingly.
Similarly, <code>WindowBytesStoreSupplier#segments()</code> was deprecated and replaced with <code>WindowBytesStoreSupplier#segmentInterval()</code>.
If you implement a custom window store, you need to update your code accordingly.
Finally, <code>Stores#persistentWindowStore(...)</code> was deprecated and replaced with a new overload that no longer allows specifying the number of segments.
|
||||
For more details, see <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-319%3A+Replace+segments+with+segmentInterval+in+WindowBytesStoreSupplier">KIP-319</a>
|
||||
(note: <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-328%3A+Ability+to+suppress+updates+for+KTables">KIP-328</a> and
|
||||
<a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-358%3A+Migrate+Streams+API+to+Duration+instead+of+long+ms+times">KIP-358</a> 'overlap' with KIP-319).
|
||||
</p>
|
||||
|
||||
<p>
|
||||
We've added an overloaded <code>StreamsBuilder#build</code> method that accepts an instance of <code>java.util.Properties</code> with the intent of using the
|
||||
<code>StreamsConfig#TOPOLOGY_OPTIMIZATION</code> config added in Kafka Streams 2.0. Before 2.1, when building a topology with
|
||||
the DSL, Kafka Streams writes the physical plan as the user makes calls on the DSL. Now by providing a <code>java.util.Properties</code> instance when
|
||||
executing a <code>StreamsBuilder#build</code> call, Kafka Streams can optimize the physical plan of the topology, provided the <code>StreamsConfig#TOPOLOGY_OPTIMIZATION</code>
|
||||
config is set to <code>StreamsConfig#OPTIMIZE</code>. By setting <code>StreamsConfig#OPTIMIZE</code> in addition to the <code>KTable</code> optimization of
|
||||
reusing the source topic as the changelog topic, the topology may be optimized to merge redundant repartition topics into one
|
||||
repartition topic. The original no parameter version of <code>StreamsBuilder#build</code> is still available for those who wish to not
|
||||
optimize their topology. Note that enabling optimization of the topology may require you to do an application reset when redeploying the application. For more
|
||||
details, see <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-312%3A+Add+Overloaded+StreamsBuilder+Build+Method+to+Accept+java.util.Properties">KIP-312</a>
|
||||
</p>
|
||||
|
||||
<p>
|
||||
We are introducing static membership to Kafka Streams users. This feature reduces unnecessary rebalances during normal application upgrades or rolling bounces.
For more details on how to use it, check out the <a href="/{{version}}/documentation/#static_membership">static membership design</a>.
Note that Kafka Streams uses the same <code>ConsumerConfig#GROUP_INSTANCE_ID_CONFIG</code>; you only need to make sure it is uniquely defined across
different stream instances in one application.
|
||||
</p>
|
||||
|
||||
<h3><a id="streams_api_changes_200" href="#streams_api_changes_200">Streams API changes in 2.0.0</a></h3>
|
||||
<p>
|
||||
In 2.0.0 we have added a few new APIs on the <code>ReadOnlyWindowStore</code> interface (for details please read <a href="#streams_api_changes_200">Streams API changes</a> below).
|
||||
If you have customized window store implementations that extend the <code>ReadOnlyWindowStore</code> interface you need to make code changes.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
In addition, if you are using Java 8 method references in your Kafka Streams code you might need to update your code to resolve method ambiguities.
|
||||
Hot-swapping the jar-file only might not work for this case.
|
||||
See below a complete list of <a href="#streams_api_changes_200">2.0.0</a>
|
||||
API and semantic changes that allow you to advance your application and/or simplify your code base.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
We moved <code>Consumed</code> interface from <code>org.apache.kafka.streams</code> to <code>org.apache.kafka.streams.kstream</code>
|
||||
as it was mistakenly placed in the previous release. If your code has already used it there is a simple one-liner change needed in your import statement.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
In 2.0.0 we have also removed some public APIs that were deprecated prior to 1.0.x.
|
||||
See below for a detailed list of removed APIs.
|
||||
</p>
|
||||
<p>
|
||||
We have removed the <code>skippedDueToDeserializationError-rate</code> and <code>skippedDueToDeserializationError-total</code> metrics.
|
||||
Deserialization errors, and all other causes of record skipping, are now accounted for in the pre-existing metrics
|
||||
<code>skipped-records-rate</code> and <code>skipped-records-total</code>. When a record is skipped, the event is
|
||||
now logged at WARN level. If these warnings become burdensome, we recommend explicitly filtering out unprocessable
|
||||
records instead of depending on record skipping semantics. For more details, see
|
||||
<a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-274%3A+Kafka+Streams+Skipped+Records+Metrics">KIP-274</a>.
|
||||
As of right now, the potential causes of skipped records are:
|
||||
</p>
|
||||
<ul>
|
||||
<li><code>null</code> keys in table sources</li>
|
||||
<li><code>null</code> keys in table-table inner/left/outer/right joins</li>
|
||||
<li><code>null</code> keys or values in stream-table joins</li>
|
||||
<li><code>null</code> keys or values in stream-stream joins</li>
|
||||
<li><code>null</code> keys or values in aggregations on grouped streams</li>
|
||||
<li><code>null</code> keys or values in reductions on grouped streams</li>
|
||||
<li><code>null</code> keys in aggregations on windowed streams</li>
|
||||
<li><code>null</code> keys in reductions on windowed streams</li>
|
||||
<li><code>null</code> keys in aggregations on session-windowed streams</li>
|
||||
<li>
|
||||
Errors producing results, when the configured <code>default.production.exception.handler</code> decides to
|
||||
<code>CONTINUE</code> (the default is to <code>FAIL</code> and throw an exception).
|
||||
</li>
|
||||
<li>
|
||||
Errors deserializing records, when the configured <code>default.deserialization.exception.handler</code>
|
||||
decides to <code>CONTINUE</code> (the default is to <code>FAIL</code> and throw an exception).
|
||||
This was the case previously captured in the <code>skippedDueToDeserializationError</code> metrics.
|
||||
</li>
|
||||
<li>Fetched records having a negative timestamp.</li>
|
||||
</ul>
|
||||
|
||||
<p>
|
||||
We've also fixed the metrics name for time and session windowed store operations in 2.0. As a result, our current built-in stores
|
||||
will have their store types in the metric names as <code>in-memory-state</code>, <code>in-memory-lru-state</code>,
|
||||
<code>rocksdb-state</code>, <code>rocksdb-window-state</code>, and <code>rocksdb-session-state</code>. For example, a RocksDB time windowed store's
|
||||
put operation metrics would now be
|
||||
<code>kafka.streams:type=stream-rocksdb-window-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),rocksdb-window-state-id=([-.\w]+)</code>.
|
||||
Users need to update their metrics collecting and reporting systems for their time and session windowed stores accordingly.
|
||||
For more details, please read the <a href="/{{version}}/documentation/#kafka_streams_store_monitoring">State Store Metrics</a> section.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
We have added support for methods in <code>ReadOnlyWindowStore</code> which allows for querying a single window's key-value pair.
|
||||
For users who have customized window store implementations on the above interface, they'd need to update their code to implement the newly added method as well.
|
||||
For more details, see <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-261%3A+Add+Single+Value+Fetch+in+Window+Stores">KIP-261</a>.
|
||||
</p>
|
||||
<p>
|
||||
We have added public <code>WindowedSerdes</code> to allow users to read from / write to a topic storing windowed table changelogs directly.
|
||||
In addition, in <code>StreamsConfig</code> we have also added <code>default.windowed.key.serde.inner</code> and <code>default.windowed.value.serde.inner</code>
|
||||
to let users specify inner serdes if the default serde classes are windowed serdes.
|
||||
For more details, see <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-265%3A+Make+Windowed+Serde+to+public+APIs">KIP-265</a>.
|
||||
</p>

<p>
We've added message header support in the <code>Processor API</code> in Kafka 2.0.0. In particular, we have added a new API <code>ProcessorContext#headers()</code>
which returns a <code>Headers</code> object that keeps track of the headers of the source topic's message that is being processed. Through this object, users can manipulate
the headers map that is being propagated throughout the processor topology as well. For more details, please read
the <a href="/{{version}}/documentation/streams/developer-guide/processor-api.html#accessing-processor-context">Developer Guide</a> section.
</p>
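<p>
A minimal sketch of reading and adding a header inside a custom processor (the class name and the header key "source-region" are purely illustrative):
</p>
<pre><code class="language-java">
public class AuditProcessor implements Processor<String, String> {
    private ProcessorContext context;

    @Override
    public void init(final ProcessorContext context) {
        this.context = context;
    }

    @Override
    public void process(final String key, final String value) {
        final Headers headers = context.headers(); // headers of the record currently being processed
        headers.add("source-region", "eu-west-1".getBytes(StandardCharsets.UTF_8));
        context.forward(key, value); // headers travel with the record through the topology
    }

    @Override
    public void close() {}
}
</code></pre>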

<p>
We have deprecated the constructors of <code>KafkaStreams</code> that take a <code>StreamsConfig</code> as a parameter.
Please use the corresponding constructors that accept <code>java.util.Properties</code> instead.
For more details, see <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-245%3A+Use+Properties+instead+of+StreamsConfig+in+KafkaStreams+constructor">KIP-245</a>.
</p>
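<p>
A sketch of the preferred constructor (application id and bootstrap servers are placeholders):
</p>
<pre><code class="language-java">
final Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

// pass the Properties directly instead of wrapping them in a StreamsConfig
final KafkaStreams streams = new KafkaStreams(builder.build(), props);
</code></pre>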

<p>
Kafka 2.0.0 allows you to manipulate the timestamps of output records using the Processor API (<a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-251%3A+Allow+timestamp+manipulation+in+Processor+API">KIP-251</a>).
To enable this new feature, <code>ProcessorContext#forward(...)</code> was modified.
The two existing overloads <code>#forward(Object key, Object value, String childName)</code> and <code>#forward(Object key, Object value, int childIndex)</code> were deprecated and a new overload <code>#forward(Object key, Object value, To to)</code> was added.
The new class <code>To</code> allows you to send records to all or specific downstream processors by name and to set the timestamp for the output record.
Forwarding based on child index is no longer supported in the new API.
</p>
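<p>
A sketch of the new overload inside a processor, assuming a downstream child node named "sink" was added to the topology:
</p>
<pre><code class="language-java">
@Override
public void process(final String key, final String value) {
    // forward to a specific child by name and override the output record's timestamp
    context.forward(key, value,
        To.child("sink").withTimestamp(context.timestamp() + 1000L));
}
</code></pre>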

<p>
We have added support for routing records dynamically to Kafka topics. More specifically, in both the lower-level <code>Topology#addSink</code> and higher-level <code>KStream#to</code> APIs, we have added variants that
take a <code>TopicNameExtractor</code> instance instead of a specific <code>String</code> typed topic name, such that for each record received from the upstream processor, the library will dynamically determine which Kafka topic to write to
based on the record's key and value, as well as the record context. Note that all the Kafka topics that may possibly be used are still considered user topics and hence are required to be pre-created. In addition, we have modified the
<code>StreamPartitioner</code> interface to add the topic name parameter since the topic name may not be known beforehand anymore; users who have customized implementations of this interface need to update their code while upgrading their application
to use Kafka Streams 2.0.0.
</p>
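<p>
A minimal sketch of dynamic routing with the higher-level API, assuming <code>String</code> values and that the destination topics ("alerts" and "events" here) already exist:
</p>
<pre><code class="language-java">
// route each record to a topic derived from its value
stream.to((key, value, recordContext) ->
    value.startsWith("ALERT") ? "alerts" : "events");
</code></pre>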

<p>
<a href="https://cwiki.apache.org/confluence/x/DVyHB">KIP-284</a> changed the retention time for repartition topics by setting its default value to <code>Long.MAX_VALUE</code>.
Instead of relying on data retention, Kafka Streams now uses the purge data API to delete consumed data from those topics and to keep the used storage small.
</p>

<p>
We have modified the <code>ProcessorStateManager#register(...)</code> signature and removed the deprecated <code>loggingEnabled</code> boolean parameter as it is specified in the <code>StoreBuilder</code>.
Users who used this function to register their state stores in the processor topology simply need to update their code and remove this parameter from the call.
</p>

<p>
Kafka Streams DSL for Scala is a new Kafka Streams client library available for developers authoring Kafka Streams applications in Scala. It wraps core Kafka Streams DSL types to make them easier to call when
interoperating with Scala code. For example, it includes higher-order functions as parameters for transformations, avoiding the need for anonymous classes in Java 7 or experimental SAM type conversions in Scala 2.11,
automatic conversion between Java and Scala collection types, a way
to implicitly provide SerDes to reduce boilerplate from your application and make it more typesafe, and more! For more information see the
<a href="/{{version}}/documentation/streams/developer-guide/dsl-api.html#scala-dsl">Kafka Streams DSL for Scala documentation</a> and
<a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-270+-+A+Scala+Wrapper+Library+for+Kafka+Streams">KIP-270</a>.
</p>

<p>
We have removed these deprecated APIs:
</p>

<ul>
<li><code>KafkaStreams#toString</code> no longer returns the topology and runtime metadata; to get topology metadata users can call <code>Topology#describe()</code> and to get thread runtime metadata users can call <code>KafkaStreams#localThreadsMetadata</code> (they are deprecated since 1.0.0).
For detailed guidance on how to update your code please read <a href="#streams_api_changes_100">here</a></li>
<li><code>TopologyBuilder</code> and <code>KStreamBuilder</code> are removed and replaced by <code>Topology</code> and <code>StreamsBuilder</code> respectively (they are deprecated since 1.0.0).
For detailed guidance on how to update your code please read <a href="#streams_api_changes_100">here</a></li>
<li><code>StateStoreSupplier</code> is removed and replaced with <code>StoreBuilder</code> (it is deprecated since 1.0.0);
the corresponding <code>Stores#create</code> and the <code>KStream, KTable, KGroupedStream</code> overloaded functions that use it have also been removed.
For detailed guidance on how to update your code please read <a href="#streams_api_changes_100">here</a></li>
<li><code>KStream, KTable, KGroupedStream</code> overloaded functions that require serdes and other specifications explicitly are removed and replaced with simpler overloaded functions that use <code>Consumed, Produced, Serialized, Materialized, Joined</code> (they are deprecated since 1.0.0).
For detailed guidance on how to update your code please read <a href="#streams_api_changes_100">here</a></li>
<li><code>Processor#punctuate</code>, <code>ValueTransformer#punctuate</code>, <code>Transformer#punctuate</code> and <code>ProcessorContext#schedule(long)</code> are removed and replaced by <code>ProcessorContext#schedule(long, PunctuationType, Punctuator)</code> (they are deprecated since 1.0.0). </li>
<li>The second <code>boolean</code> typed parameter "loggingEnabled" in <code>ProcessorContext#register</code> has been removed; users can now use <code>StoreBuilder#withLoggingEnabled, withLoggingDisabled</code> to specify the behavior when they create the state store. </li>
<li><code>KTable#writeAsText, print, foreach, to, through</code> are removed; users can call <code>KTable#toStream()#writeAsText</code> instead for the same purpose (they are deprecated since 0.11.0.0).
For a detailed list of removed APIs please read <a href="#streams_api_changes_0110">here</a></li>
<li><code>StreamsConfig#KEY_SERDE_CLASS_CONFIG, VALUE_SERDE_CLASS_CONFIG, TIMESTAMP_EXTRACTOR_CLASS_CONFIG</code> are removed and replaced with <code>StreamsConfig#DEFAULT_KEY_SERDE_CLASS_CONFIG, DEFAULT_VALUE_SERDE_CLASS_CONFIG, DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG</code> respectively (they are deprecated since 0.11.0.0). </li>
<li><code>StreamsConfig#ZOOKEEPER_CONNECT_CONFIG</code> is removed as Streams no longer needs a ZooKeeper dependency (it is deprecated since 0.10.2.0). </li>
</ul>

<h3><a id="streams_api_changes_110" href="#streams_api_changes_110">Streams API changes in 1.1.0</a></h3>
<p>
We have added methods to <code>ReadOnlyWindowStore</code> that allow querying <code>WindowStore</code>s without having to provide keys.
Users who have customized window store implementations of this interface need to update their code to implement the newly added methods as well.
For more details, see <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-205%3A+Add+all%28%29+and+range%28%29+API+to+ReadOnlyWindowStore">KIP-205</a>.
</p>
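<p>
A sketch of a key-less scan over a windowed store, again assuming a queryable windowed store named "counts-window-store":
</p>
<pre><code class="language-java">
final ReadOnlyWindowStore<String, Long> store =
    streams.store("counts-window-store", QueryableStoreTypes.windowStore());

// KIP-205: iterate over all windows of all keys without specifying a key
try (final KeyValueIterator<Windowed<String>, Long> all = store.all()) {
    while (all.hasNext()) {
        final KeyValue<Windowed<String>, Long> next = all.next();
        System.out.println(next.key.key() + "@" + next.key.window().start() + " -> " + next.value);
    }
}
</code></pre>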

<p>
There is a new artifact <code>kafka-streams-test-utils</code> providing a <code>TopologyTestDriver</code>, <code>ConsumerRecordFactory</code>, and <code>OutputVerifier</code> class.
You can include the new artifact as a regular dependency in your unit tests and use the test driver to test the business logic of your Kafka Streams application.
For more details, see <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-247%3A+Add+public+test+utils+for+Kafka+Streams">KIP-247</a>.
</p>
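<p>
A minimal sketch of a unit test using the new artifact (topic names and the topology itself are assumed to come from your application code):
</p>
<pre><code class="language-java">
final Properties config = new Properties();
config.put(StreamsConfig.APPLICATION_ID_CONFIG, "unit-test");
config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234"); // never actually connected to

final TopologyTestDriver driver = new TopologyTestDriver(topology, config);

final ConsumerRecordFactory<String, String> factory =
    new ConsumerRecordFactory<>("input-topic", new StringSerializer(), new StringSerializer());
driver.pipeInput(factory.create("input-topic", "key", "value"));

final ProducerRecord<String, String> output =
    driver.readOutput("output-topic", new StringDeserializer(), new StringDeserializer());
OutputVerifier.compareKeyValue(output, "key", "value");

driver.close();
</code></pre>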

<p>
The introduction of <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-220%3A+Add+AdminClient+into+Kafka+Streams%27+ClientSupplier">KIP-220</a>
enables you to provide configuration parameters for the embedded admin client created by Kafka Streams, similar to the embedded producer and consumer clients.
You can provide these configs via <code>StreamsConfig</code> by adding them with the prefix <code>admin.</code> as defined by <code>StreamsConfig#adminClientPrefix(String)</code>
to distinguish them from configurations of other clients that share the same config names.
</p>
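<p>
A sketch of overriding one admin client setting (the retries value is arbitrary):
</p>
<pre><code class="language-java">
final Properties props = new Properties();
// prefixed with "admin." so it only applies to the embedded admin client
props.put(StreamsConfig.adminClientPrefix(AdminClientConfig.RETRIES_CONFIG), 10);
</code></pre>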

<p>
New method in <code>KTable</code>:
</p>
<ul>
<li> <code>transformValues</code> methods have been added to <code>KTable</code>. Similar to those on <code>KStream</code>, these methods allow for richer, stateful value transformation similar to the Processor API.</li>
</ul>

<p>
New method in <code>GlobalKTable</code>:
</p>
<ul>
<li> A method has been provided that returns the store name associated with the <code>GlobalKTable</code>, or <code>null</code> if the store name is non-queryable. </li>
</ul>

<p>
New methods in <code>KafkaStreams</code>:
</p>
<ul>
<li> added an overload for the constructor that allows overriding the <code>Time</code> object used for tracking system wall-clock time; this is useful for unit testing your application code. </li>
</ul>

<p> New methods in <code>KafkaClientSupplier</code>: </p>
<ul>
<li> added <code>getAdminClient(config)</code> that allows overriding the <code>AdminClient</code> used for administrative requests such as internal topic creation. </li>
</ul>

<p>New error handling for exceptions during production:</p>
<ul>
<li>added interface <code>ProductionExceptionHandler</code> that allows implementors to decide whether Streams should <code>FAIL</code> or <code>CONTINUE</code> when certain exceptions occur while trying to produce (see the sketch below).</li>
<li>provided a default implementation, <code>DefaultProductionExceptionHandler</code>, that always fails, preserving the existing behavior.</li>
<li>changing which implementation is used can be done by setting <code>default.production.exception.handler</code> to the fully qualified name of a class implementing this interface.</li>
</ul>
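<p>
As referenced above, a sketch of a custom handler that skips oversized records but fails on everything else (the class name and the decision to skip only <code>RecordTooLargeException</code> are assumptions, not part of the library):
</p>
<pre><code class="language-java">
public class IgnoreRecordTooLargeHandler implements ProductionExceptionHandler {

    @Override
    public void configure(final Map<String, ?> configs) {}

    @Override
    public ProductionExceptionHandlerResponse handle(final ProducerRecord<byte[], byte[]> record,
                                                     final Exception exception) {
        // drop the record and keep processing only for this specific error
        if (exception instanceof RecordTooLargeException) {
            return ProductionExceptionHandlerResponse.CONTINUE;
        }
        return ProductionExceptionHandlerResponse.FAIL;
    }
}
</code></pre>
<p>
The handler is then enabled by setting <code>default.production.exception.handler</code> to the fully qualified name of this class in the Streams configuration.
</p>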

<p> Changes in <code>StreamsResetter</code>: </p>
<ul>
<li> added options to specify input topic offsets to reset, according to <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-171+-+Extend+Consumer+Group+Reset+Offset+for+Stream+Application">KIP-171</a></li>
</ul>

<h3><a id="streams_api_changes_100" href="#streams_api_changes_100">Streams API changes in 1.0.0</a></h3>

<p>
With 1.0 a major API refactoring was accomplished and the new API is cleaner and easier to use.
This change includes the five main classes <code>KafkaStreams</code>, <code>KStreamBuilder</code>,
<code>KStream</code>, <code>KTable</code>, and <code>TopologyBuilder</code> (and a few others).
All changes are fully backward compatible as the old API is only deprecated but not removed.
We recommend moving to the new API as soon as you can.
We summarize all API changes in the following paragraphs.
</p>

<p>
The two main classes to specify a topology via the DSL (<code>KStreamBuilder</code>)
or the Processor API (<code>TopologyBuilder</code>) were deprecated and replaced by
<code>StreamsBuilder</code> and <code>Topology</code> (both new classes are located in
package <code>org.apache.kafka.streams</code>).
Note that <code>StreamsBuilder</code> does not extend <code>Topology</code>, i.e.,
the class hierarchy is now different.
The new classes have basically the same methods as the old ones to build a topology via DSL or Processor API.
However, some internal methods that were public in <code>KStreamBuilder</code>
and <code>TopologyBuilder</code> but not part of the actual API are no longer present
in the new classes.
Furthermore, some overloads were simplified compared to the original classes.
See <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-120%3A+Cleanup+Kafka+Streams+builder+API">KIP-120</a>
and <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-182%3A+Reduce+Streams+DSL+overloads+and+allow+easier+use+of+custom+storage+engines">KIP-182</a>
for full details.
</p>

<p>
Changing how a topology is specified also affects the <code>KafkaStreams</code> constructors,
which now only accept a <code>Topology</code>.
Using the DSL builder class <code>StreamsBuilder</code>, one can get the constructed
<code>Topology</code> via <code>StreamsBuilder#build()</code>.
Additionally, a new class <code>org.apache.kafka.streams.TopologyDescription</code>
(and some dependent classes) was added.
It can be used to get a detailed description of the specified topology
and is obtained by calling <code>Topology#describe()</code>.
An example using this new API is shown in the <a href="/{{version}}/documentation/streams/quickstart">quickstart section</a>.
</p>
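<p>
A minimal sketch of the new pattern (topic names and configuration are placeholders):
</p>
<pre><code class="language-java">
final StreamsBuilder builder = new StreamsBuilder();
builder.stream("input-topic").to("output-topic");

final Topology topology = builder.build();
System.out.println(topology.describe()); // prints the TopologyDescription

final KafkaStreams streams = new KafkaStreams(topology, props);
streams.start();
</code></pre>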

<p>
New methods in <code>KStream</code>:
</p>
<ul>
<li>With the introduction of <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-202+Move+merge%28%29+from+StreamsBuilder+to+KStream">KIP-202</a>
a new method <code>merge()</code> has been created in <code>KStream</code> as <code>StreamsBuilder#merge()</code> has been removed.
The method signature was also changed: instead of providing multiple <code>KStream</code>s to the method at once, only a single <code>KStream</code> is accepted (see the sketch below).
</li>
</ul>
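<p>
A sketch of merging two streams with the relocated method:
</p>
<pre><code class="language-java">
// instead of builder.merge(stream1, stream2)
final KStream<String, String> merged = stream1.merge(stream2);
</code></pre>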

<p>
New methods in <code>KafkaStreams</code>:
</p>
<ul>
<li>retrieve the current runtime information about the local threads via <code>localThreadsMetadata()</code> </li>
<li>observe the restoration of all state stores via <code>setGlobalStateRestoreListener()</code>, in which users can provide their customized implementation of the <code>org.apache.kafka.streams.processor.StateRestoreListener</code> interface</li>
</ul>

<p>
Deprecated / modified methods in <code>KafkaStreams</code>:
</p>
<ul>
<li>
<code>toString()</code> and <code>toString(final String indent)</code> were previously used to return static and runtime information.
They have been deprecated in favor of using the new classes/methods <code>localThreadsMetadata()</code> / <code>ThreadMetadata</code> (returning runtime information) and
<code>TopologyDescription</code> / <code>Topology#describe()</code> (returning static information).
</li>
<li>
With the introduction of <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-182%3A+Reduce+Streams+DSL+overloads+and+allow+easier+use+of+custom+storage+engines">KIP-182</a>
you should no longer pass in <code>Serde</code> to <code>KStream#print</code> operations.
If you cannot rely on using <code>toString</code> to print your keys and values, you should instead provide a custom <code>KeyValueMapper</code> via the <code>Printed#withKeyValueMapper</code> call.
</li>
<li>
<code>setStateListener()</code> can now only be set before the application starts running, i.e. before <code>KafkaStreams.start()</code> is called.
</li>
</ul>

<p>
Deprecated methods in <code>KGroupedStream</code>:
</p>
<ul>
<li>
Windowed aggregations have been deprecated from <code>KGroupedStream</code> and moved to <code>WindowedKStream</code>.
You can now perform a windowed aggregation by, for example, using <code>KGroupedStream#windowedBy(Windows)#reduce(Reducer)</code> (see the sketch below).
</li>
</ul>
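<p>
A sketch of the new call chain (the 1-minute window size is arbitrary):
</p>
<pre><code class="language-java">
final KTable<Windowed<String>, String> reduced = stream
    .groupByKey()
    .windowedBy(TimeWindows.of(60_000L)) // replaces the deprecated windowed reduce on KGroupedStream
    .reduce((oldValue, newValue) -> newValue);
</code></pre>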

<p>
Modified methods in <code>Processor</code>:
</p>
<ul>
<li>
<p>
The Processor API was extended to allow users to schedule <code>punctuate</code> functions either based on data-driven <b>stream time</b> or wall-clock time.
As a result, the original <code>ProcessorContext#schedule</code> is deprecated in favor of a new overloaded function that accepts a user-customizable <code>Punctuator</code> callback interface, which triggers its <code>punctuate</code> API method periodically based on the <code>PunctuationType</code>.
The <code>PunctuationType</code> determines what notion of time is used for the punctuation scheduling: either <a href="/{{version}}/documentation/streams/core-concepts#streams_time">stream time</a> or wall-clock time (by default, <b>stream time</b> is configured to represent event time via <code>TimestampExtractor</code>).
In addition, the <code>punctuate</code> function inside <code>Processor</code> is also deprecated.
</p>
<p>
Before this, users could only schedule based on stream time (i.e. <code>PunctuationType.STREAM_TIME</code>) and hence the <code>punctuate</code> function was data-driven only, because stream time is determined (and advanced forward) by the timestamps derived from the input data.
If there is no data arriving at the processor, the stream time would not advance and hence punctuation would not be triggered.
On the other hand, when wall-clock time (i.e. <code>PunctuationType.WALL_CLOCK_TIME</code>) is used, <code>punctuate</code> is triggered purely based on wall-clock time.
For example, if a <code>Punctuator</code> function is scheduled every 10 seconds based on <code>PunctuationType.WALL_CLOCK_TIME</code> and a batch of 60 records is processed within 20 seconds,
<code>punctuate</code> would be called 2 times (once every 10 seconds);
if those 60 records were processed within 5 seconds, then no <code>punctuate</code> would be called at all.
Users can schedule multiple <code>Punctuator</code> callbacks with different <code>PunctuationType</code>s within the same processor by simply calling <code>ProcessorContext#schedule</code> multiple times inside the processor's <code>init()</code> method (see the sketch after this list).
</p>
</li>
</ul>
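<p>
A sketch of scheduling both punctuation types inside a processor's <code>init()</code> method (the intervals and the forwarded values are illustrative):
</p>
<pre><code class="language-java">
@Override
public void init(final ProcessorContext context) {
    this.context = context;

    // fires every 10 seconds of stream time, i.e. only while data is flowing
    context.schedule(10_000L, PunctuationType.STREAM_TIME,
        timestamp -> context.forward("stream-time-tick", timestamp));

    // fires every 10 seconds of wall-clock time, independent of incoming data
    context.schedule(10_000L, PunctuationType.WALL_CLOCK_TIME,
        timestamp -> context.forward("wall-clock-tick", timestamp));
}
</code></pre>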

<p>
If you are monitoring task-level or processor-node / state-store-level Streams metrics, please note that the metrics sensor name and hierarchy were changed:
the task ids, store names and processor names are no longer in the sensor metrics names, but instead are added as tags of the sensors to achieve a consistent metrics hierarchy.
As a result, you may need to make corresponding changes to your metrics reporting and monitoring tools when upgrading to 1.0.0.
Detailed metrics sensors can be found in the <a href="#kafka_streams_monitoring">Streams Monitoring</a> section.
</p>

<p>
The introduction of <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+deserialization+exception+handlers">KIP-161</a>
enables you to provide a default exception handler for deserialization errors when reading data from Kafka, rather than throwing the exception all the way out of your streams application.
You can set the handler via <code>StreamsConfig</code> using <code>StreamsConfig#DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG</code>.
The specified handler must implement the <code>org.apache.kafka.streams.errors.DeserializationExceptionHandler</code> interface.
</p>
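<p>
A sketch of configuring the built-in log-and-continue handler:
</p>
<pre><code class="language-java">
final Properties props = new Properties();
props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
          LogAndContinueExceptionHandler.class);
</code></pre>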

<p>
The introduction of <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-173%3A+Add+prefix+to+StreamsConfig+to+enable+setting+default+internal+topic+configs">KIP-173</a>
enables you to provide topic configuration parameters for any topics created by Kafka Streams.
This includes repartition and changelog topics.
You can provide these configs via <code>StreamsConfig</code> by adding them with the prefix as defined by <code>StreamsConfig#topicPrefix(String)</code>.
Any properties in the <code>StreamsConfig</code> with the prefix will be applied when creating internal topics.
Any configs that aren't topic configs will be ignored.
If you already use <code>StateStoreSupplier</code> or <code>Materialized</code> to provide configs for changelogs, then they will take precedence over those supplied in the config.
</p>
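<p>
A sketch of overriding one topic-level setting for all internal topics (the segment size is arbitrary):
</p>
<pre><code class="language-java">
final Properties props = new Properties();
// prefixed with "topic." so it is applied when Streams creates repartition and changelog topics
props.put(StreamsConfig.topicPrefix(TopicConfig.SEGMENT_BYTES_CONFIG), 512 * 1024 * 1024);
</code></pre>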

<h3><a id="streams_api_changes_0110" href="#streams_api_changes_0110">Streams API changes in 0.11.0.0</a></h3>

<p> Updates in <code>StreamsConfig</code>: </p>
<ul>
<li> new configuration parameter <code>processing.guarantee</code> is added </li>
<li> configuration parameter <code>key.serde</code> was deprecated and replaced by <code>default.key.serde</code> </li>
<li> configuration parameter <code>value.serde</code> was deprecated and replaced by <code>default.value.serde</code> </li>
<li> configuration parameter <code>timestamp.extractor</code> was deprecated and replaced by <code>default.timestamp.extractor</code> </li>
<li> method <code>keySerde()</code> was deprecated and replaced by <code>defaultKeySerde()</code> </li>
<li> method <code>valueSerde()</code> was deprecated and replaced by <code>defaultValueSerde()</code> </li>
<li> new method <code>defaultTimestampExtractor()</code> was added </li>
</ul>

<p> New methods in <code>TopologyBuilder</code>: </p>
<ul>
<li> added overloads for <code>addSource()</code> that allow defining a <code>TimestampExtractor</code> per source node </li>
<li> added overloads for <code>addGlobalStore()</code> that allow defining a <code>TimestampExtractor</code> per source node associated with the global store </li>
</ul>

<p> New methods in <code>KStreamBuilder</code>: </p>
<ul>
<li> added overloads for <code>stream()</code> that allow defining a <code>TimestampExtractor</code> per input stream </li>
<li> added overloads for <code>table()</code> that allow defining a <code>TimestampExtractor</code> per input table </li>
<li> added overloads for <code>globalKTable()</code> that allow defining a <code>TimestampExtractor</code> per global table </li>
</ul>

<p> Deprecated methods in <code>KTable</code>: </p>
<ul>
<li> <code>void foreach(final ForeachAction<? super K, ? super V> action)</code> </li>
<li> <code>void print()</code> </li>
<li> <code>void print(final String streamName)</code> </li>
<li> <code>void print(final Serde<K> keySerde, final Serde<V> valSerde)</code> </li>
<li> <code>void print(final Serde<K> keySerde, final Serde<V> valSerde, final String streamName)</code> </li>
<li> <code>void writeAsText(final String filePath)</code> </li>
<li> <code>void writeAsText(final String filePath, final String streamName)</code> </li>
<li> <code>void writeAsText(final String filePath, final Serde<K> keySerde, final Serde<V> valSerde)</code> </li>
<li> <code>void writeAsText(final String filePath, final String streamName, final Serde<K> keySerde, final Serde<V> valSerde)</code> </li>
</ul>

<p>
The above methods have been deprecated in favor of using the Interactive Queries API.
If you want to query the current content of the state store backing the KTable, use the following approach:
</p>
<ul>
<li> Make a call to <code>KafkaStreams.store(final String storeName, final QueryableStoreType<T> queryableStoreType)</code> </li>
<li> Then make a call to <code>ReadOnlyKeyValueStore.all()</code> to iterate over the keys of a <code>KTable</code> (see the sketch below). </li>
</ul>
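<p>
A sketch of that approach, assuming a queryable store named "ktable-store":
</p>
<pre><code class="language-java">
final ReadOnlyKeyValueStore<String, Long> store =
    streams.store("ktable-store", QueryableStoreTypes.keyValueStore());

final KeyValueIterator<String, Long> iterator = store.all();
while (iterator.hasNext()) {
    final KeyValue<String, Long> entry = iterator.next();
    System.out.println(entry.key + " -> " + entry.value);
}
iterator.close(); // always close iterators to release the underlying resources
</code></pre>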
<p>
If you want to view the changelog stream of the <code>KTable</code> then you could call <code>KTable.toStream().print(Printed.toSysOut())</code>.
</p>

<p> Metrics using exactly-once semantics: </p>
<p>
If exactly-once processing is enabled via the <code>processing.guarantee</code> parameter, internally Streams switches from a producer-per-thread to a producer-per-task runtime model.
In order to distinguish the different producers, the producer's <code>client.id</code> additionally encodes the task-ID for this case.
Because the producer's <code>client.id</code> is used to report JMX metrics, it might be required to update tools that receive those metrics.
</p>

<p> Producer's <code>client.id</code> naming schema: </p>
<ul>
<li> at-least-once (default): <code>[client.Id]-StreamThread-[sequence-number]</code> </li>
<li> exactly-once: <code>[client.Id]-StreamThread-[sequence-number]-[taskId]</code> </li>
</ul>
<p> <code>[client.Id]</code> is either set via the Streams configuration parameter <code>client.id</code> or defaults to <code>[application.id]-[processId]</code> (<code>[processId]</code> is a random UUID). </p>

<h3><a id="streams_api_changes_01021" href="#streams_api_changes_01021">Notable changes in 0.10.2.1</a></h3>

<p>
Parameter updates in <code>StreamsConfig</code>:
</p>
<ul>
<li> The default config values of embedded producer's <code>retries</code> and consumer's <code>max.poll.interval.ms</code> have been changed to improve the resiliency of a Kafka Streams application </li>
</ul>

<h3><a id="streams_api_changes_0102" href="#streams_api_changes_0102">Streams API changes in 0.10.2.0</a></h3>

<p>
New methods in <code>KafkaStreams</code>:
</p>
<ul>
<li> set a listener to react to application state changes via <code>setStateListener(StateListener listener)</code> </li>
<li> retrieve the current application state via <code>state()</code> </li>
<li> retrieve the global metrics registry via <code>metrics()</code> </li>
<li> apply a timeout when closing an application via <code>close(long timeout, TimeUnit timeUnit)</code> </li>
<li> specify a custom indent when retrieving Kafka Streams information via <code>toString(String indent)</code> </li>
</ul>

<p>
Parameter updates in <code>StreamsConfig</code>:
</p>
<ul>
<li> parameter <code>zookeeper.connect</code> was deprecated; a Kafka Streams application no longer interacts with ZooKeeper for topic management but uses the new broker admin protocol
(cf. <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-TopicAdminSchema.1">KIP-4, Section "Topic Admin Schema"</a>) </li>
<li> added many new parameters for metrics, security, and client configurations </li>
</ul>

<p> Changes in <code>StreamsMetrics</code> interface: </p>
<ul>
<li> removed methods: <code>addLatencySensor()</code> </li>
<li> added methods: <code>addLatencyAndThroughputSensor()</code>, <code>addThroughputSensor()</code>, <code>recordThroughput()</code>,
<code>addSensor()</code>, <code>removeSensor()</code> </li>
</ul>

<p> New methods in <code>TopologyBuilder</code>: </p>
<ul>
<li> added overloads for <code>addSource()</code> that allow defining an <code>auto.offset.reset</code> policy per source node </li>
<li> added methods <code>addGlobalStore()</code> to add global <code>StateStore</code>s </li>
</ul>

<p> New methods in <code>KStreamBuilder</code>: </p>
<ul>
<li> added overloads for <code>stream()</code> and <code>table()</code> that allow defining an <code>auto.offset.reset</code> policy per input stream/table </li>
<li> added method <code>globalKTable()</code> to create a <code>GlobalKTable</code> </li>
</ul>

<p> New joins for <code>KStream</code>: </p>
<ul>
<li> added overloads for <code>join()</code> to join with <code>KTable</code> </li>
<li> added overloads for <code>join()</code> and <code>leftJoin()</code> to join with <code>GlobalKTable</code> </li>
<li> note, join semantics in 0.10.2 were improved and thus you might see different results compared to 0.10.0.x and 0.10.1.x
(cf. <a href="https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Join+Semantics">Kafka Streams Join Semantics</a> in the Apache Kafka wiki) </li>
</ul>

<p> Aligned <code>null</code>-key handling for <code>KTable</code> joins: </p>
<ul>
<li> like all other KTable operations, <code>KTable-KTable</code> joins do not throw an exception on <code>null</code> key records anymore, but drop those records silently </li>
</ul>

<p> New window type <em>Session Windows</em>: </p>
<ul>
<li> added class <code>SessionWindows</code> to specify session windows </li>
<li> added overloads for <code>KGroupedStream</code> methods <code>count()</code>, <code>reduce()</code>, and <code>aggregate()</code>
to allow session window aggregations (see the sketch below) </li>
</ul>
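<p>
A sketch of a session-windowed count on the 0.10.2 API (the 5-minute inactivity gap and the store name are arbitrary, and the exact overload is an assumption based on the list above):
</p>
<pre><code class="language-java">
final KTable<Windowed<String>, Long> sessionCounts = stream
    .groupByKey()
    .count(SessionWindows.with(5 * 60 * 1000L), "session-counts-store");
</code></pre>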

<p> Changes to <code>TimestampExtractor</code>: </p>
<ul>
<li> method <code>extract()</code> has a second parameter now </li>
<li> new default timestamp extractor class <code>FailOnInvalidTimestamp</code>
(it gives the same behavior as the old (and removed) default extractor <code>ConsumerRecordTimestampExtractor</code>) </li>
<li> new alternative timestamp extractor classes <code>LogAndSkipOnInvalidTimestamp</code> and <code>UsePreviousTimeOnInvalidTimestamp</code> </li>
</ul>

<p> Relaxed type constraints of many DSL interfaces, classes, and methods (cf. <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-100+-+Relax+Type+constraints+in+Kafka+Streams+API">KIP-100</a>). </p>

<h3><a id="streams_api_changes_0101" href="#streams_api_changes_0101">Streams API changes in 0.10.1.0</a></h3>

<p> Stream grouping and aggregation split into two methods: </p>
<ul>
<li> old: KStream #aggregateByKey(), #reduceByKey(), and #countByKey() </li>
<li> new: KStream#groupByKey() plus KGroupedStream #aggregate(), #reduce(), and #count() </li>
<li> Example: stream.countByKey() changes to stream.groupByKey().count() </li>
</ul>

<p> Auto Repartitioning: </p>
<ul>
<li> a call to through() after a key-changing operator and before an aggregation/join is no longer required </li>
<li> Example: stream.selectKey(...).through(...).countByKey() changes to stream.selectKey().groupByKey().count() </li>
</ul>

<p> TopologyBuilder: </p>
<ul>
<li> methods #sourceTopics(String applicationId) and #topicGroups(String applicationId) got simplified to #sourceTopics() and #topicGroups() </li>
</ul>

<p> DSL: new parameter to specify state store names: </p>
<ul>
<li> The new Interactive Queries feature requires specifying a store name for all source KTables and window aggregation result KTables (the previous parameter "operator/window name" is now the storeName) </li>
<li> KStreamBuilder#table(String topic) changes to #table(String topic, String storeName) </li>
<li> KTable#through(String topic) changes to #through(String topic, String storeName) </li>
<li> KGroupedStream #aggregate(), #reduce(), and #count() require additional parameter "String storeName"</li>
<li> Example: stream.countByKey(TimeWindows.of("windowName", 1000)) changes to stream.groupByKey().count(TimeWindows.of(1000), "countStoreName") </li>
</ul>

<p> Windowing: </p>
<ul>
<li> Windows are not named anymore: TimeWindows.of("name", 1000) changes to TimeWindows.of(1000) (cf. DSL: new parameter to specify state store names) </li>
<li> JoinWindows has no default size anymore: JoinWindows.of("name").within(1000) changes to JoinWindows.of(1000) </li>
</ul>

<div class="pagination">
<a href="/{{version}}/documentation/streams/developer-guide/app-reset-tool" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="#" class="pagination__btn pagination__btn__next pagination__btn--disabled">Next</a>
</div>
</script>

<!--#include virtual="../../includes/_header.htm" -->
<!--#include virtual="../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../includes/_footer.htm" -->
<script>
$(function() {
  // Show selected style on nav item
  $('.b-nav__streams').addClass('selected');

  // Display docs subnav items
  $('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>