{"id":18998,"date":"2020-10-08T13:00:05","date_gmt":"2020-10-08T19:00:05","guid":{"rendered":"https:\/\/www.fullcontact.com\/?p=18998"},"modified":"2025-01-06T04:31:47","modified_gmt":"2025-01-06T11:31:47","slug":"building-a-lambda-architecture-with-druid-and-kafka-streams","status":"publish","type":"post","link":"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/","title":{"rendered":"Building a Lambda Architecture with Druid and Kafka Streams"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">At FullContact, engineers have the opportunity to solve the unique and challenging problems created by a growing Identity Resolution Business. The start to any good solution is researching the tools your team is familiar with, along with the vast array of solutions out in the open-source world. This blog will outline our use of Apache Kafka and Druid and how we added Kafka Streams to the stack in order to solve a new problem.<\/span><\/p>\n<h3><b>The Problem<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">FullContact needs to keep track of every API call and response that a customer makes, along with the types of data returned in each response. This data is used for billing and analytics. Customers need to be able to see how much data they are using, and FullContact needs to ensure that the usage remains within the contracted limits. We needed a system that could track usage for analytics, limiting, billing, and a system that could store the contractual agreements and limitations on each customer account.<\/span><\/p>\n<p>Many of these features were being implemented by a third-party API management solution we were using, but due to scaling challenges and feature limitations, it was time to move on and build our own. 
On the technical side, we needed a system that was scalable and fast.<\/p>\n<h3><b>Last Year\u2019s Solution<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">In order to solve the problem, we chose Kafka and Druid. For a more in-depth look at the solution, you can take a look at our previous <\/span><a href=\"https:\/\/www.fullcontact.com\/blog\/2019\/09\/18\/real-time-analytics-with-apache-druid-at-fullcontact\/\"><span style=\"font-weight: 400;\">meetup talk<\/span><\/a><span style=\"font-weight: 400;\"> and <\/span><a href=\"https:\/\/www.fullcontact.com\/blog\/2019\/09\/05\/our-process-of-deploying-running-and-maintaining-druid\/\"><span style=\"font-weight: 400;\">blog post<\/span><\/a><span style=\"font-weight: 400;\">. At a high level, the solution looks like this:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Each call to a FullContact API results in an Avro usage message sent to Kafka that has the details of each request (any sensitive details are encrypted with a unique key).<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Druid consumes the usage topic for real-time ingestion and querying.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Secor consumes the usage topic for long term archiving to S3 in parquet file format.<\/span><\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-19001\" src=\"https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/flowchart1@2x.png\" alt=\"\" width=\"1600\" height=\"774\" srcset=\"https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/flowchart1@2x.png 1600w, https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/flowchart1@2x-300x145.png 300w, https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/flowchart1@2x-1024x495.png 1024w, https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/flowchart1@2x-768x372.png 768w, 
https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/flowchart1@2x-1536x743.png 1536w\" sizes=\"auto, (max-width: 1600px) 100vw, 1600px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Using the framework above, we were able to provide several useful tools to our customers and internal stakeholders:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">A real-time dashboard that instantly reflects new usage and shows patterns over time.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Powerful internal reporting and insight through the open-source tool <\/span><a href=\"https:\/\/github.com\/allegro\/turnilo\"><span style=\"font-weight: 400;\">Turnilo.<\/span><\/a><\/li>\n<\/ul>\n<h3><b>The New Problem<\/b><span style=\"font-weight: 400;\">\u00a0<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Now that we had a way to keep track of usage and describe the limits for each client, we needed something that would automatically enforce those limits in close to real time.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Here are a few of the requirements that influenced our decision to leverage Kafka Streams:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Adjusting service to each client should happen nearly instantaneously when they reach their specified limit.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Checking limits should be done asynchronously, so that no additional latency or complexity is introduced into the API serving layer.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">The component keeping track of real-time aggregations should be restartable and able to easily restore its previous state.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">We should not overload our existing Druid cluster by querying it for current usage on every API 
request.\u00a0<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">To solve this problem we came up with a solution that resembles a <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Lambda_architecture\"><span style=\"font-weight: 400;\">lambda architecture<\/span><\/a><span style=\"font-weight: 400;\">. In our case, instead of having a batch method and stream method, we have Druid with real-time ingestion for historical aggregation and Kafka Streams for our stream processing and real-time eventing engine.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As a quick introduction, here is the official description of Kafka Streams from its website:<\/span><\/p>\n<blockquote><p><i>&#8220;Kafka Streams is a client library for building applications and microservices, where the input and output data are stored in Kafka clusters. It combines the simplicity of writing and deploying standard Java and Scala applications on the client-side with the benefits of Kafka&#8217;s server-side cluster technology.&#8221;<\/i><\/p><\/blockquote>\n<p><span style=\"font-weight: 400;\">When our Kafka Streams app initially starts up and starts to aggregate the number of usage events for a client, it has no concept of any historical usage that occurred before that time.<\/span><\/p>\n<p>In order to get that view of the world, it queries Druid to return an aggregated count of all usage that occurred since the client&#8217;s contracted start date to \u201cnow\u201d (the current timestamp in the stream where aggregation started). As additional usage rolls in, the streams app continues to update the aggregation and emits new events to downstream topics when the client has reached their usage threshold.<\/p>\n<p>The current aggregated usage number for each client is persisted in Kafka Streams state stores. 
Any subsequent restarts result in automatic recovery of the aggregated counts from the state store instead of a re-query to Druid.<\/p>\n<h3><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-19002\" src=\"https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/flowchart2@2x.png\" alt=\"FullContact Flowchart Process\" width=\"2188\" height=\"580\" srcset=\"https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/flowchart2@2x.png 2188w, https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/flowchart2@2x-300x80.png 300w, https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/flowchart2@2x-1024x271.png 1024w, https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/flowchart2@2x-768x204.png 768w, https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/flowchart2@2x-1536x407.png 1536w, https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/flowchart2@2x-2048x543.png 2048w\" sizes=\"auto, (max-width: 2188px) 100vw, 2188px\" \/><\/h3>\n<p><b>Features in Kafka Streams:<\/b><\/p>\n<p><span style=\"font-weight: 400;\">We made use of a lot of helpful features from Kafka Streams in order to build this solution:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Exactly-once processing helps us ensure we only process a usage message once so we are not overcounting or missing messages even if there are failures or latency in the system.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">RocksDB state stores persist the aggregation results on the local disk and also allow for clean recovery by backing up state to the Kafka broker<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">The Kafka Streams deployment model is incredibly simple. It&#8217;s just a JVM app so it can be deployed like you would any JVM app and doesn&#8217;t need a specialized streaming cluster like Storm, Flink, Spark, etc. 
We used our normal approach of deploying our app as a Docker container managed by Kubernetes.\u00a0<\/span><\/li>\n<\/ul>\n<h3><b>Challenges We Faced<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">As happens when you adopt any new technology and start to scale, we met a few challenges along the way.<\/span><\/p>\n<p><b>Kafka Transactions\u00a0<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Kafka transactions are a feature introduced in <\/span><a href=\"https:\/\/cwiki.apache.org\/confluence\/display\/KAFKA\/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging\"><span style=\"font-weight: 400;\">KIP-98<\/span><\/a><span style=\"font-weight: 400;\"> that Kafka Streams uses to ensure exactly-once processing. As your stream processing topology runs, it commits each transaction. If transactions are not committed in a timely manner, the broker will \u201cfence\u201d the producer (surfacing as a ProducerFencedException) and a rebalance will be triggered. We found this out the hard way when a few parts of our topology had bottlenecks and inefficiencies that sent us into an endless rebalance loop.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Whenever the rebalances started happening, it became difficult to know which stream threads were assigned to which partitions and whether a particular thread was the culprit. 
Here are a few simple scripts we used to help shed light on this:<\/span><\/p>\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li style=\"font-weight: 400;\"><a href=\"https:\/\/gist.github.com\/gazz\/8c4b4307c5f37e0b729bf8db0ac622d5\"><span style=\"font-weight: 400;\">https:\/\/gist.github.com\/gazz\/8c4b4307c5f37e0b729bf8db0ac622d5<\/span><\/a><span style=\"font-weight: 400;\">\u00a0<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<pre>bin\/kafka-consumer-groups --bootstrap-server $MSK_BROKERS --describe --group stream.topic.prod.stream.v3 | .\/parse_partition_assignments.py\r\nHost: \/10.2.69.5, stream.topic.prod.stream.v3-2d800eba-776f-4e8e-801b-6060c706d279\r\n\tThreads:\r\n\t\t1: ['databus.stream.usagetopic.v0', 'stream.topic.cpuc.v2', 'stream.topic.internal.contracts.v0']\r\n\t\t2: ['stream.topic.load.contract.v0']\r\n\t\t3: ['stream.topic.prod.stream.v3-aggregated-usage-repartition']\r\nHost: \/10.2.112.202, stream.topic.prod.stream.v3-e4788e3f-864d-45c5-9174-6168207a4c7f\r\n\tThreads:\r\n\t\t1: ['databus.stream.usagetopic.v0', 'stream.topic.cpuc.v2', 'stream.topic.internal.contracts.v0']\r\n\t\t2: ['stream.topic.load.contract.v0']\r\n\t\t3: ['stream.topic.prod.stream.v3-aggregated-usage-repartition']\r\nHost: \/10.2.112.124, stream.topic.prod.stream.v3-a44ecbbc-0023-49d3-b866-88d8157585e0\r\n\tThreads:\r\n\t\t1: ['databus.stream.usagetopic.v0', 'stream.topic.cost.v2', 'stream.topic.internal.contracts.v0']\r\n\t\t2: ['stream.topic.load.contract.v0']\r\n\t\t3: ['stream.topic.prod.stream.v3-aggregated-usage-repartition']\r\n<\/pre>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\"><b>Topologies and the Monostream<\/b><\/span><\/p>\n<p><span style=\"font-weight: 400;\">Similar to the outcast monoservice, the monostream is what happens when you let your Kafka Stream start to take on too many responsibilities. The topology directed acyclic graph (DAG) that represents the aggregation logic quickly becomes unwieldy. 
This can make it difficult to reason about how data flows through your topology and to determine where the possible bottlenecks and issues are. While the best solution to this is arguably to just keep your stream app simple, visualizing your DAG can often help as well.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The Kafka Streams API has a <\/span><a href=\"https:\/\/kafka.apache.org\/11\/javadoc\/org\/apache\/kafka\/streams\/Topology.html#describe--\"><span style=\"font-weight: 400;\">describe<\/span><\/a><span style=\"font-weight: 400;\"> method whose result, via toString, produces a text description of your DAG.\u00a0<\/span><\/p>\n<p><code>topology.describe().toString();<\/code><\/p>\n<p><span style=\"font-weight: 400;\">You can then take that output and plug it into the online <\/span><a href=\"https:\/\/zz85.github.io\/kafka-streams-viz\/\"><span style=\"font-weight: 400;\">Kafka Viz App<\/span><\/a><span style=\"font-weight: 400;\"> created by Joshua Koo (<\/span><a href=\"https:\/\/github.com\/zz85\"><span style=\"font-weight: 400;\">@zz85<\/span><\/a><span style=\"font-weight: 400;\">).<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Here is an example topology description from a getting-started Udemy course:<\/span><\/p>\n<pre>Topologies:\r\n   Sub-topology: 0\r\n    Source: KSTREAM-SOURCE-0000000000 (topics: [favourite-colour-input])\r\n      --&gt; KSTREAM-FILTER-0000000001\r\n    Processor: KSTREAM-FILTER-0000000001 (stores: [])\r\n      --&gt; KSTREAM-KEY-SELECT-0000000002\r\n      &lt;-- KSTREAM-SOURCE-0000000000\r\n    Processor: KSTREAM-KEY-SELECT-0000000002 (stores: [])\r\n      --&gt; KSTREAM-MAPVALUES-0000000003\r\n      &lt;-- KSTREAM-FILTER-0000000001\r\n    Processor: KSTREAM-MAPVALUES-0000000003 (stores: [])\r\n      --&gt; KSTREAM-FILTER-0000000004\r\n      &lt;-- KSTREAM-KEY-SELECT-0000000002\r\n    Processor: KSTREAM-FILTER-0000000004 (stores: [])\r\n      --&gt; KSTREAM-SINK-0000000005\r\n      &lt;-- KSTREAM-MAPVALUES-0000000003\r\n    Sink: KSTREAM-SINK-0000000005 (topic: 
user-keys-and-colours)\r\n      &lt;-- KSTREAM-FILTER-0000000004\r\n   Sub-topology: 1\r\n    Source: KSTREAM-SOURCE-0000000007 (topics: [user-keys-and-colours])\r\n      --&gt; KTABLE-SOURCE-0000000008\r\n    Processor: KTABLE-SOURCE-0000000008 (stores: [user-keys-and-colours-STATE-STORE-0000000006])\r\n      --&gt; KTABLE-SELECT-0000000009\r\n      &lt;-- KSTREAM-SOURCE-0000000007\r\n    Processor: KTABLE-SELECT-0000000009 (stores: [])\r\n      --&gt; CountsByColours-sink\r\n      &lt;-- KTABLE-SOURCE-0000000008\r\n    Sink: CountsByColours-sink (topic: KTABLE-AGGREGATE-STATE-STORE-0000000010-repartition)\r\n      &lt;-- KTABLE-SELECT-0000000009\r\n   Sub-topology: 2\r\n    Source: CountsByColours-source (topics: [KTABLE-AGGREGATE-STATE-STORE-0000000010-repartition])\r\n      --&gt; CountsByColours\r\n    Processor: CountsByColours (stores: [KTABLE-AGGREGATE-STATE-STORE-0000000010])\r\n      --&gt; favourite-colour-output\r\n      &lt;-- CountsByColours-source\r\n    Processor: favourite-colour-output (stores: [])\r\n      --&gt; none\r\n      &lt;-- CountsByColours\r\n<\/pre>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">When plugged into kafka-viz, it produces a sketch of your topology: <\/span><\/p>\n<p><center><img decoding=\"async\" class=\"alignnone wp-image-19003 size-large\" src=\"https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/output-sketch-diagram@2x.png\" alt=\"\" width=\"200\" height=\"auto\" srcset=\"https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/output-sketch-diagram@2x.png 932w, https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/output-sketch-diagram@2x-50x300.png 50w, https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/output-sketch-diagram@2x-768x4615.png 768w, https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/output-sketch-diagram@2x-341x2048.png 341w\" sizes=\"(max-width: 932px) 100vw, 932px\" \/><\/center><b>Default Kubernetes Deployment Strategy and Thrashing Rebalances<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Kafka Streams apps (and 
normal Kafka Consumer Groups) have an automatic way to handle members of the group coming or going. Whenever a member joins or leaves, processing pauses while a rebalance occurs and Kafka partitions are redistributed among the current members.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">While this is great for handling the occasional crash or restart, it&#8217;s less than ideal when it happens every single time you deploy a new version of your application.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">With Kubernetes Deployments, the default deployment strategy is a <\/span><a href=\"https:\/\/kubernetes.io\/docs\/concepts\/workloads\/controllers\/deployment\/#rolling-update-deployment\"><span style=\"font-weight: 400;\">RollingUpdate<\/span><\/a><span style=\"font-weight: 400;\">. This deployment strategy ensures that new instances of the application are added one by one, and old instances are killed one by one only after each new instance declares itself healthy. When you\u2019re running a REST service that always needs to respond to traffic, this is a great way to ensure you always have a minimum number of healthy apps to serve traffic. When you\u2019re deploying a new version of your Kafka Streams app, it is a recipe for pain, as the rebalance process occurs during <\/span><i><span style=\"font-weight: 400;\">every single step<\/span><\/i><span style=\"font-weight: 400;\"> of the above process.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">What we really want when deploying a streaming application is to cleanly kill all the old instances of the service, then add all of the new instances of the service at the same time, allowing them to rebalance once. 
Luckily, Kubernetes lets us do this by specifying <\/span><a href=\"https:\/\/kubernetes.io\/docs\/concepts\/workloads\/controllers\/deployment\/#recreate-deployment\"><span style=\"font-weight: 400;\">Recreate<\/span><\/a><span style=\"font-weight: 400;\"> as the deployment strategy:<\/span><\/p>\n<p><code>.spec.strategy.type==Recreate<\/code><\/p>\n<h3><b>Recap and the Future of Our Streaming Data Journey<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">As a quick recap, we started out simply wanting to capture all of our API usage and use it for analytics. Druid and vanilla Kafka do that nearly out of the box. When we needed to build a real-time eventing system that reacted continuously based on updated aggregation counts, we chose Kafka Streams for the job. While it worked well, it did take a bit of learning, and we most likely inadvertently created a bit of a monostream that we will be forced to continue to maintain or refactor. But hey, it does the job!<\/span><\/p>\n<p><span style=\"font-weight: 400;\">When we aren\u2019t busy maintaining or refactoring, here are a few tools we would like to spend more time learning about and applying to future problems if they fit.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">New features in <\/span><a href=\"https:\/\/github.com\/apache\/druid\/releases\/tag\/druid-0.19.0\"><span style=\"font-weight: 400;\">Druid 0.19 <\/span><\/a><span style=\"font-weight: 400;\">&#8211; we have been running Druid 0.16 for a little longer than ideal and look forward to new features like joins, vectorized queries, and more!<\/span><\/p>\n<p><a href=\"https:\/\/ksqldb.io\/\"><span style=\"font-weight: 400;\">ksqlDB<\/span><\/a><span style=\"font-weight: 400;\"> &#8211; provides a database-like API over Kafka streams and KTables<\/span><\/p>\n<p><a href=\"https:\/\/pulsar.apache.org\/\"><span style=\"font-weight: 400;\">Apache Pulsar <\/span><\/a><span style=\"font-weight: 400;\">&#8211; cloud-native distributed messaging 
platform alternative to Kafka which has its own concept of stream processing (Pulsar Function).\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Thanks for reading, and keep an eye out for new learnings, hackathons, and blogs around these in the future.\u00a0<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>At FullContact, engineers have the opportunity to solve the unique and challenging problems created by a growing Identity Resolution Business. The start to any good solution is researching the tools your team is familiar with, along with the vast array of solutions out in the open-source world. This blog will outline our use of Apache [&hellip;]<\/p>\n","protected":false},"author":91,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_improvement_type_select":"improve_an_existing","_thumb_yes_seoaic":false,"_frame_yes_seoaic":false,"seoaic_generate_description":"","seoaic_improve_instructions_prompt":"","seoaic_rollback_content_improvement":"","seoaic_idea_thumbnail_generator":"","thumbnail_generated":false,"thumbnail_generate_prompt":"","seoaic_article_description":"","seoaic_article_subtitles":[],"footnotes":""},"categories":[656],"tags":[5912,5898,5899,611,612,281],"class_list":["post-18998","post","type-post","status-publish","format-standard","hentry","category-engineering","tag-kubernetes-deployments","tag-kafka-streams","tag-apache-kafka","tag-druid","tag-fullcontact-engineering","tag-engineering"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.1 (Yoast SEO v27.1.1) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Building a Lambda Architecture with Druid and Kafka Streams | FullContact<\/title>\n<meta name=\"description\" content=\"At FullContact, engineers have the opportunity to solve the unique and challenging problems created by a growing Identity Resolution Business. 
The start\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Building a Lambda Architecture with Druid and Kafka Streams\" \/>\n<meta property=\"og:description\" content=\"At FullContact, engineers have the opportunity to solve the unique and challenging problems created by a growing Identity Resolution Business. The start\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/\" \/>\n<meta property=\"og:site_name\" content=\"FullContact\" \/>\n<meta property=\"article:published_time\" content=\"2020-10-08T19:00:05+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-01-06T11:31:47+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/flowchart1@2x.png\" \/>\n<meta name=\"author\" content=\"Jeremy Plichta\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@fullcontact\" \/>\n<meta name=\"twitter:site\" content=\"@fullcontact\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Jeremy Plichta\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/\"},\"author\":{\"name\":\"Jeremy Plichta\",\"@id\":\"https:\/\/www.fullcontact.com\/#\/schema\/person\/72c64648c1d628849506fa594a520e11\"},\"headline\":\"Building a Lambda Architecture with Druid and Kafka Streams\",\"datePublished\":\"2020-10-08T19:00:05+00:00\",\"dateModified\":\"2025-01-06T11:31:47+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/\"},\"wordCount\":1630,\"publisher\":{\"@id\":\"https:\/\/www.fullcontact.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/flowchart1@2x.png\",\"keywords\":[\"Kubernetes Deployments\",\"Kafka Streams\",\"Apache Kafka\",\"Druid\",\"FullContact Engineering\",\"engineering\"],\"articleSection\":[\"Engineering\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/\",\"url\":\"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/\",\"name\":\"Building a Lambda Architecture with Druid and Kafka Streams | 
FullContact\",\"isPartOf\":{\"@id\":\"https:\/\/www.fullcontact.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/flowchart1@2x.png\",\"datePublished\":\"2020-10-08T19:00:05+00:00\",\"dateModified\":\"2025-01-06T11:31:47+00:00\",\"description\":\"At FullContact, engineers have the opportunity to solve the unique and challenging problems created by a growing Identity Resolution Business. The start\",\"breadcrumb\":{\"@id\":\"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/#primaryimage\",\"url\":\"https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/flowchart1@2x.png\",\"contentUrl\":\"https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/flowchart1@2x.png\",\"width\":1600,\"height\":774},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.fullcontact.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Building a Lambda Architecture with Druid and Kafka 
Streams\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.fullcontact.com\/#website\",\"url\":\"https:\/\/www.fullcontact.com\/\",\"name\":\"FullContact\",\"description\":\"Relationships, reimagined.\",\"publisher\":{\"@id\":\"https:\/\/www.fullcontact.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.fullcontact.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.fullcontact.com\/#organization\",\"name\":\"FullContact\",\"url\":\"https:\/\/www.fullcontact.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.fullcontact.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.fullcontact.com\/wp-content\/uploads\/2019\/11\/fc-logo@2x.png\",\"contentUrl\":\"https:\/\/www.fullcontact.com\/wp-content\/uploads\/2019\/11\/fc-logo@2x.png\",\"width\":200,\"height\":38,\"caption\":\"FullContact\"},\"image\":{\"@id\":\"https:\/\/www.fullcontact.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/fullcontact\",\"https:\/\/www.linkedin.com\/company\/fullcontact-inc-\",\"https:\/\/www.youtube.com\/user\/FullContactAPI\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.fullcontact.com\/#\/schema\/person\/72c64648c1d628849506fa594a520e11\",\"name\":\"Jeremy Plichta\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.fullcontact.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/da25040c6b6f787c42fda93950ff50ef24ce1de3063a3943928f7e6b67db954a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/da25040c6b6f787c42fda93950ff50ef24ce1de3063a3943928f7e6b67db954a?s=96&d=mm&r=g\",\"caption\":\"Jeremy 
Plichta\"},\"url\":\"https:\/\/www.fullcontact.com\/blog\/author\/jeremyplichta\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Building a Lambda Architecture with Druid and Kafka Streams | FullContact","description":"At FullContact, engineers have the opportunity to solve the unique and challenging problems created by a growing Identity Resolution Business. The start","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/","og_locale":"en_US","og_type":"article","og_title":"Building a Lambda Architecture with Druid and Kafka Streams","og_description":"At FullContact, engineers have the opportunity to solve the unique and challenging problems created by a growing Identity Resolution Business. The start","og_url":"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/","og_site_name":"FullContact","article_published_time":"2020-10-08T19:00:05+00:00","article_modified_time":"2025-01-06T11:31:47+00:00","og_image":[{"url":"https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/flowchart1@2x.png","type":"","width":"","height":""}],"author":"Jeremy Plichta","twitter_card":"summary_large_image","twitter_creator":"@fullcontact","twitter_site":"@fullcontact","twitter_misc":{"Written by":"Jeremy Plichta","Est. 
reading time":"9 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/#article","isPartOf":{"@id":"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/"},"author":{"name":"Jeremy Plichta","@id":"https:\/\/www.fullcontact.com\/#\/schema\/person\/72c64648c1d628849506fa594a520e11"},"headline":"Building a Lambda Architecture with Druid and Kafka Streams","datePublished":"2020-10-08T19:00:05+00:00","dateModified":"2025-01-06T11:31:47+00:00","mainEntityOfPage":{"@id":"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/"},"wordCount":1630,"publisher":{"@id":"https:\/\/www.fullcontact.com\/#organization"},"image":{"@id":"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/#primaryimage"},"thumbnailUrl":"https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/flowchart1@2x.png","keywords":["Kubernetes Deployments","Kafka Streams","Apache Kafka","Druid","FullContact Engineering","engineering"],"articleSection":["Engineering"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/","url":"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/","name":"Building a Lambda Architecture with Druid and Kafka Streams | 
FullContact","isPartOf":{"@id":"https:\/\/www.fullcontact.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/#primaryimage"},"image":{"@id":"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/#primaryimage"},"thumbnailUrl":"https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/flowchart1@2x.png","datePublished":"2020-10-08T19:00:05+00:00","dateModified":"2025-01-06T11:31:47+00:00","description":"At FullContact, engineers have the opportunity to solve the unique and challenging problems created by a growing Identity Resolution Business. The start","breadcrumb":{"@id":"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/#primaryimage","url":"https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/flowchart1@2x.png","contentUrl":"https:\/\/www.fullcontact.com\/wp-content\/uploads\/2020\/10\/flowchart1@2x.png","width":1600,"height":774},{"@type":"BreadcrumbList","@id":"https:\/\/www.fullcontact.com\/blog\/engineering\/building-a-lambda-architecture-with-druid-and-kafka-streams\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.fullcontact.com\/"},{"@type":"ListItem","position":2,"name":"Building a Lambda Architecture with Druid and Kafka Streams"}]},{"@type":"WebSite","@id":"https:\/\/www.fullcontact.com\/#website","url":"https:\/\/www.fullcontact.com\/","name":"FullContact","description":"Relationships, 
reimagined.","publisher":{"@id":"https:\/\/www.fullcontact.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.fullcontact.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.fullcontact.com\/#organization","name":"FullContact","url":"https:\/\/www.fullcontact.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.fullcontact.com\/#\/schema\/logo\/image\/","url":"https:\/\/www.fullcontact.com\/wp-content\/uploads\/2019\/11\/fc-logo@2x.png","contentUrl":"https:\/\/www.fullcontact.com\/wp-content\/uploads\/2019\/11\/fc-logo@2x.png","width":200,"height":38,"caption":"FullContact"},"image":{"@id":"https:\/\/www.fullcontact.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/fullcontact","https:\/\/www.linkedin.com\/company\/fullcontact-inc-","https:\/\/www.youtube.com\/user\/FullContactAPI"]},{"@type":"Person","@id":"https:\/\/www.fullcontact.com\/#\/schema\/person\/72c64648c1d628849506fa594a520e11","name":"Jeremy Plichta","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.fullcontact.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/da25040c6b6f787c42fda93950ff50ef24ce1de3063a3943928f7e6b67db954a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/da25040c6b6f787c42fda93950ff50ef24ce1de3063a3943928f7e6b67db954a?s=96&d=mm&r=g","caption":"Jeremy 
Plichta"},"url":"https:\/\/www.fullcontact.com\/blog\/author\/jeremyplichta\/"}]}},"_links":{"self":[{"href":"https:\/\/www.fullcontact.com\/wp-json\/wp\/v2\/posts\/18998","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.fullcontact.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.fullcontact.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.fullcontact.com\/wp-json\/wp\/v2\/users\/91"}],"replies":[{"embeddable":true,"href":"https:\/\/www.fullcontact.com\/wp-json\/wp\/v2\/comments?post=18998"}],"version-history":[{"count":0,"href":"https:\/\/www.fullcontact.com\/wp-json\/wp\/v2\/posts\/18998\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.fullcontact.com\/wp-json\/wp\/v2\/media?parent=18998"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.fullcontact.com\/wp-json\/wp\/v2\/categories?post=18998"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.fullcontact.com\/wp-json\/wp\/v2\/tags?post=18998"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}