Sep 11-13, 2017
Kulturbrauerei
Flink Forward Berlin, the premier conference on Apache Flink®
Watch the talk recordings here
Streaming applications almost always require a schema. The most interesting operations that can be applied to a data stream (projection, scaling, aggregation, filtering, joining, streaming SQL) all require you to know something about the types and values of the fields in your data; otherwise you're just moving bytes and counting anonymous things.

This talk is an introduction to and overview of shared schema registries [1,2], with a demonstration of how they can be integrated into Apache Flink pipelines to centralize schema management and enable schema reuse across data flow systems (e.g., from Apache Kafka or Apache NiFi to Flink and back again). We will begin with a discussion of the shortcomings of the common practice of embedding schemas and generated classes in code projects, followed by an illustration of essential registry features (e.g., centralization, versioning, transformation and validation) as they appear in both Confluent's and Hortonworks's schema registries. We'll close with a detailed look at how these schema registries can be integrated into Flink serializers, sources and sinks.

1. https://github.com/confluentinc/schema-registry
2. http://github.com/hortonworks/registry
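As a rough illustration of the source-side integration the abstract describes, below is a minimal sketch (not code from the talk) that wraps Confluent's KafkaAvroDeserializer in a Flink DeserializationSchema, so that records consumed from Kafka are decoded against schemas fetched by id from the registry. The class name, registry URL and topic are illustrative placeholders, and the sketch assumes a reasonably recent Flink API; newer Flink releases also ship a ready-made ConfluentRegistryAvroDeserializationSchema that covers the same case.

```java
import java.util.Collections;
import java.util.Map;

import io.confluent.kafka.serializers.KafkaAvroDeserializer;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.TypeExtractor;

/**
 * Sketch of a registry-aware Flink deserializer: each consumed record
 * carries a schema id, which Confluent's KafkaAvroDeserializer resolves
 * against the schema registry before decoding the Avro payload.
 */
public class RegistryAvroDeserializationSchema implements DeserializationSchema<GenericRecord> {

    private final String registryUrl;  // e.g. "http://schema-registry:8081" (placeholder)
    private final String topic;

    // Confluent's deserializer is not Serializable, so it is created
    // lazily on the task managers rather than shipped from the client.
    private transient KafkaAvroDeserializer inner;

    public RegistryAvroDeserializationSchema(String registryUrl, String topic) {
        this.registryUrl = registryUrl;
        this.topic = topic;
    }

    @Override
    public GenericRecord deserialize(byte[] message) {
        if (inner == null) {
            inner = new KafkaAvroDeserializer();
            Map<String, ?> config =
                    Collections.singletonMap("schema.registry.url", registryUrl);
            inner.configure(config, false);  // false = configure as a value deserializer
        }
        return (GenericRecord) inner.deserialize(topic, message);
    }

    @Override
    public boolean isEndOfStream(GenericRecord nextElement) {
        return false;  // the stream is unbounded
    }

    @Override
    public TypeInformation<GenericRecord> getProducedType() {
        // Generic records fall back to Kryo here; a production job would
        // supply Avro-aware type information instead.
        return TypeExtractor.getForClass(GenericRecord.class);
    }
}
```

An instance would then be handed to the Kafka source in the usual way (e.g. new FlinkKafkaConsumer010<>(topic, new RegistryAvroDeserializationSchema(registryUrl, topic), props)), and a symmetric SerializationSchema wrapping Confluent's KafkaAvroSerializer covers the sink side.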