Kafka JSON schema and limitations
When creating a Kafka dataset, you can enter a custom JSON schema, which is then used when reading from and writing to the selected topic.
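As an illustration, a custom schema for the movie records used later in this page might look like the following. This is a sketch in standard JSON Schema notation; the exact schema format accepted by the dataset configuration may differ, and the field names are taken from the example records below.

```json
{
  "type": "object",
  "properties": {
    "title":  { "type": "string" },
    "year":   { "type": "integer" },
    "cast":   { "type": "array", "items": { "type": "string" } },
    "genres": { "type": "array", "items": { "type": "string" } }
  },
  "required": ["title", "year"]
}
```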
Caveats for working with JSON and Kafka input
The current implementation of JSON support in Kafka works as follows:
- The schema is inferred from the first JSON record; this schema is then used to convert all subsequent JSON records.
- If a JSON record does not match the inferred schema, it is silently dropped (a debug message is logged).
Example of a Kafka topic with the following JSON records:
1 - {"title":"The Matrix","year":1999,"cast":["Keanu Reeves","Laurence Fishburne","Carrie-Anne Moss","Hugo Weaving","Joe Pantoliano"],"genres":["Science Fiction"]}
2 - {"Test" : true}
3 - {"title":"Toy Story","year":1995,"cast":["Tim Allen","Tom Hanks","(voices)"],"genres":["Animated"]}
The Kafka input connector will handle the messages like this:
- Infer the schema from the first incoming JSON record (message number 1).
- Forward message number 1 to the next connector.
- Drop message number 2 as it does not match the inferred schema.
- Forward message number 3 to the next connector as it matches the inferred schema.
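The infer-then-filter behavior described above can be sketched in Python. This is a simplified illustration, not the connector's actual implementation: here the "schema" is just a mapping of field names to Python types inferred from the first decoded record, and `infer_schema`, `matches`, and `forwarded` are names invented for this example.

```python
import json

def infer_schema(record):
    """Infer a simple field-name -> type mapping from one decoded JSON record."""
    return {key: type(value) for key, value in record.items()}

def matches(record, schema):
    """Check that a record has exactly the inferred fields with matching types."""
    return (set(record) == set(schema)
            and all(isinstance(record[key], typ) for key, typ in schema.items()))

# The three messages from the example topic (casts abbreviated).
messages = [
    '{"title":"The Matrix","year":1999,"cast":["Keanu Reeves"],"genres":["Science Fiction"]}',
    '{"Test" : true}',
    '{"title":"Toy Story","year":1995,"cast":["Tim Allen"],"genres":["Animated"]}',
]

schema = None
forwarded = []
for raw in messages:
    record = json.loads(raw)
    if schema is None:
        schema = infer_schema(record)  # schema comes from the first record only
    if matches(record, schema):
        forwarded.append(record)       # forward to the next connector
    # else: drop silently (the real connector logs a debug message)

# Messages 1 and 3 are forwarded; message 2 is dropped.
```

Running this forwards the two movie records and drops `{"Test" : true}`, matching the message-by-message behavior listed above.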
Caveats for working with JSON and Kafka output
The Kafka output connector cannot properly handle the Bytes type.