Supported versions: 4.1+
Work with JSON strings
In this tutorial, we will see how to use Lenses SQL to process JSON strings using JsonPath , a sort of XPath for JSON objects.
JSON_EXTRACT_FIRST()
and JSON_EXTRACT_ALL()
to navigate and transform
JSON strings.
Setting up our example
We have a topic http_logs
that collects the details of HTTP calls to a microservice. Basic details of the request
are stored, like the URL
and the HTTP
method used. The payload of the requests is stored as well as a string.
We can create the topic and insert some example data through SQL Studio:
CREATE TABLE http_logs(_key string, method string, url string, content_type string, body string)
FORMAT (STRING, AVRO);
INSERT INTO http_logs(_key, method, url, content_type, body) VALUES
('event_1', 'POST', '/users', 'application/json', '{ "id": 1, "username": "juno", "contacts": [ { "type": "email", "value": "juno@example.org" }, { "type": "phone", "value": "12345" }] }'),
('event_2', 'POST', '/users', 'application/json', '{ "id": 2, "username": "max", "contacts": [ { "type": "email", "value": "max@example.org" }, { "type": "twitter", "value": "@max" }] }'),
('event_3', 'GET', '/users/1', '', ''),
('event_4', 'GET', '/users/2', '', ''),
('event_5', 'POST', '/users', 'application/json', '{ "id": 3, "username": "nic", "contacts": [ { "type": "email", "value": "nic@example.org" }, { "type": "phone", "value": "78910" }] }'),
('event_6', 'PUT', '/users/1', 'application/json', '{ "username": "juno", "contacts": [ { "type": "email", "value": "juno@domain.org" }] }'),
('event_7', 'POST', '/users', 'application/json', '{ "id": 4, "username": "john", "contacts": [ { "type": "email", "value": "john@example.org" }] }');
The HTTP
method and the URL
used for the request are stored in
the method
and url
fields respectively, while the optional payload, and its content-type, are stored in the body
and content_type
fields.
As you can imagine the logs contained in this topic are quite generic, and different endpoints may have different
content-type
s for their body. For this reason the best the system can do is storing the payload as a simple string, whenever
that is possible.
This comes with some drawbacks: since the data is a simple string, and it is not structured, it is not possible to
inspect it as we would do with a normal AVRO
/JSON
object.
Fortunately Lenses SQL offers a couple of handful functions that make our life easier in these kind of scenarios.
Extracting fields from a JSON string
Our first task is to find the username of users created with a call to POST /users
.
To do that we can use JSON_EXTRACT_FIRST(json_string, pattern)
, one of the
string functions
available in Lenses SQL.
The first argument of the function is the string representing the JSON
we want to manipulate. The second is a string
representing a
JsonPath
.
JsonPath is a powerful way to traverse and extract elements from a JSON
object. Explaining the full details of
goes beyond the scope of this article, but in general it can be thought as a JSON
version of XPath
, the standard used
to select elements from an XML document.
A nice way to try and test if your JsonPaths are doing what you intended, is using the JsonPath online evaluator .
In our case, we would like to extract the name of the user just created. The simple path $.username
will do it!
Let’s try to use it in a SELECT
that we can run in SQL Studio:
SELECT
JSON_EXTRACT_FIRST(body, '$.username') as username
FROM
http_logs
That query will produce the results
{ "username": "\"juno\"" }
{ "username": "\"max\"" }
{ "username": null }
{ "username": null }
{ "username": "\"nic\"" }
{ "username": "\"juno\"" }
{ "username": "\"john\"" }
As you can see we have two entries for juno
. That’s because the user was first created, and then modified later, with a PUT
call.
Also, there are some null
values. This is because JSON_EXTRACT_FIRST
was not able to extract the username, either
because the payload was not valid JSON
, or because the field was not found.
We can fix this restricting our query to user creation calls:
SELECT
JSON_EXTRACT_FIRST(body, '$.username') as username
FROM
http_logs
WHERE
method = "POST" AND
url = "/users"
We have now only valid results:
{ "username": "\"juno\"" }
{ "username": "\"max\"" }
{ "username": "\"nic\"" }
{ "username": "\"john\"" }
Filtering the results
All Lenses SQL functions can be used in any part of the query.
Thus JSON_EXTRACT_FIRST
can be used in the projections, where, and group bys.
For example, you can run the query
SELECT
JSON_EXTRACT_FIRST(body, '$.contacts[?(@.type=="email")].value') as email
FROM
http_logs
WHERE
JSON_EXTRACT_FIRST(body, '$.username') = '"max"'
to retrieve max
’s e-mail:
{ "email" : "max@example.org" }
Extract multiple values from a JSON string
So far we had fun using JSON_EXTRACT_FIRST
, but we have not talked yet about its bigger brother, JSON_EXTRACT_ALL
.
JSON_EXTRACT_ALL(json_string, pattern)
works like JSON_EXTRACT_FIRST
, except that it will return all the values
that match the pattern
. The results will be returned in an array, and when no results are found the empty array
will be returned.
Let’s make use of it, extracting all the contact types used at the moment of the creation of the user:
SELECT
JSON_EXTRACT_FIRST(body, '$.username') as username,
JSON_EXTRACT_ALL(body, '$.contacts[*].type') as contact_types
FROM
http_logs
WHERE
method = "POST" AND
url = "/users"
Running the query above we get what we desired:
{ "username": "\"juno\"", "contact_types": [ "email", "\"phone\"" ] }'),
{ "username": "\"max\"", "contact_types": [ "email", "\"twitter\"" ] }'),
{ "username": "\"nic\"", "contact_types": [ "email", "\"phone\"" ] }'),
{ "username": "\"john\"", "contact_types": [ "\"email\""] }')
Sql Processors
JSON_EXTRACT_FIRST()
and JSON_EXTRACT_ALL()
are available also in the Streaming Engine, like most Lenses SQL functions.
Let’s say we want another topic continuously filled with the contact types used for user creations. We also want each record containing a single username-contact type pair. To achieve that we can take the query of the last example and adapt it a bit, using a Lateral Join :
SET defaults.topic.autocreate=true;
INSERT INTO contact_types
SELECT STREAM
JSON_EXTRACT_FIRST(body, '$.username') as username,
contact_type
FROM
http_logs LATERAL
JSON_EXTRACT_ALL(body, '$.contacts[*].type') as contact_type
WHERE
method = "POST" AND
url = "/users"
Conclusions
JSON_EXTRACT_FIRST()
and JSON_EXTRACT_ALL()
simplifies your life every time you have to deal with JSON
that
is represented as a string value of a field in your topic.
The use of JsonPath
make them very powerful and even
complex operations are easily representable with it.
To know more about JsonPath you can check the author's original blog post , while to know more about Lenses SQL String functions you can check our documentation .