AWS Athena with QueryDSL

Fraser sequeira
3 min readMay 2, 2020

--

QueryDSL for Athena

AWS Athena is a Serverless query service that allows querying your S3 data using standard SQL. It’s fully managed by AWS and you pay only for the queries you run. When writing my Athena client in java I wondered if there was a way to use the rich Criteria Query API that supports JPA entities could it be extended for Athena ?

Problems using Criteria Query API for Athena

  1. Persistence Unit: Getting a CriteriaBuilder instance requires configuring an entity manager. While it’s possible to configure an Athena JDBC DataSource it makes no sense doing that, since I am not interested in having managed entities in my application or updating my Athena table. The Athena query service is completely asynchronous, you fire a query, it generates a query-id and you later retrieve the data using that query-id.
  2. Schema validations: An Athena table schema doesn’t have/require a primary key however when defining a JPA entity it expects @ID annotated over one or more fields.

After a bit of exploration I found two popular Java DSL libraries :- QueryDSL and JOOQ that could work as a substitute for the Criteria Query API and help generate standard SQL queries that are Athena compliant.

In this post we will look at using QueryDSL to generate an Athena compliant Query.

a. Maven Dependencies : Add these two dependencies in your application:

Query DSL dependencies

b. SQLTemplates: The H2Template offered by QueryDSL is compatible with Athena. We can configure it as below:

SQLQuery Instance using an H2Template

c. Generate Athena Compatible Query: If you have worked previously with the CriteriaBuilder this approach is somewhat similar and much less verbose and readable:

Athena QueryDSL code snippet
QueryDSL generate SQL executed in Athena

Conclusion: In this article we have seen how QueryDSL has helped generate Athena Compliant standard SQL queries and also makes our code more maintainable, object oriented and easier to understand.

Its any day better than writing your own java sql query string which is more prone to errors and hard to debug in case of complex queries involving multiple joins/predicates. You could find my sample AthenaQueryBuilder code on github.

This is a sample Athena compliant SQL generation code and may not have all that your looking for. Do let me know if this was useful.

GitHub Link : -> https://github.com/Fraser27/AthenaQueryBuilder

--

--

No responses yet