Skip to main content

Extend and reuse an existing AirByte destination connector

AirByte is an open-source ELT (Extract, Load, and Transformation) application. It heavily uses containerization for the deployment of its various components. On the local machine, we need docker to run it.

AirByte has an impressive list of source and destination connectors available.

One of my use case data destinations is the ClickHouse data warehouse and its destination connector is not yet (2021-12-08) available.

As per the documentation, It seems that creating a destination connector is a non-trivial job. It's a great idea to build an open-source ClickHouse destination connector. However, I tried avoiding the temptation to create one because of the required effort.

AirByte has a MySql destination connector available. ClickHouse provides a MySQL connector for access from any MySQL client. We need to configure Clickhouse to give support for the MySQL connector. Accessing ClickHouse from AirByte using its MySQL destination connector looks promising. However, when I tried connecting it from AirByte a small snag prevented it from making a connection.

Error Public Key Retrieval is not allowed is thrown by AirByte.

The solution for this error is to pass a property allowPublicKeyRetrieval with TRUE value to the underlying JDBC driver (AirByte uses JDBC MySQL drive). Unfortunately, the AirByte setting doesn't have any field to take these additional JDBC driver properties in the MySQL destination connector. However, we can copy, extend and deploy an existing Connector as per our need and add them into AirByte as a new source.

I git clone AirByte git repo at a local directory. All official code for source and destination connectors can be found in the directory airbyte/airbyte-integrations/connectors.

I went ahead inside destination-mysql and made some changes to the appropriate Java Source file to accommodate allowPublicKeyRetrieval=true properties.

Inside destination-mysql we have Dockerfile, I made the change in the

LABEL io.airbyte.name=geekmj/destination-mysql

It will help in getting a docker image with geekmj/destination-mysql name.

To build the docker image I run the following command:


# Must be run from the Airbyte project root
./gradlew :airbyte-integrations:connectors:destination-<name>:build

I published the generated image to Docker Repository.

In AirByte settings > Destinations I added a new connector with the appropriate name.

While creating a new destination I was able to see the newly added destination connector now.

When I tried connecting to ClickHouse now I started seeing a new error.

	
 SQL Error [62] [00000]: Code: 62. DB::Exception: Syntax error: failed at position 54 ('FULL'): FULL TABLES FROM test. Expected one of: TABLES, CLUSTER, CHANGED, GRANTS, CREATE, ACCESS, QUOTA, SETTINGS, CURRENT ROLES, PRIVILEGES, PROCESSLIST, CLUSTERS, DATABASES, CURRENT QUOTA, ENABLED ROLES, CREATE, DICTIONARIES, USERS, ROLES, SETTINGS PROFILES, PROFILES, ROW POLICIES, POLICIES, QUOTAS. (SYNTAX_ERROR) (version 21.11.4.14 (official build))
  Code: 62. DB::Exception: Syntax error: failed at position 54 ('FULL'): FULL TABLES FROM test. Expected one of: TABLES, CLUSTER, CHANGED, GRANTS, CREATE, ACCESS, QUOTA, SETTINGS, CURRENT ROLES, PRIVILEGES, PROCESSLIST, CLUSTERS, DATABASES, CURRENT QUOTA, ENABLED ROLES, CREATE, DICTIONARIES, USERS, ROLES, SETTINGS PROFILES, PROFILES, ROW POLICIES, POLICIES, QUOTAS. (SYNTAX_ERROR) (version 21.11.4.14 (official build))

I realized when we try connecting to ClickHouse using the JDBC MySQL connector it doesn't work as expected. While I can see the connection is established with the server but table level queries started failing.

Hence I decided to create a ClickHouse destination connector from scratch using the JDBC Clickhouse driver.

Comments

Popular posts from this blog

Working with request header in Jersey (JAX-RS) guide

In the  previous post , we talked about, how to get parameters and their values from the request query string. In this guide learn how to get request header values in Jersey (JAX-RS) based application. We had tested or used the following tools and technologies in this project: Jersey (v 2.21) Gradle Build System (v 2.9) Spring Boot (v 1.3) Java (v 1.8) Eclipse IDE This is a part of  Jersey (JAX-RS) Restful Web Services Development Guides series. Please read Jersey + Spring Boot getting started guide . Gradle Build File We are using Gradle for our build and dependency management (Using Maven rather than Gradle is a very trivial task). File: build.gradle buildscript { ext { springBootVersion = '1.3.0.RELEASE' } repositories { mavenCentral() } dependencies { classpath("org.springframework.boot:spring-boot-gradle-plugin:${springBootVersion}") } } apply plugin: 'java' apply plugin: 'eclipse' a

Ajax Cross Domain Resource Access Using jQuery

Some time back in our project we faced a problem while making an Ajax call using jQuery. Chrome Browser console had given some weird error message like below when we try to access one of our web pages: When we try to access the same web page in the Firefox browser, it doesn't give any error in the console but some parsing error occurred. In our case, we were accessing XML as an Ajax request resource. I was curious to check if the non-XML cross-domain resource was successfully loading or not. But finally, I realized that it is not going through. jersey-spring-boot-quick-starter-guide In our Ajax call, requesting domain was not the same as the requested URL domain. $.ajax({ url: "https://10.11.2.171:81/xxxxxx/xxxxxxx.xml" , type : "get" , success: function (response) { alert( "Load was performed." ); }, error : function (xhr, status) {

FastAPI first shot

Setup on my Mac (Macbook Pro 15 inch Retina, Mid 2014) Prerequisite Python 3.6+ (I used 3.7.x. I recently reinstalled OS after cleaning up disk, where stock Python 2.7 was available. I installed Pyenv and then used it to install 3.7.x). I already had a git repo initialized at Github for this project. I checked that out. I use this approach to keep all the source code safe or at a specific place 😀. I set the Python version in .python-version file. I also initialize the virtual environment using pyenv in venv folder. I started the virtual environment. FastAPI specific dependencies setup Now I started with basic pip commands to install dependency for the project. I saved dependencies in requirements.txt  the file. Minimal viable code to spin an API Server FastAPI is as cool as NodeJS or Go Lang (?) to demonstrate the ability to spin an API endpoint up and running in no time. I had the same feeling for the Flask too, which was also super cool. app/main.py: from typing i