Python ETL with Airbyte and Pathway

pathway.com

3 points by janchorowski 2 years ago · 2 comments

janchorowski (OP) 2 years ago

Now you can use Airbyte source connectors to process data in memory with Python.

We integrated Airbyte connectors with Pathway, a Python stream processing framework, using the airbyte-serverless project. We believe ETL pipelines are coming back, with many use cases in AI (RAG pipelines), ETL for unstructured data, and pipelines that handle PII. In this article, we show how to stream data from GitHub using Airbyte and remove PII with Pathway. We are curious to hear your feedback on the implementation, and about other use cases you can think of for decoupling the extract and load steps.
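
To give a rough idea of the "remove PII before load" step, here is a minimal sketch in plain Python using only the standard library. It is not the code from the article and does not use the Pathway or Airbyte APIs. The record shape, the field names in PII_FIELDS, and the helpers scrub_record and scrub_stream are all hypothetical, chosen only for illustration.

```python
import re
from typing import Any, Dict, Iterable, Iterator

# Simple email pattern; an assumption for illustration, not a complete PII detector.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical field names that would be dropped entirely before loading.
PII_FIELDS = {"author_name", "author_email"}


def scrub_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record with PII fields dropped and emails masked."""
    clean: Dict[str, Any] = {}
    for key, value in record.items():
        if key in PII_FIELDS:
            continue  # drop the whole field
        if isinstance(value, str):
            # mask any email address embedded in free text
            value = EMAIL_RE.sub("[REDACTED]", value)
        clean[key] = value
    return clean


def scrub_stream(records: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Lazily scrub records one at a time, as a streaming transform would."""
    for record in records:
        yield scrub_record(record)
```

In a real pipeline the same transform would sit between the extract and load steps, so the unmasked values never reach the destination.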

Arimbr 2 years ago

Interesting implementation! For complex stream and text processing, I also prefer processing data in memory with Python (ETL) rather than SQL in the warehouse (ELT).
