+----------------------------------------+-------------+------------+-----------+-----+-------+
|address                                 |street_number|street_name |city       |state|zip    |
+----------------------------------------+-------------+------------+-----------+-----+-------+
|123 Main St, New York, NY 10001         |123          |Main        |New York   |NY   |10001  |
|456 Oak Ave Apt 5B, Los Angeles, CA...  |456          |Oak         |Los Angeles|CA   |90001  |
|789 Elm Street, Chicago, IL 60601       |789          |Elm         |Chicago    |IL   |60601  |
|321 Pine Road Suite 100, Boston, MA...  |321          |Pine        |Boston     |MA   |02101  |
+----------------------------------------+-------------+------------+-----------+-----+-------+
Every function is native PySpark. No UDFs. No black boxes. Just code that handles edge cases you haven't thought of yet.
Zero dependencies beyond PySpark. No framework lock-in, no version conflicts.
Copy-paste, don't import. The code is yours to modify and own.
Production-ready transformations. Phone numbers, emails, addresses, and more.
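
To make "native PySpark, no UDFs" concrete, here is a minimal sketch of how address columns like the ones in the table above could be derived with built-in column expressions. It is not DataCompose's generated code: the names `ADDRESS_PATTERNS`, `with_address_parts`, and `preview` are hypothetical, and the regexes assume simple US-style addresses of the form "number street, city, ST 12345". The only Spark call used is `pyspark.sql.functions.regexp_extract`.

```python
import re

# Hypothetical patterns for simple US-style addresses such as
# "123 Main St, New York, NY 10001". They use only syntax that
# Python's `re` and Spark's Java regex engine both accept.
ADDRESS_PATTERNS = {
    "street_number": r"^\s*(\d+)",
    "street_name": r"^\s*\d+\s+([A-Za-z]+)",
    "city": r",\s*([^,]+?)\s*,\s*[A-Z]{2}\s+\d{5}",
    "state": r",\s*([A-Z]{2})\s+\d{5}",
    "zip": r"\b(\d{5})(?:-\d{4})?\s*$",
}


def with_address_parts(df, address_col="address"):
    """Add one column per pattern using built-in column expressions only."""
    # Imported here so the patterns above can be checked without Spark installed.
    from pyspark.sql import functions as F

    for name, pattern in ADDRESS_PATTERNS.items():
        # regexp_extract returns an empty string when the pattern does not match.
        df = df.withColumn(name, F.regexp_extract(F.col(address_col), pattern, 1))
    return df


def preview(address):
    """Apply the same patterns in plain Python, handy for quick checks."""
    out = {}
    for name, pattern in ADDRESS_PATTERNS.items():
        match = re.search(pattern, address)
        out[name] = match.group(1) if match else ""
    return out
```

Given a DataFrame `df` with an `address` column, `with_address_parts(df).show(truncate=False)` would display the extracted columns. Because every step is a column expression, Spark can optimize and run the whole thing in the JVM without shipping rows to Python workers, which is the practical payoff of avoiding UDFs.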
Why DataCompose?
Stop writing the same regex patterns. Stop debugging phone number edge cases. Stop maintaining transformation libraries. DataCompose generates the code you'd write yourself if you had the time, then hands it to you to own, modify, and deploy however you want.