Show HN: Xplore Path – Explore untidy data spread across scattered files/formats (github.com)
Xplore Path is a tool for quick-and-dirty data exploration, built for messy, untidy data scattered across files and formats.
* Simple syntax: Query data with an intuitive, XPath-like syntax.
* Broad format support: Search through CSVs, XLSXs, JSONs, YAMLs, DOCXs, PDFs, XMLs, HTMLs, ...
* Fuzzy search support: Search using globs, regex, number ranges, or approximate string matching.
* Unified environment: Search through disparate files and formats within a single context.
* Extendable: Add functions and formats to customize to your use case (e.g. 3D scene graphs, flow cytometry, ...).
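To make the four fuzzy-search styles in the list above concrete, here is a minimal Python sketch using only the standard library. It does not use Xplore Path or its query syntax; the helper names (`matches_glob`, `matches_regex`, `in_range`, `approx_equal`) and the 0.8 similarity threshold are assumptions chosen for illustration.

```python
import difflib
import fnmatch
import re


def matches_glob(value, pattern):
    """Shell-style wildcard match, case-sensitive (e.g. 'patient_*')."""
    return fnmatch.fnmatchcase(value, pattern)


def matches_regex(value, pattern):
    """True if the regular expression matches anywhere in the value."""
    return re.search(pattern, value) is not None


def in_range(value, low, high):
    """True if the string parses as a number within [low, high]."""
    try:
        number = float(value)
    except ValueError:
        return False
    return low <= number <= high


def approx_equal(value, target, threshold=0.8):
    """Approximate match: similarity ratio at or above an assumed threshold."""
    ratio = difflib.SequenceMatcher(None, value.lower(), target.lower()).ratio()
    return ratio >= threshold


if __name__ == "__main__":
    # Hypothetical column labels and cell values from a messy dataset.
    labels = ["patient_id", "Patient ID", "pateint_id", "dose_mg", "42", "17.5"]
    print([x for x in labels if matches_glob(x, "patient_*")])
    print([x for x in labels if matches_regex(x, r"_mg$")])
    print([x for x in labels if in_range(x, 10, 20)])
    print([x for x in labels if approx_equal(x, "patient_id")])
```

The approximate matcher is the one that catches misspelled headers such as `pateint_id`, which globs and regexes would miss unless written defensively.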
Xplore Path aims to be the first tool you reach for when inspecting or exploring new data "thrown over the fence" by a colleague or partner. Imagine receiving a zip file containing a nested directory structure full of cryptically named CSVs, XLSXs, PDFs, JSONs, HTMLs, and maybe even a SQLite database. Instead of hopping between tools or libraries to piece it all together, Xplore Path loads everything as a hierarchy and lets you search it with XPath-like simplicity. You can explore, slice, and integrate data with an easy-to-use REPL that provides auto-completion.
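The "load everything as a hierarchy, then search it by path" idea can be sketched in plain Python. This is only an illustration of the general technique under stated assumptions, not Xplore Path's implementation or query language: it handles just CSV and JSON, the names `load_file`, `load_tree`, `walk`, `find`, and `demo` are hypothetical, and the search uses `fnmatch` globs (where `*` also matches `/`) rather than an XPath-like grammar.

```python
import csv
import fnmatch
import json
import tempfile
from pathlib import Path


def load_file(path):
    """Parse one file into plain dicts/lists; unsupported formats become None."""
    suffix = path.suffix.lower()
    if suffix == ".json":
        with path.open(encoding="utf-8") as handle:
            return json.load(handle)
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    return None


def load_tree(root):
    """Mirror a directory as nested dicts, with parsed file contents as subtrees."""
    tree = {}
    for entry in sorted(root.iterdir()):
        tree[entry.name] = load_tree(entry) if entry.is_dir() else load_file(entry)
    return tree


def walk(node, prefix=""):
    """Yield (path, value) pairs for every leaf in the hierarchy."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk(child, f"{prefix}/{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from walk(child, f"{prefix}/{index}")
    else:
        yield prefix, node


def find(tree, pattern):
    """Return leaves whose full path matches a glob pattern."""
    return [(p, v) for p, v in walk(tree) if fnmatch.fnmatchcase(p, pattern)]


def demo():
    """Build a tiny mixed-format directory and search across both files."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "lab").mkdir()
        (root / "lab" / "samples.csv").write_text(
            "id,name\n1,alpha\n2,beta\n", encoding="utf-8"
        )
        (root / "meta.json").write_text(
            json.dumps({"study": {"name": "pilot"}}), encoding="utf-8"
        )
        return find(load_tree(root), "*/name")


if __name__ == "__main__":
    for path, value in demo():
        print(path, "=", value)
```

A single pattern such as `*/name` reaches into both the CSV rows and the nested JSON object, which is the appeal of treating scattered files as one tree.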
The GitHub link has instructions on getting started. Please take a look and let me know your thoughts. I wrote Xplore Path over the holiday break because the scenario above is one I've commonly experienced throughout my career, especially when dealing with third parties (e.g. university labs and CROs).
The project is functional but still rough around the edges. Here's what needs ironing out:
* Language grammar needs to be tightened up.
* More exhaustive testing.
* More exhaustive documentation (both code documentation and user documentation).
* Squeezing out more speed (joins are especially slow).
* Some formats need better support / more formats need to be supported.
* Some formats need functionality added to allow the user to transform how the hierarchy gets loaded.
The first item (grammar design) is the one I'm most inexperienced with. If any language design experts have time to kill, please feel free to review and make suggestions.