One Does Not Simply “Create a Visualization” from Unstructured Data

11 points by alexmarquardt 3 years ago · 6 comments · 2 min read


When I worked as a database consultant, I often witnessed conversations like the following at client sites:

Manager: Hey, Data Analyst! We have a huge amount of data in our data lake, and I would like you to use it to provide us with some analysis and visualizations. Could you get that to me by next week?

Data Analyst: Not really. The data that we have in the data lake is unstructured, and I can't do much with it until it's structured.

Manager: What do you need in order to structure it?

Data Analyst: Our data engineers have the skills to do that.

Manager: Oh, I see – OK, I’ll talk to the Data Engineer. Hey, Data Engineer! Can you structure this data by next week?

Data Engineer: Next week sounds a bit unrealistic. I’ll need to talk to the teams that created the data to be sure that I understand how it is organized and what it means. After that, I can write some custom code or use a pre-existing tool to extract information from the unstructured data, and then store it in a structured table. Let’s set up a project and allocate time and resources for this; it could take a while!

Manager: You mean that we have all this data, but we can’t use it unless we spend a bunch of engineering resources to process it first?

Data Engineer: That is correct.

Manager: My boss is not going to be happy … Can you help me to get a better understanding of the difference between structured and unstructured data? I’m going to need some good justification for this!

Data Engineer: Sure, I’ve written a summary of the difference between structured, semi-structured, and unstructured data for you. Keep reading at https://airbyte.com/blog/analyze-unstructured-data!

alexmarquardt (OP) 3 years ago

Here is a related tweet that touches on exactly this issue: https://twitter.com/sethrosen/status/1252291581320757249?lan...

alexmarquardt (OP) 3 years ago

For a more detailed explanation of unstructured vs. structured data, see: https://airbyte.com/blog/analyze-unstructured-data

alexairbyte 3 years ago

This hits way too close to home; I have actually had this conversation with my manager.

swyx 3 years ago

haha nice little skit, i wonder if this could work as a youtube short as well. the famous SethRosen tweet in TFA is basically the start of a long conversation about what we want from data vs the work that it takes to get it
