One Does Not Simply “Create a Visualization” from Unstructured Data
When I worked as a database consultant, I often saw conversations like the following at client sites:
Manager: Hey Data Analyst! We have a huge amount of data in our data lake, and I would like you to use it to provide us with some analysis and visualizations. Could you get that to me next week?
Data Analyst: Not really. The data that we have in the data lake is unstructured, and I can't do much with it until it's structured.
Manager: What do you need in order to structure it?
Data Analyst: Our data engineers have the skills to do that.
Manager: Oh, I see – OK, I’ll talk to the Data Engineer. Hey, Data Engineer! Can you structure this data by next week?
Data Engineer: Next week sounds a bit unrealistic. I’ll need to talk to the teams that created the data to be sure that I understand how it is organized and what it means. After that, I can write some custom code or use a pre-existing tool to extract information from the unstructured data and store it in a structured table. Let’s set up a project and allocate time and resources to do this. It could take a while!
Manager: You mean that we have all this data, but we can’t use it unless we spend a bunch of engineering resources to process it first?
Data Engineer: That is correct.
Manager: My boss is not going to be happy … Can you help me to get a better understanding of the difference between structured and unstructured data? I’m going to need some good justification for this!
Data Engineer: Sure, I’ve written a summary of the difference between structured, semi-structured, and unstructured data for you. Keep reading at https://airbyte.com/blog/analyze-unstructured-data!
Here is a related tweet that touches on exactly this issue: https://twitter.com/sethrosen/status/1252291581320757249?lan...
For a more detailed explanation of unstructured vs. structured data, see: https://airbyte.com/blog/analyze-unstructured-data
This hits way too close to home - I have actually had this conversation with my manager haha
nice little skit, i wonder if this could work as a youtube short as well. the famous SethRosen tweet in TFA is basically the start of a long conversation about what we want from data vs the work that it takes to get it
I like it! Great idea!
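To make the Data Engineer's plan from the skit a little more concrete (custom code that extracts information from unstructured data and stores it in a structured table), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the free-text support notes, the regular expressions, the `support_notes` table, and the `extract` and `load` helpers are hypothetical and do not come from the article. Real data would first need the conversations with the owning teams that the skit describes.

```python
import re
import sqlite3

# Hypothetical free-text notes; the wording and fields are invented for this sketch.
RAW_NOTES = [
    "2023-01-05: customer 1042 reported a late delivery, refund of 19.99 issued",
    "2023-01-06: customer 2077 asked about pricing",
    "2023-01-07: customer 1042 reported a damaged item, refund of 5.00 issued",
]

# Assumed conventions: each note starts with an ISO date and a customer id,
# and may mention a refund amount somewhere in the text.
HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2}): customer (\d+)\b")
REFUND = re.compile(r"refund of (\d+)\.(\d{2})")


def extract(note):
    """Turn one free-text note into a (date, customer_id, refund_cents) row, or None."""
    header = HEADER.match(note)
    if header is None:
        return None  # the note does not follow the assumed convention
    refund = REFUND.search(note)
    # Store money as integer cents to avoid floating-point rounding.
    cents = int(refund.group(1)) * 100 + int(refund.group(2)) if refund else 0
    return (header.group(1), int(header.group(2)), cents)


def load(notes):
    """Store the extracted rows in an in-memory SQLite table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE support_notes ("
        "note_date TEXT, customer_id INTEGER, refund_cents INTEGER)"
    )
    rows = [row for row in map(extract, notes) if row is not None]
    conn.executemany("INSERT INTO support_notes VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


conn = load(RAW_NOTES)

# Once the data is structured, the analyst's question becomes a plain SQL query.
totals = conn.execute(
    "SELECT customer_id, SUM(refund_cents) FROM support_notes "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()
print(totals)
```

The extraction rules are the fragile part: they encode what the data means, which is why the Data Engineer wants to talk to the teams that produced it before committing to a deadline.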