Show HN: Transform a CSV into a JSON and vice versa

jsonmatic.com

125 points by okumurahata 5 years ago · 110 comments

af3d 5 years ago

Looks a bit like adware IMO. The library appears to be drenched in analytics. Dependencies include:

https://www.npmjs.com/package/web-vitals/v/0.1.0 https://www.npmjs.com/package/@fingerprintjs/fingerprintjs

Harvesting user's data, most likely...

  • okumurahataOP 5 years ago

    Author here; some clarification. I use fingerprint to get the number of visits (instead of using something invasive like Google Analytics):

    https://github.com/erikmartinjordan/jsonmatic/blob/master/sr...

    I get the fingerprint as a UID (which is like a random number for me). I don't harvest any user's data. Code is open-source, you can verify what I'm saying if you wish.

  • Karliss 5 years ago

    It looks like there is some confusion about what this is. The content you see on the linked page is the software itself, not a demonstration of a library's output. If it were a library taking in bytes and outputting bytes, I would agree that it shouldn't depend on any analytics, but since it's a website, that's more the author's choice.

  • ianschmitz 5 years ago

    There's nothing wrong with web-vitals...

    • hughcrt 5 years ago

      There's nothing wrong with web-vitals, and it's included in create-react-app, which the author used.

      • true_religion 5 years ago

        I agree, though for the sake of argument, Facebook's tolerance for tracking and fingerprinting far exceeds anyone else's on the internet, so their stamp of approval for web-vitals is meaningless.

    • croes 5 years ago

      But there is much wrong with fingerprinting.

  • MonaroVXR 5 years ago

    How did you figure this out?

    • lcabral 5 years ago

      The page has a link at the bottom to the GitHub project, where you can check the dependencies...

jarofgreen 5 years ago

At work we maintain a Python library that does this, and much more:

PyPi: https://pypi.org/project/flattentool/

Source: https://github.com/OpenDataServices/flatten-tool

Docs: https://flatten-tool.readthedocs.io/en/latest/

It converts JSON to CSV and vice versa, but also handles spreadsheet files, XML, ...

It has recently had some work to make it memory efficient for large files.

Work, BTW, is an Open Data Workers Co-op working on data and standards. We use this tool a lot directly, but also as a library in other tools. https://dataquality.threesixtygiving.org/ for instance - this is a website that checks data against the 360 Giving Data Standard [ https://www.threesixtygiving.org/ ].

  • contravariant 5 years ago

    What are the advantages of this tool in comparison with e.g. pandas' json_normalize?

    • jarofgreen 5 years ago

      Flatten-tool has more options and functions than that one pandas function (though I haven't done a full comparison against everything pandas offers; I wasn't around when the tool was started, so I can't say what analysis was done at the time).

      For instance I note with interest their examples on nested data and arrays. We have various different ways you can work with arrays, so you can design user-friendly spreadsheets as you want and still get JSON of the right structure out: https://flatten-tool.readthedocs.io/en/latest/examples/#one-... (Letting people work on data in user-friendly spreadsheets and converting it to JSON when they are done is one of the big use cases we have)

brundolf 5 years ago

Slightly OT: I've realized that CSVs are dramatically more information-dense than the equivalent JSON, and actually make a pretty reasonable API response format if your dataset is large and fits into the tabular shape. They can be a fraction of the size, mainly because keys aren't duplicated for every item.

  • OJFord 5 years ago

    Yeah, lists of objects are pretty crap, because they're almost always homogeneous but unenforcédly so; it's not just a size issue but a parsing (or not - usage) issue too.

    You could approximate CSV in a JSON response like:

        {
          "columns": ["a", ..., "z"],
          "rows": [[1, ..., 26], ..., [11, 2266]]
        }
    
    Or:

        {
          "a": [1, ..., 11],
          ...
          "z": [26, ..., 2266]
        }
    
    which I've never seen, but would save space, and sort of enforced in the sense that if you trust your serialiser for it as much as you trust an equivalent CSV serialiser, it's fine. (But the same argument could be made for more usual JSON object lists. Only arguable difference is that there's more of an assertion to the client that they should be expected to be homogeneous.)
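A minimal Python sketch of the first shape described above, assuming every record shares the keys of the first one (the function name is mine):

```python
import json

def to_columnar(records):
    # Take the column names from the first record; assumes the
    # list of objects is homogeneous, as discussed above.
    columns = list(records[0].keys())
    rows = [[rec[c] for c in columns] for rec in records]
    return {"columns": columns, "rows": rows}

records = [{"a": 1, "z": 26}, {"a": 11, "z": 2266}]
print(json.dumps(to_columnar(records)))
# {"columns": ["a", "z"], "rows": [[1, 26], [11, 2266]]}
```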

    • inopinatus 5 years ago

      A columnar structure is definitely popular in the analytics community, although there are binary formats that are faster still to scan and can be mmap'd for zero-copy access.

      The fundamental problem with CSV is that it has no canonical form or formal construction. Even the RFC documents it by example and historical reference, rather than from first principles, and does so with liberal use of "maybe". Consequently being very easy to fling about as a human but much harder to reason about in the abstract, and this is most evident when you get bogged down in the gritty details of writing a CSV importer for your application.

      • scrollaway 5 years ago

        +1 - If you are looking for something JSON-compatible, JSON Lines (one json object per line - https://jsonlines.org/) is pretty popular as well.

        You could store a CSV in JSONL very easily. In fact, jsonlines' website shows it as its first example: https://jsonlines.org/examples/

        And if you wanted something that is json file-wide, you can just add some commas and wrap in [] for a list of rows.
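A quick Python sketch of that wrapping step; joining the lines with commas inside brackets yields a valid JSON array, assuming each line is itself valid JSON:

```python
import json

jsonl = '["Gilbert", "2013", 24]\n["Alexa", "2013", 29]'

# Add the commas and wrap in [] to get one file-wide JSON document.
as_array = "[" + ",".join(jsonl.splitlines()) + "]"
print(json.loads(as_array))
```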

        • hnlmorg 5 years ago

          jsonlines (and ndjson too, since they're overlapping specs) is amazing. It's definitely the better way to convert between JSON and CSV.

            » ps aux | grep root | head -n5 | format jsonl
            ["root","87596","0.0","0.0","4359648","116","??","Ss","10:01am","0:00.02","com.apple.cmio.registerassistantservice"]
            ["root","81777","0.0","0.0","4321932","88","??","Ss","Wed12am","0:00.01","PlugInLibraryService"]
            ["root","71784","0.0","0.1","4365572","10504","??","Ss","Tue11pm","0:25.88","PerfPowerServices"]
            ["root","42906","0.0","0.0","4321572","88","??","Ss","Tue09am","0:00.01","com.apple.ColorSyncXPCAgent"]
            ["root","47415","0.0","0.0","4303172","88","??","Ss","Sat04am","0:00.01","aslmanager"]
          
          
          It has the readability of CSV but the stricter formatting of JSON. Win win.
        • aae42 5 years ago

          i've known about json lines for a bit (had to make a utility that parsed some output of a program that used jsonlines and was irritated it didn't use the json spec)...

          but for some reason just this post made me realize something... i've always been irritated by CSV log files (or space delimited), but would prefer something more structured like JSON, but JSON log files are pretty obnoxious for the reasons mentioned in the comments, a lot of data duplication...

          JSON lines for log files seems like a great fit, not sure why i didn't realize this until now, i suppose it was the context of the discussion!

        • DangitBobby 5 years ago

          Oh no, this is so similar to http://ndjson.org/

    • cowsandmilk 5 years ago

      I’ll note, your second item here is a structure of arrays, which is often higher performance in practice when you are only interested in certain portions of the data.

      For this reason, your second structure is how I serialize data for the C code I'm interacting with.

      • mananaysiempre 5 years ago

        ... Also known as a very simple case of a “column-oriented database”, of which there are several at various scales, from Metakit[1] to ClickHouse[2]. It’s a neat way to handle columns which are sparsely populated, required to accommodate large blobs, numerous but usually not accessed all at once, or frequently added and deleted.

        Nothing’s perfect, of course: you can’t stream records in such a format, so no convenient Unix-style tooling.

        [1]: http://www.equi4.com/metakit.html [2]: https://yandex.com/dev/clickhouse/

      • dmw_ng 5 years ago

        Higher performance, much smaller compressed and uncompressed, accepted by a wide range of tools, e.g. Pandas, and often more convenient to parse from statically typed languages.

      • 1vuio0pswjnm7 5 years ago
    • jarofgreen 5 years ago

      > Yeah, lists of objects are pretty crap, because they're almost always homogeneous but unenforcédly so

      We use JSON Schema heavily to enforce structure in JSON data. https://json-schema.org/

    • amichal 5 years ago

      i've done the first when serializing an arbitrary table of trace data in a jsonb column. Did it to make it compact and then realized that if all i wanted to do was show it in a UI it was much easier to parse as a html table, and only marginally harder to present an array of objects.

  • sonthonax 5 years ago

    Shouldn’t really matter too much if the response is being compressed.

    If you’re rendering the table in the DOM, the response size is the least of your issues.

    • ludocode 5 years ago

      This is the real answer. All of the other answers are suggesting various changes to the JSON structure to eliminate key repetition, but this is irrelevant under compression.

      Where it becomes relevant is if each record is stored as a separate document so you can't just compress them all together. Compressing each record separately won't eliminate the duplication, so you're better off with either a columnar format (like a typical database) or a schema-based format (like protobuf.)
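The effect is easy to check with a rough sketch (sizes will vary with the data; the point is only that the compressed gap is far smaller than the raw one):

```python
import gzip
import json

# 1,000 records with repeated keys vs. the same data as bare arrays.
rows = [{"name": f"user{i}", "score": i, "active": i % 2 == 0}
        for i in range(1000)]
as_objects = json.dumps(rows).encode()
as_arrays = json.dumps([list(r.values()) for r in rows]).encode()

print(len(as_objects), len(as_arrays))              # raw: objects much bigger
print(len(gzip.compress(as_objects)),
      len(gzip.compress(as_arrays)))                # gzipped: nearly the same
```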

      • rovr138 5 years ago

        To parse it, you need to check the keys. If there can’t be other keys, you can just use an array, which has a stable order in JSON, and save on the keys.

        So you just have an array of arrays. Or even a huge array and every X elements, it’s a new record.

        If each one has 2 keys,

            [
                {
                    key1: ‘a’,
                    key2: ‘b’
                },
                {
                    key1: ‘a’,
                    key2: ‘b’
                }
            ]
        
        Can become,

            [
                [
                    ‘a’,
                    ‘b’
                ],
                [
                    ‘a’,
                    ‘b’
                ]
             ]
        
        Or just every 2 will be a new record,

            [
                ‘a’,
                ‘b’,
                ‘a’,
                ‘b’
            ]
        • ludocode 5 years ago

          But why? Why save on keys when compression will nearly eliminate them for you?

          • rovr138 5 years ago

            Compression mainly helps with transmission.

            Trying to point out that the original structure allows for more flexibility.

            If you only cared about space, this compresses better anyway and uncompressed, it still occupies less space.

    • brundolf 5 years ago

      Sometimes you fetch a large dataset and only show one page at a time in the DOM, or render it as a line in a chart or something. At a previous workplace we had CSV responses in the hundreds of megabytes.

      • nly 5 years ago

        70 GB CSV files aren't uncommon at my work. It's not really a problem since CSV streams well.

      • cerved 5 years ago

        That sounds incredibly inefficient.

        What was the rationale for such enormous single payloads?

        • hnlmorg 5 years ago

          Without knowing more about the application, I'd guess probably caching and/or scaling. If you only need 1 payload then that can be statically generated and cached in your CDN. Which in turn reduces your dependence on the web servers so few nodes are required and/or you can scale your site more easily to demand. Also compute time is more expensive than CDN costs so there might well be some cost savings there too.

          • brundolf 5 years ago

            This was basically it. The dataset was the same across users so caching was simple and efficient, and the front-end had no difficulty handling this much data (and paging client-side was snappier than requesting anew each time)

  • killingtime74 5 years ago

    It’s because it’s self-describing, right? If you look at protobuf, Thrift or Avro, those are even denser.

  • quantumofalpha 5 years ago

    If you'd use a column-oriented format like {"col1":["a","b","c",...],"col2":[1,2,3...],...}, it's about the same density, no?

    • rahimnathwani 5 years ago

      This is the default format used by pandas.DataFrame.to_dict()

      I usually need a less dense version, e.g. to send to a jinja2 template, so mostly use to_dict(orient='index').

  • anonytrary 5 years ago

    It would be much nicer for the consumer to just de-dupe the keys in your json than to serve an annoying format like CSV. Your JSON could basically be a matrix with a header row, there's nothing forcing you to duplicate keys.

      { header: [...columnNames], rows: [...values2DArray]}
    • earthboundkid 5 years ago

      Make rows 1 dimensional. You don’t need the second dimension, it’s implied by header length. Once you do this, the JSON gzips down to about the same size as CSV, according to the last time I tested this IIRC.
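A sketch of decoding that one-dimensional layout, where the record boundary is implied by the header length (names are mine):

```python
def unpack(header, flat):
    # Each record occupies len(header) consecutive slots.
    n = len(header)
    return [dict(zip(header, flat[i:i + n]))
            for i in range(0, len(flat), n)]

print(unpack(["a", "b"], [1, 2, 3, 4]))
# [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
```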

      • anonytrary 5 years ago

        I edited-in the "2DArray" because I thought it was confusing... But you're right, just calculate offsets. The dominating term is still quadratic, and the term you mentioned is linear. It could be worth it for a scaled org like Google!

        I wonder which parses faster. I guess CSV does but then the consuming code would still have to parse the strings into JS primitives...

        • earthboundkid 5 years ago

          I haven't tested this, but my guess is that the browser's built-in JSON.parse will be faster than whatever CSV parser you can write in JS, just because it's precompiled to native code. Then the question becomes how long the unpacking loop takes, but it should be pretty quick.

          I'd love it if someone did a benchmark though.

      • dragonwriter 5 years ago

        Heck, you could do a single 1-D list (no object), and just give the header count as the first element, which would be even more compact.

  • jpitz 5 years ago

    Hell is other people's CSVs.

earthboundkid 5 years ago

I wrote my own converter a few years ago, then ended up needing it again last week. It’s one of those things you don’t always need but it’s handy to have when you do. https://github.com/baltimore-sun-data/csv2json

jabo 5 years ago

I recently heard about a tool called Miller that helps convert between JSON and CSV among other formats: https://github.com/johnkerl/miller

  mlr --c2j cat documents.csv > documents.jsonl
Converts a CSV file to a JSONL file.

th0ma5 5 years ago

CSV is more of a rumor than a standard, plus JSON can have a tree structure. It is a fun idea to think about and may be useful in some narrow cases, but will fail in almost all but those most trivial of structures.

  • amyjess 5 years ago

    > CSV is more of a rumor than a standard

    This reminds me of something my boss at a previous job would say: "I am morally opposed to CSV."

    Why? Because we worked at an NLP company, where we frequently had tabular data featuring commas, which meant that if we used CSV we'd have a lot of overhead quoting all our data. My boss instead chose TSV (tab-separated values) as our preferred tabular data format, which was much simpler to parse since we didn't really deal with any fields containing \t.
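For illustration, Python's stdlib csv module takes the delimiter as a parameter, so the trade-off is just which character forces quoting (a sketch, not the company's actual tooling):

```python
import csv
import io

row = ["text with, commas", "plain field"]

# With a tab delimiter, commas no longer trigger quoting.
buf = io.StringIO()
csv.writer(buf, delimiter="\t").writerow(row)
print(repr(buf.getvalue()))
# 'text with, commas\tplain field\r\n'
```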

    • earthboundkid 5 years ago

      Lol, so instead of having an actually working solution (escaping), you had a still broken solution that just didn’t blow up as often so you could ignore it until it caused a crash.

      • th0ma5 5 years ago

        Escaping breaks often as well. The general problem is that the data is inline with the format. Parquet files or something that has clear demarcation between data and file format are more ideal but probably nothing is perfect or future proof. Or accepting of past mistakes either.

    • fellowniusmonk 5 years ago

      I always try to use ASCII char 31 (unit separator) as the delimiter if I'm computationally generating delimited files for shunting data around internally.
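A sketch of that approach, assuming the payload never contains the separator bytes themselves:

```python
# ASCII 30 (record separator) and 31 (unit separator).
RS, US = chr(30), chr(31)

def encode(rows):
    return RS.join(US.join(fields) for fields in rows)

def decode(blob):
    return [record.split(US) for record in blob.split(RS)]

data = [["a,b", "c\td"], ["e", "f"]]
assert decode(encode(data)) == data  # commas and tabs pass through untouched
```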

      • gfody 5 years ago

        it's crazy how our progenitors had the wisdom and foresight to reserve FOUR distinct delimiters for us 28-31 file/group/record/unit but webdevs are just nope we'll go with commas and crlfs and when that doesn't work we'll JSON

        • elcritch 5 years ago

          Not sure I've seen a classic Unix tool utilize \x1c-\x1f by default. Ignoring standards in ASCII was in vogue long before webdevs.

        • quickthrower2 5 years ago

          JSON is probably popular because JS was popular and it’s a subset. And being part of a programming language that is typed by humans, those special characters would never have naturally featured.

          For data, if we are going to use reserved characters, maybe we just use protobuf and let the serialisation code take the strain.

        • cerved 5 years ago

          ah, yes, CSVs. The format famously preferred and lauded by the web development community

    • hnick 5 years ago

      PSV (pipe-separated) is also good; tabs can occasionally show up in data like mailing addresses if humans key them in. I usually go with one or the other if I have a choice.

      • hnlmorg 5 years ago

        CSV (and its derivatives) made some sense 20 years ago, but these days if you want to make your life easier with tables you're better off using jsonlines.

          ["Name", "Session", "Score", "Completed"]
          ["Gilbert", "2013", 24, true]
          ["Alexa", "2013", 29, true]
          ["May", "2012B", 14, false]
          ["Deloise", "2012A", 19, true]
        
        Plenty of tools, including some I've written myself, support it.

        https://jsonlines.org/examples/

        https://murex.rocks/docs/types/jsonl.html

        • 867-5309 5 years ago

          that's just CSV with unnecessary square brackets and whitespace

          • hnlmorg 5 years ago

            As someone who's written parsers for both CSV and jsonlines, I can assure you that you could not be further from the truth:

            1. The whitespace is optional. It's just put there for illustrative purposes.

            2. Whitespace in CSV can actually corrupt the data where some parsers make incompatible assumptions vs other CSV parsers. Eg space characters used before or after commas -- do you trim them or include them? Some will delimit on tabs as well as commas rather than either/or. Some handle new lines differently

            3. Continuing off the previous point: new lines in CSVs, if you're following IBM's spec, should be literal new lines in the file. This breaks readability and it makes streaming CSVs more awkward (because you then break the assumption that you can read a file one line at a time). jsonlines is much cleaner (see next point).

            4. Escaping is properly defined. E.g. how do you escape quotation marks in CSV? IBM's CSV spec states double quotes should be doubled up (e.g. "Hello ""world""") whereas some CSV parsers prefer C-style escaping (e.g. "Hello \"world\"") and some CSV parsers don't handle that edge case at all. jsonlines already has those edge cases solved.

            5. CSV is typeless. This causes issues when importing numbers in different parsers (e.g. "012345" might need to be a string but might get parsed as an integer with the leading zero removed). Also, should `true` be a string or a boolean? jsonlines is typed like JSON.

            The entire reason I recommend jsonlines over CSV is that jsonlines has the readability of CSV while covering the edge cases that otherwise lead to data corruption in CSV files (and believe me, I've had to deal with a lot of that over my extensive career!)
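Part of the appeal is how little machinery jsonlines needs; a round-trip sketch using only the stdlib json module:

```python
import json

def write_jsonl(rows):
    # One JSON document per line; no commas, no wrapping array.
    return "\n".join(json.dumps(row) for row in rows)

def read_jsonl(text):
    return [json.loads(line) for line in text.splitlines() if line]

rows = [["Gilbert", "2013", 24, True], ["May", "2012B", 14, False]]
assert read_jsonl(write_jsonl(rows)) == rows  # types survive the round trip
```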

            • hnick 5 years ago

              CSV is terrible, and I'd never write my own parser, but there are datasets where tab or pipe can never appear, and just using @line = split(/\|/, $data) or similar in another language is so convenient for quick and dirty scripting.

              • hnlmorg 5 years ago

                Not just datasets where tab or pipe can never appear, but also where quotation marks aren't used and newlines can never appear (in CSV a row of data can legally span multiple lines, because you're not supposed to escape char 10, or '\n' as it appears in C-like languages).

                I do get the convenience of CSV and I've used it loads in the past myself. But if ever you're dealing with data of which the contents of it you cannot be 100% sure of, it's safer to use a standard that has strict rules about how to parse control characters.

                • hnick 5 years ago

                  TSV/PSV generally don't allow newlines and commas/quotes are not special so are fine. Though Excel doesn't always play nice, but if you care about data integrity, you won't open it in Excel anyway.

                  • hnlmorg 5 years ago

                    AFAIK TSV and PSV aren't specs; they're just alternative delimiters for CSV. To that end, most TSV and PSV parsers will be CSV parsers which match on a different byte ('\t' or '|' as opposed to ','). Which means if the parser follows spec (which not all do) then they will allow newlines and quotes too.

                    I'm not saying your use case isn't appropriate though. E.g. if you're exporting from a DB whose records have already been sanitised and want to do some quick analysis, then TSV/PSV is probably fine. But if you're dealing with unsanitised data that might contain \n, \" or others, then there is a good chance your parser will handle them differently to your expectations -- and even a slim chance that your parser might just go ahead and silently corrupt your data rather than warn you about differing column lengths et al. So it's definitely worth being aware that TSV and PSV suffer from all the same weaknesses as CSV.

            • 867-5309 5 years ago

              thanks for your reply, it was really informative. I'm not a fan of CSV so will delve deeper into jsonlines as an alternative next time it crops up

            • th0ma5 5 years ago
              • hnlmorg 5 years ago

                That's the nice thing about jsonlines, it's not creating a new competing standard. It's just making better use of an existing standard (JSON).

  • nly 5 years ago

    Trees are trivially flattened, and it's literally a couple lines of comments or documentation to describe what flavor of CSV you're using.

nly 5 years ago

jq's stream and fromstream functions can be used to flatten and unflatten JSON. I use it all the time at work for POs who want to see data in Excel

https://jqplay.org/s/ub-WvXCcPn

... from there it's just a row->column rotation to CSV.

sireat 5 years ago

CSV<->JSON is fundamentally an unsolvable problem because of the mismatch in data hierarchies between the two formats.

Plus you have type looseness in both and a lack of standards for CSV.

A trivial 2-D case is handled well by Python libraries such as pandas. Here OP's tool could be an alternative.

When I say trivial I mean flat 2 dimensional data, such as you would get from Mockaroo or similar source.

However in real life - data is messy.

As you get into hierarchies three, four and more levels deep in JSON, you can't really translate that into nice flat 2-D CSV.

Then you have missing keys and mixed-up types, and you end up rolling your own hand-written converters.
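A sketch of why the flat cases are easy and the deep ones aren't: a dot-notation flattener (hypothetical, not OP's code) handles nested objects fine, but has no obviously right answer for lists:

```python
def flatten(obj, prefix=""):
    # Recursively flatten nested dicts into dot-notation keys.
    # Lists are left as-is: there is no single right way to
    # spread them over CSV columns.
    out = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out

print(flatten({"user": {"name": "ada", "tags": [1, 2]}}))
# {'user.name': 'ada', 'user.tags': [1, 2]}
```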

code-faster 5 years ago

I have a couple of open source CLI tools to do this: - https://github.com/tyleradams/json-toolkit/blob/master/csv-t... - https://github.com/tyleradams/json-toolkit/blob/master/json-...

tyingq 5 years ago

Wouldn't this need to allow upload/download of CSV to really meet the spirit of the title? Or maybe replace the references of CSV with "HTML Table"?

  • okumurahataOP 5 years ago

    Yes, you are right. I added a button to allow CSV upload instead of only adding data by copying/pasting.

robbiejs 5 years ago

If anyone is looking at a free tool to quickly edit CSV data in an Excel-like editor, see https://editcsvonline.com

somishere 5 years ago

Built something similar on CodePen quite a few years ago. Not sure where I came up with the format; it seems a bit wild next to the very nice dot notation used here, but it's possibly more useful/efficient for variable data models, and it also takes data types into account:

https://codepen.io/theprojectsomething/pen/OwppWW

Note: click the Toggle Info to read the "spec" (groan) :)

osullip 5 years ago

I run a software company and we have a challenge when it comes to these types of conversion tools.

If there is any data that is a) not publicly accessible or b) contains personal information, I cannot authorise the use of a web based third party tool. There is just too much risk that some bad actor uses this as a method to soak up data.

I would love to verify/validate that all of the processing is local, and to have some way to certify that this hasn't changed.

hmsimha 5 years ago

It would make it much easier for users to visually parse the JSON section if you added `font-family: monospace` to the textarea element

me_bx 5 years ago

Nice, I like the look of the editable table.

Shameless plug: a similar solution, working all client side, not imposing to use a key as first column, and with options regarding CSV format.

https://mango-is.com/tools/csv-to-json/

867-5309 5 years ago

this can turn an HTML table into JSON with the option to download the JSON, or it can turn JSON into an HTML table with no option to download a CSV -- where does CSV come into this?

also, clearly javascript is a bit too ambitious for the job when e.g. PHP could provide the intended functionality with two lines of code:

    foreach ($arrays as $values) { echo implode(',', $values) . "\n"; }
    echo json_encode($arrays);

also, CSV is more for storing rigidly-structured uniform columns and rows, whereas JSON is more for storing loosely-structured varying objects; otherwise you're redeclaring column headings in every array, which wouldn't make much difference for gzipped transport but is still wasteful and verbose. column headings are usually the first line of a CSV

  • laumars 5 years ago

    If you’re using JSON for tables then you’re much better off using jsonlines. It’s got a properly defined specification (unlike the wishy-washy spec of CSV, which every maintainer seems to implement differently) so you’re less likely to garble your data while still having all the benefits that CSVs do. Plus a lot of JSON marshallers will natively support jsonlines despite not advertising that functionality.

    Personally I’d recommend jsonlines over regular CSVs these days but I’ve had so many issues with CSV parsers being incompatible over the years that I’d welcome anything which offers stricter formatting rules.

    I’d definitely recommend you check it out. https://jsonlines.org

oweiler 5 years ago

Why doesn't the library transform the CSV into an array of JSON objects?

  • aae42 5 years ago

    i was wondering if there would be an option for this in the UI somewhere, but it doesn't look like there is

    i wonder if it requires a "key" column to serve as the dictionary key

    EDIT: i kind of wouldn't mind a HN discussion on which is better... ruby is my go-to scripting language, so i find this structure very natural, i've found when trying to do the same things with Go, i much prefer things to be structured more like an array of objects

    would be interesting to hear the merits of both

brianzelip 5 years ago

FYI, the view on small devices is pretty bad - the demo json output is almost unreadable without an awkward pinch + scroll. Compare this view to the same content on the Readme via GitHub.

ddgflorida 5 years ago

Shameless plug - I wrote convertcsv.com and it supports about everything you can think of as far as format conversions go. JSON, XML, YAML, JSON Lines, Fixed Width, ...

AbhyudayaSharma 5 years ago

You can do this in Powershell

    cat file.csv | ConvertFrom-Csv | ConvertTo-Json
darrenf 5 years ago

`jq` can transform CSV to JSON and vice-versa, especially for simple/naive data where simply splitting on `,` is good enough - and where you aren't too bothered by types (e.g. if you don't mind numbers ending up as strings).

First attempt is to simply read each line in as raw and split on `,`. This sort of does the job, but it isn't the array of arrays that you might expect:

    $ echo -e "foo,bar,quux\n1,2,3\n4,5,6\n7,8,9" > foo.csv

    $ jq -cR 'split(",")' foo.csv
    ["foo","bar","quux"]
    ["1","2","3"]
    ["4","5","6"]
    ["7","8","9"]
Pipe that back to `jq` in slurp mode, though:

    $ jq -R 'split(",")' foo.csv | jq -cs
    [["foo","bar","quux"],["1","2","3"],["4","5","6"],["7","8","9"]]
And if you prefer objects, this output can be combined with the csv2json recipe from the jq cookbook[0], without requiring `any-json` or any other external tool:

    $ jq -cR 'split(",")' foo.csv | jq -csf csv2json.jq
    [{"foo":1,"bar":2,"quux":3},
     {"foo":4,"bar":5,"quux":6},
     {"foo":7,"bar":8,"quux":9}]
Note that this recipe also keeps numbers as numbers!

In the reverse direction there's a builtin `@csv` format string. This can be used with the second example above to say "turn each array into a CSV row", like so:

    $ jq -R  'split(",")' foo.csv | jq -sr '.[]|@csv'
    "foo","bar","quux"
    "1","2","3"
    "4","5","6"
    "7","8","9"
And to turn the fuller structure from the third example back into CSV, you can pick out the fields, albeit this one is less friendly with quotes and doesn't spit out a header (probably doable by calling `keys` on `.[0]` only...):

    $ jq -cR 'split(",")' foo.csv | jq -csf csv2json.jq | \
    > jq -r '.[]|[.foo,.bar,.quux]|@csv'
    1,2,3
    4,5,6
    7,8,9
I don't consider myself much of a jq power user, but I am a huge admirer of its capabilities.

[0] https://github.com/stedolan/jq/wiki/Cookbook#convert-a-csv-f...

gspr 5 years ago

I'm sorry, but why is this a website?

  • AnthonBerg 5 years ago

    So that we may have this discussion and get out of this strange rut that is the care and feeding of idempotent little formats.

    And to educate! To show each other. To make knowledge discoverable. That’s the reason websites like this are honestly a very good thing.

    • okumurahataOP 5 years ago

      Thanks, AnthonBerg.

      • AnthonBerg 5 years ago

        Kudos for doing the work! :bow:

        I see an appropriate beauty in your username representing an information propagation model – of modern waves in modern habitats.

        I do hope that my comment about a “rut” and “little formats” doesn’t disparage the work. I try to speak for enlightenment but sometimes I fall into lamenting the darkness.

  • me_bx 5 years ago

    Why not?

    Some users don't like to install too many desktop applications and would rather use simple web apps...

stevage 5 years ago

How do you actually load CSVs into it?

lettergram 5 years ago

Can this handle uploading csvs?

luming 5 years ago

You should use outline instead of border in your cell css.

  • codetrotter 5 years ago

    Why?

    I read https://css-tricks.com/almanac/properties/o/outline/ and it says

    > The outline property in CSS draws a line around the outside of an element. It’s similar to border except that:

    > 1. It always goes around all the sides, you can’t specify particular sides

    > 2. It’s not a part of the box model, so it won’t affect the position of the element or adjacent elements (nice for debugging!)

    > […]

    > It is often used for accessibility reasons, to emphasize a link when tabbed to without affecting positioning and in a different way than hover.

    I guess this is why you said outline should be used instead in this case.

  • luming 5 years ago

    And the :focus-within pseudo-class.
