Show HN: Choose – An alternative to cut and sometimes awk
github.com
I think it is very cool that Rust has led to a renaissance of rewriting classic Unix tools to make them fit more with current use. Unix was never meant to stand still. It just happened that AT&T broke up and it took a while for Linux to catch up, and by then people had gotten used to the idea of a fixed set of POSIX utilities. But their CLIs are often quite bad, and security was never a consideration in the olden days, so it's good to see them re-evaluated.
> I think it is very cool that Rust has led to a renaissance of rewriting classic Unix tools to make them fit more with current use. Unix was never meant to stand still.
Technically it hasn't stood still: even the core tools have kept getting extended, usually incompatibly, in both the GNU and BSD lineages. Though it's pretty funny how much of the Rust community has been taken up with providing alternatives and replacements for "classic" (POSIX) utilities.
While development has not stopped, there haven't really been many advances in improving the syntax. Many of these tools are stuck with awkward, unintuitive syntax, which is cumbersome unless you use them frequently. I feel like the renaissance has lately been one of usability, which I personally really appreciate. Obviously, hard-core daily users would disagree, but considering I use `awk` at most once every 6 months, I hate that I need to spend 20 minutes re-learning how to use it every single time, particularly for basic purposes.
Hmm, so compared to cut, this:
1. Saves you typing -f, because it doesn't support cut's -b and -c modes (edit: actually -c is supported, I just didn't see it);
2. Uses -f for the delimiter where cut uses -d, which is rather confusing for cut users;
3. Uses : instead of - for range specifications;
4. Offers an exclusive indexing mode;
5. Misses a bunch of other cut features (assuming coreutils cut).
Not sure I see much appeal...
Edit: Another thing I missed: regex separator instead of just character list.
> Not sure I see much appeal…
The appeal is the same as replacing grep with a fancier searcher:
1. It has good and sensible defaults (field mode; also, I'd have to check, but hopefully and unlike cut it doesn't print the entire line when it's unhappy with the selection you asked for, which is worse error handling than ed's). (Edit: confirmed. If you give `choose` a nonsensical selection, it doesn't print anything: ask `cut` for columns 10-15 of data with 3 columns and it's going to print the source as-is, whereas choose properly prints a bunch of empty lines. That alone makes it better than cut.)
2. It works better on actual data, which is generally whitespace-separated rather than tab-separated, meaning cut requires preprocessing before it'll do anything useful.
Can you massage cut or the data to fit? Yes, in the same way you can massage grep or your data to fit. That you don't have to and the utility behaves sensibly by default is appealing. This exact thing is one I've been thinking about for some time now, I'm glad somebody else agreed and did the legwork.
To get the appeal, give a generalized cut version of:
echo -e "foo bar baz" | choose -1 -2
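To make the comparison concrete, here is a minimal Python sketch of what `choose -1 -2` is described as doing in this thread: split the line on runs of whitespace and pick fields by index, with negative indexes counting from the end. The `choose_fields` name is hypothetical, joining the output with a single space is an assumption, and this is not choose's actual implementation.

```python
from typing import List


def choose_fields(line: str, indices: List[int]) -> str:
    """Hypothetical choose-like field picker (not choose's real code).

    Splits on runs of whitespace, so any number of spaces or tabs works,
    and returns the requested fields in the order they were asked for.
    """
    fields = line.split()  # runs of whitespace; leading/trailing ignored
    picked = []
    for i in indices:
        # Python lists already treat negative indexes as "from the end".
        if -len(fields) <= i < len(fields):
            picked.append(fields[i])
    # Out-of-range indexes are skipped, so a bad selection yields "".
    return " ".join(picked)


print(choose_fields("foo   bar  baz", [-1, -2]))  # baz bar
print(choose_fields(" a b c", [1, 2]))            # b c
```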
I can't understand why you are mentioning awk. Cut or choose cannot be compared to awk, awk is a programming language.
Also I don't think that it's so much easier to use than cut. On the other hand every *nix system has cut so if you make scripts with it they are portable.
> I can't understand why you are mentioning awk. Cut or choose cannot be compared to awk, awk is a programming language.
Because 99% of awk IRL use is just as a fancier cut.
It's very rare someone even sets a variable using awk. If you do it, you are a statistical rarity.
> Also I don't think that it's so much easier to use than cut. On the other hand every *nix system has cut so if you make scripts with it they are portable.
I, for one, never remember the syntax for cut. If "choose" gets a deb, I'll use it: Python slicing is something familiar to me.
I don't care if cut is on every unix system: if I have the possibility to install things on the machine, then I'll just install what I need. I have a script for that. If I don't, I'll google/man/--help GNU commands as usual.
And as for writing shell scripts, I use Python anyway.
> Because 99% of awk IRL use is just as a fancier cut.
You say "fancier", I say "working": since cut can't work on general whitespace without a pre-processing phase (e.g. tr), it simply doesn't work for the vast majority of the things I try to shove into it, and I pretty much always end up using awk instead.
Choose means my awk use will drop by 99% or so.
Agreed.
In fact, anybody promoting cut, please give me the cut version of:
echo -e "foo bar baz" | choose -1 -2
It should work on an arbitrary number of spaces, and fields. The oneliner is going to be... interesting.
Now you can do it with awk using:
echo -e "foo bar baz" | awk '{ print $NF " " $(NF-1)}'
But it's neither easy to type, nor to remember. Choose is what cut should have been.
By using only basic functionality that's easy enough to remember I guess I'd go with something like
`echo -e "foo bar baz" | tr -s ' ' | rev | cut -d ' ' -f 1-2 | rev | awk '{print $2 " " $1}'`
Everything except the awk part is something that I use all the time and is easy to type & remember.
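For anyone wondering why the double `rev` works: reversing the whole line turns the last fields into the first ones, so a plain `cut -f 1-2` can grab them, and the second `rev` restores each word's spelling. The Python sketch below mimics each stage of that pipeline on a single line; the function names are made up for illustration, and the `cut` stand-in ignores cut's other options.

```python
def tr_squeeze(s: str, ch: str = " ") -> str:
    # Like `tr -s ' '`: collapse each run of `ch` into a single `ch`.
    out = []
    for c in s:
        if c == ch and out and out[-1] == ch:
            continue
        out.append(c)
    return "".join(out)


def rev(s: str) -> str:
    # Like `rev`: reverse the characters of the line.
    return s[::-1]


def cut_range(s: str, delim: str, first: int, last: int) -> str:
    # Like `cut -d DELIM -f FIRST-LAST`: 1-based, inclusive range.
    return delim.join(s.split(delim)[first - 1:last])


def awk_swap(s: str) -> str:
    # Like `awk '{print $2 " " $1}'`; missing fields become "".
    f = s.split() + ["", ""]
    return f"{f[1]} {f[0]}"


line = "foo   bar  baz"
step = tr_squeeze(line)            # "foo bar baz"
step = rev(step)                   # "zab rab oof"
step = cut_range(step, " ", 1, 2)  # "zab rab"
step = rev(step)                   # "bar baz"
print(awk_swap(step))              # baz bar
```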
To be honest I'd use `choose` if it was available everywhere, but for string manipulation I can't justify using nonstandard tools since they aren't always available.
Every now and then there are some new ones I actually start to use. For example `ripgrep` mostly replaced `grep -R` for me some time ago, a lot of it has to do with the fact that if `rg` is not found I can fallback to normal grep and get the same result, just a bit slower.
I guess my point is that while I do appreciate innovation & making better tooling, the hard part always is getting the tool where it's most needed.
cut? I don't even leave the shell.
echo ... |while read a b x ;do ... ;done
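The `while read` approach relies on how `read` splits a line with the default `IFS`: leading and trailing whitespace is dropped, each variable except the last gets one word, and the last variable gets the rest of the line. Here is a rough Python model of that rule using a hypothetical `read_vars` helper; it ignores backslash processing (as if `read -r` were used) and treats any Unicode whitespace as a separator, which the shell does not.

```python
from typing import List


def read_vars(line: str, n: int) -> List[str]:
    """Rough model of `read v1 ... vn` with the default IFS (hypothetical helper)."""
    # Trim surrounding whitespace, then split into at most n words;
    # the last word keeps the remainder, including its inner spacing.
    words = line.strip().split(None, n - 1)
    # Variables with no corresponding word end up empty.
    return words + [""] * (n - len(words))


a, b, x = read_vars("foo  bar baz   qux\n", 3)
print(a, b, repr(x))  # foo bar 'baz   qux'
```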
For what it's worth, BSD cut has a `-w` flag for separating on general whitespace.
> BSD cut
Only FreeBSD, and it's somewhat recent (it was apparently added in 2012, in 9.2, so it's not in osx either).
> regular expression field separators using Rust's regex syntax
This actually makes choose a cut-killer for me. It can be frustrating having to figure out which delimiters to use - tabs or spaces? If spaces, you'll have to chain it with tr, or resort to awk.
And choose makes it even better by using "\s" as a default separator. So you usually don't have to specify a separator at all.
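A quick way to see why a regex separator helps: splitting on a single space, which is roughly what `cut -d ' '` does, breaks on tabs and on repeated spaces, while splitting on a whitespace regex does not. The Python sketch below uses `re.split` with `\s+`; the `split_fields` name is hypothetical, and discarding empty fields is an assumption that matches the ` a b c` example later in the thread, not a documented choose rule.

```python
import re
from typing import List


def split_fields(line: str, sep: str = r"\s+") -> List[str]:
    # Split on a regex separator and drop empty fields (an assumption).
    return [f for f in re.split(sep, line) if f]


mixed = "a\tb  c"
print(mixed.split(" "))                      # ['a\tb', '', 'c']
print(split_fields(mixed))                   # ['a', 'b', 'c']
print(split_fields("k1=v1;k2=v2", r"[=;]"))  # ['k1', 'v1', 'k2', 'v2']
```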
Not sure why it is even compared to awk instead of just cut. It could've been introduced as cut-like command with regex input field separator. Or at least not say things like:
>However, the awk command is not ideal for rapid shell use
And
>cut is far from ideal for rapid shell use, because of its confusing syntax
Anything new is confusing until you learn enough to be comfortable.
>ranges are just plain difficult to get right on the first try
And how does choose become easier to use by using ':' instead of '-'?
Is this a typo or does inclusive/exclusive depend on whether first number is specified?
>choose 2:5 # print everything from the 2nd to 5th item on the line, _inclusive_ of the 5th
>choose :3 # print the beginning of the line to the 3rd item _exclusive_
I heard those arguments when ag went out as an alternative to grep, and ffind as an alternative to find.
But now, I install their successors, ripgrep and fdfind, on all my machines. Including the windows ones.
I do not have an issue with this command; the regex field separator and negative indexes alone make it a good alternative to cut. And I use ripgrep too. I'm bothered by the description.
I would even suggest adding the ability to invert ranges, byte selection (if -c is character and not byte selection), examples of character splitting in the README, etc.
There are literally thousands of Unix tool replacements released on GitHub, and more every year.
Just off the top of my head, fex and miller are alternative cut-likes with field extraction.
Unless a new unix-tool-alike is significantly better and backwards-compatible, HN and most greybeard nixers tend toward conservatism. The old tools are usually good for 95% of use-cases anyway. There's just way too much to keep track of if you're eager to switch to any old shiny new thing.
Tools like rg offer additional features and/or save you from tediously specifying many options. This one gives you -f for free, that's about it. Detailed comparison: https://news.ycombinator.com/item?id=23445931
- it supports regex separators. That's a great feature to me.
- the default separator is "\s", like python's split(). Just for that I will adopt it: not having to care about tabs/spaces/mixes is a much better experience.
- it has negative indexes, again like Python. Getting the last field, or the nth-from-last field, is common enough. I don't want to rewrite the thing with a twisted double "rev" and the proper index. And I don't want to have to google it.
- plus the syntax is just much easier for me to remember. When I use cut, I always try "echo 'foo bar baz' | cut 2", only to realize that I need to pass '-f'; then I do "cut -f 2", get stumped, google it, and remember that I need to pass the delimiter explicitly even if it's a space.
- it works the same on windows. I dual boot.
Compare:
echo -e "foo bar baz" | choose -1
To:
echo -e "foo bar baz" | rev | cut -d ' ' -f 1 | rev
cut is, to me, the opposite of a friendly API. Something so basic in the Unix world should have sane defaults.
Defaults are not sane if I have to google them every other time.
One more great feature of choose which cut doesn't have: not returning garbage output on garbage input.
If you give cut columns which don't exist, it's going to output the entire source as-is.
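The pass-through has a specific trigger worth knowing. Per the GNU coreutils manual, `cut -f` prints any line that contains no delimiter character unchanged, unless `-s` (`--only-delimited`) is given; a line that does contain the delimiter but has too few fields yields an empty line instead. Since cut's default delimiter is a tab, space-separated input hits the pass-through case. The Python model below is a simplified sketch of that documented rule, not cut's source, and the `gnu_cut_f` name is made up.

```python
from typing import Optional


def gnu_cut_f(line: str, first: int, last: int,
              delim: str = "\t", only_delimited: bool = False) -> Optional[str]:
    """Simplified model of `cut -f FIRST-LAST` (1-based, inclusive)."""
    if delim not in line:
        # No delimiter on the line: printed as-is unless -s suppresses it.
        return None if only_delimited else line
    # Delimiter present: keep only the requested fields (possibly none).
    return delim.join(line.split(delim)[first - 1:last])


print(repr(gnu_cut_f("a b c", 10, 15)))    # 'a b c' (no tab, passed through)
print(repr(gnu_cut_f("a\tb\tc", 10, 15)))  # '' (tab present, empty line)
print(repr(gnu_cut_f("a b c", 10, 15, only_delimited=True)))  # None
```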
> Is this a typo or does inclusive/exclusive depend on whether first number is specified?
I hope it's a typo given:
> choose -3:-1 # print the last three items from a line
is clearly inclusive (otherwise it would print all but the last one), and there's a very explicit flag for exclusive ranges. Might be a good idea to open an issue just in case.
I'm not sure how I feel about using Python's range syntax with different inclusivity (by default), though.
Edit: after installing and testing, it does seem to be an error in the README: the end is inclusive whether or not a start is provided. That is, `:3`, `0:3` and `1:3` all include the 4th field.
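To pin down the semantics reported above (0-indexed, negative indexes from the end, end inclusive whether or not a start is given), here is a hypothetical Python parser for a `start:end` spec. It also handles a reversed range like `2:0`, which another comment in the thread shows printing `3 2 1`. It is a sketch of the observed behaviour, not choose's code, and its handling of out-of-range bounds is a guess.

```python
from typing import List


def choose_range(fields: List[str], spec: str) -> List[str]:
    """Hypothetical choose-style "start:end" range, end INCLUSIVE."""
    n = len(fields)
    start_s, end_s = spec.split(":")
    start = int(start_s) if start_s else 0
    end = int(end_s) if end_s else n - 1
    # Negative indexes count from the end of the line.
    if start < 0:
        start += n
    if end < 0:
        end += n
    if start <= end:
        # Forward range; slicing clamps indexes past the end of the list.
        return fields[max(start, 0):max(end + 1, 0)]
    # Reversed range such as "2:0": same fields, emitted back to front.
    return fields[max(end, 0):max(start + 1, 0)][::-1]


f = ["f0", "f1", "f2", "f3", "f4"]
print(choose_range(f, ":3"))     # ['f0', 'f1', 'f2', 'f3']
print(choose_range(f, "1:3"))    # ['f1', 'f2', 'f3']
print(choose_range(f, "-3:-1"))  # ['f2', 'f3', 'f4']
```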
Can it output fields in an order other than their input order? That's the one thing I regularly wish cut could do. I would like the output of the second cut below to be "3,1", not "1,3".
$ echo "1,2,3,4,5" | cut -d , -f 1,3
1,3
$ echo "1,2,3,4,5" | cut -d , -f 3,1
1,3
Yes:
$ echo "1,2,3,4,5" | choose -f , 2 0
3 1
$ echo "1,2,3,4,5" | choose -f , 2:0
3 2 1
Note that the indexing starts with 0, "-d" is "-f", and a range is denoted by ":" instead of "-" which is used for indexing from the end.
awk -F, '{print $3","$1}'
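The reason `cut -f 3,1` still prints `1,3` is that cut treats its field list as a set of positions and always writes the selected fields in input order, each at most once, whereas choose, per the example above, emits them in the order requested. A minimal Python sketch of the two behaviours, with made-up function names and a cut model that only understands comma-separated numbers (no ranges):

```python
from typing import List


def cut_list(line: str, delim: str, field_list: str) -> str:
    # cut-like: 1-based positions used only as a set, so output keeps
    # input order and each field appears at most once.
    wanted = {int(x) for x in field_list.split(",")}
    fields = line.split(delim)
    return delim.join(f for i, f in enumerate(fields, start=1) if i in wanted)


def pick_in_order(line: str, delim: str, indices: List[int]) -> str:
    # choose-like: 0-based indexes, output in the order they were given.
    fields = line.split(delim)
    return " ".join(fields[i] for i in indices)


print(cut_list("1,2,3,4,5", ",", "3,1"))        # 1,3
print(pick_in_order("1,2,3,4,5", ",", [2, 0]))  # 3 1
```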
I love my coreutils replacements... not because they're in Rust, but because they're generally faster and easier to use. fd, rg, bat and now I shall use choose! I almost always have to look up awk's syntax, but the defaults in choose seem trivial to remember. Thanks for making this!
I am not sure why being zero-indexed is considered a feature. I have no problem using a zero-indexed system, but I've never really thought of it as a feature. Is there something I'm missing that makes zero-indexed systems faster, easier to use or otherwise better than one-indexed system?
There's a better reason than this that I'm forgetting, but never underestimate the power of being the same as what people are already familiar with. Every time I have to write lua or read some Matlab, the mental overhead of having to remember everything is one-indexed is just incredibly annoying.
Anyone used to command-line tools is used to fields being 1-indexed.
awk uses $0 as the whole line, and $1 as the first field.
cut uses -f1 as the first field.
$1 is the first argument to a POSIX shell script.
\1 is the first back-reference in sed.
$1 is the first capture-group match in Perl.
A command-line tool being 0-indexed breaks from expectation of what everybody is used to using on the command line.
"Anyone" is a generalization based on a narrow viewpoint. I would say I am pretty used to command-line tools at this point, at least enough to be using Linux as a daily driver comfortably. And frankly, I didn't know 1-indexed fields were the norm, even though I knew $1 is the first argument to a POSIX shell script (I always assumed $0 referred to the script or command itself).
That's fair, I overgeneralized.
I get what you're saying, but to be more general I'd argue that's just 0 indexing with the API specifying what's in what index.
I'd more "formally" define 0/1 indexing as:
Zero indexing: arr[0] is a valid way to address the first element of an array, and len(arr) - 1 is the index to the final element.
One indexing: arr[0] results in an error or an out of bounds access, and len(arr) is the index to the final element.
These statements are true in Matlab, but not most command line tools.
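One way to see the point about awk: if you model its field variables as a single zero-indexed list whose slot 0 holds the whole record, `$0` is a valid index and the last field `$NF` sits at `len - 1`, which satisfies the zero-indexing definition above. A small Python illustration, with a hypothetical `awk_fields` helper that approximates awk's default whitespace splitting with `str.split()`:

```python
from typing import List


def awk_fields(record: str) -> List[str]:
    # Slot 0 is the whole record ($0); slots 1..NF are the fields.
    return [record] + record.split()


f = awk_fields("foo bar baz")
NF = len(f) - 1              # number of fields, as in awk
print(f[0])                  # foo bar baz  ($0)
print(f[1])                  # foo          ($1)
print(f[NF], f[len(f) - 1])  # baz baz      ($NF is the last index)
```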
Inspired by a remark about Python's default split behavior in comment https://news.ycombinator.com/item?id=23446146, I wrote a Python oneliner for field selection that works similarly to "choose" but throws exceptions when the field cannot be found:
$ echo " a b c" | choose 1 2
b c
$ echo " a b c" | python3 -c 'import sys; [print(f[1], f[2]) for line in sys.stdin if (f := line.split()) or True]'
b c
Someone should make a bundle installer with this, bat, fdfind and ripgrep. I do enjoy those alternatives to GNU, and install as many as I can: they are easier to use, usually faster, and just make more sense to my brain.
This pain is real: https://xkcd.com/1168/
There is https://github.com/uutils/coreutils implemented in Rust
Yes, but their goal seems to be API compatibility, which I understand the point of, but which is not useful to me.
What does this do that cut can’t?
This is a poor question as it invokes the Turing tarpit. (Why use any language that is higher level than machine code?)
If it is more comfortable to use for some people then it’s a great invention.
Select fields from space-delimited output (from commands like `docker images`, whose output has to be preprocessed with tr).
Nothing, but try doing a general version of:
echo -e "foo bar baz" | choose -1 -2
With cut.
echo -e "foo bar baz" | xargs | cut -d ' ' -f1,2
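The `xargs` trick works because `xargs` with no command runs `echo` on the words it reads, which re-joins them with single spaces. That also means every input line is merged into one, so it only behaves like a per-line tool on single-line input. A rough Python model (hypothetical `xargs_echo` and `cut_first_two` helpers; it ignores xargs's quote handling and command-line length limits, which can split very large input across several `echo` calls):

```python
def xargs_echo(text: str) -> str:
    # All whitespace-separated words from ALL lines, joined by one space.
    return " ".join(text.split())


def cut_first_two(line: str) -> str:
    # Like `cut -d ' ' -f1,2` on an already-squeezed line.
    return " ".join(line.split(" ")[:2])


print(cut_first_two(xargs_echo("foo   bar  baz\n")))  # foo bar
print(cut_first_two(xargs_echo("a b\nc d\n")))        # a b (only one line comes out)
```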
Very cool. An inverse mode to suppress matched fields might be a neat feature.
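For reference, GNU cut already has a `--complement` flag that inverts its selection. A choose-style inverse mode could look something like the hypothetical Python sketch below, which keeps every field whose index (negative indexes allowed) was not named. This is a guess at how such a feature might behave, not an existing choose option.

```python
from typing import Iterable, List


def select_complement(fields: List[str], excluded: Iterable[int]) -> List[str]:
    # Normalise negative indexes to positions counted from the start.
    n = len(fields)
    drop = {i + n if i < 0 else i for i in excluded}
    # Keep fields in input order, skipping the excluded positions.
    return [f for i, f in enumerate(fields) if i not in drop]


print(select_complement(["a", "b", "c", "d"], [0, -1]))  # ['b', 'c']
```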
Very nice. Good work.