Use Long Options in Scripts
matklad.github.io | 297 points by OptionOfT a month ago
Please DO NOT mix string interpolation and command execution, especially when a command is processed through the shell. Whatever your language, use a list-based or array-based execution API that passes arguments straight through to execv(2), execvp(2), etc, bypassing the shell.
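For example, in Python a list-based call might look like this (a minimal sketch; subprocess is assumed and the branch name is made up):
import subprocess

# The argument list goes straight to exec; no shell is involved, so the
# interpolated value is always exactly one argv element.
today = "2024-05-01"  # example value; could just as well contain spaces or quotes
subprocess.run(
    ["git", "switch", "--create", f"release-{today}", "origin/main"],
    check=True,
)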
Was waiting for this comment :P
The API used handles string interpolation correctly: the string literal is parsed at compile time, the interpolated arguments are never concatenated or escaped, and they end up directly as elements of the argv array passed to the child. See
https://github.com/tigerbeetle/tigerbeetle/blob/7053ecd2137a...
This approach creates an odd mini language, which is incomplete:
comptime assert(std.mem.indexOfScalar(u8, cmd, '\'') == null); // Quoting isn't supported yet.
comptime assert(std.mem.indexOfScalar(u8, cmd, '"') == null);
But you can do correct interpolation with simple shell variables, rather than generating shell code strings:
$ today='foo; bar' sh -c 'argv git switch --create "release-$today" origin/main'
['git', 'switch', '--create', 'release-foo; bar', 'origin/main']
So that is a test that we can use a plain shell string, without any shell injection bug. (argv is my command to test quoting: python -c 'import sys; print(sys.argv)' "$@" ) Note that there's no escaping function needed, because we're not generating any shell code. We're generating an argv array for `/bin/sh` instead.
---
So by invoking with an env var, you can easily create a correct API that uses plain shell
git switch --create "release-$today"
rather than git switch --create release-{today} # what language is this? It's not obvious
If you don't want to use the env var, you can also use git switch --create "release-$1"
And invoke with ['sh', '-c', shell_string, 'unused-arg0', today_string]
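A minimal Python sketch of that invocation (subprocess assumed; the value is deliberately hostile to show that nothing breaks):
import subprocess

# The sh script text is a constant; the value arrives as the positional
# parameter "$1", so nothing in it can be parsed as shell code.
today = "foo; bar"
subprocess.run(
    ["sh", "-c", 'git switch --create "release-$1" origin/main', "unused-arg0", today],
    check=True,
)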
With this approach, you don't need 1. any kind of shell escaping
2. any analyzing of pseudo-shell strings, which can't contain quotes
Because you are not generating any shell code. The shell code is constant.
Why would they even change the language and the commands in the example? It confuses and undermines the point. Just say "use `git switch -c my-new-branch` for interactive usage and `git switch --create my-new-branch` in scripts". It makes no sense to introduce other unexplained information.
Another approach is to have powerful enough language that allows you to guard against the shell injection. I wrote a syntax form allowing to do this:
(sh "cat " file " >" output)
With file being bound to "foo'bar" and output to "x", it is automatically translated into cat 'foo'\''bar' >'x'
This gives you the flexibility to use shell (sometimes it just is the most concise way) while being safe against injection. I believe, for example, in Rust you should be able to do the same.
How do you know which shell you're escaping for? You could query the system, but now you end up implementing escaping for every shell out there.
Good question. I care only about POSIX-compatible shells, so the escaping just follows the POSIX rules. In practice that means it works on any actually used system except Windows, which is fine with me.
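Python's standard library has a comparable escaping helper, which might sketch the idea (shlex.quote follows the POSIX single-quoting rules; the file names are just the example above):
import shlex
import subprocess

# shlex.quote wraps values using POSIX quoting rules, so the interpolated
# file name cannot break out of the generated command text.
file, output = "foo'bar", "x"
cmd = f"cat {shlex.quote(file)} > {shlex.quote(output)}"
subprocess.run(["sh", "-c", cmd])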
Miniature, in-line sh scripts are also fine as long as you use the provided parameter substitution.
If you’re averse to this:
q("select x where y = '" + v + "'")
And instead do this:
q("select x where y = %s", v)
Then you should be averse to this:
x("foo --option '" + v + "'")
And instead do this:
x('foo --option "$1"', v)
This is particularly useful when it's expedient to have one thing piping into another. Like it or not, the sh DSL for pipes is excellent compared to doing things natively with execve() and pipe(), just as doing group by and count is far more concise in SQL than doing so natively. Most SQL libraries give you something like q. Writing your own x is as simple as calling sh correctly. In Python, for example:
from subprocess import run

def x(script, *args):
    run(["sh", "-c", script, "--", *args])
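A hedged usage sketch, assuming the x() helper above (the log file name is made up): the pipeline text stays a constant and the value rides in as "$1":
# Counts duplicate lines in a file whose name is passed safely as $1.
x('sort -- "$1" | uniq -c | sort -rn', "access.log")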
Neither of those are equivalent to variable binding, which is what most SQL libraries provide, specifically because they don't actually solve the problem since they're still doing string substitution. Putting a double quote in $1 in your "good" execute example will allow you to break out of what's expected, and then you're Bobby Tables.
Your Python example at the bottom is correct, in that it allows each arg to be passed as a separate element, so there's no way to break out through quoting characters. SQL binds are like that in most libraries, even if they don't look like it. The parser knows it's a single item, so it passes it along as such. You cannot escape it in the same way.
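For reference, a minimal sketch of such a bind (Python's sqlite3 assumed; the table and the hostile value are made up):
import sqlite3

# The value is passed separately from the SQL text, so quotes inside it can
# never terminate the string literal in the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x, y)")
v = "Robert'); DROP TABLE t; --"
rows = conn.execute("SELECT x FROM t WHERE y = ?", (v,)).fetchall()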
I don’t really follow. My “good” example and the code at the bottom are the same.
sh is smarter than just doing string interpolation and "$1" is passed on as a single argument, no matter what:
> run(["sh", "-c", 'echo "$1"', "--", 'a"'])
a"
Whereas if it were simple string interpolation, you'd see this:
> run(["sh", "-c", 'echo "a""'])
--: 1: Syntax error: Unterminated quoted string
It's the same special casing that gets "$@" right.
That requires you to quote the param in the script string to ensure that params are grouped as expected. E.g.
# cat pvars.sh
#!/bin/bash
echo "\$1=$1"
echo "\$2=$2"
echo "\$3=$3"
# sh -c './pvars.sh "$1" $2 $3' -- 'a b' 'c d'
$1=a b
$2=c
$3=d
The whole point of passing in an array and using something like exec (or system(), if provided, as it handles the fork and wait for you) is that you avoid the overhead of the shell starting up and parsing the command line at all, and it lets you define each param exactly as needed, since each param is its own array item. You don't need to worry about splitting on space, or the shell splitting params on space, or quoting to group items. If you want the param to be: foo "bar baz" quux
as one singular parameter, you just make that the contents of that array item, since no parsing needs to be done at all. If you have an array of params and you're jumping through hoops to make sure they're interpreted correctly by the shell you call to execute a process, you're likely (depending on language and capabilities) wasting cycles and overcomplicating the code, when you could just call the program you actually want to execute directly and supply the params. Alternatively, if you have all the params as one long string and you want it parsed as a shell would parse it, then execute the shell and pass that as a param. E.g.
# perl -E 'system("./pvars.sh","a b","c d");'
$1=a b
$2=c d
$3=
# perl -E 'system("./pvars.sh","a b","c","d");'
$1=a b
$2=c
$3=d
Thanks for explaining. I feel like we’re talking past each other but it’s my mistake. I should have said it is only useful (not “particularly useful”) if one has compound statements like a pipe or multiple commands. Invoking sh just to run a single command is superfluous and you are right that reaching directly for execve() is better.
Ah, yes. If you want to take advantage of piping commands together through the shell as a subcommand of your program, then a way to make params behave more consistently regardless of content is useful.
For anything involving file paths, user input, etc. -- yes of course. It's not even a question because they would need to be escaped otherwise which nobody wants to do.
But for a simple example like this where it's inserting a date which has known properties, it seems fine, and is much more readable.
Tbf this input does not need escaping.
But at the very least the shell is unnecessary here.
Why not?
Any time you send commands and data down a single channel, user input that's intended to be data can be misinterpreted as a command. For example, if your program wants to:
run("program --option '{user_input}' > file")
to save some input to a file, and the user's input is: '; bad_command #
then when run() sends that string to the shell, the shell will run:
program --option ''; bad_command #' > file
Most languages have something like a safe_exec() that separates the shape of the command from the values of the options, executing "program" with the options and the user_input in the arguments array as data. Skipping the shell step, which would just be building an exec call anyway, removes the opportunity for users to confuse it into doing something else.
The list-based API alternative they recommend might look like this:
safe_exec(["program", "--option", user_input], stdout="file")
and it would always exec "program" with argv[1] == "--option" && argv[2] == user_input. If the user_input happens to be: '; bad_command #
...well, then, the user can enjoy the contents of their file.
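A minimal sketch of what that might look like in Python (subprocess assumed; "program" and the output file are placeholders from the example):
import subprocess

# argv is a list, so user_input stays a single argument, and stdout goes to
# the file via the OS rather than via shell redirection syntax.
user_input = "'; bad_command #"   # the hostile input from above
with open("file", "w") as out:    # placeholder output file
    subprocess.run(["program", "--option", user_input], stdout=out)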
safe_exec(["rm", user_input])
This isn't safe either! Despite clearly saying "safe_exec"!Yeah, "safe_exec" is a useless name without context. But the context was you need to call a program from another program. Many people would call system() or whatever because usually it's obvious and easy, and the pitfalls are less so.
Shelling out is not the only option. People are just saying not to use that option. Better ones won't save you if you purposely do something stupid. They will save you if the user wants to trick you into doing something else.
A nearby comment mentioned escaping. I guess that might be a good reason to use execv?
SQL injection on steroids.
Only if you are getting input from untrusted users
imo it's best to just avoid it altogether. Requirements change, and what was once a trusted input can become untrusted input.
Generally you shouldn't be passing random data from the web to shell scripts. Maybe I haven't done the right type of work, but from having to deal with fiddly bits, it's much more likely that not passing it to the shell will cause issues (with stuff like executable paths).
Or if your trusted users are fallible and could be tricked into providing unsafe inputs.
I prefer long options too. However, while writing programs that need to invoke POSIX commands in a portable manner, short options are the only viable choice, as POSIX doesn't specify long options. For instance, see the specification for diff at <https://pubs.opengroup.org/onlinepubs/9799919799/utilities/d...>, or that of any POSIX utility listed at <https://pubs.opengroup.org/onlinepubs/9799919799/idx/utiliti...>.
That said, this is more of a corner case. In most scenarios, rather than relying on POSIX utilities, there are often better alternatives, such as using library bindings instead of spawning external processes. For example, instead of invoking grep, using something like libpcre could be a more efficient choice.
For non-POSIX utilities like git, hg, rg, ag, etc., using long options makes perfect sense.
> However, while writing programs that need to invoke POSIX commands in a portable manner
...probably a stupid question, but something I have earnestly been wondering about... when does this actually happen nowadays? What POSIX systems are you targeting that aren't one of the major ones (Linux, Darwin, or one of the major BSDs)?
I was writing a shell script a few months ago that I wanted to be very durable, and I targeted sh instead of bash just because, well, it seemed like the correct hacker spirit thing to do... but I don't actually know what system in the past decade (or more) wouldn't have bash.
I ~recently had to wrestle w/ TeamCity (CICD) whose build agents provide only sh. I needed to invoke a 3rd-party util that required bash. The resulting "bash in dash in docker in docker" worked, but I wasn't thrilled about the convoluted / Frankenstein setup.
> ... but I don't actually know what system in the past decade (or more) wouldn't have bash.
There's some ambiguity about "have bash". If "having" bash means that (some version of) bash has been ported to the system, there are indeed very few. If "having" means that bash (supporting all options that you need) is available to the user, that could be a lot more. As others have noted, the BSDs, Android and many embedded Linux systems don't come with bash pre-installed, MacOS pre-installed bash is stuck at version 3.2 (which doesn't have associative arrays), and the user could be in an environment that does not allow them to install whatever they need.
Alpine docker images only come with dash instead of bash, which _may_ run your sh script, but test thoroughly. Or just install bash.
FWIW, Darwin/macOS is especially guilty of gobsmackingly ancient coreutils that don’t support long option variants.
Is it? I'm with you on gobsmackingly ancient, but it's "doesn't support long options" which I haven't bumped into. I do replace some coreutils, but not all of them.
What's a good example of such a utility?
For example, sed. macOS sed doesn't support long options. Not even --help or --version
(Running an older version of macOS so can't completely exclude this has been updated in a newer version, but I'd be surprised to learn that was true.)
macOS doesn't have GNU coreutils at all. It has the utils from FreeBSD.
The gobsmackingly ancient GNU software it does have is bash, because it's the last version under GPL 2. I've used Mac OS X since 10.1, so I remember when the default shell was tcsh and /bin/sh was not bash.
That's (basically) the case again on the last few macOS releases. Today, zsh is my shell of choice, including on Linux.
the alpine default shell is called "ash", "dash" is the debian/ubuntu default shell
Where I can sometimes get burnt is busybox.
I more often get burnt going from zsh to bash than by that, however.
> I don't actually know what system in the past decade (or more) wouldn't have bash.
I have written a bit more about it in these comments:
FreeBSD doesn’t come with bash though.
But it also has drawbacks :)
But to be honest - you can install Bash this way:
# pkg install -y bash
Major ones are enough. Linux and Darwin (that is, macOS and GNU userspace, really) differ sufficiently that you need to pay attention or limit yourself to POSIX. E.g. sed and wc burned me a few times with scripts that need to run on both.
> but I don't actually know what system in the past decade (or more) wouldn't have bash.
I think macOS still has bash, so it technically doesn't count, but it doesn't have a bash from the past decade, and it uses zsh by default.
Note that grep in particular is extremely optimized. If you have multi-gigabyte files, and you only search for one thing, shelling out to grep will likely have much better performance than doing it yourself.
But not every system needs that much, and in a lot of cases, using your language's regexp library will be more robust and easier to write.
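A minimal sketch of the in-language alternative (Python's re module; the file name is made up):
import re

# Roughly the in-process equivalent of grep --ignore-case --files-with-matches.
pattern = re.compile(r"hello", re.IGNORECASE)
with open("big.log") as f:
    found = any(pattern.search(line) for line in f)
print(found)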
Agree that long options should be used. But there is one caveat to consider: portability.
Sadly to this day not all BSD distributions have GNU style long options. And the ones that now do only got them fairly recently. So if you want portability you have to use short options as you weep with a bottle of vodka in hand.
Not trying to spam this thread with praises of nix, because it does have its own problems, but it certainly solves the portability problem.
Four years in to using it at work for dev environments across mac (x86 & ARM) and various linuxes and can’t imagine going back. I also always make dev environment definitions for my open source projects, so even if people aren’t using nix, there is at least a record of what tools they will need to install to run scripts, tests, etc.
Does nix work well on BSD-derived Unices? In particular, the most widespread of them, macOS?
Yes, works great on Mac. About half our engineers use Macs, the other half Linux. We have one nix configuration for the dev environment, which works for everyone.
This surprises me because the first case I remember ever coming across where short versus long options impacted portability across GNU and BSD was _fixed_ by using long options. Maybe six years ago or so I had an issue porting a script someone else had written for use in CI that happened to decode some base64 data that failed when I tried to use it on a different platform. I forget which one it was originally written for and which one I was trying to use it on, but the issue boiled down to the MacOS version of base64 using the BSD short option for decode and Linux using the GNU one, and they each used a different capitalization; one used `-d` and the other used `-D` (although I also can't remember which used which honestly). My solution was to use the long option `--decode`, which was the same on both of them, and since then the times I've needed to decode base64 I've always used the long option out of habit, which probably explains why I can't remember what option Linux uses despite it being the one I've used far more over the years since then.
I think the right way to think about this (if your goal is to avoid surprises at least) is that options (short or long) are just strings. There's no guarantee that there's a long variant of an option. There's not even a requirement that options start with a dash. A sufficiently brain-damaged developer could start them with a slash or something.
If you're going for portability the best bet is to just read the manual for each of the separate versions and do whatever works.
To this day, I write tar options with no dash, simply because I can. `tar cvzf foo.tar.gz ./foo`
I would never write a new program with this option, but I do find it a delightful historical oddity.
I've noticed that it seems to be a pattern that's used for other compression/decompression software as well. Sometimes mods I use for games will be uploaded as rars or 7zips (I guess because this stuff gets developed on and for Windows, and tarballs aren't really something people use much there), and the CLI invocations I use to extract them always look off to me, especially the 7zip one: `unrar x` and `7z x`.
That sounds reasonable to me. If anything, I might even go further and say that reading the manuals wouldn't be enough to fully convince me without also actually testing it by running a script on a given platform. It's not that I don't trust the manuals to be right, but I have less trust in myself to write bug-free code than probably any other language I've ever used, and I don't think I'd feel confident without verifying that I actually did what the manual said correctly.
Also, do not forget to use "--" after all options, but before any dynamic arguments, just to be safe.
I know to do this intuitively, but I have no idea why.
It terminates argument parsing, so anything following it that starts with a hyphen will not be treated as an argument.
$ echo 'hack the planet' > --help
$ cat --help
cat: illegal option -- -
usage: cat [-belnstuv] [file ...]
$ cat -- --help
hack the planet
$ rm -vf --help
rm: illegal option -- -
usage: rm [-f | -i] [-dIPRrvWx] file ...
unlink [--] file
$ rm -vf -- --help
--help
$ cat -- --help
cat: --help: No such file or directory
It tells the shell utility that any remaining arguments are not options, but instead files or whatever the script might process. You know, in case someone makes a file called -rf.
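This matters even when you build argv yourself, since the called program still parses leading dashes; a minimal Python sketch (the file name is deliberately nasty):
import subprocess

# Without "--", rm would read "-rf" as options; with it, it's just a file name.
filename = "-rf"
subprocess.run(["rm", "-v", "--", filename])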
But not all shell utilities follow this particular convention
Most of the commonly used ones do, so it's easiest to just always do it and then remember the two or three utils that don't like it.
Yes. It’s more of a convention. If a shell utility isn’t following it then it won’t mean what we think it would mean.
Also - it helps a lot when a utility accepts a path that may or may not contain hyphens.
Yes. The famous echo on Linux systems does not have it and therefore it's impossible to print the string "-n o p e", because -n will be interpreted as an option.
echo is not portable anyway, use "printf %s STRING" or "printf '%s\n' STRING".
Yes, that's what I use. Sometimes I still get tempted to use echo because there's less typing...
It does if single or double quotes are used, right? Which would be necessary (or preferred to multiple backslashes) quite often.
No, the quotes are not seen by the program. The program receives a list of strings, it does not get the information about whether and how those strings were originally quoted in the shell. Programs can also be directly called with lists of strings as in execve, so often it does not even make sense to ask if the arguments were quoted or not.
Quotes live on a different level of abstraction.
> No, the quotes are not seen by the program. The program receives a list of strings, it does not get the information about whether and how those strings were originally quoted in the shell.
With quotes the program will receive a single argument -n␣o␣p␣e instead of multiple ones -n, o, p, e. At least it works on the machine here:
]$ echo "-n o p e"
-n o p e
]$ /bin/echo "-n o p e"
-n o p e
Yes, I think there was some misremembering here. The nontrivial thing is to print out -n itself with echo. For example, echo doesn't treat "--" specially, so "echo -- -n" prints "-- -n".
Note that this is true for POSIX systems but not e.g. for Windows. There the program receives the command line as-is and is responsible for parsing it into an array. There are two different standard functions to do this parsing for you (with slightly different quoting behavior), but you could also create your own that requires options to not be quoted.
It's worth it just to watch the frustration of a junior when they try tacking more arguments on the end of a command.
A great opportunity to teach the importance of reading the whole command before trying to modify it.
Unfortunately, if you want your scripts to be portable to other POSIX systems you might have to use the short options, as the long ones are not standardized. You have to decide the tradeoff for yourself.
Using nix has really spoiled me on this. Everyone gets the same versions of all the CLI utilities in the dev environment, whether on mac or linux, and those are the same versions that run in CI and any prod systems. It’s really nice being able to use whichever newer bash features or gawk extensions you like, without having to deal with trying to ensuring the mac engineers have brew-installed all the right stuff to match a standard linux env.
nix didn't solve your issue here. nix didn't do anything. You're just describing the benefit of a reproducible development environment. You could do the same thing with brew, pacman, apt, or by just compiling every package from source from some huge mirror.
It's exactly the same thing people initially loved about docker or vagrant.
Sure, but it works on Mac and Linux and doesn’t require virtualization. I think brew might qualify, but it can’t define which environment variables should be available in the developer shell or which hooks to run upon entry.
I don’t think any of the other options you specified can manage the same thing.
Also you can use nix-shell as the shebang, and then pass a second line to tell it what program to use to interpret the rest of the script and all of its dependencies. It'll fetch the shell utilities in the versions you want when the script is run.
>[...] it can’t define which environment variables should be available in the developer shell or which hooks to run upon entry.
Neither pacman, apt, nor any other package manager requires any sort of virtualization. pacman works fine wherever you have a C compiler, including macOS, Linux, Windows, probably even TempleOS. Whatever you want.
If you want to add something to the user environment system wide, the traditional thing to do is to dump a file into `/etc/profile.d/` which will be sourced during shell startup. If you instead want something local to the project, you just make a script that the developer can source, like a python virtualenvironment.
I'm not saying any of these ideas are bad. I am saying that they are easily solvable and have been solved for the past 20 years. Without Nix.
Everyone has to use nix :)
But yes, that is nice.
That is the caveat. I initially set it up such that it wasn’t required: you could choose to use it if you wanted to, and otherwise here is a list of specific versions things you must install, etc. Everyone ultimately chose to use nix, and now it’s required. Makes for a pretty easy setup though for new devs: install nix, then run `nix develop`, then `make setup`, and you’re off to the races.
What POSIX systems in actual use (not historical Unixes) don't have the long options? macOS' BSD utilities I guess?
> What POSIX systems in actual use (not historical Unixes) don't have the long options?
All of them except for GNU, AFAICT? (That is, only GNU seems to have long options.) Checking manpages for rm(1) as a simple reference, I can't see long options in any of the 3 major BSDs or illumos, and checking Alpine Linux seems to show busybox also only doing short options (sorry, can't find an online doc for this, though it's easy to check in docker if you don't have a machine running Alpine handy). OpenWRT also uses busybox and has the same (lack of) options.
https://man.freebsd.org/cgi/man.cgi?query=rm&apropos=0&sekti...
More than this, the gnu utilities often have options that don't exist at all on other platforms, in either long or short form.
You can also brew install tools like gnused which have the same arguments. Not a viable option for all situations but if you just need to execute it on Linux and your local machine for dev you can use those.
I agree with this practice. Another benefit is it makes it easier (slightly, but still) to grep the man page for what the options do.
The corollary must be "write programs that take long options".
And put them on separate lines so you can track and git blame them more easily.
Same line git blame is not that hard, just list commits affecting specific file or even specific line span: https://git-scm.com/docs/git-log#Documentation/git-log.txt--...
Before invoking a command, always first check if the length of the command is not longer than ARG_MAX. For example, if this is your command:
grep --ignore-case --files-with-matches -- "hello" *.c
Then invoke it as follows:
CMD="grep --ignore-case --files-with-matches -- \"hello\" *.c"
ARG_MAX=$(getconf ARG_MAX)
CMD_LEN=${#CMD}
if (( CMD_LEN > ARG_MAX )); then
  echo "Error: Command length ($CMD_LEN) exceeds ARG_MAX ($ARG_MAX)." >&2
  exit 1
fi
eval "$CMD" # warning, evaluates filenames
That might be sensible, but it also obscures the script logic.
Since using Linux exclusively, I don't think I've ever encountered an issue due to too many arguments or an overly long command line. And it's the first time I'm actively searching online for ARG_MAX.
I understand that different shells might be different, but with reasonable lengths, is there any chance of it being relevant (aside from xargs, where it's generally intended, or better, to pass along each argument individually)?
I started running into these issues when I started working with training examples in the context of deep learning. A folder with millions of files is then not unheard of.
Also if you do things like */*/*, then you can quickly get large command lines. Or even if you do long_name/another_long_name/*.
I've had problems with "cat *.csv" plenty of times processing data that is generated in many small files.
It is really difficult to deal with, because on top of the arg max limit, globs are not guaranteed to be in order.
The solution is not obvious, hard to get to if you don't know the footguns in advance, and hard to read once implemented.
You should always type check your shell scripts as well. For example, you just:
$ shelltypes script.sh
# Welcome to shelltypes v 3.23.2
# type ‘help’ if you’re stuck
>>> {1) # import POSIX;;
Importing 73 items.
>>> {2} # append loadpath “/opt/local/shelltypes/base”;;
>>> {3} # import base::YOURPROJECT;;
Importing 15 items.
>>> {4} # check “YOURSCRIPT.sh”
Parsing YOURSCRIPT.sh.
Reticulating splines.
Expanding aliases.
Analyzing free shell environment variables.
Found inconsistencies in PATH.
Warning: Low battery!!!
Warning: found free type for ‘shred’, ignoring.
Warning: use of sudo requires password under /etc/sudoers.
Warning: this utility is fake.
Error: use of cat impossible in the presence of mutt.
Found 15 errors.
Try again. Goodbye.
$
Then you can be pretty sure your script isn't going to do unnecessary harm, and has some proper guardrails in place.
Where does "shelltypes" come from? I can't find anything on DuckDuckGo or Google, but this seems like it would be very useful.
From the output in the post I'm going to assume it's either a joke post or an LLM hallucination.
Oh god, it even says right there:
> Warning: this utility is fake.
Well played! I guess I got too excited at the possibility of such a tool existing.
The thought is very compelling! shellcheck is pretty great, but of course isn’t this complete. It definitely can’t reticulate splines, for instance.
Such a tool does exist; it's called shellcheck. If you give a /bin/sh shebang, for example, it will tell you off for using non-POSIX features regardless of whether they'll work with the sh on your system.
(Personally I typically use bash though, largely so I can `set -eEuo pipefail` next.)
I don't know shelltypes, and it sounds like a linter for shell scripts.
Does shelltypes warn against a failure to check for ARG_MAX?
> eval "$CMD"
That means you will eval all the filenames, so if you have a file with spaces in it, it will appear as two files; if there is a `$` in the name, it will trigger parameter substitution, and so on for the other shell metacharacters.
Yes, that could be true. I'm not great in Bash. Be careful. These types of error are why I don't use Bash. I just wanted to give an example in a commonly used scripting language. The main point here is to check ARG_MAX.
If you are doing something where going over ARG_MAX is a real possibility, it would be better to write your scripts to avoid the problem altogether rather than awkwardly trying to detect it (with a bonus exploit). For example, many commands can accept a list of files on standard input or from a list file.
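A minimal sketch of that kind of rewrite (Python plus xargs assumed; the pattern and glob are from the example above): stream the file names on stdin so no single argv list ever has to hold them all:
import glob
import subprocess

# xargs batches the NUL-separated names into grep invocations that each stay
# under ARG_MAX; the pattern sits safely after "--".
files = glob.glob("*.c")
subprocess.run(
    ["xargs", "-0", "grep", "--ignore-case", "--files-with-matches", "--", "hello"],
    input="\0".join(files).encode(),
)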
On my system, `getconf ARG_MAX` is over 2m.
I have seen some heinously long cmdline strings, but nothing close to that. Usually when invocations have crept up into the O(1k) character range at places I've worked, I've implemented a "yaml as cmdline args" option to just pass a config file instead.
Have you seen scenarios where this is actually limiting?
Yes, if *.c expands to a string over 2m. Maybe that is a lot for .c files, but it may easily happen with .tiff and a folder full of images used for training a deep learning model, for example.
Thanks, this is interesting. I have done a lot of this sorta stuff (glob expanding giant directories) without a thought for this `ARG_MAX` libc parameter, but now I know I need to keep it in mind!
> Before invoking a command, always first check if the length of the command is not longer than ARG_MAX.
Tell that to Google and Mozilla. /s
> Long form options are much more self-explanatory for the reader.
And less prone to typos
And not just options but base command names too. I wrote a tool to partially mitigate this in some cases: https://github.com/makesourcenotcode/name-safe-in-bash
This is one of my default rules for writing scripts. If the long option is available, use it. It makes too much sense to do so.
The hype side of this is to always add instructions to use long options to the agent base prompt. It's much easier to spot mistakes, and long options have the advantage of not doing something completely different if they are wrong.
Reminds me of our code having LLM-generated regular expressions which are impossible to understand, where the only way you can tweak them is giving them back to the LLM to change.
Folks, remember to record your LLM prompt in a comment so that your regex can be validated.
What if I or someone wrote a bot/script that searches across GitHub for every shell script file that it can find, converts all short options into long options, and opens a PR? Think dependabot, but let's call it longabot or readabot.
I don't think you should do that unprompted. There are reasons for using short options, like the portability mentioned in other comments. It'd put an undue burden on open source maintainers.
Something opt-in like dependabot though could be useful.
Then you will get to feel the unreasonable hate of lots of lazy people who care more about typing a few characters less than about their stuff being readable, and from all the Apple users.
Detracting from the message, but what is the `try shell.exec(" ...` thing?
Prepares to launch in flaming rant... sigh You're right.
If your goal is to help your coworkers, this is correct.
If not, it isn't.
and while you're at it, please alias --help to -h :-)
Some CLIs are even worse at this: when you try -h or --help they tell you that no, you must run 'cli help' instead of just showing the help already.
not only long options, also multiple lines:
somecmd \
--option1 \
--option2 $FOO \
--option3 $BAR
It depends...
[flagged]
What is this? Use libgit2, or use a proper language where you do not need exec(), do it in Bash, or do it in C with libgit2 or with popen() or something. Using "system()" in C is never the right thing to do; the least you can do is use popen().
Not sure in which language you use "try shell.exec", but I am not even sure that it is the right way.
I used to think this, but now mostly (but weakly) don't. Long options buy expressiveness at the cost of density, i.e., they tend to turn "one-liners" into "N-liners". One-liners can be cryptic, but N-liners reduce how much program fits on the screen at once. I personally find it easier to look up flags than to have to page through multiple screenfuls to make sense of something. In this respect, ISTM short options are a /different/ way of helping a subsequent reader, by increasing the odds they see the forest, not just the trees.
Strongly, strongly disagree. This is a GNU-ism, and needlessly verbose. What's with people refusing to use the vast amount of memory in their brains and actually learning, instead lazily going for the lowest-common-denominator approach relying on "loanwords" from English?
Because not everyone has 20 or more years of flag memory from using Linux, and they’ve also got to maintain the scripts. If you’re not familiar with every arcane invocation of find or tar or whatever, or even if it’s just been a while, the long options are a godsend when you’re skimming a script trying to figure out what it’s doing.
No, the problem is overloading utilities with hundreds of options and subcommand languages over the course of decades. The starting point was quick command-line usage, but GNU-style long options are clearly rooted in Elisp/Emacs hyphenated-words verbosity. You can tell it wasn't meant to be like this when typing somecmd -h results in multiple pages of option output where you can't see the forest for the trees, and the command's manpage similarly loses its usefulness due to sheer size, frequently not even containing an EXAMPLES section. If it has a manpage in the first place, rather than linking to a nonexistent texinfo page, as was the practice for the longest time. All of which makes you none the wiser, so you go to the web for example usage.
You can always copy+paste it into an LLM and have it explain the options in 5 seconds.
I like how programmers are all about locality of reasoning and avoiding context switches until the context switch is “go paste it into an LLM”
I just use the man page. Which happily describes all the options in an easily digestible format that does not require either an internet connection or massive amounts of power to be wasted on a result that is almost certainly copyrighted and flat-out stolen from its original creator.
I am of a generation where I see these uses of LLMs as exceptionally lazy or showy. Your ready use of a "chatbot" is not an attribute to broadcast like this.
> Because not everyone has 20 or more years of flag memory from using Linux
It's called learning, apparently something that is now being eschewed in favour of quick superficiality (and making developers more replaceable --- with AI or unskilled offshore labour.) No wonder "modern" software is almost always total shit.
"But I don't have the time to learn," you complain, but ask yourself this instead: Why do you never have the time to do it right, but always the time to do it twice (or however many times is necessary to get a barely-working product)?
I’ve got almost 30 years of using UNIX shells and I’ve learnt a lot more than most, but there are plenty of tools that have things like -r and -R that do radically different things and I don’t always remember the difference between them. If it’s written --recursive then it’s a lot clearer. Clarity is useful even when you have a lot of knowledge and experience.
As someone who works on a team of people with different levels of experience, where understanding is more important than some self-inflicted purity test of their unix chops, I will always either do long options or, where that’s not possible, a comment explaining the short options.
No one is complaining about not having time to learn: you’re arguing against a strawman. The point is not to discourage learning, it’s to encourage clarity in a context where it can at times be critically important.
Learning that this tool can be used to do these things, manipulate these objects or streams of data like this, and that this tool is useful in combination with these other tools: that is the valuable part. Memorizing some arbitrary encoding of the parameters is not really useful when I can offload it to the language center in my brain and basically get that part for free...
Why do you conflate learning with memorizing? man <command> is easy to do. And you take notes, write script, functions and alias for things you do often.
I’m nostalgic about old Unix too, but it’s just that: nostalgia. In reality, Unix was the system that brought us such elegant well-designed APIs as gets(3) and strtok(3).
Learning is one thing and being able to comprehend a piece of code at a glance is another. I might know them, but I wouldn't immediately remember all the past and future short forms of flags ever written. What is even the point of that? If I am comfortable with the essence of what they do, does it really matter that much whether I have it ingrained in my memory that -r does foo while -f does bar?
You're confusing learning with memorization.
Exactly that. Well said. I can control my learning but memory? Not as much.
[flagged]
[flagged]
Nobody here was advocating against learning and memorising, but for writing code that is as self-evident and unambiguous as possible.
Talk to any electrician about which kinds of diagrams they prefer: those where a fellow was clever, trying to compress stuff as much as possible, or those that are clearly readable and easy to understand. Pilots carry heaps of documentation with them on the plane and walk through checklists instead of trying to memorise what to do. Your argument doesn't hold up to the slightest scrutiny.
I’m advocating for a culture of helping people hone their skills by learning from others instead of being turned off by their arrogance. That’s a very different thing from encouraging them to avoid thinking.
Programs are read more often than they're written, and on a scale of decades all programmers are novices. So I try to optimize my code to be obvious to novices
What's wrong with GNU? Are they automatically wrong or something?
> So I try to optimize my code to be obvious to novices
That's the attitude which is responsible for making software the way it is today: mediocre.
Software is mediocre today? You should have seen it back in the day. It was a brittle, arcane mess.
Software isn't as exciting any more, but that excitement isn't gone because newbs don't have to look up what -O stands for anymore. It's because it's wildly more reliable.
I low-key agree on this, because I think it doesn't matter and you have to look up the meaning anyway. English-like options may make you think you already know the details, but usually you don't. If there's a need to develop in sh/bash/etc., then a modern solution would be to make something like a language server for it which could "go to definition" right into a man page section, or combine docs for all used options in a single popup/aux window. Of course it's not possible for every man page, since some of these are written in too free a form. IOW, it's shell; whatever you program here in more than two lines you'd better program in Python instead (and I'm saying this as a low-key Python hater).