Use Long Options in Scripts
matklad.github.io | 297 points by OptionOfT a month ago
Please DO NOT mix string interpolation and command execution, especially when a command is processed through the shell. Whatever your language, use a list-based or array-based execution API that passes arguments straight through to execv(2), execvp(2), etc, bypassing the shell.
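For example, in Python a list-based call might look like this (a minimal sketch; subprocess is assumed and the branch name is made up):
import subprocess

# The argument list goes straight to exec; no shell is involved, so the
# interpolated value is always exactly one argv element.
today = "2024-05-01"  # example value; could just as well contain spaces or quotes
subprocess.run(
    ["git", "switch", "--create", f"release-{today}", "origin/main"],
    check=True,
)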
Was waiting for this comment :P
The API used handles string interpolation correctly: the string literal is parsed at compile time, the interpolated arguments are never concatenated or escaped, and they end up directly as elements of the argv array passed to the child. See
https://github.com/tigerbeetle/tigerbeetle/blob/7053ecd2137a...
This approach creates an odd mini language, which is incomplete:
comptime assert(std.mem.indexOfScalar(u8, cmd, '\'') == null); // Quoting isn't supported yet.
comptime assert(std.mem.indexOfScalar(u8, cmd, '"') == null);
But you can do correct interpolation with simple shell variables, rather than generating shell code strings:
$ today='foo; bar' sh -c 'argv git switch --create "release-$today" origin/main'
['git', 'switch', '--create', 'release-foo; bar', 'origin/main']
So that is a test that we can use a plain shell string, without any shell injection bug. (argv is my command to test quoting: python -c 'import sys; print(sys.argv)' "$@" ) Note that there's no escaping function needed, because we're not generating any shell code. We're generating an argv array for `/bin/sh` instead.
---
So by invoking with an env var, you can easily create a correct API that uses plain shell
git switch --create "release-$today"
rather than git switch --create release-{today} # what language is this? It's not obvious
If you don't want to use the env var, you can also use git switch --create "release-$1"
And invoke with ['sh', '-c', shell_string, 'unused-arg0', today_string]
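A minimal Python sketch of that invocation (subprocess assumed; the value is deliberately hostile to show that nothing breaks):
import subprocess

# The sh script text is a constant; the value arrives as the positional
# parameter "$1", so nothing in it can be parsed as shell code.
today = "foo; bar"
subprocess.run(
    ["sh", "-c", 'git switch --create "release-$1" origin/main', "unused-arg0", today],
    check=True,
)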
With this approach, you don't need 1. any kind of shell escaping
2. any analyzing of pseudo-shell strings, which can't contain quotes
Because you are not generating any shell code. The shell code is constant.
Why would they even change the language and the commands in the example? It confuses and undermines the point. Just say "use `git switch -c my-new-branch` for interactive usage and `git switch --create my-new-branch` in scripts". It makes no sense to introduce other unexplained information.
Another approach is to have powerful enough language that allows you to guard against the shell injection. I wrote a syntax form allowing to do this:
(sh "cat " file " >" output)
With file being bound to "foo'bar" and output to "x", it is automatically translated into cat 'foo'\''bar' >'x'
This gives you the flexibility to use shell (sometimes it just is the most concise way) while being safe against injection. I believe, for example, in Rust you should be able to do the same.
How do you know which shell you're escaping for? You could query the system, but now you end up implementing escaping for every shell out there.
Good question. I care only about POSIX-compatible shells, so the escaping just follows the POSIX rules. In practice that means it works on any actually used system except Windows, which is fine with me.
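Python's standard library has a comparable escaping helper, which might sketch the idea (shlex.quote follows the POSIX single-quoting rules; the file names are just the example above):
import shlex
import subprocess

# shlex.quote wraps values using POSIX quoting rules, so the interpolated
# file name cannot break out of the generated command text.
file, output = "foo'bar", "x"
cmd = f"cat {shlex.quote(file)} > {shlex.quote(output)}"
subprocess.run(["sh", "-c", cmd])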
Miniature, in-line sh scripts are also fine as long as you use the provided parameter substitution.
If you’re averse to this:
q("select x where y = '" + v + "'")
And instead do this:
q("select x where y = %s", v)
Then you should be averse to this:
x("foo --option '" + v + "'")
And instead do this:
x('foo --option "$1"', v)
This is particularly useful when it's expedient to have one thing piping into another. Like it or not, the sh DSL for pipes is excellent compared to doing things natively with execve() and pipe(), just as doing group by and count is far more concise in SQL than doing so natively. Most SQL libraries give you something like q. Writing your own x is as simple as calling sh correctly. In Python, for example:
from subprocess import run

def x(script, *args):
    run(["sh", "-c", script, "--", *args])
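A hedged usage sketch, assuming the x() helper above (the log file name is made up): the pipeline text stays a constant and the value rides in as "$1":
# Counts duplicate lines in a file whose name is passed safely as $1.
x('sort -- "$1" | uniq -c | sort -rn', "access.log")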
Neither of those are equivalent to variable binding, which is what most SQL libraries provide, specifically because they don't actually solve the problem since they're still doing string substitution. Putting a double quote in $1 in your "good" execute example will allow you to break out of what's expected, and then you're Bobby Tables.
Your Python example at the bottom is correct, in that it allows each arg to be passed as a separate element, so there's no way to break out through quoting characters. SQL binds are like that in most libraries, even if they don't look like it. The parser knows it's a single item, so it passes it along as such. You cannot escape it in the same way.
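For reference, a minimal sketch of such a bind (Python's sqlite3 assumed; the table and the hostile value are made up):
import sqlite3

# The value is passed separately from the SQL text, so quotes inside it can
# never terminate the string literal in the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x, y)")
v = "Robert'); DROP TABLE t; --"
rows = conn.execute("SELECT x FROM t WHERE y = ?", (v,)).fetchall()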
I don’t really follow. My “good” example and the code at the bottom are the same.
sh is smarter than just doing string interpolation and "$1" is passed on as a single argument, no matter what:
> run(["sh", "-c", 'echo "$1"', "--", 'a"'])
a"
Whereas if it were simple string interpolation, you'd see this:
> run(["sh", "-c", 'echo "a""'])
--: 1: Syntax error: Unterminated quoted string
It's the same special casing that gets "$@" right.
That requires you to quote the param in the script string to ensure that params are grouped as expected. E.g.
# cat pvars.sh
#!/bin/bash
echo "\$1=$1"
echo "\$2=$2"
echo "\$3=$3"
# sh -c './pvars.sh "$1" $2 $3' -- 'a b' 'c d'
$1=a b
$2=c
$3=d
The whole point of passing in an array and using something like exec (or system(), if provided, as it handles the fork and wait for you) is that you avoid the overhead of the shell starting up and parsing the command line at all, and it lets you define each param exactly as needed, since each param is its own array item. You don't need to worry about splitting on space, or the shell splitting params on space, or quoting to group items. If you want the param to be: foo "bar baz" quux
as one singular parameter, you just make that the contents of that array item, since no parsing needs to be done at all. If you have an array of params and you're jumping through hoops to make sure they're interpreted correctly by the shell you call to execute a process, you're likely (depending on language and capabilities) wasting cycles and overcomplicating the code, when you could just call the program you actually want to execute directly and supply the params. Alternatively, if you have all the params as one long string and you want it parsed as a shell would parse it, then execute the shell and pass that as a param. E.g.
# perl -E 'system("./pvars.sh","a b","c d");'
$1=a b
$2=c d
$3=
# perl -E 'system("./pvars.sh","a b","c","d");'
$1=a b
$2=c
$3=d
Thanks for explaining. I feel like we’re talking past each other but it’s my mistake. I should have said it is only useful (not “particularly useful”) if one has compound statements like a pipe or multiple commands. Invoking sh just to run a single command is superfluous and you are right that reaching directly for execve() is better.
Ah, yes. If you want to take advantage of piping commands together through the shell as a subcommand of your program, then a way to make params behave more consistently regardless of content is useful.
For anything involving file paths, user input, etc. -- yes of course. It's not even a question because they would need to be escaped otherwise which nobody wants to do.
But for a simple example like this where it's inserting a date which has known properties, it seems fine, and is much more readable.
Tbf this input does not need escaping.
But at the very least the shell is unnecessary here.
Why not?
Any time you send commands and data down a single channel, user input that's intended to be data can be misinterpreted as a command. For example, if your program wants to:
run("program --option '{user_input}' > file")
to save some input to a file, and the user's input is: '; bad_command #
then when run() sends that string to the shell, the shell will run:
program --option ''; bad_command #' > file
Most languages have something like a safe_exec() that separates the shape of the command from the values of the options, executing "program" with the options and the user_input in the arguments array as data. Skipping the shell step, which would just be building an exec call anyway, removes the opportunity for users to confuse it into doing something else.
The list-based API alternative they recommend might look like this:
safe_exec(["program", "--option", user_input], stdout="file")
and it would always exec "program" with argv[1] == "--option" && argv[2] == user_input. If the user_input happens to be: '; bad_command #
...well, then, the user can enjoy the contents of their file.
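A minimal sketch of what that might look like in Python (subprocess assumed; "program" and the output file are placeholders from the example):
import subprocess

# argv is a list, so user_input stays a single argument, and stdout goes to
# the file via the OS rather than via shell redirection syntax.
user_input = "'; bad_command #"   # the hostile input from above
with open("file", "w") as out:    # placeholder output file
    subprocess.run(["program", "--option", user_input], stdout=out)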
safe_exec(["rm", user_input])
This isn't safe either! Despite clearly saying "safe_exec"!Yeah, "safe_exec" is a useless name without context. But the context was you need to call a program from another program. Many people would call system() or whatever because usually it's obvious and easy, and the pitfalls are less so.
Shelling out is not the only option. People are just saying not to use that option. Better ones won't save you if you purposely do something stupid. They will save you if the user wants to trick you into doing something else.
A nearby comment mentioned escaping. I guess that might be a good reason to use execv?
SQL injection on steroids.
Only if you are getting input from untrusted users
imo it's best to just avoid it altogether. Requirements change, and what was once a trusted input can become untrusted input.
Generally you shouldn't be passing random data from the web to shell scripts. Maybe I haven't done the right type of work, but from having to deal with fiddly bits, it's much more likely that not passing it to the shell will cause issues (with stuff like executable paths).
Or if your trusted users are fallible and could be tricked into providing unsafe inputs.
I prefer long options too. However, while writing programs that need to invoke POSIX commands in a portable manner, short options are the only viable choice, as POSIX doesn't specify long options. For instance, see the specification for diff at <https://pubs.opengroup.org/onlinepubs/9799919799/utilities/d...>, or that of any POSIX utility listed at <https://pubs.opengroup.org/onlinepubs/9799919799/idx/utiliti...>.
That said, this is more of a corner case. In most scenarios, rather than relying on POSIX utilities, there are often better alternatives, such as using library bindings instead of spawning external processes. For example, instead of invoking grep, using something like libpcre could be a more efficient choice.
For non-POSIX utilities like git, hg, rg, ag, etc., using long options makes perfect sense.
> However, while writing programs that need to invoke POSIX commands in a portable manner
...probably a stupid question, but something I have earnestly been wondering about... when does this actually happen nowadays? What POSIX systems are you targeting that aren't one of the major ones (Linux, Darwin, or one of the major BSDs)?
I was writing a shell script a few months ago that I wanted to be very durable, and I targeted sh instead of bash just because, well, it seemed like the correct hacker spirit thing to do... but I don't actually know what system in the past decade (or more) wouldn't have bash.
I ~recently had to wrestle w/ TeamCity (CICD) whose build agents provide only sh. I needed to invoke a 3rd-party util that required bash. The resulting "bash in dash in docker in docker" worked, but I wasn't thrilled about the convoluted / Frankenstein setup.
> ... but I don't actually know what system in the past decade (or more) wouldn't have bash.
There's some ambiguity about "have bash". If "having" bash means that (some version of) bash has been ported to the system, there are indeed very few. If "having" means that bash (supporting all options that you need) is available to the user, that could be a lot more. As others have noted, the BSDs, Android and many embedded Linux systems don't come with bash pre-installed, MacOS pre-installed bash is stuck at version 3.2 (which doesn't have associative arrays), and the user could be in an environment that does not allow them to install whatever they need.
Alpine docker images only come with dash instead of bash, which _may_ run your sh script, but test thoroughly. Or just install bash.
FWIW, Darwin/macOS is especially guilty of gobsmackingly ancient coreutils that don’t support long option variants.
Is it? I'm with you on gobsmackingly ancient, but it's "doesn't support long options" which I haven't bumped into. I do replace some coreutils, but not all of them.
What's a good example of such a utility?
For example, sed. macOS sed doesn't support long options. Not even --help or --version
(Running an older version of macOS so can't completely exclude this has been updated in a newer version, but I'd be surprised to learn that was true.)
macOS doesn't have GNU coreutils at all. It has the utils from FreeBSD.
The gobsmackingly ancient GNU software it does have is bash, because it's the last version under GPL 2. I've used Mac OS X since 10.1, so I remember when the default shell was tcsh and /bin/sh was not bash.
That's (basically) the case again on the last few macOS releases. Today, zsh is my shell of choice, including on Linux.
the alpine default shell is called "ash", "dash" is the debian/ubuntu default shell
Where I can sometimes get burnt is busybox.
I more often get burnt going from zsh to bash than by that, however.
> I don't actually know what system in the past decade (or more) wouldn't have bash.
I have written a bit more about it in these comments:
FreeBSD doesn’t come with bash though.
But it also has drawbacks :)
But to be honest - you can install Bash this way:
# pkg install -y bash
Major ones are enough. Linux and Darwin (that is, macOS and GNU userspace, really) differ sufficiently that you need to pay attention or limit yourself to POSIX. E.g. sed and wc burned me a few times with scripts that need to run on both.
> but I don't actually know what system in the past decade (or more) wouldn't have bash.
I think macOS still has bash, so it technically doesn't count, but it doesn't have a bash from the past decade, and it uses zsh by default.
Note that grep in particular is extremely optimized. If you have multi-gigabyte files, and you only search for one thing, shelling out to grep will likely have much better performance than doing it yourself.
But not every system needs that much, and in a lot of cases, using your language's regexp library will be more robust and easier to write.
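A minimal sketch of the in-language alternative (Python's re module; the file name is made up):
import re

# Roughly the in-process equivalent of grep --ignore-case --files-with-matches.
pattern = re.compile(r"hello", re.IGNORECASE)
with open("big.log") as f:
    found = any(pattern.search(line) for line in f)
print(found)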
Agree that long options should be used. But there is one caveat to consider: portability.
Sadly to this day not all BSD distributions have GNU style long options. And the ones that now do only got them fairly recently. So if you want portability you have to use short options as you weep with a bottle of vodka in hand.
Not trying to spam this thread with praises of nix, because it does have its own problems, but it certainly solves the portability problem.
Four years in to using it at work for dev environments across mac (x86 & ARM) and various linuxes and can’t imagine going back. I also always make dev environment definitions for my open source projects, so even if people aren’t using nix, there is at least a record of what tools they will need to install to run scripts, tests, etc.
Does nix work well on BSD-derived Unices? In particular, the most widespread of them, macOS?
Yes, works great on Mac. About half our engineers use Macs, the other half Linux. We have one nix configuration for the dev environment, which works for everyone.
This surprises me because the first case I remember ever coming across where short versus long options impacted portability across GNU and BSD was _fixed_ by using long options. Maybe six years ago or so I had an issue porting a script someone else had written for use in CI that happened to decode some base64 data that failed when I tried to use it on a different platform. I forget which one it was originally written for and which one I was trying to use it on, but the issue boiled down to the MacOS version of base64 using the BSD short option for decode and Linux using the GNU one, and they each used a different capitalization; one used `-d` and the other used `-D` (although I also can't remember which used which honestly). My solution was to use the long option `--decode`, which was the same on both of them, and since then the times I've needed to decode base64 I've always used the long option out of habit, which probably explains why I can't remember what option Linux uses despite it being the one I've used far more over the years since then.
I think the right way to think about this (if your goal is to avoid surprises at least) is that options (short or long) are just strings. There's no guarantee that there's a long variant of an option. There's not even a requirement that options start with a dash. A sufficiently brain-damaged developer could start them with a slash or something.
If you're going for portability the best bet is to just read the manual for each of the separate versions and do whatever works.
To this day, I write tar options with no dash, simply because I can. `tar cvzf foo.tar.gz ./foo`
I would never write a new program with this option, but I do find it a delightful historical oddity.
I've noticed that it seems to be a pattern that's used for other compression/decompression software as well. Sometimes mods I use for games will be uploaded as rars or 7zips (I guess because this stuff gets developed on and for Windows, and tarballs aren't really something people use much there), and the CLI invocations I use to extract them always look off to me, especially the 7zip one: `unrar x` and `7z x`.
That sounds reasonable to me. If anything, I might even go further and say that reading the manuals wouldn't be enough to fully convince me without also actually testing it by running a script on a given platform. It's not that I don't trust the manuals to be right, but I have less trust in myself to write bug-free code than probably any other language I've ever used, and I don't think I'd feel confident without verifying that I actually did what the manual said correctly.
Also, do not forget to use "--" after all options, but before any dynamic arguments, just to be safe.
I know to do this intuitively, but I have no idea why.
It terminates argument parsing, so anything following it that starts with a hyphen will not be treated as an argument.
$ echo 'hack the planet' > --help
$ cat --help
cat: illegal option -- -
usage: cat [-belnstuv] [file ...]
$ cat -- --help
hack the planet
$ rm -vf --help
rm: illegal option -- -
usage: rm [-f | -i] [-dIPRrvWx] file ...
unlink [--] file
$ rm -vf -- --help
--help
$ cat -- --help
cat: --help: No such file or directory
It tells the shell utility that any remaining arguments are not options, but instead files or whatever the script might process. You know, in case someone makes a file called -rf.
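This matters even when you build argv yourself, since the called program still parses leading dashes; a minimal Python sketch (the file name is deliberately nasty):
import subprocess

# Without "--", rm would read "-rf" as options; with it, it's just a file name.
filename = "-rf"
subprocess.run(["rm", "-v", "--", filename])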
But not all shell utilities follow this particular convention
Most of the commonly used ones do, so it's easiest to just always do it and then remember the two or three utils that don't like it.
Yes. It’s more of a convention. If a shell utility isn’t following it then it won’t mean what we think it would mean.
Also - it helps a lot when a utility accepts a path that may or may not contain hyphens.
Yes. The famous echo on Linux systems does not have it and therefore it's impossible to print the string "-n o p e", because -n will be interpreted as an option.
echo is not portable anyway, use "printf %s STRING" or "printf '%s\n' STRING".
Yes, that's what I use. Sometimes I still get tempted to use echo because there's less typing...
It does if single or double quotes are used, right? Which would be necessary (or preferred to multiple backslashes) quite often.
No, the quotes are not seen by the program. The program receives a list of strings, it does not get the information about whether and how those strings were originally quoted in the shell. Programs can also be directly called with lists of strings as in execve, so often it does not even make sense to ask if the arguments were quoted or not.
Quotes live on a different level of abstraction.
> No, the quotes are not seen by the program. The program receives a list of strings, it does not get the information about whether and how those strings were originally quoted in the shell.
With quotes the program will receive a single argument -n␣o␣p␣e instead of multiple ones -n, o, p, e. At least it works on the machine here:
]$ echo "-n o p e"
-n o p e
]$ /bin/echo "-n o p e"
-n o p e
Yes, I think there was some misremembering here. The nontrivial thing is to print out -n itself with echo. For example, echo doesn't treat "--" specially, so "echo -- -n" prints "-- -n".
Note that this is true for POSIX systems but not e.g. for Windows. There the program receives the command line as-is and is responsible for parsing it into an array. There are two different standard functions to do this parsing for you (with slightly different quoting behavior), but you could also create your own that requires options to not be quoted.
It's worth it just to watch the frustration of a junior when they try tacking more arguments on the end of a command.
A great opportunity to teach the importance of reading the whole command before trying to modify it.
Unfortunately, if you want your scripts to be portable to other POSIX systems you might have to use the short options, as the long ones are not standardized. You have to decide the tradeoff for yourself.
Using nix has really spoiled me on this. Everyone gets the same versions of all the CLI utilities in the dev environment, whether on mac or linux, and those are the same versions that run in CI and any prod systems. It’s really nice being able to use whichever newer bash features or gawk extensions you like, without having to deal with trying to ensuring the mac engineers have brew-installed all the right stuff to match a standard linux env.
nix didn't solve your issue here. nix didn't do anything. You're just describing the benefit of a reproducible development environment. You could do the same thing with brew, pacman, apt, or by just compiling every package from source from some huge mirror.
It's exactly the same thing people initially loved about docker or vagrant.
Sure, but it works on Mac and Linux and doesn’t require virtualization. I think brew might qualify, but it can’t define which environment variables should be available in the developer shell or which hooks to run upon entry.
I don’t think any of the other options you specified can manage the same thing.
Also you can use nix-shell as the shebang, and then pass a second line to tell it what program to use to interpret the rest of the script and all of its dependencies. It'll fetch the shell utilities in the versions you want when the script is run.
>[...] it can’t define which environment variables should be available in the developer shell or which hooks to run upon entry.
Neither pacman, apt, nor any other package manager requires any sort of virtualization. pacman works fine wherever you have a C compiler, including macOS, Linux, Windows, probably even TempleOS. Whatever you want.
If you want to add something to the user environment system wide, the traditional thing to do is to dump a file into `/etc/profile.d/` which will be sourced during shell startup. If you instead want something local to the project, you just make a script that the developer can source, like a python virtualenvironment.
I'm not saying any of these ideas are bad. I am saying that they are easily solvable and have been solved for the past 20 years. Without Nix.
Everyone has to use nix :)
But yes, that is nice.
That is the caveat. I initially set it up such that it wasn’t required: you could choose to use it if you wanted to, and otherwise here is a list of specific versions things you must install, etc. Everyone ultimately chose to use nix, and now it’s required. Makes for a pretty easy setup though for new devs: install nix, then run `nix develop`, then `make setup`, and you’re off to the races.
What POSIX systems in actual use (not historical Unixes) don't have the long options? macOS' BSD utilities I guess?
> What POSIX systems in actual use (not historical Unixes) don't have the long options?
All of them except for GNU, AFAICT? (That is, only GNU seems to have long options.) Checking manpages for rm(1) as a simple reference, I can't see long options in any of the 3 major BSDs or illumos, and checking Alpine Linux seems to show busybox also only doing short options (sorry, can't find an online doc for this, though it's easy to check in docker if you don't have a machine running Alpine handy). OpenWRT also uses busybox and has the same (lack of) options.
https://man.freebsd.org/cgi/man.cgi?query=rm&apropos=0&sekti...
More than this, the gnu utilities often have options that don't exist at all on other platforms, in either long or short form.
You can also brew install tools like gnused which have the same arguments. Not a viable option for all situations but if you just need to execute it on Linux and your local machine for dev you can use those.
I agree with this practice. Another benefit is it makes it easier (slightly, but still) to grep the man page for what the options do.
The corollary must be "write programs that take long options".
And put them on separate lines so you can track and git blame them more easily.
Same line git blame is not that hard, just list commits affecting specific file or even specific line span: https://git-scm.com/docs/git-log#Documentation/git-log.txt--...
Before invoking a command, always first check if the length of the command is not longer than ARG_MAX. For example, if this is your command:
grep --ignore-case --files-with-matches -- "hello" *.c
Then invoke it as follows:
CMD="grep --ignore-case --files-with-matches -- \"hello\" *.c"
ARG_MAX=$(getconf ARG_MAX)
CMD_LEN=${#CMD}
if (( CMD_LEN > ARG_MAX )); then
  echo "Error: Command length ($CMD_LEN) exceeds ARG_MAX ($ARG_MAX)." >&2
  exit 1
fi
eval "$CMD" # warning, evaluates filenames
That might be sensible, but it also obscures the script logic.
Since using Linux exclusively, I don't think I've ever encountered an issue due to too many arguments or an overly long command line. And it's the first time I'm actively searching online for ARG_MAX.
I understand that different shells might be different, but with reasonable lengths, is there any chance of it being relevant (aside from xargs, where it's generally intended, or better, to pass along each argument individually)?
I started running into these issues when I started working with training examples in the context of deep learning. A folder with millions of files is then not unheard of.
Also if you do things like */*/*, then you can quickly get large command lines. Or even if you do long_name/another_long_name/*.
I've had problems with "cat *.csv" plenty of times processing data that is generated in many small files.
It is really difficult to deal with, because on top of the arg max limit, globs are not guaranteed to be in order.
The solution is not obvious, hard to get to if you don't know the footguns in advance, and hard to read once implemented.
You should always type check your shell scripts as well. For example, you just:
$ shelltypes script.sh
# Welcome to shelltypes v 3.23.2
# type ‘help’ if you’re stuck
>>> {1) # import POSIX;;
Importing 73 items.
>>> {2} # append loadpath “/opt/local/shelltypes/base”;;
>>> {3} # import base::YOURPROJECT;;
Importing 15 items.
>>> {4} # check “YOURSCRIPT.sh”
Parsing YOURSCRIPT.sh.
Reticulating splines.
Expanding aliases.
Analyzing free shell environment variables.
Found inconsistencies in PATH.
Warning: Low battery!!!
Warning: found free type for ‘shred’, ignoring.
Warning: use of sudo requires password under /etc/sudoers.
Warning: this utility is fake.
Error: use of cat impossible in the presence of mutt.
Found 15 errors.
Try again. Goodbye.
$
Then you can be pretty sure your script isn't going to do unnecessary harm, and has some proper guardrails in place.
Where does "shelltypes" come from? I can't find anything on DuckDuckGo or Google, but this seems like it would be very useful.
From the output in the post I'm going to assume it's either a joke post or an LLM hallucination.
Oh god, it even says right there:
> Warning: this utility is fake.
Well played! I guess I got too excited at the possibility of such a tool existing.
The thought is very compelling! shellcheck is pretty great, but of course isn’t this complete. It definitely can’t reticulate splines, for instance.
Such a tool does exist; it's called shellcheck. If you give a /bin/sh shebang, for example, it will tell you off for using non-POSIX features regardless of whether they'll work with the sh on your system.
(Personally I typically use bash though, largely so I can `set -eEuo pipefail` next.)
I don't know shelltypes, and it sounds like a linter for shell scripts.
Does shelltypes warn against a failure to check for ARG_MAX?
> eval "$CMD"
That means you will eval all the filenames, so if you have a file with spaces in it, it will appear as two files; if there is a `$` in the name, it will trigger parameter substitution, and so on for the other shell metacharacters.
Yes, that could be true. I'm not great in Bash. Be careful. These types of error are why I don't use Bash. I just wanted to give an example in a commonly used scripting language. The main point here is to check ARG_MAX.
If you are doing something where going over ARG_MAX is a real possibility, it would be better to write your scripts to avoid the problem altogether rather than awkwardly trying to detect it (with a bonus exploit). For example, many commands can accept a list of files on standard input or from a list file.
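A minimal sketch of that kind of rewrite (Python plus xargs assumed; the pattern and glob are from the example above): stream the file names on stdin so no single argv list ever has to hold them all:
import glob
import subprocess

# xargs batches the NUL-separated names into grep invocations that each stay
# under ARG_MAX; the pattern sits safely after "--".
files = glob.glob("*.c")
subprocess.run(
    ["xargs", "-0", "grep", "--ignore-case", "--files-with-matches", "--", "hello"],
    input="\0".join(files).encode(),
)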
On my system, `getconf ARG_MAX` is over 2m.
I have seen some heinously long cmdline strings, but nothing close to that. Usually when invocations have crept up into the O(1k) character range at places I've worked, I've implemented a "yaml as cmdline args" option to just pass a config file instead.
Have you seen scenarios where this is actually limiting?
Yes, if *.c expands to a string over 2m. Maybe that is a lot for .c files, but it may easily happen with .tiff and a folder full of images used for training a deep learning model, for example.
Thanks, this is interesting. I have done a lot of this sorta stuff (glob expanding giant directories) without a thought for this `ARG_MAX` libc parameter, but now I know I need to keep it in mind!
> Before invoking a command, always first check if the length of the command is not longer than ARG_MAX.
Tell that to Google and Mozilla. /s
> Long form options are much more self-explanatory for the reader.
And less prone to typos
And not just options but base command names too. I wrote a tool to partially mitigate this in some cases: https://github.com/makesourcenotcode/name-safe-in-bash
This is one of my default rules for writing scripts. If the long option is available, use it. It makes too much sense to do so.
The hype side of this is to always add instructions to use long options to the agent base prompt. It's much easier to spot mistakes, and long options have the advantage of not doing something completely different if they are wrong.
Reminds me of our code having LLM-generated regular expressions which are impossible to understand, where the only way you can tweak them is giving them back to the LLM to change.
Folks, remember to record your LLM prompt in a comment so that your regex can be validated.
What if I or someone wrote a bot/script that searches across GitHub for every shell script file that it can find, converts all short options into long options, and opens a PR? Think dependabot, but let's call it longabot or readabot.
I don't think you should do that unprompted. There are reasons for using short options, like the portability mentioned in other comments. It'd put an undue burden on open source maintainers.
Something opt-in like dependabot though could be useful.
Then you will get to feel the unreasonable hate of lots of lazy people who care more about typing a few characters less than about their stuff being readable, and from all the Apple users.
Detracting from the message, but what is the `try shell.exec(" ...` thing?
Prepares to launch in flaming rant... sigh You're right.
If your goal is to help your coworkers, this is correct.
If not, it isn't.
and while you're at it, please alias --help to -h :-)
Some CLIs are even worse at this: when you try -h or --help they tell you that no, you must run 'cli help' instead of just showing the help already.
not only long options, also multiple lines:
somecmd \
--option1 \
--option2 $FOO \
--option3 $BAR
It depends...
[flagged]
What is this? Use libgit2, or use a proper language where you do not need exec(), do it in Bash, or do it in C with libgit2 or with popen() or something. Using "system()" in C is never the right thing to do; the least you can do is use popen().
Not sure in which language you use "try shell.exec", but I am not even sure that it is the right way.
I used to think this, but now mostly (but weakly) don't. Long options buy expressiveness at the cost of density, i.e., they tend to turn "one-liners" into "N-liners". One-liners can be cryptic, but N-liners reduce how much program fits on the screen at once. I personally find it easier to look up flags than to have to page through multiple screenfuls to make sense of something. In this respect, ISTM short options are a /different/ way of helping a subsequent reader, by increasing the odds they see the forest, not just the trees.
Strongly, strongly disagree. This is a GNU-ism, and needlessly verbose. What's with people refusing to use the vast amount of memory in their brains and actually learning, instead lazily going for the lowest-common-denominator approach relying on "loanwords" from English?
Because not everyone has 20 or more years of flag memory from using Linux, and they’ve also got to maintain the scripts. If you’re not familiar with every arcane invocation of find or tar or whatever, or even if it’s just been a while, the long options are a godsend when you’re skimming a script trying to figure out what it’s doing.
No, the problem is overloading utilities with hundreds of options and subcommand languages over the course of decades. The starting point was quick command-line usage, but GNU-style long options are clearly rooted in Elisp/Emacs hyphenated-words verbosity. You can tell it wasn't meant to be like this when typing somecmd -h results in multiple pages of option output where you can't see the forest for the trees, and the command's manpage similarly loses its usefulness due to sheer size, frequently not even containing an EXAMPLES section. If it has a manpage in the first place, rather than linking to a nonexistent texinfo page, as was the practice for the longest time. All of which makes you none the wiser, so you go to the web for example usage.
You can always copy+paste it into an LLM and have it explain the options in 5 seconds.
I like how programmers are all about locality of reasoning and avoiding context switches until the context switch is “go paste it into an LLM”
I just use the man page. Which happily describes all the options in an easily digestible format that does not require either an internet connection or massive amounts of power to be wasted on a result that is almost certainly copyrighted and flat-out stolen from its original creator.
I am of a generation where I see these uses of LLMs as exceptionally lazy or showy. Your ready use of a "chatbot" is not an attribute to broadcast like this.
> Because not everyone has 20 or more years of flag memory from using Linux
It's called learning, apparently something that is now being eschewed in favour of quick superficiality (and making developers more replaceable --- with AI or unskilled offshore labour.) No wonder "modern" software is almost always total shit.
"But I don't have the time to learn," you complain, but ask yourself this instead: Why do you never have the time to do it right, but always the time to do it twice (or however many times is necessary to get a barely-working product)?
I’ve got almost 30 years of using UNIX shells and I’ve learnt a lot more than most, but there are plenty of tools that have things like -r and -R that do radically different things and I don’t always remember the difference between them. If it’s written --recursive then it’s a lot clearer. Clarity is useful even when you have a lot of knowledge and experience.
As someone who works on a team of people with different levels of experience, where understanding is more important than some self-inflicted purity test of their unix chops, I will always either do long options or, where that’s not possible, a comment explaining the short options.
No one is complaining about not having time to learn: you’re arguing against a strawman. The point is not to discourage learning, it’s to encourage clarity in a context where it can at times be critically important.
Learning that this tool can be used to do these things, manipulate these objects or streams of data like this, and that this tool is useful in combination with these other tools: that is the valuable part. Memorizing some arbitrary encoding of the parameters is not really useful when I can offload it to the language center in my brain and basically get that part for free...
Why do you conflate learning with memorizing? man <command> is easy to do. And you take notes, write script, functions and alias for things you do often.
I’m nostalgic about old Unix too, but it’s just that: nostalgia. In reality, Unix was the system that brought us such elegant well-designed APIs as gets(3) and strtok(3).
Learning is one thing and being able to comprehend a piece of code at a glance is another. I might know them, but I wouldn't immediately remember all the past and future short forms of flags ever written. What is even the point of that? If I am comfortable with the essence of what they do, does it really matter that much whether I have it ingrained in my memory that -r does foo while -f does bar?
You're confusing learning with memorization.
Exactly that. Well said. I can control my learning but memory? Not as much.
[flagged]
[flagged]
Nobody here was advocating against learning and memorising, but for writing code that is as self-evident and unambiguous as possible.
Talk to any electrician about which kinds of diagrams they prefer: those where a fellow was clever, trying to compress stuff as much as possible, or those that are clearly readable and easy to understand. Pilots carry heaps of documentation with them on the plane and walk through checklists instead of trying to memorise what to do. Your argument doesn't hold up to the slightest scrutiny.
I’m advocating for a culture of helping people hone their skills by learning from others instead of being turned off by their arrogance. That’s a very different thing from encouraging them to avoid thinking.
Programs are read more often than they're written, and on a scale of decades all programmers are novices. So I try to optimize my code to be obvious to novices
What's wrong with GNU? Are they automatically wrong or something?
> So I try to optimize my code to be obvious to novices
That's the attitude which is responsible for making software the way it is today: mediocre.
Software is mediocre today? You should have seen it back in the day. It was a brittle, arcane mess.
Software isn't as exciting any more, but that excitement isn't gone because newbs don't have to look up what -O stands for anymore. It's because it's wildly more reliable.
I low-key agree on this, because I think it doesn't matter and you have to look up the meaning anyway. English-like options may make you think you already know the details, but usually you don't. If there's a need to develop in sh/bash/etc., then a modern solution would be to make something like a language server for it which could "go to definition" right into a man page section, or combine docs for all used options in a single popup/aux window. Of course it's not possible for every man page, since some of these are written in too free a form. IOW, it's shell; whatever you program here in more than two lines you'd better program in Python instead (and I'm saying this as a low-key Python hater).