Use Long Options in Scripts

matklad.github.io

297 points by OptionOfT 10 months ago · 156 comments

wahern 10 months ago

Please DO NOT mix string interpolation and command execution, especially when a command is processed through the shell. Whatever your language, use a list-based or array-based execution API that passes arguments straight through to execv(2), execvp(2), etc., bypassing the shell.

  • matklad 10 months ago

    Was waiting for this comment :P

    The API used handles string interpolation correctly: the string literal is parsed at compile time, and the interpolated arguments are never concatenated or escaped; they end up directly as elements of the argv array passed to the child. See

    https://github.com/tigerbeetle/tigerbeetle/blob/7053ecd2137a...

    • chubot 10 months ago

      This approach creates an odd mini language, which is incomplete:

          comptime assert(std.mem.indexOfScalar(u8, cmd, '\'') == null); // Quoting isn't supported yet.
          comptime assert(std.mem.indexOfScalar(u8, cmd, '"') == null);
      
      But you can do correct interpolation with simple shell variables, rather than generating shell code strings:

          $ today='foo; bar' sh -c 'argv git switch --create "release-$today" origin/main'
          ['git', 'switch', '--create', 'release-foo; bar', 'origin/main']
      
      So that is a test that we can use a plain shell string, without any shell injection bug. (argv is my command to test quoting: python -c 'import sys; print(sys.argv)' "$@" )

      Note that there's no escaping function needed, because we're not generating any shell code. We're generating an argv array for `/bin/sh` instead.

      ---

      So by invoking with an env var, you can easily create a correct API that uses plain shell

          git switch --create "release-$today"
      
      rather than

          git switch --create release-{today}  # what language is this?  It's not obvious
      
      If you don't want to use the env var, you can also use

          git switch --create "release-$1"
      
      And invoke with

          ['sh', '-c', shell_string, 'unused-arg0', today_string]
      
      With this approach, you don't need

          1. any kind of shell escaping
          2. any analyzing of pseudo-shell strings, which can't contain quotes
      
      Because you are not generating any shell code. The shell code is constant.
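
      For instance, calling it from Python might look like this (subprocess.run is the standard library API; the branch value is illustrative):

          import subprocess

          today = "foo; bar"  # hostile value: it travels as data, never as code

          # The shell text is a constant; "sh" fills $0, so today arrives as $1.
          subprocess.run(
              ["sh", "-c", 'git switch --create "release-$1"', "sh", today],
              check=True,
          )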
  • latexr 10 months ago

    Why would they even change the language and the commands in the example? It confuses and undermines the point. Just say “use `git switch -c my-new-branch` for interactive usage and `git switch --create my-new-branch` in scripts”. It makes no sense to introduce other unexplained information.

  • gray_-_wolf 10 months ago

    Another approach is to have a language powerful enough to let you guard against shell injection. I wrote a syntax form that allows doing this:

        (sh "cat " file " >" output)
    
    With file being bound to "foo'bar" and output to "x", it is automatically translated into

        cat 'foo'\''bar' >'x'
    
    This gives you the flexibility to use shell (sometimes it just is the most concise way) while being safe against injection.

    I believe for example in rust you should be able to do the same.
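
    For what it's worth, Python's standard library ships this building block as shlex.quote; a minimal sketch with the same values (its output quotes differently than the '\'' form above, but equivalently):

        import shlex
        import subprocess

        file, output = "foo'bar", "x"
        cmd = f"cat {shlex.quote(file)} >{shlex.quote(output)}"
        # -> cat 'foo'"'"'bar' >x   (quote() leaves safe strings like "x" bare)
        subprocess.run(cmd, shell=True, check=True)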

    • delusional 10 months ago

      How do you know which shell you're escaping for? You could query the system, but now you end up implementing escaping for every shell out there.

      • gray_-_wolf 10 months ago

        Good question. I care only about POSIX compatible shells, so the escaping just follows the POSIX rules. In practice that means it works on any actually used system except windows, which is fine with me.

  • gorgoiler 10 months ago

    Miniature, in-line sh scripts are also fine as long as you use the provided parameter substitution.

    If you’re averse to this:

      q(“select x where y = ‘“ + v + “‘“)
    
    And instead do this:

      q(“select x where y = %s”, v)
    
    Then you should be averse to this:

      x(“foo --option ‘“ + v + “‘“)
    
    And instead do this:

      x(‘foo --option “$1”’, v)
    
    This is particularly useful when it’s expedient to have one thing piping into another. Like it or not the sh DSL for pipes is excellent compared to doing things natively with execve() and pipe(), just as doing group by and count is far more concise in SQL than doing so natively.

    Most SQL libraries give you something like q. Writing your own x is as simple as calling sh correctly. In Python, for example:

      def x(script, *args):
        run([“sh”, “-c”, script, “--“, *args])
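
    For example, a pipeline where both values ride along as positional parameters rather than being spliced into the script (assuming the x() above, typed with straight quotes; pattern and logfile are illustrative):

      x('grep --fixed-strings -- "$1" "$2" | wc --lines', pattern, logfile)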
    • kbenson 10 months ago

      Neither of those is equivalent to variable binding, which is what most SQL libraries provide, specifically because they don't actually solve the problem: they're still doing string substitution. Putting a double quote in $1 in your "good" execute example will allow you to break out of what's expected, and then you're Bobby Tables.

      Your Python example at the bottom is more correct, in that it allows each arg to be passed as a separate element, so there's no option to break out through quoting characters. SQL binds are like that in most libraries, even if they don't look like it. The parser knows there's a single item there, so it passes it along as such. You cannot escape it in the same way.

      • gorgoiler 10 months ago

        I don’t really follow. My “good” example and the code at the bottom are the same.

        sh is smarter than just doing string interpolation and "$1" is passed on as a single argument, no matter what:

          > run(["sh", "-c", 'echo "$1"', "--", 'a"'])
          a"
        
        Whereas if it were simple string interpolation, you’d see this:

          > run(["sh", "-c", 'echo "a""'])
          --: 1: Syntax error: Unterminated quoted string
        
        It’s the same special casing that gets "$@" right.
        • kbenson 10 months ago

          That requires you to quote the param in the string to ensure that params are grouped as expected. E.g.

              # cat pvars.sh
              #!/bin/bash
              echo "\$1=$1"
              echo "\$2=$2"
              echo "\$3=$3"
              # sh -c './pvars.sh "$1" $2 $3' -- 'a b' 'c d'
              $1=a b
              $2=c
              $3=d
          
          The whole point of passing in an array and using something like exec (or system(), if provided, as it handles the fork and wait for you) is that you avoid the overhead of the shell starting up at all and parsing the command line, and it lets you define each param exactly as needed, since each param is its own array item. You don't need to worry about splitting on space, the shell splitting params on space, or quoting to group items. If you want the param to be:

              foo "bar baz" quux
          
          as one singular parameter, you just make that the contents of that array item, since no parsing need be done at all.

          If you have an array of params and you're jumping through hoops to make sure they're interpreted correctly by the shell you call to execute a process, you're likely (depending on language and capabilities) wasting cycles and overcomplicating the code, when you could just call the program you actually want directly and supply the params. Alternatively, if you have all the params as one long string and you want it parsed as a shell would, then execute the shell and pass that as a param. E.g.

              # perl -E 'system("./pvars.sh","a b","c d");'
              $1=a b
              $2=c d
              $3=
              # perl -E 'system("./pvars.sh","a b","c","d");'
              $1=a b
              $2=c
              $3=d
          • gorgoiler 10 months ago

            Thanks for explaining. I feel like we’re talking past each other but it’s my mistake. I should have said it is only useful (not “particularly useful”) if one has compound statements like a pipe or multiple commands. Invoking sh just to run a single command is superfluous and you are right that reaching directly for execve() is better.

            • kbenson 10 months ago

              Ah, yes. If you want to take advantage of piping commands together through the shell as a subcommand of your program, then a way to make params behave more consistently regardless of content is useful.

    • pwdisswordfishz 10 months ago

          SyntaxError: invalid character '“' (U+201C)
  • crazygringo 10 months ago

    For anything involving file paths, user input, etc. -- yes of course. It's not even a question, because otherwise they would need to be escaped, which nobody wants to do.

    But for a simple example like this where it's inserting a date which has known properties, it seems fine, and is much more readable.

  • paulddraper 10 months ago

    Tbf this input does not need escaping.

    But at the very least the shell is unnecessary here.

  • tasuki 10 months ago

    Why not?

    • bulatb 10 months ago

      Any time you send commands and data down a single channel, user input that's intended to be data can be misinterpreted as a command. For example, if your program wants to:

          run("program --option '{user_input}' > file")
      
      to save some input to a file, and the user's input is:

          '; bad_command #
      
      then when run() sends that string to the shell, the shell will run:

          program --option '';
          bad_command #' > file
      
      Most languages have something like a safe_exec() that separates the shape of the command from the values of the options, executing "program" with the options and the user_input in the arguments array as data. Skipping the shell step, which would just be building an exec call anyway, removes the opportunity for users to confuse it into doing something else.

      The list-based API alternative they recommend might look like this:

          safe_exec(["program", "--option", user_input], stdout="file")
      
      and it would always exec "program" with argv[1] == "--option" && argv[2] == user_input. If the user_input happens to be:

          '; bad_command #
      
      ...well, then, the user can enjoy the contents of their file.
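
      In Python, for instance, that list-based form could be written as (subprocess.run is real; "program" stays the placeholder from above):

          import subprocess

          user_input = "'; bad_command #"  # reaches the program intact, as argv[2]

          with open("file", "w") as out:  # replaces the shell's "> file"
              subprocess.run(["program", "--option", user_input],
                             stdout=out, check=True)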
      • tasuki 10 months ago

        Yes of course. But why would you expect me to run shell commands with random person's input? Also:

            safe_exec(["rm", user_input])
        
        This isn't safe either! Despite clearly saying "safe_exec"!
        • bulatb 10 months ago

          Yeah, "safe_exec" is a useless name without context. But the context was you need to call a program from another program. Many people would call system() or whatever because usually it's obvious and easy, and the pitfalls are less so.

          Shelling out is not the only option. People are just saying not to use that option. Better ones won't save you if you purposely do something stupid. They will save you if the user wants to trick you into doing something else.

          • tasuki 10 months ago

            A nearby comment mentioned escaping. I guess that might be a good reason to use execv?

  • echelon 10 months ago

    SQL injection on steroids.

    • rat87 10 months ago

      Only if you are getting input from untrusted users

      • remus 10 months ago

        imo it's best to just avoid it altogether. Requirements change, and what was once a trusted input can become untrusted input.

        • rat87 10 months ago

          Generally you shouldn't be passing random data from the web to shell scripts. Maybe I haven't done the right type of work, but when dealing with fiddly bits it's much more likely that not passing things through the shell will cause issues (with stuff like executable paths).

      • brookst 10 months ago

        Or if your trusted users are fallible and could be tricked into providing unsafe inputs.

susam 10 months ago

I prefer long options too. However, while writing programs that need to invoke POSIX commands in a portable manner, short options are the only viable choice, as POSIX doesn't specify long options. For instance, see the specification for diff at <https://pubs.opengroup.org/onlinepubs/9799919799/utilities/d...>, or that of any POSIX utility listed at <https://pubs.opengroup.org/onlinepubs/9799919799/idx/utiliti...>.

That said, this is more of a corner case. In most scenarios, rather than relying on POSIX utilities, there are often better alternatives, such as using library bindings instead of spawning external processes. For example, instead of invoking grep, using something like libpcre could be a more efficient choice.

For non-POSIX utilities like git, hg, rg, ag, etc., using long options makes perfect sense.
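
For instance (GNU diff assumed for the long form; file names illustrative):

    diff -u old.c new.c          # -u is specified by POSIX
    diff --unified old.c new.c   # GNU long form; not in the POSIX spec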

  • Wowfunhappy 10 months ago

    > However, while writing programs that need to invoke POSIX commands in a portable manner

    ...probably a stupid question, but something I have earnestly been wondering about... when does this actually happen nowadays? What POSIX systems are you targeting that aren't one of the major ones (Linux, Darwin, or one of the major BSDs)?

    I was writing a shell script a few months ago that I wanted to be very durable, and I targeted sh instead of bash just because, well, it seemed like the correct hacker spirit thing to do... but I don't actually know what system in the past decade (or more) wouldn't have bash.

    • chrisweekly 10 months ago

      I ~recently had to wrestle w/ TeamCity (CICD) whose build agents provide only sh. I needed to invoke a 3rd-party util that required bash. The resulting "bash in dash in docker in docker" worked, but I wasn't thrilled about the convoluted / Frankenstein setup.

    • em500 10 months ago

      > ... but I don't actually know what system in the past decade (or more) wouldn't have bash.

      There's some ambiguity about "have bash". If "having" bash means that (some version of) bash has been ported to the system, there are indeed very few. If "having" means that bash (supporting all options that you need) is available to the user, that could be a lot more. As others have noted, the BSDs, Android and many embedded Linux systems don't come with bash pre-installed, MacOS pre-installed bash is stuck at version 3.2 (which doesn't have associative arrays), and the user could be in an environment that does not allow them to install whatever they need.

    • mceachen 10 months ago

      Alpine docker images only come with dash instead of bash, which _may_ run your sh script, but test thoroughly. Or just install bash.

      FWIW, Darwin/macOS is especially guilty of gobsmackingly ancient coreutils that don’t support long option variants.

      • samatman 10 months ago

        Is it? I'm with you on gobsmackingly ancient, but it's the "doesn't support long options" part that I haven't bumped into. I do replace some coreutils, but not all of them.

        What's a good example of such a utility?

        • skissane 10 months ago

          For example, sed. macOS sed doesn't support long options. Not even --help or --version

          (Running an older version of macOS so can't completely exclude this has been updated in a newer version, but I'd be surprised to learn that was true.)

      • dfe 10 months ago

        macOS doesn't have GNU coreutils at all. It has the utils from FreeBSD.

        The gobsmackingly ancient GNU software it does have is bash, because it's the last version under GPL 2. I've used Mac OS X since 10.1, so I remember when the default shell was tcsh and /bin/sh was not bash.

        That's (basically) the case again on the last few macOS releases. Today, zsh is my shell of choice, including on Linux.

      • pingiun 10 months ago

        The Alpine default shell is called "ash"; "dash" is the Debian/Ubuntu default shell.

      • hulitu 10 months ago

        P in POSIX stands for portability. /s

    • cmgbhm 10 months ago

      Where I can sometimes get burnt is busybox.

      I more often get burnt going from zsh to bash than by that, however.

    • susam 10 months ago

      > I don't actually know what system in the past decade (or more) wouldn't have bash.

      I have written a bit more about it in these comments:

      https://news.ycombinator.com/item?id=40681382

      https://news.ycombinator.com/item?id=17074163

    • Gud 10 months ago

      FreeBSD doesn’t come with bash though.

    • vsl 10 months ago

      Major ones are enough. Linux and Darwin (that is, macOS and GNU userspace, really) differ sufficiently that you need to pay attention or limit yourself to POSIX. E.g. sed and wc burned me a few times with scripts that need to run on both.

    • Someone 10 months ago

      > but I don't actually know what system in the past decade (or more) wouldn't have bash.

      I think macOS still has bash, so technically it doesn't count, but it doesn't have a bash from the past decade, and it uses zsh by default.

  • theamk 10 months ago

    Note that grep in particular is extremely optimized. If you have multi-gigabyte files, and you only search for one thing, shelling out to grep will likely have much better performance than doing it yourself.

    But not every system needs that much, and in a lot of cases, using your language's regexp library will be more robust and easier to write.

dosourcenotcode 10 months ago

Agree that long options should be used. But there is one caveat to consider: portability.

Sadly to this day not all BSD distributions have GNU style long options. And the ones that now do only got them fairly recently. So if you want portability you have to use short options as you weep with a bottle of vodka in hand.

  • mplanchard 10 months ago

    Not trying to spam this thread with praises of nix, because it does have its own problems, but it certainly solves the portability problem.

    Four years into using it at work for dev environments across Mac (x86 & ARM) and various Linuxes, and I can't imagine going back. I also always make dev environment definitions for my open source projects, so even if people aren't using nix, there is at least a record of what tools they will need to install to run scripts, tests, etc.

    • nine_k 10 months ago

      Does nix work well on BSD-derived Unices? In particular, the most widespread of them, macOS?

      • mplanchard 10 months ago

        Yes, works great on Mac. About half our engineers use Macs, the other half Linux. We have one nix configuration for the dev environment, which works for everyone.

  • saghm 10 months ago

    This surprises me because the first case I remember ever coming across where short versus long options impacted portability across GNU and BSD was _fixed_ by using long options. Maybe six years ago or so I had an issue porting a script someone else had written for use in CI that happened to decode some base64 data that failed when I tried to use it on a different platform. I forget which one it was originally written for and which one I was trying to use it on, but the issue boiled down to the MacOS version of base64 using the BSD short option for decode and Linux using the GNU one, and they each used a different capitalization; one used `-d` and the other used `-D` (although I also can't remember which used which honestly). My solution was to use the long option `--decode`, which was the same on both of them, and since then the times I've needed to decode base64 I've always used the long option out of habit, which probably explains why I can't remember what option Linux uses despite it being the one I've used far more over the years since then.

    • delusional 10 months ago

      I think the right way to think about this (if your goal is to avoid surprises at least) is that options (short or long) are just strings. There's no guarantee that there's a long variant of an option. There's not even a requirement that options start with a dash. A sufficiently brain-damaged developer could start them with a slash or something.

      If you're going for portability the best bet is to just read the manual for each of the separate versions and do whatever works.

      • sgarland 10 months ago

        To this day, I write tar options with no dash, simply because I can. `tar cvzf foo.tar.gz ./foo`

        I would never write a new program with this option, but I do find it a delightful historical oddity.

        • saghm 10 months ago

          I've noticed that it seems to be a pattern that's used for other compression/decompression software as well. Sometimes mods I use for games will be uploaded as rars or 7zips (I guess because this stuff gets developed on and for Windows, and tarballs aren't really something people use much there), and the CLI invocations I use to extract them always look off to me, especially the 7zip one: `unrar x` and `7z x`.

      • saghm 10 months ago

        That sounds reasonable to me. If anything, I might even go further and say that reading the manuals wouldn't be enough to fully convince me without also actually testing it by running a script on a given platform. It's not that I don't trust the manuals to be right, but I have less trust in myself to write bug-free code than probably any other language I've ever used, and I don't think I'd feel confident without verifying that I actually did what the manual said correctly.

teddyh 10 months ago

Also, do not forget to use "--" after all options, but before any dynamic arguments, just to be safe.

  • arcanemachiner 10 months ago

    I know to do this intuitively, but I have no idea why.

    • hoherd 10 months ago

      It terminates option parsing, so anything following it that starts with a hyphen will not be treated as an option.

          $ echo 'hack the planet' > --help
          $ cat --help
          cat: illegal option -- -
          usage: cat [-belnstuv] [file ...]
          $ cat -- --help
          hack the planet
          $ rm -vf --help
          rm: illegal option -- -
          usage: rm [-f | -i] [-dIPRrvWx] file ...
                 unlink [--] file
          $ rm -vf -- --help
          --help
          $ cat -- --help
          cat: --help: No such file or directory
    • less_less 10 months ago

      It tells the shell utility that any remaining arguments are not options, but instead files or whatever the script might process. You know, in case someone makes a file called -rf.

      • pletnes 10 months ago

        But not all shell utilities follow this particular convention

        • ndsipa_pomu 10 months ago

          Most of the commonly used ones do, so it's easiest to just always do it and then remember the two or three utils that don't like it.

        • hackerthemonkey 10 months ago

          Yes. It’s more of a convention. If a shell utility isn’t following it then it won’t mean what we think it would mean.

          Also - it helps a lot when a utility accepts a path that may or may not contain hyphens.

        • ribcage 10 months ago

          Yes. The famous echo on Linux systems does not have it and therefore it's impossible to print the string "-n o p e", because -n will be interpreted as an option.

          • bonzini 10 months ago

            echo is not portable anyway, use "printf %s STRING" or "printf '%s\n' STRING".
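
            For example, with any POSIX printf:

                printf '%s\n' '-n o p e'   # prints: -n o p e
                printf '%s\n' -n           # prints: -n (hard to do portably with echo)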

            • ribcage 10 months ago

              Yes, that's what I use. Sometimes I still get tempted to use echo because there's less typing...

          • ezequiel-garzon 10 months ago

            It does if single or double quotes are used, right? Which would be necessary (or preferred to multiple backslashes) quite often.

            • bonoboTP 10 months ago

              No, the quotes are not seen by the program. The program receives a list of strings, it does not get the information about whether and how those strings were originally quoted in the shell. Programs can also be directly called with lists of strings as in execve, so often it does not even make sense to ask if the arguments were quoted or not.

              Quotes live on a different level of abstraction.

              • danadam 10 months ago

                > No, the quotes are not seen by the program. The program receives a list of strings, it does not get the information about whether and how those strings were originally quoted in the shell.

                With quotes the program will receive a single argument -n␣o␣p␣e instead of multiple ones -n, o, p, e. At least it works on the machine here:

                    ]$ echo "-n o p e"
                    -n o p e
                    
                    ]$ /bin/echo "-n o p e"
                    -n o p e
                • bonoboTP 10 months ago

                  Yes, I think there was some misremembering here. The nontrivial thing is to print out -n itself with echo. For example, echo doesn't treat "--" specially, so "echo -- -n" prints "-- -n".

              • account42 10 months ago

                Note that this is true for POSIX systems but not e.g. for Windows. There the program receives the command line as-is and is responsible for parsing it into an array. There are two different standard functions to do this parsing for you (with slightly different quoting behavior), but you could also create your own that requires options to not be quoted.

    • bluedino 10 months ago

      It's worth it just to watch the frustration of a junior when they try tacking more arguments on the end of a command.

      • account42 10 months ago

        A great opportunity to teach the importance of reading the whole command before trying to modify it.

saagarjha 10 months ago

Unfortunately, if you want your scripts to be portable to other POSIX systems you might have to use the short options, as the long ones are not standardized. You have to decide the tradeoff for yourself.

  • mplanchard 10 months ago

    Using nix has really spoiled me on this. Everyone gets the same versions of all the CLI utilities in the dev environment, whether on Mac or Linux, and those are the same versions that run in CI and any prod systems. It's really nice being able to use whichever newer bash features or gawk extensions you like, without having to deal with trying to ensure the Mac engineers have brew-installed all the right stuff to match a standard Linux env.

    • delusional 10 months ago

      nix didn't solve your issue here. nix didn't do anything. You're just describing the benefit of a reproducible development environment. You could do the same thing with brew, pacman, apt, or by just compiling every package from source from some huge mirror.

      It's exactly the same thing people initially loved about docker or vagrant.

      • mplanchard 10 months ago

        Sure, but it works on Mac and Linux and doesn’t require virtualization. I think brew might qualify, but it can’t define which environment variables should be available in the developer shell or which hooks to run upon entry.

        I don’t think any of the other options you specified can manage the same thing.

        • SAI_Peregrinus 10 months ago

          Also you can use nix-shell as the shebang, and then pass a second line to tell it what program to use to interpret the rest of the script and all of its dependencies. It'll fetch the shell utilities in the versions you want when the script is run.
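
          A sketch of what that looks like (the package names are illustrative):

              #!/usr/bin/env nix-shell
              #!nix-shell -i bash -p ripgrep jq
              # nix-shell fetches ripgrep and jq, then hands the script to bash
              rg --version
              jq --version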

        • delusional 10 months ago

          >[...] it can’t define which environment variables should be available in the developer shell or which hooks to run upon entry.

          Neither pacman, apt, nor any other package manager requires any sort of virtualization. pacman works fine wherever you have a C compiler, including macOS, Linux, Windows, probably even TempleOS. Whatever you want.

          If you want to add something to the user environment system wide, the traditional thing to do is to dump a file into `/etc/profile.d/` which will be sourced during shell startup. If you instead want something local to the project, you just make a script that the developer can source, like a python virtualenvironment.

          I'm not saying any of these ideas are bad. I am saying that they are easily solvable and have been solved for the past 20 years. Without Nix.

    • paulddraper 10 months ago

      Everyone has to use nix :)

      But yes, that is nice.

      • mplanchard 10 months ago

        That is the caveat. I initially set it up such that it wasn't required: you could choose to use it if you wanted to, and otherwise here is a list of specific versions of things you must install, etc. Everyone ultimately chose to use nix, and now it's required. Makes for a pretty easy setup for new devs, though: install nix, then run `nix develop`, then `make setup`, and you're off to the races.

  • pcwalton 10 months ago

    What POSIX systems in actual use (not historical Unixes) don't have the long options? macOS' BSD utilities I guess?

ratrocket 10 months ago

I agree with this practice. Another benefit is it makes it easier (slightly, but still) to grep the man page for what the options do.

The corollary must be "write programs that take long options".

starkparker 10 months ago

And put them on separate lines so you can track and git blame them more easily.

amelius 10 months ago

Before invoking a command, always first check that the length of the command does not exceed ARG_MAX. For example, if this is your command:

    grep --ignore-case --files-with-matches -- "hello" *.c

Then invoke it as follows:

    CMD="grep --ignore-case --files-with-matches -- \"hello\" *.c"
    ARG_MAX=$(getconf ARG_MAX)
    CMD_LEN=${#CMD}

    if (( CMD_LEN > ARG_MAX )); then
        echo "Error: Command length ($CMD_LEN) exceeds ARG_MAX ($ARG_MAX)." >&2
        exit 1
    fi

    eval "$CMD" # warning, evaluates filenames
  • mhitza 10 months ago

    That might be sensible, but it also obscures the script logic.

    Since switching to using Linux exclusively, I don't think I've ever encountered an issue due to too many arguments/length. And it's the first time I'm actively searching online for ARG_MAX.

    I understand that different shells might differ, but with reasonable lengths, is there any chance of it being relevant (aside from xargs, where it's generally intended, or better, to pass along each argument individually)?

    • amelius 10 months ago

      I started running into these issues when I started working with training examples in the context of deep learning. A folder with millions of files is then not unheard of.

      Also if you do things like */*/*, then you can quickly get large command lines. Or even if you do long_name/another_long_name/*.

    • wodenokoto 10 months ago

      I've had problems with "cat *.csv" plenty of times processing data that is generated in many small files.

      It is really difficult to deal with, because on top of the arg max limit, globs are not guaranteed to be in order.

      The solution is not obvious, hard to arrive at if you don't know the footguns in advance, and hard to read once implemented.

  • apgwoz 10 months ago

    You should always type check your shell scripts as well. For example, you just:

        $ shelltypes script.sh
        # Welcome to shelltypes v 3.23.2
        # type ‘help’ if you’re stuck
        >>> {1) # import POSIX;;
        Importing 73 items.
        >>> {2} # append loadpath “/opt/local/shelltypes/base”;;
        >>> {3} # import base::YOURPROJECT;;
        Importing 15 items.
        >>> {4} # check “YOURSCRIPT.sh”
        Parsing YOURSCRIPT.sh.
        Reticulating splines.
        Expanding aliases.
        Analyzing free shell environment variables.
        Found inconsistencies in PATH.
        Warning: Low battery!!!
        Warning: found free type for ‘shred’, ignoring.
        Warning: use of sudo requires password under /etc/sudoers.
        Warning: this utility is fake.
        Error: use of cat impossible in the presence of mutt.
        Found 15 errors.
        Try again. Goodbye.
        $
    
    
    Then you can be pretty sure your script isn’t going to do unnecessary harm, and has some proper guardrails in place.
    • Timon3 10 months ago

      Where does "shelltypes" come from? I can't find anything on DuckDuckGo or Google, but this seems like it would be very useful.

      • Arrowmaster 10 months ago

        From the output in the post I'm going to assume it's either a joke post or an LLM hallucination.

        • Timon3 10 months ago

          Oh god, it even says right there:

          > Warning: this utility is fake.

          Well played! I guess I got too excited at the possibility of such a tool existing.

          • apgwoz 10 months ago

            The thought is very compelling! shellcheck is pretty great, but of course isn’t this complete. It definitely can’t reticulate splines, for instance.

          • OJFord 10 months ago

            Such a tool does exist: it's called shellcheck. If you give a /bin/sh shebang, for example, it will tell you off for using non-POSIX features, regardless of whether they'll work with the sh on your system.

            (Personally I typically use bash though, largely so I can `set -eEuo pipefail` next.)
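
            For reference, what that line asks of bash:

                set -eEuo pipefail
                # -e           exit as soon as a command fails
                # -E           let ERR traps fire in functions and subshells too
                # -u           treat expansion of unset variables as an error
                # -o pipefail  a pipeline fails if any stage fails, not just the last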

    • amelius 10 months ago

      I don't know shelltypes, but it sounds like a linter for shell scripts.

      Does shelltypes warn against a failure to check for ARG_MAX?

  • p_wood 10 months ago

    > eval "$CMD"

    That means you will eval all the filenames, so if you have a file with spaces in it, it will appear as two files; if there is a `$` in the name, it will trigger parameter substitution; and so on for the other shell metacharacters.

    • amelius 10 months ago

      Yes, that could be true. I'm not great in Bash. Be careful. These types of error are why I don't use Bash. I just wanted to give an example in a commonly used scripting language. The main point here is to check ARG_MAX.

  • account42 10 months ago

    If you are doing something where going over ARG_MAX is a real possibility, it would be better to write your scripts to avoid the problem altogether rather than awkwardly try to detect it with a bonus exploit. For example, many commands can accept a list of files on standard input or from a list file.
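
    For example, the grep invocation upthread could stream its file list instead of building one giant argv (GNU or BSD find and xargs assumed):

        # xargs batches the files into as many grep calls as ARG_MAX demands
        find . -maxdepth 1 -name '*.c' -print0 |
            xargs -0 grep --ignore-case --files-with-matches -- hello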

  • _huayra_ 10 months ago

    On my system, `getconf ARG_MAX` is over 2m.

    I have seen some heinously long cmdline strings, but nothing close to that. Usually when invocations have crept up into the O(1k) character range at places I've worked, I've implemented a "yaml as cmdline args" option to just pass a config file instead.

    Have you seen scenarios where this is actually limiting?

    • amelius 10 months ago

      Yes, if *.c expands to a string over 2m. Maybe that is a lot for .c files, but it may easily happen with .tiff and a folder full of images used for training a deep learning model, for example.

      • _huayra_ 10 months ago

        Thanks, this is interesting. I have done a lot of this sorta stuff (glob expanding giant directories) without a thought for this `ARG_MAX` libc parameter, but now I know I need to keep it in mind!

  • hulitu 10 months ago

    > Before invoking a command, always first check if the length of the command is not longer than ARG_MAX.

    Tell that to Google and Mozilla. /s

croes 10 months ago

> Long form options are much more self-explanatory for the reader.

And less prone to typos

ndegruchy 10 months ago

This is one of my default rules for writing scripts. If the long option is available, use it. It makes too much sense to do so.

gabrielsroka 10 months ago

Related 2013/2020 https://news.ycombinator.com/item?id=24518682

jFriedensreich 10 months ago

The hype side of this is to always add instructions to use long options to the agent base prompt. It's much easier to spot mistakes, and long options have the advantage of not doing something completely different when they are wrong.

ashu1461 10 months ago

Reminds me of our code having LLM-generated regular expressions which are impossible to understand, where the only way to tweak them is to give them back to the LLM to change.

  • akovaski 10 months ago

    Folks, remember to record your LLM prompt in a comment so that your regex can be validated.

vivzkestrel 10 months ago

What if I or someone wrote a bot/script that searches across GitHub for every shell script file it can find, converts all short options into long options, and opens a PR? Think dependabot, but let's call it longabot or readabot?

  • jmholla 10 months ago

    I don't think you should do that unprompted. There are reasons for using short options, like the portability mentioned in other comments. It'd put an undue burden on open source maintainers.

    Something opt-in like dependabot though could be useful.

  • zelphirkalt 10 months ago

    Then you will get to feel the unreasonable hate of lots of lazy people who care more about typing a few characters less than about their stuff being readable, and of all the Apple users.

wodenokoto 10 months ago

Detracting from the message, but what is the `try shell.exec(" ...` thing?

pixelkink 10 months ago

Prepares to launch into flaming rant... sigh... You're right.

chasil 10 months ago

If your goal is to help your coworkers, this is correct.

If not, it isn't.

ofrzeta 10 months ago

and while you're at it, please alias --help to -h :-)

  • jiehong 10 months ago

    Some CLIs are even worse at this: when you try -h or --help, they tell you that no, you must run 'cli help' instead of just showing the help already.

m463 10 months ago

not only long options, also multiple lines:

  somecmd \
      --option1 \
      --option2 $FOO \
      --option3 $BAR
malkia 10 months ago

It depends...

johnisgood 10 months ago

What is this? Use libgit2, or use a proper language where you do not need exec(); do it in Bash, or do it in C with libgit2 or with popen() or something. Using "system()" in C is never the right thing to do; the least you can do is use popen().

Not sure which language "try shell.exec" is from, but I am not even sure that it is the right way.

lapsed_lisper 10 months ago

I used to think this, but now mostly (but weakly) don't. Long options buy expressiveness at the cost of density, i.e., they tend to turn "one-liners" into "N-liners". One-liners can be cryptic, but N-liners reduce how much program fits on the screen at once. I personally find it easier to look up flags than to have to page through multiple screenfuls to make sense of something. In this respect, ISTM short options are a /different/ way of helping a subsequent reader, by increasing the odds they see the forest, not just the trees.

userbinator 10 months ago

Strongly, strongly disagree. This is a GNU-ism, and needlessly verbose. What's with people refusing to use the vast amount of memory in their brains and actually learning, instead lazily going for the lowest-common-denominator approach relying on "loanwords" from English?

  • mplanchard 10 months ago

    Because not everyone has 20 or more years of flag memory from using Linux, and they’ve also got to maintain the scripts. If you’re not familiar with every arcane invocation of find or tar or whatever, or even if it’s just been a while, the long options are a godsend when you’re skimming a script trying to figure out what it’s doing.

    • tannhaeuser 10 months ago

      No, the problem is overloading utilities with hundreds of options and subcommand languages over the course of decades. The starting point was quick command-line usage, but GNU-style long options are clearly rooted in Elisp/emacs hyphenated-words-verbosity. You can tell it wasn't meant to be like this when typing somecmd -h results in multiple pages of option output where you can't see the forest for the trees, and the command's manpage similarly loses its usefulness due to sheer size, frequently not even containing an EXAMPLES section. That is, if it has a manpage in the first place, rather than linking to a nonexistent texinfo page, as was practice for the longest time. All of which makes you none the wiser, so you go to the web for example usage.

    • xyzzy9563 10 months ago

      You can always copy+paste it into an LLM and have it explain the options in 5 seconds.

      • mplanchard 10 months ago

        I like how programmers are all about locality of reasoning and avoiding context switches until the context switch is “go paste it into an LLM”

      • timewizard 10 months ago

        I just use the man page, which happily describes all the options in an easily digestible format that requires neither an internet connection nor massive amounts of power to be wasted on a result that is almost certainly copyrighted and flat-out stolen from its original creator.

        I am of a generation where I see these uses of LLMs as exceptionally lazy or showy. Your ready use of a "chatbot" is not an attribute to broadcast like this.

    • userbinator 10 months ago

      > Because not everyone has 20 or more years of flag memory from using Linux

      It's called learning, apparently something that is now being eschewed in favour of quick superficiality (and making developers more replaceable --- with AI or unskilled offshore labour.) No wonder "modern" software is almost always total shit.

      "But I don't have the time to learn," you complain, but ask yourself this instead: Why do you never have the time to do it right, but always the time to do it twice (or however many times is necessary to get a barely-working product)?

      • JimDabell 10 months ago

        I’ve got almost 30 years of using UNIX shells and I’ve learnt a lot more than most, but there are plenty of tools that have things like -r and -R that do radically different things and I don’t always remember the difference between them. If it’s written --recursive then it’s a lot clearer. Clarity is useful even when you have a lot of knowledge and experience.

      • mplanchard 10 months ago

        As someone who works on a team of people with different levels of experience, where understanding is more important than some self-inflicted purity test of their unix chops, I will always either do long options or, where that’s not possible, a comment explaining the short options.

        No one is complaining about not having time to learn: you’re arguing against a strawman. The point is not to discourage learning, it’s to encourage clarity in a context where it can at times be critically important.

      • olejorgenb 10 months ago

        Learning that this tool can be used to do these things, manipulate these objects or streams of data like this, and that this tool is useful in combination with these other tools is the valuable part. Memorizing some arbitrary encoding of the parameters is not really useful when I can offload it to the language center of my brain and basically get that part for free...

        • skydhash 10 months ago

          Why do you conflate learning with memorizing? man <command> is easy to do. And you take notes, and write scripts, functions, and aliases for things you do often.

      • pcwalton 10 months ago

        I’m nostalgic about old Unix too, but it’s just that: nostalgia. In reality, Unix was the system that brought us such elegant well-designed APIs as gets(3) and strtok(3).

      • hackerthemonkey 10 months ago

        Learning is one thing and being able to comprehend a piece of code at a glance is another. I might know them, but I wouldn't immediately remember all the past and future short forms of flags ever written. What is even the point of that? If I am comfortable with the essence of what they do, does it really matter whether I have it ingrained in my memory that -r does foo while -f does bar?

      • biorach 10 months ago

        You're confusing learning with memorization.

  • 01HNNWZ0MV43FF 10 months ago

    Programs are read more often than they're written, and on a scale of decades all programmers are novices. So I try to optimize my code to be obvious to novices.

    What's wrong with GNU? Are they automatically wrong or something?

    • userbinator 10 months ago

      > So I try to optimize my code to be obvious to novices

      That's the attitude which is responsible for making software the way it is today: mediocre.

      • cedilla 10 months ago

        Software is mediocre today? You should have seen it back in the day. It was a brittle, arcane mess.

        Software isn't as exciting any more, but that excitement isn't gone because newbs don't have to look up what -O stands for anymore. It's because it's wildly more reliable.

  • wruza 10 months ago

    I low-key agree on this, because I think it doesn't matter and you have to look up the meaning anyway. English-like options may make you think you already know the details, but usually you don't. If there's a need to develop in sh/bash/etc, then a modern solution would be to make something like a language server for it which could "go to definition" right into a man page section, or combine docs for all used options in a single popup/aux window. Of course it's not possible for every man page, since some of these are written in too free a form. IOW, it's shell; whatever you program here in more than two lines, you'd better program in Python instead (and I'm saying this as a low-key Python hater).
