Why *not* parse `ls` (and what to do instead)?

OP's Stated Intention Addressed

preface and original answer's rationale (updated on 2015-05-18)

mikeserv (the OP) stated in latest update to his question: "I do consider it a shame though that I first asked this question to point out a source of misinformation, and, unfortunately, the most upvoted answer here is in large part misleading."

Well, okay; I feel it was rather a shame that I spent so much time trying to figure out how to explain my meaning only to find that out as I re-read the question. This question ended up "[generating] discussion rather than answers" and weighed in at ~18K of text (for the question alone, just to be clear), which would be long even for a blog post.

But StackExchange is not your soapbox, and it's not your blog. However, in effect, you have used it as at least a bit of both. People ended up spending a lot of time answering your "To-Point-Out" instead of answering people's actual questions. At this point I will be flagging the question as not a good fit for our format, given that the OP has stated explicitly that it wasn't even intended to be a question at all.

At this point I'm not sure whether my answer was to the point, or not; probably not, but it was directed at some of your questions, and maybe it can be a useful answer to someone else; beginners take heart, some of those "do not"s turn into "do sometimes" once you get more experienced. :)

As a General Rule...

please forgive remaining rough edges; i have spent far too much time on this already... rather than quote the OP directly (as originally intended) i will try to summarize and paraphrase.

[largely reworked from my original answer]
upon consideration, i believe that i mis-read the emphasis that the OP was placing on the questions i answered; however, the points addressed were brought up, and i have left the answers largely intact as i believe them to be to-the-point and to address issues that i've seen brought up in other contexts as well regarding advice to beginners.

The original post asked, in several ways, why various articles gave advice such as «Don't parse ls output» or «You should never parse ls output», and so forth.

My suggested resolution to the issue is that instances of this kind of statement are simply examples of an idiom, phrased in slightly different ways, in which an absolute quantifier is paired with an imperative [e.g., «don't [ever] X», «[you should] always Y», «[one should] never Z»] to form statements intended to be used as general rules or guidelines, especially when given to those new to a subject, rather than being intended as absolute truths, the apparent form of those statements notwithstanding.

When you're beginning to learn new subject matter, and unless you have some good understanding of why you might need to do otherwise, it's a good idea to simply follow the accepted general rules without exception, unless under guidance from someone more experienced than yourself. With rising skill and experience you become better able to determine when and if a rule applies in any particular situation. Once you do reach a significant level of experience, you will likely understand the reasoning behind the general rule in the first place, and at that point you can begin to use your judgement as to whether and to what extent the reasons behind the rule apply in that situation, and also as to whether there are perhaps overriding concerns.

And that's when an expert, perhaps, might choose to do things in violation of "The Rules". But that wouldn't make them any less "The Rules".

And, so, to the topic at hand: in my view, just because an expert might be able to violate this rule without getting completely smacked down, i don't see any way that you could justify telling a beginner that "sometimes" it's okay to parse ls output, because: it's not. Or, at least, certainly it's not right for a beginner to do so.

You always put your pawns in the center; in the opening one piece, one move; castle at the earliest opportunity; knights before bishops; a knight on the rim is grim; and always make sure you can see your calculation through to the end! (Whoops, sorry, getting tired, that's for the chess StackExchange.)

Rules, Meant to Be Broken?

When reading an article on a subject that is targeted at, or likely to be read by, beginners, often you will see things like this:

  • "You should not ever do X."
  • "Never do Q!"
  • "Don't do Z."
  • "One should always do Y!"
  • "C, no matter what."

While these statements certainly seem to be stating absolute and timeless rules, they are not; instead this is a way of stating general rules [a.k.a. "guidelines", "rules of thumb", "the basics", etc.] that is at least arguably one appropriate way to state them for the beginners that might be reading those articles. However, just because they are stated as absolutes doesn't mean the rules bind professionals and experts [who were likely the ones who summarized such rules in the first place, as a way to record and pass on knowledge gained as they dealt with recurring issues in their particular craft.]

Those rules certainly aren't going to reveal how an expert would deal with a complex or nuanced problem, in which, say, those rules conflict with each other; or in which the concerns that led to the rule in the first place simply don't apply. Experts are not afraid to (or should not be afraid to!) simply break rules that they happen to know don't make sense in a particular situation. Experts are constantly balancing various risks and concerns in their craft, and must frequently use their judgement to choose to break those kinds of rules, since they have to weigh various factors and can't just rely on a table of rules to follow. Take goto as an example: there's been a long, recurring debate on whether gotos are harmful. (Yeah, don't ever use gotos. ;D)

A Modal Proposition

An odd feature, at least in English, and I imagine in many other languages, of general rules, is that they are stated in the same form as a modal proposition, yet the experts in a field are willing to give a general rule for a situation, all the while knowing that they will break the rule when appropriate. Clearly, therefore, these statements aren't meant to be equivalent to the same statements in modal logic.

This is why i say they must simply be idiomatic. Rather than truly being a "never" or an "always" situation, these rules usually serve to codify general guidelines that tend to be appropriate over a wide range of situations, and that, when beginners follow them blindly, are likely to lead to far better results than the beginner choosing to go against them without good reason. Sometimes going against such a rule leads merely to substandard results rather than to outright failure.

So, general rules are not the absolute modal propositions they appear to be on the surface, but instead are a shorthand way of giving the rule with a standard boilerplate implied, something like the following:

unless you have the ability to tell that this guideline is incorrect in a particular case, and prove to yourself that you are right, then ${RULE}

where, of course you could substitute "never parse ls output" in place of ${RULE}. :)

Oh Yeah! What About Parsing ls Output?

Well, so, given all that... i think it's pretty clear that this rule is a good one. First of all, the real rule has to be understood to be idiomatic, as explained above...

But furthermore, it's not just that you have to be very good with shell scripting to know whether it can be broken in some particular case. It's also that it takes just as much skill to tell that you got it wrong when you are testing your attempt to break it! And I say confidently that a very large majority of the likely audience of such articles (giving advice like «Don't parse the output of ls!») can't do those things, and those who do have such skill will likely figure it out on their own and ignore the rule anyway.

But... just look at this question, and how even people that probably do have the skill thought it was a bad call to do so; and how much effort the author of the question spent just getting to the current best example! I guarantee you that on a problem that hard, 99% of the people out there would get it wrong, and with potentially very bad results! Even if the method that is decided on turns out to be a good one, until it (or another) ls parsing idea becomes adopted by IT/developer folk as a whole, withstands a lot of testing (especially the test of time) and, finally, manages to graduate to a 'common technique' status, it's likely that a lot of people will try it and get it wrong... with disastrous consequences.

So, I will reiterate one last time.... that, especially in this case, that is why "never parse ls output!" is decidedly the right way to phrase it.
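To make the "you won't even notice you got it wrong" point concrete, here is a minimal sketch. It is not taken from the question or from any of the other answers; the scratch directory and the file names are made up for illustration, and it assumes `mktemp -d` is available (it is not required by POSIX, though most systems have it). It counts the files in a directory twice, once by counting lines of `ls` output and once with a plain glob:

```shell
#!/bin/sh
# Hypothetical scratch directory; all file names below are invented.
dir=$(mktemp -d) || exit 1

touch "$dir/plain.txt" "$dir/two words.txt"
# A newline inside a file name is legal on typical Unix file systems.
touch "$dir/line1
line2.txt"

# Naive approach: count the lines that ls writes to the pipe.
# The embedded newline makes one file look like two lines.
ls_count=$(ls "$dir" | wc -l)

# Glob approach: the shell passes each name along intact, as one word.
glob_count=0
for f in "$dir"/*; do
    [ -e "$f" ] || continue    # skip the literal pattern if nothing matched
    glob_count=$((glob_count + 1))
done

echo "ls | wc -l : $ls_count"
echo "glob loop  : $glob_count"

rm -rf "$dir"
```

With only the first two files present, both counts agree, which is exactly why a quick test tends to pass. It takes the odd file name to show the difference, and a beginner is unlikely to think of testing with one.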

[UPDATE 2014-05-18: clarified reasoning for answer (above) to respond to a comment from OP; the following addition is in response to the OP's additions to the question from yesterday]

[UPDATE 2014-11-10: added headers and reorganized/refactored content; and also: reformatting, rewording, clarifying, and um... "concise-ifying"... i intended this to simply be a clean-up, though it did turn into a bit of a rework. i had left it in a sorry state, so i mainly tried to give it some order. i did feel it was important to largely leave the first section intact; so only two minor changes there, redundant 'but' removed, and 'that' emphasized.]

† I originally intended this solely as a clarification on my original; but decided on other additions upon reflection

‡ see https://unix.stackexchange.com/tour for guidelines on posts