Testing LLVM

blog.regehr.org

87 points by awalGarg 9 years ago · 34 comments

KenoFischer 9 years ago

I always find committing to LLVM very nerve-wracking, because of the post-commit CI testing. LLVM has so many architectures that more often than not something I write will fail on one of them. And the only way for me to find out is to commit it, wait for the buildbot to fail (which can take a few hours, during which I really can't leave my computer, lest I leave trunk broken on some buildbot, which is a big faux pas), revert it, and then figure out what went wrong. I'm hoping that at some point this will be improved, so that I can run the whole buildbot army on my commit before putting it on trunk.

  • shanemhansen 9 years ago

    That's odd. It would be nice if branches could be tested with the buildbot.

    • gus_massa 9 years ago

      I agree. When I submit a commit for Racket, the branch is tested in Travis and AppVeyor. That catches a lot of errors.

      Anyway, Racket has an additional internal CI called DrDr and some very subtle errors are only detected there, after the commit is merged.

  • wyldfire 9 years ago

    I get the feeling that there are parts of the community that feel the same way. I'm hoping that the planned move to GitHub will naturally cascade into pre-commit checks.

  • Locke1689 9 years ago

    Most of the .NET repos are structured to use inner and outer loop testing with Jenkins. Most tests on most architectures are run in the inner loop, which are kicked off in parallel as soon as you make a pull request to one of the .NET repositories on Github.

    Some repositories, like CoreCLR, have outer loop testing that runs on a separate schedule (nightly, I think), but those tests are far less likely to break and are more devoted to finding rare and difficult-to-compute edge cases.

SloopJon 9 years ago

Interesting to see screenshots of LCOV. I'm hoping to get an intern to work on test coverage this summer, and I wondered whether LCOV is still current. Looks like the latest release is from December 2016.

bootload 9 years ago

"Compilers are usually not networked, concurrent, or timing-dependent, and overall interact with the outside world only in very constrained ways."

Are there any parallel compilers?

  • tux3 9 years ago

    I can say that for C and C++, the compilation is very often parallelized at the translation unit (file) level, by starting multiple instances of the compiler either locally or over a network with something like distcc. This is simple and effective enough that there wouldn't be much gain in parallelizing the compilers: all the cores are already busy most of the time.
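    The translation-unit-level parallelism tux3 describes can be sketched roughly as follows (a toy Python model, not a real build tool; `compile_tu` and `link` are hypothetical stand-ins for compiler and linker invocations):

```python
from concurrent.futures import ThreadPoolExecutor

def compile_tu(source):
    # Stand-in for one compiler invocation: each translation unit
    # (.c file) compiles to an object file independently of the
    # others, so these calls can run concurrently.
    return source.replace(".c", ".o")

def link(objects):
    # Linking consumes all object files at once, so it runs last.
    return "a.out <- " + " ".join(objects)

sources = ["main.c", "parser.c", "codegen.c", "util.c"]

# Build tools like make -j or distcc exploit exactly this structure:
# an embarrassingly parallel compile step, then a sequential link.
with ThreadPoolExecutor(max_workers=4) as pool:
    objects = list(pool.map(compile_tu, sources))

print(link(objects))  # a.out <- main.o parser.o codegen.o util.o
```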

  • Locke1689 9 years ago

    It depends on what you mean by parallel.

    Certainly the Roslyn C# compiler is highly parallel. All files are parsed in parallel, then all classes are bound (semantically analyzed) in parallel, then the IL serialization phase is sequential.
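    The phase structure described above can be modeled in a few lines (a hedged sketch in Python rather than C#; the function names are hypothetical and this is not Roslyn's actual code, only the parallel-parse / parallel-bind / sequential-emit shape):

```python
from concurrent.futures import ThreadPoolExecutor

def parse(file):
    # Phase 1: a syntax tree is built per file, with no shared state.
    return ("tree", file)

def bind(tree):
    # Phase 2: semantic analysis per tree, again independent.
    return ("bound", tree[1])

def emit(bound_units):
    # Phase 3: IL serialization writes one output, so it is sequential.
    return [u[1] for u in bound_units]

files = ["A.cs", "B.cs", "C.cs"]
with ThreadPoolExecutor() as pool:
    trees = list(pool.map(parse, files))  # parallel parse
    bound = list(pool.map(bind, trees))   # parallel bind
result = emit(bound)                      # sequential emit
```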

    • bootload 9 years ago

      "It depends on what you mean by parallel."

      Across different machines, not cores on a chip?

      • Locke1689 9 years ago

        I wouldn't say that's what most people mean by parallel, but in that case I think you're better off building a layer on top of the compiler for that.

        For instance, provided deterministic compilation you could keep a networked cache of compiled libraries that would be delivered as needed.

        Trying to be network-parallel at any finer level is probably a waste of time -- network and (de)serialization overhead would eat away all the advantages.
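        The networked-cache idea above hinges on deterministic compilation making outputs content-addressable: the hash of (source, flags) uniquely identifies the artifact, so a remote cache can serve it by key. A toy model (all names hypothetical):

```python
import hashlib

class BuildCache:
    """Toy model of a networked build cache, assuming deterministic
    compilation: identical inputs always yield identical outputs."""
    def __init__(self):
        self.store = {}

    def key(self, source, flags):
        # Content-address the artifact by hashing all compilation inputs.
        return hashlib.sha256((source + "\0" + flags).encode()).hexdigest()

    def get_or_build(self, source, flags, build):
        k = self.key(source, flags)
        if k not in self.store:           # cache miss: compile locally
            self.store[k] = build(source)
        return self.store[k]              # cache hit: skip compilation

cache = BuildCache()
calls = []

def fake_compile(src):
    calls.append(src)
    return "obj:" + src

cache.get_or_build("int main(){}", "-O2", fake_compile)
cache.get_or_build("int main(){}", "-O2", fake_compile)  # served from cache
print(len(calls))  # 1: the second request never invoked the compiler
```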

  • enqk 9 years ago

    Microsoft's CL.exe is, through the /MP option

    • boris 9 years ago

      Which only has effect if you pass multiple files to compile.

newsat13 9 years ago

One quirk of LLVM is that it doesn't have pre-commit CI.

mp3geek 9 years ago

I do find it strange that such a large project isn't using a better VCS. SVN seems to be very antiquated.

  • wyldfire 9 years ago

    It's its size that makes it difficult to move. Some major ecosystem stuff is designed around the svn infrastructure. When the will arrived to make a change, it seemed natural to migrate not just to a different VCS but a different host. And this seemed to spawn a new debate: monorepo vs multi-repo. [still open AFAIK]

    At the recent 2016 US Dev Conf, there was a consensus to move to git and that the new host would be github.

    Really subjective IMO part: In general, there are tons of really smart folks working on really awesome stuff in LLVM+clang+etc. There's a handful of folks also focusing on the general "plumbing" software within and among those projects. The meta-plumbing job of the dev infrastructure is "kinda interesting" to several folks who want to improve the way the project is developed. But "kinda interesting" doesn't pay the bills, so it's a second (or nth) responsibility for the folks volunteering to work on it. Add to that the "no good deed goes unpunished" rule that they'll get the responsibility/blame after making a sweeping change, and it will require extreme patience and caution.

  • DannyBee 9 years ago

    For a project like LLVM, it just doesn't matter too much. git-svn or plain svn works pretty well for most people. Certainly it matters, and it'll move eventually, but I'd rather see time spent on better testing tools than a "better" VCS.

    When I moved GCC from CVS to SVN, it made life a bit easier, but it wasn't a revolutionary change.

    Which is funny, considering how often people argue about VCS systems.

    • Locke1689 9 years ago

      I don't agree.

      Before we moved to Git, Roslyn was on TFS, which was basically Perforce/CVS/SVN.

      You're absolutely right that the distinction among the former VCS's is minimal. However, Git offers value that was transformative compared to the former. Namely,

        1. Git allows you to easily switch between multiple work items while keeping track of the work done in each item.
      
        2. Git allows you to easily integrate with people who have significantly diverged clones of your tree without too much trouble.
      
        3. Git allows you to easily work offline.
      
      (1) is definitely the largest benefit, but was mitigated with tools like g5 when I was at Google. However, the Google gravity well has its own drawbacks.

      (2) is very important if you want to host rapid release schedules with divergence of features. It's especially useful if you want to have long stabilization periods and low-risk bug fixes delivered in parallel to multiple customers.

      (3) is pretty self-explanatory, but most people underestimate how much downtime their VCS has. I'd bet that for most people it's significantly below five 9's of availability. Not only is that wasted time, it's frustrating because it's usually consecutive and removes entire working days at random.
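      For reference, the "9's" figure translates into concrete downtime per year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes(nines):
    # Availability of n nines = 1 - 10**-n; the remainder is downtime.
    return MINUTES_PER_YEAR * 10 ** -nines

# Five nines (99.999%) allows only about 5 minutes of downtime a year;
# three nines (99.9%) already allows nearly nine hours.
print(round(downtime_minutes(5), 1))  # 5.3
print(round(downtime_minutes(3)))     # 526
```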

      • saurik 9 years ago

        I take it you haven't actually used the tool mentioned in the comment you replied to, namely git-svn? My use of svn to interface with projects using Subversion has been almost entirely replaced by git-svn, and anyone who has used it will realize that at least offline work now functions like git. Taking a step back: at some point, what you run on the server is just a storage format; unless you use some of the more advanced Subversion features (at which point you might actually like using it), it generally maps pretty directly to git semantics, at which point essentially all other functionality differences are mere porcelain.

        • lomnakkus 9 years ago

          Presumably not everyone is using git-svn... otherwise what would be the point of sticking with svn?

    • paulddraper 9 years ago

      Not a big surprise, considering that SVN is "CVS done right".

      Compare that to Git's author: "Subversion has been the most pointless project ever started. There is no way to do cvs right."

      Git and Hg (+ the many tools that surround them: GitHub, Bitbucket, Gerrit, GitLab, etc.) have a model that makes community contribution far easier than CVS and SVN.

      • saurik 9 years ago

        The community contribution concepts in git are great, but it is confusing to then mention GitHub: their modus operandi is to provide tooling to make things easier that are only hard if you insist on misusing git as if it were Subversion (by for example having a single centralized repository with multiple committers, requiring complex and annoying access control and public key management). If someone had built tooling like GitHub around Subversion and then encouraged use of svk (note the "k"; this was a replacement client for Subversion that supported offline operation and had better merging support, but which worked with any svn server), things would have felt much more reasonable before; the irony is that if you follow the actual git workflow used by Linus for Linux (where everyone has their own repository, rather than at best their own branch and at worst trying to share master), you shouldn't even need any of that for git :/.

        • paulddraper 9 years ago

          > actual git workflow used by Linus for Linux

          When I install Linux 2.4, that centralized version comes from somewhere.

          I agree that svk could have made for a serviceable GitHub, but the fact that Git had such things natively supported is a big advantage.

          • saurik 9 years ago

            Yes, and the centralized version comes from a tree that in the Linux workflow is only able to be modified by one person. You submit patches via email or pull requests (literal ones, to pull from a repository); you don't share commit bits on a centralized repository.

  • Groxx 9 years ago

    Until semi-recently, Git[1] wouldn't let you do a shallow checkout and still do useful things. For a large project, for most purposes, downloading all of history is pointless and immensely wasteful. SVN handles that just fine, and people who want git locally can use git-svn.

    (edit: LLVM is surprisingly small, actually - a git clone comes in at just under 900MB. for more painful examples tho, see repos that commit(ted) binaries, or the scale of Android's repos)

    [1]: AFAIK Mercurial still has no built-in support, though extensions exist. Which is probably the right choice for Mercurial.

    • tux3 9 years ago

      >LLVM is surprisingly small, actually - a git clone comes in at just under 900MB

      That's a little bit on the small side, but it's still very manageable. For comparison, Linux's .git folder comes in at 1.3GB on my computer, and LibreOffice's repo, which has git history going back to the year 2000, weighs some 3.6 GB. I can happily say that I haven't had any performance or space problems dealing with either full repo, even on my fairly weak laptop.
