GitHub - Arech/yacce: Non-intrusive compile_commands.json extractor for Bazel

30 min read Original article ↗

Yacce is a non-intrusive compile_commands.json extractor for Bazel (experimental, local compilation, Linux only)

Yacce extracts compile_commands.json and build system insights from a build system by supervising the local compilation process with strace. Yacce primarily supports Bazel (other build systems might be added later).

Motivation

Only open-source history of Bazel development spans for over a decade, and yet - it has a ton of C++ specific features, while one of very important ones, - generation of compile_commands.json, - is still not there. There situation is so ridiculous that even G's own commands had to invent and support their own "wheels" to make compile_commands for their projects (sample refs: 1, 2).

But there already exist several decent generic compile_commands.json extractors, external to Bazel, with hedronvision/bazel-compile-commands-extractor being the most well-known and, probably, respected. Why bother?

There are several reasons:

  • either they don't yield a compile_commands.json that let a LSP server (clangd, usually) to see all the relevant C++ symbols, i.e. don't work well, or
  • their usability is horrible, - extractors I've seen (I don't claim I saw all of them in existence!) requires one to make a certain nontrivial modification of the build system and specifically list there what targets and how exactly are going to be compiled just to spew the damn compile_commands!
    • what if I'm supporting a complex project spanning across multiple code bases, that don't employ such extractor, and I have to work on many code branches across many different remote machines? I'd have to first extract potentially branch specific build targets, and then manually inject extractor's code into the build system. Do this a few times a week, and you'll start to genuinely dislike Bazel (if you don't yet).
    • why it can't be made as simple as, for example, in CMake with its -DCMAKE_EXPORT_COMPILE_COMMANDS=1 ?
  • there is a completely orthogonal to usability and to correctness InfoSec consideration: what if I don't want to add a 3rd party, potentially compromisable dependency, into my project? I have no idea what it does internally there and what could it inject into my binaries under the hood. Why does an extractor just have to be intrusive?

Benefits of yacce

Supervising a build system doing compilation with a standard system tool have several great benefits:

  • Yacce is super user-friendly and simple to use. It's basically a drop-in prefix for a shell command you could use to build the project, be it bazel build ..., bazel run ..., or even MY_ENV_VAR="value" ./build/compile.sh arg1 .. argn. Just prepend your build command with yacce -- and hit enter.
  • strace lets yacce see real compiler invocations, hence compile_commands.json made from strace log reflects the way you build the project precisely, with all the custom configuration details you might have used, and independently of what the build system lets you to know and not know about that.
  • Compilation of all external dependencies as well as linking commands, are automatically included (with a microsecond timing resolution, if needed).
  • There are just no InfoSec risks by design (of course, beyond running a code of yacce itself, though it's rather small and is easy to verify). Yacce is completely external to the build system and doesn't interfere with it in any way.

Limitations

However, the supervising approach have some intrinsic limitations, which make it not suitable for all use-cases supported by Bazel:

  • strace needs to be installed (apt install strace), which limits yacce to basically Linux only.
  • compilation could only happen locally, on the very same machine, on which yacce runs. This leaves out a Bazel RBE, and requires building the project from an empty cache, if the cache is used.
  • while yacce doesn't care how you launch the build system and lets you use any script or a command you like, eventually, it should build only one Bazel workspace. Yacce does not check if this limitation is respected by a user, though typically, it's easy to fulfil.

If this is a hard no-go for you, suffer with consider other extractor, such as the above mentioned hedronvision's tool.

There are some "soft" limitations that might be removed in the future, such as:

  1. currently yacce does not support incremental builds (i.e. you'd have to fully recompile the project to update compile_commands.json). The fix for that is simple and just a matter of implementation.
  2. It looks like strace sometimes might produce...misformed logs. I always get what I expect on Debian 12-13, but I had to implement a special handling for unexpected line-breaks it sometimes produces on Ubuntu 22.04. I can't guarantee that there are no other quirks that could break log parsing.
  3. Bazel is monstrous. While yacce works nicely with some code bases, there might be edge cases, that aren't properly handled.
  4. One can't just take all the compiler invocations a build system does and simply dump them to a compile_command.json. A certain filtering is mandatory, and that requires parsing compiler's arguments:
    • gcc- and clang- compatible compilers are the only supported.
    • 100% correct compiler's argument parsing requires implementing 100% of compiler's own CLI parser, which is not done and will never be done. Yacce's parser is good enough for many uses, but certainly not for all. Yacce could diagnose some edge cases and warn of potentially incorrect results, but, again, - certainly not all edge cases are covered by the diagnostics.

You're unlikely to hit the last two. However, if you will, you know what to do (please file a bug report, or better submit a PR).

Give yacce a try with pip install yacce! Prepend the build command with yacce -- and let me know how it goes!

Important

Always add a clangd server startup argument --compile-commands-dir= to point to your directory with compile-commands.json file. It might not work properly without that!

In VSCode it's as simple as opening Settings / Extensions / clangd, and adding to the first setting clangd.arguments an item with a content --compile-commands-dir=${workspaceFolder} (or any other path). It might not use the compiler_commands.json without that.

Examples of extracting compile_commands.json from Bazel with yacce

First, install yacce with pip install yacce. Python 3.10+ is supported.

Second, ensure you have strace installed with sudo apt install strace. Some distributions have it installed by default.

Example 1, extracting compile_commands.json for JAX (jaxlib)

An elaborate way to say "just prepend your build script with yacce -- "

JAX is one of Google's machine learning frameworks. Part of it is written in Python, while high performance code is in C++. A compiled part is called jaxlib and is responsible for parts of JAX common for all execution backends, and for a CPU-based backend. We'll be using currently the latest JAX v0.7.2 here.

Compiling jaxlib is a good first example for yacce, because it has quite a sizeable code base with at least one dependency, XLA (a machine learning compiler), that is almost always being worked upon in parallel with the jaxlib itself. By default, JAX's build system fetches XLA from a pinned commit, but since we emulate a real developer work here, we'll also checkout that pinned commit to a local directory, so we could work on it, and then tell JAX's build system to use that local directory instead of the pinned commit. Yacce will automatically generate a single compile_commands.json for both jaxlib and XLA.

First, let's setup the workspace:

mkdir /src_jax && cd /src_jax  # the dir for both JAX and XLA sources
( git clone https://github.com/openxla/xla && cd ./xla \
  && git checkout 0fccb8a6037019b20af2e502ba4b8f5e0f98c8f6 )
git clone --branch jax-v0.7.2 --depth 1 https://github.com/jax-ml/jax

Now we have /src_jax/jax directory having v0.7.2 JAX commit checkout, and /src_jax/xla having the same XLA commit, that's designed for JAX v0.7.2. Time to build!

Without yacce, I'd use the following command inside the ./jax directory:

jax$ python3 ./build/build.py build --wheels=jaxlib --verbose --use_clang false \
  --target_cpu_features=native --bazel_options=--override_repository=xla=../xla

With yacce, it's just prepending the command with yacce -- like this:

jax$ yacce -- python3 ./build/build.py build --wheels=jaxlib --verbose --use_clang false \
  --target_cpu_features=native --bazel_options=--override_repository=xla=../xla

At the start, yacce will test if strace and bazel are available, and then it will ask your permission to execute bazel clean command. Starting with a clean state is mandatory for yacce to capture all compilation commands, but since cleaning and rebuilding from scratch might be expensive, yacce tries to prevent accidental harm by asking a permission. You can authorize it to do that from command line with --clean always argument put before -- separator like this: yacce --clean always -- .

After doing bazel clean, yacce will setup strace supervision over Bazel's server execution, and then launch the build script. When the build finishes, yacce will start strace log processing and in few seconds it'll write /src_jax/jax/compile_command.json containing all C++ source files used for jaxlib and for parts of XLA, that were required by jaxlib.

Tip

If a build system fails or cancelled manually (such as by hitting Ctrl+C), by default yacce will still try to process the gathered strace log and produce at least something. You can use that to you advantage by running scripts invoking other bazel commands, such as bazel run or bazel test under yacce. Just be aware that not all programs are friendly to strace. For example, address sanitizer can't work under strace, so all tests compiled with the sanitizer will fail under yacce/strace.

Now fire up your IDE and point clangd to that file, so it starts indexing it. In VSCode with clangd extension installed, if /src_jax is the main opened directory (workspace), then one could open Settings / Extensions / clangd, and click "Add Item" for clangd.arguments settings, putting --compile-commands-dir=${workspaceFolder}/jax there and then do ctrl+shift+p, "clangd.restart".

Example 2, how to extract compile_commands.json for AMD's flavor of JAX (handling several Bazel workspaces)

To build JAX running on AMD GPUs, one would have to work with a rocm/rocm-jax meta-repository. How to set up the dev environment is described in the appropriate document of the repo (we'll be using a specific commit 0acd84e9d095814f3c2d487ff5c326e56c6d0dc3 of the repo for reproducibility, it corresponds to JAX v0.6.0).

The most important feature of this project in the context of yacce is that after the dev setup is finished (more precisely, after python3 stack.py develop has been run and ./jax_rocm_plugin/Makefile is generated. Hint: take a look into this file to understand what commands actually invoked to build the wheels!), the topmost directory of the repo checkout will contain 3 bazel workspaces:

  • ./jax_rocm_plugin is the main workspace one would use to build the plugin and pjrt wheels,
  • ./jax is the checkout of a relevant commit of the upstream JAX, that is used as a dependency of the ROCm support wheels, and might be needed if one is to rebuild the jaxlib wheel using own XLA checkout.
  • ./xla is the checkout of the relevant commit of AMD's fork of XLA. It's used to build the wheels as a dependency, and might be useful for running XLA tests on its own.

Typical build procedure to make all 3 wheels (plugin, pjrt and jaxlib) is:

jax_rocm_plugin$ make refresh refresh_jaxlib

However, this is the case where we can't just prepend yacce -- and get away with it, because from the commands in ./jax_rocm_plugin/Makefile it's obvious that make refresh uses Bazel workspace ./jax_rocm_plugin, while make refresh_jaxlib uses workspace in ./jax. Yacce can monitor only a single bazel workspace at a time, so we have to split these in two. No special treatment for the first part, make refresh, is needed, it's just a simple:

jax_rocm_plugin$ yacce -- make refresh

But we should do something for the second part, make refresh_jaxlib: by default, yacce produces two files in the current directory:

  • strace.txt recording log file (it's useful for re-running compile_commands.json extraction with a different yacce settings by using old compilation recording, but if know which yacce settings you need upfront, you can use yacce --keep_log if_errors -- to remove it automatically if build succeeds), and
  • compile_commands.json generated from the log.

While for building jaxlib we can instruct yacce to generate these files in a different directory (./jax is the most suitable one), it's easier just to invoke yacce from this different directory and instruct it to switch to a different directory before running the build script. The latter is done with a single --build_cwd argument. Here's the full commands line for that to execute from ./jax directory:

jax$ yacce --build_cwd ../jax_rocm_plugin -- make refresh_jaxlib

This will generate compile_commands.json for jaxlib in ./jax directory.

Example 3, extracting compile_commands.json while running bazel test for XLA

Yacce can extract compile_commands.json from running any script as long as it eventually builds a single bazel workspace. Running bazel test when no tests are built is an example where this could be useful. We could just run it under yacce and after bazel reports it finished compiling of all tests, just cancel tests execution with Ctrl+C to let yacce proceed with compile_commands generation faster.

In relation to tests of ROCm fork of XLA (we'll use a dev environment from the previous example), there are two things to be aware of:

  1. The testing script ./build_tools/rocm/run_xla.sh uses Bazel --disk_cache argument to setup local caching. This speeds up re-builds of the project, but prevents yacce from seeing compilation of a cached entity. The fix for this is either disable disk caching for run under yacce (require modification of the run_xla.sh script) or just start from a clean cache. We'll do the latter by calling rm -rf /tf/disk_cache/rocm-jaxlib-v0.6.0 before running yacce.
  2. XLA tests are built with sanitizers, and most of them aren't working well under strace. Typically this results in just test failures. This is fine, since we're running the tests script for producing compile_commands.json. Just be aware of that.
  • another thing to be aware of is that clangd, as of now, chokes when it finds use of sanitizers in compile_commands.json, but yacce by default removes compilation arguments related to that. If needed, you can control that behaviour with certain yacce switches, see below.

Therefore, we need to run yacce in the following manner from ./xla directory:

xla$ rm -rf /tf/disk_cache/rocm-jaxlib-v0.6.0 ; \
    yacce -- ./build_tools/rocm/run_xla.sh

After bazel reports that it finished compiling and runs only test jobs (these will start to fail due sanitizers incompatibility with strace), hit Ctrl+C to stop tests execution early and let yacce generate compile_commands.json from the build log.

Modes of yacce operation and how to configure them

Yacce currently have at least 2 main modes of operation and few submodes, and have a pretty extensive configurability. Just run yacce --help to see the details. Below is a typical output you can expect, but please note this could be obsolete. Please always query yacce for the actual features and settings it supports.

Main options

$ yacce -h
usage: yacce [-h] [--debug {0,1,2,3,4,5,6}] [--colors | --no-colors] {bazel,from_log} ...

Yacce extracts compile_commands.json and build system insights from a build system by
supervising the local compilation process with strace.
Primarily supports Bazel (other build systems might be added later).
--> Homepage: https://github.com/Arech/yacce

positional arguments:
  {bazel,from_log}      Modes of operation. Use "--help" with each mode to get more
                        information.
    bazel               Runs a given build system based on Bazel in a shell and extracts
                        compile_commands.json from it (possibly with individual
                        compile_commands.json for each external dependency).
                        This is a default mode activated if the mode specification is just
                        omitted.
                        Hint: use 'yacce bazel --help' to get CLI arguments help.
    from_log            [dbg!] Generates a possibly NON-WORKING(!) compile_commands.json from
                        a strace log file.
                        This mode features the most generic way to parse strace output and
                        since the log generally lacks some important information (such as the
                        working directory in case of a Bazel), it may produce a non-working
                        compile_commands.json. The mode is primarily intended for debugging
                        purposes as it doesn't use any knowledge about the build system used
                        and just parses the strace log file and turns it into
                        compile_commands.json as is.
                        Hint: use 'yacce from_log --help' to get CLI arguments help.

options:
  -h, --help            show this help message and exit
  --debug {0,1,2,3,4,5,6}
                        Minimum debug level to show. 0 is the most verbose.
                        Default level is (info=) 2. Setting it higher than a (warning=) 3 is
                        not recommended.
  --colors, --no-colors
                        Controls if the output could be colored. (default: True)

Options for Bazel mode

$ yacce bazel -h
usage: yacce [global options] [bazel] [options (see below)] [-- shell command eventually
invoking Bazel]

Yacce extracts compile_commands.json and build system insights from a build system by
supervising the local compilation process with strace.
Primarily supports Bazel (other build systems might be added later).
--> Homepage: https://github.com/Arech/yacce

Mode 'bazel' is intended to generate compile_commands.json from tracing execution of a 'bazel
build' or any other shell command invoking Bazel using the Linux's strace utility. Hence it
only supports compilation of a single Bazel workspace (including its all external
dependencies) happening locally. If you are using Bazel's remote caching feature, including '
--disk_cache', please make sure you're starting with a clean cache, otherwise yacce won't see
compilation of cache hits.

options:
  -h, --help            show this help message and exit
  --log_file path/to/file
                        Write strace log to and/or read it from this file.
                        See also '--from_log'.
                        Default: 'strace.txt' in the current directory,
                        /home/arech/Documents/src/my/yacce/mantest/strace.txt
  --external {ignore,combine-with-overridden,to-files,to-external,combine-all}
                        Determines what to do when a compilation of a project's dependency
                        source file (from 'external/' subdirectory) is found.
                        - One option is to just to 'ignore' (remove) it and to leave in the
                        resulting compile_commands.json *only* commands directly related to
                        the project.
                        - Option 'combine-with-overridden' produces a single
                        compile_commands.json containing main project's files as well as
                        dependencies that are stored *outside* of their expected location at
                        '$(bazel info output_base)/external/<repo>' (this typically happens
                        when you override a dependency repo location for Bazel when you work
                        on the project and its dependency simultaneously).
                        - Option 'to-files' produces individual files nearby the main
                        compile_commands.json, named like'compile_commands_ext_<repo>.json'
                        for each external dependency '<repo>' (this might be useful for
                        manual inspection).
                        - Option 'to-external' differs from 'to-files' only in the location
                        and naming of the resulting files. 'to-external' produces an
                        individual compile_commands.json in each external dependency's
                        directory and is useful when you're going to open the dependency
                        directory in a parallel IDE for a close inspection.
                        - The 'combine-all' (default) option just writes all compilation
                        commands (for the main project and its dependencies) into a single file.
                        See '--dest_dir' argument for a default location and/or override for
                        the main project's compile_commands.json file.
                        NOTE that since currently yacce doesn't properly process compiler
                        invocations that aren't related to compiling C or C++ sources (such
                        as linking only, or compiling ASM files), if '--other_commands' flag
                        is specified, a compound other_commands.json (containing all other
                        invocations of a compiler for the main project and its externals)
                        will be saved nearby the main compile_commands.json irrespective of a
                        value of this flag.
  --bazel_command command_or_filepath
                        Override which command to run to communicate with the instance of a
                        Bazel for the build system.
                        You don't typically need this argument, if you have bazelisk
                        installed.
                        To set the workspace directory see '--bazel_workspace' argument.
                        Default: bazel
  --bazel_workspace path/to/dir
                        Overrides Bazel workspace directory to set a current directory
                        context for the bazel command (see '--bazel_command').
                        This is useful if yacce needs to be run from an outside of that
                        workspace. Note that any dir under a real workspace would also work
                        here.
                        Default: a current working directory
                        '/home/arech/Documents/src/my/yacce/mantest'.
  --build_cwd path/to/dir
                        By default, a shell command to start the build is invoked from the
                        Bazel workspace directory (see '--bazel_workspace'). This argument
                        allows to override that and set a different directory as a cwd for
                        the build command.
                        Note that this is different from '--cwd' argument, which for the
                        'bazel --from_log' mode of yacce specifies a value of '$(bazel info
                        execution_root)' directory.
  --build_shell shell_to_use
                        Build command is executed by passing it to a shell. By default,
                        'bash' is used, but you can override that with this argument.
  --ensure_build_succeeds, --no-ensure_build_succeeds
                        By default yacce only warns if the build command fails (exits with a
                        non-zero code) and it tries to process the strace log file to produce
                        some results anyway. If you want to make sure that yacce will only
                        use a full log of a successful build, set this argument to enforce
                        yacce failure if the build fails.
                        (default: False)
  --cwd path/to/dir     Path to the working directory of the compilation.
                        This value goes to a 'directory' field of an entry of
                        compile_commands.json and is used to resolve relative paths found in
                        the command. If '--ignore-not-found' argument isn't set, yacce will
                        try to test if mentioned files exist in this directory and warn if
                        they aren't. Note that passing the file existence test helps, but
                        doesn't guarantee that the resulting compile_commands.json will be
                        correct.
                        In the 'yacce bazel --from_log' mode, this argument overrides an
                        output of '$(bazel info execution_root)' (i.e. this enables parsing
                        of an existing log file without querying its build system). In the
                        default live mode (when no '--from_log' argument is specified) this
                        argument is either has to be unset, or match the output of '$(bazel
                        info execution_root)'.
  --ignore-not-found, --no-ignore-not-found
                        If the flag is set, yacce will not test if files to be added to .json
                        exists, and will not attempt to test if an invoked compiler is a
                        script wrapper.
                        (default: False)
  -o, --other_commands, --no-other_commands
                        If set, yacce will also generate other_commands.json file.
                        This file has a similar to compile_commands.json format, but contains
                        all other compiler invocations found that aren't useful for gathering
                        C++ symbol information of the project, but handy to get insights
                        about the build in general (such as for compiling assembler sources
                        or for linking).
                        Note that yacce currently does not implement attribution of other
                        compilation commands to the project's external dependencies. I.e. all
                        other commands related to compiling non-C++ sources and linking will
                        be combined into a single other_commands.json file irrespective of '
                        --external' argument setting. (default: False)
  --save_duration, --no-save_duration
                        If set, yacce will add a 'duration_s' field into the resulting .json
                        that contain how long the command run in seconds with a microsecond
                        resolution.
                        This feature currently doesn't have automated use, but the file can
                        be inspected manually, or with a custom script to obtain build system
                        performance insights.
                        WARNING: current clangd gets upset when it finds a field it doesn't
                        know, so enabling this option might prevent you from using clangd
                        with the resulting file!
                        (default: False)
  --save_line_num, --no-save_line_num
                        If set, yacce will add a 'line_num' integer field into the resulting
                        .json that contain a line number of the compiler call in the strace
                        log file.
                        Useful for debugging, but have no automated use.
                        WARNING: current clangd gets upset when it finds a field it doesn't
                        know, so enabling this option might prevent you from using clangd
                        with the resulting file!
                        (default: False)
  --discard_outputs [PathFilter ...]
                        A build system can compile some dummy source files only to gather
                        information about compiler capabilities. Presence of these files in
                        the compile_commands.json aren't usually helpful. Typically, such
                        files are placed into /tmp or /dev/null, but other variants are
                        possible.
                        This setting lets you customize finding which compiler output files
                        should lead to ignoring the whole compilation call.
                        See a 'PathFilter specification' section below for details.
                        Pass an empty string "" to disable. Accepts multiple values at once.
                        Default: ['/dev/null', '/tmp/+', '+/.cache/ccache/tmp/+'].
  --discard_sources [PathFilter ...]
                        Similar to --discard_outputs, but controls which source file path
                        patterns should lead to ignoring the compiler call.
                        See a 'PathFilter specification' section below for details.
                        Pass an empty string "" to disable. Accepts multiple values at once.
                        Default: [''].
  --discard_args_with_pfx [+compiler_arg_prefix ...]
                        Certain compiler arguments, such as sanitizers, are known to choke
                        clangd. Some others like those concerning build reproducibility might
                        be useless for C++ symbols.
                        Set a value of this parameter to a sequence of prefixes to match and
                        remove such compiler arguments.
                        Pass an empty string "" to disable. Accepts multiple values at once.
                        ATTENTION: since Python's argparse always treats a leading dash in a
                        CLI argument as a script's argument name, but not value, use a plus
                        sign '+' instead of a dash '-' to specify a leading dash. Example:
                        instead of '-fsanitize' use '+fsanitize'.
                        Default: ['+fsanitize'].
  --discard_args [+compiler_arg_or_args_pair_spec ...]
                        Similarly to '--discard_args_with_pfx', values for this argument
                        define a set of compiler arguments (such as '-DMY_DEF=VALUE') or
                        pipe-delimited argument pairs (like a single token value
                        '-I|/certain/dir' defines a two token pair '-I /certain/dir') that
                        will be removed from a compiler invocation.
                        Note that a single token specification of a '-D' compiler argument
                        has a special handling and also addresses its two token alternatives.
                        Pass an empty string "" to disable. Accepts multiple values at once.
                        ATTENTION: since Python's argparse always treats a leading dash in a
                        CLI argument as a script's argument name, but not value, use a plus
                        sign '+' instead of a dash '-' to specify a leading dash. For the
                        above it's '+DMY_DEF=VALUE' and '+I|/certain/dir'.
                        Default: ['+DADDRESS_SANITIZER'].
  --enable_dupes_check, --no-enable_dupes_check
                        If set, yacce will report if a pair <source, output> isn't unique.
                        Usefulness of this flag solely depends on actual build system
                        implementation. Some might use lots of temporary compilations just to
                        gather compiler capabilities which could lead to an avalanche of
                        false positives. This could be mitigated with --discard* family of
                        flags, but this requires manual intervention, hence it's disabled by
                        default.
                        (default: False)
  -c [compiler_basename_or_path_fragment ...], --compiler [compiler_basename_or_path_fragment ...]
                        Adds an absolute path, a basename, a path suffix, or a path prefix
                        (prepend it with a plus '+' symbol) of a custom compiler to the set
                        of compilers already detectable by yacce. Accepts multiple values at
                        once.
  --not_compiler [compiler_basename_or_path_fragment ...]
                        You can prevent a certain absolute path, a basename, a path suffix,
                        or a path prefix (prepend it with a plus '+' symbol) from being
                        treated as a compiler by using this argument. Accepts multiple values
                        at once.
  --enable_compiler_scripts, --no-enable_compiler_scripts
                        By default, yacce doesn't treat a script (classified by testing for
                        shebang '#!' sequence in the first 2 bytes of the file) invocation as
                        a compiler invocation and ignores it. Set this option when this
                        behavior is unwanted.
                        (default: False)
  -d dir/path, --dest_dir dir/path
                        Destination directory in which yacce should create resulting .json
                        files. Must exist.
                        Default: directory of the log file (see '--log_file')

Log mode, uses existing strace log and is mutually exclusive with the live mode:
  --from_log            Toggles a mode in which yacce will only parse an existing log file
                        specified by '--log_file', but will not invoke a build system to spy
                        on.
                        Mutually exclusive with '--keep_log' and requires no build system
                        arguments passed (no '--' argument and anything after it).
                        Default: not set, i.e. the mode is not activated.

Live mode (default), runs a Bazel build system and is mutually exclusive with the log mode:
  --keep_log {if_errors,always,never}
                        Determines conditions of keeping of the strace log file after yacce
                        finishes. Mutually exclusive with '--from_log'.
                        Default is 'always' as it might be useful to run yacce in the log
                        mode with different arguments later on the same log file.
  --clean {always,expunge,never}
                        Determines, if a 'bazel clean' or 'bazel clean --expunge' commands
                        should be executed before running the build.
                        Note that if cleaning is disabled, cached (already compiled) sources
                        will be invisible to yacce and hence will not make it into resulting
                        compiler_commands.json! (iterative updates aren't supported yet)
                        Default: not specified, yacce will ask if running 'bazel clean' is
                        ok.

PathFilter specification:
-------------------------
Filters are tested before and after applying working directory to a path (if the path tested
isn't an abs path), and after expanding an abs path to a realpath(). Recognizable filter
specifications are:
- exact match: filter not starting or ending on a plus '+' sign
- match prefix: filter ending with a plus '+' sign. A path starting with the string before
the plus sign matches the filter.
- match suffix: filter beginning with a plus '+' sign. A path ending with a substring that
follows the plus sign matches the filter.
- substring match: filter beginning with and ending with a plus '+' sign matches as a
substring.

Generic log parsing mode options

$ yacce from_log -h
usage: yacce from_log [-h] [--cwd path/to/dir] [--ignore-not-found | --no-ignore-not-found]
                      [-o | --other_commands | --no-other_commands]
                      [--save_duration | --no-save_duration]
                      [--save_line_num | --no-save_line_num]
                      [--discard_outputs [PathFilter ...]]
                      [--discard_sources [PathFilter ...]]
                      [--discard_args_with_pfx [+compiler_arg_prefix ...]]
                      [--discard_args [+compiler_arg_or_args_pair_spec ...]]
                      [--enable_dupes_check | --no-enable_dupes_check]
                      [-c [compiler_basename_or_path_fragment ...]]
                      [--not_compiler [compiler_basename_or_path_fragment ...]]
                      [--enable_compiler_scripts | --no-enable_compiler_scripts]
                      [-d dir/path]
                      log_file

Yacce extracts compile_commands.json and build system insights from a build system by
supervising the local compilation process with strace.
Primarily supports Bazel (other build systems might be added later).
--> Homepage: https://github.com/Arech/yacce

Mode 'from_log' is a supplementary mode that generates a compile_commands.json from a strace
log file without using any additional information about build system.
ATTENTION: this mode is intended for debugging purposes only and most likely will not produce
a correct compile_commands.json due to a lack of information about the build process details.
If you want to regenerate compile_commands from a log file for Bazel, use 'yacce bazel
--from_log' instead.

positional arguments:
  log_file              Path to the strace log file to parse.

options:
  -h, --help            show this help message and exit
  --cwd path/to/dir     Path to the working directory of the compilation.
                        This value goes to a 'directory' field of an entry of
                        compile_commands.json and is used to resolve relative paths found in
                        the command. If '--ignore-not-found' argument isn't set, yacce will
                        try to test if mentioned files exist in this directory and warn if
                        they aren't. Note that passing the file existence test helps, but
                        doesn't guarantee that the resulting compile_commands.json will be
                        correct.
                        In the 'from_log' mode a relative path specification is resolved to
                        the absolute path using a directory of the log file.
                        Default: directory of the log file.
  --ignore-not-found, --no-ignore-not-found
                        If the flag is set, yacce will not test if files to be added to .json
                        exists, and will not attempt to test if an invoked compiler is a
                        script wrapper.
                        (default: False)
  -o, --other_commands, --no-other_commands
                        If set, yacce will also generate other_commands.json file.
                        This file has a similar to compile_commands.json format, but contains
                        all other compiler invocations found that aren't useful for gathering
                        C++ symbol information of the project, but handy to get insights
                        about the build in general (such as for compiling assembler sources
                        or for linking).
                        (default: False)
  --save_duration, --no-save_duration
                        If set, yacce will add a 'duration_s' field into the resulting .json
                        that contain how long the command run in seconds with a microsecond
                        resolution.
                        This feature currently doesn't have automated use, but the file can
                        be inspected manually, or with a custom script to obtain build system
                        performance insights.
                        WARNING: current clangd gets upset when it finds a field it doesn't
                        know, so enabling this option might prevent you from using clangd
                        with the resulting file!
                        (default: False)
  --save_line_num, --no-save_line_num
                        If set, yacce will add a 'line_num' integer field into the resulting
                        .json that contain a line number of the compiler call in the strace
                        log file.
                        Useful for debugging, but have no automated use.
                        WARNING: current clangd gets upset when it finds a field it doesn't
                        know, so enabling this option might prevent you from using clangd
                        with the resulting file!
                        (default: False)
  --discard_outputs [PathFilter ...]
                        A build system can compile some dummy source files only to gather
                        information about compiler capabilities. Presence of these files in
                        the compile_commands.json aren't usually helpful. Typically, such
                        files are placed into /tmp or /dev/null, but other variants are
                        possible.
                        This setting lets you customize finding which compiler output files
                        should lead to ignoring the whole compilation call.
                        See a 'PathFilter specification' section below for details.
                        Pass an empty string "" to disable. Accepts multiple values at once.
                        Default: ['/dev/null', '/tmp/+', '+/.cache/ccache/tmp/+'].
  --discard_sources [PathFilter ...]
                        Similar to --discard_outputs, but controls which source file path
                        patterns should lead to ignoring the compiler call.
                        See a 'PathFilter specification' section below for details.
                        Pass an empty string "" to disable. Accepts multiple values at once.
                        Default: [''].
  --discard_args_with_pfx [+compiler_arg_prefix ...]
                        Certain compiler arguments, such as sanitizers, are known to choke
                        clangd. Some others like those concerning build reproducibility might
                        be useless for C++ symbols.
                        Set a value of this parameter to a sequence of prefixes to match and
                        remove such compiler arguments.
                        Pass an empty string "" to disable. Accepts multiple values at once.
                        ATTENTION: since Python's argparse always treats a leading dash in a
                        CLI argument as a script's argument name, but not value, use a plus
                        sign '+' instead of a dash '-' to specify a leading dash. Example:
                        instead of '-fsanitize' use '+fsanitize'.
                        Default: ['+fsanitize'].
  --discard_args [+compiler_arg_or_args_pair_spec ...]
                        Similarly to '--discard_args_with_pfx', values for this argument
                        define a set of compiler arguments (such as '-DMY_DEF=VALUE') or
                        pipe-delimited argument pairs (like a single token value
                        '-I|/certain/dir' defines a two token pair '-I /certain/dir') that
                        will be removed from a compiler invocation.
                        Note that a single token specification of a '-D' compiler argument
                        has a special handling and also addresses its two token alternatives.
                        Pass an empty string "" to disable. Accepts multiple values at once.
                        ATTENTION: since Python's argparse always treats a leading dash in a
                        CLI argument as a script's argument name, but not value, use a plus
                        sign '+' instead of a dash '-' to specify a leading dash. For the
                        above it's '+DMY_DEF=VALUE' and '+I|/certain/dir'.
                        Default: ['+DADDRESS_SANITIZER'].
  --enable_dupes_check, --no-enable_dupes_check
                        If set, yacce will report if a pair <source, output> isn't unique.
                        Usefulness of this flag solely depends on actual build system
                        implementation. Some might use lots of temporary compilations just to
                        gather compiler capabilities which could lead to an avalanche of
                        false positives. This could be mitigated with --discard* family of
                        flags, but this requires manual intervention, hence it's disabled by
                        default.
                        (default: False)
  -c [compiler_basename_or_path_fragment ...], --compiler [compiler_basename_or_path_fragment ...]
                        Adds an absolute path, a basename, a path suffix, or a path prefix
                        (prepend it with a plus '+' symbol) of a custom compiler to the set
                        of compilers already detectable by yacce. Accepts multiple values at
                        once.
  --not_compiler [compiler_basename_or_path_fragment ...]
                        You can prevent a certain absolute path, a basename, a path suffix,
                        or a path prefix (prepend it with a plus '+' symbol) from being
                        treated as a compiler by using this argument. Accepts multiple values
                        at once.
  --enable_compiler_scripts, --no-enable_compiler_scripts
                        By default, yacce doesn't treat a script (classified by testing for
                        shebang '#!' sequence in the first 2 bytes of the file) invocation as
                        a compiler invocation and ignores it. Set this option when this
                        behavior is unwanted.
                        (default: False)
  -d dir/path, --dest_dir dir/path
                        Destination directory in which yacce should create resulting .json
                        files. Must exist.
                        Default: current working directory.

PathFilter specification:
-------------------------
Filters are tested before and after applying working directory to a path (if the path tested
isn't an abs path), and after expanding an abs path to a realpath(). Recognizable filter
specifications are:
- exact match: filter not starting or ending on a plus '+' sign
- match prefix: filter ending with a plus '+' sign. A path starting with the string before
the plus sign matches the filter.
- match suffix: filter beginning with a plus '+' sign. A path ending with a substring that
follows the plus sign matches the filter.
- substring match: filter beginning with and ending with a plus '+' sign matches as a
substring.

Stability and Changelog

Current version is considered a stable experimental release, which means it works perfectly well on the tested projects without known issues, but due to the facts described in "Limitations" section, might give unexpected results on other projects.

Zero in the current major version signify a development status, which is expected to stabilize into version 1.0 soon. After that the project will fully follow the semantic versioning scheme.

Depending on a feedback, the project or its individual components might get breaking changes, so you might want to use version pinning to prevent unexpected breakages if you're relying on it in automated scripts. See CHANGELOG.md for details.