Tree – a lib for working with nested data structures, open-sourced by deepmind
github.comFor those using Bazel this looks like a pretty helpful example of developing and releasing a cross-platform Python package with native-libs (C++ in this case).
I've wondered for a while how Deepmind manages their (Python & C++) codebase. This is a small window into that I think, and you can see that they're likely using internal 'Blaze' Python rules and rely on the maintenance of compatibility with the open-source Bazel rules. `py_library` is used in `tree/BUILD`, but it is not loaded explicitly as is expected with Bazel these days; the rule definition is loaded by Bazel implicitly.
Combining C++ with python via pybind is something bazel really excels at, with CMake you end up maintaining two disjoint build and packaging systems.
My project builds 20 binary wheels supporting most platforms except ARM (todo): https://pypi.org/project/elkai/#files It's one line of code using CMake. [1]
Granted, you also need CI that has most Python versions/platforms and the Python CMake extension by scikit team too. Ex. MacOS Travis with brew [2]
[1] https://github.com/filipArena/elkai/blob/ef8ee98d8114240b204...
[2] https://github.com/filipArena/elkai/blob/master/.travis.yml
I thought implicitly doing anything is against the whole bazel philosophy.
You're right, this is likely to require an explicit load in the future, iirc. As for now, the py_* rules are so common that they are implicit. Loading replacement py_* rules overrides the built-in.
From the tree docs:
> tree has originally been part of TensorFlow and is available as tf.nest.
The tf.nest docs can be found here and may be more useful for now: https://www.tensorflow.org/api_docs/python/tf/nest
I'm glad this is being moved out of TensorFlow. It's a really useful library on its own and I've found myself reaching for it on projects without wanting to import all of TensorFlow.
I wonder what is the utility of this library?
Caveat emptor: "The behavior for structures with cycle references is undefined"
Isn't this a pretty common error to occur in Python programs? It would be good to at least have a checked mode available, otherwise the "dragons may fly out of your nose" semantics feels like a pretty aggressive compromise in favour of performance vs correctness.
I'm not sure if they mean this strong form of "undefined", or something milder like "might run forever or cause stack overflows".
... It is called "tree", so if we're going by graph theory and typical ADT form then being acyclic is to be expected.
While I agree that a checked mode would be useful, and would likely be the best compromise for a "one-off" use, if you're preserving the structure like this does, then I feel it's quite likely you'd want to do multiple passes. If so, then pre-processing to ensure that it's acyclic would be useful: rather than a checked mode, perhaps an is_acyclic() method, similar to their is_nested(), would be the better compromise? That would allow it to be a nicely optimised function for just that, rather than adding overhead for "nested structures" that are known to be acyclic.
Adding is_acyclic() also means you can handle "sanitising" it yourself. I say that because it may be non-trivial depending on what you're doing and so not best-handled by an passed-through lambda etc. that may need to modify the structure mid "iteration", which would be awkward. Also, you're then doing it up-front, rather than handling an exception, which may have interrupted side-effects, and then makes it awkward because how do you modify and resume without repeating work / side-effects?
As to whether it's pretty common error or not, we'd probably need a wider study to see whether this has caused errors -- after all, sometimes cyclic / self-referential structures are correct. So I don't feel that this is an aggressive compromise, though I do agree that doing something to help with this case (in case of cyclic structures being erroneous) would be the "correct" thing to do.
While is_acyclic() seems better to me, a checked mode would be good for one-shot uses of this library, but as this would be pointless overhead on known tree / tree-like structures (what I would expect to use personally) then as a kwarg I'd prefer check_acyclic=False by default... and we'd likely need multiple cycle detection options, since space/time tradeoffs can be important and what's optimal will likely vary depending on the actual structure used. Similar options for is_acyclic() would be needed by the same argument, of course.
My biggest issue with check_acyclic= would be that it's an additional kwarg to almost every single function in the library, which is a bigger change (I feel) than an independent function. Occam's razor / separation-of-concerns and the like means that I feel the compromise of a checked mode outweighs its benefits, whereas adding is_acyclic() only has the compromise of a new, independent function (and the costs associated with implementing it), and otherwise has the same benefits.
Nice. Although I wish the documentation was less sparse.
So all it's doing is flatten the tree?