Settings

Theme

Tree – a lib for working with nested data structures, open-sourced by deepmind

github.com

127 points by okokok___ 5 years ago · 16 comments

Reader

thundergolfer 5 years ago

For those using Bazel this looks like a pretty helpful example of developing and releasing a cross-platform Python package with native-libs (C++ in this case).

I've wondered for a while how Deepmind manages their (Python & C++) codebase. This is a small window into that I think, and you can see that they're likely using internal 'Blaze' Python rules and rely on the maintenance of compatibility with the open-source Bazel rules. `py_library` is used in `tree/BUILD`, but it is not loaded explicitly as is expected with Bazel these days; the rule definition is loaded by Bazel implicitly.

jimfleming 5 years ago

From the tree docs:

> tree has originally been part of TensorFlow and is available as tf.nest.

The tf.nest docs can be found here and may be more useful for now: https://www.tensorflow.org/api_docs/python/tf/nest

I'm glad this is being moved out of TensorFlow. It's a really useful library on its own and I've found myself reaching for it on projects without wanting to import all of TensorFlow.

fulafel 5 years ago

Caveat emptor: "The behavior for structures with cycle references is undefined"

Isn't this a pretty common error to occur in Python programs? It would be good to at least have a checked mode available, otherwise the "dragons may fly out of your nose" semantics feels like a pretty aggressive compromise in favour of performance vs correctness.

  • CodesInChaos 5 years ago

    I'm not sure if they mean this strong form of "undefined", or something milder like "might run forever or cause stack overflows".

  • babel_ 5 years ago

    ... It is called "tree", so if we're going by graph theory and typical ADT form then being acyclic is to be expected.

    While I agree that a checked mode would be useful, and would likely be the best compromise for a "one-off" use, if you're preserving the structure like this does, then I feel it's quite likely you'd want to do multiple passes. If so, then pre-processing to ensure that it's acyclic would be useful: rather than a checked mode, perhaps an is_acyclic() method, similar to their is_nested(), would be the better compromise? That would allow it to be a nicely optimised function for just that, rather than adding overhead for "nested structures" that are known to be acyclic.

    Adding is_acyclic() also means you can handle "sanitising" it yourself. I say that because it may be non-trivial depending on what you're doing and so not best-handled by an passed-through lambda etc. that may need to modify the structure mid "iteration", which would be awkward. Also, you're then doing it up-front, rather than handling an exception, which may have interrupted side-effects, and then makes it awkward because how do you modify and resume without repeating work / side-effects?

    As to whether it's pretty common error or not, we'd probably need a wider study to see whether this has caused errors -- after all, sometimes cyclic / self-referential structures are correct. So I don't feel that this is an aggressive compromise, though I do agree that doing something to help with this case (in case of cyclic structures being erroneous) would be the "correct" thing to do.

    While is_acyclic() seems better to me, a checked mode would be good for one-shot uses of this library, but as this would be pointless overhead on known tree / tree-like structures (what I would expect to use personally) then as a kwarg I'd prefer check_acyclic=False by default... and we'd likely need multiple cycle detection options, since space/time tradeoffs can be important and what's optimal will likely vary depending on the actual structure used. Similar options for is_acyclic() would be needed by the same argument, of course.

    My biggest issue with check_acyclic= would be that it's an additional kwarg to almost every single function in the library, which is a bigger change (I feel) than an independent function. Occam's razor / separation-of-concerns and the like means that I feel the compromise of a checked mode outweighs its benefits, whereas adding is_acyclic() only has the compromise of a new, independent function (and the costs associated with implementing it), and otherwise has the same benefits.

asimjalis 5 years ago

Nice. Although I wish the documentation was less sparse.

slightwinder 5 years ago

So all it's doing is flatten the tree?

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection