AmpliGraph: A TensorFlow-Based Library for Knowledge Graph Embeddings

ampligraph.org

112 points by mulletboy 7 years ago · 13 comments

nl 7 years ago

Graph embeddings are one of my favorite underused things in ML.

I've used them to do things like characterise users based on follow/follower patterns, but there are many more applications.

In the past I've had great success with Facebook Research's StarSpace

jamasb 7 years ago

I've been doing some work on link prediction in knowledge graphs recently, with poor results on real-world data. These methods don't necessarily require a huge amount of data, but they are very sensitive to noise and to the 'density' of the dataset. The benchmark datasets are, in essence, very easy to get good performance on. It's a real shame that these methods' tolerance of noise and sparsity is not reported, because both are going to be present in almost any real-world dataset in far greater quantities than in current benchmarks.
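One cheap way to probe that noise sensitivity yourself is to corrupt a fraction of a benchmark's triples before training and watch how the evaluation metrics degrade. A minimal sketch (the helper name and the toy triples are made up for illustration, not part of any benchmark suite):

```python
import random

def corrupt_triples(triples, entities, noise_ratio, seed=0):
    """Return a copy of `triples` with a fraction of object entities
    replaced by random entities, simulating noisy real-world data."""
    rng = random.Random(seed)
    noisy = list(triples)
    n_corrupt = int(len(noisy) * noise_ratio)
    for i in rng.sample(range(len(noisy)), n_corrupt):
        s, p, _ = noisy[i]
        noisy[i] = (s, p, rng.choice(entities))
    return noisy

triples = [("alice", "knows", "bob"), ("bob", "knows", "carol"),
           ("carol", "worksAt", "acme"), ("alice", "worksAt", "acme")]
entities = ["alice", "bob", "carol", "acme"]

# Corrupt half the triples, then train/evaluate on `noisy` vs `triples`.
noisy = corrupt_triples(triples, entities, noise_ratio=0.5)
```

Training the same model on the clean and the corrupted copies, then comparing the metrics, gives a rough noise-tolerance curve for a given method.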

  • mulletboyOP 7 years ago

    Well, the landscape is still quite fluid (new models are proposed in the literature at every major conference). Processing real-world graphs is obviously more challenging, for a number of reasons (multi-modality, scale, etc.) - even though benchmarks are catching up and becoming harder (see FB15k-237 or WN18RR).

    As a general rule of thumb, it is important your graph has enough redundancy in it, i.e. the more relations, the better. Also, bear in mind these models do not support multi-modality, i.e. literals such as numbers, strings, geo coordinates, timestamps are simply treated as entities. In most cases it is probably better to filter literals out before generating the embeddings.
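    Filtering literals out before generating embeddings can be done with a simple pre-processing pass over the triples. The sketch below uses a crude heuristic of my own (quoted strings and parseable numbers count as literals) - an assumption for illustration, not AmpliGraph's behaviour:

```python
def looks_like_literal(value):
    """Heuristic: treat quoted strings and numbers as literals."""
    if value.startswith('"'):   # RDF-style quoted literal
        return True
    try:
        float(value)            # numeric literal (year, coordinate, ...)
        return True
    except ValueError:
        return False

def drop_literal_triples(triples):
    """Keep only triples whose object looks like an entity ID/URI."""
    return [t for t in triples if not looks_like_literal(t[2])]

triples = [("acme", "foundedIn", "1999"),
           ("acme", "hasName", '"Acme Corp"'),
           ("alice", "worksAt", "acme")]
print(drop_literal_triples(triples))   # only the worksAt triple remains
```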

pilooch 7 years ago

Looks very seriously made and documented, congrats! Was looking at it a bit closely the other day and put it onto my list of future tools. There's been another somewhat related library released by Facebook recently: https://ai.facebook.com/blog/open-sourcing-pytorch-biggraph-...

bravura 7 years ago

Can you help me understand, what are possible inputs to ampligraph?

I think the main use-case is plugging in an existing knowledge graph, and it filling in the gaps, correct?

Can I augment this with really high-quality embeddings for the nodes that were learned over auxiliary unlabelled text?

What are other ways I can augment the data set?

Is this useful only when there are many edge-types, or is it also good when there are very few?

It looks promising, I just couldn't immediately grok when I should look to this library.

  • nl 7 years ago

    I like the README.md for StarSpace[1] because it has lots of examples which get you thinking.

    I used graph embeddings as input to a classifier to classify people when follower/followee information was easy to gather but text wasn't.

    Basically anything that can be represented as a graph can be used. There is some interesting work being done using code syntax trees as input which uses a very similar approach. See code2vec[2]

    I'm not aware of any way to transfer text embeddings into graph embeddings, but you could concatenate them and use them together (I've done this before), or maybe do some dimensionality reduction, or do a multi-task learning thing and try to learn a combined representation.
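    The concatenation approach can be as simple as joining the two vectors per entity, padding entities that are missing one of the embeddings. The helper below is a hypothetical sketch with made-up vectors, not part of any library:

```python
def concat_embeddings(graph_emb, text_emb, text_dim):
    """Concatenate graph and text embeddings entity-by-entity.
    Entities missing a text embedding get a zero vector so all
    combined vectors share the same dimensionality."""
    combined = {}
    for entity, g_vec in graph_emb.items():
        t_vec = text_emb.get(entity, [0.0] * text_dim)
        combined[entity] = list(g_vec) + list(t_vec)
    return combined

graph_emb = {"alice": [0.1, 0.2], "bob": [0.3, 0.4]}
text_emb = {"alice": [0.9, 0.8, 0.7]}   # bob has no text embedding
combined = concat_embeddings(graph_emb, text_emb, text_dim=3)
```

The combined vectors can then feed a downstream classifier or clustering step directly.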

    I'm not aware of the scalability limits for this particular library, but Facebook Research's pytorch-biggraph[3] (released 2 days ago) scales to trillions of edges and billions of nodes.

    [1] https://github.com/facebookresearch/StarSpace

    [2] https://arxiv.org/abs/1803.09473

    [3] https://ai.facebook.com/blog/open-sourcing-pytorch-biggraph-...

  • mulletboyOP 7 years ago

    > what are possible inputs to ampligraph?

    Any knowledge graph will do (i.e. a directed, labeled multigraph, with or without a schema). We have APIs to read graphs serialised as CSV files or RDF (any serialisation will do): http://docs.ampligraph.org/en/1.0.1/ampligraph.datasets.html...

    > the main use-case is plugging in an existing knowledge graph, and it filling in the gaps

    Correct. That is known as Link Prediction. There are other machine learning tasks you can do, though: for example, you can generate embeddings and then cluster them. Or you can use embeddings to see if distinct entities are indeed the same.
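    To make the link-prediction idea concrete, here is a toy scoring pass in the style of TransE (one of the classic knowledge graph embedding models): a triple is plausible when head + relation lands near the tail in embedding space. The embeddings below are hand-picked for illustration, not learned:

```python
import math

def transe_score(h, r, t):
    """TransE plausibility: negative L2 distance ||h + r - t||.
    Higher (closer to 0) means a more plausible triple."""
    return -math.sqrt(sum((hi + ri - ti) ** 2
                          for hi, ri, ti in zip(h, r, t)))

# Toy embeddings chosen so that alice + worksAt lands exactly on acme.
entity = {"alice": [1.0, 0.0], "acme": [1.0, 1.0], "bob": [0.0, 0.0]}
relation = {"worksAt": [0.0, 1.0]}

# Rank all entities as candidate objects for (alice, worksAt, ?).
candidates = sorted(
    entity,
    key=lambda e: transe_score(entity["alice"], relation["worksAt"], entity[e]),
    reverse=True)
print(candidates[0])   # "acme" scores highest
```

A trained model does the same ranking, just with embeddings learned from the training triples instead of hand-set vectors.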

    > Can I augment this with really high-quality embeddings for the nodes, that were learned over auxiliary unlabelled text?

    I know there is a handful of papers in the literature that do that, but we have not implemented any of them in AmpliGraph yet. Examples:

    * Xie, Ruobing, et al. "Representation Learning of Knowledge Graphs with Entity Descriptions." AAAI 2016.

    * Xu, Jiacheng, et al. "Knowledge Graph Representation with Jointly Structural and Textual Encoding." arXiv preprint arXiv:1611.08661 (2016).

    * Han, Xu, Zhiyuan Liu, and Maosong Sun. "Joint Representation Learning of Text and Knowledge for Knowledge Graph Completion." arXiv preprint arXiv:1611.04125 (2016).

    > What are other ways I can augment the data set?

    I would first try a dataset with no literals (no strings, no numbers, no geo coordinates), as these are treated as entities for now. I suggest generating embeddings on your current graph first, and measuring the predictive power using http://docs.ampligraph.org/en/1.0.1/generated/ampligraph.eva... Merging additional datasets would be another option, to get more data to work on.
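    The standard predictive-power metrics for link prediction, MRR and Hits@N, are straightforward to compute once you have the rank of each true triple among its corruptions. A self-contained sketch (the example ranks are made up, and these helpers are illustrative, not the library's API):

```python
def mrr(ranks):
    """Mean Reciprocal Rank of the true triples among corruptions."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_n(ranks, n):
    """Fraction of true triples ranked in the top n."""
    return sum(1 for r in ranks if r <= n) / len(ranks)

ranks = [1, 2, 5, 10]          # e.g. ranks returned by an evaluation run
print(mrr(ranks))              # ~0.45
print(hits_at_n(ranks, 3))     # 0.5
```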

    > Is this useful only when there are many edge-types, or is it also good when there are very few?

    It is also useful when there are only a few.

    Let us know how you like it, and if you need assistance we have a public Slack channel - happy to answer any questions! https://join.slack.com/t/ampligraph/shared_invite/enQtNTc2NT...

quenstionsasked 7 years ago

Cool. KGE methods are becoming more and more useful as companies are trying to find ways to interface some internal knowledge graph with machine learning techniques. I expect this space to grow substantially!

  • mulletboyOP 7 years ago

    Indeed. Here at Accenture Labs we use it in quite a lot of diverse application scenarios. Besides, KG embeddings can be used for other tasks beyond link prediction (e.g. link-based clustering).

mulletboyOP 7 years ago

btw, we are hiring research engineers here in our Dublin Lab. Send me an email if interested: luca.costabello@accenture.com https://www.accenture.com/ie-en/careers/jobdetails?src=&id=0...
