Show HN: I built a news aggregator/knowledge graph

emergentdata.co

4 points by emrgx 9 years ago · 5 comments

purplecones 9 years ago

Hi, this is great! I have some questions.

I'm curious how you decided to model the data in your Neo4j database. How did you build the 'Suggested Readings' section? What does the Cypher query that drives it look like?

How do you like using AlchemyAPI? Is it doing all the NLP work for you?

  • emrgx (OP) 9 years ago

    Alchemy is doing all the NLP. Concepts and entities (as Alchemy defines them in its documentation) are extracted from each article. I normalize each extracted term to prevent duplicates (some duplicates still sneak through, so it requires a little data maintenance). So there is one node for a term, say "Machine Learning." In one article "Machine Learning" is a concept with negative sentiment and high relevance; in another it is an entity with low relevance but positive sentiment. The relationships hold the sentiment and relevance properties: (machine_learning)-[relevance,sentiment]-(article).
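
    To make the model above concrete, here is a minimal in-memory sketch, assuming a simple normalizer (NFKC, lowercase, punctuation to spaces, collapsed whitespace). The names `normalize_term`, `Mention`, `TermGraph`, and the score ranges are hypothetical, not the site's code, and as noted above a rule set like this will still let some duplicates through. Each normalized term is one node, and each article link is an edge object carrying its own kind, relevance, and sentiment.

```python
import re
import unicodedata
from dataclasses import dataclass, field


def normalize_term(raw: str) -> str:
    """Map surface variants ("Machine-Learning", "machine  learning.") to one key.

    These rules are an assumption, not the site's actual normalizer.
    """
    text = unicodedata.normalize("NFKC", raw).lower()
    text = re.sub(r"[^\w\s]", " ", text)      # hyphens, periods, etc. become spaces
    return re.sub(r"\s+", " ", text).strip()  # collapse and trim whitespace


@dataclass
class Mention:
    """Edge from a term node to an article node; the edge carries the scores."""
    article_id: str
    kind: str         # "concept" or "entity", as labelled by the NLP service
    relevance: float  # assumed range 0.0 to 1.0
    sentiment: float  # assumed range -1.0 (negative) to 1.0 (positive)


@dataclass
class TermGraph:
    """One node per normalized term, each holding its list of article edges."""
    terms: dict = field(default_factory=dict)  # normalized term -> list[Mention]

    def add_mention(self, raw_term: str, article_id: str, kind: str,
                    relevance: float, sentiment: float) -> str:
        key = normalize_term(raw_term)
        # setdefault reuses an existing node, so normalized variants share it
        self.terms.setdefault(key, []).append(
            Mention(article_id, kind, relevance, sentiment))
        return key


g = TermGraph()
g.add_mention("Machine Learning", "a1", "concept", 0.92, -0.4)
g.add_mention("machine-learning.", "a2", "entity", 0.18, 0.6)
print(len(g.terms))                      # 1 node shared by both articles
print(len(g.terms["machine learning"]))  # 2 edges, each with its own scores
```

    Keeping the scores on the edge rather than the node is what lets the same term be a high-relevance concept in one article and a low-relevance entity in another.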

    The suggested readings section pulls the most relevant concept of that article and finds connected articles that share that concept at high relevance. This way suggested articles are more than just keyword hits; it's all about relevance. I'm still tweaking this query, and there's a lot more that can be done with it, such as matching sentiment and emotion. As the dataset grows I'll look to add a feature that pulls a list of articles based on a cluster of highly associated entities.
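
    The actual query is Cypher against Neo4j and isn't shown in the thread. As a rough illustration only, the Python below walks through the same two steps over an in-memory list of edges: pick the article's top concept, then collect other articles linked to that term at high relevance. The `Edge` type, the `suggested_readings` name, the 0.7 cutoff, and the limit of 5 are all assumptions.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """One (term)-[relevance, sentiment]-(article) relationship."""
    term: str         # normalized term, e.g. "machine learning"
    article: str      # article id
    kind: str         # "concept" or "entity"
    relevance: float
    sentiment: float


def suggested_readings(edges, article_id, min_relevance=0.7, limit=5):
    """Return ids of articles sharing the article's top concept at high relevance.

    min_relevance and limit are hypothetical tuning values.
    """
    # Step 1: the article's single most relevant concept.
    own = [e for e in edges if e.article == article_id and e.kind == "concept"]
    if not own:
        return []
    top = max(own, key=lambda e: e.relevance)

    # Step 2: other articles attached to that same term node with high relevance.
    best = {}  # article id -> highest relevance seen for the top term
    for e in edges:
        if e.term == top.term and e.article != article_id and e.relevance >= min_relevance:
            best[e.article] = max(best.get(e.article, 0.0), e.relevance)

    # Rank by relevance so the strongest matches come first.
    return sorted(best, key=best.get, reverse=True)[:limit]


edges = [
    Edge("machine learning", "a1", "concept", 0.95, -0.3),
    Edge("robotics", "a1", "concept", 0.40, 0.2),
    Edge("machine learning", "a2", "concept", 0.88, 0.5),
    Edge("machine learning", "a3", "entity", 0.20, 0.1),
    Edge("machine learning", "a4", "entity", 0.75, 0.0),
    Edge("robotics", "a5", "concept", 0.99, 0.4),
]
print(suggested_readings(edges, "a1"))  # ['a2', 'a4']
```

    Here "a3" mentions machine learning but only in passing, so it is filtered out; that is the difference from a plain keyword match. Matching sentiment, mentioned above as a possible refinement, could be one more condition in step 2.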

    As for Alchemy, I've tried a number of different NLP APIs and, in my opinion, none of them has come close to Alchemy's accuracy. It does make mistakes, but at a low enough rate that they're easy to correct manually.

    • purplecones 9 years ago

      Thanks for the background. I'm working on a similar project, but I'm currently parsing news articles from a collection of specific RSS feeds and sending the text to Google's NLP API. It sounds like AlchemyAPI is a better fit in this case.

      How are you finding Neo4j handles the scale of reading and writing all these stories? I've had a positive experience so far, but I'm only in the few-thousands range.

      • emrgx (OP) 9 years ago

        I've found that Neo4j handles reads and writes seamlessly, but I'm only at around 10,000 nodes and 20,000+ edges. I've heard of Neo4j use cases in the range of 50M+ nodes. My position is that the question isn't whether Neo4j can handle it but whether your code and infrastructure can.

emrgx (OP) 9 years ago

Hi HN: I built a curated news feed covering advancements in technology and global issues. I'm using Neo4j and AlchemyAPI, along with some custom code, to build a knowledge graph in the background. I have a few ideas for additional features for the dataset but would love to hear some feedback.