Show HN: Graph of wikipedia articles semantic similarity (LSI, Python, d3.js)

similarityapi.appspot.com

54 points by lucamartinetti 14 years ago · 13 comments · 1 min read

A small experiment in visualizing Wikipedia articles as a graph using d3.js.

Articles with more traffic are bigger. I computed the semantic similarity using LSI with Python (gensim). You have to scroll down/right a bit!

http://similarityapi.appspot.com/graph/?title=blade%20runner

There is also a JSON API: http://similarityapi.appspot.com/api/v1/?limit=100&title=blade%20runner

All feedback is appreciated:

@lucamartinetti luca@luca.io
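
For anyone who wants to call the JSON API from a script, here is a minimal Python sketch that builds the request URL given in the post. The helper name `build_api_url` is hypothetical. Only the endpoint and the `limit` and `title` parameters come from the post. The response format is not described there, so the sketch stops at URL construction.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the post; everything else here is illustrative.
API_BASE = "http://similarityapi.appspot.com/api/v1/"


def build_api_url(title, limit=100):
    """Return the API URL for an article title (hypothetical helper)."""
    # quote_via=quote encodes spaces as %20, matching the URL in the post.
    query = urlencode({"limit": limit, "title": title}, quote_via=quote)
    return f"{API_BASE}?{query}"


print(build_api_url("blade runner"))
```

Passing `quote_via=quote` matters here because the default encoder would turn the space into `+` instead of `%20`.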

  • 3pt14159 14 years ago

    I've had much, much better results with LDA than LSI. Give that a shot if you have a chance; you'll be blown away. Stop-word ratios are important, and make the max number of tokens 500,000.

  • viscanti 14 years ago

    The JSON API should degrade gracefully if results aren't found, i.e., there should be a JSON message explaining that the item doesn't exist.

    • lucamartinettiOP 14 years ago

      Right! It could use some input checking / normalization too. Right now it expects the title parameter to be lower case.
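
A rough Python sketch of what the two suggestions in this sub-thread could look like together: normalize the incoming title, then return a JSON error body instead of an empty response when nothing matches. This is not the site's actual code. `INDEX`, `normalize_title`, `lookup`, the status codes, and the JSON field names are all assumptions made for illustration.

```python
import json

# Hypothetical stand-in for the real similarity index, keyed by lower-case title.
INDEX = {"blade runner": ["example article a", "example article b"]}


def normalize_title(raw):
    """Trim, collapse internal whitespace, and lower-case a title."""
    return " ".join(raw.split()).lower()


def lookup(raw_title):
    """Return (HTTP status, JSON body) for a title query."""
    title = normalize_title(raw_title or "")
    if not title:
        # Missing or blank parameter: say so explicitly.
        return 400, json.dumps({"error": "missing title parameter"})
    if title not in INDEX:
        # Unknown article: explain the failure instead of returning nothing.
        return 404, json.dumps({"error": "article not found", "title": title})
    return 200, json.dumps({"title": title, "similar": INDEX[title]})


print(lookup("  Blade   Runner "))
print(lookup("no such article"))
```

Echoing the normalized title back in the error body makes it easier for a client to see what the server actually looked up.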

  • rplnt 14 years ago

    An option to select the language version could be a good feature (defaulting to en, as it does now).

  • Radim 14 years ago

    How much data did you use for the semantic analysis?

    • lucamartinettiOP 14 years ago

      The whole text of all articles from English Wikipedia (then filtered to those with more than 1k views last month).

Edootjuh 14 years ago

I've never liked these scrolling animations. You need too much precision to see a part of the page clearly, while with normal scrolling it wouldn't matter if the information you're reading is at the bottom or top of the screen.

stephengoodwin 14 years ago

Does the font size for a node represent its similarity with the query page?

  • lucamartinettiOP 14 years ago

    It represents the traffic of the article. The ten most related articles are displayed for each expanded node. Articles with more inbound links are darker.
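
To make the three rules in this reply concrete, here is a small Python sketch of how node attributes might be derived before handing them to the front end. The thread does not say how the site scales anything, so the log scaling, the pixel and gray ranges, the field names, and all function names below are assumptions.

```python
import math


def top_related(neighbors, k=10):
    """Keep the k neighbors with the highest similarity score."""
    return sorted(neighbors, key=lambda n: n["similarity"], reverse=True)[:k]


def font_size(views, min_px=10, max_px=40, max_views=1_000_000):
    """Map monthly views to a font size on a log scale, clamped to [min_px, max_px]."""
    ratio = min(math.log10(max(views, 1)) / math.log10(max_views), 1.0)
    return round(min_px + (max_px - min_px) * ratio)


def gray_level(inbound_links, max_links=10_000):
    """Map inbound links to a 0-255 gray value; more links give a darker (lower) value."""
    ratio = min(math.log10(max(inbound_links, 1)) / math.log10(max_links), 1.0)
    return round(255 * (1.0 - ratio))


# Made-up neighbor records, only to exercise the functions.
neighbors = [
    {"title": f"article {i}", "similarity": i / 20,
     "views": 10 ** (i % 6 + 1), "inbound_links": 1 + i * 50}
    for i in range(12)
]

for node in top_related(neighbors):
    print(node["title"], font_size(node["views"]), gray_level(node["inbound_links"]))
```

A log scale is used because article traffic and link counts tend to span several orders of magnitude, and a linear mapping would leave most nodes at the minimum size.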

lucian1900 14 years ago

Blank page in Chrome.

ssn 14 years ago

Down?
