Show HN: An AI-Generated Encyclopedia

47 points by mahouk 3 years ago · 53 comments

bawolff 3 years ago

AI generated encyclopedia kind of seems like a terrible idea. The entire goal of an encyclopedia is to get true (and hopefully unbiased) knowledge, which AI is known to be bad at, especially on the edges.

mahoukOP 3 years ago

I found myself frequently using ChatGPT to learn about new topics. I prefer it over Wikipedia because when I don't understand something, I can just ask it and it can clarify it for me until I get it. However, I found the chat UI to be unideal for this sort of thing, so I created this website using a UX that is aimed at educational use.
- vunderba 3 years ago
  
  You should put up a disclaimer that says, "for entertainment purposes only". Calling this an encyclopedia and marketing it as educational just seems like a bad idea. The whole idea behind a traditional encyclopedia is usually that it is written and vetted by experts.
  Honestly, you'd be way better off just using a basic RAG architecture. When a user asks for a topic, simply mirror the Wikipedia article and throw up a chat sidebar interface so that the user can ask questions about it. At least by locking down your context window, you could minimize the number of hallucinations, which, judging from some of the other comments, sounds like it's already an issue.
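
A minimal sketch of the "lock down the context window" idea from the comment above, assuming the article text has already been fetched. The word-overlap ranking is a toy stand-in for a real retriever, and `tokenize`, `top_paragraphs`, `build_prompt`, and the `ARTICLE` text are hypothetical names and sample data, not anything the site is known to use.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def top_paragraphs(article, question, k=2):
    """Return the k paragraphs sharing the most words with the question."""
    paragraphs = [p.strip() for p in article.split("\n\n") if p.strip()]
    q = tokenize(question)
    ranked = sorted(paragraphs, key=lambda p: len(tokenize(p) & q), reverse=True)
    return ranked[:k]

def build_prompt(article, question, k=2):
    """Build a prompt that restricts the model to the retrieved paragraphs."""
    context = "\n\n".join(top_paragraphs(article, question, k))
    return (
        "Answer using ONLY the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Stand-in article text (assumption: a real system would mirror the source article).
ARTICLE = (
    "The Blue Banana is a corridor of urbanisation in Western Europe.\n\n"
    "It stretches roughly from North West England to Northern Italy.\n\n"
    "Bananas are an edible fruit produced by plants in the genus Musa."
)

prompt = build_prompt(ARTICLE, "What is the Blue Banana?", k=1)
print(prompt)
```

The resulting prompt would then be passed to whichever chat model backs the sidebar; only the selected paragraph reaches the model, so unrelated text (here, the fruit) stays out of the context.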

mahoukOP 3 years ago

Update:

Sorry guys, but it seems the server has crashed due to a sudden influx of traffic, and I'm attending a funeral service at the moment so I don't have access to my laptop. Will try to get the site back up asap!

nickpsecurity 3 years ago

Sorry to hear you lost someone close to you. I’ve prayed that Jesus Christ provides comfort to you and others involved. Do what you need to do and we’ll look at the site whenever it’s back up. No rush.
Edit: Forgot to add about the site crashing on heavy traffic. You might want to consider a CDN. Cloudflare is No 1 with a free option. StackPath was great but just shut down their CDN. I’m trying BunnyCDN now since it’s pennies per GB.
- mahoukOP 3 years ago
  
  Thanks, I appreciate the kind words
  I'm just new to devops stuff because the things I usually build don't get that much traffic, and a single server did the job without the need for CDNs, load balancers, etc. I had to figure this stuff out just now over the past few hours to help the site cope with all the load.

gus_massa 3 years ago

Nice!

I tried a few stupid word combinations like "blue banana" https://mycyclopedia.co/e/b2fc24e7-21cb-43b2-b5c1-224048982e... and got interesting results. This is strange because it combines a fake photo of a blue banana fruit with a description of the European region known as blue banana.

It's strange that each topic has a "conclusion" section. Is it common in dead tree encyclopedias? I expected a format more similar to Wikipedia.

mahoukOP 3 years ago

Thanks!
The images are real images from the web. In most cases they match the topic you search for but in some cases they turn out to be unrelated. (I already have an idea on how to try and improve accuracy here).
As for the conclusions, you're right now that you point it out. I don't recall coming across conclusion sections in other encyclopedias I've read. It's a format GPT (which is the underlying LLM I'm using) seems to like to use by default. I didn't disable that behavior because I guess a conclusion to wrap everything up for the reader isn't a bad thing?
- gus_massa 3 years ago
  
  TIL there are real blue bananas. I thought the image was generated by AI.
  About conclusions, I guess it's the standard high school soulless essay that must have a conclusion at the bottom. I think it's better to remove the conclusions so it looks like Wikipedia, but if you like it you (obviously) can keep it.
  - mahoukOP 3 years ago
    
    I'll test out a conclusion-less format and see how my friends find it.
    > I thought the image was generated by AI.
    That was my initial plan, but I found AI-generated images to be more entertaining than informative, especially when the topic is new to the reader.
    P.S. I don't know if you already tried this, but if you highlight any snippet of text in an entry, you can start a realtime chat about that text (without having to provide any context yourself).

Dwedit 3 years ago

Generated the article for the NES, went away for a few hours, came back.

Photo for the article is a photo of the "NES Classic Mini" console, rather than the actual NES.

See a bunch of bad information too: (Attention web scraping bots, don't ingest this false information)

"...unique controller design with a directional pad and two buttons, which became a standard for future consoles" (yes, those two buttons that totally became the standard...)

An "Introduction" section that mostly duplicates the top summary.

Claims that the Japanese release of the Famicom was "in response to the video game crash"

"Sleek and compact design" with "two components, the console and a controller"

"The NES utilized a custom-built 8-bit processor, the Ricoh 2A03, which was capable of producing colorful graphics" (um, the 2A03 is the CPU, not really that custom, and it's the PPU, not the CPU, that's actually responsible for the graphics)

Zelda had a "captivating story that captivated players"

COAGULOPATH 3 years ago

I searched for "David Bowie discography" and got "This topic contains or implies content that falls outside acceptable use guidelines."

Then I searched for "David Bowie". It slowly generated text, section by section. Much of its output was repetitive and slight. It generated a section called "Early Life and Career" and then "Birth and Childhood" with largely similar information. It then abruptly wrapped up the article ("Conclusion: David Bowie's first solo album, "David Bowie" (or "Space Oddity"), was a pivotal release...) with no information about what happened to David Bowie afterward. The actual text had many hallucinations. It said "Space Oddity" was Bowie's first album (it was actually his second), and said the album achieved fame with the Apollo Moon landing in 1972 (which happened in 1969.)

Maybe in a few years something like this will be viable. Right now, it seems inferior to Wikipedia in every aspect.

mahoukOP 3 years ago

That error you got the first time means your query contains words that triggered the OpenAI content filter.
I agree with the other comments on the hallucinations in the content, hence why I did include a disclaimer at the bottom of every page. This project is something I did to just test out the idea of an encyclopedia-like UI on GPT.

LeoPanthera 3 years ago

Accuracy aside, this is an interesting way to demonstrate "LLM as compression", since you can surely get an LLM to emit far more text than the size of the actual model.

schoen 3 years ago

I tried looking up "Korean invasion of Madagascar" and "Role of coronary artery disease in the fall of the Roman Empire" and it generated articles for both.

Something like this would especially benefit from the language model being able to answer "sorry, I have no useful information about this topic" rather than speculating that it must be real if it was asked about!
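
A minimal sketch of that "refuse instead of speculate" behavior, assuming the generator only runs when some reference text mentions every key term of the topic. The toy `CORPUS`, the stopword list, and the `find_evidence`, `write_entry`, and `fake_generate` helpers are all hypothetical; a real system would use a proper search index and an actual model call.

```python
import re

STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}
REFUSAL = "Sorry, I have no useful information about this topic."

def content_words(text):
    """Return the lowercase words of a string, minus common stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def find_evidence(topic, corpus):
    """Return titles of documents that mention every content word of the topic."""
    wanted = content_words(topic)
    return [
        title
        for title, body in corpus.items()
        if wanted and wanted <= content_words(body)
    ]

def write_entry(topic, corpus, generate):
    """Generate an entry only when supporting evidence exists; otherwise refuse."""
    evidence = find_evidence(topic, corpus)
    if not evidence:
        return REFUSAL
    return generate(topic, evidence)

# Toy reference corpus (assumption: stands in for a real knowledge base).
CORPUS = {
    "Madagascar": "Madagascar is an island country in the Indian Ocean.",
}

def fake_generate(topic, evidence):
    """Placeholder for the model call; it only reports what it was grounded in."""
    return f"Entry on {topic!r}, grounded in: {', '.join(evidence)}"

print(write_entry("Korean invasion of Madagascar", CORPUS, fake_generate))
print(write_entry("Madagascar island", CORPUS, fake_generate))
```

Requiring every key term to co-occur in one document is deliberately strict: a topic that merely combines two real subjects finds no single supporting source and is refused.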

karimouda 3 years ago

Also what value do we get compared to wikipedia?

sertbdfgbnfgsd 3 years ago

No value. This is just another AI thingy. It's not real information. It can return something, but it doesn't know which things are real.
- mparnisari 3 years ago
  
  So you spent time building something that is actively less useful than what already exists in the world.
  I don't mean to be rude by that but I just don't get it.
  - ailef 3 years ago
    
    You can build stuff just for fun...
    
    mparnisari 3 years ago
    
    Yeah but if you're gonna build something, build something useful lol. Who wants an encyclopedia that returns inaccurate stuff? The whole point of an encyclopedia is that it's been reviewed by someone for accuracy

acosmism 3 years ago

it is funny to play with- i searched "irrational fear of cute kittens" which i didn't expect to generate anything rational - it pointed me towards a mental disorder (ailurophobia) which after i looked up elsewhere is apparently an actual condition. on the other hand "The Existential Fear of Earthquakes Caused by Earthworms" generated an entirely fake article. i see no harm in this product for future generations.

acosmism 3 years ago

let me add to this -
it is a pretty cool project! and useful if one knows entirely what it is doing and not for the masses - in the same way an uncensored AI would be useful for folks entirely aware of the potential gibberish it could generate.

ShamelessC 3 years ago

Perhaps best not to assume everyone on the planet worships Jesus Christ.

edit: It's really not a big deal though. Sorry for stirring a controversy unnecessarily.

swatcoder 3 years ago

They didn't. It's just their way of showing sympathy.
The OP may not find the prayer itself practical, as indeed many on the planet mightn't, but the expression doesn't assume so. Common secular expressions like "Sending good thoughts" or "I feel you" are just as immaterial and seemingly ineffectual but similarly get across the point that someone has slowed down long enough to show care.
- ShamelessC 3 years ago
  
  It's really not a big deal. I understand what they meant. I just think the secular variety of expression is preferable to (some) who have lost someone. Religions deal with the afterlife in different ways, and so it could be seen as someone else making "judgemental" assumptions.
  - scrapcode 3 years ago
    
    He literally just stated that he personally prayed to his God for them... you are the one evangelizing here.
    
    mpalmer 3 years ago
    
    Devil's (heh) advocate: the prayer itself matters and is a nice thought, but mentioning the recipient of the prayer makes it a soft pitch, which some might reasonably find unwelcome or awkward.
    
    edgyquant 3 years ago
    
    Or you could not over analyze a well intentioned attempt at comforting another and move on.
    
    mpalmer 3 years ago
    
    I've prayed on this to Tlazolteotl, eater of sins, and have decided to disregard your comment as excessively meta.
    
    lfkdev 3 years ago
    
    Not really devil's advocate, this is literally what it is
    
    ShamelessC 3 years ago
    
    Okay, sorry if I came across as evangelical. I understand there's no malicious intent.
    
    soperj 3 years ago
    
    I thought Jesus was supposed to be God's son, not literally god?
    
    scrapcode 3 years ago
    
    I think it can vary depending on denomination but within the belief of the trinity, it is one god in the form of three persons (father, son, holy spirit). I am certainly no theologian, though, so take that with a grain of salt.
    
    _a_a_a_ 3 years ago
    
    Just what is the holy spirit? I get the father/son, nobody ever bothered to let us know what that 3rd bit was supposed to be. Or why.
    
    scrapcode 3 years ago
    
    I want to preface that I am agnostic at this point in my life and I struggle with a good understanding of a great example of these and mostly the inconsistencies of their descriptions throughout denominations. But I think in general the "holy spirit" is supposed to be the "feelings" you get that "draw you into" God, or for some people the "callings" they seem to have.
    
    _a_a_a_ 3 years ago
    
    A better description (or indeed any at all) I've not come across before, at least it makes sense thanks
    
    soperj 3 years ago
    
    Why have God if you already have Jesus? Why do you need both?
    
    scrapcode 3 years ago
    
    Christians believe that Jesus' atonement makes it possible for them to be resurrected and to repent and be forgiven so they can return to their Heavenly Father's presence.
    
    _a_a_a_ 3 years ago
    
    You're asking the wrong guy!
- Teever 3 years ago
  
  It's not their way of showing sympathy. It's their way of shoehorning their religion into a conversation about technical stuff.
  It's the equivalent of corporate astroturfing, but the product they're advertising is Jesus.
  "This comment brought to you by Christ."
  - _a_a_a_ 3 years ago
    
    Sometimes it is, here it doesn't feel like it.
    (Atheist)

karimouda 3 years ago

You need to work on the UI a little bit (things like big fonts, colors, etc.)

visarga 3 years ago

<rant>We need an AI-generated encyclopedia - not for us, but for AI. It should have a trillion articles covering all known entities and concepts, written using RAG over the web. Controversial topics should report the controversy or the distribution of opinions. We can put this big synthetic text corpus in the training set of future models.

Why? Because AI needs long-form, in-depth texts to train on, and the web doesn't provide them in sufficient quantity and quality. We need chain-of-thought to capture relations between concepts in explicit language. Synthetic data makes it possible to have balanced coverage of topics and combinatorial coverage of skills to improve reasoning. It's also better from a copyright standpoint to train models on synthetic data.</rant>

jfk13 3 years ago

> the web doesn't provide it in sufficient quantity and quality
Do you seriously think that "an AI-generated encyclopedia" would provide a better-quality training set? What would the "AI generator's" articles be derived from?
- Philpax 3 years ago
  
  The idea is that you can standardise the quality of the training data by taking source articles and synthesizing new data with the same "voice" and structure, as well as being able to collate insights from multiple sources.
  This is the line of thinking behind the Phi lineup of models [0], as well as efforts to generate synthetic textbooks for training [1].
  [0]: https://arxiv.org/abs/2309.05463
  [1]: https://twitter.com/ocolegro/status/1712327588255809667
- visarga 3 years ago
  
  So the way I see it, in the first stage the model can take all concepts in Wikipedia and other knowledge bases, and do web search, collect a bunch of references, study and compile a report. That's straightforward search + summarization. The advantage would be that models get to bring together information sitting in separate examples and synthesize or draw conclusions.
  The second stage would be to generate research questions, then solve them with LLM+web search+code execution+other tools. The results would be compiled in reports. So it's a loop of problem generation, problem solving and validation. You can validate with highly trusted sources, or you can run code or simulations, ensemble multiple attempts, or even leave it to ranking by a preference model.
- thatxliner 3 years ago
  
  What about model collapse?
