Show HN: AskHN

patterns.app

612 points by kvh 3 years ago · 138 comments

setgree 3 years ago

As is often true of GPT responses, there's some nonsense interspersed here, e.g. the claim that R has "a more mature package universe" than Python.

I think this is false, but if you're reading quickly, it sounds cogent enough. As Sarah Constantin observed about GPT2 [0]:

> if you skim text, you miss obvious absurdities. The point is OpenAI HAS achieved the ability to pass the Turing test against humans on autopilot...The mental motion of “I didn’t really parse that paragraph, but sure, whatever, I’ll take the author’s word for it” is, in my introspective experience, absolutely identical to “I didn’t really parse that paragraph because it was bot-generated and didn’t make any sense so I couldn’t possibly have parsed it”, except that in the first case, I assume that the error lies with me rather than the text. This is not a safe assumption in a post-GPT2 world. Instead of “default to humility” (assume that when you don’t understand a passage, the passage is true and you’re just missing something) the ideal mental action in a world full of bots is “default to null” (if you don’t understand a passage, assume you’re in the same epistemic state as if you’d never read it at all.)

[0] https://www.skynettoday.com/editorials/humans-not-concentrat...

  • leereeves 3 years ago

    > there's some nonsense interspersed here, e.g. the claim that R has "a more mature package universe" than Python.

    As a programmer, I find R hard to use and not very well designed, so I can see why you'd call that nonsense.

    But when I was a math student, I found that in some ways R does have "a more mature package universe". There were many math algorithms that I could find packages for in R and not in Python, even as a mere grad student.

    • setgree 3 years ago

      Absolutely, for statistics and visualization I think R and its packages are (sometimes) superior. But GPT responses don't generally offer those kinds of nuances; the claim is that the packages are "more mature," period. And it's for good reason that the _most_ mature Python packages, e.g. numpy and pandas, are used by data scientists in production pretty much everywhere.

      • ux-app 3 years ago

        Amazingly, your comment will eventually be added to the ChatGPT corpus and at some point down the line may be used to add the nuance that's currently lacking :)

      • 2020aj 3 years ago

        Regarding numpy/pandas: What's the reason outside of them being _in Python_?

  • shanusmagnus 3 years ago

    I wonder whether the "default to humility" heuristic does more harm than good on net, because the people who heed it probably shouldn't, and the ones who should won't.

    • nine_k 3 years ago

      Default to humility. Do not assume you're so smart that you can skim the text and understand it correctly. Read every word, don't assume that the author is so predictable that you can guess correctly.

      Why, that doesn't sound too arrogant to me.

  • q1w2 3 years ago

    I think it's important to remember that humans who are not too smart can also sound coherent yet babble complete nonsense.

    My experience with ChatGPT thus far is that it is as intelligent as a very broadly read person who just doesn't reeeally get the complex or nuanced aspects of the content it reads - much like many real humans.

    • setgree 3 years ago

      Robin Hanson makes this point in better babblers http://www.overcomingbias.com/2017/03/better-babblers.html

      "After eighteen years of being a professor, I’ve graded many student essays. And while I usually try to teach a deep structure of concepts, what the median student actually learns seems to mostly be a set of low order correlations. They know what words to use, which words tend to go together, which combinations tend to have positive associations, and so on. But if you ask an exam question where the deep structure answer differs from answer you’d guess looking at low order correlations, most students usually give the wrong answer."

  • meken 3 years ago

    Reminds me how when people get criticized on Twitter now, they just assume it’s a bot

have_faith 3 years ago

It seems to write in the generic "style" of GPT, instead of in the style I would recognise as a HN poster. Is that because of something baked into how the training process works? It lacks a sort of casualness or air of superiority ;)

  • sebzim4500 3 years ago

    There was no training process, this is just running GPT with relevant HN comments as part of the prompt.

    If he wanted it to replicate that classic HN feel he would either have to extend the prompt with additional examples or, better yet, use finetuning.

    I guess he could also just randomly sprinkle in some terms like 'stochastic parrot' and find a way to shoehorn Tesla FSD into every conversation about AI.

    • btbuildem 3 years ago

      > “AskHN” is a GPT-3 bot I trained on a corpus of over 6.5 million Hacker News comments to represent the collective wisdom of the HN community in a single bot.

      First sentence of the first paragraph on OP's page

      EDIT: it's a bit misleading, further down they describe what looks like a semantic-search approach

      • agolio 3 years ago

        Scroll a bit further down and you will see

        > 7. Put top matching content into a prompt and ask GPT-3 to summarize

        > 8. Return summary along with direct links to comments back to Discord user

        • btbuildem 3 years ago

          Ah got it. Perhaps they should edit the intro then, it's misleading.

          • stnmtn 3 years ago

            I agree, that language could be much improved. This is not a GPT-like LLM whose training corpus is HN comments, which I found to be an extremely interesting idea. Instead, it looks like it finds relevant HN threads and tells GPT-3 (the existing model) to summarize them.

            To be clear, I think this is still very cool, just misleading.

            • agolio 3 years ago

              Soon we will see language style transfer vectors, akin to the image style transfer at the peak of the ML craze 5-10 years ago -- so you will be able to take a HN snark vector and apply it to regular text, you heard it here first ;)

              • OkGoDoIt 3 years ago

                Joking aside, that does seem like it would be very useful. Kind of reminds me of the analogies that were common in initial semantic vector research. The whole “king - man + woman = queen” thing. Presumably that sort of vector arithmetic is still valid on these new LLM embeddings? Although it still would only be finding the closest vector embedding in your dataset, it wouldn’t be generating text guided by the target embedding vector. I wonder if that would be possible somehow?
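
The "king - man + woman = queen" arithmetic mentioned above can be sketched with hand-picked toy vectors. A minimal illustration, assuming nothing about real embedding geometry (the 3-d values here are invented; real embedding vectors have hundreds or thousands of dimensions):

```python
import numpy as np

# Toy, hand-picked 3-d "embeddings"; purely illustrative values.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def cosine(a, b):
    # Cosine similarity between two vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest(target, vocab):
    # Word whose vector is most similar to `target`.
    return max(vocab, key=lambda w: cosine(vocab[w], target))

target = vectors["king"] - vectors["man"] + vectors["woman"]
print(nearest(target, vectors))  # queen
```

As the commenter notes, this finds the closest existing vector; generating new text guided by a target embedding is a different problem.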

            • efreak 3 years ago

              Hmm. If you're willing to be stuck in time at 2016, there's https://zenodo.org/record/45901

              Build a model off of that?

    • rpastuszak 3 years ago

      Last year (pre the ChatGPT bonanza) I was using GPT-3 to generate some content about attribution bias, and the responses got much spicier once the prompt started including typical HN poster lingo, like "10x developer":

      https://sonnet.io/posts/emotive-conjugation/#:~:text=I%27m%2...

      My conclusion was that you can use LLMs to automate and scale attribution bias.

      We did it guys!

  • britzkopf 3 years ago

    To truly capture the HN experience, the user should provide a parameter for the number of "well actually"s they want to receive. So the initial response should demonstrate clear expertise and make a great concise point in response to the question, and then start the cascade of silly nitpicking.

    • bradwood 3 years ago

      I think you'll find "I think you'll find" trumps "well actually".

      ;)

      • vidarh 3 years ago

        I wish the results were reversed, so I could "well actually" your comment, but 'site:news.ycombinator.com "well actually"' gives ca. 4k results in Google and 'site:news.ycombinator.com "I think you'll find"' gives close to 17k results, so you appear to be right.

  • clark-kent 3 years ago

    > ii. Compute embeddings and similarity and choose top K comments closest to question

    > iii. Put top matching comments into a prompt and ask GPT-3 to answer the question using the context

    It depends on the prompt used to ask GPT the question. A prompt that instructs GPT to write like an HN poster should fix that.

  • cookie_monsta 3 years ago

    There also needs to be at least one question mark at the end of a statement?

  • reacharavindh 3 years ago

    Now that you say it, it will train itself on that while it learns from your comments ;-)

georgelyon 3 years ago

Am I correct in understanding that this doesn't actually generate answers based on HN, but instead finds semantically-near comments and sends them verbatim to GPT to summarize? Seems like a good enough hack, though I'd love a detailed writeup of how to actually specialize an existing LLM with additional training data (like HN).

  • ilaksh 3 years ago

    Technically it does give a specific answer to the question, but it is based on the semantically similar comments (and the question).

    The thing people don't realize is that right now there is a very large gap between the capabilities of a few models including OpenAI's most recent ones, and most of the other LLMs. So there are several options for actually training or fine-tuning with open models, but actually none of them have the language understanding and generation capabilities at the level of those new OpenAI models.

    As far as I know.

  • jerpint 3 years ago

    The summary itself is still generated, but has all the context to do summarization in the prompt.

    It's very difficult to otherwise fine-tune existing LLMs. GPT itself is closed-source and doesn't allow for fine-tuning (except via an opaque API and with limited amounts of data). Other open models are either very difficult to load in memory and/or simply not as expressive as GPT.

  • serjester 3 years ago

    Agreed. I think the better approach is to do some custom tuning, but that becomes cost-prohibitive very quickly. Not really much different from Algolia with a minor GPT-3 integration, but a neat project regardless.

  • redox99 3 years ago

    You can literally finetune these OpenAI models using their API. In this case it probably wasn't done because the author found it too much work and/or too expensive.

flappysolutions 3 years ago

I'm quite pleasantly surprised by this response:

Question: what are the biggest fallacies of people that post on hackernews?

Answer: One of the biggest fallacies of people that post on Hacker News is the nirvana fallacy, which is when someone jumps from "thing x is imperfect" to "thing x is worthless" without considering any of the benefits. People also tend to be familiar with certain technologies and will discount the flaws of existing technology vs. the flaws of new technology. Another fallacy is the assumption that there is one best tool for doing things - the one that supports the largest amount of use cases. Additionally, people may use appeals to authority, invent their own solutions rather than use existing ones, and assume that the approaches they first learned are most correct. Finally, people may assume open source software is a gift and that dependencies don't need to be tested.

  • autoexec 3 years ago

    > People also tend to be familiar with certain technologies and will discount the flaws of existing technology vs. the flaws of new technology.

    Is that really a problem? It's only natural to get accustomed to old problems caused by old tech (where workarounds probably already exist or limitations are understood and accepted), while it makes sense to hold new tech to a higher standard, since it has the benefit of learning from past mistakes and should avoid introducing obvious problems for people in the future to deal with. Our future selves will thank us for being more critical of flawed new tech than we are of old tech.

  • ducktective 3 years ago

    It kinda enumerated all possible characteristics of people and their approaches to technical issues, not something unique to HN.

LeoPanthera 3 years ago

I'm a little surprised that Hacker News comments weren't already in the GPT-3 training set. I just assumed that OpenAI had vacuumed up most of the web already.

  • retube 3 years ago

    I am guessing they already were? But this is 100% pure, concentrated HN not contaminated with nonsense from the rest of the web :)

    • bityard 3 years ago

      If it's really trained exclusively off of HN comments, I expect most of the bot's responses will evade the actual question but spend several paragraphs debating the factual specifics of every possible related tangential point, followed by a thinly-veiled insult questioning the user's true motivations.

      • heleninboodler 3 years ago

        In no way does a typical HN comment debate every possible related tangential point. Do we expect a modicum of intellectual rigor? Yes. But to say every tangent is followed and scrutinized is simply factually untrue.

        And several paragraphs? I challenge you to show even a large minority of argumentative responses that veer into "several" paragraphs. You characterize this as "most of the ... responses" but I think that's unfair.

        One wonders why you'd resort to such hyperbole unless you were deliberately attempting to undermine the value of the site.

      • Aromasin 3 years ago

        If you're not arguing over the semantics, rather than OP's clear-enough intent, are you really on HN?

      • jb1991 3 years ago

        That had me laughing! Case in point, from a few days ago: https://news.ycombinator.com/item?id=34855372

      • rocho 3 years ago

        It's not trained at all. The bot finds relevant comments and then uses OpenAI's API to summarize them.

    • MuffinFlavored 3 years ago

      Is it exclusively HN comments and nothing else? How does a model like that know how to speak English (noun/verb and all that) if you are starting from scratch and feeding it nothing but HN comments?

      • neoromantique 3 years ago

        I'm sorry to be THAT GUY, but it is addressed in the article :)

        >GPT embeddings

        > To index these stories, I loaded up to 2000 tokens worth of comment text (ordered by score, max 2000 characters per comment) and the title of the article for each story and sent them to OpenAI's embedding endpoint, using the standard text-embedding-ada-002 model, this endpoint accepts bulk uploads and is fast but all 160k+ documents still took over two hours to create embeddings. Total cost for this part was around $70.
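
The per-story indexing step the quote describes (comments in score order, a per-comment character cap, a rough token budget) could be sketched like this. All names are assumptions, not the project's actual code, and the 4-characters-per-token estimate is a crude heuristic standing in for a real tokenizer:

```python
# Sketch of the indexing step: comments in score order, each capped at
# 2,000 characters, stopping once a ~2,000-token budget is reached.
# len(text) // 4 is a rough token estimate, not a real tokenizer.
def build_document(title, comments, char_cap=2000, token_budget=2000):
    parts, tokens = [title], len(title) // 4
    for score, text in sorted(comments, key=lambda c: -c[0]):
        text = text[:char_cap]
        cost = len(text) // 4
        if tokens + cost > token_budget:
            break
        parts.append(text)
        tokens += cost
    return "\n".join(parts)

doc = build_document("Show HN: AskHN",
                     [(10, "low-score comment " * 20),
                      (99, "top comment " * 20)])
print(doc.splitlines()[1][:11])  # top comment
```

The resulting per-story document is what gets sent to the embedding endpoint.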

      • gorbypark 3 years ago

        In a nutshell, this is using OpenAI's API to generate embeddings for top comments on HN, then also generating an embedding for the search term. It can then find the closest related comments for the given question by comparing the embeddings, and then send the actual text to GPT-3 to summarize. It's a pretty clever way to do it.
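
The "find the closest related comments" step amounts to a top-K cosine-similarity search over an embedding matrix. A minimal sketch, with random vectors standing in for real embeddings:

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=3):
    # Indices of the k document embeddings closest to the query,
    # ranked by cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return np.argsort(-(d @ q))[:k].tolist()

rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 8))              # stand-ins for comment embeddings
query = docs[42] + 0.01 * rng.normal(size=8)  # a query very close to doc 42
print(top_k(query, docs)[0])  # 42
```

At the project's scale (160k+ documents) this brute-force scan is replaced by a vector database such as Pinecone, as described further down the thread.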

      • nkozyra 3 years ago

        > How does a model like that know how to speak English

        Mimicry.

    • nkozyra 3 years ago

      I have to assume that targeted/curated LLM training sets will tend to be less accurate than very general ones, just by the very nature of how they work.

      (edited for clarity)

      • andai 3 years ago

        I know it's not quite analogous, but I fine-tuned GPT-3 on a small (200 examples) data set and it performed extremely poorly compared to the untrained version.

        This surprised me. I thought it might not do much better, but I wasn't expecting that specializing it on my target data would reduce performance! I had fewer examples than the minimum OpenAI recommends, so maybe it was a case of overfitting or something like that.

jerpint 3 years ago

Nice! We built something very similar recently. It's more of an "ask your documentation" tool, but the implementation is much the same.

See a demo on the huggingface transformers documentation: https://huggingface.co/spaces/jerpint/buster

code: https://github.com/jerpint/buster

boywitharupee 3 years ago

For those who are wondering,

HN data is indexed with embeddings for semantic search. When queried, it finds the closest article and top comments and summarizes them with GPT-3.

GPT-3 serves as a rendering tool for the compressed comments.

btbuildem 3 years ago

My own experiments made me think that the impact of finetuning is comparable to that of a molecule in a drop in a bucket.

> “AskHN” is a GPT-3 bot I trained on a corpus of over 6.5 million Hacker News comments to represent the collective wisdom of the HN community in a single bot.

I'm assuming you used the openai fine-tuning pathway to make a custom model?

Have you tested the responses on vanilla GPT3 vs your custom model?

I'd be curious to see the comparison.

  • clark-kent 3 years ago

    From the article, they did not use fine-tuning. This is semantic search + GPT-3 to provide human-like answers.

    • btbuildem 3 years ago

      Thanks! I missed that part.

      The semantic search approach seems to focus the answers better than fine-tuning, at the cost of preloading the prompt with a lot of tokens, but with the benefit of a more constrained response.

  • danuker 3 years ago

    Yeah, to me it looks like the learning rate was way too low to make a difference.

    I don't see any of the sublime and succinct snark.

    • notahacker 3 years ago

      Yeah. Also full of GPT-3isms like "ultimately the choice ... comes down to the specific project and its ... requirements", and not nearly contrarian enough.

      A bot focused on the output of HNers would insist on providing arguments against going through Google's interview process in the first place and suggestions that the correct answer to "Python or R" should be Haskell or Julia and would never suggest prioritising emotional vulnerability or being a happy person!

SilverBirch 3 years ago

This might be a dumb question, but is this based on the collective wisdom of HN? Because I would say that the collective wisdom is just as much in the interaction of the comments and the ranking of those comments as it is in the comments themselves. If you just ingest all the comments wholesale, aren't you rather getting the average wisdom of HN?

  • inportb 3 years ago

    I believe it's always going to be an average. The more interesting question is how the average is weighted.

fleddr 3 years ago

Let's admit that HN's culture is that many of us are confidently wrong, which we cover up with impressive technical jargon. As such, any wrong answer from this AI is in fact correct.

  • Sai_ 3 years ago

    > confidently wrong, which we cover up with impressive technical jargon

    I get the feeling this comment is self-referential/self-parodying.

MikeTheRocker 3 years ago

I love this! I used to append "reddit" to my Google search queries to get best results, but the quality of dialog over there has really dropped in recent years. These days I've switched to appending "hackernews", but this is even better.

  • leobg 3 years ago

    Same. I have “site:news.ycombinator.com” as a keyboard shortcut on my phone. Use it all the time.

monkeydust 3 years ago

Nice work! Been playing with Langchain and was not aware of patterns.app.

This whole space is moving so fast it's hard to keep up for someone whose immediate day job doesn't revolve around it. Congrats.

egypturnash 3 years ago

Is there a way to opt out of one's comments being used for this?

  • anaganisk 3 years ago

    Nah, it's not a big deal; it's not like Cambridge Analytica will happen again. They're just using your data to train AI. Who knows, maybe based on the way you comment you'll get suggestions on which medication you need, or whether it's time for the Red Bull/Starbucks coffee. Nah, all is good. Nothing bad will happen in allowing companies to scrape comments and build models. They're very ethical. In fact, people here are suddenly not so concerned that the model is not open, and there is no oversight on how the data is being used. They are just proud to get answers from a text generator.

    • olivierduval 3 years ago

      The BIG DEAL is not THAT specific instance but the fact that the ML crowd think it's OK to take everything without even asking permission

      • autoexec 3 years ago

        > The BIG DEAL is...the fact that the ML crowd think it's OK to take everything without even asking permission

        Everything they take was freely given. Thrown into the void. Screamed into the wind. It's weird that people are perfectly fine if someone happens to read their words (at all) and fine if some of those who do read them manage to find something in them that is in any way helpful or useful, but the moment they think someone else might make money as a result of something gained from exposure to those same words it's somehow offensive and everyone starts demanding a cut of (usually non-existent) profit.

        The "ML" crowd has just as much a right to read and learn from the words I enter on social media platforms as anyone else. I'm not charging any kind of fee for the words of debatable wisdom, fact checking, or shitposting I "contribute". I didn't ask permission before replying to your comment. Why should anyone feel like they should ask for permission from me to read it? What exactly is "taken" from me beyond the time I voluntarily spent participating in online discourse?

      • anaganisk 3 years ago

        I think I should've put an /s at the end. It's kind of strange that I see constant discussions here, and people harassing small apps/libraries about how their error collection is not opt-in (the whole Audacity debacle), but data collection for training ML models is perfectly fine, because we sure do know how the companies who fund the research will get an ROI.

  • Minor49er 3 years ago

    Just post a healthy amount of random nonsense along with any of your actual posts to dilute the effects

    Banana Sebastian housewares fly swimmingly under terrestrial Zruodroru'th Memphis Steve Jobs archipelagos

    • triyambakam 3 years ago

      > Banana Sebastian housewares fly swimmingly under terrestrial Zruodroru'th Memphis Steve Jobs archipelagos

      It's actually more likely to require a bathtub to increase the volume of the reticulated lorries, so I really don't think a farmer's market is the ideal place.

  • hombre_fatal 3 years ago

    Yes, don’t post on online forums.

  • jdthedisciple 3 years ago

    Why would you want to? Genuinely wondering.

    I for one am oh so proud that my valuable ramblings contributed to this majestic machinery.

  • olivierduval 3 years ago

    I agree: when I signed up, I never agreed to let anybody use what I write to do anything they want! I only agreed to let everybody read, understand, and interact with what I wrote.

    Actually, it makes me feel as bad as knowing that CAPTCHA were used to train image recognition models...

    I think it could be a good time to reconsider the question of consent. I may agree that my words are used to train some AI... but 1) I must be asked (kindly) first and 2) it won't be free!!! (It may be paid to me or to a service provider like HN... but it's NOT unpaid work ;-) )

  • andai 3 years ago

    If you're willing to pay for the retraining? ;)

PetrBrzyBrzek 3 years ago

Hi, thanks for the interesting article. I have a question about Pinecone. What is the cost of storing all these vectors?

kerogerokero 3 years ago

Anyone here know how to generate subtitles automatically using AI when a video is playing on the web?

Was planning to see how I can build something like these sites but without the need to regularly update the subtitles catalog: https://subscene.be https://subtitlecat.com https://subtitletrans.com https://my-subs.co

la64710 3 years ago

Is there any LLM that can be self-hosted and fed a corpus of data to ingest for question answering? The part I find difficult is how to feed (not train) the open LLM models with a dataset that isn't available to the public.

  • bayan1234 3 years ago

    The hack to solve this is to embed each paragraph in your large corpus. Find paragraphs most similar to the user query using embeddings. Put the paragraphs and the raw user query into a prompt template. Send the final generated prompt to GPT-3.

    This actually works surprisingly well.

    Check out the OpenAI cookbook for examples.
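
A minimal sketch of that hack, with a toy bag-of-words stand-in for the embedding step (a real system would call an embeddings API; the corpus, function names, and prompt template here are all made up for illustration):

```python
import math
import re
from collections import Counter

def bow_embed(text):
    # Toy bag-of-words "embedding"; a real system would call an
    # embeddings API here instead.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = [
    "R has strong packages for classical statistics.",
    "Rust guarantees memory safety without garbage collection.",
    "Pandas makes tabular data wrangling easy in Python.",
]

def build_prompt(query, paragraphs, k=1):
    # Rank paragraphs by similarity to the query, then drop the top k
    # into a prompt template along with the raw user query.
    q = bow_embed(query)
    ranked = sorted(paragraphs, key=lambda p: cosine(q, bow_embed(p)),
                    reverse=True)
    context = "\n".join(ranked[:k])
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

prompt = build_prompt("Which language has good statistics packages?", corpus)
print(prompt.splitlines()[1])  # the R paragraph wins the ranking
```

The final prompt string is what gets sent to the completion API, so the model answers grounded in the retrieved paragraphs.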

renewiltord 3 years ago

ChatGPT and friends always talk like those Microsoft and Apple forum responders with 100k reputation.

I see that you are asking about "How to get a job at Google". I will help you with "How to get a job at Google". In order to solve the problem of "How to get a job at Google" please follow the following steps first:

- rewrite your resume in Google Docs

- reinstall Chrome

- apply to the job

Let me know if I can help further with "How to get a job at Google". I like using it, but I have to tune my prompts to make sure that they don't bullshit me before getting to the point.

leobg 3 years ago

I like the project. Had been wanting to do this myself for a long time, because HN has become the first place I go to nowadays for answers, and I value the intelligence and experience distilled in the comments here.

I do not like that it seems to be effectively an ad.

> Embedding every single one of the 6.5 eligible comments was prohibitively time-consuming and expensive (12 hours and ~$2,000).

Does anybody understand what he’s talking about here? Assuming 6.5 million comments and an average token length of 70 we'd be looking at $180 ($0.0004 / 1K tokens).
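
The commenter's arithmetic is easy to check. Assuming 6.5 million comments at ~70 tokens each and ada-002's published $0.0004 per 1K tokens (both figures taken from the comment above):

```python
# Back-of-envelope check of the ~$180 figure above.
comments = 6_500_000
tokens_per_comment = 70          # assumed average, per the comment
price_per_1k_tokens = 0.0004     # text-embedding-ada-002 list price

total_tokens = comments * tokens_per_comment
cost = total_tokens / 1000 * price_per_1k_tokens
print(f"${cost:,.0f}")  # $182
```

So under these assumptions the commenter's ~$180 estimate holds, roughly an order of magnitude below the article's $2,000 figure.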

motohagiography 3 years ago

Nice. I just sort of assumed early on my comments were training some future AI, and I hope that in some small way I have been able to moderate some of its stupider urges.

A version where you can turn knobs of flavored contributors would be pretty funny. I know my comment style is easily identifiable and reproducible, and it encodes a certain type of logical conjugation, albeit biased with some principles and trigger topics, and I think there is enough material on HN that there may be such a thing as a distinct, motohagiographic lens. :)

badrabbit 3 years ago

Some day I will sue people like OP (if they're monetizing it) and OpenAI for monetizing my public posts. You can use, reuse and alter public speech but when you earn ad dollars...yeah part of that is mine if your model used my public content. I probably won't actually sue but someone will.

I am not a lawyer but there has to be a jurisdiction where I can establish standing at least.

gnicholas 3 years ago

Love that it includes sources — this makes it much more valuable because you can tell if it's giving useful information or just blowing smoke.

clark-kent 3 years ago

> 4. Index the embeddings in a database

If OP is reading: I'm curious about the database you are using to store the embeddings. Pinecone, Weaviate...?

  • gk1 3 years ago

    From the article:

    > The embeddings were then indexed with Pinecone.

DanielBMarkham 3 years ago

Related question: I've written probably a million words over my lifetime.

Is there an easy way to load up GPT with my thoughts to have it be a fake me?

  • mittermayr 3 years ago

    This, I think, would be a great little SaaS idea to make some money. I keep seeing more and more people asking how they can transform their data into an interactive archive that responds as chat, or with voice.

  • ed_mercer 3 years ago

    Sort of. Look into gpt-index/Langchain

einpoklum 3 years ago

> The methodology I used here is a generic, scalable solution for distilling a knowledge corpus into an embodied intelligence

The methodology used here is a generic solution for distilling a non-generic corpus of utterances into a generic platitude machine.

osigurdson 3 years ago

I have an experiment that uses the embeddings to visualize clusterings of HN comments (using t-SNE). Not super useful, but it's interesting to view the comments in 3D and see how similar ones cluster together into mostly relevant themes.
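
The same idea can be sketched with a dimensionality reduction of synthetic "comment embeddings" down to 3-D. Plain PCA (via SVD) stands in for t-SNE here to keep the sketch dependency-light, and the clusters are fabricated:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two synthetic clusters of 64-d "comment embeddings" (50 points each).
a = rng.normal(loc=0.0, size=(50, 64))
b = rng.normal(loc=5.0, size=(50, 64))
X = np.vstack([a, b])

# PCA via SVD: project onto the top 3 principal axes. t-SNE
# (e.g. sklearn.manifold.TSNE) preserves local neighborhood structure
# better, but PCA already separates well-formed clusters.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
coords = Xc @ Vt[:3].T   # one 3-D point per comment, ready to plot

print(coords.shape)  # (100, 3)
```

The two synthetic clusters end up far apart along the first principal axis, which is the separation the commenter sees as thematic clusters in 3D.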

ada1981 3 years ago

Hmm. I thought perhaps he was going to take the questions from Ask HN threads and the top upvoted comments and fine-tune a model with those as prompt/reply pairs.

Curious how that would differ, but it would be an expensive endeavour.

jaequery 3 years ago

Seeing a ton of projects utilizing ChatGPT nowadays. Are the project owners basically paying the API costs out of pocket? I think it would add up pretty quickly, especially if front-paged on HN.

wastu 3 years ago

Rather than a summarizing tool, this bot is really useful if you want to search for related HN posts based on abstract thinking, IMO.

underlines 3 years ago

Just to be sure: this is NOT a fine-tuned GPT model, but rather the standard GPT-3 API used to summarize search results from an HN comments DB, based on user input. Right?

tosemlep 3 years ago

Did you also ingest dead comments into the corpus?

I would very much like to see the ghost of Terry pop up from time to time, to offer his wisdom and unique style of response.

MuffinFlavored 3 years ago

> I trained on a corpus of over 6.5 million Hacker News comments

How long did it take to scrape them and train the model on this "corpus"?

adversaryIdiot 3 years ago

I didn't know the API supported downloading all of its database. Are you the reason HN has had sporadic downtime lately? ;)

Existenceblinks 3 years ago

Ask HN: here is my idea, can I build this in a weekend

AI: of course .. here is your bash script (220 lines long)

bilekas 3 years ago

"He only went and did it... " !

dalmo3 3 years ago

This is nice! The official Algolia search is useless.

OTOH, did I miss something, or is it only on Discord?

  • cactusplant7374 3 years ago

    I really like Algolia. I usually use it to see if a particular link has been submitted. Other times I use it to find relevant comments or posts.

smsm42 3 years ago

How to get a job at Google? Oh, that's easy, just get a PhD.

Thanks bottie, very use, much helpful.

helsontaveras18 3 years ago

Now that we have this bot to answer questions for us, I think we can all go home!

AndrewKemendo 3 years ago

The first thing I saw was my answers to someone's question.

Can you cut me a distro of myself?

hbn 3 years ago

Amazing, an AI that is incapable of picking up on jokes or sarcasm!

m3kw9 3 years ago

I thought ChatGPT had already used Hacker News (and Reddit) to train?

sorokod 3 years ago

the collective wisdom of the HN community

Made me smile

tomr75 3 years ago

Could you do this for medical journal articles?

  • all2 3 years ago

    You'd probably need to prepend a prompt that told the bot how to analyze experiment design. Maybe have it read a book or 10 on experiment design. Also a few books on social networks, financial motivations and other human factors in science. Then let it take a look at journal articles and their metadata. In short, you need a way to vet for quality.

pknerd 3 years ago

Can anyone point me to some tutorials on using the GPT-3 model with a custom dataset? I am a Python programmer.
