Ask HN: What were the papers on the list Ilya Sutskever gave John Carmack?

396 points by alan-stark 3 years ago · 143 comments

John Carmack's new interview on AI/AGI [1] carries a puzzle:

“So I asked Ilya Sutskever, OpenAI’s chief scientist, for a reading list. He gave me a list of like 40 research papers and said, ‘If you really learn all of these, you’ll know 90% of what matters today.’ And I did. I plowed through all those things and it all started sorting out in my head.”

What papers do you think were on this list?

[1] https://dallasinnovates.com/exclusive-qa-john-carmacks-different-path-to-artificial-general-intelligence/

dang 3 years ago

Recent and related:

John Carmack’s ‘Different Path’ to Artificial General Intelligence - https://news.ycombinator.com/item?id=34637650 - Feb 2023 (402 comments)

sillysaurusx 3 years ago

"The email including them got lost to Meta's two-year auto-delete policy by the time I went back to look for it last year. I have a binder with a lot of them printed out, but not all of them."

RIP. If it's any consolation, it sounds like the list is at least three years old by now. Which is a long time considering that 2016 is generally regarded as the date of the deep learning revolution.

  • pengaru 3 years ago

    > If it's any consolation, it sounds like the list is at least three years old by now.

    In my experience when it comes to learning technical subjects from a position of relative total ignorance, it's the older resources that are the easiest to bootstrap knowledge from. Then you basically work your way forward through the newer texts, like an accelerated replay of a domain's progress.

    I think it's kind of obvious that this would be the case when you think about it. Just like how history textbooks can't keep growing in size to give all past events an equal treatment, nor can technical references as a domain matures.

    You're forced to toss out stuff deemed least relevant to today, and in technical domains that's often stuff you've just started assuming as understood by the reader... where early editions of a new space would have prioritized getting the reader up to speed in something totally novel to the world.

  • moglito 3 years ago

    "considering that 2016 is generally regarded as the date of the deep learning revolution" --

    I thought it was 2012, when AlexNet took the ImageNet crown?

    • sillysaurusx 3 years ago

      That's probably fair. But you'd be hard-pressed to find a DL stack to try out your ideas with prior to 2016, since that's when TensorFlow launched. :)

      (Gosh, it's been less than a decade. Time sometimes doesn't fly, considering how much it's changed the world since then...)

  • vtantia 3 years ago

    Whoops, Carmack referenced the thread and tagged Ilya in it, in a veiled request to publish the list - https://twitter.com/ID_AA_Carmack/status/1622673143469858816

  • mellosouls 3 years ago

    Sorry - where is that sourced from? Or are you meaning it was a personal communication to you? Or it's a joke?

querez 3 years ago

A lot of other posts here are biased to recent papers, and papers that had "a big impact", but miss a lot of foundations. I think this reddit post on the most foundational ML papers gives a lot more balanced overview: https://www.reddit.com/r/MachineLearning/comments/zetvmd/d_i...

sho_hn 3 years ago

> "You’ll find people who can wax rhapsodic about the singularity and how everything is going to change with AGI. But if I just look at it and say, if 10 years from now, we have ‘universal remote employees’ that are artificial general intelligences, run on clouds, and people can just dial up and say, ‘I want five Franks today and 10 Amys, and we’re going to deploy them on these jobs,’ and you could just spin up like you can cloud-access computing resources, if you could cloud-access essentially artificial human resources for things like that—that’s the most prosaic, mundane, most banal use of something like this."

So, slavery?

  • hosolmaz 3 years ago
    • sho_hn 3 years ago

      I was quoting "Measure of a Man" :-)

      "Lena" is a bit of different case because it's not AGI. Probably ripe for the "forced prison labor" suggested by your sibling as the moral cop-out. Imagine being sentenced to being a cloud VM image!

      • EamonnMR 3 years ago

        Is there a good way to distinguish between the brain dumps in Lena and what you'd call an AGI?

        • sho_hn 3 years ago

          A brain dump has a history, and we ascribe meaning to the past. The thread here has already mentioned forced prison labor as a form of socially acceptable slavery, and society could convince itself that a given brain dump deserves its fate, even that it is a form of atonement.

          Artificial life, on the other hand, is presumably "pure at birth".

          Of course it's not that easy. You could discuss whether individual instances have unique sets of human rights, and value potential futures over pasts.

  • i_s 3 years ago

    Sounds like 'Age of Em' by Robin Hanson: https://ageofem.com/

  • aj7 3 years ago

    Computer time is paid for.

    • zo1 3 years ago

      What happens when we digitize ourselves and can run said snapshot image on "computer time"? We can barely cope with legal issues in the digital age now.

    • inawarminister 3 years ago

      Isn't it a similar problem to humans in a hostile biosphere, like the promised Martian colonies?

      Even breathing and drinking water are sourced by megacorps-in-charge

    • sho_hn 3 years ago

      Will the AIs own the computers?

  • klabb3 3 years ago

    I think there’s broad consensus that slavery only applies to human labor. Even within that spectrum people avoid the term (see forced prison labor). We also don’t use it for animal labor, for instance.

    • sho_hn 3 years ago

      > animal labor

      The context uses human-like/human-level a lot, but I agree what level and type of intelligence commands human respect is tricky business.

      > forced prison labor

      Would be interesting if we found ways to convince ourselves the AIs had it coming.

      Generally speaking, slavery has been morally acceptable and popular before, and I will also not be surprised if we return to those ways.

    • bathtub365 3 years ago

      Human slaves were often considered to be less than human or, at the very least, not deserving of basic rights that other humans enjoyed, as part of the moral and ethical frameworks that supported the practice. I think we might see the same shift in dominant ideology if we do have “true” AGI. I’m sure I could be convinced that an intelligence that develops and grows over a number of years begins to have a right to exist and a right to freedom of expression and movement.

      • sho_hn 3 years ago

        Given the outcry/backlash over Dall-E/ChatGPT (what is "real art", etc.) and how much of our society is already permeated by a search for (perceived) authenticity, I wonder if you're right. We might decide "artificial" lifeforms are a lower class than those "evolved in nature". For many religions this could be a natural take: made by God vs. the folly of man, etc.

    • mike_d 3 years ago

      If we ever conjure a way to capture the human consciousness and preserve it before death, "AI" will be based on indentured servitude.

      The people given a second chance at life will be the ones who are quickest at identifying traffic signals or fire hydrants from a line up of images.

  • robotburrito 3 years ago

    I wonder if those Franks and Amys would just think they are working remote jobs hammering out tickets from their studio apartments lol.

optimalsolver 3 years ago

Carmack says he's pursuing a different path to AGI, then goes straight to the guy at the center of the most saturated area of machine learning (deep learning)?

I would've hoped he'd be exploring weirder alternatives off the beaten path. I mean, neural networks might not even be necessary for AGI, but no one at OpenAI is going to tell Carmack that.

  • GuB-42 3 years ago

    If you want to be off the beaten path, you have to know where the beaten path is.

    Otherwise you may end up walking in the ditch beside the beaten path. It is slow and difficult, and it won't get you anywhere new.

    For example, you may try an approach that doesn't look like deep learning, but after a lot of work, realize that you actually reinvented deep learning, poorly. We call these things neurons, transformers, backpropagation, etc., but in the end, it is just maths. If your "alternative" ends up being very well suited to linear algebra and gradient descent, then once you have found the right formulas, you may realize that they are equivalent to the ones used in traditional "deep learning" algorithms. It helps to recognize this early and take advantage of all the work done before you.
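
    To make the "it is just maths" point concrete, here is a minimal sketch (plain NumPy, purely illustrative) of the gradient-descent loop that most "alternative" formulations tend to collapse back into:

      import numpy as np

      # Toy data: y = 3x + 1 plus a little noise
      rng = np.random.default_rng(0)
      x = rng.uniform(-1, 1, size=(100, 1))
      y = 3 * x + 1 + 0.1 * rng.normal(size=(100, 1))

      # Parameters of a linear model y_hat = w * x + b
      w, b = 0.0, 0.0
      lr = 0.1  # learning rate

      for step in range(500):
          y_hat = w * x + b
          err = y_hat - y
          # Gradients of the mean squared error w.r.t. w and b
          grad_w = 2 * np.mean(err * x)
          grad_b = 2 * np.mean(err)
          # Gradient-descent update
          w -= lr * grad_w
          b -= lr * grad_b

      print(w, b)  # should approach 3 and 1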

  • mindcrime 3 years ago

    Wouldn't it be fair to say that one has to know what the current path is and have some idea where it leads and what its issues are, before forging a new path?

    I mean, any idiot can go off-trail and start blundering around in the weeds, and ultimately wind up tripping, falling, hitting their head on a rock, and drowning to death in a ditch. But actually finding a new, better, more efficient path probably involves at least some understanding of the status quo.

    • agar 3 years ago

      > probably involves at least some understanding of the status quo.

      Oh man, you had me going with such a vivid metaphor. I was really hoping for a payoff in the end, but you abandoned it. The easy close would be "probably involves at least some understanding of the existing terrain" but I was optimistic for something less prosaic.

      • mindcrime 3 years ago

        Sorry to disappoint. My creative juices aren't flowing today I guess. Need more coffee, or something!

    • someweirdperson 3 years ago

      To walk a path, no knowledge of the existing ones is needed. But to be able to claim the path is new, that knowledge is needed. Even more so to be able to claim that the new path is better.

      • fnordpiglet 3 years ago

        Bias and ignorance are two different things. No knowledge is ignorance; bias is using knowledge to judge new knowledge. The goal isn't to pursue things with raging ignorance, but to pursue them with no bias, collecting knowledge without conclusion; then, once you're knowledgeable about what is there, you can take off with raging ignorance in the direction no one has gone before. But you can't do that while holding bias, any more than you can while ignorant of what directions have been gone before.

  • ly3xqhl8g9 3 years ago

    The most off-the-beaten-path approach to AGI I heard through the grapevine is to not have artificial neural networks, as in algorithms involving matmul running on silicon, at all. Instead, following the path of "the laziest engineer is the best engineer", it relies on the fact that neurons, actual neurons from someone's brain, already "know" how to form efficient, good-enough, general learning architectures, and therefore in order to obtain programmatic human-like intelligence one would 'simply'† have to implant them not in mice [1] but in an actual vat and 'simply' interface with whatever a group of neurons can be called, a soma(?). Given this Brain-on-a-Chip architecture, we wouldn't have to stick GPUs in our cars to achieve self-driving, but even more wetware (and of course, ignore the occasional screams of dread as the wetware becomes aware of itself and how condemned it is to an existence of left-right-accelerate-brake).

    It would have been interesting seeing someone like Carmack going in this direction, but from the little details he gave he seems less interested in cells and Kjeldahl flasks and more of the same type-a-type-a on the ol' QWERTY.

    † 'simply' might involve multiple decades of research and Buffett knows how many billions

    [1] Human neurons implanted in mice influence behavior, https://www.nature.com/articles/s41586-022-05277-w

  • pavon 3 years ago

    What a waste it would be to think you are pursuing a different path only to discover you spent a year reinventing something that you could have learned by reading papers for a few days.

    • drakenot 3 years ago

      > "I have been amazed at what we've found here," he told them. "A few weeks ago I would not have believed, did not believe, that records such as you have in your Memorabilia could still be surviving from the fall of the last mighty civilization. It is still hard to believe, but evidence forces us to adopt the hypothesis that the documents are authentic.

      > Their survival here is incredible enough; but even more fantastic, to me, is the fact that they have gone unnoticed during this century, until now. Lately there have been men capable of appreciating their potential value– and not only myself. What Thon Kaschler might have done with them while he was alive!– even seventy years ago."

      > The sea of monks' faces was alight with smiles upon hearing so favorable a reaction to the Memorabilia from one so gifted as the thon. Paulo wondered why they failed to sense the faint undercurrent of resentment– or was it suspicion?– in the speaker's tone. "Had I known of these sources ten years ago," he was saying, "much of my work in optics would have been unnecessary." Ahha! thought the abbot, so that's it. Or at least part of it. He's finding out that some of his discoveries are only rediscoveries, and it leaves a bitter taste. But surely he must know that never during his lifetime can he be more than a recoverer of lost works; however brilliant, he can only do what others before him had done. And so it would be, inevitably, until the world became as highly developed as it had been before the Flame Deluge.

      -- A Canticle for Leibowitz

    • stuntkite 3 years ago

      That's a constant cycle for me. The stuff that grows out of it is what keeps growing and sticking around, and I don't find other literature directly replacing or enhancing it. When I do find things that replace a bunch of my work, I'm thrilled, because I don't have to do that anymore and can focus my energy on the other threads. Every once in a while I get competitive and it hurts, but honestly, if I find something that gets me that way, I've got a special appreciation for that moment.

  • ramraj07 3 years ago

    This is pretty much the same deal in biology as well. At Calico, at Verily, at CZI, even at Allen, same story: they say they will reinvent biology research, and then they go get the same narrow-minded professors and CEOs who run the status quo and end up as one more of the same.

    Neuralink is the only place where this pattern seemed to break a bit, but then it seems Elon went down his own path, pushing for faster results and breaking basic ethics.

    • 93po 3 years ago

      > breaking basic ethics

      This didn't happen

      • optimalsolver 3 years ago
        • 93po 3 years ago

          This criticism is coming from an "ethics group" that is literally funded by PETA and is frequently criticized by an actual, legitimate group: the American Medical Association. It's baseless garbage, and the hypocrisy is not lost on me that it's published on Fortune, which is owned by a billionaire whose majority wealth comes from Charoen Pokphand. This company is responsible for some of the worst factory farming conditions on the planet along with being accused by the Guardian of using slave labor on their shrimping boats - an accusation they later admitted to. Fortune in general is a shit publication with an axe to grind against Elon.

          • optimalsolver 3 years ago

            The criticism is coming from current and former employees, regardless of who's amplifying the message.

  • sinenomine 3 years ago

    The amount of disdain academically inclined people express towards reductionist engineering-first paradigms is hilarious and depressing.

    The denial of an obviously fertile paradigm feels like such a useless, self-defeating loss, just to indulge in an intellectual status game.

    We could be all better off right now if connectionists were given DOE-grade supercomputers in the 90s, and were supplied with custom TPUs later in the 00s as their ideas were proven generally correct via rigorous experimentation on said DOE supercomputers. This didn't happen due to what amounts to academic bullying culture: https://en.wikipedia.org/wiki/Perceptrons_(book)

    The sheer scale of cumulative losses we suffered (at least in part) due to this denial of the connectionism as a generally useful foundational field will be estimated somewhere in the astronomical powers of ten in the future, where the fruits of this technology will provide radically better lives for us and our descendants.

    I see you have a knee-jerk reaction to hype and industry, and we are all fearing replacement unless it's the stock market doing the work for us ... but why do you feel the need to punch down at this prosaic field "about nonlinear optimization"? The networks in question just want to learn, and to help us, if we train them to this end - and we make any and all excuses to avoid receiving this help, as our civilization quietly drowns in its own incompetency...

  • throwaway4837 3 years ago

    Did you read the full article? In science, you should usually have a very solid understanding of what the top minds in the field are fixated on as it allows you to try something different with confidence, and prevents you from pulling a Ramanujan, reinventing the exact same wheel. I can't think of a single scientist who caused a paradigm shift and didn't have an intimate understanding of the current status quo.

  • albertzeyer 3 years ago

    It is possible to use neural networks and still be on a quite different path than the mainstream.

    Of course, there is a group of people defending symbolic computation, e.g. see Gary Marcus, always pushing back on connectionism (neural networks).

    But this is somewhat of a spectrum, or rather sloppy terminology. Once you go away from symbolic computation, many things can be interpreted as a neural network. And there is also all of computational neuroscience, which also works with some variants of neural networks.

    And there is the human brain, which demonstrates that a neural network is capable of AGI. So why would you not want a neural network? But that does not mean you cannot still do many things very differently from the mainstream.

chrgy 3 years ago

From ChatGPT. Personally I think this list is a bit old, but it should get you to the 60% mark at the very least.

Deep Learning: AlexNet (2012), VGGNet (2014), ResNet (2015), GoogleNet (2015), Transformer (2017)

Reinforcement Learning: Q-Learning (Watkins & Dayan, 1992), SARSA (R. S. Sutton & Barto, 1998), DQN (Mnih et al., 2013), A3C (Mnih et al., 2016), PPO (Schulman et al., 2017)

Natural Language Processing: Word2Vec (Mikolov et al., 2013), GLUE (Wang et al., 2018), ELMo (Peters et al., 2018), GPT (Radford et al., 2018), BERT (Devlin et al., 2019)

  • loveparade 3 years ago

    You are getting downvoted because this list is from ChatGPT, but as a researcher in the field, this list is actually really good, except perhaps for the SARSA and GLUE papers, which are less generally relevant. I would add WaveNet, the Seq2Seq paper, GANs, some optimizer papers (e.g. Adam), diffusion models, and some of the newer Transformer variants.

    I'm very confident that this is pretty much what any researcher, including Ilya, would recommend. It really isn't hard to find those resources, they are simply the most cited papers. Of course you can go deeper into any of the subfields if you desire.

ilaksh 3 years ago

My guess is that multimodal transformers will probably eventually get us most of the way there for general purpose AI.

But AGI is one of those very ambiguous terms. For many people it's either an exact digital replica of human behavior that is alive, or something like a God. I think it should also apply to general purpose AI that can do most human tasks in a strictly guided way, although not have other characteristics of humans or animals. For that I think it can be built on advanced multimodal transformer-based architectures.

For the other stuff, it's worth giving a passing glance to the fairly extensive amount of research that has been labeled AGI over the last decade or so. It's not really mainstream, except maybe in the last couple of years, because really forward-looking people tend to be marginalized, including in academia.

https://agi-conf.org

Looking forward, my expectation is that things like memristors or other compute-in-memory will become very popular within say 2-5 years (obviously total speculation since there are no products yet that I know of) and they will be vastly more efficient and powerful especially for AI. And there will be algorithms for general purpose AI possibly inspired by transformers or AGI research but tailored to the new particular compute-in-memory systems.

  • TimPC 3 years ago

    Why do you think multimodal transformers will get us anywhere near general purpose AI? Multimodal transformers are basically a technology for sequence-to-sequence intelligent mappings and it seems to me extremely unlikely that general intelligence is one or more specific sequence-to-sequence mappings. Many specific purpose problems are sequence-to-sequence but these tend to be specialized functionalities operating in one or more specific domains.

    • RC_ITR 3 years ago

      A lot of people don't really get that our brains are a bunch of specialized subcomponents that work in concert (your pre-frontal cortex just cannot beat your heart, no matter how optimized it gets). This is unsurprising, as our brains are one of the most complex, hardest-to-monitor things on earth.

      When an artificial tool that is really a point solution "tricks" us into thinking it has replicated a task that requires complex multi-component functioning within our brain, we assume the tool is acting like our brain is acting.

      The joke, of course, being that if you maliciously edited GPT's index for translating vectors to words, it would produce gibberish and we wouldn't care (despite it being the exact same core model).

      We are only impressed by the complex sequence-to-sequence strings it makes because the tokens happen to be words (arguably the most important things in our lives).

      EDIT: a great historical metaphor for this is how we thought about 'computer vision' and CNNs. They do great at identifying things in images, but notice that we still use image-based captchas (even on OpenAI sites, no less)?

      That's because it turns out optical illusions and context-heavy images are things that CNNs really struggle with (since the problem space is bigger than 'how are these pixels arranged').

    • ilaksh 3 years ago

      A couple of things.

      1) As I said, many people have different ideas of what we are talking about. I assume that for you general purpose AI has more capabilities, such as the ability to quickly learn tasks to a high level on the fly. For me, it still qualifies as general purpose if it can do most tasks but relies on a lot of pre-training and let's say knowledgebase look up.

      2) It seems obvious to me that ChatGPT proves a general purpose utility for these types of LLMs, and it is easy to speculate that something similar but with visual input/output also will be even more general. And so we are just looking at a matter of degree by that definition.

      • TimPC 3 years ago

        For 1) I agree, but ChatGPT is a specific-purpose sequence-to-sequence model. It's fairly obvious to me that it's not general purpose, and it even sometimes fails at correctly reading content it generates. It also doesn't understand correctness and often ends up generating incorrect content. Our best example of it not being general purpose is how staggeringly bad ChatGPT is at math, which is blatantly obvious when you think about how it is designed.

  • mirekrusin 3 years ago

    AGI will be AI which can improve its own code after N iterations, where N will be blurry.

jimmySixDOF 3 years ago

>90% of what matters today

Strikes me as the kind of thing where that last 10% will need 400 papers

  • mindcrime 3 years ago

    "The first 90% is easy. It's the second 90% that kills ya."

    • kabdib 3 years ago

      "All projects are divided into three phases, each consisting of 90% of the work."

      -- just about everything I've shipped :-)

  • michpoch 3 years ago

    For the last 10% you'll need to write a paper yourself.

  • tikhonj 3 years ago

    Along with the kind of details and tacit knowledge that never makes it into papers...

  • swyx 3 years ago

    Maybe that's the part where he intends to deviate. He just doesn't need to reinvent the settled science.

codeviking 3 years ago

This inspired us to do a little exploration. We used the top cited papers of a few authors to produce a list that might be interesting, and to do some additional analysis. Take a look: https://github.com/allenai/author-explorer

hexhowells 3 years ago

While not all papers, this list contains a lot of important papers, writings, and conversations currently in AI: https://docs.google.com/document/d/1bEQM1W-1fzSVWNbS4ne5PopB...

albertzeyer 3 years ago

(Partly copied from https://news.ycombinator.com/item?id=34640251.)

On models: Obviously, almost everything is Transformer nowadays (Attention is all you need paper). However, I think to get into the field, to get a good overview, you should also look a bit beyond the Transformer. E.g. RNNs/LSTMs are still a must learn, even though Transformers might be better in many tasks. And then all those memory-augmented models, e.g. Neural Turing Machine and follow-ups, are important too.

It also helps to know different architectures, such as just language models (GPT), attention-based encoder-decoder (e.g. original Transformer), but then also CTC, hybrid HMM-NN, transducers (RNN-T).

Some self-promotion: I think my Phd thesis does a good job on giving an overview on this: https://www-i6.informatik.rwth-aachen.de/publications/downlo...

Diffusion models are another recent, different kind of model.

Then, a separate topic is the training aspect. Most papers do supervised training, using a cross-entropy loss against the ground-truth target (a minimal sketch of that default setup is below). However, there are many others:

There is CLIP to combine text and image modalities.

There is the whole field on unsupervised or self-supervised training methods. Language model training (next label prediction) is one example, but there are others.

And then there is the big field on reinforcement learning, which is probably also quite relevant for AGI.
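
As a concrete anchor for the supervised default mentioned above, here is a minimal sketch of cross-entropy training (PyTorch-style; the model, sizes, and data are made up for illustration). Most of the alternatives listed here keep this loop and swap out the loss and/or the source of the targets:

    import torch
    import torch.nn as nn

    # Toy classifier: 10 input features, 5 classes (all sizes are illustrative)
    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 5))
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Fake supervised data: inputs and ground-truth class labels
    x = torch.randn(64, 10)
    targets = torch.randint(0, 5, (64,))

    for step in range(100):
        logits = model(x)
        loss = loss_fn(logits, targets)  # cross entropy to the ground-truth target
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()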

  • hardware2win 3 years ago

    I do wonder whether the people behind the "Attention Is All You Need" paper will receive a Turing Award.

    It is being cited often.

    • RC_ITR 3 years ago

      >Will receive Turing Award

      This is the weird thing: hopefully not! Hopefully there are even better NN models coming out every 5-10 years, and we look back on transformers as 'just a phase', sort of like how we look back today at RNNs (which were no less of an amazing achievement; look at the proliferation of voice assistants) as potentially obsolete technology.

      For example, attention is great and does a really good job of simulating context in language, but what if we come up with a clever way to simulate symbology? Then we actually are back on the path to AGI, and transformers will look like child's play.

      • Beldin 3 years ago

        > symbology

        Off-topic, but now I have Willem Dafoe going "What's the 'symbology' here? The symbolism ..." in my head (from Boondock Saints).

        • Gee101 3 years ago

          Even though I watched that movie 20 years ago, I will never forget that scene.

    • modeless 3 years ago

      The Adam optimizer is another possibility. It's unbelievably good and everyone uses it.
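
      For reference, the update rule itself is tiny. A minimal per-parameter sketch in NumPy, using the default hyperparameters from the Adam paper:

        import numpy as np

        def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
            """One Adam update; t is the 1-based step count."""
            m = beta1 * m + (1 - beta1) * grad       # moving average of the gradient
            v = beta2 * v + (1 - beta2) * grad ** 2  # moving average of its square
            m_hat = m / (1 - beta1 ** t)             # bias correction
            v_hat = v / (1 - beta2 ** t)
            theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
            return theta, m, v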

    • albertzeyer 3 years ago

      The authors did not really expect it to be such a huge influence. You could also argue it is a somewhat natural next step. This paper did not invent self-attention nor attention. Attention was already very popular, specifically for machine translation, and a few other papers had already used self-attention at that point in time. It was just the first paper which used attention and self-attention alone and nothing else.
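
      For anyone new to the paper, the core operation is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. A minimal single-head sketch in NumPy (illustrative only, no masking or multi-head machinery):

        import numpy as np

        def attention(Q, K, V):
            """Q, K: (seq_len, d_k), V: (seq_len, d_v). Returns attention-weighted values."""
            d_k = Q.shape[-1]
            scores = Q @ K.T / np.sqrt(d_k)                 # how much each query attends to each key
            weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
            weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
            return weights @ V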

    • mirekrusin 3 years ago

      The guy who said "I don't understand all of this, can we just throw more machines?" should get the award.

    • seydor 3 years ago

      I remember an interview with one of the founders of OpenAI saying that if it wasn't the transformer architecture, it would be something else. What really matters is the scale of the model. The transformer is only one of the possible configurations that work well with text. It seems they stuck with it because it is really that good, so why break things.

    • mattcaldwell 3 years ago

      Came here expecting a Haiku.

    • PartiallyTyped 3 years ago

      Attention existed before that paper and had been incorporated into LSTMs up to that point in time.

  • alan-starkOP 3 years ago

    Thanks for sharing. Cool to see someone from the Aachen NLP group. I'll be visiting the Aachen/Düsseldorf/Heidelberg area in spring. Do you know of any local ML meetups open to the general (ML engineer/programmer) public?

    • albertzeyer 3 years ago

      Unfortunately, not really. We used to have some RWTH-internal meetups, although those have been somewhat interrupted since Corona and have not really recovered afterwards.

      Aachen has quite a few companies with activity on NLP or speech recognition, mostly due to my professor Hermann Ney. E.g. there is Apple, Amazon, Nuance, eBay. And lesser-known AppTek. And in Cologne, you have DeepL. In all those companies, you find many people from our group. And then, at the RWTH Aachen University, you have our NLP/speech group, and also the computer vision group.

klaussilveira 3 years ago

Following: https://twitter.com/u3dcommunity/status/1621524851898089478?...

polskibus 3 years ago

What about just asking Carmack on twitter?

KRAKRISMOTT 3 years ago

Start tweeting at him until he shares

  • fnordpiglet 3 years ago

    Clearly do this by tweet storming him via LLM

    • steveBK123 3 years ago

      As an AI LLM, I cannot decide which academic papers are "best" as the idea of "best" is subjective and there are many different factors that need to be considered.

      • cwillu 3 years ago

        I apologize for the oversight, you are correct. Let me know if there's anything else I can help you with.

EvgeniyZh 3 years ago

Attention, scaling laws, diffusion, vision transformers, Bert/Roberta, CLIP, chinchilla, chatgpt-related papers, nerf, flamingo, RETRO/some retrieval sota

  • seydor 3 years ago

    what do you mean 'scaling laws'?

    • EvgeniyZh 3 years ago

      J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020.

      and multiple follow-ups
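
      The headline result is that test loss falls as a power law in model size (and analogously in data and compute): roughly L(N) = (N_c / N)^alpha_N. A tiny sketch, using the approximate constants reported in the Kaplan et al. paper (treat the numbers as ballpark, not gospel):

        def scaling_law_loss(n_params, n_c=8.8e13, alpha=0.076):
            """Kaplan-style power law in parameter count (constants approximate)."""
            return (n_c / n_params) ** alpha

        # Predicted loss ratio between a 1B- and a 100B-parameter model
        print(scaling_law_loss(1e9) / scaling_law_loss(1e11))  # ~1.4x higher loss for the smaller model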

username3 3 years ago

They asked on Twitter and he didn’t reply. We need someone with a blue check mark to ask. https://twitter.com/ifree0/status/1620855608839897094

winwhiz 3 years ago

I had read that somewhere else and this is as far as I got

https://twitter.com/id_aa_carmack/status/1241219019681792010

throwaway4837 3 years ago

Wow, crazy coincidence that you all read this article yesterday too. I was thinking of emailing one of them for the list, then I fell asleep. Cold emails to scientists generally have a higher success-rate than average in my experience.

cloudking 3 years ago

Ilya's publications may be on the list https://scholar.google.com/citations?user=x04W_mMAAAAJ&hl=en

daviziko 3 years ago

I wonder what Ilya Sutskever would recommend as an updated list nowadays. I don't have a twitter account, otherwise I'd ask him myself :)

Phil_Latio 3 years ago

Not in the list: https://arxiv.org/pdf/1805.09001.pdf

evc123 3 years ago

https://arxiv.org/abs/2210.14891

adt 3 years ago

https://lifearchitect.ai/papers/

vikashrungta 3 years ago

I posted a list of papers on twitter, and will be posting a summary for each of them as well. Here is the list: https://twitter.com/vrungta/status/1623343807227105280

Unlocking the Secrets of AI: A Journey through the Foundational Papers by @vrungta (2023)

1. "Attention is All You Need" (2017) - https://arxiv.org/abs/1706.03762 (Google Brain) 2. "Generative Adversarial Networks" (2014) - https://arxiv.org/abs/1406.2661 (University of Montreal) 3. "Dynamic Routing Between Capsules" (2017) - https://arxiv.org/abs/1710.09829 (Google Brain) 4. "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks" (2016) - https://arxiv.org/abs/1511.06434 (University of Montreal) 5. "ImageNet Classification with Deep Convolutional Neural Networks" (2012) - https://papers.nips.cc/paper/4824-imagenet-classification-wi... (University of Toronto) 6. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" (2018) - https://arxiv.org/abs/1810.04805 (Google) 7. "RoBERTa: A Robustly Optimized BERT Pretraining Approach" (2019) - https://arxiv.org/abs/1907.11692 (Facebook AI) 8. "ELMo: Deep contextualized word representations" (2018) - https://arxiv.org/abs/1802.05365 (Allen Institute for Artificial Intelligence) 9. "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context" (2019) - https://arxiv.org/abs/1901.02860 (Google AI Language) 10. "XLNet: Generalized Autoregressive Pretraining for Language Understanding" (2019) - https://arxiv.org/abs/1906.08237 (Google AI Language) 11. T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" (2020) - https://arxiv.org/abs/1910.10683 (Google Research) 12. "Language Models are Few-Shot Learners" (2021) - https://arxiv.org/abs/2005.14165 (OpenAI)

theusus 3 years ago

like papers are that comprehensible.

mgaunard 3 years ago

In my experience, all deep learning is overhyped, and most needs that are not already addressable by linear regression can be met with simple supervised learning.
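
For what it's worth, the baseline being described here is a couple of lines; a hypothetical toy example with scikit-learn:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Made-up tabular data: y is roughly 1*x1 + 2*x2
    X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
    y = np.array([5.0, 4.0, 11.0, 10.0])

    model = LinearRegression().fit(X, y)
    print(model.coef_, model.intercept_)  # fitted weights and bias
    print(model.predict([[5.0, 5.0]]))    # prediction for a new point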
