Ask HN: Become a 10x developer with LLM? Myth? Reality?
I keep hearing from VCs on the internet, mainly right now on the All-In podcast, that there are people who have leveraged LLMs to become 10x devs.
What stack are they using? I am thinking it's pretty much any LLM (GPT-3/GPT-4), plus LangChain with GPT Index.
I am guessing they build boilerplate templates, fill in functions with GPT outputs, and then use GPT to write unit tests for the base cases? Something like the sketch below.
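If that guess is roughly right, the loop is easy to picture. A minimal sketch using the 2023-era openai Python client (pre-1.0 API); the stub function, prompts, and model choice are illustrative, not anyone's actual stack:

    # pip install openai  (0.x-era client; reads OPENAI_API_KEY from the environment)
    import openai

    def complete(prompt: str) -> str:
        # One chat-completion call; returns the assistant's text.
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp["choices"][0]["message"]["content"]

    # Hypothetical stub to fill in, then base-case tests for the result.
    stub = "def slugify(title: str) -> str:\n    ...\n"
    body = complete("Fill in this function. Return only code.\n\n" + stub)
    tests = complete("Write pytest tests covering the base cases of:\n\n" + body)
    print(body, tests, sep="\n\n")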
I am guessing the truth is really closer to the centre. It gives me faster access to documentation/Stackoverflow, and creates bespoke templates. So I am "10x", but only for the most tedious "copy/paste" parts of development. For the hard parts, and the time spent thinking about design/flow, it does basically nothing. So am I "10x" overall? Absolutely not. I am barely even "2x" relative to before (since the tedious parts are a minority of my time). For example (ChatGPT): see the select-list prompt/result reproduced at the end of this thread.

To achieve 10x you need to learn exactly how lazy you can be and get the same results :P

me: html select list of orderings: fifo, random, rr, lifo

GPT-4: To create an HTML select list with different orderings, such as FIFO (First In, First Out), Random, RR (Round Robin), and LIFO (Last In, First Out), you can use the following HTML code: [full page reproduced at the end of this thread]

OK, not exactly the same, but I just wanted to GPT-golf it. When lazymaxing for real, I just try to keep in mind that ChatGPT is already fine-tuned to include certain things, like following proper conventions, explaining its steps, and commenting code. And most of the time it'll make an educated guess rather than looking at me going "huh?" if it's unsure, so if there's really only one reasonable interpretation of terse instructions, I can drop the human-friendly cruft text.

> Keep in mind if I hand-wrote this I still have to write out the text values, which are 50% of the prompt's length. It did the names/IDs and the boilerplate.

This is the kind of thing that I really struggle with when trying to do something like this. Often the part that takes me the longest when writing snippets like those _is_ the copy (or at least, collecting my thoughts and/or notes to get the copy into my working memory); the code is quick enough. If I spend that same amount of time (or more) writing the prompt, I don't feel like I've saved much time.

I agree, but that's why I like the above. I spend just as much time thinking of the text as I do the values/IDs/names, and it did that part for me. So in terms of mental workload, it is cut in half. The above example is kind of silly in that FIFO/LILO are common terms, but this works just as well for very niche business names.

> It gives me faster access to documentation/Stackoverflow, and creates bespoke templates.

But it regurgitates Stackoverflow with sub-100% accuracy, and then you need to spend time investigating what went wrong and trying to fix it.

It has for sure spat out an API that doesn't exist to me before.

> But it regurgitates Stackoverflow with sub-100% accuracy, and then you need to spend time investigating what went wrong and trying to fix it.

Since it is bespoke boilerplate, it isn't "just" documentation or Stackoverflow. Take this example: I gave it a library, I gave it custom names for everything, and the result is pretty time-saving. (See the Bootstrap prompt/result reproduced at the end of this thread.)

It helps with the tedious tasks, but great programmers are great because of the way they approach and solve problems, not because they write code faster. On a side note: you shouldn't take anything said on the All-In podcast more seriously than a political discussion overheard on a Saturday night in a bar. Be careful with the halo effect.

I've been able to use Phind + Copilot to learn Python and build and deploy a FastAPI service on the OpenAPI spec, without really knowing any of those things. Just ask GPT-4 how FastAPI works, how OpenAPI works, and ask it to format and write your YAML for you.

It's an amazing TA: better than any I've had in undergrad, and way better than office hours, where your prof doesn't really care and wants to go back to their research. I don't have to read through docs for hours and build a mental model; that stuff is taken care of by asking Chat. Now I'm learning how to deploy my own PyTorch model for a real-life use case, using a combination of SAM and Roboflow. I have absolutely no background in any of these; I just ask it dumb questions until I get it. It's also helping me understand how and when to use hashing and merkle trees for a related project (the core idea fits in a few lines; see the sketch below). For me this is xInfinite, because I'm doing stuff I'd been too lazy to learn and understand (e.g. vision). And I'm at the core a UX/product designer and absolutely not an engineer.
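For the curious: the merkle-tree idea mentioned above really is small. A minimal Python sketch (SHA-256, duplicating the last node on odd levels; one common convention, not the only one):

    import hashlib

    def merkle_root(leaves: list[bytes]) -> str:
        # Hash every leaf, then hash adjacent pairs until one node remains.
        level = [hashlib.sha256(leaf).digest() for leaf in leaves]
        while len(level) > 1:
            if len(level) % 2:  # odd count: duplicate the last node
                level.append(level[-1])
            level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                     for i in range(0, len(level), 2)]
        return level[0].hex()

    print(merkle_root([b"block-1", b"block-2", b"block-3"]))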
This has been my experience too. I've seen a few responses in here comparing time-to-write-x-lines-of-code alone vs. with GPT, but IMO it's not that stuff that makes the biggest difference (although I do like that stuff too). If creating something complex and novel (for you, at least) means constantly pushing at the edges of what you know, your ability to progress that work is gated by a constant flow of challenges that are, to you, insurmountable without the right knowledge. Whether it's an incantation or a mental model or whatever: you need that key to open that gate. ChatGPT (and I want to confine my claims to GPT-4, because I have found it much better than 3.5 Turbo in this regard) is a tool that has the potential to take each of those gates and tell you, converse with you until you understand, what you need to know to progress. That feels incredibly powerful, and I wouldn't hesitate to call it a 10x improvement for those very common scenarios.

As a result, I have found the same thing happening that Simon Willison has been talking about, and that the parent also mentioned: you're in a position to tackle a much wider array of challenges than before. Stuff you would've written off as too time-consuming is suddenly a few minutes' work. I dunno; to me, this whole experience is completely magical. I honestly can't understand how people remain underwhelmed by what's happening. I always want to say "hey, look over here! This is _amazing_!" I think we're still in the stage where people are figuring out how to use a powerful new tool, and there's quite a bit of "trying to drill holes with a hammer". For the types of job at which this particular tool excels, until now we've largely had to do it by hand.

If you have no experience whatsoever, how do you know what to avoid? I can tell that Phind is a great educational tool, but without the proper knowledge it can give you answers you might not know how to cope with, and you need to ask again and again until you get half a convincing answer. It's like reading exclusively the accepted answer on Stackoverflow. There are many reasons why that's wrong: it can be outdated, it might have been upvoted simply by the "wrong" people, etc. I am honestly very scared by the number of people who never wrote one line of code and suddenly can "deploy" things. Not because I am afraid for my job, but simply because in any language and framework there are implicit things to learn which take years to master, and suddenly everyone "knows" how to build a product -> "why does it take our engineers so long? Look, here is the code!" Yeah, that kind of mindset I am afraid of. The wrong tool in the wrong hands. As long as you do it as a hobby, I wish you all the fun in the world.
That's an interesting review of the technology, because to me what you've described raises a lot of red flags. I've read (and written!) plenty of code where the author clearly hasn't taken the time to learn the tools they're using, and just hacked around in circles until they've got everything to work. And that code is usually bad: difficult to extend or change, poorly abstracted, difficult to understand. It has its place, and I understand why people (including myself) write like that sometimes, but I wouldn't want it to become the norm. It sounds like these sorts of tools are making it easier to get started with new ideas, but I'm intrigued to see what long-term maintenance is going to look like.

> It's an amazing TA: better than any I've had in undergrad, and way better than office hours, where your prof doesn't really care and wants to go back to their research.

It may be for trivial things (by which I mean that there is basically a verbatim answer on the internet that it had in its training data), but I don't think it's a good idea in general. If you talk with it on any topic you are knowledgeable about, you notice it spews a bunch of nonsense, and that is really hard to recognize otherwise. And I'm not talking about just hard questions; something as basic as mid-level Java interview questions sometimes has glaring inconsistencies. It sounds techy and detailed, but is ultimately wrong.

> I've been able to use Phind + Copilot to learn Python

> I don't have to [...] build a mental model

You are not actually learning. You can easily write 10x the lines of code, but what matters is deliverables.

My guess (I'm using free ChatGPT rather than v4) is that it's really good at small chunks, but not big ones. On this basis, if you're bad at software architecture it will rapidly get confused and produce a spaghetti monster nobody can even debug; whereas if you're good at architecture, even though you probably didn't feel limited by your typing speed in the first place, it's enough of a boost to make a difference.

It helps me not break my flow to go to Stackoverflow/documentation; I can stay in the zone more continuously. It does worry me about the future of Stackoverflow, because it is unclear how they continue to snowball and stay current if folks are not using the site directly. Also, this is tangential, but when I'm doing PoC application websites etc., it is also helpful to generate images/placeholders easily with image generators (not LLMs, but in the same realm) so I can stay focused on the backend functionality. I recently created a PoC internal-use website in an hour because I didn't have to mess around with finding just the right stock images. Good PoCs, IMHO, should have just enough visuals so as not to distract from testing and the goals of the PoC.

Reality. ChatGPT has been working fine for me, but I have a ton of programming experience so I can easily fix any mistakes. I will be upgrading to Plus soon. Also, Copilot sometimes just does nothing, but there have been many instances now where it does literally exactly what I wanted. It's pretty random.

> Reality. ChatGPT has been working fine for me, but I have a ton of programming experience so I can easily fix any mistakes. I will be upgrading to Plus soon.

I think this is the reality of the current crop of LLMs. It's like having an assistant that you need to constantly check on. If you have lots of experience you can quickly catch the problems and move on, but if you don't, it slows you down.
My feeling is that in every field it touches, it will make experts more productive by reducing the tedious parts; it will help novices too, but they won't see a 10x increase in productivity. My general AI view is that to get the most out of it, you'd better become an expert in your field so you can catch problems with the output fast. Those who don't or can't will have a tough future trying to compete in the job market.

Yes, I totally agree that it is more helpful for people who are already experts. If you're just copying someone else's code, whether from Stackoverflow or an LLM, you haven't made yourself a 10x developer. You may have found a shortcut through the boilerplate and repetitive code that occurs across many applications, but have you learned anything? Do you understand the code? Could you write it yourself? Using a forklift doesn't make me a 10x weightlifter. Finishing a paint-by-numbers canvas doesn't make me an artist. Learning programming languages and writing the trivial bits isn't the hard part of programming: domain knowledge and skillfully translating complex business requirements into working code are.

[ignorant bystander] Since LLMs learn from their training, and I guess from your prompts (per session, at least?), the only participant progressing toward 10x status would be the LLM.

I don't think GPT-4 beats an expert yet. That's why expectations that it can build a complex commercial project from scratch are not realistic. Where it shines is up-levelling the developer in areas where he is not an expert. Need to jump out of Java development and process some data in Python with a special library? It will likely be more than a 10x speed-up to ask GPT-4 for help. Learning a functional language? It will send tons of best practices your way. It's also great at translating ideas between domains: your requirements to code, but also code to tests, data to SQL, etc. I'm happy with how helpful it is in exploring implementation concepts: you can discuss problems you imagine and it will improve the suggestions. A bit like a brainstorm guide. And sometimes it will surprise you with a much simpler idea for the thing you've been overthinking. It's really great when you don't expect it to be perfect. It fails very often due to unclear commands (its limited prompt "memory" is, counterintuitively for humans, a hard limit on what it can understand: how can it not remember what you can see just a bit further up?). It will always try to do what you ask for, but if you assume it knows what you know, you will be disappointed.

Idk if it can turn a 1x developer into a 10x developer, but it can certainly turn a product manager with enough coding knowledge into a passable developer. I just wrote a tweet thread about this today - https://twitter.com/rrichrs/status/1646885705790914564?s=20

See https://simonwillison.net/2023/Mar/27/ai-enhanced-developmen...

> What stack are they using? I am thinking it's pretty much any LLM (GPT-3/GPT-4), plus LangChain with GPT Index.

I'm skeptical of any "stack" that isn't just ChatGPT + GPT-4. It's probably whatever their flash-in-the-pan startup is building right now. To me this seems like a horrible time to build up such a stack (personal or otherwise) rather than just getting proficient at ChatGPT and the plain old OpenAI API while waiting for new stuff like plugins to come out.

Personally, I just have a go-to tech stack for GPT-based projects composed of Python, Poetry, Datasette, and click CLIs written by GPT-4 that can call out to OpenAI in order to manipulate databases (to then be viewed in Datasette). The rough shape of such a CLI is sketched below.
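A minimal sketch of that kind of CLI, again assuming the 0.x openai client; the table name, database file, and prompt handling are made up for illustration:

    # pip install click openai  (0.x-era client); browse the result with: datasette notes.db
    import sqlite3

    import click
    import openai

    @click.command()
    @click.argument("prompt")
    def ask(prompt: str) -> None:
        """Send PROMPT to OpenAI and log the exchange to a SQLite database."""
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        answer = resp["choices"][0]["message"]["content"]
        db = sqlite3.connect("notes.db")
        db.execute("CREATE TABLE IF NOT EXISTS notes (prompt TEXT, answer TEXT)")
        db.execute("INSERT INTO notes VALUES (?, ?)", (prompt, answer))
        db.commit()
        click.echo(answer)

    if __name__ == "__main__":
        ask()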
But this is more for building little demos to try out GPT than for incorporating into other work. Otherwise I just use ChatGPT alone. But TBH, I'm far from 10x.

I don't know about 10x, but it has enabled me to explore new approaches much faster and write a lot of the code. My old workflow would be to search documentation, blog posts, Stack Overflow, and anything else that might give me a clue. Usually there isn't something that does exactly what I want, and the documentation usually leaves a lot to be desired. New workflow: "please write some code using X library in Y language that does Z". The code usually mostly works. If there is anything I don't understand, I ask it to explain. It's like having an expert by my side, teaching and writing for me. Not sure it makes me 10x, but it does let me explore more things and get to an understanding a lot faster. I use some of that saved time to try out more ideas, so overall I'm not sure I'm hugely faster, but what I can produce is a lot higher quality, with better features.

I would say that since LLMs arrived on the scene, my code-output efficiency went 2-3x. As a backend dev, I ended up implementing several frontend features at work, stuff I would not have touched previously. I did not blindly implement them; I had ChatGPT teach me how. But I would say I still could not fully understand it all without ChatGPT. And as far as debugging and googling things go, efficiency has definitely gone up 5x in many cases, maybe as much as 10-20x for some. However, my overall efficiency and productivity maybe only went up 2-3x? I don't know; it's hard to assess. Writing code is really only part of the job, so it can only do so much. But I am absolutely sure that even now there will be outliers who went from being 1x developers to the mythical 10x, based on the improvements I have seen in myself.

I have used ChatGPT to cut one of my jobs down to maybe 3 hours of coding work a week. The job is extracting data from JSON and putting it into differently formatted JSON. It does a great job of designing extractors that put it into the other format; the typical shape of those extractors is sketched below.
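For illustration only, with made-up field names on both sides; the kind of extractor ChatGPT tends to produce for that job looks roughly like this:

    import json

    def extract(record: dict) -> dict:
        # Map one source-schema record to the target schema (all fields invented).
        order = record["order"]
        return {
            "id": order["id"],
            "customer": order["buyer"]["name"],
            "total_cents": sum(item["price_cents"] * item["qty"]
                               for item in order["items"]),
        }

    with open("input.json") as src:
        records = json.load(src)
    with open("output.json", "w") as dst:
        json.dump([extract(r) for r in records], dst, indent=2)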
I can sort-of answer with a personal example. A week or two back, I asked Bing to give me the code for a Go API to list books. Here's what I did, and each bullet point was one single step in the conversation:

- It gave me the complete code for a Go API using JSON, with handler and routing, using a map to hold the data.
- I asked it to switch to a website, and it gave me the same code with the route adjusted and the rendering returning HTML to view the books.
- I said I needed a SQLite back-end, and it updated the code to fetch from a database, including the code to open the connection.
- I asked it to seed the database with some random data, and it added a method to do that.
- I asked for a page to add a book, and it gave me the full code back with a route, handler, database code, and form-handling logic to add a book.
- I said I needed the option to delete any book, and it gave me updated HTML with a small form tag and delete button for every book in the list, with code to delete the right book when a button is clicked.

I'm not exaggerating when I say that each of the above was literally a single request. I pasted the returned code into a file and ran it, and in each case it built and worked. At only one stage did it get something wrong. I never kept the prompts/code, as I was experimenting, but basically it had an off-by-one error in a loop where it used less-than-or-equal-to rather than less-than. I described the problem (one prompt) and it "worked it out" and gave me the code back with the line fixed. So that is six prompts, plus an extra one for the bug, and the result was a working (trivial) site written in Go with a database back-end.

The other main use I've had is pasting in a screen or two's worth of existing Python code. No questions, just the code. It explained what the code did (including a nuance I'd missed) and proactively pointed out a bug I hadn't seen. It was incredible to see; then it was worrying (am I needed any more?); then reassuring when I realised this was great for getting started with green-field stuff or iterating on small stuff, but less so for big-picture work on a large codebase.

In my admittedly limited experience (small amounts at work and in personal projects), it's some integer multiplier. For routine code it's really nice, but for something more complicated it was distracting. Copilot had an opinion of what the software should be which wasn't quite right and had to be ripped out later. It feeds on itself, and your program is further shaped in that direction. Of course, I could have made a similarly poor decision, but it's interesting to see that automated to the point of convenience.

Let's first try to become 10x system administrators with LLMs.

What I would like to know is the best way to integrate an LLM and its output into my IDE without paying for something like GitHub Copilot. Any ideas here?

What IDE?

I'm using VS Code.
Edit: for context, I have a GPT-4 API key, so I'd like to do something like what Copilot X is, just obviously much worse and rougher around the edges, considering I'm rolling it myself.

When I search the VS Code extensions for "GPT", I get quite a few. Is there something you are hoping for beyond what is available with those?

Maybe, but everybody has access to it. So with the playing field remaining equal, everybody becomes 10x what they were before, and you're back at square one.

I pay for ChatGPT-4. It's helping me program noticeably better and much faster, even on mid-sized projects involving many files. This is the process I follow:

- I write a system prompt with a succinct description of what I want to implement: "X is an online app that does Y, with these features: ..." I try to be exhaustive, and I write as I would if I were describing what I want to a very proficient programmer who needs to understand the problem and the solution. I save this to a prompts/0-system.txt file that will be part of the project.
- I design an architecture and define the general boundary conditions. I may ask ChatGPT for opinions, but at this stage it's mostly based on experience. I add it to the system prompt.
- I write a description of the first chunk of functionality that I want to implement, usually a set of functions or a class, at the individual-file level. For example, prompts/1-dal.txt.
- I (actually ChatGPT) wrote a very simple recursive cat script (https://github.com/juanre/catprompt) that combines several files into a single prompt (a file can include other files, etc.; the core idea is sketched after this comment). I add a line to prompts/1-dal.txt pointing to the system prompt, and use catprompt to create a combined prompt that includes the overall description and the specifics of the functionality I am after.
- I give this prompt to ChatGPT, and it produces the code. It's typically perfectly fine. I add it to the project, for example as dal.py, review it, and fix whatever may need fixing.
- I ask ChatGPT to create tests in the same session. This also tends to work fine.
- Then I move on to the next piece of functionality, in a new session. I write another description, in another file, including the system prompt and usually the code already created.

The prompts remain: they become a description of the code (and of my intentions) for other programmers, and they can help ChatGPT write documentation as well. I enjoy programming like this. The effort I make in writing concise descriptions helps me think more clearly. I used to love literate programming back in the 90s, and this feels very much like it. Knuth described it thus: "Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do." Replace "human beings" with LLMs, and I think that's where we are now.

I've essentially been using ChatGPT-4 in the same way and can confirm that the code it gives typically works. If not, I decompose pieces of it once and try again. If it fails there, I just write it myself.

And would you say this process saves you time? Compared with traditional programming? Or do you find other upsides to it?

It certainly saves me time, and I end up with better code and better documentation. I can stay thinking at a higher level, which removes cognitive friction. I usually don't want to be programming: I want the functions I need to be written and working. ChatGPT-4 takes care of that quite well.
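The recursive-include idea is tiny. This is not catprompt's actual syntax, just a sketch of the concept, assuming include directives of the form "#include other.txt":

    import sys
    from pathlib import Path

    def expand(path: Path, seen: set | None = None) -> str:
        # Inline "#include <file>" lines recursively, guarding against cycles.
        seen = seen or set()
        path = path.resolve()
        if path in seen:
            return ""
        seen.add(path)
        out = []
        for line in path.read_text().splitlines():
            if line.startswith("#include "):
                out.append(expand(path.parent / line.split(maxsplit=1)[1], seen))
            else:
                out.append(line)
        return "\n".join(out)

    if __name__ == "__main__":
        print(expand(Path(sys.argv[1])))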
I have made two iPhone apps in the past week. I have never before been able to finish an app, as I would run into some insurmountable issue and lose interest. The apps are simple but functional. It has really made me excited about doing lots of things that I could never get over the competence hump of doing previously.

Interesting. Can you post your general workflow?

It has definitely helped me become more efficient, in the sense that I can get ideas of how things could be implemented, or for use cases like creating a regex to do something. Has it made me 10x? Probably not. Has it improved my efficiency? 100%.

I can only speak for myself, but my productivity has definitely improved 10x. It's not that I do everything 10x faster, but when I'm stuck on a long-standing problem I resolve it much quicker. GPT-4 gives me example code, so I can learn the missing system call or library call that I need to make something work. Previously, searching for those @#$@ calls took most of my time.

In my 20+ year career I've yet to see the myth of the "10x developer" as a reality. It just doesn't exist.

Some old-time programmers could program with just their brains: basically see the code, and output working code into a terminal with minimal fixes. This was a must, since computers were shared and very expensive to use. My feeling is that 10x programmers can do the same. Being able to program in your head is a huge advantage. I've never worked with one, but I have no doubt they exist. I'm no genius, not even close, but in my math classes I did so many practice problems on a subject that after a while I was able to solve some problems in my head without having to write anything down. Super-programmers aren't necessarily geniuses, but they can use their brains to a greater extent before they start programming, I suspect.

I didn't believe it either until I met one. They exist, but it takes more than an interview to gauge, and they're so rare it's not worth trying.

It will be difficult, because the people with the potential to become 10x developers are too busy trying to stay up to date with advances in LLM technology.

I've been using GPT-4 very intensively for the past week, mostly writing frontend code and Python CLI utilities.

- First, there's a MAJOR difference between GPT-3.5 and GPT-4. From a branding point of view, it's almost surprising that both are grouped under the name "ChatGPT". Anytime I hear someone saying "yeah, I tried out ChatGPT and wasn't impressed", you can be sure they're using the free version (i.e. GPT-3.5).
- GPT-3.5 is great for simple scripts and data transformations, e.g. "rename files from this pattern to that pattern", "write a script to reformat this CSV format to this JSON format". GPT-4 can do that as well, of course, but is generally overkill: it generates more slowly, and GPT-4 requests are currently throttled, so you want to keep them for higher-value questions.
- GPT-4 is GREAT for frontend code (React/TypeScript/TailwindCSS). For example, I had a few ideas yesterday to improve one of our pages: I wanted to add "infinite loading" (loading data progressively instead of all at once), make the search input "sticky" so it stays at the end of the page, and add a "Clear" button in the main input field that only appears when it's focused. I gave ALL THREE changes at once to GPT-4, along with the 160 lines of relevant code (15K+ characters).
It came up with comprehensive steps to make all these changes, including installing some npm packages, changing code here and there, and adding custom CSS styles. And IT ALL JUST WORKED ON THE FIRST TRY.
- GPT-4 is also surprisingly good at design. I build pages which are tools for engineers, so I don't need an award-winning UI, but having nice matching colors always makes things better. You can ask GPT-4 to make some display "classy", or "old school", or to "make it really stand out", and it'll follow your instructions, matching the style of your web app. Before, I was using TailwindUI; I haven't checked it in a week.
- GPT-4 can do non-trivial code refactors. For example, I was able to give it all the code for request handling and ask it to make it "more concise and maintainable"; it suggested major architecture changes, then gave me back mostly working code. The last small problems I was able to either fix myself or, if I was lazy, have GPT-4 fix. The question then really becomes a trade-off between how much I want to think and how much I'm willing to do mindless copy-paste between the ChatGPT interface and my IDE.
- What I find most enjoyable is that it makes it very "cheap", in terms of time and energy, to try out new things. Frontend is not my speciality, so for example I wasn't sure how long it'd take to add an "infinite loading" feature. Without GPT-4 it wasn't a priority, so I just wouldn't have added it, because it would have meant checking recommended libraries, checking their docs, finding Medium blog posts with code screenshots behind paywalls, etc. Here I got a solution I could try out in a few minutes. In some cases I try things out and realise they incur too many changes; no worries, I just bin the changes and move on with my life.
- For simple Python CLI utilities it's also been insanely useful. For example, I was able to say: "any time it's making a network request, I want a cool animation to be displayed", and bam, it integrated the "halo" package; it worked right away (roughly the pattern sketched below). I realise this is an "easy case" because the whole program fits in context (300 short lines).

So yeah, overall I'd definitely say it has been increasing my efficiency by a factor of 2-3x for these specific tasks. Again, these tasks are relatively "easy": writing lots of code to do simple things that can reasonably easily be explained in plain words.
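For reference, that kind of integration typically looks like this; a minimal sketch using the halo and requests packages, with a made-up URL:

    # pip install halo requests
    import requests
    from halo import Halo

    def fetch(url: str) -> dict:
        # Show a spinner for the duration of the network request.
        with Halo(text="Fetching...", spinner="dots"):
            resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        return resp.json()

    if __name__ == "__main__":
        print(fetch("https://api.example.com/status"))  # hypothetical endpoint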
---

The select-list example:

Prompt: Create a HTML select list with the following text values: "First In, Last Out", "Last In, First Out", "Random Order", "Round-Robin". Give each option and the select list appropriate names and IDs.
Result:

    <select name="order" id="order">
      <option value="fifo">First In, Last Out</option>
      <option value="lifo">Last In, First Out</option>
      <option value="random">Random Order</option>
      <option value="round-robin">Round-Robin</option>
    </select>

Keep in mind if I hand-wrote this I still have to write out the text values, which are 50% of the prompt's length. It did the names/IDs and the boilerplate.
---

The GPT-golf example ("html select list of orderings: fifo, random, rr, lifo"):

This code creates a simple HTML page with a form containing a select list. The user can choose one of the ordering methods from the list, and the selected value can be submitted with the form.

    <!DOCTYPE html>
    <html lang="en">
    <head>
      <meta charset="UTF-8">
      <meta name="viewport" content="width=device-width, initial-scale=1.0">
      <title>Select List of Orderings</title>
    </head>
    <body>
      <form>
        <label for="orderings">Choose an ordering method:</label>
        <select name="orderings" id="orderings">
          <option value="fifo">FIFO (First In, First Out)</option>
          <option value="random">Random</option>
          <option value="rr">RR (Round Robin)</option>
          <option value="lifo">LIFO (Last In, First Out)</option>
        </select>
        <button type="submit">Submit</button>
      </form>
    </body>
    </html>
---

The Bootstrap example:

Prompt: Create HTML using the Bootstrap 5 library. Create a form with a fieldset (title: "Filter") inside, with the following select option/label groups:
- Department
- Year
- Invoice Source
- Invoice Destination
Then two inputs/labels for start date and end date, and a submit button. Only include one example option in each of the selects, and no page boilerplate.

Result:
    <form>
      <fieldset>
        <legend>Filter</legend>
        <div class="mb-3">
          <label for="department" class="form-label">Department</label>
          <select class="form-select" id="department">
            <option selected>Choose department</option>
            <option value="1">Sales</option>
          </select>
        </div>
        <div class="mb-3">
          <label for="year" class="form-label">Year</label>
          <select class="form-select" id="year">
            <option selected>Choose year</option>
            <option value="2022">2022</option>
          </select>
        </div>
        <div class="mb-3">
          <label for="source" class="form-label">Invoice Source</label>
          <select class="form-select" id="source">
            <option selected>Choose source</option>
            <option value="1">Email</option>
          </select>
        </div>
        <div class="mb-3">
          <label for="destination" class="form-label">Invoice Destination</label>
          <select class="form-select" id="destination">
            <option selected>Choose destination</option>
            <option value="1">Customer</option>
          </select>
        </div>
        <div class="row">
          <div class="col-md-6">
            <label for="start-date" class="form-label">Start Date</label>
            <input type="date" class="form-control" id="start-date">
          </div>
          <div class="col-md-6">
            <label for="end-date" class="form-label">End Date</label>
            <input type="date" class="form-control" id="end-date">
          </div>
        </div>
        <div class="d-grid mt-3">
          <button type="submit" class="btn btn-primary">Filter</button>
        </div>
      </fieldset>
    </form>