Ask HN: Would a DB of startup tech stacks be valuable to you?
I'm imagining the user would be a Hiring Manager or Recruiter looking for Engineers. If they need Ruby Engineers with startup experience, they click the Ruby box from a tech drop-down list and the search will retrieve the startups that also use it. Ideally, you'd be able to sort by geographical area, founding year, latest funding phase, number of employees (e.g. 50-200), and more. I would also aim for matching the right area of the stack - for example, the option to pick Python AND Backend, so you don't end up with startups using Python only for Data Science/ML work.
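A minimal sketch of the kind of search described above, to make the "Python AND Backend" idea concrete. Everything here is an assumption for illustration: the `Startup` record, its fields, the `search` function and the sample companies are hypothetical, not an existing dataset or API.

```python
from dataclasses import dataclass, field


@dataclass
class Startup:
    name: str
    region: str
    founded: int
    employees: int
    # Maps a stack area ("backend", "frontend", "data") to the technologies used there.
    stack: dict[str, set[str]] = field(default_factory=dict)


def search(startups, tech, area, min_employees=0, max_employees=None, region=None):
    """Return startups that use `tech` in the given stack `area`, newest first."""
    results = []
    for s in startups:
        # Match on technology AND area, so Python-for-ML-only companies are skipped
        # when the caller asks for Python on the backend.
        if tech not in s.stack.get(area, set()):
            continue
        if s.employees < min_employees:
            continue
        if max_employees is not None and s.employees > max_employees:
            continue
        if region is not None and s.region != region:
            continue
        results.append(s)
    return sorted(results, key=lambda s: s.founded, reverse=True)


# Made-up records, for illustration only.
startups = [
    Startup("Acme API", "US-West", 2019, 120, {"backend": {"Python"}, "frontend": {"React"}}),
    Startup("DataCo", "US-East", 2017, 80, {"backend": {"Ruby"}, "data": {"Python"}}),
    Startup("TinyApp", "US-West", 2021, 12, {"backend": {"Python"}}),
]

matches = search(startups, tech="Python", area="backend", min_employees=50, max_employees=200)
print([s.name for s in matches])  # ['Acme API']
```

DataCo is excluded because it only uses Python for data work, and TinyApp falls outside the 50-200 employee range.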
Note: I did try the StackShare API and there is no filtering feature. So if you purchase the 1,000-company plan, you have no control over what they send you. It'll be a randomly generated list of 1,000 companies that use the technology you requested, a hodgepodge of companies all around the world, big and small, new and old.
I look forward to hearing your thoughts. Thanks!

In HN style I'm going to suggest you could "just" scrape job postings for software devs and get the same thing, with much more confidence that it accurately reflects what people are using.

I have done this before and you get a lot of "looking for experience in Go, Java, or Elixir." Then two lines down it states the team uses Python. Or the posting doesn't explicitly state the language at all. Parsing around this sucks. Maybe one of the new OpenAI tools could solve this. I attempted it before their time.

Great point, thank you. However, I think this leaves out many companies - those that don't have job postings. I think a job board like startup.jobs would solve this by creating a job archive - then it would be prime scraping material. But it's only a job board with (mainly) current jobs. Thoughts?

>Great point, thank you. However, I think this leaves out many companies - those that don't have job postings

Unless you're planning to cold call people and get them to pinky swear to tell you honestly what they're using, or you have some other plan, you're somewhat stuck anyway for companies that don't post jobs.

Also worth noting that plenty of companies won't really tell you anyway. E.g. plenty of companies will have language like "systems programming language" or "object oriented language", when they could be using anything from C-family to Haskell (leaving aside how secretive many Haskell jobs are, or that they're hidden in custom dialects). You are going to be running into all kinds of human BS; it'll be fun but a can of worms nonetheless.

>I think a job board like startup.jobs would solve this by creating a job archive - then it would be prime scraping material.
>But it's only a job board with (mainly) current jobs

Not sure how much experience you have modelling data, but capturing postings by date can also be trickier than expected, even leaving aside the fun of unstructured data, differences in models between platforms, and the judgement calls needed to decide where you're crawling.

Having cut my teeth scraping property listings off competitor websites, you come to realise most boards incentivise people to delete and repost ads so they boost their recency score and appear higher in the search. So now you have duplicates messing up your data, which you need to deal with if you're trying to create value from it. The classified site doesn't like this either and will try to stop this gaming of the system, and that game of cat and mouse will normally mess up your scraping and dedupe logic too.

As said, it's a potentially fun can of worms to open. I was just making a joke about HN commenters' tendency to massively underestimate the oceans of complexity that separate their hello world project from an enterprise grade "just a CRUD app" system that people pay for. E.g. all the people that could totally build Twitter with a SQLite DB and some bash scripts + sellotape etc.

Thanks for taking the time to explain!

>You're somewhat stuck anyway for companies that don't post jobs.

Good news on this front. I have manually compiled a limited tech stack DB for roughly 5,000 US startups over the last decade (limited because it only has Frontend and Backend languages/frameworks for each company). Much of this data is current too, thanks to the 2021 boom in jobs. And the majority of startups share their technologies in job descriptions, either by simply listing them or by alluding to them, with fun statements like "we welcome skills in Python, Ruby, or JavaScript/Node.js (but Ruby would be ideal)."
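To illustrate the parsing problem raised in this thread, here is a rough sketch of naive keyword matching run on a sentence like the one quoted above. The keyword list `TECHS` and the helper `mentioned_techs` are hypothetical, and this is only one simple way a scraper might tag postings, not how any particular job board does it.

```python
import re

# Hypothetical keyword list; a real one would be much longer.
TECHS = ["Python", "Ruby", "JavaScript", "Node.js", "Go", "Java", "Elixir"]


def mentioned_techs(text):
    """Return every keyword from TECHS that appears in `text` as a whole token."""
    found = []
    for tech in TECHS:
        # Lookarounds instead of \b so "Java" does not match inside "JavaScript"
        # and "Node.js" is matched as a single token.
        pattern = r"(?<![A-Za-z0-9])" + re.escape(tech) + r"(?![A-Za-z0-9])"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(tech)
    return found


posting = ("we welcome skills in Python, Ruby, or JavaScript/Node.js "
           "(but Ruby would be ideal).")
print(mentioned_techs(posting))  # ['Python', 'Ruby', 'JavaScript', 'Node.js']
```

Every language in the sentence comes back as a hit, with nothing to mark Ruby as the one the company would actually prefer, so keyword hits alone say what a posting mentions rather than what the team uses.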
It's big tech companies that are more likely to be vague, because you could end up being hired into one of a multitude of product groups using different technologies.

On a side note: the second Ruby example is one reason scraping will be insufficient, and why language-specific job seekers pull their hair out using most job boards. If you search for Python positions, that Ruby company will pop up because it has the word Python.

If this was a viable idea, then I'd really need to get deeper into the stack (i.e. "preferred" experience with technologies like AWS, RabbitMQ, Spark, etc). This is crucial, giving a hiring manager or recruiter the data not just to hit the basic requirements in candidate sourcing, but to exceed them by delivering those "pluses." But I digress. Perhaps I would wait to see if this idea even has legs before committing to the time investment of digging for these "secondary" technologies.

If I wanted to know what tech stack a site is using I would just pop open Maltego and find out. Which it seems like you are doing, but just automating over a large list of companies and storing results in a db.

Well, you lost me. I just went over to Maltego and it seems to be an investigations tech company (forensics, security, threat intelligence). How do you use this to find a company's tech stack?

Install the software, open it up, put in a website you want to know the stack of, scan it, get results back. It's been a while since I have used it, and the interface/UI has a bit of a learning curve, so it might not be super intuitive how to get the tech stack info, but it's there somewhere.

I'll give it a shot - thank you!

I just re-installed Maltego (haven't used it in years). Here is a quick rundown for you: once you get it up (you need to register an account, and then open it as the "Community Edition"), make a new blank Graph.
Then go to your browser, click and drag the URL (for example the Hacker News site) from your browser into the Maltego Graph (the big empty white space). Right-click the icon that just appeared. You might be tempted to click "Web Technologies"; this is a mistake and won't get you the info you are after. Instead click "All Transforms" and click "To Website [Convert]". You will get a new icon that looks like a monitor with "WWW" on it. Right-click on that and now click "To Web Technologies [Built With]".

You get a whole bunch of information here, and you kind of have to sort things out a bit yourself at this point. However, that said, you can see one of the things it lists for the Hacker News site is "ArcGIS". A quick Google search for "what is hacker news built with" confirms.

>Hacker News (sometimes abbreviated as HN) is a social news website focusing on computer science and entrepreneurship. It developed as a project of Graham's company Y Combinator, functioning as a real-world application of the Arc programming language which Graham co-developed.

Maltego is a pretty fun program, you can learn a lot with it. Viewing the tracking codes for a site can often reveal other sites a company owns, as a lot of companies are lazy and end up using the same tracking code on all the sites they own.

I think any website where you can filter without any algorithms that think they are smarter than you, ads, SEO, etc. is valuable. Imagine you had "direct" DB access to the data that Google, Reddit, Twitter, Hacker News (we have HN on Google BigQuery, and it's awesome), GitHub, Stack Overflow, YouTube, ... hold. Anyone who knows exactly what they want will find it. People that don't know exactly what they want may find it harder to find anything. I don't know about your specific use case, but personally anything like that is valuable to me.

This already exists in many of the Sales tools that are out there today.
Zoominfo, Apollo, and Seamless all have the ability to show what types of technology a company is using.

I didn't know this - I thought they were just sales prospecting tools with tons of features, but nothing involving finding company tech stacks. Do you have experience working with these products and seeing their capabilities in finding the technologies companies are using? Thanks.

https://www.rocks.gold/ is a comprehensive repository of jobs and company data. There are also enterprise companies like predictleads.com that offer jobs data. Pricing is all over the place and quality was an issue when I was evaluating them because they go for volume over accuracy.

Thanks for pointing me to these - I'll check them out!

Yes - in sales currently and use it to segment accounts. Ex - AWS/GCP/AZURE lab/hub/bitbucket. Apollo.io is free to sign up and use.

Actually, disregard my question - I decided to just do the Apollo Basic trial and view the Technologies in the Advanced Search. It's limited to cloud providers, as you mentioned, and just a few other tools (only one or two being tangentially related to the tech stack data needed for the idea presented in my post). It's a bummer, because I would like this product to already exist!

Very interesting. And the same with other technologies, e.g. what companies use Rust or Vue.js?

I've seen data sets like this and they've been bad. My main issues have been that the dataset isn't kept updated, there's no sense of proportion (e.g. is 1% of the team Java or 50%), and there are often not enough companies in the dataset. So bad that I probably wouldn't buy this data without some proof that it's good data.

This is really helpful feedback, thanks! Yes, keeping it updated is a hard job, but it needs to be done regularly. Can you elaborate on the sense of proportion problem? And as far as enough companies, can you give me an example of a specific search you might conduct and how many results you'd expect? Thanks!

I would use it in a job search.
I'm sure tech sales people would find it useful as well.

You're right. I forgot to mention that it'd be useful for job seekers, e.g. Rustaceans, click that box and find your companies! Question: how would tech sales people use it? I'm not familiar enough with the field to know their use case.

Well if it's just programming languages, then they probably wouldn't. You said "tech stacks" which led me to believe it was everything from databases to SaaS to dependencies. There is massive competition in the database space to sell people support services. If a company is using the Timescale community edition, Timescale's enterprise sales might reach out to them. They also might reach out to people using Influx. Someone selling a UI framework might reach out to people using similar ones. And so on.

edit: And Datadog sales can continue reaching out to anyone not using Datadog