Ask HN: Cheapest way to compute large batches for science
Hi all,
I'm currently working on my PhD, for which I have to fit a few hundred million high-dimensional curves under somewhat complex constraints. My data - around 3.5 TB - is stored on a PostgreSQL server in my basement.
My current approach is as follows (a rough sketch is below):

- Segment the data into chunks and upload them to AWS S3
- Use AWS Lambda to create an AWS Batch job that runs the computation on EC2 spot instances
- Store the results on S3 and fetch them
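For reference, this is roughly what the pipeline looks like; the bucket, queue, job definition, table, and chunk count below are placeholders, not my real setup:

    # Rough sketch of the current pipeline; all names and counts are placeholders.
    import io
    import boto3
    import psycopg2

    s3 = boto3.client("s3")
    batch = boto3.client("batch")

    BUCKET = "my-curve-chunks"          # placeholder bucket name
    N_CHUNKS = 1000                     # placeholder chunk count

    conn = psycopg2.connect("dbname=curves host=localhost")

    for chunk_id in range(N_CHUNKS):
        # Export one chunk of rows from Postgres as CSV into memory.
        buf = io.StringIO()
        with conn.cursor() as cur:
            cur.copy_expert(
                f"COPY (SELECT * FROM curves WHERE chunk = {chunk_id}) TO STDOUT WITH CSV",
                buf,
            )
        s3.put_object(
            Bucket=BUCKET,
            Key=f"input/chunk-{chunk_id}.csv",
            Body=buf.getvalue().encode(),
        )

    # One array job fans out over all chunks; each child picks its chunk
    # via the AWS_BATCH_JOB_ARRAY_INDEX environment variable.
    batch.submit_job(
        jobName="fit-curves",
        jobQueue="spot-queue",          # placeholder queue name
        jobDefinition="curve-fitter",   # placeholder job definition
        arrayProperties={"size": N_CHUNKS},
    )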
While that works like a charm, it also burns through my grad student budget like wildfire, so I am looking for a less cost-prohibitive way to run this computation.
One option I have considered is running my tasks on EC2-comparable instances with another (cheaper) provider: the AWS Batch functionality is mostly a convenience, and I don't really need the flexibility, since my data set is already complete and I know exactly which results I need. However, I don't understand enough about cloud computing to quantify exactly how many requests or hours of compute I need, and on what type of processor.
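My best guess so far is to benchmark a small sample and extrapolate, roughly like this (fit_one_curve is a stand-in for my actual fitting code, and the spot price is an assumption I'd need to check):

    # Back-of-envelope estimate: time a small sample, extrapolate to the full set.
    import time

    def fit_one_curve(curve):
        # Stand-in for the real constrained fit; replace with actual code.
        return sum(curve)

    TOTAL_CURVES = 300_000_000                          # "a few hundred million"
    SAMPLE = [[float(i)] * 50 for i in range(1_000)]    # dummy sample data

    start = time.perf_counter()
    for curve in SAMPLE:
        fit_one_curve(curve)
    elapsed = time.perf_counter() - start

    sec_per_curve = elapsed / len(SAMPLE)
    core_hours = TOTAL_CURVES * sec_per_curve / 3600

    SPOT_USD_PER_CORE_HOUR = 0.015                      # assumed; check current spot prices
    print(f"~{core_hours:,.0f} core-hours, "
          f"~${core_hours * SPOT_USD_PER_CORE_HOUR:,.0f} on spot")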
Alternatively, I might just buy a few more computers, heat my flat with them for a few months, and then sell them again. The main issue here is the high upfront cost, but it would give me cost certainty over time. I also have no idea how long the computation would take; a naive break-even estimate is sketched below.
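This is the kind of comparison I mean; every number in it is an assumption, not a quote:

    # Naive break-even: owned hardware vs. cloud spot; all numbers are assumed.
    CORE_HOURS_NEEDED = 500_000         # would come from the benchmark above

    SPOT_USD_PER_CORE_HOUR = 0.015      # assumed spot price
    cloud_cost = CORE_HOURS_NEEDED * SPOT_USD_PER_CORE_HOUR

    MACHINE_PRICE = 1500.0              # assumed price per used workstation
    RESALE_VALUE = 1000.0               # assumed resale value after a few months
    CORES_PER_MACHINE = 16
    ELECTRICITY_USD_PER_KWH = 0.30
    WATTS_PER_MACHINE = 250

    hours_per_machine = CORE_HOURS_NEEDED / CORES_PER_MACHINE   # if I buy one machine
    kwh = hours_per_machine * WATTS_PER_MACHINE / 1000
    own_cost = (MACHINE_PRICE - RESALE_VALUE) + kwh * ELECTRICITY_USD_PER_KWH

    months = hours_per_machine / (24 * 30)
    print(f"cloud: ~${cloud_cost:,.0f}")
    print(f"one owned machine: ~${own_cost:,.0f}, running ~{months:.1f} months")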
How would you approach this problem?