Cheapest way to deploy smaller fine-tuned AI models?
Any tips on how to deploy and use a fine-tuned model from Hugging Face in a cost-effective way? Right now I'm looking into using Gradio with Hugging Face Spaces and calling the API endpoint from there. Inference Endpoints and SageMaker seem excessive for this. The whole idea of using smaller models is to decrease costs (versus using a bigger model through an API endpoint), but maybe that just isn't cost-effective for where we are right now.

If you're only using it intermittently, Replicate and Modal Labs have per-second pricing. Not sure about Hugging Face, though. SageMaker supposedly has a serverless endpoint option, but I haven't looked into it, and I doubt it would be a good deal since it's AWS.

Looks like Replicate is perfect. Will look into it. Thanks!
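For the Gradio-on-Spaces route, here's a minimal sketch of calling a Space's REST API from plain Python. The Space URL and input are hypothetical placeholders, and this assumes the classic Gradio `POST /api/predict` route with a `{"data": [...]}` JSON body; newer Gradio versions expose a different route and are easier to call through the official `gradio_client` package, so treat this as illustrative rather than definitive:

```python
import json
import urllib.request


def build_payload(inputs):
    # Classic Gradio Spaces expect a JSON body of the form {"data": [...]},
    # one list element per input component.
    return {"data": inputs}


def query_space(space_url, inputs, timeout=30):
    # space_url is a placeholder, e.g. "https://user-myspace.hf.space"
    req = urllib.request.Request(
        space_url.rstrip("/") + "/api/predict",
        data=json.dumps(build_payload(inputs)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Responses mirror the request shape: outputs arrive under "data".
        return json.loads(resp.read())["data"]


if __name__ == "__main__":
    # Example call against a hypothetical Space (requires network access):
    # print(query_space("https://user-myspace.hf.space", ["some input text"]))
    pass
```

One caveat with free Spaces: they sleep after inactivity, so the first request after a quiet period can be slow or time out while the Space cold-starts.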
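The per-second vs. always-on trade-off comes down to simple break-even arithmetic. The rates below are made-up placeholders (check the providers' current pricing pages), but the structure of the comparison holds:

```python
# Placeholder rates for illustration only -- NOT real prices.
PER_SECOND_RATE = 0.000225   # hypothetical per-second GPU rate, $/s
ALWAYS_ON_HOURLY = 0.60      # hypothetical dedicated-endpoint rate, $/h


def monthly_cost_per_second(busy_seconds_per_day, days=30):
    # You only pay for seconds the model is actually running.
    return busy_seconds_per_day * days * PER_SECOND_RATE


def monthly_cost_always_on(hours_per_day=24, days=30):
    # A dedicated endpoint bills for wall-clock uptime, idle or not.
    return hours_per_day * days * ALWAYS_ON_HOURLY


def break_even_seconds_per_day(days=30):
    # Daily busy-seconds above which always-on becomes cheaper.
    return monthly_cost_always_on(days=days) / (days * PER_SECOND_RATE)


if __name__ == "__main__":
    print(f"per-second, 1000 busy s/day: ${monthly_cost_per_second(1000):.2f}/mo")
    print(f"always-on 24/7:              ${monthly_cost_always_on():.2f}/mo")
    print(f"break-even at ~{break_even_seconds_per_day() / 3600:.1f} busy hours/day")
```

With these placeholder numbers, light intermittent traffic is dramatically cheaper on per-second billing, and always-on only wins once the model is busy most of the day.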