Cheapest way to deploy smaller fine-tuned AI models?
Any tips on how to deploy and use a fine-tuned model from Hugging Face in a cost-effective way? Right now I'm looking into using Gradio with Hugging Face Spaces and calling the API endpoint from there. Inference Endpoints and SageMaker seem excessive for this. The whole idea of using smaller models is to decrease costs (versus using a bigger model through an API endpoint), but maybe that just isn't cost-effective for where we are right now.

If you're only using it intermittently, Replicate and Modal Labs have per-second pricing. Not sure about Hugging Face, though. SageMaker supposedly has a serverless endpoint option, but I haven't looked into it, and I doubt it would be a good deal since it's AWS.

Looks like Replicate is perfect. Will look into it. Thanks!
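For the Gradio-on-Spaces route, here's a minimal sketch of calling a Space's REST API from plain Python. The Space URL and input are hypothetical placeholders, and this assumes the classic Gradio `POST /api/predict` route with a `{"data": [...]}` JSON body; newer Gradio versions expose a different route and are easier to call through the official `gradio_client` package, so treat this as illustrative rather than definitive:

```python
import json
import urllib.request


def build_payload(inputs):
    # Classic Gradio Spaces expect a JSON body of the form {"data": [...]},
    # one list element per input component.
    return {"data": inputs}


def query_space(space_url, inputs, timeout=30):
    # space_url is a placeholder, e.g. "https://user-myspace.hf.space"
    req = urllib.request.Request(
        space_url.rstrip("/") + "/api/predict",
        data=json.dumps(build_payload(inputs)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Responses mirror the request shape: outputs arrive under "data".
        return json.loads(resp.read())["data"]


if __name__ == "__main__":
    # Example call against a hypothetical Space (requires network access):
    # print(query_space("https://user-myspace.hf.space", ["some input text"]))
    pass
```

One caveat with free Spaces: they sleep after inactivity, so the first request after a quiet period can be slow or time out while the Space cold-starts.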
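The per-second vs. always-on trade-off comes down to simple break-even arithmetic. The rates below are made-up placeholders (check the providers' current pricing pages), but the structure of the comparison holds:

```python
# Placeholder rates for illustration only -- NOT real prices.
PER_SECOND_RATE = 0.000225   # hypothetical per-second GPU rate, $/s
ALWAYS_ON_HOURLY = 0.60      # hypothetical dedicated-endpoint rate, $/h


def monthly_cost_per_second(busy_seconds_per_day, days=30):
    # You only pay for seconds the model is actually running.
    return busy_seconds_per_day * days * PER_SECOND_RATE


def monthly_cost_always_on(hours_per_day=24, days=30):
    # A dedicated endpoint bills for wall-clock uptime, idle or not.
    return hours_per_day * days * ALWAYS_ON_HOURLY


def break_even_seconds_per_day(days=30):
    # Daily busy-seconds above which always-on becomes cheaper.
    return monthly_cost_always_on(days=days) / (days * PER_SECOND_RATE)


if __name__ == "__main__":
    print(f"per-second, 1000 busy s/day: ${monthly_cost_per_second(1000):.2f}/mo")
    print(f"always-on 24/7:              ${monthly_cost_always_on():.2f}/mo")
    print(f"break-even at ~{break_even_seconds_per_day() / 3600:.1f} busy hours/day")
```

With these placeholder numbers, light intermittent traffic is dramatically cheaper on per-second billing, and always-on only wins once the model is busy most of the day.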