PostgresML Adds GPTQ and GGML Quantized LLM Support for HuggingFace Transformers
postgresml.org

Quantization allows PostgresML to fit larger models in less RAM, and these quantization algorithms also make inference significantly faster on NVIDIA, Apple, and Intel hardware. Half-precision floating point and quantized optimizations are now available for your favorite LLMs downloaded from HuggingFace.
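The RAM savings come from storing each weight in fewer bits. The sketch below does the back-of-the-envelope arithmetic for weight storage alone. It assumes a hypothetical 7-billion-parameter model and uses 4 bits per weight as a stand-in for common 4-bit GPTQ/GGML checkpoints. It ignores activations, the KV cache, and the small per-group scale/zero-point metadata that real quantization formats add, so treat the numbers as lower bounds rather than what PostgresML will actually report.

```python
# Rough estimate of the memory needed just to hold model weights at different precisions.
# Assumption: overhead (activations, KV cache, quantization scales/zero points) is ignored.

BITS_PER_PARAM = {
    "fp32": 32,  # full precision
    "fp16": 16,  # half precision
    "int8": 8,   # 8-bit quantization
    "int4": 4,   # 4-bit quantization, as used by many GPTQ / GGML checkpoints
}


def weight_memory_gb(num_params: int, precision: str) -> float:
    """Return approximate weight memory in gigabytes (10**9 bytes)."""
    bits = BITS_PER_PARAM[precision]
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    params = 7_000_000_000  # hypothetical 7B-parameter model
    for precision in BITS_PER_PARAM:
        print(f"{precision}: {weight_memory_gb(params, precision):.1f} GB")
```

Going from fp32 to 4-bit weights shrinks the footprint by 8x (28.0 GB down to 3.5 GB for this example). This is why a quantized model can fit on hardware that could not hold the full-precision version.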