PostgresML Adds GPTQ and GGML Quantized LLM Support for HuggingFace Transformers

4 points by montanalow 3 years ago · 1 comment

Quantization allows PostgresML to fit larger models in less RAM. Quantized models also run inference significantly faster on NVIDIA, Apple and Intel hardware. Half-precision floating point and quantized optimizations are now available for your favorite LLMs downloaded from Hugging Face.
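
To make the RAM claim concrete, here is a rough back-of-the-envelope sketch in Python. It is not PostgresML code: the helper `weight_memory_gb` and the 7-billion-parameter model size are assumptions chosen for illustration, and the figures cover weights only. Real GPTQ and GGML files carry some extra metadata (such as per-group scales), and inference also needs memory for activations, so actual usage will be somewhat higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical model size, used only for illustration.
PARAMS_7B = 7e9

# Compare full precision, half precision, and two quantized widths.
for label, bits in [("float32", 32), ("float16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>8}: {weight_memory_gb(PARAMS_7B, bits):.1f} GB")
```

Under these assumptions, halving the bits per weight halves the weight footprint, which is why a model that does not fit in RAM at float32 may fit at float16 or at 4 bits.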
