Harsh Majethiya | May 17, 2023 | 3 min

Smaller is Better: Q8-Chat, an Efficient Generative AI Experience on Xeon

Large language models (LLMs) have taken the machine learning community by storm, thanks to their transformative architecture that excels...