What are LLM Tokens Worth? An initial dive into how LLM tokens translate into economic impact for humans and businesses. (May 3)
Meta Llama 3 Optimized CPU Inference with Hugging Face and PyTorch. Learn how to reduce model latency when deploying Meta* Llama 3 on CPUs. (Published in TDS Archive, Apr 19, 2024)
Transforming Financial Services with RAG: Personalized Financial Advice. Synthesize the complex web of financial strategies, regulations, and trends with a RAG-based chatbot for personalized financial advice. (Apr 5, 2024)
Transforming Manufacturing with RAG: Delivering NextGen Equipment Maintenance. Keep operations online with actionable, relevant, and effective maintenance strategies powered by retrieval augmented generation. (Apr 3, 2024)
Transforming Retail with RAG: The Future of Personalized Shopping. Delivering dynamic, fresh, and timely recommendations to shoppers with retrieval augmented generation. (Apr 2, 2024)
Improving LLM Inference Latency on CPUs with Model Quantization. Discover how to significantly improve inference latency on CPUs using quantization techniques for mixed, int8, and int4 precisions. (Published in TDS Archive, Feb 29, 2024)
Distributed Fine-Tuning of Stable Diffusion with CPUs on AWS. Learn how to use Hugging Face* Accelerate on Amazon Web Services (AWS)* to fine-tune Stable Diffusion. (Published in AWS Tip, Dec 20, 2023)
Retrieval Augmented Generation (RAG) Inference Engines with LangChain on CPUs. Exploring scale, fidelity, and latency in AI applications with RAG. (Published in TDS Archive, Dec 5, 2023)
AI Imposter Syndrome. Confronting the Phantom of Doubt in Tech’s Fast Lane. (Nov 8, 2023)
A Case for Operational-Centric AI. Proposing a model for understanding the evolution of AI from the perspective of engineering resource investment. (Nov 2, 2023)