
GitHub - confident-ai/deepeval: The LLM Evaluation Framework
DeepEval is a simple-to-use, open-source framework for evaluating and testing large language model systems. It works much like Pytest, but is specialized for unit testing LLM outputs.
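As a rough illustration of that Pytest-style workflow, here is a minimal sketch of a DeepEval test, assuming the LLMTestCase, AnswerRelevancyMetric, and assert_test names from the DeepEval documentation; the input, output, and threshold values are illustrative only.

```python
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_chatbot_answer():
    # A test case bundles the prompt, the LLM's actual output, and any retrieved context
    test_case = LLMTestCase(
        input="What are your shipping times?",
        actual_output="Standard orders arrive within 5-7 business days.",
        retrieval_context=["Standard shipping takes 5-7 business days."],
    )
    # Fails the test if answer relevancy falls below the threshold
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])
```

Running pytest on a file containing this test then evaluates the LLM output like any other unit test.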
GitHub - openai/evals: Evals is a framework for evaluating LLMs and …
Evals provides a framework for evaluating large language models (LLMs) or systems built using LLMs. It offers an existing registry of evals for testing different dimensions of OpenAI models, along with the ability to write your own custom evals.
GitHub - EleutherAI/lm-evaluation-harness: A framework for few-shot ...
This project provides a unified framework for testing generative language models on a large number of different evaluation tasks, including over 60 standard academic benchmarks for LLMs with hundreds of subtasks and variants implemented.
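To make the unified framework concrete, the sketch below shows one way to invoke the harness from Python, assuming the lm_eval.simple_evaluate entry point and the Hugging Face ("hf") backend; the model name and task choice are illustrative only.

```python
import lm_eval

# Evaluate a small Hugging Face model on a single benchmark task
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["hellaswag"],
    num_fewshot=0,
    batch_size=8,
)

# Aggregated metrics (e.g. accuracy) are reported per task
print(results["results"]["hellaswag"])
```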
The LLM Evaluation guidebook ⚖️ - GitHub
If you've ever wondered how to make sure an LLM performs well on your specific task, this guide is for you! It covers the different ways you can evaluate a model and offers guidance on designing your own evaluations.
Supercharge Your LLM Application Evaluations - GitHub
Ragas is a toolkit for evaluating and optimizing Large Language Model (LLM) applications, replacing time-consuming, subjective assessments with data-driven, efficient evaluation.
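As a sketch of what such a data-driven evaluation looks like, the snippet below assumes the evaluate() entry point and the built-in faithfulness and answer_relevancy metrics from the Ragas documentation; the sample rows and column names are illustrative, and an LLM-judge API key is needed in practice.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

# One row of a RAG application's inputs and outputs
data = {
    "question": ["When was the Eiffel Tower completed?"],
    "answer": ["The Eiffel Tower was completed in 1889."],
    "contexts": [["The Eiffel Tower was finished in 1889 for the World's Fair."]],
    "ground_truth": ["1889"],
}

# Ragas scores each row with the selected metrics using an LLM judge under the hood
results = evaluate(Dataset.from_dict(data), metrics=[faithfulness, answer_relevancy])
print(results)
```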
GitHub - modelscope/evalscope: A streamlined and customizable …
EvalScope is a powerful and easily extensible model evaluation framework created by the ModelScope Community, aiming to provide a one-stop evaluation solution for large model developers.
GitHub - evalplus/evalplus: Rigorous evaluation of LLM-synthesized ...
EvalPlus provides rigorous, test-augmented code benchmarks (HumanEval+ and MBPP+), EvalPerf for evaluating the efficiency of LLM-generated code, and a framework of packages, images, and tools for easily and safely evaluating LLMs on these benchmarks.
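For a sense of the workflow, here is a minimal sketch of producing a samples file for EvalPlus to score, assuming the get_human_eval_plus and write_jsonl helpers from the EvalPlus documentation; generate_solution() is a hypothetical stand-in for your own model call.

```python
from evalplus.data import get_human_eval_plus, write_jsonl

def generate_solution(prompt: str) -> str:
    # Hypothetical placeholder: call your LLM here and return the completed code
    raise NotImplementedError

# Build one solution per HumanEval+ problem
samples = [
    {"task_id": task_id, "solution": generate_solution(problem["prompt"])}
    for task_id, problem in get_human_eval_plus().items()
]

# The resulting samples.jsonl is what the EvalPlus evaluator consumes
write_jsonl("samples.jsonl", samples)
```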
GitHub - huggingface/lighteval: Lighteval is your all-in-one toolkit ...
Lighteval is an all-in-one toolkit for lightning-fast, flexible LLM evaluation across multiple backends, from Hugging Face's Leaderboard and Evals team.
GitHub - raga-ai-hub/raga-llm-hub: Framework for LLM evaluation ...
The RagaAI LLM Hub is designed to help teams identify and fix issues throughout the LLM lifecycle, covering the entire RAG pipeline.
llm-evaluation-framework · GitHub Topics · GitHub
A GitHub topic page listing 43 public repositories tagged llm-evaluation-framework (as of January 2024).