OpenDataArena-Tool Data Scorer Documentation
The OpenDataArena-Tool data scorer assesses datasets along multiple dimensions, using a series of automated evaluation and processing methods.
Installation
conda create -n oda python=3.10
conda activate oda
git clone https://github.com/OpenDataArena/OpenDataArena-Tool.git
cd OpenDataArena-Tool/data_scorer
pip install -r requirements.txt
pip install flash_attn==2.7.4.post1 --no-build-isolation
# Optional: to calculate fail rate, also run the following commands, which install the lighteval package
cd model_based/fail_rate
pip install -e ".[dev]"
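After running the installation commands above, a quick way to confirm that the optional packages landed in the active environment is to check whether their modules can be located. The sketch below is illustrative only: the helper `missing_packages` is hypothetical, and the module names `flash_attn` and `lighteval` are assumptions inferred from the install commands rather than names confirmed by the tool.

```python
import importlib.util


def missing_packages(names):
    """Return the module names that cannot be located in the current environment."""
    # find_spec looks a top-level module up without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the packages installed above.
    missing = missing_packages(["flash_attn", "lighteval"])
    print("missing:", missing or "none")
```

If `lighteval` is reported missing, that only matters when the fail rate scorer is needed, since that install step is optional.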
Data Evaluation
The data scorer integrates a range of data processing and scoring methods, organized into the following three core modules.
- Model-based Evaluation
  - AskLlmScorer
  - AtheneScorer
  - CleanlinessScorer
  - ComplexityScorer
  - DebertaScorer
  - DeitaCScorer
  - DeitaQScorer
  - EffectiveRankScorer
  - EmbedSVDEntropyScorer
  - FailRateScorer
  - FinewebEduScorer
  - Gpt2HarmlessScorer
  - Gpt2HelpfulScorer
  - GraNdScorer
  - HESScorer
  - IFDScorer
  - InfOrmScorer
  - InstagScorer
  - MIWVScorer
  - NormLossScorer
  - NuclearNormScorer
  - PPLScorer
  - ProfessionalismScorer
  - QuRateScorer
  - RMDeBERTaScorer
  - ReadabilityScorer
  - ReasoningScorer
  - SelectitModelScorer
  - SelectitSentenceScorer
  - SelectitTokenScorer
  - SkyworkLlamaScorer
  - SkyworkQwenScorer
  - Task2VecScorer
  - TextbookScorer
  - ThinkingProbScorer
  - UPDScorer
  - UniEvalD2tScorer
  - UniEvalDialogScorer
  - UniEvalFactScorer
  - UniEvalSumScorer
- LLM-as-Judge
  - Q = instruction-only, QA = instruction + output
  - Difficulty (Q)
  - Relevance (QA)
  - Clarity (Q & QA)
  - Coherence (Q & QA)
  - Completeness (Q & QA)
  - Complexity (Q & QA)
  - Correctness (Q & QA)
  - Meaningfulness (Q & QA)
- Heuristic
  - ApjsScorer
  - ApsScorer
  - ClusterInertiaScorer
  - CompressRatioScorer
  - FacilityLocationScorer
  - GramEntropyScorer
  - HddScorer
  - KNN Scorer
  - LogDetDistanceScorer
  - LogicalWordCountScorer
  - MtldScorer
  - NovelSumScorer
  - PartitionEntropyScorer
  - PureThinkScorer
  - RadiusScorer
  - StrLengthScorer
  - ThinkOrNotScorer
  - TokenEntropyScorer
  - TokenLengthScorer
  - TreeInstructScorer
  - TsPythonScorer
  - UniqueNgramScorer
  - UniqueNtokenScorer
  - VendiScorer
  - VocdDScorer
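To give a feel for what the heuristic scorers measure, here is a minimal sketch of three model-free text statistics in the spirit of CompressRatioScorer, TokenEntropyScorer, and UniqueNgramScorer. It is not the tool's implementation: the function names, the whitespace tokenization, the use of `zlib`, and the exact definitions (for example, compressed size divided by raw size) are all assumptions made for illustration.

```python
import math
import zlib
from collections import Counter


def compress_ratio(text: str) -> float:
    """Compressed size / raw size in UTF-8 bytes; lower means more redundant text."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return len(zlib.compress(raw)) / len(raw)


def token_entropy(text: str) -> float:
    """Shannon entropy (in bits) of the whitespace-token frequency distribution."""
    tokens = text.split()
    if not tokens:
        return 0.0
    total = len(tokens)
    counts = Counter(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def unique_ngram_ratio(text: str, n: int = 2) -> float:
    """Fraction of token n-grams that are distinct; 0.0 if the text has fewer than n tokens."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


if __name__ == "__main__":
    sample = "the cat sat on the mat and the cat slept"
    print(compress_ratio(sample), token_entropy(sample), unique_ngram_ratio(sample))
```

Statistics like these need no model inference, which is what separates the heuristic module from the model-based and LLM-as-Judge modules; the actual scorers may tokenize, normalize, and aggregate differently.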