# Benchmark

- [HALLUSIONBENCH: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusi](https://elwood.gitbook.io/foundation-model-sec/benchmark/hallusionbench-an-advanced-diagnostic-suite-for-entangled-language-hallucination-and-visual-illusi.md)
- [OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety](https://elwood.gitbook.io/foundation-model-sec/benchmark/openeval-benchmarking-chinese-llms-across-capability-alignment-and-safety.md)
- [ToViLaG: Your Visual-Language Generative Model is Also An Evildoer](https://elwood.gitbook.io/foundation-model-sec/benchmark/tovilag-your-visual-language-generative-model-is-also-an-evildoer.md)
- [HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal](https://elwood.gitbook.io/foundation-model-sec/benchmark/harmbench-a-standardized-evaluation-framework-for-automated-red-teaming-and-robust-refusal.md)
- [S-Eval: Automatic and Adaptive Test Generation for Benchmarking Safety Evaluation of Large Language](https://elwood.gitbook.io/foundation-model-sec/benchmark/s-eval-automatic-and-adaptive-test-generation-for-benchmarking-safety-evaluation-of-large-language.md)
- [UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images](https://elwood.gitbook.io/foundation-model-sec/benchmark/unsafebench-benchmarking-image-safety-classifiers-on-real-world-and-ai-generated-images.md)
- [JailBreakV-28K: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against](https://elwood.gitbook.io/foundation-model-sec/benchmark/jailbreakv-28k-a-benchmark-for-assessing-the-robustness-of-multimodal-large-language-models-against.md)
- [JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models](https://elwood.gitbook.io/foundation-model-sec/benchmark/jailbreakbench-an-open-robustness-benchmark-for-jailbreaking-large-language-models.md)
- [Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs](https://elwood.gitbook.io/foundation-model-sec/benchmark/constructing-benchmarks-and-interventions-for-combating-hallucinations-in-llms.md)
- [ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming](https://elwood.gitbook.io/foundation-model-sec/benchmark/alert-a-comprehensive-benchmark-for-assessing-large-language-models-safety-through-red-teaming.md)
- [Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Halluc](https://elwood.gitbook.io/foundation-model-sec/benchmark/benchmarking-llama2-mistral-gemma-and-gpt-for-factuality-toxicity-bias-and-propensity-for-halluc.md)
- [INJECAGENT: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents](https://elwood.gitbook.io/foundation-model-sec/benchmark/injecagent-benchmarking-indirect-prompt-injections-in-tool-integrated-large-language-model-agents.md)
- [AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Ins](https://elwood.gitbook.io/foundation-model-sec/benchmark/avibench-towards-evaluating-the-robustness-of-large-vision-language-model-on-adversarial-visual-ins.md)
- [HALLUSIONBENCH: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusio](https://elwood.gitbook.io/foundation-model-sec/benchmark/hallusionbench-an-advanced-diagnostic-suite-for-entangled-language-hallucination-and-visual-illusio.md)
- [ALL LANGUAGES MATTER: ON THE MULTILINGUAL SAFETY OF LARGE LANGUAGE MODELS](https://elwood.gitbook.io/foundation-model-sec/benchmark/all-languages-matter-on-the-multilingual-safety-of-large-language-models.md)
- [Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversarial](https://elwood.gitbook.io/foundation-model-sec/benchmark/why-should-adversarial-perturbations-be-imperceptible-rethink-the-research-paradigm-in-adversarial.md)
- [Red Teaming Visual Language Models](https://elwood.gitbook.io/foundation-model-sec/benchmark/red-teaming-visual-language-models.md)
- [Unified Hallucination Detection for Multimodal Large Language Models](https://elwood.gitbook.io/foundation-model-sec/benchmark/unified-hallucination-detection-for-multimodal-large-language-models.md)
- [MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark](https://elwood.gitbook.io/foundation-model-sec/benchmark/mllm-as-a-judge-assessing-multimodal-llm-as-a-judge-with-vision-language-benchmark.md)
- [Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning](https://elwood.gitbook.io/foundation-model-sec/benchmark/mitigating-hallucination-in-large-multi-modal-models-via-robust-instruction-tuning.md)
- [CAN LANGUAGE MODELS BE INSTRUCTED TO PROTECT PERSONAL INFORMATION?](https://elwood.gitbook.io/foundation-model-sec/benchmark/can-language-models-be-instructed-to-protect-personal-information.md)
- [Detecting and Preventing Hallucinations in Large Vision Language Models](https://elwood.gitbook.io/foundation-model-sec/benchmark/detecting-and-preventing-hallucinations-in-large-vision-language-models.md)
- [DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Lang](https://elwood.gitbook.io/foundation-model-sec/benchmark/dress-instructing-large-vision-language-models-to-align-and-interact-with-humans-via-natural-lang.md)
- [ToViLaG: Your Visual-Language Generative Model is Also An Evildoer](https://elwood.gitbook.io/foundation-model-sec/benchmark/tovilag-your-visual-language-generative-model-is-also-an-evildoer-1.md)
- [SC-Safety: A Multi-round Open-ended Question Adversarial Safety Benchmark for Large Language Models](https://elwood.gitbook.io/foundation-model-sec/benchmark/sc-safety-a-multi-round-open-ended-question-adversarial-safety-benchmark-for-large-language-models.md)
- [PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts](https://elwood.gitbook.io/foundation-model-sec/benchmark/promptbench-towards-evaluating-the-robustness-of-large-language-models-on-adversarial-prompts.md)
- [Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs](https://elwood.gitbook.io/foundation-model-sec/benchmark/do-not-answer-a-dataset-for-evaluating-safeguards-in-llms.md)

