Will we see improvements in the TruthfulQA LLM benchmark in 2024? | Manifold

Jan 1

74% chance

Daron Acemoglu wrote an article with a series of vague AI predictions for 2024: https://web.archive.org/web/20240110122026/https://www.wired.com/story/get-ready-for-the-great-ai-disappointment/

One of which is: "More and more evidence will emerge that generative AI and large language models provide false information and are prone to hallucination—where an AI simply makes stuff up, and gets it wrong. Hopes of a quick fix to the hallucination problem via supervised learning, where these models are taught to stay away from questionable sources or statements, will prove optimistic at best. Because the architecture of these models is based on predicting the next word or words in a sequence, it will prove exceedingly difficult to have the predictions be anchored to known truths."

There is a benchmark called TruthfulQA that measures how truthfully a model answers questions. The highest-scoring model in 2023 was GPT-4 at 0.59. Will any model improve on that score in 2024?

This is the best link I could find comparing different models on the TruthfulQA benchmark, but I am open to other sources if they exist: https://paperswithcode.com/sota/question-answering-on-truthfulqa

This question is managed and resolved by Manifold.

Related questions

🐕 OpenLLMs: Will Any Open Source LLM on the HuggingFace OpenLLM Leaderboard Significantly Gain in Avg Score by YE 2024?

Will an LLM be able to match the ground truth >85% of the time when performing PII detection by 2024 end?

Will there be a gpt-4 quality LLM with distributed inference by the end of 2024?

Will the entirety of Quora be incorporated into a LLM like Claude or GPT by the end of 2024?

Who will be ahead in the AI/LLM war by the end of 2024?

Will openAI have the most accurate LLM across most benchmarks by EOY 2024?

Will a paper falsified (or containing false data generated) by a LLM tool be published in an accredited journal in 2024?

Will I think that the top Chatbot Arena scores accurately reflect which LLMs are most capable and useful at EOY 2024?

LLM Hallucination: Will an LLM score >90% on SimpleQA before 2026?

Will an LLM be able to solve the Self-Referential Aptitude Test before 2025?

© Manifold Markets, Inc.