I tested ChatGPT by giving it some easy odd-one-out riddles. I would give it five items, and one of them wouldn't belong. It had to find the odd one out and explain why.
Its track record was shoddy but not hopeless. Sometimes it was right:
Me: Which is the odd one out: warmth, happiness, coal, red, big
ChatGPT: In this case, "coal" is the odd one out, as it is not an abstract concept or an emotion like the other terms listed (warmth, happiness, red, and big). It is a physical object or substance.
Sometimes it was wrong:
Me: Which is the odd one out: Phone, switchblade, keys, supercomputer, glasses
ChatGPT: In this case, the odd one out is "switchblade" as it is the only item that is primarily used as a weapon. The other items - phone, keys, supercomputer, and glasses - are typically used for communication, computing, or vision correction.
And sometimes it was right, but for the wrong reason:
Me: Which is the odd one out: Frederick Douglass, Muhammad Ali, Neil DeGrasse Tyson, Usain Bolt, George Bush?
ChatGPT: The odd one out in this group is "George Bush," as he is the only one who has served as the President of the United States. The others - Frederick Douglass, Muhammad Ali, Neil DeGrasse Tyson, and Usain Bolt - are known for their achievements in activism, sports, and science, respectively.
If, before 2025, an AI can solve 90%+ of these riddles (correct answer with a correct explanation), this market resolves YES.
You are welcome to suggest criteria to make this more rigorous (e.g., what counts as "easy"). Anyway, I am not betting in this market.
This seems more like a matter of opinion to me:
"Me: Which is the odd one out: Phone, switchblade, keys, supercomputer, glasses
ChatGPT: In this case, the odd one out is "switchblade" as it is the only item that is primarily used as a weapon. The other items - phone, keys, supercomputer, and glasses - are typically used for communication, computing, or vision correction."
One could just as easily say "Supercomputer, because it is the only item on the list that most people can't afford", "Supercomputer, because it's the only item one would not find in a house", or "Supercomputer, because it's the only item one would not take along on a walk". It's unclear to me why any of these is a better answer than switchblade.
To be fair to GPT: I don't think I could solve riddles 1 and 2 either. My first reaction upon reading "warmth, happiness, coal, red, big" is "wait, is this supposed to have something to do with Christmas?"
For "phone, switchblade, keys, supercomputer, glasses" I'm thinking "is it supercomputer, because it's the most advanced and doesn't fit in a room? Glasses, because you wear them? Keys, because they're plural?"
I sure hope nobody tries to use this stuff to prove that LLMs are dumb and can't really think, because that would be a little awkward.
@firstuserhere @bohaska can you look at this chat? Are these answers correct? I don't know the ground truths lol