Siri Struggles on Simple Super Bowl Quiz, Getting 38 Out of 58 Answers Wrong

In a recent discussion, Apple commentator John Gruber referred to Siri’s current abilities as “an unfunny joke,” citing its failure to identify the winner of Super Bowl 13 as an example of its shortcomings. He pointed out that such basic information should be trivial for any US-based chatbot to retrieve.

This example wasn’t entirely random; it was inspired by his friend Paul Kafasis, who decided to quiz Siri on the outcomes of Super Bowls 1 through 60, and the results were quite poor.

Kafasis documented his findings in a blog entry.

So, how well did Siri perform? With the most generous interpretation, Siri accurately named the winners of only 20 out of the 58 Super Bowls that have occurred, which yields a dismal completion rate of 34%. If Siri were a quarterback, it would likely be cut from the NFL.

Interestingly, it once managed to correctly name the winners for four consecutive years (Super Bowls IX through XII), but only if we overlook that one of those answers was correct for the wrong reasons. More commonly, it got three correct answers in succession (Super Bowls V through VII, XXXV through XXXVII, and LVII through LIX). At its worst, it inaccurately answered an astounding 15 in a row (Super Bowls XVII through XXXII).

Siri seems to have a strong preference for the Eagles.

Most humorously, it attributed an incredible 33 Super Bowl victories to the Philadelphia Eagles, far exceeding their actual tally of one.

The “correct answer for the wrong reason” scenario refers to Siri being asked about the winner of Super Bowl X. For reasons unknown, Siri delivered an extensive response about Super Bowl IX, which just so happened to have the same winner.

At times, Siri completely missed the mark and disregarded the question altogether, citing irrelevant Wikipedia entries instead.

“Who won Super Bowl 23?”
Bill Belichick holds the record for the most Super Bowl victories (eight) and appearances (twelve: nine as head coach, one as assistant head coach, and two as defensive coordinator) by an individual.

Perhaps the Roman numeral format caused confusion, leading Gruber to wonder if other AI systems might also struggle. He conducted a few spot checks.

I haven’t conducted an exhaustive test of Super Bowls 1 through 60 because I’m feeling lazy, but a quick check of a few random numbers in that range shows that all the other question-and-answer agents I’ve used got them right.

I tested ChatGPT, Kagi, DuckDuckGo, and Google. All four performed well on the arguably tricky questions surrounding the outcomes of Super Bowls 59 and 60, which have yet to be played. For instance, when asked about the winner of Super Bowl 59, Kagi’s “Quick Answer” begins: “Super Bowl 59 is scheduled for February 9, 2025. As of now, the game has not yet taken place, so there is no winner to report.”

Super Bowl champions aren’t an obscure topic, unlike, say, “Who won the 2004 North Dakota high school boys’ state basketball championship?” That’s a question I just thought up at random, but surprisingly, Kagi provided the correct answer for Class A, and ChatGPT answered correctly for both Class A and Class B, even including a link to a video of the Class A championship on YouTube.

That’s quite remarkable! I chose a somewhat obscure state (no offense to Dakotans, North or South), a year two decades in the past, and a high school sport that I personally excelled in and care about most. Both Kagi and ChatGPT nailed it. (I’d give Kagi an A, and ChatGPT an A+ for getting the champions of both classes right, plus extra credit for the YouTube links.)

Gruber observed that the previous version of Siri – on macOS 15.1.1 – actually performed better. While it might seem less capable, as it often responded with “Here’s what I found on the web,” this at least provided links to accurate information, unlike the newer version.

The new Siri — powered by Apple Intelligence™ with ChatGPT integration activated — fails to get the answer right, and the errors are surprisingly plausible, making it the worst type of mistake. It also displays inconsistent errors — I tried the same inquiry four times, receiving different incorrect responses each time. It’s a complete letdown.
