AI Forensics is a European non-profit that investigates influential and opaque algorithms with the goal of helping shape and enforce responsible regulatory policies. They work to hold major technology platforms accountable by conducting independent and innovative technical investigations that uncover and expose the harms caused by those platforms’ algorithms. Together with AlgorithmWatch, they set out to determine whether Microsoft’s Copilot, an AI tool formerly known as Bing Chat, is a reliable source of information during elections.
To assess chatbot behavior across different countries and to avoid IP blacklisting, they partnered with the Bright Initiative and used Bright Data’s Residential Proxies and Scraping Browser. The investigation centered on the Swiss federal elections and the German state elections in Hesse and Bavaria. Over two months, the chatbot was prompted with questions about election dates, candidates, polling numbers, and controversies. Experts then analyzed the responses to more than 1,000 prompts for information quality, checking for errors, fabrications, and instances where the chatbot struggled to respond.
“In the realm of AI Forensics, our reliance on the Bright Data Initiative is pivotal, enabling us to replicate our research on a global scale. This partnership not only makes our work more relevant but also facilitates cross-national comparisons that were previously challenging. Until major platforms commit to consistently providing data to researchers, strategic collaborations such as this one to collect and analyze public data remain essential for advancing the public good.”
Salvatore Romano, Head of Research, AI Forensics
Their research found that Microsoft’s chatbot frequently gave incorrect information: one-third of its answers contained factual errors such as incorrect election dates, out-of-date candidates, and made-up controversies. The chatbot often avoided answering questions or gave inconsistent responses, compromising its reliability as an information source. They also found that the chatbot performs significantly worse in languages other than English. Although the researchers informed Microsoft of these issues, a subsequent check revealed little improvement in the chatbot’s accuracy. These findings highlight the instability of generative AI and the need for regulation, especially as these tools are marketed to the general public while spreading misinformation.