Did they cheat in the LLM arena ranking?
ChatGPT is winning the race
At Google, the non-technical problems (legal, privacy concerns, etc.) are much harder to navigate than the technical ones. Each team works for itself, to benefit from its own features and to get promoted. Asking a team to expose data via message queues or simple APIs will not work. Before asking them for the data you have to talk to your privacy counsel and legal counsel. Then to their privacy counsel and legal counsel. Then your counsels have to work with their counsels to understand the privacy/legal implications. Finally, after you have navigated all that and settled on a good technical solution, they would call it an unfair advantage to expose the API only to internal teams. So the other team has to conform to public API standards (which could take a while), and since it is a public API you need some more reviews. Now they have to set up a separate service to expose those public APIs, isolated from their internal APIs. Finally, after all the work is done, you get to call the API. These things take months. Please don't blame Google. Google was fast moving and seemingly innovative a few years back because these restrictions were much sparser. Now, the people who work on such things are as smart as or smarter than the old engineers but are restricted by such rules and regulations. ChatGPT committed a huge violation by crawling webpages without appropriate permissions. Google CANNOT do that. If engineers build such a thing without privacy approval they cannot launch. If they launch by lying then they get fired.
This will get better over the coming months. The underlying model is rock solid and will continue to improve. There are numerous examples on Twitter and elsewhere online where Gemini is dunking on GPT-4 too.
The major problem with Gemini is its consistency. When it works, it works really well, but often it returns unusable answers like this one. Too much censorship will destroy these models. Claude is also unusable now because of this censorship.
You need to understand that you are not a typical user. The vast majority of users don't give a shit about censorship or 'pushing the boundaries'. If these LLMs are to achieve mass market adoption, it is going to be based on their overall usefulness more than anything else.
Link?
https://g.co/gemini/share/85d5ddf13703