Search Engine Dilemma: Bias vs. Accuracy

Author's note: This is a very old gemlog that I wrote but never finished out of laziness. Reading Krixano's post[1] today gave me the motivation.

One problem with the 3 current major Gemini search engines (GUS, Kennedy and TLGS) is the accuracy of their results. Search for anything too specific and they fail. I saw a lot of people writing about this frustration. "Fine, I'll bring the bigger guns out," I thought. But after days of reading on arXiv and other sources, I still wasn't sure how to improve accuracy without some sort of compromise. Gemini itself also has other issues; you might want to read my other gemlog about this.

I assume most users of Gemini are FOSS enthusiasts and freedom lovers. Just like in the case of RMS, this doesn't put us anywhere on the political spectrum. It simply means most of us are very familiar with computers and want absolute control over the programs we run - there shall be no human component in our code. That affects which algorithms are used for searching. Currently TLGS runs on PostgreSQL's full text search ranking and an old link analysis algorithm called SALSA.

One (relatively) easy way to improve search accuracy is, instead of relying only on full text search to bring up pages, to introduce a BERT model that extracts context representation vectors from articles and matches them against the query vector. However, I'm hesitant to do this, because:

  • Training data, and thus the model, is always biased
  • I need much more computing resources than I'm using now

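The vector matching described above can be sketched in a few lines. This is only an illustration, not how TLGS works: the three-dimensional vectors and the example.org URLs are made up, and in a real system the vectors would come from a BERT-style encoder (no model is loaded here). Cosine similarity is one common way to compare such vectors; it is an assumption on my part, since the text does not name a metric.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, page_vecs):
    """Return page URLs sorted from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), url)
              for url, vec in page_vecs.items()]
    scored.sort(reverse=True)
    return [url for _, url in scored]

# Hypothetical embeddings; real ones would have hundreds of dimensions.
PAGE_VECS = {
    "gemini://example.org/cooking.gmi": [0.9, 0.1, 0.0],
    "gemini://example.org/compilers.gmi": [0.1, 0.9, 0.2],
    "gemini://example.org/kernels.gmi": [0.0, 0.7, 0.7],
}
QUERY_VEC = [0.0, 1.0, 0.1]  # hypothetical vector for the user's query

ranking = rank_by_similarity(QUERY_VEC, PAGE_VECS)
for url in ranking:
    print(url)
```

Note that nothing in this code decides what the dimensions mean. That is entirely up to the model that produced the vectors, which is exactly where the bias concern comes from.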
Most Gemini users would agree that they don't want magic in their software. But an AI model is exactly that: a black box that we can't control, but that we know generates correct results within the range of our ability to test it (which is very limited). Furthermore, it's difficult to quantify how biased the model is. AFAIK the original BERT was pre-trained on English Wikipedia and BooksCorpus, so it's likely biased towards how people write on Wikipedia, especially on political topics. If the dataset is biased towards a certain side or country, the generated representations will be too. I feel this pain myself. I live in a certain country that China doesn't like, so I constantly see topics written by Chinese Wikipedia editors that don't represent my country.

I don't think this is what we want. Search engines should pull up pages independent of the political agenda of the creator.

Limitations and tradeoffs in a search engine

The No Free Lunch Theorem

The No Free Lunch Theorem is something I learned about while doing AI research. In short, it's a mathematical statement that rules out the existence of a universally best algorithm. If an algorithm is better at a certain task, it must lose some capacity somewhere else (though the losses can be in a completely different field). Without rigorous proof, I think accuracy and bias are what we are trading off in a search engine. Either we have high accuracy but the search results depend on human biases, or we have low accuracy but the search results are independent of any human values. As of now, all Gemini search engines are the latter.

Take the following as an example. A user types "a" into the search box. Then what exactly should show up? A Wikipedia article about the Latin alphabet? The grammar of "a" in English? The pronunciation of "a" in Esperanto? See, without knowing the user (which is definitely considered a privacy invasion) we can't know what the user is looking for. Furthermore, our first option biases us towards Wikipedia, yet there are dozens of encyclopedias around the world. Likewise, why the grammar in English and the pronunciation in Esperanto? Spanish, French, German, Italian, and many other languages all use that same character.

One might say that of course we should show the Wikipedia article, and of course in English, because that's more popular. Yet that means the system is biased towards popular content. Still biased nevertheless. How about just showing content that matches the FTS query? No, you are still biased, this time towards FTS ranking. A hybrid of the two? No, you are still biased away from hundreds of other metrics. You can try averaging all possible metrics. Then you are back to a random result.

That's the essence of the No Free Lunch Theorem. The more you generalize and try to solve more problems, the less effective the system is at each problem it can solve. And why is popularity a popular choice? Because it gives the most users what they want.
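The "average all possible metrics" point can be checked on a toy case. The sketch below makes a simplifying assumption: "all possible metrics" is modelled as every possible ordering of the pages, each counted once. The page names are placeholders. Under that assumption, every page ends up with the same average position, so the averaged ranking carries no information at all.

```python
from itertools import permutations

# Placeholder pages for the "a" query discussed above.
PAGES = ["wikipedia", "english-grammar", "esperanto-pronunciation"]

def average_position(pages):
    """Average 0-based position of each page over every possible ranking."""
    totals = {page: 0 for page in pages}
    count = 0
    for ranking in permutations(pages):
        count += 1
        for position, page in enumerate(ranking):
            totals[page] += position
    return {page: total / count for page, total in totals.items()}

averages = average_position(PAGES)
for page, avg in averages.items():
    print(page, avg)
```

Any tie-breaking rule applied afterwards (alphabetical, insertion order, random) is itself a preference, which is the point of the argument.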

The Is-Ought Problem (or Hume's Guillotine)

There's another subject that I wish were better known among engineers and the general public - Hume's Guillotine (the Is-Ought Problem). See, you can't derive any actionable conclusion without a preference. Consider the following conversation between Alice and Bob, where Alice is a super Vulcan who answers all questions accurately but has no preferences.

Bob: Today is hot.
Alice: Yes, indeed it's flaming hot.
Bob: And you are going out with a heavy coat.
Alice: Yes, I am.
Bob: So you'll also be hot.
Alice: So you mean I don't want to be hot?
Bob: But you'll get heat stroke if you are too hot.
Alice: Indeed.
Bob: And you could die from heat stroke.
Alice: You mean I shouldn't be dead?

Ok, the example is silly. Hopefully it demonstrates that for Alice to ever make a decision, she must have at least one preference. In our example, Alice must hold that she ought not to die before she can conclude that she ought not to wear a heavy coat on a hot day. You can never derive an ought statement without assuming another ought statement. This does not come up in day to day conversations because we are talking to another human. Thus, both parties are making generally the same assumptions (aka human values). The same can't be said about software.

The same limitation goes for a search engine. Without a preference, it's literally impossible to generate a search result. Does the search prefer popular content? Article-query representation vector similarity? FTS match? No matter which one you choose, the search result will be biased towards that selection. Again, the No Free Lunch Theorem kicks in. The further you push a preference, the more biased the results are towards it.
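To make that concrete, here is a small sketch with invented numbers. The three pages and their signal values are hypothetical, and the weighted sum is just one simple way to combine signals, not a description of any real engine. The same three pages produce a different top result depending on which preference the weights encode.

```python
# Hypothetical per-page signals, each already scaled to the range 0..1.
PAGES = {
    "popular-page": {"popularity": 0.9, "vector_similarity": 0.3, "fts_match": 0.4},
    "similar-page": {"popularity": 0.2, "vector_similarity": 0.9, "fts_match": 0.5},
    "keyword-page": {"popularity": 0.1, "vector_similarity": 0.4, "fts_match": 0.95},
}

def top_result(weights):
    """Return the page with the highest weighted sum of its signals."""
    def score(signals):
        return sum(weights[name] * signals[name] for name in weights)
    return max(PAGES, key=lambda page: score(PAGES[page]))

prefer_popularity = top_result(
    {"popularity": 1.0, "vector_similarity": 0.0, "fts_match": 0.0})
prefer_similarity = top_result(
    {"popularity": 0.0, "vector_similarity": 1.0, "fts_match": 0.0})
prefer_fts = top_result(
    {"popularity": 0.0, "vector_similarity": 0.0, "fts_match": 1.0})

# With every weight at zero all pages tie, and max() simply returns the
# first page in the dict. "No preference" still picks something, but the
# pick is decided by insertion order, which is arbitrary.
no_preference = top_result(
    {"popularity": 0.0, "vector_similarity": 0.0, "fts_match": 0.0})

print(prefer_popularity, prefer_similarity, prefer_fts, no_preference)
```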

Completely solving human ethics

There is one way around both issues outlined above. Just solve 100% of human preferences and ethics and write that in code, without any personal bias (since we can basically guarantee that users will be Homo sapiens). While we are at it, we should make this system omniscient so it can provide accurate information. This is not viable for obvious reasons. Philosophers have spent thousands of years trying their best at this problem, and we are nowhere close to a sensible solution.

Fortunately, or rather unfortunately, this is one of the problems AI researchers are working on to make safe AI possible. Any superintelligence that is not perfectly aligned with human ideals will end up in a state of complete chaos. The following link is a good starting point:

YouTube video: Intro to AI Safety, Remastered - Robert Miles

There's no algorithm of truth

In computer science, a normal problem can be solved with finite computing power. A really hard problem can be solved with infinite computing power. And a really, really hard problem is one that you can't solve even with infinite computing power. Chess is hard. General AI is really hard but possible with infinite computing power. Deciding truth is one of the really, really hard problems. No one has a reliable procedure that tells fake news from real news.

I could go into my distaste for modern, sensationalized news, but that's beside the point. It's impossible to decide whether a piece of news is real or not given the text and its sources. The problem is not just that getting computers to understand human text is another of these really, really hard problems. It's that we can't even define truth in a way that makes sense outside of science. Should we consider some Christian news reporting on God's mercy true? It's false for Muslims, followers of other religions and atheists. Likewise, we can't know whether a report that some celebrity had a baby is real or not. It's their privacy, yet our algorithm must decide, otherwise fake news gets released into the wild.

Pushing this to the extreme, any algorithm that can decide whether every piece of news is true or false is automatically impossible. Just publish a paper that says "The Result of this Paper is the inverse of the Truth-Deciding algorithm", and you get the same contradiction as in the halting problem. The problem is literally undecidable. Even if we ignore the extreme case, news like "The following N-not-SAT problem is true" can effectively DoS the fake news detector by forcing it to solve an NP-hard problem.

Such an algorithm cannot exist. Even for practical purposes, implementations will have severe limitations.
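The "inverse of the truth-deciding algorithm" trick can be written down directly. This is a toy sketch, not a proof: a "claim" is modelled as a Python function returning True or False, a "decider" is any function that takes a claim and returns a verdict, and all names here are invented for illustration. Whatever verdict the decider gives, the constructed claim evaluates to the opposite.

```python
def make_contrarian_claim(decider):
    """Build a claim whose truth value is the inverse of the decider's verdict on it."""
    def claim():
        return not decider(claim)
    return claim

# Three hypothetical deciders. None of them evaluates the claim itself.
def always_true(claim):
    return True

def always_false(claim):
    return False

def name_length_rule(claim):
    # An arbitrary rule: judge a claim by the length of its function name.
    return len(claim.__name__) % 2 == 0

def decider_is_wrong(decider):
    """Check that the decider misjudges the claim built against it."""
    claim = make_contrarian_claim(decider)
    verdict = decider(claim)  # what the decider says about the claim
    actual = claim()          # what the claim actually evaluates to
    return verdict != actual

for decider in (always_true, always_false, name_length_rule):
    print(decider.__name__, "is wrong:", decider_is_wrong(decider))
```

A decider that tries to be clever and runs the claim to see what it returns does not escape either. The claim calls the decider, which calls the claim, and so on forever (in Python this ends in a RecursionError). So the decider is either wrong or never answers, which is the same shape as the halting problem argument.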

"Do what I think not what I say"

This is another classic really, really hard problem that we developers face regularly. Computers cannot understand human language. NLP is at best an approximation of it. A specification that is detailed and clear enough for computers to understand and execute is... code! And when what you say diverges from what you think, you get a bug.

It's easy to parse Lojban into an AST. Maybe Esperanto and Ido too, with much effort. Notice that all the languages above are constructed languages. They are designed to be regular, without grammatical exceptions. Even after sacrificing all natural languages, we just have the AST. Computers still don't "understand" what we are saying.

NLP is a black box (well, besides IBM's Watson, but its inner workings are not accessible to the general public). Like a human brain, these models represent their understanding of the world as messy and complex data. So far, we have only been able to indirectly extract information out of an NLP model. We can't yet truly make a computer understand our language.

Search Engine as a tool

Like any software out there, a search engine is a tool that people can use. And people should be aware of its limitations. It's not an oracle that answers your questions. It's an index that brings up links based on user input and the vendor-defined ranking algorithm. Like a knife, it can be useful, but you need to use its power with care.

The way forward

I don't know. I'm very open about this. I agree the link based analysis used by TLGS and GUS is not perfect. But it serves as a good proxy for what users want - mostly a way to find information quickly. And until someone solves human ethics and codifies it, we have to rely on proxies. That means bias everywhere, no matter how hard we try. The No Free Lunch Theorem will always kick in.

I could move TLGS to matching with BERT vectors. I can afford nodes with NPUs attached in my home lab. But that does not necessarily solve the problem of bias and accuracy. I think the real question is what's closer to both human ethics and preference. The best I can do is use the search engine I created and try to make the results I want to see show up more often.

Yes, it's biased. It's biased towards who I am. But I hope I'm a typical geek who is close enough to the people using Gemini, acting as a proxy for the values we care about.

Suggestions are always welcome. Tag me on Station, write a reply and make it show up on Cosmos, or open a GitHub issue. I'll try to think through what I should change in the search engine.

Author's profile. Photo taken in VRChat by my friend Tast+
Martin Chang
Systems software, HPC, GPGPU and AI. I mostly write stupid C++ code. Sometimes does AI research. Chronic VRChat addict

I run TLGS, a major search engine on Gemini. Used by Buran by default.

  • marty1885 \at
  • Matrix:
  • Jami: a72b62ac04a958ca57739247aa1ed4fe0d11d2df