Perplexity's Deep Research is a Hallucinating Mess
Perplexity (perplexity.ai) is, loosely, a ChatGPT competitor that focuses on incorporating online search into its responses. This supposedly helps reduce hallucinations and lets the LLM work with up-to-date information. Very recently they launched "Deep Research", which, instead of basically querying (I suppose) Bing and running RAG on the search results, uses some Chain-of-Thought and iterative refining to dig deeper into the search results. Interesting, but how well does it actually work?
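As far as I can tell, the pipeline looks roughly like the sketch below. Every function in it is a made-up stand-in; this is my guess at the general shape, not Perplexity's actual implementation.

```cpp
#include <string>
#include <vector>

// Hypothetical stubs standing in for a search backend and an LLM call.
std::vector<std::string> web_search(const std::string& query) {
    return {"<search result for: " + query + ">"};
}
std::string llm_complete(const std::string& prompt) {
    return "<llm output for: " + prompt.substr(0, 32) + "...>";
}

std::string deep_research(const std::string& question) {
    std::string notes;
    std::string query = question;
    // Iterative refinement: search, read, decide what to search next.
    for (int round = 0; round < 4; ++round) {
        for (const auto& doc : web_search(query))  // the plain RAG part
            notes += doc + '\n';
        // Chain-of-Thought-ish step: reason over the notes gathered so
        // far and produce the next, more specific query.
        query = llm_complete("Question: " + question +
                             "\nNotes:\n" + notes +
                             "\nNext search query?");
    }
    return llm_complete("Answer from these notes only:\n" + notes +
                        "\nQuestion: " + question);
}
```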
I happen to have a subscription that gives me access to the Deep Research feature. I use Perplexity from time to time, as it is genuinely helpful for solving difficult questions. Let's try. I also happen to be quite familiar with what Tenstorrent is doing, so let's ask Deep Research about Tenstorrent:
Query: How does Tenstorrent's processor architecture works? Any strength and limitations of the chip design?
The full response from Deep Research is available in the following link.
Since the point of the feature is to let users dig deeper into the search results and solve problems that aren't easily answered by a simple query, it should, at a minimum, NOT provide inaccurate information. Let's read what Deep Research has to say:
> ### Tensix Core: The Building Block
>
> At the heart of Tenstorrent's architecture lies the Tensix core—a heterogeneous compute unit integrating five RISC-V RV32IMC cores, a 32x32 matrix engine (FPU), a vector engine (SFPU), 1–2 MB of SRAM, and two network-on-chip (NoC) routers. Two RISC-V cores manage NoC packet routing, while three handle compute kernel execution, enabling explicit control over data movement and computation. This contrasts with GPUs' fixed memory hierarchies, as Tenstorrent's SRAM acts as primary tensor storage rather than a cache, reducing DRAM dependency for intermediate results.
Just no! This is so f**king wrong! In just the very first paragraph, they got the following wrong. And I checked the references; none of this information is there.
- The RISC-V cores are RV32IM. The C encoding space is repurposed (IMO C is a bad extension anyway, using up 30% of the encoding space)
- The FPU is not natively 32x32; only the API is. Hence there's a packer and an unpacker to move data from tiles into the FPU (see the sketch after this list)
- You can store tensors in SRAM, and it does get used for intermediate results. But it's not the primary tensor storage
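To make the FPU point concrete, here is a minimal sketch of why an API-level 32x32 matmul doesn't imply a native 32x32 datapath. I'm borrowing the 16x16 face size from tt-metal's tile layout; the function names and interface are mine, not Tenstorrent's actual hardware interface.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kTile = 32, kFace = 16;
using Tile = std::array<float, kTile * kTile>;
using Face = std::array<float, kFace * kFace>;

// "Unpacker": pull one 16x16 face out of a row-major 32x32 tile.
Face unpack_face(const Tile& t, std::size_t fr, std::size_t fc) {
    Face f{};
    for (std::size_t r = 0; r < kFace; ++r)
        for (std::size_t c = 0; c < kFace; ++c)
            f[r * kFace + c] = t[(fr * kFace + r) * kTile + (fc * kFace + c)];
    return f;
}

// "Packer": accumulate a face back into the output tile.
void pack_face(Tile& t, const Face& f, std::size_t fr, std::size_t fc) {
    for (std::size_t r = 0; r < kFace; ++r)
        for (std::size_t c = 0; c < kFace; ++c)
            t[(fr * kFace + r) * kTile + (fc * kFace + c)] += f[r * kFace + c];
}

// Native-sized op: a plain 16x16 matmul standing in for the FPU.
Face fpu_matmul(const Face& a, const Face& b) {
    Face out{};
    for (std::size_t i = 0; i < kFace; ++i)
        for (std::size_t k = 0; k < kFace; ++k)
            for (std::size_t j = 0; j < kFace; ++j)
                out[i * kFace + j] += a[i * kFace + k] * b[k * kFace + j];
    return out;
}

// The 32x32 "API-level" matmul is really eight face-level matmuls,
// glued together by the unpacker and packer.
Tile tile_matmul(const Tile& a, const Tile& b) {
    Tile out{};
    for (std::size_t i = 0; i < 2; ++i)
        for (std::size_t j = 0; j < 2; ++j)
            for (std::size_t k = 0; k < 2; ++k)
                pack_face(out, fpu_matmul(unpack_face(a, i, k),
                                          unpack_face(b, k, j)), i, j);
    return out;
}
```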
Great start. Let's look at the next paragraph, which is equally great.
> The matrix engine processes BF16, FP16, INT8, and sparse formats, while the vector engine accelerates transcendental functions (e.g., GELU, Softmax) critical for transformer models. Each Tensix core achieves "silicon peak" efficiency by operating primarily on SRAM-resident data, avoiding the energy costs of HBM or GDDR6 accesses. A 12x12 grid of these cores forms the compute substrate in chips like Grayskull (120 cores) and Wormhole (144 cores).
Here's the list:
- The FPU (Matrix Engine) does not support INT8
- The vector unit deals with everything the FPU can't (see the sketch after this list)
- Weight tensors still have to live on GDDR6 (where did it get HBM from?). That cost is not avoided
- The 12x12 grid is the NoC grid size. But not all of the nodes are Tensix cores; there are other types of cores on the grid
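On the vector-unit point: transcendental-heavy elementwise ops like GELU are exactly the kind of thing a multiply-accumulate matrix engine can't do, so they fall to the SFPU. A rough illustration, assuming the standard tanh-approximation GELU; the function name and structure are mine, not Tenstorrent's API:

```cpp
#include <cmath>
#include <cstddef>

// Elementwise GELU, the sort of work that goes to the vector engine
// (SFPU) while the matrix engine (FPU) only does multiply-accumulate.
void sfpu_gelu(float* x, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        float v = x[i];
        // 0.5 * v * (1 + tanh(sqrt(2/pi) * (v + 0.044715 * v^3)))
        x[i] = 0.5f * v * (1.0f + std::tanh(0.7978845608f *
                                            (v + 0.044715f * v * v * v)));
    }
}
```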
Two paragraphs in and I'd had enough of the Deep Research response. It's bad and I cannot trust it, nor am I going to trust it on any future topic. This happens to be a topic where I can tell whether something is BS. Imagine if I had asked about something I don't know, yet critical. Boom!
The point of research should be to figure out the truth, not just what it appears to be. Given LLMs' abilities and the limitations of context windows, I am unsure how they're expected to fix this. Maybe RAG within RAG? Either way, this feature is not ready for prime time, and IMO it's best to avoid it.
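For what it's worth, here's roughly what I mean by "RAG within RAG": an outer retrieval picks documents, and an inner retrieval runs within each document so only the most relevant chunks reach the context window. A minimal sketch, with every function a hypothetical stand-in:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

std::vector<std::string> search_documents(const std::string& query) {
    return {"<full document about: " + query + ">"};   // stub
}
std::vector<std::string> split_chunks(const std::string& doc) {
    return {doc};                                      // stub
}
std::vector<std::string> rank_chunks(const std::string& query,
                                     const std::vector<std::string>& chunks,
                                     std::size_t keep) {
    // Stub: a real version would embed and score each chunk against
    // the query, then keep the top-scoring ones.
    std::size_t n = std::min(keep, chunks.size());
    return std::vector<std::string>(chunks.begin(), chunks.begin() + n);
}

std::string build_context(const std::string& query) {
    std::string context;
    for (const auto& doc : search_documents(query))    // outer RAG
        for (const auto& chunk : rank_chunks(query, split_chunks(doc), 2))
            context += chunk + '\n';                   // inner RAG
    return context;
}
```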
Just stay safe, guys.

Martin Chang
Systems software, HPC, GPGPU and AI. I mostly write stupid C++ code. Sometimes does AI research. Chronic VRChat addict
I run TLGS, a major search engine on Gemini. Used by Buran by default.
- marty1885 \at protonmail.com
- Matrix: @clehaxze:matrix.clehaxze.tw
- Jami: a72b62ac04a958ca57739247aa1ed4fe0d11d2df