By Nish Tahir — 05 Oct 2025

Notes on - Why do LLMs freak out over the seahorse emoji?

A fantastic deep dive into the seahorse emoji phenomenon^[1] was recently published by Theia^[2]. It's engaging, well presented, and worth reading. The post makes its case using meta-llama/Llama-3.3-70B-Instruct. I wanted to verify this behavior with smaller models, which, unsurprisingly, fail the challenge as well.

I specifically tested microsoft/Phi-4-mini-Instruct and HuggingFaceTB/SmolLM2-135M-Instruct.

I started with the code sample Theia provided here and made some modifications:

Added a CLI using typer to make it easier to iterate
Printed tables using rich for nicer formatting
Saved and compared activations
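
The save step can be as simple as writing one vector per layer to disk, keyed by query. The sketch below is a minimal, standard-library illustration of that idea. The JSON layout and the `save_activations` / `load_activations` names are assumptions made for this example, not the code in the repo.

```python
import json
from pathlib import Path
from typing import Dict, List

# One activation vector per layer (hypothetical layout for this sketch).
Activations = List[List[float]]


def save_activations(path: Path, query: str, layers: Activations) -> None:
    """Add or overwrite the activations for one query in a JSON file."""
    store = json.loads(path.read_text()) if path.exists() else {}
    store[query] = layers
    path.write_text(json.dumps(store))


def load_activations(path: Path) -> Dict[str, Activations]:
    """Load every saved query and its per-layer activations."""
    return json.loads(path.read_text())
```

Keying the file by query text keeps runs for different prompts side by side, so they can be loaded together and compared later.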

A few things stood out to me when testing.

With microsoft/Phi-4-mini-Instruct, it's interesting to see the early activations tend a bit more toward unsafe content before converging on the output^[3].

This is a bit more obvious when contrasted with SmolLM2, whose output looks more arbitrary to me.
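
Per-layer readouts like these are typically produced with a logit-lens style projection: each layer's hidden state is multiplied by the model's unembedding matrix, and the highest-scoring tokens are read off. The sketch below shows that step in plain Python. The `logit_lens` helper, the four-token vocabulary, the identity unembedding matrix, and the hidden vectors are all made-up toy values for illustration, not outputs from Phi-4-mini or SmolLM2, and the final normalization a real model applies before unembedding is left out.

```python
from typing import List, Sequence


def logit_lens(
    hidden_by_layer: Sequence[Sequence[float]],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 2,
) -> List[List[str]]:
    """Return the top-k tokens each layer's hidden state points at."""
    results = []
    for hidden in hidden_by_layer:
        # One logit per vocabulary entry: dot product of its row with the hidden state.
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in unembed]
        ranked = sorted(range(len(vocab)), key=lambda i: logits[i], reverse=True)
        results.append([vocab[i] for i in ranked[:k]])
    return results


# Toy values (assumptions for this sketch, not real model data).
vocab = ["sea", "horse", "fish", "dragon"]
unembed = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
hidden_by_layer = [
    [0.1, 0.2, 0.1, 0.9],  # early layer
    [0.2, 0.3, 0.8, 0.4],  # middle layer
    [0.9, 0.7, 0.3, 0.0],  # late layer
]

for layer, tokens in enumerate(logit_lens(hidden_by_layer, unembed, vocab)):
    print(layer, tokens)
```

With these toy numbers the top token changes from layer to layer, which is the kind of drift the tables make visible for a real model.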

We can compare the layer activations of a few different queries generated using microsoft/Phi-4-mini-Instruct to see how they differ. My goal was to determine whether the queries are processed the same way.

Intuitively, we can see that the queries are processed similarly at first but diverge around layer 20 as the model begins to converge on an output.

Here's a plot generated with SmolLM2.

Similarly, it all starts off the same and begins to diverge around layer 23.
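
One way to put a number on where two queries start to diverge is to compute the cosine similarity of their activations layer by layer and look for the first layer that drops below a cutoff. The sketch below does that in plain Python. The function names, the 0.9 threshold, and the toy vectors are assumptions for illustration, and this is not necessarily how the plots were produced.

```python
import math
from typing import List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def layer_similarities(
    acts_a: Sequence[Sequence[float]], acts_b: Sequence[Sequence[float]]
) -> List[float]:
    """Cosine similarity between two queries at each layer."""
    if len(acts_a) != len(acts_b):
        raise ValueError("both queries need the same number of layers")
    return [cosine(a, b) for a, b in zip(acts_a, acts_b)]


def first_divergent_layer(sims: Sequence[float], threshold: float = 0.9) -> Optional[int]:
    """Index of the first layer whose similarity falls below the threshold."""
    for layer, sim in enumerate(sims):
        if sim < threshold:
            return layer
    return None


# Toy activations for two queries, one vector per layer (made-up values).
query_a = [[1.0, 0.0], [1.0, 0.1], [1.0, 1.0]]
query_b = [[1.0, 0.0], [1.0, 0.0], [1.0, -1.0]]

sims = layer_similarities(query_a, query_b)
print(first_divergent_layer(sims))  # prints 2
```

The threshold is a judgment call: a stricter cutoff reports divergence earlier, so it is worth checking a few values against the plot before quoting a single layer number.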

I made a git repo that anyone can build off of here

(no date) www.reddit.com. Available at: https://www.reddit.com/r/GeminiAI/comments/1nglzed/gemini_loses_its_mind_after_failing_to_produce_a/ (Accessed: 2025-10-5). ↩︎
(no date) Why do LLMs freak out over the seahorse emoji?. vgel.me. Available at: http://vgel.me/posts/seahorse/ (Accessed: 2025-10-5). ↩︎
COVID being mentioned is not where I expected this experiment to go. It's fun surprises like this that keep things interesting. ↩︎
