The Eidon condition was handicapped by design. It could not see a single file. And it still won every question. In normal usage, an AI would have both the compressed structural knowledge and direct file access. These results are the floor, not the ceiling.
The Setup
What changed, and what did not.
Both runs used VS Code, GitHub Copilot, and Claude Sonnet 4.6 on the same public repository. Questions 1 through 5 were identical. The only changed variable was whether the assistant had Eidon.
Without Eidon: open files. Search. Trace call chains manually. Reconstruct repository context before answering.
With Eidon: use Eidon MCP outputs only. No file access. Begin from structured whole-repository context.
Q1 to Q5 were shared. Q6 was Eidon-only. Q1 used only eidon_guide and eidon_encoding. Q2 to Q5 used the broader Eidon surface. Q6 used only eidon_encoding.
The shared benchmark is about Questions 1 through 5. Question 6 is separate and interpretive. It asks what the model could see only because full-repository state was already available.
The Compression
2,719,406 tokens.
That is roughly 10,000 pages of code.
Compressed to 31,621. One AI prompt.
Every file. Every function. Every cross-file dependency. Restructured as a mathematical graph. Not summarized. Not truncated. Not approximated. The same information in 86× less space. Every finding on this page came from that 31,621-token output.
The analysis that produced these results ran once, before any question was asked. If the repository changes, only the modified files need reanalysis. The structural graph updates incrementally. The 31,621-token output reflects the entire codebase at the time of the test.
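The compression arithmetic above can be checked directly. A minimal sketch: the token counts come from this page, while the tokens-per-page figure is an assumption used only to sanity-check the "10,000 pages" estimate.

```python
# Token counts reported on this page.
raw_tokens = 2_719_406        # full repository
compressed_tokens = 31_621    # Eidon structural output

# Compression ratio quoted as "86x less space".
ratio = raw_tokens / compressed_tokens
print(f"compression ratio: {ratio:.1f}x")   # 86.0x

# Rough pages estimate, assuming ~270 tokens per printed page of code
# (the tokens-per-page figure is an assumption, not from the source).
pages = raw_tokens / 270
print(f"approx pages: {pages:,.0f}")
```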
The Results
Eidon won every question.
Across all five shared questions, the Eidon condition produced deeper analysis, broader coverage, and more cross-file reasoning. The control was competent. It found real bugs, traced real code paths, and identified real architectural patterns. But it was working from 4.8% of the repository. The 95.2% it never opened contained the security vulnerabilities, the layer violations, the proto migration gaps, and the division-by-zero crashes.
Primary-Source Record
The exact paired answers.
The benchmark summary above is grounded in the paired transcript itself. For readers who want primary-source evidence, the excerpts below reproduce representative passages from both runs, condensed for readability. They are shown side by side so the difference in coverage, abstraction level, and causal accuracy can be inspected directly. The full unedited transcript is linked below.
Q1 Architecture Paired excerpts
Q2 Code Path Paired excerpts
Q3 Single point of failure Paired excerpts
Q4 Architectural insights Paired excerpts
Q5 Bug and risk detection Paired excerpts
Q6 What full-repository access made visible Eidon-only follow-up
This prompt was asked only in the Eidon run. There is no control answer for it in the source record.
blast_radius=0 across the graph is a Hilt DI inversion artifact, not a sign of low coupling.
Seeing Types.kt, ModelHelperExt.kt, and LlmChatModelHelper.kt together reveals the NPU dispatch failure.
Seeing settings.proto plus all five serializers proves the token migration gap is systemic, not an isolated oversight.

Question 1
Architecture Overview
Both runs were asked to provide a complete architectural overview of the repository: modules, data flow, dependencies, and non-obvious insights. The file-browsing control spent 63 operations before its first full answer. It opened directories, enumerated Kotlin files, read key runtime and UI modules, and eventually produced a serious architectural summary. This is important: the control was not shallow. It was expensive.
For this first question, the Eidon condition made 2 structured Eidon calls:
eidon_guide and eidon_encoding. In return it received coverage of
all 1,361 files as a
mathematical graph: 6,933 nodes, 16,925 edges, 305 community structures. It produced exact
eigenvector centrality rankings, cluster structure, and graph-level properties such as the
Fiedler spectral gap. That is the difference in kind: the Eidon run began from a whole-system
object instead of assembling the system out of fragments.
These are not the kinds of properties manual browsing naturally surfaces early. The control can accumulate local truth by reading more files, but whole-graph quantities only appear when the repository is already carried as a graph. The advantage here is not speed alone. It is the starting representation.
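To give a sense of what "carrying the repository as a graph" means: eigenvector centrality, one of the whole-graph quantities named above, falls out of power iteration over the dependency edges. This is a toy Python sketch with a hypothetical four-file graph, not Eidon's implementation; the real graph has 6,933 nodes and 16,925 edges.

```python
# Hypothetical dependency edges: u -> v means "u depends on v".
# Centrality mass flows along edges, so heavily-depended-on files rise.
EDGES = {
    "MessageInputText.kt": ["LlmChatModelHelper.kt", "LlmChatViewModel.kt"],
    "LlmChatViewModel.kt": ["LlmChatModelHelper.kt"],
    "AgentTools.kt": ["LlmChatModelHelper.kt"],
    "LlmChatModelHelper.kt": ["MessageInputText.kt"],
}

def eigenvector_centrality(edges, iters=200):
    """Power iteration toward the dominant eigenvector of the edge matrix."""
    score = {n: 1.0 for n in edges}
    for _ in range(iters):
        nxt = {n: 0.0 for n in edges}
        for src, dsts in edges.items():
            for dst in dsts:
                nxt[dst] += score[src]          # push mass along each edge
        norm = sum(v * v for v in nxt.values()) ** 0.5
        score = {n: v / norm for n, v in nxt.items()}
    return score

ranks = sorted(eigenvector_centrality(EDGES).items(), key=lambda kv: -kv[1])
for name, c in ranks:
    print(f"{c:.3f}  {name}")
```

In this toy graph the file everything depends on dominates the ranking; at repository scale, the same computation surfaces the structural hubs without opening a single file.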
Question 2
Code Path Tracing
Both runs were asked to trace the complete path from a user tapping "Send" to the final
model response appearing on screen. The file-browsing control produced a correct surface-level
trace: MessageInputText → ChatPanel → ChatView
→ LlmChatViewModel → LlmChatModelHelper → LiteRT-LM →
MessageCallback → StateFlow → recomposition →
MessageBodyText. It included inline code snippets, making it more readable, but every step
came from a single file's local scope.
The Eidon condition traced the same chain plus the runtime paths the control never reached.
It identified that Hilt @IntoSet multibindings in AgentChatTaskModule.kt,
MobileActionsModule.kt, and TinyGardenTaskModule.kt invert the
dependency graph at runtime. A structural property invisible in any single file. It found
the agent path divergence: ActionChannel → CompletableDeferred →
GalleryWebView → JavaScript bridge → window['ai_edge_gallery_get_result']
→ JavascriptInterface. It reported structural position metrics (PageRank,
in-degree, community role) for every file in the trace. It mapped the CleanUpListener
lifecycle and token streaming via ResultListener.onResult(partial, done, thinkingText).
All from 2 Eidon MCP calls. No file opened.
The control’s 10 steps and Eidon’s 9 steps cover the same UI chain. The difference is not step count. It is that Eidon also saw the agent fork, the JS bridge path, the Hilt runtime inversion, and the structural metrics that quantify why each file matters in the dependency graph. That is what whole-repository context makes visible.
Question 3
Single Point of Failure
Both runs were asked: which one file, if it disappeared, would cause the greatest architectural damage across the entire system? The interesting outcome is not a clean right-versus-wrong split. It is that the two runs converged on two different kinds of catastrophic importance.
Two valid answers with different origins. The control found the contract from file inspection. Eidon isolated the structural amplifier from graph metrics: 57 imports, highest weighted PageRank, and a 7-step destruction chain it mapped quantitatively. That quantification is the difference. Both answers are real. Eidon’s is measured.
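A destruction chain of the kind Eidon mapped is, at its core, reverse reachability over the import graph. A minimal sketch with a hypothetical import graph; the file names below are illustrative, not the actual 57 importers from the transcript.

```python
# Hypothetical import graph: an edge u -> v means "u imports v".
IMPORTS = {
    "ChatView.kt": ["LlmChatViewModel.kt"],
    "LlmChatViewModel.kt": ["Model.kt"],
    "BenchmarkViewModel.kt": ["Model.kt"],
    "DownloadRepository.kt": ["Model.kt"],
    "Model.kt": [],
}

def blast_radius(target, imports):
    """Every file that transitively imports `target` breaks if it disappears."""
    importers = {n: [] for n in imports}
    for src, deps in imports.items():
        for dep in deps:
            importers[dep].append(src)      # reverse each import edge
    broken, frontier = set(), [target]
    while frontier:
        node = frontier.pop()
        for dependent in importers[node]:
            if dependent not in broken:
                broken.add(dependent)
                frontier.append(dependent)
    return broken

print(sorted(blast_radius("Model.kt", IMPORTS)))
```

Weighting the same traversal by PageRank is what turns "many importers" into a quantified, ranked single point of failure.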
The AI with Eidon was forbidden from opening files. It still found 4.5× more bugs, every security vulnerability, and traced paths the file browser never reached. This is what happens when the AI knows the entire codebase before the first question.
Question 4
Three Deepest Architectural Insights
With Eidon
Runtime layer violation. LlmChatModelHelper lives in
ui/llmchat/, a UI package. The entire inference runtime depends on a UI-layer
class. ModelHelperExt.kt uses if instead of exhaustive
when for RuntimeType dispatch, silently routing unknown accelerators
(including NPU) to the wrong handler.
Hidden ANR vector. AgentTools.kt calls runBlocking
inside LiteRT's synchronous ToolSet callback, blocking the inference thread.
AgentChatTaskModule.kt lines 76-80 take a non-atomic skill snapshot that can
race with concurrent skill loading. Both are invisible from any single file.
Systemic proto migration gap. settings.proto deprecates
access_token_data in a comment. UserData.secrets exists as the
supposed replacement. All 5 serializer files (SettingsSerializer.kt,
UserDataSerializer.kt, BenchmarkResultsSerializer.kt,
CutoutsSerializer.kt, SkillsSerializer.kt) have identical
getDefaultInstance() with zero migration logic. HuggingFace OAuth tokens
silently lost on upgrade. Seeing all 5 simultaneously confirms this is systemic, not an
oversight in a single file.
Without Eidon
Inference thread blocking via CompletableDeferred. Correctly identified
that the CompletableDeferred bridge in AgentTools.kt connects
synchronous JavaScript calls to Kotlin coroutines, creating a thread-blocking risk.
Real finding. Correctly traced through Types.kt and
AgentChatScreen.
Engine/Conversation split. Identified the boundary between
LlmChatModelHelper.kt (engine) and TinyGardenViewModel.kt
(conversation) as a load-bearing architectural boundary. Valid observation.
model.instance: Any? parallel state system. Identified the
untyped instance field running a parallel mutable state system underneath the reactive
Compose pipeline. Correct. But surface-level compared to the layer violation and proto
migration gap that require seeing dozens of files simultaneously.
The file-browsing control found real insights. None are wrong. The difference is structural depth: the Eidon agent identified problems that span multiple files, packages, and architectural layers. A runtime class in a UI package. A deadlock hiding inside a DI callback chain. A migration gap confirmed across 5 serializer files. These patterns are invisible when you read files one at a time, no matter how carefully.
Question 5
Bug Detection and Security Audit
The Eidon condition found 58 bugs. The file-browsing control found 13. Every one of the 13 is a legitimate, confirmed finding. Zero false positives on either side. The gap is not competence. It is coverage. The 45 bugs the control run missed are not in any file it read.
58 bugs across 9 categories
What the AI found without Eidon
13 bugs. All legitimate. Zero false positives. To give a fair picture:
- model.instance race condition between Model.kt and ModelManagerViewModel.kt
- ExperimentalFlags global mutable state in LlmChatModelHelper.kt
- DataStoreRepository runBlocking deadlock risk
- observerWorkerProgress LiveData leak in DownloadRepository.kt
- AICoreModelHelper.initialize onDone strings comparison
- BenchmarkViewModel deleteBenchmarkResult index drift
- DownloadWorker channelCreated static flag race
- GalleryLifecycleProvider isAppInForeground not volatile
- ModelManagerViewModel addImportedLlmModel concurrent modification
- AICoreModelHelper unlimited prompt growth
- SkillManagerViewModel.loadSkills race condition

The 45-bug gap is not about competence. The control never opened the files where those bugs live. The whole-repository chains surfaced earlier and more completely in the Eidon run because the Eidon run started from the whole repository.
Security
9 security issues surfaced in the Eidon run. The control run surfaced none.
The file-browsing control surfaced zero security vulnerabilities in this transcript. Not because the agent was weak, but because these issues sit in chains that span types, tools, WebView permissions, persistence code, and logging. They are much easier to miss when context is assembled sequentially instead of carried as a whole-system state.
| File | Vulnerability | Severity |
|---|---|---|
| GalleryWebView.kt | allowFileAccess=true with WebViewAssetLoader already present. Path traversal attack surface is open and unnecessary. | Critical |
| GalleryWebView.kt | Auto-grants persistent storage permissions to any URL loaded in WebView, without validation. | High |
| FcmMessagingService.kt | Raw deeplink from FCM push notification dispatched as Intent without validation. Deeplink injection via notification. | High |
| FcmMessagingService.kt | InputStream from FCM payload never closed. Resource leak on every notification. | Medium |
| IntentHandler.kt | PII (email, phone, SMS body) logged raw to Logcat on lines 53, 57, 69, 73. Readable by apps with READ_LOGS. | High |
| MobileActionsTools.kt | Full contact name, email, phone, and email body logged on lines 51-54, 77. Same READ_LOGS exposure. | High |
| Types.kt | CallJsAgentAction.secret as plain String on an open class. Subclassable, accessible from any context. | Medium |
| AgentTools.kt | Skill name unsanitized before use as URL parameter fed into WebView via runJs(). Skill URL injection. | High |
| settings.proto + 5 serializers | access_token_data deprecated. Migration never implemented. HuggingFace OAuth token silently lost on app upgrade. | High |
Question 6
What full-repository access made visible
This question was asked only in the Eidon run. There is no control counterpart. The AI
was given only eidon_encoding: the compressed structural graph
alone. Not the guide. Not the source files. Just 31,621 tokens of structural state. From that
single input, it produced every discovery below.
blast_radius=0 readout was a DI artifact, not low coupling.

blast_radius=0 across the graph was not a repository full of isolated files. It was a Hilt DI inversion artifact. @IntoSet multibindings in AgentChatTaskModule.kt, MobileActionsModule.kt, and TinyGardenTaskModule.kt invert the dependency graph at runtime, which a static import graph cannot recover cleanly.
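Why the static readout is zero is easy to see in miniature: the @IntoSet edge only exists at runtime. A toy sketch in Python; the file lists are illustrative, not the repository's actual import sets.

```python
def dependents_of(target, edges):
    """Files that transitively depend on `target` (reverse reachability)."""
    importers = {n: [] for n in edges}
    for src, deps in edges.items():
        for dep in deps:
            importers[dep].append(src)
    seen, stack = set(), [target]
    while stack:
        for d in importers[stack.pop()]:
            if d not in seen:
                seen.add(d)
                stack.append(d)
    return seen

# Statically, nothing imports AgentTools.kt, so its blast radius reads 0.
STATIC = {
    "AgentChatTaskModule.kt": [],
    "LlmChatModelHelper.kt": [],
    "AgentTools.kt": [],
}
# At runtime, the Hilt multibinding wires AgentTools into the tool set
# the model helper consumes, inverting the dependency.
RUNTIME = {
    "AgentChatTaskModule.kt": ["AgentTools.kt"],
    "LlmChatModelHelper.kt": ["AgentTools.kt"],
    "AgentTools.kt": [],
}

print(len(dependents_of("AgentTools.kt", STATIC)))   # 0
print(len(dependents_of("AgentTools.kt", RUNTIME)))  # 2
```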
Types.kt adds Accelerator.NPU. ModelHelperExt.kt
still uses if (runtimeType == RuntimeType.AICORE) with an implicit else routing
everything else to LlmChatModelHelper. LlmChatModelHelper.kt
itself lives in ui/llmchat/ instead of a runtime package. Any one of those files
looks merely odd. Seeing all three together makes the regression concrete.
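The shape of the dispatch bug is language-agnostic. A Python sketch of the pattern: the repository code is Kotlin, and the handler names below mirror the files above, but the snippet is illustrative rather than the actual source.

```python
from enum import Enum, auto

class RuntimeType(Enum):
    LITERT = auto()
    AICORE = auto()
    NPU = auto()      # newly added accelerator

# Buggy shape: an if with an implicit else, like the Kotlin
# `if (runtimeType == RuntimeType.AICORE)` described above.
def pick_helper_buggy(rt: RuntimeType) -> str:
    if rt is RuntimeType.AICORE:
        return "AICoreModelHelper"
    return "LlmChatModelHelper"   # NPU silently falls through here

# Safer shape: explicit dispatch that fails loudly on unknown types.
# (Kotlin's exhaustive `when` catches the missing branch at compile time.)
def pick_helper_exhaustive(rt: RuntimeType) -> str:
    handlers = {
        RuntimeType.LITERT: "LlmChatModelHelper",
        RuntimeType.AICORE: "AICoreModelHelper",
    }
    try:
        return handlers[rt]
    except KeyError:
        raise ValueError(f"no handler registered for {rt}") from None
```

The buggy variant routes NPU to the wrong handler without a whisper; the explicit variant turns the regression into an immediate, visible failure.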
Types.kt defines CallJsAgentAction with secret as a
plain String. AgentTools.kt passes it through the action channel.
IntentHandler.kt logs raw parameter JSON before dispatch.
MobileActionsTools.kt logs full contact PII and email body. No single file view
reveals the complete path from secret storage to log exposure.
settings.proto deprecates access_token_data.
UserData.secrets is the supposed replacement. The five serializers
SettingsSerializer.kt, UserDataSerializer.kt,
BenchmarkResultsSerializer.kt, CutoutsSerializer.kt, and
SkillsSerializer.kt all keep the same getDefaultInstance()
pattern with no migration logic. That is not an isolated omission. It is a full-layer gap.
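What "no migration logic" costs can be sketched abstractly. The keys and shapes below are hypothetical, assuming the deprecated field still holds the token on disk when the upgraded serializer runs.

```python
# Old on-disk settings keep the token under the deprecated key.
OLD_SETTINGS = {"access_token_data": "hf_abc123"}  # hypothetical shape

def load_without_migration(stored):
    # Mirrors a bare default-instance fallback: deprecated fields ignored.
    return {"secrets": dict(stored.get("secrets", {}))}

def load_with_migration(stored):
    data = {"secrets": dict(stored.get("secrets", {}))}
    token = stored.get("access_token_data")
    if token and "hf_token" not in data["secrets"]:
        data["secrets"]["hf_token"] = token   # carry the token forward once
    return data

print(load_without_migration(OLD_SETTINGS))  # token silently gone
print(load_with_migration(OLD_SETTINGS))     # token preserved
```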
A Fiedler spectral gap of 0.0141 with 293 disconnected components revealed
a core+satellite topology: 9 major Kotlin clusters carry all the work. Everything else
(audio files, skill HTML, git metadata, binary assets) is structurally isolated. All
meaningful change latency lives inside those 9 clusters. All security-relevant paths run
through Cluster_79 and Cluster_89. This is not something you derive from browsing files. It
emerges from the full graph eigenvalue decomposition.
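The component count behind that core+satellite readout is plain connected-components analysis; the Fiedler value itself comes from the graph Laplacian's second eigenvalue, which is beyond a short sketch. A toy component count with hypothetical nodes:

```python
def components(adj):
    """Connected components of an undirected graph (adjacency lists)."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            seen.add(node)
            stack.extend(adj[node])
        comps.append(comp)
    return comps

# Two connected Kotlin clusters plus structurally isolated assets.
GRAPH = {
    "ChatView.kt": ["LlmChatViewModel.kt"],
    "LlmChatViewModel.kt": ["ChatView.kt"],
    "AgentTools.kt": ["AgentChatTaskModule.kt"],
    "AgentChatTaskModule.kt": ["AgentTools.kt"],
    "promo.mp3": [],
    "skill.html": [],
}
print(len(components(GRAPH)))   # 4: two working clusters, two satellites
```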
AgentTools.kt ANR chain requires connecting three files across three layers.

AgentChatTaskModule.kt creates AgentTools and registers it as a
ToolSet. AgentTools.kt uses runBlocking to bridge
LLM function calls to WebView JS. LlmChatModelHelper.kt and
AICoreModelHelper.kt call ToolSet handlers synchronously on the inference
thread. Connecting those three facts gives you the ANR: the inference thread blocks waiting
for a CompletableDeferred resolved by WebView JavaScript. Each fact lives in a
different file. The encoding delivered all three simultaneously.
delay() as a state machine substitute.

TextInputHistorySheet.kt: 100 ms dismiss delay, 400 ms delete delay.
PromoScreenGm4: 5-second dismiss timer.
HoldToDictateViewModel.kt: 500 ms stop delay. Seen individually, each is a
workaround. Seen across the full repository, the pattern reveals a codebase using arbitrary
delay() calls instead of state machines. The race condition count scales with
the number of timing-sensitive paths, not with how conspicuous any single one looks.
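The contrast between a timer and a real state transition can be sketched in a few lines. A hedged Python illustration of the pattern, not the Kotlin coroutine code itself; the durations are placeholders.

```python
import threading
import time

# Fragile pattern: sleep an arbitrary duration and hope the transition
# happened, mirroring the hard-coded 100-500 ms delays described above.
def dismiss_with_delay(finish_animation, guess_s=0.02):
    threading.Thread(target=finish_animation).start()
    time.sleep(guess_s)              # arbitrary guess, not a real signal
    return "dismissed"               # may fire before the animation ends

# Explicit state: block on the event that marks the actual transition.
def dismiss_with_event(finish_animation):
    done = threading.Event()
    def run():
        finish_animation()
        done.set()                   # real state change, not a timer
    threading.Thread(target=run).start()
    return "dismissed" if done.wait(timeout=2.0) else "timed out"
```

The delay version works until the animation takes longer than the guess; the event version never needed the guess in the first place.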
The Central Finding
The AI that knew everything found everything. The AI that browsed found what it happened to open.
58 bugs versus 13. 9 security vulnerabilities versus 0. Full repository coverage versus 4.8%. In fewer operations, with zero file browsing, working from compressed structural state alone.
The control was not bad. Its 13 bugs are all real. Its architecture answers are verifiable. But it was reasoning from a keyhole. 95.2% of the codebase was invisible. Every security finding, every division-by-zero, every layer violation, every proto migration gap lived in that invisible 95.2%.
This study handicapped Eidon on purpose. No file browsing. No folder exploration. Just the compressed structural graph. In normal usage, an AI has both: full-repository structural knowledge and direct file access. The results on this page are the floor, not the ceiling.
Reproducibility
The repository is public. Every finding is real.
We published every line number, every commit hash, and every file name against a live, actively maintained Google codebase. Google's own engineers can open that repository right now and verify every single finding independently. That is not a confidence claim. That is an open invitation.
Every finding in this benchmark references real files, real line numbers, and real commit
hashes present in the public repository as of April 18, 2026. The NPU dispatch regression
references commit [533855e8]. The PII logging findings reference exact line
numbers in IntentHandler.kt and MobileActionsTools.kt. The proto
migration gap references all 5 serializer files by name. The paired answer record above is
reproduced from the test transcript, and the full exact transcript is published at
/study/transcript. Both runs executed in isolated sessions with no
cross-contamination.
Your AI is working from 5% of your codebase. Fix that.
Eidon analyzes your entire repository once. Every AI session after that starts from full structural knowledge. Local-first. Your code never leaves your machine.