The Eidon condition was handicapped by design. It could not see a single file. And it still won every question. In normal usage, an AI would have both the compressed structural knowledge and direct file access. These results are the floor, not the ceiling.
The Setup
What changed, and what did not.
Both runs used VS Code, GitHub Copilot, and Claude Sonnet 4.6 on the same public repository. Questions 1 through 5 were identical. The only changed variable was whether the assistant had Eidon.
Without Eidon: open files. Search. Trace call chains manually. Reconstruct repository context before answering.
With Eidon: use Eidon MCP outputs only. No file access. Begin from structured whole-repository context.
Q1 to Q5 were shared. Q6 was Eidon-only. Q1 used only eidon_guide and eidon_encoding. Q2 to Q5 used the broader Eidon surface. Q6 used only eidon_encoding.
The shared benchmark is about Questions 1 through 5. Question 6 is separate and interpretive. It asks what the model could see only because full-repository state was already available.
The Compression
2,719,406 tokens.
That is roughly 10,000 pages of code.
Compressed to 31,621. One AI prompt.
Every file. Every function. Every cross-file dependency. Restructured as a mathematical graph. Not summarized. Not truncated. Not approximated. The same information in 86× less space. Every finding on this page came from that 31,621-token output.
The analysis that produced these results ran once, before any question was asked. If the repository changes, only the modified files need reanalysis. The structural graph updates incrementally. The 31,621-token output reflects the entire codebase at the time of the test.
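The compression arithmetic above can be checked directly. A minimal sketch: the token counts come from this page, while the tokens-per-page figure is an assumption used only to sanity-check the "10,000 pages" estimate.

```python
# Token counts reported on this page.
raw_tokens = 2_719_406        # full repository
compressed_tokens = 31_621    # Eidon structural output

# Compression ratio quoted as "86x less space".
ratio = raw_tokens / compressed_tokens
print(f"compression ratio: {ratio:.1f}x")   # 86.0x

# Rough pages estimate, assuming ~270 tokens per printed page of code
# (the tokens-per-page figure is an assumption, not from the source).
pages = raw_tokens / 270
print(f"approx pages: {pages:,.0f}")
```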
The Results
Eidon won every question.
Across all five shared questions, the Eidon condition produced deeper analysis, broader coverage, and more cross-file reasoning. The control was competent. It found real bugs, traced real code paths, and identified real architectural patterns. But it was working from 4.8% of the repository. The 95.2% it never opened contained the security vulnerabilities, the layer violations, the proto migration gaps, and the division-by-zero crashes.
Primary-Source Record
The exact paired answers.
The benchmark summary above is grounded in the paired transcript itself. For readers who want primary-source evidence, the excerpts below reproduce representative passages from both runs, condensed for readability. They are shown side by side so the difference in coverage, abstraction level, and causal accuracy can be inspected directly. The full unedited transcript is linked below.
Q1 Architecture Paired excerpts
Q2 Code Path Paired excerpts
Q3 Single point of failure Paired excerpts
Q4 Architectural insights Paired excerpts
Q5 Bug and risk detection Paired excerpts
Q6 What full-repository access made visible Eidon-only follow-up
This prompt was asked only in the Eidon run. There is no control answer for it in the source record.
blast_radius=0 across the graph is a Hilt DI inversion artifact, not a sign of low coupling.
Seeing Types.kt, ModelHelperExt.kt, and LlmChatModelHelper.kt together reveals the NPU dispatch failure.
Seeing settings.proto plus all five serializers proves the token migration gap is systemic, not an isolated oversight.

Question 1
Architecture Overview
Both runs were asked to provide a complete architectural overview of the repository: modules, data flow, dependencies, and non-obvious insights. The file-browsing control spent 63 operations before its first full answer. It opened directories, enumerated Kotlin files, read key runtime and UI modules, and eventually produced a serious architectural summary. This is important: the control was not shallow. It was expensive.
For this first question, the Eidon condition made 2 structured Eidon calls:
eidon_guide and eidon_encoding. In return it received coverage of
all 1,361 files as a
mathematical graph: 6,933 nodes, 16,925 edges, 305 community structures. It produced exact
eigenvector centrality rankings, cluster structure, and graph-level properties such as the
Fiedler spectral gap. That is the difference in kind: the Eidon run began from a whole-system
object instead of assembling the system out of fragments.
These are not the kinds of properties manual browsing naturally surfaces early. The control can accumulate local truth by reading more files, but whole-graph quantities only appear when the repository is already carried as a graph. The advantage here is not speed alone. It is the starting representation.
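To give a sense of what "carrying the repository as a graph" means: eigenvector centrality, one of the whole-graph quantities named above, falls out of power iteration over the dependency edges. This is a toy Python sketch with a hypothetical four-file graph, not Eidon's implementation; the real graph has 6,933 nodes and 16,925 edges.

```python
# Hypothetical dependency edges: u -> v means "u depends on v".
# Centrality mass flows along edges, so heavily-depended-on files rise.
EDGES = {
    "MessageInputText.kt": ["LlmChatModelHelper.kt", "LlmChatViewModel.kt"],
    "LlmChatViewModel.kt": ["LlmChatModelHelper.kt"],
    "AgentTools.kt": ["LlmChatModelHelper.kt"],
    "LlmChatModelHelper.kt": ["MessageInputText.kt"],
}

def eigenvector_centrality(edges, iters=200):
    """Power iteration toward the dominant eigenvector of the edge matrix."""
    score = {n: 1.0 for n in edges}
    for _ in range(iters):
        nxt = {n: 0.0 for n in edges}
        for src, dsts in edges.items():
            for dst in dsts:
                nxt[dst] += score[src]          # push mass along each edge
        norm = sum(v * v for v in nxt.values()) ** 0.5
        score = {n: v / norm for n, v in nxt.items()}
    return score

ranks = sorted(eigenvector_centrality(EDGES).items(), key=lambda kv: -kv[1])
for name, c in ranks:
    print(f"{c:.3f}  {name}")
```

In this toy graph the file everything depends on dominates the ranking; at repository scale, the same computation surfaces the structural hubs without opening a single file.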
Question 2
Code Path Tracing
Both runs were asked to trace the complete path from a user tapping "Send" to the final
model response appearing on screen. The file-browsing control produced a correct surface-level
trace: MessageInputText → ChatPanel → ChatView
→ LlmChatViewModel → LlmChatModelHelper → LiteRT-LM →
MessageCallback → StateFlow → recomposition →
MessageBodyText. It included inline code snippets, making it more readable, but every step
came from a single file's local scope.
The Eidon condition traced the same chain plus the runtime paths the control never reached.
It identified that Hilt @IntoSet multibindings in AgentChatTaskModule.kt,
MobileActionsModule.kt, and TinyGardenTaskModule.kt invert the
dependency graph at runtime. A structural property invisible in any single file. It found
the agent path divergence: ActionChannel → CompletableDeferred →
GalleryWebView → JavaScript bridge → window['ai_edge_gallery_get_result']
→ JavascriptInterface. It reported structural position metrics (PageRank,
in-degree, community role) for every file in the trace. It mapped the CleanUpListener
lifecycle and token streaming via ResultListener.onResult(partial, done, thinkingText).
All from 2 Eidon MCP calls. No file opened.
The control’s 10 steps and Eidon’s 9 steps cover the same UI chain. The difference is not step count. It is that Eidon also saw the agent fork, the JS bridge path, the Hilt runtime inversion, and the structural metrics that quantify why each file matters in the dependency graph. That is what whole-repository context makes visible.
Question 3
Single Point of Failure
Both runs were asked: which one file, if it disappeared, would cause the greatest architectural damage across the entire system? The interesting outcome is not a clean right-versus-wrong split. It is that the two runs converged on two different kinds of catastrophic importance.
Two valid answers with different origins. The control found the contract from file inspection. Eidon isolated the structural amplifier from graph metrics: 57 imports, highest weighted PageRank, and a 7-step destruction chain it mapped quantitatively. That quantification is the difference. Both answers are real. Eidon’s is measured.
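A destruction chain of the kind Eidon mapped is, at its core, reverse reachability over the import graph. A minimal sketch with a hypothetical import graph; the file names below are illustrative, not the actual 57 importers from the transcript.

```python
# Hypothetical import graph: an edge u -> v means "u imports v".
IMPORTS = {
    "ChatView.kt": ["LlmChatViewModel.kt"],
    "LlmChatViewModel.kt": ["Model.kt"],
    "BenchmarkViewModel.kt": ["Model.kt"],
    "DownloadRepository.kt": ["Model.kt"],
    "Model.kt": [],
}

def blast_radius(target, imports):
    """Every file that transitively imports `target` breaks if it disappears."""
    importers = {n: [] for n in imports}
    for src, deps in imports.items():
        for dep in deps:
            importers[dep].append(src)      # reverse each import edge
    broken, frontier = set(), [target]
    while frontier:
        node = frontier.pop()
        for dependent in importers[node]:
            if dependent not in broken:
                broken.add(dependent)
                frontier.append(dependent)
    return broken

print(sorted(blast_radius("Model.kt", IMPORTS)))
```

Weighting the same traversal by PageRank is what turns "many importers" into a quantified, ranked single point of failure.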
The AI with Eidon was forbidden from opening files. It still found 4.5× more bugs, every security vulnerability, and traced paths the file browser never reached. This is what happens when the AI knows the entire codebase before the first question.
Question 4
Three Deepest Architectural Insights
With Eidon
Runtime layer violation. LlmChatModelHelper lives in
ui/llmchat/, a UI package. The entire inference runtime depends on a UI-layer
class. ModelHelperExt.kt uses if instead of exhaustive
when for RuntimeType dispatch, silently routing unknown accelerators
(including NPU) to the wrong handler.
Hidden ANR vector. AgentTools.kt calls runBlocking
inside LiteRT's synchronous ToolSet callback, blocking the inference thread.
AgentChatTaskModule.kt lines 76-80 take a non-atomic skill snapshot that can
race with concurrent skill loading. Both are invisible from any single file.
Systemic proto migration gap. settings.proto deprecates
access_token_data in a comment. UserData.secrets exists as the
supposed replacement. All 5 serializer files (SettingsSerializer.kt,
UserDataSerializer.kt, BenchmarkResultsSerializer.kt,
CutoutsSerializer.kt, SkillsSerializer.kt) have identical
getDefaultInstance() with zero migration logic. HuggingFace OAuth tokens
silently lost on upgrade. Seeing all 5 simultaneously confirms this is systemic, not an
oversight in a single file.
Without Eidon
Inference thread blocking via CompletableDeferred. Correctly identified
that the CompletableDeferred bridge in AgentTools.kt connects
synchronous JavaScript calls to Kotlin coroutines, creating a thread-blocking risk.
Real finding. Correctly traced through Types.kt and
AgentChatScreen.
Engine/Conversation split. Identified the boundary between
LlmChatModelHelper.kt (engine) and TinyGardenViewModel.kt
(conversation) as a load-bearing architectural boundary. Valid observation.
model.instance: Any? parallel state system. Identified the
untyped instance field running a parallel mutable state system underneath the reactive
Compose pipeline. Correct. But surface-level compared to the layer violation and proto
migration gap that require seeing dozens of files simultaneously.
The file-browsing control found real insights. None are wrong. The difference is structural depth: the Eidon agent identified problems that span multiple files, packages, and architectural layers. A runtime class in a UI package. A deadlock hiding inside a DI callback chain. A migration gap confirmed across 5 serializer files. These patterns are invisible when you read files one at a time, no matter how carefully.
Question 5
Bug Detection and Security Audit
The Eidon condition found 58 bugs. The file-browsing control found 13. Every one of the 13 is a legitimate, confirmed finding. Zero false positives on either side. The gap is not competence. It is coverage. The 45 bugs the control run missed are not in any file it read.
58 bugs across 9 categories
What the AI found without Eidon
13 bugs. All legitimate. Zero false positives. To give a fair picture:
- model.instance race condition between Model.kt and ModelManagerViewModel.kt
- ExperimentalFlags global mutable state in LlmChatModelHelper.kt
- DataStoreRepository runBlocking deadlock risk
- observerWorkerProgress LiveData leak in DownloadRepository.kt
- AICoreModelHelper.initialize onDone strings comparison
- BenchmarkViewModel deleteBenchmarkResult index drift
- DownloadWorker channelCreated static flag race
- GalleryLifecycleProvider isAppInForeground not volatile
- ModelManagerViewModel addImportedLlmModel concurrent modification
- AICoreModelHelper unlimited prompt growth
- SkillManagerViewModel.loadSkills race condition

The 45-bug gap is not about competence. The control never opened the files where those bugs live. The whole-repository chains surfaced earlier and more completely in the Eidon run because the Eidon run started from the whole repository.
Security
9 security issues surfaced in the Eidon run. The control run surfaced none.
The file-browsing control surfaced zero security vulnerabilities in this transcript. Not because the agent was weak, but because these issues sit in chains that span types, tools, WebView permissions, persistence code, and logging. They are much easier to miss when context is assembled sequentially instead of carried as a whole-system state.
| File | Vulnerability | Severity |
|---|---|---|
| GalleryWebView.kt | allowFileAccess=true with WebViewAssetLoader already present. Path traversal attack surface is open and unnecessary. | Critical |
| GalleryWebView.kt | Auto-grants persistent storage permissions to any URL loaded in WebView, without validation. | High |
| FcmMessagingService.kt | Raw deeplink from FCM push notification dispatched as Intent without validation. Deeplink injection via notification. | High |
| FcmMessagingService.kt | InputStream from FCM payload never closed. Resource leak on every notification. | Medium |
| IntentHandler.kt | PII (email, phone, SMS body) logged raw to Logcat on lines 53, 57, 69, 73. Readable by apps with READ_LOGS. | High |
| MobileActionsTools.kt | Full contact name, email, phone, and email body logged on lines 51-54, 77. Same READ_LOGS exposure. | High |
| Types.kt | CallJsAgentAction.secret as plain String on an open class. Subclassable, accessible from any context. | Medium |
| AgentTools.kt | Skill name unsanitized before use as URL parameter fed into WebView via runJs(). Skill URL injection. | High |
| settings.proto + 5 serializers | access_token_data deprecated. Migration never implemented. HuggingFace OAuth token silently lost on app upgrade. | High |
Question 6
What full-repository access made visible
This question was asked only in the Eidon run. There is no control counterpart. The AI
was given only eidon_encoding: the compressed structural graph
alone. Not the guide. Not the source files. Just 31,621 tokens of structural state. From that
single input, it produced every discovery below.
blast_radius=0 readout was a DI artifact, not low coupling.

blast_radius=0 across the graph was not a repository full of isolated files. It was a Hilt DI inversion artifact. @IntoSet multibindings in AgentChatTaskModule.kt, MobileActionsModule.kt, and TinyGardenTaskModule.kt invert the dependency graph at runtime, which a static import graph cannot recover cleanly.
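Why the static readout is zero is easy to see in miniature: the @IntoSet edge only exists at runtime. A toy sketch in Python; the file lists are illustrative, not the repository's actual import sets.

```python
def dependents_of(target, edges):
    """Files that transitively depend on `target` (reverse reachability)."""
    importers = {n: [] for n in edges}
    for src, deps in edges.items():
        for dep in deps:
            importers[dep].append(src)
    seen, stack = set(), [target]
    while stack:
        for d in importers[stack.pop()]:
            if d not in seen:
                seen.add(d)
                stack.append(d)
    return seen

# Statically, nothing imports AgentTools.kt, so its blast radius reads 0.
STATIC = {
    "AgentChatTaskModule.kt": [],
    "LlmChatModelHelper.kt": [],
    "AgentTools.kt": [],
}
# At runtime, the Hilt multibinding wires AgentTools into the tool set
# the model helper consumes, inverting the dependency.
RUNTIME = {
    "AgentChatTaskModule.kt": ["AgentTools.kt"],
    "LlmChatModelHelper.kt": ["AgentTools.kt"],
    "AgentTools.kt": [],
}

print(len(dependents_of("AgentTools.kt", STATIC)))   # 0
print(len(dependents_of("AgentTools.kt", RUNTIME)))  # 2
```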
Types.kt adds Accelerator.NPU. ModelHelperExt.kt
still uses if (runtimeType == RuntimeType.AICORE) with an implicit else routing
everything else to LlmChatModelHelper. LlmChatModelHelper.kt
itself lives in ui/llmchat/ instead of a runtime package. Any one of those files
looks merely odd. Seeing all three together makes the regression concrete.
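The shape of the dispatch bug is language-agnostic. A Python sketch of the pattern: the repository code is Kotlin, and the handler names below mirror the files above, but the snippet is illustrative rather than the actual source.

```python
from enum import Enum, auto

class RuntimeType(Enum):
    LITERT = auto()
    AICORE = auto()
    NPU = auto()      # newly added accelerator

# Buggy shape: an if with an implicit else, like the Kotlin
# `if (runtimeType == RuntimeType.AICORE)` described above.
def pick_helper_buggy(rt: RuntimeType) -> str:
    if rt is RuntimeType.AICORE:
        return "AICoreModelHelper"
    return "LlmChatModelHelper"   # NPU silently falls through here

# Safer shape: explicit dispatch that fails loudly on unknown types.
# (Kotlin's exhaustive `when` catches the missing branch at compile time.)
def pick_helper_exhaustive(rt: RuntimeType) -> str:
    handlers = {
        RuntimeType.LITERT: "LlmChatModelHelper",
        RuntimeType.AICORE: "AICoreModelHelper",
    }
    try:
        return handlers[rt]
    except KeyError:
        raise ValueError(f"no handler registered for {rt}") from None
```

The buggy variant routes NPU to the wrong handler without a whisper; the explicit variant turns the regression into an immediate, visible failure.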
Types.kt defines CallJsAgentAction with secret as a
plain String. AgentTools.kt passes it through the action channel.
IntentHandler.kt logs raw parameter JSON before dispatch.
MobileActionsTools.kt logs full contact PII and email body. No single file view
reveals the complete path from secret storage to log exposure.
settings.proto deprecates access_token_data.
UserData.secrets is the supposed replacement. The five serializers
SettingsSerializer.kt, UserDataSerializer.kt,
BenchmarkResultsSerializer.kt, CutoutsSerializer.kt, and
SkillsSerializer.kt all keep the same getDefaultInstance()
pattern with no migration logic. That is not an isolated omission. It is a full-layer gap.
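What "no migration logic" costs can be sketched abstractly. The keys and shapes below are hypothetical, assuming the deprecated field still holds the token on disk when the upgraded serializer runs.

```python
# Old on-disk settings keep the token under the deprecated key.
OLD_SETTINGS = {"access_token_data": "hf_abc123"}  # hypothetical shape

def load_without_migration(stored):
    # Mirrors a bare default-instance fallback: deprecated fields ignored.
    return {"secrets": dict(stored.get("secrets", {}))}

def load_with_migration(stored):
    data = {"secrets": dict(stored.get("secrets", {}))}
    token = stored.get("access_token_data")
    if token and "hf_token" not in data["secrets"]:
        data["secrets"]["hf_token"] = token   # carry the token forward once
    return data

print(load_without_migration(OLD_SETTINGS))  # token silently gone
print(load_with_migration(OLD_SETTINGS))     # token preserved
```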
A Fiedler spectral gap of 0.0141 with 293 disconnected components revealed
a core+satellite topology: 9 major Kotlin clusters carry all the work. Everything else
(audio files, skill HTML, git metadata, binary assets) is structurally isolated. All
meaningful change latency lives inside those 9 clusters. All security-relevant paths run
through Cluster_79 and Cluster_89. This is not something you derive from browsing files. It
emerges from the full graph eigenvalue decomposition.
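The component count behind that core+satellite readout is plain connected-components analysis; the Fiedler value itself comes from the graph Laplacian's second eigenvalue, which is beyond a short sketch. A toy component count with hypothetical nodes:

```python
def components(adj):
    """Connected components of an undirected graph (adjacency lists)."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            seen.add(node)
            stack.extend(adj[node])
        comps.append(comp)
    return comps

# Two connected Kotlin clusters plus structurally isolated assets.
GRAPH = {
    "ChatView.kt": ["LlmChatViewModel.kt"],
    "LlmChatViewModel.kt": ["ChatView.kt"],
    "AgentTools.kt": ["AgentChatTaskModule.kt"],
    "AgentChatTaskModule.kt": ["AgentTools.kt"],
    "promo.mp3": [],
    "skill.html": [],
}
print(len(components(GRAPH)))   # 4: two working clusters, two satellites
```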
AgentTools.kt ANR chain requires connecting three files across three layers.

AgentChatTaskModule.kt creates AgentTools and registers it as a
ToolSet. AgentTools.kt uses runBlocking to bridge
LLM function calls to WebView JS. LlmChatModelHelper.kt and
AICoreModelHelper.kt call ToolSet handlers synchronously on the inference
thread. Connecting those three facts gives you the ANR: the inference thread blocks waiting
for a CompletableDeferred resolved by WebView JavaScript. Each fact lives in a
different file. The encoding delivered all three simultaneously.
delay() as a state machine substitute.

TextInputHistorySheet.kt: 100 ms dismiss delay, 400 ms delete delay.
PromoScreenGm4: 5-second dismiss timer.
HoldToDictateViewModel.kt: 500 ms stop delay. Seen individually, each is a
workaround. Seen across the full repository, the pattern reveals a codebase using arbitrary
delay() calls instead of state machines. The race condition count scales with
the number of timing-sensitive paths, not with how conspicuous any single one looks.
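The contrast between a timer and a real state transition can be sketched in a few lines. A hedged Python illustration of the pattern, not the Kotlin coroutine code itself; the durations are placeholders.

```python
import threading
import time

# Fragile pattern: sleep an arbitrary duration and hope the transition
# happened, mirroring the hard-coded 100-500 ms delays described above.
def dismiss_with_delay(finish_animation, guess_s=0.02):
    threading.Thread(target=finish_animation).start()
    time.sleep(guess_s)              # arbitrary guess, not a real signal
    return "dismissed"               # may fire before the animation ends

# Explicit state: block on the event that marks the actual transition.
def dismiss_with_event(finish_animation):
    done = threading.Event()
    def run():
        finish_animation()
        done.set()                   # real state change, not a timer
    threading.Thread(target=run).start()
    return "dismissed" if done.wait(timeout=2.0) else "timed out"
```

The delay version works until the animation takes longer than the guess; the event version never needed the guess in the first place.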
The Central Finding
The AI that knew everything found everything. The AI that browsed found what it happened to open.
58 bugs versus 13. 9 security vulnerabilities versus 0. Full repository coverage versus 4.8%. In fewer operations, with zero file browsing, working from compressed structural state alone.
The control was not bad. Its 13 bugs are all real. Its architecture answers are verifiable. But it was reasoning from a keyhole. 95.2% of the codebase was invisible. Every security finding, every division-by-zero, every layer violation, every proto migration gap lived in that invisible 95.2%.
This study handicapped Eidon on purpose. No file browsing. No folder exploration. Just the compressed structural graph. In normal usage, an AI has both: full-repository structural knowledge and direct file access. The results on this page are the floor, not the ceiling.
Reproducibility
The repository is public. Every finding is real.
We published every line number, every commit hash, and every file name against a live, actively maintained Google codebase. Google's own engineers can open that repository right now and verify every single finding independently. That is not a confidence claim. That is an open invitation.
Every finding in this benchmark references real files, real line numbers, and real commit
hashes present in the public repository as of April 18, 2026. The NPU dispatch regression
references commit [533855e8]. The PII logging findings reference exact line
numbers in IntentHandler.kt and MobileActionsTools.kt. The proto
migration gap references all 5 serializer files by name. The paired answer record above is
reproduced from the test transcript, and the full exact transcript is published at
/study/transcript. Both runs executed in isolated sessions with no
cross-contamination.
Your AI is working from 5% of your codebase. Fix that.
Eidon analyzes your entire repository once. Every AI session after that starts from full structural knowledge. Local-first. Your code never leaves your machine.