The Rearview Mirror Problem: Why CTI Alone Cannot Power the Next Generation of Security AI

Looking Backward While Threats Move Forward

Security leaders are under pressure to deploy AI and machine learning capabilities into their environments, and the promise is real: autonomous systems that investigate alerts, hunt threats, and respond at machine speed. But beneath the momentum lies a foundational question most organizations haven't asked: what intelligence model produces the data these systems are actually grounded on?

For the majority of security teams, the answer is Cyber Threat Intelligence — largely produced by commercial feeds, ISAC sharing, STIX/TAXII pipelines, alert data, and enriched IOC databases. It's what many of their SIEMs were built around, so it becomes the default data diet for their agentic AI systems as well. The problem is that CTI was never designed to serve as the foundational layer for datasets that support autonomous AI learning and reasoning. It was designed for human analysts as consumers, who can contextualize and act on known threat patterns. Inheriting it uncritically as the substrate for agentic AI produces systems that are fast and semi-scalable, but structurally blind in ways organizations may not understand until something goes wrong.

By contrast, Internet Intelligence offers a different foundation, built not on the record of past threats, but on continuous, comprehensive observation of internet traffic and behaviors as they exist and change in real time. The difference between these two foundations is not a matter of degree. It is a matter of kind, which we’ll break down and discuss in this blog post.

CTI Is Largely a Forensic Discipline — and That's a Problem for Agentic AI Systems

As a disclaimer: I have the utmost respect for my friends and colleagues working in Cyber Threat Intelligence (CTI). It’s an incredible discipline, and the live knowledge base its analysts maintain mentally has proven itself the cornerstone of many of the structured and unstructured threat hunts, purple team, and security engineering engagements I have been a part of over the years. There is no replacing these professionals, and this article does not seek to downplay their importance to the cybersecurity ecosystem, but rather to highlight some issues with relying on their output to create agentic AI datasets.

At a high level, CTI is generated either from confirmed threat activity or by maintaining some presence within areas of known activity. Malware is captured and reverse-engineered. A phishing campaign is detected and its infrastructure identified and documented. A breach concludes and its findings are codified into indicators and TTPs, ultimately resulting in attack attribution. Few can argue that this is not valuable analytical work — but all of it begins with a confirmed event, and by the time it is analyzed, it is already recent history.

This forensic character produces structural limitations that compound significantly when CTI becomes AI training data.

The confirmed-event ceiling. CTI can only contain what has been observed and eventually attributed. Every threat actor without a detected campaign, every piece of infrastructure not yet used maliciously, every novel technique not yet documented — none of these exist in CTI knowledge bases, because the target volume is simply too high. A human analyst knows this and compensates — hence their job security in a “put AI in everything” job economy. An AI system trained on CTI, however, doesn't know what it doesn't know, and has no mechanism to reason about it.

Indicator decay. We’ve observed malicious IPs rotate within hours, phishing domains go dark in under 24 hours, and C2 infrastructure migrate constantly. Even well-maintained CTI pipelines race against the decay of their own data, and AI and ML models trained on it inherit that temporal fragility — developing strong pattern recognition for infrastructure configurations adversaries have already abandoned.
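To make indicator decay concrete, here is a minimal sketch of a staleness filter. The decay windows are entirely hypothetical and for illustration only — real rotation rates vary widely by campaign and indicator type:

```python
from datetime import datetime, timedelta

# Hypothetical decay windows (illustrative only): roughly how long
# each indicator type stays actionable before infrastructure rotates.
DECAY_WINDOWS = {
    "ip": timedelta(hours=48),
    "domain": timedelta(days=7),
    "file_hash": timedelta(days=365),  # hashes decay far more slowly
}

def is_stale(indicator_type: str, first_seen: datetime, now: datetime) -> bool:
    """Return True if the indicator has likely outlived its useful window."""
    window = DECAY_WINDOWS.get(indicator_type, timedelta(days=30))
    return now - first_seen > window

now = datetime(2025, 6, 1)
print(is_stale("ip", datetime(2025, 5, 25), now))         # True — IP seen a week ago
print(is_stale("file_hash", datetime(2025, 5, 25), now))  # False — hash still useful
```

A model trained without this kind of temporal weighting treats a week-old IP and a week-old file hash as equally reliable signals, which they are not.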

Vendor sampling bias. Every CTI and alert dataset reflects the sensor coverage and analytical bias of the organizations that produced it. The result is systematic underrepresentation of geographies, hosting environments, and attack categories that fall outside the originating vendors' visibility. When these datasets train ML models, coverage gaps become gaps in the model or agent's understanding.

Flat indicators in a relational world. Internet infrastructure is inherently relational — domains share IPs, certificates bind assets across providers, registration patterns repeat across campaigns. CTI captures point-in-time indicators but rarely preserves the relational graph those indicators exist within. An ML model trained on flat indicator lists develops pattern-matching capability, not infrastructure reasoning capability. The difference is decisive when threats are novel.
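The limitation can be illustrated with a toy comparison — the IOC list, observables, and edges below are all hypothetical:

```python
# A flat IOC list can only answer exact-match questions.
iocs = {"203.0.113.7", "evil-login.example"}

def flat_check(observable: str) -> bool:
    return observable in iocs

# A relational view (hypothetical toy data, no cycles) lets us ask what
# an unknown observable is *connected to*, even with no IOC match.
edges = {
    "198.51.100.9": ["evil-login.example"],  # unknown IP hosts a known-bad domain
    "evil-login.example": ["cert:abc123"],
}

def related_to_known_bad(observable: str) -> bool:
    return any(n in iocs or related_to_known_bad(n)
               for n in edges.get(observable, []))

print(flat_check("198.51.100.9"))            # False — no IOC match
print(related_to_known_bad("198.51.100.9"))  # True — one hop from a known-bad domain
```

The flat lookup misses the unknown IP entirely; the relational view surfaces it through a single shared-hosting edge. That one-hop difference is the gap between pattern matching and infrastructure reasoning.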

The cumulative effect is that CTI-trained AI and ML systems are retrospective by design: well calibrated for the threat landscape that existed when their training data was collected, poorly calibrated for the threats we face today, and nearly blind to what's taking shape for tomorrow.

Internet Intelligence: Proactive Visibility by Design

Internet Intelligence (II) begins from an entirely different premise than CTI and alert data. Rather than starting from confirmed threat activity and working backward, II starts from internet traffic and connected infrastructure itself — observing it continuously, comprehensively, and without waiting for a threat to announce itself.

The raw material includes passive DNS data, BGP routing tables, TLS certificate issuance and history, WHOIS and registration records, service fingerprinting across the accessible address space, and the relational graphs connecting all of these and many other sources together. The result is something CTI cannot provide: a continuous, high-fidelity map of what the internet actually is — its infrastructure, topology, users and their actions — and how each is changing in near real time.

Pre-attack infrastructure visibility. Threat actors stage and mobilize their infrastructure much as a modern military or task force might. They register domains weeks or months before use, provision servers, and configure services long before a campaign begins. Internet Intelligence is far more likely to observe this staging behavior because it scans (or sweeps) the entirety of the internet continuously — not just what has already been flagged as malicious. This is II's most significant operational advantage: it lets organizations witness the mobilization of malicious infrastructure, not just its execution.

Baseline-grounded anomaly detection. Because II maps the entire observable internet, it is possible to establish what ‘normal’ looks like — hosting patterns, certificate behaviors, registration timing, infrastructure clustering. Anomaly detection grounded in this baseline is fundamentally more robust than detections grounded in a blocklist. It catches what hasn't been witnessed previously, before it has become active. 
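A minimal sketch of what baseline-grounded detection looks like, using an invented series of daily domain-registration counts and a simple z-score threshold. Real II baselines are multidimensional and far richer than a single statistic — this only shows the shape of the idea:

```python
import statistics

# Hypothetical daily new-domain registration counts sharing one
# registrant pattern (illustrative baseline data).
baseline = [2, 3, 1, 2, 4, 3, 2, 3, 2, 1, 3, 2, 4, 2]
today = 18  # today's observed count

mean = statistics.mean(baseline)
stdev = statistics.stdev(baseline)
z = (today - mean) / stdev

# Flag observations far outside the learned baseline, even though
# nothing here appears on any blocklist.
print(f"z-score: {z:.1f}", "ANOMALOUS" if z > 3 else "normal")
```

A blocklist has nothing to say about these 18 registrations; a baseline flags them immediately because the behavior, not the identity, is out of profile.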

Relational graph depth. II data is natively graph-structured. A single IP exists in relation to its domains, certificates, ASN, hosting neighbors, and the full historical record of those relationships. This relational depth is not an enrichment layer added on top of II — it is the fundamental structure of II data, and it is what enables infrastructure reasoning rather than simple indicator matching.

Historical provenance at internet scale. A domain appearing unremarkable six months ago may resolve to malicious infrastructure today. A certificate pattern appearing new may be structurally identical to patterns used in prior campaigns which targeted your organization. II's historical dimension allows AI systems to reason about infrastructure provenance — not just current state — a detection capability that point-in-time CTI indicators cannot replicate.
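The provenance idea can be sketched with a toy passive-DNS history — the domain, IPs, and dates below are invented, and real histories carry far more dimensions (certificates, ASNs, WHOIS changes):

```python
# Toy passive-DNS history (hypothetical): resolutions stored as
# (domain, ip, first_seen, last_seen) records with ISO dates.
history = [
    ("shop-portal.example", "192.0.2.10", "2024-11-01", "2025-01-15"),
    ("shop-portal.example", "203.0.113.7", "2025-05-20", "2025-06-01"),
]

def resolutions(domain: str, as_of: str) -> list:
    """All IPs the domain resolved to on a given date.
    ISO-format dates compare correctly as strings."""
    return [ip for d, ip, start, end in history
            if d == domain and start <= as_of <= end]

# Six months ago the domain looked unremarkable; today it resolves
# into different infrastructure worth a closer look.
print(resolutions("shop-portal.example", "2024-12-01"))  # ['192.0.2.10']
print(resolutions("shop-portal.example", "2025-05-25"))  # ['203.0.113.7']
```

A point-in-time indicator only sees the second answer; the historical view sees the transition itself, which is often the more interesting signal.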

Filling the Gaps CTI Leaves Behind

The argument here is not that CTI should be abandoned, but rather that CTI alone leaves systematic observability gaps in training and reference datasets that Internet Intelligence is uniquely positioned to fill.

For example: the gap between attack and detection remains wide, with average dwell times for sophisticated intrusions often still measured in weeks or months. By its nature, II fills this gap by surfacing pre-attack and active-attack signals. Because the actor's infrastructure still exists on the internet and still exhibits patterns, it can be correlated to their broader topology. II surfaces these signals for analysis and for building training datasets that enable security-focused AI agents and ML models to get ahead of the attack.

In another category, the supply chain observation gap is equally significant. CTI is often our first line of defense against known threats facing our own organization, but says little about the infrastructure posture of our vendors and partners. Unfortunately, it is this extended attack surface that has proven repeatedly to be the entry point for some of the most consequential breaches of the past decade. Here, II provides continuous, near-effortless visibility into third-party infrastructure exposure, change patterns, and relational proximity to known malicious infrastructure.

There are many other use cases, but for security teams already stretched by alert volume and analyst shortages, II delivers not merely marginal improvements but structural additions to what the team can actually observe and defend against.

Internet Intelligence as the Foundation for Agentic AI

The case for II as a complement to CTI for human analysts is strong, but its case as a foundational requirement for agentic AI is stronger still. The limitations of forensic intelligence are not merely inconvenient for autonomous systems — they are architecturally limiting.

Agentic AI in security is expected to reason: to investigate unknown signals, form hypotheses, pivot across data sources, and reach conclusions that drive autonomous action — or distill a sea of signals into something actionable for human-in-the-loop workflows. This reasoning is only as good as the foundational world model the agent carries. A CTI-grounded agent carries a world model that is, at its core, limited to a historical record of known threats. Though sophisticated within that record, these agents are largely blind to threats outside it.

II-grounded agents reason about the unknown. Because II provides a comprehensive model of internet infrastructure within the organization's chosen context, agents trained on II data can reason about novel infrastructure, users, and behavior by reference to their properties and relationships, rather than by lookup against a known-bad list. The difference between an agent that terminates its investigation when it finds no IOC match and one that pivots to examine hosting patterns, certificate relationships, time-series data, TTPs, and historical neighbors is the difference between CTI and what II can produce as a foundation for training data.

II enables autonomous investigation chains. The core operational motion of an agentic AI security system — the one most often pitched to me — is pivoting at machine speed in a manner that mimics human curiosity, from one signal to the next, building a complete and well-connected picture. It sounds great on paper, but it requires exceptionally large volumes of relational data: IP to domain, domain to certificate, certificate to co-hosted assets, assets to hosting neighbors, telemetry, time-series data on active assets, and more. By its nature and approach, II is the native source of relational graph data structures. Without it, an agent cannot pivot meaningfully or efficiently.
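A pivot chain of this kind reduces, at its simplest, to a breadth-first traversal over a relational graph. The graph below is a hypothetical toy with four edges; production graphs hold billions, but the motion is the same:

```python
from collections import deque

# Hypothetical relational graph: each node maps to its neighbors
# (IP -> domain, domain -> certificate, certificate -> co-hosted asset, ...).
graph = {
    "ip:198.51.100.9": ["domain:portal-check.example"],
    "domain:portal-check.example": ["cert:sha1:ab12"],
    "cert:sha1:ab12": ["domain:login-verify.example"],  # cert reused elsewhere
    "domain:login-verify.example": ["ip:198.51.100.44"],
}

def pivot(start: str, max_hops: int = 4) -> list:
    """Breadth-first pivot from one observable across the graph,
    returning every node reached within max_hops."""
    seen, queue, found = {start}, deque([(start, 0)]), []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append((neighbor, depth + 1))
    return found

# One suspicious IP expands into a certificate and a second IP
# that a flat-indicator lookup would never connect to it.
print(pivot("ip:198.51.100.9"))
```

Each hop here is one "pivot" an analyst would make by hand; the agent's advantage is doing thousands of them per second — but only if the edges exist in its data.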

II reduces hallucination risk in security contexts. As we’ve seen many times over the past six to eight months, CTI-trained agents are structurally prone to confident misclassification — assessing novel infrastructure as benign not from affirmative evidence of safety, but from absence of a known-bad signal. In contrast, II-grounded agents carrying a baseline model of normal internet infrastructure can distinguish between "unknown and consistent with normal" and "unknown and anomalous." This distinction is operationally critical, and it cannot be supported by CTI workflows.

II enables proactive agentic action. An agent grounded in II can act on pre-attack signals — identifying staging infrastructure before weaponization, flagging anomalous certificate issuance before it's used in a phishing campaign, surfacing relational proximity to known threat actor infrastructure before a single malicious packet is sent. CTI-grounded agents struggle to accomplish this with acceptable accuracy — if it is possible at all when pre-attack signal reasoning isn't in their world model.

Three Scenarios Where the Foundation Determines the Outcome

Scenario 1: The Unnamed Threat Actor

A sophisticated group with no prior attributed campaigns stages infrastructure over six weeks — 80 domains, servers across four hosting providers, certificates from two CAs. No IOCs exist anywhere in your organization’s CTI ecosystem, nor in the forums and subscription platforms to which you belong.

A CTI-grounded agent finds nothing. The infrastructure is invisible to it — not because it isn't there, but because it hasn't appeared in a confirmed-event record.

In contrast, an II-grounded agent observes it directly. It flags unusual domain registration volumes with shared registrant patterns, notes that several servers fall within hosting clusters historically associated with threat actor infrastructure, and identifies certificate subject patterns deviating from legitimate baseline behavior. A high-confidence pre-attack warning — six weeks before the campaign launches — derived entirely from what the internet shows, not what the threat record contains.

Scenario 2: Supply Chain Compromise via Infrastructure Overlap

A software vendor in an enterprise's supply chain is silently compromised. The attacker's C2 infrastructure shares a certificate pattern and hosting subnet with infrastructure from a prior espionage campaign — one disrupted before any IOCs were published. CTI has nothing.

An II-grounded agent, conducting routine third-party infrastructure monitoring, identifies the certificate pattern overlap, traces the historical hosting relationship, and escalates the vendor for review by your organization’s team — before any malicious traffic reaches your enterprise environment.

Scenario 3: Autonomous Threat Hunting at Depth

A single internal host is making intermittent connections to an external IP with no CTI record. A CTI-grounded agent finds no match, checks for known malware signatures, finds none, and closes the alert as low-risk.

In this scenario, an II-grounded agent pivots intuitively. It traces the IP's hosting history, identifies domains that previously resolved to it, follows one to a certificate shared with active credential-harvesting infrastructure, maps the broader hosting cluster, and surfaces two additional IPs in the same subnet making similar low-volume connections to other internal hosts. It doesn't classify and close — it builds a graph, and the graph reveals a slow-burn intrusion that no CTI feed could have surfaced.

The Architectural Imperative

The intelligence foundation you build today becomes the cognitive architecture of the security AI you deploy tomorrow. A program built on CTI will improve at CTI-shaped problems such as known-threat recognition, indicator matching, and documented TTP identification. It will remain structurally limited at everything outside that record.

On the other hand, Internet Intelligence reframes what can be accomplished by security focused AI agents. It replaces a forensic world model with a live one which is far more proactive and curious by nature. It replaces a blocklist-shaped cognitive architecture with a baseline-and-deviation model and gives autonomous agents the relational context to investigate beyond the edge of known threat data.

CTI remains essential. It carries attribution, context, and analytical depth that II alone cannot replicate. The argument is not substitution — it is foundation. II is the layer that makes the rest of the intelligence architecture, and the AI built on top of it, capable of operating in a threat environment that does not wait for the forensic record to catch up.
