How Does the Public Sector Gather Open Source Intelligence (OSINT)?
Public sector intelligence teams help inform decisions about a range of national security threats, from natural disasters, terrorism, public health crises, and war to disinformation campaigns and cyber threats. Open-source data from the web is increasingly relevant to investigations and response efforts for these security risks.
But where does this public data come from, and how does it go from a website to actionable intelligence in the form of a report? In this article, we’re diving into online data discovery and intelligence processes used for national security.
The Intelligence Cycle
The foundation of any intelligence-driven mission is built on the intelligence cycle—a six-step process used by analysts to turn raw data into actionable intelligence. This includes:
- Identifying information requirements, mission objectives, and other factors to inform the cycle’s planning and direction.
- Collecting raw intelligence from a variety of relevant sources.
- Processing that data to return only what’s useful, and transforming it into appropriate formats.
- Analyzing and contextualizing data into finished intelligence.
- Disseminating and communicating that intelligence effectively.
- Incorporating feedback into the initial planning and direction phase.
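The cyclical nature of these six steps can be sketched in a few lines of Python. This is only an illustration: the `Stage` names are shorthand for the steps listed above, and `next_stage` is a hypothetical helper, not part of any particular intelligence platform.

```python
from enum import Enum


class Stage(Enum):
    """The six stages of the intelligence cycle, in order."""
    PLANNING_AND_DIRECTION = 1
    COLLECTION = 2
    PROCESSING = 3
    ANALYSIS = 4
    DISSEMINATION = 5
    FEEDBACK = 6


def next_stage(stage: Stage) -> Stage:
    """Return the stage after `stage`; feedback wraps back to planning."""
    stages = list(Stage)  # Enum members iterate in definition order
    return stages[(stages.index(stage) + 1) % len(stages)]


if __name__ == "__main__":
    # Walk one full loop and show that the cycle restarts at planning.
    stage = Stage.PLANNING_AND_DIRECTION
    for _ in range(len(Stage) + 1):
        print(stage.name)
        stage = next_stage(stage)
```

The wrap-around in `next_stage` is the point of the sketch: feedback is not an endpoint but the input to the next round of planning and direction.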
Government intelligence teams require a variety of tools and sources to meet requirements at each stage of the cycle. These typically involve a combination of third-party OSINT solutions (e.g. Maltego, Palantir, Shodan, the Echosec Systems Platform) and proprietary tooling that requires third-party data inputs.
Intelligence also varies in its value to different operational goals and strategies. Intelligence can be broken down into three main levels, which often overlap and inform each other from the tactical level up:
- Tactical: Provides on-the-ground context to inform specific tactics in local Areas of Operation (AOs).
- Operational: Informs movements and operations in a defined area, usually at a state or regional level.
- Strategic: Informs broad planning and objectives on a national or international security level.
Open Source Intelligence Sources
The public sector gathers a variety of intelligence types depending on the specific mission. For example, this could include cyber intelligence (CYBINT), geospatial intelligence (GEOINT), human intelligence (HUMINT), and imagery intelligence (IMINT), just to name a few. To generate intelligence, data can be gathered through a multitude of sources: online data feeds, photography, maps, human communications, field sensors, and more.
Open-source intelligence, which includes any publicly available information, often overlaps with these other intelligence types. Being unclassified does not make a data source less valuable. In fact, open-source data is often crucial for informing planning and direction for classified investigations.
OSINT can include offline data inputs like public records, but the web is the primary source of open-source data for the intelligence community. The indexed web contains an estimated 5.5 billion pages, but this is only a fraction of the information available to analysts. So what’s out there?
Open-source social media data now comes from a wide range of sites. Beyond “mainstream” platforms like YouTube or Reddit, there has been a rise in alt-tech platforms in recent years. These tend to be less regulated or decentralized, and are designed to circumvent the content policies of major networks like Facebook. This means they can be breeding grounds for extremist movements, disinformation, and even violent planning.
The surface web includes any content that is indexed and searchable using standard search engines like Google and Bing. Surface web content useful for intelligence efforts includes indexed social media posts, maps, and public records.
The deep web makes up the majority (an estimated 90%) of internet content and includes any unindexed web pages, meaning pages that cannot be discovered through a standard search engine. Some of the more fringe social media sites, forums and imageboards, and messaging platforms like Internet Relay Chat (IRC) are all valuable deep web intelligence sources.
The deep web also hosts paste sites, which are used to publicly and anonymously share blocks of plain text. While most pastes are innocuous, they are sometimes used to leak data. Tracking this activity is useful for developing intelligence for cybersecurity use cases and supporting VIP physical security, for example in the event of a targeted dox.
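As a rough illustration of what tracking paste activity can look like in practice, the sketch below scans the text of a paste for email addresses belonging to a watched domain. It makes several assumptions: the paste text has already been collected, the email pattern is deliberately simplified, and `find_watched_emails` and the `agency.example` domain are hypothetical names invented for this example.

```python
import re

# Simplified email pattern; adequate for a sketch, not RFC-complete.
# Group 1 captures the domain part after the "@".
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")


def find_watched_emails(paste_text: str, watched_domains: set[str]) -> list[str]:
    """Return email addresses in `paste_text` whose domain is on the watchlist.

    `watched_domains` is assumed to contain lowercase domain names.
    """
    hits = []
    for match in EMAIL_RE.finditer(paste_text):
        domain = match.group(1).lower()
        if domain in watched_domains:
            hits.append(match.group(0))
    return hits


if __name__ == "__main__":
    # Hypothetical paste contents in a common "email:password" dump layout.
    sample = "dump: alice@agency.example:hunter2\nbob@other.example:pass123"
    print(find_watched_emails(sample, {"agency.example"}))
```

A real monitoring workflow would likely watch for more than email addresses (names, phone numbers, internal hostnames), but the same match-against-a-watchlist structure applies.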
The dark web is often associated with the most nefarious online activities since its user-anonymization enables illegal activities. And while the dark web gives intelligence professionals a window into trafficking, fraud, and illegal hacking, it also hosts similar content types to those on the surface and deep web: social media networks, imageboards, paste sites, and news aggregation sites.
For national security, these web spaces are valuable for:
- Gathering ground-truth and geographic information from physical security crises, such as natural disasters or conflict zones
- Identifying mis- and disinformation networks and bot accounts
- Monitoring extremist movements, propaganda, and planning
- Understanding political and social climates anywhere in the world
Intelligence Requirements and OSINT Solutions
Any analyst knows that finding relevant open-source data across the web is not viable using standard search engines. Intelligence professionals require specialized software to address requirements at each stage of the intelligence cycle.
Commercial OSINT tools help intelligence teams gather open-source data more efficiently and in line with mission requirements. But because intelligence teams often work with their own interfaces and tooling, they also require direct, raw data access and integrations that can be plugged into their existing systems.
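One common shape for this kind of integration is a small normalization layer that maps raw items from different feeds onto a single internal record type. The sketch below assumes hypothetical raw field names (`body` or `content` for text, `ts` for a Unix timestamp in seconds); a real feed would define its own schema, and `Record` and `normalize` are illustrative names only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Record:
    """Common shape that downstream tooling is assumed to consume."""
    source: str
    text: str
    collected_at: datetime


def normalize(source: str, raw: dict) -> Record:
    """Map one raw feed item onto the common Record shape.

    The raw field names ("body"/"content", "ts") are hypothetical.
    """
    # Different feeds may name the text field differently.
    text = raw.get("body") or raw.get("content") or ""
    # Assume "ts" is a Unix timestamp in seconds; store it as timezone-aware UTC.
    collected_at = datetime.fromtimestamp(raw["ts"], tz=timezone.utc)
    return Record(source=source, text=text.strip(), collected_at=collected_at)


if __name__ == "__main__":
    item = {"content": "  example post text  ", "ts": 1_600_000_000}
    print(normalize("imageboard_feed", item))
```

Keeping the source name and a UTC timestamp on every record makes it easier to merge feeds and trace a finding back to where it was collected.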
Growing Data Volumes
According to the 2019 National Intelligence Strategy of the United States, the intelligence community is increasingly challenged by growing volumes of online data available for collection, processing, analysis, and triage. The Western world is also facing a data analyst shortage coupled with a growing demand for AI. As a result, data scientists tend to handle more complex tasks, developing tooling and data sets to support lower-level analysts on intuitive platforms.
Lack Of Access To The Right Sources
Intelligence teams are also challenged by a lack of access to some emerging online sources. For example, some niche networks (like alt-tech platforms and deep and dark web imageboards and paste sites) do not offer their own API or are unavailable through commercial API providers. To gather data from these sources, analysts are often required to create dummy accounts, request access to closed groups, and navigate networks manually. This consumes a significant amount of analyst time that could be allocated to other areas of the intelligence cycle.
To address these challenges and satisfy intelligence requirements, OSINT solutions must:
- Improve data coverage by providing access to relevant sources, including fringe web spaces, that are not commonly available through commercial, off-the-shelf vendors.
- Leverage machine learning capabilities. AI is a major priority for the public sector, helping analysts process and contextualize intelligence more efficiently.
- Be intuitive and user-friendly for lower-level intelligence analysts, providing more efficient workflows and better speed-to-information.
Key Factors For Effective Data Gathering
There’s no doubt that publicly available online data is valuable for public sector intelligence teams. The question is which sources are becoming more relevant, and how this raw data can be integrated into government tooling to generate actionable intelligence.
The internet is no longer just a valuable source of open-source information like public records and illicit dark web activities. More covert sites across the surface, deep, and dark web, like alt-tech platforms, imageboards, and paste sites, host a plethora of risk indicators. This data can provide critical context for emerging national security concerns like extremism, cyber defense, and disinformation.
To gather, process, and contextualize this information into actionable intelligence, intelligence teams must use solutions that improve data coverage, utilize machine learning capabilities, and—where third-party tools are required—prioritize usability. These features will enable a more efficient and comprehensive intelligence cycle, ultimately driving more informed, timely decisions to protect citizens, assets, and other national security interests.
Connect with us to explore some critical data feeds your intelligence toolkit might be missing.