Defense initiatives address a wide range of national security threats, from natural disasters, terrorism, public health crises, and foreign military activity to disinformation campaigns, cyber threats, and organized blue- and white-collar crime. Mainstream and fringe social media platforms, as well as deep and dark web networks, are increasingly relevant to investigating and responding to these security risks.
Defense agencies can glean more powerful insights from other intelligence inputs by integrating relevant online data from these open sources. However, consolidating this data while meeting intelligence cycle requirements poses a number of challenges.
For instance, some social, deep, and dark web data feeds aren’t easily accessible through a data lake, API, or intuitive data aggregator. Agencies need that kind of access to integrate new data sources into their other feeds and tooling, to develop advanced data science applications like machine learning, and to conserve HUMINT resources amid an analyst shortage.
In this article, we’re diving into online data discovery and intelligence processes in defense. How and where are defense agencies gathering threat intelligence? How can relevant social, deep, and dark web data be accessed in line with defense requirements? And how can this data support investigations into emerging use cases like countering domestic terrorism, identifying disinformation, and reinforcing national cybersecurity?
The foundation of any intelligence-driven mission is built on the intelligence cycle—a six-step process used by government and defense entities to turn raw information into actionable intelligence.
Whether the goal is to address cybersecurity risks, combat terrorism, support law enforcement, or inform other national security initiatives, the intelligence gathering process involves:
Military intelligence requires a variety of tools and data sources to meet requirements at each stage of the cycle. These typically involve a combination of third-party OSINT tools, like Maltego, Palantir, Shodan, or the Echosec Systems Platform, with proprietary interfaces, data feeds, and tooling built by data scientists within an intelligence team.
The intelligence cycle describes the process of gathering raw data, contextualizing it into actionable intelligence, and iterating upon this process to support mission-driven environments—but what types of intelligence do government and defense actually work with?
Cyber intelligence is gathered to address cyberwarfare and any state-targeted cyber attacks. Cyber intelligence could include, for example, network traffic data, dark web communications, social media data, or breached data.
Financial intelligence (FININT) makes sense of financial data and activity that suggests money laundering, tax evasion, and other financial crimes. FININT could include transaction logs, Suspicious Activity Reports (SARs), or online communications indicating financial fraud (on social media or the dark web, for example).
Geospatial intelligence (GEOINT) describes geographically attributed human activity and physical features on Earth. It overlaps with IMINT (imagery intelligence), drawing on satellite and aerial photography, mapping data, and other geographic data points. Echosec Systems combines GEOINT with OSINT to map public social media data in a defined area.
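Mapping public social media posts to a defined area usually comes down to a geofence test on each post's coordinates. Below is a minimal sketch, assuming each post already carries a latitude and longitude; the `Post` fields are hypothetical and not Echosec's schema. It uses the haversine great-circle distance to keep only posts inside a circular area.

```python
import math
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical fields for illustration; real platform payloads differ.
    post_id: str
    lat: float
    lon: float
    text: str


EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_area(posts, center_lat, center_lon, radius_km):
    """Keep only posts whose coordinates fall inside a circular area."""
    return [
        p for p in posts
        if haversine_km(center_lat, center_lon, p.lat, p.lon) <= radius_km
    ]


if __name__ == "__main__":
    posts = [
        Post("a", 48.8566, 2.3522, "near the centre"),  # Paris
        Post("b", 51.5074, -0.1278, "far away"),        # London
    ]
    nearby = within_area(posts, 48.86, 2.35, radius_km=5)
    print([p.post_id for p in nearby])  # ['a']
```

Real deployments typically use polygons rather than circles and must handle posts that only carry a place name, but the filtering principle is the same.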
Human intelligence (HUMINT) is gathered from people relevant to the operation. Sources could include NGO personnel, military personnel (e.g., attachés, special reconnaissance), and non-military individuals (e.g., detainees, refugees). HUMINT is often required throughout the intelligence cycle to contextualize raw data and separate false positives from relevant results.
Imagery intelligence is gathered from satellite and aerial photography. More recently, drones have enabled covert IMINT gathering with less risk to on-board personnel.
Measurement and signature intelligence (MASINT) is obtained by gathering qualitative and quantitative scientific data from technical sensors. This could include acoustic or seismic measurements, chemical or biological data from a physical sample, nuclear data, and more.
Open-source intelligence (OSINT) is gathered from publicly available data sources and often overlaps with the other types of intelligence in this list. While OSINT sources can include information from any public domain, including HUMINT or physical resources, the bulk of OSINT data and tools focus on data gathered from online sources.
Signals intelligence (SIGINT) is gathered by intercepting communications between people (COMINT) or electronic signals not used for communication, such as radar emissions (ELINT).
Technical intelligence (TECHINT) covers the weapon systems and tactical equipment used by foreign militaries. The goal is to stay informed, evaluate foreign capabilities, and address a threat actor’s technological advantages.
Regardless of the data type or source, intelligence varies in its value to different operational goals and strategies. Intelligence can be broken down into three main levels, which often overlap and inform each other from the tactical level up:
The indexed web contains an estimated 5.5 billion pages, only a fraction of the information available to intelligence professionals once the deep web is also considered.
It’s no surprise, then, that much of the open-source data valuable to defense intelligence cycles is gathered from both indexed and unindexed online sources. These public sources often yield intelligence types beyond OSINT, including IMINT, GEOINT, and TECHINT.
Social media platforms are valuable sources of public, ground-truth data relevant to local, regional, and national security initiatives. Mainstream platforms like Twitter and YouTube are often the earliest sources of text-based, audio, and visual data related to a crisis or security compromise.
However, more explicit chatter is surfacing on fringe networks like Gab, Mastodon, and 4chan as mainstream networks crack down on disinformation, hate speech, and other potentially dangerous content. These niche sources enable government and defense to access less-moderated interactions related to national security concerns like extremism and cyberwarfare.
The deep and dark web host a number of sites that also support defense objectives, from detecting public conversations on anonymized networks like 8kun, to uncovering illicit activities and public safety risks on unindexed forums and dark web marketplaces.
What do these online data sources look like, and why are they critical inputs for intelligence cycles in national security?
“Mainstream” social media platforms include sites such as Facebook, Instagram, Twitter, YouTube, Flickr, Reddit, Vimeo, OK.ru, Tumblr, and Vkontakte. Some mainstream providers (e.g., Facebook, Instagram, and occasionally Twitter) prohibit or limit broad monitoring by the public sector. However, much of the public information available on other mainstream platforms is relevant to defense initiatives. For example, this data can be used to identify:
As mentioned, fringe social networks are also of increased concern to intelligence agencies, as they tend to be loosely moderated or unmoderated. Some sites, like Gab and Mastodon, are built using a decentralized model, which makes them virtually impossible to remove or moderate regardless of the content they host.
As a result, many communities engaged in extremist movements and illicit activity migrate to these sites to avoid censorship. Their destinations also include chan sites and messaging apps like Telegram and Discord. These platforms are more user-friendly than the dark web, even though they don’t offer the same level of anonymity as the Tor browser. For counter-terrorism, this matters because fringe platforms let extremists reach wider audiences, including young adults who are vulnerable to radicalization.
For government and defense entities, niche social media networks are useful for locating tactical planning, extremist manifestos, and other indicators of an imminent digital or physical national security threat. They are also likely to contain more explicit propaganda and extremist chatter that might not appear on mainstream social networks, which is useful for understanding extremist populations and trends within those groups.
The deep web, which contains an estimated 90% of internet content, is also a valuable source of unindexed, open-source data for defense initiatives. For example, it hosts unindexed forums and imageboards as well as Internet Relay Chat (IRC) channels. These sources can be reached without anonymizing software like Tor, but they are not indexed by mainstream search engines like Google. They have similar value to fringe social networks, hosting chatter about more explicit topics like extremism, trafficking, and drug use.
Paste sites, which are used to share blocks of plain text publicly and anonymously, are also useful for data breach and cybersecurity use cases in defense. While most paste site activity is innocuous, these sites are also used nefariously to dump classified data and to target government personnel, law enforcement, and other high-profile individuals in doxxing attacks.
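Screening paste sites for dumps often starts with simple pattern matching before an analyst reviews anything. A minimal sketch, assuming the paste text has already been fetched and that `.gov` and `.mil` addresses are the ones of interest; both the regular expression and the watch list are illustrative assumptions, not a production detector.

```python
import re

# Matches "email:password" or "email;password" lines, a common layout in credential dumps.
CRED_LINE = re.compile(
    r"^\s*([A-Za-z0-9._%+-]+@([A-Za-z0-9-]+\.)+[A-Za-z]{2,})\s*[:;]\s*(\S+)\s*$"
)
WATCHED_SUFFIXES = (".gov", ".mil")  # hypothetical watch list


def flag_credential_lines(paste_text: str) -> list[str]:
    """Return watched-domain email addresses that appear paired with a password."""
    hits = []
    for line in paste_text.splitlines():
        m = CRED_LINE.match(line)
        if m and m.group(1).lower().endswith(WATCHED_SUFFIXES):
            hits.append(m.group(1).lower())
    return hits
```

A hit is only a lead: dumps are frequently recycled from older breaches, so matches still need triage against known incidents.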
The dark web has long been used as a valuable intelligence source for governments investigating a number of national and global threats. Dark web forums function as an anonymized haven for individuals discussing illegal activity—from child exploitation, to drug and human trafficking, cyberattacks, and financial fraud. Dark web marketplaces also offer a window into current fraud tools and services, illegal substances, and leaked data offered by vendors.
For decades, right-wing extremism has been the most prevalent terrorist threat in the United States, accounting for the majority of plots and incidents since 1994—and this trend is only escalating. According to CSIS, “right-wing extremists perpetrated nearly two-thirds of the terrorist attacks and plots in the United States [in 2019], and they committed over 90 percent of attacks and plots between January 1 and May 8, 2020.”
Right-wing extremism became particularly active in 2020 as groups co-opted civil unrest related to COVID-19, Black Lives Matter protests, and the presidential election. Extremists use social media platforms to communicate, recruit, and spread propaganda in support of their agendas, which tend to revolve around white supremacy and anti-government sentiment. While this activity has been prevalent on mainstream networks like Facebook, extremists are also active on fringe platforms where censorship is relaxed or nonexistent.
Far-right movements are also seeing increased participation by active-duty military. This group is targeted by far-right recruiters for its tactical knowledge and capabilities. These factors all point to the value in monitoring social media—particularly less-regulated networks—to understand how emerging extremist groups operate and inform de-escalation strategies.
Intelligence about foreign military activity is often gathered through a combination of IMINT, SIGINT, TECHINT, MASINT, and other intelligence formats. Online open-source data complements these sources by providing on-the-ground input, which can help predict and respond to military activities anywhere in the world.
For example, IMINT combined with imagery publicly posted to social media helped investigators retroactively trace the movements of a Russian Buk missile launcher before it shot down flight MH17 in 2014. More covert online sources like the dark web can also provide anonymized chatter related to foreign military activities and uncover cyberespionage campaigns.
Beyond supporting physical security investigations, governments can also aggregate data across social platforms and media sources to assess public sentiment in any location on Earth. This is useful for informing decisions about new policies, embassies, foreign aid, military action, and other initiatives affecting or affected by foreign populations.
Disinformation is prevalent on social media, blogs, news, and other online media platforms. Monitoring these sources is crucial for locating disinformation and engineered campaigns related to a public crisis (such as false COVID-19 health information) or leading up to an election. For example, tracking Russian media and social interactions on networks like VK and OK.ru can help Baltic governments and other impacted nations track and counter foreign-influence disinformation campaigns.
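One common signal of an engineered campaign is many distinct accounts posting near-identical text. A minimal sketch, assuming posts arrive as `(account, text)` pairs and using exact matching after light normalization; the threshold of three accounts is an arbitrary illustrative choice, and real coordination detection also weighs timing, account age, and fuzzy similarity.

```python
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, drop URLs and punctuation, and collapse whitespace."""
    text = re.sub(r"https?://\S+", "", text.lower())
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def coordinated_messages(posts, min_accounts=3):
    """Group posts by normalized text; return messages shared by many distinct accounts."""
    accounts_by_text = defaultdict(set)
    for account, text in posts:
        accounts_by_text[normalize(text)].add(account)
    return {
        msg: sorted(accounts)
        for msg, accounts in accounts_by_text.items()
        if len(accounts) >= min_accounts
    }
```

Exact matching after normalization is cheap enough to run over a full feed; near-duplicate methods such as shingling can be layered on once this baseline is in place.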
Beyond swaying public opinion, disinformation campaigns can also enable cyberattacks (e.g., through phishing and impersonation) or reinforce extremist narratives (e.g., right-wing extremists often circulate anti-government disinformation).
Disinformation can take the form of:
Breaching government data is both financially and politically lucrative for lone-wolf attackers, organized hacking groups, and nation state actors. Sophisticated technologies are also available to a greater diversity of adversaries than ever before. Governments are faced with handling a wider range of agile cyberattack techniques targeting their data, infrastructure, and citizens—whether or not the threat actors are backed by nation-state resources.
Persistent online threats include:
Cyberattacks also increase significantly in response to global events that elevate public anxiety, such as the COVID-19 pandemic. In the early months of the pandemic in 2020, for example, there was an increase in malicious domains posing as health information sources, impersonation of government and public health entities, phishing, and ransomware attacks. This trend is expected to continue in the face of future public crises around the world.
Paste sites, discussion forums, and marketplaces on the deep and dark web often provide early indicators of nation-state targeted breaches, malware, and attack techniques. Streamlined access to data from these sources allows intelligence agencies to enrich the other data feeds they use to predict and investigate these attacks.
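Impersonation domains are often small edits of a legitimate name. A minimal sketch, assuming you already have a list of newly observed domains from some feed and a list of protected domains; it flags candidates within a Levenshtein edit distance of 2, an arbitrary threshold chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,              # deletion
                curr[j - 1] + 1,          # insertion
                prev[j - 1] + (ca != cb)  # substitution (0 if characters match)
            ))
        prev = curr
    return prev[-1]


def lookalikes(observed, protected, max_distance=2):
    """Return (observed, protected) pairs that are close but not identical."""
    return [
        (obs, real)
        for obs in observed
        for real in protected
        if 0 < levenshtein(obs.lower(), real.lower()) <= max_distance
    ]
```

Edit distance alone misses prefix and suffix tricks (for example, a legitimate name embedded in a longer domain) and Unicode homoglyphs, so it is best treated as one signal among several.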
Open-source data online helps government and defense entities investigate corruption, embezzlement, fraud, money laundering, and other financial crimes. The deep and dark web also hosts a number of marketplaces, discussion forums, and other communication channels that enable illegal human and drug trafficking on a regional, federal, and international level.
Open-source intelligence aggregated from social media and the deep and dark web can help analysts investigate blue- and white-collar crimes by:
The COVID-19 pandemic demonstrated how closely national security is intertwined with major global forces such as climate change and natural disasters, globalization, migration, and conflict. These impacts are further complicated when an already overwhelming number of cyber adversaries weaponize crises. These threats, both real-world and digital, are only expected to intensify.
Governments require sufficient, well-rounded intelligence to inform national security decisions and crisis response as threats become more complex. Raw data feeds from social media and deep and dark web networks give governments critical context on large-scale crises, whether they are responding to a real-world threat or securing vulnerable computer networks. Social media is often the most timely source of information about on-the-ground threats.
These feeds can help answer questions like:
According to the 2019 National Intelligence Strategy of the United States, the intelligence community is increasingly challenged by growing volumes of online data available for collection, processing, analysis, and triage. The Western world is also facing a data analyst shortage coupled with growing demand for military AI. As a result, data scientists in defense tend to handle more complex tasks, developing tooling and data sets that support lower-level analysts on intuitive platforms.
Defense ministries are also challenged by a lack of streamlined access to pertinent social data sources. Many defense missions are driven in part by content from niche social networks or unindexed areas of the web. However, many of these feeds are not typically offered by commercial, off-the-shelf APIs. Analysts are often required to create dummy accounts, make group requests, and navigate networks manually. This requires a significant amount of HUMINT resources that could be allocated to other areas of the intelligence cycle.
Additionally, many threat intelligence companies focus on tools and services that deliver finished threat intelligence to organizations. This lets users bypass the collection, processing, and analysis of raw data in the intelligence cycle. These tools use automation to operationalize threat intelligence processes and ease the efficiency problem that analysts face across many industries.
However, government and defense agencies have unique needs that don’t necessarily fit the mould of a finished intelligence delivery system. Specifically:
Intelligence professionals require a multitude of tools and functionality to address these challenges at each stage of the intelligence cycle. Defense agencies typically combine some third-party OSINT tools, like Maltego, Palantir, Shodan, or the Echosec Systems Platform, with their own proprietary interfaces and tooling, and source data from both internal feeds and external APIs.
To satisfy data access requirements and challenges for defense, data collection, processing, and analysis tools must:
For example, integrating a fringe social media API with a program like Maltego or other custom tooling allows users to automatically correlate data points across disparate sources, such as linking usernames on extremist chan boards to real-life personas and other PII. The goal is the flexibility to customize data integrations for the specific mission at hand.
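The correlation step can start as simply as normalizing handles and grouping records that share one. A minimal sketch, assuming each record is a dict with hypothetical `source` and `username` keys; exact matching on a normalized handle is a deliberately crude stand-in for the fuzzier entity resolution that real tooling performs, and any match is a lead for an analyst to verify, not an identification.

```python
import re
from collections import defaultdict


def normalize_handle(username: str) -> str:
    """Lowercase, strip leading '@' characters, and remove '_', '.', and '-' separators."""
    return re.sub(r"[_.\-]", "", username.lstrip("@").lower())


def correlate_usernames(records):
    """Map each normalized handle to the sources it appears on, keeping cross-source matches only."""
    seen = defaultdict(set)
    for rec in records:
        seen[normalize_handle(rec["username"])].add(rec["source"])
    return {handle: sorted(srcs) for handle, srcs in seen.items() if len(srcs) > 1}
```

Aggressive normalization raises recall but also produces false matches between unrelated users with similar handles, which is why the output should feed analyst review.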
Current defense use cases, such as counter-terrorism, require access to niche online data sources like Gab, Telegram, and 4chan. However, many of these data feeds are not accessible through commercial, off-the-shelf APIs on the market—and manual navigation consumes valuable HUMINT resources.
Niche APIs allow defense users to access more data points on these networks than existing commercial data solutions, and allow for seamless integration into existing defense infrastructure. Adding these sources to a defense agency’s other feeds allows users to extract more compelling insights for use cases like counter-terrorism.
Delivering this data in a structured format allows for easy integration, analytics, machine learning, and other advanced data science applications. This differs from APIs that focus on real-time firehose data but do not organize unstructured data into useful formats.
AI is a major priority for defense because it helps contextualize actionable intelligence more efficiently. It is especially useful for analyzing content that is time-consuming, expensive, or easily overlooked with HUMINT resources alone, such as intentionally obfuscated text.
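To make the structured-versus-firehose distinction concrete: platform payloads differ in field names and timestamp formats, so a normalization layer maps them onto one schema before analytics or machine learning. A minimal sketch; the raw payload shapes and the `platform_a`/`platform_b` names are invented for illustration and do not reflect any platform's actual API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class NormalizedPost:
    source: str
    author: str
    text: str
    created_at: datetime  # always timezone-aware UTC


def from_platform_a(raw: dict) -> NormalizedPost:
    # Hypothetical payload: {"user": {"name": ...}, "body": ..., "ts": <unix seconds>}
    return NormalizedPost(
        source="platform_a",
        author=raw["user"]["name"],
        text=raw["body"],
        created_at=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
    )


def from_platform_b(raw: dict) -> NormalizedPost:
    # Hypothetical payload: {"handle": ..., "content": ..., "posted": "<ISO 8601 with UTC offset>"}
    return NormalizedPost(
        source="platform_b",
        author=raw["handle"],
        text=raw["content"],
        created_at=datetime.fromisoformat(raw["posted"]).astimezone(timezone.utc),
    )


ADAPTERS = {"platform_a": from_platform_a, "platform_b": from_platform_b}


def normalize(source: str, raw: dict) -> NormalizedPost:
    """Dispatch a raw payload to the adapter registered for its source."""
    return ADAPTERS[source](raw)
```

Keeping one small adapter per source means a new feed can be added without touching downstream analytics, which only ever see `NormalizedPost`.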
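Before any model sees the text, pipelines typically undo the cheapest obfuscation tricks so that keyword filters and classifiers are not trivially evaded. A minimal sketch of that preprocessing step, assuming a small hand-written character map; this is a rule-based stand-in rather than machine learning, and the substitution table and watch terms are illustrative assumptions.

```python
import re
import unicodedata

# Illustrative leetspeak substitutions; real maps are larger and language-specific.
LEET_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"})


def deobfuscate(text: str) -> str:
    """Fold full-width forms and accents, undo common leetspeak, and drop separators inside words."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower().translate(LEET_MAP)
    # Remove runs of '.', '-', '_', '*' between word characters, e.g. "a-t-t" -> "att".
    return re.sub(r"(?<=\w)[.\-_*]+(?=\w)", "", text)


def matches_watchlist(text: str, terms) -> set:
    """Return the watch terms found as substrings of the deobfuscated text."""
    clean = deobfuscate(text)
    return {t for t in terms if t in clean}
```

The output is meant only for matching, since the digit substitutions mangle ordinary numbers, and substring hits still need human review to filter out innocent matches.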
The requirements listed above are most relevant to higher-level intelligence professionals, such as data scientists and programmers, who interact with raw data and develop tooling and integrations. It’s worth noting that these requirements ultimately point to usable, intuitive platforms and interfacing for lower-level analysts working in mission-driven environments.
The goal is to bridge the gap between automation and HUMINT, allow for customized data integrations and flexibility, and improve access to emerging social sources that are becoming more relevant in national security initiatives like counter-terrorism.
Mainstream and niche social platforms, as well as deep and dark websites, are highly valuable intelligence sources for government and defense. These networks are frequented by extremists, nation-state actors, and other fringe populations, and provide real-time information before, during, and after dangerous digital and physical events.
Combined with other intelligence feeds, these online sources offer stronger insights about immediate national security threats and allow analysts to more comprehensively understand emerging, evolving, and less-understood adversarial populations. This added context is necessary for effective decision-making, whether it’s on the ground or in the digital world.
Advancements in data discovery software now enable intelligence professionals in this space to easily access and filter pertinent open-source data online. Many of these feeds, particularly fringe social platforms, have not previously been available through solutions that support API integration, advanced data aggregation and filtering, and advanced data science applications like machine learning. These capabilities will add more context to mission-driven environments, conserve HUMINT resources where necessary, and ultimately enable more informed and timely national security decisions.
Want to learn more about how the Echosec products can assist your open source data discovery strategy?
Reach out to our team for a demo.