Data Breach Detection 101

Data breaches are an inevitable crisis for any organization with a digital footprint. Data breach detection is becoming an increasingly important initiative for organizations of all sizes. According to the Cost of a Data Breach Report by IBM and Ponemon Institute (2019), companies now face a 30% chance of suffering a data breach within the next two years. 

Increased data breach risks, frequency, and costs—both fiscal and reputational—mean that cybersecurity is now an essential investment for organizations regardless of their vertical or size. Understanding who is vulnerable to data breaches, how they happen, and how to detect and respond to them quickly and effectively is the cornerstone of reducing breach costs. This information can also help organizations better predict and prevent breaches altogether.

What exactly is a data breach, why and how do they happen, and how can organizations detect and respond to them more effectively?

What is a Data Breach?

A data breach is the intentional or unintentional release of private information to an unauthorized individual or environment. A breach incident could include one or a few records of breached data or a “mega breach” of over 1 million records.

Data breaches are usually financially motivated. Some are politically motivated, for example when government data is publicly revealed as a form of activism or social justice. Others are carried out purely by thrill-seekers testing how far their hacking skills can take them.

Data breaches happen most often because of human error (falling victim to social engineering or phishing schemes), weak security practices (such as poor password security or lack of 2-factor authentication), and unpatched system vulnerabilities. They can be caused by both external and internal actors.

Types of Data Breaches

Any private data type can be involved in a breach. However, breaches usually target personally identifiable information (PII), as this is the most easily monetized by threat actors.

Breached PII often includes:

  • Full names
  • Email addresses
  • Usernames
  • Passwords and hashed passwords
  • Security question answers
  • Home addresses and contact information
  • Dates of birth
  • Social security numbers
  • Driver’s license information
  • Credit card and other personal financial data
  • Insurance information
  • Personal health information
  • Family and friends’ PII

Breaches can also target non-PII information. This is often the case in politically-motivated breaches, either by hacktivists or threat actors backed by nation-states. Non-PII breached data can include:

  • Classified government documents or communications
  • Internal business information, such as trade secrets, business relationships, plans, or budgets
  • Proprietary source code
  • Infrastructure data, such as building, device, or defense blueprints

Who is at risk of a Data Breach?

In reality, nobody is safe from a breach. Even global organizations that are aware of their appeal to adversaries and invest heavily in cybersecurity, such as financial institutions, are still at great risk due to the value of their internal data. 

Given the amount of personal information available online—whether it’s PII sold on the dark web, or public data on a personal social media account—individual targets are more vulnerable than ever. Enterprises of all sizes are at risk, with small businesses accounting for 43% of breaches in 2019 (source: Verizon’s 2019 Data Breach Investigations Report). 

The following industries are the world’s most targeted, according to IBM’s 2020 X-Force Threat Intelligence Index:

  1. Financial and Insurance. This has been the most targeted industry by cybercriminals since 2016, accounting for 17% of all attacks. Financial institutions harbor millions of records of financial data and PII that are quickly and easily monetized by attackers.
  2. Retail. Retailers saw a significant risk increase in 2019, accounting for 16% of cyber attacks—just slightly under the financial sector. More retailers than ever are moving their services online, opening up further attack vectors. POS machines, as well as PII, financial data, gift card credentials, and loyalty program data associated with customer accounts, are valuable targets.
  3. Transportation. Like retail and finance, the transportation industry hosts millions of customer financial data records in its systems, as well as passport information and data relevant to a country’s critical infrastructure.
  4. Media and Entertainment. In the digital world, the ability to sway mass opinion is more powerful and viable than ever. This industry is attractive to cybercriminals who want to influence public information flows or gain content control before it’s publicly accessible. 
  5. Professional Services. Of all the industries analyzed in IBM’s report, the professional services industry had the largest number of breached records in 2019. This includes data from legal, accounting, and HR firms, as well as technology companies.
  6. Government. Breaching government data is both financially and politically lucrative for attackers, who are often backed by nation-states. Targets include national government departments and smaller municipal governments and public sector entities, such as law enforcement agencies.
  7. Education. Educational institutions host a variety of monetizable intellectual property and personal information records. According to the Index, cybercriminals attacked 500 schools in the United States in 2019.
  8. Manufacturing. Our globalized economy leaves many entry points and ripple effects for data breaches in this sector. Manufacturers contain valuable trade information and intellectual property for global companies and government entities, such as defense. 
  9. Energy. Like other industries in this list, the energy sector is a valuable target for financially-motivated cybercriminals seeking customer PII, or politically-backed actors seeking intellectual property and critical infrastructure data.
  10. Healthcare. Accessing personal healthcare information is valuable for criminals committing identity theft or collecting ransom in exchange for breached data. This sector is especially vulnerable to attackers during global health crises, such as pandemics, when healthcare systems are overwhelmed.

How do Data Breaches happen?

How does an attacker go from finding a target to gaining control of its data?

This question is critical to cybersecurity teams who aim to predict, identify, and respond to data breaches and other security compromises. The sequence of moves an attacker makes over the course of a data breach is often referred to as the “cyber kill chain.”

This process is constantly evolving as threat actors develop new tactics and techniques. Gathering all available threat intelligence at each stage of the cyber kill chain greatly strengthens an organization’s ability to defend against future data breaches.

The Data Breach cyber kill chain 

1. Target and Investigate

The first step of a data breach is to find a target, investigate it, and assess its most vulnerable point of entry. The chosen target depends on what the attacker’s goals are, their skill level, and whether they’re a lone wolf or part of a coordinated team. 

For example, a lone attacker seeking quick financial gain might start by searching for available login credentials for their target on dark web marketplaces. Exploiting previously stolen credentials accounts for 29% of initial attack vectors (IBM’s 2020 X-Force Threat Intelligence Index).

What the attacker learns about the target and its vulnerabilities at this stage largely determines whether the strategy will exploit computer error, human error, or both.

Exploiting Computer Vs. Human Error

Exploiting computer errors in a breach most commonly involves scan and exploit strategies, which account for 30% of initial access vectors. These strategies involve assessing a target’s software and security configurations for known (or novel) vulnerabilities and launching malware designed to target them. Scan and exploit strategies can take a higher skill level to execute. Exploiting computer errors can also be less fruitful for the attacker if their target regularly updates security patches. 

The alternative, and often more successful, method is to exploit human error. These strategies, most typically delivered through a phishing or social engineering campaign, manipulate people into clicking malicious links or handing over passwords and other PII under the guise of a trusted entity. Phishing is the most common access vector, accounting for 31% of attacks.

A successful phishing campaign relies on sounding and looking legitimate—so the first step is research. Attackers search social media and other relevant sources to establish profiles of who they aim to impersonate and target, and learn what kinds of software, tools, and services the target organization already uses. 


Say the attacker uses public LinkedIn data to learn that the target company has an executive traveling in the UK for business. They also find a press release announcing the company’s integration of a new cloud service—and by looking at the company’s employee page, they find the executive assistant’s name and email address. This is easily discoverable, seemingly innocuous, yet incredibly valuable information for an adversary.

2. Develop tools and strategies

The tools and techniques an attacker uses in the next stage again depend on a number of factors, including who their target is, the attacker’s skill level, and what information they gathered during step one.

At this stage of a scan and exploit strategy, the attacker’s aim is to determine which malware types they can use to exploit known vulnerabilities in the target’s system. This technique relies on the target neglecting to keep up-to-date patches on publicly-known vulnerabilities (IBM’s X-Force team has documented over 150,000 of them). Alternatively, more skilled attackers aim for novel, undefended software vulnerabilities and develop new malware to target them.

Developing tools and strategies that exploit human error looks a bit different. Here, the attacker is likely developing a social engineering narrative based on their target research and deciding how to deliver it. Most often, this is through a phishing email, but they could also use a messaging app or a phone call.


The information found by the example threat actor in step one could be used to create a fake webpage imitating the cloud service’s login screen (a “scampage”) and an email domain closely resembling the company’s. They could also establish a convincing narrative (e.g. “I’m at ___ in the UK, and I can’t seem to access our cloud service. Can you login below and let me know if it works for you?”) that includes a scampage link. This is a classic example of a business email compromise (BEC)—a highly-targeted phishing scam that impersonates executives.
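Defenders can screen inbound mail for the kind of lookalike sender domain described above. The following is a minimal sketch rather than a production filter. It assumes that plain Levenshtein edit distance is a reasonable similarity signal, and the function names, the threshold of 2, and the example domains are all illustrative assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute (or keep) a character
            ))
        prev = curr
    return prev[-1]


def is_lookalike(sender_domain: str, trusted_domain: str, max_distance: int = 2) -> bool:
    """Flag domains that are close to, but not the same as, the trusted domain."""
    sender = sender_domain.lower()
    trusted = trusted_domain.lower()
    if sender == trusted:
        return False  # the genuine domain is not a lookalike
    return edit_distance(sender, trusted) <= max_distance


# Hypothetical domains for illustration only.
print(is_lookalike("examp1e.com", "example.com"))
print(is_lookalike("unrelated.org", "example.com"))
```

A fixed threshold will also flag short, legitimately similar domains, so a check like this would normally feed a review queue rather than block mail outright.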

There is a range of other tools and techniques the attacker could develop at this point—for example, attaching malware to a USB stick for deployment along a supply chain or in a public place. At this stage, the attacker might also plan and develop command-and-control (C2) software or other techniques to gain further data access once they’ve entered the target system.

3. Deliver attack

This is where the attacker launches malware, sends phishing emails, or delivers any other attack method to the target. Infecting the target system with malware through a known vulnerability might be all the attacker needs to access private data at this stage.

With social engineering methods, the path from attack delivery to acquiring breached data is often longer. In the BEC example above, the attacker must wait until the executive assistant selects the link and/or enters login credentials on the scampage. The strategy could also fail, in which case the attacker would have to find a new target or adjust their approach.

4. Exploit

With successful deployment, the attacker is now able to leverage a system vulnerability or gather login credentials—either by social engineering or finding credentials on the dark web. Now they can enter the system, run malware, and start exploiting data. They can also move around to some extent within the target system and evaluate its infrastructure for any further exploitation vectors.

5. Command and control

The attacker has now gained system entry and is likely exfiltrating data and learning about other system processes. At this point, they could gather what’s in sight and call it a day—but typically, the attacker utilizes tools that enable deeper system control that could last long after the initial exploitation. This could involve altering the system’s security features and code internally, setting up additional system accounts, or launching remote access tools (RATs).

Adversaries are increasingly developing command-and-control (C2) tools, which maintain system control after entry, at this stage of the attack. The “go-to” C2 tool, Empire, was discontinued in July 2019. This precipitated the development of multiple C2 tools for use by both ethical and malicious hackers.

Living off the Land (“LOLBAS” or “LOLBins”) strategies are also becoming more popular at this stage. These combine computer vulnerabilities with social engineering tactics and use a target’s existing system functions to maintain control. This gives security analysts monitoring the target system the illusion that their network is running normally.

6. Complete Objectives

A threat actor who has successfully navigated through these attack stages now has breached data records from the target system and can deliver on their initial objectives. This usually involves monetizing breached data, acting on political or cyberwarfare motivations, or simply publishing breached data online. Sophisticated exploitation strategies can often take weeks or even months for the target organization to detect, giving the threat actor plenty of time to deliver on their goals.

Exploited data

What do cybercriminals do with breached data once they’ve completed their objectives?

There are dozens of ways that attackers can abuse breached personal data and other proprietary information. Again, the most common goal is to monetize data—but some adversaries are also politically motivated or use breached data to engage in harassment and hacktivism campaigns. Here are a number of specific examples of how breached data is exploited.

Identity theft and financial fraud

Many consequences of breached data, including some of those mentioned in this list, fall under the umbrella of identity theft: the act of illegally exploiting another’s personal data for financial or other gains. Accessing records of personal financial data is the most immediate way to monetize a person’s identity.

Identity theft and financial fraud using breached data often looks like:

  • Cashing out a person’s bank account. The dark web hosts a large market for bank account logins to give buyers direct access to an individual’s cash and credit.
  • Bank drops. These are legitimate-looking bank accounts created with stolen credentials. They’re used to siphon illegal cashouts, establish fake credit scores, and commit loan fraud.
  • Creating counterfeit banking cards and other financial documents. Counterfeit cards can be sophisticated, with mag stripes and built-in security features.
  • Tax fraud. Attackers will use stolen credentials to file an individual’s taxes and claim the return before the taxpayer does.
  • Healthcare fraud. The attacker uses stolen credentials to purchase medical care or prescription drugs.

Dark web marketplaces

While attackers can keep any breached data to themselves, an alternative strategy is to offer it for sale on the dark web. There is a plethora of dark web marketplaces offering breached data records for sale (especially personal financial data). 

These marketplaces evolve rapidly as they are launched or taken offline by law enforcement or distributed denial of service (DDoS) attacks. Many dark web vendors also use deep web sources, including messaging apps like Telegram and paste sites like Pastebin and DeepPaste, to advertise their shops.

Credential stuffing

Some attackers take breached personal credential lists in bulk (emails, usernames, and passwords) and programmatically test them against login pages for other sites and applications. Access to mega-breaches is lucrative for nefarious hackers using this credential stuffing strategy, enabling them to breach even more accounts. 

Password reuse is a relatively common practice (that we do not recommend), so access to one breached account often means access to multiple accounts belonging to the same individual. Attackers run credential stuffing campaigns with specialized software that bypasses login CAPTCHAs and distributes login attempts across multiple IP addresses to blend in with other internet traffic.
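From the defender’s side, one simple signal of credential stuffing is a single source failing to log in as many different users. The sketch below assumes failed-login events are already available as (IP address, username) pairs. The function name, the threshold, and the sample addresses (taken from documentation-only IP ranges) are hypothetical.

```python
from collections import defaultdict


def suspicious_sources(failed_logins, min_usernames=3):
    """Return IPs whose failed logins span at least min_usernames distinct accounts.

    failed_logins: iterable of (ip, username) pairs, one per failed attempt.
    """
    usernames_by_ip = defaultdict(set)
    for ip, username in failed_logins:
        usernames_by_ip[ip].add(username.lower())
    return sorted(
        ip for ip, names in usernames_by_ip.items() if len(names) >= min_usernames
    )


# Hypothetical events for illustration only.
events = [
    ("203.0.113.7", "alice"),
    ("203.0.113.7", "bob"),
    ("203.0.113.7", "carol"),
    ("198.51.100.4", "dave"),
    ("198.51.100.4", "dave"),  # one user mistyping a password twice
]
print(suspicious_sources(events))
```

Because attackers spread attempts across many IP addresses, a per-IP count like this catches only the less careful campaigns. In practice it would likely be combined with other signals, such as the overall failure rate per time window.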

Social engineering

The attacker might continue using social engineering strategies to leverage any leaked data they acquired in the initial attack. Data breaches don’t necessarily deliver account passwords—and any personal information is all an attacker needs to gain access to more lucrative data.

For example, say an attacker gained access to a government revenue service/agency database containing full names, addresses, phone numbers, and social security numbers. The attacker could contact the service over the phone under the guise of one of those individuals, use that information to pass security screening questions, and make account changes or request access to more personal data.

SIM jacking

Also called SIM swap fraud, SIM jacking is a form of social engineering that specifically targets an individual’s mobile phone. In a nutshell, the attacker uses breached personal information—names, dates of birth, phone numbers, home addresses, and emails—and contacts that individual’s phone service provider. 

Then, posing as that individual, they use the information to request a SIM swap to a card in the attacker’s control. This is a relatively common request from legitimate customers who have lost their phones. Alternatively, the attacker could bribe a provider employee with PayPal deposits or cryptocurrency to move the account over to the attacker’s SIM card.

Once the SIM card is swapped, the attacker has full access to any apps or accounts linked to that person’s phone. This can result in financial loss, identity theft, and hacked social media accounts for the victim. This type of attack is particularly hard to rectify since victims rarely get the support they need from service providers once the account is out of their control.

Doxxing and hacktivism

Doxxing is the act of putting an individual’s personal information records on public display as a form of online harassment. This type of attack is usually targeted at breaching a specific individual’s data rather than the large-volume, monetizable attacks usually associated with the term “data breach.” Breached data in a typical dox includes:

  • Full names, home addresses, and contact information for the individual and their immediate family members
  • Date of birth
  • Social security data
  • Workplace information and history
  • Banking information
  • Social media handles
  • Reason for the dox

Reasons behind a dox vary. Doxxing is often a form of hacktivism against high-profile individuals, such as executives or politicians. Many doxxes are also committed as revenge against those suspected of committing crimes, such as child exploitation. 

Hacktivist data breaches are committed by politically-motivated individuals or groups. These are often related to freedom of speech and information as well as human rights movements. The goal is to bring about wide-scale social change through online activities. Hacktivist groups often use doxxing and leaks of confidential documents or communications to advance their agenda.

Nation-state cyber attacks

According to Verizon’s Data Breach Investigations Report (2019), the number of breaches committed by nation-state and affiliated actors has increased steadily since 2017. In 2019, 23% of breaches were committed by these groups, and a quarter of the breaches analyzed in the report were related to cyberespionage. 

This trend points to an evolution in data breach threats and targets. While a bulk of breach incidents continue to be financially motivated, government entities are increasingly targeted by nation-backed threat actors seeking to gain political, military, technological, or other advantages.

Illegal immigration and international crime

Leaked personally identifiable information enables individuals to illegally cross borders. Data breaches can facilitate the distribution of stolen personal documents, including passports, as well as counterfeit identification and work permits. On the dark web, these are largely targeted towards refugees and other individuals seeking to establish residence illegally. In rare cases, identity theft can lead to espionage and even terrorism in a foreign country.

Data Breach consequences

Data breaches have immense consequences for their victims. The data exploitation methods described above give some indication of what consequences might look like for individual data subjects—but what do they look like for organizations?

A single breach can have significant financial and reputational costs for an organization. A breach brings costs and losses related to:

  • Detection efforts and assessing/reporting the breach
  • Notifying compromised individuals
  • Breach response and containment, including investing in additional security measures and improved staff training
  • Service disruptions
  • Loss of revenue and customers, and the cost of attracting new business
  • Fines incurred through laws like the GDPR

According to the 2019 Cost of a Data Breach Report by IBM and Ponemon Institute, a single breach costs an average of USD 8.19 million in the United States. Given that the average breach involves a modest 25,575 records, these costs could escalate quickly if the organization is facing a mega-breach of over a million records.

Loss of customer, employee, and stakeholder trust following a data breach can be even more damaging in the long term than the financial loss. Regardless of how many records were compromised, a brand’s success going forward largely depends on how a company responds to the incident and handles relationships with those affected.

Data Breach detection

The length of a data breach lifecycle (the time it takes to detect and contain a breach) is critical. It takes organizations an average of 206 days to detect an attack and 73 days to contain it (IBM’s 2019 Cost of a Data Breach Report). That means an organization’s breached data could be vulnerable for over nine months.
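The nine-month figure follows from adding the two averages quoted above. As a quick check, assuming an average month length of 365.25 / 12 days (an assumption made here for the conversion, not a figure from the report):

```python
DAYS_TO_DETECT = 206   # average days to detect, as quoted from the IBM report
DAYS_TO_CONTAIN = 73   # average days to contain, as quoted from the IBM report
AVG_DAYS_PER_MONTH = 365.25 / 12  # assumed conversion factor

lifecycle_days = DAYS_TO_DETECT + DAYS_TO_CONTAIN
lifecycle_months = lifecycle_days / AVG_DAYS_PER_MONTH

print(lifecycle_days)               # total days from breach to containment
print(round(lifecycle_months, 1))   # the same span expressed in months
```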

Longer detection times significantly increase the risk to affected individuals and the financial and reputational damage the organization incurs. According to the same IBM report, breach lifecycles of over 200 days cost 37% more than those with lifecycles under 200 days.

These costs and timelines mean that breach detection tools are a critical component of an organization’s cybersecurity. How and where do these tools find data alerting an organization to a breach?

Detecting Data Breaches safely and efficiently

Some cybersecurity tools allow security teams to detect infrastructure vulnerabilities or suspicious activity early on. However, as adversary tactics and techniques become more advanced, detection often isn’t possible until breached data is actually out in the world, where it tends to appear in obscure and unindexed (unsearchable) online spaces. This is where data discovery solutions like the Echosec Systems Platform greatly improve breach detection efficiency.

Three common breach detection sources include:

  • Paste sites. These are popular on both the deep and dark web for publicly and anonymously sharing blocks of plain text. Nefarious paste site use involves exposing breached data. These breaches often come in the form of doxxing or breached credential lists. Popular paste sites include Pastebin, PasteFS, and DeepPaste.
  • Dark web forums and marketplaces. These sites offer users total anonymity, making them abundant sources of breached data. Dark web marketplaces typically state where the data came from and offer a preview of offered data, while dark web forums can act more like paste sites, with users dumping breached data lists.
  • Breached Data Repositories. These repositories are publicly-available databases aggregating over 10 billion leaked records from known breach incidents. Repositories are continuously evolving as new breach events are discovered on the dark web and other hidden sources.

Searching these sources for entities unique to an organization, such as email handles, is extremely valuable for detecting a breach early on and shortening the data breach lifecycle. However, because these sources are unindexed, searching them efficiently for relevant data is extremely cumbersome and time-consuming without specialized search software. Dark web sources, such as DeepPaste, marketplaces, and forums, are notoriously slow to navigate, and browsing them inexpertly can expose organizations to further risk.
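Once text from one of these sources has been collected, matching it against an organization’s email domain is straightforward. The sketch below is illustrative only and says nothing about how any particular product works. It assumes the dump is already available as a string, and the function name, regular expression, and sample data are hypothetical.

```python
import re


def find_org_emails(text: str, domain: str) -> list[str]:
    """Return unique, lowercased email addresses at `domain` or its subdomains."""
    pattern = re.compile(
        r"[A-Za-z0-9._%+-]+@"            # local part
        r"(?:[A-Za-z0-9-]+\.)*"          # optional subdomains
        + re.escape(domain)
        + r"(?![A-Za-z0-9-])"            # domain must not continue, e.g. example.community
        + r"(?!\.[A-Za-z0-9])",          # and must not be a prefix, e.g. example.com.evil.net
        re.IGNORECASE,
    )
    return sorted({match.group(0).lower() for match in pattern.finditer(text)})


# Hypothetical paste contents for illustration only.
dump = (
    "alice@example.com:hunter2\n"
    "bob@other.org:pw\n"
    "Carol@Mail.Example.com:x\n"
    "mallory@example.com.evil.net:y\n"
)
print(find_org_emails(dump, "example.com"))
```

The two negative lookaheads are a design choice to avoid false positives from domains that merely start with the organization’s domain.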

The Echosec Systems Platform crawls and indexes these sources so organizations can efficiently find breached data indicators and shorten time-to-discovery and response. Search results can be viewed within the web-based Platform so that users don’t have to navigate directly to dangerous networks like Tor for threat data.

Data Breach prevention and response

“It is people and not technology that are the first line of defense in detecting and stopping many of these attacks…”
Lance Spitzner, SANS Security Awareness

Prevention is the ideal data breach scenario for any organization. Since exploiting human error through phishing and social engineering campaigns is the most common attack vector, successful prevention often comes down to educating personnel and improving basic security practices.

This could involve routine, up-to-date cybersecurity training (anything from identifying a phishing email to using public wifi), and enforced password protocols, such as 2-factor authentication, improving password strength, and avoiding password reuse. 

Prevention strategies also involve software and security process improvements. This includes:

  • Limiting administrative controls to personnel who require them
  • Patching software vulnerabilities regularly
  • Data encryption
  • Assessing and improving third-party vendor security
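Regular patching, listed above, can be partly automated by comparing a software inventory against the minimum versions known to contain a fix. This is a minimal sketch. It assumes simple dotted numeric version strings, and the package names and versions are made up for illustration.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '2.4.10' into (2, 4, 10) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def unpatched(installed: dict[str, str], minimum_patched: dict[str, str]) -> list[str]:
    """Return names of installed packages older than their minimum patched version."""
    return sorted(
        name
        for name, version in installed.items()
        if name in minimum_patched
        and parse_version(version) < parse_version(minimum_patched[name])
    )


# Hypothetical inventory and advisory data for illustration only.
installed = {"webserver": "2.4.9", "database": "12.3", "mailer": "1.10.0"}
minimum_patched = {"webserver": "2.4.10", "database": "12.3", "mailer": "1.9.2"}
print(unpatched(installed, minimum_patched))
```

Comparing parsed tuples rather than raw strings matters here: as text, "2.4.9" sorts after "2.4.10", which would hide the outdated web server.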

Access to the hidden online sources described in the previous section can also help organizations predict and prevent specific attack vectors. For example, dark web forums contain attack planning discussions for specific industries or organizations. This intelligence can be used to improve security processes or inform personnel about potential risks.

Data breach containment and response strategies

Prevention isn’t always possible, even for the most security-centric organizations. There is always room for human error, and sophisticated adversaries are still capable of exploiting novel, zero-day vulnerabilities. Organizations must contain and respond to the breach quickly, often within 48 hours of detection. Effective containment and response strategies depend on the organization and how the breach happened, but typically involve:

  • Locating, repairing, and securing affected systems. This should be outsourced if your organization is not already equipped with a team of cybersecurity experts. It’s important to retain any digital evidence of a breach for forensic examiners.
  • Reporting the breach to the Information Commissioner’s Office or other regulatory authority. Breaches must be reported within 72 hours in compliance with the GDPR.
  • Informing affected individuals. This response should clearly explain how the breach happened, what data was compromised, what affected individuals should do, what the organization’s mitigation strategy is, and how they can be contacted.
  • Gathering threat intelligence. Cybersecurity and forensics analysts gather threat intelligence along the breach’s cyber kill chain to better understand how the system was compromised.
  • Improving security processes. Threat intelligence gathered post-incident can be used to improve the organization’s security profile for future attacks and better train personnel for prevention and detection.
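The 72-hour reporting window mentioned above is easy to track programmatically during an incident. The sketch below assumes the clock starts when the organization becomes aware of the breach and that timestamps are timezone-aware. The function names and the sample timestamp are hypothetical.

```python
from datetime import datetime, timedelta, timezone

REPORTING_WINDOW = timedelta(hours=72)  # window cited above for GDPR reporting


def reporting_deadline(aware_at: datetime) -> datetime:
    """Latest time to notify the regulator, counted from when the breach became known."""
    if aware_at.tzinfo is None:
        raise ValueError("use a timezone-aware timestamp to avoid ambiguity")
    return aware_at + REPORTING_WINDOW


def hours_remaining(aware_at: datetime, now: datetime) -> float:
    """Hours left before the deadline; negative once it has passed."""
    return (reporting_deadline(aware_at) - now).total_seconds() / 3600


# Hypothetical detection time for illustration only.
detected = datetime(2020, 3, 2, 9, 30, tzinfo=timezone.utc)
print(reporting_deadline(detected).isoformat())
print(hours_remaining(detected, detected + timedelta(hours=24)))
```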

In 2019, over 8.5 billion data records were compromised worldwide. To put this number in perspective, this is a 200% increase over the number of records breached in 2018.
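A 200% increase means the 2019 total is three times the 2018 total, not double. As a rough check, treating both quoted figures as exact (the source gives them as approximate), the implied 2018 baseline can be worked out as follows:

```python
RECORDS_2019 = 8.5e9      # "over 8.5 billion", treated as exact for this estimate
PERCENT_INCREASE = 200    # year-over-year increase quoted above

growth_factor = 1 + PERCENT_INCREASE / 100   # a 200% increase is a 3x multiple
implied_2018 = RECORDS_2019 / growth_factor  # roughly 2.8 billion records

print(growth_factor)
print(f"{implied_2018 / 1e9:.2f} billion")
```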

As attackers scale up, so does response

Organizations of all sizes and industries around the world are struggling to keep up with this increased risk, as well as the variety of tactics and techniques adversaries are developing to exploit data. While breach attempts are inevitable, organizations can significantly reduce breach costs and impacts the earlier they can detect incidents. 

Web monitoring software is becoming a crucial part of an organization’s security and breach mitigation toolkit. These solutions help organizations efficiently locate breach indicators when they first appear on hidden online networks so they can respond faster and reduce damages. This is the kind of proactive approach that retains customer, employee, and stakeholder trust, and supports an organization’s success for years to come.

Echosec Systems provides intelligence and security teams with streamlined access to online networks to get clear visibility into the risk landscape.