Introductory Guide to eDiscovery


101 Guide: eDiscovery
Organizations produce a deluge of electronic information. Email inboxes are overflowing, hard drives and cloud storage repositories are packed, and social media notifications come at breakneck speed.
For legal professionals who must locate key documents and information for use as possible evidence in litigation and investigations, the explosion of such data has rendered the process of collecting, culling, and reviewing digital information—known as eDiscovery—incredibly costly and time-consuming.
Fortunately, advances in eDiscovery technology are reducing the cost and burden for legal teams to narrow the universe of potentially relevant data. This eDiscovery 101 guide provides an introduction to the fundamentals of eDiscovery and explores challenges and best practices, including considerations for multilanguage and cross-border eDiscovery.
What Is eDiscovery?
eDiscovery is the process of identifying, collecting, filtering, and reviewing electronically stored information (ESI) that constitutes potential evidence in legal proceedings and investigations. During eDiscovery, legal professionals, IT personnel, and forensics teams work together to narrow very large datasets into more manageable volumes.
Prior to the digital age, “discovery” in business disputes meant combing through boxes and drawers full of paper documents to locate key information and evidence. Today, since most documents needed for business litigation, arbitration, and investigations are stored in electronic format, traditional discovery is referred to as eDiscovery and is focused on ESI. In the early days of eDiscovery, it was possible for legal professionals to manually review ESI to determine its relevancy to a matter. This was feasible for 25 or 2,500 files, but not so much for 25,000, let alone 250,000 or 2.5 million!
Today, eDiscovery tools powered by analytics and artificial intelligence (AI) significantly streamline the eDiscovery process. But even with modern technology, given the ever-increasing volumes and diversity of ESI, eDiscovery remains an intensive endeavor that requires human expertise, effective communication, and agile decision-making at every step of the way.
What Is ESI?
ESI stands for electronically stored information. It covers a wide variety of digital assets, including emails; e-documents (Word, PPT, and Excel files); image, audio, and video files; mobile device data (e.g., chat programs and text messages); cloud-based applications; website content; and social media postings.
What Is the EDRM Framework for eDiscovery?
Introduced in 2005, the Electronic Discovery Reference Model (EDRM) is a visual representation of the complete eDiscovery lifecycle, which is widely referred to by legal teams globally. The EDRM breaks down the eDiscovery process into nine steps, although not every step is relevant to every matter. The EDRM isn’t always followed sequentially, and steps may be repeated depending on the project scope and cadence.
What Are the Steps in the EDRM?
What Is Early Case Assessment?
Early case assessment (ECA) in an eDiscovery workflow is typically employed during the “processing” stage above to identify key documents and information that the legal team uses to advise its client on case strategy and merits. ECA can help predict the cost of a case as well as likely exposure, which helps teams create realistic case strategies and budgets for the full eDiscovery process. ECA also uses advanced methods to filter out non-relevant data early in the eDiscovery process, thereby narrowing large datasets. Beyond the standard culling methods of filtering by date and search terms, and deduplicating and de-NISTing files (i.e., removing system files and other non-user-generated files), ECA leverages the following tools to further cull data during processing:
- Filetype filters (e.g., removing calendar entries or video files when not relevant to the matter)
- Domain/email handle filters (e.g., removing spam and industry newsletters that hit on search terms but are not relevant to the underlying dispute)
- Concept clustering (i.e., organizing the files by content to identify and remove groups of search term false positives)
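The culling steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the behavior of any particular ECA product: the `Doc` record, the `cull` function, the excluded extensions and domains, and the stand-in list of known system-file hashes are all hypothetical names and values chosen for the example.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class Doc:
    name: str
    sender: str
    content: bytes


# Assumptions for illustration only. A real workflow would load a published
# known-file hash list instead of hard-coding a single hash.
KNOWN_SYSTEM_HASHES = {hashlib.sha256(b"system file").hexdigest()}
EXCLUDED_EXTENSIONS = {".ics", ".mp4"}          # calendar entries, video files
EXCLUDED_DOMAINS = {"newsletter.example.com"}   # spam and newsletters


def cull(docs):
    """Return the documents that survive de-NISTing, deduplication,
    filetype filtering, and sender-domain filtering."""
    seen_hashes = set()
    kept = []
    for doc in docs:
        digest = hashlib.sha256(doc.content).hexdigest()
        if digest in KNOWN_SYSTEM_HASHES:   # de-NIST: drop known system files
            continue
        if digest in seen_hashes:           # deduplicate identical content
            continue
        seen_hashes.add(digest)
        extension = os.path.splitext(doc.name)[1].lower()
        if extension in EXCLUDED_EXTENSIONS:
            continue
        domain = doc.sender.rsplit("@", 1)[-1].lower()
        if domain in EXCLUDED_DOMAINS:
            continue
        kept.append(doc)
    return kept


docs = [
    Doc("memo.docx", "alice@acme.example", b"project memo"),
    Doc("memo_copy.docx", "bob@acme.example", b"project memo"),
    Doc("kernel.dll", "", b"system file"),
    Doc("invite.ics", "alice@acme.example", b"calendar entry"),
    Doc("digest.eml", "news@newsletter.example.com", b"weekly industry digest"),
    Doc("contract.pdf", "carol@acme.example", b"signed contract"),
]
kept_names = [doc.name for doc in cull(docs)]
print(kept_names)
```

In this toy dataset, only the first copy of the memo and the contract survive the cull. Concept clustering is not included here because it depends on document content rather than metadata.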
Generally speaking, standard ESI filtering results in an 80% reduction in the volume of ESI. That number can top 90% using ECA advanced filtering, meaning only 10% of your ESI moves to the document review phase, rather than 20%.
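To make the percentages concrete, the short sketch below applies both reduction rates to a hypothetical collection of one million documents. The collection size and the function name `docs_remaining` are assumptions for illustration; only the 80% and 90% figures come from the text.

```python
def docs_remaining(total_docs: int, reduction_pct: int) -> int:
    """Documents left for review after culling away reduction_pct percent."""
    return total_docs * (100 - reduction_pct) // 100


total_docs = 1_000_000  # hypothetical collection size
standard = docs_remaining(total_docs, 80)  # standard ESI filtering
advanced = docs_remaining(total_docs, 90)  # ECA advanced filtering
print(standard, advanced)
```

Moving from an 80% to a 90% reduction halves the number of documents that reach review, which is why the extra ten points matter more than they might first appear.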
Digital Reef is an ECA and eDiscovery platform from 九色 Legal that ingests, culls, analyzes, and exports datasets. It can be installed securely on-site or hosted in a 九色 Legal data center. Capable of processing up to 17 terabytes of data in 24 hours and managing multi-language data, Digital Reef can help you process, investigate, and preview your data 40% faster while reducing datasets by more than 90%. Attorneys and legal processing service providers use Digital Reef’s built-in document viewer and coding interface to conduct investigations.
For larger matters that require a linear or technology-assisted review (TAR)-based approach, data is culled in Digital Reef and then exported in the required output formats for loading into your preferred review solution. Digital Reef integrates with numerous technologies, including the market-leading Relativity document hosting and review platform. The team at 九色 Legal builds proprietary add-ons for Relativity.
ECA in Action
In a construction matter, a case-relevant search term entered into an ECA tool produced 300,000 emails. However, many of these documents related to construction projects that were not relevant to the matter at hand. Emails tied to these extraneous projects were excluded from the dataset, cutting the number of search hits in half. Further analysis removed irrelevant email domains, resulting in an overall reduction of 200,000 documents—leaving only 100,000 for review. Spending just a few hours on this ECA exercise saved more than 4,000 hours of document review and hundreds of thousands of dollars. While the technology enabled rapid data exclusion, it took the experienced, skilled eyes of a human reviewer to unlock the full value of the ECA tool.
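The arithmetic behind this example can be checked with a short calculation. The document counts come from the scenario above; the review rate of 50 documents per hour is an assumption, chosen because it is consistent with the stated savings, and is not a figure given in the source.

```python
initial_hits = 300_000

# Excluding the extraneous projects cut the search hits in half.
after_project_filter = initial_hits // 2

# Removing irrelevant email domains left 100,000 documents for review.
remaining_for_review = 100_000
removed_by_domain_filter = after_project_filter - remaining_for_review
total_removed = initial_hits - remaining_for_review

# Assumed review speed (not stated in the source).
DOCS_PER_REVIEWER_HOUR = 50
review_hours_saved = total_removed / DOCS_PER_REVIEWER_HOUR

print(after_project_filter, removed_by_domain_filter, total_removed)
print(review_hours_saved)
```

At the assumed rate, removing 200,000 documents corresponds to about 4,000 review hours; a slower reviewer pace would push the savings higher.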
What Happens During Document Review and Analysis?
Document review and analysis is the most time-consuming and costly part of the eDiscovery process. It involves large teams of lawyers examining the documents that have survived the culling process to determine their relevance to the case. During document review, data is evaluated for relevance, privilege, confidentiality, and privacy. Technology assisted review (TAR) typically happens during document review.
Before undergoing document review, it’s important to:
As with early case assessment, technology can streamline document hosting, review, and analysis. Examples include Relativity and 九色 Legal’s newly released review platform, which offers a robust suite of analytics and AI features, including continuous active learning, redaction, near-duplicate analysis, and daily review reporting.
How Is AI Streamlining eDiscovery?
AI has become an increasingly integral part of eDiscovery over the past decade. By applying data mining techniques, AI can narrow the set of documents sent for review—saving both time and money. AI-powered eDiscovery tools leverage technologies such as machine learning (ML), natural language processing (NLP), and, more recently, generative AI (GenAI). Here are some of the ways AI can automate and streamline key aspects of the eDiscovery process:
Technology-Assisted Review (TAR)
TAR helps prioritize and identify potentially relevant documents for review by learning from human reviewers' feedback. Over the past 10 years, TAR has become the most popular and powerful AI tool in the eDiscovery toolbox.
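The idea of learning from reviewer feedback can be illustrated with a deliberately simple sketch. It is not the algorithm used by any commercial TAR tool: it scores each word by smoothed log-odds of appearing in documents that reviewers marked relevant, then ranks unreviewed documents by the sum of those scores. The function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled_docs):
    """Learn a weight per word from (text, is_relevant) reviewer decisions."""
    relevant, not_relevant = Counter(), Counter()
    for text, is_relevant in labeled_docs:
        target = relevant if is_relevant else not_relevant
        target.update(set(tokenize(text)))
    vocabulary = set(relevant) | set(not_relevant)
    # Add-one smoothing keeps the ratio finite for words seen in one class only.
    return {
        word: math.log((relevant[word] + 1) / (not_relevant[word] + 1))
        for word in vocabulary
    }


def score(weights, text):
    return sum(weights.get(word, 0.0) for word in set(tokenize(text)))


def rank(weights, unreviewed_docs):
    """Order unreviewed documents from most to least likely relevant."""
    return sorted(unreviewed_docs, key=lambda t: score(weights, t), reverse=True)


reviewer_decisions = [
    ("pricing agreement with supplier", True),
    ("supplier pricing dispute escalated", True),
    ("office party lunch menu", False),
    ("lunch schedule for friday", False),
]
unreviewed = [
    "friday lunch order",
    "revised supplier pricing terms",
    "parking garage notice",
]
weights = train(reviewer_decisions)
ranked = rank(weights, unreviewed)
print(ranked)
```

In a continuous workflow, each new batch of reviewer decisions would be appended to `reviewer_decisions` and the model retrained, so the ranking keeps improving as the review proceeds.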
Concept Clustering
After files have been processed, concept clustering groups the remaining data based on concepts, topics, or ideas. This allows reviewers to examine documents based on similarity, remove groups of search-term “mishits,” and generally focus the review on more relevant content first.
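As a rough illustration of grouping documents by similarity, the sketch below clusters texts by word overlap (Jaccard similarity). This is a crude lexical stand-in: production concept clustering typically relies on semantic representations rather than shared words. The threshold value, function names, and sample documents are assumptions made for the example.

```python
def tokens(text):
    return set(text.lower().split())


def jaccard(a, b):
    """Share of words two documents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs, threshold=0.3):
    """Greedy clustering: add each document to the first cluster whose
    representative (its first member) is similar enough, else start a new one."""
    clusters = []  # list of (representative_tokens, members)
    for doc in docs:
        doc_tokens = tokens(doc)
        for representative, members in clusters:
            if jaccard(doc_tokens, representative) >= threshold:
                members.append(doc)
                break
        else:
            clusters.append((doc_tokens, [doc]))
    return [members for _, members in clusters]


docs = [
    "bridge project steel delivery delayed",
    "steel delivery for bridge project rescheduled",
    "fantasy football league picks",
    "football league picks due friday",
]
groups = cluster(docs)
for group in groups:
    print(group)
```

Here the two football emails land in their own group, so a reviewer could exclude that whole cluster as a search-term false positive in one step.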
Conceptual Search
Beyond keyword searches, conceptual searching enables users to search based on the context and meaning of the content, thereby improving the accuracy of search results.
Generative AI
Looking ahead, generative AI (e.g., ChatGPT) is poised to take on a more prominent role in eDiscovery. Take, for example, the “document dump” that sometimes happens the night before a deposition. GenAI may be used to quickly summarize documents or create a chronology, making a previously untenable eleventh-hour review possible. Likewise, GenAI can provide useful features, such as “ask my documents a question,” and could potentially replace TAR as the key driver of document review itself.
Language Identification and Machine Translation
Some AI-powered eDiscovery platforms have language identification capabilities that can automatically detect and tag the language of documents. Likewise, advanced machine translation engines are often built directly into eDiscovery platforms for multilingual datasets.
Named Entity Recognition (NER)
AI-based NER can identify and extract entities such as names, locations, and dates from documents.
Sentiment Analysis
Sentiment analysis tools can help identify the tone and emotional context of communications.
AI in Action
Let’s say you want to find all documents that include references to “X” and any documents that include negative sentiments around “X.” Suppose you have 75,000 possible documents to review. Using an AI tool, you create your instructions, submit them, and then wait for documents to be reviewed. Each document gets classified in one of four ways: relevant to the issue, not relevant to the issue, needs further review, or has a technical issue. The tool will tell you which issue an item is relevant for (reference to X and/or negative sentiment about X). Now you have a much smaller pool of documents to review.
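The four-way classification described above can be mimicked with a small rule-based sketch. A real AI tool would follow natural-language instructions rather than fixed rules, so this is only a stand-in for the workflow: the `classify` function, the negative-word lexicon, the topic "Acme", and the sample documents are all hypothetical.

```python
from collections import Counter

# Hypothetical lexicon standing in for a sentiment model.
NEGATIVE_WORDS = {"defective", "unsafe", "complaint", "failure"}


def classify(text, topic):
    """Return (category, issues) for one document."""
    if text is None or not text.strip():
        return "technical_issue", []           # nothing readable to review
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    topic = topic.lower()
    if topic in words:
        issues = ["reference"]
        if words & NEGATIVE_WORDS:
            issues.append("negative_sentiment")
        return "relevant", issues
    if topic in text.lower():
        return "needs_further_review", []      # partial match, e.g. inside a longer word
    return "not_relevant", []


documents = [
    "The Acme valve is defective and unsafe.",
    "Acme shipment arrives Monday.",
    "Lunch menu for Friday.",
    "Lease renewal for the Acmeville office.",
    "",
]
results = [classify(text, "Acme") for text in documents]
summary = Counter(category for category, _ in results)
print(summary)
```

Only the documents tagged as relevant or needing further review would move on to human reviewers, along with the issue tags explaining why each one was flagged.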
What Are Some Considerations for Multilanguage and Cross-Border eDiscovery?
The complexities of language and translation can make it difficult to understand and analyze electronic data in foreign languages. Cultural differences can also impact the interpretation and analysis of data, which can affect the accuracy and effectiveness of an investigation.
Multilingual eDiscovery tools play an important role when electronic data is in more than one language. They use natural language processing (NLP) and machine learning algorithms to help investigators track, understand, translate, and analyze electronic data in multiple languages.
Cross-border investigations can be tricky, as laws and regulations regarding the collection, analysis, and transfer of electronic data vary by country. Investigators must have an awareness of data protection laws in each jurisdiction to ensure proper legal compliance. Multilingual eDiscovery tools offer legal and regulatory guidance, helping to ensure that no laws are broken within the jurisdictions in which the investigation is being conducted.
Data Privacy Compliance During eDiscovery
Data privacy and protection are paramount to maintaining the integrity of case materials. Companies need to keep data privacy laws and regulations front and center during eDiscovery. This means adhering to rules under laws such as the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and the European Union Privacy Directives (EUPD) for every jurisdiction in which they do business.
Companies must be mindful of how they identify and handle protected health information (PHI) and personally identifiable information (PII). Pinpointing where PHI and PII appear in datasets helps prevent potential data privacy violations. Advanced data mining technology can analyze large datasets to detect names, locations, organizations, and other sensitive details, and then appropriately redact that information. It’s also important to note that the GDPR defines personal information more broadly, including categories such as sexual orientation and political beliefs.
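Pattern-based detection is one simple way to locate and redact some kinds of PII. The sketch below uses regular expressions for email addresses, US-style Social Security numbers, and US-style phone numbers. The patterns, labels, and `redact` function are illustrative assumptions, and they do not catch names, locations, or organizations, which generally require entity recognition rather than fixed patterns.

```python
import re

# Illustrative patterns only; real PII detection needs broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact(text):
    """Replace each detected item with a labeled placeholder and
    return the redacted text plus a log of what was found."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        found.extend((label, match) for match in pattern.findall(text))
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text, found


sample = "Contact Jane at jane.doe@example.com or 555-123-4567. SSN 123-45-6789."
redacted, log = redact(sample)
print(redacted)
print(log)
```

Keeping a log of what was redacted, separate from the produced document, gives the legal team a record for quality checks without exposing the sensitive values in the review set.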
Best Practices in Data Privacy
What Are Some Challenges Associated with eDiscovery?