Top Five Mistakes Made When Collecting Data From Nontraditional Sources
Collecting electronically stored information is a common practice for eDiscovery practitioners in today’s digital world. The task grows more difficult over time because new and emerging technologies are constantly entering the market. Collecting data from these new sources can pose obstacles for attorneys because storage methods and retrieval capabilities differ from one technology to the next. To better understand these collection issues, attorneys must understand the difference between data from traditional and nontraditional sources. Traditional data is user-created, organized, and supported by established workflows; email and Microsoft Office documents generally fall into this category. Nontraditional data, on the other hand, refers to data types for which there are often few commonly accepted, industry-standard collection and processing procedures. Nontraditional sources include social media sites, text messages, chat applications, Internet of Things devices, and cloud platforms.
When an attorney receives and reviews disclosure requests, the first step should be determining whether any of the requests apply to nontraditional data sources. Because individuals regularly use or communicate over several devices and apps, collecting from nontraditional data sources is all but inevitable. Every organization should ensure that its litigation preparedness plan accounts for how to collect data from nontraditional sources and how to handle the issues that commonly arise. Becoming familiar with common collection mistakes will help attorneys decide how to approach this task. Here are five important mistakes to avoid:
Different Strokes for Different Data
The first mistake is failing to understand that different data types require different collection approaches. Each nontraditional data source may require its own approach to collection. Retrieving relevant emails (i.e. traditional data) stored on a company server can be as simple as performing a keyword search with a date filter and exporting the results. Collecting data from nontraditional sources, however, will often require more advanced methods such as direct device or platform analysis, special processing, and outsourcing. For example, iPhone data can often be collected from a backup of the iPhone, and this collection is easier if the backup is stored in iCloud because it can then be downloaded from anywhere in the world. However, depending on the iPhone settings, certain information, including text messages, may not be included in the backup. If a collection of the phone proceeds without checking and, if necessary, changing these settings, important data may be missed. Another example is exporting data from a call recording system. The export may require the involvement of the system vendor to capture both the audio recordings and the metadata that exists as fielded information within the system’s database. If counsel does not consider this before collection, the metadata may be missed, because it is often assumed that such metadata is embedded in the audio files, as it often is with music files and other traditional audio formats. New and advanced call recording systems and other nontraditional data sources may not store data in the ways we have become accustomed to, and careful examination of those storage methods may mean the difference between getting the data you need and missing it.
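To make the call recording example concrete, here is a minimal sketch in Python of what exporting the fielded metadata alongside the audio might look like. Everything in it is an assumption made for illustration: the table name, the field names, and the sample rows are hypothetical, an in-memory SQLite database stands in for the vendor's system, and a real export would follow whatever the vendor actually supports.

```python
import csv
import io
import sqlite3

# Hypothetical schema standing in for a call recording system's database.
# A real system will differ, and its vendor may need to perform the export.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recordings ("
    "call_id TEXT, agent TEXT, caller_number TEXT, "
    "started_at TEXT, audio_file TEXT)"
)
conn.executemany(
    "INSERT INTO recordings VALUES (?, ?, ?, ?, ?)",
    [
        ("c-001", "A. Rivera", "555-0100", "2023-03-01T09:15:00", "c-001.wav"),
        ("c-002", "B. Chen", "555-0101", "2023-03-01T10:40:00", "c-002.wav"),
    ],
)


def export_manifest(connection):
    """Return CSV text pairing each audio file with its database metadata."""
    cursor = connection.execute(
        "SELECT audio_file, call_id, agent, caller_number, started_at "
        "FROM recordings ORDER BY call_id"
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row comes from the column names of the query itself.
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)
    return buffer.getvalue()


manifest = export_manifest(conn)
print(manifest)
```

The point of the sketch is that the agent, caller, and timestamp fields live in the database rather than inside the .wav files, so copying the audio alone would leave them behind.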
Who Owns What Data?
To collect data effectively, eDiscovery practitioners must first know how it is generated and stored. Often nontraditional data is not stored on a device but is instead stored in the cloud, with a third-party host, or even on a server in another country. This can raise many issues of possession, custody, and control when requesting or collecting the data. A subpoena, search warrant, or other court order might be necessary to obtain the data legally. Additionally, a party may argue that it does not have to disclose certain data that is not in its physical possession. However, recent case law seems to be leaning towards a duty to produce such data. Courts have found that control exists over data held by independent contractors and third parties where a contractual business relationship exists. Creating governance policies, data maps, and custom workflows is crucial to dealing with nontraditional sources of data and also helps provide a foundation for properly addressing the issue of data ownership.
Ignoring the Issue of Proportionality
There are some situations where an organization can successfully argue that a discovery request is not proportional to the needs of the case. The 2015 amendments to the Federal Rules of Civil Procedure aimed to reduce the overly broad production of data by explicitly including the concept of proportionality. Since then, several courts have responded to these changes by denying requests for data when the burden of collection and production outweighs, or seems to outweigh, the potential relevance or usefulness of that data. As always, the success of this argument will depend on the specific facts of a case. For example, one federal court held that the burden of collecting and producing requested text messages was disproportionate to their value, even though the messages had been produced in a pre-litigation investigation, because they added only minimal evidentiary value to the case. Litigators must be able to clearly articulate a proportionality argument in order to avoid producing minimally relevant or useful data.
Failure to Understand Data Structure and Accompanying Issues
It is important for eDiscovery practitioners to have an understanding of the data’s underlying structure and storage mechanisms. For example, normalized databases can pose problems for discovery review because certain data points for one client or topic may be spread across dozens of tables. In order to make discovery compliance simpler and more focused, the reviewer may need to “flatten” this multi-table structure, identify and isolate segments of information across multiple tables, and carve out the relevant information. Additionally, traditional review tools may not be appropriate for these types of data sets. For example, generating aggregate queries using a database platform to identify trends and outliers may be a more appropriate way to analyze large data sets than looking at individual records within a review platform.
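As a rough illustration of flattening and of aggregate querying, the Python sketch below builds a tiny normalized database in memory and queries it two ways. The three tables, their column names, and the sample rows are all hypothetical and chosen only to show the idea; a real system may spread the same information across dozens of tables.

```python
import sqlite3

# Hypothetical normalized schema: one client's activity is split across
# three tables linked by ID columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (client_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE accounts (account_id INTEGER PRIMARY KEY,
                           client_id INTEGER, account_type TEXT);
    CREATE TABLE transactions (txn_id INTEGER PRIMARY KEY,
                               account_id INTEGER, txn_date TEXT, amount REAL);
    INSERT INTO clients VALUES (1, 'Acme Corp'), (2, 'Birch LLC');
    INSERT INTO accounts VALUES (10, 1, 'operating'), (20, 2, 'escrow');
    INSERT INTO transactions VALUES
        (100, 10, '2023-01-05', 250.0),
        (101, 10, '2023-01-09', 9000.0),
        (102, 20, '2023-02-01', 75.0);
""")

# Flatten: join the tables so each row is a self-contained record, and
# carve out only the rows for the client at issue.
flat_rows = conn.execute("""
    SELECT c.name, a.account_type, t.txn_date, t.amount
    FROM transactions AS t
    JOIN accounts AS a ON a.account_id = t.account_id
    JOIN clients AS c ON c.client_id = a.client_id
    WHERE c.name = ?
    ORDER BY t.txn_date
""", ("Acme Corp",)).fetchall()

# Aggregate: summarize per client to surface trends and outliers without
# reading individual records.
summary = conn.execute("""
    SELECT c.name, COUNT(*) AS txn_count, MAX(t.amount) AS largest_amount
    FROM transactions AS t
    JOIN accounts AS a ON a.account_id = t.account_id
    JOIN clients AS c ON c.client_id = a.client_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

for row in flat_rows:
    print(row)
for row in summary:
    print(row)
```

The flattened rows are what a reviewer would typically load into a review platform, while the summary query is the kind of analysis that is usually easier to run in the database itself.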
It may also be important to collect additional information that provides the necessary context for the raw data. Unlike traditional data types, structured data sets are often not “self-documenting.” An email, for example, supplies a fair amount of its own context: who sent it, to whom, on what date and at what time, what the subject is, and the content of the body are all known fields. In structured data, there may only be a list of data points with cryptic field names that do not adequately describe what the data points represent. Understanding this data requires additional research, which may include analysis of documentation, data dictionaries, or even conversations with the teams that manage the data. This research needs to be performed as part of the data collection to ensure that all of the relevant data is captured.
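The short Python sketch below shows one way a data dictionary might be applied to a structured export. The field names, code values, and records are invented for illustration, and the dictionary itself is assumed to have come from system documentation or from the team that manages the data.

```python
# Hypothetical export with cryptic field names and coded values.
raw_records = [
    {"CUST_NO": "00417", "TXN_CD": "RF", "AMT_1": "125.00", "FLG_7": "Y"},
    {"CUST_NO": "00933", "TXN_CD": "PU", "AMT_1": "40.50", "FLG_7": "N"},
]

# Hypothetical data dictionary: field name -> human-readable label.
data_dictionary = {
    "CUST_NO": "customer_number",
    "TXN_CD": "transaction_type",
    "AMT_1": "amount_usd",
}

# Hypothetical lookup tables for coded values, keyed by original field name.
code_values = {"TXN_CD": {"RF": "refund", "PU": "purchase"}}


def annotate(record, dictionary, codes):
    """Relabel documented fields and decode known codes; keep the rest as is."""
    annotated = {}
    for field, value in record.items():
        label = dictionary.get(field, field)
        annotated[label] = codes.get(field, {}).get(value, value)
    return annotated


annotated_records = [
    annotate(record, data_dictionary, code_values) for record in raw_records
]

# Fields with no dictionary entry still need follow-up with the data owners.
undocumented = sorted(
    {field for record in raw_records for field in record
     if field not in data_dictionary}
)

print(annotated_records[0])
print(undocumented)
```

Listing the undocumented fields separately gives the collection team a concrete set of questions to take back to whoever manages the system.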
Being aware of the above common mistakes helps eDiscovery practitioners prepare for the myriad challenges associated with data collection from nontraditional sources. As new technologies continue to emerge, eDiscovery practitioners must remain educated about the systems that store data relevant to their cases, and the issues that must be considered in order to obtain and produce that data or to avoid the costs of such collections and productions where appropriate.