Auto-Classification Aids Information Governance and Electronic Discovery
Machine learning allows computers to learn from data and act on it without being explicitly programmed for each task. Auto-classification, also known as auto-categorization and, depending on the application, as predictive coding or content analytics, uses computers to make valuable information more accessible and useful.
In most instances, computers accomplish this task faster and more accurately than people working without such software.
Auto-classification uses metadata and rules (algorithms) to recognize document types, engage in content analytics, and appropriately tag and index data, among other things. A basic example of auto-classification is an email filter that recognizes spam and separates it from valued email.
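The email filter mentioned above can be sketched as a handful of rules applied to message metadata. This is only an illustrative sketch, not how any particular product works: the `Message` fields, the `SPAM_WORDS` and `TRUSTED_DOMAINS` sets, and the `classify` function are all hypothetical names chosen for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical metadata record for one email."""
    sender: str
    subject: str


# Illustrative rule inputs (assumptions, not taken from any real filter).
SPAM_WORDS = {"winner", "free", "prize"}
TRUSTED_DOMAINS = {"example.com"}


def classify(msg: Message) -> str:
    """Return the tag "spam" or "inbox" based on simple metadata rules."""
    # Rule 1: mail from a trusted sender domain is always kept.
    domain = msg.sender.rsplit("@", 1)[-1].lower()
    if domain in TRUSTED_DOMAINS:
        return "inbox"
    # Rule 2: a subject line containing a known spam word is tagged spam.
    words = set(re.findall(r"[a-z]+", msg.subject.lower()))
    if words & SPAM_WORDS:
        return "spam"
    # Default: keep the message.
    return "inbox"
```

For example, `classify(Message("promo@offers.test", "You are a WINNER!"))` is tagged as spam, while a message from a trusted domain is kept even if its subject contains a listed word.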
Auto-Classification Advances in Information Governance
The use of auto-classification in records and information management arose in the early to mid-2000s, as information management professionals began to acknowledge that many record owners were neither quick nor accurate in deciding what information qualified as a record or in assigning disposal dates under the appropriate classification system.
In the mid-2000s, software developers suggested that computers could categorize information as accurately as humans, and more quickly. Advances in computer algorithms and processing speeds, coupled with the exponential growth of electronically created and stored records and information, mean that over the past decade computers have been able to categorize information correctly much faster than humans.
Given these developments, an effective information governance program should include a comprehensive data classification capability, combined with the effective, timely deletion of information. The benefit of using machine-learning-based auto-classification to sort data into types or categories is that software alone can analyze the data and do the work.
Auto-Classification in eDiscovery
The use of computers and software to correctly classify information has implications not only for effective information governance but also for eDiscovery. With respect to eDiscovery, the rise in the volume of electronically stored information, coupled with changes in the Federal Rules of Civil Procedure and the Federal Rules of Evidence regarding the admissibility of such information, has raised the stakes for accessing information accurately and efficiently.
In eDiscovery, auto-classification is sometimes called predictive coding or technology-assisted review. Regardless of the name, applying auto-classification in eDiscovery means teaching computers to assign tags to information, just as in information governance.
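To illustrate what teaching a computer to assign tags can look like, here is a minimal sketch of a naive Bayes text classifier trained on a few documents that a human reviewer has already coded. It is an assumption-laden toy, not the method of any particular review platform: the tags `responsive` and `not_responsive`, the sample texts, and the `train` and `predict` functions are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count words per tag from (text, tag) pairs coded by a reviewer."""
    tag_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, tag in examples:
        tag_counts[tag] += 1
        for word in tokenize(text):
            word_counts[tag][word] += 1
            vocab.add(word)
    return tag_counts, word_counts, vocab


def predict(model, text):
    """Return the tag with the highest naive Bayes log-probability."""
    tag_counts, word_counts, vocab = model
    total_docs = sum(tag_counts.values())
    best_tag, best_score = None, -math.inf
    for tag in sorted(tag_counts):
        # Prior: share of training documents carrying this tag.
        score = math.log(tag_counts[tag] / total_docs)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[tag].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[tag][word] + 1) / denom)
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag


# Hypothetical seed set coded by a human reviewer.
seed_set = [
    ("merger agreement draft attached for review", "responsive"),
    ("revised merger terms and purchase price", "responsive"),
    ("lunch order for friday team outing", "not_responsive"),
    ("parking garage closed this weekend", "not_responsive"),
]
model = train(seed_set)
```

With this seed set, `predict(model, "updated merger agreement")` yields `responsive` and `predict(model, "team lunch on friday")` yields `not_responsive`. A real review would use far more training documents and a validated model.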
Effective auto-classification software uses statistically valid sampling and quality control to achieve transparency and defensibility in eDiscovery and information governance. Appropriate use of software to manage corporate information can also significantly decrease the cost and risk associated with eDiscovery and potential discovery violations.
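One way to picture sampling-based quality control is to draw a random sample of machine-tagged documents, have a human reviewer check them, and report the agreement rate with a margin of error. The sketch below assumes a simple random sample and a normal-approximation (Wald) confidence interval; the function name `estimate_accuracy`, its parameters, and the `reviewer` callback are hypothetical, and a real protocol might instead measure recall or use a different interval.

```python
import math
import random


def estimate_accuracy(machine_tags, reviewer, sample_size, seed=0, z=1.96):
    """Estimate how often machine tags match a human reviewer.

    machine_tags: dict mapping document ID to the machine-assigned tag.
    reviewer: callable returning the human-assigned tag for a document ID.
    Returns (agreement_rate, lower_bound, upper_bound); z=1.96 gives an
    approximate 95% interval.
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    sample_ids = rng.sample(sorted(machine_tags), sample_size)
    agree = sum(1 for doc_id in sample_ids
                if machine_tags[doc_id] == reviewer(doc_id))
    rate = agree / sample_size
    # Wald margin of error for a proportion.
    margin = z * math.sqrt(rate * (1 - rate) / sample_size)
    return rate, max(0.0, rate - margin), min(1.0, rate + margin)
```

Recording the seed, the sample size, and the resulting interval is one way to document the check so that it can be repeated and explained later.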
The application of auto-classification improves the effectiveness and efficiency of both eDiscovery and information governance, and offers better data accessibility for business intelligence and compliance purposes.