Lessons in AI from the World’s Largest Repository of Document Images 

May 25, 2023

By Vin Vomero, CEO FoxyAI

First American Data & Analytics recently sat down with their very own VP of Data, Data Applications, and Data Science, Prabhu Narsina, to share the recent breakthroughs their team has made in information extraction through AI and ML.

I’ve pulled out a few key takeaways that can help us quickly see the impact these developments have on Real Estate transactions today – and for the foreseeable future. 

Meet First American 

First American Data & Analytics has over 8 billion document images – deeds, mortgages, foreclosures, and the like – from over 2,000 counties, making it the largest repository of document images available today.

They process around 300,000 new documents on a daily basis and need to wrangle these documents (each with varying levels of quality) in a way that makes the information readily available and easy to understand. 

The Problem? 

Before AI, these documents were processed using double-key-and-verify data entry, an expensive and slow method that captured only 40 – 50 form fields (out of 500+ available data elements) at a time.

Historically, valuable information has been left on the table. For decades, First American Data & Analytics has been trying to come up with a new way to make the data on these images usable.

Now, they’ve hit a breakthrough. 

Data Insights, Unlocked. 

Using a combination of ML and OCR (Optical Character Recognition), First American Data & Analytics can now capture up to 450 form fields in a single document. 

That’s up to a 1,025% increase in captured form fields compared to traditional methods, measured against the low end of the old 40 – 50 range.
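
For anyone who wants to check the math, here is a quick sketch of the percentage calculation. The field counts come straight from the figures above; the function name is just something I made up for illustration.

```python
def percent_increase(old, new):
    """Return the increase from old to new as a percentage of old."""
    return (new - old) / old * 100

# Field counts quoted in the article: 40 to 50 before, up to 450 now.
low_baseline = percent_increase(40, 450)
high_baseline = percent_increase(50, 450)

print(f"From 40 fields: {low_baseline:.0f}% increase")
print(f"From 50 fields: {high_baseline:.0f}% increase")
```

Measured from 40 fields the jump works out to 1,025%, and measured from 50 fields it is 800%, so the headline number is the best-case end of the range.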

How did they do it? 

First, they had to automate the process of identifying document types. They did so through multiple models and ultimately achieved document recognition accuracy greater than 96%, which rivals that of us humans. 
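
To make "multiple models" a little more concrete, here is a toy sketch of one common way to combine several classifiers: let each one vote and take the majority. This is not First American's method (the interview does not say how their models are combined), and the keyword lists, labels, and function names below are all hypothetical stand-ins for real trained models running on OCR text.

```python
from collections import Counter

def make_keyword_model(vocab):
    """Build a toy classifier that scores each label by keyword hits."""
    def model(text):
        lowered = text.lower()
        scores = {
            label: sum(lowered.count(term) for term in terms)
            for label, terms in vocab.items()
        }
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else "unknown"
    return model

# Hypothetical vocabularies; each "model" looks at a different kind of clue.
TITLE_TERMS = {
    "deed": ["warranty deed", "quitclaim deed"],
    "mortgage": ["mortgage", "deed of trust"],
    "foreclosure": ["notice of default", "notice of sale"],
}
PARTY_TERMS = {
    "deed": ["grantor", "grantee"],
    "mortgage": ["borrower", "lender"],
    "foreclosure": ["trustee", "beneficiary"],
}
ACTION_TERMS = {
    "deed": ["convey", "grant"],
    "mortgage": ["loan", "principal"],
    "foreclosure": ["default", "sale"],
}

MODELS = [make_keyword_model(v) for v in (TITLE_TERMS, PARTY_TERMS, ACTION_TERMS)]

def classify(text, models=MODELS):
    """Return the label that most of the models agree on."""
    votes = Counter(model(text) for model in models)
    return votes.most_common(1)[0][0]

notice = "NOTICE OF DEFAULT\nThe borrower is in default under the loan."
print(classify(notice))
```

In the sample notice, the party-based model votes "mortgage" because it sees a borrower, but the other two vote "foreclosure", so the majority wins. The appeal of voting is that one model's blind spot does not decide the outcome on its own.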

Then, they had to develop models for information extraction, starting with Natural Language Processing (NLP) and working up to bidirectional long short-term memory (BiLSTM) networks.

The Results. 

While the sheer volume of information to process is a challenge in its own right, it also has an upside: it enables the active learning part of ML. Now, First American can process hundreds of thousands of documents and images daily to extract the critical fields of information.
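
If active learning is new to you, the basic idea is that the model flags the documents it is least sure about, a human labels those, and the model retrains on the new labels. The sketch below shows one common selection strategy, uncertainty sampling. The interview does not say which strategy First American uses, so treat the function name, document IDs, and confidence scores as made-up illustrations.

```python
def select_for_review(predictions, budget):
    """Pick the `budget` documents the model is least confident about.

    predictions: list of (doc_id, confidence) pairs, confidence in [0, 1].
    """
    ranked = sorted(predictions, key=lambda pair: pair[1])
    return [doc_id for doc_id, _ in ranked[:budget]]

# Hypothetical model outputs for four incoming documents.
predictions = [
    ("doc-1", 0.99),
    ("doc-2", 0.52),
    ("doc-3", 0.71),
    ("doc-4", 0.95),
]

# Send the two shakiest predictions to a human reviewer.
print(select_for_review(predictions, budget=2))
```

With these scores, doc-2 and doc-3 go to a reviewer while the confident predictions pass straight through. The more documents that flow in each day, the more useful hard cases there are to learn from.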

In doing so with cloud architecture, First American is starting to see costs reduced by a factor of 10 – and the best is yet to come! 

To AI – and Beyond! 

I’ve previously shared that companies that adopt an Artificial Intelligence-first strategy will be better prepared to weather tough economic times, and First American is proving that to be true with this recent breakthrough. 

We’re delighted to spotlight the work of First American Data & Analytics and look forward to forging a better future through AI for all of us in Real Estate.