Artificial Intelligence and Optical Character Recognition in FinTech

Banking automation has boomed in recent years, with advances in 24/7 mobile banking, enhanced security and fraud detection, blockchain integration, big data analytics, and many other digital technologies. Artificial intelligence systems support both customer-facing operations and behind-the-scenes automation. However, because of the range of document types accepted and the varying rules and regulations across state and international lines, much document processing is still done manually.

Dr. Amar Gupta, a researcher at CSAIL, the Department of Electrical Engineering and Computer Science (EECS), and the Institute for Medical Engineering and Science (IMES) at MIT, is developing technologies and business processes that are capable of quickly and accurately digitizing and processing financial and other documents with zero or minimal human intervention.

In his work across fintech and healthcare, Dr. Gupta takes an integrated approach, encompassing not only financial and medical expertise but also input from engineers, computer scientists, lawyers, and policy makers. To deploy novel technologies in fields like fintech and healthcare, he adopts a knowledge-based framework that distinguishes between four levels of activity that should be considered for a society in the information age:

  1. Knowledge Acquisition
  2. Knowledge Discovery
  3. Knowledge Management
  4. Knowledge Dissemination

For example, Dr. Gupta said that when he came to the U.S., he had accounts at a bank that went through three successive rounds of mergers with other banks. Each time a merger happened, a great deal of money was spent integrating the account information.

“That’s one of the problems of data aggregation,” he said. “When you are doing things in the modern world, in a modern society, you really need access to information from many different areas. On one side you have this problem of data aggregation. The other side is this issue of data disintegration, which is reaching the data that you actually need. Data overload is what we are facing at this point.”

Each of the levels in his knowledge-based structure helps people parse through the massive amounts of data available, and can be further aided by technology for better interoperability between systems.

Check Processing

During the 1990s, Dr. Gupta and his students developed technology for reading the handwritten parts of checks with high speed and accuracy. They started using neural networks to read the amount accurately, especially the cents portion, which varies widely in how it is written.

While the preprinted account number and check number could already be processed automatically, the technology developed by Dr. Gupta’s research group was able to read the handwritten numbers in the courtesy amount block. This has substantially reduced the time and cost of reading checks, which matters at scale: 14.5 billion checks are processed each year in the U.S. alone.

Character recognition technology can be deployed to detect courtesy amounts of checks. This detection process involves three stages: preprocessing through the segmentation and normalization of numerals and punctuation, recognition of digits by neural networks, and postprocessing in which the algorithm verifies the accuracy of each digit.
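The three stages above can be sketched in a few lines of Python. This is only an illustrative outline, not the research group's implementation: the function names, the blank-column segmentation rule, the 0.9 confidence cutoff, and the "digits, point, two cent digits" amount format are all assumptions, normalization is omitted, and the neural network is represented by a placeholder callable supplied by the caller.

```python
import re
from decimal import Decimal
from typing import Callable, List, Optional, Tuple

Bitmap = List[List[int]]  # rows of pixels: 1 = ink, 0 = background
Classifier = Callable[[Bitmap], Tuple[str, float]]  # glyph -> (character, confidence)

# Assumed amount format: dollars, a decimal point, two cent digits.
AMOUNT_PATTERN = re.compile(r"\d+\.\d{2}")


def segment_columns(image: Bitmap) -> List[Bitmap]:
    """Preprocessing: split a binary image into glyphs at ink-free columns."""
    width = len(image[0]) if image else 0
    glyphs: List[Bitmap] = []
    start = None
    for x in range(width + 1):  # one extra step closes a glyph at the right edge
        has_ink = x < width and any(row[x] for row in image)
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            glyphs.append([row[start:x] for row in image])
            start = None
    return glyphs


def read_courtesy_amount(
    image: Bitmap, classify: Classifier, min_confidence: float = 0.9
) -> Optional[Decimal]:
    """Run segmentation, recognition, and postprocessing on one amount block."""
    glyphs = segment_columns(image)
    # Recognition: the classifier stands in for a trained neural network.
    predictions = [classify(glyph) for glyph in glyphs]
    text = "".join(char for char, _ in predictions)
    # Postprocessing: every character must be confident and the string well formed.
    confident = all(conf >= min_confidence for _, conf in predictions)
    if confident and AMOUNT_PATTERN.fullmatch(text):
        return Decimal(text)
    return None  # caller would route the check to manual review
```

Returning `None` rather than a best guess reflects one possible design choice: a check whose amount cannot be verified is handed to a person instead of being processed with an uncertain value.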

As part of this work, Dr. Gupta also proposed a nationwide, electronic check clearance system, an idea that was subsequently adopted through the Check Clearing for the 21st Century Act (Check 21) in the U.S.

Automated Document Processing

Through the FinTech@CSAIL Initiative, Dr. Gupta and Peter Szolovits, Professor of Computer Science and Engineering at MIT, are working on a project that uses AI and optical character recognition (OCR) in document processing, as well as investigating further neural network applications in finance.

While important, checks are just one type of document that financial institutions receive in non-electronic form. Other documents that must be processed include ID cards and driver’s licenses that vary by country and state, corporate documents, and faxed instructions for buying or selling shares and other transactions.

The challenges in reading a driver’s license or ID card, for example, are that many documents have different variants and that context often matters: ID cards are not standardized and are processed locally. To solve these issues, Dr. Gupta and Prof. Szolovits preprocess the image or scan to remove unnecessary color, designs, and noise. They then locate relevant text in a format-independent way and use a convolutional neural network (CNN) combined with OCR (or a service like Amazon Textract) to extract the data. For corporate documents such as company bylaws, which are also noisy and not standardized, they apply similar steps, preprocessing the scan and applying OCR, and also use vector embeddings to search for and locate key terms and phrases. This process can also be automated.
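To illustrate the idea of locating key phrases by vector similarity, here is a minimal sketch. It deliberately swaps in simple word-count vectors and cosine similarity in place of learned embeddings, so it shows only the search mechanics; the function names and the sample bylaw passages are hypothetical and do not come from the project.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Map text to a word-count vector (a stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_passage(query: str, passages: List[str]) -> str:
    """Return the passage whose vector is closest to the query vector."""
    query_vec = embed(query)
    return max(passages, key=lambda p: cosine(query_vec, embed(p)))


# Hypothetical OCR output from a set of company bylaws.
bylaws = [
    "The board of directors shall consist of five members.",
    "The fiscal year of the corporation ends on December 31.",
    "Shares may be transferred only with written consent.",
]
print(best_passage("board of directors", bylaws))
```

With learned embeddings, the same nearest-vector search could also match passages that use different wording for the same concept, which word counts cannot do.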

Some of these paper documents are handwritten and some are typed. Handwriting poses a challenge for machine-based reading because of variables like pen stroke direction and pressure. But typed and printed documents can also be hard for a machine to read. For example, many banks use shading and background images in these documents to reduce the incidence of fraud, and this adds further complexity that has to be accounted for in automated reading systems.

Similar to the check processing steps, the automated processing of information from paper documents to computers involves steps such as binarization of images, identification and extraction of strings of specific interest (e.g., postal code), segmentation of strings into individual characters, recognition of individual characters using artificial neural networks, and postprocessing to evaluate character accuracy.
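As a concrete example of the first step, binarization turns a grayscale scan into ink and background pixels. The sketch below uses Otsu's method, a common thresholding technique chosen here purely for illustration; the article does not say which method the researchers use, and the function names are assumptions.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Pick the 0-255 threshold that maximizes between-class variance."""
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += hist[level]
        sum_dark += level * hist[level]
        if weight_dark == 0:
            continue  # no pixels at or below this level yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold


def binarize(image: List[List[int]]) -> List[List[int]]:
    """Return 1 for ink (dark pixels) and 0 for background (light pixels)."""
    threshold = otsu_threshold([value for row in image for value in row])
    return [[1 if value <= threshold else 0 for value in row] for row in image]


scan = [[250, 20, 240], [30, 245, 25]]  # tiny grayscale sample, dark ink on light paper
print(binarize(scan))
```

A global threshold like this assumes fairly even lighting; documents with shading or background images, as described above, would likely need a more local or adaptive approach.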

Working with Data Sources to Standardize and Optimize Automation

The goal of the AI and OCR document processing project is to automate the reading of documents that have a standard format by first detecting where the areas of relevant text are, then using OCR to “read” the text, and finally using contextual knowledge to determine what the text holds (e.g., a phone number or a name). To optimize automation in document processing, the researchers are working with several data sources located in the U.S. and internationally, including banks; organizations that process forms they receive on paper; and collaborating organizations that receive forms and spec sheets in textual, pictorial, and other formats from other organizations.
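The final step, deciding what a piece of recognized text holds, can be approximated with simple pattern rules. The following is a minimal sketch under stated assumptions: the field labels, the U.S.-style phone, date, and ZIP code formats, and the capitalized-words rule for names are all illustrative choices, not the project's actual contextual model.

```python
import re

# Ordered rules: the first pattern that matches the whole string wins.
# All formats are assumptions (U.S.-style) made for this example.
FIELD_PATTERNS = [
    ("phone", re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")),
    ("date", re.compile(r"\d{1,2}/\d{1,2}/\d{4}")),
    ("zip_code", re.compile(r"\d{5}(?:-\d{4})?")),
    ("name", re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)+")),
]


def classify_field(text: str) -> str:
    """Guess what kind of value an OCR'd string holds."""
    cleaned = text.strip()
    for label, pattern in FIELD_PATTERNS:
        if pattern.fullmatch(cleaned):
            return label
    return "unknown"


for sample in ["(617) 555-0123", "03/14/2021", "02139", "Jane Doe", "???"]:
    print(sample, "->", classify_field(sample))
```

Rules like these are brittle for international documents, which is one reason a production system might combine them with learned models and knowledge of where each field sits on the page.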

One U.S. company receives many company certificates to process, with an individualized template for each state. Working with this company’s financial data, Dr. Gupta and Prof. Szolovits are helping with its current program, which lets a user crop boxes on a template so that the boxed regions can be analyzed for relevant text. Next, they are working to analyze the scalability of this process and look into CNNs if necessary. The automatic creation of templates will open up opportunities to easily read, extract, and interpret information from tax exemption forms and certificates.
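A template of cropped boxes can be represented as a mapping from field names to pixel rectangles, with OCR output filtered by position. The sketch below assumes the OCR step yields words with center coordinates; the `Word` type, the field names, and the coordinates are hypothetical, and the company's actual program is not described in enough detail to reproduce.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class Word:
    """One OCR'd word and the center point of its bounding box."""
    text: str
    x: int
    y: int


def extract_fields(template: Dict[str, Box], words: List[Word]) -> Dict[str, str]:
    """Collect, for each template box, the words whose centers fall inside it."""
    fields: Dict[str, str] = {}
    for name, (left, top, right, bottom) in template.items():
        inside = [w for w in words if left <= w.x < right and top <= w.y < bottom]
        # Simplified reading order: top to bottom, then left to right.
        inside.sort(key=lambda w: (w.y, w.x))
        fields[name] = " ".join(w.text for w in inside)
    return fields


# Hypothetical template for one state's certificate layout.
template = {"company": (0, 0, 200, 50), "state": (0, 50, 200, 100)}
words = [
    Word("Corp", 120, 20),
    Word("Acme", 30, 20),
    Word("Delaware", 40, 70),
    Word("Page", 300, 300),  # outside every box, so it is ignored
]
print(extract_fields(template, words))
```

Because each state needs its own dictionary of boxes, this representation also makes the scalability question concrete: every new layout requires a new template unless the boxes can be generated automatically.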

For banks located outside the U.S., which hope to quickly process supervisory letters that are in a text-based PDF format, the researchers aim to extract data points such as bank name, risk rating, and other simple information; identify supervisory issues and the bank’s response to them; and ideally identify common language and tone elements across ratings and supervisors.

Document Processing for FinTech and Beyond

Dr. Gupta, Prof. Szolovits, and their team hope to use AI, OCR, neural networks, and contextual knowledge to extract information digitally from a wide range of documents and to analyze raw information from a specific vantage point, without the need for human eyes on every paper that comes in. They are designing a robust preprocessing pipeline to prepare documents for OCR and implementing reinforcement learning to improve the preprocessing decisions made for each document type.

As many institutions shift toward serving more customers around the globe, automation in fintech and healthcare and the application of Dr. Gupta’s knowledge-based structure will become increasingly relevant.

“In the case of document processing, we are looking at mainly financial documents right now, but we are also looking at some medical documents,” said Dr. Gupta. “We’re looking at online systems, how we can use the AI to combine them, and how we can do forecasting in the current environment so that we can optimize the resources that are needed to fight COVID-19.”

In addition, some organizations have broader needs that are related to doing business during the COVID-19 pandemic and beyond. Dr. Gupta is working on subprojects to meet these needs, including automated document processing for the supply chain side of the business, predictive modeling, and processing product specifications.