# Intelligent Document Processing (IDP)

**Intelligent document processing** (IDP) is a technology that automatically recognizes and extracts valuable data from diverse documents, such as scanned forms, PDF files, and emails, and transforms it into the desired format. It is also referred to as Cognitive Document Processing, Intelligent Document Recognition, or Intelligent Document Capture.\
\
Whatever the name, there are numerous reasons to implement such software, including:

* elimination of manual intervention in document-driven workflows;
* improved data quality and reliability, as errors introduced by manual data entry are avoided; and
* reduction in document processing time, resulting in lower operational costs.

IDP is often combined with other technologies employed to automate mundane business tasks, namely [Robotic Process Automation](https://www.altexsoft.com/blog/robotic-process-automation/) (RPA) and Optical Character Recognition (OCR). Let’s see how all three work together and what enables the “intelligent” part of the system.

### The key stages of the IDP process <a href="#the-key-stages-of-the-idp-process" id="the-key-stages-of-the-idp-process"></a>

<figure><img src="/files/wdw21DkN7ejcDnULFa67" alt=""><figcaption></figcaption></figure>

#### Document classification <a href="#document-classification" id="document-classification"></a>

This phase divides documents into categories by structure, content, and/or type. It also involves detecting where each document begins and ends.\
\
AI-driven document classification can be performed:

* based on image patterns, with the help of computer vision algorithms — in the case of scans or document pictures; and
* based on the textual content, using NLP techniques — in the case of electronic documents.

Document classification speeds up the subsequent extraction step because the data from each document is routed to the right workflow sooner.
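To make the text-based route concrete, here is a minimal sketch of classifying an electronic document by its content. It is a simple keyword-scoring stand-in for the NLP techniques mentioned above, not a trained model; the categories, the keyword lists, and the `classify_document` function are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical keyword lists (an assumption for this sketch); a real system
# would typically learn such signals from labeled documents.
CATEGORY_KEYWORDS = {
    "invoice": {"invoice", "amount", "due", "total", "payment"},
    "resume": {"experience", "education", "skills", "employment"},
    "contract": {"agreement", "party", "parties", "term", "hereby"},
}


def classify_document(text: str) -> str:
    """Return the category whose keywords occur most often in the text."""
    # Count lowercase alphabetic tokens in the document.
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    # Score each category by the total frequency of its keywords.
    scores = {
        category: sum(tokens[word] for word in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best_category, best_score = max(scores.items(), key=lambda item: item[1])
    # Fall back to "unknown" when no keyword matched at all.
    return best_category if best_score > 0 else "unknown"
```

A call such as `classify_document("Invoice #42: total amount due")` would route the document to an invoice workflow, while text with no matching keywords falls through to `"unknown"` so it can be reviewed manually.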

#### Data extraction <a href="#data-extraction" id="data-extraction"></a>

The most critical step comes once document classification is finished: extracting the important data from the documents.

<figure><img src="/files/ubtrNR6nas0JwmVeBpCq" alt=""><figcaption><p><em>An illustration of how IDP can turn unstructured documents into a standardized structured format</em></p></figcaption></figure>

First, IDP relies on OCR, which extracts text from images, scanned documents, and PDF files and converts it into machine-readable digital output.

Then NLP tools identify the type of each extracted value, such as dates, figures, and names. In addition, ML-trained models can be used to make data consistent (e.g., $5 instead of 5 dollars), correct common misspellings, transform data into a standard output format, and more.
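The typing and normalization step can be illustrated with a small rule-based sketch. It uses regular expressions and date parsing as a stand-in for the NLP and ML models described above; the function names, the supported date formats (including the day-first reading of `dd/mm/yyyy`), and the US-dollar-only amount pattern are assumptions made for this example.

```python
import re
from datetime import datetime

# Assumed amount spellings: "$5", "$ 5", "5 dollars", "12.50 USD".
AMOUNT_PATTERN = re.compile(
    r"\$\s?(\d+(?:\.\d{1,2})?)|(\d+(?:\.\d{1,2})?)\s?(?:dollars|usd)",
    re.IGNORECASE,
)
# Assumed input date formats; day-first is an arbitrary choice for this sketch.
DATE_FORMATS = ("%d/%m/%Y", "%B %d, %Y", "%Y-%m-%d")


def normalize_amount(text):
    """Return the first dollar amount in text as '$<value>', or None."""
    match = AMOUNT_PATTERN.search(text)
    if match is None:
        return None
    return f"${match.group(1) or match.group(2)}"


def normalize_date(text):
    """Return an ISO 8601 date if text matches a known format, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def tag_field(raw):
    """Label a raw OCR value with a type and its normalized form."""
    date = normalize_date(raw)
    if date is not None:
        return ("date", date)
    amount = normalize_amount(raw)
    if amount is not None:
        return ("amount", amount)
    # Anything unrecognized is kept as plain text.
    return ("text", raw.strip())
```

With these rules, both `"5 dollars"` and `"$ 5"` normalize to `"$5"`, which mirrors the consistency example in the paragraph above.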


