Set Up Text Extraction – Vector

Upon ingestion of a document into the Vector System, Vector's Imaging Module can automatically extract text from the document and map it to certain available fields. To set up text extraction, follow the steps below.

First, you'll need to have automatic Document Type assignment set up. Follow the setup steps here.

Set Up Text Extraction

1. Under the Templates tab within Imaging, select the template that you want to set up Text Extraction for.

2. On the left side of this page, you will see the document you uploaded, covered with a glowing orange box.

On the right, you will see the Template Properties. Because you've already set up automatic document type assignment, Document Type and Template Name are filled in automatically. Match Threshold is the percentage by which an incoming image must match the template to count as a match. By default, Vector sets the match threshold to 40%. You may need to increase this value to avoid false positives.
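To illustrate how a match threshold behaves, here is a minimal Python sketch. It is not Vector's implementation: the function name matches_template, the 0-to-1 similarity score, and the "greater than or equal" comparison are all assumptions made for illustration. Only the 40% default comes from this article.

```python
def matches_template(similarity: float, threshold: float = 0.40) -> bool:
    """Return True when a similarity score meets the match threshold.

    Hypothetical helper: the 0..1 score scale and the >= comparison are
    assumptions; only the 40% default is taken from the article.
    """
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be between 0 and 1")
    return similarity >= threshold


# A weak look-alike scoring 45% passes at the default threshold
# but is rejected once the threshold is raised to 60%.
print(matches_template(0.45))        # True
print(matches_template(0.45, 0.60))  # False
```

Raising the threshold trades false positives for documents that stay unmatched, so increase it gradually and retest.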

Fill out the fields under Property 1:

Label - This is the property that the extracted value maps to. Unless you have customizations on your account, you will only be able to select Name.
Path - This is the actual JSON schema path to the property. Once you select Name, document.name will automatically appear as the path.
Default Value (optional) - This allows you to identify the template used to auto-image the Document. Along with the extracted value, this additional value will be added to the load number.
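As a rough picture of what a path such as document.name refers to, the sketch below writes an extracted value into a nested structure by following a dotted path. The helper set_by_path, the sample value, and the assumption that each dot-separated segment is a nested key are hypothetical; Vector's internal data model is not described in this article.

```python
def set_by_path(record: dict, path: str, value) -> dict:
    """Store value in a nested dict at a dotted path such as 'document.name'.

    Hypothetical helper for illustration; it assumes each dot-separated
    segment of the path is a nested key.
    """
    keys = path.split(".")
    node = record
    for key in keys[:-1]:
        # Create intermediate objects as needed.
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return record


# "12345" stands in for a value extracted from the document.
record = set_by_path({}, "document.name", "12345")
print(record)  # {'document': {'name': '12345'}}
```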

3. Now, let's train Vector on where to extract text from the document you uploaded. To begin, here are the terms you will need to know:

Relative Label: Highlighted in blue, these labels should appear in every instance of the document.
- Try to choose unique labels on the document
- Try to choose labels with long strings
Extraction Area: Highlighted in orange, this text is dynamic and will be extracted.
Border: The border of the orange Extraction Area that you are defining. You do not need to define all four borders.
Edge: The edge of the blue Relative Label that you are using to set that border.

First, draw a rectangle over a piece of text near the area you wish to extract.

Example: In the screenshot below, a rectangle (blue Relative Label) was drawn over Shipper No.

Under Property 1, a new row appeared that correlates with the drawn rectangle.

On the left, Left border was selected to define the left border of the orange Extraction Area.
In the middle, Shipper No. is the text that was extracted, which identifies the blue Relative Label.
On the right, Right edge was selected, so the right edge of the blue Relative Label, Shipper No., becomes the left border of the orange Extraction Area.

4. Draw a second rectangle to further define the area you wish to extract.

Example: In the screenshot below, a rectangle (blue Relative Label) was drawn over STRAIGHT BILL OF LADING.

Under Property 1, another row appeared that correlates with the drawn rectangle.

On the left, Top border was selected to define the top border of the orange Extraction Area.
In the middle, STRAIGHT BILL OF LADING is the text that was extracted, which identifies the blue Relative Label.
On the right, Top edge was selected, so the top edge of the blue Relative Label, STRAIGHT BILL OF LADING, becomes the top border of the orange Extraction Area.

5. Draw a third rectangle to further define the area you wish to extract.

Example: In the screenshot below, a rectangle (blue Relative Label) was drawn over Shipper No. again.

Under Property 1, another row appeared that correlates with the drawn rectangle.

On the left, Bottom border was selected to define the bottom border of the orange Extraction Area.
In the middle, Shipper No. is the text that was extracted, which identifies the blue Relative Label.
On the right, Bottom edge was selected, so the bottom edge of the blue Relative Label, Shipper No., becomes the bottom border of the orange Extraction Area.

If needed, feel free to draw an additional rectangle to further define the area you wish to extract.
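The border and edge selections in steps 3 through 5 can be sketched as simple rectangle arithmetic. The Python below is an illustration only, not Vector's algorithm. The Rect type, the extraction_area function, the pixel coordinates, the page size, and the choice to let any undefined border fall back to the page edge are all assumptions. Coordinates use a top-left origin with y increasing downward.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: int
    top: int
    right: int
    bottom: int


def extraction_area(rules, labels, page):
    """Build the extraction rectangle from (border, label, edge) rules.

    Each rule says: this border of the extraction area equals that edge
    of the named relative label. Borders with no rule default to the
    page edge (an assumption made for this sketch).
    """
    bounds = {"left": page.left, "top": page.top,
              "right": page.right, "bottom": page.bottom}
    for border, label_name, edge in rules:
        bounds[border] = getattr(labels[label_name], edge)
    return Rect(**bounds)


# Made-up label positions on a made-up 612 x 792 page.
labels = {
    "Shipper No.": Rect(left=400, top=50, right=480, bottom=65),
    "STRAIGHT BILL OF LADING": Rect(left=150, top=40, right=380, bottom=60),
}
rules = [
    ("left", "Shipper No.", "right"),             # step 3
    ("top", "STRAIGHT BILL OF LADING", "top"),    # step 4
    ("bottom", "Shipper No.", "bottom"),          # step 5
]
area = extraction_area(rules, labels, Rect(0, 0, 612, 792))
print(area)  # Rect(left=480, top=40, right=612, bottom=65)
```

Because the borders are defined relative to the labels, the extraction area moves with the labels when a scan is slightly shifted, which is why unique, stable labels make the best anchors.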

6. Once you have reviewed the rectangles that you have drawn and the borders/edges that you have set, click SAVE to save your template.

Test Auto-Indexing

To test whether a document auto-indexes, ingest a file into the Imaging Module that is the same Document Type as the template you set up.

If Vector's Imaging Module is able to match and read the document, it will automatically assign the document type and extract the text fields, changing the status of the document to "Indexed".

If any part of the matching and text extraction fails, the document will stay in a "Queued" status until you manually assign a document type and/or text field.
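The two outcomes above amount to a simple rule, sketched below in Python. The function and parameter names are hypothetical; only the "Indexed" and "Queued" status names come from this article.

```python
def resulting_status(template_matched: bool, fields_extracted: bool) -> str:
    """Return the document status described in this article.

    Hypothetical helper: a document is "Indexed" only when both the
    template match and the text extraction succeed; otherwise it stays
    "Queued" until handled manually.
    """
    if template_matched and fields_extracted:
        return "Indexed"
    return "Queued"


print(resulting_status(True, True))   # Indexed
print(resulting_status(True, False))  # Queued
```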

Not working? Try re-defining the field, or create a template from a cleaner, clearer file. Once complete, click into the Queued image, click the three dots on the top right corner of the screen, and select Reimage. Wait to see if the Document Type and Text Extraction field(s) populate this time around.

Make sure to publish the document once indexed.

Do you want a multi-page template? We do not support multi-page text extraction templates at this time. If you are attempting to extract information from subsequent pages, you'll need to create a new template and image that page as its own document.
