Hero image for From RPA to IPA: When Robots Learn to Think with AI

From RPA to IPA: When Robots Learn to Think with AI

RPA UiPath AI Machine Learning Document Understanding LLM

Traditional RPA has a limitation: it can only follow explicit rules.

“If the button says ‘Submit’, click it.” “If the cell value is greater than 1000, flag it.” “If the email subject contains ‘Invoice’, process it.”

But what about:

  • “Read this invoice” (that has a completely different layout than yesterday’s)?
  • “Find the submit button” (when there’s no stable selector)?
  • “Classify this customer complaint” (from free-text email)?

These require cognitive capabilities. This is where AI enters the picture, transforming RPA into IPA (Intelligent Process Automation).


The AI/ML Opportunity in RPA

Traditional RPAAI-Enhanced RPA
Follows fixed rulesRecognizes patterns
Needs structured dataHandles unstructured data
Breaks on variationAdapts to variation
Explicit programmingLearning from examples

Where AI Adds Value

graph TB
    subgraph AI["AI Capabilities in RPA"]
        subgraph DU["Document Understanding"]
            DU1["Invoice extraction"]
            DU2["Contract parsing"]
            DU3["Form processing"]
        end
        subgraph CV["Computer Vision"]
            CV1["Find UI elements"]
            CV2["Screen recognition"]
            CV3["Image matching"]
        end
        subgraph NL["Natural Language"]
            NL1["Email parsing"]
            NL2["ChatGPT queries"]
            NL3["Text generation"]
        end
        subgraph SC["Sentiment & Classification"]
            SC1["Ticket routing"]
            SC2["Priority scoring"]
            SC3["Intent detection"]
        end
    end

Document Understanding

The crown jewel of AI-RPA integration. Document Understanding (DU) can extract data from documents even when layouts vary.

[!NOTE] UiPath-Specific: This section covers UiPath’s Document Understanding framework. Concepts apply similarly to other platforms (ABBYY, AWS Textract) but API/activity names are UiPath-specific.

OCR vs Document Understanding: The Semantic Layer

Traditional OCR = “What text is on this page?” Document Understanding = “What does this text mean?” (semantic layer)

graph TB
    subgraph DU["DU Architecture Layers"]
        S["SEMANTIC LAYER (DU)<br/>'This is an invoice number' 'This is the total'<br/>Understands MEANING based on context"]
        O["OCR LAYER (Digitization)<br/>'INV-2024-0001' '$1,234.56' 'Acme Corp'<br/>Reads TEXT but doesn't understand"]
        D["DOCUMENT (Image/PDF)"]
    end
    S -->|builds upon| O
    O --> D

The Problem with Traditional OCR Alone

Traditional approach:

  1. OCR extracts all text
  2. Use RegEx to find patterns
  3. Pray the position doesn’t change

This breaks when:

  • Different vendors use different invoice formats
  • The same vendor changes their template
  • Handwritten annotations appear

Step 0: Taxonomy (The Foundation)

[!NOTE] Critical: Everything in DU starts with Taxonomy. Without defining Taxonomy first, Classify and Extract have no reference point.

Taxonomy defines:

  • Document Types you want to process (Invoice, Purchase Order, Contract…)
  • Fields to extract from each type (InvoiceNumber, VendorName, Amount…)
  • Field Data Types (String, Date, Currency, Table…)
// Example Taxonomy Structure (taxonomy.json)
{
  "documentTypes": [
    {
      "typeName": "Invoice",
      "fields": [
        { "name": "InvoiceNumber", "type": "string" },
        { "name": "InvoiceDate", "type": "date" },
        { "name": "VendorName", "type": "string" },
        { "name": "TotalAmount", "type": "currency" },
        { "name": "LineItems", "type": "table", 
          "columns": ["Description", "Quantity", "UnitPrice", "Amount"] }
      ]
    },
    {
      "typeName": "PurchaseOrder",
      "fields": [...]
    }
  ]
}
' Load Taxonomy in UiPath
Activity: Load Taxonomy
├── TaxonomyPath: "taxonomy.json"
└── Output: documentTaxonomy

' Use throughout DU pipeline
' → Classify uses taxonomy to know what document types exist
' → Extract uses taxonomy to know what fields to look for

Document Understanding Pipeline

╔════════════╗   ╔════════════╗   ╔════════════╗   ╔════════════╗   ╔════════════╗
│  TAXONOMY  │══?│  DIGITIZE  │══?│  CLASSIFY  │══?│  EXTRACT   │══?│  VALIDATE  │
└──══════════╝   └──══════════╝   └──══════════╝   └──══════════╝   └──══════════╝
     │                │                │                │                │
     →                →                →                →                →
  Define doc      OCR engine       ML classifier    ML extractor    Human review
  types & fields   reads text       identifies doc   finds fields    catches errors
                                    type (invoice,   (amount, date,
                                    PO, contract)    vendor)

Step 1: Digitization

Convert document to machine-readable format:

Activity: Digitize Document
├── DocumentPath: "C:\Incoming\invoice.pdf"
├── OCR Engine: UiPath Document OCR
├── Languages: ["en", "zh-tw"]  ' Support multiple languages
└── Output: 
    ├── DocumentText: fullText
    ├── DocumentObjectModel: dom
    └── Pages: pageArray

Step 2: Classification

Determine document type:

Activity: Classify Document Scope
├── DocumentObjectModel: dom
├── ClassifierEndpoint: "https://du.uipath.com/classifier/invoice-model"
└── Activities:
    ├── Intelligent Keyword Classifier
    │   └── Keywords: {"Invoice": ["invoice", "bill", "amount due"],
"PurchaseOrder": ["PO", "purchase order"]}
    ├── Machine Learning Classifier
    │   └── Model: "invoice_po_classifier_v2"
    └── Present Classification Station (if confidence < 80%)

Output: documentType, confidence

Step 3: Data Extraction

Extract specific fields:

Activity: Data Extraction Scope
├── DocumentObjectModel: dom
├── DocumentType: documentType
└── Activities:
    ├── ML Extractor
    │   └── Model: "invoice_extractor_v3"
    │   └── Fields: [InvoiceNumber, InvoiceDate, VendorName, 
    │                LineItems, SubTotal, Tax, Total]
    ├── RegEx Extractor (backup for specific patterns)
    │   └── Patterns: {
"InvoiceNumber": "INV-\d{4}-\d{6}",
"Date": "\d{2}/\d{2}/\d{4}"
    │   }
    └── Form Extractor (for fixed-position fields)

Output: extractedData (ExtractionResult)

Step 4: Human Validation

For low-confidence extractions:

Activity: Present Validation Station
├── DocumentObjectModel: dom
├── ExtractionResults: extractedData
├── ValidationStationUrl: "https://validation.company.com"
└── Output: validatedData

' Robot pauses until human reviews and confirms
' Validated data is then used for training to improve model

Training Custom Models

When out-of-box models aren’t enough:

  1. Collect samples: 50-100 documents per type
  2. Label data: Use AI Center to annotate fields
  3. Train model: AI Center builds custom ML model
  4. Deploy: Publish to production endpoint
  5. Improve: Use validation corrections as new training data

2026 Trend: Generative Extraction (Zero-Shot)

[!TIP] When Training is Overkill: For one-off or highly unstructured documents (resumes, contracts, press releases), use Generative Extractor instead of training custom models.

How It Works:

Traditional ML Model          Generative Extractor
══════════════════════          ═════════════════════
                              
1. Collect 50+ samples        1. Write a prompt
2. Label fields manually      2. Done ✓
3. Train for hours           
4. Deploy model              
5. Retrain when formats change

Time: Days to weeks           Time: Minutes

Generative Extractor Prompt Example:

' Extract from resume without any training
prompt = "
Extract the following from this resume:
- Full Name
- Email
- Phone
- Skills (as array)
- Highest Education (degree and school)
- Years of Experience

Respond in JSON format.
"

result = GenerativeExtractor.Extract(documentText, prompt)

When to Use What:

ScenarioTraditional MLGenerative Extractor
High-volume (1000+ docs/day)✓ Cost-efficient✗ Too expensive
Fixed formats (invoices)✓ Faster inference~ Overkill
One-time extraction✗ Training overhead✓ Zero-shot
Non-standard docs (contracts)~ Hard to train✓ Natural language
Accuracy critical (finance)✓ Controllable~ Prompt-dependent

[!CAUTION] Cost Warning: Generative Extraction uses LLM API calls per document. At 0.03/1Ktokens,processing1000contractscouldcost0.03/1K tokens, processing 1000 contracts could cost 30-100. For recurring high-volume tasks, train a custom model instead.


Case Study: Training a Custom Invoice Extraction Model

[!IMPORTANT] When to Train Custom Models:

  • Out-of-box models accuracy < 80% for your document types
  • Highly specialized formats (industry-specific, internal forms)
  • Non-English documents with complex layouts
  • Need to extract custom fields not in standard models

Step 1: Prepare Your Dataset

Minimum Requirements:

Document TypeMinimum SamplesRecommended
Single layout (one vendor)20 documents50+
Multiple layouts (multi-vendor)50 documents100+ per layout
Complex forms100 documents200+

Quality Matters:

✓ Good Training Data:
   - Clear scans (300 DPI or higher)
   - Variety of real-world samples
   - Include edge cases (handwritten, stamps, corrections)

✗ Bad Training Data:
   - All from same day/batch (no variety)
   - Only "perfect" samples (won't handle real-world noise)
   - Synthetic or template documents only

[!IMPORTANT] Data Stratification: The Secret to Robust Models
Your training set MUST include “dirty” data to build robustness:

Data Type% of Training SetWhy
Clean scans40%Baseline quality
Mobile photos20%Real-world capture
Skewed/rotated15%Handling misalignment
Low light/shadows10%Lighting variation
Faxes/low resolution10%Legacy document handling
Handwritten annotations5%Human modifications

A model trained only on perfect scans will fail spectacularly when it encounters a photo taken under fluorescent lighting.

Step 2: AI Center Project Setup

╔═════════════════════════════════════════════════════════════════╗
│                    AI Center Project Structure                   │
├──═══════════════════════════════════════════════════════════════╣
│                                                                  │
│   Project: InvoiceProcessing_AP                                 │
│   │                                                              │
│   ├── Datasets                                                   │
│   │   ├── TrainingSet_v1 (80 invoices)                          │
│   │   ├── ValidationSet_v1 (20 invoices)                        │
│   │   └── TestSet_v1 (10 invoices - never trained on)           │
│   │                                                              │
│   ├── Labeling Sessions                                          │
│   │   ├── Session_2024Q1_Batch1                                 │
│   │   └── Session_2024Q1_Batch2                                 │
│   │                                                              │
│   ├── ML Packages                                                │
│   │   ├── invoice_extractor_v1.0 (initial)                      │
│   │   ├── invoice_extractor_v1.1 (improved)                     │
│   │   └── invoice_extractor_v2.0 (current production)           │
│   │                                                              │
│   └── ML Skills (Deployed Models)                                │
│       └── InvoiceExtractor_Prod                                 │
│                                                                  │
└──═══════════════════════════════════════════════════════════════╝

Step 3: Data Labeling in AI Center

Labeling Interface Workflow:

╔═════════════════════════════════════════════════════════════════╗
│                  Document Labeling Process                       │
├──═══════════════════════════════════════════════════════════════╣
│                                                                  │
│   ╔═════════════════════════════════════════════════════════╗   │
│   │          ORIGINAL DOCUMENT (PDF/Image)                   │   │
│   │   ╔═════════════════════════════════════════════════╗   │   │
│   │   │  INVOICE                                         │   │   │
│   │   │  Acme Corporation                     INV-12345  │   │   │
│   │   │  ═══════════════════════════════════════════════ │   │   │
│   │   │  Invoice Date: 2024-01-15                       │   │   │
│   │   │  Due Date: 2024-02-15                           │   │   │
│   │   │                                                  │   │   │
│   │   │  Item        Qty    Price      Amount           │   │   │
│   │   │  Widget A     10    $50.00     $500.00          │   │   │
│   │   │  Widget B      5    $30.00     $150.00          │   │   │
│   │   │  ═══════════════════════════════════════════════ │   │   │
│   │   │  Subtotal                       $650.00          │   │   │
│   │   │  Tax (10%)                       $65.00          │   │   │
│   │   │  TOTAL                          $715.00          │   │   │
│   │   └──═══════════════════════════════════════════════╝   │   │
│   └──═══════════════════════════════════════════════════════╝   │
│                              →                                   │
│   ╔═════════════════════════════════════════════════════════╗   │
│   │           LABELED OUTPUT (What You Create)               │   │
│   │                                                          │   │
│   │   Field: VendorName      → [Acme Corporation]           │   │
│   │   Field: InvoiceNumber   → [INV-12345]                  │   │
│   │   Field: InvoiceDate     → [2024-01-15]                 │   │
│   │   Field: DueDate         → [2024-02-15]                 │   │
│   │   Field: Subtotal        → [$650.00]                    │   │
│   │   Field: Tax             → [$65.00]                     │   │
│   │   Field: Total           → [$715.00]                    │   │
│   │   Table: LineItems       →                              │   │
│   │       Row 1: [Widget A] [10] [$50.00] [$500.00]        │   │
│   │       Row 2: [Widget B] [5]  [$30.00] [$150.00]        │   │
│   │                                                          │   │
│   └──═══════════════════════════════════════════════════════╝   │
│                                                                  │
└──═══════════════════════════════════════════════════════════════╝

Labeling Best Practices:

PracticeWhy It Matters
Be consistentLabel “Invoice #” and “Invoice Number” the same way
Include contextSelect currency symbols with amounts (“$500” not “500”)
Handle edge casesLabel even partially visible or crossed-out fields
Multi-page handlingLabel fields even if split across pages

Step 4: Train the Model

Training Configuration:

# AI Center Training Pipeline
Pipeline:
  BaseModel: "UiPath.ExtractiveDocumentML"
  Version: "3.0"
  
Training:
  Dataset: "TrainingSet_v1"
  ValidationSplit: 0.2
  Epochs: 50
  EarlyStoppingPatience: 10
  
Hyperparameters:
  LearningRate: 0.001
  BatchSize: 4
  AugmentationEnabled: true
  
Output:
  PackageName: "invoice_extractor"
  Version: "1.0.0"

Training Metrics to Monitor:

╔═════════════════════════════════════════════════════════════════╗
│                 Training Progress Dashboard                      │
├──═══════════════════════════════════════════════════════════════╣
│                                                                  │
│  Epoch 45/50                                                     │
│  →→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→ 90%         │
│                                                                  │
│  Field-Level Accuracy:                                           │
│  ════════════════════                                            │
│  InvoiceNumber   ████████████████████████ 98.5%                 │
│  VendorName      ███████████████████████? 95.2%                 │
│  TotalAmount     ████████████████████████ 99.1%                 │
│  InvoiceDate     ██████████████████████?? 92.3%                 │
│  LineItems       ████████████████████???? 87.6%  → Needs more   │
│                                                                  │
│  Overall Extraction Accuracy: 94.5%                              │
│                                                                  │
└──═══════════════════════════════════════════════════════════════╝

Step 5: Evaluate & Deploy

Evaluation Checklist:

' Test on held-out test set (never seen during training)
testResults = EvaluateModel("invoice_extractor_v1.0", "TestSet_v1")

' Check metrics
If testResults.OverallAccuracy < 0.90 Then
    ' Need more training data or hyperparameter tuning
    Log.Warn("Model accuracy below threshold, not ready for production")
ElseIf testResults.LowestFieldAccuracy < 0.85 Then
    ' Specific field needs more examples
    Log.Warn($"Field {testResults.LowestField} needs improvement")
Else
    ' Ready to deploy
    Log.Info("Model passed evaluation, deploying to production")
End If

Deployment to ML Skill:

AI Center → ML Skills → Create Skill
├── Package: invoice_extractor_v1.0
├── Skill Name: InvoiceExtractor_Prod
├── GPU: Enable (for faster inference)
├── Replicas: 2 (for high availability)
└── Auto-scaling: Min 1, Max 5

Step 6: Use in UiPath Workflow

Activity: Data Extraction Scope
├── DocumentObjectModel: dom
├── DocumentType: "Invoice"
└── Activities:
    ├── ML Extractor
    │   ├── Endpoint: "https://aicenter.company.com/mlskills/InvoiceExtractor_Prod"
    │   ├── ApiKey: {{GetCredential("AICenter_APIKey")}}
    │   └── Fields: [All from Taxonomy]
    └── RegEx Extractor (fallback for low confidence)

Output: extractedData

' Access extracted values
vendorName = extractedData.GetField("VendorName").Value
invoiceNumber = extractedData.GetField("InvoiceNumber").Value
totalAmount = extractedData.GetField("TotalAmount").Value
lineItems = extractedData.GetField("LineItems").Table

Step 7: Continuous Improvement

[!TIP] Human Validation = Free Training Data
Every correction made in Validation Station becomes labeled data for retraining.

╔═════════════════════════════════════════════════════════════════╗
│            Continuous Improvement Feedback Loop                  │
├──═══════════════════════════════════════════════════════════════╣
│                                                                  │
│                     ╔═════════════════╗                         │
│              ╔═════?│ Production Bot  │                         │
│              │      │   (v1.0)        │                         │
│              │      └──══════┬════════╝                         │
│              │               │                                   │
│              │               →                                   │
│   ╔══════════┼═══════╗  ╔════════════╗                         │
│   │  Retrain Model   │  │ Low Conf?  │════? Auto-process       │
│   │   (v1.1)         │  └──══┬═══════╝      (confidence > 90%)  │
│   └──════════════════╝       │                                   │
│              →               →                                   │
│              │         ╔════════════╗                           │
│              │         │ Validation │                           │
│              │         │  Station   │                           │
│              │         └──══┬═══════╝                           │
│              │              │                                    │
│              │              →                                    │
│              │  ╔═══════════════════════╗                       │
│              └──│ Collect Corrections   │                       │
│                 │ (New Training Data)   │                       │
│                 └──═════════════════════╝                       │
│                                                                  │
│  Result: Model accuracy improves from 94% → 97% → 99%           │
│                                                                  │
└──═══════════════════════════════════════════════════════════════╝

Retraining Schedule:

TriggerAction
Monthly (scheduled)Retrain with accumulated corrections
Accuracy drops below 90%Immediate investigation and retrain
New vendor format encounteredAdd samples, retrain classifier
Major accuracy drift detectedFull model review

Example: Multi-Format Invoice Processing

' Handle invoices from any vendor
documentTypes = ClassifyDocument(dom)

Select Case documentTypes.FirstOrDefault()?.Type
    Case "Invoice"
        data = ExtractWithModel(dom, "universal_invoice_model")
    Case "CreditNote"
        data = ExtractWithModel(dom, "credit_note_model")
    Case Else
        ' Unknown format - send to human
        data = PresentValidationStation(dom)
End Select

' Extracted data is now structured regardless of original format
vendorName = data("VendorName").Value
amount = data("TotalAmount").Value
invoiceNumber = data("InvoiceNumber").Value

Computer Vision

When selectors fail completely, Computer Vision uses AI-powered visual recognition to find UI elements.

[!NOTE] UiPath-Specific: UiPath AI Computer Vision uses neural networks trained on millions of UI elements. This is fundamentally different from the older “Click on Image” approach.

Image Automation vs AI Computer Vision

FeatureTraditional Image (Pixel-Based)AI Computer Vision (Neural Network)
TechnologyPixel-perfect template matchingNeural network recognition
Resolution Change✗ Breaks completely✓ Still works
Color/Theme Change✗ Often breaks✓ Usually works
Element RecognitionMatches exact pixelsRecognizes “button”, “text field”, “checkbox” as concepts
Accuracy0.95+ required0.7-0.8 is often enough
Use CaseExact image matchingVirtual environments (Citrix/VDI)
╔═════════════════════════════════════════════════════════════════╗
│        Traditional Image vs AI Computer Vision                   │
├──═══════════════════════════════════════════════════════════════╣
│                                                                  │
│  TRADITIONAL IMAGE                AI COMPUTER VISION            │
│  ═════════════════                ══════════════════            │
│                                                                  │
│  "Does this pixel block          "This looks like a            │
│   match EXACTLY?"                 Submit button"                │
│                                                                  │
│  ╔════════╗                       ╔════════╗                    │
│  │ Submit │  → Exact match        │ Submit │  → Concept match  │
│  └──══════╝    required           └──══════╝    (any style)    │
│                                                                  │
│  Resolution: 1920→1080            Resolution: Any               │
│  Color: Exact blue                Color: Any theme              │
│  Font: Exact Arial 12pt          Font: Any readable            │
│                                                                  │
└──═══════════════════════════════════════════════════════════════╝

When to Use AI Computer Vision

[!NOTE] Primary Use Case: AI CV was designed specifically for Virtual Desktop Infrastructure (VDI) environments like Citrix, RDP, and VMware Horizon where there’s no DOM access.

ScenarioCV Solution
Citrix/VDI (Primary)No DOM access—CV finds buttons by visual AI recognition
Legacy Windows appsNo automation framework—CV sees what you see
Image-based menusIcons without text—CV recognizes UI patterns
Resolution-variable displaysDifferent screens—CV adapts to visual changes

CV Activities

Click on Image:

Activity: CV Click
├── Target: Image of button (saved as PNG)
├── Accuracy: 0.8 (80% similarity threshold)
├── WaitForTarget: True
├── Timeout: 30000
└── Action: Click

Type with CV Context:

Activity: CV Type Into
├── Target: Image of text field label ("Username:")
├── Text: username
├── Anchor: Left (type to the right of anchor)
└── Relative Position: 50, 0 (50px right, 0px down)

Screen Region Comparison:

Activity: CV Screen Scope
├── IndicateAnchor: Screenshot of stable area
└── Activities:
    ├── CV Get Text: Read text from region
    ├── CV Element Exists: Check if image appears
    └── CV Click: Click on matched element

CV Descriptor: How AI Actually Finds Elements

[!NOTE] Behind the Scenes: CV uses a Descriptor that combines multiple recognition strategies, not just image matching.

CV Descriptor Components:

╔═════════════════════════════════════════════════════════════════╗
│                  CV Descriptor Structure                        │
├──═══════════════════════════════════════════════════════════════╣
│                                                                 │
│   ╔═══════════════════════╗                                     │
│   │   TEXT FEATURES     │  "Submit", "Login", "????"            │
│   │   (OCR-based)       │  Any text visible on/near element     │
│   └──═════════════════════╝                                     │
│           +                                                     │
│   ╔═══════════════════════╗                                     │
│   │  VISUAL FEATURES    │  Button shape, checkbox, icon         │
│   │  (Neural Network)   │  Recognizes UI element "type"         │
│   └──═════════════════════╝                                     │
│           +                                                     │
│   ╔═══════════════════════╗                                     │
│   │   ANCHOR POSITION   │  "To the right of 'Username:'"        │
│   │   (Relative layout)  │  Position relative to stable text    │
│   └──═════════════════════╝                                     │
│           =                                                     │
│   ╔═══════════════════════╗                                     │
│   │  ROBUST MATCH       │  Works even if button changes color   │
│   │  (Combined score)   │  or moves slightly                    │
│   └──═════════════════════╝                                     │
│                                                                 │
└──═══════════════════════════════════════════════════════════════╝

Anchor Best Practices:

Anchor TypeWhen to Use
Text label”Username:” + field to the right
IconSearch icon + input field below
Stable UI regionHeader area that doesn’t change

Fuzzy Matching

CV doesn’t require exact matches:

' Handles slight variations in button appearance
Activity: CV Click
├── Target: "submit_button.png"
├── Accuracy: 0.7  ' 70% match is enough
├── WaitForReady: Interactive
└── MatchExact: False  ' Allow color variations

CV + Traditional Automation

Best practice: use CV as a fallback:

Try
    ' Try selector first (faster, more reliable)
    Click(submitButtonSelector)
Catch SelectorNotFoundException
    ' Fall back to Computer Vision
    Log.Warn("Selector failed, using Computer Vision")
    CV_Click("submit_button.png")
End Try

Virtual Desktop Automation

For Citrix/RDP where you only see pixels:

Activity: Citrix Scope
├── Application: "SAP"
└── Activities:
    ├── CV Type Into: Username field
    ├── CV Type Into: Password field
    ├── CV Click: Login button
    ├── CV Wait For Image: SAP menu loaded
    └── Continue with CV navigation...

LLM/ChatGPT Integration

Large Language Models bring natural language understanding to RPA.

Use Cases

TaskHow LLM Helps
Email classificationUnderstand intent, not just keywords
Response draftingGenerate human-like replies
Data extractionFind entities in unstructured text
SummarizationCondense long documents
TranslationMulti-language support
Decision supportSuggest actions based on context

Integrating OpenAI/Azure OpenAI

Direct API Call:

' Call ChatGPT API
endpoint = "https://api.openai.com/v1/chat/completions"

requestBody = New JObject()
requestBody("model") = "gpt-4"
requestBody("messages") = New JArray({
    New JObject({{"role", "system"}, {"content", "You are a customer service classifier. Respond with JSON only."}}),
    New JObject({{"role", "user"}, {"content", $"Classify this email: {emailBody}"}}
})
requestBody("temperature") = 0.1  ' Low temperature for consistent output

response = HTTP_POST(endpoint, requestBody.ToString(), headers)
classification = JObject.Parse(response)("choices")(0)("message")("content")

Example: Customer Email Classification

' Input: raw customer email
emailContent = "I've been waiting 3 weeks for my order #12345. 
               This is completely unacceptable! I want a full refund 
               and I'm never ordering from you again!"

' Build prompt
prompt = $"
Analyze this customer email and respond with JSON:
{{
  ""category"": ""complaint"" | ""inquiry"" | ""feedback"" | ""order_status"",
  ""sentiment"": ""positive"" | ""neutral"" | ""negative"",
  ""urgency"": ""low"" | ""medium"" | ""high"" | ""critical"",
  ""order_number"": ""extracted order number or null"",
  ""suggested_action"": ""brief action recommendation""
}}

Email:
{emailContent}
"

' Call LLM
result = CallOpenAI(prompt, model: "gpt-4", temperature: 0.1)

' Parse result
classification = JObject.Parse(result)
category = classification("category").ToString()      ' "complaint"
sentiment = classification("sentiment").ToString()    ' "negative"
urgency = classification("urgency").ToString()        ' "critical"
orderNumber = classification("order_number").ToString() ' "12345"
suggestedAction = classification("suggested_action").ToString()

' Route based on classification
Select Case category
    Case "complaint"
        If urgency = "critical" Then
            CreatePriorityTicket(emailContent, orderNumber)
        Else
            CreateStandardTicket(emailContent, orderNumber)
        End If
    Case "order_status"
        SendOrderStatusUpdate(orderNumber, email.From)
    ' ...
End Select

Example: Document Summarization

' Summarize a long contract
contract = ReadPDF("service_agreement.pdf")

prompt = $"
Summarize this contract in bullet points covering:
- Parties involved
- Service scope
- Payment terms
- Duration
- Key obligations
- Termination clauses

Contract:
{contract.Substring(0, 15000)}  ' Token limit
"

summary = CallOpenAI(prompt, model: "gpt-4", temperature: 0.3)

' Use summary for quick review
SendEmail("Legal Review Request", 
    $"Please review this contract summary:\n\n{summary}\n\n" +
    "Full document attached.", 
    attachments: {"service_agreement.pdf"})

Prompt Engineering Tips

TipExample
Be specific”Extract the invoice number in format INV-XXXX” not “Find the invoice number”
Provide examples”Like this: INV-2024-001”
Request structured output”Respond only with valid JSON”
Set constraints”If unsure, respond with ‘UNKNOWN‘“
Use low temperature0.1-0.3 for consistent, deterministic outputs

Token Management: Control LLM Costs

[!WARNING] The Hidden Cost Trap: Sending a 50-page contract to GPT-4 can cost 25perdocument.Withchunking,youcanreducethisto2-5 per document. With chunking, you can reduce this to 0.20.

Token Estimation:

' Rule of thumb: 1 token ? 4 characters (English) or 1-2 characters (CJK)
Function EstimateTokens(text As String) As Integer
    Return CInt(text.Length / 4)
End Function

' Example
contractText = ReadPDF("50_page_contract.pdf")  ' ~100,000 characters
tokens = EstimateTokens(contractText)  ' ~25,000 tokens
cost = tokens * 0.00003  ' GPT-4: ~$0.75 per request!

Chunking Strategy:

' DON'T send entire document
' DO extract relevant sections first

' Step 1: Use RegEx/DU to find relevant sections
paymentSection = ExtractSection(contract, "Payment Terms")
terminationSection = ExtractSection(contract, "Termination")

' Step 2: Only send what you need
prompt = $"
Analyze these contract sections:

Payment Terms:
{paymentSection}   ' ~500 tokens instead of 25,000

Termination:
{terminationSection}

Extract: payment schedule, late fees, termination notice period.
"

' Result: 90% cost reduction

Cost Comparison:

ApproachTokensCost (GPT-4)
Send full 50-page contract25,000$0.75
Extract relevant sections2,500$0.08
Savings89%

LLM Guardrails: Preventing Hallucinations

[!NOTE] CRITICAL FOR ENTERPRISE RPA: LLMs can “hallucinate” - generate plausible-sounding but completely fabricated information. In enterprise automation, blindly trusting LLM output without validation can cause serious damage.

The Problem:

LLM: "The order number is ORD-12345 and the refund amount is $500"

Reality: 
- ORD-12345 doesn't exist in your database
- Customer actually ordered $50, not $500
- Your bot just issued a $500 refund that shouldn't happen!

Guardrail Strategies:

╔═════════════════════════════════════════════════════════════════╗
│              LLM Output Validation Pipeline                     │
├──═══════════════════════════════════════════════════════════════╣
│                                                                 │
│   LLM Response                                                  │
│       →                                                         │
│   ╔═════════════════════════════════════════════════════════╗   │
│   │ 1. JSON SCHEMA VALIDATION                               │   │
│   │    Does the output match expected structure?            │   │
│   │    Are all required fields present?                     │   │
│   │    Are field types correct (string, number, date)?      │   │
│   └──══════════════════════┬════════════════════════════════╝   │
│                            →                                    │
│   ╔═════════════════════════════════════════════════════════╗   │
│   │ 2. BUSINESS RULE VALIDATION                             │   │
│   │    Is the amount within acceptable range?               │   │
│   │    Is the date in a valid range?                        │   │
│   │    Does the category exist in allowed values?           │   │
│   └──══════════════════════┬════════════════════════════════╝   │
│                            →                                    │
│   ╔═════════════════════════════════════════════════════════╗   │
│   │ 3. DATABASE VERIFICATION                                │   │
│   │    Does the Order ID exist?                             │   │
│   │    Does the Customer ID exist?                          │   │
│   │    Do the extracted values match database records?      │   │
│   └──══════════════════════┬════════════════════════════════╝   │
│                            →                                    │
│   ? Validated Output → Continue processing                      │
│   ? Validation Failed → Flag for human review                   │
│                                                                 │
└──═══════════════════════════════════════════════════════════════╝

Implementation with Newtonsoft.Json:

Imports Newtonsoft.Json
Imports Newtonsoft.Json.Linq
Imports Newtonsoft.Json.Schema

' Step 1: Define expected schema
jsonSchema = JSchema.Parse("
{
  'type': 'object',
  'properties': {
    'order_number': { 'type': 'string', 'pattern': '^ORD-[0-9]{5,10}$' },
    'category': { 'type': 'string', 'enum': ['complaint', 'inquiry', 'feedback'] },
    'urgency': { 'type': 'string', 'enum': ['low', 'medium', 'high', 'critical'] },
    'amount': { 'type': 'number', 'minimum': 0, 'maximum': 10000 }
  },
  'required': ['category', 'urgency']
}
")

' Step 2: Parse LLM response and validate against schema
Try
    llmOutput = JObject.Parse(rawLlmResponse)
    isValid = llmOutput.IsValid(jsonSchema, errors)
    
    If Not isValid Then
        Log.Error($"LLM output failed schema validation: {String.Join(", ", errors)}")
        SendToHumanReview(rawLlmResponse, "Schema validation failed")
        Return
    End If
Catch ex As JsonReaderException
    Log.Error($"LLM returned invalid JSON: {ex.Message}")
    SendToHumanReview(rawLlmResponse, "Invalid JSON format")
    Return
End Try

' Step 3: Verify extracted data against database
orderNumber = llmOutput("order_number")?.ToString()
If Not String.IsNullOrEmpty(orderNumber) Then
    orderExists = CheckOrderExists(orderNumber)  ' Database lookup
    If Not orderExists Then
        Log.Warn($"LLM hallucinated order number: {orderNumber} not found in database")
        llmOutput("order_number") = Nothing  ' Clear hallucinated value
        llmOutput("confidence_warning") = "Order number could not be verified"
    End If
End If

' Step 4: Business rule validation
amount = llmOutput("amount")?.Value(Of Decimal?)() 
If amount.HasValue AndAlso amount.Value > MaxRefundAmount Then
    Log.Warn($"LLM suggested amount ${amount} exceeds max refund ${MaxRefundAmount}")
    SendToHumanReview(llmOutput.ToString(), "Amount exceeds threshold")
    Return
End If

Guardrail Summary:

LayerWhat It CatchesExample
Schema ValidationStructural errorsMissing fields, wrong types
Enum/Pattern ValidationInvalid valuesCategory = “urgent” (not in list)
Range ValidationOut-of-bounds valuesAmount = $999,999
Database VerificationHallucinated entitiesNon-existent Order ID
Cross-Reference CheckInconsistent dataCustomer A’s order assigned to Customer B

[!NOTE] Rule of Thumb: The more critical the automation action (refunds, payroll, contracts), the more validation layers you need before acting on LLM output.

Rate Limiting and Cost Management

' Track usage
tokenCount += response("usage")("total_tokens").Value(Of Integer)()

' Rate limiting
If DateTime.Now - lastCallTime < TimeSpan.FromMilliseconds(100) Then
    Delay(100)  ' Prevent hitting rate limits
End If

' Cost estimation (GPT-4 pricing example)
costPerToken = 0.00003  ' $0.03 per 1K tokens
estimatedCost = tokenCount * costPerToken
Log.Info($"LLM usage: {tokenCount} tokens, ~${estimatedCost:F4}")

Combining AI Capabilities

The real power comes from combining multiple AI tools:

Example: Intelligent Invoice Processing

╔═════════════════════════════════════════════════════════════════╗
│           AI-Powered Invoice Processing Pipeline                 │
├──═══════════════════════════════════════════════════════════════╣
│                                                                  │
│  1. EMAIL ARRIVES                                                │
│     ├── LLM: Classify email intent                              │
│     └── If invoice-related → continue                           │
│                                                                  │
│  2. ATTACHMENT EXTRACTION                                        │
│     ├── Document Understanding: OCR + Digitization              │
│     └── ML Classifier: Identify document type                   │
│                                                                  │
│  3. DATA EXTRACTION                                              │
│     ├── ML Extractor: Pull structured fields                    │
│     ├── LLM: Parse unusual line item descriptions               │
│     └── Confidence check → Validation Station if needed         │
│                                                                  │
│  4. BUSINESS LOGIC                                               │
│     ├── Traditional RPA: Enter data in SAP                      │
│     ├── Computer Vision: Handle legacy ERP screens              │
│     └── API: Post to modern systems                             │
│                                                                  │
│  5. EXCEPTION HANDLING                                           │
│     ├── LLM: Generate human-readable error explanation          │
│     └── Email: Notify appropriate team                          │
│                                                                  │
└──═══════════════════════════════════════════════════════════════╝

AI Model Management

Model Versioning

Track which model version processed each document:

' Log model info with each transaction
transactionLog = New Dictionary(Of String, Object) From {
    {"DocumentId", documentId},
    {"ModelName", "invoice_extractor"},
    {"ModelVersion", "v3.2.1"},
    {"Confidence", extractionConfidence},
    {"ProcessedAt", DateTime.UtcNow}
}
LogToDatabase(transactionLog)

Performance Monitoring

' Track model accuracy over time
If humanValidated Then
    ' Compare extracted vs validated values
    For Each field In extractedFields
        accuracy = CalculateAccuracy(field, validatedValue)
        LogModelPerformance(modelName, field.Name, accuracy)
    Next
End If

' Alert if accuracy drops
If weeklyAccuracy < 0.85 Then
    SendAlert("Model accuracy dropped below 85%")
End If

Continuous Improvement Loop

╔═════════════════════════════════════════════════════════════════╗
│                    ML Improvement Cycle                          │
├──═══════════════════════════════════════════════════════════════╣
│                                                                  │
│   ╔════════════╗                      ╔════════════╗            │
│   │   Deploy   │?═════════════════════│   Train    │            │
│   │   Model    │                      │   Model    │            │
│   └──═══┬══════╝                      └──═══→══════╝            │
│         │                                   │                    │
│         →                                   │                    │
│   ╔════════════╗                      ╔════════════╗            │
│   │  Process   │                      │  Collect   │            │
│   │  Documents │═════════════════════?│ Validated  │            │
│   └──═══┬══════╝                      │   Data     │            │
│         │                             └──══════════╝            │
│         →                                                        │
│   ╔════════════╗                                                │
│   │  Human     │                                                │
│   │ Validation │ (for low-confidence extractions)               │
│   └──══════════╝                                                │
│                                                                  │
└──═══════════════════════════════════════════════════════════════╝

Key Takeaways

  1. Document Understanding extracts data from varying layouts without custom code per format.
  2. Computer Vision is your backup when selectors fail completely.
  3. LLMs handle unstructured text that pattern matching can’t.
  4. Combine AI capabilities for complex, real-world processes.
  5. Human-in-the-loop improves accuracy and provides training data.
  6. Monitor model performance and retrain when accuracy drops.
  7. AI doesn’t replace RPA—it extends it to handle what rules can’t.

The future isn’t RPA vs AI. It’s RPA with AI. And the developers who master both will be unstoppable.