
Beyond Click Automation: RPA Integration with Documents, Databases, and Files


Clicking buttons is easy. Moving data between systems is where RPA earns its keep.

In the real world, your automation needs to:

  • Extract tables from PDF invoices
  • Consolidate 50 Excel files into one report
  • Read from and write to SQL Server
  • Parse unstructured email bodies
  • Handle CSV, XML, JSON, and legacy formats

This is the integration layer that makes RPA genuinely useful.


PDF Processing

PDFs are everywhere in business - invoices, contracts, reports. And they’re notoriously difficult to automate.

Types of PDFs

| Type | Characteristics | Extraction Method |
|---|---|---|
| Native/Digital | Created digitally, text is selectable | Direct text extraction |
| Scanned/Image | Photos of paper, no selectable text | OCR required |
| Hybrid | Mix of native text and embedded images | Combined approach |

Native PDF Extraction

For digitally-created PDFs, use Read PDF Text:

Activity: Read PDF Text
├── FileName: "C:\Invoices\invoice_001.pdf"
├── Range: "All" (or "1-3" for specific pages)
└── Output: extractedText (String)

The output is raw text. You’ll need to parse it:

' Extract invoice number using RegEx (with defensive programming)
pattern = "Invoice\s*#?\s*:?\s*(\w+-?\d+)"
match = Regex.Match(extractedText, pattern)

' IMPORTANT: Always check match.Success before accessing Groups
If match.Success Then
    invoiceNumber = match.Groups(1).Value
Else
    invoiceNumber = "NOT_FOUND"
    Log.Warn($"Could not extract invoice number from text")
End If

' Extract total amount
pattern = "Total\s*:?\s*\$?([\d,]+\.\d{2})"
match = Regex.Match(extractedText, pattern)

If match.Success Then
    totalAmount = Decimal.Parse(match.Groups(1).Value.Replace(",", ""))
Else
    totalAmount = 0
    Log.Warn($"Could not extract total amount from text")
End If

Defensive Programming: Always use If match.Success before accessing match.Groups(). Without this check, your bot will crash on documents with unexpected formats.
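The same defensive pattern, sketched in Python as a language-neutral illustration (the helper names and sample text are mine, not UiPath APIs): `re.search` returns `None` on no match, which plays the role of `match.Success`.

```python
import re

# Hypothetical helpers mirroring the extraction above: return a fallback
# instead of crashing when the document format is unexpected.
def extract_invoice_number(text, fallback="NOT_FOUND"):
    match = re.search(r"Invoice\s*#?\s*:?\s*(\w+-?\d+)", text)
    # Equivalent of checking match.Success before touching Groups(1)
    return match.group(1) if match else fallback

def extract_total(text):
    match = re.search(r"Total\s*:?\s*\$?([\d,]+\.\d{2})", text)
    return float(match.group(1).replace(",", "")) if match else 0.0

sample = "Invoice #: INV-2024\nTotal: $1,234.56"
print(extract_invoice_number(sample))  # INV-2024
print(extract_total(sample))           # 1234.56
```

The key design point carries across languages: the "no match" branch is an expected outcome with a logged fallback, not an exception path.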

Extracting Tables from PDFs

For structured data, use Read PDF With OCR with table extraction enabled:

Activity: Read PDF With OCR
├── FileName: "invoice.pdf"
├── OCR Engine: UiPath Document OCR
├── ExtractTables: True
└── Output:
    ├── Text: fullText
    └── Tables: extractedTables (DataTable[])

Processing the extracted table:

For Each table As DataTable In extractedTables
    For Each row As DataRow In table.Rows
        itemName = row(0).ToString()
        quantity = Convert.ToInt32(row(1))
        unitPrice = Convert.ToDecimal(row(2))
        Log.Info($"Item: {itemName}, Qty: {quantity}, Price: {unitPrice}")
    Next
Next

Scanned PDF + OCR

For image-based PDFs:

Activity: Read PDF With OCR
├── FileName: "scanned_invoice.pdf"
├── OCR Engine: Google Cloud Vision (or Tesseract, ABBYY)
├── Language: "eng" (or "chi_tra" for Traditional Chinese)
└── Output: ocrText

OCR Engine Comparison:

| Engine | Accuracy | Speed | Cost | Best For |
|---|---|---|---|---|
| Tesseract | Good | Fast | Free | Simple documents |
| Google Vision | Excellent | Medium | Pay-per-use | Multi-language |
| ABBYY FineReader | Excellent | Slow | License | Complex layouts |
| UiPath Document OCR | Very Good | Fast | Included | UiPath integration |

2026 Approach: Document Understanding (DU)

[!IMPORTANT] Regex is Not Enough for Real-World Documents
Traditional pattern matching fails when invoice formats vary by vendor. UiPath Document Understanding uses AI to intelligently extract data.

How Document Understanding Works:

+--------------------------------------------------------------------+
|              Document Understanding Pipeline                        |
+--------------------------------------------------------------------+
|                                                                    |
|   PDF/Image -> Digitize -> Classify -> Extract -> Validate -> Export
|       |           |          |          |          |        |
|       v           v          v          v          v        v
|   [Input]     [OCR]    [Document   [ML Model  [Human    [JSON]
|               Engine    Type ID]   Key-Value  Review
|                                    Pairs]     Station]
|                                                                    |
|   Key Benefit: AI automatically finds fields like "Invoice #",    |
|   "Total", "Vendor" without hardcoded coordinates or Regex.       |
|                                                                    |
+--------------------------------------------------------------------+

When to Use What:

| Scenario | Regex/OCR | Document Understanding |
|---|---|---|
| Single vendor, fixed format | ✓ Simple | Overkill |
| Multi-vendor invoices | ✗ Nightmare | ✓ Perfect |
| Handwritten forms | ✗ Impossible | ~ Requires training |
| High accuracy requirements | ~ Brittle | ✓ Self-improving |

[!TIP] Even without full DU training, Intelligent Keyword Classifier and Form Extractor can handle many document types out-of-the-box.

PDF Generation

Creating PDFs from data:

' Generate PDF from HTML template
htmlContent = File.ReadAllText("template.html")
htmlContent = htmlContent.Replace("{{InvoiceNumber}}", invoiceNumber)
htmlContent = htmlContent.Replace("{{CustomerName}}", customerName)
htmlContent = htmlContent.Replace("{{TotalAmount}}", totalAmount.ToString("C"))

Activity: Generate PDF
├── Source: htmlContent
├── SourceType: HTML
└── FileName: "output_invoice.pdf"

Advanced Excel Operations

Excel is the universal translator of business data. Master it.

Beyond Read Range

Filtering and Sorting:

' Read entire sheet
dataTable = ExcelReadRange("data.xlsx", "Sheet1", "A1:Z1000")

' Filter using LINQ
filteredData = dataTable.AsEnumerable() _
    .Where(Function(r) r("Status").ToString() = "Pending" _
           AndAlso r.Field(Of Decimal)("Amount") > 1000) _
    .OrderByDescending(Function(r) r.Field(Of DateTime)("Date")) _
    .CopyToDataTable()

' Write filtered results
ExcelWriteRange(filteredData, "filtered_output.xlsx", "Results", "A1")

[!WARNING] Memory Killer: Large Excel Files
Read Range loads the entire file into memory. For files > 50MB or 100,000+ rows, your robot will crash with OutOfMemoryException.

Solutions:

| Approach | When to Use |
|---|---|
| Workbook Activities | Doesn't open Excel, lower memory |
| OLEDB Provider | Treat Excel as a database, query specific rows |
| OpenXml (Coded Workflow) | Streaming read, handle millions of rows |

OLEDB Example:

connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=huge.xlsx;Extended Properties='Excel 12.0;HDR=YES';"
sql = "SELECT * FROM [Sheet1$] WHERE Amount > 1000"
' Only loads matching rows, not entire file
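The idea behind the OLEDB approach, filtering at the data source so only matching rows ever reach the robot, can be sketched language-neutrally with an in-memory SQL database (the table and values here are made up for illustration):

```python
import sqlite3

# Stand-in data source: in the OLEDB case this would be the Excel sheet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sheet1 (item TEXT, amount REAL)")
conn.executemany("INSERT INTO sheet1 VALUES (?, ?)",
                 [("A", 500), ("B", 1500), ("C", 2500)])

# Only rows with amount > 1000 are materialized in memory, mirroring
# "SELECT * FROM [Sheet1$] WHERE Amount > 1000" over the workbook.
rows = conn.execute(
    "SELECT item, amount FROM sheet1 WHERE amount > ?", (1000,)).fetchall()
print(rows)  # [('B', 1500.0), ('C', 2500.0)]
```

The memory win is the same in both cases: the filter runs in the provider, so the process never holds the full dataset.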

Dynamic Range Detection:

' Find last row with data
Activity: Excel Application Scope
├── Workbook: "data.xlsx"
└── Activities:
    ├── Read Cell: "A1048576" -> lastCell
    ├── Use formula: =COUNTA(A:A) to count non-empty
    └── Or use Get Last Row activity

Pivot Table Creation

While UiPath can’t directly create pivot tables, you can:

  1. Use Execute Macro (for macros saved in the workbook):

Activity: Execute Macro
├── MacroName: "CreatePivotTable"
└── Parameters: [sourceRange, destSheet]

  2. Use the Invoke VBA activity:

Important: The Invoke VBA activity requires a file path to a .vbs or .txt file containing the VBA code, NOT a code string directly. This is a common mistake.

Step 1: Create VBA Script File (CreatePivot.vbs):

Sub CreatePivot()
    Dim pt As PivotTable
    Dim pc As PivotCache
    Set pc = ThisWorkbook.PivotCaches.Create(xlDatabase, "Data!A1:E100")
    Set pt = pc.CreatePivotTable("Summary!A1")
    pt.AddDataField pt.PivotFields("Amount"), "Sum of Amount"
    pt.PivotFields("Category").Orientation = xlRowField
End Sub

Step 2: Call the Activity:

Activity: Invoke VBA
├── CodeFilePath: "C:\Scripts\CreatePivot.vbs"  -> File path, not code string!
├── EntryMethodName: "CreatePivot"
└── EntryMethodParameters: (empty or as needed)

Modern Alternative (2026): Consider using Coded Workflows (see Article 7) to manipulate Excel via Interop.Excel with full IDE debugging support.

Multi-File Consolidation

Combining multiple Excel files:

' Get all Excel files in folder
files = Directory.GetFiles("C:\Reports\", "*.xlsx")

' Initialize master table with structure from first file
masterTable = ExcelReadRange(files(0), "Sheet1", "")
masterTable.TableName = "Consolidated"

' Append data from remaining files
For Each file In files.Skip(1)
    tempTable = ExcelReadRange(file, "Sheet1", "")

    ' Skip header row, append data rows
    For Each row As DataRow In tempTable.Rows.Cast(Of DataRow)().Skip(1)
        masterTable.ImportRow(row)
    Next
Next

' Write consolidated output
ExcelWriteRange(masterTable, "C:\Reports\consolidated.xlsx", "All Data", "A1")
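The consolidation loop above (take the structure from the first file, append data rows from the rest while skipping each file's header) is the same in any language. A minimal Python sketch using CSV files to stay dependency-free; the folder layout and file names are invented for the demo:

```python
import csv, os, tempfile

def consolidate(folder, out_path):
    """Append data rows from every CSV in `folder` under one shared header."""
    files = sorted(f for f in os.listdir(folder) if f.endswith(".csv"))
    header, master = None, []
    for name in files:
        with open(os.path.join(folder, name), newline="") as fh:
            rows = list(csv.reader(fh))
        if header is None:
            header = rows[0]          # keep the header from the first file
        master.extend(rows[1:])       # skip each subsequent header row
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(master)
    return len(master)

# Demo with three small files in a temp folder
folder = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(folder, f"report_{i}.csv"), "w", newline="") as fh:
        csv.writer(fh).writerows([["Item", "Qty"], [f"item{i}", str(i)]])
out = os.path.join(folder, "consolidated.csv")
print(consolidate(folder, out))  # 3 data rows consolidated
```

As in the UiPath version, the subtle bug to avoid is importing each file's header row as data.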

Excel Formatting

Using Excel Application Scope for formatting:

Activity: Excel Application Scope
├── Workbook: "report.xlsx"
└── Activities:
    ├── Auto Fit Columns: Range="A:Z"
    ├── Set Range Color: Range="A1:E1", Color=Blue (header highlight)
    ├── Set Range Font: Range="A1:E1", Bold=True
    └── Add Borders: Range="A1:E100", Style=Thin

Database Connectivity

For high-volume, mission-critical data, databases beat Excel every time.

Security First: Never Hardcode Credentials

CRITICAL: Database passwords should NEVER appear in your code, config files, or logs.

Use Orchestrator Assets or Windows Credential Manager:

' WRONG - password in code or config (audit failure!)
connectionString = "Server=...;Password=MySecretPassword123;"

' CORRECT - retrieve from Orchestrator Credential Asset
dbCredential = GetCredential("DB_InvoiceDB_ServiceAccount")
connectionString = $"Server=server.company.com;Database=InvoiceDB;User Id={dbCredential.Username};Password={dbCredential.GetNetworkCredential().Password};"

For Windows Integrated Authentication (no password needed):

connectionString = "Server=server.company.com;Database=InvoiceDB;Integrated Security=True;"

Connection Patterns

SQL Server:

' Using Credential Asset (recommended)
dbCredential = GetCredential("SQL_ServiceAccount")
connectionString = $"Server=server.company.com;Database=InvoiceDB;User Id={dbCredential.Username};Password={dbCredential.Password};"

Activity: Connect
├── ConnectionString: connectionString
├── ProviderName: "System.Data.SqlClient"
└── Output: dbConnection

Oracle:

connectionString = "Data Source=ORCL;User Id=rpa_user;Password={{password}};"
providerName = "Oracle.ManagedDataAccess.Client"

MySQL:

connectionString = "Server=mysql.company.com;Database=crm;Uid=rpa;Pwd={{password}};"
providerName = "MySql.Data.MySqlClient"

Query Execution

SELECT (Read Data):

Activity: Execute Query
├── Connection: dbConnection
├── Sql: "SELECT InvoiceID, Vendor, Amount, DueDate
          FROM Invoices
          WHERE Status = 'Pending' AND Amount > @MinAmount"
├── Parameters: {{"@MinAmount", 1000}}
└── Output: resultTable (DataTable)

[!CAUTION] SQL Injection: The Security Red Line
The @MinAmount parameter above isn't just convenient - it's mandatory for security.

' ✗ DANGEROUS: String concatenation (SQL Injection vulnerable!)
sql = "SELECT * FROM Users WHERE Name = '" & userInput & "'"
' If userInput = "'; DROP TABLE Users; --" -> Database destroyed

' ✓ SAFE: Parameterized query (SQL Injection proof)
sql = "SELECT * FROM Users WHERE Name = @UserName"
parameters = {{"@UserName", userInput}}

When is this critical? Whenever data comes from external sources: email bodies, PDF text, user forms, API responses.
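You can watch parameter binding defuse a hostile input with a few lines of Python and an in-memory SQLite database (SQLite uses `?` placeholders where SQL Server uses `@Name`; the table is invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('Alice')")

# Hostile input that would terminate the string literal and inject a
# command if it were concatenated into the SQL text.
user_input = "'; DROP TABLE users; --"

# Parameter binding passes the value out-of-band: it is treated as data,
# never parsed as SQL, so the table survives.
rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()
print(rows)  # [] - no matching user, and nothing was dropped
```

The same guarantee holds for the UiPath Execute Query parameters: the database driver never interpolates the value into the statement text.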

INSERT/UPDATE (Write Data):

Activity: Execute Non Query
├── Connection: dbConnection
├── Sql: "INSERT INTO ProcessedInvoices
          (InvoiceID, ProcessedDate, RobotName, Status)
          VALUES (@InvoiceID, @ProcessedDate, @RobotName, @Status)"
├── Parameters: {
    {"@InvoiceID", invoiceId},
    {"@ProcessedDate", DateTime.Now},
    {"@RobotName", Environment.MachineName},
    {"@Status", "Success"}
}
└── Output: rowsAffected (Int32)

Stored Procedures

For complex operations, use stored procedures:

Activity: Execute Non Query
├── Connection: dbConnection
├── CommandType: StoredProcedure
├── Sql: "sp_ProcessInvoiceBatch"
├── Parameters: {
    {"@BatchID", batchId},
    {"@ProcessedBy", "RPA_Bot"},
    {"@OutputStatus", Direction.Output, SqlDbType.VarChar, 50}
}
└── Output: outputParams

Batch Operations

For inserting thousands of rows:

' BAD: One insert per row (slow)
For Each row In dataTable.Rows
    ExecuteNonQuery("INSERT INTO ...")  ' 10,000 round trips!
Next

' GOOD: Bulk insert
Activity: Insert DataTable
├── Connection: dbConnection
├── TableName: "StagingTable"
├── DataTable: dataTable
└── BatchSize: 1000

' Or use stored procedure with table-valued parameter
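The round-trip difference is easy to demonstrate with SQLite's `executemany`, which plays the role of Insert DataTable here (one prepared statement reused for the whole batch instead of one statement per row; the staging table is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER, amount REAL)")
data = [(i, i * 1.5) for i in range(10_000)]

# One batched call instead of 10,000 individual INSERT round trips.
conn.executemany("INSERT INTO staging VALUES (?, ?)", data)
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0])  # 10000
```

With a networked database the gap is far larger than in-process SQLite, because each per-row insert pays a full network round trip.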

Transaction Handling

For data integrity:

Activity: Start Transaction
├── Connection: dbConnection
└── Output: transaction

Try
    ExecuteNonQuery("UPDATE Accounts SET Balance = Balance - @Amount ...", transaction)
    ExecuteNonQuery("UPDATE Accounts SET Balance = Balance + @Amount ...", transaction)
    ExecuteNonQuery("INSERT INTO TransactionLog ...", transaction)

    Activity: Commit Transaction
    └── Transaction: transaction

Catch ex As Exception
    Activity: Rollback Transaction
    └── Transaction: transaction

    Throw
End Try
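The all-or-nothing behavior that pattern buys you can be verified with a small Python/SQLite sketch: two balance updates plus a deliberately failing statement, rolled back as a unit (accounts and amounts are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("A", 100.0), ("B", 0.0)])
conn.commit()

def transfer(conn, amount):
    """Debit A, credit B; roll both back if any statement fails."""
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name='A'", (amount,))
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name='B'", (amount,))
        conn.execute("INSERT INTO no_such_table VALUES (1)")  # simulated failure
        conn.commit()
    except sqlite3.Error:
        conn.rollback()   # neither UPDATE survives
        raise

try:
    transfer(conn, 40.0)
except sqlite3.Error:
    pass

# Balance unchanged: the partial debit was rolled back with the rest.
print(conn.execute(
    "SELECT balance FROM accounts WHERE name='A'").fetchone()[0])  # 100.0
```

Without the rollback, account A would be debited 40 with no matching credit, exactly the inconsistency transactions exist to prevent.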

File and Folder Operations

File Management Patterns

Watching for New Files:

Activity: File Change Trigger
├── Path: "C:\Incoming\Invoices"
├── Filter: "*.pdf"
├── NotifyFilter: Created
└── Actions:
    └── Process each new file

Archiving Processed Files:

' Create dated archive folder
archiveFolder = Path.Combine("C:\Archive", DateTime.Now.ToString("yyyy-MM-dd"))
Directory.CreateDirectory(archiveFolder)

' Move processed file with timestamp
newFileName = $"{Path.GetFileNameWithoutExtension(file)}_{DateTime.Now:HHmmss}{Path.GetExtension(file)}"
File.Move(file, Path.Combine(archiveFolder, newFileName))

Safe File Handling:

' Wait for file to be fully written (not locked)
maxWaitSeconds = 30
waited = 0

While IsFileLocked(filePath) AndAlso waited < maxWaitSeconds
    Delay(1000)
    waited += 1
End While

If waited >= maxWaitSeconds Then
    Throw New ApplicationException($"File locked: {filePath}")
End If

' Now safe to process
ProcessFile(filePath)
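`IsFileLocked` above is a custom helper; on platforms without mandatory file locks, a common stand-in for "fully written" is waiting for the file size to stop changing between polls. A minimal sketch of that variant (timeout and poll interval are illustrative):

```python
import os, time, tempfile

def wait_until_stable(path, timeout=30.0, poll=0.2):
    """Wait until the file's size stops changing - a proxy for 'fully
    written' - and raise TimeoutError if it never settles."""
    deadline = time.monotonic() + timeout
    last = -1
    while time.monotonic() < deadline:
        size = os.path.getsize(path)
        if size == last:
            return size      # two consecutive polls agree: treat as done
        last = size
        time.sleep(poll)
    raise TimeoutError(f"File still changing: {path}")

# Demo on a file that is already complete
path = os.path.join(tempfile.mkdtemp(), "done.csv")
with open(path, "w") as fh:
    fh.write("a,b,c\n")
print(wait_until_stable(path, timeout=2))  # 6 (bytes)
```

Either check (lock probe or size stability) belongs before processing any file dropped by another system, for example an SFTP upload still in flight.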

XML Processing

' Load XML
doc = XDocument.Load("data.xml")

' Query with LINQ to XML
invoices = doc.Descendants("Invoice") _
    .Where(Function(i) Decimal.Parse(i.Element("Amount").Value) > 1000) _
    .Select(Function(i) New With {
        .Id = i.Element("InvoiceID").Value,
        .Amount = Decimal.Parse(i.Element("Amount").Value),
        .Vendor = i.Element("Vendor").Value
    }).ToList()

' Create XML
newDoc = New XDocument(
    New XElement("ProcessedInvoices",
        From inv In invoices
        Select New XElement("Invoice",
            New XAttribute("Id", inv.Id),
            New XElement("Status", "Processed"),
            New XElement("Timestamp", DateTime.Now)
        )
    )
)
newDoc.Save("output.xml")
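The same filter-then-build round trip in Python's standard library, as a cross-check of the LINQ-to-XML logic above (the sample invoices are made up):

```python
import xml.etree.ElementTree as ET

xml_text = """
<Invoices>
  <Invoice><InvoiceID>1</InvoiceID><Amount>500.00</Amount><Vendor>Acme</Vendor></Invoice>
  <Invoice><InvoiceID>2</InvoiceID><Amount>2500.00</Amount><Vendor>Globex</Vendor></Invoice>
</Invoices>
"""
root = ET.fromstring(xml_text)

# Same filter as the LINQ query: invoices with Amount > 1000
big = [
    {"id": inv.findtext("InvoiceID"),
     "amount": float(inv.findtext("Amount")),
     "vendor": inv.findtext("Vendor")}
    for inv in root.iter("Invoice")
    if float(inv.findtext("Amount")) > 1000
]
print(big)  # [{'id': '2', 'amount': 2500.0, 'vendor': 'Globex'}]

# Build the output document, mirroring the XElement construction
out = ET.Element("ProcessedInvoices")
for inv in big:
    node = ET.SubElement(out, "Invoice", Id=inv["id"])
    ET.SubElement(node, "Status").text = "Processed"
print(ET.tostring(out, encoding="unicode"))
```

Both versions read attributes and elements by name rather than position, which keeps the parsing stable when the source adds new fields.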

JSON Processing with Newtonsoft.Json

JSON is the standard format for APIs and modern configuration files. UiPath uses Newtonsoft.Json (Json.NET) for JSON handling.

Package: UiPath.Web.Activities provides the JSON-related activities. For code-based parsing, Newtonsoft.Json is already referenced.

Parsing JSON Response from API:

' After HTTP Request activity returns responseBody (String)
' Parse to JObject for navigation

Imports Newtonsoft.Json.Linq

jsonObject = JObject.Parse(responseBody)

' Access nested properties
customerId = jsonObject("data")("customer")("id").ToString()
customerName = jsonObject("data")("customer")("name").ToString()
orderCount = CInt(jsonObject("data")("orders").Count())

' Access array elements
firstOrderId = jsonObject("data")("orders")(0)("orderId").ToString()

Iterating JSON Arrays:

' API returns: {"invoices": [{...}, {...}, ...]}
invoicesArray = CType(jsonObject("invoices"), JArray)  ' cast directly; no need to re-serialize and re-parse

For Each invoice As JToken In invoicesArray
    invoiceNumber = invoice("invoiceNumber").ToString()
    amount = CDec(invoice("amount"))
    vendorName = invoice("vendor")("name").ToString()

    ' Handle optional fields safely
    dueDate = If(invoice("dueDate") IsNot Nothing,
                 DateTime.Parse(invoice("dueDate").ToString()),
                 DateTime.Today.AddDays(30))

    Log.Info($"Processing {invoiceNumber}: {amount:C}")
Next

Building JSON for API Requests:

' Method 1: Using JObject
requestBody = New JObject()
requestBody("transactionId") = transactionId
requestBody("processedDate") = DateTime.Now.ToString("yyyy-MM-ddTHH:mm:ssZ")
requestBody("status") = "Completed"
requestBody("details") = New JObject()
requestBody("details")("itemCount") = itemCount
requestBody("details")("totalAmount") = totalAmount

jsonString = requestBody.ToString()
' Use in HTTP Request Body

' Method 2: Using Anonymous Types (cleaner)
requestData = New With {
    .transactionId = transactionId,
    .processedDate = DateTime.Now.ToString("yyyy-MM-ddTHH:mm:ssZ"),
    .status = "Completed",
    .details = New With {
        .itemCount = itemCount,
        .totalAmount = totalAmount
    }
}
jsonString = JsonConvert.SerializeObject(requestData)

Deserialize to Strongly-Typed Objects:

' Define class (in Invoke Code or Coded Workflow)
Public Class Invoice
    Public Property InvoiceNumber As String
    Public Property Amount As Decimal
    Public Property DueDate As DateTime
End Class

' Deserialize
invoiceList = JsonConvert.DeserializeObject(Of List(Of Invoice))(jsonString)

For Each inv As Invoice In invoiceList
    ' IntelliSense works! Type-safe!
    Log.Info($"{inv.InvoiceNumber}: {inv.Amount}")
Next

Working with Config.json (Modern Pattern):

' Read config file
configJson = File.ReadAllText("config.json")
config = JObject.Parse(configJson)

' Access settings
orchestratorUrl = config("Settings")("OrchestratorUrl").ToString()
maxRetries = CInt(config("Settings")("MaxRetries"))
enableNotifications = CBool(config("Settings")("EnableNotifications"))

' Environment-specific overrides
currentEnv = config("CurrentEnvironment").ToString()  ' "Dev", "UAT", "Prod"
dbConnection = config("Environments")(currentEnv)("DatabaseConnection").ToString()
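The environment-override lookup is plain nested-key navigation; here is the same pattern in Python with a hypothetical config that mirrors the structure above (keys and values are invented for the demo):

```python
import json

config = json.loads("""
{
  "Settings": {"OrchestratorUrl": "https://orch.example.com", "MaxRetries": 3},
  "CurrentEnvironment": "UAT",
  "Environments": {
    "Dev":  {"DatabaseConnection": "Server=dev-db"},
    "UAT":  {"DatabaseConnection": "Server=uat-db"},
    "Prod": {"DatabaseConnection": "Server=prod-db"}
  }
}
""")

# One indirection through CurrentEnvironment selects the right block,
# so promoting Dev -> UAT -> Prod is a one-line config change.
env = config["CurrentEnvironment"]
db_connection = config["Environments"][env]["DatabaseConnection"]
print(db_connection)  # Server=uat-db
```

The design point is that the workflow never hardcodes an environment name; it always dereferences `CurrentEnvironment`.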

Modern Alternative: System.Text.Json

[!NOTE] For Coded Workflows (2026+): Consider System.Text.Json (.NET Core native) instead of Newtonsoft.Json for better performance.

// In Coded Workflow (.cs file)
using System.Text.Json;

// Deserialize (2-3x faster than Newtonsoft for large payloads)
var invoice = JsonSerializer.Deserialize<Invoice>(jsonString);

// Serialize
var json = JsonSerializer.Serialize(invoice, new JsonSerializerOptions
{
    WriteIndented = true,
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase
});

When to Use Which:

| Library | Pros | Cons | Use When |
|---|---|---|---|
| Newtonsoft.Json | Feature-rich, flexible | Slower, larger | Complex JSON, legacy code |
| System.Text.Json | Faster, native, lower memory | Less flexible | Coded Workflows, high performance |

CSV Handling

' Read CSV with headers
Activity: Read CSV
├── FilePath: "data.csv"
├── IncludeColumnNames: True
├── Delimiter: ","
└── Output: csvTable

' Handle special cases
' - Quoted fields with commas inside
' - Different encodings (UTF-8, ASCII)
' - Custom delimiters (semicolon in Europe)

' Write CSV
Activity: Write CSV
├── FilePath: "output.csv"
├── DataTable: resultTable
├── IncludeColumnNames: True
└── Delimiter: ","
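The two special cases called out above, quoted fields containing the delimiter and European semicolon delimiters, are worth seeing in action. A quick Python illustration with invented sample data (a hand-rolled `Split(",")` would break on both):

```python
import csv, io

# Case 1: a quoted field containing the delimiter itself
quoted = 'name,notes\n"Acme, Inc.","Net 30"\n'
rows = list(csv.reader(io.StringIO(quoted)))
print(rows[1])  # ['Acme, Inc.', 'Net 30'] - the embedded comma survives

# Case 2: a semicolon-delimited file, common in European locales
european = "name;amount\nAcme;1000\n"
rows2 = list(csv.reader(io.StringIO(european), delimiter=";"))
print(rows2[1])  # ['Acme', '1000']
```

The lesson transfers directly to the Read CSV activity: always set the Delimiter property explicitly and rely on the parser's quoting rules rather than splitting lines by hand.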

Email Processing

Reading and Parsing Emails

Activity: Get Outlook Mail Messages
├── Account: "invoices@company.com"
├── Filter: "[Subject] = 'Invoice' AND [ReceivedTime] >= '" &
            DateTime.Today.ToString("MM/dd/yyyy") & "'"
├── Top: 50
└── Output: emailMessages

For Each email In emailMessages
    ' Extract sender
    senderEmail = email.From.Address

    ' Parse body for data
    bodyText = email.Body
    invoiceNumber = ExtractWithRegex(bodyText, "Invoice\s*#?\s*:?\s*(\w+)")

    ' Save attachments
    For Each attachment In email.Attachments
        attachment.SaveAsFile($"C:\Incoming\{attachment.Name}")
    Next

    ' Mark as processed
    email.MarkAsRead()
    email.Move(processedFolder)
Next

Sending Formatted Emails

' Build HTML body
htmlBody = $"
<html>
<body>
    <h2>Daily Processing Report</h2>
    <p>Processing completed at {DateTime.Now:g}</p>
    <table border='1'>
        <tr><th>Metric</th><th>Value</th></tr>
        <tr><td>Processed</td><td>{successCount}</td></tr>
        <tr><td>Failed</td><td>{failCount}</td></tr>
        <tr><td>Duration</td><td>{duration.TotalMinutes:F1} minutes</td></tr>
    </table>
</body>
</html>
"

Activity: Send Outlook Mail Message
├── To: "ops-team@company.com"
├── Subject: $"[RPA] Daily Report - {DateTime.Today:d}"
├── Body: htmlBody
├── IsBodyHtml: True
├── Attachments: {"C:\Reports\daily_log.xlsx"}
└── Priority: Normal

Integration Patterns

Pattern 1: ETL Pipeline

+--------------------------------------------------------------------+
|                      ETL Pipeline Pattern                          |
+--------------------------------------------------------------------+
|                                                                    |
|  +----------+   +----------+   +----------+   +----------+         |
|  | EXTRACT  |-->|TRANSFORM |-->|   LOAD   |-->| ARCHIVE  |         |
|  +----------+   +----------+   +----------+   +----------+         |
|       |              |              |              |                |
|       v              v              v              v                |
|   - PDF OCR      - Validate     - SQL Insert   - Move files        |
|   - Email parse  - Cleanse      - API POST     - Update log        |
|   - Excel read   - Enrich       - Excel write  - Notify            |
|   - API GET      - Calculate    - Email send                       |
|                                                                    |
+--------------------------------------------------------------------+

Pattern 2: Data Reconciliation

' Load data from two sources
systemAData = QueryDatabase("SELECT * FROM Sales WHERE Date = @Date", systemA)
systemBData = ExcelReadRange("daily_report.xlsx", "Sales", "")

' Find discrepancies
discrepancies = (
    From a In systemAData.AsEnumerable()
    Join b In systemBData.AsEnumerable()
    On a("OrderID").ToString() Equals b("Order ID").ToString()
    Where a.Field(Of Decimal)("Amount") <> b.Field(Of Decimal)("Amount")
    Select New With {
        .OrderID = a("OrderID"),
        .SystemA_Amount = a("Amount"),
        .SystemB_Amount = b("Amount"),
        .Difference = a.Field(Of Decimal)("Amount") - b.Field(Of Decimal)("Amount")
    }
).ToList()

' Report discrepancies
If discrepancies.Any() Then
    SendAlertEmail(discrepancies)
    LogToDatabase(discrepancies)
End If
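The join-and-compare at the heart of this pattern can be sketched without LINQ: index one side by its key, then walk the other side looking for amount mismatches (the records below are invented; note the two systems even name the key differently, as in the query above):

```python
# Two hypothetical extracts of the same day's sales
system_a = [{"OrderID": "1001", "Amount": 250.0},
            {"OrderID": "1002", "Amount": 99.0}]
system_b = [{"Order ID": "1001", "Amount": 250.0},
            {"Order ID": "1002", "Amount": 90.0}]

# Index system B by key, then inner-join and keep only mismatches
b_by_id = {r["Order ID"]: r for r in system_b}
discrepancies = [
    {"OrderID": a["OrderID"],
     "SystemA_Amount": a["Amount"],
     "SystemB_Amount": b_by_id[a["OrderID"]]["Amount"],
     "Difference": a["Amount"] - b_by_id[a["OrderID"]]["Amount"]}
    for a in system_a
    if a["OrderID"] in b_by_id
    and a["Amount"] != b_by_id[a["OrderID"]]["Amount"]
]
print(discrepancies)
# [{'OrderID': '1002', 'SystemA_Amount': 99.0, 'SystemB_Amount': 90.0, 'Difference': 9.0}]
```

For real money amounts, prefer exact decimal types and an explicit tolerance over `!=` on floats; the sketch keeps floats only for brevity.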

Error Handling for Data Integration

Try
    ' Attempt database operation
    result = ExecuteQuery(sql, parameters)

Catch ex As SqlException When ex.Number = 1205
    ' Deadlock - retry
    Log.Warn("Deadlock detected, retrying...")
    Delay(1000)
    result = ExecuteQuery(sql, parameters)

Catch ex As SqlException When ex.Number = -2
    ' Timeout - increase and retry
    Log.Warn("Query timeout, increasing timeout and retrying...")
    command.CommandTimeout = 120
    result = ExecuteQuery(sql, parameters)

Catch ex As IOException
    ' File locked or inaccessible
    Log.Error($"File I/O error: {ex.Message}")
    Throw New SystemException("Cannot access required file", ex)

Catch ex As Exception
    Log.Error($"Unexpected error: {ex.Message}")
    Throw
End Try
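The structure of that handler, retry only the transient error classes, let everything else propagate, generalizes beyond SQL. A compact Python sketch with a stand-in exception type (all names here are hypothetical):

```python
import time

class DeadlockError(Exception):
    """Stand-in for a transient error such as SqlException number 1205."""

def with_retry(operation, retries=3, delay=0.01):
    """Retry only on the transient error class; let everything else raise."""
    for attempt in range(retries):
        try:
            return operation()
        except DeadlockError:
            if attempt == retries - 1:
                raise          # out of retries: surface the error
            time.sleep(delay)  # back off before the next attempt

calls = {"n": 0}
def flaky_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise DeadlockError("chosen as deadlock victim")
    return "rows"

result = with_retry(flaky_query)
print(result)  # 'rows', after two simulated deadlocks
```

The important asymmetry survives translation: transient failures are retried with backoff, while unexpected exceptions are re-thrown immediately so they reach the framework's error handling.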

Key Takeaways

  1. PDFs require different strategies for native vs scanned documents.
  2. Excel is powerful beyond simple read/write - master LINQ for manipulation.
  3. Databases scale better than files for high-volume, concurrent access.
  4. Use transactions for data integrity in multi-step database operations.
  5. Archive everything: source files, processed data, logs.
  6. Handle file locks gracefully - files being written can’t be read.
  7. Email parsing requires robust RegEx for varying formats.

Data integration is where RPA stops being a toy and becomes enterprise infrastructure.