Building a Production-Ready OCR Module in Blazor Using Tesseract .NET

Introduction

Optical Character Recognition (OCR) is a powerful capability that enables applications to extract text from images and scanned documents. While building a simple OCR demo is relatively easy, developing a production-ready solution requires careful attention to reliability, security, performance, and maintainability.

In the .NET ecosystem, one of the most practical ways to implement OCR is by using the Tesseract .NET wrapper, which provides a managed interface over the native Tesseract OCR engine. When combined with a modern UI framework like Blazor, you can build a clean, end-to-end solution entirely in C#.

This article walks through a complete, production-grade implementation of OCR in a Blazor application. It covers:

Clean architecture design
File validation and safety
Logging and observability
Rate limiting and protection
Exception handling
Cancellation support
Deployment considerations

The goal is not just to “make OCR work,” but to build a solution you can confidently deploy in real-world applications.

Why Use Server-Side Processing in Blazor

Although Blazor supports both WebAssembly and server-based models, OCR processing is best handled on the server side.

The reason is simple: the Tesseract wrapper loads platform-specific native libraries and reads language data files from disk. Neither fits the browser sandbox that Blazor WebAssembly runs in.

A better architecture is:

User uploads an image via the UI
The server validates the input
The server processes OCR
Results are returned to the UI

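This approach keeps the native dependencies and language data on the server, where input can be validated and resource usage controlled, which benefits performance, security, and maintainability.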

Application Architecture Overview

The solution is structured into clear layers:

UI Layer (Blazor Page) → Handles file upload and display
Service Layer (OCR Service) → Executes OCR logic
Configuration Layer (Options) → Controls behavior via settings
Infrastructure (Logging, Rate Limiting) → Handles operational concerns

Prerequisites

Before starting:

Create a Blazor Web App
Install the Tesseract .NET wrapper (published on NuGet as the Tesseract package)
Add a tessdata folder with language files (e.g., eng.traineddata)
Ensure required native runtime dependencies are installed

Project Structure

			
BlazorTesseractOcr/
 ├─ Components/
 │   └─ Pages/
 │       └─ Ocr.razor
 ├─ Models/
 │   └─ OcrResult.cs
 ├─ Options/
 │   └─ OcrOptions.cs
 ├─ Services/
 │   ├─ IOcrService.cs
 │   └─ TesseractOcrService.cs
 ├─ Extensions/
 │   └─ FileValidationExtensions.cs
 ├─ tessdata/
 ├─ appsettings.json
 └─ Program.cs

		

Configuration (appsettings.json)

			
{
  "Ocr": {
    "TessdataPath": "tessdata",
    "Language": "eng",
    "MaxUploadSizeBytes": 5242880,
    "AllowedExtensions": [ ".png", ".jpg", ".jpeg", ".bmp" ],
    "TempFolder": "App_Data/Uploads",
    "MaxTextLength": 50000
  }
}

		

OCR Options Class

			
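public sealed class OcrOptions
{
    // Folder containing the *.traineddata files, relative to the content root.
    public string TessdataPath { get; set; } = "tessdata";
    // Tesseract language code, e.g. "eng".
    public string Language { get; set; } = "eng";
    // Upper bound for uploads; the default mirrors appsettings.json (5 MB).
    public long MaxUploadSizeBytes { get; set; } = 5 * 1024 * 1024;
    public string[] AllowedExtensions { get; set; } = Array.Empty<string>();
    // Folder for temporary upload files, relative to the content root.
    public string TempFolder { get; set; } = "App_Data/Uploads";
    // Extracted text is truncated to this many characters.
    public int MaxTextLength { get; set; } = 50_000;
}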

		

OCR Result Model

			
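public sealed class OcrResult
{
    public bool Success { get; init; }
    public string Text { get; init; } = "";
    // Mean confidence reported by the wrapper, in the range 0.0 to 1.0.
    public float Confidence { get; init; }
    // User-safe error message; set only when Success is false.
    public string? Error { get; init; }
}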

		

OCR Service Interface

			
public interface IOcrService
{
    Task<OcrResult> ExtractTextAsync(Stream stream, string fileName, CancellationToken token);
}

Core OCR Service (Production Ready)

			
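using Microsoft.Extensions.Options;
using Tesseract;

public class TesseractOcrService : IOcrService
{
    private readonly ILogger<TesseractOcrService> _logger;
    private readonly OcrOptions _options;
    private readonly IWebHostEnvironment _env;

    public TesseractOcrService(
        ILogger<TesseractOcrService> logger,
        IOptions<OcrOptions> options,
        IWebHostEnvironment env)
    {
        _logger = logger;
        _options = options.Value;
        _env = env;
    }

    public async Task<OcrResult> ExtractTextAsync(Stream stream, string fileName, CancellationToken token)
    {
        if (stream == null || string.IsNullOrWhiteSpace(fileName))
            return new OcrResult { Success = false, Error = "Invalid input" };

        // Only the extension of the client-supplied name is used; the name itself never reaches the file system.
        var extension = Path.GetExtension(fileName);
        if (!_options.AllowedExtensions.Contains(extension, StringComparer.OrdinalIgnoreCase))
            return new OcrResult { Success = false, Error = "Unsupported file type" };

        var tempPath = Path.Combine(_env.ContentRootPath, _options.TempFolder);
        Directory.CreateDirectory(tempPath);
        var filePath = Path.Combine(tempPath, Guid.NewGuid() + extension);

        try
        {
            await using (var fs = File.Create(filePath))
                await stream.CopyToAsync(fs, token);

            // Enforce the configured size limit on the server as well as in the UI.
            if (new FileInfo(filePath).Length > _options.MaxUploadSizeBytes)
                return new OcrResult { Success = false, Error = "File too large" };

            var tessPath = Path.Combine(_env.ContentRootPath, _options.TessdataPath);

            // Tesseract is synchronous and CPU-bound, so run it on the thread pool.
            return await Task.Run(() =>
            {
                token.ThrowIfCancellationRequested();
                using var engine = new TesseractEngine(tessPath, _options.Language);
                using var img = Pix.LoadFromFile(filePath);
                using var page = engine.Process(img);
                var text = page.GetText() ?? "";
                var confidence = page.GetMeanConfidence();
                if (text.Length > _options.MaxTextLength)
                    text = text.Substring(0, _options.MaxTextLength);
                return new OcrResult { Success = true, Text = text, Confidence = confidence };
            }, token);
        }
        catch (OperationCanceledException)
        {
            // Cancellation is not a failure, so it is not logged as an error.
            return new OcrResult { Success = false, Error = "OCR was cancelled" };
        }
        catch (Exception ex)
        {
            // Log details on the server; return only a generic message to the user.
            _logger.LogError(ex, "OCR failed for an upload with extension {Extension}", extension);
            return new OcrResult { Success = false, Error = "OCR failed" };
        }
        finally
        {
            // Always remove the temporary file, whether OCR succeeded or not.
            if (File.Exists(filePath))
                File.Delete(filePath);
        }
    }
}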
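
The service and the options class above still have to be registered with dependency injection before the page can use them. A minimal sketch of the relevant Program.cs lines follows, assuming the default Blazor Web App template; the scoped lifetime is an assumption made for this example, not something the article prescribes.

```csharp
// Program.cs (fragment): the rest of the template's setup is omitted.
var builder = WebApplication.CreateBuilder(args);

// Bind the "Ocr" section of appsettings.json to OcrOptions.
builder.Services.Configure<OcrOptions>(builder.Configuration.GetSection("Ocr"));

// Assumption: scoped lifetime. The service keeps no state between calls,
// so other lifetimes would also work.
builder.Services.AddScoped<IOcrService, TesseractOcrService>();
```

With these two registrations in place, IOptions<OcrOptions> and IOcrService can be injected into the service constructor and the Blazor page.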

		

Rate Limiting Configuration

			
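// At the top of Program.cs
using Microsoft.AspNetCore.RateLimiting;

builder.Services.AddRateLimiter(options =>
{
    // Fixed window: at most 10 requests per minute, with up to 2 queued.
    options.AddFixedWindowLimiter("ocr", opt =>
    {
        opt.PermitLimit = 10;
        opt.Window = TimeSpan.FromMinutes(1);
        opt.QueueLimit = 2;
    });
});

// After builder.Build(): enable the middleware. The named "ocr" policy only
// applies to endpoints that opt in, e.g. via RequireRateLimiting("ocr").
app.UseRateLimiter();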

		

Blazor UI Page

			
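@page "/ocr"
@* Assumption: a .NET 8+ Blazor Web App with interactive server rendering enabled. *@
@rendermode InteractiveServer
@using Microsoft.Extensions.Options
@implements IDisposable
@inject IOcrService OcrService
@inject IOptions<OcrOptions> Options

<InputFile OnChange="UploadFile" />
@if (result != null)
{
    if (result.Success)
    {
        <p>Confidence: @result.Confidence.ToString("P0")</p>
        <textarea>@result.Text</textarea>
    }
    else
    {
        <p style="color:red">@result.Error</p>
    }
}

@code {
    private readonly CancellationTokenSource cts = new();
    private OcrResult? result;

    async Task UploadFile(InputFileChangeEventArgs e)
    {
        var file = e.File;
        var maxBytes = Options.Value.MaxUploadSizeBytes;
        // OpenReadStream throws when the file exceeds the limit, so check the reported size first.
        if (file.Size > maxBytes)
        {
            result = new OcrResult { Success = false, Error = "File is too large" };
            return;
        }
        await using var stream = file.OpenReadStream(maxBytes);
        result = await OcrService.ExtractTextAsync(stream, file.Name, cts.Token);
    }

    // Cancel any in-flight OCR when the user leaves the page.
    public void Dispose() { cts.Cancel(); cts.Dispose(); }
}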

		

How the System Works

User uploads a file
UI validates size
File is streamed to server
Temporary file is created
OCR engine processes image
Text + confidence returned
Temp file deleted

Production Readiness Considerations

Logging

Track failures and performance
Avoid logging sensitive OCR content

Exception Handling

Always catch and log errors
Return safe messages to users

Rate Limiting

Prevent abuse and overload
Protect OCR resources
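
One caveat is worth keeping in mind: the rate limiting middleware works per HTTP request, while events from an interactive server-rendered Blazor page travel over a long-lived connection. An in-process guard around the OCR call can therefore be a useful complement. The sketch below limits concurrency rather than request rate, which is a different technique from the fixed window limiter; OcrConcurrencyGate and TryRunAsync are hypothetical names introduced for this example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: caps how many OCR jobs run at once in this process.
public sealed class OcrConcurrencyGate
{
    private readonly SemaphoreSlim _slots;

    public OcrConcurrencyGate(int maxConcurrent) =>
        _slots = new SemaphoreSlim(maxConcurrent, maxConcurrent);

    // Runs the job if a slot is free; otherwise returns busyResult immediately.
    public async Task<T> TryRunAsync<T>(Func<Task<T>> job, T busyResult)
    {
        // Wait(0) never blocks: it only reports whether a slot was available.
        if (!_slots.Wait(0))
            return busyResult;

        try
        {
            return await job();
        }
        finally
        {
            // Free the slot even when the job throws.
            _slots.Release();
        }
    }
}
```

Registered as a singleton, such a gate could wrap the call to ExtractTextAsync and return an OcrResult with a "server busy" message when every slot is taken.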

File Security

Validate extensions
Limit file size
Use safe temp storage
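
An extension check alone trusts the file name supplied by the client. A common complement is to compare the first bytes of the file against the well-known signature of the claimed format. The project structure lists FileValidationExtensions.cs without showing it, so the following is only a sketch of what such a file might contain; the method name HasExpectedSignature is an assumption, and the stream must be seekable (for example, the temporary file rather than the browser upload stream).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical contents for FileValidationExtensions.cs.
public static class FileValidationExtensions
{
    // Leading "magic bytes" of the image formats allowed in appsettings.json.
    private static readonly Dictionary<string, byte[]> Signatures =
        new(StringComparer.OrdinalIgnoreCase)
        {
            [".png"]  = new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A },
            [".jpg"]  = new byte[] { 0xFF, 0xD8, 0xFF },
            [".jpeg"] = new byte[] { 0xFF, 0xD8, 0xFF },
            [".bmp"]  = new byte[] { 0x42, 0x4D },
        };

    // True when the stream starts with the signature expected for the extension.
    // Returns false for non-seekable streams and unknown extensions.
    public static bool HasExpectedSignature(this Stream stream, string extension)
    {
        if (!stream.CanSeek || !Signatures.TryGetValue(extension, out var signature))
            return false;

        var header = new byte[signature.Length];
        stream.Position = 0;
        int read = 0;
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break; // end of stream
            read += n;
        }
        stream.Position = 0; // rewind so the caller can still read the content

        if (read < signature.Length)
            return false;
        for (int i = 0; i < signature.Length; i++)
            if (header[i] != signature[i])
                return false;
        return true;
    }
}
```

In the OCR service, a check like this could run on the temporary file before it is handed to the engine, rejecting uploads whose content does not match their extension.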

Cancellation Support

Allow users to cancel long-running OCR tasks
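
Cancellation can come from the user or from a server-side time budget. One way to combine both is a linked token source with a timeout, sketched below. OcrCancellation and RunWithTimeoutAsync are hypothetical names for this example, and any timeout value is an assumption to tune per workload.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class OcrCancellation
{
    // Runs the operation with a token that fires when the user cancels
    // OR when the timeout elapses, whichever happens first.
    public static async Task<T> RunWithTimeoutAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        TimeSpan timeout,
        CancellationToken userToken)
    {
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(userToken);
        linked.CancelAfter(timeout);
        return await operation(linked.Token);
    }
}
```

A page could keep its own CancellationTokenSource, call Cancel from a Cancel button, and pass that source's token as userToken, with the call to ExtractTextAsync supplied as the operation.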

Deployment

Ensure tessdata exists
Ensure native dependencies are installed
Ensure file permissions are correct
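
The first of these checks can be automated so that a missing language file surfaces at startup rather than on the first upload. The sketch below uses only file system calls. TessdataCheck and FindMissing are hypothetical names, and the '+' separator follows Tesseract's convention for combining languages such as "eng+deu".

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class TessdataCheck
{
    // Returns the expected *.traineddata files that are missing from tessdataPath.
    public static IReadOnlyList<string> FindMissing(string tessdataPath, string language)
    {
        var missing = new List<string>();
        foreach (var code in language.Split('+', StringSplitOptions.RemoveEmptyEntries))
        {
            var file = Path.Combine(tessdataPath, code.Trim() + ".traineddata");
            if (!File.Exists(file))
                missing.Add(file);
        }
        return missing;
    }
}
```

Calling FindMissing from Program.cs with the configured TessdataPath and Language, and logging or failing fast when the list is not empty, turns a deployment mistake into an immediate and readable error.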

Possible Enhancements

Multi-language OCR
PDF support (convert to images first)
Background processing
Database storage
AI-based post-processing

Conclusion

Building OCR into a modern web application is more than just calling a library; it requires a thoughtful architecture that balances performance, security, and reliability.

By combining the Tesseract .NET wrapper with Blazor, you can create a clean, maintainable, and fully .NET-based solution.

The implementation shown here provides a strong production foundation. It demonstrates how to safely handle file uploads, execute OCR reliably, manage system resources, and protect your application from misuse.

From here, you can extend the system to support more complex workflows such as document processing pipelines, AI enrichment, or enterprise-grade document management systems.