Introduction
Optical Character Recognition (OCR) is a powerful capability that enables applications to extract text from images and scanned documents. While building a simple OCR demo is relatively easy, developing a production-ready solution requires careful attention to reliability, security, performance, and maintainability.
In the .NET ecosystem, one of the most practical ways to implement OCR is by using the Tesseract .NET wrapper, which provides a managed interface over the native Tesseract OCR engine. When combined with a modern UI framework like Blazor, you can build a clean, end-to-end solution entirely in C#.
This article walks through a complete, production-grade implementation of OCR in a Blazor application. It covers:
- Clean architecture design
- File validation and safety
- Logging and observability
- Rate limiting and protection
- Exception handling
- Cancellation support
- Deployment considerations
The goal is not just to “make OCR work,” but to build a solution you can confidently deploy in real-world applications.
Why Use Server-Side Processing in Blazor
Although Blazor supports both WebAssembly and server-based models, OCR processing is best handled on the server side.
The reason is simple: the Tesseract wrapper depends on native libraries and language data files. These dependencies are not well-suited for browser execution.
A better architecture is:
- User uploads an image via the UI
- The server validates the input
- The server processes OCR
- Results are returned to the UI
This approach keeps the native binaries and language data on the server, where input validation, logging, and rate limiting can all be applied in one place.
Application Architecture Overview
The solution is structured into clear layers:
- UI Layer (Blazor Page) → Handles file upload and display
- Service Layer (OCR Service) → Executes OCR logic
- Configuration Layer (Options) → Controls behavior via settings
- Infrastructure (Logging, Rate Limiting) → Handles operational concerns
Prerequisites
Before starting:
- Create a Blazor Web App
- Install the Tesseract .NET package
- Add a tessdata folder with language files (e.g., eng.traineddata)
- Ensure required native runtime dependencies are installed
Project Structure
    BlazorTesseractOcr/
    ├─ Components/
    │  └─ Pages/
    │     └─ Ocr.razor
    ├─ Models/
    │  └─ OcrResult.cs
    ├─ Options/
    │  └─ OcrOptions.cs
    ├─ Services/
    │  ├─ IOcrService.cs
    │  └─ TesseractOcrService.cs
    ├─ Extensions/
    │  └─ FileValidationExtensions.cs
    ├─ tessdata/
    ├─ appsettings.json
    └─ Program.cs
Configuration (appsettings.json)
    {
      "Ocr": {
        "TessdataPath": "tessdata",
        "Language": "eng",
        "MaxUploadSizeBytes": 5242880,
        "AllowedExtensions": [ ".png", ".jpg", ".jpeg", ".bmp" ],
        "TempFolder": "App_Data/Uploads",
        "MaxTextLength": 50000
      }
    }
OCR Options Class
    public sealed class OcrOptions
    {
        public string TessdataPath { get; set; } = "tessdata";
        public string Language { get; set; } = "eng";
        public long MaxUploadSizeBytes { get; set; }
        public string[] AllowedExtensions { get; set; } = Array.Empty<string>();
        public string TempFolder { get; set; } = "";
        public int MaxTextLength { get; set; }
    }
OCR Result Model
    public sealed class OcrResult
    {
        public bool Success { get; init; }
        public string Text { get; init; } = "";
        public float Confidence { get; init; }
        public string? Error { get; init; }
    }
OCR Service Interface
    public interface IOcrService
    {
        Task<OcrResult> ExtractTextAsync(Stream stream, string fileName, CancellationToken token);
    }
Core OCR Service (Production Ready)
    using Microsoft.Extensions.Options;
    using Tesseract;

    public class TesseractOcrService : IOcrService
    {
        private readonly ILogger<TesseractOcrService> _logger;
        private readonly OcrOptions _options;
        private readonly IWebHostEnvironment _env;

        public TesseractOcrService(
            ILogger<TesseractOcrService> logger,
            IOptions<OcrOptions> options,
            IWebHostEnvironment env)
        {
            _logger = logger;
            _options = options.Value;
            _env = env;
        }

        public async Task<OcrResult> ExtractTextAsync(Stream stream, string fileName, CancellationToken token)
        {
            if (stream == null || string.IsNullOrWhiteSpace(fileName))
                return new OcrResult { Success = false, Error = "Invalid input" };

            // Only extensions on the configured allow-list are accepted.
            var extension = Path.GetExtension(fileName);
            if (!_options.AllowedExtensions.Contains(extension, StringComparer.OrdinalIgnoreCase))
                return new OcrResult { Success = false, Error = "Unsupported file type" };

            var tempPath = Path.Combine(_env.ContentRootPath, _options.TempFolder);
            Directory.CreateDirectory(tempPath);

            // A random name keeps the user-supplied file name off the disk.
            var filePath = Path.Combine(tempPath, Guid.NewGuid() + extension);

            try
            {
                await using (var fs = File.Create(filePath))
                {
                    await stream.CopyToAsync(fs, token);
                }

                var tessPath = Path.Combine(_env.ContentRootPath, _options.TessdataPath);

                // Tesseract is synchronous and CPU-bound, so run it on the thread pool
                // instead of blocking the caller.
                return await Task.Run(() =>
                {
                    using var engine = new TesseractEngine(tessPath, _options.Language, EngineMode.Default);
                    using var img = Pix.LoadFromFile(filePath);
                    using var page = engine.Process(img);

                    var text = page.GetText() ?? "";
                    var confidence = page.GetMeanConfidence();

                    // Cap the result so one large scan cannot flood the UI.
                    if (text.Length > _options.MaxTextLength)
                        text = text.Substring(0, _options.MaxTextLength);

                    return new OcrResult { Success = true, Text = text, Confidence = confidence };
                }, token);
            }
            catch (OperationCanceledException)
            {
                // Cancellation is expected behavior, so it is not logged as an error.
                _logger.LogInformation("OCR cancelled");
                return new OcrResult { Success = false, Error = "Operation cancelled" };
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "OCR failed");
                return new OcrResult { Success = false, Error = "OCR failed" };
            }
            finally
            {
                // Always remove the temporary upload, even when OCR fails.
                if (File.Exists(filePath))
                    File.Delete(filePath);
            }
        }
    }
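The service above takes an `IOptions<OcrOptions>` and is consumed through `IOcrService`, so both need to be registered at startup. The article does not show that step. What follows is a minimal sketch of how it might look in Program.cs, assuming the "Ocr" section name from appsettings.json and a scoped lifetime (the lifetime is an assumption, not something the article specifies).

```csharp
// Program.cs (excerpt): hypothetical wiring, not taken from the original article.
var builder = WebApplication.CreateBuilder(args);

// Bind the "Ocr" section of appsettings.json to OcrOptions.
builder.Services.Configure<OcrOptions>(builder.Configuration.GetSection("Ocr"));

// Scoped is an assumed lifetime. The service keeps no per-request state,
// so a singleton registration would also work.
builder.Services.AddScoped<IOcrService, TesseractOcrService>();

// ... the rest of the Blazor Web App template code stays unchanged.
```

With this in place, a component can receive the service through `@inject IOcrService OcrService`.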
Rate Limiting Configuration
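    using Microsoft.AspNetCore.RateLimiting;

    builder.Services.AddRateLimiter(options =>
    {
        // At most 10 requests per minute, with up to 2 more waiting in a queue.
        options.AddFixedWindowLimiter("ocr", opt =>
        {
            opt.PermitLimit = 10;
            opt.Window = TimeSpan.FromMinutes(1);
            opt.QueueLimit = 2;
        });
    });

    // The policy only takes effect once the middleware is added with
    // app.UseRateLimiter() and an endpoint opts in to the "ocr" policy.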
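The rate limiting middleware enforces this policy on HTTP endpoints that opt in to it. With interactive server rendering, the upload handler is invoked over the existing SignalR connection rather than as a separate HTTP request, so the "ocr" policy may never see those calls. One possible way to apply the same limits inside the app is to use the `FixedWindowRateLimiter` type from System.Threading.RateLimiting directly. The sketch below assumes that approach. `OcrThrottle` and `TryEnter` are hypothetical names, not part of the original article.

```csharp
using System;
using System.Threading.RateLimiting;

// Hypothetical helper (not part of the original article): an in-process
// fixed-window limiter using the same numbers as the "ocr" policy.
public sealed class OcrThrottle : IDisposable
{
    private readonly FixedWindowRateLimiter _limiter;

    public OcrThrottle(int permitLimit = 10, TimeSpan? window = null)
    {
        _limiter = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
        {
            PermitLimit = permitLimit,
            Window = window ?? TimeSpan.FromMinutes(1),
            QueueLimit = 0,            // reject immediately instead of queueing
            AutoReplenishment = true   // the limiter resets the window on its own
        });
    }

    // Returns true when the caller may start one OCR job in the current window.
    public bool TryEnter()
    {
        // Fixed-window permits are only restored when the window elapses,
        // so disposing the lease does not hand the permit back.
        using var lease = _limiter.AttemptAcquire(1);
        return lease.IsAcquired;
    }

    public void Dispose() => _limiter.Dispose();
}
```

Registered as a scoped service, each circuit gets its own limit. Registered as a singleton, the limit applies to the whole application. The upload handler would call `TryEnter()` first and show a "too many requests" message when it returns false.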
Blazor UI Page
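    @* The route and render mode below are assumptions; adjust them to your app. *@
    @page "/ocr"
    @rendermode InteractiveServer
    @inject IOcrService OcrService

    <InputFile OnChange="UploadFile" />

    @if (result != null)
    {
        if (result.Success)
        {
            <p>Confidence: @result.Confidence</p>
            <textarea>@result.Text</textarea>
        }
        else
        {
            <p style="color:red">@result.Error</p>
        }
    }

    @code {
        private const long MaxBytes = 5 * 1024 * 1024;
        private OcrResult? result;

        async Task UploadFile(InputFileChangeEventArgs e)
        {
            var file = e.File;

            // Check the size up front; OpenReadStream throws if the limit is exceeded.
            if (file.Size > MaxBytes)
            {
                result = new OcrResult { Success = false, Error = "File is too large" };
                return;
            }

            await using var stream = file.OpenReadStream(MaxBytes);
            result = await OcrService.ExtractTextAsync(stream, file.Name, CancellationToken.None);
        }
    }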
How the System Works
- User uploads a file
- UI validates size
- File is streamed to server
- Temporary file is created
- OCR engine processes image
- Text + confidence returned
- Temp file deleted
Production Readiness Considerations
Logging
- Track failures and performance
- Avoid logging sensitive OCR content
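One way to follow both points is to wrap the OCR service in a decorator that records timing and result metadata, but never the recognized text or the file name. The sketch below assumes the `IOcrService` and `OcrResult` types from this article. `TimedOcrService` is a hypothetical name, and how the decorator is registered in the container is left open.

```csharp
using System.Diagnostics;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

// Hypothetical decorator (not part of the original article).
public sealed class TimedOcrService : IOcrService
{
    private readonly IOcrService _inner;
    private readonly ILogger<TimedOcrService> _logger;

    public TimedOcrService(IOcrService inner, ILogger<TimedOcrService> logger)
    {
        _inner = inner;
        _logger = logger;
    }

    public async Task<OcrResult> ExtractTextAsync(Stream stream, string fileName, CancellationToken token)
    {
        var sw = Stopwatch.StartNew();
        var result = await _inner.ExtractTextAsync(stream, fileName, token);
        sw.Stop();

        // Only metadata is logged: duration, outcome, text length, confidence.
        _logger.LogInformation(
            "OCR finished in {ElapsedMs} ms (success={Success}, chars={Length}, confidence={Confidence})",
            sw.ElapsedMilliseconds, result.Success, result.Text.Length, result.Confidence);

        return result;
    }
}
```

Structured placeholders such as `{ElapsedMs}` keep the values queryable in log sinks that support structured logging.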
Exception Handling
- Always catch and log errors
- Return safe messages to users
Rate Limiting
- Prevent abuse and overload
- Protect OCR resources
File Security
- Validate extensions
- Limit file size
- Use safe temp storage
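Extension checks alone trust the file name, which the user controls. A common complement is to compare the first bytes of the upload against the known signature of the claimed format. The project structure lists Extensions/FileValidationExtensions.cs without showing its contents, so the following is only a sketch of what such a helper might contain. The method name `HasValidImageSignature` is an assumption.

```csharp
using System;

// Hypothetical contents for Extensions/FileValidationExtensions.cs.
// The original article lists the file but does not show it.
public static class FileValidationExtensions
{
    // Well-known leading bytes ("magic numbers") of each allowed format.
    private static readonly byte[] Png = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };
    private static readonly byte[] Jpeg = { 0xFF, 0xD8, 0xFF };
    private static readonly byte[] Bmp = { 0x42, 0x4D };

    // True when the header bytes match the format implied by the extension.
    public static bool HasValidImageSignature(this byte[] header, string extension)
    {
        switch (extension.ToLowerInvariant())
        {
            case ".png": return StartsWith(header, Png);
            case ".jpg":
            case ".jpeg": return StartsWith(header, Jpeg);
            case ".bmp": return StartsWith(header, Bmp);
            default: return false; // unknown extensions are rejected
        }
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (var i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

In the OCR service, the first eight bytes of the temporary file could be read and passed to this check before the image is handed to Tesseract.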
Cancellation Support
- Allow users to cancel long-running OCR tasks
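A small helper can combine a user-driven cancellation token with a time limit, so an OCR request stops either when the user cancels or when it runs too long. This is a sketch under assumptions: `OcrCancellation` and `RunWithTimeoutAsync` are hypothetical names, and the timeout value is left to the caller.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of the original article).
public static class OcrCancellation
{
    // Runs the work with a token that fires on user cancellation or on timeout,
    // whichever comes first.
    public static async Task<T> RunWithTimeoutAsync<T>(
        Func<CancellationToken, Task<T>> work,
        TimeSpan timeout,
        CancellationToken userToken)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(userToken);
        cts.CancelAfter(timeout);
        return await work(cts.Token);
    }
}
```

In the page, a `CancellationTokenSource` field could supply `userToken`, with a Cancel button calling `Cancel()` on it, and the work delegate would be `t => OcrService.ExtractTextAsync(stream, file.Name, t)`. As far as I know, the wrapper's `Process` call does not accept a token, so cancellation would mainly take effect while the upload is being copied or before recognition starts.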
Deployment
- Ensure the tessdata folder exists
- Ensure native dependencies are installed
- Ensure file permissions are correct
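The first of these checks can be automated so that a missing language file is caught at startup rather than on the first OCR request. The sketch below assumes the `TessdataPath` and `Language` settings from this article. `TessdataCheck` is a hypothetical helper name. It also handles Tesseract's `+` syntax for combined languages, such as "eng+deu".

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical startup check (not part of the original article).
public static class TessdataCheck
{
    // True when every language in the setting has a .traineddata file on disk.
    public static bool LanguageDataExists(string contentRoot, string tessdataPath, string language)
    {
        var dir = Path.Combine(contentRoot, tessdataPath);

        // "eng+deu" needs both eng.traineddata and deu.traineddata.
        var parts = language.Split('+', StringSplitOptions.RemoveEmptyEntries);

        return parts.Length > 0
            && parts.All(p => File.Exists(Path.Combine(dir, p + ".traineddata")));
    }
}
```

Calling this once after the app is built, and throwing an `InvalidOperationException` when it returns false, would turn a misconfigured deployment into an immediate and clearly reported failure.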
Possible Enhancements
- Multi-language OCR
- PDF support (convert to images first)
- Background processing
- Database storage
- AI-based post-processing
Conclusion
Building OCR into a modern web application is more than just calling a library—it requires a thoughtful architecture that balances performance, security, and reliability.
By combining the Tesseract .NET wrapper with Blazor, you can create a clean, maintainable, and fully .NET-based solution.
The implementation shown here provides a strong production foundation. It demonstrates how to safely handle file uploads, execute OCR reliably, manage system resources, and protect your application from misuse.
From here, you can extend the system to support more complex workflows such as document processing pipelines, AI enrichment, or enterprise-grade document management systems.
