From Problem to Architecture
In the previous post, we explored why traditional keyword-based CV screening fails HR departments. Now let’s dive into how we built a solution using Microsoft Azure’s AI services and C#.
When designing Candidex, we needed to answer three fundamental questions:
- How do we extract structured data from unstructured CV documents?
- How do we make that data searchable in meaningful ways?
- How do we provide intelligent, conversational responses to HR queries?
Azure’s ecosystem provided answers to all three.
The Azure Architecture
```
+------------------+      +------------------+      +------------------+
|    CV Upload     |----->|    Azure Blob    |----->|     Document     |
|   (React App)    |      |     Storage      |      |   Intelligence   |
+------------------+      +------------------+      +--------+---------+
                                                             |
                                                             v
+------------------+      +------------------+      +------------------+
|     Chat UI      |<---->|   ASP.NET Core   |<---->| Azure Cognitive  |
|     (React)      |      |     Backend      |      |      Search      |
+------------------+      +--------+---------+      +------------------+
                                   |
                                   v
                          +------------------+
                          |    OpenAI API    |
                          |  (GPT-4 Turbo)   |
                          +------------------+
```
The Services We Chose
| Service | Purpose | Why We Chose It |
|---|---|---|
| Azure Blob Storage / Data Lake | Document storage | Scalable, integrates natively with other Azure services |
| Azure Document Intelligence | CV data extraction | Purpose-built for extracting structured data from documents |
| Azure Cognitive Search | Indexed search | Enterprise-grade search with filtering and ranking |
| OpenAI API | Intelligent responses | GPT-4’s reasoning for natural language interactions |
| ASP.NET Core | Backend API | Strong typing, excellent Azure SDK support |
The Tech Stack
Our C# implementation uses .NET 8 with the following key packages:
```xml
<ItemGroup>
  <PackageReference Include="Azure.AI.FormRecognizer" Version="4.1.0" />
  <PackageReference Include="Azure.AI.DocumentIntelligence" Version="1.0.0-beta.2" />
  <PackageReference Include="Azure.Search.Documents" Version="11.5.1" />
  <PackageReference Include="Azure.Storage.Blobs" Version="12.20.0" />
  <PackageReference Include="Azure.Storage.Files.DataLake" Version="12.18.0" />
  <PackageReference Include="OpenAI" Version="1.11.0" />
</ItemGroup>
```
Each package represents a piece of our processing pipeline:
- Storage SDKs handle document upload and retrieval
- Document Intelligence SDK extracts structured CV data
- Search SDK enables intelligent querying
- OpenAI SDK provides conversational AI capabilities
The Processing Pipeline
Step 1: Document Ingestion
When an HR professional uploads a CV, it lands in Azure Data Lake Storage:
```csharp
using Azure.Storage.Files.DataLake;

// Target the "documents" directory in the "landing" file system (container)
DataLakeServiceClient dataLakeServiceClient = new DataLakeServiceClient(_storageConnectionString);
DataLakeFileSystemClient fileSystemClient = dataLakeServiceClient.GetFileSystemClient("landing");
DataLakeDirectoryClient directoryClient = fileSystemClient.GetDirectoryClient("documents");
```
We chose Data Lake over simple Blob Storage for its hierarchical namespace—organizing CVs by date, department, or job opening becomes trivial.
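A minimal sketch of how such a hierarchy could be laid out: a hypothetical `BuildCvPath` helper that turns an upload date, a department, and the uploaded file name into a relative path like `2024/05/17/engineering/jane-doe.pdf`. The date/department layout is an illustrative assumption, not the scheme Candidex uses; the resulting path would name the file written under the `documents` directory.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the Candidex codebase): groups CVs by
// upload date and department, e.g. "2024/05/17/engineering/jane-doe.pdf".
static string BuildCvPath(DateTime uploadedAtUtc, string department, string fileName)
{
    // Normalize the department into a lowercase, URL-safe folder name
    string folder = Regex.Replace(department.Trim().ToLowerInvariant(), "[^a-z0-9]+", "-").Trim('-');
    if (folder.Length == 0) folder = "unassigned";

    // Keep only the file name so a client-supplied path cannot escape the folder
    string safeName = Path.GetFileName(fileName);

    // With the invariant culture, "/" in the format string renders as "/"
    string datePart = uploadedAtUtc.ToString("yyyy/MM/dd", CultureInfo.InvariantCulture);
    return $"{datePart}/{folder}/{safeName}";
}

Console.WriteLine(BuildCvPath(new DateTime(2024, 5, 17), "Engineering", "jane-doe.pdf"));
// prints 2024/05/17/engineering/jane-doe.pdf
```

Sanitizing the department and stripping any client-supplied directories keeps every upload inside its intended folder.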
Step 2: Intelligent Extraction
This is where the magic happens. Azure Document Intelligence uses a trained model to extract specific fields from each CV:
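```csharp
using Azure;
using Azure.AI.FormRecognizer.DocumentAnalysis;

public async Task<List<Dictionary<string, object>>> ProcessDocumentFromStream(Stream stream)
{
    // WaitUntil.Completed returns only after analysis has finished,
    // so the result is available immediately via operation.Value
    var operation = await _formRecognizerClient.AnalyzeDocumentAsync(
        WaitUntil.Completed,
        _modelId,
        stream
    );
    AnalyzeResult result = operation.Value;

    var candidates = new List<Dictionary<string, object>>();
    foreach (AnalyzedDocument form in result.Documents)
    {
        // DocumentField.Content holds the extracted text of each labeled field;
        // SafeGetField (not shown) handles fields that may be missing from a CV
        var data = new Dictionary<string, object>();
        data["Name"] = form.Fields["Name"].Content;
        data["Email"] = SafeGetField(form, "Email");
        data["ProfessionalSummary"] = form.Fields["ProfessionalSummary"].Content;
        data["TechnicalSkills"] = form.Fields["TechnicalSkills"].Content;
        data["JobRole"] = form.Fields["JobRole"].Content;
        data["Education"] = form.Fields["Education"].Content;
        // ... additional fields
        candidates.Add(data);
    }
    return candidates;
}
```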
The custom model knows where to find:
- Contact information (name, email, phone, LinkedIn, GitHub)
- Professional summary and current role
- Technical skills and certifications
- Work history with responsibilities
- Education and academic projects
- Languages spoken
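Fields such as technical skills often come back as a single run of text, while filtering by skill works best on a list of values. A minimal sketch of normalizing such a field before indexing, assuming the raw value is separated by commas, semicolons, line breaks, or bullets (an assumption about CV formatting, not something the model guarantees); `NormalizeSkills` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: turns a raw extracted skills string such as
// "C#, Python; Azure\nPython" into a de-duplicated list of skill names.
static List<string> NormalizeSkills(string? rawSkills)
{
    if (string.IsNullOrWhiteSpace(rawSkills)) return new List<string>();

    return rawSkills
        // Split on common separators, trim each entry, and drop empty ones
        .Split(new[] { ',', ';', '\n', '\r', '•' },
               StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
        // Case-insensitive de-duplication keeps the first spelling seen
        .Distinct(StringComparer.OrdinalIgnoreCase)
        .ToList();
}

Console.WriteLine(string.Join(" | ", NormalizeSkills("C#, Python; Azure\nPython")));
// prints C# | Python | Azure
```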
Step 3: Searchable Index
Extracted data feeds into Azure Cognitive Search, making it queryable:
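```csharp
using Azure.Search.Documents;
using Azure.Search.Documents.Models;

public async Task<IEnumerable<SearchResult<Dictionary<string, object>>>> ExecuteSearch(
    string query,
    string language)
{
    var searchOptions = new SearchOptions
    {
        IncludeTotalCount = true,
        // SearchFilter.Create quotes and escapes the interpolated value,
        // so input such as "O'Brien" cannot break (or inject into) the OData filter
        Filter = SearchFilter.Create($"Language eq {language}")
    };
    var response = await _searchClient.SearchAsync<Dictionary<string, object>>(query, searchOptions);

    // Enumerate asynchronously; GetResults() would fetch any further pages synchronously
    var results = new List<SearchResult<Dictionary<string, object>>>();
    await foreach (var result in response.Value.GetResultsAsync())
    {
        results.Add(result);
    }
    return results;
}
```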
The search index supports:
- Full-text search across all CV fields
- Filtering by language, skills, or experience level
- Ranking by relevance to the query
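Filtering by skill can be expressed with the OData `any` operator over a collection field. A minimal sketch, assuming `TechnicalSkills` is indexed as a filterable `Collection(Edm.String)` (the post does not show the index schema); `BuildSkillsFilter` and `ODataLiteral` are hypothetical helpers. Filter comparisons are exact matches, so it helps to normalize skill names consistently at indexing time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Escapes a value as an OData string literal: wrap in single quotes, double any inner quotes.
static string ODataLiteral(string value) => "'" + value.Replace("'", "''") + "'";

// Hypothetical helper: requires every requested skill to appear in TechnicalSkills, e.g.
// TechnicalSkills/any(s: s eq 'Python') and TechnicalSkills/any(s: s eq 'Azure')
static string? BuildSkillsFilter(IEnumerable<string> requiredSkills)
{
    var clauses = requiredSkills
        .Where(skill => !string.IsNullOrWhiteSpace(skill))
        .Select(skill => $"TechnicalSkills/any(s: s eq {ODataLiteral(skill.Trim())})")
        .ToList();

    // null means "no filter", which searches all candidates
    return clauses.Count == 0 ? null : string.Join(" and ", clauses);
}

Console.WriteLine(BuildSkillsFilter(new[] { "Python", "Azure" }));
// prints TechnicalSkills/any(s: s eq 'Python') and TechnicalSkills/any(s: s eq 'Azure')
```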
Step 4: Intelligent Responses
When an HR professional asks a question, we combine search results with GPT-4:
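```csharp
using System.Text;
using System.Text.Json;

public async Task<string?> CallOpenAIApi(object[] messages)
{
    var request = new
    {
        model = "gpt-4-turbo",
        messages = messages,
        max_tokens = 2048
    };
    // Assumes the public OpenAI endpoint and that _httpClient already sends
    // the "Authorization: Bearer <api key>" header
    using var httpRequestMessage = new HttpRequestMessage(HttpMethod.Post, "https://api.openai.com/v1/chat/completions")
    {
        Content = new StringContent(JsonSerializer.Serialize(request), Encoding.UTF8, "application/json")
    };
    using var response = await _httpClient.SendAsync(httpRequestMessage);
    response.EnsureSuccessStatusCode();
    string jsonResponse = await response.Content.ReadAsStringAsync();
    var openAIAPIResponse = JsonSerializer.Deserialize<OpenAIChatResponse>(jsonResponse);
    return openAIAPIResponse?.choices?.FirstOrDefault()?.message?.content;
}
```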
The result? Natural conversations like:
HR: “Find me senior developers with Python and cloud experience”
Candidex: “Based on your query, here are the top candidates:
- Maria Garcia, currently working as Senior Software Engineer at TechCorp
- James Chen, Lead Developer at CloudSystems Inc.
Both have 5+ years of Python experience and AWS/Azure certifications. Would you like more details about their specific projects?”
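The post does not show how search results become the `messages` passed to `CallOpenAIApi`. A minimal sketch of one common approach, assuming each search hit has already been reduced to a one-line summary: list the candidates in a system message and pass the recruiter's question as the user message. `BuildMessages` and the prompt wording are hypothetical; the `role`/`content` shape is the Chat Completions message format.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical helper: grounds the model in the retrieved candidates by placing
// their summaries in the system message and the question in the user message.
static object[] BuildMessages(string question, IReadOnlyList<string> candidateSummaries)
{
    string context = candidateSummaries.Count == 0
        ? "No matching candidates were found."
        : string.Join("\n", candidateSummaries.Select((summary, i) => $"{i + 1}. {summary}"));

    return new object[]
    {
        new
        {
            role = "system",
            content = "You help HR staff shortlist candidates. Answer only from the candidates below; "
                    + "if none fit, say so.\n\nCandidates:\n" + context
        },
        new { role = "user", content = question }
    };
}

var messages = BuildMessages(
    "Find me senior developers with Python and cloud experience",
    new[] { "Maria Garcia - Senior Software Engineer at TechCorp", "James Chen - Lead Developer at CloudSystems Inc." });
Console.WriteLine(JsonSerializer.Serialize(messages));
```

Keeping the candidate list in the system message and instructing the model to answer only from it makes it less likely to invent candidates that are not in the index.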
The Frontend Experience
We built the frontend in React with a clean, professional interface:
```jsx
// components/SearchResults.js
// CandidateCard is assumed to live alongside this component
import CandidateCard from './CandidateCard';

const SearchResults = ({ results }) => {
  return (
    <div className="results-container">
      {results.map(candidate => (
        <CandidateCard
          key={candidate.id}
          name={candidate.Name}
          role={candidate.CurrentRole}
          skills={candidate.TechnicalSkills}
          summary={candidate.ProfessionalSummary}
        />
      ))}
    </div>
  );
};

export default SearchResults;
```
The chat interface allows natural language queries while displaying structured candidate information—the best of both worlds.
Multi-Language Support
A key feature: our system handles CVs in multiple languages. The Document Intelligence model was trained on both English and Spanish documents, and we filter search results by language:
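```csharp
using Azure.Search.Documents;

var searchOptions = new SearchOptions
{
    // SearchFilter.Create quotes and escapes the interpolated language value
    Filter = SearchFilter.Create($"Language eq {language}")
};
```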
This restricts results to CVs in the recruiter’s chosen language, so Spanish-speaking recruiters see Spanish CVs; searching across all candidates, when needed, means leaving the filter unset.
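Searching across all candidates comes down to not sending a filter at all. A minimal sketch of that decision, assuming the index stores a language code such as `es` in the `Language` field (the post does not show the stored values); `BuildLanguageFilter` is a hypothetical helper. Its result would be assigned to `SearchOptions.Filter`, where `null` means no filtering.

```csharp
using System;

// Hypothetical helper: returns null (no filter, so all candidates are searched)
// when no language is selected, otherwise an OData equality filter on Language.
static string? BuildLanguageFilter(string? language)
{
    if (string.IsNullOrWhiteSpace(language)) return null;

    // Double any single quotes so the value stays one OData string literal
    string escaped = language.Trim().Replace("'", "''");
    return $"Language eq '{escaped}'";
}

Console.WriteLine(BuildLanguageFilter("es") ?? "(no filter)");   // prints Language eq 'es'
Console.WriteLine(BuildLanguageFilter(null) ?? "(no filter)");   // prints (no filter)
```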
What Worked Well
Azure’s Integration Story: The services connect seamlessly. Documents flow from Storage to Document Intelligence to Search without complex glue code.
C# and Strong Typing: When dealing with structured data extraction, compile-time type checking prevents entire categories of bugs.
Document Intelligence Training: Once we trained the custom model with sample CVs, extraction accuracy exceeded 90% for most fields.
What We’d Change
Service Complexity: Managing five separate services means five sets of credentials, five monitoring dashboards, five potential failure points.
Cost at Scale: Document Intelligence is billed per analyzed page, so costs add up quickly with high CV volumes, especially for multi-page CVs.
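The scaling is simple arithmetic: pages analyzed times the per-page rate. A minimal sketch with a hypothetical `EstimateMonthlyCost` helper; the price is an input to take from the current Azure pricing page, and the numbers below are placeholders, not real rates or Candidex volumes.

```csharp
using System;

// Hypothetical estimate: cost scales with CV volume times average CV length.
// pricePer1000Pages is a placeholder input, not an actual Azure rate.
static decimal EstimateMonthlyCost(int cvsPerMonth, decimal averagePagesPerCv, decimal pricePer1000Pages)
{
    decimal pages = cvsPerMonth * averagePagesPerCv;
    return pages / 1000m * pricePer1000Pages;
}

// Placeholder inputs: 20,000 CVs per month, 2.5 pages each, price of 10 per 1,000 pages
Console.WriteLine(EstimateMonthlyCost(20_000, 2.5m, 10m));
```

Because the estimate is linear in both volume and page count, doubling either one doubles the bill.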
Flexibility: The trained model is specific to our CV format. New fields require retraining.
These observations would eventually lead us to explore a different architecture—but that’s a story for a future post.
Coming Up Next
In the next post, we’ll dive deep into Azure Document Intelligence: how we trained our custom CV model, handled edge cases, and achieved reliable data extraction from wildly varied resume formats.
Next up: Training Azure Document Intelligence for CV Extraction
About This Series: This blog series documents the development of Candidex, exploring both the original C# Azure implementation and its evolution. We share technical decisions, challenges faced, and lessons learned along the way.