Building a Candidate Recommendation System for HR
HR recommendations run in both directions: the system matches candidates to a vacancy and vacancies to a candidate. The hard parts are matching unstructured data (resumes, job descriptions), accounting for soft skills, diversity requirements, and compliance constraints. On top of that, age, gender, and nationality must not be used as features.
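The compliance constraint can be enforced mechanically: strip protected attributes from a profile before any features or embeddings are built, so they can never leak into scoring. A minimal sketch, assuming illustrative field names (`PROTECTED_FIELDS` and `sanitize_profile` are not part of the system above):

```python
# Protected attributes that must never reach featurization.
# Field names are illustrative, not a definitive list.
PROTECTED_FIELDS = {"age", "date_of_birth", "gender", "nationality",
                    "marital_status", "photo"}

def sanitize_profile(raw: dict) -> dict:
    """Return a copy of the profile with protected fields removed."""
    return {k: v for k, v in raw.items() if k not in PROTECTED_FIELDS}

profile = {"skills": ["python"], "gender": "f", "age": 29}
clean = sanitize_profile(profile)
# clean == {"skills": ["python"]}
```

Running this step at ingestion time (rather than inside the model code) keeps the guarantee auditable: downstream components simply never see the fields.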
Semantic Matching of Resumes and Job Descriptions
```python
from sentence_transformers import SentenceTransformer
import numpy as np
from anthropic import Anthropic


class HRRecommendationSystem:
    def __init__(self):
        self.encoder = SentenceTransformer('all-mpnet-base-v2')
        self.llm = Anthropic()
        self.candidate_index = {}
        self.job_index = {}

    def index_candidate(self, candidate_id: str, resume: dict):
        """Index a resume."""
        # Structured text representation for encoding
        resume_text = self._resume_to_text(resume)
        embedding = self.encoder.encode(resume_text, normalize_embeddings=True)
        self.candidate_index[candidate_id] = {
            'embedding': embedding,
            'skills': resume.get('skills', []),
            'experience_years': resume.get('total_experience_years', 0),
            'current_salary': resume.get('current_salary', 0),
            'location': resume.get('location', '')
        }

    def index_job(self, job_id: str, job_description: dict):
        """Index a job posting."""
        jd_text = self._jd_to_text(job_description)
        embedding = self.encoder.encode(jd_text, normalize_embeddings=True)
        self.job_index[job_id] = {
            'embedding': embedding,
            'required_skills': job_description.get('required_skills', []),
            'min_experience': job_description.get('min_experience_years', 0),
            'salary_max': job_description.get('salary_max', 0),
            'location': job_description.get('location', '')
        }

    def _resume_to_text(self, resume: dict) -> str:
        """Convert a resume into text for encoding."""
        parts = []
        if resume.get('summary'):
            parts.append(resume['summary'])
        if resume.get('skills'):
            parts.append("Skills: " + ", ".join(resume['skills']))
        for exp in resume.get('experience', [])[:3]:
            parts.append(f"{exp.get('title', '')} at {exp.get('company', '')}: "
                         f"{exp.get('description', '')[:200]}")
        for edu in resume.get('education', [])[:2]:
            parts.append(f"{edu.get('degree', '')} in {edu.get('field', '')} "
                         f"from {edu.get('institution', '')}")
        return ". ".join(parts)

    def _jd_to_text(self, jd: dict) -> str:
        """Convert a job description into text."""
        parts = [
            jd.get('title', ''),
            jd.get('description', '')[:500],
            "Requirements: " + ", ".join(jd.get('required_skills', [])),
            "Nice to have: " + ", ".join(jd.get('preferred_skills', []))
        ]
        return ". ".join(p for p in parts if p)

    def match_candidates_to_job(self, job_id: str,
                                n: int = 20,
                                hard_filters: dict | None = None) -> list[dict]:
        """Top-N candidates for a job."""
        if job_id not in self.job_index:
            return []
        job = self.job_index[job_id]
        scored = []
        for cid, candidate in self.candidate_index.items():
            # Hard filters (compliance)
            if hard_filters:
                if (hard_filters.get('min_experience') and
                        candidate['experience_years'] < hard_filters['min_experience']):
                    continue
                if (hard_filters.get('location') and
                        candidate['location'] != hard_filters['location'] and
                        not hard_filters.get('remote_ok', False)):
                    continue
            # Semantic similarity: dot product of normalized embeddings = cosine
            semantic_score = float(
                np.dot(job['embedding'], candidate['embedding'])
            )
            # Skill overlap relative to the job's requirements
            required = set(job['required_skills'])
            has = set(candidate['skills'])
            skill_match = len(required & has) / max(len(required), 1)
            # Salary fit: 1.0 if the budget covers a ~20% raise expectation
            salary_ok = (
                1.0 if job['salary_max'] == 0
                else min(1.0, job['salary_max'] / max(candidate['current_salary'] * 1.2, 1))
            )
            final_score = 0.5 * semantic_score + 0.35 * skill_match + 0.15 * salary_ok
            scored.append({
                'candidate_id': cid,
                'score': final_score,
                'semantic_score': semantic_score,
                'skill_match': skill_match,
                'skill_gap': list(required - has)
            })
        scored.sort(key=lambda x: x['score'], reverse=True)
        return scored[:n]

    def generate_match_explanation(self, job_id: str,
                                   candidate_id: str) -> str:
        """AI-generated explanation of the match."""
        job = self.job_index.get(job_id, {})
        candidate = self.candidate_index.get(candidate_id, {})
        required_skills = set(job.get('required_skills', []))
        candidate_skills = set(candidate.get('skills', []))
        response = self.llm.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=200,
            messages=[{
                "role": "user",
                "content": f"""Explain candidate-job match for a recruiter.
Required skills: {', '.join(required_skills)}
Candidate has: {', '.join(candidate_skills)}
Experience years: {candidate.get('experience_years', 0)}
Missing skills: {', '.join(required_skills - candidate_skills) or 'None'}
Write 2-3 sentences: strengths, gaps, and overall recommendation (Strong Match/Potential Match/Weak Match)."""
            }]
        )
        return response.content[0].text
```
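The scoring logic in `match_candidates_to_job` can be checked by hand: with `normalize_embeddings=True`, the dot product equals cosine similarity, and the three components are blended with weights 0.5/0.35/0.15. A self-contained sketch with toy numbers (no model download needed; `blended_score` is an illustrative helper, not part of the class):

```python
import math

def blended_score(job_emb, cand_emb, required, has, salary_max, current_salary):
    """Reproduce the weighting from match_candidates_to_job on plain lists."""
    def norm(v):
        # Normalize so the dot product below equals cosine similarity
        length = math.sqrt(sum(x * x for x in v))
        return [x / length for x in v]
    a, b = norm(job_emb), norm(cand_emb)
    semantic = sum(x * y for x, y in zip(a, b))
    skill_match = len(set(required) & set(has)) / max(len(set(required)), 1)
    salary_ok = (1.0 if salary_max == 0
                 else min(1.0, salary_max / max(current_salary * 1.2, 1)))
    return 0.5 * semantic + 0.35 * skill_match + 0.15 * salary_ok

score = blended_score([1.0, 0.0], [1.0, 0.0],
                      ["python", "sql"], ["python"], 0, 100_000)
# semantic=1.0, skill_match=0.5, salary_ok=1.0 -> 0.5 + 0.175 + 0.15 = 0.825
```

Worked examples like this are also handy as unit tests: the weights encode a product decision (semantic fit dominates, salary is a tiebreaker) that should not drift silently.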
Semantic matching with sentence-transformers yields 20-30% better match accuracy than keyword matching, and the time to screen 100 resumes drops from 3-4 hours to 15-20 minutes. Important: all features must be skills- and experience-based, with no demographic data. Regular bias audits (disparate impact analysis) are recommended.
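One common way to operationalize the disparate impact audit is the four-fifths rule: the selection rate of each group, divided by the rate of the best-off group, should stay at or above 0.8. A minimal sketch over anonymized audit logs (group labels are used only offline for auditing, never as model features; `disparate_impact` is an illustrative helper):

```python
from collections import Counter

def disparate_impact(records: list[tuple[str, bool]]) -> dict[str, float]:
    """records: (group_label, was_shortlisted) pairs.
    Returns each group's selection rate as a ratio of the
    highest-rate group (four-fifths rule: ratio >= 0.8)."""
    totals, hits = Counter(), Counter()
    for group, selected in records:
        totals[group] += 1
        if selected:
            hits[group] += 1
    rates = {g: hits[g] / totals[g] for g in totals}
    best = max(rates.values())
    return {g: r / best for g, r in rates.items()}

log = ([("A", True)] * 8 + [("A", False)] * 2 +
       [("B", True)] * 5 + [("B", False)] * 5)
ratios = disparate_impact(log)
# rate(A)=0.8, rate(B)=0.5 -> ratio for B = 0.625 < 0.8: flag for review
```

A ratio below 0.8 does not prove bias by itself, but it is the standard trigger for a deeper investigation of the ranking pipeline.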