Check out my blog post on SQLite vector embeddings that explains this project!
This tutorial project demonstrates how to use Xenova Transformers and SQLite with the sqlite-vec vector extension to create, store, and search vector embeddings for AI-powered job matching.
The project provides the following functionality:
- Embeddings Creation: Using Xenova Transformers (specifically the Xenova/gte-base model) to create vector embeddings from raw string inputs.
- Database Setup: Connecting to SQLite and loading the required extensions for vector operations, along with table creation for resumes and jobs.
- Job Matching: Providing functions to add resume and job embeddings and perform similarity searches to match candidates with jobs and vice versa.
- Enhanced Semantic Context: Enriching embeddings with structured metadata to improve matching quality.
- Intelligent Job Representation: Using hierarchical formatting, alternative titles, and career context to improve job matching.
- High-Similarity Examples: Demonstrating near-perfect matches with 90%+ cosine similarity scores.
- Negative Examples: Including deliberately mismatched profiles to verify the system works properly.
Prerequisites:
- Node.js version 22.x
Install the dependencies with:
npm install
Make sure the database path is configured correctly in the code:
const DB_PATH = "./db.sqlite"
Run the application with:
npm start
If everything is installed correctly, the demo prints the best job matches for several candidates and the best candidate matches for several job postings, including some matches with similarity scores of 90% or higher.
This project uses the Xenova/gte-base model through the transformers pipeline:
embedder = await pipeline("feature-extraction", "Xenova/gte-base");
The GTE (General Text Embeddings) model is designed for semantic similarity tasks, which makes it a good fit for this job matching application.
- createEmbedding(input: string): Creates an embedding for the given input string using the GTE-base model.
- createQueryEmbedding(query: string): Creates an embedding for search queries.
- createResumeEmbedding(resume_text: string): Creates an embedding specifically for resume content.
- createJobEmbedding(job_title: string, job_description: string): Creates an embedding specifically for job descriptions.
- setupEmbeddings(): Prepares the embedding model from the Xenova Transformers library.
- serializeEmbedding(embedding: number[] | Float32Array): Buffer: Serializes an embedding for storing in SQLite.
- deserializeEmbedding(buffer: Buffer): Float32Array: Deserializes a buffer back to a Float32Array.
- setupDatabase(): Promise<Database>: Opens the database, loads required extensions, and creates tables for resumes and jobs.
- clearResumeEmbeddings(): Clears all resume embeddings from the database.
- createOrUpdateResumeEmbedding(candidate_name: string, resume_text: string, seniority: string, skills: string[], industry: string): Promise<void>: Creates or updates a resume embedding with enhanced semantic context to improve matching quality.
- clearJobEmbeddings(): Clears all job embeddings from the database.
- createOrUpdateJobEmbedding(job_title: string, job_description: string, seniority: string, required_skills: string[], industry: string): Promise<void>: Creates or updates a job embedding with comprehensive semantic enrichment for optimal matching.
- findMatchingResumes(job_title: string, limit: number = 3): Promise<{ title: string; similarity: number }[]>: Finds matching candidates for a job.
- findMatchingJobs(candidate_name: string, limit: number = 3): Promise<{ title: string; similarity: number }[]>: Finds matching jobs for a candidate.
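To make the serialization step concrete, here is a minimal sketch of one way an embedding can be turned into bytes and back. It is not the project's code: the names embeddingToBytes and bytesToEmbedding are hypothetical, the sketch uses Uint8Array instead of Buffer so it runs without Node typings, and it assumes embeddings are stored as raw float32 values in the platform's byte order.

```typescript
// Hypothetical helpers, for illustration only (not the project's functions).
// Assumption: an embedding is stored as raw float32 values, 4 bytes each.

function embeddingToBytes(embedding: number[] | Float32Array): Uint8Array {
  const floats =
    embedding instanceof Float32Array ? embedding : new Float32Array(embedding);
  // View the float data as bytes, then copy so the result owns its memory.
  return new Uint8Array(
    floats.buffer,
    floats.byteOffset,
    floats.byteLength
  ).slice();
}

function bytesToEmbedding(bytes: Uint8Array): Float32Array {
  if (bytes.byteLength % 4 !== 0) {
    throw new Error("Byte length must be a multiple of 4 for float32 data");
  }
  // Copy first: the source bytes may start at an offset that is not
  // 4-byte aligned, which a Float32Array view would reject.
  const copy = bytes.slice();
  return new Float32Array(copy.buffer, 0, copy.byteLength / 4);
}

// Usage: a round trip preserves values that float32 can represent exactly.
const stored = embeddingToBytes([0.5, -1.25, 3]);
const loaded = bytesToEmbedding(stored);
console.log(stored.byteLength, Array.from(loaded));
```

Copying on both sides costs a little memory but avoids aliasing the caller's buffer and sidesteps alignment errors when the bytes come from a larger pooled buffer.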
The system improves matching quality by:
- Structured Metadata: Organizing candidate and job information in a semantically meaningful format
- Contextual Descriptions: Adding natural language descriptions that highlight relationships between skills, experience, and industry
- Content Weighting: Emphasizing important information through formatting and repetition
- Skill Prioritization: Dividing skills into primary and secondary categories to focus on core requirements
- Alternative Job Titles: Adding common alternative names for positions to improve matching with varied terminology
- Related Industries: Including information about related professional fields to broaden matching scope
- Hierarchical Information: Using clear sections and headings to help the model understand the relative importance of different data points
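The enrichment ideas above can be sketched as a function that builds the text passed to the embedding model. This is only an illustration: the ResumeProfile shape, the buildResumeText name, the section labels, and the rule that the first few skills count as primary are all assumptions, not the project's actual format.

```typescript
// Hypothetical input shape. Field names loosely mirror the parameters of
// createOrUpdateResumeEmbedding, but the layout below is an assumption.
interface ResumeProfile {
  candidateName: string;
  resumeText: string;
  seniority: string;
  skills: string[];
  industry: string;
}

// Builds a sectioned text block: structured metadata first, then a short
// natural-language description, then the raw resume text.
function buildResumeText(profile: ResumeProfile, primaryCount = 3): string {
  // Assumption: skills are listed in priority order, so the first
  // `primaryCount` entries are treated as primary skills.
  const primary = profile.skills.slice(0, primaryCount);
  const secondary = profile.skills.slice(primaryCount);

  const context =
    primary.length > 0
      ? `${profile.seniority} candidate in ${profile.industry} whose core skills are ${primary.join(", ")}.`
      : `${profile.seniority} candidate in ${profile.industry}.`;

  return [
    `CANDIDATE: ${profile.candidateName}`,
    `SENIORITY: ${profile.seniority}`,
    `INDUSTRY: ${profile.industry}`,
    `PRIMARY SKILLS: ${primary.join(", ") || "none"}`,
    `SECONDARY SKILLS: ${secondary.join(", ") || "none"}`,
    `CONTEXT: ${context}`,
    `RESUME: ${profile.resumeText}`,
  ].join("\n");
}

// Usage with made-up sample data.
console.log(
  buildResumeText(
    {
      candidateName: "Sample Candidate",
      resumeText: "Builds backend services and data pipelines.",
      seniority: "Senior",
      skills: ["TypeScript", "Node.js", "SQLite"],
      industry: "Software",
    },
    2
  )
);
```

Using the same section order for resumes and jobs keeps the two kinds of text structurally comparable, which is the point of the structural alignment the document describes.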
To demonstrate the effectiveness of the vector similarity search, the demo includes examples of extremely high-quality matches with cosine similarity scores of 90% or higher. These are achieved through:
- Mirrored Terminology: Using nearly identical language patterns between job descriptions and resumes
- Structural Alignment: Ensuring both the job and resume follow the same information structure
- Extended Context: Providing rich, detailed descriptions that give the embedding model sufficient context
- Skill Matching: Using identical skill lists with the same priority order
- Experience Level Consistency: Maintaining the same experience level and seniority
- Industry Alignment: Ensuring perfect industry category matching
The demo clearly labels match quality with stars and percentage scores, making it easy to identify the highest quality matches.
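For readers unfamiliar with the metric, the following sketch shows how a cosine similarity score can be computed and used to rank candidates. It is a brute-force, in-memory illustration only: the project runs its search inside SQLite through sqlite-vec, and the names cosineSimilarity and rankBySimilarity are hypothetical.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). The result lies in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Assumption: a zero vector is treated as having no similarity.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface EmbeddedItem {
  title: string;
  embedding: number[];
}

// Hypothetical brute-force ranking that returns the same shape as
// findMatchingJobs: the `limit` most similar items, best first.
function rankBySimilarity(
  query: number[],
  items: EmbeddedItem[],
  limit = 3
): { title: string; similarity: number }[] {
  return items
    .map((item) => ({
      title: item.title,
      similarity: cosineSimilarity(query, item.embedding),
    }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}

// Usage with toy 2-dimensional vectors (real embeddings are much longer).
console.log(
  rankBySimilarity(
    [1, 0],
    [
      { title: "A", embedding: [0, 1] },
      { title: "B", embedding: [1, 0] },
      { title: "C", embedding: [1, 1] },
    ],
    2
  )
);
```

A score of 0.9 in this function corresponds to the 90% figure used in the demo output.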
The demo also includes deliberately mismatched profiles, such as:
// Example of a negative test case
await createOrUpdateResumeEmbedding(
  "resume-negative-example",
  "pastry chef with no experience in creating gourmet desserts and pastries. Doesn't know anything about French patisserie techniques, chocolate tempering, and sugar art. Doesn't know anything about pastry. Doesn't know anything about culinary arts. Doesn't know anything about desserts. Doesn't know anything about pastries.",
  "No Experience",
  [],
  "Culinary Arts"
);
These negative examples verify that the system identifies poor matches and returns low similarity scores for them.
The code includes a jobMatchingDemo() function that:
- Creates sample resume embeddings from various backgrounds including:
  - Senior software engineers
  - Machine learning engineers
  - Graphic designers
  - Marketing specialists
  - Negative examples (e.g., inexperienced pastry chef)
- Creates sample job posting embeddings with matching and non-matching positions
- Demonstrates perfect match examples with 90%+ similarity scores
- Shows a range of match qualities:
  - Perfect matches (90%+): Senior software engineers and ML engineers with nearly identical job requirements
  - Excellent matches (75-90%): Full-Stack Engineers matching with software development resumes
  - Good matches (60-75%): Graphic Designer positions matching with candidates having design skills
  - Possible matches (<60%): Various candidates with some skill overlap but different domains
  - Poor matches: Deliberately mismatched examples to verify system discrimination
The demo outputs clear, visually enhanced results with a star rating system to help understand match quality:
⭐⭐⭐ PERFECT MATCH (90%+ similarity)
⭐⭐ EXCELLENT MATCH (75-90% similarity)
⭐ GOOD MATCH (60-75% similarity)
POSSIBLE MATCH (<60% similarity)
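The rating tiers can be expressed as a small mapping from similarity score to label. The sketch below is an assumption about how such a mapping might look, not the project's code: matchLabel and formatMatch are hypothetical names, and since the listed ranges share their boundary values, the sketch assigns each boundary (0.90, 0.75, 0.60) to the higher tier.

```typescript
// Hypothetical mapping from a cosine similarity in [0, 1] to a rating label.
// Assumption: boundary values (0.90, 0.75, 0.60) belong to the higher tier.
function matchLabel(similarity: number): string {
  if (similarity >= 0.9) return "⭐⭐⭐ PERFECT MATCH";
  if (similarity >= 0.75) return "⭐⭐ EXCELLENT MATCH";
  if (similarity >= 0.6) return "⭐ GOOD MATCH";
  return "POSSIBLE MATCH";
}

// Formats one result line with the label, title, and percentage score.
function formatMatch(title: string, similarity: number): string {
  const percent = (similarity * 100).toFixed(1);
  return `${matchLabel(similarity)}: ${title} (${percent}%)`;
}

// Usage with made-up scores.
for (const [title, score] of [
  ["Senior Software Engineer", 0.95],
  ["Full-Stack Engineer", 0.8],
  ["Graphic Designer", 0.6],
  ["Pastry Chef", 0.5],
] as [string, number][]) {
  console.log(formatMatch(title, score));
}
```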
- The Xenova Transformers library for creating semantic textual embeddings
- GTE-base embedding model for high-quality similarity measurement
- The sqlite-vec extension for vector search functionality within the database