stephenc222/example-sqlite-vec-tutorial

Example SQLite Vector Embeddings

Check out my blog post on SQLite vector embeddings that explains this project!

This tutorial project demonstrates how to use Xenova Transformers and SQLite with the sqlite-vec extension to create, store, and search vector embeddings for AI-powered job matching.

Description

The project provides the following functionality:

  1. Embeddings Creation: Using Xenova Transformers (specifically the Xenova/gte-base model) to create vector embeddings from raw string inputs.
  2. Database Setup: Connecting to SQLite and loading the required extensions for vector operations, along with table creation for resumes and jobs.
  3. Job Matching: Providing functions to add resume and job embeddings and perform similarity searches to match candidates with jobs and vice versa.
  4. Enhanced Semantic Context: Enriching embeddings with structured metadata to improve matching quality.
  5. Intelligent Job Representation: Using hierarchical formatting, alternative titles, and career context to improve job matching.
  6. High-Similarity Examples: Demonstrating near-perfect matches with 90%+ cosine similarity scores.
  7. Negative Examples: Including deliberately mismatched profiles to verify the system works properly.

Getting Started

Prerequisites

  • Node.js version 22.x

Installation

npm install

Configuration

Make sure the database path is configured correctly in the code:

const DB_PATH = "./db.sqlite"

Usage

Run the application with:

npm start

If everything is installed correctly, the demo prints the best job matches for each candidate and the best candidate matches for each job posting. Some of these matches reach similarity scores of 90% or higher.

Embedding Model

This project uses the Xenova/gte-base model through the transformers pipeline:

embedder = await pipeline("feature-extraction", "Xenova/gte-base");

The GTE (General Text Embeddings) model is designed for semantic similarity tasks, which makes it a good fit for this job matching application.
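Semantic similarity between two embeddings is usually measured with cosine similarity, the score this README reports as a percentage. The following is a minimal sketch in plain TypeScript, not code taken from this project; the function name `cosineSimilarity` and the toy vectors are illustrative assumptions.

```typescript
// Cosine similarity between two equal-length vectors.
// Returns a value in [-1, 1]; 1 means the vectors point in the same direction.
function cosineSimilarity(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Convention chosen for this sketch: a zero vector is similar to nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-dimensional vectors (real gte-base embeddings have 768 dimensions).
const same = cosineSimilarity([1, 2, 3], [2, 4, 6]); // parallel vectors
const orthogonal = cosineSimilarity([1, 0, 0], [0, 1, 0]); // unrelated directions
console.log(same.toFixed(2), orthogonal.toFixed(2)); // prints "1.00 0.00"
```

A score of 0.90 from this function corresponds to the "90% similarity" wording used elsewhere in this document.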

API Reference

Embeddings Functions

  • createEmbedding(input: string): Creates an embedding for the given input string using the GTE-base model.
  • createQueryEmbedding(query: string): Creates an embedding for search queries.
  • createResumeEmbedding(resume_text: string): Creates an embedding specifically for resume content.
  • createJobEmbedding(job_title: string, job_description: string): Creates an embedding specifically for job descriptions.
  • setupEmbeddings(): Prepares the embedding model from the Xenova Transformers library.
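This README does not show how `createEmbedding` turns the model's per-token output into a single vector. A common approach with feature-extraction models is mean pooling followed by L2 normalization; the Transformers library exposes these as the `pooling` and `normalize` pipeline options. The sketch below implements both steps in plain TypeScript so the arithmetic is visible. The helper names `meanPool` and `l2Normalize` are hypothetical, and whether this project uses these exact settings is an assumption.

```typescript
// Average per-token vectors into one sentence vector (mean pooling).
function meanPool(tokenVectors: number[][]): Float32Array {
  if (tokenVectors.length === 0) throw new Error("No token vectors to pool");
  const dim = tokenVectors[0].length;
  const pooled = new Float32Array(dim);
  for (const vec of tokenVectors) {
    for (let i = 0; i < dim; i++) pooled[i] += vec[i] / tokenVectors.length;
  }
  return pooled;
}

// Scale a vector to unit length so that a dot product equals cosine similarity.
function l2Normalize(vec: Float32Array): Float32Array {
  let sumSquares = 0;
  for (let i = 0; i < vec.length; i++) sumSquares += vec[i] * vec[i];
  const norm = Math.sqrt(sumSquares);
  const out = new Float32Array(vec.length);
  // A zero vector has no direction, so it stays all zeros.
  for (let i = 0; i < vec.length; i++) out[i] = norm === 0 ? 0 : vec[i] / norm;
  return out;
}

// Two toy "token" vectors; their mean is [3, 4], which has length 5.
const tokens = [
  [3, 0],
  [3, 8],
];
const embedding = l2Normalize(meanPool(tokens)); // approximately [0.6, 0.8]
console.log(embedding[0].toFixed(1), embedding[1].toFixed(1)); // prints "0.6 0.8"
```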

Database Functions

  • serializeEmbedding(embedding: number[] | Float32Array): Buffer: Serializes an embedding for storing in SQLite.
  • deserializeEmbedding(buffer: Buffer): Float32Array: Deserializes a buffer back to a Float32Array.
  • setupDatabase(): Promise<Database>: Opens the database, loads required extensions, and creates tables for resumes and jobs.
  • clearResumeEmbeddings(): Clears all resume embeddings from the database.
  • createOrUpdateResumeEmbedding(candidate_name: string, resume_text: string, seniority: string, skills: string[], industry: string): Promise<void>: Creates or updates a resume embedding with enhanced semantic context to improve matching quality.
  • clearJobEmbeddings(): Clears all job embeddings from the database.
  • createOrUpdateJobEmbedding(job_title: string, job_description: string, seniority: string, required_skills: string[], industry: string): Promise<void>: Creates or updates a job embedding with comprehensive semantic enrichment for optimal matching.
  • findMatchingResumes(job_title: string, limit: number = 3): Promise<{ title: string; similarity: number }[]>: Finds matching candidates for a job.
  • findMatchingJobs(candidate_name: string, limit: number = 3): Promise<{ title: string; similarity: number }[]>: Finds matching jobs for a candidate.
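The serialization pair above can be sketched as follows. sqlite-vec accepts vectors as compact binary blobs of 32-bit floats, so serializing amounts to exposing the raw bytes of a `Float32Array`. This is a sketch under stated assumptions rather than the project's actual implementation: it returns a `Uint8Array` instead of a `Buffer` so that it runs without Node.js type definitions (a Node.js `Buffer` is a `Uint8Array` subclass), and it uses the platform's native byte order.

```typescript
// Pack float32 values into raw bytes (4 bytes per value).
function serializeEmbedding(embedding: number[] | Float32Array): Uint8Array {
  const floats =
    embedding instanceof Float32Array ? embedding : new Float32Array(embedding);
  // Copy the underlying bytes so the result does not alias the caller's memory.
  return new Uint8Array(
    floats.buffer.slice(floats.byteOffset, floats.byteOffset + floats.byteLength)
  );
}

// Reinterpret raw bytes as float32 values.
function deserializeEmbedding(bytes: Uint8Array): Float32Array {
  if (bytes.byteLength % 4 !== 0) {
    throw new Error("Byte length must be a multiple of 4");
  }
  // Copy into a fresh buffer first: a Float32Array view requires 4-byte alignment.
  const copy = new Uint8Array(bytes);
  return new Float32Array(copy.buffer, 0, copy.byteLength / 4);
}

// Round trip with values that float32 represents exactly.
const original = new Float32Array([0.5, -1.25, 3]);
const blob = serializeEmbedding(original);
const restored = deserializeEmbedding(blob);
console.log(blob.byteLength, restored[0], restored[1], restored[2]); // prints "12 0.5 -1.25 3"
```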

Semantic Matching Enhancements

The system improves matching quality by:

  1. Structured Metadata: Organizing candidate and job information in a semantically meaningful format
  2. Contextual Descriptions: Adding natural language descriptions that highlight relationships between skills, experience, and industry
  3. Content Weighting: Emphasizing important information through formatting and repetition
  4. Skill Prioritization: Dividing skills into primary and secondary categories to focus on core requirements
  5. Alternative Job Titles: Adding common alternative names for positions to improve matching with varied terminology
  6. Related Industries: Including information about related professional fields to broaden matching scope
  7. Hierarchical Information: Using clear sections and headings to help the model understand the relative importance of different data points
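The enrichment steps above can be sketched as a function that turns structured fields into the text passed to the embedding model. Everything here is a hypothetical illustration: the `ResumeProfile` type, the `buildResumeEmbeddingText` name, the section labels, and the rule that the first three skills count as primary are assumptions, not the project's actual format.

```typescript
interface ResumeProfile {
  candidateName: string;
  resumeText: string;
  seniority: string;
  skills: string[];
  industry: string;
}

// Build a sectioned text block: labeled headings give the model a clear
// hierarchy, and the first `primaryCount` skills are listed separately.
function buildResumeEmbeddingText(profile: ResumeProfile, primaryCount = 3): string {
  const primary = profile.skills.slice(0, primaryCount);
  const secondary = profile.skills.slice(primaryCount);
  const lines = [
    `CANDIDATE: ${profile.candidateName}`,
    `SENIORITY: ${profile.seniority}`,
    `INDUSTRY: ${profile.industry}`,
  ];
  if (primary.length > 0) {
    lines.push(`PRIMARY SKILLS: ${primary.join(", ")}`);
    // Contextual description: a natural-language sentence tying the fields together.
    lines.push(
      `SUMMARY: ${profile.seniority} ${profile.industry} professional skilled in ${primary.join(", ")}.`
    );
  }
  if (secondary.length > 0) {
    lines.push(`SECONDARY SKILLS: ${secondary.join(", ")}`);
  }
  lines.push(`EXPERIENCE: ${profile.resumeText}`);
  return lines.join("\n");
}

const enrichedText = buildResumeEmbeddingText({
  candidateName: "resume-example",
  resumeText: "Built backend services and search features.",
  seniority: "Senior",
  skills: ["TypeScript", "Node.js", "SQLite", "Docker"],
  industry: "Software",
});
console.log(enrichedText);
```

Repeating the primary skills in both a labeled line and a summary sentence is one simple way to apply the content-weighting idea; how much it changes similarity scores depends on the model and is not measured here.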

Achieving High-Similarity Matches

To demonstrate the effectiveness of the vector similarity search, the demo includes examples of extremely high-quality matches with cosine similarity scores of 90% or higher. These are achieved through:

  1. Mirrored Terminology: Using nearly identical language patterns between job descriptions and resumes
  2. Structural Alignment: Ensuring both the job and resume follow the same information structure
  3. Extended Context: Providing rich, detailed descriptions that give the embedding model sufficient context
  4. Skill Matching: Using identical skill lists with the same priority order
  5. Experience Level Consistency: Maintaining the same experience level and seniority
  6. Industry Alignment: Ensuring perfect industry category matching

The demo clearly labels match quality with stars and percentage scores, making it easy to identify the highest quality matches.

Testing with Negative Examples

The demo also includes deliberately mismatched profiles, such as:

// Example of a negative test case
await createOrUpdateResumeEmbedding(
  "resume-negative-example",
  "pastry chef with no experience in creating gourmet desserts and pastries. Doesn't know anything about French patisserie techniques, chocolate tempering, and sugar art. Doesn't know anything about pastry. Doesn't know anything about culinary arts. Doesn't know anything about desserts. Doesn't know anything about pastries.",
  "No Experience",
  [],
  "Culinary Arts"
);

These negative examples verify that the system correctly identifies poor matches and returns low similarity scores, confirming that our matching algorithm is working as expected.
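One way to check a negative example is to rank all stored embeddings against a query and confirm that the mismatched profile lands at the bottom. The sketch below does this in memory with toy 2-dimensional vectors; it is not the project's search code, which runs inside SQLite. The `StoredEmbedding` type and `rankBySimilarity` function are hypothetical, and the vectors are assumed to be L2-normalized so that a dot product equals cosine similarity.

```typescript
interface StoredEmbedding {
  title: string;
  vector: number[]; // assumed to be L2-normalized
}

// Brute-force ranking: score every candidate against the query, best first.
function rankBySimilarity(
  query: number[],
  candidates: StoredEmbedding[],
  limit = 3
): { title: string; similarity: number }[] {
  return candidates
    .map((c) => ({
      title: c.title,
      // For unit-length vectors the dot product equals cosine similarity.
      similarity: c.vector.reduce((sum, x, i) => sum + x * query[i], 0),
    }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, limit);
}

// Toy unit vectors standing in for real embeddings.
const jobQuery = [1, 0];
const ranked = rankBySimilarity(jobQuery, [
  { title: "resume-negative-example", vector: [0, 1] },
  { title: "close-match", vector: [0.8, 0.6] },
  { title: "exact-match", vector: [1, 0] },
]);
console.log(ranked.map((r) => r.title));
// prints [ 'exact-match', 'close-match', 'resume-negative-example' ]
```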

Demo Implementation

The code includes a jobMatchingDemo() function that:

  1. Creates sample resume embeddings from various backgrounds including:

    • Senior software engineers
    • Machine learning engineers
    • Graphic designers
    • Marketing specialists
    • Negative examples (e.g., inexperienced pastry chef)
  2. Creates sample job posting embeddings with matching and non-matching positions

  3. Demonstrates perfect match examples with 90%+ similarity scores

  4. Shows a range of match qualities:

    • Perfect matches (90%+): Senior software engineers and ML engineers with nearly identical job requirements
    • Excellent matches (75-90%): Full-Stack Engineers matching with software development resumes
    • Good matches (60-75%): Graphic Designer positions matching with candidates having design skills
    • Possible matches (<60%): Various candidates with some skill overlap but different domains
    • Poor matches: Deliberately mismatched examples to verify system discrimination
  5. Outputs results with a star rating system that indicates match quality:

    ⭐⭐⭐ PERFECT MATCH (90%+ similarity)
    ⭐⭐ EXCELLENT MATCH (75-90% similarity)
    ⭐ GOOD MATCH (60-75% similarity)
    POSSIBLE MATCH (<60% similarity)
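The rating tiers above can be sketched as a threshold lookup. The function names `matchLabel` and `formatMatch` are hypothetical, and because the listed ranges share their endpoints (75-90% and 90%+), this sketch assumes a boundary score belongs to the higher tier.

```typescript
// Map a cosine similarity in [0, 1] to the labels shown in the demo output.
// Assumption: a score exactly on a boundary falls into the higher tier.
function matchLabel(similarity: number): string {
  if (similarity >= 0.9) return "⭐⭐⭐ PERFECT MATCH";
  if (similarity >= 0.75) return "⭐⭐ EXCELLENT MATCH";
  if (similarity >= 0.6) return "⭐ GOOD MATCH";
  return "POSSIBLE MATCH";
}

// Combine the label, the item title, and the score as a percentage.
function formatMatch(title: string, similarity: number): string {
  const percent = (similarity * 100).toFixed(1);
  return `${matchLabel(similarity)}: ${title} (${percent}% similarity)`;
}

console.log(formatMatch("Senior Software Engineer", 0.9234));
// prints "⭐⭐⭐ PERFECT MATCH: Senior Software Engineer (92.3% similarity)"
```

If a query returns a cosine distance rather than a similarity, as sqlite-vec's `vec_distance_cosine` function does, the score would first need converting with `1 - distance`.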
    

Acknowledgments

  • The Xenova Transformers library for creating semantic textual embeddings
  • GTE-base embedding model for high-quality similarity measurement
  • SQLite-vec extension for vector search functionality within the database

About

A tutorial explaining how to use sqlite-vec in a TypeScript application.
