Example SQLite Vector Embeddings

Check out my blog post on SQLite vector embeddings that explains this project!

This tutorial project shows how to use Xenova Transformers and SQLite with the sqlite-vec extension to create, store, and search vector embeddings for AI-powered job matching.

Description

The project provides the following functionality:

Embeddings Creation: Using Xenova Transformers (specifically the Xenova/gte-base model) to create vector embeddings from raw string inputs.
Database Setup: Connecting to SQLite and loading the required extensions for vector operations, along with table creation for resumes and jobs.
Job Matching: Providing functions to add resume and job embeddings and perform similarity searches to match candidates with jobs and vice versa.
Enhanced Semantic Context: Enriching embeddings with structured metadata to improve matching quality.
Intelligent Job Representation: Using hierarchical formatting, alternative titles, and career context to improve job matching.
High-Similarity Examples: Demonstrating near-perfect matches with 90%+ cosine similarity scores.
Negative Examples: Including deliberately mismatched profiles to verify the system works properly.

Getting Started

Prerequisites

Node.js version 22.x

Installation

npm install

Configuration

Make sure the database path is configured correctly in the code:

const DB_PATH = "./db.sqlite"

Usage

Run the application with:

npm start

If everything is installed correctly, you should see a demonstration of AI-powered job matching. The output lists the best job matches for each candidate and the best candidate matches for each job posting, including some matches with 90%+ similarity scores.

Embedding Model

This project uses the Xenova/gte-base model through the transformers pipeline:

embedder = await pipeline("feature-extraction", "Xenova/gte-base");

GTE (General Text Embeddings) models are trained for semantic similarity tasks, which makes gte-base a good fit for this job matching application.
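A feature-extraction model produces one vector per token, so the token vectors have to be combined into a single sentence vector before they can be stored or compared. The sketch below shows one common way to do that: mean pooling followed by L2 normalization. The helper names `meanPool` and `l2Normalize` are hypothetical and are not taken from this project. The Transformers.js feature-extraction pipeline can apply the same two steps itself through its `pooling: "mean"` and `normalize: true` options; whether this project passes those options is an assumption.

```typescript
// Hypothetical helpers for illustration; not taken from the project source.

/** Average per-token vectors into one sentence vector (mean pooling). */
function meanPool(tokenVectors: number[][]): Float32Array {
  if (tokenVectors.length === 0) {
    throw new Error("meanPool needs at least one token vector");
  }
  const dims = tokenVectors[0].length;
  const pooled = new Float32Array(dims);
  for (let t = 0; t < tokenVectors.length; t++) {
    if (tokenVectors[t].length !== dims) {
      throw new Error("all token vectors must have the same length");
    }
    for (let i = 0; i < dims; i++) {
      pooled[i] += tokenVectors[t][i];
    }
  }
  for (let i = 0; i < dims; i++) {
    pooled[i] /= tokenVectors.length;
  }
  return pooled;
}

/** Scale a vector to unit length so a dot product equals cosine similarity. */
function l2Normalize(vector: Float32Array): Float32Array {
  let sumOfSquares = 0;
  for (let i = 0; i < vector.length; i++) {
    sumOfSquares += vector[i] * vector[i];
  }
  const norm = Math.sqrt(sumOfSquares);
  const result = new Float32Array(vector.length);
  if (norm === 0) {
    return result; // an all-zero vector has no direction to preserve
  }
  for (let i = 0; i < vector.length; i++) {
    result[i] = vector[i] / norm;
  }
  return result;
}

// Two toy "token" vectors with two dimensions each (gte-base uses 768).
const sentenceVector = l2Normalize(meanPool([[3, 0], [3, 8]]));
console.log(Array.from(sentenceVector)); // roughly [0.6, 0.8]
```

Normalizing at creation time means every stored vector has unit length, so distance and cosine-based rankings agree.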

API Reference

Embeddings Functions

createEmbedding(input: string): Creates an embedding for the given input string using the GTE-base model.
createQueryEmbedding(query: string): Creates an embedding for search queries.
createResumeEmbedding(resume_text: string): Creates an embedding specifically for resume content.
createJobEmbedding(job_title: string, job_description: string): Creates an embedding specifically for job descriptions.
setupEmbeddings(): Prepares the embedding model from the Xenova Transformers library.

Database Functions

serializeEmbedding(embedding: number[] | Float32Array): Buffer: Serializes an embedding for storing in SQLite.
deserializeEmbedding(buffer: Buffer): Float32Array: Deserializes a buffer back to a Float32Array.
setupDatabase(): Promise<Database>: Opens the database, loads required extensions, and creates tables for resumes and jobs.
clearResumeEmbeddings(): Clears all resume embeddings from the database.
createOrUpdateResumeEmbedding(candidate_name: string, resume_text: string, seniority: string, skills: string[], industry: string): Promise<void>: Creates or updates a resume embedding with enhanced semantic context to improve matching quality.
clearJobEmbeddings(): Clears all job embeddings from the database.
createOrUpdateJobEmbedding(job_title: string, job_description: string, seniority: string, required_skills: string[], industry: string): Promise<void>: Creates or updates a job embedding with comprehensive semantic enrichment for optimal matching.
findMatchingResumes(job_title: string, limit: number = 3): Promise<{ title: string; similarity: number }[]>: Finds matching candidates for a job.
findMatchingJobs(candidate_name: string, limit: number = 3): Promise<{ title: string; similarity: number }[]>: Finds matching jobs for a candidate.
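As a rough illustration of the two serialization helpers listed above, the sketch below packs an embedding into raw float32 bytes and reads it back. The signatures match the ones in this reference, but the bodies are an assumption about how such helpers could be written, not the project's actual code. The sketch relies on sqlite-vec accepting vectors as compact float32 BLOBs and on the platform being little-endian, which is the usual case.

```typescript
import { Buffer } from "node:buffer";

/** Pack an embedding into raw float32 bytes suitable for a BLOB column. */
function serializeEmbedding(embedding: number[] | Float32Array): Buffer {
  const floats =
    embedding instanceof Float32Array ? embedding : new Float32Array(embedding);
  // Copy only the bytes of this view; the underlying ArrayBuffer may be larger.
  return Buffer.from(
    new Uint8Array(floats.buffer, floats.byteOffset, floats.byteLength)
  );
}

/** Read raw float32 bytes back into a Float32Array. */
function deserializeEmbedding(buffer: Buffer): Float32Array {
  if (buffer.byteLength % Float32Array.BYTES_PER_ELEMENT !== 0) {
    throw new Error("buffer length is not a multiple of 4 bytes");
  }
  // Copy into a fresh ArrayBuffer: Node Buffers can start at offsets that are
  // not 4-byte aligned, which a Float32Array view does not allow.
  const bytes = new Uint8Array(buffer.byteLength);
  bytes.set(buffer);
  return new Float32Array(bytes.buffer);
}

const blob = serializeEmbedding([0.25, -1.5, 3]);
console.log(blob.byteLength); // 12 (three floats, four bytes each)
console.log(Array.from(deserializeEmbedding(blob))); // [0.25, -1.5, 3]
```

Copying on both sides costs a little memory but avoids sharing a buffer with the caller or with Node's internal pool.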

Semantic Matching Enhancements

The system improves matching quality by:

Structured Metadata: Organizing candidate and job information in a semantically meaningful format
Contextual Descriptions: Adding natural language descriptions that highlight relationships between skills, experience, and industry
Content Weighting: Emphasizing important information through formatting and repetition
Skill Prioritization: Dividing skills into primary and secondary categories to focus on core requirements
Alternative Job Titles: Adding common alternative names for positions to improve matching with varied terminology
Related Industries: Including information about related professional fields to broaden matching scope
Hierarchical Information: Using clear sections and headings to help the model understand the relative importance of different data points
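The techniques above can be sketched as a function that turns structured job data into one labelled block of text before it is embedded. Everything in this sketch is hypothetical: the `JobProfile` shape, the section labels, and the `buildJobText` helper are illustrations of the idea, and the project's real formatting may differ.

```typescript
// Hypothetical shape for illustration; the project's real fields may differ.
interface JobProfile {
  title: string;
  alternativeTitles: string[];
  seniority: string;
  industry: string;
  relatedIndustries: string[];
  primarySkills: string[];
  secondarySkills: string[];
  description: string;
}

/** Render a labelled list section, or nothing when the list is empty. */
function listSection(label: string, values: string[]): string[] {
  return values.length > 0 ? [`${label}: ${values.join(", ")}`] : [];
}

/** Build the text that would be passed to the embedding model. */
function buildJobText(job: JobProfile): string {
  const lines: string[] = [
    // Most important information first, under clear headings.
    `JOB TITLE: ${job.title}`,
    ...listSection("ALSO KNOWN AS", job.alternativeTitles),
    `SENIORITY: ${job.seniority}`,
    `INDUSTRY: ${job.industry}`,
    ...listSection("RELATED INDUSTRIES", job.relatedIndustries),
    ...listSection("PRIMARY SKILLS", job.primarySkills),
    ...listSection("SECONDARY SKILLS", job.secondarySkills),
    // Natural-language summary that repeats the title for extra weight.
    `SUMMARY: A ${job.seniority} ${job.title} role in ${job.industry}.`,
    `DESCRIPTION: ${job.description}`,
  ];
  return lines.join("\n");
}

const jobText = buildJobText({
  title: "Software Engineer",
  alternativeTitles: ["Software Developer"],
  seniority: "Senior",
  industry: "Technology",
  relatedIndustries: [],
  primarySkills: ["TypeScript", "Node.js"],
  secondarySkills: ["SQLite"],
  description: "Builds backend services.",
});
console.log(jobText);
```

Formatting resumes with the same section order and labels keeps both sides structurally aligned, which tends to raise similarity for genuinely matching pairs.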

Achieving High-Similarity Matches

To demonstrate the effectiveness of the vector similarity search, the demo includes examples of extremely high-quality matches with cosine similarity scores of 90% or higher. These are achieved through:

Mirrored Terminology: Using nearly identical language patterns between job descriptions and resumes
Structural Alignment: Ensuring both the job and resume follow the same information structure
Extended Context: Providing rich, detailed descriptions that give the embedding model sufficient context
Skill Matching: Using identical skill lists with the same priority order
Experience Level Consistency: Maintaining the same experience level and seniority
Industry Alignment: Ensuring perfect industry category matching

The demo clearly labels match quality with stars and percentage scores, making it easy to identify the highest quality matches.
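For readers unfamiliar with the score itself, here is a minimal in-memory sketch of cosine similarity and of ranking candidates by it. In the project the search runs inside SQLite through sqlite-vec, so this code only illustrates the underlying math. The `cosineSimilarity` and `rankBySimilarity` helpers and the `Candidate` type are hypothetical names.

```typescript
/** Cosine of the angle between two vectors: 1 = same direction, 0 = unrelated. */
function cosineSimilarity(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // similarity is undefined for a zero vector; treat it as no match
  }
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical record type for illustration.
interface Candidate {
  title: string;
  embedding: ArrayLike<number>;
}

/** Score every candidate against the query and keep the best `limit` results. */
function rankBySimilarity(
  query: ArrayLike<number>,
  candidates: Candidate[],
  limit: number = 3
): { title: string; similarity: number }[] {
  return candidates
    .map((c) => ({ title: c.title, similarity: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}

const ranked = rankBySimilarity([1, 0], [
  { title: "orthogonal", embedding: [0, 1] },
  { title: "same direction", embedding: [2, 0] },
  { title: "opposite", embedding: [-1, 0] },
], 2);
console.log(ranked.map((r) => r.title)); // ["same direction", "orthogonal"]
```

A score of 0.9 or higher therefore means the two texts point in almost the same direction in embedding space, which is why mirrored wording and structure push scores up.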

Testing with Negative Examples

The demo also includes deliberately mismatched profiles, such as:

// Example of a negative test case
await createOrUpdateResumeEmbedding(
  "resume-negative-example",
  "pastry chef with no experience in creating gourmet desserts and pastries. Doesn't know anything about French patisserie techniques, chocolate tempering, and sugar art. Doesn't know anything about pastry. Doesn't know anything about culinary arts. Doesn't know anything about desserts. Doesn't know anything about pastries.",
  "No Experience",
  [],
  "Culinary Arts"
);

These negative examples verify that the system correctly identifies poor matches and returns low similarity scores, confirming that our matching algorithm is working as expected.

Demo Implementation

The code includes a jobMatchingDemo() function that:

- Creates sample resume embeddings from various backgrounds, including:
  - Senior software engineers
  - Machine learning engineers
  - Graphic designers
  - Marketing specialists
  - Negative examples (e.g., inexperienced pastry chef)
- Creates sample job posting embeddings with matching and non-matching positions
- Demonstrates perfect match examples with 90%+ similarity scores
- Shows a range of match qualities:
  - Perfect matches (90%+): Senior software engineers and ML engineers with nearly identical job requirements
  - Excellent matches (75-90%): Full-Stack Engineers matching with software development resumes
  - Good matches (60-75%): Graphic Designer positions matching with candidates having design skills
  - Possible matches (<60%): Various candidates with some skill overlap but different domains
  - Poor matches: Deliberately mismatched examples to verify system discrimination

The demo outputs clear, visually-enhanced results with a star rating system to help understand match quality:

⭐⭐⭐ PERFECT MATCH (90%+ similarity)
⭐⭐ EXCELLENT MATCH (75-90% similarity)
⭐ GOOD MATCH (60-75% similarity)
POSSIBLE MATCH (<60% similarity)
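The rating tiers above can be sketched as a small lookup function. The `matchLabel` helper is a hypothetical name, and the sketch assumes that similarity is expressed as a fraction between 0 and 1 and that a score exactly on a boundary belongs to the higher tier. The source does not state either detail.

```typescript
/** Map a similarity score (assumed to be in the range 0 to 1) to a label. */
function matchLabel(similarity: number): string {
  if (similarity >= 0.9) return "⭐⭐⭐ PERFECT MATCH";
  if (similarity >= 0.75) return "⭐⭐ EXCELLENT MATCH";
  if (similarity >= 0.6) return "⭐ GOOD MATCH";
  return "POSSIBLE MATCH";
}

const sampleScores = [0.95, 0.8, 0.65, 0.3];
for (let i = 0; i < sampleScores.length; i++) {
  console.log(matchLabel(sampleScores[i]));
}
```

Checking the tiers from highest to lowest keeps each condition to a single comparison.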

Acknowledgments

The Xenova Transformers library for creating semantic textual embeddings
GTE-base embedding model for high-quality similarity measurement
SQLite-vec extension for vector search functionality within the database
