GitHub - borderless/unfurl: Extract rich metadata from URLs

Unfurl

Extract rich metadata from URLs.

Installation

npm install @borderless/unfurl --save

Usage

Unfurl attempts to parse and extract rich structured metadata from URLs.

import { scraper, urlScraper } from "@borderless/unfurl";
import * as plugins from "@borderless/unfurl/dist/plugins";

Scraper

scraper accepts a request function and a list of plugins to use. The request function is expected to return a "page" object, which has the same shape as the input to scrape(page).

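// `request` is a function you supply. It is called as request(url) and should resolve to a "page" object.
const scrape = scraper({
  request,
  plugins: [plugins.htmlmetaparser, plugins.exifdata],
});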

const res = await fetch("http://example.com"); // E.g. `fetch` from `popsicle`.

await scrape({
  url: res.url,
  status: res.status,
  headers: res.headers.asObject(),
  body: res.stream(), // Must stream the request instead of buffering to support large responses.
});
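To make the request contract more concrete, here is a minimal sketch of a stand-in request function that resolves to a page-shaped object. The `Page` interface and the `cannedRequest` helper are hypothetical names and not part of the library. The field list is inferred from the scrape(page) call above, and the body type is left generic because the exact stream type is an assumption.

```typescript
// Assumed shape of a "page", inferred from the fields passed to scrape() above.
interface Page<Body> {
  url: string;
  status: number;
  headers: Record<string, string | string[]>;
  // A stream in practice; generic here because the exact type is not stated.
  body: Body;
}

// Hypothetical helper: builds a request function backed by canned pages,
// e.g. for exercising a scraper without network access.
function cannedRequest<Body>(pages: Map<string, Page<Body>>) {
  return async (url: string): Promise<Page<Body>> => {
    const page = pages.get(url);
    if (!page) throw new Error(`No canned page for ${url}`);
    return page;
  };
}
```

A function like this could be passed wherever `request` is expected during tests, provided the canned body matches whatever stream type the plugins actually consume.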

URL Scraper

A simpler wrapper around scraper that automatically calls request(url) to fetch the page.

const scrape = urlScraper({ request });

await scrape("http://example.com");
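Conceptually, the wrapper composes the two steps: fetch the page, then scrape it. The sketch below illustrates that pattern only and is not the library's source. `withRequest`, `fakeRequest`, `titleScraper`, and the simplified `Page` type (with a string body instead of a stream) are all hypothetical stand-ins.

```typescript
// Hypothetical page shape, mirroring the fields passed to scrape() earlier.
interface Page {
  url: string;
  status: number;
  headers: Record<string, string>;
  body: string; // A stream in the real library; a string keeps the sketch small.
}

type RequestFn = (url: string) => Promise<Page>;
type PageScraper<T> = (page: Page) => Promise<T>;

// Compose a request function with a page scraper into a URL-based scraper.
function withRequest<T>(request: RequestFn, scrapePage: PageScraper<T>) {
  return async (url: string): Promise<T> => scrapePage(await request(url));
}

// Stand-in request: always resolves to the same small HTML page.
const fakeRequest: RequestFn = async (url) => ({
  url,
  status: 200,
  headers: { "content-type": "text/html" },
  body: "<title>Example</title>",
});

// Stand-in scraper: pulls the <title> text out of the body, if present.
const titleScraper: PageScraper<{ url: string; title?: string }> = async (page) => ({
  url: page.url,
  title: /<title>(.*?)<\/title>/.exec(page.body)?.[1],
});

const scrapeUrl = withRequest(fakeRequest, titleScraper);
```

Keeping the request function injectable means the caller stays in control of the HTTP client, timeouts, and headers, while the scraping logic only ever sees a page object.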

License

Apache 2.0
