The Wayback Machine - https://web.archive.org/web/20200813023522/https://github.com/topics/ingest
Here are 22 public repositories matching this topic...
Python-based open source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction and named entity recognition), and data enrichment (annotation) pipelines, with an ingestor to a Solr or Elasticsearch index and a linked data graph database
Updated Apr 12, 2020 (Python)
An application-agnostic, all-in-one New Relic integration
JavaFX and command-line application to import events from the Ethereum blockchain into Elasticsearch, MongoDB, Hazelcast, CQEngine and SQLite.
E-ARK Web is software for creating and managing archival information packages; it supports full-text search of the individual files they contain.
Hive streaming ingest test application
API client for Aleph, supports bulk entity and document upload.
Updated Jun 25, 2020 (Python)
DataStax or Cassandra Ingest from Relational Databases with StreamSets
Updated Mar 5, 2019 (PLSQL)
Apache NiFi 1.5 Custom Processor for WebCams
Updated Mar 27, 2018 (Java)
Populating Cloud Data Warehouses with Apache NiFi
Twitter Ingest to S3, ORC, Slack, Hive Streaming
Islandora Prepare Ingest is a module that helps you build workflows for preparing data for ingest into Islandora.
Notes, scripts, images, Apache NiFi templates, processors
Renames a folder full of TIFF/XML/MRC files and moves them into the correct folders to prepare for ingest using the Islandora Book Batch module
Updated Dec 10, 2013 (Shell)
Newspaper batch ingest that uses the ordering of a ZIP to dictate sequence.
🏭 Kids First Data Ingest Library
Updated Aug 11, 2020 (Python)
Ingest pipelines to parse Artifactory logs sent to Elasticsearch using Filebeat
A pluggable batch ingester module for Islandora.
Conversion and ingestion of NOAA GHCN-D weather data into a PostgreSQL database, plus querying. Uses Python, R, SQL, and Scala.
Updated Jun 28, 2018 (XSLT)
Batch ingest for IArchive format, as provided from UoM.
Updated Feb 26, 2014 (XSLT)