Focused crawls are collections of frequently-updated webcrawl data from narrow (as opposed to broad or wide) web crawls, often focused on a single domain or subdomain.
DataSphereStudio is a one stop data application development& management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Use Pentaho's open source data integration tool (Kettle) to create Extract-Transform-Load (ETL) processes to update a Socrata open data portal. Documentation is available at http://open-data-etl-utility-kit.readthedocs.io/en/stable
Cassandra JDBC driver that works with 2.x and above. It intends to provide better SQL compatibility over CQL, so that it works well with existing tools like SQuirreL SQL for SQL development, JMeter for stress testing, and Pentaho BI Suite for data processing.