The Wayback Machine - https://web.archive.org/web/20210119170308/https://github.com/topics/dataproc
Here are 33 public repositories matching this topic...
Ephemeral Hadoop clusters using Google Compute Platform
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Updated
Jan 12, 2021
HTML
A Python framework for data processing on GCP.
Updated
Jan 18, 2021
Python
gomrjob - a Go Framework for Hadoop Map Reduce Jobs
Google Cloud Dataproc is a managed Apache Spark and Apache Hadoop service that lets you take advantage of open source data tools for batch processing, querying, streaming, and machine learning.
Updated
Jan 14, 2021
TypeScript
A search engine for querying social media insights on political themes
Updated
Oct 1, 2020
Jupyter Notebook
Opens a Chrome browser to a Dataproc cluster
Updated
Jan 23, 2018
Python
Creating an Inverted Index of words occurring in a large set of documents extracted from web pages using Hadoop MapReduce and Google Dataproc
Updated
Oct 28, 2019
Java
Demonstration of Google Cloud Dataproc Workflow Templates
Covers big data processing in the cloud
Demonstration of Google Cloud Dataproc for running PySpark jobs
Updated
Dec 16, 2018
Python
GKE with Terraform, Dataproc with Terraform
Twilytics provides insights into trends on Twitter based on real-time data
Updated
Jul 19, 2020
Jupyter Notebook
Demonstration of Google Cloud Dataproc for running Spark jobs with Java
Updated
Dec 17, 2018
Java
Collection of personal resources on Google Cloud
Updated
Nov 18, 2020
Shell
Collected data on the same topic or key phrase, over similar time periods, from three sources: opinion-based social media (Twitter), research data from the New York Times, and Common Crawl data. Processed the three data sets individually using classical big data methods such as MapReduce on Google Dataproc clusters, then compared the outcomes using popular visualization methods in Tableau.
Updated
Oct 25, 2019
Python
Updated
Nov 18, 2020
Python
Google DataProc Spark Scala Job for MNIST Handwritten Digit Recognition using Decision Trees (Spark MLlib)
Updated
Jan 2, 2018
Perl 6
Updated
Jan 14, 2021
Scala
Neo4j, Cassandra, Hadoop, PySpark, RDD, MapReduce, Cluster-Computing, DataProc
Updated
Dec 26, 2019
Python
Updated
Jan 14, 2021
Jupyter Notebook
Updated
Nov 30, 2018
Python
Using PySpark for Tensorflow model inferencing on GCP Dataproc Cluster. Demo for PyCon Hong Kong Fall 2020 Presentation
Updated
Nov 9, 2020
Jupyter Notebook
Spark application for stream processing of a dataset of Olympic achievements and records
Updated
Nov 16, 2020
Scala
Working examples for some components on GCP, and instructions on how to run them.
Updated
Apr 26, 2017
Java
Customisable Dataproc HA cluster (debian-9) with ZooKeeper, Kafka, BigQuery, and other tools/jobs, with Terraform
Running a wordcount job on a Google Dataproc cluster
Updated
Mar 29, 2020
Python
Data Workflows with GCP Dataproc, Apache Airflow and Apache Spark
Updated
Mar 4, 2020
Python