The architecture above illustrates how an RSS feed can be analyzed as soon as it is collected. Here is a step-by-step process to build this workflow using Google Cloud managed services:
Step 1: Cloud Scheduler writes a message to a Pub/Sub topic on a schedule. Its only purpose is to trigger a cloud function that is configured to run as soon as this event occurs.
A configuration like the one above writes a message to a Pub/Sub topic at 5 AM PDT every day.
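As a sketch, an equivalent scheduler job could be created with the gcloud CLI; the job, topic, and message names below are illustrative placeholders:

```shell
# Create a Cloud Scheduler job that publishes a message to a Pub/Sub
# topic every day at 5 AM Pacific time (the cron expression is
# evaluated in the given time zone).
gcloud scheduler jobs create pubsub rss-daily-trigger \
  --schedule="0 5 * * *" \
  --time-zone="America/Los_Angeles" \
  --topic=rss-feed-trigger \
  --message-body="start"
```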
Step 2: An RSS feed contains one or more web page URLs. The message written to Pub/Sub in Step 1 triggers a cloud function that downloads the contents of the RSS feed and parses the list of web page URLs from it. These URLs are then written to another Pub/Sub topic for further processing.
With a configuration such as the one above, the cloud function starts executing at 5 AM when the scheduler event arrives via the Pub/Sub trigger.
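A deployment along these lines could be sketched with the gcloud CLI; the function name, entry point, and topic below are hypothetical placeholders (the runtime identifier for .NET Core 3.1 on Cloud Functions is dotnet3):

```shell
# Deploy a function that runs whenever a message arrives on the
# scheduler's Pub/Sub topic. Names are illustrative placeholders.
gcloud functions deploy parse-rss-feed \
  --runtime=dotnet3 \
  --entry-point=ParseRssFeed.Function \
  --trigger-topic=rss-feed-trigger
```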
Here is sample code written in C# (.NET Core 3.1) to parse an RSS feed and collect web page URLs from it. The PublishMessageWithRetrySettingsAsync function can be found here.
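As an illustration of what this parsing step does, here is a minimal sketch in Python using only the standard library. It assumes an RSS 2.0 feed; the Pub/Sub publish step is noted in a comment rather than implemented:

```python
import xml.etree.ElementTree as ET

def extract_urls(rss_xml):
    """Parse RSS 2.0 XML and return the <link> of every <item>."""
    root = ET.fromstring(rss_xml)
    # In RSS 2.0, items are nested under <channel>.
    return [link.text for link in root.findall(".//item/link") if link.text]

# In the real function, each URL would be published to a Pub/Sub topic
# (the C# version uses PublishMessageWithRetrySettingsAsync for this).
sample_feed = """<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>Post 1</title><link>https://example.com/post-1</link></item>
  <item><title>Post 2</title><link>https://example.com/post-2</link></item>
</channel></rss>"""

print(extract_urls(sample_feed))
# → ['https://example.com/post-1', 'https://example.com/post-2']
```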
Step 3: A URL entry written to Pub/Sub in Step 2 triggers a new cloud function that downloads the web page text and writes it to a Cloud Storage bucket.
With a configuration like the one above, a cloud function will execute for each web page URL collected from the RSS feed and written to Pub/Sub in Step 2.
Here is sample code written in Python 3.9 to download web page text and store it in Cloud Storage. The BeautifulSoup library is used to parse web pages and extract text.
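The sample above relies on BeautifulSoup; as a self-contained sketch, the same extraction step can be approximated with the standard library's html.parser. Fetching the page and uploading to Cloud Storage are noted in comments rather than implemented:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.parts.append(data.strip())

def page_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)

# In the real function, the HTML would be fetched from the URL received
# via Pub/Sub, and the extracted text uploaded to a Cloud Storage bucket.
html = "<html><head><style>p{color:red}</style></head><body><p>Hello RSS</p></body></html>"
print(page_text(html))  # → Hello RSS
```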
Step 3 results in web page text stored as objects in a Cloud Storage bucket.
Step 4: A new file creation event in Step 3 triggers a new cloud function that does entity sentiment analysis on the web page text and writes results to another Cloud Storage bucket.
With a configuration like this, a cloud function will execute for each web page text downloaded and stored in a Cloud Storage bucket.
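Such a configuration could be sketched with a storage-triggered deployment; the function and bucket names below are hypothetical placeholders:

```shell
# Deploy a function that runs each time a new object is finalized in
# the bucket that Step 3 writes to. Names are illustrative placeholders.
gcloud functions deploy analyze-page-text \
  --runtime=nodejs14 \
  --trigger-event=google.storage.object.finalize \
  --trigger-resource=rss-page-text-bucket
```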
Here is sample code written in Node.js 14 to read the downloaded web page text from a Cloud Storage bucket, analyze it, and store the results in another Cloud Storage bucket.
The analyzeEntitySentiment function performs entity sentiment analysis. The Natural Language API has several methods for performing analysis and annotation on text data; here are examples of how to perform analysis on text data using the Natural Language API.
The analysis data collected can be used by data engineering teams to answer questions such as: ‘Which entities (person, product, etc.) were discussed the most in recent times?’ and ‘What is the sentiment associated with these entities?’ Answering them can reveal the effectiveness of existing marketing campaigns and surface emerging trends, helping teams create better marketing or public relations campaigns.
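As a sketch of the kind of aggregation a team might run over the Step 4 results, assume they have been flattened into (entity, sentiment score) pairs, with scores in [-1.0, 1.0] as returned by entity sentiment analysis. The helper below (entirely hypothetical, in Python) counts mentions and averages sentiment per entity:

```python
from collections import defaultdict

def summarize(mentions):
    """mentions: iterable of (entity, sentiment_score) pairs.
    Returns {entity: (mention_count, average_sentiment)},
    ordered most-mentioned first."""
    totals = defaultdict(lambda: [0, 0.0])
    for entity, score in mentions:
        totals[entity][0] += 1
        totals[entity][1] += score
    summary = {e: (n, s / n) for e, (n, s) in totals.items()}
    return dict(sorted(summary.items(), key=lambda kv: kv[1][0], reverse=True))

# Hypothetical flattened results read from the Step 4 output bucket.
mentions = [("Pixel", 0.8), ("Pixel", 0.6), ("Android", -0.2), ("Pixel", 0.4)]
print(summarize(mentions))
```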
Resources:
All logs from above services will be available in Cloud Logging.
Cloud Functions can be tested and debugged locally.
All services are loosely coupled.
Vertex AI can be used to further analyze the metadata collected at the end of the pipeline.
Summary:
Cloud Functions is a powerful service for building event-driven applications with minimal effort. The walkthrough above demonstrates Cloud Functions events, trigger integrations with different Google Cloud services, and multi-language support. To get started, check out the references below.
References: