Tutorial 2. Azure Lab: Azure Stream Analytics

2.1 Using Azure’s Stream Analytics with Twitter

First, you need to sign up for the Azure Free Trial. Go to this site (you have to enter credit card information) and sign up for the trial: https://account.windowsazure.com/SignUp.

You can follow the lab online or here; I’ll try to give more pictures and a more in-depth explanation of what is going on. Here is where you’ll find the lab online: https://azure.microsoft.com/en-us/documentation/articles/stream-analytics-twitter-sentiment-analysis-trends/.

Now, head to https://manage.windowsazure.com/ and we’ll start the lab.

2.1.1 Getting Twitter OAuth Keys and Downloading the Twitter Client

Luckily, we’ve already retrieved our OAuth keys. The Twitter Client we’ll be using to conduct the lab experiment can be downloaded by clicking here. You can download the source code for the application here.

2.1.2 Creating an Event Hub and Consumer Group

An Event Hub is a large-scale event ingestion service provided by Azure Service Bus. It creates a common hub in the cloud where events are published and can then be read at very large scale.

  1. In the Azure Portal click New > App Services > Service Bus > Event Hub > Quick Create
    • Give the hub a name and a namespace that are easy to remember.
  2. Create the New Event Hub
  3. To create a new Consumer Group, click your namespace name (highlighted in yellow)
    pt2-1
  4. Click “Event Hubs” (purple) > “Create Consumer Group” (red)
    pt2-2
  5. Click the name of your Event Hub > “Configure” (brown) tab > Create new shared access policy with manage permissions (purple) > Save (green) > Dashboard tab (yellow)
    pt2-3
  6. Click “Connection Information” at the bottom of the screen > Copy the “Connection String”
    pt2-4

2.1.3 Configuring the Twitter Client Application

If you haven’t yet downloaded the TwitterClient application, you can download it here. Now we’ll configure the application to interface with the Event Hub we just created.

  1. Unzip the Twitter Client Application.
  2. Open the TwitterClient.exe.config file. Notepad or a similar plain-text editor works fine; do not use Microsoft Word or another word processor.
  3. Input your Twitter OAuth keys. It should look like this:
    pt2-5
  4. Input your “Connection Information” from the Event Hub you created.

This is where some errors could occur. The “Connection String” you copied should look something like this (but all in one line):

pt2-additional1

You have to paste this inside the “EventHubConnectionString” value’s quotes:

pt2-additional2

Then, you have to enter the actual name of your Event Hub, NOT the name of your policy; mine was simply “twitterhub”. Here is what mine looked like in the end:

pt2-additional3
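
For reference, here is a minimal sketch of the two relevant entries, assuming the sample client uses the key names "EventHubConnectionString" and "EventHubName", and using placeholder values for the namespace, policy name, and key (substitute your own, keeping the connection string on one line):

    <!-- Connection string copied from the Event Hub dashboard (keep it on one line) -->
    <add key="EventHubConnectionString" value="Endpoint=sb://YOUR-NAMESPACE.servicebus.windows.net/;SharedAccessKeyName=YOUR-POLICY-NAME;SharedAccessKey=YOUR-KEY" />
    <!-- The name of the Event Hub itself, NOT the policy name -->
    <add key="EventHubName" value="twitterhub" />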

Run “TwitterClient.exe” to see if you entered the information correctly. If all went well this is what you should see when you run the program:

pt2-6

*Possible Errors:
– If the console opens and then closes immediately, you entered the Twitter OAuth keys incorrectly.
– If the console opens, but nothing happens, you entered the Event Hub information incorrectly.

2.2 Stream Analytics

Now, we must set up stream analytics to analyze the data that we’re sending out.

2.2.1 Creating the Stream Analytics job

  1. In the Azure Portal click New > Data Services > Stream Analytics > Quick Create. Input all the necessary information just as I do.
    pt2-7
  2. Select your newly created Stream Analytics Job.

2.2.2 Specifying Job Input

  1. Click Inputs in the top menu. Then Add Input.
  2. Select “Data Stream”, then continue.
  3. Select “Event Hub”, then continue.
  4. Enter the following values on the third page, then continue:
    • Input Alias: Select a friendly name. You will use this later in the query.
    • Subscription: “Use Event Hub from Current Subscription”
    • Choose a Namespace: Choose your namespace. Mine was “TwittterHubRyanBury-ns (Central US)”.
    • Choose an Eventhub: Choose your Eventhub. Mine was “twitterhub”.
    • Event Hub Policy Name: Choose the policy we made earlier. Mine was “TwitterHubPolicy”.
    • Choose a Consumer Group: Choose the consumer group we made earlier. Mine was “twitterhubgroup”.
  5. Enter the following:
    • Event Serializer Format: JSON
    • Encoding: UTF8
  6. Click the check to add the source and test the connection with the event hub.

2.2.3 Specifying Job Query

First, we need to receive some sample data to run some test queries with.

  1. Run the “TwitterClient.exe” and let it run for at least 5 minutes before you go any further. This allows some events to collect within the Event Hub.
  2. Select INPUTS after selecting your Stream Analytics job.
  3. Select SAMPLE DATA at the bottom of the page.
  4. The default selections should be adequate for collecting enough data.
  5. After the sampling finishes, click DETAILS, then click the link to download the .JSON file with your data. Save this file in an easily accessible place; we will use it next.
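
Each record in the downloaded file is one JSON event sent by the TwitterClient. The exact field names depend on the client build, so treat this as an illustrative sketch only; Topic and SentimentScore are the fields the queries below rely on, and the other fields and all values shown here are assumptions:

    { "CreatedAt": "2015-06-01T12:00:00.000Z", "Topic": "Azure", "SentimentScore": 4, "Text": "Trying out Azure Stream Analytics!" }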

We’ll run a couple of different queries to get familiar with the process. If you would like to learn more about how to run these queries, the reference is here.

  1. Select your newly created Stream Analytics job. I selected “twitterstreamanalytics”.
  2. Click QUERY at the top of the job’s page.
    pt2-sql&code1
  3. In the editor, replace the query (see the sketch after this list).
  4. Click TEST under the query.
  5. Select the sample .JSON file we created earlier.
  6. Click the check button to view the results. They might be quite large.
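
The lab starts with a simple pass-through query that returns every event unchanged, which is why the results can be large. A sketch, assuming you chose “TwitterStream” as your Input Alias in section 2.2.2 (use whatever alias you actually entered):

    -- Return every field of every event from the input
    SELECT *
    FROM TwitterStream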

Now we’ll count the number of tweets per topic every 5 seconds:
pt2-sql&code2

  1. Change the query to the one shown above (a text sketch also follows this list).
  2. Click RERUN under the query editor to view the results.
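
A sketch of the per-topic count, again assuming the input alias “TwitterStream” and the Topic field emitted by the TwitterClient; TumblingWindow(second, 5) groups events into non-overlapping 5-second windows:

    -- Count tweets per topic in each 5-second tumbling window
    SELECT Topic, COUNT(*) AS TweetCount
    FROM TwitterStream
    GROUP BY Topic, TumblingWindow(second, 5)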

2.2.4 Finding Trending Topics

For this query we need a rather large dataset, so you may have to go back to where we collected sample data and retrieve a larger set. Here is how you’ll find trending topics:

  1. Change the query to the following (a text sketch also appears after this list):
    pt2-sql&code3
  2. Click TEST and make sure you select a dataset large enough to find some trending topics. With my count threshold (the number in bold red) at 20, I found only one trending topic. When I changed it to 15, I received 7 results.
    pt2-8
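
A sketch of the trending-topics query, with the same assumptions about the alias and fields; SlidingWindow(second, 5) considers every 5-second window of events, and the HAVING threshold is the count discussed above:

    -- Topics mentioned more than 20 times within any 5-second window
    SELECT Topic, COUNT(*) AS Mentions
    FROM TwitterStream
    GROUP BY Topic, SlidingWindow(second, 5)
    HAVING COUNT(*) > 20

Lowering the threshold (for example to 15) returns more topics, as noted above.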

2.2.5 Find Sentiment Statistics

Here we’ll collect various statistical measures for each topic: the number of mentions and the average, minimum, maximum, and standard deviation of the sentiment score.

  1. Change the query to the following (a text sketch also appears after this list):
    pt2-sql&code4
  2. RERUN the query to view the results.
  3. This is the query we’ll be using so click SAVE at the bottom of the screen.
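
A sketch of the sentiment-statistics query, with the same assumptions about the input alias and the Topic and SentimentScore fields, aggregated here over 5-second tumbling windows:

    -- Per-topic mention count and sentiment statistics in each 5-second window
    SELECT Topic,
           COUNT(*) AS Mentions,
           AVG(SentimentScore) AS AvgSentiment,
           MIN(SentimentScore) AS MinSentiment,
           MAX(SentimentScore) AS MaxSentiment,
           STDEV(SentimentScore) AS StdDevSentiment
    FROM TwitterStream
    GROUP BY Topic, TumblingWindow(second, 5)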

2.2.6 Create output sink

Now we’ve done everything except specify the output of our job. You could push (store) the results to a SQL Database, Table storage, or an Event Hub, but we’ll be pushing them to Azure Blob storage.

  1. Select the Storage Account we created earlier. Mine was named “twitterstreamstorage1”. If you never created one, click NEW > DATA SERVICES > STORAGE > QUICK CREATE and follow the on-screen instructions.
  2. In the Storage Account click CONTAINERS at the top of the page and add a container.
  3. Create a name for your container and set the access to “Public Blob”.

2.2.7 Specify Job Output

  1. In your Stream Analytics job, click OUTPUT at the top of the page and add an output.
  2. Select “Blob Storage”.
  3. Choose the following values:
    • Output Alias: Enter a friendly name for the output.
    • Subscription: This should be selected for you to “Use Storage Account from Current Subscription”.
    • Storage Account: This should also be selected for you to the name of your storage account.
    • Container: Select your container if it isn’t already selected.
    • Filename Prefix: Type in a simple file prefix. Example: “module1”.
  4. Click the right arrow button to continue to the next page.
  5. Specify these values:
    • Event Serializer Format: CSV (CSV is easier to use with Power BI, but you could select JSON instead)
    • Encoding: UTF8
  6. Click the check button to create the output.

2.2.8 Starting the Job

  1. Navigate to the dashboard of your streaming job and click START at the bottom of the screen.
  2. Make sure that Job Start Time is selected and click the checkmark. The job will start running.

* Make sure that your “TwitterClient.exe” is running before you start the Stream Analytics job; otherwise nothing will happen while the job is running!

If all went well your dashboard should have some graphs like this indicating that events are being processed:

pt2-9

Continue with Tutorial 3 …

Tutorial 3 will go over viewing your output and organizing the results in a meaningful way. It will also give some extra activities to try using Stream Analytics.