Azure Event Hub Review Most Recent Messages

If I am using AzureEventHub to load streaming data into an application, how can I review the most recent messages that have been sent through the EventHub to the AI Suite data stores?

If I were loading files through the JMSDataLoadQueue, I could check the SourceFiles / DataLoadUpload/ProcessLogs that have arrived, review their content in the file store, and then investigate the resulting records in the Type-Enabled data store (be it Cassandra, Postgres, file store, etc.).

I am interested in the equivalent steps for Azure’s Event Hub: how to inspect message content and identify which records on which types to examine, so that I can confirm the resulting records are landing in the platform in the desired fashion.

I assume this is 7.8 we are talking about.

After the syncing/loading process has begun, the process should be the same (checking SourceFile, DataLoadUpload/ProcessLogs, etc.).

If you want to access messages that are in the stream itself, you can call receiveMessages on your instance of AEH with the partition and sequence number you are interested in.

Here are some of the steps that we took to ensure we receive messages from EventHub during setup.

  1. EventHubQueue should have the same number of messages as the number of partitions that we connect to in the EventHub. (The number of partitions is the count of results returned by AzureEventHubCheckpoint.fetch().)

  2. Check AzureEventHubCheckpoint.fetch() to make sure the sequenceNumber is increasing. The sequenceNumber should increase as messages are successfully processed by the EventHubQueue. An entry that is still computing/processing will not increase the sequenceNumber. The number goes up only once processing has completed successfully.

  3. This simple Splunk query will give you an idea of whether messages are getting processed in the environment: host=<<hostname>> queueName=<<queue_name>>
    (queue_name is found from the queueName field in the following api call: c3Grid(AzureEventHubCheckpoint.fetch()))

  4. Comparing Splunk activity for AzureEventHub.receive* and Canonical*.import() actions will reveal whether there is a discrepancy between the number of messages being retrieved from the event hub and the number of Canonical processing actions taking place. Many AzureEventHub actions with few Canonical actions indicates that messages are being received but not processed.

  5. Review error messages, if any, in EventHubQueue. In our case it was always errorMsg: Opening MessagingFactory timed out. Opening MessagingFactory timed out. So we opened the necessary ports to allow outbound traffic.
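
The sequenceNumber check in step 2 can be sketched offline as a comparison of two checkpoint snapshots taken a few minutes apart. This is only an illustration in plain Python, not platform code: it assumes the rows returned by AzureEventHubCheckpoint.fetch() have been exported as dictionaries, and the partitionId field name and the sample values are hypothetical (only sequenceNumber is named above).

```python
from typing import Dict, List


def stalled_partitions(before: List[dict], after: List[dict]) -> List[str]:
    """Return ids of partitions whose sequenceNumber did not increase."""
    # Index the earlier snapshot by partition so each row can be compared.
    previous: Dict[str, int] = {
        row["partitionId"]: row["sequenceNumber"] for row in before
    }
    stalled = []
    for row in after:
        pid = row["partitionId"]
        # Partitions absent from the earlier snapshot have no baseline; skip them.
        if pid in previous and row["sequenceNumber"] <= previous[pid]:
            stalled.append(pid)
    return sorted(stalled)


# Hypothetical snapshots; "partitionId" is an assumed field name.
snapshot_t0 = [
    {"partitionId": "0", "sequenceNumber": 1200},
    {"partitionId": "1", "sequenceNumber": 950},
]
snapshot_t1 = [
    {"partitionId": "0", "sequenceNumber": 1275},
    {"partitionId": "1", "sequenceNumber": 950},
]

print(stalled_partitions(snapshot_t0, snapshot_t1))  # ['1']
```

A partition that shows up as stalled is not necessarily broken; it may simply have received no new messages in the interval, so the result is a prompt to look at the queue errors in step 5.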

Many customers have created Splunk dashboards to get statistics on the total number of messages received / processed, latency, etc.
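
As a rough idea of the received-versus-processed statistic such a dashboard might show, here is a small Python sketch that counts AzureEventHub.receive* actions against Canonical*.import actions in a list of action names. It assumes the action names have already been exported from the logs as plain strings without parentheses; the sample names CanonicalMeter.import and SourceFile.process are hypothetical.

```python
from typing import Iterable, Tuple


def receive_vs_import(actions: Iterable[str]) -> Tuple[int, int]:
    """Count event hub receive actions and Canonical import actions."""
    receives = 0
    imports = 0
    for name in actions:
        if name.startswith("AzureEventHub.receive"):
            receives += 1
        elif name.startswith("Canonical") and name.endswith(".import"):
            imports += 1
    return receives, imports


# Hypothetical export of action names from the logs.
sample_actions = [
    "AzureEventHub.receiveMessages",
    "AzureEventHub.receiveMessages",
    "CanonicalMeter.import",
    "SourceFile.process",
]

receives, imports = receive_vs_import(sample_actions)
print(receives, imports)  # 2 1
```

A large gap between the two counts over the same time window is the discrepancy worth investigating.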

As a longer-term solution, we are thinking about generating alerts when no messages arrive, based on the time since the last received message.
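
A minimal sketch of that alert rule in Python, assuming the timestamp of the last received message is available from somewhere (for example, a log query). The function name, the threshold, and the sample timestamps are all hypothetical; this is not an existing platform feature.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def should_alert(
    last_received: Optional[datetime],
    now: datetime,
    max_silence: timedelta,
) -> bool:
    """Return True when the stream has been silent for longer than max_silence."""
    # No message has ever been seen: treat that as an alert condition.
    if last_received is None:
        return True
    return now - last_received > max_silence


# Hypothetical values for illustration only.
now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
last_seen = datetime(2020, 1, 1, 11, 40, tzinfo=timezone.utc)

print(should_alert(last_seen, now, timedelta(minutes=15)))  # True
print(should_alert(last_seen, now, timedelta(minutes=30)))  # False
```

The threshold would need to reflect the expected message rate of the source; a stream that is normally quiet overnight would otherwise alert on every quiet period.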