Cloud Operations Overview for Google Cloud Professional Architect

Oct 29, 2020

Operations Suite (Stackdriver) is a hybrid monitoring, logging, and diagnostics tool suite for applications on the Google Cloud Platform and AWS.

Google purchased Stackdriver, and the product was rebranded as Google Stackdriver after the acquisition.

Google has now rebranded the Stackdriver suite as “Cloud Operations”. This is important to know in case the exam has not been updated to reflect the change.

Cloud Operations monitors the cloud's service layers in a single SaaS solution. It has native integration with Google Cloud data tools such as BigQuery, Cloud Pub/Sub, Cloud Storage, and Cloud Datalab, and out-of-the-box integration with your other application components.

In a nutshell, the Cloud Operations suite allows you to monitor, troubleshoot, and improve application performance in your Google Cloud environment.

To access Cloud Operations, open the Cloud Console and navigate to the Operations section.

Cloud Operations Monitoring Main Benefits

  • Fully integrated with the Google Cloud Console
  • Monitors multiple clouds (GCP and AWS)
  • Identifies trends and prevents issues
  • Reduces monitoring overhead
  • Helps you fix problems faster
  • Aids with cloud security
  • Aids with compliance

From a monitoring perspective, the Cloud Operations defaults are intelligent and dynamic. Metric retention is up to 24 months, and you can write custom metrics at up to 10-second granularity.
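To illustrate the custom-metric side of that point, here is a minimal sketch, assuming the google-cloud-monitoring client library is installed and using a hypothetical project ID and metric name, that writes a single point to a custom metric:

import time
from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project_name = "projects/my-project-id"  # hypothetical project ID

# Describe the time series: a custom metric attached to the global resource.
series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/app/request_count"  # hypothetical metric name
series.resource.type = "global"

# One data point, stamped with the current time.
now = time.time()
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": int(now), "nanos": int((now % 1) * 1e9)}}
)
series.points = [monitoring_v3.Point({"interval": interval, "value": {"int64_value": 1}})]

client.create_time_series(name=project_name, time_series=[series])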

There are also Health checks which can be set up to monitor your cloud resources in specific regions.

Important note for preparing for the exam. In Stackdriver, metrics flow through three stages:

  1. Sources: platform, system, and application metrics
  2. Ingestion: metrics, events, and metadata are ingested
  3. Insight: dashboards, charts, and alerts provide visibility

In a nutshell, we select the platform, then we select the metrics, and then we choose how to monitor or view them.

Figure 1 Cloud Operations Main Menu

The menu pane is on the left side of the Cloud Console.

Test Tip — You can write logs from Python applications by using the Stackdriver logging handler
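As a minimal sketch of that tip, assuming the google-cloud-logging client library is installed and application default credentials are available, attaching the Cloud Logging handler to Python's standard logging module looks like this:

import logging
import google.cloud.logging

client = google.cloud.logging.Client()
client.setup_logging()  # routes records from the standard logging module to Cloud Logging

logging.warning("This entry is sent to Cloud Logging")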

Workspaces

A Workspace is a tool for monitoring resources contained in one or more GCP projects or AWS accounts.
Each Workspace can have between 1 and 100 monitored projects, which can include GCP projects and AWS accounts.
You can have as many Workspaces as you wish, but GCP projects and AWS accounts cannot be monitored by more than one Workspace.
A Workspace contains the custom dashboards, alerting policies, uptime checks, notification channels, and group definitions that you use with your monitored projects.

Test Tip — Every workspace has a host project.

A Workspace can access metric data from its monitored projects, but the metric data and log entries remain in the individual projects.

Figure 2 shows the Stackdriver Monitoring dashboards page, which lists the dashboards that have been created.

Figure 2 Cloud Operations


The above figure shows an App Engine dashboard and one for a development GKE cluster. Having very specific dashboards aids rapid identification of potential issues by providing an immediate visual through graphs and charts, for example.

Test Tip — You can share Cloud Operations Charts by sending a parameterized URL

Figure 3 Dashboard View


The monitoring dashboard in the figure above is just a sample of what could be customized.

It works simply by selecting the Resource and then available metrics you would like in your dashboard view. Metrics Explorer lets you build charts for any metric collected by your project.
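If you prefer to see programmatically which metrics Metrics Explorer could chart, a small sketch (again assuming the google-cloud-monitoring client library and a hypothetical project ID) lists a project's metric descriptors:

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()

# Print the type of every metric descriptor the project exposes.
for descriptor in client.list_metric_descriptors(name="projects/my-project-id"):
    print(descriptor.type)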


Figure 4 Metrics Explorer



Uptime Checks

When you make a change to an uptime check, there can be a delay of up to 25 minutes before the change takes effect.

Figure 5 Latency Uptime Check


Uptime checks are available under monitoring and the dashboard will show you Uptime, Latency, Alerts, Checks Passed, etc.
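Uptime checks can also be created through the Monitoring API. The following is only a rough sketch, assuming the google-cloud-monitoring client library and a hypothetical project and host:

from google.cloud import monitoring_v3

client = monitoring_v3.UptimeCheckServiceClient()

# A basic HTTP uptime check against a hypothetical host, run every 5 minutes.
config = monitoring_v3.UptimeCheckConfig(
    display_name="example-uptime-check",
    monitored_resource={"type": "uptime_url", "labels": {"host": "example.com"}},
    http_check={"path": "/", "port": 80},
    timeout={"seconds": 10},
    period={"seconds": 300},
)

client.create_uptime_check_config(
    request={"parent": "projects/my-project-id", "uptime_check_config": config}
)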

Test Tip — You can set up alerts via email, SMS, PagerDuty, web apps, and webhooks

Cloud Logging

Logging supports Cloud Storage buckets, BigQuery datasets, and Cloud Pub/Sub topics as destinations for exported log entries. These exports are facilitated with the Logging API.

Logging Agent

The Logging agent, which you would install on a VM, is based on Fluentd. Fluentd is an open-source data collector that lets you unify data collection and consumption for better use and understanding of data. The agent reads syslog and other supported log types and forwards the entries to Cloud Logging.

Install Logging Agent

The Logging agent is installed the same way you would install the Monitoring agent on a VM; it is just a different script that you're running.

#curl -sSO https://dl.google.com/cloudagents/add-logging-agent-repo.sh

#sudo bash add-logging-agent-repo.sh

#sudo apt-get update

#sudo apt-get install google-fluentd

Log Sinks

To export logs, you need to create a log sink, which consists of a logs query (filter) and an export destination. To create a sink, use the gcloud command

#gcloud logging sinks create SINK_NAME DESTINATION --log-filter='FILTER'

Sinks can be set up at the Google Cloud project level, or at the organization or folder levels using aggregated sinks.
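A sink can also be created from code. This is a rough sketch using the google-cloud-logging client library, with a hypothetical sink name, filter, and BigQuery dataset as the destination:

import google.cloud.logging

client = google.cloud.logging.Client()

# Route Compute Engine log entries to a hypothetical BigQuery dataset.
sink = client.sink(
    "my-example-sink",
    filter_='resource.type="gce_instance"',
    destination="bigquery.googleapis.com/projects/my-project-id/datasets/my_dataset",
)
sink.create()

After the sink is created, remember that the destination must grant write access to the sink's writer identity before entries will flow.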

Understanding Sink Permissions

To create or modify a sink, you must have the IAM roles Owner or Logging/Logs Configuration Writer in the sink’s parent resource.

To view existing sinks, you must have the IAM roles Viewer or Logging/Logs Viewer in the sink’s parent resource

Logs

Your GCP project has several logs that are relevant to a GKE cluster. These include the Admin Activity log, the Data Access log, and the Events log.

Table 1 shows the log types and how long Google Cloud retains each; this log retention trivia is worth knowing for the exam.

Table 1 Log Retention

Logging retention varies based on the type of log. The default range is between 30 days and 400 days; for example, Admin Activity audit logs are kept for 400 days by default, while Data Access audit logs are kept for 30 days.

Logs Router checks each log entry against existing rules to determine which log entries to discard, which log entries to ingest into Cloud Logging, and which log entries to export using log sinks.

To list current logs.

#gcloud logging logs list

Table 2 shows the log types and how long Google maintains the logs for you.

Table 2 Log Retention

For more information on Log Retention please refer to the Google Cloud Documentation site here.

https://cloud.google.com/logging/quotas

Figure 6 GKE Resource Types


Unlike other services in Google Cloud, GKE presents an additional choice for monitoring. Google Kubernetes Engine (GKE) has native integration with Cloud Monitoring and Cloud Logging.

When you create a GKE cluster, Kubernetes Engine Monitoring is enabled by default and provides a monitoring dashboard specifically tailored for Kubernetes.

With Kubernetes Engine Monitoring, you can control whether Cloud Logging collects application logs, and the integration with Cloud Monitoring and Cloud Logging can be disabled entirely if needed.

Cloud Operations is native to Google Cloud and therefore the recommended approach by Google Cloud.

Note — Other Options for Monitoring
Many monitoring solutions use the Kubernetes DaemonSet structure to deploy an agent on every cluster node.

Note also that each tool has its own software for cluster monitoring.

Heapster is another option that could be used; it is a bridge between a cluster and a storage backend, designed to collect cluster metrics (though it has since been deprecated in favour of metrics-server).


Error Reporting

Cloud Operations Error Reporting analyses and aggregates the errors in your cloud applications. It will notify you when new errors are detected. Error Reporting brings you the processed data directly to help you understand and fix the root causes faster.

You can also receive alerts via email or through the Cloud Console mobile app.
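Errors can also be reported explicitly from application code. A minimal sketch, assuming the google-cloud-error-reporting client library is installed:

from google.cloud import error_reporting

client = error_reporting.Client()

try:
    raise ValueError("something went wrong")
except ValueError:
    # Sends the current exception, including its stack trace, to Error Reporting.
    client.report_exception()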

Figure 7 Viewing Errors


Errors that are grouped together are usually very similar, so Error Reporting keeps only 1,000 samples per error group. To keep all occurrences of an error, export your logs to BigQuery.


Test Tip — You must know what error codes 400 (bad request) and 403 (forbidden/permission denied) mean in several contexts.

Cloud Debugger

Debugger lets you inspect the state of an application at any code location without stopping or slowing it down. It provides real-time capture by inspecting the application without having to stop it, supports the Java, Python, and Go runtimes, and works through snapshots and logpoints.

Debugging captures the call stack and local variables of a running application. You can also inject logging into a service without stopping it.
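For a Python service, enabling the Debugger agent is typically just an import and a call at startup. A sketch, assuming the google-python-cloud-debugger agent package is installed:

# Enable the Cloud Debugger agent as early as possible in application startup.
try:
    import googleclouddebugger
    googleclouddebugger.enable()
except ImportError:
    # The agent is optional; the application still runs without it.
    pass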

Figure 8 Cloud Debugger



Process to debug an application

  1. Clone the project repository
  2. cd into the sample code directory
  3. gcloud config set project MyDev123
  4. gcloud app deploy --version=v1
  5. gcloud app browse
  6. Validate the output
  7. Set up a snapshot
  8. Rerun the application

I will cover more objective-focused details on Debugger during the exam objective coverage.

Cloud Operations Trace

Trace lets you gather and analyse trace flows to understand performance issues such as latency and to discover bottlenecks.
Stackdriver Trace can also analyse apps and generate reports.

There is also a Trace SDK, which currently supports Java, Node.js, Ruby, and Go.

Figure 9 shows some of the options for selecting source code to trace

Figure 9 Trace Options


Cloud Operations Profiler

Stackdriver Profiler allows developers to analyse applications running in GCP, on other cloud platforms, or on-premises.

Stackdriver Profiler continuously analyses the performance of CPU or memory-intensive functions executed across an application.

Stackdriver Profiler presents the call hierarchy and resource consumption of the relevant function in an interactive flame graph.

The main use case for continuous CPU and heap profiling is to improve performance and reduce costs around your application.
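For a Python service, the Profiler agent is started once at application startup. A sketch, assuming the google-cloud-profiler agent package is installed and using a hypothetical service name:

import googlecloudprofiler

# Start the Profiler agent; the service name and version label the profiles in the UI.
googlecloudprofiler.start(
    service="my-example-service",
    service_version="1.0.0",
)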

In Summary

Cloud Operations is a very powerful and complex suite of solutions from Google.

The learning curve for Cloud Operations is more than just a few hours so I encourage you to further your learning in this area.

Table 3 Operations Suite (Stackdriver) Use Cases

Operations Suite: Allows you to monitor, troubleshoot, and improve application performance on your Google Cloud environment.

Monitoring: Use as a native monitoring solution for Google Cloud environment insight, or extend it to a hybrid solution with AWS.

Logging: Maintain audit compliance and a historical record of activity.

Error Reporting: Analyses and aggregates the errors in your cloud applications.

Trace: Gather and analyse trace flows in near real-time to identify bottlenecks.

Debug: Inject logging into a running service without stopping it.

Profiler: Use continuous CPU and heap profiling to improve performance and reduce costs.

The Cloud Architect exam will test you heavily in this area so prepare accordingly. I encourage you to take the free Developer focused labs on Codelabs to get a firm idea of how to use Stackdriver.

If you're not familiar with Cloud Operations, then you're likely not going to pass this exam.

https://codelabs.developers.google.com/

Figure 10 shows the Google Codelabs site.

Figure 10 Codelabs


Joe Holbrook

TechCommanders.com

About TechCommanders –

TechCommanders is an online training platform for both aspiring and veteran IT professionals interested in next generation IT Skills.
TechCommanders is led by Joseph Holbrook, a highly sought-after technology industry veteran.

TechCommanders offers blended learning, which allows students to learn on demand while also getting live training.

Courses offered are used to prepare students to take certification exams in Cloud, DevOps, IT Security and Blockchain.

TechCommanders was established in Jacksonville, Florida in 2020 by Joseph Holbrook, both a US Navy veteran and a technology industry veteran. TechCommanders, Advancing your NextGen Technology Skills.

 

