Spark developers often face a lot of challenges while debugging their applications. The challenge is compounded when developing and testing against a transient cluster, where the developer is left at the mercy of logs stored in an object store like AWS S3 or on YARN nodes.
This video demonstrates how Gigahex can be used to identify common causes of slow performance or failure in Spark applications. Common causes such as skewed joins, uneven distribution of tasks across executors, lack of storage memory, and too much time spent in garbage collection can easily be discovered with the metrics dashboard in Gigahex.
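To make two of these symptoms concrete, here is a minimal sketch of how skew and excessive garbage collection might be flagged from raw task metrics. It is not how Gigahex computes anything. The function names, the input shape, and both thresholds are assumptions chosen for illustration, and the numbers would normally come from Spark's own task metrics (task duration, JVM GC time, executor run time).

```python
from statistics import median


def skew_ratio(task_durations_ms):
    """Ratio of the slowest task to the median task within one stage."""
    if not task_durations_ms:
        raise ValueError("need at least one task duration")
    mid = median(task_durations_ms)
    return max(task_durations_ms) / mid if mid > 0 else float("inf")


def gc_fraction(gc_time_ms, run_time_ms):
    """Share of executor run time spent in garbage collection."""
    return gc_time_ms / run_time_ms if run_time_ms > 0 else 0.0


def diagnose(task_durations_ms, gc_time_ms, run_time_ms,
             skew_threshold=5.0, gc_threshold=0.1):
    """Return a list of suspected problems for one stage.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    findings = []
    if skew_ratio(task_durations_ms) > skew_threshold:
        findings.append("possible skew")
    if gc_fraction(gc_time_ms, run_time_ms) > gc_threshold:
        findings.append("high GC time")
    return findings
```

A stage where one task runs many times longer than the median, for example diagnose([100, 110, 120, 2000], 50, 1000), would be reported as possible skew, which is the typical signature of a skewed join key.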
About Gigahex
Gigahex is an observability platform for Spark batch applications. It allows developers to monitor deployed Spark applications running on transient clusters in the cloud, such as AWS EMR or Google Dataproc, or even on a cluster running on your laptop.
Gigahex is currently in alpha, and the team is looking for developers who would like to be part of the early access program. Request access today:
gigahex.com/
#apachespark #monitoring #developer