In Spark, the query execution plan is the entry point to understanding how a query is executed. Reading it is especially important when debugging or investigating heavy workloads, or when a job takes a long time to run. Understanding the query plan is the first step toward optimizing Spark code.
If we look at the query execution page, we see terms like Task, Stage, and Job. Let’s try to understand these terms before we move further.
A Task is a single operation applied to a single partition. Each task is executed…
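To see the plan behind a query, Spark exposes it through `explain()`. A minimal PySpark sketch (this assumes a running `SparkSession`; the DataFrame and column names here are purely illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("plan-demo").getOrCreate()

# A small illustrative DataFrame and transformation chain.
df = spark.range(1_000_000).withColumnRenamed("id", "n")
result = df.filter("n % 2 = 0").groupBy((df.n % 10).alias("bucket")).count()

# Prints the parsed, analyzed, and optimized logical plans plus the physical plan.
result.explain(extended=True)
```

The shuffle introduced by `groupBy` is what splits this query into separate stages, and each stage runs one task per partition.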
Dependency hell is a situation in which a software application cannot access the additional programs it requires in order to work. In software development, the additional programs a piece of software requires are called dependencies. Sometimes known as JAR hell or classpath hell, dependency hell commonly results in software behaving abnormally, bugs, error messages when trying to run or install software, or the software ceasing to function altogether.
Say your application uses two libraries, lib-a and lib-b. Both of these libraries depend on a shared library. Initially, everything works smoothly.
The Daily Scrum is an essential event for inspection and adaptation, run daily to ensure that the team is on track to achieve the Sprint Goal. It creates transparency, which in turn enables inspection.
Typically, a good Scrum Team won’t need more than 10 to 15 minutes to inspect its progress toward the Sprint Goal. Given this short duration, it is interesting to observe how many personas unknowingly obstruct the smooth running of this event. Let us discuss a few of these personas and see how we can handle them.
The retrospective is a ceremony held at the end of each Sprint where team members collectively analyze how things went in order to improve the process for the next Sprint.
The purpose of the Sprint Retrospective is to provide a formal opportunity to inspect and adapt the workings of your Scrum. …
Suppose an array sorted in ascending order is rotated at some pivot unknown to you beforehand.
[0,1,2,4,5,6,7] might become
Find the minimum element.
You may assume no duplicate exists in the array.
from typing import List

def findMin(nums: List[int]) -> int:
    left, right = 0, len(nums) - 1
    # Once the window is sorted, nums[left] is the minimum.
    while nums[left] > nums[right]:
        middle = (left + right) // 2
        if nums[middle] < nums[right]:
            right = middle  # minimum is at middle or to its left
        else:
            left = middle + 1  # minimum is to the right of middle
    return nums[left]
We use a modified version of binary search to find the “Inflection Point” — the position where the rotation wraps around.
A conveyor belt has packages that must be shipped from one port to another within D days. The i-th package on the conveyor belt has a weight of weights[i]. Each day, we load the ship with packages from the conveyor belt (in the order given by weights). We may not load more weight than the maximum weight capacity of the ship. Return the least weight capacity of the ship that will result in all the packages on the conveyor belt being shipped within D days.
Input: weights = [1,2,3,4,5,6,7,8,9,10], D = 5
A ship capacity of…
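This problem can be solved by binary-searching over the candidate capacities rather than over the array itself. A minimal sketch (function name is my own):

```python
def ship_within_days(weights, days):
    # Search capacities between max(weights) (the heaviest single
    # package must fit) and sum(weights) (ship everything in one day).
    low, high = max(weights), sum(weights)
    while low < high:
        capacity = (low + high) // 2
        # Count the days needed at this capacity, greedily filling each day.
        needed, load = 1, 0
        for w in weights:
            if load + w > capacity:
                needed += 1
                load = 0
            load += w
        if needed <= days:
            high = capacity       # feasible: try a smaller capacity
        else:
            low = capacity + 1    # infeasible: need a bigger ship
    return low
```

For the input above, `ship_within_days([1,2,3,4,5,6,7,8,9,10], 5)` returns 15: the days split as (1..5), (6,7), (8), (9), (10).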
Design a stack that supports push, pop, top, and retrieving the minimum element in constant time.
MinStack minStack = new MinStack();
minStack.push(-2);
minStack.push(0);
minStack.push(-3);
minStack.getMin(); // return -3
minStack.pop();
minStack.top();    // return 0
minStack.getMin(); // return -2
getMin operations will always be called on non-empty stacks.
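One common way to get constant-time getMin is to store, alongside each value, the minimum of the stack up to that point. A minimal sketch in Python (the example above uses Java-style calls; Python is used here only for illustration):

```python
class MinStack:
    def __init__(self):
        # Each entry is a (value, min_so_far) pair.
        self.stack = []

    def push(self, x):
        current_min = min(x, self.stack[-1][1]) if self.stack else x
        self.stack.append((x, current_min))

    def pop(self):
        self.stack.pop()

    def top(self):
        return self.stack[-1][0]

    def getMin(self):
        # The running minimum was recorded at push time.
        return self.stack[-1][1]
```

Every operation is O(1) because the minimum is precomputed on push rather than searched for on demand.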
You are given two integer arrays nums1 and nums2 sorted in ascending order and an integer k.
Define a pair (u, v) that consists of one element from the first array and one element from the second array.
Find the k pairs (u1,v1),(u2,v2) …(uk,vk) with the smallest sums.
Input: nums1 = [1,7,11], nums2 = [2,4,6], k = 3
Explanation: The first 3 pairs are returned from the sequence:
Input: nums1 = [1,1,2], nums2 = [1,2,3], k = 2
Explanation: The first 2 pairs are returned from the sequence:
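A standard approach is a min-heap over candidate pairs: seed it with each `nums1[i]` paired with `nums2[0]`, then advance the `nums2` index lazily as pairs are popped. A sketch (function name is my own):

```python
import heapq

def k_smallest_pairs(nums1, nums2, k):
    if not nums1 or not nums2:
        return []
    # Heap entries are (pair_sum, i, j) with i indexing nums1, j indexing nums2.
    heap = [(nums1[i] + nums2[0], i, 0) for i in range(min(k, len(nums1)))]
    heapq.heapify(heap)
    result = []
    while heap and len(result) < k:
        _, i, j = heapq.heappop(heap)
        result.append((nums1[i], nums2[j]))
        # The next candidate for row i is the next element of nums2.
        if j + 1 < len(nums2):
            heapq.heappush(heap, (nums1[i] + nums2[j + 1], i, j + 1))
    return result
```

For the first input above this yields (1,2), (1,4), (1,6), and for the second it yields (1,1), (1,1). The heap never holds more than min(k, len(nums1)) entries, giving O(k log k) time.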
So, yesterday I tried to set up Airflow for a pet project.
My project basically needed Airflow to run a job every 5 minutes to pull data from various sources, perhaps transform the data, and write it to an Elasticsearch index.
I wanted to dockerize it so that I could deploy the entire setup easily on any machine. That way I can share my project with anyone, and they can set it up on their machine and get started.
That was the goal.
Before I start, let me brief you on some key concepts and terminology in Airflow.
A DAG is…
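For concreteness, a DAG definition file for the every-5-minutes job described above might look roughly like this (a sketch only: it needs an Airflow 2.x installation to run, and the dag_id, task names, and callables are all placeholders of my own):

```python
# A hypothetical DAG file, e.g. dags/etl_pipeline.py
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def pull_data():
    ...  # fetch data from the various sources

def transform_and_index():
    ...  # transform the data and write to an Elasticsearch index

with DAG(
    dag_id="etl_every_5_minutes",          # placeholder name
    start_date=datetime(2021, 1, 1),
    schedule_interval=timedelta(minutes=5),  # run every 5 minutes
    catchup=False,                           # don't backfill past runs
) as dag:
    pull = PythonOperator(task_id="pull_data", python_callable=pull_data)
    index = PythonOperator(task_id="transform_and_index",
                           python_callable=transform_and_index)
    pull >> index  # pull must finish before transform/index runs
```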
Given an array nums containing n + 1 integers where each integer is between 1 and n (inclusive), the pigeonhole principle guarantees that at least one duplicate number must exist. Assuming there is only one duplicate number, find that duplicate.
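One elegant O(1)-space approach treats the array as a linked list where index i points to nums[i]; the duplicate value is then the entry point of a cycle, found with Floyd's tortoise-and-hare algorithm. A sketch (function name is my own):

```python
def find_duplicate(nums):
    # Phase 1: advance slow by one step and fast by two until they meet
    # somewhere inside the cycle.
    slow = fast = nums[0]
    while True:
        slow = nums[slow]
        fast = nums[nums[fast]]
        if slow == fast:
            break
    # Phase 2: restart slow from the beginning; moving both pointers one
    # step at a time, they meet exactly at the cycle's entry point,
    # which is the duplicated value.
    slow = nums[0]
    while slow != fast:
        slow = nums[slow]
        fast = nums[fast]
    return slow
```

For example, `find_duplicate([1, 3, 4, 2, 2])` returns 2. This runs in O(n) time without modifying the array or using extra memory.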