@rajeev9293

Excellent stuff and a lot of detail covered in a short time. I always need to watch your videos multiple times to grasp all the intricacies, since your content covers so much depth.👏

@HimanshuPatel-wn6en

Your videos are gems; many so-called paid courses don't have this level of quality.

@knightbird00

Bookmarks 

Talking points 
4:02 Job types - Cron, DAG (build>test>deploy), Can include SLA Guarantees, binary size, job sandbox, security timeouts.
5:46 DAG scheduling, data schemas and sharding (need ACID, single leader, replication for reads) 

Execution
12:04 Scheduler table, may use status with locks (scheduled, running, complete) 
14:20 Scheduling performance, can use a priority queue with consumer groups
21:00 Job completion, run at least once or only once, idempotent jobs
28:00 Diagram
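The 12:04 status-lock idea can be sketched as a conditional update, where only one worker wins the SCHEDULED -> RUNNING transition (table and column names here are assumptions, not from the video):

```python
import sqlite3

# Minimal sketch of a scheduler table with status locks; sqlite3 stands in
# for the real ACID database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        job_id INTEGER PRIMARY KEY,
        run_ts INTEGER,
        status TEXT  -- SCHEDULED -> RUNNING -> COMPLETE
    )
""")
conn.execute("INSERT INTO jobs VALUES (1, 100, 'SCHEDULED')")
conn.commit()

def try_claim(conn, job_id):
    """Atomically flip SCHEDULED -> RUNNING; only one worker can win."""
    cur = conn.execute(
        "UPDATE jobs SET status = 'RUNNING' "
        "WHERE job_id = ? AND status = 'SCHEDULED'",
        (job_id,),
    )
    conn.commit()
    return cur.rowcount == 1  # True only for the winning worker

first = try_claim(conn, 1)   # wins the claim
second = try_claim(conn, 1)  # loses: status is no longer SCHEDULED
print(first, second)
```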

@venkatadriganesan475

One of the best system design videos I have ever seen; it touches all the concepts in 30 minutes.

@viralvideoguy1988

I'm a chronic procrastubater myself. Thanks for taking the time to create this Jordan.

@hazardousharmonies

Another Jordan classic - great learning material as always! Thank you Sir!

@technical3446

One alternative would be to have a job execution table, where the scheduler creates an entry for each job. Based on its information about the various executors, the scheduler assigns the job to a specific executor. A scheduler agent library running on each executor then scans the job execution table and executes the jobs assigned to its executor ID. This is very similar to how Kubernetes does pod scheduling. One significant benefit of this approach is that the scheduler can be very sophisticated, since it has a full view of worker state and the job backlog.
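A minimal sketch of that agent loop, assuming a shared job-execution table with an executor_id column (all names here are illustrative):

```python
# An in-memory list stands in for the shared job-execution table that the
# scheduler writes to and each executor's agent library scans.
job_execution_table = [
    {"job_id": 1, "executor_id": "exec-A", "status": "ASSIGNED"},
    {"job_id": 2, "executor_id": "exec-B", "status": "ASSIGNED"},
    {"job_id": 3, "executor_id": "exec-A", "status": "ASSIGNED"},
]

def scan_and_run(table, my_executor_id):
    """One polling pass: pick up rows assigned to this executor."""
    picked = []
    for row in table:
        if row["executor_id"] == my_executor_id and row["status"] == "ASSIGNED":
            row["status"] = "RUNNING"  # claim, then execute the job binary
            picked.append(row["job_id"])
    return picked

picked_jobs = scan_and_run(job_execution_table, "exec-A")
print(picked_jobs)  # [1, 3]
```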

@sumeet2707

Thank you Jordan! This is the best video I have seen on this topic ❤

@ajayreddy9176

Basically a Jenkins master/slave setup deployed on Kubernetes for scalability.

@kevinding0218

Thank you, Jordan!

I still have some clarifications to get a better understanding:

1. What does "step" mean in the context of updating the run_timestamp each time we process the job? For example, if we update the job's run_timestamp from 2:01 to 2:06, is this just a one-time update, or do we continue to update it at subsequent steps, say from 2:06 to 2:11?

2. I'm struggling to understand the need for the run_timestamp, as in "increase the run_ts to reflect how much time we should wait before rescheduling the job".

Especially when we already have a status column. Typically, we can determine which jobs to queue by checking the status field, for example moving jobs from "READY" to "PROCESSING". For scenarios involving failure and retry, if a job fails and the executor is still operational, we could simply update the status to "FAILED". If the executor fails, couldn't another executor pick up the job via a message queue and handle the status updates accordingly?

3. Concerning priority scheduling, is there a risk of resource wastage? It appears that long-running jobs might eventually occupy all the executor resources connected to the low-, mid-, and high-level message queues, since every job starts from the lowest level.

@shahnawazalam9939

QQ Jordan: why have we put two Kafkas to stream job events from the "DAG Table" and "Cron Table" to the "Scheduling Table"? Why didn't we put one Kafka after the Job Enqueue Service to stream events (cron or DAG) into the Scheduling Table? I believe this would simplify the design and also make the Job Enqueue Service asynchronous. One possible advantage I can think of in your design is that by committing to the DB first, we assure clients that we have successfully accepted their job requests, whereas Kafka messages can drop even though Kafka has a high retention policy.

@owenmorris1725

Just wanna say I really like the addition of the initial high-level design! Definitely wouldn't say it was incomprehensible before (I think your other videos are great too, thanks for all the content!), but this style feels a little more like an interview and helps to better understand where your deeper explanations fit in the system.

@ravipradeep007

Excellent video Jordan!
1. I have a few doubts on how the system would scale when:
 R1. For a high-priority job scheduled at 2 pm, I want it to get executed within 200 ms of the scheduled time.
Constraint: the S3 binary for the job itself might be 100 MB, and downloading it would take 5 seconds.

Here is my high-level approach, with these components:
1. Resource manager
2. Execution planner
3. Executor
The execution planner starts at 1:30 pm and sees what tasks are planned for 2:00 pm. It categorizes them into high-, medium-, and low-resource buckets (and estimates how much each needs), then talks to the resource manager to pre-identify appropriate workers and pre-warm the nodes:
1. Pre-download the S3 binary
2. Create the task-execution-to-worker-node mapping
Any changes (e.g. cancellations) are communicated to the worker nodes.
Now at 2:00 pm this can again result in a thundering-herd problem where the database gets inundated with queries. To avoid that, we can push the jobs to the workers beforehand with a local cron job, so each runs exactly at 2:00 pm, since the binary is already downloaded.
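The categorization step above could look something like this (thresholds and field names are made up for illustration):

```python
# Toy execution-planner pass: ahead of the 2:00 pm deadline, bucket upcoming
# jobs by resource need so the resource manager can pre-warm workers and
# pre-download the larger binaries first.
upcoming_jobs = [
    {"job_id": 1, "binary_mb": 100, "run_at": "14:00"},
    {"job_id": 2, "binary_mb": 5, "run_at": "14:00"},
    {"job_id": 3, "binary_mb": 40, "run_at": "14:00"},
]

def categorize(jobs):
    """Split jobs into high/medium/low resource buckets for pre-warming."""
    buckets = {"high": [], "medium": [], "low": []}
    for job in jobs:
        if job["binary_mb"] >= 80:
            buckets["high"].append(job["job_id"])
        elif job["binary_mb"] >= 20:
            buckets["medium"].append(job["job_id"])
        else:
            buckets["low"].append(job["job_id"])
    return buckets

buckets = categorize(upcoming_jobs)
print(buckets)  # {'high': [1], 'medium': [3], 'low': [2]}
```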

@divyanandlalsahetya9324

What's the schema of the DAG table?
And you added a job status service that connects to the DAG table; is that only to update the DAG's epoch?
Where does the binary get uploaded to S3 here, and where is it read?

@shawnngohungson9662

Question at time 22:42: if we have the table with the partition key (run_time + random number), we cannot update the run_time for the next retry (at 12:45), am I right? NoSQL DBs don't allow updating the partition key.

@tysonliu2833

For Borg, I think you can also specify the required resources, priority, and preferred DC(s). These can all be accomplished by having one queue per priority/DC. If my job prefers to run in DC A or B, and A picks it up first, A updates the job DB (the strongly consistent one) so that when DC B picks up the same job, it adds it back to the queue instead of actually running it (or throws it away). Now if DC A is down, after some time DC B picks up the job again, realizes it has been processing for unreasonably long, and maybe then it can run it.
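A toy sketch of that claim-or-skip idea, with a dict standing in for the strongly consistent job DB (all names are assumptions):

```python
# job_id -> claiming dc; a plain dict stands in for the strongly consistent
# job DB that both datacenters consult before running anything.
job_db = {}

def pick_up(job_id, dc):
    """Conditionally claim a job; a later dc sees who already claimed it."""
    if job_id in job_db:
        # The second dc would re-enqueue or drop the job here.
        return "already claimed by " + job_db[job_id]
    job_db[job_id] = dc  # in a real store this would be a conditional write
    return "claimed"

first_pick = pick_up("job-7", "dc-A")   # dc-A wins the claim
second_pick = pick_up("job-7", "dc-B")  # dc-B sees it is already taken
print(first_pick, second_pick)
```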

@marksun6420

Great video! In the part about scheduling DAG jobs, why do we have to declare jobs 1 and 2 as dependencies of jobs 4 and 5? The execution of jobs 1 and 2 is based on the current timestamp, and when it is their turn they can run without treating 4 and 5 as dependencies. On the other hand, knowing that jobs 4 and 5 are not dependencies of any other job, we can tell that they are the last jobs to run and can mark the DAG as succeeded. So, with jobs 1 and 2 declared as dependent on 4 and 5, how can we tell a DAG is finished?

@karangoyanka147

One approach is to use a distributed lock in Redis to prevent multiple executors from rescheduling the same job. An executor can acquire a lock on a job ID by creating an entry in Redis with a TTL. Do you think this is a good idea?
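A toy sketch of those TTL-lock semantics (Redis's SET with NX and EX), with an in-memory stand-in for Redis so the expiry logic is visible; all names are assumptions:

```python
import time

class FakeRedis:
    """In-memory stand-in for Redis `SET key value NX EX ttl`."""

    def __init__(self):
        self._store = {}  # key -> (value, expiry_time)

    def set_nx_ex(self, key, value, ttl_seconds, now=None):
        now = time.monotonic() if now is None else now
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return False  # lock held and not yet expired
        self._store[key] = (value, now + ttl_seconds)
        return True

r = FakeRedis()
# Executor A acquires the lock on job 42 with a 30-second TTL.
acquired = r.set_nx_ex("lock:job:42", "executor-A", 30, now=0.0)
# Executor B tries 10 seconds later: the lock is still held.
blocked = r.set_nx_ex("lock:job:42", "executor-B", 30, now=10.0)
# 40 seconds in, the TTL has expired, so B can take over the job.
reacquired = r.set_nx_ex("lock:job:42", "executor-B", 30, now=40.0)
print(acquired, blocked, reacquired)
```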

@Anonymous-ym6st

Every time I rewatch the video I feel like I learn something new. One new question: at 16:40, regarding the load-balancing problem (the LB needs to know the status of all the executors, and it can be a single point of failure): isn't this a problem for all LBs, not only for the job scheduler? For the point of failure we can use multiple LBs (active-active / active-passive, etc.). Asking because, if it is a common problem for LBs, why are LBs used so frequently in most other designs?

@yanxinchen8439

Jordan, excellent video. For the case of using Kafka as the message queue, one consumer in a consumer group processes the jobs in a partition sequentially. What if a job takes a long time to run? Will it block all the following jobs in that partition? Should we wait, kill the job after some time, or move it to a retry queue?