@tryexponent

Join the waitlist for Exponent’s Data Engineering Interview Course: https://bit.ly/4cmpq34

@harisridhar1668

I strongly appreciated the trade-offs and the architecture insights discussed:

1. The hybrid approach that chooses between Spark Streaming and Apache Flink as the distributed computing platform, based on latency criteria for clickstream metrics. The justification is that Spark Streaming works well when metrics are generated at intervals of 1 second or more, whereas Apache Flink meets millisecond-level / sub-second latency requirements.
2. The choice of the push model (agents and daemons) versus the pull model (the infra) for large-scale data pipelines: the former is better for real-time needs (even if it may overwhelm the pipeline), whereas the latter is polling-based and may fail to deliver real-time (or close-enough-to-real-time) customer behavior insights.
3. The justification for using a NoSQL DB versus a SQL DB for compute storage: NoSQL is schema-less, highly performant, and has low-latency reads and writes, and is thus able to handle large event volumes during ingestion (e.g. Kafka's 50,000 events / second). I also liked how he identified the RDBMS storage as a potential pipeline bottleneck.
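
The latency-based choice in point 1 can be sketched as a simple routing rule. This is only an illustration: the function name, the engine labels, and the 1-second cutoff constant are assumptions drawn from the framing above, not from any Spark or Flink API.

```python
from datetime import timedelta

# Assumed cutoff, taken from the ">= 1 second" framing in point 1.
MICRO_BATCH_CUTOFF = timedelta(seconds=1)


def choose_engine(required_latency: timedelta) -> str:
    """Pick a stream processor label for a metric's latency requirement.

    Metrics that tolerate 1 second or more go to Spark Streaming;
    anything tighter (sub-second) goes to Flink.
    """
    if required_latency <= timedelta(0):
        raise ValueError("required_latency must be positive")
    if required_latency >= MICRO_BATCH_CUTOFF:
        return "spark-streaming"
    return "flink"


# Hypothetical clickstream metrics and their latency targets.
print(choose_engine(timedelta(seconds=5)))
print(choose_engine(timedelta(milliseconds=50)))
```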

@prashantsalgaocar

I thought this was too high level. There were no non-functional requirements discussed. Also, a lot of the complexity was abstracted away with Lambda usage. There should have been more discussion of the core functional and non-functional requirements, plus some deeper dives, which this system design lacked.

@angelotheman

We need more of these. However, try to make it suitable for beginners or, better still, state the experience level in the title so we know whom this is directed to.

Thanks

@briandevvn

As for changes/additions, I think we could add monitoring services there to track system health and send notifications about any important schema changes.

@barikung

Thank you for this video. I enjoyed watching it and love how you relate the architecture design to AWS services, along with the business-level assumptions that lead to the design of the architecture!

@sc.smitshah

80/20 rule: 80% of traffic comes in 20% of the time. Is this a popular way to identify the metrics or SLA?
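
A minimal sketch of the arithmetic behind this rule of thumb, assuming a hypothetical daily event total (the function names and the example volume are illustrative, not from the video). If 80% of events arrive in 20% of the time window, the busy-period rate works out to 4x the average rate, which is the figure a pipeline would typically be sized for.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(total_events: float, window_seconds: float) -> float:
    """Events per second if traffic were spread evenly over the window."""
    return total_events / window_seconds


def peak_rate(total_events: float, window_seconds: float,
              traffic_share: float = 0.8, time_share: float = 0.2) -> float:
    """Events per second during the busy period.

    Assumes `traffic_share` of all events arrive within `time_share`
    of the window (the 80/20 assumption by default).
    """
    if not (0 < traffic_share <= 1 and 0 < time_share <= 1):
        raise ValueError("shares must be in the range (0, 1]")
    return (total_events * traffic_share) / (window_seconds * time_share)


daily_events = 1_000_000_000  # hypothetical daily volume
avg = average_rate(daily_events, SECONDS_PER_DAY)
peak = peak_rate(daily_events, SECONDS_PER_DAY)
print(f"average: {avg:,.0f} events/s, peak: {peak:,.0f} events/s, "
      f"ratio: {peak / avg:.1f}x")
```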

@Ikyua

I love this channel so much, keep up the content :)!

@PraveenKumarBN

Why is Lambda required right after API Gateway?

@akshayshankar3707

When are you launching the data engineering course?

@probhakarsarkar2430

Is the Kafka-to-Flink step required, or can Lambda connect directly to Flink?

@god_play

Which flowchart application is this? It looks like it lets you pull in AWS icons, data lake sections, etc. by keyword?

@wtfitcntbdiff

Hidden gem

@arjunekrishna7044

He looks like the director Lokesh Kanagaraj lol