Session Outline
In this session, we will discuss why data lakes are needed and what it takes to build and scale a data lake holding hundreds of petabytes of data at Uber. We will examine the challenges of managing data at this scale and discuss solutions to them. Finally, we will cover techniques for optimizing your data lakes to reduce costs.
Key Takeaways
- What the requirements for a data lake are
- How to decide which data lake technologies to use
- What pitfalls are associated with massive-scale data lakes
- Operational challenges in managing large-scale data lakes
————————————————————————————————————————————————————
Speaker Bio
Nishith Agarwal – Engineering Manager | Uber
Nishith leads the Data Infra team at Uber, where he manages the storage and compute platforms. He is a PMC member of Apache Hudi and has more than 10 years of experience in distributed systems and databases.
October 14 @ 16:00
16:00 — 16:20 (20′)
Day 1 | M4 | Data Management Stage