Session Outline

In this session, we will discuss the need for data lakes and what it takes to build and scale a data lake holding hundreds of petabytes of data at Uber. We will examine the challenges of managing data at such a massive scale and discuss solutions to them. Finally, we will cover some techniques for optimizing your data lake to reduce costs.

Key Takeaways

  • What the requirements for a data lake are
  • How to decide which data lake technologies to use
  • What pitfalls come with massive-scale data lakes
  • Operational challenges of managing large-scale data lakes

————————————————————————————————————————————————————

Speaker Bio

Nishith Agarwal – Engineering Manager | Uber

Nishith leads the Data Infra team at Uber, where he manages the storage and compute platforms. He is a PMC member of Apache Hudi and has more than 10 years of experience in distributed systems and databases.

October 14
16:00–16:20 (20 min)

Day 1 | M4 | Data Management Stage