How to build, manage and optimize multi-petabyte-scale data lakes

Session Outline

In this session, we will discuss the need for data lakes and what it takes to build and scale a data lake holding hundreds of petabytes of data at Uber. We will examine the challenges of managing data at such a massive scale and discuss solutions to them. Finally, we will cover some techniques for optimizing your data lakes to reduce costs.

Key Takeaways

What the requirements for a data lake are
How to decide which data lake technologies to use
What pitfalls come with massive-scale data lakes
What operational challenges arise when managing large-scale data lakes

————————————————————————————————————————————————————

Speaker Bio

Nishith Agarwal – Engineering Manager | Uber

Nishith leads the Data Infra team at Uber, where he manages storage and compute platforms. He is a PMC member of Apache Hudi and has 10+ years of experience in distributed systems and databases.

October 14 @ 16:00

16:00 — 16:20 (20′)

Day 1 | M4 | Data Management Stage
