How to break into Data Engineering in 2023?

Vino Duraisamy
Data Engineer Things
2 min read · Dec 7, 2022

A step-by-step guide to help you land a junior DE role in 30 days!

Zach’s post is comprehensive, but for an absolute beginner to data engineering it might be slightly intimidating.

For absolute beginners:

Here is what I would start with:

  1. SQL: at least intermediate-level SQL (self joins, GROUP BY, CTEs, views).
  2. SQL query optimization techniques: indexing, etc.
  3. Python basics: pandas, NumPy, pytest, lambdas.
  4. Python data project: pick any dataset you want and do data exploration, cleaning, and transformation.
  5. Distributed computing: getting familiar with its concepts and paradigms is important.
  6. Spark basics: Spark DataFrames are similar to pandas DataFrames and Spark SQL is similar to SQL; the main difference is that Spark DataFrames are distributed.
  7. Use Databricks Community Edition to work on a big data project.
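To make the first two items concrete, here is a minimal practice sketch that uses Python's built-in sqlite3 module, so nothing needs to be installed. The `employees` table, its rows, and the index name are made up for illustration, and SQLite is only a stand-in for whichever database you end up practicing on.

```python
import sqlite3

# In-memory database with a small, hypothetical employees table
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        dept TEXT NOT NULL,
        salary INTEGER NOT NULL,
        manager_id INTEGER REFERENCES employees(id)
    );
    INSERT INTO employees VALUES
        (1, 'Ada',  'data', 150, NULL),
        (2, 'Bo',   'data', 120, 1),
        (3, 'Chen', 'data', 100, 1),
        (4, 'Dee',  'web',  110, NULL),
        (5, 'Eli',  'web',   90, 4);
""")

# Self join: pair each employee with their manager
self_join = conn.execute("""
    SELECT e.name, m.name
    FROM employees AS e
    JOIN employees AS m ON e.manager_id = m.id
    ORDER BY e.id
""").fetchall()

# CTE + GROUP BY: employees paid above their department's average
above_avg = conn.execute("""
    WITH dept_avg AS (
        SELECT dept, AVG(salary) AS avg_salary
        FROM employees
        GROUP BY dept
    )
    SELECT e.name
    FROM employees AS e
    JOIN dept_avg AS d ON e.dept = d.dept
    WHERE e.salary > d.avg_salary
    ORDER BY e.id
""").fetchall()

# View: a saved query you can select from like a table
conn.execute("""
    CREATE VIEW dept_headcount AS
    SELECT dept, COUNT(*) AS headcount
    FROM employees
    GROUP BY dept
""")
headcount = conn.execute(
    "SELECT dept, headcount FROM dept_headcount ORDER BY dept"
).fetchall()

# Indexing: create an index, then ask SQLite how it plans to run the query
conn.execute("CREATE INDEX idx_employees_dept ON employees(dept)")
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM employees WHERE dept = 'web'"
).fetchall()
plan_text = " ".join(row[3] for row in plan_rows)  # 4th column holds the plan detail

print(self_join)
print(above_avg)
print(headcount)
print(plan_text)
```

Reading the query plan before and after adding an index is a good habit to build early; other databases expose the same idea through their own EXPLAIN commands.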

Intermediate:

  1. Get familiar with one cloud data warehouse (AWS Redshift, Google BigQuery, Snowflake, Azure Synapse, etc.).
  2. All of these vendors have free trials, so make use of them. You can also find plenty of affordable Udemy courses or YouTube videos that walk you through the setup.
  3. Read up on data modeling concepts: The Data Warehouse Toolkit by Ralph Kimball.
  4. Spark optimization techniques: skew, spill, shuffle, adaptive query execution, etc.
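If "skew" in the last item sounds abstract, the following pure-Python sketch may help. It is not Spark code: `toy_partition` is a deliberately simple, hypothetical stand-in for a hash partitioner, and the key names and row counts are invented. It shows how one hot key overloads a single partition, and how salting (appending a rotating suffix to the hot key) spreads those rows out.

```python
from collections import Counter

NUM_PARTITIONS = 4  # assumed number of partitions
NUM_SALTS = 8       # assumed number of salt values for the hot key


def toy_partition(key: str) -> int:
    """Toy hash partitioner: sum of character codes modulo the partition count."""
    return sum(ord(ch) for ch in key) % NUM_PARTITIONS


# 900 rows share one hot key; 100 rows are spread over ten other keys
keys = ["hot"] * 900 + [f"k{i % 10}" for i in range(100)]

# Without salting, every "hot" row lands in the same partition
unsalted = Counter(toy_partition(k) for k in keys)

# Salting: turn "hot" into "hot#0" ... "hot#7" so its rows hash to different partitions
salted_keys = [
    f"{k}#{i % NUM_SALTS}" if k == "hot" else k
    for i, k in enumerate(keys)
]
salted = Counter(toy_partition(k) for k in salted_keys)

largest_unsalted = max(unsalted.values())
largest_salted = max(salted.values())

print("rows per partition without salting:", dict(unsalted))
print("rows per partition with salting:   ", dict(salted))
```

In a real join, the other side has to be replicated once per salt value so the salted keys still match, which is the cost of this trick. Spark's adaptive query execution can also split skewed partitions for you in some joins, so it is worth understanding both.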

Free resources:

  1. freeCodeCamp: youtube.com/c/freecodecamp
  2. Data engineering zoomcamp: https://github.com/DataTalksClub/data-engineering-zoomcamp
  3. AWS/Azure/GCP/Snowflake free training on YouTube and in their documentation

For interview prep:

  1. Python data structures (basics): array, stack, list, set, dictionary.
  2. LeetCode/HackerRank for DSA and simple SQL practice.
  3. StrataScratch for interview-style SQL practice.
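As a rough idea of the level to aim for with the basic data structures, here is a small sketch with three warm-up exercises. The function names and inputs are my own examples, not taken from any particular interview.

```python
def is_balanced(text: str) -> bool:
    """Stack (a Python list): check that brackets open and close in order."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in text:
        if ch in pairs.values():
            stack.append(ch)  # push an opening bracket
        elif ch in pairs:
            # a closing bracket must match the most recent opening one
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack  # leftover openers mean unbalanced


def first_duplicate(items):
    """Set: return the first item seen twice, or None if all are unique."""
    seen = set()
    for item in items:
        if item in seen:
            return item
        seen.add(item)
    return None


def most_common_word(words):
    """Dictionary: count occurrences and return the most frequent word."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return max(counts, key=counts.get)


print(is_balanced("{[()]}"))
print(first_duplicate([3, 1, 4, 1, 5, 3]))
print(most_common_word(["sql", "spark", "sql"]))
```

If you can write these from memory and explain why the set and dictionary lookups are fast, you are in good shape for the basics.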

Thanks for Reading!

If you like my work and want to support me…

  1. The BEST way to support me is by following me on Medium.
  2. For data engineering best practices, and Python tips for beginners, follow me on LinkedIn.
  3. Feel free to give claps so I know how helpful this post was for you.

Developer Advocate @Snowflake❄️. Previously Data & Applied Machine Learning Engineer @Apple, Nike, NetApp | Spark, Snowflake, Hive, Python, SQL, AWS, Airflow