Developer
Jakub Kielbasiewicz
Senior Big Data Engineer
Wrocław, Poland
SQL
Spark
R
Python
Flask
Databricks
Azure
GCP
AWS
ETL
MLOps
PowerBI
Tableau
Airflow
Kafka
About
I am a Machine Learning Ops (MLOps) enthusiast with broad experience in Big Data applications. I have delivered numerous architectures and implementations of data-intensive applications, mainly in cloud environments, with a strong emphasis on data, ML and AI solutions.
Skills
Languages
R, Python, Scala
Frameworks
Spark, Hadoop, Flask, Django, Kafka
Libraries/APIs
pandas, scikit-learn, Matplotlib, NumPy, PySpark, SQLAlchemy, pytest
Paradigms
Functional programming
Platforms
Azure, AWS, GCP, Databricks, Hadoop
Storage
Cassandra, HDFS, RDBMS, Data Lake, Delta Lake
Tools
Tableau, Power BI, SSIS, SSAS, BigQuery, Redshift, Data Factory, Airflow
Experience
Data Engineering
5 years
Spark
3 years
SQL
5 years
Azure
3 years
GCP
1 year
AWS
2 years
R
2 years
Highlight Projects

IoT Data Quality Control

Developed a data quality solution for an IoT implementation at a manufacturing company, enabling assessment of sensor data quality

The initial idea of the project was to deliver a real-time dashboarding solution using PowerBI, but the scope grew quickly: the final product involved machine learning algorithms, database redesign and metadata control activities. Along with the dashboard, I proposed a new approach to data quality management in companies: an iterative methodology and a step-by-step process for working with IoT data quality.

DigitalWarehou.se

Full product to manage digital goods
digitalwarehou.se

I architected an award-winning project in Azure, digitalwarehou.se, an online application for storing digital goods (such as game keys). Together with a full-stack team, we are working on making this solution production-ready.

Snowflake migration

Data warehouse migration to Snowflake
HumanN
  • Migrated data warehouse to Snowflake
  • Re-designed reporting environment (Tableau)
  • Tuned ETL performance

Data Integration

Designed and implemented a reporting solution for an insurance broker agency
Toptal
  • Streamlined reporting process
  • Designed and implemented data warehouse
  • Implemented ETL process
  • Enabled the company to gain insight from its data
Work Experience
BI Consultant
Tech Data Client Solutions
|
Jun 2016 - Apr 2018

● Maintained current workloads

● Resolved ETL issues

● Created an end-to-end reporting environment and solutions for Tesco Mobile (MicroStrategy); developed a MicroStrategy solution to enable data governance

Microstrategy
TSQL
SSIS
SSAS
PowerBI
ETL
ETL & Automation Analyst
Ryanair
|
Apr 2018 - Dec 2019

● Query performance tuning - significantly improved the performance of ETL processes required for reporting solutions in the Marketing team

● Data integration automation (R, Python, SQL, MDX) - fully automated the ETL process, eliminating manual steps

ETL
R
Python
MDX
Impala
Hadoop
Bash
SQL
BI Developer
Ryanair
|
Dec 2018 - Jun 2019

● Created ETL processes that allowed the company to integrate data during the LaudaMotion takeover and to use data science algorithms to measure fuel consumption

● Modeled new data marts focused on integrating big data solutions based on web analytics and marketing with consumer-related Data Warehouse

● Developed Data Quality checks

● Prepared PoC solutions in Azure (Databricks, ADF), graph databases, Power BI

ETL
SSIS
SSAS
PowerBI
R
Databricks
Azure
ADF
Data Engineer
Aberdeen
|
Jun 2019 - Apr 2020

● Master Data Management - created new ways to clean up the data, using web scraping and algorithms (Python, PySpark)

● Performance tuning

● SQL Server administration - data partitioning, index maintenance, security, backup policies

● Improved advanced string matching algorithms - about 500% improvement in edge cases

● Migrated the product to AWS (re-designed the data product using AWS EMR, S3, RDS, Lambda, Athena, Glue)

AWS
Python
Spark
SQL
SSIS
Elasticsearch
Cloud Data Engineer
Roche
|
Apr 2020 - Sep 2020

● Developed IoT data pipelines - Azure Synapse, HDInsight, Azure Data Factory, Databricks, CosmosDB

● Architected storage solutions and ETL processes (Azure)

● Databricks performance tuning

● Created, maintained and streamlined ADF processes

Azure
Spark
Databricks
SQL
ADF
Synapse
IoT
Senior Big Data Engineer
Circle K
|
Sep 2020 - Present

● Maintenance of Databricks clusters (performance tuning, cluster scaling, updating, monitoring)

● Streaming processes development

● Machine Learning deployment and scaling

● SparkR, PySpark, SparkSQL performance tuning

● Designed, built and supported data processing pipelines in PySpark, SparkSQL, SparkR, ADF and Airflow

● Maintenance and improvement of Airflow pipelines

Spark
Databricks
R
SparkR
Airflow
Azure
Machine Learning
Docker
Kubernetes
Education

Wroclaw University of Science and Technology

Wroclaw
|
Oct 2016 - Feb 2020
Computer Science

University of Wroclaw

Wroclaw
|
Oct 2014 - Jun 2017
Economics
Certifications

MCSA: SQL 2016 Database Development

Apr 2018 - Permanent
Microsoft

MCSA: SQL 2016 Database Administration

Mar 2019 - Permanent
Microsoft

MCSA: SQL 2016 BI Development

Jun 2018 - Permanent
Microsoft

MCSA: SQL BI Reporting

Sep 2019 - Permanent
Microsoft

MCSE: Data Management and Analytics

Mar 2019 - Permanent
Microsoft

Microsoft Certified: Azure Administrator Associate

Apr 2019 - Apr 2021
Microsoft

GCP: Associate Cloud Engineer

Dec 2019 - Dec 2021
Google

AWS: Cloud Practitioner

May 2020 - May 2022
AWS

Microsoft Certified: Azure Data Engineer Associate

Jul 2020 - Jul 2022
Microsoft

Microsoft Certified: Azure AI Fundamentals

Aug 2020 - Permanent
Microsoft