Data Systems That Scale. Insights That Matter. Growth That Lasts.
I help organizations build data systems that are not only scalable and reliable but also rooted in purpose, so every insight supports real decisions.
I’m Prakhar Srivastava, a Data & Analytics Engineer with experience designing scalable systems that turn data into meaningful insight.
I care about the ‘why’ behind the data, because engineering isn’t just about tools; it’s about solving the right problems in the right way. My focus is on building reliable, efficient pipelines and analytics layers that support real decisions, not just dashboards.
📍Tallinn, Estonia
Bridged technical and business needs across diverse domains, including fintech, geospatial, healthcare, telecom, compliance, and agritech, at organizations ranging from startups to MNCs.
Cloud & Infrastructure
GCP, AWS, Snowflake, Databricks, Terraform, Docker, Kubernetes
Data Engineering & Pipelines
Airflow, dbt, Spark, SQL, Python, Git
Data Analytics & Visualization
Tableau, Alteryx, QGIS, Neo4j
Bondora is a European fintech company operating in Estonia, Finland, the Netherlands, Denmark, and Latvia, offering easy-to-access personal loans and enabling investors to earn passive income through automated investment products such as Go & Grow and Portfolio Manager.
Design, develop, and maintain modular dbt models that transform raw data from multiple sources (loan applications, repayments, underwriting, and scoring systems) into curated, production-ready datasets for financial reporting, investor dashboards, credit risk analysis, and regulatory compliance. Incremental loads keep large data volumes manageable (the pattern is sketched after the tools list below).
Collaborate with cross-functional teams, including product managers, analysts, and engineers, to align on business definitions and improve data consistency; refactor legacy dbt models to improve performance and reliability across KPIs, loan funnel metrics, and risk segmentation, resulting in 20% faster access to actionable insights.
Implement and maintain robust data quality checks using dbt tests, schema validations, and freshness monitoring in Databricks, supported by custom hooks and SQLFluff code reviews to enforce standards and catch issues early, ensuring reliable and maintainable analytics deliverables.
Tools: Azure Databricks, Azure DevOps, dbt, SQL, Python, Tableau, EventHub
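For illustration, the incremental-load pattern behind these dbt models looks roughly like the sketch below, written as PySpark/Delta for brevity. The table names (raw.loan_repayments, curated.fct_repayments), columns, and the derived metric are hypothetical; the production logic lives in dbt SQL models on Databricks.

```python
# A minimal sketch of the incremental-load pattern, with hypothetical
# raw source and curated Delta target tables.
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# 1. High-water mark: the latest event already present in the curated table.
#    (A first run on an empty table would fall back to a full load; omitted here.)
high_water = (
    spark.table("curated.fct_repayments")
    .agg(F.max("event_ts").alias("max_ts"))
    .collect()[0]["max_ts"]
)

# 2. Pick up only the new raw rows and apply the business transformation.
new_rows = (
    spark.table("raw.loan_repayments")
    .where(F.col("event_ts") > F.lit(high_water))
    .withColumn("principal_repaid", F.col("amount") - F.col("interest"))
)

# 3. Merge on the business key so reruns stay idempotent (no duplicates).
(
    DeltaTable.forName(spark, "curated.fct_repayments")
    .alias("t")
    .merge(new_rows.alias("s"), "t.repayment_id = s.repayment_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```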
EyeVi Technologies, an Estonia-based startup founded in 2020, enhances road networks using AI-driven mapping technologies to improve predictive maintenance, traffic management, and safety auditing for transportation infrastructure clients.
Designed and maintained scalable ETL pipelines in BigQuery and Snowflake to process over 50TB of geospatial data, such as high-resolution road panoramas, traffic signs, and sensor readings. Used partitioning, bucketing, and structured data modeling to improve performance and make large datasets easier to query and manage (partition pruning is illustrated in the sketch after the tools list below).
Built automated data validation workflows in dbt to clean up inconsistent road data, like mismatched kilometer markers and blurred signs. Ensured pipelines could handle changing data formats (schema evolution) and rerun safely without duplication (idempotency). Managed infrastructure changes and deployment using Terraform.
Increased pipeline speed by 30% and improved data accuracy by 25% with targeted backfills that reprocessed historical data whenever transformation logic changed. Reduced manual QA time by 40 hours/month and delivered reliable, production-ready, Docker-packaged datasets to clients such as Google, Prointec, Idea, and Xais.
Tools: GCP (BigQuery, Looker), Snowflake, dbt, Python, PostgreSQL, Airflow, Terraform, Docker, Kubernetes, Linux, Git.
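As a small illustration of why partition pruning keeps 50TB-scale tables cheap to query, the snippet below filters a hypothetical date-partitioned BigQuery table on its partition column using the official Python client; the same partition column is what makes targeted backfills safe to rerun. The project, table, and column names are made up for the example.

```python
import datetime
from google.cloud import bigquery

client = bigquery.Client()  # assumes GCP credentials are already configured

# Filtering on the partition column (capture_date) lets BigQuery prune every
# other partition, so only one day of data is scanned instead of the full table.
sql = """
    SELECT sign_id, sign_class, road_km
    FROM `my-project.roads.traffic_signs`   -- hypothetical date-partitioned table
    WHERE capture_date = @capture_date
"""
job = client.query(
    sql,
    job_config=bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter(
                "capture_date", "DATE", datetime.date(2023, 5, 1)
            )
        ]
    ),
)
for row in job.result():
    print(row["sign_id"], row["sign_class"])
```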
KappaZeta, an Estonian startup established in 2015, leverages satellite technology with AI to help farmers boost agricultural efficiency and sustainability.
Developed and deployed cloud-masking and crop-boundary models using PyTorch, CNNs, and Transformers on Sentinel-2 imagery as part of an ESA-backed project, covering over 1.3M sq. km to support insurance companies with remote crop monitoring and claims assessment (a toy model sketch follows the tools list below).
Built a GAN-based cloud removal model, containerized with Docker, deployed via AWS SageMaker, and managed data through S3. Used Git for version control and collaboration.
Improved model accuracy by 35–40% through tuning and validation in QGIS, and integrated models into scalable batch inference pipelines with automated retraining.
Tools: AWS (S3, SageMaker), PyTorch, Python, QGIS, GAN, Transformer, CNN, Docker, Git.
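The cloud-masking models themselves are not public, but a toy PyTorch version conveys the shape of the problem: a multi-band Sentinel-2 tile in, a per-pixel cloud probability map out. Everything below (band count, layer sizes, tile size) is illustrative, not the production architecture.

```python
import torch
from torch import nn

class TinyCloudMask(nn.Module):
    """Toy stand-in for a cloud-masking CNN: multi-band Sentinel-2 tile in,
    per-pixel cloud probability out. Band count and layer sizes are illustrative."""

    def __init__(self, in_bands: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_bands, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=1),  # one logit per pixel
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(x))

model = TinyCloudMask()
tile = torch.rand(1, 4, 256, 256)   # one synthetic 256x256 tile with 4 bands
cloud_prob = model(tile)            # shape: (1, 1, 256, 256), values in [0, 1]
```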
Collaborated with a PhD researcher to optimize a real-time pedestrian detection model using transformer architectures. Tasks included data preprocessing (augmenting and normalizing large datasets), model training (experimenting with different transformer configurations), and utilizing a high-performance computing cluster with 4 GPUs.
Addressed the challenge of detecting pedestrians in real time under diverse environmental conditions, optimizing the model to handle variations in lighting and occlusions. This involved integrating attention mechanisms and fine-tuning hyperparameters to improve detection accuracy and reliability, achieving an inference speed of 30 FPS (the throughput measurement idea is sketched after the tools list below).
Enhanced the pedestrian detection model's accuracy by 12% and reduced inference time by 25%, contributing to a significant improvement in real-time processing capabilities.
Tools: Google Colab, PyTorch, Python, NumPy, Pandas.
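The 30 FPS figure comes down to a simple throughput measurement around the detector's forward pass. The sketch below shows that measurement loop with a tiny stand-in network; the actual model was a transformer-based detector, and the frame size and module here are placeholders.

```python
import time
import torch
from torch import nn

# Tiny stand-in for the transformer-based detector; only the measurement loop
# matters here. Frame size and the number of test frames are placeholders.
detector = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 4),   # e.g. one bounding box per frame, for illustration
)
detector.eval()

frames = torch.rand(100, 3, 480, 640)   # 100 synthetic camera frames

with torch.no_grad():
    start = time.perf_counter()
    for frame in frames:
        _ = detector(frame.unsqueeze(0))   # run inference on a single frame
    elapsed = time.perf_counter() - start

print(f"throughput: {len(frames) / elapsed:.1f} FPS")
```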
Project type: Telecom
Developed and maintained ETL pipelines to process large volumes of telecom data for the Dubai-based client DU, including call detail records (CDRs), billing information, and subscriber plans. Improved performance by applying techniques such as partition pruning, caching, and broadcast joins, reducing processing time by around 40% (a broadcast-join sketch follows the tools list below). Pipelines were orchestrated with Airflow and deployed on AWS.
Safeguarded sensitive subscriber data by implementing proper PII handling and ensuring GDPR compliance through masking and access controls. Designed customer data models using Slowly Changing Dimensions (SCD Type 2) to track changes in customer history, helping teams run better churn analysis and customer profiling.
Reworked older ETL workflows to handle late-arriving data, making daily and near real-time reports more accurate and reliable. Tuned SQL logic and added validation checks to improve data quality by 30%. Also contributed to containerizing pipelines with Docker and managing deployments using Kubernetes, ensuring smoother rollouts and environment consistency.
Tools: AWS, Spark, Snowflake, Python, SQL, Airflow, Docker, ElasticSearch, Tableau, Kubernetes, Linux.
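As an illustration of the join optimizations mentioned above, the PySpark sketch below broadcasts the small subscriber-plan dimension so the very large CDR fact table never has to be shuffled, and filters on the partition column so only one day's data is scanned. Table and column names are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical tables standing in for the real CDR fact and plan dimension.
cdrs = spark.table("telecom.call_detail_records")   # very large, date-partitioned
plans = spark.table("telecom.subscriber_plans")     # small dimension table

daily_usage = (
    cdrs.where(F.col("call_date") == "2022-06-01")  # partition pruning: one day only
        .join(F.broadcast(plans), "subscriber_id")  # ship the small table to executors
                                                    # instead of shuffling the CDRs
        .groupBy("plan_name")
        .agg(F.sum("duration_sec").alias("total_duration_sec"))
)
daily_usage.show()
```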
Project type: Healthcare
Analyzed large datasets, including patient records, prescription data, and delivery logistics, for the US-based client IHA Independent Health to optimize e-medicine delivery processes; maintained data pipelines and collaborated with cross-functional teams to create KPIs in Power BI, resulting in a 15% increase in sales and improved decision-making.
Identified delivery delays using EDA in Alteryx and SAS Enterprise Miner and built a predictive model for route and time optimization, reducing delivery time by 15% (a simplified version of the delay analysis is sketched after the tools list below).
Leveraged data from the Oracle database to support scheduling improvements and real-time tracking, resulting in a 20% boost in customer satisfaction and a 15% cost reduction.
Tools: Python, SQL, Microsoft Excel, SAS Enterprise Miner, Airflow, Power BI, Oracle Database, Linux.
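The delay analysis itself was done in Alteryx and SAS Enterprise Miner; purely as an illustration of the idea, a pandas equivalent might look like the sketch below, using a hypothetical delivery-log extract and made-up column names.

```python
import pandas as pd

# Hypothetical delivery-log extract; the production analysis used Alteryx and
# SAS Enterprise Miner, this pandas version just shows the idea.
deliveries = pd.read_csv(
    "delivery_logs.csv", parse_dates=["dispatched_at", "delivered_at"]
)

deliveries["delay_min"] = (
    deliveries["delivered_at"] - deliveries["dispatched_at"]
).dt.total_seconds() / 60

# Routes with the worst average delay are the first candidates for
# re-routing and schedule changes.
worst_routes = (
    deliveries.groupby("route_id")["delay_min"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
    .head(10)
)
print(worst_routes)
```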