
About Me
I'm a Lead Data Engineer with a passion for building scalable data infrastructure and turning raw data into actionable insights. Based in London, I specialize in cloud data architecture, ETL pipeline optimization, and database performance tuning.
With experience across AWS, ClickHouse, MongoDB, and Apache Airflow, I've architected solutions that process terabytes of data weekly while achieving 1000x performance improvements and 75% cost reductions.
Currently leading a team of data engineers at SecurityHQ, where we build cutting-edge cyber intelligence data pipelines serving enterprise clients across finance, government, and insurance sectors.
Testimonials
Derek Hoogewerf
Information Security | Cloud Security | AWS Certified Solutions Architect
I offer my sincere recommendation for Vikramaditya. He has an impressive set of skills, particularly in Python, MongoDB, and RDBMS. Vikramaditya is a skilled developer who has deep knowledge of Python...
Experience
Lead Data Engineer, SecurityHQ
December 2024 - Present
- Spearheaded the migration of the primary analytical database from a self-hosted, sharded MongoDB cluster to ClickHouse Cloud. This initiative improved analytical query performance by over 1000x and reduced annual platform costs by 75%.
- Built and maintained scalable, end-to-end data pipelines, developed with TDD, that ingest from cyber intelligence vendors such as SentinelOne, Microsoft Azure Sentinel, Defender, GroupIB, IBM QRadar, and Qualys, using AWS Lambda, AWS Glue, and Apache Airflow (a minimal DAG sketch follows this list).
- Proactively identified hidden problems and usage patterns affecting database performance.
- Mentored and guided a team of 5 data engineers, acting as code reviewer and providing technical expertise and project oversight for Airflow DAG development and database schema design.
- Created CI/CD workflows in GitHub Actions for deploying DAGs and updating Docker images and Dev Containers, and integrated secure GenAI-driven development into those workflows.
- Accelerated security incident response and analysis by architecting and deploying interactive Power BI and Streamlit dashboards for EDR data sources.
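
A minimal sketch of what one of these ingest DAGs might look like, assuming the Airflow 2.x TaskFlow API; the vendor endpoint, response fields, and load target are hypothetical placeholders, not the production pipeline:

```python
# Hypothetical ingest DAG sketching the vendor-pipeline pattern described above.
# The endpoint URL, response fields, and load step are placeholders.
from datetime import datetime, timedelta

import requests
from airflow.decorators import dag, task


@dag(
    schedule="@hourly",
    start_date=datetime(2025, 1, 1),
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    tags=["cyber-intel", "edr"],
)
def vendor_alerts_ingest():
    @task
    def extract(data_interval_start=None, data_interval_end=None) -> list[dict]:
        # Pull alerts for the scheduled window from the vendor's REST API.
        resp = requests.get(
            "https://vendor.example/api/v1/alerts",  # placeholder URL
            params={
                "createdAt__gte": data_interval_start.isoformat(),
                "createdAt__lt": data_interval_end.isoformat(),
            },
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()["data"]

    @task
    def transform(alerts: list[dict]) -> list[dict]:
        # Keep only the fields the analytical schema needs.
        return [
            {"id": a["id"], "severity": a["severity"], "created_at": a["createdAt"]}
            for a in alerts
        ]

    @task
    def load(rows: list[dict]) -> None:
        # The real pipeline would batch-insert into ClickHouse here
        # (e.g. via a client library); printing keeps the sketch self-contained.
        print(f"would insert {len(rows)} rows")

    load(transform(extract()))


vendor_alerts_ingest()
```

In practice each vendor gets its own extract logic and pagination handling; the hourly data-interval window keeps runs idempotent and backfillable.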
June 2022 - December 2024
- Set up and administered a sharded MongoDB cluster from scratch in a secure, self-hosted environment, orchestrated backups, and acted as the SME for MongoDB.
- Architected and developed a scalable, high-throughput Python data pipeline that leverages asynchronous programming to reliably stream over 10 TB of security event logs weekly (a simplified sketch of the pattern follows this list).
- Designed and built robust Apache Airflow pipelines to ingest diverse data formats (JSON, XML, CSV, Parquet) from REST APIs and AWS S3.
- Ensured high standards of data quality and accuracy throughout the data lifecycle for clients across diverse sectors including financial services, construction, government, and insurance.
- Automated data integrity monitoring by developing a Tableau dashboard that reconciled source REST API metadata with target table counts, significantly improving data reliability and stakeholder trust.
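
A simplified sketch of the asynchronous streaming pattern behind the 10 TB/week pipeline, assuming aiohttp for concurrent API calls and boto3 for S3 writes; the endpoint, pagination scheme, and bucket name are placeholders:

```python
# Simplified async streaming sketch: fetch pages of security events concurrently
# and land each batch in S3. Endpoint, pagination, and bucket are placeholders.
import asyncio
import json

import aiohttp
import boto3

BUCKET = "security-events-placeholder"
s3 = boto3.client("s3")


async def fetch_page(session: aiohttp.ClientSession, page: int) -> list[dict]:
    async with session.get(
        "https://vendor.example/api/events", params={"page": page}
    ) as resp:
        resp.raise_for_status()
        return (await resp.json())["events"]


async def stream_events(pages: int, concurrency: int = 10) -> None:
    sem = asyncio.Semaphore(concurrency)  # bound in-flight requests

    async with aiohttp.ClientSession() as session:

        async def worker(page: int) -> None:
            async with sem:
                events = await fetch_page(session, page)
                # boto3 is synchronous; run the upload in a thread so the
                # event loop keeps fetching while S3 writes happen.
                await asyncio.to_thread(
                    s3.put_object,
                    Bucket=BUCKET,
                    Key=f"events/page={page}.json",
                    Body=json.dumps(events).encode(),
                )

        await asyncio.gather(*(worker(p) for p in range(pages)))


if __name__ == "__main__":
    asyncio.run(stream_events(pages=100))
```

The semaphore keeps throughput high without overwhelming the vendor API, and asyncio.to_thread keeps the blocking S3 upload off the event loop.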
Feb 2022 - June 2022
- Optimized customers' SQL Server databases, ensuring data integrity and query performance using established best practices.
- Monitored CPU utilization, memory usage, and disk I/O in customer environments to identify potential issues proactively (a sketch of one such check follows this list).
- Diagnosed and resolved complex database issues, including system errors, performance bottlenecks, and compatibility problems.
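
A sketch of the kind of proactive check involved, assuming pyodbc and read access to SQL Server's standard dynamic management views; the connection string and threshold are illustrative:

```python
# Proactive health check against SQL Server dynamic management views (DMVs).
# Connection string and threshold are illustrative only.
import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=customer-db.example;DATABASE=master;"
    "Trusted_Connection=yes;TrustServerCertificate=yes;"
)

# Longest-running active requests: a common first stop when triaging bottlenecks.
LONG_RUNNING_SQL = """
SELECT TOP 10 r.session_id,
       r.wait_type,
       r.total_elapsed_time / 1000.0 AS elapsed_s,
       t.text AS query_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
ORDER BY r.total_elapsed_time DESC;
"""


def check_long_running(threshold_s: float = 30.0) -> None:
    with pyodbc.connect(CONN_STR, timeout=10) as conn:
        for row in conn.cursor().execute(LONG_RUNNING_SQL):
            if row.elapsed_s > threshold_s:
                snippet = (row.query_text or "").strip()[:80]
                print(f"session {row.session_id}: {row.elapsed_s:.1f}s "
                      f"(wait={row.wait_type}) -> {snippet!r}")


if __name__ == "__main__":
    check_long_running()
```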
May 2021 - Nov 2021
- Developed a critical component for the DynamoDB metadata team that targeted 99.999999999% (eleven nines) durability.
- Migrated the client discovery service from Java to Rust to eliminate garbage-collector pauses and achieve consistent sub-1ms client-side latency.
- Optimized the service by implementing multi-threading, an asynchronous network runtime, and data compression over the wire (the general pattern is illustrated below).
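
The service itself was rewritten in Rust; as a language-neutral illustration of the pattern (an asynchronous network runtime plus compression on the wire), here is a minimal Python sketch with dummy endpoint data:

```python
# Language-neutral illustration of the optimization pattern: an asynchronous
# server that compresses payloads before they hit the wire. The production
# service was Rust; this Python sketch shows the idea only, with dummy data.
import asyncio
import json
import zlib


async def handle_client(reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
    await reader.readline()  # consume one request line (contents unused here)
    endpoints = {"shard-7": ["10.0.0.12:9042", "10.0.0.13:9042"]}  # dummy payload
    payload = zlib.compress(json.dumps(endpoints).encode())  # fewer bytes on the wire
    # Length-prefix the compressed payload so the client knows how much to read.
    writer.write(len(payload).to_bytes(4, "big") + payload)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def main() -> None:
    # The event loop multiplexes many clients without a thread per connection.
    server = await asyncio.start_server(handle_client, "127.0.0.1", 8888)
    async with server:
        await server.serve_forever()


if __name__ == "__main__":
    asyncio.run(main())
```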
Education
Interests