Rajat Verma

About me!

Hi! I'm Rajat, a Data Scientist with 6 years of experience, based in Delhi. I have a background in Industrial and Systems Engineering from IIT Kharagpur. I have extensive experience solving problems with data science at organizations across a range of industries, including retail, media, and consulting. My key interests lie in Natural Language Processing and Recommendation Systems. My approach centers on developing streamlined, high-impact solutions. Throughout my career, I've driven success through initiative, diligence, and an unwavering commitment to continuous improvement. I have 3 ginger cats, and I enjoy hiking in the Himalayas in my spare time.

Currently I am working as a Lead Data Scientist at The Home Depot.


Educational Background

IIT Kharagpur - Industrial and Systems Engineering - Dual Degree (B.Tech & M.Tech) | Class of 2018


Work Experience

Lead Data Scientist at The Home Depot (via Decision Culture)

  • Next Best Action: Developed a configurable Python package for Next Best Action Sequence Models, enabling data science teams to train performant recommenders, and enhancing customer experiences across The Home Depot.
  • Causal Inference Framework for Customer Insights: Devised a reusable causal inference framework, enabling cross-functional teams to quantify the impact of adverse events on customer behavior, including:
    • Email Opt-out Revenue Impact Analysis: Quantified potential revenue loss per customer due to email opt-outs, informing targeted retention strategies.
    • Validated Framework Efficacy: Successfully validated the framework's results against a Mobile App team's A/B test, achieving >99% alignment in measured lift, and demonstrating the framework's reliability for future use cases.
  • Lookalike Modeling: Built a framework for semi-supervised learning on retail data. Helped create customer personas based on spend history, demographics, and browsing patterns for effective targeting.
  • Propensity Models Improvement: Studied existing propensity models and devised new features that improved the performance of Class Propensity Models by up to 15%.

Senior Data Scientist at Times Internet - Economic Times

  • Economic Times Newsletter: Created a news recommendation pipeline for 1.5 million users that increased click-through rate from 4.8% to 15.3%, using Apache Airflow for scheduling, monitoring, and logging

  • News Summarisation API: Customized Google’s Pegasus-Large model for ET Markets data and fine-tuned it for English summarization. Built a Flask API endpoint, which is currently in production

  • Summarization for Indic Languages: Tested various statistical summarization methods for Bangla, Tamil, Telugu, Malayalam, Marathi, and Gujarati news

  • Keyword Recommendation for Articles (BERT + Solr): Designed a system for recommending keywords for articles using Solr. Indexed Wikipedia data in the Solr search engine. Used BM25 for coarse search and BERT embeddings for ranking. Tested at 10 searches per second over 6M articles

  • Hack and Hustle 3.0 Hackathon Finalist 2022: Led a team of 3 senior developers to build a Twitter social listening API that provides Sentiment, Subjectivity, and Engagement insights on user-specified stocks

Data Scientist at ZS

  • Rare Disease Identification using EHR: Devised a PU classifier using GANs. Automated the rare disease identification pipeline from Electronic Health Records, reducing lead time from weeks to 10 work hours
  • Compliance Prediction and Monitoring: Built resource-level non-compliance prediction for a Fortune 100 hi-tech client, leading to a 23% reduction in non-compliance and $150K in savings over 6 months
  • Asymptomatic Liver Disease Progression Modeling: Predicted high-risk NASH patients with 86% AUROC in EHR data and identified 4 additional non-invasive markers for fast-progressing NASH. The model was used to identify patients for focused treatment and higher enrollment in clinical trials
  • Text Mining and Classification: Extracted unstructured clinical trial data for a given therapy area to analyze the current research landscape and trends

Data Science Intern at ZS

Jan. 2017 - June 2017

  • Multi-Class Text Sentiment Analysis: Built an NLP pipeline for unstructured social listening data using bidirectional LSTM, TF-IDF with Random Forest, and Word2Vec averaging with MaxEnt. Achieved 72% accuracy and a Cohen’s Kappa score of 0.45
  • Smart Scheduling Tool for Recruiters: Created a tool in Excel VBA for recruiters to plan and monitor daily interviews, with the option of sending automated mailers

Programming languages

Python, pandas, NumPy, scikit-learn, PyTorch, TensorFlow, Keras, Transformers, SQL, VBA


Projects/Blogs