DUVVURI DURGA PRASAD

Data Scientist • MLOps Practitioner • Machine Learning Engineer • Deep Learning Engineer

Aspiring Data Scientist with hands-on experience deploying scalable machine learning solutions using Python, Flask, Docker, and AWS. Recently earned a Postgraduate Diploma in Data Science & AI from the University of Liverpool, building a solid foundation in machine learning, deep learning, MLOps, and cloud-native deployment. Skilled in modern tools like DVC, MLflow, and GitHub Actions to automate and streamline ML workflows. Passionate about writing clean, reproducible code and solving real-world problems with data-driven solutions. Eager to learn continuously, contribute fresh ideas, and thrive in dynamic, collaborative environments.

About Me

Hello! I'm Duvvuri Durga Prasad, an aspiring Data Scientist with a strong foundation in Python, Machine Learning, and MLOps. I recently completed my PG Diploma in Data Science & Artificial Intelligence from the University of Liverpool and hold a Bachelor's degree in Mechanical Engineering.

I am passionate about transforming raw data into actionable insights that drive impactful business solutions. I have independently built and deployed end-to-end ML applications using Flask, Streamlit, Docker, and AWS, with projects spanning finance, healthcare, education, and food tech.

My journey from Mechanical Engineering to Data Science reflects my adaptability and drive to solve real-world problems through data-driven approaches. I value clean code, reproducibility, and data ethics, and thrive in collaborative, fast-paced environments where I can learn and contribute as a team player.

As a recent graduate, I bring a fresh perspective, a strong willingness to learn, and proven hands-on skills in modern ML and MLOps tools like DVC, MLflow, and GitHub Actions. I am eager to contribute, grow, and make a meaningful impact in a forward-thinking organization.

Education & Certifications

Post Graduate Diploma in Data Science and Artificial Intelligence

University of Liverpool, UK | Dec 2024

Focus: Machine Learning, Deep Learning, AI, NLP, Data Visualization

Dissertation: Developed a Python-based 2D strategy game using graph theory.

Bachelor of Technology in Mechanical Engineering

Avanthi Institute, Hyderabad | May 2017

Project: Solar-powered car prototype focusing on sustainable energy design.

Courses & Certifications
  • Machine Learning with Python: Zero to GBMs (Aug 2023 – Feb 2024)
  • Data Analysis with Python: Zero to Pandas (May 2022 – Aug 2022)

Personal Projects

Web Development Projects

Portfolio Website

Deployed HTML CSS JavaScript

I developed a fully responsive personal portfolio website using HTML, CSS, and JavaScript. The website includes a dark/light theme toggle, smooth scrolling, a dynamic showcase of projects and skills, and an integrated contact form. The design is clean, professional, and mobile-friendly to ensure a consistent experience across devices.

Technologies: HTML, CSS, JavaScript

View Code Try Now

I began by designing the layout with semantic HTML5 and styled it using modern CSS3, focusing on a clean and responsive layout. JavaScript was used to handle the dark/light theme switch, smooth navigation, and dynamic content rendering for projects.

The website was version-controlled with Git and deployed using GitHub Pages for free and fast hosting. All project sections link directly to live demos and source code for transparency and easy access.

Key highlights:

  • Dark/light theme toggle
  • Fully responsive design
  • Integrated project showcase with live links
  • Deployed using GitHub Pages

NLP Projects

Movie Recommender System

Deployed Streamlit App NLP

This project is a content-based movie recommender system designed to suggest movies similar to a selected title. It uses metadata features like genres, cast, keywords, and overview to calculate similarity between movies using cosine similarity. To enhance the user experience, the app fetches live movie posters dynamically from The Movie Database (TMDB) API and displays them alongside recommendations.

Technologies: Python, NLP, Cosine Similarity, TMDB API, Streamlit

View Code Try Now

I started by collecting and cleaning movie metadata to create meaningful features for similarity comparison. Using TF-IDF vectorization, I converted textual features into numerical vectors. Cosine similarity was then used to compute similarity scores between movies.
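The vectorization and similarity steps can be sketched roughly as follows. This is a minimal illustration written with only the Python standard library so it runs without extra dependencies; it is not the deployed app's code. The toy catalogue `MOVIES`, the helper names, and the smoothed IDF formula are assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical toy catalogue: title -> combined metadata text
# (genres, cast, keywords, and overview joined into one string).
MOVIES = {
    "Space Quest": "space adventure astronaut alien science fiction",
    "Galaxy Wars": "space battle alien empire science fiction",
    "Love in Paris": "romance paris love story drama",
}


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using smoothed TF-IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(title, movies=MOVIES, top_n=2):
    """Return the top_n titles most similar to `title`, best match first."""
    titles = list(movies)
    vectors = tfidf_vectors([movies[t] for t in titles])
    query = vectors[titles.index(title)]
    scored = [
        (cosine(query, vec), other)
        for other, vec in zip(titles, vectors)
        if other != title
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:top_n]]
```

Calling `recommend("Space Quest", top_n=1)` ranks "Galaxy Wars" first because the two titles share several terms, while "Love in Paris" shares none. On a full catalogue, a library vectorizer and a precomputed similarity matrix would likely be the more practical choice.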

The TMDB API was integrated to dynamically fetch high-quality movie posters to display with recommendations, improving visual appeal and user engagement.
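One way the poster lookup might work is sketched below. It assumes the TMDB v3 movie-details endpoint and the `w500` image size. The function names, the fallback image URL, and the timeout are hypothetical, and `fetch_poster` needs a valid API key and network access.

```python
import json
import urllib.parse
import urllib.request

TMDB_API_BASE = "https://api.themoviedb.org/3"
TMDB_IMAGE_BASE = "https://image.tmdb.org/t/p/w500"
# Hypothetical fallback shown when a movie has no poster.
PLACEHOLDER_POSTER = "https://example.com/no-poster.png"


def poster_url(movie_details):
    """Build a full poster URL from a TMDB movie-details payload (a dict)."""
    poster_path = movie_details.get("poster_path")
    if not poster_path:
        return PLACEHOLDER_POSTER
    # TMDB returns poster_path with a leading slash, e.g. "/abc.jpg".
    return TMDB_IMAGE_BASE + poster_path


def fetch_poster(movie_id, api_key, timeout=5):
    """Fetch movie details from TMDB and return the poster URL (requires network)."""
    query = urllib.parse.urlencode({"api_key": api_key})
    url = f"{TMDB_API_BASE}/movie/{movie_id}?{query}"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return poster_url(json.load(response))
```

Keeping the URL-building logic in `poster_url` separate from the network call makes the fallback behaviour easy to check without contacting the API.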

The entire system was wrapped into an interactive web app using Streamlit, providing a simple and responsive UI that allows users to select movies and instantly view recommendations with posters.

Key highlights:

  • Advanced feature engineering combining multiple metadata features
  • Real-time API integration for poster retrieval
  • End-to-end deployment with Streamlit for an interactive user experience

MLOps & Deployment Projects

Credit Score Classification

Deployed MLOps CI/CD

This project implements a machine learning pipeline that classifies credit scores to predict the creditworthiness of loan applicants. It incorporates modern MLOps practices, using DVC for dataset and pipeline versioning, MLflow for experiment tracking and model registry, and Flask for serving the trained model through a REST API. The application is fully containerized with Docker and deployed on AWS EC2, with automated CI/CD workflows via GitHub Actions.

Technologies: Python, DVC, MLflow, Flask, Docker, AWS EC2, GitHub Actions

View GitHub

I structured the project as an end-to-end ML pipeline, starting with data ingestion and preprocessing, followed by feature engineering, model training, and evaluation.

Using DVC, I version-controlled the datasets and pipeline stages, enabling reproducibility and easy collaboration. MLflow tracked experiments including model parameters and evaluation metrics, facilitating model comparison and selection.

The best model was wrapped in a Flask API for serving predictions, while the entire setup was containerized using Docker to ensure environment consistency. Deployment was automated on AWS EC2, and GitHub Actions CI/CD pipelines ensured code quality and smooth updates.
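A prediction endpoint of this kind could look roughly like the sketch below. It assumes Flask is installed. The field names, the score labels, and the rule inside `predict_credit_score` are hypothetical stand-ins, since the project's real feature schema and trained model are not shown here.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical feature names; the project's real schema is not shown here.
REQUIRED_FIELDS = ("annual_income", "outstanding_debt", "num_delayed_payments")


def predict_credit_score(features):
    """Stand-in for the trained model: a hand-written rule, NOT the real classifier."""
    if features["num_delayed_payments"] > 5:
        return "Poor"
    debt_ratio = features["outstanding_debt"] / max(features["annual_income"], 1)
    return "Good" if debt_ratio < 0.3 else "Standard"


@app.route("/predict", methods=["POST"])
def predict():
    # silent=True returns None instead of raising on a malformed JSON body.
    payload = request.get_json(silent=True) or {}
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        return jsonify({"error": f"missing fields: {missing}"}), 400
    try:
        features = {name: float(payload[name]) for name in REQUIRED_FIELDS}
    except (TypeError, ValueError):
        return jsonify({"error": "all fields must be numeric"}), 400
    return jsonify({"credit_score": predict_credit_score(features)})
```

In a container, the app would typically be started with a command such as `flask run --host=0.0.0.0` so the port is reachable from outside. Validating the payload before calling the model keeps malformed requests from surfacing as server errors.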

Key highlights:

  • Modular pipeline components enabling flexible experimentation.
  • Seamless experiment tracking and model registry with MLflow.
  • Robust deployment strategy with Docker and AWS.
  • CI/CD automation via GitHub Actions for testing and deployment.

Data Analysis & Insights Projects

NYC Restaurants Recommendation System

EDA Predictive Modeling Visualization

This project explores customer order data from NYC restaurants to uncover insights about cuisine popularity, customer satisfaction, and operational efficiency. By analyzing order counts, delivery times, and customer ratings, the project provides actionable recommendations to restaurant owners and food delivery platforms.

Technologies: Python, Pandas, Seaborn, Plotly, Matplotlib, Jupyter Notebook

View GitHub

I began by cleaning and preprocessing a large dataset of restaurant orders. Using exploratory data analysis and visualization techniques (Seaborn, Plotly), I identified key trends such as the popularity of American and Japanese cuisines, and how delivery efficiency correlates with customer satisfaction.

I then built predictive models for delivery time, order cost, and customer ratings using relevant features such as cuisine type, preparation time, and day of the week.

The insights help stakeholders improve operational efficiency, predict delivery delays, and enhance customer experience through data-driven decisions.

Key highlights:

  • Comprehensive exploratory data analysis revealing customer behavior patterns.
  • Predictive modeling for critical operational metrics.
  • Interactive visualizations supporting decision-making.

Predictive Modeling Projects

Student Performance Prediction

Deployed Streamlit App ML Model

This project predicts students' exam performance based on factors such as study time, past grades, and other academic-related features. The model helps educators identify students who may need additional support.

Technologies: Python, Scikit-learn, Streamlit

View Code

Data preprocessing and feature selection were performed to identify the most relevant predictors of exam outcomes. I trained multiple machine learning algorithms, tuning hyperparameters to maximize predictive accuracy.

An interactive web application was created using Streamlit to allow users to input student data and instantly receive performance predictions.

Key highlights:

  • Feature engineering for improved model accuracy.
  • Model selection and hyperparameter tuning to identify the best-performing model.
  • Easy-to-use Streamlit app for real-time predictions.

Loan Prediction Web Application

Deployed Flask App SQLite database

This Flask web application predicts loan approval status based on applicant information. It leverages a trained Random Forest classifier to provide real-time decisions, helping users understand their loan eligibility.

Technologies: Flask, Random Forest, SQLite, Bootstrap, HTML/CSS

View GitHub Try Now

I trained a Random Forest classifier using cleaned loan application data and stored the model using pickle. The Flask app collects user inputs via Bootstrap-powered responsive forms and returns approval predictions.

Predictions and user contact messages are saved to an SQLite database for persistence. Front-end design was enhanced with custom CSS and JavaScript for usability.
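The persistence step can be sketched with Python's built-in `sqlite3` module. The table layout and function names below are assumptions for illustration, not a copy of the project's `schema.sql`.

```python
import sqlite3

# Hypothetical schema; the project's actual schema.sql is not reproduced here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS predictions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    applicant_income REAL NOT NULL,
    loan_amount REAL NOT NULL,
    approved INTEGER NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def init_db(path=":memory:"):
    """Open a connection and create the table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_prediction(conn, applicant_income, loan_amount, approved):
    """Insert one prediction with a parameterized query and return its row id."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT INTO predictions (applicant_income, loan_amount, approved) "
            "VALUES (?, ?, ?)",
            (applicant_income, loan_amount, int(approved)),
        )
    return cursor.lastrowid


def approval_rate(conn):
    """Share of stored predictions that were approvals (0.0 for an empty table)."""
    (rate,) = conn.execute("SELECT AVG(approved) FROM predictions").fetchone()
    return rate or 0.0
```

The `?` placeholders let SQLite bind the form values safely, which matters because they come straight from user input. The default `:memory:` path is convenient for trying the functions out; a file path would be used for real persistence.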

Project Structure Highlights:

  • app.py: Flask backend application
  • templates/: HTML pages including forms and result views
  • static/: CSS, images, and JavaScript files
  • random_forest_model.pkl: Serialized model file
  • schema.sql: SQLite database schema

Heart Disease Prediction

ML Models Data Exploration Visualization

Using the Heart Disease UCI dataset, this project predicts heart disease presence based on clinical features like age, cholesterol, and blood pressure. Multiple models including Logistic Regression, Random Forest, and XGBoost were evaluated for accuracy and robustness.

Technologies: Python, Scikit-learn, XGBoost, Data Visualization

View GitHub

I performed extensive exploratory data analysis using boxplots, KDE plots, and scatter matrices to understand feature distributions and relationships. Feature engineering improved model inputs, while missing data was carefully handled.

I trained and tuned multiple classifiers, ultimately selecting the Random Forest model based on performance metrics. The final model was saved for future use and deployment.

Key highlights:

  • Thorough data exploration and visualization.
  • Comparative evaluation of different machine learning models.
  • Robust feature engineering and model selection.

Breast Cancer Prediction

Deployed Flask App Random Forest

This Flask web application uses a Random Forest classifier to predict whether a tumor is malignant or benign based on its characteristics. It provides users with instant predictions through an easy-to-use web interface.

Technologies: Flask, Random Forest, Scikit-learn, NumPy, Pickle

View GitHub

I developed the machine learning pipeline by training a Random Forest classifier on a labeled breast cancer dataset, tuning it for high accuracy and reliability.

The model was serialized using pickle and integrated into a Flask web app that allows users to input tumor features via a clean and responsive HTML form.

Static assets like CSS and JavaScript were used to enhance the UI/UX. The app immediately provides a prediction on submission, allowing for quick cancer risk assessments.

Project Structure Highlights:

  • app.py: Main Flask application handling routes and prediction logic
  • model.py: Contains the model training and loading functions
  • classifier_rf.pkl: Serialized Random Forest model
  • templates/index.html: User input form and result display
  • static/: CSS and JavaScript files for front-end styling and interaction

Prerequisites: Python 3.x, Flask, scikit-learn, and NumPy (pickle is part of the Python standard library).

Prudential Life Insurance Assessment

Logistic Regression Decision Trees Feature Engineering

A predictive model for evaluating life insurance applications using decision trees and logistic regression on customer demographic and financial data.

Technologies: Logistic Regression, Decision Trees, Feature Engineering

View GitHub

I explored structured customer datasets and engineered features such as income-to-age ratio and employment status. I then trained classification models with hyperparameter tuning and evaluated them using ROC AUC and confusion matrix analysis.

Implementation Process:

  • Processed structured demographic and financial data
  • Engineered key features like income-to-age ratio
  • Trained Decision Tree and Logistic Regression models
  • Evaluated with ROC AUC and confusion matrix
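
The feature and the two evaluation measures listed above can be illustrated with a small standard-library sketch. The function names and the pairwise AUC formulation are choices made for this example; the project itself may well have relied on library implementations.

```python
def income_to_age_ratio(income, age):
    """Engineered feature: income per year of age (age must be positive)."""
    if age <= 0:
        raise ValueError("age must be positive")
    return income / age


def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels encoded as 0/1."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    # Ties between a positive and a negative count as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive/negative pairs are ranked correctly, so `roc_auc` returns 0.75. The pairwise form is quadratic in the number of samples, which is fine for illustration but slow on large datasets.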

Resume

Download my latest resume below.

Download Resume View Resume (PDF)

Technical Skills

Soft Skills

Contact Me

Get in Touch

Open to new data science roles and collaborations. Reach out to discuss how I can add value to your team or project.