makaato serumaga
Learner
Categories
Artificial intelligence

Skills

Algorithms (3), Bioinformatics (3), Data analysis (3), Data processing (3), Genomics (3), Machine learning (3), Adaptability (2), Performance appraisal (2), Research (2), Biological process (1), Computer science (1)

Recent projects

Re:Pair Genomics Inc.
Vaughan, Ontario, Canada

Bioinformatics Algorithm Enhancement with Machine Learning

Re:Pair Genomics Inc. is seeking to enhance its bioinformatics algorithms by integrating machine learning techniques. The project aims to improve the accuracy and efficiency of genomic data analysis, which is crucial for identifying genetic variations and understanding complex biological processes. Learners will apply their machine learning knowledge to develop and refine components of an existing algorithm used in genomic data processing. The project will involve analyzing existing datasets, identifying patterns, and implementing machine learning models to optimize algorithm performance. This initiative provides an opportunity for learners to bridge the gap between theoretical knowledge and practical application in the field of bioinformatics. The project is designed to be completed by a team of learners specializing in computer science or bioinformatics, ensuring a focused and cohesive approach.

Matches 1
Category Machine learning + 4
Closed
Re:Pair Genomics Inc.
Vaughan, Ontario, Canada

Enhancing Genomic Data Analysis with Machine Learning

Re:Pair Genomics Inc. is seeking to enhance its bioinformatics algorithms by integrating machine learning techniques to improve the accuracy and efficiency of genomic data analysis. The current algorithms, while effective, can benefit from the predictive power and adaptability of machine learning models. The project aims to identify specific areas within the existing bioinformatics pipeline where machine learning can be applied to optimize performance. Students will be tasked with researching and selecting appropriate machine learning models, training these models on existing genomic datasets, and evaluating their performance against current methods. The goal is to achieve a measurable improvement in data processing speed and accuracy, ultimately contributing to more precise genomic interpretations.

Matches 1
Category Artificial intelligence + 2
Closed
Re:Pair Genomics Inc.
Vaughan, Ontario, Canada

Enhancing Genomic Data Analysis with Machine Learning Part 2

Re:Pair Genomics Inc. is seeking to enhance its bioinformatics algorithms by integrating machine learning techniques to improve the accuracy and efficiency of genomic data analysis. The current algorithms, while effective, can benefit from the predictive power and adaptability of machine learning models. The project aims to identify specific areas within the existing bioinformatics pipeline where machine learning can be applied to optimize performance. Students will be tasked with researching and selecting appropriate machine learning models, training these models on existing genomic datasets, and evaluating their performance against current methods. The goal is to achieve a measurable improvement in data processing speed and accuracy, ultimately contributing to more precise genomic interpretations.

Matches 1
Category Artificial intelligence + 2
Open

Personal projects

Math Visualization Assistant
February 2025 - Current
https://github.com/makaato53/RAG-project

Overview

This project implements a dual-function Retrieval-Augmented Generation (RAG) assistant that bridges the gap between complex theoretical research and practical implementation. It helps researchers and developers understand ML/AI research papers and implement them with CUDA-optimized code, and it breaks down complex mathematical topics into digestible, visually engaging animated explanations.

By combining paper comprehension, advanced mathematical summarization, and Manim-based animation with CUDA documentation retrieval, the system provides a comprehensive tool for both academic exploration and practical GPU programming.

Features

Scientific Paper & Math Processing:

Processes academic papers or math-heavy texts (PDF or plain text) to extract and preserve technical details and mathematical notation.
Uses advanced text chunking and embedding techniques to maintain context and isolate key concepts.
Generates concise summaries that break down complex math topics into step-by-step instructions suitable for visualization.
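
The chunking step described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function name `chunk_text`, the word-based splitting, and the default sizes are all assumptions. Overlapping windows are one common way to keep context across chunk boundaries.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks; neighbors share `overlap` words.

    Hypothetical helper: the sizes and the whitespace tokenization are
    illustrative assumptions, not values taken from the project.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_text(sample, chunk_size=4, overlap=2):
        print(chunk)
```

Splitting on whitespace keeps individual tokens such as `\frac{a}{b}` intact but normalizes line breaks, so a real pipeline would likely need a splitter that is aware of equation boundaries.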

Manim-Based Animation Generation:

Automatically converts extracted mathematical concepts into detailed Manim animations.
Produces scene-by-scene breakdowns (for example, plotting functions, highlighting transformations, and visualizing derivations) to bring complex concepts to life in a style reminiscent of 3Blue1Brown.
Allows interactive adjustments of visualization parameters (such as domain, range, or color highlights).
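
One way to picture the animation step is a small generator that turns a plot description into Manim scene source. The sketch below is an assumption about how such a step could look: `PlotSpec`, `SCENE_TEMPLATE`, and `render_scene_source` are hypothetical names, and the emitted script targets the Manim Community API (`Axes`, `Axes.plot`, `Create`). The adjustable fields mirror the domain, range, and color parameters mentioned above.

```python
from dataclasses import dataclass


@dataclass
class PlotSpec:
    """Hypothetical description of one function plot taken from a summary."""
    expression: str                              # Python expression in x, e.g. "x**2"
    x_range: tuple[float, float] = (-3.0, 3.0)   # plotted domain
    y_range: tuple[float, float] = (-1.0, 9.0)   # visible range
    color: str = "BLUE"                          # name of a Manim color constant


# Template for a Manim Community scene; rendering it requires manim installed.
SCENE_TEMPLATE = '''from manim import *


class {class_name}(Scene):
    def construct(self):
        axes = Axes(x_range=[{x0}, {x1}], y_range=[{y0}, {y1}])
        graph = axes.plot(lambda x: {expression}, color={color})
        self.play(Create(axes))
        self.play(Create(graph))
        self.wait()
'''


def render_scene_source(spec: PlotSpec, class_name: str = "FunctionPlot") -> str:
    """Return Python source for a scene that draws the axes, then the graph."""
    if not class_name.isidentifier():
        raise ValueError("class_name must be a valid Python identifier")
    return SCENE_TEMPLATE.format(
        class_name=class_name,
        x0=spec.x_range[0], x1=spec.x_range[1],
        y0=spec.y_range[0], y1=spec.y_range[1],
        expression=spec.expression,
        color=spec.color,
    )


if __name__ == "__main__":
    print(render_scene_source(PlotSpec("x**2", color="YELLOW")))
```

Saving the output as `scene.py` and running `manim -pql scene.py FunctionPlot` should render a low-quality preview. Because the expression is pasted into generated code, LLM-produced expressions should be reviewed or validated before the script is executed.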

CUDA Implementation Guidance:

Searches through CUDA documentation and NVIDIA best practices to provide targeted strategies for implementing and optimizing ML/AI algorithms on GPUs.
Utilizes vector search with FAISS to retrieve the most relevant documentation and presents contextually accurate solutions using LLM-generated responses.
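
The retrieval idea can be illustrated without FAISS. The sketch below is a pure-Python stand-in, not the project's implementation: it ranks documents by cosine similarity over a toy bag-of-words embedding. An exact inner-product index such as FAISS's `IndexFlatIP` over L2-normalized vectors produces the same kind of ranking at scale. The function names and the sample snippets are made up for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query: str, docs: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Return the k documents most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


if __name__ == "__main__":
    # Placeholder snippets, not quotations from the CUDA documentation.
    docs = [
        "shared memory reduces global memory traffic",
        "streams overlap kernel execution and data transfer",
        "warp divergence slows branching kernels",
    ]
    for score, doc in top_k("how to use shared memory", docs, k=1):
        print(f"{score:.3f}  {doc}")
```

The retrieved passages would then be handed to the language model as context, which is why the quality of the embedding matters more than the search structure for small document sets.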

LLM Integration:

Leverages Hugging Face's FLAN-T5 model for generating both context-aware paper summaries and detailed Manim scene instructions.
Ensures that responses are both theoretically sound and practically applicable for GPU programming and mathematical visualization.
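
A sketch of how retrieved passages might be packed into a prompt for an instruction-tuned model such as FLAN-T5 follows. The helper `build_prompt`, the prompt wording, and the character budget are assumptions made for illustration; the character budget is only a crude stand-in for a token limit.

```python
def build_prompt(question: str, passages: list[str], max_chars: int = 1500) -> str:
    """Pack numbered passages and the question into one prompt string.

    Hypothetical helper: passages are added in order until the next one
    would exceed the character budget.
    """
    header = "Answer the question using only the context.\n\nContext:\n"
    footer = f"\n\nQuestion: {question}\nAnswer:"
    budget = max_chars - len(header) - len(footer)
    kept = []
    for i, passage in enumerate(passages, start=1):
        entry = f"[{i}] {passage.strip()}\n"
        if len(entry) > budget:
            break  # stop at the first passage that does not fit
        kept.append(entry)
        budget -= len(entry)
    return header + "".join(kept).rstrip("\n") + footer


if __name__ == "__main__":
    prompt = build_prompt(
        "What is a warp?",
        ["A warp is a group of 32 threads.", "x" * 5000],
    )
    print(prompt)
```

The resulting string could be passed to a Hugging Face `transformers` text2text-generation pipeline loaded with a FLAN-T5 checkpoint (for example `google/flan-t5-base`; the checkpoint size used by the project is not stated here).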

Project Structure

project/
├── notebooks/
│   ├── 01_data_preparation.ipynb     # PDF extraction, CUDA & math document processing
│   ├── 02_embeddings.ipynb           # Generating vector embeddings for papers and CUDA docs
│   ├── 03_model_testing.ipynb        # Testing response generation and animation pipelines
│   └── 04_quality_improvement.ipynb  # Optimizing output quality for responses and animations
├── app.py                            # Streamlit web application integrating both functionalities
├── data/                             # Datasets (e.g., paper PDFs, math text, annotated Manim prompts)
├── src/
│   ├── doc_processing/               # Code for processing papers and math content
│   ├── visualization/                # Manim pipeline and math visualization tools
│   ├── embeddings/                   # Embedding generation and FAISS vector search
│   ├── models/                       # LLM and response generation models
│   └── utils/                        # Utility functions and helpers
├── requirements.txt                  # Project dependencies
└── Dockerfile                        # Container configuration for deployment