Research Intern

Posted on 11/5/2025

University Health Network

Compensation Overview

$25 - $30/hr

Toronto, ON, Canada

Hybrid

Job Description

Union: Non-Union
Site: Toronto General Hospital
Department: AI Collaborative Centre
Reports to: Dr. Bo Wang
Work Model: Hybrid
Hours: 20
Salary: 25.00 - 30.00 per hour
Status: Temporary Part Time
Closing Date: November 30, 2025

Position Summary 

Computational prediction of protein function remains a major challenge. High-throughput sequencing generates vast numbers of protein sequences, but only a small fraction have experimentally validated Gene Ontology (GO) annotations. The Critical Assessment of Functional Annotation (CAFA) 6 competition is the leading international benchmark for GO-based function prediction, analogous to the CASP challenge in structure prediction, which spurred breakthroughs such as AlphaFold. Unlike structure prediction, however, protein function prediction remains unsolved and is a key frontier in computational biology.

Large, self-supervised protein language models such as ESM-2 and ESM-3 have transformed representation learning by capturing evolutionary, biochemical, and structural semantics. Building on these advances, models such as InterLabelGO+, DPFunc, and PhiGnet have achieved strong benchmark performance in large-scale GO function prediction. Despite this progress, current methods still struggle to fuse diverse data modalities, capture the hierarchical structure of the GO, and generalize to rare or highly specific protein functions.

Duties

  • Curate, process, and integrate protein data from CAFA and public bioinformatics databases (e.g., UniProt, InterPro, PDB, Pfam, STRING)
  • Implement and fine-tune deep learning architectures (e.g., transformers, graph neural networks) using PyTorch for protein function prediction
  • Conduct ablation and benchmarking experiments to evaluate model generalization across organisms and rare functions
  • Collaborate with mentors to design, train, and validate models in an iterative development loop guided by the quantitative evaluation metrics of CAFA 6
  • Maintain reproducible workflows, version control, and thorough documentation of experiments and datasets
  • Contribute to competition reports, research abstracts, or manuscripts summarizing project outcomes

Qualifications

  • Must be 16 years of age or older, per UHN policy
  • Must be enrolled in an undergraduate or postgraduate program in Computer Science, Computational Biology, Biomedical Engineering, Data Science, or a related field
  • Strong programming skills in Python and experience implementing and training deep learning models in PyTorch
  • Background/experience with bioinformatics and/or computational biology
  • Familiarity with the transformer architecture and prior work with LLMs or model fine-tuning
  • Experience deploying or adapting models from GitHub/HuggingFace repositories
  • Working knowledge of bash, git, virtual environments, and ComputeCanada/SciNet or similar HPC systems
  • Excellent problem-solving skills and ability to work independently and in a team environment
  • Strong analytical and communication skills, with the ability to present research findings effectively
