Practical Foundations of Data Science in Ophthalmology

Course Description

This introductory course explores the foundations of data science and biomedical informatics with a focus on ophthalmology. Students will learn how to develop impactful research questions, conduct effective literature reviews, and navigate large-scale datasets such as national health surveys, All of Us, insurance claims data, and TriNetX. The course covers the structure of healthcare data, including clinical ontologies and ophthalmic imaging, and provides a practical overview of coding, cohort construction, and feature engineering. Emphasis is placed on best practices for reproducible and collaborative data analysis using tools like Jupyter Notebooks and Google Cloud Platform. The course concludes with guidance on writing scientific manuscripts and understanding the peer review process, equipping students with the skills to contribute meaningfully to the field of ophthalmic data science. No specific background in ophthalmology or coding is assumed or required: this is truly an introductory course that will meet researchers-in-training where they are. Learners who complete this course will be well-positioned to start building big data cohorts for research projects in either the machine learning or the traditional statistics domain.

This course is open to anyone affiliated with or doing research with a faculty member in the Stanford Department of Ophthalmology: medical students, residents, fellows, visiting fellows, visiting scholars, visiting student researchers, undergraduates, graduate students, etc.

Please email sywang@stanford.edu with specific questions.

Registration

Please follow this link to register for this course. The 2025 course has concluded, but recordings and course materials are available to those who register.

Course Schedule

Class Times: Tuesdays and Thursdays 11:30am - 12:30pm, September 2 - September 25, 2025 

Location: Spencer Vision Research Center, 2nd floor conference room. In-person attendance is the best way to make the most of the course, but Zoom links will be provided for those who have to miss some sessions.

Week 1: Framing the Research

  • From Hunch to Hypothesis: Designing the Right Research Questions and Reviewing the Literature

This talk will discuss how to develop research questions suitable for big data analyses, and how to review the scientific literature effectively.

  • Curated or Chaotic: Big Datasets for Eye Research

A lightning tour of many of the available big datasets for eye research, including national survey data, insurance claims data, multicenter registry data, electronic health records and more.

Week 2: Understanding the Structure of Healthcare Data

  • Data Wrangling Tools: SQL, Cloud Computing, and A Beginner's Roadmap

You CAN learn to code. At a minimum, you must know what you want to do with large interrelated data tables. This lecture gives an overview of data wrangling in relational databases using SQL and of basic cloud computing skills, including Colab and Nero-GCP. It also gives a beginner's roadmap for learning to program for data science.
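As a rough illustration of what working with interrelated tables looks like, here is a minimal sketch using Python's built-in sqlite3 module. The `patients` and `visits` tables, their columns, and all values are hypothetical and are not taken from the course materials.

```python
import sqlite3

# Hypothetical toy tables; names, columns, and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, birth_year INTEGER);
CREATE TABLE visits (visit_id INTEGER PRIMARY KEY, patient_id INTEGER, visit_date TEXT);
INSERT INTO patients VALUES (1, 1950), (2, 1975), (3, 1990);
INSERT INTO visits VALUES (10, 1, '2024-01-05'), (11, 1, '2024-03-12'), (12, 2, '2024-02-20');
""")

# Count visits per patient. LEFT JOIN keeps patients who have no visits,
# and COUNT(v.visit_id) ignores the NULLs that the join produces for them.
rows = conn.execute("""
SELECT p.patient_id, p.birth_year, COUNT(v.visit_id) AS n_visits
FROM patients AS p
LEFT JOIN visits AS v ON v.patient_id = p.patient_id
GROUP BY p.patient_id, p.birth_year
ORDER BY p.patient_id
""").fetchall()

for row in rows:
    print(row)
```

The join-then-aggregate pattern shown here is a common starting point when one table holds one row per patient and another holds many rows per patient.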

  • ICD, SNOMED, and the Very Particular Language of Healthcare Data

An introduction to the structure of healthcare data, including ontologies and data schemas - a can't-miss for those who haven't worked with "real-world" healthcare or EHR data before. Concludes with an exercise in querying the OMOP concept tables.
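To give a flavor of what querying a concept table can look like, here is a small sketch that is not the course's actual exercise. It builds an in-memory table with a subset of the columns of the OMOP `concept` table and searches it by name. The `concept_id` values are made-up placeholders, and the codes shown should be checked against the vocabulary release you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Subset of the OMOP CDM concept table columns.
conn.execute("""
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY,
    concept_name TEXT,
    domain_id TEXT,
    vocabulary_id TEXT,
    concept_code TEXT
)
""")

# concept_id values below are placeholders, not real OMOP identifiers.
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?, ?, ?)",
    [
        (900001, "Unspecified glaucoma", "Condition", "ICD10CM", "H40.9"),
        (900002, "Unspecified age-related cataract", "Condition", "ICD10CM", "H25.9"),
        (900003, "Glaucoma", "Condition", "SNOMED", "23986001"),
    ],
)

# Find every concept whose name mentions glaucoma, across vocabularies.
# SQLite's LIKE is case-insensitive for ASCII text by default.
matches = conn.execute("""
SELECT vocabulary_id, concept_code, concept_name
FROM concept
WHERE concept_name LIKE '%glaucoma%'
ORDER BY vocabulary_id
""").fetchall()

for vocabulary_id, concept_code, concept_name in matches:
    print(vocabulary_id, concept_code, concept_name)
```

The same clinical idea appears under different codes in different vocabularies, which is why lookups usually start from the concept table rather than from raw code strings.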

Week 3: Imaging and Data Preparation

  • Pixels and Perimetry: A Tour Through Ophthalmic Imaging

We will cover different forms of ophthalmic imaging and testing, such as fundus photography, OCT, and visual field testing, at a level useful to someone without much ophthalmic background. We will also cover working with these data types, including DICOM files and data formats, which will be useful to someone with more clinical background but who wants to get into data analysis with imaging.

  • Cohorts, Features, and Filters: Structuring Data for Analysis

Whether you have a machine learning project or a traditional biostats/epidemiology project, you will need to learn to prepare an analytic dataset with the patients and features/variables you're interested in. This session is a practical walk-through of how to do this.
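The general shape of that preparation (pick the patients, fix an index date, then derive features relative to it) can be sketched in plain Python. This is only an illustration under assumed inclusion criteria, not the course's walk-through. The records, field names, and criteria are all hypothetical.

```python
from datetime import date

# Hypothetical records; field names and values are illustrative only.
diagnoses = [
    {"patient_id": 1, "code": "H40.9", "date": date(2023, 5, 1)},
    {"patient_id": 1, "code": "E11.9", "date": date(2022, 1, 15)},
    {"patient_id": 2, "code": "H25.9", "date": date(2023, 7, 9)},
    {"patient_id": 3, "code": "H40.9", "date": date(2024, 2, 3)},
]
birth_years = {1: 1950, 2: 1975, 3: 2010}

# Step 1: index date = earliest glaucoma diagnosis (codes starting with H40).
index_dates = {}
for dx in diagnoses:
    if dx["code"].startswith("H40"):
        pid = dx["patient_id"]
        if pid not in index_dates or dx["date"] < index_dates[pid]:
            index_dates[pid] = dx["date"]

# Step 2: filter to adults at index (age approximated by year difference).
cohort = {
    pid: d for pid, d in index_dates.items()
    if d.year - birth_years[pid] >= 18
}

# Step 3: one row per patient, with features measured on or before index.
features = []
for pid, index_date in sorted(cohort.items()):
    has_diabetes = any(
        dx["patient_id"] == pid
        and dx["code"].startswith("E11")
        and dx["date"] <= index_date
        for dx in diagnoses
    )
    features.append({
        "patient_id": pid,
        "index_date": index_date,
        "age_at_index": index_date.year - birth_years[pid],
        "diabetes_before_index": int(has_diabetes),
    })

for row in features:
    print(row)
```

Restricting features to dates on or before the index date is one simple way to avoid using information that would not have been available at cohort entry.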

Week 4: Doing and Sharing the Work

  • Science that Scales: Organizing Your Code, Data, and Sanity

This lecture describes best practices for organizing a data analysis workflow, with the aim of reproducible and collaborative science. This involves organizing your code and datasets in a smart way.

  • Polish and Publish: Writing, Submitting, and Surviving Peer Review

This lecture will discuss how to write a paper, what's expected of a first author, and how the peer review process works.