January 2025

Diffuse & Deduce: A diffusion-inspired image game

https://github.com/weblab-class/diffuse-and-deduce

Diffuse & Deduce (D&D) is a quiz game that aims to help users broaden their knowledge of the world by learning new visual concepts. Inspired by diffusion models, the game presents noisy images that are gradually denoised over time. The player's goal is to guess the image as early as possible.

Noise is produced by adding a randomly sampled offset to every color channel of every pixel of the image, with the variance of offsets decreasing over time. The website uses React + Tailwind CSS for the frontend and Node.js for the backend, along with socket.io for real-time interactions between players.
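A minimal sketch of this noising scheme in NumPy (the linear decay schedule and the `max_sigma` constant are illustrative assumptions, not the project's actual values):

```python
import numpy as np

def add_noise(image: np.ndarray, t: float, max_sigma: float = 80.0) -> np.ndarray:
    """Add a random offset to every color channel of every pixel.

    image: uint8 array of shape (H, W, 3); t in [0, 1] is the fraction of
    round time elapsed. The offset variance shrinks as t grows, so the
    image sharpens over the course of a round.
    """
    sigma = max_sigma * (1.0 - t)                      # noise fades with time
    offsets = np.random.normal(0.0, sigma, image.shape)
    noisy = image.astype(np.float64) + offsets
    return np.clip(noisy, 0, 255).astype(np.uint8)     # stay in valid range
```

At t = 1 the variance reaches zero and the original image is recovered exactly.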

Built in collaboration with Saisneha Ghatti and Peter Lin. Selected as a semifinalist at MIT's web.lab web programming contest.

August 2024

Descry: OCR for low-resource writing systems

https://github.com/dariakryvosheieva/descry-ocr

I developed text recognition engines for two new alphabets: Adlam, used to write the Fula language spoken in West Africa, and Kayah Li, used for dialects of the Kayah continuum in Myanmar.

The pipeline for each alphabet consists of a CRAFT text detector, which detects bounding boxes of words in an image, and a CRNN sequence recognizer, which identifies the word (sequence of characters) in each bounding box. To train the models, I generated a total of 16,250,000 synthetic images by drawing random Unicode strings on different backgrounds with Pillow.
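The synthetic-data idea can be sketched as follows (a hypothetical simplification, not the project's pipeline; the Adlam Unicode block is U+1E900–U+1E943, and the font path in the comment is an assumption, since rendering Adlam requires a TTF that covers the script, such as Noto Sans Adlam):

```python
import random
from PIL import Image, ImageDraw

# Letters of the Adlam block, U+1E900..U+1E943.
ADLAM = [chr(cp) for cp in range(0x1E900, 0x1E944)]

def synth_sample(charset, font=None, size=(240, 64), max_len=8):
    """Return one (image, label) training pair: a random string from the
    target alphabet drawn on a randomly colored light background."""
    label = "".join(random.choices(charset, k=random.randint(1, max_len)))
    background = tuple(random.randint(180, 255) for _ in range(3))
    img = Image.new("RGB", size, background)
    ImageDraw.Draw(img).text((8, 8), label, fill=(0, 0, 0), font=font)
    return img, label

# Usage (font path is an assumption):
#   font = ImageFont.truetype("NotoSansAdlam-Regular.ttf", 40)
#   img, label = synth_sample(ADLAM, font=font)
```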

Users can interact with the models via a Flask app: they select an alphabet, upload an image containing text in that alphabet, and receive the recognized text.

December 2023

Tone.me: a Mandarin Chinese tone corrector

https://github.com/tone-me/tone-me

Tone.me is a web app that helps learners of Mandarin Chinese improve their pronunciation of tones.

It uses a Wav2Vec2 Transformer fine-tuned for tone classification to identify the tones in a speech recording and compare them to the correct tones of the words the user intended to say. The fine-tuning dataset consisted of syllables extracted from the AISHELL-3 corpus with the help of stable-ts. The app uses the Next.js web framework.
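The comparison step can be sketched like this (an illustrative assumption, not the project's actual code; tone labels 1–4 are the four Mandarin tones, 5 the neutral tone):

```python
def tone_feedback(predicted, reference):
    """Compare classifier-predicted tones against the reference tones of
    the intended words and report each mispronounced syllable.

    predicted, reference: lists of tone labels (ints 1-5), one per syllable.
    """
    feedback = []
    for i, (got, want) in enumerate(zip(predicted, reference)):
        if got != want:
            feedback.append(
                f"syllable {i + 1}: said tone {got}, expected tone {want}"
            )
    return feedback
```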

Built in collaboration with Riddhi Bhagwat, Chris Ge, Katherine Guo, and Anshul Gupta for the AIM Labs program at MIT.