Allison Hegel

I'm an AI Resident at Microsoft Research specializing in natural language processing and text generation, with previous experience as a Data Scientist at Apple and HackerRank. I've applied machine learning, deep learning, and natural language processing in both industry and research settings to better understand user behavior and preferences, using Python, PyTorch, and SQL. I have presented results and given trainings at 14 international conferences and universities, taught 9 classes, and won 19 grants and awards.

Recent Work

Image of a chart comparing multiple tone datasets with examples.

Tone Classification and Rewrite Suggestion

As an AI Resident at Microsoft Research, I am collaborating with Office teams to build a neural system to detect impolite language and offer rewritten alternatives.


Image of a model diagram for recipe research project.

Text Generation and Content Transfer

As an AI Resident at Microsoft Research, I developed deep learning models that rewrite recipes to satisfy dietary constraints, using GPT-2 fine-tuned on over 1.2 million recipes. I also implemented state-of-the-art models for comparison, including BERT, PPLM, and CTRL. This research was submitted to EMNLP.


Image of sentence encoding method comparison tables.

Comparison of Sentence Encoding Methods

This project compares several methods for computing sentence similarity, including Jaccard similarity, TF-IDF, GloVe, BERT, and RoBERTa. Jaccard similarity achieves the highest precision, but only with a strict threshold that aligns a small portion of the data; the neural methods offer a good balance between precision and the percentage of data aligned.
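A minimal sketch of the Jaccard baseline, assuming sentences are compared as sets of lowercase word tokens and each source sentence is aligned to its best-scoring candidate only when the score clears a threshold. The function names, toy sentences, and the 0.5 threshold are illustrative, not the project's actual settings. Raising the threshold keeps only near-identical pairs, which is how Jaccard trades alignment coverage for precision.

```python
import re


def tokens(sentence: str) -> set[str]:
    """Lowercase word tokens as a set (punctuation is dropped)."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A & B| / |A | B| over token sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def align(sources: list[str], targets: list[str], threshold: float = 0.5):
    """Pair each source with its most similar target, or with None when
    no target reaches the threshold."""
    pairs = []
    for s in sources:
        best = max(targets, key=lambda t: jaccard(s, t))
        score = jaccard(s, best)
        pairs.append((s, best if score >= threshold else None, score))
    return pairs


# Toy data for illustration only.
sources = ["Preheat the oven to 350 degrees.", "Stir in the sugar slowly."]
targets = ["Preheat oven to 350 degrees.", "Whisk the eggs."]
pairs = align(sources, targets, threshold=0.5)
aligned = sum(1 for _, t, _ in pairs if t is not None)
print(f"aligned {aligned}/{len(pairs)} sentences")
```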


Apple Crash Prediction

As a data scientist at Apple, I developed machine learning models to predict crash rates of new software releases on billions of devices using Python (sklearn) and Hadoop.


Example chart from HackerRank Candidate Feedback App.

HackerRank Candidate Feedback App

As a data scientist at HackerRank, I used Python (sklearn) and Django to create an internal web app that uses machine learning to automatically tag customer feedback comments, route them to the relevant department, and display trends updated hourly.
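The tag-then-route idea can be sketched as a text classifier followed by a lookup table. This is a hedged illustration: it uses a from-scratch multinomial Naive Bayes model in place of whichever sklearn model the app used (not specified here), and the tags, departments, and training comments are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical mapping from predicted tag to the department that should see it.
ROUTES = {"billing": "Finance", "bug": "Engineering", "ui": "Design"}


def words(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTagger:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(words(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for w in words(text):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[w] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical labeled feedback comments.
train = [
    ("I was charged twice on my invoice", "billing"),
    ("refund for the duplicate payment please", "billing"),
    ("the test crashed with an error when I hit submit", "bug"),
    ("code editor freezes and throws an error", "bug"),
    ("the button colors are hard to read", "ui"),
    ("font is too small on the dashboard layout", "ui"),
]
tagger = NaiveBayesTagger().fit([t for t, _ in train], [l for _, l in train])
comment = "Got an error and the page crashed"
tag = tagger.predict(comment)
print(tag, "->", ROUTES[tag])
```

Keeping routing as a separate table means departments can be reassigned without retraining the classifier.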


Image of the HackerRank Test Health Dashboard.

HackerRank Test Health Dashboard

As a data scientist at HackerRank, I developed the metrics and deployed the data pipeline for a major product launch, a dashboard that helps customers understand how effectively they are using the product and make data-driven improvements. The data pipeline used Python, MySQL, Redshift, and Airflow.


Slides describing Game Developer Dashboard web app project.

Game Developer Dashboard

A web app that helps game developers design their marketing strategy, powered by a logistic regression model in Python (sklearn) that predicts whether a video game will be successful with 90% accuracy and identifies the language reviewers use to talk about successful video games. I deployed the model as an interactive web app using Flask, Bootstrap, and Bokeh.
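One way a logistic regression model can both predict success and surface the language tied to it: train on bag-of-words review features, then read off the words with the largest positive weights. The sketch below is an assumption-laden illustration, not the app's model; it uses plain-Python batch gradient descent instead of sklearn, and the reviews and labels are invented.

```python
import math
import re


def featurize(text, vocab):
    """Binary bag-of-words vector over a fixed vocabulary."""
    present = set(re.findall(r"[a-z']+", text.lower()))
    return [1.0 if w in present else 0.0 for w in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the average log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Invented reviews: 1 = successful game, 0 = unsuccessful.
reviews = [
    ("addictive gameplay and a gorgeous soundtrack", 1),
    ("addictive and polished, great story", 1),
    ("gorgeous world, polished controls", 1),
    ("buggy and repetitive, crashes often", 0),
    ("repetitive grind with clunky controls", 0),
    ("buggy menus and a boring story", 0),
]
vocab = sorted({w for text, _ in reviews for w in re.findall(r"[a-z']+", text.lower())})
X = [featurize(text, vocab) for text, _ in reviews]
y = [label for _, label in reviews]
w, b = train_logreg(X, y)

# Words with the largest positive weights are most associated with success.
top = sorted(zip(vocab, w), key=lambda p: p[1], reverse=True)[:3]
print([word for word, _ in top])
```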


Box and whisker plot of genre classification accuracy.

Classifying Genre with Machine Learning

I used a support vector machine (SVM) in Python (sklearn) to classify Goodreads book reviews by genre, revealing the vocabulary associated with each genre in user reviews and trends in word use over time. This work was part of my dissertation project.
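A linear SVM exposes genre vocabulary through its feature weights. As a rough sketch, the code below trains a one-vs-rest linear SVM for a single genre with the Pegasos stochastic sub-gradient method (a swapped-in, stdlib-only stand-in for sklearn's `LinearSVC`) and lists the highest-weighted words. The genres, snippets, and hyperparameters are hypothetical.

```python
import random
import re
from collections import Counter


def bag(text):
    """Word-count features for one review."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def pegasos(examples, lam=0.01, steps=2000, seed=0):
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos
    sub-gradient steps. examples: list of (features, label in {+1, -1})."""
    rng = random.Random(seed)
    w = {}
    for t in range(1, steps + 1):
        x, y = rng.choice(examples)
        eta = 1.0 / (lam * t)
        margin = y * sum(w.get(f, 0.0) * v for f, v in x.items())
        # Shrink every weight (the L2 regularization step).
        for f in w:
            w[f] *= 1 - eta * lam
        # Hinge-loss step only for examples inside the margin.
        if margin < 1:
            for f, v in x.items():
                w[f] = w.get(f, 0.0) + eta * y * v
    return w


# Hypothetical review snippets labeled by genre.
reviews = [
    ("a chilling murder and a clever detective", "mystery"),
    ("the detective uncovers the killer", "mystery"),
    ("dragons, magic, and an epic quest", "fantasy"),
    ("a wizard learns forbidden magic", "fantasy"),
    ("a sweet romance with a swoony ending", "romance"),
    ("the lovers reunite in a heartfelt romance", "romance"),
]
target = "mystery"
examples = [(bag(text), 1 if genre == target else -1) for text, genre in reviews]
w = pegasos(examples)
top_words = sorted(w, key=w.get, reverse=True)[:5]
print(f"words most indicative of {target}:", top_words)
```

Repeating this for each genre (one-vs-rest) yields a word list per genre.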


Scatterplot with genre clusters.

Clustering Genres with Machine Learning

I used the t-SNE algorithm in Python (sklearn) to project high-dimensional, sparse textual data derived from Goodreads users' shelving activity into two dimensions, revealing clusters that correspond to latent genres and better represent user activity than traditional product categories. This work was part of my dissertation project.


Sparklines showing character presence over time in plot summaries from three sources.

Extracting Events from Plot Summaries with NLP

I used natural language processing to identify characters and plot events in book reviews, revealing the themes reviewers care about most. I conducted the analysis in Python, using Stanford CoreNLP to parse sentence grammar. This work was part of my dissertation project.


Supervised Topic Classification

I used a support vector machine (SVM) in Python (sklearn) to classify the topic of book review sentences, identifying what reviewers are most concerned about on different websites and within different product categories. This work was part of my dissertation project.


BookSplice logo.

BookSplice Recommendation App

On a team of 5 developers, we created a website that recommends books to users based on sentiment analysis of book reviews. I scraped and cleaned the book review database and implemented the sentiment analysis code using Python (Scrapy and NLTK) and SQL.
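A minimal sketch of ranking books by review sentiment, assuming a tiny hand-built polarity lexicon in place of the NLTK-based code the team used (which is not detailed here). Each review scores (positive hits minus negative hits) divided by its word count, and books are ranked by their mean review score. The lexicon, titles, and reviews are hypothetical.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical polarity lexicon; a real system would use a much larger one.
POSITIVE = {"loved", "beautiful", "gripping", "wonderful"}
NEGATIVE = {"boring", "dull", "hated", "confusing"}


def sentiment(review: str) -> float:
    """(positive hits - negative hits) / number of words, in [-1, 1]."""
    words = re.findall(r"[a-z']+", review.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


def rank_books(reviews):
    """reviews: iterable of (book_title, review_text).
    Returns titles ordered by mean sentiment, highest first."""
    scores = defaultdict(list)
    for book, text in reviews:
        scores[book].append(sentiment(text))
    return sorted(scores, key=lambda b: mean(scores[b]), reverse=True)


reviews = [
    ("Book A", "A gripping, beautiful story. Loved it."),
    ("Book A", "Wonderful characters"),
    ("Book B", "Dull and confusing plot"),
    ("Book B", "Beautiful cover, boring book"),
]
print(rank_books(reviews))  # → ['Book A', 'Book B']
```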


Chart of collocation data.

Text Analysis with Collocations

I wrote a Python script to find the most common collocations in a text. I walked through implementing the analysis with a class of undergraduates as part of "Collocations Analysis: A Hands-On Workshop," which I ran with Matthew Lavin at the University of Pittsburgh on April 7, 2017.
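One common way to rank collocations is to count adjacent word pairs and score them by pointwise mutual information, PMI(x, y) = log2(P(x, y) / (P(x) P(y))), discarding rare pairs. This sketch is not necessarily the workshop script's method; the tokenizer, the `min_count` cutoff, and the sample text are assumptions.

```python
import math
import re
from collections import Counter


def collocations(text, min_count=2, top_n=5):
    """Rank adjacent word pairs (bigrams) by PMI, keeping only pairs seen
    at least min_count times so one-off pairs don't dominate."""
    words = re.findall(r"[a-z']+", text.lower())
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    n_words = len(words)
    n_pairs = max(n_words - 1, 1)
    scored = []
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_pairs
        p_x, p_y = unigrams[x] / n_words, unigrams[y] / n_words
        scored.append(((x, y), round(math.log2(p_xy / (p_x * p_y)), 2), count))
    # Highest PMI first; ties keep first-seen order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


sample = "The white whale swam. The white whale dove. The sea was calm and the sea was cold."
result = collocations(sample)
for pair, pmi, count in result:
    print(" ".join(pair), pmi, count)
```

Frequent function words such as "the" lower a pair's PMI, which is why content-word pairs rise to the top.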


Sunburst visualization of Amazon product categories.

Amazon Product Categories Data Visualization

I designed an interactive sunburst chart of Amazon product categories using D3.js to display the magnitude of each category and sub-category of books sold on the site.
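D3 sunburst charts are usually drawn by passing nested JSON through `d3.hierarchy` and `d3.partition`. As a hedged sketch, assuming category data arrives as slash-separated paths with a book count per leaf (a hypothetical input format), this Python helper builds that nesting; the `name`/`children`/`value` keys follow common D3 example conventions, not necessarily the original chart's.

```python
import json


def build_hierarchy(rows, root_name="Books"):
    """rows: iterable of (path, count), e.g. ("Books/Fiction/Mystery", 120).
    Returns nested {"name", "children"} dicts with "value" on leaves."""
    root = {"name": root_name, "children": []}
    for path, count in rows:
        parts = path.split("/")
        if parts and parts[0] == root_name:  # skip a repeated root segment
            parts = parts[1:]
        node = root
        for part in parts:
            child = next((c for c in node["children"] if c["name"] == part), None)
            if child is None:
                child = {"name": part, "children": []}
                node["children"].append(child)
            node = child
        node["value"] = node.get("value", 0) + count
    return prune(root)


def prune(node):
    """Drop empty "children" lists so leaves are plain {"name", "value"}."""
    if node["children"]:
        node["children"] = [prune(c) for c in node["children"]]
    else:
        del node["children"]
    return node


# Hypothetical category counts.
rows = [
    ("Books/Fiction/Mystery", 120),
    ("Books/Fiction/Romance", 80),
    ("Books/Nonfiction/History", 50),
]
tree = build_hierarchy(rows)
print(json.dumps(tree, indent=2))
```

On the D3 side, `d3.hierarchy(tree).sum(d => d.value)` then aggregates leaf values up to each category ring.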


Chart of max features and RMSE for Yelp project.

Predicting Star Ratings on Yelp

Which business attributes have the greatest impact on a Yelp business' star rating? Using a variety of regression models in Python (sklearn), I was able to predict a business' star rating to within about 0.75 stars, given only its business attributes on Yelp.
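To make an error measured in stars concrete, this hedged sketch fits an ordinary least-squares linear regression with plain-Python batch gradient descent on binary business attributes, then reports root-mean-squared error (RMSE) against a predict-the-mean baseline. The attribute names and ratings are invented, and this is not the project's exact model.

```python
import math

# Hypothetical binary attributes: [accepts_credit_cards, has_wifi, offers_delivery]
X = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
    [0, 0, 0],
]
y = [4.5, 3.5, 3.0, 4.5, 4.0, 2.5]  # star ratings (toy data)


def fit_linear(X, y, lr=0.1, epochs=5000):
    """Least-squares linear regression via batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        residuals = [sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
                     for xi, yi in zip(X, y)]
        # Update all weights from the same residuals (simultaneous step).
        w = [wj - lr * sum(r * xi[j] for r, xi in zip(residuals, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(residuals) / n
    return w, b


def rmse(y_true, y_pred):
    """Root-mean-squared error, in the same units as the ratings (stars)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


w, b = fit_linear(X, y)
preds = [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]
baseline = [sum(y) / len(y)] * len(y)
print("model RMSE:", round(rmse(y, preds), 3))
print("mean-baseline RMSE:", round(rmse(y, baseline), 3))
```

A real evaluation would report RMSE on held-out businesses rather than on the training data.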

Resume

Experience



Skills

Code

Python, SQL, Git, HTML, CSS

Machine Learning

PyTorch, scikit-learn, pandas, NumPy, spaCy, CoreNLP, NLTK, Scrapy, Jupyter Notebooks

Data Deployment

Docker, Airflow, Hadoop, Django, Flask, Tableau



Education

UCLA
PhD in English, with a focus on natural language processing and machine learning

Program ranked #6 in field
Dissertation: Using machine learning and natural language processing to better understand readers' online behavior and preferences by analyzing 400K+ book reviews

University of Chicago
BA, double major with honors, English and Fundamentals

Phi Beta Kappa
Varsity Soccer: Mary Jean Mulvaney Scholar Athlete Award (awarded annually to one female varsity athlete), UAA Champions, 3x NCAA Tournament Bid



Speaking



Grants & Awards



Teaching & Workshops

Website Design: HTML5 UP