The Data Science Lab


Matrix

Sentiment Classification of IMDB Movie Review Data Using a PyTorch LSTM Network

This demo from Dr. James McCaffrey of Microsoft Research of creating a prediction system for IMDB data using an LSTM network can be a guide to create a classification system for most types of text data.

Preparing IMDB Movie Review Data for NLP Experiments

Dr. James McCaffrey of Microsoft Research shows how to get the raw source IMDB data, read the movie reviews into memory, parse and tokenize the reviews, create a vocabulary dictionary and convert the reviews to a numeric form.

Convolutional Neural Networks for MNIST Data Using PyTorch

Dr. James McCaffrey of Microsoft Research details the "Hello World" of image classification: a convolutional neural network (CNN) applied to the MNIST digits dataset.

Preparing MNIST Image Data Text Files

Dr. James McCaffrey of Microsoft Research demonstrates how to fetch and prepare MNIST data for image recognition machine learning problems.

Speed Lines Graphic

Quantum-Inspired Annealing Using C# or Python

Dr. James McCaffrey of Microsoft Research explains a new idea that slightly modifies standard simulated annealing by borrowing ideas from quantum mechanics.

Chi-Square Test Using C#

A chi-square (also called chi-squared) test is a classical statistics technique that can be used to determine if observed-count data matches expected-count data.

Green Motherboard Closeup Graphic

How to Compute Transformer Architecture Model Accuracy

Dr. James McCaffrey of Microsoft Research uses the Hugging Face library to simplify the implementation of NLP systems using Transformer Architecture (TA) models.

Simulated Annealing Optimization Using C# or Python

Dr. James McCaffrey of Microsoft Research shows how to implement simulated annealing for the Traveling Salesman Problem (find the best ordering of a set of discrete items).

Swirl

How to Fine-Tune a Transformer Architecture NLP Model

The goal is sentiment analysis -- accept the text of a movie review (such as, "This movie was a great waste of my time.") and output class 0 (negative review) or class 1 (positive review).

How to Create a Transformer Architecture Model for Natural Language Processing

The goal is to create a model that accepts a sequence of words such as "The man ran through the {blank} door" and then predicts most-likely words to fill in the blank.

Red Brick Graphic

Anomaly Detection Using Principal Component Analysis (PCA)

The main advantage of using PCA for anomaly detection, compared to alternative techniques such as a neural autoencoder, is simplicity -- assuming you have a function that computes eigenvalues and eigenvectors.

Ordinal Classification Using PyTorch

Dr. James McCaffrey of Microsoft Research presents a simple technique he has used with good success, previously unpublished and without a standard name.

Nebula

Computing the Similarity Between Two Machine Learning Datasets

At first thought, computing the similarity/distance between two datasets sounds easy, but in fact the problem is extremely difficult, explains Dr. James McCaffrey of Microsoft Research.

Differential Evolution Optimization

Dr. James McCaffrey of Microsoft Research explains stochastic gradient descent (SGD) neural network training, specifically implementing a bio-inspired optimization technique called differential evolution optimization (DEO).

Wasserstein Distance Using C# and Python

Dr. James McCaffrey of Microsoft Research shows how to compute the Wasserstein distance and explains why it is often preferable to alternative distance functions, used to measure the distance between two probability distributions in machine learning projects.

Spiral Dynamics Optimization with Python

Dr. James McCaffrey of Microsoft Research explains how to implement a geometry-inspired optimization technique called spiral dynamics optimization (SDO), an alternative to Calculus-based techniques that may reach their limits with huge neural networks.

Sentiment Analysis Using a PyTorch EmbeddingBag Layer

Dr. James McCaffrey of Microsoft Research uses a full movie review example to explain the natural language processing (NLP) problem of sentiment analysis, used to predict whether some text is positive (class 1) or negative (class 0).

Logistic Regression Using PyTorch with L-BFGS

Dr. James McCaffrey of Microsoft Research demonstrates applying the L-BFGS optimization algorithm to the ML logistic regression technique for binary classification -- predicting one of two possible discrete values.

Generating Synthetic Data Using a Generative Adversarial Network (GAN) with PyTorch

Dr. James McCaffrey of Microsoft Research explains a generative adversarial network, a deep neural system that can be used to generate synthetic data for machine learning scenarios, such as generating synthetic males for a dataset that has many females but few males.

Positive and Unlabeled Learning (PUL) Using PyTorch

Dr. James McCaffrey of Microsoft Research provides a code-driven tutorial on PUL problems, which often occur with security or medical data in cases like training a machine learning model to predict if a hospital patient has a disease or not.

Blule Squares

Generating Synthetic Data Using a Variational Autoencoder with PyTorch

Generating synthetic data is useful when you have imbalanced training data for a particular class, for example, generating synthetic females in a dataset of employees that has many males but few females.

Autoencoder Anomaly Detection Using PyTorch

Dr. James McCaffrey of Microsoft Research provides full code and step-by-step examples of anomaly detection, used to find items in a dataset that are different from the majority for tasks like detecting credit card fraud.

How To: Create a Streaming Data Loader for PyTorch

When training data won't fit into machine memory, a streaming data loader using an internal memory buffer can help. Dr. James McCaffrey of Microsoft Research shows how.

Neural Regression Using PyTorch: Model Accuracy

Dr. James McCaffrey of Microsoft Research explains how to evaluate, save and use a trained regression model, used to predict a single numeric value such as the annual revenue of a new restaurant based on variables such as menu prices, number of tables, location and so on.

Black White Wave IMage

Neural Regression Using PyTorch: Training

The goal of a regression problem is to predict a single numeric value, for example, predicting the annual revenue of a new restaurant based on variables such as menu prices, number of tables, location and so on.

Subscribe on YouTube