Machine Learning and having it deep and structured
About
Implementations and homework assignments for the course Machine Learning and having it deep and structured at National Taiwan University (taught by Hung-yi Lee):
- Constructed and trained variants of neural networks with Theano
- Attempted to solve a sequence labeling problem in speech recognition (phoneme labeling)
- Deep Neural Network (DNN) with dropout, maxout and momentum optimization
- Bidirectional Recurrent Neural Network (RNN) with dropout and RMSProp optimization
- Bidirectional Long Short-Term Memory (LSTM) with peephole connections and NAG optimization
- Hidden Markov Model (HMM) on top of the RNN to improve performance
Course page
Syllabus
Neural Networks and Training:
- What is Machine Learning, Deep Learning and Structured Learning?
- Neural Network Basics | Backpropagation | Theano: DNN
- Tips for Training Deep Neural Network
- Neural Network with Memory | Theano: RNN
- Training Recurrent Neural Network
- Convolutional Neural Network (by Prof. Winston)
Structured Learning and Graphical Models:
- Introduction of Structured Learning | Structured Linear Model | Structured SVM
- Sequence Labeling Problem | Learning with Hidden Information
- Graphical Model, Gibbs Sampling
Extensions, New Applications and Trends:
- Markov Logic Network
- Deep Learning for Human Language Processing, Language Modeling
- Caffe | Deep Reinforcement Learning | Visual Question Answering
- Unsupervised Learning
- Attention-based Model
Content
Deep Neural Network (DNN)[kaggle]:
- Construct and train a deep neural network to classify pronunciation units (phonemes) in each time frame of an utterance.
- Inputs: MFCC features
- Activation function: Maxout (a generalization of ReLU; a "learnable" activation function)
- Output layer: Softmax
- Cost function: Cross entropy
- Optimization: Momentum
- With the Dropout technique (a minimal Theano sketch of a Maxout layer with momentum updates follows this list)
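
To make the pieces above concrete, here is a minimal, self-contained Theano sketch of a Maxout hidden layer feeding a Softmax output, trained with cross entropy and classical momentum. It is illustrative rather than the repo's actual code; the layer sizes, the 48-phoneme output, and the hyperparameters are assumptions.

```python
import numpy as np
import theano
import theano.tensor as T

rng = np.random.RandomState(42)
n_in, n_out, pieces = 39, 128, 2       # assumed sizes; 39 = common MFCC dim
n_classes, lr, mu = 48, 0.01, 0.9      # assumed phoneme count and hyperparameters

x = T.matrix('x')                      # (batch, n_in) MFCC frames
y = T.ivector('y')                     # phoneme label per frame

# Maxout layer: `pieces` linear projections per unit; output their elementwise max.
W = theano.shared(0.01 * rng.randn(n_in, n_out * pieces))
b = theano.shared(np.zeros(n_out * pieces))
z = (x.dot(W) + b).reshape((x.shape[0], n_out, pieces))
h = T.max(z, axis=2)                   # (batch, n_out)

# Softmax output layer with cross-entropy cost.
V = theano.shared(0.01 * rng.randn(n_out, n_classes))
p_y = T.nnet.softmax(h.dot(V))
cost = -T.mean(T.log(p_y)[T.arange(y.shape[0]), y])

# Classical momentum: keep a velocity per parameter.
params = [W, b, V]
velocities = [theano.shared(np.zeros_like(p.get_value())) for p in params]
updates = []
for p, v in zip(params, velocities):
    g = T.grad(cost, p)
    v_new = mu * v - lr * g
    updates += [(v, v_new), (p, p + v_new)]

train = theano.function([x, y], cost, updates=updates)
```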
Bidirectional Recurrent Neural Network (RNN)[kaggle]:
- Construct and train a bidirectional deep recurrent neural network to classify pronunciation units (phonemes) in each time frame of an utterance.
- Inputs: per-class prediction probabilities from the previous DNN
- Activation function: ReLU
- Output layer: Softmax
- Cost function: Mean Squared Error
- Optimization: Root Mean Square Propagation (RMSProp)
- With the Dropout technique (an RMSProp update sketch follows this list)
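
For reference, a minimal RMSProp update rule in Theano might look like the following. This is a sketch, not the repo's implementation; `cost` and `params` are assumed to come from a network definition like the one above. Each gradient is scaled by a running root-mean-square of its recent magnitudes, which keeps the step size stable across parameters.

```python
import numpy as np
import theano
import theano.tensor as T

def rmsprop_updates(cost, params, lr=0.001, rho=0.9, eps=1e-6):
    """Scale each gradient by a running RMS of its history."""
    updates = []
    for p in params:
        g = T.grad(cost, p)
        acc = theano.shared(np.zeros_like(p.get_value()))  # running mean of g^2
        acc_new = rho * acc + (1.0 - rho) * g ** 2
        updates.append((acc, acc_new))
        updates.append((p, p - lr * g / (T.sqrt(acc_new) + eps)))
    return updates
```

These updates plug into `theano.function(..., updates=rmsprop_updates(cost, params))` exactly like the momentum updates sketched earlier.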
Bidirectional Long Short-Term Memory (LSTM)[kaggle]:
- Construct and train a bidirectional deep Long Short-Term Memory network to classify pronunciation units (phonemes) in each time frame of an utterance.
- Inputs: per-class prediction probabilities from the previous DNN
- Optimization: Nesterov Accelerated Gradient (NAG)
- With peephole connections
- Uses grad_clip in Theano to prevent exploding gradients (see the sketch after this list)
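
theano.gradient.grad_clip is an identity in the forward pass and bounds the gradient in the backward pass. Below is a minimal sketch of applying it inside a recurrent step (a plain RNN step rather than a full LSTM, for brevity); the sizes and the [-1, 1] bounds are assumptions, not the repo's settings.

```python
import numpy as np
import theano
import theano.tensor as T
from theano.gradient import grad_clip

n_in, n_h = 48, 64                      # assumed sizes
rng = np.random.RandomState(0)
W_x = theano.shared(0.01 * rng.randn(n_in, n_h))
W_h = theano.shared(0.01 * rng.randn(n_h, n_h))
b = theano.shared(np.zeros(n_h))

x_seq = T.matrix('x_seq')               # (time, n_in), one utterance

def step(x_t, h_prev):
    # Identity in the forward pass; clips the backward gradient to
    # [-1, 1], taming gradients that explode through time.
    h_prev = grad_clip(h_prev, -1.0, 1.0)
    return T.tanh(x_t.dot(W_x) + h_prev.dot(W_h) + b)

h_seq, _ = theano.scan(step, sequences=x_seq,
                       outputs_info=T.zeros((n_h,)))
```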
Structured Learning (output phone label sequence)[kaggle]:
- On top of the RNN / LSTM outputs, apply a Hidden Markov Model (HMM) to model phone transition probabilities and further improve performance on this sequence labeling problem (a decoding sketch follows this list).
- Input: the whole utterance as one training example
- Output: phone label sequence
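
One standard way to combine per-frame network posteriors with HMM transition probabilities is Viterbi decoding. The NumPy sketch below is illustrative (the function and argument names are assumptions, not the repo's code): it finds the most likely phone sequence given frame log-posteriors, log transition probabilities, and log initial probabilities.

```python
import numpy as np

def viterbi(log_emit, log_trans, log_prior):
    """log_emit: (T, N) frame log-posteriors; log_trans: (N, N); log_prior: (N,)."""
    T_, N = log_emit.shape
    delta = log_prior + log_emit[0]          # best score ending in each phone
    back = np.zeros((T_, N), dtype=np.int64) # backpointers
    for t in range(1, T_):
        scores = delta[:, None] + log_trans  # (from, to)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = [int(delta.argmax())]
    for t in range(T_ - 1, 0, -1):           # trace back the best path
        path.append(int(back[t, path[-1]]))
    return path[::-1]                        # most likely phone sequence
```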
The performance is measured by Levenshtein distance (a.k.a. edit distance); a reference implementation is sketched below.
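
For reference, a standard dynamic-programming implementation of Levenshtein distance over phone sequences (a generic textbook version, not the course's scoring script):

```python
def levenshtein(a, b):
    """Minimum insertions, deletions, and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

# e.g. levenshtein("sil aa b".split(), "sil b".split()) == 1
```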
Usage
Clone the repo and set up a virtualenv:
git clone https://github.com/AaronYALai/Machine_Learning_and_Having_It_Deep_and_Structured
cd Machine_Learning_and_Having_It_Deep_and_Structured
virtualenv venv
source venv/bin/activate
Install all dependencies and run the model:
pip install -r requirements.txt
cd RNN_LSTM
python run_RNN.py