Python Text Classification

This sample shows how to:

The modules model_helpers, text_processing and azure_storage_helpers in this repo can be used independently.

Run the sample

Install the following libraries used if you don't already have them installed by running pip install {name of library}:
- spacy
- nltk
- azure-cosmos
- azure-storage-blob
- pandas
- sklearn
Make sure you have a CosmosDB database set up with data in it

Dataset is to come, but the expected format in this sample is:
```
{...
 pages:[{
        ...
        sections:[{
                  ...
                  text:''
                  label:''
                  },
                  ...
                  ]
       },
       ...
       ]
}
```
You can of course use your own data, in this case make some changes to the dataset building in model_train.py
Make sure you have an Azure Storage account set up
In model_train.py, replace the values in cosmosConfig and blobConfig with your own
Run python .\model_train.py
In get_predict.py, replace the values in blobConfig with your own
In get_predict.py, replace the text to get a prediction for
Run python .\get_predict.py

If you need to upload/retrieve files from CosmosDB or Azure Blob Storage in python, azure_storage_helpers.py is all you need
If you want to perform text processing (normalize text and remove stop words), you can use text_processing.py
If you want to train a classification model using sklearn, model_helpers is what you're looking for