刘凡 3282579ec1 first commit 2 years ago
.gitignore 3282579ec1 first commit 2 years ago
README.md 3282579ec1 first commit 2 years ago
azure_storage_helpers.py 3282579ec1 first commit 2 years ago
get_predict.py 3282579ec1 first commit 2 years ago
model_helpers.py 3282579ec1 first commit 2 years ago
model_train.py 3282579ec1 first commit 2 years ago
text_processing.py 3282579ec1 first commit 2 years ago

README.md

Python Text Classification

This sample shows how to:

  • Build a dataset with contents retrieved from CosmosDB
  • Train a text classification model in Python using scikit-learn (sklearn) and spaCy
  • Use a model stored in Azure Blob Storage to get a prediction

The modules model_helpers, text_processing and azure_storage_helpers in this repo can be used independently.

Run the sample

  1. Install the following libraries, if you don't already have them, by running pip install {name of library} for each:

    • spacy
    • nltk
    • azure-cosmos
    • azure-storage-blob
    • pandas
    • scikit-learn (imported as sklearn)
  2. Make sure you have a CosmosDB database set up with data in it

    A sample dataset is not yet available, but the format this sample expects is:

    {...
     pages:[{
            ...
            sections:[{
                      ...
                      text:'',
                      label:''
                      },
                      ...
                      ]
           },
           ...
           ]
    }
    

    You can of course use your own data; in that case, adapt the dataset-building code in model_train.py.

  3. Make sure you have an Azure Storage account set up

  4. In model_train.py, replace the values in cosmosConfig and blobConfig with your own

  5. Run python .\model_train.py

  6. In get_predict.py, replace the values in blobConfig with your own

  7. In get_predict.py, replace the sample text with the text you want a prediction for

  8. Run python .\get_predict.py
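
The nested document format shown in step 2 has to be flattened into (text, label) rows before a classifier can be trained on it. The following is a minimal sketch of that idea, not the code in model_train.py. The function name `flatten_sections` and the sample document are hypothetical, and the sketch assumes each section carries `text` and `label` keys as in the format above.

```python
from typing import Any, Dict, Iterable, List, Tuple


def flatten_sections(documents: Iterable[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Collect (text, label) pairs from documents shaped like the format in step 2.

    Hypothetical helper for illustration; not part of this repo.
    """
    rows: List[Tuple[str, str]] = []
    for document in documents:
        for page in document.get("pages", []):
            for section in page.get("sections", []):
                text = section.get("text", "")
                label = section.get("label", "")
                # Skip sections missing either field; they cannot be used for training.
                if text and label:
                    rows.append((text, label))
    return rows


# Made-up sample document, only to show the expected shape.
sample_documents = [
    {
        "pages": [
            {
                "sections": [
                    {"text": "Invoice total: 120 EUR", "label": "billing"},
                    {"text": "", "label": "billing"},
                ]
            },
            {"sections": [{"text": "Contact us by email", "label": "contact"}]},
        ]
    }
]

rows = flatten_sections(sample_documents)
```

If pandas is installed, the resulting list can be turned into a table with `pandas.DataFrame(rows, columns=["text", "label"])`.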

Use the modules independently

  • If you need to upload/retrieve files from CosmosDB or Azure Blob Storage in Python, azure_storage_helpers.py is all you need

  • If you want to perform text processing (normalize text and remove stop words), you can use text_processing.py

  • If you want to train a classification model using sklearn, model_helpers.py is what you're looking for
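
The text processing mentioned above (normalizing text and removing stop words) can be sketched with the standard library alone. This is an illustrative sketch, not the contents of text_processing.py, which may rely on spaCy or nltk instead. The function `normalize_text` and the tiny `STOP_WORDS` set are assumptions made for this example.

```python
import re
import unicodedata

# Tiny illustrative stop word list (assumption); lists from nltk or spaCy are much larger.
STOP_WORDS = frozenset({"a", "an", "and", "in", "is", "of", "the", "to"})


def normalize_text(text: str) -> str:
    """Lowercase, strip accents and punctuation, and drop stop words.

    Hypothetical helper for illustration; not part of this repo.
    """
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep only runs of ASCII letters and digits as tokens.
    tokens = re.findall(r"[a-z0-9]+", without_accents.lower())
    return " ".join(token for token in tokens if token not in STOP_WORDS)


cleaned = normalize_text("The café is in the CENTER of town!")
print(cleaned)  # cafe center town
```

Applying the same normalization at training time and at prediction time keeps the model's input consistent between model_train.py and get_predict.py.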