get started with trainable classifiers - microsoft 365 compliance | microsoft docs
A Microsoft 365 trainable classifier is a tool you can train to recognize various types of content by giving it samples to look at. Once trained, you can use it to identify item for application of Office sensitivity labels, Communications compliance policies, and retention label policies.
Creating a custom trainable classifier first involves giving it samples that are human picked and positively match the category. Then, after it has processed those, you test the classifiers ability to predict by giving it a mix of positive and negative samples. This article shows you how to create and train a custom classifier and how to improve the performance of custom trainable classifiers and pre-trained classifiers over their lifetime through retraining.
Opt-in is required the first time for trainable classifiers. It takes twelve days for Microsoft 365 to complete a baseline evaluation of your organizations content. Contact your global administrator to kick off the opt-in process.
When you want a trainable classifier to independently and accurately identify an item as being in particular category of content, you first have to present it with many samples of the type of content that are in the category. This feeding of samples to the trainable classifier is known as seeding. Seed content is selected by a human and is judged to represent the category of content.
You need to have at least 50 positive samples and as many as 500. The trainable classifier will process up to the 500 most recent created samples (by file created date/time stamp). The more samples you provide, the more accurate the predictions the classifier will make.
Once the trainable classifier has processed enough positive samples to build a prediction model, you need to test the predictions it makes to see if the classifier can correctly distinguish between items that match the category and items that don't. You do this by selecting another, hopefully larger, set of human picked content that consists of samples that should fall into the category and samples that won't. You should test with different data than the initial seed data you first provided. Once it processes those, you manually go through the results and verify whether each prediction is correct, incorrect, or you aren't sure. The trainable classifier uses this feedback to improve its prediction model.
Collect between 50-500 seed content items. These must be only samples that strongly represent the type of content you want the trainable classifier to positively identify as being in the classification category. See, Default crawled file name extensions and parsed file types in SharePoint Server for the supported file types.
Make sure the items in your seed set are strong examples of the category. The trainable classifier initially builds its model based on what you seed it with. The classifier assumes all seed samples are strong positives and has no way of knowing if a sample is a weak or negative match to the category.
Within 24 hours the trainable classifier will process the seed data and build a prediction model. The classifier status is In progress while it processes the seed data. When the classifier is finished processing the seed data, the status changes to Need test items.
Collect at least 200 test content items (10,000 max) for best results. These should be a mix of items that are strong positives, strong negatives and some that are a little less obvious in their nature. See, Default crawled file name extensions and parsed file types in SharePoint Server for the supported file types.
When the trainable classifier is done processing your test files, the status on the details page will change to Ready to review. If you need to increase the test sample size, choose Add items to test and allow the trainable classifier to process the additional items.
Microsoft 365 will present 30 items at a time. Review them and in the We predict this item is "Relevant". Do you agree? box choose either Yes or No or Not sure, skip to next item. Model accuracy is automatically updated after every 30 items.
machine learning classifiers - the algorithms & how they work
A classifier in machine learning is an algorithm that automatically orders or categorizes data into one or more of a set of classes. One of the most common examples is an email classifier that scans emails to filter them by class label: Spam or Not Spam.
A classifier is the algorithm itself the rules used by machines to classify data. A classification model, on the other hand, is the end result of your classifiers machine learning. The model is trained using the classifier, so that the model, ultimately, classifies your data.
There are both supervised and unsupervised classifiers. Unsupervised machine learning classifiers are fed only unlabeled datasets, which they classify according to pattern recognition or structures and anomalies in the data. Supervised and semi-supervised classifiers are fed training datasets, from which they learn to classify data according to predetermined categories.
Sentiment analysis is an example of supervised machine learning where classifiers are trained to analyze text for opinion polarity and output the text into the class: Positive, Neutral, or Negative. Try out this pre-trained sentiment analysis model to see how it works.
Machine learning classifiers are used to automatically analyze customer comments (like the above) from social media, emails, online reviews, etc., to find out what customers are saying about your brand.
Other text analysis techniques, like topic classification, can automatically sort through customer service tickets or NPS surveys, categorize them by topic (Pricing, Features, Support, etc.), and route them to the correct department or employee.
SaaS text analysis platforms, like MonkeyLearn, give easy access to powerful classification algorithms, allowing you to custom-build classification models to your needs and criteria, usually in just a few steps.
Machine learning classifiers go beyond simple data mapping, allowing users to constantly update models with new learning data and tailor them to changing needs. Self-driving cars, for example, use classification algorithms to input image data to a category; whether its a stop sign, a pedestrian, or another car, constantly learning and improving over time.
A decision tree is a supervised machine learning classification algorithm used to build models like the structure of a tree. It classifies data into finer and finer categories: from tree trunk, to branches, to leaves. It uses the if-then rule of mathematics to create sub-categories that fit into broader categories and allows for precise, organic categorization.
Naive Bayes is a family of probabilistic algorithms that calculate the possibility that any given data point may fall into one or more of a group of categories (or not). In text analysis, Naive Bayes is used to categorize customer comments, news articles, emails, etc., into subjects, topics, or tags to organize them according to predetermined criteria, like this:
K-nearest neighbors (k-NN) is a pattern recognition algorithm that stores and learns from training data points by calculating how they correspond to other data in n-dimensional space. K-NN aims to find the k closest related data points in future, unseen data.
In text analysis, k-NN would place a given word or phrase within a predetermined category by calculating its nearest neighbor: k is decided by a plurality vote of its neighbors. If k = 1, it would be tagged into the class nearest 1.
Take a look at this visual representation to understand how SVM algorithms work. We have two tags: red and blue, with two data features: X and Y, and we train our classifier to output an X/Y coordinate as either red or blue.
The SVM assigns a hyperplane that best separates (distinguishes between) the tags. In two dimensions this is simply a straight line. Blue tags fall on one side of the hyperplane and red on the other. In sentiment analysis these tags would be Positive and Negative.
SVM algorithms make excellent classifiers because, the more complex the data, the more accurate the prediction will be. Imagine the above as a 3-dimensional output, with a Z-axis added, so it becomes a circle.
Artificial neural networks are designed to work much like the human brain does. They connect problem-solving processes in a chain of events, so that once one algorithm or process has solved a problem, the next algorithm (or link in the chain) is activated.
Artificial neural networks or deep learning models require vast amounts of training data because their processes are highly advanced, but once they have been properly trained, they can perform beyond other, individual, algorithms.
There are a variety of artificial neural networks, including convolutional, recurrent, feed-forward, etc., and the machine learning architecture best suited to your needs depends on the problem youre aiming to solve.
Classification algorithms enable the automation of machine learning tasks that were unthinkable just a few years ago. And, better yet, they allow you to train AI models to the needs, language, and criteria of your business, performing much faster and with a greater level of accuracy than humans ever could.
MonkeyLearn is a machine learning text analysis platform that harnesses the power of machine learning classifiers with an exceedingly user-friendly interface, so you can streamline processes and get the most out of your text data for valuable insights.
classifier | definition of classifier by merriam-webster
These example sentences are selected automatically from various online news sources to reflect current usage of the word 'classifier.' Views expressed in the examples do not represent the opinion of Merriam-Webster or its editors. Send us feedback.