
which classifier to choose

nlp - which classifier to choose in nltk - stack overflow

I want to classify text messages into several categories like, "relation building", "coordination", "information sharing", "knowledge sharing" & "conflict resolution". I am using NLTK library to process these data. I would like to know which classifier, in nltk, is better for this particular multi-class classification problem.

Naive Bayes is the simplest and easiest classifier to understand, and for that reason it's nice to use. Decision trees with a beam search to find the best classification are not significantly harder to understand and are usually a bit better. MaxEnt and SVM tend to be more complex, and SVM requires some tuning to get right.

With your problem, I would focus first on ensuring you have a good training/testing dataset and on choosing good features. Since you are asking this question, you probably haven't had much experience with machine learning for NLP, so I'd say start off easy with Naive Bayes, as it doesn't need complex features: you can just tokenize and count word occurrences.
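The "tokenize and count word occurrences" approach can be sketched with NLTK's `NaiveBayesClassifier`. The messages and category labels below are invented for illustration, and the whitespace tokenizer is a deliberate simplification:

```python
# Minimal sketch of bag-of-words Naive Bayes in NLTK.
# The training messages and labels are made up for illustration.
from nltk import NaiveBayesClassifier

def word_features(message):
    """Naive whitespace tokenization; mark each lowercased word as present."""
    return {word.lower(): True for word in message.split()}

# Tiny invented training set: (message, category) pairs.
train = [
    ("thanks for your help yesterday", "relation building"),
    ("can we sync on the schedule tomorrow", "coordination"),
    ("here is the report you asked for", "information sharing"),
    ("let us talk through the disagreement calmly", "conflict resolution"),
]

classifier = NaiveBayesClassifier.train(
    [(word_features(text), label) for text, label in train]
)

print(classifier.classify(word_features("can we sync tomorrow")))
```

With a real corpus you would of course use a proper tokenizer and far more data; the point is only that the feature extractor can be this simple to start.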

Yes, training a Naive Bayes classifier for each category and then labeling each message with the class whose classifier gives the highest score is a standard first approach to problems like this. There are more sophisticated single-class classification algorithms you could substitute for Naive Bayes if you find performance inadequate, such as a Support Vector Machine (which I believe is available in NLTK via a Weka plug-in, though I'm not positive). Unless you can think of anything specific in this problem domain that would make Naive Bayes especially unsuitable, it's often the go-to first try for a lot of projects.
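The one-binary-classifier-per-category idea can be sketched in NLTK as follows; the messages and categories are invented, and `prob_classify` is used to score each category's "hit" probability:

```python
# One binary Naive Bayes classifier per category; pick the category whose
# classifier is most confident. Example data is invented for illustration.
from nltk import NaiveBayesClassifier

def features(message):
    return {w.lower(): True for w in message.split()}

labeled = [
    ("great working with you", "relation building"),
    ("let us schedule the meeting", "coordination"),
    ("attached is the latest data", "information sharing"),
]
categories = {label for _, label in labeled}

# For each category, label its messages True and everything else False.
binary = {
    cat: NaiveBayesClassifier.train(
        [(features(text), lab == cat) for text, lab in labeled]
    )
    for cat in categories
}

def classify(message):
    # Highest probability of True across the per-category classifiers wins.
    f = features(message)
    return max(categories, key=lambda c: binary[c].prob_classify(f).prob(True))
```

This is the "highest score" tie-break the answer describes; swapping in another binary classifier per category leaves the surrounding loop unchanged.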

The other NLTK classifier I would consider trying is MaxEnt, as I believe it natively handles multi-class classification. (Though the multiple-binary-classifier approach is very standard and common as well.) In any case, the most important thing is to collect a very large corpus of properly tagged text messages.

If by "text messages" you mean actual cell phone text messages, these tend to be very short, and the language is very informal and varied. In that case, feature selection may be a larger factor in determining accuracy than classifier choice. For example, using a stemmer or lemmatizer that understands common abbreviations and idioms, part-of-speech tagging or chunking, entity extraction, or extracting probable relationships between terms may provide more bang than using more complex classifiers.

This paper discusses classifying Facebook status messages by sentiment, which raises some of the same issues and may provide some insights. The link is to a Google cache because I'm having problems with the original site:

http://docs.google.com/viewer?a=v&q=cache:_AeBYp6i1ooJ:nlp.stanford.edu/courses/cs224n/2010/reports/ssoriajr-kanej.pdf+maxent+classifier+multiple+classes&hl=en&gl=us&pid=bl&srcid=ADGEESi-eZHTZCQPo7AlcnaFdUws9nSN1P6X0BVmHjtlpKYGQnj7dtyHmXLSONa9Q9ziAQjliJnR8yD1Z-0WIpOjcmYbWO2zcB6z4RzkIhYI_Dfzx2WqU4jy2Le4wrEQv0yZp_QZyHQN&sig=AHIEtbQN4J_XciVhVI60oyrPb4164u681w&pli=1

which machine learning classifier to choose, in general? - intellipaat

Decision Tree - A decision tree is one of the most popular tools for classification and prediction. It is a flowchart-like tree structure where each internal node represents a test, each branch represents the outcome of a test, and each terminal node holds a class label.

SMO (SVM) - A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, this algorithm outputs an optimal hyperplane which categorizes new examples.
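The two descriptions above can be seen side by side on a toy dataset. This sketch uses scikit-learn (an assumption; the answer names no library) with four linearly separable 2D points, so both models fit them exactly:

```python
# Toy comparison of a decision tree (threshold tests) and a linear SVM
# (separating hyperplane) on four trivially separable 2D points.
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

X = [[0, 0], [0, 1], [3, 3], [3, 4]]  # two clusters in the plane
y = [0, 0, 1, 1]

tree = DecisionTreeClassifier().fit(X, y)  # learns a flowchart of tests
svm = SVC(kernel="linear").fit(X, y)       # learns a separating hyperplane

print(tree.predict([[0, 0.5], [3, 3.5]]))
print(svm.predict([[0, 0.5], [3, 3.5]]))
```

Both predict class 0 for points near the first cluster and class 1 near the second; the difference is the shape of the learned boundary, not the answers on easy data.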

which classifier has the best performance? - data science central

In practice, given a wide range of classifiers, we often have to choose one based on performance comparison through validation. The research literature shows that no classifier performs universally best in all contexts for all problems. The following paper applied the eight most popular classifier families in machine learning (SVM, neural networks, ensembles, KNN, decision trees, logistic regression, discriminant analysis, and Naive Bayes) to a problem currently confronting financial institutions such as banks, insurers, and asset managers in their derivative valuation and risk management. The paper shows that when properly parameterized (which the paper discusses in detail), the performances are either consistent with or contrary to some classic studies in the area. The paper is available at SSRN.

I replied that a better question is which classifier performs worst, as the answer is simpler. Answers that come to my mind are classifiers based on discriminant analysis or Naive Bayes. For instance, discriminant analysis only lets you detect clusters that are linearly separated. A blend of various classifiers usually works better than a single one, and sometimes just transforming your data (e.g., using a log scale) provides a substantial improvement.

Logistic regression enables the researcher to explicitly parameterize and estimate a theoretical model. The other techniques to varying degrees put more emphasis on purely empirical estimation from the data.

Whether this is advantageous or disadvantageous will of course depend on the situation but as we move to an increasingly Big Data world I believe that theory-based research will have an inherent advantage: with thousands of potential explanatory variables available, the potential for noise to overwhelm the signal increases, unless we have theoretical models to suggest which variables are more likely to convey signal.

It's similar to how computers can beat human chess (and now Go) players -- but the strongest players in the world are expert human players combined with computers. Pure machine learning gets stronger every day, but still benefits from human input.

Unless I missed it (I quickly searched the paper for "boost"), I'm surprised gradient boosted trees aren't mentioned (see http://fastml.com/what-is-better-gradient-boosted-trees-or-random-f...). XGBoost in particular (as opposed to the gradient boosted tree implementation in sklearn) seems to be winning more than half of Kaggle contests (https://github.com/dmlc/xgboost/tree/master/demo#machine-learning-c...). XGBoost tweaks the original formulation of gradient boosted trees and is thus different from sklearn's implementation (https://medium.com/@chris_bour/6-tricks-i-learned-from-the-otto-kag...). Users claim better predictive power than sklearn's.

As indicated in our paper, the original objective of our research was a real-world solution to a real-world need: finding a proxy for a type of financial market feature variable (in this case, Credit Default Swap (CDS) curves) for illiquid corporates, i.e., those that don't have liquid quotes. It turned out that classifier performance comparison was a natural extension of the research.

We followed the existing literature (two classic studies are mentioned in the paper) to find the best (in terms of optimal parameterization choices) of the best (across the eight most popular classifier families). Naturally, we had to compare 156 classifiers (we actually ran many more, but had to cut the paper to its current length). The paper is here: https://ssrn.com/abstract=2967184


which machine learning classifier to choose, in general? - stack overflow

What are other guidelines? Even answers like "if you'll have to explain your model to some upper management person, then maybe you should use a decision tree, since the decision rules are fairly transparent" are good. I care less about implementation/library issues, though.

What you do is simply split your dataset into k non-overlapping subsets (folds), train a model using k-1 folds, and measure its performance on the fold you left out. You do this for each fold in turn (first leave the 1st fold out, then the 2nd, ..., then the kth, training on the remaining folds each time). When finished, estimate the mean performance across the folds (and perhaps also the variance/standard deviation of the performance).
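The procedure above is simple enough to write out without any library. This is a dependency-free sketch; the majority-class "model" at the end is a stand-in, purely to make the loop runnable:

```python
# Dependency-free k-fold cross validation, following the procedure described
# above. The majority-class baseline is a placeholder for a real model.
from statistics import mean, stdev
from collections import Counter

def k_fold_indices(n, k):
    """Split range(n) into k contiguous, non-overlapping folds."""
    fold_size, rem = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        end = start + fold_size + (1 if i < rem else 0)
        folds.append(list(range(start, end)))
        start = end
    return folds

def cross_validate(X, y, k, train_and_score):
    """Leave each fold out once; return mean and std dev of the scores."""
    folds = k_fold_indices(len(X), k)
    scores = []
    for i in range(k):
        test_idx = set(folds[i])
        train = [(x, t) for j, (x, t) in enumerate(zip(X, y)) if j not in test_idx]
        test = [(x, t) for j, (x, t) in enumerate(zip(X, y)) if j in test_idx]
        scores.append(train_and_score(train, test))
    return mean(scores), stdev(scores)

def majority_baseline(train, test):
    """Stand-in model: predict the majority class of the training fold."""
    majority = Counter(t for _, t in train).most_common(1)[0][0]
    return mean(1.0 if t == majority else 0.0 for _, t in test)
```

In practice you would shuffle (and usually stratify) before assigning folds; contiguous folds are used here only to keep the sketch short.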

How you choose the parameter k depends on the time you have. Usual values for k are 3, 5, 10, or even N, where N is the size of your data (that's the same as leave-one-out cross validation). I prefer 5 or 10.

Let's say you have 5 methods (ANN, SVM, KNN, etc) and 10 parameter combinations for each method (depending on the method). You simply have to run cross validation for each method and parameter combination (5 * 10 = 50) and select the best model, method and parameters. Then you re-train with the best method and parameters on all your data and you have your final model.
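The select-then-refit loop above can be sketched with scikit-learn (an assumption; the answer itself is library-agnostic). `GridSearchCV` runs cross validation for every parameter combination, keeps the best, and by default refits on all the data:

```python
# Model selection via cross validation over a parameter grid, then a final
# refit on all data -- the loop described above, using scikit-learn.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

search = GridSearchCV(
    SVC(),
    {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]},  # 6 combinations
    cv=5,  # 5-fold cross validation per combination
)
search.fit(X, y)  # refit=True by default: final model trained on all data

print(search.best_params_, round(search.best_score_, 3))
```

To compare several methods (ANN, SVM, KNN, ...) you would simply repeat this per method and keep the overall winner, exactly as the answer says.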

There are some more things to say. If, for example, you use a lot of methods and parameter combinations for each, it's very likely you will overfit. In cases like these, you have to use nested cross validation.

Again, you first split your data into k folds. In each step, you take k-1 folds as your training data and the remaining one as your test data. You then run model selection (the procedure explained above) on each such set of k-1 training folds. After finishing, you have k selected models, one per outer fold. You test each on its held-out fold and pick the best-performing method and parameters. Finally, you train a new model with that method and those parameters on all the data you have. That's your final model.
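Nested cross validation is concise to sketch with scikit-learn (again an assumption; the answer is library-agnostic): the inner loop selects parameters, and the outer loop estimates how well that entire selection procedure generalizes:

```python
# Nested cross validation: an inner parameter search wrapped in an outer
# evaluation loop, so data used to pick parameters never scores itself.
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Inner loop: 3-fold model selection over the parameter grid.
inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=3)

# Outer loop: 5-fold CV around the *entire* selection procedure.
outer_scores = cross_val_score(inner, X, y, cv=5)
print(outer_scores.mean())
```

The mean of `outer_scores` is an honest estimate of the whole pipeline's performance, which is what guards against the overfitting-by-selection problem mentioned above.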

The book "OpenCV" has two great pages on this (pages 462-463). Searching the Amazon preview for the word "discriminative" (probably Google Books too) will let you see the pages in question. These two pages are the greatest gem I have found in this book.

SVM's are fast when it comes to classifying since they only need to determine which side of the "line" your data is on. Decision trees can be slow especially when they're complex (e.g. lots of branches).

For classification, Naive Bayes is a good starting point: it performs well, is highly scalable, and can adapt to almost any kind of classification task. 1NN (k-nearest neighbours with only 1 neighbour) is also a no-hassle best-fit algorithm (because the data is the model, so you don't have to worry about fitting a decision boundary of the right dimensionality); the only issue is the computational cost (quadratic, since you need to compute the distance matrix, so it may not be a good fit for high-dimensional data).
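"The data is the model" in 1NN can be illustrated in a few lines, here with scikit-learn's `KNeighborsClassifier` (an assumption; the answer names no library) on an invented one-dimensional dataset:

```python
# 1NN: fitting just stores the data; prediction returns the label of the
# single nearest training point. The tiny dataset is invented.
from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [10], [11]]
y = ["low", "low", "high", "high"]

nn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(nn.predict([[2], [9]]))  # each query takes its nearest neighbour's label
```

The quadratic cost the answer mentions comes from computing distances between every query and every stored point, which is why 1NN gets painful as the dataset and dimensionality grow.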

Another good starter algorithm is Random Forests (ensembles of decision trees): highly scalable to any number of dimensions, with generally quite acceptable performance. Finally, there are genetic algorithms, which scale admirably well to any dimension and any data with minimal knowledge of the data itself; the most minimal and simplest implementation is the microbial genetic algorithm (only one line of C code! by Inman Harvey in 1996), and among the most complex are CMA-ES and MOGA/e-MOEA.

As a side note, if you want a theoretical framework for testing your hypotheses and algorithms' theoretical performance on a given problem, you can use the PAC (probably approximately correct) learning framework (beware: it's very abstract and complex!). To summarize, the gist of PAC learning is that you should use the least complex algorithm that is still complex enough (complexity being the maximum dimensionality the algorithm can fit) to fit your data. In other words, apply Occam's razor.

Another resource is one of the lecture videos of the series of videos Stanford Machine Learning, which I watched a while back. In video 4 or 5, I think, the lecturer discusses some generally accepted conventions when training classifiers, advantages/tradeoffs, etc.

If you want to understand the complex relationships occurring in your data, you should go with a rich inference algorithm (e.g., linear regression or lasso). On the other hand, if you are only interested in predictive performance, you can go with higher-dimensional, more complex (but less interpretable) algorithms, like neural networks.
