Machine learning applies an algorithm to raw data (train_feature -> train_label) to train a model, which is then used to predict on new data (test_feature). Python's scikit-learn (sklearn) package includes many such algorithms.

Steps: import the sklearn module -> choose a classifier -> create the classifier -> fit it -> predict with the fitted classifier -> evaluate the model's accuracy -> choose another classifier, and repeat.
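A minimal sketch of this loop in Python, using the bundled iris dataset and GaussianNB as one possible classifier choice (the dataset and classifier here are illustrative, not part of the original notes):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Split raw data into train_feature/train_label and test_feature/test_label.
X, y = load_iris(return_X_y=True)
train_feature, test_feature, train_label, test_label = train_test_split(
    X, y, test_size=0.3, random_state=42)

clf = GaussianNB()                      # create the classifier
clf.fit(train_feature, train_label)     # fit it on the training data
pred = clf.predict(test_feature)        # predict on new data
acc = accuracy_score(test_label, pred)  # evaluate accuracy, then iterate
print(acc)
```

To try another classifier, only the two lines that create and name `clf` change; the rest of the loop stays the same.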

intro:

naive bayes:
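This section is empty in the notes; a minimal GaussianNB sketch following the same fit/predict pattern as the other sections (the toy data is illustrative):

```python
from sklearn.naive_bayes import GaussianNB

# Toy training data: two features per sample, two classes.
X = [[-1, -1], [-2, -1], [1, 1], [2, 1]]
y = [0, 0, 1, 1]

clf = GaussianNB()
clf.fit(X, y)
print(clf.predict([[-0.8, -1]]))  # a point near the class-0 cluster
```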

svm:

  • svm video
  • modules svm

    from sklearn import svm

    clf = svm.SVC()
    clf.fit(X, y)
    clf.predict(test_feature)

    kernel: the kernel function can be any of the following:

    • linear: \langle x, x'\rangle
    • polynomial: (\gamma \langle x, x'\rangle + r)^d, where d is specified by keyword degree and r by coef0
    • rbf: \exp(-\gamma \|x - x'\|^2), where \gamma is specified by keyword gamma and must be greater than 0
    • sigmoid: \tanh(\gamma \langle x, x'\rangle + r), where r is specified by coef0
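The kernel keywords map onto SVC arguments like this: d -> degree, r -> coef0, \gamma -> gamma. A sketch on XOR-style toy data, which is not linearly separable (the data and parameter values are illustrative):

```python
from sklearn import svm

# XOR-style labels: no straight line separates the two classes.
X = [[0, 0], [1, 1], [1, 0], [0, 1]]
y = [0, 0, 1, 1]

# polynomial kernel: d via degree, r via coef0
poly = svm.SVC(kernel='poly', degree=3, coef0=1.0)
poly.fit(X, y)

# rbf kernel: gamma via the gamma keyword
rbf = svm.SVC(kernel='rbf', gamma=2.0)
rbf.fit(X, y)
print(rbf.predict([[0.9, 0.9]]))  # a point near the corner (1, 1)
```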

    You can define your own kernels by either giving the kernel as a python function or by precomputing the Gram matrix.
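For the function form of a custom kernel, the callable must return the Gram matrix between two sets of samples. A sketch that just reimplements the linear kernel (data and function name are illustrative):

```python
import numpy as np
from sklearn import svm

X = [[0, 0], [1, 1], [2, 0], [3, 1]]
y = [0, 0, 1, 1]

def my_kernel(A, B):
    # Gram matrix of the linear kernel: entry (i, j) is <A_i, B_j>.
    return np.dot(A, np.transpose(B))

clf = svm.SVC(kernel=my_kernel)
clf.fit(X, y)
print(clf.predict([[2.5, 0.5]]))  # a point on the class-1 side
```

The precomputed alternative is `svm.SVC(kernel='precomputed')`, where you pass the Gram matrix itself to `fit` and `predict` instead of the raw samples.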

    To avoid overfitting, tune the parameters (e.g. C and gamma). SVMs are slow when the training set is very large, and noisy data tends to cause overfitting.
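C is the usual knob: a large C fits the training data more tightly (risking overfitting on noisy data), a small C gives a smoother boundary. A sketch comparing two settings by cross-validation (the synthetic data and parameter values are illustrative):

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

# Synthetic noisy data; flip_y=0.2 flips 20% of the labels to simulate noise.
X, y = make_classification(n_samples=200, n_features=4, flip_y=0.2,
                           random_state=0)

for C in (0.1, 1000.0):
    scores = cross_val_score(svm.SVC(C=C), X, y, cv=5)
    print(C, scores.mean())
```

On noisy data the large-C model typically does no better (often worse) out of sample, which is the overfitting the notes warn about.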

decision trees:

Find the decision boundary from the training data.
from sklearn import tree
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)
clf.predict(test_feature)

k nearest neighbors

>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import KNeighborsClassifier
>>> neigh = KNeighborsClassifier(n_neighbors=3)
>>> neigh.fit(X, y)
>>> neigh.predict([[1.1]])
array([0])

adaboost
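This section is empty in the notes; a minimal AdaBoost sketch (the iris dataset and n_estimators value are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Boosts a sequence of weak learners (shallow decision trees by default),
# reweighting the samples each round to focus on previous mistakes.
clf = AdaBoostClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```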

random forest
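This section is empty in the notes; a minimal random forest sketch (the iris dataset and parameter values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# An ensemble of decision trees, each trained on a bootstrap sample
# with random feature subsets; predictions are combined across trees.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(Xtr, ytr)
print(clf.score(Xte, yte))
```

Averaging many decorrelated trees reduces the variance of a single deep decision tree, which is why it usually overfits less than the plain tree above.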