Andrew Ng
Notes taken while watching Andrew Ng's online course, written up with my own (very amateur) understanding ~ just for fun.
Concepts
- Structured Data : data in which each feature has a well-defined meaning (e.g. each column of a table means something specific).
- Unstructured Data : raw audio, images, or text, where you may want to recognize what is in the signal; the features might be the pixel values of an image. Note that humans are very good at interpreting unstructured data; a word or a piece of text is also a form of unstructured data.
- Key point : deep learning benefits from a huge amount of data, especially "labeled data".
- Algorithm innovation (computation) : for example, switching the activation from sigmoid to ReLU makes the gradient computation easier, which makes training bigger neural networks and trying out new ideas more convenient, improving your efficiency.
- Label : the thing you are trying to predict.
- Feature : an input variable used to make predictions.
- Example : one particular instance of data; it can be labeled (features plus label) or unlabeled (features only).
- Model : the mapping from features to predictions that is learned from the examples.
- Training : the process of learning the model's parameters from labeled examples.
- Inference : applying the trained model to make predictions on new, unlabeled examples.
- Overfitting : the model fits the training data too closely and generalizes poorly to new data.
- Convergence : the point in training where the loss stops changing much from one iteration to the next.
- Parameter : a variable the model learns during training, such as the weights w and the bias b.
- Hyperparameter : a setting chosen before training, such as the learning rate.
- Iteration : one round of model training (one gradient update).
Basics
Logistic Regression
- The dimension of the feature vector is its number of elements; for an image, $n_x = (\text{number of channels}) \times \text{rows} \times \text{cols}$ of the pixel matrices. $m$ refers to the total number of examples we have.
- Stack the examples column-wise to create a matrix $X$ of shape $(n_x, m)$; in Python, `X.shape` gives $(n_x, m)$ and `Y.shape` gives $(1, m)$.
- We assume the parameters of logistic regression are $w$, an $n_x$-dimensional vector, together with a scalar bias $b$.
- You can use the following function to get an estimated value:
$$
\widehat{y}=\sigma(w^{T}x+b)
$$
- Loss Function: the logistic regression loss function, chosen so that we can actually find the optimum (the best solution) rather than getting stuck in poor local optima. You can sanity-check it by assuming $y$ equals a certain value, like 1 or 0, and then seeing what value we hope $\widehat{y}$ to take.
$$
L(\widehat{y},y)=-\left(y\log\widehat{y}+(1-y)\log(1-\widehat{y})\right)
$$
- Cost Function: the average of the per-example losses over the training set; it measures how well the parameters $w$ and $b$ are doing overall. We should find appropriate $w$ and $b$ that make $J(w,b)$ as small as possible; a short NumPy sketch follows the formula below.
$$
J(w,b) = \frac{1}{m}\sum_{i=1}^{m} L(\widehat{y}^{(i)},y^{(i)})
$$
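To make this concrete, here is a minimal NumPy sketch of the forward pass and the cost for a small batch of $m$ examples (the tiny data set, the zero initialization, and the variable names are placeholders for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# X: (n_x, m) matrix of m examples stacked column-wise; Y: (1, m) labels
X = np.array([[1.0, 2.0, 0.5],
              [0.0, 1.0, 1.5]])      # n_x = 2, m = 3 (made-up data)
Y = np.array([[1, 0, 1]])
w = np.zeros((X.shape[0], 1))        # (n_x, 1) weight vector
b = 0.0

y_hat = sigmoid(np.dot(w.T, X) + b)  # (1, m) predictions
# cost J(w, b): average cross-entropy loss over the m examples
J = -np.mean(Y * np.log(y_hat) + (1 - Y) * np.log(1 - y_hat))
print(J)                             # about 0.693 with zero-initialized parameters
```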
Interpretation:
First, we want to minimize the cost function $J(w,b)$.
Second, this is the same as maximizing $-L$, because maximum likelihood estimation tells us to maximize the (log-)likelihood.
Then we assume our training examples are i.i.d. (independent and identically distributed).
$$y = 1 : p(y|x) = \widehat{y}$$
$$y = 0 : p(y|x) = 1 - \widehat{y}$$
We combine the two cases into one expression:
$$p(y|x) = \widehat{y}^{\,y}(1-\widehat{y})^{(1-y)}$$
Taking the log of both sides gives exactly $-L(\widehat{y},y)$, so maximizing the likelihood is the same as minimizing the loss.
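Written out, the step above is just
$$
\log p(y|x) = y\log\widehat{y} + (1-y)\log(1-\widehat{y}) = -L(\widehat{y},y)
$$
and for $m$ i.i.d. examples the log-likelihood of the whole training set is $\sum_{i=1}^{m}\log p(y^{(i)}|x^{(i)}) = -\sum_{i=1}^{m} L(\widehat{y}^{(i)},y^{(i)})$, so maximizing it is (up to the $\frac{1}{m}$ factor) the same as minimizing the cost $J(w,b)$.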
Use the gradient descent algorithm
- Used to train $w$ and $b$ and find the values that give the best result;
- Procedure: $J(w,b)$ is a convex function, so we can find the minimum (at the very least a local minimum). So we repeat the following updates until convergence:
$$
w=w-\alpha \frac{\partial J(w,b)}{\partial w},\qquad b=b-\alpha \frac{\partial J(w,b)}{\partial b}
$$
Implementing the gradient for logistic regression
- three core formulas:
$$z = w^Tx+b$$
$$\widehat{y} = a = \sigma(z)$$
$$L(a,y) = -(y\log(a)+(1-y)\log(1-a))$$
- The number of features equals the number of weights $w_i$, and there is only a single $b$. Here we compute the loss on a single example.
- Calculate:
$$
dz = a - y , db = dz
$$
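Where $dz = a - y$ comes from (the chain rule through the sigmoid; here $dz$ denotes $\frac{\partial L}{\partial z}$):
$$
\frac{\partial L}{\partial z} = \frac{\partial L}{\partial a}\cdot\frac{\partial a}{\partial z} = \left(-\frac{y}{a}+\frac{1-y}{1-a}\right)\cdot a(1-a) = a - y
$$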
Then you can compute $dw_1, dw_2, \dots$ and update each $w_i$ in the same way as before:
$$dw_1 = x_1 dz,\qquad w_1 = w_1 - \alpha\, dw_1
$$
For the $m$ examples, use a for loop:
- Accumulate the per-example values of the loss and of each $dw_i$ and $db$, then divide each accumulated sum by $m$ to obtain $J$, $dw_i$ and $db$; a loop-based sketch follows the update formulas below.
$$w_i = w_i - \alpha dw_i
$$
$$b = b - \alpha db$$
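As mentioned above, here is a rough sketch of that version with explicit for loops over the examples, followed by the gradient-descent updates (the toy data, learning rate, and number of steps are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# made-up toy data: n_x = 2 features, m = 4 examples
X = np.array([[0.5, 1.0, 1.5, 2.0],
              [1.0, 0.0, 1.0, 0.0]])
Y = np.array([1, 0, 1, 0])
n_x, m = X.shape
w = np.zeros(n_x)
b = 0.0
alpha = 0.1                        # learning rate (placeholder value)

for step in range(1000):           # each step is one iteration / gradient update
    J, dw, db = 0.0, np.zeros(n_x), 0.0
    for i in range(m):             # accumulate loss and gradients over the m examples
        z = np.dot(w, X[:, i]) + b
        a = sigmoid(z)
        J += -(Y[i] * np.log(a) + (1 - Y[i]) * np.log(1 - a))
        dz = a - Y[i]
        dw += X[:, i] * dz         # dw_i accumulates x_i * dz
        db += dz
    J, dw, db = J / m, dw / m, db / m
    w -= alpha * dw                # w_i = w_i - alpha * dw_i
    b -= alpha * db
```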
Vectorization
- Use vectorized operations to speed up the computation; explicit for loops are too slow.
- Key idea: turn the for-loop computation into matrix operations so NumPy can do the work efficiently, and by stacking the vectors into matrices you can compute all the results for the whole training set at once, as sketched below.
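A minimal sketch of the same training step fully vectorized, assuming the conventions above ($X$ of shape $(n_x, m)$, $Y$ of shape $(1, m)$, $w$ of shape $(n_x, 1)$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_vectorized(X, Y, alpha=0.1, num_steps=1000):
    """Vectorized logistic regression. X has shape (n_x, m), Y has shape (1, m)."""
    n_x, m = X.shape
    w = np.zeros((n_x, 1))
    b = 0.0
    for step in range(num_steps):
        Z = np.dot(w.T, X) + b     # (1, m): all m values of z at once
        A = sigmoid(Z)             # (1, m): all m predictions at once
        dZ = A - Y                 # (1, m)
        dw = np.dot(X, dZ.T) / m   # (n_x, 1): replaces the inner for loop
        db = np.sum(dZ) / m
        w -= alpha * dw
        b -= alpha * db
    return w, b
```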
Broadcasting in Python
`cal = A.sum(axis=0)  # axis=0 sums vertically (down the columns); axis=1 would sum horizontally`
In fact, NumPy automatically expands (broadcasts) the smaller array to the shape needed for the computation, by conceptually copying it vertically or horizontally.
You can read the NumPy documentation on broadcasting for details.
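For example, a small sketch of what broadcasting does here (the 3×4 matrix is arbitrary example data):

```python
import numpy as np

A = np.array([[56.0,   0.0,  4.4, 68.0],
              [ 1.2, 104.0, 52.0,  8.0],
              [ 1.8, 135.0, 99.0,  0.9]])     # arbitrary 3x4 example data
cal = A.sum(axis=0)                           # shape (4,): sum down each column
percentage = 100 * A / cal.reshape(1, 4)      # (3,4) / (1,4): the row is broadcast to all 3 rows
print(percentage)
```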
Do not use rank-1 arrays
`a = np.random.randn(5)  # creates a rank-1 array of shape (5,)`
Instead, do this:
`a = np.random.randn(5, 1)  # column vector of shape (5, 1)`
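A quick sketch of why rank-1 arrays are confusing and how to guard against them:

```python
import numpy as np

a = np.random.randn(5)       # rank-1 array: shape (5,), neither a row nor a column vector
print(a.shape, a.T.shape)    # (5,) (5,)  -- transposing does nothing
print(np.dot(a, a.T))        # a single number, not the (5,5) outer product you might expect

b = np.random.randn(5, 1)    # proper column vector: shape (5, 1)
print(np.dot(b, b.T).shape)  # (5, 5) outer product, as expected
assert b.shape == (5, 1)     # cheap sanity check on shapes
a = a.reshape(5, 1)          # or fix an existing rank-1 array by reshaping it
```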
Neural Network
Overview & Representation
- input layer -> hidden layer -> output layer
- We call the input layer layer 0, and count a network's layers by that convention (the input layer is not counted, so a network with one hidden layer and one output layer is a 2-layer network); see the forward-pass sketch below.
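As mentioned above, here is a minimal sketch of the forward pass for a 2-layer network (one hidden layer, one output layer) under this counting convention; the layer sizes, activations, and random data are placeholder choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_x, n_h, m = 3, 4, 5                  # input size, hidden size, number of examples (made up)
X = np.random.randn(n_x, m)            # layer 0: the input layer (not counted)

W1 = np.random.randn(n_h, n_x) * 0.01  # layer 1: the hidden layer's parameters
b1 = np.zeros((n_h, 1))
W2 = np.random.randn(1, n_h) * 0.01    # layer 2: the output layer's parameters
b2 = np.zeros((1, 1))

A1 = np.tanh(np.dot(W1, X) + b1)       # hidden-layer activations, shape (n_h, m)
A2 = sigmoid(np.dot(W2, A1) + b2)      # output predictions, shape (1, m)
```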
西瓜书 (Watermelon Book)
Performance Measures
Error rate and accuracy are skipped here; this part focuses on precision and recall.
Precision:
$$
P = \frac{TP}{TP+FP}
$$
Recall:
$$
R = \frac{TP}{TP+FN}
$$
Intuitive visualization: the P-R (precision-recall) curve.
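A small sketch computing both measures from the confusion-matrix counts (the counts below are made up):

```python
def precision_recall(tp, fp, fn):
    """Precision P = TP / (TP + FP), Recall R = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# made-up counts: 30 true positives, 10 false positives, 20 false negatives
print(precision_recall(tp=30, fp=10, fn=20))   # (0.75, 0.6)
```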
Neural Networks
Feedforward networks (FNN)
- The simplest kind
- No feedback connections between neurons; information can only flow forward
- No memory effect; suitable for many supervised learning tasks
Typical example: convolutional neural networks (CNN)
Feedback (recurrent) networks
- Have feedback / recurrent connections
- Outputs can be fed back into the network's input at later time steps
- Suited to tasks that need memory and contextual information, i.e. processing sequential data
Typical example: recurrent neural networks (RNN)
LSTM
An improved variant of the RNN
Nonlinear Classifiers
Increase the dimensionality (map the features into a higher-dimensional space where the classes become separable).