Competitive Aspects of Machine Learning

 * Splitting the data is necessary; it makes our model rough and tough (robust)

* There are two ways to split data: the traditional way is train_test_split, and the other way is cross validation; cross validation is the more sophisticated one
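A minimal sketch of the traditional split, using scikit-learn's iris data (the dataset and the 80/20 ratio here are just for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# hold out 20% of the rows for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
```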

* Remember, when we use k-fold cross validation we don't partition, we don't fit, and we don't predict manually; the cross-validation helper does all of that for us internally
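To show that point, a small sketch with `cross_val_score` (model and dataset chosen only as an example): we never call `train_test_split`, `fit`, or `predict` ourselves.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# cross_val_score partitions, fits and predicts internally:
# cv=5 -> 5 folds -> 5 accuracy scores, one per held-out fold
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores.mean())
```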

* There are also some feature selection techniques:

first is SelectKBest, second is RFE (Recursive Feature Elimination)
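Both techniques in one short sketch (the scoring function, estimator, and k values are illustrative choices, not fixed requirements):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # 4 features

# SelectKBest: score each feature on its own and keep the k best
kbest = SelectKBest(score_func=f_classif, k=2).fit(X, y)
print("SelectKBest keeps:", kbest.get_support())

# RFE: repeatedly fit a model and eliminate the weakest feature
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2).fit(X, y)
print("RFE keeps:", rfe.get_support())
```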


* Then we have something for dimensionality reduction, and that is PCA (Principal Component Analysis)

*# What happens is that some data is in 3 dimensions,

# and there we have trouble telling classes a, b and c apart,

# so we project it into 2 dimensions, and then a, b and c become clear to us.

# So by reducing the dimensions, you are achieving the separation; that is PCA
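A minimal PCA sketch (iris has 4 dimensions rather than 3, but the reduction-to-2-D idea is the same):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)  # 4 original dimensions

# project the 4-D data down to 2 dimensions
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)                           # (150, 2)
print(pca.explained_variance_ratio_.sum())  # how much variance survives the reduction
```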

* See, SelectKBest is selecting individual features, RFE is giving you multiple features at a time, and PCA is kind of telling you that with fewer features you can still achieve maximum accuracy. Here comes the fourth one: the Extra Trees classifier

* An Extra Trees classifier is like a random forest; here we are asking our model which are the most important features

the one with the highest value is the most important feature
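A sketch of reading feature importances from an Extra Trees classifier (dataset chosen only for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier

X, y = load_iris(return_X_y=True)

model = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)

# one importance score per feature; the highest value marks the most important one
for name, score in zip(load_iris().feature_names, model.feature_importances_):
    print(f"{name}: {score:.3f}")
```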



* The ninja technique for creating dummy variables:

pd.get_dummies(dataset)
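A tiny example of that one-liner in action (the column names and values here are made up for illustration):

```python
import pandas as pd

dataset = pd.DataFrame({"city": ["Delhi", "Mumbai", "Delhi"],
                        "temp": [30, 28, 31]})

# one new 0/1 column per category; numeric columns pass through untouched
dummies = pd.get_dummies(dataset)
print(dummies.columns.tolist())  # ['temp', 'city_Delhi', 'city_Mumbai']
```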


* You can also use the Keras to_categorical method, which is similar to one-hot encoding

* If you have many 0s in your dataset (standing in for missing values), you replace them with NaN and then remove every row that has a NaN; this is the first strategy

* We have another strategy too, and that is to replace all the NaNs with the column mean

* Another strategy is to use SimpleImputer
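All three strategies side by side on a made-up toy frame (column names and values are purely illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"a": [1.0, 0.0, 3.0], "b": [4.0, 5.0, 0.0]})
with_nan = df.replace(0, np.nan)  # treat 0 as missing

# strategy 1: drop every row that contains a NaN
dropped = with_nan.dropna()

# strategy 2: replace each NaN with the column mean
filled = with_nan.fillna(with_nan.mean())

# strategy 3: SimpleImputer does the same mean-fill as a reusable transformer
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
imputed = imputer.fit_transform(with_nan)
print(len(dropped), imputed.shape)
```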


* Remember, earlier what we did was remove every row that had a NaN, but this is not good practice because there is loss of data; that is why we are studying all these strategies

* We also try different algorithms to check which gives the maximum accuracy

* Now let's talk about scaling 

* Machine learning models put higher weightage on features which have a larger scale

* We have different scaling techniques

* Min-Max scaler (what the min-max scaler does is take each feature's minimum and maximum value and scale the values into the [0, 1] range)

* Standard Scaler (produces values with a mean of 0 & a standard deviation of 1:

# it takes each feature's values,

# calculates the mean of each feature,

# subtracts the mean,

# then divides by the standard deviation,

# and then we get these values.

# this is the 2nd pre-processing technique)

* Normalization (we use normalization when we have a lot of 0 values or missing values)

# So what you do is try all these scaling techniques and see which scaling technique gives you maximum accuracy

* Binarizer (this we use rarely)

it says: if a value is at or below the threshold (0 by default) we give it 0, otherwise 1
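All four techniques on one tiny made-up matrix, so the differences are visible (the numbers are illustrative only):

```python
import numpy as np
from sklearn.preprocessing import Binarizer, MinMaxScaler, Normalizer, StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

minmax = MinMaxScaler().fit_transform(X)      # each column squeezed into [0, 1]
standard = StandardScaler().fit_transform(X)  # each column: mean 0, sd 1
normalized = Normalizer().fit_transform(X)    # each ROW rescaled to unit length
binary = Binarizer(threshold=1.5).fit_transform(X)  # 0 if <= 1.5, else 1

print(minmax[:, 0])         # [0.  0.5 1. ]
print(standard.mean(axis=0))
```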

--------------------------------------

*# The difference between a decision tree and a random forest is that in a decision tree we use only one tree, whereas in a random forest we use a group of trees, and this is called ensemble learning

# here we use a bagging classifier; bagged decision trees and random forests are both forms of bagging

# bagging is like 500 judges making a decision in parallel, and boosting is like one judge taking a decision, then the next judge taking a decision, and so on...

# in bagging it is a parallel process and in boosting it is a sequential process
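A sketch comparing the two ensemble styles (the dataset, estimator counts, and AdaBoost as the boosting example are illustrative choices):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# bagging: many trees trained in parallel on random samples, votes combined
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            random_state=0)

# boosting: learners trained one after another, each focusing on the
# mistakes of the one before it
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

print("bagging :", cross_val_score(bagging, X, y, cv=3).mean())
print("boosting:", cross_val_score(boosting, X, y, cv=3).mean())
```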

* # Grid search is used for hyperparameter tuning
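A minimal grid search sketch (the model and parameter grid are just examples; any estimator and grid work the same way):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# try every combination in the grid with cross-validation and keep the best
params = {"n_neighbors": [3, 5, 7], "weights": ["uniform", "distance"]}
grid = GridSearchCV(KNeighborsClassifier(), params, cv=5).fit(X, y)

print(grid.best_params_, grid.best_score_)
```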
* # Ridge is a sophisticated (regularized) linear regression
* #### Automate ML using Pipelines ####
# the output of one step goes to the input of the next step
# so water flowing from this pipe into the next pipe: that is a pipeline
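A minimal pipeline sketch (scaler and classifier are illustrative; any sequence of transformers plus a final estimator works):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# the scaler's output flows straight into the classifier,
# like water through connected pipes
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
print(cross_val_score(pipe, X, y, cv=5).mean())
```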


* FeatureUnion is using a group of techniques together: it runs several transformers side by side and joins their outputs
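A sketch combining two of the techniques from earlier with FeatureUnion (the choice of PCA plus SelectKBest is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion

X, y = load_iris(return_X_y=True)

# run PCA and SelectKBest side by side, then stack their outputs as columns
union = FeatureUnion([
    ("pca", PCA(n_components=2)),
    ("kbest", SelectKBest(k=1)),
])
combined = union.fit_transform(X, y)
print(combined.shape)  # (150, 3): 2 PCA components + 1 selected feature
```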

* So if you want to hand over your machine learning model to a Django team or another Python team, you use pickle

import pickle
from sklearn.linear_model import LogisticRegression

# Fit the model (X_train, y_train come from an earlier train/test split)
model = LogisticRegression()
model.fit(X_train, y_train)

# save the model to disk
filename = 'finalized_model.sav'
pickle.dump(model, open(filename, 'wb'))
# this code saves the model into the finalized_model.sav file,
# which we can then send to the other team, for example over email

# some time later...

# with this code we can open that file again
# load the model from disk
loaded_model = pickle.load(open(filename, 'rb'))
result = loaded_model.score(X_test, y_test)
print(result)

* The solution for underfitting is boosting

* The solution for overfitting is hyperparameter tuning
