Posts

How to Calculate Feature Importance With Python

* Feature importance refers to a class of techniques for assigning scores to the input features of a predictive model, indicating the relative importance of each feature when making a prediction.
* Feature importance scores can be calculated both for problems that involve predicting a numerical value, called regression, and for problems that involve predicting a class label, called classification.
* Feature importance helps in: better understanding the data, better understanding a model, and reducing the number of input features.
* There are many ways to calculate feature importance scores and many models that can be used for this purpose.

Machine Learning Hacks

1. To count the null values in each column of the dataset: data.isnull().sum()
2. To compare a particular feature with the dependent variable: pd.crosstab(data.Gender, data.Loan_Status)
3. To check how many distinct values a particular categorical column has (data is the DataFrame, Gender the column): print(data.Gender.value_counts())
4. To fill the NAs in a categorical column with its most frequent value: data['Gender'].fillna(data['Gender'].mode()[0], inplace=True)
5. To get the set of numerical feature columns: numerical_feature_columns = list(df._get_numeric_data().columns)
6. To get the set of categorical columns: categorical_feature_columns = list(set(df.columns) - set(df._get_numeric_data().columns))
7. To drop a particular column from the DataFrame: data = data.drop(['Loan_ID'], axis=1)
8. To convert features into dummy variables in one shot: X_features = list(data.columns) and then data = pd.get...
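The hacks above can be run end to end on a small stand-in DataFrame. This is a minimal sketch: the loan data below is hypothetical (invented here for illustration), and `select_dtypes` is used as a public-API alternative to the private `_get_numeric_data`:

```python
import numpy as np
import pandas as pd

# Hypothetical loan data standing in for the blog's dataset
data = pd.DataFrame({
    "Loan_ID": ["LP001", "LP002", "LP003", "LP004"],
    "Gender": ["Male", "Female", np.nan, "Male"],
    "Income": [5000, 3000, 4000, 6000],
    "Loan_Status": ["Y", "N", "Y", "Y"],
})

# 1. Count missing values per column
print(data.isnull().sum())

# 2. Cross-tabulate a feature against the dependent variable
print(pd.crosstab(data.Gender, data.Loan_Status))

# 3. Frequency of each category in a column
print(data.Gender.value_counts())

# 4. Fill missing categorical values with the mode
data["Gender"] = data["Gender"].fillna(data["Gender"].mode()[0])

# 5./6. Split columns into numerical and categorical sets
numerical_feature_columns = list(data.select_dtypes(include="number").columns)
categorical_feature_columns = list(data.select_dtypes(exclude="number").columns)

# 7. Drop an identifier column that carries no signal
data = data.drop(["Loan_ID"], axis=1)

# 8. One-hot encode the categorical features in one shot
data = pd.get_dummies(data, columns=["Gender"])
print(data.columns.tolist())
```

The public `select_dtypes` does the same job as `_get_numeric_data` but is stable across pandas versions.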

Competitive aspect of Machine learning

* Splitting the data is essential; it makes our model robust.
* There are two ways to split the data: the traditional way is train_test_split, and the other is cross validation; cross validation is the more sophisticated one.
* Remember, when we use k-fold or cross validation, we don't partition the data ourselves, we don't fit ourselves, and we don't predict ourselves; the procedure handles all of that.
* There are feature selection techniques: the first is SelectKBest, and the second is RFE (recursive feature elimination).
* Then we have something for dimensionality reduction, and that is PCA (principal component analysis).
* The idea behind PCA: suppose the data lives in 3 dimensions and we are having trouble telling a, b, and c apart; we project it into 2 dimensions, and a, b, and c become clear. So by reducing the dimensions, you are achieving the separation; that is PCA.
* See, SelectKBest scores individual features, RFE evaluates multiple features at a time, and PCA is kind of telling that with fewer features you can a...
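The tools named above (train_test_split, cross validation, SelectKBest, RFE, PCA) can be put side by side in one sketch. This assumes scikit-learn and uses the built-in iris dataset purely as example data:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)

# Traditional way: a single train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Cross validation: no manual partition/fit/predict, the helper does it all
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

# SelectKBest: scores each feature individually, keeps the top k
X_kbest = SelectKBest(f_classif, k=2).fit_transform(X, y)

# RFE: recursively eliminates the weakest features as a group
X_rfe = RFE(LogisticRegression(max_iter=1000),
            n_features_to_select=2).fit_transform(X, y)

# PCA: projects onto fewer dimensions while preserving the separation
X_pca = PCA(n_components=2).fit_transform(X)
```

Note the contrast the notes draw: SelectKBest judges features one by one, RFE judges them jointly, and PCA builds new, fewer dimensions rather than picking existing columns.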

All about Machine learning

Building a performing machine learning model
* Data preparation > feature engineering > data modeling > performance measurement > performance improvement.
* This is a highly iterative process, to be repeated until your model reaches a satisfying performance.
* Let's see the steps.
1. Data preparation
* Query your data: basically you can query your data using pandas; this will give you a DataFrame with your raw data.
* Clean your data: the first step is to deal with missing values (if a column contains too many missing values, remove that column), and the second step is to remove the outliers; you can remove them by judgment, or you can use a robust method to remove them.
* Format data: this is basically the encoding of categorical variables; you can use label encoding or one-hot encoding.
2. Feature engineering
* A feature is an individual measurable property of a phenomenon being observed.
* For example, to predict the price o...
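The cleaning and formatting steps above can be sketched on a toy DataFrame. The data is hypothetical, and the "robust method" for outliers is illustrated here with the common 1.5×IQR rule (one reasonable choice among several):

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: one missing area, one missing city, one outlier
df = pd.DataFrame({
    "area": [1200, 1500, np.nan, 900, 50000],   # 50000 is an outlier
    "city": ["Pune", "Delhi", "Pune", None, "Delhi"],
    "price": [100, 140, 120, 80, 130],
})

# Clean: fill numeric gaps with the median, categorical gaps with the mode
df["area"] = df["area"].fillna(df["area"].median())
df["city"] = df["city"].fillna(df["city"].mode()[0])

# Clean: remove outliers with a robust IQR rule instead of eyeballing
q1, q3 = df["area"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["area"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Format: one-hot encode the categorical variable
df = pd.get_dummies(df, columns=["city"])
print(df)
```

The IQR rule is "robust" in the sense the notes mean: it is driven by quartiles, so the outlier itself barely influences the threshold that removes it.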

Ml-2

* With textual data we cannot perform mathematical operations; that's why we convert textual data into dummy variables.
* Visualisation gives you correlation, but remember: correlation does not mean causation. Causation is an impact or effect.
* We are learning machine learning because machine learning gives us causation; a coefficient is actually a causation.
* Causation is actually a deeper level of correlation.
* The difference between the actual and the predicted value is called the error.
* In statsmodels, the OLS (ordinary least squares) class gives us a method called summary; with summary we can do statistical analysis of the dataset.
* The significance level is taken as 0.05.
* If the p-value for a certain column is greater than 0.05, we remove that column, because that column is garbage, i.e. an unimportant variable.
* R² tells us how close the points are to the line. But if the number of variables increases, R² will also increase, even though, as we know, profit does not depend on a phone number (an unimportant variable), so...

Machine Learning

MACHINE LEARNING
We learn from experience; machines learn from data. That is machine learning: when a human fails at a task, it is handed over to a machine. The more data we have, the better our model will be.
Types of machine learning:
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
List of common machine learning algorithms. Here is the list of commonly used machine learning algorithms; these algorithms can be applied to almost any data problem:
Linear Regression, Logistic Regression, Decision Tree, SVM, Naive Bayes, kNN, K-Means, Random Forest, Dimensionality Reduction Algorithms, Gradient Boosting algorithms (GBM, XGBoost, LightGBM, CatBoost)
Supervised learning: algorithms are trained using labeled data; the input data is provided to the model along with the output.
1. Linear regression
* It is used to estimate real values (cost of houses, number of calls, total sales, etc.)
* An algorithm is derived by statisticians and mathematicians for a particular task, i.e. in our ...
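The supervised-learning idea above (labeled input plus output, linear regression estimating a real value) fits in a few lines. The house-size data below is hypothetical, chosen so the fitted line is easy to check by eye:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical labeled data: house size (sq ft) -> price (lakhs)
X = np.array([[800], [1000], [1200], [1500], [1800]])
y = np.array([160.0, 200.0, 240.0, 300.0, 360.0])  # price = 0.2 * size

# Supervised: the model sees both the inputs X and the outputs y
model = LinearRegression().fit(X, y)

print(model.coef_, model.intercept_)   # learned slope and intercept
print(model.predict([[1100]]))         # estimate a real value for new input
```

Because the toy labels lie exactly on the line price = 0.2 × size, the model recovers a slope of 0.2 and predicts 220 for a 1100 sq ft house.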