Pandas in Python

                            Pandas
Pandas is an open-source data analysis library for Python. Put simply, pandas is a package for working with data frames.
A data frame is like a table; you can think of it as a 2-dimensional array with rows and columns. A data frame can store numeric, string, and character data.

We use pandas to perform operations on data frames.
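
To make the "table with rows and columns" idea concrete, here is a minimal sketch. The two-column frame below is only an illustration built from a few values that appear later in this post; the variable name df is an assumption.

```python
import pandas as pd

# A tiny illustrative data frame: one text column, one numeric column.
df = pd.DataFrame({
    "CountryName": ["Aruba", "Afghanistan", "Angola"],
    "BirthRate": [10.244, 35.253, 45.985],
})

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # each column carries its own data type
```

Each column holds a single data type, which is why text and numbers sit in separate columns.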


Install pandas
pip install pandas

Import

import pandas as pd

How to work with a DataFrame:

You can download the dataset we are going to work with from here.


import pandas as pd
stats = pd.read_csv('DemographicData.csv') # reads the file and creates the data frame
stats

output

     Country Name          Country Code  Birth rate  Internet users  Income Group
0    Aruba                 ABW               10.244            78.9  High income
1    Afghanistan           AFG               35.253             5.9  Low income
2    Angola                AGO               45.985            19.1  Upper middle income
3    Albania               ALB               12.877            57.2  Upper middle income
4    United Arab Emirates  ARE               11.044            88.0  High income
..   ...                   ...                  ...             ...  ...
190  Yemen, Rep.           YEM               32.947            20.0  Lower middle income
191  South Africa          ZAF               20.850            46.5  Upper middle income
192  Congo, Dem. Rep.      COD               42.394             2.2  Low income
193  Zambia                ZMB               40.471            15.4  Lower middle income
194  Zimbabwe              ZWE               35.715            18.5  Low income

195 rows × 5 columns
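
If you do not have the CSV file at hand, the same call can be tried on in-memory text. This is only a sketch: the three rows below are copied from the output above, and io.StringIO stands in for the real file, so the name sample and the shortened data are assumptions.

```python
import io
import pandas as pd

# Three rows copied from the output above, used as a stand-in for the file.
csv_text = """Country Name,Country Code,Birth rate,Internet users,Income Group
Aruba,ABW,10.244,78.9,High income
Afghanistan,AFG,35.253,5.9,Low income
Angola,AGO,45.985,19.1,Upper middle income
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))
print(sample.shape)
print(list(sample.columns))
```

The first line of the text becomes the column names, and pandas works out the type of each column by itself.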

-------------------------------------------------------------------

len(stats)

195

# this is the number of rows in the stats dataframe
------------------------------------------
stats.head()

     Country Name          Country Code  Birth rate  Internet users  Income Group
0    Aruba                 ABW               10.244            78.9  High income
1    Afghanistan           AFG               35.253             5.9  Low income
2    Angola                AGO               45.985            19.1  Upper middle income
3    Albania               ALB               12.877            57.2  Upper middle income
4    United Arab Emirates  ARE               11.044            88.0  High income
# it gives the top five rows
------------------------------------------
stats.tail()

     Country Name          Country Code  Birth rate  Internet users  Income Group
190  Yemen, Rep.           YEM               32.947            20.0  Lower middle income
191  South Africa          ZAF               20.850            46.5  Upper middle income
192  Congo, Dem. Rep.      COD               42.394             2.2  Low income
193  Zambia                ZMB               40.471            15.4  Lower middle income
194  Zimbabwe              ZWE               35.715            18.5  Low income
# it gives the bottom five rows
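
Both head() and tail() also take an optional row count, in case five rows is not what you want. A small sketch on a made-up ten-row frame (the frame df and its column n are illustrative only):

```python
import pandas as pd

# Ten rows numbered 0..9, just to make the slices easy to see.
df = pd.DataFrame({"n": range(10)})

first_three = df.head(3)  # first 3 rows instead of the default 5
last_two = df.tail(2)     # last 2 rows; the original index labels are kept

print(first_three)
print(last_two)
```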
----------------------------------------------------
lst = stats.columns
lst

Index(['Country Name', 'Country Code', 'Birth rate', 'Internet users',
       'Income Group'],
      dtype='object')

# it gives you the column labels (as an Index object)


-----------------------------------------------------
stats.columns = ['CountryName', 'CountryCode', 'BirthRate', 'InternetUsers', 'IncomeGroup']
stats.head()

     CountryName           CountryCode    BirthRate   InternetUsers  IncomeGroup
0    Aruba                 ABW               10.244            78.9  High income
1    Afghanistan           AFG               35.253             5.9  Low income
2    Angola                AGO               45.985            19.1  Upper middle income
3    Albania               ALB               12.877            57.2  Upper middle income
4    United Arab Emirates  ARE               11.044            88.0  High income


# notice that we have changed the column names
# and removed the spaces from them
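
Assigning a full list works, but it needs every name in the right order. Two alternatives are sketched below on a small made-up frame (df and its values are illustrative): rename() changes only the columns you name, and the .str accessor on the column Index can strip spaces from all labels at once.

```python
import pandas as pd

df = pd.DataFrame({
    "Country Name": ["Aruba", "Angola"],
    "Birth rate": [10.244, 45.985],
    "Internet users": [78.9, 19.1],
})

# Option 1: rename only the listed columns; returns a new DataFrame.
renamed = df.rename(columns={"Birth rate": "BirthRate",
                             "Internet users": "InternetUsers"})

# Option 2: remove spaces from every label in one go.
# The letter case is untouched, so this gives "Birthrate", not "BirthRate".
no_spaces = df.copy()
no_spaces.columns = df.columns.str.replace(" ", "", regex=False)

print(list(renamed.columns))
print(list(no_spaces.columns))
```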

--------------------------------------------------------------------
stats.CountryName


0                     Aruba
1               Afghanistan
2                    Angola
3                   Albania
4      United Arab Emirates
               ...         
190             Yemen, Rep.
191            South Africa
192        Congo, Dem. Rep.
193                  Zambia
194                Zimbabwe
Name: CountryName, Length: 195, dtype: object


# you can see how to access all the elements of any column
# here we accessed all the elements of the CountryName column
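
The dot form only works when the column name is a valid Python identifier, which is one practical reason for removing the spaces earlier. Bracket access works for any name. A short sketch on an illustrative frame (df is made up for this example):

```python
import pandas as pd

df = pd.DataFrame({
    "CountryName": ["Aruba", "Angola"],
    "Birth rate": [10.244, 45.985],  # this name still contains a space
})

names_dot = df.CountryName        # attribute access
names_bracket = df["CountryName"] # bracket access, same result
rates = df["Birth rate"]          # only brackets work for a name with a space

print(names_dot.equals(names_bracket))
print(rates)
```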

-----------------------------------------------------------------
stats[['CountryName','BirthRate']].head()


     CountryName            BirthRate
0    Aruba                     10.244
1    Afghanistan               35.253
2    Angola                    45.985
3    Albania                   12.877
4    United Arab Emirates      11.044
# here we accessed two columns

-----------------------------------------------------------
stats[4:8][['CountryName','BirthRate']]


     CountryName            BirthRate
4    United Arab Emirates      11.044
5    Argentina                 17.716
6    Armenia                   13.308
7    Antigua and Barbuda       16.447

# accessed rows 4 to 7 of the CountryName and BirthRate columns
# in the slice 4:8, 4 is included and 8 is excluded (not included)
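
The same selection can also be written with iloc (by position) or loc (by label). This is a sketch on the eight rows shown so far; the names cols, chained, by_position and by_label are assumptions. The point to notice is that loc includes the end label, while plain slicing and iloc exclude the end position.

```python
import pandas as pd

df = pd.DataFrame({
    "CountryName": ["Aruba", "Afghanistan", "Angola", "Albania",
                    "United Arab Emirates", "Argentina", "Armenia",
                    "Antigua and Barbuda"],
    "BirthRate": [10.244, 35.253, 45.985, 12.877,
                  11.044, 17.716, 13.308, 16.447],
})

cols = ["CountryName", "BirthRate"]
chained = df[4:8][cols]          # the form used above
by_position = df.iloc[4:8]       # positions 4..7, end excluded
by_label = df.loc[4:7, cols]     # labels 4..7, end included

print(chained.equals(by_label))
```

Doing the row and column selection in one loc call avoids creating an intermediate frame.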

--------------------------------------------------------
df1 = stats[4:8] 
df1

     CountryName           CountryCode    BirthRate   InternetUsers  IncomeGroup
4    United Arab Emirates  ARE               11.044            88.0  High income
5    Argentina             ARG               17.716            59.9  High income
6    Armenia               ARM               13.308            41.9  Lower middle income
7    Antigua and Barbuda   ATG               16.447            63.4  High income

----------------------------------------------------------
stats[['CountryName','BirthRate','InternetUsers']]

     CountryName            BirthRate   InternetUsers
0    Aruba                     10.244            78.9
1    Afghanistan               35.253             5.9
2    Angola                    45.985            19.1
3    Albania                   12.877            57.2
4    United Arab Emirates      11.044            88.0
..   ...                          ...             ...
190  Yemen, Rep.               32.947            20.0
191  South Africa              20.850            46.5
192  Congo, Dem. Rep.          42.394             2.2
193  Zambia                    40.471            15.4
194  Zimbabwe                  35.715            18.5

195 rows × 3 columns

-------------------------------------------------------------------------------------

result = stats.BirthRate * stats.InternetUsers

result.head()


0    808.2516
1    207.9927
2    878.3135
3    736.5644
4    971.8720
dtype: float64


# here we performed a mathematical operation
# we multiplied birth rate by internet users
# mathematical operations can be applied to dataframe columns
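
The outputs further down show an extra MyCalc column whose values match this product (808.2516 for Aruba, for example). The step that adds it is not shown in this post, so the assignment below is an assumption about how it was created, sketched on the first five rows only.

```python
import pandas as pd

# First five rows of the dataset, as shown in the outputs above.
stats = pd.DataFrame({
    "CountryName": ["Aruba", "Afghanistan", "Angola", "Albania",
                    "United Arab Emirates"],
    "BirthRate": [10.244, 35.253, 45.985, 12.877, 11.044],
    "InternetUsers": [78.9, 5.9, 19.1, 57.2, 88.0],
})

# Assumed step: store the element-wise product as a new column.
stats["MyCalc"] = stats.BirthRate * stats.InternetUsers

print(stats)
```

Note that a new column has to be created with brackets; stats.MyCalc = ... would only set an attribute on the object, not add a column.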

------------------------------------------------------------------
stats[:]
     Country Name          Country Code  Birth rate  Internet users  Income Group
0    Aruba                 ABW               10.244            78.9  High income
1    Afghanistan           AFG               35.253             5.9  Low income
2    Angola                AGO               45.985            19.1  Upper middle income
3    Albania               ALB               12.877            57.2  Upper middle income
4    United Arab Emirates  ARE               11.044            88.0  High income
..   ...                   ...                  ...             ...  ...
190  Yemen, Rep.           YEM               32.947            20.0  Lower middle income
191  South Africa          ZAF               20.850            46.5  Upper middle income
192  Congo, Dem. Rep.      COD               42.394             2.2  Low income
193  Zambia                ZMB               40.471            15.4  Lower middle income
194  Zimbabwe              ZWE               35.715            18.5  Low income

195 rows × 5 columns


---------------------------------------------------------------------------

stats[::2]


     CountryName           CountryCode    BirthRate   InternetUsers  IncomeGroup              MyCalc
0    Aruba                 ABW               10.244            78.9  High income            808.2516
2    Angola                AGO               45.985            19.1  Upper middle income    878.3135
4    United Arab Emirates  ARE               11.044            88.0  High income            971.8720
6    Armenia               ARM               13.308            41.9  Lower middle income    557.6052
8    Australia             AUS               13.200            83.0  High income           1095.6000
..   ...                   ...                  ...             ...  ...                         ...
186  Vietnam               VNM               15.537            43.9  Lower middle income    682.0743
188  West Bank and Gaza    PSE               30.394            46.6  Lower middle income   1416.3604
190  Yemen, Rep.           YEM               32.947            20.0  Lower middle income    658.9400
192  Congo, Dem. Rep.      COD               42.394             2.2  Low income              93.2668
194  Zimbabwe              ZWE               35.715            18.5  Low income             660.7275

98 rows × 6 columns

# here we take every second row, skipping one row each time

------------------------------------------------------------------

stats[::-1]

     CountryName           CountryCode    BirthRate   InternetUsers  IncomeGroup              MyCalc
194  Zimbabwe              ZWE               35.715            18.5  Low income             660.7275
193  Zambia                ZMB               40.471            15.4  Lower middle income    623.2534
192  Congo, Dem. Rep.      COD               42.394             2.2  Low income              93.2668
191  South Africa          ZAF               20.850            46.5  Upper middle income    969.5250
190  Yemen, Rep.           YEM               32.947            20.0  Lower middle income    658.9400
..   ...                   ...                  ...             ...  ...                         ...
4    United Arab Emirates  ARE               11.044            88.0  High income            971.8720
3    Albania               ALB               12.877            57.2  Upper middle income    736.5644
2    Angola                AGO               45.985            19.1  Upper middle income    878.3135
1    Afghanistan           AFG               35.253             5.9  Low income             207.9927
0    Aruba                 ABW               10.244            78.9  High income            808.2516

195 rows × 6 columns


# here the rows start from the end (reverse order)
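
These slices follow the usual Python start:stop:step rule. A sketch on a made-up frame with 195 rows (df and its column n are illustrative) shows why the step-2 slice has 98 rows and how the reversed frame keeps its old index labels unless you reset them:

```python
import pandas as pd

# 195 rows, the same length as the dataset used in this post.
df = pd.DataFrame({"n": range(195)})

every_second = df[::2]   # rows 0, 2, 4, ..., 194
reversed_rows = df[::-1] # last row first; index labels run 194..0

# Optional: renumber the reversed rows from 0 again.
renumbered = reversed_rows.reset_index(drop=True)

print(len(every_second))
print(reversed_rows.index[0], renumbered.index[0])
```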

----------------------------------------------------------------------------------

stats


     CountryName           CountryCode    BirthRate   InternetUsers  IncomeGroup              MyCalc
0    Aruba                 ABW               10.244            78.9  High income            808.2516
1    Afghanistan           AFG               35.253             5.9  Low income             207.9927
2    Angola                AGO               45.985            19.1  Upper middle income    878.3135
3    Albania               ALB               12.877            57.2  Upper middle income    736.5644
4    United Arab Emirates  ARE               11.044            88.0  High income            971.8720
..   ...                   ...                  ...             ...  ...                         ...
190  Yemen, Rep.           YEM               32.947            20.0  Lower middle income    658.9400
191  South Africa          ZAF               20.850            46.5  Upper middle income    969.5250
192  Congo, Dem. Rep.      COD               42.394             2.2  Low income              93.2668
193  Zambia                ZMB               40.471            15.4  Lower middle income    623.2534
194  Zimbabwe              ZWE               35.715            18.5  Low income             660.7275

195 rows × 6 columns

# here we are simply displaying the whole dataframe

--------------------------------------------------------------------------------------------------------------------------------------------

stats.iat[2,2]

output:
45.985

# iat accesses a single value by integer position: row position 2 (the 3rd row) and column position 2 (the 3rd column) hold 45.985
---------------------------------------------------------------------------------------

stats.iat[2,'BirthRate']

output:
error

# iat does not accept text labels such as 'BirthRate'; it only takes integer positions. To use a label we have to use at instead. Let's see how.
----------------------------------------------------------------
stats.at[2,'BirthRate']

output:
45.985
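
If you have a column name but want to stay with iat, one option is to look up the column's position first with columns.get_loc(). A sketch on the first three rows (the frame and the variable names are illustrative):

```python
import pandas as pd

stats = pd.DataFrame({
    "CountryName": ["Aruba", "Afghanistan", "Angola"],
    "CountryCode": ["ABW", "AFG", "AGO"],
    "BirthRate": [10.244, 35.253, 45.985],
})

by_position = stats.iat[2, 2]        # integer row and column positions
by_label = stats.at[2, "BirthRate"]  # row label and column label

# Translate the column name into its integer position, then use iat.
col_pos = stats.columns.get_loc("BirthRate")
via_lookup = stats.iat[2, col_pos]

print(by_position, by_label, via_lookup)
```

Here the row labels happen to equal the row positions; after filtering or reversing a frame they can differ, and then at and iat may return different rows.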
---------------------------------------------------------------------------------------
#We can also do filtering here

Filter = (stats.InternetUsers < 2)
stats[Filter]

     CountryName           CountryCode    BirthRate   InternetUsers  IncomeGroup              MyCalc
11   Burundi               BDI               44.151             1.3  Low income              57.3963
52   Eritrea               ERI               34.800             0.9  Low income              31.3200
55   Ethiopia              ETH               32.925             1.9  Low income              62.5575
64   Guinea                GIN               37.337             1.6  Low income              59.7392
117  Myanmar               MMR               18.119             1.6  Lower middle income     28.9904
127  Niger                 NER               49.661             1.7  Low income              84.4237
154  Sierra Leone          SLE               36.729             1.7  Low income              62.4393
156  Somalia               SOM               43.891             1.5  Low income              65.8365
172  Timor-Leste           TLS               35.755             1.1  Lower middle income     39.3305
# here we are asking pandas for the rows in which InternetUsers is less than 2
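
A filter is just a column of True/False values, so several conditions can be combined with & (and) or | (or). Each condition needs its own parentheses. The sketch below uses the first five rows and a threshold of 20 instead of 2, purely so that the small sample returns something; both choices are assumptions for illustration.

```python
import pandas as pd

stats = pd.DataFrame({
    "CountryName": ["Aruba", "Afghanistan", "Angola", "Albania",
                    "United Arab Emirates"],
    "InternetUsers": [78.9, 5.9, 19.1, 57.2, 88.0],
    "IncomeGroup": ["High income", "Low income", "Upper middle income",
                    "Upper middle income", "High income"],
})

# Single condition: a boolean Series, one True/False per row.
low_usage = stats[stats.InternetUsers < 20]

# Two conditions joined with &; the parentheses are required.
low_usage_low_income = stats[(stats.InternetUsers < 20)
                             & (stats.IncomeGroup == "Low income")]

print(low_usage)
print(low_usage_low_income)
```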
---------------------------------------------------------------------

stats.describe()


        BirthRate  InternetUsers
count  195.000000     195.000000
mean    21.469928      42.076471
std     10.605467      29.030788
min      7.900000       0.900000
25%     12.120500      14.520000
50%     19.680000      41.000000
75%     29.759500      66.225000
max     49.661000      96.546800

# describe() gives summary statistics for your numeric columns
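
The result of describe() is itself a data frame, so individual statistics can be read out with loc. A sketch on the first five rows only (so the numbers differ from the full-dataset output above):

```python
import pandas as pd

stats = pd.DataFrame({
    "CountryName": ["Aruba", "Afghanistan", "Angola", "Albania",
                    "United Arab Emirates"],
    "BirthRate": [10.244, 35.253, 45.985, 12.877, 11.044],
    "InternetUsers": [78.9, 5.9, 19.1, 57.2, 88.0],
})

summary = stats.describe()  # text columns are left out by default

mean_birth_rate = summary.loc["mean", "BirthRate"]
row_count = summary.loc["count", "BirthRate"]

print(summary)
print(mean_birth_rate, row_count)
```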
--------------------------------------------------------------------------------------------

Creating a dataframe from a dictionary


# named data rather than dict so that Python's built-in dict is not shadowed
data = {"ID": ['101', '102', '103'], "name": ['Ram', 'shayam', 'gyan'],
        "City": ['rewa', 'satna', 'Bhopal']}
data

output:
{'ID': ['101', '102', '103'],
 'name': ['Ram', 'shayam', 'gyan'],
 'City': ['rewa', 'satna', 'Bhopal']}

# now creating the dataframe

s = pd.DataFrame(data)
s

output:

    ID    name    City
0  101     Ram    rewa
1  102  shayam   satna
2  103    gyan  Bhopal

# and here is our dataframe
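
A dictionary of lists describes the table column by column. The same table can also be described row by row, as a list of dictionaries. The sketch below uses the same three records; the names records, people and indexed are assumptions, and set_index() is shown only as one optional way to use the ID column as the row label.

```python
import pandas as pd

# One dictionary per row instead of one list per column.
records = [
    {"ID": "101", "name": "Ram", "City": "rewa"},
    {"ID": "102", "name": "shayam", "City": "satna"},
    {"ID": "103", "name": "gyan", "City": "Bhopal"},
]

people = pd.DataFrame(records)

# Optional: use the ID column as the row labels.
indexed = people.set_index("ID")

print(people)
print(indexed.loc["102", "City"])
```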
-----------------------------------------------------------------------------------------------------

Doubts?

Ask me in the comment section.



