Pandas in Python

Pandas

Pandas is an open-source data analysis library for Python. Put simply, pandas is a package for working with data frames.

A data frame is like a table; it would not be wrong to say that a data frame is like a two-dimensional array with rows and columns. A data frame can store numeric, string, and character data.

We use pandas to perform operations on data frames.
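To make the table idea concrete, here is a minimal sketch of a small data frame with one text column and one numeric column. The variable name df is only for illustration; the three rows are copied from the dataset used later in this post.

```python
import pandas as pd

# Illustrative frame: three rows copied from the dataset used later in this post
df = pd.DataFrame({
    "CountryName": ["Aruba", "Afghanistan", "Angola"],
    "BirthRate": [10.244, 35.253, 45.985],
})

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # one data type per column
```

Each column keeps its own data type, which is why text and numbers can sit side by side in the same table.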

Install pandas

pip install pandas

Import

import pandas as pd

How to work with a DataFrame:

You can download the dataset we are going to work with from here.

https://github.com/StephanieStallworth/Exploratory_Data_Analysis_Visualization_Python/blob/master/DemographicData.csv

import pandas as pd

stats = pd.read_csv('DemographicData.csv')  # reads the file and creates the data frame

stats

output

Country Name	Country Code	Birth rate	Internet users	Income Group
0	Aruba	ABW	10.244	78.9	High income
1	Afghanistan	AFG	35.253	5.9	Low income
2	Angola	AGO	45.985	19.1	Upper middle income
3	Albania	ALB	12.877	57.2	Upper middle income
4	United Arab Emirates	ARE	11.044	88.0	High income
...	...	...	...	...	...
190	Yemen, Rep.	YEM	32.947	20.0	Lower middle income
191	South Africa	ZAF	20.850	46.5	Upper middle income
192	Congo, Dem. Rep.	COD	42.394	2.2	Low income
193	Zambia	ZMB	40.471	15.4	Lower middle income
194	Zimbabwe	ZWE	35.715	18.5	Low income

195 rows × 5 columns
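To try read_csv without downloading anything, the sketch below reads the same kind of data from an in-memory string. The names csv_text and sample are assumptions for illustration; the header and two rows are copied from the output above.

```python
import io
import pandas as pd

# Stand-in for DemographicData.csv: the header and the first two rows shown above
csv_text = """Country Name,Country Code,Birth rate,Internet users,Income Group
Aruba,ABW,10.244,78.9,High income
Afghanistan,AFG,35.253,5.9,Low income
"""

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))
print(sample)
```

The first line of the text becomes the column names, and pandas adds the row index (0, 1, ...) on its own.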

-------------------------------------------------------------------

len(stats)

# this returns the number of rows in the stats data frame
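As a small sketch of the same idea, len() counts the rows, while the shape attribute reports rows and columns together. The three-row frame df is only a stand-in for stats.

```python
import pandas as pd

# Stand-in for stats: three rows copied from the output above
df = pd.DataFrame({
    "Country Name": ["Aruba", "Afghanistan", "Angola"],
    "Birth rate": [10.244, 35.253, 45.985],
})

row_count = len(df)        # number of rows
rows, columns = df.shape   # (number of rows, number of columns)
print(row_count, rows, columns)
```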

------------------------------------------

stats.head()

 Country Name Country Code Birth rate Internet users Income Group
0 Aruba ABW 10.244 78.9 High income
1 Afghanistan AFG 35.253 5.9 Low income
2 Angola AGO 45.985 19.1 Upper middle income
3 Albania ALB 12.877 57.2 Upper middle income
4 United Arab Emirates ARE 11.044 88.0 High income

# it returns the first five rows

------------------------------------------

stats.tail()

 Country Name Country Code Birth rate Internet users Income Group
190 Yemen, Rep. YEM 32.947 20.0 Lower middle income
191 South Africa ZAF 20.850 46.5 Upper middle income
192 Congo, Dem. Rep. COD 42.394 2.2 Low income
193 Zambia ZMB 40.471 15.4 Lower middle income
194 Zimbabwe ZWE 35.715 18.5 Low income

# it returns the last five rows
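Both head() and tail() also accept the number of rows you want. A minimal sketch, using a five-row stand-in for stats (the name df is an assumption):

```python
import pandas as pd

# Stand-in for stats: the first five rows of the dataset
df = pd.DataFrame({
    "Country Name": ["Aruba", "Afghanistan", "Angola", "Albania",
                     "United Arab Emirates"],
    "Birth rate": [10.244, 35.253, 45.985, 12.877, 11.044],
})

first_two = df.head(2)  # first two rows
last_two = df.tail(2)   # last two rows
print(first_two)
print(last_two)
```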

----------------------------------------------------

lst = stats.columns
lst

Index(['Country Name', 'Country Code', 'Birth rate', 'Internet users',
       'Income Group'],
      dtype='object')

# it returns the column labels as an Index object
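If you need a plain Python list rather than an Index, you can convert it. A short sketch on an empty stand-in frame (df and labels are illustrative names):

```python
import pandas as pd

# Empty stand-in frame that only carries the original column labels
df = pd.DataFrame(columns=["Country Name", "Country Code", "Birth rate",
                           "Internet users", "Income Group"])

labels = list(df.columns)          # Index converted to a plain list
print(labels)
print("Birth rate" in df.columns)  # membership test on the Index
```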


-----------------------------------------------------
stats.columns = ['CountryName', 'CountryCode', 'BirthRate', 'InternetUsers', 'IncomeGroup']
stats.head()

CountryName CountryCode BirthRate InternetUsers IncomeGroup
0 Aruba ABW 10.244 78.9 High income
1 Afghanistan AFG 35.253 5.9 Low income
2 Angola AGO 45.985 19.1 Upper middle income
3 Albania ALB 12.877 57.2 Upper middle income
4 United Arab Emirates ARE 11.044 88.0 High income


# notice that we have changed the names of the columns
# we have removed the spaces inside the column names
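Assigning to stats.columns replaces every label at once. As an alternative sketch (not the approach used in this post), rename() changes only the labels you list in a mapping; the two-row frame df is a stand-in.

```python
import pandas as pd

# Stand-in frame with two of the original column names
df = pd.DataFrame({
    "Country Name": ["Aruba", "Afghanistan"],
    "Birth rate": [10.244, 35.253],
})

# rename() returns a new data frame; columns missing from the mapping keep their names
renamed = df.rename(columns={"Country Name": "CountryName",
                             "Birth rate": "BirthRate"})
print(list(renamed.columns))
print(list(df.columns))  # the original frame is unchanged
```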

--------------------------------------------------------------------
stats.CountryName


0                     Aruba
1               Afghanistan
2                    Angola
3                   Albania
4      United Arab Emirates
               ...         
190             Yemen, Rep.
191            South Africa
192        Congo, Dem. Rep.
193                  Zambia
194                Zimbabwe
Name: CountryName, Length: 195, dtype: object


# you can see how we can access all the elements of any column
# here we accessed all the elements of the CountryName column
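The dot syntax only works when the column name is a valid Python identifier, which is one reason to remove spaces from the names. Bracket syntax works for any name. A small sketch, where df is a stand-in that deliberately keeps one name with a space:

```python
import pandas as pd

# Stand-in frame: one name without a space, one name with a space
df = pd.DataFrame({
    "CountryName": ["Aruba", "Afghanistan", "Angola"],
    "Birth rate": [10.244, 35.253, 45.985],
})

names_attr = df.CountryName        # attribute (dot) access
names_bracket = df["CountryName"]  # bracket access, same column
birth = df["Birth rate"]           # a name with a space needs brackets
print(names_attr.equals(names_bracket))
print(birth)
```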

-----------------------------------------------------------------
stats[['CountryName','BirthRate']].head()


 CountryName BirthRate
0 Aruba 10.244
1 Afghanistan 35.253
2 Angola 45.985
3 Albania 12.877
4 United Arab Emirates 11.044
# here we accessed two columns

-----------------------------------------------------------
stats[4:8][['CountryName','BirthRate']]


CountryName BirthRate
4 United Arab Emirates 11.044
5 Argentina 17.716
6 Armenia 13.308
7 Antigua and Barbuda 16.447

# we accessed rows 4 to 7 of the CountryName and BirthRate columns
# in the slice 4:8, 4 is included and 8 is excluded (not included)
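The same selection can be written with iloc (by position) or loc (by label). This is a sketch for comparison, not the method used in this post. Note that loc includes the end label, unlike the plain slice. The eight-row frame df is a stand-in for stats.

```python
import pandas as pd

# Stand-in for stats: the first eight rows of the dataset
df = pd.DataFrame({
    "CountryName": ["Aruba", "Afghanistan", "Angola", "Albania",
                    "United Arab Emirates", "Argentina", "Armenia",
                    "Antigua and Barbuda"],
    "BirthRate": [10.244, 35.253, 45.985, 12.877,
                  11.044, 17.716, 13.308, 16.447],
})

by_slice = df[4:8]          # plain slice: 8 is excluded
by_position = df.iloc[4:8]  # positions 4..7: 8 is excluded
by_label = df.loc[4:7]      # labels 4..7: the end label 7 is included
print(by_label)
```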

--------------------------------------------------------
df1 = stats[4:8] 
df1

CountryName CountryCode BirthRate InternetUsers IncomeGroup
4 United Arab Emirates ARE 11.044 88.0 High income
5 Argentina ARG 17.716 59.9 High income
6 Armenia ARM 13.308 41.9 Lower middle income
7 Antigua and Barbuda ATG 16.447 63.4 High income

----------------------------------------------------------
stats[['CountryName','BirthRate','InternetUsers']]

CountryName BirthRate InternetUsers
0 Aruba 10.244 78.9
1 Afghanistan 35.253 5.9
2 Angola 45.985 19.1
3 Albania 12.877 57.2
4 United Arab Emirates 11.044 88.0
... ... ... ...
190 Yemen, Rep. 32.947 20.0
191 South Africa 20.850 46.5
192 Congo, Dem. Rep. 42.394 2.2
193 Zambia 40.471 15.4
194 Zimbabwe 35.715 18.5
195 rows × 3 columns
-------------------------------------------------------------------------------------
result = stats.BirthRate * stats.InternetUsers
result.head()

0    808.2516
1    207.9927
2    878.3135
3    736.5644
4    971.8720
dtype: float64


# here we performed a mathematical operation
# we multiplied the birth rate by the internet users, row by row
# mathematical operations like this can be used on data frame columns
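Several outputs later in this post show an extra MyCalc column whose values match BirthRate * InternetUsers (for Aruba, 10.244 * 78.9 = 808.2516). The statement that created it is not shown, so the following is only a sketch of how such a column could be added, assuming the usual bracket assignment was used. The frame df is a stand-in for stats.

```python
import pandas as pd

# Stand-in for stats: three rows copied from the dataset
df = pd.DataFrame({
    "CountryName": ["Aruba", "Afghanistan", "Angola"],
    "BirthRate": [10.244, 35.253, 45.985],
    "InternetUsers": [78.9, 5.9, 19.1],
})

# Assumption: the result of the multiplication is stored as a new column
df["MyCalc"] = df.BirthRate * df.InternetUsers
print(df)
```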

------------------------------------------------------------------
stats[:]
CountryName CountryCode BirthRate InternetUsers IncomeGroup
0 Aruba ABW 10.244 78.9 High income
1 Afghanistan AFG 35.253 5.9 Low income
2 Angola AGO 45.985 19.1 Upper middle income
3 Albania ALB 12.877 57.2 Upper middle income
4 United Arab Emirates ARE 11.044 88.0 High income
... ... ... ... ... ...
190 Yemen, Rep. YEM 32.947 20.0 Lower middle income
191 South Africa ZAF 20.850 46.5 Upper middle income
192 Congo, Dem. Rep. COD 42.394 2.2 Low income
193 Zambia ZMB 40.471 15.4 Lower middle income
194 Zimbabwe ZWE 35.715 18.5 Low income
195 rows × 5 columns

---------------------------------------------------------------------------
stats[::2]

CountryName CountryCode BirthRate InternetUsers IncomeGroup MyCalc
0 Aruba ABW 10.244 78.9 High income 808.2516
2 Angola AGO 45.985 19.1 Upper middle income 878.3135
4 United Arab Emirates ARE 11.044 88.0 High income 971.8720
6 Armenia ARM 13.308 41.9 Lower middle income 557.6052
8 Australia AUS 13.200 83.0 High income 1095.6000
... ... ... ... ... ... ...
186 Vietnam VNM 15.537 43.9 Lower middle income 682.0743
188 West Bank and Gaza PSE 30.394 46.6 Lower middle income 1416.3604
190 Yemen, Rep. YEM 32.947 20.0 Lower middle income 658.9400
192 Congo, Dem. Rep. COD 42.394 2.2 Low income 93.2668
194 Zimbabwe ZWE 35.715 18.5 Low income 660.7275
98 rows × 6 columns
# here we take every second row, skipping one row each time
------------------------------------------------------------------
stats[::-1]
CountryName CountryCode BirthRate InternetUsers IncomeGroup MyCalc
194 Zimbabwe ZWE 35.715 18.5 Low income 660.7275
193 Zambia ZMB 40.471 15.4 Lower middle income 623.2534
192 Congo, Dem. Rep. COD 42.394 2.2 Low income 93.2668
191 South Africa ZAF 20.850 46.5 Upper middle income 969.5250
190 Yemen, Rep. YEM 32.947 20.0 Lower middle income 658.9400
... ... ... ... ... ... ...
4 United Arab Emirates ARE 11.044 88.0 High income 971.8720
3 Albania ALB 12.877 57.2 Upper middle income 736.5644
2 Angola AGO 45.985 19.1 Upper middle income 878.3135
1 Afghanistan AFG 35.253 5.9 Low income 207.9927
0 Aruba ABW 10.244 78.9 High income 808.2516
195 rows × 6 columns

# here the rows are listed in reverse order, starting from the last one
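The step part of a slice is easier to follow on a tiny frame. A minimal sketch with five rows (df is a stand-in for stats):

```python
import pandas as pd

# Stand-in for stats: the first five country names
df = pd.DataFrame({
    "CountryName": ["Aruba", "Afghanistan", "Angola", "Albania",
                    "United Arab Emirates"],
})

every_second = df[::2]    # step 2: rows 0, 2, 4
reversed_rows = df[::-1]  # step -1: rows from last to first
print(list(every_second.index))
print(list(reversed_rows.index))
```

The original index labels travel with the rows, which is why the reversed output above starts at 194.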
----------------------------------------------------------------------------------
stats

CountryName CountryCode BirthRate InternetUsers IncomeGroup MyCalc
0 Aruba ABW 10.244 78.9 High income 808.2516
1 Afghanistan AFG 35.253 5.9 Low income 207.9927
2 Angola AGO 45.985 19.1 Upper middle income 878.3135
3 Albania ALB 12.877 57.2 Upper middle income 736.5644
4 United Arab Emirates ARE 11.044 88.0 High income 971.8720
... ... ... ... ... ... ...
190 Yemen, Rep. YEM 32.947 20.0 Lower middle income 658.9400
191 South Africa ZAF 20.850 46.5 Upper middle income 969.5250
192 Congo, Dem. Rep. COD 42.394 2.2 Low income 93.2668
193 Zambia ZMB 40.471 15.4 Lower middle income 623.2534
194 Zimbabwe ZWE 35.715 18.5 Low income 660.7275
195 rows × 6 columns
# here we are simply displaying the whole data frame
--------------------------------------------------------------------------------------------------------------------------------------------

stats.iat[2,2]

output:

45.985

# iat looks up a single value by integer position: row position 2 (the third row) and column position 2 (the third column) hold 45.985

---------------------------------------------------------------------------------------

stats.iat[2,'BirthRate']

output:

error

# iat does not accept labels such as 'BirthRate'; to look up a value by label we have to use at instead. Let's see how.

----------------------------------------------------------------

stats.at[2,'BirthRate']

output:

45.985
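The difference between iat and at can be checked side by side. This is a sketch on a three-row stand-in frame, where BirthRate is at column position 1 rather than 2. It assumes that passing a label to iat raises ValueError.

```python
import pandas as pd

# Stand-in for stats: here BirthRate is the column at position 1
df = pd.DataFrame({
    "CountryName": ["Aruba", "Afghanistan", "Angola"],
    "BirthRate": [10.244, 35.253, 45.985],
})

by_position = df.iat[2, 1]        # row position 2, column position 1
by_label = df.at[2, "BirthRate"]  # row label 2, column label 'BirthRate'

try:
    df.iat[2, "BirthRate"]        # a label where a position is expected
    label_accepted = True
except ValueError:
    label_accepted = False

print(by_position, by_label, label_accepted)
```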
---------------------------------------------------------------------------------------
#We can also do filtering here

Filter = (stats.InternetUsers < 2)
stats[Filter]

CountryName CountryCode BirthRate InternetUsers IncomeGroup MyCalc
11 Burundi BDI 44.151 1.3 Low income 57.3963
52 Eritrea ERI 34.800 0.9 Low income 31.3200
55 Ethiopia ETH 32.925 1.9 Low income 62.5575
64 Guinea GIN 37.337 1.6 Low income 59.7392
117 Myanmar MMR 18.119 1.6 Lower middle income 28.9904
127 Niger NER 49.661 1.7 Low income 84.4237
154 Sierra Leone SLE 36.729 1.7 Low income 62.4393
156 Somalia SOM 43.891 1.5 Low income 65.8365
172 Timor-Leste TLS 35.755 1.1 Lower middle income 39.3305
# here we ask pandas for the rows in which InternetUsers is less than 2
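It may help to look at the filter itself: the comparison produces one True or False per row, and the data frame keeps only the True rows. The sketch below also combines two conditions, which is an extension not shown in this post. The frame df is a stand-in for stats, and the thresholds are arbitrary.

```python
import pandas as pd

# Stand-in for stats: the first eight rows of the dataset
df = pd.DataFrame({
    "CountryName": ["Aruba", "Afghanistan", "Angola", "Albania",
                    "United Arab Emirates", "Argentina", "Armenia",
                    "Antigua and Barbuda"],
    "BirthRate": [10.244, 35.253, 45.985, 12.877,
                  11.044, 17.716, 13.308, 16.447],
    "InternetUsers": [78.9, 5.9, 19.1, 57.2, 88.0, 59.9, 41.9, 63.4],
})

# The comparison produces one True/False value per row
low_internet = df.InternetUsers < 20
print(low_internet.tolist())

# Conditions are combined with & (and) or | (or); each needs its own parentheses
both = df[(df.BirthRate > 15) & (df.InternetUsers > 50)]
print(both.CountryName.tolist())
```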
---------------------------------------------------------------------

stats.describe()


BirthRate InternetUsers
count 195.000000 195.000000
mean 21.469928 42.076471
std 10.605467 29.030788
min 7.900000 0.900000
25% 12.120500 14.520000
50% 19.680000 41.000000
75% 29.759500 66.225000
max 49.661000 96.546800

# describe() computes summary statistics for your numeric columns
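The result of describe() is itself a data frame, so a single statistic can be read with loc. A minimal sketch on a three-row stand-in frame (df and summary are illustrative names):

```python
import pandas as pd

# Stand-in for stats: one text column and two numeric columns
df = pd.DataFrame({
    "CountryName": ["Aruba", "Afghanistan", "Angola"],
    "BirthRate": [10.244, 35.253, 45.985],
    "InternetUsers": [78.9, 5.9, 19.1],
})

summary = df.describe()  # by default only the numeric columns are summarised
print(summary)
print(summary.loc["mean", "BirthRate"])     # one statistic for one column
print(summary.loc["max", "InternetUsers"])
```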
--------------------------------------------------------------------------------------------

Creating the dataframe using dictionary


# named data instead of dict so that Python's built-in dict is not overwritten
data = {"ID": ['101', '102', '103'], "name": ['Ram', 'shayam', 'gyan'],
        "City": ['rewa', 'satna', 'Bhopal']}
data

output:
{'ID': ['101', '102', '103'],
 'name': ['Ram', 'shayam', 'gyan'],
 'City': ['rewa', 'satna', 'Bhopal']}

# now creating the dataframe

s = pd.DataFrame(data)
s

output:

ID name City
0 101 Ram rewa
1 102 shayam satna
2 103 gyan Bhopal

# and here is our data frame
-----------------------------------------------------------------------------------------------------

Doubts?

Ask me in the comments section.
