Pandas Data Frame Operations
Pandas Data Frame
In the pandas library, a Data Frame is a two-dimensional, labeled data structure with rows and columns, where each column is a pandas Series and a single row is also returned as a Series when it is selected. A data frame has a row index and a column index, which can be used to access the data. Let's explain this with an example. First we will import the pandas library.
Let's create some random data for our Data Frame operations.
We created this data frame from a dictionary, and we saw what our data looks like in the frame. Now we will look at a few common operations we can perform on the data frame.
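As a minimal sketch of this step, the block below imports pandas and builds a data frame from a dictionary of random values. The column names (A, B, C, D), the number of rows, and the random seed are assumptions made for illustration; only the column A is mentioned later in the article.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: column names, sizes and seed are assumptions.
rng = np.random.default_rng(seed=0)
data = {
    "A": rng.integers(0, 100, size=10),   # random integers
    "B": rng.integers(0, 100, size=10),   # random integers
    "C": rng.random(10),                  # random floats in [0, 1)
    "D": list("abcdefghij"),              # one letter per row
}

# Each dictionary key becomes a column; the row index defaults to 0..9.
df = pd.DataFrame(data)
print(df)
```

Each key of the dictionary becomes a column name, and all the value lists must have the same length.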
1. Head operation: Shows the first 5 rows by default; we can pass any number of rows we want to see.
2. Tail operation: Shows the last 5 rows by default; we can pass any number of rows we want to see.
3. Accessing columns from the Data Frame: There are several ways to access a particular column or a group of columns: indexing with square brackets, or using the loc or iloc indexers. We will discuss them as we go through the article.
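The three operations listed above could look like the following sketch. The data frame here is a small hypothetical one, not the article's original data.

```python
import pandas as pd

# Hypothetical data frame with 10 rows.
df = pd.DataFrame({"A": range(10), "B": range(10, 20), "C": list("abcdefghij")})

first_five = df.head()     # first 5 rows by default
first_three = df.head(3)   # first 3 rows
last_five = df.tail()      # last 5 rows by default
last_two = df.tail(2)      # last 2 rows

col_a = df["A"]            # a single column name returns a Series
cols_ab = df[["A", "B"]]   # a list of column names returns a DataFrame

print(first_three)
print(last_two)
```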
In the operations above we combined two steps while accessing the columns: we selected the column A and then called head() on it. This works because selecting a single column returns a pandas Series (selecting a list of columns returns a data frame), and the next operation can be called directly on that result, so the two steps do not have to be written separately.
We basically combined both operations into one expression. We can do this for far more complex operations, which we will see later.
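To illustrate the point, here is a small sketch that writes the two steps separately and then as one chained expression. The data is hypothetical.

```python
import pandas as pd

# Hypothetical data frame.
df = pd.DataFrame({"A": range(10), "B": range(10, 20)})

# Two separate steps: select the column, then take its first rows.
col_a = df["A"]
step_by_step = col_a.head(3)

# The same thing as a single chained expression.
chained = df["A"].head(3)

print(chained)
```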
Next we will explore loc and iloc, which are very important for our analysis. Both are indexers used with square brackets rather than functions called with parentheses. Let's see an example of loc.
As the operation above shows, loc takes the row selection first (for example, a slice of row labels) and then the columns, which are specified by column name. Because loc is label based, a row slice includes its end label.
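A minimal loc example might look like this, assuming a hypothetical data frame with the default integer row labels:

```python
import pandas as pd

# Hypothetical data frame with row labels 0..9.
df = pd.DataFrame({"A": range(10), "B": range(10, 20), "C": range(20, 30)})

# Rows labeled 2 through 5 (the end label is included), columns A and B.
subset = df.loc[2:5, ["A", "B"]]

print(subset)
```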
Using conditional statements inside loc: This is quite important, because it lets us filter the data frame by a condition. Before that, we will look at how a condition on its own is evaluated.
As shown above, we passed a condition to loc, and it returned another data frame containing only the matching rows. We can then apply a function on top of that result for further operations.
Here we use apply() with a lambda function. On a single column (a Series), apply() calls the function on each element; on a whole data frame, it calls the function on each column (or on each row with axis=1).
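The sketch below first evaluates a condition on its own, which gives a boolean Series, and then passes that condition to loc. The data and the threshold are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical data frame.
df = pd.DataFrame({"A": [1, 7, 3, 9, 5], "B": [10, 20, 30, 40, 50]})

# A condition on a column evaluates to a boolean Series, one value per row.
mask = df["A"] > 4

# Passing the condition to loc keeps only the rows where it is True.
filtered = df.loc[mask]

# A column can be selected in the same call.
filtered_b = df.loc[df["A"] > 4, "B"]

print(mask)
print(filtered)
```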
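As a rough sketch of applying a function to the filtered result, the block below uses apply() with a lambda on a column and, for contrast, on a whole data frame. The data and the lambdas are hypothetical.

```python
import pandas as pd

# Hypothetical data frame.
df = pd.DataFrame({"A": [1, 7, 3, 9, 5], "B": [10, 20, 30, 40, 50]})

# Series.apply: the lambda receives one element at a time.
doubled = df.loc[df["A"] > 4, "B"].apply(lambda x: x * 2)

# DataFrame.apply: the lambda receives one whole column at a time.
col_ranges = df.apply(lambda col: col.max() - col.min())

print(doubled)
print(col_ranges)
```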
Next we will look at iloc, which, like loc, takes a section of the data frame out, but is purely based on integer position. We use integer positions for the rows as well as for the columns, in the form
iloc[row_start_index:row_stop_index,column_start_index:column_stop_index]
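The form above can be tried on a small hypothetical data frame. As with ordinary Python slicing, the stop positions are excluded, which differs from label-based slicing with loc.

```python
import pandas as pd

# Hypothetical data frame.
df = pd.DataFrame({"A": range(10), "B": range(10, 20), "C": range(20, 30)})

# Rows at positions 1, 2, 3 and columns at positions 0, 1.
block = df.iloc[1:4, 0:2]

# For comparison: loc includes the end label, so this has one more row.
by_label = df.loc[1:4, ["A", "B"]]

print(block)
```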
As with the operations above, once we have selected a section of the rows based on a condition, we can perform aggregation operations on the result.
Next we move to another important topic, data frame merging and concatenation, which is similar to join operations in SQL. For this we will create two data frames, each with a column called key that is present in both, along with some data for it.
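A short sketch of aggregating a conditionally selected section follows. The data, the condition, and the choice of aggregations are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data frame.
df = pd.DataFrame({"A": [1, 7, 3, 9, 5], "B": [10, 20, 30, 40, 50]})

# Select the rows that satisfy the condition.
selected = df.loc[df["A"] > 4]

total_b = selected["B"].sum()           # sum of B over the selected rows
mean_a = selected["A"].mean()           # mean of A over the selected rows
summary = selected.agg(["min", "max"])  # several aggregations at once

print(total_b, mean_a)
print(summary)
```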
A simple merge combines the data frames on the key values they have in common, which is also called an inner join.
This is an outer join, which keeps all the rows from both data frames and fills in missing values where a key is present in only one of them.
This is a left join: the result is based on the left data frame, which has the row with the key "e".
This is a right join: the result is based on the right data frame, which does not have the row with the key "e".
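A simple (inner) merge could be sketched as follows. The two data frames are hypothetical: only the shared key column and the fact that the key "e" appears in the left frame alone come from the article, while the other keys and values are assumptions.

```python
import pandas as pd

# Hypothetical frames sharing a "key" column.
left = pd.DataFrame({"key": ["a", "b", "c", "e"], "left_value": [1, 2, 3, 4]})
right = pd.DataFrame({"key": ["a", "b", "c", "d"], "right_value": [10, 20, 30, 40]})

# how="inner" is the default: only keys present in both frames are kept.
inner = pd.merge(left, right, on="key")

print(inner)
```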
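An outer join on the same kind of data might look like this sketch. The frames are hypothetical and are redefined here so that the example runs on its own.

```python
import pandas as pd

# Hypothetical frames sharing a "key" column.
left = pd.DataFrame({"key": ["a", "b", "c", "e"], "left_value": [1, 2, 3, 4]})
right = pd.DataFrame({"key": ["a", "b", "c", "d"], "right_value": [10, 20, 30, 40]})

# how="outer" keeps every key from either frame; gaps are filled with NaN.
outer = pd.merge(left, right, on="key", how="outer")

print(outer)
```

The row for "e" has no right_value and the row for "d" has no left_value, so those cells are NaN.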
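The left and right joins described above can be sketched together. Again the frames are hypothetical, with the key "e" present only in the left one.

```python
import pandas as pd

# Hypothetical frames sharing a "key" column.
left = pd.DataFrame({"key": ["a", "b", "c", "e"], "left_value": [1, 2, 3, 4]})
right = pd.DataFrame({"key": ["a", "b", "c", "d"], "right_value": [10, 20, 30, 40]})

# Left join: every row of the left frame is kept, so "e" appears.
left_join = pd.merge(left, right, on="key", how="left")

# Right join: every row of the right frame is kept, so "e" is absent.
right_join = pd.merge(left, right, on="key", how="right")

print(left_join)
print(right_join)
```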
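We can also use the pd.concat function to combine data frames, choosing the axis to concatenate along and, through its join parameter, whether to keep the union ("outer") or the intersection ("inner") of the labels on the other axis.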
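A minimal sketch of pd.concat is shown below, using two hypothetical frames whose columns only partly overlap so that the effect of the join parameter is visible.

```python
import pandas as pd

# Hypothetical frames; df2 has an extra column.
df1 = pd.DataFrame({"key": ["a", "b"], "value": [1, 2]})
df2 = pd.DataFrame({"key": ["c", "d"], "value": [3, 4], "extra": [0.5, 0.6]})

# axis=0 stacks rows; the default join="outer" keeps all columns (NaN where missing).
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# join="inner" keeps only the columns present in every frame.
stacked_inner = pd.concat([df1, df2], axis=0, join="inner", ignore_index=True)

# axis=1 places the frames side by side, aligned on the row index.
side_by_side = pd.concat([df1, df2], axis=1)

print(stacked)
print(stacked_inner)
print(side_by_side)
```

Unlike merge, concat does not match rows on a key column; it only aligns on the index of the other axis.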