Pandas Series Operations



Pandas Series

The pandas library provides specialized data structures known as data frames, which resemble an Excel worksheet in that they represent data in the form of rows and columns. A data frame in pandas is a collection of pandas Series. A Series is a one-dimensional labeled array, usually of a single data type, used to represent any sequence of objects in pandas. Combining Series row-wise and column-wise creates a pandas data frame.

Let’s start with the basic pandas Series. We will be using Jupyter notebooks in Visual Studio Code for our tasks, but you can follow along in any Python code editor that has pandas installed. We will start by importing pandas.
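The import statement itself is not shown in this copy of the post. A minimal sketch, assuming the conventional alias `pd`:

```python
import pandas as pd

# Print the installed version to confirm that the import worked.
print(pd.__version__)
```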

After we have successfully imported pandas, we will start by creating a pandas Series.
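A minimal sketch of creating a Series from a Python list. The four values are assumed for illustration; the original example is not preserved here.

```python
import pandas as pd

# Illustrative values (assumed). With no index given, pandas
# labels the elements 0, 1, 2, 3.
s = pd.Series([10, 20, 30, 40])
print(s)
```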
As you can see from the above, the series is represented as key-value pairs. The key is the index, shown on the left from 0 to 3, and the values are on the right. We can view the index separately or change it as per our needs.
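A short sketch of viewing and then changing the index, again with assumed values and assumed labels "a" to "d":

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

# View the index on its own.
default_index = s.index
print(default_index)  # RangeIndex(start=0, stop=4, step=1)

# Replace the index with our own labels (hypothetical labels).
s.index = ["a", "b", "c", "d"]
print(s)
```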
Moreover, you can see from the representation above that the index of the Series is a separate data type called RangeIndex. We can also specify the index while creating the series rather than doing it later.
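One way to do this is the `index` argument of `pd.Series`. The values and labels below are assumed for illustration:

```python
import pandas as pd

# Pass the labels at creation time through the index argument.
s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])
print(s)
```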
Above, we specified the index of the data while creating the series. As with NumPy arrays or Python lists, we can select an individual element of the series by placing its index key inside brackets. We can also select multiple elements by placing a list of indices inside the brackets.
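A minimal sketch of both kinds of selection, using an assumed series with labels "a" to "d":

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# A single key returns a single value.
single = s["b"]
print(single)

# A list of keys returns a new Series with just those elements.
multiple = s[["a", "d"]]
print(multiple)
```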
We can also use this indexing feature to assign values to the series.
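For instance, assuming the same illustrative series, assignment through the brackets might look like this:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# Assign a new value to one element.
s["b"] = 25

# Assign the same value to several elements at once.
s[["a", "d"]] = 0
print(s)
```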
One interesting feature, which will come in handy later during our data frame operations, is called masking. In masking we supply a condition to the series object inside brackets. The condition is evaluated for every element and returns a Boolean array. That Boolean array can then be used to keep only the values that satisfy our condition. We can see an example below:
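A sketch of the masking steps, using a series of the numbers 1 to 10 and the condition "greater than 5". The variable names are assumed.

```python
import pandas as pd

numbers = pd.Series(range(1, 11))

# The comparison is evaluated element by element and
# produces a Boolean Series (the mask).
mask = numbers > 5
print(mask)

# Indexing with the mask keeps only the elements where it is True.
filtered = numbers[mask]
print(filtered)
```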

In the example we create a series of numbers from 1 to 10. Then we apply the condition on the series for numbers greater than 5, and it returns a Boolean array as shown. Now we insert the Boolean array, which is a mask, inside the brackets, and it only returns values which are greater than 5. This can be done in one operation as follows:
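A minimal sketch of the one-step form, where the condition is written directly inside the brackets:

```python
import pandas as pd

numbers = pd.Series(range(1, 11))

# Build the mask and apply it in a single expression.
greater_than_five = numbers[numbers > 5]
print(greater_than_five)
```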

Next, we will see the creation of a pandas Series from a dictionary. The dictionary keys become the index and the dictionary values become the values of the series.
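A minimal sketch with a hypothetical dictionary; the keys and values are invented for illustration:

```python
import pandas as pd

# Hypothetical record; the keys become the index labels.
record = {"name": "Alice", "age": 30, "city": "Paris"}
s = pd.Series(record)
print(s)
```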


The reason for showing this operation is that this kind of series will appear when we are dealing with pandas data frames, where a particular row of the data frame is a pandas Series that we can operate on.
We can similarly convert a pandas Series into a dictionary using the to_dict() method.
We will now explore a few other operations on Series. The list of operations that can be done on a pandas Series is extensive, so we will focus on a few important ones.
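A short sketch of the round trip from dictionary to Series and back, using assumed data:

```python
import pandas as pd

s = pd.Series({"a": 1, "b": 2, "c": 3})

# to_dict() maps each index label to its value.
as_dict = s.to_dict()
print(as_dict)
```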
1. Apply function: The apply() method lets us apply a function to every value of the series, manipulating the values in some way. For example,
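A minimal sketch using a named function. The helper `square` and the values are hypothetical, chosen only for illustration.

```python
import pandas as pd


def square(x):
    """Return the square of a single value."""
    return x * x


s = pd.Series([1, 2, 3, 4])

# apply() calls square once per element and collects the results.
squared = s.apply(square)
print(squared)
```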
We can also use a lambda function for the same operation,
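The lambda form might look like this, with assumed values and squaring as the illustrative operation:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# The lambda is an inline, unnamed function applied to each element.
squared = s.apply(lambda x: x * x)
print(squared)
```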
2. Functions for central tendency: We also have functions for central tendency, variance and standard deviation,
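A sketch of these methods on an assumed series. Note that `var()` and `std()` in pandas default to the sample versions (`ddof=1`), so they divide by n - 1 rather than n.

```python
import pandas as pd

s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

print(s.mean())    # arithmetic mean
print(s.median())  # middle value
print(s.mode())    # most frequent value(s), returned as a Series
print(s.var())     # sample variance (ddof=1 by default)
print(s.std())     # sample standard deviation (ddof=1 by default)
```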
3. Working with missing values (replacing them or dropping them): Sometimes our data contains elements which are NaN values. Based on the type of data we are dealing with, we can either drop them or replace them with values which are relevant to the data. We will illustrate this with an example,
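A minimal sketch of a series containing missing values. The data is assumed, and `isna()` is used here only to count the gaps.

```python
import numpy as np
import pandas as pd

# Illustrative data with two missing entries.
s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])
print(s)

# isna() marks the missing entries; summing the mask counts them.
missing_count = int(s.isna().sum())
print(missing_count)
```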
Here we will drop the np.nan values from the data using the dropna() method,
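For example, on an assumed series with two missing entries:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

# dropna() returns a new Series without the missing entries;
# the original Series is left unchanged.
cleaned = s.dropna()
print(cleaned)
```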
Here we will fill the np.nan values with the value "REP" using the fillna() method,
We can also replace the np.nan values with a measure of central tendency, such as the mean,
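A sketch of this replacement with `fillna()`. A series of strings is assumed here so that the placeholder "REP" matches the type of the other values.

```python
import numpy as np
import pandas as pd

# Hypothetical string data with two missing entries.
s = pd.Series(["red", np.nan, "blue", np.nan])

# fillna() returns a new Series with each missing entry replaced.
filled = s.fillna("REP")
print(filled)
```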
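A minimal sketch that uses the mean as the replacement value; the median would work the same way. The data is assumed.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

# mean() skips missing values by default, so this is (1 + 3 + 5) / 3.
mean_value = s.mean()

filled = s.fillna(mean_value)
print(filled)
```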
Here we are getting only the values which are not np.nan using the notna() method, which returns a Boolean mask array,
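A short sketch of this, combining `notna()` with bracket indexing on assumed data:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

# notna() is True wherever a value is present.
mask = s.notna()
print(mask)

# Using the mask inside brackets keeps only the present values.
present = s[mask]
print(present)
```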