Working with Pandas Data Structures
Pandas is a powerful library for data manipulation and analysis in Python. In this section, we will explore various data structures provided by Pandas, including Series and DataFrame, and how to efficiently work with them.
Overview of Pandas Data Structures
1. Series
A Pandas Series is a one-dimensional, array-like object that can hold any data type. It is similar to a Python list or a NumPy array, but each value is paired with an index label, which supports label-based lookup and alignment in addition to the usual data manipulation methods.
Creating a Series
You can create a Series using the `pd.Series()` constructor:
```python
import pandas as pd

# Creating a Series from a list
data = [10, 20, 30, 40]
series = pd.Series(data)
print(series)
```
Output:
```
0    10
1    20
2    30
3    40
dtype: int64
```
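The index is what distinguishes a Series from a plain list. As a minimal sketch (the labels 'a' through 'd' and the variable name `labeled` are illustrative and do not come from the example above), passing an `index` argument lets you look values up by label as well as by position:

```python
import pandas as pd

# Illustrative labels; any hashable values can serve as an index
labeled = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

by_label = labeled['b']        # label-based lookup
by_position = labeled.iloc[1]  # position-based lookup
subset = labeled[['a', 'd']]   # a list of labels returns a smaller Series

print(by_label, by_position)
print(subset.tolist())
```

Both lookups here return the same element, because label 'b' sits at position 1.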
2. DataFrame
A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure. It is similar to a spreadsheet or SQL table and can be thought of as a collection of Series.
Creating a DataFrame
You can create a DataFrame from various data sources:
```python
# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)
```
Output:
```
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
```
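To make the "collection of Series" description concrete, the following sketch rebuilds the same DataFrame and pulls one column out of it. The variable name `ages` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

ages = df['Age']                    # a single column is a Series
print(type(ages).__name__)          # the class name of the column object
print(ages.index.equals(df.index))  # the column shares the DataFrame's row index
print(df.dtypes)                    # each column keeps its own dtype
```

Because every column shares the same row index, values in different columns stay aligned by row.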
3. Accessing Data
You can access data in a DataFrame using various methods:
- Using column names:
```python
df['Name']  # Accesses the 'Name' column
```
- Using `.iloc[]` for integer-location based indexing:
```python
df.iloc[0]  # Accesses the first row
```
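The two access styles above can be combined with label-based indexing through `.loc[]`, which this section does not cover. The sketch below assumes the default integer row labels (0, 1, 2), so row label 1 happens to match position 1. All variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

first_row = df.iloc[0]                # first row by position, returned as a Series
bob_age = df.loc[1, 'Age']            # row label 1, column 'Age'
name_and_city = df[['Name', 'City']]  # a list of column names returns a DataFrame
first_two = df.iloc[:2]               # positional slice, end exclusive

print(first_row['Name'], bob_age)
print(name_and_city.shape, len(first_two))
```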
4. Data Manipulation
Pandas provides numerous functions for data manipulation. Here are a few common operations:
- Filtering Data
```python
filtered_df = df[df['Age'] > 28]  # Filters rows where Age is greater than 28
```
- Adding a New Column
```python
df['Salary'] = [70000, 80000, 90000]  # Adds a new column to the DataFrame
```
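It can help to see what the filter expression evaluates to before it is used for selection. The sketch below shows the intermediate boolean Series and then two further operations: a column computed from an existing one, and sorting with `sort_values()`. The `MonthlySalary` column and the variable names are hypothetical additions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000],
})

mask = df['Age'] > 28                    # boolean Series, one value per row
df['MonthlySalary'] = df['Salary'] / 12  # hypothetical column derived from 'Salary'
oldest_first = df.sort_values('Age', ascending=False)  # returns a new, sorted DataFrame

print(mask.tolist())
print(oldest_first['Name'].tolist())
```

Filtering and sorting each return a new DataFrame, while assigning to `df['MonthlySalary']` modifies `df` in place.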
5. Summary
Understanding Pandas data structures is essential for data analysis. Series are suitable for one-dimensional data, while DataFrames are ideal for two-dimensional data with multiple attributes. Mastering these structures will enable you to perform a wide range of data manipulation tasks efficiently.
Practical Example
Let's say we have a dataset of employees with their details. We want to find all employees who earn more than $75,000 and live in 'New York'. Here's how you can do it:
```python
# Sample employee data
employee_data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Salary': [70000, 80000, 90000, 60000],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago']
}
employee_df = pd.DataFrame(employee_data)

# Filtering employees
high_earners_ny = employee_df[(employee_df['Salary'] > 75000) & (employee_df['City'] == 'New York')]
print(high_earners_ny)
```
Output:
```
      Name  Salary      City
2  Charlie   90000  New York
```
This example demonstrates how to use conditions to filter data in a DataFrame effectively.
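The parentheses around each condition matter, because Python's `&` operator binds more tightly than comparisons such as `>` and `==`. As an alternative sketch that this section does not otherwise use, the same filter can be written with `DataFrame.query()`, and `.loc[]` can select a single column for the matching rows. The variable `names_only` is illustrative.

```python
import pandas as pd

employee_df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Salary': [70000, 80000, 90000, 60000],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago'],
})

# Same conditions as the boolean-mask version, written as a query string
high_earners_ny = employee_df.query("Salary > 75000 and City == 'New York'")

# Boolean mask for the rows, column label for the single column to keep
names_only = employee_df.loc[
    (employee_df['Salary'] > 75000) & (employee_df['City'] == 'New York'),
    'Name',
]

print(high_earners_ny)
print(names_only.tolist())
```

Which form to prefer is largely a matter of readability. The mask version makes each condition an explicit Series, while the query string is shorter when several conditions are combined.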