Quiz: Working with Pandas Data Structures | Data Handling for AI (Pandas, NumPy)

Working with Pandas Data Structures

Pandas is a powerful library for data manipulation and analysis in Python. In this section, we will explore various data structures provided by Pandas, including Series and DataFrame, and how to efficiently work with them.

Overview of Pandas Data Structures

1. Series

A Pandas Series is a one-dimensional labeled array that can hold any data type. It resembles a Python list or a NumPy array, but every value is paired with an index label, which enables label-based lookup and automatic alignment when Series are combined.

Creating a Series

You can create a Series using the pd.Series() constructor:

```python
import pandas as pd

# Creating a Series from a list
data = [10, 20, 30, 40]
series = pd.Series(data)
print(series)
```

Output:

```
0    10
1    20
2    30
3    40
dtype: int64
```
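The indexing capability described above is easier to see with explicit labels. The following is a minimal sketch; the labels 'a' through 'd' and the variable names are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Hypothetical index labels, chosen for illustration only
labeled = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# Label-based access with .loc and position-based access with .iloc
by_label = labeled.loc['b']      # value stored under label 'b'
by_position = labeled.iloc[1]    # value at integer position 1

# Arithmetic is applied element-wise and keeps the index labels
doubled = labeled * 2

print(by_label, by_position)
print(doubled)
```

Both lookups return the same value here because label 'b' sits at position 1. When no index is passed, Pandas falls back to the default integer labels 0, 1, 2, and so on, as in the first example.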

2. DataFrame

A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure. It is similar to a spreadsheet or SQL table and can be thought of as a collection of Series that share a common index, with one Series per column.

Creating a DataFrame

You can create a DataFrame from various data sources, such as a dictionary of lists:

```python
import pandas as pd

# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)
print(df)
```

Output:

```
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
```
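To make the "collection of Series" idea concrete, the sketch below rebuilds the same DataFrame and inspects one column. The variable name age is an assumption introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# Selecting a single column returns a Series
age = df['Age']
print(type(age).__name__)

# Each column keeps its own dtype, which is what "heterogeneous" refers to
print(df.dtypes)

# The column Series shares the DataFrame's row index
print(age.index.equals(df.index))

# shape reports (number of rows, number of columns)
print(df.shape)
```

The exact dtype reported for the text columns can differ between Pandas versions, so it is worth checking df.dtypes in your own environment rather than relying on a fixed value.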

3. Accessing Data

You can access data in a DataFrame using various methods:

- Using column names:

  ```python
  # Accesses the 'Name' column
  df['Name']
  ```

- Using .iloc[] for integer-location based indexing:

  ```python
  # Accesses the first row
  df.iloc[0]
  ```
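Beyond single columns and single rows, a few other access patterns come up often. The following is a rough sketch using the same sample data; the variable names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# A list of column names returns a DataFrame with just those columns
name_and_city = df[['Name', 'City']]

# .loc selects by label: row label 1, column 'City'
city_of_bob = df.loc[1, 'City']

# .iloc selects by position: first two rows, first two columns
top_left = df.iloc[:2, :2]

print(name_and_city)
print(city_of_bob)
print(top_left)
```

With the default integer index, row labels and row positions happen to coincide, so .loc[1] and .iloc[1] pick the same row. They diverge as soon as the index is reordered, filtered, or replaced with custom labels.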

4. Data Manipulation

Pandas provides numerous functions for data manipulation. Here are a few common operations:

- Filtering Data

  ```python
  # Filters rows where Age is greater than 28
  filtered_df = df[df['Age'] > 28]
  ```

- Adding a New Column

  ```python
  # Adds a new column to the DataFrame
  df['Salary'] = [70000, 80000, 90000]
  ```
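It can help to see what the filter expression produces before it is used for indexing, and how a column can be derived from an existing one. This is a minimal sketch; the column name MonthlySalary and the variable mask are hypothetical names introduced here.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})
df['Salary'] = [70000, 80000, 90000]

# A new column can be computed from an existing one without a loop
# ('MonthlySalary' is a hypothetical column used for illustration)
df['MonthlySalary'] = df['Salary'] / 12

# The comparison yields a Boolean Series (a mask), one value per row
mask = df['Age'] > 28

# Indexing with the mask keeps only the rows where it is True
filtered_df = df[mask]

print(mask.tolist())
print(filtered_df['Name'].tolist())
print(len(df))
```

Filtering returns a new DataFrame and leaves df with all of its rows, whereas assigning to df['Salary'] or df['MonthlySalary'] modifies df in place.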

5. Summary

Understanding Pandas data structures is essential for data analysis. Series are suitable for one-dimensional data, while DataFrames are ideal for two-dimensional data with multiple attributes. Mastering these structures will enable you to perform a wide range of data manipulation tasks efficiently.

Practical Example

Let’s say we have a dataset of employees with their details. We want to find all employees who earn more than $75,000 and live in 'New York'. Here’s how you can do it:

```python
import pandas as pd

# Sample employee data
employee_data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Salary': [70000, 80000, 90000, 60000],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago']
}

employee_df = pd.DataFrame(employee_data)

# Filtering employees
high_earners_ny = employee_df[
    (employee_df['Salary'] > 75000) & (employee_df['City'] == 'New York')
]
print(high_earners_ny)
```

Output:

```
      Name  Salary      City
2  Charlie   90000  New York
```

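This example demonstrates how to combine two conditions with the & operator to filter a DataFrame. Each condition must be wrapped in parentheses, because & binds more tightly than comparison operators such as > and ==.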
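The same filter can be written in other ways. The sketch below shows two alternatives next to the Boolean-mask form from the example above: DataFrame.query(), which takes the condition as a string, and .loc, which filters rows and selects a column in one step. The variable names are assumptions for illustration.

```python
import pandas as pd

employee_df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Salary': [70000, 80000, 90000, 60000],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago'],
})

# Boolean-mask version, as in the example above
via_mask = employee_df[
    (employee_df['Salary'] > 75000) & (employee_df['City'] == 'New York')
]

# Equivalent filter expressed as a query string
via_query = employee_df.query("Salary > 75000 and City == 'New York'")

# .loc can filter rows and pick a column in one step
names_only = employee_df.loc[
    (employee_df['Salary'] > 75000) & (employee_df['City'] == 'New York'),
    'Name',
]

print(via_query.equals(via_mask))
print(names_only.tolist())
```

Which form to use is mostly a readability choice: query() tends to be shorter for long conditions, while the mask form works with any column name and any Python expression.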