How to Create a DataFrame in Python
Learn how to create and inspect DataFrames in Python using pandas, one of the most widely used data analysis libraries.
pandas DataFrames are the backbone of data analysis in Python. They provide a powerful, flexible way to work with structured data. In this guide, you’ll learn multiple ways to create DataFrames.
Prerequisites
First, make sure you have pandas installed:
pip install pandas
Then import it in your Python script:
import pandas as pd
Method 1: From a Dictionary
One of the most common ways to create a DataFrame is from a Python dictionary, where each key becomes a column name and each list holds that column's values:
import pandas as pd
# Create a DataFrame from a dictionary
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['New York', 'London', 'Tokyo']
})
print(df)
Output:
      name  age      city
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Tokyo
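One thing to watch with the dictionary approach is that every list needs the same length. The sketch below uses a deliberately broken dictionary (the `bad_data` and `error_message` names are only for illustration) to show that pandas raises a `ValueError` when the lengths differ.

```python
import pandas as pd

# Hypothetical data: 'age' has one fewer value than 'name'
bad_data = {
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30],
}

try:
    pd.DataFrame(bad_data)
    error_message = None
except ValueError as exc:
    # pandas refuses to build a DataFrame from lists of unequal length
    error_message = str(exc)

print(error_message)
```

If you hit this error with real data, checking `len()` of each list is usually the quickest way to find the short column.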
Method 2: From a List of Dictionaries
You can also create a DataFrame from a list of dictionaries where each dictionary represents a row:
data = [
    {'name': 'Alice', 'age': 25, 'city': 'New York'},
    {'name': 'Bob', 'age': 30, 'city': 'London'},
    {'name': 'Charlie', 'age': 35, 'city': 'Tokyo'}
]
df = pd.DataFrame(data)
print(df)
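Unlike the dictionary-of-lists form, rows here do not all need the same keys. As a small sketch with made-up records, a row that lacks a key gets a missing value (`NaN`) in that column:

```python
import pandas as pd

data = [
    {'name': 'Alice', 'age': 25, 'city': 'New York'},
    {'name': 'Bob', 'age': 30},  # this row has no 'city' key
]
df = pd.DataFrame(data)

print(df)
# isna() marks the cells pandas filled in as missing
print(df['city'].isna().tolist())
```

The columns are the union of the keys seen across all rows, so a typo in a single key would show up as an extra, mostly empty column.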
Method 3: From a CSV File
Reading data from a CSV file is extremely common:
# Read CSV file
df = pd.read_csv('data.csv')
# Read with specific options
df = pd.read_csv('data.csv',
                 sep=',',         # delimiter
                 header=0,        # row number for column names
                 index_col=None)  # column to use as index
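To try these options without a `data.csv` file on disk, you can pass `read_csv` a file-like object. The sketch below assumes a small made-up CSV held in a string (`csv_text` is an illustrative name) and wraps it in `io.StringIO`:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for data.csv
csv_text = "name,age,city\nAlice,25,New York\nBob,30,London\n"

df = pd.read_csv(io.StringIO(csv_text),
                 sep=',',         # delimiter
                 header=0,        # first line holds the column names
                 index_col=None)  # keep the default integer index

print(df)
print(df.dtypes)  # 'age' is parsed as an integer column
```

This is also a convenient way to test parsing options on a few sample lines before pointing the same call at a large file.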
Method 4: From NumPy Arrays
If you’re working with numerical data, you can create a DataFrame from NumPy arrays:
import numpy as np
import pandas as pd
# Create from NumPy array
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
df = pd.DataFrame(data, columns=['A', 'B', 'C'])
print(df)
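A NumPy array carries no labels of its own, so you supply them. As a minimal sketch, the `index` argument can name the rows the same way `columns` names the columns (the labels `'x'`, `'y'`, `'z'` are arbitrary):

```python
import numpy as np
import pandas as pd

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Label both axes: columns A-C and rows x-z
df = pd.DataFrame(data, columns=['A', 'B', 'C'], index=['x', 'y', 'z'])

print(df)
# Look up a single value by row label and column label
print(df.loc['y', 'B'])
```

If you leave out `columns` and `index`, pandas falls back to integer labels starting at 0 on both axes.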
Inspecting Your DataFrame
After creating a DataFrame, use these methods to explore it:
# First few rows
df.head()
# Last few rows
df.tail()
# Shape (rows, columns)
df.shape
# Column info
df.info()
# Summary statistics
df.describe()
# Column names
df.columns
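In a plain script (as opposed to a notebook), most of these calls only show something when wrapped in `print()`; `df.info()` is the exception, since it prints its report itself. Here is a short sketch that reuses the sample data from Method 1:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['New York', 'London', 'Tokyo']
})

print(df.shape)          # (rows, columns) as a tuple
print(list(df.columns))  # column names as a plain list
print(df.head(2))        # first two rows only

# By default, describe() summarizes only the numeric columns
summary = df.describe()
print(summary)
```

Here `summary` contains only the `age` column, because `name` and `city` hold text.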
Summary
pandas offers several ways to create a DataFrame:
- Use dictionaries for quick DataFrame creation
- Use pd.read_csv() for loading CSV files
- Use NumPy arrays for numerical data
- Always inspect your DataFrame with .head(), .info(), and .describe()
Now you’re ready to start analyzing data with pandas!