Pandas Concatenation: Where Dataframes Come Together
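Pandas provides several tools for combining DataFrames, including the pd.concat() and pd.merge() functions and the DataFrame.join() method. Older code also uses DataFrame.append(), which was deprecated in pandas 1.4 and removed in pandas 2.0.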
The pd.concat() function concatenates DataFrames along a particular axis. By default, the axis is axis=0, which means that the DataFrames are concatenated vertically (i.e., row-wise). You can specify axis=1 to concatenate horizontally (i.e., column-wise).
Here’s an example of how to use pd.concat() to concatenate two DataFrames vertically:
import pandas as pd
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                    'B': ['B0', 'B1', 'B2'],
                    'C': ['C0', 'C1', 'C2'],
                    'D': ['D0', 'D1', 'D2']},
                   index=[0, 1, 2])
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5'],
                    'B': ['B3', 'B4', 'B5'],
                    'C': ['C3', 'C4', 'C5'],
                    'D': ['D3', 'D4', 'D5']},
                   index=[3, 4, 5])
df3 = pd.concat([df1, df2])
print(df3)
This will output the following DataFrame:
    A   B   C   D
0  A0  B0  C0  D0
1  A1  B1  C1  D1
2  A2  B2  C2  D2
3  A3  B3  C3  D3
4  A4  B4  C4  D4
5  A5  B5  C5  D5
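For comparison, the horizontal case (axis=1) can be sketched as follows. The frames left, right, and shifted are hypothetical and not part of the example above; they are chosen so that the effect of index alignment is visible.

```python
import pandas as pd

# Two hypothetical frames that share the same index labels
left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                     'B': ['B0', 'B1', 'B2']},
                    index=[0, 1, 2])
right = pd.DataFrame({'C': ['C0', 'C1', 'C2'],
                      'D': ['D0', 'D1', 'D2']},
                     index=[0, 1, 2])

# axis=1 places the frames side by side, aligning rows on index labels
wide = pd.concat([left, right], axis=1)
print(wide)

# Give the second frame partly different labels to see the alignment
shifted = right.copy()
shifted.index = [1, 2, 3]

# Labels found in only one frame are kept, and the gaps are filled with NaN
padded = pd.concat([left, shifted], axis=1)
print(padded)
```

The second result has four rows rather than three, because concatenating along columns matches rows by index label, not by position.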
The DataFrame.append() method used to be a convenient shortcut for concatenating DataFrames vertically: df.append(other) behaved like pd.concat([df, other]). It was a method on the DataFrame, not a top-level pd.append() function, and it accepted a single DataFrame or a list of them. It was deprecated in pandas 1.4 and removed in pandas 2.0, so current code should call pd.concat() instead.
Here’s an example of how to use pd.concat() with ignore_index=True to stack three DataFrames, where older code would have chained append() calls:
df4 = pd.DataFrame({'A': ['A6', 'A7', 'A8'],
                    'B': ['B6', 'B7', 'B8'],
                    'C': ['C6', 'C7', 'C8'],
                    'D': ['D6', 'D7', 'D8']},
                   index=[6, 7, 8])
df5 = pd.DataFrame({'A': ['A9', 'A10', 'A11'],
                    'B': ['B9', 'B10', 'B11'],
                    'C': ['C9', 'C10', 'C11'],
                    'D': ['D9', 'D10', 'D11']},
                   index=[9, 10, 11])
# Before pandas 2.0: df3.append(df4, ignore_index=True).append(df5, ignore_index=True)
df6 = pd.concat([df3, df4, df5], ignore_index=True)
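When frames arrive one at a time, for example inside a loop, a common pattern is to collect them in a plain Python list and call pd.concat() once at the end, rather than growing a DataFrame step by step. The sketch below assumes a hypothetical list named batches standing in for whatever produces the data.

```python
import pandas as pd

# Hypothetical batches standing in for data that arrives piece by piece
batches = [
    pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}),
    pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']}),
    pd.DataFrame({'A': ['A4'], 'B': ['B4']}),
]

frames = []
for batch in batches:
    frames.append(batch)  # this is list.append, not DataFrame.append

# A single concat call at the end; ignore_index=True renumbers rows from 0
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Each concatenation copies its inputs into a new object, so a single call at the end generally does less work than repeating the operation on every iteration.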
Pandas provides several ways to combine DataFrames and Series, which lets you bring data from multiple sources into a single DataFrame or Series. Here are some examples of how to use them:
pd.concat(): This function concatenates multiple DataFrames or Series along a specific axis (rows or columns).
# Concatenate two DataFrames along rows
pd.concat([df1, df2])
# Concatenate two DataFrames along columns (rows are aligned on index labels)
pd.concat([df1, df2], axis=1)
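Two details of pd.concat() are easy to miss: it accepts Series as well as DataFrames, and its join parameter controls what happens when the inputs do not share all of their columns. A minimal sketch, using hypothetical objects s1, s2, x, and y:

```python
import pandas as pd

s1 = pd.Series(['a', 'b'], name='letters')
s2 = pd.Series(['c', 'd'], name='letters')

# Series concatenate the same way; the result is a longer Series
letters = pd.concat([s1, s2], ignore_index=True)
print(letters)

# Hypothetical frames that share only column 'B'
x = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
y = pd.DataFrame({'B': [5, 6], 'C': [7, 8]})

# Default join='outer' keeps every column and fills the gaps with NaN
outer = pd.concat([x, y], ignore_index=True)

# join='inner' keeps only the columns common to all inputs
inner = pd.concat([x, y], join='inner', ignore_index=True)

print(outer)
print(inner)
```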
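DataFrame.append(): This method appended rows to a DataFrame and was shorthand for pd.concat([df, new_rows]). It was deprecated in pandas 1.4 and removed in pandas 2.0, so use pd.concat() directly.
# Append a new row to a DataFrame (new_row is a one-row DataFrame)
pd.concat([df, new_row], ignore_index=True)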
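On pandas 2.0 and later, where DataFrame.append() is no longer available, one way to add a single row is to wrap it in a one-row DataFrame and concatenate. The column names and values below are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ann', 'Bob'], 'Age': [34, 27]})

# Hypothetical new record, held as a plain dict
new_row = {'Name': 'Cy', 'Age': 41}

# A list containing one dict becomes a one-row DataFrame,
# which can then be concatenated onto the existing frame
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(df)
```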
pd.merge(): This function merges two DataFrames on one or more columns or on their indexes. It is similar to a SQL join operation.
# Merge two DataFrames on a column that both contain
pd.merge(df1, df2, on='Name')
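To make the SQL analogy concrete, here is a small sketch with two hypothetical frames, people and cities, that share a 'Name' column. The how parameter selects the join type and defaults to 'inner'.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Ann', 'Bob', 'Cy'],
                       'Age': [34, 27, 41]})
cities = pd.DataFrame({'Name': ['Ann', 'Bob', 'Dee'],
                       'City': ['Oslo', 'Lima', 'Rome']})

# Default how='inner': only names present in both frames survive
inner = pd.merge(people, cities, on='Name')

# how='left' keeps every row of the left frame; missing matches become NaN
left = pd.merge(people, cities, on='Name', how='left')

print(inner)
print(left)
```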
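DataFrame.join(): This method joins two DataFrames on their indexes by default. When on= is given, the named column of the calling DataFrame is matched against the index of the other DataFrame. It performs a left join unless how= says otherwise, much like a SQL LEFT JOIN.
# Join df1's 'Name' column against df2's index
df1.join(df2, on='Name')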
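Because join() matches against the other frame's index, the second frame usually needs to be indexed by the key first. A minimal sketch with hypothetical frames people and cities:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Ann', 'Bob', 'Cy'],
                       'Age': [34, 27, 41]})
cities = pd.DataFrame({'Name': ['Ann', 'Bob', 'Dee'],
                       'City': ['Oslo', 'Lima', 'Rome']})

# Index the right-hand frame by 'Name', then match people['Name'] against it.
# join() defaults to how='left', so every row of people is kept.
joined = people.join(cities.set_index('Name'), on='Name')
print(joined)
```

Here the row for 'Cy' is kept with a missing City, and 'Dee' is dropped because it does not appear in the calling frame.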
These are some examples of how to combine data in Pandas. Together, pd.concat(), pd.merge(), and DataFrame.join() provide a powerful and flexible way to bring data from multiple sources into a single DataFrame or Series, which can be useful for data cleaning, data wrangling, data analysis, and data modelling.