DataFrame & Series
DataFrames
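A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.
The features of a DataFrame are as follows.
- Columns can be of different types
- Size is mutable
- Labeled axes (rows and columns)
- Arithmetic operations can be performed on rows and columns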
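The features listed above can be seen in a small sketch. The column names and values here are made up for illustration and are not taken from any real dataset.

```python
import pandas as pd

# Hypothetical student data: each column holds a different type.
df = pd.DataFrame(
    {"name": ["Ann", "Ben"], "age": [20, 22], "score": [88.5, 92.0]},
    index=["s1", "s2"],  # labeled rows; the dict keys label the columns
)
print(df.dtypes)  # one dtype per column

# Size is mutable: a new column can be added after creation.
df["passed"] = df["score"] > 90

# Arithmetic between columns is aligned on the row labels.
total = df["age"] + df["score"]
print(total)
```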
Structure
Let us assume that we are creating a DataFrame with students' data.
You can think of it as an SQL table or a spreadsheet data representation.
pandas.DataFrame
A pandas DataFrame can be created using the following constructor:
pandas.DataFrame(data, index, columns, dtype, copy)
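The parameters of the constructor are as follows:
Sr.No. Parameter & Description
1. data: The input data. It can take various forms, such as an ndarray, Series, map, list, dict, constant, or another DataFrame.
2. index: The row labels. Optional; if no index is passed, the resulting frame uses the default np.arange(n).
3. columns: The column labels. Optional; if no column labels are passed, the default is np.arange(n).
4. dtype: The data type to apply to the columns. Only a single dtype is allowed.
5. copy: Whether to copy the data from the input instead of sharing it (default False in older pandas versions).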
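As a minimal sketch of how these parameters fit together, the following passes all five to the constructor. The array values and the labels r1, r2, x, and y are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2], [3, 4]])

# With no index or columns passed, both axes get default integer labels.
default = pd.DataFrame(arr)
print(list(default.index), list(default.columns))

# index and columns label the axes, dtype casts the values,
# and copy=True makes the frame independent of the source array.
df = pd.DataFrame(arr, index=["r1", "r2"], columns=["x", "y"],
                  dtype=float, copy=True)

arr[0, 0] = 99  # does not change df, because the data was copied
print(df)
```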
Create an Empty DataFrame
The most basic DataFrame that can be created is an empty DataFrame.
Example
# Import the pandas library and alias it as pd
import pandas as pd
df = pd.DataFrame()
print(df)
Its output is as follows:
Empty DataFrame
Columns: []
Index: []
Create a DataFrame from Lists
A DataFrame can be created from a single list or a list of lists.
Example 1
import pandas as pd
data = [1, 2, 3, 4, 5]
df = pd.DataFrame(data)
print(df)
Its output is as follows:
0
0 1
1 2
2 3
3 4
4 5
Example 2
import pandas as pd
data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)
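Example 3
import pandas as pd
data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
# Cast only the numeric column; dtype=float in the constructor would also try to cast Name
df = df.astype({'Age': float})
print(df)
Its output is as follows:
Name Age
0 Alex 10.0
1 Bob 12.0
2 Clarke 13.0
Note: observe that the Age column is now of floating-point type. The dtype parameter of the constructor applies to every column, so the cast is done on Age alone here.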
Create a DataFrame from Dict of ndarrays / Lists
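All the ndarrays must be of the same length. If an index is passed, then the length of the index must equal the length of the arrays.
If no index is passed, then by default the index will be range(n), where n is the array length.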
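The length rules above can be checked with a short sketch. The names and values are illustrative; the point is only that both mismatches are rejected with a ValueError.

```python
import pandas as pd

errors = []

# Arrays of unequal length are rejected.
try:
    pd.DataFrame({"Name": ["Tom", "Jack"], "Age": [28, 34, 29]})
except ValueError as err:
    errors.append(str(err))

# An explicit index must match the array length as well.
try:
    pd.DataFrame({"Name": ["Tom", "Jack"], "Age": [28, 34]},
                 index=["r1", "r2", "r3"])
except ValueError as err:
    errors.append(str(err))

for message in errors:
    print(message)
```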
Example 1
import pandas as pd
data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data)
print(df)
Its output is as follows:
Name Age
0 Tom 28
1 Jack 34
2 Steve 29
3 Ricky 42
Note: observe the values 0, 1, 2, 3. They are the default index assigned to each row using range(n).
Example 2
Let us now create an indexed DataFrame using arrays.
import pandas as pd
data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
print(df)
Its output is as follows:
Name Age
rank1 Tom 28
rank2 Jack 34
rank3 Steve 29
rank4 Ricky 42
Note: observe that the index parameter assigns a label to each row.
Create a DataFrame from List of Dicts
A list of dictionaries can be passed as input data to create a DataFrame. By default, the dictionary keys are taken as column names.
Example 1
The following example shows how to create a DataFrame by passing a list of dictionaries.
import pandas as pd
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)
Its output is as follows:
a b c
0 1 2 NaN
1 5 10 20.0
Note: observe that NaN (Not a Number) is filled in where values are missing.
Example 2
The following example shows how to create a DataFrame by passing a list of dictionaries and the row indices.
import pandas as pd
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first', 'second'])
print(df)
Its output is as follows:
a b c
first 1 2 NaN
second 5 10 20.0
Example 3
The following example shows how to create a DataFrame with a list of dictionaries, row indices, and column indices.
import pandas as pd
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
# With two column labels, both the same as dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
# With two column labels, one of which is not a dictionary key
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df1)
print(df2)
Its output is as follows:
#df1 output
a b
first 1 2
second 5 10
#df2 output
a b1
first 1 NaN
second 5 NaN
Note: observe that df2 is created with a column label (b1) that is not a dictionary key, so NaN is filled in for that column. df1 is created with column labels that match the dictionary keys, so no NaN appears.
Create a DataFrame from Dict of Series
A dict of Series can be passed to form a DataFrame. The resulting index is the union of the indexes of all the Series passed.
Example
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df)
Its output is as follows:
one two
a 1.0 1
b 2.0 2
c 3.0 3
d NaN 4
Note: observe that for Series one, no label 'd' was passed, but in the result NaN is filled in for the d label.
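The union behavior can be verified directly. This is a small sketch that reuses the two Series from the example and compares the frame's index with the union of the Series indexes.

```python
import pandas as pd

one = pd.Series([1, 2, 3], index=["a", "b", "c"])
two = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
df = pd.DataFrame({"one": one, "two": two})

# The frame's index equals the union of the two Series indexes.
print(df.index.equals(one.index.union(two.index)))

# Labels missing from a Series show up as NaN in its column.
missing = df["one"].isna()
print(missing)
```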
Let us now understand column selection, addition, and deletion through examples.
Column Selection
Select a column from the DataFrame.
Example
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df['one'])
Its output is as follows:
a 1.0
b 2.0
c 3.0
d NaN
Name: one, dtype: float64
Column Addition
Add a new column to an existing DataFrame.
Example
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
# Add a new column to an existing DataFrame by assigning a Series to a new column label
print("Adding a new column by passing as Series:")
df['three'] = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(df)
print("Adding a new column using the existing columns in DataFrame:")
df['four'] = df['one'] + df['three']
print(df)
Its output is as follows:
Adding a new column by passing as Series:
one two three
a 1.0 1 10.0
b 2.0 2 20.0
c 3.0 3 30.0
d NaN 4 NaN
Adding a new column using the existing columns in DataFrame:
one two three four
a 1.0 1 10.0 11.0
b 2.0 2 20.0 22.0
c 3.0 3 30.0 33.0
d NaN 4 NaN NaN
Column Deletion
Columns can be deleted or popped.
Example
# Using the previous DataFrame, we will delete a column
# using the del statement
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
     'three': pd.Series([10, 20, 30], index=['a', 'b', 'c'])}
df = pd.DataFrame(d)
print("Our dataframe is:")
print(df)
# using the del statement
print("Deleting the first column using DEL function:")
del df['one']
print(df)
# using the pop method
print("Deleting another column using POP function:")
df.pop('two')
print(df)
Its output is as follows:
Our dataframe is:
one two three
a 1.0 1 10.0
b 2.0 2 20.0
c 3.0 3 30.0
d NaN 4 NaN
Deleting the first column using DEL function:
two three
a 1 10.0
b 2 20.0
c 3 30.0
d 4 NaN
Deleting another column using POP function:
three
a 10.0
b 20.0
c 30.0
d NaN
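One difference between the approaches is worth a quick sketch: pop removes the column in place and hands it back, while drop (a third option, not used in the example above) returns a new frame and leaves the original untouched. The data here is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2], "two": [3, 4], "three": [5, 6]})

# pop removes 'two' from df in place and returns it as a Series.
popped = df.pop("two")
print(popped)

# drop returns a new frame without 'three'; df itself still has it.
trimmed = df.drop(columns=["three"])
print(list(df.columns), list(trimmed.columns))
```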
Row Selection, Addition, and Deletion
We will now understand row selection, addition and deletion through examples.
Selection by Label
Rows can be selected by passing a row label to the loc indexer.
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.loc['b'])
Its output is as follows:
one 2.0
two 2.0
Name: b, dtype: float64
The result is a Series whose labels are the column names of the DataFrame, and whose name is the row label used to retrieve it.
Selection by integer location
Rows can be selected by passing an integer location to the iloc indexer.
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.iloc[2])
Its output is as follows:
one 3.0
two 3.0
Name: c, dtype: float64
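Both indexers accept more than a single label or position. The following sketch, built on the same data as the examples above, shows a few common variations.

```python
import pandas as pd

d = {"one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
     "two": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])}
df = pd.DataFrame(d)

rows = df.loc[["a", "c"]]    # list of labels -> DataFrame with those rows
cell = df.loc["b", "two"]    # row label plus column label -> single value
first_two = df.iloc[[0, 1]]  # list of integer positions -> DataFrame

print(rows)
print(cell)
print(first_two)
```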
Slice Rows
Multiple rows can be selected using the ':' operator.
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df[2:4])
Its output is as follows:
one two
c 3.0 3
d NaN 4
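Slicing by position and slicing by label treat the end point differently, which is easy to trip over. A minimal sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3, 4]}, index=["a", "b", "c", "d"])

# Position-based slice: positions 1 and 2; the end position is excluded.
by_position = df.iloc[1:3]

# Label-based slice: labels 'b' through 'c'; the end label is included.
by_label = df.loc["b":"c"]

print(by_position)
print(by_label)
```

Both slices return rows b and c here, even though the position slice stops before its end point and the label slice includes its end point.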
Addition of Rows
Add new rows to a DataFrame using the pd.concat function, which places the new rows at the end. (The DataFrame.append method used in older tutorials was removed in pandas 2.0.)
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
df = pd.concat([df, df2])
print(df)
Its output is as follows:
a b
0 1 2
1 3 4
0 5 6
1 7 8
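The repeated labels 0 and 1 in the output above appear because each frame keeps its own index. As a sketch of one way to avoid that, pd.concat can be asked to renumber the rows with ignore_index=True.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=["a", "b"])

# ignore_index=True discards the original labels and assigns 0..n-1.
combined = pd.concat([df, df2], ignore_index=True)
print(combined)
```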
Deletion of Rows
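Use an index label to delete or drop rows from a DataFrame. If the label is duplicated, then multiple rows will be dropped.
If you observe, in the above example, the labels are duplicated. Let us drop a label and see how many rows get dropped.
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
df = pd.concat([df, df2])
# Drop rows with label 0
df = df.drop(0)
print(df)
Its output is as follows: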
a b
1 3 4
1 7 8
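If only one of the duplicated rows should go, the labels need to be made unique first. A minimal sketch, assuming it is acceptable to discard the original labels:

```python
import pandas as pd

df = pd.concat([
    pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"]),
    pd.DataFrame([[5, 6], [7, 8]], columns=["a", "b"]),
])

# reset_index(drop=True) replaces the duplicated labels with 0, 1, 2, 3.
unique = df.reset_index(drop=True)

# Label 0 now refers to a single row, so only that row is dropped.
result = unique.drop(0)
print(result)
```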
Series
references