# Basic

```python
import pandas as pd
print(f"Using {pd.__name__}, version {pd.__version__}")
```

```
Using pandas, version 0.23.0
```

## Create an empty DataFrame

```python
df = pd.DataFrame()
print(df)
```

```
Empty DataFrame
Columns: []
Index: []
```

## Create a DataFrame from a dict

Each dict key becomes a column name, and each list becomes that column's values.

```python
# Avoid naming the variable `dict`, which would shadow the built-in type
data = {'name': ["Tom", "Bob", "Mary", "James"],
        'age': [18, 30, 25, 40],
        'city': ["Beijing", "ShangHai", "GuangZhou", "ShenZhen"]}
df = pd.DataFrame(data)
df
```

|   | name  | age | city      |
| - | ----- | --- | --------- |
| 0 | Tom   | 18  | Beijing   |
| 1 | Bob   | 30  | ShangHai  |
| 2 | Mary  | 25  | GuangZhou |
| 3 | James | 40  | ShenZhen  |

Alternatively, pass the row labels, the row data and the column names separately:

```python
index = pd.Index(["Tom", "Bob", "Mary", "James"], name='person')
cols = ['age', 'city']
data = [[18, 'Beijing'], [30, 'ShangHai'], [25, 'GuangZhou'], [40, 'ShenZhen']]
df = pd.DataFrame(index=index, data=data, columns=cols)
df
```

|        | age | city      |
| ------ | --- | --------- |
| person |     |           |
| Tom    | 18  | Beijing   |
| Bob    | 30  | ShangHai  |
| Mary   | 25  | GuangZhou |
| James  | 40  | ShenZhen  |

## Basic column operations

### Add columns

```python
data = {'name': ["Tom", "Bob", "Mary", "James"],
        'age': [18, 30, 25, 40],
        'city': ["Beijing", "ShangHai", "GuangZhou", "ShenZhen"]}
df = pd.DataFrame(data)
df
```

|   | name  | age | city      |
| - | ----- | --- | --------- |
| 0 | Tom   | 18  | Beijing   |
| 1 | Bob   | 30  | ShangHai  |
| 2 | Mary  | 25  | GuangZhou |
| 3 | James | 40  | ShenZhen  |

Assigning a scalar broadcasts it to every row:

```python
df['country'] = 'USA'
df
```

|   | name  | age | city      | country |
| - | ----- | --- | --------- | ------- |
| 0 | Tom   | 18  | Beijing   | USA     |
| 1 | Bob   | 30  | ShangHai  | USA     |
| 2 | Mary  | 25  | GuangZhou | USA     |
| 3 | James | 40  | ShenZhen  | USA     |

A new column can also be copied from an existing one:

```python
df['adress'] = df['country']
df
```

|   | name  | age | city      | country | adress |
| - | ----- | --- | --------- | ------- | ------ |
| 0 | Tom   | 18  | Beijing   | USA     | USA    |
| 1 | Bob   | 30  | ShangHai  | USA     | USA    |
| 2 | Mary  | 25  | GuangZhou | USA     | USA    |
| 3 | James | 40  | ShenZhen  | USA     | USA    |

### Change column values

```python
df['country'] = 'China'
df
```

|   | name  | age | city      | country | adress |
| - | ----- | --- | --------- | ------- | ------ |
| 0 | Tom   | 18  | Beijing   | China   | USA    |
| 1 | Bob   | 30  | ShangHai  | China   | USA    |
| 2 | Mary  | 25  | GuangZhou | China   | USA    |
| 3 | James | 40  | ShenZhen  | China   | USA    |

`adress` still holds `USA`: the earlier assignment copied the values of `country`, so later changes to `country` do not propagate.

```python
# String columns are concatenated element-wise with +
df['adress'] = df['city'] + ',' + df['country']
df
```

|   | name  | age | city      | country | adress          |
| - | ----- | --- | --------- | ------- | --------------- |
| 0 | Tom   | 18  | Beijing   | China   | Beijing,China   |
| 1 | Bob   | 30  | ShangHai  | China   | ShangHai,China  |
| 2 | Mary  | 25  | GuangZhou | China   | GuangZhou,China |
| 3 | James | 40  | ShenZhen  | China   | ShenZhen,China  |

### Delete columns

```python
# Two ways to delete a column in place
df.drop('country', axis=1, inplace=True)
del df['city']
df
```

|   | name  | age | adress          |
| - | ----- | --- | --------------- |
| 0 | Tom   | 18  | Beijing,China   |
| 1 | Bob   | 30  | ShangHai,China  |
| 2 | Mary  | 25  | GuangZhou,China |
| 3 | James | 40  | ShenZhen,China  |

### Select columns

```python
df['age']
```

```
0    18
1    30
2    25
3    40
Name: age, dtype: int64
```

```python
df.name  # attribute access: only for names that are valid identifiers and not DataFrame attributes
```

```
0      Tom
1      Bob
2     Mary
3    James
Name: name, dtype: object
```

```python
df[['age', 'name']]  # a list of names returns a DataFrame in that column order
```

|   | age | name  |
| - | --- | ----- |
| 0 | 18  | Tom   |
| 1 | 30  | Bob   |
| 2 | 25  | Mary  |
| 3 | 40  | James |

```python
df.columns
```

```
Index(['name', 'age', 'adress'], dtype='object')
```

### Rename columns

```python
# Map old column names to new ones
df.rename(columns={'age': 'Age', 'name': 'Name', 'adress': 'Adress'}, inplace=True)
df
```

|   | Name  | Age | Adress          |
| - | ----- | --- | --------------- |
| 0 | Tom   | 18  | Beijing,China   |
| 1 | Bob   | 30  | ShangHai,China  |
| 2 | Mary  | 25  | GuangZhou,China |
| 3 | James | 40  | ShenZhen,China  |

A function can also be applied to every column label:

```python
df.rename(str.lower, axis='columns', inplace=True)
df
```

|   | name  | age | adress          |
| - | ----- | --- | --------------- |
| 0 | Tom   | 18  | Beijing,China   |
| 1 | Bob   | 30  | ShangHai,China  |
| 2 | Mary  | 25  | GuangZhou,China |
| 3 | James | 40  | ShenZhen,China  |

```python
df.rename(str.capitalize, axis='columns', inplace=True)
df
```

|   | Name  | Age | Adress          |
| - | ----- | --- | --------------- |
| 0 | Tom   | 18  | Beijing,China   |
| 1 | Bob   | 30  | ShangHai,China  |
| 2 | Mary  | 25  | GuangZhou,China |
| 3 | James | 40  | ShenZhen,China  |

### Set column values conditionally

```python
df['Group'] = 'elderly'  # default value for every row
df.loc[df['Age'] <= 18, 'Group'] = 'young'
# Parentheses are required: & binds more tightly than the comparisons
df.loc[(df['Age'] > 18) & (df['Age'] <= 30), 'Group'] = 'middle_aged'
df
```

|   | Name  | Age | Adress          | Group        |
| - | ----- | --- | --------------- | ------------ |
| 0 | Tom   | 18  | Beijing,China   | young        |
| 1 | Bob   | 30  | ShangHai,China  | middle\_aged |
| 2 | Mary  | 25  | GuangZhou,China | middle\_aged |
| 3 | James | 40  | ShenZhen,China  | elderly      |

## Basic row operations

### Query with `loc`

```python
df
```

|   | Name  | Age | Adress          | Group        |
| - | ----- | --- | --------------- | ------------ |
| 0 | Tom   | 18  | Beijing,China   | young        |
| 1 | Bob   | 30  | ShangHai,China  | middle\_aged |
| 2 | Mary  | 25  | GuangZhou,China | middle\_aged |
| 3 | James | 40  | ShenZhen,China  | elderly      |

```python
df.loc[:]  # all rows
```

|   | Name  | Age | Adress          | Group        |
| - | ----- | --- | --------------- | ------------ |
| 0 | Tom   | 18  | Beijing,China   | young        |
| 1 | Bob   | 30  | ShangHai,China  | middle\_aged |
| 2 | Mary  | 25  | GuangZhou,China | middle\_aged |
| 3 | James | 40  | ShenZhen,China  | elderly      |

### Conditional query with `loc`

```python
df.loc[df['Age'] > 20]
```

|   | Name  | Age | Adress          | Group        |
| - | ----- | --- | --------------- | ------------ |
| 1 | Bob   | 30  | ShangHai,China  | middle\_aged |
| 2 | Mary  | 25  | GuangZhou,China | middle\_aged |
| 3 | James | 40  | ShenZhen,China  | elderly      |

### Conditional row and column query with `loc`

```python
df.loc[df['Group'] == 'middle_aged', 'Name']
```

```
1     Bob
2    Mary
Name: Name, dtype: object
```

### Query with `where`

`where` keeps the shape of the DataFrame: rows that fail the condition are filled with `NaN`, which is why `Age` becomes a float column.

```python
filter_adult = df['Age'] > 25
result = df.where(filter_adult)
result
```

|   | Name  | Age  | Adress         | Group        |
| - | ----- | ---- | -------------- | ------------ |
| 0 | NaN   | NaN  | NaN            | NaN          |
| 1 | Bob   | 30.0 | ShangHai,China | middle\_aged |
| 2 | NaN   | NaN  | NaN            | NaN          |
| 3 | James | 40.0 | ShenZhen,China | elderly      |

### Filter with `query`

```python
df
```

|   | Name  | Age | Adress          | Group        |
| - | ----- | --- | --------------- | ------------ |
| 0 | Tom   | 18  | Beijing,China   | young        |
| 1 | Bob   | 30  | ShangHai,China  | middle\_aged |
| 2 | Mary  | 25  | GuangZhou,China | middle\_aged |
| 3 | James | 40  | ShenZhen,China  | elderly      |

Conditions must be combined inside a single expression string, e.g. `'Group == "middle_aged" and Age >= 25'`. Writing `'Group=="middle_aged"' and 'Age>30'` as two separate Python strings does not combine them: Python's `and` simply returns the second string, so only `Age>30` is applied. The output below reflects that single condition:

```python
df.query('Age > 30')
```

|   | Name  | Age | Adress         | Group   |
| - | ----- | --- | -------------- | ------- |
| 3 | James | 40  | ShenZhen,China | elderly |

### Other DataFrame information

```python
df.shape  # (rows, columns)
```

```
(4, 4)
```

```python
df.describe()  # summary statistics of the numeric columns
```

|       | Age       |
| ----- | --------- |
| count | 4.000000  |
| mean  | 28.250000 |
| std   | 9.251126  |
| min   | 18.000000 |
| 25%   | 23.250000 |
| 50%   | 27.500000 |
| 75%   | 32.500000 |
| max   | 40.000000 |

```python
df.head(3)  # first 3 rows
df.tail(3)  # last 3 rows; in a notebook only this last expression is displayed
```

|   | Name  | Age | Adress          | Group        |
| - | ----- | --- | --------------- | ------------ |
| 1 | Bob   | 30  | ShangHai,China  | middle\_aged |
| 2 | Mary  | 25  | GuangZhou,China | middle\_aged |
| 3 | James | 40  | ShenZhen,China  | elderly      |

## Read and write CSV

### Export df to CSV without the index

```python
df.to_csv('person.csv', index=False, sep=',')
```

### Read a CSV into a DataFrame

```python
person = pd.read_csv('person.csv')
person
```

|   | Name  | Age | Adress          | Group        |
| - | ----- | --- | --------------- | ------------ |
| 0 | Tom   | 18  | Beijing,China   | young        |
| 1 | Bob   | 30  | ShangHai,China  | middle\_aged |
| 2 | Mary  | 25  | GuangZhou,China | middle\_aged |
| 3 | James | 40  | ShenZhen,China  | elderly      |
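Besides a dict of columns, pandas can also build a DataFrame from a list of row records, or from a dict keyed by row label via `DataFrame.from_dict(..., orient="index")`. The following is a minimal sketch that reuses two of the sample people from the sections above. The variable names `records` and `by_person` are only illustrative.

```python
import pandas as pd

# One dict per row: the keys become column names
records = [
    {"name": "Tom", "age": 18, "city": "Beijing"},
    {"name": "Bob", "age": 30, "city": "ShangHai"},
]
df_records = pd.DataFrame(records)

# With orient="index", the outer keys become the row labels
by_person = {
    "Tom": {"age": 18, "city": "Beijing"},
    "Bob": {"age": 30, "city": "ShangHai"},
}
df_by_person = pd.DataFrame.from_dict(by_person, orient="index")
df_by_person.index.name = "person"  # same index name as the pd.Index example

print(df_records)
print(df_by_person)
```

The row-record form is convenient when data arrives one record at a time, for example from JSON.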
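The column operations above modify `df` in place, using item assignment, `del` and `inplace=True`. You can write the same add, delete and rename steps as a method chain instead. The chain returns a new DataFrame and leaves the original unchanged. The following is a sketch on a small hypothetical frame, `people`. It uses `assign`, `drop(columns=...)` and `rename(columns=...)`.

```python
import pandas as pd

people = pd.DataFrame({
    "name": ["Tom", "Bob"],
    "age": [18, 30],
    "city": ["Beijing", "ShangHai"],
})

result = (
    people
    .assign(country="China")  # a scalar is broadcast to every row
    .assign(adress=lambda d: d["city"] + "," + d["country"])  # the lambda receives the frame built so far
    .drop(columns=["country", "city"])  # same effect as drop(axis=1) or del
    .rename(columns=str.capitalize)  # the function is applied to every column label
)
print(result)
```

Because nothing is modified in place, `people` still has its original `name`, `age` and `city` columns after the chain runs.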
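The "Set column values conditionally" section builds three age bands with chained `loc` assignments. `pd.cut` can express the same bands in a single call. By default `right=True`, so each bin includes its upper edge. The edges below therefore reproduce `Age <= 18`, `18 < Age <= 30` and `Age > 30`. The following is a sketch on a standalone frame, `ages`:

```python
import numpy as np
import pandas as pd

ages = pd.DataFrame({"Name": ["Tom", "Bob", "Mary", "James"],
                     "Age": [18, 30, 25, 40]})

ages["Group"] = pd.cut(
    ages["Age"],
    bins=[-np.inf, 18, 30, np.inf],  # intervals (-inf, 18], (18, 30], (30, inf]
    labels=["young", "middle_aged", "elderly"],
)
print(ages)
```

The resulting column has a `category` dtype. If you need plain strings, call `.astype(str)`.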
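The row sections above use `loc`, which selects by label or by boolean mask. `iloc` selects by integer position instead. On a default `RangeIndex` the two look the same. They diverge once the index holds other labels, such as the `person` names used earlier. The following is a sketch on a hypothetical frame, `people`:

```python
import pandas as pd

people = pd.DataFrame(
    {"age": [18, 30, 25, 40],
     "city": ["Beijing", "ShangHai", "GuangZhou", "ShenZhen"]},
    index=pd.Index(["Tom", "Bob", "Mary", "James"], name="person"),
)

by_label = people.loc["Bob", "age"]         # label-based: row "Bob", column "age"
by_position = people.iloc[1, 0]             # position-based: 2nd row, 1st column
label_slice = people.loc["Bob":"Mary"]      # label slices include the end label
position_slice = people.iloc[1:3]           # position slices exclude the end, like Python lists
mask_rows = people.loc[people["age"] > 20, "city"]  # boolean mask plus a column label
```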
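This sketch adds two follow-ups to the `where` and `query` sections, using a standalone frame.

- **`where` and `mask`:** `where` keeps the values where the condition holds. It fills the rest with `NaN`, or with `other` if you supply one. `mask` is the inverse.
- **Combining conditions in `query`:** put all conditions inside one expression string, joined with `and` or `or`. You can reference a local Python variable with `@`. In this sketch, `min_age` is a hypothetical variable.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tom", "Bob", "Mary", "James"],
    "Age": [18, 30, 25, 40],
    "Group": ["young", "middle_aged", "middle_aged", "elderly"],
})

adult = df["Age"] > 25
kept = df.where(adult)              # rows failing the condition become NaN
dropped = df.mask(adult)            # the inverse: rows meeting the condition become NaN
filled = df["Age"].where(adult, 0)  # use 0 instead of NaN for non-matching values

min_age = 20
middle = df.query('Group == "middle_aged" and Age > @min_age')  # both conditions in one string
either = df.query('Group == "young" or Age > 35')
```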
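The CSV sections write `person.csv` to disk and read it back. You can check the same round trip in memory with `io.StringIO`, which avoids touching the file system. With `index=False`, the row index stays out of the file and `read_csv` rebuilds a fresh default index. The `Adress` values contain the `,` separator, and `to_csv` quotes those fields so that they survive the round trip. The following is a sketch on a small frame:

```python
import io
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tom", "Bob"],
    "Age": [18, 30],
    "Adress": ["Beijing,China", "ShangHai,China"],  # values contain the separator
})

buffer = io.StringIO()
df.to_csv(buffer, index=False)  # fields containing "," are quoted automatically
text = buffer.getvalue()

restored = pd.read_csv(io.StringIO(text))
pd.testing.assert_frame_equal(df, restored)  # raises AssertionError if the frames differ
```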