Thursday, June 15, 2017

python pandas practice

rename dateframe column name


df=df.rename(columns = {'two':'new_name'})

convert a list of string to dataframe


the first item in the list is the column names
pd.DataFrame(data[1:], columns=data[0])

dataFrame get all columns

list(df) returns all columns except the 'index'
to include the index, one way is
columns = list(df)
columns.append(df.index.name)

get values from next row

df['nextDayClose'] = df['Close'].shift(-1)

rolling window

Here is the example to calculate the 7 days average.
df['7_mean'] = df['Close'].rolling(window=7).mean()

dataframe remove rows containing NAN

dat.dropna(how='any') #to drop if any value in the row has a nan
dat.dropna(how='all') #to drop if all values in the row are nan

get last value of one column

df['columnName'].values[-1]

dataframe change type to numeric

df = df.apply(pd.to_numeric, errors='ignore')

dataframe get one column

df['colName']
* putting text in the square brackets means manipulate columns
* putting number in the square brackets means manipulate rows

dataframe get multiple columns

df[['colName1', 'colName2', 'colName3']]

dataframe get first row

df[:1]
the api is like this df[startIndex:endIndex:sort], df[:1] equals df[0:1] which returns the first row of the dataframe

dataframe get last row

df[-1:]

dataframe get all rows except first/last

df[1:]
df[:-1]

dataframe reverse the order

df[::-1]
to get rows except last and reverse the order, this one does not work, df[:-1:-1]
it has to be df[:-1][::-1]

dataframe sort by columns

df.sort_values('col2')

dataframe keep rows have null/nan values

df = df[df.isnull().any(axis=1)]

readcsv and explicitly specify column type

quotes = pd.read_csv(csv, dtype={'Symbol': 'str'})
DataFrame.from_csv does not have this feature. It will convert "INF" to number infinite.

dataframe drop duplicates

quotes.drop_duplicates(subset=['Date'], keep='last', inplace=True)

two dataframes merge like inner join

merged = pd.merge(DataFrameA,DataFrameB, on=['Code','Date'])

dataframe only contain rows have null values

try to understand this
df[pd.isnull(df).any(axis=1)]
https://stackoverflow.com/questions/14247586/python-pandas-how-to-select-rows-with-one-or-more-nulls-from-a-dataframe-without

dataframe locate a cell

df.loc[0,'Close'] return the value of first row and column 'Close'
to set the value, df.loc[0,'Close']=123

dataframe locate by lambda

quotes_df.loc[lambda df: pd.isnull(df.Open) | pd.isnull(df.Volume) | df.Open == 0,:]
https://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-callable

No comments:

Post a Comment