Thursday, June 15, 2017

python pandas practice

rename dateframe column name

df=df.rename(columns = {'two':'new_name'})

convert a list of string to dataframe

the first item in the list is the column names
pd.DataFrame(data[1:], columns=data[0])

dataFrame get all columns

list(df) returns all columns except the 'index'
to include the index, one way is
columns = list(df)

get values from next row

df['nextDayClose'] = df['Close'].shift(-1)

rolling window

Here is the example to calculate the 7 days average.
df['7_mean'] = df['Close'].rolling(window=7).mean()

dataframe remove rows containing NAN

dat.dropna(how='any') #to drop if any value in the row has a nan
dat.dropna(how='all') #to drop if all values in the row are nan

get last value of one column


dataframe change type to numeric

df = df.apply(pd.to_numeric, errors='ignore')

dataframe get one column

* putting text in the square brackets means manipulate columns
* putting number in the square brackets means manipulate rows

dataframe get multiple columns

df[['colName1', 'colName2', 'colName3']]

dataframe get first row

the api is like this df[startIndex:endIndex:sort], df[:1] equals df[0:1] which returns the first row of the dataframe

dataframe get last row


dataframe get all rows except first/last


dataframe reverse the order

to get rows except last and reverse the order, this one does not work, df[:-1:-1]
it has to be df[:-1][::-1]

dataframe sort by columns


dataframe keep rows have null/nan values

df = df[df.isnull().any(axis=1)]

readcsv and explicitly specify column type

quotes = pd.read_csv(csv, dtype={'Symbol': 'str'})
DataFrame.from_csv does not have this feature. It will convert "INF" to number infinite.

dataframe drop duplicates

quotes.drop_duplicates(subset=['Date'], keep='last', inplace=True)

two dataframes merge like inner join

merged = pd.merge(DataFrameA,DataFrameB, on=['Code','Date'])

dataframe only contain rows have null values

try to understand this

dataframe locate a cell

df.loc[0,'Close'] return the value of first row and column 'Close'
to set the value, df.loc[0,'Close']=123

dataframe locate by lambda

quotes_df.loc[lambda df: pd.isnull(df.Open) | pd.isnull(df.Volume) | df.Open == 0,:]

No comments:

Post a Comment