Thursday, June 15, 2017

python pandas practice

iternate dataframe

for index, row in prediction_df.iterrows():
        print(row['columnName'])

rename dataframe column name

df=df.rename(columns = {'two':'new_name'})

convert a list of string to dataframe

the first item in the list is the column names
pd.DataFrame(data[1:], columns=data[0])

dataFrame get all columns

list(df) returns all columns except the 'index'
to include the index, one way is
columns = list(df)
columns.append(df.index.name)

get values from next row

df['nextDayClose'] = df['Close'].shift(-1)

rolling window

Here is the example to calculate the 7 days average.
df['7_mean'] = df['Close'].rolling(window=7).mean()

dataframe remove rows containing NAN

dat.dropna(how='any') #to drop if any value in the row has a nan
dat.dropna(how='all') #to drop if all values in the row are nan

get last value of one column

df['columnName'].values[-1]

dataframe change type to numeric

df = df.apply(pd.to_numeric, errors='ignore')

dataframe get one column

df['colName']
* putting text in the square brackets means manipulate columns
* putting number in the square brackets means manipulate rows

dataframe get multiple columns

df[['colName1', 'colName2', 'colName3']]

dataframe get first row

df[:1]
the api is like this df[startIndex:endIndex:sort], df[:1] equals df[0:1] which returns the first row of the dataframe

dataframe get last row

df[-1:]

dataframe get all rows except first/last

df[1:]
df[:-1]

dataframe reverse the order

df[::-1]
to get rows except last and reverse the order, this one does not work, df[:-1:-1]
it has to be df[:-1][::-1]

dataframe sort by columns

df.sort_values('col2')

dataframe keep rows have null/nan values

df = df[df.isnull().any(axis=1)]

readcsv and explicitly specify column type

quotes = pd.read_csv(csv, dtype={'Symbol': 'str'})
DataFrame.from_csv does not have this feature. It will convert "INF" to number infinite.

dataframe drop duplicates

quotes.drop_duplicates(subset=['Date'], keep='last', inplace=True)

two dataframes merge like inner join

merged = pd.merge(DataFrameA,DataFrameB, on=['Code','Date'])

dataframe only contain rows have null values

try to understand this
df[pd.isnull(df).any(axis=1)]
https://stackoverflow.com/questions/14247586/python-pandas-how-to-select-rows-with-one-or-more-nulls-from-a-dataframe-without

dataframe locate a cell

df.loc[0,'Close'] return the value of first row and column 'Close'
to set the value, df.loc[0,'Close']=123

dataframe locate by value

df.loc[df['column_name'] == some_value]

dataframe locate by lambda

quotes_df.loc[lambda df: pd.isnull(df.Open) | pd.isnull(df.Volume) | df.Open == 0,:]
https://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-callable

dataframe locate with or condition

new_df = prediction_df.loc[(prediction_df['result'] != 0) | (prediction_df['prediction'] != 0)]
df.loc[(df['column_name'] == some_value) & df['other_column'].isin(some_values)]
isin returns a boolean Series, so to select rows whose value is not in some_values, negate the boolean Series using ~
df.loc[~df['column_name'].isin(some_values)]

replace null with Nan

df = df.replace('null', np.nan)
then we can do dropna to delete the rows
df.dropna(how='any', subset=['Open', 'High', 'Low', 'Close', 'Volume'], inplace=True)

dateframe group by

group by one column value and returns multiple dataframes
dateSymbol_df is
          Date Symbol
    0   2017-08-10   SFBC
    1   2017-08-10   SVBI
    2   2017-08-09  SENEB
    ...
for date, symbol_df in dateSymbol_df.groupby('Date'):
    dateSymbols[date] = list(symbol_df['Symbol'])
the dateSymbols is
{
    '2017-08-10': ['SFBC','SVBI'],
    '2017-08-09': ['SENEB'],
    ...
https://stackoverflow.com/questions/40498463/python-splitting-dataframe-into-multiple-dataframes-based-on-column-values-and

iterate between two dates

for i in pd.date_range(start, end):
    print(i)


https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python#gs.hEu=LKY

No comments:

Post a Comment