iternate dataframe
for index, row in prediction_df.iterrows():
print(row['columnName'])
rename dataframe column name
df=df.rename(columns = {'two':'new_name'})convert a list of string to dataframe
the first item in the list is the column namespd.DataFrame(data[1:], columns=data[0])
dataFrame get all columns
list(df) returns all columns except the 'index'to include the index, one way is
columns = list(df)
columns.append(df.index.name)
get values from next row
df['nextDayClose'] = df['Close'].shift(-1)rolling window
Here is the example to calculate the 7 days average.df['7_mean'] = df['Close'].rolling(window=7).mean()
dataframe remove rows containing NAN
dat.dropna(how='any') #to drop if any value in the row has a nandat.dropna(how='all') #to drop if all values in the row are nan
get last value of one column
df['columnName'].values[-1]dataframe change type to numeric
df = df.apply(pd.to_numeric, errors='ignore')dataframe get one column
df['colName']* putting text in the square brackets means manipulate columns
* putting number in the square brackets means manipulate rows
dataframe get multiple columns
df[['colName1', 'colName2', 'colName3']]dataframe get first row
df[:1]the api is like this df[startIndex:endIndex:sort], df[:1] equals df[0:1] which returns the first row of the dataframe
dataframe get last row
df[-1:]dataframe get all rows except first/last
df[1:]df[:-1]
dataframe reverse the order
df[::-1]to get rows except last and reverse the order, this one does not work, df[:-1:-1]
it has to be df[:-1][::-1]
dataframe sort by columns
df.sort_values('col2')dataframe keep rows have null/nan values
df = df[df.isnull().any(axis=1)]readcsv and explicitly specify column type
quotes = pd.read_csv(csv, dtype={'Symbol': 'str'})DataFrame.from_csv does not have this feature. It will convert "INF" to number infinite.
dataframe drop duplicates
quotes.drop_duplicates(subset=['Date'], keep='last', inplace=True)two dataframes merge like inner join
merged = pd.merge(DataFrameA,DataFrameB, on=['Code','Date'])dataframe only contain rows have null values
try to understand thisdf[pd.isnull(df).any(axis=1)]
https://stackoverflow.com/questions/14247586/python-pandas-how-to-select-rows-with-one-or-more-nulls-from-a-dataframe-without
dataframe locate a cell
df.loc[0,'Close'] return the value of first row and column 'Close'to set the value, df.loc[0,'Close']=123
dataframe locate by value
df.loc[df['column_name'] == some_value]dataframe locate by lambda
quotes_df.loc[lambda df: pd.isnull(df.Open) | pd.isnull(df.Volume) | df.Open == 0,:]https://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-callable
dataframe locate with or condition
new_df = prediction_df.loc[(prediction_df['result'] != 0) | (prediction_df['prediction'] != 0)]df.loc[(df['column_name'] == some_value) & df['other_column'].isin(some_values)]
isin returns a boolean Series, so to select rows whose value is not in some_values, negate the boolean Series using ~
df.loc[~df['column_name'].isin(some_values)]
replace null with Nan
df = df.replace('null', np.nan)then we can do dropna to delete the rows
df.dropna(how='any', subset=['Open', 'High', 'Low', 'Close', 'Volume'], inplace=True)
dateframe group by
group by one column value and returns multiple dataframesdateSymbol_df is
Date Symbol
0 2017-08-10 SFBC
1 2017-08-10 SVBI
2 2017-08-09 SENEB
...
for date, symbol_df in dateSymbol_df.groupby('Date'): dateSymbols[date] = list(symbol_df['Symbol'])the dateSymbols is
{
'2017-08-10': ['SFBC','SVBI'],
'2017-08-09': ['SENEB'],
...
https://stackoverflow.com/questions/40498463/python-splitting-dataframe-into-multiple-dataframes-based-on-column-values-and
iterate between two dates
for i in pd.date_range(start, end):print(i)
print more columns
pd.set_option('display.expand_frame_repr', False)https://stackoverflow.com/questions/11707586/python-pandas-how-to-widen-output-display-to-see-more-columns
https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python#gs.hEu=LKY
No comments:
Post a Comment