Monday, July 10, 2017

python function arguments

Python function parameters work like this:

*nonkeywords collects extra positional arguments into a tuple
**keywords collects extra keyword arguments into a dict
You can call the method like newfoo(2, 4, *(6, 8), **{'foo': 10, 'bar': 12})
In this case, normal_arg1 = 2, normal_arg2 = 4, nonkeywords is (6, 8), keywords is {'foo': 10, 'bar': 12}
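The definition of newfoo did not survive the export; based on the parameter names above, it is presumably something like this:

```python
def newfoo(normal_arg1, normal_arg2, *nonkeywords, **keywords):
    # *nonkeywords collects extra positional args into a tuple,
    # **keywords collects extra keyword args into a dict
    return normal_arg1, normal_arg2, nonkeywords, keywords

print(newfoo(2, 4, *(6, 8), **{'foo': 10, 'bar': 12}))
# (2, 4, (6, 8), {'foo': 10, 'bar': 12})
```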
newfoo(1)
    raises TypeError: normal_arg2 is missing
newfoo(1, 2)
    normal_arg1 = 1, normal_arg2 = 2
newfoo(*(1, 2))
    normal_arg1 = 1, normal_arg2 = 2
newfoo(1, *(2,))
    normal_arg1 = 1, normal_arg2 = 2
newfoo(1, *(2, 3))
    normal_arg1 = 1, normal_arg2 = 2, nonkeywords: (3,)
newfoo(1, 2, x=3)
    normal_arg1 = 1, normal_arg2 = 2, keywords: {'x': 3}

For named (keyword) arguments, the order can be changed.
test1(100, 200) is equivalent to test1(x=100, y=200) and to test1(y=200, x=100)
test1(100, y=200) is also allowed
but test1(100, x=200) is not allowed: 100 is already bound to x positionally, so x would get two values

For mixed arguments, keyword arguments have to follow positional arguments in the call; a positional argument after a keyword argument is a SyntaxError, so the call has to be reordered.

You can call test2(1, 2, 3), which is equivalent to test2(x=3, *(1, 2))

To print all input arguments
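The snippet for this is gone; a common way to capture and print everything passed in:

```python
def log_args(*args, **kwargs):
    # args is a tuple of positional arguments,
    # kwargs is a dict of keyword arguments
    print(args, kwargs)
    return args, kwargs

log_args(1, 2, x=3)  # prints (1, 2) {'x': 3}
```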

Thursday, June 15, 2017

python pandas practice

rename dataframe column name


df=df.rename(columns = {'two':'new_name'})

convert a list of string to dataframe


the first item in the list is the column names
pd.DataFrame(data[1:], columns=data[0])

dataFrame get all columns

list(df) returns all the column names, but not the index
to include the index, one way is
columns = list(df)
columns.append(df.index.name)

get values from next row

df['nextDayClose'] = df['Close'].shift(-1)

rolling window

Here is an example that calculates the 7-day rolling average.
df['7_mean'] = df['Close'].rolling(window=7).mean()

dataframe remove rows containing NAN

dat.dropna(how='any') #to drop if any value in the row has a nan
dat.dropna(how='all') #to drop if all values in the row are nan

get last value of one column

df['columnName'].values[-1]

dataframe change type to numeric

df = df.apply(pd.to_numeric, errors='ignore')

dataframe get one column

df['colName']
* putting a column label in the square brackets selects columns
* putting a slice in the square brackets selects rows (a bare number is treated as a column label, not a row position)

dataframe get multiple columns

df[['colName1', 'colName2', 'colName3']]

dataframe get first row

df[:1]
the api is like this df[start:stop:step]; df[:1] equals df[0:1], which returns the first row of the dataframe

dataframe get last row

df[-1:]

dataframe get all rows except first/last

df[1:]
df[:-1]

dataframe reverse the order

df[::-1]
to get all rows except the last and reverse the order, df[:-1:-1] does not work (it returns an empty frame)
it has to be df[:-1][::-1]

dataframe sort by columns

df.sort_values('col2')

dataframe keep rows that have null/nan values

df = df[df.isnull().any(axis=1)]

readcsv and explicitly specify column type

quotes = pd.read_csv(csv, dtype={'Symbol': 'str'})
DataFrame.from_csv does not have this feature; it will convert the string "INF" to the number infinity.

dataframe drop duplicates

quotes.drop_duplicates(subset=['Date'], keep='last', inplace=True)

two dataframes merge like inner join

merged = pd.merge(DataFrameA,DataFrameB, on=['Code','Date'])

dataframe only keep rows that have null values

pd.isnull(df) returns a boolean frame, and .any(axis=1) is True for every row where at least one column is null, so this selects exactly those rows
df[pd.isnull(df).any(axis=1)]
https://stackoverflow.com/questions/14247586/python-pandas-how-to-select-rows-with-one-or-more-nulls-from-a-dataframe-without

dataframe locate a cell

df.loc[0,'Close'] returns the value at index label 0, column 'Close'
to set the value, df.loc[0,'Close'] = 123

dataframe locate by value

df.loc[df['column_name'] == some_value]

dataframe locate by lambda

quotes_df.loc[lambda df: pd.isnull(df.Open) | pd.isnull(df.Volume) | (df.Open == 0), :]
(the parentheses around df.Open == 0 matter: | binds tighter than ==, so without them the expression parses as (... | df.Open) == 0)
https://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-callable

replace the string 'null' with NaN

df = df.replace('null', np.nan)   # np is numpy
then we can do dropna to delete those rows
df.dropna(how='any', subset=['Open', 'High', 'Low', 'Close', 'Volume'], inplace=True)

iterate between two dates

for i in pd.date_range(start, end):
    print(i)



Friday, June 9, 2017

python practice

check if dict has key

if key in d:   # d is the dict; avoid naming a variable dict, which shadows the builtin

python 3.6 install package

since Python 3.4, pip ships with the interpreter; to install a package, run
python -m pip install SomePackage

functools.partial

returns a new function with some of the parameters already provided
https://docs.python.org/2/howto/functional.html 
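A minimal example (power and square are illustrative names, not from the original post):

```python
from functools import partial

def power(base, exponent):
    return base ** exponent

# exponent is pre-filled; square only needs the base
square = partial(power, exponent=2)
print(square(5))  # 25
```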

python process pool

csvFiles = list(filter(isFile, os.listdir("d:/quotes")))  # isFile is a user-defined predicate
with multiprocessing.Pool(multiprocessing.cpu_count() - 1) as p:
    p.map(loadCsv, csvFiles)
Here loadCsv is the function to apply and csvFiles is the list to iterate over. Note that multiprocessing.Pool is a pool of processes, not threads; for a thread pool use concurrent.futures.ThreadPoolExecutor.
https://docs.python.org/3.6/library/multiprocessing.html

pickle

with open('filename', 'wb') as f:   # must be binary mode; use 'ab' to append
        pickle.dump(data, f)

with open('filename', 'rb') as f:
    x = pickle.load(f)

python print current exception

import traceback
traceback.print_exc()

zip two arrays into tuples

let's say you have two arrays [-1, 0, 1, 2] and [7, 8, 9, 10]
and you want to merge them into tuples like this: [(-1, 7), (0, 8), (1, 9), (2, 10)]
list(zip(array1, array2))

array filter

result = [a for a in A if a not in subset_of_A]

python function argument

http://detailfocused.blogspot.ca/2017/07/python-function-parameters.html

define function inside function

def method_a(arg):

    def method_b(arg):
        return some_data

    some_data = method_b

Wednesday, June 7, 2017

ember.js practice

- install ember-cli
npm install -g ember-cli

- create project
ember new hello-world

- start server
ember server

- create controller
ember g controller application

- create route
ember g route about
ember g route parent
ember g route parent/child

- create component
ember g component x-counter


{{#unless showName}}
  Hello {{name}}
{{else}}
  Hello Ember
{{/unless}}

{{numClicks}} Click
Here is a tutorial online.

Monday, June 5, 2017

spring injects into static class

Usually, a static class shall not depend on a specific object. But in some cases, we want Spring to inject an env-specific instance into a static class.
MethodInvokingFactoryBean can help with this. Here is one example

If the method has multiple arguments, here is the example

This xml setting lets Spring call a specific setter method on an object or class (static) to inject an object from the Spring container. This piece of xml setting can be repeated if multiple objects need to be injected.
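The xml snippets themselves were lost in the export; a typical MethodInvokingFactoryBean configuration looks like this (the class and bean names here are made up for illustration):

```xml
<!-- Calls the static setter MyStaticHelper.setEnvConfig(envConfig)
     at startup, injecting the container-managed envConfig bean. -->
<bean class="org.springframework.beans.factory.config.MethodInvokingFactoryBean">
    <property name="staticMethod" value="com.example.MyStaticHelper.setEnvConfig"/>
    <property name="arguments">
        <list>
            <ref bean="envConfig"/>
        </list>
    </property>
</bean>
```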

Friday, May 26, 2017

elasticsearch distinct value and search

In elasticsearch, to get the distinct value of one field, it is using term aggregation.

For example, here are the documents 
{
    "sourceTitle": "Arrival",
    "otherFields": ...
}
{
    "sourceTitle": "Arrival",
    "otherFields": ...
}
{
    "sourceTitle": "Eye in the Sky",
    "otherFields": ...
}
We want to get the result
    Arrival
    Eye in the sky
Here is the query to get the distinct values.
{
  "size": 0,
  "aggs": {
    "sourceTitle": {
      "terms": {
        "field": "sourceTitle",
        "size": 10
      }
    }
  }
}

But by default, Elasticsearch will tokenize the field when indexing the data. To avoid that, we need to make the field type "keyword". Then another problem comes up: when searching on a "keyword" field, the query has to match fully, so searching "Eye" on sourceTitle won't return anything. How do we support getting distinct values and searching by partial text at the same time?

We can set the field type to "text" and give it a child field "keyword" whose type is "keyword".
"sourceTitle": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword"
    }
  }
}

When getting distinct value, we shall run the aggs on the sourceTitle.keyword.
{
  "size": 0,
  "aggs": {
    "sourceTitle": {
      "terms": {
        "field": "sourceTitle.keyword",
        "size": 10
      }
    }
  }
}

https://discuss.elastic.co/t/distinct-value-and-search/86838/2

Wednesday, May 24, 2017

elasticsearch query

Full-text search queries

The most important queries in this category are the following:
- match_all
- match
- match_phrase
- multi_match
- query_string

- match_all returns all documents, same result as {}

GET words_v1/userFamilarity/_search
{
  "query": {
    "match_all": {
    }
  }
}

- match

{
    "query" : {
        "match": {
            "title":"abc"
        }
    }
}
documents whose terms match in the same order are returned first
GET /my_test/words/_search
{
  "query": {
    "match": {
      "english" : "\"This is all of it\""
    }
  }
}
it will return "This is all of it" as first document.
then "This is all right"
then "it is all right"

- match_phrase: exact phrase match

{
    "query" : {
        "match_phrase" : {
            "spoken_words":"makes me laugh"
        }
    }
}

- multi_match

match on multiple fields; the result shall have the query words in either field "spoken_words" or "raw_character_text". If both fields match, the result gets a higher score.
{
    "query" : {
        "multi_match" : {
            "query":"homer simpson",
            "fields": ["spoken_words", "raw_character_text"]
        }
    }
}
boost the result. Here "raw_character_text" is boosted by a factor of 8.
{
    "query" : {
        "multi_match" : {
            "query":"homer simpson",
            "fields": ["spoken_words", "raw_character_text^8"]
        }
    }
}
multi_match with the "and" operator
GET /my_test/words/_search
{
  "query": {
    "multi_match": {
      "query" : "eye sky",
      "fields" : ["english", "sourceTitle"],
      "operator" : "and"
    }
  }
}

- query_string
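(The query_string example did not survive the export; query_string parses Lucene syntax such as AND/OR, wildcards and field prefixes. A sketch, with made-up query text:)

```json
{
  "query": {
    "query_string": {
      "fields": ["spoken_words"],
      "query": "homer AND (donut OR beer)"
    }
  }
}
```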


- AND - OR

operator, return the document contains both "all" and "special"
GET /my_test/words/_search
{
  "query": {
    "match": {
      "english": {
        "query" : "all special",
        "operator": "and"
      }
    }
  }
}

- wildcard

note: wildcard can consume a lot of memory and time...
{
    "query" : {
        "query_string" : {
            "fields": ["spoken_words"],
            "query": "fri*"
        }
    }
}

fuzzy match, even with misspellings

{
    "query" : {
        "query_string" : {
            "fields": ["spoken_words"],
            "query": "dnout~"
        }
    }
}
fuzzy match distance factor, to improve performance; the default edit distance is 2.
{
    "query" : {
        "query_string" : {
            "fields": ["spoken_words"],
            "query": "dnout~1"
        }
    }
}

Term-based search queries

The most important queries in this category are the following:
- Term query
- Terms query
- Range query
- Exists query / Missing query

- Term query

The term query does exact term matching in a given field, so you need to provide the exact term to get correct results. For example, if you used a lowercase filter while indexing, you need to pass the terms in lowercase when querying with the term query.
Another example: after stemming, "house" is indexed as "hous". A match query with the parameter "hous" cannot return anything,
while a term query with the parameter "hous" returns the documents containing "house".
GET /words_v1/words/_search
{
  "query": {
    "term" : {
      "english" : "hous"
    }
  }
}
GET /words_v1/words/_search
{
  "query": {
    "match" : {
      "english" : "hous"
    }
  }
}

- Terms query
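(The terms query example is missing; terms works like term but accepts a list of values, any of which may match. A sketch using the field from the example above:)

```json
{
  "query": {
    "terms": {
      "english": ["hous", "arriv"]
    }
  }
}
```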


- Range query
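(The range query example is missing too; a typical range query on a numeric field, reusing the imdb_rating field that appears later in this post:)

```json
{
  "query": {
    "range": {
      "imdb_rating": { "gte": 4, "lt": 8 }
    }
  }
}
```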

- Exists query / Missing query

The exists query and the missing query get the documents which do or do not have a value in one field
{
    "query": {
        "exists" : { "field" : "user" }
    }
}
The missing query can be done by combining must_not and exists query
{
    "query": {
        "bool": {
            "must_not": {
                "exists": {
                    "field": "user"
                }
            }
        }
    }
}

compound query

Compound queries are offered to connect multiple simple queries together to make your search better.
- bool query
- not query
- Function score query

-bool query

{
    "query":{
        "bool":{
            "must":[{}],
            "should":[{}],
            "must_not":[{}],
            "filter":[{}]  //A query wrapped inside this clause must appear in the matching documents. However, this does not contribute to scoring.
        }
    }
}

{
    "query" : {
        "bool": {
            "must": {"match": {"title":"homer"}},
            "must_not": {"range": {"imdb_rating":{"gt": 8}}}
        }
    }
}

{
    "query" : {
        "bool": {
            "must": {"match": {"title":"homer"}},
            "must": {"range": {"imdb_rating":{"gt": 4, "lt":8}}}
        }
    }
}
Change "must" to "filter": Elasticsearch will apply the filter first, then do the title match.
{
    "query" : {
        "bool": {
            "must": {"match": {"title":"homer"}},
            "filter": {"range": {"imdb_rating":{"gt": 4, "lt":8}}}
        }
    }
}

--------------------
the query json object is query => query type (such like match, term, multi_match, range...) => field name => more settings the query need

Queries were used to find out how relevant a document was to a particular query by calculating a score for each document, whereas filters were used to match certain criteria. In the query context, put the queries that ask questions about document relevance and need score calculation; in the filter context, put the queries that only need to answer a simple yes/no question.


* query on date field, e.g find documents create after 2017-Feb-01
* constant_score: A query that wraps another query and simply returns a constant score equal to the query boost for every document in the filter.
* because of performance considerations; do not use sorting on analyzed fields.
---------------------

aggregation

There are 4 types: pipeline, metrics, bucket, and matrix aggregations.
- Metrics are used to do statistics calculations, such as min, max, average, on a field of the documents that fall into a certain criteria.
{
    "aggs": {  //declare doing aggregation
        "avg_word_count" : { //the field name in the result
            "avg" : {  // the aggregation function; could also be max, min, sum...
                "field" : "word_count"
            }
        }
    }
}
The structure is like this
{
    "aggs": {
        "aggregation_name": {
            "aggregation_type": {
                "field": "name_of_the_field"
            }
        }
    }
}
size
{
    "size" : 0, //without this, the result displays the matching documents first, then the aggregation result
    "aggs": {
        "avg_word_count" : { //the field name in the result
            "avg" : {  // the aggregation function
                "field" : "word_count"
            }
        }
    }
}

- extended_stats

GET /words_v1/userFamilarity/_search
{
  "size": 0,
  "aggs": {
    "result": {
      "extended_stats": {
        "field": "familarity"
      }
    }
  }
}
Here is the result
"aggregations": {
    "result": {
      "count": 47,
      "min": 0,
      "max": 3,
      "avg": 2.8085106382978724,
      "sum": 132,
      "sum_of_squares": 388,
      "variance": 0.3675871435038475,
      "std_deviation": 0.6062896531393617,
      "std_deviation_bounds": {
        "upper": 4.0210899445765955,
        "lower": 1.595931332019149
      }
    }
  }

- cardinality

The count of a distinct value of a field can be calculated using the cardinality aggregation.
{
    "size" : 0,
    "aggs": {
        "speaking_line_count" : {
            "cardinality" : {
                "field" : "raw_character_text"
            }
        }
    }
}

- percentiles

{
    "size" : 0,
    "aggs": {
        "word_count_percentiles" : {
            "percentiles" : {
                "field" : "word_count"
            }
        }
    }
}


Fielddata is disabled on text fields by default; to aggregate on a text field, enable it in the mapping:
PUT /myIndex/myType/_mapping/script    //script is the type name
{
    "properties" : {
        "raw_character_text" : {
            "type" : "text",
            "fielddata" : true
        }
    }
}

bucket

document categorization based on some criteria, like group by in sql

- Terms aggregation, count group by term

GET /words_v1/userFamilarity/_search
{
  "size": 0,
  "aggs": {
    "result": {
      "terms": {
        "field": "familarity"
      }
    }
  }
}
Result
"aggregations": {
    "result": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 3,
          "doc_count": 42
        },
        {
          "key": 1,
          "doc_count": 2
        },
        {
          "key": 2,
          "doc_count": 2
        },
        {
          "key": 0,
          "doc_count": 1
        }
      ]
    }
  }

- Range aggregation

GET /words_v1/userFamilarity/_search
{
  "size": 0,
  "aggs": {
    "result": {
      "range": {
        "field": "familarity",
        "ranges": [
          {"to":3},  //3 is excluded
          {"from":3, "to":4}
        ]
      }
    }
  }
}
Result
"aggregations": {
    "result": {
      "buckets": [
        {
          "key": "*-3.0",
          "to": 3,
          "doc_count": 5
        },
        {
          "key": "3.0-4.0",
          "from": 3,
          "to": 4,
          "doc_count": 42
        }
      ]
    }
  }

- Date range aggregation

GET /words_v1/userFamilarity/_search
{
  "size": 0,
  "aggs": {
    "result": {
      "range": {
        "field": "date",
        "format": "yyyy",
        "ranges": [
          {"to":2017},
          {"from":2017, "to":2018}
        ]
      }
    }
  }
}
Result
"aggregations": {
    "result": {
      "buckets": [
        {
          "key": "*-1970",
          "to": 2017,
          "to_as_string": "1970",
          "doc_count": 0
        },
        {
          "key": "1970-1970",
          "from": 2017,
          "from_as_string": "1970",
          "to": 2018,
          "to_as_string": "1970",
          "doc_count": 0
        }
      ]
    }
  }

- Filter-based aggregation


- combine query and aggregation

GET /words_v1/userFamilarity/_search
{
  "size": 0,
  "query": {
    "match": {
      "familarity": 0
    }
  },
  "aggs": {
    "result": {
      "terms": {
        "field": "familarity"
      }
    }
  }
}
Result
"aggregations": {
    "result": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 0,
          "doc_count": 1
        }
      ]
    }
  }

- combine filter and aggregation; the aggregation here is a sub-aggregation

{
    "size" : 0,
    "aggs": {
        "homer_word_count" : {
            "filter" : { "term" : {"raw_character_text":"homer"}}, //filter before aggregation
            "aggs": {
                "avg_word_count" : {"avg" : {"field": "word_count"} }
            }
        }
    }
}

{
    "size" : 0,
    "aggs": {
        "simpsons" : {
            "filters" : {
                "other_bucket" : true,
                "other_bucket_key" : "Non-Simpsons Cast",
                "filters" : {
                    "Homer" : { "match" : {"raw_character_text" : "homer"}},
                    "Lisa" : { "match" : {"raw_character_text" : "lisa"}}
                }
            }
        }
    }
}

{
    "query" : {
        "terms" : {"raw_character_text" : ["homer"]}
    },
    "size" : 0,
    "aggs" : {
        "SignificantWords" : {
            "significant_terms" : {"field": "spoken_words"}
        }
    }
}

The bucket aggregations can be nested within each other. This means that a bucket can contain other buckets within it.
For example, a country-wise bucket can include a state-wise bucket, which can further include a city-wise bucket.

- sort

{
    "query":{
        "match":{"text":"data analytics"}
    },
    "sort":[
        {"created_at":{"order":"asc"}},
        {"followers_count":{"order":"asc"}}
    ]
}




Thursday, May 18, 2017

elasticsearch update_by_query

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html#picking-up-a-new-property

This is helpful when you want to update a bunch of documents.

This works for me
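(The working request was lost in the export; an update_by_query with a painless script in that era of Elasticsearch typically looks like the sketch below, using the index and field names from this post:)

```json
POST words_v1/_update_by_query
{
  "script": {
    "inline": "ctx._source.date = \"2017-05-18T05:35:23.103Z\"",
    "lang": "painless"
  },
  "query": {
    "bool": {
      "must_not": { "exists": { "field": "date" } }
    }
  }
}
```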

The inline script did not work for me.
I always got this error. (Note that the script misspells ctx._source as ctx._soruce, which is what triggers the null_pointer_exception.)
{
  "error": {
    "root_cause": [
      {
        "type": "script_exception",
        "reason": "runtime error",
        "script_stack": [
          "ctx._soruce.date=\"2017-05-18T05:35:23.103Z\";",
          "           ^---- HERE"
        ],
        "script": "ctx._soruce.date=\"2017-05-18T05:35:23.103Z\";",
        "lang": "painless"
      }
    ],
    "type": "script_exception",
    "reason": "runtime error",
    "caused_by": {
      "type": "null_pointer_exception",
      "reason": null
    },
    "script_stack": [
      "ctx._soruce.date=\"2017-05-18T05:35:23.103Z\";",
      "           ^---- HERE"
    ],
    "script": "ctx._soruce.date=\"2017-05-18T05:35:23.103Z\";",
    "lang": "painless"
  },
  "status": 500
}
For a single document update, the inline script works.

Tuesday, May 16, 2017

JVM verbose log

For example, if you want to see the verbose jvm log for the SSL handshake, add one of these JVM parameters (if both are given, only the last one takes effect):

-Djavax.net.debug=all
-Djavax.net.debug=ssl:handshake:verbose


Friday, May 12, 2017

express: request entity too large

When posting a large text body, you may get the error "request entity too large". Here is the code to fix that.
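The fix itself did not survive the export; in Express with body-parser, the usual fix is raising the body size limit (the '50mb' value is just an example):

```javascript
const express = require('express');
const bodyParser = require('body-parser');

const app = express();
// raise the default body size limit (~100kb) so large posts are accepted
app.use(bodyParser.json({ limit: '50mb' }));
app.use(bodyParser.urlencoded({ limit: '50mb', extended: true }));
```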


Monday, May 1, 2017

webpack practice

webpack enable sourcemaps
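(The configuration snippet is gone; source maps are enabled with the devtool option in webpack.config.js:)

```javascript
// webpack.config.js
module.exports = {
  // 'source-map' emits full separate .map files;
  // cheaper variants like 'eval-source-map' rebuild faster during development
  devtool: 'source-map'
};
```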




-- to be continued --

Thursday, March 30, 2017

passportjs, nodejs auth

The documentation of passportjs is not good. I followed it and could not make my code work.
After reading a couple of tutorials online, I eventually made it work.
I abstracted all passport-related setup into one file, auth.js.

Here is the change in server.js; basically it adds authentication to the URLs you want to protect.
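The server.js diff was not preserved; protecting a route with passport generally looks like this sketch (the route path and handler are made up):

```javascript
const passport = require('passport');

// wire passport into the express middleware chain
app.use(passport.initialize());
app.use(passport.session());

// any request to /api/tasks must carry an authenticated session
app.get('/api/tasks', ensureAuthenticated, (req, res) => res.json(tasks));

function ensureAuthenticated(req, res, next) {
  if (req.isAuthenticated()) { return next(); }
  res.redirect('/login');
}
```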

Here are some helpful online tutorial
https://www.danielgynn.com/node-auth-part2/
https://blog.risingstack.com/node-hero-node-js-authentication-passport-js/
http://passportjs.org/docs/facebook
https://github.com/jaredhanson/passport-facebook#re-authentication


Monday, March 27, 2017

Javascript asynchronous and promise

Javascript now uses lots of asynchronous calls. In some scenarios you need an asynchronous call chain, one call after another; Promise helps build such chains.

Here is the requirement. 
The UI display a task list. Each task has a priority, which determines the display order on the UI.
There is up arrow and down arrow on the right of each task. Clicking the up arrow will exchange the priority with the task above.

The code here is nodejs and mongoose. 
Here is the pseudo code. 
//find current task by taskId
//find the previous task based on the current task priority
//exchange the priority
//save the tasks (update first task, once it is done, in the callback update the second task)
There are at least 4 callback functions nested, and the code is hard to read.
With Promise, the code can be refactored so that it is much neater and easier to read.
For the code "Task.findById(id)", Mongoose returns a promise.
Once the record is returned, the then method executes. It calls the function "findThisAndPreviousTaskByPriority", which returns two records in an array.
That function returns a promise as well. Once it resolves, it triggers the next "then", which is exchangePriority. It also returns an array containing two tasks, but with their priorities already exchanged. "updateTwoTasksPriority" basically creates two mongoose update promises based on the two tasks. The last call uses "Promise.all" to convert an iterable of promises into one single promise.
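The refactored code itself did not survive the export; based on the description, the chain presumably looks like this (the helper functions are the ones named in the text):

```javascript
// each helper returns a promise (or a value), so the steps run in order
Task.findById(id)
  .then(findThisAndPreviousTaskByPriority) // -> [current, previous]
  .then(exchangePriority)                  // -> same two tasks, priorities swapped
  .then(updateTwoTasksPriority)            // -> [updatePromise1, updatePromise2]
  .then(updates => Promise.all(updates))   // wait for both updates to finish
  .catch(err => console.error(err));
```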

Here are some good articles about javascript promise
https://davidwalsh.name/promises
http://stackoverflow.com/questions/39028882/chaining-async-method-calls-javascript
https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Promise
https://www.youtube.com/watch?v=s6SH72uAn3Q
https://60devs.com/best-practices-for-using-promises-in-js.html
http://mongoosejs.com/docs/promises.html
https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Promise/all
http://stackoverflow.com/questions/22519784/how-do-i-convert-an-existing-callback-api-to-promises





Thursday, March 23, 2017

MongoDB admin

To view result pretty 

db.yourcollection.find().pretty()

To view the commands sent to mongodb

For example, the query is sent by mongoose and in the code the query contains variable, how to view the exact query executed on MongoDB?
db.setLogLevel(3, 'command')
db.setLogLevel(3, 'query')
https://docs.mongodb.com/manual/reference/method/db.setLogLevel/

git commands

clone from local git repository

git clone C:\folder1 folder2
    folder1 is source, folder2 is destination




Wednesday, March 22, 2017

My 1st MEAN project

MEAN is A MEN
A is angular which is doing the client work.
MEN is server side implemented by Nodejs, Express and MongoDB (Mongoose). Basically use these 3 technologies to build a restful service.

To create a Nodejs project, use "npm init", then start with Express and integrate Mongoose to access MongoDB. It is quite straightforward.

Here is a good tutorial about Nodejs and Mongoose
https://scotch.io/tutorials/using-mongoosejs-in-node-js-and-mongodb-applications

This youtube video teaches you how to build MEAN project step by step. It is awesome.
https://www.youtube.com/watch?v=PFP0oXNNveg&t=2226s
I basically followed the steps in this video. The only difference is that I used the Angular 2 CLI to create the client project. In the nodejs directory, I ran "ng new client", which generates an Angular 2 client project under the client folder.
About how to use the Angular 2 CLI, here are helpful tutorials.
https://scotch.io/tutorials/mean-app-with-angular-2-and-the-angular-cli
https://www.sitepoint.com/angular-2-tutorial/
https://www.sitepoint.com/ultimate-angular-cli-reference/
https://github.com/angular/angular-cli/wiki/build

It seems we would otherwise start two web servers: one running angular 2, the other running the nodejs restful service. This may bring in AJAX cross-domain issues.

The best way is to run only one web server which serves both the angular 2 app and the restful service. I run "node server.js", which brings up the restful service. Angular 2 needs to be built from typescript to javascript, and the default build directory of the Angular 2 CLI is client/dist; we have to let the express server know that.
Here is my server.js
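The file itself was lost in the export; the key part, serving the Angular build output from express, is presumably along these lines:

```javascript
const express = require('express');
const path = require('path');

const app = express();
// serve the Angular CLI build output alongside the REST routes,
// so one server hosts both the client and the API
app.use(express.static(path.join(__dirname, 'client/dist')));
app.listen(process.env.PORT || 3000);
```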

When developing, it is convenient to auto-build when files are changed. Running the build in watch mode in the client folder implements auto-build.
Then run the server under a watcher so it restarts when any change is detected.
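The exact commands were not preserved; with the Angular CLI and nodemon they would typically be:

```shell
# in the client folder: rebuild on every source change
ng build --watch

# in the project root: restart the node server on every change
nodemon server.js
```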

https://www.npmjs.com/package/nodedev
nodedev allows developers to debug nodejs code in chrome. Open http://127.0.0.1:7001/debug?port=7000, then set breakpoints and inspect variable values. https://bitbucket.org/bonbonniere/mywork/src/7efaf965a5dd0699c5002c512a1f5d1bced2de5f/nodejs/myTaskList/?at=master

Here are some other good tutorials
https://www.youtube.com/watch?v=uONz0lEWft0
https://www.youtube.com/watch?v=_-CD_5YhJTA

Tuesday, March 21, 2017

MongoDB query practice

Here is the data
{"id":1, "priority":10}
{"id":2, "priority":9}
{"id":3, "priority":11} 
Now, the input parameter is id=3, and I want to get the records which id is 3 and the record which priority is just before the record which id=3. 
{"id":3, "priority":11}
{"id":1, "priority":10}
if the input parameter id=2, it shall return 
{"id":2, "priority":9}
If the input parameter id=1, it shall return 
{"id":1, "priority":10}
{"id":2, "priority":9}
Can it be done in one query?

delete records

db.users.deleteMany({ status : "A" })

updateMany

this is like the sql update with where clause, which update a bunch of documents satisfy the conditions.
db.restaurant.updateMany(
      { violations: { $gt: 4 } },
      { $set: { "Review" : true } }
   )

count and sort by count result

db.quote.aggregate([{$group: {_id:"$Symbol", quoteCount:{$sum:1}}}, {$sort:{quoteCount:-1}}])

contains

db.users.findOne({"username" : {$regex : ".*son.*"}});

sort and get last N result

db.collection.find({}).sort("_id", -1).limit(N)
be careful: this is PyMongo syntax, where sort takes a key and a direction; passing sort({"id": -1}) gives an error because PyMongo does not accept a dict. In the mongo shell it is the opposite: use .sort({_id: -1}).

find documents which don't contain one field

db.prediction.find({isCorrect:{$exists:false}})

create unique index

db.quote.createIndex({"Symbol":1,"Date":1}, {unique:true})

allowDiskUse

db.users.aggregate( 
 [  
  { $group : { _id : "$key", number : { $sum : 1 } } },
  ...
 ], {
  allowDiskUse:true
 }
); 

Update field name

db.prediction.update({}, {$rename:{"date":"Date"}}, false, true)

distinct value

db.prediction.distinct("Prediction", {"Date":"2017-07-11"})

sublime text 3 shortcut key

Ctrl + tab
previous file

Friday, March 17, 2017

Angular 2 use local jquery and bootstrap

I use angular 2 cli to create the angular 2 project.

To use the local jquery js or bootstrap css, add the configuration below into angular-cli.json.
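The configuration block is missing; in angular-cli.json (the config file of the old Angular CLI), local scripts and styles go into the apps[0].styles and apps[0].scripts arrays. The paths below assume the libraries were installed via npm:

```json
{
  "apps": [
    {
      "styles": [
        "../node_modules/bootstrap/dist/css/bootstrap.min.css",
        "styles.css"
      ],
      "scripts": [
        "../node_modules/jquery/dist/jquery.min.js",
        "../node_modules/bootstrap/dist/js/bootstrap.min.js"
      ]
    }
  ]
}
```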

Monday, March 6, 2017

Javascript cross-domain request

This is a good article.

https://jvaneyck.wordpress.com/2014/01/07/cross-domain-requests-in-javascript/

nodejs server reload and debug

Nodedev is the wrapper for nodemon and node-inspector.
Nodemon is the tool to reload the server when source files are changed.
Node-inspector allows you debug through chrome for nodejs files.

https://www.npmjs.com/package/nodedev

For example, to run index.js
nodedev index.js

then you can open chrome and visit http://127.0.0.1:7001/debug?port=7000 to start debugging

Friday, March 3, 2017

Jmeter set JSON as body

If JSON object is set as body of request, this shall be added to the HTTP Header Manager.
Content-Type : application/json

Also on the UI of HTTP Request, select the checkbox for "Browser-compatible headers"


Mongoose findByIdAndUpdate does not return updated object

Here is the code

You will find that the returned product is the one from before the update. Usually we need the one after the update; to get that, add the option {new: true}.
http://stackoverflow.com/questions/30419575/mongoose-findbyidandupdate-not-returning-correct-model
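The code block vanished in the export; a minimal sketch of the fix, assuming a Product model and an illustrative price update:

```javascript
// without {new: true}, mongoose hands back the document
// as it was before the update
Product.findByIdAndUpdate(id, { $set: { price: 99 } }, { new: true })
  .then(product => {
    // product now reflects the updated values
    console.log(product.price);
  });
```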

Mongoose close connection

Once the connection is created by mongoose, it will create a pool of connections. The pool is maintained by the driver so that the connections can be re-used. The best practice is to create a new connection only once for the whole application.

In MongoDB console, use this command to check the connections opened
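(The command itself is missing; judging by the result shown below, it is presumably the connections section of serverStatus:)

```javascript
// mongo shell
db.serverStatus().connections
```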
The result is
{ "current" : 2, "available" : 999998, "totalCreated" : 14 }
One connection is by the mongoDB console, the other one is nodejs.

The interesting thing is that the number is not increased when I use JMeter to generate 10 concurrent requests.

We shall close the connection before exiting the nodejs program.

After the nodejs program exits, check the connections again and now it is
{ "current" : 1, "available" : 999998, "totalCreated" : 14 }

Here are the helpful articles.
http://stackoverflow.com/questions/23244843/how-mongodb-connection-works-on-concurrent-requests-in-nodejs-express-server

https://gist.github.com/pasupulaphani/9463004#file-mongoose_connet-js

find which command connect to mongodb
http://stackoverflow.com/questions/8975531/check-the-current-number-of-connections-to-mongodb
in windows, use this command to get the pid
then use this command to find the process; replace 6888 with the pid found by the previous command
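The Windows commands were lost; they are presumably along these lines:

```shell
:: find the pid of the process connected to mongodb's default port
netstat -ano | findstr :27017

:: look up that process by pid (replace 6888 with the pid from netstat)
tasklist /FI "PID eq 6888"
```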

Thursday, March 2, 2017

nodejs mongoose

found a good tutorial about mongoose

https://scotch.io/tutorials/using-mongoosejs-in-node-js-and-mongodb-applications

In Mongoose, a sort can be done in any of the following ways:
Post.find({}).sort('test').exec(function(err, docs) { ... });
Post.find({}).sort({test: 1}).exec(function(err, docs) { ... });
Post.find({}, null, {sort: {date: 1}}, function(err, docs) { ... });
Post.find({}, null, {sort: [['date', -1]]}, function(err, docs) { ... });
http://stackoverflow.com/questions/4299991/how-to-sort-in-mongoose


process.env.PORT

In Nodejs, you often see code like this:
the port is either process.env.PORT or 3000.
"process.env.PORT" is an environment variable, so it can be set when launching the process. For example:
now the port is 4444.
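The snippets are gone; the pattern is the usual env-var fallback (resolvePort is an illustrative name):

```javascript
// use the PORT environment variable when present, else fall back to 3000
function resolvePort(env) {
  return env.PORT || 3000;
}

console.log(resolvePort(process.env));
```

Run it with the variable set (`PORT=4444 node server.js` on Linux/macOS, `set PORT=4444 && node server.js` on Windows) and the port is 4444.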

Sometimes the setting may be env specific, e.g. database URL. Here is one solution.
https://gist.github.com/wilsonbalderrama/0ad2b64b5ab6287c7318

config.js

Tuesday, February 28, 2017

How to flatmap a stream of streams in Java8?

When using Java8 lamda, you may get Strem>, to flat it

Here is the api of Function
identity()
Returns a function that always returns its input argument.
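Putting it together, a small sketch (the class and method names are mine):

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlattenStreams {
    // Collapse a Stream of Streams into one Stream with Function.identity():
    // flatMap expects a function from element to Stream, and here each element
    // already is a Stream, so identity() is exactly the mapper we need.
    static <T> List<T> flatten(Stream<Stream<T>> nested) {
        return nested.flatMap(Function.identity()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stream<Stream<Integer>> nested = Stream.of(Stream.of(1, 2), Stream.of(3, 4));
        System.out.println(flatten(nested)); // [1, 2, 3, 4]
    }
}
```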

Tuesday, February 21, 2017

Oracle find records by date diff

Here is the SQL (the table and column names are illustrative):

    SELECT * FROM my_table WHERE created_date > SYSDATE - INTERVAL '30' DAY;

INTERVAL defaults to 2 digits of precision. If you need 3 digits, you need to write INTERVAL '300' DAY(3).

http://docs.oracle.com/cd/B19306_01/server.102/b14200/sql_elements003.htm


Wednesday, February 15, 2017

Regular expression to check a String not contain a word

For example, the string is "type1, type2, type3"

I want the Java match to return false if the string contains type2.

The regular expression shall be

    ^(?!.*type2).*$

(?! ... ) is a negative lookahead: (?!.*type2) asserts that the string does not contain type2, and the trailing .* then matches any characters.
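A quick check of this in Java (the class and helper method names are mine):

```java
public class NegativeLookahead {
    // Returns true only when the input does NOT contain "type2".
    static boolean allowed(String s) {
        return s.matches("^(?!.*type2).*$");
    }

    public static void main(String[] args) {
        System.out.println(allowed("type1, type3"));        // true
        System.out.println(allowed("type1, type2, type3")); // false
    }
}
```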

There is a good online java regular expression test

http://www.freeformatter.com/java-regex-tester.html

Wednesday, January 18, 2017

total commander left right sync change dir


To sync change dir on both left and right, add a shortcut on toolbar for command

cm_SyncChangeDir

Monday, January 9, 2017

xpath contains() multiple line text

I once faced HTML where the element's text spans multiple lines, and I tried to get the element by its text.

An XPath that compares the text exactly, such as //div[text()='some text'], does not work for multi-line text.

It has to be changed to match on the element's string value instead, e.g. //div[contains(., 'some text')] (the element name and text here are illustrative).


Friday, January 6, 2017

java 8 flatmap

For String[]
For List
For Primitive, e.g. flatMapToInt
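Sketches of the three cases (the class and method names are mine):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class FlatMapExamples {
    // For String[]: flatten an array of arrays with Arrays::stream
    static List<String> flattenArrays(String[][] arrays) {
        return Stream.of(arrays).flatMap(Arrays::stream).collect(Collectors.toList());
    }

    // For List: flatten a List of Lists with List::stream
    static List<Integer> flattenLists(List<List<Integer>> nested) {
        return nested.stream().flatMap(List::stream).collect(Collectors.toList());
    }

    // For primitives: flatMapToInt yields an IntStream, avoiding boxing
    static int sumAll(List<int[]> rows) {
        return rows.stream().flatMapToInt(IntStream::of).sum();
    }

    public static void main(String[] args) {
        System.out.println(flattenArrays(new String[][]{{"a", "b"}, {"c"}})); // [a, b, c]
        System.out.println(flattenLists(Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3)))); // [1, 2, 3]
        System.out.println(sumAll(Arrays.asList(new int[]{1, 2}, new int[]{3}))); // 6
    }
}
```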
Here is the original article

Saturday, December 24, 2016

Xpath find element by its text

exact match (the div element and the text below are illustrative)

    //div[text()='some exact text']

contains

    //div[contains(text(), 'partial text')]

regular expression (matches() requires XPath 2.0, which browser devtools do not support)

    //div[matches(text(), '^some.*pattern$')]

Thursday, December 22, 2016

How to verify xpath in chrome

Sometimes, I need to verify that an XPath can indeed find the element I want.
Chrome
    F12 -> development mode
    Ctrl + f
    type in the xpath in the search text box
if anything is matched, it will be highlighted.

Or on the console, type
$x("your_xpath")

Friday, December 2, 2016

git ignore all except specific folders

I am working on a Logstash pipeline.conf and changing it many times, so it is good to keep the change history.
I put pipeline.conf under myConfig directory, put my customized patterns under myPatterns directory and put my own elasticsearch template under myTemplate.
Everything under my* directory is the one I want to add to git and ignore everything else.

Here is my .gitignore
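A sketch of such a .gitignore, whitelisting only the my* directories (assuming they sit at the repository root):

```
/*
!/.gitignore
!/myConfig/
!/myPatterns/
!/myTemplate/
```

The `/*` pattern ignores every top-level entry; the `!` patterns then un-ignore the listed directories, so everything inside them is tracked normally.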

Wednesday, November 30, 2016

Application level authentication of web service deployed on weblogic.

If you deploy a web service on WebLogic and you want to do the authentication at the application level, not using the WebLogic realm, you probably saw this error:

The request requires user authentication. The response MUST include a WWW-Authenticate header field (section 14.46) containing a challenge applicable to the requested resource.

That is caused by WebLogic. If you want to do the authentication at the application level, not using the WebLogic realm, you need to add this in WebLogic's config.xml.
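The usual fix is the enforce-valid-basic-auth-credentials flag; a sketch of the relevant fragment of config.xml (the surrounding elements are abbreviated):

```xml
<security-configuration>
    <!-- Let the application, not WebLogic, validate the Authorization header -->
    <enforce-valid-basic-auth-credentials>false</enforce-valid-basic-auth-credentials>
</security-configuration>
```

Restart the server after editing config.xml for the change to take effect.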



Sunday, November 13, 2016

Hive installation

Hive is based on Hadoop. Before you install and run Hive, make sure Hadoop is up and running.

  • Download and unpack it
  • Add Hive to the system path by opening /etc/profile or ~/.bashrc and adding the following two lines
    • export HIVE_HOME=/home/yao/mysoft/apache-hive-2.1.0-bin
    • export PATH=$PATH:$HIVE_HOME/bin:$HIVE_HOME/conf
  • Enable the settings by executing this command
    • source /etc/profile 
  • Create the configuration files
    • cd conf
    • cp hive-default.xml.template hive-site.xml
    • cp hive-env.sh.template hive-env.sh
    • cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties
    • cp hive-log4j2.properties.template hive-log4j2.properties
  • Modify the configuration file hive-site.xml
    • replace <value>${system:java.io.tmpdir}/${system:user.name}</value>  with <value>$HIVE_HOME/iotmp</value>
    • replace <value>${system:java.io.tmpdir}/${hive.session.id}_resources</value> with <value>$HIVE_HOME/iotmp</value>
    • replace <value>${system:java.io.tmpdir}/${system:user.name}</value> with <value>$HIVE_HOME/iotmp</value>
    • you may need create the directory iotmp
  • Modify hive-env.sh
    • add these two
    • export HADOOP_HOME=/home/yao/mysoft/hadoop-2.7.3
    • export HIVE_CONF_DIR=/home/yao/mysoft/apache-hive-2.1.0-bin/conf
  • Make sure Hadoop is running
  • Run Hive
    •  $HIVE_HOME/bin/hiveserver2
  • Run beeline  
    • $HIVE_HOME/bin/beeline -u jdbc:hive2://

Saturday, October 29, 2016

linux commands

find which process listens on a port

    lsof -i:9000

make directory and its sub-directories

    mkdir -p mydata/hdfs/namenode

find files containing the string

    grep -Ril "text-to-find-here" /
  • i stands for ignore case (optional in your case).
  • R stands for recursive.
  • l stands for "show the file name, not the result itself".
  • / stands for starting at the root of your machine. 

Extract matched text 

    pcregrep -Moh '<PubSubError id=(\n|.)*?</PubSubError>' pubsubPusherWeb-1_1.log.2016-11-* > /home/x194594/error02.xml
  • M apply the regular expression to multiple lines
  • o show only the part of the line that matched
  • h suppresses the filename prefix on the output; by default the filename is printed when multiple files are searched.
The regular expression '<PubSubError id=(\n|.)*?</PubSubError>'
  • start with <PubSubError id=
  • end with </PubSubError>
  • between is (\n|.)*, which means either new line or any character, * means multiple times.
  • ? makes the match non-greedy, so it stops at the first closing </PubSubError>

Grep regular expression non-greedy

grep -oP '200:{_messageid=.*?,' pubsubPubSvc-1_0.log
  • P to use the perl syntax

Grep sort uniq and count

grep -oP '200:{_messageid=.*?,' pubsubPubSvc-1_0.log | sort | uniq | grep -c '_messageid'
  • | sort, sorts the result
  • | uniq, collapses adjacent duplicate lines into one (that is why sort comes first)
  • -c counts the matching lines
This command gets the same result:
grep -oP '200:{_messageid=.*?,' pubsubPubSvc-1_0.log | sort | uniq | wc -l
  • wc -l  count line
If want to do something like SQL group by
    grep -oP '\-\[.*?]-T-1 INFO ]- Successfully saved error:' weblogicLog/pubsubPusherWeb-1_1.log | sort | uniq -c
uniq -c prefixes each distinct line with its count, like a SQL GROUP BY with COUNT.

Grep to display line number

    grep -n ...

Grep to display last n  lines

    grep pattern file | tail -n 5

Grep display first n lines

    grep -m 5 pattern file

Grep in specific files

    grep -r --include="*Spec.js" "your search text" searchDir/
or
    find searchDir -name "*Spec.js" | xargs grep "your search text"
xargs here passes the result of "find" as arguments to grep.

Grep display matched result colorfully

    grep --color "your search text" target

zgrep to grep from gz file 

    use zgrep to grep a .gz file; the rest is the same as grep.
    Similarly, use zcat on a .gz file like cat on a normal file.

Link directory or file

    ln -s targetDirectoryOrFile link
to remove the link
    unlink link

View CPU memory usage

    top
The column RES means physical memory used. The other columns are easy to understand.

View disk IO

    sar
or only care average
    sar | grep 'Average'

View disk usage for the files in specific folder

     du -sh * | sort -h

View disk space available 

    df -h .
or
    df -h

Split big files 

split --bytes=500M targetFile

Open file count

/usr/sbin/lsof |grep infra |wc -l


Thursday, October 13, 2016

Reverse an array without affecting special characters

Input:   str = "a,b$c"
Output:  str = "c,b$a"
Note that $ and , are not moved anywhere. 
Only subsequence "abc" is reversed

Input:   str = "Ab,c,de!$"
Output:  str = "ed,c,bA!$"

Input string: a!!!b.c.d,e'f,ghi
Output string: i!!!h.g.f,e'd,cba

Here is the algorithm
1) Let input string be 'str[]' and length of string be 'n'
2) l = 0, r = n-1
3) While l is smaller than r, do following
    a) If str[l] is not an alphabetic character, do l++
    b) Else If str[r] is not an alphabetic character, do r--
    c) Else (both left and right are alphabetic) swap str[l] and str[r], then do l++ and r--

Here is the javascript implementation:
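A sketch of that two-pointer algorithm (the function name is mine):

```javascript
// Reverse only the alphabetic characters; every other character stays in place.
function reverseAlpha(str) {
  const chars = str.split('');
  const isAlpha = c => /[a-zA-Z]/.test(c);
  let l = 0, r = chars.length - 1;
  while (l < r) {
    if (!isAlpha(chars[l])) {
      l++;                      // skip non-alphabetic on the left
    } else if (!isAlpha(chars[r])) {
      r--;                      // skip non-alphabetic on the right
    } else {
      [chars[l], chars[r]] = [chars[r], chars[l]]; // both alphabetic: swap
      l++;
      r--;
    }
  }
  return chars.join('');
}

console.log(reverseAlpha('a,b$c'));     // c,b$a
console.log(reverseAlpha('Ab,c,de!$')); // ed,c,bA!$
```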


Balanced parentheses

Here is the javascript implementation
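A stack-based sketch (the function name is mine): push opening brackets, pop on each closing bracket, and check that they pair up.

```javascript
// A string is balanced when every bracket closes in the right order.
function isBalanced(str) {
  const pairs = { ')': '(', ']': '[', '}': '{' };
  const stack = [];
  for (const ch of str) {
    if (ch === '(' || ch === '[' || ch === '{') {
      stack.push(ch);
    } else if (ch in pairs) {
      // closing bracket must match the most recent opening bracket
      if (stack.pop() !== pairs[ch]) return false;
    }
  }
  return stack.length === 0; // nothing left unclosed
}

console.log(isBalanced('{[()]}')); // true
console.log(isBalanced('([)]'));   // false
```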

Wednesday, October 12, 2016

Sorted Linked List to Balanced Binary Tree

Here is the javascript implementation
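A sketch that assumes the sorted values have first been copied into an array (converting the linked list to an array is a straightforward first pass); the middle element becomes the root and the two halves become the subtrees:

```javascript
// Build a height-balanced BST from a sorted array by recursing on the middle.
function sortedArrayToBST(arr) {
  if (arr.length === 0) return null;
  const mid = Math.floor(arr.length / 2);
  return {
    value: arr[mid],
    left: sortedArrayToBST(arr.slice(0, mid)),   // smaller half
    right: sortedArrayToBST(arr.slice(mid + 1))  // larger half
  };
}

const tree = sortedArrayToBST([1, 2, 3, 4, 5, 6, 7]);
console.log(tree.value);       // 4
console.log(tree.left.value);  // 2
console.log(tree.right.value); // 6
```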

Minimum Depth of a Binary Tree (Level Order Traversal)

Here is the javascript implementation:
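A level-order (BFS) sketch, assuming nodes are plain objects with left/right children:

```javascript
// BFS level by level: the first leaf we reach is at the minimum depth.
function minDepth(root) {
  if (!root) return 0;
  let queue = [root];
  let depth = 1;
  while (queue.length > 0) {
    const next = [];
    for (const node of queue) {
      // first node with no children: this level is the minimum depth
      if (!node.left && !node.right) return depth;
      if (node.left) next.push(node.left);
      if (node.right) next.push(node.right);
    }
    queue = next;
    depth++;
  }
}

// left subtree ends in a leaf at depth 2, right subtree goes deeper
const leaf = { left: null, right: null };
const root = { left: leaf, right: { left: { left: null, right: null }, right: null } };
console.log(minDepth(root)); // 2
```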


It actually goes through each level. When it reaches the first leaf node, it returns the depth.

The code can be changed a little to return the minimum path, like 'root'-2-5.


Another question can use a similar algorithm: add a node into a height-balanced tree while keeping it height-balanced.
A binary tree is height balanced if the depth difference between any two leaves is at most 1.
The algorithm here is to traverse each level; when the first empty child position is found, that is where to add the new node.

Another balanced tree is weight balanced tree. A tree is weight balanced if the diff between total nodes of the left subtree and the right subtree is at most 1.
If one more node need to be added, for height balanced tree, you can add it to right of b, or left/right of d.
But for weight balanced tree, you can only add it to left/right of d.

Tuesday, October 11, 2016

quick sort and its time complexity

Here is the javascript implementation.
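A sketch consistent with the trace below (Lomuto-style, with the pivot taken as the first element of the range; function names are mine):

```javascript
function swap(arr, i, j) {
  [arr[i], arr[j]] = [arr[j], arr[i]];
}

// Move everything smaller than the pivot to the front of the range,
// then drop the pivot between the two halves; return its final index.
function partition(arr, pivot, right) {
  const pivotValue = arr[pivot];
  let index = pivot + 1; // next slot for a value smaller than the pivot
  for (let i = pivot + 1; i <= right; i++) {
    if (arr[i] < pivotValue) {
      swap(arr, i, index);
      index++;
    }
  }
  swap(arr, pivot, index - 1); // put the pivot into its final position
  return index - 1;
}

function quickSort(arr, left = 0, right = arr.length - 1) {
  if (left < right) {
    const p = partition(arr, left, right);
    quickSort(arr, left, p - 1);  // sort values smaller than the pivot
    quickSort(arr, p + 1, right); // sort values greater than the pivot
  }
  return arr;
}

console.log(quickSort([4, 8, 7, 1, 3, 5, 6, 2])); // [1, 2, 3, 4, 5, 6, 7, 8]
```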
The first call of the partition function is partition([4,8,7,1,3,5,6,2], 0, 7).
index is the position to swap into whenever a number less than 4 (the first item, the pivot) is found

  • compare 8 with 4, 8>4, no change
  • compare 7 with 4, 7>4, no change
  • compare 1 with 4, 1<4, swap 1 with 8 (put the less number behind 4) and index++ (count the number less than 4)
  • now the array is 4,1,7,8,3,5,6,2
  • compare 3 with 4, 3<4, swap 3 with 7 and index++
  • now the array is 4,1,3,8,7,5,6,2
  • compare 5 with 4, 5>4, no change
  • compare 6 with 4, 6>4, no change
  • compare 2 with 4, 2<4, swap 2 with 8 and index++
  • now the array is 4,1,3,2,7,5,6,8
  • it executes swap(arr, pivot, index - 1); which swaps 4 with 2.
  • now the array is 2,1,3,4,7,5,6,8

As it reaches the end of the array, the partition method is done; it returns 3, which is the index of number 4 in the current array.
We can see that to the left of 4 all numbers are less than 4, and to the right of 4 all numbers are greater than 4.
Then it will do recursive quickSort on left and right.

The worst time complexity is O(n²)

cn + c(n−1) + c(n−2) + ⋯ + 2c = c(n + (n−1) + (n−2) + ⋯ + 2) = c((n+1)(n/2) − 1) = c((n² + n)/2 − 1) ==> O(n²)

The best time complexity is O(nlogn).
Quicksort's best case occurs when the partitions are as evenly balanced as possible: their sizes either are equal or are within 1 of each other.

Sunday, October 9, 2016

merge sort and its time complexity

Here is the javascript implementation.
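A sketch of the divide-and-merge scheme described below (function names are mine):

```javascript
// Merge sort: split in half, sort each half recursively, then merge.
function mergeSort(arr) {
  if (arr.length <= 1) return arr;
  const mid = Math.floor(arr.length / 2);
  return merge(mergeSort(arr.slice(0, mid)), mergeSort(arr.slice(mid)));
}

function merge(left, right) {
  const result = [];
  while (left.length && right.length) {
    // pop the smaller head into the result
    result.push(left[0] <= right[0] ? left.shift() : right.shift());
  }
  // one side is empty; append whatever remains on the other side
  return result.concat(left, right);
}

console.log(mergeSort([5, 2, 8, 1, 9, 3])); // [1, 2, 3, 5, 8, 9]
```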


The time complexity is O(nlogn).

It needs to divide all elements into two groups repeatedly, until one group has only 2 elements (or 3 elements).
Sort within each group.
Merge the two groups: compare the first element of each group and pop the smaller one into the result array; repeat, and when one group has no items left, append the values of the other group to the end of the result array.
For each stage, the compare time is about n/2.
How many stages are there? Divide all elements by 2 until the result is 2, so it is log2 n.
So the time complexity is O(n/2 * log2 n) ==> O(nlogn)

The space complexity (memory usage) is O(n).