Installation:
install Elasticsearch, then check that it responds:
curl -XGET http://localhost:9200/
install Kibana
change config/kibana.yml, uncomment elasticsearch.url: "http://localhost:9200"
http://localhost:5601
install sense plugin
kibana plugin --install elastic/sense
run kibana
http://localhost:5601/app/sense
Elasticsearch concepts
index - a collection of documents (think of it as a database)
type - represents a class/category of similar documents, e.g. "user" (think of it as a table)
mapping - similar to the database schema for a table in an RDBMS
includes the data type for each field, e.g. string, integer
also includes information on how fields should be indexed and stored by Lucene
document - a basic unit of information that can be indexed (think of it as a row in a table); it consists of fields (think of them as columns), which are key/value pairs
SHARDS
an index can be divided into multiple shards when one machine cannot store all of the index's data
a shard can be stored on any node in the cluster
REPLICAS
a replica is a copy of a shard; it never resides on the same node as the original shard
Elastic search
To rank documents for a query, a score is calculated for each document that matches a query. The higher the score, the more relevant the document is to the search query.
Queries in query context affect the scores of matching documents.
- The queries in query context answer the question: "How well does the document match?"
Queries in filter context do not affect the scores of matching documents.
- The queries in filter context answer the question: "Does the document match?"
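As a minimal sketch of the two contexts, the request body below puts a full-text clause in query context (`must`) and an exact-match clause in filter context (`filter`) of a `bool` query. The field names `name` and `status` are borrowed from the ecommerce examples in these notes and are assumptions about the data. The body is built as a Python dict only so that it can be printed as JSON and pasted into Sense; nothing is sent to a server.

```python
import json

def build_search_body(text, status):
    """Build a bool query: `must` is scored, `filter` only includes or excludes."""
    return {
        "query": {
            "bool": {
                # Query context: contributes to _score ("how well does it match?")
                "must": [{"match": {"name": text}}],
                # Filter context: yes/no match, no effect on _score
                "filter": [{"term": {"status": status}}],
            }
        }
    }

# "name" and "status" are assumed field names, not taken from a real mapping
body = build_search_body("pasta", "active")
print(json.dumps(body, indent=2))
```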
Query String
Query DSL - for complex and advanced queries
https://www.youtube.com/watch?v=ybu8XwbwXCQ
Leaf queries
Look for particular values in particular fields.
Compound queries
wrap leaf clauses or even other compound query clauses
Full Text queries
running full text queries on full text fields.
Term level queries
Used for exact matching of values, usually for structured data like numbers or dates, e.g. finding people born between 1980 and 2000
Joining queries
performing joins in distributed system is expensive
Elastic provides Nested Query
has_child query returns parent documents whose child documents match the query
has_parent query returns child documents whose parent document matches the query
Geo Queries
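Two of the query kinds listed above can be sketched as request bodies: a term-level `range` query for the "born between 1980 and 2000" example, and a `has_child` joining query. The field name `born_year`, the child type `review` and the field `rating` are hypothetical and only there for illustration.

```python
import json

def born_between(first_year, last_year):
    """Term-level range query; `born_year` is a hypothetical numeric field."""
    return {
        "query": {
            "range": {"born_year": {"gte": first_year, "lte": last_year}}
        }
    }

def parents_with_matching_child(child_type, child_query):
    """Joining query: return parent documents whose children match child_query."""
    return {
        "query": {
            "has_child": {"type": child_type, "query": child_query}
        }
    }

range_body = born_between(1980, 2000)
# "review" and "rating" are hypothetical names for a child type and its field
join_body = parents_with_matching_child("review", {"term": {"rating": 5}})

print(json.dumps(range_body))
print(json.dumps(join_body))
```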
commands typed in Sense
GET /ecommerce/product/_search?q=name:(pasta AND spaghetti)
The parts of this request are: index (ecommerce), type (product), API (_search), query string (q=...)
GET /ecommerce/product/_search?q=(name:(pasta AND spaghetti) AND status:active)
GET /ecommerce/product/_search?q=name:(+pasta -spaghetti) // must include pasta, must not include spaghetti
GET /ecommerce/product/_search?q=name:(pasta spaghetti) // without "", it equals pasta OR spaghetti
GET /ecommerce/product/_search?q=name:"pasta spaghetti" // a phrase: equals pasta AND spaghetti, and the order matters. The field is analysed, so a value such as "pasta - spaghetti" is still found even though it is not a 100% match.
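Sense accepts these query strings as typed, but outside Sense (in a script or a browser address bar) the spaces, quotes and parentheses have to be URL-encoded. A small sketch using only the Python standard library; the host and the `search_url` helper are assumptions for illustration.

```python
from urllib.parse import urlencode, quote, urlsplit, parse_qs

def search_url(index, doc_type, query, host="http://localhost:9200"):
    """Build a _search URL with the query string safely percent-encoded."""
    # quote_via=quote encodes spaces as %20 instead of "+"
    return f"{host}/{index}/{doc_type}/_search?" + urlencode({"q": query}, quote_via=quote)

phrase_url = search_url("ecommerce", "product", 'name:"pasta spaghetti"')
bool_url = search_url("ecommerce", "product", "name:(pasta AND spaghetti)")

print(phrase_url)
print(bool_url)
```

Decoding the URL again with `parse_qs` gives back the original query string, which is a quick way to check that nothing was lost in the encoding.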
Types of aggregations
Metric
Bucket
Pipeline
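To make the three kinds concrete, here is a sketch of one request body that uses each of them: a `terms` bucket aggregation, an `avg` metric aggregation nested inside each bucket, and a `max_bucket` pipeline aggregation that reads the output of the other two. The field names `status` and `price` and the aggregation names are assumptions.

```python
import json

def status_price_aggs():
    """One bucket, one metric and one pipeline aggregation in a single body."""
    return {
        "size": 0,  # return only aggregation results, no hits
        "aggs": {
            # Bucket: group documents by the (assumed) status field
            "by_status": {
                "terms": {"field": "status"},
                # Metric: average of the (assumed) price field per bucket
                "aggs": {"avg_price": {"avg": {"field": "price"}}},
            },
            # Pipeline: works on the output of other aggregations
            "highest_avg_price": {
                "max_bucket": {"buckets_path": "by_status>avg_price"}
            },
        },
    }

aggs_body = status_price_aggs()
print(json.dumps(aggs_body, indent=2))
```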
useful queries
list all indices
http://localhost:9200/_cat/indices?v
one example query
http://localhost:9200/logstash-2016.11.21/_search?pretty&q=response:200
query on subfields
http://localhost:9200/mytest/_search?pretty&q=metadata._offset:3068837
_source_exclude/_source_include
http://localhost:9200/simpsons/episode/1/_source?_source_exclude=video_url
http://localhost:9200/simpsons/episode/1/_source?_source_include=title
reindex data
Reindexing copies documents from one index to another. I already had an index with documents and found that I wanted to change the analyzer on one field. Because the analyzer of an existing field cannot be changed, I created a new index with the new analyzer and reindexed from the old index into it, so that the new index has all the documents of the old index.
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
You can limit the documents that are copied by adding a type or a query.
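A sketch of the body for `POST /_reindex`, including the optional type and query that limit what is copied. The index names and the query are hypothetical, and the `type` entry only applies to Elasticsearch versions that still have mapping types.

```python
import json

def reindex_body(source_index, dest_index, doc_type=None, query=None):
    """Build the body for POST /_reindex; type and query narrow the source."""
    source = {"index": source_index}
    if doc_type is not None:
        source["type"] = doc_type      # copy only documents of this type
    if query is not None:
        source["query"] = query        # copy only documents matching this query
    return {"source": source, "dest": {"index": dest_index}}

# "simpsons_v2" and the term query are hypothetical examples
full_copy = reindex_body("simpsons", "simpsons_v2")
partial_copy = reindex_body(
    "simpsons", "simpsons_v2",
    doc_type="episode",
    query={"term": {"raw_character_text": "homer"}},
)

print(json.dumps(full_copy))
print(json.dumps(partial_copy))
```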
delete documents by query
https://www.elastic.co/guide/en/elasticsearch/reference/5.3/docs-delete-by-query.html
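As a rough sketch, a delete-by-query call is a POST to `<index>/_delete_by_query` with an ordinary query in the body. The helper below only builds the request and sends nothing; the `response: 404` condition is a hypothetical example based on the logstash index used earlier in these notes.

```python
import json

def delete_by_query_request(index, query):
    """Return (method, path, body) for a delete-by-query call; nothing is sent."""
    return "POST", f"/{index}/_delete_by_query", json.dumps({"query": query})

# Hypothetical condition: remove all documents whose response field is 404
method, path, body = delete_by_query_request(
    "logstash-2016.11.21", {"term": {"response": 404}}
)
print(method, path)
print(body)
```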
increase index queue size
When doing bulk indexing, I got the error
rejected execution (queue capacity 200)
To increase the index queue, one way is to put the thread pool queue_size setting in elasticsearch.yml and restart the server.
Or send a PUT request to persist it
http://stackoverflow.com/questions/33110310/increasing-the-size-of-the-queue-in-elasticsearch
A better way is not to increase queue_size but to send the documents in bulk actions, so that many documents share one request.
https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/api-reference.html#api-bulk
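The linked reference is for the JavaScript client, but the body of a bulk request is the same for every client: newline-delimited JSON with one action line followed by one source line per document, and a trailing newline. A sketch in Python that builds such a body and splits a large list into batches; the batch size and the sample documents are assumptions.

```python
import json

def bulk_body(index, doc_type, docs):
    """Build an NDJSON bulk body from (id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        # Action line: what to do and where
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        # Source line: the document itself
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk API requires a final newline

def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Hypothetical documents; each batch would be sent as one POST /_bulk request
docs = [(i, {"title": f"episode {i}"}) for i in range(1, 6)]
bodies = [bulk_body("simpsons", "episode", batch) for batch in batches(docs, 2)]
print(bodies[0])
```

Sending a few moderately sized batches one after another puts far fewer entries into the queue than sending every document as its own request.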
Increase query timeout
For a timeout error when using elasticsearch.js, requestTimeout can be configured when initializing the elasticsearch.Client
View tokens by analyzer
To view how the text is tokenized by the analyzer, use the _analyze API
view settings
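The two inspection requests named here, analyzer tokens and index settings, can be sketched as follows. The helpers only describe the requests as (method, path, body) and send nothing; the index name is taken from the other examples in these notes, and the analyzer and sample text are assumptions.

```python
import json

def analyze_request(index, analyzer, text):
    """Request that shows the tokens an analyzer produces for a text."""
    body = {"analyzer": analyzer, "text": text}
    return "POST", f"/{index}/_analyze", json.dumps(body)

def settings_request(index):
    """Request that returns the settings of an index."""
    return "GET", f"/{index}/_settings", None

# "standard" and the sample text are assumptions for illustration
analyze = analyze_request("simpsons", "standard", "Expecting the expected")
settings = settings_request("simpsons")

for method, path, body in (analyze, settings):
    print(method, path, body or "")
```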
view mapping
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-get-mapping.html
Put mapping
If an index already exists, use this to add more mappings
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html
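A sketch of a put-mapping request that adds a field to an existing index. It assumes an Elasticsearch version that still has mapping types (the type appears in the path), and the field name `air_date` is hypothetical. The helper only builds the request.

```python
import json

def put_mapping_request(index, doc_type, properties):
    """Return (method, path, body) that adds field mappings to an existing type."""
    return "PUT", f"/{index}/_mapping/{doc_type}", json.dumps({"properties": properties})

# "air_date" is a hypothetical new field
method, path, body = put_mapping_request(
    "simpsons", "episode", {"air_date": {"type": "date"}}
)
print(method, path)
print(body)
```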
A mapping cannot be deleted on its own; it is only deleted when the index is deleted.
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-delete-mapping.html
Alias
An alias can be a subset of the index, like a view in a relational database
POST /_aliases
{
  "actions" : [
    {
      "add" : {
        "index" : "simpsons",
        "alias" : "homer",
        "filter" : {
          "term" : {"raw_character_text" : "homer"}
        }
      }
    }
  ]
}
close/open indices
Closed indices consume almost no cluster resources apart from disk space. For historical data, you can open an index only when you really need it.
POST http://localhost:9200/simpsons/_close
POST http://localhost:9200/simpsons/_open
_cat api
localhost:9200/_cat lists all the available _cat commands
update
curl -XPOST -d'{"doc":{"views":1001, "tags":["elasticsearch"]}}' localhost:9200/myIndex/myType/3/_update
The command above updates the document directly.
The command below runs a script to update the document.
curl -XPOST -d'{"script":"ctx._source.views += 1"}' localhost:9200/myIndex/myType/3/_update
Updating a document directly performs better than updating it by running a script.
To handle competing concurrent requests, use retry_on_conflict. For an update, Elasticsearch gets the document and merges the changes. When it writes the result back into the index and the _version is no longer the same, another process has updated the document in the meantime. The parameter retry_on_conflict lets Elasticsearch repeat the steps above up to the given number of times.
curl -XPOST -d'{"script":"ctx._source.views += 1"}' 'localhost:9200/myIndex/myType/3/_update?retry_on_conflict=5'
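The get, merge, write and retry cycle can be illustrated with a tiny in-memory model. This is only a simulation of the idea and not how Elasticsearch is implemented; `TinyIndex`, `update` and the `on_attempt` hook (used to inject a competing writer) are all made up for the illustration.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale document version."""

class TinyIndex:
    """In-memory stand-in for an index: doc id -> (version, source)."""
    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        version, source = self._docs[doc_id]
        return version, dict(source)

    def put(self, doc_id, source, expected_version=None):
        current = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current:
            raise VersionConflict(f"expected v{expected_version}, found v{current}")
        self._docs[doc_id] = (current + 1, dict(source))
        return current + 1

def update(index, doc_id, change, retry_on_conflict=0, on_attempt=None):
    """Get the document, apply the change, write it back; retry on conflict."""
    attempt = 0
    while True:
        version, source = index.get(doc_id)
        change(source)
        if on_attempt is not None:
            on_attempt(attempt)  # test hook: lets another writer sneak in
        try:
            return index.put(doc_id, source, expected_version=version)
        except VersionConflict:
            if attempt >= retry_on_conflict:
                raise
            attempt += 1

def add_view(source):
    source["views"] += 1

index = TinyIndex()
index.put(3, {"views": 1000})

def rival_writer(attempt):
    # A competing process updates the same document during the first attempt
    if attempt == 0:
        version, source = index.get(3)
        add_view(source)
        index.put(3, source, expected_version=version)

final_version = update(index, 3, add_view, retry_on_conflict=5, on_attempt=rival_writer)
print(final_version, index.get(3)[1])
```

With `retry_on_conflict=0` the same scenario raises `VersionConflict` instead of retrying, which corresponds to the conflict error a client would see without the parameter.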
op_type=create
curl -i -XPUT -d '{"title":"Error handling in elasticsearch"}' 'localhost:9200/myIndex/myType/3?op_type=create'
If a document with id 3 already exists, you will get the error "document already exists". Otherwise it will be saved successfully.
Alternative
curl -i -XPUT -d '{"title":"Error handling in elasticsearch"}' localhost:9200/myIndex/myType/3/_create
If the document already exists, you get HTTP 409. If it does not exist, you get HTTP 201.
Stop words for the same root
Here is an article about stopwords; the stopwords are filtered before the stemmer filter is applied.
https://www.peterbe.com/plog/elasticsearch-snowball-analyzer-and-stopwords
I need to stop all the words with the same root; for example, I want to stop expect, expected and expecting, as they all share the same root. Using the stopwords option of the stemmer analyzer does not work, because the stopwords filter is applied before stemming. I actually need the stopwords filter applied after the stemming. To do that, I need to fully customize the analyzer so that the my_stop filter comes after "porter_stem".
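A sketch of what such index settings could look like, written as a Python dict that prints as JSON. Only the filter names `my_stop` and `porter_stem` and their order come from the notes; the analyzer name `my_analyzer`, the `standard` tokenizer and the `lowercase` filter are assumptions. Because `my_stop` runs after stemming, the stopword list holds the stemmed form `expect`, which is what expected and expecting are reduced to.

```python
import json

def stop_after_stem_settings(stemmed_stopwords):
    """Index settings whose analyzer removes stopwords after stemming."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Stop filter fed with stems, not with the original words
                    "my_stop": {"type": "stop", "stopwords": list(stemmed_stopwords)}
                },
                "analyzer": {
                    # "my_analyzer" is a hypothetical name
                    "my_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Order matters: stem first, then drop the stems
                        "filter": ["lowercase", "porter_stem", "my_stop"],
                    }
                },
            }
        }
    }

settings_body = stop_after_stem_settings(["expect"])
print(json.dumps(settings_body, indent=2))
```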
stemmer_override
The stem can be overridden. The rules can be embedded or loaded from a file. The path is either relative to the config location or absolute.
A sample file is like this
https://simpsora.wordpress.com/2014/05/02/customizing-elasticsearch-english-analyzer/
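Both ways of supplying the rules can be sketched as filter definitions: `rules` for embedded rules and `rules_path` for a file. The rule text and the file path below are hypothetical; each rule maps one or more words to the stem that should be used instead.

```python
import json

def stemmer_override_filter(rules=None, rules_path=None):
    """Build a stemmer_override filter from embedded rules or from a rules file."""
    if (rules is None) == (rules_path is None):
        raise ValueError("give exactly one of rules or rules_path")
    definition = {"type": "stemmer_override"}
    if rules is not None:
        definition["rules"] = list(rules)
    else:
        definition["rules_path"] = rules_path
    return definition

# Hypothetical rules and path, for illustration only
embedded = stemmer_override_filter(rules=["running, runs => run", "stemmer => stemmer"])
from_file = stemmer_override_filter(rules_path="analysis/stemmer_override.txt")

print(json.dumps(embedded))
print(json.dumps(from_file))
```

In the analyzer's filter list, a filter like this would normally be placed before the stemming filter so that the overridden words are protected from further stemming.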
Random documents
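One way to sketch a random-documents request is a `function_score` query with `random_score`. The seed value and size are arbitrary, and newer Elasticsearch versions may also expect a `field` next to the seed, so treat this as an outline rather than the exact query from these notes.

```python
import json

def random_documents_body(seed, size=10):
    """Score every document randomly; the same seed gives the same order."""
    return {
        "size": size,
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "random_score": {"seed": seed},
            }
        },
    }

random_body = random_documents_body(seed=42, size=5)
print(json.dumps(random_body, indent=2))
```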
The query to get random documents takes a seed; the returned documents are based on the seed.
relationships and join query
http://detailfocused.blogspot.ca/2017/04/elasticsearch-join-query.html
_update_by_query
http://detailfocused.blogspot.ca/2017/05/elasticsearch-updatebyquery.html
elasticsearch query
http://detailfocused.blogspot.ca/2017/05/elasticsearch-query.html
elasticsearch distinct value and search
http://detailfocused.blogspot.ca/2017/05/elasticsearch-distinct-value-and-search.html
- to be continued -