Thursday, January 17, 2013

Elastic Search Survival Kit



  • Each Lucene segment has its own cache, so indexing does not hurt search performance much
  • By default every node is master-eligible; every node indexes and every node searches
  • Kibana dev tools console: http://127.0.0.1:5601/app/kibana#/dev_tools
  • To access Kibana from a remote machine, change kibana.yml:
    • server.host: "0.0.0.0"

-----  create an index with a custom analyzer
PUT /fb
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer" : "whitespace",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  },
  "mappings": {
    "pages": {
      "properties": {
        "about": {
          "type":     "text",
          "fielddata": true,
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}
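
To see what "my_analyzer" does to the "about" field, here is a rough Python emulation of the chain (whitespace tokenizer, then lowercase, then stop). It is only a sketch, not Elasticsearch code: the function name is made up, and it assumes the "stop" filter is left on its default English stop word list.

```python
# Rough emulation of "my_analyzer": whitespace tokenizer, then the
# "lowercase" and "stop" token filters. Illustration only, not ES code.

# Assumption: the "stop" filter uses its default English stop word list.
ENGLISH_STOP_WORDS = frozenset(
    "a an and are as at be but by for if in into is it no not of on or "
    "such that the their then there these they this to was will with".split()
)


def my_analyzer(text):
    """Return the tokens the analyzer would roughly produce for `text`."""
    tokens = text.split()                  # whitespace tokenizer
    tokens = [t.lower() for t in tokens]   # lowercase filter
    # stop filter: drop tokens that are in the stop word list
    return [t for t in tokens if t not in ENGLISH_STOP_WORDS]


if __name__ == "__main__":
    print(my_analyzer("The Official Page of the Band"))
```

Note that the whitespace tokenizer keeps punctuation glued to the words ("Rock," stays "rock,"), which then shows up as separate terms in the facets.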

-----  term facets (terms aggregation)
GET /fb/pages/_search
{
  "size": 1,
  "aggs" : {
    "group_by_text_term" : {
      "terms" : {
        "field" : "about",
        "size":30
        }
    }
  }
}
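
The interesting part of the answer is under "aggregations", one bucket per term with a "key" and a "doc_count" (number of documents containing the term, not number of occurrences). A small sketch for reading it from Python; the helper name and the sample response are made up for illustration.

```python
def terms_buckets(response, agg_name):
    """Return [(term, doc_count), ...] for one terms aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]


# Made-up sample response, trimmed to the fields used here.
SAMPLE_RESPONSE = {
    "hits": {"total": 3, "hits": []},
    "aggregations": {
        "group_by_text_term": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "music", "doc_count": 2},
                {"key": "band", "doc_count": 1},
            ],
        }
    },
}

if __name__ == "__main__":
    for term, count in terms_buckets(SAMPLE_RESPONSE, "group_by_text_term"):
        print(f"{count:6d}  {term}")
```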

----------- facets on a category (since v5, dynamically mapped string fields are "text" with a ".keyword" sub-field)
GET /fb2/pages/_search
{
  "size": 1,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "group_by_cat": {          
      "terms": {          
        "field": "category.keyword",
        "size": 10
      }
    }
  }
}

------------- delete all docs
POST /fb2/_delete_by_query
{
  "query": {
    "match_all": {}
  }
}
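
The same call from a script, as a sketch with the Python standard library only. The function name is made up, and the base URL assumes a local node on the default port. The request is only built here; sending it is left commented out because it really deletes every document in the index.

```python
import json
import urllib.request

ES_URL = "http://127.0.0.1:9200"  # assumption: local node, default port


def build_delete_all_request(index, base_url=ES_URL):
    """Build (but do not send) a _delete_by_query request matching all docs."""
    body = json.dumps({"query": {"match_all": {}}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_delete_by_query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_delete_all_request("fb2")
    print(req.get_method(), req.full_url)
    # To actually send it (this empties the index):
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```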

------------- http Elastic queries

http://127.0.0.1:9200/_cat/indices

http://localhost:9200/_search?q=lastname:Bond 

http://127.0.0.1:9200/pj-search-index-1-6-test/_search
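
When the value comes from user input, it is safer to let a library escape the q= parameter. A minimal sketch (the helper name is made up; host and port are the defaults used above):

```python
from urllib.parse import urlencode


def uri_search_url(field, value, index=None, base_url="http://localhost:9200"):
    """Build a URI search URL such as /_search?q=lastname:Bond, URL-encoded."""
    path = f"/{index}/_search" if index else "/_search"
    query = urlencode({"q": f"{field}:{value}"})  # escapes ':' and spaces
    return f"{base_url}{path}?{query}"


if __name__ == "__main__":
    print(uri_search_url("lastname", "Bond"))
    print(uri_search_url("lastname", "Bond", index="pj-search-index-1-6-test"))
```

The ":" comes out as %3A, which Elasticsearch decodes back before parsing the query string.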


------------- versions
Elasticsearch 2.4.0, based on Lucene 5.5.2 (08/2016)
(version numbering jumps from 2.4 straight to 5.0)
Elasticsearch 5.0 (11/2016)
Elasticsearch 5.5.1, based on Lucene 6.6.0 (07/2017)
Elasticsearch 5.6.3, based on Lucene 6.6.1 (10/2017)

Elasticsearch 6.0.0 beta (05/2017)   based on Lucene 7.0.0
Elasticsearch 6.7.1 (04/2019)   based on Lucene 7.7.1
Elasticsearch 7.0.0 (04/2019)   based on Lucene 8.0.0
In the git sources, the bundled Lucene version is listed in buildSrc/version.properties:
vi  buildSrc/version.properties
elasticsearch     = 5.6.3
lucene            = 6.6.1


Tuesday, January 8, 2013

Hadoop / Cloudera survival kit

----- Debian
add this to /etc/apt/sources.list:

deb http://archive.cloudera.com/cdh4/debian/squeeze/amd64/cdh/ squeeze-cdh4.1.2 contrib

then you can do:

apt-get update
apt-get install hadoop

and then, things like:

hadoop fs -ls hdfs://192.168.0.135:8020/

----- Ubuntu:
add to /etc/apt/sources.list:

deb [arch=amd64] http://archive.cloudera.com/cdh4/ubuntu/precise/amd64/cdh/ precise-cdh4 contrib
deb-src http://archive.cloudera.com/cdh4/ubuntu/precise/amd64/cdh precise-cdh4 contrib

curl -s http://archive.cloudera.com/cdh4/ubuntu/precise/amd64/cdh/archive.key | sudo apt-key add -
sudo apt-get update
sudo apt-get install hbase-master


----- HBase shell one-liners
[toto@vv182 ~]$ echo "scan 'offers', {LIMIT => 10, STARTROW => 'se|000029098138', ENDROW => 'se|0000291'}" | hbase shell

[toto@vv182 ~]$ echo "get 'offers','fr|000000002138|0000016418701245'" | hbase shell
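
How the scan range works: row keys are sorted byte-wise, STARTROW is inclusive and the end row is exclusive, so the scan above returns the keys from 'se|000029098138' up to (but not including) the first key starting with 'se|0000291'. A small Python sketch of that range logic on an in-memory sorted list; the function name and the sample keys are made up, and plain string comparison stands in for byte comparison (same thing for ASCII keys).

```python
from bisect import bisect_left


def scan(sorted_row_keys, startrow, endrow, limit=None):
    """Emulate an HBase range scan: startrow inclusive, endrow exclusive."""
    lo = bisect_left(sorted_row_keys, startrow)  # first key >= startrow
    hi = bisect_left(sorted_row_keys, endrow)    # first key >= endrow (excluded)
    rows = sorted_row_keys[lo:hi]
    return rows if limit is None else rows[:limit]


# Made-up row keys, already in sorted order as HBase would store them.
ROW_KEYS = [
    "fr|000000002138",
    "se|000029098137",
    "se|000029098138",
    "se|000029099999",
    "se|000029100000",
    "se|000030000000",
]

if __name__ == "__main__":
    for key in scan(ROW_KEYS, "se|000029098138", "se|0000291", limit=10):
        print(key)
```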


[toto@vv182 ~]$ hadoop fs -cat /user/nomad/pipeline/delta_offers/my-file.txt