Mastering Elasticsearch search
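ES bulk data import
ES provides an API called _bulk for performing operations in batches.
Bulk import data
Save the following newline-delimited JSON to a file (called name here). Each action line is followed by its document on the next line, and the file must end with a newline:
    {"index": {"_index": "book", "_type": "_doc", "_id": 1}}
    {"name": "权力的游戏"}
    {"index": {"_index": "book", "_type": "_doc", "_id": 2}}
    {"name": "疯狂的石头"}
POST to _bulk
    curl -X POST "localhost:9200/_bulk" -H "Content-Type: application/json" --data-binary @name
ES term-level queries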
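Introduction
Term-level queries are typically used on structured data such as number, date, and keyword fields rather than on text fields. A full-text query analyzes (tokenizes) the query text before searching, whereas a term-level query looks up the given value exactly as written in the field's inverted index. Term-level queries are therefore generally used on numeric, date, and keyword fields.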
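To make the difference concrete, here is a minimal sketch of a toy inverted index in Python. The analyzer, the sample names, and the helper functions are all assumptions made for illustration; they only imitate how a text field and a keyword field behave and are not how ES is implemented.

```python
import re
from collections import defaultdict

def analyze(text):
    """Toy analyzer: lowercase and split on non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def build_index(docs, analyzed=True):
    """Build an inverted index mapping term -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        terms = analyze(value) if analyzed else [value]
        for term in terms:
            index[term].add(doc_id)
    return index

def term_query(index, value):
    """Term-level lookup: the value is used verbatim, with no analysis."""
    return index.get(value, set())

def match_query(index, text):
    """Full-text lookup: analyze the query text first, then OR the terms."""
    hits = set()
    for term in analyze(text):
        hits |= index.get(term, set())
    return hits

# Hypothetical sample documents.
docs = {1: "LeBron James", 2: "James Harden", 3: "Stephen Curry"}
text_index = build_index(docs, analyzed=True)      # behaves like a text field
keyword_index = build_index(docs, analyzed=False)  # behaves like a keyword field

print(sorted(term_query(text_index, "James")))            # [] (index holds "james")
print(sorted(term_query(text_index, "james")))            # [1, 2]
print(sorted(match_query(text_index, "James")))           # [1, 2]
print(sorted(term_query(keyword_index, "James Harden")))  # [2]
```

The first lookup returns nothing because the indexed term is the lowercased token, which is why term-level queries on text fields often surprise people.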
Preparation
Delete the existing nba index
    DELETE nba
Create the nba index
    PUT nba
    {
      "mappings": {
        "properties": {
          "birthDay": { "type": "date" },
          "birthDayStr": { "type": "keyword" },
          "age": { "type": "integer" },
          "code": { "type": "text" },
          "country": { "type": "text" },
          "countryEn": { "type": "text" },
          "displayAffiliation": { "type": "text" },
          "displayName": { "type": "text" },
          "displayNameEn": { "type": "text" },
          "draft": { "type": "long" },
          "heightValue": { "type": "float" },
          "jerseyNo": { "type": "text" },
          "playYear": { "type": "long" },
          "playerId": { "type": "keyword" },
          "position": { "type": "text" },
          "schoolType": { "type": "text" },
          "teamCity": { "type": "text" },
          "teamCityEn": { "type": "text" },
          "teamConference": { "type": "keyword" },
          "teamConferenceEn": { "type": "keyword" },
          "teamName": { "type": "keyword" },
          "teamNameEn": { "type": "keyword" },
          "weight": { "type": "text" }
        }
      }
    }
Bulk import the data (player file)
Link: https://pan.baidu.com/s/13Uahu1FxKiY6nfRYeY4Myw
Extraction code: t2qb
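If you would rather generate a bulk file than download one, the following sketch shows how a _bulk body can be assembled in Python. The two records are hypothetical sample data, not the contents of the player file, and build_bulk_body is a helper name invented for this example. The action lines leave out _type, which is an assumption that your ES version accepts typeless bulk actions.

```python
import json

def build_bulk_body(index, docs):
    """Return an NDJSON string: one action line plus one source line per doc."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source, ensure_ascii=False))
    # The _bulk endpoint requires the body to end with a newline.
    return "\n".join(lines) + "\n"

# Hypothetical sample records; the real player file has many more fields.
players = [
    (1, {"displayNameEn": "James Harden", "teamNameEn": "Rockets", "age": 30}),
    (2, {"displayNameEn": "LeBron James", "teamNameEn": "Lakers", "age": 35}),
]
body = build_bulk_body("nba", players)
print(body, end="")
```

Writing body to a file and passing it to curl with --data-binary keeps the newlines intact, which is what the _bulk endpoint needs.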
Term query: exact-match lookup (find the players whose jersey number is 23)
    POST nba/_search
    {
      "query": { "term": { "jerseyNo": "23" } }
    }
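The console-style requests in this article can also be sent from a script. The sketch below uses only the Python standard library and assumes the cluster listens on localhost:9200, as in the curl example. The names ES_URL, build_search_request, and search are made up for this illustration.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed: same host and port as the curl example

def build_search_request(index, body):
    """Build (but do not send) a POST <index>/_search request."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def search(index, body):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_search_request(index, body)) as resp:
        return json.loads(resp.read().decode("utf-8"))

term_body = {"query": {"term": {"jerseyNo": "23"}}}
req = build_search_request("nba", term_body)
print(req.get_method(), req.full_url)
# With a running cluster: hits = search("nba", term_body)["hits"]["hits"]
```

Any of the request bodies shown in this article can be passed to search as a Python dict in the same way.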
Exists query: finds documents that have a non-null value in the given field (find the players whose team name is not empty)
    POST nba/_search
    {
      "query": { "exists": { "field": "teamNameEn" } }
    }
Prefix query: finds documents containing a term with the given prefix (find the players whose team name starts with Rock)
    POST nba/_search
    {
      "query": { "prefix": { "teamNameEn": "Rock" } }
    }
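Wildcard query: supports wildcards, where * matches any sequence of characters (including none) and ? matches any single character (find the Rockets players)
    POST nba/_search
    {
      "query": { "wildcard": { "teamNameEn": "Ro*s" } }
    }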
Regexp query: regular-expression matching (find the Rockets players)
    POST nba/_search
    {
      "query": { "regexp": { "teamNameEn": "Ro.*s" } }
    }
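Prefix, wildcard, and regexp queries all test a pattern against whole indexed terms. The sketch below imitates that behavior on a small list of hypothetical teamNameEn values. It relies on Python's fnmatch and re modules, whose syntax only approximates the ES wildcard and regexp syntax, so treat it as an illustration of the matching rules and not as a reference for the pattern language.

```python
import fnmatch
import re

terms = ["Rockets", "Raptors", "Lakers", "Warriors"]  # hypothetical keyword terms

def prefix_match(term, prefix):
    return term.startswith(prefix)

def wildcard_match(term, pattern):
    # fnmatchcase handles * and ? as described above. It also gives [ ] a
    # special meaning that the ES wildcard query does not, so avoid brackets.
    return fnmatch.fnmatchcase(term, pattern)

def regexp_match(term, pattern):
    # The pattern must match the whole term, hence fullmatch and not search.
    return re.fullmatch(pattern, term) is not None

print([t for t in terms if prefix_match(t, "Rock")])    # ['Rockets']
print([t for t in terms if wildcard_match(t, "Ro*s")])  # ['Rockets']
print([t for t in terms if regexp_match(t, "Ro.*s")])   # ['Rockets']
print([t for t in terms if regexp_match(t, "Ro")])      # [] (not a full match)
```

The last line shows why a regexp such as Ro alone finds nothing: the expression has to cover the entire term.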
Ids query (find the players whose ids are 1 and 2)
    POST nba/_search
    {
      "query": { "ids": { "values": [ 1, 2 ] } }
    }
Mastering ES range queries
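A range query finds documents whose value in the given field (a date, number, or string) falls within the given range.
Find the players who have played in the NBA for between 2 and 10 years
    POST nba/_search
    {
      "query": {
        "range": { "playYear": { "gte": 2, "lte": 10 } }
      }
    }
Find the players born between 1980 and 1999
    POST nba/_search
    {
      "query": {
        "range": {
          "birthDay": {
            "gte": "01/01/1980",
            "lte": "31/12/1999",
            "format": "dd/MM/yyyy||yyyy"
          }
        }
      }
    }
Mastering ES bool queries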
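Bool query clauses
    Type        Description
    must        Must appear in matching documents
    filter      Must appear in matching documents, but is not scored
    must_not    Must not appear in matching documents
    should      Should appear in matching documents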
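The four clause types can be pictured with a small in-memory simulation. This is only a sketch: the records are hypothetical, the scoring (one point per matching must or should clause) is a deliberate simplification of real relevance scoring, and bool_query is a helper invented for this example. It assumes the default rule that should clauses are optional whenever a must or filter clause is present.

```python
def bool_query(docs, must=(), filter_=(), must_not=(), should=(),
               minimum_should_match=None):
    """Each clause is a predicate doc -> bool. Returns [(score, doc), ...]."""
    if minimum_should_match is None:
        # should is optional when must/filter exist; otherwise one must match.
        minimum_should_match = 1 if should and not (must or filter_) else 0
    results = []
    for doc in docs:
        if not all(clause(doc) for clause in list(must) + list(filter_)):
            continue
        if any(clause(doc) for clause in must_not):
            continue
        should_hits = sum(1 for clause in should if clause(doc))
        if should_hits < minimum_should_match:
            continue
        # Toy score: filter and must_not never contribute to it.
        results.append((len(must) + should_hits, doc))
    return sorted(results, key=lambda pair: pair[0], reverse=True)

# Hypothetical records.
players = [
    {"name": "A James", "conference": "Western", "playYear": 15},
    {"name": "B James", "conference": "Western", "playYear": 3},
    {"name": "C James", "conference": "Eastern", "playYear": 12},
    {"name": "D Curry", "conference": "Western", "playYear": 12},
]
is_james = lambda d: "james" in d["name"].lower()
is_eastern = lambda d: d["conference"] == "Eastern"
is_veteran = lambda d: 11 <= d["playYear"] <= 20

optional = bool_query(players, must=[is_james], must_not=[is_eastern],
                      should=[is_veteran])
required = bool_query(players, must=[is_james], must_not=[is_eastern],
                      should=[is_veteran], minimum_should_match=1)
print([(s, d["name"]) for s, d in optional])  # [(2, 'A James'), (1, 'B James')]
print([(s, d["name"]) for s, d in required])  # [(2, 'A James')]
```

With the should clause optional, both Western players named James come back and the veteran simply ranks higher. Requiring one should match drops the other player.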
must (find the players named James)
    POST /nba/_search
    {
      "query": {
        "bool": {
          "must": [ { "match": { "displayNameEn": "james" } } ]
        }
      }
    }
filter: same effect as must, but without scoring (find the players named James)
    POST /nba/_search
    {
      "query": {
        "bool": {
          "filter": [ { "match": { "displayNameEn": "james" } } ]
        }
      }
    }
must_not (find the Western Conference players named James)
    POST /nba/_search
    {
      "query": {
        "bool": {
          "must": [ { "match": { "displayNameEn": "james" } } ],
          "must_not": [ { "term": { "teamConferenceEn": { "value": "Eastern" } } } ]
        }
      }
    }
should (find the Western Conference players named James who should have played for 11 to 20 years). Documents are returned even when the should clause does not match; only the score differs.
    POST /nba/_search
    {
      "query": {
        "bool": {
          "must": [ { "match": { "displayNameEn": "james" } } ],
          "must_not": [ { "term": { "teamConferenceEn": { "value": "Eastern" } } } ],
          "should": [ { "range": { "playYear": { "gte": 11, "lte": 20 } } } ]
        }
      }
    }
With minimum_should_match set to 1, the should clause becomes mandatory, so the query returns only the Western Conference players named James who have played for 11 to 20 years.
    POST /nba/_search
    {
      "query": {
        "bool": {
          "must": [ { "match": { "displayNameEn": "james" } } ],
          "must_not": [ { "term": { "teamConferenceEn": { "value": "Eastern" } } } ],
          "should": [ { "range": { "playYear": { "gte": 11, "lte": 20 } } } ],
          "minimum_should_match": 1
        }
      }
    }
Mastering ES sort queries
Rockets players sorted by years played, from most to fewest
    POST nba/_search
    {
      "query": { "match": { "teamNameEn": "Rockets" } },
      "sort": [ { "playYear": { "order": "desc" } } ]
    }
Rockets players sorted by years played, from most to fewest; players with the same number of years are sorted by height, from tallest to shortest
    POST nba/_search
    {
      "query": { "match": { "teamNameEn": "Rockets" } },
      "sort": [
        { "playYear": { "order": "desc" } },
        { "heightValue": { "order": "desc" } }
      ]
    }
Mastering ES aggregations: metric aggregations
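What is ES aggregation analysis?
Aggregation analysis is an important database feature: it performs aggregate computations over the data set returned by a query, such as finding the maximum or minimum of a field (or of a computed expression), or computing a sum or an average. ES, being both a search engine and a database, provides powerful aggregation capabilities as well.
Aggregations that compute a metric such as max, min, sum, or avg over a data set are called metric aggregations in ES.
Relational databases offer more than aggregate functions: the query result can also be grouped with group by, and metrics can then be computed per group. In ES this is called bucket aggregation.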
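The distinction between the two kinds of aggregation can be sketched in a few lines of plain Python. The player records below are hypothetical and the computation runs locally; it only mirrors the idea of a metric computed over everything versus a metric computed per group.

```python
from collections import defaultdict

# Hypothetical records.
players = [
    {"team": "Rockets", "age": 30},
    {"team": "Rockets", "age": 34},
    {"team": "Lakers", "age": 35},
    {"team": "Lakers", "age": 27},
]

# Metric aggregation: one number computed over the whole result set.
avg_age = sum(p["age"] for p in players) / len(players)

# Bucket aggregation: group first (like GROUP BY), then compute a metric
# inside each bucket.
buckets = defaultdict(list)
for p in players:
    buckets[p["team"]].append(p["age"])
avg_age_by_team = {team: sum(ages) / len(ages) for team, ages in buckets.items()}

print(avg_age)          # 31.5
print(avg_age_by_team)  # {'Rockets': 32.0, 'Lakers': 31.0}
```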
max, min, sum, avg
Find the average age of the Rockets players
    POST /nba/_search
    {
      "query": { "term": { "teamNameEn": { "value": "Rockets" } } },
      "aggs": { "avgAge": { "avg": { "field": "age" } } },
      "size": 0
    }
value_count: counts the documents in which the field has a value
Count the Rockets players whose years played is not empty
    POST /nba/_search
    {
      "query": { "term": { "teamNameEn": { "value": "Rockets" } } },
      "aggs": { "countPlayerYear": { "value_count": { "field": "playYear" } } },
      "size": 0
    }
Count how many players the Rockets have
    POST nba/_count
    {
      "query": { "term": { "teamNameEn": { "value": "Rockets" } } }
    }
Cardinality: counts distinct values
Count the number of distinct ages among the Rockets players
    POST /nba/_search
    {
      "query": { "term": { "teamNameEn": { "value": "Rockets" } } },
      "aggs": { "countAge": { "cardinality": { "field": "age" } } },
      "size": 0
    }
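The three counts are easy to confuse, so here is a small local sketch of what each one measures. The records are hypothetical. Note also that ES computes cardinality approximately on large data sets, whereas this sketch is exact.

```python
# Hypothetical Rockets records; one player has no playYear value.
rockets = [
    {"name": "p1", "age": 30, "playYear": 10},
    {"name": "p2", "age": 30, "playYear": None},
    {"name": "p3", "age": 25, "playYear": 3},
]

# Like _count: every matching document.
doc_count = len(rockets)
# Like value_count on playYear: only documents that have a value.
value_count = sum(1 for p in rockets if p["playYear"] is not None)
# Like cardinality on age: number of distinct values.
cardinality = len({p["age"] for p in rockets if p["age"] is not None})

print(doc_count, value_count, cardinality)  # 3 2 2
```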
stats: returns five values: count, max, min, avg, and sum
Get the age stats of the Rockets players
    POST /nba/_search
    {
      "query": { "term": { "teamNameEn": { "value": "Rockets" } } },
      "aggs": { "statsAge": { "stats": { "field": "age" } } },
      "size": 0
    }
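Extended stats: adds four results to those of stats: the sum of squares, the variance, the standard deviation, and the interval of the mean plus or minus two standard deviations
Get the extended age stats of the Rockets players
    POST /nba/_search
    {
      "query": { "term": { "teamNameEn": { "value": "Rockets" } } },
      "aggs": { "extendStatsAge": { "extended_stats": { "field": "age" } } },
      "size": 0
    }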
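As a rough guide to what the additional numbers mean, the sketch below computes them locally for a hypothetical list of ages. It assumes the variance is the population variance (dividing by the count); the result keys follow the names used in the extended_stats response, but the function itself is only an illustration.

```python
import math

def extended_stats(values):
    """Compute stats plus the four extended results for a list of numbers."""
    count = len(values)
    total = sum(values)
    avg = total / count
    # Population variance: mean of the squared deviations from the average.
    variance = sum((v - avg) ** 2 for v in values) / count
    std = math.sqrt(variance)
    return {
        "count": count, "min": min(values), "max": max(values),
        "avg": avg, "sum": total,
        "sum_of_squares": sum(v * v for v in values),
        "variance": variance,
        "std_deviation": std,
        "std_deviation_bounds": {"upper": avg + 2 * std, "lower": avg - 2 * std},
    }

ages = [22, 24, 24, 24, 25, 25, 27, 29]  # hypothetical ages
result = extended_stats(ages)
print(result["avg"], result["variance"], result["std_deviation"])  # 25.0 4.0 2.0
print(result["std_deviation_bounds"])  # {'upper': 29.0, 'lower': 21.0}
```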
Percentiles: returns the value found at each requested percentile; by default the values at the [ 1, 5, 25, 50, 75, 95, 99 ] percentiles
Get the age percentiles of the Rockets players
    POST /nba/_search
    {
      "query": { "term": { "teamNameEn": { "value": "Rockets" } } },
      "aggs": { "percentAge": { "percentiles": { "field": "age" } } },
      "size": 0
    }
Get the age percentiles of the Rockets players (with explicit percentiles)
    POST /nba/_search
    {
      "query": { "term": { "teamNameEn": { "value": "Rockets" } } },
      "aggs": {
        "percentAge": {
          "percentiles": { "field": "age", "percents": [ 20, 50, 75 ] }
        }
      },
      "size": 0
    }
Mastering ES aggregations: bucket aggregations
Terms aggregation: groups documents by the values of a field
Group the Rockets players by age
    POST /nba/_search
    {
      "query": { "term": { "teamNameEn": { "value": "Rockets" } } },
      "aggs": { "aggsAge": { "terms": { "field": "age", "size": 10 } } },
      "size": 0
    }
order: sorting the buckets
Group the Rockets players by age, with buckets sorted by age from highest to lowest (ordering by bucket key)
    POST /nba/_search
    {
      "query": { "term": { "teamNameEn": { "value": "Rockets" } } },
      "aggs": {
        "aggsAge": {
          "terms": { "field": "age", "size": 10, "order": { "_key": "desc" } }
        }
      },
      "size": 0
    }
Group the Rockets players by age, with buckets sorted by document count from highest to lowest (ordering by document count)
    POST /nba/_search
    {
      "query": { "term": { "teamNameEn": { "value": "Rockets" } } },
      "aggs": {
        "aggsAge": {
          "terms": { "field": "age", "size": 10, "order": { "_count": "desc" } }
        }
      },
      "size": 0
    }
Group the players by team, with teams sorted by the average age of their players (ordering by a sub-aggregation metric)
    POST /nba/_search
    {
      "aggs": {
        "aggsTeamName": {
          "terms": { "field": "teamNameEn", "size": 30, "order": { "avgAge": "desc" } },
          "aggs": { "avgAge": { "avg": { "field": "age" } } }
        }
      },
      "size": 0
    }
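The three ordering options differ only in which bucket property is used as the sort key. The following sketch groups hypothetical records locally and sorts the buckets by key, by document count, or by a per-bucket average. The function terms_agg and its parameters are invented for this illustration.

```python
from collections import defaultdict

# Hypothetical records.
players = [
    {"team": "Rockets", "age": 30}, {"team": "Rockets", "age": 34},
    {"team": "Lakers", "age": 35}, {"team": "Lakers", "age": 27},
    {"team": "Lakers", "age": 22}, {"team": "Warriors", "age": 31},
]

def terms_agg(docs, field, order_by, descending=True, size=10):
    """Group docs by field, attach an avgAge metric, sort, keep top `size`."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[field]].append(doc)
    buckets = [
        {"key": key, "doc_count": len(members),
         "avgAge": sum(m["age"] for m in members) / len(members)}
        for key, members in groups.items()
    ]
    buckets.sort(key=lambda b: b[order_by], reverse=descending)
    return buckets[:size]

print([b["key"] for b in terms_agg(players, "team", "key")])
# ['Warriors', 'Rockets', 'Lakers']
print([b["key"] for b in terms_agg(players, "team", "doc_count")])
# ['Lakers', 'Rockets', 'Warriors']
print([b["key"] for b in terms_agg(players, "team", "avgAge")])
# ['Rockets', 'Warriors', 'Lakers']
```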
Filtering the buckets
Group the Lakers and Rockets by team and sort by average age (filtering with explicit value lists)
    POST /nba/_search
    {
      "aggs": {
        "aggsTeamName": {
          "terms": {
            "field": "teamNameEn",
            "include": [ "Lakers", "Rockets", "Warriors" ],
            "exclude": [ "Warriors" ],
            "size": 30,
            "order": { "avgAge": "desc" }
          },
          "aggs": { "avgAge": { "avg": { "field": "age" } } }
        }
      },
      "size": 0
    }
Group the Lakers and Rockets by team and sort by average age (filtering with regular expressions)
    POST /nba/_search
    {
      "aggs": {
        "aggsTeamName": {
          "terms": {
            "field": "teamNameEn",
            "include": "Lakers|Ro.*|Warriors.*",
            "exclude": "Warriors",
            "size": 30,
            "order": { "avgAge": "desc" }
          },
          "aggs": { "avgAge": { "avg": { "field": "age" } } }
        }
      },
      "size": 0
    }
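When include and exclude are both given, include is evaluated first and exclude is then applied to what remains. The sketch below imitates this on a hypothetical list of team names; filter_bucket_keys is a made-up helper, and Python's re syntax stands in for the ES regular-expression syntax.

```python
import re

def filter_bucket_keys(keys, include=None, exclude=None):
    """Keep keys accepted by include, then drop keys matched by exclude."""
    def matches(key, rule):
        if isinstance(rule, str):   # regular-expression form, whole-key match
            return re.fullmatch(rule, key) is not None
        return key in rule          # exact-value list form
    kept = []
    for key in keys:
        if include is not None and not matches(key, include):
            continue
        if exclude is not None and matches(key, exclude):
            continue
        kept.append(key)
    return kept

teams = ["Lakers", "Rockets", "Warriors", "Raptors"]  # hypothetical keys
print(filter_bucket_keys(teams, ["Lakers", "Rockets", "Warriors"], ["Warriors"]))
# ['Lakers', 'Rockets']
print(filter_bucket_keys(teams, "Lakers|Ro.*|Warriors.*", "Warriors"))
# ['Lakers', 'Rockets']
```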
Range aggregation: groups documents into value ranges
Group the NBA players by age into under 20, 20 to 35, and 35 and over
    POST /nba/_search
    {
      "aggs": {
        "ageRange": {
          "range": {
            "field": "age",
            "ranges": [
              { "to": 20 },
              { "from": 20, "to": 35 },
              { "from": 35 }
            ]
          }
        }
      },
      "size": 0
    }
Group the NBA players by age into under 20, 20 to 35, and 35 and over (with a name for each bucket)
    POST /nba/_search
    {
      "aggs": {
        "ageRange": {
          "range": {
            "field": "age",
            "ranges": [
              { "to": 20, "key": "A" },
              { "from": 20, "to": 35, "key": "B" },
              { "from": 35, "key": "C" }
            ]
          }
        }
      },
      "size": 0
    }
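One detail worth spelling out is where the boundary values land: in a range aggregation each range includes its from value and excludes its to value. The sketch below applies that rule to a hypothetical list of ages; range_agg is a helper written for this example.

```python
def range_agg(values, ranges):
    """Count values per range; 'from' is inclusive and 'to' is exclusive."""
    buckets = []
    for i, r in enumerate(ranges):
        lo, hi = r.get("from"), r.get("to")
        count = sum(
            1 for v in values
            if (lo is None or v >= lo) and (hi is None or v < hi)
        )
        # Simplified fallback label; ES generates its own default keys.
        buckets.append({"key": r.get("key", f"range_{i}"), "doc_count": count})
    return buckets

ages = [19, 20, 27, 34, 35, 38]  # hypothetical ages
ranges = [
    {"to": 20, "key": "A"},
    {"from": 20, "to": 35, "key": "B"},
    {"from": 35, "key": "C"},
]
print(range_agg(ages, ranges))
# [{'key': 'A', 'doc_count': 1}, {'key': 'B', 'doc_count': 3}, {'key': 'C', 'doc_count': 2}]
```

A 20-year-old therefore falls into bucket B and a 35-year-old into bucket C, so no player is counted twice.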
Date range aggregation: groups documents into date ranges
Group the NBA players by birth month and year
    POST /nba/_search
    {
      "aggs": {
        "birthDayRange": {
          "date_range": {
            "field": "birthDay",
            "format": "MM-yyyy",
            "ranges": [
              { "to": "01-1989" },
              { "from": "01-1989", "to": "01-1999" },
              { "from": "01-1999", "to": "01-2009" },
              { "from": "01-2009" }
            ]
          }
        }
      },
      "size": 0
    }
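Date ranges follow the same boundary rule as numeric ranges: from is inclusive and to is exclusive. The sketch below parses bounds written in a month-year pattern (assumed here to be MM-yyyy, which corresponds to %m-%Y in Python) and counts hypothetical birthdays per range. The helper date_range_agg is invented for this example.

```python
from datetime import datetime

FMT = "%m-%Y"  # Python counterpart of the assumed MM-yyyy pattern

def date_range_agg(dates, ranges):
    """Return (from, to, count) per range; from inclusive, to exclusive."""
    buckets = []
    for r in ranges:
        lo = datetime.strptime(r["from"], FMT) if "from" in r else None
        hi = datetime.strptime(r["to"], FMT) if "to" in r else None
        count = sum(
            1 for d in dates
            if (lo is None or d >= lo) and (hi is None or d < hi)
        )
        buckets.append((r.get("from", "*"), r.get("to", "*"), count))
    return buckets

# Hypothetical birthdays.
birthdays = [
    datetime(1984, 12, 30), datetime(1989, 1, 1),
    datetime(1995, 6, 15), datetime(1999, 2, 28),
]
ranges = [
    {"to": "01-1989"},
    {"from": "01-1989", "to": "01-1999"},
    {"from": "01-1999", "to": "01-2009"},
    {"from": "01-2009"},
]
for bucket in date_range_agg(birthdays, ranges):
    print(bucket)
```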
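Date histogram aggregation: aggregates by day, month, year, and so on. The supported intervals are year (1y), quarter (1q), month (1M), week (1w), day (1d), hour (1h), minute (1m), and second (1s).
Group the NBA players by birth year
    POST /nba/_search
    {
      "aggs": {
        "birthday_aggs": {
          "date_histogram": {
            "field": "birthDay",
            "format": "yyyy",
            "interval": "year"
          }
        }
      },
      "size": 0
    }
Note: since ES 7.2 the interval parameter is deprecated in favor of calendar_interval (or fixed_interval), and it was removed in ES 8.0.
ES query_string queries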
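Introduction
With the query_string query, anyone familiar with the Lucene query syntax can write a query string directly in that syntax. When ES receives the request, its query parser parses the string and builds the corresponding query.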
Querying a single field
    POST /nba/_search
    {
      "query": {
        "query_string": {
          "default_field": "displayNameEn",
          "query": "james OR curry"
        }
      },
      "size": 100
    }

    POST /nba/_search
    {
      "query": {
        "query_string": {
          "default_field": "displayNameEn",
          "query": "james AND harden"
        }
      },
      "size": 100
    }
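The effect of the OR and AND operators can be pictured with a very small matcher. This sketch handles only a flat list of terms joined by a single operator and uses lowercase whitespace tokens; the real Lucene query syntax is far richer, and both the helper and the sample names are assumptions made for illustration.

```python
def matches_query_string(field_value, query):
    """Support only 'a OR b ...' or 'a AND b ...' over lowercase tokens."""
    tokens = set(field_value.lower().split())
    if " AND " in query:
        return all(term.lower() in tokens for term in query.split(" AND "))
    return any(term.lower() in tokens for term in query.split(" OR "))

# Hypothetical displayNameEn values.
names = ["LeBron James", "James Harden", "Stephen Curry", "Chris Paul"]

print([n for n in names if matches_query_string(n, "james OR curry")])
# ['LeBron James', 'James Harden', 'Stephen Curry']
print([n for n in names if matches_query_string(n, "james AND harden")])
# ['James Harden']
```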
Querying multiple fields
    POST /nba/_search
    {
      "query": {
        "query_string": {
          "fields": [ "displayNameEn", "teamNameEn" ],
          "query": "James AND Rockets"
        }
      },
      "size": 100
    }
Reference: personal blog: cyz