Appearance
Elasticsearch 详解
Elasticsearch 是一个基于 Apache Lucene 的分布式搜索和分析引擎,能够实现近实时搜索。它是 Elastic Stack(ELK Stack)的核心组件,广泛用于全文搜索、日志分析、数据可视化等场景。
概述
核心概念
- 索引 (Index):类似数据库中的数据库,是文档的容器
- 文档 (Document):基本的信息单元,以 JSON 格式存储
- 字段 (Field):文档中的键值对
- 映射 (Mapping):定义文档及其字段的存储和索引方式
- 分片 (Shard):索引的水平分割单元
- 副本 (Replica):分片的备份
- 节点 (Node):集群中的单个服务器
- 集群 (Cluster):一个或多个节点的集合
主要特性
- 分布式架构:天然支持水平扩展
- 近实时搜索:文档索引后几乎立即可搜索
- RESTful API:通过 HTTP REST API 进行所有操作
- 多租户:支持多个索引和类型
- 丰富的查询 DSL:强大的查询语言
- 聚合分析:支持复杂的数据分析
- 高可用性:自动故障检测和恢复
安装与配置
安装方式
1. 官方安装包
bash
# macOS
brew install elasticsearch
# Ubuntu/Debian
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt-get update
sudo apt-get install elasticsearch
# CentOS/RHEL
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
echo '[elasticsearch]
name=Elasticsearch repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=0
autorefresh=1
type=rpm-md' | sudo tee /etc/yum.repos.d/elasticsearch.repo
sudo yum install --enablerepo=elasticsearch elasticsearch
2. Docker 安装
bash
# 拉取镜像
docker pull docker.elastic.co/elasticsearch/elasticsearch:8.11.0
# 单节点运行
docker run --name elasticsearch \
-p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-e "xpack.security.enabled=false" \
docker.elastic.co/elasticsearch/elasticsearch:8.11.0
# 带数据持久化
docker run --name elasticsearch \
-p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-e "xpack.security.enabled=false" \
-v /data/elasticsearch:/usr/share/elasticsearch/data \
docker.elastic.co/elasticsearch/elasticsearch:8.11.0
3. Docker Compose 集群
yaml
# docker-compose.yml
version: '3.8'
services:
es01:
image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
container_name: es01
environment:
- node.name=es01
- cluster.name=es-docker-cluster
- discovery.seed_hosts=es02,es03
- cluster.initial_master_nodes=es01,es02,es03
- bootstrap.memory_lock=true
- "ES_JAVA_OPTS=-Xms512m -Xmx512m"
- xpack.security.enabled=false
ulimits:
memlock:
soft: -1
hard: -1
volumes:
- data01:/usr/share/elasticsearch/data
ports:
- 9200:9200
networks:
- elastic
es02:
image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
container_name: es02
environment:
- node.name=es02
- cluster.name=es-docker-cluster
- discovery.seed_hosts=es01,es03
- cluster.initial_master_nodes=es01,es02,es03
- bootstrap.memory_lock=true
- "ES_JAVA_OPTS=-Xms512m -Xmx512m"
- xpack.security.enabled=false
ulimits:
memlock:
soft: -1
hard: -1
volumes:
- data02:/usr/share/elasticsearch/data
networks:
- elastic
es03:
image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
container_name: es03
environment:
- node.name=es03
- cluster.name=es-docker-cluster
- discovery.seed_hosts=es01,es02
- cluster.initial_master_nodes=es01,es02,es03
- bootstrap.memory_lock=true
- "ES_JAVA_OPTS=-Xms512m -Xmx512m"
- xpack.security.enabled=false
ulimits:
memlock:
soft: -1
hard: -1
volumes:
- data03:/usr/share/elasticsearch/data
networks:
- elastic
volumes:
data01:
driver: local
data02:
driver: local
data03:
driver: local
networks:
elastic:
driver: bridge
基本配置
elasticsearch.yml 主要配置
yaml
# 集群配置
cluster.name: my-application
node.name: node-1
node.roles: [master, data, ingest]
# 网络配置
network.host: 0.0.0.0
http.port: 9200
transport.port: 9300
# 发现配置
discovery.seed_hosts: ["host1", "host2"]
cluster.initial_master_nodes: ["node-1", "node-2"]
# 路径配置
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
# 内存配置
bootstrap.memory_lock: true
# 安全配置
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.http.ssl.enabled: true
# 监控配置
xpack.monitoring.collection.enabled: true
JVM 配置 (jvm.options)
# 堆内存设置(建议设置为物理内存的一半,但不超过 32GB)
-Xms2g
-Xmx2g
# GC 配置
-XX:+UseG1GC
-XX:G1HeapRegionSize=16m
-XX:+UseG1OldGCMixedGCCount=16
-XX:+UseG1MixedGCLiveThresholdPercent=90
# 其他优化
-Djava.io.tmpdir=${ES_TMPDIR}
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/lib/elasticsearch
-XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log
基本操作
集群和节点管理
bash
# 检查集群健康状态
GET /_cluster/health
# 查看集群状态
GET /_cluster/state
# 查看节点信息
GET /_nodes
GET /_nodes/stats
# 查看集群设置
GET /_cluster/settings
# 更新集群设置
PUT /_cluster/settings
{
"persistent": {
"cluster.routing.allocation.disk.watermark.low": "85%",
"cluster.routing.allocation.disk.watermark.high": "90%"
}
}
索引管理
bash
# 创建索引
PUT /my_index
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1,
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "stop"]
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "my_analyzer"
},
"content": {
"type": "text"
},
"publish_date": {
"type": "date"
},
"author": {
"type": "keyword"
},
"tags": {
"type": "keyword"
},
"views": {
"type": "integer"
}
}
}
}
# 查看索引
GET /_cat/indices?v
GET /my_index
# 查看索引设置
GET /my_index/_settings
# 查看索引映射
GET /my_index/_mapping
# 更新索引设置
PUT /my_index/_settings
{
"number_of_replicas": 2
}
# 删除索引
DELETE /my_index
# 关闭/打开索引
POST /my_index/_close
POST /my_index/_open
文档操作
索引文档
bash
# 创建文档(指定 ID)
PUT /my_index/_doc/1
{
"title": "Elasticsearch 入门指南",
"content": "Elasticsearch 是一个强大的搜索引擎...",
"publish_date": "2024-12-19",
"author": "张三",
"tags": ["elasticsearch", "搜索", "教程"],
"views": 100
}
# 创建文档(自动生成 ID)
POST /my_index/_doc
{
"title": "高级搜索技巧",
"content": "本文介绍 Elasticsearch 的高级搜索功能...",
"publish_date": "2024-12-20",
"author": "李四",
"tags": ["elasticsearch", "高级", "技巧"],
"views": 50
}
# 批量索引
POST /_bulk
{"index":{"_index":"my_index","_id":"2"}}
{"title":"搜索优化策略","content":"如何优化 Elasticsearch 搜索性能...","publish_date":"2024-12-21","author":"王五","tags":["优化","性能"],"views":75}
{"index":{"_index":"my_index","_id":"3"}}
{"title":"集群管理","content":"Elasticsearch 集群的配置和管理...","publish_date":"2024-12-22","author":"赵六","tags":["集群","管理"],"views":120}
查询文档
bash
# 根据 ID 获取文档
GET /my_index/_doc/1
# 检查文档是否存在
HEAD /my_index/_doc/1
# 获取文档的源数据
GET /my_index/_source/1
# 批量获取
GET /_mget
{
"docs": [
{"_index": "my_index", "_id": "1"},
{"_index": "my_index", "_id": "2"}
]
}
更新文档
bash
# 部分更新
POST /my_index/_update/1
{
"doc": {
"views": 150
}
}
# 脚本更新
POST /my_index/_update/1
{
"script": {
"source": "ctx._source.views += params.increment",
"params": {
"increment": 10
}
}
}
# Upsert(不存在则创建)
POST /my_index/_update/4
{
"doc": {
"title": "新文章",
"views": 1
},
"doc_as_upsert": true
}
删除文档
bash
# 删除单个文档
DELETE /my_index/_doc/1
# 根据查询删除
POST /my_index/_delete_by_query
{
"query": {
"range": {
"views": {
"lt": 50
}
}
}
}
搜索和查询
基本搜索
bash
# 搜索所有文档
GET /my_index/_search
# 简单查询字符串
GET /my_index/_search?q=elasticsearch
# 分页搜索
GET /my_index/_search
{
"from": 0,
"size": 10,
"query": {
"match_all": {}
}
}
# 排序
GET /my_index/_search
{
"query": {
"match_all": {}
},
"sort": [
{"publish_date": {"order": "desc"}},
{"views": {"order": "desc"}}
]
}
# 字段过滤
GET /my_index/_search
{
"_source": ["title", "author", "publish_date"],
"query": {
"match_all": {}
}
}
查询 DSL
全文搜索
bash
# match 查询
GET /my_index/_search
{
"query": {
"match": {
"title": "elasticsearch 搜索"
}
}
}
# multi_match 查询
GET /my_index/_search
{
"query": {
"multi_match": {
"query": "elasticsearch",
"fields": ["title^2", "content"]
}
}
}
# match_phrase 查询(短语匹配)
GET /my_index/_search
{
"query": {
"match_phrase": {
"content": "搜索引擎"
}
}
}
# query_string 查询
GET /my_index/_search
{
"query": {
"query_string": {
"query": "elasticsearch AND (搜索 OR 引擎)",
"fields": ["title", "content"]
}
}
}
精确匹配
bash
# term 查询
GET /my_index/_search
{
"query": {
"term": {
"author": "张三"
}
}
}
# terms 查询
GET /my_index/_search
{
"query": {
"terms": {
"tags": ["elasticsearch", "搜索"]
}
}
}
# range 查询
GET /my_index/_search
{
"query": {
"range": {
"views": {
"gte": 50,
"lte": 200
}
}
}
}
# exists 查询
GET /my_index/_search
{
"query": {
"exists": {
"field": "tags"
}
}
}
复合查询
bash
# bool 查询
GET /my_index/_search
{
"query": {
"bool": {
"must": [
{"match": {"title": "elasticsearch"}}
],
"filter": [
{"range": {"views": {"gte": 50}}}
],
"should": [
{"term": {"author": "张三"}}
],
"must_not": [
{"term": {"tags": "过时"}}
]
}
}
}
# boosting 查询
GET /my_index/_search
{
"query": {
"boosting": {
"positive": {
"match": {"content": "elasticsearch"}
},
"negative": {
"match": {"content": "过时"}
},
"negative_boost": 0.2
}
}
}
聚合分析
指标聚合
bash
# 基本统计
GET /my_index/_search
{
"size": 0,
"aggs": {
"avg_views": {
"avg": {"field": "views"}
},
"max_views": {
"max": {"field": "views"}
},
"min_views": {
"min": {"field": "views"}
},
"sum_views": {
"sum": {"field": "views"}
},
"stats_views": {
"stats": {"field": "views"}
}
}
}
# 百分位数
GET /my_index/_search
{
"size": 0,
"aggs": {
"views_percentiles": {
"percentiles": {
"field": "views",
"percents": [25, 50, 75, 95, 99]
}
}
}
}
桶聚合
bash
# terms 聚合
GET /my_index/_search
{
"size": 0,
"aggs": {
"authors": {
"terms": {
"field": "author",
"size": 10
}
}
}
}
# range 聚合
GET /my_index/_search
{
"size": 0,
"aggs": {
"view_ranges": {
"range": {
"field": "views",
"ranges": [
{"to": 50},
{"from": 50, "to": 100},
{"from": 100}
]
}
}
}
}
# date_histogram 聚合
GET /my_index/_search
{
"size": 0,
"aggs": {
"articles_over_time": {
"date_histogram": {
"field": "publish_date",
"calendar_interval": "month"
}
}
}
}
嵌套聚合
bash
# 按作者分组,计算每个作者的平均浏览量
GET /my_index/_search
{
"size": 0,
"aggs": {
"authors": {
"terms": {
"field": "author"
},
"aggs": {
"avg_views": {
"avg": {"field": "views"}
},
"total_views": {
"sum": {"field": "views"}
}
}
}
}
}
# 复杂嵌套聚合
GET /my_index/_search
{
"size": 0,
"aggs": {
"monthly_stats": {
"date_histogram": {
"field": "publish_date",
"calendar_interval": "month"
},
"aggs": {
"authors": {
"terms": {
"field": "author"
},
"aggs": {
"avg_views": {
"avg": {"field": "views"}
}
}
},
"total_views": {
"sum": {"field": "views"}
}
}
}
}
}
映射和分析
映射管理
bash
# 查看映射
GET /my_index/_mapping
# 添加字段映射
PUT /my_index/_mapping
{
"properties": {
"category": {
"type": "keyword"
},
"rating": {
"type": "float"
},
"location": {
"type": "geo_point"
}
}
}
# 动态模板
PUT /my_index
{
"mappings": {
"dynamic_templates": [
{
"strings_as_keywords": {
"match_mapping_type": "string",
"match": "*_keyword",
"mapping": {
"type": "keyword"
}
}
},
{
"strings_as_text": {
"match_mapping_type": "string",
"mapping": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
]
}
}
分析器配置
bash
# 自定义分析器
PUT /my_index
{
"settings": {
"analysis": {
"char_filter": {
"my_char_filter": {
"type": "mapping",
"mappings": [
"& => and",
"| => or"
]
}
},
"tokenizer": {
"my_tokenizer": {
"type": "pattern",
"pattern": "[\\W|_]+"
}
},
"filter": {
"my_stopwords": {
"type": "stop",
"stopwords": ["is", "at", "the", "on"]
},
"my_stemmer": {
"type": "stemmer",
"language": "english"
}
},
"analyzer": {
"my_analyzer": {
"type": "custom",
"char_filter": ["my_char_filter"],
"tokenizer": "my_tokenizer",
"filter": [
"lowercase",
"my_stopwords",
"my_stemmer"
]
}
}
}
}
}
# 测试分析器
GET /my_index/_analyze
{
"analyzer": "my_analyzer",
"text": "The quick brown foxes are running & jumping"
}
# 中文分析器(需要安装 IK 插件)
PUT /chinese_index
{
"settings": {
"analysis": {
"analyzer": {
"ik_smart_analyzer": {
"type": "ik_smart"
},
"ik_max_word_analyzer": {
"type": "ik_max_word"
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
},
"content": {
"type": "text",
"analyzer": "ik_max_word"
}
}
}
}
性能优化
索引优化
bash
# 刷新间隔优化
PUT /my_index/_settings
{
"refresh_interval": "30s"
}
# 批量索引时禁用刷新
PUT /my_index/_settings
{
"refresh_interval": "-1"
}
# 批量索引完成后恢复
PUT /my_index/_settings
{
"refresh_interval": "1s"
}
# 副本数量优化
PUT /my_index/_settings
{
"number_of_replicas": 0
}
# 强制合并段
POST /my_index/_forcemerge?max_num_segments=1
# 索引模板
PUT /_index_template/my_template
{
"index_patterns": ["logs-*"],
"template": {
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1,
"refresh_interval": "30s"
},
"mappings": {
"properties": {
"@timestamp": {
"type": "date"
},
"message": {
"type": "text"
},
"level": {
"type": "keyword"
}
}
}
}
}
查询优化
bash
# 使用 filter 而不是 query(可缓存)
GET /my_index/_search
{
"query": {
"bool": {
"filter": [
{"term": {"author": "张三"}},
{"range": {"views": {"gte": 50}}}
]
}
}
}
# 使用 constant_score 查询
GET /my_index/_search
{
"query": {
"constant_score": {
"filter": {
"term": {"author": "张三"}
},
"boost": 1.0
}
}
}
# 分页优化(使用 search_after)
GET /my_index/_search
{
"size": 10,
"sort": [
{"publish_date": "desc"},
{"_id": "desc"}
]
}
# 后续页面
GET /my_index/_search
{
"size": 10,
"sort": [
{"publish_date": "desc"},
{"_id": "desc"}
],
"search_after": ["2024-12-19T10:00:00Z", "doc_id_here"]
}
硬件和配置优化
yaml
# elasticsearch.yml 性能配置
# 内存设置
bootstrap.memory_lock: true
# 线程池设置
thread_pool:
write:
size: 8
queue_size: 1000
search:
size: 13
queue_size: 1000
# 索引设置
indices.memory.index_buffer_size: 20%
indices.memory.min_index_buffer_size: 48mb
# 网络设置
network.tcp.keep_alive: true
network.tcp.reuse_address: true
# 发现设置
discovery.zen.fd.ping_timeout: 120s
discovery.zen.fd.ping_retries: 6
监控和运维
集群监控
bash
# 集群健康状态
GET /_cluster/health?level=indices
# 节点统计
GET /_nodes/stats
GET /_nodes/stats/indices,os,process,jvm,transport,http,fs,thread_pool
# 索引统计
GET /_stats
GET /my_index/_stats
# 分片分配
GET /_cat/shards?v
GET /_cluster/allocation/explain
# 任务管理
GET /_tasks
GET /_tasks?actions=*search*&detailed
# 热点线程
GET /_nodes/hot_threads
性能分析
bash
# 慢查询日志
PUT /_cluster/settings
{
"transient": {
"logger.org.elasticsearch.index.search.slowlog.query": "DEBUG",
"logger.org.elasticsearch.index.search.slowlog.fetch": "DEBUG",
"logger.org.elasticsearch.index.indexing.slowlog.index": "DEBUG"
}
}
# 索引级别的慢查询设置
PUT /my_index/_settings
{
"index.search.slowlog.threshold.query.warn": "10s",
"index.search.slowlog.threshold.query.info": "5s",
"index.search.slowlog.threshold.query.debug": "2s",
"index.search.slowlog.threshold.query.trace": "500ms",
"index.search.slowlog.threshold.fetch.warn": "1s",
"index.search.slowlog.threshold.fetch.info": "800ms",
"index.search.slowlog.threshold.fetch.debug": "500ms",
"index.search.slowlog.threshold.fetch.trace": "200ms"
}
# 查询分析
GET /my_index/_search
{
"profile": true,
"query": {
"match": {
"title": "elasticsearch"
}
}
}
备份和恢复
bash
# 创建快照仓库
PUT /_snapshot/my_backup
{
"type": "fs",
"settings": {
"location": "/mount/backups/my_backup",
"compress": true
}
}
# 创建快照
PUT /_snapshot/my_backup/snapshot_1
{
"indices": "my_index,another_index",
"ignore_unavailable": true,
"include_global_state": false,
"metadata": {
"taken_by": "admin",
"taken_because": "backup before upgrade"
}
}
# 查看快照
GET /_snapshot/my_backup/_all
GET /_snapshot/my_backup/snapshot_1
# 恢复快照
POST /_snapshot/my_backup/snapshot_1/_restore
{
"indices": "my_index",
"ignore_unavailable": true,
"include_global_state": false,
"rename_pattern": "my_index",
"rename_replacement": "restored_my_index"
}
# 删除快照
DELETE /_snapshot/my_backup/snapshot_1
实际应用案例
1. 日志分析系统
bash
# 创建日志索引模板
PUT /_index_template/logs_template
{
"index_patterns": ["logs-*"],
"template": {
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1,
"index.lifecycle.name": "logs_policy",
"index.lifecycle.rollover_alias": "logs"
},
"mappings": {
"properties": {
"@timestamp": {
"type": "date"
},
"level": {
"type": "keyword"
},
"message": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"service": {
"type": "keyword"
},
"host": {
"type": "keyword"
},
"ip": {
"type": "ip"
},
"user_agent": {
"type": "text"
},
"response_time": {
"type": "float"
}
}
}
}
}
# 创建索引生命周期策略
PUT /_ilm/policy/logs_policy
{
"policy": {
"phases": {
"hot": {
"actions": {
"rollover": {
"max_size": "5GB",
"max_age": "1d"
}
}
},
"warm": {
"min_age": "7d",
"actions": {
"allocate": {
"number_of_replicas": 0
},
"forcemerge": {
"max_num_segments": 1
}
}
},
"cold": {
"min_age": "30d",
"actions": {
"allocate": {
"number_of_replicas": 0
}
}
},
"delete": {
"min_age": "90d",
"actions": {
"delete": {}
}
}
}
}
}
# 日志分析查询
GET /logs-*/_search
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"range": {
"@timestamp": {
"gte": "now-1h"
}
}
}
]
}
},
"aggs": {
"error_count": {
"filter": {
"term": {"level": "ERROR"}
}
},
"services": {
"terms": {
"field": "service",
"size": 10
},
"aggs": {
"avg_response_time": {
"avg": {
"field": "response_time"
}
}
}
},
"timeline": {
"date_histogram": {
"field": "@timestamp",
"fixed_interval": "5m"
},
"aggs": {
"error_rate": {
"filter": {
"term": {"level": "ERROR"}
}
}
}
}
}
}
2. 电商搜索系统
bash
# 创建商品索引
PUT /products
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1,
"analysis": {
"analyzer": {
"product_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "stop", "synonym"]
}
},
"filter": {
"synonym": {
"type": "synonym",
"synonyms": [
"手机,mobile,phone",
"电脑,computer,pc",
"笔记本,laptop,notebook"
]
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "product_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"description": {
"type": "text",
"analyzer": "product_analyzer"
},
"category": {
"type": "keyword"
},
"brand": {
"type": "keyword"
},
"price": {
"type": "float"
},
"rating": {
"type": "float"
},
"sales_count": {
"type": "integer"
},
"tags": {
"type": "keyword"
},
"created_at": {
"type": "date"
},
"in_stock": {
"type": "boolean"
}
}
}
}
# 商品搜索查询
GET /products/_search
{
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "苹果手机",
"fields": ["name^3", "description", "tags^2"],
"type": "best_fields",
"fuzziness": "AUTO"
}
}
],
"filter": [
{"term": {"in_stock": true}},
{"range": {"price": {"gte": 1000, "lte": 8000}}},
{"range": {"rating": {"gte": 4.0}}}
]
}
},
"functions": [
{
"filter": {"range": {"sales_count": {"gte": 100}}},
"weight": 1.5
},
{
"gauss": {
"created_at": {
"origin": "now",
"scale": "30d",
"decay": 0.5
}
}
}
],
"score_mode": "multiply",
"boost_mode": "multiply"
}
},
"sort": [
"_score",
{"sales_count": {"order": "desc"}}
],
"aggs": {
"categories": {
"terms": {
"field": "category",
"size": 10
}
},
"brands": {
"terms": {
"field": "brand",
"size": 10
}
},
"price_ranges": {
"range": {
"field": "price",
"ranges": [
{"to": 1000},
{"from": 1000, "to": 3000},
{"from": 3000, "to": 5000},
{"from": 5000}
]
}
}
},
"highlight": {
"fields": {
"name": {},
"description": {
"fragment_size": 150,
"number_of_fragments": 3
}
}
}
}
3. 地理位置搜索
bash
# 创建位置索引
PUT /locations
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"location": {
"type": "geo_point"
},
"category": {
"type": "keyword"
},
"rating": {
"type": "float"
}
}
}
}
# 添加位置数据
POST /locations/_bulk
{"index":{"_id":"1"}}
{"name":"星巴克咖啡","location":{"lat":39.9042,"lon":116.4074},"category":"咖啡店","rating":4.5}
{"index":{"_id":"2"}}
{"name":"麦当劳","location":{"lat":39.9052,"lon":116.4084},"category":"快餐","rating":4.2}
{"index":{"_id":"3"}}
{"name":"肯德基","location":{"lat":39.9032,"lon":116.4064},"category":"快餐","rating":4.1}
# 地理位置搜索
GET /locations/_search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_distance": {
"distance": "1km",
"location": {
"lat": 39.9042,
"lon": 116.4074
}
}
}
}
},
"sort": [
{
"_geo_distance": {
"location": {
"lat": 39.9042,
"lon": 116.4074
},
"order": "asc",
"unit": "km",
"mode": "min",
"distance_type": "arc"
}
}
]
}
# 地理边界搜索
GET /locations/_search
{
"query": {
"geo_bounding_box": {
"location": {
"top_left": {
"lat": 39.91,
"lon": 116.40
},
"bottom_right": {
"lat": 39.90,
"lon": 116.42
}
}
}
}
}
最佳实践
1. 索引设计
- 分片数量:根据数据量和查询负载确定,通常每个分片 20-40GB
- 副本数量:根据可用性要求和查询负载确定
- 映射设计:合理设置字段类型,避免动态映射
- 索引模板:使用索引模板统一管理索引设置
2. 查询优化
- 使用 filter:对于精确匹配使用 filter 而不是 query
- 避免深分页:使用 search_after 代替 from/size
- 合理使用聚合:避免高基数字段的 terms 聚合
- 缓存利用:充分利用查询缓存和请求缓存
3. 性能调优
- 内存设置:JVM 堆内存不超过 32GB,设置为物理内存的一半
- 磁盘优化:使用 SSD,合理设置文件系统
- 网络优化:使用高速网络,调整网络参数
- 监控告警:设置关键指标监控和告警
4. 运维管理
- 定期备份:制定备份策略和恢复测试
- 索引生命周期:使用 ILM 管理索引生命周期
- 集群升级:制定滚动升级策略
- 容量规划:监控资源使用,提前扩容
总结
Elasticsearch 是一个功能强大的分布式搜索和分析引擎,适用于全文搜索、日志分析、实时分析等多种场景。通过合理的索引设计、查询优化和集群管理,Elasticsearch 能够为应用提供高性能、高可用的搜索和分析能力。
优势
- 强大的全文搜索能力
- 丰富的查询和聚合功能
- 良好的水平扩展能力
- 近实时的搜索响应
- 活跃的社区和生态
适用场景
- 企业搜索平台
- 日志分析系统
- 电商搜索引擎
- 实时数据分析
- 监控和告警系统
通过深入理解 Elasticsearch 的核心概念和最佳实践,可以充分发挥其在现代数据处理和搜索场景中的价值。