Logging: From ELK to PLG
Why Replace ELK
I had been using ELK (together with Kafka and Filebeat) for log collection and analysis for more than three years. It works well and is very powerful, though the learning curve is correspondingly steep; I spent quite a while fighting with grok back in the day. The bigger drawback is resource consumption: in practice it is simply not usable in resource-constrained environments, which limits where it can be deployed.
Recently I heard that Grafana had built Loki, which together with Promtail forms the PLG logging stack, so I decided to give it a try. Initial tests show that PLG really does use far fewer resources. The main reason is that ElasticSearch and Logstash in ELK are both written in Java (Logstash even runs on JRuby), and they eat memory:
To get ELK running at all, the rough baseline is: ElasticSearch needs 2 GB of RAM, Logstash 1 GB, Kibana one to two hundred MB (Node.js is much kinder than the JVM), Filebeat a few tens of MB (Go is kinder still); Kafka is optional, so I won't count it.
PLG is far less demanding: all three components together run in one to two hundred MB.
The PLG components map to ELK roughly as follows:
- Grafana is the web front end, corresponding to Kibana
- Loki is the log storage and query engine, corresponding to ElasticSearch
- Promtail is the log collection and parsing agent, corresponding to Logstash + Filebeat
Installing Loki and Grafana
Docker is of course the easiest way; see the docker-compose.yml below:
version: '2'
services:
  loki:
    image: grafana/loki:master
    container_name: loki
    restart: always
    ports:
      - 127.0.0.1:3100:3100
    volumes:
      - /var/lib/loki:/loki
      - /etc/loki:/etc/loki
  grafana:
    image: grafana/grafana:master
    container_name: grafana
    restart: always
    depends_on:
      - loki
    ports:
      - 127.0.0.1:3000:3000
    volumes:
      - /var/lib/grafana:/var/lib/grafana
Note that the following directories must be created first (setup commands follow the list):
- /etc/loki: owned by root
- /var/lib/loki: owned by UID/GID 10001 (the loki user inside the container)
- /var/lib/grafana: owned by UID/GID 472 (the grafana user inside the container)
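A minimal sketch of those setup commands, assuming the host paths above:

mkdir -p /etc/loki /var/lib/loki /var/lib/grafana
chown root:root /etc/loki          # config dir stays with root
chown 10001:10001 /var/lib/loki    # loki user inside the container
chown 472:472 /var/lib/grafana     # grafana user inside the container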
The Loki configuration file, /etc/loki/local-config.yaml:
auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m       # Any chunk not receiving new logs in this time will be flushed
  max_chunk_age: 1h           # All chunks will be flushed when they hit this age, default is 1h
  # chunk_target_size: 1048576  # Loki will attempt to build chunks up to 1.5MB, flushing first if chunk_idle_period or max_chunk_age is reached first
  chunk_retain_period: 30s    # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
  max_transfer_retries: 0     # Chunk transfers disabled

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h
      chunks:
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/boltdb-shipper-active
    cache_location: /loki/boltdb-shipper-cache
    cache_ttl: 24h            # Can be increased for faster performance over longer query periods, uses more disk space
    shared_store: filesystem
  filesystem:
    directory: /loki/chunks

compactor:
  working_directory: /loki/boltdb-shipper-compactor
  shared_store: filesystem

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h  # 7 days

chunk_store_config:
  max_look_back_period: 2160h       # 90 days

table_manager:
  retention_deletes_enabled: false
  retention_period: 2160h           # 90 days

ruler:
  storage:
    type: local
    local:
      directory: /loki/rules
  rule_path: /loki/rules-temp
  alertmanager_url: http://localhost:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true
This is the default configuration with a few of the time values adjusted; see the official documentation for the exact parameter definitions.
Now it can be started:
docker-compose up -d
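To verify that Loki came up, you can hit its readiness endpoint (assuming the port mapping above); it should answer with "ready" once the ingester has started:

curl http://127.0.0.1:3100/ready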
Once everything is up, log in to Grafana and add Loki as a data source. The default Grafana credentials are admin/admin. Note that the Loki address must use the container name (http://loki:3100), not localhost.
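Alternatively, the data source can be provisioned from a file instead of clicking through the UI; a minimal sketch, assuming a file mounted at /etc/grafana/provisioning/datasources/loki.yaml inside the grafana container:

apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100   # container name, not localhost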
Installing and Configuring Promtail
Download the matching Promtail build from the Loki releases page and unzip it; the binary runs as-is.
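On a Linux amd64 host it looks roughly like this (v2.0.0 here is an assumption; pick the release that matches your Loki version):

wget https://github.com/grafana/loki/releases/download/v2.0.0/promtail-linux-amd64.zip
unzip promtail-linux-amd64.zip
mv promtail-linux-amd64 /usr/local/bin/promtail
chmod +x /usr/local/bin/promtail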
Taking Nginx access logs as an example, configure promtail-config.yaml as follows:
# Promtail server config
server:
  http_listen_port: 9080
  grpc_listen_port: 0

# Where to store file read positions
positions:
  filename: /tmp/positions.yaml

# Address of the Loki server
clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  - job_name: nginx
    static_configs:
      - targets:
          - localhost
        labels:
          job: 'nginx-access'
          app: 'nginx-access'
          host: 'your_hostname'
          __path__: /var/log/nginx/*.access.log
    pipeline_stages:
      - match:
          selector: '{job="nginx-access"}'
          stages:
            - regex:
                expression: '^(?P<client_ip>[\w\.]+) - (?P<auth>[^ ]*) \[(?P<timestamp>.*)\] "(?P<verb>[^ ]*) ?(?P<request>[^ ]*)? ?(?P<protocol>[^ ]*)?" (?P<status>[\d]+) (?P<response>[\d]+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
            - labels:
                client_ip:
                auth:
                timestamp:
                verb:
                request:
                response:
                referer:
                agent:
                status:
            - timestamp:
                source: timestamp
                format: "02/Jan/2006:15:04:05 -0700"
Personally, I find that while the grok used by Logstash is powerful, it is also hard to use; Promtail's plain regular expressions are much simpler and cover most needs.
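Once the logs are flowing, the labels extracted above can be queried from Grafana's Explore view. For example, to show only 5xx responses (a LogQL sketch assuming the job and status labels from the config above):

{job="nginx-access", status=~"5.."}

Or, as a per-second rate over the last five minutes:

rate({job="nginx-access", status=~"5.."}[5m])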
To keep it running, put it under supervisor:
promtail -config.file=/path_to/promtail-config.yaml
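A minimal supervisor program entry for this, keeping the /path_to/ placeholders from above (a sketch; adjust paths to your layout):

[program:promtail]
command=/path_to/promtail -config.file=/path_to/promtail-config.yaml
autostart=true
autorestart=true
stdout_logfile=/var/log/supervisor/promtail.log
redirect_stderr=true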