The ELK stack -- Elasticsearch, Logstash, and Kibana -- is one of the most popular open-source solutions for centralized log management. In this guide, we'll walk through setting up a complete ELK stack on AWS to aggregate, search, and visualize logs from your infrastructure.
## Architecture Overview
Our setup consists of the following components:
- **Elasticsearch** -- The search and analytics engine that stores and indexes your log data
- **Logstash** -- The data processing pipeline that ingests, transforms, and sends data to Elasticsearch
- **Kibana** -- The visualization layer that lets you explore and create dashboards from your data
For a production-ready deployment on AWS, we'll use an EC2 instance (or multiple instances for high availability) running all three components.
## Prerequisites
- An AWS account with permissions to create EC2 instances and security groups
- A running EC2 instance with at least 4GB RAM (t3.medium or larger recommended)
- Ubuntu 18.04+ or Amazon Linux 2
- Java 8 or 11 (we install OpenJDK 11 in Step 1)
## Step 1: Install Java
The ELK stack requires Java. Install OpenJDK on your instance:
```bash
# Ubuntu
sudo apt-get update
sudo apt-get install -y openjdk-11-jdk

# Amazon Linux 2
sudo yum install -y java-11-openjdk-devel

# Verify installation
java -version
```
## Step 2: Install Elasticsearch
Add the Elastic repository and install Elasticsearch:
```bash
# Import the Elasticsearch PGP key
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

# Add the repository
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list

# Install Elasticsearch
sudo apt-get update
sudo apt-get install -y elasticsearch
```
Configure Elasticsearch by editing `/etc/elasticsearch/elasticsearch.yml`:

```yaml
cluster.name: my-elk-cluster
node.name: node-1
network.host: 0.0.0.0
discovery.type: single-node
```
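On a memory-constrained instance like a t3.medium, it can also help to pin the JVM heap explicitly rather than rely on the default sizing. A minimal sketch, assuming Elasticsearch 7.7+ (which reads override files from `jvm.options.d/`) and the 4GB instance from the prerequisites:

```shell
# Pin the Elasticsearch JVM heap to 2 GB (roughly half of a t3.medium's RAM).
# Xms and Xmx should be set to the same value to avoid resize pauses.
sudo tee /etc/elasticsearch/jvm.options.d/heap.options <<'EOF'
-Xms2g
-Xmx2g
EOF
```

Keeping the heap at or below half of system RAM leaves the rest for the filesystem cache, which Elasticsearch relies on heavily.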
Start and enable the service:
```bash
sudo systemctl start elasticsearch
sudo systemctl enable elasticsearch

# Verify it's running
curl -X GET "localhost:9200"
```
## Step 3: Install Logstash
```bash
sudo apt-get install -y logstash
```
Create a Logstash configuration file at `/etc/logstash/conf.d/logstash.conf`:
```
input {
  beats {
    port => 5044
  }
}

filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
    }
    date {
      match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
  }
}
```
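Before starting the service, it's worth validating the pipeline syntax so a typo doesn't leave Logstash crash-looping. A quick check, assuming the default install path of `/usr/share/logstash`:

```shell
# Parse the pipeline config and exit; reports syntax errors without
# starting the service.
sudo -u logstash /usr/share/logstash/bin/logstash \
  --path.settings /etc/logstash \
  --config.test_and_exit
```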
Start Logstash:
```bash
sudo systemctl start logstash
sudo systemctl enable logstash
```
## Step 4: Install Kibana
```bash
sudo apt-get install -y kibana
```
Configure Kibana by editing `/etc/kibana/kibana.yml`:

```yaml
server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://localhost:9200"]
```
Start Kibana:
```bash
sudo systemctl start kibana
sudo systemctl enable kibana
```
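Kibana can take a minute or so to initialize after starting. You can confirm it is up from the instance itself before touching security groups:

```shell
# Kibana's status endpoint returns JSON once the server is ready.
curl -s http://localhost:5601/api/status
```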
## Step 5: Configure AWS Security Groups
Make sure your EC2 security group allows the following inbound traffic:
| Port | Protocol | Source | Purpose |
|------|----------|--------|---------|
| 5601 | TCP | Your IP | Kibana web interface |
| 9200 | TCP | VPC CIDR | Elasticsearch API |
| 5044 | TCP | VPC CIDR | Logstash Beats input |
Important: Never expose Elasticsearch (port 9200) directly to the internet. Use security groups to restrict access to your VPC or specific IP ranges.
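The rules above can also be added from the command line. A sketch using the AWS CLI; the security group ID and CIDRs below are placeholders you would replace with your own values:

```shell
SG_ID=sg-0123456789abcdef0   # placeholder: your ELK instance's security group
MY_IP=203.0.113.10/32        # placeholder: your workstation's public IP
VPC_CIDR=10.0.0.0/16         # placeholder: your VPC's CIDR block

# Kibana web interface, restricted to your IP
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
  --protocol tcp --port 5601 --cidr "$MY_IP"

# Elasticsearch API and Logstash Beats input, VPC-internal only
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
  --protocol tcp --port 9200 --cidr "$VPC_CIDR"
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
  --protocol tcp --port 5044 --cidr "$VPC_CIDR"
```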
## Step 6: Install Filebeat on Client Machines
To ship logs from your other servers, install Filebeat (after adding the Elastic apt repository on each client, as in Step 2):
```bash
sudo apt-get install -y filebeat
```
Configure Filebeat at `/etc/filebeat/filebeat.yml`:

```yaml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/*.log
      - /var/log/syslog

output.logstash:
  hosts: ["your-elk-server:5044"]
```
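Filebeat ships with built-in checks you can run before enabling the service, which catches YAML mistakes and connectivity problems early:

```shell
# Check the configuration file for syntax errors
sudo filebeat test config

# Verify Filebeat can actually reach the configured Logstash endpoint
sudo filebeat test output
```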
Start Filebeat:
```bash
sudo systemctl start filebeat
sudo systemctl enable filebeat
```
## Step 7: Access Kibana
Open your browser and navigate to `http://your-elk-server-ip:5601`. You should see the Kibana welcome screen. Create an index pattern matching `filebeat-*` to start exploring your logs.
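If no logs appear, a quick way to narrow down the problem is to check on the ELK server whether Filebeat indices are being created at all:

```shell
# List all indices; you should see daily filebeat-* entries once
# clients start shipping logs.
curl -s "localhost:9200/_cat/indices?v"
```

If the indices exist but Kibana shows nothing, the problem is usually the index pattern or the time range; if they don't exist, check Filebeat and Logstash first.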
## Production Considerations
For a production deployment, consider the following:
- **Use Amazon OpenSearch Service** -- AWS offers a managed service (formerly Amazon Elasticsearch Service) that handles scaling, patching, and backups automatically
- **Multi-node cluster** -- Run at least 3 Elasticsearch nodes for high availability
- **Dedicated master nodes** -- Separate master-eligible nodes from data nodes
- **EBS volumes** -- Use gp3 or io2 EBS volumes for Elasticsearch data storage
- **Monitoring** -- Set up CloudWatch alarms for disk space, CPU, and JVM heap usage
- **Backups** -- Configure Elasticsearch snapshots to S3
- **Security** -- Enable X-Pack security or use an Nginx reverse proxy with authentication in front of Kibana
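As a concrete example of the backup point above: once the `repository-s3` plugin is installed and the instance has an IAM role with access to the bucket, registering an S3 snapshot repository is a single API call. The bucket name and region here are placeholders:

```shell
# Register an S3 snapshot repository (requires the repository-s3 plugin
# and an instance IAM role granting access to the bucket).
curl -X PUT "localhost:9200/_snapshot/my_s3_backup" \
  -H 'Content-Type: application/json' \
  -d '{
    "type": "s3",
    "settings": {
      "bucket": "my-elk-snapshots",
      "region": "us-east-1"
    }
  }'
```

From there, snapshots can be taken on demand or on a schedule via snapshot lifecycle management.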
The ELK stack provides powerful log aggregation and visualization capabilities. While setting it up manually gives you full control, evaluate whether AWS OpenSearch Service might be a better fit for your production workloads, as it eliminates much of the operational overhead.