Update README.md
Commit dd05ce437f (parent 404d403d8b): 1 changed file with 442 additions and 0 deletions (README.md)

## Overview

This project provides lightweight disk monitoring for a VPS with:

- Daily storage reports
- Rapid disk usage change detection
- Alerts sent to a Matrix room
- Minimal dependencies (Python + SQLite only)

It is designed to be:

- Simple
- Transparent
- Easy to debug
- Low overhead

---

## Architecture

```
systemd timers
↓
Python script (disk_monitor.py)
↓
SQLite (local state/history)
↓
Matrix API (alerts + reports)
```

---

## Components

### 1. Python Script

**Location:**

```
/opt/diskmon/disk_monitor.py
```

**Responsibilities:**

- Collect disk usage stats
- Store historical samples
- Detect rapid changes
- Format messages
- Send messages to Matrix

---

### 2. SQLite Database

**Location:**

```
/var/lib/diskmon/diskmon.sqlite3
```

**Purpose:**

- Store disk usage history
- Track alert cooldowns

---

### 3. Environment Config

**Location:**

```
/etc/diskmon.env
```

**Contents:**

```
MATRIX_HOMESERVER=https://matrix.yourdomain.com
MATRIX_ROOM_ID=!roomid:yourdomain.com
MATRIX_ACCESS_TOKEN=your_token
DISKMON_DB=/var/lib/diskmon/diskmon.sqlite3
DISKMON_MOUNT=/
```
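
As a rough illustration of how the script might read these settings, the sketch below loads them from the process environment and fails with a readable message when a Matrix variable is missing. The function name `load_config`, the split into required and optional variables, and the fallback defaults are assumptions made for this example, not taken from `disk_monitor.py`.

```python
import os

# Assumption: the three Matrix settings are mandatory, the other two optional.
REQUIRED = ("MATRIX_HOMESERVER", "MATRIX_ROOM_ID", "MATRIX_ACCESS_TOKEN")
DEFAULTS = {
    "DISKMON_DB": "/var/lib/diskmon/diskmon.sqlite3",
    "DISKMON_MOUNT": "/",
}


def load_config(env=None):
    """Return a dict of settings; raise RuntimeError naming any missing variables."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    config = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        config[name] = env.get(name, default)
    return config
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise without touching the real environment.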

---

### 4. systemd Timers

#### Sample Timer (every 5 minutes)

```
diskmon-sample.timer
```

#### Report Timer (daily)

```
diskmon-report.timer
```

---

## Data Flow

### Sampling Loop (every 5 minutes)

1. Read disk usage (`shutil.disk_usage`)
2. Insert sample into SQLite
3. Compare against:
   - 10-minute-old sample
   - 60-minute-old sample
4. Trigger alerts if thresholds are exceeded
5. Apply cooldown logic so that a recently sent alert is not repeated
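
Steps 1 to 3 of the loop above could be sketched as follows. The column names come from the schema in this README; the function names and the choice to treat "N-minute-old sample" as the newest row taken at or before `now - N minutes` are assumptions for illustration. The sketch expects a connection whose `samples` table already exists.

```python
import shutil
import sqlite3
import time


def take_sample(conn, mount, now=None):
    """Read disk usage for `mount`, store one row in `samples`, return used bytes."""
    now = int(time.time()) if now is None else now
    usage = shutil.disk_usage(mount)  # named tuple with total, used, free
    conn.execute(
        "INSERT INTO samples (ts, mount, used_bytes, avail_bytes, total_bytes) "
        "VALUES (?, ?, ?, ?, ?)",
        (now, mount, usage.used, usage.free, usage.total),
    )
    conn.commit()
    return usage.used


def used_at_or_before(conn, mount, ts):
    """Return used_bytes of the newest sample taken at or before `ts`, else None."""
    row = conn.execute(
        "SELECT used_bytes FROM samples "
        "WHERE mount = ? AND ts <= ? ORDER BY ts DESC LIMIT 1",
        (mount, ts),
    ).fetchone()
    return row[0] if row else None
```

Returning `None` when no old enough sample exists lets the caller skip the comparison during the first hour after installation instead of alerting on incomplete history.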

---

### Daily Report

1. Read current disk usage
2. Format summary
3. Send to Matrix
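
The final step, sending to Matrix, might look roughly like this sketch. It targets the Matrix client-server endpoint `PUT /_matrix/client/v3/rooms/{roomId}/send/m.room.message/{txnId}` using only the standard library. The function names and the `diskmon-` transaction ID prefix are invented for this example and are not taken from `disk_monitor.py`.

```python
import json
import time
import urllib.parse
import urllib.request


def build_matrix_request(homeserver, room_id, token, text, txn_id=None):
    """Build (but do not send) the HTTP request for a plain-text room message."""
    txn_id = txn_id or f"diskmon-{time.time_ns()}"  # must be unique per message
    url = "{}/_matrix/client/v3/rooms/{}/send/m.room.message/{}".format(
        homeserver.rstrip("/"),
        urllib.parse.quote(room_id, safe=""),  # "!" and ":" must be percent-encoded
        urllib.parse.quote(txn_id, safe=""),
    )
    body = json.dumps({"msgtype": "m.text", "body": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def send_matrix_message(homeserver, room_id, token, text, timeout=10):
    """Send the message and return the decoded JSON response from the homeserver."""
    request = build_matrix_request(homeserver, room_id, token, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from the network call makes it possible to inspect the URL and headers when no messages arrive.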

---

## Database Schema

### samples

| column | type | description |
|---|---|---|
| id | int | primary key |
| ts | int | unix timestamp |
| mount | text | mount path |
| used_bytes | int | used disk space |
| avail_bytes | int | free space |
| total_bytes | int | total capacity |

---

### alerts

| column | type | description |
|---|---|---|
| key | text | alert identifier |
| last_sent_ts | int | unix timestamp of the last time the alert was sent |
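
The two tables above could be created with something like the following sketch. The column names and types follow the tables; the `NOT NULL` constraints, the index on `(mount, ts)`, the use of `key` as primary key, and the helper name `open_db` are assumptions added for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS samples (
    id          INTEGER PRIMARY KEY,
    ts          INTEGER NOT NULL,
    mount       TEXT    NOT NULL,
    used_bytes  INTEGER NOT NULL,
    avail_bytes INTEGER NOT NULL,
    total_bytes INTEGER NOT NULL
);
-- Assumed index: speeds up "newest sample at or before ts" lookups.
CREATE INDEX IF NOT EXISTS idx_samples_mount_ts ON samples (mount, ts);
CREATE TABLE IF NOT EXISTS alerts (
    key          TEXT PRIMARY KEY,
    last_sent_ts INTEGER NOT NULL
);
"""


def open_db(path):
    """Open the database at `path` and create any missing tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Because every statement uses `IF NOT EXISTS`, calling `open_db` on each run is harmless and also recreates the schema after the file has been deleted.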

---

## Alert Logic

### Thresholds

| Condition | Trigger |
|---|---|
| Warning | ≥ 1 GiB increase in 10 minutes |
| Critical | ≥ 10 GiB increase in 60 minutes |
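
One way to express the threshold table in code is a small rule list, as in this sketch. The rule keys `warning` and `critical`, the function name, and the shape of the `past_used_by_window` argument are hypothetical; only the windows and byte thresholds come from the table.

```python
GIB = 1024 ** 3

# (alert key, window in seconds, minimum increase in bytes)
RULES = (
    ("warning", 10 * 60, 1 * GIB),
    ("critical", 60 * 60, 10 * GIB),
)


def triggered_rules(current_used, past_used_by_window):
    """Return the keys of all rules whose increase threshold is met.

    `past_used_by_window` maps a window length in seconds to the used bytes
    recorded that long ago, or to None when no such sample exists.
    """
    hits = []
    for key, window, min_increase in RULES:
        past = past_used_by_window.get(window)
        if past is not None and current_used - past >= min_increase:
            hits.append(key)
    return hits
```

For example, `triggered_rules(5 * GIB, {600: 4 * GIB, 3600: 4 * GIB})` reports only the warning rule, because the 1 GiB increase is below the 10 GiB critical threshold.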

---

### Cooldowns

| Alert Type | Cooldown |
|---|---|
| Warning | 30 minutes |
| Critical | 60 minutes |

---

### Why cooldowns exist

Cooldowns prevent:

- Alert spam
- Repeated messages for the same event
- Noise during sustained writes
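
A cooldown check against the `alerts` table might be sketched like this. The cooldown lengths match the table above; the function name, the alert keys, and the reliance on `key` being unique (so that `INSERT OR REPLACE` updates the existing row) are assumptions for this example.

```python
import sqlite3

COOLDOWNS = {"warning": 30 * 60, "critical": 60 * 60}  # seconds


def should_send(conn, key, now):
    """Return True and record `now` if the cooldown for `key` has elapsed."""
    row = conn.execute(
        "SELECT last_sent_ts FROM alerts WHERE key = ?", (key,)
    ).fetchone()
    if row is not None and now - row[0] < COOLDOWNS[key]:
        return False  # still cooling down, stay quiet
    # Assumes `key` is a primary key, so this replaces the previous row.
    conn.execute(
        "INSERT OR REPLACE INTO alerts (key, last_sent_ts) VALUES (?, ?)",
        (key, now),
    )
    conn.commit()
    return True
```

Suppressed alerts deliberately do not update `last_sent_ts`, so a sustained write produces one message per cooldown period rather than none at all.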

---

## Message Formats

### Daily Report

```
[VPS Storage Report]
Mount: /
Used: 48.2 GiB
Available: 131.7 GiB
Total: 180.0 GiB
Usage: 26.8%
Timestamp: 2026-04-01 09:00:00 EDT
```

---

### Alert

```
[Storage Alert]
Mount: /
Used space increased by 1.4 GiB in 10 minutes
Previous used: 48.2 GiB
Current used: 49.6 GiB
Timestamp: 2026-04-01 09:40:00 EDT
```
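
Both message layouts can be produced with plain f-strings, as in the sketch below. The function names are hypothetical, and the usage percentage is assumed to be `used / total`, which is consistent with the sample report (48.2 of 180.0 GiB is 26.8%).

```python
import time

GIB = 1024 ** 3


def _stamp(ts):
    """Format a unix timestamp in local time, including the zone abbreviation."""
    return time.strftime("%Y-%m-%d %H:%M:%S %Z", time.localtime(ts))


def format_report(mount, used, avail, total, ts):
    """Build the daily report text from byte counts."""
    pct = 100.0 * used / total if total else 0.0
    return "\n".join([
        "[VPS Storage Report]",
        f"Mount: {mount}",
        f"Used: {used / GIB:.1f} GiB",
        f"Available: {avail / GIB:.1f} GiB",
        f"Total: {total / GIB:.1f} GiB",
        f"Usage: {pct:.1f}%",
        f"Timestamp: {_stamp(ts)}",
    ])


def format_alert(mount, previous_used, current_used, window_minutes, ts):
    """Build the alert text for an increase observed over `window_minutes`."""
    delta = current_used - previous_used
    return "\n".join([
        "[Storage Alert]",
        f"Mount: {mount}",
        f"Used space increased by {delta / GIB:.1f} GiB in {window_minutes} minutes",
        f"Previous used: {previous_used / GIB:.1f} GiB",
        f"Current used: {current_used / GIB:.1f} GiB",
        f"Timestamp: {_stamp(ts)}",
    ])
```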

---

## Monitoring the System

### Check timers

```
systemctl list-timers | grep diskmon
```

---

### Check logs

#### Sample job

```
journalctl -u diskmon-sample.service -f
```

#### Report job

```
journalctl -u diskmon-report.service -f
```

---

### Run manually

```
systemctl start diskmon-sample.service
systemctl start diskmon-report.service
```

---

### Check service status

```
systemctl status diskmon-sample.service
systemctl status diskmon-report.service
```

---

## Debugging

### 1. Environment variables not found

**Symptom:**

```
KeyError: 'MATRIX_HOMESERVER'
```

**Fix:** when running the script by hand, export the variables from the environment file into the shell first:

```
set -a
source /etc/diskmon.env
set +a
```

---

### 2. SQLite errors

**Symptom:**

```
sqlite3.OperationalError
```

**Fix:**

- Check SQL syntax
- Delete the DB and recreate it if needed (this discards the stored usage history and alert cooldown state):

```
rm /var/lib/diskmon/diskmon.sqlite3
```

---

### 3. No Matrix messages

Check that:

- the homeserver URL is correct
- the access token is valid
- the room ID is correct
- the homeserver URL uses HTTPS

---

### 4. Script not running

Check that the timer is loaded and active:

```
systemctl status diskmon-sample.timer
```

---

## Testing Alerts

### Trigger disk usage spike

```
fallocate -l 2G /tmp/testfile
```

Wait about 5 to 10 minutes for the next samples to be taken. A 2 GiB increase exceeds the 1 GiB warning threshold, so a warning alert should arrive. Note that this only affects the monitored mount if `/tmp` lives on it rather than on a separate tmpfs.

Cleanup:

```
rm /tmp/testfile
```

---

## Maintenance

### View database

```
sqlite3 /var/lib/diskmon/diskmon.sqlite3
```

---

### Clean old data

Old data is pruned automatically; roughly 2 days of samples are kept.
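
The pruning could be as simple as the following sketch, run at the end of each sampling pass. The function name and the exact 48-hour cutoff are assumptions; the README only states that about two days of samples are retained.

```python
import sqlite3

RETENTION_SECONDS = 2 * 24 * 60 * 60  # assumed cutoff: exactly 48 hours


def prune_samples(conn, now):
    """Delete samples older than the retention window; return how many were removed."""
    cursor = conn.execute(
        "DELETE FROM samples WHERE ts < ?", (now - RETENTION_SECONDS,)
    )
    conn.commit()
    return cursor.rowcount
```

At one sample every 5 minutes, two days of history is fewer than 600 rows per mount, so the database stays small.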

---

## Extending the System

### Possible improvements

- Monitor multiple mounts
- Add low disk space alerts (e.g. less than 20 GB free)
- Send HTML-formatted Matrix messages
- Integrate with an Uptime Kuma push monitor
- Add inode monitoring
- Add disk I/O rate tracking
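
For the inode monitoring idea, one possible starting point is `os.statvfs`, which is available on Unix systems. This is only a sketch of the measurement; the function name is hypothetical, and how the result would feed into alerts is left open.

```python
import os


def inode_usage_percent(mount):
    """Return the percentage of inodes in use on `mount`, or None if unknown.

    Some filesystems report zero total inodes, in which case no meaningful
    percentage exists.
    """
    stats = os.statvfs(mount)
    if stats.f_files == 0:
        return None
    return 100.0 * (stats.f_files - stats.f_ffree) / stats.f_files
```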

---

## Design Philosophy

This system intentionally avoids:

- Prometheus
- external monitoring stacks
- heavy dependencies

Instead it focuses on:

- clarity
- reliability
- minimalism
- full control over alert logic

---

## Summary

This setup provides:

- Continuous disk monitoring
- Time-window-based change detection
- Daily reporting
- Matrix integration
- Minimal operational overhead

All in roughly one script plus systemd.

---