Update README.md

Mr.Bowtie 2026-04-02 11:59:03 +00:00
commit dd05ce437f


@@ -0,0 +1,442 @@
## Overview
This project provides lightweight disk monitoring for a VPS with:
- Daily storage reports
- Rapid disk usage change detection
- Alerts sent to a Matrix room
- Minimal dependencies (Python + SQLite only)
It is designed to be:
- Simple
- Transparent
- Easy to debug
- Low overhead
---
## Architecture
```
systemd timers
Python script (disk_monitor.py)
SQLite (local state/history)
Matrix API (alerts + reports)
```
---
## Components
### 1. Python Script
**Location:**
```
/opt/diskmon/disk_monitor.py
```
**Responsibilities:**
- Collect disk usage stats
- Store historical samples
- Detect rapid changes
- Format messages
- Send messages to Matrix
---
### 2. SQLite Database
**Location:**
```
/var/lib/diskmon/diskmon.sqlite3
```
**Purpose:**
- Store disk usage history
- Track alert cooldowns
---
### 3. Environment Config
**Location:**
```
/etc/diskmon.env
```
**Contents:**
```
MATRIX_HOMESERVER=https://matrix.yourdomain.com
MATRIX_ROOM_ID=!roomid:yourdomain.com
MATRIX_ACCESS_TOKEN=your_token
DISKMON_DB=/var/lib/diskmon/diskmon.sqlite3
DISKMON_MOUNT=/
```
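A minimal sketch of how a script could read these settings, assuming they arrive through the process environment. `load_config`, `REQUIRED_VARS`, and the fallback values for `DISKMON_DB` and `DISKMON_MOUNT` are illustrative assumptions, not taken from `disk_monitor.py`. Failing early with a list of missing names is usually easier to debug than a bare `KeyError`:

```python
import os

# Variables without which no message can be sent (assumed to be mandatory).
REQUIRED_VARS = ("MATRIX_HOMESERVER", "MATRIX_ROOM_ID", "MATRIX_ACCESS_TOKEN")


def load_config(env=None):
    """Return a config dict, or raise RuntimeError naming every missing variable."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "homeserver": env["MATRIX_HOMESERVER"].rstrip("/"),
        "room_id": env["MATRIX_ROOM_ID"],
        "access_token": env["MATRIX_ACCESS_TOKEN"],
        # Fallbacks below are assumptions that mirror the example file above.
        "db_path": env.get("DISKMON_DB", "/var/lib/diskmon/diskmon.sqlite3"),
        "mount": env.get("DISKMON_MOUNT", "/"),
    }
```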
---
### 4. systemd Timers
#### Sample Timer (every 5 min)
```
diskmon-sample.timer
```
#### Report Timer (daily)
```
diskmon-report.timer
```
---
## Data Flow
### Sampling Loop (every 5 minutes)
1. Read disk usage (`shutil.disk_usage`)
2. Insert sample into SQLite
3. Compare against:
   - 10-minute-old sample
   - 60-minute-old sample
4. Trigger alerts if thresholds exceeded
5. Apply cooldown logic
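
Steps 1 to 3 could be sketched roughly as follows. This assumes a `samples` table with the columns listed under Database Schema already exists; the function names are hypothetical. Note that `shutil.disk_usage` reports `free` as the space available to unprivileged users, which is why it is stored as `avail_bytes` here. The lookup returns the newest sample that is at least one window old, so after a long gap in sampling it may be much older than the window:

```python
import shutil
import sqlite3  # conn below is expected to be a sqlite3.Connection
import time


def record_sample(conn, mount, now=None):
    """Steps 1-2: read current usage and store it as one row."""
    now = int(time.time()) if now is None else now
    usage = shutil.disk_usage(mount)  # named tuple: total, used, free
    conn.execute(
        "INSERT INTO samples (ts, mount, used_bytes, avail_bytes, total_bytes) "
        "VALUES (?, ?, ?, ?, ?)",
        (now, mount, usage.used, usage.free, usage.total),
    )
    conn.commit()
    return usage.used


def used_bytes_before(conn, mount, now, window_seconds):
    """Step 3: used_bytes of the newest sample at least window_seconds old, or None."""
    row = conn.execute(
        "SELECT used_bytes FROM samples WHERE mount = ? AND ts <= ? "
        "ORDER BY ts DESC LIMIT 1",
        (mount, now - window_seconds),
    ).fetchone()
    return None if row is None else row[0]
```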
---
### Daily Report
1. Read current disk usage
2. Format summary
3. Send to Matrix
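
The final step could look roughly like this, using only the standard library. It is a sketch that assumes the Matrix client-server `v3` send endpoint and a plain-text `m.text` message; `build_send_request` and `send_message` are hypothetical names, and the actual script may send messages differently. The room ID is percent-encoded because it contains `!` and `:`:

```python
import json
import time
import urllib.parse
import urllib.request


def build_send_request(homeserver, room_id, access_token, text, txn_id=None):
    """Build (but do not send) a PUT request that posts a text message to a room."""
    # The transaction ID only needs to be unique per access token.
    txn_id = txn_id or f"diskmon-{time.time_ns()}"
    url = "{}/_matrix/client/v3/rooms/{}/send/m.room.message/{}".format(
        homeserver.rstrip("/"),
        urllib.parse.quote(room_id, safe=""),
        urllib.parse.quote(txn_id, safe=""),
    )
    body = json.dumps({"msgtype": "m.text", "body": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


def send_message(homeserver, room_id, access_token, text, timeout=15):
    """Send the message and return the event ID reported by the homeserver."""
    request = build_send_request(homeserver, room_id, access_token, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["event_id"]
```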
---
## Database Schema
### samples
|column|type|description|
|---|---|---|
|id|int|primary key|
|ts|int|unix timestamp|
|mount|text|mount path|
|used_bytes|int|used disk space|
|avail_bytes|int|free space|
|total_bytes|int|total capacity|
---
### alerts
|column|type|description|
|---|---|---|
|key|text|alert identifier|
|last_sent_ts|int|last time alert was triggered|
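
For reference, the two tables above could be created with something like the following. The column names come from the tables; the `NOT NULL` constraints, the primary key on `alerts.key`, and the `(mount, ts)` index are assumptions added for illustration, as are the names `SCHEMA` and `open_db`:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS samples (
    id          INTEGER PRIMARY KEY,
    ts          INTEGER NOT NULL,   -- unix timestamp
    mount       TEXT    NOT NULL,
    used_bytes  INTEGER NOT NULL,
    avail_bytes INTEGER NOT NULL,
    total_bytes INTEGER NOT NULL
);
-- Assumed index: speeds up "newest sample before time T" lookups per mount.
CREATE INDEX IF NOT EXISTS idx_samples_mount_ts ON samples (mount, ts);
CREATE TABLE IF NOT EXISTS alerts (
    key          TEXT PRIMARY KEY,  -- alert identifier
    last_sent_ts INTEGER NOT NULL
);
"""


def open_db(path):
    """Open the database and create any missing tables (safe to call repeatedly)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```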
---
## Alert Logic
### Thresholds
|Condition|Trigger|
|---|---|
|Warning|≥ 1 GiB increase in 10 minutes|
|Critical|≥ 10 GiB increase in 60 minutes|
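
The threshold table can be expressed as data plus one comparison. In this sketch, `RULES` and `triggered_rules` are hypothetical names; the caller is assumed to pass the current `used_bytes` and a mapping from window length (in seconds) to the `used_bytes` value of the older sample, or `None` when no sample that old exists yet:

```python
GIB = 1024 ** 3

# (alert key, window in seconds, minimum increase in bytes), per the table above.
RULES = (
    ("warning", 10 * 60, 1 * GIB),
    ("critical", 60 * 60, 10 * GIB),
)


def triggered_rules(current_used, previous_used_by_window):
    """Return [(key, increase_in_bytes)] for every rule whose threshold is met."""
    hits = []
    for key, window, threshold in RULES:
        previous = previous_used_by_window.get(window)
        if previous is None:
            continue  # not enough history for this window yet
        increase = current_used - previous
        if increase >= threshold:
            hits.append((key, increase))
    return hits
```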
---
### Cooldowns
|Alert Type|Cooldown|
|---|---|
|Warning|30 minutes|
|Critical|60 minutes|
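
A cooldown check against the `alerts` table might look like the sketch below. It assumes the alert `key` values are `"warning"` and `"critical"`, which the source does not state; `should_send` and `mark_sent` are hypothetical names. The update-then-insert pattern is used so the sketch does not depend on a unique constraint on `key`:

```python
import sqlite3  # conn below is expected to be a sqlite3.Connection

# Cooldown per alert type in seconds, per the table above.
COOLDOWNS = {"warning": 30 * 60, "critical": 60 * 60}


def should_send(conn, key, now):
    """True if this alert was never sent or its cooldown has elapsed."""
    row = conn.execute(
        "SELECT last_sent_ts FROM alerts WHERE key = ?", (key,)
    ).fetchone()
    return row is None or now - row[0] >= COOLDOWNS[key]


def mark_sent(conn, key, now):
    """Record that the alert was sent at `now`, creating the row if needed."""
    cursor = conn.execute(
        "UPDATE alerts SET last_sent_ts = ? WHERE key = ?", (now, key)
    )
    if cursor.rowcount == 0:
        conn.execute(
            "INSERT INTO alerts (key, last_sent_ts) VALUES (?, ?)", (key, now)
        )
    conn.commit()
```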
---
### Why cooldowns exist
Cooldowns prevent:
- Alert spam
- Repeated messages for same event
- Noise during sustained writes
---
## Message Formats
### Daily Report
```
[VPS Storage Report]
Mount: /
Used: 48.2 GiB
Available: 131.7 GiB
Total: 180.0 GiB
Usage: 26.8%
Timestamp: 2026-04-01 09:00:00 EDT
```
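A report in this shape could be produced by a small formatter like the one below. It is a sketch: `format_report` and `gib` are hypothetical names, the usage percentage is assumed to be `used / total` (which matches the 26.8% in the example), and the timestamp uses the local time zone of the machine:

```python
import time

GIB = 1024 ** 3


def gib(n_bytes):
    """Format a byte count as GiB with one decimal place."""
    return f"{n_bytes / GIB:.1f} GiB"


def format_report(mount, used, avail, total, ts=None):
    """Build the daily report text; ts is a unix timestamp (default: now)."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S %Z", time.localtime(ts))
    percent = 100.0 * used / total if total else 0.0
    return "\n".join([
        "[VPS Storage Report]",
        f"Mount: {mount}",
        f"Used: {gib(used)}",
        f"Available: {gib(avail)}",
        f"Total: {gib(total)}",
        f"Usage: {percent:.1f}%",
        f"Timestamp: {stamp}",
    ])
```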
---
### Alert
```
[Storage Alert]
Mount: /
Used space increased by 1.4 GiB in 10 minutes
Previous used: 48.2 GiB
Current used: 49.6 GiB
Timestamp: 2026-04-01 09:40:00 EDT
```
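The alert text follows the same pattern. In this sketch, `format_alert` is a hypothetical name and the window is passed in minutes so the same function can serve both the 10-minute and 60-minute checks:

```python
import time

GIB = 1024 ** 3


def format_alert(mount, previous_used, current_used, window_minutes, ts=None):
    """Build the alert text from two used-bytes readings taken one window apart."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S %Z", time.localtime(ts))
    increase = current_used - previous_used
    return "\n".join([
        "[Storage Alert]",
        f"Mount: {mount}",
        f"Used space increased by {increase / GIB:.1f} GiB in {window_minutes} minutes",
        f"Previous used: {previous_used / GIB:.1f} GiB",
        f"Current used: {current_used / GIB:.1f} GiB",
        f"Timestamp: {stamp}",
    ])
```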
---
## Monitoring the System
### Check timers
```
systemctl list-timers | grep diskmon
```
---
### Check logs
#### Sample job
```
journalctl -u diskmon-sample.service -f
```
#### Report job
```
journalctl -u diskmon-report.service -f
```
---
### Run manually
```
systemctl start diskmon-sample.service
systemctl start diskmon-report.service
```
---
### Check service status
```
systemctl status diskmon-sample.service
systemctl status diskmon-report.service
```
---
## Debugging
### 1. Environment variables not found
**Symptom:**
```
KeyError: 'MATRIX_HOMESERVER'
```
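**Fix:** When running the script by hand, export the variables into the current shell first. Under systemd, confirm that the service unit actually loads the file, for example with `EnvironmentFile=/etc/diskmon.env`.
```
set -a
source /etc/diskmon.env
set +a
```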
---
### 2. SQLite errors
**Symptom:**
```
sqlite3.OperationalError
```
**Fix:**
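- Check the SQL syntax of the statement named in the error
- If the database itself is damaged, delete it so it can be recreated (this discards the sample history and alert cooldown state):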
```
rm /var/lib/diskmon/diskmon.sqlite3
```
---
### 3. No Matrix messages
Check:
- correct homeserver URL
- valid access token
- correct room ID
- homeserver URL uses `https://`
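
Several of these checks can be scripted. The sketch below assumes the Matrix client-server `v3` `whoami` endpoint (older homeservers may expose a different path prefix); `config_problems`, `build_whoami_request`, and `token_owner` are hypothetical names. A room ID starts with `!`, while a value starting with `#` is a room alias, which is a common mix-up:

```python
import json
import urllib.error
import urllib.request


def config_problems(homeserver, room_id, access_token):
    """Return a list of obvious configuration mistakes (empty if none found)."""
    problems = []
    if not homeserver.startswith("https://"):
        problems.append("homeserver URL does not start with https://")
    if not room_id.startswith("!"):
        problems.append("room ID should start with '!' (a '#' value is an alias)")
    if not access_token:
        problems.append("access token is empty")
    return problems


def build_whoami_request(homeserver, access_token):
    """Build (but do not send) a request asking which user owns the token."""
    url = homeserver.rstrip("/") + "/_matrix/client/v3/account/whoami"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {access_token}"}
    )


def token_owner(homeserver, access_token, timeout=15):
    """Return the user ID for the token, or None if the server rejects it."""
    request = build_whoami_request(homeserver, access_token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response)["user_id"]
    except urllib.error.HTTPError:
        return None
```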
---
### 4. Script not running
Check that the timer is loaded, enabled, and active:
```
systemctl status diskmon-sample.timer
```
---
## Testing Alerts
### Trigger disk usage spike
```
fallocate -l 2G /tmp/testfile
```
Wait ~5 to 10 minutes for the next samples to be recorded.
Cleanup:
```
rm /tmp/testfile
```
---
## Maintenance
### View database
```
sqlite3 /var/lib/diskmon/diskmon.sqlite3
```
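The same data can be read from Python without risk of modifying it. This sketch opens the file read-only through a SQLite URI; `recent_samples` is a hypothetical helper and assumes the `samples` table described under Database Schema:

```python
import sqlite3


def recent_samples(db_path, limit=5):
    """Return the newest rows as (ts, mount, used_bytes), newest first."""
    # mode=ro makes SQLite refuse any write through this connection.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return conn.execute(
            "SELECT ts, mount, used_bytes FROM samples ORDER BY ts DESC LIMIT ?",
            (limit,),
        ).fetchall()
    finally:
        conn.close()
```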
---
### Clean old data
Handled automatically: the script keeps roughly 2 days of samples and removes older ones.
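
Pruning of this kind usually comes down to a single `DELETE`. The sketch below assumes a retention of exactly 48 hours and is not the script's actual cleanup code; `prune_samples` and `RETENTION_SECONDS` are hypothetical names:

```python
import sqlite3  # conn below is expected to be a sqlite3.Connection

RETENTION_SECONDS = 2 * 24 * 60 * 60  # assumed: exactly two days


def prune_samples(conn, now, retention=RETENTION_SECONDS):
    """Delete samples older than the retention period; return rows removed."""
    cursor = conn.execute("DELETE FROM samples WHERE ts < ?", (now - retention,))
    conn.commit()
    return cursor.rowcount
```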
---
## Extending the System
### Possible improvements
- Monitor multiple mounts
- Add low disk space alerts (e.g. < 20 GiB available)
- Send HTML-formatted Matrix messages
- Integrate with Uptime Kuma push monitor
- Add inode monitoring
- Add disk I/O rate tracking
---
## Design Philosophy
This system intentionally avoids:
- Prometheus
- external monitoring stacks
- heavy dependencies
Instead it focuses on:
- clarity
- reliability
- minimalism
- full control over alert logic
---
## Summary
This setup provides:
- Continuous disk monitoring
- Time-window-based change detection
- Daily reporting
- Matrix integration
- Minimal operational overhead
All in a single script plus systemd timers.
---