From dd05ce437f25a4a996217224e7052058b3bd6edb Mon Sep 17 00:00:00 2001
From: "Mr.Bowtie"
Date: Thu, 2 Apr 2026 11:59:03 +0000
Subject: [PATCH] Update README.md

---
 README.md | 442 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 442 insertions(+)

diff --git a/README.md b/README.md
index e69de29..5c99922 100644
--- a/README.md
+++ b/README.md
@@ -0,0 +1,442 @@
+## Overview
+
+This project provides lightweight disk monitoring for a VPS with:
+
+- Daily storage reports
+
+- Rapid disk usage change detection
+
+- Alerts sent to a Matrix room
+
+- Minimal dependencies (Python + SQLite only)
+
+
+It is designed to be:
+
+- Simple
+
+- Transparent
+
+- Easy to debug
+
+- Low overhead
+
+
+---
+
+## Architecture
+
+```
+systemd timers
+    ↓
+Python script (disk_monitor.py)
+    ↓
+SQLite (local state/history)
+    ↓
+Matrix API (alerts + reports)
+```
+
+---
+
+## Components
+
+### 1. Python Script
+
+**Location:**
+
+```
+/opt/diskmon/disk_monitor.py
+```
+
+**Responsibilities:**
+
+- Collect disk usage stats
+
+- Store historical samples
+
+- Detect rapid changes
+
+- Format messages
+
+- Send messages to Matrix
+
+
+---
+
+### 2. SQLite Database
+
+**Location:**
+
+```
+/var/lib/diskmon/diskmon.sqlite3
+```
+
+**Purpose:**
+
+- Store disk usage history
+
+- Track alert cooldowns
+
+
+---
+
+### 3. Environment Config
+
+**Location:**
+
+```
+/etc/diskmon.env
+```
+
+**Contents:**
+
+```
+MATRIX_HOMESERVER=https://matrix.yourdomain.com
+MATRIX_ROOM_ID=!roomid:yourdomain.com
+MATRIX_ACCESS_TOKEN=your_token
+DISKMON_DB=/var/lib/diskmon/diskmon.sqlite3
+DISKMON_MOUNT=/
+```
+
+---
+
+### 4. systemd Timers
+
+#### Sample Timer (every 5 min)
+
+```
+diskmon-sample.timer
+```
+
+#### Report Timer (daily)
+
+```
+diskmon-report.timer
+```
+
+---
+
+## Data Flow
+
+### Sampling Loop (every 5 minutes)
+
+1. Read disk usage (`shutil.disk_usage`)
+
+2. Insert sample into SQLite
+
+3. Compare against:
+
+    - 10-minute-old sample
+
+    - 60-minute-old sample
+
+4. Trigger alerts if thresholds are exceeded
+
+5. Apply cooldown logic
+
+
+---
+
+### Daily Report
+
+1. Read current disk usage
+
+2. Format summary
+
+3. Send to Matrix
+
+
+---
+
+## Database Schema
+
+### samples
+
+|column|type|description|
+|---|---|---|
+|id|int|primary key|
+|ts|int|unix timestamp of the sample|
+|mount|text|mount path|
+|used_bytes|int|used space in bytes|
+|avail_bytes|int|available space in bytes|
+|total_bytes|int|total capacity in bytes|
+
+---
+
+### alerts
+
+|column|type|description|
+|---|---|---|
+|key|text|alert identifier|
+|last_sent_ts|int|unix timestamp of the last time this alert was sent|
+
+---
+
+## Alert Logic
+
+### Thresholds
+
+|Condition|Trigger|
+|---|---|
+|Warning|≥ 1 GiB increase in 10 minutes|
+|Critical|≥ 10 GiB increase in 60 minutes|
+
+---
+
+### Cooldowns
+
+|Alert Type|Cooldown|
+|---|---|
+|Warning|30 minutes|
+|Critical|60 minutes|
+
+---
+
+### Why cooldowns exist
+
+Cooldowns prevent:
+
+- Alert spam
+
+- Repeated messages for the same event
+
+- Noise during sustained writes
+
+
+---
+
+## Message Formats
+
+### Daily Report
+
+```
+[VPS Storage Report]
+Mount: /
+Used: 48.2 GiB
+Available: 131.7 GiB
+Total: 180.0 GiB
+Usage: 26.8%
+Timestamp: 2026-04-01 09:00:00 EDT
+```
+
+---
+
+### Alert
+
+```
+[Storage Alert]
+Mount: /
+Used space increased by 1.4 GiB in 10 minutes
+Previous used: 48.2 GiB
+Current used: 49.6 GiB
+Timestamp: 2026-04-01 09:40:00 EDT
+```
+
+---
+
+## Monitoring the System
+
+### Check timers
+
+```
+systemctl list-timers | grep diskmon
+```
+
+---
+
+### Check logs
+
+#### Sample job
+
+```
+journalctl -u diskmon-sample.service -f
+```
+
+#### Report job
+
+```
+journalctl -u diskmon-report.service -f
+```
+
+---
+
+### Run manually
+
+```
+systemctl start diskmon-sample.service
+systemctl start diskmon-report.service
+```
+
+---
+
+### Check service status
+
+```
+systemctl status diskmon-sample.service
+systemctl status diskmon-report.service
+```
+
+---
+
+## Debugging
+
+### 1. Environment variables not found
+
+**Symptom:**
+
+```
+KeyError: 'MATRIX_HOMESERVER'
+```
+
+**Fix (when running the script manually from a shell):**
+
+```
+set -a
+source /etc/diskmon.env
+set +a
+```
+
+---
+
+### 2. SQLite errors
+
+**Symptom:**
+
+```
+sqlite3.OperationalError
+```
+
+**Fix:**
+
+- Check SQL syntax
+
+- Delete the DB and recreate it if needed (this discards sample history and cooldown state):
+
+
+```
+rm /var/lib/diskmon/diskmon.sqlite3
+```
+
+---
+
+### 3. No Matrix messages
+
+Check:
+
+- correct homeserver URL
+
+- valid access token
+
+- correct room ID
+
+- homeserver URL uses HTTPS
+
+
+---
+
+### 4. Script not running
+
+```
+systemctl status diskmon-sample.timer
+```
+
+---
+
+## Testing Alerts
+
+### Trigger disk usage spike
+
+```
+fallocate -l 2G /tmp/testfile
+```
+
+Wait ~5–10 minutes. The spike only registers if `/tmp` is on the monitored mount (not a tmpfs).
+
+Cleanup:
+
+```
+rm /tmp/testfile
+```
+
+---
+
+## Maintenance
+
+### View database
+
+```
+sqlite3 /var/lib/diskmon/diskmon.sqlite3
+```
+
+---
+
+### Clean old data
+
+Handled automatically:
+
+- keeps ~2 days of samples
+
+
+---
+
+## Extending the System
+
+### Possible improvements
+
+- Monitor multiple mounts
+
+- Add low disk space alerts (e.g. <20GB)
+
+- Send HTML-formatted Matrix messages
+
+- Integrate with Uptime Kuma push monitor
+
+- Add inode monitoring
+
+- Add disk I/O rate tracking
+
+
+---
+
+## Design Philosophy
+
+This system intentionally avoids:
+
+- Prometheus
+
+- external monitoring stacks
+
+- heavy dependencies
+
+
+Instead, it focuses on:
+
+- clarity
+
+- reliability
+
+- minimalism
+
+- full control over alert logic
+
+
+---
+
+## Summary
+
+This setup provides:
+
+- Continuous disk monitoring
+
+- Time-window-based change detection
+
+- Daily reporting
+
+- Matrix integration
+
+- Minimal operational overhead
+
+
+All in roughly one script plus systemd.
+
+---
\ No newline at end of file
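
## Example: sampling and alert logic (sketch)

The sampling loop, thresholds, and cooldowns described in this README could be implemented roughly as follows. This is a minimal sketch, not the actual contents of `disk_monitor.py`: the function names, the `RULES` tuple layout, the `"<level>:<mount>"` alert key format, and the choice to compare against the newest sample taken at or before the start of each window are all assumptions made for illustration. The table and column names follow the Database Schema section.

```python
import shutil
import sqlite3
import time

GIB = 1024 ** 3

# (level, window seconds, threshold bytes, cooldown seconds), taken from the
# Thresholds and Cooldowns tables. The tuple layout itself is an assumption.
RULES = [
    ("warning", 10 * 60, 1 * GIB, 30 * 60),
    ("critical", 60 * 60, 10 * GIB, 60 * 60),
]


def init_db(conn):
    """Create the two tables described under Database Schema."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples ("
        "id INTEGER PRIMARY KEY, ts INTEGER, mount TEXT, "
        "used_bytes INTEGER, avail_bytes INTEGER, total_bytes INTEGER)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS alerts ("
        "key TEXT PRIMARY KEY, last_sent_ts INTEGER)"
    )


def record_sample(conn, ts, mount, used, avail, total):
    """Insert one usage sample."""
    conn.execute(
        "INSERT INTO samples (ts, mount, used_bytes, avail_bytes, total_bytes) "
        "VALUES (?, ?, ?, ?, ?)",
        (ts, mount, used, avail, total),
    )


def used_at_or_before(conn, mount, ts):
    """Return used_bytes of the newest sample taken at or before ts, or None."""
    row = conn.execute(
        "SELECT used_bytes FROM samples WHERE mount = ? AND ts <= ? "
        "ORDER BY ts DESC LIMIT 1",
        (mount, ts),
    ).fetchone()
    return row[0] if row else None


def check_alerts(conn, mount, now, used_now):
    """Return the levels that should fire now and record their send time."""
    fired = []
    for level, window, threshold, cooldown in RULES:
        previous = used_at_or_before(conn, mount, now - window)
        if previous is None or used_now - previous < threshold:
            continue  # no baseline yet, or growth is below the threshold
        alert_key = f"{level}:{mount}"  # hypothetical key format
        row = conn.execute(
            "SELECT last_sent_ts FROM alerts WHERE key = ?", (alert_key,)
        ).fetchone()
        if row is not None and now - row[0] < cooldown:
            continue  # still inside the cooldown period
        conn.execute(
            "INSERT OR REPLACE INTO alerts (key, last_sent_ts) VALUES (?, ?)",
            (alert_key, now),
        )
        fired.append(level)
    return fired


def sample_once(conn, mount="/", now=None):
    """Take one real sample with shutil.disk_usage and evaluate the rules."""
    now = int(time.time()) if now is None else now
    usage = shutil.disk_usage(mount)  # named tuple: total, used, free
    record_sample(conn, now, mount, usage.used, usage.free, usage.total)
    return check_alerts(conn, mount, now, usage.used)
```

In this sketch, a timer-driven run would open the database with `sqlite3.connect(...)`, call `init_db`, then `sample_once`, commit, and send one Matrix message per returned level. Using an in-memory database (`sqlite3.connect(":memory:")`) with hand-written timestamps makes the threshold and cooldown behaviour easy to check without filling a real disk.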