## Overview

This project provides lightweight disk monitoring for a VPS with:

- Daily storage reports
- Rapid disk usage change detection
- Alerts sent to a Matrix room
- Minimal dependencies (Python + SQLite only)

It is designed to be:

- Simple
- Transparent
- Easy to debug
- Low overhead

---

## Architecture

```
systemd timers
      ↓
Python script (disk_monitor.py)
      ↓
SQLite (local state/history)
      ↓
Matrix API (alerts + reports)
```

---

## Components

### 1. Python Script

**Location:**

```
/opt/diskmon/disk_monitor.py
```

**Responsibilities:**

- Collect disk usage stats
- Store historical samples
- Detect rapid changes
- Format messages
- Send messages to Matrix

---

### 2. SQLite Database

**Location:**

```
/var/lib/diskmon/diskmon.sqlite3
```

**Purpose:**

- Store disk usage history
- Track alert cooldowns

---

### 3. Environment Config

**Location:**

```
/etc/diskmon.env
```

**Contents:**

```
MATRIX_HOMESERVER=https://matrix.yourdomain.com
MATRIX_ROOM_ID=!roomid:yourdomain.com
MATRIX_ACCESS_TOKEN=your_token
DISKMON_DB=/var/lib/diskmon/diskmon.sqlite3
DISKMON_MOUNT=/
```

---

### 4. systemd Timers

#### Sample Timer (every 5 min)

```
diskmon-sample.timer
```

#### Report Timer (daily)

```
diskmon-report.timer
```

---

## Data Flow

### Sampling Loop (every 5 minutes)

1. Read disk usage (`shutil.disk_usage`)
2. Insert sample into SQLite
3. Compare against:
   - 10-minute-old sample
   - 60-minute-old sample
4. Trigger alerts if thresholds are exceeded
5. Apply cooldown logic

---

### Daily Report

1. Read current disk usage
2. Format summary
3. Send to Matrix

---

## Database Schema

### samples

|column|type|description|
|---|---|---|
|id|int|primary key|
|ts|int|unix timestamp|
|mount|text|mount path|
|used_bytes|int|used disk space|
|avail_bytes|int|free space|
|total_bytes|int|total capacity|

---

### alerts

|column|type|description|
|---|---|---|
|key|text|alert identifier|
|last_sent_ts|int|last time alert was triggered|

---

## Alert Logic

### Thresholds

|Condition|Trigger|
|---|---|
|Warning|≥ 1 GiB increase in 10 minutes|
|Critical|≥ 10 GiB increase in 60 minutes|

---

### Cooldowns

|Alert Type|Cooldown|
|---|---|
|Warning|30 minutes|
|Critical|60 minutes|

---

### Why cooldowns exist

Cooldowns prevent:

- Alert spam
- Repeated messages for the same event
- Noise during sustained writes

---

## Message Formats

### Daily Report

```
[VPS Storage Report]
Mount: /
Used: 48.2 GiB
Available: 131.7 GiB
Total: 180.0 GiB
Usage: 26.8%
Timestamp: 2026-04-01 09:00:00 EDT
```

---

### Alert

```
[Storage Alert]
Mount: /
Used space increased by 1.4 GiB in 10 minutes
Previous used: 48.2 GiB
Current used: 49.6 GiB
Timestamp: 2026-04-01 09:40:00 EDT
```

---

## Monitoring the System

### Check timers

```
systemctl list-timers | grep diskmon
```

---

### Check logs

#### Sample job

```
journalctl -u diskmon-sample.service -f
```

#### Report job

```
journalctl -u diskmon-report.service -f
```

---

### Run manually

```
systemctl start diskmon-sample.service
systemctl start diskmon-report.service
```

---

### Check service status

```
systemctl status diskmon-sample.service
systemctl status diskmon-report.service
```

---

## Debugging

### 1. Environment variables not found

**Symptom:**

```
KeyError: 'MATRIX_HOMESERVER'
```

**Fix:**

Export the variables into the current shell before running the script by hand:

```
set -a
source /etc/diskmon.env
set +a
```

---

### 2. SQLite errors

**Symptom:**

```
sqlite3.OperationalError
```

**Fix:**

- Check SQL syntax
- Delete the DB and recreate it if needed (this discards the stored history and cooldown state):

```
rm /var/lib/diskmon/diskmon.sqlite3
```

---

### 3. No Matrix messages

Check:

- correct homeserver URL
- valid access token
- correct room ID
- HTTPS is used

---

### 4. Script not running

Check that the timer is active:

```
systemctl status diskmon-sample.timer
```

---

## Testing Alerts

### Trigger disk usage spike

```
fallocate -l 2G /tmp/testfile
```

Wait ~5–10 minutes.

Cleanup:

```
rm /tmp/testfile
```

---

## Maintenance

### View database

```
sqlite3 /var/lib/diskmon/diskmon.sqlite3
```

---

### Clean old data

Handled automatically:

- keeps ~2 days of samples

---

## Extending the System

### Possible improvements

- Monitor multiple mounts
- Add low disk space alerts (e.g. < 20 GB)
- Send HTML-formatted Matrix messages
- Integrate with Uptime Kuma push monitor
- Add inode monitoring
- Add disk I/O rate tracking

---

## Design Philosophy

This system intentionally avoids:

- Prometheus
- external monitoring stacks
- heavy dependencies

Instead it focuses on:

- clarity
- reliability
- minimalism
- full control over alert logic

---

## Summary

This setup provides:

- Continuous disk monitoring
- Time-window-based change detection
- Daily reporting
- Matrix integration
- Minimal operational overhead

All in ~1 script + systemd.

---
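
## Appendix: Sampling and Alert Sketch

The sampling loop, schema, thresholds, and cooldowns described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `disk_monitor.py`: the function names (`init_db`, `record_sample`, `used_before`, `due_alerts`), the `RULES` layout, and the `"<level>:<mount>"` alert key format are all assumptions made for the example. Sending to Matrix is left out; the sketch only decides which alerts are due.

```python
import shutil
import sqlite3
import time

GIB = 1024 ** 3

# (level, window seconds, threshold bytes, cooldown seconds), taken from the
# Thresholds and Cooldowns tables. The tuple layout itself is an assumption.
RULES = [
    ("warning", 10 * 60, 1 * GIB, 30 * 60),
    ("critical", 60 * 60, 10 * GIB, 60 * 60),
]


def init_db(conn):
    """Create the two tables from the Database Schema section."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS samples (
            id INTEGER PRIMARY KEY,
            ts INTEGER NOT NULL,
            mount TEXT NOT NULL,
            used_bytes INTEGER NOT NULL,
            avail_bytes INTEGER NOT NULL,
            total_bytes INTEGER NOT NULL
        );
        CREATE TABLE IF NOT EXISTS alerts (
            key TEXT PRIMARY KEY,
            last_sent_ts INTEGER NOT NULL
        );
    """)


def record_sample(conn, mount, now=None, usage=None):
    """Insert one sample; `now` and `usage` can be injected for testing."""
    now = int(time.time()) if now is None else now
    total, used, free = shutil.disk_usage(mount) if usage is None else usage
    conn.execute(
        "INSERT INTO samples (ts, mount, used_bytes, avail_bytes, total_bytes) "
        "VALUES (?, ?, ?, ?, ?)",
        (now, mount, used, free, total),
    )
    conn.commit()
    return now, used


def used_before(conn, mount, cutoff_ts):
    """Used bytes from the newest sample taken at or before cutoff_ts, or None."""
    row = conn.execute(
        "SELECT used_bytes FROM samples WHERE mount = ? AND ts <= ? "
        "ORDER BY ts DESC LIMIT 1",
        (mount, cutoff_ts),
    ).fetchone()
    return row[0] if row else None


def due_alerts(conn, mount, now, used_now):
    """Return [(level, delta_bytes)] for rules that fire and are off cooldown."""
    fired = []
    for level, window, threshold, cooldown in RULES:
        old = used_before(conn, mount, now - window)
        if old is None or used_now - old < threshold:
            continue  # no sample that old yet, or growth below threshold
        key = f"{level}:{mount}"  # hypothetical key format
        row = conn.execute(
            "SELECT last_sent_ts FROM alerts WHERE key = ?", (key,)
        ).fetchone()
        if row and now - row[0] < cooldown:
            continue  # still cooling down, suppress the repeat
        conn.execute(
            "INSERT OR REPLACE INTO alerts (key, last_sent_ts) VALUES (?, ?)",
            (key, now),
        )
        fired.append((level, used_now - old))
    conn.commit()
    return fired
```

With an in-memory database (`sqlite3.connect(":memory:")`) and injected timestamps, a 2 GiB jump between two samples taken 10 minutes apart yields one warning, and a further jump five minutes later is suppressed by the 30-minute cooldown. One simplification to be aware of: `used_before` picks the newest sample that is at least one window old, so after a gap in sampling it may compare against a sample much older than the window.

---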