diff --git a/README.md b/README.md index d7eb871..261c75b 100644 --- a/README.md +++ b/README.md @@ -1,19 +1,20 @@ -Cleanmedia -========== +# Cleanmedia A poor man's data retention policy for dendrite servers. -USAGE -===== +## USAGE Check the command line options with --help. You mainly pass it the dendrite configuration file as a means to find a) the media directory and b) the postgres credentials for the dendrite data base. -You can also pass in the number of days you want to keep remote media. +You can also pass in the number of days you want to keep remote +media. Optionally, you may also purge media from local users on the +homeserver. -How it works: -------------- +### How it works + +#### Purge remote media (default) cleanmedia scours the database for all entries in the media repository where user_id is an empty string (that is, the media was not uploaded @@ -24,40 +25,55 @@ configurable via command line and a default of 30 days) This includes a number of remote media that we might want to keep (e.g. avatar images of users on remote home servers). -But the main idea behind focusing on remote media is that a server -should be able to refetch remote media in case it is needed. It would -also make sense to delete local media, but that is more -complicated. (possible scenarios: local media older than Y days, rooms -that have been left by all users and are thus "unreachable", rooms that -have been upgraded but have users left in it, media that has not been "accessed" the last Y days,....) +The main idea behind focusing on remote media is that a server +should be able to refetch remote media in case it is needed. -But finding out these things and setting all these policies will be -way more difficult and in some cases we do not have the information -we'd need (e.g. if a media is part of an avatar image, or when media -has been accessed the last time). +#### Purging "local" media (optional) -In addition it performs some sanity checks and warns if inconsistencies occur: - 1) Are there thumbnails in the db that do not have - corresponding media file entries (in the db)? +It also makes sense to delete local media, and it is possible using the +option -l, but that is more complicated. (Local means, originating by +users on our homeserver.) -Requirements ----- +a) we might be the only source of our user's media, so any local media +that we purge might not be retrievable by anyone anymore - ever. + +b) it is not easy to decide which local media are safe to purge. + +Possible scenarios: local media older than Y days, rooms that have been +left by all users and are thus "unreachable", rooms that have been +upgraded but have users left in it, media that has not been "accessed" +the last Y days, .... + +Finding out these things and setting all these policies is way more +difficult and in some cases we do not have the information we'd need +(e.g. when media has been accessed the last time). + +Right now, we purge all older local media, except for user avatar +images. + +#### Sanity checks + +In addition, we perform some sanity checks and warns if inconsistencies +occur: + +1) Are there thumbnails in the db that do not have corresponding media + file entries (in the db)? + +## Requirements - Python >= 3.8 - psycopg2 - yaml -Todo ----- +## Todo - Sanity checks: Are files on the file system that the db does not know about? -LICENSE -======= +## LICENSE This code is released under the GNU GPL v3 or any later version. -There is no warranty for correctness or data that might be -accidentally deleted. Assume the worst and hope for the best! \ No newline at end of file +**There is no warranty for correctness or data that might be +accidentally deleted. Assume the worst and hope for the best!** \ No newline at end of file