cross-posted from: https://lemmy.world/post/28808772
Finally released an alpha build for the PeerTube recommendation algorithm!
Basic UI is complete. If you want to try it out, the link is here:
👉 https://github.com/solidheron/peertube_recomendation_algorythm
New features since the last build:
- Sort by videos that share your time engagement similarity.
- Sort by videos that share your like similarity.
- Display of like similarity cosine values.
- Basic information shown for recommended videos (title, account, and channel names).
- 404 check for generated instance links (so you don’t get stuck clicking into dead videos—you’ll know which instance hosts the video).
- De-ranking for previously seen videos (simply a 0.5x multiplier on time and like similarity).
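To make the de-ranking concrete, here is a minimal sketch of how a 0.5x multiplier could be applied to similarity scores before sorting. The function names, the video IDs, and the idea of keeping scores in a dict are assumptions for illustration only, not code taken from the repo.

```python
SEEN_MULTIPLIER = 0.5  # the 0.5x de-ranking factor described in the post


def derank_seen(scores, seen_ids, multiplier=SEEN_MULTIPLIER):
    """Return a new dict where already-seen videos have their score scaled down.

    `scores` maps a (hypothetical) video id to a similarity value.
    """
    return {
        video_id: (sim * multiplier if video_id in seen_ids else sim)
        for video_id, sim in scores.items()
    }


def rank(scores):
    """Return video ids sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Made-up like-similarity values for three videos.
like_similarity = {"video_a": 0.80, "video_b": 0.50, "video_c": 0.30}
seen = {"video_a"}

adjusted = derank_seen(like_similarity, seen)
ranking = rank(adjusted)
```

In this toy data, the seen video drops from first place to second, but it is not removed from the list entirely.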
Features from previous builds:
- Ability to input multiple instance domain names (DNs) and generate playable video links.
- Limit of 5 recommendations per channel to avoid floods (e.g., during testing, The Linux Experiment would dominate otherwise—this limit is more of a failsafe than a feature).
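A per-channel cap like this can be sketched in a few lines. This assumes the recommendations are already sorted best-first and that each one carries a channel name; the dict layout and names below are hypothetical, not the repo's actual data structure.

```python
from collections import defaultdict

MAX_PER_CHANNEL = 5  # the limit mentioned in the post


def cap_per_channel(ranked_videos, limit=MAX_PER_CHANNEL):
    """Keep at most `limit` videos per channel, preserving the ranking order."""
    counts = defaultdict(int)
    kept = []
    for video in ranked_videos:
        channel = video["channel"]
        if counts[channel] < limit:
            kept.append(video)
            counts[channel] += 1
    return kept


# Eight videos from one channel plus one video from another channel.
ranked = [
    {"title": f"Linux video {i}", "channel": "linux_channel"} for i in range(8)
]
ranked.insert(3, {"title": "Cooking video", "channel": "cooking_channel"})

capped = cap_per_channel(ranked)
```

Because the cap is applied after ranking, the five best-scoring videos of a dominant channel survive and the rest are simply skipped.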
Personal thoughts:
I still think cosine similarity beats chronological algorithms.
This algorithm also synergizes with other algorithms: it’s great for finding videos that appear next to or below what you’re currently watching.
You can also revisit videos you previously liked to help strengthen your like similarity vectors.
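For anyone unfamiliar with cosine similarity, here is a minimal sketch. It assumes each vector is a plain word count built from text such as a title; the tokenizer, function names, and example strings are illustrative assumptions, not the repo's implementation.

```python
import math
from collections import Counter


def word_vector(text):
    """Turn text into a word-count vector (naive whitespace tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors, 0.0 if either is empty."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# A stand-in for the user's like vector and two candidate videos.
liked = word_vector("open source video editing tutorial")
candidate_close = word_vector("video editing tutorial for beginners")
candidate_far = word_vector("kernel scheduler benchmarks")

close_score = cosine_similarity(liked, candidate_close)  # shares 3 of 5 words
far_score = cosine_similarity(liked, candidate_far)      # shares no words
```

A candidate that shares no words with the like vector scores exactly 0, which is the "orthogonal" case: it can never rank above a video with any word overlap.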
Moving forward: basic design philosophies and current issues
There’s an issue I’m calling the “Linux pipeline.”
Basically, Linux-related videos tend to dominate PeerTube’s well-produced content.
Since the algorithm relies on English words in descriptions, titles, and tags, Linux videos (which sometimes have fewer general keywords) end up being more “orthogonal” to typical user vectors, causing lower ranking.
Another challenge:
It’s really hard to properly combine like cosine similarity and time engagement cosine similarity.
You could add them, but it doesn’t fully make sense:
- High like similarity + high time engagement similarity = you probably like and will watch the video longer.
- But short videos can be liked even if they contribute almost nothing to time engagement (because time engagement is based on percentage watched × video length).
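A small worked example of the mismatch, as a sketch only. It takes "percentage watched × video length" literally and uses a fixed-weight sum as the naive way of adding the two similarities; the function names and the 50/50 weight are assumptions for illustration.

```python
def time_engagement(fraction_watched, length_seconds):
    """Engagement as described in the post: fraction watched times video length."""
    return fraction_watched * length_seconds


def naive_combined(like_sim, time_sim, like_weight=0.5):
    """Fixed-weight sum of the two similarities (the weight is an arbitrary guess)."""
    return like_weight * like_sim + (1 - like_weight) * time_sim


# A fully watched 30-second clip vs. half of a 20-minute video.
short_clip = time_engagement(1.0, 30)
long_video = time_engagement(0.5, 1200)

# A short video the user would clearly like, but with little time signal.
combined = naive_combined(like_sim=0.9, time_sim=0.1)
```

The fully watched short clip contributes 30 seconds of engagement against 600 for the half-watched long video, so a strongly liked short video ends up with a middling combined score under any fixed weight.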
If I combined them, it would basically enter machine learning territory:
You’d have to adjust proportions dynamically based on user behavior.
Since I want this algorithm scoped to one person only (no data sharing yet), that level of ML is out of scope for now.
(Sharing data across devices could come later: Brave browser has sync features, and PeerTube watch history syncing could be possible.)
Summary:
Most of the data structure is settling into place.
Future updates will probably focus on expanding the data structure and making small improvements.
I don’t even know what “it” is. A recommendation algorithm? But PeerTube already has a “similar” video section to the right of all videos. Does this replace that? Techies really have a problem with presenting stuff to laymen.
Yeah the algo recommendation