cross-posted from: https://lemmy.world/post/28808772

Finally released an alpha build for the PeerTube recommendation algorithm!
Basic UI is complete. If you want to try it out, the link is here:
👉 https://github.com/solidheron/peertube_recomendation_algorythm

New features since the last build:

  • Sort videos by time engagement similarity.
  • Sort videos by like similarity.
  • Display of the cosine values for like similarity.
  • Basic information shown for recommended videos (title, account, and channel names).
  • 404 check for generated instance links (so you don’t get stuck clicking into dead videos—you’ll know which instance hosts the video).
  • De-ranking for previously seen videos (simply a 0.5x multiplier on time and like similarity).
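
For anyone curious how the de-ranking fits together with cosine similarity, here is a minimal Python sketch. It is not the project's actual code: the function names and the sparse word-to-weight dictionaries are assumptions made up for illustration. Only the 0.5x multiplier comes from the feature list above.

```python
import math

SEEN_MULTIPLIER = 0.5  # from the post: previously seen videos get a 0.5x multiplier


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse word -> weight vectors (assumed layout)."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_score(user_vec: dict[str, float], video_vec: dict[str, float], seen: bool) -> float:
    """Similarity score, halved when the video was already watched."""
    score = cosine_similarity(user_vec, video_vec)
    return score * SEEN_MULTIPLIER if seen else score
```

With this shape, a seen video only stays near the top if its similarity is more than twice that of the unseen videos around it.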

Features from previous builds:

  • Ability to input multiple instance domain names (DNs) and generate playable video links.
  • Limit of 5 recommendations per channel to avoid floods (e.g., during testing, The Linux Experiment would dominate otherwise—this limit is more of a failsafe than a feature).
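
The per-channel limit could look roughly like the sketch below. The `(video_id, channel)` tuple layout and the function name are assumptions for the example, not the repo's data structure; the limit of 5 is the only value taken from the post.

```python
from collections import defaultdict

MAX_PER_CHANNEL = 5  # limit stated in the post


def cap_per_channel(ranked, limit=MAX_PER_CHANNEL):
    """Keep at most `limit` videos per channel, preserving rank order.

    `ranked` is a list of (video_id, channel) pairs, best match first
    (a hypothetical layout chosen for this sketch).
    """
    counts = defaultdict(int)
    kept = []
    for video_id, channel in ranked:
        if counts[channel] < limit:
            counts[channel] += 1
            kept.append((video_id, channel))
    return kept
```

Because the list is walked in rank order, each channel keeps its five best-scoring videos and everything after that is dropped.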

Personal thoughts:
I still think cosine similarity beats chronological algorithms.
This algorithm also synergizes with other algorithms—it’s great for finding videos that appear next to or below what you’re currently watching.

You can also revisit videos you previously liked to help strengthen your like similarity vectors.


Moving forward: basic design philosophies and current issues

There’s an issue I’m calling the “Linux pipeline.”
Basically, Linux-related videos tend to dominate PeerTube’s well-produced content.
Since the algorithm relies on English words in descriptions, titles, and tags, Linux videos—which sometimes have fewer general keywords—end up being more “orthogonal” to typical user vectors, causing lower ranking.
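
To make the "orthogonal" point concrete, here is a small Python sketch of word vectors built from titles, descriptions, and tags. The tokenizer regex and the use of raw word counts as weights are assumptions; the post does not say how the real implementation weights words.

```python
import math
import re
from collections import Counter


def text_vector(title: str, description: str, tags: list[str]) -> Counter:
    """Bag-of-words counts over title, description, and tags (assumed weighting)."""
    text = " ".join([title, description, *tags]).lower()
    return Counter(re.findall(r"[a-z0-9']+", text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# A user profile built from cooking videos shares no words with a
# jargon-heavy Linux video, so the two vectors are orthogonal.
user = text_vector("Easy pasta recipe", "cooking dinner at home", ["cooking", "food"])
linux = text_vector("Arch btw", "pacman -Syu walkthrough", ["distro"])
similarity = cosine(user, linux)
```

With no words in common the dot product is zero, so the video cannot rank for that user no matter how well produced it is.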

Another challenge:
It’s really hard to properly combine like cosine similarity and time engagement cosine similarity.
You could add them, but it doesn’t fully make sense:

  • High like similarity + high time engagement similarity = you probably like and will watch the video longer.
  • But short videos can be liked even if they contribute almost nothing to time engagement (because time engagement is based on percentage watched × video length).
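
A quick worked example of why the two signals do not line up, using the definition above (percentage watched × video length). Measuring length in seconds and the function name are assumptions for the sketch.

```python
def time_engagement(fraction_watched: float, length_seconds: float) -> float:
    """Time engagement as fraction watched times video length (seconds assumed)."""
    return fraction_watched * length_seconds


# A 30-second clip watched to the end and liked.
short_clip = time_engagement(1.0, 30)

# A 20-minute video abandoned halfway through.
long_video = time_engagement(0.5, 1200)

ratio = long_video / short_clip  # the half-watched long video counts 20x more
```

So a fully watched, liked short clip adds 30 to time engagement while a half-watched long video adds 600, which is why simply summing the two similarities mixes signals on very different scales.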

If I combined them, it would basically enter machine learning territory:
You’d have to adjust proportions dynamically based on user behavior.
Since I want this algorithm scoped to one person only (no data sharing yet), that level of ML is out of scope for now.

(Sharing data across devices could come later—Brave browser has sync features, and PeerTube watch history syncing could be possible.)


Summary:
Most of the data structure is settling into place.
Future updates will probably focus on expanding the data structure and making small improvements.

  • ArtificialHoldings@lemmy.world
    7 hours ago

    Here’s a cleaned-up version of your Lemmy post that keeps your tone but improves clarity, flow, and grammar:

    Did they forget to delete ChatGPT’s bit or did they intentionally copy the whole thing lol

    • Cattail@lemmy.worldOP
      6 hours ago

      Lol, I didn’t copy the whole prompt. I deleted the bit at the end, but it was late and I was tired, so I used an AI just to fix my original text.

        • Cattail@lemmy.worldOP
          2 hours ago

          I still have no idea what that term means exactly, but I did use AI to build the program. I just rigorously test it and go into the code to figure out the issues.