cross-posted from: https://lemmy.world/post/28808772

Finally released an alpha build for the PeerTube recommendation algorithm!
Basic UI is complete. If you want to try it out, the link is here:
👉 https://github.com/solidheron/peertube_recomendation_algorythm

New features since the last build:

  • Sort videos by time-engagement similarity to your watch history.
  • Sort videos by like similarity to the videos you have liked.
  • Display of like-similarity cosine values.
  • Basic information shown for recommended videos (title, account, and channel names).
  • 404 check for generated instance links, so you don’t get stuck clicking into dead videos and you can tell which instance actually hosts the video.
  • De-ranking for previously seen videos (simply a 0.5x multiplier on time and like similarity).
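
To make the scoring in the list above concrete, here is a minimal sketch in Python (the extension itself runs in the browser; Python is used only for brevity). Only the cosine measure and the 0.5x multiplier come from the post. The function names, the sparse keyword dictionaries, and the example weights are hypothetical and not taken from the repository.

```python
import math

SEEN_MULTIPLIER = 0.5  # de-ranking factor mentioned in the post


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse keyword-weight vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as sharing nothing
    return dot / (norm_a * norm_b)


def score(user_vec: dict, video_vec: dict, already_seen: bool) -> float:
    """Similarity score, halved when the video was watched before."""
    sim = cosine_similarity(user_vec, video_vec)
    return sim * SEEN_MULTIPLIER if already_seen else sim


# Hypothetical example vectors (keyword -> weight).
user_likes = {"linux": 2.0, "music": 1.0}
video = {"linux": 1.0, "tutorial": 1.0}

fresh = score(user_likes, video, already_seen=False)
seen = score(user_likes, video, already_seen=True)
print(fresh, seen)
```

A previously seen video keeps its place in the ranking logic but needs twice the similarity to outrank an unseen one.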

Features from previous builds:

  • Ability to input multiple instance domain names (DNs) and generate playable video links.
  • Limit of 5 recommendations per channel to avoid floods (e.g., during testing, The Linux Experiment would otherwise dominate). This limit is more of a failsafe than a feature.
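
The per-channel limit above can be sketched as a single pass over an already ranked list. This is an illustration under assumptions, not the extension's code: the `(video_id, channel)` tuple layout and the function name are made up for the example, and only the limit of 5 comes from the post.

```python
from collections import defaultdict

MAX_PER_CHANNEL = 5  # limit stated in the post


def cap_per_channel(ranked, limit=MAX_PER_CHANNEL):
    """Keep at most `limit` videos per channel, preserving rank order.

    `ranked` is a list of (video_id, channel) pairs, best first.
    """
    counts = defaultdict(int)
    kept = []
    for video_id, channel in ranked:
        if counts[channel] < limit:
            kept.append((video_id, channel))
            counts[channel] += 1
    return kept


# Hypothetical ranking where one channel floods the top of the list.
ranked = [(f"tle-{i}", "The Linux Experiment") for i in range(8)]
ranked.append(("other-1", "Some Channel"))

capped = cap_per_channel(ranked)
print(len(capped))
```

Because the list is walked in rank order, the five best videos from a dominant channel survive and everything else from it is dropped.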

Personal thoughts:
I still think cosine similarity beats chronological algorithms.
This algorithm also synergizes with other algorithms—it’s great for finding videos that appear next to or below what you’re currently watching.

You can also revisit videos you previously liked to help strengthen your like similarity vectors.


Moving forward: basic design philosophies and current issues

There’s an issue I’m calling the “Linux pipeline.”
Basically, Linux-related videos tend to dominate PeerTube’s well-produced content.
Since the algorithm relies on English words in descriptions, titles, and tags, Linux videos, which sometimes have fewer general keywords, end up being more “orthogonal” to typical user vectors, which lowers their ranking.
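
A small sketch of what “orthogonal” means here, assuming a plain bag-of-words vector built by lowercasing and splitting on whitespace. The tokenizer, the function names, and the sample titles are assumptions for illustration; the extension's real keyword extraction may differ.

```python
import math
from collections import Counter


def keyword_vector(*texts: str) -> Counter:
    """Bag-of-words counts over title, description, and tags."""
    words = []
    for text in texts:
        words.extend(text.lower().split())
    return Counter(words)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


# Hypothetical user vector and two hypothetical videos.
user = keyword_vector("cooking pasta recipe", "easy dinner recipe")
linux_video = keyword_vector("Arch btrfs snapshots", "systemd pacman")
cooking_video = keyword_vector("quick pasta dinner")

print(cosine(user, linux_video))    # 0.0: no shared words, so the vectors are orthogonal
print(cosine(user, cooking_video))  # above zero: "pasta" and "dinner" overlap
```

A video built mostly from specialist terms shares few dimensions with a general-interest user vector, so its cosine value sits near zero no matter how well produced it is.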

Another challenge:
It’s really hard to properly combine like cosine similarity and time engagement cosine similarity.
You could add them, but it doesn’t fully make sense:

  • High like similarity + high time engagement similarity = you probably like and will watch the video longer.
  • But short videos can be liked even if they contribute almost nothing to time engagement (because time engagement is based on percentage watched × video length).
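
A quick worked example of the formula in the last bullet (percentage watched × video length), with made-up numbers. The function name and the use of seconds as the unit are assumptions.

```python
def time_engagement(fraction_watched: float, length_seconds: float) -> float:
    """Engagement in seconds: fraction watched x video length."""
    return fraction_watched * length_seconds


short_clip = time_engagement(1.0, 30)    # fully watched 30-second clip
long_video = time_engagement(0.5, 1800)  # half of a 30-minute video

print(short_clip, long_video)  # 30.0 900.0
```

The fully watched (and possibly liked) clip contributes 30 seconds, while the half-watched long video contributes 900, which is why the two similarity signals can disagree.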

If I combined them, it would basically enter machine learning territory:
You’d have to adjust proportions dynamically based on user behavior.
Since I want this algorithm scoped to one person only (no data sharing yet), that level of ML is out of scope for now.
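
For reference, the naive “just add them” option would look like a fixed weighted sum. This is only a sketch of that baseline, not something the extension does: `like_weight` is a hypothetical parameter, and picking or adapting its value per user is exactly the part that drifts into ML.

```python
def blended_score(like_sim: float, time_sim: float, like_weight: float = 0.5) -> float:
    """Fixed weighted sum of the two cosine similarities.

    like_weight is a hypothetical knob in [0, 1]; the remainder goes to
    time-engagement similarity.
    """
    if not 0.0 <= like_weight <= 1.0:
        raise ValueError("like_weight must be between 0 and 1")
    return like_weight * like_sim + (1.0 - like_weight) * time_sim


# A short video you would probably like but that adds little watch time.
equal_mix = blended_score(0.9, 0.1)
likes_only = blended_score(0.9, 0.1, like_weight=1.0)
print(equal_mix, likes_only)
```

With a fixed weight, the same video lands in very different places depending on one arbitrary number, which is the reason the two sort orders stay separate for now.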

(Sharing data across devices could come later: the Brave browser has sync features, and syncing PeerTube watch history could be possible.)


Summary:
Most of the data structure is settling into place.
Future updates will probably focus on expanding the data structure and making small improvements.

  • ArtificialHoldings@lemmy.world · English · 1 upvote · 2 hours ago

    “Here’s a cleaned-up version of your Lemmy post that keeps your tone but improves clarity, flow, and grammar:”

    Did they forget to delete ChatGPT’s bit or did they intentionally copy the whole thing lol

    • Cattail@lemmy.world (OP) · English · 1 upvote · 1 hour ago

      Lol, I didn’t copy the whole prompt. I deleted the bit at the end, but it was late and I was tired, so I used an AI just to fix my original text.

  • iso@lemy.lol · English · 5 upvotes · edited 4 hours ago

    I think open discovery algorithms are the way. We are against algos but sorting by like similarity would be beneficial.

    What are you guys thinking? @dessalines@lemmy.ml @nutomic@lemmy.ml Are you optimistic about this or fuck any algorithms?

    • nutomic@lemmy.ml · English · 5 upvotes · 3 hours ago

      Algorithms are definitely needed to discover good content. There are some good videos on PeerTube, but it’s very difficult to find them due to all the low-effort spam. Lemmy has also had different algorithms from the beginning, and no one ever complained about them.

      The problem with the algorithms used by Reddit, Facebook, etc. is that they are completely opaque and include factors that don’t benefit the user, such as “engagement” or advertising. As long as Fediverse algorithms are focused on benefiting the user and are transparent, there is nothing wrong with them.

      • Cattail@lemmy.world (OP) · English · 1 upvote · 1 hour ago

        I want to encourage fedizens and Internet users to collect their own data and run their own algorithms, since that has generally worked well for corporations at getting people to keep using their services. It seems people enjoy algorithms based on their own data.

        It’s neat that the fediverse has a general principle of not collecting user data, so if more people used fediverse instances more often, less data would go to corporations. This browser extension is an outline of how collecting your own data can affect your experience with the fediverse. There’s so much you can do with your own data and data from the API.

    • atro_city@fedia.io · 6 upvotes · 8 hours ago

      I don’t even know what “it” is. A recommendation algorithm? But PeerTube already has a “similar” video section to the right of all videos. Does this replace that? Techies really have a problem with presenting stuff to laymen.

    • Cattail@lemmy.world (OP) · English · 4 upvotes · 10 hours ago

      It’s a browser extension for Brave (or any Chrome-based browser); that’s in the GitHub README. “Recommendation algo” seemed self-explanatory: it’s meant to recommend you videos on PeerTube. I only screenshotted the one UI that exists; the only other things I could screenshot are variables stored in IndexedDB and the extension’s local storage.

      Also, the installation instructions are in the GitHub README.

        • Cattail@lemmy.world (OP) · English · 1 upvote · 2 hours ago

          Extensions are like plugins. The program monitors watch time, and I wanted that to be done by the user, stored by the user, and shared only with consent. Plus, if this algorithm boosts engagement, it should also benefit people who don’t use it, because it will encourage people to watch and like things, which feeds into other algorithms.