How to block AI Crawler Bots using robots.txt file (www.cyberciti.biz)
Posted by Cynicus Rex@lemmy.ml to Privacy@lemmy.ml · English · 3 months ago · 37 comments
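For reference, here is a minimal sketch of the robots.txt approach named in the title. It assumes you want to turn away the user-agent tokens GPTBot (OpenAI), CCBot (Common Crawl) and Google-Extended (Google's AI-training control token); the linked article's own list may differ. It uses Python's standard `urllib.robotparser` to show what a crawler that honours the file would conclude. `ROBOTS_TXT` and `build_parser` are illustrative names, not taken from the article.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: one group listing the AI-related user-agent tokens,
# followed by a catch-all group that leaves everything else open.
# An empty "Disallow:" means "nothing is disallowed".
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent in ("GPTBot", "CCBot", "Google-Extended", "Googlebot"):
        # can_fetch only reports what the file says; nothing enforces it.
        print(agent, rp.can_fetch(agent, "https://example.com/blog/post"))
```

Run as a script, it prints `False` for the three listed tokens and `True` for `Googlebot`. As the first reply points out, this only works against crawlers that choose to read and obey the file.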
NullPointer@programming.dev · 3 months ago
robots.txt will not block a bad bot, but you can use it to lure bad bots into a “bot trap” so you can ban them in an automated fashion.
Dave.@aussie.zone · 3 months ago
I’m guessing something like:
Robots.txt: Do not index this particular area.
Main page: invisible link to that area at the top of the page, with alt text of “don’t follow this, it’s just a bot trap” for screen readers and such.
Result: any access to that area equals an insta-ban for that IP. Maybe just for 24 hours, so nosy humans can get back to enjoying your site.
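The trap described above could look roughly like this: a minimal in-memory sketch, assuming a hypothetical trap prefix `/bot-trap/` (which would also be listed under `Disallow:` in robots.txt) and the 24-hour ban suggested. `BotTrap`, `TRAP_PREFIX` and the 403 response are illustrative choices; a real site would more likely push the ban into its firewall or reverse proxy than keep it in a Python dict.

```python
import time
from typing import Callable, Dict

TRAP_PREFIX = "/bot-trap/"      # hypothetical trap path, also Disallowed in robots.txt
BAN_SECONDS = 24 * 60 * 60      # "maybe just for 24 hours"


class BotTrap:
    """Bans any client IP that requests a path under TRAP_PREFIX."""

    def __init__(self, ban_seconds: float = BAN_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ban_seconds = ban_seconds
        self.clock = clock
        self._banned_until: Dict[str, float] = {}   # ip -> ban expiry time

    def is_banned(self, ip: str) -> bool:
        expiry = self._banned_until.get(ip)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            # Ban has expired: forget it so the visitor can return.
            del self._banned_until[ip]
            return False
        return True

    def handle(self, ip: str, path: str) -> int:
        """Return an HTTP status code for this request."""
        if self.is_banned(ip):
            return 403
        if path.startswith(TRAP_PREFIX):
            # Only something ignoring robots.txt (or a curious human) gets here.
            self._banned_until[ip] = self.clock() + self.ban_seconds
            return 403
        return 200


if __name__ == "__main__":
    fake_now = [0.0]
    trap = BotTrap(clock=lambda: fake_now[0])
    print(trap.handle("203.0.113.7", "/"))             # 200
    print(trap.handle("203.0.113.7", "/bot-trap/x"))   # 403, IP is now banned
    print(trap.handle("203.0.113.7", "/"))             # 403 while banned
    fake_now[0] += BAN_SECONDS
    print(trap.handle("203.0.113.7", "/"))             # 200 after 24 hours
```

Injecting `clock` lets you fast-forward time in tests instead of waiting a day; by default `time.monotonic` is used so that wall-clock adjustments don't shorten or extend bans.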
doodledup@lemmy.world · 3 months ago
The problem is that you’re also blocking search engines from indexing your site, no?
ɐɥO@lemmy.ohaa.xyz · 3 months ago
Nope. Search engines should follow robots.txt.
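To see why a well-behaved search engine never springs the trap, here is a sketch that uses Python's `urllib.robotparser` as a stand-in for a compliant crawler's robots.txt check, again assuming the hypothetical `/bot-trap/` path. The `allowed` helper is illustrative; real search engines use their own parsers and may handle edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that fences off only the trap directory,
# for every crawler ("*"); the rest of the site stays indexable.
TRAP_ROBOTS_TXT = """\
User-agent: *
Disallow: /bot-trap/
"""


def allowed(agent: str, url: str, robots_txt: str = TRAP_ROBOTS_TXT) -> bool:
    """Return what a crawler that honours robots.txt would decide for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(allowed("Googlebot", "https://example.com/"))           # True
    print(allowed("Googlebot", "https://example.com/bot-trap/"))  # False
```

A compliant crawler checks this before following the invisible link, so it skips the trap while still indexing the rest of the site; only clients that ignore robots.txt end up banned.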