
Posted

We are getting hammered with bot traffic all of a sudden. Trying to rein it in, but apparently we've been 'discovered'.

ughhh

Posted (edited)
3 hours ago, Shawn said:

We are getting hammered with bot traffic all of a sudden. Trying to rein it in, but apparently we've been 'discovered'.

ughhh

Could be a bunch of people training their AIs. It happened to me at the end of December on my small (very small) website: bots were burning through the monthly bandwidth allotment in a matter of days.

It stopped when I added a robots.txt file that blocked every crawler except Google's.

Here are the robots.txt rules that stopped the hemorrhaging:

User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
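One way to sanity-check rules like these before trusting them is Python's standard-library `urllib.robotparser`, which applies them the way a compliant crawler would. A minimal sketch: `example.com` is a placeholder URL, and the extra crawler names (GPTBot, CCBot, bingbot) are just sample user-agent tokens, not anything from the post.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post above, verbatim.
RULES = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

def check(agents, url="https://example.com/forums/topic/1/"):
    """Return {agent: allowed?} for each crawler token under RULES.

    example.com is a placeholder; only the URL path matters here.
    """
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())  # parse() also marks the rules as loaded
    return {agent: parser.can_fetch(agent, url) for agent in agents}

if __name__ == "__main__":
    for agent, allowed in check(["Googlebot", "GPTBot", "CCBot", "bingbot"]).items():
        print(f"{agent:10} {'allowed' if allowed else 'blocked'}")
```

Note that these rules also shut out Bing and every other search engine, and, like all of robots.txt, they only bind crawlers that choose to honor them.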

Edited by sketchley
Posted

It's been happening all over for the last week or so, so much so that a number of web hosts now say they will block AI crawlers by default.

Posted

Yes, I've seen multiple AI bots crawling the site since last week, hammering the hell out of it.

I've got .htaccess and robots.txt rules blocking as many as I can, but they're just ignoring them.

We've accumulated a LOT of data here over the last 25 years: 1.6 million posts and hundreds of thousands of pictures.

If any AI really cares about Macross, it's a lot of stuff to digest!
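.htaccess blocking runs inside Apache, but the check underneath is usually just a case-insensitive substring match on the User-Agent header. A minimal sketch of that same logic as Python WSGI middleware, for illustration only: `BLOCKED_TOKENS` is a hypothetical sample of self-declared AI crawler names, not a complete or authoritative blocklist, and `hello_app` stands in for the real site.

```python
# Hypothetical sample of AI-crawler user-agent tokens; not exhaustive.
BLOCKED_TOKENS = ("gptbot", "claudebot", "ccbot", "bytespider",
                  "meta-externalagent", "perplexitybot", "amazonbot")

def is_blocked(user_agent):
    """True if the User-Agent header contains any blocked token (case-insensitive)."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_TOKENS)

class BlockAICrawlers:
    """WSGI middleware: answer 403 to blocked user agents, pass everyone else through."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [("Content-Type", "text/plain"),
                                             ("Content-Length", str(len(body)))])
            return [body]
        return self.app(environ, start_response)

def hello_app(environ, start_response):
    """Stand-in for the real site."""
    body = b"Hello, human\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

if __name__ == "__main__":
    from wsgiref.simple_server import make_server
    # Serve on localhost:8000 for a quick manual test.
    make_server("127.0.0.1", 8000, BlockAICrawlers(hello_app)).serve_forever()
```

The weakness shows in `is_blocked`: it only catches bots that announce themselves. A crawler sending an ordinary browser User-Agent passes straight through, which is why header blocks are often combined with per-IP rate limiting.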

Posted
46 minutes ago, Shawn said:

Yes, I've seen multiple AI bots crawling the site since last week, hammering the hell out of it.

I've got .htaccess and robots.txt rules blocking as many as I can, but they're just ignoring them.

We've accumulated a LOT of data here over the last 25 years: 1.6 million posts and hundreds of thousands of pictures.

If any AI really cares about Macross, it's a lot of stuff to digest!


You'll probably need to set up some tarpits on the site to at least contain these AI crawlers.
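A tarpit serves crawlers an endless maze of slow, auto-generated pages so they waste their time on junk instead of the real forum. A minimal sketch in Python, assuming a hypothetical `/trap/` URL prefix: each page links to more trap pages derived from a hash of its own path, and the body is streamed with a pause before each link.

```python
import hashlib
import time
from socketserver import ThreadingMixIn
from wsgiref.simple_server import WSGIServer, make_server

TRAP_PREFIX = "/trap/"  # hypothetical URL prefix; also list it under Disallow in robots.txt

def trap_links(path, count=5):
    """Derive `count` stable pseudo-random child URLs from the current path.

    The same path always yields the same links, so the maze looks like
    static content, yet it never runs out of pages.
    """
    return [
        TRAP_PREFIX + hashlib.sha256(f"{path}#{i}".encode()).hexdigest()[:12]
        for i in range(count)
    ]

def trap_page(path, count=5, delay=2.0, sleep=time.sleep):
    """Yield a junk HTML page in small chunks, pausing `delay` seconds before each link."""
    yield b"<html><body><p>Archive index</p>\n"
    for link in trap_links(path, count):
        sleep(delay)
        yield f'<a href="{link}">{link}</a><br>\n'.encode()
    yield b"</body></html>\n"

def tarpit_app(environ, start_response):
    """WSGI app serving the maze; mount it only under TRAP_PREFIX."""
    path = environ.get("PATH_INFO", "/")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return trap_page(path)

class ThreadingWSGIServer(ThreadingMixIn, WSGIServer):
    """One thread per connection, so a slow trapped bot doesn't block others."""
    daemon_threads = True

if __name__ == "__main__":
    make_server("127.0.0.1", 8001, tarpit_app,
                server_class=ThreadingWSGIServer).serve_forever()
```

Adding `Disallow: /trap/` to every group in robots.txt (Googlebot's included) keeps compliant crawlers out, so only bots that ignore robots.txt wander in. Each trapped connection does hold one of your server's threads, so in practice you would cap how many run at once.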

Posted
40 minutes ago, Shawn said:

We've accumulated a LOT of data here over the last 25 years: 1.6 million posts and hundreds of thousands of pictures.

If any AI really cares about Macross, it's a lot of stuff to digest!

Maybe it's GPT-5 being retrained after this week's laughable release.

AI companies love data, and their bots will scrape from wherever, however. There was a report out this week that Meta is facing a new lawsuit alleging it trained its AI on torrented p0rn (probably to train its filter 🤷‍♂️). You can bet AI companies are just flat out ignoring and bypassing .htaccess and robots.txt.
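If you want evidence rather than a bet, the access log can show it. A rough sketch, assuming the Apache/nginx "combined" log format and a hypothetical robots.txt (the sample rules and log lines are made up): it pulls the crawler name out of each self-identifying User-Agent and uses `urllib.robotparser` to count requests for paths that robots.txt forbids to that crawler.

```python
import re
from collections import Counter
from urllib.robotparser import RobotFileParser

# Apache/nginx "combined" format:
# host ident user [time] "METHOD PATH PROTOCOL" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
# Self-identifying crawlers usually name themselves after "compatible;",
# e.g. "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)".
TOKEN_RE = re.compile(r"compatible;\s*([^/;)]+)")

def robots_violations(log_lines, robots_txt):
    """Count, per crawler name, requests for paths robots.txt forbids to that crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    violations = Counter()
    for line in log_lines:
        hit = LOG_RE.match(line)
        if not hit:
            continue  # not a combined-format line
        token = TOKEN_RE.search(hit["agent"])
        if not token:
            continue  # no crawler name: a browser, or a bot posing as one
        name, path = token.group(1).strip(), hit["path"]
        if path != "/robots.txt" and not parser.can_fetch(name, path):
            violations[name] += 1
    return violations

if __name__ == "__main__":
    # Hypothetical rules and log lines, for illustration only.
    robots = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nDisallow: /admin/\n"
    log = [
        '1.2.3.4 - - [01/Jan/2025:00:00:00 +0000] "GET /forums/topic/1/ HTTP/1.1" 200 512 "-" '
        '"Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
        '5.6.7.8 - - [01/Jan/2025:00:00:01 +0000] "GET /forums/ HTTP/1.1" 200 2048 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    ]
    print(robots_violations(log, robots))
```

Bots that disguise themselves as browsers never show up here, so a clean report only means the crawlers that identify themselves are behaving.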
