

Posted

We are getting hammered with bot traffic all of a sudden. Trying to rein it in, but apparently we've been 'discovered'

ughhh

Posted (edited)
3 hours ago, Shawn said:

We are getting hammered with bot traffic all of a sudden. Trying to rein it in, but apparently we've been 'discovered'

ughhh

Could be a bunch of people training their AIs. It happened to me at the end of December on my small (very small) website: bots were using up the monthly bandwidth allotment in a matter of days.

It stopped when I added a robots.txt file that blocked every crawler except Google's.


Here are the rules I put in robots.txt that stopped the hemorrhaging:

```
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
```
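If you want to double-check what those rules actually allow before uploading them, Python's standard-library `urllib.robotparser` can evaluate them offline. This is a minimal sketch: the URL and the non-Google agent names are just examples. Keep in mind robots.txt is advisory; it only stops crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post above.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # Example URL (hypothetical); every path on the site is treated the same here.
    url = "https://example.com/forums/topic/123/"
    for agent in ("Googlebot", "GPTBot", "SomeOtherCrawler"):
        print(agent, rp.can_fetch(agent, url))
```

With these rules only the Googlebot group matches `Allow: /`; every other agent falls through to the `*` group and gets `Disallow: /`.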


Edited by sketchley
Posted

It's been happening all over for the last week or so, so much so that a bunch of web hosts are now saying they will block AI crawlers by default.

Posted

Yes I saw multiple AI bots crawling the site starting this last week, hammering the hell out of it.

I've got .htaccess and robots.txt blocking as many as I can, but they're just ignoring it.

We've got a LOT of data here from the last 25 years: 1.6 million posts and hundreds of thousands of pictures.

If any AI really cares about Macross, it's a lot of stuff to digest!
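For anyone curious what user-agent blocking boils down to, here is a minimal sketch of the same idea as an .htaccess user-agent rule, written as Python WSGI middleware. It assumes a Python front end, which nothing in this thread says the site uses. The class and function names are hypothetical, and the token list is illustrative, not complete. It only catches bots that honestly identify themselves; a crawler that spoofs a browser User-Agent sails right past it.

```python
# Illustrative, not exhaustive: User-Agent substrings some AI crawlers send.
BLOCKED_AGENT_TOKENS = (
    "gptbot",
    "claudebot",
    "ccbot",
    "bytespider",
    "perplexitybot",
    "meta-externalagent",
)


def is_blocked(user_agent: str, tokens=BLOCKED_AGENT_TOKENS) -> bool:
    """Case-insensitive substring match of the User-Agent header against tokens."""
    ua = user_agent.lower()
    return any(token in ua for token in tokens)


class BlockAgentsMiddleware:
    """Hypothetical WSGI wrapper that answers 403 before the real app runs."""

    def __init__(self, app, tokens=BLOCKED_AGENT_TOKENS):
        self.app = app
        self.tokens = tokens

    def __call__(self, environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if is_blocked(ua, self.tokens):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Not on the list: pass the request through untouched.
        return self.app(environ, start_response)
```

Wrapping is one line, e.g. `application = BlockAgentsMiddleware(application)`. Matching substrings rather than exact strings is deliberate, because crawlers usually embed their token inside a longer Mozilla-style User-Agent.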

Posted
46 minutes ago, Shawn said:

Yes I saw multiple AI bots crawling the site starting this last week, hammering the hell out of it.

I've got .htaccess and robots.txt blocking as many as I can, but they're just ignoring it.

We've got a LOT of data here from the last 25 years: 1.6 million posts and hundreds of thousands of pictures.

If any AI really cares about Macross, it's a lot of stuff to digest!


You prolly need to set up some tarpits on the site to at least contain these AI crawlers.
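A tarpit, roughly, is a trap that feeds a crawler an endless maze of junk pages very slowly, so it wastes its time there instead of on the real content. Below is a minimal sketch in Python WSGI. The `/trap/` path and the function names are hypothetical. It also assumes the trap URLs are listed under `Disallow` in robots.txt, so that only crawlers ignoring robots.txt wander in. One caveat: on a synchronous server, every trapped connection also ties up one of *your* workers while it drips, which is why real tarpits usually run on async servers.

```python
import random
import time


def tarpit_page(path: str, n_links: int = 5) -> bytes:
    """Build a junk HTML page whose links only lead deeper into the trap."""
    rng = random.Random(path)  # seeded by path, so the same URL gives the same page
    links = "".join(
        f'<a href="/trap/{rng.getrandbits(32):08x}/">more</a>\n'
        for _ in range(n_links)
    )
    return f"<html><body>\n{links}</body></html>\n".encode()


def make_tarpit_app(delay: float = 2.0, chunk_size: int = 16, sleep=time.sleep):
    """Return a WSGI app that drips a tarpit page out a few bytes at a time."""

    def app(environ, start_response):
        body = tarpit_page(environ.get("PATH_INFO", "/"))
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])

        def drip():
            for i in range(0, len(body), chunk_size):
                yield body[i:i + chunk_size]
                sleep(delay)  # stall between chunks to waste the crawler's time

        return drip()

    return app
```

How slowly the bytes actually reach the client also depends on whether the server or a proxy in front of it buffers the response, so treat `delay` as a target rather than a guarantee.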

Posted
40 minutes ago, Shawn said:

We have a LOT of data here these last 25 years, 1.6 million posts and hundreds of thousands of pictures.

If any AI really cares about Macross its a lot of stuff to digest!

Maybe it was ChatGPT-5 being retrained after the laughable release this week.

AI companies love data, and their bots will scrape from wherever, however. There was a report out this week that Meta is facing a new lawsuit over training its AI on torrented p0rn (probably to train their filter 🤷‍♂️). You can bet AI companies are just flat out ignoring and bypassing .htaccess and robots.txt.
