Joomla Facebook Connect support forum

facebookexternalhit/1.1

14 years 4 months ago #18400 by fb_627893202
My site was bombarded this morning by the Facebook bot facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php). It pushed the server to its limit and beyond, with more than 1,000 requests in any 15-minute spell.

I'm looking at other solutions (reducing the memory size of my web pages, etc.), but I wondered if anyone using JFBC had come across this issue and solved it.
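Editor's note: one way to put a number on "requests in any 15-minute spell" is to bucket the access log by time window. The following is only a minimal sketch. It assumes the server writes the common Apache/Nginx "combined" log format, and the function name `requests_per_window` is made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: "combined" log format, e.g.
# 203.0.113.5 - - [10/Oct/2011:06:01:12 +0000] "GET /a HTTP/1.1" 200 512 "-" "agent"
LOG_LINE = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def requests_per_window(lines, agent_substring="facebookexternalhit",
                        window_minutes=15):
    """Count requests from a given user agent per time window.

    window_minutes is assumed to divide 60 evenly (5, 10, 15, 30, ...).
    Returns a Counter keyed by the start time of each window.
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or agent_substring not in match.group("agent"):
            continue  # not parseable, or not the bot we are interested in
        stamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        # Round the timestamp down to the start of its window.
        start = stamp.replace(
            minute=stamp.minute - stamp.minute % window_minutes,
            second=0,
            microsecond=0,
        )
        counts[start] += 1
    return counts
```

Calling `requests_per_window(open("access.log"))` and printing `counts.most_common(5)` would list the busiest windows; the log file name here is a placeholder.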
The topic has been locked.
Support Specialist
14 years 4 months ago #18447 by alzander
Replied by alzander on topic facebookexternalhit/1.1
John,
We haven't heard of it spamming like that before, but Facebook does try to re-crawl every page of your site that uses Facebook's social buttons every 24 hours. They do this to check for updated tags, and 24 hours is the longest interval possible. There are ways to have it scan more often, but obviously that's not what you're looking for.

For more information on how it scans, see:
developers.facebook.com/docs/reference/plugins/like/ (search for "scrape")

So, if you have thousands of pages, that may just be a consequence of using Facebook on your pages, as Facebook will want to scan them. They should be spreading the requests out (and may do so with time), but there's no way to stop it.

One thing to check on your server is that the following file is reachable. It will appear as a blank white page, and if you look at the source, it will load one JavaScript file:
http://site.com/components/com_jfbconnect/assets/jfbcchannel.php
If that throws an error, you'll need to fix that. Let us know, and we can help.
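Editor's note: the check described above can also be scripted. This is a rough sketch using only the Python standard library. The helper name `check_channel_file` is hypothetical, and treating "contains a `<script` tag" as the sign of a healthy page is an assumption based on the description above, not something JFBConnect documents.

```python
import urllib.error
import urllib.request


def check_channel_file(url, opener=urllib.request.urlopen):
    """Return (status_code, has_script_tag) for the given URL.

    `opener` is injectable so the function can be exercised without
    a network connection; by default it performs a real HTTP request.
    """
    try:
        with opener(url) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status (404, 500, ...).
        return error.code, False
    return status, "<script" in body.lower()
```

A result of `(200, True)` for the channel URL above would match the expected behaviour; anything else is worth reporting.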

Hope that at least explains,
Alex
14 years 4 months ago #18475 by fb_627893202
Thanks. The JavaScript code appears fine. I'll monitor the situation. I guess the solution will be some long hours ahead reducing requests and page size.
Support Specialist
14 years 4 months ago #18496 by alzander
Replied by alzander on topic facebookexternalhit/1.1
It's good the JavaScript file is there, but unfortunately, that means it's also not the issue.

Definitely keep us posted on how it goes. That bot is normal, but it shouldn't be causing this many issues. Reducing overall page size, adding caching, and other optimizations are always beneficial for both bots and users, though easier said than done.

Best of luck,
Alex
14 years 4 months ago #18556 by fb_627893202
It's still crashing the server in the mornings. It causes '500 Internal Server Error' and memory error messages on the front end, and max-connections-exceeded errors in MySQL.

Even though I'm on a managed server I understand the most likely problem is that my server is not optimized correctly. That is something I will gradually have to learn about and not your problem.

I wonder if a means exists to trigger the bot so that it scrapes at set intervals throughout the day?
Also, would setting 'like' to be site dependent rather than page by page help in any way?
Support Specialist
14 years 4 months ago #18574 by alzander
Replied by alzander on topic facebookexternalhit/1.1
John,
Been looking into this more. There are a few posts/requests to Facebook to allow limiting of these hits, but nothing available to do so.

Facebook generally recommends checking that your pages are accessible, that you don't block the user agent facebookexternalhit, and some other pretty basic stuff. None of that seems to be your problem though.

The only thing I saw that might help diagnose a problem was the recommendation to use Facebook's URL tool. If you check your logs and see any pages that are being requested more than once, try running them through their tool:
developers.facebook.com/tools/debug

It should report the status code it receives (should be 200) and the data it sees. If any of that looks incorrect, let us know. It might be a sign that Facebook is simply getting into a retry-loop trying to fetch your content.
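Editor's note: finding pages that the bot requests more than once can be done with a short log scan. The sketch below again assumes the "combined" access log format, and `repeated_fetches` is an illustrative name. It lists each path the bot fetched repeatedly along with the status codes returned, so a run of non-200 codes would hint at the retry loop mentioned above.

```python
import re
from collections import defaultdict

# Assumption: "combined" log format with the request line in quotes,
# followed by status, size, referer and user agent.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def repeated_fetches(lines, agent_substring="facebookexternalhit"):
    """Map each path fetched more than once by the bot to its status codes."""
    statuses = defaultdict(list)
    for line in lines:
        match = LOG_LINE.search(line)
        if match and agent_substring in match.group("agent"):
            statuses[match.group("path")].append(int(match.group("status")))
    # Keep only the paths that were requested at least twice.
    return {path: codes for path, codes in statuses.items() if len(codes) > 1}
```

Any path that shows up here with codes other than 200 is a good candidate to paste into the debug tool.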

Probably not the case, but the best suggestion I have for now.

Thanks,
Alex