
google-plus-platform - issue #178
Specific user-agent to identify the google-plus crawler
What steps will reproduce the problem?
1. The Google+ bot that crawls a page after a user clicks the +1 button does not identify itself as a bot.
2. Nor does it identify itself as an agent without cookie support.
What is the expected output? What do you see instead? Without a way to identify the crawler:
- The system counts the crawler's visit as a normal one.
- The system cannot use a load balancer or cache to prioritize normal users.
Comment #1
Posted on Feb 24, 2012 by Massive Rhino
(No comment was entered for this change.)
Comment #2
Posted on Mar 6, 2012 by Happy Kangaroo
Having a consistent User-Agent would help our multilingual website detect language settings and share content effectively.
Comment #3
Posted on Mar 12, 2012 by Grumpy Kangaroo
Declaring a consistent user agent would allow +1 to work on our 100% SSL site. We have to manually set cookies for each of the social sharing tools (Twitter, Facebook, etc.), but we cannot set this up for G+ because of the current bot user-agent issue...
Comment #4
Posted on Mar 22, 2012 by Grumpy Hippo
Just noticed this as well. Google +1 clicks on links to my site count as visits with the user agent "Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0". Facebook uses the "facebookexternalhit" user agent for this purpose, and LinkedIn has "LinkedInBot" (I am filtering those).
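A minimal sketch of the filtering approach this comment describes: checking the User-Agent for the documented crawler tokens of Facebook and LinkedIn. The helper name and token list are illustrative; at the time of this thread, Google+ had no equivalent token to add.

```python
# Illustrative sketch: filter known social-crawler requests by matching
# documented User-Agent substrings ("facebookexternalhit", "LinkedInBot").
KNOWN_CRAWLER_TOKENS = ("facebookexternalhit", "LinkedInBot")

def is_social_crawler(user_agent):
    """Return True if the User-Agent contains a known crawler token."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in KNOWN_CRAWLER_TOKENS)

print(is_social_crawler("facebookexternalhit/1.1"))  # True
print(is_social_crawler(
    "Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0"))  # False
```

The generic Firefox string in the second call is exactly the problem this issue reports: nothing in it marks the request as the Google+ fetcher.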
Comment #5
Posted on Apr 12, 2012 by Massive Rhino
Issue 209 has been merged into this issue.
Comment #6
Posted on Apr 13, 2012 by Helpful Elephant
It's a bummer that this was requested almost two months ago and has not been implemented yet. I found a workaround, but it is not ideal.
Comment #7
Posted on Apr 13, 2012 by Swift Bear
I think you're right; two months is too long for such a simple fix. What is your workaround?
Comment #8
Posted on Apr 13, 2012 by Helpful Elephant
I use the Google-provided #!ajax workaround for G+ button URLs. I can then use any server-side software to handle the request.
The only drawback is that G+ shows the full URL in the shared item (i.e. //domain.com/seo/url#!ajax instead of //domain.com/seo/url).
This in itself is likely a bug, since the Google search bot strips the #!ajax before adding the URL to the index. I will submit it as one to see if they can address it too.
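The workaround above relies on Google's AJAX-crawling convention: a crawler requesting a #!-style URL sends the fragment as an `_escaped_fragment_` query parameter, which server-side code can detect. A hedged sketch, with an illustrative helper name (the exact handling is up to your framework, not specified by Google):

```python
# Illustrative sketch: detect an AJAX-crawling request by the presence of
# the _escaped_fragment_ query parameter that crawlers substitute for "#!".
from urllib.parse import urlparse, parse_qs

def is_ajax_crawler_request(url):
    """True if the request URL carries an _escaped_fragment_ parameter."""
    query = parse_qs(urlparse(url).query, keep_blank_values=True)
    return "_escaped_fragment_" in query

print(is_ajax_crawler_request("http://domain.com/seo/url?_escaped_fragment_=ajax"))  # True
print(is_ajax_crawler_request("http://domain.com/seo/url"))                          # False
```

`keep_blank_values=True` matters because the parameter may arrive with an empty value when the fragment is bare.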
Comment #9
Posted on Apr 17, 2012 by Quick Ox
Comment deleted
Comment #10
Posted on Apr 17, 2012 by Quick Ox
Comment deleted
Comment #11
Posted on Apr 17, 2012 by Quick Ox
Comment deleted
Comment #12
Posted on Apr 17, 2012 by Quick Ox
This also affects pages on sites where some sort of user action is required, such as entering a zip code or state before you are able to view a product page that is tied to a specific region or product. Or the SSL use case mentioned by Kym above.
It seems this would be a very easy change in just about any programming language I can think of. Please, Google, can we haz useragentz? :) Here are a couple of examples for Java and PHP:
Java: java.net.URLConnection c = url.openConnection(); c.setRequestProperty("User-Agent", "Mozilla/5.0 (compatible; GooglePlusBot/20120417)");
PHP: <?php $request->setHeaders(array('User-Agent' => 'Mozilla/5.0 (compatible; GooglePlusBot/20120417)')); ?>
And even if the Google+ bot is just some script driving Firefox, the user agent can be overridden in about:config via general.useragent.override...
Comment #13
Posted on May 17, 2012 by Grumpy Kangaroo
There's some discussion of this going on over on StackOverflow:
Suffice it to say, a nice way of identifying the Google+ bot would be extremely useful to us...
Comment #14
Posted on May 21, 2012 by Quick Panda
A site I administer has an age-check screen that uses bot detection to allow bots to crawl the site while bypassing the age check. Not being able to detect Google+ as a bot means that each time a user shares a link from the site on Google+ (or uses a social plugin to +1 a link), it gets the title, image, description, and URL of the age-check screen instead.
Comment #15
Posted on Jul 5, 2012 by Swift Ox
In the Netherlands we have to ask every visitor for permission BEFORE we place cookies. This means users who have not agreed to receive cookies see a splash screen. Since the Google+ crawler does not have such a cookie, it gets that splash screen no matter what page the user pressed the Google+ button on. And since Google+ doesn't have an identifiable user agent, it is impossible for us to get the button to work properly. Facebook's button works perfectly fine. If this is not resolved on short notice, we will have no choice but to remove the Google+ and +1 buttons and functionality from our site.
Comment #16
Posted on Jul 20, 2012 by Swift Rhino
You can look for
X-Goog-Source: LP_
to know whether the request comes from the Google+ crawler.
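A hedged sketch of the header check this comment suggests. Note that X-Goog-Source is undocumented and, as later comments in this thread show, eventually stopped being sent, so treat it as a heuristic rather than a contract. The `headers` dict and helper name are illustrative stand-ins for whatever your framework provides:

```python
# Illustrative sketch: treat a request as a Google+ fetch when the
# (undocumented) X-Goog-Source header starts with the observed "LP_" prefix.
def looks_like_gplus_fetch(headers):
    """True if the X-Goog-Source header begins with the observed LP_ prefix."""
    value = headers.get("X-Goog-Source", "")
    return value.startswith("LP_")

print(looks_like_gplus_fetch({"X-Goog-Source": "LP_123"}))      # True
print(looks_like_gplus_fetch({"User-Agent": "Mozilla/5.0"}))    # False
```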
Comment #17
Posted on Jul 20, 2012 by Swift Rabbit
Still showing the same user agent:
"Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0"
Comment #18
Posted on Jul 28, 2012 by Happy Camel
I cannot use the Google+ button without a fixed user agent for it, as my pages are only visible to people who sign up.
Please implement a fixed user agent, google.
Comment #19
Posted on Jul 29, 2012 by Swift Ox
24 days after my comment above (#15), still nothing has changed. The snippet the Google+ button generates is the cookie-permission text, because we are unable to grant the crawler permission to skip the cookie-permission page. We at fok.nl will therefore be removing all Google+ functionality from our site within the next week. Google's arrogance here is a shame, but I'm sure Facebook won't mind.
Comment #20
Posted on Jul 30, 2012 by Helpful Cat
Hello all, shall we use "HTTP_X_GOOG_SOURCE" to identify the Google+ bot? It is the only notable difference I found while watching the $_SERVER array (in PHP).
Comment #21
Posted on Nov 14, 2012 by Happy Bear
It's been 9 months now and there's still no change on this? Using HTTP_X_GOOG_SOURCE works, but it feels like a hack and could break at any time. Even one of Google's own testing tools sends Googlebot-richsnippets; why not bring this in line?
Comment #22
Posted on Nov 14, 2012 by Quick Bird
Comment deleted
Comment #23
Posted on Dec 21, 2012 by Swift Camel
Comment deleted
Comment #24
Posted on Jan 28, 2013 by Happy Rabbit
@drood: I noticed at fok.nl you have fixed this issue, since Google+ sharing now shows the correct information. Could you share your solution with us? Thanks in advance.
Comment #25
Posted on Jan 28, 2013 by Swift Ox
@fr...: we used the solution from post #20 to check for $_SERVER['HTTP_X_GOOG_SOURCE']. We also whitelisted a known Google IP range.
Comment #26
Posted on Feb 4, 2013 by Quick Horse
In case someone needs a copy-pastable ASP.NET version: HttpContext.Current.Request.ServerVariables["HTTP_X_GOOG_SOURCE"] != null
Comment #27
Posted on Feb 4, 2013 by Helpful Horse
I agree that it should use a specific user agent. The Facebook crawler does, and it makes sense to use something like "Googlebot-richsnippets", as mentioned in comment #21.
Comment #28
Posted on Feb 5, 2013 by Grumpy Bird
What exactly is this HTTP_X_GOOG_SOURCE thing? How would I detect it in haproxy?
Comment #29
Posted on Feb 6, 2013 by Happy Elephant
@elyog...: for Haproxy use "hdr*" matching criteria in your ACLs. See http://cbonte.github.com/haproxy-dconv/configuration-1.5.html#7-hdr
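A hedged sketch of what such an ACL could look like (haproxy 1.4/1.5 syntax). The backend names are illustrative, and the `LP_` prefix is only the value observed earlier in this thread, not a documented contract:

```
# Illustrative haproxy fragment: route requests that carry an
# X-Goog-Source header beginning with "LP_" to a dedicated backend.
acl is_gplus_crawler hdr_beg(X-Goog-Source) LP_
use_backend crawler_backend if is_gplus_crawler
default_backend web_backend
```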
Comment #30
Posted on Feb 27, 2013 by Massive Rabbit
I've found a new user agent: 'Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google'. Maybe we can rely on 'Google'?
Comment #31
Posted on Feb 27, 2013 by Swift Cat
I was relying on the HTTP_X_GOOG_SOURCE header to identify the crawler; however, I think it has stopped working (I noticed that yesterday; maybe a recent change?).
So my question is the same as the previous comment: can I rely on the user agent 'Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google'? I just noticed that it includes 'Google' at the end, which looks like a recent change as well.
Comment #32
Posted on Feb 27, 2013 by Happy Giraffe
This is not a supported feature of the API, so no, you can't rely on the user agent. It could change at any time.
Comment #33
Posted on Feb 28, 2013 by Swift Bear
WTF!? Too bad. The very poor solution we had, using HTTP_X_GOOG_SOURCE, is gone. And you, engineers of Google, what the hell are you thinking!? You, along with a few other companies, have defined the behavior and netiquette of bots, and now it seems you are not able to implement the most basic part: SEND YOUR OWN USER AGENT!!! We opened this issue 1 YEAR AGO!! After SIX MONTHS, someone came up with a solution (thanks a lot, btw) that uses a strange header coming with the request from Google+. OK, it was not the best solution, but it was a solution, and we put it in our code. And now even that poor solution is gone, as is the Google+ integration on our site. Thanks for nothing, Google!
Comment #34
Posted on Mar 1, 2013 by Grumpy Kangaroo
ugh, +1. It's hard to justify continued use of the +1 button after this.
Comment #35
Posted on Mar 1, 2013 by Happy Rabbit
@drood: I noticed the G+ button on Fok.nl continues to work. Can you tell me which Google IP range you exclude? We added a whole bunch of them, which worked fine, but since a couple of days ago we are again sharing the cookie page instead of the article. So I suspect you use a broader range. Thanks in advance.
Comment #36
Posted on Mar 5, 2013 by Grumpy Panda
Feedback acknowledged.
To make filtering requests easier, the User-Agent will soon contain a link to the snippets help page. If you have custom rules for processing User-Agents, please change them to recognize this suffix:
Google (+https://developers.google.com/+/web/snippet/)
Thank you - more details to come.
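A minimal sketch of matching the announced suffix. The help-page URL in the suffix is the stable signal to key on; the rest of the User-Agent string may still vary. The helper name is illustrative:

```python
# Illustrative sketch: recognize the Google+ snippet fetcher by the
# documented "(+https://developers.google.com/+/web/snippet/)" suffix.
SNIPPET_SUFFIX = "(+https://developers.google.com/+/web/snippet/)"

def is_gplus_snippet_fetch(user_agent):
    """True if the User-Agent ends with the documented snippet suffix."""
    return user_agent.rstrip().endswith(SNIPPET_SUFFIX)

ua = ("Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 "
      "Google " + SNIPPET_SUFFIX)
print(is_gplus_snippet_fetch(ua))  # True
```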
Comment #37
Posted on Mar 5, 2013 by Swift Ox
Only took just over a year as well. Kudos.
Comment #38
Posted on Mar 5, 2013 by Grumpy Panda
The User-Agent change is now live. Documentation changes are coming soon, as well as DNS PTR records for outbound IPs.
Comment #39
Posted on Mar 25, 2013 by Grumpy Panda
Reverse DNS for all fetch IPs is now available.
% host 66.249.80.100
100.80.249.66.in-addr.arpa domain name pointer google-proxy-66-249-80-100.google.com.
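With PTR records available, a fetch IP can be verified by forward-confirmed reverse DNS: reverse-resolve the IP, check the hostname is under google.com, then forward-resolve that hostname and confirm it maps back to the same IP. A hedged sketch (function names are illustrative; the network calls obviously require DNS access at runtime):

```python
# Illustrative sketch: forward-confirmed reverse DNS for a fetch IP.
import socket

def hostname_is_google(hostname):
    """Pure check: the PTR hostname must end in .google.com."""
    return hostname.rstrip(".").endswith(".google.com")

def verify_google_fetch_ip(ip):
    """True if ip reverse-resolves under google.com and forward-confirms."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # reverse (PTR) lookup
        if not hostname_is_google(hostname):
            return False
        # Forward-resolve the hostname and confirm it maps back to the IP.
        return ip in socket.gethostbyname_ex(hostname)[2]
    except (socket.herror, socket.gaierror):
        return False

print(hostname_is_google("google-proxy-66-249-80-100.google.com"))  # True
```

The forward-confirmation step matters because anyone controlling reverse DNS for their own IPs could otherwise publish a PTR record ending in google.com.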
Status: Fixed
Labels:
Type-Enhancement
Priority-Medium
Component-Plugins