Skip to content
This repository has been archived by the owner on Apr 21, 2023. It is now read-only.

Blacklist Windows Browser for transcoding to webp #978

Closed
GoogleCodeExporter opened this issue Apr 6, 2015 · 33 comments
Closed

Blacklist Windows Browser for transcoding to webp #978

GoogleCodeExporter opened this issue Apr 6, 2015 · 33 comments

Comments

@GoogleCodeExporter
Copy link

Webp support is now defined using whitelist and blacklist of browsers' versions 
(\src\pagespeed\kernel\http\user_agent_matcher.cc)

It makes sense to rely on the "image/webp" header and do feature detection 
instead of browser detection.

supports_webp_ variable has specific type FastWildcardGroup that allows to 
return bool values using Match function that takes White/Black lists into 
account.

bool UserAgentMatcher::SupportsWebp function returns 
supports_webp_.Match(user_agent, false);

\src\net\instaweb\rewriter\device_properties.cc contains sample usage of 
HasValue function that is available for headers analysis:

accepts_webp_ =
               request_headers.HasValue(HttpAttributes::kAccept, kContentTypeWebp.mime_type()) ?
                kTrue : kFalse;

So, I think we can use HasValue function to return bool value from SupportsWebp 
function basing on request headers values.

There is also separate webp-related parameter supports_webp_lossless_alpha_ 
that is handled the same way as supports_webp_ now. Not sure is it possible to 
define required value of this parameter using current headers.

There is also the following code:

const char UserAgentMatcher::kTestUserAgentWebP[] = "test-user-agent-webp";
// Note that this must not contain the substring "webp".
const char UserAgentMatcher::kTestUserAgentNoWebP[] = "test-user-agent-no";

but as I can see these variables are not used. However, in case it would be 
necessary to update them, we can place headers values instead of the hardcoded 
strings. This should work as accepted headers of client that does not support 
webp wouldn't contain "webp" substring.

Original issue reported on code.google.com by evgeny.a...@akvelon.com on 29 Aug 2014 at 6:34

@GoogleCodeExporter
Copy link
Author

Hi.

Can you clarify what enhancement you are asking for?

Original comment by jmara...@google.com on 29 Aug 2014 at 12:06

  • Changed state: RequestClarification

@GoogleCodeExporter
Copy link
Author

Hi,

I'm asking for the following enhancement:
replace useragent-based detection if client supports "image/webp" format with 
client accept header-based detection.
This should allow to remove white/blacklists for webp and requirement to 
support these lists up to date for every new version of each useragent

Original comment by v-evg...@microsoft.com on 29 Aug 2014 at 12:19

@GoogleCodeExporter
Copy link
Author

Hi,

mod_pagespeed checks both user agent and "Accept: image/webp" header to 
determine whether the browser supports WebP. User agent is used because it 
describes the capability of browser more accurately. WebP is an evolving 
format. When WebP was introduced, it only supports lossy opaque format. Then it 
added support to lossless format and alpha format. Now it also supports 
animated format. Old browsers that were compiled with the old WebP library 
can't display the newer formats. By checking the user agent, we can know 
exactly which formats of WebP the browser support so we can rewrite the image 
to the best format.

On the other hand, some browsers started to send out "Accept: image/webp" 
header when they supported lossy opaque format, the first version of WebP. 
Therefore, to avoid sending WebP of newer version to browsers compiled with 
older WebP library, we have to only use lossy opaque format, if we only checks 
"Accept" header. Some browser, such as the builtin Android browser, doesn't 
send out "Accept: image/webp" header, although it does support lossy opaque 
WebP. Therefore, by only checking "Accept" header we can't take adavantage of 
full potential of WebP.

That being said, in some use cases (such as in place resource optimization) we 
can't rely on user agent and must use "Accept" header, so we need both.

In the code, merging WebP support from user agent and "Accept" header is in
"DeviceProperties::SupportsWebpRewrittenUrls()" in "device_properties.cc".

Variables "kTestUserAgentWebP" and "kTestUserAgentNoWebP" are used in tests, 
for example in "\src\net\instaweb\rewriter\in_place_rewrite_context_test.cc".

Original comment by hui...@google.com on 29 Aug 2014 at 12:58

@GoogleCodeExporter
Copy link
Author

[deleted comment]

@GoogleCodeExporter
Copy link
Author

Hi - can you give specific examples of IE UA strings that are being served webp 
images?  We want to make sure we're actually covering the real use cases here.  
I see an impractically-large regex on the cited web page, but no actual UA 
strings.

Original comment by jmaes...@google.com on 11 Sep 2014 at 5:25

@GoogleCodeExporter
Copy link
Author


Thanks for the question. I think we are less concerned about one specific 
string and more about future-proofing against database / pagespeed logic 
changes.

Changes made now to pagespeed now may not necessarily get deployed to sites for 
some time. This means that each time a new UA string is modified it won’t be 
lit up on the web until we not only have to wait for the change to get into the 
module, but then for that module to get updated / deployed across the web. This 
lags the web behind and fragments it for everyone. We’d love for this library 
to help prevent this type of future fragmentation by doing feature detection 
rather than UA sniffing. We are recommending that the logic be changed to first 
do feature detection (looking for the Accept header), and then use UA 
refinement to handle what types of WebP formats are accepted.

We understand that some older browsers that might support WebP don’t 
necessarily send the Accept header. If you served original content to those 
browsers, you still wouldn’t be inadvertently breaking the site’s content 
by misidentifying a browser and sending WebP content. On the other hand, if you 
keep the code/logic as it exists now, sites would be broken in new UAs until 
Pagespeed accepts a change and those changes are deployed. This would happen in 
a rather fragmented fashion.

If this looks good to you, we could submit a patch. 

Original comment by n.parash...@gmail.com on 12 Sep 2014 at 9:35

@GoogleCodeExporter
Copy link
Author

Is IE putting out a UA string with the word 'Chrome' in it?  It seems like 
there might a bunch of servers who see that string 'Chrome' in the UA and then 
proceed to assume the browser is Chrome.  Note also that Chrome didn't always 
support webp and we already have to match against specific versions of Chrome.

We agree that using Accept:image/webp is a better approach in principle but it 
doesn't fully address our need to discover which features in webp are supported 
(lossless, animation, etc).  It also excludes a lot of old-Android browsers 
(20% market share; still a bit more than Chrome, according to 
http://www.netmarketshare.com/) which support webp but don't have that Accept 
header.  That's likely a pretty big impact because those older phones are often 
running on older networks where the reduced bandwidth is a real benefit.

I understand the desire to have windows-phones get UA-sniffed as mobile, but 
maybe Safari would be a better bet for masquerading because it also doesn't 
support webp (or other various new technologies that Chrome supports).

However we'd be happy to add exclusion-logic for IE based on a regex or 
substring.  Can you share with us the actual UA string?  We can try to be 
construct the logic in as future-proof a fashion as possible.

Also note: we already do use Accept:image/webp for image-transcoding in 
contexts where we cannot change the URL (in-place resource optimization), and 
must serve webp responses to a jpeg URL, adding Vary:Accept so proxy-caches 
don't get confused.

Original comment by jmara...@google.com on 13 Sep 2014 at 8:36

@GoogleCodeExporter
Copy link
Author

Summary was: Webp support detection using headers

Also I want to clarify something.  We will not blacklist a specific long 
UA-string to avoid sending webp images to IE.  We will add a new pattern to 
this wildcard list: 
https://code.google.com/p/modpagespeed/source/browse/trunk/src/pagespeed/kernel/
http/user_agent_matcher.cc#129

And sorry for the confusion -- I called for a regex earlier but actually the 
UA-matching uses a fast wildcard algorithm that compiles a sequence of 
allow/disallow wildcards into an IR that can be executed very quickly on every 
request -- much faster than a long list of regexes.

All we need is a new pattern to add to kWebpBlacklist in that file, and we can 
get this in quick.  And it would be good to get this to us quickly.  I'm not 
sure if this will make our next release (1.9) but we should be able to get into 
the first patch, which is what would go to stable eventually and reach most of 
our users.

Original comment by jmara...@google.com on 13 Sep 2014 at 11:15

  • Changed title: Blacklist Windows Browser for transcoding to webp

@GoogleCodeExporter
Copy link
Author

Ah I was curious and found the above-mentioned site site: 
http://blogs.msdn.com/b/ie/archive/2014/07/31/the-mobile-web-should-just-work-fo
r-everyone.aspx has lots of spew in the comments, but nestled in there I found 
same data:

"
For those who have asked, here is the useragent string from the new Windows 
Phone 8.1 Update from August 4, 2014 on my Nokia Lumia 928:

Mozilla/5.0 (Mobile; Windows Phone 8.1; Android 4.0; ARM; Trident/7.0; Touch; 
rv:11.0; IEMobile/11.0; NOKIA; Lumia 928) like iPhone OS 7_0_3 Mac OS X 
AppleWebKit/537 (KHTML, like Gecko) Mobile Safari/537
"

Assuming this is the right UA it's not Chrome that's being masqueraded, it's 
Android.  IMO that's worse, because a site could be sending you to the "Google 
Play" for the app, etc.  Whatever.

But I think to solve this problem for MPS we just have to put "*Windows Phone*" 
into the webp blacklist.  Easy enough.

Original comment by jmara...@google.com on 14 Sep 2014 at 2:18

  • Changed state: Accepted

@GoogleCodeExporter
Copy link
Author

Window Phone 8.1, which has been in use since August 3, 2014, includes "Android 
4.*" in its user agent, in order to get HTML 5 support. Since Android 4.0+ 
supports WebP, we transcode images to this format. I've started a change for 
excluding "Windows Phone" from WebP supported list.
[1] 
http://www.neowin.net/news/ie11-fakes-user-agent-to-fool-gmail-in-windows-phone-
81-gdr1-update
[2] http://msdn.microsoft.com/en-us/library/ie/hh869301(v=vs.85).aspx

Original comment by hui...@google.com on 15 Sep 2014 at 1:08

@GoogleCodeExporter
Copy link
Author

Fixed in https://code.google.com/p/modpagespeed/source/detail?r=4247

Original comment by hui...@google.com on 15 Sep 2014 at 4:29

  • Changed state: Fixed

@GoogleCodeExporter
Copy link
Author

I think it impacts not only mobile browsers, but also desktop browsers. Is it 
ok to update the code such that WebP is "NOT" served to any UA that contains 
both Chrome "and" Edge tokens ?

Original comment by n.parash...@gmail.com on 16 Sep 2014 at 5:51

@GoogleCodeExporter
Copy link
Author

I don't think 'Chrome' was in the mobile UA.  The problem was the "Android 
4.0".  I just checked desktop IE 11 and the UA is "Mozilla/5.0 (Windows NT 6.1; 
Win64; x64; Trident/7.0; rv:11.0) like Gecko" so I don't think we'll have a 
problem there.

Original comment by jmara...@google.com on 16 Sep 2014 at 5:58

@GoogleCodeExporter
Copy link
Author

According MSDN, IE for desktop doesn't have "Chrome"  nor "Android" in the user 
agent, so they should be confused. What made you think it's a problem?

http://msdn.microsoft.com/en-us/library/ie/hh869301(v=vs.85).aspx

Original comment by hui...@google.com on 16 Sep 2014 at 5:58

@GoogleCodeExporter
Copy link
Author

The next version of Internet Explorer may have "Chrome" and "Edge" to make more 
websites compatible. 

Original comment by n.parash...@gmail.com on 16 Sep 2014 at 6:27

@GoogleCodeExporter
Copy link
Author

n.parashuram do you work on IE?  How do you know that?  Is there an MSDN 
reference for that plan?

Original comment by jmara...@google.com on 16 Sep 2014 at 6:29

@GoogleCodeExporter
Copy link
Author

I work for Microsoft Open Technologies and work closely with the IE team. The 
IE team is considering this and it is the current plan. There is no MSDN 
article on this yet. 

Original comment by n.parash...@gmail.com on 16 Sep 2014 at 6:55

@GoogleCodeExporter
Copy link
Author

OK can you provide the full UA of the proposed change to desktop IE?  What 
sites don't work on desktop IE now?

blacklisting "*Edge*" from being webp-compatible seems too general.  I'd rather 
blacklist on "* Trident/*" or something like that.

Original comment by jmara...@google.com on 16 Sep 2014 at 7:08

@GoogleCodeExporter
Copy link
Author

The full UA has not been finalized yet. This is the change that I was referring 
to. 

Original comment by n.parash...@gmail.com on 22 Sep 2014 at 5:58

Attachments:

@GoogleCodeExporter
Copy link
Author

It is weird to me that the word Edge implies a chrome that cannot render webp.  
Surely there must be something else in the purpose UA that more obviously tells 
us that this is not the chrome we were looking for...

Original comment by jmara...@gmail.com on 22 Sep 2014 at 6:04

@GoogleCodeExporter
Copy link
Author

We aren’t looking for a Chrome, but “Chrome” will be in the UA string. 
Much the same way that Chrome UAs includes “AppleWebKit”, “Safari” and 
“Gecko” tokens the IE UA will likely include a Chrome token. The supplied 
patch is looking for UAs that contain both Chrome and Edge tokens. We could 
reduce this to simply look for “*Edge/*”

Original comment by n.parash...@gmail.com on 22 Sep 2014 at 8:54

@GoogleCodeExporter
Copy link
Author

'Edge' seems too generic and it's not obvious that won't appear in some non-IE 
browser.  Will 'Trident' not be in the UA string?  I'd really like to see the 
final UA string.

Are you sure that 'Chrome' will appear in the desktop UA for IE11?  It does not 
appear (as far as we can tell) in the new mobile IE UA.

Original comment by jmara...@google.com on 22 Sep 2014 at 8:59

@GoogleCodeExporter
Copy link
Author

I've been watching this thread for a bit. I hope you'll forgive me for not 
chiming in sooner as I was hoping to better digest the problem space.

I'm a senior PM on the performance team on IE. I don't own the UA string 
decisions nor do I have the final result, but I have seen multiple iterations 
of the string in various forms. The end goal, of course, is interoperability, 
as indicated by the MSDN blog post mentioned earlier (here again for reference: 
http://blogs.msdn.com/b/ie/archive/2014/07/31/the-mobile-web-should-just-work-fo
r-everyone.aspx). The string is still being tested with multiple web runs to 
see which tokens return the best interoperable content for the modern web.

As I understand your view, to be most precise, mod_pagespeed relies not just on 
feature detection with the Accept header, but also on UA sniffing for 
precision. I can certainly appreciate that approach considering that older 
versions of Chrome don't support newer versions of WebP. 

This approach certainly works if UAs remain generally the same for perpetuity 
and as long as new UA strings or feature variations aren't introduced. The 
Achilles heal is in update deployments of the module. With over 100,000 stated 
deployments of modpagespeed, changes to any browser's UA string must not only 
make it into the blacklist/whitelist but then must wait for deployments to 
catch up. This is very similar to the type of slow uptake we see with 
client-side JavaScript libraries/frameworks. I'm afraid the module currently 
takes content that would otherwise work cross-platform, and makes it 
Google-centric based on UA strings.

"'Edge' seems too generic and it's not obvious that won't appear in some non-IE 
browser." That may be true, but that's true for any of the strings that you are 
already testing for. But it seems that even if not generic terms, it isn't 
clear that those won't show up in non-IE browsers either. As already indicated, 
"Mozilla", "Gecko", and "Safari" show up in Chrome's UA string. So, I'm not 
sure that limiting what the IE team decides they want to put in their UA string 
is going to prevent it from showing up in another UA string.  I'm also not sure 
that Google intends to be the gatekeeper for UA strings for all vendors. 

In this case, I think the MS Open Tech guys are simply trying to be good 
stewards in helping modpagespeed get updated ahead of the curve. What is the 
final UA string? I don't know and I'm not certain that the UA folks know 
either. 

Considering the statement above, "All we need is a new pattern to add to 
kWebpBlacklist in that file, and we can get this in quick.  And it would be 
good to get this to us quickly." I'd like to confirm the pattern stated already 
is fairly stable. It is is very similar to what we've seen with Chrome -- 
basically include the kitchen sink of UA tokens -- followed by "Edge/*".

Again, I don't know every token or the order of those tokens, but the pattern 
will definitely include Chrome* and Edge*. So I think the pattern you want will 
be something like Chrome/*Edge/*

This should be all you need to catch the new version.  

I hope we can work on a better long-term solution that would depend more on 
Accept headers as the primary means for transcoding, and do the refinement 
based on UA string. In this way, you'll only ever serve WebP content to those 
that explicitly state they support it, and you can refine what version of WebP 
to serve based on the UA string after the fact. This still has it's issues, but 
would at least prevent the module from improperly transcoding content for 
browsers that could otherwise consume the original content.

Thanks again for your help!

Original comment by tob...@microsoft.com on 22 Sep 2014 at 11:03

@GoogleCodeExporter
Copy link
Author

Thanks for the perspective.  Actually I am not concerned with old Chrome, since 
Chrome auto-updates the number of old Chromes is increasingly small, and 
serving them jpegs when even if they could take webp is fine.

The problem is old Android native browsers, and the old stock Android-browser 
doesn't auto-update.  Lots of people use old Android phones on low bandwidth 
connections so they are exactly the people that would benefit most from a more 
compact file format.  So our webp-detection for Android wants to stay as is 
(modulo the "Windows Phone" blacklist).

But I think we might be able to change our webp-capable-detection for Chrome 
pretty much as you suggest: look first for Accept:image/webp and then at the UA 
to determine which webp features are available.

Hoewver, mod_pagespeed is not the only web-performance-optimization solution.  
I'm not sure which other ones try to transcode to webp.  There is also 
Strangeloop (Radware), Akamai Aqua Ion, Torbit (now part of Walmart), Yotta, 
Acceloweb (Limelight) and others.  In addition to that, individual sites may 
also add custom UA-sniffing logic.  By masquerading as Chrome it seems like IE 
is liable to break more than just MPS sites because there are other Chrome 
features that sites might use.  And most such features actually don't have 
"Accept" headers.


Your point is well-taken though: mod_pagespeed may be one of the widest 
deployed solutions that cannot be instantly updated, whereas a large site 
sending webp to Chrome via their own custom logic will probably find out 
quickly that their site is broken when a new version of IE11 comes out, and 
push a fix.


But for the record, 100k sites is understated by openly available estimates, 
such as http://trends.builtwith.com/Server/mod_pagespeed.  

And I am still curious why it's necessary for desktop IE to masquerade as 
Chrome.  I can understand mobile IE masquerading as Android, Chrome, or Safari 
because you want the mobile versions of sites.  But what do you get out of 
masquerading as Chrome on desktop?

Original comment by jmara...@google.com on 23 Sep 2014 at 12:12

@GoogleCodeExporter
Copy link
Author

No problem, and thanks for the clarification on Chrome vs. Android browsers 
being the target. That makes sense. 

I'm wondering if, given that these are older browsers with known versions, if 
you couldn't be more specific about the whitelist of Android browsers that 
accept WebP rather than relying on the blacklist to catch those that don't. I 
haven't personally haven't looked at your code yet to see how this works, so 
maybe you're doing that already. I'm happy to take a look.

We're aware of the other transcoding solutions and I'm sure there are efforts 
to reach out to them as well. I just spoke at Velocity Conference and had the 
opportunity to discuss this personally with a few performance solution 
providers. We are doing our best to be good citizens by working with folks such 
as yourself ahead of time. You being one of the largest, we certainly want to 
work closely with you to offer any support we can to make sure you're aware of 
our changes and how they'll affect your logic.

As I said, I'm a performance guy on the team, and I'm mainly interested in this 
module mainly from a performance perspective. The reasoning behind their UA 
string strategy isn't something I'm spending much time in. I just know that the 
team has been testing and crawling the web with various strings in an attempt 
to get even more desktop interoperability. 

In either case, we sincerely do appreciate your willingness to help this move 
forward. Will you be able to accept the patch with the pattern we provided as 
is or do you need something else from us?

Original comment by tob...@microsoft.com on 23 Sep 2014 at 12:47

@GoogleCodeExporter
Copy link
Author

Like I said, I don't like matching on "/Edge" as it is not clear where that 
comes from, and it's not in the IE string today.  But I think n.parashuram's 
earlier idea of combining Accept with UA for Chrome, but matching on UA only 
for Android will work and solve the problem as beset as we can tell.

I would still really like to know why this masquerading for desktop IE 11 is 
being considered at all.  Don't all web sites work reasonably well in IE 11, 
since it has very high market share on desktop?

I'm re-opening this bug so we can track the filtering of Chrome/webp detection 
on Accept.

Original comment by jmara...@google.com on 23 Sep 2014 at 3:42

  • Changed state: Accepted

@GoogleCodeExporter
Copy link
Author

Ah, sorry if I misunderstood. 

Your stated approach seems good. I'm imagining that this change is slightly 
bigger than the previous change and would take longer to update. Given that, do 
you have an ETA for the fix? 

I sincerely appreciate your expertise here. I owe you a growler of beer!

Original comment by tobin.ti...@gmail.com on 23 Sep 2014 at 4:45

@GoogleCodeExporter
Copy link
Author

We recently released MPS 1.9.32.1 (which does not have even the windows-mobile 
fix yet).  Our plan is to do a patch release 19.9.32.2 which will have that 
fix, the desktop fix, and also some other fixes.

I don't have a date for that patch release, but we will certainly do the patch 
release and ensure it looks good before marking it 'stable', which is where 
most of the users will benefit.

I would still really like to know why this masquerading for desktop IE 11 is 
being considered at all.  Don't all web sites work reasonably well in IE 11, 
since it has very high market share on desktop?

Original comment by jmara...@google.com on 23 Sep 2014 at 5:02

@GoogleCodeExporter
Copy link
Author

OK. Thanks for the info.

"I would still really like to know why this masquerading for desktop IE 11 is 
being considered at all.  Don't all web sites work reasonably well in IE 11, 
since it has very high market share on desktop?"

Yeah, I'm unfortunately not on that team and I'm not privy to why they are 
doing what they do with the UA. I would guess it's for the same reasons that 
Chrome includes "Safari" and "Gecko" in their UA string. To be honest, I'm not 
even sure why that is either. 

Original comment by tobin.ti...@gmail.com on 23 Sep 2014 at 5:09

@GoogleCodeExporter
Copy link
Author

I have a patch that does what I claimed -- it doesn't do anything special for 
IE but it looks for an accept:image/webp header if the UA is Chrome (not fully 
trusting that the word 'Chrome' in the UA is sufficient).  If the UA is Android 
it doesn't need that (and won't get it), but we'd send the Android browser webp 
anyway.

It has not been through the full testing suite or code review yet but I thought 
I'd expose it early to see if there are comments externally.


I still think it's a bad idea for other browsers to masquerade as Chrome, as 
there are other crazy high impact things we think we can get away when we think 
we are talking to Chrome, such as defer_javascript.  Tobin: if you can relate 
our concerns to the team controlling the UA I'd appreciate it.

Original comment by jmara...@google.com on 24 Sep 2014 at 9:17

Attachments:

@GoogleCodeExporter
Copy link
Author

Original comment by jmara...@google.com on 24 Sep 2014 at 9:22

Attachments:

@GoogleCodeExporter
Copy link
Author

Thanks both for the advice and for the patch. I'll certainly pass this along to 
the app team. 

Original comment by tob...@microsoft.com on 24 Sep 2014 at 10:06

@GoogleCodeExporter
Copy link
Author

https://code.google.com/p/modpagespeed/source/detail?r=4265

Original comment by jmara...@google.com on 25 Sep 2014 at 5:21

  • Changed state: Fixed
  • Added labels: Milestone-v32, release-note

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

1 participant