This repository has been archived by the owner on Apr 21, 2023. It is now read-only.

Allow caching HTML files #232

Closed
GoogleCodeExporter opened this issue Apr 6, 2015 · 29 comments

Comments

@GoogleCodeExporter

I have configured my apache server to use specific Cache-Control values for 
certain files.

for example, for my html files, i need to have:

Cache-Control: private, max-age=3600

but when i enable mod_pagespeed, it changes my Cache-Control value!

without mod_pagespeed:

$ curl -I http://www.loupiote.com/photos/833171367.shtml
HTTP/1.1 200 OK
Date: Thu, 10 Mar 2011 07:18:26 GMT
Server: Apache/2.2.9 (Fedora)
Accept-Ranges: bytes
Cache-Control: private, max-age=3600
Expires: Thu, 10 Mar 2011 08:18:26 GMT
Vary: Accept-Encoding
Connection: close
Content-Type: text/html

with mod_pagespeed:

$ curl -I http://www.loupiote.com/photos/833171367.shtml
HTTP/1.1 200 OK
Date: Thu, 10 Mar 2011 07:08:48 GMT
Server: Apache/2.2.9 (Fedora)
Accept-Ranges: bytes
X-Mod-Pagespeed: 0.9.15.3-404
Cache-Control: max-age=0, no-cache, no-store
Vary: Accept-Encoding
Connection: close
Content-Type: text/html

is there a way to tell mod_pagespeed to keep my Cache-Control values 
(configured in my apache config file) untouched?

note: i do not use extend_cache and i do not use the CoreFilters.
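
To see exactly which headers differ between the two `curl -I` dumps above, they can be compared mechanically. The following is a minimal Python sketch: `parse_headers` and `diff_headers` are hypothetical helper names (not part of curl or mod_pagespeed), and the sample data is copied from the two dumps in this report.

```python
WITHOUT_PAGESPEED = """\
HTTP/1.1 200 OK
Date: Thu, 10 Mar 2011 07:18:26 GMT
Server: Apache/2.2.9 (Fedora)
Accept-Ranges: bytes
Cache-Control: private, max-age=3600
Expires: Thu, 10 Mar 2011 08:18:26 GMT
Vary: Accept-Encoding
Connection: close
Content-Type: text/html
"""

WITH_PAGESPEED = """\
HTTP/1.1 200 OK
Date: Thu, 10 Mar 2011 07:08:48 GMT
Server: Apache/2.2.9 (Fedora)
Accept-Ranges: bytes
X-Mod-Pagespeed: 0.9.15.3-404
Cache-Control: max-age=0, no-cache, no-store
Vary: Accept-Encoding
Connection: close
Content-Type: text/html
"""


def parse_headers(raw):
    """Parse a `curl -I` dump into a dict keyed by lowercase header name."""
    headers = {}
    for line in raw.strip().splitlines()[1:]:  # skip the status line
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def diff_headers(before, after):
    """Return {name: (before_value, after_value)} for headers that differ."""
    changed = {}
    for name in sorted(set(before) | set(after)):
        if before.get(name) != after.get(name):
            changed[name] = (before.get(name), after.get(name))
    return changed


changed = diff_headers(parse_headers(WITHOUT_PAGESPEED),
                       parse_headers(WITH_PAGESPEED))
for name, (old, new) in changed.items():
    print(f"{name}: {old!r} -> {new!r}")
```

A value of `None` on one side means the header is absent from that response; the `Date` header differs only because the two requests were made at different times.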

What version of the product are you using (please check X-Mod-Pagespeed
header)?

X-Mod-Pagespeed: 0.9.15.3-404

On what operating system?

linux

Which version of Apache?

Apache/2.2.9

Which MPM?

what does MPM mean???

Please provide any additional information below, especially a URL or an
HTML file that exhibits the problem.

sorry, no test page, unfortunately due to problems with SSI (and this cache 
problem) i have to turn pagespeed off in my config :(

Original issue reported on code.google.com by loupi...@gmail.com on 10 Mar 2011 at 7:27

@GoogleCodeExporter

This is expected behavior. In general, the HTML produced by mod_pagespeed is 
not cacheable.

You say that you do not use extend_cache or CoreFilters. What do you use?

Original comment by sligocki@google.com on 10 Mar 2011 at 3:46

  • Changed title: Allow caching HTML files
  • Added labels: Type-Enhancement
  • Removed labels: Type-Defect

@GoogleCodeExporter

i need to set html cache to "private" because my HTML pages use SSI (server 
side include), and they generate different content depending on the client 
machine that accesses the pages.  so the pages can be cached on the client 
machine side, but not on any proxy.

so i set my Cache-Control header to "Cache-Control: private, max-age=3600"

and i would appreciate if mod_pagespeed did not tamper-with or modify my 
Cache-Control instructions, especially since it does not cache the html pages.

is there a way to tell mod_pagespeed exactly what Cache-Control header it 
should use? or better, to tell it to not change the Cache-Control that is 
normally generated by apache for the files that it filters?

Original comment by loupi...@gmail.com on 10 Mar 2011 at 4:31

@GoogleCodeExporter

There is currently no way to change how mod_pagespeed sets caching headers on 
HTML, sorry.

Please note that we set the headers to no-cache, so public proxies will not 
cache your HTML. The only difference is that individual users will also not 
cache your HTML.

I understand that it is frustrating to have mod_pagespeed override your 
explicit cache settings. I am setting this as an enhancement to make this 
possible, but we have built mod_pagespeed on the premise that HTML pages won't 
be cached, and so we need to evaluate how we could give you more control over 
your cache settings while also not breaking pages.

Thanks,
-Shawn

Original comment by sligocki@google.com on 10 Mar 2011 at 4:42

@GoogleCodeExporter

A little more background on why we turn off caching altogether for HTML that is 
rewritten by mod_pagespeed.

One of mod_pagespeed's main values is that it extends the cache lifetime of 
resources, without compromising a site owner's ability to change them.  To do 
this it rewrites the URL references to the resources with a signature of their 
content.

So if you had foo.css with a cache lifetime of 10 minutes, then someone who 
revisits your site in 20 minutes will have to have at least one new HTTP 
transaction to re-fetch or re-validate the content.  But mod_pagespeed will 
rewrite the reference in HTML to be 
   foo.css.pagespeed.SIGNATURE1.cc.css
and we let that be cached for 1 year.  But let's say that you edit your .css 
file.  HTML that we rewrite will now have:
   foo.css.pagespeed.SIGNATURE2.cc.css
The foo.css.pagespeed.SIGNATURE1.cc.css sitting in the browser cache will not 
do any harm -- it won't be referenced any more and your browser will eventually 
evict it in favor of more useful content.  But if you go 2 weeks or 5 months 
without actually touching your .css file then your users will not need to 
reload or revalidate it unless it falls out of their cache.
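
The naming idea described above can be sketched in a few lines of Python. This is an illustration only: the hash function (MD5), the 10-character truncation, and the helper name `versioned_name` are assumptions, and the real signature algorithm and URL encoding used by mod_pagespeed are not described in this thread.

```python
import hashlib


def versioned_name(name, content, filter_id="cc"):
    """Build a URL of the form NAME.pagespeed.SIGNATURE.FILTER.EXT.

    The signature depends only on the resource content, so the URL
    changes exactly when the content changes.
    """
    ext = name.rsplit(".", 1)[-1]
    # Assumption: a truncated MD5 hex digest stands in for the signature.
    signature = hashlib.md5(content).hexdigest()[:10]
    return f"{name}.pagespeed.{signature}.{filter_id}.{ext}"


old_url = versioned_name("foo.css", b"body { color: red }")
new_url = versioned_name("foo.css", b"body { color: blue }")
print(old_url)
print(new_url)
```

Because an edited file gets a new URL, the old URL can safely carry a one-year cache lifetime: browsers simply stop requesting it once the HTML refers to the new name.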

But there's a potential flaw in this.  If we were to, say, allow HTML to be 
cached for 1 hour, then you would not be able to effectively change your CSS 
files for 1 hour, even though you had specified a 10 minute TTL (time-to-live) 
for your CSS.  That would be a violation of the user's intent.

The simplest solution to this problem is for us to prevent caching HTML files 
altogether.  A solution that would also be correct is to allow caching HTML 
files for the minimum TTL of the HTML and all its resources.  That's certainly 
possible for us to do; we just haven't done it yet.
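
As a rough sketch of that second approach (not how mod_pagespeed implements it; `effective_html_ttl` is a hypothetical name), the HTML lifetime would be capped by the shortest original TTL among the resources it references:

```python
def effective_html_ttl(html_ttl, resource_ttls):
    """Return the longest time (in seconds) the rewritten HTML may be cached.

    html_ttl is the TTL the site owner set on the HTML itself;
    resource_ttls are the origin TTLs of the resources whose URLs were
    rewritten into it (not the 1-year TTL of the rewritten URLs).
    """
    return min([html_ttl, *resource_ttls])


# HTML cacheable for 1 hour, foo.css for 10 minutes:
print(effective_html_ttl(3600, [600]))  # prints 600
```

With this rule, a CSS edit is visible to users within the 10 minutes the site owner asked for, even though the HTML itself was allowed an hour.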

Original comment by jmara...@google.com on 10 Mar 2011 at 4:55

@GoogleCodeExporter

the issue for me is that my webpages are optimized for mobile (e.g. iPhone, 
Android) when accessed by a mobile device (i.e. fewer asynchronous javascript 
modules are loaded, making the pages faster to load).

because mobile networks are slower, i want to make sure that the pages are 
cached on the client if there is available cache space, to improve navigation 
speed, e.g. if the user goes to another page and then returns to this page, or 
hits the "back" button.

so i am afraid that using mod_pagespeed would actually reduce the performance 
of the site when it is accessed by a mobile device, because of the caching 
situation (i.e. not allowing devices to cache locally).

i guess one possible workaround would be to rebuild all the pagespeed-filtered 
pages using a script calling curl (10,000 or so pages), and store them in a 
different folder on my server, and serve myself my own cached 
pagespeed-filtered pages with my own cache control correctly set-up.

that's really a lot of work just because mod_pagespeed tampers with my 
cache-control.

another issue that would arise in the case of that workaround is that i use 
SSI, and the SSI that i use generates different pages depending on the client. 
so in order for it to work, there should be a way to tell mod_pagespeed to 
leave all the SSI special comments (<!--# ... -->) untouched, and to tell my 
server to not process the SSI when i access the pages via curl to build my own 
cache. at this point i don't think mod_pagespeed has an option to leave the SSI 
special comments untouched while removing all the other comments from the page.

Original comment by loupi...@gmail.com on 10 Mar 2011 at 4:57

@GoogleCodeExporter

thanks. now i understand the rationale for not caching the html pages (because 
of the foo.css uri that has been changed to foo.css.pagespeed.SIGNATURE2.cc.css 
uri in the html files).

but it would definitely be better if mod_pagespeed could allow some caching of 
the html on the client side, especially with the increasing use of mobile 
(slower / more expensive network connection, but lots of caching memory 
available on the client).

Original comment by loupi...@gmail.com on 10 Mar 2011 at 5:06

@GoogleCodeExporter

This issue will help us track our progress toward allowing HTML to be cached 
for the minimum of the TTLs for all the rewritten resources on it.

I think when we resolve this your web-site will behave like you want, which 
(correct me if I'm wrong) will have your HTML and your resources all 
consistently cacheable for 1 hour.

Original comment by jmara...@google.com on 10 Mar 2011 at 5:25

@GoogleCodeExporter

> I think when we resolve this your web-site will behave like you want, which 
> (correct me if I'm wrong) will have your HTML and your resources all 
> consistently cacheable for 1 hour.

yes, typically that's what i was expecting. a longer caching period on the 
client (e.g. one day) would be possible but i don't think it would make much of 
a performance difference, because of the low attention span of the users (they 
usually spend just a few minutes on a site).

and because my pages may contain user-generated comments, i don't want them to 
be cached longer than say, a day, so that the fresh comments will show.

Original comment by loupi...@gmail.com on 10 Mar 2011 at 5:29

@GoogleCodeExporter

Sites that use http accelerators such as Varnish or tools like mod_cache would 
also benefit from retaining the application's cache-control headers.

For example, we send out short (~30 seconds) cache-control headers from our php 
application for all text/html content types so that the large majority of 
requests are served from caches in-front of our application servers.

mod_pagespeed is a very useful tool for optimising the page before it hits the 
caches by inlining and minifying the css & js and collapsing whitespace.

The flow of the response to any request follows this chain,

 (i) application -> (ii) mod_pagespeed -> (iii) mod_diskcache -> (iv) client

... mod_pagespeed is preventing (iii) from happening because it overrides 
the application's cache-control headers, and it is also preventing private 
caching on the client end (iv).

> A solution that would also be correct is to allow caching HTML files for
> the minimum TTL of the HTML and all its resources.

It would be great just to have an 'obey application cache-control headers' 
option. 

Pages that are inlining all the css and js don't have to worry about the issue 
you describe above whereby a stale HTML page prevents the updating of a CSS 
asset, especially not in our circumstances where our max-age is set to a short 
amount of time.
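
To make the effect on each hop of the chain concrete, here is a simplified Python sketch of how a cache might decide whether a response can be reused without contacting the server again. It is an assumption-laden illustration, not the logic of mod_diskcache, Varnish, or any browser: `parse_cache_control` and `fresh_reuse` are hypothetical names, and directives such as `s-maxage` and the `Expires` header are deliberately ignored.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into {directive: argument-or-None}."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') or None
    return directives


def fresh_reuse(value):
    """Say which kinds of cache may reuse the response without revalidating."""
    d = parse_cache_control(value)
    max_age = int(d.get("max-age") or 0)
    reusable = "no-store" not in d and "no-cache" not in d and max_age > 0
    return {
        "browser": reusable,                       # private (client) cache
        "shared": reusable and "private" not in d  # proxy / accelerator
    }


print(fresh_reuse("private, max-age=3600"))
print(fresh_reuse("max-age=30"))
print(fresh_reuse("max-age=0, no-cache, no-store"))
```

Under these simplified rules, the header that mod_pagespeed emits rules out reuse at both the accelerator (iii) and the client (iv), whereas a short application-set `max-age` would allow both.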

Original comment by commuter...@gmail.com on 8 May 2011 at 8:51

@GoogleCodeExporter

Any update on this issue?  I'm considering turning mod_pagespeed off because of 
its habit of setting Cache-Control to max-age=0 on pages that it hasn't altered.

Original comment by fumeoftheday on 21 Oct 2011 at 10:28

@GoogleCodeExporter

Actually, we've made a fair bit of progress on this issue in our trunk, but 
I think we need a few more tweaks to expose HTML caching to users.

I don't have a date for a release that incorporates the new feature(s) 
yet.  Stay tuned.

Original comment by jmara...@google.com on 21 Oct 2011 at 10:34

@GoogleCodeExporter

Original comment by jmara...@google.com on 21 Oct 2011 at 10:34

@GoogleCodeExporter

hi .. is there any update available for html caching ...thanks .. JJ

Original comment by jimyjo...@gmail.com on 6 Dec 2011 at 5:08

@GoogleCodeExporter

We have some functionality to allow users to cache HTML. It has not been 
exposed via configuration yet or tested.

Original comment by sligocki@google.com on 6 Dec 2011 at 6:06

  • Changed state: Accepted

@GoogleCodeExporter

because of that, there is a problem when the user clicks the back button.
if the previous request was:
GET -> the browser resends the request to the server.
POST -> the browser shows a message:
1. IE 8: Page Expires
2. Chrome 17: Confirm Form Resubmission
3. Firefox 10: This document is no longer available

the user cannot see the previous page any more!!!

Original comment by bill2004...@gmail.com on 4 Feb 2012 at 5:28

@GoogleCodeExporter

Hi Bill, your issue is fixed by default in trunk and will be fixed in the next 
release (coming out as we speak).

This is actually not because we use no-cache for headers, but because we used 
to use no-store.

If you can build from source, building from trunk (or tag 0.10.21.2) should 
work. Otherwise, you can wait for the release. Please let me know if that fixes 
the problem.

Original comment by sligocki@google.com on 6 Feb 2012 at 7:29

@GoogleCodeExporter

hi sligocki, it still does not seem to work. does "no-cache" still need to be removed?
"Cache-Control: private, max-age=0" does work.

Original comment by bill2004...@gmail.com on 16 Feb 2012 at 1:16

@GoogleCodeExporter

Thanks for the update Bill,

What browser do you detect this in? I'll run some tests and if this is the 
case, I think we can move to using "Cache-Control: private, max-age=0".

Original comment by sligocki@google.com on 16 Feb 2012 at 2:03

@GoogleCodeExporter

as in comment 16:
POST -> the browser shows a message:
1. IE 8: Page Expires
2. Chrome 17: Confirm Form Resubmission
3. Firefox 10: This document is no longer available

Original comment by bill2004...@gmail.com on 16 Feb 2012 at 4:48

@GoogleCodeExporter

Ah, I see, this only happens for POSTs now? I'll look into it.

Original comment by sligocki@google.com on 16 Feb 2012 at 5:47

@GoogleCodeExporter

yes. for example:
1. post a form
2. click a link.
3. click back

in step 3, it should ask "are you sure resend post?"
however if cache control has "no-cache" or "no-store", we will get that error 
msg.

Original comment by bill2004...@gmail.com on 17 Feb 2012 at 3:54

@GoogleCodeExporter

Hi Bill, I'm separating your comment out into a separate bug, 394; it is a 
different problem than the OP's.

Original comment by sligocki@google.com on 27 Feb 2012 at 6:30

@GoogleCodeExporter

[deleted comment]

@GoogleCodeExporter

Back to the original issue:

We recently added the config option:

ModPagespeedModifyCachingHeaders off

which will tell mod_pagespeed not to change HTML caching headers. See 
documentation: 
http://code.google.com/speed/page-speed/docs/install.html#ModifyCachingHeaders

Note: We do not suggest you turn this option off. It breaks mod_pagespeed's 
caching assumptions and can lead to unoptimized HTML being served from proxy 
caches set up in front of the server. If you do turn it off, we suggest that 
you do not set long cache lifetimes on HTML, or users may receive stale or 
unoptimized content.

Original comment by sligocki@google.com on 27 Feb 2012 at 6:37

@GoogleCodeExporter

This issue sounds like the general issue of caching:  what if I change my 
content?

Isn't this why you run HEAD / HTTP/2.0 and get a Last-Modified header?  I mean 
even static page content can change all the time.

Finally, we use Varnish here with a 60 second cache lifetime.  We can't take 
the bandwidth here, the Varnish server is off-site so it handles all the 
bandwidth.  Effectively we handle the bandwidth of number of accesses to unique 
URLs per minute, rather than number of total accesses per second.  As you say, 
the caching headers are set to not cache for intermediary proxies; this is 
important because intermediary proxies are kind of critical.

It's also notable that mod_pagespeed allows the insertion of Google Analytics 
code automatically, which is a use case in itself.

Original comment by john.r.m...@gmail.com on 15 Oct 2012 at 12:31

@GoogleCodeExporter

So while ModPagespeedModifyCachingHeaders off seems to restore the 
Cache-Control and Expires headers properly, it appears that Last-Modified still 
gets stripped. Is this intentional?

Please see the following URLs to see the difference:
http://status.stackstat.us/
vs
http://status.stackstat.us/?ModPagespeed=off

Original comment by ta...@reddyemail.com on 21 Mar 2013 at 4:29

@GoogleCodeExporter

You are quite right, Tarun.  We should not be stripping Last-Modified with 
ModPagespeedModifyCachingHeaders off. 

Moving to separate issue: 
https://code.google.com/p/modpagespeed/issues/detail?id=652

Original comment by jmara...@google.com on 21 Mar 2013 at 12:37

@GoogleCodeExporter

This long-standing request is being actively worked on.  Assigning to Anupama 
for tracking.

Original comment by jmara...@google.com on 25 Jun 2013 at 2:32

@GoogleCodeExporter

This was fixed in 1.6.  Please see 
https://developers.google.com/speed/pagespeed/module/downstream-caching

Original comment by jmara...@google.com on 2 Oct 2013 at 2:49

  • Changed state: Fixed
