This repository has been archived by the owner on Apr 21, 2023. It is now read-only.

Wrong document URL when mod_rewrite used #234

Closed
GoogleCodeExporter opened this issue Apr 6, 2015 · 23 comments

Comments

@GoogleCodeExporter

What steps will reproduce the problem?

Just upgraded to the latest binary (0.9.16.3-r534)

What is the expected output? What do you see instead?

Style sheet returns "NOT FOUND" and all resources are being sourced from the 
wrong address.
The leading '/' or 'http:' has been removed from all links/references in the HTML.

What version of the product are you using (please check X-Mod-Pagespeed
header)?

0.9.16.3-r534

On what operating system?
Debian/GNU Linux

Which version of Apache?
2.2.16

Please provide any additional information below, especially a URL or an
HTML file that exhibits the problem.

This server hosts dozens of websites, but only one is showing errors:
e.g. http://www.the-art-of-web.com/php/

Possible conflict with mod_rewrite rule:

Request URL: http://www.the-art-of-web.com/php/ (broken)
RewriteRule ^([a-z]+)/$ menu_$1.html
Actual URL: http://www.the-art-of-web.com/menu_php.html (works)

But other websites using similar (not identical) rewrite rules are not showing 
errors - so far.

Original issue reported on code.google.com by dun...@chirp.com.au on 10 Mar 2011 at 11:13

@GoogleCodeExporter

Hi -- can you try this workaround in your pagespeed.conf file to see if it 
solves the problem?

  ModPagespeedDisableFilters trim_urls

This is a new filter that we added to the core-filter set after extensive 
validation.  It's possible we missed a corner case & we'll check out your site 
to figure out what we missed.

Original comment by jmara...@google.com on 10 Mar 2011 at 2:54

@GoogleCodeExporter

Yes, it's working now with that filter disabled.  Thanks :)

Original comment by dun...@chirp.com.au on 10 Mar 2011 at 2:58

@GoogleCodeExporter

I've attached before and after files. It looks like mod_pagespeed thinks the 
document URL is http://www.the-art-of-web.com/menu_php.html rather than 
http://www.the-art-of-web.com/php/ because of the rewrite rule.
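To illustrate why the wrong document URL breaks trimmed links, here is a minimal sketch in plain C. It assumes a stylesheet at the hypothetical path /css/style.css whose leading '/' has been trimmed, and it uses a simplified helper, resolve_relative() (not part of mod_pagespeed or Apache), that mimics how a browser resolves a relative reference against the directory of the page URL.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustration only: resolve a relative reference `rel` against the path
 * of a base URL by keeping everything in `base_path` up to and including
 * the last '/', then appending `rel`. No "../" handling.
 * Returns 0 on success, -1 if the result does not fit in `out`.
 */
static int resolve_relative(const char *base_path, const char *rel,
                            char *out, size_t out_len)
{
    assert(base_path != NULL && rel != NULL && out != NULL);

    const char *last_slash = strrchr(base_path, '/');
    size_t dir_len = last_slash ? (size_t)(last_slash - base_path) + 1 : 0;

    int n = snprintf(out, out_len, "%.*s%s", (int)dir_len, base_path, rel);
    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```

Resolving "css/style.css" against "/menu_php.html" (the URL mod_pagespeed sees) gives "/css/style.css", so the trimmed link looks safe. Resolving the same reference against "/php/" (the URL the browser sees) gives "/php/css/style.css", which does not exist.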

Original comment by sligocki@google.com on 10 Mar 2011 at 3:28

Attachments:

@GoogleCodeExporter

This is happening to me as well with mod_rewrite's rewritten URLs.

Original comment by ceeja...@gmail.com on 10 Mar 2011 at 5:05

@GoogleCodeExporter

Could you also try the workaround mentioned above (ModPagespeedDisableFilters 
trim_urls) to see if it solves the problem?

We're working on a fix for this problem.

Original comment by nfor...@google.com on 10 Mar 2011 at 5:59

@GoogleCodeExporter

Hello, I have the same problem, and disabling trim_urls solves it.

Original comment by guilherm...@foobaria.com on 10 Mar 2011 at 6:45

@GoogleCodeExporter

Summary was: Binary 0.9.16.3 generating invalid URLs

Original comment by sligocki@google.com on 10 Mar 2011 at 8:02

  • Changed title: Wrong document URL when mod_rewrite used
  • Changed state: Started
  • Added labels: Priority-High
  • Removed labels: Priority-Medium

@GoogleCodeExporter

I've determined that the problem occurs only when the request contains a 
directory while the target of the RewriteRule is in the root directory.

So the following will break:
RewriteRule ^([a-z]+)/$ menu_$1.html
RewriteRule ^([a-z]+)/page\.html menu_$1.html

While this does not:
RewriteRule ^([a-z]+)/$ directory/menu_$1.html

Original comment by dun...@chirp.com.au on 11 Mar 2011 at 10:35

@GoogleCodeExporter

We have not been able to reproduce the error you're getting on our systems, but 
we think we understand what is going wrong and have created a fix.

Can you try upgrading to the latest release (0.9.16.6) and tell us if this 
fixes the problems? You can do this by using:
  sudo yum update  or  sudo apt-get upgrade mod-pagespeed-beta
or by downloading from the Download page or compiling from the latest-beta 
branch.

Original comment by sligocki@google.com on 11 Mar 2011 at 11:58

@GoogleCodeExporter

No, same problem still occurs with 0.9.16.6-r555 and trim_urls enabled.

Original comment by dun...@chirp.com.au on 12 Mar 2011 at 9:58

@GoogleCodeExporter

I've set up a simple test case here:
http://pagespeed.chirp.com.au/test.html

Original comment by dun...@chirp.com.au on 12 Mar 2011 at 10:28

@GoogleCodeExporter

Thanks for the testcase.  Can you also supply the apache conf file?  We think 
it might be related to the exact rewriterule but maybe there is an interaction 
with another conf file stanza.

Original comment by jmara...@google.com on 12 Mar 2011 at 12:46

@GoogleCodeExporter

Sorry Duncan I asked that question before looking at your site -- you supplied 
all we should need.  Thanks again; we'll try to repro.

Original comment by jmara...@google.com on 12 Mar 2011 at 2:06

@GoogleCodeExporter

I have reproduced the problem using your tarball.

Original comment by jmara...@google.com on 12 Mar 2011 at 2:22

@GoogleCodeExporter

I found why our trial fix doesn't work.  I haven't figured out how to fix it 
yet.

The trial fix attempted to save the original URL (the one that the browser 
thinks is the URL for the page) before mod_rewrite mutates the request.  It 
saved the URL in the request->notes table.

The fix doesn't work because in this configuration, mod_rewrite does not alter 
the request->unparsed_uri.  Somehow it makes a different request.  It's a 
subrequest.

So, mod_pagespeed's hook that runs prior to mod_rewrite saves the original URI 
in REQUEST_A->notes.
But by the time the HTML rewriter runs, it gets REQUEST_B, which has no note.

However, I've observed that REQUEST_B->prev == REQUEST_A

So perhaps we can walk down the ->prev chain somehow to find our original note.
We will have to do a little research to figure out how to do this safely.
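A minimal sketch of the ->prev walk, using a mock struct rather than Apache's real request_rec so that it compiles on its own. The struct mock_request, its single note slot, and the function find_note_in_chain() are hypothetical names for illustration; real code would presumably look the key up with apr_table_get() on each request's notes table, and this is not necessarily how the eventual fix works.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for request_rec: only the fields this sketch needs. */
struct mock_request {
    struct mock_request *prev;  /* request that was redirected to this one */
    const char *note_key;       /* single note slot; NULL when empty */
    const char *note_val;
};

/*
 * Walk from `r` back through the ->prev chain and return the value of the
 * first note stored under `key`, or NULL if no request in the chain has it.
 */
static const char *find_note_in_chain(const struct mock_request *r,
                                      const char *key)
{
    assert(key != NULL);

    for (; r != NULL; r = r->prev) {
        if (r->note_key != NULL && strcmp(r->note_key, key) == 0) {
            return r->note_val;
        }
    }
    return NULL;
}
```

With REQUEST_A holding the note and REQUEST_B->prev == REQUEST_A, calling find_note_in_chain() on REQUEST_B returns the value saved on REQUEST_A.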

Original comment by jmara...@google.com on 12 Mar 2011 at 2:38

@GoogleCodeExporter

More details:

When the subrequest is created, it gets a blank slate for request->notes.  But 
it appears that it gets a request->subprocess_env that is built by walking the 
original request->subprocess_env and copying the table, with each key renamed 
as REDIRECT_orig_key=orig_value

This is all in httpd/src/modules/http/http_request.c, in the functions 
internal_internal_redirect() and rename_original_env().
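The key renaming can be pictured with a small self-contained sketch. It uses a plain array of key/value pairs instead of an APR table, and the names struct env_entry, lookup_redirected(), and the ORIG_URL variable in the usage note are hypothetical. It only shows how a value stored under some key before the redirect would be found under "REDIRECT_" plus that key afterwards.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one entry of a subprocess_env table. */
struct env_entry {
    const char *key;
    const char *val;
};

/*
 * Look up `key` as it would appear after one internal redirect, that is
 * under the name "REDIRECT_" + key. Returns NULL if it is not present
 * or if the renamed key would not fit in the local buffer.
 */
static const char *lookup_redirected(const struct env_entry *env, size_t n,
                                     const char *key)
{
    assert(key != NULL);

    char renamed[256];
    int len = snprintf(renamed, sizeof renamed, "REDIRECT_%s", key);
    if (len < 0 || (size_t)len >= sizeof renamed) {
        return NULL;
    }

    for (size_t i = 0; i < n; ++i) {
        if (env[i].key != NULL && strcmp(env[i].key, renamed) == 0) {
            return env[i].val;
        }
    }
    return NULL;
}
```

For example, if the first request had ORIG_URL=/php/ in its environment, the redirected request would carry REDIRECT_ORIG_URL=/php/, and lookup_redirected(env, n, "ORIG_URL") would return "/php/".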

Note also the function ap_internal_fast_redirect(), which *will* copy the 
notes.  Probably the difference between the redirects that provoke this bug and 
the ones that don't comes down to which of these two methods for implementing 
redirects mod_rewrite selects.

Apache is a maze of twisty little passages, all different.

Of course this is based on the version of Apache we develop with, which I think 
is 2.2.16 or 2.2.17.  It's quite possible that this is all different in 2.2.3, 
which I think many mod_pagespeed users have, probably because it comes default 
on a popular distro.

Original comment by jmara...@google.com on 12 Mar 2011 at 3:30

@GoogleCodeExporter

My current theory on how to fix this is to register a 'create_request' hook 
and use that to copy over any of our notes.

Original comment by jmara...@google.com on 12 Mar 2011 at 3:36

@GoogleCodeExporter

See this call in httpd/src/modules/http/http_core.c :      
ap_hook_create_request(http_create_request, NULL, NULL, APR_HOOK_REALLY_LAST);

I think we can put in a hook like that, adding code which copies our notes.

Original comment by jmara...@google.com on 12 Mar 2011 at 4:58

@GoogleCodeExporter

I've found out why we weren't able to repro this.  The handling in 
mod_rewrite.c is entirely different if the RewriteRule is specified in a 
directory context.

paraphrased:

mod_rewrite.c:hook_fixup():  if 
(ap_get_module_config(r->per_dir_config,..)==NULL) return DECLINED;

Below there, note that we prepend r->filename with "redirect:"

            r->filename = apr_pstrcat(r->pool, "redirect:", r->filename, NULL);

handler_redirect() {
    if (strncmp(r->filename, "redirect:", 9) != 0) {
        return DECLINED;
    }

    /* now do the internal redirect */
    ap_internal_redirect(apr_pstrcat(r->pool, r->filename+9,
                                     r->args ? "?" : NULL, r->args, NULL), r);

    return OK;
}

And that's where the damage is done.  If our RewriteRule is in pagespeed.conf 
outside a <Directory> scope then I believe an entirely different flow occurs.

Original comment by jmara...@google.com on 12 Mar 2011 at 5:50

@GoogleCodeExporter

Thank you for the test case!

Could you please try out the latest release, 0.9.16.9 and see if it resolves 
the problem for you?

Thank you, again!

Original comment by nfor...@google.com on 16 Mar 2011 at 10:14

@GoogleCodeExporter

Original comment by jmara...@google.com on 17 Mar 2011 at 2:31

  • Changed state: Fixed

@GoogleCodeExporter

This is fixed in binary release 0.9.16.9

Original comment by jmara...@google.com on 17 Mar 2011 at 2:31

@GoogleCodeExporter

Confirming that all problems we were seeing with trim_urls and mod_rewrite have 
been fixed in 0.9.16.9.
The same goes for Issue 238 (outline_javascript generating broken links).

Original comment by dun...@chirp.com.au on 17 Mar 2011 at 9:54
