Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URL Encoding/Decoding is broken in urlrewritefilter #27

Closed
GoogleCodeExporter opened this issue Jul 5, 2015 · 5 comments
Closed

URL Encoding/Decoding is broken in urlrewritefilter #27

GoogleCodeExporter opened this issue Jul 5, 2015 · 5 comments

Comments

@GoogleCodeExporter
Copy link

urlrewritefilter fails to properly handle encoded URLs such as
http://foo/plus+slash%2fsemicolon%3bquestionmark%3f?first=plus+plus

This is due to several common misconceptions. The previous URL once decoded
is http://foo/plus+slash/semicolon;questionmark??first=plus+plus which is
impossible to reencode correctly since we have just lost the precious
information that "plus+slash/semicolon;questionmark?" used to be a single
path segment, and not several, including one with a path parameter and an
invalid query string.

The first real error lies in using either java.net.URLDecoder or
commons-codec to decode URLs, since "+" in a path segment is a valid
character which should not be decoded to a space, as should be in the query
part. I've included URLEncoder and URLDecoder classes in the patch that
perform valid escaping based on where in the URL we are doing the
encoding/decoding.

The second obstacle is that there should be both ${escape:...} (assuming we
keep it for query parts) and ${escapePath:...} functions available for
substitution, since we may need to be able to decode from a path segment
and encode to a query part (and reverse):

<from>/(.*)?foo=(.*)</from>
<to>/foo/${encodePath:$2}?query=${encode:$1}</to>

I've added these functions (and decode) to my patch.

Actually we also may need <from decode="false">...</from> in many cases,
but I haven't made that change in my patch. We want it only for some rules,
as opposed to all rules as is currently the case.

The last thing my patch does is fix <to> substitution. Because this is not
evaluated as a real language with a lexer/parser there are many errors. For
example it is currently possible to have the following happen:

Suppose you have a variable FOO set to "$1" and the URL being matched "/bar":
<from>/(.*)</from>
<to>/url/%{FOO}</to>
will be substituted first to "/url/$1" then "/url/bar" due to sequential
substitution.

My patch introduces a poor-man's evaluation/substitution technique in the
form of a chain of substitution filters similar to servlet filters, which
means that the output of (backreferences, previous back-references,
variables and functions) will not be substituted again. Also it enables
recursive function evaluation such as "${escape:${unescapePath:$1}}" which
was not previously working.

The tests have been updated and all run except one weird test which I
suspect is not applicable any more.

The only thing I did not properly fix is
"${replace:${replace:foo:bar}:gee}" because the regexp which finds the
semicolons is wrong and will find the first one even though it belongs to
an inner function. If there is a way to write a regexp that matches a
semicolon which isn't withing "${}" then I could fix it, but my expertise
stops there.

This patch applies to 3.2.0 and fixes all URL issues we had with
urlrewritefilter for us. Do not hesitate to contact me for more info.

Thanks for this great product.

Original issue reported on code.google.com by stephane...@gmail.com on 21 Jan 2009 at 11:38

Attachments:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant