Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URL Encoding/Decoding is broken in urlrewritefilter #27

Closed
GoogleCodeExporter opened this issue Jul 5, 2015 · 5 comments
Closed

URL Encoding/Decoding is broken in urlrewritefilter #27

GoogleCodeExporter opened this issue Jul 5, 2015 · 5 comments

Comments

@GoogleCodeExporter
Copy link

urlrewritefilter fails to properly handle encoded URLs such as
http://foo/plus+slash%2fsemicolon%3bquestionmark%3f?first=plus+plus

This is due to several common misconceptions. The previous URL once decoded
is http://foo/plus+slash/semicolon;questionmark??first=plus+plus which is
impossible to reencode correctly since we have just lost the precious
information that "plus+slash/semicolon;questionmark?" used to be a single
path segment, and not several, including one with a path parameter and an
invalid query string.

The first real error lies in using either java.net.URLDecoder or
commons-codec to decode URLs, since "+" in a path segment is a valid
character which should not be decoded to a space, as should be in the query
part. I've included URLEncoder and URLDecoder classes in the patch that
perform valid escaping based on where in the URL we are doing the
encoding/decoding.

The second obstacle is that there should be both ${escape:...} (assuming we
keep it for query parts) and ${escapePath:...} functions available for
substitution, since we may need to be able to decode from a path segment
and encode to a query part (and reverse):

<from>/(.*)?foo=(.*)</from>
<to>/foo/${encodePath:$2}?query=${encode:$1}</to>

I've added these functions (and decode) to my patch.

Actually we also may need <from decode="false">...</from> in many cases,
but I haven't made that change in my patch. We want it only for some rules,
as opposed to all rules as is currently the case.

The last thing my patch does is fix <to> substitution. Because this is not
evaluated as a real language with a lexer/parser there are many errors. For
example it is currently possible to have the following happen:

Suppose you have a variable FOO set to "$1" and the URL being matched "/bar":
<from>/(.*)</from>
<to>/url/%{FOO}</to>
will be substituted first to "/url/$1" then "/url/bar" due to sequential
substitution.

My patch introduces a poor-man's evaluation/substitution technique in the
form of a chain of substitution filters similar to servlet filters, which
means that the output of (backreferences, previous back-references,
variables and functions) will not be substituted again. Also it enables
recursive function evaluation such as "${escape:${unescapePath:$1}}" which
was not previously working.

The tests have been updated and all run except one weird test which I
suspect is not applicable any more.

The only thing I did not properly fix is
"${replace:${replace:foo:bar}:gee}" because the regexp which finds the
semicolons is wrong and will find the first one even though it belongs to
an inner function. If there is a way to write a regexp that matches a
semicolon which isn't withing "${}" then I could fix it, but my expertise
stops there.

This patch applies to 3.2.0 and fixes all URL issues we had with
urlrewritefilter for us. Do not hesitate to contact me for more info.

Thanks for this great product.

Original issue reported on code.google.com by stephane...@gmail.com on 21 Jan 2009 at 11:38

Attachments:

@GoogleCodeExporter
Copy link
Author

This definitely can be a genuine enhancement in the filter.
http://code.google.com/p/urlrewritefilter/issues/detail?id=38 seems to be 
similar and
dependent(?).
I'll test the patch shortly and add post my comments.

Original comment by avl...@gmail.com on 10 Aug 2009 at 3:53

@GoogleCodeExporter
Copy link
Author

i tried this patch with the following rule

<rule>
  <from>^/documents/(.*)$</from>
  <to type="forward" encode="true">/MyServlet?document=${encode:$1}</to>
</rule>

but the generated URL was always "/MyServlet?document=". The ${encode} function
always returned an empty string.

Original comment by hasan.ca...@gmail.com on 11 Aug 2009 at 8:07

@GoogleCodeExporter
Copy link
Author

I never tried the forward option, so I cannot help much. I do know setting
"encode=true" when you encode the URL manually makes no sense, since URLs 
cannot be
encoded in their entirety (read
http://www.lunatech-research.com/archives/2009/02/03/what-every-web-developer-mu
st-know-about-url-encoding
)

I do know of at least one other bug in my patch, and since my coding time is 
strictly
limited to the portion that is directly relevant to my job, I did not have the 
time
to write proper unit tests and fix that other bug, so I'm really sorry I cannot 
help
you there.

I think my analysis of the problem is valid though, if someone can find a 
better way
to fix the problem, or patch my patch ;)

Original comment by stephane...@gmail.com on 12 Aug 2009 at 8:06

@GoogleCodeExporter
Copy link
Author

Issue 38 has been merged into this issue.

Original comment by avl...@gmail.com on 7 Jan 2010 at 5:04

@GoogleCodeExporter
Copy link
Author

Thanks!  This has bee applied (with a few tweaks).  Please verify it works as 
intended.

http://code.google.com/p/urlrewritefilter/source/detail?r=307

Original comment by tuc...@gmail.com on 9 Aug 2010 at 10:47

  • Changed state: Fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant