Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Canonicalizing "%Device% changes the meaning of the input string #306

Closed
meg23 opened this issue Nov 13, 2014 · 9 comments
Closed

Comments

@meg23
Copy link

meg23 commented Nov 13, 2014

From shilpi.a...@gmail.com on May 23, 2013 04:45:26

What steps will reproduce the problem? 1. Take String "%Device%.
2. Canonicalize it using Canonicalize method
3. Now do EncodeForHTML or simple display the result string from Canonicalize. What is the expected output? What do you see instead? The output has needs to be encoded for html, should display as "%Device%" in browser,Instead we see "Þvice%" What version of the product are you using? On what operating system? 2.0rc Does this issue affect only a specified browser or set of browsers? all Please provide any additional information below. We are using these APIs heavily. Please provide an estimate fix date.

Original issue: http://code.google.com/p/owasp-esapi-java/issues/detail?id=300

@xeno6696
Copy link
Collaborator

@kwwall this is an invalid use-case. The only thing you should do with a canonicalized string is to use it for further validation. Application implementations should default to storing data as-is, and follow the strategy of output-escaping all user input.

For the input in question, the part %De is the ASCII URI-encoding for Þ

Missing in this question also, is that the relevant encoding properties in ESAPI.properties The UTF-8 encoded version would be %c3%9e

The canonicalize function is doing exactly what it was intended to do.

@xeno6696 xeno6696 changed the title Canonicaling "%Device% changes the meaning of the input string Canonicalizing "%Device% changes the meaning of the input string Jan 28, 2016
@xeno6696
Copy link
Collaborator

Here is the relevant JUnit test to reproduce:

    public void test306(){
        String input = "%Device%";
        String testString = ESAPI.encoder().canonicalize(input);
        System.out.println(ESAPI.encoder().decodeForHTML(testString));
        assertEquals(input, ESAPI.encoder().decodeForHTML(testString));
    }

@planetlevel
Copy link

Because %de is a legit URL encoded character?

--Jeff


From: Matt Seil <notifications@github.commailto:notifications@github.com>
Sent: Thursday, January 28, 2016 4:13 PM
Subject: Re: [esapi-java-legacy] Canonicalizing "%Device% changes the meaning of the input string (#306)
To: ESAPI/esapi-java-legacy <esapi-java-legacy@noreply.github.commailto:esapi-java-legacy@noreply.github.com>

Here is the relevant JUnit test to reproduce:

public void test306(){        String input = "%Device%";        String testString = ESAPI.encoder().canonicalize(input);        System.out.println(ESAPI.encoder().decodeForHTML(testString));        assertEquals(input, ESAPI.encoder().decodeForHTML(testString));    }

Reply to this email directly or view it on GitHubhttps://github.com//issues/306#issuecomment-176420289.

@kwwall
Copy link
Contributor

kwwall commented Jan 29, 2016

@planetlevel Like @xeno6696 said, apparently it is a represents the 'Þ' character, whatever the heck that is. The only way that I see around this is to create a custom Codec that excludes certain things and use that rather than PercentCodec (which presumably is what is being used here) and add that to an explicit list of Codecs passed to the DefaultEncoder similar to what is sketched out in http://static.javadoc.io/org.owasp.esapi/esapi/2.1.0/org/owasp/esapi/Encoder.html#canonicalize%28java.lang.String,%20boolean,%20boolean%29.

However, I agree with Matt's assessment that what is being proposed as a bug here is just in invalid use case and I don't think that we should try to support it.

My vote is to close this issue. Does anyone object?

@xeno6696
Copy link
Collaborator

I think Jeff is assuming UTF-8 URL encoding. I've worked on systems using ISO 8859-1 and %DE would be a legal encoding there.

@xeno6696
Copy link
Collaborator

No objections to closing.

@jeremiahjstacey
Copy link
Collaborator

I'm reaching a bit out of my depth, so please forgive me if I'm
misunderstanding.

I think there may still be a bug here. I've written a unit test which
takes our argument string %De and passes to each of the codecs available
through esapi. When encoding I am using the String itself to create the
character immunity array. I do this with the expectation that if I give
the encoder a string, and every character in the string is immune to
encoding (as I understand the API) then I get the exact string back.
This is true for all cases except for PercentCodec.

org.junit.ComparisonFailure: expected:<[%De]> but was:<[Þ]>

I think this means the immunity list for the PercentCodec has a bug?

Test class attached.

On 1/29/2016 5:34 AM, Matt Seil wrote:

Closed #306 #306.


Reply to this email directly or view it on GitHub
#306 (comment).

package org.owasp.esapi;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

import org.junit.Assert;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;
import org.owasp.esapi.codecs.*;

/**
 * Parameterized test to verify that the Immunity parameter for a codec encode/decode event works as expected on a series of special characters.
 *  
 * @author jeremiah.j.stacey@gmail.com
 * @since Jan 29, 2016
 *
 */
@RunWith(Parameterized.class)
public class CodecImmunityTest {
    @Parameters(name = "{0}")
    public static Collection<Object[]> getParams() {
        Collection<Codec> knownCodecs = new ArrayList<Codec>();
        knownCodecs.add(new CSSCodec());
        knownCodecs.add(new DB2Codec());
        knownCodecs.add(new HTMLEntityCodec());
        knownCodecs.add(new JavaScriptCodec());
        knownCodecs.add(new MySQLCodec(0)); //Standard
        knownCodecs.add(new MySQLCodec(1)); //ANSI
        knownCodecs.add(new OracleCodec());
        knownCodecs.add(new PercentCodec());
        knownCodecs.add(new UnixCodec());
        knownCodecs.add(new VBScriptCodec());
        knownCodecs.add(new WindowsCodec());
        knownCodecs.add(new XMLEntityCodec());

        //XXX:  Add more strings here!!
        List<String> sampleStrings = Arrays.asList("%De");

        Collection<Object[]> params = new ArrayList<Object[]>();
        for (Codec codec : knownCodecs) {
            for (String sample : sampleStrings) {
                params.add(new Object[]{codec.getClass().getSimpleName() + " " + sample, codec, sample});
            }
        }
        return params;
    }

    private final Codec codec;
    private final String string;
    private final char[] immunityList;

    public CodecImmunityTest(String ignored, Codec codec, String toTest) {
        this.codec = codec;
        this.string = toTest;
        /**
         * The Immunity character array is every character in the String we're testing.
         * 
         */
        this.immunityList = toTest.toCharArray();
    }

    @Test
    public void testImmuneEncode() {
        String encoded = codec.encode(immunityList, string);
        Assert.assertEquals(string, encoded);
    }

    @Test
    public void testImmuneDecode() {
        String decoded = codec.decode(string);
        Assert.assertEquals(string, decoded);
    }
}

@xeno6696
Copy link
Collaborator

It appears that your test is asserting that the encoded string should be the same as the unencoded string? This is not correct in the case of the input %DE since it does not contain anything other than a hit against a percent-encoded string. I would also expect the decoded value of %DE to be the "thorn" char Þ

@xeno6696
Copy link
Collaborator

I spoke with @jeremiahjstacey in person, and the problem is a legitimate bug. If you crack open the CodecTest.java unit test file, the immunity part of the API was only ever tested with an EMPTY_CHAR_ARRAY

This is a good catch. Congratulations!

I will file an appropriate bug and attach your unit test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants