My favorites | Sign in
Project Logo
Project hosting will be READ-ONLY Wednesday at 8am PST due to brief network maintenance.
             
New issue | Search
for
| Advanced search | Search tips
Issue 96: StackOverflowError with long Collections
6 people starred this issue and may be notified of changes. Back to list
Status:  Fixed
Owner:  ----
Closed:  Aug 21
Type-Defect
Priority-Medium
Milestone-Release1.4


Sign in to add a comment
 
Reported by niteg8, Jan 22, 2009
What steps will reproduce the problem?
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;


public class TestGson
{
    private String name;
    private String value;

    public TestGson()
    {
    }


    /**
     * @param name
     * @param value
     */
    public TestGson(String name, String value)
    {
        super();
        this.name = name;
        this.value = value;
    }


    public static void main(String[] args)
    {
        List<TestGson> list = new ArrayList<TestGson>(10000);
        
        for (int x = 0; x < 10000;x++)
        {
            list.add(new TestGson("name"+x,"value"+x));
        }
        
        Gson gson = new Gson();
        
        String json = gson.toJson(list);
        System.out.println("Json: "+json);
        
        Type collectionType = new TypeToken<ArrayList<TestGson>>(){}.getType();
        
        List<TestGson> list2 = gson.fromJson(json,collectionType);
        
    }
}

What is the expected output? What do you see instead?

The stack trace looks as follows:

Exception in thread "main" com.google.gson.JsonParseException: Failed
parsing JSON source: java.io.StringReader@1b9ce4b to Json
	at com.google.gson.Gson.fromJson(Gson.java:380)
	at com.google.gson.Gson.fromJson(Gson.java:321)
	at TestGson.main(TestGson.java:48)
Caused by: java.lang.StackOverflowError
	at com.google.gson.SimpleCharStream.readChar(SimpleCharStream.java:198)
	at
com.google.gson.JsonParserTokenManager.jjMoveNfa_0(JsonParserTokenManager.java:584)
	at
com.google.gson.JsonParserTokenManager.jjStartNfaWithStates_0(JsonParserTokenManager.java:165)
	at
com.google.gson.JsonParserTokenManager.jjMoveStringLiteralDfa0_0(JsonParserTokenManager.java:172)
	at
com.google.gson.JsonParserTokenManager.getNextToken(JsonParserTokenManager.java:935)
	at com.google.gson.JsonParser.jj_ntk(JsonParser.java:396)
	at com.google.gson.JsonParser.JsonString(JsonParser.java:274)
	at com.google.gson.JsonParser.Pair(JsonParser.java:76)
	at com.google.gson.JsonParser.Members(JsonParser.java:61)
	at com.google.gson.JsonParser.Members(JsonParser.java:65)
	at com.google.gson.JsonParser.JsonObject(JsonParser.java:42)
	at com.google.gson.JsonParser.JsonValue(JsonParser.java:134)
	at com.google.gson.JsonParser.Elements(JsonParser.java:109)
	at com.google.gson.JsonParser.Elements(JsonParser.java:113)
	at com.google.gson.JsonParser.Elements(JsonParser.java:113)
...


What version of the product are you using? On what operating system?
GSON 1.3

Please provide any additional information below.
It is strange that a linear collection is deserialized using recursion.
This will always fail with large collections. Sooner or later. With my
stack size the limit was something like 8500 elements. 

Comment 1 by inder123, Mar 03, 2009
Added performance tests in r388 

On my machine (a dual-processor 64 bit ubuntu with 8GB RAM), Gson was able to
serialize a collection of 1.4 million objects. The deserializer threw a stack
overflow error in the parser beyond a collection of 87,000 objects. 
Comment 2 by inder123, Mar 03, 2009
(No comment was entered for this change.)
Status: WontFix
Comment 3 by inder123, Mar 03, 2009
The parser uses a production to match a collection, and that is implemented by JavaCC
using recursion. If someone proposes a more efficient production, we will be happy to
incorporate it.
Comment 4 by kenotron, Jul 13, 2009
One of the stated goals of GSON is to "Support arbitrarily complex objects (with deep 
inheritance hierarchies and extensive use of generic types)".

I guess you should change it to 'arbitrarily complex objects (except collections with 
large numbers of items)'.

I use a byte array to store the bytes of a file.  I can serialize this array w/ GSON 
just fine, but somehow this object is too complex to be parsed?  A byte array?  
Really?
Comment 5 by inder123, Jul 13, 2009
Can you give some more information on how large the byte array is?
Comment 6 by kenotron, Jul 13, 2009
It's a megabyte-ish file, so about a million items.  The point though is that it 
shouldn't matter how many items are in it, it's perhaps the most basic data type in all 
of Java, and if GSON can't parse it, then perhaps GSON shouldn't be generating it in 
the first place.
Comment 7 by inder123, Jul 13, 2009
Well, that is not how parsers work unfortunately. They have to match a production, 
which in this case is for matching a String (I presume this is how you are storing 
bytearrays). A String may seem like a simple thing but it actually require a fair 
amount of escaping and unescaping character by character. 

In an earlier version of Gson, we used a recursive production to match a String as an 
array of characters. This resulted in a StackOverflowError for 100KB strings. In a 
later version, I revised it to a production that does a single token match. Last I 
tested, Gson could handle strings of over 20MB in size. 

Can you give us more details (may be post a code snippet) on how are the bytes 
stored? Are you using a byte[] or are you using a String?
Comment 8 by inder123, Jul 13, 2009
Wrote performance tests for byte array serialization and deserialization in r430. Gson 
failed at serializing arrays beyond 8MB, but for deserialization it failed for arrays 
as small as 32KB. So, seems like we have a real performance issue here that we need 
fixing.
Status: Accepted
Comment 9 by inder123, Jul 13, 2009
Ok, I revised my tests to determine the limit somewhat more precisely. Gson failed at 
deserialization on byte arrays of size beyond 80k. 

The primary reason is that Gson matches an Array as a production of Array elements. 
The Elements themselves are matched with a recursive production (and therein lies the 
problem): 

private JsonArray JsonArray() :
{ JsonArray array = new JsonArray(); }
{
  "[" [ Elements(array) ] "]"
  {
    array.reverse();
    return array;
  }
}

private void Elements(JsonArray array) :
{
  JsonElement element;
}
{
  element=JsonValue() [ "," Elements(array) ]
  { array.add(element); }
}

Any suggestions from anyone on how to improve these productions?
Comment 10 by kenotron, Jul 14, 2009
Your production appears to approach N^2 in memory usage, since you pass the entire 
parsed array into the recursive call.  The top-level recursion gets a 0-item 
JsonArray, the first inner recursion gets a 1-item array, and so on, up to N items.  
But those recursive calls will stay on the stack until the whole list is parsed, so 
you end up with 1+2+3+...+N items on the stack.

What if you built the JsonArray iteratively in the top-level rule instead, so that 
the single 'result' array could be updated without passing it into another rule?    I 
haven't played with parser generators since ANTLR back in my compiler class, so I'm 
not well-versed enough in JavaCC semantics to write up the rule, but I'm imagining a 
grammar like this (in EBNF):

    JsonArray : '[' JsonArrayElement ( ',' JsonArrayElement )+ ']'

I'm not sure how that translates to JavaCC, or how you'd get the next element back,  
but the point is to not pass a potentially huge data structure around in the call 
parameters and build it up iteratively instead.
Comment 11 by kenotron, Jul 15, 2009
If you want to keep your current rules, you could perhaps instead make a single 
globally-scoped JsonArray...so long as you only parse a single array at a time, this 
may work better with your current rule...globally-scoped variables are not passed on 
the stack like parameters are, so you should be able to scan arrays all the way up to 
the size of the JVM's memory.
Comment 13 by mburton, Aug 05, 2009
The limitation is much smaller on devices with limited resources, such as Android phones.  I'm running into this 
problem trying to deserialize a string that's about 20k on Android 1.5.

Can anyone think of a constructive way to work around this issue for now?  Would writing a custom deserializer 
help, or is the problem more low-level than that?
Comment 14 by inder123, Aug 05, 2009
Unfortunately the problem is more low-level since our JavaCC parser uses recursive
productions. The parser gets invoked way before the deserializers are run. 

We will try to address this in our next release, but it would greatly help if others
can also provide alternate JavaCC productions for Javascript Arrays.
Comment 15 by kenotron, Aug 14, 2009
Recursive rules are fine when your input is guaranteed to be small, but the way your 
JsonArray rule is written, you pass the whole array into each recursive call...that's just 
plain unnecessary, and it quickly consumes all available memory for even moderately-sized 
arrays.  Instead, rephrase your grammar as left-associative rather than right-recursive.  
Read values from the left end of the input and immediately add them to the output array, then 
consume the comma and iterate.  Something like this:

private JsonArray JsonArray() :
{ JsonArray array = new JsonArray(); JsonElement elem = null; }
{
  "[" ( elem=JsonValue() {array.add(elem);} [","] )* "]"
  {    return array;  }
}

There's no need for a second Elements() rule here at all, and since the elements are read in 
the proper order to begin with, there's no need to reverse the array either.

But also note that this is a rather lax grammar, since it doesn't validate that the array is 
proper JSON...the comma is optional only so that the last value is matched, but because its 
optional you could have bogus JSON like '[ 12 34, 56 ]' return the incorrect array 
(12,34,56).  If that matters to you, I'll leave the solution to you.

I generated this to a java class and plugged in my replacement JsonArray implementation into 
the GSON source, then compiled and ran it against 234 KB of json text representing a byte 
array of about 64,000 items, and it successfully returned the array.  It was horribly slow, 
about 3 minutes, but it did return eventually...not sure why it's so slow, perhaps since I 
have done the rule for you here, you can profile it and find out where all the time is being 
spent.  You may have other recursive rules that can be similarly re-written to reduce your 
parser's memory footprint significantly.

Other than this problem (a major one IMHO), GSON has been great, definitely one of the better 
Java <-> JSON converters out there.  Keep up the good work!

PS: when I convert my class to use a String to contain the byte array, it serializes and 
deserializes without problems, and extremely quickly too (no 3 minute wait).  I see you defer 
this parsing to StringUnmarshaller.  I suspect your parser is doing a lot of backtracking, 
spinning its wheels trying to find a matching production, but a profiler will tell you for 
sure.
Comment 16 by inder123, Aug 21, 2009
Fixed in r438
Thanks for the tip, kenotron. I was able to build on your productions to come up with
something that enforces the proper JSON rules.
Status: Fixed
Labels: Milestone-Release1.4
Comment 17 by mburton, Aug 21, 2009
Lovely!  Preliminary testing on r438 seems to work fine for me
Sign in to add a comment

Hosted by Google Code