| Issue 96: | StackOverflowError with long Collections | |
| 6 people starred this issue and may be notified of changes. | Back to list |
Sign in to add a comment
|
What steps will reproduce the problem?
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;
public class TestGson
{
private String name;
private String value;
public TestGson()
{
}
/**
* @param name
* @param value
*/
public TestGson(String name, String value)
{
super();
this.name = name;
this.value = value;
}
public static void main(String[] args)
{
List<TestGson> list = new ArrayList<TestGson>(10000);
for (int x = 0; x < 10000;x++)
{
list.add(new TestGson("name"+x,"value"+x));
}
Gson gson = new Gson();
String json = gson.toJson(list);
System.out.println("Json: "+json);
Type collectionType = new TypeToken<ArrayList<TestGson>>(){}.getType();
List<TestGson> list2 = gson.fromJson(json,collectionType);
}
}
What is the expected output? What do you see instead?
The stack trace looks as follows:
Exception in thread "main" com.google.gson.JsonParseException: Failed
parsing JSON source: java.io.StringReader@1b9ce4b to Json
at com.google.gson.Gson.fromJson(Gson.java:380)
at com.google.gson.Gson.fromJson(Gson.java:321)
at TestGson.main(TestGson.java:48)
Caused by: java.lang.StackOverflowError
at com.google.gson.SimpleCharStream.readChar(SimpleCharStream.java:198)
at
com.google.gson.JsonParserTokenManager.jjMoveNfa_0(JsonParserTokenManager.java:584)
at
com.google.gson.JsonParserTokenManager.jjStartNfaWithStates_0(JsonParserTokenManager.java:165)
at
com.google.gson.JsonParserTokenManager.jjMoveStringLiteralDfa0_0(JsonParserTokenManager.java:172)
at
com.google.gson.JsonParserTokenManager.getNextToken(JsonParserTokenManager.java:935)
at com.google.gson.JsonParser.jj_ntk(JsonParser.java:396)
at com.google.gson.JsonParser.JsonString(JsonParser.java:274)
at com.google.gson.JsonParser.Pair(JsonParser.java:76)
at com.google.gson.JsonParser.Members(JsonParser.java:61)
at com.google.gson.JsonParser.Members(JsonParser.java:65)
at com.google.gson.JsonParser.JsonObject(JsonParser.java:42)
at com.google.gson.JsonParser.JsonValue(JsonParser.java:134)
at com.google.gson.JsonParser.Elements(JsonParser.java:109)
at com.google.gson.JsonParser.Elements(JsonParser.java:113)
at com.google.gson.JsonParser.Elements(JsonParser.java:113)
...
What version of the product are you using? On what operating system?
GSON 1.3
Please provide any additional information below.
It is strange that a linear collection is deserialized using recursion.
This will always fail with large collections. Sooner or later. With my
stack size the limit was something like 8500 elements.
|
||||||||||||
,
Mar 03, 2009
Added performance tests in r388 On my machine (a dual-processor 64 bit ubuntu with 8GB RAM), Gson was able to serialize a collection of 1.4 million objects. The deserializer threw a stack overflow error in the parser beyond a collection of 87,000 objects. |
|||||||||||||
,
Mar 03, 2009
(No comment was entered for this change.)
Status: WontFix
|
|||||||||||||
,
Mar 03, 2009
The parser uses a production to match a collection, and that is implemented by JavaCC using recursion. If someone proposes a more efficient production, we will be happy to incorporate it. |
|||||||||||||
,
Jul 13, 2009
One of the stated goals of GSON is to "Support arbitrarily complex objects (with deep inheritance hierarchies and extensive use of generic types)". I guess you should change it to 'arbitrarily complex objects (except collections with large numbers of items)'. I use a byte array to store the bytes of a file. I can serialize this array w/ GSON just fine, but somehow this object is too complex to be parsed? A byte array? Really? |
|||||||||||||
,
Jul 13, 2009
Can you give some more information on how large the byte array is? |
|||||||||||||
,
Jul 13, 2009
It's a megabyte-ish file, so about a million items. The point though is that it shouldn't matter how many items are in it, it's perhaps the most basic data type in all of Java, and if GSON can't parse it, then perhaps GSON shouldn't be generating it in the first place. |
|||||||||||||
,
Jul 13, 2009
Well, that is not how parsers work unfortunately. They have to match a production, which in this case is for matching a String (I presume this is how you are storing bytearrays). A String may seem like a simple thing but it actually require a fair amount of escaping and unescaping character by character. In an earlier version of Gson, we used a recursive production to match a String as an array of characters. This resulted in a StackOverflowError for 100KB strings. In a later version, I revised it to a production that does a single token match. Last I tested, Gson could handle strings of over 20MB in size. Can you give us more details (may be post a code snippet) on how are the bytes stored? Are you using a byte[] or are you using a String? |
|||||||||||||
,
Jul 13, 2009
Wrote performance tests for byte array serialization and deserialization in r430. Gson failed at serializing arrays beyond 8MB, but for deserialization it failed for arrays as small as 32KB. So, seems like we have a real performance issue here that we need fixing.
Status: Accepted
|
|||||||||||||
,
Jul 13, 2009
Ok, I revised my tests to determine the limit somewhat more precisely. Gson failed at
deserialization on byte arrays of size beyond 80k.
The primary reason is that Gson matches an Array as a production of Array elements.
The Elements themselves are matched with a recursive production (and therein lies the
problem):
private JsonArray JsonArray() :
{ JsonArray array = new JsonArray(); }
{
"[" [ Elements(array) ] "]"
{
array.reverse();
return array;
}
}
private void Elements(JsonArray array) :
{
JsonElement element;
}
{
element=JsonValue() [ "," Elements(array) ]
{ array.add(element); }
}
Any suggestions from anyone on how to improve these productions?
|
|||||||||||||
,
Jul 14, 2009
Your production appears to approach N^2 in memory usage, since you pass the entire
parsed array into the recursive call. The top-level recursion gets a 0-item
JsonArray, the first inner recursion gets a 1-item array, and so on, up to N items.
But those recursive calls will stay on the stack until the whole list is parsed, so
you end up with 1+2+3+...+N items on the stack.
What if you built the JsonArray iteratively in the top-level rule instead, so that
the single 'result' array could be updated without passing it into another rule? I
haven't played with parser generators since ANTLR back in my compiler class, so I'm
not well-versed enough in JavaCC semantics to write up the rule, but I'm imagining a
grammar like this (in EBNF):
JsonArray : '[' JsonArrayElement ( ',' JsonArrayElement )+ ']'
I'm not sure how that translates to JavaCC, or how you'd get the next element back,
but the point is to not pass a potentially huge data structure around in the call
parameters and build it up iteratively instead.
|
|||||||||||||
,
Jul 15, 2009
If you want to keep your current rules, you could perhaps instead make a single globally-scoped JsonArray...so long as you only parse a single array at a time, this may work better with your current rule...globally-scoped variables are not passed on the stack like parameters are, so you should be able to scan arrays all the way up to the size of the JVM's memory. |
|||||||||||||
,
Aug 05, 2009
The limitation is much smaller on devices with limited resources, such as Android phones. I'm running into this problem trying to deserialize a string that's about 20k on Android 1.5. Can anyone think of a constructive way to work around this issue for now? Would writing a custom deserializer help, or is the problem more low-level than that? |
|||||||||||||
,
Aug 05, 2009
Unfortunately the problem is more low-level since our JavaCC parser uses recursive productions. The parser gets invoked way before the deserializers are run. We will try to address this in our next release, but it would greatly help if others can also provide alternate JavaCC productions for Javascript Arrays. |
|||||||||||||
,
Aug 14, 2009
Recursive rules are fine when your input is guaranteed to be small, but the way your
JsonArray rule is written, you pass the whole array into each recursive call...that's just
plain unnecessary, and it quickly consumes all available memory for even moderately-sized
arrays. Instead, rephrase your grammar as left-associative rather than right-recursive.
Read values from the left end of the input and immediately add them to the output array, then
consume the comma and iterate. Something like this:
private JsonArray JsonArray() :
{ JsonArray array = new JsonArray(); JsonElement elem = null; }
{
"[" ( elem=JsonValue() {array.add(elem);} [","] )* "]"
{ return array; }
}
There's no need for a second Elements() rule here at all, and since the elements are read in
the proper order to begin with, there's no need to reverse the array either.
But also note that this is a rather lax grammar, since it doesn't validate that the array is
proper JSON...the comma is optional only so that the last value is matched, but because its
optional you could have bogus JSON like '[ 12 34, 56 ]' return the incorrect array
(12,34,56). If that matters to you, I'll leave the solution to you.
I generated this to a java class and plugged in my replacement JsonArray implementation into
the GSON source, then compiled and ran it against 234 KB of json text representing a byte
array of about 64,000 items, and it successfully returned the array. It was horribly slow,
about 3 minutes, but it did return eventually...not sure why it's so slow, perhaps since I
have done the rule for you here, you can profile it and find out where all the time is being
spent. You may have other recursive rules that can be similarly re-written to reduce your
parser's memory footprint significantly.
Other than this problem (a major one IMHO), GSON has been great, definitely one of the better
Java <-> JSON converters out there. Keep up the good work!
PS: when I convert my class to use a String to contain the byte array, it serializes and
deserializes without problems, and extremely quickly too (no 3 minute wait). I see you defer
this parsing to StringUnmarshaller. I suspect your parser is doing a lot of backtracking,
spinning its wheels trying to find a matching production, but a profiler will tell you for
sure.
|
|||||||||||||
,
Aug 21, 2009
Fixed in r438 Thanks for the tip, kenotron. I was able to build on your productions to come up with something that enforces the proper JSON rules.
Status: Fixed
Labels: Milestone-Release1.4 |
|||||||||||||
,
Aug 21, 2009
Lovely! Preliminary testing on r438 seems to work fine for me |
|||||||||||||
| ► Sign in to add a comment | |||||||||||||