This article shows you how to take data from any query made to the Google Analytics Data Export API and output the results in the popular CSV format. This is one of the most common tasks people perform with Analytics data pulled from the Data Export API, so automating the process is an easy way to save tons of time on a regular basis. In addition, once you have some code for printing out CSV documents from queries, you'll be able to integrate it into larger projects, like automatic report generators, mailers, and "export" functions for custom dashboards you've written.
You'll get the most out of this article if you have the following:
The code covered in this article will do the following:
Given a DataFeed object as a parameter, print the data out in CSV format, where each DataEntry makes up one row in the resulting output.

The first thing to do is set up a configurable output stream for your class
to print to. This way, any code using your class can decide whether output should go to
standard out or directly to a file. All you need to do here is set up getter and setter
methods for a PrintStream object, which will be the target of all
printing done by the class.
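// Needs: import java.io.PrintStream;
// Output goes to standard out unless a caller sets another stream.
private PrintStream printStream = System.out;

public PrintStream getPrintStream() {
    return printStream;
}

public void setPrintStream(PrintStream printStream) {
    this.printStream = printStream;
}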
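One practical benefit of a configurable stream is that output can be captured in memory, for example in a unit test. The following is a minimal sketch of that idea, not code from the original class: Printer is a hypothetical stand-in that carries the same getter and setter, and printLine is an illustrative method added only so there is something to capture.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class CaptureSketch {
    // Hypothetical stand-in for the article's class: same getter/setter pair.
    static class Printer {
        private PrintStream printStream = System.out;

        public PrintStream getPrintStream() {
            return printStream;
        }

        public void setPrintStream(PrintStream printStream) {
            this.printStream = printStream;
        }

        // Illustrative method (an assumption, not part of the article's code).
        public void printLine(String line) {
            printStream.print(line);
            printStream.print('\n');
        }
    }

    // Redirects the printer to an in-memory buffer and returns what was printed.
    static String capture(String line) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        Printer printer = new Printer();
        printer.setPrintStream(new PrintStream(buffer));
        printer.printLine(line);
        printer.getPrintStream().flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(capture("a,b"));
    }
}
```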
Setting the output to a file is also very easy. One needs only the filename
to create a PrintStream object for that file.
// Needs: import java.io.FileOutputStream; the constructor can throw FileNotFoundException.
FileOutputStream fstream = new FileOutputStream(filename);
PrintStream stream = new PrintStream(fstream);
csvprinter.setPrintStream(stream);
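A stream that wraps a file should be closed once printing is finished, and PrintStream does not throw on write failures by itself. The sketch below shows one way to handle both, under stated assumptions: the class name FileOutputSketch, the helper writeLines, the file name report.csv, and the choice of UTF-8 are all illustrative and not taken from the original code.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;

class FileOutputSketch {
    // Hypothetical helper: writes each line to the named file as UTF-8.
    static void writeLines(String filename, String... lines) throws IOException {
        // try-with-resources closes the PrintStream, and with it the file stream.
        try (PrintStream stream =
                new PrintStream(new FileOutputStream(filename), false, "UTF-8")) {
            for (String line : lines) {
                stream.print(line);
                stream.print('\n');
            }
            // PrintStream swallows IOExceptions; checkError() flushes and reports them.
            if (stream.checkError()) {
                throw new IOException("Failed writing to " + filename);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // "report.csv" is a made-up file name for illustration.
        writeLines("report.csv", "ga:browser,ga:visits", "Firefox,120");
    }
}
```

In the article's class, the same PrintStream would be handed to setPrintStream before printing and closed by the caller afterwards.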
The first row of the CSV file is the row of column names. Each column represents a dimension or metric from the data feed, so in order to print out this first row, do the following:
1. Get the list of dimensions from the first entry using the getDimensions method.
2. Print the name of each dimension using the Dimension.getName() method, followed by a comma.
3. Get the list of metrics using the getMetrics() method.
4. Print the name of each metric. Print commas after all but the last metric.

Here's one implementation of the method to print row headers. Note that this code doesn't return a string representing the complete row: it prints to an output stream as it processes values.
// Needs: import java.util.Iterator;
public void printRowHeaders(DataFeed feed) {
    // An empty feed has no entry to read column names from.
    if (feed.getEntries().isEmpty()) {
        return;
    }

    // The first entry supplies the column names.
    DataEntry firstEntry = feed.getEntries().get(0);

    // Dimensions come first, each followed by a comma.
    Iterator<Dimension> dimensions = firstEntry.getDimensions().iterator();
    while (dimensions.hasNext()) {
        printStream.print(sanitizeForCsv(dimensions.next().getName()));
        printStream.print(",");
    }

    // Metrics follow, with a comma after all but the last one.
    Iterator<Metric> metrics = firstEntry.getMetrics().iterator();
    while (metrics.hasNext()) {
        printStream.print(sanitizeForCsv(metrics.next().getName()));
        if (metrics.hasNext()) {
            printStream.print(",");
        }
    }
    printStream.println();
}
Printing the "body" of the CSV file (everything below the row of column
names) is very similar, with two key differences. First, the code loops
through all the entries in the feed object rather than evaluating only the
first one. Second, it uses the getValue() method, rather than getName(),
to pull the value to be sanitized and printed.
public void printBody(DataFeed feed) {
    if (feed.getEntries().isEmpty()) {
        return;
    }
    for (DataEntry entry : feed.getEntries()) {
        printEntry(entry);
    }
}

public void printEntry(DataEntry entry) {
    Iterator<Dimension> dimensions = entry.getDimensions().iterator();
    while (dimensions.hasNext()) {
        printStream.print(sanitizeForCsv(dimensions.next().getValue()));
        printStream.print(",");
    }

    Iterator<Metric> metrics = entry.getMetrics().iterator();
    while (metrics.hasNext()) {
        printStream.print(sanitizeForCsv(metrics.next().getValue()));
        if (metrics.hasNext()) {
            printStream.print(",");
        }
    }
    printStream.println();
}
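To see the shape of the output without the client library on the classpath, here is a self-contained sketch of the same header-then-body layout. It rests on several assumptions: Field and Entry are hypothetical stand-ins for the library's dimension, metric, and entry types, the sample rows are made-up data, and sanitizing is left out to keep the example short.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.Arrays;
import java.util.List;

class CsvLayoutSketch {
    // Hypothetical stand-in for a dimension or metric: a name plus a value.
    static final class Field {
        final String name;
        final String value;
        Field(String name, String value) { this.name = name; this.value = value; }
    }

    // Hypothetical stand-in for an entry: dimensions first, then metrics.
    static final class Entry {
        final List<Field> dimensions;
        final List<Field> metrics;
        Entry(List<Field> dimensions, List<Field> metrics) {
            this.dimensions = dimensions;
            this.metrics = metrics;
        }
    }

    // Prints one row: names when header is true, values otherwise.
    static void printRow(Entry entry, boolean header, PrintStream out) {
        for (Field d : entry.dimensions) {
            out.print(header ? d.name : d.value);
            out.print(",");
        }
        for (int i = 0; i < entry.metrics.size(); i++) {
            Field m = entry.metrics.get(i);
            out.print(header ? m.name : m.value);
            if (i < entry.metrics.size() - 1) {
                out.print(",");
            }
        }
        out.print('\n');
    }

    // Renders the header row from the first entry, then one row per entry.
    static String render(List<Entry> entries) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer);
        if (!entries.isEmpty()) {
            printRow(entries.get(0), true, out);
            for (Entry entry : entries) {
                printRow(entry, false, out);
            }
        }
        out.flush();
        return buffer.toString();
    }

    // Made-up sample data for illustration only.
    static List<Entry> sampleEntries() {
        return Arrays.asList(
            new Entry(Arrays.asList(new Field("ga:browser", "Firefox")),
                      Arrays.asList(new Field("ga:visits", "120"), new Field("ga:pageviews", "450"))),
            new Entry(Arrays.asList(new Field("ga:browser", "Chrome")),
                      Arrays.asList(new Field("ga:visits", "80"), new Field("ga:pageviews", "300"))));
    }

    public static void main(String[] args) {
        // Prints:
        // ga:browser,ga:visits,ga:pageviews
        // Firefox,120,450
        // Chrome,80,300
        System.out.print(render(sampleEntries()));
    }
}
```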
This code breaks up your feed into entries, and your entries into values to be printed to output. But how do we make those values CSV-friendly? What if a value in the "comma-separated-values" file has a comma in it? Those values must be sanitized.
CSV is a straightforward format. A CSV file represents a data table, and each line represents a row in that table. The values in that row are separated by commas. A new line means a new row of data.
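Unfortunately, this straightforward format makes it deceptively easy to throw things off with bad data. What if your value has a comma in it? What if one of your values has line breaks within it? What should happen with space between commas and values? All these situations can be accounted for using a few simple rules:
1. If a value contains a double quote, escape it by doubling it.
2. If a value contains a comma, a line break, or leading or trailing whitespace, wrap the whole value in double quotes.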
It can be a little tricky to visualize what your values should look like at this point, so here are some examples. Remember, each example represents a single value, and is escaped as such. For clarity, spaces will be shown as a _ character.
| Before | After |
|---|---|
| unchanged | unchanged |
| random " doublequote | random "" doublequote |
| comma,separated | "comma,separated" |
| Two<br>lines | "Two<br>lines" |
| _leading space, and a comma | "_leading space, and a comma" |
| "leading quote, comma | """leading quote, comma" |
| _space, comma<br>second line, and double quote" | "_space, comma<br>second line, and double quote""" |
The easiest way to handle all these conditions is to write a sanitizing method. Questionable data goes in, and good, clean, CSV values come out. Here's a good sample implementation of just such a method.
private String sanitizeForCsv(String cellData) {
    // Guard against null or empty values; charAt(0) below would otherwise throw.
    if (cellData == null || cellData.isEmpty()) {
        return "";
    }

    StringBuilder resultBuilder = new StringBuilder(cellData);

    // Look for double quotes and escape each one by doubling it.
    int quoteIndex = resultBuilder.indexOf("\"");
    while (quoteIndex >= 0) {
        resultBuilder.replace(quoteIndex, quoteIndex + 1, "\"\"");
        // Continue the search after both characters of the escaped pair.
        quoteIndex = resultBuilder.indexOf("\"", quoteIndex + 2);
    }

    char firstChar = cellData.charAt(0);
    char lastChar = cellData.charAt(cellData.length() - 1);

    if (cellData.contains(",") ||               // Check for commas.
        cellData.contains("\n") ||              // Check for line breaks.
        Character.isWhitespace(firstChar) ||    // Check for leading whitespace.
        Character.isWhitespace(lastChar)) {     // Check for trailing whitespace.
        resultBuilder.insert(0, "\"").append("\"");  // Wrap in double quotes.
    }
    return resultBuilder.toString();
}
The method starts out by checking for existing double quotes. This should be done before all the other checks, because those checks involve wrapping the string in double quotes, and it would be bothersome to tell apart double quotes that were part of the value from double quotes that were added by this method. These are easy to escape: they just need to be doubled up. Every " becomes a "", every "" becomes a """", and so on.
Once the double quotes have been escaped, all the other conditions (untrimmed whitespace, commas, and line breaks) can be checked for. If any of them are present, simply wrap the value in double quotes.
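A quick way to gain confidence in the rules is to run the examples from the table earlier in this article through them. The harness below is a sketch under assumptions: SanitizeCheck is a hypothetical class name, its sanitize method is a standalone copy of the logic described here (with a guard for empty values) so the example runs by itself, and underscores from the table are written as real spaces.

```java
class SanitizeCheck {
    // Standalone copy of the sanitizing logic: escape quotes first, then wrap if needed.
    static String sanitize(String cellData) {
        if (cellData.isEmpty()) {
            return cellData;
        }
        StringBuilder result = new StringBuilder(cellData);

        // Double every double quote in the value.
        int quoteIndex = result.indexOf("\"");
        while (quoteIndex >= 0) {
            result.replace(quoteIndex, quoteIndex + 1, "\"\"");
            quoteIndex = result.indexOf("\"", quoteIndex + 2);
        }

        // Wrap in double quotes for commas, line breaks, or untrimmed whitespace.
        boolean needsWrapping = cellData.contains(",")
            || cellData.contains("\n")
            || Character.isWhitespace(cellData.charAt(0))
            || Character.isWhitespace(cellData.charAt(cellData.length() - 1));
        if (needsWrapping) {
            result.insert(0, "\"").append("\"");
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Each pair is {before, after}, taken from the examples table.
        String[][] cases = {
            {"unchanged", "unchanged"},
            {"random \" doublequote", "random \"\" doublequote"},
            {"comma,separated", "\"comma,separated\""},
            {"Two\nlines", "\"Two\nlines\""},
            {" leading space, and a comma", "\" leading space, and a comma\""},
            {"\"leading quote, comma", "\"\"\"leading quote, comma\""},
        };
        for (String[] c : cases) {
            String actual = sanitize(c[0]);
            if (!actual.equals(c[1])) {
                throw new AssertionError("Expected [" + c[1] + "] but got [" + actual + "]");
            }
        }
        System.out.println("All examples match");
    }
}
```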
Note that the above uses a StringBuilder
object, never directly manipulating a raw string. This is because the
StringBuilder lets you freely manipulate the string without making
interim copies in memory. Because strings in Java are immutable, every minor
tweak you make would create a brand new string. When chugging through
spreadsheet data, this can add up very quickly.
| Number of rows | x Values per row | x Changes to value | = Total New Strings Created |
|---|---|---|---|
| 10,000 | 10 | 3 | 300,000 |
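The total in the table is simply the product of the three columns. The short sketch below reproduces that arithmetic and illustrates the underlying point: a StringBuilder is modified in place, while changing a String yields a separate object and leaves the original untouched. StringCopyDemo and its method names are hypothetical, chosen only for this illustration.

```java
class StringCopyDemo {
    // Rows x values per row x changes per value, as in the table above.
    static long totalNewStrings(long rows, long valuesPerRow, long changesPerValue) {
        return rows * valuesPerRow * changesPerValue;
    }

    // insert() and append() return the same builder they were called on.
    static boolean builderIsReused() {
        StringBuilder builder = new StringBuilder("a,b");
        StringBuilder afterWrap = builder.insert(0, "\"").append("\"");
        return afterWrap == builder && builder.toString().equals("\"a,b\"");
    }

    // Wrapping a String produces a new object; the original keeps its content.
    static boolean stringIsCopied() {
        String original = "a,b";
        String wrapped = "\"" + original + "\"";
        return wrapped != original && original.equals("a,b");
    }

    public static void main(String[] args) {
        System.out.println(totalNewStrings(10_000, 10, 3)); // prints 300000
        System.out.println(builderIsReused());              // prints true
        System.out.println(stringIsCopied());               // prints true
    }
}
```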
Now that you've been given a golden hammer, it's only natural to go hunting for nails. Here are some ideas to get you started.