Infeasible
Status Update
Comments
we...@google.com <we...@google.com> #2
You've stumbled into a known bug involving analytic (window) functions and field aliases. The example you provided will fail if "s.word" is given an AS alias:
SELECT
  s.word AS foo,
  SUM(s2.word_count) OVER (PARTITION BY s.corpus) AS count_corpus,
  SUM(s2.word_count) OVER () AS count_total
FROM publicdata:samples.shakespeare s
INNER JOIN publicdata:samples.shakespeare s2
  ON s2.word = s.word AND s2.corpus = s.corpus
You can work around this limitation by performing the aliasing as an independent step outside a sub-select:
SELECT
  s.word AS foo,
  count_corpus,
  count_total
FROM (
  SELECT
    s.word,
    SUM(s2.word_count) OVER (PARTITION BY s.corpus) AS count_corpus,
    SUM(s2.word_count) OVER () AS count_total
  FROM publicdata:samples.shakespeare s
  INNER JOIN publicdata:samples.shakespeare s2
    ON s2.word = s.word AND s2.corpus = s.corpus)
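This works because the inner query computes the analytic functions against the original "s.word" field; the alias is only applied in the outer query, after the window functions have been resolved.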
This is a deep, rare problem, and unlikely to be fixed.
ol...@gmail.com <ol...@gmail.com> #3
Thank you for investigating so promptly! Knowing that this error can be avoided by omitting the alias is in itself perfectly adequate for this particular situation.
However, I am somewhat concerned about your estimate that this won't be fixed, and about your characterisation of this as a 'deep, rare problem'. This is not an isolated case: I am running into many mysterious errors that are 'solved' by adding or removing an alias. What makes them hard to deal with and report is that the critical circumstances and the underlying causes remain elusive.
The problem is not just that certain things are impossible in BigQuery queries, but that the limitations are obscure and that we can't predict whether a query will run. This makes it frustrating to use BigQuery in a professional setting. It could be offset by a clear sense of continuous improvement, with issues being fixed as they are reported.
On a technical level, I frankly don't understand why this would be a deep, difficult problem to fix. I would love to get a more detailed explanation.
jc...@google.com <jc...@google.com> #4
Great question. Glad you asked!
The code base behind BigQuery is nearing its 10th birthday, and it has evolved substantially over that time period. A lot of the code, particularly in the parser and query planner, was built with goals and assumptions that differ from how the code is used today. That makes bugs like this one very challenging to track down and fix, especially if the fix is done in a principled manner rather than just slapping another patch on top.
We understand that the quirks in BQ's SQL dialect are frustrating and unpredictable. Fixing this problem in a comprehensive and principled manner is one of our highest priorities, and that work is close to completion. When we close this as WontFix, we really mean HangInThere: the real fix is on its way.
ol...@gmail.com <ol...@gmail.com> #5
Thank you for clarifying that; I am especially glad that WontFix is not to be taken literally. I imagine one of your greatest challenges in creating structural fixes is backwards compatibility, and I respect your efforts very much.
ol...@gmail.com <ol...@gmail.com> #6
jc...@gmail.com <jc...@gmail.com> #7
I verified that the query no longer fails in the new dialect.
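For reference, the aliased query looks roughly like this in the standard SQL dialect (assuming the public dataset is referenced there as bigquery-public-data.samples.shakespeare):
SELECT
  s.word AS foo,
  -- With the alias in place, both analytic functions resolve correctly
  -- in standard SQL; no sub-select workaround is needed.
  SUM(s2.word_count) OVER (PARTITION BY s.corpus) AS count_corpus,
  SUM(s2.word_count) OVER () AS count_total
FROM `bigquery-public-data.samples.shakespeare` s
INNER JOIN `bigquery-public-data.samples.shakespeare` s2
  ON s2.word = s.word AND s2.corpus = s.corpus;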
Description
select
  s.word
  ,sum(s2.word_count) over (partition by s.corpus) as count_corpus
  ,sum(s2.word_count) over () as count_total
from publicdata:samples.shakespeare s
inner join publicdata:samples.shakespeare s2
  on s2.word = s.word
  and s2.corpus = s.corpus
However, when I try to do this with our own tables, I am running into an unhelpful error message to the effect that the first field in the select statement cannot be found.
Job ID: snap-sale:job_SPeZfcVqIHlB0fzrsnAr7kcbhnY
I don't see any structural difference between these two queries, except that in the first query I am joining the table with itself.
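Schematically, the failing query has this shape (with placeholder table names standing in for our actual tables, which I can't share here):
select
  t1.word
  -- Same window-function pattern as above, but joining two different tables
  ,sum(t2.word_count) over (partition by t1.corpus) as count_corpus
  ,sum(t2.word_count) over () as count_total
from mydataset.table_a t1
inner join mydataset.table_b t2
  on t2.word = t1.word
  and t2.corpus = t1.corpus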
Could you find out what causes this error, and if it cannot be fixed right away, change the error message into something less misleading?