Fixed
Status Update
Comments
[Deleted User] <[Deleted User]> #2
Note that the solution given here is equivalent to the attached patch, and was taken
directly out of http://docs.python.org/lib/csv-examples.html
IMHO, there should be an argument to the bulkloader script to take a charset, which
can then be added to the Content-Type header on the request. This is handled
automatically by WebOb
(http://pythonpaste.org/webob/reference.html#unicode-variables), and then the
"unicode_csv_reader" can just be a general encoding CSV reader.
Until then, the following patch works for UTF-8, and therefore also for ASCII (which is a subset of UTF-8).
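A minimal Python 3 sketch of the "general encoding CSV reader" idea, assuming the client names the charset in the Content-Type header. The helper name `read_csv_with_charset` and the UTF-8 fallback are illustrative assumptions, not part of the bulkloader or WebOb API; only the standard library `email.message` and `csv` modules are used.

```python
import csv
import io
from email.message import Message


def read_csv_with_charset(body, content_type, default_charset="utf-8"):
    """Decode a CSV request body with the charset named in its Content-Type header.

    Hypothetical helper: falls back to default_charset when no charset is given.
    """
    headers = Message()
    headers["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter, or the fallback.
    charset = headers.get_content_charset(default_charset)
    text = body.decode(charset)
    # newline="" leaves line endings alone so the csv module can handle them itself.
    return list(csv.reader(io.StringIO(text, newline="")))
```

For example, `read_csv_with_charset(data, "text/csv; charset=ISO-8859-1")` decodes Latin-1 input, while a bare `"text/csv"` falls back to UTF-8.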
wa...@gmail.com <wa...@gmail.com> #3
This works for the SDK, but doesn't work on App Engine itself. :(
vi...@gmail.com <vi...@gmail.com> #4
Just for the record: a simple workaround is to patch __init__.py as suggested in issue 157, and then save it
under the name bulkload.py in your App Engine directory. After that, simply replace
from google.appengine.ext import bulkload
with
import bulkload
and Unicode import will work!
ma...@google.com <ma...@google.com>
fr...@gmail.com <fr...@gmail.com> #5
th...@gmail.com <th...@gmail.com> #6
Currently, after applying all the patches, I can successfully bulk load only on
the local development server; I can't upload my data to the application on appspot.
Is this bug scheduled to be fixed in the near future?
we...@gmail.com <we...@gmail.com> #7
The solution to the problem is simple:
As in the initial post, copy the __init__.py file from the google.appengine.ext.bulkload module and create a folder with any name in your application, e.g.
bulkload. Make the changes recommended in the initial post (the file is also
attached with the fixes already applied, to make things easier :) ), then include it in your
CSV loading file: add "import bulkload" and remove "from google.appengine.ext
import bulkload", and now it works!!! Until this fix is deployed on appspot, this
will work perfectly.
sa...@gmail.com <sa...@gmail.com> #8
I patched the __init__.py and changed the import to "import bulkload".
I still get this error message:
  line 337, in LoadEntities
    new_entities = leader.CreateEntity(columns, key_name=key_name)
  line 233, in CreateEntity
    entity[name] = converter(val)
UnicodeEncodeError: 'ascii' codec can't encode characters u'\ufeff' in position 0:
ordinal not in range(128)
Any solution?
yo...@gmail.com <yo...@gmail.com> #9
I hit the problem too. It seems that the leading '\ufeff' is not accepted. I
stripped it, and then it worked.
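The u'\ufeff' character is the Unicode byte order mark (BOM), which some editors write at the start of UTF-8 files. A minimal Python 3 sketch of stripping it before parsing; it relies on the standard `utf-8-sig` codec, which removes a leading BOM if present and otherwise decodes exactly like UTF-8. The helper name `rows_without_bom` is hypothetical.

```python
import csv
import io


def rows_without_bom(raw_bytes):
    """Parse UTF-8 CSV bytes, dropping a leading byte order mark if there is one."""
    # "utf-8-sig" strips a leading BOM (EF BB BF); input without a BOM is unaffected.
    text = raw_bytes.decode("utf-8-sig")
    return list(csv.reader(io.StringIO(text, newline="")))
```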
jo...@gmail.com <jo...@gmail.com> #10
I have the same problem:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 8: ordinal not in range(128)
What do you mean by "truncate it"?
hu...@gmail.com <hu...@gmail.com> #11
Any progress on this? It is essential, yet easy-to-fix stuff, and the solution has already been mentioned.
What are we waiting for? It has already been 8 months since this issue was opened.
sa...@gmail.com <sa...@gmail.com> #12
I gave up on the bulkloader. I use Flash to read the CSV file and send URL requests
to App Engine to update the datastore. Everything is solved, with just a few lines:
1. Read the CSV in the Flash SWF (in Flash).
2. Call LoadVariables (in Flash),
e.g. LoadVariables("http://myaccount_123.appspot.com/cvs_loader/", this, "POST");
3. Check the success flag, go to the next record, then loop back to LoadVariables (in Flash).
4. Write a handler function in Python,
e.g.
from django.shortcuts import render_to_response

def cvs_loader(request):
    if request.method == 'POST':
        # MyDataForm and MyData are your own Django form and datastore model.
        form = MyDataForm(request.POST)
        if form.is_valid():
            newdata = MyData(name=form.cleaned_data['name'],
                             tel=form.cleaned_data['tel'],
                             description=form.cleaned_data['description'])
            newdata.put()
            return render_to_response('success.html')
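The client loop in steps 1-3 does not have to be written in Flash. A hedged Python 3 sketch using only the standard library: the field names match the form above, but the helper name `build_post_bodies` is hypothetical, and the actual sending is left out so the example runs offline.

```python
import csv
import io
from urllib.parse import urlencode


def build_post_bodies(csv_text):
    """Turn each CSV row (name, tel, description) into a URL-encoded POST body."""
    bodies = []
    for name, tel, description in csv.reader(io.StringIO(csv_text, newline="")):
        fields = {"name": name, "tel": tel, "description": description}
        # urlencode percent-encodes non-ASCII characters as UTF-8 by default.
        bodies.append(urlencode(fields).encode("ascii"))
    return bodies
```

Each body could then be posted one record at a time, e.g. with `urllib.request.urlopen(url, data=body)`, which sends a POST when `data` is given.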
ss...@gmail.com <ss...@gmail.com> #13
@Websyther,
I tried your __init__.py, but the error still occurs:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 2-3: ordinal not in range(128)
ERROR 2009-01-20 23:39:54,251 bulkload_client.py] Import failed
What is missing?
hu...@gmail.com <hu...@gmail.com> #14
The SDK release notes for 1.1.8 say that this issue has been fixed, but I am
trying to upload Unicode data with the new bulkloader in 1.1.9, and it still fails to
upload the data.
So Unicode strings are still a problem.
Could you please check this? The new bulkloader seems like a very useful tool; it would
be a shame if it didn't accept Unicode.
Thanks
ne...@gmail.com <ne...@gmail.com> #15
[Comment deleted]
ku...@gmail.com <ku...@gmail.com> #16
neoedmund has described the problem more succinctly.
In SDK 1.1.9, around line 1123 of
google_appengine\google\appengine\api\datastore_types.py, I think this code would be
more appropriate:
if not isinstance(value, unicode):
    # make a unicode object with best-guess for encoding:
    value = value.decode('utf-8')
pbvalue.set_stringvalue(value.encode('utf-8'))  # make a byte string
I had reported this in issue 155, but my report was poorly worded.
Note that the above bug cannot be patched on App Engine itself, since that code is
restricted from patching (as far as I can tell).
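The idea in the snippet above (accept either text or raw bytes, guess UTF-8 for bytes, always store UTF-8 bytes) can be sketched in Python 3 terms, where `bytes` plays the role of Python 2 `str` and `str` the role of `unicode`. The function name `to_utf8_bytes` is hypothetical; this is not the datastore_types code.

```python
def to_utf8_bytes(value):
    """Return value as UTF-8 encoded bytes, decoding bytes input as UTF-8 first."""
    if isinstance(value, bytes):
        # Best guess for undeclared byte strings: UTF-8 (raises if invalid).
        value = value.decode("utf-8")
    return value.encode("utf-8")
```

Decoding before re-encoding also validates byte input, so malformed data fails early instead of being stored as-is.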
ke...@gmail.com <ke...@gmail.com> #17
The new bulkloader can be used to load Unicode data, but you need to set up your
Loader subclass properly. You can use something like "lambda x: unicode(x, 'utf-8')"
as your conversion function to make it work, e.g.:
from google.appengine.ext import db
from google.appengine.tools.bulkloader import Loader
class MyModel(db.Model):
    field1 = db.StringProperty()
class MyLoader(Loader):
    def __init__(self):
        # Decode each UTF-8 CSV cell explicitly instead of relying on the ASCII default.
        Loader.__init__(self, 'MyModel', [('field1', lambda x: unicode(x, 'utf-8'))])
loaders = [MyLoader]
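To see why the converter matters, here is a minimal Python 3 sketch of how (name, converter) pairs might be applied to one CSV row of raw bytes. The `apply_converters` helper is hypothetical and only mimics the idea behind the Loader's property list; in Python 3, `lambda x: x.decode('utf-8')` plays the role of `lambda x: unicode(x, 'utf-8')`.

```python
def apply_converters(properties, values):
    """Map each raw byte-string value through its converter, keyed by property name."""
    return {name: converter(val) for (name, converter), val in zip(properties, values)}


# Python 3 analogue of [('field1', lambda x: unicode(x, 'utf-8'))], plus an int column.
PROPERTIES = [
    ("field1", lambda x: x.decode("utf-8")),
    ("count", int),
]
```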
jo...@google.com <jo...@google.com>
fu...@gmail.com <fu...@gmail.com> #18
I don't think it's been fixed.
The current version is 1.2.3.
The `Types and Property Classes' document says that the value of a StringProperty field
can be either `str' or `unicode'.
Here is a model, Author:
class Author(db.Model):
    name = db.StringProperty()
Its loader class could look like this:
class AuthorLoader(bulkloader.Loader):
    def __init__(self):
        bulkloader.Loader.__init__(self, 'Author', [('name', unicode)])
Then, when you upload a CSV that contains non-ASCII characters, you'll get the well-known error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in
range(128)
To fix this, you'll need a patch like this:
*** /local/src/google_appengine/google/appengine/tools/bulkloader.py~
--- /local/src/google_appengine/google/appengine/tools/bulkloader.py
on_class = db.class_for_kind(kind_or_class_key)
return implementation_class
***************
*** 3196,3202 ****
      for (name, converter), val in zip(self.__properties, values):
        if converter is bool and val.lower() in ('0', 'false', 'no'):
          val = False
!       properties[name] = converter(val)
      entity = model_class(**properties)
      entities = self.handle_entity(entity)
--- 3195,3204 ----
      for (name, converter), val in zip(self.__properties, values):
        if converter is bool and val.lower() in ('0', 'false', 'no'):
          val = False
!       if converter is unicode:
!         properties[name] = converter(val, 'utf-8')
!       else:
!         properties[name] = converter(val)
      entity = model_class(**properties)
      entities = self.handle_entity(entity)
ke...@gmail.com <ke...@gmail.com> #19
Your Loader.__init__ call is not correct. In the (name, converter) tuples, the
converter is not a type but a function from str to the appropriate type. You need to
use "lambda x: unicode(x, 'utf-8')" as your conversion function. This will correctly
turn the UTF-8 encoded str into a unicode object. By specifying just "unicode" as the
conversion function, Python uses the default codec (ASCII) to try to create a unicode
instance from the str, and fails in this case.
I hope this clarifies things.
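A quick Python 3 illustration of the point above: decoding the same UTF-8 bytes with the ASCII codec fails, while naming the codec explicitly succeeds. Python 3 no longer converts implicitly with ASCII, so both codecs are spelled out here; the helper name `decode_with` is hypothetical.

```python
def decode_with(codec, raw):
    """Try to decode raw bytes with codec; return the text, or None on failure."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None


RAW = b"Ivan Krsti\xc4\x87"  # UTF-8 encoding of "Ivan Krstić"
```

`decode_with("ascii", RAW)` returns None (the ASCII codec rejects byte 0xc4), while `decode_with("utf-8", RAW)` returns the expected text.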
fu...@gmail.com <fu...@gmail.com> #20
In the above patch, you can see the line:
if converter is bool and val.lower() in ('0', 'false', 'no'):
So the code expects `bool' as a converter.
Why not accept `unicode' as a converter too?
I think that makes more sense to users than forcing them to use a lambda.
mr...@gmail.com <mr...@gmail.com> #21
What kevingdon suggested works like a charm :) Thanks.
So, everyone: use lambda x: unicode(x, 'utf-8').
pf...@gmail.com <pf...@gmail.com> #22
I followed the instructions here, but they don't solve the problem for me.
I'm using SDK 1.3.0, and I'm trying to export data to CSV with this:
bulkloader.Exporter.__init__(self, 'MigrationResult',
                             [('companyName', lambda x: x.decode('utf-8'), None),
                              ('failure', lambda x: x.decode('utf-8'), None),
                              ('email', lambda x: x.decode('utf-8'), None),
                             ])
and got this:
  File "/home/getsense/appengine_pyton/google/appengine/tools/bulkloader.py", line 2784, in __EncodeEntity
    writer.writerow(self.__ExtractProperties(entity))
  File "/home/getsense/appengine_pyton/google/appengine/tools/bulkloader.py", line 2763, in __ExtractProperties
    encoding.append(fn(entity[name]))
  File "latamvalley/exporter2.py", line 13, in <lambda>
    ('failure', lambda x: x.decode('utf-8'), None),
AttributeError: 'NoneType' object has no attribute 'decode'
Then I tried this:
bulkloader.Exporter.__init__(self, 'MigrationResult',
                             [('companyName', lambda x: unicode(x, 'utf-8'), None),
                              ('failure', lambda x: unicode(x, 'utf-8'), None),
                              ('email', lambda x: unicode(x, 'utf-8'), None),
                             ])
and got this:
  File "/home/getsense/appengine_pyton/google/appengine/tools/bulkloader.py", line 2763, in __ExtractProperties
    encoding.append(fn(entity[name]))
  File "latamvalley/exporter.py", line 12, in <lambda>
    [('companyName', lambda x: unicode(x, 'utf-8'), None),
TypeError: decoding Unicode is not supported
Can you help me, please?
ku...@gmail.com <ku...@gmail.com> #23
Double-check the values you are passing in the CSV for companyName, failure, and email.
It looks like one of them is None when you are expecting it to be a string
(i.e. maybe you forgot the column, or the row has a blank value?)
pf...@gmail.com <pf...@gmail.com> #24
I'm exporting to CSV, so I'm not missing any column or data. It could be that
some rows contain empty fields, but that should be handled by the exporter. I
think my error is related to this bug. Thanks
ku...@gmail.com <ku...@gmail.com> #25
It doesn't sound related to the bug at all. You need to change this:
'failure', lambda x: unicode(x, 'utf-8')
to:
def convert_failure(value):
    if value is None:
        return value
    else:
        return value.decode('utf-8')
'failure', convert_failure
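The same None check can be factored into a reusable wrapper, so that each column does not need its own function. A minimal Python 3 sketch; `none_safe` is a hypothetical helper name, and the wrapped function stands in for whatever conversion a column needs. Note that the wrapper returns a callable, which is what gets passed in the property list.

```python
def none_safe(convert):
    """Wrap a converter so that None (an empty property) passes through unchanged."""
    def wrapper(value):
        if value is None:
            return None
        return convert(value)
    return wrapper


to_utf8 = none_safe(lambda v: v.encode("utf-8"))
```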
pf...@gmail.com <pf...@gmail.com> #26
Sorry, I'm a Python newbie; I'm just writing code to export data from my Java application.
I put this in my code as shown below, but I got "NameError: global name 'convert_failure'
is not defined". Isn't it defined? And what happened to the lambda function?
Thanks
from google.appengine.ext import db
from google.appengine.tools import bulkloader
class MigrationResult(db.Model):
companyName = db.StringProperty()
failure = db.StringProperty()
email = db.StringProperty()
class MigrationResultExporter(bulkloader.Exporter):
def __init__(self):
bulkloader.Exporter.__init__(self, 'MigrationResult',
[('companyName', convert_failure(x)),
('failure', convert_failure(x)),
('email', convert_failure(x)),
])
def convert_failure(value):
if value is None:
return value
else:
return value.decode('utf-8')
exporters = [MigrationResultExporter]
ku...@gmail.com <ku...@gmail.com> #27
The documentation on App Engine for CSV uploading is very complete. Please read
through it carefully and keep in mind that the issue tracker is *not* a discussion
forum.
As stated in the documentation, you need to pass in a callable. So just change
convert_failure(x) to convert_failure and move its definition above the Exporter
subclass.
pf...@gmail.com <pf...@gmail.com> #28
OK Kumar, I can finally export to CSV. I think this could be useful to somebody else;
this is the final working code.
Thanks
from google.appengine.ext import db
from google.appengine.tools import bulkloader
class MigrationResult(db.Model):
    companyName = db.StringProperty()
    failure = db.StringProperty()
    email = db.StringProperty()
def convert_failure(value):
    # Empty properties come back as None; pass them through unchanged.
    if value is None:
        return value
    else:
        return value.encode('utf-8')
class MigrationResultExporter(bulkloader.Exporter):
    def __init__(self):
        bulkloader.Exporter.__init__(self, 'MigrationResult',
                                     [('companyName', convert_failure, None),
                                      ('failure', convert_failure, None),
                                      ('email', convert_failure, None),
                                     ])
exporters = [MigrationResultExporter]
invoked with this:
./appcfg.py download_data --config_file=latamvalley/exporter2.py \
    --filename=album_data_archive.csv --kind=MigrationResult \
    --url=http://python.latest.sandbox-getsense-it.appspot.com/remote_api latamvalley
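For comparison, the export direction as a minimal Python 3 sketch using only the standard library: take text values (or None for empty properties), write them as CSV, and encode the result as UTF-8. The entity dicts and the `export_csv` helper are hypothetical stand-ins for datastore entities, not the bulkloader Exporter API.

```python
import csv
import io


def export_csv(entities, fields):
    """Write dict-like entities as UTF-8 encoded CSV bytes; None becomes ''."""
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for entity in entities:
        writer.writerow(["" if entity.get(f) is None else entity.get(f) for f in fields])
    return out.getvalue().encode("utf-8")
```

Encoding once at the end keeps every intermediate value as text, which is the same division of labor as the working exporter above: convert only at the output boundary.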
Description
1. Follow the Bulk Upload article here:
CSV has non-ASCII values. You could try "Ivan Krsti\xc4\x87", which is the
UTF-8 encoding of Ivan Krsti\u0107.
2. Somewhere internally this creates Unicode strings so you get a
UnicodeDecodeError in google/appengine/ext/bulkload/__init__.py in Load()
on this line:
for columns in reader:
...
This is because the csv reader is not Unicode-aware; see
To fix it, you'll need a "wrapper" that temporarily encodes Unicode objects
to UTF-8 byte strings, passes a line to csv, then decodes back into Unicode.
This could be cleaned up some, but I got it to work with these two functions:
import csv

def utf_8_encoder(unicode_data):
    """Yields UTF-8 encoded str objects for each chunk in the iterable unicode_data.

    Each chunk in unicode_data may or may not be unicode
    (this is handled seamlessly).
    Code is from
    """
    for line in unicode_data:
        if isinstance(line, unicode):
            line = line.encode('utf-8')
        yield line

def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
    """Reads a CSV file as unicode data.

    This is copied from
    You use it just like the stdlib csv.reader.
    """
    # csv.py doesn't do Unicode; encode temporarily as UTF-8 str objects:
    csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),
                            dialect=dialect, **kwargs)
    for row in csv_reader:
        # decode UTF-8 back to Unicode, cell by cell:
        yield [unicode(cell, 'utf-8') for cell in row]
...then I changed this line in google/appengine/ext/bulkload/__init__.py:Load()
reader = csv.reader(buffer, skipinitialspace=True)
to
reader = unicode_csv_reader(buffer, skipinitialspace=True)
Also see issue 155.
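In Python 3 the csv module works on text directly, so the encode/decode wrapper in the Description is no longer needed: it is enough to decode the input once, for example by wrapping a binary stream in `io.TextIOWrapper` with an explicit encoding. A minimal sketch; the function name `unicode_csv_rows` is hypothetical.

```python
import csv
import io


def unicode_csv_rows(binary_stream, encoding="utf-8", **kwargs):
    """Yield rows of str from a binary CSV stream, decoding with the given encoding."""
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    # csv.reader accepts the same keyword arguments as before, e.g. skipinitialspace.
    for row in csv.reader(text_stream, **kwargs):
        yield row
```

Usage mirrors the patched line in Load(), e.g. `unicode_csv_rows(stream, skipinitialspace=True)`.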