Issue 217: Full-text search api
1493 people starred this issue.
Status:  Started
Owner:  ----
Type-Feature
Priority-Medium
Component-Datastore


Reported by tran.the.master, Apr 15, 2008
The API currently offers fairly limited functionality for text search, as discussed on the
list:

  http://groups.google.com/group/google-appengine/browse_thread/thread/ba4a4a4ccefb96c5/0e3f0ab63c4c8afd?lnk=gst&q=text+search#0e3f0ab63c4c8afd

I believe full-text search is a fairly important feature for a lot of web applications. For God's sake,
you're Google, how can you not? :-)




Comment 1 by daniel.wilkerson, Apr 24, 2008
I need full text search.

In the Rails world people tend to use Solr/Lucene. I have not used them, but from a lecture I saw, it seems
quite flexible as an API. That is, it is not raw text search, but allows words in certain contexts, like attached
keywords, to have more weight (I think it is called "boosting").
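Boosting can be illustrated with a small sketch: each field gets a weight, and a document's score is the sum of weight times term frequency over the query terms. The field names and weights below are hypothetical, and Solr/Lucene's real scoring is considerably more involved.

```python
import re
from collections import Counter

# Hypothetical per-field boosts: a hit in "keywords" counts three times as
# much as a hit in the body text, and a title hit counts twice as much.
FIELD_BOOSTS = {"title": 2.0, "keywords": 3.0, "body": 1.0}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def boosted_score(doc, query):
    """Sum of boost * term frequency over every query term and every field."""
    terms = tokenize(query)
    score = 0.0
    for field, boost in FIELD_BOOSTS.items():
        counts = Counter(tokenize(doc.get(field, "")))
        score += boost * sum(counts[t] for t in terms)
    return score


DOCS = [
    {"title": "Cooking basics", "body": "python recipes for python cooks"},
    {"title": "Intro", "keywords": "python", "body": "a short guide"},
]
ranked = sorted(DOCS, key=lambda d: boosted_score(d, "python"), reverse=True)
# ranked[0] is the "Intro" doc: one boosted keyword hit (3.0) beats
# two plain body hits (2.0).
```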
Comment 2 by danilat83, May 06, 2008
(No comment was entered for this change.)
Comment 3 by kubasik.kevin, Jun 15, 2008
Is there any word on a potential ETA for ranking? Is this a bug that we can expect
to be fixed eventually?
Comment 4 by mhanson, Jun 29, 2008
This would be a necessary feature for me to even consider using App Engine. The current text query API is totally  
inadequate.
Comment 5 by Machelski.Krzysztof, Jun 30, 2008
This is a real blocker for me too.
Comment 6 by mattias.johansson, Jul 01, 2008
mhanson, perhaps you should specify in what way the current API is inadequate. "Full
text search" is quite a broad specification.
Comment 7 by wkornewald, Jul 01, 2008
We need exact and "startswith" matching. Case-insensitive is sufficient.
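Case-insensitive exact and "startswith" matching is commonly approached by storing a lowercased copy of the property and running an equality or range query on it. The sketch below assumes a hypothetical name_lower property and emulates that range scan over a sorted Python list with bisect, since a real datastore is not available in a standalone snippet.

```python
import bisect

# A sorted list of lowercased values stands in for a datastore index on a
# hypothetical "name_lower" property (a lowercased copy of the real name).
NAMES = ["Thomas Y", "anna", "Thomas X", "tom", "Bert"]
INDEX = sorted(name.lower() for name in NAMES)


def exact(index, value):
    """Case-insensitive exact match: every index entry equal to value.lower()."""
    key = value.lower()
    lo = bisect.bisect_left(index, key)
    hi = bisect.bisect_right(index, key)
    return index[lo:hi]


def startswith(index, prefix):
    """Case-insensitive prefix match, written as a range scan."""
    # Equivalent to the range: key >= prefix AND key < prefix + U+FFFD.
    lo_key = prefix.lower()
    hi_key = lo_key + "\ufffd"
    lo = bisect.bisect_left(index, lo_key)
    hi = bisect.bisect_left(index, hi_key)
    return index[lo:hi]
```

The U+FFFD upper bound is a widely used trick; it would miss values whose character right after the prefix sorts above U+FFFD (for example, characters outside the Basic Multilingual Plane).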

What's important for us is that we can specify which properties get searched (or at
least, which properties are searchable, at all). For example, we don't want a user's
email to match just because it happens to be a StringProperty. That's one of the
disadvantages of the current SearchableModel.
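Restricting search to chosen properties amounts to building the search terms from an explicit list of fields rather than from every string property. A minimal sketch, with a plain dict standing in for an entity and hypothetical field names; the resulting list is what would be stored in a list property and queried.

```python
import re


def build_search_terms(entity, searchable_fields):
    """Collect lowercase words only from the fields explicitly marked searchable."""
    terms = set()
    for field in searchable_fields:
        terms.update(re.findall(r"\w+", str(entity.get(field, "")).lower()))
    return sorted(terms)


user = {"full_name": "Thomas X", "bio": "Likes chess", "email": "thomas@example.com"}
# The email field is deliberately left out, so "example" never becomes a search term.
terms = build_search_terms(user, ["full_name", "bio"])
```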

Also, it would be great to be able to retrieve only the *distinct* values of a
certain property (instead of getting full entities). For example, if our User object
had a "full name" StringProperty then we'd like to be able to search for "tho" and
get "Thomas X", "Thomas Y" instead of 100 times "Thomas X", "Thomas X", "Thomas X",
and then 100 times "Thomas Y", ... (just because there are 100 people with the same
name). It's sufficient if this matches only a single property. IOW, apart from
case-insensitivity this would be like walking the index of a property directly.
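Returning distinct values for a prefix can be sketched as a walk over a sorted index that skips adjacent duplicates. Here the index is a sorted Python list of (lowercased, original) pairs, an assumption standing in for a real property index; the function name and its limit parameter are hypothetical.

```python
import bisect


def distinct_prefix_values(values, prefix, limit=10):
    """Return up to `limit` distinct values whose lowercase form starts with prefix."""
    # Sorted (lowercased, original) pairs stand in for a property index.
    index = sorted((v.lower(), v) for v in values)
    key = prefix.lower()
    # (key,) sorts before every (key, ...) pair, so this finds the range start.
    start = bisect.bisect_left(index, (key,))
    result = []
    for lowered, original in index[start:]:
        if not lowered.startswith(key):
            break  # past the end of the prefix range
        if not result or result[-1] != original:
            result.append(original)  # duplicates are adjacent, so skip repeats
            if len(result) == limit:
                break
    return result


NAMES = ["Thomas X"] * 100 + ["Thomas Y"] * 100 + ["Anna Z"]
```

A scan like this still steps over every duplicate; keeping one record per distinct value avoids that cost.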
Comment 8 by DennisBPeterson, Jul 01, 2008
Also, more advanced text searching (stemming, etc). The nice stuff we're used to on 
the search engine. Google has the technology...a little birdie told me they're 
planning to apply it, but I don't know when.
Comment 9 by ronald.luitwieler, Jul 05, 2008
I also need it, hope they will deliver it soon!
Comment 10 by mhanson, Jul 05, 2008
Re #6: Sure, though this thread: http://groups.google.com/group/google-appengine/browse_thread/thread/113020d7cbd69d8d/913569fcb72d6f1d?lnk=gst&q=searchablemodel#913569fcb72d6f1d does a fine job describing the issues. My feedback in this
thread is intended largely as a product management nudge.

The SearchableModel "short-term" library is a step, of course, but a real full-text search library needs 
ranking, stemming, pluggable tokenizers, rich support for many human languages, stop words, boolean 
queries, support for multiple indexes per datastore, etc.  It must also have an indexing datastore that scales 
efficiently.  The developers know all this.  Presumably this bug plays the role of a tracker so that we can all 
click that little star and let them know we want it soon, please.
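Of the features listed, ranking is the one most often asked about in this thread. Below is a toy sketch of one standard approach, TF-IDF scoring combined with stop-word removal and a boolean AND, on in-memory strings. The stop-word list and document ids are illustrative assumptions, and a production engine would use a proper analyzer and a persistent index.

```python
import math
import re
from collections import Counter

# A tiny illustrative stop-word list; real analyzers use much larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in"}


def tokenize(text):
    """Lowercase, split into words, and drop stop words."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS]


def tf_idf_rank(docs, query):
    """Return ids of docs containing every query term, best TF-IDF score first.

    docs maps a document id to its text. The idf is smoothed as
    log(1 + N / df) so that a term present in every document still counts.
    """
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    terms = tokenize(query)
    n_docs = len(docs)
    doc_freq = {t: sum(1 for c in counts.values() if c[t] > 0) for t in terms}
    scores = {}
    for doc_id, c in counts.items():
        if all(c[t] > 0 for t in terms):  # boolean AND over the query terms
            scores[doc_id] = sum(c[t] * math.log(1 + n_docs / doc_freq[t]) for t in terms)
    return sorted(scores, key=scores.get, reverse=True)


DOCS = {
    "a": "full text search for the datastore",
    "b": "search search search",
    "c": "datastore quotas",
}
```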

Issue-watchers, please check out the thread linked above to get a sense of the latest on this topic.  Google 
bug-scrubbers, feedback on this issue would be welcome, as it is introducing a lot of platform risk to people 
who are trying to decide whether to commit development energy to GAE.
Comment 11 by constantin.christmann, Jul 09, 2008
Me too
Comment 12 by dev7000, Aug 25, 2008
I need it also
Comment 13 by rosdikasim, Sep 05, 2008
I want this YESTERDAY!
Comment 14 by ma...@google.com, Sep 15, 2008
(No comment was entered for this change.)
Labels: -Type-Defect Type-Feature
Comment 15 by valdik2, Sep 19, 2008
Show stopper :-(
Comment 16 by a.s@google.com, Sep 25, 2008
(No comment was entered for this change.)
Status: Acknowledged
Comment 17 by troelsbay, Sep 26, 2008
Now that this issue has been acknowledged, it is equally important to note that full-text search would in turn need a
bigger limit on indexed values. Currently, it is impossible to make more than a few hundred words per
row/article/post searchable, because searchable indexed keywords count toward the limit.
Comment 18 by ajbrooks247, Sep 26, 2008
We are all searching in the dark for both our text and when we can expect a 
resolution to this major weakness in GAE. Road map please. Thanks.
Comment 19 by timur.br, Oct 01, 2008
I'm joining this thread.
Comment 20 by j...@google.com, Oct 06, 2008
(No comment was entered for this change.)
Labels: Component-Datastore
Comment 21 by goutham.mullaguru, Oct 08, 2008
show stopper
Comment 22 by merpattersonnet, Oct 29, 2008
+1000
Comment 23 by edgar.jose.fernando.delgado, Nov 20, 2008
Hello, maybe this can help. I changed the code to a custom version; it works nicely for me,
and maybe it will for you too:

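from google.appengine.ext import db
import mysearch  # presumably the attached __init__.py, installed as a package named mysearch
class Cancion(mysearch.SearchableModel):
    nombre_con_espacios_t = db.StringProperty()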

The class must have the "nombre_con_espacios_t" attribute, because it is the only attribute
taken into account when searching.

It works as follows:

Creation:

cancion = Cancion(nombre_con_espacios_t="colombia es linda")
cancion.put()

Modification:

cancion.put(update_searchable=True)

Note: the FULL_TEXT_MIN_LENGTH_IN setting helps me break the text into more pieces with the regex :P
Attachment: __init__.py (10.6 KB)
Comment 24 by edgar.jose.fernando.delgado, Nov 26, 2008
If anyone finds problems with it, it would be great to hear about them.
Comment 25 by bowmanb, Dec 18, 2008
show stopper
Comment 26 by james.zhuo, Dec 18, 2008
Definitely a show stopper. I've written a blog post about search.SearchableModel and why it doesn't work; check it out if
you are looking for more info: http://zhuocorporation.spaces.live.com/blog/cns!D76A58A7350B0D0B!1824.entry


Comment 27 by sebastianovide, Jan 25, 2009
it is a must !!!!
Comment 28 by sebastien.bruel, Jan 26, 2009
It would be really nice to know whether this feature has not only been acknowledged but is
also somewhere on the roadmap.
It's not on the one published at http://code.google.com/appengine/docs/roadmap.html
for the October 2008 to March 2009 period.

I would really like to use App Engine seriously, but, like many people, I find that the lack of
full-text search could be a show stopper.
I can't see any good workaround that would not require a huge amount of work.

I really hope full-text search will get a higher priority soon.
Comment 29 by patelgopal, Jan 26, 2009
http://code.google.com/p/googleappengine/issues/detail?id=208&q=google%20api%20google%20app%20engine&colspec=ID%20Type%20Status%20Priority%20Stars%20Owner%20Summary%20Log%20Component

Maybe we should star the above issue. Once the Google APIs are integrated directly
with App Engine, we can just use Google Base for search. Nice and dandy.
Comment 30 by hus...@gmail.com, Feb 22, 2009
What an irony.

Google, the mother of all search engines, developed a cloud that doesn't offer
text search in its API.

;-))))

It should have been the first function in the api. Isn't it all about search at 
Google???

Comment 31 by choongng, Feb 22, 2009
If anyone wants some rudimentary indexing and searching code, shoot me a message. It basically stems words and
filters out stop words, then puts the resulting list into a table that maps each word to the indexed objects. It sounds like
it could be slow, but in practice it's been working fine. Currently the search is multi-word AND, but extending it
to support phrases wouldn't be a big deal.
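The approach described above (stem, drop stop words, map each word to the objects containing it, then AND the query words together) can be sketched in memory as follows. The suffix-stripping stem function is a deliberately crude stand-in for a real stemmer such as Porter's, and the WordIndex class is hypothetical; on App Engine the postings would live in datastore entities rather than a dict.

```python
import re
from collections import defaultdict

STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is"}
SUFFIXES = ("ing", "ed", "es", "s")  # crude stand-in for a real stemmer


def stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Lowercase, split into words, drop stop words, then stem."""
    return [stem(w) for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS]


class WordIndex:
    """Maps each stemmed word to the set of object ids that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, obj_id, text):
        for word in analyze(text):
            self.postings[word].add(obj_id)

    def search(self, query):
        """Multi-word AND: ids that contain every (stemmed) query word."""
        words = analyze(query)
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result
```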
Comment 32 by riklaunim, Feb 25, 2009
What about Google CSE? It's not something you can use on the server side,
but if your site is public and Google likes it, you get quota-free full-text search :)
Comment 33 by jairomolinajr, Feb 25, 2009
CSE is great for public sites, but I still need Unicode text search inside the
datastore to get the SQL functionality I am used to, with commands like "LIKE
'xxx%yyy'". I hope we will have a solution for this issue soon... GAE is great!
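A LIKE 'xxx%yyy' pattern has no direct datastore equivalent, so it would have to be evaluated in application code over candidate entities (ideally narrowed first, for example by a prefix range on 'xxx'). A minimal sketch that translates a LIKE pattern into a Python regular expression; the function names are hypothetical, matching is case-sensitive, and ESCAPE clauses are not supported.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into a compiled regex.

    '%' matches any run of characters and '_' matches exactly one;
    everything else is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like_filter(values, pattern):
    """Keep only the values that match the LIKE pattern in full."""
    regex = like_to_regex(pattern)
    return [v for v in values if regex.fullmatch(v)]
```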
Comment 34 by bjneary, Feb 27, 2009
It would be great to have full text search----PLEASE?!!!
Comment 35 by cris7...@hotmail.com, Mar 01, 2009
jhjhjhhjhjh
Comment 36 by royapetersen, Mar 04, 2009
Please add a search feature! I am a teacher using Wordle in my classroom, and it
would be soooo helpful!
Comment 37 by seharebo, Mar 04, 2009
this is a must
Comment 38 by tally.wa13, Mar 04, 2009
I believe this would be a great thing to have.
Comment 39 by yann.leblevec, Mar 09, 2009
Highly required feature (especially when the service provider is Google)...
Comment 40 by paul.springford, Mar 09, 2009
I'd just like to add my name to the list of people hoping text search comes soon for
Wordle users.
Comment 41 by shrtsa...@hotmail.com, Mar 09, 2009
searching text would be great!
Comment 42 by goduke341, Mar 10, 2009
How do I search for a Wordle?

Comment 43 by troelsbay, Mar 10, 2009
It would be really great if you guys would stop writing a comment saying "Me too!", "This issue is crucial!" etc. 
Can you please just follow normal procedure and star the item, so everybody at Google knows that this is an 
issue many people care about (by counting stars), and I can stop getting your useless (no offense) comments in 
my inbox? Thx.
Comment 44 by goduke341, Mar 10, 2009
Please just help me. I need to search for something on Wordle. Just go to this site and
try to search for a Wordle: http://www.wordle.net/
I can't figure out how. Just go to the site, figure out how to search for a Wordle, then
tell me, please.
Comment 45 by goduke341, Mar 10, 2009
I voted
Comment 46 by goduke341, Mar 10, 2009
by hitting the star, so please help me search.

Comment 47 by grandmapuffy, Mar 12, 2009
Please address this issue!
Comment 48 by OakdaleFTL, Mar 12, 2009
Search is a must-have feature...
Comment 49 by henryrun, Mar 14, 2009
Yes, give me search on Wordle...otherwise I can never find my genial word-picture again!
Comment 50 by pgusler, Mar 15, 2009
As an educator, it is vital that I be able to go in and find the word clouds my kids
have produced in order to truly validate their work. Please work on this.
Comment 51 by cpp...@163.com, Mar 16, 2009
Give me Chinese full-text search!
Comment 52 by larissa.halishoff, Mar 17, 2009
Really, really need to be able to search ....
Comment 53 by service....@gmail.com, Apr 01, 2009
Give me RLIKE !! regex support !!
Comment 54 by tiwari.aditya13, Apr 06, 2009
Can't we make a workaround for it ourselves?
Comment 55 by Konstantin.Solomatov, Apr 08, 2009
In a typical web application it's possible to use Lucene or a similar solution, but
App Engine doesn't provide access to the file system, so we have no workaround here. IMO, it's a
must-have for App Engine to be useful.
Comment 56 by tombrander, Apr 08, 2009
I vote that this is a key issue restricting wider adoption of GAE. Help
us, please!!
Comment 57 by edmaroliveiraferreira, Apr 09, 2009
Please, fulltext search now !

Comment 58 by cemalettin.koc, Apr 15, 2009
Another vote :)
Comment 59 by jasoncwarner, Apr 16, 2009
LIKE syntax, please!
Comment 60 by drawkbox, Apr 16, 2009
I have already starred this but would just like to say that this is probably the top
feature needed. Now that Java has been added, which opens the door to numerous languages, I
think searching is important because it is the next major performance holdup on
GAE and in most applications. Hoping this is in there soon! Keyword search is
actually one of the biggest walls on most projects. People might start using GAE just
to aggregate searching without a Google appliance.

Search is Google's killer feature, it could also be in the "cloud".
Comment 61 by arthur.kalm, Apr 17, 2009
If you're using GAE/J, you might be interested in Compass, which sits on top of Lucene. It seems Compass works 
in GAE/J as described here: http://www.kimchy.org/searchable-google-appengine-with-compass/
Comment 62 by kiss242, Apr 20, 2009
ghs.google.com has been blocked in some countries, which means that your 
users/clients in these countries are not able to access your GAE services with your 
own domain name.

See http://code.google.com/p/googleappengine/issues/detail?id=1269 for more details.


Comment 63 by imyousuf, Apr 20, 2009
This is quite interesting actually.
http://www.dzone.com/links/searchable_google_appengine_with_compass.html
Comment 64 by arthur.kalm, Apr 20, 2009
#63, yep, that's what I linked to in #61 :P
Comment 65 by imyousuf, Apr 20, 2009
Sorry Arthur, I missed it then :)
Comment 66 by arthur.kalm, Apr 20, 2009
With so many comments, I would have missed it too ;)
Comment 67 by execute.code, Apr 30, 2009
+1, it would be great if a Search API could be provided. I have been using
SearchableModel for quite some time, and it fits my basic needs, but I know of a
search engine that has raised people's expectations really high as soon as they
see a search box :-)
Comment 68 by ozgurisil, Apr 30, 2009
When I went through the features of GAE, I neglected to inspect the one feature that
couldn't possibly be missing: "Eh, it's Google, so why waste time checking that full-text
search is there?" Now it is time to place the search box on the page and I'm lost.

P.S. The priority of this issue is just "Medium" and the roadmap doesn't include a solution
yet. Beautiful...
Comment 69 by zbynek.winkler, Apr 30, 2009
Please stop adding pointless comments with no new information. Spamming over 1,100
people who starred the issue with "me too" or "+1" comments doesn't get it done any
faster. I bet Google already knows we want it. Thanks.
Comment 70 by wkornewald, May 02, 2009
Hi everyone,
we're the developers behind app-engine-patch
(http://code.google.com/p/app-engine-patch/). We'd like to sell our "search" package
which provides a more powerful feature set than SearchableModel and should help make
the wait for Google's full-text search API less painful. The features are described
in this post:

http://tinyurl.com/dxen3z

If you're potentially interested in buying our search package (for a
one-time fee) please take part in this short survey, primarily to help
us find a fair price:

http://www.surveymonkey.com/s.aspx?sm=CzIohuPfdcTL8z484vcX4Q_3d_3d

While there is not yet a demo site we hope that you can at least give
an approximate estimate. Thanks a lot!

Best regards,
Waldemar Kornewald
Comment 71 by garfunckle, May 08, 2009
I have created a full-text search API by porting http://whoosh.ca/ so it is available
on App Engine (it stores the index in the datastore).

You can download it from http://github.com/tallstreet/Whoosh-AppEngine/tree/master

It includes all of Whoosh's features, including:

# Pythonic API.
# Fielded indexing and search.
# Fast indexing and retrieval.
# Pluggable scoring algorithm (including BM25F), text analysis, storage, posting format, etc.
# Powerful query language parsed by pyparsing.
# Pure Python spell-checker.

Comment 72 by wkornewald, May 09, 2009
Nice! How many entities can be handled efficiently with your whoosh port? Do you have
any real-world data on how well it handles concurrent writes?
Comment 73 by sappenin, May 09, 2009
Similar question to wkornewald's: how will something like this scale? You indicate
that the index is stored in the Google Datastore. Does this mean that it should
scale as well as the GAE datastore does (~1-10 writes per second, etc.)?
Comment 74 by tallstreet, May 09, 2009
The only real world app using this at present is http://appjects.appspot.com/ (all
categories and searches are powered by it).

Adding a document requires 4 writes and 4 deletes to the datastore. 

It's not thread-safe; I recommend you add entities on one thread and index them in a
single thread on a cron.

I've put memcache caching on the index so searching should be very quick. 

The other limitation will be the size of the index. It uses the same index file
format that Whoosh uses but just stores the files in the datastore, so the
whole index is stored in 4 datastore entities. I believe there is a 10 MB limit on
each one?


Comment 75 by wkornewald, May 10, 2009
It's actually a 1MB limit. Is there no possibility to distribute the index across
more than 4 datastore entries?

Also, I quickly skimmed through the source and I haven't seen a command for splitting
the indexing task into many small tasks, so you don't hit the request limits. Is
there anything like that?
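One way to spread an index file across more entities is to split it into numbered chunks that each stay below the per-entity limit and reassemble them on read. In this sketch a dict stands in for the datastore, and the key naming scheme and the 1,000,000-byte chunk size are assumptions chosen to leave headroom under 1 MB.

```python
# Assumed chunk size: leaves headroom under the 1 MB per-entity limit
# once the entity's other fields and overhead are included.
MAX_CHUNK_BYTES = 1000 * 1000


def save_index_file(store, name, blob, max_bytes=MAX_CHUNK_BYTES):
    """Store blob as numbered chunk records plus a record holding the chunk count.

    `store` is a dict standing in for the datastore; the "<name>:<n>" key
    scheme is a hypothetical choice for this sketch.
    """
    chunks = [blob[i:i + max_bytes] for i in range(0, len(blob), max_bytes)]
    for n, chunk in enumerate(chunks):
        store["%s:%d" % (name, n)] = chunk
    store[name + ":count"] = len(chunks)  # written last, after all chunks


def load_index_file(store, name):
    """Reassemble the chunks in order."""
    count = store[name + ":count"]
    return b"".join(store["%s:%d" % (name, n)] for n in range(count))
```

These writes are not atomic; a real implementation would need a transaction or a versioned name so readers never see a half-written index.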
Comment 76 by mdkachmar, Jun 10, 2009
Hi Waldemar,

In one of your posts you indirectly suggest that Google might someday be releasing a
full-text search API.

Based on the social evidence to date, though, I wouldn't be so sure about that. 

Do you really believe that Google has intentions of extending GAE with full-text
search and if so, why?




Comment 77 by wkornewald, Jun 10, 2009
Hi! Well, Google developers indicated in presentations that they have it on their 
roadmap, but it's very complicated and thus won't be added too soon (don't expect it to 
be released this year). I don't see why Google wouldn't want to add that feature. 
Without full-text search App Engine is very limited. I'm sure that sooner or later 
we'll get it. :)
Comment 78 by mdkachmar, Jun 10, 2009
Waldemar

I sure hope that's the case but something is telling me that this full-text search
issue might be more about the law of unintended consequences than a good
old-fashioned technical challenge. 

Could full-text search on AppEngine somehow threaten Google's search and advertising
franchise? If so, that would explain a lot. 

Also, I was a little disheartened to hear you suggest full-text search might not be
available until 2010 at the earliest.

Seems very odd to me, especially given the Bay Area operates at warp-speed. 


Comment 80 by lem...@gmail.com, Jun 10, 2009
This isn't about threatening Google's business model.  App Engine itself generates revenue.

Unfortunately, full-text search isn't trivial.  Further, Google generally does not announce features before they are 
available.  When pressed, Googlers tend to indicate how important they think a feature is, and then announce 
that they "have nothing to announce."

Brett Slatkin, a developer on the App Engine team, indicated that they are well aware of how many developers 
want full-text search.   Watch http://tinyurl.com/lug7j5 to see him address the issue.  As he mentioned, there is 
support for rudimentary full-text search via the google.appengine.ext.search module.  See http://tinyurl.com/3ndnge for some documentation.
Comment 81 by app-eng...@scholardocs.com, Jun 24, 2009
Hi,
we'd like to announce the immediate availability of our full-text search package
(based on the same principle as SearchableModel, but much more flexible and
feature-rich).
It's called gae-search:
http://gae-full-text-search.appspot.com/

See it in action by searching our documentation (which is indexed with gae-search).
We also have a few demos.

Note that gae-search requires app-engine-patch (Django).

Features:
* index only specific properties (instead of all string/text properties like in
SearchableModel)
* Porter stemmers (increase search quality)
* sort your results (at least a little bit) via chain-sorting
* make "DISTINCT" queries using a so-called "values index"
* auto-completion via a jQuery plugin
* key-based pagination (fully unit-tested implementation of Ryan Barrett's algorithm)
* easy to use views and templates (add search support in just a few lines)
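Key-based pagination, in its general form, orders results by a unique key, fetches one more item than the page size, and uses that extra item's key as the bookmark where the next page starts. A minimal sketch over a sorted in-memory list; it illustrates the general idea and is not gae-search's implementation.

```python
import bisect


def fetch_page(sorted_keys, page_size, bookmark=None):
    """Return (page, next_bookmark) from keys sorted ascending.

    Fetches page_size + 1 entries starting at bookmark; the extra entry,
    if present, becomes the bookmark for the next page.
    """
    start = 0 if bookmark is None else bisect.bisect_left(sorted_keys, bookmark)
    window = sorted_keys[start:start + page_size + 1]
    page = window[:page_size]
    next_bookmark = window[page_size] if len(window) > page_size else None
    return page, next_bookmark
```

Because each page begins at a key value rather than at an offset, fetching a late page costs roughly the same as fetching the first one.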

Since it took a lot of effort to implement all the features and make them easy to use, we
can't give this away for free, though. We initially implemented it for our own
projects, but after so many people complained about the lack of full-text search, we
thought we could provide it to others for a little compensation.

Bye,
Waldemar Kornewald & Thomas Wanschik (the creators of app-engine-patch)
Comment 82 by tomconn, Jun 25, 2009
See the following article for a simple search 
    http://www.devx.com/Java/Article/42216
Comment 83 by shore.cloud, Jun 26, 2009
Is there such a patch for the Java platform?
Comment 84 by wkornewald, Jun 28, 2009
app-engine-patch is only available for Python. I don't know if there is some comparable 
App Engine project on the Java side. Currently, we have no plans to port gae-search to 
Java (well, unless there is really high demand which would justify all the work 
involved).
Comment 85 by shore.cloud, Jun 29, 2009
I see,
so I need to switch to GAE for Python.
Where can I find a demo of using SearchableModel for full-text search?
Comment 86 by billkatz, Jun 29, 2009
I just released a simple full-text search module for GAE Python under an MIT license. I
believe it is better than SearchableModel.  For more details, see: 

http://bit.ly/11yLv5
Comment 87 by shore.cloud, Jul 02, 2009
That's cool,

but is it possible to search within a specified document?
Comment 88 by shore.cloud, Jul 02, 2009
I think I need to explain it more specifically.

By default, we search for a keyword across many documents.

But now I've stored 1M keywords in the datastore, along with a specific document.

I want to find out which of the 1M keywords match that document.

Is there an efficient solution?
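Instead of testing each of the 1M stored keywords against the document, the lookup can be turned around: generate every word and short phrase in the document and check each one against the keyword set. A minimal sketch, assuming keywords are lowercase phrases of at most max_words words; with a datastore, the set lookup might become a batch get of entities keyed by keyword, which is an assumption about storage layout.

```python
import re


def matching_keywords(keywords, document, max_words=3):
    """Return the stored keywords that occur in the document.

    Every run of 1..max_words consecutive words in the document is looked up
    in a set, so the cost grows with document length, not keyword count.
    """
    keyword_set = {k.lower() for k in keywords}
    words = re.findall(r"\w+", document.lower())
    found = set()
    for i in range(len(words)):
        for n in range(1, max_words + 1):
            if i + n > len(words):
                break
            phrase = " ".join(words[i:i + n])
            if phrase in keyword_set:
                found.add(phrase)
    return found
```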
Comment 89 by bdonlan, Jul 02, 2009
@shore.cloud:

What you describe is not full-text search. You should ask for advice on the google
group; this bug is subscribed to by over a thousand people, and should only be used
for on-topic discussions.
Comment 90 by shore.cloud, Jul 03, 2009
@bdonlan,
sorry for going off-topic.

One more thing I want to know about full-text search is:

Is it possible to restrict the range of documents?

Say, sometimes I want to search 'news', and sometimes 'job'?

It seems that by default all entities are indexed together?
Comment 91 by app-eng...@scholardocs.com, Jul 10, 2009
Hi,
we've released a new gae-search version (full-text search for App Engine + Django).
Now, there's a "Free" version which can be used in non-commercial projects. Get it here:
http://gae-full-text-search.appspot.com/

We've also implemented the relations index technique (index is moved into a separate
child entity), but you can optionally turn it off. It's important to note that you
can integrate your model's properties into the generated index such that you can run
a full-text search and limit the results with additional filter rules (e.g., search
only in published blog posts).
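Limiting a full-text search to a subset of documents, such as only published posts or only the 'news' category asked about earlier, amounts to ANDing the term matches with ordinary equality filters. A minimal in-memory sketch; the make_entity and search helpers and the property names are hypothetical.

```python
import re


def make_entity(text, **properties):
    """An in-memory stand-in for an entity: its search terms plus ordinary properties."""
    entity = dict(properties)
    entity["terms"] = set(re.findall(r"\w+", text.lower()))
    return entity


def search(entities, query, **filters):
    """Entities containing every query word AND matching every equality filter."""
    words = re.findall(r"\w+", query.lower())
    return [
        e for e in entities
        if all(w in e["terms"] for w in words)
        and all(e.get(name) == value for name, value in filters.items())
    ]


POSTS = [
    make_entity("Python release news", category="news", published=True),
    make_entity("Python developer job", category="job", published=True),
    make_entity("Python draft announcement", category="news", published=False),
]
```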

The relations and values indexes are now generated via background tasks, so put()s
are much faster.

Finally, you can combine geomodel with our easy-to-use views in order to do a
full-text proximity search, for example. Just use the new query_converter parameter
to pass the full-text query to geomodel.

Bye,
Waldemar Kornewald & Thomas Wanschik
Comment 92 by swetashah10, Jul 12, 2009
:-)
Comment 93 by shenguochun, Jul 23, 2009
Full-text support for all languages is needed!
Comment 94 by lugieb...@gmail.com, Aug 03, 2009
I think this would be a simpler-to-implement solution that would put the workload on
us developers and would give us a powerful tool to do much more than just full-text
search.

http://code.google.com/p/googleappengine/issues/detail?id=1935
Comment 95 by bernddorn, Sep 22, 2009
A warning to anybody planning to use ListProperty-based full-text implementations: list-based full-text search
relies on the fact that AND joins are performed if one filters on the same property more than once, like:

# AND of two terms on the same list property named 'term'
q = Doc.all().filter('term', 'cat').filter('term', 'dog')

This works at first without the need for an exploding index (which in practice cannot be built on
App Engine if you have long lists of words or many docs). Such an index would look like this:

indexes:
- kind: Doc
  properties:
  - name: term
  - name: term

There must be some internal limit on App Engine that prevents joins with many intermediate results, so this
pattern is not applicable to real-world applications.

Every full-text implementation I have seen so far for App Engine is based on this pattern, so don't use
it.
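The AND over repeated filters on one list property is resolved as a merge join over each term's sorted index entries. Below is a minimal in-memory sketch of that general technique (a leapfrogging merge over sorted key lists); it is an illustration, not App Engine's actual implementation. How far the join has to walk depends on how the lists interleave, so long lists for common words can mean many steps before a match, which is one plausible reason such queries hit limits.

```python
import bisect


def merge_join(posting_lists):
    """Intersect ascending lists of doc keys by leapfrogging to the largest current key."""
    if not posting_lists or any(not p for p in posting_lists):
        return []
    positions = [0] * len(posting_lists)
    result = []
    while True:
        current = [p[i] for p, i in zip(posting_lists, positions)]
        target = max(current)
        if min(current) == target:
            # Every list points at the same key: it is in the intersection.
            result.append(target)
            positions = [i + 1 for i in positions]
        else:
            # Skip each lagging list forward to the first key >= target.
            positions = [bisect.bisect_left(p, target, i)
                         for p, i in zip(posting_lists, positions)]
        if any(i >= len(p) for p, i in zip(posting_lists, positions)):
            return result
```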



Comment 96 by max.ross, Oct 02, 2009
I'm marking this issue as "Started" but I want to set expectations appropriately: 
This is a major undertaking and this will not happen soon, even by the most generous
definition of soon.
Status: Started
Comment 97 by max.oizo.biz, Oct 02, 2009
This is very good news :)
Comment 98 by nearmars, Oct 02, 2009
Wow! I'm very happy to hear that. Please let us know when you need testing help etc.

Comment 99 by jdpatterson, Oct 02, 2009
I am curious whether the approach is to supply an out-of-the-box, take-it-or-leave-it search or to provide the low-level
tools required to implement something like a distributed Lucene index. Will it be based on MapReduce? Good
news in any case!
Comment 100 by cssguru, Oct 03, 2009
Wow, I never saw "Started" before :) Anyway, please give us a rough roadmap. "Will
not happen soon": is it 6 months, 1 year, or 2+ years? There are many developers who
need this feature urgently. I've stopped most of my App Engine activity because I've
had to fight too many limitations. But I'm very excited about the recent changes; I just
miss a more detailed roadmap.
Comment 101 by codegent, Oct 03, 2009
So happy to see the status has changed... you guys rock.
Comment 102 by krpaum, Oct 21, 2009
www.google.com

Comment 103 by babie.tanaka, Nov 24, 2009
N-gram indexing for Japanese, please.
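N-gram indexing sidesteps word segmentation for Japanese and Chinese by indexing overlapping character pairs (bigrams). A minimal sketch; the function names are hypothetical, and only whitespace is stripped before splitting.

```python
def cjk_bigrams(text):
    """Split text into overlapping two-character tokens, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return chars
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def bigram_match(document, query):
    """True if every bigram of the query also occurs among the document's bigrams."""
    q = cjk_bigrams(query)
    if len(q) == 1 and len(q[0]) == 1:
        # A one-character query has no bigram; fall back to a substring test.
        return q[0] in document
    doc_grams = set(cjk_bigrams(document))
    return all(g in doc_grams for g in q)
```

Bigram matching can produce false positives: 東京都 (Tokyo Metropolis) contains the bigram 京都 (Kyoto), so a search for 京都 matches it.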
Comment 104 by sfornengo, Dec 02, 2009
Please, as a first step, raising the limit of 5000 indexes would let us wait
until a real full-text search solution arrives and would unblock many projects.
Comment 105 by pra...@gmail.com, Dec 19, 2009
Happy to see the Started status. At least the wheels are rolling. We will get to our
destination...