| Issue 217: | Full-text search api | |
|
1493 people starred this issue.
|
The API currently offers fairly limited text-search functionality, based on the discussion on the list: http://groups.google.com/group/google-appengine/browse_thread/thread/ba4a4a4ccefb96c5/0e3f0ab63c4c8afd?lnk=gst&q=text+search#0e3f0ab63c4c8afd I believe full-text search is a fairly important feature for a lot of web applications. For God's sake, you're Google, how can you not? :-) |
||||||||||
,
Apr 24, 2008
I need full text search. In the Rails world people tend to use Solr/Lucene. I have not used them, but having seen a lecture it seems quite flexible as an API. That is, it is not raw text search, but allows words in certain contexts, like attached keywords, to have more weight (I think it is called "boosting"). |
|||||||||||
,
May 06, 2008
(No comment was entered for this change.) |
|||||||||||
,
Jun 15, 2008
Is there any word on a potential ETA for ranking? Is this a bug that we can expect fixed eventually? |
|||||||||||
,
Jun 29, 2008
This would be a necessary feature for me to even consider using App Engine. The current text query API is totally inadequate. |
|||||||||||
,
Jun 30, 2008
This is a real blocker for me too. |
|||||||||||
,
Jul 01, 2008
mhanson, perhaps you should specify in what way the current API is inadequate. "Full text search" is quite a broad specification. |
|||||||||||
,
Jul 01, 2008
We need exact and "startswith" matching. Case-insensitive is sufficient. What's important for us is that we can specify which properties get searched (or at least, which properties are searchable, at all). For example, we don't want a user's email to match just because it happens to be a StringProperty. That's one of the disadvantages of the current SearchableModel. Also, it would be great to be able to retrieve only the *distinct* values of a certain property (instead of getting full entities). For example, if our User object had a "full name" StringProperty then we'd like to be able to search for "tho" and get "Thomas X", "Thomas Y" instead of 100 times "Thomas X", "Thomas X", "Thomas X", and then 100 times "Thomas Y", ... (just because there are 100 people with the same name). It's sufficient if this matches only a single property. IOW, apart from case-insensitivity this would be like walking the index of a property directly. |
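To make the request above concrete, here is a minimal sketch in plain Python (not App Engine code) of case-insensitive "startswith" matching that returns only distinct values by walking a sorted index. The function name `distinct_prefix_matches` and the in-memory list standing in for a property index are assumptions for illustration only.

```python
import bisect


def distinct_prefix_matches(values, prefix, limit=10):
    """Return distinct values that start with prefix, ignoring case."""
    # Build a sorted stand-in for a property index: one entry per distinct
    # lowercased value, paired with an original spelling of that value.
    index = sorted({v.lower(): v for v in values}.items())
    keys = [key for key, _ in index]
    needle = prefix.lower()
    # Jump to the first key >= needle, then walk forward while keys match.
    start = bisect.bisect_left(keys, needle)
    results = []
    for key, original in index[start:]:
        if not key.startswith(needle) or len(results) >= limit:
            break
        results.append(original)
    return results


names = ["Thomas X"] * 100 + ["Thomas Y"] * 100 + ["Theo Z", "Anna T"]
print(distinct_prefix_matches(names, "tho"))
```

Because the index holds one entry per distinct value, the 100 duplicates of each name are never visited, which is the behaviour the comment asks for.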
|||||||||||
,
Jul 01, 2008
Also, more advanced text searching (stemming, etc). The nice stuff we're used to on the search engine. Google has the technology...a little birdie told me they're planning to apply it, but I don't know when. |
|||||||||||
,
Jul 05, 2008
I also need it, hope they will deliver it soon! |
|||||||||||
,
Jul 05, 2008
Re #6: Sure, though this thread: http://groups.google.com/group/google-appengine/browse_thread/thread/113020d7cbd69d8d/913569fcb72d6f1d?lnk=gst&q=searchablemodel#913569fcb72d6f1d does a fine job describing the issues. My feedback in this thread is intended largely as a product management nudge. The SearchableModel "short-term" library is a step, of course, but a real full-text search library needs ranking, stemming, pluggable tokenizers, rich support for many human languages, stop words, boolean queries, support for multiple indexes per datastore, etc. It must also have an indexing datastore that scales efficiently. The developers know all this. Presumably this bug plays the role of a tracker so that we can all click that little star and let them know we want it soon, please. Issue-watchers, please check out the thread linked above to get a sense of the latest on this topic. Google bug-scrubbers, feedback on this issue would be welcome, as it is introducing a lot of platform risk to people who are trying to decide whether to commit development energy to GAE. |
|||||||||||
,
Jul 09, 2008
Me too |
|||||||||||
,
Aug 25, 2008
I need it also |
|||||||||||
,
Sep 05, 2008
I want this YESTERDAY! |
|||||||||||
,
Sep 15, 2008
(No comment was entered for this change.)
Labels: -Type-Defect Type-Feature
|
|||||||||||
,
Sep 19, 2008
Show stopper :-( |
|||||||||||
,
Sep 25, 2008
(No comment was entered for this change.)
Status: Acknowledged
|
|||||||||||
,
Sep 26, 2008
Now that this issue has been acknowledged, it is equally important to note that full-text search would in turn need a higher limit on indexed values. Currently, it is impossible to make more than a few hundred words per row/article/post searchable, because searchable indexed keywords count toward the limit. |
|||||||||||
,
Sep 26, 2008
We are all searching in the dark for both our text and when we can expect a resolution to this major weakness in GAE. Road map please. Thanks. |
|||||||||||
,
Oct 01, 2008
I'm joining the thread. |
|||||||||||
,
Oct 06, 2008
(No comment was entered for this change.)
Labels: Component-Datastore
|
|||||||||||
,
Oct 08, 2008
show stopper |
|||||||||||
,
Oct 29, 2008
+1000 |
|||||||||||
,
Nov 20, 2008
Hello, maybe this can help: I changed the code to a custom version, and it works nicely for me, so maybe it will for you too:
    class Cancion(mysearch.SearchableModel):
        nombre_con_espacios_t = db.StringProperty()
The class must have the "nombre_con_espacios_t" attribute, because it is the only attribute taken into account when searching.
It works as follows:
Creation:
    cancion = Cancion(nombre_con_espacios_t="colombia es linda")
    cancion.put()
Modification:
    cancion.put(update_searchable=True)
Note: FULL_TEXT_MIN_LENGTH_IN helps me break up the regex further :P
|
|||||||||||
,
Nov 26, 2008
If someone finds problems with it, it would be great to hear about them. |
|||||||||||
,
Dec 18, 2008
show stopper |
|||||||||||
,
Dec 18, 2008
Definitely a show stopper. I've written a blog post about search.SearchableModel and why it doesn't work; check it out if you are looking for more info: http://zhuocorporation.spaces.live.com/blog/cns!D76A58A7350B0D0B!1824.entry |
|||||||||||
,
Jan 25, 2009
it is a must !!!! |
|||||||||||
,
Jan 26, 2009
It would be really nice to know whether this feature has not only been acknowledged but is also somewhere on the roadmap. It's not on the one published at http://code.google.com/appengine/docs/roadmap.html for the October 2008 - March 2009 period. I would really like to use App Engine seriously, but as for many people, the lack of full-text search could be a show stopper. I can't see any good workaround that would not require a huge amount of work. I really hope full-text search will get a higher priority soon. |
|||||||||||
,
Jan 26, 2009
http://code.google.com/p/googleappengine/issues/detail?id=208&q=google%20api%20google%20app%20engine&colspec=ID%20Type%20Status%20Priority%20Stars%20Owner%20Summary%20Log%20Component Maybe we should star the above issue. Once the Google APIs are integrated directly with App Engine, we can just use Google Base for search. Nice and dandy. |
|||||||||||
,
Feb 22, 2009
What an irony. Google - the mother of all search engines, developed a cloud which doesn't offer a text search in its API. ;-)))) It should have been the first function in the api. Isn't it all about search at Google??? |
|||||||||||
,
Feb 22, 2009
If anyone wants some rudimentary indexing and searching code, shoot me a message. It basically stems and filters stop words, then puts the resulting list into a table that maps each word to the indexed objects. It sounds like it could be slow, but in practice it's been working fine. Currently the search is multi-word AND, but extending it to support phrases wouldn't be a big deal. |
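For readers unfamiliar with the approach described above, a minimal sketch in plain Python follows. It is not the commenter's code: the `TinyIndex` class, the `naive_stem` suffix stripper, and the stop-word list are all assumptions made for illustration, and an in-memory dict stands in for the datastore table.

```python
import re
from collections import defaultdict

# Illustrative subset only; a real list would be much longer.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}


def naive_stem(word):
    """Crude suffix stripping; a real system would use e.g. a Porter stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, split into words, drop stop words, and stem the rest."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(w) for w in words if w not in STOP_WORDS]


class TinyIndex:
    """Maps each stemmed word to the set of document ids containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Multi-word AND: return ids of documents containing every term."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = TinyIndex()
index.add(1, "Searching the datastore")
index.add(2, "Indexed searches and stemming")
index.add(3, "Datastore indexes")
print(sorted(index.search("search datastore")))
```

Phrase support would additionally require storing word positions per document, which this sketch deliberately leaves out.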
|||||||||||
,
Feb 25, 2009
What about Google CSE? It's not something you can play on the server side, but if your site is public and google likes it - quota free full text search :) |
|||||||||||
,
Feb 25, 2009
CSE is great for public sites but I still need unicode text search inside the datastore to get the SQL functionality I am used to with commands like "LIKE 'xxx%yyy'". I hope we will have a solution for this issue soon... GAE is great! |
|||||||||||
,
Feb 27, 2009
It would be great to have full text search----PLEASE?!!! |
|||||||||||
,
Mar 01, 2009
jhjhjhhjhjh |
|||||||||||
,
Mar 04, 2009
Please add a search feature! I am a teacher using wordle in my classroom and it would be soooo helpful! |
|||||||||||
,
Mar 04, 2009
this is a must |
|||||||||||
,
Mar 04, 2009
I believe that this would be a great thing to have. |
|||||||||||
,
Mar 09, 2009
Highly required feature (especially when the service provider is Google)... |
|||||||||||
,
Mar 09, 2009
I'd just like to add my name to the list of people hoping text search comes soon for Wordle users. |
|||||||||||
,
Mar 09, 2009
searching text would be great! |
|||||||||||
,
Mar 10, 2009
how do i search for a wordle |
|||||||||||
,
Mar 10, 2009
It would be really great if you guys would stop writing a comment saying "Me too!", "This issue is crucial!" etc. Can you please just follow normal procedure and star the item, so everybody at Google knows that this is an issue many people care about (by counting stars), and I can stop getting your useless (no offense) comments in my inbox? Thx. |
|||||||||||
,
Mar 10, 2009
Please just help me, I need to search for something on Wordle. Just go to this site, http://www.wordle.net/, and try to search for a wordle. I can't figure out how. Please figure out how to search for a wordle, then tell me. |
|||||||||||
,
Mar 10, 2009
i voted |
|||||||||||
,
Mar 10, 2009
by hitting the star, so please help me search |
|||||||||||
,
Mar 12, 2009
Please address this issue! |
|||||||||||
,
Mar 12, 2009
Search is a must-have feature... |
|||||||||||
,
Mar 14, 2009
Yes, give me search on Wordle...otherwise I can never find my genial word-picture again! |
|||||||||||
,
Mar 15, 2009
As an educator it is vital that I be able to go in and find the word clouds my kids have produced in order to truly validate their work. Please work on this. |
|||||||||||
,
Mar 16, 2009
Give me chinese fulltext search! |
|||||||||||
,
Mar 17, 2009
Really, really need to be able to search .... |
|||||||||||
,
Apr 01, 2009
Give me RLIKE !! regex support !! |
|||||||||||
,
Apr 06, 2009
Can't we make a workaround for it ourselves? |
|||||||||||
,
Apr 08, 2009
In a typical web application it's possible to use Lucene or a similar solution, but App Engine doesn't provide access to the file system, so we have no workaround here. IMO, it's a must-have for the platform to be useful. |
|||||||||||
,
Apr 08, 2009
I vote that this is really a key restricting issue to better adoption of GAE.. Help us please!! |
|||||||||||
,
Apr 09, 2009
Please, fulltext search now ! |
|||||||||||
,
Apr 15, 2009
Another vote :) |
|||||||||||
,
Apr 16, 2009
like syntax please! |
|||||||||||
,
Apr 16, 2009
I have already starred this but would just like to say that this probably is the top feature needed really. Now that Java has been added which adds numerous languages, I think that searching is important because it is the next major performance hold up on GAE and in most applications. Hoping this is in there soon! Keyword search is actually one of the biggest walls on most projects. People might start using GAE just to aggregate searching without a Google appliance. Search is Google's killer feature, it could also be in the "cloud". |
|||||||||||
,
Apr 17, 2009
If you're using GAE/J, you might be interested in Compass, which sits on top of Lucene. It seems Compass works in GAE/J as described here: http://www.kimchy.org/searchable-google-appengine-with-compass/ |
|||||||||||
,
Apr 20, 2009
ghs.google.com has been blocked in some countries, which means that your users/clients in these countries are not able to access your GAE services with your own domain name. See http://code.google.com/p/googleappengine/issues/detail?id=1269 for more details. |
|||||||||||
,
Apr 20, 2009
This is quite interesting actually. http://www.dzone.com/links/searchable_google_appengine_with_compass.html |
|||||||||||
,
Apr 20, 2009
#63, yep, that's what I linked to in #61 :P |
|||||||||||
,
Apr 20, 2009
Sorry Arthur, I missed it then :) |
|||||||||||
,
Apr 20, 2009
With so many comments, I would have missed it too ;) |
|||||||||||
,
Apr 30, 2009
+1, it would be great if a Search API could be provided. I have been using the SearchableModel for quite some time, and it meets my basic needs, but I know a search engine that has raised people's expectations really high as soon as they see a search box :-) |
|||||||||||
,
Apr 30, 2009
When I went through the features of GAE, I neglected to inspect the one feature that couldn't be missing: "eh, it is Google, so why waste time checking that full-text search is there?". Now it is time to place the search box on the page and I'm lost. PS: the priority of this issue is just "medium" and the roadmap doesn't include a solution yet. Beautiful... |
|||||||||||
,
Apr 30, 2009
Please stop adding pointless comments with no new information. Spamming over 1100 people who starred the issue with "me too" or "+1" comments doesn't get it done any faster. I bet Google already knows we want it. Thanks. |
|||||||||||
,
May 02, 2009
Hi everyone, we're the developers behind app-engine-patch (http://code.google.com/p/app-engine-patch/). We'd like to sell our "search" package which provides a more powerful feature set than SearchableModel and should help make the wait for Google's full-text search API less painful. The features are described in this post: http://tinyurl.com/dxen3z If you're potentially interested in buying our search package (for a one-time fee) please take part in this short survey, primarily to help us find a fair price: http://www.surveymonkey.com/s.aspx?sm=CzIohuPfdcTL8z484vcX4Q_3d_3d While there is not yet a demo site we hope that you can at least give an approximate estimate. Thanks a lot! Best regards, Waldemar Kornewald |
|||||||||||
,
May 08, 2009
I have created a full-text search API by porting http://whoosh.ca/ so that it is available on App Engine (it stores the index in the datastore). You can download it from http://github.com/tallstreet/Whoosh-AppEngine/tree/master It includes all of Whoosh's features, including:
* Pythonic API.
* Fielded indexing and search.
* Fast indexing and retrieval.
* Pluggable scoring algorithm (including BM25F), text analysis, storage, posting format, etc.
* Powerful query language parsed by pyparsing.
* Pure Python spell-checker. |
|||||||||||
,
May 09, 2009
Nice! How many entities can be handled efficiently with your whoosh port? Do you have any real-world data on how well it handles concurrent writes? |
|||||||||||
,
May 09, 2009
Similar question to wkornewald's: how will something like this scale? You indicate that the index is stored in the Google Datastore. Does this mean that it should scale as well as the GAE datastore does (~1 - 10 writes per second, etc.)? |
|||||||||||
,
May 09, 2009
The only real-world app using this at present is http://appjects.appspot.com/ (all categories and searches are powered by it). Adding a document requires 4 writes and 4 deletes to the datastore. It's not thread-safe; I recommend you add entities on one thread and index them in a single thread from a cron job. I've put memcache caching on the index, so searching should be very quick. The other limitation will be the size of the index: it uses the same index file format that Whoosh uses but stores the files in the datastore, so the whole index is stored in 4 datastore entities. I believe there is a 10 MB limit on each one? |
|||||||||||
,
May 10, 2009
It's actually a 1MB limit. Is there no possibility to distribute the index across more than 4 datastore entries? Also, I quickly skimmed through the source and I haven't seen a command for splitting the indexing task into many small tasks, so you don't hit the request limits. Is there anything like that? |
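One way to spread an index file across more than a handful of entities, sketched in plain Python under stated assumptions: the chunk size, the key format, and the function names `split_blob` and `join_blob` are hypothetical and are not part of the Whoosh port. Each (key, chunk) pair stands in for one datastore entity that stays under the 1MB limit mentioned above.

```python
# Assumed safe payload size, kept below the ~1 MB entity limit to leave
# room for per-entity overhead.
MAX_CHUNK_BYTES = 1000 * 1000


def split_blob(name, data, chunk_size=MAX_CHUNK_BYTES):
    """Split bytes into (key, chunk) pairs, one per would-be entity."""
    count = max(1, -(-len(data) // chunk_size))  # ceiling division
    return [
        ("%s:%05d" % (name, i), data[i * chunk_size:(i + 1) * chunk_size])
        for i in range(count)
    ]


def join_blob(chunks):
    """Reassemble chunks; the zero-padded keys sort in the original order."""
    return b"".join(chunk for _, chunk in sorted(chunks))


index_file = b"x" * 2500000
chunks = split_blob("postings", index_file)
print(len(chunks), [len(chunk) for _, chunk in chunks])
```

This only addresses the storage limit; writing many chunks per update would still need to be split into small tasks to stay within request limits.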
|||||||||||
,
Jun 10, 2009
Hi Waldemar, In one of your posts you indirectly suggest that Google might someday be releasing a full-text search API. Based on the social evidence to date, though, I wouldn't be so sure about that. Do you really believe that Google has intentions of extending GAE with full-text search and if so, why? |
|||||||||||
,
Jun 10, 2009
Hi! Well, Google developers indicated in presentations that they have it on their roadmap, but it's very complicated and thus won't be added too soon (don't expect it to be released this year). I don't see why Google wouldn't want to add that feature. Without full-text search App Engine is very limited. I'm sure that sooner or later we'll get it. :) |
|||||||||||
,
Jun 10, 2009
Waldemar I sure hope that's the case but something is telling me that this full-text search issue might be more about the law of unintended consequences than a good old-fashioned technical challenge. Could full-text search on AppEngine somehow threaten Google's search and advertising franchise? If so, that would explain a lot. Also, I was a little disheartened to hear you suggest full-text search might not be available until 2010 at the earliest. Seems very odd to me, especially given the Bay Area operates at warp-speed. |
|||||||||||
,
Jun 10, 2009
This isn't about threatening Google's business model. App Engine itself generates revenue. Unfortunately, full-text search isn't trivial. Further, Google generally does not announce features before they are available. When pressed, Googlers tend to indicate how important they think a feature is, and then announce that they "have nothing to announce." Brett Slatkin, a developer on the App Engine team, indicated that they are well aware of how many developers want full-text search. Watch http://tinyurl.com/lug7j5 to see him address the issue. As he mentioned, there is support for rudimentary full-text search via the google.appengine.ext.search module. See http://tinyurl.com/3ndnge for some documentation. |
|||||||||||
,
Jun 24, 2009
Hi, we'd like to announce the immediate availability of our full-text search package (based on the same principle as SearchableModel, but much more flexible and feature-rich). It's called gae-search: http://gae-full-text-search.appspot.com/ See it in action by searching our documentation (which is indexed with gae-search). We also have a few demos. Note that gae-search requires app-engine-patch (Django).
Features:
* index only specific properties (instead of all string/text properties like in SearchableModel)
* Porter stemmers (increase search quality)
* sort your results (at least a little bit) via chain-sorting
* make "DISTINCT" queries using a so-called "values index"
* auto-completion via a jQuery plugin
* key-based pagination (fully unit-tested implementation of Ryan Barrett's algorithm)
* easy to use views and templates (add search support in just a few lines)
Since it took a lot of effort to implement all features and make them easy to use, we can't give this away for free, though. We initially implemented it for our own projects, but after so many people complained about the lack of full-text search we thought we could provide it to others, for a little compensation.
Bye, Waldemar Kornewald & Thomas Wanschik (the creators of app-engine-patch) |
|||||||||||
,
Jun 25, 2009
See the following article for a simple search
http://www.devx.com/Java/Article/42216
|
|||||||||||
,
Jun 26, 2009
Is there such a patch for java platform? |
|||||||||||
,
Jun 28, 2009
app-engine-patch is only available for Python. I don't know if there is some comparable App Engine project on the Java side. Currently, we have no plans to port gae-search to Java (well, unless there is really high demand which would justify all the work involved). |
|||||||||||
,
Jun 29, 2009
IC, so I need to switch to GAE for Python. Where is the demo for using SearchableModel to do full-text search available? |
|||||||||||
,
Jun 29, 2009
I just released a simple full-text search module for GAE Python under an MIT license. I believe it is better than SearchableModel. For more details, see: http://bit.ly/11yLv5 |
|||||||||||
,
Jul 02, 2009
That's cool, but is it possible to search within a specified document? |
|||||||||||
,
Jul 02, 2009
I think I need to describe it more specifically. By default we search for a keyword across many documents. But now I've stored 1M keywords in the datastore, along with a specified document. I want to find out which of the 1M keywords match the specified document. Is there an efficient solution? |
|||||||||||
,
Jul 02, 2009
@shore.cloud: What you describe is not full-text search. You should ask for advice on the google group; this bug is subscribed to by over a thousand people, and should only be used for on-topic discussions. |
|||||||||||
,
Jul 03, 2009
@bdonlan, sorry for going off-topic. One more thing I want to know about full-text search: is it possible to restrict the range of documents? Say, sometimes I want to search within 'news', sometimes within 'job'? It seems that by default all entities are indexed together? |
|||||||||||
,
Jul 10, 2009
Hi, we've released a new gae-search version (full-text search for App Engine + Django). Now, there's a "Free" version which can be used in non-commercial projects. Get it here: http://gae-full-text-search.appspot.com/ We've also implemented the relations index technique (index is moved into a separate child entity), but you can optionally turn it off. It's important to note that you can integrate your model's properties into the generated index such that you can run a full-text search and limit the results with additional filter rules (e.g., search only in published blog posts). The relations and values indexes are now generated via background tasks, so put()s are much faster. Finally, you can combine geomodel with our easy to use views in order to do a full-text proximity search, for example. Just use the new query_converter parameter to pass the full-text query to geomodel. Bye, Waldemar Kornewald & Thomas Wanschik |
|||||||||||
,
Jul 12, 2009
:-) |
|||||||||||
,
Jul 23, 2009
Full-text support for all languages is needed! |
|||||||||||
,
Aug 03, 2009
I think this would be a simpler to implement solution that would put the workload on us, developers, and would give us a powerful tool to do much more than just full-text search. http://code.google.com/p/googleappengine/issues/detail?id=1935 |
|||||||||||
,
Sep 22, 2009
A warning to anybody planning to use ListProperty-based full-text implementations: list-based full-text search relies on the fact that an AND join is performed when you filter on the same attribute more than once, like this:
    q = Doc.all().filter('term', 'cat').filter('term', 'dog')
At first this works without the need for an exploding index (which, in practice, cannot be built on App Engine if you have long lists of words or many docs). The index entries would look like this:
    - kind: Doc
      properties:
      - name: term
      - name: term
There must be some internal limit on App Engine that prevents joins with many intermediate results, so this pattern is not applicable to real-world applications.
Every full-text implementation I have seen so far for App Engine is based on this pattern, so don't use it.
|
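To illustrate why an AND over the same list attribute can become expensive, here is a sketch in plain Python of intersecting per-term posting lists. It is an illustration of the general technique only, not a description of how App Engine evaluates such queries; the names `intersect_sorted` and `and_query` and the sample data are hypothetical.

```python
def intersect_sorted(a, b):
    """Merge-intersect two sorted lists of document keys."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def and_query(postings, terms):
    """Return keys of documents that contain every term."""
    # Start with the shortest list so intermediate results stay small.
    lists = sorted((postings.get(term, []) for term in terms), key=len)
    if not lists:
        return []
    result = lists[0]
    for other in lists[1:]:
        result = intersect_sorted(result, other)
    return result


postings = {"cat": [1, 3, 5, 8], "dog": [2, 3, 8, 9], "bird": [8]}
print(and_query(postings, ["cat", "dog"]))
```

The work done grows with the length of the posting lists being merged, so two very common terms force a long scan even when few documents contain both.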
|||||||||||
,
Oct 02, 2009
I'm marking this issue as "Started" but I want to set expectations appropriately: This is a major undertaking and this will not happen soon, even by the most generous definition of soon.
Status: Started
|
|||||||||||
,
Oct 02, 2009
This is very good news :) |
|||||||||||
,
Oct 02, 2009
Wow! I'm very happy to hear that. Please let us know when you need testing help etc. |
|||||||||||
,
Oct 02, 2009
I am curious whether the approach is to supply an out-of-the-box, take-it-or-leave-it search or to provide the low-level tools required to implement something like a distributed Lucene index. Will it be based on MapReduce? Good news in any case! |
|||||||||||
,
Oct 03, 2009
Wow, I never saw "Started" before :) Anyway, please give us a rough roadmap: does "will not happen soon" mean 6 months, 1 year, or 2+ years? There are many developers who urgently need this feature. I've stopped most of my App Engine activity because I've fought with too many limitations. But I'm very excited about the recent changes; I just miss a more detailed roadmap. |
|||||||||||
,
Oct 03, 2009
So happy to see the status has changed. You guys rock. |
|||||||||||
,
Oct 21, 2009
www.google.com |
|||||||||||
,
Nov 24, 2009
n-gram for japanese please. |
|||||||||||
,
Dec 02, 2009
Please, as a first step, increase the limit of 5000 indexes. That could tide us over until a real full-text search solution arrives, and it would unblock many projects. |
|||||||||||
,
Dec 19, 2009
Happy to see the "Started" status. At least the wheels are rolling. We will get to the destination. |
|||||||||||
|
|
|||||||||||