Full text search in Google App Engine Part 1
This is one of those occasional blog entries that I have split into multiple parts so that I don’t harass you with too much reading at once! Here I introduce my first Django library, something I had always wanted to use myself. One of the reasons for this post is to give others an example to learn from, with an explanation of how I wrote the library. At the same time, I invite your suggestions on what could have been done better!
I have worked with Google App Engine in the past as part of my work at Quikshare, and to me one of its most powerful features was that everything you need to build a web app is available as a module. This lets teams reach the market very fast, with no need to manage scaling, databases, storage, or search on their own.
My experience with the Search API in Google App Engine was deeply disappointing because of the large amount of code needed to make it work well. This is a library I had wanted to build for a long time, but laziness took over. When I finally participated in Hackerearth’s Djangothon last weekend, I knew that this was the time to build it end to end. Here it is and here’s the presentation. Below are some example queries against an example BookList app; trust me, they are really cool!
- tags: Horror NOT tags: Romance
- date < 2005-04-01
- description: novel
- author_name=Jean
- author_publisher_name=Atlas Press
You can have a look at the features on the Github page; I will start with the app implementation. Here’s the App Engine Search API. Briefly, the Search API keeps a set of “documents” on Google’s end. You create them through the API, and searches run against them.
The conventional way
Let’s first see the hard way, which I used in the past:
- Creating, deleting or updating an object needs a corresponding update to a Document stored using App Engine’s Search API. Mind that updating and deleting require you to keep a reference to the Document ID in the database. That means either adding a column to each model to hold the document ID, or creating a separate table to manage the mapping.
- In most cases, we wouldn’t want to update the document during request time, since that would increase the request latency. Deferring it means extra work for us again.
- If we want to do batch updates as recommended in the docs, we need to write a service that does it for us. It needs to read the document references from the setup above and perform the corresponding updates and deletes. And what’s more, we need to do it for every model separately. DRY DRY DRY
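To make the bookkeeping above concrete, here is a minimal sketch of the manual approach for a single model. Everything in it is an assumption for illustration: FakeSearchIndex is an in-memory stand-in rather than the App Engine Search API, and Book, save_book and delete_book are hypothetical names. The point is only that every model ends up carrying a document ID and its own pair of sync functions.

```python
import itertools


class FakeSearchIndex:
    """In-memory stand-in for a search index (NOT the App Engine API)."""

    def __init__(self):
        self.documents = {}
        self._counter = itertools.count(1)

    def put(self, fields, doc_id=None):
        # Reuse the given ID to overwrite, otherwise mint a new one.
        if doc_id is None:
            doc_id = "doc-%d" % next(self._counter)
        self.documents[doc_id] = dict(fields)
        return doc_id

    def delete(self, doc_id):
        self.documents.pop(doc_id, None)


class Book:
    """Plain object standing in for a database row (hypothetical model)."""

    def __init__(self, title, author_name):
        self.title = title
        self.author_name = author_name
        self.search_doc_id = None  # the extra "column" each model needs


def save_book(book, index):
    """Create or update the search document and remember its ID."""
    book.search_doc_id = index.put(
        {"title": book.title, "author_name": book.author_name},
        doc_id=book.search_doc_id,
    )


def delete_book(book, index):
    """Remove the search document that belongs to this book, if any."""
    if book.search_doc_id is not None:
        index.delete(book.search_doc_id)
        book.search_doc_id = None


index = FakeSearchIndex()
book = Book("Some Novel", "Jean")
save_book(book, index)  # creates a document
book.title = "Some Novel (2nd ed.)"
save_book(book, index)  # overwrites the same document
```

Multiply save_book and delete_book by the number of models in a project and the repetition becomes obvious.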
Google App Engine search Django-fied
Inspired by the actstream library, I took up getting App Engine search into the Django world using a generic library. Here’s what you need to do to achieve all the above steps. Adding a model to the search system:
from searchApp.index import siteIndex
from .models import ModelName

siteIndex.register(
    ModelName,
    ['field', 'foreignKey.field', 'foreignKey.foreignKey.field',
     'manyToManyField'],
    'rankGetter',
    html_fields=['field'],
    deleteSignal=customDeleteSignal,
    updateSignal=customUpdateSignal,
)
Yup, that’s it! It will create the necessary model for you. The registered fields automatically trigger an update of the search documents whenever the corresponding data changes in the database. To be honest, you need to add a few more lines of code to your setup, but they are more or less configuration. Check them out in the Github ReadMe, which also explains each of the parameters.
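The dotted entries in the field list, such as 'foreignKey.foreignKey.field', name a value that sits a few relations away from the registered model. One plausible way to resolve such a path is to walk it one attribute at a time. The sketch below is an assumption about the general technique, not the library's actual code, and it uses plain Python classes (Publisher, Author, Book, all hypothetical) instead of Django models.

```python
from functools import reduce


class Publisher:
    def __init__(self, name):
        self.name = name


class Author:
    def __init__(self, name, publisher):
        self.name = name
        self.publisher = publisher


class Book:
    def __init__(self, title, author):
        self.title = title
        self.author = author


def resolve_path(obj, dotted_path):
    """Follow each attribute in a path like 'a.b.c', starting from obj."""
    return reduce(getattr, dotted_path.split("."), obj)


book = Book("Some Novel", Author("Jean", Publisher("Atlas Press")))
publisher_name = resolve_path(book, "author.publisher.name")
```

With real models, each hop through a foreign key can cost a database query, so a production version would probably want to fetch related rows up front.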
Under the hood:
Register pattern for Django
- This can be seen in 2 of the libraries inspiring the project: admin and actstream.
- It creates a single object that is shared across the application, similar to the singleton pattern from Java, just less explicit about being so. The object in our case is siteIndex.
- We use the .register function to add models and their config, which then live in the registry. Our object collects all of this data at application startup, and we can cache whatever is of use to us.
- In this case, we cache each field along with its type so that it can be mapped to the right search field type. If you have a look at the Search documentation, it is very powerful when used appropriately. For example, dates should be saved as DateFields and numbers as NumberFields to fully use the search capabilities.
If that was too much text for you, here’s the SearchIndex class, just a quick glance should make things clear.
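For readers who prefer a self-contained illustration of the register pattern, here is a stripped-down sketch. It is not the library's SearchIndex class: the field_types attribute on the model and the search_fields method are hypothetical, and the Search API field classes appear only as name strings so that the snippet runs without the App Engine SDK.

```python
import datetime


class SearchIndex:
    """Keeps one registry of model classes and their search configuration."""

    # Python value type -> name of the Search API field class to use.
    FIELD_TYPES = {
        str: "TextField",
        int: "NumberField",
        float: "NumberField",
        datetime.date: "DateField",
        datetime.datetime: "DateField",
    }

    def __init__(self):
        self._registry = {}

    def register(self, model, fields):
        """Record the searchable fields of a model and cache their types."""
        if model in self._registry:
            raise ValueError("%s is already registered" % model.__name__)
        declared = model.field_types  # hypothetical: field name -> Python type
        self._registry[model] = {
            name: self.FIELD_TYPES[declared[name]] for name in fields
        }

    def search_fields(self, model):
        """Return the cached mapping of field name -> search field type."""
        return self._registry[model]


# One module-level instance that the whole application imports and shares.
siteIndex = SearchIndex()


class Book:
    """Hypothetical model that declares the Python type of each field."""

    field_types = {"title": str, "pages": int, "published": datetime.date}


siteIndex.register(Book, ["title", "published"])
```

Because Python modules are imported once, every part of the app that imports siteIndex sees the same registry, which is what makes it behave like a singleton without any extra machinery.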
Generic Foreign Key setup
A very powerful feature in Django is the generic foreign key relationship, which lets a single model link to objects of different model types. Let’s see an example. Assume you have 2 models, Model1 and Model2:
from django.db import models

class Model1(models.Model):
    number = models.IntegerField()

class Model2(models.Model):
    number2 = models.IntegerField()
We use a model with a generic foreign key that can have a reference to both Model1 and Model2:
from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType

class ModelName(models.Model):
    # on_delete is mandatory from Django 2.0 onwards
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    object_id = models.IntegerField()
    content_object = GenericForeignKey('content_type', 'object_id')
Here content_object is a foreign key formed from the combination of the content type and the object ID. There’s a small requirement while using generic foreign keys though: the primary key of each model that you use with the generic foreign key needs to be of the same type as object_id in the above model (an integer here). A workaround would be to make object_id generic too, for example by storing it as text, but I have never needed that. Maybe you can write in the comments if that’s a good idea.
We can consume the objects of the above 2 models easily using:
model1 = Model1.objects.create(number=1)
model2 = Model2.objects.create(number2=1)
gfk1 = ModelName(object_id=model1.pk, content_type=ContentType.objects.get_for_model(Model1))
gfk2 = ModelName(object_id=model2.pk, content_type=ContentType.objects.get_for_model(Model2))
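If the (content type, object ID) pair feels abstract, the following plain-Python analogy may help. It is only a sketch of the idea and not how Django implements generic foreign keys: TABLES, create and GenericReference are hypothetical stand-ins for database tables, object creation and the model with the generic foreign key.

```python
# Content type name -> {primary key -> stored row}; stands in for the database.
TABLES = {"model1": {}, "model2": {}}


def create(content_type, **values):
    """Insert a row into the named table and return it with its new pk."""
    table = TABLES[content_type]
    pk = len(table) + 1
    table[pk] = dict(values, pk=pk)
    return table[pk]


class GenericReference:
    """Points at a row in any table using a (content_type, object_id) pair."""

    def __init__(self, content_type, object_id):
        self.content_type = content_type
        self.object_id = object_id

    @property
    def content_object(self):
        # The content type picks the table, the object ID picks the row.
        return TABLES[self.content_type][self.object_id]


model1 = create("model1", number=1)
model2 = create("model2", number2=1)

gfk1 = GenericReference("model1", model1["pk"])
gfk2 = GenericReference("model2", model2["pk"])
```

Both rows end up with the same primary key here, so the object ID alone would be ambiguous. The content type is what tells the two references apart.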
That’s not all there is to generic foreign keys. They bring a lot of other powerful features to Django, which you can go through here.
Stay tuned for the next post, where I will talk about the other components used: Django Signals, foreign key traversal, App Engine Cron, and the various errors I debugged during the 24-hour sleepless affair!