S3 thoughts
The problem is, how is this supposed to work? I will outline a scenario:
- Client browser requests url
- url->view->template, when rendered the thumbnails are checked for timestamp.
So we need a timestamp, which means connecting to S3 from our Django server to ask for the timestamp of a file. This results in the following communication: Client browser -> Django server -> S3... ...->Client Browser. To me it does not look efficient.
- All in all, isn't this going to end up unacceptably slow?
- We might need to do some periodical timestamp checking of files instead of on the fly?
What do you think?
Yes, I'm not sure there's a good way of modifying the template tag to use S3. Perhaps someone more clever than I will be able to solve that problem.
Where sorl + S3 would be extremely useful to me is in the actual DjangoThumbnail class itself, using the Python API. For example, it would be really cool if you could pass DjangoThumbnail an optional storage object.
I think I'm using sorl a bit differently than most. Instead of relying on template tags I added properties to my Photo model, like so:
    @property
    def medium(self):
        return DjangoThumbnail(self.image.name, (1050, 700))

With this method I have a known set of sizes, which I create at upload.
Sure, it wouldn't support the dynamic generation in templates feature -- but being able to have it generate a set of known sizes (like flickr) via the Python API and have them stored on S3 would be incredibly useful.
I created django-thumbs and it supports storage backends (tested with the S3Storage backend from django-storages) but it's much simpler than sorl-thumbnail.
I'm not sure what the current status here is (there's an SVN branch, but no work done yet apparently), but since I need this now (not so much for S3, but for #58) I had a look.
One problem seems to be that the current Django Storage implementation does not support anything like retrieving a timestamp. Since this is supposed to work generically with all storage engines, implementing it in a custom S3 backend is not enough; Django's FileSystemStorage would need to support it as well, right? This looks like a roadblock to me right now.
Any work I'll be doing can be found here: http://github.com/miracle2k/sorl-thumbnail/tree/my-storage-refactor
Even if there is a timestamp for files, checking it will need an additional request to S3. Maybe a better way is to cache which thumbnails are available/unavailable (generated/not generated) in the DB (as a "persistent cache").
Anyone heading in that direction?
So I've got thumbnail and source saving to S3 working reasonably well. Code's here:
http://www.djangosnippets.org/snippets/1562/
General notes:
The functionality works like so:
Getting stuff to S3:
On a page load:
Advantages:
Problems:
Eek. Use django-storages for the backend rather than rolling your own. I've got django-imagekit working with django-storages and all I had to do was follow:
http://code.welldev.org/django-storages/wiki/S3Storage and then http://bitbucket.org/jdriscoll/django-imagekit/wiki/Home
With virtually no modification (except dropping cache_dir = 'photos' from the imagekit example). It looks like SORL may even work better than imagekit but there's no point in reinventing the s3 wheel when django-storages exists already.
Caching is where everybody should be focused IMHO.
@arockinit: Take a look at the code again. It quite explicitly uses django-storages for the back end; nothing custom has been rolled. And it provides caching by using the filesystem, instead of inventing a more complex DB layer.
I'm not saying it's perfect, but it's worked very well in practice, has been responsive, and didn't require reinventing anything. This broader approach (use django-storage to provide the S3 layer, use the filesystem (even if simply for zero-byte files) for caching) seems elegant to me, and might merit further consideration for the project.
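To make the zero-byte-file idea concrete, here is a minimal sketch of what such a marker cache could look like. It is not the code from the snippet linked above; `MarkerCache`, `ensure_thumbnail` and the `.exists` suffix are names made up for illustration, and `generate_and_upload` stands in for whatever actually builds the thumbnail and pushes it to S3.

```python
import os


class MarkerCache:
    """Remembers which remote thumbnails exist by keeping empty local files."""

    def __init__(self, root):
        self.root = root

    def _marker_path(self, name):
        # Hypothetical naming scheme: mirror the thumbnail path under root.
        return os.path.join(self.root, name.lstrip("/") + ".exists")

    def is_generated(self, name):
        # A local stat call instead of a round trip to S3.
        return os.path.exists(self._marker_path(name))

    def mark_generated(self, name):
        path = self._marker_path(name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        # Create a zero-byte file; only its existence matters.
        with open(path, "w"):
            pass


def ensure_thumbnail(name, markers, generate_and_upload):
    """Generate the thumbnail only if no marker exists. Returns True if work was done."""
    if markers.is_generated(name):
        return False
    generate_and_upload(name)
    markers.mark_generated(name)
    return True
```

The trade-off is that the markers are only as reliable as the local disk: if a thumbnail is deleted on S3 behind the server's back, or the marker directory is wiped, the two sides drift apart until something resyncs them.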
Ignoring even the S3 issue, it would be nice for Sorl to use as much of the Django storage api as possible.
Then if someone wanted to use S3 they would only have to attack the timestamp issue.
I want to give this a go today. Skoczen's basic logic is fine but requires custom save() methods on every model with files since it doesn't use DEFAULT_FILE_STORAGE.
Looking at the code, it's not a case of whether it'd be nice or not for sorl to use the Django storage API; it HAS to use it if we want stuff like django-storages to be usable. (This is 99% of the work right here.)
So I'm looking around for all the file system activity and it all seems to be in base.py, right? How the files are fed to PIL needs to be changed too.
Once I've got it all using the Django storage API I think for the timestamps I will then add an option to store them in the database. It will be off by default, which will be slow as hell with things like S3 and CloudFiles, but with it on it should be fine.
I find it easier to break this down into 2 steps: making it work, and then making it fast.
As for making it work, I think it would be best to petition or fork django-storages and add a last_modified function to the storages. The S3 case is absolutely trivial since the last-modified time is passed in the HTTP headers. So sorl would just have to check if there is a last_modified method and, if not, assume the file is on a normal file system and figure out the modified time just as it does now.
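A rough sketch of that check, using only the standard library. Everything here is an assumption for illustration: `last_modified` is the proposed method name, not an existing django-storages API, and `LocalStorage` / `FakeS3Storage` are stand-ins rather than real backends. The fake S3 class just parses a stored `Last-Modified` header string the way a real backend might after a HEAD request.

```python
import os
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def modified_time_for(storage, name):
    """Return the modified time of `name` as a timezone-aware UTC datetime."""
    # Proposed hook (hypothetical): backends that know their own timestamps
    # expose a last_modified(name) method.
    last_modified = getattr(storage, "last_modified", None)
    if callable(last_modified):
        return last_modified(name)
    # No hook: assume a normal file system and ask the OS.
    mtime = os.path.getmtime(storage.path(name))
    return datetime.fromtimestamp(mtime, tz=timezone.utc)


class LocalStorage:
    """Stand-in for a file system backend that can map names to paths."""

    def __init__(self, root):
        self.root = root

    def path(self, name):
        return os.path.join(self.root, name)


class FakeS3Storage:
    """Stand-in for a remote backend; holds Last-Modified header values."""

    def __init__(self, headers):
        self._headers = headers  # name -> HTTP date string

    def last_modified(self, name):
        # HTTP dates look like "Mon, 01 Jun 2009 12:00:00 GMT".
        return parsedate_to_datetime(self._headers[name])
```

Returning aware datetimes from both branches keeps the comparison between source and thumbnail timestamps independent of the server's local time zone.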
As for making it fast, why not just use Django's cache decorator and vary the cache key based on the requested file path? If you apply the cache decorator to the functions last_modified and exists, sorl should run pretty fast. Then users can control whether the file meta-data is stored in a file, the db, or memcached themselves when they set up caching. It is probably not a good idea to make that choice for them.
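Something along these lines, as a sketch only. `DictCache` is an in-memory stand-in that copies just the `get(key, default)` / `set(key, value, timeout)` shape of Django's cache API so the example runs without Django; `cached_by_path` is a made-up helper, not an existing Django or sorl decorator.

```python
import hashlib


class DictCache:
    """In-memory stand-in for a cache backend with get/set."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value, timeout=None):
        # timeout is accepted for API shape only; this stand-in never expires.
        self._data[key] = value


_MISSING = object()  # distinguishes "not cached" from cached None/False


def cached_by_path(cache, prefix, timeout=None):
    """Cache func(storage, name) under a key derived from the file path."""
    def decorator(func):
        def wrapper(storage, name):
            # Hash the path so the key stays short and free of odd characters.
            digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
            key = "%s:%s" % (prefix, digest)
            value = cache.get(key, _MISSING)
            if value is _MISSING:
                value = func(storage, name)
                cache.set(key, value, timeout)
            return value
        return wrapper
    return decorator
```

Usage would look like `cached_exists = cached_by_path(cache, "exists")(lambda storage, name: storage.exists(name))`. Note that a cached `False` has to be overwritten or dropped once the thumbnail is actually generated, otherwise it would be regenerated on every request until the entry expires.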
1) I agree, but should it be getmtime() instead of last_modified() for consistency?
2) I don't think a decorator should be used. If someone uses a decorator and then also uses memcache for their cache and then reboots, every thumbnail would be rechecked. I think it might be better to just let the app accessing the data deal with the caching itself and then it could use a mixture of DB and cache (memcache) for speed and persistence.
1) The name isn't so important; it should just be consistent and descriptive.
2) I am pretty sure if you do a little plumbing you can use multiple cache backends. Perhaps it is better for sorl to default to the file cache backend, and a user could change it to db or memcached through a SORL_CACHE_BACKEND setting. I think as we add flexibility in one area, file backends, that we should take it away in another, caching.
I decided to kick things off here: http://code.google.com/r/jasonchrista-backends/
The first stage is to get sorl-thumbnail working solely with the FileStorage. Currently this is mostly working. There are still some places where the storage should be used and a lot of tests need updating.
The second stage is to get sorl-thumbnail working with alternate storages. The biggest issue I am currently having is what file PIL wants to read from and write to.
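One possible way around the "what file does PIL read and write" question is to never hand PIL a path at all and pass file-like objects instead, since `Image.open()` accepts a file object and `Image.save()` can write to one when given an explicit format. The sketch below shows only the plumbing and does not import PIL: `MemoryStorage` is a stand-in for a remote backend with just `open()` and `save()`, and `process` is a hypothetical callable where the actual resizing would go.

```python
import io


class MemoryStorage:
    """Stand-in for a remote backend: open() and save() only, no local paths."""

    def __init__(self):
        self.files = {}

    def open(self, name):
        return io.BytesIO(self.files[name])

    def save(self, name, content):
        self.files[name] = content.read()
        return name


def make_thumbnail(storage, source_name, thumb_name, process):
    """Read the source through the storage, process it, save the result back."""
    with storage.open(source_name) as source:
        output = io.BytesIO()
        # With PIL, process(source, output) might do roughly:
        #   image = Image.open(source)
        #   image.thumbnail(size)
        #   image.save(output, format="JPEG")
        process(source, output)
    output.seek(0)  # rewind so save() reads from the start
    return storage.save(thumb_name, output)
```

Buffering the whole image in memory is fine for typical photo sizes but is a design choice; very large sources might call for a temporary file instead.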
The third stage is to speed up alternative storages.