FUSE-based file system backed by Amazon S3
For a commercially supported version with additional features see http://www.subcloud.com
What's New
- r177 fixed stale curl handle issue; fixed 100% cpu issue
- r166 case insensitive mime type lookup
- r152 (May 8, 2008)
- uses x-amz-copy-source; "correct" content-type lookup via /etc/mime.types; symlinks!
Overview
s3fs is a FUSE filesystem that allows you to mount an Amazon S3 bucket as a local filesystem. It stores files natively and transparently in S3 (i.e., you can use other programs to access the same files). Maximum file size = 5 GB. s3fs is stable and is being used in a number of production environments, e.g., rsync backup to S3.
To use it:
- get an Amazon S3 account!
- download the source, compile it (I've used fc5/ppc, f7/i386, f9/x86, f9/x64 and Mac OS X 10.4) and slap the binary in, say, /usr/bin/s3fs
- you'll need at least fuse-2.6
- for fedora probably need to do: yum install fuse-devel
- for ubuntu probably need to do something like: sudo apt-get install libfuse-dev (I think?!?)
- do this:
/usr/bin/s3fs mybucket -o accessKeyId=aaa -o secretAccessKey=bbb /mnt
That's it! The contents of your Amazon bucket "mybucket" should now be accessible read/write in /mnt.
If you don't like specifying your secretAccessKey on the command line then you can create a file "/etc/passwd-s3fs" with a line containing an accessKeyId:secretAccessKey pair. Then the command line becomes simply:
/usr/bin/s3fs mybucket /mnt
You can have more than one set of credentials (i.e., credentials for more than one Amazon S3 account) in /etc/passwd-s3fs, in which case you'll have to specify -o accessKeyId=aaa on the command line.
s3fs supports mode (e.g., chmod), mtime (e.g., touch) and uid/gid (chown). s3fs stores the values in x-amz-meta custom meta headers. It used to do "brute-force" re-uploads of s3 objects if/when mode and/or mtime changed; it now uses x-amz-copy-source to change them efficiently.
s3fs has a caching mechanism: you can enable local file caching to minimize downloads, e.g.:
/usr/bin/s3fs mybucket /mnt -ouse_cache=/tmp
Hosting a cvsroot on s3 works! Although you probably don't really want to do it in practice. E.g., cvs -d /s3/cvsroot init. Incredibly, mysqld also works, although I doubt you really wanna do that in practice! =)
s3fs works with rsync! (as of svn 43) Before svn 43, using rsync with an s3 volume as the destination didn't quite work because of timestamp issues: s3fs did not support changing timestamps on files, so files were copied but just got current timestamps (rsync complained about not being able to set timestamps but continued). Due to the way FUSE works and s3fs' "brute-force" support of mode (chmod) and mtime (touch), upon first sync, files are downloaded/uploaded more than once (because rsync does (a) chmod (b) touch and (c) rename); however, subsequent rsyncs are pretty much as fast as can be.
If that's too much downloading/uploading for ya then try using the "use_cache" option to enable the local file cache... it will definitely minimize the number of downloads.
As of r152 s3fs uses x-amz-copy-source for efficient update of mode, mtime and uid/gid.
s3fs will retry s3 transactions on certain error conditions. The default retry count is 2, i.e., s3fs will make 2 retries per s3 transaction (for a total of 3 attempts: 1st attempt + 2 retries) before giving up. You can set the retry count by using the "retries" option, e.g., "-oretries=2".
Options
- default_acl (default="private")
- the default canned acl to apply to all written s3 objects, e.g., "public-read"
- any created files will have this canned acl
- any updated files will also have this canned acl applied!
- prefix (default="") (coming soon!)
- a prefix to prepend to the names of all s3 objects
- retries (default="2")
- number of times to retry a failed s3 transaction
- use_cache (default="" which means disabled)
- local folder to use for local file cache
- connect_timeout (default="2" seconds)
- time to wait for connection before giving up
- readwrite_timeout (default="10" seconds)
- time to wait between read/write activity before giving up
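The "retries" option described above (1st attempt + N retries) can be sketched roughly as follows. This is only an illustration in Python, not the actual s3fs code (which is C++ on top of libcurl); `TransientS3Error` and `with_retries` are made-up names, and which errors count as retryable is an assumption here (the release notes only mention retrying on a 500 server error).

```python
import time


class TransientS3Error(Exception):
    """Hypothetical marker for an error worth retrying (e.g. an HTTP 500)."""


def with_retries(transaction, retries=2, delay=0.0):
    """Run transaction(); on a transient error, retry up to `retries` more times.

    With the default retries=2 this makes at most 3 attempts in total:
    the 1st attempt plus 2 retries, matching the description of -oretries=2.
    """
    attempts = retries + 1
    for attempt in range(1, attempts + 1):
        try:
            return transaction()
        except TransientS3Error:
            if attempt == attempts:
                raise  # out of retries: give up and report the error
            time.sleep(delay)  # optional pause before the next attempt
```

A transaction that fails twice and then succeeds would go through with the default setting; one that fails three times in a row would surface the error to the caller.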
Details
If enabled via the "use_cache" option, s3fs automatically maintains a local cache of files in the folder specified by use_cache. Whenever s3fs needs to read or write a file on s3 it first downloads the entire file locally to the folder specified by use_cache and operates on it. When fuse release() is called, s3fs will re-upload the file to s3 if it has been changed. s3fs uses md5 checksums to minimize downloads from s3: local file caching works by calculating and comparing md5 checksums (ETag HTTP header).
The folder specified by use_cache is just a local cache. It can be deleted at any time. s3fs re-builds it on demand.
In svn 43 the local file cache was disabled ("I might not bring it back"). I originally added the local file cache thinking it would help for rsync (and createrepo). It ends up rsync works reasonably well without it. For createrepo, just rsync back and forth! The cache came back in r66, disabled by default.
s3fs supports chmod (mode) and touch (mtime) by virtue of "x-amz-meta-mode" and "x-amz-meta-mtime" custom meta headers. Originally these were supported in a brute-force manner: changing any x-amz-meta headers required re-uploading the s3 object, so when changing mode or mtime s3fs would download the s3 object, change the meta header(s) and re-upload the s3 object. Ditto for file rename. As of r149 s3fs uses x-amz-copy-source, which means that s3fs no longer needs to operate in a brute-force manner; much faster now (one minor performance-related corner case left to solve... /usr/bin/touch).
Originally all s3 objects written by s3fs had a Content-Type of either "application/octet-stream" or "application/x-directory". As of r152, s3fs leverages /etc/mime.types to "guess" the "correct" content-type based on file name extension. This means that you can copy a website to s3 and serve it up directly from s3 with correct content-types!
Release Notes
- r166
- case-insensitive lookup of content-type from /etc/mime.types
- r152
- added support for symlinks... ln -s works!
- r151
- r150
- added support for uid/gid... chown works!
- r149
- support x-amz-copy-source... rsync much faster now!
- r145
- log svn version at startup via syslog /var/log/messages
- r141
- added "url" runtime parameter
- r136, r138
- connect_timeout and readwrite_timeout
- r130
- set uid/gid to whatever getuid()/getgid() returns
- log some stuff to syslog (i.e., /var/log/messages)
- fixed issue 14 (local file cache bug; fixed cp, rsync, etc...)
- r117
- limit max-keys=20 (workaround for libcurl's 100% cpu issue?!?)
- r116
- r114
- r107
- r106
- r105
- only send x-amz-acl and x-amz-meta headers
- r101, r102, r103
- fixed curl_multi_timeout bug (found on mac)
- r99
- added "default_acl" option
- r92
- parallel-ized readdir(): getting a directory listing is now a lot faster
- r88
- removed 10s read timeout that should not have been introduced
- r72 2008-02-18
- use_cache now takes path to local file cache folder, e.g., /usr/bin/s3fs mybucket /s3 -ouse_cache=/tmp
- r66 2008-02-18
- local file cache is back! however, it is disabled by default... use "use_cache" option, e.g., /usr/bin/s3fs mybucket /s3 -ouse_cache=1
- r57 2008-02-18
- a few bug fixes:
- touch x-amz-meta-mtime in flush()
- use INFILE_LARGE (libcurl) (found on fc5/ppc)
- tidyup
- r43 2008-02-17
- mode (i.e., chmod), mtime and deep rename! rsync now works!
- temporarily disabled local file cache (might not bring it back!)
- r28 2007-12-15
- retry on 500 server error
- r27 2007-12-15
- file-based (instead of memory-based)
- this means that s3fs will no longer allocate large memory buffers when writing files to s3
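The content-type lookup mentioned under r152 and r166 (guess from the file name extension via /etc/mime.types, case-insensitively) might look something like the sketch below. It is written in Python purely for illustration; the function names are made up, and it assumes the usual mime.types layout of one content type per line followed by its extensions, with "#" starting a comment.

```python
import os


def load_mime_types(lines):
    """Build an {extension: content_type} table from mime.types-style lines."""
    table = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        content_type, extensions = fields[0], fields[1:]
        for ext in extensions:
            table[ext.lower()] = content_type  # store lower-cased keys
    return table


def guess_content_type(path, table, default="application/octet-stream"):
    """Look up the extension of `path` case-insensitively; fall back to default."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    return table.get(ext, default)
```

The table could be built once at startup with something like `load_mime_types(open("/etc/mime.types"))`; files with no extension, or an unknown one, fall back to "application/octet-stream".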
FAQ
- What do I need to know?
- /usr/bin/s3fs
- /var/log/messages
- an entry in /etc/fstab (optional)
- the file /etc/passwd-s3fs (optional)
- the folder specified by use_cache (optional) a local file cache automatically maintained by s3fs, enabled with "use_cache" option, e.g., -ouse_cache=/tmp
- the file /etc/mime.types
- map of file extensions to Content-types
- on Fedora /etc/mime.types comes from mailcap, so, you can either (a) create this file yourself or (b) do a yum install mailcap
- stores files natively and transparently in amazon s3; you can access files with other tools, e.g., jets3t
- Why do I get "Input/output error"?
- Does the bucket exist?
- Are your credentials correct?
- Is your local clock within 15 minutes of Amazon's? (RequestTimeTooSkewed)
- How do I troubleshoot it?
- tail -f /var/log/messages
- Use the fuse -f switch, e.g., /usr/bin/s3fs -f my_bucket /mnt
- Why do I see "Operation cannot be completed because you do not have sufficient privileges"?
- you'll see this when a program you're using (e.g., tar, rsync) is trying to explicitly set the modification time of a file. Before svn 43, s3fs did not support this: contents of the file were ok, it's just that the timestamp might not have been what you were expecting. Fixed in svn 43!
- It's still not working!
- Try updating your version of libcurl: I've used 7.16 and 7.17
- Q: when I mount a bucket only the current user can see it; other users cannot; how do I allow other users to see it? A: use 'allow_other'
- /usr/bin/s3fs -o allow_other mybucket /mnt
- or from /etc/fstab: s3fs#mybucket /mnt fuse allow_other,accessKeyId=aaa,secretAccessKey=bbb 0 0
- Q: How does the local file cache work?
- A: It is unbounded! if you want you can use a cron job (e.g., script in /etc/cron.daily) to periodically purge "~/.s3fs"... due to the reference nature of posix file systems a periodic purge will not interfere with the normal operation of s3fs local file cache...!
- Q: How do I change the location of the "~/.s3fs" folder?
- A: the -ouse_cache option is the path used for the local file cache, e.g., /usr/bin/s3fs mybucket /s3 -ouse_cache=/tmp (before svn 72 you couldn't change it and had to use a softlink)
- Q: s3fs uses x-amz-meta custom meta headers... will s3fs clobber any existing x-amz-meta custom header headers?
- Q: I renamed a folder and now all of the files in that folder are gone! What the?!?
- A: Rename it back and your files will be back. s3fs does not support deep directory rename and doesn't check for it either.
Limitations
- no permissions checking
- no chmod support: all files are 0775 (fixed in svn 43!)
- no symlink support (added in r152)
- rename is "supported" by virtue of returning EXDEV from rename() (fixed in svn 43! svn 43 supports deep renaming of files)
- when writing files: requires as much memory as the size of the largest file you're writing (this can be easily fixed) (fixed in svn 27: you should now be able to copy, say, a 2GB file to s3 without having s3fs malloc 2GB of memory!)
- deep rename directories?!?
ToDo
- support brute-force rename (fixed in svn 43)
- get symlinks working? (added in r152)
- this would bog down performance: would have to do deep getattr() for every single object (already doing this in svn 43... it's not too bad!)
- make install target
- get "-h" help working
- handle utime so that rsync works! (fixed in svn 43!)
- probably a bad idea after all...
- actually don't think it can be done: can't specify arbitrary create-time for PUT
- chmod support... acl
- permissions: using -o allow_other, even though files are owned by root 0755, another user can make changes
- use default_permissions option?!?
- better error logging for troubleshooting, e.g., syslog...
- need to parse response on, say, 403 and 404 errors, etc... and log 'em!
- use temporary file for flush() and then stream it to amazon
See Also
Here is a list of other Amazon S3 filesystems:
Has anybody troubleshot this to get it compiling on Mac OS X?
I have MacFuse 0.4 and get the following errors:
Package fuse was not found in the pkg-config search path.
Perhaps you should add the directory containing `fuse.pc'
to the PKG_CONFIG_PATH environment variable
No package 'fuse' found
g++ -Wall -lcurl -I/opt/local/include/libxml2 -I/opt/local/include -L/opt/local/lib -lxml2 -lz -lpthread -L/opt/local/lib -liconv -lm -ggdb s3fs.cpp -o s3fs
In file included from /usr/local/include/fuse/fuse.h:23,
/usr/local/include/fuse/fuse_common.h:30:2: error: #error Please add -D_FILE_OFFSET_BITS=64 to your compile flags!
s3fs.cpp: In function 'int s3fs_getattr(const char*, stat*)':
s3fs.cpp:279: error: expected type-specifier before 'off_t'
s3fs.cpp:279: error: expected `>' before 'off_t'
s3fs.cpp:279: error: expected `(' before 'off_t'
s3fs.cpp:279: error: 'off_t' was not declared in this scope
s3fs.cpp:279: error: expected `)' before ';' token
s3fs.cpp: In function 'int s3fs_read(const char*, char*, size_t, off_t, fuse_file_info*)':
s3fs.cpp:495: warning: format '%u' expects type 'unsigned int', but argument 3 has type 'size_t'
s3fs.cpp:495: warning: format '%u' expects type 'unsigned int', but argument 4 has type 'size_t'
make: *** [all] Error 1
I started a discussion and got a step further, but it's still not working: http://groups.google.com/group/macfuse-devel/browse_thread/thread/4de259075741370a
Found this: http://www.rsaccon.com/2007/10/mount-amazon-s3-on-your-mac.html But not tried it yet
Note to Ubuntu Newbies - Compiling in Ubuntu
If you're getting errors related to missing 'libxml' or 'curl', you need the following C libraries in order to compile: curl, fuse, build-essential, libxml2, and openssl.
To get these, at the prompt type:
Then, in the directory where you downloaded the repository (trunk/s3fs/), type the following in the shell to compile the s3fs.cpp file:
First, thanks for this. Is there any progress or ETA on s3fs not using memory when writing files to S3? It says it could easily be fixed, I just wanted to check. Thanks!
yup, I've definitely made progress on the "not using memory when writing files" scheme... the scheme essentially caches files locally based on their md5 checksum... I'll try to get something checked in by the end-of-the-week...!
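For readers wondering what "caches files locally based on their md5 checksum" could mean in practice, here is a rough sketch of the comparison: the md5 of the locally cached copy is checked against the ETag that S3 reports for the object, and the download is skipped when they match. This is Python for illustration only, not the s3fs implementation; the function names are made up, and it assumes the ETag is the plain hex md5 of the object body (true for objects written with a simple PUT).

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Hex md5 of a file, read in chunks so large files need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_is_fresh(cached_path, etag):
    """True when the cached copy exists and its md5 equals the remote ETag.

    S3 returns the ETag wrapped in double quotes, so those are stripped first.
    """
    if not os.path.isfile(cached_path):
        return False
    return file_md5(cached_path) == etag.strip('"').lower()
```

If `cache_is_fresh(...)` returns False the file would be downloaded again into the cache folder; otherwise the local copy can be used as-is.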
Hi, I compiled and ran the code. 2 problems:
1. ls on the mounted directory causes s3fs to crash:
s3fs# ll /mnt/test-images/
ls: reading directory /mnt/test-images/: Transport endpoint is not connected
total 0
?--------- ? ? ? ? ? itemUpload$folder$
2. What is the format of /etc/password-s3fs? I tried accessKey:secretKeyID .. it did not work
Hi rviswanadha- Another user reported a similar issue... however I have not been able to see this behavior myself; which linux/version are you running?
also, the format for passwd-s3fs is simply one line per set of credentials, separated by a colon (essentially just like you wrote), e.g., accessKeyId:secretAccessKey
rviswanadha- as well, be sure the file is called "passwd-s3fs" (not "password-s3fs")
rviswanadha- ah! there was a mistake in the wiki page... fixed! the file is "/etc/passwd-s3fs"
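To make the /etc/passwd-s3fs format concrete (one accessKeyId:secretAccessKey pair per line, with -o accessKeyId=... picking one when there are several), here is a small illustrative parser. It is a Python sketch, not the s3fs code; the function names are made up, and skipping blank lines is an assumption.

```python
def parse_passwd_s3fs(text):
    """Return {accessKeyId: secretAccessKey} from passwd-s3fs style text."""
    credentials = {}
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        access_key_id, sep, secret = line.partition(":")
        if not sep or not access_key_id or not secret:
            raise ValueError("line %d: expected accessKeyId:secretAccessKey" % number)
        credentials[access_key_id] = secret
    return credentials


def select_credentials(credentials, access_key_id=None):
    """Pick the pair to use; an explicit accessKeyId is needed if there are several."""
    if access_key_id is not None:
        return access_key_id, credentials[access_key_id]
    if len(credentials) == 1:
        return next(iter(credentials.items()))
    raise ValueError("more than one set of credentials: specify -o accessKeyId=...")
```

For example, `select_credentials(parse_passwd_s3fs("aaa:bbb\n"))` yields the pair ("aaa", "bbb").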
Hi RRizun, thanks for responding. I am using the default Redhat LAMP AMI from Amazon. First I installed all the required libraries and make tools
After that I compiled s3fuse code and mounted the drive.
Great Cookbook rviswanadha! RRizun, is there any way of mounting an existing path? say if I have a "prefix" under a bucket, <bucket>:/my/path/here shouldn't "s3fs <bucket>:/my/path/here <mntpt>" work?
Hi jimbosander- currently, no, but that appears to be an easy feature to add... i'll add an issue to track it
Has anyone tried to use s3fs to make EC2 (MySQL and Apache) use files directly from S3? I read somewhere that the connection between EC2 and S3 would create problems in such a solution. According to your experience, what would be reasonable to accomplish with EC2, S3 and s3fs when having a high-load website? Direct use, minute-to-minute backup, hourly backup, ...?
Hi Jorang- I'm assuming your primary interest is recovering any data in case of disaster, e.g., across instance shutdowns
in my opinion for small mysql datasets it is reasonable to add a simple cron job to do a mysql dump to s3 (hourly, daily, whatever is acceptable to you); I've done this in the past myself (daily, because 'important' data changed rarely); using mysql innodb (instead of myisam) should minimize any database contention when the backup script is run
if you have a large dataset and/or your dataset changes often then maybe amazon simpledb might be a better route
I've recently checked in a version of s3fs that adds local file caching, so, for example, you could configure apache to serve files directly from a mounted s3fs volume; with the local file caching, with the exception of the initial warming up of the cache, you might be able to get decent local-file performance (though I have not tried this setup myself yet so I have no real world experience) (you definitely would not want to run mysql in this fashion!)
hope that helps!
Not sure if I'm allowed to distribute this, but here's a binary of the above source compiled on Mac OS X 10.4.10 (expires march 1st, 2008), hosted on s3, of course ;) I followed the instructions in the above linked thread, which are also more cleanly duplicated here: http://www.rsaccon.com/2007/10/mount-amazon-s3-on-your-mac.html . The one thing I also needed to do to make it all work was to add /opt/local/bin to my PATH.
https://miradu.s3.amazonaws.com/s3fs?AWSAccessKeyId=0GWZZ6FN6895K2ETZ602&Expires=1204351195&Signature=LR8atfPW0XBjPMXIlXPLNZtAdIg%3D
Cheers,
-Michael Ducker miradu@miradu.com
Thanks, Michael!
When I run s3fs as root and a dir gets mounted, only root can access the data in the s3 bucket. When I used to run fuse manually, i'd add the -o allow_others option. How can I get apache to read from the mounted dir?
"allow_other" works fine (I noticed you typed "allow_others" in your comment, i.e., pluralized... perhaps a typo?!?) so, either:
/usr/bin/s3fs mybucket /mnt -o allow_other
s3fs#mybucket /mnt fuse allow_other,accessKeyId=aaa,secretAccessKey=bbb 0 0
That's it! Thanks!
Hello,
I installed the dependencies that contact.alexkuo recommended
But I am getting these errors while issuing make on Ubuntu 6.10
many thanks for any help
$ make
g++ -Wall -D_FILE_OFFSET_BITS=64 -I/usr/include/fuse -lfuse -lpthread -lcurl -I/usr/include/libxml2 -L/usr/lib -lxml2 -lssl -ggdb s3fs.cpp -o s3fs
s3fs.cpp: In function ‘int main(int, char**)’:
s3fs.cpp:1072: error: invalid conversion from ‘void* (*)(fuse_conn_info*)’ to ‘void* (*)()’
s3fs.cpp:1075: error: ‘struct fuse_operations’ has no member named ‘utimens’
s3fs.cpp: At global scope:
s3fs.cpp:272: warning: ‘size_t readCallback(void*, size_t, size_t, void*)’ defined but not used
make: *** [all] Error 1
$ svn diff
Index: s3fs.cpp
===================================================================
--- s3fs.cpp (revision 42)
+++ s3fs.cpp (working copy)
@@ -1074,5 +1074,5 @@
 s3fs_oper.access = s3fs_access;
 s3fs_oper.utimens = s3fs_utimens;
- return fuse_main(custom_args.argc, custom_args.argv, &s3fs_oper, NULL);
+ return fuse_main(custom_args.argc, custom_args.argv, &s3fs_oper);
 }
$ svn info
Path: .
URL: http://s3fs.googlecode.com/svn/trunk/s3fs
Repository Root: http://s3fs.googlecode.com/svn
Repository UUID: df820570-a93a-0410-bd06-b72b767a4274
Revision: 42
Node Kind: directory
Schedule: normal
Last Changed Author: rrizun
Last Changed Rev: 40
Last Changed Date: 2008-01-21 22:43:06 -0200 (Mon, 21 Jan 2008)
$ gcc -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --program-suffix=-4.1 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --enable-checking=release i486-linux-gnu
Thread model: posix
gcc version 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)
I've successfully compiled on Ubuntu 7.04 but not on 6.10... perhaps fuse-lib bundled w/6.10 is too old?!?
Here's what I get on Ubuntu 6.06:
Hi. I compiled and installed s3fs Rev 55 on a Gentoo Linux system. I'm able to mount some of my buckets. But when I try to do a "ls" of my mounted filesystem, nothing's returned:
Strange. Am I doing something wrong?
Is there currently a way to make a file publicly readable? i.e. when doing a cp or mv command? Thanks!
a.skwar- not sure what's going on here... are these pre-existing s3 objects in the images bucket that are maybe somehow not "compatible" with s3fs? can you create new files/directories via s3fs in that bucket? i.e., mkdir/touch, etc...?!?
tleasure- there is no provision for making a file "public-read" with s3fs; you can use a tool such as jets3t for that ... I was thinking of making "chmod" fiddle with the s3 permissions... chmod's "mode" user/group/other would map nicely, however, I think it would actually be a bad idea ... it would be an unexpected surprise to most people, I would think, not realizing that they're making their files world-readable! even a simple rsync of some 777 files would cause those 777 files to be world-readable by anyone on the Internet!
I've noticed that if you have pre-existing files that have a slash in the S3 filename (like a virtual directory), s3fs will not pick up on it until you create the directory locally. So if you have bucket:images/somefile.txt, you want to create a "images" directory locally in your s3 mount. Afterward, the files show up. Hope this helps.
rrizun - thanks for your response. I agree that mapping chmod to S3's ACL could yield an unexpected surprise. But I do think this functionality would be huge. Do you think there are any alternatives other than chmod to set the ACL initially rather than doing another request after the file transfer?
tleasure- I guess I could add a option to s3fs: "acl_default" one of: private, public-read, public-read-write or authenticated-read?!? (defaults to "private" of course)
That would be awesome!
Please consider resurrecting/improving the cache.
In particular, I'd like to see an option to control where it is written instead of .s3fs.
I've done some timing with my EC2 machine with s3fs and the local mnt partition. It's a lot faster to use a cache than always accessing S3.
ok! local file cache is back just as it was before, however, it is disabled by default... use the "use_cache" option to enable, e.g., /usr/bin/s3fs mybucket /mnt -ouse_cache=1
configurable cache folder (svn 72): use_cache now takes the path for the local file cache, e.g., -ouse_cache=/tmp
Excellent! Thank you!
I see you're currently using curl "easy" functions. Have you looked into whether the "multi" and/or "share" functions would improve performance at all?
Not that it's neccessarily "bad" given the S3 architecture, but I'm just curious.
cblaise- use of libcurl's multi/share api by itself probably would not improve network performance all that much, however, use of FUSE's low-level asynchronous api in combination with libcurl's multi/share api would definitely improve the "responsiveness/robustness" of s3fs, e.g., hitting CTRL-C during an I/O operation should respond immediately due to async api vs sync api...
FYI I've added configurable retries... -oretries=2... default is 2... so, s3fs makes 3 attempts per s3 transaction: 1 attempt + 2 retries
With today's code using a 103M file, most of the time (but not always) I'm seeing the following error when copying to the drive:
cp: closing `s3/sitefilter-db.tgz': Input/output error
I've tried on two machines, one an EC2 machine (Fedora 8) and another a local machine (Fedora 5).
The EC2 machine has shown it a few times, but not as many times as the non-EC2 machine, where it always happens.
Does not happen on smaller files (s3fs.cpp).
Addendum: when this occurs the file does not copy over. It is 0 bytes.
Seems to work properly with yesterday's (72) code.
My first clue should have been the timing. From my EC2 machine I'd expect to see under 30s (usually 11s) to copy the 103M file. On my remote machine, it should have been much longer than the ~30s time was reporting. :)
I think the problem is caused by the addition of this line:
I've removed the line and did a checkin... go ahead and do a svn update and retry?!?
sry/Thanks!
That seems to be doing it. I'll try re-copying a few more times to be certain and will post if they fail.
I'm trying this with the current (91) release code and with r88. Reads work fine, but writes are returning input/output error. A little testing and digging in the code shows that the my_curl_easy_perform is returning -EIO. (My test is just "touch /mnt/x" to create a new file)
Further, it seems that the EIO is being raised because I'm getting a 411 error back from AWS, which according to this indicates that the content-length header should be passed.
Since this is a list of errors from 2006, I'm guessing that I'm doing something wrong rather than the code or aws. Has anyone else had this problem? I assume that I'm authenticating ok to be able to read my own (private) buckets.
Hmmmm.. I just did "touch /mnt/x" using r92 and it worked fine both with and without use_cache... could it be a European vs. North America s3 issue? can you use the FUSE -f switch to run s3fs in the foreground and capture its debug output?
or better yet capture packets w/something like this:
tcpdump -s 1500 -A host s3.amazonaws.com
Ah... it was my version of cURL. I was trying it on an old server which looks like it had v7.12 installed (I could be wrong with the version). Updated to 7.18 and it's working great. Going to try running a gallery2 album via this, if it works then I'll be very impressed. Great bit of code!
rrizun, an update - if you're interested. The filesystem appears to work fine. Got gallery2 storing full and resized images on s3 using your s3fs code. A little hacking required: changed the gallery code to redirect to the s3 file and a minor change to s3fs to set the acls to public_read on file create / mkdir.
I've been looking for an easy way to do this for a long time. Thanks.
Hi there:
I've installed on OS X and created a bucket (with some third party software) but have not had much luck getting things working...seems to send my Finder mental and only a force reload will get things back to normal after which time s3fs has died.
Two questions: one, has this been tested on OS X at all? And second do you have a recommended way of managing buckets for use with this FS?
Thanks
Jamie
Hi there- I got this up and running yesterday on a MacBook running OSX 10.4 and it worked quite well- I was able to weed out a few minor bugs in doing so. Having said that, try SVN version "107" which contains those bug fixes.
Are there files already in the bucket you're creating? Try creating and mounting an empty bucket and see if it still "sends Finder mental"! =)
I use jets3t (https://jets3t.dev.java.net) to manage buckets.
hanlon.co.uk: glad to hear it's working for ya; I recommend updating to svn 107 to pull in a few bug fixes
rrizun: I have the latest SVN and it's not working mate. It seems to be deadlocking or something: I've got gdb on it waiting for it to crash out and it's not even letting me break in. I'm well up for getting this working so email me - jkp@kirkconsulting.co.uk if you want a debugger.
Jamie
Worked great on my fedora 7 install.
I got the svn version, but it does not compile, even after applying this patch:
===================================================================
--- s3fs.cpp (revision 42)
+++ s3fs.cpp (working copy)
@@ -1074,5 +1074,5 @@
 s3fs_oper.access = s3fs_access;
 s3fs_oper.utimens = s3fs_utimens;
- return fuse_main(custom_args.argc, custom_args.argv, &s3fs_oper, NULL);
+ return fuse_main(custom_args.argc, custom_args.argv, &s3fs_oper);
 }
I always have a problem when compiling:
I use gcc 4.1.2 and Debian Etch or Sarge.
I have tested with revisions 42, 55 and the latest.
Any idea?
from my research, debian etch bundles fuse 2.5... you'll need at least fuse 2.6 (preferably fuse 2.7) to compile s3fs... http://www.debianhelp.org/node/12310
I think setting up a mailing list would be awesome, if there isn't one yet. That way these comments would not get abused as much by people like me ;)
I tried using s3fs with rdiff and it doesn't work, python stack trace: http://pastie.caboo.se/164903, I know nothing about this stuff though.
Also, I saw that https support is planned, and that would be killer!
Hi there-
the issue you're seeing w/rdiff is related to http://code.google.com/p/s3fs/issues/detail?id=14
as a temporary workaround until I fix it, try disabling local file cache
as well, there is a mailing list: s3fs-devel; see http://code.google.com/p/s3fs/
https coming soon!
Please help! I simply can't get the /etc/passwd-s3fs to work. It contains: A12C1F:AC3143411F
That's keyId:Secret (those are obviously fake). Is this the right format? I'm getting 403 errors. Everything works if I enter the keys on the command line.
when you say "All s3 objects written by s3fs have a Content-Type of either "application/octet-stream" or "application/x-directory". " will this change in the future? content disposition for images for example when uploaded using s3fs are set to force download and wont display in a browser.
it might change in the future, specifically, I might drop application/x-directory because that information is already encoded in "mode"...
the intent is for s3fs to not care about the content-type... it would be an interesting feature though to somehow be able to configure content-type...
workaround? configuring apache httpd to serve up content from a mounted s3fs volume would work (i.e., have apache httpd do all of the content-type detection/smarts)
Wow, very slick, working like a charm!
I actually AM trying to use rsync to copy a gallery onto S3. Do the rsync issues with copying back and forth still exist? Enabling the use_cache option causes the following types of errors:
rsync: mkstemp "/v/blahblha/albums/.P6180002.JPG.Tl8lJj" failed: Bad file descriptor (9)
rsync: failed to set times on "/v/blahblah/albums/Anne M/SuzhouEmbroidery": Bad file descriptor (9)
The cache directory ends up being only two directories deep, with the 'blahblah' directory showing up in cache as a 0-length file.
Other than this, WOW!!!
Yup, unfortunately the local file caching issues still exist in the codebase... the symptoms are exactly as you describe... I'm sure it's a simple fix but I just haven't been able to scrape up enough time to get around to fixing it!!!
Workaround: for now, disable local file cache! rsync should still work pretty good w/o it...
Once I fix it then I'll post a new src tarball...
Glad it works for ya!
FYI local file caching bug is fixed in r130!
I'm very new to all this linux stuff.. and I'm trying to get this to work.
I have one box that's fedora 9 and s3fs works just fine on there... but I have another that's a dedicated virtual where I keep running into
"fuse: failed to open /dev/fuse: Permission denied"
This box is running CentOS r5 and I'm running fuse 2.7.3-1.el5
/dev/fuse has permissions crw-rw---- 1 root fuse 10, 229 Apr 8 21:25 /dev/fuse
Apparently this error usually happens with fuse when the group isn't set right.. but I still get this even when I try to mount with root... and even when I set /dev/fuse to crwxrwxrwx...
I'm sure I'm just doing something dumb but I've spent wayyy too much time on this so I thought I'd ask around!
So...Any thoughts?
This project is brilliant, by the way =b
The good news: it works, and works GREAT on Ubuntu 7.10 The bad news: There is one serious design glitch that kills its usability for my needs.
I'm using Bacula to write to an S3 bucket mounted by s3fs. I'm using local caching, and 100MB volumes in Bacula so not too much is uploaded when a volume changes.
Problem: Bacula writes that 100MB volume, and then closes the file. s3fs then starts uploading that file to S3, but it blocks until the file is uploaded, and Bacula can't create a second volume (file on disk) until the first one completes uploading. That means I can't spool to the local cache at all if I want my backup job to run in a reasonable amount of time (we're talking multi-gigabytes on a 512kbps uplink). Is there any way around this? I'm pretty sure you're going to tell me it involves threading s3fs, and I know that is a headache and a half, but it would be cool if it could be done. Thanks for the great project!
Hi Eli- Not sure what's going on... is this CentOS managed by an ISP? Perhaps FUSE is at their mercy?!? Also, selinux issue?!? Dunno, just guessing...
Hi pedahzur-
Ya, threading is not an issue... s3fs could do the s3 upload in a separate thread and return immediately to the caller (essentially a write-behind cache); however, there would then be no way for s3fs to directly convey an error back to the caller, i.e., it would not be able to return a bad error/status code/return value. Furthermore, there are concurrency issues to consider; the easiest solution would be to serialize the write-behind thread.
So, ya, it could be done (quite easily, actually), but not without making the ramifications and trade-offs clear to the end user, thru documentation I guess!
Feel free to add a new "Issue" to track this feature/enhancement!
Hi,
As a test I changed "application/octet-stream" in the source to "text/plain" and now when I go directly to the file on S3 (like an image file) the browser displays the file directly, instead of downloading it like when "application/octet-stream" is set. (I have also changed the default_acl to "public-read").
Pardon my ignorance, but what am I breaking in enabling this useful feature? Thanks!
Hi- those changes are fine! nothing should break; s3fs does not rely on content-type
there is a s3fs option called "default_acl" to set the default acl at runtime; perhaps I should also add a "default_contenttype" option?
If I patch s3fs to save the correct content-type, what could break?
Hi- you can patch s3fs to save whatever content-type you wish; should not break anything as s3fs does not rely on content-type
Hi there, is it possible to store a relational database like MySQL directly in S3 using s3fs, and run and update it on S3 via s3fs? Has anybody done this before? Or is it too risky? Thanks
Hi- it is possible, however I would assume performance would be terrible (haven't tried myself), based on the fact that s3fs operates in a "bruteforce" manner, re-uploading entire files on changes. ElasticDrive and/or PersistentFS would probably be better candidates since they are really block devices.
FYI just for fun I tried running mysqld with datadir pointing to an s3 bucket and it worked! only issue is the "service mysqld start" timed out because mysqld created a 10MB ibdata and uploaded it; apparently the mysqld init script did not want to wait that long! =) I was able to do queries and it wasn't nearly as bad as I thought it would be; best bet would be to disable innodb and use myisam... =)
Thanks mate, will try it out, but I guess if the size of the database becomes larger then performance will be an issue.
Just FYI I had all sorts of trouble using this with a European bucket. Tried for ages to get it working with no joy before it dawned on me to try a US bucket instead. Worked first time. Switched back to the European bucket just to be sure and, doing everything the same, it wouldn't work.
Thanks for a brilliant FUSE project though - with rsync this is gonna be a very cheap and easy way to keep my photos synced offsite.
Indeed- I haven't looked at what it takes to support EU buckets yet... I've added a note at the top of this wikipage...
Hi there, I am having lots of speed issues with S3 using S3fs. I am using Rsync to do a backup and it has taken 1 day to do a full 1GB backup!! any clues? Thanks
Hi there- be sure you're using r152 (x-amz-copy-source)... as well, what is your upload bandwidth? how fast? and which platform? linux? mac?
also, might want to "tail -f /var/log/messages" to see if you're getting excessive retries?
We are using it off an Amazon EC2 instance running Debian so the bandwidth should not be an issue since it's all internal. I have a feeling that it could be the switch time between each file copy, what do you think? If it is, any way of reducing this? Thanks so much for your help.
hi
heres some info on the above with comments
minideb:~/testing# pwd
/root/testing
minideb:~/testing# du -csh 1/
78M 1/
78M total
minideb:~/testing# find 1/|wc
#theres 76 files, photos, no directories, each averaging 1mb
minideb:~/testing# time cp -Rp 1/ /mnt1/testing/1
real 0m58.384s
user 0m0.021s
sys 0m0.169s
minideb:~/testing# du -cs 1
79184 1
79184 total
minideb:~/testing# expr 79184 / 59
1342
minideb:~/testing# 1.3M/s
minideb:~/testing# time tar cf data.tar 1/
real 0m0.611s
user 0m0.010s
sys 0m0.268s
minideb:~/testing# time cp data.tar /mnt1/testing/
real 0m9.129s
user 0m0.020s
sys 0m0.234s
minideb:~/testing# ls -l data.tar
-rw-r--r-- 1 root root 80680960 2008-05-16 12:18 data.tar
minideb:~/testing# expr 80680960 / 10
8068096
root 18058 0.2 0.8 72328 14320 ? Ssl May15 2:47 /root/ec22/s3fs-r152/s3fs entrip-s3fs-1 /mnt1 -ouse_cache=/tmp
any idea why a single large file gets uploaded at 8M/s and smaller ones get on average 1M/s? could there be some processing, switching, i dont know time in between upload requests that kills it on many smaller files?
heres the above with a bit more formatting, i didnt realize wiki will break my new lines http://linux.pastebin.ca/1019983
Hi- I think this is a case of many small files vs one large file; in this case it ends up significant; on my machine (cable modem at home) using ethereal/wireshark, I can see about a one second "penalty" for setup before the file transfer actually takes place; that is, using "cp -p" (preserve mode, ownership, timestamps) will cause s3fs to send a flurry of HTTP PUT requests to preserve mode, ownership, timestamps; all this takes time and adds up to the observed overhead that you're seeing; 76 files times approx 1 second overhead per file accounts for the discrepancy you're seeing: 58sec/76files=0.763 seconds overhead per file (I'm seeing just over 1 second per file, that's ec2 vs cable modem at home)
s3fs uses http keep-alive so it is probably as fast as can be; the only way to speed up copying of many small files would be to parallelize the copy operation; amazon s3 is very conducive toward parallelization (s3fs is fully multi-threaded and will use multiple simultaneous http connections)
so, all seems normal! hope that helps!
er, I guess the above calc is more like: (58sec-9sec)/76files=0.644 second overhead per file (I'm subtracting the 9 seconds it takes for the raw upload of the entire 80Mbyte file... you get the gist! =)
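The corrected calculation above can be written as a small helper. This is only a sketch of the arithmetic; the function name perFileOverheadSeconds is made up for illustration and is not part of s3fs.

```cpp
// Hypothetical helper (not part of s3fs): average extra seconds spent per
// file when copying many small files, after subtracting the time the same
// amount of data takes as one large file.
double perFileOverheadSeconds(double manyFilesSeconds,
                              double oneFileSeconds,
                              int fileCount) {
    if (fileCount <= 0) {
        return 0.0;  // nothing was copied, so there is no per-file overhead
    }
    return (manyFilesSeconds - oneFileSeconds) / fileCount;
}
```

With the numbers from this thread, perFileOverheadSeconds(58.0, 9.0, 76) comes out at roughly 0.64 seconds per file.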
Thanks very much, looks like we will have to tar.
hi rrizun thanks! for s3fs as well, its better than elasticdrive if it matters anything, ed is slower than s3fs and just pretends to be faster (by returning instantly), under the hood is 2x or so slower, s3fs is great i wish i had more time to study the s3 api as well thanks ;-)
find() is case sensitive, and my mime.types has only lower case entries. So, the common .JPG extension will not find anything. This below seems to work...
/**
 * @param s e.g., "index.html"
 * @return e.g., "text/html"
 */
string lookupMimeType(string s) {
  string result("application/octet-stream");
  string::size_type pos = s.find_last_of('.');
  if (pos != string::npos) {
    s = s.substr(1+pos, string::npos);
  }
  string low_s;
  unsigned int i;
  char* buf = new char[s.length()];
  s.copy(buf, s.length());
  for(i = 0; i<s.length(); i++)
    buf[i] = tolower(buf[i]);
  string r(buf, s.length());
  delete[] buf;  // array form, to match new char[]
  mimes_t::const_iterator iter = mimeTypes.find(r);
  if (iter != mimeTypes.end())
    result = (*iter).second;
  return result;
}
oops, the line "string low_s;" is left over.. I should have deleted it.
good catch! fixed in r166
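For readers following along, here is one way the same case-insensitive lookup could be written without a manual buffer. It is a sketch only and not necessarily what went into r166; the function name lookupMimeTypeLower and passing the table as a parameter are assumptions made so the example stands alone.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Extension -> content type. s3fs fills its table from /etc/mime.types;
// here the caller supplies it so the example is self-contained.
typedef std::map<std::string, std::string> mimes_t;

// Hypothetical variant of lookupMimeType(): lower-cases the extension
// before the lookup so "PHOTO.JPG" matches a lower-case "jpg" entry.
std::string lookupMimeTypeLower(const mimes_t& mimeTypes, const std::string& name) {
    std::string result("application/octet-stream");
    std::string ext = name;
    std::string::size_type pos = name.find_last_of('.');
    if (pos != std::string::npos) {
        ext = name.substr(pos + 1);
    }
    // Cast through unsigned char: tolower() is undefined for negative values.
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    mimes_t::const_iterator iter = mimeTypes.find(ext);
    if (iter != mimeTypes.end()) {
        result = iter->second;
    }
    return result;
}
```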
Thanks!
I am unable to compile. I get this:
buffy:~/s3fs# make
g++ -ggdb -Wall -D_FILE_OFFSET_BITS=64 -I/usr/include/fuse -lfuse -lpthread -lcurl -lgssapi_krb5 -lkrb5 -lk5crypto -lcom_err -lkrb5support -lresolv -lidn -ldl -lssl -lcrypto -lz -I/usr/include/libxml2 -L/usr/lib -lxml2 -lcrypto s3fs.cpp -o s3fs
s3fs.cpp:1641:74: error: macro "fuse_main" passed 4 arguments, but takes just 3
s3fs.cpp: In function 'int main(int, char)':
s3fs.cpp:1636: error: invalid conversion from 'void ()(fuse_conn_info)' to 'void ()()'
s3fs.cpp:1639: error: 'struct fuse_operations' has no member named 'utimens'
s3fs.cpp:1641: error: 'fuse_main' was not declared in this scope
s3fs.cpp: At global scope:
s3fs.cpp:438: warning: 'size_t readCallback(void, size_t, size_t, void)' defined but not used
make: all? Error 1
debian etch?
>>> from my research, debian etch bundles fuse 2.5... you'll need at least fuse 2.6 (preferably fuse 2.7) to compile s3fs... http://www.debianhelp.org/node/12310
Thanks!! I installed from backports and things worked. I had just done an upgrade by downloading fuse and compiling it.
-Adam
My OS is Linux 2.6.24.4-64.fc8 #1 SMP Sat Mar 29 09:15:49 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux.
s3fs version is r166
I'm trying to copy a big directory tree with approx. 70K files and total size of 5.5GB using command
Some files failed with " ###response=400". This indicates a problem on the client side. Later I tried to copy the individual files that failed with command
and they finished successfully.
What kind of problem could it be? How can I see what request is being sent? Any help would be appreciated.
Thanks.
-Vlad
Hi- what is the nature of the failed files? i.e., are they large? 1GB or greater? as well, is your system clock accurate? system clock needs to be within 15 minutes of amazon s3's clock; s3fs could use some improvement in error reporting... for now you could use ethereal/wireshark to capture 400 bad requests and inspect the xml error doc returned from s3;
I've also seen some references to amazon web services returning 400 bad request for no apparent reason, and retrying the request works the 2nd time; however, not sure retrying 400 bad requests is the right thing to do...
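The 15 minute clock rule mentioned above can be checked with a few lines of code. This is a sketch under assumptions: the helper name clockWithinSkew is hypothetical, and the server time is assumed to have been obtained separately, for example parsed from the Date header of any S3 response.

```cpp
#include <ctime>

// Maximum allowed difference between the local clock and S3's clock,
// per the 15 minute figure quoted in this thread.
const double kMaxSkewSeconds = 15 * 60;

// Hypothetical check (not part of s3fs): true if the two timestamps are
// no more than 15 minutes apart, in either direction.
bool clockWithinSkew(std::time_t localTime, std::time_t serverTime) {
    double diff = std::difftime(localTime, serverTime);
    if (diff < 0) {
        diff = -diff;
    }
    return diff <= kMaxSkewSeconds;
}
```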
These files are not big ones. Sizes usually less than 10MB
I was able to get output from unsuccessful attempt to copy a SWF file (size: 2,605,315 bytes) Packets info were caught with the following command:
I found the xml message inside info.txt file:
Could it help to resolve the issue? Any suggestions?
Thank you very much for your help -Vlad
any chance the source files are being modified during the overall copy operation to s3?
Absolutely not. However, the error came after another user started copying some files to S3 space. But these were different files in different directories. I repeated the attempt to copy a minute later and it failed again with the same message.
Thank you, -Vlad
Hi Vlad- thanks for the feedback; I'm afraid this is a tough one! I did see a problem similar to this a while ago and added issue 25 to document it; lemme think about it...
does unmounting and then remounting "fix" the problem?
The easiest way to ensure all your ec2 dbs are safe from ec2 crashes is to replicate your ec2 database on a regular machine
I currently replicate my ec2 dbs on my centos machine at home. The updates to the slave db are practically instantaneous, so if ec2 ever crashed I'd probably lose no more than a minute of data
Also you can make backups of your db by backing up the slave. No need to spend precious cpu power on database updates. Just run it on a utility computer sitting in your basement
Vlad- what version of libcurl are you using? as well, do you see a "Expect: 100-Continue" in the PUT request that fails? (see issue 25)
Command "curl-config --version" returns libcurl 7.17.1. It looks like "100-continue" is present
About mounting/unmounting. As I mentioned I'm copying big structure with cp -r. I never managed to copy the whole directory without at least a dozen errors like this. I mounted/unmounted many times but some errors are always present.
Thank you, Vlad
Vlad- I wonder if you're running into this problem: http://forum.jungledisk.com/viewtopic.php?t=8203
coincidentally, s3fs' readwrite_timeout defaults to 10 seconds too
here's something to try: in s3fs set readwrite_timeout=5 (i.e., half of what amazon's idle connection timeout is)
Hi rrizun, This installed perfectly on my EC2 system but it fails on my centos computer. The error I get is:
Now I know I definitely have curl installed on my system as seen below:
I also know the file libcurl.pc does not exist on my system (locate can not find it) Is there a way I can compile it without it looking for libcurl.pc? Thanks
Hi- looks like the problem is actually specifically curl_multi_timeout... for that you'll want to update to at least libcurl 7.15.4 (7.12.1 is too old)
Will do.. Thanks
Hi again, I installed libcurl 7.18.1
I'm still having issues with curl. Now the error I'm getting is:
what am I doing wrong? Thanks in advance
looks like a link problem... compiler sees header file(s) but linker does not see library file
WOW... I fixed it. In case anyone else encounters this problem then this is the fix.
The key is in the error below
I looked in the libcurl.pc file that was installed and saw the following:
prefix=/usr/local
exec_prefix=${prefix}
libdir=${exec_prefix}/lib
includedir=${prefix}/include
Name: libcurl
URL: http://curl.haxx.se/
Description: Library to transfer files with ftp, http, etc.
I removed the line "URL: http://curl.haxx.se" and it worked. I don't know the syntax of .pc files but I guess URL must be a reserved keyword or something.
I tested it and it worked beautifully except one new issue which I hope is not a bug.
I have the following setup. I have my ec2 system mounted and pointing to a certain bucket and I have my centos home computer mounted and pointing to the same bucket.
ec2/mymount -> s3/mybucket
homecomputer/mymount -> s3/mybucket
When I place a file into ec2/mymount it shows up in s3/mybucket AND I can see it in homecomputer/mymount. BEAUTIFUL!
When I place a file into homecomputer/mymount it shows up in s3/mybucket AND I can see it in ec2/mymount. NICE!
However, when I place a file into s3/mybucket through s3fox it shows up as an EMPTY file in ec2/mymount and homecomputer/mymount. EWW!
Is this a bug or is my setup faulty?
glad you got it working!
as for the s3fox thing, see issue 27
the gist of it is that there isn't really a "standard" way to represent an actual folder object in s3; each s3 client makes up its own scheme for representing folder objects
might wanna try jets3t cockpit instead?
Whenever I try to cp a file > 500 MB I get a "No space left on device" error and only the first 500 MB is copied.
Sorry, forgot data that might be helpful...
$ dd if=/dev/zero of=/opt/s3/500MB bs=1024 count=512000
dd: writing `/opt/s3/500MB': No space left on device
$ dd if=/dev/zero of=/opt/s3/500MB bs=1024 count=506880
dd: closing output file `/opt/s3/500MB': Input/output error
$ dd if=/dev/zero of=/opt/s3/500MB bs=1024 count=501760
501760+0 records in
501760+0 records out
513802240 bytes (514 MB) copied, 121.188 seconds, 4.2 MB/s
# uname -a
Linux localhost 2.6.23.1-linode36 #1 Sun Nov 4 12:03:06 EST 2007 i686 GNU/Linux
# dpkg -l | grep fuse
ii fuse-utils 2.7.1-2~bpo40+1 Filesystem in USErspace (utilities)
ii libfuse-dev 2.7.1-2~bpo40+1 Filesystem in USErspace (development files)
ii libfuse2 2.7.1-2~bpo40+1 Filesystem in USErspace library
s3fs-r166
how much free space is on /tmp (or where ever posix tempfile() would create its temporary files)? s3fs creates temporary local files first before uploading to s3 so is there a chance you're running out of local hard disk space?
Yep: 526 MB in / (it's a linode, so I don't get to partition the root disk). Is there an argument to change where it creates those temp files?
hey. i'm having problems with mv, as opposed to cp. i have s3fs mounted at /var/n54data. when i do:
dd if=/dev/zero of=sux count=1024 bs=1024; cp sux /var/n54data
the resulting file (/var/n54data/sux) is 1M in size. however when i do:
dd if=/dev/zero of=sux count=1024 bs=1024; mv sux /var/n54data
the resulting file is 0 bytes in size.
p.s. s3fs is awesome.
s3fs uses posix http://linux.die.net/man/3/tmpfile to create its temporary local files... so, currently, no, there is no s3fs argument to control where temporary files are created
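Since the temporary files land wherever tmpfile() puts them, a quick way to see whether a large copy can fit is to ask the filesystem how much room is left. The sketch below uses POSIX statvfs(); the helper names freeBytesAt and hasRoomFor are hypothetical, and which directory to check (commonly /tmp) is an assumption that depends on the system's C library.

```cpp
#include <sys/statvfs.h>

// Hypothetical helper (not part of s3fs): bytes available to unprivileged
// users on the filesystem that holds `path`, or -1 if statvfs() fails.
long long freeBytesAt(const char* path) {
    struct statvfs vfs;
    if (statvfs(path, &vfs) != 0) {
        return -1;
    }
    return static_cast<long long>(vfs.f_bavail) *
           static_cast<long long>(vfs.f_frsize);
}

// True if the temp directory can hold a local copy of `uploadBytes` bytes.
bool hasRoomFor(const char* tempDir, long long uploadBytes) {
    long long avail = freeBytesAt(tempDir);
    return avail >= 0 && avail >= uploadBytes;
}
```

For the case reported above, hasRoomFor("/tmp", 512000LL * 1024) would be expected to return false on a root disk with only 526 MB free once other usage is counted.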
hmmm... cp vs mv... the "mv" command itself should detect a cross-filesystem move (EXDEV) and should fall back to a brute-force "cp" itself anyway, all without any knowledge whatsoever to/from s3fs (s3fs just sees it as a cp/write). any chance of running out of local disk space for temporary local files? (s3fs creates a temporary local file before uploading to s3)
re: cp vs. mv ; yeah disk space was not a problem. also in the strace, it did try a rename and then fall back to cp. also it's using read/write, not mmap, so no fuse problem there. actually the straces look extremely similar; however the mv one has a SYS_320 line in it (?), and of course also contains a trailing unlink :(
are you able to reproduce the problem?
re: cp vs mv
I just tried this and was not able to reproduce the problem. in both cases I ended up with a 1MB "sux" file on s3... the trailing unlink seen via strace sounds normal; any chance that this is an s3 "eventual consistency" phenomenon?
one other thought: is s3fs local cache enabled?
Hello rrizun,
I'm logged in as the user "root".
I mount a directory as another user, e.g "apache" as seen in the code below:
I do this so my http server, running as user "apache", can write to the /mymount directory.
The problem is now the user "root" does not have access to the /mymount directory as seen below:
My question is how come the superuser "root" does not have access to a mount owned by "apache"
use fuse "allow_other" switch... search for "allow_other" on this page
Is the s3fs-devel google group locked? There haven't been any posts since May 16 ... I also posted something since then and the post has not yet appeared.
shouldn't be locked (didn't know it could be!) just posted a test message...
I am new to s3fs. When I read that you mount your bucket with
(or providing a password file) I wonder whether the secret access key is transmitted securely to amazon. Is openssl enabled by default, and are there any optional switches that I should know about, to make sure I never disable ssl encryption? And what about file transfers? I'd assume they use http by default. Is there an option to change to https?
Hi- s3fs never transmits the secretAccessKey over the internet; s3fs (or any other s3 client for that matter) only transmits signatures created locally from the secretAccessKey; That is actually by amazon S3 design.
For the actual file transfer, "http" is used by default but can be overridden by using the s3fs "url" command line option and specifying, say, "https://s3.amazonaws.com"
hope that helps!
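To illustrate the "signatures created locally" point: in the S3 REST authentication scheme of this era, the client builds a "string to sign" from the request, computes an HMAC-SHA1 over it with the secretAccessKey, and sends only the base64 result in the Authorization header. The sketch below covers just the string-building step; the function name buildStringToSign is hypothetical, and the HMAC-SHA1 and base64 steps are left out because they need a crypto library.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not taken from s3fs): assembles the string that
// gets signed locally. `amzHeaders` holds lower-cased x-amz-* header names
// and their values; std::map keeps them sorted by name, as required.
std::string buildStringToSign(const std::string& verb,
                              const std::string& contentMd5,
                              const std::string& contentType,
                              const std::string& date,
                              const std::map<std::string, std::string>& amzHeaders,
                              const std::string& resource) {
    std::string s = verb + "\n" + contentMd5 + "\n" + contentType + "\n" + date + "\n";
    for (auto it = amzHeaders.begin(); it != amzHeaders.end(); ++it) {
        s += it->first + ":" + it->second + "\n";
    }
    s += resource;  // e.g. "/mybucket/path/to/file"
    return s;
}
```

The secret key is used only as the HMAC key for this string, so the key itself is never put into the request.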
Has anyone gotten https access to work? http access works fine.
I keep getting the following error: ###problem with the SSL CA cert (path? access rights?)
i'm running s3fs r166, debian etch w/ libcurl 7.15.5, libfuse 2.7.1, openssl 0.9.8
the curl command line is able to access https://s3.amazonaws.com without error so i know my CA bundle is correct. it seems like the s3fs code may not be initializing the path to the CA bundle.
Running from the commandline: curl --trace foo https://s3.amazonaws.com yields:
== Info: About to connect() to s3.amazonaws.com port 443
== Info: Trying 207.171.185.193...
== Info: connected
== Info: Connected to s3.amazonaws.com (207.171.185.193) port 443
== Info: successfully set certificate verify locations:
== Info: CAfile: /etc/ssl/certs/ca-certificates.crt
Thanks Michael
Hi- I got s3fs+https to work on Fedora by merely setting url=https://... dunno about etch, perhaps fedora's default setup differs from etch such that "it just works"?!?
hey rrizun, sorry for the delay.
yes local_cache is enabled when the problem is exhibited.
Hi- and the problem is not exhibited when local_cache is disabled? feel free to add a new Issue to track this problem and I'll take a peek at it when I have a chance... Thanks!
I have https working on both lenny and hardy with libcurl4-openssl-dev 7.18.0 and ca-certificate.
I had to modify s3fs to use curl_easy_setopt(curl, CURLOPT_CAPATH, capath) with capath set to /etc/ssl/certs. See: http://curl.haxx.se/docs/sslcerts.html
Hope this helps. -Shane
Hello! This is a great project, and it seems to work very well. However, of all the things, I actually need the deep-rename capability for directories, which currently is not supported. What would it take to enable this feature?
Should s3fs_rename() become recursive, somehow? It appears to me as if this would require a 'rename' operation on every single object under the entire directory tree, right? That sounds expensive, though...
Or could it just check whether the target is a directory and then instead execute a 'cp -r' followed by a delete of the original directory? I think this would be an even more expensive operation, though, right?
Hi- yes, s3fs_rename() would need to recursively rename each sub-object and would typically be a long running operation (but should be able to be seamlessly resumed if interrupted); should be able to do deep directory rename now with some /usr/bin/find trickery, though I haven't looked into it myself; deep directory rename is one of the higher priorities, just need some spare time to implement it!
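The recursive rename described above boils down to rewriting the key of every object under the directory. Here is a sketch of just the planning step, assuming a key layout where the directory is stored as "photos" and its children as "photos/..."; the function name planDeepRename is hypothetical and nothing here talks to S3.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical planner (not part of s3fs): for every key equal to `from`
// or located under "from/", produce an (old key, new key) pair. Each pair
// would then be carried out as a server-side copy followed by a delete.
std::vector<std::pair<std::string, std::string> >
planDeepRename(const std::vector<std::string>& keys,
               const std::string& from,
               const std::string& to) {
    std::vector<std::pair<std::string, std::string> > plan;
    const std::string prefix = from + "/";
    for (std::size_t i = 0; i < keys.size(); ++i) {
        const std::string& k = keys[i];
        if (k == from) {
            plan.push_back(std::make_pair(k, to));
        } else if (k.compare(0, prefix.size(), prefix) == 0) {
            plan.push_back(std::make_pair(k, to + "/" + k.substr(prefix.size())));
        }
    }
    return plan;
}
```

Because the plan is just a list of independent pairs, an interrupted rename could be resumed by rebuilding the plan from whatever keys still sit under the old name.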
So.. Everything had been working just fine and now, for some reason, s3fs won't mount my s3 file system. I don't get any errors when I try to mount manually, just an empty directory when I do an ls on the mount point.
-Adam
try "tail -f /var/log/messages" to see if it reveals anything; also, can you, e.g., mkdir and does the directory show up? as well, you can use another s3 client such as jets3t to inspect the s3 data/objects...
Please add http://s3backer.googlecode.com/ to the "See Also" list. Thanks.
symlinks are now added. The wiki still says otherwise. That was a big requirement for me. Do others a favor and update it please.
done and done; Thanks!
That help at all?
-Adam
Just as a suggestion, if someone were to put together a more canonical set of instructions for installation on the various linux flavors it would do a lot of good for helping adoption.
Also, can anyone speak from experience about mounting an s3fs as an NFS share? Performance degradation?
Hi Dylan- indeed, more docs would be good; as for NFS, I have not tried myself, nor have I heard of anyone else doing so; I'm sure there would be some level of performance degradation due to at least the file system layering... feel free to give it a try and post your results! =)
adamchernow- doesn't say much other than there was no write activity; can you write to the mount point? what does jets3t say?
s3fs seems to be using a unique way of creating directories on S3. Is this ever going to change so as to be more compatible with other s3 tools?
S3Fox and JungleDisk (when in compatibility mode) both use a similar directory scheme and can read each other's buckets. However neither of them can correctly parse the s3fs directory scheme.
I understand that there is no standard way of doing directories on s3, but I guess there must be a de facto standard arising as the jungledisk site states: "most S3 applications now use a simpler scheme that uses object names that resemble standard URLs, made possible by a feature called delimiters added to Amazon S3 last year."
Hi- Indeed I am interested in making s3fs compatible w/other s3 clients, as with everything its just a matter of finding time to do so; the "delimiters" feature (if it is what I'm thinking) is already being used by s3fs, but there are additional considerations wrt a complete "universal" folder representation convention; hope that helps!
I did some tests with some of the tools I'm considering. Seems that no tool is really very compatible with any other in regards to directory schemes. I made a comparison chart: http://www.coolbutuseless.com/post/40760186/s3-is-useful-as-a-backup-system-if-backups-are
I also tried the other 2 s3 fuse solutions and they aren't compatible with anything either.
Does the caching option cache the directory structure and files? Some of the filesystem needed for this application is very complex. Searching this data is based on filenames, and it seems as if the complex directory structure is holding things up, is that possible ?
Thanks.
Hi- as coded, the cache option caches file contents only; s3fs always goes online to s3 to determine the directory structure
>>> it seems as if the complex directory structure is holding things up, is that possible ?
it is possible; try using tcpdump/ethereal/wireshark to inspect the packets to get an idea if there's any "thrashing" going on
>>> it is possible; try using tcpdump/ethereal/wireshark to inspect the packets to get an idea if there's any "thrashing" going on
Seems overly complex, I doubt I could interpret the results. I'm moving to the S3 storage platform, and the directory structure is a matrix. e.g.) a/b/0/0/0 , a/b/0/0/1 , to a/b/9/9/9. Quite a few folders and a great deal are populated.
When accessing my structure under s3fs, I do see delays the deeper I go. The delays are more noticeable when doing a "find", as opposed to a simple file read under PHP.
How does one "remap" under s3fs ? Would rebooting be the only option ?
Hi- looks like the "delay" that you're seeing is directly related to the depth of the directory structure; looks like fuse does a "stat" on each directory level before each file access
e.g., if I have a dir/file like "/a/b/c/d/e/f/g/abc.txt" then fuse (or something else higher up) will end up doing a stat on "/a" and then "/a/b" and then "/a/b/c" etc... before finally ending up at "/a/b/c/d/e/f/g/abc.txt"; these are all separate and independent S3 HTTP HEAD transactions as far as s3fs is concerned
don't think s3fs has much control over that since it is really just a "slave" to the fuse subsystem; s3fs could do some caching/second guessing, I guess, but then that might start to get a little bit tricky...
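To make the cost of deep paths concrete, the sketch below lists the paths that get a stat (and therefore, per the explanation above, a separate HEAD request) for one file access. The function name statChain is hypothetical; it only models the behaviour described in this thread, not actual FUSE internals.

```cpp
#include <string>
#include <vector>

// Hypothetical model (not part of s3fs or FUSE): every ancestor directory
// of `path`, followed by `path` itself, in the order they would be stat'ed.
std::vector<std::string> statChain(const std::string& path) {
    std::vector<std::string> chain;
    std::string::size_type pos = 1;  // skip the leading '/'
    while ((pos = path.find('/', pos)) != std::string::npos) {
        chain.push_back(path.substr(0, pos));
        ++pos;
    }
    if (path.size() > 1) {
        chain.push_back(path);
    }
    return chain;
}
```

For "/a/b/c/d/e/f/g/abc.txt" this yields 8 entries, so a single file read at that depth costs 8 round trips before any data moves.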
>>> The delays are more noticeable when doing a "find"
if you're interested in speeding that up... in the source code, in "s3fs_readdir()" search for "max-keys" and set that value to, say, 100 (instead of 20)... recompile and remount and see if that makes a difference; it might speed up "find" by 2x or so, ymmv
Anyone experienced high load burst on a server, using s3fs?
Hi- libcurl seems to have a "bug" wrt high numbers of "multi" handles... symptom is high cpu load... to "fix" it, in the source code, in s3fs_readdir(), search for "max-keys" and set the value to something less than 20, say, 5... recompile and remount and see if that makes a difference! (though that'll slow down readdir()... i.e., less parallel reads)
the other possibility is if local_cache is enabled? i.e., calculating local md5 checksums, etc...
judging by the "burst" observation, I'm guessing its the "max-keys" issue
Hi rrizun, I am having a heck of a time right now trying to figure out why I am getting IOEs. I have been building a test instance with FC9 using s3fs r166 and have the latest curl (7.18.2).. checking my logs, s3fs is returning ###response=403.. I have double checked my access keys and they are fine.. any idea of what might be causing this... also... the program is great and very much appreciated.. would like to make a donation to you if we can..
Hi- is the local time set correctly on your fc9 box? if that isn't it then try using tcpdump (e.g., "tcpdump -s 1500 -A host s3.amazonaws.com") to capture the xml document that s3 returns for the 403 response
Thanks rrizun... I checked the time when I started but it had been reset... All is working.. will put it through its paces now and see how far I can push this..
rrizun... would like to report good things about the s3fs/NFS experiment but so far:
exportfs: Warning /mnt/s-three-test does not support NFS export.
Does some knucklehead/hacker or combination of both anticipate what it would take to get s3fs to "support NFS export"? I've yet to find anyone sharing about this, maybe it's too knuckleheaded I don't know.
btw... (note). anyone looking for some sick (in california that means 'good') Debian and Ubuntu AMIs should check out http://alestic.com. I only found these this morning and I am sad I didn't start using them earlier. that is a personal recommendation not some kind of commercial or anything, they definitely are in good shape to work with s3fs "out of the box". sometimes the kernel module thing can be weird on those xen-based AMIs, this has been my learning experience.
haven't tried exporting via nfs myself... see http://fuse.cvs.sourceforge.net/*checkout*/fuse/fuse/README.NFS
genius. it's crazy enough it just might work.
It'd be really great if it supported setxattr() and getxattr() to set metadata on files. Specifically, if there was a key for the canned ACL, you could easily write a tool to retrieve and set canned ACLs on files (eg "setacl public myfile") by calling setxattr().
It'd also be really nice if you could specify a transform - such as compression or encryption - for files to be stored with.
One other note: It'd be handy if one of the attributes you could retrieve via getxattr was the md5 sum for the file.
Hi rrizun,
I just tried to use s3fs with mpd (music player daemon) by pointing mpd at the s3fs mount. When mpd updates its database, it looks at the file modified times for everything in the directory (my bucket of lots of mp3 files). If the file is either non-existent in its database, or its modified time is newer than what mpd stored as its last modified time, it searches for ID3 tags in the file, extracts them and creates a database record.
When I did an mpd database update on my s3fs bucket, it basically started downloading my entire music collection, incurring quite a lot of OUT bandwidth. I am pretty certain that the ID3 tag algorithm only needs to read a part of the file, since on a local disk it can generate an entire database of 5000+ MP3s in about 2 minutes, which would be an extraordinary feat if the disk were reading all 25GB of data into memory.
But it seems that when the file is opened for reading, s3fs downloads the entire file. This is expensive and time-consuming. Of course, if mpd's file modified time algorithm does its job, this should be a one-time cost for each file.. it will skip over doing a read on files that have the same modified time it's expecting. And I don't modify my mp3 files very often ;)
Do you know of any way to permit the ID3 tag algorithm to do some seeking around in the file, find the start of the ID3 tags, and extract the needed few kilobytes, rather than downloading each 5+ MB mp3 file? It seems like this would more or less demand a large amount of round trips between the local system and s3, but if it could read small "pages" of the file (say, 64KB at a time) as the data is requested, this would make the ID3 tagging much more bandwidth-efficient.
Thanks,
Sean
SMcNam- as coded, s3fs follows a brute-force "all-or-nothing" strategy; indeed it will download the entire file even if ultimately only one byte needs to be read; the same phenomenon that you're seeing can also be seen, e.g., by browsing an s3 mounted folder with GNOME Nautilus (or MacOSX Finder); they'll both wanna read the first few bytes of each file in order to determine their file types; I believe there is already an issue tracking this feature enhancement
arachnid- the extended attributes sounds like a great idea- feel free to add a new issue to track this feature!
rrizun, good point... It seems the two best FUSE-based s3 access methods each have a substantial weakness leading to undue bandwidth expense; let me explain.
s3fs is very good at uploading files to s3, whether small or large... it doesn't incur extraneous PUT/LIST/GETs because it seems to do a single PUT for an arbitrarily large file.
Of course, having a 1:1 mapping between actual files and s3 files has its drawbacks. Just as you said, to do any I/O on the file, it has to be retrieved in its entirety. So s3fs is very write-efficient, and very read-inefficient.
s3backer, on the other hand, reverses the problems. s3backer treats a ton of S3 files as a single virtual file in FUSE, which corresponds to a virtual hard disk drive. You then format a real filesystem onto it, such as ext2. ext2 divides its data into blocks for the purpose of efficiency when reading; for the same reason, s3backer treats every few kilobytes of its virtual file as a separate S3 file. So if I want to read from a single byte inside a 20-meg file, s3backer asks the filesystem's allocation table where that byte is, and maps it to a file on S3, which is between 4k and 64k in size. So it's still getting one entire S3 file to provide your data, but that S3 file happens to be 64k instead of 20 megs.
Here comes s3backer's pitfall: when you start to deal with actual files that are several megabytes large, they are stored as tens, hundreds or thousands of different S3 files. Whenever you would like to read or write this entire file sequentially, in the background you are accruing tens, hundreds or thousands of GET or PUT requests into S3.
This has two major disadvantages: first, in the case of thousands of requests, even a multithreaded s3backer can still only upload your file at a maximum of 100 KB/s or so: the overhead of initiating a new GET/PUT request every 4k to 64k (depending on your configured block size) severely limits the amount of data that can be transferred at a time. There is a large amount of waiting done at the network layer when you're constantly initiating new TCP sockets. Second, those pennies really start adding up as your thousands of GET/PUT requests pile in.
Is it theoretically possible to design an s3-backed, Linux VFS mountable filesystem (whether FUSE or not) which is optimal for both sequential and random access? Can s3fs be improved to provide smarter random access without grabbing entire files? Or can s3backer be improved to provide smarter sequential access without creating thousands of individual TCP sockets (not to mention incurring many PUTs on your bill) for each block?
I am very interested in the direction ahead; perhaps we need guidance from Amazon to fully understand their intent with S3 and to help us come up with an optimal solution.
Thanks,
Sean
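The request counts in the s3backer comparison above can be estimated with a little arithmetic. The following is only a rough sketch: it assumes the file is laid out contiguously and ignores filesystem metadata blocks, and the function names are made up for illustration (they are not part of s3backer or s3fs).

```python
import math


def request_count(file_size: int, block_size: int) -> int:
    """Number of S3 objects (and so GET or PUT requests) touched when a file
    of file_size bytes is read or written sequentially, block by block."""
    if file_size <= 0:
        return 0
    return math.ceil(file_size / block_size)


def blocks_for_range(offset: int, length: int, block_size: int) -> range:
    """Block indices that cover the byte range [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)


if __name__ == "__main__":
    twenty_meg = 20 * 1024 * 1024
    for block_size in (4 * 1024, 64 * 1024):
        print(block_size, request_count(twenty_meg, block_size))
    # A one-byte read touches exactly one block, whatever the file size.
    print(list(blocks_for_range(offset=5_000_000, length=1, block_size=64 * 1024)))
```

Under these assumptions a 20 MB file costs 5120 requests at a 4k block size and 320 at 64k, versus a single request with a 1:1 file mapping, while a one-byte random read costs one small request instead of a full-file download.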
Just wanted to clarify the early parts of my last post: the issue is not reads vs. writes, but really sequential vs. random access. Random access (seek to a particular place, start reading) usually doesn't involve examining the entire file, so in that case we only want a small amount of the file to go over the network, regardless of whether it's an upload or a download. Sequential access is most efficient over S3 if it can be wrapped in a single GET or PUT request.
It seems like we can't have the best of both worlds unless S3 API allows us to read parts of an S3 file without downloading it.... if so, that might be the way to proceed: Go with s3fs, because it provides the 1:1 mapping, but try and be "smart" about VFS/FUSE requests for I/O, by only downloading pieces of the file that are requested.
Hmm: You might have to go with an adaptive algorithm that starts with small pieces when I/O begins, and "catches on" if sequential access continues in a predictable way, expanding the piece size exponentially to reduce the number of PUTs or GETs as the I/O continues.
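A minimal sketch of that adaptive idea, for reads only: it assumes the server honors the standard HTTP Range header on GET, and the function names, the 64 KB starting size and the 8 MB cap are all illustrative choices, not anything s3fs implements.

```python
from typing import Iterator, Tuple


def adaptive_ranges(file_size: int,
                    first_chunk: int = 64 * 1024,
                    max_chunk: int = 8 * 1024 * 1024) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) byte ranges for a sequential read,
    doubling the chunk size after every request, up to max_chunk."""
    offset = 0
    chunk = first_chunk
    while offset < file_size:
        end = min(offset + chunk, file_size) - 1
        yield offset, end
        offset = end + 1
        chunk = min(chunk * 2, max_chunk)


def range_header(start: int, end: int) -> str:
    """Format an HTTP Range header value for an inclusive byte range."""
    return f"bytes={start}-{end}"


if __name__ == "__main__":
    five_meg = 5 * 1024 * 1024
    for start, end in adaptive_ranges(five_meg):
        print(range_header(start, end))
```

With these numbers, an application that only needs the first few kilobytes (such as an ID3 tag reader) pays for one 64 KB request, while a full sequential read of a 5 MB file takes 7 requests rather than the 80 that fixed 64 KB pieces would need.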
First: a moment of silence for yesterday's S3 incident.
Second: I found a really cool use for s3fs I'd like to share. Maybe others are doing this too but I'll share anyways. It is so simple.
Basically the challenge is migrating our initial data over to S3. So today I mounted the same S3 bucket four times:
/mnt/s3_1 /mnt/s3_2 /mnt/s3_3 /mnt/s3_4
then wrote a little multi-threaded uploader that mapped those s3fs directories to local directories and ran an rsync -ra SRC DEST on them simultaneously. Too cool, I thought. Really cut the time of transfer down.
This will only work for certain well defined situations where directory structure is such that the structure can be divided sensibly. And as you know, rrizun put in the work to support rsync, so that is great too.
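The multi-mount upload described above might look roughly like the sketch below. It is not the poster's actual uploader: the round-robin split, the function names and the example paths are assumptions, and it only works when the source tree divides cleanly into top-level subdirectories.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def partition(subdirs: List[str], mounts: List[str]) -> Dict[str, List[str]]:
    """Assign top-level subdirectories to mount points round-robin."""
    plan: Dict[str, List[str]] = {mount: [] for mount in mounts}
    for i, name in enumerate(sorted(subdirs)):
        plan[mounts[i % len(mounts)]].append(name)
    return plan


def sync_mount(src_root: str, mount: str, names: List[str]) -> None:
    """Run one rsync per subdirectory, sequentially, through a single mount."""
    for name in names:
        subprocess.run(
            ["rsync", "-ra", f"{src_root}/{name}/", f"{mount}/{name}/"],
            check=True,
        )


def sync_all(src_root: str, plan: Dict[str, List[str]]) -> None:
    """One worker thread per mount point, so the mounts upload in parallel."""
    with ThreadPoolExecutor(max_workers=len(plan)) as pool:
        futures = [pool.submit(sync_mount, src_root, mount, names)
                   for mount, names in plan.items()]
        for future in futures:
            future.result()  # re-raise any rsync failure


if __name__ == "__main__":
    mounts = ["/mnt/s3_1", "/mnt/s3_2", "/mnt/s3_3", "/mnt/s3_4"]
    print(partition(["2006", "2007", "2008", "docs", "img"], mounts))
```

Calling sync_all("/data", plan) would then start the transfers; the source root /data is a placeholder.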
Regarding the performance of s3backer's "block" access.
s3backer 1.0.x serviced each request to write a block within the FUSE thread that requested it. This ended up giving very slow write performance, because as it turns out the kernel only issues one write request at a time, and each block write would have to wait for the one before it to complete.
Version 1.1.x supports asynchronous parallel writes out of the block cache, which allows lots of blocks to be written simultaneously. Now you can saturate the network if you so choose, which is as it should be.
An interesting question is should you do the same thing, i.e., parallelization, on the read side? This would involve predictive caching ("read-ahead") of data. I.e., if you get a request to read block zero of a file, assume the next few blocks are going to be needed soon as well and read them all in parallel.
s3fs has these same issues in theory, but at the file level instead of the block level. E.g., one could imagine an application which reads or writes a bunch of files at once sequentially. Each operation would have to wait for the one before it. Of course s3fs could solve that problem the same way, i.e., using an asynchronous writer thread pool.
So there is a trade-off between file vs. block access and it probably depends on your particular situation which "granularity" is best.
Is anyone else having issues with the local cache? No matter what I try files are not being cached and constantly get re-downloaded from S3. Running on an Ubuntu EC2 AMI (feisty) with latest s3fuse (r166). Files are being read successfully (and served by apache), but not cached. Command:
/usr/local/bin/s3fs my_bucket /mnt/files -default_acl=public-read -ouse_cache=/mnt/cache -o allow_other
Figured it out. Don't use underscores in your bucket names!
Hi- I don't think underscores in bucket names have anything to do with it; underscores in bucket names are legit; I think it was just a coincidence that local cache started to work when you removed underscores; I have heard other reports of local cache behaving like this so there does seem to be an issue w/local cache, although it does not appear to be a service-affecting issue
just a thought: could it be a permissions issue wrt the local_cache folder?
It looks like you're right as the problem is back. The weird thing is, some instances (1 out of 10 at the most) will cache properly yet most will fail to cache at all.
I am creating /mnt/cache on boot (in rc.local) and chmod'ing immediately afterwards to 777, so I don't think it's a permissions problem on the directory - s3fs is running as root anyway.
Without local caching being consistent s3fs is pretty much unusable for me right now :(
More weirdness, here's what I did:
Tried the same process 3 times with exactly the same results. Why would it cache the first file and then refuse to cache anything else?
Nothing in my logs to suggest any errors either.
Also, it will consistently cache the initially cached file multiple times without fail, ie:
Hi- just to clarify what's happening here... when you say "cached" and "no cache"... do you mean cache hit vs cache miss? that is, w/cache enabled, when s3fs downloads a file, it places it in the cache folder... however, that's not a "hit" or anything, only when the second and subsequent requests come in does s3fs look at the cache folder to see if there is a cache hit.
so, having said that, sounds like what you're seeing is (a) the local cache folder is being populated but (b) there are never any cache hits because, despite the fact that the file is in the local cache folder, s3fs still seems to download it (again) from s3
does that sound about right?
By "no cache" I mean the file is not actually being saved to the cache folder.
Hi bmilleare- I've done a checkin of r177 that fixes a subtle stale curl handle/timeout issue; there could conceivably be some sort of interaction between that and local cache; so, if you're still interested in resolving this, feel free to do a svn checkout of r177 and rebuild and retest and report your findings?!? I'm currently stumped on this one, so, even if this fix doesn't solve the local cache issue, it will still be additional info to help narrow things down... Thanks!
Building on CentOS 5.1 x86_64 results in gssapi_krb5 errors.
s3fs.cpp: At global scope:
s3fs.cpp:439: warning: ‘size_t readCallback(void*, size_t, size_t, void*)’ defined but not used
/usr/bin/ld: skipping incompatible /usr/lib/libkrb5.so when searching for -lkrb5
/usr/bin/ld: skipping incompatible /usr/lib/libkrb5.a when searching for -lkrb5
/usr/bin/ld: cannot find -lkrb5
collect2: ld returned 1 exit status
make: *** [all] Error 1
dunno... are you running "configure" on libcurl? if so try --disable-krb4 --without-krb4 (just guessing)
any predictions when this fs will work with EU buckets? We have really slow connectivity to the US buckets from our ISP and EU buckets should be much faster, so it would be great if we could use those...
oh and by the way, nice work... we use this together with Bacula to do server backups that need to be readable in multiple places - before we had to ship physical tapes around the world.
yes, EU bucket support is on the TODO list! no ETA, just hafta find time to have a look!
ok great, if there's any way I can help, let me know... I know you're doing this in your spare time, very much appreciated :)
On Ubuntu do: sudo apt-get install fuse-utils
On my 8.04 desktop installation it is installed by default.
How do I stop it?
umount (or fusermount -u)
On OS X 10.5.4, using Finder to copy (drag and drop) gives an Error -36. ("The Finder cannot complete the operations because some data in 'YourFile.txt' could not be read or written. (Error code -36).") The resulting file has file size zero.
However copying from the command line (using cp) works perfectly.
Sounds like this might be a MacFUSE issue, but I thought I'd mention it here just in case.
Thanks for the excellent work.
answer might be in issue 30 http://code.google.com/p/s3fs/issues/detail?id=30
FYI for a pre-compiled commercially supported enhanced variant of s3fs, see http://www.subcloud.com
Hey rrizun, Does this work for Microsoft Windows?
Hi koficharlie- it's Linux and MacOSX for now
Hi,
I've been using various S3 tools prior to s3fs, most recently "s3cmd" by Michal Ludvig (http://s3tools.logix.cz/s3cmd). It seems like s3fs would be a lot more convenient for me, but I'm having the issue that s3cmd and s3fs don't seem to be able to see each other's files.
I mount the s3fs volume, and there are no suspicious errors in /var/log/messages - I can mkdir a directory, unmount, remount and that directory is still there, along with its files. However, I can't see the files that s3cmd created in s3fs, and neither does s3cmd see the directory and/or files that s3fs created. I can't see any extra buckets being created in s3cmd either, so I don't think it's that the bucket name is wrong, although the name does include a hyphen - I don't know if this causes any problems somehow.
Any ideas what the problem might be? Or should I not be surprised that these two packages don't seem to read each other's entries?
Hi cartroo-
I've heard of s3cmd but have never used it until now; looks like s3cmd does not really have any concept of folders
in general, the various s3 client programs each have their own scheme for files and folders
s3cmd should be able to see files/folders created with s3fs, in raw form
s3fs should be able to see files created with s3cmd as long as the s3cmd commands issued are "compatible" with the way s3fs wants to view things; you probably don't really wanna do that though
(I'm typing this while listening to 3 other people, so, hope that makes sense!)
df shows that an s3fs filesystem is mounted; but it does not tell me which bucket it is.
How can I tell which bucket s3fs has mounted?
>>> How can I tell which bucket s3fs has mounted?
You can't if you're using df, unless, e.g., some sort of naming convention is followed.
You can, however, use something like "ps ax | grep s3fs".
I'm running an EC2 instance and using s3fs actively with 6 different buckets. The problem I'm seeing is that s3fs is consuming a great deal of memory, and does not seem to release it. I'm using the newest source, without caching.
Is this truly a memory leak, or is some other component not working right, e.g., fuse or curl?
Here's the info from top:
Any assistance would be appreciated.
Hi- what is the nature of the s3fs memory consumption? just trying to characterize the problem: does it slowly ramp up over time or does it consume that much memory right away? as well, are the files large in size? are there directories with lotsa files? Thanks
A quick summary would be yes to all! Files range 4-5 MB for flv, lotsa php scripts, and we use a three-tier folder layout, so always 3 sub-directories in most cases.
Thanks for your quick response !
Hi- about the only s3fs resource I can think of that's unbounded is curl handles; s3fs maintains a pool of persistent curl handles; if s3fs needs a curl handle and the pool is empty then it allocates a new curl handle and then returns it to the pool; under normal use I can see the pool having, say 50 curl handles, however, under heavy concurrent use it could be 200-300+; not sure off hand how much memory a curl handle consumes; might want to monitor the s3fs process using "top" and then use s3fs in a highly concurrent manner and see if there is a continuous memory consumption ramp up
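To make the pool behaviour described above concrete, here is a small sketch of a grow-on-demand pool. It is an illustration only, not the s3fs code: the class name and the optional max_idle cap are assumptions, and plain Python objects stand in for curl handles.

```python
import threading


class HandlePool:
    """Pool of reusable handles that grows on demand.

    With max_idle=None the pool never discards a handle, so its size ends up
    equal to the peak number of handles that were in use at the same time.
    """

    def __init__(self, factory, max_idle=None):
        self._factory = factory
        self._max_idle = max_idle
        self._idle = []
        self._lock = threading.Lock()
        self.created = 0

    def acquire(self):
        with self._lock:
            if self._idle:
                return self._idle.pop()
            self.created += 1
        return self._factory()

    def release(self, handle):
        with self._lock:
            if self._max_idle is None or len(self._idle) < self._max_idle:
                self._idle.append(handle)
            # Otherwise the handle is dropped and can be garbage collected.

    def idle_count(self):
        with self._lock:
            return len(self._idle)


if __name__ == "__main__":
    pool = HandlePool(factory=object)
    burst = [pool.acquire() for _ in range(300)]  # 300 requests in flight at once
    for handle in burst:
        pool.release(handle)
    print(pool.created, pool.idle_count())
```

In this model memory tracks the highest concurrency ever seen and then stays flat, which would look like a ramp followed by a plateau in top rather than a continuous climb. Passing a max_idle value shows one way such a pool could be bounded.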
Hi, this project seems like a perfect fit for me. However, I'd like to join in with the others in asking for EU bucket support. Do you suppose you could give a general idea of when you would be able to find the time? AFAIK EU buckets work exactly the same way as US, except for changing the site-id or something similar (I haven't had a look at the S3 API for over a year).
Hi,
Apparently rsync does not correctly set the content-type of files. when I copy jpeg files using 'cp' the content-type is set to 'image/jpeg'. When I copy the same files with rsync the content-type is 'application/octet-stream'. Any help would be appreciated.
Hi Martin- I did have a peek at what it would take for EU bucket support; the original US bucket naming scheme that s3fs uses is different and not compatible with the new EU bucket naming scheme (e.g., mixed case, underscores, etc...), see http://docs.amazonwebservices.com/AmazonS3/2006-03-01/index.html?BucketRestrictions.html if s3fs were to use the new EU bucket naming scheme then that would break s3fs for existing buckets that were created using the original US bucket naming scheme; is the subcloud binary an option?
Hi voegtlin- indeed, sounds like a bug; s3fs sets content-type based on file name extension; rsync does its initial upload with a temporary file that ends with random characters; since those random characters do not match any known file extension from /etc/mime.types, s3fs sets the content-type to octet-stream; having said that, s3fs "rename" would need to be enhanced to re-lookup the content-type based on the new file extension during rename
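The effect can be reproduced with a toy extension lookup. This is a sketch under assumptions: the four-entry table stands in for /etc/mime.types, the function names are invented, and the temporary file name is only an example of the ".name.XXXXXX" pattern rsync uses.

```python
import os

# Tiny stand-in for /etc/mime.types (assumption: the real table is far larger).
MIME_TYPES = {
    "jpg": "image/jpeg",
    "jpeg": "image/jpeg",
    "htm": "text/html",
    "html": "text/html",
}
DEFAULT_TYPE = "application/octet-stream"


def lookup_mime_type(name: str) -> str:
    """Case-insensitive content-type lookup based on the file name extension."""
    ext = os.path.splitext(name)[1].lstrip(".").lower()
    return MIME_TYPES.get(ext, DEFAULT_TYPE)


def rename_meta(meta: dict, new_name: str) -> dict:
    """Return a copy of the object metadata with the content type looked up
    again from the new name, as an enhanced rename would have to do."""
    updated = dict(meta)
    updated["Content-Type"] = lookup_mime_type(new_name)
    return updated


if __name__ == "__main__":
    temp_name = ".photo.jpg.Xa81bc"  # rsync-style temporary name (example)
    meta = {"Content-Type": lookup_mime_type(temp_name)}
    print(meta["Content-Type"])
    print(rename_meta(meta, "photo.jpg")["Content-Type"])
```

The first lookup falls back to application/octet-stream because "Xa81bc" is not a known extension; only a fresh lookup at rename time recovers image/jpeg.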
thank you for the explanation. I have another question. I have noticed that 'stat' returns a size of zero if a file has been recently created. I create a file (about 20K in size) and do a 'stat' right after it is created, to check that its size is correct. The returned size is zero. However, if I add a 'sleep 0.1' command between file creation and the 'stat' command, then the size is correctly returned. Is this a bug? Is there a way to avoid this?
Hi mite.net- I guess an EU version is an option
Hi voegtlin- do you see this behavior 100% of the time? or is it intermittent? as well, are you doing the stat in parallel before the 20k file has finished uploading? or is everything being done serially? any feel for if it is "eventual-consistency" related?
no, I do it sequentially, and it happens 100% of the time. the file is a thumbnail of an image, that is created with 'convert'; perhaps it has to do with convert ?
dunno if this has anything to do with it http://developer.amazonwebservices.com/connect/thread.jspa?threadID=25535&tstart=0
can you post a script that reproduces the issue?
Works fine (once I got the right program!) I haven't tested all the combinations, but at least /etc/passwd-s3fs is used correctly.
Sorry for the wasted bandwidth. I deleted my postings (to preserve my honor and to prevent more confusion), so you might want to delete yours as well.
I'm running Fuse 2.7.4 with the latest S3FS r177 on Amazon EC2 with Linux kernel 2.6.18. About once a week I'll see a server totally lock up, and then after a reboot I go into /var/log/messages where I see the following "soft lockup" resulting from fuse (and s3fs is the only fuse filesystem we run). Any ideas on what could be causing this?
I've seen references to that before; don't know what its all about http://www.google.com/search?hl=en&q=fuse+BUG%3A+soft+lockup+detected+on+CPU%231!&btnG=Search
works for me on Ubuntu 8.10 server after loading the required packages. I note that AWS returns a "403 Access Denied" error if a nonexistent bucket name is specified - this caused me to spend a long time entering the authentication information in slightly different ways until I used the jets3t browser to note that the bucket I think of as "default" when using Jungle Disk has a much longer name from AWS' (and hence s3fs') perspective. with the bucket name fixed, everything's great.
Regarding the rsync performance: We have found that it is up to 4-5 times faster to mount a bucket on an EC2 instance and rsync remotely to that instance (e.g. tunneled over ssh) than to mount s3fs locally and rsync to the mounted directory. And less expensive too. I might write a simple howto about this method.
Michal Frackowiak http://michalfrackowiak.com
Can I mount the same S3 bucket using S3fs on multiple servers? if so, any known problems?
Thanks
Also, can you please tell me what dependencies I need to install s3fs?
Thanks
what os are you building/installing on?
I am on Debian, thanks!
search this page for "apt-get"
Thanks mate!
Another question, Can I mount the same S3 bucket using S3fs on multiple servers? if so, any known problems?
Thanks!
yes!
And no known problems or issues or things I should watch out for if I attach the same bucket to multiple instances?
Thanks so much for your help
no known problems other than the usual concurrency issues, e.g., no different than NFS mounting the same folder from more than one machine
Thanks!
hi, I mounted a bucket and created some folders and files in it. I mounted the same bucket on another instance. I do not see the files created on the first instance in the second one. When I unmount and mount it again then I see the files.
Second issue is that the changes done using s3fs are not visible from S3 firefox organizer. Am I missing something here.
Thanks
don't know why files are not appearing from the perspective of the second instance... same accessKeyId? same bucket?
as far as s3fox goes, I believe s3fox uses a different scheme for representing files/folders that is incompatible with s3fs; I recommend using jets3t instead because it makes no assumptions about the contents of an s3 bucket
I'm able to mount my bucket and see my file listing in it, but I'm getting input/output errors. I'm assuming that this is not a problem with my credentials or bucket since I'm getting a listing, I don't have a clock skew. I'm seeing the following:
ls -l /mnt
ls: cannot access /mnt/dir1: Input/output error
ls: cannot access /mnt/list: Input/output error
ls: cannot access /mnt/dir2: Input/output error
total 0
?????????? ? ? ? ? ? dir1
?????????? ? ? ? ? ? list
?????????? ? ? ? ? ? dir2
I've reproduced this on two separate Ubuntu 8.10 machines. I'm not seeing anything in /var/log/messages. Any suggestions for some things I can try to get this working?
Sorry, let me reformat that output:
probably trying to read a bucket w/objects that were created by another s3 tool? if so then s3fs is looking for additional meta data and not finding it
These files were uploaded by s3sync.rb, the Ruby rsync clone... Anyway to fix this?
no real way to fix it because it is not really a problem per-se; s3fs and s3sync.rb do not understand each other's formats; solution would be to re-upload files w/s3fs (alternatively, figure out which meta data needs to be set to make s3fs happy then use another s3 tool to set meta data on those files uploaded w/s3sync.rb)
So is the official answer, at least for now, that s3fs only works with objects uploaded by s3fs, and not by other tools like s3fox? Does anyone know an easy way to convert the objects?
Could somebody give a rough idea of costs please? Do you pay for data transferred, or by transaction? Does the storage cost depend on the amount of data in your bucket, or just the bucket size?
Can you join buckets for larger storage sizes?
http://aws.amazon.com/s3/#pricing
I'm seeing the same error as joeauty. I mount a new empty bucket and the process finishes with no error. However, when I try to go to the mounted directory, it tells me permission is denied.
ls -l returns
Any clue what's going on here? S3FS is perfect for my needs and I'd really like to get it running.
Thanks
oh, forgot to mention -- I'm running Ubuntu 8.10 and the latest version (177) of S3FS. /var/log/messages isn't helping much -- all I get is:
Does S3FS and/or the underlying S3 storage provide any form of integrity checking? I.e., is there any way a file can be corrupted during upload that would not be detected?
s3fs does not check Content-MD5
I'm trying to figure out why rsync'ed files to S3 wouldn't be viewable in Firefox/Google Chrome. For example, if I try to view a .htm or .jpg file, it prompts me to open or save the files. Oddly enough they view fine in IE. Any ideas? Maybe s3fs isn't recognizing my /etc/mime.types?
that is possible; in ff, what does "Tools -> Page Info" say about the Content-Type?
It says "text/html" if I open a .htm file.
ah, I know what the problem is: s3fs does not consult mime.types on renames... when rsync copies, it copies to a tmp file and then renames it at the end, thus the loss of s3fs content-type... to fix it, just add these lines to s3fs_rename
meta["Content-Type"] = lookupMimeType(to);
meta["x-amz-metadata-directive"] = "REPLACE";
(Note- NOT tested!)
Thanks, that worked perfectly :)
Is the size of the local file cache constrained? Or will it just cache files until the disk fills up? What happens when the disk is full?
no constraints... as a workaround, use a cron job to periodically prune/purge the local file cache
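A cron-driven prune could be as simple as the sketch below, which deletes the least recently accessed files until the cache fits under a size limit. The function name and the size limit are assumptions. It also relies on access times being maintained (a noatime mount would defeat it) and does not check whether s3fs is using a file at the moment it is removed.

```python
import os


def prune_cache(cache_dir: str, max_bytes: int) -> list:
    """Delete least recently accessed files until the cache fits in max_bytes.

    Returns the list of removed paths, oldest first.
    """
    entries = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            st = os.stat(path)
            entries.append((st.st_atime, st.st_size, path))

    total = sum(size for _atime, size, _path in entries)
    removed = []
    for _atime, size, path in sorted(entries):  # oldest access time first
        if total <= max_bytes:
            break
        os.remove(path)
        total -= size
        removed.append(path)
    return removed


if __name__ == "__main__":
    import sys
    # Usage: prune_cache.py CACHE_DIR MAX_BYTES
    for path in prune_cache(sys.argv[1], int(sys.argv[2])):
        print("removed", path)
```

Run from cron against the directory given to -ouse_cache, this keeps the cache from filling the disk.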
I'm mounting an S3 volume to multiple servers. When a user uploads an image to one of the servers, I want it stored on the S3 bucket so all servers can see the same images no matter which server uploaded it or which server is serving their page.
The bucket mounts to the instances without trouble, but I'm having trouble getting the permissions correct so that the apache user can write to the bucket. The bucket is symbolically linked from my web directory to /mnt/mybucket. I've also done the -o allow_other command when mounting the bucket. The bucket is set as world readable and I've tried everything from owner writable to world-writable. If I look at the directory permissions, it is showing rwxr-xr-x.
Is there a way to do this???
A workaround for the rsync/mime-type issue mentioned on "Feb 14, 2009" is to use the "--inplace" option with rsync. This forces it to write to the correct filename to start with, rather than writing to a temp file and renaming it. Which means the right content-type is set.
The '.' and '..' entities are not listed with 'ls -a'. Is this the expected result?
>>> The '.' and '..' entities are not listed with 'ls -a'. Is this the expected result?
yes
Hello! I've been using it for a few days on my EC2 server (an Ubuntu x86 Server instance). Yesterday, one of my s3fs mounts stopped working without reason, and would print "Transport endpoint is not connected" as a result of any attempt to access it. Re-mounting it fixed the problem (temporarily?).
Running ls -l in the parent folder displayed the "defective" mount point in red, with plenty of question marks:
In my /var/log/messages, I have a s3fs segfault (This one-line entry is the only interesting thing...)
This is everything I could find - Although it probably doesn't help much... :( Has anybody else had this issue? Any chances to get it fixed? (May I help? How?)
Anyway, Thanks for this great software ;)
I noticed that ls -l can return d????????? when a directory name contains a trailing /, which can be present if some other s3 clients have interacted with the bucket. I don't know if this applies in your case.
Hi - I would just like to add my voice to the requests for an Amazon S3 EU-compatible version of s3fs, please? :-)
Mieses, I didn't have any directory inside the crashed bucket (Just a long and messy list of files), so I think the "/" problem probably doesn't affect me.
And no other client was active when it crashed.
Crashed again :( Is anybody else running s3fs in a server? I'm getting these random crashes every 1-2 days. :(
I am backing up a lot of data with a script and rsync. All was well until I modified my script with nano, and because the cache had filled my OS drive (small, 10 GB) my edit wasn't saved and I lost my script. I have currently disabled the cache. Is there any way to manage the cache? Is it really required if mainly writing data?
I downloaded the "featured" code bundle s3fs-r177-source.tar.gz and compiled it on my Ubuntu 8.10 box. It never managed to establish a proper connection (I didn't get to the bottom of why) but my S3 Account Usage says I've made 1.3 million requests and will be charged accordingly! I imagine the code must have been looping, but that could prove to be a costly loop for me...
st...@bov.nu- there is no local cache management; you can use a cron job to periodically purge the local cache; in your case it sounds like you probably don't even need local cache-
paulo.raca- I'm not aware of any crash conditions in the s3fs code itself; wondering if its one of the libraries? can you get a coredump and invoke gdb on it for a traceback?
A cautionary tale (and some debugging tips) about using this with SSL: If you are going to use -ourl=https://s3.amazonaws.com, make sure you do not have a trailing slash on the end of the URL (I did this at first, and it took quite a lot of effort to work out why). If it works without SSL but not with it, a good thing to check is that you don't have a trailing slash on the end of your URL.
In case you are getting a different problem, here are some debugging tips I worked out while debugging this:
It writes information to syslog - check /var/log/syslog or where your system logs to for lines, which look like this:
If you get an error response (like 403), AWS sends back a response in an XML language explaining the error. The problem is you can't see the error. If the problem occurs for HTTP too, the easiest way is to use wireshark or tcpdump or another packet capture program to spy on what s3fs is doing and read the error message. If, like I was, you are encountering a problem exclusively for SSL, however, read on.
My problem meant that readdir (triggered by ls), as well as practically every other call, were not working. For ease of debugging, I chose to debug what happened when readdir is called.
s3fs sets the curl option CURLOPT_FAILONERROR, which makes it hard to get the output. So I went into the function starting with:
s3fs_readdir(const char *path, void *buf, fuse_fill_dir_t filler, off_t offset, struct fuse_file_info *fi) {
and changed the following line:
curl_easy_setopt(curl, CURLOPT_FAILONERROR, true);
to:
curl_easy_setopt(curl, CURLOPT_FAILONERROR, false);
Change this line and recompile by running make.
This is only a temporary change for debugging - it will cause problems if used in production, so don't forget to change the line back and recompile once you solve the problem.
s3fs forks when it runs, so the best way to debug it is to start it normally from the command line, and then attach to it with gdb...
ps ax | grep s3fs
From here, get the PID of s3fs, and run
where ./s3fs is the path to your binary, and 1743 is the PID. Type:
Now trigger the problematic request with the ls command from another shell, and change back to the gdb shell.
A breakpoint will hit in calc_signature as follows:
Breakpoint 3, calc_signature (method=
{static npos = 18446744073709551615, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x42037dc0 "h\006G\001"}}, content_type=
{static npos = 18446744073709551615, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x42037db0 "8\020���\177"}}, date=
{static npos = 18446744073709551615, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x42037da0 "�\005G\001"}}, headers=0x1479620, resource=
{static npos = 18446744073709551615, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x42037d90 "H\003G\001"}}) at s3fs.cpp:393
393 calc_signature(string method, string content_type, string date, curl_slist* headers, string resource) {
(gdb) next
394 string Signature;
(gdb)
395 string StringToSign;
(gdb)
396 StringToSign += method + "\n";
(gdb)
397 StringToSign += "\n"; // md5
(gdb)
398 StringToSign += content_type + "\n";
(gdb)
399 StringToSign += date + "\n";
(gdb)
400 int count = 0;
(gdb)
401 if (headers != 0) {
(gdb)
404 if (strncmp(headers->data, "x-amz", 5) == 0) {
(gdb)
402 do {
(gdb)
411 StringToSign += resource;
(gdb)
413 const void* key = AWSSecretAccessKey.data();
(gdb)
414 int key_len = AWSSecretAccessKey.size();
(gdb) print StringToSign.c_str()
$1 = 0x1470378 "GET\n\n\nSat, 21 Mar 2009 03:50:45 GMT\n/wwjcode"
(gdb) cont
Next, it will likely break in writeCallback, which you can debug like this...
In this case, you will see that the string we signed has /wwjcode in it, and the expected string has //wwjcode, because of the extra slash on the end of the URL. It is likely you will have some different type of error, but hopefully this is enough to get you started in debugging the problem.
Don't forget to change the CURLOPT_FAILONERROR option back to true and re-run make or you will get hung requests!
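The mismatch found in this debugging session can be reproduced outside s3fs. The sketch below rebuilds the string-to-sign in the same order as the calc_signature trace (method, empty MD5 line, content type, date, optional x-amz headers, resource) and signs it with HMAC-SHA1 plus base64, as the S3 REST authentication scheme of that era does. The function names, the naive URL concatenation, and the dummy secret "bbb" are illustrative assumptions, not the s3fs code.

```python
import base64
import hashlib
import hmac
from urllib.parse import urlparse


def string_to_sign(method, content_type, date, resource, amz_headers=()):
    """Build the string to sign; the empty second field is the unused MD5 line."""
    parts = [method, "", content_type, date]
    parts.extend(amz_headers)  # assumed form: "x-amz-name:value"
    parts.append(resource)
    return "\n".join(parts)


def sign(secret_key: str, to_sign: str) -> str:
    """HMAC-SHA1 of the string to sign, base64 encoded."""
    digest = hmac.new(secret_key.encode(), to_sign.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def request_path(base_url: str, bucket: str) -> str:
    """Path the server sees when the client naively joins URL and bucket."""
    return urlparse(base_url + "/" + bucket).path


if __name__ == "__main__":
    date = "Sat, 21 Mar 2009 03:50:45 GMT"
    client = string_to_sign("GET", "", date, "/wwjcode")
    for url in ("https://s3.amazonaws.com", "https://s3.amazonaws.com/"):
        server = string_to_sign("GET", "", date, request_path(url, "wwjcode"))
        print(url, sign("bbb", client) == sign("bbb", server))
```

With the trailing slash the server-side path becomes //wwjcode, so the two signatures no longer match, which is consistent with the 403 described above.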
In order for me to get /etc/fstab working, I needed to install fuse-utils.
It works great, but when I delete a file from the mount, it is not deleted from the S3. Why?
I am using rsnapshot with s3fs as the target for offsite backups. My rsnapshot logs show the following error:
[09/Jul/2009:23:50:45] /usr/bin/rsnapshot daily: ERROR: /bin/cp -al /mnt/s3/daily.0 /mnt/s3/daily.1 failed (result 256, exit status 1). Perhaps your cp does not support -al options?
Specifically, the -l is what's making it fail. The -l flag tells cp to link rather than copy, and a manually-invoked cp -l shows the following:
# cp -l /mnt/s3/test /mnt/s3/test2
cp: cannot create link `rsnap2': Operation not permitted
Hard links on s3fs are not supported. Symbolic linking, however, seems to be supported, with cp -s succeeding.
Is hard link support something that can be implemented?
Hi- hard links imply reference counting, so, unless amazon s3 supports references in the future, it is unlikely that s3fs will support hard links
Hello,
I am having some trouble with the current s3fs on MacFUSE 2.0.3 running on OS X 10.5.7. I can successfully mount a bucket and see its contents, but then can not copy new files into it. If I try to copy a file into S3, I will get an "Invalid argument" error and an empty file.
However, if I repeat the operation, the "Invalid argument" error goes away (although I still get an "Attribute not found" error), and the file now exists in S3.
Furthermore, drag-and-drop from the Finder generates an error and an empty file:
Any ideas on why this might be happening? I have tried s3fs on the same Amazon bucket on a similar configured computer at work, and I do not see this error. How to diagnose? I'm wondering if it might be network related?
Jeff
when you say it works on a similar configured computer at work, is that macfuse too? or is it linux
might want to look at the 'issues'; there are some macfuse related problems and solutions
What steps will reproduce the problem?
1. Compile s3fs.cpp with a modification:
#define FUSE_USE_VERSION 26
#define off_t off_t
2. Mount s3fs bucket on amazon, using the command
./s3fs mybucketxxx ~/Desktop/CCCS3 -olocal,ping_diskarb,volname=CCCS3
3. hdiutil create MacHD -size 50G -type SPARSE -fs HFS+ -layout GPTSPUD -stretch 50G -volname MacintoshHD
followed by an attempt to mount this:
hdiutil attach MacHD.sparseimage
What is the expected output? What do you see instead?
Expected output: mounted sparseimage
Seen:
$ hdiutil attach MacHD.sparseimage
hdiutil: attach failed - Illegal seek
What version of the product are you using? On what operating system?
Please provide any additional information below.
Version: 26
Operating system: Mac OS X Leopard 10.5.7
Yes, at work the computer is also running OS X with MacFUSE 2.0.3.
I have tried a third computer at home (also running OS X 10.5.7 with MacFUSE 2.0.3), and the third computer does not have the problem. So the problem seems to be isolated to that one computer and not the network. It is a mystery to me why different computers running the same software would respond differently.
Based on the error messages, I feel that maybe the issue has something to do with the way OS X deals with extended attributes (an issue that Linux doesn't have(?)). I'll dig deeper into this and post if I find anything useful.
My problem seems to be the same as issue 49 "Can't upload files". Unfortunately, there's no solution for that one either.
Have been using s3fs for some time now, and have begun to have problems with folders with many files. In one case there are 10000 files in the bucket, and any file scan (dir/ls) results in an s3fs lockup and, in one case, a server reboot. The command line used is:
s3fs -o allow_other fmc_data -o retries=15 -o connect_timeout=8 -o readwrite_timeout=40 /data
No caching is used, and I have modified the readdir routine, increasing max-keys to 150, which gave a small performance increase. I am tempted to raise this even more, since there are so many files. Am I heading in the right direction?
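As a rough back-of-the-envelope check on the max-keys question, the sketch below counts how many bucket-listing requests a directory scan needs for a given page size. It only models the listing calls themselves. Any extra per-file requests s3fs may issue to fetch attributes are not counted, and the function name list_requests is made up for this example. S3 caps max-keys at 1000 per request.

```python
import math

S3_MAX_KEYS_LIMIT = 1000  # S3 returns at most 1000 keys per listing request


def list_requests(num_keys, max_keys):
    """Number of paged listing requests needed to enumerate num_keys objects."""
    page = min(max_keys, S3_MAX_KEYS_LIMIT)
    return max(1, math.ceil(num_keys / page))


if __name__ == "__main__":
    for page_size in (150, 1000):
        print(page_size, list_requests(10000, page_size))
```

For 10000 files this works out to 67 listing round trips at max-keys=150 versus 10 at max-keys=1000, so raising the value does cut listing calls, though it will not help if most of the time goes elsewhere.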
for debian lenny, you'll need following packages
Things work beautifully for me except for any operation that involves permissions changes. For instance, a standard "cp filename dest" works fine, but "cp -p filename dest" does not. rsync, which is what I'm ultimately trying to use, does a chmod automatically, with the same results. A straight chmod errors out as well. I believe all of these errors are related, so fixing one thing will most likely resolve everything.
# chmod 777 testfile.zip
chmod: changing permissions of `testfile.zip': Input/output error
# /usr/bin/rsync -ru /u01/backup/exports/ /s3/backup/exports
rsync: rename "/s3/backup/exports/.testfile.zip.FQOmch" -> "testfile.zip": Input/output error (5)
rsync error: some files could not be transferred (code 23) at main.c(892) [sender=2.6.8]
The rsync is running as root and files are owned by a lesser privileged user, if it matters.
CentOS 5.3 s3fs r177 curl 7.15.5-2.1.el5_3.5
Any ideas?
Hi,
for those running into problems with European buckets, I made a small fork that works with the European bucket URL scheme.
It can be found at http://github.com/tractis/s3fs-fork
Hope it's useful!
Dave
great to see european bucket support - hope this will be merged back into main development.
anybody know why using rsync over s3fs is very (very) slow? thanks
try rsync --inplace
why does the filesize show up incorrectly on a zero byte file? the same for directories, the size that shows up on a directory is ridiculously huge. this seems to break some applications that aren't happy with opening a file for writing, running stat on the file, and finding out the file size is wrong.
for example ...
# touch foo
# ls -l foo
-rw-r--r-- 1 root root 18446744073709551615 Oct 18 01:51 foo
# mkdir bar
# ls -ld bar
drwxr-xr-x 1 root root 18446744073709551615 Oct 18 01:53 bar
nevermind, I figured out the problem. it turns out that starting with curl 7.19.4, it returns -1 if the Content-Length is not known. See my post here for further details and a quick and dirty patch.
rrizun, can you please commit my patch to trunk (and cleanup the patch if necessary) for those of us running a newer version of curl?
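To illustrate why a -1 length shows up as that huge number: 18446744073709551615 is exactly what -1 looks like when read as an unsigned 64-bit value. The sketch below reproduces the arithmetic and shows one possible guard (treat a negative length as 0). The names as_unsigned_64 and safe_size are made up here, and the clamp is only an assumption about how one might handle it, not necessarily what the patch mentioned above does.

```python
UINT64_MASK = (1 << 64) - 1


def as_unsigned_64(value):
    """Reinterpret a signed integer as an unsigned 64-bit value."""
    return value & UINT64_MASK


def safe_size(content_length):
    """Clamp an 'unknown' (negative) length to 0 before using it as a file size."""
    return 0 if content_length < 0 else int(content_length)


if __name__ == "__main__":
    print(as_unsigned_64(-1))  # the size seen in the ls output above
    print(safe_size(-1))
    print(safe_size(1234))
```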
Anybody successfully using tractis/s3fs-fork? I get Input/output error after mounting and doing an ls ...
TIA
When I try to install I get the following error:
[root@localhost s3fs]# make
Package 'libcurl' has no Version: field
g++ -ggdb -Wall -D_FILE_OFFSET_BITS=64 -I/usr/include/fuse -pthread -L/lib64 -lfuse -lrt -ldl -I/usr/include/libxml2 -lxml2 -lz -lm -lcrypto s3fs.cpp -o s3fs
s3fs.cpp:365:25: error: openssl/bio.h: No such file or directory
s3fs.cpp:366:28: error: openssl/buffer.h: No such file or directory
s3fs.cpp:367:25: error: openssl/evp.h: No such file or directory
s3fs.cpp:368:26: error: openssl/hmac.h: No such file or directory
s3fs.cpp:500:25: error: openssl/md5.h: No such file or directory
I have installed openssl. Does s3fs not know where my installation is?
@heininger, yes it worked but only when using sudo....
I want to "publicly" serve files with obscure URLs that include a directory component, e.g. ab/cd/efgh.jpg but I don't want ab or ab/cd to be listable. It looks like I'm OK, in that when I try to access a directory via an s3 URL, what I get is an empty file. Am I right that the directories are inaccessible except to s3fs? So if I don't give public read access to the bucket, then nobody can list anything?