In the case of rm /mnt/Music/File2 on the Client, shouldn't there be a last line of rm /shares/Music/file2 on the Server side?
Is the database needed after the tasks are completed?
If yes, how is the database re-created, if the disk containing it crashes?
Thanks!
For "rm /mnt/Music/File2" on the Client, Samba takes care to delete the "/shares/Music/file2" symlink on the server, so Greyhole doesn't need to do it itself.
The database is not required to rebuild a Greyhole storage pool. If it disappears, re-creating an empty DB is enough for Greyhole to get back to work. The database stores old tasks, but only to be able to debug previous operations, if needed. Otherwise, there's the settings table that stores the sticky directories values, if any. It's better if those don't change, but if they do (because the old values disappeared, and new ones are chosen), it's not the end of the world. That would just mean your data would be moved around if you launch a --balance. And, you can always edit the sticky directories manually if you know where you want to keep sticky files.
I love the idea of greyhole, but I find the samba dependency a bit of a pain -- apart from being difficult to configure, it seems overkill if I want to use the JBOD functionality on linux.
It seems to me that it should be possible to reuse greyhole with FUSE -- it could also log to a (the same!) spool file, and call greyhole at the right time, as well as hiding the symlinks. At the backend, greyhole could do, well, exactly the same thing. You could even use FUSE to mount the share directory for samba. But having both options (FUSE or Samba) would be good.
I don't know if there are PHP bindings for FUSE, which would be the neatest. There are ones in Python, I know.
How stable is the log file format? Would this be possible?
Phil
Phil: yes, this is technically possible, but not on the roadmap for now. More options are listed in this issue: http://code.google.com/p/greyhole/issues/detail?id=35 Feel free to add your comments there, so that everything about this is all together.
Questions: what happens if the client does any of those operations while the server is still processing actions from the previous client operation? One example: client deletes and then immediately writes a new file2.
What happens if the server crashes at any point?
What happens if the client updates (not replaces) a file at any point (seek to sector 234323 and overwrite 3 sectors)? What if the client keeps the file open for a very long time?
I assume there is a process that occasionally scans for dead symlinks and rebuilds them? (possibly to a different volume, if the original volume died)
(I am seriously considering replacing WHS with this (via Amahi), and while I don't need it to be enterprise ready or real-time, I do need it to be safe). (I'm also considering Lustre, but that seems like massive overkill for a home).
Is it possible to tell Greyhole you're going to remove a drive, and it needs to maintain redundancy elsewhere before you do?
Does the filesystem of the volumes matter? I assume it's probably ext4, usually?
From "in details" I assume all client operations are executed in order, even after a crash, however (not being very familiar with VFS), is this before or after samba? For instance, if the client does things to the samba copy of the file, while the original version is still being rsynced to the greyhole volumes? Is locking used?
Basically, I'm hoping the client gets an ACID transaction view of the filesystem.
Does the MySQL database get cleaned up periodically?
There is also the question of what happens if the system volume, or the volume containing the MySQL database, dies before current operations are finished.
Sorry, not trying to be a pest, but I want to understand all possible states that this could be in, and what happens; (Drive-Extender refugee, insert all the reasons people want DE vs traditional RAID here).
Q: What happens if the client does any of those operations while the server is still processing actions from the previous client operation? A: The handling for those changed recently (in 0.8.99). Greyhole now tries to make sure it's not doing anything destructive by looking at the 'future' name of files it processes.
Q: What happens if the server crashes at any point? A: When it comes back online, it will restart the operation it was working on when the crash happened.
Q: What happens if the client updates (not replaces) a file at any point? (seek to sector 234323, and overwrite 3 sectors.) A: Greyhole sees that as a write. It does not differentiate writes vs changes; a write is a write, no matter if the file existed before it happened or not.
Q: What if the client keeps the file open for a very long time? A: Greyhole doesn't work on locked files. It considers a file locked if lsof lists it as open for writing.
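To make the lock check above concrete, here is a minimal Python sketch of asking lsof whether a file is open for writing. It is not Greyhole's actual code (Greyhole is written in PHP); the function names are made up for illustration, and it assumes lsof is installed and that its -F a field output reports the access mode as r, w, or u (read/write).

```python
import subprocess


def parse_write_access(lsof_fields: str) -> bool:
    """Return True if any access-mode field reports write (w) or read/write (u).

    Expects the output of `lsof -F a`, where each line starts with a
    one-letter field identifier (p = PID, f = descriptor, a = access mode).
    """
    for line in lsof_fields.splitlines():
        if line.startswith("a") and line[1:2] in ("w", "u"):
            return True
    return False


def is_locked(path: str) -> bool:
    """Hypothetical helper: treat `path` as locked if a process has it open for writing."""
    result = subprocess.run(
        ["lsof", "-F", "a", "--", path],
        capture_output=True,
        text=True,
        check=False,  # lsof exits non-zero when nothing has the file open
    )
    return parse_write_access(result.stdout)
```

A caller would skip the file while is_locked(path) is true and retry later, which matches the "skips operations on locked files" behaviour described in the answer.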
Q: I assume there is a process that occasionally scans for dead symlinks and rebuilds them? A: --fsck. Once a day, Greyhole checks whether the configuration changed since the last --fsck and, if so, runs one. It also runs once a week unconditionally.
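The schedule described above can be sketched as a small decision function. This is only an illustration in Python, not how Greyhole implements it: the function name and parameters are hypothetical, and "weekly" is assumed here to mean seven days since the last --fsck, whereas the real schedule may simply be a fixed weekly job.

```python
from datetime import datetime, timedelta


def fsck_due(now: datetime, last_fsck: datetime, config_changed_at: datetime) -> bool:
    """Decide, at the daily check, whether an fsck should run.

    Runs if the configuration changed after the last fsck, or if a week
    (assumed: 7 days) has passed since the last fsck regardless of changes.
    """
    config_changed = config_changed_at > last_fsck
    week_elapsed = now - last_fsck >= timedelta(days=7)
    return config_changed or week_elapsed
```

For example, with the last fsck on 1 January, an unchanged configuration means nothing runs on 2 January, but the check on 8 January triggers the weekly run.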
Q: Is it possible to tell Greyhole you're going to remove a drive, and it needs to maintain redundancy elsewhere before you do? A: greyhole --going=/path/to/drive/that/you/want/to/remove
Q: Does the filesystem of the volumes matter? I assume its probably ext4, usually? A: Not really, as long as it's a Linux filesystem (I wouldn't try NTFS or HFS).
Q: From "in details" I assume all client operations are executed in order, even after a crash, however (not being very familiar with VFS), is this before or after samba? For instance, if the client does things to the samba copy of the file, while the original version is still being rsynced to the greyhole volumes? Is locking used? A: Indeed, locking is checked. Greyhole skips operations on locked files.
Q: Does the MySQL database get cleaned up periodically? A: Each time the Greyhole service starts, it will optimize the work table. The archive table is never cleaned, and can become quite big after a while.
Q: And there is also what happens if it's the system volume, or the volume containing the MySQL database, that dies (before current operations are finished). A: If you lose the landing zone (the system drive by default on Amahi): you'll lose the files Greyhole didn't have time to move into the pool, obviously. If the database disappears: you'd lose the list of pending operations that Greyhole didn't have time to work on yet. Nothing a --fsck can't fix.
It would be helpful to point out that if you are (say) trying out Amahi, decide that you like it, and wish to add some Greyhole storage (for poor man's replication), a "greyhole --fsck" will do a nice job of queueing the files already stored in Amahi into the Greyhole drives.
This is a noob (I resemble that remark) sort of thing to want to do.