Soliciting comments for FUSE filesystem idea


Bryan Ischo

Hi all.  I've been working through some initial design work for a FUSE
filesystem that I would like some input on.

I'm slowly working towards an implementation of an Amazon S3 FUSE
filesystem.  My goal is for it to be fully POSIX compliant and to have some
very sophisticated features, one of which is local file caching.

For those of you who don't know, Amazon S3 is a web-based file storage
service with fairly low prices and some other nice characteristics.  It
also has some significant and troubling shortcomings, but that's beside
the point here.

One issue I want to address is the performance problems that come from
using a networked source for file storage, especially one where the
network in question is the Internet rather than a local network.  In such
a situation latencies and bandwidth can have a hugely negative impact on
performance, and I want sophisticated caching to help alleviate these
problems.

Originally I had intended to implement some S3-specific caching into my
FUSE filesystem, but then I realized that this problem can be solved in
such a way as to be useful to any FUSE filesystem implementation, and this
solution is what I am hoping to discuss here.

So I'm thinking of writing:

- A library for locally caching files
- A FUSE filesystem based on this library

The purpose of this library would be to act as a caching layer, in between
the FUSE API itself and the back-end filesystem implementation.  As a
library, it could be used by any FUSE filesystem that wanted to avoid any
extra kernel-to-userspace overhead, by implementing the caching directly
in whatever FUSE filesystem includes it.

Creating a FUSE filesystem based on this library would also allow any FUSE
filesystem that wasn't written to incorporate the library to get the
benefit of local file caching, but at the expense of another round trip
through a FUSE filesystem.

As a simple introduction to the caching that I would like to implement:

1. Configuration parameters would specify the local filesystem directory
and the maximum number of bytes to cache at any one time.
2. Any request to read file data would result in a call from the caching
layer down to the back-end filesystem implementation to fetch the data,
and then the caching layer would cache that data on the local filesystem
so that subsequent requests for the same data would be served at local
filesystem speeds instead of always being constrained by network speeds.
3. Any request to write file data would be satisfied by the caching layer
immediately writing the data into the cache, and a separate flushing
thread would flush the modified data down to the back-end filesystem at a
later time.
4. The read caching code would perform configurable "lookahead" and
"lookbehind" reads to try to anticipate subsequent read operations and
speed them up.  This is a tradeoff that may or may not help depending on
the back-end implementation and the usage scenario, so it would definitely
be highly configurable.
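
Points 1, 2 and 4 can be sketched roughly as below. This is a minimal in-memory illustration under stated assumptions, not the proposed implementation: `ByteBoundedCache`, `blocks_to_fetch`, fixed-size blocks and least-recently-used eviction are all hypothetical choices, and a real cache would keep its blocks in the configured local directory rather than in a dict.

```python
from collections import OrderedDict
from typing import Optional, Tuple


class ByteBoundedCache:
    """Hypothetical LRU block cache capped at max_bytes (point 1)."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used = 0
        # (path, block index) -> block data, ordered least- to most-recently used
        self._entries: "OrderedDict[Tuple[str, int], bytes]" = OrderedDict()

    def get(self, path: str, block: int) -> Optional[bytes]:
        data = self._entries.get((path, block))
        if data is not None:
            self._entries.move_to_end((path, block))  # mark as recently used
        return data

    def put(self, path: str, block: int, data: bytes) -> None:
        old = self._entries.pop((path, block), None)
        if old is not None:
            self.used -= len(old)
        self._entries[(path, block)] = data
        self.used += len(data)
        # Evict least-recently-used blocks until the byte budget is respected.
        while self.used > self.max_bytes and self._entries:
            _, evicted = self._entries.popitem(last=False)
            self.used -= len(evicted)


def blocks_to_fetch(offset: int, size: int, file_size: int, block_size: int,
                    lookbehind: int, lookahead: int) -> range:
    """Block indices covering a read, widened by the configured lookbehind
    and lookahead (point 4) and clipped to the file's extent."""
    if size <= 0 or offset >= file_size:
        return range(0)
    first = offset // block_size
    last = (min(offset + size, file_size) - 1) // block_size
    last_in_file = (file_size - 1) // block_size
    return range(max(0, first - lookbehind), min(last_in_file, last + lookahead) + 1)
```

On a miss, the layer would ask the back end for each index in the returned range that `get` does not already hold and `put` the results; setting both window sizes to 0 turns the speculation off.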

So as a library, it would be the kind of thing that I could easily "plug
into" my existing FUSE filesystem, turning all FUSE API calls that were
originally handled by my filesystem into calls to the caching layer,
which would then make calls down into my original FUSE code as necessary.
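
A rough sketch of that "plug in" arrangement, assuming whole-file caching for simplicity: `Backend`, `read_file`/`write_file` and `CachingLayer` are hypothetical names standing in for the original filesystem's code, not libfuse API. Reads are served from the cache after one back-end fetch, writes only touch the cache and mark the file dirty, and `flush` is what the separate flushing thread (point 3) would call. Locking between FUSE worker threads and the flusher is deliberately omitted.

```python
from typing import Dict, Protocol, Set


class Backend(Protocol):
    """The original (slow) filesystem code, e.g. the S3 back end."""

    def read_file(self, path: str) -> bytes: ...

    def write_file(self, path: str, data: bytes) -> None: ...


class CachingLayer:
    """Sits between the FUSE callbacks and the back end (no locking shown)."""

    def __init__(self, backend: Backend) -> None:
        self.backend = backend
        self.cache: Dict[str, bytearray] = {}
        self.dirty: Set[str] = set()

    def _load(self, path: str) -> bytearray:
        if path not in self.cache:  # miss: one round trip to the back end
            self.cache[path] = bytearray(self.backend.read_file(path))
        return self.cache[path]

    def read(self, path: str, offset: int, size: int) -> bytes:
        return bytes(self._load(path)[offset:offset + size])

    def write(self, path: str, offset: int, data: bytes) -> int:
        buf = self._load(path)
        if len(buf) < offset:  # writing past EOF leaves a zero-filled hole
            buf.extend(b"\0" * (offset - len(buf)))
        buf[offset:offset + len(data)] = data
        self.dirty.add(path)  # defer the upload (write-back)
        return len(data)

    def flush(self) -> None:
        """Push every dirty file down to the back end, once each."""
        for path in sorted(self.dirty):
            self.backend.write_file(path, bytes(self.cache[path]))
        self.dirty.clear()
```

Because `flush` sends each dirty file once no matter how many writes hit it, this is also where write coalescing would happen.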

The standalone FUSE filesystem based on the library would provide this
service as follows:

The user would set up a mount of the caching FUSE filesystem tying the
filesystem to be cached to a mount point that all other system activity
would use.  For example, if I had an Amazon S3 FUSE filesystem called
"fuses3", I might issue commands something like this to mount and provide
a caching layer for it (assuming here that the caching FUSE filesystem was
named "fcache", which is my intended name for it):

a. fuses3 mys3bucket /mnt/rawmnt
b. fcache /mnt/rawmnt /mnt/mys3bucket

In (a), I am issuing a command to a FUSE filesystem implementation for
mounting an Amazon S3 bucket under the local mount point /mnt/rawmnt
(other options necessary for mounting this bucket as a filesystem are left
out here for the sake of brevity).

In (b), I am furthermore issuing a command to the caching FUSE filesystem
implementation to provide access to the filesystem at /mnt/rawmnt from a
new mount point, /mnt/mys3bucket, which is the mount point I would expect
the rest of the system to use to access the S3 bucket as a local
filesystem.  Access to this filesystem would proceed as calls into the
fcache FUSE filesystem, which would use its cache implementation to speed
up most access to local disk speeds, reading file data from /mnt/rawmnt
to fill the cache and writing modified data back to it as necessary.  So
for example:

If an application were to attempt to read file data from
/mnt/mys3bucket/somefile.txt, this would result in a call into the fcache
FUSE filesystem, which would read the data out of its cache if it is
available and return it to the caller.  But if the data were not cached,
the caching layer would simply read the same file data from
/mnt/rawmnt/somefile.txt, and cache the results for use by subsequent
reads from /mnt/mys3bucket/somefile.txt.
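
The read path in that example might look roughly like the following, assuming whole-file caching in a local directory (`cache_dir`) and the leading-slash, mount-relative paths that libfuse's high-level API passes to its callbacks. `cached_read` is a hypothetical helper; partial-block caching, eviction and freshness checks are left out.

```python
import os
import shutil


def cached_read(raw_mount: str, cache_dir: str, fuse_path: str,
                offset: int, size: int) -> bytes:
    """Serve a read of fuse_path (e.g. "/somefile.txt" as seen under
    /mnt/mys3bucket) from cache_dir, copying the whole file from raw_mount
    (e.g. /mnt/rawmnt) the first time it is requested."""
    rel = fuse_path.lstrip("/")
    local = os.path.join(cache_dir, rel)
    if not os.path.exists(local):  # cache miss
        os.makedirs(os.path.dirname(local), exist_ok=True)
        tmp = local + ".partial"
        # Fetch under a temporary name so an interrupted copy is never
        # mistaken for a complete cached file, then rename atomically.
        shutil.copyfile(os.path.join(raw_mount, rel), tmp)
        os.replace(tmp, local)
    with open(local, "rb") as f:  # hit (or freshly filled): local disk speed
        f.seek(offset)
        return f.read(size)
```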

I hope I am being clear here: my goal is to implement a single caching
layer that will be useful for any FUSE filesystem whose data source is
significantly slower than local disk storage (which I would expect to be
the case for almost any internet-service-based FUSE filesystem).  I intend
to implement this caching layer as a library that can be linked into a
given FUSE filesystem implementation, thus providing this sophisticated
caching, and also to write a FUSE filesystem wrapper around this library
for providing this same service, not just as a library, but as a fully
fledged FUSE filesystem that can act as a cache for any other independent
filesystem implementation which suffers from the same internet performance
characteristics.

First question: has this been done already?  Am I re-inventing the wheel?
I searched around but couldn't find anything that matches what I am
proposing doing.

Second question: does this sound like a good idea?  Is it something that
other people can see being useful for their own FUSE filesystem
implementation, or useful as a caching layer in their own system to cache
what previously had been uncached filesystems with poor performance?

Third question: am I missing anything obvious in my conception of how this
would work, that could cause significant problems or make the whole idea a
non-starter?

Fourth question: would it in practice be a good idea to implement a FUSE
filesystem wrapper around the library, or should I just stick with the
library itself?  In other words, would it just be pure craziness to mount
a FUSE filesystem that was backed by another FUSE filesystem, thus
requiring some requests to filter down through two FUSE layers and back up
again?  Or is this likely to not really be a big problem?

Thanks, and best wishes,
Bryan


------------------------------------------------------------------------
Bryan Ischo                [hidden email]            2001 Mazda 626 GLX
Hamilton, New Zealand      http://www.ischo.com     RedHat Fedora Core 5
------------------------------------------------------------------------



-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
fuse-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/fuse-devel

Re: Soliciting comments for FUSE filesystem idea

Patrick Eaton
Bryan,

> Second question: does this sound like a good idea?  Is it something that
> other people can see being useful for their own FUSE filesystem
> implementation, or useful as a caching layer in their own system to cache
> what previously had been uncached filesystems with poor performance?

If it is done well and provides a nice set of general hooks for others
to use, yes, I think you would find other users for the caching layer.

> Third question: am I missing anything obvious in my conception of how this
> would work, that could cause significant problems or make the whole idea a
> non-starter?

Yes, I do not see any considerations for determining if an object in
the cache is still valid.  Will you age objects based on simple
timeouts?  Or will you provide hooks so that the network filesystem
can callback into the caching layer when it determines that an object
in the cache should be invalidated and refetched from source?  There
are several other options too....

> For those of you who don't know, Amazon S3 is a web-based file storage
> service with fairly low prices and some other nice characteristics.  It
> also has some significant and troubling shortcomings, but that's beside
> the point here.

I would be interested in hearing what you consider the "significant
and troubling shortcomings" from your (or anybody else's) perspective
and background.  We can take that subtopic off-list.

cheers-
patrick



On Wed, Oct 22, 2008 at 6:52 AM, Bryan Ischo
<[hidden email]> wrote:

> [original message snipped]


Re: Soliciting comments for FUSE filesystem idea

Nikolaus Rath
"Patrick Eaton" <[hidden email]> writes:
>> For those of you who don't know, Amazon S3 is a web-based file storage
>> service with fairly low prices and some other nice characteristics.  It
>> also has some significant and troubling shortcomings, but that's beside
>> the point here.
>
> I would be interested in hearing what you consider the "significant
> and troubling shortcomings" from your (or anybody else's) perspective
> and background.  We can take that subtopic off-list.

Please keep it on the list. S3 filesystems seem to be one of the most
popular projects nowadays. I know of at least three: s3backer, s3fs (the
implementation hosted on Google Code), and s3fs (the implementation
discussed on the Amazon AWS forums). I'm also working on my own
implementation (http://code.google.com/p/s3ql/).

Best,


   -Nikolaus

--
 »It is not worth an intelligent man's time to be in the majority.
  By definition, there are already enough people to do that.«
                                                         -J.H. Hardy

  PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C



Re: Soliciting comments for FUSE filesystem idea

Allen Pulsifer
> S3 filesystems seem to be one of the most popular projects nowadays.

Add another to your list:

  PersistentFS http://www.PersistentFS.com

It is POSIX-compliant and includes extensive caching.



Re: Soliciting comments for FUSE filesystem idea

Bryan Ischo
In reply to this post by Patrick Eaton
Thank you for your response!

Patrick Eaton wrote:

>> Third question: am I missing anything obvious in my conception of how this
>> would work, that could cause significant problems or make the whole idea a
>> non-starter?
>>    
>
> Yes, I do not see any considerations for determining if an object in
> the cache is still valid.  Will you age objects based on simple
> timeouts?  Or will you provide hooks so that the network filesystem
> can callback into the caching layer when it determines that an object
> in the cache should be invalidated and refetched from source?  There
> are several other options too....
>  
Yes, I left quite a bit of detail out when I was describing the general
concept of what I am planning to do.  Indeed, configuration options for
controlling the behavior of this caching layer would need to be very
extensive to allow it to be useful in a wide variety of applications.

In terms of your specific point, I would expect this to be configurable;
I have considered both options you specified (simple timeouts and also
allowing the back-end filesystem to invalidate cache items manually).  I
haven't thought of too many other options, but if you have anything in
mind I'd be grateful to hear it.

In addition, there would need to be similar options for controlling how
often to flush writes to the back-end filesystem.
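
Both validity policies mentioned above, timeouts and explicit invalidation by the back end, could share one small structure. The sketch is hypothetical: `ValidityTracker`, `ttl_seconds` and `invalidate` are illustrative names. With `ttl_seconds=None`, entries never expire on their own and only the back end's `invalidate` call drops them; the clock is injectable so the behavior can be tested. A flush interval for dirty data could be tracked the same way.

```python
import time
from typing import Callable, Dict, Optional


class ValidityTracker:
    """Decides whether a cached file may still be served without refetching."""

    def __init__(self, ttl_seconds: Optional[float],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._fetched_at: Dict[str, float] = {}

    def mark_fetched(self, path: str) -> None:
        self._fetched_at[path] = self.clock()

    def is_valid(self, path: str) -> bool:
        fetched = self._fetched_at.get(path)
        if fetched is None:
            return False
        # Timeout policy: valid until ttl seconds have passed since the fetch.
        return self.ttl is None or self.clock() - fetched < self.ttl

    def invalidate(self, path: str) -> None:
        """Hook the back-end filesystem calls when it learns a file changed."""
        self._fetched_at.pop(path, None)
```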

>> For those of you who don't know, Amazon S3 is a web-based file storage
>> service with fairly low prices and some other nice characteristics.  It
>> also has some significant and troubling shortcomings, but that's beside
>> the point here.
>>    
>
> I would be interested in hearing what you consider the "significant
> and troubling shortcomings" from your (or anybody else's) perspective
> and background.  We can take that subtopic off-list.
>  

Since there were some subsequent requests to keep this particular part
of the discussion on-list, I'll respond here.

The most significant shortcoming of S3 is that it does not support
"partial file writes", meaning that writing any part of a file to S3
requires writing the entire file.  Whereas when reading files it is
possible to use byte-range HTTP headers to control the range of bytes to
be read, it is not possible to do this when writing files.  So if the
user has a large 3 GB database file and an application updates just a
few KB of this file, the ENTIRE 3 GB of the changed file must be
re-uploaded to Amazon S3 to effect that change.  This is another reason
for this caching layer I am speaking of - to coalesce writes so as to
minimize the number of times that a file needs to be uploaded to S3 -
but this only helps a little bit.  I consider this to be a major problem
in that it results in significant extra costs both in terms of money and
in terms of bandwidth usage.  A filesystem in which lots of changes were
going on might end up becoming rate-limited by the upload bandwidth as
the write cache fills up and whole-file flushes are performed.
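
To put rough numbers on how much coalescing helps, the hypothetical helper below assumes every flush of a dirty file re-uploads the whole object and that the flushing thread runs on fixed `flush_interval` boundaries; it compares that with uploading on every write.

```python
from typing import Iterable, Tuple

GB = 1024 ** 3


def upload_bytes(file_size: int, write_times: Iterable[float],
                 flush_interval: float) -> Tuple[int, int]:
    """Return (bytes uploaded if each write is sent immediately,
    bytes uploaded if writes are coalesced per flush interval)."""
    times = list(write_times)
    immediate = len(times) * file_size
    # One whole-object upload per interval that saw at least one write.
    intervals_with_writes = {int(t // flush_interval) for t in times}
    coalesced = len(intervals_with_writes) * file_size
    return immediate, coalesced
```

For five small writes to a 3 GB file, three in the first minute and two in the second, `upload_bytes(3 * GB, [0, 1, 2, 65, 66], 60)` gives 15 GB versus 6 GB: coalescing cuts the traffic, but every flush still costs the full 3 GB.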

What's even worse is that if there is a network failure while a large
file is being written to S3, then the upload must start over again; it
is not possible to resume writing at the last written byte.  With large
files, this can be a very significant problem.

A secondary problem is S3's "eventual consistency" mechanism.  There can
be a gap of indeterminate length between writing a file to S3 and being
able to read the same file contents back from S3.  During this gap, some
juggling would have to be performed by the caching layer to continue to
show the user the version of the file that they wrote instead of the
out-of-date version of the file that S3 will temporarily return.  This
is an issue that can be worked around in cache in this way, but since
files would have to be "pinned" in the read cache while waiting for S3
eventual consistency, in theory the read cache could fill up and hold
off reads indefinitely while waiting for some of these pinned files to
become unpinned.
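
The pinning problem can be made concrete with a small hypothetical cache (`PinningCache`, `admit` and `unpin` are illustrative names, not an existing API). Entries written locally are pinned until a read from S3 returns the new version, oldest-first eviction skips pinned entries, and `admit` returns False once pinned data fills the budget, which is the stall described above.

```python
from collections import OrderedDict
from typing import Tuple


class PinningCache:
    """Read cache whose pinned entries (written locally, not yet visible
    from S3) can never be evicted."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used = 0
        # path -> (data, pinned), ordered oldest to newest admission
        self.entries: "OrderedDict[str, Tuple[bytes, bool]]" = OrderedDict()

    def _make_room(self, needed: int) -> bool:
        for path in list(self.entries):  # oldest first
            if self.used + needed <= self.max_bytes:
                break
            data, pinned = self.entries[path]
            if not pinned:
                del self.entries[path]
                self.used -= len(data)
        return self.used + needed <= self.max_bytes

    def admit(self, path: str, data: bytes, pinned: bool = False) -> bool:
        """Cache data; return False if pinned entries leave no room."""
        old = self.entries.pop(path, None)
        if old is not None:
            self.used -= len(old[0])
        if not self._make_room(len(data)):
            if old is not None:  # keep the previous copy rather than lose it
                self.entries[path] = old
                self.used += len(old[0])
            return False
        self.entries[path] = (data, pinned)
        self.used += len(data)
        return True

    def unpin(self, path: str) -> None:
        """Call once a read from S3 returns the version that was written."""
        if path in self.entries:
            data, _ = self.entries[path]
            self.entries[path] = (data, False)
```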

Also, there's the 5 GB limit on files stored in S3.  This is very
unfortunate - however, given the problems described above with
whole-file uploads, it's not clear how wise it would be to store files
bigger than this on S3 anyway.

Additionally, S3 provides no file locking mechanism, making simultaneous
use of S3 mounted filesystems by multiple hosts a dicey proposition; it
becomes impossible to implement POSIX file locking semantics in such a
scenario.

Another problem that is not related to using S3 as a filesystem, but
that limits its overall utility, is the fact that HTTP requests to an S3
bucket that do not include a path result in an XML list-bucket
response rather than a redirect to an index.html key.  So if you have
a bucket named foo, and you have done your own CNAME mapping of
foo.your.com to foo.s3.amazonaws.com, then a user who types this into
their browser:

http://foo.your.com

will not be redirected to "http://foo.your.com/index.html"; instead,
they will get a nasty-looking clump of XML.  This means that hosting a
static web site on S3 is significantly less attractive.  I realize that
whole-site hosting is not really Amazon's intended purpose for S3, but it
seems like a very small addition to S3's functionality that
would result in a considerably more useful service.  And yet Amazon

The fact that Amazon hasn't implemented the "low-hanging fruit" features
that many developers have clamored for on its developer forums is
troubling too.  I am considering doing a large body of work to produce
software development tools that enhance the value of S3 (I've already
done some with my libs3 library at http://libs3.ischo.com/index.html),
and it concerns me that small additional features that Amazon could add
to increase the value of the tools I am developing are routinely pushed
off by Amazon's S3 development team.  It gives one the sense that
developing tools for S3 is in effect throwing one's efforts out there
without equal consideration by Amazon, who will not "lift a finger" to
help you make your tool better by supporting small additions to the
service on their end.  I can see being very frustrated by this issue if
I completed an Amazon S3 FUSE filesystem and Amazon never addressed any
of the concerns I have that could make my tool so much better.

Finally, S3 has had much worse reliability than I expected from such a
service.  I have read enough problem reports on the S3 developer forums
to realize that it's not nearly as rock-solid as you might assume.

Thanks, and best wishes,
Bryan



Re: Soliciting comments for FUSE filesystem idea

Nikolaus Rath
Bryan Ischo <[hidden email]> writes:

>>> For those of you who don't know, Amazon S3 is a web-based file storage
>>> service with fairly low prices and some other nice characteristics.  It
>>> also has some significant and troubling shortcomings, but that's beside
>>> the point here.
>>>    
>>
>> I would be interested in hearing what you consider the "significant
>> and troubling shortcomings" from your (or anybody else's) perspective
>> and background.  We can take that subtopic off-list.
>>  
>
> Since there were some subsequent requests to keep this particular part
> of the discussion on-list, I'll respond here.
>
> The most significant shortcoming of S3 is that it does not support
> "partial file writes", meaning that writing any part of a file to S3
> requires writing the entire file.  Whereas when reading files it is
> possible to use byte-range HTTP headers to control the range of bytes to
> be read, it is not possible to do this when writing files.  So if the
> user has a large 3 GB database file and an application updates just a
> few K of this file, the ENTIRE 3 GB of the changed file must be
> re-uploaded to Amazon S3 to effect that change.

Why are you trying to map files directly to blocks? That also gives
you a file size limit of 4 GB.

> Additionally, S3 provides no file locking mechanism, making
> simultaneous use of S3 mounted filesystems by multiple hosts a dicey
> proposition; it becomes impossible to implement POSIX file locking
> semantics in such a scenario.

I think you misunderstood the concept of S3. It is not meant to be a
file system but a block storage layer. Therefore the lack of features
like partial writes or file locking is by design - your harddrive
doesn't have those features either.

The way to use S3 is as a storage device *for* a filesystem. That
filesystem should of course take into account the peculiarities of the
block layer, but it will nevertheless be a full-fledged filesystem.
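
To illustrate the block-layer approach described here, the sketch below computes which fixed-size block objects a write touches, so that only those objects need re-uploading. The helper names and key format are hypothetical and do not describe any particular project's on-S3 layout.

```python
def touched_blocks(offset: int, length: int, block_size: int) -> range:
    """Indices of the fixed-size blocks overlapped by a write."""
    if length <= 0:
        return range(0)
    return range(offset // block_size, (offset + length - 1) // block_size + 1)


def block_key(prefix: str, index: int) -> str:
    """Hypothetical object name for block `index`."""
    return f"{prefix}{index:016x}"


# A 4000-byte update about 1 GB into a large file, with 4 KiB blocks:
keys = [block_key("blk_", i) for i in touched_blocks(1_000_000_000, 4000, 4096)]
```

Here the update straddles a block boundary, so two 4 KiB objects are rewritten instead of the whole file; the trade-off is many more objects and requests per file.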

Best,

   -Nikolaus


Re: Soliciting comments for FUSE filesystem idea

Bryan Ischo
Nikolaus Rath wrote:

> Bryan Ischo <[hidden email]> writes:
>  
>>>> For those of you who don't know, Amazon S3 is a web-based file storage
>>>> service with fairly low prices and some other nice characteristics.  It
>>>> also has some significant and troubling shortcomings, but that's beside
>>>> the point here.
>>>>    
>>>>        
>>> I would be interested in hearing what you consider the "significant
>>> and troubling shortcomings" from your (or anybody else's) perspective
>>> and background.  We can take that subtopic off-list.
>>>  
>>>      
>> Since there were some subsequent requests to keep this particular part
>> of the discussion on-list, I'll respond here.
>>
>> The most significant shortcoming of S3 is that it does not support
>> "partial file writes", meaning that writing any part of a file to S3
>> requires writing the entire file.  Whereas when reading files it is
>> possible to use byte-range HTTP headers to control the range of bytes to
>> be read, it is not possible to do this when writing files.  So if the
>> user has a large 3 GB database file and an application updates just a
>> few K of this file, the ENTIRE 3 GB of the changed file must be
>> re-uploaded to Amazon S3 to effect that change.
>>    
>
> Why are you trying to map files directly to blocks? That also gives
> you a file size limit of 4 GB.
>  

I think we must be miscommunicating here, as I am talking about files
and you are talking about blocks.  Technically S3 calls them "objects"
and I see no fundamental problem with mapping one "object" per FUSE
file.  I did not realize that FUSE limited files to 4 GB in length, and
if that is so, then indeed the 5 GB limit of S3's objects is irrelevant.
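To make the read/write asymmetry concrete: with one object per file, a FUSE read at some offset can be served by a GET carrying an HTTP `Range` header, while a write has no ranged equivalent and must re-upload the entire object. The following is a minimal sketch that only builds the header and computes the upload size. The function names are hypothetical and no real S3 client is involved. HTTP byte ranges are inclusive at both ends, so `bytes=0-99` covers 100 bytes.

```python
def range_header(offset, size, file_size):
    """Build the HTTP Range header for a FUSE-style read(offset, size).

    HTTP byte ranges are inclusive: "bytes=0-99" covers 100 bytes.
    Returns an empty dict when the read starts at or beyond EOF.
    """
    if offset >= file_size or size <= 0:
        return {}
    last = min(offset + size, file_size) - 1
    return {"Range": f"bytes={offset}-{last}"}


def whole_object_upload_size(file_size, offset, length):
    """Bytes sent to S3 when `length` bytes at `offset` change.

    Without partial writes, the whole (possibly grown) object is re-uploaded.
    """
    return max(file_size, offset + length)


if __name__ == "__main__":
    GB = 1024 ** 3
    print(range_header(4096, 4096, 3 * GB))              # {'Range': 'bytes=4096-8191'}
    print(whole_object_upload_size(3 * GB, 1000, 4096))  # 3221225472
```

For the 3 GB database example, changing 4 KB still costs a full 3 GB upload under this mapping, while reading those same 4 KB needs only a ranged GET.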

One reason to map files to S3 objects is that each file is then
available via normal HTTP access mechanisms.  So I could copy a file to
the FUSE S3 filesystem, then set it to be publicly readable (I have
ideas how to do this in a FUSE filesystem, using special group ownership
and file permissions on the FUSE side to control "canned ACL" policies
of the S3 objects), and then the file could be downloaded by a regular
web browser.  This would be nice because I could mount an S3
"filesystem", and use that filesystem to host files on S3; this would be
one easy way to host a static web site.
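One way the group-and-permission idea above might look, as a sketch under assumptions: the special group ID, and the rule that only that group's "other" bits select the ACL, are hypothetical. The returned strings are real S3 canned ACL names, which S3 accepts in the `x-amz-acl` request header when an object is uploaded.

```python
import stat

# Hypothetical GID of the special group whose files may be exposed publicly.
PUBLIC_GID = 5000


def canned_acl(st_mode, st_gid, public_gid=PUBLIC_GID):
    """Choose an S3 canned ACL from a file's group and permission bits.

    Hypothetical policy: files outside the special group stay private;
    for files in it, the 'other' read/write bits select the ACL.
    """
    if st_gid != public_gid:
        return "private"
    readable = bool(st_mode & stat.S_IROTH)
    writable = bool(st_mode & stat.S_IWOTH)
    if readable and writable:
        return "public-read-write"
    if readable:
        return "public-read"
    return "private"


def acl_headers(st_mode, st_gid, public_gid=PUBLIC_GID):
    """Request headers to attach to the object's PUT."""
    return {"x-amz-acl": canned_acl(st_mode, st_gid, public_gid)}
```

Under this policy, `chgrp` to the special group plus `chmod 644` would make a file downloadable by a browser, and removing the "other" read bit would make it private again.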

If I wanted to use S3 as a "block storage" mechanism then I would just
use s3backer, which is a very nice project that uses FUSE to present a
single large file to the user, which is then treated like a block device
using loopback mounting.  The net result is as if the S3 bucket is a virtual
hard drive.  However, this approach has some shortcomings, depending
upon your usage pattern.  The splitting of files up into 4K chunks, each
one stored in an S3 object, may result in a significant increase in S3
transaction costs.  Also, there are issues with the blocks not being
deleted from S3 when they are no longer used since filesystems typically
don't zero out blocks on the block device they are using when those
blocks are no longer needed (and zeroing out blocks is the only way for
s3backer to delete them from S3).  And finally, s3backer-hosted files
are not directly accessible from web browsers.
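The transaction-cost concern can be put in numbers. The sketch below assumes a fixed 4 KiB block per S3 object, as described above, and counts the PUT requests needed to store a whole file and the blocks touched by a small update. The helper names are hypothetical and this is not s3backer's code.

```python
BLOCK_SIZE = 4096  # 4K blocks, one S3 object each, as described above


def put_requests(file_size, block_size=BLOCK_SIZE):
    """PUTs needed to upload a whole file when each block is its own object."""
    return -(-file_size // block_size)  # ceiling division


def blocks_touched(offset, length, block_size=BLOCK_SIZE):
    """Indices of the blocks overlapped by the byte range [offset, offset+length)."""
    if length <= 0:
        return range(0)
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)


if __name__ == "__main__":
    size = 3 * 1024 ** 3  # the 3 GB database file from earlier in the thread
    print(put_requests(size))               # 786432
    print(len(blocks_touched(1000, 4096)))  # 2
```

So a 3 GiB file costs 786,432 PUTs to store initially, against a single PUT with one object per file, but a 4 KiB update at offset 1000 re-uploads only 2 blocks. Which mapping is cheaper depends on the usage pattern.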

>> Additionally, S3 provides no file locking mechanism, making
>> simultaneous use of S3 mounted filesystems by multiple hosts a dicey
>> proposition; it becomes impossible to implement POSIX file locking
>> semantics in such a scenario.
>>    
>
> I think you misunderstood the concept of S3. It is not meant to be a
> file system but a block storage layer. Therefore the lack of features
> like partial writes or file locking is by design - your harddrive
> doesn't have those features either.
>  
I am not particularly interested in what S3 was "meant" to be except
where that affects the existing or potential functionality of the
service.  I see no problem in using S3 in creative new ways to provide
functionality that, while it may not be what S3 was "intended" for, is
possible and works well.

I realize that file locking will likely never be in S3, and that's not a
particularly large issue for me, so probably I was incorrect in
including it in my list of "significant and troubling" issues.  I think
it's "significant" in its impact on an S3-based POSIX filesystem, but I
don't think it's "troublesome" as it's not something that could
reasonably be expected from such a service anyway.

> The way to use S3 is as a storage device *for* a filesystem. That
> filesystem should of course take into account the peculiarities of the
> block layer, but it will nevertheless be a full-fledged filesystem.
>  
Once again I think we are miscommunicating.  I don't understand the
difference between using S3 "as a storage device *for* a filesystem",
and what I am proposing to do - except that I'm not treating it as a
virtual block device, but as a whole-file storage mechanism.

As others have pointed out, several FUSE-based S3 filesystems already
exist.  So I think the concept has already been proven to a large degree.

Thanks,
Bryan



Re: Soliciting comments for FUSE filesystem idea

Bryan Ischo-7
In reply to this post by Nikolaus Rath
Nikolaus Rath wrote:
> I think you misunderstood the concept of S3. It is not meant to be a
> file system but a block storage layer. Therefore the lack of features
> like partial writes or file locking is by design - your harddrive
> doesn't have those features either.
>
> The way to use S3 is as a storage device *for* a filesystem. That
> filesystem should of course take into account the peculiarities of the
> block layer, but it will nevertheless be a full-fledged filesystem.
>  
Oh, and I forgot to point out in my previous email response - I don't
think this is a correct assessment of the concept of S3.  S3 is clearly
meant to be a whole-file storage mechanism, not a block storage
mechanism.  Amazon provides extensive support for hosting complete files
in S3, including setting standard HTTP headers such as Content-Type and
Content-Disposition.  Amazon's documentation also mentions using their
service as full-file storage as its primary purpose.  And furthermore,
there would be no reason to support 5 GB file lengths in S3 if S3 were
meant only to be used for block storage.  A maximum object size of 5 MB
would suffice if that were the case.

Thanks,
Bryan



Re: Soliciting comments for FUSE filesystem idea

Allen Pulsifer-3
In reply to this post by Nikolaus Rath
> The way to use S3 is as a storage device *for* a filesystem.
> That filesystem should of course take into account the
> peculiarities of the block layer, but it will nevertheless be
> a full-fledged filesystem.

This is pretty much exactly how PersistentFS works.



Re: Soliciting comments for FUSE filesystem idea

Nikolaus Rath
In reply to this post by Bryan Ischo-7
Bryan Ischo <[hidden email]> writes:

> Nikolaus Rath wrote:
>> Bryan Ischo <[hidden email]> writes:
>>  
>>>>> For those of you who don't know, Amazon S3 is a web-based file storage
>>>>> service with fairly low prices and some other nice characteristics.  It
>>>>> also has some significant and troubling shortcomings, but that's beside
>>>>> the point here.
>>>>>    
>>>>>        
>>>> I would be interested in hearing what you consider the "significant
>>>> and troubling shortcomings" from your (or anybody else's) perspective
>>>> and background.  We can take that subtopic off-list.
>>>>  
>>>>      
>>> Since there were some subsequent requests to keep this particular part
>>> of the discussion on-list, I'll respond here.
>>>
>>> The most significant shortcoming of S3 is that it does not support
>>> "partial file writes", meaning that writing any part of a file to S3
>>> requires writing the entire file.  Whereas when reading files it is
>>> possible to use byte-range HTTP headers to control the range of bytes to
>>> be read, it is not possible to do this when writing files.  So if the
>>> user has a large 3 GB database file and an application updates just a
>>> few K of this file, the ENTIRE 3 GB of the changed file must be
>>> re-uploaded to Amazon S3 to effect that change.
>>>    
>>
>> Why are you trying to map files directly to blocks? That also gives
>> you a file size limit of 4 GB.
>>  
>
> I think we must be miscommunicating here, as I am talking about files
> and you are talking about blocks.

No, I was just confused while writing. I meant to say "s3 objects"
instead of "blocks". I was thinking of blocks, because I would rather
map blocks than files to S3 objects.
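Mapping blocks rather than files to S3 objects means a small write only re-uploads the blocks it overlaps. Below is a minimal sketch of that read-modify-write path, with a Python dict standing in for the bucket. The key scheme `<inode>/<block index>`, the class name, and the block size are all assumptions for illustration, not the design of any particular filesystem mentioned in this thread.

```python
BLOCK_SIZE = 4096  # hypothetical block size


class BlockStore:
    """Stores files as fixed-size blocks in a dict standing in for an S3 bucket.

    Hypothetical key scheme: "<inode>/<block index>". Each assignment to
    self.objects models one PUT of a single block-sized object.
    """

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.objects = {}  # object key -> block contents (bytes)
        self.puts = 0      # number of simulated PUT requests

    def key(self, inode, index):
        return f"{inode}/{index}"

    def write(self, inode, offset, data):
        """Write data at offset, re-uploading only the overlapped blocks."""
        bs = self.block_size
        pos = 0
        while pos < len(data):
            index, start = divmod(offset + pos, bs)
            chunk = data[pos:pos + bs - start]
            k = self.key(inode, index)
            # Read-modify-write: fetch the old block (empty if absent),
            # zero-fill any gap before `start`, then splice in the new bytes.
            block = bytearray(self.objects.get(k, b""))
            if len(block) < start:
                block.extend(b"\0" * (start - len(block)))
            block[start:start + len(chunk)] = chunk
            self.objects[k] = bytes(block)
            self.puts += 1
            pos += len(chunk)
```

Writing three blocks' worth of data costs three PUTs; a later 5-byte patch at offset 5000 then costs one more PUT of block 1 only, instead of re-uploading the whole file.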


> Technically S3 calls them "objects" and I see no fundamental problem
> with mapping one "object" per FUSE file. I did not realize that FUSE
> limited files to 4 GB in length, and if that is so, then indeed the
> 5 GB limit of S3's objects is irrelevant.

No, I am talking about the S3 limitation (I think it is 4 GB and not 5
GB). I don't think that FUSE has any limits on the file size.

> One reason to map files to S3 objects is that each file is then
> available via normal HTTP access mechanisms. So I could copy a file
> to the FUSE S3 filesystem, then set it to be publicly readable (I
> have ideas how to do this in a FUSE filesystem, using special group
> ownership and file permissions on the FUSE side to control "canned
> ACL" policies of the S3 objects), and then the file could be
> downloaded by a regular web browser. This would be nice because I
> could mount an S3 "filesystem", and use that filesystem to host
> files on S3; this would be one easy way to host a static web site.

Yeah, I see your point. But I think S3 is just not made for that. IMO
the fact that in EC2, S3 is exposed as a block storage device
(/dev/sd*) rather than a mounted filesystem (e.g. /mnt/s3) also proves
that.

>> The way to use S3 is as a storage device *for* a filesystem. That
>> filesystem should of course take into account the peculiarities of the
>> block layer, but it will nevertheless be a full-fledged filesystem.
>>  
> Once again I think we are miscommunicating.  I don't understand the
> difference between using S3 "as a storage device *for* a filesystem",
> and what I am proposing to do - except that I'm not treating it as a
> virtual block device, but as a whole-file storage mechanism.

That's exactly my point. What I am trying to say is that the lack of
features like locking and partial writes is not a "significant
shortcoming". It is only a problem if you consider S3 as a file
storage mechanism, and that is, in my opinion, not fair. I would agree
that "S3 is not very good for file storage" but not that "S3 has
significant shortcomings".


> As others have pointed out, several FUSE-based S3 filesystems
> already exist. So I think the concept has already been proven to a
> large degree.

Sure. I wouldn't develop my own s3 filesystem if I did not agree with
you here.


Best,

   -Nikolaus




Re: Soliciting comments for FUSE filesystem idea

Nikolaus Rath
In reply to this post by Allen Pulsifer-3
"Allen Pulsifer" <[hidden email]> writes:
>> S3 filesystems seem to be one of the most popular projects nowadays.
>
> Add another to your list:
>
>   PersistentFS http://www.PersistentFS.com
>
> POSIX-compliant and includes extensive caching

If you count in the commercial variants, then there is of course also
Jungle Disk, www.jungledisk.com.



Best,

   -Nikolaus


