GetAttr calls being serialised

abulford
Hi all,

I'm implementing a FUSE filesystem which talks to a RESTful API to get the files and file information, sometimes over a high latency network.  File information and the files themselves are cached.  Typically getattr triggers a HEAD request to the API if the file information isn't already cached, and a subsequent open results in a GET request to the API.
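
A minimal sketch of the cache-then-HEAD behaviour described above, for illustration only.  It is not the poster's code: the `head` callable, the `ttl` value and the `(size, mtime)` record are all assumptions, and no real FUSE binding is used.

```python
import threading
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical attribute record: (size in bytes, modification time).
Attrs = Tuple[int, float]


class AttrCache:
    """Serve getattr from memory; fall back to a remote HEAD on a miss."""

    def __init__(self, head: Callable[[str], Optional[Attrs]], ttl: float = 60.0):
        self._head = head   # assumed callable wrapping the HEAD request
        self._ttl = ttl     # assumed expiry time, in seconds
        self._lock = threading.Lock()
        # path -> (time stored, attributes or None for "not found")
        self._entries: Dict[str, Tuple[float, Optional[Attrs]]] = {}

    def getattr(self, path: str) -> Optional[Attrs]:
        now = time.monotonic()
        with self._lock:
            hit = self._entries.get(path)
            if hit is not None and now - hit[0] < self._ttl:
                return hit[1]
        # Miss: make the slow request without holding the lock, so this
        # cache does not itself block lookups of other paths.
        attrs = self._head(path)
        with self._lock:
            self._entries[path] = (time.monotonic(), attrs)
        return attrs
```

With this shape, only the first getattr for a path pays for the HEAD request; later calls within the expiry window return from memory.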

I'm finding that the getattr calls appear to be coming in sequentially: while a single getattr call is in progress, no other getattr call is made.  Since a getattr can result in a HEAD request, which can take up to 1 second on the high latency network, no other file access can happen while the HEAD request is taking place, which could be problematic for me.

I am running in the default multi-threaded mode and am finding that other requests, such as open, do seem able to run in parallel, so I'm unsure why getattr would be forced to be run sequentially.

I've already posted this question on Stack Overflow, with a code sample and a bit more detail.  You can see the question here: http://stackoverflow.com/questions/18471238/should-the-fuse-getattr-operation-always-be-single-threaded

I've tried this on two boxes: Kubuntu (kernel version 3.8.0, FUSE version 2.9) and CentOS running on Xen (kernel version 2.6.18, FUSE version 2.7.4).  I get the same results on both.

I'm quite concerned about this because an issue with the endpoint or network could result in all file access being blocked, even if the files are cached locally.

Am I completely missing something, or is this a known/intentional constraint in FUSE?

Thanks,

Andy

Re: GetAttr calls being serialised

David Strauss-5
On Fri, Sep 6, 2013 at 4:12 AM, abulford <[hidden email]> wrote:
> I'm finding that the getattr calls appear to be coming in sequentially, so
> if a single getattr call is taking place then no other getattr call will be
> made.  Since a getattr can result in a HEAD request, which can take up to 1
> second on the high latency network, this means no other file access can
> happen while the HEAD request is taking place, which could be problematic
> for me.

The main reason you'd see a long sequence of getattr calls is
generally a follow-up to information returned from readdir. In our
file systems, we try to pre-fetch and cache the attribute information
(which will inevitably be requested) during readdir so we don't have
to hit the backend for each one.
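
A rough sketch of the prefetch idea above, not the code from the file systems David mentions.  The `list_dir` and `fetch_attrs` callables and the `(size, mtime)` record are hypothetical, and locking is left out for brevity.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Attrs = Tuple[int, float]  # hypothetical record: (size, mtime)


class PrefetchingFs:
    def __init__(self,
                 list_dir: Callable[[str], Iterable[Tuple[str, Attrs]]],
                 fetch_attrs: Callable[[str], Attrs]):
        self._list_dir = list_dir        # assumed: one backend call per directory
        self._fetch_attrs = fetch_attrs  # assumed: one backend call per file
        self._attrs: Dict[str, Attrs] = {}

    def readdir(self, dir_path: str) -> List[str]:
        names = []
        for name, attrs in self._list_dir(dir_path):
            # Remember the attributes now; a getattr per entry usually follows.
            self._attrs[dir_path.rstrip("/") + "/" + name] = attrs
            names.append(name)
        return names

    def getattr(self, path: str) -> Attrs:
        # Only paths that were never listed reach the backend.
        if path not in self._attrs:
            self._attrs[path] = self._fetch_attrs(path)
        return self._attrs[path]
```

This only helps when the backend can return attributes together with the directory listing.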

--
David Strauss
   | [hidden email]
   | +1 512 577 5827 [mobile]

_______________________________________________
fuse-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/fuse-devel

Re: GetAttr calls being serialised

abulford
David Strauss-5 wrote:
> The main reason you'd see a long sequence of getattr calls is
> generally a follow-up to information returned from readdir. In our
> file systems, we try to pre-fetch and cache the attribute information
> (which will inevitably be requested) during readdir so we don't have
> to hit the backend for each one.

I see what you mean - normally a user might list the content of a directory, which would result in a readdir followed by a getattr call for each file in the directory, and there could be lots, generating a string of sequential getattr calls, like those my question asks about.  In my situation my test harness is specifically trying to open a large number of known file paths all at the same time (using multiple threads).  It's calling fopen on the full path, rather than determining the contents of its parent directory through readdir, so unfortunately I'm unable to make use of readdir to pre-fetch the content as you suggest.

Your comment does relate to something I've recently discovered, though - the sequential access is only on a per directory basis.

So, for example, if the test harness opens '/mnt/fs/dir1/file1.ext' and '/mnt/fs/dir1/file2.ext' then I first get the getattr call for '/dir1/file1.ext', and only once it's finished (which is slow when I intentionally put a delay in the endpoint) do I get the getattr call for '/dir1/file2.ext'.

However, if my test harness opens '/mnt/fs/dir1/file1.ext' and '/mnt/fs/dir2/file2.ext' then I get both getattr calls immediately, so they are both executing at the same time.
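
A sketch of the kind of harness described here, assuming one thread per path.  The function name `open_concurrently` is made up for illustration; the poster's actual harness (which calls fopen) is not shown in this thread.

```python
import threading
import time
from typing import Dict, List, Tuple


def open_concurrently(paths: List[str]) -> Dict[str, Tuple[float, float]]:
    """Open every path from its own thread; return (start, end) offsets in seconds."""
    if not paths:
        return {}
    timings: Dict[str, Tuple[float, float]] = {}
    lock = threading.Lock()
    barrier = threading.Barrier(len(paths))
    t0 = time.monotonic()

    def worker(path: str) -> None:
        barrier.wait()              # release all threads at the same moment
        start = time.monotonic() - t0
        with open(path, "rb"):      # on a FUSE mount, the getattr call happens here
            pass
        end = time.monotonic() - t0
        with lock:
            timings[path] = (start, end)

    threads = [threading.Thread(target=worker, args=(p,)) for p in paths]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return timings
```

Going by the behaviour reported above, two uncached paths in the same directory would be expected to finish roughly one endpoint delay apart, while paths in different directories would finish at about the same time.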

Essentially, if paths are in different parts of the tree they don't seem to affect each other.  When looking through the source code earlier I noticed mentions of trees, so I'm going to have a closer look.

I've also found that subsequent calls of getattr to the same path are not forced to run sequentially.  Due to the caching in my file system this wasn't immediately obvious, because subsequent getattr calls are always very quick (getting the information from the cache instead of the endpoint), so it wouldn't really matter if they all happened sequentially.  However, I've noticed that when running in a mode where information is not cached, multiple getattrs are still able to run in parallel, even within the same directory, if getattr has been called for the path before.  I guess this is a result of FUSE's caching, something else I will look into.

Re: GetAttr calls being serialised

Miklos Szeredi
On Mon, Sep 16, 2013 at 7:00 PM, abulford <[hidden email]> wrote:

>
>> The main reason you'd see a long sequence of getattr calls is
>> generally a follow-up to information returned from readdir. In our
>> file systems, we try to pre-fetch and cache the attribute information
>> (which will inevitably be requested) during readdir so we don't have
>> to hit the backend for each one.
>
> I see what you mean - normally a user might list the content of a directory,
> which would result in a readdir followed by a getattr call for each file in
> the directory, and there could be lots, generating a string of sequential
> getattr calls, like those my question asks about.  In my situation my test
> harness is specifically trying to open a large number of known file paths
> all at the same time (using multiple threads).  It's calling fopen on the
> full path, rather than determining the contents of its parent directory
> through readdir, so unfortunately I'm unable to make use of readdir to
> pre-fetch the content as you suggest.
>
> Your comment does relate to something I've recently discovered, though - the
> sequential access is only on a per directory basis.

Lookup (i.e. first finding the file associated with a name) is
serialized per directory.  This is in the VFS (the common filesystem
part in the kernel), so basically any filesystem is susceptible to
this issue, not just fuse.

And you can still do what David suggested, despite not using readdir:
the filesystem code detects that multiple entries are being looked up
in a directory, so it triggers an internal readdir request to prime
the cache and then subsequent lookups can be served quickly.

Thanks,
Miklos


Re: GetAttr calls being serialised

abulford
On Fri, Sep 20, 2013 at 11:18 AM, Miklos Szeredi <[hidden email]> wrote:

> Lookup (i.e. first finding the file associated with a name) is
> serialized per directory.  This is in the VFS (the common filesystem
> part in the kernel), so basically any filesystem is susceptible to
> this issue, not just fuse.

I understand, thanks for the explanation.

> And you can still do what David suggested, despite not using readdir:
> the filesystem code detects that multiple entries are being looked up
> in a directory, so it triggers an internal readdir request to prime
> the cache and then subsequent lookups can be served quickly.

I'm afraid I don't understand what you mean by this - I have readdir
implemented, and it just writes to a log to say it's been called, but the
log isn't showing any calls coming through.  Is an 'internal readdir'
different to the readdir in my FUSE implementation?  And if so, how can I
hook into this call?

Many thanks,

Andrew

Re: GetAttr calls being serialised

Miklos Szeredi
On Fri, Sep 20, 2013 at 1:28 PM, Andrew Bulford <[hidden email]> wrote:

>> And you can still do what David suggested, despite not using readdir:
>> the filesystem code detects that multiple entries are being looked up
>> in a directory, so it triggers an internal readdir request to prime
>> the cache and then subsequent lookups can be served quickly.
>
> I'm afraid I don't understand what you mean by this - I have readdir
> implemented, and it just writes to a log to say it's been called, but the
> log isn't showing any calls coming through.  Is an 'internal readdir'
> different to the readdir in my FUSE implementation?  And if so, how can I
> hook into this call?

I'm not familiar with the RESTful API.  If you can't enumerate the
files in a directory then my idea isn't going to help.

If you can enumerate all files (i.e. readdir) then you can effectively
prime your cache in your ->getattr() implementation, which means no
more slow, serialized requests over the net.
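
One way the priming suggested here might look, as a sketch under stated assumptions: the backend can enumerate a directory together with attributes (the `list_dir` callable below is hypothetical), and cache expiry is ignored.  It is not taken from any FUSE library.

```python
import posixpath
import threading
from typing import Callable, Dict, Iterable, Optional, Set, Tuple

Attrs = Tuple[int, float]  # hypothetical record: (size, mtime)


class PrimingCache:
    def __init__(self, list_dir: Callable[[str], Iterable[Tuple[str, Attrs]]]):
        self._list_dir = list_dir  # assumed backend call that enumerates a directory
        self._lock = threading.Lock()
        self._attrs: Dict[str, Attrs] = {}
        self._primed: Set[str] = set()

    def getattr(self, path: str) -> Optional[Attrs]:
        parent = posixpath.dirname(path)
        with self._lock:
            if parent in self._primed:
                return self._attrs.get(path)  # None means "no such file"
        # First miss in this directory: one slow listing covers every sibling.
        entries = list(self._list_dir(parent))
        with self._lock:
            for name, attrs in entries:
                self._attrs[posixpath.join(parent, name)] = attrs
            self._primed.add(parent)
            return self._attrs.get(path)
```

The first getattr in a directory is still slow, but the getattr calls queued behind it are then answered from memory instead of each making its own request.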

Thanks,
Miklos


Re: GetAttr calls being serialised

abulford
On Fri, Sep 20, 2013 at 12:46 PM, Miklos Szeredi <[hidden email]> wrote:

> I'm not familiar with the RESTful API.  If you can't enumerate the
> files in a directory then my idea isn't going to help.

> If you can enumerate all files (i.e. readdir) then you can effectively
> prime your cache in your ->getattr() implementation, which means no
> more slow, serialized requests over the net.

The RESTful API doesn't implement directories.  Keys to objects might happen
to contain forward slashes, but these are not interpreted in any special
way by the API.

Thank you for your suggestion, but unfortunately I don't think I'll be able
to prime the cache in the way you suggest.  Luckily, I don't think this
will be as much of an issue for me as I first expected: the file system's
clients use directories with quite a good spread, so it's going to be
pretty rare that multiple getattr requests arrive at the same time in the
same directory.

Many thanks for your help, it's good to know I didn't just have a config
wrong!

Andy