What is cached when your fuse does nothing for "open"?

Maruhan Park
Hello, I found that there are caching options, but I'm not sure what FUSE actually caches.
I see that there is a "keep_cache" option for open, but what information is being cached?
For example, suppose a process tries to read "/mnt/fuse/file", where "/mnt/fuse" is the mount point of the FUSE program, and the FUSE program's open method does nothing and returns 0. What gets cached then?

Also, I've read that "entry, attr, file data" get cached in the VFS. Does this refer to the dcache and the inode cache?

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
--
fuse-devel mailing list
To unsubscribe or subscribe, visit https://lists.sourceforge.net/lists/listinfo/fuse-devel

Re: What is cached when your fuse does nothing for "open"?

Stef Bon-2
The keep_cache option is discussed here, in the context of glusterfs:

https://bugzilla.redhat.com/show_bug.cgi?id=833564

The description there is very comprehensive.

With keep_cache set, the cached pages for a file are not invalidated on open.

Stef


Re: What is cached when your fuse does nothing for "open"?

Maruhan Park
Thanks for the reply.

I can't seem to find anything in that link about what information FUSE caches (or, to be exact, what the VFS caches when interacting with FUSE). All I see is that keep_cache skips invalidation of the page cache.


Re: What is cached when your fuse does nothing for "open"?

Stef Bon-2
Well, I can't help you then.
The link describes the problem and gives an example. The example
reads a file many times to show the difference the keep_cache option
makes. If you still don't know what is cached here......


Re: What is cached when your fuse does nothing for "open"?

Maruhan Park
Yes, it shows that caching speeds up reads when using "cat". But I still don't know what cached information produces that performance boost.

I think the answer to my earlier question would help here: what if a process tried to read "/mnt/fuse/file", where "/mnt/fuse" is the mount point of the FUSE program, and the FUSE program's open operation did nothing and returned 0? What gets cached then?


Re: What is cached when your fuse does nothing for "open"?

Maxim Patlasov-3

Hi,


It should be enough to look at the following snippet to fully understand keep_cache option:

void fuse_finish_open(struct inode *inode, struct file *file)
{
    ...
    if (!(ff->open_flags & FOPEN_KEEP_CACHE))
        invalidate_inode_pages2(inode->i_mapping);

where invalidate_inode_pages2() purges the page cache associated with the given inode. The page cache belongs to the inode, so if you open the same file more than once (perhaps referring to it by different names), the page cache is not duplicated.

There is a general cache coherence problem to cope with: imagine that you read/write a file from node "A", so some of the file's content is cached there, and then you modify the file from node "B". Which mechanism should a FUSE programmer use to ensure that users of node "A" won't see stale content? There is no good universal solution in the FUSE methodology (AFAIK), but there are some tricks. The KEEP_CACHE flag is one of them: when the filesystem leaves it unset, the kernel drops the cached pages on open, which blocks stale data at least at the moment open(2) is handled. Does this answer your question?


Thanks,

Maxim
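
The open-time check quoted in the message above can be modeled with a small toy sketch in plain C. This is not kernel code: struct toy_inode, toy_finish_open() and TOY_FOPEN_KEEP_CACHE are hypothetical stand-ins invented for illustration, and the "page cache" is reduced to a simple page counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's FOPEN_KEEP_CACHE flag. */
#define TOY_FOPEN_KEEP_CACHE (1u << 1)

/* Toy inode: only tracks how many pages of file data are cached. */
struct toy_inode {
    size_t cached_pages;
};

/* Mirrors the quoted check: drop the cached pages unless the
 * filesystem asked to keep them when it answered the open request. */
void toy_finish_open(struct toy_inode *inode, unsigned open_flags)
{
    assert(inode != NULL);
    if (!(open_flags & TOY_FOPEN_KEEP_CACHE))
        inode->cached_pages = 0;  /* stands in for invalidate_inode_pages2() */
}

/* Returns how many cached pages survive one open with the given flags. */
size_t toy_pages_after_open(size_t cached_before, unsigned open_flags)
{
    struct toy_inode inode = { .cached_pages = cached_before };

    toy_finish_open(&inode, open_flags);
    return inode.cached_pages;
}
```

In this model, toy_pages_after_open(8, 0) leaves nothing cached, while passing TOY_FOPEN_KEEP_CACHE leaves all 8 pages in place. Note that the open handler's own work (or lack of it) plays no role; only the flag it returns matters.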



Re: What is cached when your fuse does nothing for "open"?

Maruhan Park
Thank you

It answers a little bit, but not quite.

So it answers this bit: "What cache does keep_cache deal with?"
Answer: page cache

I'm still confused, though, because there are two layers of open.

First, there is the FUSE open operation. There you decide whether you want the "keep_cache" option by setting it in "fuse_file_info *fi". You can even make the open operation do nothing at all.
Second, there is libc's open function, which can be called from anywhere, not just from the FUSE open operation (it could even be called in the destroy operation, for example).

Logically, the FUSE open operation is not really the one opening the file, so it has to be the files opened through libc's open that get cached. So how does FUSE handle this?
Is it something like "any libc open call made by the FUSE program follows the chosen cache option"?

If so, imagine a file outside FUSE (/home/foo/file), and suppose your FUSE open operation calls libc's open on that file (triggered by cat /mnt/fuse/foo/file). Then an outsider who is not using FUSE simply opens the file (cat /home/foo/file). In this scenario both the FUSE program and the outsider call libc's open on the same file (/home/foo/file), so it seems to me that they should both be looking at the same page cache. But if my guess above is correct, FUSE could alter the page cache for /home/foo/file, which would also affect the outsider accessing it (cat /home/foo/file). This does not seem correct.

If this is not the case, then how does FUSE confine the cache option to itself?


Re: What is cached when your fuse does nothing for "open"?

Stef Bon-2
2017-01-04 3:07 GMT+01:00 박마루한 <[hidden email]>:
> Thank you

> And then, an outsider who is not using fuse, simply opens the file (cat
> /home/foo/file). In this scenario, both fuse and outside-fuse are using
> libc's open to the file (/home/foo/file), which seems to me that they should
> both be looking at the same page cache.  But, if the above question is
> correct, it should mean that fuse can alter the page cache for the file
> (/home/foo/file), which affects the outsider accessing the file too (cat
> /home/foo/file). This does not seem correct.

No, your assumption that the files /home/foo/file and /mnt/fuse/foo/file
use the same page cache is not correct.
If you call stat on both of them, you will see different inodes and,
even better, different devices. They are different files to the kernel.

Stef Bon
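
The stat comparison suggested above can be sketched in C as follows. The helper name same_kernel_file() is made up for this example; it relies only on the standard stat(2) call and the st_dev and st_ino fields.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Returns 1 if both paths resolve to the same (device, inode) pair,
 * 0 if they differ, and -1 if either stat(2) call fails. */
int same_kernel_file(const char *path_a, const char *path_b)
{
    struct stat a, b;

    assert(path_a != NULL && path_b != NULL);
    if (stat(path_a, &a) != 0 || stat(path_b, &b) != 0)
        return -1;
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

Assuming a passthrough-style filesystem mounted at /mnt/fuse, calling same_kernel_file("/home/foo/file", "/mnt/fuse/foo/file") would be expected to return 0, because the FUSE mount is a separate device with its own inodes and therefore its own page cache.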


Re: What is cached when your fuse does nothing for "open"?

Maruhan Park
"They are different files to the kernel."

Ok cool, that's what I thought.

Then, we can go back to this question,

"Logically, fuse's open operation is really not the one opening the file, so it has to be the files that libc's open method is calling that gets cached. So, how does fuse handle this?"

As I've mentioned, you can do anything you want in the FUSE open operation. You can have it call libc's open on a file. You can have it print "hello world". You can have it do nothing. Heck, you can even make it open two or more files.

That being the case, I can only conclude that caching happens only when libc's open is called.
Which leads me to suspect that the following is what happens:

Outsider case:
1. The process asks the VFS to "open" /foo/file.
2. The VFS checks its cache.
3. If nothing is cached, the VFS sends "open" to the appropriate filesystem (assume ext2). Then the VFS caches /foo/file and returns the file handle to the process.

FUSE case:
1. The process asks the VFS to "open" /mnt/fuse/foo/file.
2. The VFS checks its cache. If the information is there, it sends the file handle back to the process.
3. If nothing is cached, the VFS sends "open" to the FUSE kernel module.
4. The FUSE kernel module runs its open operation, which calls the corresponding open method defined through the FUSE userspace library.
5. (Assume that method calls libc's open on /foo/file.) The FUSE process asks the VFS to "open" /foo/file.
6. The VFS checks its cache.
7. If nothing is cached, the VFS sends "open" to the appropriate filesystem (assume ext2). Then the VFS caches /foo/file.
8. The VFS returns the file handle to the FUSE process.
9. FUSE returns the file handle to the VFS. At this point the VFS caches the "file data" for /mnt/fuse/foo/file (which is actually the file data from /foo/file) using the information from the file handle. (But what if, for example, the FUSE open method called libc's open() twice on two different files? Does it cache just the first one?)
10. The VFS returns the file handle to the original process.

So both cases use the cache for /foo/file, but only the FUSE case uses the cache for /mnt/fuse/foo/file. Is this correct?


Re: What is cached when your fuse does nothing for "open"?

Stef Bon-2
2017-01-04 8:55 GMT+01:00 박마루한 <[hidden email]>:
> "They are different files to the kernel."
>
> Ok cool, that's what I thought.
>
> Then, we can go back to this question,
>

We? It's your problem.

> "Logically, fuse's open operation is really not the one opening the file, so
> it has to be the files that libc's open method is calling that gets cached.

No, no, no and no. Why does it have to be libc's open method? As you
mentioned yourself, FUSE can have any backend. (Page) caching also works
when the backend is a MySQL database, for example.
And I can name many other backends here.

> So, how does fuse handle this?"
>
> As I've mentioned, you can do anything you want in fuse's open operation.
> You can decide to call libc's open to a file. You can decide it to print
> "hello world". You can decide it to do nothing. Heck, you can even make it
> open two or more files.
>
> When this is the case, I can only be convinced that caching happens only
> when libc's open is called.

Again, nonsense. The only thing that is cached is the contents of the
backend file. If you have an overlay fs, you've got two page caches:
one for the file in the FUSE filesystem, and one for the file in the
target/underlying fs.

> Which then leads me to suspect that this is the case,
>
>
> So both cases use the cache for /foo/file, but only the fuse case use the
> cache for /mnt/fuse/foo/file. Is this correct?

Yes, if you want to put it that way. Only, the kernel does not know and
cannot know that these caches are related via FUSE.
Are you suggesting a patch for some insane combination of these page caches?

Stef


Re: What is cached when your fuse does nothing for "open"?

Maruhan Park
Thank you so much for explaining (and bearing with my barrage of questions)

> The only thing what is cached is the contents of the
> backend file. If you
> have an overlay fs, you've got two pagecaches: one of the file in the
> fusefilesystem, and one of the target/underlying fs.

I'm a bit lost about the backend file you are talking about. I didn't know filesystems had backend files.
For instance, if you run this example program, https://github.com/libfuse/libfuse/blob/master/example/passthrough.c, does it create a backend file somewhere?



Re: What is cached when your fuse does nothing for "open"?

Nikolaus Rath
In reply to this post by Maruhan Park
On Jan 04 2017, 박마루한 <[hidden email]> wrote:
> Then, we can go back to this question,
>
> "Logically, fuse's open operation is really not the one opening the file,
> so it has to be the files that libc's open method is calling that gets
> cached. So, how does fuse handle this?"

I do not understand the question. Can you rephrase it?

Are you assuming that the FUSE file system would need to itself open a
file in order to process an open request (this is not necessarily the
case)? If so, what is "the file"?

What do you mean by "the files that libc's open method is calling"?
The open method is not calling any files, and I don't think it's a good
idea to add libc to the picture - better think in terms of syscalls
(libc may add yet another layer of caching when you use FILE objects
instead of fds).

Best,
-Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«
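
The remark about FILE objects adding their own layer of caching can be illustrated with a short C sketch. The function name stdio_buffer_demo() and the /tmp path template are assumptions made for this example; the calls themselves (mkstemp, fdopen, setvbuf, fflush, fstat) are standard POSIX/C. It writes five bytes through a fully buffered FILE and asks the kernel for the file size before and after fflush().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Writes "hello" through a FILE object and records the file size the
 * kernel reports before and after fflush().
 * Returns 0 on success, -1 on any error. */
int stdio_buffer_demo(long *before_flush, long *after_flush)
{
    char path[] = "/tmp/stdio_demo_XXXXXX";  /* assumed writable location */
    struct stat st;
    int rc = -1;

    assert(before_flush != NULL && after_flush != NULL);

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* the file lives on until fd is closed */

    FILE *fp = fdopen(fd, "w");
    if (fp == NULL) {
        close(fd);
        return -1;
    }

    /* Request full buffering explicitly so the result does not depend
     * on the C library's default for this stream. */
    if (setvbuf(fp, NULL, _IOFBF, 4096) != 0)
        goto out;

    /* The bytes land in the libc buffer; no write(2) has happened yet. */
    if (fputs("hello", fp) == EOF || fstat(fd, &st) != 0)
        goto out;
    *before_flush = (long)st.st_size;

    /* fflush() hands the buffered bytes to the kernel via write(2). */
    if (fflush(fp) != 0 || fstat(fd, &st) != 0)
        goto out;
    *after_flush = (long)st.st_size;
    rc = 0;
out:
    fclose(fp);                   /* also closes fd */
    return rc;
}
```

On a typical libc the size before the flush should be 0 and the size after it 5, which shows that this buffer sits in the process, above the syscall layer, and is unrelated to the kernel page cache discussed in this thread.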


Re: What is cached when your fuse does nothing for "open"?

Maruhan Park
Thank you for answering Nikolaus

>I do not understand the question. Can you rephrase it?

OK. As I understand it, the FUSE mount point is simply a gateway for calling the methods you register through the FUSE userspace library.
When FUSE receives a request from the VFS to "open" a specific path, your custom open method gets called.
This open method may not actually open a file at all. It may even do something wacky like print "hello world" or open multiple files.

Since we can't predict what the FUSE method will do, we don't know what kind of data it will deal with.

I assume that when the VFS asks a filesystem to "open" something, it also asks the filesystem for the address where the filesystem stores the information about that open.
At that address, or at some known offset after it, is the size of this "information about open".
Using that size, the VFS caches the appropriate block of data, starting at a specific offset from the aforementioned address.

Now, in a "normal" filesystem this block of data has a fixed structure that is always the same. So when you want to read a file, the VFS goes to a known offset from the start of that block.
However, for FUSE we have no idea what the structure of this block of data will be. (I may be wrong; perhaps we do know the structure.)

For example, imagine that the FUSE open method opens two files. In a "normal" filesystem only one file would be open, so when you cache the contents of the file you know which block of data to cache. But in this FUSE filesystem two files are opened. How does the kernel determine what the "contents of the file" are? How could this be cached?

I just can't comprehend that.


Re: What is cached when your fuse does nothing for "open"?

Maxim Patlasov-3



That's simple, in fact. There is kernel FUSE and userspace FUSE. You open two files in userspace FUSE, but there is still only one "struct file" in kernel FUSE. The Linux kernel populates (and uses) the page cache when you do I/O, not when you open a file.

You issue a read(2) syscall and kernel FUSE handles it like this: if the page is already in the page cache, use it; otherwise ask userspace FUSE to "read". Every time an actual "read" happens in userspace, the corresponding page is populated with the actual content of the "file", that is, whatever the kernel receives from userspace. It's up to userspace FUSE how to compose this content from the two files it holds open. Kernel FUSE doesn't know how it was composed in userspace.

Similarly, when you issue write(2), kernel FUSE automatically populates a page with the data passed as the argument to that write(2). From then on that page is in the page cache and available to read(2)s. Again, it is up to userspace FUSE how to store the content of the page in the two files it holds open.

Needless to say, the page discussed above has nothing to do with the page[s] created by userspace I/O on those two files. They belong to different page caches and do not interfere.

Thanks,
Maxim
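
The read and write paths described in the message above can be approximated by a toy model in C. Nothing here is kernel or libfuse code: struct toy_file, toy_read(), toy_write() and the two backing buffers are hypothetical names chosen for the sketch, and a "page" is shrunk to 16 bytes. The point it tries to show is that the cache stores whatever bytes "userspace" returned, regardless of how they were composed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16
#define TOY_NPAGES    4

/* Toy "userspace" side: two backing sources the handler combines. */
static const char backing_a[] = "AAAAAAAA";
static const char backing_b[] = "bbbbbbbb";
static int userspace_reads;   /* requests that actually reached "userspace" */

/* Composes a page however it likes: first half from A, second from B. */
static void toy_userspace_read(size_t page_idx, char *out)
{
    (void)page_idx;           /* every page has the same content in this toy */
    userspace_reads++;
    memcpy(out, backing_a, 8);
    memcpy(out + 8, backing_b, 8);
}

/* Toy "kernel" side: one page cache for the single FUSE file. */
struct toy_file {
    char pages[TOY_NPAGES][TOY_PAGE_SIZE];
    int  present[TOY_NPAGES];
};

/* Serve from the cache when possible, otherwise ask "userspace" once. */
void toy_read(struct toy_file *f, size_t page_idx, char *out)
{
    assert(f != NULL && page_idx < TOY_NPAGES);
    if (!f->present[page_idx]) {
        toy_userspace_read(page_idx, f->pages[page_idx]);
        f->present[page_idx] = 1;
    }
    memcpy(out, f->pages[page_idx], TOY_PAGE_SIZE);
}

/* A write fills the cached page directly. In a real filesystem the data
 * must also reach userspace to be stored; that step is omitted here. */
void toy_write(struct toy_file *f, size_t page_idx, const char *data)
{
    assert(f != NULL && page_idx < TOY_NPAGES);
    memcpy(f->pages[page_idx], data, TOY_PAGE_SIZE);
    f->present[page_idx] = 1;
}

int toy_userspace_read_count(void)
{
    return userspace_reads;
}
```

Reading the same page twice through toy_read() reaches toy_userspace_read() only once, and a page filled by toy_write() is served without consulting "userspace" at all. The cache never sees the two backing sources, only the bytes handed back for the one file.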



Re: What is cached when your fuse does nothing for "open"?

Maruhan Park
Thank you so much for the response Maxim. This is honestly a tremendous help.

> Linux kernel populates (and uses) page cache when you are doing i/o, not when you open a file.

Welp, I thought all syscalls are I/Os.


> It's up to userspace fuse how to compose this content based on those two files it holds opened.

I understand that it's up to userspace FUSE to do that. But I have trouble visualizing how it actually works.

So, if you define the userspace FUSE read method to do something completely unrelated to reading a file, such as pinging an IP address, then what is the actual content of the "file" that the kernel receives from userspace?
Would the content be the instructions for pinging the IP address?
And when the kernel then looks up the page in the page cache, is what it gets back the instruction for pinging the IP address?


2017-01-05 11:18 GMT+09:00 Maxim Patlasov <[hidden email]>:



On 01/04/2017 05:24 PM, 박마루한 wrote:
Thank you for answering Nikolaus

>I do not understand the question. Can you rephrase it?

Ok, so as I understand, the mount point for FUSE is simply a gateway to call methods you write in the FUSE userspace library. 
When FUSE receives an order from VFS to call "open" to a specific path, your custom written open method from the FUSE userspace library gets called.
This open method may not actually open a file at all. It may even do something wacky like print "hello world" or open multiple files.

Since we can't expect what FUSE's method will do, we don't know what kind of data it will deal with. 

I assume that when VFS orders a filesystem to call "open", it will also ask the filesystem for the address that points to the beginning of where the filesystem stores the information about open.
At that address or some known bits after that address, there is the data about how large this "information about open" is.
Using this data about the size of the information, VFS caches the appropriate block of data starting from specific bits below the aforementioned address.

Now, in a "normal" filesystem, this block of data will have a specific structure that's always the same. So when you want to read a file, VFS will go to the address that is a certain number of bits below the starting address of that block of data.
However, for FUSE, we have no idea what the structure of this block of data will be. (I may be wrong, perhaps we do know about the structure)

For example, imagine that FUSE's open method opens two files. For a "normal" filesystem, only one file would be open, so when you cache the contents of the file, you know what block of data to cache. But in this FUSE, two files are opened. How does it determine what the "contents of file" are? How could this be cached?

I just couldn't comprehend that.

That's simple, in fact. There is kernel fuse and userspace fuse. You open two files in userspace fuse, but there is still only one "struct file" in kernel fuse. The Linux kernel populates (and uses) the page cache when you are doing I/O, not when you open a file.

When you issue a read(2) syscall, kernel fuse handles it like this: if the page is already in the page cache, use it; otherwise ask userspace fuse to "read". Every time an actual "read" happens in userspace, the corresponding page is populated with the actual content of the "file", that is, whatever the kernel receives from userspace. It's up to userspace fuse how to compose this content from the two files it holds open. Kernel fuse doesn't know how it was composed in userspace.

Similarly, when you issue write(2), kernel fuse automatically populates a page with the data passed as the argument to that write(2). From then on that page is in the page cache and available for read(2)s. Again, it is up to userspace fuse how to store the content of the page in the two files it holds open.

Needless to say, the page discussed above has nothing to do with the page[s] originating from userspace I/O on those two files. They belong to different page caches and do not interfere.
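
A toy model may help picture the flow described above. This is only a sketch in Python, not kernel code: the class ToyFuseFile, the two handler functions, and the two in-memory "backing files" are all made up for illustration. It shows that a read consults the cache first, that a miss is answered with whatever bytes the userspace handler returns, and that a write makes the page available to later reads.

```python
PAGE_SIZE = 4096


class ToyFuseFile:
    """Toy stand-in for kernel fuse's view of one file (hypothetical, not kernel code)."""

    def __init__(self, userspace_read, userspace_write):
        self.pages = {}                    # page index -> bytes: the "page cache"
        self.userspace_read = userspace_read
        self.userspace_write = userspace_write
        self.requests_sent = 0             # number of "read" requests sent to userspace

    def read_page(self, index):
        if index in self.pages:            # cache hit: userspace is not involved
            return self.pages[index]
        self.requests_sent += 1            # cache miss: ask userspace to "read"
        data = self.userspace_read(index * PAGE_SIZE, PAGE_SIZE)
        self.pages[index] = data           # cache whatever bytes came back
        return data

    def write_page(self, index, data):
        self.pages[index] = data           # page is now available to later reads
        self.userspace_write(index * PAGE_SIZE, data)


# Userspace side: the "file" is composed from two backing stores.
backing_a = b"A" * PAGE_SIZE
backing_b = b"B" * PAGE_SIZE
written = []


def handler_read(offset, size):
    combined = backing_a + backing_b       # how to compose is up to userspace
    return combined[offset:offset + size]


def handler_write(offset, data):
    written.append((offset, data))         # how to store is up to userspace


f = ToyFuseFile(handler_read, handler_write)
first = f.read_page(1)                     # miss: one request goes to userspace
second = f.read_page(1)                    # hit: served from the toy cache
f.write_page(0, b"x" * PAGE_SIZE)
third = f.read_page(0)                     # hit: the page was populated by the write
```

The toy cache never learns that two backing stores exist; it only ever sees the bytes handed back for a given offset.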

Thanks,
Maxim



2017-01-05 2:35 GMT+09:00 Nikolaus Rath <[hidden email]>:
On Jan 04 2017, 박마루한 <mrpark-tlZpZqNTSqmJ1ku80POtVQC/[hidden email]> wrote:
> Then, we can go back to this question,
>
> "Logically, fuse's open operation is really not the one opening the file,
> so it has to be the files that libc's open method is calling that gets
> cached. So, how does fuse handle this?"

I do not understand the question. Can you rephrase it?

Are you assuming that the FUSE file system would need to itself open a
file in order to process an open request (this is not necessarily the
case)? If so, what is "the file"?

What do you mean with "the files that libc's open method is calling"?
The open method is not calling any files, and I don't think it's a good
idea to add libc to the picture - better think in terms of syscalls
(libc may add yet another layer of caching when you use FILE objects
instead of fds).

Best,
-Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«


Re: What is cached when your fuse does nothing for "open"?

Michael Theall-2
Your userspace "read" callback tells you at what "offset" and what "size" the read request is. Your userspace code replies to kernel FUSE with a buffer. It is the contents of this buffer which are put into the page cache in the kernel.
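
As a rough illustration of that contract, here is a sketch in Python rather than a real libfuse callback. The names make_content and read_callback are hypothetical, and the content string is invented; 192.0.2.1 is a documentation-only address. Whatever work the handler does on the side, only the returned bytes for the requested window are what the kernel would see.

```python
def make_content():
    # Hypothetical: the file system decides what the "file" contains.
    # It could ping a host here; only the resulting bytes matter to the kernel.
    return b"ping 192.0.2.1: reply in 12 ms\n"


def read_callback(offset, size):
    """Return the bytes for the window [offset, offset + size)."""
    content = make_content()
    return content[offset:offset + size]   # empty once offset is past the end


head = read_callback(0, 4)
address = read_callback(5, 9)
past_end = read_callback(1000, 10)
```

Under these assumptions, head is the first four bytes of the content, and a request past the end returns an empty buffer.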

Regards,
Michael Theall



Re: What is cached when your fuse does nothing for "open"?

Nikolaus Rath
In reply to this post by Maruhan Park
On Jan 05 2017, 박마루한 <mrpark-tlZpZqNTSqmJ1ku80POtVQC/[hidden email]> wrote:
> Ok, so as I understand, the mount point for FUSE is simply a gateway to
> call methods you write in the FUSE userspace library.
> When FUSE receives an order from VFS to call "open" to a specific path,
> your custom written open method from the FUSE userspace library gets called.
> This open method may not actually open a file at all. It may even do
> something wacky like print "hello world" or open multiple files.

Right.

> Since we can't expect what FUSE's method will do, we don't know what kind
> of data it will deal with.

That sentence doesn't make sense to me. I assume that by "we" you mean the
kernel? For the kernel, the data is just a sequence of bytes; there's no
need for it to know anything else.

> I assume that when VFS orders a filesystem to call "open", it will also ask
> the filesystem for the address that points to the beginning of where the
> filesystem stores the information about open.

No, it won't. It asks for a bunch of well-defined metadata (see
fuse_reply_open()).

> At that address or some known bits after that address, there is the data
> about how large this "information about open" is.

No, the "information about open" always has the same size. And the file
system doesn't return the address of this data, but the data itself.

> Using this data about the size of the information, VFS caches the
> appropriate block of data starting from specific bits below the
> aforementioned address.

There is no need to cache anything when you call open. Caching only
makes sense when you have a read() or write() request.
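
To make "always has the same size" concrete, here is a small Python sketch of what an open reply might look like on the wire. The layout (a 64-bit file handle, 32-bit flags, 32-bit padding) and the flag values are my recollection of struct fuse_open_out and the FOPEN_* constants in the kernel's fuse.h, so treat them as assumptions and check your headers. The helper pack_open_reply is hypothetical.

```python
import struct

# Assumed layout of struct fuse_open_out: uint64 fh, uint32 open_flags, uint32 padding.
OPEN_OUT = struct.Struct("=QII")

# Assumed flag values from the kernel's fuse.h.
FOPEN_DIRECT_IO = 1 << 0
FOPEN_KEEP_CACHE = 1 << 1


def pack_open_reply(fh, keep_cache=False, direct_io=False):
    """Build the fixed-size open reply (hypothetical helper)."""
    flags = 0
    if keep_cache:
        flags |= FOPEN_KEEP_CACHE
    if direct_io:
        flags |= FOPEN_DIRECT_IO
    return OPEN_OUT.pack(fh, flags, 0)


reply_plain = pack_open_reply(fh=0)                  # an open that "does nothing"
reply_keep = pack_open_reply(fh=7, keep_cache=True)  # same size, one flag set
```

Both replies have the same length and carry no file data, which is why there is nothing to cache at open time; keep_cache in this sketch is just one bit in the flags field.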


Hope that helps!

Best,
-Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«


Re: What is cached when your fuse does nothing for "open"?

Maxim Patlasov-3
In reply to this post by Maruhan Park

On 01/04/2017 08:58 PM, 박마루한 wrote:

So, if you define the userspace fuse's read method to do something completely unrelated to reading a file such as pinging an ip address, then what is the actual content of the "file that kernel receives from userspace"?
Would the content be instructions for pinging the ip address?
So then when the kernel checks for the page in page cache, what it receives is the instruction for pinging the ip address?

Imagine /dev/fuse as a pipe: a communication channel between the kernel and userspace fuse with a well-defined interface. The kernel experiences a cache miss and hence prepares a FUSE_READ request for userspace. The request is actually a buffer with <FUSE_READ, file-id, offset, length>. Then userspace fuse reads this buffer from /dev/fuse and somehow processes it. In your example it pings some IP address, and that's completely fine.

But the kernel's FUSE_READ request is still queued in the kernel, waiting for an ACK from userspace. Until the ACK arrives, the user who submitted the initial read(2) is suspended: the kernel doesn't return control from the read(2). So userspace fuse must compose an ACK for the FUSE_READ request and write it to /dev/fuse. The ACK must contain a buffer of "length" bytes. Finally, the content of that buffer is what the kernel will use to populate the page cache.
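
The request/ACK exchange can be sketched in Python as below. The framing is a simplification of my own and NOT the real /dev/fuse wire format: REQUEST, REPLY_HEADER, handle_request, and ping_then_answer are hypothetical, and the FUSE_READ opcode value is from memory. What the sketch does borrow from the real protocol is that each request carries a unique ID and the reply echoes it, so the kernel can match the ACK to the waiting read(2).

```python
import errno
import struct

FUSE_READ = 15  # opcode value as I recall it from fuse.h; treat as an assumption

# Simplified, hypothetical framing (not the real wire format):
# request = opcode, unique id, file id, offset, length
REQUEST = struct.Struct("=IQQQI")
# reply header = total reply length, error (0 or negative errno), unique id
REPLY_HEADER = struct.Struct("=IiQ")


def handle_request(raw, produce_bytes):
    """Userspace side: decode one request and build exactly one reply for it."""
    opcode, unique, file_id, offset, length = REQUEST.unpack(raw)
    if opcode != FUSE_READ:
        # Unknown operation: reply with an error and no payload.
        return REPLY_HEADER.pack(REPLY_HEADER.size, -errno.ENOSYS, unique)
    payload = produce_bytes(file_id, offset, length)[:length]
    header = REPLY_HEADER.pack(REPLY_HEADER.size + len(payload), 0, unique)
    return header + payload


def ping_then_answer(file_id, offset, length):
    # The handler may do anything here (for example ping a host);
    # only the returned bytes go back to the kernel.
    return b"pong\n"[offset:offset + length]


request = REQUEST.pack(FUSE_READ, 42, 1, 0, 5)
reply = handle_request(request, ping_then_answer)
total_len, error, unique = REPLY_HEADER.unpack_from(reply)
payload = reply[REPLY_HEADER.size:]

unknown = handle_request(REQUEST.pack(99, 7, 1, 0, 5), ping_then_answer)
```

In this model the reply's unique field equals the request's, and the payload is the only part that could end up in the page cache.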





Re: What is cached when your fuse does nothing for "open"?

Maruhan Park
Thank you so much Maxim. This makes things MUCH clearer for me.

I still have a couple of clarifying questions.

This is the first time I've seen that term, sorry. By ACK, you mean this, right? https://en.wikipedia.org/wiki/Acknowledgement_(data_networks)

> But kernel FUSE_READ request is still queued in kernel waiting for an ACK from userspace. Until the ACK, the user who submitted initial read(2) is suspended -- kernel doesn't return control from the read(2). So, userspace fuse must compose an ACK for FUSE_READ request and write it to /dev/fuse.

What do you mean by "must compose an ACK"? For example, if the FUSE userspace read method just pings an IP address and does nothing else, does that mean VFS or the FUSE kernel will not even ping? Or does it mean the ping will happen, but there will be some error because the kernel will not get an ACK? What error would that be?



Re: What is cached when your fuse does nothing for "open"?

Maruhan Park
> But kernel FUSE_READ request is still queued in kernel waiting for an ACK from userspace. Until the ACK, the user who submitted initial read(2) is suspended -- kernel doesn't return control from the read(2). So, userspace fuse must compose an ACK for FUSE_READ request and write it to /dev/fuse.

Oh, and also, you say that the FUSE_READ request is queued. Does it get resolved by the first ACK it receives? If the FUSE userspace read method sequentially reads two files, does that mean the FUSE kernel will not execute the rest after it sees the ACK for the first read? Or does the execution happen, but the confirmation of the second read is not interpreted by the FUSE kernel and hence by the VFS?

If the kernel is able to read both ACKs, then how does it combine the information from both to populate the page cache?

2017-01-06 10:17 GMT+09:00 박마루한 <[hidden email]>:
Thank you so much Maxim. This makes things MUCH clearer for me.

I still have couple clarification questions.

I'm sorry because this is the first time I saw that term. By ACK, you mean this right? https://en.wikipedia.org/wiki/Acknowledgement_(data_networks)

> But kernel FUSE_READ request is still queued in kernel waiting for an ACK from userspace. Until the ACK, the user who submitted initial read(2) is suspended -- kernel doesn't return control from the read(2). So, userspace fuse must compose an ACK for FUSE_READ request and write it to /dev/fuse.

What do you mean by "must compose an ACK"? As the example, in the FUSE userspace read method, if you just ping an ip address and do nothing else, does it mean VFS or FUSE kernel will not even ping? Or does it mean the ping will happen, but there will be some error because it will not get an ACK? What error would that be?

2017-01-06 4:24 GMT+09:00 Maxim Patlasov <[hidden email]>:

On 01/04/2017 08:58 PM, 박마루한 wrote:

Thank you so much for the response Maxim. This is honestly a tremendous help.

> Linux kernel populates (and uses) page cache when you are doing i/o, not when you open a file.

Welp, I thought all syscalls are I/Os.


> It's up to userspace fuse how to compose this content based on those two files it holds opened.

I understand that it's up to the userspace fuse to do that. But I have trouble visualizing how that actually works.

So, if you define the userspace fuse's read method to do something completely unrelated to reading a file such as pinging an ip address, then what is the actual content of the "file that kernel receives from userspace"?
Would the content be instructions for pinging the ip address?
So then when the kernel checks for the page in page cache, what it receives is the instruction for pinging the ip address?

Imaging /dev/fuse as a pipe -- a communication channel between kernel and userspace fuse with well-defined interface. Kernel experiences cache miss and hence prepares FUSE_READ request for userspace. The request is actually a buffer
with <FUSE_READ, file-id, offset, length>. Then userspace fuse reads this buffer from /dev/fuse and somehow processes it. In your example it pings some ip iddress -- that's completely fine. But kernel FUSE_READ request is still queued in kernel waiting for an ACK from userspace. Until the ACK, the user who submitted initial read(2) is suspended -- kernel doesn't return control from the read(2). So, userspace fuse must compose an ACK for FUSE_READ request and write it to /dev/fuse. The ACK must contain a buffer of "length" bytes. So finally, the content of that buffer is what kernel will use to populate page cache.




2017-01-05 11:18 GMT+09:00 Maxim Patlasov <[hidden email]>:



On 01/04/2017 05:24 PM, 박마루한 wrote:
Thank you for answering Nikolaus

>I do not understand the question. Can you rephrase it?

Ok, so as I understand, the mount point for FUSE is simply a gateway to call methods you write in the FUSE userspace library. 
When FUSE receives an order from VFS to call "open" to a specific path, your custom written open method from the FUSE userspace library gets called.
This open method may not actually open a file at all. It may even do something wacky like print "hello world" or open multiple files.

Since we can't expect what FUSE's method will do, we don't know what kind of data it will deal with. 

I assume that when VFS orders a filesystem to call "open", it will also ask the filesystem for the address that points to the beginning of where the filesystem stores the information about open.
At that address or some known bits after that address, there is the data about how large this "information about open" is.
Using this data about the size of the information, VFS caches the appropriate block of data starting from specific bits below the aforementioned address.

Now, in a "normal" filesystem, this block of data will have a specific structure that's always the same. So when you want to read a file, VFS will go to the address that is a certain number of bits below the starting address of that block of data.
However, for FUSE, we have no idea what the structure of this block of data will be. (I may be wrong, perhaps we do know about the structure)

For example, imagine that FUSE's open method opens two files. For a "normal" filesystem, only one file would be open, so when you cache the contents of the file, you know what block of data to cache. But in this FUSE, two files are opened. How does it determine what the "contents of file" are? How could this be cached?

I just couldn't comprehend that.

That's simple, in fact. There is kernel fuse and userspace fuse. You open two files in userspace fuse, but there is still only one "struct file" in kernel fuse. The Linux kernel populates (and uses) the page cache when you are doing i/o, not when you open a file.

You issue a read(2) syscall and kernel fuse handles it like this: if there is already a page in the page cache, use it; otherwise ask userspace fuse to "read". Every time an actual "read" happens in userspace, the corresponding page is populated with the actual content of the "file" -- what the kernel receives from userspace. It's up to userspace fuse how to compose this content based on those two files it holds opened. Kernel fuse doesn't know how it was composed in userspace.

Similarly, when you issue write(2), kernel fuse automatically populates a page with the data passed as the argument to that write(2). From then on, that page is in the page cache and available for read(2)s. Again, it is up to userspace fuse how to store the content of the page in those two files it holds opened.

Needless to say, the page discussed above has nothing to do with the page[s] originated by userspace i/o on those two files. They belong to different page caches and do not interfere.
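A minimal sketch of that read/write behaviour, under loud assumptions: `KernelFuseFile`, `daemon_read` and `daemon_write` are hypothetical names, the two dictionaries stand in for the two backing files, and the write is forwarded to the daemon immediately, which is a simplification chosen for this model.

```python
PAGE_SIZE = 4096

# Stand-ins for the two files the userspace daemon holds open.
backing_a = {0: b"A" * (PAGE_SIZE // 2)}
backing_b = {0: b"B" * (PAGE_SIZE // 2)}


def daemon_read(index):
    """Userspace decides how one page of the "file" is composed."""
    return backing_a.get(index, b"") + backing_b.get(index, b"")


def daemon_write(index, data):
    """Userspace decides how a page is stored: here, split across both files."""
    half = len(data) // 2
    backing_a[index] = data[:half]
    backing_b[index] = data[half:]


class KernelFuseFile:
    """Toy model of the single kernel-side file and its page cache."""

    def __init__(self, userspace_read, userspace_write):
        self.pages = {}
        self.userspace_read = userspace_read
        self.userspace_write = userspace_write
        self.read_requests = 0  # how often userspace was actually asked

    def read_page(self, index):
        if index not in self.pages:  # cache miss: ask userspace
            self.read_requests += 1
            self.pages[index] = self.userspace_read(index)
        return self.pages[index]     # cache hit: userspace is not involved

    def write_page(self, index, data):
        self.pages[index] = data     # page is now available to later reads
        self.userspace_write(index, data)


f = KernelFuseFile(daemon_read, daemon_write)
first = f.read_page(0)    # miss, goes to the daemon
second = f.read_page(0)   # hit, served from the cache
f.write_page(1, b"x" * PAGE_SIZE)
third = f.read_page(1)    # hit, populated by the write
print(f.read_requests)    # 1
```

The model keeps one cache of composed pages for the one kernel-side file; nothing in it looks at how the daemon split or combined the data.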

Thanks,
Maxim



2017-01-05 2:35 GMT+09:00 Nikolaus Rath <[hidden email]>:
On Jan 04 2017, 박마루한 <[hidden email]> wrote:
> Then, we can go back to this question,
>
> "Logically, fuse's open operation is really not the one opening the file,
> so it has to be the files that libc's open method is calling that gets
> cached. So, how does fuse handle this?"

I do not understand the question. Can you rephrase it?

Are you assuming that the FUSE file system would need to itself open a
file in order to process an open request (this is not necessarily the
case)? If so, what is "the file"?

What do you mean by "the files that libc's open method is calling"?
The open method is not calling any files, and I don't think it's a good
idea to add libc to the picture - better think in terms of syscalls
(libc may add yet another layer of caching when you use FILE objects
instead of fds).
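The extra userspace buffering layer can be observed with a small experiment. The sketch below uses Python's buffered file objects as an analogue of libc FILE objects (an assumption made for illustration; it does not call libc's stdio): after asking for one byte, the underlying file descriptor's offset shows how much was really requested from the kernel.

```python
import os
import tempfile

# Create a scratch file larger than the buffer size used below.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 10000)
    path = tmp.name

# Raw file descriptor: os.read asks the kernel for exactly one byte.
fd = os.open(path, os.O_RDONLY)
os.read(fd, 1)
raw_offset = os.lseek(fd, 0, os.SEEK_CUR)
os.close(fd)

# Buffered file object: reading one byte fills a userspace buffer first,
# so the descriptor's offset moves further than the one byte requested.
with open(path, "rb", buffering=4096) as f:
    f.read(1)
    buffered_offset = os.lseek(f.fileno(), 0, os.SEEK_CUR)

os.unlink(path)
print(raw_offset, buffered_offset)
```

The second number being larger than the first means later small reads can be satisfied from the process's own buffer without any syscall, which is why it is simpler to reason about FUSE in terms of syscalls on fds.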

Best,
-Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«
