<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://blog.gmane.org/gmane.linux.kernel.containers">
    <title>gmane.linux.kernel.containers</title>
    <link>http://blog.gmane.org/gmane.linux.kernel.containers</link>
    <description/>
    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>1</syn:updateFrequency>
    <syn:updateBase>1901-01-01T00:00+00:00</syn:updateBase>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.kernel.containers/8349"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.kernel.containers/8348"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.kernel.containers/8347"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.kernel.containers/8346"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.kernel.containers/8345"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.kernel.containers/8344"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.kernel.containers/8343"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.kernel.containers/8342"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.kernel.containers/8340"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.kernel.containers/8339"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.kernel.containers/8338"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.kernel.containers/8337"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.kernel.containers/8336"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.kernel.containers/8335"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.kernel.containers/8334"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.kernel.containers/8331"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.kernel.containers/8328"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.kernel.containers/8327"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.kernel.containers/8326"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.kernel.containers/8325"/>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://gmane.org/img/gmane-25t.png"/>
    <textinput rdf:resource=""/>
  </channel>
  <image rdf:about="http://gmane.org/img/gmane-25t.png">
    <title>Gmane</title>
    <url>http://gmane.org/img/gmane-25t.png</url>
    <link>http://gmane.org</link>
  </image>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.kernel.containers/8349">
    <title>Re: [PATCH 2/3] cgroups: add inactive subsystems to rootnode.root_list</title>
    <link>http://permalink.gmane.org/gmane.linux.kernel.containers/8349</link>
    <description>
I think you mean "rootnode's subsys_list is always empty".


This is wrong. cgroup_subsys.sibling is a member of the list of
subsystems attached to a given root, and rootnode.root_list is a
member of the list of hierarchies. Did you actually mean
&amp;rootnode.subsys_list?

Paul
</description>
    <dc:creator>Paul Menage</dc:creator>
    <dc:date>2008-12-02T01:35:15</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.kernel.containers/8348">
    <title>Re: [RFC v10][PATCH 09/13] Restore open file descriprtors</title>
    <link>http://permalink.gmane.org/gmane.linux.kernel.containers/8348</link>
    <description>
Well, I at least see why you need it now.  The objhash thing is trying
to be a pretty generic hash implementation and it does need to free the
references up when it is destroyed.  Instead of keeping a "hash of
files" and a "hash of pipes" or other shared objects, there's just a
single hash for everything.

One alternative here would be to have an ops-style release function that
gets called instead of what we have now:
        
        static void cr_obj_ref_drop(struct cr_objref *obj)
        {
                switch (obj-&gt;type) {
                case CR_OBJ_FILE:
                        fput((struct file *) obj-&gt;ptr);
                        break;
                default:
                        BUG();
                }
        }
        
        static void cr_obj_ref_grab(struct cr_objref *obj)
        {
                switch (obj-&gt;type) {
                case CR_OBJ_FILE:
                        get_file((struct file *) obj-&gt;ptr);
                        break;
                default:
                        BUG();
                }
        }

That would make it something like:

        struct cr_obj_ops {
                int type;
                void (*release)(struct cr_objref *obj);
        };

        void cr_release_file(struct cr_objref *obj)
        {
                struct file *file = obj-&gt;ptr;
                fput(file);
        }

        struct cr_obj_ops cr_file_ops = {
                .type = CR_OBJ_FILE,
                .release = cr_release_file,
        };


And the add operation becomes:

get_file(file);
new = cr_obj_add_ptr(ctx, file, &amp;objref, &amp;cr_file_ops, 0);

with 'cr_file_ops' basically replacing the CR_OBJ_FILE that got passed
before.

I like that because it only obfuscates what truly needs to be abstracted
out: the release side.  Hiding that get_file() is really tricky.

But, I guess we could also just kill cr_obj_ref_grab(), do the
get_file() explicitly and still keep cr_obj_ref_drop() as it is now.
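<![CDATA[

A minimal user-space sketch of the ops-style release idea above. The names fake_file, obj_ops, objref, objref_add and objref_drop are hypothetical stand-ins, not the cr_* code from the patch, and the refcount is a plain integer rather than a real 'struct file' reference. It only illustrates keeping the "get" explicit at the call site while the "release" goes through a per-type hook.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for 'struct file': just a reference count. */
struct fake_file {
	int refcount;
};

struct objref;

/* Per-type operations: only the release side is abstracted. */
struct obj_ops {
	int type;
	void (*release)(struct objref *obj);
};

/* One entry in the (here trivial) object table. */
struct objref {
	void *ptr;
	const struct obj_ops *ops;
};

enum { OBJ_FILE = 1 };

/* Stand-ins for get_file()/fput() acting on the fake refcount. */
void fake_get_file(struct fake_file *f) { f->refcount++; }
void fake_fput(struct fake_file *f) { f->refcount--; }

/* Release hook for file objects. */
void release_file(struct objref *obj)
{
	fake_fput((struct fake_file *)obj->ptr);
}

const struct obj_ops file_ops = {
	.type = OBJ_FILE,
	.release = release_file,
};

/* Record an object; the caller has already taken its own reference. */
void objref_add(struct objref *slot, void *ptr, const struct obj_ops *ops)
{
	slot->ptr = ptr;
	slot->ops = ops;
}

/* Drop an entry by calling the type's release hook. */
void objref_drop(struct objref *slot)
{
	slot->ops->release(slot);
	slot->ptr = NULL;
}
```

A caller would do fake_get_file(f) and then objref_add(slot, f, &file_ops), so the reference being taken stays visible where the object is added.
]]>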

</description>
    <dc:creator>Dave Hansen</dc:creator>
    <dc:date>2008-12-02T01:31:08</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.kernel.containers/8347">
    <title>Re: [PATCH 1/3] cgroups: make root_list contains active hierarchies only</title>
    <link>http://permalink.gmane.org/gmane.linux.kernel.containers/8347</link>
    <description>
Why is that change an improvement? If you're concerned about the
comment's accuracy then it would be better to just change it to say "A
list running through all hierarchies (including the inactive subsystem
hierarchy)"?

Also, with this change, "root_count" is no longer the length of the
"roots" list, but instead is length + 1, which doesn't seem good.

Paul

</description>
    <dc:creator>Paul Menage</dc:creator>
    <dc:date>2008-12-02T01:31:10</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.kernel.containers/8346">
    <title>Re: [RFC v10][PATCH 09/13] Restore open file descriprtors</title>
    <link>http://permalink.gmane.org/gmane.linux.kernel.containers/8346</link>
    <description>
We also need to document that we depend on this reference in the hash to
keep the object around.  Take a look at cr_read_fd_data().  Once that
cr_attach_file() has been performed, the only thing keeping the 'file'
around is the hash reference.  If someone happened to remove it from the
hash, the vfs_llseek() below would be bogus.

I don't know how we document that the hash is one-way: writes only and
no later deletions.

</description>
    <dc:creator>Dave Hansen</dc:creator>
    <dc:date>2008-12-02T01:12:06</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.kernel.containers/8345">
    <title>Re: [PATCH 3/3] cgroups: introduce link_css_set() to remove duplicate code</title>
    <link>http://permalink.gmane.org/gmane.linux.kernel.containers/8345</link>
    <description>
Overall the change looks like an improvement, but:

- the new function could do with comments about the semantics of its
parameters, particularly tmp_cg_links.
- why are you renaming cgrp -&gt; root_cgrp in cgroup_get_sb()? That
seems like unnecessary churn.

Thanks,

Paul

</description>
    <dc:creator>Paul Menage</dc:creator>
    <dc:date>2008-12-02T00:20:42</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.kernel.containers/8344">
    <title>Re: [PATCH 1/2] cgroups: cleanup for dummy root</title>
    <link>http://permalink.gmane.org/gmane.linux.kernel.containers/8344</link>
    <description>
Right - iterating across the tasks in a cgroup for an unmounted
subsystem is a completely valid thing to do. I suppose it might make
sense to optimize that to just scan the tasklist and thread lists in
that case, to avoid the overhead of maintaining the cg links for the
unmounted subsystems, but you'd have to change the cgroup iterator
first, before you stop maintaining the links. Given that I suspect the
main overhead of maintaining the links is taking css_set_lock, I'm not
sure how much it hurts to be keeping the links for the inactive
hierarchies too and keeping the code simpler.

Paul
</description>
    <dc:creator>Paul Menage</dc:creator>
    <dc:date>2008-12-02T00:15:47</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.kernel.containers/8343">
    <title>Re: [RFC v10][PATCH 08/13] Dump open file descriptors</title>
    <link>http://permalink.gmane.org/gmane.linux.kernel.containers/8343</link>
    <description>
My guess is that the mm is probably ok here, but we'll need some work on
the vfs structures, at least eventually.

The mm is nice in that it has -&gt;count separated from -&gt;users.  We can
easily compare the sum of mm-&gt;users to the number of tasks we've frozen
since mm-&gt;users has nicely defined behavior.

But, we've got suckers like proc_fd_info() that do a
get/put_files_struct().  So, somebody just doing lots of looks in /proc
could stop us from ever checkpointing.  Unfortunately, I know of a
number of "monitoring" programs that do just that. :)

I guess we can use the plain counts for now, and add something like
mm-&gt;users for the vfs structures in a bit.  
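<![CDATA[

A rough user-space sketch of the "compare the user count to the tasks we've frozen" check, under stated assumptions: shared_res, frozen_task and all_users_frozen are hypothetical names, and 'users' is a plain integer standing in for something like mm->users. It is only a model of the accounting, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a shared resource and its user count. */
struct shared_res {
	int users;	/* references held by tasks, in the spirit of mm->users */
};

/* Hypothetical model of a frozen task that points at a shared resource. */
struct frozen_task {
	const struct shared_res *res;
};

/*
 * Return 1 if every user of 'res' is one of the frozen tasks, and 0 if
 * some reference cannot be accounted for (where a checkpoint would have
 * to be refused or retried).
 */
int all_users_frozen(const struct shared_res *res,
		     const struct frozen_task *tasks, size_t ntasks)
{
	int seen = 0;
	size_t i;

	for (i = 0; i < ntasks; i++)
		if (tasks[i].res == res)
			seen++;
	return seen == res->users;
}
```

The /proc case above is exactly where this model breaks down for a plain count: a transient extra reference makes the comparison fail even though no task shares the resource.
]]>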

</description>
    <dc:creator>Dave Hansen</dc:creator>
    <dc:date>2008-12-01T21:25:45</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.kernel.containers/8342">
    <title>Re: [RFC v10][PATCH 08/13] Dump open file descriptors</title>
    <link>http://permalink.gmane.org/gmane.linux.kernel.containers/8342</link>
    <description>

Dave Hansen wrote:

It doesn't matter if the fdtable changes while we scan the fd's or
after we're done scanning the fd's, does it ?  the end result is the
same - inconsistency.


Again, say we get all the struct file for all those fd's. Now we have to
drop the lock, and start processing the struct files. So what if at this
point the fdtable changes ?  we still end up with potentially inconsistent
state.

We can do it atomically or non-atomically, I couldn't care less.

My argument is the following:
1) Ideally, the state should not change while we are checkpointing
2) If the state changes, then the behavior is undefined (but the kernel
   should not crash, of course).
3) It should be the userspace responsibility to ensure that (1) holds
   (for instance, that all tasks sharing these resources are frozen
   during the checkpoint)

#2 will be an uncommon event - a programmer/user/administrator mistake.
Kernel may help enforce this rule, but catching all such cases will be
expensive as it means looping through all tasks for each new resource.

As a first step, we can ensure that all threads of a process are frozen
at checkpoint time (and prevent unfreeze until checkpoint is over). But
that will not cover a clone() with CLONE_FS. And this will be the same
for other namespaces later.

Oren.
</description>
    <dc:creator>Oren Laadan</dc:creator>
    <dc:date>2008-12-01T21:20:05</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.kernel.containers/8340">
    <title>Re: [RFC v10][PATCH 09/13] Restore open file descriprtors</title>
    <link>http://permalink.gmane.org/gmane.linux.kernel.containers/8340</link>
    <description>
That's kinda (and by kinda I mean really) disgusting.  Hiding that two
levels deep in what is effectively the hash table code where no one will
ever see it is really bad.  It also makes you lazy thinking that the
hash code will just know how to take references on whatever you give to
it.

I think cr_obj_ref_grab() is hideous obfuscation and needs to die.
Let's just do the get_file() directly, please.

</description>
    <dc:creator>Dave Hansen</dc:creator>
    <dc:date>2008-12-01T21:07:31</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.kernel.containers/8339">
    <title>Re: [RFC v10][PATCH 08/13] Dump open file descriptors</title>
    <link>http://permalink.gmane.org/gmane.linux.kernel.containers/8339</link>
    <description>

On Mon, 1 Dec 2008, Dave Hansen wrote:

Umm, why do we even worry about this?

Wouldn't it be much better to make sure that all other threads are 
stopped before we snapshot, and if we cannot account for some thread (ie 
there's some elevated count in the fs/files/mm structures that we cannot 
see from the threads we've stopped), just refuse to dump.

There is no sane dump from a multi-threaded app that shares resources 
without that kind of serialization _anyway_, so why even try?

In other words: any races in dumping are fundamental _bugs_ in the dumping 
at a much higher level. There's absolutely no point in trying to make 
something like "dump open fd's" be race-free, because if there are other 
people that are actively accessing the 'files' structure concurrently, you 
had a much more fundamental bug in the first place!

So do things more like the core-dumping does: make sure that all other 
threads are quiescent first!

Linus
</description>
    <dc:creator>Linus Torvalds</dc:creator>
    <dc:date>2008-12-01T21:02:09</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.kernel.containers/8338">
    <title>Re: [RFC v10][PATCH 09/13] Restore open file descriprtors</title>
    <link>http://permalink.gmane.org/gmane.linux.kernel.containers/8338</link>
    <description>

Dave Hansen wrote:

When a shared object is inserted into the hash we automatically take another
reference to it (according to its type) for as long as it remains in the
hash. See:  'cr_obj_ref_grab()' and 'cr_obj_ref_drop()'.  So by moving that
call higher up, we protect the struct file.

Oren.
</description>
    <dc:creator>Oren Laadan</dc:creator>
    <dc:date>2008-12-01T21:00:54</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.kernel.containers/8337">
    <title>Re: [RFC v10][PATCH 05/13] Dump memory address space</title>
    <link>http://permalink.gmane.org/gmane.linux.kernel.containers/8337</link>
    <description>

Dave Hansen wrote:

True.
(while adapting older and safer code I omitted these tests for no reason).


True.

Should change the type of ctx-&gt;vfsroot to not be a pointer, and do:


and adjust accordingly in where the refcount is dropped.

What we need here is a reference point (this will change later when we handle
multiple fs-namespaces), which is the path of the "container root". Assuming
locking is correct so that current-&gt;fs does not change under us, it's enough
to get that path and later release that path.

BTW, the current-&gt;fs is assumed to not change during the checkpoint; if it does,
then it's a misuse of the checkpoint interface, and the resulting behavior
is undefined - restart is not guaranteed to restore the exact old state even if
checkpoint succeeds.

Thanks,

Oren.
</description>
    <dc:creator>Oren Laadan</dc:creator>
    <dc:date>2008-12-01T20:57:09</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.kernel.containers/8336">
    <title>Re: [RFC v10][PATCH 09/13] Restore open file descriprtors</title>
    <link>http://permalink.gmane.org/gmane.linux.kernel.containers/8336</link>
    <description>
Is that sufficient?  It seems like we're depending on the fd's reference
to the 'struct file' to keep it valid in the hash.  If something happens
to the fd (like the other thread messing with it) the 'struct file' can
still go away.

Shouldn't we do another get_file() for the hash's reference?

</description>
    <dc:creator>Dave Hansen</dc:creator>
    <dc:date>2008-12-01T20:54:33</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.kernel.containers/8335">
    <title>Re: [RFC v10][PATCH 08/13] Dump open file descriptors</title>
    <link>http://permalink.gmane.org/gmane.linux.kernel.containers/8335</link>
    <description>
With the realloc() scheme, we have virtually no guarantees about how the
fdtable that we read relates to the source.  All that we know is that
the n'th fd was at this value at *some* time.

Using the scheme that I just suggested (and you evidently originally
used) at least guarantees that we have an atomic copy of the fdtable.

Why is this done in two steps?  It first grabs a list of fd numbers
which needs to be validated, then goes back and turns those into 'struct
file's which it saves off.  Is there a problem with doing that
fd-&gt;'struct file' conversion under the files-&gt;file_lock?

</description>
    <dc:creator>Dave Hansen</dc:creator>
    <dc:date>2008-12-01T20:51:19</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.kernel.containers/8334">
    <title>Re: [RFC v10][PATCH 09/13] Restore open file descriprtors</title>
    <link>http://permalink.gmane.org/gmane.linux.kernel.containers/8334</link>
    <description>

Dave Hansen wrote:

Indeed.


Correct. This call should move higher up, before the call to cr_attach_file()


The assumption about tasks being frozen and no additional sharing is generally
more strict, more likely to hold, and easier to enforce for the restart.

Besides the race pointed out above which would crash the kernel, the other races
are "ok" - if the user abuses the interface, then the results are "undefined"
(refer to my reply to "..PATCH 08/13] Dump open file descriptors").

Here, too, by "undefined" I mean that the restart syscall may fail, and if it
completes successfully the resulting set of tasks is not guaranteed to behave
correctly. In contrast, if the user uses the interface correctly (ensuring
that the assumption holds), then restart is guaranteed to succeed. Note that
even when the outcome is undefined, there are no security issues - all actions
are limited to what the initiating user can do.


Actually, the code alternates between "file" and "fd", in an attempt to reuse
existing code and not do things ourselves:

        ret = sys_fcntl(fd, F_SETFL, hh-&gt;f_flags &amp; CR_SETFL_MASK);
        if (ret &lt; 0)
                goto out;
        ret = vfs_llseek(file, hh-&gt;f_pos, SEEK_SET);
        if (ret == -ESPIPE)     /* ignore error on non-seekable files */
                ret = 0;

This is still safe: the file struct is protected with a reference count. If
the fd no longer points to the same struct file, then either it will fail
(e.g. if the fd is invalid) or the restart will eventually succeed but the
resulting state of the tasks will be incorrect (that is: undefined behavior).
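<![CDATA[

For illustration only, a user-space analogue of the "ignore -ESPIPE" step, using the POSIX lseek() call rather than vfs_llseek(). The helper name restore_pos is hypothetical; the sketch assumes the saved position is simply reapplied and that non-seekable descriptors are skipped.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Restore a saved file position on 'fd'. Returns 0 on success or a
 * negative errno value. ESPIPE (pipes, sockets, FIFOs) is ignored
 * because such descriptors have no position to restore.
 */
int restore_pos(int fd, off_t pos)
{
	if (lseek(fd, pos, SEEK_SET) == (off_t)-1) {
		if (errno == ESPIPE)
			return 0;	/* non-seekable: nothing to restore */
		return -errno;
	}
	return 0;
}
```

On a pipe this returns 0 even though the seek itself fails, while an invalid descriptor still reports an error.
]]>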

Oren.
</description>
    <dc:creator>Oren Laadan</dc:creator>
    <dc:date>2008-12-01T20:41:53</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.kernel.containers/8331">
    <title>Re: [RFC v10][PATCH 08/13] Dump open file descriptors</title>
    <link>http://permalink.gmane.org/gmane.linux.kernel.containers/8331</link>
    <description>

Dave Hansen wrote:

That goes for me too.


Yes, I assume that all tasks possibly sharing the table are frozen for
this to work "right"; by "right" I mean that if checkpoint completes
then a matching restart would complete successfully.

Verifying that the size doesn't change does not ensure that the table's
contents remained the same, so we can still end up with obsolete data.
For that, you'll need to take the lock over a very long period of time,
while you capture and write data about the files.

So back to the assumption: the idea is that if the user does not adhere
to this assumption - ensuring that all tasks are frozen and that there
is no sharing otherwise - then the results are undefined (just as, e.g.
not calling execve() immediately after vfork() gives undefined results).

By "undefined" I mean that it may produce a checkpoint image that can't
be restarted, or fail. If the assumption doesn't hold, then the table may
change, and we will get a list of fd's which is "incorrect". The comment
before this function explicitly says that "The caller must validate the
file descriptors...", so the checkpoint may fail subsequently, e.g.
while trying to access a bad fd. Otherwise, the image file will reflect
a state that is inconsistent, and restart will fail.

Note that in both cases, the code will not crash the kernel (unless you
think I missed something...)

(BTW, the alternative would be to test for such sharing and abort with
an error. The only way I can think of is to loop through all the
tasks, for each files_struct that we find. But we don't really need to,
given the argument above).


Lol .. that was actually in my original post, and changed in response to
comments that explicitly asked to not count in advance :o

Actually, I think the current code is cleaner, and I don't see how counting
in advance with the lock gives us any advantage here.


Not anymore. Indeed ugh...

Thanks,

Oren.

</description>
    <dc:creator>Oren Laadan</dc:creator>
    <dc:date>2008-12-01T20:23:57</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.kernel.containers/8328">
    <title>Re: [RFC v10][PATCH 09/13] Restore open file descriprtors</title>
    <link>http://permalink.gmane.org/gmane.linux.kernel.containers/8328</link>
    <description>
You're just saying to flip the get_file() and fd_install()?


Ahhh.  We're depending on the 'struct file' reference that comes from
the fd table.  That's why there is supposedly "no need to cleanup 'file'
below".  But, some other thread can come along and close() the fd, which
will __fput() our poor 'struct file' and will make it go away.  Next
time we go and pull it out of the hash table, we go boom.

As a quick fix, I think we can just take another get_file() here.  But,
as Al notes, there are some much larger issues that we face with the
fd_table and multi-thread access.  They haven't "mattered" to us so far
because we assume everything is either single-threaded or frozen.
Sounds like Al isn't comfortable with this being integrated until a much
more detailed look has been taken.


One of the things about this that bothers me is that it shares too
little with existing VFS code.  It calls into a ton of existing stuff
but doesn't refactor anything that is currently there.  Surely there are
some common bits somewhere in the VFS that could be consolidated here.  

</description>
    <dc:creator>Dave Hansen</dc:creator>
    <dc:date>2008-12-01T19:22:04</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.kernel.containers/8327">
    <title>[PATCH] user namespaces: require cap_set{ug}id for CLONE_NEWUSER</title>
    <link>http://permalink.gmane.org/gmane.linux.kernel.containers/8327</link>
    <description>thoughts?  (patch is on top of
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6.git#next)

thanks,
-serge

Subject: [PATCH] user namespaces: require cap_set{ug}id for CLONE_NEWUSER

While ideally CLONE_NEWUSER will eventually require no
privilege, the required permission checks are currently
not there.  As a result, CLONE_NEWUSER has the same effect
as a setuid(0)+setgroups(1,"0").  While we already require
CAP_SYS_ADMIN, requiring CAP_SETUID and CAP_SETGID seems
appropriate.

Signed-off-by: Serge E. Hallyn &lt;serue-r/Jw6+rmf7HQT0dZR+AlfA&lt; at &gt;public.gmane.org&gt;

---

 kernel/fork.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

32c36be0621dba3bf05af3d2df843ce803d25831
diff --git a/kernel/fork.c b/kernel/fork.c
index 1dd8945..e3a85b3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1344,7 +1344,8 @@ long do_fork(unsigned long clone_flags,
 /* hopefully this check will go away when userns support is
  * complete
  */
-if (!capable(CAP_SYS_ADMIN))
+if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SETUID) ||
+!capable(CAP_SETGID))
 return -EPERM;
 }
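<![CDATA[

As a sketch of what the combined check amounts to, here is a user-space model. The SK_CAP_* bit values and the function newuser_permitted are hypothetical and chosen for this example only; they are not the kernel's capability numbers or API.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical capability bits for this sketch (not the kernel's values). */
enum {
	SK_CAP_SYS_ADMIN = 1 << 0,
	SK_CAP_SETUID    = 1 << 1,
	SK_CAP_SETGID    = 1 << 2,
};

/* Return 0 if all three capabilities are held, -EPERM otherwise. */
int newuser_permitted(unsigned int caps)
{
	const unsigned int needed =
		SK_CAP_SYS_ADMIN | SK_CAP_SETUID | SK_CAP_SETGID;

	return (caps & needed) == needed ? 0 : -EPERM;
}
```

Holding only the admin bit is no longer enough in this model, which mirrors the intent of the patch.
]]>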
 
</description>
    <dc:creator>Serge E. Hallyn</dc:creator>
    <dc:date>2008-12-01T18:52:15</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.kernel.containers/8326">
    <title>Re: [RFC v10][PATCH 02/13] Checkpoint/restart: initial documentation</title>
    <link>http://permalink.gmane.org/gmane.linux.kernel.containers/8326</link>
    <description>
Sadly, no.  I'm trying to convince Oren that this is important, but I've
been unable to do so thus far.  I'd appreciate any particularly
pathological cases you can think of.

There are two cases here that worry me.  One is the case where we're
checkpointing a container that has been off in its own mount namespace
doing bind mounts to its little heart's content.  We want to checkpoint
and restore the sucker on the same machine.  In this case, we almost
certainly want the kernel to be doing the restoration of the binds.

The other case is when we're checkpointing, and moving to a completely
different machine.  The new machine may have a completely different disk
layout and *need* the admin doing the sys_restore() to set up those
binds differently because of space constraints or whatever.

For now, we're completely assuming the second case, where the admin
(userspace) is responsible for it, and punting.  Do you think this is
something we should take care of now, early in the process?


*Functionally* we know this design can work.  Practically, in Linux, I
have no freaking idea.  The hard part isn't getting it working, of
course.  The hard part is getting something that doesn't step on top of
everyone else's toes in the kernel and, at the same time, is actually
*maintainable*.

</description>
    <dc:creator>Dave Hansen</dc:creator>
    <dc:date>2008-12-01T18:15:32</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.kernel.containers/8325">
    <title>Re: [RFC v10][PATCH 05/13] Dump memory address space</title>
    <link>http://permalink.gmane.org/gmane.linux.kernel.containers/8325</link>
    <description>
Yeah, we do need at least a read_lock(&amp;current-&gt;fs-&gt;lock) to keep people
from chroot()'ing underneath us.


Absolutely.  I assume you mean get_fs_struct(current) instead of
path_get().

</description>
    <dc:creator>Dave Hansen</dc:creator>
    <dc:date>2008-12-01T18:00:12</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.kernel.containers/8324">
    <title>Re: [RFC v10][PATCH 08/13] Dump open file descriptors</title>
    <link>http://permalink.gmane.org/gmane.linux.kernel.containers/8324</link>
    <description>
First of all, thanks for looking at this, Al.  

I think Oren's assumption here is that all tasks possibly sharing the
table would be frozen.  I don't think that's a good assumption,
either. :)

This would be a lot safer and bulletproof if we size the allocation
ahead of time, take all the locks, then retry if the size has changed.

I think that will just plain work if we do this:


  kfree(fds);
  goto first_kmalloc_in_this_function;


Right?
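<![CDATA[

A user-space sketch of the "size ahead of time, lock, retry if the size changed" scheme, under stated assumptions: fd_model, model_lock, model_unlock and snapshot_fds are hypothetical names, the lock functions are no-ops standing in for a real lock, and malloc()/free() replace the kernel allocator. It shows the shape of the retry loop only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the fd table. */
struct fd_model {
	int *fds;
	size_t count;
};

/* Placeholders for taking and releasing the table lock. */
void model_lock(struct fd_model *t)   { (void)t; }
void model_unlock(struct fd_model *t) { (void)t; }

/*
 * Copy the table into a freshly allocated array. The size is read first,
 * the buffer is allocated without the lock held, and the size is checked
 * again under the lock; on a mismatch the buffer is freed and the whole
 * attempt is retried. Returns the copy (caller frees) and stores its
 * length in *out_count, or returns NULL if allocation fails.
 */
int *snapshot_fds(struct fd_model *t, size_t *out_count)
{
	for (;;) {
		size_t n = t->count;	/* size estimate, taken unlocked */
		int *copy = malloc((n ? n : 1) * sizeof(*copy));

		if (!copy)
			return NULL;

		model_lock(t);
		if (t->count == n) {
			if (n)
				memcpy(copy, t->fds, n * sizeof(*copy));
			model_unlock(t);
			*out_count = n;
			return copy;
		}
		model_unlock(t);
		free(copy);		/* size changed: retry */
	}
}
```

The copy is consistent with the table at the moment the lock was held, but nothing here stops the table from changing once the lock is dropped.
]]>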


Ugh, that certainly doesn't have any place here.  I do wonder if Oren
had some use for that in the fully put together code, but it can
certainly go for now.

I'll send patches for these shortly.

</description>
    <dc:creator>Dave Hansen</dc:creator>
    <dc:date>2008-12-01T17:47:25</dc:date>
  </item>
  <textinput rdf:about="http://search.gmane.org/?group=$group=gmane.linux.kernel.containers">
    <title>Search Engine</title>
    <description>Search the mailing list at Gmane</description>
    <name>query</name>
    <link>http://search.gmane.org/?group=$group=gmane.linux.kernel.containers</link>
  </textinput>
</rdf:RDF>
