<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel about="http://blog.gmane.org/gmane.comp.version-control.arch.devel">
    <title>gmane.comp.version-control.arch.devel</title>
    <link>http://blog.gmane.org/gmane.comp.version-control.arch.devel</link>
    <description/>
    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>1</syn:updateFrequency>
    <syn:updateBase>1901-01-01T00:00+00:00</syn:updateBase>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1509"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1507"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1482"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1481"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1478"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1477"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1475"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1472"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1471"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1459"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1458"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1456"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1449"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1446"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1441"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1440"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1438"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1420"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1416"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1414"/>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://gmane.org/img/gmane-25t.png"/>
    <textinput rdf:resource=""/>
  </channel>
  <image rdf:about="http://gmane.org/img/gmane-25t.png">
    <title>Gmane</title>
    <url>http://gmane.org/img/gmane-25t.png</url>
    <link>http://gmane.org</link>
  </image>
  <item rdf:about="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1509">
    <title>Showing ancestor in 3 way merge.</title>
    <link>http://comments.gmane.org/gmane.comp.version-control.arch.devel/1509</link>
    <description>Hi,

In thelove&lt; at &gt;canonical.com/bazaar--devo--1.5--patch-20, I /tried/ to
implement a --show-ancestor option for merge commands.

The idea is that the conflict markers should be of the form

&lt;&lt;&lt;&lt;&lt;&lt;&lt; TREE
lines from TREE
||||||| ANCESTOR
lines from ANCESTOR
=======
lines from MERGE-SOURCE
&gt;&gt;&gt;&gt;&gt;&gt;&gt; MERGE-SOURCE

This would be quite useful in the case of a small patch applied within
a heavily modified portion of a file (TREE and ANCESTOR do not differ
much, but ANCESTOR and MERGE-SOURCE have many differences).

The way I've implemented it was to remove the "-E" option in the call
to diff3. This works fine in the case where TREE and MERGE-SOURCE do
not have the same modifications applied. Unfortunately, it also
reports a conflict when TREE and MERGE-SOURCE are identical and
different from ANCESTOR.

Does anybody know the best way to implement what I'm looking for? I
didn't find an option doing this in diff3. Either it reports too many
conflicts or it doesn't show the ancestor in the conflict markers.
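
A minimal sketch of the per-hunk behavior described above, written in
Python purely as an illustration (it is not diff3 or baz code, and
merge_hunk is a hypothetical name). It assumes the three versions of a
hunk are already available as lists of lines, treats an identical
change on both sides as a clean merge, and emits markers, ancestor
included, only for a real conflict:

```python
def merge_hunk(tree, ancestor, other):
    """Resolve one hunk of a three-way merge.

    Each argument is the list of lines one side has for the hunk.
    Returns (lines, is_conflict).
    """
    if tree == other:
        # Both sides made the same change (or none): not a conflict.
        return list(tree), False
    if tree == ancestor:
        # Only MERGE-SOURCE changed the hunk: take its lines.
        return list(other), False
    if other == ancestor:
        # Only TREE changed the hunk: keep its lines.
        return list(tree), False
    # Real conflict: show all three versions, ancestor included.
    # chr(60) and chr(62) are the two angle brackets, spelled this way
    # only so the snippet can be embedded in XML without escaping.
    opening, closing = chr(60) * 7, chr(62) * 7
    marked = (
        [opening + " TREE"] + list(tree)
        + ["|" * 7 + " ANCESTOR"] + list(ancestor)
        + ["=" * 7] + list(other)
        + [closing + " MERGE-SOURCE"]
    )
    return marked, True
```

The first test is the case that goes wrong when "-E" is simply
dropped: TREE and MERGE-SOURCE are identical but differ from ANCESTOR.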

Thanks for your help.

</description>
    <dc:creator>Matthieu Moy</dc:creator>
    <dc:date>2005-08-07T18:47:03</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1507">
    <title>important web site updates</title>
    <link>http://comments.gmane.org/gmane.comp.version-control.arch.devel/1507</link>
    <description>
I've made an important change (first link) to 

   http://www.gnuarch.org

and

   http://www.seyza.com


Thanks,
-t




</description>
    <dc:creator>Thomas Lord</dc:creator>
    <dc:date>2005-08-05T19:04:33</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1482">
    <title>2.0, history, and merging (was Re: [Gnu-arch-users] arch 2.0 and delta compression)</title>
    <link>http://comments.gmane.org/gmane.comp.version-control.arch.devel/1482</link>
    <description>
Leaving aside the burning questions of whether the
"codeville merge algorithm" is desirable and, for
that matter, exactly what it is ...

I'll try here to briefly fill in developers on 
what's in revc re ancestry-for-merging tracking.

This is a good point in time, btw, to influence
the final design before it is locked down.

Here are things you need to know about ancestry in
revc, starting with some background:

* no more {arch}

  Revc does away entirely with the `{arch}' mechanism.
  It has no "in-tree" meta-data whatsoever.

  Revc *does* create a `.revc' directory at the root
  of a checked out tree but (a) it does not contain
  a patch log and (b) it is not a "regular directory" --
  it is not part of the inventory of the tree;  it is
  not ever blindly modified by merge algorithms; etc.

  The complete ancestry of a tree (see below) is available
  locally, in the .revc dir.   The complete ancestry of
  a tree can be retrieved from its commit record in "O(1)"
  which means that you can obtain it by opening a file
  and decompressing its contents.


* four pieces of metadata

  Each revision in revc has exactly four pieces of 
  metadata, stored both in each commit record and
  in each .revc directory.  These are:

     manifest -- a list (by path) of files in the tree
     ignored -- a list (by path) of files expected to
                appear in working dirs but which are
                explicitly not "part of the source"
     prereqs -- a list (by commit id) of other
                commits needed in order to build 
                this tree from its own commit
     ancestry -- the history for merge commands (see below)

  The first three of those are not especially relevant 
  to this thread.


* the subtle structure of "fully qualified names"

  Forget everything you know about "fully qualified revision
  names" from 1.x.

  In revc, each commit has three names, each successively more
  specific (qualified).

  The "revision name" is a (nearly) arbitrary, user-selected 
  short string, naming the revision.   Multiple users are
  free to choose the same revision name for different trees.
  A single user is free to commit the same tree multiple times,
  with the same revision name (and to commit multiple trees
  with the same revision name).   In short, the revision name
  is just a "nickname" for a tree and tells you nothing formal
  about the contents of the tree it names:

         revc-0.1.1

  The "half-qualified revision name" is a revision name combined
  with a cryptographic hash summarizing both the contents of
  the tree and its merge history.  Two different commits can
  easily create revisions with the same "half-qualified revision 
  name".  The easy way to do that is to commit the same tree 
  contents with the same ancestry, twice -- while varying the
  "prereqs" of the commit.   The harder (but, let's assume eventually
  possible) way to do that is to commit trees that differ in contents
  or ancestry but that happen to wind up with the same checksum:

         revc-0.1.1/+&lt;tree-contents&amp;ancestry-SHA1&gt;

  Finally, the "fully-qualified revision name" is a revision
  name combined with a cryptographic hash summarizing both the 
  half-qualified name *and* the specific package of bits created
  by the commit.   It is probabilistically impossible for a user
  to create two revisions with the same fully-qualified revision
  name casually;  while serious analysis is needed, there are 
  at least good intuitive reasons to believe these will be very hard
  to forge.   The intention is that even very loosely coupled
  communities can rely on the uniqueness of fully-qualified names:

         revc-0.1.1/&lt;commit-data-SHA1&gt;+&lt;tree-contents&amp;ancestry-SHA1&gt;

  Revc, like git (and all crypto-based systems), is thus "trust based"
  but unlike git, revc *tries* to set a very high bar (minimize the
  amount of trust needed).   That's where some expert (not speculative)
  crypto analysis would help:  just how hard to forge do we think
  these fully qualified names will be?  (AFAICT, if SHA1 fell to the
  same state MD5 is currently in, the trustworthiness of a community
  relying on git will be all they have left, short of adding new 
  mechanisms.)
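
A rough sketch of how the three levels of naming relate, in Python.
Only the name layout is taken from the description above; the
function names are hypothetical, and feeding SHA1 a plain
concatenation of bytes is an assumption made for illustration, not
revc's actual serialization:

```python
import hashlib


def half_qualified(name, tree_bytes, ancestry_bytes):
    """Revision name plus a hash of tree contents and ancestry."""
    digest = hashlib.sha1(tree_bytes + ancestry_bytes).hexdigest()
    return name + "/+" + digest


def fully_qualified(name, tree_bytes, ancestry_bytes, commit_bytes):
    """Adds a hash over the half-qualified name and the commit's own bits."""
    half = half_qualified(name, tree_bytes, ancestry_bytes)
    tree_digest = half.rsplit("/+", 1)[1]
    commit_digest = hashlib.sha1(half.encode("utf-8") + commit_bytes).hexdigest()
    return name + "/" + commit_digest + "+" + tree_digest
```

Committing the same tree and ancestry twice with different commit
data gives the same half-qualified name but two different
fully-qualified names, which is the property described above.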


* ancestry

  [n.b.: I lifted (but probably modified, so don't blame them)
   this mechanism indirectly from either baz-ng or the fertile
   mind of abentley.  (Not sure which, actually.  It comes out
   of some random IRC chat.)]

  Ancestry in revc is a two level list.   All entries in the list
  are *half-qualified revision names*.  The upper level of the 
  list is the direct lineage of the tree.   The second level of
  the list consists of trees "merged in".   A pair of decimal numbers
  is used to describe the structure of the two-level list:

        2.0 revc-0.1.1/+&lt;SHA1&gt;      # 2nd revision since import
        2.1 revc-0.1.0be/+&lt;SHA1&gt;    # merged in big-endian support
        2.2 revc-0.1.0joe/+&lt;SHA1&gt;   # merged from joe
        1.0 revc-0.1.0/+&lt;SHA1&gt;      # call this the first "import"
        0.0 nil/+&lt;SHA1&gt;             # the universal ancestor

   The direct ancestry there is:

        revc-0.1.1  revc-0.1.0  nil

   and, as noted, revc-0.1.1 merged in from two other trees.
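
To make the numbering concrete, here is a small Python sketch that
reads the two-level list back. The in-memory representation (pairs of
index string and name, with the hashes left out) and the helper names
are assumptions for illustration, not revc's on-disk format:

```python
# The example list from above, newest first, hashes omitted.
ANCESTRY = [
    ("2.0", "revc-0.1.1"),
    ("2.1", "revc-0.1.0be"),
    ("2.2", "revc-0.1.0joe"),
    ("1.0", "revc-0.1.0"),
    ("0.0", "nil"),
]


def direct_lineage(entries):
    """Upper level: entries whose second number is 0."""
    return [name for index, name in entries if index.split(".")[1] == "0"]


def merged_into(entries, major):
    """Second level: trees merged into the revision numbered `major`."""
    return [
        name
        for index, name in entries
        if index.split(".")[0] == str(major) and index.split(".")[1] != "0"
    ]
```

With this data, direct_lineage(ANCESTRY) yields the three names in
the direct ancestry and merged_into(ANCESTRY, 2) yields the two
merged trees.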

   Whenever you `commit', the generated commit record contains the 
   complete ancestry of the newly created tree.  On disk and for
   transport, the list is compressed (and, as you can see, should
   compress well in excess of the usual 50% rule of thumb).

   Uncompressed, these lists are still of reasonable size even if
   they contain 10s of thousands of entries.  It is not clear that
   their growth rate will *ever* require "history pruning" but,
   if so, at least that point in time is quite a ways off.  A list
   with 100K entries would, today, not be unreasonable to store in
   memory (even perhaps iterating over it and indexing it) in most
   environments.  (By contrast, 10K patch-log entries in 1.x is 
   already a pretty uncomfortable amount.)


-t

p.s.:

* background

  Recall that in the 1.x family, the history of a 
  tree is (for most purposes) implicit in a "patch log"
  which is stored in each tree (all those deeply nested
  files under `{arch}').   Some merge commands care
  only about which revisions are present in the patch log;
  I have heard that some Baz commands instead trace ancestry via
  the headers in log messages.

  The distinction between the two approaches could be
  described as a difference between intrinsic and extrinsic
  history:  the set of logs is an extrinsic history -- a
  user-editable assertion of ancestry which does not have
  to correspond to the literal history of commits and 
  merges (although it does by default);  the traced ancestry
  is an intrinsic ancestry -- irrevocably determined (modulo
  weaknesses in the implementation) by the actual history
  of merges and commits.

  This general arrangement has received much and multi-faceted
  criticism, a substantial part of which is reasonable.  For
  example, in-tree patch-logs can grow quite large over time
  and become a performance issue.  Another example: some users,
  especially those using fairly anarchic patch flow, find the
  conflation of revision name order with revision history
  problematic.

  Solutions to those issues within the framework of 1.x
  create roughly as many new problems as they solve old 
  problems.   For example, large patch logs make "log pruning"
  a more critical issue but advocates of log pruning tend
  to come up with solutions that add new mechanisms to the
  code and new semantics to revisions -- some of the more 
  radical solutions involve things like extending the archive
  format (always a chore).   Another example: using intrinsic
  rather than extrinsic history, absent new kinds of caching
  or similar, modifies Arch client-to-archive access patterns
  substantially and in ways that violate assumptions made in the
  original design.   In general, we can observe people trying
  to solve these problems by "throwing more code at them" --
  never the happiest of choices.

* analyzing the 1.x problems

  The conflation of revision-name-order with history
  is certainly unfortunate.   Users demand freer-form
  names and increasingly we are learning of applications
  for which the graph topology of the existing namespace
  is inappropriate.   It is difficult to retain name-based
  extrinsic history while, at the same time, relaxing the
  namespace syntax and topological structure.

  It is also unfortunate that 1.x commits to keeping 
  in-tree log messages (the text typed by users for a log)
  using the *same mechanism* used to track history.
  Experience has shown that many (most?) log messages
  are useless -- supplied by the user only because it was
  required and conveying no useful information.  The storage
  mechanism, in order to allow log messages to be saved,
  uses a disk-hungry mechanism.


</description>
    <dc:creator>Thomas Lord</dc:creator>
    <dc:date>2005-07-26T19:30:08</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1481">
    <title>file-based history commands</title>
    <link>http://comments.gmane.org/gmane.comp.version-control.arch.devel/1481</link>
    <description>Hi all,

We discussed some time ago the ability to see which revisions
modified a particular file.

I'm hacking on this for baz at the moment. The
file-based-history-aware "logs" command is more or less operational.

Here's a demo (the file used to be scp-global.h and is now
pinapa-global.h):

$ baz logs -s -- include/scp-global.h
patch-12
    Use of namespaces in pinapa
patch-32
    Renamed pinapa source files (scp -&gt; pinapa)
patch-33
    Renamed all macros and autoconf variables prefixed by SCP
patch-34
    Updated Copyrights in headers (use LGPL)

(=&gt; works fine if you specify the original name)


$ baz logs -s -- include/pinapa-global.h
patch-32
    Renamed pinapa source files (scp -&gt; pinapa)
patch-33
    Renamed all macros and autoconf variables prefixed by SCP
patch-34
    Updated Copyrights in headers (use LGPL)


(=&gt; follows renaming from the point the file has the specified name)

$ baz logs -s -r -- include/pinapa-global.h
patch-34
    Updated Copyrights in headers (use LGPL)
patch-33
    Renamed all macros and autoconf variables prefixed by SCP
patch-32
    Renamed pinapa source files (scp -&gt; pinapa)
patch-12
    Use of namespaces in pinapa


(=&gt; with -r, --reverse, you can provide the current name and baz will
follow renamings backwards)
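
The forward and reverse behavior shown above can be modeled with a
toy Python sketch. This is not the baz implementation: the patch
representation (name, set of touched paths, dict of renames) and both
function names are assumptions made only to illustrate how following
renames changes the result:

```python
def logs_forward(patches, path):
    """Walk patches oldest first, following renames of `path` forwards."""
    hits = []
    for name, touched, renames in patches:
        if path in renames:
            # The file is renamed by this patch: report it, follow the new name.
            hits.append(name)
            path = renames[path]
        elif path in touched or path in renames.values():
            hits.append(name)
    return hits


def logs_reverse(patches, path):
    """Walk patches newest first, following renames of `path` backwards."""
    hits = []
    for name, touched, renames in reversed(patches):
        inverse = {new: old for old, new in renames.items()}
        if path in inverse:
            # The file got its current name here: report it, follow the old name.
            hits.append(name)
            path = inverse[path]
        elif path in touched:
            hits.append(name)
    return hits


OLD, NEW = "include/scp-global.h", "include/pinapa-global.h"
PATCHES = [
    ("patch-12", {OLD}, {}),
    ("patch-32", set(), {OLD: NEW}),
    ("patch-33", {NEW}, {}),
    ("patch-34", {NEW}, {}),
]
```

On this data, the forward walk from the old name and the reverse walk
from the new name both visit all four patches, while the forward walk
from the new name starts at patch-32, matching the demo.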

$ baz logs -s -r -- include/pinapa-global.h configure.ac | head
patch-52
    Less strict regexp in the parsing of $CXX --version
patch-43
    Array management in Pinapa [Matthieu.Moy&lt; at &gt;imag.fr--pinapa (patch 0-4)]
patch-35
    A few more renaming (scp -&gt; pinapa)
patch-34
    Updated Copyrights in headers (use LGPL)
patch-33
    Renamed all macros and autoconf variables prefixed by SCP


(=&gt; you can provide as many files as you want on the command line)


Any feedback on the implementation and/or the user interface is
appreciated.

You can find the version on which I'm hacking here:

Matthieu.Moy&lt; at &gt;imag.fr--public/bazaar--file-history--1.5
  Located at: http://www-verimag.imag.fr/~moy/arch/public

</description>
    <dc:creator>Matthieu Moy</dc:creator>
    <dc:date>2005-07-26T16:57:59</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1478">
    <title>[BAZ] build on Mac 10.4 fails with -Werror</title>
    <link>http://comments.gmane.org/gmane.comp.version-control.arch.devel/1478</link>
    <description/>
    <dc:creator>John A Meinel</dc:creator>
    <dc:date>2005-07-25T22:58:24</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1477">
    <title>[BAZ][BUG] Incomplete ancestry mirror messes up "bazget"</title>
    <link>http://comments.gmane.org/gmane.comp.version-control.arch.devel/1477</link>
    <description/>
    <dc:creator>John A Meinel</dc:creator>
    <dc:date>2005-07-25T22:13:23</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1475">
    <title>[BAZ] lock-revision without parameters</title>
    <link>http://comments.gmane.org/gmane.comp.version-control.arch.devel/1475</link>
    <description/>
    <dc:creator>John A Meinel</dc:creator>
    <dc:date>2005-07-25T21:01:26</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1472">
    <title>[BUG] Neon 2.5 breaks at least bazaar.</title>
    <link>http://comments.gmane.org/gmane.comp.version-control.arch.devel/1472</link>
    <description>register-archive does a

              t_uchar * temp_location = escape_location (archive_name);

which itself does

t_uchar *
escape_location (t_uchar const *location)
{
    return ne_path_escape(location);
}

which on libneon 2.5 seems to transform "http://server.com" into
"/current/working/dir/http:/server.com".

Well, at least, a friend of mine had this strange behavior on MacOS
10.4.2, and downgrading to libneon 2.4 solved the problem.

</description>
    <dc:creator>Matthieu Moy</dc:creator>
    <dc:date>2005-07-20T09:35:29</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1471">
    <title>[BAZ] unit-ar test failure on Cygwin</title>
    <link>http://comments.gmane.org/gmane.comp.version-control.arch.devel/1471</link>
    <description/>
    <dc:creator>John A Meinel</dc:creator>
    <dc:date>2005-07-19T19:06:14</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1459">
    <title>Travelling to Brazil</title>
    <link>http://comments.gmane.org/gmane.comp.version-control.arch.devel/1459</link>
    <description/>
    <dc:creator>Robert Collins</dc:creator>
    <dc:date>2005-07-15T14:32:26</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1458">
    <title>attn Tom - hackerlab bug</title>
    <link>http://comments.gmane.org/gmane.comp.version-control.arch.devel/1458</link>
    <description/>
    <dc:creator>Robert Collins</dc:creator>
    <dc:date>2005-07-15T10:21:54</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1456">
    <title>CVS to arch</title>
    <link>http://comments.gmane.org/gmane.comp.version-control.arch.devel/1456</link>
    <description/>
    <dc:creator>John A Meinel</dc:creator>
    <dc:date>2005-07-14T21:10:13</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1449">
    <title>Branch API for baz</title>
    <link>http://comments.gmane.org/gmane.comp.version-control.arch.devel/1449</link>
    <description/>
    <dc:creator>Robert Collins</dc:creator>
    <dc:date>2005-07-14T07:41:54</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1446">
    <title>Arch 2.0 second release</title>
    <link>http://comments.gmane.org/gmane.comp.version-control.arch.devel/1446</link>
    <description>  See http://www.seyza.com


  [[title
                         GNU Arch 2.0: `revc'
  ]]

  [[abstract
     /GNU Arch/ is a distributed, democratic, changeset-oriented,
     peer-to-peer revision control system.
  ]]


  [[blockquote
    The second (unstable) release of Arch 2.0 is ready.

    [[blockquote

        &lt;http://www.seyza.com/releases/revc-0.0x1.tar.gz&gt;

        /SHA1:/ `101e282242cfbcf83d38cca23b9e7ae60c579023'

        /Size:/ `1743053'
    ]]

    This release adds the `changes' command which prints
    a description of differences between a tree and its 
    immediate ancestor.


  ]]

  Revc is currently, tentatively, self-hosting; however, I've not yet
  officially published the self-hosting archive: some minor tweaks to
  archive format are still likely to be made in the near future.

  You can read about:

  [[blockquote
     /&lt;"brochure" -- quick-intro.html&gt;/ -- A quick introduction
     to the core of the `revc' command set.

     /&lt;"hacker's guide" -- hackers-guide.html&gt;/ -- A quick guide
     for hackers to the revc design and source code.
  ]]


* `revc' Highest Priority TODO Items

  The code is very clean but could use a light reorg, a careful
  review, dealing with all parts marked `dangle' or `debug'.

  The archive format needs to be modified before I publish my
  archive: it needs to use a slightly deeper directory structure
  and accommodate same-named commits with differing commit-ticket
  blobs.

  The commands need to be implemented: `ancestor', `merged', 
  `get-merge-trees', `prereqs', `init name', `name'.

  Some convenience command is needed that captures the idea of
  "branching" and "tagging".

  The ,es file built in hackerlab is called ,es.exe on windows.
  The makefile should be robust against that.  Scan the mailing
  list to find the other Windows issue(s) reported.
 
  
* `revc' Next Priority TODO Items

  Activate the "blob hint" optimization.

  Implement an inode-based file signature cache.

  Debug the support for partial commits.

  Improve the transactionality of working dirs.




* Copyright

  Copyright /(C)/ 2005 Tom Lord (`lord&lt; at &gt;emf.net')

  This program is free software; you can redistribute it and/or modify
  it under the terms of the /GNU General Public License/ as published by
  the Free Software Foundation; either version 2, or (at your option)
  any later version.

  This program is distributed in the hope that it will be useful,
  but \\/WITHOUT ANY WARRANTY/\\; without even the implied warranty of
  \\/MERCHANTABILITY/\\ or \\/FITNESS FOR A PARTICULAR PURPOSE/\\.  See
  the /GNU General Public License/ for more details.

  You should have received a copy of the /GNU General Public License/
  along with this software; see the file &lt;"`COPYING'" --
  $/COPYING.html&gt;.  If not, write to the Free Software Foundation, 675
  Mass Ave, Cambridge, MA 02139, USA.

  [[null
   ; arch-tag: Tom Lord Fri Jul  8 11:47:42 2005 (index.txt)
  ]]


</description>
    <dc:creator>Thomas Lord</dc:creator>
    <dc:date>2005-07-13T19:17:56</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1441">
    <title>Annotate postscript</title>
    <link>http://comments.gmane.org/gmane.comp.version-control.arch.devel/1441</link>
    <description>
I should add that the resulting design has a 
fairly large MasterAnnotate database on disk.
Small in terms of storage consumed but a pain
to back up or ship over the net.

Two things follow:

The API to Annotate should be abstract enough
to work using a MasterAnnotate DB stored remotely.
Probably this comes for free if the `commit'
storage manager is used as backing store but
the details are unclear to me.   In other words: if 
I mirror your archive but not your MasterAnnotate
db, I should still be able to annotate my branches
of your source, at least as long as I'm connected
to the net.

It is probably a good idea to extend the Annotate
definition with an "uncertain" or "ancient" output
value to cope with cases of partial information: 
clients may wind up using a MasterAnnotate resource
which knows about most but not all of the ancestors
in the subject of their query.

-t



</description>
    <dc:creator>Thomas Lord</dc:creator>
    <dc:date>2005-07-13T18:05:10</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1440">
    <title>How to Optimize for Annotate Calculations</title>
    <link>http://comments.gmane.org/gmane.comp.version-control.arch.devel/1440</link>
    <description>
Let's call trees "revisions".  Every revision 
has a unique name.   Every revision has a
list of earlier revisions which are called 
its "ancestors".

We'll assume that files have some kind of
unique identity (could be tags, could just
be by filename) such that we can recognize
the "same file" between two trees, even
if the contents of the file have changed.

If you compute a `diff' between a file and
the same file in the immediate ancestor tree,
all lines that the `diff' reports as unmodified
are considered to be inherited from the ancestors
and all modified or inserted lines are considered
to come from this tree.

So there is a function -- kind of like `annotate' but
not quite:

   Annotate_sorta (Revision, File, Line)
     -&gt; Ancestor_Line | Revision

which can be used to find out which lines are modified
in a given revision.  That function returns a corresponding
line number from the immediate ancestor file if the line
was not modified;  the name of the tree's revision if
the line has been modified.

The annotate function itself is then just:

   Annotate (Rev, File, Line)
    -&gt; X if X is a revision name
       otherwise Annotate (Immediate_ancestor (Rev), 
                           File, 
                           X)
        where
          X = Annotate_sorta (Rev, File, Line)

(Edge cases such as a file that is newly added are not
considered here.)
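
As an illustration only, the two functions can be sketched in Python
for a purely linear history. The signatures differ from the notation
above (the sketch passes line lists directly instead of a File), line
numbers are 0-based, Python's difflib stands in for `diff', and
treating every line of the first revision as introduced there is an
assumption added to terminate the recursion:

```python
import difflib


def annotate_sorta(rev_name, old_lines, new_lines, line):
    """Return the ancestor's line number if `line` is unmodified,
    otherwise the name of this revision."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    for a, b, size in matcher.get_matching_blocks():
        if line in range(b, b + size):
            return a + (line - b)
    return rev_name


def annotate(history, rev, line):
    """history: list of (name, lines), oldest first; rev: index into it."""
    while True:
        name, new_lines = history[rev]
        if rev == 0:
            # Assumed edge case: nothing older to inherit from.
            return name
        old_lines = history[rev - 1][1]
        x = annotate_sorta(name, old_lines, new_lines, line)
        if isinstance(x, str):
            return x
        # Unmodified here: continue with the ancestor and its line number.
        rev, line = rev - 1, x
```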

Suppose that we have a dynamically growing list of revisions for
which we'd like to optimize Annotate computations.  How can we make
this fast yet economically space efficient?

Let's call the list of revisions AllRevs.   The value of AllRevs
grows over time but at any one time we can define:

    MasterAnnotate(File, Line)
     -&gt; list of Revisions, R, such that
        Annotate_sorta (R, File, Line) == R

Since MasterAnnotate doesn't rely on Annotate for its definition,
we can do the opposite: redefine Annotate in terms of MasterAnnotate

    Annotate (Rev, File, Line)
      -&gt; X where 
             X is in MasterAnnotate (File, Line)
         and X is in Ancestors (Rev) or X == Rev
         and X is the most recent such ancestor of Rev


The abstract function MasterAnnotate describes a database which 
maps file identifiers and line numbers to all revisions which
modified that line.   From that, the results of Annotate can
be quickly computed.
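
A minimal Python sketch of that lookup, under stated assumptions:
`master' is a dict from (file id, line) to the set of revisions that
modified the line, `ancestors' maps a revision to its ancestors
ordered most recent first, and line identity across revisions is
taken as given, as it is in the definitions above. All names are
hypothetical:

```python
def annotate_via_master(master, ancestors, rev, file_id, line):
    """Most recent of `rev` and its ancestors that modified the line."""
    candidates = master.get((file_id, line), set())
    for r in [rev] + list(ancestors[rev]):
        if r in candidates:
            return r
    # No indexed revision in this lineage touched the line.
    return None
```

For example, with master = {("f", 3): {"r1", "r3", "r9"}} and
ancestors = {"r4": ["r3", "r2", "r1"]}, the lookup for revision r4
skips r9 (not an ancestor) and picks r3 over the older r1.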

We have two kinds of events to consider:  one is when somebody
wants to lookup annotation data in a MasterAnnotate database;
the other is when somebody wants to add to the list of revisions
remembered in the MasterAnnotate database.

This suggests a memo, to me, if it is practical.  In other words:
keep a persistent database that records the complete results of
MasterAnnotate so far and that is cheap to incrementally update
whenever a new revision needs to be indexed.   The questions are:
can that database be small enough yet support fast access and 
can it support fast update?

Observe that, for every line number of every file, a MasterAnnotate
memo needs to store a list of revision names which is a (possibly
complete) sub-list of the list of all revisions monitored by the
database.   For example, the database could keep one bitset per
file line, where the bitsets have as many bits as there are 
revisions being indexed.

So: build a filesystem tree in which subdir and filenames are
used as keys mapping (File, Line) pairs to file contents
which are bitsets.   For example, maybe try to make it so that 
4k worth of bitsets are stored in "average" files.

To compute MasterAnnotate, read the right bitsets and read off
the list of revisions.
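
The bitset idea can be sketched in memory as follows. This is a Python
illustration, not a storage design: the class and method names are
hypothetical, a dict stands in for the files, and Python integers
stand in for the bitsets. Bits are set and tested with plain
arithmetic rather than shift and mask operators only so the snippet
needs no escaping in XML:

```python
class BitsetMemo:
    """Memo of MasterAnnotate: one bitset per (file id, line)."""

    def __init__(self):
        self.revisions = []  # bit i stands for self.revisions[i]
        self.bits = {}       # (file_id, line) to an int used as a bitset

    def add_revision(self, name, modified):
        """Index a new revision; `modified` lists the (file_id, line)
        pairs for which Annotate_sorta returns this revision."""
        i = len(self.revisions)
        self.revisions.append(name)
        for key in modified:
            self.bits[key] = self.bits.get(key, 0) | 2 ** i

    def master_annotate(self, file_id, line):
        """Read the bitset and list the revisions whose bit is set."""
        bits = self.bits.get((file_id, line), 0)
        return [
            name
            for i, name in enumerate(self.revisions)
            if bits // 2 ** i % 2 == 1
        ]
```

Adding a revision only touches the bitsets of the lines it modified,
which is what makes the incremental update cheap in this model.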

There is some algorithmic elegance-dancing to do to get all that 
right but my envelope calculations, at least, say that that 
will much less than double the storage requirements for expected-case
history and that both incremental update and MasterAnnotate operations
will be very cheap.   It would probably be not unreasonable to
impose it on the critical path of `commit' (for most people)
or to fork/daemonize incremental update or cron it (for others).

Now, just getting to the *good* part:

Building the suggested MasterAnnotate data structure on a unix
filesystem would be an exercise in tedium.   Don't do it.
Forget what I said about mapping (File, Line) onto filenames.

One doesn't want the annotate database just on local disk, anyway:
one wants it *in the archive*.

Therefore, invent a new kind of commit.  The new kind of commit
includes prereqs and a blob-db just like the regular kind of 
commit -- but doesn't have a commit-ticket or ticket.  Instead, 
it has an annotate-ticket.   And the blobs, instead of representing
directories and files, represent pages of your database of bitsets...

Map (File,Line) pairs into blob addresses.   For ordinary commits
blob addresses happen to be based on content-addressability but
the blob storage and retrieval logic doesn't depend on that.  
You can instead address blobs by "uuid", so to speak.  You can
sanely think of the blob stuff as just a very generic interface
to persistent memory with fairly large interobject references (blob
addresses).

Summary:

Build a secondary index which is basically a (virtually) huge
persistent bitset that describes the ancestry of each line in
each file, making it easy to incrementally update it for each
commit and quick to search it to compute Annotate.  This isn't
all that hard.   Use the same storage manager as is used for
ordinary commits as the backing store for this database, thereby
leveraging all the P2P foo, transaction support, etc.

-t

         



</description>
    <dc:creator>Thomas Lord</dc:creator>
    <dc:date>2005-07-13T17:58:06</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1438">
    <title>arch 2.0 and delta compression</title>
    <link>http://comments.gmane.org/gmane.comp.version-control.arch.devel/1438</link>
    <description>
weaves,

If people want the *space efficiency* of delta-compression
they are being over-picky: *any* kind of compression that
works across the boundaries of individual files will do.

It's interesting that you cite a second reason to want
not only delta-compression but weaves specifically:

the

I'm very skeptical of "the Codeville merge algorithm" for reasons
not terribly important here.   Let's stipulate, for the sake
of conversation, that it's the greatest thing since toast and,
additionally, that it imposes a moral imperative on revctl developers
to optimize-for-annotate.

Piece of cake.   If people are serious, we can perhaps make this
a 9-month goal for revc 2.5.

The difficulty, and why I'm not leaping to the cause on this one
at just this moment, is that optimize-for-annotate complicates
storage management by -- guessing -- 3x lines of code.  I.e., I
have maybe a few hundred lines for reading and writing blobs -- 
that would grow to a couple of thousand.   And trickier code, too.

Since I'm skeptical anything is actually gained, I've got higher
priorities but if you think it through, it's actually quite
easy (just tedious).

-t




</description>
    <dc:creator>Thomas Lord</dc:creator>
    <dc:date>2005-07-12T18:12:14</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1420">
    <title>GNU Arch 2.0 -- first source</title>
    <link>http://comments.gmane.org/gmane.comp.version-control.arch.devel/1420</link>
    <description/>
    <dc:creator>Thomas Lord</dc:creator>
    <dc:date>2005-07-08T20:57:46</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1416">
    <title>build-config, I need your opinion.</title>
    <link>http://comments.gmane.org/gmane.comp.version-control.arch.devel/1416</link>
    <description>Hi,

I've just added a --switch option to "baz build-config". The behavior
is the following:

* When a subdirectory doesn't exist, create it as usual (baz get)

* When a subdirectory already exists, run "baz switch" to set the
  version to the requested one.

As a result, "baz build-config --switch" looks like "baz update", but
for configurations: It preserves local changes and non-source files in
the project tree, but upgrades to the latest version available.

I'm thinking about making this the default and providing an option,
--backup or something like it, to go back to the old behavior.


Perhaps a "--update" running update (keeping the version) instead of
switch would be useful too. Typical use-case:

I have a local tree of bazaar, developing a feature in my own branch
for ./src/baz. It takes some time, and other people commit to the
master archive, both in baz and hackerlab in the meantime. I merge
from the master, and try to recompile, but figure out I need the
hackerlab modifications too. Then, I could run "baz build-config
--update" that would update every subproject, get the new ones, and
I'd get an up-to-date configuration, with my version instead of the
upstream in ./src/baz.
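
The per-subdirectory decision described above can be summarized in a
few lines of Python. This is only a sketch of the intended behavior:
plan_subdir is a hypothetical name, and it returns the name of the
baz operation rather than a real command line:

```python
def plan_subdir(exists, mode="switch"):
    """Pick the baz operation for one subdirectory of a configuration."""
    if not exists:
        # Missing subdirectory: fetch it as usual.
        return "get"
    if mode == "update":
        # Proposed --update: keep the current version, pull in new revisions.
        return "update"
    # --switch: move the existing tree to the requested version.
    return "switch"
```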


What do you think?

Thanks,

</description>
    <dc:creator>Matthieu Moy</dc:creator>
    <dc:date>2005-07-04T21:03:53</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1414">
    <title>directory closures in baz and tla 1.x</title>
    <link>http://comments.gmane.org/gmane.comp.version-control.arch.devel/1414</link>
    <description/>
    <dc:creator>Robert Collins</dc:creator>
    <dc:date>2005-07-04T05:28:54</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.version-control.arch.devel/1413">
    <title>FEATURE REQUEST: multiple ids per file</title>
    <link>http://comments.gmane.org/gmane.comp.version-control.arch.devel/1413</link>
    <description/>
    <dc:creator>Jonathan Geisler</dc:creator>
    <dc:date>2005-06-30T17:33:57</dc:date>
  </item>
  <textinput about="http://search.gmane.org/?group=$group=gmane.comp.version-control.arch.devel">
    <title>Search Engine</title>
    <description>Search the mailing list at Gmane</description>
    <name>query</name>
    <link>http://search.gmane.org/?group=$group=gmane.comp.version-control.arch.devel</link>
  </textinput>
</rdf:RDF>
