<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://blog.gmane.org/gmane.comp.search.xapian.devel">
    <title>gmane.comp.search.xapian.devel</title>
    <link>http://blog.gmane.org/gmane.comp.search.xapian.devel</link>
    <description/>
    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>1</syn:updateFrequency>
    <syn:updateBase>1901-01-01T00:00+00:00</syn:updateBase>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.search.xapian.devel/2345"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.search.xapian.devel/2344"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.search.xapian.devel/2343"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.search.xapian.devel/2340"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.search.xapian.devel/2337"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.search.xapian.devel/2332"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.search.xapian.devel/2331"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.search.xapian.devel/2328"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.search.xapian.devel/2325"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.search.xapian.devel/2324"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.search.xapian.devel/2322"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.search.xapian.devel/2321"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.search.xapian.devel/2318"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.search.xapian.devel/2302"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.search.xapian.devel/2301"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.search.xapian.devel/2300"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.search.xapian.devel/2297"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.search.xapian.devel/2292"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.search.xapian.devel/2291"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.search.xapian.devel/2289"/>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://gmane.org/img/gmane-25t.png"/>
    <textinput rdf:resource=""/>
  </channel>
  <image rdf:about="http://gmane.org/img/gmane-25t.png">
    <title>Gmane</title>
    <url>http://gmane.org/img/gmane-25t.png</url>
    <link>http://gmane.org</link>
  </image>
  <item rdf:about="http://comments.gmane.org/gmane.comp.search.xapian.devel/2345">
    <title>Backend for Lucene format indexes - How to get doclength</title>
    <link>http://comments.gmane.org/gmane.comp.search.xapian.devel/2345</link>
    <description>&lt;pre&gt;Hi, all:

I have written a demo patch for a backend for Lucene format indexes. The
Lucene version is 3.6.2:
http://lucene.apache.org/core/3_6_2/fileformats.html

For now, this demo patch supports only the basic features of Lucene.
Compound files (.cfs/.cfe), term vectors (.tvx/.tvd/.tvf) and deleted
documents (.del) are not supported, and the skip list in .fdx is not
supported either.

example/quest.cc is used to test this demo, with queries like this:
field_name:term, or field_name:term1 AND field_name:term2

So far, I have found that some data Xapian needs for BM25 does not exist in
Lucene:
1. doclength_lower_bound, doclength_upper_bound
2. wdf_lower_bound, wdf_upper_bound
3. total_length
4. doclength (for each document)
Items 1-3 are statistics, which can be calculated when running copydatabase
and stored somewhere. But doclength is hard to handle this way.
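
As a rough illustration of the statistics listed above, here is a minimal
Python sketch of gathering them in one pass over a non-empty collection. It
assumes that a document's length is the sum of the wdf values of its terms,
and it computes a single collection-wide wdf range for simplicity. The name
collection_stats and the input layout are made up for this example; they are
not taken from Xapian or from the patch. The per-document doclength map it
builds is the part that would still need somewhere to be stored.

```python
def collection_stats(docs):
    """docs maps a document id to a dict of term to wdf (hypothetical layout)."""
    # Assumption: a document's length is the sum of its terms' wdf values.
    doclength = {doc_id: sum(terms.values()) for doc_id, terms in docs.items()}
    wdfs = [wdf for terms in docs.values() for wdf in terms.values()]
    return {
        "doclength_lower_bound": min(doclength.values()),
        "doclength_upper_bound": max(doclength.values()),
        "wdf_lower_bound": min(wdfs),
        "wdf_upper_bound": max(wdfs),
        "total_length": sum(doclength.values()),
        "doclength": doclength,  # item 4: one value per document
    }


# Two toy documents: lengths 3 and 5, so total_length is 8.
stats = collection_stats({1: {"lucene": 2, "index": 1}, 2: {"lucene": 5}})
```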

1. Is there some other data that could be used instead of doclength?
2. Does Xapian support another ranking algorithm which does not need doclength?
Are there any suggestions for solving this problem?

And the demo patch is here:
https://github.com/white127/xapian-patch/blob/master/xapian_lucene_demo.patch

Regards
Jiang
_______________________________________________
Xapian-devel mailing list
Xapian-devel&amp;lt; at &amp;gt;lists.xapian.org
http://lists.xapian.org/mailman/listinfo/xapian-devel
&lt;/pre&gt;</description>
    <dc:creator>jiangwen jiang</dc:creator>
    <dc:date>2013-06-16T04:32:31</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.search.xapian.devel/2344">
    <title>Add a class to wrap all file operations</title>
    <link>http://comments.gmane.org/gmane.comp.search.xapian.devel/2344</link>
    <description>&lt;pre&gt;First, sorry for my poor English.
I am new to Xapian. I found that each database backend has its own code to handle file operations such as open/read/write/mkdir/rename...
I want to add a class to wrap all of these.
That would bring some benefits:
* The code which depends on the OS/compiler would be collected in one place.
* A developer could write their own file system implementation, for example one with no file locking when they can guarantee there are no concurrent accesses (so no fork is needed), or one with cacheable file I/O.
* A developer could use another "file system", such as their own distributed file system.

So far I have only changed the chert backend in xapian-core-1.2.15. I added two files to the project, include/xapian/filesystem.h and common/filesystem.cc, and changed the other code in chert/common/net/api.
After patching, I only ran the simple tests in the examples directory (simpleindex and simplesearch).
They seem to work fine.

Would this be useful for Xapian?

I've put the patch files here:
https://github.com/cytal/Xapian-patched/compare/master...FileSystem


&lt;/pre&gt;</description>
    <dc:creator>cyt&lt; at &gt;coremail.cn</dc:creator>
    <dc:date>2013-05-22T06:56:39</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.search.xapian.devel/2343">
    <title>Better parsing of BM25 parameters in Omega</title>
    <link>http://comments.gmane.org/gmane.comp.search.xapian.devel/2343</link>
    <description>&lt;pre&gt;Hello guys, as discussed on IRC, I have written some code for better
parsing of BM25 parameters in Omega. If no parameters are specified, it
uses the defaults for all of them. However, if some are specified and some
are not, or if invalid values are given for any of them, it throws an error.
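
The rule described above (all defaults, or all parameters, otherwise an
error) could be sketched roughly like this in Python. This is only an
illustration, not the code from the commit: the function name parse_bm25,
the "bm25 k1 k2 k3 b min_normlen" string layout and the use of ValueError
are assumptions made for the example. The default values are those of the
Xapian::BM25Weight constructor.

```python
# Defaults as in Xapian::BM25Weight: k1, k2, k3, b, min_normlen.
BM25_DEFAULTS = (1.0, 0.0, 1.0, 0.5, 0.5)


def parse_bm25(scheme):
    """Parse 'bm25' or 'bm25 k1 k2 k3 b min_normlen' (hypothetical layout)."""
    fields = scheme.split()
    if not fields or fields[0] != "bm25":
        raise ValueError("not a bm25 scheme: %r" % scheme)
    args = fields[1:]
    if not args:
        # No parameters given: default all of them.
        return BM25_DEFAULTS
    if len(args) != len(BM25_DEFAULTS):
        # Some given and some missing: refuse rather than guess.
        raise ValueError("expected 0 or 5 parameters, got %d" % len(args))
    # float() raises ValueError for text it cannot parse.
    return tuple(float(arg) for arg in args)
```

For example, parse_bm25("bm25") returns the defaults, while
parse_bm25("bm25 1.2") raises ValueError because only one of the five
parameters is present.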

https://github.com/aarshkshah1992/xapian/commit/ac0a11f5d8ff975fad1e96e63764eab9b04dfcfb

-Regards
-Aarsh
&lt;/pre&gt;</description>
    <dc:creator>aarsh shah</dc:creator>
    <dc:date>2013-05-15T13:27:08</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.search.xapian.devel/2340">
    <title>Added support for TfIdf to Omega</title>
    <link>http://comments.gmane.org/gmane.comp.search.xapian.devel/2340</link>
    <description>&lt;pre&gt;Hello guys, I have added code for TfIdf to the weight.cc file in omega/.

Here is the patch:

https://github.com/aarshkshah1992/xapian/commit/5ff41a15f574e6780cc61e67e7f3da3d97ff4ec8

It compiles and I think it will work well.

Here's the link to the change to the documentation file omegascript.rst,
where I've added TfIdf.

https://github.com/aarshkshah1992/xapian/commit/9434ad15ad8b69691ad45f2d340450b3070f524e

Please let me know what you think. Also, I seem to have messed up my
branch by trying to merge the tfidf branch with the master branch, so I am
not sending a pull request yet. I'll sort it out soon.

-Regards
-Aarsh
&lt;/pre&gt;</description>
    <dc:creator>aarsh shah</dc:creator>
    <dc:date>2013-04-11T10:19:14</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.search.xapian.devel/2337">
    <title>Question about Project: Posting list encoding improvements</title>
    <link>http://comments.gmane.org/gmane.comp.search.xapian.devel/2337</link>
    <description>&lt;pre&gt;Dear all:

My name is Xudong Zhang, from Peking University. I'm very interested in the
project "Posting list encoding improvements". I have some experience with
encoding, so I would like to be involved in the project.

I have a question about the project:

What is the main goal of improving the speed of encoding and decoding
posting lists? Is it OK to greatly increase the encoding/decoding speed
at the cost of a slight loss of compression?

Some information about myself: I'm a master's student at Peking
University, Beijing, China. My research area is search engines and web
mining, and I've been focusing on compression algorithms for posting lists
in search engines since last February. I have proposed a novel decoding
algorithm called SIMD-PFORDelta, which exploits SIMD instructions in
PFORDelta. It is fast and achieves a good compression ratio. Recently, I
also ran some experiments to test the state-of-the-art and proposed
encoding algorithms for posting lists. We used our own search engine, called
PARADISE, and the datasets were GOV2 (25 million docs) and ClueWeb09B (50
million docs). Below are some experimental results on the docIDs’ gap
sequences of ClueWeb09B, which are similar to Lemire's results [1] (the
implementation of SIMD_pfor_delta is somewhat different from that in
Lemire's paper):



Algorithm Name                    Compression Speed (M/s)  Decompression Speed (M/s)  Compression Ratio
simple9                           88.2749297               500.8305639                16.72%
simple16                          51.29381855              488.0998879                15.55%
var_byte                          492.9379145              519.1921354                26.02%
group_var_byte                    243.7189985              501.3713526                31.76%
group_var_byte_incomp_unary       124.5200775              501.8188062                28.77%
group_var_byte_comp_unary         123.1249031              461.8294703                28.70%
SIMD_group_var_byte               240.296232               793.0340184                31.76%
SIMD_group_var_byte_incomp_unary  126.3003402              1325.49703                 28.77%
SIMD_group_var_byte_comp_unary    126.3347572              1128.968014                28.70%
packed_binary                     252.6257354              1151.000784                27.55%
SIMD_packed_binary                254.2433557              1507.316133                27.55%
pfor_delta                        26.99031483              1026.717585                19.53%
SIMD_pfor_delta                   27.23024567              1580.111396                19.53%
rice                              68.74742448              79.16529964                15.77%
k_gamma                           172.9461803              242.8787555                15.24%
gamma_opt                         61.7750133               59.30571281                14.91%
SIMD__kgamma                      174.1829249              262.0594328                15.24%
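
For readers unfamiliar with the var_byte entry in the results above, here is
a minimal Python sketch of gap encoding followed by variable-byte coding. It
is a generic textbook variant (7 data bits per byte, with the high bit
marking the last byte of a number), not the implementation that was
benchmarked, and the function names are made up for this example. It uses
division and remainder by 128 in place of shifts and masks.

```python
def gaps(doc_ids):
    """Turn a strictly increasing docID list into its gap sequence."""
    previous = 0
    out = []
    for doc_id in doc_ids:
        out.append(doc_id - previous)
        previous = doc_id
    return out


def vbyte_encode(numbers):
    """Encode non-negative integers, 7 bits per byte, low bits first."""
    out = bytearray()
    for n in numbers:
        while n // 128:
            out.append(n % 128)  # low 7 bits, high bit clear: more bytes follow
            n //= 128
        out.append(n + 128)      # last byte of this number: high bit set
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers = []
    value = 0
    scale = 1
    for byte in data:
        if byte // 128:          # high bit set: this byte ends the number
            numbers.append(value + (byte - 128) * scale)
            value, scale = 0, 1
        else:
            value += byte * scale
            scale *= 128
    return numbers


doc_ids = [3, 7, 300, 301, 70000]
encoded = vbyte_encode(gaps(doc_ids))
```

Small gaps take one byte each, which is why the scheme is fast and simple
but compresses less tightly than the bit-level codes in the same results.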


Thank you very much!

P.S. In D. Lemire's paper "Decoding billions of integers per second through
vectorization" (CoRR'12), SIMD-PFORDelta can outperform VSEncoding w.r.t.
decoding speed while keeping nearly the same compression ratio, measured
in bits/int.

[1] D. Lemire et al. Decoding billions of integers per second through
vectorization, CoRR’12, http://arxiv.org/abs/1209.2137

Best wishes,

Xudong Zhang

&lt;/pre&gt;</description>
    <dc:creator>张旭东</dc:creator>
    <dc:date>2013-04-05T13:42:28</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.search.xapian.devel/2332">
    <title>GSOC - Posting list encoding improvements</title>
    <link>http://comments.gmane.org/gmane.comp.search.xapian.devel/2332</link>
    <description>&lt;pre&gt;
Done! Tried to get as much information in here as possible.


I guess I misunderstood the request.
Using a modified in-order traversal of a binary search tree, we get
this solution, which is pretty close to what you said:
extra items in the BitReader (or a derived class):

struct DIState {
  DIState() { set(0, 0, 0, 0); }
  DIState(int j_, int k_, Xapian::termpos pos_j_, Xapian::termpos pos_k_) {
    set(j_, k_, pos_j_, pos_k_);
  }
  void set(int j_, int k_, Xapian::termpos pos_j_, Xapian::termpos pos_k_) {
    j = j_; k = k_; pos_j = pos_j_; pos_k = pos_k_;
  }
  // true if there is at least one position strictly between j and k
  bool is_nextterm() const { return j + 1 &amp;lt; k; }
  bool is_uninitialized() const {
    return j == 0 &amp;amp;&amp;amp; k == 0 &amp;amp;&amp;amp; pos_j == 0 &amp;amp;&amp;amp; pos_k == 0;
  }
  Xapian::termpos pos_j, pos_k;
  int j, k;
};

stack&amp;lt;DIState&amp;gt; di_stack; // will get to a maximum of O(log n) space size indeed
DIState di_current;

// after calling this method, decode_interpolative_next will sequentially
// return the termpos from j + 1 to k - 1
// on all the decoding methods, including decode_interpolative, we must also
// check if a decode_interpolative has been started and not finished, and
// signal an error or do all the reads from the buffer.
void
BitReader::decode_interpolative(int j, int k, Xapian::termpos pos_j,
                                Xapian::termpos pos_k)
{
    di_current.set(j, k, pos_j, pos_k);
}

Xapian::termpos
BitReader::decode_interpolative_next()
{
    if (di_current.is_uninitialized()) {
        // signal an error
    }
    while (!di_stack.empty() || di_current.is_nextterm()) {
        if (di_current.is_nextterm()) {
            // decode the midpoint, then descend into the left half
            di_stack.push(di_current);
            int mid = (di_current.j + di_current.k) / 2;
            Xapian::termpos pos_mid =
                decode(di_current.pos_k - di_current.pos_j +
                       di_current.j - di_current.k + 1) +
                (di_current.pos_j + mid - di_current.j);
            di_current.set(di_current.j, mid, di_current.pos_j, pos_mid);
        } else {
            // left half exhausted: return the parent's midpoint and move
            // on to the parent's right half
            Xapian::termpos pos_ret = di_current.pos_k;
            di_current = di_stack.top();
            di_stack.pop();
            int mid = (di_current.j + di_current.k) / 2;
            di_current.set(mid, di_current.k, pos_ret, di_current.pos_k);
            return pos_ret;
        }
    }
    // signal an error. we have called this method too many times.
}

I may be able to improve this even further by keeping less information
in each DIState (if I can get it otherwise). Will focus on this if the
approach is ok.
Hope the code is self explanatory. Will add more comments if necessary.
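
To show the traversal order independently of the BitReader details, here is
a small Python model of the same idea. It is a toy under stated
assumptions: midpoints are stored directly as values in the order a
recursive encoder would visit them (parent, then left half, then right
half), rather than as bit-packed offsets, and the names preorder_midpoints
and lazy_decode are invented for this example.

```python
def preorder_midpoints(pos, j, k, out):
    """Toy encoder: record pos[mid] for each interval, parent before children."""
    if k - j in (0, 1):
        return  # nothing strictly between j and k
    mid = (j + k) // 2
    out.append(pos[mid])
    preorder_midpoints(pos, j, mid, out)
    preorder_midpoints(pos, mid, k, out)


def lazy_decode(stream, j, k):
    """Yield the positions strictly between j and k in ascending order."""
    source = iter(stream)
    stack = []  # pending intervals together with their decoded midpoint
    current = (j, k)
    while True:
        cj, ck = current
        if ck - cj not in (0, 1):
            # Interior left: read this interval's midpoint, go to the left half.
            stack.append((cj, ck, next(source)))
            current = (cj, (cj + ck) // 2)
        elif stack:
            # Left half done: emit the parent's midpoint, go to the right half.
            pj, pk, value = stack.pop()
            current = ((pj + pk) // 2, pk)
            yield value
        else:
            return  # every interior position has been produced


positions = [2, 5, 6, 9, 14, 15, 20, 31]
stream = []
preorder_midpoints(positions, 0, 7, stream)
decoded = list(lazy_decode(stream, 0, 7))
```

The stack never holds more than one entry per level of the implied tree,
which matches the O(log n) space mentioned above.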




&lt;/pre&gt;</description>
    <dc:creator>Marius Tibeica</dc:creator>
    <dc:date>2013-04-02T13:37:57</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.search.xapian.devel/2331">
    <title>FastMail's search using Xapian</title>
    <link>http://comments.gmane.org/gmane.comp.search.xapian.devel/2331</link>
    <description>&lt;pre&gt;G'day,

I thought some folks on this list would be interested to know that the new full text email search feature rolled out today by FastMail

http://blog.fastmail.fm/2013/04/02/fast-full-message-searching-across-all-folders/

is powered by Xapian. All the code for indexing and searching emails is Open Source and in Fastmail's version of the Cyrus IMAP server.

https://github.com/gnb/cyrus-imapd/tree/fastmail

That isn't the most recent code but it's pretty close.
We expect that code will be merged into a future release of Cyrus (but not the next release which is imminent).

We evaluated Solr, Sphinx and Xapian. Sphinx was the fastest but had a number of deployment issues due to our unique requirements.
I must say that the Xapian learning curve put us off for a while, but after gaining experience with other search technologies that became less of a problem.
In the end the ability to run Xapian as a library linked with our C application, and thus to control startup times and memory usage more easily than with a standalone server, was the deciding factor.

Anyway, thanks for a great piece of software, and keep up the good work.

Greg.
&lt;/pre&gt;</description>
    <dc:creator>Greg Banks</dc:creator>
    <dc:date>2013-04-02T09:01:27</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.search.xapian.devel/2328">
    <title>Doubt about GSOC proposal</title>
    <link>http://comments.gmane.org/gmane.comp.search.xapian.devel/2328</link>
    <description>&lt;pre&gt;Hello guys. I have begun writing my proposal as discussed on IRC and
will submit a draft in a couple of days, so that I can add detail and
refine it after getting feedback.

I wanted to know how many weeks a proposal should cover. Also, is it
okay if I set aside a buffer week somewhere in the middle of the
summer for things like cleaning up the code, working on feedback and
getting it merged (merging whatever code I've completed by then)?

-Regards
-Aarsh
&lt;/pre&gt;</description>
    <dc:creator>aarsh shah</dc:creator>
    <dc:date>2013-04-01T16:30:04</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.search.xapian.devel/2325">
    <title>Need help as PL2 tests are not performing as expected</title>
    <link>http://comments.gmane.org/gmane.comp.search.xapian.devel/2325</link>
    <description>&lt;pre&gt;Hello guys. I just ran the updated tests for PL2 and they are not giving
the MSet order I expect. The thing is, DFR's behavior is a bit hard to
predict, so even if I expect a particular order, it may give another
order and still be correct. So the only way to write correct tests for PL2
is to manually calculate the weights of the documents to decide the expected
order. For that, I need to look at the statistics stored in the
database for my tests. I ran the tests, and after they failed I tried "delve
db" at the terminal, but it says that it can't open the database. Can
someone please help me with this?

Also, the "max possible" statistic of the MSet is less than "max
attained", so I'll have to look at the code again. This may take some
time, as PL2 has a very complex formula and it's a bit hard to understand
what's happening where.

-Regards
-Aarsh

&lt;/pre&gt;</description>
    <dc:creator>aarsh shah</dc:creator>
    <dc:date>2013-03-27T14:14:49</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.search.xapian.devel/2324">
    <title>Major mistake in PL2 tests in the pull request</title>
    <link>http://comments.gmane.org/gmane.comp.search.xapian.devel/2324</link>
    <description>&lt;pre&gt;Hello guys. I just realized that I've not set the weighting scheme to PL2 in
the tests for PL2, so the default weighting scheme, BM25, is used. I am
extremely sorry for this and am updating the tests to set the weighting
scheme to PL2.

-Regards
-Aarsh
&lt;/pre&gt;</description>
    <dc:creator>aarsh shah</dc:creator>
    <dc:date>2013-03-27T12:53:01</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.search.xapian.devel/2322">
    <title>Merging of the TfIdf patch</title>
    <link>http://comments.gmane.org/gmane.comp.search.xapian.devel/2322</link>
    <description>&lt;pre&gt;Hello guys. I have updated the code, tests, documentation, makefile entries
and the registry entry of the TfIdf patch as per the feedback. Please do
let me know if any additional changes are required before the patch can be
merged.

-Regards
-Aarsh

&lt;/pre&gt;</description>
    <dc:creator>aarsh shah</dc:creator>
    <dc:date>2013-03-26T15:59:53</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.search.xapian.devel/2321">
    <title>Added feature tests to the PL2 pull request</title>
    <link>http://comments.gmane.org/gmane.comp.search.xapian.devel/2321</link>
    <description>&lt;pre&gt;Hello guys. I have added various tests to the PL2 pull request. They are
working fine. I have also added PL2 to the registry and to the java and
csharp makefiles. Please do let me know what you think. Other than the
collection frequency problem discussed on IRC, it is ready. I am now beginning
work on adding code and tests for DPH to the same branch.

-Regards
-Aarsh
&lt;/pre&gt;</description>
    <dc:creator>aarsh shah</dc:creator>
    <dc:date>2013-03-25T15:48:07</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.search.xapian.devel/2318">
    <title>GSoC aspirant interested in Projects related to Testing</title>
    <link>http://comments.gmane.org/gmane.comp.search.xapian.devel/2318</link>
    <description>&lt;pre&gt;Hi,
I am a F/OSS enthusiast interested in applying for GSoC this time. I am
currently in the third year of my bachelor's in Computer Science and
Engineering. I have made some open source contributions so far. I am
familiar with working on large code bases and with version control systems.
I found the projects related to testing interesting and would like to get
started with them, and I am comfortable working with C++. Can someone help
me with the project?

&lt;/pre&gt;</description>
    <dc:creator>sreepriya</dc:creator>
    <dc:date>2013-03-24T02:43:32</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.search.xapian.devel/2302">
    <title>GSOC - 2013 - Introduction (Learning to Rank)</title>
    <link>http://comments.gmane.org/gmane.comp.search.xapian.devel/2302</link>
    <description>&lt;pre&gt;Hello Everyone,

I am looking forward to contributing to Xapian and to applying as a Google
Summer of Code student. I would like to start by introducing myself. I am a
final year M.Sc.(H) Chemistry and B.E. (H) Electronics and Instrumentation
student at BITS - Pilani, Goa. I am interested in Machine Learning and am
presently pursuing my thesis on the same. I have been selected for Google
Summer of Code (GSoC) twice. During GSoC 2011, I worked for the Centre for
the Study of Complex Systems, University of Michigan, on making an
evolutionary model for swarms
(http://code.google.com/p/cscs-repast-demos/wiki/Mudit),
and last summer, in GSoC 2012, I worked on making HLA interfaces for the
network simulator ns-3 to aid it in distributed simulation
(http://www.nsnam.org/wiki/index.php/GSOC2012HLA). My first-degree thesis
last semester was on computational cognitive modelling of artificial
investors
(http://code.google.com/p/multiagent-reinforcement-learning/downloads/list).
This semester I am working on the use of probabilistic algorithms for
dynamic gesture recognition. My googlecode profile can be found here
(http://code.google.com/u/110675325175605367090/) and my LinkedIn profile
can be found here
(http://www.linkedin.com/profile/view?id=79832898&amp;amp;trk=tab_pro).

I am interested in the "Learning To Rank" project. If I am not wrong, I found
the framework incorporated by Parth in the cloned code. It needed some
refactoring in order to incorporate more algorithms, which was done by Rishabh
and is available in his git repo (https://github.com/rishabhmehrotra/xapian)
but is still not merged. So I assume I should think of additions to the
code in Rishabh's repo. Moreover, I noticed that SVM-rank, ListMLE and
ListNet are already present in the code. I am interested in adding a
random forest approach and am looking for appropriate libraries. It would be
great to get input from the Xapian community on the preferred
algorithms and open source libraries. It would also be great to know the
priority of the Letor project for the Xapian community.

Any further pointers/links related to the "Letor" project would be
appreciated. Thank you for your time.

Best Regards,

Mudit Raj Gupta
&lt;/pre&gt;</description>
    <dc:creator>Mudit Gupta</dc:creator>
    <dc:date>2013-03-21T13:28:41</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.search.xapian.devel/2301">
    <title>Gsoc help.</title>
    <link>http://comments.gmane.org/gmane.comp.search.xapian.devel/2301</link>
    <description>&lt;pre&gt;My name is Udit and I’m a 3rd year undergrad at BITS, Pilani, India.
I have experience in programming (C/C++, Python, Java) and web
development (PHP, HTML5, CSS).

I went through the project ideas mentioned on your GSOC project ideas page
and found "Improving Python's Bindings" quite interesting.

Could I have more information about this project? Can this project span
the entire summer, and does it have enough potential to be taken as a GSOC
project?

&lt;/pre&gt;</description>
    <dc:creator>Udit Saxena</dc:creator>
    <dc:date>2013-03-21T12:41:17</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.search.xapian.devel/2300">
    <title>(no subject)</title>
    <link>http://comments.gmane.org/gmane.comp.search.xapian.devel/2300</link>
    <description>&lt;pre&gt;Hi,

My name is Abhishek and I’m a 2nd year undergrad at Indian Institute of
Technology Roorkee. I have experience in programming (Java, C++, Python) and
web development (PHP, JavaScript, jQuery, CSS). I went through the project
ideas mentioned on your GSOC web page and found "Performance Test Suite" quite
interesting. Could I have more information about this project?

Could I also have some info about the "Java Binding Improvement" project?

I’m very much interested in participating in GSOC 2013 and would love to
contribute as much as I can. Thanks in advance.

Best Regards,

Abhishek
&lt;/pre&gt;</description>
    <dc:creator>Abhishek Tyagi</dc:creator>
    <dc:date>2013-03-21T10:56:58</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.search.xapian.devel/2297">
    <title>Registering a weighting scheme with Xapian</title>
    <link>http://comments.gmane.org/gmane.comp.search.xapian.devel/2297</link>
    <description>&lt;pre&gt;Hello guys, I've modified the TfIdf patch as per the feedback I got on it
and have added the code to the pull request. Please do have a look and let
me know what you think.

https://github.com/xapian/xapian/pull/6

Also, I read somewhere that I need to register this weighting scheme with
Xapian. Could you please shed some light on that?

-Regards
-Aarsh
&lt;/pre&gt;</description>
    <dc:creator>aarsh shah</dc:creator>
    <dc:date>2013-03-20T15:23:37</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.search.xapian.devel/2292">
    <title>Implementing a dummy DFR framework for feedback</title>
    <link>http://comments.gmane.org/gmane.comp.search.xapian.devel/2292</link>
    <description>&lt;pre&gt;Hey guys, hi :) I have some ideas about implementing a general DFR
framework that will be able to generate any DFR model.

I'm thinking of implementing a dummy DFR framework that is capable of
generating only one fixed named scheme for now but can be extended later to
include all the schemes. Is it okay if I send a pull request for such a
dummy framework to get it reviewed by the community, so that I can get
feedback on it and an idea of what the whole framework should look like?
However, it won't be the kind of code that can be merged, as a lot of
functionality will be missing.

-Regards
-Aarsh
&lt;/pre&gt;</description>
    <dc:creator>aarsh shah</dc:creator>
    <dc:date>2013-03-19T16:39:00</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.search.xapian.devel/2291">
    <title>Up and running with ZeroMQ</title>
    <link>http://comments.gmane.org/gmane.comp.search.xapian.devel/2291</link>
    <description>&lt;pre&gt;Hi,

As suggested earlier, I'm getting familiar with ZeroMQ and
how it would be used in the present context.

What I have done so far:

(1) Studied the ZeroMQ API and got comfortable with its usage. I am still
implementing sample programs and learning basic functions.

(2) Installed, compiled and set up my development environment with Xapian
and ZeroMQ. Lots of work here already!

Though I have read and understood some of the code in xapian-core/net, I
need a bit of help with the following:

*(1) Protocol usage:* The backend functions in two modes, prog and tcp.
Prog is the mode where the entire program-space is spawned for every
request, and tcp functions normally as TCP sockets do. But both of them
remain quite different from the standard HTTP framework/s. Any insight on
how ZeroMQ would fit in here?

*(2) Object handling:* Within the scope of the project, object handling is
yet another important feature to take care of.


processing takes place?

*(3) Idea Deliverables:* As most other ideas have a set of deliverables
defined, what are those for this idea? Kindly list a few, if possible.

&lt;/pre&gt;</description>
    <dc:creator>Ankit Bhatnagar</dc:creator>
    <dc:date>2013-03-18T18:20:31</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.search.xapian.devel/2289">
    <title>patch - add SnippetGenerator class</title>
    <link>http://comments.gmane.org/gmane.comp.search.xapian.devel/2289</link>
    <description>&lt;pre&gt;G'day,

This is some code which we have been using for some time now in our
searching beta, forward ported from 1.2.12 to today's git.  It adds a
class Xapian::SnippetGenerator which can be used to generate
human-readable snippet strings.  By default, the snippets are HTML
formatted, with the matched search term inside a &amp;lt;b&amp;gt; tag, 5 words of
context around matched search terms, and ellipsis "..." between
non-adjacent context.

There is a certain amount of code duplication with the TermGenerator
class, which is ugly but I considered preferable to major surgery to
separate out the various parsing phases of TermGenerator so that some of
them could be re-used for SnippetGenerator.  Sorry.

Yes, I know there is a GSOC project to add snippets to Xapian.  That
code did not appear to be stable when I needed it, so I wrote my own. 
Perhaps it might be useful to someone.

&lt;/pre&gt;</description>
    <dc:creator>Greg Banks</dc:creator>
    <dc:date>2013-03-13T05:56:44</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.search.xapian.devel/2285">
    <title>DFR framework as a GSOC project</title>
    <link>http://comments.gmane.org/gmane.comp.search.xapian.devel/2285</link>
    <description>&lt;pre&gt;Hey guys, hi :) I've finished implementing the PL2 scheme. The bounds I
have implemented for it are as tight as I could make them, given the nature
of the scheme and my mathematical skills. However, tight bounds for other
named DFR schemes will be easier to implement because their formulas are
much simpler than PL2's. I will send in a pull request in a couple of days,
once I'm done with the tests and the documentation.

I'll now start working on the DPH scheme as described in Section 3 here:-
http://trec.nist.gov/pubs/trec18/papers/uglasgow.BLOG.ENT.MQ.RF.WEB.pdf

Now that GSOC is approaching, I want to start working on my proposal to make
it as detailed as possible. My aim is to implement document weighting
and query expansion using the DFR framework (currently, we have a hard-coded
formula for query expansion). I hope to complete as many named DFR schemes
as I can before the application period starts, so that during GSOC I can
focus on implementing the DFR framework, which will allow the user to create
any DFR scheme they want, and on implementing query expansion using
the DFR framework. I hope to be able to do the following work by the end
of GSOC:

1.) About 8 frequently used named DFR schemes mentioned on the Terrier
homepage and those mentioned by Olly on IRC. Each of these will be an
independent weighting scheme subclassed from Xapian::Weight.
2.) A DFR framework which allows the creation of any DFR scheme by choosing
a probabilistic model, a risk gain normalization and a term frequency
normalization.
3.) Query expansion using the DFR schemes, allowing the user
to choose any named scheme to expand the query (just like we do for the MSet).

I hope to be able to finish at least 50% of 1.) before April ends, so that I
can focus on 2.) and 3.) during the summer.

Please do comment on this and let me know what you think. Also, I have no
prior experience with writing proposals, so could you please tell me what
a proposal for something like this should include? I'd really appreciate
your help.

-Regards
-Aarsh
&lt;/pre&gt;</description>
    <dc:creator>aarsh shah</dc:creator>
    <dc:date>2013-03-15T16:32:24</dc:date>
  </item>
  <textinput rdf:about="http://search.gmane.org/?group=$group=gmane.comp.search.xapian.devel">
    <title>Search Engine</title>
    <description>Search the mailing list at Gmane</description>
    <name>query</name>
    <link>http://search.gmane.org/?group=$group=gmane.comp.search.xapian.devel</link>
  </textinput>
</rdf:RDF>
