<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://blog.gmane.org/gmane.comp.lang.perl.bio.general">
    <title>gmane.comp.lang.perl.bio.general</title>
    <link>http://blog.gmane.org/gmane.comp.lang.perl.bio.general</link>
    <description/>
    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>1</syn:updateFrequency>
    <syn:updateBase>1901-01-01T00:00+00:00</syn:updateBase>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26683"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26682"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26681"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26680"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26679"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26678"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26677"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26676"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26675"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26674"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26673"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26672"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26671"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26670"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26669"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26668"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26667"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26666"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26665"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26664"/>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://gmane.org/img/gmane-25t.png"/>
    <textinput rdf:resource=""/>
  </channel>
  <image rdf:about="http://gmane.org/img/gmane-25t.png">
    <title>Gmane</title>
    <url>http://gmane.org/img/gmane-25t.png</url>
    <link>http://gmane.org</link>
  </image>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26683">
    <title>Re: Building Many trees with Bioperl</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26683</link>
    <description>&lt;pre&gt;Hi,

If I just run phyml on the command line it seems to run OK: it accepts
my file and appears to undergo the tree-building process. I haven't
actually seen it complete yet, but ML and Bayes always take a while,
and I have many reads that need to be aligned. PhyML gets to the point
where it asks whether I am sure I want to proceed; I say yes, then it goes
quiet and is currently working away by itself:

. 766 patterns found (out of a total of 795 sites).

. 58 sites without polymorphism (7.30%).

. Computing pairwise distances...

. Building BioNJ tree...

. WARNING: this analysis requires at least 556 MB of memory space.

. Do you really want to proceed? [Y/n] Y


It appears to be working =/
 
Best,
Ben.



On 24/05/2013 22:25, "Brian Osborne" &amp;lt;bosborne11&amp;lt; at &amp;gt;verizon.net&amp;gt; wrote:

&lt;/pre&gt;</description>
    <dc:creator>Ben Ward (TSL)</dc:creator>
    <dc:date>2013-05-24T21:46:40</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26682">
    <title>Re: Building Many trees with Bioperl</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26682</link>
    <description>&lt;pre&gt;Ben,

What happens when you take the Phyml command itself and run it from the command line?

Also, a minor point: the message "MSG: Replacing one sequence [FXCNDTJ02P/1-366]" is not an error, it is a warning. An error accompanies an exit; warnings are just informative.

Brian O.


On May 24, 2013, at 3:14 PM, Ben Ward (TSL) &amp;lt;Ben.Ward&amp;lt; at &amp;gt;sainsbury-laboratory.ac.uk&amp;gt; wrote:

&lt;/pre&gt;</description>
    <dc:creator>Brian Osborne</dc:creator>
    <dc:date>2013-05-24T21:25:38</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26681">
    <title>Building Many trees with Bioperl</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26681</link>
    <description>&lt;pre&gt;Hi,

I'm new to Bioperl and plan to make a script to automate making trees with many alignment files (themselves generated by automating the process of multiple alignment for many datasets by using clustalw in a bioperl script).
But I need to get it right for one piece of test data before I can do it for all:

What I have produced so far is below. It's supposed to load the alignment file as a SimpleAlign, then use that alignment in phyml. I looked at the documentation and tried to follow examples.

However it gives me many errors like:
--------------------- WARNING ---------------------
MSG: Replacing one sequence [FXCNDTJ02P/1-366]

And then gives me:
Phyml command = /Users/wardb/clustalw2/phyml /var/folders/kp/clkqvqn9739ffw2755zjwy74_skf_z/T/MTFzfN4jED/aln8058.phylip  0 i 1 0 HKY e e 1 e BIONJ y y

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Phyml call (/Users/wardb/clustalw2/phyml /var/folders/kp/clkqvqn9739ffw2755zjwy74_skf_z/T/MTFzfN4jED/aln8058.phylip  0 i 1 0 HKY e e 1 e BIONJ y y) did not give an output [/var/folders/kp/clkqvqn9739ffw2755zjwy74_skf_z/T/MTFzfN4jED/aln8058.phylip_phyml_stat.txt]: 11
STACK: Error::throw
STACK: Bio::Root::Root::throw /Library/Perl/5.12/Bio/Root/Root.pm:472
STACK: Bio::Tools::Run::Phylo::Phyml::_run /Library/Perl/5.12/Bio/Tools/Run/Phylo/Phyml.pm:851
STACK: /Library/Perl/5.12/Bio/Tools/Run/Phylo/Phyml.pm:338
-----------------------------------------------------------

Can someone let me know if I'm going about this correctly and what I need to do to get it to work. I've also tried to run phyml by giving the filename in the run() method like:
my $inputfilename = 'outputalignmentfile';
my $tree = $factory-&amp;gt;run("$inputfilename");

But I get the same EXCEPTION: Bio::Root::Exception message.

Thanks,
Ben W.

SCRIPT ---

#!/usr/bin/perl
use warnings;
use strict;
BEGIN { $ENV{PHYMLDIR} = '/Users/wardb/clustalw2' }
use Bio::TreeIO;
use Bio::AlignIO;
use Bio::Tools::Run::Phylo::Phyml;

my $alnin = Bio::AlignIO-&amp;gt;new(-file =&amp;gt; "&amp;lt;outputalignmentfile",
-format =&amp;gt; 'phylip');

my $aln = $alnin-&amp;gt;next_aln();


# Make a Phyml factory.
my $factory = Bio::Tools::Run::Phylo::Phyml-&amp;gt;new(-verbose =&amp;gt; 2,
 -data_type =&amp;gt; 'dna');

# Pass the factory an alignment and run:
# my $inputfilename = 'outputalignmentfile';

my $tree = $factory-&amp;gt;run($aln); # $tree is a Bio::Tree::Tree object.


# Set up the tree output stream; the leading '&amp;gt;' opens the file for writing.
my $treeio = Bio::TreeIO-&amp;gt;new(-format =&amp;gt; 'newick',
 -file =&amp;gt; '&amp;gt;tree.newick');

$treeio-&amp;gt;write_tree($tree);

exit 0;
&lt;/pre&gt;</description>
    <dc:creator>Ben Ward (TSL)</dc:creator>
    <dc:date>2013-05-24T19:14:17</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26680">
    <title>Re: Phylip format error</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26680</link>
    <description>&lt;pre&gt;On Thu, May 23, 2013 at 3:05 PM, Fields, Christopher J
&amp;lt;cjfields&amp;lt; at &amp;gt;illinois.edu&amp;gt; wrote:

The relaxed phylip 'format' goes back further than that, e.g.
http://lists.open-bio.org/pipermail/biopython-dev/2008-June/003899.html

RAxML and PHYML support relaxed phylip - but with their own ID limits.

Peter
&lt;/pre&gt;</description>
    <dc:creator>Peter Cock</dc:creator>
    <dc:date>2013-05-23T14:53:09</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26679">
    <title>Re: Speed issues with making IUPAC consensus from alignment</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26679</link>
    <description>&lt;pre&gt;(keep the list cc'd)

On May 22, 2013, at 6:31 PM, Senanu &amp;lt;senanu.junk&amp;lt; at &amp;gt;gmail.com&amp;gt; wrote:


That's interesting, but again not surprising.  One would have to look at the code, but I wouldn't be surprised if the method is terribly inefficient.

chris
&lt;/pre&gt;</description>
    <dc:creator>Fields, Christopher J</dc:creator>
    <dc:date>2013-05-23T13:56:32</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26678">
    <title>Re: Phylip format error</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26678</link>
    <description>&lt;pre&gt;

As Adam pointed out, prior to the introduction of 'relaxed phylip' we had an alternative solution that didn't require a modified format but still allowed one to use PHYLIP and other tools requesting the format.  I think 'relaxed phylip' was introduced by CIPRES a few years back.  Frankly, this is the first time I have seen this mentioned on the list; yay, yet another format variation :)

The variant format parsing (as implemented for SeqIO::fastq, as you know) deals with variant names like 'fastq-sanger', where the main format name is first, the variant of the format second.  The order in this case is reversed (relaxed-phylip), which I'm pretty sure will not work.  Not impossible to allow, but we would probably allow support like this initially:

my $in = Bio::AlignIO-&amp;gt;new(-format =&amp;gt; 'phylip',
                           -variant =&amp;gt; 'relaxed',
                             …);

chris
&lt;/pre&gt;</description>
    <dc:creator>Fields, Christopher J</dc:creator>
    <dc:date>2013-05-23T14:05:32</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26677">
    <title>Re: Phylip format error</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26677</link>
    <description>&lt;pre&gt;Alexey,

Just want to point out that 'relaxed phylip' format was introduced long after this parser was created; in fact (as Adam points out) there was an alternative workaround to deal with the lossy names.  

The content of that page is on a wiki, which anyone is free to edit (just need an OpenID to set up an account).

chris

On May 23, 2013, at 2:22 AM, Alexey Morozov &amp;lt;alexeymorozov1991&amp;lt; at &amp;gt;gmail.com&amp;gt; wrote:

&lt;/pre&gt;</description>
    <dc:creator>Fields, Christopher J</dc:creator>
    <dc:date>2013-05-23T13:48:31</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26676">
    <title>Re: Phylip format error</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26676</link>
    <description>&lt;pre&gt;
Not sure if there is an actual question in these messages, but BioPerl
can be used to generate valid Phylip format and run, like this:

## Build Align object
my $aln = Bio::SimpleAlign-&amp;gt;new(-seqs=&amp;gt;$seqs);

## swap the taxa names with 8 characters long unique IDs
my ($aln_safe, $ref_name) = $aln-&amp;gt;set_displayname_safe(8);

## Write out phylip format infile
Bio::AlignIO-&amp;gt;new(-file=&amp;gt;'&amp;gt;infile.out', -format=&amp;gt;'phylip', -interleaved
=&amp;gt; 0)-&amp;gt;write_aln($aln);

## run PHYLIP's pars program
my @params = (idlength=&amp;gt;10);   #, jumble=&amp;gt;"17,10");
my $tree_factory = Bio::Tools::Run::Phylo::Phylip::Pars-&amp;gt;new(@params);
$tree_factory-&amp;gt;quiet(1);  # Suppress pars messages to terminal
my $tree = $tree_factory-&amp;gt;create_tree($aln_safe);

## fix the node labels back
my @nodes = sort { defined $a-&amp;gt;id &amp;amp;&amp;amp; defined $b-&amp;gt;id &amp;amp;&amp;amp; $a-&amp;gt;id cmp $b-&amp;gt;id
} $tree-&amp;gt;get_nodes();
foreach my $nd (@nodes) {
    if ( $nd-&amp;gt;is_Leaf ) {
        $nd-&amp;gt;id($ref_name-&amp;gt;{$nd-&amp;gt;id_output});
    }
}

HTH

Adam

On 23/05/2013 08:22, Alexey Morozov wrote:
&lt;/pre&gt;</description>
    <dc:creator>Adam Witney</dc:creator>
    <dc:date>2013-05-23T08:43:15</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26675">
    <title>Re: Phylip format error</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26675</link>
    <description>&lt;pre&gt;On Thu, May 23, 2013 at 8:22 AM, Alexey Morozov
&amp;lt;alexeymorozov1991&amp;lt; at &amp;gt;gmail.com&amp;gt; wrote:

Biopython's AlignIO defines both a (strict) "phylip" and "relaxed-phylip"
as two separate formats (or variants, like the "fastq" variants). Doing
the same in BioPerl would seem sensible since auto-detection is not
easy.

http://biopython.org/wiki/AlignIO#File_Formats

Peter

P.S. Where does that 250 characters for the taxon name limit come from?
The trouble with relaxed phylip is that some tools are more relaxed than
others ;)
&lt;/pre&gt;</description>
    <dc:creator>Peter Cock</dc:creator>
    <dc:date>2013-05-23T08:30:21</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26674">
    <title>Re: Phylip format error</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26674</link>
    <description>&lt;pre&gt;This is made worse by the fact that there is also a relaxed phylip format,
which allows up to 250 chars for a taxon name. Names are separated from the
sequence by a single space, which creates problems if names were padded to
10 chars with whitespace, as in Felsenstein's strict format. On the whole,
phylip is about as messily defined a format as one can make from a plain text file
with the information content of fasta.
The Bioperl documentation says nothing about whether Bio::SeqIO accepts relaxed
phylip or how it tells the dialects apart. Even if code support
is OK, it may be worthwhile to explain it somewhere at bioperl.org.
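
To illustrate why the dialects are hard to tell apart, here is a minimal sketch in plain Perl (no BioPerl calls) of a relaxed-style line parser. The helper name split_relaxed_phylip_line is hypothetical, and the rule it applies (the name is the first run of non-blank characters) is an assumption about the relaxed dialect, not a description of what any BioPerl parser does.

```perl
use strict;
use warnings;

# Hypothetical helper: split one relaxed-Phylip data line into (name, sequence).
# Assumes the name is the first run of non-blank characters on the line.
sub split_relaxed_phylip_line {
    my ($line) = @_;
    my ($name, $seq) = $line =~ /^\s*(\S+)\s+(.*)$/
        or die "Not a relaxed Phylip data line: $line";
    $seq =~ s/\s+//g;   # drop any blanks inside the sequence data
    return ($name, $seq);
}

# A long name is fine in the relaxed dialect...
my ($name, $seq) = split_relaxed_phylip_line("Escherichia_coli_K12 ACGTACGT");
print "$name\t$seq\n";

# ...but a strict-format line whose 10-column name contains a blank is misread:
# the name is cut at the blank and the rest is taken for sequence data.
($name, $seq) = split_relaxed_phylip_line("Homo sapieACGTACGT");
print "$name\t$seq\n";   # name is "Homo", sequence is "sapieACGTACGT"
```

The second call shows the kind of silent misparse that makes an explicit choice of dialect safer than guessing from the file contents.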


2013/5/21 Bernard Cohen &amp;lt;b.l.cohen_home&amp;lt; at &amp;gt;btinternet.com&amp;gt;




&lt;/pre&gt;</description>
    <dc:creator>Alexey Morozov</dc:creator>
    <dc:date>2013-05-23T07:22:13</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26673">
    <title>Re: Speed issues with making IUPAC consensus from alignment</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26673</link>
    <description>&lt;pre&gt;

Probably the former, but...


It shouldn't take that long, 7 Mb isn't that large.  Or is that 7 Mb for one genome?


The code isn't really optimized for this, but again this isn't terribly large.  Is the bottleneck reading the alignment in, or is it the consensus_iupac() step?  Hard to say w/o seeing the alignment data itself.


chris
&lt;/pre&gt;</description>
    <dc:creator>Fields, Christopher J</dc:creator>
    <dc:date>2013-05-22T23:17:50</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26672">
    <title>Speed issues with making IUPAC consensus from alignment</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26672</link>
    <description>&lt;pre&gt;Hi all,

I am wondering if the consensus_iupac method of Bio::Align is known to be extremely slow, or if I'm doing something wrong. 

I have bacterial whole-genome alignments (~7 Mbases) that I made in progressiveMauve and wish to get an IUPAC consensus. (I know that progressiveMauve uses a non-standard XMFA format, but Bio::AlignIO seems to read them just fine.) The code below takes more than all night to make a consensus. It works fine on tiny test alignments. 

Is this a known problem? Is there another way to generate such a consensus?


my $in = Bio::AlignIO-&amp;gt;new(-file =&amp;gt; $files[0],
                           -format =&amp;gt; 'XMFA');
while  (my $aln = $in-&amp;gt;next_aln()) {
    foreach  my $seq ($aln-&amp;gt;each_seq) {
        $seq-&amp;gt;alphabet('dna');
    }
    my $con = $aln-&amp;gt;consensus_iupac();
}
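
For comparison, a minimal plain-Perl sketch of a column-wise IUPAC consensus over aligned strings. It is not how consensus_iupac is implemented and it makes its own assumptions: the helper name iupac_consensus is hypothetical, gaps and non-ACGT characters are simply ignored, and an all-gap column is reported as '-'.

```perl
use strict;
use warnings;

# IUPAC codes keyed by the sorted set of bases seen in a column.
my %iupac = (
    A => 'A', C => 'C', G => 'G', T => 'T',
    AC => 'M', AG => 'R', AT => 'W', CG => 'S', CT => 'Y', GT => 'K',
    ACG => 'V', ACT => 'H', AGT => 'D', CGT => 'B', ACGT => 'N',
);

# Hypothetical helper: column-wise IUPAC consensus of equal-length aligned strings.
# Gaps and any non-ACGT characters are ignored; an all-gap column yields '-'.
sub iupac_consensus {
    my @rows = map { uc $_ } @_;
    my $consensus = '';
    for my $col (0 .. length($rows[0]) - 1) {
        my %seen;
        for my $row (@rows) {
            my $base = substr($row, $col, 1);
            $seen{$base} = 1 if $base =~ /^[ACGT]$/;
        }
        my $key = join '', sort keys %seen;
        $consensus .= $key eq '' ? '-' : $iupac{$key};
    }
    return $consensus;
}

print iupac_consensus('ACGT-A', 'ACAT-C', 'ATGTGA'), "\n";   # prints "AYRTGM"
```

Timing something like this against the same alignment might help show whether the time goes into reading the file or into the consensus step.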


Thanks in advance.
Ngwenyama
&lt;/pre&gt;</description>
    <dc:creator>Senanu</dc:creator>
    <dc:date>2013-05-22T20:15:24</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26671">
    <title>Phylip format error</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26671</link>
    <description>&lt;pre&gt;Hello!

I happen to have checked to see what the PERL webpage says about Phylip format for DNA alignment files and see that it is erroneous. 

I am not a PERL user and do not want to be bothered to register or otherwise learn how to make an official comment, so forward this for someone to pick up.

Phylip format allows up to 10 characters for taxon names; the data must start in the 11th column. This can be checked on Joe Felsenstein's site.

The PERL page accessed by searching for "Phylip format PERL" allows only 8 characters for the name. 
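
A minimal sketch, in plain Perl, of the fixed-width layout described above: the name occupies columns 1 to 10 and the data begins in column 11. The helper name split_strict_phylip_line is hypothetical and the input line is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split one strict-Phylip data line into (name, sequence).
# Assumes the name occupies columns 1-10 and the data begins in column 11.
sub split_strict_phylip_line {
    my ($line) = @_;
    my $name = substr($line, 0, 10);   # fixed-width name field
    my $seq  = substr($line, 10);      # everything from column 11 onwards
    $name =~ s/\s+$//;                 # drop the padding after a short name
    $seq  =~ s/\s+//g;                 # ignore blanks inside the data
    return ($name, $seq);
}

my ($name, $seq) = split_strict_phylip_line("Taxon_A   ACGT ACGT");
print "$name\t$seq\n";   # prints "Taxon_A" and "ACGTACGT" separated by a tab
```

Because the split is by column rather than by whitespace, a name that itself contains a blank still comes out whole.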

B. L. Cohen
&lt;/pre&gt;</description>
    <dc:creator>Bernard Cohen</dc:creator>
    <dc:date>2013-05-20T18:49:50</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26670">
    <title>Re: Problem compiling Bio::DB::Sam</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26670</link>
    <description>&lt;pre&gt;
Compiled correctly!

thank you



&lt;/pre&gt;</description>
    <dc:creator>Miquel Ràmia</dc:creator>
    <dc:date>2013-05-21T15:08:18</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26669">
    <title>Re: Downloading annotated Gene nucleotide fasta files</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26669</link>
    <description>&lt;pre&gt;Another option that I've used before is to download the gene2accession, gene2refseq,  and gene_info files from here ftp://ftp.ncbi.nih.gov/gene/DATA/ then parse out the data you require.
It might work for you?

--Russell

-----Original Message-----
From: bioperl-l-bounces&amp;lt; at &amp;gt;lists.open-bio.org [mailto:bioperl-l-bounces&amp;lt; at &amp;gt;lists.open-bio.org] On Behalf Of shalabh sharma
Sent: Saturday, 18 May 2013 4:26 a.m.
To: Francisco J. Ossandón
Cc: bioperl-l
Subject: Re: [Bioperl-l] Downloading annotated Gene nucleotide fasta files

Hey Francisco,
Thanks a lot. Basically I just wanted gene nucleotide fasta files with GI numbers.
I think I will have to parse them from gbk files.

-Shalabh


On Fri, May 17, 2013 at 11:59 AM, Francisco J. Ossandón &amp;lt; fossandonc&amp;lt; at &amp;gt;hotmail.com&amp;gt; wrote:



--
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636

_______________________________________________
Bioperl-l mailing list
Bioperl-l&amp;lt; at &amp;gt;lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

&lt;/pre&gt;</description>
    <dc:creator>Smithies, Russell</dc:creator>
    <dc:date>2013-05-19T19:26:35</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26668">
    <title>Re: Downloading annotated Gene nucleotide fasta files</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26668</link>
    <description>&lt;pre&gt;Thanks Russell,
Actually I wanted all the bacterial gene
nucleotide files, so I parsed them from *gbk. But yes, these files might help
with other parts of my work.

Thanks
Shalabh


On Sun, May 19, 2013 at 3:26 PM, Smithies, Russell &amp;lt;
Russell.Smithies&amp;lt; at &amp;gt;agresearch.co.nz&amp;gt; wrote:




&lt;/pre&gt;</description>
    <dc:creator>shalabh sharma</dc:creator>
    <dc:date>2013-05-19T19:33:16</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26667">
    <title>Re: sets of sequences - how to read?</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26667</link>
    <description>&lt;pre&gt;On May 17, 2013, at 12:12 AM, Carnë Draug &amp;lt;carandraug+dev&amp;lt; at &amp;gt;gmail.com&amp;gt;
 wrote:



(note: the critical point is whether Bio::ASN1::Entrezgene would allow this; I'm not sure it would.  Otherwise this is all really hand-wavy)

To me a 'set of stuff', particularly when the 'stuff' is stored sequentially in a flat file, is a simple 'database' or 'store' of similar items, where the class allows one the ability to look up particular members in the set, but also could store higher level information about the set as a whole if needed.  If it were me, I would implement a method particular to Bio::SeqIO::entrezgene that specifically creates and returns this ( next_geneset(), for instance ); next_seq() could then be implemented to iterate through the items in that database/store.  

Two useful things come out of this.  First, if the data for the Entrez Gene file/chunk are parsed to store offsets per ID, one would only need to parse out the chunks needed (offset of ID to next offset), then pass that into the parser and create objects on the fly.  This would probably be as fast or faster than (for instance) the greedy method of parsing the entire file and storing everything in objects up-front, then iterating through those objects one at a time, which I think is current behavior.

Second: if an index is created, the upfront cost is already paid (you could reuse the same index when parsing the same data).

An analogous example might be storing all FASTQ data in a sequencing run; I don't want to expend the effort to parse all the FASTQ data, but I may want to run operations on individual items in the set as well as store additional information about the data (barcodes per run, lanes, overall quality stats, etc).  

Does that make sense?  The pieces for this are lying around (Bio::Index::* for instance has methods for indexing flat files, and classes like Bio::DB::Fasta).
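
A minimal sketch of the offset idea in core Perl only, using a toy FASTA-style file rather than Entrez Gene data. Everything here is an assumption made for illustration (the record layout, the gene IDs, and the fetch_record helper are hypothetical); it is not how Bio::Index::* or Bio::DB::Fasta work internally.

```perl
use strict;
use warnings;
use IO::File;
use File::Temp qw(tempfile);
use Fcntl qw(:seek);

# Write a tiny FASTA-style flat file to index (toy data, not from the thread).
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} ">geneA\nACGT\nACGT\n>geneB\nGGCC\n>geneC\nTTAA\n";
close $out;

# Pass 1: record the byte offset at which each record starts.
my $fh = IO::File->new($path, 'r') or die "Cannot open $path: $!";
my (%offset, @order);
while (1) {
    my $pos  = $fh->tell;
    my $line = $fh->getline;
    last unless defined $line;
    if ($line =~ /^>(\S+)/) {
        $offset{$1} = $pos;
        push @order, $1;
    }
}
my $end = $fh->tell;   # offset just past the last record

# Hypothetical helper: read one record, from its own offset up to the next one.
sub fetch_record {
    my ($id) = @_;
    my ($i) = grep { $order[$_] eq $id } 0 .. $#order;
    my $stop = $i == $#order ? $end : $offset{ $order[$i + 1] };
    my $chunk = '';
    $fh->seek($offset{$id}, SEEK_SET);
    $fh->read($chunk, $stop - $offset{$id});
    return $chunk;
}

print fetch_record('geneB');   # prints the ">geneB" header line, then "GGCC"
```

Only the requested chunk is read and handed on, so a real parser would build objects for that record alone; saving the offsets to disk would let the same index be reused across runs.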

chris
&lt;/pre&gt;</description>
    <dc:creator>Fields, Christopher J</dc:creator>
    <dc:date>2013-05-17T17:37:53</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26666">
    <title>Re: Downloading annotated Gene nucleotide fasta files</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26666</link>
    <description>&lt;pre&gt;Hey Francisco,
Thanks a lot. Basically I just wanted gene nucleotide fasta
files with GI numbers.
I think I will have to parse them from gbk files.

-Shalabh


On Fri, May 17, 2013 at 11:59 AM, Francisco J. Ossandón &amp;lt;
fossandonc&amp;lt; at &amp;gt;hotmail.com&amp;gt; wrote:



&lt;/pre&gt;</description>
    <dc:creator>shalabh sharma</dc:creator>
    <dc:date>2013-05-17T16:26:26</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26665">
    <title>Re: Downloading annotated Gene nucleotide fasta files</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26665</link>
    <description>&lt;pre&gt;Hi,
You can get the annotations from here:
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/

The ".ffn" files are the gene nucleotide fasta files, but they do not show the
product name. The ".faa" files, on the other hand, are the gene amino acid fasta
files and do show the product name. If you want both product and
nucleotide sequence, it is much better to use the Genbank ".gbk" files, which contain the
complete data. You can parse them easily using BioPerl to obtain all genes
and then print the /protein_id, /product, and the nucleotide sequences to a
new fasta file. Check these to see how to do it:
http://www.bioperl.org/wiki/HOWTO:SeqIO
http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

Cheers,

Francisco J. Ossandon

-----Original Message-----
From: bioperl-l-bounces&amp;lt; at &amp;gt;lists.open-bio.org
[mailto:bioperl-l-bounces&amp;lt; at &amp;gt;lists.open-bio.org] On Behalf Of shalabh sharma
Sent: Friday, 17 May 2013 10:55
To: bioperl-l
Subject: [Bioperl-l] Downloading annotated Gene nucleotide fasta files

HI,
First of all, I am really sorry for sending this mail here;
I am not sure if this is the right forum, but I know a lot of people work on
similar stuff.
I wrote to NCBI but nobody replied.

I am looking for all bacterial/microbial gene annotation nucleotide
fasta files.
Does anyone know where to download these kinds of files?
I tried the *ffn files but they are not annotated.
Or is there any module in bioperl that I can use?
I would really appreciate your help.

Thanks
Shalabh

--
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences University of Georgia Athens, GA 30602-3636
&lt;/pre&gt;</description>
    <dc:creator>Francisco J. Ossandón</dc:creator>
    <dc:date>2013-05-17T15:59:04</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26664">
    <title>Downloading annotated Gene nucleotide fasta files</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26664</link>
    <description>&lt;pre&gt;HI,
First of all, I am really sorry for sending this mail here;
I am not sure if this is the right forum, but I know a lot of people work on
similar stuff.
I wrote to NCBI but nobody replied.

I am looking for all bacterial/microbial gene annotation
nucleotide fasta files.
Does anyone know where to download these kinds of files?
I tried the *ffn files but they are not annotated.
Or is there any module in bioperl that I can use?
I would really appreciate your help.

Thanks
Shalabh

&lt;/pre&gt;</description>
    <dc:creator>shalabh sharma</dc:creator>
    <dc:date>2013-05-17T14:54:55</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26663">
    <title>Re: sets of sequences - how to read?</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/26663</link>
    <description>&lt;pre&gt;
:s I'm not sure I understood your suggestion. I think the problem is
just the introduction of a new concept, a "set" of stuff (genes in
this case), and how SeqIO should handle multiple sets.

Carnë
&lt;/pre&gt;</description>
    <dc:creator>Carnë Draug</dc:creator>
    <dc:date>2013-05-17T05:12:24</dc:date>
  </item>
  <textinput rdf:about="http://search.gmane.org/?group=$group=gmane.comp.lang.perl.bio.general">
    <title>Search Engine</title>
    <description>Search the mailing list at Gmane</description>
    <name>query</name>
    <link>http://search.gmane.org/?group=$group=gmane.comp.lang.perl.bio.general</link>
  </textinput>
</rdf:RDF>
