<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://permalink.gmane.org/gmane.comp.clustering.torque.user">
    <title>gmane.comp.clustering.torque.user</title>
    <link>http://permalink.gmane.org/gmane.comp.clustering.torque.user</link>
    <description/>
    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>1</syn:updateFrequency>
    <syn:updateBase>1901-01-01T00:00+00:00</syn:updateBase>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13069"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13068"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13067"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13066"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13065"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13064"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13063"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13062"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13061"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13060"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13059"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13058"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13057"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13056"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13055"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13054"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13053"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13052"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13051"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13050"/>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://gmane.org/img/gmane-25t.png"/>
    <textinput rdf:resource=""/>
  </channel>
  <image rdf:about="http://gmane.org/img/gmane-25t.png">
    <title>Gmane</title>
    <url>http://gmane.org/img/gmane-25t.png</url>
    <link>http://gmane.org</link>
  </image>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13069">
    <title>MPI job submitted with TORQUE does not use InfiniBand if running and start nodes overlap [2]</title>
    <link>http://permalink.gmane.org/gmane.comp.clustering.torque.user/13069</link>
    <description>&lt;pre&gt;
Hi there,

I am using Intel MPI (4.1.0.024 from ICS 2013.0.028) to run my parallel
application (Gromacs 4.6.1 molecular dynamics) on a SGI cluster with
CentOS 6.2 and Torque 2.5.12.

When I submit an MPI job with Torque to start and run on 2 nodes, MPI 
startup fails to negotiate with Infiniband (IB) and internode 
communication falls back to Ethernet. This is my job script:

#PBS -l nodes=n001:ppn=32+n002:ppn=32
#PBS -q normal
source /opt/intel/impi/4.1.0.024/bin64/mpivars.sh
source /opt/progs/gromacs/bin/GMXRC.bash
cd $PBS_O_WORKDIR/
export I_MPI_DEBUG=2
mpiexec.hydra -machinefile macs -np 64 mdrun_mpi &amp;gt;&amp;amp; md.out

Of course the machinefile macs may be obtained from $PBS_NODEFILE, but
was fixed in this example. The output is:

[54] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
...
[45] MPI startup(): DAPL provider ofa-v2-mlx4_0-1
...
[33] MPI startup(): DAPL provider &amp;lt;NULLstring&amp;gt; on rank 0:n001 differs from ofa-v2-mlx4_0-1(v2.0) on rank 33:n002
...
[0] MPI startup()&lt;/pre&gt;</description>
    <dc:creator>Guilherme Menegon Arantes</dc:creator>
    <dc:date>2013-06-19T17:55:50</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13068">
    <title>Re: VNC service on cluster nodes with torque</title>
    <link>http://permalink.gmane.org/gmane.comp.clustering.torque.user/13068</link>
    <description>&lt;pre&gt;
Hi Chris,

Our site does a couple of different things in this space.
- We allow vnc session on the cluster head/login node and have many concurrent sessions with a custom system for limiting resource usage of individual processes. We have a helper script for users to setup/start/find a vnc session just to make things easier. We could scale this to multiple head nodes but have not needed to (yet).
- We unset the sshd uselocalhost option so that if users come into the head node with ssh X forwarding.
- In both cases users get a DISPLAY that is network accessible within the cluster and has an xauth entry in the users (shared) home directory
- we recommend that users who need a DISPLAY in a batch job use the qsub option '-v DISPLAY' (the performance in this case seems much better than that with qsub -X, but relies on the DISPLAY being network accessible and xauth based security).

Our documentation on this is at: https://wiki.csiro.au/display/ASC/Quick+Start+Guide+for+Linux and pointers therein.

This setup has&lt;/pre&gt;</description>
    <dc:creator>Gareth.Williams&lt; at &gt;csiro.au</dc:creator>
    <dc:date>2013-06-18T11:28:15</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13067">
    <title>Re: Torque 4.x and cpusets</title>
    <link>http://permalink.gmane.org/gmane.comp.clustering.torque.user/13067</link>
    <description>&lt;pre&gt;Hello,

On another thread on this list, I wrote about some problems I have with cpusets. It seems pbs_mom now creates the mount point on start (I tested this by stopping the mom, unmounting /dev/cpuset, and starting the mom), but it costs nothing to try that ;) Do you have hwloc properly installed on the mom? Have you run some tests on the mom server with "hwloc-bind core:0 -- sleep 10", for example?

_______________________________________________
torqueusers mailing list
torqueusers&amp;lt; at &amp;gt;supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers
&lt;/pre&gt;</description>
    <dc:creator>François P-L</dc:creator>
    <dc:date>2013-06-18T04:54:36</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13066">
    <title>Re: Torque 4.2 and OSC mpiexec</title>
    <link>http://permalink.gmane.org/gmane.comp.clustering.torque.user/13066</link>
    <description>&lt;pre&gt;-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 16/04/13 02:57, Sérgio Almeida wrote:


Going by what I've seen on the mpiexec list there is a bug in
Torque 4.1 and 4.2 which breaks OSC mpiexec, I've not seen any mention
on that list of Adaptive having fixed it yet. :-(

All the best,
Chris
- -- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel&amp;lt; at &amp;gt;unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlG/5UYACgkQO2KABBYQAh8PGgCgiRy41dccUtxKrDGzm1ZZj9IQ
07MAoIVTF0XX09Bct9PNUfjH5az6GUDK
=RjdJ
-----END PGP SIGNATURE-----
&lt;/pre&gt;</description>
    <dc:creator>Christopher Samuel</dc:creator>
    <dc:date>2013-06-18T04:42:46</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13065">
    <title>Re: Torque 4.x and cpusets</title>
    <link>http://permalink.gmane.org/gmane.comp.clustering.torque.user/13065</link>
    <description>&lt;pre&gt;
You might need to mount /dev/cpuset before starting the mom.  This was needed before version 4 but I don't have experience with current versions.

See for example 3.5.3 in: http://www.clusterresources.com/torquedocs21/3.5linuxcpusets.shtml

Gareth
&lt;/pre&gt;</description>
    <dc:creator>Gareth.Williams&lt; at &gt;csiro.au</dc:creator>
    <dc:date>2013-06-18T04:36:04</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13064">
    <title>Re: pbs_sched ver 4.2.3.1 : "Problem with creating server data strucutre"</title>
    <link>http://permalink.gmane.org/gmane.comp.clustering.torque.user/13064</link>
    <description>&lt;pre&gt;I have the same error message when I installed 4.2.3, the pbs_sched
does not work, jobs sit in queue. Any clues?

Thanks.

Kind regards.

J

On 6/16/13, Matt Harvey &amp;lt;m.j.harvey&amp;lt; at &amp;gt;acellera.com&amp;gt; wrote:
&lt;/pre&gt;</description>
    <dc:creator>jupiter</dc:creator>
    <dc:date>2013-06-18T01:15:45</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13063">
    <title>Resolving a jobid</title>
    <link>http://permalink.gmane.org/gmane.comp.clustering.torque.user/13063</link>
    <description>&lt;pre&gt;Hi all,

In TORQUE 4.x we made changes to the naming. Occasionally, this has caused
some issues with job ids. For example a call to qdel with just a sequence
number would return an Unknown Job Id error if the naming (DNS)
configuration was not quite right. Eventually, we are able to help users
resolve the issue but it is not always straightforward.

I have a proposal to see if we can make it so TORQUE is more forgiving of
naming configurations when it comes to resolving job ids. Please let me
know if what follows will cause a problem with your current configuration.

A TORQUE job id is made of two parts; a sequence number assigned from
TORQUE and the host name of the machine where pbs_server is running. The
host name part is the name returned by a call to the 'C' function
gethostname(). gethostname() returns the same name as hostname -f from the
command line. This is the fully qualified domain name of the machine. So
all TORQUE job ids are the sequence number and the fqdn of the host.

When a TORQUE utility&lt;/pre&gt;</description>
    <dc:creator>Ken Nielson</dc:creator>
    <dc:date>2013-06-17T22:37:15</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13062">
    <title>job failure- cannot find user in password file</title>
    <link>http://permalink.gmane.org/gmane.comp.clustering.torque.user/13062</link>
    <description>&lt;pre&gt;Hi there guys,

I'm having a problem with PBS and I hope someone can help me out with it.

First, here is some helpful info:
server=hatem-Inspiron-5520
client=toma-VirtualBox
shazly= a user on the server
job id=9.hatem-Inspiron-5520

So i installed and configured pbs torque on a mini-cluster (server + one 
client), when i submit a job from the server, i don't get the output files, 
so i went to check the mom log on the client machine and i found these 
entries:

pbs_mom;Svr;mom_server_add;server hatem-Inspiron-5520 added
pbs_mom;Svr;pbs_mom;LOG_ALERT::mom_server_valid_message_source, bad connect 
from "the server ip"- unauthorized server
mom_server_check_connection;sending hello to server 'hatem-Inspiron-5520'
pbs_mom;LOG_ERROR::start_exec, no password entry for user 'shazly'
pbs_mom;Req;send_sisters;sending ABORT to sisters for job '9.hatem-Inspiron-
5520'
pbs_mom;Svr;pbs_mom;LOG_ERROR::sucess(0) in fork_to_user, cannot find 
'shazly' in password file
pbs_mom;Req;req_reject;Reject reply code=15025(BAD UID for jo&lt;/pre&gt;</description>
    <dc:creator>shazly</dc:creator>
    <dc:date>2013-06-17T16:39:00</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13061">
    <title>Re: Torque 4.2 and OSC mpiexec</title>
    <link>http://permalink.gmane.org/gmane.comp.clustering.torque.user/13061</link>
    <description>&lt;pre&gt;Hello Sergio,

This isn't exactly comparable to what you're reporting, but we had to
recompile OpenMPI when moving from Torque 2.5 to 4.x.  There was some
issue with the task manager interface, and while several list members
offered suggestions on how to work around that, we ended up recompiling
anyway.  You can see some of the discussion in the list archives from June
2012.  Perhaps you can find some clues that could be relevant to your
situation.

Regards,
Peter Ruprecht

On 4/15/13 10:57 AM, "Sérgio Almeida" &amp;lt;sergio.almeida&amp;lt; at &amp;gt;ist.utl.pt&amp;gt; wrote:

&lt;/pre&gt;</description>
    <dc:creator>Peter A Ruprecht</dc:creator>
    <dc:date>2013-06-17T15:10:53</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13060">
    <title>MPI job submitted with TORQUE does not use InfiniBand if running and start nodes overlap</title>
    <link>http://permalink.gmane.org/gmane.comp.clustering.torque.user/13060</link>
    <description>&lt;pre&gt;
Hi there,

I am using Intel MPI (4.1.0.024 from ICS 2013.0.028) to run my parallel
application (Gromacs 4.6.1 molecular dynamics) on a SGI cluster with
CentOS 6.2 and Torque 2.5.12.

When I submit an MPI job with Torque to start and run on 2 nodes, MPI 
startup fails to negotiate with Infiniband (IB) and internode 
communication falls back to Ethernet. This is my job script:

#PBS -l nodes=n001:ppn=32+n002:ppn=32
#PBS -q normal
source /opt/intel/impi/4.1.0.024/bin64/mpivars.sh
source /opt/progs/gromacs/bin/GMXRC.bash
cd $PBS_O_WORKDIR/
mpiexec.hydra -machinefile macs -np 64 mdrun_mpi &amp;gt;&amp;amp; md.out

Of course the machinefile macs may be obtained from $PBS_NODEFILE, but
was fixed in this example. The output is:

[54] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
....
[45] MPI startup(): DAPL provider ofa-v2-mlx4_0-1
....
[33] MPI startup(): DAPL provider &amp;lt;NULLstring&amp;gt; on rank 0:n001 differs from ofa-v2-mlx4_0-1(v2.0) on rank 33:n002
...
[0] MPI startup(): shm and tcp data &lt;/pre&gt;</description>
    <dc:creator>Guilherme Menegon Arantes</dc:creator>
    <dc:date>2013-06-17T13:41:42</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13059">
    <title>Re: getting a summary by username</title>
    <link>http://permalink.gmane.org/gmane.comp.clustering.torque.user/13059</link>
    <description>&lt;pre&gt;Hi,

  Short time listener, first time poster.

  User stats, assuming you're using Maui, are done using the showstats
command.

  showstats -u

  Manpage(?) : http://docs.adaptivecomputing.com/maui/commands/showstats.php

md


On 17 April 2013 21:00, Mahmood Naderan &amp;lt;nt_mahmood&amp;lt; at &amp;gt;yahoo.com&amp;gt; wrote:

_______________________________________________
torqueusers mailing list
torqueusers&amp;lt; at &amp;gt;supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers
&lt;/pre&gt;</description>
    <dc:creator>Mark Dwyer</dc:creator>
    <dc:date>2013-04-17T11:48:41</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13058">
    <title>pbs_sched ver 4.2.3.1 : "Problem with creating serverdata strucutre"</title>
    <link>http://permalink.gmane.org/gmane.comp.clustering.torque.user/13058</link>
    <description>&lt;pre&gt;Hi,

I've just updated a torque installation from 2.5.9 to 4.2.3.1 (release 13
Jul 13).  To update, all daemons were replaced by the new versions and
restarted. I also started the new trqauthd.

No changes were made to the qmgr config, which was previously working and
left in an active (scheduling) state.

After the update scheduling did not appear to be working. The pbs_sched's
sched_out file contains the following two lines repeated:

pbs_statserver failed: 15033
Problem with creating server data strucutre

According to the admin manual this error is:

PBSE_NOCONNECTS 15033 No free connections

connection to the pbs_server.

Is the pbs_sched in the new release broken, or does the error suggest I
have some misconfiguration?

Thanks,

Matt
_______________________________________________
torqueusers mailing list
torqueusers&amp;lt; at &amp;gt;supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers
&lt;/pre&gt;</description>
    <dc:creator>Matt Harvey</dc:creator>
    <dc:date>2013-06-15T19:17:18</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13057">
    <title>Not receiving output</title>
    <link>http://permalink.gmane.org/gmane.comp.clustering.torque.user/13057</link>
    <description>&lt;pre&gt;I've newly installed torque on a mini-cluster (2 machines) but i've a
problem when submitting a job, i don't receive an output file nor an error
file, i've tested the connection between the server and the client
(ping-ed, ssh-ed) and everything was working fine, i checked the config
file on mom_priv and nodes file on the server_priv and nothing was wrong
with them, the job status queues and completes fine but there is something
i noted when running qstat -f command, i get this output in a field called
sched_hint:

sched_hint = sched_hint = Post job file processing error; job
2.hatem-inspiron-5520  on  host toma-VirtualBox/0Bad UID for job execution
REJHOST=toma-virtualbox MSG=cannot find user 'hatem' in password file

the server is (hatem-Inspiron-5520) and the client is (toma-VirtualBox).
i also checked the server log and there was nothing unusual and there was
no log for pbs_mom in mom_logs!!, i double checked /etc/hosts file on both
the server and the client and each file has the (ip hostname) of the
corr&lt;/pre&gt;</description>
    <dc:creator>Hatem Elshazly</dc:creator>
    <dc:date>2013-06-15T17:15:17</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13056">
    <title>Re: VNC service on cluster nodes with torque</title>
    <link>http://permalink.gmane.org/gmane.comp.clustering.torque.user/13056</link>
    <description>&lt;pre&gt;Chris,

On 06/14/2013 05:43 PM, Chris Hunter wrote:
I think that the solution you are trying to find will lead to 
performance degradation from all your nodes throughout time. VNC 
solutions are not that clean and neat as we expect they would be by 2013.

Don't the interactive sessions with X forwarding suffice for your users 
needs?

Around here, users do qsub -I -X and they get an interactive session in 
a special queue with X11 forwarding from nodes to the entry node. Given 
that they are connected to the entry node with X11 forwarding, they will 
be able to use the graphical interfaces in their connecting computers. 
We have been using it successfully with visit, mathematica, flair, idl 
and so on. We are only allowing small sessions (&amp;lt; 6h) for this since 
people started leaving idle programs open from one day to the other just 
to check how it went when no time limit was set.

Regarding users that want to leave stuff running to come back later 
(implying a disconnect), we haven't thought about that yet &lt;/pre&gt;</description>
    <dc:creator>Sérgio Almeida</dc:creator>
    <dc:date>2013-06-14T17:09:23</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13055">
    <title>qsub jobs in queue, but not executed</title>
    <link>http://permalink.gmane.org/gmane.comp.clustering.torque.user/13055</link>
    <description>&lt;pre&gt;Hi,

I am new to the list, sorry for asking FAQ. I've just installed a
server and a node mom from source 4.2.3 on CentOS 6.4. I am testing it
using pbs_sched. I can submit jobs but all jobs remain in queue (all
jobs are simply bash scripts with one line of "echo test").

$ qstat
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1.login                    job1.sh          tester                0 Q
batch
2.login                    job2.sh          tester                0 Q
batch
3.login                    job3.sh          tester                0 Q batch

The communication between the server and the node seems fine, but the
jobs did not move to the node:

$ pbsnodes
desktop3
     state = free
     np = 1
     ntype = cluster
     status = rectime=1371190614,varattr=,jobs=,state=free,netload=12520819104,gres=,loadave=0.00,ncpus=1,physmem=3923096kb,availmem=3550292kb,totmem=3923096kb,idletime=690945,nusers=2,nses&lt;/pre&gt;</description>
    <dc:creator>jupiter</dc:creator>
    <dc:date>2013-06-14T06:35:19</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13054">
    <title>Setting default</title>
    <link>http://permalink.gmane.org/gmane.comp.clustering.torque.user/13054</link>
    <description>&lt;pre&gt;We've set Maui to enforce memory limits, but if a user submits a job without a memory specification, there's no limit to enforce. So we want to put a minimum/default memory amount on the queue to encourage users to remember to set a limit. I believe that's done by:

qmgr -c "set queue default resources_default.mem =1 gb"

I've never modified the queue, so I don't know what issuing that qmgr command will do, operationally. Do I need to restart a daemon after making this change? Or issue some other command? And what happens to currently running jobs when I make that change. We currently only have a default queue, as our cluster is new and tiny.

Or should I be doing this in Maui. I assume it ought to be done at the queue level.

Obviously I'm very new at this.
David Kieschnick
_______________________________________________
torqueusers mailing list
torqueusers&amp;lt; at &amp;gt;supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers
&lt;/pre&gt;</description>
    <dc:creator>David Kieschnick</dc:creator>
    <dc:date>2013-04-24T23:34:37</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13053">
    <title>Torque, LDAP &amp; long usernames</title>
    <link>http://permalink.gmane.org/gmane.comp.clustering.torque.user/13053</link>
    <description>&lt;pre&gt;Hi,

We have an old running Torque (2.3.6) &amp;amp; maui cluster with one dedicated server 
and several computing nodes (all Scientific Linux 5). The server also runs the 
NIS server without problem.

We have recently changed from NIS to a LDAP server(running in the same server) 
and everything looked fine, but we're starting to find problems. Sometimes, 
jobs get errors like:

06/07/2013 18:59:59;0001;   pbs_mom;Svr;pbs_mom;start_exec, no password entry 
for user garciavdaniel
06/07/2013 18:59:59;0001;   pbs_mom;Svr;pbs_mom;Success (0) in fork_to_user, 
cannot find user 'garcxxydaniel' in password file
06/07/2013 18:59:59;0080;   pbs_mom;Req;req_reject;Reject reply code=15023(Bad 
UID for job execution REJHOST=yed04.sct.uo MSG=cannot find user 
'garciavdaniel' in password file), aux=0, type=CopyFiles, from 
PBS_Server&amp;lt; at &amp;gt;jaula04.sct.uo
06/07/2013 18:59:59;0001;   pbs_mom;Svr;pbs_mom;Inappropriate ioctl for device 
(25) in req_cpyfile, fork_to_user failed with rc=-15023 'cannot find user 
'garcxxyvdaniel' in password &lt;/pre&gt;</description>
    <dc:creator>Clúster de Modelización Científica</dc:creator>
    <dc:date>2013-06-10T12:23:39</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13052">
    <title>OSC mpiexec and Torque 4</title>
    <link>http://permalink.gmane.org/gmane.comp.clustering.torque.user/13052</link>
    <description>&lt;pre&gt;Hi All,

Is anyone successfully using OSC mpiexec (https://www.osc.edu/~djohnson/mpiexec/index.php) and Torque 4?  

We have been testing it with Torque 4.1.4 and are running into problems where jobs just seem to hang.  Looking back on this mailing list there are reports of problems with this combination of software back in July 2012 but I have not seen any updates since then.  Does anyone know of any open bug reports that I can follow related to mpiexec and Torque 4?

Thanks,

- John


---
John Mehringer
USC ITS-HPCC
&lt;/pre&gt;</description>
    <dc:creator>John Mehringer</dc:creator>
    <dc:date>2013-03-15T17:09:16</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13051">
    <title>submitted jobs not running on all nodes</title>
    <link>http://permalink.gmane.org/gmane.comp.clustering.torque.user/13051</link>
    <description>&lt;pre&gt;Hello,

I'm running a couple of clusters, one 64 node cluster and one 5 node 
cluster, utilizing the default torque package for scheduling and 
everything else. When I try to submit a job that will utilize more than 
one node it appears that it will not use all of the nodes, but rather it 
stays on one node. When I run tracejob &amp;lt;job-id&amp;gt; or qstat -f &amp;lt;job-id&amp;gt; it 
shows that the nodes have been allocated to the job and everything 
appears to be fine. If I go to the nodes individually, they have the 
appropriate job files in the mom directory, but if I run top or ps -ef 
the job will only appear on one node and use only the processors of that 
node while not showing up in any of the other nodes it has been set to use.

Does anyone have any idea what may be causing this behavior?

Here is a view of my qmgr.
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodes = 1
set queue batch resources_de&lt;/pre&gt;</description>
    <dc:creator>Chris Bright</dc:creator>
    <dc:date>2013-06-05T19:25:22</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13050">
    <title>submitted jobs not running on all nodes</title>
    <link>http://permalink.gmane.org/gmane.comp.clustering.torque.user/13050</link>
    <description>&lt;pre&gt;Hello,

I'm running a couple of clusters, one 64 node cluster and one 5 node 
cluster, utilizing the default torque package for scheduling and 
everything else. When I try to submit a job that will utilize more than 
one node it appears that it will not use all of the nodes, but rather it 
stays on one node. When I run tracejob &amp;lt;job-id&amp;gt; or qstat -f &amp;lt;job-id&amp;gt; it 
shows that the nodes have been allocated to the job and everything 
appears to be fine. If I go to the nodes individually, they have the 
appropriate job files in the mom directory, but if I run top or ps -ef 
the job will only appear on one node and use only the processors of that 
node while not showing up in any of the other nodes it has been set to use.

Does anyone have any idea what may be causing this behavior?

Here is a view of my qmgr.
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodes = 1
set queue batch resources_defau&lt;/pre&gt;</description>
    <dc:creator>Chris Bright</dc:creator>
    <dc:date>2013-06-04T14:29:08</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.clustering.torque.user/13049">
    <title>submitted jobs not running all nodes</title>
    <link>http://permalink.gmane.org/gmane.comp.clustering.torque.user/13049</link>
    <description>&lt;pre&gt;Hello,

I'm sorry if I'm sending this to the wrong list, if so please forgive me 
and direct me to the appropriate place.

I'm running a couple of clusters, one 64 node cluster and one 4 node 
cluster, utilizing the default torque package for scheduling and 
everything else. When I try to submit a job that will utilize more than 
one node it appears that it will not use all of the nodes, but rather it 
stays on one node. When I run tracejob &amp;lt;job-id&amp;gt; or qstat -f &amp;lt;job-id&amp;gt; it 
shows that the nodes have been allocated to the job and everything 
appears to be fine. If I go to the nodes individually and run top or ps 
-ef the job will only appear on one node and use only the processors of 
that node.

Does anyone have any idea what may be causing this behavior?

Here is a view of my qmgr.
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 01:00&lt;/pre&gt;</description>
    <dc:creator>Chris Bright</dc:creator>
    <dc:date>2013-06-03T15:39:53</dc:date>
  </item>
  <textinput rdf:about="http://search.gmane.org/?group=$group=gmane.comp.clustering.torque.user">
    <title>Search Engine</title>
    <description>Search the mailing list at Gmane</description>
    <name>query</name>
    <link>http://search.gmane.org/?group=$group=gmane.comp.clustering.torque.user</link>
  </textinput>
</rdf:RDF>
