Computing Tips

From Eric's Wiki

MPI hostname confusion

Q: mpich fails to run a program and produces the error:

p4_error: Could not gethostbyname for host laggan; may be invalid name

A: This happens when the hostname of a node is not resolvable by another node.

Eg 1: The master is cherry.orchard.net. On cherry, hostname reports the name as cherry. If another node is apple.blossom.orchard.net, that node will receive the hostname cherry from the master and will attempt to resolve cherry's IP address by calling gethostbyname. This fails, because on apple gethostbyname(cherry) looks for cherry.blossom.orchard.net.

Fix 1: On the master, set the environment variable MPI_HOST to cherry.orchard.net. That overrides the hostname mpirun sends to all the hosts.

Eg 2: The master is cherry.orchard.net. One slave is apple.blossom.orchard.net and another is peach.orchard.net. Fix 1 has been applied, so apple.blossom can find cherry, peach can find cherry, and cherry can find peach, but apple.blossom cannot find peach. That is because the machine file looks like this:

cherry
peach
apple.blossom

Fix 2: Change the machine file to:

cherry
peach.orchard.net
apple.blossom.orchard.net

You might think that changing the machine file to:

cherry.orchard.net
peach.orchard.net
apple.blossom.orchard.net

might also fix example 1. But it doesn't. As far as I can tell, it makes no difference to example 1 since MPI_HOST is set via a call to hostname if it is not already set.

Eric Tittley (04 08 10)

Fortran memory limit

Q. Unable to run FORTRAN programmes that use more than 1GB, even if I have more than 1GB of memory.

A: Compile using the -static flag. You may need to use g77 3.4.1 or greater.

Eric Tittley (04 08 23)


Password-less logins

Q. How to do password-less logins on a cluster which shares a common /home filesystem.

A.

cd ~/.ssh
ssh-keygen -t rsa
<ret>
<ret>
<ret>
cat id_rsa.pub >> authorized_keys

Test:

ssh garve

(It should log you in w/o asking for a password)

For systems that don't share a common /home filesystem, append the contents of id_rsa.pub to the file ~/.ssh/authorized_keys on each of the machines to which you want password-less logins.

Eric Tittley (05 03 23)

Directories a machine exports

Q. What directories does my machine export and to whom?

A. exportfs

Eric Tittley (06 09 05)

Directories another machine exports

Q. What directories does another machine export and to whom?

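A. /sbin/showmount -e <host> lists the directories the host exports and to whom. /sbin/showmount -a <host> lists the clients that currently have directories mounted, in client:directory form.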

Eric Tittley (06 09 05)

Login order of file sourcing

Q. In what order are .login, .profile, .tcshrc, .cshrc read?

A.

tcsh: .tcshrc, .cshrc, .login

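bash: the first found of .bash_profile, .bash_login, .profile (login shells); .bashrc (interactive non-login shells). bash does not read .login.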

.tcshrc/.cshrc is read whenever tcsh/csh is invoked, including a script that was not called with the '-f' flag (fast).

If there is a .tcshrc file, tcsh will not read .cshrc.

.login is read only when the shell is invoked as a login shell.

.tcshrc/.cshrc is read by cron.

Eric Tittley (06 11 30)

So scripts generally read .tcshrc/.cshrc, and login shells read .tcshrc/.cshrc then .login.


Eric Tittley (07 02 05)

malloc() error

*** glibc detected *** double free or corruption: 0x?????? ***

glibc prints this message when it detects misuse of the heap, such as freeing the same pointer twice. You can control the behaviour of malloc() in this case by setting the environment variable MALLOC_CHECK_ to one of the following values:

  0. Do not generate an error message, and do not kill the program
  1. Generate an error message, but do not kill the program
  2. Do not generate an error message, but kill the program
  3. Generate an error message and kill the program

If MALLOC_CHECK_ is explicitly set to a value other than 0, glibc performs more extensive tests than the default, which may impact performance.

Changing default shells

chsh -s /bin/tcsh

--Etittley 15:03, 19 July 2007 (BST)

sort doesn't sort numbers properly

The problem is with locale settings. Writers of locale tables are happy to fix the stuff pertaining to their language, but ignore non-language-specific issues, like how to sort numbers. If LANG is set, sort is obliged to use the lazily-written locale tables.

Before sorting, set:

setenv LC_ALL POSIX

See: http://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021

--Etittley 12:20, 8 August 2007 (BST)

Never use round(), rint(), or nearbyint()

These may sound like the obvious functions to use, but different compilers and libraries produce different results, some astoundingly wrong.

For example, (int)round(1.) =

  • 0 gcc-3.3.5 on garve
  • 1 gcc-4.1.1 on garve (complains about "incompatible implicit declaration of built-in function")
  • 1 gcc-3.3.3 on UKAFF
  • 42 xlc
  • 0 xlc -q64
  • j=1=round(1.) xlc -O0
  • j=268435456=round(1.) xlc -O2 or higher

All xlc results are on UKAFF.

On supa64 (SGI Altix)

  • 1 gcc-3.3.3 all -On levels
  • 1 icc all -On levels
  • 1 gcc-4.1.1 all -On levels, but compiler throws: "incompatible implicit declaration of built-in function 'round'"

--Etittley 13:47, 9 August 2007 (BST)


Converting PDF to PS

Not all PDF to PS converters consistently produce PS that printers accept. On my linux box, there are three tools to convert PDF to PS:

  • acroread
  • pdf2ps
  • pdftops

Of these, pdftops works best.

So I have in my .aliases file:

alias lppdf 'pdftops \!* - | lpr -Zduplex'

--Etittley 16:18, 13 September 2007 (BST)


Using multiple arguments in an alias

A csh alias can pass all of its arguments on as:

alias foo  'bar \!*'

So foo arg will run bar arg and foo arg1 arg2 will be run as bar arg1 arg2.

But if more than one argument is expected and they aren't to be supplied to a single part of the alias, then the separate arguments can be referenced independently:

alias foo  'bar \!:1 | bar2 > \!:2'

So foo arg1 arg2 will run bar arg1 | bar2 > arg2.

--Etittley 16:32, 13 September 2007 (BST)

--whatprovides in Debian

RPM provides a method for finding to which package a file belongs. For example:

> rpm -q --whatprovides /home/ert/local/Intel/bin/icc
intel-icc9-9.0-030.i386

In Debian, a similar functionality is provided by:

> dpkg-query -S /usr/bin/netkit-rsh
rsh-client: /usr/bin/netkit-rsh

--Etittley 13:01, 9 November 2007 (UTC)

Email signatures that don't get included in replies

Signatures are files containing a few (sometimes way more!) lines of text that get automatically included at the end of emails.

The signatures can be included in a way that allows the email reader to know they are just signatures and ignore them when composing replies which include the text of the original email. This has the benefit of shortening replies, particularly in the cases of lengthy signatures and multiple replies back and forth.

Some email clients (thunderbird, for one) automatically preface the signature with the necessary code. Others do not.

Adding the following line as the first line of your email signature file will allow identification of your signature as a signature:
--
Note: the line is two dashes followed by a single blank space ("-- "). The space is necessary!

--Etittley 14:43, 26 February 2008 (UTC)

Fast index in Fortran

Q: What is the faster index to a Fortran array, the first or the last?

A: The first.

What is meant by the "fast index" of the array is the index that increments fastest as one steps through the array in memory space.

For example, consider the 3x3 array A. If the array elements A(i,j) are arranged in memory as:
A(1,1), A(1,2), A(1,3), A(2,1), A(2,2), A(2,3), A(3,1), A(3,2), A(3,3)
then the 2nd index is the fast index.

However, if they are arranged as:
A(1,1), A(2,1), A(3,1), A(1,2), A(2,2), A(3,2), A(1,3), A(2,3), A(3,3)
then the 1st index is the fast index. Fortran uses this second arrangement (column-major order).

Significant improvements to performance can be had when manipulating a multi-dimensional array by setting the inner loop to process through the fast index. For example, consider the following code snippet written in the intuitive manner:

      do i=1,N
       do j=1,M
        A(i,j)=real(i)*real(j)
       enddo
      enddo

Rearranging the loops so that the inner loop runs over the first (fast) index gives:

      do j=1,M
       do i=1,N
        A(i,j)=real(i)*real(j)
       enddo
      enddo

With M=N=20000, the first form takes 27x longer to run than the second (gfortran 4.3.0 and ifort 10.1 at -O3).

The increase in performance is easily understood as a result of the next element to process being next in memory.

--Etittley 13:51, 21 March 2008 (UTC)


Fast index in C

Q: What is the faster index to a C multi-dimensional array, the first or the last?

A: The last.

What is meant by the "fast index" of the array is the index that increments fastest as one steps through the array in memory space.

For example, consider the 3x3 array A. If the array elements a(i,j) of A are arranged in memory as:
a(1,1), a(1,2), a(1,3), a(2,1), a(2,2), a(2,3), a(3,1), a(3,2), a(3,3)
then the 2nd index is the fast index. C uses this first arrangement (row-major order).

However, if they are arranged as:
a(1,1), a(2,1), a(3,1), a(1,2), a(2,2), a(3,2), a(1,3), a(2,3), a(3,3)
then the 1st index is the fast index.

Significant improvements to performance can be had when manipulating a multi-dimensional array by setting the inner loop to process through the fast index. For example, consider the following code snippet written in the intuitive manner:

 for(i=0;i<N;i++) {
  for(j=0;j<M;j++) {
   A[i][j]=(float)i * (float)j;
  }
 }

The loop may alternatively be arranged as:

 for(j=0;j<M;j++) {
  for(i=0;i<N;i++) {
   A[i][j]=(float)i * (float)j;
  }
 }

The second form takes 14 to 48 times longer (N=20000, M=10000), depending on compiler and optimisation level. Indeed, the fastest form (first) is also most optimisable, gaining 3.5x speed between -O0 and -O3.

--Etittley 15:59, 22 April 2008 (BST)

Fast index in Matlab/Octave

Q: What is the faster index to a Matlab/Octave multi-dimensional array, the first or the last?

A: The first.

Not Vectorised code (with i the inner loop):

A=zeros(N,M);
for j=1:M
 for i=1:N
  A(i,j)=real(i)*real(j);
 end
end

Timings (s), not vectorised (N=M=2000)

Inner index   Octave   Matlab
i             87       0.095
j             88       0.215

Vectorised code (with i the inner loop):

A=zeros(N,M);
i=[1:N];
for j=1:M
 A(:,j)=real(i)*real(j);
end

Timings (s), vectorised (N=M=10000)

Inner index   Octave   Matlab
i             4.8      1.95
j             23       17.6

--Etittley 17:03, 1 July 2008 (BST)

Using my Subversion server

My Subversion server is svn.hmet.net.

Checking out

To check out the project foo:
svn checkout svn://svn.hmet.net/foo

If you are behind a firewall, as is the case at the ROE, tunnel through ssh:
svn checkout svn+ssh://etittley@svn.hmet.net/home/svn/foo

More complicated tunnelling

To tunnel from byre (behind two firewalls) directly to the Subversion server on svn.hmet.net via an intermediate host (fetlar, for example), using Subversion's default port of 3690:

byre -> fetlar -> svn.hmet.net

On byre:
byre> ssh -L 3690:localhost:3690 fetlar.roe.ac.uk
That logs you into fetlar.

On fetlar (same terminal as above):
fetlar> ssh -L 3690:localhost:3690 etittley@svn.hmet.net
That logs you into svn.

Back on byre: svn co svn://localhost/PMRT

That checks out PMRT!

--Etittley 22:05, 25 July 2011 (BST)

Updating

For a good overview of updating, differencing, and committing changes, see: The Subversion Book

Current packages hosted at svn.hmet.net

The current packages are:
CDRT
EnzoRT
PMRT
RT
enzo_displacement_statistics
enzo2pm
idl
inits
matlab
tpmcdm_ic
Timedep2

Note: EnzoRT and PMRT require RT. For example:

svn checkout svn://svn.hmet.net/EnzoRT
cd EnzoRT
svn checkout svn://svn.hmet.net/RT

Ignoring a file or directory

Suppose you want to ignore the file or directory FileToIgnore.
svn propset svn:ignore FileToIgnore .

To add to the list, or create a list of objects to ignore, edit the properties with:
svn propedit svn:ignore .

Add a new-line-delimited list of objects to ignore.

--Etittley 10:51, 15 November 2010 (UTC)

Getting a user's numeric UID

Q: How do I get a user's numeric UID?

A: There are many ways (id -u username, for one), but a foolproof way is:
stat /home/username

--Etittley 17:11, 1 July 2008 (BST)

mpirun unable to connect on Debian 4.0 systems

Here's an example session:

mpirun -machinefile hostfile -np 4 ./myMPIprogramme
foo.bar.org: Connection refused
p0_15896:  p4_error: Child process exited while making connection to remote process on foo: 0
p0_15896: (37.056298) net_send: could not write to fd=4, errno = 32

The problem is that MPICH has been configured to use /usr/bin/rsh as the default connection agent. The network at bar.org requires ssh.

Adding the flag -rsh ssh should work, but it doesn't in Debian 4.0:

mpirun -rsh ssh -machinefile hostfile -np 4 ./myMPIprogramme
foo.bar.org: Connection refused
p0_15896:  p4_error: Child process exited while making connection to remote process on foo: 0
p0_15896: (37.056298) net_send: could not write to fd=4, errno = 32

The error is in /usr/lib/mpich/bin/mpirun.ch_p4.args, which sets setrshcmd="no" unconditionally, even if it has already been set to "yes" by a previous call to mpirun.ch_p4.args. Modify it accordingly:

Replace:

setrshcmd="no"

with:

if [ -z "$setrshcmd" ]; then
 setrshcmd="no"
fi

Alternatively, and perhaps easier, just set the environment variable:

setenv P4_RSHCOMMAND ssh

--Etittley 15:57, 2 July 2008 (BST)


Listing Pre-defined Compiler Macros

  • GCC: gcc -dM -E - < /dev/null
  • Intel: icc -dM -E - < /dev/null
  • PGI: pgcc -dM -E /dev/null

(See: Wikipedia's C preprocessor entry)

  • For xlc/xlC, see the XL C/C++ Users Guide.

--Etittley 16:51, 4 August 2010 (BST)

relocation truncated to fit: R_X86_64_PC32

Compile with the flag (Intel & GCC): -mcmodel=medium

Or, with Intel: -mcmodel medium -shared-intel

--Etittley 15:42, 25 October 2010 (BST)

Controlling CPU frequency regulation

CPU frequencies can be allowed to vary depending on the demand, saving on power when idle.

To list the possible governors:

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors

To set the governor to no throttling (important when benchmarking):

cpufreq-set -c 0 --governor performance
cpufreq-set -c 1 --governor performance

To set the governor to throttle the CPU when demand is low:

cpufreq-set -c 0 --governor ondemand
cpufreq-set -c 1 --governor ondemand

--Etittley 13:47, 20 November 2008 (UTC)

Testing for NaN in C

Such a simple problem. But there are some pitfalls.

I recommend either of the following:

float c = sqrt(-1);
if( isnan(c) )     c = 0.0; /* Most obvious */
if( !isfinite(c) ) c = 0.0; /* Will also catch INF */
if( c != c )       c = 0.0; /* IEEE requires this to work */

I've added this note because someone gave me a bit of code that tests this way:

if( c*0.0 != 0.0 ) c = 0.0;

A logical solution that is also clear, since presumably the NaN bit is preserved in all mathematical operations and NaN is obviously not equal to 0. Compare with ( NaN != NaN ), which sounds like it would be false but which IEEE requires to return true. As cute and handy as the "c != c" construction is, IMHO it is never good that a programme do quite the opposite of what you might naively think it would do.

Unfortunately, however logical ( c*0 != 0 ) is, different compilers treat it differently.

gcc returns True
icc returns False
xlc returns True

The expected behaviour of ( c*0 != 0 ) where c is NaN is not obvious. For example, gcc returns True, but the glibc manual states: "Unless the calculation would produce the same result no matter what real value replaced NaN, the result is NaN." From that, one might expect 0*NaN = 0, since 0 times any real value is 0. 0*Inf should be NaN, but 0*NaN need not be.

The documentation also states: "NaN is unordered: it is not equal to, greater than, or less than anything, including itself. x == x is false if the value of x is NaN." Their italics.

(http://www.gnu.org/s/libc/manual/html_node/Infinity-and-NaN.html)

Matlab and IDL both agree 0*NaN == NaN.

Could this just be a bug in the Intel Compiler Suite?

--Etittley 14:44, 21 July 2009 (BST)

What does NaN*0.0 evaluate to?

Not specified. So never rely on it to evaluate to either 0 or NaN.

Different compilers and compiler flags produce different answers.

See #Testing for NaN in C for further details.

--Etittley 12:17, 22 July 2009 (BST)

Higher quality patch plots in Matlab

Short answer:

  1. matlab>> print -dpdf T_maps.pdf
  2. csh> pdftops -eps T_maps.pdf T_maps.1.eps
  3. csh> eps2eps T_maps.1.eps T_maps.2.eps
  4. csh> set NewLine=`grep %%BoundingBox T_maps.2.eps`
  5. csh> cat T_maps.1.eps | sed "/Bounding/ c$NewLine" > T_maps.eps
  6. csh> rm T_maps.1.eps T_maps.2.eps

At issue is that Matlab rasterizes any figure with patch elements when generating an EPS (Encapsulated PostScript) file. The raster image is low resolution (150dpi). One can boost the resolution with the -r600 option to print, but that typically generates a 15MB file. I think it writes it as an uncompressed raster, while higher-level PostScript can actually accept compressed raster images. In any case, the solution given by the example above produces a 500kB file that seems to be 600dpi.

  1. In matlab, export the figure to a PDF file.
  2. At the shell, convert the PDF to an EPS file using pdftops. The bounding box will end up being the papersize.
  3. Process the EPS file with eps2eps, producing a file with the proper bounding box (good) but uncompressed raster (bad).
  4. Get the value of the bounding box from the eps2eps-processed file.
  5. Replace the BoundingBox value in the file produced with pdftops with the one from the eps2eps-processed file. The resultant file is the one we want to keep.
  6. Clean up.

--Etittley 15:47, 31 August 2009 (BST)

C, write(), and a 2GB file limit

Even if you have a 64-bit OS and a filesystem that supports files larger than 2GB, a single call to write() can still write only less than 2GB at a time, at least on Linux systems.

Here's a bit of code to get around the limitation.

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(void) {
 ssize_t rez;
 int f = creat("filldisk.dat",S_IRUSR | S_IWUSR);
#if 0
 double *c = calloc(1024,sizeof(double));
 while (1) {
   rez = write(f, c, sizeof(double)*1024);
   if (rez<0) break;
 }
#else
 const size_t TWO_GB           = ((size_t)32768)*((size_t)65536);
 const size_t FOUR_GB          = (size_t)2*TWO_GB;

 char *c = calloc(FOUR_GB,sizeof(char));
 if (c == NULL) { perror("calloc"); return 1; }
 ssize_t BytesToWrite=(ssize_t)FOUR_GB;
 ssize_t ByteToStartFrom=0;
 do {
  printf("Writing %li bytes starting at %li ...",
         (long int)BytesToWrite,(long int)ByteToStartFrom);
  fflush(stdout);
  rez = write(f, c+ByteToStartFrom, BytesToWrite);
  if (rez < 0) { perror("write"); break; } /* stop on error rather than loop forever */
  printf("Wrote %li bytes\n",(long int)rez);
  fflush(stdout);
  BytesToWrite -= rez;
  ByteToStartFrom += rez;
 } while(BytesToWrite > 0);
#endif
 close(f);
 return 0;
}

The first block demonstrates that write() can generate a file bigger than 2GB.

The second block demonstrates a loop that will write out the block, regardless of limitations (other than filesize).

--Etittley 14:19, 11 February 2010 (UTC)

Tunneling to ROE's internal tWiki

Access to the ROE's tWiki (hosted on apache) can be achieved via:

ssh -L 20000:apache:80 username@sshserver.roe.ac.uk

Note that you need to replace username with your ROE account login and replace sshserver with an appropriate external SSH server.

To access the tWiki from the client, point the browser to: http://localhost:20000/

--Etittley 18:14, 25 April 2010 (BST)

Installing Flash on 64bit Linux

See: http://forums.debian.net/viewtopic.php?f=16&t=53036

In particular, follow the instructions "Howto get flash working in 64bit using nspluginwrapper"

--Etittley 23:27, 24 June 2010 (BST)

Crazy file permissions

Q: ls -l mydirectory reports files with permission -????????? ? ? ? ?  ? myfile.ext

A: Who the hell knows?

--Etittley 12:54, 1 April 2011 (BST)
