What steps will reproduce the problem?
1. Create a machinefile for MPICH2 containing localhost
2. Run adda_mpi
3. Unplug the network cable
What is the expected output? What do you see instead?
Uninterrupted execution
At the end of execution there's this error:
Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(281).....: MPI_Finalize failed
MPI_Finalize(209).....:
MPID_Finalize(131)....:
MPIDI_PG_Finalize(106): PMI_Finalize failed, error -1
What version of the product are you using? On what operating system?
adda_mpi: 1.1a; Win32 SP3 + latest patches; 2-core CPU
Please provide any additional information below.
This seems to happen whenever the network configuration changes, e.g., when the network cable is plugged in or unplugged. The error does not appear immediately, but only at the end of adda_mpi execution. However, I'm running adda_mpi on my local machine alone. I noticed the same behavior with nearfield, but after I changed the host definition to localhost (127.0.0.1), the error stopped. Nevertheless, I still see this error with adda_mpi (v.1.1a).
Comment #1
Posted on Feb 13, 2011 by Happy Dog
I guess the error occurs at the end of ADDA execution, and all the simulation results are produced normally. Is that so? If so, the error happens when MPI_Finalize is called immediately before exit.
I have tried to reproduce your problem using MPICH2 1.3.1, but could not get any errors. I ran the following command several times:
mpiexec -machinefile mf -n 2 adda_mpi -grid 64 -m 1.2 0
I tried two versions of the file mf: "localhost:2" and "127.0.0.1:2". I unplugged the network cable between the midpoint and the end of the simulation.
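For anyone who wants to repeat this test, the two machinefile variants and the command line can be generated with a small script. This is only a sketch: the helper names `write_machinefile` and `repro_command` are made up for illustration, and the script just builds the command without launching mpiexec.

```python
import tempfile
from pathlib import Path


def write_machinefile(path, host, slots):
    """Write a one-line machinefile of the form 'host:slots' (as in the test above)."""
    target = Path(path)
    target.write_text(f"{host}:{slots}\n")
    return target


def repro_command(machinefile, nprocs=2):
    """Build (but do not run) the mpiexec command line used in the test above."""
    return ["mpiexec", "-machinefile", str(machinefile), "-n", str(nprocs),
            "adda_mpi", "-grid", "64", "-m", "1.2", "0"]


if __name__ == "__main__":
    # Write each variant into a throwaway directory and show the resulting command.
    with tempfile.TemporaryDirectory() as tmp:
        for host in ("localhost", "127.0.0.1"):
            mf = write_machinefile(Path(tmp) / "mf", host, 2)
            print(mf.read_text().strip(), "->", " ".join(repro_command(mf)))
```

The network cable still has to be unplugged by hand while the simulation is running; the script does not attempt to automate that step.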
Overall, this looks like an error in the MPI implementation (MPICH2), which should ignore network problems when the network is not actually needed. I have noticed before that MPICH2 does use the network even when running locally. For example, the Windows firewall produces a warning that mpiexec is trying to access the network. I am not sure whether this is a bug or not, but it may at least explain the errors.
The problem may be sensitive to the particular version of MPICH2, so I attach the latest adda_mpi, compiled and linked against MPICH2 1.3.1. I also think that what matters most is the version of MPICH2 installed on the machine where the program is run.
Actually, when I run mpiexec locally on my laptop, I do not use a machinefile at all. So you may try running
mpiexec -n 2 adda_mpi ...
to see if there is any difference. Another option is
mpiexec -localonly 2 adda_mpi ...
which should force MPICH2 to use only local resources.
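To compare the launch modes side by side, the three command lines mentioned in this comment can be generated from one set of ADDA arguments. This is a rough sketch: `launch_variants` is a hypothetical helper, the ADDA arguments are whatever you normally pass (the example reuses the ones from my test only for illustration), and nothing is executed.

```python
def launch_variants(nprocs, adda_args, machinefile="mf"):
    """Return the three mpiexec command lines discussed above, keyed by launch mode.

    adda_args is supplied by the caller; the default machinefile name 'mf' is
    just the name used in the earlier test.
    """
    adda = ["adda_mpi", *adda_args]
    n = str(nprocs)
    return {
        "machinefile": ["mpiexec", "-machinefile", machinefile, "-n", n, *adda],
        "no_machinefile": ["mpiexec", "-n", n, *adda],
        "localonly": ["mpiexec", "-localonly", n, *adda],
    }


if __name__ == "__main__":
    # Example arguments taken from the reproduction attempt; substitute your own.
    for mode, cmd in launch_variants(2, ["-grid", "64", "-m", "1.2", "0"]).items():
        print(f"{mode}: {' '.join(cmd)}")
```

Running each variant in turn, with the cable unplugged partway through, would show whether the MPI_Finalize error depends on how the processes are launched.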
- adda_mpi.exe 393.82KB
Comment #2
Posted on Jun 10, 2011 by Happy Dog
We could not reproduce this issue, so there seems to be nothing to fix.
Status: WontFix
Labels:
Type-Defect
Priority-Low
Component-UI
MPI