Debugging and Profiling
Arm-Forge
On ruche, you can use Arm Forge (Ultimate edition) to debug MPI/OpenMP or CUDA codes with DDT, profile a code with MAP, or analyze it with Arm Performance Reports. User guides and tutorials are available at https://www.linaroforge.com/download-documentation. The most convenient way to use Arm Forge on ruche is through the Forge remote client and the reverse connect mechanism, as explained below.
Prerequisite
Install the Forge remote client on your laptop/workstation: download version 24.0.1, launch it and enter the following configuration for ruche:
- Connection name: ruche
- Host name: login@ruche.mesocentre.universite-paris-saclay.fr (where login is your login on ruche)
- Remote installation directory: /gpfs/softs/debuggers_and_profilers/arm/forge/24.0.1
Click on Ok and Close.
Reverse connect mechanism
1/ On ruche, load the Arm Forge environment module, arm-forge/24.0.1/oneapi-2023.2.1, together with the modules needed to compile your code.
2/ Compile the code with the -O0 and -g options (no optimization, debug symbols).
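For example, a minimal sketch for an MPI code written in C, assuming a source file example.c (the file name is only an illustration) and the Intel MPI compiler wrapper mpiicc provided by the modules used in the script below:
$ module load intel/19.0.3/gcc-4.8.5 intel-mpi/2019.3.199/intel-19.0.3.199
$ mpiicc -O0 -g -o example.exe example.c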
3/ Write a SLURM script that runs the ddt or map command with the --connect option. Here is an example of debugging an MPI application example.exe with ddt:
$ cat ruche_submit.sh
#!/bin/bash
# SBATCH directives
#SBATCH --job-name=cpujob
#SBATCH --output=%x.o%j
#SBATCH --time=01:00:00
#SBATCH --ntasks=4
# the number of licenses should be equal to ntasks
#SBATCH --licenses=arm-forge:4
#SBATCH --partition=cpu_short
# Clean the environment and load the same modules as at compile time
module purge
module load intel/19.0.3/gcc-4.8.5
module load intel-mpi/2019.3.199/intel-19.0.3.199
module load arm-forge/24.0.1/oneapi-2023.2.1
# Application to run, located in the submission directory
application="$SLURM_SUBMIT_DIR/example.exe"
# Work directory (i.e. where the job will run)
workdir="$SLURM_SUBMIT_DIR"
# execution with 'ntasks' MPI processes
ddt --connect --mem-debug=balanced srun -n $SLURM_NTASKS $application
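To profile the same application with MAP instead of debugging it with DDT, the last line of the script can be replaced as in the following sketch (MAP uses the same --connect reverse connect mechanism; the --mem-debug option is DDT-specific and is dropped):
map --connect srun -n $SLURM_NTASKS $application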
4/ Submit this SLURM script using the sbatch command.
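For example, with the script name used above:
$ sbatch ruche_submit.sh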
5/ Once the code is running on ruche, launch the remote client on your laptop. In the "remote launch" section, select ruche. A new window appears and reads
"Connecting to login@ruche.mesocentre.universite-paris-saclay.fr ..." ... login@ruche.mesocentre.universite-paris-saclay.fr's password: "
Enter the password of your ruche account.
Note that we recommend not using a gateway (configured in your ~/.ssh/config file, for instance) to access ruche: a gateway may interfere with the reverse connect mechanism of the remote client.
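For reference, a direct (gateway-free) entry in ~/.ssh/config could look like the following sketch, where login is your ruche login and no ProxyJump or ProxyCommand line is present:
Host ruche.mesocentre.universite-paris-saclay.fr
    User login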
A new window appears and reads:
"A new Reverse Connect request is available from nodeXXX for Arm DDT. Command Line: --connect --mem-debug=balanced srun -n 4 /gpfs/users/login/example.exe Do you want to accept this request?"
Accept and click on Run to debug your code.
6/ When you have finished debugging, click on "End Session" in the File menu and quit the Forge client.
Type squeue on ruche and check that your job has finished.
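For example, to list only your own jobs:
$ squeue -u $USER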
Intel Parallel Studio (Cluster Edition)
Intel Parallel Studio includes the following profiling tools:
- Intel® Application Performance Snapshot (APS)
- Intel® VTune™ Profiler
- Intel® Advisor
- Intel® Inspector
- Intel® Trace Analyzer and Collector
Versions
$ module avail intel-parallel-studio
-------------------- /gpfs/softs/modules/modulefiles/tools ---------------------
intel-parallel-studio/cluster.2019.3/intel-19.0.3.199
intel-parallel-studio/cluster.2020.2/intel-20.0.2
Intel® Advisor
This tool provides:
- Vectorization and Code Insights
- CPU / Memory Roofline Insights and GPU Roofline Insights
- Offload Modeling
- Threading
Help command:
$ module load intel-parallel-studio/cluster.2020.2/intel-20.0.2
$ advixe-cl --help
Intel(R) Advisor Command Line Tool
Copyright (C) 2009-2020 Intel Corporation. All rights reserved.
Usage: advixe-cl <--action> [--action-option] [--global-option] [[--]
<target> [target options]]
<action> is one of the following:
collect Run the specified analysis and collect data. Tip: Specify the search-dir when collecting data.
command Issue a command to a running collection.
(...)
Example for SIMD vectorization
See SIMD training course at IDRIS, "Support de cours", part "Outils" and "Travaux pratiques du cours" (hands-on), tp3 with hydro code.
1/ Load the latest version of the intel-parallel-studio module:
module load intel-parallel-studio/cluster.2020.2/intel-20.0.2
2/ Compile your code with the Intel compiler and the -g option.
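For example, a minimal sketch assuming the hydro code is built from C sources with the Intel compiler (the file names are illustrative; prefer the build system shipped with the hands-on material if there is one):
$ icc -g -O2 -o hydro *.c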
3/ Collect data
Script example (adapted from "SIMD training course" at IDRIS):
$ cat adv.slurm
#!/bin/bash
#SBATCH --job-name=hydroadvice
#SBATCH --ntasks=1
#SBATCH --time=00:30:00
#SBATCH --output=hydroadvice.%j.out
#SBATCH --error=hydroadvice.%j.out
#SBATCH --partition=cpu_short
module load intel-parallel-studio/cluster.2020.2/intel-20.0.2
# Survey analysis: find the time-consuming loops and their vectorization status
advixe-cl -collect survey -interval=10 -data-limit=1024000 -project-dir ./Adv/ -- ./hydro input_sedov_noio_10000x10000.nml
# Trip counts and FLOP analysis, which complements the survey data (e.g. for the roofline)
advixe-cl -collect tripcounts -flop -interval=10 -data-limit=1024000 -project-dir ./Adv/ -- ./hydro input_sedov_noio_10000x10000.nml
Submit this SLURM script using the sbatch command.
4/ When the job is finished, use the ruche visualization interface to launch advixe-gui and inspect the results.
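For example, the project directory created by the script above can be opened directly in the GUI, or a text report can be printed from the command line (a sketch assuming the ./Adv/ project directory used above):
$ advixe-gui ./Adv/ &
$ advixe-cl -report survey -project-dir ./Adv/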
Maqao
MAQAO (Modular Assembly Quality Analyzer and Optimizer) is a performance analysis and optimization framework operating at the binary level, with a focus on core performance. Its main goal is to guide application developers through the optimization process with synthetic reports and hints.
Versions
$ module avail maqao
--------------------- /gpfs/softs/modules/modulefiles/debuggers_and_profilers ----------------------
maqao/2.14.1/gcc-9.2.0
Help command:
$ module load maqao/2.14.1/gcc-9.2.0
$ maqao --help
Synopsis:
maqao <command>|<script.lua> [...]
Description:
MAQAO (Modular Assembly Quality Analyzer and Optimizer) is a tool for application performance analysis
that deals directly with the target code's binary file.
The current version handles the following architectures:
- x86_64
(...)
Example for SIMD vectorization
See SIMD training course at IDRIS, "Support de cours", part "Outils" and "Travaux pratiques du cours" (hands-on), tp3 with hydro code.
1/ Load the latest version of the intel-parallel-studio module:
module load intel-parallel-studio/cluster.2020.2/intel-20.0.2
2/ Compile your code with the Intel compiler and the -g option (as in the Intel Advisor example above).
3/ Write and submit SLURM script
Script example (adapted from "SIMD training course" at IDRIS):
$ cat maqao.slurm
#!/bin/bash
#SBATCH --job-name=hydromaqao
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --time=00:30:00
#SBATCH --output=hydromaqao.%j.out
#SBATCH --error=hydromaqao.%j.out
#SBATCH --partition=cpu_short
module load intel-parallel-studio/cluster.2020.2/intel-20.0.2
module load maqao/2.14.1/gcc-9.2.0
# Plain run (without MAQAO), kept for reference:
# ./hydro input_sedov_noio_10000x10000.nml
# Profile the execution with MAQAO LProf (lightweight profiler)
maqao lprof -- ./hydro input_sedov_noio_10000x10000.nml
Submit this SLURM script using the sbatch command.