stencil-3d-7pts
This example evaluates the performance of a 7-point stencil. The kernel records
the start and end of spmv using the tsc counter. In addition, the tsc counters
of all PEs are not synchronized at the beginning. To avoid timing variation
among the PEs, sync() synchronizes all PEs and samples the reference clock.
The kernel kernel.csl defines three host-callable functions, f_sync(), f_tic()
and f_toc(), in order to synchronize the PEs and record the timing of spmv.
The kernel allreduce/pe.csl performs a reduction over the whole rectangle to
synchronize the PEs; the bottom-right PE then sends a signal to the other PEs
to sample the reference clock.
The kernel stencil_3d_7pts/pe.csl performs a matrix-vector product (spmv)
where the matrix has 7 diagonals corresponding to the 7-point stencil. The
stencil coefficients can vary per PE, but must be the same for every element of
a PE's local vector. The user can change the coefficients based on the boundary
condition or a curvilinear coordinate transformation.
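As a host-side reference for what the spmv computes, the following is a minimal NumPy sketch, not the CSL kernel itself. It assumes an (h, w, z) grid in which each PE (i, j) owns a local vector of length z, one set of 7 coefficients per PE, and zero values outside the grid. The function name stencil_3d_7pts, the array layout, and the ordering of the 7 coefficients are hypothetical choices made for this illustration.

```python
import numpy as np

def stencil_3d_7pts(x, coeff):
    """Reference 7-point stencil product on an (h, w, z) grid.

    x     : shape (h, w, z); x[i, j, :] is the local vector of PE (i, j).
    coeff : shape (h, w, 7); per-PE coefficients in the (hypothetical) order
            center, i-1, i+1, j-1, j+1, k-1, k+1.
    Points outside the grid are treated as zero (an assumed boundary condition).
    """
    xp = np.pad(x, 1)             # zero halo of width 1 on all three axes
    c = coeff[:, :, :, None]      # broadcast each coefficient along the local vector
    y = c[:, :, 0] * xp[1:-1, 1:-1, 1:-1]    # center
    y = y + c[:, :, 1] * xp[:-2, 1:-1, 1:-1]  # neighbor at i-1
    y = y + c[:, :, 2] * xp[2:, 1:-1, 1:-1]   # neighbor at i+1
    y = y + c[:, :, 3] * xp[1:-1, :-2, 1:-1]  # neighbor at j-1
    y = y + c[:, :, 4] * xp[1:-1, 2:, 1:-1]   # neighbor at j+1
    y = y + c[:, :, 5] * xp[1:-1, 1:-1, :-2]  # neighbor at k-1 (same PE)
    y = y + c[:, :, 6] * xp[1:-1, 1:-1, 2:]   # neighbor at k+1 (same PE)
    return y
```

Because coeff has no z dimension, the same 7 values are applied to every element of a PE's local vector, which mirrors the constraint described above.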
The script run.py has the following parameters:
-k=<int> specifies the maximum size of local vector.
--zDim=<int> specifies how many elements per PE are computed.
--channels=<int> specifies the number of I/O channels, no bigger than 16.
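The constraints on these parameters can be illustrated with a small argparse sketch. This is not the actual parsing code of run.py; the function name parse_args, the default for --channels, and the check that zDim does not exceed k (inferred from k being the maximum local vector size) are assumptions made for the example.

```python
import argparse

def parse_args(argv=None):
    """Parse the three documented options and check their documented limits."""
    parser = argparse.ArgumentParser(description="stencil-3d-7pts options (sketch)")
    parser.add_argument("-k", type=int, required=True,
                        help="maximum size of local vector")
    parser.add_argument("--zDim", type=int, required=True,
                        help="number of elements per PE that are computed")
    parser.add_argument("--channels", type=int, default=1,  # default is assumed
                        help="number of I/O channels, no bigger than 16")
    args = parser.parse_args(argv)
    if not 1 <= args.channels <= 16:
        parser.error("--channels must be between 1 and 16")
    if args.zDim > args.k:  # assumed relation between zDim and k
        parser.error("--zDim must not exceed -k")
    return args
```

For example, parse_args(["-k=10", "--zDim=5", "--channels=4"]) returns a namespace with k=10, zDim=5 and channels=4, while --channels=17 exits with an error.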
tic() samples “time_start” and toc() samples “time_end”. sync() samples
“time_ref”, which is used to adjust “time_start” and “time_end”. The elapsed
time (unit: cycles) is measured by
cycles_send = max(time_end) - min(time_start)
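The overall runtime (us) is computed via the following formula
time_send = (cycles_send / 0.85) * 1.e-3 us
Dividing by 0.85 converts cycles to nanoseconds, which corresponds to a clock
frequency of 0.85 GHz, and the factor 1.e-3 converts nanoseconds to microseconds.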
The bandwidth is calculated by
bandwidth = ((6*w*h*4)/time_send)
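The three formulas can be combined into a short post-processing sketch. It assumes that the per-PE samples are available on the host as arrays, that the adjustment by “time_ref” is a plain per-PE subtraction (the actual correction in run.py may differ), and that w and h are the width and height of the PE rectangle. The function name summarize_timing is hypothetical.

```python
import numpy as np

def summarize_timing(time_start, time_end, time_ref, w, h):
    """Return (cycles_send, time_send in us, bandwidth) from per-PE samples."""
    ref = np.asarray(time_ref, dtype=np.int64)
    # Assumption: align each PE's samples by subtracting its reference sample.
    start = np.asarray(time_start, dtype=np.int64) - ref
    end = np.asarray(time_end, dtype=np.int64) - ref
    cycles_send = int(end.max() - start.min())
    time_send = (cycles_send / 0.85) * 1.0e-3        # microseconds
    bandwidth = (6 * w * h * 4) / time_send          # same expression as above
    return cycles_send, time_send, bandwidth
```

With every PE on a 2-by-2 rectangle reporting time_start = 100, time_end = 950 and time_ref = 0, the elapsed time is 850 cycles, which the formula converts to about 1 us.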