.. _working-with-code-samples:

Working With Code Samples
=========================

The :ref:`csl-examples` section contains CSL programs, the ``.csl`` files, each
of which either demonstrates an individual feature of the language or solves a
larger application problem. Each program is accompanied by a Python script, the
``run.py`` file, that runs the program with the simulator.

The source for these code samples can be found inside the
``csl-extras-{build id}.tar.gz`` tarball within the release, or at the
`CSL examples GitHub repository `_ (request access from developer@cerebras.net).

For the GEMV tutorial code samples, we additionally provide step-by-step
explanations of the code in the :ref:`csl-tutorials` section.

.. warning::
   If you're just getting started, we recommend walking through the
   step-by-step tutorials in :ref:`csl-tutorials` to get a fuller explanation
   of these programs.

Compiling the code samples
--------------------------

Each code sample contains a CSL file as the top-level source file, typically
named ``layout.csl``. This file may reference additional CSL source files in
that directory. Each code sample also contains a ``commands.sh`` script with
the commands required to compile and run the example. For example,
``tutorials/gemv-01-complete-program/commands.sh`` from
:ref:`sdkruntime-gemv-01-complete-program` contains:

.. code-block:: bash

   cslc ./layout.csl --fabric-dims=8,3 \
     --fabric-offsets=4,1 -o out --memcpy --channels 1
   cs_python run.py --name out

.. seealso::
   See :ref:`csl-compiler` for the compiler options documentation.

To compile the program:

1. First, ``cd`` into the directory that contains the CSL files.
2. Then, run the ``cslc`` command shown in the ``commands.sh`` file to compile
   the program. Note that this command may span multiple lines.

Running this command produces files with the ``.elf`` extension. For example:

.. code-block:: bash

   $ cd tutorials/gemv-01-complete-program/
   $ cslc ./layout.csl --fabric-dims=8,3 --fabric-offsets=4,1 -o out --memcpy --channels 1
   $ ls out
   bin  east  generated  out.json  west
   $ ls out/bin
   out_0_0.elf  out_rpc.json

Running the program
-------------------

Use the ``run.py`` Python script in the code sample directory to execute the
compiled program. For example, to run the above compiled program, execute the
following command in the ``gemv-01-complete-program`` directory:

.. code-block:: bash

   $ cs_python run.py --name out

If the program runs correctly, you will see the message ``SUCCESS!`` near the
end of the output.

Debugging
---------

See :ref:`debugging-guide` and :ref:`sdk-gui`.

.. # This needs to be rewritten for SdkRuntime

Run script
----------

The run script ``run.py`` performs the following five tasks:

1. Specifies the ELF files

   This is done by passing a list of strings, each of which is an ELF
   filename, to the ``CSELFRunner`` class. This is often accomplished by using
   Python's ``glob`` function to find all the ELF files in the output
   directory:

   .. code-block:: python

      elfs = glob(f"{output_dir}/bin/out_*.elf")

      # This 'runner' will run the simulator.
      runner = CSELFRunner(elfs)

   See also :ref:`elf-api-reference`.

2. Specifies the I/O tensors

   The ``CSELFRunner`` class exports two methods, ``add_input_tensor()`` and
   ``add_output_tensor()``, to send and receive tensors to/from the host. They
   have the following declarations:

   .. code-block:: python

      add_input_tensor(color_value, port_map, tensor)
      add_output_tensor(color_value, port_map, data_type)

   where ``color_value`` is the color along which you want to send or receive
   the tensor, ``port_map`` (described below) indicates the mapping of tensor
   indices to PEs, ``tensor`` is the tensor you wish to send to the chip, and
   ``data_type`` indicates how to interpret the bytes of the output tensor
   (for example, ``numpy.float16``).
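   The port map argument is an ISL-style string, explained in detail below.
   Since these strings follow a regular pattern, the common single-index form
   can be built with a small helper. The following is a hypothetical sketch,
   not part of the SDK:

   .. code-block:: python

      # Hypothetical helper (not part of the SDK): build the ISL-style port
      # map string that add_input_tensor() and add_output_tensor() expect.
      def make_port_map(tensor_name, max_idx, col, row, idx_var="idx"):
          return (f"{{{tensor_name}[{idx_var}=0:{max_idx}] "
                  f"-> [PE[{col},{row}] -> index[{idx_var}]]}}")

      # For example, a 10-element tensor received by PE 3,0 from the NORTH:
      print(make_port_map("in_tensor", 9, 3, -1))
      # {in_tensor[idx=0:9] -> [PE[3,-1] -> index[idx]]}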
   The port map is a mapping specified using notation from the
   `Integer Set Library `_, and it usually looks like the following:

   .. code-block:: csl

      {<tensor_name>[<idx_var>=0:<max_idx>] -> [PE[<col>,<row>] -> index[<idx_var>]]}

   In the above map, the ``tensor_name`` and ``idx_var`` fields may contain
   any alphanumeric characters or ``_``, but may not start with a number. The
   ``max_idx`` field is the maximum index of the input or output tensor.
   Indices start at 0, so if the tensor has 10 elements, the maximum index
   is 9.

   The ``col`` and ``row`` fields indicate the PE(s) that should send or
   receive the elements of the tensor and the direction from which the tensor
   will arrive. Depending on the direction of the input tensor, you add or
   subtract 1 from the appropriate coordinate. Specifically:

   1. For tensors arriving on the EAST link, add 1 to the ``col`` coordinate.
   2. For tensors arriving on the WEST link, subtract 1 from the ``col``
      coordinate.
   3. For tensors arriving on the NORTH link, subtract 1 from the ``row``
      coordinate.
   4. For tensors arriving on the SOUTH link, add 1 to the ``row`` coordinate.

   For example, if PE 3,0 is to receive a tensor of size 10 from the NORTH
   direction, use the following map:

   .. code-block:: python

      "{in_tensor[idx=0:9] -> [PE[3,-1] -> index[idx]]}"

   The index variable can also be used for the ``col`` and ``row`` values,
   thus allowing you to distribute a tensor along the **edge** of the
   rectangle. Moreover, the map permits the use of multiple index variables as
   well. For example, if you wish to gather 10 elements from each of 20 PEs
   located along the WEST edge of the rectangle, then use the following map:

   .. code-block:: python

      "{input[peIdx=0:19, valIdx=0:9] -> [PE[-1,peIdx] -> index[peIdx, valIdx]]}"

   .. note::
      Only PEs along the edge of the rectangle can send/receive tensors
      to/from the host.

3. Executes the program

   The ``CSELFRunner`` class exports the function ``connect_and_run()`` to
   execute the program.
   The function accepts an optional string argument pointing to a filename
   that will be used to save the memory contents at the end of execution. This
   output is often useful for debugging.

4. Optionally reads the output tensor(s)

   The output tensors produced by the program are saved in the
   ``out_tensor_dict`` dictionary of the ``CSELFRunner`` object. To retrieve a
   tensor, use the tensor name from the port map as a key into the
   ``out_tensor_dict`` dictionary. For example:

   .. code-block:: python

      runner = CSELFRunner(elfs)

      # PE 4,0 sends one 32-bit unsigned int to the North. We capture this as
      # an output tensor called "my_out_tensor".
      out_port_map = "{my_out_tensor[idx=0:0] -> [PE[4,-1] -> index[idx]]}"
      runner.add_output_tensor(out_color, out_port_map, np.uint32)

      runner.connect_and_run()
      value = runner.out_tensor_dict['my_out_tensor']

5. Optionally reads the memory output

   The ``CSELFRunner`` class provides a pair of methods to read memory
   contents *after* calling ``connect_and_run()``: ``get_symbol`` and
   ``get_symbol_rect``. The ``get_symbol`` method reads the specified symbol
   on a single PE. The ``get_symbol_rect`` method reads the value of the
   specified symbol on a rectangle of PEs. For example, to read the value of
   the global variable ``x:u32`` from every PE:

   .. code-block:: python

      runner = CSELFRunner(elfs)
      runner.connect_and_run()

      offset = (1, 1)  # top-left corner of the PE rectangle
      width = 4
      height = 4
      rectangle = (offset, (width, height))
      xs = runner.get_symbol_rect(rectangle, "x", np.uint32)
      print(f"The value of 'x' on PE in column 1, row 2 is: {xs[1,2]}")

   The file containing the memory output of the execution can also be used to
   inspect the state of the PE memory at the end of the execution via the
   ``ELFLoader`` class. This can be used to inspect memory contents of past
   runs without re-running the simulation. For example, to read the value of
   the global variable ``x:u32`` from every PE:

   .. code-block:: python

      from cerebras.elf.cself import ELFLoader, ELFMemory

      ...

      runner = CSELFRunner(elfs)

      # Tell CSELFRunner to generate a dump of the memory contents at the end
      # of the run by specifying the name of an output file.
      runner.connect_and_run('memory-core.out')

      loader = ELFLoader(core_file='memory-core.out')
      for elf in elfs:
          loader.set_elf(elf)
          for col, row in ELFMemory(elf).iter_coordinates():
              x = loader.get_as_array_from(col, row, "x", np.uint32)
              print(f"value of x on PE {col},{row} is {x}")

Moving From Simulation To Hardware
----------------------------------

After successfully simulating your CSL program, you can run it on hardware by
following the guidelines below when using ``cslc``.

Pass ``--arch`` flag
~~~~~~~~~~~~~~~~~~~~

Use the ``--arch`` flag with ``cslc``. This ensures that the appropriate
Cerebras system is targeted. Allowed values are ``--arch=wse1`` for the WSE-1
architecture and ``--arch=wse2`` for the WSE-2 architecture. For example:

.. code-block:: bash

   cslc --arch=wse2 ./layout.csl --fabric-dims=8,3 \
     --fabric-offsets=4,1 -o out --memcpy --channels 1

Provide ``--fabric-dims``
~~~~~~~~~~~~~~~~~~~~~~~~~

When compiling for simulation with ``cslc``, the ``--fabric-dims`` flag can
specify any bounding box large enough to contain your program's PEs. However,
when compiling for hardware, these dimensions must match your Cerebras
system's fabric dimensions. For example:

.. code-block:: bash

   cslc --arch=wse2 ./layout.csl --fabric-dims=757,996 \
     --fabric-offsets=4,1 -o out --memcpy --channels 1

Provide IP address to SdkRuntime
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To run on the Cerebras system hardware, you must pass the IP and port address
of the network-attached Cerebras system to the ``cmaddr`` argument of the
``SdkRuntime`` constructor in your ``run.py``:

.. code-block:: python

   runner = SdkRuntime(compile_dir, cmaddr="1.2.3.4:9000")
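One way to keep a single ``run.py`` that works in both simulation and on
hardware is to read the system address from the environment and pass it
through as ``cmaddr``. The snippet below is a hypothetical sketch; the
``CS_IP_ADDR`` variable name is an assumption, not an SDK convention:

.. code-block:: python

   import os

   # Hypothetical convenience (not part of the SDK): return the "ip:port"
   # string for SdkRuntime's cmaddr argument, or None to use the simulator.
   def get_cmaddr(environ=os.environ):
       # e.g. CS_IP_ADDR="1.2.3.4:9000"
       return environ.get("CS_IP_ADDR")

   # runner = SdkRuntime(compile_dir, cmaddr=get_cmaddr())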