Cell Simulator

With the PPU outputting "Hello World", it is time to incorporate the SPUs into the program. (Note: if you haven't done the previous tutorial, Configuring and writing the first Cell app, do that one first.)

For ease of use, create hello.h in the main helloworld directory:
(MAMBO)/Code/helloword/hello.h:
#ifndef __hello_h__
#define __hello_h__

/* This union helps clarify calling parameters between the PPE and the SPE. */
typedef union{
    unsigned long long ull;
    unsigned int ui[2];
}addr64;

#endif

The reason for this header file will be explained later.

Create hello_spu.c in the (MAMBO)/Code/helloword/spu/ directory, then edit the Makefile in that directory to reflect the changes.

Makefile:
# --------------------------------------------------------------
# (C)Copyright 2001,2006,
# International Business Machines Corporation,
# Sony Computer Entertainment, Incorporated,
# Toshiba Corporation,
#
# All Rights Reserved.
# --------------------------------------------------------------
# PROLOG END TAG zYx

########################################################################
# Target
########################################################################

PROGRAMS_spu := hello_spu
LIBRARY_embed64 := hello_spu.a


########################################################################
# Local Defines
########################################################################

IMPORTS = $(SDKLIB_spu)/libc.a


########################################################################
# make.footer
########################################################################

include /opt/IBM/cell-sdk-1.1/make.footer

For hello_spu.c:
#include "../hello.h"
#include <stdio.h>

int main(unsigned long long speid, addr64 argp, addr64 envp) {
    /* Runs on every SPU that is started; the parameters are unused for now. */
    printf("Hello world, from a SPU!\n");

    return 0;
}

Now, a simple 'make' in the (MAMBO)/Code/helloword/spu/ directory should yield:
$ make
/opt/sce/toolchain-3.2/spu/bin/spu-gcc -W -Wall -Winline -Wno-main -I. -I /opt/IBM/cell-sdk-1.1/sysroot/usr/spu/include -include spu_intrinsics.h -O3 -c hello_spu.c
hello_spu.c: In function 'main':
hello_spu.c:4: warning: unused parameter 'speid'
hello_spu.c:4: warning: unused parameter 'argp'
hello_spu.c:4: warning: unused parameter 'envp'
/opt/sce/toolchain-3.2/spu/bin/spu-gcc -o hello_spu hello_spu.o -Wl,-N /opt/IBM/cell-sdk-1.1/sysroot/usr/spu/lib/libc.a
/opt/sce/toolchain-3.2/ppu/bin/ppu-embedspu -m64 hello_spu hello_spu hello_spu-embed64.o
/opt/sce/toolchain-3.2/ppu/bin/ppu-ar -qcs hello_spu.a hello_spu-embed64.o

Now, back in the (MAMBO)/Code/helloword/ppu/ directory, we need to change hello.c to actually call the SPU's hello_spu program. (Note: if you compiled the PPU code right now, it would not contain the SPU's program.) We are creating a CESOF (CBEA Embedded SPE Object Format) executable: a single executable that contains the programs for both the PPU and the SPU. When it runs, the PPU program loads the embedded SPU program onto each SPU for execution. So, without further ado, here is the new hello.c for the PPU:

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <libspe.h>
#include <sched.h>

extern spe_program_handle_t hello_spu;
spe_gid_t gid;
speid_t speids[8];
int status[8];

int main(int argc, char *argv[]){
    int i;

    printf("Hello World!\n");

    gid = spe_create_group (SCHED_OTHER, 0, 1);
    if (gid == NULL) {
        fprintf(stderr, "Failed spe_create_group(errno=%d)\n", errno);
        return -1;
    }

    if (spe_group_max (gid) < 8) {
        fprintf(stderr, "System doesn't have eight working SPEs. I'm leaving.\n");
        return -1;
    }

    for (i = 0; i < 8; i++) {
        speids[i] = spe_create_thread (gid, &hello_spu,
            NULL, NULL, -1, 0);
        if (speids[i] == NULL) {
            fprintf (stderr, "FAILED: spe_create_thread(num=%d, errno=%d)\n",
                i, errno);
            exit (3+i);
        }
    }

    for (i=0; i<8; ++i){
        spe_wait(speids[i], &status[i], 0);
    }

    __asm__ __volatile__ ("sync" : : : "memory");

    return 0;
}

For those programmers used to pthreads, this code should look similar. One important thing to remember is that this code runs on the PPU, not the SPU. Now, let's dissect the program.

To start, 'extern spe_program_handle_t hello_spu;' identifies the program we want to run on the SPU. (Notice that the Makefile generated hello_spu.a in the (MAMBO)/Code/helloword/spu/ directory: ppu-embedspu defines the hello_spu symbol in hello_spu-embed64.o, which is then archived into hello_spu.a.) 'spe_gid_t gid;' identifies the group the SPU threads belong to. For this tutorial, that is a single general group containing all the SPU threads. Next, 'speid_t speids[8];' holds an identifier for each of the 8 SPU threads; later calls such as spe_wait use these IDs to refer to a particular thread. 'int status[8];' holds the return status each SPU thread reports when it completes. With the data structures in place, next come the operations on them.

The 'spe_create_group(SCHED_OTHER, 0, 1)' call creates a simple group, and a check immediately follows to ensure the group was created successfully. Do these checks! You are working under a simulator that isn't completely bug-free yet, and if something goes wrong you will want to find it as quickly as possible. After that, 'spe_group_max (gid)' checks that 8 SPUs are available. If I remember correctly, Sony announced that each processor will have 8 SPUs but guarantees only 7 working ones, so this check is mainly for the simulator. In a "real" program, you may want to check for such constraints and adapt accordingly. But I digress.

Next, the program loops from 0 to 7, creating each thread with 'spe_create_thread (gid, &hello_spu, NULL, NULL, -1, 0)'. The important parameters for now are gid and the handle hello_spu, which together start a thread running the SPU program within the group. The two NULLs mean that no arguments (argp and envp) are passed to the SPU program, the -1 affinity mask lets the thread run on any available SPU, and 0 requests the default flags. Naturally, a check that the thread was created successfully follows.

With the threads running, the PPU waits for each one to complete with 'spe_wait(speids[i], &status[i], 0);'. As with pthread_join, the PPU blocks until each SPU thread completes and returns. Finally, the PowerPC 'sync' instruction is issued through inline assembly. It acts as a memory barrier, ensuring that all preceding memory operations have completed.

Again, we need to update the Makefile, this time the one in the ppu directory, to compile the new PPU code:
# --------------------------------------------------------------
# (C)Copyright 2001,2006,
# International Business Machines Corporation,
# Sony Computer Entertainment, Incorporated,
# Toshiba Corporation,
#
# All Rights Reserved.
# --------------------------------------------------------------
# PROLOG END TAG zYx

########################################################################
# Target
########################################################################

PROGRAM_ppu64 = hello


########################################################################
# Local Defines
########################################################################

IMPORTS = ../spu/hello_spu.a -lspe


########################################################################
# make.footer
########################################################################

include /opt/IBM/cell-sdk-1.1/make.footer
A simple make should yield something like:
$ make
/opt/sce/toolchain-3.2/ppu/bin/ppu-gcc -W -Wall -Winline -I. -I /opt/IBM/cell-sdk-1.1/sysroot/usr/include -O3 -c hello.c
hello.c: In function 'main':
hello.c:12: warning: unused parameter 'argc'
hello.c:12: warning: unused parameter 'argv'
/opt/sce/toolchain-3.2/ppu/bin/ppu-gcc -o hello hello.o -L/opt/IBM/cell-sdk-1.1/sysroot/usr/lib64 -R/opt/IBM/cell-sdk-1.1/sysroot/usr/lib ../spu/hello_spu.a -lspe
Now copy 'hello' to the host's /tmp/ directory, then pull it into the simulator and run it. It should look like this:
[root@(none) ~]# callthru source /tmp/hello > hello && chmod +x hello && ./hello
Hello World!
Hello world, from a SPU!
Hello world, from a SPU!
Hello world, from a SPU!
Hello world, from a SPU!
Hello world, from a SPU!
Hello world, from a SPU!
Hello world, from a SPU!
Hello world, from a SPU!
[root@(none) ~]#
For completeness, here is the Makefile in the (MAMBO)/Code/helloword/ directory:
# --------------------------------------------------------------
# (C)Copyright 2001,2006,
# International Business Machines Corporation,
# Sony Computer Entertainment, Incorporated,
# Toshiba Corporation,
#
# All Rights Reserved.
# --------------------------------------------------------------
# PROLOG END TAG zYx

########################################################################
# Target
########################################################################

DIRS = spu ppu

########################################################################
# make.footer
########################################################################

include /opt/IBM/cell-sdk-1.1/make.footer
(You should now be able to type 'make' in the (MAMBO)/Code/helloword/ directory to compile or update all the executables.)

This tutorial was the first to get SPUs working. The rest of these tutorials assume that you can set up the compiling environment, compile, move and run executables, and set up very simple threads (as discussed in this tutorial). On a final note, the directory structure used here, with a general .h file in a "root" directory and separate directories for the PPU and SPU code, is carried throughout these tutorials and their code. You can deviate and put everything in one directory, but that can get messy.

As always, here is the code: Helloworld