# BEE2: a multi-purpose computing platform for radio telescope DSP applications

Chen Chang, John Wawrzynek, Bob Brodersen, Dan Werthimer, Melvyn Wright

EECS, UC Berkeley; Space Science Laboratory, SETI Institute; Radio Astronomy Laboratory

# **Radio Astronomy Correlators**

Radio Astronomy Correlators and Data Rates



**UC Berkeley** 

#### Polarization, Mosaicing, and time multiplexing

- Polarization required for many astrophysical situations
- Single polarization requires time multiplexing
- Polarizer is frequency and polarization dependent
- Polarization observations are inefficient and difficult to schedule
- Mosaicing requires time multiplexing
- Polarization, mosaicing, and multiple spectral lines require multiple tracks
- Calibration of dual polarization is more robust
- Dual polarization improves sensitivity for spectral lines

# **Relative performance of CARMA**

| Telescope                                                | CARMA-15       | CARMA-23 | SMA       | ALMA      | ACA         | IRAM PdB |
|----------------------------------------------------------|----------------|----------|-----------|-----------|-------------|----------|
| nants                                                    | 6x10.4 + 9x6.1 | + 8x3.5  | 8x6.1     | 64x12     | 4x12 + 12x7 | 6x15     |
| collecting area $[m^2]$                                  | 773            | 850      | 230       | 7238      | 914         | 1060     |
| average diameter [m]                                     | 8.1            | 6.9      | 6.1       | 12        | 8.5         | 15       |
| nants x diameter                                         | 122            | 158      | 49        | 768       | 136         | 90       |
| number of baselines                                      | 105            | 253      | 28        | 2016      | 120         | 15       |
| max baseline / antenna diameter                          | 328            | 571      | 82        | 375       | 8           | 58       |
| polarizations                                            | 1              | 2        | 2         | 2         | 2           | 2        |
| continuum BW / pol [GHz]                                 | 4              | 8        | 2         | 8         | 8           | 4        |
| spectral windows /pol                                    | 8              | 8        | 6x4       | 8         | 8           | 8        |
| window bandwidths [MHz]                                  | 2 - 500        | 2 - 500  | 82        | 31 - 2000 | 31 - 2000   | 20 - 320 |
| spectral channels/window                                 | 64             | 64       | 64 - 2048 | 2048      | 8192        | 64 - 512 |
| total spectral channels/pol                              | 512            | 512      | 3072x2    | 16384     | 65536       | 4096     |
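
Two rows of the table follow directly from the antenna lists: the number of baselines is N(N-1)/2 for N antennas, and the collecting area is the sum of the dish areas. The sketch below recomputes both, assuming circular apertures and using the antenna counts and diameters from the table; the function and dictionary names are illustrative only.

```python
import math

def n_baselines(n_antennas):
    """Number of distinct antenna pairs in an array of n_antennas."""
    return n_antennas * (n_antennas - 1) // 2

def collecting_area(dishes):
    """Total geometric area in m^2 for a list of (count, diameter_m) entries.

    Assumes each dish is a filled circular aperture.
    """
    return sum(count * math.pi * (diameter / 2) ** 2 for count, diameter in dishes)

# Antenna counts and diameters taken from the table above.
ARRAYS = {
    "CARMA-15": [(6, 10.4), (9, 6.1)],
    "CARMA-23": [(6, 10.4), (9, 6.1), (8, 3.5)],
    "ALMA": [(64, 12.0)],
    "ACA": [(4, 12.0), (12, 7.0)],
    "IRAM PdB": [(6, 15.0)],
}

for name, dishes in ARRAYS.items():
    n_ant = sum(count for count, _ in dishes)
    print(name, n_ant, n_baselines(n_ant), round(collecting_area(dishes)))
```

Under these assumptions the computed baselines and areas agree with the tabulated values for the five arrays listed. The SMA entry is left out because the same formula gives about 234 m² rather than the tabulated 230.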

# **CARMA's role**

- Wide range of spatial scales
  - Heterogeneous array of 10.4, 6.1 and 3.5 m antennas
  - Antenna spacings from 4m to 2km
- Spatial frequencies sampled by interferometer
  - Multiple primary beams decouple source from primary beam illumination
- Calibration of single dish and heterogeneous array from overlap in spatial frequencies
- Excellent UV coverage using CARMA and SZA

## CARMA-23 UV coverage dec=-30



### CARMA-23 UV coverage dec=+30

I 230.0000 GHz



Dec 12, 2004

## CZ without 3.5~6.1m spacings




#### CZ + DZ including 3.5~6.1m spacings



#### **Fourier Transform of Saturn Images**




# **CARMA as R&D array**

- Accessible to instrumentalists and students
- Facilitate adding user instruments and software
- General-purpose DSP backend lowers the barrier to entry, giving a wider range of users access
  - Pulsar processing
  - RFI mitigation
  - Beamforming

#### **Berkeley Wireless Research Center**

UC Berkeley

A partnership of UC Berkeley researchers, industry, and government

- 15 Industrial Members:
  - Atmel Corporation
  - Cadence Design Systems
  - Conexant Systems
  - Ericsson Radio Systems
  - Hewlett Packard Company
  - Hitachi Ltd.
  - Infineon Technologies
  - Intel Corporation
  - NEC Corporation
  - Philips Research
  - Qualcomm Incorporated
  - Samsung Electronics
  - STMicroelectronics
  - Sun Microsystems
  - Xilinx Incorporated
- Other Funding:
  - DARPA, AFRL, NSF, ONR, MARCO, MURI, CA Energy Com.


- Operational since Feb. 1999
- 12,000 sq. ft. facility located in downtown Berkeley, CA
- 60 UCB EECS graduate students, 11 faculty





# **BEE1 System Overview**

**Analog Front-end** 



# **BEE1 system development time**

- System conception & specification: 2 FTE months
- PCB design (schematics/layout): 12 FTE months
- PCB fabrication
- Hardware system testing and characterization: 2 FTE months
- Software system development (CAD tools, user interface, Linux OS): 12 FTE months

#### **BEE1 application development time**

- 1 Mbps QPSK transceiver: 1 FTE month
- BCJR decoder: 3 FTE months
- IDCT MPEG encoder: 4 FTE months
- 4x4 SVD block: 6 FTE months

# **BEE2 system design philosophy**

- Compute-by-the-yard
  - Modular computing resource
  - Flexible interconnect architecture
  - On-demand reconfiguration of computing resources
- Economy-of-scale
  - Ride the semiconductor industry Moore's Law curve
  - All COTS components, no specialized hardware
  - Application software survives hardware generations through a technology-independent design flow

# **BEE2 Module: PCB board**

- Over 250 billion CMAC/s
- Up to 12.8 GBps memory bandwidth, with maximum 8 GB capacity
- 360 Gbps I/O bandwidth in 18 InfiniBand 4X connectors
- 14x17 inch, 22-layer PCB (FR4, 4/4 mils)
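
One reading that reproduces the 360 Gbps figure is sketched below. It assumes each 4X connector carries four lanes at the InfiniBand single-data-rate signaling rate of 2.5 Gbps per lane, and that the total counts both directions; neither assumption is stated on the slide.

```python
# Assumptions (not stated on the slide): SDR signaling at 2.5 Gbps per lane,
# four lanes per 4X connector, and both directions counted in the total.
LANE_GBPS = 2.5
LANES_PER_CONNECTOR = 4
DIRECTIONS = 2
CONNECTORS = 18  # from the slide

per_connector_one_way = LANE_GBPS * LANES_PER_CONNECTOR            # Gbps
total_io_gbps = CONNECTORS * per_connector_one_way * DIRECTIONS    # Gbps

print(per_connector_one_way, total_io_gbps)
```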



#### **Unified Digital Processing Architecture**



- Distributed per antenna spectral channel processing
- Multiple reconfigurable backend application processing
- Commercial packet switched interconnect
- Backend data pulling through remote DMA access
- Locally synchronous, globally asynchronous
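
The split between per-antenna channelization and a shared backend can be illustrated with a toy FX correlator. This is a minimal sketch, not the BEE2 implementation: a naive DFT stands in for the polyphase filter bank (PFB), everything runs in one process rather than across a packet-switched interconnect, and all names are hypothetical.

```python
import cmath
from itertools import combinations

def dft(samples):
    """Naive DFT; a stand-in for the per-antenna polyphase filter bank."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]

def fx_correlate(antenna_blocks):
    """Cross-correlate every antenna pair, channel by channel.

    antenna_blocks[a][b] is the b-th block of time samples from antenna a.
    Returns {(i, j): [accumulated visibility per channel]} for all i < j.
    """
    n_ant = len(antenna_blocks)
    n_chan = len(antenna_blocks[0][0])
    vis = {pair: [0j] * n_chan for pair in combinations(range(n_ant), 2)}
    for blocks in zip(*antenna_blocks):             # one time block from every antenna
        spectra = [dft(block) for block in blocks]  # F stage: independent per antenna
        for (i, j), acc in vis.items():             # X stage: cross-multiply-accumulate
            for k in range(n_chan):
                acc[k] += spectra[i][k] * spectra[j][k].conjugate()
    return vis

# Three antennas observing the same tone in channel 1; antenna 1 has a
# 90 degree phase offset relative to the others.
n = 8
tone = [cmath.exp(2j * cmath.pi * t / n) for t in range(n)]
shifted = [x * 1j for x in tone]
vis = fx_correlate([[tone, tone], [shifted, shifted], [tone, tone]])
```

The F stage touches only one antenna's data, so it can be distributed per antenna, while the X stage needs each channel from every antenna, which is the traffic the interconnect has to carry.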

# **BEE2 subsystem hardware cost**

- BEE2 module: ~\$20K
  - PCB: \$6.5K
  - DDR2 Memory: \$3.5K
  - FPGA: \$10K
  - Mechanical, etc: \$0.5K
- InfiniBand switch
  - 24-port IB: \$8K
  - 96-port IB: \$60K
  - 2 meter cable: \$107
  - 12 meter cable: \$333





#### **BEE2 correlator hardware cost table**

| Name             | COBRA | CARMA First Light<br>(15,1.5,1,96) | CARMA<br>(15,4,1,128) | CARMA<br>(15,4,1,256) | CARMA<br>(15,4,1,512) | CARMA<br>(23,4,2,256) | CARMA<br>(23,4,2,512) |
|------------------|-------|------------------------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|
| Antenna          | 6     | 15                                 | 15                    | 15                    | 15                    | 23                    | 23                    |
| Bandwidth        | 4     | 1.5                                | 4                     | 4                     | 4                     | 4                     | 4                     |
| Polarization     | 1     | 1                                  | 1                     | 1                     | 1                     | 2                     | 2                     |
| Baselines        | 15    | 105                                | 105                   | 105                   | 105                   | 253                   | 253                   |
| PFB ch/IF band   | 32    | 96                                 | 128                   | 256                   | 512                   | 256                   | 512                   |
| IF bands         | 8     | 3                                  | 8                     | 8                     | 8                     | 8                     | 8                     |
| FPGA (PFB)       | 1     | 2                                  | 4                     | 8                     | 15                    | 23                    | 46                    |
| FPGA (XMAC)      | 3     | 7                                  | 17                    | 17                    | 17                    | 81                    | 81                    |
| BEE2 (PFB)       | 1     | 1                                  | 1                     | 2                     | 4                     | 6                     | 12                    |
| BEE2 (XMAC)      | 1     | 2                                  | 5                     | 5                     | 5                     | 21                    | 21                    |
| Digitizer boards | 24    | 22.5                               | 60                    | 60                    | 60                    | 184                   | 184                   |
| IB cables        | 62    | 77                                 | 196                   | 204                   | 218                   | 738                   | 784                   |
| IB switches      | 1     | 2                                  | 4                     | 4                     | 5                     | 16                    | 18                    |
| \$ PFB (K)       | 20    | 20                                 | 20                    | 40                    | 80                    | 120                   | 240                   |
| \$ XMAC (K)      | 20    | 40                                 | 100                   | 100                   | 100                   | 420                   | 420                   |
| \$ cables (K)    | 6     | 8                                  | 20                    | 20                    | 22                    | 74                    | 78                    |
| \$ switch (K)    | 8     | 16                                 | 32                    | 32                    | 40                    | 128                   | 144                   |
| \$ digitizer (K) | 120   | 68                                 | 180                   | 180                   | 180                   | 552                   | 552                   |
| \$ total(K)      | 174   | 151                                | 352                   | 372                   | 422                   | 1,294                 | 1,434                 |
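
The dollar rows in the table can be recomputed from the component counts. The sketch below uses the \$20K module and \$8K 24-port switch prices from the cost slide; the \$0.1K average cable price and \$3K per digitizer board are assumptions inferred from the table rows, not stated figures.

```python
BEE2_MODULE_K = 20.0   # from the hardware cost slide
IB_SWITCH_24_K = 8.0   # from the hardware cost slide
CABLE_K = 0.1          # assumption: average cable price implied by the table
DIGITIZER_K = 3.0      # assumption: per-board price implied by the CARMA columns

def correlator_cost_k(bee2_pfb, bee2_xmac, digitizers, cables, switches):
    """Return a per-item cost breakdown in thousands of dollars, plus the total."""
    cost = {
        "pfb": bee2_pfb * BEE2_MODULE_K,
        "xmac": bee2_xmac * BEE2_MODULE_K,
        "cables": cables * CABLE_K,
        "switch": switches * IB_SWITCH_24_K,
        "digitizer": digitizers * DIGITIZER_K,
    }
    cost["total"] = sum(cost.values())
    return cost

# Counts copied from the CARMA (15,4,1,256) and CARMA (23,4,2,512) columns.
carma_15_256 = correlator_cost_k(2, 5, 60, 204, 4)
carma_23_512 = correlator_cost_k(12, 21, 184, 784, 18)
print(round(carma_15_256["total"]), round(carma_23_512["total"]))
```

With these unit prices the rounded totals agree with the CARMA columns. The COBRA column does not fit the \$3K digitizer assumption (24 boards are listed at \$120K), so it is not reproduced here.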


# **Project timeline**

- BEE2 PCB layout design (9/2004, done)
- PCB fabrication of 2 prototype boards (12/2004, done)
- 2 compute nodes testing and characterization (1/2005)
- 10 node system manufacturing (4/2005)
- Demonstration applications:
  - SETI billion channel spectrometer (6/2005)
  - 32 antenna 500MHz dual polarization correlator (12/2005)
  - Wide-field imager (12/2005)


# Future: BEE3 in sight!

- Xilinx just announced Virtex-4 family
  - 4~6X performance improvement
- DDR2 memory specification up to 800 MHz, 4 GB per DIMM
- 100 Gbps InfiniBand specification under development
- Direct scaling of BEE2 architecture
- Implementation possible in 1 year



# **Staged Development of Peta-BEE**

| machine           | BEE2 prototype | BEE2 full-rack | BEE3 full-rack  | BEE4 prototype | BEE4 full-rack   |
|-------------------|----------------|----------------|-----------------|----------------|------------------|
| year              | Q1 2005        | Q3 2005        | Q1 2006         | Q1 2007        | Q3 2007          |
| chip technology   | 130 nm         | 130 nm         | 90 nm           | 65 nm          | 65nm             |
| fixed-point perf. | 1.6-2 TOPS     | 32-40 TOPS     | 128-160 TOPS    |                | 2048-2560 TOPS   |
| FP performance    | 64-80 GFLOPS   | 1.3-1.6 TFLOPS | 5.12-6.4 TFLOPS |                | 82-122 TFLOPS    |
| with acceleration |                |                |                 |                | (400-600 TFLOPS) |
| special           |                |                |                 | custom masks, stacked-die | custom masks, stacked-die |


#### Notes:

- "prototypes" are 2 modules.
- "full-rack" versions are 40 modules, plus necessary switches, power supplies, etc.
- BEE3 implementation is optional.
- Schedule reflects technical feasibility. Development schedule would build in slack.


- BEE4:
  - Assumes 65 nm by late 2006.
  - Special masks may be needed to provide proper balance of I/O, memory, and logic.
  - Special masks could boost floating point performance (5x) if needed by applications.
  - Memory die stacked on FPGAs to gain 4x in density.
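
The BEE2 full-rack figures in the table are consistent with linear scaling from the 2-module prototype to 40 modules. The short check below assumes performance scales linearly with module count, which ignores interconnect and switch overheads; the helper name is illustrative.

```python
PROTOTYPE_MODULES = 2    # from the notes above
FULL_RACK_MODULES = 40   # from the notes above

def scale_range(low, high, factor):
    """Scale a (low, high) performance range by a constant factor."""
    return (low * factor, high * factor)

rack_factor = FULL_RACK_MODULES // PROTOTYPE_MODULES   # 20

# BEE2 prototype figures from the table.
fixed_point_tops = scale_range(1.6, 2.0, rack_factor)  # TOPS
floating_gflops = scale_range(64, 80, rack_factor)     # GFLOPS

# Generation-to-generation ratios of the tabulated full-rack fixed-point figures.
bee3_over_bee2 = 128 / 32
bee4_over_bee3 = 2048 / 128

print(fixed_point_tops, floating_gflops, bee3_over_bee2, bee4_over_bee3)
```

The factor of 20 gives the tabulated 32-40 TOPS, and 1280-1600 GFLOPS matches the 1.3-1.6 TFLOPS entry after rounding. The tabulated full-rack figures then step up by 4x from BEE2 to BEE3 and by 16x from BEE3 to BEE4.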


## The BEE2 Team

- Faculty in charge
  - John Wawrzynek
  - Bob W. Brodersen
- Graduate students
  - Chen Chang
  - Pierre-Yves Droz
  - Nan Zhou
  - Yury Markovskiy
  - Zohair Hyder
  - Adam Megacz
  - Alexander Krasnov
  - Hayden So
  - Kevin Camera

- Industrial Liaison
  - Bob Conn (Xilinx)
  - Ivo Bolsens (Xilinx)
- Research associates
  - Dan Werthimer (SSL)
  - Melvyn Wright (UCB, RAL)
  - Don Backer (UCB, astro)
- Technical staff
  - Brian Richards
  - Susan H. Mellers
- Undergraduate student
  - Greg Gibeling