### **JLab Streaming DAQ Test Stand**

Hall A Collaboration, 1/31/19. Ed Jastrzembski, Jefferson Lab DAQ Group

# Motivation

- For CERN LHC Run 3 (2021), the ALICE collaboration is upgrading their TPC with a GEM-based detection system that is read out continuously.
- A new front end ASIC (SAMPA) was developed for this purpose.
- We are interested in seeing how experiments at Jefferson Lab can take advantage of this technology and of the continuous readout concept.

#### **SAMPA Block Diagram**






**Direct Mode – bypass DSP** 

(Raw data rate (10 MHz) = 3.2 Gb/s = MAX output of chip)

- <u>Charge Sensitive Amplifier (CSA)</u>
  - Integrates and amplifies short current pulse
  - Output is a Voltage signal with amplitude proportional to the total charge Q
  - Tail of Voltage pulse is long (T = Rf\*Cf)
  - Vulnerable to pile-up unless followed by a shaping filter
- <u>Shaper</u>
  - Creates a 4<sup>th</sup> order semi-Gaussian pulse shape
  - Available shaping times (TS): 80, 160, 300 ns (SAMPA V3, V4)
  - Permits sampling by ADC at reasonable rates (10, 20 MHz)
  - 80 ns option eliminated in order to reduce noise in CSA
  - SAMPA V5 is now in development with 80, 160 ns shaping times

#### **Pulse from Shaper**



- <u>ADC</u>
  - 10 bit precision
  - 10 MSPS or 20 MSPS (5 MHz for ALICE TPC)
  - Split capacitor fully differential SAR architecture (low power)
  - ADC data rate = 10 MSPS \* 10 bits \* 32 channels = 3.2 Gb/s (6.4 Gb/s at 20 MSPS)



SAR = Successive Approximation Register

- <u>DSP</u>
  - <u>Baseline Correction 1 (BC1)</u>: removes low frequency perturbations and systematic effects
  - <u>Digital Shaper (DS)</u>: tail cancellation or peaking time correction (IIR filter)
  - <u>Baseline Correction 2 (BC2)</u>: moving average filter
  - <u>Baseline Correction 3 (BC3)</u>: slope based filter (alternative to BC2)
  - <u>Zero suppression</u>: fixed threshold
  - Formatting: Huffman encoding for compression
  - Buffering (16K x 10 bit)

- <u>e-link</u>
  - Electrical interface for transmission of serial data over PCB traces or electrical cables, for distances of several meters
  - Up to 320 Mb/s
  - Developed by CERN for the connection between Front-end ASICs and their GigaBit Transceiver (GBTx) chip
  - Based on SLVS standard (Scalable Low-Voltage Signaling) supply voltage as low as 0.8 V
  - Radiation-hard IP blocks for integration into ASICs
  - SAMPA: 11 e-links  $\rightarrow$  3.52 Gb/s max data output
  - <u>Number and speed of SAMPA e-links used is programmable</u>
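The rate figures quoted above can be cross-checked with a few lines of arithmetic. This is only a sketch built from the numbers on the slides (32 channels, 10 bits, 11 e-links at up to 320 Mb/s); the function and constant names are illustrative.

```python
# Raw (DSP bypassed) SAMPA output rate versus e-link capacity,
# using only the numbers quoted on the slides.
CHANNELS = 32           # channels per SAMPA chip
ADC_BITS = 10           # ADC resolution in bits
ELINK_MAX_BPS = 320e6   # maximum speed of one e-link
N_ELINKS = 11           # e-links per SAMPA chip


def raw_rate_bps(sample_rate_hz: float) -> float:
    """Raw ADC data rate of one chip with no zero suppression."""
    return sample_rate_hz * ADC_BITS * CHANNELS


max_output_bps = N_ELINKS * ELINK_MAX_BPS  # 3.52 Gb/s

for msps in (5, 10, 20):
    rate = raw_rate_bps(msps * 1e6)
    fits = rate <= max_output_bps
    print(f"{msps:>2} MSPS: {rate / 1e9:.2f} Gb/s raw, "
          f"fits in {max_output_bps / 1e9:.2f} Gb/s: {fits}")
```

At 20 MSPS the raw rate exceeds what the e-links can carry, which is one reason the DSP data reduction matters at the higher sampling rate.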

### **SAMPA Specifications (ALICE)**

| Specification                   | TPC                  | MCH                 |
|---------------------------------|----------------------|---------------------|
| Voltage supply                  | 1.25 V               | 1.25 V              |
| Polarity                        | Negative             | Positive            |
| Detector capacitance (Cd)       | 18.5 pF              | 40 pF - 80 pF       |
| Peaking time (ts)               | 160 ns               | 300 ns              |
| Shaping order                   | 4th                  | 4th                 |
| Equivalent Noise Charge (ENC)   | < 600e@ts=160 ns*    | < 950e @ Cd=40 pF*  |
|                                 |                      | < 1600e @ Cd=80 pF* |
| Linear Range                    | 100 fC or 67 fC      | 500 fC              |
| Sensitivity                     | 20 mV/fC or 30 mV/fC | 4 mV/fC             |
| Non-Linearity (CSA + Shaper)    | < 1%                 | < 1%                |
| Crosstalk                       | < 0.3%@ts=160 ns     | < 0.2%@ts=300 ns    |
| ADC effective input range       | 2 Vpp                | 2 Vpp               |
| ADC resolution                  | 10-bit               | 10-bit              |
| Sampling Frequency              | 10 (20) Msamples/s   | 10 Msamples/s       |
| INL (ADC)                       | <0.65 LSB            | <0.65 LSB           |
| DNL (ADC)                       | <0.6 LSB             | <0.6 LSB            |
| ENOB (ADC)**                    | > 9.2-bit            | > 9.2-bit           |
| Power consumption (per channel) |                      |                     |
| CSA + Shaper + ADC              | < 15 mW              | < 15 mW             |
| Channels per chip               | 32                   | 32                  |

\* $R_{esd} = 70\,\Omega$

\*\* @ 0.5MHz, 10Msamples/s

### **ALICE SYSTEM**



- FEC Front End Card (160 ch / FEC)
- <u>CRU Common Readout Unit</u> (~12 FECs / CRU = ~1920 ch / CRU)
- DCS Detector Control System
- LTU Local Trigger Unit

# **Common Readout Unit (CRU)**

- Interface between the on-detector systems, the online computing system, and the Central Trigger Processor
- Multiplexes data from several front-end links into higher speed data links
- Can do processing on data (big FPGA)
- Sends trigger, control, and configuration data to front-ends
- Based on commercial high-performance FPGA
- Located outside of the radiation area, so single-event upsets (SEUs) are not a concern
- PCIe platform

### **SAMPA Readout**



Time frame is programmable (max = 1024 ADC samples): 102.4 us @ 10 MSPS (51.2 us @ 20 MSPS)


#### <u>Continuous mode</u>

- New time frame starts when preceding frame is finished
- All channels and chips use the <u>same</u> time frame aligned by the *sync* input of the chip (at startup)

#### <u>Triggered mode</u>

- Time frame starts when external trigger is received
- Data from ADC can be delayed by up to 192 samples to account for trigger latency
- All channels use the same time frame
- All chips that are programmed with the same delay (latency) have time frames that are aligned (assuming triggers are aligned)

### **SAMPA Readout - Zero Suppression**

- <u>Cluster</u>: consecutive ADC samples above threshold (> 1)
- <u>pre/post samples</u> can be included in the cluster (programmable number same for all channels of chip)
- Clusters are <u>merged</u> if there are up to 2 samples below threshold separating them
- For each <u>time frame</u> all channels produce their own <u>data packet</u> from the cluster data
- <u>Header</u> for data packet has time stamp (bunch crossing counter)
- Cluster data has time offset (sample number) appended

#### **SAMPA zero suppression**



Figure 1.9: Basic detection scheme.



Figure 1.10: Feature extraction with two extra samples before pulse and three after.




Figure 1.11: Glitch filtering with minimum samples above threshold of 2. Samples in solid black are treated as if they were below the threshold.



Figure 1.12: Merging of close clusters. Samples in red are included to make one complete cluster.
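The zero-suppression steps described above (threshold, glitch filter, merging of close clusters, pre/post samples) can be sketched in software. This is a minimal illustration, not the SAMPA's actual logic: the order in which the steps are applied, the strict `>` comparison, and all names (`find_clusters`, `min_len`, `max_gap`, `pre`, `post`) are assumptions made for the example.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # half-open sample range [start, stop)


def _merge(runs: List[Run], max_gap: int) -> List[Run]:
    """Join runs separated by at most max_gap samples."""
    merged: List[Run] = []
    for a, b in runs:
        if merged and a - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], max(b, merged[-1][1]))
        else:
            merged.append((a, b))
    return merged


def find_clusters(samples: List[int], threshold: int, min_len: int = 2,
                  max_gap: int = 2, pre: int = 0,
                  post: int = 0) -> List[Tuple[int, List[int]]]:
    """Return (time offset, samples) for each cluster in one time frame."""
    # 1. Runs of consecutive samples above threshold.
    runs: List[Run] = []
    start = None
    for i, s in enumerate(samples):
        if s > threshold:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(samples)))
    # 2. Glitch filter: drop runs shorter than min_len (assumed to come first).
    runs = [(a, b) for a, b in runs if b - a >= min_len]
    # 3. Merge clusters separated by up to max_gap below-threshold samples.
    runs = _merge(runs, max_gap)
    # 4. Add pre/post samples, clipped to the frame, and join any overlaps.
    runs = [(max(0, a - pre), min(len(samples), b + post)) for a, b in runs]
    runs = _merge(runs, 0)
    return [(a, samples[a:b]) for a, b in runs]


frame = [0, 0, 5, 6, 0, 0, 7, 8, 9, 0, 0, 0, 0, 4, 0, 0]
print(find_clusters(frame, threshold=3))
```

In this example the single sample at index 13 is rejected as a glitch, and the two remaining clusters are merged because only two below-threshold samples separate them.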

#### **SAMPA zero suppression encoding**



Figure 1.13: The SAMPA data format for zero suppression encodings.

# **SAMPA** packets

- Besides the data type packet (e.g. zero suppression encoding) the SAMPA can produce some special packets
- <u>Heartbeat packet</u> generated as a result of a signal on the heartbeat trigger pin
  - No payload; conveys only bunch crossing count
  - Sent only on serial link 0
  - Highest priority; sent immediately after current packet has completed transmission
  - Used as a marker in the data stream

# **SAMPA Sync and Trigger inputs**

- **hb\_trg**: a pulse on this input captures the bunch crossing count and creates a <u>heartbeat packet</u>
- **trg**: the event trigger when running in triggered mode. When running in continuous mode, a pulse on this input <u>causes a new time frame</u> to be started, and so is effective in synchronizing multiple devices.
- **bx\_sync\_trg**: a signal on this input resets the bunch crossing counter, and so serves to synchronize this counter across multiple devices

### Linking Triggered and Continuous Data

- All data packets from both triggered and continuous sources are time stamped with bunch crossing number
- Heartbeat Trigger
  - Non-physics trigger generated by Central Trigger Processor (CTP)
  - Regular frequency, <u>highest priority</u>
  - All detector readout systems respond by inserting a "Heartbeat Event"
  - These events separate the data streams into pieces (<u>heartbeat time frames</u>) that are used in event building
  - Event building nodes get different frames; data associated with trigger near end of frame may extend to *next* frame, so at least part of the next frame must also be sent to node (unless small data loss allowed)
  - Can also be used as a <u>synchronization event</u>: by sending global time stamp with heartbeat trigger, detector readout unit can compare with its local time stamp and report/correct difference

### **GBT** link



Single bidirectional optical link simultaneously provides data paths for:

- Timing and Trigger Control (TTC)
- Data Acquisition (DAQ)
- Slow Controls (SC)

**Fixed Latency** 

### **GBT frame format**



- 80 bit payload: 80 bits x <u>40 MHz</u> = **3.2 Gb/s**
- FEC = Forward Error Correction: corrects up to a 16 bit burst error

### Wide frame format



# JLab Test Stand Goal

- Determine if the **SAMPA** chip is appropriate for detector systems at JLab
- To achieve this goal we should:
  - Understand the SAMPA front end response to detector signals
  - Learn how to utilize the complex SAMPA DSP functionality to reduce data volume
  - Deal with a continuous readout data stream and link it with triggered data streams from other sources
- The last point goes beyond the SAMPA chip. <u>Continuous readout systems are expected to be used in many future experiments</u>.

### Goals

- Ideally we should have a test system with an architecture that can be scaled up and used for the final detector
- We should have a mechanism to pulse the inputs in a controlled fashion to study the effects of pileup and high rates on the SAMPA's DSP functions
- We should be able to connect the test system to an existing detector (e.g. prototype GEM detector)

Chosen Path – use as many components of the ALICE TPC readout/control chain as possible



- FEC Front End Card (160 ch / FEC)
- <u>CRU Common Readout Unit</u> (12 FECs / CRU = 1920 ch / CRU)
- DCS Detector Control System
- LTU Local Trigger Unit

# Advantages of the ALICE system

- System components have been verified and tested together.
- Almost "plug and play".
- Development is reduced to coding (VHDL for data processing and formatting in FPGA, and software for readout and integration into CODA).
- Although the FEC would have to be redesigned to match the detector, the data transport model and sub-components (GBTx, GBT-SCA, VTRx, VTTx) can be used in the final solution.
- The CRU can be used in the final solution.
- What we learn from the test setup can be carried over to the actual system implemented.
- We have acquired 5 FECs and 1 CRU

# ALICE Front End Card (FEC)

- **Design group** Oak Ridge National Lab (ORNL)
- Plan
  - ORNL gave us all manufacturing files and details necessary to duplicate FEC circuit board
  - We purchased the specialized components (SAMPA, GBTx, ... ) and had the board assembled
  - 5 FECs fabricated




# ALICE Common Readout Unit (PCIe40)

- ALICE development firmware for the PCIe40 available
- Firmware implements the custom protocol of the GBTx chips using the FPGA gigabit transceivers
- PCIe Gen 3 x16 interface included (100 Gb/s)
- Remaining FPGA resources for data processing and formatting
- <u>Software to configure and monitor SAMPA available</u>

**Negative** – due to delays and a high demand within the collaboration, we can't get one anytime soon

### **Alternative to ALICE CRU**

- <u>ATLAS Readout Unit (BNL-712)</u>
- **Design group** Brookhaven National Lab (BNL)
- Part of the FELIX (Front-End LInk eXchange) system
- PCIe-based custom design, <u>identical in concept to ALICE CRU</u>
- Firmware exists that implements the GBTx custom protocol and PCIe interface

**Negative –** Needs **work** to integrate FELIX readout unit with ALICE front end card

- Purchased 1 BNL-712



### **Test Pulse Board**

- Designed a test pulse PCB to inject a known charge Q into SAMPA inputs
- Allows controlled study of SAMPA pulse processing and data flow from the FEC
- Simple, flexible, cheap
- Plugs directly into FEC connector or to FEC through a cable



SAMPA linear range Q < 100 fC (V < 100 mV)

Tight tolerance on C (2%)

Q precision depends on pulse generator (use attenuator)

Can use Cd value to simulate detector capacitance
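The relation behind the test pulse board is Q = C x V: a voltage step V applied through a series capacitor C injects charge Q. The slide pairs 100 fC with 100 mV, which implies an injection capacitor of about 1 pF; that value is inferred here, not stated on the slide. The sketch below, with illustrative names, computes the injected charge after an attenuator and the spread due to the 2% capacitor tolerance.

```python
# Charge injected by a voltage step through a series capacitor: Q = C * V.
# C_INJ_PF = 1.0 is inferred from "Q < 100 fC (V < 100 mV)" and is an assumption.
C_INJ_PF = 1.0
C_TOLERANCE = 0.02        # 2% capacitor tolerance (from the slide)
LINEAR_RANGE_FC = 100.0   # SAMPA linear range (from the slide)


def attenuate_mv(v_mv: float, attenuation_db: float) -> float:
    """Voltage after an ideal attenuator (voltage ratio 10^(-dB/20))."""
    return v_mv * 10 ** (-attenuation_db / 20)


def injected_charge_fc(step_mv: float, c_pf: float = C_INJ_PF) -> float:
    """Injected charge in fC; pF * mV gives fC directly."""
    return c_pf * step_mv


step_mv = attenuate_mv(1000.0, 20.0)   # 1 V generator step, 20 dB attenuator
q = injected_charge_fc(step_mv)
print(f"step = {step_mv:.1f} mV, Q = {q:.1f} fC "
      f"(+/- {q * C_TOLERANCE:.1f} fC from capacitor tolerance), "
      f"within linear range: {q <= LINEAR_RANGE_FC + 1e-9}")
```

Using an attenuator lets the pulse generator run at a larger, better-defined amplitude while keeping the injected charge inside the linear range.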

#### **Test Pulse PCB**





### **Plan of Action and Progress**

- All components in place October 1
- Power board, measure all voltages: **O.K.**
- Configure GBTx0 using external I2C master (Bus Pirate): **O.K.**
- Configure SAMPA chips through GBT link and GBT-SCA: **O.K.**
- Read out pedestal data in direct ADC mode (bypass DSP): **ACTIVE**
  - Data successfully read out; working to understand it
- Input pulses into SAMPA with pulse generator card and read out data
- Configure SAMPA to use DSP with zero suppression and read out data
- Configure system with multiple front end cards (5) and read out data
- <u>Map front end card connectors to existing prototype GEM detector (800 channels) and read out data in continuous mode</u>
- <u>Slow start</u>: trying to use FELIX card with ALICE front end card

#### Problems:

- ALICE Front End Card hardwired to use GBT links in <u>wide frame mode</u> (NO forward error correction).
- <u>Although the FELIX configuration tools support the wide frame mode, there are no firmware builds in the FELIX user repository that support it</u>.
- We have to modify the firmware ourselves.
- Requested access to the FELIX firmware design repository at CERN
- Waiting . . . . . . . .
- Moved to 'Plan B'

#### **Common Readout Receiver Card (C-RORC)**

- ALICE C-RORC / ATLAS RobinNP identical except for firmware
- Developed for current LHC <u>Run 2</u> (some will be reused in ALICE Run 3)
- > 300 boards installed in ALICE and ATLAS
- Because the ALICE CRU is not available, most ALICE detector test systems for the upgrade use the C-RORC (including SAMPA TPC test stands at Oak Ridge and in Europe)
- <u>Now that ALICE Run 2 has ended</u> we have borrowed one of their spare C-RORCs and installed the GBT protocol firmware on it
- <u>We're back to 'plug and play' again since all the SAMPA TPC test stand software developed by others will directly run on our system</u>

#### **C-RORC**

#### Xilinx Virtex-6 FPGA



- PCIe Gen2, 8 lanes (8 x 5.0 Gbps), connected to Xilinx PCIe Hard Block (~30 Gb/s)
- Up to 6.6 Gbps per channel

## Summary

- Slow initial progress due to challenges of integrating ATLAS FELIX with ALICE front end card
- Have to create our own firmware version for the FELIX readout unit (a project of several weeks)
- Delayed this effort and will instead use the C-RORC in the test stand
- Received C-RORC from Europe just before the holiday break
- In a short time we were able to completely configure the front end card (GBTx, GBT-SCA, SAMPA chips) and read out data
- Now trying to understand the data in direct ADC mode (bypass DSP)

## **Direct ADC mode**

- There are NO headers or other markers in SAMPA data for this mode
- Essential that the many serial data streams are synchronized at startup
- On start of readout all SAMPAs are simultaneously triggered to output a synchronization sequence (32 cycles of alternating 0x2B5, 0x14A)
- Readout software detects sequence from all SAMPAs to confirm synchronization
- <u>Right now we are not correctly identifying the synchronization sequence in the readout data</u>
- Consulting with our European friends in ALICE (Torsten Alt, Stefan Kirsch)
- Found 2 errors in the setup configuration; looking for that 'last' one
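A search for the synchronization sequence can be sketched as follows. This is not the ALICE readout software; it is a hypothetical illustration that assumes one "cycle" is one 0x2B5, 0x14A pair, that words are 10 bits wide, and that the first bit received is the least significant bit. The second function also tries every bit alignment, since a wrong word boundary is one way the sequence can be missed.

```python
from typing import List, Optional, Tuple

SYNC_A, SYNC_B = 0x2B5, 0x14A   # sync words from the slide (bitwise complements)
WORD_BITS = 10


def find_sync(words: List[int], cycles: int = 32) -> int:
    """Index of the first full sync sequence in a word stream, or -1."""
    pattern = [SYNC_A, SYNC_B] * cycles   # assumption: one cycle = one pair
    n = len(pattern)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == pattern:
            return i
    return -1


def words_from_bits(bits: List[int], offset: int) -> List[int]:
    """Group a flat bit list into 10-bit words, first bit = LSB (assumed)."""
    out = []
    for i in range(offset, len(bits) - WORD_BITS + 1, WORD_BITS):
        word = 0
        for k in range(WORD_BITS):
            word |= bits[i + k] << k
        out.append(word)
    return out


def find_sync_any_alignment(bits: List[int],
                            cycles: int = 32) -> Optional[Tuple[int, int]]:
    """Try all 10 bit offsets; return (bit offset, word index) or None."""
    for offset in range(WORD_BITS):
        idx = find_sync(words_from_bits(bits, offset), cycles)
        if idx >= 0:
            return offset, idx
    return None


# Example: a stream that starts 3 bits late.
stream = [0, 0, 0]
for w in [SYNC_A, SYNC_B] * 32:
    stream += [(w >> k) & 1 for k in range(WORD_BITS)]
print(find_sync_any_alignment(stream))
```

Because 0x2B5 and 0x14A are bitwise complements in 10 bits, a stream that shows the two words in swapped order or shifted by a few bits points to an alignment problem rather than corrupted data.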

# **Reference Slides**

# **Useful Sources of Information**

- TDRs for the Upgrade of ALICE
  - <u>https://cds.cern.ch/record/1622286/files/ALICE-TDR-016.pdf</u>
  - <u>http://cds.cern.ch/record/1603472/files/ALICE-TDR-015.pdf</u>
- Other
  - <u>https://www.bnl.gov/aum2014/content/workshops/Workshop\_1/bnl\_david\_silvermyr.pdf</u>
  - <u>http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7031978</u>
  - <u>http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7543104</u>
- SAMPA chip prototype tests
  - <u>http://iopscience.iop.org/article/10.1088/1748-0221/12/04/C04008/pdf</u>
  - <u>http://iopscience.iop.org/article/10.1088/1748-0221/11/02/C02088/pdf</u>

- Collection of links to up-to-date SAMPA technical documents
  - <u>https://docs.google.com/spreadsheets/d/16SnfEWtvvZYONnxmMhVzUo-St-ZtPRVV3Z6mfy13dRU/edit?usp=sharing</u>

#### **Background - ALICE TPC**



- Volume = 90 cubic meters (largest in world)
- ~100 us electron drift time (90% Ne, 10% CO2)
- Current detector: MWPC (end plates) (0.5 M channels)

## ALICE TPC



ROC = Read out chamber

<u>Active Gating Grid</u> - trigger causes grid to be transparent, allowing ionization electrons to pass into the amplification region. After 100 us, Gating Grid is biased with alternating voltage that renders grid opaque to electrons and ions. This protects the amplification region against unwanted ionization from the drift region, and prevents back-drifting ions from entering the drift volume (leading to drift-field distortion).

Trigger rate limited to 3.5 kHz

# LHC Luminosity Upgrade

- LHC Run 3 (2021)  $\rightarrow$  50 kHz interaction rate (Pb-Pb)
- ~ 5 events (100 us \* 50 kHz) concurrent in TPC volume
- TPC Gating grid would cause large loss of data
- Replace MWPC with **quad-layer GEM detectors** (resistant to backflow of ions into drift volume).
- Continuous readout of TPC data desirable (~1 TByte/s)
- New ASIC developed; requirements set to meet the needs of both TPC and Muon chambers

### **SAMPA Analog Front-end Details**



- Negative and positive polarity CSA with capacitive and resistive feedback connected in parallel
- Pole-Zero Cancellation network
- High pass filter
- Two bridged-T second order low pass filters
- Non-inverting stage

### **Analog Front-end Details**

- First shaper is a scaled down version of the CSA and generates two first poles and one zero
- Copy of the first shaper connected in unity gain configuration is implemented in order to provide a differential mode input to the next stage
- Second stage of the shaper is a fully differential second order bridged-T filter and it includes a Common-Mode feedback network
- Non-inverting stage adapts the DC voltage level of the shaper to use the full dynamic range of the ADC. It consists of a parallel connection of two identically designed Miller-compensated amplifiers.

| Gain    | Shaping time |
|---------|--------------|
| 30mV/fC | 160 ns       |
| 20mV/fC | 160 ns       |
| 4mV/fC  | 300 ns       |

# Another approach

- From scratch: build a prototype Front-End Card (FEC) for the SAMPA chip
  - Use FPGA on FEC to multiplex serial data streams (e-links) from SAMPA(s) into multi-gigabit data stream(s)
  - Optical link to module for data processing, formatting, and readout (e.g. JLab Sub-System Processor (SSP))
  - Reverse optical link to FEC for programming of SAMPA chip (I2C)
  - <u>Advantages</u>:
    - simple concept
    - some components on hand (SSP)
  - <u>Disadvantages</u>:
    - hardware and firmware development
    - **Non-trivial** PCB (mixed-signal design, fine pitch BGA components)
    - doesn't easily translate to final design due to radiation effects on FPGA and commercial optical transceivers



Fig. 7: Updated block diagram of the TPC FEC.

For the data multiplexing into optical links, the radiation-hard CERN GBT [8] and Versatile link components are used. This scheme has not changed with respect to the TPC Upgrade TDR. However, the SAMPAs are connected in a different way to the 2 GBTx ASICs located on each FEC. The GBT system is operated in the Wide Bus Mode where the total bandwidth for the uplink (from the detector to the CRU) is increased by 40 % with respect to the standard GBT mode. In Wide Bus Mode the forward error correction is switched off. However, the radiation load at the TPC front-end electronics is comparatively low [9], such that no influence on the bit error rate is to be expected. In this mode, a total of 28 input eLinks at 160 Mbit/s are available per GBTx ASIC. The total available 56 input eLinks match nicely the 55 output eLinks from the 5 SAMPA ASICs.

#### **Successive Approximation ADC**



Successive Approximation ADC Block Diagram

The successive approximation register is initialized so that the most significant bit (MSB) is equal to a digital 1. This code is fed into the DAC, which then supplies the analog equivalent of this digital code ( $V_{ref}/2$ ) into the comparator circuit for comparison with the sampled input voltage. If this analog voltage exceeds  $V_{in}$  the comparator causes the SAR to reset this bit; otherwise, the bit is left a 1. Then the next bit is set to 1 and the same test is done, continuing this binary search until every bit in the SAR has been tested. The resulting code is the digital approximation of the sampled input voltage and is finally output by the SAR at the end of the conversion (EOC).
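The binary search described above can be written out in a few lines. This is an idealized, single-ended model for illustration only; the SAMPA ADC itself is a split-capacitor, fully differential design, and the function name and arguments are invented for the example.

```python
def sar_convert(v_in: float, v_ref: float, n_bits: int = 10) -> int:
    """Ideal successive-approximation conversion of v_in in [0, v_ref)."""
    code = 0
    for bit in reversed(range(n_bits)):          # MSB first
        trial = code | (1 << bit)                # tentatively set this bit
        v_dac = v_ref * trial / (1 << n_bits)    # DAC output for the trial code
        if v_dac <= v_in:                        # keep the bit unless DAC exceeds input
            code = trial
    return code


# With the MSB alone set, the DAC output is v_ref / 2, as in the text.
print(sar_convert(0.75, 2.0))   # 0.75 V with a 2 V reference
```

Each conversion takes exactly `n_bits` comparisons, one per bit, regardless of the input value.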



## **Heartbeat Trigger**



### **GBTX** architecture



E-Link - Electrical serial link (SLVS)

## **E-Link Groups**

- 5 E-link groups for normal mode, 7 E-link groups for wide mode
- Each E-link group of GBT frame is assigned 16 bits of data in frame
- <u>Flexible E-link speed</u> (frame rate = 40 MHz)

| type | # E-links in group | bits per E-link | E-link speed |
|------|--------------------|-----------------|--------------|
| 8x   | 8                  | 2               | 80 Mb/s      |
| 4x   | 4                  | 4               | 160 Mb/s     |
| 2x   | 2                  | 8               | 320 Mb/s     |
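The table follows directly from the 16 bits per group and the 40 MHz frame rate, and the same arithmetic reproduces other figures quoted in these slides: the 80 bit payload (3.2 Gb/s), the 40% gain of wide mode, and the 28 e-links per GBTx at 160 Mb/s. The sketch below is just that arithmetic, with illustrative names.

```python
FRAME_RATE_HZ = 40e6    # GBT frame rate
BITS_PER_GROUP = 16     # bits of each frame assigned to one e-link group
GROUPS = {"normal": 5, "wide": 7}


def elink_speed_bps(elinks_in_group: int) -> float:
    """Speed of each e-link when a group is split into this many e-links."""
    bits_per_elink = BITS_PER_GROUP // elinks_in_group
    return bits_per_elink * FRAME_RATE_HZ


def payload_bps(n_groups: int) -> float:
    """User payload bandwidth of one GBT link."""
    return n_groups * BITS_PER_GROUP * FRAME_RATE_HZ


for n in (8, 4, 2):
    print(f"{n}x group: {elink_speed_bps(n) / 1e6:.0f} Mb/s per e-link")

normal, wide = payload_bps(GROUPS["normal"]), payload_bps(GROUPS["wide"])
print(f"normal: {normal / 1e9:.2f} Gb/s, wide: {wide / 1e9:.2f} Gb/s "
      f"(+{(wide / normal - 1) * 100:.0f}%)")

# ALICE FEC: wide mode with 4x groups, 2 GBTx per card, 5 SAMPAs x 11 e-links.
elinks_per_gbtx = GROUPS["wide"] * 4
print(f"{2 * elinks_per_gbtx} e-link inputs for {5 * 11} SAMPA e-links")
```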



#### **FEC – ALICE version**

#### SAMPA packet

| Bits 0-49 | Bits 50 onward            |
|-----------|---------------------------|
| Header    | Payload (variable length) |

#### Packet header

| Bits  | 0-5     | 6 | 7-9 | 10-19     | 20-23 | 24-28  | 29-48    | 49 |
|-------|---------|---|-----|-----------|-------|--------|----------|----|
| Field | Hamming | P | PKT | Num words | H add | CH add | BX count | DP |

| Name      | Bits | Description                                         |
|-----------|------|-----------------------------------------------------|
| Hamming   | 6    | Hamming code                                        |
| P         | 1    | Parity (odd) of header including hamming            |
| PKT       | 3    | Packet type, see table 2.6                          |
| Num words | 10   | Number of 10 bit words in data payload              |
| H add     | 4    | Hardware address of chip                            |
| CH add    | 5    | Channel address                                     |
| BX count  | 20   | Bunch-crossing counter (40 MHz counter); time stamp |
| DP        | 1    | Parity (odd) of data payload                        |
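The field widths in the table add up to the 50 bit header, so the header can be unpacked with shifts and masks. The sketch below assumes header bit 0 is the least significant bit of an integer; the actual bit order in the serial stream is not given here, and the field names are illustrative. It does not check the Hamming code or parity bits.

```python
from typing import Dict

# (name, width) in order of increasing bit position, from the table above.
FIELDS = [
    ("hamming", 6), ("parity", 1), ("pkt", 3), ("num_words", 10),
    ("h_add", 4), ("ch_add", 5), ("bx_count", 20), ("data_parity", 1),
]
HEADER_BITS = sum(width for _, width in FIELDS)   # 50

# The 20 bit counter at 40 MHz wraps roughly every 26 ms.
BX_WRAP_SECONDS = (1 << 20) / 40e6


def unpack_header(header: int) -> Dict[str, int]:
    """Split a 50 bit header (bit 0 = LSB, assumed) into its fields."""
    out = {}
    shift = 0
    for name, width in FIELDS:
        out[name] = (header >> shift) & ((1 << width) - 1)
        shift += width
    return out


def pack_header(**values: int) -> int:
    """Inverse of unpack_header; missing fields default to 0."""
    header, shift = 0, 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        header |= value << shift
        shift += width
    return header


h = pack_header(pkt=4, num_words=100, h_add=3, ch_add=17, bx_count=123456)
print(unpack_header(h))
```

Since the time stamp wraps after about 26 ms, any software that orders packets over longer spans needs an additional coarse time reference, which the heartbeat packets can provide.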