# **Clock/Control for Streaming Readout**

- 1. Clock Control requirement
- 2. JLAB Trg/Clk/Sync/Busy overview
- 3. Existing studies
- 4. Proposed Clock/Control system
- 5. Feasibility studies
- 6. Summary

William Gu & Ben Raydo

Streaming Readout VI, May 2020, CNU (virtual)



## 1. Clock Control Requirement

- Clock jitter: ~ps for ADC measurement;
- Clock Time Interval Error (TIE): ~ps for TDC measurement and for hit coincidence between different parts of the detector;
- Phase-aligned slower clocks derived from the fast system clock;
- Correct (synchronized) time\_zero for the streaming data timestamp;
- Calibration 'events' (detector, electronics, timestamp, etc.)
- .....



ADC: jitter matters (timing variation between neighboring clock edges)

TDC: TIE matters for absolute time measurement; jitter matters if a reference exists (e.g. a width measurement)
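The distinction between jitter and TIE can be illustrated with a minimal sketch. The function names and the example edge times below are hypothetical; only the 4 ns (4000 ps) clock period comes from the slides. The example clock runs slightly slow with no edge-to-edge variation, so its period jitter is zero while its TIE grows steadily.

```python
import statistics


def period_jitter_rms(edges_ps):
    """RMS spread of the periods between neighboring clock edges."""
    periods = [b - a for a, b in zip(edges_ps, edges_ps[1:])]
    return statistics.pstdev(periods)


def time_interval_error(edges_ps, nominal_period_ps):
    """TIE per edge: measured edge time minus ideal edge time.

    The ideal clock is anchored at the first measured edge.
    """
    t0 = edges_ps[0]
    return [t - (t0 + n * nominal_period_ps) for n, t in enumerate(edges_ps)]


# Hypothetical clock: nominal period 4000 ps, but every period is 1 ps long.
edges = [n * 4001.0 for n in range(101)]

jitter = period_jitter_rms(edges)         # no edge-to-edge variation
tie = time_interval_error(edges, 4000.0)  # drift accumulates edge by edge

print("period jitter (ps):", jitter)
print("TIE after 100 periods (ps):", tie[-1])
```

A clock like this would be fine for an ADC, which only sees neighboring edges, but not for a TDC making absolute time measurements.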





## 2. JLAB Trg/Clk/Sync/Busy overview

- Same clock for the whole system;
- Front-end electronics are synchronized to within one clock period (4 ns);
- Delay adjustment on the TI compensates for the fiber delays;
- Clock jitter: several ps;
- Clock TIE: source dependent only;
- Board setup/control: VME



# 3. Existing Studies

3.1 Clock recovery from SerDes

Xilinx MGT recovered clock: jitter ~ps, but TIE ~100 ps. Good enough for ADC, but not for TDC.

3.2 White Rabbit Network

Based on Ethernet PTP and Synchronous Ethernet (1 GbE). Jitter is ~ps, but TIE is sub-ns. The clock is limited to 10 MHz and 125 MHz.

#### 3.3 CERN lpGBT

Radiation tolerant (TID and SEU), but it is optimized for a 40 MHz clock, and TIE is ~10s of ps.





# 4. Proposed Clock/Control system:

## 4.1 Dedicated clock distribution tree using QSFP connectors

#### CCM: Clock\_Control\_Master

Control encoding and status decoding.

The CCM is the interface to the machine clock source, data acquisition, detector control and monitoring, and the front-end electronics. It may be a PCIe card in the DAQ computer, or it may be built into the CCD with an Ethernet connection to the DAQ computer.

#### CCD: Clock\_Control\_Distribution

Clock/control fanout and status/busy accumulation. Each CCD may include some CCM functions for subsystem tests. CCDs can be chained for expandability.

#### CCN: Clock\_Control\_Node

Control decoding and status encoding. The CCN is the clock/control interface to the front-end electronics. It will be a mezzanine card that exchanges the decoded control, clock, data, monitoring, and required power with the carrier board (or it can be built into the carrier board, in which case a standard QSFP connector is required).





## 4.2 QSFP signal assignment

QSFP is commercial, inexpensive, and relatively compact:

- QSFP multimode optical transceiver: \$40 (~100 m);
- Single-mode optical transceiver: \$200 (~km);
- Short copper cables: < \$50 (~meters)



#### 4.3 PCB diagrams





Thomas Jefferson National Accelerator Facility








## 4.4 Clock/Control working model

#### 4.4.1 Clock distribution (fan-out tree)

- LVPECL/LVDS/optical distribution: minimal added jitter (< 1 ps);
- Very-low-jitter buffers are readily available;
- Any clock frequency (to match the accelerator), as long as it is above the optical transceiver's frequency requirement;
- Lower-frequency clocks: phase-deterministic synthesizers with an optional jitter cleaner (Si534x, AD9510, etc.)

$\rightarrow$ ~ps jitter on a common clock is easily achievable
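As a rough check of the ~ps claim, the jitter budget of a fan-out tree can be estimated by adding the contributions in quadrature. This is a sketch that assumes the source jitter and the additive jitter of each stage are uncorrelated and random; the numbers (1 ps source, 0.5 ps per stage, three stages for CCM, CCD and CCN) are illustrative assumptions, not measurements.

```python
import math


def total_rms_jitter_ps(source_ps, stage_additive_ps):
    """Root-sum-square of uncorrelated RMS jitter contributions."""
    return math.sqrt(source_ps ** 2 + sum(j ** 2 for j in stage_additive_ps))


# Hypothetical budget: 1 ps source, three fan-out stages adding 0.5 ps each.
total = total_rms_jitter_ps(1.0, [0.5, 0.5, 0.5])
print(f"total RMS jitter: {total:.2f} ps")
```

Under these assumptions the total stays close to the source jitter, because sub-ps stages contribute little once squared.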

#### 4.4.2 Slow control / asynchronous control (GbE)

Slow control runs asynchronously over the GbE network. Each CCN and CCD will have a unique 16-bit setting (used for the MAC address, and possibly for an IP address on a local network). No other bus (e.g. VME) is needed. The CCN/CCD can be reprogrammed remotely over GbE.

The status can be accessed through the GbE network and through the dedicated links.
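One possible way to turn a 16-bit board setting into network addresses is sketched below. The mapping is purely an assumption for illustration: the `02:00:00:00` prefix (a locally administered MAC) and the `10.0.0.0/16` private network are hypothetical choices, not part of the proposal.

```python
import ipaddress


def node_mac(node_id):
    """Locally administered MAC whose last two bytes are the 16-bit ID."""
    if not 0 <= node_id <= 0xFFFF:
        raise ValueError("node_id must fit in 16 bits")
    return "02:00:00:00:%02x:%02x" % (node_id >> 8, node_id & 0xFF)


def node_ip(node_id, network="10.0.0.0/16"):
    """IP address at offset node_id inside a private /16 network."""
    if not 0 <= node_id <= 0xFFFF:
        raise ValueError("node_id must fit in 16 bits")
    net = ipaddress.ip_network(network)
    return str(net.network_address + node_id)


# Hypothetical board setting 0x0102
print(node_mac(0x0102), node_ip(0x0102))
```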




#### 4.4.3 Fast control / clock-synchronous control

The CCN (and the CCD input QSFP) can initiate a fiber delay measurement by signal loopback. The measured delays are used to compensate the arrival times of the synchronization commands. The fiber latency must be measured to a precision of half of the transmitted clock period.
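The loopback measurement and compensation can be sketched as follows. This assumes a symmetric fiber path (one-way latency is half the round trip) and compensation in whole clock periods so that every node sees a command on the same tick; the node names and round-trip values are hypothetical, and the 4 ns period is taken from the JLab overview.

```python
def one_way_ticks(round_trip_ns, clock_period_ns=4.0):
    """One-way latency in whole clock periods, assuming a symmetric path."""
    one_way_ns = round_trip_ns / 2.0
    return round(one_way_ns / clock_period_ns)


def compensation_ticks(round_trip_ns_by_node, clock_period_ns=4.0):
    """Extra delay per node so commands arrive on the same clock tick.

    The node behind the longest fiber gets no extra delay; all others
    are padded up to that latency.
    """
    latency = {
        node: one_way_ticks(rt, clock_period_ns)
        for node, rt in round_trip_ns_by_node.items()
    }
    longest = max(latency.values())
    return {node: longest - ticks for node, ticks in latency.items()}


# Hypothetical loopback results in ns
measured = {"ccn0": 96.6, "ccn1": 401.0, "ccn2": 240.8}
delays = compensation_ticks(measured)
print(delays)
```

One way to read the half-period requirement: if the latency error stays below half a clock period, rounding lands on the correct tick.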

Fast controls (e.g. RESET) are coded at the system clock rate and then Manchester encoded. Control commands are issued on the smallest common multiple of the front-end clock periods and distributed synchronously.

$\rightarrow$ The slower clocks on the CCN are phase aligned across the system
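A minimal sketch of Manchester encoding and decoding is shown below. It uses the IEEE 802.3 convention (a 1 is a low-to-high transition, a 0 is high-to-low), which is an assumption since the slides do not state the convention, and the 8-bit command word `0xA5` is a hypothetical code, not the actual RESET encoding.

```python
def manchester_encode(bits):
    """Encode each bit as two half-bit levels (IEEE 802.3 convention)."""
    out = []
    for b in bits:
        out.extend((0, 1) if b else (1, 0))
    return out


def manchester_decode(halves):
    """Decode pairs of half-bit levels; reject pairs without a transition."""
    if len(halves) % 2:
        raise ValueError("odd number of half-bits")
    bits = []
    for first, second in zip(halves[0::2], halves[1::2]):
        if first == second:
            raise ValueError("missing mid-bit transition")
        bits.append(second)
    return bits


def to_bits(word, width=8):
    """Most-significant-bit-first list of bits."""
    return [(word >> i) & 1 for i in reversed(range(width))]


command = to_bits(0xA5)  # hypothetical 8-bit fast-control word
line = manchester_encode(command)
assert manchester_decode(line) == command
print(line)
```

Every encoded bit carries a transition and the line spends equal time high and low, which suits AC-coupled optical links.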




#### 4.4.4 How to deal with data overflow

Option #1: BUSY feedback

When the readout computer cannot keep up with the incoming data, it can send a PAUSE fast-control signal to the front end to stop the data streaming.

Likewise, the front end can send a PAUSE\_REQUEST by asserting the BUSY signal, indicating that its buffer is about to overflow.

Option #2: automatic data drop

The front end can drop all the data in a given time interval (for example 100 us) if it cannot stream the data out promptly.

The DAQ computer can also drop time intervals if it cannot process them or if the data arrives too late.

This works because all front-end data is timestamped and formatted in fixed time buckets (100 us, for example). The front end also sends an empty ('no data') bucket with its timestamp, so that an empty bucket can be told apart from a bucket dropped by the front end. The event recorder can then keep track of the lost data (whether dropped by the front end or by the data processor).

Both options will be implemented: option #1 is more efficient if data loss is frequent, and option #2 is more efficient if data loss is occasional.
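The time-bucket bookkeeping of option #2 can be sketched as follows. The frame layout (a dict with a bucket index and a list of hits) and the function names are hypothetical; only the 100 us bucket length comes from the slides. Empty buckets are still sent, so a missing bucket index means dropped data.

```python
BUCKET_US = 100  # bucket length from the slides (100 us example)


def bucketize(hits, n_buckets, bucket_us=BUCKET_US):
    """Group (timestamp_us, payload) hits into one frame per time bucket.

    Empty buckets still produce a frame, so 'no data' is explicit.
    """
    frames = [{"bucket": i, "hits": []} for i in range(n_buckets)]
    for ts, payload in hits:
        idx = int(ts // bucket_us)
        if 0 <= idx < n_buckets:
            frames[idx]["hits"].append((ts, payload))
    return frames


def missing_buckets(received_frames, n_buckets):
    """Bucket indices that never arrived, i.e. data that was dropped."""
    seen = {frame["bucket"] for frame in received_frames}
    return [i for i in range(n_buckets) if i not in seen]


# Hypothetical hits at 10, 250 and 260 us over four buckets
frames = bucketize([(10, "a"), (250, "b"), (260, "c")], n_buckets=4)

# Simulate the front end dropping bucket 2 because it could not stream it out
received = [f for f in frames if f["bucket"] != 2]
lost = missing_buckets(received, n_buckets=4)
print("lost buckets:", lost)
```

Buckets 1 and 3 are empty but present, so only bucket 2 is reported as lost.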



## 4.5 Not just clock/control distribution

## The CCD can also serve as the data collection board.

- The CCN (or its carrier board) serves as a front-end data collection board (>= 10 Gbps). Zero suppression should be performed there (timestamp and BUSY are needed);
- The CCD merges the data and sends it directly to a commercial card inside the DAQ computer, for example the Xilinx VCU1525 (shown here) or the Xilinx AI board VCK190 (will be available);
- The data can be processed by the online computers, or by the commercial FPGA (acceleration or AI) boards;
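One way the CCD merge could work is a time-ordered merge of the per-CCN streams. This is an illustrative assumption, not the proposed firmware: the frame layout `(bucket_index, source)` is hypothetical, and each input stream is assumed to be already sorted by bucket index.

```python
import heapq


def merge_streams(*streams):
    """Merge already time-ordered frame streams into one ordered stream.

    Each frame is a (bucket_index, source) tuple; ordering uses the
    bucket index only.
    """
    return list(heapq.merge(*streams, key=lambda frame: frame[0]))


# Hypothetical streams from two CCNs; ccn1 has no frame for bucket 1
ccn0 = [(0, "ccn0"), (1, "ccn0"), (2, "ccn0")]
ccn1 = [(0, "ccn1"), (2, "ccn1")]

merged = merge_streams(ccn0, ccn1)
print(merged)
```

Because the inputs are consumed lazily in order, the same pattern applies to a continuous stream rather than finite lists.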



#### 4.6 Where is the software (CODA)




# 5. Feasibility studies:

## 5.1 FADC $\rightarrow$ VTP

The VTP can act as a carrier board (hosting the CCN) and generate data at >= 10 Gbps. The VTP can also act as a data concentrator board, collecting 16 inputs and passing the data on at >= 40 Gbps.

## 5.2 VETROC $\rightarrow$ Xilinx KCU1500:

QSFP: VETROC (streaming TDC) $\rightarrow$ Xilinx KCU1500: 20 Gbps with 8b/10b encoding, limited by the Artix-7 capability;

KCU1500 $\rightarrow$ host PC: > 5 GB/s with PCIe Gen3 x8;
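The two link figures above can be compared with a short calculation. It assumes that the 20 Gbps is the aggregate line rate before 8b/10b decoding (the slides do not say whether it is line rate or payload), and uses the standard PCIe Gen3 parameters of 8 GT/s per lane with 128b/130b encoding.

```python
def payload_gbps(line_rate_gbps, data_bits, coded_bits):
    """Payload rate left after a line code such as 8b/10b or 128b/130b."""
    return line_rate_gbps * data_bits / coded_bits


# QSFP link: assumed 20 Gbps line rate with 8b/10b encoding
qsfp_payload_gbps = payload_gbps(20.0, 8, 10)
qsfp_gbytes_per_s = qsfp_payload_gbps / 8

# PCIe Gen3 x8: 8 GT/s per lane, 8 lanes, 128b/130b encoding
pcie_payload_gbps = payload_gbps(8.0 * 8, 128, 130)
pcie_gbytes_per_s = pcie_payload_gbps / 8

print(f"QSFP payload: {qsfp_gbytes_per_s:.2f} GB/s")
print(f"PCIe Gen3 x8 ceiling: {pcie_gbytes_per_s:.2f} GB/s")
```

Under these assumptions one such QSFP link delivers about 2 GB/s, well below the > 5 GB/s quoted for the PCIe path, so the PCIe link leaves room for more than one input.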



#### 5.3 CC Test board:

A new version of the PCI Express Trigger Interface (TIpcie) board is being prototyped. It can be used to test the proposed QSFP fiber assignments, the CCM, CCD, and CCN distribution tree, and the data readout.



Also: FPGA-based high-resolution absolute timing measurement (64-channel TDC)




Test board configurations

#### CCM: PCIe→QSFP



#### CCN: QSFP $\leftrightarrow$ carrier board



#### CCD: QSFP $\rightarrow$ QSFP distribution



#### CCD: Data Concentrator (2xQSFP $\rightarrow$ PCIe)



#### CCN: Data Acquisition (TDC $\rightarrow$ QSFP)







# 6. Summary:

## The Clock Control for streaming readout is feasible

- A dedicated clock distribution tree is simple, flexible, and cost-manageable;
- SYNC can be accomplished with fiber length measurement and delay compensation;
- The distribution system is Ethernet controlled and does not depend on a crate (VME)

#### The Data acquisition for streaming readout

- The front-end FPGAs do the zero suppression;
- Extra processing can run on the Xilinx (AI) board, not just in the computers

However, depending on the radiation levels at the EIC, we may have to adopt the lpGBT or other radiation-hard components.

# Thank You

