ELMB Microcontroller Firmware and SCADA Integration for the LHCb Muon Detector Readout Control System

V. Bocci, F. Iacoangeli, F. Messi, R. Nobrega, D. Pinci, W. Rinaldi

INFN Sezione Roma e Università “La Sapienza”

rafael.nobrega@roma1.infn.it

Abstract

The LHCb experiment requires highly efficient muon detection within each LHC bunch crossing: 95% within a 25 ns time window. To reach this efficiency, many parameters of the detector readout apparatus have to be calibrated and adjusted, and its channels must be aligned in time. In addition, essential characteristics must be monitored to guarantee good working conditions of the apparatus (to avoid loss of efficiency and to minimize systematic errors). As the number of muon readout parameters is extremely high (~700000 registers), a system able to process information in parallel is required: 122000 readout channels will be controlled by about 600 microcontrollers and 6 computers. The complexity of such an apparatus requires the use of a distributed system. For this purpose a Supervisory Control And Data Acquisition (SCADA) based system is being developed to control the entire detector readout equipment. Moreover, a Finite State Machine (FSM) implementation is being developed to integrate the Detector Readout Control (DRC) into the LHC Experiment Control System (ECS).

I. INTRODUCTION

The LHCb Muon Detector [1] is divided into 5 stations (M1 to M5). One station is located before the calorimeters while the other ones are placed at the end of the detector. The Detector is made up of 1368 Multiwire Proportional Chambers (MWPC) and 24 Gas Electron Multiplier (GEM) chambers (for the innermost region).

The detector output signals are read by the front-end electronics (FEE), which consist of 7536 16-channel front-end boards [2] (FEB), 168 Intermediate Boards (IB) and 152 Off Detector Electronics [3] (ODE) boards. In addition, 156 Service Boards (SB) and 10 Pulse Distribution Modules (PDM) are used to control the very front-end electronics and to send it pulses synchronous with the machine clock. In total there are about 122000 physical channels (detector output channels) to be read out by the FEE system. Physical channels are transformed into logical channels by means of logical combinations in the FEBs and Intermediate Boards. This process reduces the number of channels to be processed by the ODE from about 122000 to 26000. Signals arriving at the ODE receive a bunch-crossing identification number before being sent to the level-zero trigger (L0) pipeline, where they wait for the L0 Decision Unit (4 µs latency). In case of a positive L0 trigger decision, data are sent to the TELL1 (“Trigger ELectronics and L1 board”) readout consolidation modules, from which event information is transmitted as packets via a Gigabit Ethernet network to the event building farm. Accepted data are then processed and kept available to the DAQ system. In addition, each ODE input signal has a dedicated TDC (Time to Digital Converter) channel, which allows the phase of arriving detector signals to be measured (with 1.5 ns resolution) with respect to the bunch-crossing signal.

The DRC system (illustrated by the block named Service Board in Figure 2) is mainly responsible for the control of all front-end boards. In addition, the DRC system can generate test pulses synchronous with the LHC machine clock in order to test and time align (together with the ODE TDC and the front-end adjustable delay feature) the detector readout channels. Two boards have been developed for this purpose: the Service Board and the Pulse Distribution Module. Both make use of the Embedded Local Monitor Board [4] (ELMB), developed by the ATLAS Detector Control System group. The ELMB is a radiation-tolerant plug-on board based on the ATMEL ATmega128 AVR microcontroller and the SAE81C91 SPI CAN controller. The microcontroller uses a RISC architecture and is fabricated in 0.35 µm technology.

Integration of ELMB into the LHCb Muon DRC system has been implemented via the introduction of new features within different firmware versions (see section II).

Due to the LHCb detector radiation environment, errors in front-end electronics will occur. This can cause the entire readout system to malfunction: it is vital to detect and repair those errors in a short period of time to avoid permanent damage and errors in the data-stream. In addition, once the detector is mounted, access to readout electronics equipment becomes increasingly difficult. Therefore the control system of an apparatus with such a complexity should be able to control and supervise the detector conditions, correcting and recovering automatically from errors as soon as they occur. Such a result can be achieved by the development of a SCADA and FSM based system (see section III).

A. Detector Readout Control Electronics

The Embedded Local Monitor Board (ELMB) is the core of the DRC system. It is in charge of accessing and controlling all Service Board, Pulse Distribution Module and front-end components. Because of the central role of the ELMB, many multi-device procedures can be implemented inside its firmware, which reduces the execution time and CPU load that would result from running such procedures on the main control computers. This approach gives a substantial degree of parallelism, reflected in the roughly 600 ELMB microcontrollers used in the apparatus.

Another key component is the Timing, Trigger and Control [5] receiver (TTCrx). The TTCrx chip recovers the 40.08 MHz LHC machine clock, received via an optical fiber link, and distributes it with minimum jitter. The recovered clock passes through two independent high-resolution phase shifters, which provide a programmable delay that is then available for synchronisation purposes. In addition, the chip contains bunch-crossing and event counters: the former counts the number of bunch crossings, while the latter counts the events accepted by the L0 trigger. The TTCrx can also receive broadcast or individually addressed commands from the TTC distribution network (each TTCrx in the LHC clock distribution system has a unique 14-bit channel identification number). The TTCrx is also in charge of transmitting first-level trigger accept decisions and their associated bunch and event identification numbers.

1) Pulse Distribution Module

The Pulse Distribution Module (see Figure 3) has been implemented mainly to distribute the machine clock (from the TTCrx chip) to the Service Boards and to send them pulses, generated by logic implemented in an FPGA, either at set bunch-crossing identification numbers (based on the TTCrx bunch-crossing counter) or on specific Timing and Fast Control [6] (TFC) commands. Communication is controlled by the on-board ELMB. Both components, TTCrx and FPGA, are accessible via the I2C protocol. The PDM has additional components to reset and power cycle all other ELMBs in each DRC crate.

2) Service Board

The Service Board (see Figure 4) has been implemented to adapt the ELMB for FEE control. A Service Board can host up to 4 ELMBs. Many additional features have been included in the board: 12 ports for front-end electronics control, 4 SPI flash memories and one EEPROM, 32 digital input/output signals, and an FPGA used mainly to generate synchronous pulses for the front-end electronics.

3) DRC Full System Outline

DRC electronic boards are inserted into 10 equipment crates, alongside the Muon Detector. A crate consists of one PDM and up to 20 SBs. A custom back-plane is used to distribute signals between the boards. Each crate is connected to a different detector sector. The whole system is controlled by six computers.
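
The board counts quoted above can be cross-checked against the figure of "about 600" microcontrollers. The short sketch below does this arithmetic; it assumes that every PDM carries exactly one ELMB (the text mentions a single on-board ELMB) and treats "up to 4 ELMBs" per Service Board as an upper bound, so the result is a ceiling rather than the installed number.

```python
# Figures quoted in the text.
SERVICE_BOARDS = 156
PDMS = 10
MAX_ELMB_PER_SB = 4          # "up to 4 ELMBs" per Service Board (upper bound)
ELMB_PER_PDM = 1             # assumption: one on-board ELMB per PDM
PHYSICAL_CHANNELS = 122_000
CONTROL_REGISTERS = 700_000
QUOTED_ELMBS = 600           # "about 600 microcontrollers"


def max_elmbs():
    """Upper bound on the number of ELMBs if every Service Board is fully populated."""
    return SERVICE_BOARDS * MAX_ELMB_PER_SB + PDMS * ELMB_PER_PDM


def per_elmb(total, elmbs=QUOTED_ELMBS):
    """Average share of a quantity handled by one microcontroller."""
    return total / elmbs


print(max_elmbs())                          # 634
print(round(per_elmb(PHYSICAL_CHANNELS)))   # 203
print(round(per_elmb(CONTROL_REGISTERS)))   # 1167
```

On these numbers each microcontroller looks after roughly 200 physical channels and a little over 1000 registers, which illustrates the degree of parallelism discussed above.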

II. ELMB FIRMWARE DEVELOPMENT

This section describes the main procedures added to the ELMB microcontroller firmware specifically for the LHCb Muon Detector.

A. Internal Communication Checking

Many procedures have been created to ensure correct communication between an ELMB and all the components it controls. An ELMB checks whether such devices respond to commands and, in case of failure, sends an indication to the high-level control software. Test routines are executed at system start-up and can also be triggered by SDO requests from a client, allowing communication to be verified during normal operation. In addition, the firmware management of the node identification address (node ID) and of the baud rate has been upgraded so that both can be set remotely. The node ID is obtained from DRC crate backplane signals, while the baud rate can be modified by writing to the ELMB internal EEPROM at address 0x108.

B. I2C Communication

ELMB firmware works as a complete I2C [7] client device, verifying SCL/SDA conditions and the acknowledge bit generated during each data transfer. I2C is considered to be a safe protocol mainly because of its acknowledgment signal which is used to confirm transactions after each data transfer. For front-end electronics a communication failure can leave detector sectors malfunctioning or even completely off. It is crucial to have a robust communication management between control and front-end electronics. Hence, the following protections have been taken:

- The acknowledge I2C bit is always controlled.
- Write and read commands are verified inside the firmware. In case of failure the firmware attempts recovery by repeating the operation.
- If the failure condition persists, an emergency message and an error message are sent to the CAN client. In this way, a communication failure can be detected by the CAN client and remedial decisions can be made by the high-level software.

I2C devices controlled by a microcontroller are: FEBs, a TTCrx, an EEPROM, FPGAs and the I/O ports devices (Philips PCF8575).
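
The protections listed above amount to a write, read-back and retry loop with escalation to the CAN client. The sketch below illustrates that logic in Python rather than in the firmware's own language; `FakeI2CBus`, `write_verified`, the retry count of three and the message text are all hypothetical names and values chosen for illustration, not taken from the ELMB firmware.

```python
class FakeI2CBus:
    """Stand-in for the I2C hardware: the first `failures` writes are not acknowledged."""

    def __init__(self, failures=0):
        self.failures = failures
        self.registers = {}

    def write(self, address, register, value):
        """Return True when the device acknowledges the transfer."""
        if self.failures > 0:
            self.failures -= 1
            return False                      # no acknowledge bit seen
        self.registers[(address, register)] = value
        return True

    def read(self, address, register):
        """Return the stored value, or None if nothing was written."""
        return self.registers.get((address, register))


def write_verified(bus, address, register, value, max_attempts=3, report=print):
    """Write, read back and retry; report upstream if the failure persists."""
    for attempt in range(1, max_attempts + 1):
        acknowledged = bus.write(address, register, value)
        if acknowledged and bus.read(address, register) == value:
            return attempt                    # number of attempts that were needed
    # Hypothetical stand-in for the emergency/error messages sent to the CAN client.
    report(f"EMERGENCY: I2C device 0x{address:02X}, register {register}: no valid response")
    return None


bus = FakeI2CBus(failures=2)
print(write_verified(bus, 0x20, 1, 0x55))     # 3
```

Passing the reporting function as a parameter keeps the retry policy separate from the transport used to notify the client, which makes the loop easy to exercise without hardware.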

C. SPI Memory Communication

A library has been developed within the ELMB firmware to read and write data on the Serial Peripheral Interface (SPI) flash memory. With this library the memory can be used to store large amounts of data, as is needed for example during a threshold scan (see section II.G). The SPI device controlled by the microcontroller is the AT45DB041 flash memory chip; a Service Board houses 4 such devices, one per ELMB. Its data space is organized into 2048 pages of 264 bytes.
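
The page organization quoted above (2048 pages of 264 bytes) fixes how a linear data offset maps onto the memory. The following sketch shows that mapping and how a buffer can be split so that no chunk crosses a page boundary; the function names `locate` and `split_into_pages` are hypothetical and do not describe the actual firmware library.

```python
PAGE_SIZE = 264      # bytes per page, as quoted in the text
PAGE_COUNT = 2048    # pages in the device, as quoted in the text


def locate(offset):
    """Map a linear byte offset to (page, byte-within-page)."""
    if not 0 <= offset < PAGE_SIZE * PAGE_COUNT:
        raise ValueError("offset outside the flash data space")
    return divmod(offset, PAGE_SIZE)


def split_into_pages(start, data):
    """Split `data` into (page, byte, chunk) pieces that never cross a page boundary."""
    chunks = []
    offset = start
    remaining = bytes(data)
    while remaining:
        page, byte = locate(offset)
        room = PAGE_SIZE - byte               # free bytes left in this page
        chunk, remaining = remaining[:room], remaining[room:]
        chunks.append((page, byte, chunk))
        offset += len(chunk)
    return chunks


print(PAGE_SIZE * PAGE_COUNT)                 # 540672
print(locate(1000))                           # (3, 208)
print(split_into_pages(260, b"abcdefghij"))   # [(0, 260, b'abcd'), (1, 0, b'efghij')]
```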

D. Front-end Electronics Control

Many SDO and segmented SDO objects have been developed to read and write front-end parameters such as threshold, signal delay, signal width, counters and logic combination. In addition, a PDO message has been developed to identify the front-end boards and check that they are connected and responding. The CANopen message types (SDO, segmented SDO or PDO) have been chosen to optimize the communication traffic while respecting the front-end functional structure.

Using a PDO to monitor the front-end state allows the ELMB to be configured to send a time-triggered message to a client without the need for a request from that client. This makes it possible to monitor continuously whether the front-end boards are working correctly. In addition, the microcontroller is able to store the I2C addresses of its front-end boards. With this feature it is possible to send general commands that are applied only to a subset of front-end boards: those listed by the microcontroller. Procedures using this feature include threshold scan, test pulse generation and DLL calibration.
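
The idea of a stored address list that scopes general commands can be sketched as follows. This is an illustration only: the class `FebRegistry`, its methods and the example addresses are hypothetical, and the real firmware keeps this list inside the microcontroller and acts on it through I2C.

```python
class FebRegistry:
    """Keeps the I2C addresses of the front-end boards that answered a scan."""

    def __init__(self):
        self.addresses = []

    def scan(self, candidates, responds):
        """Store the candidate addresses for which `responds(address)` is true."""
        self.addresses = [a for a in candidates if responds(a)]
        return self.addresses

    def apply_to_all(self, command):
        """Run `command(address)` on every listed board and collect the results."""
        return {a: command(a) for a in self.addresses}


# Hypothetical example: only two of four candidate addresses are populated.
present = {0x10, 0x12}
registry = FebRegistry()
print(registry.scan(range(0x10, 0x14), lambda a: a in present))   # [16, 18]
print(registry.apply_to_all(lambda a: "threshold set"))
```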

E. Front-end Test Pulse

Many procedures have been written in the microcontroller firmware for testing the front-end channels and for detector time alignment. With a single SDO command it is possible to send a programmable number of pulses to many front-end boards in parallel. In the same way a periodic pulse signal can be generated; the spacing between consecutive pulses is also defined within the SDO message. Pulses can also be generated synchronously with the LHC machine clock, which allows the timing response of every single channel of the LHCb Muon readout apparatus to be aligned. The main steps are summarized here: 1) FEB thresholds are set to a high value to avoid noise-triggered pulses. 2) FEBs are set to auto-injection mode by the SBs. 3) Pulses synchronous with the machine clock are generated by the PDM-SB electronics. 4) ODE time histograms are produced to measure the misalignment of incoming pulses, the bunch-crossing identifier and the phase of the incoming pulses. 5) FEB delay registers are then adjusted by the SB system according to the ODE timing measurements. Cable lengths and circuitry delays of the DRC system should be taken into consideration to achieve an effective time calibration (e.g. the difference in time of pulses generated in different DRC crate slots can reach almost 10 ns).

Tests have been carried out to validate the entire chain as presented in Figure 7. Figure 8 shows two histograms generated at the ODE board after sending 1000 pulses from the PDM-SB system. Each bin of the ODE histogram is equivalent to about 1.5 ns. The fact that all injected pulses fall into a single bin means that the system is synchronous with the LHC machine clock, with a jitter lower than 1.5 ns. The plot on the left-hand side was produced by means of the front-end board digital pulse generation feature, while the histogram on the right was obtained via the charge auto-injection facility. In the latter case two distinct peaks can be seen: because of the high charge injected by the front-end auto-injection setup, a single charge injection pulse results in two output pulses, which form a second peak. Since the injected charge is not adjustable, the second peak cannot be cancelled out at the readout circuitry level.
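
The single-bin argument and the delay correction derived from the ODE histogram can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the 1.5 ns bin width comes from the text, while the function names, the simulated arrival times and the target bin are hypothetical, and the step size of the FEB delay registers is deliberately left out because it is not given here.

```python
from collections import Counter

BIN_NS = 1.5   # approximate ODE TDC bin width quoted in the text


def histogram(arrival_times_ns, bin_ns=BIN_NS):
    """Fill a TDC-like histogram: bin index -> number of pulses."""
    return Counter(int(t // bin_ns) for t in arrival_times_ns)


def is_single_bin(hist):
    """True when every pulse fell into the same bin (jitter below one bin width)."""
    return len([b for b, n in hist.items() if n > 0]) == 1


def peak_offset_ns(hist, target_bin, bin_ns=BIN_NS):
    """Time shift between the most populated bin and the desired bin."""
    peak_bin = max(hist, key=hist.get)
    return (peak_bin - target_bin) * bin_ns


# Hypothetical data: 1000 pulses all arriving 7.6 ns after the clock edge.
hist = histogram([7.6] * 1000)
print(dict(hist))                            # {5: 1000}
print(is_single_bin(hist))                   # True
print(peak_offset_ns(hist, target_bin=3))    # 3.0
```

The offset returned by `peak_offset_ns` is the quantity that would be converted into a delay-register adjustment in step 5 of the procedure.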

Pulses generated at specific bunch-crossing IDs have been tested as well. The histograms of Figure 9 show the front-end timing response of an M4 inner chamber at the ODE board after sending about 223 thousand pulses from the PDM-SB electronics. Since all the FEBs are in the same chain, the only physical element that can influence the timing measurement is the distance between the front-end boards (the cables carrying the test pulse from one FEB to the next). As a result, a timing shift of 1 ns per front-end board is observed.

F. Noise Rate

The LHCb Muon System requires front-end electronics noise rate below 1 kHz to guarantee a highly efficient event triggering. A specific procedure has been implemented in the ELMB firmware to measure the noise rate of each detector physical channel. Microcontrollers working in parallel are then able to measure the noise rate of the entire detector in a short period of time. Such a firmware procedure can be used to monitor noise rates of all the detector physical channels before and during detector operation. Noisy channels can then have their threshold readjusted to minimize trigger efficiency deterioration.
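
The noise-rate check reduces to dividing a channel count by the counting time and comparing the result with the 1 kHz requirement. The sketch below shows this; the function names, the channel numbering and the example counts are hypothetical, and the way the firmware actually gates the FEB counters is not described here.

```python
NOISE_LIMIT_HZ = 1000.0   # requirement quoted in the text: below 1 kHz


def noise_rate_hz(counts, gate_s):
    """Noise rate from the number of counts accumulated during `gate_s` seconds."""
    if gate_s <= 0:
        raise ValueError("gate time must be positive")
    return counts / gate_s


def noisy_channels(counts_by_channel, gate_s, limit_hz=NOISE_LIMIT_HZ):
    """Channels whose measured rate exceeds the limit, in ascending order."""
    return sorted(
        channel
        for channel, counts in counts_by_channel.items()
        if noise_rate_hz(counts, gate_s) > limit_hz
    )


# Hypothetical counts collected during a 1 s gate.
counts = {0: 50, 1: 2500, 2: 999}
print(noisy_channels(counts, gate_s=1.0))   # [1]
```

Channels returned by `noisy_channels` are the candidates for the threshold readjustment mentioned above.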

G. Threshold Scan

The threshold scan procedure measures the Equivalent Noise Charge (ENC) present at the front-end channels. Since the scan procedure runs at microcontroller level, many chambers can be scanned in parallel, minimizing the time needed to scan all 122000 detector channels. During a threshold scan, data are stored in the local flash memory. To read out data from memory, a segmented SDO has been implemented. Once data are transferred to the control computers, analysis procedures are executed to calibrate the thresholds and to measure the ENC level of each single channel. In this way a complete picture of the electronics noise level of the LHCb Muon Detector can be obtained. Threshold scan data will also allow variations in the performance of the front-end chips to be monitored over the LHCb operational lifetime.

Two routines have been introduced into the ELMB firmware; one of them foresees usage of zero suppression to reduce memory utilization.
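
Zero suppression saves memory when most scan points contain no counts. The sketch below shows one simple form of it, storing only (index, value) pairs for non-zero entries; this encoding is an assumption made for illustration, since the format used by the ELMB routine is not described here.

```python
def zero_suppress(counts):
    """Keep only (index, value) pairs for the non-zero entries."""
    return [(index, value) for index, value in enumerate(counts) if value != 0]


def expand(pairs, length):
    """Rebuild the full list of counts from the suppressed pairs."""
    counts = [0] * length
    for index, value in pairs:
        counts[index] = value
    return counts


# Hypothetical scan: counts are non-zero only at the lowest threshold steps.
scan = [812, 97, 3, 0, 0, 0, 0, 0]
packed = zero_suppress(scan)
print(packed)                               # [(0, 812), (1, 97), (2, 3)]
print(expand(packed, len(scan)) == scan)    # True
```

With this encoding the saving depends on the fraction of empty entries: each stored pair costs more than a plain value, so suppression only pays off when zeros dominate.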

III. SUPERVISORY SYSTEM

The LHCb DRC supervisory system can be divided into two parts, PVSS [8] and FSM. PVSS (Prozess Visualisierungs und Steuerungs System) is the name of the SCADA system chosen as the common slow control software package for the LHC experiments. It is a complete SCADA system with data acquisition, archiving, trending, alarm handling and data display functions. These functions are handled by different PVSS modules that communicate via the TCP/IP protocol, which makes PVSS especially well suited to building a distributed control system. In the DRC system, about 122000 FE channels, or roughly 700000 control registers, will be managed by PVSS-based software running in distributed mode on six computers. In addition, many user-friendly panels have been developed, several of which are shown in Figure 14.

A Finite State Machine is a representation of an event-driven reactive system. An FSM hierarchy has been designed by means of well-defined commands and rules that describe the state transition conditions and the automatic reactions to variations in detector and electronics parameters. Figure 13 shows a scheme of the DRC FSM hierarchy and its main possible states.

Figure 13: FSM hierarchy scheme for detector readout supervision

The PVSS and FSM structures can be built by a custom-made script from a mapping file describing all the connections from the devices to be monitored up to the high-level control components. This makes the system very flexible; the whole system can be reconfigured in a few minutes.
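
How a hierarchy can be generated from a mapping and how states propagate upwards can be sketched as follows. Everything named here is hypothetical: the mapping format (child, parent pairs), the node names and the three states are chosen for illustration only, and the rule that a parent takes the most severe state of its children is an assumption, not the rule set of the actual DRC FSM.

```python
# Hypothetical states, ordered from least to most severe.
SEVERITY = ["READY", "NOT_READY", "ERROR"]


def build_tree(mapping):
    """Turn (child, parent) pairs from a mapping file into parent -> [children]."""
    tree = {}
    for child, parent in mapping:
        tree.setdefault(parent, []).append(child)
    return tree


def node_state(node, tree, device_states):
    """State of a node: its own state for a device, else the worst child state."""
    children = tree.get(node)
    if not children:
        return device_states[node]
    child_states = [node_state(child, tree, device_states) for child in children]
    return max(child_states, key=SEVERITY.index)


# Hypothetical mapping: two ELMBs on one Service Board, plus a PDM, in one crate.
mapping = [
    ("ELMB_1", "SB_1"),
    ("ELMB_2", "SB_1"),
    ("SB_1", "CRATE_1"),
    ("PDM_1", "CRATE_1"),
]
tree = build_tree(mapping)
states = {"ELMB_1": "READY", "ELMB_2": "ERROR", "PDM_1": "READY"}
print(node_state("CRATE_1", tree, states))   # ERROR
```

Because the tree is derived entirely from the mapping, changing the file and rerunning the build is enough to obtain a different hierarchy, which is the flexibility described above.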

IV. CONCLUSIONS

The main goals achieved by the DRC system developments can be summarized as follows:

- Shift of time consuming procedures from CPUs into ELMB firmware, resulting in an increased bandwidth of the control system.
- ELMB firmware has been developed to increase the likelihood of error correction without involvement of high level software.
- The apparatus can be controlled without direct human intervention. This is essential, since the LHCb detector will not be accessible for long periods.
- Control electronics (ELMB firmware), PVSS and FSM can be easily remotely reconfigured.
- SB-PDM synchronous pulse generation, together with the front-end and ODE control procedures, provides a straightforward method to check the complex connectivity of the detector readout apparatus and an uncomplicated technique to time align all the detector readout channels without beam.

V. REFERENCES