Over the past few years, quantum computing has experienced remarkable growth—from early demonstrations of quantum supremacy to breakthroughs in error suppression using advanced error correction techniques. While last year’s workshop focused on how classical computing can bolster NISQ-era quantum devices to overcome environmental noise and technological limitations, the landscape is rapidly evolving. Today’s research is shifting toward fault-tolerant quantum computing (FTQC) systems capable of reliable, large-scale computation.
QCCC-26 aims to push these boundaries further by exploring innovative designs for quantum-classical cooperative computing (QCCC) systems that span both the NISQ and FTQC eras. The workshop will emphasize architectural and system-level approaches where classical computing not only enhances performance and scalability but also mitigates challenges such as qubit decoherence, constrained interconnectivity, and the overhead of robust error correction. Approaches demonstrated on current quantum platforms (e.g., IBMQ, IonQ, Quantinuum, QuEra) are also welcome.
By bridging past insights with new technological directions, QCCC-26 seeks to bring together researchers from academia, industry, and national laboratories to advance the next generation of hybrid quantum-classical computing systems.
The call for papers can be downloaded here in PDF format.
QCCC-26 will be co-located with ISCA 2026 in Raleigh, NC, USA, on Saturday, June 27, 2026 (half-day).
Topics of interest for this workshop include, but are not limited to:
Topics that are not relevant include pure quantum or pure classical algorithm/hardware design, and benchmarking of quantum algorithms/devices.
All deadlines are Anywhere on Earth (AoE).
We invite 2–4 page extended abstracts (excluding references). Submissions should adhere to standard IEEE or ISCA workshop formatting guidelines. Accepted abstracts can be included in the workshop proceedings.
Accepted papers will be given 15 minutes to present at the workshop. Detailed guidance on the presentation format will be announced soon.
Papers are to be submitted electronically through EasyChair.
At least one author of each accepted submission must register for the workshop and present the work.

Prof. Huiyang Zhou
Professor
Dept. of Electrical and Computer Engineering
North Carolina State University
Exploring Hybrid CV-DV Quantum Computing Systems: A Computer Architect’s Perspective
This talk offers an introduction to hybrid continuous-variable (CV) and discrete-variable (DV) quantum computing systems, tailored to computer architects with a background in qubit-based (i.e., DV) quantum computing. I will share my experience in exploring hybrid CV-DV quantum computing systems, including analyzing their advantages over DV-only or CV-only quantum computing systems, reasoning about hybrid CV-DV circuits, benchmarking, and compilation. The goal of the talk is to inspire more computer architecture and systems researchers to join the effort to develop this emerging quantum computing paradigm.
| Time | Title | Author |
|---|---|---|
| 1:30 - 1:35 | Workshop Opening | |
| 1:35 - 2:05 | Invited talk | Prof. Huiyang Zhou |
| 2:05 - 2:25 | Contributed Talk 1: Real-Time Post-Decoding for Quantum Error Correction | Shipra Singh, Narges Alavisamani and Ramin Ayanzadeh |
| 2:25 - 2:30 | Break | |
| 2:30 - 2:50 | Contributed Talk 2: An Analysis of Speculative Window Decoders for Quantum Error Correction | Jocelyn Li and Margaret Martonosi |
| 2:50 - 3:10 | Contributed Talk 3: A Quantum-Classical Approach to Distributed Lattice Surgery via Heralded Punctures | Shuwen Kan, Chenxu Liu, Ying Mao and Samuel Stein |
| 3:10 - 3:30 | Contributed Talk 4: Synthesizing Compound Pulse Gadgets for Hamiltonian Simulation on Trapped-Ion Platforms | Ria Patel, Masoud Hakimi Heris, Yuan Liu and Frank Mueller |
| 3:30 - 4:00 | Coffee Break | |
| 4:00 - 4:20 | Contributed Talk 5: NWQRE: A Modular Quantum Resource Estimation Workflow for Fault-Tolerant Algorithms | Zhixin Song, Meng Wang, Muqing Zheng, Samuel Stein, Spencer H. Bryngelson, Johannes Muelmenstaedt, Xiangyu Li, Ang Li and Chenxu Liu |
| 4:20 - 4:40 | Contributed Talk 6: Diagonal-Aware SpMV and SpMSpM Kernels for GPU-Accelerated Hamiltonian Simulation | Yuchao Su and Frank Mueller |
| 4:40 - 4:50 | Break | |
| 4:50 - 5:10 | Contributed Talk 7: A Catalyzed Approach to Applying Fault Tolerant T^{1/2^k} Rotations to Surface Codes | Tanner Smith, Tamara Lehman and Ramin Ayanzadeh |
| 5:10 - 5:30 | Contributed Talk 8: Toward Compact Multiqubit Gates on Trapped Ions: A Hardware-Aware Pulse Synthesis | Masoud Hakimi Heris, Kevin J. Joven, Yuan Liu and Frank Mueller |
Real-Time Post-Decoding for Quantum Error Correction
Real-time decoding is a critical bottleneck in Quantum Error Correction (QEC), where decoding must complete within a QEC cycle time (e.g., within 1 µs in superconducting quantum systems). We observe that fast heuristic decoders such as Union-Find (UF) complete in tens of nanoseconds on average, leaving more than 95% of the time budget unused. Based on this observation, we propose real-time post-decoding, a new direction that uses the remaining time to refine decoding outcomes within the same cycle. We show that in most failed UF decodings, errors are localized: only a few pairs are incorrectly matched, while larger deviations are rare, forming a long-tail distribution. We introduce TailCut, the first realization of real-time post-decoding, which uses the remaining time and available FPGA resources to perform bounded local re-matching, correcting dominant mismatches without exceeding the real-time budget or degrading performance. Our experiments show that TailCut corrects up to 63% of raw UF failures, corresponding to a logical error rate (LER) reduction of up to 2.7x, while preserving real-time execution. These results establish real-time post-decoding as a new direction for accurate real-time quantum error correction.
An Analysis of Speculative Window Decoders for Quantum Error Correction
A Quantum-Classical Approach to Distributed Lattice Surgery via Heralded Punctures
Modular quantum computing distributes computation across multiple nodes, with one approach for inter-node operations realized through heralded Bell-pair generation. In fault-tolerant logical computation with the surface code, entangling gates between distant logical qubits are implemented via lattice surgery, which requires repeated rounds of physical entangling gates to perform stabilizer checks whose outcomes are then passed to a decoder. The number of “seam” stabilizers that require remote physical operations scales with the code distance, demanding a steady supply of Bell pairs across the module boundary. In each round, the heralding signals for Bell-pair generation determine which Bell pairs are available and dictate how the circuit schedule should handle heralds that fail to arrive in time. One option is synchronous: defer syndrome extraction until every required herald has succeeded, at the cost of idling errors on every physical qubit involved in the logical operation. We instead study an asynchronous schedule that skips only the operations whose Bell-pair heralds have not arrived, accepting the temporary puncture of the corresponding stabilizer in that round. This strategy is represented by a heralded, single-round absence of a seam stabilizer measurement at a known spacetime location, accompanied by an online rewrite of the decoder’s detector error model (DEM) in which the punctured detector is deleted and its temporal neighbors are merged. We study the effective-distance reduction across puncture cluster configurations and show, via Stim simulations on a distance-11 lattice-surgery patch, that the spacetime location of punctures materially affects the logical error rate even when the effective distance is unchanged.
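The idea of deleting a punctured detector and merging its temporal neighbors can be illustrated with a small sketch. This is not the authors' implementation and does not touch a real detector error model. It assumes the usual convention that a detector is the parity of two consecutive measurements of the same stabilizer, and it ignores the first-round detector against the initial state. The function name `detectors_with_punctures` and the use of `None` to mark a round skipped because a herald did not arrive are hypothetical choices made for illustration.

```python
def detectors_with_punctures(outcomes):
    """Build detectors for one stabilizer across rounds, skipping punctures.

    outcomes: one entry per round, 0 or 1 for a measured outcome,
              None where the measurement was skipped (hypothetical marker).
    Returns a list of (earlier_round, later_round, parity) tuples.
    """
    detectors = []
    prev_round = None  # most recent round that actually has an outcome
    for rnd, outcome in enumerate(outcomes):
        if outcome is None:
            # Punctured round: no detector is anchored here. The next
            # available round is compared with prev_round instead, which
            # merges the two detectors that would have used this outcome.
            continue
        if prev_round is not None:
            detectors.append((prev_round, rnd, outcomes[prev_round] ^ outcome))
        prev_round = rnd
    return detectors


# Round 2 is punctured, so rounds 1 and 3 are compared directly.
example = detectors_with_punctures([0, 0, None, 1, 1])
```

In this toy example the merged detector spans rounds 1 to 3, so an event is still detected, but its time of occurrence is known less precisely than it would be without the puncture.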
Synthesizing Compound Pulse Gadgets for Hamiltonian Simulation on Trapped-Ion Platforms
Standard gate-level transpilation introduces significant physical noise and overhead for high-precision quantum algorithms, such as the Quantum Singular Value Transformation (QSVT), on near-term trapped-ion hardware. Current compiler treatments approximate discrete units, forcing the physical control layer to execute highly fragmented laser pulses. To address this hardware-software disconnect, this work introduces a holistic pulse synthesis strategy that bypasses discrete gate-stitching to compile algorithms directly into continuous compound pulse gadgets. As a proof-of-concept, we target Hamiltonian simulation of the H2 molecule, block-encoding the problem into a QSVT circuit to approximate the time-evolution operator U = e^{-iHt} across 3 computational ions (2 system, 1 ancilla). We utilize the Gradient Ascent Pulse Engineering (GRAPE) algorithm to generate these compound gadgets and evaluate our methodology using noisy Lindblad master equation simulations. Preliminary observations indicate that the proposed strategy achieves significant temporal compression, reducing the total pulse schedule duration compared to standard compilers. Furthermore, synthesizing operations holistically eliminates the control-layer latency associated with discrete pulse lookup overhead. By streamlining the physical control schedule, this methodology offers a promising pathway to execute operations faster, highlighting the potential for compound gadgets to increase the computational depth achievable within fundamental T2 decoherence limits.
NWQRE: A Modular Quantum Resource Estimation Workflow for Fault-Tolerant Algorithms
Quantum resource estimation is essential for assessing whether fault-tolerant quantum computers can provide practical computational advantage, but existing estimates are often difficult to compare because they depend on different algorithmic assumptions, compilation pipelines, and hardware models. We present NWQRE, a modular quantum resource estimation workflow that connects quantum algorithm instances to fault-tolerant resource metrics, including Clifford+$T$ gate counts, logical resources, physical qubit estimates, and runtime under surface-code-based assumptions. Our workflow incorporates circuit decomposition, rotation optimization, Clifford+$T$ synthesis, and fault-tolerant compilation through Pauli-based computation, enabling systematic comparison across algorithmic building blocks and architecture-level assumptions. We benchmark the tool on small problem instances to illustrate how resource costs scale with problem size, rotation precision, and compilation choices. As a demonstration, we apply the workflow to components of a linear-combination-of-Hamiltonian-simulation construction for the diffusion equation, showing how a high-level quantum differential-equation solver can be translated into concrete fault-tolerant resource estimates. Rather than providing a full application-scale resource comparison, this work establishes a reusable toolchain for rapid, transparent, and extensible resource analysis of scientific quantum algorithms.
Diagonal-Aware SpMV and SpMSpM Kernels for GPU-Accelerated Hamiltonian Simulation
Hamiltonian simulation plays a key role in quantum computing because such quantum algorithms, once simulated classically, are seen as a means of validating quantum devices before they are deployed. In Hamiltonian simulation, state-vector evolution and explicit operator construction are the two dominant computational components. State-vector simulation repeatedly applies a Hamiltonian, or a low-order polynomial in the Hamiltonian, to a quantum state, making sparse matrix-vector multiplication (SpMV) the dominant kernel. Explicit operator construction forms derived sparse operators, making sparse matrix-matrix multiplication (SpMSpM) the dominant kernel.
Hamiltonians arising in quantum simulation are highly structured and contain only a small set of active diagonal offsets. However, current GPU sparse kernels do not fully exploit this diagonal structure. To address this gap, we propose two diagonal-aware GPU kernels: a reconstructed diagonal-tile SpMV kernel that maps local diagonal computation to tensor-core matrix multiplication, and a guided planner for the diagonal-interaction SpMSpM kernel that shifts output discovery to a host-side planning stage before tiled GPU execution. We evaluate both kernels on representative diagonal-sparse Hamiltonians from HamLib using an NVIDIA H100 GPU.
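To make "a small set of active diagonal offsets" concrete, here is a minimal CPU sketch of SpMV over a matrix stored by its diagonals. It is a plain-Python illustration of the general diagonal storage idea, not the tensor-core kernel described in the abstract. The function name `dia_spmv` and the storage convention, where `diagonals[k][i]` holds the entry in row `i` and column `i + offsets[k]`, are assumptions made for this example.

```python
def dia_spmv(offsets, diagonals, x):
    """Compute y = A @ x for a square matrix A stored by diagonals.

    offsets:   list of diagonal offsets (0 = main, >0 above, <0 below).
    diagonals: diagonals[k][i] is A[i][i + offsets[k]]; entries whose
               column falls outside the matrix are ignored.
    x:         input vector of length n.
    """
    n = len(x)
    y = [0.0] * n
    for off, diag in zip(offsets, diagonals):
        # Only rows whose column index i + off lies inside [0, n).
        first_row = max(0, -off)
        last_row = min(n, n - off)
        for i in range(first_row, last_row):
            y[i] += diag[i] * x[i + off]
    return y


# 4x4 tridiagonal example: 2 on the main diagonal, -1 next to it.
offsets = [-1, 0, 1]
diagonals = [[-1.0] * 4, [2.0] * 4, [-1.0] * 4]
result = dia_spmv(offsets, diagonals, [1.0, 2.0, 3.0, 4.0])
```

The work here is proportional to the number of active diagonals times the matrix dimension, with no per-element column indices to load. That regular access pattern is what makes diagonal structure attractive for GPU kernels.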
A Catalyzed Approach to Applying Fault Tolerant T^{1/2^k} Rotations to Surface Codes
The realization of high-precision non-Clifford rotations remains a critical bottleneck for fault-tolerant quantum computing, as traditional approaches rely on resource-intensive magic state distillation (MSD) and state injection. Existing MSD factories are typically optimized for T-gates and exhibit high spatial overheads when generalized to arbitrary small-angle rotations. In this paper, we propose the Catalyzed Rotation Unit (CRU), a specialized microarchitecture designed to apply $T^{1/2^k}$ rotations directly to logical qubits in a surface code. Unlike the Reed-Muller $T^{1/2^k}$ distillation factories, which suffer from exponential spatial scaling, the CRU utilizes a modular framework of catalyzed circuits to achieve a footprint that scales linearly with rotation precision k. By prioritizing a reduced physical footprint, our architecture enables the execution of higher-precision rotations on resource-constrained chips that would otherwise be unable to accommodate a standard Reed-Muller factory alongside the user’s algorithm. This work establishes the CRU as a foundational microarchitectural primitive, bridging the gap between abstract catalyzed transformations and the physical resource constraints of near-term quantum computers.
Toward Compact Multiqubit Gates on Trapped Ions: A Hardware-Aware Pulse Synthesis
Multiqubit gates such as the Toffoli are essential building blocks for quantum algorithms, but their standard realization on trapped-ion hardware requires gate decomposition that adds significant circuit depth. This overhead is particularly relevant for advanced algorithmic frameworks such as quantum signal processing and quantum singular value transformation, which rely heavily on multi-controlled operations. As part of a broader investigation into native multiqubit gates on trapped ions, we examine the single-step Toffoli construction of Wang et al., which has remained theoretical due to the controlled-rotation primitive it requires. Recent advances in effective-Hamiltonian engineering through Magnus expansions of bichromatic spin-dependent forces provide a route to that primitive. Using HyPulse, we apply optimal-control pulse design on realistic trapped-ion calibration parameters and synthesize a pulse that performs the Toffoli gate using the same physical resources as the construction of Wang et al. The result demonstrates that single-step Toffoli synthesis is feasible within this operator structure using only native trapped-ion primitives, providing a complementary route alongside existing proposals.
For more information or any questions, please contact the organizing committee:
We look forward to seeing you at QCCC-26 in Raleigh, NC, USA!