Download Characterizing the Performance Effect of Trials and Rotations in

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Topological quantum field theory wikipedia , lookup

Matter wave wikipedia , lookup

Ensemble interpretation wikipedia , lookup

Renormalization wikipedia , lookup

Aharonov–Bohm effect wikipedia , lookup

Renormalization group wikipedia , lookup

Bohr–Einstein debates wikipedia , lookup

Double-slit experiment wikipedia , lookup

Quantum dot cellular automaton wikipedia , lookup

Scalar field theory wikipedia , lookup

Basil Hiley wikipedia , lookup

Delayed choice quantum eraser wikipedia , lookup

Theoretical and experimental justification for the Schrödinger equation wikipedia , lookup

Bell test experiments wikipedia , lookup

Particle in a box wikipedia , lookup

Measurement in quantum mechanics wikipedia , lookup

Density matrix wikipedia , lookup

Max Born wikipedia , lookup

Quantum field theory wikipedia , lookup

Hydrogen atom wikipedia , lookup

Bell's theorem wikipedia , lookup

Copenhagen interpretation wikipedia , lookup

Quantum entanglement wikipedia , lookup

Quantum decoherence wikipedia , lookup

Quantum dot wikipedia , lookup

Algorithmic cooling wikipedia , lookup

Path integral formulation wikipedia , lookup

Quantum electrodynamics wikipedia , lookup

Probability amplitude wikipedia , lookup

Coherent states wikipedia , lookup

Quantum fiction wikipedia , lookup

Many-worlds interpretation wikipedia , lookup

EPR paradox wikipedia , lookup

Orchestrated objective reduction wikipedia , lookup

History of quantum field theory wikipedia , lookup

Interpretations of quantum mechanics wikipedia , lookup

Quantum group wikipedia , lookup

Symmetry in quantum mechanics wikipedia , lookup

Canonical quantization wikipedia , lookup

Quantum key distribution wikipedia , lookup

Quantum cognition wikipedia , lookup

Quantum state wikipedia , lookup

Hidden variable theory wikipedia , lookup

Quantum machine learning wikipedia , lookup

T-symmetry wikipedia , lookup

Quantum teleportation wikipedia , lookup

Quantum computing wikipedia , lookup

Transcript
Characterizing the Performance Effect of Trials and Rotations
in Applications that use Quantum Phase Estimation
Shruti Patil∗ , Ali JavadiAbhari∗ , Chen-Fu Chiang† , Jeff Heckey‡ , Margaret Martonosi∗ and Frederic T. Chong‡
∗ Department
of Computer Science, Princeton University
of Mathematics and Computer Science, University of Central Missouri
‡ Department of Computer Science, University of California at Santa Barbara
∗
Email: {spatil,ajavadia,mrm}@princeton.edu, † [email protected], ‡ [email protected], ‡ [email protected]
† Department
Abstract—Quantum Phase Estimation (QPE) is one of the
key techniques used in quantum computation to design quantum
algorithms which can be exponentially faster than classical
algorithms. Intuitively, QPE allows quantum algorithms to find
the hidden structure in certain kinds of problems. In particular,
Shor’s well-known algorithm for factoring the product of two
primes uses QPE. Simulation algorithms, such as Ground State
Estimation (GSE) for quantum chemistry, also use QPE.
Unfortunately, QPE can be computationally expensive, either
requiring many trials of the computation (repetitions) or many
small rotation operations on quantum bits. Selecting an efficient
QPE approach requires detailed characterizations of the tradeoffs and overheads of these options. In this paper, we explore
three different algorithms that trade off trials versus rotations.
We perform a detailed characterization of their behavior on
two important quantum algorithms (Shor’s and GSE). We also
develop an analytical model that characterizes the behavior of a
range of algorithms in this tradeoff space.
I. I NTRODUCTION
A class of computational problems exist for which no
polynomial-time algorithm running on a classical Turing
Machine has yet been proposed. However, a number of
these algorithms—with important practical implications—have
been found to be efficiently solvable using a quantum computer. This is known as the BQP (Bounded-error Quantum
Polynomial-time) complexity class, and is conjectured to be
different from the P (Polynomial-time) or even the BPP
(Bounded-error Probabilistic Polynomial-time) class that can
be solved efficiently on classical computers. The hope of
accelerating important BQP class problems (e.g. factoring) has
spurred interest in building quantum computers and in devising
quantum algorithms to map computations onto them.
This paper studies an important technique in quantum
computation: Quantum Phase Estimation (QPE). QPE is a core
building block computation for various quantum applications
such as solving systems of linear equations [12], prime factorization [27], and finding answers to quantum many-body
problems [31, 32]. In fact, there are really only three core
building blocks for useful quantum algorithms: QPE, quantum
random walks [9] (used for some graph search algorithms), and
Grover’s function [11] (used for Grover’s search algorithm).
Because of QPE’s central importance in many QC applications, characterizing its performance and identifying efficient
implementation approaches is important. As one application
example, consider Ground State Estimation (GSE). Finding the
exact ground-state energy of a system of many particles is an
intractable task for classical computers due to the exponential
growth in computational cost as a function of the particles
involved. However, using quantum systems themselves to store
and process data, as is the case in quantum computers, this
problem can be efficiently solved. Molecular energies are
represented as eigenvalues of an associated Hamiltonian and
can be obtained by encoding a molecular wave function into
a set of quantum bits (qubits), simulating their evolution
using quantum operations on those qubits, and then extracting
the energy using quantum phase estimation. Experimental
realizations of QPE for calculating chemical system energies
have already been reported using linear optics [20].
The overall goal of QPE is to compute the eigenvalue of a
quantum unitary circuit. This circuit is also referred to as the
Oracle, since it can be regarded as a black box of computation,
whose eigenvalue will reveal important information for solving the overall problem. Different applications have different
oracle circuits of varying complexity, and this is a key factor in
the tradeoff space of QPE techniques discussed in this paper.
QPE approaches for obtaining phase estimations have
commonly rested on two underlying techniques:
• Quantum Fourier Transform (QFT): In this approach,
qubits are first computed on via the oracle function,
and then a QFT is performed to extract an eigenvalue.
A full implementation of QFT requires potentially
many, exponentially small quantum rotation operations, which may be costly to realize with good
precision.
• Repetitive Trials: Using repetitive trials involves applying the oracle several times, measuring samples
from the probability distribution of the eigenvectors, and finding the final answer via classical postprocessing of the answers; this reconstructs an estimate based on the collected measurements. This technique reduces the number of quantum rotation operations required (compared to QFT-based approaches)
but may involve many trials in order to obtain an
accurate estimate.
TABLE I: Comparison of three QPE Methods, with the idealized assumption of perfect rotation precision. (With more
realistic imperfect rotation precisions, AQFT will have a nonzero number of trials since it uses repeated trials to regain
precision.) ACPA is a middle ground between KHT and AQFT
in terms of the number of rotations and trials involved.
Number of Rotations
Number of Trials
KHT
None
Many
AQFT
Many
None
ACPA
Some
Some
Given these two general implementation styles, specific
QPE algorithms vary in their use of them. Table I shows
a coarse-level comparison of three distinct approaches for
Quantum Phase Estimation which cover a spectrum of possible
methods:
• Kitaev Hadamard Tests (KHT): The approach originally proposed by Kitaev [17] relies on a predetermined number of trials to achieve a desired target
for the error-rate and precision of estimation.
• Approximate Quantum Fourier Transform (AQFT):
This approach is based on QFT, but uses Approximate
QFT [4] instead of the full QFT in order to curb the
number of rotations.
• Arbitrary Constant Precision Algorithm (ACPA):
ACPA is an approach proposed by Ahmadi and Chiang
[3] which offers a method between KTH and AQFT in
terms of the number of rotations and trials involved.
This paper provides the first detailed performance comparison of these QPE techniques within real, full-scale QC applications. Overall this paper makes the following contributions:
• Through implementing different QPE methods, we
perform an empirical workload characterization of the
tradeoff between performing higher-precision rotation
operations versus performing more trials for obtaining
accurate phase estimates.
• We characterize two important quantum applications
that use quantum phase estimation as a fundamental
building block: Shor’s algorithm for prime factorization, and the Ground State Estimation algorithm for
calculating molecule energy levels. We analyze the
effect of different methods of QPE in each of these
algorithms.
• We find that practical resource constraints limit parallel execution within our QPE methods, heavily influencing which methods have the lowest runtime.
• We also find that the application also heavily influences the appropriate QPE method, since some
methods require more redundant execution of the
application than others.
The rest of this paper is organized as follows: Section II
gives background on basic QC concepts and our QC algorithm
toolflow. Section III describes QPE in detail, and Section IV
describes the different implementation approaches we consider.
Section V analyzes the resulting design space tradeoff. Section
VI discusses related work, and Section VII concludes. We also
include an appendix that derives key application characteristics
from Shor’s algorithm and Ground State Estimation.
II. BACKGROUND
This section offers a brief background on basic concepts
in quantum computation.
A. Quantum Bits and Operations
A quantum bit (or qubit) is not confined to a binary state.
At any time, the state of a qubit can be a linear superposition
of the 0 and 1 states denoted as |ψi = α|0i+β|1i, where α and
β are complex numbers subject to |α|2 +|β|2 = 1. Therefore, a
quantum state can be thought of as a vector of norm 1, with an
arbitrary phase—a concept which can be extended to n qubits
by thinking of the state of an n-qubit system as a unit vector in
the 2n -dimensional space. This can be represented as a point
on the surface of a 2n -dimensional sphere, commonly referred
to as the Bloch Sphere (Fig. 1).
Quantum operations are merely norm-preserving rotations,
which move the state of an n-qubit system along the surface
of the Bloch sphere. Consequently, every quantum operation is
a “rotation” operation. However, in the QPE subroutine which
is the focus of this paper, the term “rotations” specifically
refers to those that are performed about the Z axis only, to
distinguish a specific set of operations. For arbitrary rotation
angles, no straightforward, error-tolerant implementation on a
physical device is known. This forces us to approximate them
using a set of “standard” gates—akin to instruction selection
in classical compilers. Section III-A discusses this “rotation
decomposition” in more detail.
Fig. 1: The Bloch sphere is a good visualization of the quantum
state space. Points on the sphere correspond to unique states,
and quantum operations are equivalent to moves on the sphere.
B. Quantum Circuits
Ultimately, quantum computation must provide a classical
answer to a classical query. Therefore, it must be possible to
observe the state of the quantum system, via a measurement
operation. Measuring a quantum state collapses it to the
classical values of 0 or 1, with probabilities proportional to
the amplitude of the superposition coefficients.
Following the analogy of a classical logic circuit, a quantum circuit can be used to represent a collection of qubits
and gates, illustrating the operations performed in a quantum
algorithm step-by-step. In these circuits, quantum single- or
multi-bit operations are shown with boxes labeled with the operation. Measurement is shown with a “meter” symbol. Single
wires denote a qubit, and double wires denote classical bits
which are the outputs of measurement. Finally, “controlled”
operations are shown with a wire connecting the controlling
qubit to the target operation. The meaning is that the operation
will be performed if and only if the controlling qubit is in the
|1i state. The reader can refer to Figures 4-6 for examples of
quantum circuit diagrams.
This paper uses diagrams of quantum circuits to compactly express computations, but our toolflow (described in
Section II-D) starts from quantum algorithms written in a
high-level programming language and compiles down to an
assembly-language level. Resource estimates and algorithm
runtimes are generated by our compiler infrastructure. The use
of circuit diagrams to show the computation flow is analogous
to illustrating a classical computation using compiler-level
control/dataflow graphs.
C. Quantum Error Correction
Since quantum operations are inherently probabilistic, and
since qubits are extremely sensitive to noise, Quantum Error
Correction plays a central role in building quantum computers.
An important result known as the threshold theorem states that
quantum computation can be made robust against errors when
the error rate is below a certain threshold [2]. Intuitively, a
quantum computer can sustain its computation if a single error
can be corrected before two errors occur. Given an error correction scheme, a reliability threshold can be calculated. This
implies that scalable quantum computing is in theory feasible,
even though significant engineering challenges remain.
In order to take advantage of the threshold theorem,
Quantum Error Correction must occur periodically during
computation to ensure errors are corrected faster than they are
accumulated. A typical error correction scheme, for example,
uses a two-level, recursive Steane error correction code [29],
where each qubit is augmented with 49 extra qubits and each
timestep of computation is backed up by 23,409 additional
timesteps [22]. Reducing this cost is extremely important and
the amount of achieved reduction can easily determine whether
the computation executes in practical timescales (eg. not 100
years) [22].
Although the bulk of resources in any quantum circuit
implementation is due to added error correction redundancies (e.g. if every timestep of computation requires 23,409
timesteps of error correction), the algorithmic level (also
referred to as the logical level) has very high leverage in
controlling this. Algorithms with fast runtimes are less prone
to the accumulation of errors and can tolerate fewer error
correction steps. Thus, finding better high-level QC algorithms
and optimizing their implementation efficiency pays off multiplicatively because of the further benefits that accrue in
reducing quantum error correction required.
In our comparison of QPE schemes, we focus on logical qubits and logical timesteps. That is, we estimate costs
before error correction is added. We do this because error
correction overheads are heavily dependent upon technology
and coding choices. We attempt to remain independent of
those choices. Our logical comparison is quantitatively valid
as long as all the alternatives used end up using the same
amount of error correction per logical bit and logical operation.
Our comparisons, however, are still qualitatively valid when
trying to determine the QPE scheme with the lowest runtime.
Adding error correction will maintain runtime relationships,
since longer logical runtimes will experience equal or greater
error correction overhead than smaller runtimes.
D. Scaffold Language and the ScaffCC Compiler
This work characterizes the performance impact of QPE
implementation choices on full-scale quantum applications.
To accomplish this, we work with full QC apps and QPE
methods written in Scaffold, a C-like quantum programming
language [13]. Our evaluations of resource requirements and
program runtimes are based on analysis performed as part of
the ScaffCC compiler framework [14]. ScaffCC synthesizes
logical quantum circuits from a high-level language description. Scaffold includes data types to distinguish quantum and
classical bits and offers built-in functions for quantum gates.
Our work here primarily manipulates logical qubits without
considering their error correction. Subsequent passes could
implement additional functionality to co-schedule the logical
qubits and their gate operations along with QECC ancilla bits
and their operations.
%OperaAon%Control%
Classical%
State%
Quantum%
Core%
Quantum%
State%
Measurement%Outcome%
Fig. 2: Quantum computation is performed on a distinct “coprocessor” controlled by a classical computer.
E. Execution Model
Ultimately the QC programs we consider here would be
executed on QC hardware. While specific options for QC
implementation technology vary widely, Fig. 2 shows a general
approach in which quantum computations execute on a coprocessor unit controlled by a classical computer. Within the
quantum processor are several operating zones or gates, which
can each perform distinct operations in parallel on distinct
qubits. Each of the QC operating zones can be controlled
to execute any of the gate types this machine has chosen to
implement. The gate they implement can be changed from
cycle to cycle via control bits sent over from the classical
computer. (One can think of the QC operating zones being
analogous to an ALU in a classical computer; with a small
number of bits, ALUs can be controlled to perform an addition
one cycle, subtraction the next.) The runtime of a QC is, to
first order, the number of these gate operations (i.e., cycles
or steps) that need to occur in sequence in order to complete
the computation. (Several may happen in parallel.) When QC
algorithms are drawn as circuits, the timesteps needed to
complete the calculation is referred to as the circuit depth.
As with any real-world computer, computation on a QC
is fundamentally constrained by the availability of hardware
resources. For example, the number of operating zones available poses a fundamental limit on the number of simultaneous
parallel computations that are possible. Likewise, the number
of logical qubits is another fundamental constraint: in some
QCs, qubits are physically large (relative to state bits in
classical computers) and in many QCs, sufficient inter-qubit
spacing must be provided in order to minimize unintended state
interactions or interference that leads to qubit decoherence.
In addition, since each logical qubit must be underpinned by
dozens or hundreds of physical qubits implementing QECC,
the number of logical qubits required by a computation becomes a fundamental constraint. In this work, we use qubit
count as an indicator of parallelism: more qubits being computed upon means more parallelism. In addition, qubit count
can denote a resource constraint: some experiments limit the
maximum number of qubits to evaluate application runtime
under less idealized, finite-hardware assumptions.
III. Q UANTUM P HASE E STIMATION
If a quantum unitary operator U acting on an m-qubit input
vector yields:
U |ui = ei2πϕ |ui,
(1)
then |ui is said to be an eigenstate (or eigenvector) of the
corresponding unitary matrix of U . In this case, the eigenvalue
is a phase shift ei2πϕ that is introduced in the state vector.
The goal of QPE is to estimate the value of phase ϕ in the
ϕ
e = 0.x1 x2 x3 ...xn
Where 0.x1 x2 x3 ...xn is a notation for
xi ∈ {0, 1}.
(2)
Pn
i=1 xi ∗
1
2i
and
A. Quantum Rotation Decomposition
As mentioned in Section II-A, there are an infinite number
of single-qubit rotations that can be done on a Bloch sphere,
but we can implement a finite subset on a physical computer.
A major cost in some of the QPE approaches arises when algorithms must decompose arbitary rotations—especially precise
small rotations—into a sequence of physical operations.
Decomposition is possible due to the universality of a
small set of quantum gates for representing any single-qubit
operation. For example, Hadamard (H), Phase (S), ControlledNOT (CNOT) and the π8 -gates (T) form a universal set [6].
If we compare applying the decomposed set of gates instead
of the actual (perfectly precise) rotation to a particular qubit
state, the decomposition precision indicates how closely the
approximate outcome state would resemble the exact state (i.e.,
the likelihood that measuring it would yield the same answer.)
For an arbitrary angle, we need an approach for determining the correct decomposed sequence. In this paper we
employ the Single Qubit Circuit Toolkit (SQCT) proposed by
Kliuchnikov et al. [18], which offers practical decompositions
of up to accuracy 10−15 , based on the results of [19].
For the purpose of quantum phase estimation, we are
interested in rotation angles used by the quantum Fourier
transform, which are of the form R( 2π
), and the tradeoffs
2k
that are present in limiting their precision. Fig. 3 shows
the length of decomposition obtained from SQCT [28] for
these rotation operations. Higher precision decompositions can
be directly linked to a longer gate sequence, although at a
particular precision level, there is no significant difference
between different angles.
Using smaller angles in an algorithm means that the
precision of decomposition must be increased, in order to have
a faithful approximation of those rotations. Our model (Section
V) assumes that the presence of angle θ = 2π/2k means that
decomposition should take place at least with precision:
e < 1.
(3)
|θ − θ|
2k
The lower dotted line shows this trend, and guides the
actual precision that we should pick in different scenarios,
depending on the degree of the smallest angle present. This
increase in precision would affect the approximation of all
gates, thus being costly for circuit size.
Sequence$Length$in$the$Decomposi'on$of$R(2*π/2^k)$$
eigenvalue for a given unitary quantum circuit U . This phase
value reveals important information about the hidden structure
of the quantum oracle function in algorithms such as order
finding, the core computation of Shor’s algorithm.
Generally, estimating ϕ is a challenging task. Two factors
affect the adequacy of estimation: The precision δ determines
how close to the actual eigenvalue the estimation outcome
is. The error rate determines how likely it is, given the
probabilistic nature of quantum states, for the final estimate to
have the desired precision (equivalent to 1 − P rob(success)).
We use ϕ
e to denote an estimate of ϕ. Since the overall
phase shift lies in the [0, 2π) interval, an estimate of 0 ≤ ϕ < 1
with n bits of precision is written as:
1e212<p<1e210"
1e210<p<1e28"
1e28<p<1e26"
1e26<p<1e24"
1e24<p<1e22"
1e22<p<1e20"
Length"for"Required"Precision"
300"
250"
200"
150"
100"
50"
0"
0"
1"
2"
3"
4"
5"
6"
7"
8"
9" 10" 11" 12" 13" 14" 15" 16" 17" 18" 19" 20" 21" 22" 23" 24" 25" 26" 27"
k$
Fig. 3: The length of the sequence of gates generated by
SQCT to approximate a rotation operation with angle 2π
, as
2k
a function of k. Increased precision (p) yields progressively
longer sequences. For k = 0 there is no rotation, and for
k = 1, 2, 3 the standard gates of Z, S, T are obtained
respectively. For some loose precisions, small-enough angles
approximate to 0.
IV.
I MPLEMENTATIONS OF Q UANTUM P HASE
E STIMATION
This section describese the three QPE methods and their
relative strengths and weaknesses. Overall, they span a design space that presents tradeoffs of runtime (circuit depth),
operation-level parallelism (circuit width) and resource usage
(number of qubits and gates). They also implicitly differ in
the accuracy with which they estimate the phase of the input
eigenvector. In order to make the approaches comparable for
our characterization, we set each of their success probabilities
(1 − error rate) to 0.8 in our experiments, i.e. an 80% total
probability of obtaining the correct outcome.
A. Kitaev’s Hadamard Test with S gate (KHT)
Kitaev [17] proposed an algorithm that avoids arbitary
quantum rotations, which expand into long sequences of physical operations as shown in Fig. 3. Unfortunately, the algorithm
uses a large number of independent trials to perform QPE to
required precision. These trials can be parallelized, but can
require an impractical amount of quantum hardware to do so.
Our results in Section V-C will elaborate upon this.
Kitaev uses a series of Hadamard tests to perform the phase
estimation, each of which requires a single phase-shift gate S:
1 0
(4)
S=
0 i
This matrix is a transformation on a qubit vector |ψi =
α|0i + β|1i as previously described in Section II. The circuit
that accomplishes Kitaev’s Hadamard Test is shown in Fig. 4.
When applied to the |0i qubit (no probability of measuring
“1”), the Hadamard gate (H) results in a qubit that has equal
probability of |0i and |1i. The Hadamard gate is the inverse of
itself and is often applied again before measurement, as in the
Hadamard test. Each Hadamard test estimates one bit of the
1
[15]. The accuracy at which
phase ϕ with a precision of 16
we are computing the estimated phase ϕ̃ can be expressed as:
1
)>1−
(5)
2n
To achieve a desired success rate, that is to succeed with 1 − probability, we need mKHT trials of Kitaev’s Hadamard test
for each bit [3], where
P r(|ϕ − ϕ̃| <
mKHT = 55 ln
4n
∼ 55 ln(20n)
(6)
of all previous qubits. This approach achieves an asymptotic
probability of (4/π 2 ) when k > log2 n + 2 is selected [7],
thereby reaching the lower bounds of QFT success guarantees.
This bounds the number of rotations to O(n log2 n) limiting
k
their degrees to e2πi/2 . In practice, due to the logarithmic
reduction in circuit length, AQFT provides a viable alternative
that performs just as well (and sometimes better) than QFT in
the presence of decoherence [4]. Therefore, we choose to use
the AQFT circuit for our comparison.
A better lower bound on the success probability of AQFT
has also been derived as (4/π 2 − 1/16) [7]. When estimating
an exact n-bit phase, the success probability of AQFT also
approaches 1. However, these analyses assume that the rotations are precisely applied. In reality, rotations can only be
achieved up to a precision factor (Section III-A discusses this
approximation), which reduces the overall success probability.
We will discuss this impact in the following section, in the
context of both the AQFT method and the ACPA method which
is described next.
In the above equation, we set 1/ such that the success
probability 1 − = 0.8. This gives us the value of mKHT
that achieves an overall success probability equivalent to that
of QFT. The estimation of each bit in Equation 2 occurs
independently of the other bits. Therefore the circuit requires
no controlled phase operations. This also allows many (or all)
bits to be estimated simultaneously, subject to the resource
constraint that enough qubits are available for the individual
Hadamard tests. Thus, KHT has two levels of parallelism,
fine-grained parallelism (analogous to instruction-level parallelism) within the computation in Fig. 4, and coarse-grained
parallelism (analogous to task-level parallelism) between trials.
Indeed, all the approaches have coarse-grained parallelism but
only KHT has fine-grained parallelism across bits.
|0i
H
|0i
•
|xk i
..
.
|0i
H
..
.
H
|ui
|ψi
/ nLR
|0i
|ui
H
S
U2
k−1
H
Fig. 4: Kitaev’s Hadamard Test with S Gate (time flows from
left to right in this circuit diagram)
B. QFT and Approximate QFT (AQFT)
Because KHT requires a large number of trials, a more
common implementation of the QPE algorithm is based on
Quantum Fourier Transform (QFT), shown in Fig. 5. QFT (and
Approximate QFT) trade the expense of arbitrary quantum
rotations for fewer trials in achieving desired QPE accuracy.
In the QFT approach, n ancilla qubits are used to form
the upper register in the circuit, while nLR qubits form the
eigenstate in the lower register, whose periodicity (phase) is
to be estimated. The last step of QPE is implemented using
the inverse quantum Fourier transform (QF T † ) on the ancilla
states. The inverse QFT module will cause the probability
distribution of the state of the upper register to be clustered
around the correct phase value. With a Hadamard gate and a
measurement, an estimated bit is revealed. A typical circuit
for an n-qubit QF T † is shown in Fig. 6. The estimated
phase is computed from its least to its most significant place.
Proceeding one qubit at a time, the estimation accuracy of
higher significance qubits increases by taking into account the
estimates of all previously determined qubits. This approach
provides a success probability of at least (4/π 2 ) [4]. When ϕ
is an exact multiple of 1/2n , the success probability is 1.
The QFT circuit requires O(n2 ) rotations with degree up to
i2π/2n
e
. In reducing this complexity, Barenco [4] showed that
the lower bound of success probability of QFT can be achieved
with fewer rotations per qubit, by considering the estimates
of only the last k (also referred to as kAQF T ) qubits instead
xn
···
•
···
•
···
0
1
U2
U2
x2
QF T †
..
.
x1
•
···
n−1
U2
Fig. 5: n-qubit QPE circuit based on QFT to estimate the phase
ϕ = 0.x1 x2 xn .
|yn i
H
···
•
R2−1
|yn−1 i
...
•
···
H
•
···
|xn i
···
|xn−1 i
..
.
|x1 i
•
Rn−1
···
|y1 i
−1
Rn−1
···
R2−1
H
Fig. 6: n-qubit QF T † circuit. This circuit requires rotations of
exponentially increasing precision with the number of qubits.
|yn i
|yn−1 i
..
.
|yn−k+1 i
H
•
R2−1
•
•
H
···
···
···
|xn i
···
···
···
|xn−1 i
...
|xn−k+1 i
•
Rk−1
−1
Rk−1
···
R2−1
H
···
|y1 i
···
•
..
.
···
···
Rk−1
•
−1
Rk−1
..
.
•
···
R2−1
H
|x1 i
Fig. 7: Limiting the precision of rotations to achieve the nqubit AQF T † circuit to estimate the phase ϕ = 0.x1 x2 ...xn ,
where k >= 2 + log2 (n) is the highest degree of rotation
applied. This is the same circuit used in ACPA, but with
different requirements for the value of k.
C. Arbitrary Constant Precision Algorithm (ACPA)
The Arbitrary Constant Precision Algorithm (ACPA) [8]
bridges the KHT and AQFT approaches, providing a design
space where the number of arbitrary rotations can be reduced at
the expense of more trials. Intuitively, AQFT performs a series
of smaller and smaller (higher-precision) rotations to reach a
desired accuracy. ACPA omits some of the smallest rotations at
the expense of accuracy. This loss in accuracy is then recovered
by performing more trials and then performing a majority vote
on the classical bits resulting from measurements (the result
is binary).
k
If the degree of rotation is limited to e2πi/2 where k ≥ 3
(k also referred to as kACP A ), the number of trials required
is:
2ln(1/0 )
mACP A =
(7)
π2
(1 − 2k−1 )2
2
0
Here, 1 − is the success probability of estimating each bit.
The overall success probability obtained is 1 − n0 . However,
the above expression also assumes that the rotations are
perfectly implemented. Since practical rotations are imperfect,
they increase the number of required trials to reach the desired
accuracy. [8] derives the effect of imperfect rotations on the
number of trials in ACPA. In particular, if the rotations can
1
be achieved with an accuracy of at least η =
, the
(k − 1)2k
number of trials is given by:
mACP A =
2ln(5n)
2ln(1/0 )
∼
π2 2
π2
(1 − 2k−3 )
(1 − 2k−3 )2
2
2
(8)
We use this equation to derive the number of required trials.
Here, we set 1 − n0 ≥ 0.8, giving us 0 ∼ 1/5n to accomplish
an 80% success probability.
One can consider the AQFT approach as the limiting case
of ACPA, i.e. when kACP A is chosen to be > 2 + log2 n, the
two approaches implement the same circuit. Therefore, we can
compute the number of trials required for the AQFT circuit as:
mAQF T =
2ln(5n)
with k > 2 + log2 n
π2
(1 − 2k−3 )2
2
(9)
D. Phase Kickback (Uf )
Finally, looking at the circuits for all the QPE methods
(Figures 4-7), we see that for any type of quantum phase
estimation with accuracy of n bits, the range of quantum
k
operations U 2 with 0 ≤ k < n − 1 must be applied. The
n−1
modules of U , U 2 , U 4 , . . . , U 2
are collectively called the
phase kickback function and we refer to it as Uf . Fig. 8 depicts
the phase kickback for the case of the AQFT circuit. (The
first stage of AQFT and ACPA are indeed equivalent to one
application of Uf ; KHT splits Uf across its different trials.)
Uf has in many cases its own interpretation— for example, in
Shor’s algorithm it is equivalent to a modular exponentiation
module, raising its input to the power of a fixed number,
modulo the number to be factored.
|0i
H
..
.
|0i
|u0 i
QF T †
..
.
H
Uf
|u1 i
..
.
|unLR i
..
.
Fig. 8: The QFT-based quantum phase estimation circuit of
Fig. 5 redrawn to reflect the Uf module. The cost of Uf is
equivalent to 2n−1 times the cost of U .
V. A NALYSIS OF THE D ESIGN S PACE
This section lays out the design space options in detail, and
evaluates the tradeoff between QPE techniques, providing runtime results for different precision requirements and resource
constraints.
Our analysis is divided into two categories. First, we
have factors that constrain the number of trials that can be
executed in parallel (task-level parallelism). These include
the number of qubits as well as the number of available
eigenstates in each application with QPE approach. These
resource requirements affect how many parallel trials can be
supported by the hardware. Second, we have a factor that
affects the execution time of each trial. This is the size of
the phase kickback function (Uf ) that must be called across
the many trials, thus contributing to the circuit size generated
by the implementation.
A. Factors Affecting the Parallelization of Trials
These factors actually can also affect parallelization within
trials, but we can assume that resources are sufficient for
supporting parallel execution within at least one trial.
1) Number of logical qubits: Parallel execution of multiple
trials will require multiple copies of quantum state, which we
measure in logical qubit count. (The number of physical qubits
is larger due to QECC.) Since arbitrary quantum data cannot be
copied (the quantum no-cloning theorem), independent trials
can only occur if they start from the beginning of the entire
application or from an intermediate computation that starts
with only freshly initialized qubits.
The maximally-parallel implementations require the most
qubits since they require a copy of the entire computation for
each trial. In general, if p is the parallelization factor achieved
during an execution, the number of qubits required by the
three methods is given by: N Q = p ∗ (nLR + 1). Since the
maximum parallelism available in the algorithms is across the
trials and the bits, the number of qubits required can be as
high as mn(nLR + 1) in maximally parallel implementations.
In practical implementations, p ranges from 1 to mn. For
example, KHT may be applied with p = m such that the
bits are processed serially while the m trials are performed
in parallel. This reduces the requirement to mKHT (nLR + 1)
by reusing expensive qubit resources between successive bit
estimations.
2) Number of eigenstates available: To estimate the eigenvalues through multiple trials, the phase estimation algorithms
must be applied to the same eigenstate a number of times. For
the trials to be parallelized, multiple copies of this eigenstate
must be available. In general, the parallelization of trials is
constrained by the number of such copies available. While
this can be bounded by the number of available qubits in
the system, in some cases it may simply be too expensive
to prepare multiple copies of the states. However, for many
applications including Shor’s and GSE, these eigenstates can
be generated by applying Uf to suitable initial states which
are straightforward to prepare. Therefore, our study assumes
that with sufficient resouce availability, the eigenstates can be
prepared in parallel.
B. Factors Affecting Circuit Size
In addition to the specific computations described for each
QPE approach in Section IV, the size and implementation
cost of the Uf function is an important factor. Appendix A
considers the cost of varying the size of Uf . Here we examine
the implementation cost by studying the number of invocations
of the U modules. The higher the invocation count of U , the
greater the execution time of each trial and the greater the
resources needed to support parallelism across trials.
The Uf function performs a central transformation in the
QPE algorithm in creating the eigenstate whose periodicity is
to be estimated. For many algorithms, the Uf function is also
the most expensive part of the circuit. Therefore the cost of
its implementation is an important consideration and provides
an insight into its complexity. In general, the function U k can
be implemented using k copies of U . Hence, the number of
invocations of U is given by:
that resource constraints can limit parallelism and heavily influence the choice of QPE implementation. We later generalize
our evaluation by plugging in realistic oracle costs and QPE
precision targets based on parameters derived in the appendix.
Results were obtained using the ScaffCC compiler framework, which can measure resource usage in terms of logical
qubits and logical gates, as well as compute runtime in terms
of logical timesteps. The compiler can generate these measurements through static analysis because quantum applications are
generally compiled with all inputs known.
1) The Effect of Desired Precision on Trials and Rotations:
As described previously, the number of trials varies for the
different methods for a given n, the desired QPE precision in
bits. Fig. 9 shows how estimating more bits of QPE precision
causes a sharper increase in the required number of trials
in KHT as opposed to AQFT. The graph also depicts the
impact of kACP A (the highest-precision rotation) on the ACPA
method. The value of kACP A = 3 is insufficient to generate
a high enough success probability in each trial demanding
considerably large number of trials. For kACP A of 4 or higher,
its behavior more closely tracks AQFT. On the other hand,
using even a few controlled rotations helps the estimation
process more than KHT, which does not take into account
the values of the previously estimated bits.
Figure$9$–$Num$of$trials$for$succ$prob$=$
0.8$
NU = m ∗ (1 + 2 + 4 + ... + 2n−1 ) = m ∗ (2n − 1)
(10)
In our characterizations, the GSE algorithm uses this scheme
to implement Uf . However, for large n, this can result in an
undesirably high number of invocations of U. For example, in
Shor’s algorithm, n is expected to be about 1000. Therefore,
a more efficient implementation of U k where the cost stays
constant regardless of the value of k is desirable, and has been
proposed for Shor’s algorithm [25]. If such implementations
are available for U , the number of invocations of U are
NU = m∗n. For both types of implementations, the smaller the
number of trials, the fewer the invocations of U , consequently,
one can expect better overall runtime.
Table II shows a simplified comparison of the resources
and runtime for the three methods, assuming that only the
trials are parallelized. Choosing between Kitaev’s algorithm
and others depends on the relative cost of Uf and rotations,
while choosing a suitable technique between that of ACPA and
AQFT depends on the number of trials, number of rotations
and the cost of the rotations. In the appendix, we determine
these costs empirically.
C. Empirical Evaluation
To evaluate the different design metrics for the QPE implementations, we first limit the circuit to a micro-benchmark
whose oracle is substituted by a proxy circuit consisting of
simple CNOT gates. The intention is to explore the tradeoff
space of QPE methods independent of oracle costs, which can
vary significantly across applications. In particular, we show
Number'of'Trials'
600$
500$
400$
KHT$
300$
ACPAO3$
200$
ACPAO4$
ACPAO5$
100$
AQFT$
0$
8$
16$ 32$ 64$ 128$ 256$ 512$ 1024$
n0bit'Precision'of'Es5mated'Phase'
Fig. 9: Number of trials for KHT, ACPA-k and AQFT. Trials
for ACPA-k are computed from Equation (8) assuming imperfect rotation gates.
2) Tradeoff between Resources and Runtime: A key tradeoff in our QPE schemes arises when parallelism is constrained
by practical constraints in the number of logical qubits that
can be implemented in a quantum machine. Specifically, the
runtime of each QPE scheme is heavily dependent upon how
much concurrency can be physically supported in executing
independent trials.
Fig. 10 compares the runtime of algorithms as we bound
the number of logical qubits for the QPE implementations
requiring 64-bit precision of phase estimates. Low availability
of qubits forces a serial execution of the algorithms, driving up
their runtimes. With more resources, the runtime of each algorithm improves as their inherent parallelism in the algorithms
gets exploited. For KHT, the trials as well as the bits can
be processed in parallel; however exploiting this parallelism
requires ∼106 logical qubits. (With error correction, this could
easily be ∼108 physical qubits, an impractical number for the
foreseeable future.) On the other hand, for ACPA and AQFT,
the truly serial parts are the rotations that require bit-wise
processing, thus their parallelism can be exploited with fewer
available qubits. When a low number of qubits is available,
ACPA executes the fastest among the three. When available
TABLE II: Analytical comparison of the three techniques with respect to number of gates, circuit depth and number of qubits,
assuming that the trials are performed in parallel while bits are processed serially.
KHT
ACPA
AQFT
mKHT (Ufgates + 3n)mACP A (Ufgates + n ∗ (kACP A − 1) ∗ Avg Rgates )mAQF T (Ufgates + n ∗ (kAQF T − 1) ∗ Avg Rgates )
Runtime
Uftimesteps + 3n
Num of qubits
mKHT (1 + nLR )
Fig.12$
Uftimesteps + n ∗ (kACP A − 1) ∗ Avg Rtimesteps Uftimesteps + n ∗ (kAQF T − 1) ∗ Avg Rtimesteps
mACP A (1 + nLR )
Fig.10$
logical qubits becomes greater than 106 , all algorithms can be
fully parallelized, with KHT offering the best runtime.
Run5me'(in'5mesteps)'with'
'cost'of'Uf=1'5mestep'
1E+6$
1E+5$
1E+4$
KHT$
1E+3$
ACPAO3$
ACPAO4$
1E+2$
ACPAO5$
1E+1$
1E+0$
1E+1$
AQFT$
1E+2$
1E+3$
1E+4$
1E+5$
1E+6$
1E+7$
Fig. 10: Space-time tradeoffs presented by the three methods
with bounded resource availability. With fewer available resources, ACPA executes fastest, while KHT executes fastest
when available resources are as high as ∼106 .
Run5me'(in'5mesteps)'with'
'cost'of'Uf=10^7'5mesteps'
Fig.11$
KHT$
ACPAO3$
ACPAO4$
ACPAO5$
AQFT$
1E+2$
1E+3$
1E+4$
1E+5$
1E+6$
1E+13$
1E+12$
1E+11$
1E+10$
1E+9$
1E+8$
1E+7$
1E+6$
1E+5$
1E+4$
1E+3$
1E+2$
1E+1$
1E+0$
1E+1$
KHT$
ACPAO3$
ACPAO4$
ACPAO5$
AQFT$
1E+2$
1E+3$
1E+4$
1E+5$
1E+6$
Number'of'available'qubits'
(Required'Precision'of'Es5mated'Phase'is'80bit)'
Fig. 12: Space-time tradeoffs with Uf and desired precision
set for GSE.
Number'of'available'qubits'
(Required'Precision'of'Es5mated'Phase'is'640bit)'
1E+12$
1E+11$
1E+10$
1E+9$
1E+8$
1E+7$
1E+6$
1E+5$
1E+4$
1E+3$
1E+2$
1E+1$
1E+0$
1E+1$
mAQF T (1 + nLR )
Run5me'(in'5mesteps)'with'
'cost'of'Uf=10^9'5mesteps'
Gates
1E+7$
Number'of'available'qubits'
(Required'Precision'of'Es5mated'Phase'is'640bit)'
Fig. 11: Space-time tradeoffs with Uf and desired precision set
for Shor’s algorithm. With fewer available resources, AQFT
executes fastest due to fewer invocations of expensive serialized Uf .
3) Resource Constraints with a Larger Oracle Function:
We can extend our microbenchmark by increasing the cost
of the oracle function from a simple CNOT gate to the
oracle for Shor’s algorithm and GSE. Figures 11 and 12 show
the runtimes for the three methods when the runtime of Uf
function is ∼107 and ∼109 timesteps (order of the size of Uf
for Shor’s algorithm and GSE, respectively). Desired precision
is set for 64 and 8 bits, also according to application. The
smaller number of invocations of Uf in the AQFT method
allows it to complete execution faster than the other methods.
When sufficient number of qubits become available, however,
the Uf application can be parallelized, and the truly serial
parts of the algorithms (which are the inverse-QFT and the
Hadamard Test) give rise to the differences in their runtimes,
similar to the trends in Fig. 10.
D. Summary
This section characterized the tradeoffs of different QPE
techniques and the design space that they create. While the
KHT method is beneficial in terms of parallelism and hence
runtime performance with ample resources, it incurs a large
cost in terms of qubits and gates. In all metrics, ACPA is
a middle solution to the extreme requirements of KHT and
AQFT. Ultimately, the amount of resources at hand (especially
logical qubits), the level of parallelism supported by the
architecture, and the target runtime will determine the method
of choice. QPE is a fundamental component of two very
important QC applications: Shor’s factoring algorithm and
GSE. Using the specific characteristics and parameters from
these algorithms (see Appendix) our results here have also
detailed the tradeoffs regarding how best to implement QPE
for these full applications.
VI. R ELATED W ORK
Efficient decomposition of single-qubit unitaries has been
widely studied. The Solovay-Kitaev algorithm [16] is traditionally well-known for this purpose, and an early implementation by Dawson and Nielsen [10] outputs a sequence
of order log 3.97 ( 1 ) gates for every rotation approximated
with precision. Selinger [26] proposes an improvement
which decomposes Z-axis rotations into a sequence of length
4log2 ( 1 ) in the worst case. Paetznick and Svore [23] include
non-determinism in their algorithm to achieve gains in circuit
cost, while Bocharov et al. [5] improve this by moving away
from the traditional decomposition bases of H, S, CNOT and
T gates.
Abrams and Lloyd [1] were the first to notice quantum
phase estimation can be used in estimating the ground state
energy of a molecule. Improvements on making this method
faster have since been proposed [24].
1.0E+07$
Num'of'Gates/Num'of'Timesteps'
Total'Number'of'Gates'
(Normalized'to'AQFT'U=10^4)'
Fig.14$
Shor’s$
Algorithm$
1.0E+06$
1.0E+05$
KHT$
1.0E+04$
ACPAO3$
1.0E+03$
ACPAO4$
1.0E+02$
ACPAO5$
1.0E+01$
AQFT$
1.0E+00$
1.0EO01$
10^4$
10^5$
10^6$
10^7$ 10^8$
10^9$
10^10$
6E+10$
3E+10$
2E+10$
1E+10$
0$
4$
VII. C ONCLUSION
This paper has examined the tradeoff of three different
methods for quantum phase estimation. These methods on
the surface differ in the extent to which they use rotation
gates and trials to accomplish the goal of phase estimation,
but resource-wise they create a design space which requires
different amounts of algorithm runtime and quantum space
(or qubit usage). We performed an analytical as well as
empirical study of the tradeoffs present in this design space,
and discussed the effect of both hardware constraints and the
type of algorithm that uses the phase estimation in guiding
the choice of a suitable QPE method for different quantum
applications. In particular, we find that ACPA provides a good
tradeoff between the number of trials and the number of
rotations, often achieving a sweet spot in execution time when
parallelism is limited by available physical resources.
5$
6$
7$
8$
9$
10$
Fig.15$
A. Case Study 1: Shor’s Algorithm
Factorizing a large number into its prime constituents is a
hard problem for classical computers, and forms the basis of
many modern cryptography techniques. A seminal quantum
algorithm by Shor [27] offers exponential speedup over the
current best known classical algorithm [21]. Quantum phase
estimation lies at the heart of Shor’s algorithm. We have
implemented an optimized phase kickback function Uf for this
algorithm, as proposed by Pavlidis and Gizopoulos [25].
12$
Fig. 14: Obtaining the number of gates and timesteps in the
Uf module for the GSE algorithm. The number of gates
and timesteps of constituent U ’s increase exponentially with
higher required precision. It can also be seen that little overall
parallelism exists in this algorithm.
100000$
10000$
1000$
KHT$
ACPAO3$
100$
ACPAO4$
10$
ACPAO5$
1$
AQFT$
0.1$
4$ 5$ 6$ 7$ 8$ 9$ 10$ 11$ 12$
Precision'of'Es5mated'Phase'in'GSE'(n0bit)'
Fig. 15: Comparing the total number of gates of GSE using
KHT, AQFT and ACPA methods when the required precision
of estimated phase is varied.
In this algorithm, the overall phase kickback module Uf
implements a modular exponentiation function:
Uf
|xi|1i −−→ |ax modN i
A PPENDIX
In this appendix, we derive the QPE target precision
and oracle size for two algorithms that have quantum phase
estimation as a fundamental component: Shor’s algorithm for
prime factorization, and the Ground State Estimation (GSE)
algorithm for finding the ground energy level of a molecule.
The fundamental difference between these algorithms (and
generally all algorithms involving QPE) is the oracle U function that is used for their implementation.
11$
Precision'of'Es5mated'Phase'in'GSE'(n0bit)'
Total'Number'of'Gates''
(Normalized'to'AQFT'n=4)'
Finally, Svore et al. [30] propose an improvement over
the KHT method which makes asymptotic improvements on
its circuit width and depth, by modifying the Hadamard tests
to infer multiple bits of the phase simultaneously instead of
one bit at a time. However, this requires extra invocations
of the U function, which is a significant cost factor in QPE
implementations as per our findings.
Timesteps$in$Uf$
4E+10$
Size'of'U'func5on'(number'of'gates)'
Fig. 13: Comparing the impact of the size of the Uf function on
number of total gates using KHT, AQFT and ACPA methods.
The size of Shor’s Uf function lies around the 5 ∗ 107 line.
Gates$in$Uf$
5E+10$
(11)
Referring to our discussion of U and Uf modules in Section
IV-D, here the U modules are equivalent to modular multiplication units, while the overall phase kickback module Uf
implements a modular exponentiation function. From modular
arithmetic it is known that a repeated application of modular
multiplication yields modular exponentiation. That is:
0
n−1
ax modN = (a2 modN )x0 + . . . + (a2
modN )xn−1 , (12)
where x0 , . . . , xn−1 are the bits of x, the upper register of
phase estimation.
Our implementation of this algorithm is efficient in the
sense that there is a polynomial cost associated with the phase
kickback module Uf as a function of n, because each of the
k
U 2 modules is customized separately and does not result
from exponentially many calls to the U module. Analytical
analysis yields n ∗ (796n2LR + 692nLR ) number of gates and
n∗(1045nLR −38) number of timesteps for the phase kickback
circuit [25], where nLR = n/2 is the size of the lower register
of phase estimation. These values can be used in the analytical
model of Table II to yield the overall cost of implementing
Shor’s algorithm with the three QPE methods.
Fig. 13 shows this cost as a function of the size of the phase
kickback module Uf . A particular problem size (factoring a
32-bit number with 64 bits of precision) has been marked on
the graph.
[7]
B. Case Study 2: The Ground State Estimation Algorithm
The ground state estimation algorithm is used for estimating the ground state energy E0 of a molecule [32]. Knowing
the energy to lie in an interval [Emin , Emax ], this can be
reduced to estimating the phase ϕ in the equation
[10]
U |ψ0 i = ei2πϕ |ψ0 i,
[13]
(13)
with U being dependent on the system Hamiltonian H, among
other things:
U = eiEmax τ eiHτ .
(14)
[8]
[9]
[11]
[12]
[14]
[15]
k
Due to the complexity of implementing the U 2 subcircuits, the subcircuit U is approximated by an ApproxU module. We implement the algorithm for a molecule of molecular
weight 10, with up to 12 bits of precision for the energy
estimate, and with simplified assumptions for ϕ. It is difficult
to obtain a closed-form analytical expression for the number of
gates and timesteps in the phase kickback module. However,
from our algorithm implementation we can obtain empirical
values for these numbers with varying precision. These values
are plotted in Fig. 14. This implementation, unlike that of
Shor’s, does not result in a polynomial function for the size
of the phase kickback module. The lack of customized and
k
efficient specifications of U 2 circuits has resulted in exponential growth from repetitive application of the U module, as
mentioned in Eqn. 10.
Fig. 15 shows the total number of gates as a result of
varying the required precision in the overall GSE algorithm,
showing the exponential growth effect.
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[23]
ACKNOWLEDGMENTS
The authors thank Pawel Wocjan for useful discussions.
This work was partly supported by NSF grant PHY-1415537.
R EFERENCES
D. S. Abrams and S. Lloyd. Quantum Algorithm Providing Exponential
Speed Increase for Finding Eigenvalues and Eigenvectors. Physical
Review Letters, 83(24):5162, 1999.
[2] D. Aharonov and M. Ben-Or. Fault-tolerant quantum computation with
constant error. In Proceedings of the 29th annual ACM Symposium on
Theory of Computing, STOC ’97, 1997.
[3] H. Ahmadi and C.-F. Chiang. Quantum Phase Estimation with Arbitrary
Constant-Precision Phase Shift Operators. Quantum Info. Comput.,
12(9-10):864–875, Sept. 2012.
[4] A. Barenco, A. Ekert, K.-A. Suominen, and P. Törmä. Approximate Quantum Fourier Transform and Decoherence. Physics Rev A.,
54(1):139–146, Jul 1996.
[5] A. Bocharov, Y. Gurevich, and K. M. Svore. Efficient Decomposition
of Single-Qubit Gates into V Basis Circuits. Physical Review A,
88(1):012313, 2013.
[6] P. O. Boykin et al. On Universal and Fault-Tolerant Quantum Computing: A Novel Basis and a New Constructive Proof of Universality for
Shor’s Basis. In Foundations of Computer Science, 1999. 40th Annual
Symposium on, pages 486–494. IEEE, 1999.
[24]
[25]
[1]
[26]
[27]
[28]
[29]
[30]
[31]
[32]
D. Cheung. Improved Bounds for the Approximate QFT. In Proceedings of the Winter International Synposium on Information and
Communication Technologies, WISICT ’04, pages 1–6. Trinity College
Dublin, 2004.
C.-F. Chiang. Selecting Efficient Phase Estimation with ConstantPrecision Phase Shift Operators. Quantum Information Processing,
13(2):415–428, 2014.
A. M. Childs et al. Exponential Algorithmic Speedup by a Quantum
Walk. In Symposium on Theory of Computing. ACM, 2003.
C. M. Dawson and M. A. Nielsen. The Solovay-Kitaev algorithm.
Quantum Info. Comput., 2006.
L. K. Grover. A Fast Quantum Mechanical Algorithm for Database
Search. In Symposium on Theory of Computing. ACM, 1996.
A. W. Harrow et al. Quantum Algorithm for Linear Systems of
Equations. Physical Review Letters, 103(15):150502, 2009.
A. JavadiAbhari et al. Scaffold: Quantum Programming Language.
Technical report, Princeton University, NJ, USA, 2012.
A. JavadiAbhari, S. Patil, D. Kudrow, J. Heckey, A. Lvov, F. T. Chong,
and M. Martonosi. Scaffcc: A Framework for Compilation and Analysis
of Quantum Computing Programs. In ACM Conference on Computing
Frontiers, 2014.
A. Y. Kitaev. Quantum Measurements and the Abelian Stabilizer
Problem. arXiv preprint quant-ph/9511026, 1995.
A. Y. Kitaev. Quantum Computations: Algorithms and Error Correction.
Russian Mathematical Surveys, 52(6):1191–1249, 1997.
A. Y. Kitaev, A. H. Shen, and M. N. Vyalyi. Classical and Quantum
Computation, volume 47 of Graduate Studies in Mathematics. American Mathematical Society, 2002.
V. Kliuchnikov, D. Maslov, and M. Mosca. Practical Approximation
of Single-Qubit Unitaries by Single-Qubit Quantum Clifford and T
Circuits. arXiv preprint arXiv:1212.6964, 2012.
V. Kliuchnikov, D. Maslov, and M. Mosca. Asymptotically Optimal
Approximation of Single Qubit Unitaries by Clifford and T Circuits
Using a Constant Number of Ancillary Qubits. Physical review letters,
110(19):190502, 2013.
B. P. Lanyon, J. D. Whitfield, G. Gillett, M. E. Goggin, M. P. Almeida,
I. Kassal, J. D. Biamonte, M. Mohseni, B. J. Powell, M. Barbieri,
et al. Towards Quantum Chemistry on a Quantum Computer. Nature
Chemistry, 2(2):106–111, 2010.
A. K. Lenstra et al. The Development of the Number Field Sieve, volume
1554. Springer, 1993.
M. Oskin et al. A Practical Architecture for Reliable Quantum
Computers. IEEE Computer, 35(1):79–87, 2002.
A. Paetznick and K. M. Svore.
Repeat-Until-Success: NonDeterministic Decomposition of Single-Qubit Unitaries. arXiv preprint
arXiv:1311.1074, 2013.
A. Papageorgiou, I. Petras, J. Traub, and C. Zhang. A Fast Algorithm
for Approximating the Ground State Energy on a Quantum Computer.
Mathematics of Computation, 82(284):2293–2304, 2013.
A. Pavlidis and D. Gizopoulos. Fast Quantum Modular Exponentiation
Architecture for Shor’s Factoring Algorithm. Quantum Information and
Computation, 14:0649–0682, 2014.
P. Selinger. Efficient Clifford+T Approximation of Single-Qubit Operators. Quantum Info. Comput. (arXiv preprint arXiv:1212.6253), 2014.
P. W. Shor. Algorithms for Quantum Computation: Discrete Logarithms
and Factoring. In Foundations of Computer Science. IEEE, 1994.
Sqct: Single Qubit Circuit Toolkit - https://code.google.com/p/sqct/,
May 2014.
A. Steane. Multiple-Particle Interference and Quantum Error Correction. Proceedings of the Royal Society of London. Series A:
Mathematical, Physical and Engineering Sciences, 452(1954):2551–
2577, 1996.
K. M. Svore, M. B. Hastings, and M. Freedman. Faster Phase
Estimation. Quantum Info. Comput., 14(3-4):306–328, Mar. 2014.
K. Temme, T. Osborne, K. Vollbrecht, D. Poulin, and F. Verstraete.
Quantum metropolis sampling. Nature, 471(7336):87–90, 2011.
J. D. Whitfield et al. Simulation of Electronic Structure Hamiltonians
Using Quantum Computers. Molecular Physics, 109(5):735, 2010.