Use this file to discover all available pages before exploring further.
View on GitHub
Open this notebook in GitHub to run it yourself
Quantum Support Vector Machines is the quantum version of classical Support Vector Machines (SVM); i.e., a data classification method that separates the data by performing a mapping to a high-dimensional space, in which the data is separated by a hyperplane [1]. QSVM is a hybrid quantum–classical classification algorithm in which classical data are embedded into a high-dimensional quantum Hilbert space using a parameterized quantum feature map. A quantum processor is then used to evaluate inner products between these quantum states, producing a kernel matrix that captures similarities between data points in this quantum feature space. This kernel is passed to a classical support vector machine optimizer, which learns the optimal separating hyperplane by identifying support vectors and model parameters. For prediction, the trained model classifies new data points using quantum-evaluated kernel values and a classical decision rule. For some problem instances, a quantum feature map may enable improved classification performance while using fewer computational resources than a classical algorithm [2].The algorithm treats the following problem:
Input: Classical data points xi, where xi∈Rd are d-dimensional vectors, corresponding labels yi∈{−1,1}, where i=1,…,m, as well as feature map Uϕ(xi), encoding the classical data in a quantum state.
Output: A kernel matrix evaluated using quantum measurements. The matrix is then fed into a classical SVM optimizer, producing a full characterization of the separating hyperplane.
Our goal is to find a hyperplane in Rd which separates the points {xi} into ones for which the corresponding labels are yi=+1 and yi=−1.The hyperplane is conveniently defined by a vector normal to it, w∈Rd and an offset b∈R.The classification of a point x can be determined by hw(x)=sign(⟨w,x⟩+b), which decides on which side of the hyperplane the point lies on.Here ⟨w,x⟩ is the inner product between the two vectors.To describe the goal explicitly, we introduce the geometric margin as the distance from the hyperplane to the closest training point xi: minx((⟨w,x⟩/∣∣w∣∣).The optimal classification corresponds to the hyperplane and offset that maximizes the geometric margin.This goal can be stated as a naive optimization problem: find w satisfyingmaxwmini(sign(⟨w,xi⟩+b))subject to: sign(⟨w,xi⟩+b)=sign(yi).However, this formulation of the problem is a nonlinear optimization problem due to the sign comparison. Alternatively, we can express the objective as a linear function with linear constraints. A separating hyperplane satisfies
(⟨w,xi⟩+b)yi≥0, moreover, we set the length of w by enforcing that the inner product with respect to the nearest point (in fact, there will always be two data points on each side of the hyperplane with the same minimal distance to the hyperplane) to be ⟨w,xmin⟩=1. As a result, all data points satisfy (⟨w,xi⟩+b)yi≥1.The condition enables defining the optimization problemminimize21∣∣w∣∣2subject to (⟨w,xi⟩+b)yi≥1fori=1,…,m.In general, it would not be possible to separate the bare data points by a hyperplane; therefore, a transformation of the data to a higher-dimensional space using a feature map ϕ(x) is performed.Following the transformation, the hyperplane and offset are evaluated (similar problem, obtained by transforming xi→ϕ(xi)).The main disadvantage of the present (primal) formulation of the problem is that explicitly computing ϕ(x) may require infeasible computational resources, or even involve a mapping to an infinite-dimensional space.An alternative approach utilizes the dual formulation of the problem.This approach relies on the Karush-Kuhn-Tucker theorem [3], which implies that one can formulate a dual optimization problem, whose solution (under certain conditions which are satisfied for the present case) coincides with the solution of the present (primal) problem.The dual problem of the original primal optimization problem is given bymaximizeLD(α1,…,αm)=i=1∑mαi−21i,j=1∑myiyjαiαjK(xi,xj)(2)subject to:αi≥0fori=1,…,mi=1∑myiαi=0,where K(xi,xj)=⟨xi,xj⟩ is the (i,j) component of the matrix, called the kernel matrix.The important advantage of the dual formulation is that for specific feature maps, evaluation of the kernel matrix components does not require explicit assessment of the inner product of two feature vectors (which might be infinite-dimensional after the feature map transformation).The quantum version of SVM is based on the dual optimization problem, where the main innovation is that a quantum computer can perform unitary feature transformations by applying quantum circuits and evaluate the inner product between transformed states by specified measurements.
Data loading of the classical data and feature map transformation.
Evaluation of the overlap between two feature states.
Classical optimization procedure, optimizing the circuit control parameters and modification of the feature map transformation.
Train:Step 1: The mapping of a classical data point x into a quantum feature state involves loading or encoding the data into a quantum state∣0n⟩UDE(x)∣x⟩.Various popular transformations exist, for example: basis, amplitude, angle, and dense encoding.The possible approaches showcase a general tradeoff between the number of qubits required to encode the classical data and the circuit depth. Generally, highly entangled states allow encoding more classical data utilizing fewer qubit, while requiring deeper circuits.
Following, a unitary feature operation maps the encoded state to a quantum feature state∣x⟩Uϕ(x)∣ϕ(x)⟩.The two transformations can be combined to a single unitary transformation U(x)=Uϕ(x)UDE(x), dependent on the classical data point x.Step 2:
The overlap between two feature vectors ϕ(xi) and ϕ(xj) is performed by applying the circuitUQSVM(xi,xj)=U†(xi)U(xj)to the initial state ∣0n⟩ and measuring the probability of measuring 0n.The expected probability, P∣0n⟩=∣⟨ϕ(xi)∣ϕ(xj)⟩∣2 provides the elements of the kernel matrixK(xi,xj)=⟨ϕ(xi)∣ϕ(xj)⟩.Step 3:
Optimize the dual problem using a classical optimization algorithm.The SVM optimization problem, Eq. (2), is a quadratic programming problem (a specific case of a convex optimization problem), for which several exact or approximate solution methods exist, such as active-set, interior point, and gradient/projection-based methods.The algorithm is given the kernel matrix and constraints as input, and produces the optimized {αi} coefficients.Prediction:
For a new data point s, the kernel matrix of the new datum is evaluated with respect to the optimized {αi}Predicted Labels(x)=sign(i=1∑myiαiK(xi,s)+b).(1)In practice, only a number of data points contribute to optimization (the support vectors).Only for these data points the corresponding coefficients, αi, do not vanish. As a consequence, most of the terms in the sum of Eq. (1) vanish, and we can limit the calculation of the kernel matrix elements for i‘s for which αi=0.
A “simple” data set, constructed by randomly distributing points around two source data points.
This data set enables a straightforward linear classification of the data by the introduction of a separating line (hyperplane in 2D) and constitutes a simple preliminary example.
A more complex data set, generated by qiskit’s ad_hoc_data function.
This is a special type of dataset that requires a highly nonlinear transformation to classify the data. Specifically, it can be accurately classified by a Pauli transformation map.The data sets are classified utilizing two feature maps: the Bloch Sphere and Pauli maps.From these data sets and feature maps, we construct exemplary examples.
In the first example, we generate “simple” artificial training, test, and prediction data sets (defined points in the data space). A Bloch sphere transformation is employed as a feature map, allowing perfect classification of the data.
The model is trained, tested, and then used for predictions of the labels of the prediction data set.
The second example involves classifying the ad_hoc_data by application of both the Bloch sphere and Pauli mappings.
The Pauli transformation can accurately classify the data, while the Bloch sphere mapping manages to classify approximately half of the test and prediction data sets.The examples emphasize the importance of tailoring the chosen feature map to the specific data set.Note: In the following examples, the labels are either 0 or 1 instead of +1/−1 as is in the theoretical description.
In the data generation we utilize a number of utility functions:
generate_data: takes two sources points and outputs a python dictionary with the training data points (random points within the vicinity of the sources).
data_dict_to_data_and_labels: given a generated data dictionary, outputs the input data and associated labels.
# Importing functions used for this demo, to generate random linearly separable datafrom classiq.applications.qsvm.qsvm_data_generation import ( data_dict_to_data_and_labels, generate_data,)# Generate sample data:sources = np.array([[1.23016026, 1.72327701], [3.20331931, 5.32365722]])training_input: dict = generate_data(sources=sources)test_input: dict = generate_data(sources=sources)predict_input, predict_real_labels = data_dict_to_data_and_labels( generate_data(sources=sources))
In addition to the feature map, we need to prepare our data.The training_input and test_input datasets consist of data and its labels.The labels are a 1D array where the value of the label corresponds to each data point and can be basically anything, such as (0, 1), (3, 5), or (‘A’, ‘B’).The predict_input consists only of data points (without labels).We normalize the data to be in the range [−1,1).
# Prepare and define `train_input` and `test_input` datasets consisting of data and labelsTRAIN_DATA_1, TRAIN_LABELS_1 = data_dict_to_data_and_labels(training_input)TEST_DATA_1, TEST_LABELS_1 = data_dict_to_data_and_labels(test_input)# Prepare and define `predict_input`PREDICT_DATA_1 = predict_inputdef normalize(data: np.ndarray, range: tuple) -> np.ndarray: """ Normalizes data in the range [range[0], range[1]) to be between [-1,1) Args: data (np.ndarray): data to be normalized range (tuple): range of the data """ return (2 * data - range[1] - range[0]) / (range[1] - range[0])# normalizing the data appropriatlyRANGE = (0, 6 * np.pi)TRAIN_DATA_1 = normalize(TRAIN_DATA_1, RANGE)TEST_DATA_1 = normalize(TEST_DATA_1, RANGE)PREDICT_DATA_1 = normalize(PREDICT_DATA_1, RANGE)
To get a better understanding of the classification task at hand, we plot the data.
# Plot the dataplot_range = (-1, -0.4)colors = {0: "blue", 1: "orange"}for i in range(TRAIN_DATA_1.shape[0]): plt.scatter(*TRAIN_DATA_1[i, :].T, c=colors[TRAIN_LABELS_1[i]])plt.title("Training Data")plt.xlim(plot_range)plt.ylim((-1, 0))plt.show()for i in range(TRAIN_DATA_1.shape[0]): plt.scatter(*TEST_DATA_1[i, :].T, c=colors[TEST_LABELS_1[i]])plt.title("Test Data")plt.xlim(plot_range)plt.ylim((-1, 0))plt.show()plt.scatter(*PREDICT_DATA_1.T)plt.title("Prediction Data (unlabeled)")plt.xlim((-1, -0.4))plt.ylim((-1, -0.2))plt.show()
Here the dark blue and orange dots correspond to the labels 0 and 1, and the prediction data is unlabelled.
When constructing a QSVM model, we must supply the feature map, encoding the classical data into quantum states in Hilbert space (the feature space of the problem). Here, we choose to encode the data onto the surface of the Bloch sphere.This can be defined in terms of the following transformation on the 2D data pointx=[x0,x1]T→RZ(πx1)RX(πx0)∣0⟩=cos(πx0/2)∣0⟩+ieiπx1sin(πx0/2)∣1⟩,where the circuit takes a single qubit per data point and the last equality is up to a global phase.We define a quantum function that generalizes the Bloch sphere mapping to an input vector of any dimension (also known as “dense angle encoding” in the field of quantum neural networks):RX(πx2i)RZ(πx2i+1)∣i⟩=cos(πx2i/2)∣0⟩+ieiπx2i+1sin(πx2i/2)∣1⟩,where xi are the elements of the vector x and d is the dimension of the vector.Each pair of entries in the vector is mapped to a Bloch sphere. If there is an odd size, we apply a single RX gate on an extra qubit.Since a single qubit stores the data of a single data point, for such a feature mapping the number of qubits required is n=⌈d/2⌉.The feature map is uploaded from the open_library.functions.variational module at the beginning of the notebook.Unlike the other feature maps, this feature map is located outside the quantum_feature_map module, as it serves as a building block beyond the scope of QSVM and QML.
We begin by building Classiq’s QSVM class, consisting of the QML model.The model is given a feature map from the quantum_feature_maps module, and possibly ExecutionPreferences.The feature map will be employed in the evaluation of the kernel in the training step.
# Build a quantum support vector machine modelbloch_num_qubits = int(np.ceil(np.log2(TRAIN_DATA_1.shape[1])))qsvm_model = QSVM(feature_map=inplace_encode_on_bloch, num_qubits=bloch_num_qubits)
Testing the training process, and outputting a test score.
Predicting, by taking unlabeled data and returning its predicted labels.
This may be applied multiple times on different datasets.These steps are performed utilizing the train, test and predict methods of the QSVM class.
In the training stage, the quantum kernel is constructed element by element, by repeated execution of quantum circuits.The execution employs Classiq’s sample_batch to evaluate the overlap between the states encoding the two data points of each pair.The kernel matrix is then fed into scikit-learn’s SVC (Support Vector Classifier) function to optimize the quadratic program (the dual optimization problem).The knowledge of the optimized coefficients enables the model to predict the classification of a new data point, utilizing Eq. (1).
# Train the modelqsvm_model.train(TRAIN_DATA_1, TRAIN_LABELS_1)
# Check the test scoretest_score, test_predicted_labels = qsvm_model.test(TEST_DATA_1, TEST_LABELS_1)# Predict labelspredicted_labels = qsvm_model.predict(PREDICT_DATA_1)
The quantum program is accessible by QSVM’s method get_qprog.
We can view the classification accuracy through test_score, moreover, since this data was previously generated, we also know the real labels and can print them for comparison.
Example 2: Pauli and Bloch Sphere Feature Map on a Complex Data Set
We begin by importing the relevant software packages
from itertools import combinationsfrom qiskit_algorithms.utils import algorithm_globalsfrom qiskit_machine_learning.datasets import ad_hoc_data
We consider a more complicated classification task, utilizing qiskit’s ad_hoc_data, we generate a data set which can be fully separated by the ZZ feature map [1]
Next we split the test features and labels to a test and prediction set.
# the sizes of the `test_features` and `test_labels` are double those of the `test_size` argument# Since there are `test_size` items for each `adhoc_dimension`def split(obj: np.ndarray, n: int = 20): quarter = n // 4 half = n // 2 first = np.concatenate((obj[:quarter], obj[half : half + quarter])) second = np.concatenate((obj[quarter:half], obj[half + quarter :])) return first, secondTEST_DATA_2, PREDICT_DATA_2 = split(test_features)TEST_LABELS_2, predict_real_labels_2 = split(test_labels)
We build a Pauli feature map.This feature map is of size N qubits for data x of size N, and it corresponds to the following unitary:U=exp(∑fk(1)(x)Hk(1)+∑fk(2)(x)Hk(2)+…)H⊗n,where H⊗n designates the Hadamard transform, and H(i) is a Hamiltonian acting on i qubits according to some connectivity map, and f(i) is some classical function, typically taken as the polynomial of degree i.For example, if our data is of size 3 and we assume circular connectivity, taking Hamiltonians depending only on Z, the Hamiltonian reads as∑fk(1)(x)Hk(1)=α(x0+β)ZII+α(x1+β)IZI+α(x2+β)IIZ,∑fk(2)(x)Hk(2)=γ2(x0+ζ)(x1+ζ)ZZI+γ2(x1+ζ)(x2+ζ)IZZ+γ2(x0+ζ)(x3+ζ)ZIZ,where (α,β) and (γ,ζ) define some affine transformation on the data and correspond to the functions f(1,2).We start by defining classical functions for creating a connectivity map for the Hamiltonians and for generating the full Hamiltonian:
We first define the hyperparameters of the Pauli feature map and construct an appropriate wrapper feature map, utilizing pauli_feature_map.
# Define the parameters for the Pauli feature mapN_DIM = 2PAULIS = [[Pauli.Z], [Pauli.Z, Pauli.Z]]CONNECTIVITY = 2AFFINES = [[1, 0], [1, np.pi]]REPS = 2# Build the wrapper function for the Pauli feature mapfeature_map = lambda data, qba: pauli_feature_map( data, PAULIS, AFFINES, CONNECTIVITY, REPS, qba)
Next, the model is constructed
pauli_num_qubits = 2# Build a quantum support vector machine modelqsvm_model_pauli = QSVM(feature_map=feature_map, num_qubits=pauli_num_qubits)
The notebook demonstrated the application of the Quantum Support Vector Machine (QSVM) algorithm to supervised data classification tasks.Two datasets were analyzed using two distinct quantum feature maps: a Bloch-sphere–based encoding and a Pauli-based feature map.The Bloch-sphere mapping achieved perfect classification accuracy on a linearly separable dataset but showed only moderate performance when applied to a more complex, nonlinearly separable dataset. In contrast, the Pauli feature map successfully captured the structure of the complex dataset and yielded accurate classification results.The results emphasize a central consideration in QSVM design: the choice of quantum feature map plays a decisive role in model performance.Effective alignment between the feature map and the underlying structure of the data is essential for achieving high classification accuracy.