
Replace the fitter of StarkRamseyXYAmpScan experiment #1243

Merged

Conversation


@nkanazawa1989 nkanazawa1989 commented Aug 1, 2023

Summary

This PR replaces the fitter of the StarkRamseyXYAmpScan experiment. The current fitter puts a strong assumption on a model parameter, which makes the fit quite unstable against changes in the dephasing rate.

This PR doesn't introduce any breaking API change, and a separate release note is not necessary because the experiment has not been released yet.

Details and comments

The current fitter simultaneously fits the P1 values of the Ramsey X and Y experiments. Given that the Stark shift is a third-order polynomial of the tone amplitude x, the P1 values for X and Y are

```
Px(x) = A cos(2 pi t_S (c1 x + c2 x**2 + c3 x**3 + f_err)) + offset
Py(x) = A sin(2 pi t_S (c1 x + c2 x**2 + c3 x**3 + f_err)) + offset
```

where t_S is the fixed Stark delay and the others are fit parameters. Note that the current fitter assumes A is independent of x; however, since this is a variant of the Ramsey experiment, the P1 amplitude actually depends on x. This is because the qubit T2 may depend on frequency, or because at larger Stark amplitudes qubit heating may cause faster dephasing. In the new fitter, we directly extract the phase polynomial and perform the fit on this synthesized data. Namely,

```
poly = unwrap(arctan2(Py, Px)) ~ 2 pi t_S (c1 x + c2 x**2 + c3 x**3 + f_err)
```

Because A cancels out in this form, the fit becomes robust to the amplitude dependence of the dephasing rate. Usually we cannot obtain a reasonable model for A(x).
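As a rough, self-contained illustration of this approach (not the library's implementation; the delay, coefficients, and envelope below are made-up values), the phase extraction and cubic fit can be sketched with NumPy:

```python
import numpy as np

# Made-up "true" parameters for a synthetic data set.
t_s = 100e-9                                 # fixed Stark delay (s), assumed
c1, c2, c3, f_err = 30e6, 50e6, -20e6, 1e5   # assumed polynomial coefficients

rng = np.random.default_rng(1234)
x = np.linspace(-0.8, 0.8, 101)              # Stark tone amplitude scan
freq = c1 * x + c2 * x**2 + c3 * x**3 + f_err

# Amplitude-dependent envelope A(x): dephasing grows with |x|, so no
# constant-A model fits both sides of the scan well.
envelope = np.exp(-2.0 * np.abs(x))
px = envelope * np.cos(2 * np.pi * t_s * freq) + rng.normal(0, 0.01, x.size)
py = envelope * np.sin(2 * np.pi * t_s * freq) + rng.normal(0, 0.01, x.size)

# Synthesize the phase data; A(x) cancels inside arctan2(Py, Px).
phase = np.unwrap(np.arctan2(py, px))

# Fit phase / (2 pi t_S) as a plain cubic.  Coefficients come back in
# ascending degree: [offset, c1, c2, c3].  The constant term is only
# determined up to multiples of 1/t_S (2 pi unwrapping ambiguity).
coeffs = np.polynomial.polynomial.polyfit(x, phase / (2 * np.pi * t_s), deg=3)
```

The degree-1 through degree-3 coefficients are insensitive to the overall 2π unwrapping offset, which is part of why fitting the synthesized phase is robust.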

For example, the experiment data below shows the difference between the fit on the poly data and the raw Px, Py data (the parameters are actually fit on the poly data; the Px, Py data are just visualized with the fitted parameters).

[Figure: fit results comparing the phase (poly) data fit against the raw Px, Py data]

As you can see, the Px, Py data damp quickly at positive amplitudes, and the envelope of the trigonometric function is asymmetric with respect to 0. In this situation the current fitter does not work well; in contrast, the new phase fitter works nicely.


@wshanks wshanks left a comment


This is looking good!

```
for m_id, (m_name, model_spec) in enumerate(self.options.data_subfit_map.items()):
    if model_spec.items() <= metadata.items():
        break
else:
```
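For context, the for/else pattern above relies on the set-like subset comparison that dict_items supports; here is a toy standalone sketch (the map and metadata are made up, not the class's actual attributes):

```python
# Each data_subfit_map entry holds the metadata key/values identifying one
# series; a circuit matches when its metadata is a superset of that spec.
data_subfit_map = {
    "X": {"series": "X"},
    "Y": {"series": "Y"},
}

def match_series(metadata):
    """Return (model_id, name) of the first matching spec, or None."""
    for m_id, (m_name, model_spec) in enumerate(data_subfit_map.items()):
        if model_spec.items() <= metadata.items():  # subset test on items
            return m_id, m_name
    return None  # the for/else branch in the real code handles this case

print(match_series({"series": "Y", "xval": 0.1}))  # → (1, 'Y')
```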
wshanks (Collaborator):

Is this change necessary? It makes some behavioral changes which I thought might be needed by the analysis when I first started reading the PR, but now that I have read it all I don't think so.

Before this change, there needed to be one model for each series to be fit in the experiment. With this change, there can be one model that gets reused for multiple series that are distinguished by their metadata and the filtering of the data_subfit_map dictionary. That works for this section because model_id gets set to the index within data_subfit_map. However, below, the model_id is used to index into self._models, so there actually does still need to be one model per series. I think it would be better to continue to iterate over self._models here rather than data_subfit_map in that case. Otherwise, there is the potential for confusion if an analysis author does not write the models list and the data_subfit_map keys in the same order.

This change now assigns all circuits without a match to the first model, rather than only doing that when there is only one model. I think it is assumed that all circuits correspond to a model, so this is probably fine. Though more enforcement of the assumptions could be done to check for user error. For example, I think the first model should only be used when there is a single model and no data_subfit_map. If there are multiple models, or a data_subfit_map with a single model, and a circuit does not match any model, I think it should be a user error (or that circuit could be ignored; but BatchExperiment could be used to batch unrelated circuits and have them filtered out before getting to CurveAnalysis, so I don't know that we need to allow for unfit circuits in the experiment data).

nkanazawa1989 (Collaborator, Author):

one model that gets reused for multiple series that are distinguished by their metadata and the filtering of the data_subfit_map dictionary.

This is not correct, because the new fit model only works with the phase data, while what the _run_data_processing method generates is P1 data of the (X, Y) quadratures. When I assign model_id (0, 1) to them, the fitter or initial-guess subroutine may receive unexpected numbers, because they filter the data by this index. We cannot discard the X, Y data because they are used to create the figure later.

nkanazawa1989 (Collaborator, Author):

1038bd8 is an alternative implementation, in which we let the curve analysis base class's _run_data_processing classify the data based on the fit models, which yields a false classification in this analysis. Then the Ramsey fitter subclass re-classifies the data in its format method (or it could override _run_data_processing as well).

wshanks (Collaborator):

I am having some trouble matching up your comments to mine and to the code changes, but I think the way you changed things is okay.

one model that gets reused for multiple series that are distinguished by their metadata and the filtering of the data_subfit_map dictionary.

Here I was commenting on what could be possible, not specifically on what you were doing, so that was confusing. I meant that by matching on the data_subfit_map, the code could allow using one model for multiple series. For example, RamseyXY could use a single sinusoidal model and map the data to two series to fit X and Y. I didn't mean that here the X and Y data were getting mapped to a single model that fit their phase.

This 1038bd8 is the alternative implementation, in which we let the curve analysis baseclass _run_data_processing classify the data based on the fit models, which yields false classification in this analysis. Then ramsey fitter subclass does re-classification in its format method

I thought the original PR also misclassified the XY series and then fixed it in _format_data?


@wshanks wshanks Oct 30, 2023


Regarding overriding _run_data_processing, let's review the current processing chain for the data:

  1. Call _run_analysis() on ExperimentData.
  2. Call CurveAnalysis._run_data_processing() on ExperimentData:
     i. Optional filtering of the data (I think only used by CrossResonanceHamiltonianAnalysis).
     ii. Map each ExperimentData.data() entry to a model ID.
     iii. Save metadata key/values with each measurement.
     iv. Run the data processor (trained earlier) on all ExperimentData.data().
  3. Call CurveAnalysis._format_data() on the ScatterTable returned by _run_data_processing().
     • In the base class, this mostly just averages the data and labels it formatted.
  4. Call CurveAnalysis._run_curve_fit() on the ScatterTable returned by _format_data().
     • This passes the full ScatterTable to CurveAnalysis._generate_fit_guesses().
     • It then groups the data by model ID and passes each group's yval, yerr, and xval to the model.

So one of the first steps is to map each ExperimentData.data() entry to a model ID. For the simple case where each measurement maps one-to-one to a model, this is fine, but here we see that it might not be general enough, since we want to combine two measurements into one formatted data point for fitting. We could just override _format_data to produce the combined points as the formatted points, but we want to keep the XY formatted points as well for the plotting code. We solve this by cheating a little and assigning an "n/a" model ID to the XY points, because the fitting can skip those while the plotting goes by the model name instead of the model ID.

Some things to think about:

  1. Assignment of model ID could be delayed until the data is formatted, rather than applied to the processed data. This is inconvenient to do because the mapping is based on the result metadata, and it is better for the data frame to hold an integer column than to keep looking at dictionaries of metadata. As long as the mapping is like the one here, where non-overlapping sets of points get mapped to one curve, it is clean. A case where overlapping sets of points got mapped to new fit curves would be more problematic.
  2. Classification of curves to be plotted could be done separately from data to be fit. Then the XY data could be captured as formatted data into the scatter table without mapping it to a model ID.
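Idea 1 could look roughly like the following hypothetical sketch using pandas; the column names and the classify_models helper are illustrative only, not qiskit-experiments API:

```python
import pandas as pd

# Processed data enters the table with no model assignment (NA model_id).
processed = pd.DataFrame({
    "xval": [0.1, 0.1, 0.2, 0.2],
    "yval": [0.9, 0.1, 0.7, 0.3],
    "name": ["X", "Y", "X", "Y"],
    "model_id": pd.array([pd.NA] * 4, dtype="Int64"),
})

def classify_models(table, model_names):
    """Assign model_id by series name after formatting; unmatched stay NA."""
    out = table.copy()
    for m_id, m_name in enumerate(model_names):
        out.loc[out["name"] == m_name, "model_id"] = m_id
    return out

formatted = classify_models(processed, ["X", "Y"])
# formatted["model_id"] is now [0, 1, 0, 1]
```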

nkanazawa1989 (Collaborator, Author):

I did a cleanup of the scatter table in #1319. The point of the refactoring is model_name -> name and model_id -> class_id. In the raw category data, the name and ID don't need to be consistent with the fit model. The PR is rebased and the analysis class is reimplemented in bdd83d1.

test/curve_analysis/test_baseclass.py (outdated, resolved)
```
tmp[:, columns.index("xval")] = amplitudes
tmp[:, columns.index("yval")] = unwrapped_phase
tmp[:, columns.index("yerr")] = phase_s
tmp[:, columns.index("model_name")] = f"F{direction}"
```
wshanks (Collaborator):

What is the F for?

nkanazawa1989 (Collaborator, Author):

F stands for frequency. There are Xpos, Xneg, Ypos, Yneg, Fpos, and Fneg in the scatter table, which will be saved in an artifact once we activate it.

wshanks (Collaborator):

Okay, it might be worth mentioning in a docstring somewhere for clarity. Why "frequency" rather than "phase"?

nkanazawa1989 (Collaborator, Author):

I chose frequency because this is the quantity of interest, but I realized that we can keep the phase internally and convert it into frequency at the visualization step. This becomes θpos/θneg now.

wshanks (Collaborator):

Oh, right, the dependence of frequency on amplitude is complicated but the dependence of frequency on phase at constant amplitude is simple. Maybe it would be better to divide by 2 * pi * ts in _format_data and store the frequency rather than the phase? That might be better for a user wanting to work with the scatter table after running the analysis.

nkanazawa1989 (Collaborator, Author):

That makes sense :) done in 994642f
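The conversion settled on here is a one-liner; a minimal sketch, assuming a fixed Stark delay t_s and variable names of my choosing:

```python
import numpy as np

t_s = 100e-9                                   # fixed Stark delay (s), assumed
unwrapped_phase = np.array([0.0, 0.63, 1.26])  # rad, example data

# phase = 2 * pi * t_s * f, so dividing by 2 * pi * t_s stores frequency
# in the scatter table instead of phase.
freq = unwrapped_phase / (2 * np.pi * t_s)
```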

test/library/characterization/test_stark_ramsey_xy.py (outdated, resolved)

@wshanks wshanks left a comment


I have a few minor comments, but I think this is just about ready.

I feel like what the current class is doing is pushing the limits of how CurveAnalysis is set up to work and would like to see it fit a little better but I wouldn't block the PR just for that.

```
for direction in ("pos", "neg"):
    rows = (curve_data.series == series) & (curve_data.direction == direction)
    curve_data.loc[rows, "model_name"] = f"{series}{direction}"
    # Assign random model id which will be invalidated later.
```
wshanks (Collaborator):

This works, but it feels like it is breaking the rules of the system. I wonder if the data processing steps could be adjusted to account for this. Looking at the steps I listed out in an earlier comment, I think the model classification could occur later. Since _run_data_processing() saves the metadata for each measurement, it could group data by this metadata to do the _format_data averaging without assigning a model ID (model_id could be NA for everything at this point). Then there could be a _classify_models hook that matches the models to the formatted data. In the case of StarkRamseyXYAmpScanAnalysis, it could call super()._format_data() and then add in the phase data. It could then either override _classify_models(), or it could use different metadata for the phase data than for the XY data so that the default _classify_models() only mapped the phase data to models.

nkanazawa1989 (Collaborator, Author):

Thanks. Delaying the model mapping makes sense. Done in bdd83d1 with a slightly different approach.

nkanazawa1989 and others added 11 commits November 15, 2023 11:05
- Remove context of fit model from data name and index; model_name -> name, model_id -> class_id
- Remove extra metadata from the scatter table
data_subfit_map becomes the single source of the data name. This means data is classified without any notion of the fit model. The index mapping is made in the formatter function through the identity between data and model names.
Co-authored-by: Will Shanks <[email protected]>
Co-authored-by: Will Shanks <[email protected]>

@wshanks wshanks left a comment


I had one question about phase versus frequency naming but otherwise I think this is ready to merge after #1319 merges and this is rebased on main.



wshanks commented Dec 19, 2023

I was just looking over this again. I think it is still in good shape (besides one small wording suggestion). It just needs to be updated after #1319 is merged. That should take care of the test failure related to class ID numbers changing by 1.

github-merge-queue bot pushed a commit that referenced this pull request Dec 22, 2023
### Summary

This PR modifies `ScatterTable`, which was introduced in #1253.

This change resolves some code issues in #1315 and #1243.

### Details and comments

In the original design, `ScatterTable` is tied to the fit models, and its
columns contain `model_name` (str) and `model_id` (int). Also, the fit
module only allows three data categories: "processed", "formatted", and
"fitted". However, #1243 breaks this assumption; namely, the
`StarkRamseyXYAmpScanAnalysis` fitter defines two fit models which
are not directly mapped to the results data. The data fed into the
models is synthesized by consuming the input results data. The fitter
needs to manage four data categories: "raw", "ramsey" (raw results),
"phase" (synthesized data for fit), and "fitted".

This PR relaxes the tight coupling of data to the fit model. In the above
example, the "raw" and "ramsey" category data can fill the new fields
`name` (formerly model_name) and `class_id` (formerly model_id) without
indicating a particular fit model. Usually, raw category data is just
classified according to the `data_subfit_map` definition, and the map
doesn't need to match the fit models. The connection to fit models is
only introduced in a particular category defined by the new option value
`fit_category`. This option defaults to "formatted", but the
`StarkRamseyXYAmpScanAnalysis` fitter would set "phase" instead. Thus
fit model assignment is effectively delayed until the formatter
function.

Also, the original scatter table was designed to store all circuit
metadata, which causes some problems in data formatting, especially when
it tries to average the data over the same x value in a group.
Non-numeric data is averaged by the builtin set operation, but this
assumes the metadata value is a hashable object, which is not generally
true. This PR therefore drops all metadata from the scatter table. Note
that the only metadata fields important for the curve analysis are the
ones used for model classification (classifier fields); the other fields
just decorate the table with unnecessary memory footprint. The
classifier fields and `name` (`class_id`) are somewhat duplicated
information. This implies the `name` and `class_id` fields are enough
for end users to reuse the table data for further analysis once it is
saved as an artifact.

---------

Co-authored-by: Will Shanks <[email protected]>
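A toy sketch of the `fit_category` idea described in the commit message above (table contents are made up; the point is only that the fitter selects rows by category, so extra categories like "ramsey" can coexist without touching the fit models):

```python
import pandas as pd

table = pd.DataFrame({
    "category": ["ramsey", "ramsey", "phase", "phase"],
    "name": ["Xpos", "Ypos", "FREQpos", "FREQpos"],
    "xval": [0.1, 0.1, 0.1, 0.2],
    "yval": [0.90, 0.10, 0.11, 0.46],
})

# The option defaults to "formatted"; StarkRamseyXYAmpScanAnalysis would
# set "phase" so only the synthesized data reaches the fitter.
fit_category = "phase"
fit_data = table[table["category"] == fit_category]
# fit_data has 2 rows; the "ramsey" rows remain available for plotting
```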

@wshanks wshanks left a comment


Looks good!

@wshanks wshanks added this pull request to the merge queue Jan 10, 2024
Merged via the queue into qiskit-community:main with commit ab40c5a Jan 10, 2024
11 checks passed
nkanazawa1989 added a commit to nkanazawa1989/qiskit-experiments that referenced this pull request Jan 17, 2024
nkanazawa1989 added a commit to nkanazawa1989/qiskit-experiments that referenced this pull request Jan 17, 2024