Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Content Audit: "address_2" in Sandy's DB #19379

Open
Agile6MSkinner opened this issue Oct 1, 2024 · 18 comments
Open

Content Audit: "address_2" in Sandy's DB #19379

Agile6MSkinner opened this issue Oct 1, 2024 · 18 comments
Assignees
Labels
Content audit current sprint Facilities Facilities products (VAMC, Vet Center, etc) Needs refining Issue status sitewide

Comments

@Agile6MSkinner
Copy link

Agile6MSkinner commented Oct 1, 2024

Background

The Google directions link is failing to find directions for some facilities that have data in Line 2 of the address. The assumption is that this results from data passed in "address_2" that contain certain special characters (which ones is currently unknown). In order to test, scope and evaluate possible solutions, we need to know the volume of special characters in in "address_2", across facility types.

Slack Thread

Technical notes

Drupal concatenates Address 1 & 2 into a single field when data is migrated in from Facilities API. Likely we will need to query the Facilities API, to get a complete sense of the various address_2 values.

Per Lighthouse (thread with Adam here), Lighthouse does not exclude any special characters from either address_1 or address_2.

Some Google Directions links are based on:

  • Drupal GraphQL queries in Content Build (wouldn't have access to line 1 and 2 separately)
  • Facilities API queries in Vets Website

While in theory it would save time to independently validate which of these characters causes the issue, there's also a case to be made that there could be others that are broken because of a different character being used. We can assume the address_2 field does not block any special characters, but Jill will reach out to LH to verify.

Acceptance Criteria

  • Determine number of locations that use address_2
  • List special characters that are found in address_2 (not alpha, not numeric)
@Agile6MSkinner Agile6MSkinner added CMS Team CMS Product team that manages both editor exp and devops Content audit Facilities Facilities products (VAMC, Vet Center, etc) Needs refining Issue status sitewide labels Oct 1, 2024
@Agile6MSkinner Agile6MSkinner assigned EWashb and unassigned EWashb Oct 1, 2024
@ian-sears
Copy link

Adding https://va-gov.atlassian.net/browse/VAHELP-8087 for @mmiddaugh as a new, additional reference that just came in this afternoon as well.
This CMS Jira Help Desk ticket is from Barbara Kuhn regarding the U.S. Virgin Islands Vet Center (USVI)
Barb writes:
"Quick question on the Mapbox location showing for USVI: Although the site isn’t yet published, I am seeing in Preview mode that the visual map display isnt what I see when I manually check the address in other search engines. "
Barb checked VAST and there is no option for USVI, Just VI
cc'ing @mmiddaugh for visibility

@jilladams
Copy link
Contributor

Technical notes from the slack thread, to factor into next steps ticket

  • For VAMCs, address 2 comes from Lighthouse. For VBAs, it comes from Sandy's DB (where Michelle could help us with data cleanup, as owner)
  • We are interested in how to align the behavior of address 2, to make better / not broken Google Maps links. Sometimes street address is in address_2, so we cannot just ignore the field. Instead, we want to understand how it's used today, in order to clean it up
  • Technically, we could evaluate the contents of address_2 when building the Google maps link. e.g. don't include address 2 if the value includes words like "floor, floors, basement" etc.

@jilladams jilladams added Drupal engineering CMS team practice area and removed CMS Team CMS Product team that manages both editor exp and devops labels Oct 2, 2024
@jilladams
Copy link
Contributor

jilladams commented Oct 3, 2024

Asked LH if they validate against any special chars in address_2 here: They do not.

@Agile6MSkinner Agile6MSkinner removed the Drupal engineering CMS team practice area label Oct 16, 2024
@mmiddaugh
Copy link
Contributor

mmiddaugh commented Oct 16, 2024

Results of audit for VBAs

  • 284 VBA locations in the DB have populated address_2 fields
    • of these, 96 have special characters (not alpha, not numeric) in address_2 (see table)

Several special characters were also found in address_1, plus 2 not seen in address_2: ' and ß

Special character Instances Example facility address_1 address_2
. 3 vba_329 477 Michigan Avenue Patrick V. McNamara Federal Building, 12th Floor
one or more , 72 vba_304h 600 Mt. Pleasant Avenue East Campus, Building 5 Lower Level
one or more - 27 vba_314o 700 24th Street Building 8204-2, Room 210
& 8 vba_317u 3000 NW 83rd St Building R, Room 110 - Veterans & Military Success Services
: 2 vba_311b Dr Hitzelberger Straße, 66849 Landstuhl REG-MED Center Unit:33100 Attn: Department of Veterans Affairs APO, AE 09180
/ 9 vba_318p 2843 Normandy Street Soldier Support Center/VA Outbase office
multiple vba_311 83-2 Dodu-ri, Paengseong-eup, Pyeongtaek-si, Gyeonggi-do, South Korea Veterans Affairs Office/BDD USAG Camp Humphreys Unit #15228 Attention: Building P-6400 APO, AP 96271-5228

@jilladams
Copy link
Contributor

Amazing, thanks MIchelle!

Since Address_2 is used across all our facility types coming from LH, we will still need to do this audit for VAMC, Vet Center (and NCA? do we care about them yet?). In theory that's possible to do in the Drupal UI, but I still think a DB query will be the fastest way to get it done, which would be Drupal engineering work.

@jilladams
Copy link
Contributor

Wait, I think I'm wrong, it's not possible to do in the Drupal UI, or in the Drupal DB. DaveP noted before: we smoosh address_1 and address_2 into a single Address field in Drupal, when we import. I think we have to ask LH to help with auditing address_2. @omahane does that sound right to you?

@jilladams
Copy link
Contributor

(This ticket title is limited to Sandy's DB, but I don't think that was the spirit of trying to figure out special chars in address_2. I could be lost in the sauce.)

@jilladams
Copy link
Contributor

Conversation with Adam re: address_2 in LH: https://dsva.slack.com/archives/C02BTJTDFTN/p1727970129537389

@jilladams
Copy link
Contributor

  • VAMC - has 1 address field
  • Vet Centers - has 1. First = address_1. Second = address_2 + address_3
  • VBA - 1 address field

@jilladams
Copy link
Contributor

Sitewide can GET from LH API the contents of address_1 and address_2 (and address_3), but we can't do it in Drupal bc of the way the Drupal migration is smooshing the field data.

@jilladams
Copy link
Contributor

Michelle noted she can also pull addresses out of the VAST report. In that case, will include only have VHA + Vet Centers.

Adam noted that for NCA: it depends if the cemetery is national or state.

  • National - most fields from CDW.
  • State - most fields from NCA-hosted XML files.
  • Sometimes Common Name is in Address1, at other times, in Address 2. Sometimes: NCA populates an alternative facility name in address1 and put the primary address in address2.

@jilladams
Copy link
Contributor

https://dsva.slack.com/archives/C06EK4NLGDQ/p1729545788290989

here's the list of VAST locations (sharepoint). It includes VA health and Vet Center facilities (plus mobiles) which are marked as Active or Temp. Deactivated - consistent with LH API.

  • There are 1792 total listings.
    • 4 have blank Street Address fields - CWTs, as previously identified
    • 1146 have blank Street Address-Suite (Address 2) fields.
    • I highlighted fields with "non-traditional" values (which could be locally appropriate but may be worth checking) - i.e., Street name without a number, Intersection, Building name, "Lot"
      • 16 in Street Address
      • 9 in Street Address-Suite

@jilladams
Copy link
Contributor

Things we need to figure out:

  1. Right now we pull in addresses 1, 2, and 3, and store them in Drupal in a variety of ways across content types. If we want to standardize, we will need to do migrations.
  2. If the goal is to get better address structure in our Google Maps directions links, can we do that without having to migrate / handle addresses differently? (is there a short pole?) To Christian's point: GIGO. If addresses 1 & 2 are used willy nilly, it will be hard to programmatically clean that up and send Google better data.

We need to do some smart thinking about how we might go about standardizing. This feels like a content modeling review, based on this audit. @davidmpickett does that sound right to you? If so, we could repurpose this ticket and refine it as content modeling, since the audit sort of happened async in the comments.

@davidmpickett
Copy link
Contributor

@jilladams Content modeling seems incredibly premature here. That is a way of iterating on solution architecture. This feels like it is still early in Discovery. Here are some questions I would need answered before content modeling.

  • What is the scope of the problem we’re trying to solve?
  • Is it about how all addresses are stored in Drupal? Are we distinguishing between editor-entered addresses and those migrated in via API?
  • Is the Google directions link the only thing that is broken on our products because of address data formatting? Are there other indicators of success?
  • What aspects of our products are dependent on the current formatting and might break if we switch to a new model?
  • Is this project a Q4 priority?

Address formatting and validation is also way more complex than phone number validation. Comparing this to the phone number standardization epic, that was a clearly defined subset of the phone number problem space: implement better guardrails in Drupal so that phone numbers are formatted consistently and can be fed in to the Design System component. We also already had a pattern in place that we were adopting, vs this where it’s not clear that formatting is even the issue

@mmiddaugh
Copy link
Contributor

mmiddaugh commented Oct 23, 2024

Summary of further testing observations

  • Locations with vague street address (Corner of Lamont Street and Veterans Way or Tobyhanna Army Depot) can have Google map directions links which behave as expected
  • Google map directions links for a location may be different when directing from the Facility Locator vs the facility page and one may behave as expected while the other does not (see vha_693GC)
  • Google map directions may behave as expected, even when Street Address 2 is populated (see vc_0325V)
  • another example of FL disregarding state parameter - if St. Croix or St. Croix, VI is entered as parameter, search results are located in Kentucky but zip code as the parameter returns locations in the Virgin Islands
"Did you mean" prompt image.png

About attached Data file

  • Column labels for VHA and Vet Center locations include a column labeled "Street Address - Building Complex name" as available in the VAST export.
  • VBA does not have this field so follows separately.

@jilladams
Copy link
Contributor

From refinement:

  • Right now: migrations from LH don't construct Address in Drupal in such a way that we can easily programmatically build better Google links.
  • We could dive deep and modify the migration(s) and how we store address. This has a very big scope of impact, across facility types, and may not be a priority to go that deep.
  • We decided that we need to review occurrences / metrics to determine priority and next steps.

@mmiddaugh
Copy link
Contributor

Clickthrough for "Get directions to Google maps" link in the month of September 2024

  • VAMC pages: 27,838 times
  • Vet Center pages: 1,890
  • Regional office pages: 124
  • Facility Locator: 20,686

@Agile6MSkinner
Copy link
Author

Agile6MSkinner commented Dec 11, 2024

Update: We have identified and solved the problems with Guam, Puerto Rico and USVI in #19905.

Still outstanding:

Google map directions links for a location may be different when directing from the Facility Locator vs the facility page and one may behave as expected while the other does not (see vha_693GC)

In this case we found that the saddr worked when it was set to the local town (directly from the FL), but not when it was set to Current+Location (the facility page).

Another contributing issue has been identified in the form of parameter-breaking characters in addresses being passed in URLs to Google Maps, specifically "&". While there are ways to defensively address this, it may not be worth the engineering effort.

There are other situations where editors have just added too much information to the address_2 field (e.g. multiple buildings, floors, suites, etc.) and Google Maps is not able to effectively extract the target address.

After speaking with @mmiddaugh at length, we are going to approach address issues identified in this ticket as follows:

1 - Michelle will look through Sandy's DB/VAST for addresses containing "&" and replace with the word "and" or reformat in another way. If we find that this is a widespread issue and not just a few instances, we can cut a ticket to replace "&" with "%26" in our links to Google Maps. This addresses the "&" issue.
2 - Michelle will look for through Sandy's DB for instance of address_2 that are long and contain superfluous information like multiple floors and suites. She will work with the editors to get an address that is more compatible with Google Maps.
3 - We will look more closely at new address requests to make sure that they do not have either of these issues
4 - Product investigate and decision on whether we should use Current+Location as the saddr in our Maps URLs

Related: The URL is sometimes generated with saddr set to "Undefined." I have cut a ticket (#20080) to investigate this, since it causes the result to attempt directions from the closest establishment with "Undefined" in the name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Content audit current sprint Facilities Facilities products (VAMC, Vet Center, etc) Needs refining Issue status sitewide
Projects
None yet
Development

No branches or pull requests

6 participants