changes for new elasticsearch version #135

Closed
wants to merge 26 commits into from
26 commits
561aeeb
upgrade Puma
wkdewey May 19, 2022
deb704a
add specific version of puma to avoid security warnings
wkdewey May 19, 2022
065857a
update .ruby-version
wkdewey May 25, 2022
38b02b7
update to later version of puma
wkdewey May 25, 2022
e763bb1
another round of gem updates
wkdewey May 25, 2022
2769319
add facet for matching nested facet
wkdewey May 24, 2022
3b33d4a
add filter for matching nested facet
wkdewey May 24, 2022
fa5c245
change split character, add missing comma
wkdewey May 24, 2022
21310ef
parse the array for matching nested fields
wkdewey May 26, 2022
4656dc6
change how compound facet name is parsed
wkdewey May 26, 2022
2b3af8e
use facet name as agg name
wkdewey May 26, 2022
100ac90
change query to filter
wkdewey May 26, 2022
3dd6608
fix nested filter aggregation so it doesn't cause 400 error
wkdewey May 27, 2022
fafb3a6
check for deeper nesting of buckets
wkdewey May 31, 2022
2ade65b
Change separator
wkdewey Jun 1, 2022
3c223e4
Fix parsing and query for filter matching
wkdewey Jun 1, 2022
ef22307
rewrite filtered aggregation to be either nested or not
wkdewey Jun 2, 2022
16a490d
filtering on a single item can either be nested or not
wkdewey Jun 2, 2022
b038be0
update config for server
wkdewey Sep 26, 2022
6eaa38b
revise query to match both the facet and the filter
wkdewey Oct 19, 2022
88a8f80
use reverse nested agg for correct item count
wkdewey Oct 20, 2022
dd4bacd
used doc_count from reverse nested if it exists
wkdewey Oct 20, 2022
bd739e0
change key for new elasticsearch version
wkdewey Oct 21, 2022
f0c3124
change order query to avoid deprecated '_term'
wkdewey Oct 24, 2022
7441a96
gitignore master key
wkdewey Oct 26, 2022
255f9db
add basic auth to elasticsearch requests
wkdewey Oct 26, 2022
2 changes: 2 additions & 0 deletions .gitignore
@@ -48,3 +48,5 @@ bower.json
.byebug_history

.DS_Store

/config/master.key
2 changes: 1 addition & 1 deletion .ruby-gemset
@@ -1 +1 @@
api
api-v2
2 changes: 1 addition & 1 deletion .ruby-version
@@ -1 +1 @@
ruby-2.6.8
ruby-2.7.6
2 changes: 1 addition & 1 deletion Gemfile
@@ -11,7 +11,7 @@ gem 'rails', '~> 6.0.2'
# Use sqlite3 as the database for Active Record
gem 'sqlite3'
# Use Puma as the app server
gem 'puma', '~> 3.7'
gem 'puma', '>= 5.6'
# Build JSON APIs with ease. Read more: https://github.com/rails/jbuilder
# gem 'jbuilder', '~> 2.5'
# Use Redis adapter to run Action Cable in production
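The Gemfile now pins Puma with `>= 5.6` instead of `~> 3.7`. As a minimal sketch of what each constraint admits, RubyGems' own `Gem::Requirement` can check concrete versions. 3.12.6 and 5.6.4 come from `Gemfile.lock`; 6.0.0 stands in for a later major release, and the `~> 5.6` line is an illustrative alternative, not part of this PR.

```ruby
require "rubygems"

# Constraints to compare. Gem::Requirement and Gem::Version ship with RubyGems.
OLD_PUMA      = Gem::Requirement.new("~> 3.7") # >= 3.7 and < 4.0
NEW_PUMA      = Gem::Requirement.new(">= 5.6") # no upper bound
PESSIMISTIC_5 = Gem::Requirement.new("~> 5.6") # illustrative: >= 5.6 and < 6.0

# True when the given version string satisfies the requirement.
def allowed?(requirement, version)
  requirement.satisfied_by?(Gem::Version.new(version))
end

%w[3.12.6 5.6.4 6.0.0].each do |v|
  puts format("%-7s old=%-5s new=%-5s ~>5.6=%s",
              v, allowed?(OLD_PUMA, v), allowed?(NEW_PUMA, v), allowed?(PESSIMISTIC_5, v))
end
```

The open-ended `>= 5.6` lets `bundle update` move to a new major Puma release, while `~> 5.6` would keep updates within the 5.x series.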
7 changes: 4 additions & 3 deletions Gemfile.lock
@@ -69,7 +69,7 @@ GEM
globalid (1.0.0)
activesupport (>= 5.0)
http-accept (1.7.0)
http-cookie (1.0.4)
http-cookie (1.0.5)
domain_name (~> 0.5)
i18n (1.10.0)
concurrent-ruby (~> 1.0)
@@ -96,7 +96,8 @@ GEM
nokogiri (1.13.6)
mini_portile2 (~> 2.8.0)
racc (~> 1.4)
puma (3.12.6)
puma (5.6.4)
nio4r (~> 2.0)
racc (1.6.0)
rack (2.2.3)
rack-test (1.1.0)
@@ -168,7 +169,7 @@ DEPENDENCIES
bootsnap
byebug
listen (>= 3.0.5, < 3.2)
puma (~> 3.7)
puma (>= 5.6)
rails (~> 6.0.2)
rest-client (>= 2.1.0.rc1, < 2.2)
spring
3 changes: 2 additions & 1 deletion app/controllers/application_controller.rb
@@ -3,7 +3,8 @@
class ApplicationController < ActionController::API

def post_search(json, error_method=method(:display_error))
res = RestClient.post("#{ES_URI}/_search", json.to_json, { "content-type" => "json" })
auth_hash = { "Authorization" => "Basic #{Base64::encode64("#{ES_USER}:#{ES_PASSWORD}")}" }
res = RestClient.post("#{ES_URI}/_search", json.to_json, auth_hash.merge({ "content-type" => "json" }))
raise
return JSON.parse(res.body)
rescue => e
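`post_search` now sends an HTTP Basic `Authorization` header built with `Base64::encode64`. That method appends a trailing newline and wraps its output every 60 encoded characters, so `Base64.strict_encode64` is the usual choice for header values. A minimal sketch of the header construction, with placeholder credentials standing in for `ES_USER` and `ES_PASSWORD`:

```ruby
require "base64"

# Build an HTTP Basic Authorization header.
# strict_encode64 emits a single line with no trailing "\n".
def basic_auth_header(user, password)
  { "Authorization" => "Basic #{Base64.strict_encode64("#{user}:#{password}")}" }
end

# Same merge shape as post_search; "elastic"/"changeme" are placeholders.
headers = basic_auth_header("elastic", "changeme").merge("content-type" => "json")
puts headers["Authorization"] # prints Basic ZWxhc3RpYzpjaGFuZ2VtZQ==
```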
117 changes: 108 additions & 9 deletions app/services/search_item_req.rb
@@ -51,6 +51,8 @@ def build_request

# add bool to request body
req["query"]["bool"] = bool
# uncomment below line to log ES query for debugging
# puts req.to_json()
return req
end

@@ -72,7 +74,7 @@ def facets
dir = "desc"
if @params["facet_sort"].present?
sort_type, sort_dir = @params["facet_sort"].split(@@filter_separator)
type = "_term" if sort_type == "term"
type = "term" if sort_type == "term"
dir = sort_dir if sort_dir == "asc"
end

@@ -83,8 +85,7 @@ def facets
aggs = {}
Array.wrap(@params["facet"]).each do |f|
# histograms use a different ordering terminology than normal aggs
f_type = type == "_term" ? "_key" : "_count"

f_type = (type == "term") ? "_key" : "_count"
if f.include?("date") || f[/_d$/]
# NOTE: if nested fields will ever have dates we will
# need to refactor this to be available to both
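The sort changes replace the `_term` order key, which Elasticsearch deprecated in favour of `_key`, and derive `f_type` from the `facet_sort` parameter. A minimal sketch of that mapping as a standalone function, assuming `|` as the separator (the tests pass values such as `"term|desc"`, but the value of `@@filter_separator` is not shown here) and the defaults `_count` and `desc`:

```ruby
# Assumption: stands in for @@filter_separator.
SEPARATOR = "|"

# Map facet_sort ("term|asc", "count|desc", "term", nil) to a terms-agg order hash.
def facet_order(facet_sort)
  key = "_count"
  dir = "desc"
  unless facet_sort.to_s.empty?
    sort_type, sort_dir = facet_sort.split(SEPARATOR)
    key = "_key" if sort_type == "term" # "_key" replaces the deprecated "_term"
    dir = sort_dir if sort_dir == "asc" # only "asc" overrides the default
  end
  { key => dir }
end

# Hypothetical helper: an ordinary (non-nested) terms facet using that order.
def terms_facet(field, facet_sort, size: 20)
  { field => { "terms" => { "field" => field, "order" => facet_order(facet_sort), "size" => size } } }
end

p terms_facet("title", "term|asc")
```

Because `dir` only changes when the parameter says `asc`, both `"term|desc"` and a bare `"term"` sort keys in descending order.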
@@ -98,13 +99,76 @@ def facets
aggs[f] = {
"date_histogram" => {
"field" => field,
"interval" => interval,
"calendar_interval" => interval,
"format" => formatted,
"min_doc_count" => 1,
"order" => { f_type => dir },
}
}
# if nested, has extra syntax
#nested facet, matching on another nested facet

elsif f.include?("[")
# will be an array including the original, and an alternate aggregation name


options = JSON.parse(f)
original = options[0]
agg_name = options[1]
facet = original.split("[")[0]
# may or may not be nested
nested = facet.include?(".")
if nested
path = facet.split(".").first
end
condition = original[/(?<=\[).+?(?=\])/]
subject = condition.split("#").first
predicate = condition.split("#").last
aggregation = {
# common to nested and non-nested
"filter" => {
"term" => {
subject => predicate
}
},
"aggs" => {
agg_name => {
"terms" => {
"field" => facet,
"order" => {f_type => dir},
"size" => size
},
"aggs" => {
"field_to_item" => {
"reverse_nested" => {},
"aggs" => {
"top_matches" => {
"top_hits" => {
"_source" => {
"includes" => [ agg_name ]
},
"size" => 1
}
}
}
}
}
}
}
}
#interpolate above hash into nested query
if nested
aggs[agg_name] = {
"nested" => {
"path" => path
},
"aggs" => {
agg_name => aggregation
}
}
else
#otherwise it is the whole query
aggs[agg_name] = aggregation
end
elsif f.include?(".")
path = f.split(".").first
aggs[f] = {
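The new branch accepts a compound facet: a JSON array whose first element has the form `field[subject#predicate]` and whose second element is the aggregation name. A minimal sketch of that parsing and of the resulting filter-then-terms aggregation, using hypothetical field names (`person.name`, `person.role`, `format`, `category`). It omits the `top_hits` sub-aggregation for brevity and, unlike the diff, adds `reverse_nested` only under a `nested` wrapper, since Elasticsearch documents that `reverse_nested` must be defined inside a `nested` aggregation.

```ruby
require "json"

# Split '["person.name[person.role#author]", "person_author"]' into its parts,
# following the same steps as the diff.
def parse_compound_facet(raw)
  original, agg_name = JSON.parse(raw)
  facet = original.split("[").first
  condition = original[/(?<=\[).+?(?=\])/] # text between the brackets
  {
    facet: facet,
    agg_name: agg_name,
    path: facet.include?(".") ? facet.split(".").first : nil, # nil if not nested
    subject: condition.split("#").first,
    predicate: condition.split("#").last
  }
end

# Filter on subject == predicate, then bucket the facet field under agg_name.
def filtered_facet_agg(parts, order: { "_count" => "desc" }, size: 20)
  terms = { "terms" => { "field" => parts[:facet], "order" => order, "size" => size } }
  # Count parent items per bucket; only valid inside a nested agg.
  terms["aggs"] = { "field_to_item" => { "reverse_nested" => {} } } if parts[:path]
  filtered = {
    "filter" => { "term" => { parts[:subject] => parts[:predicate] } },
    "aggs" => { parts[:agg_name] => terms }
  }
  return { parts[:agg_name] => filtered } unless parts[:path]

  { parts[:agg_name] => { "nested" => { "path" => parts[:path] },
                          "aggs" => { parts[:agg_name] => filtered } } }
end

puts JSON.pretty_generate(
  filtered_facet_agg(parse_compound_facet('["person.name[person.role#author]", "person_author"]'))
)
```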
@@ -115,7 +179,7 @@ def facets
f => {
"terms" => {
"field" => f,
"order" => { type => dir },
"order" => {f_type => dir},
"size" => size
},
"aggs" => {
@@ -135,7 +199,7 @@ def facets
aggs[f] = {
"terms" => {
"field" => f,
"order" => { type => dir },
"order" => { f_type => dir },
"size" => size
},
"aggs" => {
@@ -161,8 +225,43 @@ def filters
# (type 2 will only be used for dates)
filters = fields.map {|f| f.split(@@filter_separator, 3) }
filters.each do |filter|
# NESTED FIELD FILTER
if filter[0].include?(".")
# filter aggregation with nesting
if filter[0].include?("[")
original = filter[0]
facet = original.split("[")[0]
nested = facet.include?(".")
if nested
path = facet.split(".").first
end
condition = original[/(?<=\[).+?(?=\])/]
subject = condition.split("#").first
predicate = condition.split("#").last
term_match = {
# "person.name" => "oliver wendell holmes"
# Remove CR's added by hidden input field values with returns
facet => filter[1].gsub(/\r/, "")
}
term_filter = {
subject => predicate
}
if nested
query = {
"nested" => {
"path" => path,
"query" => {
"bool" => {
"must" => [
{ "match" => term_filter },
{ "match" => term_match }
]
}
}
}
}
end
filter_list << query
#ordinary nested facet
elsif filter[0].include?(".")
path = filter[0].split(".").first
# this is a nested field and must be treated differently
nested = {
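For a filter on a compound facet, the nested branch puts both `match` clauses in one `nested` query, so the role condition and the selected value must hold for the same nested object rather than for any two objects in the item. As written, the diff assigns `query` only when the field is nested. A minimal sketch that also returns a plain `bool` query for non-nested fields, with hypothetical field names and values:

```ruby
# Build the query clause for a filter such as "person.name[person.role#author]"
# with the selected facet value.
def compound_filter_query(field_spec, value)
  facet = field_spec.split("[").first
  condition = field_spec[/(?<=\[).+?(?=\])/]
  subject = condition.split("#").first
  predicate = condition.split("#").last
  bool = {
    "bool" => {
      "must" => [
        { "match" => { subject => predicate } },
        # Strip carriage returns, as the diff does for hidden-input values.
        { "match" => { facet => value.delete("\r") } }
      ]
    }
  }
  return bool unless facet.include?(".") # non-nested field: plain bool query

  # Nested field: both clauses must match the same nested object.
  { "nested" => { "path" => facet.split(".").first, "query" => bool } }
end

p compound_filter_query("person.name[person.role#author]", "Walt Whitman\r")
```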
20 changes: 16 additions & 4 deletions app/services/search_item_res.rb
@@ -18,7 +18,6 @@ def build_response
# strip out only the fields for the item response
items = combine_highlights
facets = reformat_facets

{
"code" => 200,
"count" => count,
@@ -66,7 +65,7 @@ def format_bucket_value(facets, field, bucket)
# dates return in wonktastic ways, so grab key_as_string instead of gibberish number
# but otherwise just grab the key if key_as_string unavailable
key = bucket.key?("key_as_string") ? bucket["key_as_string"] : bucket["key"]
val = bucket["doc_count"]
val = bucket.key?("field_to_item") ? bucket["field_to_item"]["doc_count"] : bucket["doc_count"]
source = key
# top_matches is a top_hits aggregation which returns a list of terms
# which were used for the facet.
@@ -89,8 +88,7 @@ def reformat_facets
facets = {}
raw_facets.each do |field, info|
facets[field] = {}
# nested fields do not have buckets at this level of response structure
buckets = info.key?("buckets") ? info["buckets"] : info.dig(field, "buckets")
buckets = get_buckets(info, field)
if buckets
buckets.each { |b| format_bucket_value(facets, field, b) }
else
@@ -110,4 +108,18 @@ def remove_nonword_chars(term)
transliterated.gsub(/<\/?(?:em|strong|u)>|\W/, "").downcase
end

def get_buckets(info, field)
buckets = nil
# ordinary facet
if info.key?("buckets")
buckets = info["buckets"]
# nested facet
elsif info.dig(field, "buckets")
buckets = info.dig(field, "buckets")
# filtered facet
else
buckets = info.dig(field, field, "buckets")
end
buckets
end
end
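`get_buckets` distinguishes three response shapes, and `format_bucket_value` now prefers the `reverse_nested` count: a nested bucket's own `doc_count` counts nested objects, while `field_to_item.doc_count` counts the parent items. A compact sketch of both lookups, with hand-written sample responses (not real Elasticsearch output) shaped like the aggregations built in `search_item_req.rb`:

```ruby
# Ordinary facet: info["buckets"]; nested or non-nested filtered facet:
# info[field]["buckets"]; nested filtered facet: info[field][field]["buckets"].
def get_buckets(info, field)
  info["buckets"] || info.dig(field, "buckets") || info.dig(field, field, "buckets")
end

# Prefer the item count from the reverse_nested sub-aggregation when present.
def bucket_count(bucket)
  bucket.dig("field_to_item", "doc_count") || bucket["doc_count"]
end

# Illustrative response fragments only.
ordinary = { "buckets" => [{ "key" => "letter", "doc_count" => 4 }] }
nested = { "doc_count" => 9,
           "creator.name" => { "buckets" => [{ "key" => "Whitman", "doc_count" => 3 }] } }
filtered = { "doc_count" => 12,
             "person_author" => { "doc_count" => 5,
                                  "person_author" => { "buckets" => [
                                    { "key" => "Whitman", "doc_count" => 6,
                                      "field_to_item" => { "doc_count" => 2 } }
                                  ] } } }

[[ordinary, "format"], [nested, "creator.name"], [filtered, "person_author"]].each do |info, field|
  get_buckets(info, field).each { |b| puts "#{field}: #{b["key"]} (#{bucket_count(b)})" }
end
```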
3 changes: 2 additions & 1 deletion app/services/search_service.rb
@@ -11,7 +11,8 @@ def initialize(url, params={}, user_req)
end

def post(url_ending, json)
res = RestClient.post("#{@url}/#{url_ending}", json.to_json, { "content-type" => "json" } )
auth_hash = { "Authorization" => "Basic #{Base64::encode64("#{Rails.application.credentials.elasticsearch[:user]}:#{Rails.application.credentials.elasticsearch[:password]}")}" }
res = RestClient.post("#{@url}/#{url_ending}", json.to_json, auth_hash.merge({ "content-type" => "json" } ))
JSON.parse(res.body)
rescue => e
e
1 change: 1 addition & 0 deletions config/environments/development.rb
@@ -61,4 +61,5 @@
# CDRH CONFIGURATION

config.hosts << "cdrhdev1.unl.edu"
config.hosts << "whitman-dev.unl.edu"
end
8 changes: 4 additions & 4 deletions test/services/search_item_req_test.rb
@@ -44,7 +44,7 @@ def test_facets
"facet" => [ "title", "subcategory" ]
}).facets
assert_equal(
{"title"=>{"terms"=>{"field"=>"title", "order"=>{"_term"=>"asc"}, "size"=>10}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["title"]}, "size"=>1}}}}, "subcategory"=>{"terms"=>{"field"=>"subcategory", "order"=>{"_term"=>"asc"}, "size"=>10}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["subcategory"]}, "size"=>1}}}}},
{"title"=>{"terms"=>{"field"=>"title", "order"=>"asc", "size"=>10}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["title"]}, "size"=>1}}}}, "subcategory"=>{"terms"=>{"field"=>"subcategory", "order"=>"asc", "size"=>10}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["subcategory"]}, "size"=>1}}}}},
facets
)

@@ -69,7 +69,7 @@ def test_facets
"facet" => [ "creator.name" ]
}).facets
assert_equal(
{"creator.name"=>{"nested"=>{"path"=>"creator"}, "aggs"=>{"creator.name"=>{"terms"=>{"field"=>"creator.name", "order"=>{"_term"=>"desc"}, "size"=>20}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["creator.name"]}, "size"=>1}}}}}}},
{"creator.name"=>{"nested"=>{"path"=>"creator"}, "aggs"=>{"creator.name"=>{"terms"=>{"field"=>"creator.name", "order"=>"desc", "size"=>20}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["creator.name"]}, "size"=>1}}}}}}},
facets
)

@@ -83,14 +83,14 @@ def test_facets
# sort term order specified
facets = SearchItemReq.new({ "facet" => ["title", "format"], "facet_sort" => "term|desc" }).facets
assert_equal(
{"title"=>{"terms"=>{"field"=>"title", "order"=>{"_term"=>"desc"}, "size"=>20}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["title"]}, "size"=>1}}}}, "format"=>{"terms"=>{"field"=>"format", "order"=>{"_term"=>"desc"}, "size"=>20}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["format"]}, "size"=>1}}}}},
{"title"=>{"terms"=>{"field"=>"title", "order"=>"desc", "size"=>20}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["title"]}, "size"=>1}}}}, "format"=>{"terms"=>{"field"=>"format", "order"=>"desc", "size"=>20}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["format"]}, "size"=>1}}}}},
facets
)

# sort term no order specified
facets = SearchItemReq.new({ "facet" => ["title", "format"], "facet_sort" => "term" }).facets
assert_equal(
{"title"=>{"terms"=>{"field"=>"title", "order"=>{"_term"=>"desc"}, "size"=>20}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["title"]}, "size"=>1}}}}, "format"=>{"terms"=>{"field"=>"format", "order"=>{"_term"=>"desc"}, "size"=>20}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["format"]}, "size"=>1}}}}},
{"title"=>{"terms"=>{"field"=>"title", "order"=>"desc", "size"=>20}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["title"]}, "size"=>1}}}}, "format"=>{"terms"=>{"field"=>"format", "order"=>"desc", "size"=>20}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["format"]}, "size"=>1}}}}},
facets
)
