XQuery to SVG #8

Open
ebeshero opened this issue Oct 10, 2015 · 28 comments
@ebeshero
Member

@ezimmer @mollyodonnell : We've got loads of new and better letters now, thanks to coordinated efforts of Molly and Lisa and the rest of the active Mitford team! In the morning I'll "cull" or "harvest" or otherwise collect the good files and load them into our eXist database. And we'll re-run the XQuery we wrote, Erica, that makes a simple chart, but we'll move on to plot some SVG from it.

By the way (or maybe not), in the coordinated effort, Lisa and I discovered that Miss James might be the elder/eldest sister of what could be a trio of sisters: We learn that Miss James lost a younger sister named Emily, and that Susan James could be a younger sister of Miss James. It's not entirely certain but seems likely...and we'll be able to see more as we explore with our new Annotation Enhancement Tool!

@ezimmer
Collaborator

ezimmer commented Oct 10, 2015

That's amazingly exciting! (How oddly Bronte, too: guessing you've seen that Charlotte and MRM died in the same year? Not going to make anything of Miss James' "losing a younger sister named Emily"... :) )

Very much looking forward to the next few weeks. @ebeshero @mollyodonnell

@ezimmer
Collaborator

ezimmer commented Oct 10, 2015

(Just to be clear, that's not a serious suggestion! :) A remarkably odd parallel, though.)

@ebeshero
Member Author

I know--The James sisters were apparently working as governesses or seeking governess work, so the Bronte parallels abound! @ezimmer

@ezimmer
Collaborator

ezimmer commented Oct 10, 2015

That's fascinating (and might even suggest a basis for further research--who knows?).

Would there be a chance some time Monday might work well for talking SVG briefly? (Not sure if the holiday is better or worse for you--would love to touch base some time this upcoming week!) @ebeshero

@mollyodonnell
Collaborator

@ebeshero @ezimmer This is awesome! So cool that the team pulled together. Lisa is so fast, and Elisa was able to untangle my crazy code issues in a flash. Elisa, I pinged Lisa earlier this week because one letter whose header I was going to update wasn't in the spreadsheet yet. I'll ping you so you know which one I mean. The other thing is that I still have two of Lisa's letters to proof and update headers for. I was going to try to wrap those up today, but I'll ping you on them now and upload my latest versions, in case it's too late. I know yesterday was the deadline, but I'm in New Orleans for a conference and have had a mountain of grading...excuses, excuses, ah. More soon.

@ebeshero
Member Author

@mollyodonnell @ezimmer Molly: No worries, but do ping me from Box when you upload repaired files, because I've uploaded the current batch of letters files (and literary files) that were well-formed into our eXist database. Erica, we have new collections now, and we might be updating those now and again over the next couple of weeks, but that's okay. We need to work on that XQuery we started and start drawing some SVG with it. And I need to get that started. I spent what time I had today on late edits and prepping the database. I'm going to break and work on grading and other stuff for a bit and then come back to it shortly...I'll ping again soon!

@mollyodonnell
Collaborator

@ebeshero will do.

@ebeshero
Member Author

@ezimmer Erica--nearly missed your note, but just saw it! No, alas--Monday is crazy. Our break isn't until the following week. But I'm going to try to mock up something with XQuery shortly and maybe we can discuss it here.

@athenerica2003

No worries here--thank you both so much for all that you have done! Will
get going with what we have, too--anything we do will become more nuanced
and grounded from this point, so that's great. Here's looking forward!

@ebeshero @mollyodonnell


@ebeshero
Member Author

@ezimmer @mollyodonnell Making some progress here! I'm refining our tester XQuery: It was a little trickier than I expected to get the top 3 (or more) in a given category, but we've got it now! I want to talk to you both about outputting a good readable plot of our output: It may be easier and clearer for comparison to do this as a treemap rather than radiating wedges around a dial--though we can tinker with this. (Sometimes radial plots can introduce weird distortions, and for the moment I just want something simple and straightforward to calculate.) I'm first going to try a treemap, so we can see what it looks like. (Google "treemaps" and click the image results.)

I'll explain more and send a mockup this weekend!
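The XQuery that finds the top three isn't posted in this thread, so as a rough illustration of the same counting logic only, here is a Python sketch. It assumes, hypothetically, that each letter has already been reduced to a list of (category, key) references such as ("persName", "Miss James"); the function name and the placeholder keys P1, P2, L1 are made up for the example and are not Mitford data.

```python
def top_cooccurring(letters, focus, n=3):
    """For every letter that mentions `focus`, count the other entities it
    mentions, grouped by category, and keep the n most frequent per category.

    letters: iterable of lists of (category, key) pairs, one list per letter
    (a hypothetical, pre-extracted stand-in for the TEI files in eXist).
    Returns {category: [(key, count), ...]}, most frequent first; ties are
    broken alphabetically so the output is stable.
    """
    counts = {}
    for refs in letters:
        refs = set(refs)  # count each entity at most once per letter
        if focus not in refs:
            continue
        for category, key in refs:
            if (category, key) == focus:
                continue
            per_cat = counts.setdefault(category, {})
            per_cat[key] = per_cat.get(key, 0) + 1
    return {
        cat: sorted(counts[cat].items(), key=lambda kv: (-kv[1], kv[0]))[:n]
        for cat in sorted(counts)
    }


letters = [
    [("persName", "Miss James"), ("persName", "P1"), ("placeName", "L1")],
    [("persName", "Miss James"), ("persName", "P1"), ("persName", "P2")],
    [("persName", "P1"), ("persName", "P3")],  # no Miss James: ignored
    [("persName", "Miss James"), ("persName", "P2"), ("persName", "P3"),
     ("placeName", "L1"), ("placeName", "L2")],
]
print(top_cooccurring(letters, ("persName", "Miss James")))
# → {'persName': [('P1', 2), ('P2', 2), ('P3', 1)], 'placeName': [('L1', 2), ('L2', 1)]}
```

Counting letters rather than raw mentions is one choice among several; counting every reference element instead would weight long letters more heavily.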

@mollyodonnell
Collaborator

@ebeshero @ezimmer My apologies for the delay on the two remaining letters. I might still take care of them this week and ping you, though it probably won't be worth redoing what you've done for so few letters, since they are already correctly tagged even though they lack complete header data. Again, apologies.

Wedges sound great to me (Trivial Pursuit superfan). Looking forward to more, too.

@mollyodonnell
Collaborator

@ebeshero Pinged you in Box last night about the remaining two. Just updating here for the GitHub record.

@mollyodonnell
Collaborator

@ebeshero and @ezimmer I'm updating my academia.edu page and have added the abstract and links for this talk along with my others, tagging you both. If you have any issues with that, please let me know and I'll take it down or amend it. I just linked to the online program.

@ezimmer
Collaborator

ezimmer commented Oct 16, 2015

@mollyodonnell That's great! Glad you did.


@athenerica2003

And @ebondar, that sounds like a terrific plan. Will check in again as soon
as at a real computer!


@ezimmer
Collaborator

ezimmer commented Oct 18, 2015

(Sorry--meant to reply to @ebeshero.)

Huge cheers to the top three in each group! (Thank you also for all the
work you have been doing.)

Since radiating wedges around a dial introduce more distortion than
necessary, am now searching for examples--and code--of possibilities that
aren't circular.

From our initial conversation, it seemed the two most useful features of a
visualization would be the following:

  1. clusters of item types (that is, the top 3 persNames together, the top 3
    placeNames together, etc.)

  2. distance from the central item, conveying the secondary
    item's relative co-occurrence with the central item.

(In other words, the most frequently co-occurring item within a particular
category could be shortest/closest, with the second most frequent
co-occurrence the next shortest/closest, and so on.)

More soon.
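One way to turn a ranked top-three list into the distances described in point 2 is sketched below in Python. Both functions are hypothetical helpers, and the pixel values (`near`, `step`, `far`) are arbitrary placeholders. The first follows the rank rule literally, with tied counts sharing a distance; the second scales distance to the counts themselves, so the gaps also show how big the differences between occurrences are.

```python
def distances_by_rank(ranked, near=40, step=30):
    """ranked: [(key, count), ...] sorted most frequent first.
    The most frequent item sits `near` units from the centre; each lower
    count moves `step` units further out. Equal counts share a distance."""
    out, ring, prev = [], -1, None
    for key, count in ranked:
        if count != prev:  # a new (lower) count starts the next ring
            ring += 1
            prev = count
        out.append((key, near + ring * step))
    return out


def distances_by_count(ranked, near=40, far=160):
    """Place the most frequent item at `near`; an item with half that count
    lands halfway between `near` and `far`, and so on linearly."""
    top = ranked[0][1]
    return [(key, far - (far - near) * count / top) for key, count in ranked]


ranked = [("P1", 2), ("P2", 2), ("P3", 1)]  # placeholder keys, not real data
print(distances_by_rank(ranked))   # → [('P1', 40), ('P2', 40), ('P3', 70)]
print(distances_by_count(ranked))  # → [('P1', 40.0), ('P2', 40.0), ('P3', 100.0)]
```

The rank version keeps the picture tidy; the count version is closer to the "clearer sense of the proportions" question, at the cost of items bunching up when counts are similar.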


@ezimmer
Collaborator

ezimmer commented Oct 18, 2015

Hi @ebeshero--whenever you have a second, could I ask if the "Mitford
Co-Citation Counts" query saved in eXist/eXide is the right one to run for
the current counts?

Am asking only to work from a clearer sense of the proportions--that is,
how big the differences between categories/number of occurrences are.

Thank you!


@ezimmer
Collaborator

ezimmer commented Oct 18, 2015

(Update: just ran the aforementioned query--definitely not it! :) )

Would be curious, whenever you have time.

Thanks again!

@ebeshero


@ebeshero
Member Author

@ezimmer: It's TesterMissJames-coRef, and it's in my queries folder. It's only outputting the top three people so far; I had to rewrite our code and start over to simplify things.

@ebeshero
Member Author

@ezimmer Basically I want to build on that code--it's saving to an output file (like our last one was). But now that it's getting the top three of "X", I want to keep building on it, and also output SVG shapes from it. I had some help on Thursday--sat down with David Birnbaum a while and we worked on debugging my first attempt. He also advised not going with the radial wedges--and showed me the tree maps. Those will be easier to plot anyway.

I want to work on this a little later today and tomorrow (which is Fall Break for me)--For right now I'm trying to clear some time-sensitive stuff for class and clear the decks...more soon.

@ezimmer
Collaborator

ezimmer commented Oct 18, 2015

No worries--thank you so much! I was only asking to see if there would be
things I could do to help.

(The db won't let me run the query, I think because the results are already
saved in a folder for which permissions may be set to you. No worries there
either--I don't want to mess anything up!)

Will just keep working around, and will look forward to touching base
whenever you have time.

Thank you again, @ebeshero!


@ebeshero
Member Author

@ezimmer Reading your earlier post: Here's how that might work with a tree map:
Imagine three rectangles, one for each of the top three categories.

We could position those relative to the Miss James node, so the closest rectangle to her is the most frequently associated and the furthest away is the least frequent.

Within the rectangles, we divide the space according to the most frequent and least frequent co-occurring reference--as a tree map does.

I'm not sure if the rectangles need to be the same size. Another way to handle the comparative frequency of co-occurrence might be to position the boxes in the same relative position (not plotting them by distance) and instead size the rectangles based on their co-occurrence with Miss James. That was what I had in my head to draw after Thursday!

We can probably try plotting it both ways and see what it looks like. I think for web interfaces, though, we might actually want something tidy and compact that can sit with other kinds of data on a web page. So: imagine bringing up a list of files that reference Miss James, together with some detailed information (in text) about the named entities who are represented in our visualization. It might make sense for this to share space with a window holding, on-click, a view of passages showing points of co-reference. Just some thoughts for designing a web interface around this!
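For the treemap variant in which box size, rather than distance, carries the frequency, here is a minimal slice-and-dice sketch in Python. Slice-and-dice is the simplest treemap layout; squarified layouts are a common refinement that avoids thin slivers. It assumes input shaped like {category: [(key, count), ...]}, i.e. the top-three lists per category; the function names, dimensions, and placeholder keys are hypothetical. Category boxes are cut left to right in proportion to each category's total, then each box is cut top to bottom by the individual counts, and every cell becomes an SVG `<rect>`.

```python
from xml.sax.saxutils import escape


def split_rect(items, x, y, w, h, horizontal):
    """Cut the rectangle (x, y, w, h) into strips proportional to each
    item's count. Returns [(key, x, y, w, h), ...]."""
    total = sum(count for _key, count in items)
    strips, offset = [], 0.0
    for key, count in items:
        share = count / total
        if horizontal:  # strips run left to right
            strips.append((key, x + offset, y, w * share, h))
            offset += w * share
        else:  # strips run top to bottom
            strips.append((key, x, y + offset, w, h * share))
            offset += h * share
    return strips


def treemap_svg(groups, width=300, height=200):
    """groups: {category: [(key, count), ...]}. Returns an SVG string."""
    totals = [(cat, sum(c for _k, c in items)) for cat, items in groups.items()]
    parts = ['<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{width}" height="{height}">']
    for cat, x, y, w, h in split_rect(totals, 0, 0, width, height, True):
        for key, kx, ky, kw, kh in split_rect(groups[cat], x, y, w, h, False):
            parts.append(
                f'<rect x="{kx:.1f}" y="{ky:.1f}" width="{kw:.1f}" '
                f'height="{kh:.1f}" fill="none" stroke="black">'
                f'<title>{escape(cat)}: {escape(key)}</title></rect>'
            )
    parts.append('</svg>')
    return '\n'.join(parts)


groups = {  # placeholder counts, not real Mitford data
    "persName": [("P1", 2), ("P2", 2), ("P3", 1)],
    "placeName": [("L1", 2), ("L2", 1)],
}
print(treemap_svg(groups))
```

For the distance-based variant, the outer boxes would instead be kept the same size and positioned according to their rank relative to the Miss James node.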

@ebeshero
Member Author

@ezimmer Let me take a look at the permissions... or just move this into a conference-prep directory that we share. It's kind of hard for me to find it in my own queries directory because there's so much in it.

@ezimmer
Collaborator

ezimmer commented Oct 18, 2015

All of these thoughts sound terrific, and I definitely see the logic!

I also like the idea of the visual annotation appearing alongside
text(s)--it sounds like that form of web rendering could be used with a
list of texts, alongside an individual text, or both. (Does that fit with
what you were thinking?)

Such an approach also chimes with the idea of "visual annotation": a form
of graphic output that appears alongside texts or key items, and that
provides immediate commentary re: recurring elements of contexts in which
said key items are found.

Basically, the format you're describing sounds like a super-compact graphic
rendering of archival insight, and I think that's what we're going for! (I
also heard some terrific thoughts this weekend about evidence "hiding in
plain view" that seem useful for our presentation.)

Perhaps it's best if I focus right now on the paper itself, based on these
ideas, the visualization model you're outlining, and the larger contexts
we'd discussed. (I'm doing it in "detailed outline" form first--we can all
then work from that basis!)

If that sounds good, I'll forge ahead in that area.

Very excited about this!

@ebeshero
@mollyodonnell


@ezimmer
Collaborator

ezimmer commented Oct 18, 2015

@ebeshero Thank you--please don't spend too much time on that right now,
though! It sounds like you have a ton to do, and there's more than enough
for me to work on in other areas.


@ebeshero
Member Author

@ezimmer I've created a new directory called "AnnotationToolQs" and I've made you the "group" that accesses it so we can both use it.

Here's a view of the output from eXist: http://dxcvm05.psc.edu:8080/exist/rest/db/output/tester.html

Just an HTML chart for the moment, and just the Persons output.

@ebeshero
Member Author

@ezimmer It's funny how I only see your long posts AFTER the short ones! (lol). I'll quote inline here:

"I also like the idea of the visual annotation appearing alongside
text(s)--it sounds like that form of web rendering could be used with a
list of texts, alongside an individual text, or both. (Does that fit with
what you were thinking?)"

YES!! 👍 That's the idea. :-)

"Such an approach also chimes with the idea of 'visual annotation': a form
of graphic output that appears alongside texts or key items, and that
provides immediate commentary re: recurring elements of contexts in which
said key items are found.

Basically, the format you're describing sounds like a super-compact graphic
rendering of archival insight, and I think that's what we're going for! (I
also heard some terrific thoughts this weekend about evidence 'hiding in
plain view' that seem useful for our presentation.)

Perhaps it's best if I focus right now on the paper itself, based on these
ideas, the visualization model you're outlining, and the larger contexts
we'd discussed. (I'm doing it in 'detailed outline' form first--we can all
then work from that basis!)

If that sounds good, I'll forge ahead in that area."

Yes--that's a great plan! I'll concentrate on generating some visuals! And I can do the general intro to Mitford, and handle some of the details on how we're producing our graphs, though you and I can probably talk about that together (and we'll want to prepare for that together when we're all in Lyon).

"Very excited about this!"

Me too! :-)

@ezimmer
Collaborator

ezimmer commented Oct 18, 2015

Thanks, @ebeshero ! persName seems the most salient category, so it's a great one to start with, too.


4 participants