Allow for in-app download of data #89
It should be noted that for translations this data would need to be on a per language pair basis, as a user would likely only need the data for their spoken or system language to the target keyboard.
Hey @andrewtavis! I was wondering how this feature would eventually work in the back-end side. I know there's the Scribe-Data repo, which could be used as the download location for the data packs, but I was also wondering if the data pack hosting could be somewhere else even. For instance, could Scribe use Wikimedia's Toolforge perhaps to run a service that the apps could periodically check for available data pack updates? And then the apps could present the option to download those updates (or even have the option to do so automatically)? Some things that came to mind were:
These are just some thoughts I had. I'm not saying that a service and automation should definitely be implemented, or even that Toolforge should be used for it (Scribe would have to see if it could and if it makes sense to). I was more curious if there were already any thoughts on how to implement this. Of course, the apps just downloading the files that they need directly from GitHub could very well be enough for what Scribe needs 😆 in which case, feel free to ignore 😬 |
A suggestion that I got from an Apple employee tonight was to use downloadable assets that could be hosted on the App Store, which I didn't even know existed. I think that Toolforge would likely be a better option though, as when Android gets up and running we'll need something that's platform agnostic :) We could definitely provide an option for automatic updates in the settings, and maybe something that could happen at the start is that we could prompt the user to download the new data in the background whenever they open the keyboard after a new update (we'd just update both at the same time as is done now).
This is exactly what we're looking to do, and yet it's currently manual (with the help of Python, but I do type a single command to run
For now let's not worry about vetting data. There is a script where I remove profanities from autosuggestions via a Wikidata query, with that being integrated into an eventual system going forward. Big thing for this is let's focus on getting what we can by expanding Wikidata if need be, and we can also code in specific things if need be (I removed the word "nazi" from all autosuggestions, for instance, as WWII is talked about on so many Wikipedia articles that it was randomly popping up).
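A minimal sketch of the kind of filtering described above, assuming the autosuggestions are stored as a mapping from a word to a list of suggested next words, and that the blocked words (from a Wikidata query or hand-coded) are already available as a plain list. The function name and data shape are hypothetical, not the actual Scribe-Data script.

```python
def filter_autosuggestions(autosuggestions, blocked_words):
    """Drop blocked words both as entries and as suggested values."""
    blocked = {word.lower() for word in blocked_words}
    cleaned = {}
    for word, suggestions in autosuggestions.items():
        if word.lower() in blocked:
            # Skip the whole entry if the trigger word itself is blocked.
            continue
        cleaned[word] = [s for s in suggestions if s.lower() not in blocked]
    return cleaned


# Hypothetical sample data for illustration only.
suggestions = {
    "the": ["war", "nazi", "city"],
    "nazi": ["party"],
    "second": ["world", "half"],
}
cleaned = filter_autosuggestions(suggestions, blocked_words=["nazi"])
```

Matching is case-insensitive here so that capitalized forms are caught as well; whether that is desirable for every language is an open question.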
I'm 100% for decoupling the releases as it will doubtless lead to more flexibility, especially considering we're hoping to one day be releasing iOS, Android and Desktop 😉 I think that downloading files directly from GitHub will likely be the first thing this ends up doing, but I'd love to be proved wrong and jump directly to Toolforge 😊 Thanks for further explorations and all the thought you're giving this! :) :) |
Completely agree! 👍
Huh.. how interesting! 🤔😆 unexpected outcomes reflected in the data. They will surely happen, of course, but it's just interesting to see them.
You know, I think Scribe could be fine starting off with GitHub actually. The client-side mechanism on the apps to go check for available updates would need to be implemented, and that would be regardless if it's GitHub, Toolforge, or whatever else serving the back-end. The details of what it checks for to determine a new update would be different, but the core functionality could stay. Scribe could start with GitHub and then later switch if it'd like. My idea for Toolforge, I think, came firstly more so cause it is a Wikimedia project. With that, I believe there could be some benefits:
On the other hand though, GitHub could just very well be a viable path. We already mentioned that files could be downloaded directly from the repo. As far as the automation for new data from Wikipedia/Wikidata, perhaps GitHub Actions could be used as the tool for that? It seems there's a way to schedule Actions via a cron schedule. The Action could be even to just run My curiosity - was there another earlier idea that you had in mind in how to do the automation for new data checks? In summary - Toolforge could be an appealing option to get the chance to use it, to work closer within more Wikimedia things, or to not rely on GitHub too much. However, there could very well be a way to accomplish this with GitHub. |
We can definitely do a comparison between the offerings of Toolforge and GitHub for this 😊 I don't inherently have a preference for either, but do like to keep things kind of centralized so that we're not bouncing between platforms. GitHub actions will definitely be a major step for us going forward as far as testing :) With that being said, keeping the Wikidata related processes in a Wikimedia flow would also be ok. And I very much expect that we can get support from the community for Toolforge. We do have a tag on Phabricator, which is their task system. I'd say let's make a decision between these two :)
Honestly I hadn't thought too in depth about how we'd do it. There's been enough going on with the project and everything else that I just made this issue to start a discussion :) Generally the idea is that we need to centralize the JSONs, update them regularly in this centralized location, provide a mechanism to get into the app, and let the user know when there's new data to download/do this automatically if possible. Also wasn't sure what the end data structure for the app was going to be, but through discussions for #96 it looks like we'll be figuring out a SQLite solution. Thanks as always for your insights! Really is nice to talk this stuff over with you 🚀 Also, let me know if you'd be interested in Wikimedia events :D Happy to keep you up to date on ones where we'll be participating 😊 |
Agreed! We can definitely compare both offerings and evaluate. However, whether it ends up being GitHub or Toolforge for the long-term, I am thinking atm perhaps that the GitHub route could be the starting point, mostly since I believe that the level of effort would likely be smaller (the files are already being hosted via the git repository anyways). Scribe could then, at first, focus more on completing the client-side mechanism. With Toolforge, I feel there will likely be more - exposing a way to check for updates (perhaps via an API), figuring out data storage, etc. Yet, Toolforge can always later replace GitHub. However!! 😆 this is just what I'm feeling now and can definitely change after further comparing both offerings. Also - the introduction of SQLite is interesting regarding all this (see [1]).
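To illustrate why the client-side check can stay the same regardless of the back-end, here is a rough sketch. It assumes the server (GitHub, Toolforge, or anything else) exposes some manifest mapping each language to a data version string; the manifest format, the version scheme, and the function names are all assumptions for illustration. Fetching the manifest is deliberately left out, since that is the part that would differ per back-end.

```python
def parse_version(version):
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def find_available_updates(installed, manifest):
    """Return installed languages whose remote data version is newer."""
    updates = {}
    for language, local_version in installed.items():
        remote_version = manifest.get(language)
        if remote_version is None:
            # The back-end has no data pack for this language.
            continue
        if parse_version(remote_version) > parse_version(local_version):
            updates[language] = remote_version
    return updates


# Hypothetical local state and remote manifest.
installed = {"German": "1.2.0", "Spanish": "1.10.0"}
manifest = {"German": "1.3.0", "Spanish": "1.9.0", "French": "2.0.0"}
updates = find_available_updates(installed, manifest)
```

Only the code that produces `manifest` would need to change when switching from one hosting option to another.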
I didn't realize Scribe did. That is awesome!
I see. That makes sense to me! Working against SQLite directly, as opposed to Core Data, could also potentially prove useful when later doing the same in Android. Findings and implementation could carry over more easily. [1] Going back to the topic of centralized location, I wonder if it could make sense to also leverage a DB on the back-end side. Reason is, in thinking about update downloads, I wondered if there could be a way to only download the diff of what is new. That could help with download sizes as data grows. A DB could help with identifying that if there is perhaps a
Of course! 😄 I'm more than happy to help out with Scribe. I'll also be glad and willing to stay involved with this specific feature/side of Scribe. I think it'll be some interesting work 😉🧑💻 Also, you know what - yes! I think I would be interested in the events, actually. It'd be fun 🤘 |
Makes sense to me as well :) For the initial offering on this we focus on GitHub, and then we go from there 🚀
Only downloading the new differences would be great! Having to wait what'd doubtless be a long download is definitely something we'd try to avoid, and this sounds like a great solution 😊 I agree that this would also point us a bit towards Toolforge ⚒️ I guess from here we work on #16 and the data solution that we'll do as a part of #96. Then this'd be unblocked and we can do an initial GitHub solution followed by a more advanced one :) Maybe something we can try at first is to have the app preloaded, and then we can have downloading an update outside of an app update be an initial thing. Once that's decoupled we can then move on to the rest 🚀
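The diff-only download idea could look roughly like the sketch below. It assumes the data can be treated as a flat key-value mapping per language and that the back-end can compare the version a client has against the latest one; both are assumptions, and the `upsert`/`delete` payload layout is hypothetical.

```python
def compute_diff(old, new):
    """Describe the changes needed to turn `old` into `new`."""
    return {
        # Entries that were added or whose value changed.
        "upsert": {
            key: value
            for key, value in new.items()
            if key not in old or old[key] != value
        },
        # Entries that no longer exist in the new data.
        "delete": [key for key in old if key not in new],
    }


def apply_diff(data, diff):
    """Apply a diff on the client without mutating the original data."""
    updated = dict(data)
    updated.update(diff["upsert"])
    for key in diff["delete"]:
        updated.pop(key, None)
    return updated


# Hypothetical before/after snapshots of a small translation table.
old = {"Haus": "house", "Auto": "auto", "Tisch": "table"}
new = {"Haus": "house", "Auto": "car", "Baum": "tree"}
diff = compute_diff(old, new)
patched = apply_diff(old, diff)
```

The client would then download only `diff` instead of the full dataset, which matters more as the data grows.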
Glad Scribe's something that you can get so much from 😊 I'll write on here for the next Wikimedia events coming up. Likely the next one would be Data Reuse Days - assuming that they do another one in 2023 :) |
Hey @wkyoshida and @SaurabhJamadagni 👋😊 Really happy to have gotten v2.2.0 out today! A major step to add in emojis — thank you both for your efforts! Obviously little bits to fix here and there, but this was a major step on the roadmap. Plus the way Scribe can repeat emojis via #283 one after another adds a cool extra feature that system keyboards lack 🙌 There really is some progress being made here! I went to the iOS meetup again tonight and there really is a difference in how people are viewing Scribe with some of these major features in the interface. The obvious next step of overhauling the app interface in #16 seems like the final bit to put "still MVP" to rest as the single page app still gets a bit of a look. Reason I'm writing you both in #89 here is something I was a bit worried about while reworking the data to SQLite now definitely seems like it's cause for concern: we're at 137.9 MB. Aside from #16, the next major things we have on the roadmap are:
I wanted to check with you both about your opinions of the app size. How big is too big? To me we're at or close to that size now, and it's ok to be there, but adding more keyboards and translation data is going to really expand the size. I think it's safe to say that we cannot do the top two options above without doing the data download process, but then I wanted to check :) Does it make sense to shift this to be next in priority after #16? @wkyoshida, I could talk with some coworkers and we could eventually do a call with them to ask about the ideas of doing the data download process that we've discussed. Two engineers and I were doing some good brainstorming for Scribe over coffee today 😊 Thanks again to you both! 🚀 |
I just went through and did a quick organization of the projects board, btw. Obviously not set in stone, and let me know if something looks off. As we're not doing sprints or anything, I think for now it makes sense to archive the finished issues upon release and do another ordering :) |
Hey @andrewtavis, I went through the popular keyboard options on the App Store (e.g. Gboard, SwiftKey) and they are somewhere around 80-90 MB in size. That is barebones, without the additional theme downloads. Grammarly, a keyboard to help with grammar, is around 230 MB. So I would say we are not far off with the size. It definitely is a top three priority. I was thinking, what if we work on the cross-translation before this issue? That way the languages that we currently are offering could become complete packages. We then modularise so that specific languages can be downloaded (i.e. this issue). We could move on to adding further languages and creating their packages for downloading after that. But again, the changes will be pushed together anyway, right? |
Yeah, adding keyboards or more translations aren't entirely blocked exactly, but I do agree that it would be a good idea to add the data download feature to avoid bloating the app size.
I think having this data download feature after the menu makes sense 👍 It would free us up to add more keyboards with less concern over the app size. Awesome to hear that we could get some feedback from your coworkers, @andrewtavis! They could for sure help. Just a thought though, if we'd like to, the Scribe-Server idea could be something that we hold off on for a bit. We could implement the data download feature to download the
As mentioned above, I think there could be some hesitancy with adding all of the data for cross-translation into the Scribe-Data repo, but I do think that we could definitely already do the work for the data extract/transform logic at least 😄
Yeap, we could push them together. I think another option for us though, could be to split releases even. I think there might be some flexibility here for us. Adding the menu + the data download (simply of the data that we have today) could be one release. I think this as a base already frees us up to add more keyboards we'd like with less app size concern. Later adding the ability to select which languages to do any cross-translation for though could be another release. I think this selection ability might be important for cross-translation, as I suspect users likely won't want miscellaneous translations for other languages to get downloaded too. Just some thoughts ✌️ |
This makes sense to me, @wkyoshida. We’d just need to think on how to update the data on the iOS side of things :)
My assumption was we’d make source language based tables where the word is the key and all the translations are the other column elements? How does this sound?
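As a rough sketch of that layout, here is one source-language table where the word is the primary key and each target language is a column. The table name `en_translations`, the `de`/`es` columns, and the sample rows are hypothetical; the real schema would come out of the SQLite work in #96.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE en_translations (word TEXT PRIMARY KEY, de TEXT, es TEXT)"
)
conn.executemany(
    "INSERT INTO en_translations (word, de, es) VALUES (?, ?, ?)",
    [("book", "Buch", "libro"), ("house", "Haus", "casa")],
)

TARGET_COLUMNS = {"de", "es"}


def translate(connection, word, target):
    """Look up one word in the English source table for a target column."""
    if target not in TARGET_COLUMNS:
        # Column names cannot be bound as parameters, so validate them.
        raise ValueError(f"unsupported target language: {target}")
    row = connection.execute(
        f"SELECT {target} FROM en_translations WHERE word = ?", (word,)
    ).fetchone()
    return row[0] if row else None


german = translate(conn, "book", "de")
missing = translate(conn, "tree", "es")
```

One consequence of this shape is that adding a target language means adding a column to every source table, which is worth weighing against a table per language pair.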
This is a good point and we’ll need to talk about the options that the user is being asked for data downloads. Do we ask what their source language is and only get that plus whatever their target keyboard language’s translations are? I guess that’d be fine, but we would need to check on that during the download phase with the default option then being their phone’s language or English if we don’t have it. Thanks for the thoughts! 🙏✌️ |
I did think of this option too tbf, but I'm a little unsure. There are some downsides, such as:
Some other ideas I think could be to:
This is interesting actually, because I would advocate even for the option to select multiple source languages. Personally, for instance:
There might be several different reasons for this, but this could have to do with which language someone learned another one in, which someone's known language has more similarities with their target, how some source languages have better translation data for the target, etc. Giving the option for multiple source languages could be good for those reasons. Adding to thoughts on the DB structure, I guess that multiple source languages would likely also make downloading the translation data pack not as easy as simply "only download the table with source language X" since multiple sources would be in play. |
I think that source language-target language pairings could work as I don’t think that there would be more than two or three Scribe keyboards used by a given user usually, and on the DB side the maintenance would be cumbersome, but should be doable. I’ll take a look at the SO question more thoroughly though :) Sorry I’m super exhausted after the week/activist meetup (which went really well, btw 😊), so I couldn’t focus as much on it as I’d like to :) I guess I hadn’t fully considered the option of a user using different source languages. Very interesting :) :) I think that the case of selecting a source language would also work though :) The default would just be set to their phone’s language for the source language, but they could select a new one from a dropdown before downloading. Assuming your phone’s in Portuguese, you’d just need to select English as the source language within the German keyboard download interface 😊 |
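The selection logic discussed above could be sketched like this: default the source language to the phone's language (falling back to English when it isn't supported), then work out which source-target pairs need downloading when several source languages are chosen. The function names and the language lists are assumptions for illustration.

```python
def default_source_language(phone_language, supported, fallback="English"):
    """Use the phone's language when we have data for it, else the fallback."""
    return phone_language if phone_language in supported else fallback


def required_pairs(source_languages, keyboard_languages):
    """List every (source, target) translation pack a user would need."""
    return sorted(
        (source, target)
        for source in source_languages
        for target in keyboard_languages
        if source != target  # No pack is needed for a language to itself.
    )


# Hypothetical set of languages with translation data available.
supported = {"English", "German", "Portuguese", "Spanish"}

default = default_source_language("Portuguese", supported)
fallback = default_source_language("Finnish", supported)
pairs = required_pairs(["Portuguese", "English"], ["German"])
```

With pair-based packs, choosing multiple source languages just produces more pairs to fetch rather than requiring a different download mechanism.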
Sorry @andrewtavis, @wkyoshida can't provide any input regarding this issue right now. I have kind of fallen behind in the discussion. Will catch up during our meeting! Loving the progress and discussion though! 😄 |
We can discuss this on Tuesday a bit as well :) :) |
Based on my tasks for this week, the following is the new data download screen. The circle to the right of the |
I really like the new look of the designs! Thanks to both of you for such great suggestions today 😊 |
Description
This issue is for the discussion and implementation of data downloads within the Scribe app. As more keyboards are added to Scribe, the size of the app will slowly grow and become cumbersome. To counteract this, it would be best if keyboard data was downloaded by the user in app. Downloading a keyboard would allow a user to then add the given keyboard in the settings.