We analyze a dataset of tens of thousands of mobile users in order to discover and address issues with mobile keyboards.
- Download data from Palin et al.
- Rename the tables by running rename_tables.sql
- Mark invalid test sections by running mark_invalid.sql
- Get a sample of participants by running sample.sql
- Export the log_sample table to CSV (tab-delimited); a minimal export sketch follows these steps
- Clean the sample CSV by running clean_sample.py
- Export the participants table to CSV (tab-delimited), as in the sketch below
- Parse the participants' user agents by running parse_user_agent.py
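
For the two export steps, here is a minimal sketch of a tab-delimited export. It assumes the tables were loaded into a local SQLite file named `typing.db`; the helper and file names are illustrative and not part of the repository (if you imported the original dumps into MySQL or PostgreSQL, swap in the corresponding driver).

```python
# export_tsv.py -- hypothetical helper, not part of the original pipeline.
import csv
import sqlite3

def export_table(db_path, table, out_path):
    """Dump an entire table to a tab-delimited CSV file, header included."""
    conn = sqlite3.connect(db_path)
    cur = conn.execute(f"SELECT * FROM {table}")
    header = [col[0] for col in cur.description]
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(header)
        writer.writerows(cur)  # the cursor yields one tuple per row
    conn.close()

if __name__ == "__main__":
    export_table("typing.db", "log_sample", "log_sample.csv")
    export_table("typing.db", "participants", "participants.csv")
```
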
- Use a more complete word frequency list (or derive one manually from a word corpus). Currently, some words (e.g. "Azerbaijan") are so uncommon that they do not appear in our word frequency list; we assign them an artificially low frequency of 1, which is a completely arbitrary value. A more complete list would let us characterize the frequency of these words accurately (see the lookup sketch after this list).
- Fix the incorrect Levenshtein distance calculation (a reference implementation is sketched after this list).
- Normalize the leadup and base speeds in the selection model. Since we know that suggestion users naturally type more slowly, we should normalize typing speed against each user's natural speed (see the sketch after this list).
- Handle multi-word suggestions. These usually occur for common word combinations (e.g. the user types "I am going" and selects "to the" from the suggestion list). We do not currently handle these cases explicitly when classifying ITEs, so the behaviour is undefined.
- Decide how to handle multiple suggestions for one word. It rarely happens, but sometimes users use the suggestion list more than once for the same word, either consecutively or with characters typed in between. Currently we register only the last suggestion (see the sketch after this list); alternatives include registering only the first suggestion, or removing these cases from the analysis entirely.
- Localize middle-of-string inputs. Currently, we can only handle user inputs that correspond to the end of the existing string. Letters, spaces, and backspaces inserted anywhere other than the end of the string (i.e. after the user manually moves the cursor) are marked as undefined because we do not know where in the string they were inserted. If we could pinpoint where in the string these inputs occur (e.g. during the edit distance calculation, as sketched after this list), we could include these keystrokes in our analysis.
- Given that this is a transcription task, we miss out on some strategies of suggestion usage, such as using the suggestion list as a "dictionary" in order to spell a word that the user does not know how to spell. Since the sentences are transcribed, the user can always consult the template sentence in order to confirm the spelling, thereby eliminating this strategy.
- We only focus on end-of-sentence keystrokes. That is, we only consider keystrokes that modify the end of a sentence, which includes typing letters, backspacing, and accepting predicted words. Changes made in the middle of the sentence (i.e. by moving the cursor manually) are excluded from the analysis.
- It is difficult to obtain statistically significant suggestion-usage patterns for individual participants, since suggestions are not used often enough to yield a large sample. For example, if a user uses suggestions 5 times, all on words of length 5 or less, does that mean they prefer suggestions on shorter words, or is the sample simply too small to tell?
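
A sketch of the word-frequency lookup from the first TODO item above, with the proposed corpus-derived fallback. The file format (`word<TAB>count`), file names, and helper names are assumptions, not the repository's actual code.

```python
import re
from collections import Counter

def load_freq_list(path):
    """Parse a frequency list, assumed to be tab-separated 'word<TAB>count' lines."""
    freqs = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, count = line.rstrip("\n").split("\t")
            freqs[word.lower()] = int(count)
    return freqs

def derive_corpus_counts(corpus_path):
    """Proposed fix: derive fallback counts directly from a raw text corpus."""
    with open(corpus_path, encoding="utf-8") as f:
        return Counter(re.findall(r"[a-z']+", f.read().lower()))

def word_frequency(word, freqs, corpus_counts=None):
    key = word.lower()
    if key in freqs:
        return freqs[key]
    if corpus_counts and key in corpus_counts:
        return corpus_counts[key]  # rare words like "Azerbaijan" land here
    return 1  # current behaviour: an arbitrary floor of 1
```
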
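For the Levenshtein item, the textbook two-row dynamic-programming implementation below could serve as a reference against the current calculation; it is the standard algorithm, not the repository's existing code.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # delete ca
                            curr[j - 1] + 1,              # insert cb
                            prev[j - 1] + (ca != cb)))    # substitute / match
        prev = curr
    return prev[-1]
```
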
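For the normalization item, one simple option is to express leadup intervals relative to each user's own baseline, so that naturally slow typists are not systematically misclassified. Using the median inter-key interval as the baseline is an assumption, not the repository's current method.

```python
from statistics import median

def normalized_interval(interval_ms, baseline_intervals_ms):
    """Scale a leadup inter-key interval by the user's typical interval.
    A value of 1.0 means the user is typing at their natural speed;
    values above 1.0 mean they slowed down before the selection."""
    return interval_ms / median(baseline_intervals_ms)
```
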
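A sketch of the keep-the-last policy for repeated suggestions on one word; the event fields (`sentence_id`, `word_index`) are hypothetical.

```python
def dedupe_suggestions(events):
    """Keep only the last suggestion event per (sentence, word).
    Alternatives noted above: keep the first, or drop these words entirely.
    `events` is assumed to be ordered by timestamp."""
    last = {}
    for ev in events:
        last[(ev["sentence_id"], ev["word_index"])] = ev
    return list(last.values())
```
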
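For the mid-string item, the same dynamic-programming table used for edit distance can be backtraced to recover where each edit occurred between two snapshots of the text field; a sketch, with `before`/`after` as assumed snapshot inputs.

```python
def edit_positions(before, after):
    """Return (operation, index-in-`before`) pairs recovered from the
    Levenshtein backtrace, so mid-string keystrokes can be attributed
    to a position instead of being discarded."""
    m, n = len(before), len(after)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,      # delete before[i-1]
                          d[i][j - 1] + 1,      # insert after[j-1]
                          d[i - 1][j - 1] + (before[i - 1] != after[j - 1]))
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and d[i][j] == d[i - 1][j - 1] + (before[i - 1] != after[j - 1])):
            if before[i - 1] != after[j - 1]:
                ops.append(("substitute", i - 1))
            i, j = i - 1, j - 1
        elif j > 0 and d[i][j] == d[i][j - 1] + 1:
            ops.append(("insert", i))  # after[j-1] was inserted at index i
            j -= 1
        else:
            ops.append(("delete", i - 1))
            i -= 1
    return list(reversed(ops))
```

For example, comparing "helo" with "hello" yields a single insertion, and its index tells us where in the string the new character landed.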