Meeting Minutes Year 2022
- Probabilistic search and aspirations when participants see the performance of others (March, Shapira). The theories of probabilistic search / problem solving and aspirations help us develop hypotheses on the effect of performance transparency. Key authors are March and Shapira, but the core author is Simon.
- Solution transparency will lead to convergence (Boudreau, Bernstein). Boudreau uses cumulative innovation theory and re-combinatory innovation (Weitzman, Murray). These are innovation economics theories. Bernstein draws upon theories of social influence and social learning. According to Bernstein, solution transparency accelerates innovation via collective learning.
Agenda
Discussion of new notebooks
- Put titles into the notebook along with a description (summary and how it relates to the model).
- Normality test, exponential test, and power-law test.
- Compare the distance of search to the findings of Billinger, Brunswicker (2019), and Boudreau.
- Put references and descriptives in the local relational novelty notebook.
- Look into the technical details of the similarity measure.
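For the exponential and power-law checks listed above, one possible starting point is to fit both distributions by maximum likelihood and compare their log-likelihoods. This is only a sketch under assumptions, not the notebook's method: the function names and sample values are hypothetical, `x_min` is assumed to be known, and count data is treated as continuous. A normality check would more naturally use a library routine such as `scipy.stats.shapiro`.

```python
import math

def fit_exponential(values, x_min):
    """MLE for an exponential shifted to start at x_min.

    Returns (rate, log_likelihood). Requires at least one value > x_min.
    """
    n = len(values)
    excess = sum(x - x_min for x in values)
    rate = n / excess
    log_lik = n * math.log(rate) - rate * excess
    return rate, log_lik

def fit_power_law(values, x_min):
    """MLE for a continuous power law on x >= x_min.

    Returns (alpha, log_likelihood). Requires at least one value > x_min.
    """
    n = len(values)
    log_sum = sum(math.log(x / x_min) for x in values)
    alpha = 1 + n / log_sum
    log_lik = n * math.log((alpha - 1) / x_min) - alpha * log_sum
    return alpha, log_lik

# Hypothetical, heavy-tailed usage counts (not real project data).
counts = [1, 1, 1, 2, 1, 3, 1, 2, 50, 100]
rate, exp_ll = fit_exponential(counts, x_min=1)
alpha, pl_ll = fit_power_law(counts, x_min=1)
better_fit = "power law" if pl_ll > exp_ll else "exponential"
```

A higher log-likelihood only says which of the two candidates fits better; it is not a formal goodness-of-fit test.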
Agenda
Action Items On Notebook
- Separate Notebooks for Functions Pre-processing, DataFrame
- Have separate headings for each construct
- Restructure Notebook to Remove Repetitive Lines of Code
- Produce the missing variables, which are the two variables mentioned in the issues.
- Run the correlation analysis
- March and Simon's papers are both on search and exploration
- We saw that full transparency has the highest number of rare functions, but its dissimilarity is not the highest. We need to see how Jaccard similarity works and whether to consider the library hierarchy.
- The argument is that the treatments where participants are exposed to others' work are more likely to have converging solutions.
- Read Finke's paper on creative cognition
- Complete the log-transform for template, rareness and relational novelty
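On the Jaccard question above: Jaccard similarity is the size of the intersection of two sets divided by the size of their union, so every function counts equally no matter how rare it is. The sketch below uses hypothetical function names to illustrate how two scripts can each contain a rare function and still come out as fairly similar.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen for this sketch: two empty sets are identical
    return len(a & b) / len(a | b)

# Hypothetical function names, for illustration only.
shared = {"d3.select", "d3.json", "map.addLayer", "console.log"}
script_a = shared | {"d3.forceSimulation"}  # one rare function
script_b = shared | {"turf.buffer"}         # a different rare function

similarity = jaccard_similarity(script_a, script_b)  # 4 shared out of 6 in total
dissimilarity = 1 - similarity
```

Plain Jaccard also treats two functions from the same library as unrelated, so taking the library hierarchy into account would need a different or weighted measure.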
Agenda
Topic on Individual Exploration/Distance of Search
- Check out the participant with a value of 0.6 (very high exploration) in the no transparency condition in Bogota 0 in phases 1-2. Is this participant using a lot of the template?
- Check out the scoring sheet for novelty, and compare the best performers in the full transparency condition with the low performers in terms of the similarity in functions.
- Low performers with performance information, i.e. those in the performance transparency and full transparency groups, will tend to take more risks (refer to Billinger, March, Zur, Lant, Brunswicker; need to read more about them).
- Information about others' solutions in the solution transparency and full transparency groups will generate a tendency for the solutions to converge.
Topic on Template
- Think about the template similarity and the count of the template functions as a construct. Check out the difference in output between the count of template functions and template similarity.
Topic on Rareness
Agenda
- Cross-sectional data, panel data, time-series data
Agenda
- Rework the section on Explorations, Variety, and Uniqueness.
- Sabine and Jay agree on finding the dependent variables (related to the literature) based on functions.
- Innovation and rareness (check out the Uzzi and Schilling papers).
- We still have to polish the rareness measure.
- Learn about skewed distributions.
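As a small aid for the skewness item above, the sketch below computes the moment-based skewness of a sample and shows how a log transform pulls in a long right tail. The counts are made up, and `log1p` (log of 1 + x) is an assumption chosen here so that zero counts stay defined.

```python
import math

def skewness(values):
    """Population skewness: third central moment divided by std ** 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed counts (a few very large values).
counts = [1, 2, 2, 3, 3, 4, 5, 8, 20, 60]
logged = [math.log1p(c) for c in counts]

raw_skew = skewness(counts)
log_skew = skewness(logged)
```

A positive value indicates a long right tail and a value near zero indicates a roughly symmetric distribution. For this sample the logged values are less skewed than the raw counts.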
Agenda
- Read paper on "How intermittent breaks in interaction improve collective intelligence"
- Read Uzzi's paper on "Atypical combinations and scientific impact"
- Read Schilling's paper on "Recombinant search and breakthrough idea generation: An analysis of high impact papers in the social sciences"
- Get frequency of individual hack group for functions, see if the quantile is the same as the overall sample
- Get the 0.1% unique functions in individual hack as additional column in the individual person dataframe
- Get the 0.1 unique functions in overall sample as additional column in the individual person dataframe
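The last two items above could be sketched as follows. Everything here is an assumption made for illustration: the data layout (participant id mapped to a set of functions), the helper names, and the nearest-rank cutoff. The quantile is a parameter because the threshold itself is a choice.

```python
import math
from collections import Counter

def rare_functions(usage, quantile):
    """Return functions whose user count is at or below the given quantile.

    usage maps a participant id to the set of functions that participant used.
    Ties at the cutoff are all kept, so the result can exceed the quantile.
    """
    user_counts = Counter(f for funcs in usage.values() for f in set(funcs))
    ordered = sorted(user_counts.values())
    cutoff = ordered[max(0, math.ceil(quantile * len(ordered)) - 1)]
    return {f for f, count in user_counts.items() if count <= cutoff}

def rare_count_per_person(usage, rare):
    """Number of rare functions used by each participant."""
    return {person: len(set(funcs) & rare) for person, funcs in usage.items()}

# Hypothetical participants, groups and function names.
usage = {
    "p1": {"select", "json", "forceSimulation"},
    "p2": {"select", "json", "buffer"},
    "p3": {"select", "json", "buffer"},
    "p4": {"select", "json", "scaleLinear"},
}
group_of = {"p1": "bogota_0", "p2": "bogota_0", "p3": "bogota_1", "p4": "bogota_1"}

# Column 1: rareness judged against the overall sample.
rare_overall = rare_count_per_person(usage, rare_functions(usage, 0.1))

# Column 2: rareness judged within each participant's own hack group.
rare_in_group = {}
for group in set(group_of.values()):
    members = {p: f for p, f in usage.items() if group_of[p] == group}
    rare_in_group.update(rare_count_per_person(members, rare_functions(members, 0.1)))
```

In this toy data `buffer` is rare inside each group but not in the overall sample, which is the kind of difference the group-versus-overall comparison is meant to surface.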
Agenda
- Identify if functions are used in the script or not
- Remove the functions that are not used
- Get the log count of the functions per individual
**HIGH PRIORITY**
- We will have another column on an individual basis saying how many of those belong to that particular individual. ( sample of whole hack and sample of group only )
Agenda
- Distributions of functions used in each phase for the total sample (as opposed to the individual-level comparison)
See turnover of functions across groups
- Number of unique functions in each phase for each group (count and list)
- Number of common functions in each phase for each group (count and list)
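The unique and common function counts above map naturally onto set operations. A minimal sketch for a single phase, assuming each group's functions are available as a set (the group labels and function names are placeholders):

```python
def compare_groups(functions_by_group):
    """Return (common, unique) for one phase.

    common: functions used by every group.
    unique: dict mapping each group to the functions no other group used.
    """
    sets = {group: set(funcs) for group, funcs in functions_by_group.items()}
    common = set.intersection(*sets.values())
    unique = {}
    for group, funcs in sets.items():
        others = set().union(*(f for g, f in sets.items() if g != group))
        unique[group] = funcs - others
    return common, unique

# Hypothetical functions used in one phase.
phase_1 = {
    "bogota_0": {"select", "json", "axis"},
    "bogota_1": {"select", "json", "buffer"},
    "bogota_2": {"select", "json"},
}
common, unique = compare_groups(phase_1)
unique_counts = {group: len(funcs) for group, funcs in unique.items()}
```

Running this once per phase gives both the lists and the counts. Comparing the results of consecutive phases would then show which functions enter or leave each group.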
See differences among groups
- Look into which functions in which group
- Look into unique functions in each group
Agenda
- Read paper on "Variable risk preferences and the focus of attention"
- Read paper on "Aspiration Level Adaptation: An Empirical Exploration"
- Read Billinger's paper on search and Kyriakou's paper to define a measure of distance and search.
- Find references in Sabine's paper that measure uniqueness (design context).
Agenda
• Review the Repo and Clarify Action for Changes
- Add comments on the path to Box in the pre-processing notebook on the pre-survey.
- Add a README file to the pre-processing folder explaining what each of the notebooks/scripts does, and mention the data source as well.
• Review the Paper on Exploration Metric
• Look at Descriptive for Exploration
- To measure how wide the search is, we need the similarity between the phases, not within the phases.
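One way to read the point above is to compare each participant's set of functions in one phase only with their own set in the next phase. The sketch below does that with 1 minus Jaccard similarity as the distance. Both the measure and the data layout are assumptions for illustration, not the settled exploration metric.

```python
def search_distance(functions_by_phase):
    """Distance between consecutive phases for ONE participant.

    functions_by_phase maps a phase number to the set of functions used.
    Returns a dict keyed by (phase, next_phase) with 1 - Jaccard similarity.
    """
    phases = sorted(functions_by_phase)
    distances = {}
    for prev, nxt in zip(phases, phases[1:]):
        a = set(functions_by_phase[prev])
        b = set(functions_by_phase[nxt])
        union = a | b
        similarity = len(a & b) / len(union) if union else 1.0
        distances[(prev, nxt)] = 1 - similarity
    return distances

# Hypothetical participant: small change first, then a large jump.
participant = {
    1: {"select", "json"},
    2: {"select", "json", "axis"},
    3: {"buffer", "axis"},
}
distances = search_distance(participant)
```

Here the step from phase 1 to 2 is a narrow search (one function added), while the step from phase 2 to 3 replaces most of the functions and therefore gets a larger distance.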
• Review the results and then clarify next steps
• Show Sabine cleaned up repository https://github.com/ironhacks/analysis-2017
• Show Sabine cleaned up issue https://github.com/ironhacks/analysis-2017
• Set up the repo to run on Sabine's computer.
• Revisit the results that we had.
• Identify next steps.
• Revisit the hypotheses.
• Look at the results.
Recall That
- A significant p-value of 0.000175 (well below 0.05) indicates that the total score of the apps being clicked in Full Transparency is much higher than in Solution Transparency.
- A significant p-value of 0.0333 supports the claim that "Full transparency with the score component increases participants' attention towards others’ solutions (project and code) compared to solution transparency only without scores."
- A significant p-value of 2.146e-10 (Welch Two Sample t-test) supports the claim that "Full transparency with the score component increases the amount of attention the apps get compared to solution transparency only without scores."
- A significant p-value of 0.03656 indicates that the total lines of code added between phase 1 and phase 5 is greater for full transparency than for solution transparency.
- From the descriptive analysis of the user requirement score, tech score, infovis score and novelty score, people in the full transparency group score better than all 3 other treatment groups. People in the no transparency group have the lowest average score for user requirement, tech, infovis and novelty. Performance and Solution Transparency have scores in between the two extremes.
- The scores in no transparency are significantly lower than in full transparency across all 4 score dimensions: tech score (p-value = 0.003057), infovis score (p-value = 0.01033), novelty score (p-value = 0.04514), user requirement score (p-value = 0.035).
- The scores in no transparency are not significantly lower than in solution transparency across 3 score dimensions: novelty score (p-value = 0.2634), infovis score (p-value = 0.08511), user requirement score (p-value = 0.2363). No transparency only has a significantly lower score in tech (p-value = 0.01595).
**Repository Structure**
- Get rid of the "Analysis" folder.
- Keep top-level folders Spring-2017 and Fall-2017. Inside each of them, have a separate data folder with a "Raw data" subfolder and a "Processed data" subfolder. Remove the debugging data files.
- We have a third folder called Paper.
Main Structure
spring-2017
- data
  - raw-data
  - processed-data
- script
  - pre-processing
  - analysis
- paper
Fall-2017
**Welch T-test**
Comment - Clarify the hypothesis. Solution transparency allows for cognitive fixation and accelerated learning (Boudreau, Management Science 2011). We need to recode Group not as a continuous variable, but as distinct dummy variables.
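A minimal sketch of the dummy recoding mentioned in the comment, assuming the groups are stored as integer codes and one of them serves as the reference category. The codes, the helper name and the column names are placeholders.

```python
def dummy_code(groups, reference):
    """Turn a categorical group column into 0/1 indicator columns.

    One indicator is created per level except the reference level, so each
    coefficient in a regression reads as a difference from the reference.
    """
    levels = sorted(set(groups) - {reference})
    return [{f"group_{level}": int(g == level) for level in levels} for g in groups]

# Hypothetical group codes for five participants.
groups = [0, 1, 2, 3, 1]
rows = dummy_code(groups, reference=0)
```

Treating the code as a single continuous variable would force equal steps between treatments, whereas the indicators let each treatment have its own effect. In a pandas workflow, `pandas.get_dummies(..., drop_first=True)` produces a comparable encoding.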
User requirement score
Full Transparency > Performance Transparency > Solution Transparency > No Transparency
With the Welch t-test, the difference between no transparency and full transparency is significant for the user requirement score.
Novelty score
Full Transparency > Performance Transparency > Solution Transparency > No Transparency
With the Welch t-test, the difference between no transparency and full transparency is significant for the novelty score.
Number of Lines of Code Added
Full Transparency > Performance Transparency > No Transparency > Solution Transparency
With the Welch t-test, the difference between solution transparency and full transparency is significant for lines of code added.
Note that the click count was also not significantly related to the user requirement score or novelty score in the solution transparency case. Maybe because the participants don't know where to look for the best solutions?
Note - I picked the Welch t-test because it is more reliable when the two samples have unequal variances and/or unequal sample sizes.
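To make the note above concrete, the sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom by hand. The sample values are invented and the function name is a placeholder. In practice the p-value would come from a library, for example `scipy.stats.ttest_ind(a, b, equal_var=False)` in Python or `t.test` in R, which uses the Welch version by default.

```python
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch t statistic and approximate degrees of freedom.

    Each sample keeps its own variance, so neither equal variances nor
    equal sample sizes are assumed.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a = variance(sample_a) / n_a  # squared standard error of mean A
    se2_b = variance(sample_b) / n_b  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / (se2_a + se2_b) ** 0.5
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical scores for two treatment groups of different size and spread.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10, 12]
t_stat, dof = welch_t(group_a, group_b)
```

The degrees of freedom are generally not an integer and fall between the smaller sample size minus one and the pooled value `n_a + n_b - 2`.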
**Click Analysis**
Overall Group
- linear relationships exist between novelty score / user requirement score and sum of clicks / project clicks / score clicks
- medium to large effect (based on R²)
Sub Groups
- linear relationships exist between novelty score / user requirement score and sum of clicks / score clicks
- medium to large effect (based on R²)
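As an illustration of reading effect size from R², the sketch below computes R² for a simple linear relationship and labels it. The cutoffs of 0.02, 0.13 and 0.26 are Cohen's commonly cited conventions and are an assumption here, since the minutes do not say which cutoffs were used. The data values are made up.

```python
def r_squared(x, y):
    """R² of a simple linear regression of y on x (squared Pearson correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def effect_label(r2):
    """Label an R² value using assumed Cohen-style cutoffs."""
    if r2 >= 0.26:
        return "large"
    if r2 >= 0.13:
        return "medium"
    if r2 >= 0.02:
        return "small"
    return "negligible"

# Hypothetical click counts and novelty scores.
clicks = [1, 2, 3, 4, 5]
novelty = [2, 4, 5, 4, 5]
r2 = r_squared(clicks, novelty)
label = effect_label(r2)
```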
**Lines of Code Analysis**
Overall Group
- linear relationships exist between novelty score / user requirement score and total lines of code added
- medium to large effect (based on R²)
Subgroup
- The relationship between novelty score and total lines of code added between phase 1 and phase 5 is significant (large effect) for Bogota 0, Bogota 1 and Bogota 2, but not for Bogota 3. Bogota 3 is close, though, with a p-value of 0.07.
**Multiple Linear Regression**
- user requirement score and novelty score explained the most variance of the total score.
- project clicks count explained the most variance for user requirement score
- project clicks count explained the most variance for novelty score
**Use of Bartlett Factor Score in Regression**
- Factor Score Diagram
- Discussion on the 3 issues opened on 2nd January 2022.
- PCA Errors
- Subsample results on the EFA model
- ANOVA on weighted arithmetic mean
- R notebook for EFA
- R notebook for PCA
- Differences between PCA and EFA that contribute to the difference in results
- The steps for PCA and EFA
- Miscellaneous
- Check out dictionaries in R