https://aiimpacts.org/three-kinds-of-competitiveness/
https://www.alignmentforum.org/posts/qYzqDtoQaZ3eDDyxa/distinguishing-ai-takeover-scenarios
https://www.alignmentforum.org/posts/4kYkYSKSALH4JaQ99/toy-problem-detective-story-alignment
https://www.alignmentforum.org/posts/4DegbDJJiMX2b3EKm/tai-safety-bibliographic-database
https://www.openphilanthropy.org/blog/ai-governance-grantmaking
https://www.alignmentforum.org/posts/Tr7tAyt5zZpdTwTQK/the-solomonoff-prior-is-malign
https://www.gwern.net/Scaling-hypothesis
https://www.alignmentforum.org/posts/qEjh8rpxjG4qGtfuK/the-backchaining-to-local-search-technique-in-ai-alignment
https://www.alignmentforum.org/s/boLPsyNwd6teK5key
https://www.alignmentforum.org/posts/Xts5wm3akbemk4pDa/non-obstruction-a-simple-concept-motivating-corrigibility
https://www.lesswrong.com/posts/EdAHNdbkGR6ndAPJD/memetic-downside-risks-how-ideas-can-evolve-and-cause-harm
https://jbkjr.github.io/posts/2020/12/mapping_conceptual_territory_AI_safety_alignment/
https://www.lesswrong.com/posts/3qypPmmNHEmqegoFF/failures-in-technology-forecasting-a-reply-to-ord-and
https://forum.effectivealtruism.org/posts/EfCCgpvQX359xuZ4g/existential-risks-are-not-just-about-humanity
https://www.lesswrong.com/s/xezt7HYfpWR6nwp7Z
https://www.alignmentforum.org/posts/k2SNji3jXaLGhBeYP/extrapolating-gpt-n-performance
https://www.alignmentforum.org/posts/Xg2YycEfCnLYrCcjy/defining-capability-and-alignment-in-gradient-descent
https://www.microsoft.com/en-us/research/blog/deepspeed-extreme-scale-model-training-for-everyone/
https://www.alignmentforum.org/posts/3aDeaJzxinoGNWNpC/confucianism-in-ai-alignment
https://www.lesswrong.com/posts/YMokuZdoY9tEDHjzv/agi-predictions
https://www.alignmentforum.org/posts/jS2iiDPqMvZ2tnik2/ai-alignment-philosophical-pluralism-and-the-relevance-of
https://www.alignmentforum.org/posts/pTYDdcag9pTzFQ7vw/2020-ai-alignment-literature-review-and-charity-comparison
https://www.alignmentforum.org/posts/PJLABqQ962hZEqhdB/debate-update-obfuscated-arguments-problem
https://www.alignmentforum.org/s/CmrW8fCmSLK7E25sa
https://www.alignmentforum.org/posts/YNuJjRuxsWWzfvder/recursive-quantilizers-ii
https://www.alignmentforum.org/posts/2JGu9yxiJkoGdQR4s/learning-normativity-a-research-agenda
https://www.alignmentforum.org/posts/cYsGrWEzjb324Zpjx/comparing-utilities
https://www.alignmentforum.org/posts/SzecSPYxqRa5GCaSF/clarifying-inner-alignment-terminology
https://www.alignmentforum.org/posts/2dKvTYYN4PTT7g4of/knowledge-manipulation-and-free-will
https://forum.effectivealtruism.org/posts/42reWndoTEhFqu6T8/ai-governance-opportunity-and-theory-of-impact
https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology
https://www.lesswrong.com/posts/PzAnWgqvfESgQEvdg/any-rebuttals-of-christiano-and-ai-impacts-on-takeoff-speeds#zFEhTxNqEp3eZbjLZ
https://www.alignmentforum.org/posts/vX7KirQwHsBaSEdfK/what-is-narrow-value-learning
https://www.alignmentforum.org/posts/5eX8ko7GCxwR5N9mN/what-is-ambitious-value-learning
https://www.alignmentforum.org/posts/8LEPDY36jBYpijrSw/what-counts-as-defection
https://bounded-regret.ghost.io/ai-forecasting/
https://www.alignmentforum.org/posts/yEa7kwoMpsBgaBCgb/towards-a-new-impact-measure
https://www.alignmentforum.org/posts/eD9T4kiwB6MHpySGE/the-human-side-of-interaction
https://www.alignmentforum.org/posts/hvGoYXi2kgnS3vxqb/some-ai-research-areas-and-their-relevance-to-existential-1
https://longtermrisk.org/reasons-to-be-nice-to-other-value-systems/
https://www.lesswrong.com/posts/qKvn7rxP2mzJbKfcA/persuasion-tools-ai-takeover-without-agi-or-agency
https://www.alignmentforum.org/posts/ZiLLxaLB5CCofrzPp/reward-uncertainty
https://www.alignmentforum.org/posts/CAwwFpbteYBQw2Gkp/p-b-plan-to-p-b-better
https://www.lesswrong.com/posts/kgsaSbJqWLtJfiCcz/naturalized-induction-a-challenge-for-evidential-and-causal
https://casparoesterheld.com/2017/01/18/is-it-a-bias-or-just-a-preference-an-interesting-issue-in-preference-idealization/
https://longtermrisk.org/international-cooperation-vs-ai-arms-race/
https://longtermrisk.org/gains-from-trade-through-compromise/
https://forum.effectivealtruism.org/posts/7MdLurJGhGmqRv25c/multiverse-wide-cooperation-in-a-nutshell
https://casparoesterheld.com/2018/08/06/moral-realism-and-ai-alignment/
https://longtermrisk.org/how-would-catastrophic-risks-affect-prospects-for-compromise/
https://www.lesswrong.com/posts/L23FgmpjsTebqcSZb/how-roodman-s-gwp-model-translates-to-tai-timelines
https://casparoesterheld.com/2018/04/26/goertzels-golem-implements-evidential-decision-theory-applied-to-policy-choice/
https://www.alignmentforum.org/posts/oH8KMnXHnw964QyS6/preface-to-the-sequence-on-value-learning
https://www.alignmentforum.org/posts/rzqACeBGycZtqCfaX/fun-with-12-ooms-of-compute
https://longtermrisk.org/flavors-of-computation-are-flavors-of-consciousness/
https://www.alignmentforum.org/posts/Tdu3tGT4i24qcLESh/equilibrium-and-prior-selection-problems-in-multipolar-1
https://forum.effectivealtruism.org/posts/CmNBmSf6xtMyYhvcs/descriptive-population-ethics-and-its-relevance-for-cause
https://casparoesterheld.com/2017/01/17/decision-theory-and-the-irrelevance-of-impossible-outcomes/
https://longtermrisk.org/coordination-challenges-for-preventing-ai-conflict/
https://casparoesterheld.com/2017/06/25/complications-in-evaluating-neglectedness/
https://www.alignmentforum.org/posts/EzoCZjTdWTMgacKGS/clr-s-recent-work-on-multi-agent-systems
https://longtermrisk.org/do-artificial-reinforcement-learning-agents-matter-morally/
https://longtermrisk.org/differential-intellectual-progress-as-a-positive-sum-project/
https://www.lesswrong.com/posts/iqpizeN4hkbTjkugo/did-edt-get-it-right-all-along-introducing-yet-another
https://www.lesswrong.com/posts/LvtsFKxg2t3nWhKRq/commitment-and-credibility-in-multipolar-ai-scenarios
https://longtermrisk.org/collaborative-game-specification/
https://forum.effectivealtruism.org/posts/225Aq4P4jFPoWBrb5/cause-prioritization-for-downside-focused-value-systems
https://forum.effectivealtruism.org/posts/Xf6QE6txgvfCGvZpk/case-studies-of-self-governance-to-reduce-technology-risk
https://www.alignmentforum.org/posts/HhWhaSzQr6xmBki8F/birds-brains-planes-and-ai-against-appeals-to-the-complexity
https://casparoesterheld.com/2017/02/06/betting-on-the-past-by-arif-ahmed/
https://www.lesswrong.com/posts/cyJgdhgYaM2CbZ7tP/are-causal-decision-theorists-trying-to-outsmart-conditional
https://casparoesterheld.com/2017/05/12/anthropic-uncertainty-in-the-evidential-blackmail/
https://www.alignmentforum.org/posts/wfpdejMWog4vEDLDg/ai-and-compute-trend-isn-t-predictive-of-what-is-happening
https://www.alignmentforum.org/posts/aFaKhG86tTrKvtAnT/against-gdp-as-a-metric-for-timelines-and-takeoff-speeds
https://www.alignmentforum.org/posts/br7KRSeNymwSvZnf5/embedded-vs-external-decision-problems
https://www.alignmentforum.org/posts/pDaxobbB9FG5Dvqyv/discussion-objective-robustness-and-inner-alignment
https://towardsdatascience.com/assessing-generalization-in-reward-learning-intro-and-background-da6c99d9e48
https://aisafety.camp/2020/05/30/aisc4-research-summaries/
https://www.lesswrong.com/posts/5Kv2qNfRyXXihNrx2/ai-safety-debate-and-its-applications
https://www.lesswrong.com/posts/i5dLfi6m6FCexReK9/a-brief-review-of-the-reasons-multi-objective-rl-could-be
https://aiimpacts.org/what-if-you-turned-the-worlds-hardware-into-ai-minds/
https://www.alignmentforum.org/posts/6Xgy6CAf2jqHhynHL/what-2026-looks-like
https://aiimpacts.org/walsh-2017-survey/
https://aiimpacts.org/time-for-ai-to-cross-the-human-range-in-english-draughts/
https://aiimpacts.org/time-for-ai-to-cross-the-human-performance-range-in-go/
https://aiimpacts.org/time-for-ai-to-cross-the-human-performance-range-in-chess/
https://aiimpacts.org/takeaways-from-safety-by-default-interviews/
https://aiimpacts.org/relevant-pre-agi-possibilities/
https://aiimpacts.org/reinterpreting-ai-and-compute/
https://aiimpacts.org/trends-in-dram-price-per-gigabyte/
https://aiimpacts.org/time-for-ai-to-cross-the-human-range-in-starcraft/
https://aiimpacts.org/diabetic-retinopathy-as-a-case-study-in-time-for-ai-to-cross-the-range-of-human-performance/
https://aiimpacts.org/primates-vs-birds-is-one-brain-architecture-better-than-the-other/
https://aiimpacts.org/misalignment-and-misuse-whose-values-are-manifest/
https://aiimpacts.org/interpreting-ai-compute-trends/
https://aiimpacts.org/etzioni-2016-survey/
https://aiimpacts.org/error-in-armstrong-and-sotala-2012/
https://aiimpacts.org/cortes-pizarro-and-afonso-as-precedents-for-ai-takeover/
https://aiimpacts.org/conversation-with-rohin-shah/
https://aiimpacts.org/conversation-with-paul-christiano/
https://aiimpacts.org/precedents-for-economic-n-year-doubling-before-4n-year-doubling/
https://aiimpacts.org/on-the-inapplicability-of-corporate-rights-cases-to-digital-minds/
https://aiimpacts.org/likelihood-of-discontinuous-progress-around-the-development-of-agi/
https://aiimpacts.org/human-level-hardware-timeline/
https://aiimpacts.org/discontinuous-progress-in-history-an-update/
https://aiimpacts.org/description-vs-simulated-prediction/
https://aiimpacts.org/conversation-with-robin-hanson/
https://aiimpacts.org/conversation-with-adam-gleave/
https://aiimpacts.org/ai-conference-attendance/
https://aiimpacts.org/conversation-with-ernie-davis/
https://aiimpacts.org/beyond-fire-alarms-freeing-the-groupstruck/
https://aiimpacts.org/automated-intelligence-is-not-ai/
https://aiimpacts.org/atari-early/
https://aiimpacts.org/agi-11-survey/
https://aiimpacts.org/2019-recent-trends-in-geekbench-score-per-cpu-price/
https://aiimpacts.org/time-for-ai-to-cross-the-human-performance-range-in-imagenet-image-classification/
https://aiimpacts.org/research-topic-hardware-software-and-ai/
https://aiimpacts.org/trends-in-the-cost-of-computing/
https://aiimpacts.org/transmitting-fibers-in-the-brain-total-length-and-distribution-of-lengths/
https://aiimpacts.org/cost-of-teps/
https://aiimpacts.org/surveys-on-fractional-progress-towards-hlai/
https://aiimpacts.org/returns-to-scale-in-research/
https://aiimpacts.org/resolutions-of-mathematical-conjectures-over-time/
https://aiimpacts.org/progress-in-general-purpose-factoring/
https://aiimpacts.org/survey-of-prescient-actions/
https://aiimpacts.org/list-of-analyses-of-time-to-human-level-ai/
https://aiimpacts.org/investigation-into-the-relationship-between-neuron-count-and-intelligence-across-differing-cortical-architectures/
https://aiimpacts.org/historical-growth-trends/
https://aiimpacts.org/discontinuity-from-nuclear-weapons/
https://aiimpacts.org/costs-of-extinction-risk-mitigation/
https://aiimpacts.org/cases-of-discontinuous-technological-progress/
https://aiimpacts.org/ai-timeline-surveys/
https://aiimpacts.org/miri-ai-predictions-dataset/
https://aiimpacts.org/multipolar-research-projects/
https://aiimpacts.org/human-level-ai/
https://aiimpacts.org/are-human-engineered-flight-designs-better-or-worse-than-natural-ones/
https://aiimpacts.org/hanson-ai-expert-survey/
https://aiimpacts.org/partially-plausible-fictional-ai-futures/
https://aiimpacts.org/examples-of-early-action-on-a-risk/
https://aiimpacts.org/evidence-on-good-forecasting-practices-from-the-good-judgment-project/
https://aiimpacts.org/evidence-against-current-methods-leading-to-human-level-artificial-intelligence/
https://aiimpacts.org/discontinuous-progress-investigation/
https://aiimpacts.org/costs-of-human-level-hardware/
https://aiimpacts.org/coordinated-human-action-example-superhuman-intelligence/
https://aiimpacts.org/conversation-with-tom-griffiths/
https://aiimpacts.org/conversation-with-steve-potter/
https://aiimpacts.org/brain-performance-in-teps/
https://aiimpacts.org/ai-vignettes-project/
https://aiimpacts.org/ai-impacts-research-bounties/
https://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/
https://aiimpacts.org/2016-expert-survey-on-progress-in-ai/
https://aiimpacts.org/ai-risk-terminology/
https://aiimpacts.org/accuracy-of-ai-predictions/
https://aiimpacts.org/recent-trend-in-the-cost-of-computing/
https://medium.com/partnership-on-ai/lessons-for-the-ai-community-from-the-h5n1-controversy-32432438a82e
http://ai.googleblog.com/2020/08/understanding-view-selection-for.html
https://www.alignmentforum.org/posts/dKxX76SCfCvceJXHv/ai-alignment-2018-19-review
https://www.alignmentforum.org/posts/WiXePTj7KeEycbiwK/survey-on-ai-existential-risk-scenarios
https://qualiacomputing.com/2019/08/30/why-care-about-meme-hazards-and-thoughts-on-how-to-handle-them/
https://www.alignmentforum.org/posts/BEMvcaeixt3uEqyBk/what-does-optimization-mean-again-optimizing-and-goodhart
http://mediangroup.org/brain1.html
http://mediangroup.org/insights2.html
https://medium.com/@lucarade/issues-with-iterated-distillation-and-amplification-5aa01ab37173
http://mediangroup.org/gpu.html
https://www.alignmentforum.org/posts/zdeYiQgwYRs2bEmCK/applying-overoptimization-to-selection-vs-control-optimizing
https://www.alignmentforum.org/posts/2neeoZ7idRbZf4eNC/re-introducing-selection-vs-control-for-optimization
http://mediangroup.org/insights
https://forum.effectivealtruism.org/posts/oovy5XXdCL3TPwgLE/a-case-for-strategy-research-what-it-is-and-why-we-need-more
https://www.alignmentforum.org/posts/EF5M6CmKRd6qZk27Z/my-research-methodology
https://ai-alignment.com/mundane-solutions-to-exotic-problems-395bad49fbe7
https://ai-alignment.com/low-stakes-alignment-f3c36606937f
https://www.alignmentforum.org/posts/BxersHYN2qcFoonwg/experimentally-evaluating-whether-honesty-generalizes
https://www.alignmentforum.org/posts/AyNHoTWWAJ5eb99ji/another-outer-alignment-failure-story
https://www.alignmentforum.org/posts/QvtHSsZLFCAHmzes7/a-naive-alignment-strategy-and-optimism-about-generalization
https://www.alignmentforum.org/posts/QqwZ7cwEA2cxFEAun/teaching-ml-to-answer-questions-honestly-instead-of
https://www.alignmentforum.org/posts/7jSvfeyh8ogu8GcE6/decoupling-deliberation-from-competition
https://www.alignmentforum.org/posts/roZvoF6tRH6xYtHMF/avoiding-the-instrumental-policy-by-hiding-information-about
https://www.alignmentforum.org/posts/SRJ5J9Tnyq7bySxbt/answering-questions-honestly-given-world-model-mismatches
https://www.alignmentforum.org/posts/6E6D3qLPM3urXDPpK/what-makes-counterfactuals-comparable-1
https://www.alignmentforum.org/posts/ajvvtKuNzh7aHmooT/stuck-exploration
https://aisrp.org/?page_id=169
https://www.alignmentforum.org/posts/rSMbGFfsLMB3GWZtX/what-is-interpretability
https://www.alignmentforum.org/posts/vFXK8eQdLhicYNNqF/vulnerabilities-in-cdt-and-ti-unaware-agents
https://www.alignmentforum.org/posts/z2ofM2oZQwmcWFt8N/ai-services-as-a-research-paradigm
https://www.alignmentforum.org/posts/maBNBgopYxb9YZP8B/sparsity-and-interpretability-1
https://www.alignmentforum.org/posts/XAeWHqQTWjJmzB4k6/reference-post-trivial-decision-problem
https://www.gleech.org/grids
https://www.lesswrong.com/posts/QEmfyhqMcSpfnY2dX/how-teams-went-about-their-research-at-ai-safety-camp
https://www.alignmentforum.org/posts/uRnprGSiLGXv35foX/how-can-interpretability-help-alignment
https://www.alignmentforum.org/posts/PZYD5kBpeHWgE5jX4/extraction-of-human-preferences
https://www.alignmentforum.org/posts/iJDmL7HJtN5CYKReM/empirical-observations-of-objective-robustness-failures
https://www.lesswrong.com/posts/LBwpubeZSi3ottfjs/aisc5-retrospective-mechanisms-for-avoiding-tragedy-of-the
https://www.alignmentforum.org/posts/pfmFe5fgEn2weJuer/go-west-young-man-preferences-in-imperfect-maps
https://forum.effectivealtruism.org/posts/CWFn9qAKsRibpCGq8/does-economic-history-point-toward-a-singularity
https://www.alignmentforum.org/posts/v6Q7T335KCMxujhZu/clarifying-what-failure-looks-like-part-1
https://www.alignmentforum.org/posts/w8QBmgQwb83vDMXoz/dynamic-inconsistency-of-the-inaction-and-initial-state
https://cullenokeefe.com/ai-benefits-index
https://www.alignmentforum.org/posts/9m2fzjNSJmd3yxxKG/acdt-a-hack-y-acausal-decision-theory
https://www.alignmentforum.org/posts/7cXBoDQ6udquZJ89c/a-toy-model-of-the-control-problem
https://deepmindsafetyresearch.medium.com/what-mechanisms-drive-agent-behaviour-e7b8d9aee88
https://vkrakovna.wordpress.com/2020/07/05/tradeoff-between-desirable-properties-for-baseline-choices-in-impact-measures/
https://forum.effectivealtruism.org/posts/2e9NDGiXt8PjjbTMC/technical-agi-safety-research-outside-ai
https://vkrakovna.wordpress.com/2018/04/02/specification-gaming-examples-in-ai/
https://www.alignmentforum.org/posts/BKjJJH2cRpJcAnP7T/thoughts-on-human-models
http://www.incompleteideas.net/IncIdeas/BitterLesson.html
https://www.lesswrong.com/posts/DJB82jKwgJE5NsWgT/some-cruxes-on-impactful-alternatives-to-ai-policy-work
https://vkrakovna.wordpress.com/2015/11/29/ai-risk-without-an-intelligence-explosion/
https://www.alignmentforum.org/posts/wTKjRFeSjKLDSWyww/possible-takeaways-from-the-coronavirus-pandemic-for-slow-ai
https://deepmindsafetyresearch.medium.com/progress-on-causal-influence-diagrams-a7a32180b0d1
https://forum.effectivealtruism.org/posts/oR9tLNRSAep293rr5/why-those-who-care-about-catastrophic-and-existential-risk-2
https://futureoflife.org/2019/06/18/iclr-safe-ml-workshop-report/
https://humanityplus.org/philosophy/transhumanist-faq/
https://www.alignmentforum.org/posts/mdQEraEZQLg7jtozn/subagents-and-impact-measures-full-and-fully-illustrated
https://www.lesswrong.com/posts/CSEdLLEkap2pubjof/research-agenda-v0-9-synthesising-a-human-s-preferences-into
https://cullenokeefe.com/blog/debate-evidence
https://www.alignmentforum.org/posts/k54rgSg7GcjtXnMHX/model-splintering-moving-from-one-imperfect-model-to-another-1
https://www.alignmentforum.org/posts/cnC2RMWEGiGpJv8go/model-mis-specification-and-inverse-reinforcement-learning
https://www.alignmentforum.org/posts/Kr76XzME7TFkN937z/predictors-exist-cdt-going-bonkers-forever
https://www.lesswrong.com/posts/PT8vSxsusqWuN7JXp/my-understanding-of-paul-christiano-s-iterated-amplification
https://www.alignmentforum.org/posts/gzWb5kWwzhdaqmyTt/if-i-were-a-well-intentioned-ai-i-image-classifier
https://longtermrisk.org/a-dialogue-on-suffering-subroutines/
https://casparoesterheld.com/2017/10/22/a-behaviorist-approach-to-building-phenomenological-bridges/
https://longtermrisk.org/artificial-intelligence-and-its-implications-for-future-suffering/
https://casparoesterheld.com/2017/06/27/a-survey-of-polls-on-newcombs-problem/
https://longtermrisk.org/a-lower-bound-on-the-importance-of-promoting-cooperation/
https://jsteinhardt.stat.berkeley.edu/blog/measurement-and-optimization
https://www.alignmentforum.org/posts/xxnPxELC4jLKaFKqG/learning-biases-and-rewards-simultaneously
https://www.alignmentforum.org/posts/nyDnLif4cjeRe9DSv/generalizing-the-power-seeking-theorems
https://www.alignmentforum.org/posts/DfcywmqRSkBaCB6Ma/intuitions-about-goal-directed-behavior
https://www.ri.cmu.edu/publications/integrating-human-observer-inferences-into-robot-motion-planning/
https://www.alignmentforum.org/posts/ANupXf8XfZo2EJxGv/humans-can-be-assigned-any-values-whatsoever
https://www.alignmentforum.org/posts/4783ufKpx8xvLMPc6/human-ai-interaction
https://www.alignmentforum.org/posts/EhNCnCkmu7MwrQ7yz/future-directions-for-ambitious-value-learning
https://www.alignmentforum.org/posts/MxadmSXHnoCupsWqx/future-directions-for-narrow-value-learning
https://www.alignmentforum.org/posts/eBd6WvzhuqduCkYv3/following-human-norms
https://www.alignmentforum.org/posts/BMj6uMuyBidrdZkiD/corrigibility-as-outside-view
http://bair.berkeley.edu/blog/2019/10/21/coordination/
https://www.alignmentforum.org/posts/TE5nJ882s5dCMkBB8/conclusion-to-the-sequence-on-value-learning
https://www.alignmentforum.org/posts/NxF5G6CJiof6cemTw/coherence-arguments-do-not-imply-goal-directed-behavior
https://www.alignmentforum.org/posts/mJ5oNYnkYrd4sD5uE/clarifying-some-key-hypotheses-in-ai-alignment
https://www.alignmentforum.org/posts/26eupx3Byc8swRS7f/bottle-caps-aren-t-optimisers
https://www.alignmentforum.org/posts/8GdPargak863xaebm/an-analytic-perspective-on-ai-alignment
https://www.alignmentforum.org/posts/tHxXdAn8Yuiy9y2pZ/ai-safety-without-goal-directed-behavior
https://www.alignmentforum.org/posts/Wap8sSDoiigrJibHA/garrabrant-and-shah-on-human-modeling-in-agi
https://www.alignmentforum.org/posts/JbcWQCxKWn3y49bNB/disentangling-arguments-for-the-importance-of-ai-safety
https://www.alignmentforum.org/posts/vqpEC3MPioHX7bv4t/environments-as-a-bottleneck-in-agi-development
https://medium.com/@deepmindsafetyresearch/building-safe-artificial-intelligence-52f5f75058f1
https://www.alignmentforum.org/posts/GqxuDtZvfgL2bEQ5v/arguments-against-myopic-training
https://www.alignmentforum.org/posts/S9GxuAEeQomnLkeNt/a-space-of-proposals-for-building-safe-advanced-ai
https://www.alignmentforum.org/posts/9zpT9dikrrebdq3Jf/will-humans-build-goal-directed-agents
https://www.alignmentforum.org/posts/LpM3EAakwYdS6aRKf/what-multipolar-failure-looks-like-and-robust-agent-agnostic
https://www.lesswrong.com/posts/wWnN3y5GmqLLCJFAz/two-boxing-smoking-and-chewing-gum-in-medical-newcomb
https://casparoesterheld.com/2018/03/31/three-wagers-for-multiverse-wide-superrationality/
https://www.lesswrong.com/posts/JPan54R525D68NoEt/the-date-of-ai-takeover-is-not-the-day-the-ai-takes-over
https://www.lesswrong.com/posts/BcYfsi7vmhDvzQGiF/taboo-outside-view
https://www.alignmentforum.org/posts/PKy8NuNPknenkDY74/soft-takeoff-can-still-lead-to-decisive-strategic-advantage
https://www.lesswrong.com/posts/M4w2rdYgCKctbADMn/sequence-introduction-non-agent-and-multiagent-models-of
https://longtermrisk.org/weak-identifiability-and-its-consequences-in-strategic-settings/
https://longtermrisk.org/using-surrogate-goals-deflect-threats/
https://casparoesterheld.com/2016/11/21/thoughts-on-updatelessness/
https://casparoesterheld.com/2018/02/15/the-law-of-effect-randomization-and-newcombs-problem/
https://longtermrisk.org/the-future-of-growth-near-zero-growth-rates/
https://casparoesterheld.com/2017/03/15/the-average-utilitarians-solipsism-wager/
https://www.alignmentforum.org/posts/brXr7PJ2W4Na2EW2q/the-commitment-races-problem
https://www.lesswrong.com/posts/FkZCM4DMprtEp568s/shaping-economic-incentives-for-collaborative-agi
https://www.alignmentforum.org/posts/oqghwKKifztYWLsea/four-motivations-for-learning-normativity
https://www.alignmentforum.org/posts/a7jnbtoKFyvu5qfkd/formal-inner-alignment-prospectus
https://www.alignmentforum.org/s/kxs3eeEti9ouwWFzr
https://www.alignmentforum.org/posts/QvwSr5LsxyDeaPK5s/existential-risk-from-ai-survey-results
https://arbital.com/p/direct_limit_oppose/
https://arbital.com/p/context_disaster/
https://www.alignmentforum.org/posts/u9Azdu6Z7zFAhd4rK/bayesian-evolving-to-extinction
https://www.alignmentforum.org/posts/cQwT8asti3kyA62zc/automating-auditing-an-ambitious-concrete-technical-research
https://arbital.com/p/cev/
https://www.lesswrong.com/posts/S7csET9CgBtpi7sCh/challenges-to-christiano-s-capability-amplification-proposal
https://www.alignmentforum.org/posts/gEw8ig38mCGjia7dj/answering-questions-honestly-instead-of-predicting-human
https://www.alignmentforum.org/posts/CvKnhXTu9BPcdKE4W/an-untrollable-mathematician-illustrated
https://www.alignmentforum.org/posts/fRsjBseRuvRhMPPE5/an-overview-of-11-proposals-for-building-safe-advanced-ai
https://www.alignmentforum.org/posts/A8iGaZ3uHNNGgJeaD/an-orthodox-case-against-utility-functions
https://www.alignmentforum.org/posts/N64THGX7XNCqRtvPG/alignment-proposals-and-complexity-classes
https://www.alignmentforum.org/posts/EL4HNa92Z95FKL9R2/a-semitechnical-introductory-dialogue-on-solomonoff-1
https://intelligence.org/2017/12/06/chollet/
https://arbital.com/p/aligning_adds_time/
https://www.alignmentforum.org/posts/YWwzccGbcHMJMpT45/ai-safety-via-market-making
https://unstableontology.com/2019/07/11/the-ai-timelines-scam/
https://www.alignmentforum.org/posts/ySLYSsNeFL5CoAQzN/a-critique-of-functional-decision-theory
https://www.alignmentforum.org/posts/KnPN7ett8RszE79PH/demons-in-imperfect-search
https://www.alignmentforum.org/posts/k8F8TBzuZtLheJt47/deconfusing-human-values-research-agenda-v1
https://www.alignmentforum.org/posts/WxW6Gc6f2z3mzmqKs/debate-on-instrumental-convergence-between-lecun-russell
https://www.openphilanthropy.org/could-advanced-ai-drive-explosive-economic-growth
https://www.ibm.com/blogs/policy/bias-in-ai/
https://www.alignmentforum.org/posts/teCsd4Aqg9KDxkaC9/bootstrapped-alignment
https://medium.com/@tdietterich/benefits-and-risks-of-artificial-intelligence-460d288cccf3
https://www.alignmentforum.org/posts/Nwgdq6kHke5LY692J/alignment-by-default
https://www.alignmentforum.org/posts/42YykiTqtGMyJAjDM/alignment-as-translation
https://www.alignmentforum.org/posts/BnDF5kejzQLqd5cjH/alignment-as-a-bottleneck-to-usefulness-of-gpt-3
https://www.alignmentforum.org/posts/8fpzBHt7e6n7Qjoo9/ai-risk-for-epistemic-minimalists
https://www.lesswrong.com/posts/bkG4qj9BFEkNva3EX/ai-development-incentive-gradients-are-not-uniformly
https://medium.com/partnership-on-ai/aligning-ai-to-human-values-means-picking-the-right-metrics-855859e6f047
https://www.alignmentforum.org/posts/BRiMQELD5WYyvncTE/ai-unsafety-via-non-zero-sum-debate
https://www.lesswrong.com/posts/7GEviErBXcjJsbSeD/ai-alignment-research-overview-by-jacob-steinhardt
https://www.alignmentforum.org/posts/3SG4WbNPoP8fsuZgs/agency-in-conway-s-game-of-life
https://ai-alignment.com/advisor-games-b33382fef68c
http://www.alexirpan.com/2020/05/07/rl-potpourri.html
https://www.alignmentforum.org/posts/cKfryXvyJ522iFuNF/a-gym-gridworld-environment-for-the-treacherous-turn
https://www.alignmentforum.org/posts/idb5Ppp9zghcichJ5/a-general-model-of-safety-oriented-ai-development
https://unstableontology.com/2020/03/05/a-critical-agential-account-of-free-will-causation-and-physics/
https://forum.effectivealtruism.org/posts/nSot23sAjoZRgaEwa/2016-ai-risk-literature-review-and-charity-comparison
https://www.alignmentforum.org/posts/SvuLhtREMy8wRBzpC/ambitious-vs-narrow-value-learning
https://ai-alignment.com/an-unaligned-benchmark-b49ad992940b
https://ai-alignment.com/alphago-zero-and-capability-amplification-ede767bb8446
https://ai-alignment.com/approval-maximizing-representations-56ee6a6a1fe6
https://www.alignmentforum.org/posts/CtGH3yEoo4mY2taxe/weak-hch-accesses-exp
https://arbital.com/p/unforeseen_maximum/
https://intelligence.org/2013/08/25/transparency-in-safety-critical-systems/
https://www.alignmentforum.org/posts/tKwJQbo6SfWF2ifKh/toward-a-new-technical-explanation-of-technical-explanation
https://ai-alignment.com/better-priors-as-a-safety-problem-24aa1c300710
https://ai-alignment.com/directions-and-desiderata-for-ai-control-b60fca0da8f4
https://ai-alignment.com/benign-model-free-rl-4aae8c97e385
https://openai.com/blog/faulty-reward-functions/
https://www.alignmentforum.org/posts/6ccG9i5cTncebmhsH/frequent-arguments-about-alignment
https://www.alignmentforum.org/posts/JKj5Krff5oKMb8TjT/imitative-generalisation-aka-learning-the-prior-1
https://ai-alignment.com/implicit-extortion-3c80c45af1e3
http://ai.googleblog.com/2018/09/introducing-unrestricted-adversarial.html
https://ai-alignment.com/techniques-for-optimizing-worst-case-performance-39eafec74b99
https://www.alignmentforum.org/posts/h9DesGT3WT9u2k7Hr/the-easy-goal-inference-problem-is-still-hard
https://ai-alignment.com/inaccessible-information-c749c6a88ce
https://ai-alignment.com/informed-oversight-18fcb5d3d1e1
https://ai-alignment.com/learning-the-prior-48f61b445c04
https://ai-alignment.com/the-strategy-stealing-assumption-a26b8b1ed334
https://ai-alignment.com/towards-formalizing-universality-409ab893a456
https://ai-alignment.com/two-guarantees-c4c03a6b434f
https://ai-alignment.com/universality-and-model-based-rl-b08701394ddd
https://ai-alignment.com/universality-and-consequentialism-within-hch-c0bee00365bd
https://ai-alignment.com/universality-and-security-amplification-551b314a3bab
https://ai-alignment.com/training-robust-corrigibility-ce0e0a3b9b4d
https://www.alignmentforum.org/posts/Br4xDbYu4Frwrb64a/writeup-progress-on-ai-safety-via-debate-1
https://ai-alignment.com/unsupervised-translation-as-a-safety-problem-99ae1f9b6b68
https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like
https://intelligence.org/2017/10/13/fire-alarm/
https://www.yudkowsky.net/singularity/aibox
https://arbital.com/p/task_agi/
https://intelligence.org/2018/10/03/rocket-alignment/
https://arbital.com/p/optimized_agent_appears_coherent/
https://arbital.com/p/hyperexistential_separation/
https://intelligence.org/2015/12/31/safety-engineering-target-selection-and-alignment-theory/
https://www.lesswrong.com/posts/bBdfbWfWxHN9Chjcq/robustness-to-scale
https://www.alignmentforum.org/posts/xJyY5QkQvNJpZLJRo/radical-probabilism-1
https://arbital.com/p/patch_resistant/
https://www.lesswrong.com/posts/zEvqFtT4AtTztfYC4/optimization-amplifies
https://arbital.com/p/task_goal/
https://www.alignmentforum.org/posts/dJSD5RK6Qoidb3QY5/synthesizing-amplification-and-debate
https://www.econlib.org/archives/2016/03/so_far_unfriend.html
https://intelligence.org/2017/11/26/security-mindset-and-the-logistic-success-curve/
https://intelligence.org/2017/11/25/security-mindset-ordinary-paranoia/
https://arbital.com/p/updated_deference/
https://arbital.com/p/ontology_identification/
https://arbital.com/p/nonadversarial/
https://arbital.com/p/nearest_unblocked/
https://forum.effectivealtruism.org/posts/Ayu5im98u8FeMWoBZ/my-personal-cruxes-for-working-on-ai-safety
https://www.alignmentforum.org/posts/AyfDnnAdjG7HHeD3d/miri-comments-on-cotra-s-case-for-aligning-narrowly
https://www.alignmentforum.org/posts/WmBukJkEFM72Xr397/mesa-search-vs-mesa-control
https://arbital.com/p/minimality_principle/
https://arbital.com/p/meta_unsolved/
https://arbital.com/p/edge_instantiation/
https://arbital.com/p/diamond_maximizer/
https://arbital.com/p/consequentialist/
https://www.alignmentforum.org/posts/m7oGxvouzzeQKiGJH/how-should-ai-debate-be-judged
https://arbital.com/p/general_intelligence/
https://www.alignmentforum.org/posts/xRyLxfytmLFZ6qz5s/the-theory-practice-gap
https://www.alignmentforum.org/posts/HHunb8FPnhWaDAQci/the-alignment-problem-in-different-capability-regimes
https://www.alignmentforum.org/posts/k7oxdbNaGATZbtEg3/redwood-research-s-current-project
https://ought.org/updates/2020-11-09-forecasting
https://www.lesswrong.com/posts/DFkGStzvj3jgXibFG/factored-cognition
https://ought.org/updates/2020-01-11-arguments
https://www.alignmentforum.org/posts/C9YMrPAyMXfB8cLPb/more-on-disambiguating-discontinuity
https://www6.inrae.fr/mia-paris/Equipes/Membres/Anciens/Laurent-Orseau/Mortal-universal-agents-wireheading
https://www.openphilanthropy.org/blog/modeling-human-trajectory
https://www.alignmentforum.org/posts/ynt9TD6PrYw6iT49m/malign-generalization-without-internal-search
https://jsteinhardt.wordpress.com/2015/06/24/long-term-and-short-term-challenges-to-ensuring-the-safety-of-ai-systems/
https://www.alignmentforum.org/posts/cfXwr6NC9AqZ9kr8g/literature-review-on-goal-directedness
https://www.alignmentforum.org/posts/HkWB5KCJQ2aLsMzjt/locality-of-goals
https://www.alignmentforum.org/posts/gnvrixhDfG7S2TpNL/latent-variables-and-model-mis-specification
https://www.alignmentforum.org/posts/AHhCrJ2KpTjsCSwbt/inner-alignment-explain-like-i-m-12-edition
https://forum.effectivealtruism.org/posts/ZJiCfwTy5dC4CoxqA/information-security-careers-for-gcr-reduction
https://www.alignmentforum.org/posts/6m5qqkeBTrqQsegGi/inner-alignment-requires-making-assumptions-about-human
https://www.alignmentforum.org/posts/7CJBiHYxebTmMfGs3/infinite-data-compute-arguments-in-alignment
https://www.openphilanthropy.org/brain-computation-report
https://www.alignmentforum.org/posts/ajQzejMYizfX4dMWK/how-does-iterated-amplification-exceed-human-abilities
https://www.alignmentforum.org/posts/yW3Tct2iyBMzYhTw7/how-does-bee-learning-compare-with-machine-learning
https://www.alignmentforum.org/posts/d4NgfKY3cq9yiBLSM/goals-and-short-descriptions
https://aipulse.org/genetically-modified-organisms-a-precautionary-tale-for-ai-governance-2/
https://slatestarcodex.com/2017/04/01/g-k-chesterton-on-ai-risk/
https://www.alignmentforum.org/posts/h3ejmEeNniDNFXTgp/fractional-progress-estimates-for-ai-timelines-and-implied
https://www.alignmentforum.org/posts/jHSi6BwDKTLt5dmsG/grokking-the-intentional-stance
https://www.alignmentforum.org/posts/X5WTgfX5Ly4ZNHWZD/focus-you-are-allowed-to-be-bad-at-accomplishing-your-goals
https://www.alignmentforum.org/posts/Y4YHTBziAscS5WPN7/epistemological-framing-for-ai-alignment-research
https://www.alignmentforum.org/posts/CDSXoC54CjbXQNLGr/epistemology-of-hch
https://www.alignmentforum.org/posts/dSAJdi99XmqftqXXq/eight-claims-about-multi-agent-agi-safety
https://www.alignmentforum.org/posts/YgNYA6pj2hPSDQiTE/distinguishing-definitions-of-takeoff
https://www.alignmentforum.org/posts/L9HcyaiWBLYe7vXid/distinguishing-claims-about-training-vs-deployment
https://www.lesswrong.com/posts/xoQRz8tBvsznMXTkt/dissolving-confusion-around-functional-decision-theory
https://gcrinstitute.org/the-ethics-of-sustainability-for-artificial-intelligence/
https://gcrinstitute.org/collective-action-on-artificial-intelligence-a-primer-and-review/
https://gcrinstitute.org/moral-consideration-of-nonhumans-in-the-ethics-of-artificial-intelligence/
https://gcrinstitute.org/2020-survey-of-artificial-general-intelligence-projects-for-ethics-risk-and-policy/
https://gcrinstitute.org/artificial-intelligence-needs-environmental-ethics/
https://globalprioritiesinstitute.org/nick-beckstead-and-teruji-thomas-a-paradox-for-tiny-probabilities-and-enormous-values/
https://cset.georgetown.edu/publication/ai-accidents-an-emerging-threat/
https://cset.georgetown.edu/publication/truth-lies-and-automation/
https://cset.georgetown.edu/publication/harnessed-lightning/
https://cset.georgetown.edu/publication/ethical-norms-for-new-generation-artificial-intelligence-released/
https://cset.georgetown.edu/publication/white-paper-on-trustworthy-artificial-intelligence/
https://cset.georgetown.edu/publication/ethics-and-artificial-intelligence/
https://cset.georgetown.edu/publication/ai-verification/
https://cset.georgetown.edu/publication/key-concepts-in-ai-safety-an-overview/
https://cset.georgetown.edu/publication/contending-frames/
https://cset.georgetown.edu/publication/classifying-ai-systems/
https://cset.georgetown.edu/publication/federal-prize-competitions/
https://cset.georgetown.edu/publication/key-concepts-in-ai-safety-robustness-and-adversarial-examples/
https://cset.georgetown.edu/publication/key-concepts-in-ai-safety-interpretability-in-machine-learning/
https://sideways-view.com/2018/02/24/takeoff-speeds/
https://thegradient.pub/independently-reproducible-machine-learning/
https://www.alignmentforum.org/posts/BGxTpdBGbwCWrGiCL/plausible-cases-for-hrad-work-and-locating-the-crux-in-the
https://www.alignmentforum.org/posts/RvrTZ3qKWpg9aiFqZ/openness-norms-in-agi-development
https://www.alignmentforum.org/posts/ky988ePJvCRhmCwGo/using-vector-fields-to-visualise-preferences-and-make-them
https://www.alignmentforum.org/posts/2yLn8iTrvHoEgqXcJ/the-two-layer-model-of-human-values-and-problems-with
https://www.lesswrong.com/posts/GHNokcgERpLJwJnLW/some-comments-on-stuart-armstrong-s-research-agenda-v0-9
https://www.alignmentforum.org/posts/X7S3u5E4KktLp7gHz/tessellating-hills-a-toy-model-for-demons-in-imperfect
https://www.alignmentforum.org/posts/farherQcqFQXqRcvv/universality-unwrapped
https://www.alignmentforum.org/posts/5WECpYABCT62TJrhY/will-ai-undergo-discontinuous-progress
http://ai.googleblog.com/2020/03/fast-and-easy-infinitely-wide-networks.html
https://ai-alignment.com/the-steering-problem-a3543e65c5c4