<!doctype html>
<html>
<head>
<meta charset="utf-8">
<!-- Always force latest IE rendering engine or request Chrome Frame -->
<meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible">
<!-- REPLACE X WITH PRODUCT NAME -->
<title>Security | Pivotal Docs</title>
<!-- Local CSS stylesheets -->
<link href="/stylesheets/master.css" media="screen,print" rel="stylesheet" type="text/css" />
<link href="/stylesheets/breadcrumbs.css" media="screen,print" rel="stylesheet" type="text/css" />
<link href="/stylesheets/search.css" media="screen,print" rel="stylesheet" type="text/css" />
<link href="/stylesheets/portal-style.css" media="screen,print" rel="stylesheet" type="text/css" />
<link href="/stylesheets/printable.css" media="print" rel="stylesheet" type="text/css" />
<!-- Confluence HTML stylesheet -->
<link href="/stylesheets/site-conf.css" media="screen,print" rel="stylesheet" type="text/css" />
<!-- Left-navigation code -->
<!-- http://www.designchemical.com/lab/jquery-vertical-accordion-menu-plugin/examples/# -->
<link href="/stylesheets/dcaccordion.css" rel="stylesheet" type="text/css" />
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js" type="text/javascript"></script>
<script src="/javascripts/jquery.cookie.js" type="text/javascript"></script>
<script src="/javascripts/jquery.hoverIntent.minified.js" type="text/javascript"></script>
<script src="/javascripts/jquery.dcjqaccordion.2.7.min.js" type="text/javascript"></script>
<script type="text/javascript">
$(document).ready(function($){
$('#accordion-1').dcAccordion({
eventType: 'click',
autoClose: true,
saveState: true,
disableLink: false,
speed: 'fast',
classActive: 'test',
showCount: false
});
});
</script>
<link href="/stylesheets/grey.css" rel="stylesheet" type="text/css" />
<!-- End left-navigation code -->
<script src="/javascripts/all.js" type="text/javascript"></script>
<link href='http://www.gopivotal.com/misc/favicon.ico' rel='shortcut icon'>
<script type="text/javascript">
if (window.location.host === 'docs.gopivotal.com') {
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-39702075-1']);
_gaq.push(['_setDomainName', 'gopivotal.com']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
}
</script>
</head>
<body class="pivotalcf pivotalcf_getstarted pivotalcf_getstarted_index">
<div class="viewport">
<div class="mobile-navigation--wrapper mobile-only">
<div class="navigation-drawer--container">
<div class="navigation-item-list">
<div class="navbar-link active">
<a href="http://gopivotal.com">
Home
<i class="icon-chevron-right pull-right"></i>
</a>
</div>
<div class="navbar-link">
<a href="http://gopivotal.com/paas">
PaaS
<i class="icon-chevron-right pull-right"></i>
</a>
</div>
<div class="navbar-link">
<a href="http://gopivotal.com/big-data">
Big Data
<i class="icon-chevron-right pull-right"></i>
</a>
</div>
<div class="navbar-link">
<a href="http://gopivotal.com/agile">
Agile
<i class="icon-chevron-right pull-right"></i>
</a>
</div>
<div class="navbar-link">
<a href="http://gopivotal.com/support">
Help & Support
<i class="icon-chevron-right pull-right"></i>
</a>
</div>
<div class="navbar-link">
<a href="http://gopivotal.com/products">
Products
<i class="icon-chevron-right pull-right"></i>
</a>
</div>
<div class="navbar-link">
<a href="http://gopivotal.com/solutions">
Solutions
<i class="icon-chevron-right pull-right"></i>
</a>
</div>
<div class="navbar-link">
<a href="http://gopivotal.com/partners">
Partners
<i class="icon-chevron-right pull-right"></i>
</a>
</div>
</div>
</div>
<div class="mobile-nav">
<div class="nav-icon js-open-nav-drawer">
<i class="icon-reorder"></i>
</div>
<div class="header-center-icon">
<a href="http://gopivotal.com">
<div class="icon icon-pivotal-logo-mobile"></div>
</a>
</div>
</div>
</div>
<div class='wrap'>
<script src="//use.typekit.net/clb0qji.js" type="text/javascript"></script>
<script type="text/javascript">
try {
Typekit.load();
} catch (e) {
}
</script>
<script type="text/javascript">
document.domain = "gopivotal.com";
</script>
<script type="text/javascript">
WebFontConfig = {
google: { families: [ 'Source+Sans+Pro:300italic,400italic,600italic,300,400,600:latin' ] }
};
(function() {
var wf = document.createElement('script');
wf.src = ('https:' == document.location.protocol ? 'https' : 'http') +
'://ajax.googleapis.com/ajax/libs/webfont/1/webfont.js';
wf.type = 'text/javascript';
wf.async = 'true';
var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(wf, s);
})(); </script>
<div id="search-dropdown-box">
<div class="search-dropdown--container js-search-dropdown">
<div class="container-fluid">
<div class="close-menu-large"><img src="http://www.gopivotal.com/sites/all/themes/gopo13/images/icon-close.png" /></div>
<div class="search-form--container">
<div class="form-search">
<div class='gcse-search'></div>
<script src="http://www.google.com/jsapi" type="text/javascript"></script>
<script src="/javascripts/cse.js" type="text/javascript"></script>
</div>
</div>
</div>
</div>
</div>
<header class="navbar desktop-only" id="nav">
<div class="navbar-inner">
<div class="container-fluid">
<div class="pivotal-logo--container">
<a class="pivotal-logo" href="http://gopivotal.com"><span></span></a>
</div>
<ul class="nav pull-right">
<li class="navbar-link">
<a href="http://www.gopivotal.com/paas" id="paas-nav-link">PaaS</a>
</li>
<li class="navbar-link">
<a href="http://www.gopivotal.com/big-data" id="big-data-nav-link">BIG DATA</a>
</li>
<li class="navbar-link">
<a href="http://www.gopivotal.com/agile" id="agile-nav-link">AGILE</a>
</li>
<li class="navbar-link">
<a href="http://www.gopivotal.com/oss" id="oss-nav-link">OSS</a>
</li>
<li class="nav-search">
<a class="js-search-input-open" id="click-to-search"><span></span></a>
</li>
</ul>
</div>
<a href="http://www.gopivotal.com/contact">
<img id="get-started" src="http://www.gopivotal.com/sites/all/themes/gopo13/images/get-started.png">
</a>
</div>
</header>
<div class="main-wrap">
<div class="container-fluid">
<!-- Google CSE Search Box -->
<div id='docs-search'>
<gcse:search></gcse:search>
</div>
<div id='all-docs-link'>
<a href="http://docs.gopivotal.com/">All Documentation</a>
</div>
<div class="container">
<div id="sub-nav" class="nav-container">
<!-- Collapsible left-navigation-->
<ul class="accordion" id="accordion-1">
<!-- REPLACE <li/> NODES-->
<li>
<a href="index.html">Home</a></li>
<li>
<a href="PivotalHD.html">Pivotal HD 2.0.1</a>
<ul>
<li>
<a href="PHDEnterprise2.0.1ReleaseNotes.html">PHD Enterprise 2.0.1 Release Notes</a>
</li>
</ul>
<ul>
<li>
<a href="PHDInstallationandAdministration.html">PHD Installation and Administration</a>
<ul>
<li>
<a href="OverviewofPHD.html">Overview of PHD</a>
</li>
</ul>
<ul>
<li>
<a href="InstallationOverview.html">Installation Overview</a>
</li>
</ul>
<ul>
<li>
<a href="PHDInstallationChecklist.html">PHD Installation Checklist</a>
</li>
</ul>
<ul>
<li>
<a href="InstallingPHDUsingtheCLI.html">Installing PHD Using the CLI</a>
</li>
</ul>
<ul>
<li>
<a href="UpgradeChecklist.html">Upgrade Checklist</a>
</li>
</ul>
<ul>
<li>
<a href="UpgradingPHDUsingtheCLI.html">Upgrading PHD Using the CLI</a>
</li>
</ul>
<ul>
<li>
<a href="AdministeringPHDUsingtheCLI.html">Administering PHD Using the CLI</a>
</li>
</ul>
<ul>
<li>
<a href="PHDFAQFrequentlyAskedQuestions.html">PHD FAQ (Frequently Asked Questions)</a>
</li>
</ul>
<ul>
<li>
<a href="PHDTroubleshooting.html">PHD Troubleshooting</a>
</li>
</ul>
</li>
</ul>
<ul>
<li>
<a href="StackandToolsReference.html">Stack and Tools Reference</a>
<ul>
<li>
<a href="OverviewofApacheStackandPivotalComponents.html">Overview of Apache Stack and Pivotal Components</a>
</li>
</ul>
<ul>
<li>
<a href="ManuallyInstallingPivotalHD2.0Stack.html">Manually Installing Pivotal HD 2.0 Stack</a>
</li>
</ul>
<ul>
<li>
<a href="ManuallyUpgradingPivotalHDStackfrom1.1.1to2.0.html">Manually Upgrading Pivotal HD Stack from 1.1.1 to 2.0</a>
</li>
</ul>
<ul>
<li>
<a href="PivotalHadoopEnhancements.html">Pivotal Hadoop Enhancements</a>
</li>
</ul>
<ul>
<li>
<a href="Security.html">Security</a>
</li>
</ul>
</li>
</ul>
</li>
<li>
<a href="PivotalCommandCenter.html">Pivotal Command Center 2.2.1</a>
<ul>
<li>
<a href="PCC2.2.1ReleaseNotes.html">PCC 2.2.1 Release Notes</a>
</li>
</ul>
<ul>
<li>
<a href="PCCUserGuide.html">PCC User Guide</a>
<ul>
<li>
<a href="PCCOverview.html">PCC Overview</a>
</li>
</ul>
<ul>
<li>
<a href="PCCInstallationChecklist.html">PCC Installation Checklist</a>
</li>
</ul>
<ul>
<li>
<a href="InstallingPCC.html">Installing PCC</a>
</li>
</ul>
<ul>
<li>
<a href="UsingPCC.html">Using PCC</a>
</li>
</ul>
<ul>
<li>
<a href="CreatingaYUMEPELRepository.html">Creating a YUM EPEL Repository</a>
</li>
</ul>
<ul>
<li>
<a href="CommandLineReference.html">Command Line Reference</a>
</li>
</ul>
</li>
</ul>
</li>
<li>
<a href="PivotalHAWQ.html">Pivotal HAWQ 1.2.0</a>
<ul>
<li>
<a href="HAWQ1.2.0.1ReleaseNotes.html">HAWQ 1.2.0.1 Release Notes</a>
</li>
</ul>
<ul>
<li>
<a href="HAWQInstallationandUpgrade.html">HAWQ Installation and Upgrade</a>
<ul>
<li>
<a href="PreparingtoInstallHAWQ.html">Preparing to Install HAWQ</a>
</li>
</ul>
<ul>
<li>
<a href="InstallingHAWQ.html">Installing HAWQ</a>
</li>
</ul>
<ul>
<li>
<a href="InstallingtheHAWQComponents.html">Installing the HAWQ Components</a>
</li>
</ul>
<ul>
<li>
<a href="UpgradingHAWQandComponents.html">Upgrading HAWQ and Components</a>
</li>
</ul>
<ul>
<li>
<a href="HAWQConfigurationParameterReference.html">HAWQ Configuration Parameter Reference</a>
</li>
</ul>
</li>
</ul>
<ul>
<li>
<a href="HAWQAdministration.html">HAWQ Administration</a>
<ul>
<li>
<a href="HAWQOverview.html">HAWQ Overview</a>
</li>
</ul>
<ul>
<li>
<a href="HAWQQueryProcessing.html">HAWQ Query Processing</a>
</li>
</ul>
<ul>
<li>
<a href="UsingHAWQtoQueryData.html">Using HAWQ to Query Data</a>
</li>
</ul>
<ul>
<li>
<a href="ConfiguringClientAuthentication.html">Configuring Client Authentication</a>
</li>
</ul>
<ul>
<li>
<a href="KerberosAuthentication.html">Kerberos Authentication</a>
</li>
</ul>
<ul>
<li>
<a href="ExpandingtheHAWQSystem.html">Expanding the HAWQ System</a>
</li>
</ul>
<ul>
<li>
<a href="HAWQInputFormatforMapReduce.html">HAWQ InputFormat for MapReduce</a>
</li>
</ul>
<ul>
<li>
<a href="HAWQFilespacesandHighAvailabilityEnabledHDFS.html">HAWQ Filespaces and High Availability Enabled HDFS</a>
</li>
</ul>
<ul>
<li>
<a href="SQLCommandReference.html">SQL Command Reference</a>
</li>
</ul>
<ul>
<li>
<a href="ManagementUtilityReference.html">Management Utility Reference</a>
</li>
</ul>
<ul>
<li>
<a href="ClientUtilityReference.html">Client Utility Reference</a>
</li>
</ul>
<ul>
<li>
<a href="HAWQServerConfigurationParameters.html">HAWQ Server Configuration Parameters</a>
</li>
</ul>
<ul>
<li>
<a href="HAWQEnvironmentVariables.html">HAWQ Environment Variables</a>
</li>
</ul>
<ul>
<li>
<a href="HAWQDataTypes.html">HAWQ Data Types</a>
</li>
</ul>
<ul>
<li>
<a href="SystemCatalogReference.html">System Catalog Reference</a>
</li>
</ul>
<ul>
<li>
<a href="hawq_toolkitReference.html">hawq_toolkit Reference</a>
</li>
</ul>
</li>
</ul>
<ul>
<li>
<a href="PivotalExtensionFrameworkPXF.html">Pivotal Extension Framework (PXF)</a>
<ul>
<li>
<a href="PXFInstallationandAdministration.html">PXF Installation and Administration</a>
</li>
</ul>
<ul>
<li>
<a href="PXFExternalTableandAPIReference.html">PXF External Table and API Reference</a>
</li>
</ul>
</div><!--end of sub-nav-->
<h3 class="title-container">Security</h3>
<div class="content">
<!-- Python script replaces main content -->
<div id ="main"><div style="visibility:hidden; height:2px;">Pivotal Product Documentation : Security</div><div class="wiki-content group" id="main-content">
<p>You must install and configure Kerberos to enable security in Pivotal HD 1.1.x. and higher.</p><p>Kerberos is a network authentication protocol that provides strong authentication for client/server applications using secret-key cryptography.</p><p><style type="text/css">/*<![CDATA[*/
div.rbtoc1400035786210 {padding: 0px;}
div.rbtoc1400035786210 ul {list-style: disc;margin-left: 0px;}
div.rbtoc1400035786210 li {margin-left: 0px;padding-left: 0px;}
/*]]>*/</style><div class="toc-macro rbtoc1400035786210">
<ul class="toc-indentation">
<li><a href="#Security-ConfiguringKerberosforHDFSandYARN(MapReduce)">Configuring Kerberos for HDFS and YARN (MapReduce)</a>
<ul class="toc-indentation">
<li><a href="#Security-KerberosSet-up">Kerberos Set-up</a></li>
<li><a href="#Security-JavaSupportItemsInstallation">Java Support Items Installation</a></li>
<li><a href="#Security-ContainerandScriptModifications">Container and Script Modifications</a></li>
<li><a href="#Security-SiteXMLChanges">Site XML Changes</a></li>
<li><a href="#Security-CompletetheHDFS/YARNSecureConfiguration">Complete the HDFS/YARN Secure Configuration</a></li>
<li><a href="#Security-TurningSecureModeOff">Turning Secure Mode Off</a></li>
<li><a href="#Security-BuildingandInstallingJSVC">Building and Installing JSVC</a></li>
<li><a href="#Security-InstallingtheMITKerberos5KDC">Installing the MIT Kerberos 5 KDC</a></li>
</ul>
</li>
<li><a href="#Security-ConfiguringKerberosforHDFSHighAvailability">Configuring Kerberos for HDFS High Availability</a></li>
<li><a href="#Security-ZookeeperSecureConfiguration">Zookeeper Secure Configuration</a>
<ul class="toc-indentation">
<li><a href="#Security-ZookeeperServers">Zookeeper Servers</a></li>
<li><a href="#Security-ZookeeperClients">Zookeeper Clients</a></li>
</ul>
</li>
<li><a href="#Security-HBaseSecureConfiguration">HBase Secure Configuration</a>
<ul class="toc-indentation">
<li><a href="#Security-HBaseMasterandRegionservers">HBase Master and Regionservers</a></li>
<li><a href="#Security-HBaseClients">HBase Clients</a></li>
<li><a href="#Security-HBasewithSecureZookeeperConfiguration">HBase with Secure Zookeeper Configuration</a></li>
<li><a href="#Security-AccessControlandPXFExternalTables">Access Control and PXF External Tables</a></li>
</ul>
</li>
<li><a href="#Security-HiveSecureConfiguration">Hive Secure Configuration</a>
<ul class="toc-indentation">
<li><a href="#Security-ChangingtoHiveServer2">Changing to Hive Server 2</a></li>
<li><a href="#Security-Hivewarehousepermissionsissues">Hive warehouse permissions issues</a></li>
<li><a href="#Security-ConnectingandusingsecureHivewithBeeline">Connecting and using secure Hive with Beeline</a></li>
</ul>
</li>
<li><a href="#Security-ConfigureHCatalog(WebHCat)onsecureHive">Configure HCatalog (WebHCat) on secure Hive</a>
<ul class="toc-indentation">
<li><a href="#Security-Prerequisites">Prerequisites</a></li>
<li><a href="#Security-CreatekeytabfilefortheWebHCatserver">Create keytab file for the WebHCat server</a></li>
<li><a href="#Security-DistributethekeytabfiletotheWebHCatserver">Distribute the keytab file to the WebHCat server</a></li>
<li><a href="#Security-ConfigureWebHCatandproxyusers">Configure WebHCat and proxy users</a></li>
<li><a href="#Security-VerifyWebHCatisworking">Verify WebHCat is working</a></li>
</ul>
</li>
<li><a href="#Security-HAWQonSecureHDFS">HAWQ on Secure HDFS</a>
<ul class="toc-indentation">
<li><a href="#Security-Requirements">Requirements</a></li>
<li><a href="#Security-Preparation">Preparation</a></li>
<li><a href="#Security-Configuration">Configuration</a></li>
<li><a href="#Security-Troubleshooting">Troubleshooting</a></li>
</ul>
</li>
<li><a href="#Security-Auditing">Auditing</a></li>
<li><a href="#Security-SecureWebAccess">Secure Web Access</a>
<ul class="toc-indentation">
<li><a href="#Security-Overview">Overview</a></li>
<li><a href="#Security-Prerequisites.1">Prerequisites</a></li>
<li><a href="#Security-ConfiguringSecureWebHDFS">Configuring Secure WebHDFS </a></li>
<li><a href="#Security-UsingWebHDFSinSecureMode">Using WebHDFS in Secure Mode</a></li>
</ul>
</li>
<li><a href="#Security-SecureHDFSwebaccessviaHttpFS">Secure HDFS web access via HttpFS</a>
<ul class="toc-indentation">
<li><a href="#Security-Prerequisites.2">Prerequisites</a></li>
<li><a href="#Security-AddprincipalforHttpFS">Add principal for HttpFS</a></li>
<li><a href="#Security-Createanddistributekeytab">Create and distribute keytab</a></li>
<li><a href="#Security-Setthekeytabfileownershipandpermissions">Set the keytab file ownership and permissions</a></li>
<li><a href="#Security-Configuration.1">Configuration</a></li>
<li><a href="#Security-RestartHttpFS">Restart HttpFS</a></li>
<li><a href="#Security-Verifyit'sworking">Verify it's working</a></li>
</ul>
</li>
<li><a href="#Security-FlumeSecurityConfiguration">Flume Security Configuration</a>
<ul class="toc-indentation">
<li><a href="#Security-Prerequisites.3">Prerequisites</a></li>
<li><a href="#Security-CreatetheFlumePrincipal">Create the Flume Principal</a></li>
<li><a href="#Security-CreatetheFlumeKeytabFiles">Create the Flume Keytab Files</a></li>
<li><a href="#Security-DistributetheFlumeKeytabFilestotheFlumeserverandchangetheownershipandpermission">Distribute the Flume Keytab Files to the Flume server and change the ownership and permission</a></li>
<li><a href="#Security-AsingleuserforallHDFSsinks">A single user for all HDFS sinks</a></li>
<li><a href="#Security-DifferentusersacrossmultipleHDFSsinks">Different users across multiple HDFS sinks</a></li>
</ul>
</li>
<li><a href="#Security-OozieSecurityConfiguration">Oozie Security Configuration</a>
<ul class="toc-indentation">
<li><a href="#Security-Prerequisites.4">Prerequisites</a></li>
<li><a href="#Security-CreatetheOoziePrincipal">Create the Oozie Principal</a></li>
<li><a href="#Security-CreatetheHTTPPrincipalfortheOozieServer">Create the HTTP Principal for the Oozie Server</a></li>
<li><a href="#Security-CreatetheOozieKeytabFiles">Create the Oozie Keytab Files</a></li>
<li><a href="#Security-CopytheOozieKeytabFilestotheOozieserverandchangetheownershipandpermission">Copy the Oozie Keytab Files to the Oozie server and change the ownership and permission</a></li>
<li><a href="#Security-EdittheOozieConfiguration">Edit the Oozie Configuration</a></li>
<li><a href="#Security-UsingOoziewithaSecureHiveMetastoreServer">Using Oozie with a Secure Hive Metastore Server</a></li>
<li><a href="#Security-VerifySecureOozie">Verify Secure Oozie</a></li>
</ul>
</li>
<li><a href="#Security-SqoopSecurityConfiguration">Sqoop Security Configuration</a></li>
<li><a href="#Security-PigSecurityConfiguration">Pig Security Configuration</a></li>
<li><a href="#Security-MahoutSecurityConfiguration">Mahout Security Configuration</a></li>
<li><a href="#Security-Troubleshooting.1">Troubleshooting</a>
</li>
</ul>
</div></p> <div class="aui-message warning shadowed information-macro">
<p class="title">Notes</p>
<span class="aui-icon icon-warning">Icon</span>
<div class="message-content">
<ul><li>For HAWQ to work with secure HDFS, the Pivotal ADS version must be 1.1.3 or greater.</li><li>For more information about HAWQ secure configuration, see the <em>Kerberos Authentication</em> section of the <em>Pivotal ADS Administrator Guide.</em></li><li><p>Note that Kerberos operation in Hadoop is very sensitive to proper networking configuration:</p><ul><li>Host IPs for service nodes must reverse-map to the FQDNs used to create the node principal for the service/FQDN.</li><li><code>hostname -f</code> on a node must return the FQDN used to create the principal for the service/FQDN.</li><li>The cluster needs to have been created with FQDNs, not short names.</li></ul><p>Make sure your networking is properly configured before attempting to secure a cluster.</p></li></ul>
</div>
</div>
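<p>The networking requirements listed in the note above can be spot-checked before you begin. The commands below are an illustrative sketch only; <code>vm2.example.com</code> is a placeholder, not a hostname from this guide:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;"># On each cluster host, confirm that hostname -f returns an FQDN (with at least one ".")
hostname -f

# Confirm that forward and reverse DNS agree for the host
ip=$(getent hosts vm2.example.com | awk '{print $1}')
getent hosts "$ip"   # should map back to vm2.example.com</pre>
</div></div><p>If the reverse lookup does not return the same FQDN used to create the service principals, Kerberized services on that host will fail to authenticate.</p>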
<p> </p><p><span class="confluence-anchor-link" id="Security-HDFSYARN"></span></p><h2 id="Security-ConfiguringKerberosforHDFSandYARN(MapReduce)">Configuring Kerberos for HDFS and YARN (MapReduce)</h2><p>At a minimum, Kerberos provides protection against user and service spoofing attacks, and allows for enforcement of user HDFS access permissions. The installation is not difficult, but involves many very specific steps, and suffers from the same difficulties as any system requiring distributed configuration. Pivotal is working to automate the process to make it simple for users to enable/disable secure PHD clusters. Until then, these instructions are intended to provide a step-by-step process for getting a cluster up and running in secure mode.</p><p>Note that after the initial HDFS/YARN configuration, other services that need to run on secure HDFS (for example, HBase), or that you also want to secure (for example, Zookeeper), must be configured as well.</p><p><strong>Important</strong>: Save your command history; it will help in checking for errors when troubleshooting.</p><h3 id="Security-KerberosSet-up">Kerberos Set-up</h3><h4 id="Security-InstalltheKDC">Install the KDC</h4><p>If you do not have a pre-existing KDC, see <a href="#Security-InstallingtheMITKerberos5KDC">Installing the MIT Kerberos 5 KDC</a>. </p> <div class="aui-message warning shadowed information-macro">
<span class="aui-icon icon-warning">Icon</span>
<div class="message-content">
<p>CentOS and RedHat use AES-256 as the default encryption strength. If you want to use AES-256, you will need to install the JCE security policy file (described below) on all cluster hosts. If not, disable this encryption type in the KDC configuration. To disable AES-256 on an MIT kerberos 5 KDC, remove <code>aes256-cts:normal </code>from the <code>supported_enctypes</code> parameter in <code>kdc.conf</code>.</p>
</div>
</div>
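<p>As a sketch, disabling AES-256 on an MIT Kerberos 5 KDC amounts to removing <code>aes256-cts:normal</code> from the <code>supported_enctypes</code> line in <code>kdc.conf</code>. The realm name and the remaining encryption types below are placeholders, not values from this guide:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">[realms]
    EXAMPLE.COM = {
        # aes256-cts:normal removed to disable AES-256
        supported_enctypes = aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal
    }</pre>
</div></div>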
<h4 id="Security-IntegratingClusterSecuritywithanOrganizationalKDC">Integrating Cluster Security with an Organizational KDC</h4><p>If your organization runs Active Directory or another Kerberos KDC, it is not recommended that you use it for cluster security. Instead, install an MIT Kerberos KDC and realm for the cluster(s) and create all the service principals in this realm as per the instructions below. This KDC will be used minimally for service principals, while Active Directory (or your organization's MIT KDC) will be used for cluster users. Next, configure one-way cross-realm trust from this realm to the Active Directory or corporate KDC realm.</p><p><strong>Important</strong>: This configuration is strongly recommended, as a large PHD cluster otherwise requires the IT manager to create large numbers of service principals in your organization's Active Directory or organizational MIT KDC. For example, a 100-node PHD cluster requires 200+ service principals. In addition, when a large cluster starts up, it may impact the performance of your organization's IT systems, as all the service principals make requests of the AD or MIT Kerberos KDC at once.</p><h4 id="Security-InstallKerberosWorkstationandLibrariesonClusterHosts">Install Kerberos Workstation and Libraries on Cluster Hosts</h4><p>If you are using MIT krb5, run:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;"># yum install krb5-libs krb5-workstation</pre>
</div></div><h4 id="Security-DistributetheKerberosClientConfigurationFiletoallClusterHosts">Distribute the Kerberos Client Configuration File to all Cluster Hosts</h4><p>If you are using MIT Kerberos 5, the file is <code>/etc/krb5.conf</code>. This file must exist on all cluster hosts. For PHD you can use <code>massh</code> to push the files and then copy them to the proper place. </p><p><span class="confluence-anchor-link" id="Security-CreatePrincipal"></span></p><h4 id="Security-CreatethePrincipals">Create the Principals</h4><p>These instructions are for MIT Kerberos 5; command syntax for other Kerberos versions may be different.</p><p>Principals (Kerberos users) are of the form <code>name/role@REALM</code>. For our purposes the name will be a PHD service name (for example, <code>hdfs</code>), and the role will be a DNS-resolvable fully qualified hostname (<code>host_fqdn</code>); one you could use to connect to the host in question.</p><p><strong>Important</strong>:</p><ul><li>Replace <code>REALM</code> with the KDC realm you are using for your PHD cluster, wherever it appears.</li><li>The host names used MUST be resolvable to an address on all the cluster hosts and MUST be of the form <code>host.domain</code>, as some Hadoop components require at least one "." part in the host names used for principals.</li><li>The names of the principals matter, as some processes may throw exceptions if you change them. Hence, it is safest to use the specified Hadoop principal names.</li><li>Hadoop supports a <code>_HOST</code> tag in the site XML that is interpreted as the <code>host_fqdn</code>, but this must be used properly. 
See <a href="#Security-Using_HOSTinSiteXML">Using _HOST in Site XML</a>.</li></ul><p>For the HDFS services, you will need to create an <code>hdfs/host_fqdn</code> principal for each host running an HDFS service (name node, secondary name node, data node).</p><p>For YARN services, you will need to create a <code>yarn/host_fqdn</code> principal for each host running a YARN service (resource manager, node manager, proxy server).</p><p>For MapReduce services, you need to create a principal,<code> mapred/host_fqdn</code> for the Job History Server.</p><p>To create the required secure HD principals (running kadmin.local):</p><ul><li>For each cluster host (excepting client-only hosts) run: </li></ul><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">addprinc -randkey HTTP/<host_fqdn>@<REALM></pre>
</div></div><ul><li>HDFS (name node, secondary name node, data nodes), for each HDFS service host run:</li></ul><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;"> addprinc -randkey hdfs/<host_fqdn>@<REALM></pre>
</div></div><ul><li>YARN (resource manager, node managers, proxy server), for each YARN service host run:</li></ul><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">addprinc -randkey yarn/<host_fqdn>@<REALM></pre>
</div></div><ul><li>MAPRED (job history server): for each JHS service host run:</li></ul><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">addprinc -randkey mapred/<host_fqdn>@<REALM></pre>
</div></div> <div class="aui-message warning shadowed information-macro">
<p class="title">Important</p>
<span class="aui-icon icon-warning">Icon</span>
<div class="message-content">
<p> If you have 1000 cluster hosts running HDFS and YARN, you will need 2000 HDFS and YARN principals, and need to distribute their keytab files. It is recommended that you use a cluster-local KDC for this purpose and configure cross-realm trust to your organizational Active Directory or other Kerberos KDC.</p>
</div>
</div>
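<p>For a large cluster, it can help to generate the <code>addprinc</code> commands from a host list rather than typing them by hand. The script below is a sketch under assumptions: <code>hosts.txt</code> (one FQDN per line), the realm value, and the output file name are examples, not part of this guide:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">REALM=EXAMPLE.COM
while read fqdn; do
  echo "addprinc -randkey HTTP/${fqdn}@${REALM}"
  echo "addprinc -randkey hdfs/${fqdn}@${REALM}"
  echo "addprinc -randkey yarn/${fqdn}@${REALM}"
done < hosts.txt > addprinc.kadmin

# Review addprinc.kadmin, then run it on the KDC:
# kadmin.local < addprinc.kadmin</pre>
</div></div>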
<h4 id="Security-CreatetheKeytabFiles">Create the Keytab Files</h4><p><strong>Important</strong>: You MUST use <code>kadmin.local</code> (or the equivalent in your KDC) for this step on the KDC, as <code>kadmin</code> does not support <code>-norandkey</code>.</p><p><strong>Important</strong>: You can put the keytab files anywhere during this step. In this document, we created a directory <code>/etc/security/phd/keytab/</code> and are using this directory on cluster hosts, and so, for consistency, are placing them in a similarly named directory on the KDC. If the node you are on already has files in <code>/etc/security/phd/keytab/</code>, it may be advisable to create a separate, empty directory for this step.</p><p>Each service's keytab file for a given host will contain the service principal for that host and the HTTP principal for that host.</p><p><strong>HDFS keytabs</strong></p><p>For each host having an HDFS process (name node, secondary name node, data nodes), run:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">kadmin.local: ktadd -norandkey -k /etc/security/phd/keytab/hdfs-hostid.service.keytab hdfs/<host_fqdn>@<REALM> HTTP/<host_fqdn>@<REALM></pre>
</div></div><p>where <code>hostid</code> is the short name for the host, for example, <code>vm1</code>, <code>vm2</code>, etc. This is to differentiate the files by host. You can use the hostname if desired.</p><p>For example, for a three-node cluster (one name node, two data nodes):</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">kadmin.local: ktadd -norandkey -k /etc/security/phd/keytab/hdfs-vm2.service.keytab hdfs/vm2.example.com@REALM HTTP/vm2.example.com@REALM
kadmin.local: ktadd -norandkey -k /etc/security/phd/keytab/hdfs-vm3.service.keytab hdfs/vm3.example.com@REALM HTTP/vm3.example.com@REALM
kadmin.local: ktadd -norandkey -k /etc/security/phd/keytab/hdfs-vm4.service.keytab hdfs/vm4.example.com@REALM HTTP/vm4.example.com@REALM
</pre>
</div></div><p> </p><p><strong>YARN keytabs</strong></p><p>For each host having a YARN process (resource manager, node manager or proxy server), run:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">kadmin.local: ktadd -norandkey -k /etc/security/phd/keytab/yarn-hostid.service.keytab yarn/<host_fqdn>@<REALM> HTTP/<host_fqdn>@<REALM></pre>
</div></div><p>For example, for a three-node cluster (one resource manager, two node managers):</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">kadmin.local: ktadd -norandkey -k /etc/security/phd/keytab/yarn-vm2.service.keytab yarn/vm2.example.com@REALM HTTP/vm2.example.com@REALM
kadmin.local: ktadd -norandkey -k /etc/security/phd/keytab/yarn-vm3.service.keytab yarn/vm3.example.com@REALM HTTP/vm3.example.com@REALM
kadmin.local: ktadd -norandkey -k /etc/security/phd/keytab/yarn-vm4.service.keytab yarn/vm4.example.com@REALM HTTP/vm4.example.com@REALM
</pre>
</div></div><p> </p><p><strong>MAPRED keytabs</strong></p><p>For each host having a MapReduce job history server, run:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">kadmin.local: ktadd -norandkey -k /etc/security/phd/keytab/mapred-hostid.service.keytab mapred/<host_fqdn>@<REALM> HTTP/<host_fqdn>@<REALM></pre>
</div></div><p>For example:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">kadmin.local: ktadd -norandkey -k /etc/security/phd/keytab/mapred-vm2.service.keytab mapred/vm2.example.com@REALM HTTP/vm2.example.com@REALM
</pre>
</div></div><h4 id="Security-DistributetheKeytabFiles">Distribute the Keytab Files</h4><ol><li>On each cluster node, create the directory for the keytab files; here, we are using <code>/etc/security/phd/keytab</code>.</li><li>Move all the keytab files for a given host to the keytab directory on that host. For example: <code>hdfs-vm2.service.keytab</code>, <code>yarn-vm2.service.keytab</code> and <code>mapred-vm2.service.keytab</code> go to host vm2.</li><li>On each host:</li></ol><ol><li style="list-style-type: none;background-image: none;"><ol><li>Change the permissions on all keytabs so they are readable by owner only: <br/> <code>chmod 400 *.keytab</code></li><li>Change the group on all keytab files to hadoop: <br/> <code>chgrp hadoop *</code></li><li>Change the owner of each keytab to the relevant principal name. <br/>For example, for <code>yarn-vm2.service.keytab</code> run: <code> <br/>chown yarn yarn-vm2.service.keytab</code></li><li>Create links to the files of the form <code> <em>principalname.service.keytab</em> </code>. <br/>For example, for <code>yarn-vm2.service.keytab</code> run:<br/> <code> ln -s yarn-vm2.service.keytab yarn.service.keytab</code></li></ol></li></ol> <div class="aui-message warning shadowed information-macro">
<p class="title">Important</p>
<span class="aui-icon icon-warning">Icon</span>
<div class="message-content">
<p>The last step above allows you to maintain clear identification of each keytab file while still allowing you to keep common site XML files across cluster hosts.</p>
</div>
</div>
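The per-host permission, ownership, and symlink steps above can be scripted. This sketch only prints the commands for review rather than running them; the host id and service list are examples, so adjust them for the services actually present on each host:

```shell
# Hedged sketch: emit the permission/ownership/symlink commands for one
# host's keytab files. Review the output, then pipe it to "sh" as root in
# /etc/security/phd/keytab on that host. "vm2" and the service list are
# examples only.
keytab_cmds() {
  hostid="$1"
  for svc in hdfs yarn mapred; do
    kt="${svc}-${hostid}.service.keytab"
    echo "chmod 400 $kt"                   # readable by owner only
    echo "chgrp hadoop $kt"                # common hadoop group
    echo "chown $svc $kt"                  # owner matches the principal name
    echo "ln -s $kt ${svc}.service.keytab" # host-independent name for site XML
  done
}

keytab_cmds vm2
```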
<p>This is an example keytab directory for a cluster control node (namenode, resource manager, JHS):</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">lrwxrwxrwx 1 root root 23 Jun 10 23:50 hdfs.service.keytab -> hdfs-vm2.service.keytab
-rw------- 1 hdfs hadoop 954 Jun 10 23:44 hdfs-vm2.service.keytab
lrwxrwxrwx 1 root root 25 Jun 10 23:51 mapred.service.keytab -> mapred-vm2.service.keytab
-rw------- 1 mapred hadoop 966 Jun 10 23:44 mapred-vm2.service.keytab
lrwxrwxrwx 1 root root 23 Jun 10 23:51 yarn.service.keytab -> yarn-vm2.service.keytab
-rw------- 1 yarn hadoop 954 Jun 10 23:44 yarn-vm2.service.keytab
</pre>
</div></div><p> </p><p>This is an example keytab directory for a cluster node (datanode, node manager, proxy server):</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">lrwxrwxrwx 1 root root 23 Jun 11 01:58 hdfs.service.keytab -> hdfs-vm3.service.keytab
-rw------- 1 hdfs hadoop 954 Jun 10 23:45 hdfs-vm3.service.keytab
lrwxrwxrwx 1 root root 23 Jun 11 01:58 yarn.service.keytab -> yarn-vm3.service.keytab
-rw------- 1 yarn hadoop 954 Jun 10 23:45 yarn-vm3.service.keytab
</pre>
</div></div><h3 id="Security-JavaSupportItemsInstallation">Java Support Items Installation</h3><h4 id="Security-InstallJCEonallClusterHosts">Install JCE on all Cluster Hosts</h4><p><strong>Important</strong>: This step is only required if you are using AES-256.</p><p class="emoticon emoticon-warning" title="(warning)"><strong>Note</strong>: These files will already exist in your environment and look the same, but they are the <em>limited strength</em> encryption files; you must replace them with the unlimited strength files to use AES-256.</p><ol><li>Download and unzip the JCE file for your JDK version (Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files 7 for JDK 7).</li><li>Place the <code>local_policy.jar</code> and <code>US_export_policy.jar</code> files in the <code>/usr/java/default/jre/lib/security/</code> directory on all cluster hosts.</li></ol><h4 id="Security-CheckJSVConallDatanodes">Check JSVC on all Datanodes</h4><p>JSVC allows a Java process to start as root and then switch to a less privileged user, and is required for the datanode process to start in secure mode. Your distribution comes with a pre-built JSVC; you need to verify that it can find a JVM, as follows:</p><ol><li>Run:<br/> <code>/usr/libexec/bigtop-utils/jsvc -help<br/> <br/> </code></li><li>Look under the printed <code>-jvm</code> item in the output and you should see something like: <br/><p><code>use a specific Java Virtual Machine. Available JVMs:</code> <br/> <code>'server'</code> <br/>If you do not see the <code>server</code> line, this jsvc will not work for your platform, so try the following actions:</p><p>a. Install JSVC using yum and run the check again; if it fails, try the next step.</p><p>b. Build from source and install manually (see <a href="#Security-BuildingandInstallingJSVC">Building and Installing JSVC</a>).</p></li></ol><p>If you have datanode start-up problems and no other errors are obvious, it might be a JSVC problem and you may need to repeat step 2 above. JSVC is very picky about platform and JDK matching, so use the <a href="#Security-BuildingandInstallingJSVC">Building and Installing JSVC</a> instructions for your system OS and JDK.</p><h3 id="Security-ContainerandScriptModifications">Container and Script Modifications</h3><h4 id="Security-ConfiguretheLinuxContainer">Configure the Linux Container</h4><ol><li><p>Edit <code>/usr/lib/gphd/hadoop-yarn/etc/hadoop/container-executor.cfg</code> as follows:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;"># NOTE: these next two should be set to the same values they have in yarn-site.xml
yarn.nodemanager.local-dirs=/data/1/yarn/nm-local-dir
yarn.nodemanager.log-dirs=/data/1/yarn/userlogs
# configured value of yarn.nodemanager.linux-container-executor.group
yarn.nodemanager.linux-container-executor.group=yarn
# comma-separated list of users who cannot run applications
banned.users=hdfs,yarn,mapred,bin
# Prevent other super-users
min.user.id=500
</pre>
</div></div><p><strong>Note</strong>: The <code>min.user.id</code> varies by Linux distribution; for CentOS it is 500, for Red Hat it is 1000.</p></li><li><p>Check the permissions on <code>/usr/lib/gphd/hadoop-yarn/bin/container-executor</code>. They should look like:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">---Sr-s--- 1 root yarn 364 Jun 11 00:08 container-executor
</pre>
</div></div><p>If they do not, then set the owner, group and permissions as:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">chown root:yarn container-executor
chmod 050 container-executor
chmod u+s container-executor
chmod g+s container-executor
</pre>
</div></div></li></ol><p> </p><p>Check the permissions on <code>/usr/lib/gphd/hadoop-yarn/etc/hadoop/container-executor.cfg</code>. They should look like:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">-rw-r--r-- 1 root root 363 Jul 4 00:29 /usr/lib/gphd/hadoop-yarn/etc/hadoop/container-executor.cfg
</pre>
</div></div><p>If they do not, then set them as follows:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">chown root:root container-executor.cfg
chmod 644 container-executor.cfg
</pre>
</div></div><h4 id="Security-EdittheEnvironmentontheDatanodes">Edit the Environment on the Datanodes</h4><p> </p><p><strong>Important</strong>:</p><ul><li class="emoticon emoticon-warning" title="(warning)">At this point you should STOP the cluster, if it is running.</li><li class="emoticon emoticon-warning" title="(warning)">You only need to perform the steps below on the data nodes.</li></ul><ol><li><p>Uncomment the lines at the bottom of <code>/etc/default/hadoop-hdfs-datanode</code>:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;"># secure operation stuff
export HADOOP_SECURE_DN_USER=hdfs
export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/hdfs
export HADOOP_PID_DIR=/var/run/gphd/hadoop-hdfs/
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}
</pre>
</div></div></li><li><p>Set the JSVC variable:<br/>If you are using the included <code>jsvc</code>, the <code>JSVC_HOME</code> variable in <code>/etc/default/hadoop</code> should already be properly set:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">export JSVC_HOME=/usr/libexec/bigtop-utils
</pre>
</div></div><p>If, however, you built or hand-installed JSVC, your <code>JSVC_HOME</code> will be <code>/usr/bin</code>, so you must set it appropriately. Modify <code>/etc/default/hadoop</code> and set the proper <code>JSVC_HOME</code>:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">export JSVC_HOME=/usr/bin
</pre>
</div></div><p><strong>Important</strong>: Make sure<code> JSVC_HOME</code> points to the correct<code> jsvc</code> binary.</p></li></ol> <div class="aui-message problem shadowed information-macro">
<span class="aui-icon icon-problem">Icon</span>
<div class="message-content">
<p>As long as <code>HADOOP_SECURE_DN_USER</code> is set, the datanode will try to start in secure mode.</p>
</div>
</div>
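The JSVC checks above can be wrapped in a small sanity-check script. This is a sketch only; the default path is an assumption based on this document's bigtop-utils layout, and the script simply reports whether the <code>jsvc</code> under <code>JSVC_HOME</code> lists a server JVM in its <code>-help</code> output:

```shell
# Hedged sketch: report whether the jsvc under a given JSVC_HOME exists and
# lists a 'server' JVM in its -help output. The default path below is an
# assumption matching this document's bigtop-utils layout.
check_jsvc() {
  dir="${1:-/usr/libexec/bigtop-utils}"
  if [ ! -x "$dir/jsvc" ]; then
    echo "no jsvc at $dir"
  elif "$dir/jsvc" -help 2>&1 | grep -q "'server'"; then
    echo "jsvc OK at $dir"
  else
    echo "jsvc at $dir cannot find a server JVM"
  fi
}

check_jsvc
```

Run it on each datanode; anything other than "jsvc OK" means you should install or build jsvc as described above.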
<p><span class="confluence-anchor-link" id="Security-SiteXML"></span></p><h3 id="Security-SiteXMLChanges">Site XML Changes</h3><h4 id="Security-Using_HOSTinSiteXML">Using _HOST in Site XML</h4><p>You can maintain consistent site XML by using the <code>_HOST</code> keyword for the <code>host_fqdn</code> part in the site XML if:</p><ul><li>Your cluster nodes were identified with fully qualified domain names when configuring the cluster.</li><li><code>hostname -f</code> on all nodes yields the proper fully qualified hostname (the same one used when creating the principals).</li></ul><p>You cannot use constructs like <code>_HOST.domain</code>; these will be interpreted literally.</p><p>You can only use <code>_HOST</code> in the site XML; files such as <code>jaas.conf</code>, needed for Zookeeper and HBase, must use actual FQDNs for hosts.</p><p><span class="confluence-anchor-link" id="Security-EditSite"></span></p><h4 id="Security-EdittheSiteXML">Edit the Site XML</h4><p>Finally, we are ready to edit the site XML to turn on secure mode. Before getting into this, it is good to understand who needs to talk to whom. By "talk" we mean using authenticated Kerberos to initiate establishment of a communication channel. Doing this requires that you know your own principal, to identify yourself, and know the principal of the service you want to talk to. 
To be able to use its principal, a service needs to be able to log in to Kerberos without a password, using a keytab file.</p><ul><li>Each service needs to know its own principal name.</li><li>Each running service on a node needs a service/host-specific keytab file to start up.</li><li>Each data node needs to talk to the name node.</li><li>Each node manager needs to talk to the resource manager and the job history server.</li><li>Each client/gateway node needs to talk to the name node, resource manager and job history server.</li></ul><p><strong>Important</strong>:</p><ul><li>Redundant keytab files on some hosts do no harm, and keeping the files consistent makes management easier. Remember, though, that the host_fqdn MUST be correct for each entry. Remembering this helps when setting up and troubleshooting the site XML files.</li><li>Before making changes, back up the current site XML files so that you can return to non-secure operation, if needed.</li></ul><p>Most of the changes can be consistent throughout the cluster site XML. Unfortunately, since data node and node manager principals are host-name-dependent (or, more correctly, the role for the YARN principal is set to the <code>host_fqdn</code>), the <code>yarn-site.xml</code> for data node and node manager principals will differ across the cluster.</p><ol><li><p>Edit <code>/usr/lib/gphd/hadoop/etc/hadoop/core-site.xml</code> as follows:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;"><property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
</property>
<property>
<name>hadoop.security.authorization</name>
<value>true</value>
</property>
<!-- THE PROPERTY BELOW IS OPTIONAL: IT ENABLES ON WIRE RPC ENCRYPTION -->
<property>
<name>hadoop.rpc.protection</name>
<value>privacy</value>
</property></pre>
</div></div></li><li><p>Edit <code>/usr/lib/gphd/hadoop/etc/hadoop/hdfs-site.xml</code> as follows:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;"><!-- WARNING: do not create duplicate entries: check for existing entries and modify if they exist! -->
<property>
<name>dfs.block.access.token.enable</name>
<value>true</value>
</property>
<!-- short circuit reads do not work when security is enabled for PHD VERSION LOWER THAN 2.0 so disable ONLY for them -->
<!-- For PHD greater than or equal to 2.0, set this to true -->
<property>
<name>dfs.client.read.shortcircuit</name>
<value>false</value>
</property>
<!-- name node secure configuration info -->
<property>
<name>dfs.namenode.keytab.file</name>
<value>/etc/security/phd/keytab/hdfs.service.keytab</value>
</property>
<property>
<name>dfs.namenode.kerberos.principal</name>
<value>hdfs/_HOST@REALM</value>
</property>
<property>
<name>dfs.namenode.kerberos.http.principal</name>
<value>HTTP/_HOST@REALM</value>
</property>
<property>
<name>dfs.namenode.kerberos.internal.spnego.principal</name>
<value>HTTP/_HOST@REALM</value>
</property>
<!-- (optional) secondary name node secure configuration info -->
<property>
<name>dfs.secondary.namenode.keytab.file</name>
<value>/etc/security/phd/keytab/hdfs.service.keytab</value>
</property>
<property>
<name>dfs.secondary.namenode.kerberos.principal</name>
<value>hdfs/_HOST@REALM</value>
</property>
<property>
<name>dfs.secondary.namenode.kerberos.http.principal</name>
<value>HTTP/_HOST@REALM</value>
</property>
<property>
<name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>
<value>HTTP/_HOST@REALM</value>
</property>
<!-- data node secure configuration info -->
<property>
<name>dfs.datanode.data.dir.perm</name>
<value>700</value>
</property>
<!-- these ports must be set < 1024 for secure operation -->
<!-- conversely they must be set back to > 1024 for non-secure operation -->
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:1004</value>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:1006</value>
</property>
<!-- remember the principal for the datanode is the principal this hdfs-site.xml file is on -->
<!-- these (next three) need only be set on data nodes -->
<property>
<name>dfs.datanode.kerberos.principal</name>
<value>hdfs/_HOST@REALM</value>
</property>
<property>
<name>dfs.datanode.kerberos.http.principal</name>
<value>HTTP/_HOST@REALM</value>
</property>
<property>
<name>dfs.datanode.keytab.file</name>
<value>/etc/security/phd/keytab/hdfs.service.keytab</value>
</property>
<!-- OPTIONAL - set these to enable secure WebHDFS -->
<!-- on all HDFS cluster nodes (namenode, secondary namenode, datanodes) -->
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.web.authentication.kerberos.principal</name>
<value>HTTP/_HOST@REALM</value>
</property>
<!-- since we included the HTTP principal in all keytabs, we can use it here -->
<property>
<name>dfs.web.authentication.kerberos.keytab</name>
<value>/etc/security/phd/keytab/hdfs.service.keytab</value>
</property>
<!-- THE PROPERTIES BELOW ARE OPTIONAL AND REQUIRE RPC PRIVACY (core-site): THEY ENABLE ON WIRE HDFS BLOCK ENCRYPTION -->
<property>
<name>dfs.encrypt.data.transfer</name>
<value>true</value>
</property>
<property>
<name>dfs.encrypt.data.transfer.algorithm</name>
<value>rc4</value>
<description>may be "rc4" or "3des" - 3des has a significant performance impact</description>
</property>
</pre>
</div></div></li><li><p>Edit <code>/usr/lib/gphd/hadoop/etc/hadoop/yarn-site.xml</code> as follows:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;"><!-- resource manager secure configuration info -->
<property>
<name>yarn.resourcemanager.principal</name>
<value>yarn/_HOST@REALM</value>
</property>
<property>
<name>yarn.resourcemanager.keytab</name>
<value>/etc/security/phd/keytab/yarn.service.keytab</value>
</property>
<!-- remember the principal for the node manager is the principal for the host this yarn-site.xml file is on -->
<!-- these (next four) need only be set on node manager nodes -->
<property>
<name>yarn.nodemanager.principal</name>
<value>yarn/_HOST@REALM</value>
</property>
<property>
<name>yarn.nodemanager.keytab</name>
<value>/etc/security/phd/keytab/yarn.service.keytab</value>
</property>
<property>
<name>yarn.nodemanager.container-executor.class</name>
<value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
<name>yarn.nodemanager.linux-container-executor.group</name>
<value>yarn</value>
</property>
<!-- OPTIONAL - set these to enable secure proxy server node -->
<property>
<name>yarn.web-proxy.keytab</name>
<value>/etc/security/phd/keytab/yarn.service.keytab</value>
</property>
<property>
<name>yarn.web-proxy.principal</name>
<value>yarn/_HOST@REALM</value>
</property></pre>
</div></div></li><li><p>Edit <code>/usr/lib/gphd/hadoop/etc/hadoop/mapred-site.xml</code> as follows:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;"><!-- job history server secure configuration info -->
<property>
<name>mapreduce.jobhistory.keytab</name>
<value>/etc/security/phd/keytab/mapred.service.keytab</value>
</property>
<property>
<name>mapreduce.jobhistory.principal</name>
<value>mapred/_HOST@REALM</value>
</property>
</pre>
</div></div></li></ol><h3 id="Security-CompletetheHDFS/YARNSecureConfiguration">Complete the HDFS/YARN Secure Configuration</h3><ol><li><p>Start the cluster:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">$ icm_client start</pre>
</div></div></li><li>Check that all the processes listed below start up. If not, go to the appendix on troubleshooting.<br/><ul><li>Control processes: namenode, resourcemanager, historyserver should all be running.</li><li>Cluster worker processes: datanode and nodemanager should be running.<br/> <strong>Note</strong>: Until you do HBase security configuration, HBase will not start up on a secure cluster.</li></ul></li><li><p>Create a principal for a standard user (the user must exist as a Linux user on all cluster hosts):</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">kadmin: addprinc testuser</pre>
</div></div><p>Set the password when prompted.</p></li><li><p>Log in as that user on a client box (or any cluster box, if you do not have specific client-purposed systems).</p></li><li><p>Get your Kerberos TGT by running <code>kinit</code> and entering the password:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">kinit testuser</pre>
</div></div></li><li><p>Test simple HDFS file list and directory create:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">hadoop fs -ls
hadoop fs -mkdir testdir</pre>
</div></div><p>If these do not work, go to the <a href="#Security-Troubleshooting.1">Troubleshooting</a> section.</p></li><li>[Optional] Set the sticky bit on the <code>/tmp</code> directory (prevents non-super-users from moving or deleting other users' files in <code>/tmp</code>):<ol><li>Log in as <code>gpadmin</code> on any HDFS service node (namenode, datanode).</li><li><p>Execute the following:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">sudo -u hdfs kinit -k -t /etc/security/phd/keytab/hdfs.service.keytab hdfs/this-host_fqdn@REALM</pre>
</div></div></li><li><p>Execute the following:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">sudo -u hdfs hadoop fs -chmod 1777 /tmp</pre>
</div></div></li><li><p>Run a simple MapReduce job such as the Pi example:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">hadoop jar /usr/lib/gphd/hadoop-mapreduce/hadoop-mapreduce-examples-2.0.2-alpha-gphd-2.0.1.0.jar pi 10 100</pre>
</div></div></li></ol></li></ol><p>If this all works, then you are ready to configure other services. If not, see the <a href="#Security-Troubleshooting.1">Troubleshooting</a> section.</p><h3 id="Security-TurningSecureModeOff">Turning Secure Mode Off</h3><p>To turn off secure mode:</p><ol><li><p>Stop the cluster:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">icm_client stop</pre>
</div></div></li><li>Comment out <code>HADOOP_SECURE_DN_USER</code> in <code>hadoop-env.sh</code> and <code>/etc/init.d/hadoop-hdfs-datanode</code> on all data nodes.</li><li>Either:<ol><li>If you made backups as suggested above:<br/>Restore the original site XML files<br/>or:</li><li>If you do not have backups, then edit the site XML as follows:</li></ol></li></ol><ul><li style="list-style-type: none;background-image: none;"><ul><li style="list-style-type: none;background-image: none;"><ul><li>Set the Linux container executable to <code>org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor</code> on all data nodes.</li><li>Set <code>dfs.block.access.token.enable</code> to <code>false</code> on all data nodes.</li><li>Return the datanode ports modified above so they are > 1024 again.</li><li>Set <code>hadoop.security.authentication</code> to <code>simple</code> and <code>hadoop.security.authorization</code> to <code>false</code> in <code>core-site.xml</code> on all cluster nodes.</li><li>Undo the changes to the Zookeeper site XML and configuration files.</li><li>If applicable, revert the changes to the <code>hdfs-client.xml</code> and <code>gpinitsystem_config</code> for HAWQ.</li><li>If applicable, undo the changes to the Hive and HBase site XML, configuration, and environments.</li><li>Start the cluster.</li></ul></li></ul></li></ul><p><span class="confluence-anchor-link" id="Security-BuildJSVC"></span></p><h3 id="Security-BuildingandInstallingJSVC">Building and Installing JSVC</h3><p>In order for the data nodes to start as root (to bind the secure ports) and then switch back to the hdfs user, jsvc must be installed (<a class="external-link" href="http://commons.apache.org/proper/commons-daemon/download_daemon.cgi" rel="nofollow">http://commons.apache.org/proper/commons-daemon/download_daemon.cgi</a>). 
If the packaged jsvc binary is not working, we recommend building jsvc from source for your platform.</p><p>You only need to perform the make on one node; the binary can then be distributed to the others (assuming all systems are the same basic image):</p><ol><li><p>Install gcc and make (you can remove them after this process if desired):</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">yum install gcc make</pre>
</div></div></li><li>Download the Apache Commons Daemon. For example, <code>commons-daemon-1.0.15-src.zip</code> was tested.<br/>The daemon is available here: <a class="external-link" href="http://commons.apache.org/proper/commons-daemon/download_daemon.cgi" rel="nofollow">http://commons.apache.org/proper/commons-daemon/download_daemon.cgi</a></li><li><code>scp</code> it to one of your data node cluster systems.</li><li>Uncompress it.</li><li><p>Change to the install directory:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<pre class="theme: Confluence; brush: java; gutter: false" style="font-size:12px;">cd commons-daemon-1.0.15-src/src/native/unix</pre>
</div></div></li><li><p>If you are on a 64-bit machine and using a 64-bit JVM, run these exports before configure/make:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">