Friday, May 5, 2017

Understanding Merged Keytabs

What is a merged keytab

A keytab is a file containing pairs of Kerberos principals and their encrypted keys.  Keytabs are used in a Kerberos environment to authenticate and obtain tickets without entering a password manually.

Why we need to merge two keytabs

Generally, each user id should have its own keytab.  But suppose there are two ids and we need to use a single keytab for both of them; in that case, we merge the two keytabs.

For example, consider two ids, userid1 and userid2, on the machine proxy1.  We have a separate keytab for each of them, as shown below.

userid1.server.keytab
userid2.server.keytab

As per our requirement of using a single keytab, let's merge the two keytabs above.

1.  Understand what is stored in each keytab file.  klist -ekt lists the keytab entries (-k) along with their timestamps (-t) and encryption types (-e):

# klist -ekt userid1.server.keytab
Keytab name: FILE:userid1.server.keytab
KVNO Timestamp         Principal
---- ----------------- --------------------------------------------------------
   2 03/10/17 08:58:35 userid1/server.FQDN@REALM (aes128-cts-hmac-sha1-96)
   2 03/10/17 08:58:35 userid1/server.FQDN@REALM (des3-cbc-sha1)
   2 03/10/17 08:58:36 userid1/server.FQDN@REALM (des-cbc-crc)
   2 03/10/17 08:58:36 userid1/server.FQDN@REALM (arcfour-hmac)
[Dev root @ server /unix/path]
# klist -ekt userid2.server.keytab
Keytab name: FILE:userid2.server.keytab
KVNO Timestamp         Principal
---- ----------------- --------------------------------------------------------
   2 02/06/17 09:56:45 userid2/server.FQDN@REALM (aes128-cts-hmac-sha1-96)
   2 02/06/17 09:56:45 userid2/server.FQDN@REALM (des3-cbc-sha1)
   2 02/06/17 09:56:45 userid2/server.FQDN@REALM (des-cbc-crc)
   2 02/06/17 09:56:45 userid2/server.FQDN@REALM (arcfour-hmac)

2.  Merge the keytabs using the ktutil command.  In the session below, rkt reads the keys from an existing keytab into ktutil's current key list, and wkt writes that list out to a new keytab file:

# ktutil
ktutil:  rkt userid1.server.keytab
ktutil:  rkt userid2.server.keytab
ktutil:  wkt userid1userid2merged.keytab
ktutil:  clear
ktutil:  quit
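
If you need to script the merge, MIT ktutil can generally read its commands from standard input as well.  A minimal sketch, assuming the same keytab names as above; verify the behaviour on your ktutil version before relying on it:

# printf 'rkt userid1.server.keytab\nrkt userid2.server.keytab\nwkt userid1userid2merged.keytab\nquit\n' | ktutil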

3.  The following single keytab file is created, which is the merged one:

-rw------- 1 root    root    802 Apr 24 10:25 userid1userid2merged.keytab
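
Keep the merged keytab protected: only the ids that authenticate with it should be able to read it.  As a minimal sketch, assuming a hypothetical group hadoopsvc that contains both userid1 and userid2:

# chown root:hadoopsvc userid1userid2merged.keytab
# chmod 440 userid1userid2merged.keytab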

4.  Look at the merged keytab file and observe that it contains the principals of both ids:

# klist -ekt userid1userid2merged.keytab
Keytab name: FILE:userid1userid2merged.keytab
KVNO Timestamp         Principal
---- ----------------- --------------------------------------------------------
   2 04/24/17 10:25:10 userid1/server.FQDN@REALM (aes128-cts-hmac-sha1-96)
   2 04/24/17 10:25:10 userid1/server.FQDN@REALM (des3-cbc-sha1)
   2 04/24/17 10:25:10 userid1/server.FQDN@REALM (des-cbc-crc)
   2 04/24/17 10:25:10 userid1/server.FQDN@REALM (arcfour-hmac)
   2 04/24/17 10:25:10 userid2/server.FQDN@REALM (aes128-cts-hmac-sha1-96)
   2 04/24/17 10:25:10 userid2/server.FQDN@REALM (des3-cbc-sha1)
   2 04/24/17 10:25:10 userid2/server.FQDN@REALM (des-cbc-crc)
   2 04/24/17 10:25:10 userid2/server.FQDN@REALM (arcfour-hmac)


5.  Test it by logging in to the user id and checking that no tickets have been generated beforehand.  Note that running kinit against the keytab without naming a principal fails, because kinit falls back to a default principal (here host/server.FQDN@REALM, as the error shows) that is not present in the keytab:


/unix/path >kinit -kt userid2.server.keytab    
kinit: Keytab contains no suitable keys for host/server.FQDN@REALM while getting initial credentials
/unix/path >klist -kt userid2.server.keytab    
Keytab name: FILE:userid2.server.keytab
KVNO Timestamp         Principal
---- ----------------- --------------------------------------------------------
   2 02/06/17 09:56:45 userid2/server.FQDN@REALM
   2 02/06/17 09:56:45 userid2/server.FQDN@REALM
   2 02/06/17 09:56:45 userid2/server.FQDN@REALM
   2 02/06/17 09:56:45 userid2/server.FQDN@REALM
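
As a side note, the standalone keytab should work once the principal is named explicitly, for example:

/unix/path >kinit -kt userid2.server.keytab userid2/server.FQDN@REALM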

6.  Obtain a ticket for userid2 by running the command below against the merged keytab.  Similarly, you can log in as userid1 and obtain credentials using the same merged keytab.


/unix/path >kinit -kt userid1userid2merged.keytab userid2/server.FQDN@REALM
/unix/path >klist
Ticket cache: FILE:/tmp/krb5cc_xxxxxx
Default principal: userid2/server.FQDN@REALM

Valid starting     Expires            Service principal
04/24/17 10:30:49  04/24/17 18:30:49  krbtgt/REALM@REALM
        renew until 04/25/17 10:30:49
/unix/path >
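
To switch to the other principal in the merged keytab, destroy the current ticket cache with kdestroy and run kinit again:

/unix/path >kdestroy
/unix/path >kinit -kt userid1userid2merged.keytab userid1/server.FQDN@REALM
/unix/path >klist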



Friday, April 28, 2017

Prerequisites on each node before building the Hadoop cluster



1.  Choose a supported operating system.
2.  Choose a supported Java version:

https://wiki.apache.org/hadoop/HadoopJavaVersions

3.  Switch off iptables:


Netfilter is a host-based firewall for Linux. It is included as part of the Linux distribution and activated by default. This firewall is controlled by the program called iptables, which should be turned off on Hadoop nodes.  iptables applies to IPv4.

Type the following two commands (you must be logged in as the root user):

# /etc/init.d/iptables save
# /etc/init.d/iptables stop

To turn off firewall on reboot:
# chkconfig iptables off
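
The init-script and chkconfig commands above apply to RHEL/CentOS 6 and earlier.  On systemd-based releases such as RHEL 7 and later, the host firewall is typically the firewalld service, and the equivalent steps would be:

# systemctl stop firewalld
# systemctl disable firewalld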

4.  Disabling Transparent Hugepage Compaction

Transparent hugepage support automatically uses larger pages for dynamically allocated memory; enabling it is not recommended for Hadoop, as hugepage compaction can degrade cluster performance.

For RHEL, depending upon the version, disable transparent hugepage compaction by adding the following commands to /etc/rc.local (rc.local is the run-level script which gets executed after all the normal services are started):

echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/enabled

When reading the above files (defrag/enabled), the value shown in square brackets is the one in effect: [always] never means it is enabled, while always [never] means it is disabled.
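
To verify the setting, read the file back; the exact options listed vary by kernel version, but the bracketed value is the active one.  For example, on a RHEL 6 style system the output may look like:

# cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
always madvise [never]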

5.  vm.swappiness Linux Kernel Parameter


Set vm.swappiness to a value of 1.  We do not want the nodes to swap; swapping needs to be kept as minimal as possible.

sysctl -w vm.swappiness=1
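
Note that sysctl -w changes the value only for the running kernel.  To make the setting survive a reboot, the usual approach is to append it to /etc/sysctl.conf and reload:

# echo 'vm.swappiness=1' >> /etc/sysctl.conf
# sysctl -p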

General onboarding questions for Hadoop

When companies plan to onboard Apache Hadoop, some of these questions will arise:

1.  Prerequisites before building a cluster.
2.  Division of roles and responsibilities among the team.
3.  Infrastructure servers.
4.  Multi-tenancy environment.
5.  Security around it.
6.  Capacity Planning.
7.  How users can access hadoop.
8.  User convenience.
9.  What tools to use.
10.  Tools for analytics.
11.  Tools for data ingestion.
12. How to store data in efficient way on HDFS cluster.
13. Setting up a disaster recovery cluster.

Introduction to Hadoop


     We are all aware of the existence of Big Data and of the role Apache Hadoop plays in this world. No, we are not going to talk about the 3 Vs, nor about big data and why Hadoop is needed, nor compare traditional BI systems with Hadoop, nor discuss its distributed computing capabilities.

     These are topics which have often been discussed at great length. Probably everyone by this time has a very thorough understanding of all these facts.

This blog is about the practical implementation of Hadoop in the real world.

Before that, you may be interested in knowing why an organisation decides to tap into the resources of Big Data.  Of course, it is to turn dormant data into a powerhouse of information from which the company can immensely benefit.  Following are some of the outcomes organisations seek when they choose to reap the advantages of huge amounts of data.



1. Fraud detection.
2. Customer behavior and insights.
3. Business optimization.
4. Predictive analysis.
5. Targeted advertising.
6. Business realization of social media data and semi/unstructured data.
7. Staying ahead of the competitors.
8. Trend analysis.
9. Pattern identification.
