Managing passwords in Sqoop

Sqoop makes it easy to transfer data in and out of Hadoop. In this post, we’ll cover the different options available for managing passwords, with the exception of data source specific integration such as oracle wallet.

Motivation

Here’s a basic Sqoop command:

sqoop import --connect jdbc:mysql://example.com/sqoop --username sqoop --password sqoop --table tbl

The username and password are both passed on the command line. This isn’t desirable in some organizations because:

  1. Passwords are sacred and shell history is easy to gather
  2. A password manager might be hiding real passwords behind a single passphrase (think LastPass)
  3. Viruses that log key strokes are a dime a dozen

Solutions

Sqoop docs provide 3 ways to manage passwords, but there’s actually a 4th option as well. Let’s walk through all the options available.

1. Use –password

This is a completely acceptable way of passing a database password to Sqoop if security is not a concern.

sqoop import --connect jdbc:mysql://example.com/sqoop --username sqoop --password sqoop --table tbl

2. Use -P

One of the problems with –password is that it logs the password in shell history. Luckily, Sqoop supports reading passwords via input stream (STDIN) with echo turned off:

sqoop import --connect jdbc:mysql://example.com/sqoop --username sqoop -P --table tbl

The -P option will tell Sqoop to prompt for a password. This will allow you to pass your password through the standard input stream like so:

sqoop import with -P option

You can also wrap sqoop commands with an expect script that sends passwords to Sqoop via STDIN:

#!/bin/bash
read p
/usr/bin/expect <<EOF
    spawn sqoop import --connect jdbc:mysql://example.com/sqoop --username sqoop -P --table tbl
    expect "Enter password:"
    send "$p\r\n"
    set timeout -1
    expect "~~~~~~~~~~~~"
    wait
EOF

3. Use –password-file

Sqoop can also read passwords from a password file:

echo -n "sqoop" > /etc/sqoop/conf/passwords/mysql-pass.txt
sqoop import --connect jdbc:mysql://example.com/sqoop \
             --username sqoop \
             --password-file /etc/sqoop/conf/passwords/mysql-pass.txt \
             --table tbl

Notice the “echo” command has the “-n” option. This means that a newline will not be added at the end of the echo statement. The entire contents of the password file will be groked and used as the password by sqoop.

Here’s an example of using an encrypted file:

sqoop import -Dorg.apache.sqoop.credentials.loader.class=org.apache.sqoop.util.password.CryptoFileLoader \
             -Dorg.apache.sqoop.credentials.loader.crypto.passphrase=sqoop \
             --connect jdbc:mysql://example.com/sqoop \
             --username sqoop \
             --password-file /etc/sqoop/conf/passwords/mysql-pass.txt \
             --table tbl

4. Use a custom password loader

This is the option NOT in the Sqoop docs AFAIK.

Sometimes, in order to access a RDBMS, a passphrase from an auxilliary key store must be used. Luckily for us, Sqoop can do this if we provide a custom credentials loader implementation.

public class MyPasswordLoader extends PasswordLoader {
  @Override
  public String loadPassword(String p, Configuration configuration) throws IOException {
    return myKeyStore.getPassphrase();
  }
}

Sumary

Hopefully this helps every one manage their passwords with Sqoop1 :).

E: team@ingest.tips!

T: @abeaamase

Tweet about this on TwitterShare on FacebookShare on LinkedIn


'Managing passwords in Sqoop' have 11 comments

  1. March 12, 2015 @ 5:42 am Mark

    Thanks for sharing! Super useful. For option #4, how do you specific the name of the classfile to be used for loading password, on command line?

    Reply

    • Abraham Elmahrek

      March 12, 2015 @ 7:07 pm Abraham Elmahrek

      I didn’t add a lot of clarity in my Blog post about this, so let me clarify real quick. The classname has to be provided using the “org.apache.sqoop.credentials.loader.class” argument as per #3 above. See the following example:

      1. Compile it with the relevant Sqoop and Hadoop jars:

      javac MyPasswordLoader.java -cp /usr/lib/hadoop/hadoop-common-2.5.0-cdh5.2.1.jar:/usr/lib/sqoop/sqoop-1.4.5-cdh5.3.0.jar

      2. Run Sqoop:

      sqoop import -libjars /tmp/test/MyPasswordLoader.class,/tmp/sqoop-abe/compile/a4cf7f924a22afc92218f4a2e924fe85/tbl.jar -Dorg.apache.sqoop.credentials.loader.class=MyPasswordLoader --connect "jdbc:mysql://example.com/sqoop" --table tbl --username sqoop --password-file file

      Note the usage of “-libjars”, “-Dorg.apache.sqoop.credentials.loader.class”, and “–password-file”. “–password-file” tells Sqoop to use the class specified by “-Dorg.apache.sqoop.credentials.loader.class and “-libjars” includes your custom credentials class in Sqoop. Your custom credentials class may not use the password file, but that argument still must be provided in order to use it.

      It’s a hack, but it works!

      Reply

      • March 12, 2015 @ 8:16 pm Mark

        Got it, thanks Abe!

        Reply

  2. March 12, 2015 @ 8:24 am Olivier Renault

    With SQOOP 1.4.5, you can also store the password as part of a JAVA Key Store. As such you don’t have to store the password in clear text in a file.

    You can generate the keystore by :
    hadoop credential create mysql.password -provider jceks://hdfs/user/orenault/test.jceks

    You are then able to re-use this alias in your sqoop command.
    sqoop import -Dhadoop.security.credential.provider.path=jceks://hdfs/user/orenault/test.jceks –verbose –connect ‘jdbc:mysql://localhost/test’ –table audit –username orenault –password-alias mysql.password -m 1

    Olivier

    Reply

  3. March 13, 2015 @ 2:21 pm Jarcec

    Nice blog post Abe! It would be useful to also add instructions how to create the password file for example #3.

    Reply

    • Abraham Elmahrek

      March 17, 2015 @ 7:13 pm Abraham Elmahrek

      Thanks for the thoughts Jarcec. I clearly didn’t go into great enough detail about how Sqoop decrypts password files. Let me expand on how to generate an encrypted password file and what options are available to users.

      To start off, let me explain what options are available to users:

      1. org.apache.sqoop.credentials.loader.crypto.alg – The Algorithm used to decrypt the file (default is AES/ECB/PKCS5Padding).
      2. org.apache.sqoop.credentials.loader.crypto.salt – The salt used to derive a key with the passphrase (default is SALT).
      3. org.apache.sqoop.credentials.loader.crypto.iterations – Number of PBKDF2 iterations (default is 10000).
      4. org.apache.sqoop.credentials.loader.crypto.salt.key.len – Derived key length (default is 128).
      5. org.apache.sqoop.credentials.loader.crypto.passphrase Passphrase used to derive key.

      Here’s a quick example of how to generate a file using Java:

          import javax.crypto.Cipher;
          import javax.crypto.SecretKeyFactory;
          import javax.crypto.spec.PBEKeySpec;
          import javax.crypto.spec.SecretKeySpec;
          import java.io.File;
          import java.io.FileOutputStream;
          import java.io.IOException;
          import java.security.NoSuchAlgorithmException;
      
          public class Encrypt {
            public static void main(String[] args) throws Exception {
              String filename = args[0];
              String password = "sqoop";
              String alg = "AES/ECB/PKCS5Padding";
              String algOnly = alg.split("/")[0];
              String passphrase = "sqoop2";
              String salt = "SALT";
              int iterations = 10000;
              int keylength = 128;
      
              SecretKeyFactory factory = null;
              try {
                factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");
              } catch (NoSuchAlgorithmException e) {
                throw new IOException("Can't load SecretKeyFactory", e);
              }
      
              SecretKeySpec key = null;
              try {
                key = new SecretKeySpec(factory.generateSecret(new PBEKeySpec(passphrase.toCharArray(), salt.getBytes(), iterations, keylength)).getEncoded(), algOnly);
              } catch (Exception e) {
                throw new IOException("Can't generate secret key", e);
              }
      
              Cipher crypto = null;
              try {
                crypto = Cipher.getInstance(alg);
              } catch (Exception e) {
                throw new IOException("Can't initialize the cipher", e);
              }
      
              byte[] encryptedBytes;
      
              try {
                crypto.init(Cipher.ENCRYPT_MODE, key);
                encryptedBytes = crypto.doFinal(password.getBytes());
              } catch (Exception e) {
                throw new IOException("Can't encrypt the password", e);
              }
      
      
      
              FileOutputStream output = new FileOutputStream(new File(filename));
              output.write(encryptedBytes);
              output.close();
            }
          }
      

      The generated password file can then be used:

      sqoop import -Dorg.apache.sqoop.credentials.loader.class=org.apache.sqoop.util.password.CryptoFileLoader \
                   -Dorg.apache.sqoop.credentials.loader.crypto.passphrase=sqoop2 \
                   --connect jdbc:mysql://example.com/sqoop \
                   --username sqoop \
                   --password-file file:///tmp/pass.enc \
                   --table tbl

      NOTE: I highly recommend changing the salt value to something less obvious.

      NOTE: Sqoop internally uses PBKDF2WithHmacSHA1 (Password Based Key Derivation Function with HMAC SHA 1) to derive the key used to decrypt the password file. If the password file has been encrypted using a key derived with a different algorithm, it will not work.

      Reply

      • March 17, 2015 @ 7:22 pm Jarcec

        That is exactly what I was looking for, thank you for putting the example up Abe! Do you think that it would make sense to improve Sqoop, so that it can generate the password files?

        Reply

      • March 20, 2015 @ 2:07 pm Michael Arena

        This page is immensely helpful. Thanks.
        But I have a concern (that maybe unfounded):
        If using the encrypted file with the passphrase option, wouldn’t that allow anyone to use “ps” (or even examining .bash_history) and view the command line of sqoop and obtain the passphrase and all the other parameters to decrypt the password file?

        Also, since I am trying to set up nightly incremental jobs (in the Sqoop Metastore) and it is not (yet) possible to save “-D” arguments for the Sqoop job (as Jarcec commented in SQOOP-1933 about SQOOP-434), I am not sure how I could even use the passphrase?
        Is it possible to put the “-D” options in sqoop-site.xml or even in the –options-file?

        Reply

        • Abraham Elmahrek

          March 22, 2015 @ 7:25 am Abraham Elmahrek

          Michael, you’re completely right about bash history. In my blog post, I’ve provided 2 alternative methods for providing a password (see #2 and #4). Ideally, these options would be used with a password management solution of some kind. This would be significantly more secure. Also, when using the -P option, you can always just provide the password via standard input with stty echo off :).

          For some of the -Doptions, you might be able to pass them via sqoop-site.xml. Which ones are they specifically? Also, try asking this on the user@sqoop.apache.org mailing list.

          Reply


Would you like to share your thoughts?

Your email address will not be published.