Category Archives: apache-sqoop

Multiple-character delimiter using Apache Sqoop import

I am importing data from Teradata (an RDBMS) to Hive using Apache Sqoop. The usual delimiters used for imports, such as ",", "|", and "~", already occur in the table data. Is there a way to use multiple characters as a delimiter in Apache Sqoop?

To work around this, I used the --escaped-by "\t" and --fields-terminated-by "," parameters in the sqoop import command. Is there a way to 'unescape' the "\t" I used during the import?
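
For reference, a minimal sketch of an import of this shape, with hypothetical host, database, and table names (td-host, sales, orders) that are not from the original post. It shows the escaping options together with --enclosed-by, a common alternative to escaping when the data contains the field terminator:

    # Placeholders throughout: host, database, table, and target directory
    # are illustrative, not taken from the question.
    sqoop import \
      --connect jdbc:teradata://td-host/DATABASE=sales \
      --username user -P \
      --table orders \
      --fields-terminated-by ',' \
      --escaped-by '\\' \
      --enclosed-by '"' \
      --target-dir /user/hive/warehouse/orders \
      -m 4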

Can't connect MSSQL Server to Hadoop using Sqoop

I am trying to connect MSSQL Server to Hadoop using Sqoop, but I am not able to connect to the MSSQL Server. I have been stuck on this for the past two days; would anyone please help me solve this problem? When I try to list the databases in MSSQL Server using Sqoop, I get an error like this:

Failed to list databases
com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host 192.168.1.26, port 1433 has failed. Error: "connect timed out. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".
    at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:191)
    at com.microsoft.sqlserver.jdbc.SQLServerException.ConvertConnectExceptionToSQLServerException(SQLServerException.java:242)
    at com.microsoft.sqlserver.jdbc.SocketFinder.findSocket(IOBuffer.java:2280)
    at com.microsoft.sqlserver.jdbc.TDSChannel.open(IOBuffer.java:493)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:1387)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:1068)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:904)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:451)
    at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:1014)
    at java.sql.DriverManager.getConnection(DriverManager.java:571)
    at java.sql.DriverManager.getConnection(DriverManager.java:233)
    at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:883)
    at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
    at org.apache.sqoop.manager.CatalogQueryManager.listDatabases(CatalogQueryManager.java:57)
    at org.apache.sqoop.tool.ListDatabasesTool.run(ListDatabasesTool.java:49)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
16/07/27 16:35:24 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host 192.168.1.26, port 1433 has failed. Error: "connect timed out. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".
java.lang.RuntimeException: com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host 192.168.1.26, port 1433 has failed. Error: "connect timed out. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".
    at org.apache.sqoop.manager.CatalogQueryManager.listDatabases(CatalogQueryManager.java:73)
    at org.apache.sqoop.tool.ListDatabasesTool.run(ListDatabasesTool.java:49)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host 192.168.1.26, port 1433 has failed. Error: "connect timed out. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".
    at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:191)
    at com.microsoft.sqlserver.jdbc.SQLServerException.ConvertConnectExceptionToSQLServerException(SQLServerException.java:242)
    at com.microsoft.sqlserver.jdbc.SocketFinder.findSocket(IOBuffer.java:2280)
    at com.microsoft.sqlserver.jdbc.TDSChannel.open(IOBuffer.java:493)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:1387)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:1068)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:904)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:451)
    at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:1014)
    at java.sql.DriverManager.getConnection(DriverManager.java:571)
    at java.sql.DriverManager.getConnection(DriverManager.java:233)
    at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:883)
    at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
    at org.apache.sqoop.manager.CatalogQueryManager.listDatabases(CatalogQueryManager.java:57)
    ... 7 more
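
The stack trace indicates a network-level timeout rather than a Sqoop misconfiguration. A hedged sketch of the usual checks, reusing the host and port from the error message (credentials are placeholders); a timeout at this stage typically means SQL Server's TCP/IP protocol is disabled or a firewall is blocking port 1433:

    # Verify the port is reachable from the Sqoop client machine first.
    nc -zv 192.168.1.26 1433

    # If the port is open, a list-databases call of this shape should succeed.
    sqoop list-databases \
      --connect "jdbc:sqlserver://192.168.1.26:1433" \
      --username sa \
      --password '<password>'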

Sqoop import error for Sybase – File does not exist:

I am trying to import data from Sybase to HDFS using Sqoop, but I am getting the following error.

How can I solve this?

command:

sqoop import --verbose --connect jdbc:sybase:Tds:[host]:5000?ServiceName=master --username sa --password password --query "select * from City where \$CONDITIONS" --target-dir /tmp/HELLO/tej01 --driver "com.sybase.jdbc4.jdbc.SybDriver" -m 1

error:

ERROR tool.ImportTool: Encountered IOException running import job: java.io.FileNotFoundException: File does not exist: hdfs://[host]:54310/usr/lib/sqoop/lib/parquet-format-2.0.0.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:269)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:196)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:169)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:266)
at org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:729)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:499)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

Hadoop version : 2.6.0

Sqoop version : 1.4.6
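
Note the hdfs:// scheme on the missing jar path: the job client is resolving the local Sqoop lib directory against HDFS. One commonly suggested workaround, sketched here under the assumption that the paths match the error message, is to mirror the local Sqoop jars into HDFS at the same location:

    # Create the path the job submitter is looking for in HDFS and copy
    # the local Sqoop jars into it.
    hdfs dfs -mkdir -p /usr/lib/sqoop/lib
    hdfs dfs -put /usr/lib/sqoop/lib/*.jar /usr/lib/sqoop/lib/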

Sqoop import error – File does not exist:

I am trying to import data from MySQL to HDFS using Sqoop, but I am getting the following error.

How can I solve this?

command:

sqoop import --connect jdbc:mysql://localhost/testDB --username root --password password --table student --m 1

error:

 ERROR tool.ImportTool: Encountered IOException running import job: java.io.FileNotFoundException: File does not exist: hdfs://localhost:54310/usr/lib/sqoop/lib/parquet-format-2.0.0.jar
    at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
    at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:269)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
    at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:196)
    at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:169)
    at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:266)
    at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
    at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)

Hadoop version : 2.6.0

Sqoop version : 1.4.6
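
As in the Sybase question above, the hdfs:// prefix on the jar path is the giveaway: the jar exists on the local filesystem, but the job client looks for it in HDFS. A quick diagnostic sketch, assuming the paths from the error message; the mirroring workaround shown in the previous question applies here as well:

    # The jar is present locally...
    ls -l /usr/lib/sqoop/lib/parquet-format-2.0.0.jar
    # ...but absent at the HDFS path the job submitter resolved.
    hdfs dfs -ls hdfs://localhost:54310/usr/lib/sqoop/lib/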

Failover handler in Apache Sqoop

Is there any way I can provide multiple JDBC connection strings within the Sqoop command?

For example, if JDBC1 is down, then Sqoop should pick up JDBC2 and run the import/export.

For example:

sqoop export --connect "jdbc string 1 / jdbc string 2" (based on availability, it should pick the JDBC string that is working)

In SQL Server we have something like the below:

"Server=Partner_A; Failover_Partner=Partner_B; Database=AdventureWorks2008R2; Network=dbmssocn"

Can this be used?
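
Sqoop itself accepts only a single --connect string, so any failover has to come from the JDBC driver. For SQL Server specifically, the Microsoft driver's failoverPartner connection property (which requires database mirroring to be configured) can be embedded in the URL; whether other drivers offer an equivalent depends on the RDBMS. A sketch with placeholder server, credential, and table names:

    # failoverPartner is a Microsoft JDBC driver property, not a Sqoop
    # feature; the driver retries against Partner_B if Partner_A is down.
    sqoop export \
      --connect "jdbc:sqlserver://Partner_A;failoverPartner=Partner_B;databaseName=AdventureWorks2008R2" \
      --username user --password '<password>' \
      --table target_table \
      --export-dir /data/export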

Query Extensionless File using Apache Drill

I imported data into Hadoop using Sqoop 1.4.6. Sqoop imports and saves the data in HDFS as an extensionless file, but in CSV format. I used Apache Drill to query the data from this file but got a 'Table not found' error. In the storage plugin configuration, I tried putting null, blank (""), and space (" ") in the extensions list, but was still not able to query the file. However, I was able to query the file once I renamed it to add an extension: any extension configured in the storage plugin works, even 'mat' or anything else, as long as it is not null.

Is there any way to query the extensionless files?
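
One documented way to handle extensionless files in Drill is to set defaultInputFormat on the workspace, so that files with no extension are read with that format. A sketch of the relevant fragment of the dfs storage plugin configuration; the workspace name and location are placeholders:

    {
      "workspaces": {
        "sqoopout": {
          "location": "/tmp",
          "writable": true,
          "defaultInputFormat": "csv"
        }
      }
    }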

In a Sqoop file import, I would like to control how the imported data is distributed across file splits by the defined mappers

MySQL -> select * from Employee

| empno | empname      | salary |
==================================
|   101 | Ram          |   5000 |
|   102 | Hari         |   7000 |
|   104 | Vamshi       |   7000 |
|   103 | Revathy      |   7000 |
|   105 | Jaya         |   9000 |
|   106 | Suresh       |   8000 |
|   107 | Ramesh       |   9000 |
|   108 | Prasana      |  10000 |
|   109 | Ramsamy      |  20000 |
|   110 | Singaram     |  30000 |
|   200 | ramanathan   |  30000 |
|   201 | Victor       |  33000 |
|   202 | Naveen       |  33000 |
|   203 | Karthik      |  33000 |
|   204 | Karthikeyan  |  33000 |
|   205 | Somasundaram |  43000 |
|   301 | Test1        |  50000 |
|   302 | Test2        |  60000 |
|   303 | Test3        |  70000 |

Command in Sqoop

sqoop import --connect jdbc:mysql://<hostname>/test --username <username> --password <password> --table employee --direct --verbose --split-by salary

With the above command, Sqoop takes min(salary) and max(salary) and moves the table to HDFS with 10 records in the first file, 3 records in the second file, 3 records in the third file, and 3 records in the last file.

    15/07/03 17:32:37 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`salary`), MAX(`salary`) FROM employee
    15/07/03 17:32:37 DEBUG db.IntegerSplitter: Splits: [5,000 to 70,000] into 4 parts
    15/07/03 17:32:37 DEBUG db.IntegerSplitter: 5,000
    15/07/03 17:32:37 DEBUG db.IntegerSplitter: 21,250
    15/07/03 17:32:37 DEBUG db.IntegerSplitter: 37,500
    15/07/03 17:32:37 DEBUG db.IntegerSplitter: 53,750
    15/07/03 17:32:37 DEBUG db.IntegerSplitter: 70,000
    15/07/03 17:32:37 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '`salary` >= 5000' and upper bound '`salary` < 21250'
    15/07/03 17:32:37 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '`salary` >= 21250' and upper bound '`salary` < 37500'
    15/07/03 17:32:37 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '`salary` >= 37500' and upper bound '`salary` < 53750'
    15/07/03 17:32:37 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '`salary` >= 53750' and upper bound '`salary` <= 70000'
    15/07/03 17:32:37 INFO mapreduce.JobSubmitter: number of splits:4

I would like to know how it determines the number of records in each file. Is that customizable?
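
Sqoop divides the [min(salary), max(salary)] range into --num-mappers equal-width intervals, so the record count per file follows the skew of the split column rather than being balanced: here (70000 - 5000) / 4 = 16250, giving the boundaries 5000, 21250, 37500, 53750, and 70000 seen in the log. A per-file record count cannot be set directly, but the split points can be influenced. A hedged sketch of two common adjustments (hostname and credentials are placeholders, as in the original command):

    # Option 1: split on a more uniformly distributed column, such as the
    # primary key, so equal-width value ranges hold similar record counts.
    sqoop import --connect jdbc:mysql://<hostname>/test --username <username> --password <password> --table employee --split-by empno -m 4

    # Option 2: override the min/max that Sqoop uses for the value range
    # with --boundary-query, which shifts where the four intervals fall.
    sqoop import --connect jdbc:mysql://<hostname>/test --username <username> --password <password> --table employee --split-by salary -m 4 --boundary-query "SELECT MIN(salary), MAX(salary) FROM employee"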