Skip to content
Advertisement

Arguments mismatch using where IN clause in query

I have column in hive table like below

testing_time
2018-12-31 14:45:55
2018-12-31 15:50:58

Now I want to get the distinct values as a variable so I can use in another query.

I have done like below

abc=`hive -e "select collect_set(testing_time)) from db.tbl";`

echo $abc

["2018-12-31 14:45:55","2018-12-31 15:50:58"]

xyz=${abc:1:-1}

when I do

hive -e "select * from db.tbl where testing_time in ($xyz)"

I get below error

Arguments for IN should be the same type! Types are {timestamp IN (string, string)

what the the mistake I am doing?

What is the correct way of achieving my result?

Note: I know I can use subquery for this scenario but I would like to use variable to achieve my result

Advertisement

Answer

Problem is that you’re comparing timestamp (column testing_time) with string (i.e. "2018-12-31 14:45:55"), so you need to convert string to timestamp, which you can do via TIMESTAMP(string).

Here’s a bash script that adds the conversion:

RES=""                             #  here we will save the resulting SQL
IFS=","
read -ra ITEMS <<< "$xyz"          # split timestamps into array
for ITEM in "${ITEMS[@]}"; do
    RES="${RES}TIMESTAMP($ITEM),"  # add the timestamp to RES variable,
                                   # surrounded by TIMESTAMP(x)
done
unset IFS
RES="${RES%?}"                     # delete the extra comma

Then you can run the constructed SQL query:

hive -e "select * from db.tbl where testing_time in ($RES)"
Advertisement