I have column in hive table like below
testing_time 2018-12-31 14:45:55 2018-12-31 15:50:58
Now I want to get the distinct values as a variable so I can use in another query.
I have done like below
abc=`hive -e "select collect_set(testing_time)) from db.tbl";`
echo $abc
["2018-12-31 14:45:55","2018-12-31 15:50:58"]
xyz=${abc:1:-1}
when I do
hive -e "select * from db.tbl where testing_time in ($xyz)"
I get below error
Arguments for IN should be the same type! Types are {timestamp IN (string, string)
what the the mistake I am doing?
What is the correct way of achieving my result?
Note: I know I can use subquery for this scenario but I would like to use variable to achieve my result
Advertisement
Answer
Problem is that you’re comparing timestamp (column testing_time) with string (i.e. "2018-12-31 14:45:55"), so you need to convert string to timestamp, which you can do via TIMESTAMP(string).
Here’s a bash script that adds the conversion:
RES="" # here we will save the resulting SQL
IFS=","
read -ra ITEMS <<< "$xyz" # split timestamps into array
for ITEM in "${ITEMS[@]}"; do
RES="${RES}TIMESTAMP($ITEM)," # add the timestamp to RES variable,
# surrounded by TIMESTAMP(x)
done
unset IFS
RES="${RES%?}" # delete the extra comma
Then you can run the constructed SQL query:
hive -e "select * from db.tbl where testing_time in ($RES)"