I have column in hive
table like below
testing_time 2018-12-31 14:45:55 2018-12-31 15:50:58
Now I want to get the distinct
values as a variable so I can use in another query.
I have done like below
abc=`hive -e "select collect_set(testing_time)) from db.tbl";` echo $abc ["2018-12-31 14:45:55","2018-12-31 15:50:58"] xyz=${abc:1:-1}
when I do
hive -e "select * from db.tbl where testing_time in ($xyz)"
I get below error
Arguments for IN should be the same type! Types are {timestamp IN (string, string)
what the the mistake I am doing?
What is the correct way of achieving my result?
Note: I know I can use subquery for this scenario but I would like to use variable to achieve my result
Advertisement
Answer
Problem is that you’re comparing timestamp (column testing_time
) with string (i.e. "2018-12-31 14:45:55"
), so you need to convert string to timestamp, which you can do via TIMESTAMP(string)
.
Here’s a bash script that adds the conversion:
RES="" # here we will save the resulting SQL IFS="," read -ra ITEMS <<< "$xyz" # split timestamps into array for ITEM in "${ITEMS[@]}"; do RES="${RES}TIMESTAMP($ITEM)," # add the timestamp to RES variable, # surrounded by TIMESTAMP(x) done unset IFS RES="${RES%?}" # delete the extra comma
Then you can run the constructed SQL query:
hive -e "select * from db.tbl where testing_time in ($RES)"