
Trimming string up to certain characters in Bash

I’m trying to make a bash script that will tell me the latest stable version of the Linux kernel.

The problem is that, while I can remove everything after certain characters, I don’t seem to be able to delete everything prior to certain characters.

#!/bin/bash

wget=$(wget --output-document - --quiet www.kernel.org | \grep -A 1 "latest_link")

wget=${wget##.tar.xz\">}

wget=${wget%</a>}

echo "${wget}"

Somehow the output “ignores” the wget=${wget##.tar.xz\">} line.


Answer

You’re trying to remove the longest match of the pattern .tar.xz"> from the beginning of the string, but your string doesn’t start with .tar.xz, so nothing matches and the string is left unchanged.

You have to add a leading * so that the pattern also covers everything before .tar.xz">:

wget=${wget##*.tar.xz\">}
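To see the difference the leading * makes, here is a minimal sketch run against a made-up line of HTML. The sample string and the variable names html, unchanged, trimmed and version are assumptions for illustration only; the real kernel.org markup may differ.

```bash
#!/bin/bash

# Hypothetical sample line, not taken from the real page.
html='<a href="./linux-4.10.8.tar.xz">4.10.8</a>'

# No leading *: the pattern must match at the very start, so nothing is removed.
unchanged=${html##.tar.xz\">}

# Leading *: everything up to and including the last .tar.xz"> is removed.
trimmed=${html##*.tar.xz\">}

# Remove the closing tag from the end of the string.
version=${trimmed%</a>}

printf '%s\n' "$unchanged"   # <a href="./linux-4.10.8.tar.xz">4.10.8</a>
printf '%s\n' "$trimmed"     # 4.10.8</a>
printf '%s\n' "$version"     # 4.10.8
```

The double quote is written as \" because an unescaped " inside ${...} would start a quoted string.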

Also, because you’re in a script and not an interactive shell, there shouldn’t be any need to escape grep as \grep (presumably to bypass an alias), since aliases are disabled by default in non-interactive shells.

And, as pointed out, naming a variable the same as an existing command (often found: test) is bound to lead to confusion.
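Putting these points together, the script might look roughly like the sketch below. The function name latest_version, the variable names page and sample, and the sample markup are all assumptions; the sample stands in for the downloaded page so the snippet runs without network access.

```bash
#!/bin/bash

# Reads HTML on standard input and prints the link text that follows the
# "latest_link" marker. Assumes the markup has the shape of the sample below.
latest_version() {
    local page
    page=$(grep -A 1 "latest_link")   # plain grep: no alias to bypass in a script
    page=${page##*.tar.xz\">}         # drop everything up to and including .tar.xz">
    page=${page%</a>}                 # drop the closing tag at the end
    printf '%s\n' "$page"
}

# Hypothetical sample input, not the real page content.
sample='<td id="latest_link">
    <a href="./linux-4.10.8.tar.xz">4.10.8</a>
</td>'

result=$(latest_version <<< "$sample")
echo "$result"
```

Against the live site, the function would be fed from the download instead, for example: wget --output-document - --quiet www.kernel.org | latest_version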

If you want to use command line tools designed to deal with HTML, you could have a look at the W3C HTML-XML-utils (Ubuntu: apt install html-xml-utils). Using them, you could get the info you want as follows:

$ curl -sL www.kernel.org | hxselect 'td#latest_link' | hxextract a -
4.10.8

Or, in detail:

curl -sL www.kernel.org |     # Fetch page
hxselect 'td#latest_link' |   # Select td element with ID "latest_link"
hxextract a -                 # Extract link text ("-" for standard input)
User contributions licensed under: CC BY-SA