
How to split a text file into blocks with 10+ characters without dividing words using sed in Linux?

I want to come up with a sed command that, after every 10 characters, looks for the nearest following space and replaces it with "|".

I tried sed -E -e 's/ /|/( *?[0-9a-zA-Z]*){10,}' new.file, but it shows errors.

Example input:

Hello there! How are you? I am trying to figure this out.

Expected Output:

Hello there!|How are you?|I am trying|to figure this|out.


Answer

This works for the given sample:

$ sed -E 's/(.{10}[^ ]*) /\1|/g' ip.txt
Hello there!|How are you?|I am trying|to figure this|out.
  • (.{10}[^ ]*) matches any 10 characters, followed by any non-space characters, so the match runs to the end of the current word
  • then a space is matched
  • \1| puts back the captured portion, followed by a | character in place of the space
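As a minimal sketch of how this could be wrapped for reuse, the same substitution can be put in a small shell function. The name split_blocks is a hypothetical helper, not part of the original answer, and the tr step assumes the input contains no literal | characters of its own.

```shell
# split_blocks: hypothetical helper name, chosen only for illustration.
# Reads text on stdin and replaces the first space found after each
# run of at least 10 characters with "|", never cutting inside a word.
split_blocks() {
  sed -E 's/(.{10}[^ ]*) /\1|/g'
}

# Sample from the question.
printf '%s\n' 'Hello there! How are you? I am trying to figure this out.' | split_blocks
# prints: Hello there!|How are you?|I am trying|to figure this|out.

# To get one block per line, the markers can be turned into newlines.
# Assumption: the input has no "|" characters of its own.
printf '%s\n' 'Hello there! How are you? I am trying to figure this out.' \
  | split_blocks | tr '|' '\n'
```

A line with fewer than 10 characters before its last space is left unchanged, since .{10} cannot match there.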
User contributions licensed under: CC BY-SA