cas
  • 84.4k
  • 9
  • 136
  • 205

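Use lynx -dump to convert the HTML to plain text, then use awk to reformat the output, setting the field separator to a newline (\n) and the record separator to two or more newlines (\n\n+). A multi-character, regex-valued RS like this is an extension supported by GNU awk (gawk) and mawk; POSIX leaves the behaviour of a multi-character RS unspecified.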

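The pattern /^[[:space:]]+/ selects only records that begin with whitespace. Within each selected record, the first sub() strips the leading spaces from the label in field 1, and the second replaces the first run of spaces in field 2 (the value line's indentation) with a single space, so each printed line reads "label: value".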
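
To see what each sub() call does in isolation, here is a small sketch that feeds one label/value pair through the same kind of awk program. The input is hypothetical: the three-space indentation and the double space inside the value are assumptions made for illustration, not taken from ramp.html. It uses RS='' (POSIX paragraph mode) so it runs on any awk, and demo_sub is just an illustrative function name.

```shell
# demo_sub is a hypothetical helper: it pipes one indented label/value pair
# (an assumed layout, not real lynx output) through the two sub() calls.
demo_sub() {
    printf '   Commodity Code & Dimension\n   151151.15  Dim 90\n' |
        awk -v RS='' -F'\n' '{
            sub(/^ +/, "", $1)    # anchored with ^: deletes the leading spaces
            sub(/ +/, " ", $2)    # unanchored: replaces only the FIRST run of spaces
            printf "[%s]\n[%s]\n", $1, $2
        }'
}

demo_sub
```

This prints [Commodity Code & Dimension] and [ 151151.15  Dim 90]. The single leading space left in field 2 becomes the space after the colon when the answer prints $1":"$2, while the double space inside the value survives because sub() replaces only the first match (gsub() would replace every run).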

$ lynx -dump ramp.html | 
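    awk -v RS='\n\n+' -F'\n' '/^[[:space:]]+/ {
        sub(/^ +/,"",$1);    # strip leading spaces from the label (field 1)
        sub(/ +/," ",$2);    # collapse the first run of spaces in the value (field 2) to one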
        print $1":"$2
    }'
Commodity Orgin: uerb45e001.material.com
Commodity Code & Dimension: 151151.15 Dim 90
Commodity Serial #: 2009081020
Client Name: Jack
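
If only a POSIX awk is available, the same program can be run with RS='' instead: in paragraph mode, records are separated by one or more blank lines, which is what \n\n+ matches for this kind of input, and with -F'\n' each line of a record is still one field. The sketch below runs that variant against a hand-written stand-in for the lynx -dump output. The real ramp.html is not shown, so the indented label/value layout in sample_dump is an assumption, and sample_dump and reformat_dump are hypothetical helper names.

```shell
# Hypothetical stand-in for "lynx -dump ramp.html": each label and its value
# sit on consecutive indented lines, with a blank line between pairs.
sample_dump() {
    printf '   Commodity Orgin\n   uerb45e001.material.com\n\n'
    printf '   Client Name\n   Jack\n'
}

# Same awk program as in the answer, but with RS='' (POSIX paragraph mode)
# instead of a multi-character RS, so it does not depend on gawk or mawk.
reformat_dump() {
    awk -v RS='' -F'\n' '/^[[:space:]]+/ {
        sub(/^ +/, "", $1)    # strip leading spaces from the label
        sub(/ +/, " ", $2)    # collapse the value indentation to one space
        print $1 ":" $2
    }'
}

sample_dump | reformat_dump
```

To use it on the real page, replace sample_dump with lynx -dump ramp.html.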