Minimal data to regex
\documentclass{article}
\begin{document}
-------------------------------------------------------------
A B C D
Header Aligned Aligned Aligned
----------- ------- --------------- -------------------------
First row 12.0 Example of a row that
spans multiple lines.
Second row 5.0 Here's another one. Note
the blank line between
rows.
-------------------------------------------------------------
Table: Here's the caption. It, too, may span
multiple lines.
\section{Lorem Ipsun}
Hello world!
-------------------------------------------------------------
A B C D
Header Aligned Aligned Aligned
----------- ------- --------------- -------------------------
First row 12.0 Example of a row that
spans multiple lines.
Second row 5.0 Here's another one. Note
the blank line between
rows.
-------------------------------------------------------------
Table: Here's the caption. It, too, may span
multiple lines.
\end{document}
Desired output
-------------------------------------------------------------
A B C D
Header Aligned Aligned Aligned
----------- ------- --------------- -------------------------
First row 12.0 Example of a row that
spans multiple lines.
Second row 5.0 Here's another one. Note
the blank line between
rows.
-------------------------------------------------------------
Table: Here's the caption. It, too, may span
multiple lines.
-------------------------------------------------------------
A B C D
Header Aligned Aligned Aligned
----------- ------- --------------- -------------------------
First row 12.0 Example of a row that
spans multiple lines.
Second row 5.0 Here's another one. Note
the blank line between
rows.
-------------------------------------------------------------
Table: Here's the caption. It, too, may span
multiple lines.
I want to extract all tables from a LaTeX document.
Pseudocode
- match a run of more than 7 "-" characters, then everything up to "Table:". Include the line with "Table:" but nothing after that line.
- repeat the first step until the end of the file
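The pseudocode above can be sketched as a line-by-line scan. This is only an illustration, written in Python rather than Perl; the names `RULE`, `extract_tables` and `sample` are made up for the example, and it assumes that a table always opens with a line consisting solely of dashes.

```python
import re

# A line consisting only of 8 or more dashes (">7 in a row") opens a table.
RULE = re.compile(r"^-{8,}\s*$")

def extract_tables(text):
    """Collect blocks that run from a dash rule to the next 'Table:' line."""
    tables, current = [], None
    for line in text.splitlines():
        if current is None:
            if RULE.match(line):
                current = [line]                   # a rule starts a new block
        else:
            current.append(line)
            if line.lstrip().startswith("Table:"):
                tables.append("\n".join(current))  # keep the "Table:" line, stop there
                current = None                     # go back to searching
    return tables

# Shortened, hypothetical stand-in for the document shown above.
sample = "\n".join([
    r"\begin{document}",
    "-" * 61,
    "A B C D",
    "----------- ------- --------------- -------------------------",
    "First row 12.0 Example of a row that",
    "-" * 61,
    "Table: Here's the caption. It, too, may span",
    "multiple lines.",
    r"\section{Lorem Ipsun}",
    "Hello world!",
    "-" * 61,
    "Second row 5.0 Here's another one.",
    "-" * 61,
    "Table: Here's the caption.",
    r"\end{document}",
])

tables = extract_tables(sample)
for table in tables:
    print(table, end="\n\n")
```

This follows the pseudocode literally, so a caption continuation line such as "multiple lines." is not captured; the desired output above would need an extra rule for where the caption ends.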
My attempt
The first step
[-]{10,777}$
and then to include everything except the word "Table:"
((?!Table:).)*$
and finally to include everything from the line with "Table:"
^(?=.*?\Table:\b)
All combined
[-]{10,777}$((?!Table:).)*$^(?=.*?\Table:\b)
which does not work. Something is wrong, but I do not know what.
How can I match such an environment with a regex in Perl?
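A few likely reasons the combined attempt fails: `.` does not cross newlines by default, so `((?!Table:).)*` cannot span several lines; `$` and `^` only match at line boundaries when multiline mode is on; `\T` is not a meaningful escape; and the final `(?=...)` is zero-width, so it never adds the "Table:" line to the match. Below is a minimal sketch of one pattern that avoids these problems. It is shown in Python for illustration, the names `TABLE` and `document` are hypothetical, and it assumes each table opens with a line made only of dashes and that the caption line begins with "Table:".

```python
import re

TABLE = re.compile(
    r"^-{8,}[ \t]*\n"        # opening rule: a line made only of 8+ dashes
    r"(?:(?!Table:).*\n)*"   # following lines that do not start with "Table:"
    r"Table:.*$",            # the "Table:" line, up to its end
    re.MULTILINE,            # let ^ and $ match at every line break
)

# Shortened, hypothetical stand-in for the document shown above.
document = "\n".join([
    r"\begin{document}",
    "-" * 61,
    "A B C D",
    "----------- ------- --------------- -------------------------",
    "First row 12.0 Example of a row that",
    "-" * 61,
    "Table: Here's the caption. It, too, may span",
    "multiple lines.",
    r"\section{Lorem Ipsun}",
    "Hello world!",
    "-" * 61,
    "Second row 5.0 Here's another one.",
    "-" * 61,
    "Table: Here's the caption.",
    r"\end{document}",
])

matches = [m.group(0) for m in TABLE.finditer(document)]
for match in matches:
    print(match, end="\n\n")
```

The pattern uses only constructs that Perl regexes also provide, so it should carry over to Perl with the `/m` and `/g` modifiers once the whole file has been read into one string.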
Use pandoc to parse the LaTeX file, then select the interesting tables in the result, then use pandoc again to convert the result back to LaTeX.