I have a .toc (table of contents) file generated from my .tex document.
It contains many lines, and some of them have the form
\contentsline {part}{Some title here\hfil }{5}
\contentsline {chapter}{\numberline {}Person name here}{5}
I know how to grep for the part lines and for the chapter lines, but I'd like to select those lines and write the output to a csv file like this:
{Some title here},{Person name here},{5}
or, without braces:
Some title here,Person name here,5
1. The number in the last pair of braces (the page number) is the same on both lines, so we only need to take it from the second one.
2. Note that a pair {} may be empty, or it may contain another, nested pair {}. For example, a line could be
\contentsline {part}{Title with math $\frac{a}{b}$\hfil }{15}
which should be filtered as
Title with math $\frac{a}{b}$
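Nested braces are less of a problem than they look when the part lines always have the same fixed ending. Here is a minimal sed sketch, assuming every part line ends exactly in "\hfil }{<page>}" as in the samples above: the greedy \(.*\) is anchored on that ending, so \frac{a}{b} is kept intact. The variable names are only for illustration.

```shell
#!/bin/sh
# Sketch: turn one part line into "title,page" with a single sed substitution.
# Assumption: the line ends exactly in "\hfil }{<page>}", as in the question.
# The greedy \(.*\) stops right before that fixed ending, so nested pairs
# such as \frac{a}{b} inside the title need no special handling.
line='\contentsline {part}{Title with math $\frac{a}{b}$\hfil }{15}'
part=$(printf '%s\n' "$line" |
  sed -n 's/^\\contentsline {part}{\(.*\)\\hfil }{\([^{}]*\)}$/\1,\2/p')
# printf rather than echo: some echo implementations would turn "\f" into a form feed
printf '%s\n' "$part"
```

On a real file the same command would read sed -n '...' file.toc; the -n/p pair prints only the lines that matched, so no grep is needed.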
edit 1: I was able to extract the page numbers at the end of the part lines, without braces, using
grep '{part}' file.toc | awk -F '[{}]' '{print $(NF-1)}'
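The grep and the awk above can be merged into one awk call. With -F '[{}]', field 1 is "\contentsline " and field 2 is the entry type ("part", "chapter", ...), so the test $2 == "part" does the job of grep. This is a sketch assuming, like the original command, that each line ends with the page group. If hyperref is loaded, each \contentsline gets a fourth argument (the anchor name) and $(NF-1) would pick that up instead. The sample lines stand in for file.toc.

```shell
#!/bin/sh
# Sketch: edit 1 as a single awk call; $2 is the entry type when splitting on braces.
# Assumption: no fourth (hyperref) argument, so the page is the last brace group.
pages=$(printf '%s\n' \
    '\contentsline {part}{Some title here\hfil }{5}' \
    '\contentsline {chapter}{\numberline {}Person name here}{5}' \
    '\contentsline {part}{Title with math $\frac{a}{b}$\hfil }{15}' |
  awk -F '[{}]' '$2 == "part" { print $(NF-1) }')
printf '%s\n' "$pages"
```

On the real file: awk -F '[{}]' '$2 == "part" { print $(NF-1) }' file.toc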
edit 2: I was able to filter the chapter lines and strip the unwanted markup with
grep '{chapter}' file.toc | sed 's/\\numberline//' | sed 's/\\contentsline//' | sed 's/{chapter}//' | sed 's/{}//' | sed 's/^ {/{/'
and the output (ignoring blank spaces) was
{Person name here}{5}
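The five chained sed calls can be replaced by a single substitution that captures just the name and the page. A minimal sketch, assuming chapter lines look exactly like "\contentsline {chapter}{\numberline {}NAME}{PAGE}" (same spacing, empty \numberline argument); other chapter lines are simply not printed.

```shell
#!/bin/sh
# Sketch: one sed call instead of the grep + five sed pipeline of edit 2.
# Assumption: chapter lines have exactly the shape shown in the question.
# In a BRE a bare "{" or "}" is a literal brace, so no escaping is needed.
line='\contentsline {chapter}{\numberline {}Person name here}{5}'
chapter=$(printf '%s\n' "$line" |
  sed -n 's/^\\contentsline {chapter}{\\numberline {}\(.*\)}{\([^{}]*\)}$/\1,\2/p')
printf '%s\n' "$chapter"
```

This also removes the leading blank that the original pipeline left inside the braces.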
edit 3: I was able to filter the part lines and clean the output with a similar grep/sed pipeline. For the line
\contentsline {part}{Title with math $\frac{a}{b}$\hfil }{15}
it returns
{Title with math $\frac{a}{b}$}{15}
…part and chapter and then filter the data to collect the names and page numbers, so that the final csv file looks like {Some title here},{Person name here},{5} (with commas, with or without braces). I don't know how to put all three pieces of information together on a single line of a csv file.

… awk? I'd probably use the Text::Balanced Perl module, as that has an extract_bracketed call, or there might be other modules that know how to parse TeX.

… part has several chapters, but there is one thing that bugs me. What does the {5} do at the end? Is it always a 5? I always saw \contentsline{chapter}{title}{}, i.e. that last argument was always empty.

… part begins. You are right that a part could contain a lot of chapters. In my case it contains only one, so a part line is always immediately followed by a chapter line.
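Putting the pieces together: a minimal awk sketch, assuming (as stated above) that every part line is immediately followed by its chapter line. Instead of a Perl module it tracks brace depth by hand, so nested groups like \frac{a}{b} survive; following note 1, the page is taken from the chapter line. The names toc_to_csv and arg, and the second chapter name "Another name", are hypothetical and only serve the example. Fields are not CSV-quoted, so titles containing commas would need extra handling.

```shell
#!/bin/sh
# Sketch: print "part title,chapter name,page" for each part/chapter pair.
# Assumption: each \contentsline {part} line is directly followed by its
# \contentsline {chapter} line. No CSV quoting is done.
toc_to_csv() {   # hypothetical helper name
  awk '
  # Text inside the n-th outermost {...} group of s; nested braces are kept.
  function arg(s, n,    i, c, depth, k, out) {
      depth = 0; k = 0; out = ""
      for (i = 1; i <= length(s); i++) {
          c = substr(s, i, 1)
          if (c == "{") {
              if (depth++ == 0) { k++; continue }      # opens an outer group
          } else if (c == "}") {
              if (--depth == 0) {                      # closes an outer group
                  if (k == n) return out
                  continue
              }
          }
          if (depth > 0 && k == n) out = out c
      }
      return out
  }
  /^\\contentsline *[{]part[}]/ {
      title = arg($0, 2)
      sub(/\\hfil *$/, "", title)                      # drop trailing \hfil
      have_part = 1
      next
  }
  /^\\contentsline *[{]chapter[}]/ && have_part {
      name = arg($0, 2)
      sub(/^\\numberline *[{][^{}]*[}]/, "", name)     # drop \numberline {...}
      print title "," name "," arg($0, 3)              # page from the chapter line
      have_part = 0
  }' "$@"
}

csv=$(printf '%s\n' \
  '\contentsline {part}{Some title here\hfil }{5}' \
  '\contentsline {chapter}{\numberline {}Person name here}{5}' \
  '\contentsline {part}{Title with math $\frac{a}{b}$\hfil }{15}' \
  '\contentsline {chapter}{\numberline {}Another name}{15}' | toc_to_csv)
printf '%s\n' "$csv"
```

On a real file this would be toc_to_csv file.toc > file.csv. To get the braced variant, the print line could wrap each field as "{" title "}" and so on.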