I have several large .xml files that I want to parse in order to extract specific elements.
I want to pull out only:
- XML-/title1 and save it to list A (for example)
- XML-/title2 and save it to list B
- XML-/title3 and save it to list C
- etc, etc
Using Python 2.x, which library would be best to import and use? How would I set this up? Any suggestions?
For example:
<PubmedArticle>
<MedlineCitation Owner="NLM" Status="MEDLINE">
<PMID Version="1">8981971</PMID>
<Article PubModel="Print">
<Journal>
<ISSN IssnType="Print">0002-9297</ISSN>
<JournalIssue CitedMedium="Print">
<Volume>60</Volume>
<Issue>1</Issue>
<PubDate>
<Year>1997</Year>
<Month>Jan</Month>
</PubDate>
</JournalIssue>
<Title>American journal of human genetics</Title>
<ISOAbbreviation>Am. J. Hum. Genet.</ISOAbbreviation>
</Journal>
<ArticleTitle>mtDNA and Y chromosome-specific polymorphisms in modern Ojibwa: implications about the origin of their gene pool.</ArticleTitle>
<Pagination>
<MedlinePgn>241-4</MedlinePgn>
</Pagination>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Scozzari</LastName>
<ForeName>R</ForeName>
<Initials>R</Initials>
</Author>
</AuthorList>
</Article>
<MeshHeadingList>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Alleles</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="Y">Y Chromosome</DescriptorName>
</MeshHeading>
</MeshHeadingList>
<OtherID Source="NLM">PMC1712541</OtherID>
</MedlineCitation>
</PubmedArticle>
xml.dom.minidom works for this; it comes with Python and works fine. lxml is another good library, but you'd have to install it.
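A minimal sketch of how this could look with xml.dom.minidom, assuming each value you want is the plain text inside a uniquely named tag such as PMID, ArticleTitle, or DescriptorName. The helper names `text_of` and `collect` and the trimmed `SAMPLE` string are made up for illustration; for a real file you would call `minidom.parse(path)` instead of `parseString`. Note that minidom loads the whole document into memory, which may matter for very large files.

```python
from xml.dom import minidom

# Trimmed copy of the record from the question (illustrative only).
SAMPLE = """<PubmedArticle>
<MedlineCitation Owner="NLM" Status="MEDLINE">
<PMID Version="1">8981971</PMID>
<Article PubModel="Print">
<Journal>
<Title>American journal of human genetics</Title>
</Journal>
<ArticleTitle>mtDNA and Y chromosome-specific polymorphisms in modern Ojibwa: implications about the origin of their gene pool.</ArticleTitle>
</Article>
<MeshHeadingList>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Alleles</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="Y">Y Chromosome</DescriptorName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
</PubmedArticle>"""


def text_of(element):
    """Join the text directly inside an element (hypothetical helper)."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


def collect(doc, tag_name):
    """Return the text of every element with this tag name, in document order."""
    return [text_of(el) for el in doc.getElementsByTagName(tag_name)]


doc = minidom.parseString(SAMPLE)

# One list per tag, as asked for in the question.
pmids = collect(doc, "PMID")
article_titles = collect(doc, "ArticleTitle")
mesh_terms = collect(doc, "DescriptorName")

print(pmids)
print(mesh_terms)
```

`getElementsByTagName` searches the whole tree, so the same calls work when a file holds many PubmedArticle records; each list then contains one entry per matching element across all records.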