I have a data set with three columns:
https://drive.google.com/file/d/1gtCssfAXHxRjGfX8uTAaimGPWCA2cnci/view?usp=sharing
Here are the first few lines:
ID transcript_id go_description
MA_10000213g0010 MA_10000213g0010
MA_10000405g0010 MA_10000405g0010 GO:0006468-protein phosphorylation;GO:0030246-carbohydrate binding;GO:0005524-ATP binding;GO:0004672-protein kinase activity
MA_1000049g0010 MA_1000049g0010
MA_10000516g0010 MA_10000516g0010 GO:0005515-protein binding
MA_10001015g0010 MA_10001015g0010
MA_10001337g0010 MA_10001337g0010
MA_10001425g0010 MA_10001425g0010
MA_10001478g0010 MA_10001478g0010
MA_10001558g0010 MA_10001558g0010
MA_10001g0010 MA_10001g0010
MA_10002030g0010 MA_10002030g0010 GO:0005737-cytoplasm;GO:0000184-nuclear-transcribed mRNA catabolic process, nonsense-mediated decay;GO:0004386-helicase activity;GO:0008270-zinc ion binding;GO:0003677-DNA binding;GO:0005524-ATP binding
MA_10002157g0010 MA_10002157g0010 GO:0006468-protein phosphorylation;GO:0005524-ATP binding;GO:0004672-protein kinase activity
MA_10002549g0010 MA_10002549g0010
MA_10002583g0010 MA_10002583g0010 GO:0008168-methyltransferase activity
MA_10002614g0010 MA_10002614g0010
MA_10002643g0010 MA_10002643g0010 GO:0055114-oxidation-reduction process
In the third column, I would like to remove all the descriptive text and keep only the GO:xxxxxxx identifiers, separated by commas. For example:
GO:0006468, GO:0030246
The first two columns should remain unchanged. How can I do this?
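The question doesn't name a language, so here is a minimal sketch in Python (an assumption). It assumes every GO term in the third column has the standard form `GO:` followed by seven digits. Rather than splitting on `;` and `-`, it pulls out every match of that pattern. This matters for rows such as MA_10002030g0010, whose descriptions themselves contain hyphens and commas ("nuclear-transcribed mRNA catabolic process, nonsense-mediated decay"). The function name `go_ids_only` is hypothetical.

```python
import re

# Assumption: every GO identifier is "GO:" followed by exactly seven digits.
GO_ID = re.compile(r"GO:\d{7}")


def go_ids_only(description: str) -> str:
    """Keep only the GO IDs from one go_description cell, joined by ", "."""
    return ", ".join(GO_ID.findall(description))


example = ("GO:0006468-protein phosphorylation;GO:0030246-carbohydrate binding;"
           "GO:0005524-ATP binding;GO:0004672-protein kinase activity")
print(go_ids_only(example))
# GO:0006468, GO:0030246, GO:0005524, GO:0004672
```

An empty cell (rows with no annotation) simply yields an empty string, so those rows stay blank in the third column.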
GO:0030246-carbohydrate binding;). You can do that directly from Google Sheets: File => Download => Tab Separated Values (.tsv).
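Once the sheet has been exported as a .tsv file, the whole file can be converted in one pass. This is a sketch under stated assumptions, not the only way to do it. It assumes the file is tab-separated with a single header row (`ID`, `transcript_id`, `go_description`). It also assumes that rows without annotations may have an empty or entirely missing third field. The names `convert_rows`, `main` and the script name `go_ids.py` are hypothetical. Columns 1 and 2 and the header pass through unchanged.

```python
import csv
import re
import sys

# Assumption: every GO identifier is "GO:" followed by exactly seven digits.
GO_ID = re.compile(r"GO:\d{7}")


def convert_rows(lines):
    """Yield rows whose third column is reduced to comma-separated GO IDs.

    `lines` is any iterable of tab-separated text lines (e.g. an open file).
    The header and columns 1-2 are yielded unchanged; a row whose third
    column is empty or missing gets an empty third column.
    """
    # QUOTE_NONE: treat any quote characters in descriptions as plain text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is not None:
        yield header
    for row in reader:
        if not row:
            continue  # skip blank lines
        row = row + [""] * (3 - len(row))  # pad rows with no go_description
        row[2] = ", ".join(GO_ID.findall(row[2]))
        yield row


def main(in_path, out_path):
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerows(convert_rows(src))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: python go_ids.py INPUT.tsv OUTPUT.tsv", file=sys.stderr)
```

Because the output keeps tabs as the column separator, the commas inside the third column don't collide with the file's own delimiter, and the result can be re-imported into Google Sheets as TSV.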