0

Situation: I am working in Power Query with data that was imported from a pdf file, and it came in a bit messy. I have a column that contains numbers, as well as text strings. Some of the text strings are mixed-case, containing both upper and lower case characters, while others contain only upper case characters.

Goal: I want to remove all numbers and all mixed-case text strings. The end result should show only the text strings that are completely upper case.

For example, I would want my end result to include things like IRA, IRREVOCABLE TRUST, CHARITABLE TRUST, but replace things like Number of Accounts, Totals, 14 with null.

What I have tried so far:

  1. The following gets rid of numbers and lower case characters, but it doesn't quite work since it keeps upper case characters included in mixed-case character strings.

    Table.AddColumn(#"Added Custom2", "Account Type" each Text.Select([AccountType], {"A".."Z"," "}), type text)

  2. The following code gets rid of the mixed-case text strings, but it doesn't quite work because it doesn't remove the numbers. Also, it is too specific, requiring me to remove strings that contain specific words. I'd prefer to remove all strings that contain lower case characters.

    Table.AddColumn(#"Added Custom2", "Account Type", each if [AccountType]= null or Text.Contains([AccountType],"Totals") or Text.Contains([AccountType],"of") or Text.Contains([AccountType],"report") then null else [AccountType])

Your insights would be appreciated. I'm a new PowerQuery user, so please be specific and detailed with your responses.

2 Answers 2

1

Hard to know exactly what you want since you provide no examples of your data.

  • If you have multiple (space-separated) strings in a cell, then you can use:
   #"Added Custom" = Table.AddColumn(#"Changed Type", "allCaps", each 
        Text.Combine(
            List.Accumulate(Text.Split([Column1]," "),
                            {},
                            (state, current)=>  
                                if List.ContainsAny(
                                        Text.ToList(current),
                                            {"0".."9","a".."z",",",":","?","/","\"," "}) 
                                then state 
                                else state & {current}),", "))
  • If you just have a single string in a cell, then you can use
  #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each 
        if List.ContainsAny(
                Text.ToList([Column1]),
                {"0".."9","a".."z",",",":","?","/","\"," "}) 
        then null 
        else [Column1])

Then you can filter by deselecting the null in the added column

In each case, #"Changed Type" is the previous step. If that is not the case in your code, replace with the actual name of your previous step

Sign up to request clarification or add additional context in comments.

9 Comments

I edited my question to include some examples. In trying the first code you provided, I'm getting a syntax error (token comma expected). This is exactly what I typed: = #"Added Custom" = Table.AddColumn(#"Added Custom2", "Account Type", each Text.Combine(List.Accumulate(Text.Split([AccountTypeHelperColumn]," "), {}, (state, current) => if List.ContainsAny(Text.ToList(current),{"0".."9","a".."z",",",":","?","/","\","," "}) then state else state & {current]),", "))
@Qqqqq You have several errors in what you typed. I suggest you copy/paste from the code I posted, then edit the relevant references. Or enter it into a add Column dialog (starting after the each) and edit the relevant references there.
Hmm, okay. I used copy/paste and the syntax error is gone, but the resulting column only shows Error in every cell.
@Qqqqq and what is the error?
How can I tell?
|
0

First formula just looks for all capitals that dont contain a number

= if [Column1] = Text.Remove ([Column1],{"0".."9","a".."z"}) then [Column1] else null

Second formula removes all numbers, THEN looks for all capitals that dont contain a number

= if Text.Remove ([Column1],{"0".."9"}) = Text.Remove ([Column1],{"a".."z","0".."9"}) then Text.Remove ([Column1],{"0".."9"}) else null

enter image description here

let  Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each if [Column1] = Text.Remove ([Column1],{"0".."9","a".."z"}) then [Column1] else null),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom2", each if Text.Remove ([Column1],{"0".."9"}) = Text.Remove ([Column1],{"a".."z","0".."9"}) then Text.Remove ([Column1],{"0".."9"}) else null)
in  #"Added Custom1"

~ ~ ~

If you are looking to parse words out of a list of words.......

This retains words that (a) have no numbers before after or within, and (b) are all in capitals

enter image description here

let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each 
Text.Combine(
    List.RemoveNulls(
        List.Transform(Text.Split([Column1]," "), each 
        if  _ = Text.Remove (_,{"a".."z","0".."9"}) then _ else null
        ))," "))
in #"Added Custom"

This removes all numbers, and after that retains words that are all in capitals

enter image description here

let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each 
Text.Combine(
    List.RemoveNulls(
        List.Transform(Text.Split(Text.Remove ([Column1],{"0".."9"})," "), each 
        if  _ = Text.Remove (_,{"a".."z"}) then _ else null
        ))," "))
in #"Added Custom"

2 Comments

The first formula is probably more like what I'm looking for. I don't need to parse anything, just eliminate everything that isn't already an upper-case text string. Unfortunately, when I tried to use your formula, I got a column full of Error values.
maybe show Ron and I your sample input data and current code. We know our code works, its a question of implementation

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.