Repo for storing data cleaning tips, tricks, tools
=LEFT(text, [num_chars])
i.e. to get the first 4 characters from the beginning of a text string, use this formula:
=LEFT(A2,4)
=RIGHT(text, [num_chars])
i.e. to get the last 4 characters from the end of a string, use this formula:
=RIGHT(A2,4)
=MID(text, start_num, num_chars)
i.e. to get three characters from the middle of a string beginning with the 6th character, you use the following formula:
=MID(A2,6,3)
This is hacky
=MID(A1,FIND("DISTRICT ",A1)+9,99)
Where we are trying to extract everything after the word "district." +9 is the number of characters in the word we're looking for and 99 will pull out 99 characters after that word.
=LEFT(A2, SEARCH("-",A2)-1)
i.e.
=RIGHT(A6,LEN(A6)-SEARCH(" ",A6))
MID(cell, SEARCH("char", cell)+1, SEARCH ("char", cell, SEARCH ("char", cell)+1) - SEARCH ("char", cell)-1)
i.e. to extract text surrounded by two hyphens (EC1-1BB2-AC12), you'd use this formula:
=MID(A2, SEARCH("-",A2) + 1, SEARCH("-",A2,SEARCH("-",A2)+1) - SEARCH("-",A2) - 1)
=LEFT(K3,SEARCH(",",K3)-1)
This looks at this cell: P.O. BOX 76655, 20013
and brings over all of the text up until one character before it sees the comma!
First, find where to start the extraction by determining the position of the first digit with this generic formula:
MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},cell&"0123456789"))
Then, use the RIGHT function to extract the number.
=RIGHT(A2, LEN(A2)-B2+1)
(Where B2 is the position of the first digit.)
=LEFT(A1,MIN(FIND({0,1,2,3,4,5,6,7,8,9},A1&"0123456789"))-1)
=LEFT(C8,FIND("*",SUBSTITUTE(C8," ","*",LEN(C8)-LEN(SUBSTITUTE(C8," ",""))))-1)
=RIGHT(C2,LEN(C2)-FIND("*",SUBSTITUTE(C2," ","*",LEN(C2)-LEN(SUBSTITUTE(C2," ","")))))
=ISNUMBER(SEARCH("apple",A1))
This would search for the word "apple" in cell A1.
=COUNTIF(J:J,J2)>1
This would look through all of column J for repeat instances of string in J2. All duplicated cells will be marked TRUE.
=IF(COUNTIF($J$2:$J2,J2)>1,"DUP","")
This would look through all of column J for repeat instances of string in J2. The first instance of a duplicated string will be blank, but all subsequent inastanced will be marked "DUP".
Check if a cell has "this" or "that," then take a particular action. If the cells do not, then do something else.
=IF(OR(B2=D2,B2>100)B2*D2,FALSE)
This checks if B2 and D2 are equal, or if B2 is greater than 100. If either of these are true, then the cell will calculate B2xD2, if not it will return false. (I have not tested this example.)
=IF(AND(B2=D2,B2>100)B2*D2,FALSE)
For this example, both conditions would need to be true in order to multiply B2 and D2, otherwise it will return FALSE.
`=C2&" "&C3' This combines the string from C2 and C3 together with a space between.
=CONCATENATE(A1, ", ", B1)
or
=A1 & ", " & B1
=B2&TEXT(C2,REPT("0",3))
In this sample B2 has text and C2 has 3 numbers with leading zeros. the "3" indicates how many of the "0" to add.
=EXACT(T2,E2)
-
Insert/PivotTable/OK
-
Typically, add field(s) to Rows
-
Add calculation field to Values & adjust calculation as needed
-
Add a unique count to a pivot table:
When inserting pivot, click tick box "add this data to the Data Model" and this will add "distinct count" to your Value Field Settings for the variable you are counting.
A1: Smith, John
A2: 111 Pine St.
A3: San Diego, CA
A4: (555) 128-549
A5: Jones, Sue
A6: 222 Oak Ln.
A7: New York, NY
A8: (555) 238-1845
A9: Anderson, Tom
A10: 333 Cherry Ave.
A11: Chicago, IL
A12: (555) 581-4914
In cell C1, type:
=OFFSET($A$1,(ROW()-1)*4+INT((COLUMN()-3)),MOD(COLUMN()-3,1))
then fill down to row 3.
In empty column:
=TEXT(B1,"yyyymm")
or
=TEXT(B1,"mm/yyyy")
, etc.
=TEXT(B1,"mmm")
=VLOOKUP(B1,E2:E100,1,FALSE)
In this sample, B1 refers to the cell with the value we're trying to match. E2:E100 is the range of values we're looking in to find the matching value (this can also span multiple columns - E2:G50 - or sheets - Sheet2!B2:B50. The number 1 refers to the 1st column and FALSE tells it we're looking for an exact match, where TRUE would tell it that it has to be a close match.
=MAXIFS($T$2:$T$29648,$R$2:$R$29648,R2)
In this sample, T2:T29648 (max_range) refers to the range of cells where you want to find the max values in. R2:R29648 (criteria_range) is the range within you want to identify your sub-groups (i.e.: it will group together cells/rows where this value is the same). R2 (criteria) is the criteria I will be using to find all other matching uses within the criteria_range. You can use multiple criteria_range/criteria cases to the one formula. This was used for finding the highest vote count for each Race_ID.
You can add multiple criteria too
=MAXIFS($E$2:$E$29659,$D$2:$D$29659,D2,$S$2:$S$29659,S2)
This essentially would group matching items first in column D and then group matching items in column S.
=IF(ISNUMBER(SEARCH("District",E2)),E2,"No")
In this sample, E2 is the cell we're searching in for the string "District." Not an exact match; E2 can contain other letters/numbers.
{=MEDIAN(IF($B$2:$B$31=B2,$C$2:$C$31))}
To get curly brackets, press Ctrl + Shift + Enter at end of formula. This tells Excel that this is an array. This is key.
This forumal is saying look in the range B2 - B31 and if you find a match to B2 then use the corresponding cell in C2 - C31.
Then, in your pivot table add your median column and change the value field settings to Average.
Hold CTRL down when highlighting what you want to paste. Then, when you paste - it should put each item in the list in its own cell.
Open command prompt and file explorer.
cd into correct file location.
Type:
dir /b > filenames.txt
txt file with all file names with be located in the folder.