Extracting URLs from files with PowerShell
Recently I needed to extract all URLs from several files. I thought this was a fun little challenge where I could improve my limited skills in PowerShell and regular expressions.
Solution
After working on the problem for a while, I ended up with a short pipeline that fits on a single line.
In short, the script does this:
- Find all files with the given file extensions, including those in subdirectories.
- Read the content of each file.
- Get all strings matching a regular expression pattern that I found in a Stack Overflow thread.
- Loop through the Matches property of each result.
- Select the Value property of each match.
- Sort the output.
- Get all unique values.
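The steps above can be sketched roughly as follows. This is a reconstruction from the list, not the original script: the function name Get-UrlFromFile, the default file extensions, and the simplified URL pattern are all assumptions made for illustration, and the pattern is not the one from the Stack Overflow thread.

```powershell
# Hypothetical helper; the name, default extensions, and pattern are assumptions.
function Get-UrlFromFile {
    param(
        [string]   $Path    = '.',
        [string[]] $Include = @('*.md', '*.txt', '*.html'),
        # Simplified pattern: matches http/https URLs up to whitespace, quotes, or angle brackets.
        # It may also capture trailing punctuation such as a period.
        [string]   $Pattern = 'https?://[^\s"<>]+'
    )

    Get-ChildItem -Path $Path -Recurse -File -Include $Include |  # find files, including subdirectories
        Get-Content |                                             # read each file line by line
        Select-String -Pattern $Pattern -AllMatches |             # keep every match on a line, not just the first
        ForEach-Object { $_.Matches } |                           # loop through the Matches of each result
        Select-Object -ExpandProperty Value |                     # select the matched text
        Sort-Object |                                             # sort so duplicates end up next to each other
        Get-Unique                                                # drop adjacent duplicates
}
```

Calling `Get-UrlFromFile -Path .` would then list each distinct URL once. Get-Unique only compares neighbouring items, which is why the sort has to come first.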
If you want to know how many times each URL occurs, you could use Group-Object instead.
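A minimal sketch of the counting variant might look like this. As before, the function name Get-UrlCount, the default extensions, and the simplified pattern are assumptions, and sorting by count is an extra step added here for readability.

```powershell
# Hypothetical helper; the name, default extensions, and pattern are assumptions.
function Get-UrlCount {
    param(
        [string]   $Path    = '.',
        [string[]] $Include = @('*.md', '*.txt', '*.html'),
        # Simplified pattern, not the one from the Stack Overflow thread.
        [string]   $Pattern = 'https?://[^\s"<>]+'
    )

    Get-ChildItem -Path $Path -Recurse -File -Include $Include |
        Get-Content |
        Select-String -Pattern $Pattern -AllMatches |
        ForEach-Object { $_.Matches } |
        Select-Object -ExpandProperty Value |
        Group-Object -NoElement |                   # one group per URL, keeping only Count and Name
        Sort-Object -Property Count -Descending     # most frequent URLs first (added for readability)
}
```

Each output object has a Count and a Name property, where Name is the URL. Note that Group-Object compares strings case-insensitively by default; adding the -CaseSensitive switch changes that.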
Summary
I am really not an expert on PowerShell, so I learned a bit while doing this. Solving small problems like this is always fun, and I find the solutions especially pleasing when they fit in a single line of code.