Showing posts with the label clean files. Show all posts

Monday, December 09, 2013

Easy way to remove the protocol from URLs

A very easy way to remove http:// and https:// from URLs stored in a file.

Let's assume you have a file (allURLS.txt) with one URL per line, and you want to strip the leading http:// or https:// from each one and store the result in the file cleanedUrls.txt.

Here is a very easy way:

grep "^http://" allURLS.txt | cut -b 1-7 --complement > cleanedUrls.txt

grep "^https://" allURLS.txt | cut -b 1-8 --complement >> cleanedUrls.txt
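As an alternative, the same cleanup can be sketched in a single pass with sed. This is only a sketch under a few assumptions: the sample URLs written to allURLS.txt are hypothetical, and `sed -E` (extended regular expressions) is assumed to be available, as it is in GNU and BSD sed. Unlike the grep approach, it keeps the original line order and leaves lines that do not start with http:// or https:// unchanged.

```shell
# Hypothetical sample input, created only for illustration.
printf '%s\n' 'http://example.com/a' 'https://example.org/b' 'ftp://example.net/c' > allURLS.txt

# Remove a leading "http://" or "https://" from each line.
# "#" is used as the delimiter so the slashes need no escaping.
sed -E 's#^https?://##' allURLS.txt > cleanedUrls.txt

cat cleanedUrls.txt
```

Because the pattern is anchored with ^, an http:// that appears later in a line (for example inside a query string) is not touched.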


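cut -b selects bytes i through j of each line (1-7 covers "http://", 1-8 covers "https://"). With the --complement option, which is a GNU cut extension, it prints everything except that range.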
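To see what cut does with a single line, here is a small sketch. The sample URL is a hypothetical value, and the first command assumes GNU cut, since --complement is not part of POSIX cut. The second command gets the same result with an open-ended byte range, which should also work where --complement is not available.

```shell
# Hypothetical sample URL, used only for illustration.
url='http://example.com/page'

# GNU cut: drop bytes 1-7 ("http://") and print the rest of the line.
with_complement=$(printf '%s\n' "$url" | cut -b 1-7 --complement)

# Equivalent without --complement: print from byte 8 to the end of the line.
with_range=$(printf '%s\n' "$url" | cut -b 8-)

echo "$with_complement"
echo "$with_range"
```

Both commands print example.com/page. For https:// URLs the prefix is 8 bytes long, so the range starts at byte 9 instead.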