I am writing a web scraper in Python and I have figured out how to write the output to a CSV, but I'm not getting the right number of links. I need to make sure I'm translating relative URLs to absolute URLs and filtering out duplicates, but I'm not sure I have the code to do so. Here's my code below.
from bs4 import BeautifulSoup
import requests
import csv
url = 'https://www.census.gov/programs-surveys/popest.html'
#opening up the connection and grabbing the page
r = requests.get(url).content
#passing the HTML through a parser
soup = BeautifulSoup(r, 'lxml')
#extracting urls
data = []
for link in soup.find_all('a', href=True):
    print(link['href'])
    data.append(link['href'])
print(data)
#writing to a csv file
with open('assignment1.csv', 'w', newline='') as f:
    write = csv.writer(f)
    write.writerow(['Links'])                  # writerow (singular) for the one header row
    write.writerows([link] for link in data)   # wrap each link in a list so it lands in one cell
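For context, here is the kind of logic I think I'm missing, using `urllib.parse.urljoin` to resolve relative hrefs against the page URL and a set to drop duplicates. This is just a sketch with a few made-up sample hrefs standing in for the scraped ones, not something I've run against the live page:

```python
from urllib.parse import urljoin

base_url = 'https://www.census.gov/programs-surveys/popest.html'
# Sample hrefs standing in for the ones scraped from the page (hypothetical)
hrefs = ['/data/tables.html', 'popest.html',
         'https://www.census.gov/', '/data/tables.html']

seen = set()
absolute_links = []
for href in hrefs:
    full = urljoin(base_url, href)   # resolves relative paths against the page URL
    if full not in seen:             # keep only the first occurrence of each URL
        seen.add(full)
        absolute_links.append(full)

print(absolute_links)
```

If this is right, I'd swap `hrefs` for the `link['href']` values in my loop and write `absolute_links` to the CSV instead of `data`.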