QUESTION

I am writing a web scraper in Python and have figured out how to write the output to a CSV, but I'm not getting the number of links I need.

I need to make sure I'm translating relative URLs to absolute URLs and filtering out duplicates, but I'm not sure I have the code to do so. Here's my code below.

from bs4 import BeautifulSoup
import requests
import csv
import re

url = 'https://www.census.gov/programs-surveys/popest.html'

# opening up the connection and grabbing the page
r = requests.get(url).content

# passing the HTML through a parser
soup = BeautifulSoup(r, 'lxml')

# extracting urls
data = []
for link in soup.find_all('a', href=True):
    print(link['href'])
    data.append(link['href'])
print(data)

# writing to a csv file
with open('assignment1.csv', 'w', newline='') as f:
    write = csv.writer(f, delimiter=' ')
    # writerow takes a single row; each row must be a list, otherwise
    # a bare string is split into one column per character
    write.writerow(['Links'])
    write.writerows([href] for href in data)
# the with block closes the file, so no explicit f.close() is needed
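One possible way to handle the two missing steps (relative-to-absolute conversion and duplicate filtering) is sketched below. It uses `urljoin` and `urldefrag` from the standard library's `urllib.parse`. The helper name `absolute_unique_links` is hypothetical, and the choices to skip `#`, `javascript:`, `mailto:`, and `tel:` links and to treat URLs that differ only by fragment as duplicates are assumptions; the assignment may define "duplicate" differently.

```python
from urllib.parse import urljoin, urldefrag


def absolute_unique_links(base_url, hrefs):
    """Resolve each href against base_url and drop duplicates,
    keeping the order in which links first appear.

    Hypothetical helper: the skip rules below are assumptions.
    """
    seen = set()
    result = []
    for href in hrefs:
        href = href.strip()
        # Assumption: in-page anchors and non-web schemes are not wanted.
        if not href or href.startswith(('#', 'javascript:', 'mailto:', 'tel:')):
            continue
        # urljoin turns '/x', 'x', and '//host/x' into absolute URLs;
        # urldefrag drops any '#fragment' so both forms compare equal.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


base = 'https://www.census.gov/programs-surveys/popest.html'
sample = [
    '/data.html',
    'https://www.census.gov/data.html',   # same page, already absolute
    'popest/about.html',                  # relative to the page's directory
    '#content',                           # in-page anchor, skipped
    '/data.html#top',                     # duplicate once fragment is removed
]
links = absolute_unique_links(base, sample)
```

In the scraper above, this could replace the loop that fills `data`, for example `data = absolute_unique_links(url, (a['href'] for a in soup.find_all('a', href=True)))`, before the list is written to the CSV.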
