Django models for storing web scraper data

Django -- Posted on Oct. 12, 2023

  • Element:

    • Represents an HTML element.
    • It has a name field, which is a character field with a maximum length of 200.
    • Inherits from the abstract Timestamped model, which adds created and updated timestamp fields.
    • Defines a custom string representation using the __str__ method.
  • Domain:

    • Represents a domain.
    • It has a name field, which is a character field with a maximum length of 200.
    • Inherits from the Timestamped model.
    • Defines a custom string representation using the __str__ method.
  • Page:

    • Represents a web page.
    • It has the following fields:
      • url: a URL field with a maximum length of 2048 characters.
      • domain: a foreign key to the Domain model, indicating the domain to which the page belongs.
      • pages: a non-symmetrical, self-referential many-to-many relationship to other Page instances, with children as the reverse accessor.
      • response: a text field for storing the page's content.
      • elements: a many-to-many relationship with the Element model through the PageElement intermediary model.
    • Inherits from the Timestamped model.
    • Defines a custom string representation using the __str__ method.
  • PageElement:

    • Serves as an intermediary model between Page and Element to associate elements with pages and store additional information.
    • It has the following fields:
      • page: a foreign key to the Page model.
      • element: a foreign key to the Element model.
      • value: a text field for storing additional information related to the element on the page.
    • Inherits from the Timestamped model.
  • Attribute:

    • Represents attributes associated with an Element.
    • It has the following fields:
      • element: a foreign key to the Element model, indicating the element to which the attribute belongs.
      • name: a character field for the attribute name with a maximum length of 255 characters.
      • value: a text field for storing the attribute's value.
    • Inherits from the Timestamped model.
    • Defines a custom string representation using the __str__ method.
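
Because every Page points at a Domain, the scraper needs some way to turn a page URL into a domain name before saving. The post does not show how this is done; the following is a minimal sketch using only the standard library. The helper name domain_name is hypothetical, and using the lowercased hostname as the Domain name is an assumption.

```python
from urllib.parse import urlsplit


def domain_name(url: str) -> str:
    """Return the hostname of ``url``, or '' when the URL has none.

    Hypothetical helper, not part of the models in this post.
    """
    # urlsplit(...).hostname strips any port and userinfo and lowercases the host.
    return urlsplit(url).hostname or ""


if __name__ == "__main__":
    print(domain_name("https://Example.com:8080/blog/post?id=1"))
```

The returned string could then be handed to something like Domain.objects.get_or_create(name=...) so that each domain is stored only once.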

from django.db import models

# Create your models here.

class Timestamped(models.Model):
    created = models.DateTimeField(auto_now_add=True)
    updated = models.DateTimeField(auto_now=True)

    class Meta:
        abstract = True

class Element(Timestamped):
    name = models.CharField(max_length=200)

    class Meta:
        default_related_name = 'elements'
        verbose_name = 'element'
        verbose_name_plural = 'elements'

    def __str__(self):
        # Assumption: the name is used as the display label.
        return self.name

class Domain(Timestamped):
    name = models.CharField(max_length=200)

    class Meta:
        default_related_name = 'domains'
        verbose_name = 'domain'
        verbose_name_plural = 'domains'

    def __str__(self):
        # Assumption: the name is used as the display label.
        return self.name

class Page(Timestamped):
    url = models.URLField(max_length=2048)
    domain = models.ForeignKey("Domain", on_delete=models.CASCADE)
    pages = models.ManyToManyField("self", symmetrical=False, related_name='children')
    response = models.TextField(null=True, blank=True)
    elements = models.ManyToManyField("Element", through="PageElement")

    class Meta:
        default_related_name = 'pages'
        verbose_name = 'page'
        verbose_name_plural = 'pages'

    def __str__(self):
        return self.url

class PageElement(Timestamped):
    page = models.ForeignKey("Page", on_delete=models.CASCADE)
    element = models.ForeignKey("Element", on_delete=models.CASCADE)
    value = models.TextField(null=True, blank=True)


class Attribute(Timestamped):
    element = models.ForeignKey("Element", on_delete=models.CASCADE)
    name = models.CharField(max_length=255)
    value = models.TextField(null=True, blank=True)

    class Meta:
        default_related_name = 'attributes'
        verbose_name = 'attribute'
        verbose_name_plural = 'attributes'

    def __str__(self):
        # Assumption: the attribute name is used as the display label.
        return self.name

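To give a feel for the data these models are meant to hold, here is a rough sketch of pulling element names, attributes, and text out of a page's HTML with the standard library. ElementCollector and collect_elements are hypothetical names, and the mapping in the comments (name to Element.name, attributes to Attribute rows, value to PageElement.value) is an assumption about how the scraper fills the tables, not something the post states.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they cannot contain text.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ElementCollector(HTMLParser):
    """Collect one record per start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.records = []  # every element seen so far
        self._open = []    # stack of elements still waiting for their end tag

    def handle_starttag(self, tag, attrs):
        record = {
            "name": tag,                # assumed to map to Element.name
            "attributes": dict(attrs),  # assumed to map to Attribute rows
            "value": "",                # assumed to map to PageElement.value
        }
        self.records.append(record)
        if tag not in VOID_TAGS:
            self._open.append(record)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, along with
        # anything left unclosed inside it. Unmatched end tags are ignored.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i]["name"] == tag:
                del self._open[i:]
                break

    def handle_data(self, data):
        # Attach non-blank text to the innermost open element.
        text = data.strip()
        if text and self._open:
            self._open[-1]["value"] += text


def collect_elements(html):
    """Parse ``html`` and return the collected element records."""
    collector = ElementCollector()
    collector.feed(html)
    collector.close()
    return collector.records


if __name__ == "__main__":
    sample = '<div class="post"><a href="/about">About</a><img src="logo.png"></div>'
    for record in collect_elements(sample):
        print(record)
```

Each record could then be saved through the ORM, for example by looking up or creating the Element by name and adding a PageElement row that links it to the Page.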