
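fix(database): fix duplicate-record errors on data insert

- Pass the ignore parameter on batch inserts to avoid duplicate-record errors
- Add the crawl date field crawl_date to the data dictionary
- Keep insert operations stable and preserve data integrity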
charley 1 week ago
parent
commit
980a7dfd73
1 file changed with 3 additions and 2 deletions
cgc_spider/cgc_category_spider.py (+3 -2)

@@ -92,13 +92,14 @@ def get_single_category_info(log, card_type, url, sql_pool):
             "card_type": card_type,
             "category_id": research_category_id,
             "category_name": name,
-            "population_count": population_count
+            "population_count": population_count,
+            "crawl_date": time.strftime("%Y-%m-%d", time.localtime())
         }
         info_list.append(data_dict)
 
     # Save the information to the database
     try:
-        sql_pool.insert_many(table="cgc_category_record", data_list=info_list)
+        sql_pool.insert_many(table="cgc_category_record", data_list=info_list, ignore=True)
     except Exception as e:
         log.error(f"Error inserting data into database: {e}")
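
The implementation of `sql_pool.insert_many` is not part of this diff, so the following is only a minimal sketch of what an `ignore=True` flag plausibly does. It assumes the flag switches the statement to an insert-or-ignore form and that `cgc_category_record` has a unique key on `(category_id, crawl_date)`. Both are assumptions, as are the helper names `make_demo_db`, `insert_many` and `count_rows`. SQLite is used as a stand-in so the example runs without a server; MySQL spells the same idea `INSERT IGNORE`.

```python
import sqlite3
import time


def make_demo_db():
    """Create an in-memory table with an ASSUMED unique key (category_id, crawl_date)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE cgc_category_record (
            card_type TEXT,
            category_id INTEGER,
            category_name TEXT,
            population_count INTEGER,
            crawl_date TEXT,
            UNIQUE (category_id, crawl_date)
        )
        """
    )
    return conn


def insert_many(conn, table, data_list, ignore=False):
    """Hypothetical stand-in for sql_pool.insert_many, not the project's real helper."""
    if not data_list:
        return
    columns = list(data_list[0].keys())
    placeholders = ", ".join("?" for _ in columns)
    # SQLite syntax is INSERT OR IGNORE; the MySQL equivalent is INSERT IGNORE.
    verb = "INSERT OR IGNORE" if ignore else "INSERT"
    sql = f"{verb} INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    rows = [tuple(item[col] for col in columns) for item in data_list]
    conn.executemany(sql, rows)
    conn.commit()


def count_rows(conn, table):
    """Return the number of rows currently stored in the table."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


if __name__ == "__main__":
    conn = make_demo_db()
    today = time.strftime("%Y-%m-%d", time.localtime())
    info_list = [
        {"card_type": "demo", "category_id": 1, "category_name": "A",
         "population_count": 10, "crawl_date": today},
        {"card_type": "demo", "category_id": 2, "category_name": "B",
         "population_count": 20, "crawl_date": today},
    ]
    insert_many(conn, "cgc_category_record", info_list, ignore=True)
    # A second run on the same day hits the unique key and is skipped silently.
    insert_many(conn, "cgc_category_record", info_list, ignore=True)
    print(count_rows(conn, "cgc_category_record"))
```

Under these assumptions the flag only has an effect when the table defines a primary or unique key; without one, repeated runs still produce duplicate rows. If `crawl_date` is part of that key, each category is stored once per day rather than once overall. On MySQL, `INSERT IGNORE` also downgrades some other errors (such as invalid values) to warnings, so the `except` branch in the diff may see fewer failures than before.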