Network Driver in Linux 2.4
潘仁義
CCU COMM
Overview
Auto Configuration
I/O access
Byte ordering
Address translation
Bus cycles
Bus
Direct Memory Access
Power management
Operating System
Driver framework
Timer management
Memory management
Race condition handling (SMP)
CPU/Memory cache consistency
Device
Device operations
Interrupt handling
Outline
Driver framework
Linux network drivers
Device operation
RTL8139 programming
Driver example
A piece of code for 93C46 series
EEPROM, 93C46 64 x 16 bits, 93C66 256 x 16 bits
pci_skeleton.c (for RTL8139)
Linux network driver framework
Connecting to the Kernel (1/2)
Module loading
struct net_device snull_dev = { init : snull_init, }; // initialization function
if ((result = register_netdev(&snull_dev))) printk("error");
Before the call, set name to "eth%d" so that the kernel assigns an "ethX" name
register_netdev() calls dev->init() internally
snull_init( )
Probe function
Called from register_netdev()
Usually avoids claiming I/O ports and the IRQ; that is delayed until dev->open() time
Fills in the "dev" structure
ether_setup(dev)
Sets up the private data structure "priv"; the network interface lives as long as the system, so statistics can be kept there
Module unloading
kfree(priv);
unregister_netdev(&snull_dev);
Linux network driver framework
Connecting to the Kernel (2/2)
struct net_device {
char name[IFNAMSIZ]; // eth%d
unsigned long base_addr; unsigned char irq;
unsigned char broadcast[MAX_ADDR_LEN], dev_addr[MAX_ADDR_LEN];
unsigned short flags; // IFF_UP, IFF_PROMISC, IFF_ALLMULTI
Function pointers:
(*init) initialization
(*open) bring the interface up
(*stop) shut the interface down
(*do_ioctl)()
(*tx_timeout) transmit timeout handling
(*get_stats) report statistics
(*hard_start_xmit) send a packet
(*set_multicast_list) handle multicast list and flag changes
unsigned long trans_start, last_rx; // for watchdog and power management
struct dev_mc_list *mc_list; // multicast address list
Linux network driver framework
Opening and closing
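Before an interface can carry packets, it must be brought up with ifconfig and given an IP address
When ifconfig assigns an IP address to the interface:
ioctl(SIOCSIFADDR) sets the software address of the interface
ioctl(SIOCSIFFLAGS) asks the driver to bring the interface up or down, which triggers open and stop
open()
Acquire the necessary system resources (claim the IRQ, I/O base, buffers)
Tell the interface hardware to start
Read the MAC address and copy it to dev->dev_addr (this can also be done at init or probe time)
Write dev->dev_addr into the interface's MAC registers
stop()
Stop the interface hardware
Release the system resources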
Linux network driver framework
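Packet transmission: when the kernel needs to send a data packet
It places the data on the outgoing queue
It calls the device method
hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
The method only hands the packet to the NIC; the NIC later puts it on the network (e.g. RTL8139)
spinlock_t xmit_lock; the method can be called again only after it has returned
In practice, the NIC is still busy transmitting the packet it was just given after the method returns
The NIC buffer is small; when it is full the kernel must be told not to issue new transmit requests
netif_stop_queue(), netif_wake_queue() and netif_start_queue()
Note: there are also netif_carrier_on/off() for carrier loss detection/watchdog
and netif_device_attach/detach() for hot-plugging/power management
Every packet the kernel handles is wrapped in a struct sk_buff
socket buffer
A pointer to an sk_buff is usually named skb
skb->data points to the packet about to be sent
skb->len is the length of that packet, in octets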
Linux network driver framework
Transmission queuing model
If ( present &&
carrier_ok && queue_stopped &&
( jiffies - trans_start ) > watchdog_timeo
) Then
Call tx_timeout( )
Update statistics and reconfigure the device so that packets can be sent again
Figure: packets from the OS pass three checks before they go to the LAN
Present? netif_device_attach() / netif_device_detach()
Queue stopped? netif_start_queue() / netif_wake_queue() / netif_stop_queue()
Carrier ok? netif_carrier_on() / netif_carrier_off()
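The watchdog condition above can be tried out in user space. The following is a minimal sketch, not kernel code: struct txq_state and tx_watchdog_expired() are hypothetical names that stand in for the state the kernel keeps per device, and jiffies is passed in as a plain argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-device state the kernel consults. */
struct txq_state {
    bool present;                 /* changed by netif_device_attach()/detach() */
    bool carrier_ok;              /* changed by netif_carrier_on()/off() */
    bool queue_stopped;           /* changed by netif_stop_queue()/wake_queue() */
    unsigned long trans_start;    /* jiffies value at the last transmit */
    unsigned long watchdog_timeo; /* timeout, in jiffies */
};

/* Returns true when the condition for calling tx_timeout() holds. */
bool tx_watchdog_expired(const struct txq_state *s, unsigned long jiffies)
{
    assert(s != NULL);
    /* Unsigned subtraction keeps working when the jiffies counter wraps. */
    return s->present && s->carrier_ok && s->queue_stopped &&
           (jiffies - s->trans_start) > s->watchdog_timeo;
}
```

A queue that is not stopped never triggers the timeout, which is why a driver that forgets to call netif_wake_queue() is the usual way to reach tx_timeout().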
Linux network driver framework
Packet reception
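Packet reception usually starts with an interrupt raised by the network hardware
The work is mostly done in the interrupt handler
Allocate an sk_buff and hand it to the kernel's networking subsystem
Interrupt-based reception is more efficient than polling
Example: snull_rx()
skb = dev_alloc_skb(len+2); // uses GFP_ATOMIC, so it is usable inside an ISR
skb_reserve(skb, 2); // align the IP header on a 16-byte boundary
memcpy(skb_put(skb, len), receive_packet, len); // skb_put(): see the sk_buff slide
Fill in the related fields
skb->dev = dev;
skb->protocol = eth_type_trans(skb, dev);
skb->ip_summed = CHECKSUM_UNNECESSARY; /* no need to check */
CHECKSUM_HW (computed by hardware) / NONE (still to be computed, default) / UNNECESSARY (not computed)
netif_rx(skb); // hand over to the kernel's networking subsystem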
Linux network driver framework
The interrupt handler
An interrupt happens when
A new packet has arrived
Transmission of an outgoing packet is completed
Something happened: PCI bus error, cable length change, time out
Interrupt status register (ISR)
Packet reception
Pass to the kernel
Packet transmission is completed
Reset the transmit buffer of the interface
Statistics
Linux network driver framework
The socket buffers (struct sk_buff)
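Figure: an empty sk_buff, with pointers head, data, tail and end; headroom lies between head and data, the payload (len bytes) between data and tail, and tailroom between tail and end
struct sk_buff *dev_alloc_skb(len) allocate
void dev_kfree_skb(struct sk_buff *) free
void skb_reserve(skb, len) reserve headroom
unsigned char *skb_put(skb, len) append data
unsigned char *skb_push(skb, len) prepend data
unsigned char *skb_pull(skb, len) remove data from the front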
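The pointer arithmetic behind these calls is easier to follow in a small user-space model. This is only a sketch under stated assumptions: struct mini_skb and the mini_* functions are hypothetical and keep just the four pointers and len; the real struct sk_buff has many more fields and different failure handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature of struct sk_buff: four pointers plus len. */
struct mini_skb {
    unsigned char *head, *data, *tail, *end;
    size_t len;
};

struct mini_skb *mini_alloc_skb(size_t size)
{
    struct mini_skb *skb = malloc(sizeof(*skb));
    if (skb == NULL)
        return NULL;
    skb->head = malloc(size);
    if (skb->head == NULL) {
        free(skb);
        return NULL;
    }
    skb->data = skb->tail = skb->head;  /* empty: no headroom, no payload */
    skb->end = skb->head + size;
    skb->len = 0;
    return skb;
}

void mini_kfree_skb(struct mini_skb *skb)
{
    if (skb != NULL) {
        free(skb->head);
        free(skb);
    }
}

/* Reserve headroom; only meaningful while the buffer is still empty. */
void mini_skb_reserve(struct mini_skb *skb, size_t len)
{
    assert(skb->len == 0 && (size_t)(skb->end - skb->tail) >= len);
    skb->data += len;
    skb->tail += len;
}

/* Append len bytes at the tail; returns where the caller writes them. */
unsigned char *mini_skb_put(struct mini_skb *skb, size_t len)
{
    unsigned char *old_tail = skb->tail;
    assert((size_t)(skb->end - skb->tail) >= len);
    skb->tail += len;
    skb->len += len;
    return old_tail;
}

/* Prepend len bytes by growing into the headroom. */
unsigned char *mini_skb_push(struct mini_skb *skb, size_t len)
{
    assert((size_t)(skb->data - skb->head) >= len);
    skb->data -= len;
    skb->len += len;
    return skb->data;
}

/* Remove len bytes from the front, e.g. a header that has been consumed. */
unsigned char *mini_skb_pull(struct mini_skb *skb, size_t len)
{
    assert(len <= skb->len);
    skb->data += len;
    skb->len -= len;
    return skb->data;
}
```

For example, after mini_skb_reserve(skb, 2) and mini_skb_put(skb, 20) on a 64-byte buffer, data sits 2 bytes past head and 42 bytes of tailroom remain.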
Linux network driver framework
Setup receive mode and multicast accept list
Unicast, broadcast (all 1), multicast (bit0==1)
Receive all, receive all multicast, receive a list of multicast address
Transmit
the same as unicast
Receive
Hardware filtering for a list of multicast addresses
void (*set_multicast_list)(dev)
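Called by the kernel when the list of multicast addresses to receive or dev->flags changes
struct dev_mc_list *mc_list; // int mc_count
Linked list of all multicast addresses that dev must receive
IFF_PROMISC
When set, enter "promiscuous mode" (receive everything)
IFF_ALLMULTI
Receive all multicast packets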
RTL8139 block diagram
Device operation
RTL8139(A/B) programming
Packet transmission
4 transmit descriptors in round-robin
Transmit FIFO and Early Transmit
Packet reception
Ring buffer in physically contiguous memory
Receive FIFO and FIFO Threshold
Hardware initialization
Command register (0x37)
Reset (4) / Transmit Enable (2) / Receive Enable (3) / Buffer empty (0)
Transmit (Tx) Configuration Register (0x40~0x43)
Interframe Gap time (Realtek "crab card") (25~24)
Receive (Rx) Configuration Register (0x44~0x47)
Rx FIFO threshold (15~13)
Accept Broadcast (3) / Multicast (2) / All (0, Promiscuous mode) packet
Rx buffer length (12~11)
Interrupt Mask Register (0x3C~0x3D)
Software initialization (TxDescriptor and Ring buffer)
RTL8139 Packet transmission
Transmit descriptor
Transmit start address (TSAD0-3)
The physical address of packet
The packet must be in physically contiguous memory
Transmit status(TSD0-3)
TOK(15R)
Set to 1 indicates packet transmission was completed successfully and
no transmit underrun (14R) has occurred
OWN(13R/W)
Set to 1 when the Tx DMA operation of this descriptor was completed
The driver must set this bit to 0 when the “Size” is written
Size(12~0R/W)
The total size in bytes of the data in this descriptor
Early Tx Threshold(21~16R/W)
When the byte count in the Tx FIFO reaches this level, transmission starts.
From 000001 to 111111 in unit of 32 bytes (000000 = 8 bytes)
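The TSD bit layout above can be packed into a register value as sketched here. The helper names tsd_make() and tsd_tx_ok() are hypothetical and only illustrate the field positions listed on this slide; a real driver would write the result to TSD0-3 through its MMIO macros.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions taken from the TSD description above. */
#define TSD_SIZE_MASK   0x1FFFu     /* bits 12~0: size in bytes */
#define TSD_OWN         (1u << 13)  /* set by the chip when Tx DMA is done */
#define TSD_TUN         (1u << 14)  /* transmit underrun */
#define TSD_TOK         (1u << 15)  /* transmit OK */
#define TSD_ERTX_SHIFT  16          /* bits 21~16: early Tx threshold */
#define TSD_ERTX_MASK   0x3Fu

/* Build the value written to a TSD register to start a transmit.
 * threshold_bytes is rounded down to the 32-byte unit (a field value of 0
 * means 8 bytes on the chip); OWN is left at 0 as the slide requires. */
uint32_t tsd_make(uint32_t size, uint32_t threshold_bytes)
{
    uint32_t units = threshold_bytes / 32;
    assert(size <= TSD_SIZE_MASK);
    assert(units <= TSD_ERTX_MASK);
    return (units << TSD_ERTX_SHIFT) | size;
}

/* Non-zero once the chip reports a successful transmission. */
int tsd_tx_ok(uint32_t tsd)
{
    return (tsd & TSD_TOK) != 0;
}
```

A 60-byte frame with a 256-byte threshold gives (8 << 16) | 60, with OWN, TUN and TOK all clear until the chip updates them.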
RTL8139 Packet transmission
Process of transmitting a packet
1. Copy the packet to a physically contiguous buffer in memory
2. Write the descriptor in use:
Address, Size, Early transmit threshold, Clear OWN bit (this starts the PCI operation)
3. When the TxFIFO reaches the threshold, the chip starts to move data from the FIFO to the line
4. When the whole packet has been moved to the FIFO, the OWN bit is set to 1
5. When the whole packet has been moved to the line, TOK(TSD) is set to 1
6. If TOK(IMR) is set, then TOK(ISR) is set and an interrupt is triggered
7. The interrupt service routine is called; the driver should clear TOK(ISR)
Packet reception
Ring buffer
1. Data coming from the line goes to the RxFIFO
2. Data is moved to the buffer when the early receive threshold is met
The ring buffer is physically contiguous
CBR (0x3A~3B R): the current address of data moved to the buffer
CAPR (0x38~39 R/W): the pointer that keeps the current address of the packet that has been read
The status of a received packet is stored in front of the packet (packet header)
Packet reception
The Packet Header (32 bits, i.e. 4 bytes)
Bit 31~16: rx_size, including 4 bytes CRC in the tail
pkt_size = rx_size - 4
Packet reception
Process of packet receive in detail
1. Data received from the line is stored in the receive FIFO
2. When the Early Receive Threshold is met, data is moved from the FIFO to the Receive Buffer
3. After the whole packet is moved from the FIFO to the Receive Buffer, the receive packet header (receive status and packet length) is written in front of the packet
4. CBR is updated to the end of the packet (4-byte aligned)
5. CMD(BufferEmpty) is cleared and ISR(ROK) is set
6. The ISR routine is called; the driver clears ISR(ROK) and updates CAPR:
cur_rx = (cur_rx + rx_size + 4 + 3) & ~3;
NETDRV_W16_F (RxBufPtr, cur_rx - 16);
The + 4 skips the packet header; the - 16 avoids overflow
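The header decoding and the CAPR update can be checked with plain integer arithmetic. This is a small user-space sketch; rx_pkt_size() and rx_next_offset() are hypothetical names, and the status bits in the low half of the header are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Packet length without the trailing 4-byte CRC, from the 32-bit header. */
uint32_t rx_pkt_size(uint32_t rx_header)
{
    uint32_t rx_size = rx_header >> 16;  /* bits 31~16 */
    assert(rx_size >= 4);
    return rx_size - 4;
}

/* Next read offset: skip the 4-byte header and rx_size bytes of data,
 * then round up to a multiple of 4. */
uint32_t rx_next_offset(uint32_t cur_rx, uint32_t rx_size)
{
    return (cur_rx + rx_size + 4 + 3) & ~3u;
}
```

With cur_rx = 0 and rx_size = 64 the next packet header starts at offset 68; sizes 61 and 62 round up to the same offset.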
EEPROM 93C46 operations
93C46 Command Register (0x50 R/W)
A piece of code for EEPROM 93C46
#define EE_SHIFT_CLK  0x04  /* EEPROM shift clock. */
#define EE_CS         0x08  /* EEPROM chip select. */
#define EE_DATA_WRITE 0x02  /* EEPROM chip data in. */
#define EE_DATA_READ  0x01  /* EEPROM chip data out. */
#define EE_ENB        (0x80 | EE_CS)
#define eeprom_delay() readl(ee_addr)
/* EEPROM commands include the always-set leading bit */
#define EE_WRITE_CMD  (5)
#define EE_READ_CMD   (6)
#define EE_ERASE_CMD  (7)

static int __devinit read_eeprom (void *ioaddr, int location, int addr_len)
{
    int i;
    unsigned retval = 0;
    void *ee_addr = ioaddr + Cfg9346;
    int read_cmd = location | (EE_READ_CMD << addr_len);

    writeb (EE_ENB & ~EE_CS, ee_addr);
    writeb (EE_ENB, ee_addr);
    eeprom_delay ();

    /* Shift the read command bits out. */
    for (i = 4 + addr_len; i >= 0; i--) {
        int dataval = (read_cmd & (1 << i)) ? EE_DATA_WRITE : 0;
        writeb (EE_ENB | dataval, ee_addr);
        eeprom_delay ();
        writeb (EE_ENB | dataval | EE_SHIFT_CLK, ee_addr);
        eeprom_delay ();
    }
    writeb (EE_ENB, ee_addr);
    eeprom_delay ();

    /* Clock the 16 data bits in, most significant bit first. */
    for (i = 16; i > 0; i--) {
        writeb (EE_ENB | EE_SHIFT_CLK, ee_addr);
        eeprom_delay ();
        retval = (retval << 1) | ((readb (ee_addr) & EE_DATA_READ) ? 1 : 0);
        writeb (EE_ENB, ee_addr);
        eeprom_delay ();
    }

    /* Terminate the EEPROM access. */
    writeb (~EE_CS, ee_addr);
    eeprom_delay ();
    return retval;
}

Reading the MAC address with read_eeprom():
addr_len = read_eeprom (ioaddr, 0, 8) == 0x8129 ? 8 : 6;
for (i = 0; i < 3; i++)
    ((u16 *) (dev->dev_addr))[i] = le16_to_cpu (read_eeprom (ioaddr, i + 7, addr_len));
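To see which bits read_eeprom() clocks out, the command word can be expanded in user space. This is a sketch only: eeprom_read_cmd_bits() is a hypothetical helper that mirrors the first loop of the listing and writes the bits as text instead of toggling the chip's pins.

```c
#include <assert.h>
#include <stddef.h>

#define EE_READ_CMD 6  /* from the listing: leading start bit plus read opcode */

/* Write the bits that the command loop shifts out, most significant first,
 * as '0'/'1' characters. out must hold at least addr_len + 6 bytes
 * (bits 4+addr_len down to 0, plus the terminating NUL). */
void eeprom_read_cmd_bits(int location, int addr_len, char *out)
{
    int read_cmd = location | (EE_READ_CMD << addr_len);
    size_t n = 0;
    int i;

    assert(out != NULL && addr_len > 0 && location < (1 << addr_len));
    for (i = 4 + addr_len; i >= 0; i--)
        out[n++] = (read_cmd & (1 << i)) ? '1' : '0';
    out[n] = '\0';
}
```

For location 7 with a 6-bit address the string is 00110000111: two leading zero bits, the start bit and opcode 110, then the address 000111.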
#include<> of the RTL8139
Figure: headers included by pci-skeleton.c, grouped by what they serve
Operating System
module.h: MOD_*, MODULE_*()
init.h: module_init(), module_exit()
kernel.h: barrier(), printk()
delay.h: udelay() definition
asm/io.h: definitions of I/O port read/write and ioremap()
config.h, spinlock.h, byteorder.h
PCI BUS
pci.h: PCI defines and prototypes
pci_alloc_consistent(), pci_resource_*(), pci_request_regions(), pci_set_master(), pci_read_config_word()(err)
Network Device
mii.h: definitions for MII_ADVERTISE, MII_LPA, ADVERTISE_FULL, LPA_100FULL…
etherdevice.h: definitions for Ethernet, eth_type_trans(), alloc_etherdev()
netdevice.h: definitions for struct net_device, register_netdev(), netif_*()
crc32.h: ether_crc(), used to compute the multicast filter
skbuff.h
Included indirectly
sched.h (irq, jiffies, capable), slab.h, time.h, spinlock.h, asm/atomic.h, skbuff.h
Driver structure of the RTL8139
pci_module_init() / pci_unregister_driver()
static struct pci_driver netdrv_pci_driver = {
    name:     "netdrv",
    id_table: netdrv_pci_tbl,
    probe:    netdrv_init_one,
    remove:   netdrv_remove_one,
#ifdef CONFIG_PM
    suspend:  netdrv_suspend,
    resume:   netdrv_resume,
#endif
};
static struct pci_device_id netdrv_pci_tbl[] __devinitdata = {
    {0x10ec, 0x8139, PCI_ANY_ID, PCI_ANY_ID, 0, 0, RTL8139 },
    {0,}
};
MODULE_DEVICE_TABLE (pci, netdrv_pci_tbl);
The last field of pci_device_id is driver_data (private; a sequence number here)
PCI device probe function
netdrv_init_one()
Linux invokes netdrv_init_one (struct pci_dev *pdev, struct pci_device_id *ent) when probing
It calls netdrv_init_board() to get net_device dev and void *ioaddr
netdrv_init_board()
dev = alloc_etherdev(sizeof())
Register the I/O ports and memory:
pci_enable_device (pdev);
pci_request_regions (pdev, "pci-sk");
pci_set_master (pdev);
mmio_start = pci_resource_start (pdev, 1);
ioaddr = ioremap (mmio_start, len);
Soft reset the chip.
NETDRV_W8 (ChipCmd, (NETDRV_R8 (ChipCmd) & ChipCmdClear) | CmdReset);
Identify the chip attached to the board
Back in netdrv_init_one()
Initialize net_device dev
Set up dev_addr[], irq, base_addr
Set up the methods:
dev->open,
dev->hard_start_xmit,
dev->stop,
dev->get_stats,
dev->set_multicast_list,
dev->do_ioctl,
dev->tx_timeout
register_netdev (dev); // ethX
NETDRV_W?()
/* write MMIO register, with flush */
/* Flush avoids rtl8139 bug w/ posted MMIO writes */
#define NETDRV_W8_F(reg, val8) do { writeb ((val8), ioaddr + (reg)); readb (ioaddr + (reg)); } while (0)
#define NETDRV_W16_F(reg, val16) do { writew ((val16), ioaddr + (reg)); readw (ioaddr + (reg)); } while (0)
#define NETDRV_W32_F(reg, val32) do { writel ((val32), ioaddr + (reg)); readl (ioaddr + (reg)); } while (0)
#define NETDRV_W8 NETDRV_W8_F
#define NETDRV_W16 NETDRV_W16_F
#define NETDRV_W32 NETDRV_W32_F
#define NETDRV_R8(reg) readb (ioaddr + (reg))
#define NETDRV_R16(reg) readw (ioaddr + (reg))
#define NETDRV_R32(reg) ((unsigned long) readl (ioaddr + (reg)))
Device methods
dev->open
int netdrv_open (struct net_device *dev);
dev->hard_start_xmit
int netdrv_start_xmit (struct sk_buff *skb, struct net_device *dev);
dev->stop
int netdrv_close (…);
dev->get_stats
struct net_device_stats * netdrv_get_stats (struct net_device *);
dev->set_multicast_list
void netdrv_set_rx_mode (…);
dev->do_ioctl
int netdrv_ioctl (struct net_device *dev, struct ifreq *rq, int cmd);
dev->tx_timeout
void netdrv_tx_timeout (struct net_device *dev);
Up up……
netdrv_open()
netdrv_open()
request_irq (dev->irq, netdrv_interrupt, SA_SHIRQ, dev->name, dev)
tx_bufs = pci_alloc_consistent(pdev, TXBUFLEN, &tx_bufs_dma);
rx_ring = pci_alloc_consistent(pdev, RXBUFLEN, &rx_ring_dma);
netdrv_init_ring (dev);
netdrv_hw_start (dev);
Set the timer to check for link beat
netdrv_hw_start (dev)
Soft reset the chip
/* Restore our idea of the MAC address. */
NETDRV_W32_F (MAC0 + 0, cpu_to_le32 (*(u32 *) (dev->dev_addr + 0)));
NETDRV_W32_F (MAC0 + 4, cpu_to_le32 (*(u32 *) (dev->dev_addr + 4)));
NETDRV_W8_F (ChipCmd, (NETDRV_R8 (ChipCmd) & ChipCmdClear) | CmdRxEnb | CmdTxEnb);
Setting RxConfig and TxConfig
NETDRV_W32_F (RxBuf, tp->rx_ring_dma);
init Tx buffer DMA addresses
netdrv_set_rx_mode (dev);
NETDRV_W16_F (IntrMask, netdrv_intr_mask);
netif_start_queue (dev);
Setup receive mode and multicast hashtable
(*set_multicast_list)()
netdrv_set_rx_mode()
if (flags & IFF_PROMISC)
    AcceptBroadcast | AcceptMulticast | AcceptMyPhys | AcceptAllPhys
    mc_filter[1] = mc_filter[0] = 0xffffffff
else if ((mc_count > multicast_filter_limit) || (flags & IFF_ALLMULTI))
    AcceptBroadcast | AcceptMulticast | AcceptMyPhys
    mc_filter[1] = mc_filter[0] = 0xffffffff
else
    AcceptBroadcast | AcceptMulticast | AcceptMyPhys
    for each address mclist[0].dmi_addr, mclist[1].dmi_addr, mclist[2].dmi_addr, ...:
    bits 31~26 of ether_crc() select one of the 64 bits (63~0) of mc_filter[] to set
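The hash filter step can be sketched in user space as follows. ether_crc_sketch() is a hypothetical stand-in written here as a plain bitwise CRC-32 in the MSB-first form (polynomial 0x04C11DB7), which is an assumption about what the kernel's ether_crc() computes; mc_filter_add() is likewise a made-up name for the "top 6 bits select one of 64 filter bits" step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise Ethernet CRC-32, MSB-first form: polynomial 0x04C11DB7, initial
 * value all ones, no final inversion; each byte is consumed LSB first. */
uint32_t ether_crc_sketch(size_t len, const unsigned char *data)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int bit;

    assert(data != NULL);
    for (i = 0; i < len; i++) {
        unsigned char byte = data[i];
        for (bit = 0; bit < 8; bit++, byte >>= 1) {
            uint32_t msb = crc >> 31;
            crc <<= 1;
            if (msb ^ (byte & 1u))
                crc ^= 0x04C11DB7u;
        }
    }
    return crc;
}

/* Set the filter bit for one 6-byte multicast address. */
void mc_filter_add(uint32_t mc_filter[2], const unsigned char addr[6])
{
    uint32_t bit_nr = ether_crc_sketch(6, addr) >> 26;  /* 0..63 */
    mc_filter[bit_nr >> 5] |= 1u << (bit_nr & 31);
}
```

Because 64 filter bits are shared by all multicast addresses, different addresses can map to the same bit, so the hardware filter may pass frames the host did not ask for and the upper layers still have to check the destination.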
Transmit a packet
netdrv_start_xmit()
if (skb->len < ETH_ZLEN)
    skb = skb_padto(skb, ETH_ZLEN);
entry = atomic_read (&cur_tx) % NUM_TX_DESC;
tx_info[entry].skb = skb;
memcpy (tx_buf[entry], skb->data, skb->len);
NETDRV_W32 (TxStatus[entry], tx_flag | skb->len);
dev->trans_start = jiffies;
atomic_inc (&cur_tx);
if ((atomic_read (&cur_tx) - atomic_read (&dirty_tx)) >= NUM_TX_DESC)
    netif_stop_queue (dev);
Figure: the four Tx descriptors 0~3 used as a ring, with dirty_tx and cur_tx pointing into it
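The two counters behave like a small ring allocator, which the following user-space sketch illustrates. struct tx_ring and the tx_ring_* functions are hypothetical names; the driver itself uses atomic_t counters and calls netif_stop_queue() where this sketch simply reports that the ring is full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_TX_DESC 4

/* Hypothetical user-space stand-in for the driver's two ring counters. */
struct tx_ring {
    unsigned int cur_tx;    /* next descriptor to fill */
    unsigned int dirty_tx;  /* oldest descriptor not yet reclaimed */
};

/* True when all descriptors are in flight (the driver stops the queue). */
bool tx_ring_full(const struct tx_ring *r)
{
    assert(r != NULL);
    return (r->cur_tx - r->dirty_tx) >= NUM_TX_DESC;
}

/* Claim a descriptor; returns its index, or -1 when the ring is full. */
int tx_ring_claim(struct tx_ring *r)
{
    int entry;

    if (tx_ring_full(r))
        return -1;
    entry = (int)(r->cur_tx % NUM_TX_DESC);
    r->cur_tx++;
    return entry;
}

/* Reclaim the oldest descriptor after its Tx-complete interrupt. */
void tx_ring_reclaim(struct tx_ring *r)
{
    assert(r->cur_tx != r->dirty_tx);  /* something must be outstanding */
    r->dirty_tx++;
}
```

The counters only ever increase; the descriptor index is taken modulo NUM_TX_DESC, so the difference cur_tx - dirty_tx is always the number of descriptors in flight.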
Interrupt handling
netdrv_interrupt()
spin_lock (&tp->lock);
status = NETDRV_R16 (IntrStatus);
NETDRV_W16_F (IntrStatus, status); // Acknowledge
Spec says, "The ISR bits are always set to 1 if the condition is present."
Spec says, "Reading the ISR clears all. Writing to the ISR has no effect."
if (status & (PCIErr | PCSTimeout | RxUnderrun |
              RxOverflow | RxFIFOOver | TxErr | RxErr))
    netdrv_weird_interrupt (dev, tp, ioaddr, status, link_changed);
if (status & (RxOK | RxUnderrun | RxOverflow | RxFIFOOver))
    netdrv_rx_interrupt (dev, tp, ioaddr);
if (status & (TxOK | TxErr))
    netdrv_tx_interrupt (dev, tp, ioaddr);
spin_unlock (&tp->lock);
Figure: example register values ISR = 0 1 1 1 0 0 1 1 and IMR = 0 0 1 1 0 0 1 0; an ISR bit raises an interrupt only when the matching IMR bit is set
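The masking and dispatch logic can be modelled in user space. This is a sketch under stated assumptions: the bit values in the enum are assumed from the usual 8139 driver definitions and are not given on the slides, and intr_pending() and intr_dispatch() are hypothetical helpers that return flags instead of calling the handlers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed status bit values (not stated on the slides). */
enum {
    RxOK = 0x01, RxErr = 0x02, TxOK = 0x04, TxErr = 0x08,
    RxOverflow = 0x10, RxUnderrun = 0x20, RxFIFOOver = 0x40,
    PCSTimeout = 0x4000, PCIErr = 0x8000
};

/* Flags naming the handlers that would run. */
enum { DO_WEIRD = 1, DO_RX = 2, DO_TX = 4 };

/* Only status bits that are enabled in the mask raise an interrupt. */
uint16_t intr_pending(uint16_t isr, uint16_t imr)
{
    return isr & imr;
}

/* Mirror of the three tests in the interrupt handler above. */
unsigned int intr_dispatch(unsigned int status)
{
    unsigned int todo = 0;

    if (status & (PCIErr | PCSTimeout | RxUnderrun |
                  RxOverflow | RxFIFOOver | TxErr | RxErr))
        todo |= DO_WEIRD;
    if (status & (RxOK | RxUnderrun | RxOverflow | RxFIFOOver))
        todo |= DO_RX;
    if (status & (TxOK | TxErr))
        todo |= DO_TX;
    return todo;
}
```

With the example values from the figure, ISR 0x73 and IMR 0x32, the pending bits are 0x32, so an interrupt is raised.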
Interrupt handling
netdrv_tx_interrupt(dev, tp, ioaddr)
dirty_tx = atomic_read (&tp->dirty_tx);
cur_tx = atomic_read (&tp->cur_tx);
tx_left = cur_tx - dirty_tx;
while (tx_left > 0) {
    int entry = dirty_tx % NUM_TX_DESC;
    int txstatus = NETDRV_R32 (TxStatus[entry]);
    if (!(txstatus & (TxStatOK | TxUnderrun | TxAborted)))
        break; /* It still hasn't been Txed */
    if (txstatus & (TxOutOfWindow | TxAborted)) {
        /* There was a major error, log it. */
        tp->stats.tx_errors++;
    } else {
        if (txstatus & TxUnderrun) /* Add 64 to the Tx FIFO threshold. */
            tp->tx_flag += 0x00020000;
        tp->stats.tx_bytes += txstatus & 0x7ff;
        tp->stats.tx_packets++;
    }
    dev_kfree_skb_irq (tp->tx_info[entry].skb);
    tp->tx_info[entry].skb = NULL;
    dirty_tx++;
    if (netif_queue_stopped (dev))
        netif_wake_queue (dev);
    cur_tx = atomic_read (&tp->cur_tx);
    tx_left = cur_tx - dirty_tx;
}
atomic_set (&tp->dirty_tx, dirty_tx);
Interrupt handling
Packet reception
netdrv_rx_interrupt (dev,tp, ioaddr)
rx_ring = tp->rx_ring;
cur_rx = tp->cur_rx;
while ((NETDRV_R8 (ChipCmd) & RxBufEmpty) == 0) {
    ring_offset = cur_rx % RX_BUF_LEN;
    rx_status = le32_to_cpu (*(u32 *) (rx_ring + ring_offset));
    rx_size = rx_status >> 16;
    pkt_size = rx_size - 4;
    skb = dev_alloc_skb (pkt_size + 2);
    skb->dev = dev;
    skb_reserve (skb, 2); /* 16 byte align the IP fields. */
    eth_copy_and_sum (skb, &rx_ring[ring_offset + 4], pkt_size, 0);
    skb_put (skb, pkt_size);
    skb->protocol = eth_type_trans (skb, dev);
    netif_rx (skb);
    dev->last_rx = jiffies;
    tp->stats.rx_bytes += pkt_size;
    tp->stats.rx_packets++;
    cur_rx = (cur_rx + rx_size + 4 + 3) & ~3;
    NETDRV_W16_F (RxBufPtr, cur_rx - 16);
}
tp->cur_rx = cur_rx;
Figure: layout of one entry in the ring buffer: Status, packet, CRC